Indifference to dark data

We should learn from recent experience with large data technologies that decision making can benefit from streaming data in addition to (and often instead of) the publication science of one-time experiments. It is clear now that policy making needs access to a continuous stream of fresh data about old ideas, especially when that data accumulates over time. With access to the technologies to do this work, it is unacceptable to base policies on the failed approaches of the past that rely on published studies.

Spontaneous Data

With big data, we end up with deep historical data from distant events. Something will be needed to fill in the gaps that were mysteries at the time. That gap filler will be spontaneous data, whether we acknowledge it or not. Even if we as humans leave the gap unfilled, we can’t be sure that our data analytics or machine learning algorithms won’t fill it. If they do, how can we be sure they won’t come up with a supernatural explanation that they keep to themselves?

Dark nothing hypothesis macro-sized particles

The popular dark-matter hypothesis takes for granted the existence of fundamental particles that are beyond human capacity to observe. The hypothesis in the first article is that these hidden particles are as-yet undetected peers of the sub-atomic particles we already know. The lack of perturbation of post-collision dark matter implies that if such sub-atomic dark-matter particles exist, they do not collide individually like the particles we know. My conjecture is that the entire blob depicted in ghastly blue in the visualization is a single particle, or an agglomeration of galaxy-sized fundamental particles. The collisions didn’t affect these particles because the collisions are trivial at the scale of these particles.

Science of data contrasted with science

In this blog, I have been discussing a perspective gained from my specific experience working directly with a large amount of diverse data. When I started that project, I was thinking only of the technical challenges of getting data, preparing it for the data store, and doing something useful with it. These were…

Science based on Observations

In my last post, I described how sciences dealing with the immediate physical world involve models where time is an independent variable. For dealing with the world, we have theories that connect causal events by the elapsed time between those events. In contrast, the science of studying the record of past events is dependent…