Masters of Ignorance: effective data analysis

We should study observations separately from derivations from theories. The deliberately ignorant takes the position that data is superior to science. There is a valid place for the deliberately ignorant on teams that include domain experts representing each of the relevant scientific disciplines. To be effective, the deliberately ignorant must be skilled at the craft of being ignorant in the right way: propelling the team toward a new solution without annoying everyone to the point of being expelled.

Indifference to dark data

We should learn from recent experience with large data technologies that decision making can benefit from streaming data in addition to (and often instead of) the publication science of one-time experiments. It is now clear that policy making needs access to a continuous stream of fresh data about old ideas, especially when that data accumulates over time. With access to the technologies to do this work, it is unacceptable to base policies on the failed approaches of the past that rely on published studies.

Spontaneous Data

With big data, we end up with deep historical data from distant events. Something will be needed to fill in the gaps that were mysteries at the time. That gap filler will be spontaneous data, whether we acknowledge it or not. Even if we as humans leave the gap unfilled, we can’t be sure that our data analytics or machine learning algorithms won’t fill it. When they do, how can we be sure they won’t come up with a supernatural explanation that they keep to themselves?

Fake News: A Dedomenocratic Perspective

What really makes legacy news fake is the tyrannical hold of past narratives over which future observations we accept. Fake news is the need to keep old narratives relevant when such a narrative never would have emerged if we started from scratch with the data available at the current moment.

Materialize the model to level the competition with observations

Having model data explicitly materialized into tables gives the data clerk the chance to recognize the deficiency that this data is not observed data. It also gives the data clerk the opportunity to ask whether there can be another source for this data. Perhaps, for example, some new sensor technology has become available that provides observations for quantities that previously required models to estimate. The analyst can then revise the analysis to use that new data instead of the model-generated data.
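
As a rough illustration of the idea, and not a description of any particular system, the sketch below stores model-generated values in the same table as observations, with each row tagged by its source. The table name `readings`, its columns, the sample values, and the helper functions are all hypothetical, chosen only to show how a data clerk might list the gaps that only a model fills and how an analysis might prefer an observation once one arrives.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory table mixing observed and model-generated rows.

    The schema and the sample values are invented for illustration.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE readings ("
        " site TEXT, hour INTEGER, value REAL,"
        " source TEXT CHECK (source IN ('observed', 'model')))"
    )
    rows = [
        ("A", 1, 10.0, "observed"),
        ("A", 2, 11.5, "model"),     # gap filled only by a model estimate
        ("A", 3, 12.0, "model"),     # model estimate ...
        ("A", 3, 12.4, "observed"),  # ... later covered by a new sensor
    ]
    conn.executemany("INSERT INTO readings VALUES (?, ?, ?, ?)", rows)
    return conn


def model_only_gaps(conn):
    """Return (site, hour) pairs for which no observed value exists."""
    return conn.execute(
        "SELECT site, hour FROM readings "
        "GROUP BY site, hour "
        "HAVING SUM(source = 'observed') = 0 "
        "ORDER BY site, hour"
    ).fetchall()


def preferred_values(conn):
    """Return one row per (site, hour), preferring observed over model."""
    return conn.execute(
        "SELECT r.site, r.hour, r.value, r.source FROM readings r "
        "WHERE r.source = 'observed' OR NOT EXISTS ("
        "  SELECT 1 FROM readings o"
        "  WHERE o.site = r.site AND o.hour = r.hour"
        "    AND o.source = 'observed') "
        "ORDER BY r.site, r.hour"
    ).fetchall()


if __name__ == "__main__":
    conn = build_demo_db()
    print("Still model-only:", model_only_gaps(conn))
    for row in preferred_values(conn):
        print(row)
```

Because the model output sits in the table as ordinary rows, the question "is there another source for this?" becomes a query over the `source` column rather than something buried inside the analysis code.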