Materialize the model to level the competition with observations

Having model data explicitly materialized into tables gives the data clerk the chance to recognize a deficiency: this data is not observed data. That recognition prompts the clerk to ask whether another source for this data exists. Perhaps, for example, a new sensor technology has become available that provides observations that previously had to be estimated by models. The analyst can then revise the analysis to use the new data instead of the model-generated data.
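As a rough illustration of this idea, here is a minimal sketch using Python's built-in `sqlite3` module. The table names (`observed_flow`, `modeled_flow`), the columns, the model name `interp_v1`, and the sample values are all hypothetical, chosen only for the example. The point is that model output lives in its own table, every row in the combined result is labeled with its source, and a model value is used only where no observation exists.

```python
import sqlite3


def build_demo():
    """Create an in-memory database with separate observed and modeled tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE observed_flow (site TEXT, hour INTEGER, value REAL);
        -- Model output is materialized in its own table (hypothetical schema),
        -- so it can never be mistaken for an observation.
        CREATE TABLE modeled_flow (site TEXT, hour INTEGER, value REAL, model_name TEXT);
        """
    )
    conn.executemany(
        "INSERT INTO observed_flow VALUES (?, ?, ?)",
        [("A", 1, 10.0), ("A", 2, 12.0)],
    )
    conn.executemany(
        "INSERT INTO modeled_flow VALUES (?, ?, ?, ?)",
        [("A", 2, 11.5, "interp_v1"), ("A", 3, 13.0, "interp_v1"), ("B", 1, 7.0, "interp_v1")],
    )
    return conn


def combined_rows(conn):
    """Return all rows, preferring observations and labeling each row's source."""
    return conn.execute(
        """
        SELECT site, hour, value, 'observed' AS source
        FROM observed_flow
        UNION ALL
        SELECT m.site, m.hour, m.value, 'model:' || m.model_name
        FROM modeled_flow AS m
        WHERE NOT EXISTS (
            SELECT 1 FROM observed_flow AS o
            WHERE o.site = m.site AND o.hour = m.hour
        )
        ORDER BY site, hour
        """
    ).fetchall()


if __name__ == "__main__":
    conn = build_demo()
    for row in combined_rows(conn):
        print(row)
    # A new sensor starts reporting for site A, hour 3 (hypothetical value).
    conn.execute("INSERT INTO observed_flow VALUES ('A', 3, 12.8)")
    for row in combined_rows(conn):
        print(row)
```

With this layout, the clerk can see at a glance which rows are model-generated, and once a new observation arrives for the same site and hour, the query picks it up in place of the model value without any change to the analysis.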

Dark Data in Aggregates or Summaries

In this series of blog posts, I repeatedly invoked the term Dark Data to refer to data that is made up to fill in for data for which we wish we had direct observations. I used the term dark in the way astronomers use it in talking about dark matter and dark energy. We know something…

Dark Data at individual level

In earlier posts, I chose the term dark data to refer to accounted data where corrupted or missing data fails to account for something we assume must exist. In my experience, I worked with data that was often missing some data points. From the very start of the project, it was recognized that data will…

Dark Data

Science is so important to society and policy today that it is pressed to have an answer to everything. Over time, the practice of science has become more reluctant to admit that there are limits to scientific knowledge. In the past, science would point out that some knowledge is beyond what can be supported…

Serendipitous Data

In my prior job, I worked with large amounts of data with many properties. To do my tasks, I wrote SQL queries, some of them complex, to get the information I wanted. The project started with modest goals but with schedules as short as a few hours to not only come up with…