Dark Data

Science is so important to society and policy today that it is pressed to have an answer to everything.  Over time, the practice of science has become more reluctant to admit that there are limits to scientific knowledge.  

In the past, science would point out that some knowledge is beyond what can be supported by evidence and rigorous testing.

Today, we expect science to have an answer to everything. Science accommodates by filling in missing knowledge using models. The models generate data to fill in what is unobserved. Having filled the gaps, scientists present the results, and society accepts them as scientific and thus beyond debate.

Science does this whenever it introduces data to fill the gaps in what is observable and known. While useful as motivation for future research, these results are presented as scientific facts. Examples include dark energy, proposed to account for the majority of energy in the universe; dark matter, proposed as the majority of mass in galaxies; and missing common ancestors in the fossil record. I grant that there is very good reason to believe these may in fact exist, but proposing them is not a scientific discovery. It is invented data.

Again, for scientific inquiry, there may be value in proposing something to take the place of an unknown. That value lies in designing experiments that focus on these unknowns. Instead, these ideas are being popularized as scientific knowledge, as facts that deserve our confidence.

I think the public would be better served by a more confident science that draws a line for the public: a line that says science stops abruptly where convincing proof ends. We can use models to project beyond that line, but modeling results are the product of the application of science, not a product of science itself. Application of science should come with an implicit disclaimer of “use at your own risk”. In contrast, a product of science has no need for such a disclaimer.

Modern science has loosened its standards to claim that products of applications of science are as scientific as products of science itself. This is how science is promoted to society: model results are presented as part of the gold standard of the scientific method.

It sets a bad example because it means we can all claim as scientific fact any product of an application of science, even where that result is not obtainable by direct experimentation or evidence. Anything predicted by a science-based model is scientific.

My concern is not about the actual sciences.  My concern is public policy.

Of recent concern are the current debates about the application of big data, in particular the exploitation of bulk metadata to seek answers about particular groups or individuals.

Recent studies demonstrate that something someone is trying to keep secret can be discovered from their metadata. This raises legitimate alarms about privacy.

But my biggest concern is the dark data. Just as in cosmology, where most of the mass is dark and most of the energy is dark, so too for an individual most of the information is dark. Even with access to the contents of individual communications, most of the data about the truth is as unknowable as the mysterious stuff in the universe. Obviously, with metadata alone, the problem of dark data is even worse.

Suppose someone has found out that he has a serious illness he wants to keep secret. His metadata shows an unusual call to a doctor’s office, followed a few days later by a call to a medical-results line, followed shortly after by a call to a hospital or specialist doctor. This can suggest that his secret is compromised.

But that suggestion is only reached by filling in the narrative with dark data. Another word for dark data is storytelling.
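To make the point concrete, here is a minimal sketch in Python. It assumes a hypothetical record type and a handful of hypothetical narratives that I made up for illustration; none of it comes from a real dataset or a real analysis tool. It shows only that the same three call records fit more than one story, and that the metadata by itself cannot choose among them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CallRecord:
    """One metadata entry: when the call happened and what kind of number was dialed."""
    day: int
    callee_type: str


# Hypothetical metadata matching the call pattern described above.
observed = [
    CallRecord(day=1, callee_type="doctor_office"),
    CallRecord(day=4, callee_type="medical_results_line"),
    CallRecord(day=5, callee_type="specialist"),
]

# Hypothetical narratives. Each maps a story to the sequence of
# callee types that story would leave behind in the metadata.
narratives = {
    "caller has a serious illness": [
        "doctor_office", "medical_results_line", "specialist",
    ],
    "caller is arranging care for a relative": [
        "doctor_office", "medical_results_line", "specialist",
    ],
    "caller had a false alarm and was cleared": [
        "doctor_office", "medical_results_line", "specialist",
    ],
    "routine checkup with no follow-up": [
        "doctor_office",
    ],
}


def consistent_narratives(records, stories):
    """Return every story whose expected call pattern equals the observed one."""
    pattern = [record.callee_type for record in records]
    return [name for name, expected in stories.items() if expected == pattern]


for story in consistent_narratives(observed, narratives):
    print(story)
```

Three of the four made-up stories match the records equally well. Picking one of them is the storytelling step: the choice comes from the analyst, not from the data.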

People like telling stories.  And the example set by the scientists is that stories are perfectly acceptable as long as the stories fit the constraints of a scientific model.

For the social sciences, the scientific models are statistical. The statistics describe the actual measurements in the experiment, and good studies control variables to improve the applicability of the result to a wider population. The statistical study is scientific. The application to the general population is not science. It is an application of science.

Application of science has that implicit warning “use at your own risk”. What I worry about is that we are accustomed to ignoring this disclaimer. We are conditioned by scientists themselves to believe that application of science carries no risk: application of science in effect is the same thing as science. Take it to the bank.

We are experiencing a big data explosion. Lots of people have access to it, often for the first time and with minimal training in thinking about that data. Too often, we are overconfident about what the results of our queries are telling us. We run with what we find.

That running can put someone at risk of prosecution, public condemnation, or embarrassment. Worse, that someone will have no recourse because the accusations are supported by a big data query.

Less melodramatic, but more concerning, is that the big data results will inform policy makers who will not bother to question the results. The results came from a scientific model, and the dark data was filled in with model-generated numbers. Easy decision, and just in time for lunch.

Again, it would be helpful to have the authority of the scientific community draw attention to the need for extreme skepticism toward the application of science, and in particular toward the storytelling that bridges over dark data.

