Critical theory in data science: extracting dark data for scrutiny

Yesterday I encountered a post describing the role of critical theory in the social sciences. My initial reaction was negative. I regret the focus on critical theory at the expense of learning how to collect and argue from real evidence. I have an inherent distrust of the goals of critical theory, but my stronger objection is that we need skills in the classical approach to critical thinking about actual evidence. In earlier posts, I suggested that data science can benefit from people trained in the social sciences because they are trained to get the best information (and reject the worst) out of the limited evidence available. In data science, we may talk about big data with billions or trillions of records compared to social-science quantities of a few dozen data points, but we still encounter the same basic problem of trying to answer questions that are not perfectly supported by the evidence. We need to make decisions based on the available evidence, and those decisions should involve deep scrutiny of the data’s relevance, reliability, and authority. These are classical skills of what I call the historical sciences, which include the social sciences. In earlier posts (such as this one), I described the problem that the current data-science field emphasizes the more straightforward technical skills of computing science over the skills needed to scrutinize the data itself.

I had imagined that I could find this data-scrutiny skill in the social sciences. When I see a resume with a background in one of the historical sciences (such as social science), my first reaction is positive: I assume this person will understand how to approach evidence and challenge its relevance to a particular argument. This is the skill I most want on data projects. I assume the computing-science techniques are trainable. The core aptitude and attitude needed to challenge evidence require a more rigorous education in a college setting, more rigor than I can provide on the job. Thus when I hear talk of critical theory and its growing importance in these fields, I feel disappointed. This is not the skill I want from a social scientist.

Classical approaches to critical thinking about evidence involve scrutinizing the evidence for its suitability for a particular argument and evaluating how far the evidence advances the theory. In contrast, critical theory dismantles the evidence itself and raises doubts about any meaning we attempt to assign to it. Instead of taking an observation and finding out how it relates to an argument, critical theory denies the legitimacy of the observation in the first place.

In the above post on critical theory, the author illustrates the theory with a passage that starts:

I am tired. The early morning sun is shining through my window. I roll out of bed and into my slippers, not ready to face the day.

A simple enough passage, yes? Nothing in particular happens — just a tired character waking up in the morning, right? Perhaps literally. But let us examine this scene through a literary critical lens, and see what it can reveal.

The first paragraph is the available evidence. The “literal” reading is exactly the evidence available to us. This particular evidence is a first-person account of some experience. We can consider this evidence for its value to a particular argument. I assume the narrator is a reliable chronicler of his morning experience. My focus is on whether any of this data is relevant to an argument I’m trying to make and, if so, how that data would fit. Perhaps the argument involves finding an explanation for some event that occurred later in the day. Could this morning experience support a claim, for example, that this person’s behavior may be the result of poor sleep the night before? Does this evidence actually provide proof of lack of sleep? These are critical-thinking questions about the relevance of the evidence, but they take the evidence as it is presented.

Instead, the author focuses his attention on the underlying biases we bring to reading the statement and, through that process, discredits any meaning in the observation at all. The meaning we assign to the statement comes from our prejudices and inescapably biased views about what kind of creature this narrator is and what it means to live with a bed that can be rolled out of, with slippers conveniently nearby. The narrator may be completely unlike ourselves, of a different race, sex, and culture. The bed and the slippers may actually be some exotic constructions that merely serve these functions. The feeling of being tired or of being ready for a day may be unlike anything I have experienced. The sun may be seen from a far northern latitude and the window could consist of a layer of ice, or the sun may belong to a different solar system, being one of three suns, so that this is one of three mornings. Ultimately, the passage has no meaning at all except for the meaning I assign it from my own biased perspective and experience.

I distinguish critical thinking from critical theory by the fact that critical thinking accepts evidence as being useful, while critical theory denies any general utility of evidence. When I study data, that data is evidence of an observation. I presume that the observation is real. Critical thinking starts with the presumption that the observation is valid. This does not make the remaining work easy. The remaining work is the careful scrutiny of the validity or relevance of the observation to an argument, so that I can defend the argument against those who have opposing arguments.

Critical theory doesn’t help with arguments except to make arguments impossible because there can be no universally accepted evidence.   All evidence is relative to the specific person who reviews that evidence.

The above describes my initial reaction. However, after thinking about it some more, I realized that critical theory already has a place in my concept of different qualities of data. In earlier posts I described a taxonomy of these qualities. At one end of this taxonomy is the rare bright data that is perfectly documented and controlled so that we have no doubt about what it says. Bright data is not susceptible to objection by critical theory. Most data is dim, with some ambiguity about what it measures and how reliably it was measured. Dim data is also outside of objection by critical theory because there is no interpretative model in place. Instead, dim data involves some poor documentation or some imprecision in a measurement.

In my taxonomy discussions, I focused most of my attention on what I call dark data, a term I invented to refer to model-generated data that comes from our theories instead of direct observations of the real world. Note that my use of dark data differs from standard practice, which uses the term for what I label as unlit data: real observation data that has never been validated or documented. I think it is important to separate dark data from actual observation.

In many posts, I described how dark data gets in the way of discovering new information about the real world.  Dark data substitutes for missing observations.   I use the term in analogy to the terms dark matter and dark energy that cosmologists use to describe missing observations in astronomy.   Dark data occurs any time we fill in missing observations with predicted observations from our theories.   Even when we have extreme confidence in those theories, the fact remains we do not have an actual measurement.

The problem with dark data is that it prevents us from learning anything new about the real world. We assume we already know what we would have seen if an observation were possible. Another problem is that once we admit the model-generated data into the same data store as the observational data, we start to build new theories that give equal credibility to observation and model-generated data.

There is a role for model-generated data late in the decision-making process but the model must work on clean observational data.   Mixing model-generated data with observational data contaminates the data for later modeling.
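One way to picture this separation is to tag every record with its provenance and to filter on that tag before any later modeling. The following is only a minimal sketch in Python, under the assumption that such a tag is recorded when the data is stored; the names `Provenance`, `Record`, and `observations_only`, along with the sample values, are hypothetical and chosen purely for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class Provenance(Enum):
    """Hypothetical tag recording where a value came from."""
    OBSERVED = "observed"  # a direct measurement of the real world
    MODELED = "modeled"    # a value predicted by a theory to fill a gap


@dataclass(frozen=True)
class Record:
    key: str
    value: float
    provenance: Provenance


def observations_only(records):
    """Return only the records that are actual observations."""
    return [r for r in records if r.provenance is Provenance.OBSERVED]


# Illustrative, made-up records: one of the three was filled in by a model.
records = [
    Record("sensor-a", 10.2, Provenance.OBSERVED),
    Record("sensor-b", 9.8, Provenance.MODELED),
    Record("sensor-c", 11.1, Provenance.OBSERVED),
]

# Later modeling would start from this clean subset only.
clean = observations_only(records)
print([r.key for r in clean])
```

The point of the tag is that the model-generated value is not discarded. It remains available late in the decision-making process, but it cannot be mistaken for a measurement.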

I have written many posts to discuss my suspicions of model-generated data. I value the ability to recognize what is model-generated and what is an actual observation. It now occurs to me that critical theory is relevant to this project of distinguishing observation from model. Critical theory is about separating the model from the observation. Once they are separated, we can challenge the validity of the model part. This is my argument in the last link about dark matter contaminating theories of the abundance of ultraviolet radiation from galaxies. The dark matter is a model, not an observation. Our interpretation of ultraviolet radiation data may be misled by the data supplied by our model of dark matter.

It is a leap to compare thinking about the group identity of a narrator in a story to thinking about reconciling physical measurements of ultraviolet light and galactic masses. However, at the level of the scientist’s aptitude and attitude, the skill sets are similar. Critical theory teaches the lesson that our biases can mislead us.

Although it is possible that our biases can mislead us, this is not a certainty. I worry that the modern social-science curriculum takes critical theory to the extreme, where the expectation is that our biases must always deceive us.

I agree with the notion that we should be able to recognize the distinction between the models we bring and the limited observational evidence of the real world. In the literary passage, my presumption that the narrator is a modern human may be invalid because the narrator may just as easily be a Neanderthal whose bed is a stone slab and whose slippers are shards of bark. However, in the cosmology example, the model is dark matter, and this model has a broad consensus about its meaning and its validity. I worry that critical theory would demand that we reject dark matter entirely because it is just one of many alternative models. There is a minority of scientists who have doubts about dark matter, and there is no way I can prove them right or wrong. Dark matter has that name because its existence lacks direct evidence.

Critical theory is an appealing trait for data science because it represents the aptitude to identify and isolate models from observations. However, I do not want to dismiss models or relegate them to a relativism in which one model is as good as another. I only want to isolate models from observational data. I want to apply special scrutiny to the model data when the observational data begins to conflict with that model.

My complaint about dark matter, for instance, is that it is often used as an observation even though we know it cannot be observed. Dark matter is valid to use in a theory, but it should be treated separately from actual observations. For example, when we have observations of a certain level of ultraviolet light from a galaxy, we cannot claim that our observation of dark matter contradicts that observation. Model-generated data is a second-class sort of data when compared with observational data. When they conflict, I would prefer to suspect the model instead of the observation. This is how we can discover new hypotheses. If I dismiss the observation in favor of the model, then I can only confirm what I already think I know.
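The preference for suspecting the model can be expressed as a simple rule: where a model prediction and an observation disagree, keep the observation and flag the model for scrutiny. The sketch below is a hedged illustration in Python, not a method taken from any real analysis. The function name `flag_conflicts`, the `tolerance` parameter, and all of the numbers are assumptions made up for the example.

```python
def flag_conflicts(observed, modeled, tolerance):
    """Return keys where the model's prediction disagrees with an observation.

    The observation is never altered or dropped. A conflict only marks the
    model's value for that key as the thing to be questioned.
    """
    suspects = []
    for key, obs in observed.items():
        if key in modeled and abs(obs - modeled[key]) > tolerance:
            suspects.append(key)
    return suspects


# Illustrative, made-up values (for instance, some measured quantity per galaxy).
observed = {"galaxy-1": 5.0, "galaxy-2": 3.0}
modeled = {"galaxy-1": 5.1, "galaxy-2": 1.0, "galaxy-3": 2.0}

# The hypothetical tolerance stands in for measurement uncertainty.
suspect_model_values = flag_conflicts(observed, modeled, tolerance=0.5)
print(suspect_model_values)
```

A key such as `galaxy-3`, which has a model value but no observation, is not flagged because there is nothing to conflict with. It is still dark data and should stay out of the observational store.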

Critical theory has a place in data science as a tool to identify the models behind observations. Once models and observations are separated, we should be able to continue to use classical approaches to build an argument.

In terms of education, I think it is far harder to train good argumentative skills than it is to train critical theory skills. Critical theory builds on the natural trait we see in young children who ask “what about this?” or “have you considered that?” We can benefit from improving our skills in challenging models, but I don’t find this as hard as constructing from the available evidence a strong argument that can withstand the challenges of an adversarial peer.

It is my impression that modern social science (or humanities) curricula place too much emphasis on critical theory (challenging our ability to observe anything) at the expense of critical thinking (building an argument with evidence). I would prefer to see more emphasis on the critical thinking involved in constructing and defending arguments because I think these skills are far harder to learn and to train.

In my experience, I found it relatively easy to train people on the job to suspect models in data, just as I found it relatively easy to learn new computer-science and statistical technologies. What I found difficult was training people to make competent use of the evidence to construct a defensible argument. The biggest source of that difficulty is an underdeveloped attitude toward recognizing how an argument may be challenged and then taking preemptive measures to defend against those challenges.

This critical-thinking attitude is too costly to develop on the job. My initial interest in someone with a social-science background comes from my hope that this is a person who understands how to relate evidence to arguments or decision-making. If instead I get someone who only knows critical theory, I will be very disappointed.

What I appreciate most about the social sciences (and related studies) is the critical-thinking skill of constructing and defending arguments based on evidence. Critical theory can add benefit by separating the models from the observations, but it must preserve some common-ground evidence for supporting arguments in order to allow new discoveries. Critical theory that denies any objective truth unclouded by models makes for interesting conversations, but it is not something I would pay a salary for.


