Evidence-based decision making when dark data contaminates the evidence

In earlier posts, I complained about obligating society to make and accept decisions based on evidence.   I am objecting particularly to the subset of such decisions that come from big data analytics where machines can completely control the information supply chain from collecting observations to enacting decisions that all must follow.  But more generally, the concept of evidence-based decision making requires us to make decisions supported by evidence (documented observations of past events) and this concept considers any decisions not supported by evidence to be invalid.    This concept gives absolute authority to evidence, and dismisses any role for human fears and doubts, even those from qualified humans society appoints as decision-makers.

In my last post, I attempted to defend the role of fears and doubts in decision making. While decision-making should consider all relevant evidence, including the evidence of our uncertainty about known issues, the human decision-maker also needs to consider the possibility that there are surprising issues we have not yet imagined and that we do not yet know anything about. We employ humans as decision-makers so that they can use their judgement to exercise appropriate fears and doubts. The core expectation we have for decision-makers is that they consider whether the evidence for a decision overcomes their fears and doubts.

After a decision is made, we will hold human decision-makers accountable for any bad consequences.  That accountability requires them to defend their judgement in balancing their fears and doubts against the evidence available at the time.   If the decision based on evidence produces a bad result (such as the failures we observed in the treatment of the first Ebola patient in USA), we need to be persuaded that the decision-maker satisfactorily evaluated reasonable fears and doubts when considering that evidence.  Alternatively, if the decision is based on fear and doubt, then we also need persuasion that the fears and doubts were reasonable.   Either justification for a decision is acceptable as long as we can be persuaded of it despite the consequences providing new evidence of bad outcomes.   The newly acquired evidence will influence future decisions even as we excuse the decision maker for making the original decision.

As I mentioned in earlier posts, if we do not require this human accountability for the quality of judging fears and doubts, then we can automate decision-making and eliminate the human decision-maker entirely. Algorithms can compute a statistical recommendation based on both the known evidence and the known issues where there are some uncertainties. These algorithms can present final decisions if fears and doubts are not required. Although there may be research into algorithms to emulate fear and doubt, I do not accept that they will ever replace human fears and doubts. If we want to include fears and doubts in the decision-making process, we will want a human in that role so that we can have a dialog as peers who share fears and doubts we can recognize in ourselves.

As a society, we could decide that fears and doubts are not legitimate considerations for decision-making and that we are only waiting for technology to catch up so that we can automate a pure evidence-based decision-maker.

In my opinion, the decisions made throughout this initial Ebola case were strictly evidence-based. From an evidence-based perspective, the decisions (later determined to be errors) were the only decisions available. CDC guidelines, hospital practices, and practitioner training were all up-to-date, and each of these incorporated the best evidence at the time, assuming due diligence in keeping these policies up to date. Evidence-based decision-making requires following approved policies despite any individual’s fears and doubts. When a crisis occurs, it is not the time to second-guess established wisdom in the form of the policies that are in place at the time.

Although we later learned that the policies had problems, that was not proven until we collected new evidence that showed that the policies were insufficient.   Evidence-based decision-making accepts the fact that we will encounter new evidence as we enact decisions because this new evidence will influence future evidence-based decisions.  Before that evidence exists, we must follow the prescriptions of the older evidence even when that obligates us to suffer the consequences.   As I argued in an earlier post, we are obligated to suffer the consequences of evidence-based decision-making so that we can catalog that experience of suffering as new evidence to consider for future decisions.

When we restrict decisions to consider only evidence, we refuse to treat fears and doubts as legitimate considerations. Fears and doubts involve the things that we don’t know. We don’t know something because we don’t have evidence of its existence. This is different than knowing something exists but not knowing much about it (such as some of the uncertainties about how to treat Ebola and how to prevent its spread). Fears and doubts involve surprises such as our discovery that official policies and guidance in US hospitals were insufficient to assure successful treatment and to prevent the spread. The fears and doubts about the preparedness of the US to handle Ebola within its borders concern other surprises not yet exposed. For example, the US may present an unforeseen favorable environment for Ebola to spread, or our ability to delay death may increase its R0 value here.

We reject fear and doubt of the unknown unknowns precisely because these fears lack evidence. Evidence-based decision making requires that a decision not contradict the evidence and at the same time be fully supported by actual evidence. With this guiding principle for valid decision-making, we should also reject dark data, which I describe as model-generated data that substitutes for missing real-world observations.

Note, my use of the term dark data is different from the data-science definition. My term alludes to cosmology’s dark stuff that cosmologists know must be out there based on models despite the lack of any direct observational evidence, and I prefer the term unlit data to describe what data science calls dark data. This is my blog, so I’m using my term. What I call “unlit data” (and others call dark data) presents its own form of ignorance that I’m not going to discuss today.

Evidence-based decision-making should reject dark data for the same reason it rejects a role for fear and doubt. Fear and doubt are not valid because they lack evidence. Dark data consists of stuff we invent to substitute for missing observations. Unlike the undocumented fear and doubt, dark data occupies space in the data store. Because dark data is data, it appears to be evidence. The problem is that dark data is not observational data. Dark data tells us what we suspect about the real world when the real world is not giving us an observation. The suspicions embodied in dark data are the opposite of fear and doubt: they reflect our science-based confidence instead of our skeptical fears.

Dark data is not observational evidence. Dark data has as little legitimacy in evidence-based decision making as fears and doubts. Both are biased assumptions about the world that substitute for a lack of evidence. We reject fears and doubts because they lack evidence. We should at least cast suspicion on dark data that substitutes for missing observations.
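To make the distinction concrete, here is a minimal sketch of how a data store could keep dark data from passing as evidence. Everything in it is my own illustration rather than a description of any real system: the `Record` type, the `source` labels, and the numbers are hypothetical. The only point is that a record tagged as model-generated can be counted and set aside before anyone treats the result as observational evidence.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class Record:
    """One entry in a hypothetical data store."""
    region: str
    value: float
    source: str  # "observed" = real-world observation, "model" = dark data


def observed_only(records):
    """Keep only records that came from actual observations."""
    return [r for r in records if r.source == "observed"]


def dark_fraction(records):
    """Share of the records that are model-generated substitutes."""
    if not records:
        return 0.0
    return sum(r.source == "model" for r in records) / len(records)


# Made-up values for illustration only: region B has no observations,
# so a model filled the gap with its own assumption.
records = [
    Record("A", 2.0, "observed"),
    Record("A", 1.5, "observed"),
    Record("B", 0.5, "model"),
    Record("B", 0.5, "model"),
]

all_mean = mean(r.value for r in records)
observed_mean = mean(r.value for r in observed_only(records))
```

In this toy example half of the store is dark data, and the average over all records differs from the average over observations alone. Without the `source` label, the two kinds of records would be indistinguishable to whoever reads the store later.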

My last post emphasized the fears and doubts of the unknown unknowns. In reference to Donald Rumsfeld’s famous quote, the other two possibilities, the known knowns and the known unknowns, are legitimate evidence where statistics can quantify or bound the latter. Dark data fits none of the categories listed in that quote. Dark data are human-imagined knowns: we use our assumptions about the world to substitute for missing data.

During my reading of the popular news reporting about the Ebola epidemic, I notice that there is a lot of ignorance about the disease.  Some of that ignorance is explicit.   Public officials emphasized this ignorance as underlying unwarranted fears and doubts of the dangers of Ebola becoming an epidemic in USA.   I also observe there is much more ignorance hidden from our view because the official assumptions replaced missing observations.

One example of dark data ignorance is the claim that conditions in USA are fundamentally different than conditions in West Africa when it comes to considering Ebola.   The missing data is that we simply don’t know how Ebola will behave in USA, either in the treatment of patients in well-equipped hospitals, or in the virus’ ability to spread through the population.

We have clear observations of differences in wealth and customs of the two different parts of the world.   We have less clear observations that the customs in Africa can contribute to the severity of the disease and to its ability to spread through the community.

From what I see, there is no direct evidence that these differences conclusively lead to the severity of the disease in actual patients or to the spread of the disease in the community.  The supposition that the unique conditions of poverty and customs caused these conditions is itself a form of dark data.  For example, popular news states almost as fact that the first victim of Ebola in the current outbreak acquired the disease from fruit bats because the community has a custom of hunting and eating these bats.   But as stated in this article,  “Scientists don’t know exactly how the toddler contracted the virus”.  This is an admission of ignorance, but the custom of eating fruit-bats is a coincidence that might explain this initial infection.   This custom is also a possible unique-to-Africa risk.

A much more troubling form of dark data is the presumption that all of the differences in wealth and customs between USA and West Africa are in favor of USA when it comes to the risk of the spread and severity of Ebola. Until very recently, we had zero observational evidence of how Ebola will behave in the unique environment offered in USA. The assertion that we have nothing to fear is backed only by model-generated substitutions for the lack of direct evidence in USA. Instead of directly observing that the disease is harmless in USA, we substitute the model-generated data that tells us that USA lacks the conditions that exacerbate the severity of the epidemic in West Africa.

With the first Ebola case within USA, we observe that even with access to quality health care and abundant funds, the patient can still suffer horribly and die, and the disease can still infect two others who had direct contact with that patient during the later stages of the disease when it is most infectious. This one objective observation supports the conclusion that there is no difference in the Ebola experience between USA and West Africa. So far, it is the only observation we have.

While it is useful to argue that we made some mistakes that should be avoided in the future, we are still left with the hard evidence fully confirming that Ebola has the same impact within USA as it does in West Africa.

Ignorance works both ways in arguing against fear and doubt. As the previous post discusses, we argue that the fears and doubts are not legitimate because of ignorance: the lack of evidence disqualifies these concerns from decision making. At the same time, the argument leverages our ignorance of how Ebola would behave in USA by filling the gap in evidence with claims that the conditions making the problem difficult in West Africa are not present here. We argue against the ignorance behind fears and doubts by exploiting the opportunity to substitute our assumptions (of nothing to fear) for the ignorance that comes from lack of data on how Ebola will behave in this country.

Using dark data this way is essentially fighting ignorance with ignorance.   It might win the argument, but it will not make us safer.

