Distinguishing dark data and predictive modeling roles in decision making

We require decision makers to make both types of decisions: risk-avoidance decisions, which should limit themselves to the best possible evidence (similar to the standards used in criminal courts), and planning decisions, which should employ a wider range of evidence, including simulation and modeling, to answer what-if questions. Fears and doubts about unknown unknowns are also relevant to what-if questions for planning. Planning questions involve our attempts to be prepared for what might happen. Risk-avoidance questions involve our attempts to be reasonable about preventing something from happening in the first place. I think it is best to distinguish what counts as admissible evidence for these two types of decisions. Risk-avoidance decisions should reject all forms of ignorance-data such as fears, doubts, and model-generated dark data. In contrast, planning decisions should include ignorance-data because we are not saying something will happen; we just want to be prepared, to the best of our ability, in case it does.

Evidence-based decision making when dark data contaminates the evidence

Ignorance works both ways in arguing against fear and doubt. As the previous post discusses, we argue that fears and doubts are not legitimate because of ignorance: the lack of evidence disqualifies these concerns from decision making. At the same time, the argument leverages our ignorance of how Ebola would behave in the USA by substituting, for that lack of evidence, claims that the conditions making the problem difficult in West Africa are not present here. In other words, we argue against the ignorance behind fears and doubts by exploiting the opportunity to substitute our own assumptions (that there is nothing to fear) for the ignorance stemming from a lack of data on how Ebola will behave in this country.

Using dark data this way is essentially fighting ignorance with ignorance. It might win the argument, but it will not make us safer.

Listening to the data, some observations from Ebola news in Dallas

With that sense in mind, I want to share some observations from listening to the data about the recent Ebola cases in Dallas, Texas. The data I’m listening to are news reports rather than database records, but I’m assuming the news reports are summaries of valid data about what is happening. The following are some things I hear when I listen to this data.

Big data is a paper tiger in the face of a crisis like the Ebola outbreak

My earlier Ebola post … implied that [data science] participation is optional. With this post, I now think the employment of big data predictive analytics is not optional. This disease will spread to affluent areas where people will learn their degree of contact separation from an infected individual. We urgently need predictive analytics to inform these people of the quantitatively verified risk of contracting the disease given that degree of contact separation.

Big Data failing to mobilize to fight Ebola epidemic, too timid to tackle real problems

We have the capacity to supply the necessary technologies to this region to begin collecting social media data. We just need the assurance that this data will provide valuable contributions to the fight against the Ebola epidemic, and that the data science community is available to devote its efforts and put its reputation on the line to come up with big-data recommendations that really make a difference for an urgent and life-critical problem. First of all, we need data scientists to present solid proposals for how big data can help manage the affected populations with the goal of improving our prospects for controlling the epidemic.