Databases motivates philosophy with multi-valued logic anticipated by Buddhist thinkers

We now accept the concept of data lakes, where the end analyst confronts multiple versions of the truth. These approaches force us to understand data as being neither or both when it comes to deciding whether something is true or false. This is not a conquest of Buddhist thought over Western philosophy. Instead, it is a real-world challenge we face in dealing with conflicting data from multiple sources, each confident that it is providing the truth. Even when we do not explicitly invoke philosophical concepts in the practical consideration of data, our thinking about data is closer to the Eastern way of thinking about truth than it is to Western philosophy.

Model-generated dark data contaminates our data stores with outdated information

There are multiple reports that the epidemic is not spreading as quickly as predicted earlier. These observations deserve more credibility than the simulation results. However, other reports dismiss these observations as artifacts of data collection problems. This dismissal implies a preference for the simulation results over direct observations. This use of simulation results as competitors to actual observations is what I call dark data. I value the observations more than I trust simulated data for the same facts.

Paying attention to data and predictions teaches the lesson to suspect models

By paying close attention to new observations and comparing them to past interpretations, we gain a better appreciation for our ignorance. Our models may be imperfect. There may be previously unknown variables that can have major consequences. We need to pay attention in order to notice when new information fails to confirm old expectations. Doing so teaches us respect for our ignorance. In particular, such experiences justify fears and doubts about future decisions.

Evidence-based decision making when dark data contaminates the evidence

Ignorance works both ways in arguing against fear and doubt. As the previous post discusses, we argue that the fears and doubts are not legitimate because of ignorance: the lack of evidence disqualifies these concerns from decision making. The argument also leverages our ignorance of how Ebola would behave in the USA by filling the lack of evidence with claims that the conditions making the problem difficult in West Africa are not present here. We argue against the ignorance behind fears and doubts by exploiting the opportunity to substitute our assumptions (of nothing to fear) for the ignorance that comes from lacking data on how Ebola will behave in this country.

Using dark data this way is essentially fighting ignorance with ignorance. It might win the argument, but it will not make us safer.

Big data is a paper tiger in face of addressing a crisis like Ebola outbreak

My earlier Ebola post … implied that [data science] participation is optional. With this post, I think the employment of big data predictive analytics is not optional. This disease will spread to affluent areas where people will learn of their degree of contact separation from the infected individual. We urgently need predictive analytics to inform these people of quantitatively verified risk of contracting the disease given that degree of contact separation.

Data analysis cannot find what you don’t want to find

Despite the challenges for exhaustively identifying possible bad outcomes, I still think it is valuable to invest some portion of our data collection and analysis to seek out worst case scenarios. Even when we find results that lack sufficient evidence to change our decisions, the results can inform us of what to be aware of as we continue to watch the arrival of new data… we cannot find what we don’t want to find.

Decision aid from simulated big data

This public-release paper from MITRE describes a tool that adjusts the windows of time allocated for certain operations on the ground and in the immediate airspace of an airport. In particular, this tool strives to reduce the already rare occurrences of near collisions of arriving and departing aircraft through the use…

Pessimistic skepticism as a virtue for data science

To support the decision maker, the data scientist (the student of data itself) needs to anticipate the doubts of the decision maker. The data scientist needs to proactively challenge the data itself for possible doubts about its authenticity, accuracy, and relevance.

Entertaining doubts is indistinguishable from skepticism and pessimism. This is a virtue for data science.