Government sponsorship of big data projects in the name of security, law enforcement, or medical cost optimization appears to place huge faith in the ability of big data to deliver truth. It reminds me of the theocratic rule of the medieval church, which used its confidence in biblical data to justify its actions toward the population.
Even though we consider our society largely secular, or separate from religion, we are still human. Time and time again, humans have demonstrated an eagerness to find a higher authority to justify their actions. They may initially turn to specific leaders who derive their authority from access to supernatural worlds, or they may turn to written works with authority derived from supernatural inspiration. The supernatural makes the actions somehow religious, but the key term is authority. Humans desire an outside non-human authority to resolve their differences and to guide their decision making.
Big data solutions appear at a convenient time, when we need that authority but reject supernatural alternatives. Big data is derived from secular processes. Big data delivers seemingly objective data points to support conclusions. Big data analysis convincingly presents explicit guidance in the form of patterns of unexpected combinations of attributes. Big data provides explicit and easily interpreted guidance.
In a recent post I suggested a medical insurance coverage example. Big data may inform us that people with brown eyes, ingrown toenails, crooked teeth, and passports with stamps from particular countries generally suffer poor outcomes for a particular medical procedure. This pattern may convince us not to approve the procedure, to avoid the risk of wasting money on a treatment that may not work.
How is this different from consulting an oracle or relying on prayer for inspiration? The big data answer comes from something materialistic. That is the only difference.
Big data, with billions of data points each with thousands of dimensions, presents huge opportunities to find specific data to answer virtually any question. This appears to be the motivation for large-scale data collection for enforcing laws and regulations, or for optimizing delivery of educational and medical services. Big data promises answers to questions that previously had no answers. Big data promises to back up those answers with supporting objective data. Big data promises fast decision making.
I think a main reason we ultimately rejected theocracy was that it was so inconsistent. Any guidance from texts or revelation is inherently ambiguous. Perhaps some current set of rulers can come to a consensus for a short period of time, but over time that consensus changes. The answers change based on who reads the text or receives the revelation. We kept asking the same questions because previous answers never fully answered the underlying question. We noticed that each time we asked we got different answers, often with the same disappointing outcome.
Stripped of religious justifications, our medieval theocratic experience may be reduced to an experience of receiving excessively confident answers to questions where the answers proved to be inconsistent and disappointing. The confidence was misplaced in humans’ ability to interpret the texts or revelations.
We replaced the theocratic approach with a secular approach that relied on public debate where all of the authority was in living people who can be questioned or challenged directly. This noisy and uncomfortable approach ultimately led us through centuries of vigorous and largely favorable changes. This process of relying on debates between living authorities works much better than earlier theocratic approaches.
But these debates are very uncomfortable. For whatever reasons, humans largely yearn for a non-human authority to resolve our differences and allow us to live without arguing with each other. We yearn for an outside force that will free us from having to argue with each other.
This yearning probably has increased in recent years because we still face contentious questions. Many of these questions appear impossible to resolve through debate. The earlier generations answered all of the easy questions (this is not true, but they look easy in retrospect). We are now left with only the hardest questions. Now the questions are so hard that we want to find some way to move the questions outside of human debate and into the hands of a non-human authority.
Religion still offers its services but the process of adopting a universal religion is unthinkably painful even for those who are willing to accept a supernatural authority.
Along comes big data.
We now have the technical capacity to obtain, transmit, and store huge amounts of data providing multidimensional information on large populations of individuals. We have technologies that can quickly query and aggregate this data. We have fast and brilliant reporting tools that can display rich data allowing us to perceive patterns. The patterns can suggest hypotheses. The hypotheses can suggest answers.
Big data may be the non-human authority we need to get past this nasty business of debating among living authorities.
How do we envision using this big data for enforcing laws or regulations or for optimizing services for education or healthcare? We will approach the big data with a query. We will receive the results of that query. We’ll interpret those results to make hypotheses that suggest the right decisions. We’ve recreated our own version of the ancient Greeks’ oracle at Delphi.
The oracle at Delphi was delivered in spoken word while big data is delivered graphically. The spoken word at Delphi was often gibberish that had to be interpreted by a priest. The big data reports have to be interpreted by humans, and any disagreements over interpretation must be deferred to a credentialed specialist. In terms of process, there is not much difference between asking questions of big data and asking questions of the oracle at Delphi. If there is any difference, it must be in the quality of the answers.
Whether we can provide reams of supporting observations has nothing to do with the quality of the answers. This volume of data primarily helps in persuasion. The tests for quality of answers are whether the answers are unambiguous, consistent, and deliver favorable outcomes.
I already raised the issue that graphical reports are ambiguous and open to multiple interpretations. The presence of ambiguity means we haven’t solved the problem of eliminating debate. We merely moved the debate to a smaller group of people, not unlike the priests of Apollo. The population as a whole is expected to defer to these specialists for any resolution of ambiguous interpretations. This remains appealing because it moves the debate outside of the general population.
Big data solutions also exhibit a great deal of inconsistency. The same question asked at different times will produce different answers for a variety of reasons. Obviously, as time advances so does the accumulation of new data. New data may provide support for new ideas that lacked support previously. Less obvious is that over time the quality of observations changes: data can become degraded or improved, or the data can change in terms of what it measures. Repeated requests for the exact same query will return different results because the data will be different.
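To make the point concrete, here is a minimal Python sketch of the same query run against two snapshots of a data store. Everything in it is an assumption made for illustration: the `Record` fields, the made-up records, and the hypothetical 50% approval threshold are not taken from any real system.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Record:
    brown_eyes: bool      # hypothetical attribute
    poor_outcome: bool    # hypothetical outcome flag

def poor_outcome_rate(records):
    """The 'query': share of brown-eyed patients who had a poor outcome."""
    group = [r for r in records if r.brown_eyes]
    if not group:
        return None
    return sum(r.poor_outcome for r in group) / len(group)

def deny_procedure(rate, threshold=0.5):
    """Hypothetical decision rule: deny when the poor-outcome rate exceeds the threshold."""
    return rate is not None and rate > threshold

# Snapshot at the time of the first query (made-up records).
snapshot_then = [
    Record(True, True), Record(True, True), Record(True, False),
    Record(False, False),
]
# Later snapshot: the same records plus newly accumulated ones.
snapshot_now = snapshot_then + [
    Record(True, False), Record(True, False), Record(True, False),
]

rate_then = poor_outcome_rate(snapshot_then)  # 2 of 3 brown-eyed patients
rate_now = poor_outcome_rate(snapshot_now)    # 2 of 6 brown-eyed patients

print(f"then: rate={rate_then:.2f}, deny={deny_procedure(rate_then)}")
print(f"now:  rate={rate_now:.2f}, deny={deny_procedure(rate_now)}")
```

The query text never changes, yet the decision flips once the newer records arrive. A patient denied under the first snapshot would have been approved under the second.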
Big data is still a relatively young phenomenon, so its long-term inconsistency is not widely known. Speaking from my own experience, I find this inconsistency to be very common and very frustrating.
The final test is whether the answers will deliver favorable outcomes. The whole point of using big data in government is to make serious decisions about who will be winners and who will be losers. Big data in medicine will decide who gets a treatment (and has an additional chance for survival) and who will be denied that opportunity. Big data in law enforcement will decide who gets prosecuted and convicted and who will be left alone. Initially we may get away with using the data as justification for our decisions. But inevitably over time we will accumulate evidence that the decisions were not optimal: the patient denied care had a better chance of survival than the patient who received it, or we convicted the wrong person.
When these cases come out later, the other two faults of big data will exacerbate the problem: the inconsistency and ambiguity of newer queries will support claims that we made the wrong choice in the first place. Inevitably that will raise our suspicions that our faith in big data was as misplaced as our earlier faith in theocracy.
My opinion is that big data offers only one thing: hypothesis discovery. The necessary next step for a newly discovered hypothesis is turning it over to the present-tense sciences for hypothesis testing, using new observations collected specifically to challenge that hypothesis. That testing is likely to weaken the confidence in a hypothesis derived purely from patterns in historical data. Only after that testing with fresh observations should the hypothesis be made available to the persuasion artists. That persuasion involves the messy and uncomfortable debates between living people. The same debates we’ve been enjoying for the past few centuries.
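A small simulation illustrates why I expect fresh observations to weaken a hypothesis found by pattern hunting. It is only a sketch under stated assumptions: the attributes are random yes/no flags, the outcome is pure chance and unrelated to any attribute, and the population sizes are arbitrary. None of this models real medical data.

```python
import random

rng = random.Random(0)  # fixed seed so the sketch is repeatable

N_ATTRIBUTES = 500      # hypothetical number of yes/no attributes per patient
TRUE_POOR_RATE = 0.5    # assumption: outcome is a coin flip, unrelated to attributes

def make_population(n_patients):
    """Generate patients whose attributes carry no information about the outcome."""
    patients = []
    for _ in range(n_patients):
        attrs = [rng.random() < 0.5 for _ in range(N_ATTRIBUTES)]
        poor = rng.random() < TRUE_POOR_RATE
        patients.append((attrs, poor))
    return patients

def poor_rate(patients, attr_index):
    """Poor-outcome rate among patients who have the given attribute."""
    group = [poor for attrs, poor in patients if attrs[attr_index]]
    return sum(group) / len(group) if group else 0.0

# Hypothesis discovery: scan historical data for the most alarming attribute.
historical = make_population(200)
best_attr = max(range(N_ATTRIBUTES), key=lambda i: poor_rate(historical, i))
discovered_rate = poor_rate(historical, best_attr)

# Hypothesis testing: check that one attribute against fresh observations.
fresh = make_population(2000)
tested_rate = poor_rate(fresh, best_attr)

print(f"attribute {best_attr}: historical {discovered_rate:.2f}, fresh {tested_rate:.2f}")
```

Because the scan picks the most extreme of 500 attributes, the historical rate for the winner should look alarmingly high even though no attribute matters. On the fresh population the same attribute should fall back toward the underlying 50%, which is the weakening of confidence described above.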
I worry that we are placing too much faith in big data as a replacement for how we go about governing our lives. I worry that we will look to big data to take questions outside of human debate and place them in the hands of an ambiguous and inconsistent oracle that is as likely to give us grief as pleasure.