Many of the success stories attributed to data science using big data, with its large volume, velocity, and variety, share a common theme: they were conducted largely without the knowledge of the subjects being explored. If big data represents a new approach to managing social systems, then this approach has been proven only on social systems that were unaware of the new approach.
In my recent posts, I have been discussing the challenge of accountability in decision making. Before the recent focus on big data, we had a long history of organizing around the concept of accountable decision makers. These decision makers were expected to understand the alternatives well enough to persuade everyone of the value of the chosen approach. In addition, they were expected to understand the chosen alternative well enough to be able to accommodate a new demand or to justify why the unchanged plan is the best approach.
Accountability means being available to present the persuasive case or to negotiate a compromise solution long after the actual decision has been made. Accountability enables continued social cooperation and cohesion without the need for coercion.
With big data systems and their focus on large volumes, high velocity, and diverse variety, there is a burden on accountable decision makers to produce more decisions faster and over a wider range of implications. At some point, the need for implementing recommendations will exceed the human capacity for accountability. To obtain the big promises of big data solutions for very complex problems (for example, health care delivery), we will need to implement automated decision making with no human accountability. The lack of human accountability comes with the risk (if not inevitability) of social unrest and disintegration.
When our decision makers’ persuasive arguments are reduced to an appeal to the wisdom of algorithms and the quality of data, both of which are outside of their control, they add no value over simply automating the algorithms. This leaves an accountability vacuum.
In earlier posts, I described the special case of the production system that is interacting with the real world. In production, human scrutiny of data and processes will impede the goals of volume and velocity. To make big data systems work with the three Vs, we invest in data science in the development phases of the project to build processes and algorithms based on anticipated requirements of what will happen in the real world.
Data science itself is identical to how we understood computer science before the introduction of personal computers and their focus on interactive user interfaces. Computer scientists can produce very effective and robust algorithms. However, computer science is not a discipline known for accountability. Although modern software systems endure very rigorous and exhaustive testing, there remains an assumed contract in which the licensee assumes all risks of the production operation of the software. The computer scientists assure us that the software does what it is designed to do and meets requirements identified in advance. The computer scientists do not guarantee that the software will in fact deliver beneficial results during production.
I recognize that critical software does endure extensive testing to assure that the software is ready for production. Even with that testing, there remains a lack of accountability when the world surprises us. Computer scientists (now called data scientists) are not accountable for production. The entire concept of production implies a handoff of software from the developers to the operators. The operators are accountable for the production operation, not the developers. The operators will demand high standards for delivered software, but they cannot demand accountability for the surprises that were unanticipated in the requirements. This amounts to the lingering notion of “use at your own risk” that characterized earlier software products. Today this notion is merely recast as a line between development (labor intensive) and production (accountability intensive). Production assumes the risk of using the software.
Accountable decision making resides in the operations part of the project where the system is in production. Modern data solutions present too many recommendations too quickly and over too wide a variety. The combination overwhelms any human's ability to understand the recommendations well enough to persuade others or to negotiate a change. Production automates decision making to implement machine-generated recommendations. There is still a need for accountability. Accountability falls on the operations part of the business. The only tools available to operations are those that report the data itself. Operations is not equipped to scrutinize data and resolve issues of misuse or abuse of data. They can only report the data as it exists. To be fair, operations has very powerful reporting tools with a rich selection of data, but these tools are for reporting, not in-depth scrutiny of data.
The three Vs of volume, velocity, and variety that characterize big data not only place unrealistic burdens on accountable decision making but also propagate to the customers. In most success stories of big data in the popular literature, the customers are the ultimate beneficiaries with better and cheaper options. Customers are the targets of big data solutions. They may be citizens of a government as easily as they may be consumers of a product. Initially they were unaware of the big data project, but now they are aware and becoming better educated.
Even in the success stories, there will be some who fail to enjoy the benefits or who may be substantially harmed. The traditional approach to addressing grievances is to demand accountability from the provider of the service or product. The new big data model cannot provide accountability in this old sense of being competent at addressing grievances. With big data, the algorithms are out of anyone’s control, although the data is good and the algorithms are judiciously chosen to serve the greater goals.
The end customers are becoming educated on the new technologies through the recent popularization of each of the following: big data, extensive and increasingly intrusive sensors, predictive analytics, and compelling visualizations. Although the three Vs of volume, velocity, and variety are less well known explicitly, customers are correctly inferring that these aspects of big data must exist. Denied traditional forms of accountability, these customers will expect direct access to these three Vs so they can fill the accountability role for their own lives.
In earlier posts, I described the inevitable requirement for all participants in organizations (whether they are employees, product consumers, or government citizens) to become data scientists in the sense of being able to query data and scrutinize it themselves to come to their own conclusions. The inherent lack of centralized accountability in big data systems places the accountability on the individual. To be accountable, the individual will need to be a decision maker. The individual will need to be able to access and use the data to support well informed decisions.
The examples of the recent disruptive models of ride sharing (Lyft or Uber) and accommodation sharing (AirBnB) illustrate this empowerment of individuals to be accountable for their own choices by providing data in terms of locations, prices, and reputation ratings. Occasionally and inevitably some of these transactions will result in disappointments, but the accountability is primarily between the individuals in the transaction and not the service as a whole. The value of a centralized accountable authority is occasionally illustrated by some particularly bad transactions, but the customers continue to find that the self-accountable approach offers more advantages than the older model. Instead of demanding central accountability to address the failure cases, customers are demanding even more data so they can make better informed choices. For example, a ride-sharing transaction may need more than the collection of reviews from prior customers. Potential customers will seek out more extensive background and prior history investigations of the operator offering a service.
Increasingly the customers will demand access to all of the three Vs. They will want access to sufficient volume, velocity, and variety to permit them to make informed decisions on their own to replace the missing centralized accountable authority. The data they demand is more manageable than in the centralized case because they only need the data that is relevant to their decision, but that data will need to come from a larger store of data.
Technology is moving in this direction in many areas. We have better tools for investigating our decisions in far more detail than before.
In many areas of our lives, we are far better equipped to make informed decisions. People are becoming used to the idea that data should be available for their own scrutiny for any matter that affects their lives. The individual customers demanding three-V big data are becoming increasingly frustrated when they are not allowed to access it. This frustration is justified because the deteriorating centralized accountability must be replaced by individual accountability. The traditional concept of decision making is moving to the individual level. Everyone needs to become a decision maker who demands exhaustive analysis of all relevant data to be convinced of the best option to select. Individual decision makers demand access to the big data products, in volume with velocity and variety.
People are learning that big data and the Internet of Things are making quality objective data available that is used for quicker decision making. They are learning that the quicker decision making is overwhelming to a centralized accountable person. They are learning that they need to assume the accountability that is no longer available from authorities due to the speed and volume of data. Inevitably, everyone will demand big data volume, velocity, and variety to allow them to be accountable for their day-to-day actions. This demand will put pressure on dismantling traditional, more deliberative and secretive approaches that specifically deny individuals direct and immediate access to data.
One of the early big data success stories involved profiting by buying and then selling a stock within a few seconds. Traditionally, buying and selling were separate decisions, each typically requiring research to justify it and an expectation of waiting for the result to occur. Program trading looks for extremely short-term opportunities in a particular stock and may never pay attention to that stock again. This kind of mentality pervades a lot of big data solutions, especially those that expect to operate in “real time”. Whatever “real-time analytic processes” means, it implies making independent decisions at closely spaced intervals that exploit the latest data as quickly as possible. A marketing campaign will prepare and broadcast a new advertisement around a particular word that just began trending upward on Twitter that same day. The marketing campaign will abandon that word and the associated story the very next day because another trend has started.
These quick real-time decisions will produce mixed results. Sometimes individual decisions will turn out badly, but they are averaged out because good decisions outnumber the bad ones. Sometimes good decisions will nonetheless result in harm to some sub-group. People will demand accountability when they perceive that a decision has harmed them. The accountability they seek is a responsible person who can explain the justification for the decision or who can implement a change to prevent the harm from happening in the future. Increasingly, that accountability is unavailable. People will assume accountability for themselves. In order to assume accountability, they will demand access to the same data so they can come to their own judgement of the appropriateness of the rationale behind the decision.
The natural and traditional approach to decision making is to isolate the customers (subjects of decision making) from the decision making process and in particular from the handling of the data. One of the ways to characterize the recent disruptive businesses is in the particular way that they break down this barrier and give people access to the decision-making process. The modern disruptive trend is about empowering people to become their own decision makers.
However, the disruptive business success stories still affect a relatively minor part of people’s activities. The larger, more traditional businesses are using big data solutions to scale up their traditional model of shielding the customers (or subjects) from the decision making process. Big data solutions inevitably result in a deterioration of accountability. Big data is big because it exceeds the capacity of any human to comprehend, let alone to persuade others or to revise to satisfy an arbitrary request. People will recognize this lack of accountability and assume accountability for themselves. To do this, they will demand access to the data that the traditional models deliberately protect against public access.
Corporate data is valuable data made trusted by large investments in cleaning, conforming, validating, verifying, etc. Traditionally, this data is considered proprietary and a key competitive advantage. The traditional corporate model is not going to allow people to have direct access to decision making because doing so will grant that access to the company’s competitors.
Big data erodes a company’s ability to be accountable for its decisions. People will demand access to that data so they can assume the decision making role. Companies will resist sharing their valuable clean and trusted data. This leaves people no other choice but to find information for themselves. They will seek out information from other sources of more dubious quality and certainly with less of the internal consistency that is needed to combine information from multiple sources. People have access to the same technologies that corporations use for big data. These technologies are not hard to learn or to use, and they are relatively inexpensive, especially with cloud computing. People will use the big data technologies on data they can acquire themselves. This available data is likely to be not as good as the corporate data. It is inevitable they will come to different conclusions.
When decision making moves toward the general population instead of a centralized accountable authority, the general population will outnumber the centralized authority. The general population will offer many more decisions and explanations (story telling) based on their inferior data. Combined with the overwhelmed central authority’s diminished ability to explain its decisions, these alternative stories are likely to become more compelling than the story offered for the corporate decision. The popular stories may suggest a company is not treating its customers well, or they may even suggest the company is committing some violation of law or process. Because all of the stories use the same technologies and practices, the only defense is to compare the relative merits of the different sets of data. Persuasion of the population will require opening up this proprietary data for the public to inspect.
The concepts behind disruptive businesses are more than just a business strategy to win new business. The concepts are the inevitable consequences of big data eroding the accountability of the centralized decision maker. The data is too large and complex for a human to understand the global consequences well enough to satisfy every aggrieved party with an explanation or a compromise. The people will fill the vacuum of accountability themselves but only for their own lives. They will demand access to the best data to make the most well informed decisions using the commonly available big data technologies. Ultimately everyone will have to become a decision maker or organize into small tribes led by someone the tribe trusts to make decisions for the tribe. They will demand access to data and will use whatever they can find. The pressure will be on companies to share their most trusted data to prevent the population from using lower quality data in coming to conclusions.
Disruptive business models are the inevitable consequences of big data and the Internet of Things pushing accountability to the individual level. The individual will demand the three Vs to satisfy personal needs for making informed decisions for self-accountability.