This post follows the theme of the past two posts. In the earlier post, I explored the idealization of big data as a kind of mathematical or religious truth that cannot be challenged, so that any failure is always blamed on human ignorance, negligence, or malice. In the later post, I explored a different topic: how society exists to assign humans to be accountable for justifying decisions or for negotiating more equitable ones. In that post, I suggested that people enter social constructs such as governments in order to have someone who has at least some control over the outcomes that affect everyone’s lives.
The two recent posts are related. Big data technologies become vulnerable to failure when they inform decisions that affect people’s lives. In general, we want a human decision maker to be accountable for justifying the rationale behind a decision or for modifying the decision process to accommodate a new need. This social imperative for human accountability is where we can question the trustworthiness of big data.
It is true that big data technologies are basically mathematical. They primarily concern collecting accurate and clean data and applying mathematical algorithms to derive some result. A big data algorithm cannot fail any more than the quadratic formula can fail. The result is exactly determined by the mathematical operation on the input values that we assume are accurate.
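The quadratic-formula analogy can be made concrete with a small sketch (the `quadratic_roots` helper is hypothetical, written only for illustration): the formula deterministically maps inputs to outputs, so a wrong answer can only come from wrong inputs, never from the formula itself.

```python
import math

def quadratic_roots(a, b, c):
    """Solve a*x^2 + b*x + c = 0 for real roots via the quadratic formula."""
    disc = b * b - 4 * a * c
    if disc < 0:
        return ()  # no real roots
    root = math.sqrt(disc)
    return ((-b - root) / (2 * a), (-b + root) / (2 * a))

# The formula cannot "fail": it deterministically maps inputs to outputs.
print(quadratic_roots(1, -3, 2))     # accurate inputs -> correct roots (1.0, 2.0)

# Feed it an inaccurate coefficient and it just as faithfully returns
# "correct" roots for the wrong problem -- garbage in, garbage out.
print(quadratic_roots(1, -3, 2.25))  # roots of the mis-measured equation: (1.5, 1.5)
```

The point of the sketch is that scrutiny belongs on the inputs and on the choice of formula, because the computation itself offers nothing to blame.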
In the second post above, I linked an article about robot workstation assistants that can work safely alongside a human specialist. Big data is very much like a robotic assistant to a decision maker. As with industrial robots, we can encounter scenarios that don’t require direct human supervision at all. Big data solutions can affect decisions automatically: automating the decision maker. This occurs frequently today in online marketing, for example.
Big data technologies risk being discredited at the point of the accountable human decision maker. Social systems demand that a human be accountable for the final decision. Social stability requires an accountable human who can provide a satisfying explanation that justifies the result, or who can negotiate a more equitable result in the future.
Big data’s big weakness is its charm of being a science built on mathematical truths. This charm encourages us to trust the mathematical truth so that we can spend less time scrutinizing the problem ourselves. This charm justifies eliminating routine data science scrutiny at the production stage of the system. We can trust that good algorithms using good data will produce good decisions. This charm risks destroying human accountability when the decision maker’s response to a grievance is that the decision is out of his hands and too complex for him to understand.
Populations form and support social constructs in large part to assign accountability to human decision makers, whose task is to persuade the population that a decision is just, or to be capable of being persuaded to change a rule. When that assigned individual responds to a grievance by saying he doesn’t understand the rules and is powerless to change them, he fails to meet the role society expects.
Society could decide that human accountability is no longer needed. This result is frequently promoted as one of the promises of big data: we can eliminate the messy business of debate by turning processes over to something outside the control of any human. That is the state of nature that existed before we made the social construct in the first place to place a human in charge. Somehow the big data system will always be benevolent to everyone, so no one will have grievances. I believe a population of humans will always produce a subgroup that finds some grievance. Social harmony depends on providing that subgroup an effective outlet to address those grievances.
The alternative is that society will demand replacing the powerless and ignorant decision maker. The problem with big data is that its large volume, velocity, and variety are beyond any human’s capacity to understand. We rely on algorithms that are proven at a smaller, human-understandable scale and that we trust will extrapolate to levels beyond human comprehension.
Society demands human decision makers who are accountable for the decisions that affect people’s lives. Those humans have to be in control of the decisions: they must understand the decision and know where it is possible to accommodate different needs.
In an earlier post, I argued that it is misleading to describe the field of working with large data as data science. The actual task expected of practitioners is more like that of a data clerk. The data clerk needs to select algorithms and to verify the results. This is not unlike financial or governance-compliance jobs that require knowing the full range of allowable procedures and which ones apply to a certain task. Those jobs impose accountability that the practitioner followed accepted practice and verified the results. This is what we need from data scientists in big data projects, but I fear we aren’t asking it of them.
Instead, we see data scientists as research scientists or as applied science engineers. The researchers are developing new algorithms that are sold for use by the applied science engineers. The engineers apply the algorithms according to the prescribed instructions and turn the result over to production. That’s the end of the job.
Data scientists are not accountable for production in the way society expects decision makers to be accountable. Data scientists are like an automobile manufacturer that delivers a working car to a consumer but is shielded from accountability if the consumer wrecks the car. In most cases, the manufacturer is not accountable for a wreck caused by human error or just bad luck.
For large-scale systems, society demands accountability for results even when they do come down to operator error or just bad luck. Society expects the decision maker to account for the possibility of human error, malevolence, or bad luck after those events occur. The modern data scientist only takes those considerations into account before the events have a chance to occur. The modern data scientist is an engineer who attempts to make the best possible product but is immune from accountability for its later operation (in the production phase).
I want to discuss a couple of recent news items that relate to engineering failures. Although unrelated to data, I want to use them as analogies for the accountability we want for operational systems.
The first is more recent, concerning a fault on a roller coaster that left a car full of passengers at the ride’s tallest point at a steep sideways tilt. The engineering response was that the car behaved exactly as designed: it stopped and was secured when a fault was detected. Eventually everyone was rescued without any harm, so the story has a happy ending. But the story doesn’t end with the rescue. Even with a happy ending, we will demand accountability. We want to know exactly what fault was detected. Perhaps the fault was a false alarm that should not have occurred in the first place, unfairly denying the passengers an uneventful ride. Alternatively, if the fault was serious, it should have prevented the ride from starting in the first place. Another inquiry can question the engineering design that intended to halt a car in mid-ride but equipped the car with a safety release that frees all of the car’s passengers at the same time. If the engineers knew the car could stop mid-ride, they should have designed a way to release passengers one at a time. The rescue took much longer because of the need to harness every passenger in a car before releasing the restraint.
Even though the story had a happy ending, we will demand answers from someone who can justify how everything turned out exactly as designed, or who can effect a change to prevent this from recurring in the future.
The second example is an earlier event on a commuter train in which a passenger’s foot fell into the gap between the train and the platform. The image that struck me is that a mob action was credited with solving the problem. Essentially, for a brief moment, society was replaced by a mob that acted in good will but without accountable leadership. Again, the story ends well: no one was hurt. Even the title of the news article places the blame on the passenger for “not minding the gap”.
The system spokesman claims the gap is less than 2″. Minding the gap probably refers to the risk of pinching or tripping, not of falling in. I don’t think a grown man’s foot would fit into a 2″ slot, and it certainly would not sink to mid-thigh depth.
The link to the engineering analogy above is that the train and station are designed to have a gap of no greater than 2″, and therefore that must have been the gap when the man fell into it. I suspect the gap was considerably wider than 2″; had that been known to be possible, the system would not have been allowed to operate in the first place. At least for the popular media, the description of the engineered gap size was sufficient to describe the reality at the time. This illustrates the charm effect I described earlier: the science charms us into thinking that reality matched the intention.
In this scenario, it is the station manager who has the role of accountability, especially for the safety of the passengers. I link this article after the earlier one to show two different responses. In the earlier response, emergency response teams were called and worked carefully and diligently to rescue each passenger safely. Their high priority on safety slowed the process down. They could have rescued passengers one at a time instead of one car at a time, telling the other passengers to hold on tight when the restraint was released. They didn’t do that. They applied a careful plan that allowed no risk. This was a very accountable response.
In contrast, the train response didn’t involve calling and waiting for an emergency response team, which might have taken a similar amount of time to find a way to safely pry the train from the platform to free the man’s leg. That would have meant retrieving and deploying multiple mechanical jacks, placed on either side of the leg where there was good structural support, to push the train evenly and steadily. Such a rescue operation would be a civilized response that justified a long downtime of the system through its priority concern for safety.
Instead, the managers allowed a mob response that just happened to have the happy result of a fast solution that injured no one. The risk of injury was huge. The car could have rocked back and crushed the thigh bone. Someone in the crowd could have slipped on the tile or shattered a window. Sometimes a crowd response is justified when there is an immediate risk of danger. This was not one of those times. I assume a safety switch would prevent the train from moving if the door couldn’t close, and this man’s predicament was not going to allow the door to close. Also, the situation appeared stable enough to allow an emergency team response. There were far safer, more careful approaches for prying the car from the platform, although they would have taken much longer.
What bothers me about the scenario is this sudden mob response, fostered or even encouraged by the station manager. There must have been a sense among everyone that this was a bigger emergency than it actually was. That sense of emergency may have been fostered by disbelief that it was possible in the first place. The gap was supposed to be a 2″ tripping hazard, not a fall-in hazard. If the gap were known to be wide enough to fall into, the entire system probably never would have been approved for commuter use. The supposed impossibility of the system allowing this to happen made it seem a bigger emergency than it was, and that seemed to justify an emergency mob reaction.
I point to this second example as an analogy for what we face by allowing ourselves to be charmed by the perfectibility of data volume, velocity, and variety for decision making, so that we don’t have to think about it anymore. When something goes wrong, the only explanation is that there is an immediate emergency that demands a mob response. Since nothing should have gone wrong, when something does go wrong we assume it is an emergency that the social structures cannot possibly handle. This helplessness of a civilized response is amplified by frequent assertions of the unfathomable complexity of a perfected system. We will think we have no choice but to rely on the mob to solve the problem.
The fallibility of mathematically or scientifically perfectible systems like big data shows itself when society feels that the only way to solve an unexpected problem is to trust a mob response instead of deferring to the accountable authorities. Mob responses don’t usually end happily.