Data analysis cannot find what you don’t want to find

In an earlier post, I defended a pessimistic, skeptical approach to studying data.   Reporting of recent calamities frequently describes the events as catching us by surprise: no one saw these possibilities until after they started happening.  A common theme of these problems is the unexpected breakdown of social order.  Everything seemed to be working fine until something happened, whether that was an ambiguous police shooting, the collapse of a regional government’s security forces, or a breakdown of medical and sanitation infrastructure in the face of a highly infectious disease.    We may have been able to predict the triggering events in each of these cases, but we failed to predict the catastrophic consequences at the social scale.

One reason we did not anticipate these calamities is that we simply didn’t try to look for them.   Much of our data analysis focuses on decisions that we want to make.   We seek to identify the best outcomes or the least bad outcomes.   To focus on those decisions, we cull the list of options down to the most promising ones, the ones that support the decisions we want to make.

Because we invest all of our effort in looking for what we want to find, we are less likely to find what we don’t want to find.   That missed opportunity for discovering an uncomfortable truth may be far more important for the future than finding some method for short-term gain.

We recognize that any decision will have both good and bad consequences, so we seek a balance where the good outweighs the bad.    When we identify both good and bad consequences, we tend to err toward the middle ground.   We avoid exaggerating the good consequences, which would set expectations too high.   At the same time, we tend to avoid exaggerating the bad consequences.   Our focus on selecting the best decision biases our evaluation of the outcomes toward the middle of the range between good and bad.   Because of this bias, we do not allow ourselves to contemplate the extremes, especially the ones with disastrous consequences.

An alternative project for data analysis would be to seek out the conditions that can lead to very uncomfortable possibilities.   Because these are conditions we definitely want to avoid, our common practice in analysis is to avoid studying them at all.   I would argue that identifying these bad outcomes does not have to change our choice of the best decision in order to provide a benefit.   The benefit of investigating these conditions is that it can inform us of what to be alert for as we observe new data.   When bad events occur, I would prefer to hear that someone knew it could get this bad instead of hearing that the event surprised everyone by its very possibility.

This is another argument for a pessimistic approach to data analysis.   It is the same argument as defending the cynic on work teams.  The cynic consistently points out what can go wrong.  A cynic trained in effective argumentation can present compelling doom scenarios.   Although his intent may be to hamper progress toward a particular decision, his value is in making us aware of bad things to watch for as we proceed with our preferred solution.

Today, the modern workforce greatly discourages the cynic, to the point of advocating the banishment of any cynics.   We want work teams that function well together with a shared confidence in a success that will bring good fortune for all.  In many workplaces, the culture seeks to identify and isolate the cynic.   If we fail to reeducate the cynic to be more optimistic about the project, we do what we can to encourage him to leave.   We have ever less tolerance for pessimistic predictions about our goals.

The same occurs when we invest in data analysis.   For data analysis projects, we invest much more in supporting evidence for our preferred decisions.   For example, if our goal is a larger market share, we seek out data that relates to market participation.   In that example, we are not going to seek out information about social conditions that may be affected if we are successful.   In modern times, we replace the word innovation with the word disruption.   The very choice of the term disruption implies a breakdown of social norms, yet our pursuit of the innovation pays no attention to measuring the social norms we are disrupting.

Similarly, we invest our analysis resources in issues related to making our innovation successful.  Of all the predictive algorithms we can choose, we select the ones best suited to advance our goals for success.   With these benefit-focused algorithms, we analyze the data that has the most causal influence in the direction that leads to our goals.   The entire analysis project is biased toward the goal of success.   Previous successes sufficiently justify this bias.   Those successes clearly and beneficially changed some market or society as a whole.

We strive to exploit all available opportunities to promote our innovation.   Anything we can do to promote our success is fair game because our success deserves our undivided attention.   Once achieved, our success will demonstrate its intrinsic value.   Historical examples of success encourage us by showing that previously unexpected ideas have become widely appreciated by the overall population.   Those successes caused disruptions in others’ lives, but that always seems to work out one way or another.

The disruptions caused by our success are someone else’s problem.  We don’t pay attention to how those problems get solved or how well those resolutions will work.   We don’t measure the negative consequences of our disruptive success.   We don’t analyze the robustness of the solutions for these negative consequences.   Usually, we see no reason to focus on what may be going wrong as a consequence of our success.

The above discussion draws on examples from business decisions.  We allow that a business should focus on its own success.   However, the same problem also extends to public policy and government.

A major part of our policy making concerns regulating the successes of business models.   This alone focuses our attention on the elements most closely related to that success.   In this way, the bias toward looking at the beneficial consequences infects policy making as well.   Even though regulation’s goal is to mitigate the harmful effects of the success, the analysis focuses on the direct consequences of the success.   We seek to regulate successful businesses; unsuccessful companies will not last long and thus do not need regulation.  Implicit in our policy analysis is the dismissal of the inevitable disruptive consequences as being someone else’s problem.

For example, a new business model will result in a new population of unemployed people.  From the point of view of regulating that business, we accept this as a necessary consequence of progress.    The previously employed join the ranks of a group that shares only the quality of being unemployed.   A different set of policies governs the problems of the unemployed population.  These policies focus on undoing the unemployment condition by finding some kind of employment, where any type of employment will suffice.   The unemployment analysis focuses entirely on reducing the prevalence of unemployment.   An improved economy or a more relevantly trained workforce will solve the problem of unemployment.   It is not a priority for unemployment policies to measure the growing frustration of the population or to analyze the potential consequences of their dissatisfaction.

We find out about these consequences when it is too late.  In many of the recent calamities, there is a common thread of a breakdown in society.  The disasters are a consequence of a sudden breakdown of discipline in security forces, or of discipline in medical practice, when a serious challenge emerges.   This is what we did not see coming.   Before the serious challenge emerged, we saw what appeared to be a well-functioning economy of people doing their jobs, getting paid, and spending their earnings to enable economic growth.   We assumed that the benefit of a well-functioning economy was self-evident proof of the necessity of their continued participation in their jobs.   Instead, we learned too late that they were insufficiently prepared or motivated to perform their assigned duties at the level needed to meet the increased challenges of new epidemics or organized violence.

We were surprised by this breakdown in large part because we didn’t even try to look for its possibility.   Our focus is almost entirely on progress.  We were satisfied with incremental improvements in statistical measurements of wider education, more wealth, better access to clean water and sufficient food, and improved lifestyles.   Implicitly, we assume that progress inevitably grows, especially when that growth is under human control.   It is irrational for humans to deliberately tear down progress that benefits them personally.

Unfortunately, frustration is irrational.   Returning to the more benign example of the unemployed, I would think that an unemployed person whose past job is forever lost to history would be eager to return to any job that provides earnings and allows him to return to more active economic participation.   That participation will provide benefits such as restoring access to luxuries that he previously enjoyed.   The nature of the work should not matter as much as the fact that it delivers the paycheck that in turn can improve the economy.    Despite that, there is a problem: a sizable portion of the population is not even attempting to find employment.   They have dropped out of the labor market entirely.  We lack any satisfying explanation for why they remain outside the workforce, withdrawn from participating in the economy as much as they had previously.

Certainly, I have no understanding of what is behind the presently very low labor force participation rates compared with historical norms.   These are people who could be earning money somewhere but are not even trying, and we do not know why.   It seems irrational to us that someone can decline to pursue an income when so much of daily life depends on recurring expenses, even for non-essential luxuries (such as access to smartphones that seem indispensable for participating in modern life).

However, I point to this lack of understanding of low labor force participation as evidence that we also do not understand the motivation of those who are actually participating.   Many people are making incomes, and many are reasonably satisfied with the benefits of those incomes.   Much of our analysis for future business and government progress rests on an important common assumption: that people who are currently employed are in fact devoted to their jobs or careers because they are convinced of their value in sustaining a widespread economy that benefits them personally.   The assumption is that they will stay at their jobs when a severe challenge demands extra devotion to their duties.   The self-evident value of a well-functioning economy will convince people not to abandon their jobs, or neglect their duties on the job, when the situation suddenly becomes more challenging.

I think we need a persuasive cynic to point out that we should explicitly test this assumption.  We should seek measurements that support analysis of what can go wrong.   We should invest in analysis that explores the uncomfortable possibilities in that data.

In addition to the data analysis that focuses on beneficial progress, we need data analysis that focuses on what may be undermining that progress.   We can benefit from a war-gaming approach to analysis.  A war game invests in two or more competing teams, where one team seeks to meet some beneficial objective and the other teams attempt to defeat the first.  Additional teams may have different roles for frustrating the first team’s pursuit of its objective.  In war games, we reward the winning team no matter who wins.  If a competing team succeeds in defeating the first, we reward that team with recognition of its skills, and we penalize the losing team for its poor planning and execution.  A war game gives our preferred solution an opportunity to fail.

In such games, there are independent and well-funded efforts to gather data and analyze it to meet each team’s objective.  Of these multiple efforts, only one team has the beneficial objective.   The war game approach very much values the contributions of the competing analyses that thwart the progress of the preferred team.   The value we get from the antagonistic team is the discovery of what can go wrong, and how bad it can get.
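The division of labor between the teams can be sketched in a toy simulation.  In the following Python sketch, every name and number is invented for illustration: a “blue” team picks the plan that maximizes its average payoff over historical data, while a “red” team searches a wider set of stress scenarios, including conditions the historical record never exhibited, for the case that hurts that plan the most.

```python
import random

random.seed(42)

# Hypothetical scenario space: each scenario is a demand multiplier.
# Historical data covers only mild conditions; the red team is free
# to probe well outside that range.
historical_scenarios = [random.uniform(0.8, 1.2) for _ in range(100)]
stress_scenarios = [random.uniform(0.2, 1.2) for _ in range(100)]

def payoff(capacity, demand):
    # Toy payoff: revenue for met demand, a fixed cost per unit of
    # capacity, and a penalty for unmet demand.
    met = min(capacity, demand)
    return 10 * met - 3 * capacity - 8 * max(0.0, demand - capacity)

# Blue team: pick the capacity with the best *average* historical payoff.
candidates = [0.5, 0.8, 1.0, 1.2]
blue_choice = max(
    candidates,
    key=lambda c: sum(payoff(c, d) for d in historical_scenarios)
    / len(historical_scenarios),
)

# Red team: search the wider stress set for blue's worst case.
worst = min(stress_scenarios, key=lambda d: payoff(blue_choice, d))

print("blue capacity choice:", blue_choice)
print("red team's worst-case scenario:", round(worst, 2))
print("payoff in that scenario:", round(payoff(blue_choice, worst), 2))
```

The point of the sketch is that the blue team’s plan looks good on the data it was optimized against, while the red team, rewarded only for breaking the plan, surfaces a scenario where the same plan loses money.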

We should consider war game approaches for more decision making, certainly at the government policy level but perhaps also at the business level.

A benign business example may be the initial success of discovering a way to construct web content so as to get more favorable rankings from search engines.   The initial (not war-gamed) analysis focused entirely on techniques that influence ranking in search results.  The immediate success of that effort was short-lived because others began to use the same tricks and the search engine owners improved their algorithms to reduce the influence of some of them.   Search engine optimization remains a hot field because the rules are constantly changing with more competitors and more sophisticated ranking algorithms.   I suggest that a more thorough war-game approach to the initial analysis could have devised a strategy that made the initial success longer lasting.   I do not really think this would have been worth the investment for this particular market, but I hope it illustrates a potential benefit of up-front cynical analysis.

In true war games, it is much more difficult to be the adversary of the preferred team.   The preferred team is following a plan that is best supported by the available data to produce the best outcome.   In most war games, the strategy the preferred team follows has already fully exploited the available data.   This puts the competing antagonistic teams at a disadvantage if they rely only on that same data: chances are pretty good that the preferred strategy has already made allowances for it.   I suspect that in real war games, the challenging teams fail to defeat the preferred team unless they go beyond the available data and act on new information, or simply follow plans based on hunches or theories backed by no historical data.   When the competing team wins, it is probably because of human thinking that devised a new strategy rather than a strategy derived from historical data.

Even if we devote our efforts to cynical predictive analysis of data, we may fail because the data itself is biased toward the beneficial result.   In the earlier example of workforce participation, we have no real way to measure frustration, or alternatively devotion to jobs, among those with incomes doing jobs they currently find acceptable.   Observations of long tenures, low job turnover, and few openings suggest a happy and devoted workforce, but they also mask underlying frustration and lack of commitment for when the job becomes difficult.

Devoting resources to a cynical predictive analysis of historical data may fail to uncover the potential for disastrous consequences.  We will fail to find supporting evidence (as in the workforce example, failing to measure frustration and ambivalence), and our predictive analysis may identify bad scenarios that cannot pass rigorous statistical tests for confidence.  In a true war game, humans actively participate to come up with new ideas that are not supported by historical data.  Put into practice, these ideas may win.   Outside of that true war game scenario, however, we would dismiss the ideas precisely because there is insufficient data to back them up.

I grant that a perfectly reasonable excuse for being surprised by some disastrous result is that there was insufficient data to suggest its possibility.   The mirror of evidence-based decision making is lack-of-evidence-based excuse making.

Despite the challenges of exhaustively identifying possible bad outcomes,  I still think it is valuable to invest some portion of our data collection and analysis in seeking out worst case scenarios.   Even when we find results that lack sufficient evidence to change our decisions, the results can inform us of what to be aware of as we continue to watch the arrival of new data.
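A toy illustration of why this tail-seeking analysis pays: in the Python sketch below (the probabilities and payoffs are entirely invented), an outcome distribution mixes many ordinary periods of small gains with a rare, catastrophic breakdown.  An analysis tuned to the average sees a tolerable result; only a deliberate look at the worst observed outcomes reveals how bad it can get.

```python
import random
import statistics

random.seed(7)

# Hypothetical outcome model: most periods yield a small gain, but an
# assumed 1% chance of social breakdown yields a catastrophic loss.
def simulate_outcome():
    if random.random() < 0.01:
        return -100.0                 # rare breakdown: catastrophic loss
    return random.gauss(1.0, 0.5)     # ordinary period: small gain

outcomes = [simulate_outcome() for _ in range(10_000)]

mean_outcome = statistics.mean(outcomes)
worst_case = min(outcomes)

# The average hides the tail; the tail is where the calamity lives.
print(f"mean outcome: {mean_outcome:.2f}")
print(f"worst case:   {worst_case:.2f}")
```

Nothing in the summary statistic that guides the “preferred” decision hints at the −100 events; an analyst who never asks for the minimum, or for low percentiles, never sees them.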

Someday a dangerous infectious disease will arrive in our country.   When that happens, some people will fail to follow appropriate instructions to prevent its spread.   Some people may fail to perform their health care duties diligently or may decide to abandon their jobs entirely.   All of the great practices we have for managing a crisis will be for nothing if people do not cooperate, because of their lack of commitment to face the risk that accompanies that cooperation.   The result can be catastrophic.  But when it happens, we will have a legitimate excuse that we didn’t see it coming because the data never suggested this possibility.

We can’t find what we don’t want to find.

