Spark data: distracting data deliberately introduced to influence analysis

In earlier posts, I imagined a form of government that completely automates rule making based on trusted data and analytic algorithms. In such a government, there is no role for human deliberation, and this allows the maximum advantage of the most recent data even if it lacks rational justification from deliberation and debate. This speed of rule making will necessarily keep the duration of the rules short so that new data can inform new rules. Also, the frequency of new rules will require that only a few rules be in force at the same time. The rules in force are the ones that are most urgent. To keep the government purely data-driven (evidence-based) and free from human manipulative persuasion, I insisted that urgency itself must come from algorithms using data.

I am using the government by data as a thought experiment to explore the consequences of proposals for automating decision making using big data.   I object to the notion of eliminating an accountable decision maker on the smaller scale such as operating a business unit.   I also argued that human accountability is essential for democratic government in order to maintain super-majority consent of majority rule by having someone defend decisions that produced bad consequences.

But I realized that democracy itself is not a requirement. The same reasoning that obligates a decision maker to follow an algorithm's recommendation can also obligate the population as a whole to cooperate. This is a form of authoritarianism, but one that may be accepted if everyone trusts the fairness and reliability of the data and the algorithms. Adopting an agile approach of short-lived rules can help earn that trust because of the prospect of new rules that will take advantage of even more recent data. This agile approach led to my thinking that a pure government by data, a dedomenocracy, is feasible once we have sufficient technologies available to handle the necessary data volume and velocity.

I am assuming one of the necessary technologies is security to prevent any internal manipulation of the data and algorithms. Even with that technology, there remains an opportunity for manipulating the algorithms by managing the subjects observed by the data sources. For example, an algorithm to measure public sentiment of urgency can be influenced by staging flash mobs, organized over social media, that attract crowds for the spectacle rather than out of any firm conviction in the message of the event. The observation would be a large protest, and the signs would identify an issue, so that together they may support a conclusion that this is a matter of some urgency. When timed just right, this planned event could influence a decision that might otherwise not have happened based on actual sentiments about the issue.

Even with all of the data technologies that we can trust to make governing rules automatically, there is still a role for human debate and deliberation.  That role is in policing and auditing the data to be alert to data that does not belong in the data store for automated decision-making by algorithms.

This role of data oversight involves skills that are analogous to the classical rhetorical skills for debating policies, but they are directed at the data instead of at arguments. In particular, we need new skills to identify data fallacies in analogy to rhetorical fallacies.

A rhetoric for data is similar to my earlier discussions where I attempted to characterize data by its nature. Instead of treating all data equally or measuring it only by the technical attributes of its sources, I proposed identifying categories that describe different levels of capturing the current reality. I described these categories using a metaphor of light: for example, bright data is an excellent capture of current reality while dark data is only what we expect the current reality to be. This taxonomy of data-value can be a starting point for building a rhetoric for data.

To this taxonomy, I want to add a data fallacy that I call spark data. Spark data is like the brief flash of light that happens when there is a static electricity discharge. I chose the analogy because it involves darkness (a description for human biases) with only a brief temporary brightness. Spark data is a data fallacy because it is a relatively irrelevant event that we have no control over. We need to be careful to not let spark data distract the algorithms. If sparks occur at the right time, they can even lead to bad decisions. We need a way to recognize and isolate spark data.

In current times, much of the political gridlock we have in the federal government is caused by spark data. If this were a pure form of government by data, I’d imagine that the government would be issuing rules that ideally would deal with major issues such as our debt, underfunded entitlements, foreign policies, military spending, and bureaucratic reform. Instead, our current government devotes nearly all of its attention to distractions.

This focus on distractions has become a constant theme in politics over the past ten years. These distractions are sparks that capture everyone’s attention even though each spark is brief, relatively minor compared to other issues, and usually about a topic that has no practical solution in any case. Examples of sparks are scandals such as the Benghazi event, VA hospital abuses, or IRS scandals. Other spark examples include statistical claims such as income inequality, the 77 cents to the dollar disparity of women’s and men’s pay, a war on women, racist policing, a rape culture on campus, climate change, and same sex marriage. Each of these examples is backed with evidence, but the evidence does not by itself seem to justify giving these higher priority over something more urgent, such as the need for reforming entitlement programs with revised funding and benefits in order to get them on a more sustainable path.

There are deliberate attempts to elevate the priority of these distracting issues that have little relevance or little prospect of a solution. These attempts succeed in their intention of avoiding or postponing the discussion of the tough issues by substituting topics that capture everyone’s attention (like a spark) even though there is little or nothing we can do about them.

This deliberate attempt at distraction is a major reason for the current government’s inability to address the more urgent problems. There is an endless supply of distractions to keep us from dealing with the hard problems that have solutions, but painful ones. It is far more attractive to discuss the sparks. These discussions run out the clock so that we end up with continuing budget resolutions or pass the problem to whoever wins the next election.

When I thought of the term spark to describe this data, I was mis-remembering something I read earlier that actually goes by the name stray voltage. I like spark better because it fits the light theme of my data taxonomy. One article describes stray voltage:

This is the White House theory of “Stray Voltage.” It is the brainchild of former White House Senior Adviser David Plouffe, whose methods loom large long after his departure. The theory goes like this: Controversy sparks attention, attention provokes conversation, and conversation embeds previously unknown or marginalized ideas in the public consciousness…

As a theory, “stray voltage” exists in a kind of strategic void. It can’t be dismissed or embraced as workable because creating controversy for the sake of controversy is, well, achievable.

Spark data is a controversy created for its own sake: it directs all attention to itself so that none is left to spend on more difficult and pressing matters. The resulting conversation works like a filibuster speech that runs down the clock without accomplishing anything. When the time is up, we have to adjourn without getting anything done. It appears the primary strategy of the executive branch is to run out the clock for the current presidency to avoid damaging the president's legacy. Stray voltage will keep everyone distracted until that clock runs out about two years from now.

Stray voltage may be a useful description as a rhetorical tactic, but the sudden controversy and the reaction to it are data.   The exact same trickiness can work in manipulating the automated rule-making in a dedomenocracy.   An invented controversy and a vocal reaction to it can generate data to convince the algorithms to make a rule on this issue instead of making rules on some other topic.  This is not a flaw in design or security of the data processing infrastructure.  Spark data is real data and the statistical-based algorithms can determine that the controversy is urgent enough to make a new rule.  The new rule inspired by spark data will displace the opportunity for another rule in a dedomenocracy that must limit the number of rules active at any time.   The distracting data will translate into a distracting rule.
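The displacement described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `select_active_rules`, the issue names, and the urgency scores are all hypothetical, and the sketch assumes the algorithms have already reduced each issue to a single urgency score.

```python
import heapq

def select_active_rules(urgency, max_active):
    """Return the max_active issues with the highest urgency scores.

    urgency maps an issue name to a score; both are hypothetical here.
    """
    return heapq.nlargest(max_active, urgency, key=urgency.get)

# Illustrative scores only, not real data.
baseline = {"debt": 0.90, "entitlements": 0.85, "security": 0.80}

# The same scores after a staged controversy generates a burst of data.
with_spark = dict(baseline, staged_controversy=0.95)

print(select_active_rules(baseline, 3))
print(select_active_rules(with_spark, 3))
```

With room for only three active rules, the staged controversy takes a slot and the lowest-scoring genuine issue loses its rule, even though nothing about that issue changed.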

To defend the dedomenocracy from this kind of manipulative distraction, we need a way to identify and intercept spark data before it becomes part of the data store for making decisions.

A key part of my concept of a dedomenocracy is that the population is actively involved in the data science roles of overseeing the data and the algorithms. As in modern usage of the term, data science includes implementation issues such as identifying and acquiring new data sources, identifying data models to incorporate separate data sources, and scrutinizing the performance of competing analytic algorithms. However, I envision a deeper role for data science: to distinguish the different natures of various kinds of data and to insist that these natures are used appropriately.

The data science in dedomenocracy will include the identification and flagging of spark data that is a distraction that should not receive much attention.
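One way such flagging might start is with a simple screen for attention that is tall but short-lived. The sketch below is an assumption for illustration, not a method proposed in the earlier posts: the names `is_spark` and `quarantine`, the thresholds `spike_ratio` and `max_spike_days`, and the daily attention counts are all hypothetical. It only sets candidates aside for human review; judging relevance remains the job of the people overseeing the data.

```python
from statistics import median

def is_spark(daily_attention, spike_ratio=5.0, max_spike_days=3):
    """Flag a series as a possible spark: a tall but short-lived burst.

    spike_ratio and max_spike_days are hypothetical thresholds.
    """
    base = median(daily_attention)
    if base <= 0:
        base = 1.0  # avoid a zero threshold for otherwise silent issues
    threshold = spike_ratio * base
    spike_days = sum(1 for value in daily_attention if value > threshold)
    return 0 < spike_days <= max_spike_days

def quarantine(series_by_issue):
    """Split issues into (kept, flagged); flagged ones await human review."""
    kept, flagged = {}, {}
    for issue, series in series_by_issue.items():
        (flagged if is_spark(series) else kept)[issue] = series
    return kept, flagged

# Illustrative daily attention counts, not real data.
observations = {
    "entitlements": [40, 42, 45, 43, 47, 50, 52],   # steady concern
    "staged_controversy": [2, 3, 2, 90, 4, 2, 3],   # one-day burst
}
kept, flagged = quarantine(observations)
print(sorted(flagged))
```

A screen like this will misjudge genuine emergencies that also arrive suddenly, which is one reason the flagged data should go to human oversight rather than be discarded automatically.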

I visualize spark data as an analogy to the experience of static electric shock such as in the following scenario. During a cold winter day, a person is studying at a desk. As the sky darkens, this person gets up to walk over to the light switch for more light. When the finger is a short distance from touching the switch, he gets a static shock resulting from walking across the carpet. The shock includes a flash of light, an audible snap, and a noticeable sensation on the fingertip. His immediate reaction is to pause briefly in surprise. Before proceeding to turn on the light, he may attempt to see if it happens a second time, or shuffle his feet over the carpet a few times to see if that will repeat the discharge. Eventually, he tires of this and then proceeds over to the kitchen to check out something to snack on. In this imagined scenario, the discharge completely distracted him from both his immediate task (turning on the light) and the motivation for that task (to continue studying).

This is analogous to what I see happening in the modern exercise of politics when politicians introduce topics such as outrageous data on gender pay gaps, rape incidents on campus, recent extreme weather events (as evidence of global climate change), or conjectured bureaucratic conspiracies such as Benghazi and IRS political targeting. While each of these issues offers some evidence for its claims, they are not the most urgent issues facing the country. Like the static discharge, these events may have happened, but they are not repeatable: there is nothing we can do to undo the past and no practical way to address the problem. The only possible consequence is extracting some punishment that will have no effect on the future at all. Politically, the only practical consequence is to waste time in debate over the claims until it is time to adjourn. As in the static-discharge scenario, politics abandons the hard work and seeks out a refreshing break after the irrelevant investigation of the spark.

The country has serious problems with federal budget deficits, unfunded entitlements especially related to health care expenses, over-regulation from the accumulation of decades of regulations where much of it comes from agencies that have outlived their original purpose, and serious national security challenges. Even something as recently passed as the Affordable Care Act needs a dramatic change to get the law to conform to the practical realities that have come up since passage.

Unlike the sparks, all of these issues have practical solutions that can provide immediate or near-term benefits. These issues do, however, require a lot of work with debate and compromise to get agreement on any of a number of solutions that can help. Because this work is hard and painful, politics eagerly welcomes every spark (or stray voltage) of new irrelevant and unsolvable controversies.

In a dedomenocracy, the population needs data science skills to recognize spark data and quarantine it from the more relevant data used for automated rule making on the most urgent issues. Sparks should not define urgency and should not influence the rules that address the truly urgent issues of emergency or opportunity. In my discussion of dedomenocracy, I emphasized the need for toleration within the population to accept that many problems (especially ideological ones) do not need immediate attention and in fact may never be addressed. Due to the rapidity of making short-lived rules, the rules must be restricted to the most urgent issues. Dedomenocracy cannot afford to be distracted by spark data.

Democracy also cannot afford to be distracted by spark data (stray voltage) for the same reason. The urgent issues need solutions that require hard and painful choices. Unfortunately, the modern practice of democracy demands obedience to daily public opinion polls that are easily manipulated by stray voltage or spark data. Instead of governing by the people, modern democracy wastes time on arguments over sparks.

To benefit from short-term opportunities, whether to gain an advantage or to avoid a risk, a dedomenocracy must act quickly on the most important issues with the largest immediate consequences. A dedomenocracy cannot afford the distractions of spark data.

The following is an alternative static discharge story that serves as an analogy for an appropriate response to a shock. In this story there is a house party to watch a televised game, but the guests decide spontaneously to move the furniture to make room for a board game to play on the floor while watching television. The guests ask if the screen can be adjusted to tilt more downward so they can have a better view. The host reaches up to adjust the screen and experiences a static shock. Almost certainly the guests will not even notice the discharge. Even if the host exclaims at its occurrence, it will not distract the party. The host himself will very likely quickly dismiss this inconvenience, proceed to adjust the screen, and then rejoin the party. The spark did happen and it was something that should not have happened. But it does not interrupt the progress of the priority, in this case a happy one.

This is how government (especially a dedomenocracy) should treat spark data. There are innumerable static discharges that can occur. People will make embarrassing mistakes. People belong to cultures with deeply rooted traditions that can offend others. There are not enough resources or time to solve every single problem that annoys people. Meanwhile, there are major issues that government can solve (although with painful choices) and these solutions can provide substantial near-term consequences either in terms of reaping benefits or avoiding hazards. People should allocate the government’s resources to these issues that can be solved and can deliver desired results.

In a dedomenocracy, this objective requires diligence to keep spark data from contaminating the store that provides data to the automated analytics for rule making.

