Media’s Ferguson Fable, a morality tale of dark data

On November 24, the St. Louis County prosecuting attorney announced that the grand jury would not indict Darren Wilson on any charges concerning the death of Michael Brown.   After that announcement, stories of rioting, looting, gunfire, and arson dominated the news.    Virtually every news organization and popular opinion writer (blogger) has presented virtually every angle on this story.   I have nothing of significance to add about the actual events that occurred.  Look elsewhere.

I offer some observations as someone (a nobody) who is almost completely ignorant about this case and is trying to learn what is going on from the news and other reports.   In particular, these observations come from a dedomenological perspective.   I am very disappointed in the quality of data that modern journalism has made available to me.

In earlier posts, I described different qualities of data, ranging from bright data (well documented and controlled information) to dim data (open to debate) to model-generated data, either in the form of dark data that substitutes for ignorance or forbidden data that keeps us ignorant.

The brightest data in this scenario concerns the emotions involved.   The protests clearly demonstrate that many people are upset about the circumstances of the original shooting and the later no-bill finding of the grand jury.

Much of this outrage comes from the narrative that Michael Brown was shot after showing a gesture of surrender: shot with his hands up.   The controversy depends on whether that actually happened.   The officer claimed he fired in self-defense during a series of escalating events starting with his being hit in the face while seated in his vehicle.   The grand jury considered the evidence and found that the officer’s account best matched it.    The counterclaim fueling continued outrage was that the prosecutor manipulated the grand jury to get this result.  As I mentioned, the actual details are discussed extensively elsewhere and will continue to be discussed for years to come.  I have nothing to add about this.

My only observation is that people on various sides are upset.  From a data science perspective, the information about people being upset populates a particular dimension of data.   We have a lot of data about people being upset.  This is bright data because there is plenty of documentation (video, audio, written accounts, opinions, etc.) leaving no doubt that people are upset.

A second form of bright data is the evidence that was presented to the grand jury.   I have not studied the actual released material, but it appears to be extensive.  Although there may be other details that have not been released, I’m confident that what has been released was actually presented to the grand jury.  This released material qualifies as bright data of what the grand jury had available for their consideration.

To clarify my use of terms, an example of presently bright data is the package of information that was available to the grand jury.  The actual information within that package consists of various types of data: the actual evidence of what occurred on August 9 that led to the fatal shooting.   That evidence is not as bright.  It consists of photographs, autopsies, crime-scene investigation reports, and various testimonies.   Although there is a fair quantity of evidence, the individual pieces are subject to some debate.   This is what I call dim data: observations that are ambiguous as to what they can tell us about what actually happened.

The grand jury considered all of the available evidence and concluded that it fit together in a way that best corroborated the officer’s telling of the events.   The third piece of bright data we have is that the grand jury decided not to issue an indictment.   This choice not to indict is the fact that was released in the press conference.  Their findings are a matter of public record.

During the prosecuting attorney’s press conference, he presented the reasoning for the grand jury’s conclusion.  The new data dimension here is the reasoning.   At best this dimension is dim data.   I don’t think we can ever know for certain the actual thinking of each of the jurors.   In addition to observing the evidence, their thinking may have been influenced by their peers and by the personalities testifying in front of them.   Although the reasoning is a form of very dim data, it is the natural product of a grand jury system we have used for hundreds of years.   That long history gives us confidence in a process that allows a jury of peers to review the evidence, and the grand jury procedure deserves trust that it will come to a fair decision about whether the evidence supports a trial.   We accept the grand jury result despite the fact that we ourselves don’t understand how they came to their conclusions.  We trust that they are peers and that they have come to a reasonable conclusion.

Unlike trial juries, the grand jury is not tasked to find guilt beyond a reasonable doubt.  The grand jury merely decides whether a person should be accused (indicted) of a crime that should go to a jury trial to determine guilt beyond a reasonable doubt.   My understanding is that the grand jury’s task is to find that there is sufficient evidence for a trial.

One of the criticisms of this particular decision is that most cases presented to a grand jury do result in indictments.  The grand jury does not determine guilt, and as a result has a lower bar to meet.  They are only determining whether a trial is justified by the evidence.  As a result, most of the time the grand jury will find some crime that is worth prosecuting in a trial.

This case was unusual in that the grand jury did not find that it justified a prosecution.   I repeat that I am a layperson merely trying to learn what is going on.   The data about the relative likelihood of grand juries finding in favor of indictment appears to be bright data because it involves a query of all cases presented to grand juries and the proportion of those cases that result in indictments.   Few grand jury deliberations end with no indictment, and this is one of those few.   This information is bright data.   However, the implication that this rare result involved unfair manipulation of the jury is not even dim data.   Such speculative data (the assertion that the grand jury was manipulated to give a certain outcome) falls into a category I call dark data, or model-generated data.
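The brightness of this particular data comes from it being a simple proportion over a well-defined query. A minimal sketch of that computation follows; the counts used here are hypothetical placeholders for illustration, not actual grand jury statistics.

```python
def no_bill_rate(cases_presented: int, indictments: int) -> float:
    """Fraction of grand jury presentations that ended without an indictment."""
    if cases_presented <= 0:
        raise ValueError("cases_presented must be positive")
    return (cases_presented - indictments) / cases_presented

# Hypothetical counts: if nearly every presentation ends in an
# indictment, any no-bill outcome is rare by construction.
rate = no_bill_rate(cases_presented=100_000, indictments=99_990)
print(f"no-bill rate: {rate:.4%}")
```

The computation itself is trivial; the point is that its inputs are bright data, while any explanation of *why* a particular case landed in the rare no-bill bucket is not.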

Model-generated data substitutes for missing data.   In general, models, or our beliefs about what is going on, provide default data to take the place of any firm evidence.   In this case, there is no consensus about the integrity of the grand jury system.   Some believe strongly that the government is so corrupted that the prosecution conspired with a corrupt police department, with sufficient skill to get a grand jury of peers to make their preferred decision, contrary to what a less manipulated jury would have decided.   Others believe that the government is not corrupt enough to create such a biased and unfair outcome.

The biggest question in this case is whether this one small self-governing suburban community in St. Louis County is seriously corrupted, in particular in terms of fairly governing different racial groups.   Because this question is unanswered, the competing theories of grand jury reliability do not present a reliable source of data.   The claim that the grand jury may have been compromised depends on whether the government is corrupt enough to make that compromise possible.  This is begging the question.  I don’t know the level of corruption in the local government, and thus I can’t know whether that corruption could have compromised the grand jury process (and its reasoning).

As an ignorant observer of distant events, this is the question I most need answered.   Is the local government in Ferguson so corrupt that we cannot trust the grand jury result?   If so, then the grand jury result is worthless.

What I lack is data about the level of corruption in the local government: those who make the local laws, those who police the streets, and those who operate the justice system.   This case fundamentally hinges on whether the government is corrupt to the point of dysfunction.   What we really need to learn is whether this is a corrupt government prone to racism-inspired abuses.   From my distant perspective, I see only the singular data point of this one case, and the interpretation of that case depends on whether the local government is corrupt or not.

Here is where I am profoundly dissatisfied with the information available to me.   Despite the fact that the event occurred nearly 4 months ago, I have seen only two dimensions of bright data:

  • The evidence of the shooting that was made available to the grand jury, and
  • The fact that some people are very upset about this

I have not yet seen bright data to satisfy my most pressing question of whether this local government is seriously corrupt in its racially inspired abuses of its community.

Despite the fact that everything hinges on this one question, there has been no real attempt to collect data to answer it.   This is an example of my suggestion for the future of journalism: to become more active in broadening the scope of new data, as I described in an earlier post.   One way to summarize that post is to observe that modern journalism is becoming irrelevant as we make more widespread use of big data and analytics for evidence-based decision making.   The earlier role of journalists, preparing individual reports to influence opinions, is becoming obsolete because we are steadily removing human decision makers from the decision process.   Modern decisions increasingly rely on massive amounts of data instead of persuasive generalizations (opinions) drawn from small samples.  To improve these decisions, we need to move away from influencing human decisions and toward broadening the scope of data available.

The journalism in the Ferguson case appears to me to be very anachronistic.  Compounding this impression is the anachronistic claim that a modern-day St. Louis County exhibits a degree of racist corruption rivaling the darkest period around the time of the Civil War.   The notion that a police officer would execute a surrendering person fits a time a century and a half ago better than it fits today.

Similarly, journalism’s approach to covering this outrage is out of date, but for a different reason.  Journalists are covering a story that would be more believable in the distant past, and they are covering it with the goal of publishing the kind of influential reports that were widely applauded a half century ago.  This entire journalism project is trapped in the past.

The announcement of the grand jury finding was covered by every major news agency, each with its own reporters on the ground covering the exact same news conference.   We are blessed with many independent reports of the same event: the reading of a statement about the grand jury findings that we could observe for ourselves.

Following the announcement, the protests began.  Although there were many instances of violence, vandalism, arson, and looting over a wide area, there were a finite number of such events.   Each was extensively covered by reporters and photographers who spent as much time competing with each other for better views as avoiding the violence of the rioters or the responses by police.   For each burning building, burning car, or looted business, we have numerous reports and images from various angles.   This coverage was redundant with the abundant independent bloggers and amateur live-streaming happening at the same time.

I applaud the extensive coverage of these events.  We have very bright data about the protests; very little is left to the imagination about what happened that night.  But this is old journalism: capturing the spectacle of a violent episode.  This is the ancient form of journalism intent on selling newspapers (or web-site page views), the form that uses these isolated reports to support opinion pieces we hope will convince human decision makers in leadership roles.  The coverage since August did in fact provoke a statement from the president of the US.

Today, however, decisions are increasingly based on data rather than on human decision makers, as I have discussed in earlier posts such as here and here.  Having 20 different video angles of the same burning car or building, each lingering at great length on the progress of the flames, adds no information to the simple data point that this was yet another instance of arson.

Old journalism aims to attract readers or viewers, and such images attract large audiences.   But these images are cheap today.  Many images, perhaps many of the best, came from unpaid amateurs.   Collecting one’s own copyrighted images of such an event has very little monetary value now.  Anyone in the crowd can take a picture, and anything that burns makes a great subject for a photo or video that will attract attention.

From my perspective as a citizen wanting to know whether there is a corrupt, racially divisive government in this modern urban community in a sophisticated city in the center of the country, this redundancy of images and reporting is irrelevant.  A single still image of each burning building or vehicle establishes that a certain quantity of such actions occurred and that the perpetrators were sufficiently motivated to commit arson.   The data points I received were that there were a good number of separate instances, that they appeared to include both public and private property, and that many of the events appeared to be mostly indiscriminate, based on opportunity.   Some appear to have been motivated by opportunities other than protest, such as choosing looting targets with more attractive property to possess.

The few dozen or hundred data points add to the already bright dimension of data telling us that people are very upset about something.   I learned this from the riots four months ago; now I know that they are still upset, or perhaps even more upset.   But this new data does little to advance the project of figuring out whether the local government is so irredeemably corrupt that the town must be destroyed.

This is where we need a new concept of journalism that gets into the business of mining for new data instead of mining into data.  Multiple camera views of the same acts of vandalism produce redundant data.  What I need is more dimensions of data.

In particular, I need data about what all of the other residents of that town are thinking.   There were large crowds in the street, but a large portion of those crowds consisted of journalists, agitators, and protesters from outside of town.   These outsiders learned of any corruption from recent news and social-media gossip, and their presence reinforces the hypothesis that the town suffers from unacceptable racist corruption.

This theory of racist corruption is a form of dark data, but participation based on that dark data produces a feedback loop that reinforces the theory that there must be corruption.   The feedback ends up confirming the model.   For example, the supposition that the local police are excessive in their use of force motivates a larger riot, which causes property damage, which causes the police to react more forcefully.  Old-school reporting of just what is occurring where the reporter can witness it amplifies this effect by showing an immediate example of force that seems out of proportion to local conditions.   These reports fail to inform us of the wider challenges the police were facing at the same time.   In the end, new observations appear to confirm suspicions, but that data would not have come to be if the model weren’t assumed in the first place.

I suspect many of the later observations of riots and angry protests were responses to observations of police and government reactions to earlier protests.   I don’t find this data very useful for answering the most important question: whether the government and policing were racially abusive on a scale more appropriate to conditions 150 years ago than today.   Police response to rioters is not a fair observation of routine policing prior to the events of early August.   Prior to the shooting of Michael Brown, how much confidence did the local population have in the ability of the local police and government to serve their needs in a racially equitable manner?

I do not know the answer.   Disappointingly, I have seen little additional information over the last four months that comes closer to one.   The subsequent news has all been contaminated by assumptions about the corruption or fairness of the government.   Outsiders have come in on the assumption that there is corruption, and the new information is entirely about how the local government and police are responding to this escalation based on rumors.

There has been plenty of opportunity for journalists to collect relevant information.   All that is needed is to change the focus from mining into data (reporting from the street) to mining for new data (seeking out new information off the street).

A measure of the corruption of a local government is how freely people give consent to being governed.   I described this in earlier posts.  A legitimate democratic-style government needs a super-majority to consent to be governed by a simple majority.   I define a super-majority as enough people to provide sufficient support to the government to survive a protest by a smaller minority.

The problem with the super-majority is that it is often silent.   Over the past few years, I have paid attention to reports of many large-scale protests in various countries.   These protests involve large crowds filling streets and persisting every day for weeks.   Initially, my attention is on the crowds and I feel hopeful that the population will succeed in getting what they want.   Sometimes they do succeed, and I feel great for their success.   But the celebratory mood does not last long before it becomes clear that the winning protesters are unable to set up a legitimate government.  Despite the seemingly endless crowds, their numbers still represented a minority of the population.   Most people did not participate in the protests and either disagreed with the protesters’ goals or were uninterested in helping them.   When the former protesters sought to build a government, they were not able to create a stable one.  I interpret this failure as a failure to build super-majority consent for the new government.

The repeated lesson from these successive examples is that journalism invested far too much in covering the appealing story of the actively protesting minority.  This extensive coverage often gave the impression that the entire population was against a very small number of leaders, giving us confidence to support the crowds and to cheer when they deposed the few.  The disappointing aftermath teaches us that we grossly underestimated the number who supported the old regime, or the number of adversaries that the old regime was able to contain.

Journalism failed to inform us about the people who did not show up to protests, or what their opinions were about consenting to be governed by the protesters.    The protesters’ success in producing political change resulted in worse conditions than existed before the protests.   This is not to say that the protesters’ goals were wrong.   In most cases, the protesters’ goals were very admirable, even in hindsight.   The problem is that the protesters never had a chance at earning the respect of a super-majority.   Their goals were unrealistic.

Now we see a protest movement in our country centered on events in Ferguson.  This seems to be a repeat of earlier protests in that we get extensive coverage of the protesters’ goals and grievances.   The protesters’ complaints engender our sympathies and support.  But the lesson of earlier protests is that the more important question is who is not out protesting.

In Ferguson in particular, the protests have distorted the statistics.   Many of those in the streets protesting are from out of town.   Others, belonging to racial groups who are not experiencing corrupt government, are protesting in solidarity with those who claim they have been.   The people on the street, whether protesting peacefully or violently, do not tell the full story of how the super-majority of the population feels about its government.

Imagine if Ferguson were a sovereign country.    Would the protesters be able to form a better government than the one their protesting is attempting to overthrow?

To answer this question we need a different dimension of data.   We need to measure the sentiments of the people who are not participating in the protests.   Even those who are participating out of solidarity with the aggrieved parties may have no personal complaint about the government.

The current government may be working satisfactorily.  Any working government is bound to cause events that upset some people.   It is unrealistic to expect a legitimate government to never cause harm.   A legitimate government is one that manages to maintain super-majority support through effective accountability by its leaders: either by persuading the dissenters that their complaints are addressed, or by persuading the rest that the dissenters are unrealistic.

Officials in Ferguson and St. Louis County attempted to demonstrate that accountability by conducting an unusually thorough review of the evidence in front of a grand jury.   The result of that demonstration apparently did not satisfy the portion of the population who chose to protest violently.   But the demonstration could still succeed if it persuades the unseen super-majority that the minority protesting violently is being unreasonable.

I see very little data about this non-protesting population.  I see no real attempt by journalists to even seek out this data.  But this data is critical to this controversy.

In my above-mentioned post about the future of journalism, I described that future as taking on the role of new data sensors: obtaining broader data about the silent, non-eventful population to complement the coverage of remarkable events.   The latter get more attention, but they are increasingly cheap since everyone in the crowd is equipped to be a competing journalist and most of them work for free.   Although we do need the data about the remarkable events, we don’t have to work hard or pay much to get it.   Start a fire someplace and dozens of cameras will catch the flames.

The hard data is what is happening far from these scenes.   During the protests, there were people who stayed home or arranged their days so they would avoid the protests.   Some people may have been in the vicinity of the protests without agreeing with the protesters; they may think the more vocal protesters are exaggerating the problems.   These non-protesting persons’ opinions are what reporters should be seeking.  They can provide the data that tells us whether there really was a corrupt, racially biased government.

From the very start of the protests, there were elements of violence.   In addition to destruction and looting, there was vandalism in the form of painted warnings that people would get hurt if they complained.   There was abundant evidence of intimidation within the community, intimidation meant to keep objecting voices silent.   This element of intimidation makes the broader opinion even more difficult to obtain, especially if those opinions were supportive of the government.

There are many journalistic approaches to getting at this data.   For example, we might measure a community’s trust and faith in its police by how frequently residents call the police for assistance and how serious conditions have to become before the police are called.   If the community generally shows little hesitation in calling the local police, that is at least evidence that they think the police will act professionally and in their interests.   Similarly, records of how long people stay in the community even when their jobs are outside of town can provide some evidence that they find the local conditions tolerable or even desirable over other locations.
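As a sketch of what mining for this new data might look like, the two measures suggested above (how often residents call the police, and how serious an incident must be before they call) could be computed from call-for-service records roughly as follows. The record layout, field names, and severity scale here are assumptions for illustration, not a description of any real dataset.

```python
from collections import defaultdict

def calls_per_capita(call_records, populations):
    """Rate of police calls per resident, by neighborhood.

    call_records: iterable of (neighborhood, severity) tuples
    populations: dict mapping neighborhood -> resident count
    A higher rate hints at willingness to involve the police at all.
    """
    counts = defaultdict(int)
    for neighborhood, _severity in call_records:
        counts[neighborhood] += 1
    return {n: counts[n] / populations[n] for n in populations}

def typical_severity_when_called(call_records):
    """Median severity of incidents that prompted a call, by neighborhood.

    A high median suggests residents hesitate until matters become
    serious, which may indicate low trust in the police.
    """
    by_neighborhood = defaultdict(list)
    for neighborhood, severity in call_records:
        by_neighborhood[neighborhood].append(severity)
    result = {}
    for n, severities in by_neighborhood.items():
        severities.sort()
        result[n] = severities[len(severities) // 2]  # upper median
    return result

# Hypothetical records: (neighborhood, severity on an assumed 1-10 scale)
records = [("A", 1), ("A", 2), ("B", 5)]
populations = {"A": 100, "B": 10}
print(calls_per_capita(records, populations))
print(typical_severity_when_called(records))
```

These are crude proxies, not conclusions: call rates confound trust with crime rates, and any real analysis would need normalization and far more context. The point is only that such dimensions of data are obtainable by deliberate effort rather than by pointing another camera at the same fire.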

The project of data-driven analysis and decision making needs journalistic skills to obtain data that is not going to come from voluntary amateur journalists.   In today’s journalism economy, if there is a big scene going on, we can be sure the scene will be documented by the participants with their cell-phone cameras and recorders.   The real value is in obtaining the information from the opposite side of the scene, where people are avoiding it or have reason to fear speaking out against it.  Journalists have the skills to obtain these hidden stories.

In Ferguson, this effort to measure what everyone else is thinking (where they can express their true thoughts safely) would have provided us the data to confirm or refute the allegations that the local government is racially abusive, is no longer deemed legitimate in the community, or is no longer enjoying the community’s super-majority consent to be governed.

Instead, the journalists competed with the crowd journalists, replicating the reporting of amateurs but giving the reports added authority by the fact that they are employed by reputable news agencies.   Professional journalists added no new value beyond a respected news agency independently confirming what everyone had already seen.

I started this post as another complaint about dark data, where dark data is the substitution of assumptions for missing observations.   In Ferguson, we were fundamentally ignorant of the basic fact of whether the local government was run by out-of-control racists who indiscriminately brutalize and murder the black community.   This image of rampant racism became the dark data that journalists used to build their stories.   The dark data was that racism is real and worse than we previously imagined possible in 2014 in a modern city.

The journalists (especially young white journalists) assumed that the entire black community unanimously agreed with the complaints expressed by those who showed up on the streets.   They assumed that because the complaints came from blacks, every black person with a home in the community was either in the streets or lending moral support from home.   This dark-data assumption convinced these journalists that covering the street protests exclusively was sufficient to measure the sentiment of the entire community.

The huge deployment of professionally trained journalists was largely wasted on redundantly covering the protest scene already covered by amateurs, when they could have pursued the larger story of obtaining the opinions of all those who were not participating.  Perhaps this unseen population objected to the new conditions that prevented their government from providing its usually effective services of protecting property and putting out fires.  Perhaps the silent super-majority has always given consent to a local government that provides services when needed: such as when their businesses are being robbed or their property is on fire.   We didn’t learn this because everyone was busy independently confirming that the protesters on the street were very upset.

