I am not an avid watcher of TV news, but occasionally I do catch a report that shows a captivating visualization of a zoom that starts from a rotating globe and ends at street level before cutting to a reporter on the scene. I can see the value of giving the entire audience good context about what part of the world, country, and city is being reported. I do something similar myself when I see an article on the web, copying any location information into a map site to find the location. I did that the other day when there was a breaking news report of a house fire about a mile from my home: the article mentioned just the block number, but I was able to use the street view feature to get an idea of what type of housing might be involved (newer wood-frame or older brick-and-block).
By now online mapping, with zooming and with aerial and street views of any location, is readily available. For the most part the information in the maps is accurate and reasonably current. Today this is an invaluable resource for understanding at least the geographic context of a breaking news story.
Complementary to the online maps are the street traffic overlays that show traffic congestion in near-real time. For news about a road being closed for any reason, the city map of all roads and congestion points can put the closure in context by showing that many other roads are congested as well: where traffic is moving, it is often moving slowly. I recall this being useful a few months ago when a surprise snow squall during the morning commute quickly made all the streets slick and stopped traffic throughout the city (I mentioned the event here). During that event, I accessed another modern resource: real-time weather maps showing where precipitation was occurring, how heavy it was, and where it was heading. This is so much more valuable than what we once had, a vague statement that some people were seeing snow and that the snow was heading in a particular compass direction.
These maps provide vivid and relevant context for a specific event or breaking news item. They are invaluable for understanding not only the specific event but also the surrounding context that informs us how concerned we should be, for example, whether the problem will get worse or affect more people. In the recent earthquake in Nepal, one landslide risk map showed risk throughout the entire country, helping us understand how widespread the effects of this earthquake were.
These examples set a new standard for rapid access to context information that accompanies the new information in breaking news. In the case of street maps and aerial and street views, this information required extensive investment long before the event occurred. In the case of the more recent information (street congestion, weather radar imagery, landslide risk assessments), there was a need for prior investment in models and technologies to provide this information on a timely basis. These investments were made on a global scale, where the vast majority of this readily available capability may never be needed to accompany a breaking news story. But when a breaking news story does occur, we welcome ready access to information about the broader context of the story. For example, in the Nepal earthquake, our ability to see how widely the shaking occurred allows us to recognize that the photographs we do see may be representative of the kind of damage suffered in less densely populated areas.
We need more investment in more dimensions of this context information, so that it is immediately available to help us comprehend breaking news events. A good illustration of a lack of context information comes from the past week's news of protests and riots in Baltimore, Maryland. I live in the Washington DC area, not far from Baltimore, but I have no real experience of being in Baltimore, so my perspective is probably similar to that of most of the nation trying to understand what is happening.
For the first few days, we received a lot of on-the-scene reporting of the protests and riots visible on the streets. We had a good understanding of the geographic locations within the city where these were occurring. Yet I felt that so much was lacking from the news reporting. I was frustrated that the news media did not seem to recognize that this missing information would be of interest to the viewer.
There was plenty of (often redundant) reporting of the opinions and emotions of people on the street, but I wanted to know more about the opinions of the residents and business people of the same areas who were not on the street. The exclusive reporting of sentiment on the street carried a strong implication (and sometimes an explicit claim) that there was consensus that the situation demanded strong protests and violence. Perhaps that is true, but where were the maps showing residential population density to put those on the streets in context? If not everyone was on the street, then what were the opinions of those who were not? For example, how many of them shared this view?
There was plenty of coverage of the bystander video of Freddie Gray's arrest and his placement into the van. What was missing was complementary video or background information about the standard training for making similar arrests, as well as videos showing how common or rare it is for an arrested person to behave as Freddie did. A strong message of the reporting was that something was obviously wrong or unusual and that the police ignored it. Without readily available historical data on similar arrests, I either have to come to my own conclusions (with no personal experience of watching any arrest) or accept the authority of some reported assurance that this was unusual. This missing information is analogous to the traffic congestion maps during a snowstorm, which tell us whether one event is really unusual or actually quite common.
The broader justification for the protests and riots was that they were a response to a long history of abusive policing within the city. If this is true, then there should be an extensive data trail that could immediately be made available to document it. Analogous to the street-view database of online maps, we should expect immediate details of all police encounters, both abusive and non-abusive. In contrast, I didn't see any such information except for live interviews with people on the street, politicians, or researchers giving their opinions. It was only by accident that I stumbled upon this interview, which gives more in-depth details of the extent and history of the abuses. This background information is important for understanding what led up to the current events, but it is a qualitative narrative from one source that may not be representative. It turns out that Baltimore does offer open data access to its police statistics, but that data is very incomplete and vague. In terms of the map analogy, this government data is like the old 1-kilometer-resolution satellite images, when what we need is street-view photos with resolution fine enough that personally identifying information must be blurred out. Detailed data of that kind was not available as the protests broke out, and I am sure we suffered as a result.
Another narrative about the protests was that the population had lost trust in the broader justice system and the city government, despite both being the products of fair elections dominated by a single political party. The majority of both the government and the population belong to the same political party, yet we are to believe that the population cannot trust the government, as if they were living under some tyrannical ruler. This trust, or consent to be governed, is essential for the patience needed to allow the tedious investigative and judicial processes to come to a just conclusion. The narrative was that the population as a whole no longer granted that consent to be patient. To me, the missing data is whether a super-majority still consents to the government (including the justice process). As I've already hinted, this is a city with peaceful and successful elections and without a significant opposing political party. My presumption is that the city enjoys wide support from its citizens to wait patiently for the usual judicial process to work. Again, the data I have is like the 1-km-resolution satellite imagery, in this case in the form of results from periodic elections. While the reporting did a good job of capturing the fact that the people on the streets were no longer patient, we lacked information about everyone else. In particular, is there a super-majority that continues to support the government and allow it to operate normally? I define super-majority consent as far more than the simple majority needed to win elections, large enough that the super-majority can easily outnumber large protests. Recent events in foreign countries emphasize the importance of distinguishing super-majority consent from the visible protesters who, despite filling the streets, do not represent a sufficient super-majority to take over rule.
I may be especially frustrated because I closely follow the enthusiastic promotion of data science and big data technologies. These promotions promise a far better world as a result of big data. Some of this promise is illustrated by the examples I provided at the start of this post. It is amazing how much geographic and location-specific data is available immediately on news of a major earthquake in a relatively remote location. Meanwhile, closer to home, in an area of higher affluence and existing investment in government open data, the relevant data that could mitigate the protests and riots and restore governmental harmony is woefully lacking. The promises of big data remain somewhere in the near but indefinite future.
My impression is that we should be enjoying the benefits of big data today, and in particular that the recent (and ongoing) events in Baltimore represent a major missed opportunity for big data to show its value. The opportunity was missed because no one has been investing in populating existing big-data technology with neighborhood-relevant information about detailed police practices (both as trained and as experienced) or with measurements of minority opinions in the context of their neighbors. As in the big-data success stories of geographic mapping, traffic congestion, and weather maps, this data requires prior investment to collect and refine it, so that relevant and useful data will be available immediately when needed, such as when a sudden spark sets off widespread protests and rioting. The primary promise of big data is that it can save us the pain and hardship of having to start that research only after the problem starts. It should be helping us today.
This NYTimes article provides an example of what would have been more valuable if published a week ago, when the protests and riots began. It is valuable to have a selection of different interviews, but ultimately the medium constrains it excessively to just a few such snapshots, arranged in a balanced manner and connected by a self-consistent narrative suited to long-form journalism.
Moving to a data collection model would allow journalists to compile every story over time, postponing the work of seeking balance and tying the stories into a narrative until the moment the information becomes relevant. Last weekend, we could have queried "all perspectives of Sandtown", then matched the counter-stories and constructed a narrative that would better inform us about how to interpret the current events. This would have the advantages of being available immediately when relevant and of being unbiased by the current news. Proactive data collection in the neighborhood would benefit us by giving us a baseline of the relationships between neighbors and between residents and police.
This requires a different model for journalism than rushing out some research after the fact.