One way to summarize my recent posts is as a proposal to build a new journalistic infrastructure that collects detailed accounts of abundant personal stories for use in data analytics. Instead of wasting time writing long narratives on single stories, the project should focus on recording observations as structured and tagged data so that they are immediately available in big data stores for analysis. Narratives will come eventually, following interpretation of the analytics and visualizations. The data tools will include drill-into capabilities to obtain specific details that illustrate relevant points. In my most recent post, I described the missed opportunity of not recording response personnel in Ebola hot zones, which would have given us objective observations of how these persons contract the disease. In the earlier post, I described a generalization to all human activities, so that we are able to answer today’s question of what happened yesterday. Both posts suggest a project of collecting extensive observations to be cataloged into data stores for simple retrieval.
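As a rough illustration of what "structured and tagged data" with a drill-into capability might look like, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the `Observation` fields, the `ObservationStore` class, the tag names, and the sample records are all invented, and a real system would sit on an actual data store rather than an in-memory list.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Observation:
    """One recorded observation (hypothetical schema, for illustration only)."""
    observer_id: str
    timestamp: str              # ISO 8601 text, e.g. "2014-10-10T09:15:00"
    location: str
    tags: Set[str] = field(default_factory=set)
    detail: str = ""            # free-text note kept for later drill-down


class ObservationStore:
    """In-memory stand-in for a big data store."""

    def __init__(self) -> None:
        self._records: List[Observation] = []

    def add(self, obs: Observation) -> None:
        self._records.append(obs)

    def tag_counts(self) -> Counter:
        """Aggregate view: how many observations carry each tag."""
        counts: Counter = Counter()
        for obs in self._records:
            counts.update(obs.tags)
        return counts

    def drill_into(self, tag: str) -> List[Observation]:
        """Detail view: the individual observations behind one tag."""
        return [obs for obs in self._records if tag in obs.tags]


# Made-up sample records, not real data.
store = ObservationStore()
store.add(Observation("worker-1", "2014-10-10T09:15:00", "ward-a",
                      {"glove_removal", "patient_contact"}, "Changed gloves after contact."))
store.add(Observation("worker-2", "2014-10-10T09:40:00", "ward-a",
                      {"waste_disposal"}, "Carried sealed bags to the burn pit."))
store.add(Observation("worker-1", "2014-10-10T11:05:00", "ward-b",
                      {"glove_removal"}, "Removed gloves before leaving the ward."))

summary = store.tag_counts()                 # feeds analytics and visualization
details = store.drill_into("glove_removal")  # supplies the illustrative specifics
```

The aggregate view is what would feed analytics, while `drill_into` returns the specific accounts a storyteller could use to illustrate a point found in the aggregate.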
I argue that as we move toward a life where data analytics obligates decision making and obligates compliant participation or cooperation, continued social harmony requires access to this data similar to how our ancestors relied on the daily paper to keep informed. We need to know what happened yesterday, and what happened is not some narrative but instead it is the entire collection of data that will feed the analytics that will produce the decisions that we will have no choice but to follow.
The following tweet by @evetushnet brings up views that contrast with the ones I have been discussing:
— Eve Tushnet (@evetushnet) October 10, 2014
The first article at the Washington Post complements her earlier article at the American Spectator in that both critically review the recent trend of increasing popularity of journalism involving detailed narrative accounts of a single person’s experience. Such accounts, described as first-person journalism or democratization of opinion, are contrasted with the more traditional journalistic practice that involves broader research and collaboration to build a balanced view of the issues involved. In Eve Tushnet’s article, the quote “This first-person style is not exactly Columbia School journalism.” gives the impression that the first-person style is a lower form of journalism, although she goes on to describe the journalistic rigor involved in verifying the subject and the credibility of the claims.
Both articles note that such first-person accounts are very popular among readers. People enjoy reading these individual accounts of direct experiences with some important broader condition. The danger is that these single individual accounts may not be representative of the entire population affected, or the accounts may be biased either by the motives of the subject or by an impact on their lives that is unique to their particular circumstances. In contrast, my recent posts seek to encourage many more accounts just like these. We need many more accounts than can be realistically published one at a time.
I take the example of health care providers who serve Ebola patients. Among the entire population of professionals (health care providers, journalists, or cleanup professionals) who enter a hot zone, only a small number will actually contract the disease themselves. I argued that detailed documentation of their every action in the hot zone can be very valuable for identifying the mode of transmission and for helping to quantify the hardiness of the virus to survive to that point. In order to have this objective data available when one of these professionals is diagnosed with the disease, we must implicitly be collecting this data on every such person who deliberately entered the hot zone for the purpose of providing their professional service. This call for data is similar to first-person journalism or democratization of opinion, but it is on a much grander scale to include every single professional offering services in an Ebola hot spot.
The difference is the volume of such accounts. I am asking for a full accounting of the experiences of every single professional service provider who enters the Ebola-infected zone, while the first-person journalist will have a satisfying marketable result with just the single high-quality narrative of one person’s experience. The journalist sells a story that a general audience of readers finds interesting to read on a news-gathering website (the modern incarnation of a newspaper). My goal is to feed these stories into a data store on a machine.
Currently, the single-story narrative has the advantage of a straightforward financial incentive that derives, directly or indirectly, from the number of human readers of the story. In contrast, machines hosting data stores do not pay subscriptions for new content and are immune to marketing persuasion: machines have no money. Most of the modern success stories from big-data analytics involve data obtained for free, voluntarily and often inadvertently provided by subscribers of seemingly unrelated services (such as fitness-tracking apps).
I am amazed to watch how quickly new schemes such as fitness-tracking apps become popular and thus introduce new streams of personal data for data mining. I don’t doubt there is far more data that can be collected with future apps. The problem is that this data is primarily about a particular subset of the population that is most appealing to marketing: healthy and affluent people who are free to make purchases. For difficult and challenging issues like the Ebola crisis, we really need data in areas where there is inherently no market for such apps that can collect freely volunteered data. In my criticism that big data is a paper tiger when it comes to Ebola, I observed that the needed data is absent from our data stores. No amount of analytics is going to make up for missing data when it comes to answering critical questions such as how the disease is spreading, where it will hit next, or what we are doing wrong. To get that data, we need to provide a financial incentive for people to go out and collect it.
In my most recent post, I presented two ways we can pay for data. One way is to invest in equipping all professionals with video recorders as they do their duties inside the Ebola hot zone. This may be largely automated if there is a way to keep the camera focused on the actions of the professional to capture potential opportunities for infection. The other way was illustrated by the published documentary of a single victim’s progress from coming down with the disease to death and the disinfection and disposal of belongings.
That second way, the documentary of a single individual’s fatal experience with the disease, is an example of first-person journalism. Even this one example provides important clues about how the disease is progressing. For example, one thing that impresses me is that the community appears to be very well educated about the disease and is doing the best it can with the resources it has. The problem is that this single example may not be representative of all of the other cases; perhaps other cases are more like the rumored older traditions that are contributing to the spread of the disease.
There is a danger in reading too much into a single documented case study like this. This danger is suggested in the two articles presented in Eve Tushnet’s tweet. One answer to this danger is to pursue a more traditional investigative journalism to balance the single person experience with more thorough journalistic research to produce a more comprehensive report that presents the current best understanding of the entire problem.
I would argue that another answer is to collect even more such stories at comparable levels of detail. Instead of one documentation of one person’s experience, we should have an exhaustive collection of documentation for every single person’s experience with that challenge. In the case of Ebola, for example, it is reasonable to expect a complete collection of the experiences of everyone in the zone where Ebola is active, including both those who never get the disease and those who do, whether they survive or succumb.
While a single story may find a market among human readers when published on a website, there is no market for tens of thousands of similar stories (especially the stories of people who came through the experience without any tragic or emotional consequences). As a result, we currently don’t get this data.
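To make concrete why the accounts of people who never got sick matter, here is a minimal Python sketch that compares how often a tagged action appears among those who were later infected and among those who were not. The `tag_rate_by_outcome` function, the record layout, the tag names, and the five sample accounts are all invented for illustration; this is a toy comparison, not an epidemiological method.

```python
from collections import defaultdict
from typing import Dict, Iterable


def tag_rate_by_outcome(accounts: Iterable[dict], tag: str) -> Dict[str, float]:
    """Fraction of accounts in each outcome group that carry the given tag."""
    totals = defaultdict(int)     # accounts per outcome
    with_tag = defaultdict(int)   # accounts per outcome that carry the tag
    for account in accounts:
        outcome = account["outcome"]
        totals[outcome] += 1
        if tag in account["tags"]:
            with_tag[outcome] += 1
    return {outcome: with_tag[outcome] / totals[outcome] for outcome in totals}


# Made-up accounts, not real data: one infected person and four who stayed healthy.
accounts = [
    {"person": "A", "outcome": "infected",     "tags": {"torn_glove", "patient_contact"}},
    {"person": "B", "outcome": "not_infected", "tags": {"patient_contact"}},
    {"person": "C", "outcome": "not_infected", "tags": {"patient_contact", "torn_glove"}},
    {"person": "D", "outcome": "not_infected", "tags": {"waste_disposal"}},
    {"person": "E", "outcome": "not_infected", "tags": {"patient_contact"}},
]

torn_glove_rates = tag_rate_by_outcome(accounts, "torn_glove")
contact_rates = tag_rate_by_outcome(accounts, "patient_contact")
```

Reading only person A's story, both the torn glove and the patient contact look equally suspicious. It is the uneventful accounts of B through E that supply the comparison: in this invented sample, patient contact is common among those who stayed healthy while a torn glove is not.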
The root problem that is holding back the value of first person journalism is that the compensation incentive of journalism is on selling a narrative that will attract human readers. These readers are either paying subscribers or belong to demographics that attract advertisers. To make money, the journalist has to seek out fresh and inherently interesting cases and provide the necessary research to fill out the details to make a publishable article that will attract a large number of readers (ideally through a chain of referrals or recommendations).
The problem with first-person journalism is not that the subject is a single individual’s experience. The problem is that there is no financial incentive to provide similar case studies that exhaustively span that individual’s peers who lived different experiences in the same environment. We need a large number of similar accounts to be better informed about the overall problem and not be misled by a biased single case study.
Data-driven analytics currently thrives on a wealth of available data that happened to be freely volunteered by others. The ultimate realization of the value of the data comes after the analytics and visualization present a story that attracts an audience. To be relevant to difficult and urgent problems, data science projects need to find ways to propagate the financial benefits of the final results back to the information sources in order to provide the necessary incentive to uncover new but difficult-to-obtain data. The journalism market needs new incentives to redirect journalists’ skills toward collecting a vast number of first-person accounts. As I described earlier, the journalist’s skills are exemplified by the input and the output stages of data science projects. The mathematics and software for analytics and visualization need data to work with and storytellers to attract audiences to the results.