In earlier posts (such as this one), I described how one of the skills of data scientists is to be good story tellers. I explained that “story telling” is the way STEM-trained people describe the art of rhetoric: presenting a persuasive argument. I explained that this is an essential skill for data science. Until we fully obligate decision-makers to obey the results of data analytics and visualization (or automate their role out of existence), we need to use rhetoric to persuade them. STEM-trained people are not trained in rhetoric, so they use the phrase story-telling to describe this process. The story-telling involves arranging info-graphics and slide presentations that tell a story in the most obvious and direct terms possible. Sometimes, they interpret the phrase “story telling” as a license to invent some details to make the story more comprehensible and entertaining.
I view data science projects as a supply chain that starts with collecting observations and then passes through multiple steps of data cleansing and governance until the data is available for the predictive analytics and simulation that support decision making. In many past discussions, I emphasized the need to defend the supply chain from contamination by dark data, a term I use to describe assumed (or model-generated) data that fills in for missing data. The predictive analytics and simulations at the end of the supply chain will inevitably introduce assumptions and models to the decision making process. This introduction of assumptions should be the final step in the supply chain so that all of the data provided to this step is bright (well-documented and well-controlled) data. Bright data is usually rare, so we should demand that dim data be as bright as possible. In particular, we must avoid introducing model-generated data as inputs to models for predictions to support decision-making. All of the assumptions should be internal to the predictive algorithms and simulations themselves so that the documentation of these assumptions is available to the decision maker.
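As a minimal sketch of that rule, assume every record in the supply chain carries a provenance tag saying whether it was observed or filled in by a model. The names here (Provenance, Record, inputs_for_prediction) and the sample sensor values are hypothetical illustrations, not part of any particular tool. The idea is only that model-generated values are held back before the predictive step and reported, rather than silently mixed in.

```python
from dataclasses import dataclass
from enum import Enum


class Provenance(Enum):
    OBSERVED = "observed"        # bright: a documented, controlled observation
    MODEL_GENERATED = "model"    # dark: an assumed or model-filled value


@dataclass(frozen=True)
class Record:
    name: str
    value: float
    provenance: Provenance


def inputs_for_prediction(records):
    """Return only observed records, plus the names of records held back."""
    bright = [r for r in records if r.provenance is Provenance.OBSERVED]
    held_back = [r.name for r in records
                 if r.provenance is Provenance.MODEL_GENERATED]
    return bright, held_back


# Hypothetical sample data: sensor_b is an imputed (model-generated) value.
records = [
    Record("sensor_a", 20.5, Provenance.OBSERVED),
    Record("sensor_b", 21.0, Provenance.MODEL_GENERATED),
    Record("sensor_c", 19.8, Provenance.OBSERVED),
]

usable, held_back = inputs_for_prediction(records)
print([r.name for r in usable])  # names of records passed to the model
print(held_back)                 # names of records kept out, for the record
```

Reporting the held-back names, instead of dropping them quietly, keeps the gap in the data visible to whoever documents the assumptions for the decision maker.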
In most cases today we do not have automated decision making. Instead we have to persuade a human decision maker to take an action that matches our recommendations. Even when our recommendations come from trusted predictive analytics or simulations fed by excellent data, the decision maker will need additional persuasion. A decision maker who understands he will be personally accountable for a decision will demand a persuasive argument to overcome his own reasonable doubts and fears. Usually, the decision maker will challenge humans to defend their recommendations.
Those decision makers who do not demand human defense of recommendations have effectively been automated to accept whatever the data analytics recommend. Their position becomes a ceremonial position only.
Summarizing the above points, there remains an essential human role in data science. This role is to provide the persuasive argument that convinces the decision maker to follow the recommendations from predictive analytics. This persuasion involves rhetoric. The skilled decision maker will challenge his data scientists with new arguments and he will expect effective and convincing responses. Meanwhile the data scientist needs to be prepared to defend against potential arguments and to present the case in a way that avoids provoking irrelevant ones. There is a craft to presenting a convincing case to the decision-maker. This craft includes techniques such as info-graphics, visualizations, and vivid slide presentations. To be presentable to a decision-maker, these products need careful design to present the best case for the intended recommendation. I claim that this careful design is a form of story-telling.
My motivation for writing this post is an observation I had about general literature found in bookstores and magazines. I introduced the above background to provide a possible tie-in with data science.
My observation about literature was something I became aware of in my youth a few decades ago. I distinctly recall being taught that non-fiction was a higher or more respectable form of writing than fiction. The reason this is so distinct in my mind is that this standard was contrary to my natural preference for reading and writing fiction. I cannot offer any concrete evidence about this shift in preference toward non-fiction. For now, I only relate my impression that this was a relatively recent change in cultural expectations. My impression is that fiction as a literary form had a higher degree of respect and honor in the past than now. There was a shift to elevate non-fiction and denigrate fiction.
I recall noticing this in book stores where there were increasing numbers of non-fiction topics and more shelves for their books. At the same time, the space for fiction appeared to shrink. Even within fiction, many of the books were older books while non-fiction books tended to be more recently written. This is what I noticed back then.
I recalled that impression recently when I was reading reviews of recently published books. The reviews of non-fiction books described qualities of the work that would better fit works of fiction. These non-fiction books exhibited humor, character development, and unifying themes that held the work together. In my imagination, I saw a fiction writer struggling to stay within the boundaries of a non-fiction label, or even pushing those limits, just to stay in the more respectable realm of non-fiction.
The example that caught my attention was the book Systemantics (The Systems Bible). The book appears to be presented as a book for systems professionals that provides some examples and principles about how designing systems is different from other types of design. As a person interested in systems, I find that this sounds like an interesting book. I have not read it, but I have it on my list for future reading. From the reviews, it sounds like an entertaining read in addition to being informative. As I read through several reviews, I began to notice a pattern: there was more praise for the entertainment than for the information. The book is listed in the Amazon library under non-fiction as a work of philosophy. I wonder, though, whether the entertaining style of writing makes the book better suited to a fiction category of humor. Based on the reviews, the book appears to be more a book of humor than a book of philosophy, but both the title and introduction definitely emphasize it as a non-fiction work.
Again, I would have to read the book myself to offer any real conclusions, but just from the information on the Amazon site, this book appears to be a recent example of what I observed earlier in my life. The author is marketing his work of humor as non-fiction. I wonder if he is doing this because it makes the work more respectable.
Assuming that it is true that modern culture considers non-fiction works to be more respectable than fiction (and I am convinced it does), I think this is a relatively recent phenomenon. In an earlier post, I wrote some thoughts about Herman Melville’s Moby Dick. I don’t think anyone would object to my calling this book a work of fiction. However, in my post, I admired the rich scientific details he provides that capture the knowledge of cetaceans in the early 19th century. Many of the details included in that work of fiction match very well with modern scientific knowledge of cetaceans. Although the book is a work of fiction, it could have served well at the time as an introductory book on the science of whales and their cousins. In fact, it could serve that purpose even today. Herman Melville could have stripped out the fictional story line from the book and still ended up with a decent-sized work of non-fiction. From my reading of his life, I get the impression that he set out to write a work of fiction, that it was stylish then to include lots of credible details, and that he happened to know a lot of details about whales and whaling. In his time, books that sold were books of fiction.
If Herman Melville had lived 200 years later, he could have written nearly the exact same content and sold it as non-fiction. The book would need a little rearranging to relegate the fictional tale to an illustration subordinate to the main theme of the book, which describes the science of whales and the practice of whaling. Rearranging the content a little bit would convert this book of fiction into non-fiction.
In the past, there seemed to be less of a stigma attached to calling a book a work of fiction. Readers wanted to read fiction. They wanted to read about characters who best come to life when they are fictional characters free from the constraints of the very limited documentation of actual lives. They also wanted to read adventures with details of events that would normally lack documentation or corroborating evidence. This kind of fictional detail sold books back then. They wanted fiction.
At the same time, they were very sophisticated readers who expected the details to be credible and even verifiable. In the case of Moby Dick, the first release had to be revised to change the ending because the book starts off as a first-person account but ended with the narrator drowning at the climactic end. This could probably work in modern fiction, but it was unacceptable then. Fictional stories had to be credible.
Clearly Melville had a love of the science of whales and the practice of whaling. I imagine that he happily provided those details. Had he not been so eager to provide the details, his readers would have demanded them. The fiction of the story involves the characters and adventures, but the surrounding information had to be credible and factual. Fiction of his time required research.
This book stands out in particular as a work that is nearly a work of non-fiction with fictional illustrations. Many other books of the time were probably similarly a mix of fiction and non-fiction but sold without hesitation as stories of fiction.
Today, if one invests any research in a book, the book will almost certainly be sold as non-fiction. We no longer expect fiction to educate us about real-world facts or to be consistent with known facts. Perhaps the change in respectability of works freed up fiction to become more fanciful or less detailed. Today’s fiction can concentrate entirely on adventures and characters instead of lengthy descriptions of the surroundings. Also, fiction can get away with unrealistic circumstances, explicitly in science fiction and fantasy, or implicitly in suspense novels or mysteries.
Works of fiction are still very popular. Some titles make a lot of money for their authors, and many others attract at least a sizable audience. I would like to write a book of fiction that sells millions of copies.
When I say that non-fiction receives more respect than fiction, I am talking about what the work says about the author. An author of non-fiction earns respect by having reviews that praise the scholarship of the material in the work even if very few people will ever read it. To obtain a comparable level of respect as a writer of fiction, one would have to write something that sells millions of copies.
I don’t think we made this distinction in earlier times. A writer of fiction would receive respect even for a small number of sales. It was expected that the author would invest a lot of time researching for a book (or relying on his earlier education) to provide a lot of factual material in the work of fiction.
Above I mentioned that a rearrangement of material in Moby Dick could convert it into a work of non-fiction. The book would describe whales and whaling and would advance a coherent story to illustrate the principles at the appropriate points. This pattern describes many modern works of non-fiction. The detailed exposition of researched facts is livened up with short narratives of fictional tales that help illustrate the concepts. A book describing some city, for example, may describe a location through a first- or second-person narration that attempts to place the reader in that setting.
In technical books, such as those describing some modern software practice, the authors will include fictional examples to illustrate the lessons. A book on website design, for instance, will usually involve an example of a fictional website of a fictional company. Such a book’s depiction of this fictional scenario closely parallels Melville’s story of Ishmael on the Pequod.
In this example of a book on website design, the book could convey very accurate information about how to make a website even though the example it uses is completely fake. The fake example is fiction but it is very useful to illustrate the accurate factual information the book is teaching. The website example is fake, but the overall book is accurate. The reader recognizes this distinction: he is reading the book to learn about how to build a website, not to learn something about the company in the illustrative example. Still, the example is fiction.
Another example that occurs to me is the many school textbooks whose word problems or illustrations involve fictional scenarios that either would not occur in real life or are oversimplified versions of a real-life scenario. The examples are fake. The material is accurate.
This discussion suggests to me that modern non-fiction shares something in common with earlier works of fiction. Both often include a mix of accurate facts and entertaining fictional narratives. In contrast, modern fiction is less bound by accurate facts, if it has any at all. The choice of how to arrange the fictional and accurate content is what determines whether a work is fiction or non-fiction. Rearranging the material can fundamentally change the classification of the book. My above example of the website design book could rearrange the order of the text to present a Moby Dick-like story of some imagined website. (I like the Moby Dick example because its odd and inconsistent organization suggests that it could be reshuffled.) This new book of fiction would be about a fictional company building its fictional website using modern best practices that are detailed in the book.
The point I want to make is that non-fiction books include both fake and accurate content. If I buy the book on website design, I would not be bothered at all by the juxtaposition of an accurate book with an illustrative example that is fake.
Modern non-fiction often can be described as fake but accurate.
In my discussion of the stories that data scientists may tell as part of a persuasive argument, data and analytics provide the accuracy but the illustrative story delivers the persuasion. The illustration typically involves some creative, imaginative writing to make a coherent story that everyone can easily recognize and where the lesson is clear. It is rare that a single well-documented case will provide an effective story. Instead, a story may be constructed using a composite of separate cases. The story combines the most compelling elements from each case in such a way that the story has the desired impact. The story is fake, but its instances are accurate. During revisions of the presentation, the presenters will remove inconsistencies among the details, and this makes the instances fake although based on accurate information. Finally, the story requires a narrative that includes creative writing to tie the instances together into an entertaining story. The story is fake, but it is based on stuff that really happened.
In data science, STEM professionals will typically present stories in graphics instead of text. Typically the graphics include a lot of different information combined into a single image or animation. Sometimes animated figures in cartoon form will tell the story, where the cartoon form emphasizes that the actual example is fake but the basic points are accurate.
Often, there may be a demonstration that uses the actual data in a contrived scenario: the scenario is fake in order to present all of the key information in a very short period of time. We assume the decision maker recognizes the scenario as fake and focuses his attention on the underlying facts. In my above example of the website design book, the author is trying to impress upon the reader the steps to prepare a website. He does not want to sell the reader on the fake company he uses as an illustration.
The data scientist story teller is taking advantage of the modern concept of a non-fiction work. The data scientist is presenting non-fiction accurate data using fake illustrations. This approach is familiar to modern audiences because this use of fake examples is common in popular non-fiction literature. We recognize that the example should not be confused with the underlying accurate information.
In an earlier post, I argued that journalists are data scientists. They collect and verify data and then they present a narrative that puts this data in a form that will retain the interest of the audience long enough to absorb the accurate information. In the recent example of the Michael Brown shooting in Ferguson, the underlying journalism presented information about a community with long-lasting frustrations with its police department. The journalists illustrated this data with a narrative that Michael Brown was either shot in the back while fleeing or shot while gesturing surrender (or both). This is the “hands up, don’t shoot” narrative that has caught on in protests across the nation. In this case, much of the debate on both sides concerns the accuracy of the chosen illustrative example. While this debate is justified, it misses the point that the bulk of the non-fiction content concerns the underlying message of the community’s frustration with its police department.
In non-fiction we should be accustomed to being suspicious of the illustrative example. The illustration is there to dramatize the underlying facts, but the illustration itself is probably not completely accurate. In the Ferguson case, the illustration is a story to persuade us of the need to accept a particular notion of police reform. Arguing over the validity of an illustrative example is like arguing about whether a textbook exercise would ever occur in real life. Illustrations in non-fiction works can be fake.
The “hands up, don’t shoot” scenario appears very unlikely to have happened, but there is non-fictional evidence of the community’s frustrations with its police. Similarly, there was no sole survivor named Ishmael of a ship named Pequod wrecked by a whale, but Moby Dick provides a non-fiction account of cetaceans and of actual practices of whaling.
I think this is what distinguishes modern non-fiction from non-fiction in earlier ages. In earlier ages (Melville’s time), non-fiction took pains to exclude any fanciful information at all. Non-fiction was devoted to the presentation of actual evidence that supports a particular conclusion or theory. Fiction (such as Moby Dick) was a separate work that incorporated those facts into an illustrative example. Today’s non-fiction typically incorporates both fanciful illustrations and actual facts. The danger is that some readers may mistake the illustration for additional facts.
There may be a lesson here that the older practice of classifying literature was better. Older forms of non-fiction, free of any imaginative information, provided a reliable source of information because the intention was to include nothing in the document that was not supported by facts. Non-fiction would have no imagined illustrative examples. Fiction was a distinct form of literature that used facts in a tale either to entertain or to persuade (and frequently to do both). The lesson for modern data science is that we may be better served by following the older model of non-fiction that sticks to the facts, and leaving the story telling to others.