This post is motivated by the Fast Company article “Why top tech CEOs want employees with liberal arts degrees.” This article caught my attention because it confirms some of what I’ve been writing about in earlier posts. In particular, I have been claiming that some liberal arts skills are essential for the aspect of data science that I feel is most important. That aspect involves scrutinizing data for ambiguity in order to prepare and defend an argument about the meaning and relevance of the data.
I assert that one foundation of data science is a skill in classically-defined rhetoric: the art of persuasion with principled avoidance of logical or rhetorical fallacies. Although the concepts of logic and rhetoric may be reduced to a few courses, students learn the skills through repeated practice of collecting evidence, formulating arguments, presenting the argument concisely in written or verbal form, and being prepared to defend the argument. In contrast to STEM disciplines, liberal arts disciplines more thoroughly challenge students to prepare and defend term papers that require exercise and demonstration of rhetorical skills applied to a variety of topics.
I maintain that rhetoric is a critical skill that is often lacking in many technology-driven programs. Rhetorical skills are especially needed to support the new phenomenon of data science. The modern concept of data science evolved out of computer science. Data science is largely a subset of computer science rather than a discipline of the physical sciences. Historically, we allowed for a purely technologist concept of computer science because its practitioners prepared software to be turned over to users (promoted to production) for operational responsibility. Although the software development process may involve extensive testing for correctness of software, the ultimate responsibility for operational use of the software rests on the users of the software.
Computer science products always carry an implicit disclaimer of “use at your own risk” because the software developers are not responsible for operations. Earlier, this disclaimer was explicitly stated. Today the same disclaimer exists in the wall we place between development and production. Software developers are on one side of the wall to assure the software meets specifications. Software operators are on the other side of the wall to absorb responsibility for any consequences of the use of that software. Software operators use software at their own risk.
Computer science algorithms are captured entirely in source code. We can thoroughly test this source code in a development environment completely isolated from the production environment. When we promote tested source code into the production environment, we have confidence that the algorithms will operate correctly. As a result, we have confidence that any unfortunate consequence of the software must be because of some form of operator error or negligence either in the use of the software or in the specification of the requirements. The operator assumes the risk of using the software.
Although data science is a subset of computer science practice, its goal of tackling high-volume, large-variety, and high-velocity data transforms its product into an operational one. This goal is to exploit data in a way that is beyond the human capacity to comprehend or independently verify. Humans can comprehend the concepts of the algorithms and the correctness of the data. It is the volume, variety, and velocity of data that prevents humans from comprehending the reasoning behind the conclusions that data science automates.
As I mentioned in earlier posts, data science is different from computer science in the sense that the data becomes part of the algorithm. I described a simple algorithm of a dictionary look-up as an analogy to the sophisticated algorithms for predictions. In computer science, a dictionary is a simple structure that can store data with special keys to locate specific data items. The testing of a dictionary implementation involves showing that it can store and retrieve the data supplied to it. The dictionary software proves it can operate like a dictionary. More sophisticated data representation algorithms prove their correctness in a similar fashion: the algorithms demonstrate that they can retrieve information previously supplied to them.
As I described with dictionaries, there is no way for a developer to predict the possible values returned for certain keys. For example, a dictionary that returned a Boolean true value for the key “T” during testing may return the Boolean false value for the same key “T” during operation because the operator redefined the value. The dictionary works as intended to return the operator’s preferred value for a certain key, but the actual value returned may not have been anticipated by the developers. This is a simple example, but something similar occurs in complex algorithms where unpredictable operational data populates key-value types of relationships.
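The dictionary example can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the class name Dictionary, the helper passes_developer_test, and the variable names are hypothetical and stand in for whatever implementation a real project would use. The point it tries to show is that the same tested code returns a different answer once operational data redefines the key.

```python
class Dictionary:
    """A minimal key-value store (hypothetical stand-in for the dictionary in the text)."""

    def __init__(self):
        self._items = {}

    def store(self, key, value):
        # A later store for the same key replaces the earlier value.
        self._items[key] = value

    def retrieve(self, key):
        return self._items[key]


def passes_developer_test(dictionary):
    """Developer-side check: the dictionary returns what was supplied to it."""
    dictionary.store("T", True)
    return dictionary.retrieve("T") is True


# Development: the implementation passes its test.
developer_test_ok = passes_developer_test(Dictionary())

# Production: the same code, but the operator redefines the value for "T".
production = Dictionary()
production.store("T", True)
production.store("T", False)
production_value = production.retrieve("T")

print(developer_test_ok, production_value)
```

The software behaves correctly in both cases. What changed is the data, which is exactly the part the developer could not test in advance.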
Because the operational behavior of the algorithm depends more on the data than the software, the line between software development and production blurs or disappears. Unlike computer science, which can exist in isolation from production, data science must participate in the operational part of the project. Data science must share the risk of using the software.
Data science must be prepared to answer for the unexpected. Data scientists have no opportunity to hide behind the disclaimer of “use at your own risk” that works so well for their ancestral computer science practices. Data scientists have to be prepared to accept responsibility for operations and to accept accountability for any consequences.
In my opinion, when it comes to handling operational responsibility and accountability, people who have extensive liberal arts training are better prepared because of their repeated practice of rhetorical skills: making valid arguments applied to increasingly complex topics. Liberal arts graduates are prepared for arguing cases involving ambiguity.
Certainly, much of the sophistication of data science algorithms acknowledges the existence of uncertainty through the use of statistics. These algorithms require STEM disciplines of assuring that the algorithms are valid, appropriate, and correctly implemented. However, there remains a distinction in attitude between STEM and liberal arts disciplines. The liberal arts discipline will argue for or against a result in spite of the provable correctness of the statistical algorithm and its implementation. This opportunity for argument is valid because the consequences depend as much on the actual data (evidence) as they do on the implemented algorithm.
Data science needs to encourage this level of argument because data scientists are inevitably going to be responsible and accountable for the operational consequences of the algorithm.
From this perspective, it is reassuring to see the article claiming CEO-level appreciation for the value of liberal arts education in the technology industry. The article is too short to determine whether this appreciation goes as deeply as I am suggesting, but at least they are talking about this distinction in training during the critical formative years of college.
Despite this appreciation, the CEOs do not seem to be doing anything about this. Even if they understand the need for critical thinking skills that come from liberal arts education, their companies only have job opportunities for technical skills. The only way these companies will hire a liberal arts major is when that individual also learns the technical skills of a particular discipline. For example, a company may hire a sociologist because the sociologist knows the statistical language of R or the statistical software of SPSS. The hiring requires technical competence. Although it is possible that an additional background in liberal arts may be beneficial, it is probably not going to outweigh a candidate without that background but with a more thorough experience in the technology. Technology companies only have job openings for technology jobs.
Technology firms may appreciate the value of liberal arts educations but they will continue to hire only technology skills.
There is a fundamental difference between liberal arts and STEM in the practice of these disciplines as well as the training. The value proposition of liberal arts is the preparation, presentation, and defense of a persuasive argument. This is a very time-consuming activity, as exemplified by the generally lighter course load of liberal arts studies (especially in Junior and Senior years) to allow for more time to prepare well-researched and well-written term papers.
If liberal arts backgrounds offer a value added to technology firms, that value must come from allowing liberal arts majors to practice their craft of rhetoric. We cannot be satisfied with the mere platitude that some software developer met his quota for software product that was somehow better because he had previously studied comparative religion. Very little if any liberal arts practice went into his development efforts.
We do need the liberal arts perspective on the ground floor of technology. Liberal arts backgrounds need to be part of the team in a way that recognizes that they are performing a completely different kind of task from the technologists. We need to start hiring onto technical teams a non-technical position whose task is to engage in the same skills liberal arts majors practice when they prepare term papers or oral arguments.
This need is especially acute for modern data science projects in the area called big data and predictive analytics.
In the context of a modern software development practice known as agile development, isolated teams work on short duration sprints to develop software and present the working product during frequent periodic reviews to the product owner and stakeholders. It is within this isolated team (often called a scrum team) that data science needs liberal arts skills. The scrum team must include at least one member whose task is similar to the term paper of liberal arts course work. This liberal arts task is in addition to and independent of the production of software. We need this person to devote as much time as possible to the preparation, presentation, and defense of argument.
In fact, I mentioned this in earlier posts as the need for “story telling”. In those posts, I described an increasing demand for data scientists to have story telling skills. Often the job description explicitly uses the words “story telling”. My earlier writings took a dismissive attitude that such story telling is a substitution of a comprehensible rhetorical metaphor for the humanly incomprehensible conclusion of big data analytics. I would not have been as dismissive of the concept if they had used different words. But now I realize that “story telling” is generally how STEM perceives rhetorical skills. Rhetoric is exemplified by liberal arts term papers. To STEM disciplines, term papers appear to be story telling. It is not story telling; it is rhetoric.
We cannot realize the supposed benefits of recommendations from big data analytics unless we can persuade the decision makers to accept these recommendations. Unfortunately, the very project is to come up with recommendations that are beyond human comprehension (for example, see this article on the same web site). The success of big data projects rests on the ability of the data science team to persuade decision makers.
Persuasion requires the rhetorical skills practiced in the development of liberal arts majors. Also, persuasion requires the practice of exactly these same skills through the comparably time-consuming activity of preparing for a presentation of a persuasive argument. We need liberal arts professionals practicing liberal arts within the technology teams. Sprint cycles need to conclude with a well-developed persuasive argument. To a STEM professional, this argument appears to be story telling, but in the rhetoric discipline story telling (metaphor) is just one form of argument.
If technology CEOs truly value the capabilities of liberal arts trained professionals, they should direct their companies to create liberal arts job opportunities within the deepest technology development teams. These are not technical jobs that happen to have a liberal arts background. Instead these are jobs that are focused on delivering the rhetorical products that must accompany the technical product. Such jobs are especially needed for big-data data science projects where the success of the project depends on persuading decision makers to accept otherwise incomprehensible recommendations.
Technology firms must employ strictly liberal arts positions within development teams. For example, when a data science scrum team demonstrates their sprint accomplishments, the liberal arts members provide the persuasive narrative based on an effort comparable to their college experience of preparing or presenting term papers.