Democratic government involves rhetoric to persuade. In recent posts, I described an alternative, dedomenocracy, that replaces human debate and deliberation with automated algorithms operating on large-scale data collections to generate short-lived rules that everyone must follow. While this concept of government by data removes humans from the decision-making process, humans still contribute in two ways: first, by practicing data science to scrutinize the quality of data and algorithms, and second, by serving as the source of data for future rule making.
When rule-making algorithms are strictly objective, free of any human biases (including prior concepts of Truth, or theories) beyond the statistical methods themselves, their results depend entirely on the data. In such a system, data (not software) determines the behavior of the algorithms. This presents an opportunity for abuse: someone could introduce specially designed data to manipulate the automated rule making and gain an unfair advantage.
Although governing by data eliminates the need for classical rhetoric in making decisions, there is a need for a comparable discipline for evaluating the validity of data: the data science of scrutinizing data needs a way to identify fallacies in the data itself.
In many earlier posts, I discussed the problem of model-generated data, where we use theories to create data to substitute for missing data (a concept I call dark data), or where we use theories to reject data during data cleansing when the data departs too far from theory predictions (a concept I call forbidden data). I argue that both practices are wrong: we should leave missing data empty, and we should retain any data no matter how surprising it is. In both cases we can flag the fact that the data is missing or suspect, but the data store should contain only as-is data: data as observed by an authenticated sensor. When we permit models to inject artificial data or to reject actual data, we bias our data store with past theories, and this can blind us to discovering something new about the world.
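The as-is policy above can be sketched in a few lines of code. This is a minimal illustration, not an established API: the record layout, the `ingest` function, and the tolerance threshold are all hypothetical choices made for the example. The point is that missing values stay empty and surprising values stay in the store; only flags are added.

```python
# Sketch of the "as-is" data-store policy: never impute missing values
# (dark data) and never drop surprising values (forbidden data). Record
# each observation exactly as the sensor reported it, with flags that an
# analyst can inspect later. All names here are illustrative.
from dataclasses import dataclass, field
from typing import Optional, List


@dataclass
class Observation:
    sensor_id: str          # authenticated sensor that produced the value
    value: Optional[float]  # None = missing; left empty, never imputed
    flags: List[str] = field(default_factory=list)


def ingest(sensor_id, value, model_prediction=None, tolerance=5.0):
    """Store the observation as-is, flagging rather than altering it."""
    obs = Observation(sensor_id, value)
    if value is None:
        obs.flags.append("missing")    # dark data stays empty
    elif model_prediction is not None and abs(value - model_prediction) > tolerance:
        obs.flags.append("suspect")    # forbidden data is kept, only flagged
    return obs


store = [
    ingest("buoy-7", None),                          # missing reading
    ingest("buoy-7", 31.2, model_prediction=18.0),   # far from theory: kept, flagged
    ingest("buoy-7", 18.4, model_prediction=18.0),   # unremarkable reading
]
print([(o.value, o.flags) for o in store])
```

A conventional cleansing pipeline would impute the first value and discard the second; here both remain available to future analytics, which is exactly what keeps past theories from silently editing the record.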
Dark and forbidden data are comparable to classical rhetoric's concept of fallacies: they are data fallacies. Just as identifying a fallacy in rhetoric can invalidate an argument, identifying a data fallacy can invalidate the data or the analytics based on that data.
Just as the history of rhetoric has identified many rhetorical fallacies to avoid being persuaded by bad arguments, we need to identify many data fallacies for the same reason. Data fallacies provide a defense against attempts to manipulate data to gain an unfair advantage in the resulting analytics-based rule making.
The following are some additional candidate data fallacies that could unfairly influence automated rule making.
A data fallacy comparable to the ad hominem is the rejection of a data source because it has a history of generating data that supports a competing goal or contradicts a preferred one. Again, the intention of dedomenocracy is a pure form of governing by data without any input from human biases. It is wrong to reject a source of accurately recorded observations merely because those observations do not support a human claim.
An example of an ad hominem applied to data may be the rejection of field-study observations that describe non-human behaviors with analogies to human inner lives. When recorded by an experienced and educated scientist, this anthropomorphism may be a very accurate description of the appearance of the behavior. There is value in this observation even if the non-human being in fact has no inner life that humans would recognize. We can still learn from the analogy when we trust the observer to be honest in describing the observation.
Another data ad hominem may be the rejection of satellite global temperature measurements in determining the global average temperature because the satellite data makes a weaker case about global warming. If there is no reason to doubt the accuracy of the satellite measurements, they should be part of the data store for determining policies involving global climate management. Certainly the data should be distinguished from ground measurements, with some understanding that the measurements are not interchangeable, but neither data source should be rejected if both are accurate according to the source's standards.
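The remedy described here, keeping both sources in the store but tagging each by provenance, can be sketched as follows. The source names and anomaly numbers are invented for illustration only and do not represent real measurements.

```python
# Sketch: retain every accurate measurement source, labeled by provenance,
# and summarize each source separately rather than rejecting the one that
# makes a weaker case for a preferred conclusion. All data is invented.
readings = [
    {"source": "ground",    "anomaly_c": 0.92},
    {"source": "ground",    "anomaly_c": 0.88},
    {"source": "satellite", "anomaly_c": 0.61},
    {"source": "satellite", "anomaly_c": 0.58},
]


def summarize(readings):
    """Average the readings per source; never silently discard a source."""
    by_source = {}
    for r in readings:
        by_source.setdefault(r["source"], []).append(r["anomaly_c"])
    return {src: sum(vals) / len(vals) for src, vals in by_source.items()}


print(summarize(readings))
```

The two averages differ, and that difference is itself information for the rule-making algorithms; dropping the satellite rows would hide it, which is precisely the data ad hominem the text warns against.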
Another type of data fallacy may be something comparable to the rhetorical straw man. The straw man argument involves an easily disproved claim that is falsely attributed to an opposing view. For example, the view under attack may be a claim that all living beings have inner lives and intelligence that we can recognize as human-like. A data straw man points to the missing evidence that such beings engage in human-language conversations and negotiations with humans; the straw man data is the missing data documenting cases of such human and non-human commerce. While there may be some people who believe that non-human animals can communicate as peers with humans, and there are ambiguous demonstrations of a wide range of animals responding appropriately to human-language commands, this is not an essential condition for believing in true intelligence in non-human beings or communities of beings.
There may be many other fallacies in handling data that affect a dedomenocracy's automated rule making the way classical rhetorical fallacies affect persuasive arguments. To defend against malicious or unfair manipulation of a dedomenocracy, we need to develop ways to identify data fallacies that we can use to govern the quality of data for automated rule making.