This is a continuation of my last post, whose introduction applies equally here. Good, clean, and properly governed data in our data stores may suffer from something comparable to the fallacies identified in grammar, logic, or rhetoric in classical trivium education.
The trivium focuses on valid forms of persuasion for the purpose of debating policies, where the persuasion (or rhetoric) employs valid forms of logic using grammatically correct statements. I suggested that a dedomenocracy that uses statistical analytics on trusted data to automate policy decisions needs a similar approach to challenge the content (or meaning) of data. Instead of fallacies at different levels of human language, there are fallacies at different levels of information content in data. The project of seeking out such data fallacies corresponds to my own concept of the profession of data science.
My last post provided examples that I describe as grammatical in nature, where modern uses of terms no longer have the same meaning as older uses of the same terms. The fallacy is that both usages occur in the same data set, as will happen with data warehouses, data lakes, or other long-term storage of data. Over time, the same label values of a field may have mutually exclusive meanings.
In an earlier post, I labeled as spark data the data deliberately introduced for the purpose of distracting attention away from the more important and solvable problems (where the solutions will be painful). I describe this kind of data as comparable to a rhetorical fallacy. The data are valid and their meaning is unambiguous, but they are irrelevant to the project of government because the problems they describe have no solutions, and they distract attention from more pressing issues that have real solutions we can adopt. As with classical rhetorical fallacies, these data fallacies are deliberately introduced for unfair manipulation of the government.
To round out my analogy, I want to identify an analog of the logical fallacy in data content. I think my earlier discussions of income inequality and workforce participation may be described as logical fallacies. Both cases infer a label by considering the negative of some data. Income inequality is the conclusion that lower incomes can be higher because of evidence that some people are making a lot more money. Similarly, low workforce participation is the conclusion that these people can be employed because of evidence that other people have jobs. We can derive quantities of lower incomes or of non-participating workers as the remainder of the population after subtracting those with higher salaries or with jobs, respectively. The fallacy is the assertion that this remainder population has a homogeneous and consistent meaning, in particular that its members are always worse off than those in the positive category.
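The subtraction itself is trivial, which may be part of why the remainder is so easy to mistake for a meaningful category. Here is a minimal sketch in Python. The records, the `employed` flag, and especially the `reason` field are all invented for illustration; real employment data would rarely carry a reason at all, which is the point.

```python
from collections import Counter

# Hypothetical records; every field and value here is made up for illustration.
population = [
    {"id": 1, "employed": True,  "reason": None},
    {"id": 2, "employed": False, "reason": "between jobs"},
    {"id": 3, "employed": False, "reason": "caring for family"},
    {"id": 4, "employed": True,  "reason": None},
    {"id": 5, "employed": False, "reason": "no interest in working"},
    {"id": 6, "employed": False, "reason": "discouraged job-seeker"},
]

def remainder_count(records):
    """Derive the negative category as total minus the positive category."""
    employed = sum(1 for r in records if r["employed"])
    return len(records) - employed

def remainder_reasons(records):
    """Tally the distinct situations hidden behind the single negative label."""
    return Counter(r["reason"] for r in records if not r["employed"])

print(remainder_count(population))    # size of the "not employed" remainder
print(remainder_reasons(population))  # what that one number conceals
```

The single count is all that the subtraction gives us. The tally of reasons is information we would have to collect separately, and it shows no sign of being homogeneous.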
Certainly, we can have labels for the opposite of well-compensated or fully-employed. The problem comes when we assign meaning to these negative labels. All we really know is that they are not in the positive category. There may be many reasons to be in the negative category and there may be many consequences. The problem with the single negative label is that it demands a definition or an interpretation. We readily accept that poorly-compensated workers or non-working adults are in need of more money or more employment.
This assumption seems to be a syllogistic fallacy (such as the illicit major). For example:
- People with high incomes have good lives
- People with low incomes do not have high incomes
- Thus, people with low incomes do not have good lives
When we identify a population with a label of low incomes, we imply that their lives would be better if they had higher incomes. This implied meaning is similar to the above syllogistic fallacy of the illicit major. While there is no doubt that many poor people would desire higher incomes, there are many who choose lower incomes because of some other benefit they get from their jobs. The jobs may be less demanding, or may involve the kind of work they find more enjoyable.
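One way to see that the syllogism is invalid is to build a small world where the premises hold and the conclusion fails. The sketch below does that in Python. The three people and their attributes are hypothetical, and "good life" is reduced to a boolean purely for the sake of the illustration.

```python
# Hypothetical people; the attributes are invented for illustration only.
people = {
    "ann":  {"high_income": True,  "good_life": True},
    "bob":  {"high_income": False, "good_life": True},   # chose a less demanding job
    "cara": {"high_income": False, "good_life": False},
}

def premise_holds(world):
    """Premise: people with high incomes have good lives."""
    return all(p["good_life"] for p in world.values() if p["high_income"])

def conclusion_holds(world):
    """Conclusion: people with low incomes do not have good lives."""
    return all(not p["good_life"] for p in world.values() if not p["high_income"])

# The second premise (low income means not high income) is true by definition
# in this encoding, because both are read from the same boolean.
print(premise_holds(people))     # the premise is satisfied
print(conclusion_holds(people))  # the conclusion is not
```

A single person like the hypothetical `bob` is enough to break the inference, and nothing in the income data alone tells us how many such people there are.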
A similar observation applies to the working-age population that is not participating in the workforce. All we know is that they are not working. We do not know anything about their condition other than the fact that they are not working (at least not in a way that we can measure).
In both examples, we can observe historic trends showing that these populations were smaller in the recent past. The trend suggests that more people would like higher pay (relative to the rest of the population) or more people would like jobs. However, there could be some other development where more recent culture has made it attractive for people to seek lower incomes or to avoid work obligations.
Public policy debates address the negative labels instead of the positive ones. For the employment example, the policy focuses on the negative group of those who do not have jobs. This negative group includes under-employed, unemployed between jobs, long-term unemployed, discouraged job-seekers, and people who have no interest in working at all (for any number of reasons).
We are motivated to pursue job-creating policies in order to benefit the jobless. It is not clear that this is a legitimate policy goal, because all we know about the jobless is that they are not in the group that has jobs.
For an illustration, a frequently proposed job-creation policy involves funding infrastructure projects that will create new (though temporary) demand for labor. However, these jobs will require trade skills that may be very rare in the ranks of the not-employed, and very few in this group are eager to pursue these skills in time to take advantage of the new opportunity.
Another major issue with the non-workers is that they reside in locations that are too remote from suitable jobs. They remain out of the workforce because the jobs are not going to come to them, and they are not going to go to the jobs. This is especially true in rural areas with low population density. While some policy might create work projects locally, such projects are less likely to provide lasting economic benefits than building or repairing fixed-location highways, bridges, tunnels, or pipelines. Unless the jobs do show up locally, these people are not going out of their way for jobs no matter how plentiful the jobs become.
The negative label in data (poorly compensated, or not employed) represents a catchall label for all other possibilities. The negative label is “none of the above”. All we can say about the label is that it is not one of the positive labels. The individuals in the negative label category may prefer a more positive label for their status, but none of the available positive labels fit. The fallacy is the assumption that the individuals in the negative category need to move into one of the known positive labels: low-income people want higher incomes, the not-employed want jobs. The reality may be that they would prefer that their current conditions be recognized accurately with new positive labels that do fit.
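The catchall behavior can be sketched as a labeling function that falls through to "none of the above" whenever no positive rule matches. This is only an illustration in Python under my own assumptions: the rule names, the `hours_paid` and `hours_caregiving` fields, and the "caregiver" label are hypothetical stand-ins, not categories from any real data set.

```python
def label(record, positive_rules):
    """Return the first positive label whose rule matches, else the catchall."""
    for name, rule in positive_rules.items():
        if rule(record):
            return name
    return "none of the above"

# Hypothetical positive rules and records, invented for illustration.
rules = {"employed": lambda r: r["hours_paid"] > 0}
records = [
    {"hours_paid": 40, "hours_caregiving": 0},
    {"hours_paid": 0,  "hours_caregiving": 50},
    {"hours_paid": 0,  "hours_caregiving": 0},
]

before = [label(r, rules) for r in records]

# Adding a positive label that actually fits moves a record out of the
# catchall without changing anything about the person it describes.
extended = dict(rules, caregiver=lambda r: r["hours_caregiving"] > 0)
after = [label(r, extended) for r in records]

print(before)
print(after)
```

Nothing about the second record changes between the two passes. Only the vocabulary of positive labels changes, and with it the apparent size of the problem population.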
I’ll end this discussion here because I need some more time to develop the concept of a logical fallacy lurking in the content of data. The grammar-level and rhetorical-level fallacies seem more obvious to me than the logical-level fallacies. I think this is a result of not having thought about it enough. The logical fallacies involve the interpretation of negative labels. Negative labels occur frequently and often become the targets of policy. I need more time to think about my own experiences with negative labels and how they interfere with decision-making.