There is a new type of skepticism concerning data instead of truth. In many earlier posts, I expressed skepticism of different types of data, but I used the analogy of light instead. In other words, my depiction of data as falling somewhere between bright data and dark data was a scale of increasing skepticism. I consider bright data to be well-controlled and accurate present-tense observations. Most observational data is not as trustworthy, so I describe most of it as having some degree of dimness. At the far end of skepticism is dark data, a term I coined for data generated from models without any present-tense observations.
Model-generated data, or dark data as I call it, represents historic data. I acknowledge that models can be very useful and may be very accurate in representing some aspect of reality. Yet I separate model-generated data from present-tense observational data. To attain my goal of solving new problems, I want to accept the possibility that models may in some way be inapplicable to present conditions. Present-tense data may support some new idea that is inconsistent with established models, but I would miss that idea if I imposed the models on the conclusions I draw from observations. Models can obscure observations, for example, when they are used to fit the data or to filter out outliers.
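To make the outlier-filtering case concrete, here is a minimal Python sketch of how an established model can silently discard a surprising observation. It is an illustration under stated assumptions, not a method from these posts: the "established model" (y = 2x), the fixed tolerance, and all of the numbers are hypothetical and invented for the example.

```python
def established_model(x):
    # Hypothetical "established" model, assumed for illustration: y = 2x
    return 2.0 * x


def filter_outliers(observations, model, tolerance):
    """Split (x, y) observations into those the model explains and those it rejects.

    An observation is rejected when its residual (observed minus predicted)
    exceeds the tolerance in absolute value. The fixed tolerance is an
    assumption; real pipelines often use a multiple of the residual spread.
    """
    kept, rejected = [], []
    for x, y in observations:
        residual = y - model(x)
        if abs(residual) <= tolerance:
            kept.append((x, y))
        else:
            rejected.append((x, y))
    return kept, rejected


# Invented observations: most follow the model, but the one at x=5 does not
observations = [(1, 2.1), (2, 3.9), (3, 6.2), (4, 7.8), (5, 30.0), (6, 12.1)]
kept, rejected = filter_outliers(observations, established_model, tolerance=1.0)
print("kept:", kept)
print("rejected:", rejected)
```

The point at x=5 lands in `rejected`, so any new idea it might support is removed by the model before a person ever examines the observation.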
It occurs to me that this suspicion of models is similar to the classical philosophical skepticism about knowledge. I am allowing for the possibility that the models may be wrong. Models are theories or other representations of scientific knowledge based on prior experiments, peer review, and repeated testing. Suspecting theories seems similar to doubting knowledge itself.
I don’t agree with this comparison. I am generally confident that knowledge is possible and that man is capable of knowing the natural truth about many topics. For example, in an earlier post I discussed the New Horizons mission to Pluto. I had faith that this mission would succeed because I believed that everyone involved was confident in the work. The people who started designing and building the probe 20 years before the flyby were confident they could make everything work. The people at launch time were confident that the prior 10 years of work was ready and that the launch would succeed. The people at the time of the flyby were confident that all of the work of the prior 20 years was both competent and faithful to natural truths. This is not philosophical skepticism in the classical sense.
The modern skepticism is what motivated the mission in the first place. We know a lot about the science of planetary bodies and orbits, and the mission produced observations that redundantly confirmed many predictions. However, the mission also returned observations that were not predicted, and some of those surprises contradicted predictions. For example, the observations revealed a body that was more geologically active, and had more unexpected surface features, than anyone expected for such a small and remote world.
The modern skepticism is not a skepticism about the possibility of knowledge or of man’s ability to acquire that knowledge.
The modern skepticism is about data instead of knowledge. My skepticism is that too much data is missing. This is a refinement of the old skepticism, one that comes from the age of big data. As we acquire more data and more diverse data, we become more acutely aware of the data we are missing.
Every year or two, a new generation of technology delivers more storage capacity and faster retrieval. The rapid growth of data capabilities forces us to recognize, within our own working lives, what we were missing when we had to accept less data. When we add more data or more types of data, we learn that we can do more and reach new conclusions.
The modern skepticism is a practical consequence of this experience with rapid changes in data technology. We are confronted with how much data is out there that we do not yet have access to. We await the next improvements in data technology, anticipating that we’ll benefit from acquiring, storing, and analyzing more data. We expect that more data will produce new discoveries, either building on existing knowledge or revealing new insights about reality.
Skepticism of data is the recognition that, despite all the data we can currently access and analyze, there is vastly more data out there that we don’t have.
An older scientific saying is that each new discovery raises many more questions. In the data context, each new achievement from data teaches us that there is vastly more data that we want to obtain and control. When we had gigabytes, we desired terabytes, but having terabytes only made us hope for petabytes, then exabytes, and beyond.
Even with sensors, we were once satisfied with a single sensor for a particular measurement, but now we want multiple sensors of the same type. Even if their fields of view overlap, we know we can exploit the redundancy.
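One common way to exploit that redundancy is to combine overlapping readings of the same quantity by inverse-variance weighting. The minimal Python sketch below assumes each sensor's noise is independent and unbiased and that its variance is known; the helper name `fuse_readings` and the readings themselves are hypothetical, chosen only for illustration.

```python
def fuse_readings(readings):
    """Combine redundant readings of one quantity by inverse-variance weighting.

    `readings` is a list of (value, variance) pairs, one per sensor.
    Assumes independent, unbiased noise with known variances.
    Returns (fused_value, fused_variance).
    """
    weights = [1.0 / variance for _, variance in readings]
    total_weight = sum(weights)
    fused_value = sum(w * value for w, (value, _) in zip(weights, readings)) / total_weight
    # The fused variance is never larger than the smallest single-sensor variance
    fused_variance = 1.0 / total_weight
    return fused_value, fused_variance


# Hypothetical readings from three overlapping sensors, each with variance 4.0
value, variance = fuse_readings([(10.0, 4.0), (12.0, 4.0), (11.0, 4.0)])
print(value, variance)
```

With three equally noisy sensors, the fused variance is one third of any single sensor's variance. That reduction is the practical payoff of wanting more sensors than strictly necessary.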
As data capabilities grow so fast, the fundamental goal of older scientific approaches, to reduce nature to descriptive or mathematical models that we can pass on to the next generations through education, is eroding. Durable, truthful knowledge can always be rediscovered with newer data. Freeing the process from established models will allow us to fully exploit the data we can currently access. The data may simply confirm what we already know.
For example, we knew there would be some type of body at the place where we expected Pluto to be. The new data confirmed that, but it also showed us a composition that we did not expect. With the new data, we built a new model of the planet and its satellites. That model was similar to established models, but we could have come to the same conclusions without prior expectations from existing models.
Unlike skepticism of knowledge or of our ability to know the truth, the modern skepticism is a skepticism of having enough data.