Many of my earlier posts extrapolate from my experiences as a self-described data scientist. Another way to describe these posts is as viewing the world through my internal bias of seeing everything as just data.
In some posts I suggest that the entire universe appears to be nothing more than a one-dimensional line of data.
Other posts suggest that the three spatial dimensions are illusions encoded in this historical data. Here I'm showing another bias, from my work with multidimensional databases where the norm is working with dozens (or more) of dimensions. Note that dimensions in data are often expressed as carefully designed categories instead of numbers. It just happens that some of those categories include distances along the three spatial dimensions. To me, a dimension is something encoded in data.
The universe is almost entirely data with its encoded dimensions. The non-data part of the universe is the mere instant of the present moment. The data part of the universe has no influence on the present moment. The present moment is a free actor whose actions have consequences that join the historical record, the data part of the universe.
An earlier post described a three-part division of the universe: the present moment where actions occur, the historical record of the knowable past, and the totality of missed opportunities, the entirely possible events that were alternatives to what actually happened. I described this division in moralistic terms, describing the present-moment actor (such as ourselves) as struggling to make the best choice of which options to place into the historical record and which into the missed-opportunity bucket.
In another series of posts, I proposed redefining the sciences into two categories. Instead of hard sciences and soft (having less predictive power) sciences, I proposed present-tense science and past-tense science. The present-tense science is dealing with the constraints of laws, materials, and time to run experiments, operate machinery, or collect observations. The past-tense science is patiently interpreting the historical record.
Today, I suggest that present-tense science can be further subdivided into experimental science, observational science, and practical science.
Practical science is the science that runs human economies: an everything-else category to include engineering or operations used to provide economic benefits to humans. This is a topic for a different kind of post.
Experimental science refers to the scientific method and hypothesis testing. It is the discipline of science that collects controlled data. Controlled data is very carefully managed and selected for its relevance to a particular hypothesis or objective.
Observational science refers to the collection of uncontrolled data. It is the natural science of observing nature, ideally without any interference by the scientist. An example is the naturalist recording animals in their natural environment: the naturalist strives to stay as distant as possible from the subject so as not to have any influence on the observations.
The two present-tense sciences differ in terms of the data they collect. One collects carefully controlled data. The other collects uncontrolled data.
Allow me now to focus on the historical sciences, my name for the past-tense sciences. The historical sciences confront historical data. Their motivation is to read the data collected by the present-tense scientists. The historical sciences create hypotheses that allow a meaningful arrangement of the observed data. A byproduct of the historical sciences is to motivate the present-tense sciences to focus on these new or revised hypotheses.
The historical sciences inevitably encounter missing data: data that the hypothesis predicts but that is not present in the record. Sometimes the hypothesis is so widely accepted that the predicted data is treated as part of the available data. I call this predicted data “dark data”. It is model-generated data that derives its credibility from belief in the model instead of from actual observations by the present-tense sciences. Dark data is the data generated by historical science.
As I mentioned previously, I consider my occupation to be data science, which I locate in the historical sciences. I have great appreciation and respect for the difficulties of the historical sciences. Hypotheses and dark data are what make historical science so challenging. As beautiful as these can sometimes be, we continuously challenge their validity.
One quality of historical-science is its patience. Nothing is ever conclusively decided. Everything is subject to new challenges.
I use the term historical science to invoke the analogy of human history. Even today we are still arguing about what happened during reasonably well-documented events from centuries ago, events that were repeatedly investigated by multiple generations of very capable historians. No matter how highly we regard the predecessor historians, we are obliged to replace their interpretations or hypotheses with ones that better explain the data: either the data available to the predecessor or newly discovered data not available to him.
No matter how much data we have, there are questions that don’t have directly relevant data. We make up the missing data to fit a hypothesis. Often this is implicit. Accepting a hypothesis means accepting all of its predictions, both those that match observations and those that have no corresponding observations.
The historical-science challenges both the hypothesis and the model-generated data.
With all of this background on my understanding of the sciences, I ask the question: what does it mean to be anti-science? Another way to describe being anti-science is being a proponent of a pseudoscience. In either case, we are asked to keep it out of our schools.
From a data-science point of view, I offer a suggestion that the focus of our suspicions should instead be placed on the hypotheses and dark data from historical sciences. But instead of a ban on these, I recommend that they be separated from the observations of present-tense sciences.
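As a rough illustration of that separation, here is a minimal Python sketch. Everything in it is an assumption made for the example: the `Record` type, the `"observed"` and `"model"` labels, the toy observations, and the straight-line hypothesis. The only point is that each value carries a label saying whether it came from a present-tense observation or from a hypothesis, so the dark data can always be pulled apart from the observations and challenged on its own.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class Record:
    time: int
    value: float
    source: str  # "observed" or "model" (labels assumed for this sketch)


def fill_with_model(
    observed: Dict[int, float],
    times: List[int],
    model: Callable[[int], float],
) -> List[Record]:
    """Build one record per time, labeling where each value came from."""
    records = []
    for t in times:
        if t in observed:
            # A present-tense observation: keep it as recorded.
            records.append(Record(t, observed[t], "observed"))
        else:
            # No observation exists: the hypothesis supplies dark data.
            records.append(Record(t, model(t), "model"))
    return records


def split_by_source(records: List[Record]) -> Tuple[List[Record], List[Record]]:
    """Separate the observations from the model-generated (dark) data."""
    observed = [r for r in records if r.source == "observed"]
    dark = [r for r in records if r.source == "model"]
    return observed, dark


# Hypothetical observations with a gap at time 2.
observations = {0: 1.0, 1: 3.0, 3: 7.0}


def hypothesis(t: int) -> float:
    # Toy hypothesis, assumed for illustration: value = 2 * time + 1.
    return 2.0 * t + 1.0


records = fill_with_model(observations, [0, 1, 2, 3], hypothesis)
observed, dark = split_by_source(records)
```

If the hypothesis is later replaced, only the records labeled `"model"` need to be regenerated; the observations stay untouched.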
Schools should focus study on the very careful collection of observations from controlled experiments or from uncontrolled natural observations. This is the study of the methods of the present-tense sciences.
The schools should also focus study on the very diligent scrutiny of hypotheses and model-generated data. This is the study that constantly and perpetually criticizes the hypotheses and model-generated data.
These skills are about deconstructing the arguments and logic of the hypotheses, constructing alternative hypotheses, and then presenting or defending one side or the other. These are the skills of the historical sciences. These are the skills of classical rhetoric.
From my perspective, the present-day demand to keep anti-science or pseudoscience out of schools is identical to our already near-complete ejection of classical rhetoric from schools. We are systematically rejecting the teaching of the skills needed to question, and indeed to challenge, hypotheses and model-generated data.
To say that schools must teach only indisputable science is to say we cannot teach disputing hypotheses and dark data. We cannot teach the practice of historical science.
Teaching only the indisputable is, in my mind, the definition of promoting anti-science in schools.