What makes data possible

In an earlier article I was thinking about the meaning of the nothingness that separates the particles inside an atom or that separates celestial bodies.  It occurs to me now that they may not be the same kind of nothingness.  Thinking about nothingness turns it into something, and that makes possible the idea that there can be different kinds.

Nothingness is what we use to fill the space that we calculate must exist because of geometry.  Something with a radius sweeps out a volume and that volume is not always full of material stuff.  Nothing is what we call the non-material stuff that fills the rest of our imaginary sphere.

As noted in another article, in my prior employment I did something that is sometimes labeled as data science.  The computing profession is very eager to call its art a science in order to make it seem like what it does is repeatable or something.  But unlike software science or computer science, I think there may be a real science in the study of data.

Data has no material cause.  Materials may store or transmit the data, but the data itself is not material.  Data is not explained by particle physics or mechanical physics or celestial physics.  It is just taken for granted.  We understand science through data that just happens to be around.

If the universe were to end with a big crunch (an unlikely scenario given current understanding) there would be no material left, but we don’t doubt that data will still exist.  Certainly the data point that the universe once existed will survive even if it has no material place to rest or be transmitted.

In my job, I learned to respect data as an entity in its own right.  Before you can accept the information embedded in a particular piece of data, you need to know about the data creature.

Data is about something.  This something is what interests us most.  This something is grounded in facts or theories that we hold.  The data is an approximate measure of that something at some particular time.  But before we can use the data to inform us about that something, we first have to trust the data.

Data is like a witness in a courtroom trial.  It is telling us something relevant to the case.  But the witness himself must be thoroughly vetted and cross-examined.  Is the witness a reliable observer?  When was the observation relative to the event?  What was the state of the witness at that time?  When did the witness tell of his observations relative to when the other witnesses told of theirs (could they have influenced each other)?  Among several witnesses, does this one have something independent to add or is it just confirming other witnesses?  If there is a difference between two testimonies, is it due to useful differences in vantage points or is it due to faulty observations by one or the other?  How do we know if there is something that was not witnessed?  How do we select or combine multiple roughly identical witness accounts?

All of the above are questions I attempt to ask of data before asking what the data is telling me.  I am not a lawyer.  I may well have mischaracterized the process from a legal perspective.  On the other hand, I think data science can learn a lot from the legal profession in terms of how to handle witness testimony.  What we have to work with are data that are just as unreliable witnesses of actual events as humans are.  The reasons for the unreliability are completely different, but virtually every fallibility of a human can be observed in data.

We need to test data constantly in order to continue to use its information.   This is just like two court cases that happen to involve the same person as a witness.  The process is started all over again in each case.

Data science must address billions of data items at a time.  We automate approaches to test each data item to see if it meets rough tests for what is expected or does not meet tests for things we suspect.  
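As a rough illustration of what such automated screening can look like, here is a minimal Python sketch.  Everything in it is hypothetical: the `Reading` record, the field names, and the thresholds are assumptions made up for this example, not anything from my actual work.  It keeps two sets of rules, one for what we expect every item to satisfy and one for patterns we have learned to be suspicious of, and it reports why an item was flagged.

```python
from dataclasses import dataclass


@dataclass
class Reading:
    """A hypothetical data item: who reported it, what it says, how old it is."""
    source: str
    value: float
    age_seconds: float


# Rough tests for what is expected: every item should pass all of these.
# The ranges and limits are illustrative assumptions.
EXPECTED = {
    "value_in_range": lambda r: 0.0 <= r.value <= 100.0,
    "fresh": lambda r: r.age_seconds <= 3600,
}

# Tests for things we suspect: no item should match any of these.
SUSPECTED = {
    "placeholder_value": lambda r: r.value in (0.0, 99.9),
}


def screen(reading: Reading) -> list[str]:
    """Return the reasons an item is suspect; an empty list means it passed."""
    reasons = [f"failed:{name}" for name, test in EXPECTED.items() if not test(reading)]
    reasons += [f"matched:{name}" for name, test in SUSPECTED.items() if test(reading)]
    return reasons


if __name__ == "__main__":
    for item in (Reading("a", 42.0, 10), Reading("b", 150.0, 10), Reading("c", 0.0, 7200)):
        print(item.source, screen(item))
```

Keeping the two rule sets separate mirrors the distinction in the text: an item can fail an expectation, match a suspicion, or both, and the reasons travel with it for whoever studies it later.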

If data becomes suspect, we study it more closely.  If the data is bad data, then we attempt to work back to try to prevent such bad results from recurring.  If suspect data cannot be discounted as bad, then we attempt to understand what this data is telling us.  Once we figure out how to work with this new information, we adjust our expectations to not flag future instances as suspect.

For the non-suspect data, we periodically review it as well to be sure that the individual data items are realistic.   Data is about real world things and the real world is always changing.  Is the data keeping up with what we know is changing in the world?  Is the data still showing us a lifelike variability of the chaos and randomness in the real world?
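One simple way to ask whether data still shows lifelike variability is to compare the spread of a recent batch against the spread of a batch we already trust.  The sketch below is only one possible approach, and the function names and the 0.5 to 2.0 tolerance band are assumptions chosen for illustration.  Data that has gone suspiciously flat, or suspiciously wild, falls outside the band.

```python
import statistics


def variability_ratio(reference: list[float], recent: list[float]) -> float:
    """Ratio of the recent batch's standard deviation to the reference batch's."""
    ref_sd = statistics.stdev(reference)
    if ref_sd == 0:
        raise ValueError("reference batch has no variability to compare against")
    return statistics.stdev(recent) / ref_sd


def looks_lifelike(reference: list[float], recent: list[float],
                   low: float = 0.5, high: float = 2.0) -> bool:
    """True if the recent spread stays within an assumed tolerance band."""
    return low <= variability_ratio(reference, recent) <= high


if __name__ == "__main__":
    trusted = [10, 12, 9, 11, 13, 8]
    print(looks_lifelike(trusted, [9, 11, 12, 10, 8, 13]))    # similar spread
    print(looks_lifelike(trusted, [10, 10, 10, 10, 10, 10]))  # no spread at all
```

A check like this says nothing about whether individual values are right.  It only notices when a witness that used to describe a messy world starts giving answers that are too tidy, or too erratic, to be believed without a closer look.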

All of this is an attempt to explain data science as a discipline that respects data as a witness to the events we want to understand.  We subject data to a lot of scrutiny.  As we do so we discover new things about the data that merit future scrutiny.  Like human witnesses, we are always finding new ways that data can deceive us.

Thinking of data this way makes data into something tangible and yet distinct from the material world.  Data is not explained by particle physics.  Data is what informs us about particle physics.  But what explains data itself, data as the witness to the phenomena that we study?  What makes data possible?

In cosmological terms, the big bang event did not only give rise to particles and energy that later condensed to stars and planets.  It also created the possibility to witness these things: it created the possibility for data to exist.   If my very complex biological body is composed ultimately of sub-atomic particles, what ultimately explains data?

This returns to the question of explaining nothingness.  Nothingness of the vacuum of space is a necessary consequence of observations of distances not completely filled with matter.  Without the data of the dimensions or data to justify geometry, there would be no nothingness.  Nothingness is a consequence of data.

The big bang is often described as analogous to an explosion of light from a primordial darkness, alluding to the “let there be light” phrase of scripture.  An alternative is a zero-dimensional everything that was interrupted by the introduction of nothingness, or the introduction of data … “let there be data”.

What makes it possible for data to exist in the universe?  Is it a different family of forces and particles?  Is it a kind of dimension separate from space and time?  Or is it something else entirely, escaping our attention because we never asked the question?