Indifference to dark data

It has been over two years since the New Horizons probe passed by Pluto, and yet the mission remains at the top of my list of mankind’s achievements.

I am not impressed by the data the probe collected about the Pluto system. There may have been some major discoveries or observations from the mission, but I just don’t care about them. We ended up seeing, up close, yet another large heavenly body that was like the other bodies. It shared with them the common features of plains, dunes, mountains, valleys, craters, and so on. It also shared their uniqueness: a different combination and arrangement of features, suggesting processes that are not easily explained. It was as interesting as meeting a new person: similar to other people and yet obviously different. It is nice to meet new people, but it is not shocking.

For the mission to have really impressed me with its observations, I would need to see something as different as meeting a person who quickly demonstrated some supernatural power. I was hoping for something different, such as evidence of it being a gas midget or of prior habitation by some type of life. Because it didn’t surprise me at that level, it joined the many other worlds we have imaged up close as being interesting only to those who are curious about explaining topographic features. In that sense, the mission was a failure because it lessens the incentive to explore these bodies further. I’m sure there will be more missions, but they will not benefit from my political support, because I see no value in further discoveries that closer looks reveal things resembling what we see on Earth, and most of the time these things are at best tourist attractions.

Maybe someday we’ll come up with some near-light-speed travel between planets, making extraterrestrial tourism possible. We’ll have lots of new vacation tour opportunities to add to the list of tours I have no interest in taking.

Despite that, I have the highest respect and admiration for the success of delivering a tiny probe past such a small object at such a great distance on the predicted schedule. To make the attempt even possible, the engineers and scientists had to find a way to do it within a tight budget. The probe needed adequate instrumentation to collect images and data and send them back to Earth. This equipment needed to survive a decade of disuse in the harshness of space.

More impressive is the navigation needed to reach the destination at the right time with as little fuel as possible. This relied on high confidence in planning a trajectory that required little fuel while still reaching its target. That, in turn, required trust in the calculations that predicted the path of travel. The accomplishment of the mission is that it demonstrated that this trust was well placed. Science, or at least the science behind engineering, is something we can trust for such precise long-term planning. Certainly, this had been sufficiently proven by much earlier accomplishments, but the leap in scale of this achievement sets a new record for how well we can trust our science.

The importance of this mission is that it provides the capstone achievement of navigating through the solar system.

To me, the New Horizons mission may prove to be the “jump the shark” episode for planetary exploration with robotic probes. There will be more missions, but they are losing public support. The primary benefit of these missions is photographic images suitable for framing or display. Any science exhibited in the images is merely a theme that makes the image visually interesting.

There are many more opportunities for novel closer-up images of objects in the solar system.   Meanwhile, this source of image making faces stiff competition from the cheaper and more prolific source of computer generated images and digital artists.  From now on, it seems that the primary product of these missions will only be the capture of new images.   We can get more interesting images with more frequent novelty from our computers.

So far, there do not appear to be any opportunities for human advancement with these missions. Except possibly for the Moon, there are no two-week vacation tour opportunities. There will be no new extraterrestrial economic enterprises of mining, manufacturing, food or energy production, or colonization to support large-scale migrations. Even if we learn new things, such as the history of how the solar system formed, these will have no relevance to the progress of the human condition.


In contrast to the Pluto mission, the recently completed Cassini mission to Saturn was much more productive because it allowed for repeated observations, each following in-depth study of the previous ones. However, even here, the mission ended with more questions that need more data to address.

Cassini’s early mission with the Huygens lander was similar to the Pluto mission, giving us just a one-time view of Titan beneath the haze. That view gave us the details to ask questions. We had expected hydrocarbon oceans and a cycle similar to Earth’s water cycle, but with volatile hydrocarbons in place of water. Instead, we saw a drier surface with features that need explanation.

We will never get the explanation.   I doubt man will ever return to Titan even with a robotic lander.

Despite the long mission of multiple orbits and multiple flybys of the moons, the Cassini probe itself produced a scaled-up version of the same result as the Huygens probe. That result is observations that raise new questions, but we may never have another opportunity to gather new observations to answer them.

In contrast, I suppose, we have sent multiple missions to Mars, Jupiter, Venus, and even Mercury. We have a lot of surface information from Mars, and more missions are in the planning stages. We have sent increasingly sophisticated orbiters to Jupiter.

In both cases, the return missions illustrate the need to return.  The earlier missions have uncovered new data that allows us to ask new questions.  Those questions motivate us to design return missions either to answer those questions, or to focus our goals given the constraints posed by those questions.

My point, from a data perspective, is that the need to go back and retrieve new data is inevitable. When we propose large science missions, we often sell the project on the one-time achievement of answering some specific question, the answer to which would be as satisfying as Newton’s discovery of gravity. What bothers me is that we accept such a one-shot mission as worth our investment. At one point we thought that the New Horizons mission was worthwhile. I was among those who supported this mission, even in its scaled-down version with a brief flyby.

At the time, I had only begun working with high-volume and high-variety data. My thinking has since changed. I now see any such space mission as a big data query. The mission will return a limited result set from an abundance of data. That result set may answer my question, but more often than not, it exposes more questions, many of which appear to be more important than the first one. Often the follow-up questions are along the lines of something not being right: either the data is flawed, or the understanding is wrong.

One-time science missions are inherently flawed in that they neglect the necessity of going back for more data: similar data from different views, or completely different varieties of data from the same view. Because these missions will never give us that opportunity, the data they retrieve has only the value of decorative imagery.

The missions do provide us with new data, and that data can answer some questions. My observation is that those answers are hollow unless we can go back and test them with new observations or with new types of data. Does the surface of Pluto appear the same today as it did two years ago? What would more detailed observations of certain features tell us? These questions are predictable at the start of the mission, and at the very least they lessen the value of the mission in the first place.

I wonder what my job would be like if, for every question I get, I were allowed only one query. Assuming I had lots of time and attention to get the query just right, I would write the perfect query for the question. However, once I ran that one query, I would be stuck with the results forever. I would have to present the results to the stakeholders.

Inevitably, they would ask questions I hadn’t anticipated or hoped they wouldn’t ask. For example, they might raise doubts that the query was written correctly, or suggest that some data was missed. In this scenario, I would have no opportunity to query for additional information. My only choice would be to defend those first results. I could dissect the query for flaws. I could check whether I had missed anything in planning the query. However, I would not have the opportunity to run a new query.

I conclude that I would not want that job.

I started my career with a fascination with analysis and a desire to be a theoretician. I admired the people who could make predictions from an isolated office surrounded by nothing but books and paper. This was in the early 1980s, when access to data was severely constrained: only a few people could access real data, and that data was very small. I still respect the skills of analyzing existing data and the processes that could corrupt that data. These remain important skills.

In the 1980s, the skills of theoretical and mathematical analysis were essential because observational data was rare. I also recall a sense that observational data was unnecessary in many cases. An example from physics courses was that, given the initial position and velocity of a projectile in a vacuum, we can confidently calculate its position and velocity at any future time, so there is no value in measuring either during the flight. This confidence carried over to other projects, such as civil engineering projects. We could calculate in advance the behavior of a bridge based on our knowledge of the materials and of the structure. Properly designed and approved, the structure would behave within its goals for supporting various loads, withstanding winds or earthquakes, and properly adapting to temperature. After the bridge was built, we could check the daily weather report and be confident about how much the elements had expanded due to temperature or how much the structure was swaying due to the wind.
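The textbook confidence described above rests on closed-form predictions. As a sketch, assuming a projectile under uniform gravitational acceleration $\mathbf{g}$ with no air resistance, and a bridge member of original length $L_0$ with a linear thermal expansion coefficient $\alpha$ (these symbols are introduced here for illustration), the predictions take the form:

$$\mathbf{v}(t) = \mathbf{v}_0 + \mathbf{g}\,t, \qquad \mathbf{r}(t) = \mathbf{r}_0 + \mathbf{v}_0\,t + \tfrac{1}{2}\,\mathbf{g}\,t^2$$

$$\Delta L = \alpha\, L_0\, \Delta T$$

Once $\mathbf{r}_0$, $\mathbf{v}_0$, $\alpha$, and $L_0$ are known, every later value follows from a clock or a temperature reading, which is why measuring during the flight or on the finished bridge once seemed unnecessary.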

Since the 1980s, there has been a change, brought about partly by the affordability of data collection and analysis, but also by decreased confidence in our initial calculations. The latter loss of confidence was partly the result of economizing on cost by pushing designs to narrower tolerances for error. Together, these have changed our approach toward adding instrumentation to our creations, instrumentation that implicitly admits that our initial calculations may not be fully trusted.

For the bridge example, we still do the hard work of engineering the structure to meet all the requirements, but we also build sensors into structures to report actual performance long after the original design was completed. For bridges, we add sensors to measure not just the temperature of the materials, but their actual expansion or contraction. We measure the actual amplitude and frequency of sway in addition to the wind speed and direction. These continuous data give us new opportunities to get better performance from the bridges, and also give us new insights into how such structures actually behave in reality.
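The shift toward instrumentation can be pictured as checking each sensor reading against the design-time prediction. Here is a minimal sketch, assuming the linear expansion formula ΔL = α·L0·ΔT, a coefficient of about 12e-6 per °C (typical of structural steel), and hypothetical names (`Reading`, `predicted_expansion`, `flag_deviations`) that do not come from any real monitoring system:

```python
from dataclasses import dataclass

# Assumed coefficient of linear thermal expansion, roughly that of structural steel (per °C).
ALPHA_STEEL = 12e-6


@dataclass
class Reading:
    """One hypothetical sensor sample from a bridge member."""
    delta_temp_c: float          # temperature change from the design reference, in °C
    measured_expansion_m: float  # expansion reported by a displacement sensor, in metres


def predicted_expansion(length_m: float, delta_temp_c: float, alpha: float = ALPHA_STEEL) -> float:
    """Design-time prediction: delta_L = alpha * L0 * delta_T."""
    return alpha * length_m * delta_temp_c


def flag_deviations(length_m: float, readings: list[Reading], tolerance_m: float) -> list[int]:
    """Return indices of readings whose measurement departs from the prediction by more than tolerance_m."""
    flagged = []
    for i, r in enumerate(readings):
        residual = r.measured_expansion_m - predicted_expansion(length_m, r.delta_temp_c)
        if abs(residual) > tolerance_m:
            flagged.append(i)
    return flagged


if __name__ == "__main__":
    # Hypothetical 100 m span; the second reading is about 7 mm off the design prediction.
    samples = [Reading(20.0, 0.024), Reading(20.0, 0.031), Reading(-10.0, -0.012)]
    print(flag_deviations(100.0, samples, tolerance_m=0.002))  # [1]
```

The design calculation is still used; the streamed measurements only tell us where reality has drifted from it, which is the implicit admission the text describes.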

Tying back to the space probe missions, the one-shot missions are analogous to the older engineering approaches that relied extensively on theory-based predictions. The analogy I’m trying to get at is that the one-shot effort is expected to stand for a long time. Just as a bridge built a century ago is expected to still be useful today, we expect that future questions about Pluto can be answered by revisiting the data we collected about it in 2015.

Today, the idea of solitary space probe missions seems anachronistic.   Certainly, today we have the advantage of miniaturization and more efficient transport to make more effective missions, but these are a modernization of an outdated idea.

A more current approach to space exploration is establishing a persistent system of continuous monitoring of the subject of interest, and following this with incremental improvements that extend the monitoring, either by expanding the coverage or by adding more variety to the measurements. A modern approach would give us the ability to revisit old questions with new data, as well as to ask new questions without being constrained by old data.

One-of-a-kind, never-to-be-repeated missions are harder to sell in modern times. There is a parallel with modern civil engineering projects that push designs beyond what we can fully trust at the time of design. Our science is also pushed beyond our complete confidence. All of the recent probes have produced unexpected results, and these results contradict earlier science. We resolve these contradictions using the old approach of revising our theories to explain what we found. This approach is much less satisfying in the modern age than it was in the past. I expect the opportunity to get more data, both to better define the parameters of the unexpected results and to test the new theories. I expect incrementally expanding continuous monitoring.

Technology of the 21st century has changed what we expect from data. The modern expectation is streaming data, as opposed to a one-time data collection stored to permit repeated analysis of the same data set. Streaming data accumulates into ever-larger data stores that capture observations over longer periods of time and also expand to accommodate new sources as they become available. The modern goal is to establish permanent sensors on a topic of interest.

I’ve been describing this in the context of the obviously limited opportunities of space exploration up until now. However, this generalizes to all of science, in the sense that streaming data competes favorably against the standard model of scientific experiments. Many scientific experiments are similarly one-time, limited data collections that close with mission completion in the form of a published paper, or a breakthrough that can be discussed repeatedly for the remainder of the scientist’s career. In the older model, continued discussion of a scientific result is always a rehash of some earlier study. Sometimes the original study is decades old and yet never updated with newer data.

In established science practice, the publication of a first finding removes the incentive to repeat the same study.   To distinguish this practice from the emerging model, I would call the older model a publication-based science.   The final result of publication science is the permanent and often final publication of a particular study.   The merit of the published science is measured by the number of times other studies reference it, especially those studies that develop new ideas from the old findings rather than attempt to reproduce the study with fresh data.   The publication-based science advances science by the necessity of new publications that similarly break new ground after referencing the prior study.

The emerging competitor to publication science is the science of streaming data. The new science continues the data collection on topics already studied. This data accumulates over time, so that future inquiries about the earlier findings will reference more recent measurements in addition to the earlier published findings. The newer measurements will either increase confidence in the earlier findings or challenge them with observations of inconsistencies.
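One way to picture newer measurements increasing confidence in an earlier finding is a running estimate that keeps updating as observations stream in, rather than being frozen at publication. Here is a minimal sketch using Welford’s online algorithm for the mean and variance; the class name `RunningEstimate` and the sample values are hypothetical, chosen only for illustration:

```python
import math


class RunningEstimate:
    """Online mean and variance (Welford's algorithm), updated one observation at a time."""

    def __init__(self) -> None:
        self.n = 0
        self.mean = 0.0
        self._m2 = 0.0  # running sum of squared deviations from the current mean

    def update(self, x: float) -> None:
        """Fold one new observation into the estimate without storing past data."""
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self._m2 += delta * (x - self.mean)

    def variance(self) -> float:
        """Sample variance (n - 1 denominator); NaN until there are two observations."""
        return self._m2 / (self.n - 1) if self.n > 1 else float("nan")

    def standard_error(self) -> float:
        """Standard error of the mean; it shrinks as consistent observations accumulate."""
        return math.sqrt(self.variance() / self.n) if self.n > 1 else float("nan")


if __name__ == "__main__":
    est = RunningEstimate()
    original_study = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # hypothetical published sample
    for x in original_study:
        est.update(x)
    print(est.mean, est.standard_error())
    # Later observations stream in and refine the same estimate.
    for x in original_study:
        est.update(x)
    print(est.mean, est.standard_error())
```

With these made-up numbers, the mean stays at about 5 while the standard error falls from about 0.76 to about 0.52. Had the later observations disagreed, the same update would instead have moved the mean and exposed the inconsistency.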

The old publication approach to science is increasingly unappealing after observing the benefits of being able to incorporate more recent observations of the same phenomena either to compare with older results, or to add to the older results for improved statistics or other mathematical modeling.

When I have the choice of talking about old results or of consulting more data collected since the earlier study, I prefer the latter.   Consequently, when presented with a choice of investment into future studies where one option is a one-time experiment like another planetary flyby mission, and the other is an establishment of a continuous data collection of a particular area, I prefer the latter.   When talking about any topic, I prefer to include more recent observations to the discussion rather than to argue over who has the better interpretation of historic measurements from instruments and scientists lost to time.

Most of the above discussions are with pure science where pure is a euphemism for science with no practical implications in the foreseeable future.   It is interesting to learn how Pluto may have been created, but there is very little chance that this understanding will affect our lives.   However, the discussion also applies to the practical sciences, and in particular the social sciences that inform policy making.

Social sciences include topics such as psychology, sociology, anthropology, and economics. These remain publication-based sciences. We are asked to respect works published in peer-reviewed journals, particularly those frequently cited by others. Because these are published studies of one-time experiments, our discourse consists of fresh discussions about ancient data. Subsequent studies collect new data, but that data supports newer findings that advance the science in a way that merits publication. We do not collect fresh data to augment the ancient data for the sake of the original thesis. There is no market for publishing old conclusions with newer data.

From a policy perspective, there is a market for newer data that addresses an established finding.   A recent example is the health care debate in the US that resulted in the Affordable Care Act in 2010.   Many studies concerning impacts on budget and on access to health care supported the legislation.   Despite passage of the bill, we continue to argue about the interpretation of the original data, or we discuss newer studies of new experimental designs to suggest alternative approaches.

We seem destined to repeat the same mistake by basing policy on older studies. A better approach is first to select the studies we determine to be most reliable for setting policy. The second step is to instrument the relevant parts of the economy and the health care system to stream the relevant data, with up-to-date reporting that includes the latest results. The policy-making step follows the instrumentation step, with the policy explicitly tied to that instrumentation. The instrumentation is a concrete embodiment of the understanding that led to the policy, enabling us to see the current implications of that understanding. In particular, the monitoring gives us a way to assess the legitimacy of the understanding that led to the policy.

The publication approach to science fails to make good policy because it forces us to presume the legitimacy of the science consulted to make the old policy. The publication process, which starts with skepticism of pre-published results, ends with strong skepticism against any criticism of published results. The publication approach to science imposes a stagnation on policy that is counterproductive for governing a complex population during uncertain times.

We should learn from recent experience with large data technologies that decision making can benefit from streaming data in addition to (and often instead of) the publication science of one-time experiments. It is clear now that policy making needs access to a continuous stream of fresh data about old ideas, especially when that data accumulates over time. With the technologies to do this work available, it is unacceptable to base policies on the failed approaches of the past that rely on published studies.


