Dark Data: Evolution’s missing links

In an earlier post, I described as stealthy the problem of missing evidence for gradually transitioning forms. Instead, both the fossil record and observations of current biology show remarkable stability of forms once they are established.

I am not trained in biology, so I have little to contribute beyond uninformed musings.

I was approaching the topic of evolution from two non-biological perspectives.

The first is the perspective of a data scientist, skilled mostly in databases but working with historical data, so I consider myself an amateur historian. One of my themes about data science is to distinguish bright data from dark data. Bright data is data from well-documented and controlled observations. In evolution, the fossil record and naturalist observations are bright data. Dark data is data generated from accepted theoretical models to fill in gaps in the record. In evolution, dark data is the supposition of transitional forms and common ancestors otherwise missing from the record. As an aside, bright data might more accurately be described as dim, because its reputation as reliably documented or controlled usually carries some tarnish, but it is still distinct from non-observed, model-generated dark data.
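One way to keep the bright/dark distinction from getting lost is to tag every record with its provenance at the moment it enters the data set. The following is a minimal sketch in Python under that assumption; the names `Provenance`, `TaxonRecord`, `dark_fraction`, and `bright_only`, and the example lineage, are hypothetical illustrations rather than any real schema or data.

```python
from dataclasses import dataclass
from enum import Enum


class Provenance(Enum):
    """Hypothetical provenance tag following the bright/dark distinction."""
    OBSERVED = "bright"   # documented fossil find or field observation
    INFERRED = "dark"     # generated by a model to fill a gap in the record


@dataclass(frozen=True)
class TaxonRecord:
    """Hypothetical record: a named form plus where the record came from."""
    name: str
    provenance: Provenance


def dark_fraction(records: list[TaxonRecord]) -> float:
    """Return the share of records that are model-generated rather than observed."""
    if not records:
        return 0.0
    dark = sum(1 for r in records if r.provenance is Provenance.INFERRED)
    return dark / len(records)


def bright_only(records: list[TaxonRecord]) -> list[TaxonRecord]:
    """Keep only observed records, e.g. to test a hypothesis against bright data alone."""
    return [r for r in records if r.provenance is Provenance.OBSERVED]


# Hypothetical example: two observed forms and one inferred common ancestor.
lineage = [
    TaxonRecord("form A (fossil)", Provenance.OBSERVED),
    TaxonRecord("form B (living)", Provenance.OBSERVED),
    TaxonRecord("ancestor of A and B", Provenance.INFERRED),
]

print(f"{dark_fraction(lineage):.2f}")            # 0.33
print([r.name for r in bright_only(lineage)])     # ['form A (fossil)', 'form B (living)']
```

Keeping the tag on every record makes it easy to ask how much of a conclusion rests on inferred rather than observed data, and to rerun an analysis on the bright data alone.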

Evolution is an interesting case study because the available information is very representative of Big Data and multi-dimensional data analysis. The data is extensive, covering a huge variety of life forms distributed globally and recorded over hundreds of millions of years. It also includes both bright and dark data.

The second angle is revealed by my choice of the word stealthy. I was invoking a certain cleverness, similar to a secret technology project at a company working out the details of a new product that, once released, will take the world by surprise. When evolution is up to something new, it does its tinkering as secretly as it can.

The observation of that earlier post was that what we see in nature and in the fossil record are the successful species. Their success is evidenced by their being abundant enough to be observed, or to be preserved as fossils that can later be found. What is remarkable is that observable species (the bright data) appear to be very stable over time and also appear to sit at dead-end branches off the main trunk of evolution.

The trunk of the evolutionary tree is so slender as to be invisible. The word tree is misleading in that it connotes a thick trunk with narrow branches. Instead, we see thick branches coming off a filament-thin trunk. A better analogy may be a fungus, with an occasional noticeable mushroom sprouting from a persistent, nearly invisible network of underground filaments. The process of evolution is like those filaments.

I propose that evolution occurs primarily within unsuccessful species. A successful species has no need to change. It will end abruptly with the surprise introduction of a new form that out-competes it.

That surprise comes from a species or population that is struggling at the edge of extinction. Such a population will leave little or no evidence of its existence. Its small size also means that new genetic innovations can spread rapidly through the population even if an innovation offers no advantage or is not even expressed. Over time, the population accumulates enough genetic innovations that a single new one, switching all of the dormant talents on in the right sequence, abruptly introduces a capability radically different from that of its predecessors.
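The claim that small populations let innovations with no advantage spread quickly can be explored with the textbook Wright-Fisher model of neutral genetic drift. This model is my choice for illustration, not something from the original argument. The sketch below assumes a haploid population of fixed size N, no selection, and no further mutation; the function names `neutral_drift` and `fixation_stats` are hypothetical.

```python
import random


def neutral_drift(pop_size: int, rng: random.Random, max_generations: int = 100_000):
    """Follow one new neutral variant under a simple Wright-Fisher model.

    Starts with a single copy in a haploid population of `pop_size`. Each
    generation, every offspring inherits from a parent drawn uniformly at random,
    so the variant has no selective advantage. Returns (outcome, generations),
    where outcome is "fixed", "lost", or "unresolved" if max_generations is hit.
    """
    count = 1
    for generation in range(1, max_generations + 1):
        freq = count / pop_size
        # Binomial resampling: each offspring carries the variant with probability freq.
        count = sum(1 for _ in range(pop_size) if rng.random() < freq)
        if count == 0:
            return "lost", generation
        if count == pop_size:
            return "fixed", generation
    return "unresolved", max_generations


def fixation_stats(pop_size: int, trials: int, seed: int = 0):
    """Estimate how often a neutral variant takes over, and how long that takes."""
    rng = random.Random(seed)
    fix_times = []
    for _ in range(trials):
        outcome, gens = neutral_drift(pop_size, rng)
        if outcome == "fixed":
            fix_times.append(gens)
    rate = len(fix_times) / trials
    mean_time = sum(fix_times) / len(fix_times) if fix_times else float("nan")
    return rate, mean_time


# Compare a small struggling population with a larger one.
results = {n: fixation_stats(n, trials=2000) for n in (10, 100)}
for n, (rate, mean_time) in results.items():
    print(f"N={n}: fixed in {rate:.3f} of trials, mean {mean_time:.0f} generations when fixed")
```

In this model a new neutral variant fixes with probability of about 1/N, so most new variants are simply lost. The ones that do fix, however, take over in a number of generations roughly proportional to N, so in a small population they spread far faster. That is the sense in which small size helps innovations spread, although it does not guarantee that any particular one will.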

I like the analogy of a technology laboratory, where many different innovations must be assembled in new ways to make a disruptive commercial product. First, the laboratory has to assemble the various new components to make the product real. This takes time and, for whatever reason, is not seen coming by the market or by competitors.

The new innovation springs seemingly out of nowhere and then joins the ranks of successful species. Its introduction may displace other species, perhaps to the point of extinction. Its dominance of its niche will allow it to remain stable until it, too, is displaced by some future surprise.

The surprises are always being worked out of sight in some corner where some population is struggling just to survive another generation.

One of the arguments of intelligent design is that most successful biological processes, traits, or behaviors involve a set of several independent parts that offer little or no advantage on their own. I recognize this is a debatable proposition. However, my first thought was that it sounds like a corporate-funded research and development project that may go on for years with no immediate benefit to the company until the product is released.

If there is anything like intelligence in my stealth-evolution model, it is in the idea of something recognizing that a recent genetic innovation may be useful sometime in the future and is worth distributing to the population in a deactivated form. The accumulated innovations may remain dormant until enough of them are present. That sounds intelligent, in a stealthy kind of way. Even more remarkable is that this process would have to occur frequently enough to create the diversity that exists today, and the even greater diversity in the fossil record. It reminds me of competitive innovation in corporate industry.

My perspective is that of data science, and in particular the idea of dark data: model-generated data that fills in gaps. Here the dark data is the largely invisible common ancestor species that supposedly introduced some trait clearly shared among its descendant species.

We satisfy ourselves that the common ancestor that expresses the adaptive trait must have existed.

Most complex organisms appear to share some traits with one species and other traits with a different species, yet those two species have little in common with each other. These two “cousin” species appear to share a more distant common ancestor than the supposed common ancestor that ties them to the first species.

Labeling the common ancestor as dark data, that is, as a hypothesis, frees us to consider alternative hypotheses. Maybe there was one common ancestor for one trait and a different common ancestor for another. Maybe genetic material can jump between species in ways other than linear reproduction. In particular, I imagined a parasite that spends part of its life in one animal and part in another. Perhaps such a parasite could carry genetic information between species.

I started to invent a fictional explanation of humans’ unusual lack of body hair as somehow having been transmitted from pigs. I thought of the various parasites that seem to jump frequently between the two species as possible carriers.

I also thought of pigs because I like them. I had the pleasure of raising a couple of pigs during my childhood, and I found them fascinating for enough reasons to fill a separate blog post. It is easy for me to accept some kinship with pigs.

All of this was just idle, uninformed imagination that I thought would make for a short, entertaining mention before moving on. But today I found an extensively researched and thorough argument for pig-human hybridization as an alternative theory of human evolution by Eugene M. McCarthy, a PhD in genetics. His work is so far advanced that the only thing this post has in common with it is that my middle name is the same as his first.

His site offers a lot of enjoyable reading. I very much recommend it just for the depth of detail he presents. His case overwhelms me, but that doesn’t say much. It is fun to read.

I take special interest in reading his work as a case study of hypothesis discovery using data science. He assembled a great deal of evidence across many different data dimensions to suggest a very compelling hypothesis. I think his work is an example of excellent data science. And I appreciate his conclusion, which I equate with my concept of hypothesis discovery: an unexpected hypothesis discovered by looking at the (mostly bright) data. He emphasizes that it is only a hypothesis and needs testing, but as he says, “Before dismissing such a notion, I would want to be sure on some logical, evidentiary basis that I actually should dismiss it.”

I very much agree, and it is an exciting hypothesis.

His site has a lot of other information. In particular, he addresses the point I was making earlier about the stealthiness of evolution. While my fantasy considered what was going on where no data exists, he more professionally discusses what the existing data suggests in terms of stabilization. Stabilization is the observation that once a new form emerges, it remains remarkably stable until it becomes extinct, a period that may last millions of years. This contrasts with the expectation of gradual evolution through incremental changes selected by nature. It doesn’t explain the abrupt appearance of new forms documented in the fossil record, but it does describe the available data. I add this to my examples of good data science practice.

This is on my list for future reading. What I read today motivated me to link it to what I had previously been thinking about as an idle, uninformed daydreamer.

