Truth as a confounding variable that interferes with interpreting data

In an earlier post, I explored how our Truths can cut off opportunities to engage with a world where the data are not consistent with those truths.  In particular, I argued against moral truths when we observe that there are large communities who do not subscribe to the same morals.  Our acceptance of our moral positions prevents us from tolerating or condoning immoral activities.  This intolerance robs us of an opportunity to influence immoral activities in order to moderate some of the more extreme and repulsive practices.

I attempted to contrast our current democracy, which operates on moral persuasion, with a (near) futuristic dedomenocracy government that operates on observed data alone.  In a dedomenocracy, observations of morality are rare and ancient, usually in the form of very old literature making arguments for what is right.  While these moral teachings are part of the data, their attributes of being few in number and ancient in age must compete with more voluminous and recent observations of what is actually occurring.  Ideally, the decision-making in a dedomenocracy should assign morality data a very small weight, a weight appropriate for so few and so ancient a set of observations.  In a dedomenocracy, morality needs fresher observational updates in order to earn a higher weight in decision making.
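The weighting idea above can be sketched as a toy scoring function.  Everything here is my own invented illustration, not part of any real dedomenocracy system: the decay constant and the observation records are hypothetical.

```python
from dataclasses import dataclass

# Toy sketch: weight an observation by how fresh and how numerous it is.
# The half-life constant and these records are invented for illustration.

@dataclass
class Observation:
    description: str
    age_years: float   # how old the observation is
    count: int         # how many independent observations exist

def weight(obs: Observation, half_life_years: float = 50.0) -> float:
    """Fresher and more numerous observations earn a larger weight."""
    recency = 0.5 ** (obs.age_years / half_life_years)  # decays with age
    return obs.count * recency

ancient_morals = Observation("ancient moral teaching", age_years=2000.0, count=5)
fresh_data = Observation("recent sensor observations", age_years=0.1, count=100000)

# Few, ancient moral observations are overwhelmed by voluminous fresh data.
assert weight(fresh_data) > weight(ancient_morals)
```

Under this kind of scheme, moral teachings would only regain influence if they received fresh observational confirmation, which is exactly the update the post argues is missing.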

Unfortunately, the monotheist god of the Abrahamic tradition is mute to modern ears in order to allow us the free will to come to our own conclusions.  He will not speak to provide new moral data that relates to modern circumstances.  If he does speak, it will not be through an objective source that we can evaluate for authenticity.

Similarly, human moral philosophy progresses at a very slow pace, with fresh consensus emerging only a couple of times per century after much deliberation and debate.  There is no technological sensor that can provide us new observations of moral truths.  We have only the few old Truths we accept.  As I noted above, these truths are not absent from the data, but they are overwhelmed by the fresh real-world observations provided by technology.  The real world presents us with observations that compete with moral truths.

In a dedomenocracy, the only valid decisions would be those that trusted statistical algorithms derive from trusted data stores.  Data provide the only constraint on machine-generated decisions.  The decisions are free from having to conform to moral truths when the data lack fresh observations of those truths being relevant to current circumstances.  In order for a dedomenocracy to come up with results similar to those we get from persuasive democracy, we would have to artificially introduce morality observations into the data.  For example, we can assert that Truths are applicable to the current circumstances.  We have some justification for this approach because our concept of Truth is universal.  However, in a dedomenocracy this observation has to be weighted by the individual making the claim.  In order for it to influence dedomenocracy decisions, the data need to confirm this observation of Truth for each of the inhabitants.  The examples I provided in my last post describe scenarios where there are large populations who do not hold these same truths.  Modern data will not support the conclusion that the truth is universal.  In these scenarios, a dedomenocracy does not behave like a democracy in terms of accepting moral Truths.  The only way to get the dedomenocracy to conform to democracy is to cheat by introducing model-generated data (what I call dark data) that asserts the moral truth is universal.  The dark data is the observation of immoral behavior by those who are acting contrary to the moral Truth.

To get the algorithms to accept the universality of the morality truth, we need to replicate that observation and apply it to each individual.  Actual observations of actions contrary to the moral truth are balanced by the model-generated dark data asserting that the action is immoral.  In the ideal dedomenocracy, even though the population must cooperate with machine-generated decisions, the population can challenge the legitimacy of the data and the algorithm.  The universal truth becomes explicit data that says each individual holds that morality.  Each individual will be able to query his data and find this data point of an explicit assignment of a moral truth on his person.  In a dedomenocracy, he can demand that the data point be corrected.  Democratic participation in a dedomenocracy involves participation in the data science project of keeping the data clean.  This process will eliminate any attempt to assign model-generated data to a person when that person shows that the assignment is false.  A dedomenocracy cannot claim universal acceptance of a moral truth unless everyone agrees with that truth as it relates to his particular circumstances.

In a democracy, we talk about universal or moral truths in general terms.  In a dedomenocracy, the same effect requires explicit assignment of the moral truth to each individual.  A dedomenocracy invites the population to participate in the data science project of keeping the data clean.  That data cleansing will reject these contrary claims of morality on a per-individual basis.  A single person may object that the morality does not apply to him or to his particular circumstance.  When the entire population broadly participates in this kind of data cleansing, the moral observation will lose the status of being universal.  As a result, the moral case will have a weaker influence on a dedomenocracy.
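The per-individual bookkeeping described above can be sketched in a few lines.  The data structures, names, and population are my own hypothetical invention, not a real system:

```python
# Hypothetical sketch: a moral truth is asserted as a dark-data point on
# every individual, and any individual may challenge that assignment.

population = ["alice", "bob", "carol"]

# Dark data: the model asserts the moral truth applies to everyone.
moral_assignments = {person: True for person in population}

def challenge(person: str) -> None:
    """Data cleansing: a person shows the assignment is false for him."""
    moral_assignments[person] = False

def is_universal() -> bool:
    """The truth keeps its universal status only if no one has objected."""
    return all(moral_assignments.values())

challenge("bob")           # a single person objects to the assignment
assert not is_universal()  # the moral claim loses its universal weight
```

The design point is that universality is not asserted in general terms, as in a democracy, but is an emergent property of per-person data that any one objection can break.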

Dedomenocracy is a scaled-up version of modern data science practice, using big data predictive analytics to automate decision making.  As a data science project, there is a need to evaluate the data in terms of how closely it represents a fresh, unambiguous observation of the real world at a specific time instead of a reproduction of a past observation through model-generated dark data.  Darker data involves some level of contamination with historic observations or with our interpretation of past observations.  The problem with darker data is that its reliance on old and potentially outdated observations can discount more recent observations that can tell us something new and unexpected about the current circumstances of the world.

In an earlier post, I gave the example of how our understanding of the Ebola epidemic in West Africa evolved over a matter of a few weeks from a prediction of an imminent catastrophe to a far less severe epidemic.  The initial interpretation was heavily biased by dark data from simulation models with very naive (and probably dismissive) assumptions about the local populations' medical sophistication to manage an epidemic without outside help.  As the weeks progressed, the model's predictions did not match the data, but the official interpretation was that the new data were flawed due to a breakdown in data collection caused by the devastation the epidemic was thought to be producing.  Eventually the data overwhelmed the model's assumptions, so that now we recognize that at least in Liberia, what was initially expected to be a catastrophic epidemic is now reasonably under control.  In this example, there was evidence that the local governments and communities were making progress in managing the disease effectively.  Officials ignored this evidence at first because they considered it to be less reliable than the simulation results.  This overconfidence in models resulted in a poor allocation of resources to build now-useless Ebola treatment facilities in a country that could have benefited more from the same investment spent on other priorities.  (To be fair, I was caught up in the initial alarm based on the modeled predictions, but I was arguing for more bright data to replace the dark data from simulations.)
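The way accumulating observations eventually overwhelm a model's assumptions can be sketched as a simple weighted average of a fixed-weight model prior against a growing pool of observations.  All figures below are invented for illustration only; they are not actual Ebola numbers.

```python
# Hedged sketch: bright data (fresh observations) diluting dark data
# (a model prediction). All figures are invented for illustration.

def blended_estimate(model_prediction: float,
                     model_weight: float,
                     observations: list[float]) -> float:
    """Weighted average of a fixed-weight model prior and observations.

    As observations accumulate, the model's fixed weight matters less.
    """
    total_weight = model_weight + len(observations)
    total = model_prediction * model_weight + sum(observations)
    return total / total_weight

model_cases = 1_000_000.0          # catastrophic simulation prediction (invented)
weekly_reports = [30_000.0] * 20   # far milder field observations (invented)

early = blended_estimate(model_cases, 5.0, weekly_reports[:2])
late = blended_estimate(model_cases, 5.0, weekly_reports)
assert late < early   # more data discounts the model's dark-data prior
```

The failure mode described in the Ebola example corresponds to refusing this update: treating the mismatch between `late` and the model as evidence that the observations, rather than the model, were broken.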

In more recent days, I have been tracking the news of the AirAsia QZ8501 crash in the Java Sea.  Although global attention waned after the wreckage was located and the plane was observed to be in pieces, indicating a very hard impact, I continue to monitor the news to observe how the official explanations of the crash evolve as new evidence is found.  In the early days after the loss of contact with the plane, there were understandably a wide range of theories that extrapolated from the known data at the time: the pilot's request for a flight deviation and the weather radar showing storms in the area.  One early theory was that weather at that altitude somehow caused the airplane to crash.  Subsequent evidence corroborates that explanation, but the theory has evolved to match the evidence.

Commercial airline aircraft have the benefit of being equipped with durable recorders of cockpit sounds and mechanical parameters.   Investigators have recovered these recorders and have begun studying the contents to better understand what happened.  At the time of this writing, preliminary results have not been released although there have been claims that the cockpit experienced stall warnings.

Even with this wealth of data, there may not be enough to explain how weather could down an aircraft.  For example, we may never be able to determine the relative contributions of weather, pilot actions, equipment failure, or structural failure.  One of the mysteries will be what exactly was going on outside of the airplane.  The recorders are mostly blind to the conditions of the air mass that enveloped the plane at the time.  As noted, the earlier reports observed storms in the vicinity and suggested that weather could be a factor.  However, these weather observations are not precise enough to characterize the local conditions surrounding the plane.  Instead, there is the evidence of the plane's flight, which involved a rate of climb far outside the normal capabilities for this type of aircraft.  One explanation for, or perhaps a major contributor to, this climb may be the aircraft being in a rising column of air, such as inside a spinning vortex.  The aircraft itself may provide a gauge of the motion of the air that it encountered.  The flight recorder data may provide more information about the pilot's intentions and awareness of what was going on.  It may be that the outside environment magnified the pilot's intentions or caused the pilot to be misled about what was happening.  I suspect the complex, extreme, and unknown conditions outside of the aircraft resulted in very complicated, confusing, or even contradictory information in the recorders.  It may not be possible to reconstruct all that was happening even with the recorded data.

Whatever the cause of the unusual climb, the subsequent descent appears consistent with a stall from which the pilots were unable to recover.  Perhaps the plane entered a spin.  I saw an earlier report of a terminal track that showed a large looping descent, but I have not found it reported in a reputable news source.  Also, I'm not sure that the stall has official confirmation from the data.  Assuming that the aircraft entered a stall, the stall needs an explanation.  Although the stall resulted from the aircraft lacking sufficient airspeed to operate at the new altitude, there is a question of how the aircraft reached that point.

I suspect that the explanation for the cause of the stall may rest only on dark (theory-based) data.  Also, I suspect the official dark-data storytelling will involve conditions inside the aircraft.  The stall was a result of some combination of the pilot's actions and the aircraft's instruments, engines, and control surfaces.  Although the actual explanation is a theory, this theory is an extrapolation from the knowable data of recovered recordings and observations of airframe wreckage.  My objection is that this extrapolation from knowable data may provide a false sense of confidence in an official explanation for what started the stall.

The problem with this case is the missing data of what was going on outside of the aircraft.   The aircraft may have been caught in a rotating rising column of air so that the aircraft was stable inside that column.  Eventually the aircraft may have exited this vortex into an air mass with a very different wind field.   The aircraft may have suddenly entered a new environment where the aircraft was pointed in the wrong direction.   The aircraft could have been flying sideways or even backwards.

The eventual official explanation may differ dramatically from an external weather event cause.   Extrapolating from the known evidence from the aircraft, the explanation is likely to point to some failure of the plane or pilot: a different plane or pilot may have saved the flight.

What was occurring outside of the aircraft is mostly unknown.  However, clues from the aircraft may point to something new about the natural world.  In particular, the aircraft may have discovered a new type of condition that can occur inside a storm, a condition that cannot be navigated by any winged aircraft.

I note that the official approach of seeking an explanation from inside the aircraft is justified because this is how we can improve pilot training or aircraft engineering.  The aviation profession is dedicated to improving pilot training (which already includes avoiding entering intense storms) and aircraft design to survive extreme weather events.  I suspect that this may be an example where we need multiple versions of the truth.  There may be two explanations: one for the aviation community and another for the weather science community.  For weather science, the QZ8501 flight was a sensor of a weather phenomenon in the storms that occur during this season over the Java Sea.  An aircraft entering such a weather cell may have no recoverable way to escape.  The consequences happened to be tragic.

I’m using QZ8501 to illustrate a problem with working with data when truth can get in the way of discovering important information.  In this case, the underlying truth drives the investigation’s focus on the recorders and the physical wreckage.  The recorders and wreckage do provide abundant data that can explain what happened, but the explanation itself will mostly center on what we already know: that pilots make decisions during the flight and that the airplane responds to the combination of the pilot’s intentions, the airplane’s capabilities, and the environment around it.  This truth imagines the airplane trying to avoid some problem and later trying to recover from it.

Another story that can be explored is what was going on in the storm in the vicinity of the airplane.  In that story, the airplane is a probe into the storm and this probe discovered something unusual about a storm and perhaps something very new.   I don’t doubt that weather scientists will perform their own study of the incident to learn something new about severe weather at that altitude.   Their results probably will not be publicized as much as the official finding of what happened with the pilot and machine that led to the tragedy.


7 thoughts on “Truth as a confounding variable that interferes with interpreting data”

  1. Pingback: Confounding variable of Ideology in interpreting political conflicts: the truth may be generational conflict | kenneumeister

  2. A follow-up on QZ8501: a preliminary report describes the pilot leaving his seat to disconnect the flight augmentation computer, which left the copilot flying manually. While the investigation continues to explain why the pilot disconnected the FAC and how the copilot lost control under manual flying, the investigation may be overlooking a confounding variable: the plane may in fact not have been manually flyable in the conditions it found itself in. Perhaps the only way the flight could have been saved was by computer, and its erratic behavior may have presented the best or even only option for dealing with the conditions.

  3. Pingback: Dedomenocracy lessens the authority of expertise | kenneumeister

  4. This post provides an interesting illustration of the difference between seeking the Truth and merely following the data. The difference is illustrated by the contrast between medical and mental diagnosis.

    The medical diagnosis uses symptoms to guide an investigation to find the root cause, which can then be effectively targeted for treatment. This is an attempt to find the Truth behind the observation.

    The psychological diagnosis (guided by the DSM) uses symptoms alone to identify a diagnosis to guide treatment. The combination of multiple symptoms of sufficient severity defines a diagnosis. Clinical trial data identify treatments that prove significantly effective for the diagnosis. Treatment proceeds without any claim of understanding the root cause or underlying Truth that is the source of the symptoms.

    The implication is that the medical approach is superior; in other words, Truth is superior to symptoms (observations). In my post above, I’m arguing that Truth can get in the way of making good decisions. If we have sufficient measurements, we are better off acting on the data alone (based on patterns matching effective treatments) than attempting to identify the Truth and focusing the treatment solely on the identified Truth.

    In the medical field, sometimes the symptoms can suggest multiple underlying conditions, or one true underlying condition cannot explain all of the symptoms or their severity. This may suggest a treatment that does improve the condition but misses other conditions that may in fact be more important.

    For example, a diagnosis following an auto-accident trauma can identify broken bones to be treated but neglect internal bleeding that can become fatal. The discovery of an underlying Truth and the application of a treatment were correct. The problem is that it was not the most urgent condition to be treated. A data-driven approach that does not emphasize Truth can observe that auto accidents frequently involve internal bleeding even when there are no symptoms, and that this should be treated first.

    As I understand it, this is in fact what happens in trauma centers: they act on the circumstantial evidence of being in an auto accident instead of any visible or reported symptoms. This is acting on data (experience of what happens to bodies in auto accidents) instead of Truth. Even good medicine works on data (in this case, running tests for internal bleeding) instead of acting on a hunch based on reported symptoms alone. This is similar to how psychology works.

    The contrast of medical and mental diagnoses can illustrate the difference between decision making with data and with Truth.

  5. Pingback: Dedomenocracy illustration: safety of airline travel based on statistics not physics | kenneumeister

  6. Pingback: Data is antagonist of science | kenneumeister

  7. Pingback: Economy of compensated opinions in a dedomenocracy | kenneumeister
