The folly of Instrumenting Humans

In an earlier post I noted an article that discussed the new opportunities for gathering data on individuals using proximity cards and always-on smart phones. For example, the smart phones could have new apps that record movements and proximity to other individuals, and perhaps pick up some state-of-mind signals from tone of voice or other means. I brought this article up at the end of my post about the impenetrable inner world of individual experience.

My proposition was that people will always have information through their experience that we can never measure. In contrast, the article suggests that technology is increasingly personal and that people are increasingly accepting of wearing that technology throughout their entire day. Even though the technology may only obtain small bits of non-private information through the day instead of anything excessively intrusive, those bits of information may be very illuminating about the person’s inner life. A lot of human behavior is predictable, either by habit or by instinct. It could be possible to associate various bits of information, with some certainty, with what is going on inside the individual’s private life.

I kind of agree that a large number of small bits of information, consistently collected during the day and compared with similar populations, may suggest new insights or hypotheses. This is what big data does. Given a large enough sample with enough dimensions to choose from, a pattern can be found among some dimensions.
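To illustrate how easily a pattern turns up when there are many dimensions to choose from, here is a minimal sketch that fills a table with pure random noise and then reports the strongest pairwise correlation it can find. Everything in it is an assumption for illustration: the function names `pearson` and `strongest_pattern`, the choice of 30 people and 40 dimensions, and the use of correlation as the stand-in for a "pattern". None of it comes from the article or from any real data set.

```python
import itertools
import random


def pearson(xs, ys):
    """Plain Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def strongest_pattern(num_people, num_dimensions, seed=0):
    """Generate unrelated random dimensions and return the most correlated pair.

    Hypothetical setup: every dimension is independent noise, so any
    correlation found here is spurious by construction.
    """
    rng = random.Random(seed)
    columns = [
        [rng.gauss(0, 1) for _ in range(num_people)]
        for _ in range(num_dimensions)
    ]
    best_pair = max(
        itertools.combinations(range(num_dimensions), 2),
        key=lambda pair: abs(pearson(columns[pair[0]], columns[pair[1]])),
    )
    return best_pair, pearson(columns[best_pair[0]], columns[best_pair[1]])


pair, r = strongest_pattern(num_people=30, num_dimensions=40)
print(f"strongest 'pattern': dimensions {pair}, correlation {r:+.2f}")
```

With 40 dimensions there are 780 pairs to search, so the winning pair tends to look like a respectable correlation even though nothing real connects the two columns.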

My objection is that such a pattern is at best only a hypothesis discovered from historical data.   A discovered hypothesis is not a tested hypothesis.   A discovered hypothesis is only a suggestion of a possible experiment to test.

Before big data became a fad, we were always able to discover hypotheses from available data, but we were more honest about their needing controlled experiments to test. Because experimental resources are always limited, we would collectively agree which hypotheses were strong enough to merit the investment of a controlled experiment. In any case, a hypothesis would not reach our decision making until it had some independent verification through a controlled experiment.

Today is different. We are becoming accustomed to thinking of hypotheses discovered from patterns in large enough sets of data as being simultaneously tested by that same data. In a recent post, I described what I called accessory data, as illustrated by the clothing a patient wears to a routine doctor’s appointment. That accessory data could be collected and added to the database. These are pieces of accessible data that, when combined with other data, may reveal some greater truth. In my example, I suggested that a pattern in clothing choices combined with BMI might more accurately predict health risks. In many ways, my accessory data proposal is like the proposal of data collected by a smart-phone app.

It is certainly possible, if not highly likely, that such patterns would be observed. I recall my early lessons in designing experimental simulations. These simulations were of very large systems with a lot of things going on simultaneously. Despite the size and breadth of the simulation and the cost of a single run, a multi-run test using different random numbers could only provide a single statistical measurement out of countless other opportunities. One of the reasons was the statistical confidence assigned to that single measurement. We may strive for 95% confidence in the derived value, meaning there is a 5% chance we are wrong. If we attempted to use the same experiment to observe more variables, then we would be more likely to be wrong. Increasing the number of measures attempted with a single experiment increases our certainty that at least one is completely wrong. I mentioned before that I see a lot of similarities between big data and statistical simulations. One similarity is the certainty that a pattern is wrong if there are too many opportunities to choose from. The best default assumption for any discovered hypothesis from big data is that it is wrong.
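The arithmetic behind that growing chance of error can be sketched in a few lines. This is only a rough illustration under a simplifying assumption I am adding here: that each measure is statistically independent and each carries the same 5% chance of being wrong. Real simulation outputs are usually correlated, so the exact numbers would differ. The function name `chance_at_least_one_wrong` is made up for this example.

```python
def chance_at_least_one_wrong(num_measures: int, confidence: float = 0.95) -> float:
    """Probability that at least one of several measures is wrong.

    Assumes (for illustration only) that the measures are independent and
    that each one is right with probability `confidence`.
    """
    # All measures are right with probability confidence ** num_measures,
    # so at least one is wrong with the complementary probability.
    return 1.0 - confidence ** num_measures


for k in (1, 5, 20, 100):
    print(f"{k:>3} measures: {chance_at_least_one_wrong(k):.1%} chance at least one is wrong")
```

Under that independence assumption, a single measure carries the familiar 5% risk, 20 measures carry roughly a 64% chance that at least one is wrong, and by 100 measures an error somewhere is close to certain.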

We are very creative thinkers who can imagine a hypothesis from any observed pattern. With so much data available, we can create hypotheses faster than we can possibly test them experimentally. We also reason that since a controlled experiment is only a collection of data, and that data is already collected, perhaps we can go straight from discovered hypothesis to tested hypothesis. In short, today we convince ourselves that we can go straight from the historical data to the decision maker.

This attitude is an extension of how we work with large IT systems. For years, we have been heavily instrumenting all of the critical infrastructure so that we can rapidly detect and respond to failures or other performance problems. Many such problems are reported explicitly as alarms detected directly by the system. The system sends a message demanding urgent attention to an operator who can do something to fix the problem. This is how we operated equipment even before there was IT. A simple household fuse did the same thing.

But in IT equipment we also started recording routine information about the system and storing that data in a persistent storage that I’ll call a log file.   The log data can include routine measurements, simple messages indicating that something ran normally at a particular time, or some trigger-based message that wouldn’t trigger an urgent alarm message.   The log data measured a lot of dimensions about the equipment.   This log data became useful during forensic activities after being alerted that something needs attention.

The log data within single systems could include a lot of different pieces of data that could be measures or dimensions to use for analysis.   Different systems would provide different pieces of information.   The various pieces of information may be matched at least by time stamps and often by other clues.   This is an ideal opportunity for big data technologies.   These technologies currently support the optimization of the IT operation, computer network, application servers, server farms, etc.

Reporting solutions based on these log files could identify patterns of various dimensions that could suggest optimization or redesign opportunities. After some concurrence from other operators, this suggestion may be acted upon by modifying some aspect of the systems in some controlled fashion. Sometimes it would first be tested in a preproduction mock-up of the system, and this qualifies as an experimental test: there is no impact on the operational system. Sometimes the real world conditions could only be replicated on the production system, so the experiment is done on the operational system, perhaps during non-critical periods and usually with a way to quickly (immediately) revert to a previous configuration.

In these large scale systems, there are various procedures to follow to assure that all possible testing opportunities are exhausted before risking a failed experiment with the operational system. However, eventually there will be that final test, and often that test will fail. We understand this failure possibility even if that understanding doesn’t protect the staff from the pain of criticism.

Within the IT world of operating production systems, there is a general respect for caution: even though there is compelling evidence that some change may be useful, the change is not fully tested until it survives in production. It could fail.

When I read about data systems based around measuring humans, I immediately think we want to treat humans like servers in a server farm.   If only humans were heavily instrumented and logged, we could use that data to come up with new hypotheses that can then lead to new policies that can lead to a more optimized society.

I fear that this confidence is emboldened by the obvious successes exhibited by the rapid improvements of ever more complex IT projects. There is a lack of perspective that sometimes the proposed IT hypotheses are disastrously wrong. Often the disastrous results are barely noticed because of the precise timing to implement during the least disruptive period and the ability to immediately revert to the original configuration. This kind of failure doesn’t get enough attention compared to the attention paid to the grand successes.

I agree that there are ways to instrument individual humans at population scales with observations that approach the granularity available in server log files.    There can be enough information to feed a large data store that will permit discovering new hypotheses.  The problem is how exactly does that discovered hypothesis get tested?

Human society has no equivalent of multi-tiered development stages, each with approximate replications of the production society, where we can test concepts before exposing them to the real world. Even in the production stage of real world society, we lack the precision to test a policy for a few short minutes during the least disruptive period (if that period exists at all). Policies take a long time to take effect, and they are never easily retracted. Society offers no equivalent to IT’s ability to revert to the last known good configuration.

If we could do in society what we can do in IT projects and roll back a bad idea, then our politics would be so much easier. We could admit a mistake, undo the damage, and within moments return to a state as if nothing had happened in the first place. Politics is so difficult because we have no roll-back option for bad policies, and the only test bed available is the real world society. Society offers nothing comparable to the multistage, multiple-approval, progressive-testing cautionary approaches available to operational IT systems.

With access to more information and with that data getting closer to the individual level, we will certainly find new patterns that can suggest hypotheses.   Given that the default of such a discovered hypothesis is that it is wrong, what options other than immediate operational implementation do we have to test the hypothesis?    Given the risks involved, we are still stuck with uncomfortable political debate.

With all the advances in big data, we still need multiple-stage test societies to test ideas before making them operational in the real society, and we still need an immediate roll-back capability to return society to a state as if nothing had happened at all.

