Big Data will not save Health Care

One aspect of the new health care laws is the creation of opportunities to combine health record information into a big data store, allowing analysis of which combinations of conditions and treatments offer the most cost-effective outcomes. We expect that such analysis will present opportunities for cutting costs by selecting more cost-effective approaches for specific patients and by avoiding costly but ineffective approaches for others.

I am concerned about the idea of using big data analysis to approve or deny a doctor’s request for treatment for a patient. Consider, for example, a patient who has dutifully paid for health insurance his entire life and then develops a life-threatening condition for which a treatment is available. In the past, his insurance would cover the treatment recommended by the doctor. In the future, the administrators will run a big data query and possibly find that people with the same eye color, height, nose hair density, ingrown toenails, and regular consumption of Diet Coke have poor outcomes. The procedure would be denied because big data tells them it is not worth the cost. A big data decision made in the name of overall cost savings means that this one person wasted a lifetime of health care premiums on something he will never receive.

Contrast that with someone who paid no prior insurance premiums and develops a condition requiring immediate attention. His health insurance cannot be denied based on pre-existing conditions, and his big data query result confirms that the procedure would be cost effective.

In the past (with its problems of escalating health care costs), the individual regularly paying his premium would get health care because it was covered and recommended by a doctor. The individual who avoided health insurance premiums would have to pay costs out of pocket or by some other means. Even for those insured, insurers were allowed to adjust premiums based on the risks of the particular groups involved. Those premiums and that coverage would be negotiated far in advance of needing the coverage.

Now, everyone more or less has to pay the same cost for health insurance. Everyone belongs to the same risk pool. This is great for eliminating denials or excessive premiums for pre-existing conditions. Hidden is the fact that a policy’s statement that it covers a particular procedure does not assure that the patient will get access to that procedure.

As my silly example hinted, a patient’s pre-existing conditions will still be used to deny access to health care. Ingrown toenails, height, eye color, and a history of Diet Coke consumption are pre-existing conditions that may disqualify the patient from access to a procedure but do not disqualify the patient from paying his premiums. My example is silly, but real-world combinations will be equally irrelevant to the condition, although more easily rationalized once they are pointed out.

In the past, the insurance consumer understood up front what care was going to be covered. The conditions considered, such as sex, age, tobacco use, and prior treatments, were well known to the patient. If he was shopping for insurance, he understood what the premium would cover and why it was what it was. He might not like the price, but at least he knew in advance what conditions were influencing the price. The important point is that if he accepted the premium, then the insurer accepted the responsibility to pay for all covered procedures recommended by a doctor. In the past, the health insurance consumer knew what he was paying for.

Now, we are all covered and we all pay the same premiums. We have universal insurance. But when it comes time to access health care, we will be subjected to a big data query that takes all of the data about ourselves and compares it with historical cases with similar combinations of conditions. Such queries may involve an unlimited number of iterations to fine-tune the combinations to get the most confident answer of either yes or no. The query is run at the moment of the coverage decision, not in advance when the premiums were paid. Also, the query is run on conditions and measures the patient is probably unaware of. In particular, the query itself may not even be predictable, as the analyst iterates through refinements of the query parameters to get the most confident answer. These may be conditions that the patient never suspected would be related. A condition such as some childhood illness could indicate an unacceptably high risk of a poor outcome for a particular procedure.

We did not eliminate inequality in access to health care. We merely shifted from a deterministic inequality based on access to wealth to an unpredictable inequality that borders on random chance. No one can predict whether some future hypothetical condition will be covered when coverage is needed. That determination can only happen at the moment when the decision has to be made. The decision will be based not only on the relevant health conditions of the patient but also on seemingly irrelevant conditions that can be compared with historical recipients of the same treatment to see if the procedure will be cost effective in this particular case. There is no guarantee that the query process will return the same results when run at different times with a different mix of data points available at the time of the query.

From the perspective of the healthy insurance premium payer, access to any particular treatment is like a lottery.   No matter how confident he may be that his needs will be approved, the decision will be based on the big data query.   The big data query can suddenly discover an unexpected confounding condition.

As I understand the debate, this change from wealth-based to outcome-based (random) access to coverage for pre-existing conditions is considered acceptable. Perhaps it is true that some combinations of conditions could greatly reduce the chance of surviving a major surgery. If it is true, then we should deny the potentially ineffective procedure. The conditions that influenced the determination may be treatable. For example, if the problem is an untreated dental infection, then the procedure may be approved after obtaining that treatment. More likely, the conditions are immutable (such as a history of a particular childhood disease) and the decision is permanent.

The denial decision is permanent.   You get one chance to run the query and based on data available on that date, your conditions indicate that it is not cost effective.   Big data keeps changing with new data so that a query run a year later may produce a different recommendation.    Even if you survived that long, you probably will not get a chance to ask for the query to be run again especially if none of the conditions has changed.

The postponement decision is temporary. Say, for example, the patient is told that he cannot get a surgery until he gets a dentist to repair a tooth. Once that is done, a new query is run to see if the remaining conditions match well with good outcomes. The new query now has access to more data from other patients, and it is possible it will identify a completely different confounding condition not identified earlier. The procedure may be denied a second time for entirely different reasons because the available data has changed.

The problem is the process of mining big data.   Big data mining is an iterative process that narrows down to the factors that produce the most confident recommendation.    Different data from the patient or present in the data store can lead to different sets of factors.  This becomes even more acute as more factors (or dimensions) become available to explore.

Even if an earlier analysis suggested that a single condition was preventing approval, removing that condition does not assure approval by the later query.
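
To make the point about dimensions concrete, here is a minimal Python sketch. It is only an illustration under assumptions of my own, not a description of any real insurer's system: the function names, the 50/50 factors, and the patient counts are all hypothetical. Every simulated outcome is pure chance and unrelated to any factor, yet a search over many factors will still report some factor as the "strongest" predictor, and the winner may well change once more patients are added to the store.

```python
import random

def make_records(n_patients, n_factors, rng):
    """Simulate patients whose outcome is independent of every recorded factor."""
    records = []
    for _ in range(n_patients):
        factors = [rng.random() < 0.5 for _ in range(n_factors)]
        good_outcome = rng.random() < 0.5  # pure chance, unrelated to the factors
        records.append((factors, good_outcome))
    return records

def outcome_gap(records, i):
    """Absolute difference in good-outcome rate with and without factor i."""
    with_f = [ok for f, ok in records if f[i]]
    without_f = [ok for f, ok in records if not f[i]]
    if not with_f or not without_f:
        return 0.0
    return abs(sum(with_f) / len(with_f) - sum(without_f) / len(without_f))

def strongest_factor(records, n_factors):
    """Return (index, gap) of the factor that looks most predictive
    among the first n_factors factors."""
    gap, idx = max((outcome_gap(records, i), i) for i in range(n_factors))
    return idx, gap

rng = random.Random(0)  # fixed seed so the sketch is repeatable
early = make_records(200, 300, rng)
later = early + make_records(200, 300, rng)  # the same store after more patients arrive

for label, snapshot in (("early", early), ("later", later)):
    for n in (5, 300):
        idx, gap = strongest_factor(snapshot, n)
        print(f"{label} snapshot, {n:3d} factors searched: "
              f"factor {idx} looks best, gap {gap:.2f}")
```

Searching 300 factors can never report a smaller gap than searching only the first 5, because the larger search includes the smaller one. With meaningless factors, that extra apparent confidence comes from the search itself and not from anything about the patient.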

Big data does not solve the health care problem of pre-existing conditions.  It simply provides an alternative approach to deny access to care based on those pre-existing conditions.

Conceding the point that this approach is fairer than basing the determination on wealth alone, I still see a major problem.

I agree that all of the technology can assemble huge numbers of records of vast populations.   That massive amount of data can be queried quickly to find patterns related to a particular patient’s whole set of conditions.

My problem is my lack of trust in the data itself.

In my earlier posts (summarized here) I described various reasons to suspect the trustworthiness of data.   I’m certain that the wide variety of data streams that will feed a health-care big-data store will have the full range of suspect data.  My bias is that most of that data is suspect data.

We can solve the trust problem by investing in sufficient data science analysis to continuously and thoroughly scrutinize the data feeds, verifying that their quality supports the intended purpose of provisioning health care to individuals. If this kind of scrutiny is occurring, it is likely to be invisible to the consumer. I strongly suspect the scrutiny will be insufficient and what little there is will be employed only during the design phase of the project. I suspect there will be virtually no routine review of data quality during the operational phase of the system. From what I can tell, we are assuming that all health care data will be gold-standard bright data needing no routine scrutiny. I think most data will have far more problems than that.

Technology is not the limiting factor for building big data systems.   The limiting factor is the labor required to assure that data remains trustworthy.   Very little of that data is trusted so highly it never needs scrutiny after the initial design.    The limitation comes from our unwillingness to budget for the routine review of these data as the data evolves over time.   We want to believe that as long as the physical equipment or the designed software is the same, the quality of the results will remain the same.   I don’t accept that assumption for such big data schemes envisioned for health care provisioning.

Big data is not the solution to health care reform. It is likely to harm it. The current sentiment is that big data is very effective and that the technology available today makes it possible. We believe that because we are conveniently ignoring the problem that the data is not all of equally high quality. The big data store will be contaminated with bad data that will end up driving bad decisions. Those bad decisions, and the bad data that caused them, will become known and become politically toxic.

