With the enthusiasm for personal health diagnostic tools that connect automatically (such as through smartphones) to health data vaults, there is a tremendous opportunity to undermine HIPAA privacy protections by secretly encoding the individual’s contact information within the measurement data using steganography techniques. The market for cracking HIPAA protections, either for private gain or for harming an individual, will make this type of attack very attractive to future hackers. With the large number of such devices, there are plenty of opportunities to find some piece of health information in which to hide the identifying information. Even something as simple and uninteresting as periodic oximeter measurement time stamps could be sufficient to get the identifying information past the anonymization software.
Personal technology usually involves a subscription registered to an e-mail address or phone number. Malware can use steganography to hide the user’s contact information in the recorded data. For example, an oximeter that records blood oxygen at regular intervals can slightly jitter the recording interval by a number of seconds (or milliseconds) computed from a hash of the user’s contact information, taken a few bits at a time so that the entire hash can be recovered over several hours of measurements. The hacker could then sell the hash key on the black market.
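To make this concrete, here is a minimal sketch of how such an encoder might work. Everything here is a made-up illustration: the function names, the two-bits-per-sample rate, and the one-minute base interval are assumptions for the sake of the example, not any real device’s firmware.

```python
import hashlib

BITS_PER_SAMPLE = 2          # hypothetical: how many hash bits each reading leaks
BASE_INTERVAL_MS = 60_000    # hypothetical: nominal one-minute sampling interval

def contact_hash_bits(contact: str) -> list[int]:
    """Hash the contact info and unpack the digest into a list of bits."""
    digest = hashlib.sha256(contact.encode()).digest()
    return [(byte >> i) & 1 for byte in digest for i in range(8)]

def jittered_timestamps(contact: str, n_samples: int) -> list[int]:
    """Return sample times (ms) whose jitter encodes the hash, a few bits at a time."""
    bits = contact_hash_bits(contact)
    times, t = [], 0
    for k in range(n_samples):
        # pack the next BITS_PER_SAMPLE hash bits into a 0-3 ms jitter value
        chunk = 0
        for b in range(BITS_PER_SAMPLE):
            chunk = (chunk << 1) | bits[(k * BITS_PER_SAMPLE + b) % len(bits)]
        t += BASE_INTERVAL_MS + chunk
        times.append(t)
    return times
```

A jitter of a few milliseconds on a one-minute interval would be invisible to any plausibility check on the data, yet over a few hours of readings the full 256-bit hash repeats many times.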
For clinical studies, analytic services such as IBM’s Watson Health (which I touched on in an earlier post) would combine the individual’s health records for a particular study and then apply an anonymization algorithm to remove explicit identifying information such as text fields or recognizable features such as face images. This anonymization would not recognize or obscure the hashed identifying information hidden within the measurements.
When the clinical researcher gets this data, he may use a tool to scan the measurements for the hashed identification data. The tool would detect the hash and provide a point of contact for retrieving the corresponding identity. The researcher may be part of a commercial company that could benefit from identifying the individual for direct-marketing purposes, or he may use this opportunity for his own private gain.
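The scanning tool could be as simple as the following sketch. It assumes a hypothetical scheme in which two bits of a SHA-256 hash of the contact information are hidden in each reading’s millisecond jitter around a one-minute interval; all names and parameters are invented for illustration.

```python
import hashlib

BASE_INTERVAL_MS = 60_000    # hypothetical nominal sampling interval
BITS_PER_SAMPLE = 2          # hypothetical bits hidden per reading

def recover_bits(timestamps_ms: list[int]) -> list[int]:
    """Read the per-sample jitter (0-3 ms) back out as a bit stream."""
    bits, prev = [], 0
    for t in timestamps_ms:
        chunk = (t - prev) - BASE_INTERVAL_MS
        prev = t
        for b in range(BITS_PER_SAMPLE - 1, -1, -1):
            bits.append((chunk >> b) & 1)
    return bits

def match_contact(bits: list[int], candidates: list[str]):
    """Return the candidate contact whose hash prefix matches the recovered bits."""
    for contact in candidates:
        digest = hashlib.sha256(contact.encode()).digest()
        expected = [(byte >> i) & 1 for byte in digest for i in range(8)]
        if bits == expected[:len(bits)]:
            return contact
    return None
```

Note that the buyer of the hash key never needs the contact information itself: matching a known hash against a candidate list is enough to confirm an identity.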
In an earlier post, I attempted to make the case that routine analytics of big data can accidentally expose privacy-protected data when enough dimensions are available. I argued that even when we have confidence that a deliberate attacker targeting a specific individual would be unsuccessful, there remains the possibility of accidentally discovering random sets of people who happen to be sole occupants of certain categories in some combination of dimensions. This is less of a concern than a targeted attack, but there is still a risk of an unscrupulous analyst taking advantage of the opportunity when he recognizes a high-value target in that identification. In the present post, I presented a mechanism that makes targeted identification of de-identified data possible through malware that hides the identifying information within the measurement data.
3 thoughts on “Wearable health technologies, such as fitness trackers, can compromise HIPAA data”
The above scenario is a mix of a data hack and a classic malware attack: the tracker’s software needs to be modified to encode the identifying information. This is one possibility, but I thought of another technique that does not require compromising the health tracker.
Health trackers measure physiological signals similar to those picked up by polygraph machines: heart rate, blood flow, skin conductivity, breathing. All of these can be affected by immediate surroundings and mental state, such as during a conversation.
An attacker can encode an identifying signal onto the health tracker’s recordings through a carefully timed (and recorded) encounter with the target: engaging in a conversation that provokes a sequence of emotional responses, such as laughter followed by anger or fear. The health tracker would record the sequence of physiological changes with relative time stamps that the attacker also knows. Like the attacker in the above scenario, he may never have access to the private data. Instead, he offers for sale his “signature” of specific times for specific emotions, along with the name of the person he targeted.
Someone with access to the aggregated but de-identified health data, including information from medical records as well as health-tracking data, can query the data set at the specific date and time for any individuals who experienced the expected swings in physiology that correspond to the recorded event. If he finds a match, it will likely be unique and very likely be the targeted person. Because the data is aggregated, he will then know things such as medical diagnoses, type of treatment, and other medical information that was aggregated with the physiological data from the health tracker.
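Such a query might look like the following sketch, which matches heart-rate readings against the attacker’s timed signature. The record format, the three-event signature, and the thresholds are all invented for illustration; a real query would also use the other polygraph-like channels.

```python
from dataclasses import dataclass

@dataclass
class Reading:
    offset_s: float     # seconds since the recorded encounter began
    heart_rate: int     # beats per minute

# Attacker's hypothetical "signature": (offset in seconds, direction of swing)
# e.g. laughter (up), calming down (down), provoked anger (up).
SIGNATURE = [(30.0, +1), (95.0, -1), (140.0, +1)]

def matches_signature(readings: list[Reading], baseline: int,
                      threshold: int = 15, window_s: float = 10.0) -> bool:
    """True if the record shows each expected swing near each signature offset."""
    for offset, direction in SIGNATURE:
        hit = any(
            abs(r.offset_s - offset) <= window_s
            and (r.heart_rate - baseline) * direction >= threshold
            for r in readings
        )
        if not hit:
            return False
    return True
```

A de-identified record that satisfies three or four such timed swings at the exact date and time of the encounter is very unlikely to belong to anyone but the target.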
This makes a better example because it does not involve compromising the tracker hardware or software. The hack occurs outside the IT surface of the health network.
The scenario can be made cheaper when the attacker’s side of the interaction is a bot instead of a human. The bots may run on personal computers and evoke the same identifying swings in emotion across a large population, each with a distinctive pattern that can be queried later. Even today, a streaming video service may capture the times a subscriber watches certain scenes that can be expected to cause sudden physiological changes. With view-on-demand, those times are likely to be unique to that particular viewing. These approaches can produce databases of identifying stress events for a large population, and this data can be used to re-identify large populations or, through a process of elimination, improve the confidence of a particular identification.
Another trick may come from emotion-detecting technology. These algorithms will be able to accurately record a person’s emotional state over time. The emotional states exposed in facial expressions could strongly correlate with the physiological measurements picked up by wearable health technology: the measures these wearables monitor (breathing, heart rate, blood flow, skin conductance, etc.) are similar to those included in polygraphs that claim to detect emotional responses.
The article describes using emotion-reading technology for entertainment or other non-health-related purposes. For entertainment, the software may modify the content to adapt to the audience’s emotional reaction. Such modifications may be recorded with time stamps that could produce a time signature traceable to a specific subscriber. If these emotional responses correlate with physiological measurements, then the sequence of emotional changes may provide a signature that can later be used to identify private health data after the wearable-device data is combined with other health information. That combined information may be anonymized, but anonymization will not modify the actual health measurements, which inherently contain a signature of a sequence of physiological changes that can be matched to a particular person.
This scenario is like the one in the previous comment, where the operator of the emotion-reading machine merely makes the sequence data available for third parties to obtain. One of those third parties may be a researcher who has access to de-identified clinical health data. If that data includes wearable health monitors, the signature from the emotion reader may be apparent in the monitors’ measurements. The researcher may be able to recover the identification by combining the privacy-protected health data with the non-privacy-protected emotion-reading data.