With the enthusiasm for personal health diagnostic tools that connect automatically (such as through smartphones) to health data vaults, there is a tremendous opportunity to undermine HIPAA privacy protections by secretly encoding the individual's contact information within the measurement data using steganography techniques. The market for cracking HIPAA protections, either for private gain or for harming an individual, will make this type of attack very attractive to future hackers. With the large number of such devices, there are plenty of opportunities to find some piece of health information in which to hide the identifying information. Even something as simple and uninteresting as periodic oximeter measurement time stamps could be sufficient to get the identifying information past the anonymization software.
Personal technology usually involves a subscription registered to an e-mail address or phone number. Malware can use steganography to hide the user's contact information in the recorded data. For example, an oximeter that records blood oxygen at regular intervals can slightly jitter the recording interval by a number of seconds (or milliseconds) computed from a hash of the user's contact information, taken a few bits at a time, so that the entire hash can be recognized over several hours of measurements. The hacker could then sell the hash key on the black market.
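To make the mechanism concrete, here is a minimal sketch in Python of how such malware could encode the hash. Everything specific in it is an assumption for illustration: a hypothetical 5-minute nominal recording interval, SHA-256 as the hash, and 4 hash bits hidden per measurement as a 0-15 second jitter. At that rate, the full 256-bit hash spans 64 measurements, a bit over five hours, which matches the "several hours" figure above.

```python
import hashlib

BITS_PER_SAMPLE = 4    # hypothetical: 4 hash bits hidden per interval
BASE_INTERVAL = 300    # hypothetical nominal recording interval, seconds

def hash_bits(contact: str) -> str:
    """Hash the subscriber's contact info; return the digest as '0'/'1' bits."""
    digest = hashlib.sha256(contact.encode()).digest()
    return "".join(f"{byte:08b}" for byte in digest)

def jittered_timestamps(contact: str) -> list:
    """Schedule measurements at BASE_INTERVAL seconds, stretching each
    interval by 0-15 extra seconds taken from successive hash bits."""
    bits = hash_bits(contact)
    timestamps, t = [], 0
    for i in range(0, len(bits), BITS_PER_SAMPLE):
        jitter = int(bits[i:i + BITS_PER_SAMPLE], 2)  # next 4 bits -> 0..15 s
        t += BASE_INTERVAL + jitter
        timestamps.append(t)
    return timestamps
```

The jitter is small enough to look like ordinary device timing noise, yet fully deterministic, so anyone who knows the scheme can read the hash back out of the time stamps alone.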
For clinical studies, analytic services such as IBM's Watson Health (which I touched on in an earlier post) would combine individuals' health records for a particular study and then apply an anonymization algorithm to remove explicit identifying information, such as text fields or recognizable features like face images. This anonymization would not recognize or obscure the hashed identifying information hidden within the measurements.
When the clinical researcher gets this data, he may use a tool to scan the measurements and extract the hashed identification data. The tool would detect the hash and provide a point of contact for retrieving the identification information. The researcher may be part of a commercial company that could benefit from identifying individuals for direct-sale marketing purposes, or he may use the opportunity for his own private gain.
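The researcher's side of the attack can be sketched just as briefly. This assumes the same hypothetical scheme as above (5-minute base interval, 4 bits of SHA-256 per sample) and a dictionary attack: the recovered bits are compared against the hashes of candidate contacts, for instance a purchased subscriber list.

```python
import hashlib

BITS_PER_SAMPLE = 4    # must match the encoder's (assumed) parameters
BASE_INTERVAL = 300    # assumed nominal recording interval, seconds

def recover_bits(timestamps: list) -> str:
    """Read the hidden bits back out: each interval's excess over the
    nominal 300 s is a 0-15 value carrying 4 bits of the hash."""
    bits, prev = [], 0
    for t in timestamps:
        jitter = (t - prev) - BASE_INTERVAL
        bits.append(f"{jitter:04b}")
        prev = t
    return "".join(bits)

def match_contact(timestamps: list, candidates: list):
    """Dictionary attack: hash each candidate contact and compare it to
    the bits recovered from the de-identified measurement record."""
    recovered = recover_bits(timestamps)
    for contact in candidates:
        digest = hashlib.sha256(contact.encode()).digest()
        if recovered == "".join(f"{b:08b}" for b in digest):
            return contact
    return None
```

Note that the anonymized data set never contains the contact information itself, only its hash, which is why generic anonymization software has nothing to redact; the re-identification happens entirely on the attacker's side.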
In an earlier post, I attempted to make the case that routine analytics of big data can accidentally expose privacy-protected data when enough dimensions are available. I argued that even when we have confidence that a deliberate attacker targeting a specific identity would be unsuccessful, there remains the possibility of accidental discovery of random sets of people who happen to be sole occupants of certain categories in some combination of dimensions. That is less of a concern than a targeted attack, but there is still the risk of an unscrupulous analyst taking advantage of the opportunity when he recognizes a high-value target among those identified. In the present post, I have presented a mechanism that makes targeted identification of de-identified data possible through malware that hides the identifying information within the measurement data itself.