Big data can re-identify de-identified data

The enthusiasm for the benefits of big data comes from widely promoted reports of past successes. The promise of big data techniques is that they can deliver similar successes in other contexts. Big data involves volume, velocity, and variety. Volume and velocity depend on automated queries and report building, while variety introduces the opportunity for new benefits. The combination of automation with the opportunity that variety creates is what makes re-identification possible, or even very likely.
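The paragraph above does not spell out a mechanism, but a common illustration of how variety enables re-identification is a linkage attack. A "de-identified" dataset that has dropped names but kept a few ordinary attributes is joined against a second, public dataset that still has names. The sketch below is a minimal illustration under loudly stated assumptions: every record, field name (`zip`, `birth_date`, `sex`), and the `link` helper are hypothetical, and real attacks use far larger data and fuzzier matching.

```python
# Hypothetical "de-identified" records: names removed, other attributes kept.
deidentified = [
    {"zip": "20001", "birth_date": "1970-03-14", "sex": "F", "diagnosis": "asthma"},
    {"zip": "20002", "birth_date": "1985-07-02", "sex": "M", "diagnosis": "diabetes"},
    {"zip": "20001", "birth_date": "1970-03-14", "sex": "M", "diagnosis": "flu"},
]

# Hypothetical public dataset (for example, a membership list) that includes names.
public = [
    {"name": "Alice", "zip": "20001", "birth_date": "1970-03-14", "sex": "F"},
    {"name": "Bob", "zip": "20002", "birth_date": "1985-07-02", "sex": "M"},
    {"name": "Carol", "zip": "20003", "birth_date": "1990-01-01", "sex": "F"},
]

# Attributes present in both datasets; individually harmless, jointly identifying.
QUASI_IDENTIFIERS = ("zip", "birth_date", "sex")


def link(deidentified, public, keys=QUASI_IDENTIFIERS):
    """Return (name, diagnosis) pairs for records that match exactly one public entry."""
    # Index the public data by its quasi-identifier combination.
    index = {}
    for person in public:
        index.setdefault(tuple(person[k] for k in keys), []).append(person["name"])

    matches = []
    for record in deidentified:
        names = index.get(tuple(record[k] for k in keys), [])
        if len(names) == 1:  # a unique match re-identifies the record
            matches.append((names[0], record["diagnosis"]))
    return matches


print(link(deidentified, public))  # [('Alice', 'asthma'), ('Bob', 'diabetes')]
```

Neither dataset alone reveals who has which diagnosis; the join across two varieties of data does, and the join is trivially automated.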


Dedomenocracy’s nemesis: the innovative criminal

Even if we include super-intelligent machines in the concept of dedomenocracy, the present-day complaint will remain: the government needs to get lucky every day, but the human criminal needs to get lucky only once. This problem will persist long after we replace democracy with dedomenocracy. The most dangerous criminal is the non-criminal who acts immediately on his newly discovered hypothesis. Even a superhumanly intelligent dedomenocracy may not be able to discover that hypothesis first.

Predicting crime with big data is profiling and will not help predict big crime

Using big data to predict finely distinguished demographics that are more prone to committing crime does not change the fact that this is profiling. Profiling is controversial. Using big data this way will likely intensify the debate as we learn of harsher and more prolonged harassment of innocent people whose misfortune is to be categorized in a demographic with a higher likelihood of committing crime. We make no progress if we reduce crime in a way that increases social unrest and disapproval.

Risk of predictive analytics taking data too far

A recurring theme of many earlier posts is that the benefit of historical data is to discover new hypotheses credible enough to justify investment in future investigation. The necessary investment for a discovered hypothesis is to create new experiments to test that hypothesis, where those experiments carefully collect new data controlled specifically for…

Predictive Analytics and Spurious Correlations

The spurious correlations site has many interesting charts showing arbitrary combinations of trends that correlate strongly and yet have no rational basis that suggests causation. The site also has a nice feature for exploring further correlations: using the hyperlinks on the chart titles to find other trends that correlate with…
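One reason such charts are easy to produce is that any two quantities that merely drift in the same direction over time will show a high correlation coefficient. The sketch below assumes two made-up yearly series, each a linear trend plus independent noise from separately seeded generators, so by construction neither causes the other. The helper names (`pearson`, `trending_series`) are hypothetical, and the Pearson coefficient is computed by hand rather than taken from any particular library.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def trending_series(n, slope, seed, noise=0.5):
    """A hypothetical yearly series: a linear trend plus independent Gaussian noise."""
    rng = random.Random(seed)  # separate generator, so the two series share no randomness
    return [slope * t + rng.gauss(0, noise) for t in range(n)]


# Two unrelated made-up quantities that only share an upward drift over 20 "years".
series_a = trending_series(20, slope=2.0, seed=1)
series_b = trending_series(20, slope=0.5, seed=2)
r = pearson(series_a, series_b)
print(f"correlation of levels: {r:.3f}")

# Year-over-year changes strip out the shared trend; what remains is independent noise,
# so this correlation is typically much weaker.
diff_a = [b - a for a, b in zip(series_a, series_a[1:])]
diff_b = [b - a for a, b in zip(series_b, series_b[1:])]
print(f"correlation of changes: {pearson(diff_a, diff_b):.3f}")
```

The strong correlation of the levels comes entirely from the shared dependence on time, which is exactly the kind of pattern that looks compelling on a chart yet offers no basis for causation.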

Science based on Observations

In my last post, I described how sciences dealing with the immediate physical world involve models where time is an independent variable. For dealing with the world, we have theories that connect causal events by the elapsed time between those events. In contrast, the science of studying the record of past events is dependent…