Legacy applications can benefit from big data approaches without the need to replace the legacy architecture with new technologies. Instead, big data can augment the application by collecting higher volume, variety, and velocity data about the user’s activity within the application. Analysis of this data can show decision makers where there may be problems with the work-products. Correspondingly, it can provide requirements analysts with information about where improvements are needed, or with a more complete library of edge cases to consider for new designs.
When we look to data technology to solve problems, we should permit the technologies to identify the problems that can be solved with their current capabilities instead of demanding that the technologies evolve to solve the hard problems we have been working on. There are many opportunities to make progress even if we don’t touch the hard problems. Allowing technology to solve what it can solve now may make the hard problems narrower, or possibly even less visible. For example, there are ways we can improve overall life expectancy without curing any cancers, perhaps with investments in areas unrelated to health care. It is our nature to focus on objectives that catch our attention. This focus can blind us to immediate opportunities that are realistic given our current situation.
If someone wants to cause trouble for the big data owner, they can leverage the known missing data to raise accusations against which the owner will have no data to use in defense. The accusations can suggest cheating, fraud, criminal activities, etc., that can harm reputations or invoke costly and lengthy investigations that keep the owner from realizing the potential benefits of the big data analytics.
Data deception is a concern for automated decision making based on data analytics (such as in my hypothetical dedomenocracy). I think it is already a concern with our current democracy. I fear the current enthusiasm for data technologies because I do not see much appreciation for the possibility of deception. There is huge confidence in the combined power of large amounts of data and sophisticated statistical tools (such as machine learning). Missing from our consideration is how well the data actually captures the real world. The data is not necessarily an honest representation of what is happening in the real world. It is very possible that the data includes deliberate deception.
I’m describing this as the security of the datum instead of the data. What is vulnerable to exploitation is the specific observation, not everything observed by sensors. The malware is in the population being observed instead of in the IT systems.
To combat this kind of problem, we are going to need an additional approach, datum governance, to protect the observed population from deliberately inserted biases.
The enthusiasm for the benefits of big data comes from widely promoted reports of past successes. The promise of big data techniques is that they can provide similar successes in other contexts. Big data involves volume, velocity, and variety. The volume and velocity depend on automated queries and report building. The variety introduces the opportunity for new benefits. The combination of automation and the opportunity from variety is what makes re-identification possible or even very likely.
Oral storytelling was the original big data. The various oral stories were saved in persistent memory and captured a large volume and variety. The invention and adoption of written works displaced the oral tradition, and that brought an end to that earlier big data. In this sense, our current excitement about big data may be a rediscovery of a capability available to our ancient ancestors. Big data and the oral storytelling tradition both offer inexpensive and durable means to manage a large number of distinct and very individualized stories. In the modern era, we are rediscovering the need to collect individual stories, and thus granting them the ability to circulate as they did in the preliterate society of oral storytellers.