The above video presents four eras of human communication from an evolutionary perspective: a long period when humans only gestured and grunted, a long period when humans spoke but did not write, a long period when people wrote, and now the Internet era. The useful information is the progression of information content possible with each era. The unnecessary information is the evolutionary explanation. For this discussion to work, there doesn’t have to be a specific period of time when human culture flourished with illiterate people fluent in verbal languages. There are clearly expansions of content starting with gestures, then adding verbal languages, then adding written languages. The Internet era allows us to publish and retrieve information separately from the story-telling.
With big data, we end up with deep historical data from distant events. Something will be needed to fill in the gaps that were mysteries at the time. That gap filler will be spontaneous data, whether we acknowledge it or not. Even if we as humans leave the gap unfilled, we can’t be sure that our data analytics or machine learning algorithms won’t fill it. When they do, how can we be sure they won’t come up with a supernatural explanation that they keep to themselves?
For those who were surprised by this recent election, be prepared to be even more surprised by the next one.
The advantage of a data-on-read strategy is that it separates the process of collecting data from the process of applying a schema to interpret the results. We can learn more easily that our prior knowledge was wrong when we keep that prior knowledge out of the data store.
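To make the separation concrete, here is a minimal sketch of applying a schema at read time. The record fields, the `read_with_schema` helper, and both schemas are hypothetical, invented only for illustration. The raw store is never modified, so a later, revised interpretation can be applied to the same collected data.

```python
import json

# Raw events are stored exactly as collected; no schema is applied at write time.
# (Hypothetical sample records, for illustration only.)
RAW_STORE = [
    '{"ts": "2016-11-01T09:00:00", "user": "alice", "amount": "12.50"}',
    '{"ts": "2016-11-01T09:05:00", "user": "bob", "amount": "n/a"}',
    '{"ts": "2016-11-02T10:00:00", "user": "alice", "amount": "7.25", "note": "refund?"}',
]

def read_with_schema(raw_lines, schema):
    """Apply a schema (field name -> parser) at read time.

    Fields that are missing or cannot be parsed become None instead of
    being rejected, so the raw data stays available for other readings.
    """
    rows = []
    for line in raw_lines:
        record = json.loads(line)
        row = {}
        for field, parse in schema.items():
            try:
                row[field] = parse(record[field])
            except (KeyError, ValueError, TypeError):
                row[field] = None
        rows.append(row)
    return rows

# First interpretation: every record is a numeric purchase.
purchase_schema = {"user": str, "amount": float}
# Revised interpretation: the free-text note may matter as well.
revised_schema = {"user": str, "amount": float, "note": str}

purchases = read_with_schema(RAW_STORE, purchase_schema)
revised = read_with_schema(RAW_STORE, revised_schema)
```

Had the first schema been enforced when the data was written, the record with the unparseable amount and the unexpected note field would likely have been dropped or coerced, and the evidence that the prior assumption was incomplete would be gone.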
For the project of knowledge or hypothesis discovery, this sharding of history is more valuable than attempting a historical report using the operational database. The sharded history retains the context of the data. For a business example, assume a report for the previous period involves some action by an employee who has since been promoted to a different position. Using the operational database for this historical information will naturally return the erroneous result that the new position was responsible for the prior action, when in fact that action was done in the capacity of the older position.
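The promotion example can be sketched in a few lines. Everything here is hypothetical: the employee ID, the position titles, the period labels, and the two report functions are assumptions made only to show the difference between joining past actions against current state and joining them against a snapshot kept for that period.

```python
# Operational table: holds only the CURRENT position of each employee.
operational_positions = {"e42": "Regional Manager"}

# Action log: who did what, and in which reporting period.
actions = [{"employee": "e42", "period": "2016-Q1", "action": "approved discount"}]

# Sharded history: one self-contained snapshot of positions per period.
history_shards = {
    "2016-Q1": {"e42": "Sales Associate"},
    "2016-Q2": {"e42": "Regional Manager"},
}

def report_from_operational(period):
    """Join past actions against today's positions (the erroneous approach)."""
    return [
        (a["action"], operational_positions[a["employee"]])
        for a in actions
        if a["period"] == period
    ]

def report_from_shard(period):
    """Join past actions against the positions recorded in that period's shard."""
    positions = history_shards[period]
    return [
        (a["action"], positions[a["employee"]])
        for a in actions
        if a["period"] == period
    ]
```

For the first quarter, `report_from_operational` attributes the discount approval to the Regional Manager position, while `report_from_shard` attributes it to the Sales Associate position the employee actually held at the time.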
The potential return for exploiting operational data will not justify the investment. This return is naturally limited by the short time period available to take advantage of the opportunity. The window of opportunity is short because new operational data will present distractions of new opportunities to pursue. In addition, competitors and customers are employing their own operational data intelligence, so they will quickly close any advantage gap. Unfortunately, this investment distracts the organization away from historical data that offers more durable knowledge discovery.
Legacy applications can benefit from big data approaches without the need to replace the legacy architecture with new technologies. Instead, big data can augment the application by collecting higher volume, variety, and velocity data about the user’s activity while using the application. Analysis of this data can show decision makers where there may be problems with the work-products. Correspondingly, it can provide requirements analysts with information about where improvements are needed, or with a more complete library of edge cases to consider for new designs.
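One possible way to collect such activity data without rewriting the legacy code is to wrap existing entry points so that each call is recorded. The sketch below is only an assumption about how this might look: `legacy_parse_quantity` is a hypothetical stand-in for a legacy routine, and the in-memory `ACTIVITY_LOG` stands in for whatever high-volume event store would really be used.

```python
import functools
import time
from collections import Counter

ACTIVITY_LOG = []  # stand-in for a real high-volume event store

def record_activity(func):
    """Wrap a legacy function so each call is logged without changing its behavior."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        event = {"action": func.__name__, "args": args, "start": time.time(), "ok": True}
        try:
            return func(*args, **kwargs)
        except Exception as exc:
            event["ok"] = False
            event["error"] = type(exc).__name__
            raise  # the caller sees exactly the same failure as before
        finally:
            event["seconds"] = time.time() - event["start"]
            ACTIVITY_LOG.append(event)
    return wrapper

@record_activity
def legacy_parse_quantity(text):
    """Hypothetical legacy routine, untouched apart from the decorator."""
    return int(text)

def failures_by_action(log):
    """Count failed calls per action: where the work-products may have problems."""
    return Counter(e["action"] for e in log if not e["ok"])

def edge_cases(log):
    """Collect the inputs that caused failures: raw material for new designs."""
    return [e["args"] for e in log if not e["ok"]]

# Simulated user activity, including inputs the legacy routine cannot handle.
for user_input in ["3", "12", "", "4.5"]:
    try:
        legacy_parse_quantity(user_input)
    except ValueError:
        pass  # the legacy application would show its usual error message
```

The failure counts point decision makers at problem areas, and the collected failing inputs give requirements analysts a starting library of edge cases, all while the legacy routine itself stays unchanged.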