The advantage of a data-on-read strategy is that it separates the process of collecting data from the process of applying a schema to interpret the results. We can learn more easily that our prior knowledge was wrong when we keep that prior knowledge out of the data store.
For the project of knowledge or hypothesis discovery, this sharding of history is more valuable than attempting a historical report using the operational database. The sharded history retains the context of the data. As a business example, assume a report for the previous period involves some action by an employee who has since been promoted to a different position. Using the operational database for this historical information will naturally return the erroneous result that the new position was responsible for the prior action, when in fact that action was done in the capacity of the older position.
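To make the promotion example concrete, here is a minimal Python sketch. Everything in it is hypothetical and chosen only for illustration: the employee id, the position titles, the `Action` record, and the two report functions. It assumes the historical record stores the position held at the time of the action, while the operational table stores only the current position.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    """A past action, stored with the context that applied when it happened."""
    employee_id: int
    description: str
    position_at_time: str  # hypothetical field: position held when the action occurred


# Operational table (hypothetical): knows only the employee's current position.
current_position = {101: "Regional Manager"}  # promoted since the action

# Historical record (hypothetical): retains the position held at the time.
history = [Action(101, "approved vendor contract", "Sales Associate")]


def report_from_operational(actions, positions):
    """Join past actions to today's positions: attributes the action to the new position."""
    return [(a.description, positions[a.employee_id]) for a in actions]


def report_from_history(actions):
    """Use the context stored with each action: attributes it to the older position."""
    return [(a.description, a.position_at_time) for a in actions]


operational_report = report_from_operational(history, current_position)
historical_report = report_from_history(history)
```

The two reports describe the same action but attribute it to different positions. Only the second preserves the capacity in which the action was actually taken.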
The potential return for exploiting operational data will not justify the investment. This return is naturally limited by the short time available to take advantage of the opportunity. The window of opportunity is short because new operational data will present the distraction of new opportunities to pursue. Competitors and customers are also employing their own operational data intelligence, so they will quickly close any advantage gap. Unfortunately, this investment distracts the organization from historical data that offers more durable knowledge discovery.
In an earlier post, I presented some interactive reporting based on custom categorization and aggregation of data available from Capital Bikeshare. Those reports used Excel pivot tools and SQL Server Reporting Services, using both relational T-SQL and an Analysis Services cube I constructed to make the desired navigation and aggregation easier to report. My eventual…
With the modern speed of data retrieval, analysis, and visualization, we may be encountering a new form of the logical fallacy of appealing to authority, where the authority comes from the speed at which we can present affirming data for our theses. Assuming that human behavior is a product of evolution, there has not been enough time for evolution to adapt to the new reality of nearly instant affirmation of some consequent. Historically, we learned that we could trust affirming data if it arrived quickly. Before modern data technologies, the speed of finding affirming data was an indication that affirming data was abundant around us, so it didn’t take long to find. That particular mode of thinking is no longer valid with modern data technologies. Instant access to a wide variety of data makes it possible to find affirming data very quickly. It will take a few generations for evolution to catch up and teach us not to trust speed of affirmation as proof of some hypothesis.
Following the lessons from computer neural networks, we should recognize that intelligence in an organizational neural network arises within the network itself. It does not depend on hierarchical decision makers. Neural-network organizations have no need for individually accountable human decision makers such as managers or officers. Such an outcome is consistent with the goals of evidence-based decision making, which ideally obligates decisions based on the evidence alone and not on the whim of a designated leader.
Data deception is a concern for automated decision making based on data analytics (such as in my hypothetical dedomenocracy). I think it is already a concern with our current democracy. I fear the current enthusiasm for data technologies because I do not see much appreciation for the possibility of deception. There is huge confidence in the combined power of large amounts of data and sophisticated statistical tools (such as machine learning). Missing from our consideration is how well the data actually captures the real world. The data is not necessarily an honest representation of what is happening in the real world. It is very possible that the data includes deliberate deception.