I offer a new word to inspire this post: dedodemocracy, or government by data. In the very recent past, there has been a rapid shift toward exploiting large data collections to change the way our government makes policies. We are increasingly using large historical data stores to replace statistical approaches that rely on smaller sample sizes. We are also fine-tuning policies to apply to smaller subgroups, getting closer to much more individualized policies. My recent posts suggested how this may be happening in health care policy, in particular in the provisioning of health care to control costs.
We are increasingly becoming a government by data instead of by the people. This is accelerated by the way more and more policy making is being transferred from democratically elected legislatures to bureaucracies that are largely out of reach of democratic control. In part, the two trends reinforce each other: policy relies on data, and for various practical reasons that data is available only to the agencies that make that policy.
Data technologies are improving at a rapid pace. We are quickly removing technical limitations that in the past made it impractical to share data outside of small groups of analysts. I mentioned in another post that we should open to the public any data whose collection the government justifies on the idea that there is no expectation of privacy for that data. For example, any bulk collection of metadata that is justified as not requiring a court order or warrant should be available to the public at large as quickly as it is available to an agency’s analysts. Earlier objections that this was not technically feasible are increasingly indefensible. Technology can allow a large base of users to access the same data with their own queries.
The remaining barriers are political in nature. Those barriers will remain in place for a little more time because we haven’t really discussed the implications of dedodemocracy. By not seeking a specific warrant for its collection, the government in effect admits this is public data. That data is accessible to citizens who happen to be employed by the government. There is little qualitative difference between such an analyst and similarly trained analysts who are not employed by the government. And only training distinguishes the trained analysts from the general population.
As we increasingly rely on data to make and enforce policy, we will recognize that we need more democratic participation in that data project. Eventually we will object to the government’s exclusive holding of, and access to, the data it uses to govern our lives. Eventually, we will want to have open data: data available to all.
Open data initiatives are already starting and some of them are gaining momentum. The government itself has begun to make its data more open. These are small starts because the government data released so far is generally post-analysis data. The currently inaccessible bulk pre-analysis data should also be open data. Assuming that we will remove the technical barriers that keep the entire population from accessing this data, only policy remains to prevent this from occurring.
To return to the comparison of analysts: there is little difference between trained data analysts employed within government (civil service or contractors) and trained data analysts outside of government employment. Only training distinguishes trained data analysts from the general population.
If the future of government is government by data, then we should be training everyone to become data analysts. Data analysis skills should be taught as part of the basic school curriculum. Data skills are different from other primary or secondary curricula. They include scrutinizing the qualities of the data itself. By data qualities, I am referring to the distinctions I have been discussing in earlier posts, partially summarized in my proposed taxonomy. Those concepts should be expanded to include other uncertainties (such as precision, accuracy, or missing data) and to include good practices in interpreting query results.
It is easy to see that eventually we’ll be demanding much more participation by citizens in querying the available data in order to properly participate in modern democratic government. Even today, many of the current debates of modern politics depend on data to which access is highly restricted, either available to only one side of the argument or closed to the general population. As citizens who need to be persuaded by the arguments, we need better access to that data.
The primary barrier to that access is training. The training is not hard; it could be done as part of elementary or secondary education. Unfortunately, the current focus of education appears uninterested in this type of training. We are more concerned about the skills of earlier generations: math and sciences (and both are mostly about memorizing historical mathematical or scientific discoveries). We may be better off discarding some of that emphasis in favor of building the skills necessary to make new discoveries from analyzing data.
In my taxonomy, I emphasize the highest quality data as recorded observations that are obtained from very well documented and well controlled methods. Ideal observations are accurate and precise. Ideal observations are completely free of any influences imposed by prior theories or hypotheses. I described how dark data (data generated by models to replace observations) and forbidden data (observed data that models reject) can bias the data in favor of confirming old ideas instead of discovering new ideas. These model-influenced types of data are necessary for quality control reasons, but we should at least be able to distinguish them from more valuable direct observations.
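As a rough illustration of what distinguishing these kinds of data might look like in a student exercise, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the records are invented, and the `source` labels "observed" and "modeled" are hypothetical stand-ins for direct observations and dark data. It simply summarizes each kind of data separately and counts missing values, rather than blending everything into one average.

```python
from statistics import mean

# Hypothetical records invented for this sketch. Each reading carries a
# provenance label: "observed" for a direct measurement, "modeled" for a
# value a model filled in (dark data). A value of None is missing data.
readings = [
    {"site": "A", "value": 10.0, "source": "observed"},
    {"site": "B", "value": 12.0, "source": "observed"},
    {"site": "C", "value": 11.0, "source": "modeled"},
    {"site": "D", "value": None, "source": "observed"},
    {"site": "E", "value": 20.0, "source": "modeled"},
]


def summarize(records):
    """Report record count, missing count, and mean for each provenance label."""
    summary = {}
    for label in sorted({r["source"] for r in records}):
        group = [r for r in records if r["source"] == label]
        present = [r["value"] for r in group if r["value"] is not None]
        summary[label] = {
            "count": len(group),
            "missing": len(group) - len(present),
            "mean": mean(present) if present else None,
        }
    return summary


result = summarize(readings)
for label, stats in result.items():
    print(label, stats)
```

Keeping the two groups apart makes it visible when model-generated values pull a result away from what was actually observed, and how much of the observed record is missing.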
The goal of making public policies is that new policies will be relevant to current conditions. This demands avoiding propagation of obsolete notions into the present data. We need to find in data what is reality today.
The trend in public policy making is to become much more focused on smaller subgroups. Policy is becoming more specific to particular categories of populations, locations, or circumstances. My recent posts explored some of that specificity occurring in health care policies. In order to debate these policies, we need skills to recognize what data is relevant to these debates and to recognize the relative value provided by the different parts of that data.
These are not hard skills to learn. They are skills that are best learned through repeated practice. The ideal time to learn these skills is during the period of primary and secondary education. Starting data science training in the third grade provides nearly a decade of practice in working with data. Also, in about a decade it will be necessary for adults to be able to analyze data for themselves if they want to participate in policy debates.
Two things must happen immediately to prepare the next generation for dedodemocracy. The first, as mentioned, is to start introducing data analysis or data science skill training in primary and secondary education, with continuous practice throughout the entire period, perhaps starting in the third grade.
The second is to give these students access to real data to practice their skills. We need to encourage faster development of open data projects that make available real and current data for students to use in their training.
Most primary and secondary topics can be learned from information printed in books. Specific lessons usually fit on a single page or even in a single paragraph. Data skills are completely different. The volume of data is impractical to print in books, and printed data is impractical to analyze. Data skill training requires access to databases. These databases need to hold real-world data that is relevant to the exercises. Ideally that data will come from the government’s open data initiatives. Ideally these initiatives will continue to expand to expose all data available to government-employed analysts.