Each morning it takes me about 30 minutes from getting out of bed to being in a position to begin writing. The routine is usually extraordinarily mundane (get up, get dressed, make bed, etc.). However, some days take much longer than others. Sometimes, my computer insists on being rebooted to install some update for one reason or another. Sometimes, I do my routine out of order. For example, an efficient morning involves going down to start heating water for my morning tea before I make my bed so that the water is boiling by the time I’m done. If I make the bed first, then I’m waiting a few minutes for the water to boil. Sometimes, I don’t even bother making the bed because I am not sure I won’t want to return to it.
I am using this trivial example as a metaphor for some work activity. I imagine a scenario where this 30-minute period is a billable task. I may want to track my morning activities so I can defend why it sometimes takes longer than 30 minutes. I may also want to analyze the recorded data to see how varied my mornings are.
Recently, activity tracker apps that exploit mobile technologies such as smart phones have been growing in popularity. These apps often offer an option to save tracking data at a central location, and this makes possible broader analysis of activities across entire populations.
For example, this article describes an analysis of sleep patterns interrupted by a nighttime earthquake near Napa, California in August. In this particular case, the data came from a wearable bracelet designed to monitor sleep quality, so it did not require human input, and the disrupting event was independently recorded. I find the study interesting. It shows the surprising result that in the area hit strongest, 20% of the wearers managed to sleep through the earthquake. Also, only about 10% of those awakened by the earthquake were unable to fall back to sleep.
Imagine if I were tracking my morning wake-up routine and suddenly needed to account for my morning being disrupted by this same earthquake. This data could show that while I was not alone in having my morning disrupted, others managed to get through it with less disruption. This could be useful to know, even for something as trivial as tracking a morning routine.
Earlier (here and here), I wondered about tracking activities of government employees in such a way that their activity data would be available to the public so that we can evaluate the diligence of their work toward introducing new regulations or performing some duty. Activity tracking needs to be precise enough to distinguish different kinds of activities and yet still broad enough to be easy to record and not be too intrusive. The above sleep tracker app is a reasonable model because it records the occurrence of sleep, but not the details such as whether the sleep was on a couch, a bed, or the floor. In my discussion about government workers, I am interested primarily in matching their activities with the results of their departments. For example, if they record an activity of attending a meeting, there should be preceding preparatory activity that is appropriate for that meeting. The level of detail of activity tracking is comparable to the above sleep tracker where sleep is an analogy to constructive work activity.
When an agency releases a new controversial regulation, the public would benefit if they had access to activity data to show that the appropriate employees in the agency have invested appropriate levels of effort toward a diligent professional construction of the regulation. Take for example the recent EPA announcement of proposed new tighter regulations on ground level ozone limits:
On November 25, 2014, the EPA proposed to strengthen the National Ambient Air Quality Standards (NAAQS) for ground-level ozone, based on extensive scientific evidence about ozone’s effects.
This proposal is controversial because it follows shortly after a previous reduction that industry is still trying to meet, and because the even lower limits may not be affordable:
Leaked details of the proposal drew sharp criticism from industry groups, which argue that tighter restrictions will lead to higher costs and losses in jobs and economic productivity.
Our laws permit the existence of the EPA and grant it the power to make regulations involving clean air. We should expect burdensome recommendations as a result of its duty to execute the Clean Air Act. New regulations will produce a burden. The question is whether the EPA has sufficiently justified the new burden through internal review.
I think we need more data to show that the EPA invested appropriate efforts in preparing this recommendation. We need quantification of their assertion that this was “based on extensive scientific evidence”. The quantification should tell us that appropriately qualified staff invested an appropriate amount of time reviewing scientific evidence of both costs and benefits, and that this activity included sufficient opportunities for internal debates. Assuming that these internal debates occur during meetings, we would like to observe that the participants invested an appropriate amount of time preparing for each meeting.
As I mentioned in the earlier posts on tracking government employee activities, this is tracking broad activities such as spending time in a meeting, or spending time working on a desktop application. The tracking is coded to activity categories instead of a detailed record of the content of their individual contributions. The desired information is that the new regulations were the result of an appropriate amount of time on the appropriate activities by the appropriate staff. We need more than the information currently available, which simply informs us that the EPA employs certain staff who managed to earn their annual salary and bonus. We want to know that they actually did what we expect them to do in constructing a new regulation that will burden us.
As I discussed earlier, we are moving away from a democratic government toward an autocratic government by data where we are obligated to accept the recommendations of extensive scientific evidence without popular democratic debate such as in Congress. Better public access to data about the investment of resources that go into these decisions can permit our continued voluntary consent to be governed by these regulations.
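As a rough illustration of the kind of summary I have in mind, here is a minimal sketch in Python. The entries, skill categories, and activity categories are all made up for the example, and the function name `minutes_by_staff_and_activity` is my own invention. A real system would draw these from the recorded tracking data.

```python
from collections import defaultdict

# Hypothetical tracking entries: (skill category, activity category, minutes).
# No individual is named and no content is recorded, only coded activity.
entries = [
    ("Economist", "meeting preparation", 90),
    ("Economist", "meeting", 60),
    ("Economist", "meeting", 30),
    ("Toxicologist", "evidence review", 240),
    ("Toxicologist", "meeting", 60),
]


def minutes_by_staff_and_activity(entries):
    """Total the recorded minutes for each (skill, activity) pair."""
    totals = defaultdict(int)
    for skill, activity, minutes in entries:
        totals[(skill, activity)] += minutes
    return dict(totals)


totals = minutes_by_staff_and_activity(entries)
for (skill, activity), minutes in sorted(totals.items()):
    print(f"{skill:15} {activity:22} {minutes:4d} min")
```

A table like this would not tell us what was said in any meeting, but it would let the public see whether the staff with the relevant skills spent time on the relevant kinds of work.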
Enabling rigorous activity tracking on Government employees and contractors can provide this information to assure us that they are doing the right kinds of activities with the right amounts of time to justify the regulation.
At the beginning of this post, I presented a trivial scenario of getting up in the morning. Using this as an analogy for a job activity, our current information about government staff is very general. We know that the government hires certain people and that they are in good standing to remain employed for the year. From this we can assume they are getting up in the morning but this is dark data (my term for model-generated data) in the sense that we have no actual data to back this up.
It is very safe to assume that if a person shows up to work, he must have gotten out of bed. However, even this assumption can be challenged with the modern use of flexible teleworking and compressed workweeks. We do not know if people actually get out of bed when they show up to work. More significantly, with modern government practices of flexible work schedules and compressed workweeks, we cannot be assured that the right people will be present when critical meetings or events occur. Knowing that the right person was on the staff and that he earned a full year’s pay does not tell us if he actually participated in the meetings or made the contributions that we expect from him for a particular regulation. We need more detailed tracking of activities to see that his activities made reasonable contributions to key decisions.
For example, I assume the EPA hires experts whose job it is to understand what industry is capable of doing and how much it will cost them. Our confidence in the decision would be enhanced by observing data that these experts did appropriate kinds of activities to prepare for key decisions and to present their counter-arguments.
I imagine a future system that makes available to the public the meta-data on all of the activities performed by everyone working for the government (government staff and contractors). This would be meta-data to describe the nature of the task but not include the content or identify the individuals performing them other than by their skill category and grade level. The tasks would be coded into a master list of possible codes, and each code will indicate the nature of the work, the justification for the work, and whether the work was newly initiated or a follow-on from earlier work.
I’m envisioning a master list similar to the medical diagnostic codes of ICD-10 but using modern data technologies and practices instead of ICD-10’s assumptions of 1970s databases. In contrast to ICD-10’s concept of single-character codes, each chosen from a range of options and collected into a kind of aggregate key, this list would have natural language descriptions for each option to select from an online form. Instead of selecting entire codes in one operation, the form would present appropriate sub-options once the higher-level option is selected. Modern analogies are forms that first ask for country and then provide a list of options for states and provinces in that specific country, and then provide a list of appropriate cities for that state or province. Instead of geographic information, this form would present work-activity information in a successive manner.
Unlike the ICD-10 approach that relies on annual updates of fixed lists (tabular text files) with the assumption that these would be read or memorized by humans, my suggestion would be a central server providing the most current list to online access devices, typically personal electronics devices. The personal electronics devices can cache the master list during periods where there is no network access, but they will synchronize with the master list once network access is restored. This is important because I expect the list to be updated frequently.
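To make the shape of this meta-data a little more concrete, here is a minimal sketch in Python. Every name in it (`ActivityCode`, `ActivityRecord`, `Origin`, the field names, and the sample values) is hypothetical and only illustrates the fields described above. It is not a proposal for an actual schema.

```python
import uuid
from dataclasses import dataclass
from enum import Enum


class Origin(Enum):
    """Whether the work was newly initiated or follows on from earlier work."""
    INITIATED = "initiated"
    FOLLOW_ON = "follow-on"


@dataclass(frozen=True)
class ActivityCode:
    """One entry in the master list of possible codes (hypothetical fields)."""
    key: uuid.UUID
    nature: str
    justification: str
    origin: Origin


@dataclass(frozen=True)
class ActivityRecord:
    """One published record: no content and no name, only category and grade."""
    code_key: uuid.UUID
    skill_category: str
    grade_level: str
    minutes: int


# Made-up example values.
meeting_prep = ActivityCode(
    key=uuid.uuid4(),
    nature="Prepare for a meeting",
    justification="Scheduled regulatory review",
    origin=Origin.FOLLOW_ON,
)
record = ActivityRecord(
    code_key=meeting_prep.key,
    skill_category="Environmental scientist",
    grade_level="GS-13",
    minutes=45,
)
```

The record deliberately has no field for a person's name or for the content of the work, which is the privacy boundary I am suggesting.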
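The successive form described above could be sketched roughly as follows in Python. The activity names in `ACTIVITY_TREE` and the helper `options_at` are my own placeholders, assumed purely for illustration.

```python
# Hypothetical hierarchy of work activities, described in natural language.
# Each key is an option; its value holds the sub-options offered next.
ACTIVITY_TREE = {
    "Meeting": {
        "Preparing for a meeting": {},
        "Attending a meeting": {
            "Presenting": {},
            "Participating": {},
        },
    },
    "Desk work": {
        "Reviewing scientific evidence": {},
        "Drafting a document": {},
    },
}


def options_at(tree, path):
    """Return the options to show after the choices in `path` have been made."""
    node = tree
    for choice in path:
        node = node[choice]
    return list(node)


# The form first shows the top level, then narrows with each selection.
print(options_at(ACTIVITY_TREE, []))
print(options_at(ACTIVITY_TREE, ["Meeting"]))
print(options_at(ACTIVITY_TREE, ["Meeting", "Attending a meeting"]))
```

An empty list of options means the user has reached a leaf and the selection is complete, just as a city is the last step in the country, state, city example.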
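Here is a minimal Python sketch of that cache-and-synchronize behavior. The classes `MasterList` and `DeviceCache` and the simple version counter are assumptions of mine; a real service would need authentication, incremental updates, and conflict handling that I am ignoring here.

```python
class MasterList:
    """Stand-in for the central server holding the current list of codes."""

    def __init__(self):
        self.version = 0
        self.codes = {}

    def publish(self, key, description):
        """Add or update a code and bump the list version."""
        self.codes[key] = description
        self.version += 1


class DeviceCache:
    """Copy of the master list kept on a personal device."""

    def __init__(self):
        self.version = -1  # nothing cached yet
        self.codes = {}

    def sync(self, server, online):
        """Refresh the cache if the network is up and the list has changed.

        Returns True when the cache was updated.
        """
        if not online or server.version == self.version:
            return False
        self.codes = dict(server.codes)
        self.version = server.version
        return True


server = MasterList()
server.publish("code-1", "Attending a meeting")

device = DeviceCache()
device.sync(server, online=False)  # offline: keeps using the stale cache
device.sync(server, online=True)   # back online: picks up the current list
```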
I am critical of the ICD-10 codes because the entire concept appears based on assumptions of data technologies that existed before the Internet, and certainly before modern mobile app technologies. The codes appear to be intended to be read by humans (if not memorized) and entered manually into some form. If the ICD-10 were designed fresh today, I would imagine it would not look anything like what the medical industry must implement fully by next year. For one thing, a modern design would use indecipherable GUIDs that would key to descriptions of each condition. The conditions would be organized through user-interfaces taking advantage of fields of the description record instead of characters in the key itself.
The design would allow for quicker adaptation to capture new variations as they come up instead of the current requirement to select a close approximation to the current situation. For example, in ICD-10, there is a single code for the Ebola Virus Disease (A984). However, recent experience of Ebola Virus Disease (EVD) in the USA involved at least 5 variations of the condition that would be better described with separate codes. In recent months, the US medical system experienced cases where:
- Individuals need to be monitored because they came in close contact with a person infected with EVD. In at least one case, this involved a form of quarantine inside a hospital although the patient did not have EVD.
- EVD patients caught very early when symptoms were mild and treatment with anti-viral medication and blood transfusions eliminated the virus from the body.
- EVD patients with far advanced symptoms of hemorrhagic fever and organ failure and far higher infectiousness to others. These patients best qualify for the A984 code, are in need of the most extensive treatment, and will experience the highest mortality rate (about 50% in US hospitals).
- Death resulting from EVD.
- Recovered EVD patients where the virus is no longer in the blood but may still be present elsewhere, such as in men’s semen, so as to require continued restrictions and monitoring.
While the A984 code can apply to all of these cases, this will not help much for data analysis because each of these requires drastically different procedures and has very different outcomes. For data analytics, it would be better to have more codes. Also, the first example in the above list was a condition we apparently did not anticipate because we implemented changes in protocols after the disease was observed in the US. We need a coding approach that allows for new codes to be introduced and disseminated much faster than once a year followed by mandatory retraining to learn the new codes.
At the beginning of this post, I described a trivial but analogous scenario of getting out of bed in the morning. There could be a single code for getting out of bed, and this is an activity that typically stretches over 30 minutes. I would rather have separate codes for getting dressed, making the bed, preparing the tea, starting up the computer, etc. Even with this in place, I want to have an opportunity to enter new codes such as the above example of an earthquake that, if it had occurred, would have interrupted my sleep and would probably have had me checking around the house to make sure there was no damage. I would want the ability to add a new code (or multiple codes) for that to distinguish this from my normal activities.
Another complaint about the ICD-10 is its approach to handling a condition that is not otherwise listed. For example, the A98 code is for “Other viral hemorrhagic fevers, not elsewhere classified”. It is laughable that in 2015 we are requiring the medical profession to use such a 1980s concept of a single code for “none of the above”. If we suddenly encounter multiple new viral hemorrhagic fevers, all of them would necessarily get the same code even if their treatment and prognosis would be very different. Again, this may have been a practical necessity in the 1980s when lists had to be published in paper form and updated annually (or less frequently). Although it is highly unlikely that we will see even a single unspecified hemorrhagic fever, let alone multiple new ones, we no longer have a technological constraint that requires a single code if they do occur. If given the opportunity to design ICD-10 with modern practices, we would not even think of having such a default condition at all. Each “not previously specified” condition would be submitted as a brand new code with its description. If suddenly there are 10,000 patients with hemorrhagic fever not otherwise specified, we would end up with 10,000 new codes, one for each. Later we may identify these as identical, and the solution would be to build a code-substitution table that translates the original codes into the common code for the now identified condition.
My point in criticizing the archaic design of ICD-10 is that although I’m using the concept as a model for my activity tracker for government workers, I would expect a fresh modern database design approach instead of the prior design of human-decipherable keys that are composites of single-character sub-codes. The key values should be GUIDs (globally unique identifiers) so we can maintain a single global data set that allows individual workers to add new activities that do not fit well with the predefined set without risking duplicate keys.
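A code-substitution table of this kind can be sketched in a few lines of Python. The function `resolve`, the variable names, and the three-patient example are my own illustration, assuming the table is a simple mapping from an original code to its replacement.

```python
import uuid


def resolve(code, substitutions):
    """Follow the substitution table until reaching a code with no replacement.

    The `seen` set guards against an accidental cycle in the table.
    """
    seen = set()
    while code in substitutions and code not in seen:
        seen.add(code)
        code = substitutions[code]
    return code


# Three patients, each initially given a brand new code (made-up data).
patient_codes = [uuid.uuid4() for _ in range(3)]

# Later, the first two are identified as the same condition.
common_code = uuid.uuid4()
substitutions = {
    patient_codes[0]: common_code,
    patient_codes[1]: common_code,
}

resolved = [resolve(code, substitutions) for code in patient_codes]
```

The original records never need to be rewritten. Analysis simply translates codes through the table at query time, so a later correction only requires a new row in the table.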
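A small Python sketch shows why GUID keys make this safe. It uses the standard library's `uuid.uuid4`, which generates random identifiers; the helper `new_activity` and the sample descriptions are hypothetical.

```python
import uuid


def new_activity(description):
    """Create a new activity entry keyed by a freshly generated GUID."""
    return uuid.uuid4(), description


# Two workers add activities independently, with no coordination at all.
worker_a = dict([new_activity("Respond to a public records request")])
worker_b = dict([
    new_activity("Respond to a public records request"),
    new_activity("Brief a visiting delegation"),
])

# Merging into the global data set cannot overwrite anyone's entry in
# practice, because random GUIDs are overwhelmingly unlikely to collide.
merged = {**worker_a, **worker_b}
print(len(merged))
```

Note that the two workers entered the same description under different keys. The keys stay distinct, and recognizing the entries as the same activity becomes a later reconciliation step rather than something the key format has to prevent up front.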
I am imagining a government worker who will be entering these activity codes throughout the day, typically perhaps 5-6 times an hour. He will be using his personal electronics device (such as a government-approved smart-phone) that would use smart technology to prioritize his options to be the ones he is most likely to use based on his history and the current time of day. Ninety percent of the time, he would be able to select the right code with a single touch of available options or by scrolling just a couple screens of options. Occasionally he may encounter something new to him but is common enough that he can find a matching condition with a search option.
Often he will not have time to search at that moment, so he will select an option for “to be determined later”. Each time he makes this choice, there will be a fresh code created (a fresh GUID) as a placeholder for this undefined task. I imagine work rules that will require the worker to resolve each of these before the end of his day. When he updates a previously undefined task, there will be a mapping entry to translate the freshly created code to the code in the standardized list. Alternatively, he may find no suitable description and provide his own description for this newly created code. The supervisor or other higher authorities may work with him to map this to an appropriate existing code or submit his option as a new code for everyone else to use.
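The workflow in this paragraph could look something like the following Python sketch. The class `ActivityLog` and its method names are hypothetical, and I am assuming that "resolved" means the placeholder either has a mapping to a standard code or carries the worker's own description.

```python
import uuid


class ActivityLog:
    """One worker's entries for a day, including placeholder codes."""

    def __init__(self):
        self.entries = []        # codes in the order they were entered
        self.placeholders = set()
        self.mapping = {}        # placeholder -> code in the standard list
        self.descriptions = {}   # placeholder -> worker's own description

    def log(self, code):
        """Record an activity that already has a standard code."""
        self.entries.append(code)

    def log_tbd(self):
        """Record an undefined task under a freshly created GUID."""
        placeholder = uuid.uuid4()
        self.placeholders.add(placeholder)
        self.entries.append(placeholder)
        return placeholder

    def map_to_standard(self, placeholder, standard_code):
        """Resolve a placeholder by pointing it at an existing code."""
        self.mapping[placeholder] = standard_code

    def describe(self, placeholder, text):
        """Resolve a placeholder by proposing a new description for it."""
        self.descriptions[placeholder] = text

    def unresolved(self):
        """Placeholders still needing attention before the end of the day."""
        return [
            code for code in self.entries
            if code in self.placeholders
            and code not in self.mapping
            and code not in self.descriptions
        ]


standard_meeting = uuid.uuid4()  # made-up code from the standard list
log = ActivityLog()
log.log(standard_meeting)
first = log.log_tbd()
second = log.log_tbd()
log.map_to_standard(first, standard_meeting)
log.describe(second, "Escorted an unscheduled site inspector")
```

An end-of-day rule would then amount to checking that `log.unresolved()` is empty.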
Unlike the ICD-10 codes, the keys in this system will not have any human-recognizable information in the key itself. Instead the key will reference a record involving multiple columns (or dimensions) to associate the code with closely related codes. Analysis of this data will use the columns (not the keys) for queries about different activities.
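As a rough Python sketch of what querying by columns rather than keys might look like, consider the following. The column names (`category`, `phase`) and the helper `select` are assumptions I made up for the example.

```python
import uuid

# Each opaque key references a record with descriptive columns.
# The column names and values here are hypothetical.
codes = {
    uuid.uuid4(): {"category": "meeting", "phase": "preparation",
                   "description": "Preparing for a meeting"},
    uuid.uuid4(): {"category": "meeting", "phase": "attendance",
                   "description": "Attending a meeting"},
    uuid.uuid4(): {"category": "desk work", "phase": "review",
                   "description": "Reviewing scientific evidence"},
}


def select(codes, **criteria):
    """Return the keys whose records match every given column value."""
    return {
        key for key, row in codes.items()
        if all(row.get(column) == value for column, value in criteria.items())
    }


meeting_keys = select(codes, category="meeting")
prep_keys = select(codes, category="meeting", phase="preparation")
```

Nothing about the key itself says "meeting". The relationship between closely related codes lives entirely in the columns, so new columns can be added later without renumbering anything.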
Although the ICD-10 codes for medical diagnosis cover over 90,000 different codes, these codes could be enumerated in advance based on extensive historical records of medical diagnosis. Even with this number, as I mentioned above, there are many options for “not otherwise specified” subcategories.
I would expect that coding for government work activities will result in at least as many different codes but with the disadvantage that we have no prior record of what these activities might be. Adopting a coding approach that inherently allows for a dynamically growing list of activities will permit us to eventually accumulate a useful separation of codes to define the range of work products performed by government workers. Also, this can allow us to implement this much faster than the 10 years it took to design the ICD-10 standard and the 20 years it has taken to adopt it.
We have an immediate need to provide better public visibility into the activities of government staff. Adopting a flexible GUID-based coding scheme for government work products can permit a rapid implementation of this requirement.