Bright Data make trains run on time

This conference video demonstrates extensive instrumentation of an entire public infrastructure project (in this case, trains). The data gives central controllers real-time information on virtually every aspect of the operation, from the mechanical operation of individual trains to the number of people waiting at stations.

The presenter describes this particular demonstration as an example of the future of the Internet of Things, where every object is connected and remotely accessible. Although the demonstration was about trains, the concept is applicable to everything from commuter trains to sewer systems. The speaker asserts that every industry and government needs the tremendous efficiency that comes from remote experts when it is done right.

The problem with that last statement was who was not in that room. This is an industry conference for industries or bureaucrats from executive branches of government. Despite the claim that this represents the future of the Internet of Everything, the later descriptions of the architecture make it clear these are private intranets. The architecture is globally distributed and extensive like the Internet. But, unlike the Internet, the public is excluded. That exclusion is explicit in the brief mention of security that obviously has to occur everywhere. The diagram showing where security applies implies that security in this context means keeping unauthorized people out. The public is excluded.

I am reminded of the apocryphal statement about Mussolini’s government making the trains run on time. Whether this happened or not, my interpretation of the statement is that strong centralized control, exclusive of public participation, is needed in order to obtain beneficial efficiency and reliability. That is what I was seeing in the implied audience of this presentation: centralized planners building private networks of everything secretively, with no need for public oversight or review. This is happening in governments.

In an earlier post I described how we are moving toward a new form of government.  This is government by data.   The above video illustrates one example of this happening.   The demo itself focuses on non-controversial routine questions that would normally not be of interest for public participation.   However, overall this is a system that will affect the public in many ways.

In this example, the people are using the transportation system. They will live through the good and the bad of whatever this system produces. If the trains run more efficiently and crowding is avoided, people will probably not complain. It is not hard to imagine ways the system can make things worse, causing the entire system to break down or even resulting in injuries or death. When this happens, people will demand investigations, accountability, and corrective actions. As things are going now, that will come too late to wake people up to the fact that they should be involved today, right now, in the planning and approval of these systems.

In my example about government by data, I asserted that continued democratic participation is still needed. People will demand access to the data that is effectively controlling their lives. In the train example, if the train system says we need to evacuate a train in an unfamiliar neighborhood, we will demand the data to explain why. We will want that data in the same real time as the operators, yet that data is deliberately hidden from the public in the name of security.

The time for democratic participation is now, during the planning phase, to assure that sufficient data is available to the affected public. Sufficient data is all data that relates to what is affecting any individual. I illustrated this in an earlier post about employment scheduled by big data, where the employee needs to be empowered by the same data that is affecting his employment life. It is easy to imagine similar unintended impacts on individual lives from these highly instrumented systems.

In emergency scenarios requiring careful crowd control, this information may be delayed in order to efficiently move the crowd, but we should be planning now to be sure that individuals will have access to the relevant information as soon as possible afterwards.  People’s lives are affected by data.   They need this data either to regain some control over their day-to-day lives or to retain democratic participation in government systems that affect their lives over the long term.

Returning to the immediate scenario of the demonstration, the demonstration emphasized the real-time operations of a system. I felt the scenario was self-explanatory and easy to relate to. What bothered me was what was left out.

The first omission was the difficulty of managing extensive, diverse, and globally distributed instrumentation. The term Internet of Everything is explicit that everything will be an instrument. The telecommunications field has a long history of such large-scale and long-term instrumentation. The problems accumulate over time as more types of devices are added, or as existing devices come from multiple vendors, models, or versions. Devices break down and they send alerts, but the two may not coincide: correctly functioning devices may send alerts, and malfunctioning devices may send none. There is a need for extensive event management capability to manage the life of all of the possible instruments, to be sure they are all functioning properly, and to efficiently manage operations support of the instruments. The routine manual operational support of everything being an instrument is not trivial over the long haul.
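The mismatch between alerts and actual failures can be sketched in a few lines. This is only an illustration under my own assumptions, not anything shown in the demonstration: the `DeviceStatus` fields, the `classify` function, and the 300-second silence threshold are all hypothetical names and values.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DeviceStatus:
    # All fields are hypothetical; a real event manager would track far more.
    device_id: str
    last_heartbeat: float            # time (seconds) of the last routine report
    alert_active: bool               # the device is currently raising an alert
    independent_check_ok: Optional[bool] = None  # cross-check result, if one exists


def classify(dev: DeviceStatus, now: float, max_silence: float = 300.0) -> str:
    """Separate 'the device says it is broken' from 'the device may be broken'."""
    silent = (now - dev.last_heartbeat) > max_silence
    if silent and not dev.alert_active:
        # A malfunctioning device that never sent an alert.
        return "silent-failure-suspected"
    if dev.alert_active and dev.independent_check_ok is True:
        # An alert from a device that a cross-check says is working.
        return "false-alert-suspected"
    if dev.alert_active:
        return "alerting"
    return "ok"


print(classify(DeviceStatus("platform-cam-3", 0.0, False), now=1000.0))
```

The point of the sketch is that the absence of an alert is treated as information too. Multiply these four states by every vendor, model, and version in the field and the operations support burden becomes visible.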

It is easy to visualize the operation where the sensors are all working and well placed. The centralized monitoring capability must necessarily be optimized to organize all of that information into just a few views in order not to overwhelm the operator. What is omitted is how this optimization of the operator view can remain relevant when devices fail, or when they are replaced with new instruments or different technologies entirely. Part of the challenge of the Internet of Everything is that everything keeps changing and that change is not uniform or instant. A new kind of video sensor may become available, but it will be deployed in phases, and some locations may never receive one at all.

I imagine the scenario of needing to evacuate a station quickly and direct people to appropriate exits. The standard procedure may depend on accurate counts of people in the station. What happens when the scenario occurs in a station that lacks this ability to count people? What happens when the station has this capability but it has failed or is reporting incorrect data? There is a possibility that the unique circumstances of a particular location can leave an operator with procedures that are inappropriate for the data available, so that the goal is not met. There is a need for public participation to know that these details are being thought through extensively for every possible location where this technology is used.
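One way to make those questions concrete is to require every procedure to say what it does when the count is missing or untrustworthy. The following is a minimal sketch under assumed names and thresholds (`CountReading`, `choose_procedure`, a 60-second freshness limit, a crowd limit of 500); none of these come from the demonstration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CountReading:
    # Hypothetical reading from a station's people-counting sensor.
    people: int
    age_seconds: float     # how long ago the count was taken
    sensor_healthy: bool   # the sensor's own health report


def choose_procedure(reading: Optional[CountReading],
                     max_age: float = 60.0,
                     crowd_limit: int = 500) -> str:
    """Pick an evacuation procedure that matches the data actually available."""
    if reading is None:
        # The station was never equipped with counting sensors.
        return "manual: no counting capability"
    if not reading.sensor_healthy or reading.age_seconds > max_age:
        # The capability exists but cannot be trusted right now.
        return "manual: count unavailable or stale"
    if reading.people > crowd_limit:
        return "count-based: staged evacuation"
    return "count-based: direct to nearest exits"


print(choose_procedure(None))
print(choose_procedure(CountReading(people=120, age_seconds=900.0, sensor_healthy=True)))
```

Note that this only catches a sensor that admits to being unhealthy or stale. A healthy-looking sensor that reports a wrong count passes straight through, which is the harder case.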

The other omission is implied by the mention of the vendor involved. MicroStrategy is a company that specializes in data mining capabilities. They demonstrated the real-time observations for operational needs. Implicit is that all of this information is going to be stored indefinitely for later analysis. The Internet of Everything means more than just everything being connected for immediate operational control. It also means that everything will be stored indefinitely in large data stores and will be queried arbitrarily for questions that were likely not foreseen when the events happened.

Implicit in the statement that this will lead to efficiency is the idea that this efficiency will come from off-line mining of the historical observed data. However, as we’ve learned in other data systems, the same data may be used for other purposes as well. In either case, I think it is appropriate to ask how much of this data should be available for direct query by the public. For example, one analysis may optimize the allocation of trains to tracks based on historical patterns of traffic on particular days. This information will affect the passengers in terms of what to expect for wait times or crowds. Access to this information would allow them to plan their days to coincide with times when there will be fewer crowds or more trains. We need a public discussion right now about how much of this data should be kept private and out of reach of the public. The public needs access to this same data and the same query tools available to the internal analysts. Maximizing public access to the data and tools will help to limit the abuses that are possible when only certain people can access that data.

The final point I found missing from the demonstration was about the security of the information. How does the operator or the analyst of historical data know for sure that the information in the data is accurate, actually occurred at the time and place asserted, and came from the correct, authorized, and calibrated device? Just as the demonstration could easily be a training simulation using fake data, it could be an operational station being fed deliberately spoofed data, such as data played back from historical recordings or data replaced with deliberately modified values. The operator may see a normal scene for that time and place when in fact the location is over-crowded or extremely chaotic. Another example may be an analyst looking for historical trends in crowd counts that had not been refreshed for some random number of hours.
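Some of these questions have well-known partial answers. As a rough sketch, and not a description of what the demonstrated system does, a reading can carry a keyed signature, a sequence number, and a timestamp so that modified, replayed, or stale data is rejected. The field names (`seq`, `timestamp`), the shared per-device key, and the 30-second freshness window are my own assumptions; the signing uses Python’s standard `hmac` and `hashlib` modules.

```python
import hashlib
import hmac
import json


def sign(reading: dict, key: bytes) -> str:
    """Sign a reading with a per-device secret key (assumed to be pre-shared)."""
    payload = json.dumps(reading, sort_keys=True).encode("utf-8")
    return hmac.new(key, payload, hashlib.sha256).hexdigest()


def verify(reading: dict, signature: str, key: bytes,
           now: float, last_seq: int, max_age: float = 30.0) -> str:
    """Accept a reading only if it is authentic, new, and fresh."""
    if not hmac.compare_digest(sign(reading, key), signature):
        return "rejected: bad signature"             # modified or wrong device
    if reading["seq"] <= last_seq:
        return "rejected: replayed or out of order"  # played back from a recording
    if now - reading["timestamp"] > max_age:
        return "rejected: stale"                     # not refreshed recently
    return "accepted"


key = b"hypothetical-device-key"
reading = {"device_id": "cam-1", "seq": 5, "timestamp": 100.0, "count": 42}
signature = sign(reading, key)
print(verify(reading, signature, key, now=110.0, last_seq=4))
print(verify(reading, signature, key, now=110.0, last_seq=5))
```

Even this leaves the hardest questions open. A signature shows that the data came from a holder of the key and was not altered in transit. It says nothing about whether the device is calibrated, pointed at the right place, or has itself been compromised.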

As a public concern, this data is being used to affect the lives of the public. The model for data that affects people’s lives is data that is admissible as evidence to present to juries during a trial. I described this in an earlier post. This data must survive scrutiny and cross-examination to be sure it consists of direct observations instead of hearsay, and that its source is infallible. This kind of confidence needs to apply to all data involved in such public projects. As in court proceedings, the data that is used to make decisions needs to be proven to be trustworthy. Similarly, the systems working with this data should include a measure of doubt and a standard for how much doubt is tolerated before making a decision.
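To show what a measure of doubt and a tolerance standard might look like, here is a deliberately simple sketch. It assumes several independent sources report the same crowd count and uses their disagreement as the doubt score. The functions `relative_spread` and `decide` and the 0.2 tolerance are hypothetical choices of mine, and a real system would need a far more careful measure.

```python
from typing import List, Tuple


def relative_spread(counts: List[float]) -> float:
    """Doubt score in [0, 1] based on disagreement among independent counts."""
    if len(counts) < 2:
        return 1.0  # a single source cannot be corroborated: maximal doubt
    mean = sum(counts) / len(counts)
    if mean == 0:
        return 0.0  # non-negative counts that average zero are all zero
    return min(1.0, (max(counts) - min(counts)) / mean)


def decide(counts: List[float], tolerance: float = 0.2) -> Tuple[str, float]:
    """Act on the data only when doubt is within the stated tolerance."""
    doubt = relative_spread(counts)
    action = "act on data" if doubt <= tolerance else "verify before acting"
    return action, doubt


print(decide([100, 105, 98]))
print(decide([100, 40]))
```

The design choice that matters is that uncorroborated data starts at maximal doubt instead of being trusted by default.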

What I see missing is any concept of doubt beyond alerts about signal strength or sensor failures. Even if everything is working correctly, how certain are we that the reported information matches reality? Historically, there has been a good reason for operators to be in the same physical location, such as the train operator being inside the train. He can see and feel for himself what is actually occurring. That confidence is lost when the scene is digitized by sensors, distributed over networks, and presented in a remote and comfortable control room.

As I watched the video, I saw a sales pitch for confidence in this technology. The message was that not only can we trust the presented information, we actually have no reason to doubt it. I assert that there should always be a presumption of doubt. There are too many ways the data can be wrong, most of them for unanticipated reasons.

The video presents the concept of instrumenting everything and connecting it all into a network that can distribute the information for central control and planning. This is presented as a private choice by industry or government bureaucracies. It is also presented with huge confidence in the power of technology for sensing and for distributing sensed data. To me, and especially for public projects, these concepts will affect the lives of the public in profound ways usually associated with government. As these systems are deployed and become more integrated, there will be less visibility to the public and fewer opportunities for the public to participate in the planning or operation of the project. However, these are projects that typically are associated with government.

For democracy to retain relevance in this new world of the Internet of Everything, we need to demand that the Internet part of that phrase mean open to all.
