Some thoughts about 737 Max grounding

This is just a quick post about some admittedly uninformed thoughts triggered by the recent 737 Max crashes.

Summarizing some thoughts from the last post, the plane introduced a new engine with better fuel efficiency on an existing airframe. There are solid economic and environmental justifications to accept the new design and to prioritize the return of these models to operation. However, the new design did present a new behavior that required new automation in the MCAS unit as well as some retraining of pilots. It is alleged that both of these may have been under-developed. The evidence so far from the recent crashes appears to back this up, and in any event I have no doubt that the grounding of the fleet is the correct action.

The question is how long it is going to take to fix the problems in software, hardware, and training in order to return the fleet to operational status. So far as I can tell, no one knows.

I think that is a major problem: the fact that they don’t have a good estimate for how long it will take to solve the problem. Clearly the problem is complicated, and they don’t yet understand what needs to be fixed or how to fix it. Also, the necessary condition is to restore everyone’s confidence that the plane is safe and that it can be flown reliably. Such a criterion is going to require a lot of convincing of a wide range of decision makers.

Given the circumstances we have to work with, it is understandable that there are so many uncertainties at this time. I question the circumstances themselves.

Why did it take so long to determine there is a problem, and why did it require a crash to prove the problem exists? Similarly, I am frustrated by the fact that following a crash there is the wait to find the flight data recorder. At the root of all of these problems is the legacy reliance on flight data recorders: robust hardware on board the aircraft that records a lot of information but keeps it on the aircraft.

The data could be transmitted over radio in real time and relayed to a data storage location. I recognize the argument that this would be expensive, both in added cost to the aircraft and in the cost of a globe-spanning network of radio relays, whether on land and sea or in space. But as we are experiencing now, the downtime of grounding an entire fleet is also very expensive. I think we have reached a point that justifies the expense of real-time collection of flight data from all flights to a central storage location.

This new plane relies on automation directly in the form of the MCAS and indirectly in the computer-aided engineering and simulation testing of the aircraft. I consider the failure to be an automation failure perhaps more than it is a human failure.

Automation is essential to meeting the schedules needed to achieve efficiency goals and to serve rapidly changing market conditions. We need to accept this modern reality of the necessity of automation in engineering and design as well as in operation. In order to give the automation the largest advantage to develop a safe aircraft and to rapidly detect potential flaws before they lead to crashes, there is a need for extensive data of actual flight conditions.

In my thinking, the automation must have access to all of the flight data from every flight hour of every aircraft in service. This data should have been available from the start of operation so that it could be automatically analyzed for any anomalous behaviors. As I understand it, there were pilot reports of complaints. Why didn’t we have all the hard data to discover these problems without needing pilot reports or, more importantly, to discover the anomalies that the pilots did not report or may not have even noticed? Clearly the data would be available within the aircraft. The data needs to end up at a data center for deep analysis to verify the correctness of the automation both of engineering and of operation.
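As a rough illustration of what "automatically analyzed for anomalous behaviors" could mean at its simplest, here is a minimal sketch that flags flights whose value for some per-flight metric sits far from the rest of the fleet. Everything in it is an assumption made for illustration: the function name, the flight IDs, the idea of a single numeric metric per flight, and the 3.5 cutoff. It uses a median-based (modified z-score) outlier test, which is one common choice among many, and it says nothing about how a real airline or manufacturer analyzes flight data.

```python
from statistics import median


def flag_anomalous_flights(metric_by_flight, threshold=3.5):
    """Return the IDs of flights whose metric is an outlier.

    metric_by_flight: dict mapping a (hypothetical) flight ID to one number
    summarizing that flight, e.g. a count of some automated intervention.
    Uses the modified z-score based on the median absolute deviation (MAD),
    which is less distorted by the outliers themselves than mean/stddev.
    """
    values = list(metric_by_flight.values())
    if len(values) < 3:
        return []  # too little data to call anything unusual

    med = median(values)
    mad = median([abs(v - med) for v in values])

    if mad == 0:
        # Fleet is essentially uniform: anything off the median stands out.
        return sorted(f for f, v in metric_by_flight.items() if v != med)

    return sorted(
        f for f, v in metric_by_flight.items()
        if 0.6745 * abs(v - med) / mad > threshold
    )


# Hypothetical fleet data: one flight behaves very differently.
fleet = {"F001": 2, "F002": 3, "F003": 2, "F004": 3, "F005": 2, "F006": 14}
print(flag_anomalous_flights(fleet))
```

The point of the sketch is only that a check like this needs the data from every flight, including the uneventful ones, to know what normal looks like.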

Had we the continuous collection of all data from all flights and a process to review that data for valid behavior, we likely would have discovered the problem before it led to tragedy. Now that we have the tragedies and face the grounding of the fleet, the recovery of trust in the aircraft’s safety is hampered by the lack of flight data from all of the successful flights with proper operation. To test any proposed changes in software, hardware, or training procedures, I would want to simulate these changes against all of the actually experienced flight conditions observed to date, where those conditions presented some unique challenge to the entire system (hardware, software, and pilot), including both the cases where the response was optimal and the cases where it was sub-optimal.
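The idea of simulating a proposed change against previously recorded conditions can be sketched as a simple replay harness. This is a toy under loud assumptions: the `RecordedCondition` type, the single `sensor_reading` input, the acceptable response range, and the candidate rule are all hypothetical, and real flight simulation is vastly more involved. It only shows the shape of the process: run the candidate against every recorded case and list the ones where it would have responded outside the acceptable range.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class RecordedCondition:
    """One archived situation (hypothetical, heavily simplified)."""
    flight_id: str
    sensor_reading: float    # the input the system saw
    acceptable_low: float    # lowest response judged acceptable
    acceptable_high: float   # highest response judged acceptable


def replay(candidate: Callable[[float], float],
           conditions: List[RecordedCondition]) -> List[str]:
    """Run a candidate rule over all recorded conditions.

    Returns the flight IDs where the candidate's response falls outside
    the acceptable range, i.e. cases the proposed change would get wrong.
    """
    failures = []
    for c in conditions:
        response = candidate(c.sensor_reading)
        if not (c.acceptable_low <= response <= c.acceptable_high):
            failures.append(c.flight_id)
    return failures


def candidate_rule(reading: float) -> float:
    """A made-up proposed fix: proportional response, clamped to [-1, 1]."""
    return max(-1.0, min(1.0, 0.5 * reading))


archive = [
    RecordedCondition("A", 1.0, 0.4, 0.6),
    RecordedCondition("B", 4.0, 1.5, 2.5),
    RecordedCondition("C", -1.0, -0.6, -0.4),
]
print(replay(candidate_rule, archive))
```

A larger archive makes this kind of check more convincing, because a fix for the failure cases also has to leave alone the many cases where the system already behaved well.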

I don’t know if this data exists, but I strongly suspect it does not exist to the extent that it could have: flight data for every minute of flight operation of every flight flown on the aircraft type. If that data were already in large data storage, we would be in a better position to identify the problems and to predict how much time is required to fix them and restore the operation.

We are in a period where we must rely on extensive automation both for engineering new solutions to meet modern imperatives (such as improved fuel efficiencies) and for the reliably safe operation of equipment that we can’t afford to have fail.   A necessary prerequisite for this automation is comprehensive data collection and storage to support analysis and simulations that will lead to more successful automation in the future.

Hardware flight data recorders have proven to be very valuable in the past and we definitely should continue to use them, but they are no longer sufficient. We need to invest in a network that will permit live, continuous data collection from all aircraft flights, stored in a data center where the data is available for analysis and for automated engineering. The goal is to detect problems at the earliest possible time and to automate the engineering of a fix that solves the problem while not interfering with all the times when the system worked beneficially.

To get the best possible benefit from automation, we need to use all of the data from actual operation of the product of that automation.    We can no longer afford to wait for recovery of a black box, and then be restrained by only that one data sample.

To summarize, these are just thoughts that occur to me, a person who has no knowledge about the current realities of the engineering and data collection within the airline industry, but a person who recognizes the value of comprehensive data collection, storage, and retrieval from actual operation of advanced designed systems, especially those designs that rely heavily on computer automation and simulation.

Addendum 20190317:

I published this before adding the second thought I had about this episode.   This concerns the fact that Boeing is facing lawsuits for liability for the two crashes.   Perhaps the crashes could have been avoided with better data about problems occurring leading up to the crashes or there could have been better testing or training.   My point though centers on the fact that the planes used a new technology in the more efficient yet more powerful engines.

There is a significant political movement to take dramatic actions to address global warming, a problem that its proponents project we have only about a decade to solve. Solving global warming with a dramatic reduction in greenhouse gases will require adopting a wide range of dramatic changes with very little time for testing and verification. The solutions proposed will have to be adopted very quickly and they will be far more substantive than the change in engine and its placement on a single type of aircraft.

Inevitably, these climate change solutions will present major risks of large-scale injury or death to people. If the climate change remedies are really as urgent as claimed, we will need to be prepared for some major disappointments. We will need to be prepared to accept the losses and just as quickly adopt changes that advance the cause. If solving climate change is really that urgent, we will have to exempt every effort to reduce greenhouse gases from any liability for anything that goes wrong.

The 737 Max is an example of an innovation that reduces greenhouse gases. It encountered a problem that led to major loss. To realize its fuel efficiency again, we need to quickly solve the problem and restore the aircraft to operational status (with restored trust in its safety). Lawsuits for liability will be counterproductive both for solving the current problem and for developing future designs that would be even more beneficial. That said, the lawsuits in this particular case are not a major concern in terms of the larger project of reducing greenhouse gases.

I am only noting that this is an example of something we can expect on an even larger scale if we were to adopt major changes in greenhouse-gas-saving technologies and practices with very little time for development, testing, and gradual deployment. Inevitably there will be a large-scale catastrophe resulting in many injuries or deaths. To meet the time scale proposed as being available for solutions, we will need to grant an exemption from liability for damages caused by something that was introduced too hastily and later found to have a major flaw in human safety.

From what I have seen of greenhouse-gas-saving solutions, few if any will receive the engineering scrutiny and review to which the commercial airline industry subjects its innovations. We can expect these other initiatives to be even more likely to fail, and to fail at even larger scales.

In the greenhouse gas debates, there is a lot of discussion of the risks of doing nothing.   There needs to be more discussion on the risk of doing something, especially if that is done too hastily.

