Machines learning nonsense that works


Data without Theory is nonsense, and

Theory without Data is Poppycock

paraphrasing Dr. Tetyana Obukhanych (2013)

The dominant theme of this blog is the inherent value of bright data and an inherent distrust of dark data.   The quote's "data" is what I call bright data, and its "theory" is what I call dark data (values generated by theoretical models).   By the quote's standard, this theme is nonsense, or at least not worthy of discussion.

I began thinking about the distinction between bright data and dark data when I recognized that machine learning (specifically with neural networks) can assign weights to an abundance of sensor observations through a process of learning.   The end result is that the machine can make accurate and useful decisions based only on sensor data and the set of weights it learned to assign to each input.

I argued in the past that this learned set of weights or factors for each component observation may be the equivalent of a theory.   The machine has come up with a theory that when applied to new observations can produce accurate and useful results.

The problem I noticed is that there are a large number of different neural network designs, with different numbers of sensors, nodes, and layers.   Each one of these designs will come up with a theory and each may have some degree of accuracy and usefulness, some being more effective than others.    Even when using the same design but presented with different learning examples or a different order of learning examples, there can be different patterns of weights assigned to each input.    All of these variations represent different theories about the world, and many of them can be competitively effective.

The problem with machine learning is that it creates theories, or models of the world, that humans cannot understand.   We know that as long as we copy the values of the weights and apply them to a copy of the system, that copy will behave the same as the original.

We can move a theory from one neural network to another, but only if that other neural network has an identical architecture of nodes and layers.   In general, we cannot copy a theory learned by one neural network architecture to another one with a different number of connections, nodes, or layers.   In particular, we as humans cannot comprehend the theory that the neural network is using.
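The architecture constraint is easy to see if we treat a network's "theory" as nothing more than its weight arrays. Below is a minimal sketch in Python with NumPy; the layer sizes and the two-layer structure are toy choices for illustration, not any particular system:

```python
import numpy as np

rng = np.random.default_rng(0)

def make_net(n_in, n_hidden, n_out, rng):
    # a network here is just a list of weight matrices (biases omitted for brevity);
    # the "theory" the network learned is the values in these arrays
    return [rng.normal(size=(n_in, n_hidden)), rng.normal(size=(n_hidden, n_out))]

def forward(net, x):
    h = np.tanh(x @ net[0])   # hidden layer with tanh activation
    return h @ net[1]         # linear output layer

net_a = make_net(3, 4, 2, rng)   # original network
net_b = make_net(3, 4, 2, rng)   # identical architecture, different weights
net_c = make_net(3, 7, 2, rng)   # different hidden-layer width

x = rng.normal(size=(5, 3))      # a batch of 5 observations

# Copying the learned weights into an identical architecture reproduces
# the original network's behavior exactly.
net_b = [w.copy() for w in net_a]
assert np.allclose(forward(net_a, x), forward(net_b, x))

# But the same weights cannot even be installed into a different
# architecture: the array shapes do not line up.
shapes_match = all(wa.shape == wc.shape for wa, wc in zip(net_a, net_c))
print(shapes_match)  # False — the "theory" is not transferable
```

The copy succeeds only because every weight array has somewhere to go; change the number of nodes and the theory has no representation in the new network at all.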

The theory mentioned in the above quote is something that can be communicated between humans so that they may test the theory independently and debate the results.

With machine learning, we can tell that the machines are learning some theory about the world based only on observations, and we can tell that the theory is effective.  With more advanced architectures such as deep learning, we can tell that the machine's discovered theory of the world can predict or uncover truths we cannot understand ourselves, even though the machine's success demonstrates that such a theory is possible.

Machine learning, or artificial intelligence, challenges our historic approach to science of communicating theories between people who can then go about testing the verbal or mathematical description with independently designed experiments.   We know that the machines are figuring something out that we can’t understand, yet we have no way to test the machine’s theory with our own independent experiments because we have no way to comprehend the theory the machine has discovered.

Machine learning, when applied to discovery, is a direct challenge to the entire notion of science.   We can insist that science be that which humans can communicate to each other so that others may independently test the concept.    Doing so would disqualify from consideration whatever it is that the machines are discovering.   The problem comes when the machines begin discovering things that are very useful and yet are beyond any human's ability to comprehend.    This especially occurs when the machines are processing data from a multitude of sources involving a multitude of dimensions, each generating data at rates too fast for the human mind to process.

I am not an artificial intelligence practitioner, although I did dabble in it with some college courses and some independent training and experimentation.   I was always fascinated with the topic, but at the time when I was in college, computers were too limited to do anything useful even if I could afford them.    At the time, my skills didn’t impress anyone to the point of offering me a career path in the field.

I recall the simple machines I did build and how I marveled at their accomplishments even if they were very basic.   In particular, I contrasted the way I saw the solution to the problem with how I knew the machine was working.   Somehow both the machine and I were competitive at responding intelligently to some observation.   Even though it didn't take long to find relief in figuring out things the machine got wrong (in part because these machines were underdeveloped), I still marveled at what the machine was thinking (metaphorically) when it processed its inputs.   It clearly wasn't thinking about the problem the same way I was.   Yet it was reasonably competitive with me, at least within the limited context the machine learned.

From a human perspective, the machine is learning something based on data alone.   The process of guided learning could be described as a way of communicating a theory by examples, but it clearly does not convey as comprehensive an understanding as the mathematical models I was using.

A trivial example might be to train some system to learn the theory of gravity by having it compute trajectories for particular starting conditions of mass and velocity.   I could feed it many examples of, say, bullet ballistics from a variety of guns (handguns or rifles).  Eventually it would learn to aim reasonably well, demonstrating some understanding of gravity.   However, if I then presented the machine with a problem involving a gun able to fire over the horizon, it would not perform very well, and it would be incapable of figuring out how to closely fly by the planet Pluto.
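A toy sketch of this failure mode: a "learner" that merely memorizes and interpolates observed (velocity, range) pairs, standing in for a trained network. The flat-earth range formula, the velocity range, and the interpolation stand-in are all my illustrative assumptions:

```python
import numpy as np

g = 9.81  # m/s^2

def true_range(v, angle_deg=45.0):
    # flat-earth ballistic range: R = v^2 * sin(2*theta) / g
    theta = np.radians(angle_deg)
    return v**2 * np.sin(2 * theta) / g

# "Training" examples: muzzle velocities roughly spanning handguns to rifles
v_train = np.linspace(100, 1000, 50)
r_train = true_range(v_train)

def predict(v):
    # memorize-and-interpolate learner; np.interp clamps outside the
    # training range, so it cannot extrapolate at all
    return np.interp(v, v_train, r_train)

# Inside the training regime the predictions are excellent...
v_test = 550.0
err_inside = abs(predict(v_test) - true_range(v_test)) / true_range(v_test)

# ...but for a projectile far faster than anything it has seen,
# the prediction is wildly wrong.
v_fast = 7800.0  # roughly orbital velocity, m/s
err_outside = abs(predict(v_fast) - true_range(v_fast)) / true_range(v_fast)

print(f"relative error inside training range:  {err_inside:.4f}")
print(f"relative error outside training range: {err_outside:.4f}")
```

The learner is competitive wherever its examples are dense, yet it has learned ballistics in a narrow regime, not gravity; note that even the "true" formula here is the flat-earth approximation, which itself breaks down over the horizon.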

The machine is learning something valuable about ballistics, but it is not learning a theory of gravity.   Of course, I can train it to learn these outside cases, but I would probably have to add complexity to the learning architecture.   Even then, I would have to train it with simulated examples since it is not feasible to provide it with a sufficient number of actual planetary missions for it to learn from observing.   Also, even then I doubt it will be able to predict where a spacecraft would be after thousands of solar orbits.

I interpret the opening quote to refer to a particular type of theory that can be communicated between humans.   Even that does not make sense when I consider that prior to Newton's laws of mechanics, humanity had a long history of building machines and even accurately aiming projectiles (or accurately getting out of the way of them).   They did so primarily with observations.   They may have had some internal understanding of how things worked, but their attempts to communicate this understanding were always flawed.   People learned to be good archers through practice, not through classroom training or through reading books.

Clearly, the humans that lived prior to knowing modern science were not living nonsensically.   For the most part, they were operating on observations alone.

There is a problem with demanding that theory be present before interpreting data: it gives theory authority over the data.   The data's job becomes either to verify the theory (with contrary data rejected) or to make the theory relevant to the current circumstances.

Once we have a theory, we need a plausible alternative theory before we will allow data to challenge the first.   In testing terms, we need an alternative hypothesis to motivate our testing of the null hypothesis.   We need that alternative hypothesis to propose a test where the null hypothesis could conceivably fail while the alternative succeeds.

Machine learning is not encumbered by this constraint.   Machine learning never has a null hypothesis, and neither does it have an alternative hypothesis, at least not in the comprehensive-explanation sense in which we use the term hypothesis.  I can train a machine to recognize the numbers between 0 and 4 until it is very successful, while it fails to recognize the numbers between 5 and 9.   I can then continue to train the machine on the higher numbers, and it will gradually forget what it learned about the lower numbers unless I keep training it with the full set of single-digit numbers.
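This forgetting is known in the machine-learning literature as catastrophic forgetting, and it can be sketched with a toy softmax classifier trained sequentially on two halves of a synthetic "digit" problem. The Gaussian clusters, the weight decay, and the training schedule below are all illustrative assumptions, not a claim about any particular system:

```python
import numpy as np

rng = np.random.default_rng(42)
D, C = 10, 10  # feature dimension, number of "digit" classes

def sample(classes, n_per):
    # each class c is a Gaussian cluster centered at 3 * e_c
    X, y = [], []
    for c in classes:
        mean = np.zeros(D)
        mean[c] = 3.0
        X.append(rng.normal(mean, 1.0, size=(n_per, D)))
        y.append(np.full(n_per, c))
    return np.vstack(X), np.concatenate(y)

def train(W, b, X, y, epochs=500, lr=0.2, wd=0.03):
    # full-batch gradient descent on softmax cross-entropy with weight decay
    for _ in range(epochs):
        logits = X @ W + b
        logits -= logits.max(axis=1, keepdims=True)  # numerical stability
        p = np.exp(logits)
        p /= p.sum(axis=1, keepdims=True)
        p[np.arange(len(y)), y] -= 1.0               # dL/dlogits
        W -= lr * (X.T @ p / len(y) + wd * W)
        b -= lr * p.mean(axis=0)
    return W, b

def accuracy(W, b, X, y):
    return np.mean((X @ W + b).argmax(axis=1) == y)

W, b = np.zeros((D, C)), np.zeros(C)
X_lo, y_lo = sample(range(0, 5), 100)    # "digits" 0-4
X_hi, y_hi = sample(range(5, 10), 100)   # "digits" 5-9

W, b = train(W, b, X_lo, y_lo)
acc_before = accuracy(W, b, X_lo, y_lo)  # high after phase 1

W, b = train(W, b, X_hi, y_hi)           # continue training on 5-9 only
acc_after = accuracy(W, b, X_lo, y_lo)   # the old task is largely forgotten

print(f"accuracy on 0-4 after phase 1: {acc_before:.2f}")
print(f"accuracy on 0-4 after phase 2: {acc_after:.2f}")
```

During the second phase nothing ever rewards the classes 0-4, so their weights decay and their biases are pushed down; the machine's "theory" of the low digits simply dissolves.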

As we gain better capabilities to work with large data sets, and better machine learning to apply to them, we are observing benefits that are impossible for humans to provide on their own, and even harder to get accepted through scientific publication in peer-reviewed journals.

Eventually, we will want to take advantage of this technology to guide our governance.   If we do allow machines to govern us as I describe in my fantasy government by data and urgency, then we must give up the notion of theories.   The machines will be coming up with reliably useful results without ever informing us of any theory we can comprehend and debate about.

In a scientific sense, a dedomenocracy is nonsensical because it is data without theory.   If we allow ourselves to be governed by machines (as we increasingly are), then we are allowing ourselves to be governed by nonsense.   I think this is literally true.  There is no way we can make sense of what that government is telling us.

There might be benefits to living under a nonsensical government such as I describe as a dedomenocracy.   Such a government would have no use for formal testing of causality and may in fact operate only on simple correlations.   I described in earlier posts my belief that problems of spurious correlations (such as thinking that wet streets cause rain) become much less likely when correlations are observed over a sufficiently large variety of dimensions of data.   The multi-variate correlations become more likely to be usable as the number of variables increases.   In any case, the problems we are facing today involve far too many variables to test for causal relationships in any reasonable amount of time.
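The intuition that spurious correlations thin out as more dimensions must agree can be sketched with pure noise: count how often a random "predictor" looks correlated with one outcome versus with four independent outcomes at once. The sample size, threshold, and trial count here are arbitrary choices for illustration:

```python
import numpy as np

rng = np.random.default_rng(7)
n = 30            # small sample, as in casual observation
trials = 2000     # candidate noise "predictors"
threshold = 0.35  # |correlation| considered "interesting"

target = rng.normal(size=n)      # one outcome dimension
more = rng.normal(size=(3, n))   # three additional outcome dimensions

def corr(a, b):
    return np.corrcoef(a, b)[0, 1]

single_hits = 0  # noise predictors that look correlated with the one target
multi_hits = 0   # noise predictors that look correlated with all four
for _ in range(trials):
    x = rng.normal(size=n)       # a pure-noise predictor
    if abs(corr(x, target)) > threshold:
        single_hits += 1
        if all(abs(corr(x, m)) > threshold for m in more):
            multi_hits += 1

print(f"spurious on one dimension:   {single_hits}/{trials}")
print(f"spurious on four dimensions: {multi_hits}/{trials}")
```

With a single outcome, a noticeable fraction of pure-noise predictors pass the threshold by chance; requiring the same predictor to correlate across several independent dimensions multiplies those chance probabilities together, and the spurious hits all but vanish.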

The current crises of how to respond to COVID-19 and how to evaluate the safety and effectiveness of vaccines illustrate the failure of the scientific approach of theory with data.   The scientific theory approach led us into a disastrous first reaction of a strict shutdown of services arbitrarily defined as non-essential (a definition that automatically included nearly all government activities as essential).   The scientific emphasis on theory and null-hypothesis testing forces us to continue a clearly bad decision until we can gather compelling proof to reject the null hypothesis of that bad decision.

We are similarly trapped with the warp-speed vaccine approach.   The null hypothesis is that vaccine science is mature, and that vaccines are both effective and safe, nearly perfectly so.   We do have abundant observations to the contrary.   In the example of influenza vaccines, many people who get influenza had a recent vaccination.   Also, there are cases of damaged immune systems, organs, reproductive systems, and mental capacities that were proven to be caused by vaccines (or the additives within them).   We are stuck with the presumption that the next vaccine will be safe and effective, and thus should be mandated, because the data so far does not sufficiently disprove the null hypothesis of safety and effectiveness.

I do have concerns that our faith in science may be guiding us to a catastrophic failure unparalleled in human history.   The current obsession with annihilating the virus at all costs will likely collapse the entire modern economy that relies on near-full employment in every conceivable occupation (including illegal ones).   The current obsession with the necessity of vaccines for public health will likely leave us vaccine dependent, with an impaired immune system unable to respond to anything except what is delivered in vaccine concoctions.   The obsession to defeat those who object to vaccines is leading us to mandatory and forced vaccinations with no exceptions.   When combined with a vaccine distributed just a few months after formulation, we risk crippling our future with a huge population requiring lifelong and expensive treatment for chronic disease caused by the vaccine, or even a future with a smaller population due to the inability to have viable offspring.

It is conceivable that our faith in science over observations could return the human condition to where it was after the fall of the Bronze Age, only this time the mysterious monuments would need to be explained by even bigger giants.  The risk of this happening is significant even if it is unlikely.

We live in an age where there is an option to transition to a nonsensical government guided solely by data and urgency, where data refers only to recorded observations (the more recent the better).   This dedomenocracy specifically makes theory (dark data) optional and even readily dismissed in favor of actual observations.   Based on the examples we are already observing with machine intelligence, such a nonsensical government could be effective if given a sufficient volume, variety, and velocity of observations to work on.

In the context of the current crisis, the nonsensical government has the freedom to make new decisions at each new alarm of urgency, where the decisions are not constrained to follow some null hypothesis set by the precedent of prior decisions.    The decisions are made only on the information currently available, without any need to respect prior theories.    Of course, it is impossible to predict what such a government would do in the current circumstances, but it would have options that we do not permit ourselves to have.

It could decide that the best approach is to permit over-the-counter use of medications such as hydroxychloroquine for people to take when they have early symptoms, with the prospect that this would allow a large number of people to recover more quickly with fewer complications.   It doesn't have to be perfect to provide the public-health benefit of minimizing the virus's ability to transmit to others.

I am not saying that this is a good idea.   But it is an option that the nonsensical government is free to consider, and one that our null-hypothesis approach forbids us from considering.

