I recently watched a recorded lecture by Noah Feldman about the nature of evidence. He speaks as someone trained in the legal profession, but he discusses evidence as it is handled in the sciences.
I welcome this legal point of view on the handling of evidence in the sciences. In several earlier posts, I have wondered about the intersection of law and data. In particular, I view data as really just a form of evidence: data is about something that happened in the past. My observation is that once we see data as evidence, we are confronted with the task of evaluating data the way lawyers evaluate evidence. Data, like evidence, is subject to critical scrutiny.
I think this lecture does a good job exposing some of the scrutiny that can apply to data. In particular, he describes the nature of evidence in the sciences. The core of his argument is that all evidence is interpreted, and that there are three levels of interpretation. The first is the interpretation of the individual scientist, presumably skilled in both the domain of the evidence and its proper analysis. The next is the interpretative community that must be convinced of a particular interpretation of the evidence. The final level is a governing body that regulates the interpretative community in a way that sets limits or criteria for interpretation.
The lecture is entertaining and well presented. My goal for this post is to discuss some of my thoughts on the points he raises.
I have already mentioned my first reaction. Once we accept the idea that data is evidence, we are confronted with the possibility that the very nature of evidence can be subjected to critical examination of the same breadth and scope that lawyers routinely employ. The lecture is valuable for pointing out that a lawyer will ask some very uncomfortable questions.
In fact, I think he asks several uncomfortable questions in the lecture. In his opening argument, he calls into question the very nature of evidence as a product of interpretation, subject to the human realities of differences of opinion between practitioners and of constraints imposed by interpretative communities and governing bodies. In his first example, he examines the quality or level of validity of different types of studies (randomized controlled trials versus observational studies in the real world) and observes that top practitioners in a field can completely disagree on this point.
I like his comment that the scientists in the CERN HARP experiment didn’t consider the option of discarding the results of a study that was immediately recognized as having many serious flaws. It was just one quick sentence, but I think it illustrates the kinds of questions that can be asked once we accept that data is evidence. What is the justification for proceeding to invest in a project that was already determined to be a failure? Being well prepared for his argument, he presented the answer: these scientists routinely deal with flawed data due to natural events outside of their control. He found that answer by asking the question in the first place, and by finding someone who could give him a satisfactory one.
My observation is that this is the kind of probing question we should expect to be asked about any use of data for a decision. I think most data scientists will be unprepared to provide an answer to that type of question, let alone a satisfactory one.
In my own experience, we had a system that required a continuous stream of data, analogous to streaming video. Quality degraded if we lost some of the transmitted data. We reported the percentage of transmitted data that was received and found that our purposes were met as long as only a small percentage of transmissions was lost. Occasionally, technical issues would result in periods of very poor reception, and we tolerated that by finding comparable periods with higher-quality reception. Eventually we encountered a technical issue that caused persistent and severe degradation of reception. We continued to generate reports with a big warning that much of the transmitted data was missing. We were satisfied with this “use at your own risk” approach, but I don’t recall anyone asking why we didn’t just shut down the analysis entirely until the technical issues were resolved. I don’t think we could have come up with a satisfying answer for someone who insisted on one.
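The reporting practice I describe can be sketched in a few lines. This is a minimal illustration, not our actual system; the function name and the two thresholds are hypothetical stand-ins for the "small percentage lost" tolerance and the "severe degradation" floor we worked with.

```python
# Hypothetical thresholds: the original system's actual tolerances are not recalled.
WARN_THRESHOLD = 0.95   # "small percentage lost" is still acceptable
FAIL_THRESHOLD = 0.50   # below this, reports carried the big warning

def assess_window(received: int, expected: int) -> dict:
    """Classify one reporting window by its reception rate."""
    rate = received / expected if expected else 0.0
    if rate >= WARN_THRESHOLD:
        status = "ok"
    elif rate >= FAIL_THRESHOLD:
        status = "degraded"          # report anyway, but warn the reader
    else:
        status = "use_at_own_risk"   # the option nobody questioned: shutting down
    return {"reception_rate": round(rate, 3), "status": status}

print(assess_window(990, 1000))  # a healthy window
print(assess_window(300, 1000))  # a severely degraded window
```

Note that nothing in this logic ever returns "stop producing reports" — which is exactly the lawyer's question.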
The bad reception was outside of our control, and we were trying to make the best use of what we did have. The lawyer’s question is why we didn’t consider the option of not using the data at all. As in the lecture’s example, we had techniques for addressing missing data, including identifying certain types of analysis that were not feasible with the available data. But I am not convinced that is a satisfying answer to the question of why we didn’t just discard the entire data set and wait until the technical issues could be resolved.
The real answer to why we continued to use the bad data was basically that this is how we got paid. I suspect a similar answer applies to the scientists on the CERN HARP experiment. If the HARP scientists had discarded their data, they would have had to find different work. Similarly, if we had stopped preparing the reports, we would not have met the schedule of deliverable products.
I am not a lawyer, but the explanation that continuing to handle bad data was a contractual necessity is probably a very satisfactory answer. It explains why the processing had to continue despite a consensus that the data was essentially worthless: the explanation is sound, and the contract was well managed. The problem is that this explanation fails to defend the value of the data and of any analysis dependent on that data.
It is a satisfactory answer that processing and preparing the analysis was a contractual necessity, but there is an inevitable follow-up question: so why did we continue to use the analysis? I’m not a lawyer, but I’d be surprised if that question were not asked. In this particular project, I’m sure we could have come up with some reasonable explanation, but it would have been challenging because no one ever asked the question.
My point in this and similar posts is that data scientists need to be better prepared to answer precisely these kinds of questions. We need to think about data as evidence that can be challenged in very indirect ways, such as in the example above. It is not enough to have efficient and effective implementations of relevant algorithms for the available data. We also need to defend why we think the implementations are efficient, effective, and relevant given the specific conditions of a particular time. Since all data is evidence, we may need to answer such indirect questions even about practices of the distant past.
Treating data as evidence can elevate data science from technical expertise to being prepared with satisfactory answers to any question that may come up in a legal discovery process.
The lecture is very effective at gently presenting a critical description of science. At least for me, it was only after reflecting on what he was saying that I realized he had introduced a reasonable amount of doubt about the quality of evidence in science. It seems very reasonable that educated and highly qualified people can disagree on interpretation. It is very reasonable to expect some form of community standards for managing disagreements. It is reasonable that those standards may impose order by limiting communication of dissenting views. All of this makes sense given the reality of human activities. But combined, these factors introduce an element of doubt about the reliability of the products of that community.
As data scientists, we need to be prepared to address the doubt that can be inferred from our answers explaining our routine activities.