The title of this post invokes the idea of second-guessing, or a second opinion, and applies it to hypothesis discovery.
Earlier posts presented my view that the concept of business intelligence, or the use of big data to find new information, is a project of discovering new hypotheses. The intelligence of business intelligence is the process of hypothesis discovery. We look at patterns exposed in aggregations of categories that map large amounts of available data. The patterns suggest new hypotheses to us: alternative and sometimes unexpected new ways of looking at the world.
In those earlier posts, I emphasized the distinction between hypothesis discovery and hypothesis testing. A hypothesis that survives rigorous testing is one that we have confidence in applying. Without any confirming test, a discovered hypothesis is a guess. I do note that we may choose to test a guess by applying it directly to the real world. In such cases, we may not have a choice, or we may be particularly self-confident, or more willing to take risks. Such an application is itself a test of the hypothesis: it could succeed or fail. If it succeeds, we advance our reputations, careers, and so on. If it fails, we lose, and if the failure didn’t cause any major problems, the entire test is forgotten. Within big data projects, the trend is toward increasing confidence that allows immediate application of the newly discovered hypothesis.
Another series of my posts concerned the different qualities of data that should make us aware of the risk of a discovered hypothesis being wrong. Ideally, the available data would consist of direct observations of the real world, where the observations are accurate and well documented enough to give us confidence in their relevance. I called such data bright data in order to point out that real data tends to be dim, dark, or unlit. Even when we have confidence in the data system, we should continue to scrutinize the data within that system.
Part of the reason for increasing confidence in big data systems is an initial assumption that all data is ideal. After discovering something interesting, we then invest the effort to verify the data for accuracy and relevance. Given the large volume of data and the unexpected combinations that give rise to a new discovery, this is not a bad approach. The problem is that we rarely have the luxury of time to thoroughly scrutinize the data behind a new hypothesis. There is also less support for that analysis, especially in long-running projects that have a history of successfully tested hypotheses. In effect, the previously dim, dark, or unlit data becomes bright data by virtue of being incorporated into a data system with a good track record of being right.
Another series of posts, concerning education, promoted my view that data science should become a part of basic education, joining reading, writing, and math as both a basic skill and a habit. I tried to make the case that the modern world is becoming driven by data. Independent participation in life will increasingly depend on our ability to query our own data to understand the constraints governing our lives and to prepare alternative arguments, such as those supporting opposing views in a democracy.
Modern life will be characterized by doing data science: selecting data, scrutinizing it for accuracy and relevance, and formulating claims and counter-claims. This finally brings me to the subject of this post: the importance of promoting opportunities for a second hypothesis.
Someone comes up with a hypothesis based on compelling information discovered in a data analysis. As I described above, that is a discovered hypothesis that should be followed by testing. Also as mentioned above, we increasingly project our confidence in our data systems onto the discovered hypothesis, so we either skip the test or impose only a minimal amount of testing. A second hypothesis (a second guess) is a way to weaken that confidence in the hypothesis, or to disconnect confidence in the data system from confidence in the discovered hypothesis. Empowering the audience of the initially presented hypothesis with data science skills and habits creates the opportunity to observe alternative hypotheses in the same data. Proposing an alternative, competitive hypothesis introduces doubts that diminish our enthusiasm for applying the initial hypothesis and increase our interest in testing it.
We need to foster the opportunity for second hypotheses (second guesses, second opinions) by making the data more widely accessible. An important aspect of that accessibility is the skill and conditioning of data science. We need more people who are comfortable working with data and who are alert to the opportunities and problems that data presents.
The idea of a second hypothesis came to me as I read a story highlighted on LinkedIn from Quartz about employment drug testing. The story suggests that drug testing exposes racism in hiring, but in an unexpected way. It starts with a data analysis showing that employment rates for blacks increase in states with higher frequencies of pre-employment drug testing. The introduction of drug tests appears to give an advantage (or remove a disadvantage) to a minority. This is surprising because of an earlier assumption that drug testing would have the opposite effect, since we assumed that minorities were more likely to use drugs. Because in fact there is little difference in drug use across different populations, drug testing eliminates this subtle racist assumption.
I mentioned earlier that part of our confidence in discovered hypotheses comes from our confidence in the data. I should add that another part comes from the compelling story of the hypothesis itself. The short article makes a good case that states with a lower incidence of drug testing and lower black employment rates suggest racism in the form of hiring managers making assumptions about drug usage based on skin color alone. Now that this is discovered, we can move on to improving policy to make drug testing more widely used in employment in order to eliminate this racist assumption about drug use.
Could there be an alternative explanation? Drug testing is a recent phenomenon. My impression is that the market imposed these tests after some sensational reporting about misdeeds by employees of companies where those misdeeds were found to involve drugs. Even if the drug use had little to do with the misdeed, the employer was faulted for not being more careful about preventing the hiring of drug users. I recall this became a marketing tool as well, with companies advertising their 100% drug-free workforce as a competitive advantage. If a company needs to protect itself from accusations that it hires drug users, it has no option but to use objective drug tests. There is no option to make this judgement based on the subjective criterion of “this person doesn’t look like a drug user”. If drug use is an important hiring criterion, there will be a drug test. Conversely, if there is no drug test, then drug use is likely not a very high priority for the employer.
My impression is that most hiring managers do not care about drug use, especially recreational drug use, as long as the employee is responsible enough to be free of its influence during duty hours. What people do for recreation is irrelevant to whether the employee is productive in advancing the business goals. Unless drug testing is part of some marketing statement or liability insurance, there is little value in determining whether an employee uses drugs. My impression is that drug testing is over-used where it is not needed, although I concede that it is a widely accepted practice.
Again drawing on my own experience in a hiring position, I wonder if something else may explain the same trend of the pre-employment test benefiting minorities. I suggest that the emphasis could be on the test rather than the drug. The hiring process usually involves some form of assessment of the candidate. That evaluation may involve an interview where the manager talks to the candidate. In that scenario, it is common for the manager to want to somehow test the candidate. This usually comes in the form of asking questions. Sometimes the questions are very specific to the job duties, but frequently they are more open-ended, such as “where do you see yourself in five years”. The ideal interview evaluates immediate responses to an objective question.
One of the recent developments, compared to a half century earlier, is the elimination of employer-administered qualification tests. In the past, the employer or hiring manager would devise and administer a test to evaluate a candidate’s fitness for a position. The result of the test gave the manager the confidence to make a decision. I think part of that confidence came from the mere fact that the candidate made the effort to earn a passing grade in the immediate time frame of this particular hiring decision. Also, I think it made the hiring decision simpler by providing an objective way to eliminate candidates who failed the test. Over time, this type of testing became highly vulnerable to legal challenges as a discriminatory practice. As a result, the hiring manager lost this decision tool of separating candidates who passed a test from those who did not.
Today, most larger organizations have strict limitations on what is allowed to transpire in an employment interview. Testing during an interview is highly discouraged. For competence testing, there is a rise in the requirement for external training certifications. What hiring managers want is some form of immediate testing of the current candidate. In some technical fields, this takes the form of timed objective competency testing administered by independent third-party firms that specialize in it. In less technical areas, there is always drug testing.
Drug testing returns a pass or fail grade for a candidate. Usually the test has to be completed in a very short period, such as within 48 hours of receiving a job offer or interview invitation. This time-sensitive test with a pass or fail outcome can help fill the objective-testing void left by forbidding employer-administered testing. Having any type of objective test relieves the pressure of the subjective interview. Observing the results of a passed test, any test, is preferable to making a decision subjectively.
My alternative hypothesis, or second guess, for the same pattern of better employment outcomes for minorities is that it is the test that matters, not the drug. The key difference is the following. People who hire fewer minorities because they don’t use drug tests are not assuming that minorities are more likely to be drug users. Instead, lacking any acceptable form of objective testing, these hiring managers are left with only a subjective interview. I personally don’t think most people are racists, but left only to subjective assessments, people prefer others who remind them of people they are familiar with. The subjective-only criteria open the opportunity for discrimination against the other.
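The central point, that two different mechanisms can generate the very same observed pattern, can be sketched with a toy simulation. Everything below is illustrative: the mechanism names, qualification rate, and bias rate are made up for the sketch and are not drawn from the article's data.

```python
import random

random.seed(42)

def minority_hire_rate(mechanism, with_test, n=20000):
    """Toy model of minority hiring under one hypothesized bias mechanism.

    mechanism: 'drug_assumption' - H1: without an objective test, managers
                                   assume minority candidates use drugs
               'subjective_only' - H2: without any objective test, subjective
                                   interviews favor familiar candidates
    with_test: True if an objective pass/fail test is part of hiring
    Returns the fraction of minority candidates hired.
    """
    hired = 0
    for _ in range(n):
        qualified = random.random() < 0.7  # assumed qualification rate
        if with_test:
            # Under either mechanism, the objective result drives the
            # decision, so the hypothesized bias never gets to act.
            hired += qualified
        else:
            # Both mechanisms reject some qualified minority candidates,
            # for different hypothesized reasons (see docstring).
            hired += qualified and (random.random() < 0.8)
    return hired / n

# Both hypotheses reproduce the observed pattern: higher minority
# employment where objective pre-employment tests are used.
for mech in ('drug_assumption', 'subjective_only'):
    without = minority_hire_rate(mech, with_test=False)
    with_t = minority_hire_rate(mech, with_test=True)
    print(f"{mech}: no test {without:.2f} vs test {with_t:.2f}")
```

The simulation is deliberately agnostic: the observed data alone cannot distinguish the two hypotheses, which is exactly why a second hypothesis should push us toward testing rather than toward immediate policy.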
I believe most hiring managers want objective testing of candidates, where that testing is applied immediately during the hiring process to qualify a candidate for a job. It is not sufficient to have a credential or an employment history. There is a need to ask an immediate question and to get a reasonable answer. Having such testing opportunities frees the manager from the burden of subjective assessments. The hiring decision is simplified by requiring that the candidate pass a time-sensitive test.
This second hypothesis suggests a different policy direction for improving hiring diversity. Instead of broadening the application of drug tests in particular to eliminate subtle racism, we should revisit the older practice of time-sensitive objective testing in general. This second hypothesis suggests we may have made a mistake in thinking that employer-administered testing was potentially discriminatory. The reality may be that eliminating those tests introduced discrimination that the tests had prevented. This is consistent with the above article’s claim that the evidence from drug testing is contrary to prior assumptions about more common drug use among minorities. It is likely we made a similar mistake in assuming minorities were less fit to pass other, perhaps more relevant, forms of objective pre-employment testing. Just as minorities are no more likely to be drug users, they are no more likely to be less intelligent.
Although I elaborated on my thoughts on pre-employment testing, my intention is to illustrate the value of proposing alternative hypotheses for the same data. Even when we have confidence in the data, we can raise doubts about a compelling discovered hypothesis by proposing competing hypotheses that explain the same data but suggest different policy recommendations.
Broadening the training and conditioning of the practice of data science gives us the opportunity to have more competing discovered hypotheses from the same data. These competing hypotheses can improve our decision making by encouraging more testing or more careful evaluation of alternatives. Competing hypotheses introduce the opportunity for debate that is implicit in the analysis of historical data. That debate is a key part of data science.