I work primarily with machine-generated data about machines. The goal is to use this data to explain some current condition and, especially, to make predictions that support planning. In a recent example, we were trying to understand the cause of some periods of slow performance. We had unambiguous information about the delays, but the task was to explain the cause of delays that were noticeably longer than those in nearby periods, both before and after. The simple task would have been to report on these measures, showing the frequency of slow events and depicting trends, but we needed to know what conditions separated slow performance from normal performance.
The support group for the product asserted that the delays were due to a limitation of some resource. With this clue, we gathered the available data about every type of resource: memory, CPU, network, etc. When we compared that data with the slow periods, in each case the utilization was well below what would be considered high. Note that this was all of the data available. The support group assured us that some resource was running out during the slow periods, but that the high utilization was so brief that it got averaged out in the available measurements.
For machine data, we have the opportunity to request that future releases improve monitoring so that we can isolate, without ambiguity, exactly what was getting exhausted. In practice, we often never get this kind of metric. The developers may not know of a cost-effective way to measure it, or even if they did, such measurements would be a very low priority for future development. Even though it is possible, it is unlikely we'll ever get this data, and we certainly won't get it for the current situation that demands an answer. If we have to live with these events, we would benefit from being able to predict their occurrence, or at least to distinguish the normal degradation from some new degradation that may indicate a problem we could solve: we don't want a known unknown to prevent us from acting on something that may be knowable.
Lacking a direct measurement of what we were looking for, we sought out a proxy for it. For example, instead of looking at absolute utilization, we looked at sudden changes in utilization. Even though the result was still within normal limits, a sudden change could suggest a short period of high utilization that got averaged out in the measurement. In addition, we compared the change with the statistics of recent history: a change of more than a couple of standard deviations from the mean of the previous 24 hours could be a hint of a problem. With one of these proxy measurements, there was a good correlation with the observed periods of slow performance.
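The sudden-change proxy can be sketched as a rolling z-score on the changes between consecutive samples. This is a minimal illustration, not the original analysis code, and it rests on several assumptions: readings are evenly spaced (hourly, so a 24-sample window stands in for the previous 24 hours), the mean and standard deviation are taken over recent changes rather than raw utilization, and "a couple" of standard deviations is read as 2. The function name `flag_sudden_changes` and its parameters are hypothetical.

```python
from collections import deque
from statistics import mean, stdev


def flag_sudden_changes(samples, window=24, threshold=2.0):
    """Flag each sample whose change from the previous sample lies more than
    `threshold` standard deviations from the mean of the preceding `window`
    changes. Returns one boolean per sample (assumed evenly spaced).
    """
    if not samples:
        return []
    history = deque(maxlen=window)  # only the most recent `window` changes
    flags = [False]  # the first sample has no predecessor to compare with
    for prev, curr in zip(samples, samples[1:]):
        delta = curr - prev
        flagged = False
        if len(history) >= 2:  # stdev() needs at least two data points
            recent = list(history)
            mu = mean(recent)
            sigma = stdev(recent)
            # A perfectly flat history gives no scale to compare against,
            # so nothing is flagged until some variation has been seen.
            if sigma > 0 and abs(delta - mu) > threshold * sigma:
                flagged = True
        flags.append(flagged)
        history.append(delta)
    return flags


# Made-up hourly utilization (%) with a small wiggle and one short spike.
readings = [40, 41, 40, 41, 40, 41, 40, 41, 40, 41, 60, 41]
print(flag_sudden_changes(readings))
# Last two entries are True (the jump up and the drop back); all others are False.
```

On this made-up series the spike is flagged twice, once on the way up and once on the way back down, while every reading stays at a level that would not look "high" on its own. Flagging relative to recent variability, rather than against a fixed limit, is what lets a within-limits change stand out.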
There is no causal reason why a change in utilization would cause slow performance if the changed utilization is not above a normal limit. We hypothesized that the change could correlate with a brief period of high utilization, but we had no direct evidence that this occurred. This derived measurement (a dramatic change in utilization) almost always coincided with the observed slow behavior, but the correlation was not perfect because there were other periods of sudden changes that did not show any slow performance. This was a poor substitute for the missing measurement of the resource being over-utilized.
To improve this measure, we added tests for sudden changes in other things we could measure, such as the volume of work done or the mix of different kinds of jobs. Again, there was no known causal link between these measures and the observed behavior (if there were, the product developers could have used that information to improve performance). We were using what we could measure to substitute for what we couldn't.
In the end, we came up with a combination of various unrelated measures where a simultaneous increase in more than one of these tests was well correlated with the presence or absence of the problem. This was enough to implicate an internal cause and lessen our concern that some new external cause might be present. This improvised measure is not a substitute for the direct measure we wanted, but it was enough to address the immediate need of isolating the problem to an internal cause, one that should be solved with an upgrade. The eventual upgrade did in fact solve the problem.
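The combination step can be sketched as follows: raise an alert only when at least two of the per-measure flags fire at the same time step, then count how those alerts line up with the observed slow periods. The test names, the threshold of two, the sample data, and the helper names `combined_alert` and `agreement` are illustrative assumptions, not the measures or code actually used.

```python
def combined_alert(flag_series, min_tests=2):
    """Return one boolean per time step: True where at least `min_tests`
    of the proxy tests fired at that same step.

    flag_series maps a (hypothetical) test name to a list of booleans,
    all aligned to the same time steps.
    """
    lengths = {len(flags) for flags in flag_series.values()}
    if len(lengths) > 1:
        raise ValueError("all flag series must cover the same time steps")
    # zip(*...) walks the series in lockstep; summing booleans counts the True values.
    return [sum(step) >= min_tests for step in zip(*flag_series.values())]


def agreement(alerts, slow):
    """Compare alerts with observed slow periods (both lists of booleans)."""
    hits = sum(a and s for a, s in zip(alerts, slow))
    false_alarms = sum(a and not s for a, s in zip(alerts, slow))
    misses = sum(s and not a for a, s in zip(alerts, slow))
    return hits, false_alarms, misses


# Made-up flags for three hypothetical proxy tests over five time steps.
flags = {
    "utilization_change": [False, True, True, False, True],
    "volume_change": [False, True, False, False, True],
    "job_mix_change": [True, False, False, False, True],
}
alerts = combined_alert(flags)
observed_slow = [False, True, True, False, True]
print(alerts)  # [False, True, False, False, True]
print(agreement(alerts, observed_slow))  # (2, 0, 1)
```

Requiring agreement between several weak signals trades a few misses (step 2 here) for fewer false alarms than any single test would raise on its own, which matches the goal of separating the slow periods from ordinary noise rather than explaining them causally.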
I mention this as an example of combining calculations on quantities that have no direct causal link to a phenomenon and yet turned out to be reasonably good at explaining it: the cause was the under-sized machine, and the problem would be solved with an upgrade.
I don’t work with social or psychological data, but I do have some familiarity with using proxy data for something we cannot measure directly. I am also familiar with how difficult it is to defend a measure that is not clearly linked causally to the problem. I have often found a metric useful for making predictions even though it relied on somewhat arbitrary calculations on available but only remotely related measurements.
There are problems that lack direct measurements and involve complex processes that cannot be explained by a single simple measurement. Meanwhile, we often have an abundance of easily measured quantities that are either too coarse or too irrelevant to explain the results causally.
I see at least some parallel between my experience and the topic of IQ measurement, although I recognize that IQ has been far more thoroughly studied and refined. IQ is calculated from relatively easily obtained measurements: the answers to multiple questions on a test that can be administered in a reasonably short period of time.
The result is a score or a set of scores, but these are just scores for taking a test (designed to be easily scored) under the artificial circumstances of a test environment. While we can label these scores an IQ, where the I stands for intelligence, in the end the score is just a measure of answers to a set of questions.
I understand the topic well enough to know that at this point it gets very controversial. What we really want to know is something about the potential outcomes of an individual presented with a new challenge. We want to know how likely it is that the person will be able to fill some position, or how well the person will respond to some challenge, or even to predict entire life outcomes (wealth building, health maintenance, resilience to obstacles, etc.).
Ideally, we could tap some internal measure that would read out exactly the quantities that determine these answers. To that end, we are striving to find those measurements through detailed DNA sequencing, more extensive medical measurements, or more exhaustive collection of life histories, but these are a long way from being practical. Meanwhile, we have a relatively simple measurement, a test score, and we have found strong correlations between that score and a variety of outcomes we are more interested in. IQ scores even correlate very well with outcomes such as longevity and good health, even though those are hard to explain by intelligence alone.
Personally, I have no idea what my IQ is. I suspect it is fairly close to average (100). Even if it were high, I would not like the idea that a set of tests could capture what I consider smart about what I do. I have good reason for this: when taking IQ-like tests (such as finding the missing diagram that matches the pattern of the other diagrams), I often find matches that I know would be scored as mistakes even though I am confident I could defend them. I will choose such an answer even when I know it is not the expected one, and take pride in knowing I just lowered my score.
My point is that I don’t like the idea that an IQ score in any way describes my capabilities. Unlike many other people, though, I’m content to lose out on opportunities because someone considers me not smart enough based on some score. In school, I often went out of my way to get lower grades, confident that I could have done better. There were plenty of tests where I was disappointed because I had expected to do better, but there were many others where I was content with a lower score because I lost marks on exactly the questions where I knew I had given the wrong answers. My eccentricity is that I prefer not to compete with someone else who is able to do the same work. I manipulate testing as a way to get out of work someone else could do.
Back to the topic of IQ testing: the test does not directly measure what we want to know. IQ measures something about an individual that is very consistent over that individual's lifetime: a higher-than-normal score in youth reliably predicts a similarly high score later in life. It also measures something that is highly correlated with things we are interested in: fitness for a certain type of task, life outcomes, etc. A case is made that if one measurement is sufficiently correlated with another, then both are measuring the same thing. Thus a score on a test called an IQ test measures intelligence because it correlates highly with our later observations of actual intelligence (however we define it). It may still be a spurious correlation, but the extent of the data indicates that the correlation is very persistent.
In my opinion, the biggest complaint about IQ concerns its predictive power for individuals. A person may excel at some specialty by doing something that counts for very little on an IQ test. A person brings a wide range of capabilities, each of which may be at a different level of competency compared with peers: the aggregate capability that IQ tests attempt to measure may over- or underestimate any individual capability.
While IQ testing is controversial for evaluating individuals, it is even more controversial when aggregated over different populations. Many different groupings of populations show differences in the statistical distribution of IQ scores, such that it is possible to distinguish the populations by measuring the distribution.
The controversy is heightened by the implication that one group may be smarter on average than some other group. To me, this is unfortunate because the intelligence of a group has no real practical purpose: the benefits of intelligence come from individuals, not groups.
When it comes to populations, the concept of intelligence is not relevant to the questions we are most interested in. One question of interest about a group is its ability to govern itself. There are various forms of government: consensus (democratic), committee (republic), and authoritarian (management). To be successful, all of these rely on persuasion.
As a result, what we would really like to measure in a group is its capacity for being persuaded. In particular, two different measures are important:
- How easy is it to persuade the group to reject its current premise in favor of an alternative that is better argued, with better evidence, better logic, or fewer fallacies?
- How hard is it to persuade the group to reject a recently accepted premise in favor of an alternative that is argued with poorer evidence, poorer logic, or more rhetorical fallacies?
When it comes to groups, the group's success depends on its aggregate ability to be persuaded. While it may help for the group to have a number of intelligent members, in the end what matters is that the group as a whole is persuaded by the better arguments and not by the poorer ones.
It is not hard to imagine how a group with a high aggregate IQ could be poor at self-government. All it takes is a disagreement about some objective, with rival arguments competently presented by intelligent advocates. Highly intelligent groups could end up making very poor self-governing decisions.
In contrast, a group with very low aggregate intelligence may benefit from less diversity of competing objectives among its leaders.
In terms of self-government, what really matters is a group's capacity to be persuaded by good evidence, sound logic, and non-fallacious reasoning.
I think persuading a group is distinct from persuading an individual. We can train individuals in critical thinking about evidence and in skills in logic. In contrast, most groups have many members (if not a majority) who lack this type of training. To govern itself, the group as a whole needs to be persuaded. To be governed well, the group needs to be persuaded by the superior arguments.
An individual is either persuaded or unpersuaded by the specifics of an argument. The persuasion of a group works differently, since different individuals will be persuaded to varying degrees and thus agree to varying degrees with the group's dominant view. Depending on the nature of the group, the dominant opinion may require a simple plurality, a simple majority, or some supermajority. Even autocrats need to persuade enough of the population to discourage a rebellion.
In this increasingly globalized world, we confront the question of how well groups can govern themselves, not only to survive within their local context but also to keep up with global norms. The goal is self-government, but that government should thrive at a level comparable to the global norm. What measurement would determine a group’s success at self-government?
Lacking any clear measure of a group’s ability to govern itself, we seek a substitute that is easy to measure. A frequent candidate for such a substitute is the group’s aggregate IQ score. Group IQ seems to come up as a default assumption, but high-IQ groups appear (to me) to have dysfunctional self-government. There is something distinct from intelligence that determines a group’s capacity for self-government: its ability to be persuaded to accept a new proposition that is better argued and its ability to reject new propositions that are poorly defended.
IQ may provide a measure of a person’s potential outcome in life. That does no good for the individual if his group can’t govern itself successfully.