Data categories: what is my age group?

In my last project, we faced the challenge of assigning observations to a fixed list of categories. To be useful for multidimensional analysis, the list was kept relatively small and each category had a descriptive name, but we used algorithms to determine which category to assign to a particular observation. The description of each category was chosen for how we wanted to read the data rather than to describe the actual algorithm used to identify the category. The problem was that it was easy to find cases where a particular item didn’t match the description of its category even though it met the criteria that placed it there.

It occurred to me that there is an analogy in my own life experience: it seems I keep being placed in the wrong age group.

In the late 1960s and early 1970s, there was a lot of talk about how the large population of the baby boom generation was challenging the country to adapt. The baby boom was followed by an equally challenging baby bust generation. The first couple of years of the 1960s fell somewhere between these two generations, and that is where I was. At different times I was placed in the baby boom generation or in the baby bust generation. So which problem was I contributing to? The problem of being among the too many or of being among the too few? I’m sure this is common, as generations are recognized more by their midpoints than by their boundaries. But in this case there was demographic significance in terms of strains based on the numerical size of the population. In the past decade or so I stopped hearing about the baby bust. The baby boom was big enough to absorb the bust and still be very big. So I’m solidly a boomer, I guess. It wasn’t so clear earlier.

Another example is that my sister was born less than a year after I was. For about a month every year, we’re the same age. We joke that we are twins for a month. More significant, though, was the timing of our birthdays. Her birthday was just before the start of the school year and mine was just after it. That means that at the start of the school year we were the same age, so we entered school at the same time: she was among the youngest in the class and I was among the oldest. Because we were in the same grade, everyone else, teachers and students alike, assumed we were real twins. A grade in school is defined by an age range of one year, and the most likely explanation for two siblings in the same class is that they are twins, so we were subjected to policies meant for twins.

If the family had lived in the state a year earlier, I would have started the previous year because of the way eligibility was defined. That made me more than a year older than most of my classmates. To make it easy to comprehend, I claimed I flunked kindergarten. My mom vouches for that by claiming I wasn’t ready anyway. This illustrates a different problem: making up an excuse for a categorization error. The real problem is that the birthday range for school eligibility failed to put my sister and me into separate grades.
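The way a fixed cutoff date can put two siblings born almost a year apart into the same grade can be sketched in a few lines of Python. This is only an illustration: the September 1 cutoff, the entry age of 5, and the two birthdays are made-up values chosen to mimic the situation described above, not the actual rule or dates.

```python
from datetime import date

# Hypothetical rule: a child may start school in the first year in which
# he or she has reached ENTRY_AGE on or before the cutoff date.
CUTOFF_MONTH, CUTOFF_DAY = 9, 1   # assumed cutoff: September 1
ENTRY_AGE = 5                     # assumed entry age


def age_on(birthday: date, on: date) -> int:
    """Age in whole years on a given day."""
    years = on.year - birthday.year
    if (on.month, on.day) < (birthday.month, birthday.day):
        years -= 1  # birthday has not yet occurred in that year
    return years


def entry_year(birthday: date) -> int:
    """Calendar year in which the child's first school year starts."""
    year = birthday.year + ENTRY_AGE
    if (birthday.month, birthday.day) > (CUTOFF_MONTH, CUTOFF_DAY):
        year += 1  # turns ENTRY_AGE after the cutoff, so waits a year
    return year


# Made-up birthdays: one just after the cutoff, one just before it
# in the following year (about eleven months apart).
older = date(1961, 9, 15)
younger = date(1962, 8, 20)

same_cohort = entry_year(older) == entry_year(younger)
```

With these assumed dates both children land in the same entering class, and both are 5 on the first day of school, even though neither the rule nor the data says anything about twins.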

As I mentioned, I was older than most of my classmates. This led me to want to be with the next grade ahead. It provided a perverse kind of incentive to study harder, thinking that maybe I could outrun the calendar. I’m sure at one point I did think that maybe, just maybe, they’d let me skip a grade. It probably was possible and I might have succeeded, except I think the administrators rather liked the idea of having twins in the same grade (or there was a policy to exclude twins from skipping grades). I’m kidding about the administrators but not about the extra incentive to study hard.

The state had a policy of mandatory driver’s education upon reaching the eligible age for a learner’s permit. I would be eligible a year ahead of the rest of my class. It was my last chance to skip ahead, at least in this limited achievement. But even that was denied me because, although I met the age requirement, the available slots needed to be filled by the more senior class. If I had been in that class, I would have had a slot, because I was older than some of the students in it. This illustrates a category error resulting in a different decision than the age criterion alone would have produced.

I guess the next example was self-inflicted. I did intend to go to graduate school and at least contemplated pursuing a PhD, but after finishing my undergraduate degree I wanted to get some work experience. I did return to graduate school five years later, but I went to a remote land grant college where it was much more the norm for students to go straight from undergraduate school into graduate school. In hindsight, going straight through is the smart choice: I wouldn’t recommend my path to anyone who wants an advanced degree. I went back hoping to re-experience a college atmosphere, but even those five years made a noticeable difference. I was old enough to identify more closely with the faculty than with the students. In fact, I was granted several opportunities to teach classes in the absence of the professor. I enjoyed that opportunity, but in hindsight I wouldn’t blame anyone in the class who felt cheated for having a grad student as a teacher. By this time I was really confounding any categorization.

Lately I have been unemployed. That’s what I keep telling myself anyway. My age is in the lower range of retirement age, and some people consider me retired. This is causing some new problems that I didn’t expect. I say I’m between jobs, but to others it looks like I’m retired.

This post is meant to provide an analogy to the problem of converting a continuous measurement (in this case my age) into a broad category with an arbitrary description. The work-related controversies had to do with a material property, but it is fun to look at the same controversies from the point of view of an individual living person.
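As a rough sketch of that conversion, the following Python maps a continuous age onto a short list of descriptive labels. The boundaries (18 and 62) and the label names are hypothetical values picked for illustration; they are not the categories from the project or from any actual policy.

```python
from bisect import bisect_right

# Hypothetical boundaries and labels, for illustration only.
BOUNDARIES = [18, 62]                        # ages where the label changes
LABELS = ["minor", "working age", "retirement age"]


def label_for(age: float) -> str:
    """Return the descriptive label of the bin that contains this age."""
    return LABELS[bisect_right(BOUNDARIES, age)]


just_under = label_for(61.9)
just_over = label_for(62.0)
```

Two people a few weeks apart in age receive different labels, and the label "retirement age" says nothing about whether the person has actually retired. The criterion (an age range) is exact; the description attached to it is where the trouble starts.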

I think the various errors I recounted here are the kinds of errors that we accepted with the work-related example.

For the work-related controversy, we satisfied ourselves that these errors were tolerable for our narrow purposes at the time. This is where there is a difference. I am one individual passing through different arbitrary categories. In contrast, a data system will continue to use the same categories until something changes it. The problem comes when someone decides to use the data system for a purpose other than the one originally intended. Originally, we accepted the categories for their intended use, but we were aware they would not be appropriate for other purposes. When considering a new purpose that reuses old categories, there should be an investment of labor to validate the categories for that purpose. I fear we often don’t make that investment.

To clarify, I’m talking about the description of categories, not the criteria. There is less risk with a category that is described by its criteria, such as the category of a specific age in years.

What is dangerous is when a combination of criteria is given a descriptive label, such as when two siblings in the same grade in school are assigned to the category twins. Such a category may be ok for some policy decisions (such as the policy of separating twins so that they have different teachers), but it runs the risk of misapplying a policy meant for true twins (such as what I suspect was a disqualification of twins from consideration for skipping grades based on merit).
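The gap between the label and the criteria behind it can be made concrete with a small Python sketch. Everything here is hypothetical: the `Student` record, its field names, the one-day tolerance, and the sample birthdays are assumptions made for illustration, not a description of any school's actual records.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Student:
    # Hypothetical record layout, for illustration only.
    name: str
    family_id: str
    grade: int
    birthday: date


def labeled_twins(a: Student, b: Student) -> bool:
    """The criteria actually applied: siblings enrolled in the same grade."""
    return a.family_id == b.family_id and a.grade == b.grade


def born_together(a: Student, b: Student, max_days_apart: int = 1) -> bool:
    """What the label 'twins' implies: siblings born at (nearly) the same time."""
    days_apart = abs((a.birthday - b.birthday).days)
    return a.family_id == b.family_id and days_apart <= max_days_apart


# Made-up siblings born about eleven months apart but in the same grade.
me = Student("older sibling", "family-1", 3, date(1961, 9, 15))
sister = Student("younger sibling", "family-1", 3, date(1962, 8, 20))

label_says_twins = labeled_twins(me, sister)
actually_twins = born_together(me, sister)
```

Here the label comes out true while the property it names comes out false. A policy that only needs "siblings in the same grade" is served well enough by the first function; a policy written for true twins needs something like the second, and reusing the old label for it is the kind of unvalidated reuse I worry about.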

