Data science and education

This post follows up on my last post, which suggested that elementary and secondary education should place a fresh emphasis on data science (the selection, scrutiny, and analysis of data). Recent education reforms such as the Common Core present an opportunity to prepare for the challenges that future citizens and workers will face in the modern economy. In previous posts, I asserted that data science will be an essential part of participating in democratic government and in the workforce. Education is not giving data science the attention it deserves. I am not only disappointed by the lack of attention; I'm also concerned that we are teaching the wrong lessons.

In my last post, I described one essence of data science as a habit of investigating the information being presented. This habit involves questioning the presented content and then pausing to investigate any and every observation that is intriguing. I compared this to a text with lots of hyperlinks that jump to related documents, but extended the comparison to include any part of the text that can be explored through online searches. Education should aim to develop in students the habit of looking for the information behind the information presented to them. For example, students should be rewarded for information they found by taking the initiative to investigate a concept presented in a particular text.

I contrasted this with the older technology of printed textbooks, which were meant to be the sole source of content for a particular course. In the older educational models, the information in the textbook was the information needed to pass a course. Today, we increasingly use digital media such as e-books on computers. Unfortunately, the approach is to digitize the printed textbook as if the only opportunities were to save printing costs and to reduce the weight students must carry between classes. E-books offer ways to research within a text: highlighting part of the text can display a dictionary entry, an encyclopedia entry, or an internet search result. Although students may be using these features, the educational system does not appear to be taking advantage of the learning opportunities inherent in this capability. Rigorous standardized testing reinforces the old model of learning exclusively what is in the printed text, leaving no room for the variation that is inherent in discoveries from self-directed research.

My last post ended with the observation that Internet marketing is conditioning people to expect that investigating a concept (clicking on an advertisement) always leads to some form of sales pitch, often including some trickery to commit the reader to something. This conditions people to depart from the initial text only with great caution. Following an internet advertisement generally leads to an attempt to secure an economic commitment or personal information from the reader. That caution discourages investigation in the first place. Internet advertising teaches that researching beyond a top-level presentation ends up costing rather than rewarding the reader.

Education needs to make investigating information a positive and rewarding experience. In the modern world, it is crucial for people to investigate the information behind assertions. People need to be comfortable seeking out the data behind a particular government policy debate. In an earlier post, I presented the example of an employee who needs better access to the data that determines his working hours under just-in-time scheduling. I'm convinced that every aspect of our lives will increasingly be driven by data. To survive or succeed in this world, people need to be proactive in doing their own data science. Data science is knowing when to investigate (to challenge) a concept, how to query for data that supports or refutes the concept, how to scrutinize the query results for relevance and reliability, and how to assemble the results into a conclusion about that concept. Challenging concepts needs to become a habit, something we do all the time.

Navigating through a data-centered world is analogous to driving a car. Instead of checking the speedometer or the mirrors, we need to check the data behind the concepts being presented to us. This conditioning allows us to be the driver in a data-centered world. Without it, our only option is to ride in the passenger seat. My post about just-in-time employment scheduling demonstrated what it is like to be in that position. The case studies in that post show the hardship of coping with the uncertainty that comes from lacking access to the data that determines the workers' time commitments and income. One way to cope in those scenarios is to query that same data to see what the future holds. Although this may be difficult initially, I believe the trend will be to make this data more readily available to all interested parties, such as employees whose hours depend on it. Ideally, employees will bring a habit of investigating underlying data. This habit will encourage us to demand access to the underlying data all the time. We need this access in order to remain in control of our lives and our ability to contribute.

I am disappointed that even the latest education trends are not emphasizing data science: the conditioning to investigate the information behind concepts. In some cases, I see the opposite occurring. We are conditioning students with bad practices.

Consider the recent controversies over how addition and subtraction are taught in early math education. I'm referring to recently publicized workbook examples that require multiple-step diagrams to show the thinking behind adding two numbers.

Part of the controversy is that the new techniques don't match the techniques the parents learned. That is a separate issue. Sticking to traditional methods makes it easier to enlist parents in their child's education, but it is possible that the newer methods have been shown to be learned more readily by more students. I'm not arguing either of these cases.

For this discussion, I'll use the example of adding 12 to 24. In a numeric problem such as 12 + 24, the numbers are completely abstract. The traditional approach involves memorizing the rules for addition and the addition tables for combinations of single-digit numbers. This memorization offers no explanation; the traditional approach simply tells the student to memorize the rules. Being of the grandparent generation of current students, I'm biased toward thinking that the traditional approach appropriately emphasizes an abstract approach to arithmetic.

The modern approach directs the student to model the numbers with diagrams that illustrate the process in more concrete terms. In one approach, 12 is replaced by a ten-stick and two one-dots, and 24 is replaced by two ten-sticks and four one-dots. Looking at the resulting diagram, the student can count three ten-sticks and six one-dots, giving the correct total of 36.
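The stick-and-dot procedure is, in effect, place-value decomposition followed by regrouping. The minimal Python sketch below illustrates it; the function names are hypothetical and not taken from any curriculum. It also regroups ten loose one-dots into an extra ten-stick, a step the 12 + 24 example never needs but a problem like 17 + 25 would.

```python
def to_sticks_and_dots(n):
    """Split a non-negative integer into (ten-sticks, one-dots)."""
    return divmod(n, 10)


def add_with_sticks_and_dots(a, b):
    """Add two non-negative integers the stick-and-dot way."""
    sticks_a, dots_a = to_sticks_and_dots(a)
    sticks_b, dots_b = to_sticks_and_dots(b)
    sticks = sticks_a + sticks_b
    dots = dots_a + dots_b
    # Regroup: every ten loose one-dots become one more ten-stick.
    extra_sticks, dots = divmod(dots, 10)
    return sticks + extra_sticks, dots


sticks, dots = add_with_sticks_and_dots(12, 24)
print(sticks, "ten-sticks and", dots, "one-dots =", sticks * 10 + dots)
# prints: 3 ten-sticks and 6 one-dots = 36
```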

At first, I thought this was a good example of investigating a concept. The concept is the operation of addition. The investigation reveals that underneath this concept is something involving ten-sticks and one-dots. As in my earlier descriptions of data science, this involves finding the data behind the concept (building stick-and-dot models), scrutinizing the data, and reaching a conclusion. This seems to be representative of the habits of data science that I've been advocating.

However, I object to this because it encourages a weak form of data science. The student is inventing a context for an arbitrary abstract number problem: add 12 and 24. This invented context suggests that 12 needs to be added to 24 because there are ten-sticks and one-dots that need to be combined. In terms of the taxonomy of data I described in earlier posts, this is an example of what I called dark data. Dark data is data we invent because we lack observations. Dark data can mislead us into thinking this invented information is real information about the world. Handling dark data requires advanced skills to identify it, isolate it, and subject it to additional scrutiny that we would not give to directly observed data.

In this case, we invent the concept that sticks and dots are involved. We present this concept as an idealization of the addition of numbers. In fact, it is a fictional story we invent because we have no information about why the numbers need to be added. We can't begin to explain this fictional aspect until the student has been introduced to the ideal of bright data, where we know in advance why we want to add the numbers.

In the earlier discussions, I described bright data as ideal data that is well documented and well controlled. The best way to learn to work with data is to start with bright data. From a data science perspective, the bright data explaining why 12 and 24 need to be added is a preexisting word problem. In a data science approach, the student would ask why it is necessary to add 12 and 24. The query would run against a data store that returns the exact word problem behind this addition. For example, the query result might be a word problem asking how many eggs one will have after buying two one-dozen cartons when there is already one carton at home. The addition may proceed using the same addition rules, but there are no ten-sticks or one-dots. There are cartons of eggs. The invented ten-stick and one-dot approach suggests that the way to solve the egg problem is to rearrange the eggs into cartons that hold ten eggs each and then deal with the loose eggs outside of any carton.
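To make "query why these numbers are being added" concrete, here is a minimal Python sketch. It assumes a hypothetical in-memory dictionary standing in for the data store; the names `WordProblem`, `PROBLEM_STORE`, and `why` are invented for illustration, and the stored text paraphrases the egg example above. Each numeric problem is stored together with the word problem that produced it, so the answer can be checked against the problem's own quantities.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WordProblem:
    text: str
    operands: tuple  # the numbers the word problem asks the student to add


# Hypothetical "bright data" store: each numeric problem is recorded
# together with the word problem that produced it.
PROBLEM_STORE = {
    (12, 24): WordProblem(
        text=("There is one dozen-egg carton at home and you buy two more "
              "one-dozen cartons. How many eggs do you have?"),
        operands=(12, 24),
    ),
}


def why(a, b):
    """Return the word problem behind a + b, or None if none is recorded."""
    return PROBLEM_STORE.get((a, b)) or PROBLEM_STORE.get((b, a))


problem = why(24, 12)
print(problem.text)
print(sum(problem.operands))  # prints: 36
```

Looking the pair up in either order reflects that addition is commutative; a real system would more likely keep a problem identifier alongside each exam question than key on the numbers themselves.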

My larger complaint is about the habit this builds. The ten-stick and one-dot approach encourages imaginary rationales for a problem. Because there is no underlying word problem, we encourage students to invent one. They learn that the reason to perform an abstract math problem is that there are ten-sticks and one-dots that can be drawn on paper. This invented word problem is like getting dark data from a query: a database query that returns not observed data but data generated by a computer model. The computer model generates ten-sticks and one-dots, and the suggestion is that ten-sticks and one-dots have some kind of reality.

Recasting this exercise as a database query can vastly improve the learning experience and begin to build the good habits of data science. For example, the underlying database could be the school's current list of students. The numbers in the test question 12 + 24 are counts computed from an underlying query of all the Ethans and Emmas in the school. The student is presented with this numeric problem so he can perform the addition abstractly using memorized rules and tables. However, the student also has the opportunity to ask why these two numbers are being added. That query would return that these are the counts of Ethans and Emmas. The student could further subdivide the counts by class and find single-digit counts in particular classes. He could then work with this realistic mix to devise a strategy for adding the per-class results in succession, as a cross-check on the answer he first obtained by following the addition rules. Like the dozen-egg cartons above, this example breaks the problem into natural groups: the classes that have at least one Ethan or Emma. The database approach offers rich, and thus interesting, examples with an opportunity for independent verification.
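A minimal Python sketch of the Ethan-and-Emma drill-down and cross-check follows. The roster is a small toy list of (first name, class) rows invented for illustration, not real school data, so its totals differ from the 12 and 24 in the post; the helper names are hypothetical. Because `Counter` only creates entries for classes where a matching name appears, the per-class breakdown naturally covers just the classes with at least one Ethan or Emma.

```python
from collections import Counter

# Hypothetical toy roster of (first_name, class_name) rows, invented
# for illustration only.
ROSTER = [
    ("Ethan", "1A"), ("Emma", "1A"), ("Emma", "1A"), ("Liam", "1A"),
    ("Ethan", "2B"), ("Ethan", "2B"), ("Emma", "2B"), ("Olivia", "2B"),
    ("Emma", "3C"), ("Noah", "3C"), ("Ethan", "3C"), ("Emma", "3C"),
]


def count_name(roster, name):
    """The top-level number: how many students have this first name."""
    return sum(1 for first, _ in roster if first == name)


def count_by_class(roster, names):
    """Drill down: per-class counts of students with any of these names."""
    return Counter(cls for first, cls in roster if first in names)


ethans = count_name(ROSTER, "Ethan")
emmas = count_name(ROSTER, "Emma")
abstract_total = ethans + emmas  # the numeric problem as presented

per_class = count_by_class(ROSTER, {"Ethan", "Emma"})
running_total = 0
for cls in sorted(per_class):  # add the small per-class counts in succession
    running_total += per_class[cls]

print(f"{ethans} + {emmas} = {abstract_total}")
print("per class:", dict(sorted(per_class.items())))
assert running_total == abstract_total  # independent cross-check
```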

In the past, we taught word problems after students learned the rules. Inventing word problems takes a lot of effort, especially if they are to be interesting and challenging. While numeric exams may include dozens of problems, word-problem exams are more limited, partly because of the effort needed to write a word problem for each case. Today, large and readily available databases present the opportunity to make every numeric problem a word problem based on queries of those databases. An algorithm could randomly select dimensions such as first name, birth month, or birth day-of-month to produce a wide variety of numeric problems. In each case, the top-level abstract numeric problem has a real underlying word problem based on bright, real-world data that the student can investigate and verify for himself. Word problems could be taught at the same time as the rules, because every numeric problem comes from a real and even entertaining word problem. The word problems are the database queries that produced the top-level numeric problems. This is consistent with how our modern data-centered world operates.
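The generator described above could look something like this minimal Python sketch. The roster rows, the `DIMENSIONS` mapping, and `generate_problem` are all hypothetical and invented for illustration. The function randomly picks a dimension and two of its values, counts the matching students, and returns the numeric problem together with the queries behind it, so every problem carries its own word problem. A seeded `random.Random` keeps a given exam reproducible.

```python
import random
from collections import Counter

# Hypothetical toy roster rows: (first_name, birth_month, birth_day).
# Invented for illustration only.
ROSTER = [
    ("Emma", "March", 3), ("Ethan", "March", 14), ("Emma", "July", 3),
    ("Noah", "July", 22), ("Olivia", "March", 14), ("Ethan", "October", 3),
    ("Liam", "October", 22), ("Emma", "October", 14),
]

# Dimensions the generator may choose from, mapped to tuple positions.
DIMENSIONS = {"first name": 0, "birth month": 1, "birth day-of-month": 2}


def generate_problem(roster, rng):
    """Pick a dimension and two of its values; return the numeric problem
    together with the queries (the word problem) that produced it."""
    dimension = rng.choice(sorted(DIMENSIONS))
    column = DIMENSIONS[dimension]
    counts = Counter(row[column] for row in roster)
    first, second = rng.sample(sorted(counts), 2)  # two distinct values
    queries = [
        f"students whose {dimension} is {first}: {counts[first]}",
        f"students whose {dimension} is {second}: {counts[second]}",
    ]
    return (counts[first], counts[second]), queries


rng = random.Random(2024)
for _ in range(3):
    (a, b), queries = generate_problem(ROSTER, rng)
    print(f"{a} + {b} = ?   behind it: {queries}")
```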

Structuring education so that every problem comes from a database query of real and accessible data allows, and even encourages, the student to ask what is behind each question. This can begin building the habit of investigation that will be essential for thriving in the modern data-driven world.

I believe most of primary and secondary education can be recast using a database approach. The students' texts and exams would always provide opportunities for them to investigate the underlying data for independent verification. This approach also creates a new learning opportunity: rewarding independent discovery of that same underlying data. The ability to independently explore data and discover patterns will be very valuable for the student's future.


