In recent years there has been a flood of reports about the limitless potential of big data solutions to solve all of the world’s problems. I think I have seen examples that touch on every conceivable human fear, from curing cancer to early identification of earth-colliding asteroids.
Most of the recently realized benefits have come in the form of marketing or in new businesses made possible by new marketing tricks. However, reports of the success of these projects reinforce the claim that big data can solve problems, and so we can trust it for automated decision making on all problems facing humanity.
Today we are facing an out-of-control Ebola epidemic. In response, we are attempting to mobilize a great many resources to set up emergency hospital beds, provide transportation services to get people to those hospitals, and train new staff for optimal handling of cases to prevent further spread. At the same time we are accelerating approval processes for new medicines to better manage the illness or to find a vaccine to prevent it from occurring in the first place. There has been some criticism that some of these efforts started too late and are not proceeding fast enough, but setting up treatment centers for all affected individuals is logistically challenging given the geographic area involved and the rapidity of the spread of the disease.
The epidemic is currently out of control and rapidly spreading. Projections indicate a potential for a million (or many millions of) deaths in the next 12 months. Although better medications and better health care facilities will help, they are inevitably coming too late to make a significant change in the spread of the disease.
Meanwhile, big data enthusiasts continue to comfortably promote their promises to solve climate change or to enable the get-rich schemes of disruptive businesses. There is no comparable mobilization of big data to address the spread of Ebola. Big data may in fact provide huge benefits in controlling the spread of the disease. The present Ebola crisis definitely can benefit from the super-powers of big data to save humanity.
It is increasingly unlikely that setting up better hospitals or delivering more effective medications is going to have much impact on the number of Ebola illnesses and deaths in the next 12 months (if not several years).
The affected areas are experiencing deeper social problems that are not only impeding efforts to control the disease but are actually accelerating its spread. Examples include:
- Attacks on emergency response teams because of fear that these foreigners are spreading the disease
- Invasions of hospitals to retrieve sick relatives because of distrust of hospitals or misunderstanding of the disease
- Looting of contaminated hospital supplies (such as bed mattresses) because of the local poverty conditions
- Actively resisting offered assistance for relatives who may be afflicted with the illness
These social conditions are playing a large role in the spread of the disease and the increasing severity of the crisis. Our response to the epidemic could become better informed if we had a better understanding of the finest details of these social issues. These same social conditions appear to be ideal opportunities for the current analytic capabilities of big data solutions. As noted above, so far the biggest big-data successes involve social-media-inspired marketing or business plans that manipulate consumer behaviors for some benefit. The root social problems exacerbating the Ebola crisis are a very similar kind of problem.
Although I see some evidence of big-data solutions to support the logistical efforts or the acceleration of medical resources, these are incremental improvements of processes we already had available. Big data benefits for these existing efforts are primarily in reducing costs instead of increasing effectiveness. Big data is not yet being used in a way that can provide unique value in managing the spread of the disease. One area for a uniquely valuable contribution is in big data’s already demonstrated ability to exploit social media data.
To address the social issues, we need social media data on the healthy, not-yet-infected populations. The lack of data science participation at this level is easily excused because the epidemic is occurring among the poorest nations, which have little access to social-media technologies such as smart phones and Internet groups. The excuse is that we can’t enjoy the benefits of big data solutions if there is no big data to exploit.
I think big data advocates are too comfortable with the excuse that this region lacks the big data required for general population social media data mining. It is convenient to sit out a crisis where the risks are so near term and so high. It is far more comfortable to talk about improving profit margins or preventing some catastrophe that might otherwise occur a few generations from now.
Clearly we lack big data on the individual-level social interactions in these areas. Although we cannot immediately run queries on existing data, we could start identifying the uniquely helpful analytics we could supply if we had the data. Identifying these opportunities can motivate others to provide the resources to make this data available. Logistically, it is easier to distribute smart phones and set up cell towers in healthy districts than it is to set up emergency hospitals in infected areas where the infected population already exceeds the number of beds available.
There is a huge opportunity for big data to make an impact on regaining control of this epidemic by analyzing social data of healthy populations and their degrees of separation from the infected populations:
- Identify the different local traditions or customs for responding to illnesses
- Identify distinct communities with different customs or prejudices
- Identify fine-detail boundaries that separate territories for these different communities
- Identify communities with mutual distrust or animosity, or with mutual respect or trust
- Collect longitudinal data to show how routine behaviors contributed to the introduction and spread of the disease
- Identify routine behaviors that may help identify the encounters with the animal vectors for this disease
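To make the "degrees of separation" idea a little more concrete, here is a minimal sketch of one such analytic, assuming we had records of who is in regular contact with whom. Everything in it is hypothetical: the function name `degrees_of_separation`, the `contacts` mapping, and the made-up people "A" through "E" are illustrations only, not data or methods from any real response effort. It uses a plain breadth-first search starting from all known infected individuals at once.

```python
from collections import deque


def degrees_of_separation(contacts, infected):
    """Return the fewest contact hops from any infected person to each
    reachable person. People with no contact path are left out."""
    # Every known infected person starts at distance 0.
    distance = {person: 0 for person in infected}
    queue = deque(infected)
    while queue:
        person = queue.popleft()
        for neighbor in contacts.get(person, ()):
            if neighbor not in distance:
                # First visit in a breadth-first search is the shortest path.
                distance[neighbor] = distance[person] + 1
                queue.append(neighbor)
    return distance


# Hypothetical, made-up contact records (each contact listed in both directions).
contacts = {
    "A": ["B"],
    "B": ["A", "C"],
    "C": ["B", "D"],
    "D": ["C"],
    "E": [],  # no recorded contacts
}

hops = degrees_of_separation(contacts, infected=["A"])
for person, count in sorted(hops.items()):
    print(person, count)
```

People who never appear in the result (here "E") have no recorded path to an infected person, which is itself useful to know when deciding where to focus attention. A real analysis would of course need far richer data than a simple contact list.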
There are limitless possibilities of what we can learn from social data from communities before, during, and after the epidemic impacts them. Big data proponents have an opportunity now to identify useful analytics and the necessary data to provide useful results. Perhaps this is happening, but I don’t see it.
Much of the history of big data is about taking advantage of what already exists. The bigness of data is the result of exploiting and combining data that already existed and previously served some other function, usually in the context of operating some aspect of a business. Big data promotion is largely about reusing existing data in new ways. The innovations in big data apply new analytic or visualization algorithms to existing data. Because there is so much existing data (largely unexploited) available, there is little incentive to identify new sources of data. The benefit of big data is gaining value from the application of cheap algorithms to existing data.
Collecting new data is expensive. It is easy to sell big-data analytics on existing data because the cost of collecting that data in the first place has already been paid. Much of big data marketing is about how to exploit existing data: either to build more extensive storage facilities, or to apply more sophisticated algorithms.
This is not going to help the Ebola crisis. The Ebola crisis is immediate. It is so urgent that we could use all the capabilities available. As I listed above, I believe there are many areas where big data can provide unique and highly valuable insight on how to manage and contain the spread of the disease and to facilitate more effective responses. What we are currently lacking are serious efforts to identify these opportunities.
Government leadership can benefit from two offers of volunteered assistance from the data science community. The first, as I suggested above, is a list of social analytic products that can help manage the various population responses to the disease, or help us adopt responses that mitigate customs that carry a risk of contracting or spreading the disease. The second is a commitment to invest in developing algorithms and in supplying analytic resources, so that we are prepared to perform the analysis if the data becomes available.
Once we become aware of the benefits of big-data analytics for managing populations to better restrain the spread or severity of this epidemic, we will have incentives to find ways to obtain the data to begin this process. Although the epidemic is occurring in very poor and underdeveloped areas, the data for many of the examples I identified above may be practical to obtain. The primary missing data for these questions is metadata from modern social media.
These essential social-media applications are accessible through smart phones. World-wide, we have a huge capacity to manufacture smart phones. We also have a surplus of older generation phones that often are directed to waste disposal or recycling. There seems to be enough capacity to provide free phones to the affected areas.
Many of these areas lack electrical infrastructure to recharge the phones. However, we also have a huge capacity to produce solar cells that are especially well-suited for the relatively low power requirements of recharging a phone or a small community of phones. It is feasible to have an electrical power infrastructure entirely for the purpose of powering smart phones. It may be practical to manufacture cheap phones that integrate solar panels on the back of the phone for a self-contained, self-powered unit. Even if they are heavier and thicker than what is acceptable in the retail market, they may be ideal for this specific application.
Smart phones require cell towers that also are lacking in these areas, and cell towers do require a lot of highly reliable power. Cell towers also are fixed resources that are frequently targets for sabotage and looting. I don’t doubt there are many innovative possibilities that can supply this necessary infrastructure. For example, I note that so far the combination of poor communities dealing with Ebola is concentrated near the equator. It may be possible to launch a modest number of equatorial orbiting satellites in very low earth orbit to provide the needed cellular connectivity. The satellites would have their own power source and be close enough to support the needs of this population. Power budgets may be managed by switching on cellular services only when the satellites are accessible to the nations that need them. These could be relatively cheap satellites, launched quickly. The orbits may even be low enough that the satellites would reenter the atmosphere after a couple of years. This would save costs by allowing cheaper launches and lowering the engineering requirements for long-life reliability. Alternative options may be high altitude blimps or autonomous aircraft (drones).
We have the capacity to supply the necessary technologies to this region to begin collecting social media data. We just need the incentive: assurance that this data will provide valuable contributions to the fight against the Ebola epidemic, and that the data science community is available to devote its efforts and put its reputation on the line to come up with big-data recommendations that really make a difference in an urgent and life-critical crisis. First of all, we need data scientists to present solid proposals for how big data can help the management of the populations, with the goal of improving our prospects for controlling the epidemic.
Update 10/1/2014: This CNBC.com article documents some big data efforts along the lines I described above, but the message is that this is not widely appreciated or accepted. Now is the time for big data to show its relevance to a real world crisis.