In the early 1980s there was a TV show called In Search Of that claimed to present in each episode a balanced review of various controversies where the available evidence does not satisfactorily explain the mystery. This post is not about the content of that show, but about the concept of setting the word “search” apart from “evidence”. This concept implies that search is what we do after we exhaust all of the available evidence.
In several earlier posts, I compared data in data stores to evidence and asserted that the two concepts are the same thing. Data and evidence are artifacts with some claim of an observation of the past. Evidence may be physical and stored in jars or on shelves, and it may be ambiguous in what it means, but it captures a form of information about a past event even if that event was the production of the evidence. Data in computers has similar problems in terms of varying quality and ambiguity. In either case, both are available for retrieval and interpretation. When the available evidence fails to satisfy our curiosity, we need to go out and search for new evidence we don’t yet have in our catalogs.
Today’s technologies support a rapidly expanding capability to exploit data. We have increasing numbers of ways to capture data, and these methods are so cheap we can deploy them everywhere. We have increasing network capacities to move data to data centers, and those data centers have the technology to store all of this data. The data centers have technologies that can rapidly retrieve, analyze, and visualize the data in their stores. Although individual data stores may constrain their focus to particular topics, those projects made choices to exclude data that could have been available.
Today, the universe of available data is immense in variety of data types, in the ubiquitous deployment of sensors, and in the frequency of data measurements. The immensity is so overwhelming that we are beginning to use the word Everything to describe it. Although the recent concept of Internet of Everything is more specific to ubiquitous networked sensors, the concept can also describe our expectation that everything will be captured and retrievable as data.
If the data stores capture everything, then a properly implemented algorithm can retrieve information about anything. In other words, we are beginning to accept the idea that a search for something means to query a database for something. In the modern sensibilities of avoiding relational database terminologies, we may instead define searching as data analytics and visualization. But ultimately, a search is becoming synonymous with a query of available data.
Despite the immensity of the data stores, they do not capture everything. The dust on my floor is right in front of me, but I don’t think there is a way I could query for this fact from any data store. Of course, that dust isn’t that important. It isn’t even important enough yet to motivate me to sweep it up. But it is a fact that may have some consequences, such as embarrassment if I get an unexpected visit.
Similarly, my trees need some assistance in removing the vines growing up their trunks. I placed that task on my to-do list despite not finding this fact from a query.
Despite the immensity of the data available today, it does not describe everything.
Recently, I finally started an account on Twitter. In my attempt to find some people to follow, I ran into a problem because Twitter’s suggestion app provides only the briefest of introductions and just a couple of recent tweets. Occasionally, I’ll get lucky and find one who has a link to a web site I can explore for some more information. But relying on that eliminated too many possibly interesting feeds. I then started to pay attention to the number of other followers and the number of tweets. I grew my list of follows by selecting the ones that appear to be more active.
Then I watched my home page on Twitter fill up with tweets from single accounts posting a tweet every few seconds with links to some article. I wonder if they could possibly have read each of those articles before recommending them. Perhaps they did and saved the list for a morning exercise of tweeting their recommendations. But my goal was to search for someone more like myself who seeks to tweet original thoughts. Original thoughts do not come that quickly.
Tweeting a recommendation to someone else’s content gets the tweet counts up. Although I can imagine an audience for a focused aggregation like that, I prefer to consider articles that had some editorial contemplation of the relevance and quality of the content of the article, relevance that was deeper than the relevance of the referenced content’s author or title.
I also noticed that many Twitter users promise to follow their followers, and indeed for them the two numbers are nearly the same. Since I strive to post original content, I don’t expect my number of followers to be very large, but I do hope that someone who shares my interests will eventually find and follow me. I am looking for the same in return, but I recognize the possibility that some people may just happen to be very entertaining generators of original content, so filtering out accounts with large numbers of followers is not a good choice for my purpose of finding interesting feeds.
Later I found out that there are services that will provide thousands of followers for a small fee. This fact at least hinted that the number of followers may mean nothing at all.
I’ve been on Twitter for less than 24 hours and spent most of that time querying the data to find interesting feeds. My lesson so far is that these queries are not very helpful. I’m finding data, but the data is not helping me in my search for someone who shares my interests.
My little exercise of casual querying of data about Twitter accounts exemplifies how we use the term search today. A search is defined as a query of data, and our goal is to find what we are looking for in the data. I think most will recognize the popular practice of answering a request for a search with the results of a web search engine. A request to search for something is satisfied by plugging the key words into a Google or Bing search.
My complaint is that this use of the term search for querying data was originally meant only as an analogy to searching in the real world. A real-world search requires an intelligent agent to do the searching for evidence that he doesn’t yet possess. The search would be for some fact about the real world. We reuse the term for data as an analogy for what a machine is doing to find some piece of data. Search is a metaphor for a machine query.
I prefer to keep the concepts distinct. I search. Data engines query.
I tried to illustrate this distinction with my brief initial search for interesting people who might be interested in me. I was searching because I was trying to find real people and the real traits they uniquely possess. I caught myself when I realized that I was querying data that happened to be under some control of a person. Querying data about Twitter users is not the same thing as searching for mutually interesting people who use Twitter.
I am being honest in admitting this is my very first experience with a Twitter account. However, I have been paying attention to it from a distance. I recall the earlier days when people would exchange Twitter tags as a form of calling card. They would meet in person first, and then keep in touch through Twitter. I did not get the impression that Twitter was ideal as an introduction service.
It seems to me that expecting a query to satisfy the goal of searching for people is a futile exercise. That goal requires a real search of getting out to places where I can encounter and introduce myself to interesting people. There is this small problem that the interesting people I am looking for are people like myself who don’t get out much.
In a roundabout way I go back to querying data as a method of searching for people who are hard to meet in public. Experience tells me that doesn’t work. The reason it doesn’t work is that a query only has access to data, and that data is not representative of the real person. In the above Twitter feed search, the best evidence of interest is whether the individual has a personal blog and the number of tweets is comparable to the number of posts. That’s my goal: tweet to announce my latest thoughts. I will respond to tweets (if any) with future posts to add to my blog. The best hope I have for finding who I am interested in is finding someone who similarly uses his tweets as an envelope for his original blog posts.
My brief experience so far indicates that this is not easy. Despite my desire to tweet only for my blog posts, I have already tweeted in reply to other people’s tweets. I’m hoping that my tweets add something of original value, but it is tempting to tweet a response to tweets that were not original and involved very little thought by the one who posted them. If I get carried away, my tweet counts will look the same as those of someone who is not tweeting links to original content.
My motivation for writing this post was not to pick on Twitter. I just started on Twitter and I’m enthusiastic that I’ll find some potential value in it. I need some time to figure out how to make it work for me.
The reason I started on Twitter is that I quit and closed out accounts I had on other services I had hoped would work for searching for new relationships: one service was devoted to business relationships, and the other was for personal ones.
My impression of the core value of those subscription-based services is that they provide mechanisms to make an introduction. In both cases the subscription fee permits contacting account holders with direct messages similar to email. While nothing stops me from finding email contact information for a variety of businesses using web searches or a variety of people using social networking sites like Facebook, these subscription services promise more protection of privacy to encourage exposing more information about a person’s (or business’s) goals and experiences.
It is probably a failing of my own world view, but I often equate the project of finding relationships with the project of finding an employer. After all, both involve finding a mutually welcomed relationship. The actual information varies, but that information can fit in the same data exchanges. The professional resume is the same as a personal profile. Following a company is the same as adding someone to the list of favorites. Initial contact involves email exchanges with the intent to convince the other party to meet for an interview or a date. That process is pretty much identical.
I quit both services at the same time. That simultaneity was significant. Although there was a triggering event that motivated me to remove one person from my contacts, I didn’t do the sensible thing of doing just that and moving on. I allowed myself to follow my emotional reaction to abandon both efforts entirely. I let myself follow advice from an inner voice that I had not yet understood.
After cutting off the services, I was very critical of myself for overreacting. Certainly it was an overreaction in its particulars. If I had remained more sensible, it is possible that the services might have borne fruit in the near future. That possibility could have kept me a satisfied subscriber indefinitely.
But very quickly I realized I had made a very good choice. There was something fundamentally wrong with my expectations of using these services. Even though the services specifically sell themselves for what I expected, they are fundamentally unable to provide what I need. I need to search. The services only offered the opportunity to query.
Both services suffered from the same problem. The data available for querying was poor-quality data. It is like the earlier examples observed on Twitter. On the job site, the quality of an individual was measured by the size of his network, but this network size was manufactured. Joining a network means combining the networks of two people. The more people in a network, the more visibility you will have. But the network was false information. In most cases, the majority of people in a network don’t know anything about each other except for the data available on the site.
The triggering event for me was someone who asked to be added to my network and then later announced to the network their dislike of my thoughts. That’s a pretty good sign that they didn’t know me in the first place. We had joined our networks, whereas my goal was to connect with people. It wasn’t that kind of a connection.
To be fair, the job site discouraged this practice, although it did praise the value of large networks. On the other hand, I’m skeptical about their motives because they make money by charging people to talk to people outside of their networks, so the site may want to keep the networks from being too inclusive. That also explains why everyone is striving to be invited into everyone else’s network.
The personal relationship site had a different manipulation that was much more a fault of the site. On both sites, I was especially blunt and honest about describing who I am. I was not selling myself as some celebrity waiting to be discovered.
On the relationship site, the first thing I encountered was a helpful app that showed two profiles and asked which one I liked. I interpreted that to be some machine-learning training session to learn my preferences, so I picked the profile that was closer (but not close enough) to what I was looking for. What I didn’t realize is that each person I chose received a message that I liked their profile and that this was an opportunity to open a conversation. I figured this out when I started receiving my own notifications of others liking my profile. They were not liking my profile; they were simply playing that same machine-learning game.
The fatal complaint about the dating site is the same as the one about the job site: they got in the way of my project of searching. The dating site excitedly told me about interested parties who were hundreds of miles away when I’m looking for someone local. The job site told me about interested employers based on some one-word skill when I’m interested in a broader concept of what the work is about. In both cases, they offered only an ability to query data. I don’t think either kind of relationship can be discovered from a data query.
On the job site, despite my decades of experience quickly learning new software languages, some company may indicate interest in me precisely because I had Perl listed as a skill. The dating site recommended excellent matches because we both liked pets and avoided alcohol. In both cases, the query results were valid. The data did match what the query requested. The query proved that the available evidence (data) is not satisfying what I’m looking for. The key evidence is not in their databases.
As an analogy, I can imagine myself getting along just fine with a like-minded person who happens to occasionally drink too much or hates pets. What matters most to me is what never gets into the data: the reality of the person. Likewise, I fantasize that there are companies that might find my personal character to be valuable despite the fact I don’t know a thing about their business. That fantasy is based on the real experience of three separate satisfying jobs that started on exactly those terms.
Finally I realized that there was a fundamental error in the concepts of both sites. This realization initially occurred subconsciously. Perhaps this kind of realization could only have occurred subconsciously because it challenged a rational concept that querying data can be an efficient means to search. My realization was that searching is fundamentally different from querying. In fact, the two concepts may be mutually incompatible. We search for what exists in reality. We query data.
The name of this blog site is hypothesis-discovery, a term I use to describe discovering something that goes beyond the data to realize a new explanation of the data. This discovery is similarly subconscious in origin. The above experience of realizing that querying data is distinct from searching may be an example of how a hypothesis can be discovered. A newly discovered hypothesis is an answer to a previous frustration of not understanding the data.
In earlier posts, I made the point that the closest data gets to reality is what I called bright data. Bright data consists of well-documented and controlled observations. Even bright data is disqualified as a representation of reality because it can only be historical data. No data can describe the current reality. As soon as data is created, reality moves on and leaves the data as history, because data includes a time-stamp that will never be reproducible.
We can query data but data will never be reality. I tried to describe the problem earlier in that people (possibly life in general) will change behaviors based on their awareness of being observed. The observations may tell us what happened in the past. Observations can be matched with earlier observations to see what might happen next. But the entire concept of disruptive markets illustrates the industriousness of people figuring out a way to exploit new opportunities presented by new conditions such as the recognition they are being watched.
Data is not always bright data. As I described in the earlier examples, data can and most likely will be manipulated. A data point that someone likes me doesn’t mean that person even looked at my profile. A data point that someone has 10,000 contacts may mean that not a single one of them has even seen the spelling of that person’s name.
Bright data is rare. Most data is dim, dark, or even unlit (like the offer to sell 1000 Twitter followers). The only thing one can learn from a query is that the query returned plenty of matching data in a reasonable amount of time. Such queries can be as entertaining as computer games.
The concepts of querying data, whether described as Big Data or Internet of Everything, are leading to the same result of distracting human attention away from searching the real world. Instead we are asked to spend our time immersed in the world of a big computer game. The game will be gamed. The real world will leave us behind.