Broader questions concerning assisted spelling


This post follows up on my last post about a video interview with a non-speaking person who communicates through a keyboard with his father's encouragement. In the earlier post, I briefly mentioned the criticism of the spell-to-communicate approach. That criticism is that the resulting communication actually comes from the facilitator, perhaps subconsciously: the facilitator completes a thought and then encourages the subject to spell out the thought that is in the facilitator's mind.

As I watched the video, and after giving it the benefit of the doubt that the presentation was not deceptive, I found this particular example believable. The person with severe motor impairments is actually very intelligent but needs a communication tool that he can use. At the same time, I am not willing to defend my conviction to others who may object that the communication is not coming from the disabled person. For one thing, this is a fulfillment of the father's earlier wishes, and he is clearly happy with the result. To get there, both he and his son needed to be trained in how to collaborate in the communication. The father's role may be only to encourage the son regularly to keep spelling, but that still puts the father into the communication.

Again, I think this demonstration was genuine, but it is also easy to imagine someone deceiving themselves into believing they are communicating with their son or daughter when they are actually just communicating with themselves. Take, for example, the description where the son was given an open-ended question asking for his opinion, and he answered in a surprising and clever way. The surprise factor convinces the parent that communication occurred.

I think about my writing in this blog. Frequently, I will write something that surprises me. Many of my blog posts end with an observation or conjecture that I did not have in mind when I started to write. That observation can even seem foreign to me. Often I can't believe it came from my own mind; it was so unexpected. Despite that surprise, I do not claim that the blog is communicating with me.

Actually, I have made that claim in the past. I enjoy writing long-form so much because it feels like I am talking to a different person with his own independent thinking, knowledge, wisdom, and cognition. When I was much younger, I described this experience as if my hands had a different intelligence than my mind. My mind would come up with a thought, but by the time it reached my fingers, what came out was a counterargument. It is kind of like what just happened here: I argued against my own case. I can be convinced that I'm talking to an intelligence that happens to reside in my hands or fingertips. Consequently, it is not surprising that I can be convinced that the autistic person is the one actually communicating.

For this blog post, I want to describe a broader problem with spell-to-communicate. In recent decades, there has been an explosion of interpersonal communication facilitated by the Internet. This communication has connected people in different parts of the world, people who otherwise would never have communicated with each other. This communication is largely through text. Even now, with the abundance of videos and podcasts, many of these are read from scripts, or they are free-form conversations about something that was written. My point is that modern communication relies more heavily on written words than it did before the modern Internet.

Written words are important for search engines and for translation services. Technology is getting better at converting speech into text and at translating spoken words from one language to another. Even with that technology, the originator of a thought will want to capture it in written words if he wants to maximize the chance that someone else will find it through a search engine or quote it accurately in their own writing.

While living under the current pandemic rules, I played with the idea of using a text-to-speech utility on my phone. I reasoned very early that this virus appears to spread when people talk to each other. If everyone used a text-to-speech utility on their phones, then maybe that would slow the spread.

It is slow to type something, so I sought out a utility that would save particular questions, requests, or responses so that I could pull them up easily. I was thinking about using it first for a trip to the grocery store, where I mainly talk to the person behind the deli counter and the person behind the fresh fish counter. I knew what I wanted in advance, so I could save the specific request on the phone, and then when I approached the counter, I would just press a button and the phone would speak what I wanted.

I never actually did that, but I toyed with the idea.

On my most recent trip to the store, the deli counter was not staffed by the usual workers, and the person who served me did not understand my language very well. Even though he repeated what I said, he was unsure how much I wanted by weight. He fished in his pocket and pulled out large wads of paper, shuffled through them until he found a hand-written list of different weights, and asked me to point to the one I wanted. The list was in English, but I think he recognized it by the number: I pointed to the eighth option in his list. With that information, he filled my order exactly.

Going back to my idea, I would not even need to actually press the text-to-speech button. I could just show them the text on the phone. The real lesson is that I didn't even need a phone. All I needed was a piece of paper with my request written out.

If I am going to write out what I want in advance and then show that list to someone at the store, I might as well just use one of those online delivery services. I do not use them because I do not trust them with the fresh items I buy. I want to see what the specific item looks like before I buy it. This is especially true for fresh bread, fish, vegetables, and fruit. Even if I have my heart set on something, I won't buy it if it does not look good. I also won't buy something if it is the last item of its kind available, even if it looks OK. It will be a long while before I start using grocery delivery services.

Setting aside the brilliance of writing things out by hand on pieces of paper, the other approaches involve typing into a computer. The modern user interface for typing, especially on an on-screen keyboard such as a phone's, has the computer offer suggestions based on the initial letters you have typed. If the suggested word is correct, you can tap on it and save the time of typing out the rest of the word.

This kind of assistive technology is available to everyone. We can pick suggested words instead of typing them out ourselves. When we do so, are the words truly our own?

Think back to the video about the spell-to-communicate approach. In the examples, the facilitator patiently waits and encourages the writer to finish spelling the entire word, even though the facilitator has probably figured out what the word will be after a couple of letters. This is very similar to the automated word suggestions we all have, except that here the writer is required to spell out the entire word. This provides some confirmation that the word is actually his own instead of the facilitator's, although there is still some doubt about how much is truly his own.

In both examples, the human facilitator and the computer's suggested words, the reader is thinking much faster than the writer. When there is such a disparity in speed, the presumed owner of a word should be whoever expressed it first.

A common experience, even in spoken conversation, is the listener finishing the speaker's thought. There is a frustration in having to wait for the speaker to finish saying what you already know he is going to say. Speaking in person is more tolerable because we can look for additional clues in body language, or we can interject non-verbally with body language of our own. The frustration is amplified on phone calls, and especially on multi-party conference calls. You know where the conversation is going, but you have to wait for it to reach its inevitable conclusion before moving on to something else.

This conference call experience may be an analogy for what is happening inside the computer. It is probably watching me type right now and has already figured out what the sentence will be after I typed the first word. It has to wait patiently for me to type it all out instead of filling it in automatically.

I imagine my mind being like that patient listener on the conference call. It is already on the next thought while it waits for the fingers to finish typing its earlier thought. There is no brain in the hand. If the mind is thinking a different thought from the one that started the sentence, then the middle of the sentence becomes muddled by the combination of the old and new thoughts.

I assert that this kind of race condition between the reader and the writer does not happen in spoken conversations, especially in person. In conversations, and even in arguments, there are rarely long periods of soliloquy. Speakers trade turns after a sentence, or at most a few sentences. Because the listener and speaker are operating at the same speed, we have more confidence that the speaker solely and completely owns his words. It is very unlikely that the words came from the listener through some subtle cues. Both listener and speaker are struggling to keep up with each other. Neither is idle long enough to complete the thought of the other.

I don't have a reference, but I recall a statement, I think from Plato's time, describing a public suspicion of people who only wrote and did not speak. I think this came from the sophists. They would seriously consider only those cases that were presented live and verbally, and even then, the speaker should not be reading from a prompter or obviously repeating something memorized. The more trusted speaker is the one who chooses his words at the moment of speaking them. He obviously has a broader point to make, but he makes that point in real time instead of having prepared it in advance.

I think this responds to the same issue I raised, the need for the speaker and listener to operate at the same speed. If someone recites a prepared speech from memory or from a prompter, there is a predictability to the prose. When the listener regularly guesses what the speaker is going to say next, the listener can question whether the thoughts truly come from the speaker. The same person can be both the speaker and the script-writer, but the spoken words are the script-writer's and not the speaker's. The spoken words are also historic rather than contemporary.

Earlier in my career, I assisted someone who had to give an early morning briefing to the executives. I recall that briefing being at around 5:30 in the morning. In any case, I was not there myself. I assisted by working late to pull together the information for the next day. The report was to summarize what happened on the previous day, and luckily we measured the day in universal time, so the day ended around 8 PM local time. I had time to prepare my inputs before I went to bed. The point here is that the presenter had to give the briefing in person at that early hour. If the executives learned that something had happened between 8 PM and the time of the briefing, they would ask about it, and they would expect the presenter to have some kind of response. My answer to that was a dynamic dashboard that automatically summarized the previous day and also provided at least an automated assessment of what had happened more recently. This is common practice now, but it was innovative for that audience at that time.

We still have this suspicion of previously prepared content. We want words that are currently on the speaker’s mind. In my example, the dashboard was for the speaker, not the executives. The speaker had the benefit of having the more recent information. He had the opportunity to vocalize something relevant about recent events.

I believe this is what mattered most to the audience. They had confidence in the speaker because he was able to answer something he had no time to prepare for. Also, that answer would be one the listeners had no time to predict.

I am increasingly suspicious of written words, especially when they are presented as a representation of a conversation. This blog site falls into that category. I am suspicious of my own writing. That is why I describe this blog site as a trash can for my thoughts.

At least so far, this site does not offer word or sentence completion suggestions. I would not be surprised if that is coming. It is necessary for people using phones or tablets without physical keyboards.

Word and phrase completion also appears in search engines. You start typing a few letters and it starts to offer suggestions. I do take advantage of it to complete a word, then start typing the next word and pick the suggestion for that one too. At first, it was surprising how quickly it guessed what I was about to search for.

Then I learned that it was more complicated than just searching a dictionary for words starting with the same letters. The suggestions are based on popular searches that start with the same letters. With that knowledge, the experience changed. I would start typing something I actually wanted, but I would pay attention to the suggestions because they were popular topics for some reason. Often I would select the topic because I was curious why it was so popular. My point here is that this action was not prompted by a thought that occurred to me. The thought came from the search engine. It is something people are talking about that happens to correspond somehow to what I started typing. I clicked on the suggestion because I was curious, but I was not curious before the suggestion appeared.
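To make that distinction concrete, here is a minimal Python sketch contrasting two ways a suggestion list could be built from the same typed letters: alphabetical matches from a word list, and matches ranked by how often other people searched for them. The word list, the query log, and the function names are hypothetical, invented only for illustration; real search engines use far more elaborate ranking than this.

```python
from collections import Counter

# Hypothetical word list and query log, invented purely for illustration.
DICTIONARY = ["weather", "web", "webinar", "wedding", "weight"]
QUERY_LOG = [
    "weather", "wedding", "wedding", "webinar",
    "wedding", "weather", "wedding",
]


def dictionary_suggestions(prefix, words, limit=3):
    """Suggest words that start with the prefix, in alphabetical order."""
    return sorted(w for w in words if w.startswith(prefix))[:limit]


def popular_suggestions(prefix, log, limit=3):
    """Suggest past queries that start with the prefix, most frequent first."""
    counts = Counter(q for q in log if q.startswith(prefix))
    return [q for q, _ in counts.most_common(limit)]


if __name__ == "__main__":
    print(dictionary_suggestions("we", DICTIONARY))  # ['weather', 'web', 'webinar']
    print(popular_suggestions("we", QUERY_LOG))      # ['wedding', 'weather', 'webinar']
```

With the same two letters typed, the dictionary version offers words in alphabetical order, while the popularity version puts "wedding" first simply because it appears most often in the made-up log. The second list reflects what other people typed, not what I had in mind.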

In everyday use, this assistive technology can be very manipulative. In the search engine case, it drew me into some current event I otherwise would have avoided. It also steered me toward a particular perspective on the event, usually one intended to provoke an emotional response. I would then bring this up in my next conversation with someone. The idea would not be my own.

Another area where assistive or predictive spelling occurs is software programming, where the tools are fluent in the programming language. When you start typing, these tools provide a list of valid commands or options for the precise context where you are writing. This greatly reduces the syntax and compilation errors that once made programming so annoying. Because I enjoy typing so much, I will often type over the suggestion even though what I type is letter-for-letter identical. Even so, I welcomed this technology, with the naïve expectation that it would make programming more approachable for those who shun it. They don't even need to start typing; they can use the search feature to find and insert the commands without being responsible for spelling them out correctly. In this case, the computer is doing the actual programming or coding. Ultimately, this technology could automate coding so completely that the only thing asked of the author is his intention. Even that may be predicted.

I feel that a lot of the recent political escalation stems from how much of our conversation has shifted from interactive verbal communication to scripted communication. The recorded messages in blogs, micro-blogs, podcasts, or videos are delayed and are often drawn out to the extent that the listener can guess what is going to be said. Because he can guess what is coming, it is reasonable for him to assume he has heard it before, and that it was not an original thought of the speaker. The speaker may be repeating what was already said elsewhere. Usually that assumption would be correct. Ultimately, there is a loss of faith in the speaker when his thoughts, or even his exact words, can be predicted.

Compounding the problem is the encouragement for the listener or reader to complete the author's thoughts and skip ahead or move to another topic. The reader is convinced he knows what the rest of the thought will be. I have been guilty of this myself. I will start reading or listening to something and then conclude that there is nothing to gain by spending any more time on it. Sometimes I later learned that I had misjudged, that something unique did come up, and that it could have changed my mind on something.

Technology is teaching us that our thoughts are predictable. Technology is conditioning us to predict what others have to say. The consequence is that we are losing the patience we once had to hear or read the entirety of what a thinker has to say. Because we trust our predictions, every topic becomes polarized into binary extremes. The only predictions available are that the speaker is either on my side or on the other side. In both cases, I know what he is going to say, so why waste time listening?

The problem posed by the spell-to-communicate approach for autism is a broader problem for all of us. We are all spelling to communicate in an increasingly large proportion of our communications with others. For all of us, this introduces doubt that the thoughts and ideas are truly our own. The slowness of spelling things out gives someone or something the opportunity to insert their thoughts into our consciousness. As that happens, we lose the diversity of thought we once enjoyed when we relied much more on in-person verbal conversations.
