AI for CE
Among the most common “AI” tools is the spell checker built into most word processors, an outgrowth of Ralph Gorin’s 1971 SPELL program, which checked the letters in a word against a list of correctly spelled words by comparing single or adjacent letters. Early spell checkers were limited to mainframes, but they migrated quickly to PCs in the early 1980s. These relatively simple systems were based on comparison, which would imply that the more words there are to compare against, the more accurate the checker; in practice, however, it was found that as word lists grew past 100,000 entries the number of misspellings increased.
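To make the comparison idea concrete, here is a minimal sketch of dictionary-based checking in Python. This is our own illustration rather than Gorin’s SPELL; the tiny word list and the single-edit candidate generation are assumptions for the example.

```python
# A toy dictionary-based spell check in the spirit of early checkers:
# flag a word not in the list and suggest entries one edit away.
import string

WORD_LIST = {"their", "there", "daisies", "vase", "guitar", "flute"}  # stand-in dictionary

def single_edits(word):
    """All strings one deletion, transposition, substitution, or insertion away."""
    letters = string.ascii_lowercase
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = {l + r[1:] for l, r in splits if r}
    transposes = {l + r[1] + r[0] + r[2:] for l, r in splits if len(r) > 1}
    replaces = {l + c + r[1:] for l, r in splits if r for c in letters}
    inserts = {l + c + r for l, r in splits for c in letters}
    return deletes | transposes | replaces | inserts

def check(word):
    """Return an empty set if the word looks correct, otherwise dictionary suggestions."""
    return set() if word in WORD_LIST else single_edits(word) & WORD_LIST

print(check("thier"))   # -> {'their'}
print(check("their"))   # -> set(), i.e. accepted as spelled correctly
```

A larger word list makes more typos catchable, but, as noted above, it also multiplies the chances that a misspelling happens to match some obscure dictionary entry.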
The problem facing early spell checkers was that they could not discern context, so they struggled with words that sound the same but have different spellings and different uses, such as ‘there’ and ‘their’. As local computing power and storage increased, systems were designed that could evaluate context by comparing phrases or sentences against grammatical rules embedded in the system, determining the correct spelling of a word from both the word list and the rules. As processing power and storage increased further, data scientists found they could feed in millions of data points that AI systems could ‘look at’ to better understand context, not only for language but for almost anything, leading to the development of AI algorithms that fall into three classes:
Supervised – The most common category, where labeled data is fed to a system and used to train it to predict outcomes as it is given new data. Typical systems use decision trees that test the data using sophisticated ‘if…then’ nodes to move through the decision process until a conclusion is drawn that is based on all of the tagged data the system has seen (a toy sketch of all three classes follows this list). Systems like these are the basis for the tools that have been in the press in recent months, such as those that can ‘paint’ a picture in the style of a particular painter, having been fed digital images of all of that artist’s work. The system can then create an image based on the user’s request using the characteristics it has recognized for that artist. In its simplest form, we asked an AI art generator to paint a picture of daisies in a vase using a generic style, and Figure 1 is the result.
Unsupervised – This category does not use tagged data, so the algorithm must evaluate the relationships between data points without specific definitions, which it typically does by grouping similar data into clusters around central data points and then using a number of methodologies to evaluate the clustered data.
Reinforcement – This category is a bit like training an animal: the system uses an ‘agent’ to perform an action within a certain ‘environment’ and then rewards the agent when the action is completed. The agent uses the reward (positive or negative) to improve the next iteration of the action until the environment ends the process, just as rewarding an animal for a correct response helps the animal understand what the correct response is.
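To make these three classes a bit more concrete, here is a toy sketch in Python. The data, labels, and reward probabilities are made up purely for illustration; we lean on scikit-learn for the supervised and unsupervised pieces and use a bare-bones reward loop to stand in for reinforcement learning.

```python
# Toy illustrations of the three classes of AI algorithms described above.
import numpy as np
from sklearn.tree import DecisionTreeClassifier   # supervised
from sklearn.cluster import KMeans                # unsupervised

# --- Supervised: labeled examples train a decision tree to predict new cases.
X_train = np.array([[150, 0], [170, 1], [140, 0], [165, 1]])   # [weight_g, texture]
y_train = np.array(["apple", "orange", "apple", "orange"])      # the tags (labels)
tree = DecisionTreeClassifier().fit(X_train, y_train)
print(tree.predict([[160, 1]]))          # predicts a label for unseen data

# --- Unsupervised: no tags; the algorithm groups similar points into
#     clusters around central points (the centroids).
X = np.array([[1.0, 1.1], [0.9, 1.0], [8.0, 8.1], [8.2, 7.9]])
kmeans = KMeans(n_clusters=2, n_init=10).fit(X)
print(kmeans.labels_, kmeans.cluster_centers_)   # two clusters, two centroids

# --- Reinforcement: an agent acts in an environment and learns from rewards.
#     Minimal epsilon-greedy loop: two actions, one rewarded more often.
rng = np.random.default_rng(0)
estimates, counts = np.zeros(2), np.zeros(2)
for step in range(200):
    action = rng.integers(2) if rng.random() < 0.1 else int(np.argmax(estimates))
    reward = float(rng.random() < (0.3 if action == 0 else 0.7))  # environment's reward
    counts[action] += 1
    estimates[action] += (reward - estimates[action]) / counts[action]
print(estimates)   # action 1 (rewarded ~70% of the time) ends up valued higher
```

The tree can only predict because it saw labeled examples, KMeans has to discover the two groups on its own, and the agent settles on the better action purely from the rewards the environment hands back.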
Both of the applications mentioned are common in CE products and in many cases are integral to their function, but AI continues to develop and will become even more embedded in CE products and daily life. Having a background in the music industry, we were intrigued by Google’s (GOOG) MusicLM system, a bit less sophisticated than the artistic systems mentioned above but nonetheless amazing in its ability to create music from text. In its simplest application, the words “acoustic guitar” or “flute” generate the following sound clips[1]:
https://google-research.github.io/seanet/musiclm/examples/audio_samples/10s_samples/instruments/acoustic-guitar.wav,
https://google-research.github.io/seanet/musiclm/examples/audio_samples/10s_samples/instruments/flute.wav
but more interesting is how it generates clips from the text “blues”, “West coast Hip-hop”, “East coast Hip-hop”, or “Reggae”
https://google-research.github.io/seanet/musiclm/examples/audio_samples/10s_samples/genres/blues.wav
https://google-research.github.io/seanet/musiclm/examples/audio_samples/10s_samples/genres/west-coast-hip-hop.wav
https://google-research.github.io/seanet/musiclm/examples/audio_samples/10s_samples/genres/east-coast-hip-hop.wav
https://google-research.github.io/seanet/musiclm/examples/audio_samples/10s_samples/genres/reggae.wav
However, things get even better when the system is given more complex text that specifies not a genre but a caption, such as “beach in the Caribbeans” or “escaping prison”,
https://google-research.github.io/seanet/musiclm/examples/audio_samples/10s_samples/places/beach-in-the-caribbeans.wav
https://google-research.github.io/seanet/musiclm/examples/audio_samples/10s_samples/places/escaping-prison.wav
with applications in TV, gaming, and advertising quite obvious. Not only can the system create genre-related or even contextual music, it has also learned about diversity across all forms of music: given the same text prompt “progressive rock guitar solo”, it came up with a number of different guitar solos based on its learned data.
https://google-research.github.io/seanet/musiclm/examples/audio_samples/diversity-samples/progressive-rock-guitar-solo/same-text/0.wav
https://google-research.github.io/seanet/musiclm/examples/audio_samples/diversity-samples/progressive-rock-guitar-solo/same-text/1.wav
https://google-research.github.io/seanet/musiclm/examples/audio_samples/diversity-samples/progressive-rock-guitar-solo/same-text/2.wav
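For convenience, the clips above can also be pulled down for local listening with a short Python sketch; the output folder name and the particular subset of clips shown are our own choices, and the URLs are simply those listed in the text.

```python
# Download a few of the MusicLM sample clips referenced above for local listening.
import os
import urllib.request

BASE = "https://google-research.github.io/seanet/musiclm/examples/audio_samples"
CLIPS = [
    "10s_samples/instruments/acoustic-guitar.wav",
    "10s_samples/instruments/flute.wav",
    "10s_samples/genres/blues.wav",
    "10s_samples/places/beach-in-the-caribbeans.wav",
    "diversity-samples/progressive-rock-guitar-solo/same-text/0.wav",
]

os.makedirs("musiclm_samples", exist_ok=True)
for path in CLIPS:
    out = os.path.join("musiclm_samples", path.replace("/", "_"))
    urllib.request.urlretrieve(f"{BASE}/{path}", out)   # fetch each clip to disk
    print("saved", out)
```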
While this is only a single AI application, there are many, including the famous/infamous ChatGPT and its many clones, that will find their way into CE applications regardless of whether we believe they are useful, helpful, legal, moral, or justified, as they represent ways in which content can be created without the expenses associated with human involvement. We have given up moralizing about the potential ‘disintegration of society’ that AI could cause, since almost every generation has heard the same thing about some technological change (TV, video games, CDs, etc.), and society has found a way to adapt. However, as only a small portion of humankind has the desire to challenge the validity of textual or visual media, it will take some time before we understand how AI will affect the fabric of human creativity. Will it turn us into a society of Eloi[2], allowing AI to keep us docile with a constant supply of curated entertainment, or will it free us from the burdens of everyday life, allowing our creativity to blossom? Honestly, we don’t have a clue, but to paraphrase Bette Davis: “Fasten your seatbelts, it’s going to be a bumpy ride.”
[1] These sound clips reference the Google MusicLM site links for each file. Ctrl+Click should play each individual file.
[2] “The Time Machine” – H.G. Wells