Avatars Can Be Fun
Back in 1972 the game Maze War was developed by three high school students participating in a NASA work/study program aimed at visualizing fluid dynamics for spacecraft. The project morphed into what is considered the world's first first-person shooter, which could be played over ARPANET, the precursor to the internet, and contained the world's first avatar, as seen in Figure 2. Game avatars continued to progress, and when internet chat became a reality in the 1990s, typical 100x100-pixel GIF avatars were used to represent the 'chatter'. Game avatars moved toward greater customization, while other games took a more cartoonish track, such as Nintendo's Mario series.
That said, avatars have progressed considerably and, as noted above, can take on some decidedly human characteristics as more sophisticated shading and lighting become available to the average user. When it comes down to it, though, avatars are obviously not people: they are limited by the number of variable characteristics available and by the ability of software to manipulate those characteristics in a way that emulates a human form. Given that the face has over 24 individual muscles on each side, creating convincing human facial expressions is a daunting task. While computing power continues to increase, much of human emotion resides in facial expressions, which can involve any number, if not all, of those muscles. Not only would they need to move realistically, they would also need to mimic speech, along with gestures and movements that are subtle yet believable; otherwise they contribute to the feeling that the avatar is not real.
We have noted that animators have been able to define human expressions in terms of muscle movements and to translate them into animated characters (https://youtu.be/sCCRBg-byGM), but mapping facial expressions against the complexities of speech is a far harder task. In our note we showed ultra-realistic animations being used either to mimic existing newscasters or to create 'new' TV reporters that are astoundingly real and 'read' copy in real time, complete with facial expressions and body movements that make them hard to tell apart from their human counterparts.
Of course, there is the 'deepfake' crowd, which uses this technology to foment distrust by depicting social or political figures while overlaying speech that was never uttered, a very disturbing trend during a period when we can easily see the effects of misinformation across the internet. Even more disturbing was a recent article in Scientific American, based on a study recently published in the Proceedings of the National Academy of Sciences, which concluded that AI-synthesized faces are indistinguishable from real faces and are considered more trustworthy than real faces. The article describes the use of GANs (Generative Adversarial Networks), which pit two neural networks, a generator and a discriminator, against each other to synthesize an image of a fictitious person.
The generator starts the process with a random group of pixels, and with each iteration the discriminator takes that image and compares it to its database of real faces. If the generator's face differs from the database images, which it does early on, the discriminator penalizes the generator, which keeps trying until the discriminator says the 'new' face looks like the ones in its database. This takes many iterations, but the result is a face that has the characteristics of those in the database, albeit not exactly the same as any one in particular. Such systems are used to fill in parts of photographs or artworks where damage has caused deterioration, to create virtual fashion models that require no photographer or the bevy of service people needed in real life, to develop realistic avatars for games, and in broadcasting to read copy.
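The adversarial loop described above can be sketched with a toy example. The following is a minimal, illustrative sketch only, not a production face synthesizer: here the 'generator' is just two scalar parameters producing one-dimensional samples, and the 'discriminator' is a logistic classifier trying to tell those samples apart from real ones drawn from a target distribution (standing in for the database of real faces). All names and the 1-D setup are our own illustration; real face-synthesis GANs use deep convolutional networks and millions of images.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Stand-in for the database of real faces: 1-D samples around mean 4.
def sample_real(n):
    return rng.normal(4.0, 1.25, size=n)

# Generator: maps noise z to a sample via g(z) = w_g * z + b_g.
w_g, b_g = 0.1, 0.0
# Discriminator: d(x) = sigmoid(w_d * x + b_d), probability x is real.
w_d, b_d = 0.1, 0.0

lr = 0.01
for step in range(2000):
    z = rng.normal(size=32)
    fake = w_g * z + b_g
    real = sample_real(32)

    # Discriminator update: push d(real) toward 1 and d(fake) toward 0
    # (gradients of the binary cross-entropy loss w.r.t. w_d and b_d).
    d_real = sigmoid(w_d * real + b_d)
    d_fake = sigmoid(w_d * fake + b_d)
    grad_w_d = np.mean((d_real - 1.0) * real) + np.mean(d_fake * fake)
    grad_b_d = np.mean(d_real - 1.0) + np.mean(d_fake)
    w_d -= lr * grad_w_d
    b_d -= lr * grad_b_d

    # Generator update: push d(fake) toward 1, i.e. fool the discriminator.
    # Chain rule: dL/dfake = (d_fake - 1) * w_d for the non-saturating loss.
    d_fake = sigmoid(w_d * fake + b_d)
    grad_fake = (d_fake - 1.0) * w_d
    w_g -= lr * np.mean(grad_fake * z)
    b_g -= lr * np.mean(grad_fake)

gen_mean = np.mean(w_g * rng.normal(size=1000) + b_g)
print(f"generator output mean after training: {gen_mean:.2f}")
```

After enough rounds of this push-and-pull, the generator's output distribution drifts toward the real one, which is the same dynamic that, at vastly larger scale, yields faces indistinguishable from the database images.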
However, the study went further, pitting 315 participants against a roster of 128 faces and asking whether each was real or fake. The average accuracy was 48.2%, close to the 50% that would represent pure chance, with Figure 5 showing the faces classified most and least accurately. In a second test, after participants were made aware of rendering artifacts (visual hints) and given general feedback, accuracy improved but remained close to the 50% 'chance' level. Taking the experiment further, the study had 223 participants rate the 'trustworthiness' of the faces on a scale of 1 to 7, and the synthetic faces scored 7.7% higher than the real ones, with women's faces rated 13.3% more trustworthy than men's. The reasoning is that the synthetic faces were closer to the 'norm' than the real faces and therefore seemed more trustworthy.
While the study was oriented toward generating consistent data, it certainly points out the potential for synthetic images to cause confusion and distrust. Avatars are fun, and in most cases are obviously exaggerated representations of characters or humans, but as systems develop that can create realistic images and speech almost impossible for the average person to identify as fake, the potential for misuse increases. There are systems that can identify deepfakes, but relative to the amount of content uploaded to the internet they can only make a small dent, and the eventual expansion to the Metaverse will add to that potential content on a global scale.
We have no problem with the technology being used to create realistic imagery and even 'human-like' figures, but such synthetics must be identified to the public as such, or the eventual result will be complete public distrust of anything not seen in person. If you think fake Facebook accounts that spread misinformation are a scourge now, just imagine images of political figures spouting words never spoken during a campaign, or global leaders pushing aggressive agendas that have nothing to do with politics or détente. In 1971, when the Dramatics released "Whatcha See is Whatcha Get," it was true, but that ship has sailed. If you don't think so, go to this website and watch some of the demo videos, which were made with software on a smartphone.