ToF for Audio?
There are a number of systems used to create 3-dimensional effects for typical 2-dimensional audio recordings. When recorded tracks are mixed in a studio, sound engineers can use a variety of ‘tricks’ to make sounds appear to have ‘depth’, and as far back as the early 1970’s, 4-channel consumer audio products became available, although they required a 4-speaker system. As ‘quad’ was never effective enough to justify the added cost of the hardware, such systems disappeared quickly, and over time various techniques were used by hardware manufacturers to enhance typical stereo recordings, particularly for films. One such enhancement was the sub-woofer, which filtered out all frequencies above roughly 200Hz and passed the remaining low-end signals to a specialized amplifier and speaker to bring out the vibrations associated with action movie soundtracks.
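That crossover is simple enough to sketch in a few lines. Below is a minimal, hypothetical Python example (using SciPy; the 200Hz cutoff and 4th-order Butterworth filter are illustrative assumptions, not any product’s actual crossover spec) of deriving a sub-woofer feed from a stereo signal:

```python
import numpy as np
from scipy.signal import butter, sosfilt

def subwoofer_feed(stereo, sample_rate=48000, cutoff_hz=200.0):
    """Sum a stereo signal to mono and keep only content below the
    crossover frequency -- the feed a sub-woofer amplifier would receive.

    `stereo` is a (num_samples, 2) float array; cutoff and filter order
    are illustrative choices, not a specific product's spec.
    """
    mono = stereo.mean(axis=1)                   # bass is summed to one channel
    sos = butter(4, cutoff_hz, btype="lowpass",  # 4th-order Butterworth low-pass
                 fs=sample_rate, output="sos")
    return sosfilt(sos, mono)
```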
As TV technology continued to improve, many found the embedded audio insufficient, and companies like Dolby (DLB), DTS (XPER), THX (DIS), and IMAX (IMAX) began to find ways to enhance audio by creating processing systems that used digital means to separate a stereo feed into a number of components (typically 6, known as 5.1) that maintained a left/right speaker set-up but added a center channel, two rear channels, and the sub-woofer noted above. For years this was sufficient for most, but as TV screens increased in size, 5.1 audio, which tends to be horizontal (as if performed on a stage), no longer corresponded to the fact that images on a large screen might be at the top or bottom rather than always in the middle, and so audio processing companies reasoned that since the audio was already digitized, two more surround speakers could be added, creating a 7.1 format.
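As a concrete illustration of how a stereo feed can be ‘separated’ into more channels, here is a toy sketch of the classic sum/difference matrixing idea; this is in the spirit of early matrix decoders, not Dolby’s or DTS’s actual algorithms:

```python
import numpy as np

def matrix_upmix(left: np.ndarray, right: np.ndarray) -> dict:
    """Derive extra channels from a plain stereo pair using simple
    sum/difference matrixing -- a toy illustration of the separation
    concept, not any vendor's actual decoder.
    """
    center = 0.5 * (left + right)    # content common to both channels
    surround = 0.5 * (left - right)  # content unique to one side
    return {
        "L": left, "R": right,       # front left / right, passed through
        "C": center,                 # dialogue tends to live here
        "Ls": surround,              # a real decoder would also delay and
        "Rs": surround,              # band-limit the surround feed
        "LFE": center,               # would be low-passed (see earlier sketch)
    }
```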
Again, this was still not enough for designers who wanted to create even more realistic audio to match what was happening on large screens, and the idea of ‘objects’ was developed. Object-based audio means that aside from the standard 7.1 locations where sounds can be placed, certain sounds (objects) are given metadata that allows them to be positioned anywhere in the spatial realm and also to move in 3-dimensional space. This means that a sound engineer can assign a particular sound (loosely defined as a track) as an object and move it to match an object moving on the screen, no matter what direction it moves; once assigned and tracked by the engineer, that audio object will always appear to follow the image on the screen.
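A rough sketch of what such positional metadata might look like follows. The field names and normalized room coordinates are our own hypothetical illustration, not the actual schema of Dolby Atmos or any other format:

```python
from dataclasses import dataclass, field

@dataclass
class Keyframe:
    time: float      # seconds into the program
    position: tuple  # (x, y, z) in a normalized room, each axis -1..1

@dataclass
class AudioObject:
    """A track plus positional metadata, rather than a fixed speaker
    assignment. Assumes at least one keyframe."""
    name: str
    keyframes: list = field(default_factory=list)

    def position_at(self, t: float) -> tuple:
        """Linearly interpolate the object's position at time t; a renderer
        would call this every few milliseconds and pan the track to whatever
        speakers (or headphone angles) best approximate that point."""
        ks = self.keyframes
        if t <= ks[0].time:
            return ks[0].position
        for a, b in zip(ks, ks[1:]):
            if a.time <= t <= b.time:
                w = (t - a.time) / (b.time - a.time)
                return tuple((1 - w) * pa + w * pb
                             for pa, pb in zip(a.position, b.position))
        return ks[-1].position

# A helicopter sweeping from rear-left to front-right overhead:
heli = AudioObject("helicopter", [
    Keyframe(0.0, (-1.0, -1.0, 0.5)),
    Keyframe(4.0, (1.0, 1.0, 0.8)),
])
print(heli.position_at(2.0))  # (0.0, 0.0, 0.65) -- halfway along its path
```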
This technology is very effective in theaters and in residential environments where the required number of speakers can be placed, but with the expansion of mobile devices and earbuds, the world of audio plunged back into the dark ages. Early earbuds sounded like tin cans (some still do), and the many hours artists and engineers spent trying to make a recording sound optimal were reduced to a tiny vibrating disc with the frequency response of a 1950’s car radio. Earbuds have improved, at least to a degree, but audio mixed for 5.1 or 7.1 sounded flat without the additional speakers that earbuds could not provide.
Implementations of Dolby Digital+™, one of the most widely used digital audio formats, began to appear in mobile devices, allowing 3D spatial audio to be implemented in headphones, though the result was somewhat limited, and mixes had to be adjusted to compensate for the lack of 5.1 or 7.1 speakers. Apple, however, decided it could do more than just provide relatively expensive earbuds to its users and came up with the idea of ‘Spatial Audio’, which uses the accelerometer and gyroscope found in mobile devices to map the sound field to the user’s head movements, so if the user turns their head to the left, the sound field stays anchored in place rather than turning with them. As always, you need an Apple device to use Spatial Audio (some Beats products also work), and the AirPods must be 3rd generation to use the function, but with the release of iOS 16 Apple has taken the idea further, and this is where ToF comes in.
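Conceptually, the head-tracking step is just a coordinate change: the gyroscope reports the head’s rotation, and the renderer counter-rotates every source by the same amount so it stays fixed in the room. A toy Python sketch follows, with constant-power stereo panning standing in for the real HRTF rendering (Apple does not publish its actual pipeline):

```python
import math

def world_to_head_azimuth(source_az_deg: float, head_yaw_deg: float) -> float:
    """Counter-rotate a source fixed in the room by the head's yaw
    (positive = turning right), wrapping the result to [-180, 180)."""
    return (source_az_deg - head_yaw_deg + 180.0) % 360.0 - 180.0

def pan_gains(azimuth_deg: float) -> tuple:
    """Constant-power stereo pan from a head-relative azimuth; a toy
    stand-in for real HRTF rendering. Clamps to the frontal half-plane."""
    az = max(-90.0, min(90.0, azimuth_deg))
    theta = (az + 90.0) / 180.0 * (math.pi / 2.0)
    return math.cos(theta), math.sin(theta)  # (left gain, right gain)

# Source fixed at +30 degrees (front right); listener turns 30 degrees right:
rel = world_to_head_azimuth(30.0, 30.0)  # -> 0.0, now dead ahead
print(pan_gains(rel))                    # ~ (0.707, 0.707), equal in both ears
```

For example, a source placed at +30° while the gyroscope reports a 30° right head turn lands at 0° relative azimuth, so both ears receive equal gain, just as a real speaker straight ahead would.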
Apple users can use their iPhone to ‘map’ their head and ears. Similar to the Face ID setup, you not only hold the phone in front of your face but also ‘map’ each side so the system can capture your ears and head shape from a number of different angles. Once the process is completed, iOS 16 will remember your ‘head configuration’, so changing AirPods will not mean a new round of head scans. Once you have been scanned, you have ‘Personalized Spatial Audio’, which is said to reduce the audio artifacts that appear when algorithms convert 3-dimensional audio to headphone formats, though the difference is likely to be rather elusive for the average user. That said, the idea of mapping audio to each user is akin to making sure that the speakers in a 5.1 or 7.1 set-up are in the proper locations, and it is certainly a path toward improving audio when using earbuds.
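Under the hood, such personalization comes down to the head-related transfer function (HRTF), the per-listener filter pair describing how the head and outer ears color sound arriving from each direction; scanning your head lets the system pick filters closer to your own anatomy than a generic dummy-head average. A minimal sketch, assuming you already have left and right head-related impulse responses for the desired direction (Apple’s actual pipeline is not public):

```python
import numpy as np

def render_binaural(mono: np.ndarray, hrir_left: np.ndarray,
                    hrir_right: np.ndarray) -> np.ndarray:
    """Place a mono source at a fixed direction by convolving it with a
    pair of head-related impulse responses (HRIRs). Personalization
    amounts to using HRIRs matched to the listener's own head and ears
    instead of generic ones; the HRIR arrays are assumed inputs here."""
    left = np.convolve(mono, hrir_left)
    right = np.convolve(mono, hrir_right)
    return np.stack([left, right], axis=1)  # (num_samples, 2) headphone feed
```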
Having been in the audio engineering business years before entering the financial world, we can appreciate Apple’s pursuit of improving the audio experience, despite the fact that it is done through tiny speakers jammed into your ears. Any system that helps to recreate the subtleties that are so painstakingly added to recorded music is o.k. with us, and we commend Apple for trying, even though we expect the average AirPods user will never look at audio settings or will listen with one bud out. Yes, we are audio purists and elitists who listen only to FLACs when using headphones (not earbuds) and vinyl when listening on speakers, but we appreciate Apple’s efforts to make the earbud experience a bit more realistic.