Last month, I posted (from 20 years in the future) about how the integration of hearable technology, augmented reality, and artificial intelligence might change the way we think about hearing aids and communication disorders. It only takes a bit of reflection to realize that the hearing aids of the future will offer capabilities that even normal-hearing users will want to access. Similarly, many of the greatest benefits for impaired listeners may come from technologies developed for other purposes, such as auditory telepresence and social communication. Today, I want to look a little closer to the present. What steps in these directions could be taken with today's technology? What applications might lie just around the corner that could benefit hearing-aid users or entertainment-minded listeners? Two exciting but achievable developments come to mind: enhanced FM systems for hearing-aid listening, and hearable applications for concert-goers.
Enhanced FM systems
Today's hearing aids aim to restore or enhance the audibility of target sounds–such as speech–in listeners with reduced auditory sensitivity (hearing loss). Amplification can enhance all sounds equally, or be programmed to enhance quiet sounds more than loud sounds (compression). Amplification can also be directional, amplifying sounds in front of the listener but not to the sides or behind. When a hearing-aid user knows in advance which talker they want to hear, another very powerful option becomes available: the talker can wear a microphone that transmits her speech directly to the hearing aids using FM radio signals. With an FM system used in this way, the listener enjoys good audibility no matter where the target talker stands in the room–even in the presence of other distracting noises.
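For readers who like to see the idea in code, here is a rough sketch of the "more gain for quiet sounds than loud sounds" idea behind compression. The threshold, ratio, and gain numbers are purely illustrative, not clinical fitting targets.

```python
# Rough sketch of wide dynamic range compression: quiet sounds get more
# gain than loud sounds. Threshold, ratio, and maximum gain are
# illustrative values only, not clinical fitting targets.

def wdrc_gain_db(input_level_db, threshold_db=45.0, ratio=2.0, max_gain_db=30.0):
    """Return hearing-aid gain (dB) for a given input level (dB SPL)."""
    if input_level_db <= threshold_db:
        return max_gain_db                                  # quiet: full gain
    # Above threshold, output grows only 1/ratio dB per input dB,
    # so gain shrinks as the input gets louder.
    gain = max_gain_db - (input_level_db - threshold_db) * (1.0 - 1.0 / ratio)
    return max(gain, 0.0)

for level in (40, 60, 80):
    print(f"{level} dB in -> {wdrc_gain_db(level):.1f} dB of gain")
```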
Imagine an FM system used in a classroom setting. A student with hearing aids might normally experience tremendous difficulty understanding the teacher in a room full of restless kids. But with an FM system in place, the teacher's voice comes through loud and clear, beamed directly to hearing aids in both ears. The sounds of other children are still audible through the hearing aids' own microphones, but the teacher's voice is heard as if through headphones. It's an invaluable and well-loved approach to giving impaired listeners the information they need to communicate effectively. Modern FM systems can adjust levels automatically and switch between channels tuned to different talkers. As the devices shift to digital audio signals, these capabilities will only expand.
Despite their many clear benefits, FM systems are not perfect solutions. Some readers might have noticed from my description that the talker's voice is currently delivered identically to both ears. That means that the listener's perception is a lot like listening to music over headphones: sound appears in the middle of the head, rather than "out there" at the talker's location. This doesn't seem to be a problem for understanding speech, but it could certainly be a problem for spatial awareness: it may not be clear whether the teacher is instructing students nearby or across the room, or where to look when he requests "Eyes on me!". There is currently a lot of debate about whether this disruption of natural spatial cues could be a problem for the development of spatial hearing. I'm not going to comment on that issue; instead, I'd like to imagine what we would need to build an FM system with more natural localization cues.
We know that the most important cues for sound localization are differences between sounds at the two ears. Specifically, sounds are louder and arrive earlier at the ear nearer to a sound source, giving rise to the so-called interaural level difference (ILD) and the interaural time difference (ITD) cues. An FM system capable of providing these cues would need to make small adjustments to the sound in each ear, and these would need to be updated as the talker moves around the room or the listener turns his head. Assuming that these signal processing steps are performed by a computer and not by the hearing aids (a safe assumption given current technology), that would also require broadcasting a separate signal to each hearing aid (i.e., the FM signal should be in stereo).
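To put rough numbers on these cues: the classic Woodworth spherical-head formula approximates ITD from azimuth, and a crude broadband stand-in can do the same for ILD (real ILDs depend strongly on frequency). A quick sketch, with an assumed head radius and a made-up maximum ILD:

```python
import math

HEAD_RADIUS_M = 0.0875   # roughly average adult head radius
SPEED_OF_SOUND = 343.0   # m/s

def itd_seconds(azimuth_deg):
    """Woodworth spherical-head approximation of the interaural time
    difference. Azimuth 0 = straight ahead, 90 = directly to one side."""
    theta = math.radians(azimuth_deg)
    return (HEAD_RADIUS_M / SPEED_OF_SOUND) * (theta + math.sin(theta))

def ild_db(azimuth_deg, max_ild_db=15.0):
    """Very crude broadband stand-in for the interaural level difference.
    Real ILDs vary strongly with frequency (near zero at low frequencies)."""
    return max_ild_db * math.sin(math.radians(azimuth_deg))

print(f"ITD at 90 degrees: {itd_seconds(90) * 1e6:.0f} microseconds")  # about 660
print(f"ILD at 45 degrees: {ild_db(45):.1f} dB")
```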
So, technically, we require a system that can (1) track the talker's location in the room, (2) track the listener's location and head orientation, and (3) broadcast a stereo signal to the hearing aids. Does such technology exist? Certainly. There are numerous products–at all different price points–designed to track motion and orientation using cameras, electromagnetic sensors, gyroscopes, etc. Some use remote cameras and are relatively non-intrusive (e.g., Microsoft Kinect), while others provide more accuracy but require a sensor or target to be worn (e.g., Vicon, Polhemus). The key point is that current motion-capture technology is already suitable for this application. Similarly, stereo broadcasting to hearing aids is also possible, given that many two-channel FM systems are currently in use.
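As a sketch of how requirements (1) and (2) feed the signal processing: given tracked room coordinates and a head-yaw angle, the talker's azimuth relative to the listener's nose is simple trigonometry. The coordinate conventions here are my own assumptions, not those of any particular tracking product.

```python
import math

def talker_azimuth_deg(listener_xy, head_yaw_deg, talker_xy):
    """Azimuth of the talker relative to the listener's facing direction,
    from motion-capture room coordinates. Assumed conventions: the listener
    faces the +y axis when yaw is 0, and positive azimuth means 'to the right'."""
    dx = talker_xy[0] - listener_xy[0]
    dy = talker_xy[1] - listener_xy[1]
    room_bearing = math.degrees(math.atan2(dx, dy))  # bearing in room coordinates
    azimuth = room_bearing - head_yaw_deg            # rotate into the head frame
    return (azimuth + 180.0) % 360.0 - 180.0         # wrap to [-180, 180)

# Talker two meters ahead and one meter to the right, listener facing forward:
print(f"{talker_azimuth_deg((0, 0), 0.0, (1, 2)):.1f} degrees")  # about 26.6
```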
A rudimentary binaural FM system could implement a very simple real-time algorithm to introduce ITD and ILD cues appropriate to the relative positions of talker and listener. These cues would provide reliable directional information that, when paired with motor and visual information, might even produce realistic spatial perception. A more advanced system might use head-tracking data with recordings of head-related transfer functions in order to provide more realistic 3-D audio cues. Both are established approaches that any modern PC can implement in real time.
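Putting the pieces together, the rudimentary version might render each block of the talker's mono FM signal like this, reusing the hypothetical itd_seconds(), ild_db(), and azimuth helpers sketched above. A real implementation would need fractional delays and smoothing across block boundaries, but the core idea fits in a few lines:

```python
import numpy as np

SAMPLE_RATE = 48000  # Hz; an assumed audio rate for this sketch

def spatialize_block(mono_block, azimuth_deg):
    """Render one block of the talker's mono FM signal as a (left, right)
    pair carrying ITD and ILD cues for the given azimuth (positive = right),
    using the itd_seconds() and ild_db() helpers sketched earlier."""
    itd = itd_seconds(abs(azimuth_deg))
    ild = ild_db(abs(azimuth_deg))
    delay = int(round(itd * SAMPLE_RATE))                  # whole-sample ITD only
    far = np.concatenate([np.zeros(delay), mono_block])[:len(mono_block)]
    near_ear = mono_block * 10 ** (+ild / 40.0)            # +ILD/2 dB
    far_ear = far * 10 ** (-ild / 40.0)                    # -ILD/2 dB
    if azimuth_deg >= 0:                                   # talker on the right
        return far_ear, near_ear                           # (left, right)
    return near_ear, far_ear
```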
In all likelihood, our FM system would involve several components installed in a room (such as a classroom): at minimum, one or more motion-capture cameras and a PC. Could we also use installed hardware in place of the talker's body-worn microphone? An array of directional microphones embedded in the walls, ceiling, or furniture would be well suited to pick up the talker's voice. The challenge would be knowing which microphones to patch into the FM system, since some will be dominated by other noise sources. Recall, however, that the system is already required to track the talker's position in the room. This information could certainly be used to generate an appropriate mix of microphone signals that capture and isolate the talker's speech with no body-worn microphone at all. Given the right motion-capture software, it should even be feasible to track multiple potential talkers, adjusting the mix dynamically to emphasize them as they speak up.
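As a toy illustration of that last idea: with the talker's tracked position in hand, a first-pass mix could simply weight the nearest installed microphones and mute the rest. A real system would time-align the channels and beamform; the function and its arguments are invented for the example.

```python
import numpy as np

def mix_room_mics(mic_signals, mic_positions, talker_xy, num_active=3):
    """Crude proximity-weighted mix of installed room microphones: keep the
    few mics nearest the tracked talker position and mute the rest. This only
    captures the 'use the tracking data to pick the right mics' idea."""
    mic_signals = np.asarray(mic_signals)          # shape: (num_mics, block_len)
    dists = np.linalg.norm(np.asarray(mic_positions) - np.asarray(talker_xy), axis=1)
    nearest = np.argsort(dists)[:num_active]
    weights = np.zeros(len(dists))
    weights[nearest] = 1.0 / np.maximum(dists[nearest], 0.5)  # closer = louder
    weights /= weights.sum()
    return weights @ mic_signals                   # one mixed mono block
```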
So, how long until an FM listener and his student cohort can walk into a classroom and launch into a discussion, with the room invisibly tracking and adjusting FM signals to provide optimal signal-to-noise ratios and appropriate spatial cues for each talker? Certainly not 15 or 20 years. Each piece of this technology already exists; integrating them should be a matter of 1-2 years, or one good engineering master's thesis.
Hearables for concert-goers
By now, we all know that attending rock concerts without hearing protection is a bad idea. For many of us, that has meant progressing from disposable foam ear plugs (which kill all the high frequencies and make the music sound terrible) to spending $10-$15 on high-fidelity ear plugs with good frequency balance. You might even consider investing (wisely) hundreds of dollars in custom ear plugs, shaped–like hearing aids–precisely to your ears and offering customizable attenuation. It makes a big difference to listen in comfort and safety.
Musicians face a more serious and complicated version of the same problem: they are exposed more frequently–and for longer durations–than casual concert-goers, and they have a critical need to hear their music clearly as they perform. On-stage monitor speakers can present dangerously high levels of sound as the engineers attempt to overcome room and crowd noise while helping musicians hear themselves in the mix.
In-ear monitors have become an increasingly popular solution to this problem for musicians. Custom molded to individual ears, they block outside noise like powerful custom earplugs while their high-quality transducers act like custom earphones. Typically, the monitors receive an audio signal from the on-stage monitor mix, and engineers can adjust each musician's signal to craft an individual mix of all the instruments. The result is that each musician can hear themselves clearly while listening at a much lower level than with on-stage monitors.
As in every other field, the technology for in-ear monitors continues to advance. Monitor systems now transmit and receive wireless signals, with increasing options for "personal mixing systems" that allow each musician, rather than a sound engineer, to adjust their own mix directly. Such systems allow more flexibility in changing the monitor mix from song to song as performance needs change.
Much like hearable technology in general, in-ear monitors reflect the convergence of several technologies: custom-molded inserts from hearing aids, high-quality transducers from earphones, and wireless communication. As such technologies continue to converge in hearable gear for the general public, will non-musicians want access to the capabilities that on-stage musicians have now? I asked my colleague, Erick Gallun, what that might look like:
Imagine attending a concert and, instead of slipping in your earplugs and shutting out your party's conversation, you insert your hearables and set them to forward speech from your friends, but not other attendees, until the music starts [oops, getting ahead of ourselves here]. Once the music starts, you select from a number of mixes "published" by the sound engineer: standard front-of-house mix, vocals-heavy mix, front-of-house with crowd cancellation, etc. Or you dial in your own mix from the individual-instrument signals sent to the musicians' monitors. Erick confessed that maybe only "music nerds" would want access to those signals. But, we reasoned, if hearables can stand in as hearing protectors (and they should), will they simply go silent to emulate earplugs? Or should they provide a signal of some sort? And if so, what sort? The concert's own musical program seems the obvious choice.
The technology for this type of custom-mix concert is already available in the form of wireless audio transmitters and in-ear monitoring systems. Heck, a pretty solid demo could probably be built using a PC for local digital "broadcasting" and smartphone apps for the audience members. Would it be compelling enough to actually use? For many current concert-goers, it might not. But for many potential attendees who avoid concerts because they can't hear the band over all the noise, it just might.
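To make the "dial in your own mix" idea concrete, here is roughly what the audience-side math amounts to, assuming the venue publishes individual instrument stems. The stem names and gain choices are invented for illustration.

```python
import numpy as np

def personal_mix(stems, gains_db):
    """stems: dict of stem name -> mono numpy array (equal lengths), as the
    venue might publish them. gains_db: the listener's own gain per stem,
    in dB; stems not listed pass through at 0 dB."""
    out = np.zeros(len(next(iter(stems.values()))))
    for name, signal in stems.items():
        out += signal * 10 ** (gains_db.get(name, 0.0) / 20.0)
    return out

# Toy stems standing in for the published channels (names are invented)
rng = np.random.default_rng(0)
stems = {name: rng.standard_normal(48000)
         for name in ("vocals", "guitar", "drums", "crowd")}
more_vocals_less_crowd = personal_mix(stems, {"vocals": 6.0, "crowd": -60.0})
```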
One might also wonder how audiences will feel about attending concerts where each person listens through their own devices. Some concert-goers might find the experience socially isolating. Others might find the shared earphone experience to be more intimate. Interested in those issues? They are already being explored by pioneers of the "Silent Disco" movement.
Of course, there will probably be issues of copyright and broadcast licensing once bands start live-streaming to personal devices, but sooner or later an enterprising club or band will conduct the necessary experiments. With current technology, they could develop (and control) compelling new audience experiences. Eventually, though, hearables may become capable of forwarding signals directly to other devices. Imagine pulling up an audio stream from another listener in the first row, or dialing up a mix across potentially hundreds of time- and frequency-calibrated auditory viewpoints, cancelling out the various elements of crowd noise to obtain an ideal "crowd's ear view" of the performance. That type of sharing will open up amazing new possibilities, not just for music but throughout daily life. It will also raise serious concerns about privacy and the ownership of communication. That, however, is a discussion for another day.