Despite public familiarity with digital hearing aids and related sound-processing devices, the initial market for hearable technology seems to be defined less by hearing than by other concerns. With a few exceptions (Doppler Labs, for example), many devices are simply new form factors for wearable fitness trackers (Bragi Dash, Samsung’s Gear IconX). For those applications, a variety of sensors come into play: accelerometers, heart-rate monitors, etc. But what about devices intended mainly to process sound and augment hearing? What use can they make of non-audio sensors? In this post, I want to explore examples from the research world, where the future of auditory augmentation looks increasingly “multisensory.”
Last time, I wrote about FM systems for hearing aids, and how those systems might be enhanced by room-level monitoring and restoration of auditory spatial cues (see “Just around the corner: enhanced FM systems, hearables for concert-goers”). Typically, FM systems use radio waves to broadcast the signal of a microphone directly to a listener’s hearing aids. If the microphone is located close to (or worn by) a target talker, such as a classroom teacher, a tremendous advantage in signal-to-noise ratio can be achieved. The listener hears a clean signal, as if standing very close to the talker.
FM systems have clear advantages in many scenarios, but they are especially limited in situations with multiple talkers of potential interest, such as at a cocktail party. Some FM systems provide two microphones, so that two talkers can transmit on separate channels. Such a system might present a mix of both talkers at all times, or allow the listener to manually select one or the other, or employ some type of auto-switching algorithm (switching to the louder source, perhaps). Each approach has advantages and disadvantages, but one thing is clear regardless: part of the time, the system will transmit the “wrong” signal, so that the input fails to match the listener’s goals and/or attentional focus. That mismatch makes listening more difficult and effortful, potentially offsetting the signal-to-noise advantage of the FM system. The situation could be drastically improved if a system could track the listener’s attention in real time, and use that information to present the most relevant sound.
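To make the auto-switching idea concrete, here is a minimal sketch of a “switch to the louder source” rule for a two-microphone system. It is not taken from any actual FM product: the function names, the frame-by-frame RMS comparison, and the `hold_frames` setting (how many consecutive frames the other channel must be louder before the output switches) are all assumptions chosen for illustration.

```python
import math


def rms(frame):
    """Root-mean-square level of one frame of audio samples."""
    return math.sqrt(sum(s * s for s in frame) / len(frame))


def auto_switch(frames_a, frames_b, hold_frames=3):
    """Choose channel "A" or "B" for each frame, following the louder one.

    hold_frames is a hypothetical hysteresis setting: the other channel
    must be louder for this many consecutive frames before we switch.
    """
    current = "A"   # arbitrary starting channel
    streak = 0      # consecutive frames in which the other channel was louder
    choices = []
    for frame_a, frame_b in zip(frames_a, frames_b):
        louder = "A" if rms(frame_a) >= rms(frame_b) else "B"
        streak = streak + 1 if louder != current else 0
        if streak >= hold_frames:
            current, streak = louder, 0
        choices.append(current)
    return choices
```

Even in this toy version, the limitation described above is visible: when talker B takes over, the output stays on channel A for a couple of frames before switching, and nothing in the rule knows which talker the listener actually wants to hear.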
MicUp: Head-controlled multi-channel wireless for hearing aids.
One approach to monitoring listeners’ attention is to keep track of which talkers they are facing. As a conversation evolves and different talkers add to it, most listeners naturally turn their heads back and forth to follow the action. Directional microphones, which selectively amplify sounds arriving from the front, can take advantage of this fact. But directional microphones are not as powerful, in signal-to-noise terms, as FM wireless systems. Scientists Owen Brimijoin and Alan Archer-Boyd, working with the MRC Institute for Hearing Research in Glasgow, Scotland, have implemented a different approach using wireless transmission and simple computer vision. In their system, called “MicUp,” each of several talkers wears a small badge that carries a microphone and an infrared light. Invisible to human eyes, the lights flash in a pattern that can be detected by small infrared cameras worn on the listener’s head. The principle is very similar to that used by Nintendo’s Wii controller. Because each badge uses a unique pattern of light flashes, the camera can “see” which badge(s) the listener is facing, and a simple device can then adjust the level of those badges in the mix delivered to the hearing aids. Because the cameras can also see where each badge is, additional processing can provide spatial cues so that sounds appear to come from the correct location. The result combines the advantages of FM systems with rapid tracking of the listener’s attentional focus among multiple talkers. Although numerous challenges remain, such as how to track badges when other objects get in the way, and how to incorporate the cameras into comfortable wearable frames, systems like MicUp suggest how future systems might integrate audio, video, and data signals to provide seamless perceptual experiences. For more information, visit Dr. Brimijoin’s web page at https://www.nottingham.ac.uk/medicine/people/owen.brimijoin or Dr. Archer-Boyd’s at http://www-hearing-research.eng.cam.ac.uk/Main/HearingPeople.
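The level-adjustment step can be sketched in a few lines. This is only an illustration of the general idea (turn up the badges the listener is facing), not the MicUp implementation: the Gaussian falloff with angle, the `beam_width_deg` and `floor` parameters, and the function names are my own assumptions.

```python
import math


def badge_gains(badge_angles_deg, beam_width_deg=30.0, floor=0.1):
    """Map each badge's angle from straight ahead to a gain in the mix.

    Assumed weighting: a Gaussian falloff with angle, so a badge directly
    in front gets gain 1.0 and badges far off to the side approach `floor`
    (kept above zero so off-axis talkers remain faintly audible).
    """
    gains = {}
    for badge_id, angle in badge_angles_deg.items():
        weight = math.exp(-0.5 * (angle / beam_width_deg) ** 2)
        gains[badge_id] = floor + (1.0 - floor) * weight
    return gains


def mix(frames, gains):
    """Weighted sum of one audio frame per badge (equal-length sample lists)."""
    n = len(next(iter(frames.values())))
    return [sum(gains[b] * frames[b][i] for b in frames) for i in range(n)]
```

With hypothetical badges “alice” at 0 degrees and “bob” at 60 degrees, `badge_gains` gives alice full gain and bob a much lower one; as the listener's head turns and the camera reports new angles, the gains would simply be recomputed for each frame.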
Visually guided hearing aids.
A related way to identify what listeners are attending to is to consider where they are looking. By monitoring eye position, the direction of gaze can be computed quite accurately. Eyeglass-mounted eye tracking for research is now available from a number of vendors, suggesting that real-time, all-day eye tracking may become available (and affordable) in the near future. Scientists at Boston University, led by Prof. Gerald Kidd, have begun testing a new system that uses eye gaze to control steerable directional hearing aids (Kidd et al. 2013). A head-worn microphone array is used to implement directional audio “beam-forming.” The multiple microphone signals are combined in various ways to alter the directional pattern of microphone sensitivity (ranging from broad to narrow, front to side, etc.). The beam-forming system is controlled by the listener’s eye gaze, so that sound is amplified from wherever the listener is looking. Like MicUp, the Visually Guided Hearing Aid combines auditory and visual information to enhance important sounds over distracting ones, here in a single wearable package.
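For readers curious about how microphone signals can be “combined” to point a beam, the simplest textbook method is delay-and-sum: delay each microphone so that sound from the chosen direction lines up, then average. The sketch below uses that method with whole-sample delays and a straight-line array; I am not claiming it is the algorithm used in the Visually Guided Hearing Aid, and the function names and array geometry are assumptions for illustration.

```python
import math

SPEED_OF_SOUND = 343.0  # metres per second, approximate value for room air


def steering_delays(mic_x_m, gaze_deg, fs):
    """Whole-sample delays that align a plane wave arriving from gaze_deg.

    mic_x_m: microphone positions (metres) along a straight line.
    gaze_deg: look direction, 0 = straight ahead, positive toward +x.
    fs: sample rate in Hz.
    """
    sin_theta = math.sin(math.radians(gaze_deg))
    # Mics nearer the source hear the wave earlier, so they need more delay.
    raw = [x * sin_theta / SPEED_OF_SOUND * fs for x in mic_x_m]
    base = min(raw)  # shift so that every delay is non-negative
    return [round(r - base) for r in raw]


def delay_and_sum(mic_signals, delays):
    """Delay each microphone signal by its sample count, then average."""
    n = len(mic_signals[0])
    out = [0.0] * n
    for signal, delay in zip(mic_signals, delays):
        for i in range(delay, n):
            out[i] += signal[i - delay]
    return [value / len(mic_signals) for value in out]
```

In a gaze-controlled system, the eye tracker's current angle would be passed in as `gaze_deg`, and the delays recomputed whenever the listener looks somewhere new. Sound from the look direction adds up coherently, while sound from other directions is smeared across samples and partly cancels.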
Brain-computer interfaces in hearable devices?
MicUp and the Visually Guided Hearing Aid both aim to enhance sounds arriving from an attended direction, and use overt signals about attention (head and eye orientation) to do so. But human listeners can also pay attention to sounds without turning and looking directly at the target talker (psychologists call this “covert” attention). Could future devices measure attention by some other means, and use that information to augment the auditory experience of covertly attended sounds?
It turns out that auditory responses in the human brain mimic key features of attended sounds. When listeners are presented with two competing speech streams, brainwaves measured with electroencephalography (EEG) or magnetoencephalography (MEG) entrain to the envelope of the attended stream. Computer algorithms can then decode the signals from scalp-attached electrodes and determine which source the listener is attending to (see Ding and Simon, 2012). Currently, this type of “brain reading” requires a great deal of data from multiple sensors. But in the future, decoding algorithms might exploit redundancies to achieve real-time performance with a smaller number of sensors. In fact, compact EEG sensors have already been developed to integrate into wearable and hearable form factors (Looney et al. 2012, Bleichner et al. 2015, Mirkovic et al. 2016). These employ electrodes placed near or inside the ear to make electrical contacts in close proximity to the auditory parts of the human brain. These early studies have demonstrated the technological feasibility of such devices, as well as their sensitivity to auditory brain responses in competing-talker scenarios.
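The decoding step can be illustrated with a deliberately simplified sketch. It assumes we already have a single “neural envelope” time series estimated from the EEG (in published work, obtaining that estimate is the hard part and typically involves decoders trained on many channels) and simply asks which talker's speech envelope it correlates with best. The function names and the correlation-only decision rule are my assumptions, not the method of the cited studies.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0.0 or sd_y == 0.0:
        return 0.0  # a flat signal carries no envelope information
    return cov / (sd_x * sd_y)


def decode_attention(neural_envelope, talker_envelopes):
    """Guess the attended talker as the one whose speech envelope
    correlates most strongly with the EEG-derived envelope.

    Returns (best_talker_name, per-talker correlation scores).
    """
    scores = {name: pearson(neural_envelope, envelope)
              for name, envelope in talker_envelopes.items()}
    return max(scores, key=scores.get), scores
```

A hearable could, in principle, feed the winning talker's name back into a mixing stage to turn that talker up. How much data such a decision needs, and how reliable it is with only a few in-ear electrodes, are exactly the open questions the studies above are beginning to address.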
These three examples demonstrate the potential of harnessing information from a wide variety of sensors, sensory modalities, and data channels for auditory devices. Multi-sensor integration will lead to hearable devices and opportunities for auditory augmentation far beyond what could be provided by sound alone.