Hearables and Auditory Virtual and Augmented Reality (2016)

Category: "Hearing Aids"

Hearables that do more than just listen

Posted by on Oct 13 2016 in Hearing Aids, Sensors

Despite public familiarity with digital hearing aids and related sound-processing devices, the initial market for hearable technology seems to be defined less by hearing than by other concerns. A few counterexamples aside (Doppler Labs, for example), many devices appear simply as new form factors for wearable fitness trackers (Bragi Dash, Samsung’s Gear IconX). For those applications, a variety of sensors come into play: accelerometers, heart-rate monitors, etc. But what about devices intended mainly to process sound and augment hearing? What use can they make of non-audio sensors? In this post, I want to explore examples from the research world, where the future of auditory augmentation looks increasingly “multisensory.”

Last time, I wrote about FM systems for hearing aids, and how those systems might be enhanced by room-level monitoring and restoration of auditory spatial cues (see Just around the corner: enhanced FM systems, hearables for concert-goers). Typically, FM systems use radio waves to broadcast the signal of a microphone directly to a listener’s hearing aids. If the microphone is located close to (or worn by) a target talker–a classroom teacher, for example–a tremendous advantage in signal-to-noise ratio can be achieved. The listener hears a clean signal, as if standing very close to the talker.

FM systems have clear advantages in many scenarios, but they are especially limited in situations with multiple talkers of potential interest, such as at a cocktail party. Some FM systems provide two microphones, so that two talkers can transmit on separate channels. Such a system might present a mix of both talkers at all times, or allow the listener to manually select one or the other, or employ some type of auto-switching algorithm (switching to the louder source, perhaps). There are advantages and disadvantages of each approach, but one thing is clear, regardless: part of the time, the system will transmit the “wrong” signal, so that the input fails to match the listener’s goals and/or attentional focus. A more difficult and effortful listening situation is thus created, potentially offsetting the signal-to-noise advantage of the FM system. The situation could be drastically improved if a system could track the listener’s attention in real time, and use that information to present listeners with the most relevant/important sound.

MicUp: Head-controlled multi-channel wireless for hearing aids.

One approach to monitoring listeners’ attention is to keep track of which talkers they are facing. As a conversation evolves and different talkers add to it, most listeners naturally turn their heads back and forth to follow the action. Directional microphones–which selectively amplify sounds arriving from the front–can take advantage of this fact. But directional microphones are not as powerful, in signal-to-noise terms, as FM wireless systems. Scientists Owen Brimijoin and Alan Archer-Boyd, working with the MRC Institute for Hearing Research in Glasgow, Scotland, have implemented a different approach using wireless transmission and simple computer vision. In their system, called “MicUp,” each of several talkers wears a small badge that carries a microphone and an infrared light. Invisible to human eyes, the lights flash in a pattern that can be detected by small infrared cameras worn on the listener’s head. The principle is very similar to that used by Nintendo’s Wii controller. Because each badge uses a unique pattern of light flashes, the camera can “see” which badge(s) the listener is facing toward, and a simple device can then adjust the level of those badges in the mix delivered to the hearing aids. Because the cameras can also see where each badge is, additional processing can provide spatial cues so that sounds appear to come from the correct location. The result combines the advantages of FM systems with rapid tracking of the listener’s attentional focus among multiple talkers. Although numerous challenges remain, such as how to track badges when other objects get in the way, and how to incorporate the cameras into comfortable wearable frames, systems like MicUp suggest how future systems might integrate audio, video, and data signals to provide seamless perceptual experiences. For more information, visit Dr. Brimijoin’s web page at https://www.nottingham.ac.uk/medicine/people/owen.brimijoin or Dr. Archer-Boyd's at http://www-hearing-research.eng.cam.ac.uk/Main/HearingPeople.
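
To make the head-steered mixing idea concrete, here is a minimal sketch in Python. It is not the MicUp implementation: the function name, the 30-degree beam width, the -12 dB floor, and the raised-cosine taper are all assumptions chosen for illustration. It only shows how a badge's angle relative to the listener's head might be turned into a level in the mix.

```python
import math

def badge_gains(badge_azimuths_deg, beam_width_deg=30.0, floor_db=-12.0):
    """Map each badge's azimuth (0 = straight ahead of the listener's head)
    to a linear gain. Hypothetical rule: full level when the badge is faced,
    tapering smoothly to floor_db at beam_width_deg and beyond."""
    gains = {}
    for name, azimuth in badge_azimuths_deg.items():
        # Wrap the angle into [-180, 180) and take its magnitude.
        off_axis = abs((azimuth + 180.0) % 360.0 - 180.0)
        # Raised-cosine taper: 0 dB on-axis, floor_db once off_axis >= beam width.
        fraction = min(off_axis / beam_width_deg, 1.0)
        attenuation_db = floor_db * 0.5 * (1.0 - math.cos(math.pi * fraction))
        gains[name] = 10.0 ** (attenuation_db / 20.0)
    return gains

# Example: the listener faces "anna"; "ben" is slightly off-axis, "carol" far off.
gains = badge_gains({"anna": 0.0, "ben": 350.0, "carol": 90.0})
```

A smooth taper rather than a hard switch is one way to avoid abrupt level jumps as the head turns; whether that matters perceptually is a question this sketch does not answer.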

Visually guided hearing aids.

A related way to identify what listeners are attending to is to consider where they are looking. By monitoring eye position, the direction of gaze can be computed quite accurately. Eyeglass-mounted eye tracking for research is now available from a number of vendors, suggesting that real-time, all-day eye tracking may become available (and affordable) in the near future. Scientists at Boston University, led by Prof. Gerald Kidd, have begun testing a new system that uses eye gaze to control steerable directional hearing aids (Kidd et al. 2013). A head-worn microphone array is used to implement directional audio “beam-forming.” The multiple microphone signals are combined in various ways to alter the directional pattern of microphone sensitivity (ranging from broad to narrow, front to side, etc.). The beam-forming system is controlled by the listener’s eye gaze, so that sound is amplified from wherever the listener is looking. Similarly to MicUp, the Visually Guided Hearing Aid combines auditory and visual information to enhance important over distracting sounds, here in a single wearable package.
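
For readers curious how a beam can be "steered," the sketch below shows delay-and-sum, one textbook form of beam-forming. It is not necessarily the method used in the Visually Guided Hearing Aid. The microphone layout (a left-right line across the head), the plane-wave assumption, the whole-sample delays, and the function names are all simplifications introduced here.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, approximate value for room-temperature air

def steering_delays(mic_positions_m, gaze_azimuth_deg):
    """Per-microphone delays (seconds) that time-align a plane wave arriving
    from gaze_azimuth_deg (0 = straight ahead, positive = to the right).
    mic_positions_m are positions along a left-right line (positive = right)."""
    theta = math.radians(gaze_azimuth_deg)
    # A wave from the right reaches right-hand mics first, so they need more delay.
    raw = [x * math.sin(theta) / SPEED_OF_SOUND for x in mic_positions_m]
    earliest = min(raw)
    return [d - earliest for d in raw]  # shift so every delay is non-negative

def delay_and_sum(signals, delays_s, sample_rate_hz):
    """Delay each microphone signal by a whole number of samples and average.
    Sound from the steered direction adds coherently; other directions do not."""
    n = len(signals[0])
    out = [0.0] * n
    for signal, delay in zip(signals, delays_s):
        shift = round(delay * sample_rate_hz)
        for i in range(shift, n):
            out[i] += signal[i - shift] / len(signals)
    return out
```

In use, the gaze angle reported by the eye tracker would be passed to `steering_delays` each time it changes, and the resulting delays applied to the incoming microphone signals.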

Brain-computer interfaces in hearable devices?

MicUp and the Visually Guided Hearing Aid both aim to enhance sounds arriving from an attended direction, and use overt signals about attention (head and eye orientation) to do so. But human listeners can also pay attention to sounds without turning and looking directly at the target talker (psychologists call this “covert” attention). Could future devices measure attention by some other means, and use that information to augment auditory experience of covertly attended items?

It turns out that auditory responses in the human brain mimic key features of attended sounds. When listeners are presented with two competing speech streams, brainwaves measured with electroencephalography (EEG) or magnetoencephalography (MEG) entrain to the envelope of the attended stream. Computer algorithms can then decode the signals from scalp-attached electrodes and determine which source the listener is attending to (see Ding and Simon, 2012). Currently, this type of “brain reading” requires a great deal of data from multiple sensors. But in the future, decoding algorithms might exploit redundancies to achieve real-time performance with a smaller number of sensors. In fact, compact EEG sensors have already been developed to integrate into wearable and hearable form factors (Looney et al. 2012, Bleichner et al. 2015, Mirkovic et al. 2016). These employ electrodes placed near or inside the ear to make electrical contacts in close proximity to the auditory parts of the human brain. These early studies have demonstrated the technological feasibility of such devices, as well as their sensitivity to auditory brain responses in competing-talker scenarios.
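
The decoding step can be illustrated with a deliberately simplified sketch. It assumes an envelope has already been reconstructed from the neural recording (the hard part, which is skipped here) and simply asks which talker's speech envelope it correlates with best. This is not the decoder used in the cited studies; the function names and the toy envelopes are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def attended_talker(neural_envelope, talker_envelopes):
    """Return the talker whose envelope best matches the neural envelope,
    along with the correlation score for every talker."""
    scores = {name: pearson(neural_envelope, envelope)
              for name, envelope in talker_envelopes.items()}
    return max(scores, key=scores.get), scores

# Toy data: the "neural" envelope is a noisy copy of talker A's envelope.
talkers = {"A": [0, 1, 0, 1, 0, 1, 0, 1], "B": [0, 0, 1, 1, 0, 0, 1, 1]}
neural = [0.1, 0.9, 0.2, 0.8, 0.0, 1.1, 0.1, 0.9]
winner, scores = attended_talker(neural, talkers)
```

A hearable would then raise the level of the winning talker in the mix, much as the head- and gaze-steered systems above do.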

These three examples demonstrate the potential of harnessing information from a wide variety of sensors, sensory modalities, and data channels for auditory devices. Multi-sensor integration will lead to hearable devices and opportunities for auditory augmentation far beyond what could be provided by sound alone.

Thanks to Simon Carlile (Starkey Hearing Technologies) and Owen Brimijoin (MRC IHR) for specific discussions that inspired and led to this post.

Just around the corner: enhanced FM systems, hearables for concert-goers

Posted by on Sep 01 2016 in Uncategorized, Hearing Aids, Music

Last month, I posted (from 20 years in the future) about how the integration of hearable technology, augmented reality, and artificial intelligence might change the way we think about hearing aids and communication disorders. It only takes a bit of reflection to realize that the hearing aids of the future will offer capabilities even normal-hearing users will want to access. Similarly, many of the greatest benefits for impaired listeners may come from technologies developed for other purposes such as auditory telepresence and social communication. Today, I want to look a little closer to the present. What steps in these directions could be taken with today's technology? What applications might lie just around the corner that could benefit hearing aid users, or entertainment-minded listeners? Two exciting but achievable developments come to mind: enhanced FM systems for hearing-aid listening, and hearable applications for concert-goers.

Enhanced FM systems

Today's hearing aids aim to restore or enhance the audibility of target sounds–such as speech–in listeners with reduced auditory sensitivity (hearing loss). Amplification can enhance all sounds equally, or be programmed to enhance quiet sounds more than loud sounds (compression). Amplification can also be directional, amplifying sounds in front of the listener but not to the sides or behind. When a hearing-aid user knows in advance which talker they want to hear, another very powerful option becomes available: the talker can wear a microphone that transmits her speech directly to the hearing aids using FM radio signals. Using an FM system in this way, good audibility can be experienced no matter where the target talker stands in the room–even in the presence of other distracting noises.
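
The difference between uniform amplification and compression can be shown with a small sketch. The 30 dB gain, 50 dB threshold, and 2:1 ratio are illustrative assumptions, not settings from any particular hearing aid, and real devices typically apply such rules separately in several frequency bands.

```python
def compressed_gain_db(input_level_db, gain_db=30.0, threshold_db=50.0, ratio=2.0):
    """Gain (dB) applied to a sound at input_level_db.
    Below the threshold the gain is constant (linear amplification).
    Above it, output grows by only 1/ratio dB per input dB, so quiet
    sounds receive more gain than loud ones."""
    if input_level_db <= threshold_db:
        return gain_db
    return gain_db - (input_level_db - threshold_db) * (1.0 - 1.0 / ratio)

# A quiet 40 dB sound gets the full 30 dB; a loud 70 dB sound gets 20 dB.
quiet_gain = compressed_gain_db(40.0)
loud_gain = compressed_gain_db(70.0)
```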

Imagine an FM system used in a classroom setting. A student with hearing aids might normally experience tremendous difficulty understanding the teacher in a room full of restless kids. But with an FM system in place, the teacher's voice comes through loud and clear, beamed directly to hearing aids in both ears. The sounds of other children are still audible through the mic channels of the hearing aids, but the teacher's voice is heard as if through headphones. It's an invaluable and well-loved approach to giving impaired listeners the information they need to communicate effectively. Modern FM systems can adjust levels automatically and switch between channels tuned to different talkers. As the devices shift to digital audio signals, these capabilities will grow even more.

Despite their many clear benefits, FM systems are not perfect solutions. Some readers might have noticed from my description that the talker's voice is currently delivered to both ears at the same time. That means that the listener's perception is a lot like listening to music over headphones: sound appears in the middle of the head, rather than "out there" at the talker's location. This doesn't seem to be a problem for understanding speech, but it could certainly be a problem for spatial awareness: it may not be clear whether the teacher is instructing students nearby or across the room, or where to look when he requests "Eyes on me!". There is currently a lot of debate about whether this disruption of the natural spatial characteristics could be a problem for the development of spatial hearing. I'm not going to comment on that issue; instead, I'd like to imagine what we would need to build an FM system with more natural localization cues.

We know that the most important cues for sound localization are differences between sounds at the two ears. Specifically, sounds are louder and arrive earlier at the ear nearer to a sound source, giving rise to the so-called interaural level difference (ILD) and the interaural time difference (ITD) cues. An FM system capable of providing these cues would need to make small adjustments to the sound in each ear, and these would need to be updated as the talker moves around the room or the listener turns his head. Assuming that these signal processing steps are performed by a computer and not by the hearing aids (a safe assumption given current technology), that would also require broadcasting a separate signal to each hearing aid (i.e., the FM signal should be in stereo).

So, technically, we require a system that can (1) track the talker's location in the room, (2) track the listener's location and head orientation, and (3) broadcast a stereo signal to the hearing aids. Does such technology exist? Certainly. There are numerous products–at all different price points–designed to track motion and orientation using cameras, electrical signals, gyroscopes, etc. Some use remote cameras and are relatively non-intrusive (e.g., Microsoft Kinect), while others provide more accuracy but require a sensor or target to be worn (e.g., Vicon, Polhemus). The key point is that current motion-capture technology is already suitable for this application. Similarly, stereo broadcasting to hearing aids is also possible, given that many two-channel FM systems are currently in use.

A rudimentary binaural FM system could implement a very simple real-time algorithm to introduce ITD and ILD cues appropriate to the relative positions of talker and listener. These would provide reliable information that, when paired with motor and visual information, might even produce realistic spatial perception. A more advanced system might use head tracking data with recordings of head-related transfer functions in order to provide more realistic 3-D audio cues. Both are established approaches that any modern PC can implement in real time.
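
As a rough sketch of what the "very simple real-time algorithm" might compute, the Python below derives an ITD and ILD from tracked positions. The ITD uses Woodworth's spherical-head approximation; the ILD uses a crude sine rule with an assumed 10 dB maximum, since real ILDs depend strongly on frequency. The head radius, coordinate conventions, and function names are assumptions made for this example.

```python
import math

HEAD_RADIUS_M = 0.0875   # assumed average head radius
SPEED_OF_SOUND = 343.0   # m/s, approximate

def relative_azimuth_deg(listener_xy, head_yaw_deg, talker_xy):
    """Talker direction relative to the listener's nose, in degrees.
    Room angles are counterclockwise from the +x axis, so a positive
    result means the talker is to the listener's left."""
    dx = talker_xy[0] - listener_xy[0]
    dy = talker_xy[1] - listener_xy[1]
    bearing = math.degrees(math.atan2(dy, dx))
    return (bearing - head_yaw_deg + 180.0) % 360.0 - 180.0

def interaural_cues(azimuth_deg, max_ild_db=10.0):
    """Return (itd_seconds, ild_db); positive values favor the left ear."""
    azimuth = math.radians(azimuth_deg)
    # Fold rear angles onto the front: a spherical head cannot tell them apart.
    lateral = math.asin(math.sin(azimuth))
    itd = (HEAD_RADIUS_M / SPEED_OF_SOUND) * (lateral + math.sin(lateral))
    ild = max_ild_db * math.sin(lateral)  # crude, frequency-independent assumption
    return itd, ild

# Listener at the origin facing +y; teacher two meters to the listener's left.
azimuth = relative_azimuth_deg((0.0, 0.0), 90.0, (-2.0, 0.0))
itd, ild = interaural_cues(azimuth)
```

Each time the tracker reports a new talker position or head orientation, the two values would be recomputed and applied as a small delay and level offset between the left and right channels.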

In all likelihood, our FM system would involve several components installed in a room (such as a classroom): at minimum, one or more motion-capture cameras and a PC. Could we also use installed hardware in place of the talker's body-worn microphone? An array of directional microphones embedded in the walls, ceiling, or furniture would be well suited to pick up the talker's voice. The challenge would be knowing which microphones to patch into the FM system, since some will be dominated by other noise sources. Recall, however, that the system is already required to track the talker's position in the room. This information could certainly be used to generate an appropriate mix of microphone signals that capture and isolate the talker's speech with no body-worn microphone at all. Given the right motion-capture software, it should even be feasible to track multiple potential talkers, adjusting the mix dynamically to emphasize them as they speak up.
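
One simple way to turn a tracked talker position into a microphone mix is to weight each installed microphone by its closeness to the talker. The sketch below does only that. It ignores microphone directionality, noise estimates, and time alignment, all of which a real system would need, and the inverse-distance rule and its `sharpness` parameter are assumptions made for illustration.

```python
import math

def mic_mix_weights(mic_positions, talker_xy, sharpness=2.0):
    """Weight each room microphone by inverse distance to the tracked talker.
    Weights are normalized to sum to 1; higher sharpness favors the
    nearest microphone more strongly."""
    raw = []
    for mic_x, mic_y in mic_positions:
        distance = math.hypot(mic_x - talker_xy[0], mic_y - talker_xy[1])
        # Clamp very small distances so a talker at a mic does not divide by zero.
        raw.append(1.0 / max(distance, 0.1) ** sharpness)
    total = sum(raw)
    return [w / total for w in raw]

# Three microphones in a 4 m room; the talker stands near the first one.
weights = mic_mix_weights([(0.0, 0.0), (4.0, 0.0), (0.0, 4.0)], (0.5, 0.5))
```

Tracking several talkers would mean computing one such set of weights per talker and blending them according to who is currently speaking.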

So, how long until an FM listener and his student cohort can walk into a classroom and launch into a discussion, with the room invisibly tracking and adjusting FM signals to provide optimal signal-to-noise ratios and appropriate spatial cues for each talker? Certainly not 15 or 20 years. Each piece of this technology currently exists; it should be a matter of 1-2 years, or an Engineering Master's Thesis, to integrate them.

Hearables for concert-goers

By now, we all know that attending rock concerts without hearing protection is a bad idea. For many of us, that has meant progressing from disposable foam ear plugs (which kill all the high frequencies and make the music sound terrible) to spending $10-$15 on high-fidelity ear plugs with good frequency balance. You might even consider investing (wisely) hundreds of dollars in custom ear plugs, shaped–like hearing aids–precisely to your ears and offering customizable attenuation. It makes a big difference to listen in comfort and safety.

Musicians face a more serious and complicated version of the same problem: they are exposed more frequently–and for longer durations–than casual concert-goers, and they have a critical need to hear their music clearly as they perform. On-stage monitor speakers can present dangerously high levels of sound as the engineers attempt to overcome room and crowd noise while helping musicians hear themselves in the mix.

In-ear monitors have become an increasingly popular solution to this problem for musicians. Custom molded to individual ears, they block outside noise like powerful custom earplugs while their high-quality transducers act like custom earphones. Typically, the monitors receive an audio signal from the on-stage monitor mix, and engineers can adjust each musician's signal to craft an individual mix of all the instruments. The result is that each musician can hear themselves clearly while listening at a much lower level than with on-stage monitors.

As in every other field, the technology for in-ear monitors continues to advance. Monitor systems now transmit and receive wireless signals, with increasing options for "personal mixing systems" that allow each musician, rather than a sound engineer, to adjust their own mix directly. Such systems allow more flexibility in changing the monitor mix from song to song as performance needs change.

Much like hearable technology in general, in-ear monitors reflect the convergence of several technologies drawn from hearing aids (custom-molded inserts), earphones, and wireless communication technology. As such technologies continue to converge in hearable gear for the general public, will non-musicians want access to the capabilities that on-stage musicians have now? I asked my colleague, Erick Gallun, what that might look like:

Imagine attending a concert and, instead of slipping your earplugs in and shutting out your party's conversation, you insert your hearables and ~~set them to forward speech from your friends, but not other attendees, until the music starts [oops, getting ahead of ourselves here]~~ select from a number of mixes "published" by the sound engineer: standard front-of-house mix, vocals-heavy mix, front-of-house with crowd cancellation, etc. Or, dial in your own mix from the individual-instrument signals sent to musicians' monitors. Erick confessed that maybe only "music nerds" would want access to those signals. But, we reasoned, if hearables can stand in as hearing protectors (and they should), will they simply go silent to emulate ear plugs? Or should they provide a signal of some sort? And if so, what sort? The concert's own musical program seems the obvious choice.

The technology for this type of custom-mix concert is already available in the form of wireless audio transmitters and in-ear monitoring systems. Heck, a pretty solid demo could probably be built using a PC for local digital "broadcasting" and smartphone apps for the audience members. Would it be compelling enough to actually use? For many current concert-goers, it might not. But for many potential attendees who avoid concerts because they can't hear the band over all the noise, it just might.

One might also wonder how audiences will feel about attending concerts where each person listens through their own devices. Some concert-goers might find the experience socially isolating. Others might find the shared earphone experience to be more intimate. Interested in those issues? They are already being explored by pioneers of the "Silent Disco" movement.

Of course, there will probably be issues of copyright and broadcast licensing once bands start live-streaming to personal devices, but sooner or later an enterprising club or band will conduct the necessary experiments. With current technology, they could develop (and control) compelling new audience experiences. Eventually, though, hearables may become capable of forwarding signals directly to other devices. Imagine pulling up an audio stream from another listener in the first row, or dialing up a mix across potentially hundreds of time- and frequency-calibrated auditory viewpoints, cancelling out the various elements of crowd noise to obtain an ideal "crowd's ear view" of the performance. That type of sharing will open up amazing new possibilities, not just for music but throughout daily life. It will also expose extreme concerns about privacy and ownership of communication. That, however, is a discussion for another day.

Bin-Li: A short story about binaural listening agents and hearing aids of the future.

Posted by on Jul 29 2016 in Hearing Aids, Fiction

I "met" Bin-Li around the time of my 65th birthday, in 2036. I’d had hearing aids before…high-tech hearing aids that amplified the sounds my ears were no longer sensitive to. They had smart algorithms for reducing noise and different modes for focusing on a single conversation versus listening broadly to the world around me. They even had modes that were halfway decent for listening to music. But Bin-Li is different. Bin-Li (my audiologist told me this was short for “Binaural Listener”) is like a computerized agent that listens to sound through my own ears, understands, and remembers the events and conversations that are going on around me. She can even read my brainwaves–in a simple fashion–to help decide which parts I most want to hear and understand.

“Bin-Li, what did he just say?” Sometimes I feel like a broken record, asking Bin-Li to repeat something or recall an earlier part of the conversation. But then I think back to my grandfather, and his struggles with old-fashioned hearing aids. He never seemed to understand anything that was said, and he was always struggling with the volume setting, trying to find a balance where he could pick up someone’s voice without too much extra noise. He never could; instead, he spent most of his time withdrawn from conversations, sitting there with a blank or exasperated look. He was a fiercely intelligent man; you knew he had a lot to say, and that he desperately wanted to be part of the banter, if only he could make it out. Or I think back to my own father, who was constantly asking my mother to repeat what someone had just said. And how exasperated she was, that he never seemed to be paying attention to what she said, or what anyone else said.

Bin-Li’s calm and reassuring voice is never exasperated. She’s always there, close by my shoulder, ready to discreetly repeat or explain a bit of conversation. In response to “What did they say?,” Bin-Li will tell me, “The man on the left asked what restaurant you should visit tonight. The woman on the right responded that she’d had too much Chinese this week; maybe Thai would be better.” In fact, Bin-Li can usually identify each talker by name, and more: “Bin-Li, who is that speaking now?” She’ll reply “That’s Mary Wilson. She works at your daughter’s school, in the office. You met her last year at the Christmas party. She has a son, Jack, and a husband, John.”

Bin-Li is more than just a communication aid; she’s also a memory aid. She experiences my conversations; she can play them back, review them, and can even understand them. She can identify important items and add them to my itinerary or to my contacts. She can interface with my phone and use it, for example, to make restaurant reservations while I’m in a crowded, noisy bar. She can send messages, dictate notes. Many of these are things that my phone could do twenty years ago. But somehow it’s different, having her there with me, all the time. Especially now that it’s become so difficult for me to understand what people are saying around me.

Bin-Li’s voice is produced by two earpieces that seal snugly and comfortably in my ears. But her voice does not appear inside my head, like listening to music over headphones. Not normally, anyway; sometimes I like to have her voice close to my ear, a sort of “inner-voice” that guides me as I move through the world. But more often, I use the standard setting, which makes her appear as if she is in the room with me, just over my left shoulder. When I turn my head, her voice does not move along with it, but stays in the right place just like any other sound in the world. And she always sounds as if she is properly in the room I’m in. It’s hard to explain, but it’s very unlike listening to, say, an audiobook with my old-fashioned stereo earphones (or even modern "binaural" recordings). That always sounded strange and artificial, like a photo inserted haphazardly into a scene with the wrong lighting or camera angle. The result is quite literally "out of place:" a sound that comes from nowhere in particular, inside my head, or just somehow not belonging to the room I’m in. Bin-Li is different. She seems real, tangible. A lot of that, I think, has to do with where she seems to be when she speaks to me. Right there, just beyond my left shoulder. Always, that is, unless she finds someone standing in her place. Then she moves, as naturally as anything, to a different place where I can easily separate her voice from the others.

My old "directional" hearing aids made everything sound like it was in the middle of my head, and mushed together. But with Bin-Li, I hear separated talkers, in separated locations. When I turn my head to look at a talker, I hear that talker in the correct place. Usually, Bin-Li puts the talkers in the places they should be, so that when I look I can see the talkers in the locations I hear them. But Bin-Li can move the sources of sound to make it easier to tell them apart, if I ask her to. The new locations are always totally compelling. Just as with Bin-Li’s own voice, the locations appear fixed when I turn my head, and convincingly in the room.

Last week we went to a noisy jazz club. There was a lot of musical sound in the club–some coming from the band on stage, some coming from the PA speakers (which seemed to be everywhere)–not to mention the important conversation at our table. I asked Bin-Li to “collapse” the music and put it onstage. I’ve read a little about this, and find it extremely interesting. It’s a hard problem, because the sounds in the room–the music, the loudspeakers, the talkers–are mixed in with all kinds of echoes, reverberation, and noise. Bin-Li’s algorithms can sort that out, and in doing so they can figure out which sounds belong to the band, and which to the room itself. Bin-Li recreated the sound of the band, on the stage and with much less extra noise and reverberation–an acoustic experience much more like listening to music on my living-room stereo at home. It was a very pleasant experience, even for this hearing-impaired listener. I could hear the talkers at my table, each in their correct place, and still appreciate the music, which I could even turn toward and focus on when an interesting solo caught my ear.

I’m very thankful for Bin-Li and this new technology that has replaced my hearing aids. My communication is more effective, and I feel more connected to the space and to the people in it, my communication partners. Supplementing my own understanding and my memory for who is talking, Bin-Li makes me feel younger and more engaged.

But I’m not the only person using this technology. In fact, most of the users aren’t even hearing impaired at all. My kids and grandkids also have devices like Bin-Li. They call them “hearables,” an admittedly cutesy name that combines “hearing aids” with “wearable computing”. They use them for different things. Of course, they can use Bin-Li in much the same way I do, to remember conversations, identify people they’ve only met once or twice, to clean up a noisy listening environment. But mostly they use them for socializing with other users. These days, kids and younger adults always seem to be talking to someone who isn’t there. They wander the streets in animated conversations with real people who can’t be seen because they are located someplace else, but with whom they interact in much the same way they would if physically present. I suppose they never get bored or lonely, because their friends are always with them. And their friends can listen through their ears, to experience what’s happening in each other’s environments. I’ve even seen them do this while standing in the same room, at parties. When one of the kids shouts “Hey, you gotta listen to this,” their friends in the room and all around the world who are part of their current conversation can hear (in some kind of realistic sense that I don’t fully understand) what that person is talking about. They can play it back, experience the same space even though they might be on different continents, but most importantly experience the act of close conversation with their friends and colleagues.

Every once in a while, one of the kids calls me up like this. They don’t call it “calling;” they call it something else, but to me it seems like a phone call. There’s a little beep, and then Bin-Li tells me “Your grandson Jeffrey would like to speak to you. Should I add his layer?” When I say “yes,” suddenly it is as if Jeffrey is there in the room. If I closed my eyes, I would have a hard time telling that he isn’t. His voice sounds, just like Bin-Li, to be in the same room with me. When I turn my head, his voice stays in the correct place (just like all the other sound sources Bin-Li renders for me). We have a conversation: we laugh, we talk, we tell jokes. The exasperating thing is that the way kids use this technology, I never know when to hang up. They seem to just leave it on, like a full-time communication channel with each of the people in their lives. I suspect they “mute” the parts of their conversations they don’t want me to hear. Or maybe their version of Bin-Li knows which parts are addressed to me and which are not. Admittedly, I don’t understand this part, but it’s pretty interesting, and it’s really changed the world. People are running around having these “layered” conversations, regardless of their physical proximity.

I suppose we should have seen this technology coming. Twenty years ago, we certainly had earphones that fit in the ears, which people wore almost non-stop for music listening. We had advanced hearing aids that could take in sound, process it, and play the modified sound to the listener. We had the rudiments of artificially intelligent agents, in our phones: voices that we could talk to and make requests of. We had ubiquitous technology; everyone had a phone in their pockets. Now, I talk about my “phone” as if it’s a real thing, but it’s just a tiny function incorporated into Bin-Li. The world has sure changed.

Yes, even twenty years ago, everyone was running around with buds in their ears. The difference is that back then they were isolated. They were isolated from the world around them, and they weren’t really integrated into the world of communication that they were trying to connect to. Some people ran around with “Bluetooth” headsets. They talked to people who weren’t there, much like the kids do today. But the people who weren’t there were simply voices in the ear; they didn’t really belong to the space, in the way that we now take for granted. I can hardly imagine how difficult a conference call with 8-12 people must have been back then.

Today’s technology is pretty amazing, and I can’t wait to see where it goes next. I wish I could have been there twenty years ago, as it was all coming together. As people were finally learning how to exploit spatial hearing to build “binaural listeners” that could understand an auditory space and the talkers in that space, and then to turn that information into realistic and comprehensible auditory scenes for both normal-hearing and hearing-impaired listeners.

People like me, with sensorineural hearing loss, have poor sensitivity to some sound frequencies due to a loss of hair cells in the ear. It’s less of an issue these days than in the past, before the advent of advanced hearing aids. Now we can very reliably amplify the affected frequencies and restore sensitivity. But other people suffer from communication disorders that are more “central” or “cognitive.” For them, the problem isn’t in the ear, it’s in the brain. Some have trouble understanding speech; others have trouble dealing with echoes and reverberation. There’s no quick fix for such people. You can’t just make some sounds louder, but Bin-Li works for them because she does so much more than that. Bin-Li can simplify the sounds to isolate a single talker, if necessary, repeat or explain parts of a conversation, or show them on a visual display. I don’t use a visual display myself, but I’ve seen demos that generate real-time captions even with multiple talkers. So regardless of the nature of the communication disorder, this technology has helped tremendously.

Today, this technology is everywhere: in the audiology clinic, the entertainment industry, and in normal day-to-day activity. I can’t imagine a young person today who would walk around without their “hearables” in place. As one of my grandkids put it recently, “It would be like walking around with your eyes closed.”

-Chris Stecker, Nashville, April 26 2016