Hearables and Auditory Virtual and Augmented Reality (2016)

Bin-Li: A short story about binaural listening agents and hearing aids of the future.

Posted on Jul 29 2016 in Hearing Aids, Fiction

I "met" Bin-Li around the time of my 65th birthday, in 2036. I’d had hearing aids before…high-tech hearing aids that amplified the sounds my ears were no longer sensitive to. They had smart algorithms for reducing noise and different modes for focusing on a single conversation versus listening broadly to the world around me. They even had modes that were halfway decent for listening to music. But Bin-Li is different. Bin-Li (my audiologist told me this was short for “Binaural Listener”) is like a computerized agent that listens to sound through my own ears, understands, and remembers the events and conversations that are going on around me. She can even read my brainwaves–in a simple fashion–to help decide which parts I most want to hear and understand.

“Bin-Li, what did he just say?” Sometimes I feel like a broken record, asking Bin-Li to repeat something or recall an earlier part of the conversation. But then I think back to my grandfather, and his struggles with old-fashioned hearing aids. He never seemed to understand anything that was said, and he was always struggling with the volume setting, trying to find a balance where he could pick up someone’s voice without too much extra noise. He never could; instead, he spent most of his time withdrawn from conversations, sitting there with a blank or exasperated look. He was a fiercely intelligent man; you knew he had a lot to say, and that he desperately wanted to be part of the banter, if only he could make it out. Or I think back to my own father, who was constantly asking my mother to repeat what someone had just said. And how exasperated she was, that he never seemed to be paying attention to what she said, or what anyone else said.

Bin-Li’s calm and reassuring voice is never exasperated. She’s always there, close by my shoulder, ready to discreetly repeat or explain a bit of conversation. In response to “What did they say?,” Bin-Li will tell me, “The man on the left asked what restaurant you should visit tonight. The woman on the right responded that she’d had too much Chinese this week; maybe Thai would be better.” In fact, Bin-Li can usually identify each talker by name, and more: “Bin-Li, who is that speaking now?” She’ll reply “That’s Mary Wilson. She works at your daughter’s school, in the office. You met her last year at the Christmas party. She has a son, Jack, and a husband, John.”

Bin-Li is more than just a communication aid; she’s also a memory aid. She experiences my conversations; she can play them back, review them, and can even understand them. She can identify important items and add them to my itinerary or to my contacts. She can interface with my phone and use it, for example, to make restaurant reservations while I’m in a crowded, noisy bar. She can send messages, dictate notes. Many of these are things that my phone could do twenty years ago. But somehow it’s different, having her there with me, all the time. Especially now that it’s become so difficult for me to understand what people are saying around me.

Bin-Li’s voice is produced by two earpieces that seal snugly and comfortably in my ears. But her voice does not appear inside my head, like listening to music over headphones. Not normally, anyway; sometimes I like to have her voice close to my ear, a sort of “inner-voice” that guides me as I move through the world. But more often, I use the standard setting, which makes her appear as if she is in the room with me, just over my left shoulder. When I turn my head, her voice does not move along with it, but stays in the right place just like any other sound in the world. And she always sounds as if she is properly in the room I’m in. It’s hard to explain, but it’s very unlike listening to, say, an audiobook with my old-fashioned stereo earphones (or even modern "binaural" recordings). That always sounded strange and artificial, like a photo inserted haphazardly into a scene with the wrong lighting or camera angle. The result is quite literally "out of place": a sound that comes from nowhere in particular, inside my head, or just somehow not belonging to the room I’m in. Bin-Li is different. She seems real, tangible. A lot of that, I think, has to do with where she seems to be when she speaks to me. Right there, just beyond my left shoulder. Always, that is, unless she finds someone standing in her place. Then she moves, as naturally as anything, to a different place where I can easily separate her voice from the others.

My old "directional" hearing aids made everything sound like it was in the middle of my head, and mushed together. But with Bin-Li, I hear separated talkers, in separated locations. When I turn my head to look at a talker, I hear that talker in the correct place. Usually, Bin-Li puts the talkers in the places they should be, so that when I look I can see the talkers in the locations I hear them. But Bin-Li can move the sources of sound to make it easier to tell them apart, if I ask her to. The new locations are always totally compelling. Just as with Bin-Li’s own voice, the locations appear fixed when I turn my head, and convincingly in the room.

Last week we went to a noisy jazz club. There was a lot of musical sound in the club–some coming from the band on stage, some coming from the PA speakers (which seemed to be everywhere)–not to mention the important conversation at our table. I asked Bin-Li to “collapse” the music and put it onstage. I’ve read a little about this, and find it extremely interesting. It’s a hard problem, because the sounds in the room–the music, the loudspeakers, the talkers–are mixed in with all kinds of echoes, reverberation, and noise. Bin-Li’s algorithms can sort that out, and in doing so they can figure out which sounds belong to the band, and which to the room itself. Bin-Li recreated the sound of the band, on the stage and with much less extra noise and reverberation–an acoustic experience much more like listening to music on my living-room stereo at home. It was a very pleasant experience, even for this hearing-impaired listener. I could hear the talkers at my table, each in their correct place, and still appreciate the music, which I could even turn toward and focus on when an interesting solo caught my ear.

I’m very thankful for Bin-Li and this new technology that has replaced my hearing aids. My communication is more effective, and I feel more connected to the space and to the people in it, my communication partners. Supplementing my own understanding and my memory for who is talking, Bin-Li makes me feel younger and more engaged.

But I’m not the only person using this technology. In fact, most of the users aren’t even hearing impaired at all. My kids and grandkids also have devices like Bin-Li. They call them “hearables,” an admittedly cutesy name that combines “hearing aids” with “wearable computing”. They use them for different things. Of course, they can use Bin-Li in much the same way I do, to remember conversations, identify people they’ve only met once or twice, to clean up a noisy listening environment. But mostly they use them for socializing with other users. These days, kids and younger adults always seem to be talking to someone who isn’t there. They wander the streets in animated conversations with real people who can’t be seen because they are located someplace else, but with whom they interact in much the same way they would if physically present. I suppose they never get bored or lonely, because their friends are always with them. And their friends can listen through their ears, to experience what’s happening in each other’s environments. I’ve even seen them do this while standing in the same room, at parties. When one of the kids shouts “Hey, you gotta listen to this,” their friends in the room and all around the world who are part of their current conversation can hear (in some kind of realistic sense that I don’t fully understand) what that person is talking about. They can play it back, experience the same space even though they might be on different continents, but most importantly experience the act of close conversation with their friends and colleagues.

Every once in a while, one of the kids calls me up like this. They don’t call it “calling”; they call it something else, but to me it seems like a phone call. There’s a little beep, and then Bin-Li tells me “Your grandson Jeffrey would like to speak to you. Should I add his layer?” When I say “yes,” suddenly it is as if Jeffrey is there in the room. If I closed my eyes, I would have a hard time telling that he isn’t. His voice, just like Bin-Li’s, sounds as if it is in the same room with me. When I turn my head, his voice stays in the correct place (just like all the other sound sources Bin-Li renders for me). We have a conversation: we laugh, we talk, we tell jokes. The exasperating thing is that the way kids use this technology, I never know when to hang up. They seem to just leave it on, like a full-time communication channel with each of the people in their lives. I suspect they “mute” the parts of their conversations they don’t want me to hear. Or maybe their version of Bin-Li knows which parts are addressed to me and which are not. Admittedly, I don’t understand this part, but it’s pretty interesting, and it’s really changed the world. People are running around having these “layered” conversations, regardless of their physical proximity.

I suppose we should have seen this technology coming. Twenty years ago, we certainly had earphones that fit in the ears, which people wore almost non-stop for music listening. We had advanced hearing aids that could take in sound, process it, and play the modified sound to the listener. We had the rudiments of artificially intelligent agents, in our phones: voices that we could talk to and make requests of. We had ubiquitous technology; everyone had a phone in their pocket. Now, I talk about my “phone” as if it’s a real thing, but it’s just a tiny function incorporated into Bin-Li. The world has sure changed.

Yes, even twenty years ago, everyone was running around with buds in their ears. The difference is that back then they were isolated. They were isolated from the world around them, and they weren’t really integrated into the world of communication that they were trying to connect to. Some people ran around with “Bluetooth” headsets. They talked to people who weren’t there, much like the kids do today. But the people who weren’t there were simply voices in the ear; they didn’t really belong to the space, in the way that we now take for granted. I can hardly imagine how difficult a conference call with 8-12 people must have been back then.

Today’s technology is pretty amazing, and I can’t wait to see where it goes next. I wish I could have been there twenty years ago, as it was all coming together. As people were finally learning how to exploit spatial hearing to build “binaural listeners” that could understand an auditory space and the talkers in that space, and then to turn that information into realistic and comprehensible auditory scenes for both normal-hearing and hearing-impaired listeners.

People like me, with sensorineural hearing loss, have poor sensitivity to some sound frequencies due to a loss of hair cells in the ear. It’s less of an issue these days than in the past, before the advent of advanced hearing aids. Now we can very reliably amplify the affected frequencies and restore sensitivity. But other people suffer from communication disorders that are more “central” or “cognitive.” For them, the problem isn’t in the ear, it’s in the brain. Some have trouble understanding speech; others have trouble dealing with echoes and reverberation. There’s no quick fix for such people. You can’t just make some sounds louder, but Bin-Li works for them because she does so much more than that. Bin-Li can simplify the sounds to isolate a single talker, if necessary, repeat or explain parts of a conversation, or show them on a visual display. I don’t use a visual display myself, but I’ve seen demos that generate real-time captions even with multiple talkers. So regardless of the nature of the communication disorder, this technology has helped tremendously.

Today, this technology is everywhere: in the audiology clinic, the entertainment industry, and in normal day-to-day activity. I can’t imagine a young person today who would walk around without their “hearables” in place. As one of my grandkids put it recently, “It would be like walking around with your eyes closed.”

-Chris Stecker, Nashville, April 26 2016
