The Voice Controlled

Over 180 years ago, Charles Babbage, Victorian polymath and inventor of the first mechanical computer, proposed that the air is a ‘vast library’ of every word ever spoken. If he could be here today, he would immediately notice that his proposal has become a reality. Contemporary technologies are fulfilling our obsession with ‘absolute recollection’, and Babbage’s thinking resonates today more than ever. Every word, image, action or movement of ours is forever impressed and recorded in an age of cloud computing and connectivity – at a time of digital archives of everything – and in a society of full-scale surveillance.

Starting from this ‘vast library’ and through a combination of phantasmagoric effects and emerging technologies, Rafael Lozano-Hemmer creates an ever-changing audio-visual environment that transports us into a three-dimensional, physical manifestation of Babbage’s proposal. In Atmospheric Memory the invisible “air” of recorded voices, memories or data materialises in front of our eyes. Invisible clouds of spoken or whispered “voices” are unveiled in front of us and transformed into something we can touch and see. In a theatrical and engaging way, though, Lozano-Hemmer urges us to look at these “voices” as more than mere archives or libraries of words that would otherwise be lost. Atmospheric Memory shows us how we have transformed the “air” surrounding us into a giant machine that always listens and records our actions. What are the implications of this obsession with monitoring everything that happens or is said? And can we ever reverse it?

The visionary English mathematician and writer Ada Lovelace, known mainly for her work on Babbage’s Analytical Engine, imagined that the machine could do a lot more than serve as a calculator. For instance, she thought it could be used to produce algorithmic music – something that later became possible for computers, as she predicted. Lovelace envisioned Babbage’s machine manipulating content such as text, sound or images. Could Lovelace have imagined the Analytical Engine capturing or acquiring a voice?

Sixty years earlier, inventor Wolfgang von Kempelen, known for his chess-playing pseudo-automaton – The Turk – made what was probably one of the earliest attempts to build a speaking machine. Kempelen explored the physiology of speech production and created a mechanism to appropriate the human voice. The speaking machine could produce sounds, words and even sentences, although they were hardly comprehensible. In the early 1860s, a young Alexander Graham Bell created a speaking “head” after seeing an automaton by Charles Wheatstone, inspired by Kempelen’s speaking machine. It would take years of further research and experiments before Bell invented the telephone, which opened up a new field of voice communication technology, but also began a long history of wiretapping and intercepting voices.

We have since learned to give up our rights to electronic privacy and surrender to a web of listening devices and smart speakers that have taken a place in our lives and occupied our domestic spaces. A whole world of connected objects, from home assistants, TVs and household appliances to wearables, cars, and urban objects and furniture, constantly observes and listens to us, but we are growing accustomed to ignoring this.

But long before these now ubiquitous connected devices existed, there was a series of monitoring and eavesdropping systems and architectures of control, from acoustical wall funnels and optical apparatuses to Jeremy Bentham’s panopticon, a prison structure composed of a central watchtower surrounded by cells. In the 17th century, the celebrated scholar and polymath Athanasius Kircher designed the statua citofonica or ‘talking statue’, an early intercom system in which a series of spiral-shaped funnels hidden in the walls of a building connected courtyards and public spaces, creating a giant listening device. Kate Crawford and Vladan Joler reference Kircher’s statua citofonica in their recent brilliant mapping study of the Amazon Echo – in the context of human labour, data and material resources – describing the Echo as a listening agent and “ear” in the home.

Alexa, the voice assistant who speaks to us through the Amazon Echo, and the other voice-enabled devices and smart speakers that have taken a place in our homes are no longer perceived as commercial products; they are ‘actual voices’. By anthropomorphising Alexa, Siri and their like, by giving them a human-like voice, we forget what these systems really are. They become advisors, confidants, educators; they are there for us to converse with and to trust. Yes, every word we speak is probably recorded in the “air”: that invisible mesh of listening systems formed by our Alexas, Siris, Google Homes, thermostats and so on. One day your appliances will know your whereabouts, conversations, journeys, purchases, browsing histories. We tend to look at these devices as individual entities, although they are disembodied parts of a large, complex, networked corporate system. Kircher’s talking statues, which reproduced conversations eavesdropped from the street and carried through spiral-shaped tubes hidden in buildings, perfectly depict our surveillance society today.

Danish artist Cecilie Waagner-Falkenstrøm recently created FRANK, an artwork using Artificial Intelligence. FRANK is a humanised voice that can have conversations with people; it is a speaking oracle, a synthetic voice based on machine intelligence that gives personal guidance regarding existential dilemmas. You can talk to FRANK about personal anxieties, hopes, dreams and fears. FRANK will listen and give you “his” insightful direction and counselling. Actually, speaking to FRANK is not so different from how we talk to Siri or Alexa, or from how we might interact with more voice interfaces in the next few years.

Voice is an organ that identifies us and that is uniquely ours. From birth, voice – our first cry – marks our existence to the world, but it is also one of the main modes of human expression, emotion and communication. More than 200 years after Kempelen’s speaking machine, computer systems that not only recognise but also synthesise human speech are commonplace, and our voices can serve as training data for teaching machines to imitate human speech. Applications such as Baidu’s Deep Voice or Lyrebird use machine learning to create human-like artificial voices or enable the cloning of human speech. Imitating any human voice has become a reality that raises serious concerns and questions about how technologies like this could be used, but also misused.

Our words are not only forever imprinted and captured in this ‘vast library’ of the Cloud; our (disembodied) voice is also taken over, borrowed by a machine. In The Invention of Morel, a novel by Argentine writer Adolfo Bioy Casares, the main character, a fugitive, is hiding on a desert island until one day he sees a group of tourists arriving and retreats from them. Among the tourists is a woman whom he spies on and falls in love with. The fugitive tries to talk to the woman but she ignores him, and slowly he realises that none of the other tourists notice him either. He also notices that there are two moons and two suns, that conversations are repeated, and that the island visitors complain they are cold when it is hot or swim in a pool of rotting fish. He finally finds out the truth: one of the island visitors, named Morel, invented a machine capable of reproducing reality. The group of tourists that he sees is a recording made by Morel’s machine, looping a week of their actions on the island forever. By recording them, though, the machine has captured their souls, so they are all dead.

Voice is a metaphor for agency and power. In a society obsessed with capturing its actions, images and voices, we quickly see the romantic and positivist side of retaining memories, failing to see the context of metrics and surveillance. Who has control over what is kept and what is left behind? And who owns these vast libraries of our voices, actions, or data? In a contemporary society of absolute recollection, people with the weaker voices and the most vulnerable members of the community have even less control over being subjected to invisibility or hypervisibility. So the possibility to go back, be forgotten or escape will probably be available only to the privileged few in the future.
