Exploring the Future of Musical A.I.

Here is a song called “Daddy’s Car.” If it appeared in some YouTube ad for Bounty paper towels, you probably wouldn’t blink. It sounds like a fourth-generation photocopy of the Beatles. Under normal circumstances, I would probably listen to it once, deem it “successfully generic” and move on.

But “Daddy’s Car” was written under highly abnormal circumstances; it was dreamed up by a computer’s brain. That brain was built by a Sony-owned software company called Flow Machines, which tapped its vast neural network to compose a melody in the “style of the Beatles,” a melody that was then tweaked and finessed by a French musician named Benoît Carré. For all its apparent unoriginality, “Daddy’s Car” is one of the most remarkable unremarkable songs ever written.

Computers are writing more and more music these days. But they’re not following a mysterious, inward-springing muse, humming melodies because they liked the way the clouds looked outside the window that day. No, like everything else machines do, they are making music because we are telling them to, and companies are investing millions of dollars in this complicated act of ventriloquism. Besides Flow Machines, which is funded by a $2.5 million grant from the European Research Council, there is also Magenta, an initiative launched by Google last year.

Here’s the thing: We really, really want our machines to talk to us. To us, for us, in our own voices, in a voice we’ve never heard before: it doesn’t matter, we just want the company. Ever since the human voice shot across telephone wires, we have been staring at cold coils and trying to animate them. By now, we’ve got the talking part pretty much down. We can wake up our computers by yelling “hey” at them, tell them to remind us to watch the Warriors game six days from now, or absently ask them to dim the lights a little, and they’ll comply.

The next logical step is to make them sing and play to us. But giving machines a creative voice, literal inspiration (inspirare, meaning “to breathe”), is a much slipperier problem than making them talk, and tougher to program. How do you home in on the luminous stuff coming out of a mouth or an instrument and capture the sounds themselves, let alone sounds that abide by their own internal logic and order? What’s the math for that? Where do the notes stop and the music begin?

Douglas Eck works at Magenta, an offshoot of the artificial-intelligence project Google Brain that aims to develop “algorithms that can learn how to generate art and music.” He’s an engineer as well as a musician. “The pinnacle of my career was in my early 20s. I played a lot of bluegrass and punk in coffee houses in Indiana; Johnny Cash meets Johnny Rotten. I had dozens of people see me play live,” he cracks. He’s thought a lot, in a philosophical sense, about the difference between the notes and the music.

Perhaps surprisingly, Eck rejects the fundamental idea of a Turing Test for a pop song. He doesn’t want Magenta to fool listeners, or replace musicians. “We’re not chasing the idea that we can make a human-feeling music without humans,” Eck says. “I’m not that interested in being able to push a button and have a computer make something that is emotionally evocative by itself. While I think that’s an interesting goal, it’s not my goal.”

The biggest stumbling blocks he and his team hit, he admits, were human, not machine-made. Good old cognitive bias reared its head: “We didn’t start by thinking about what musicians want, which is funny and surprising because we’re all musicians,” he says, laughing. “The first round of Magenta looked like it was engineered by a bunch of Googlers.”

Poking around the open-source forum as a layperson quickly confirms Eck’s diagnosis. It’s a place of command lines and prompts, a coder’s realm. “We’re still stuck in that mode,” he says, adding somewhat comically, “and we’re kind of engineering our way out of it.”

“This kind of problem shows up everywhere,” he continues. “One challenge for self-driving cars is that you’re used to making eye contact with the driver before you know if it’s safe to cross the street. But how do you make eye contact with a driverless car? For the same reason, building machine-learning tools that musicians actually use is tough.” They have to pass “the guitar pedal test,” Eck says, presumably meaning they should beg musicians to pick them up, fiddle with them, and see what kind of funny sounds they can make.

Watch how Google’s Magenta uses neural networks to play a piano “duet” with real-life musicians.

For now, though, Magenta is a playground for coders, and they are uploading their early results for each other’s perusal, often with a bracing dash of self-deprecation. “I know it’s a bit cheesy but I hope you enjoy,” writes user Jose Cano of his Magenta collaboration. It is, for sure, a bit cheesy. The piano on the piece feels clumsy, kind of like a beginner student who has just discovered the satisfying tang of noodling up and down the harmonic minor scale. When the Magenta-generated synth tries going beyond that, it is immediately out of its depth, hitting notes outside the logical framework. The vocabulary is limited, but you sense an intelligence feeling its way around some basic rules.


For his part, Eck takes a bit of a “polar bear riding a tricycle” attitude toward machines making music: it’s probably more impressive that the bear is doing it at all, never mind its skill level. He says he’s OK if some of the musical experimentation that volunteer users trade with Magenta’s team “kind of sucks.” It’s a refreshingly humble attitude from a bunch of Google engineers. “If Frank Ocean came along and said, ‘Let’s collaborate with Magenta,’ I might say, ‘It’s not the time.’”

Nonetheless, what Eck wants, and what Magenta so far lacks, is a team of musicians playing around with it, bending the software into shapes the engineers could never have imagined. “Look what happened with the drum machine,” he says. “It became brilliant when it got used for creative purposes for which it wasn’t intended. Maybe I have a Magenta model that’s trained to generate new sounds, or new sequences of melody, but what makes it interesting is that another artist comes along and plays with it.”

Around the time of our interview, Magenta unveiled a new tool called NSynth, a neural network that has been trained on more than 300,000 instrument sounds. The way it has “learned” these sounds represents a leap in thinking. Usually, when computers reproduce sounds, they reduce the physical vibrations of sound waves to numbers that approximate those vibrations, and then the machine crunches those numbers. That’s how you end up playing flutes and trumpets and cellos on a keyboard.
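To make those “numbers” concrete, here is a minimal Python sketch of how a digital system might reduce a pure tone to numbers. It measures the wave at evenly spaced instants and stores each measurement as an integer. The 440 Hz pitch, the rate of 44,100 measurements per second, and the 16-bit scaling are illustrative assumptions (typical CD-audio values), not details of NSynth or any other product mentioned here.

```python
import math


def sample_tone(freq_hz, sample_rate, n_samples):
    """Measure a pure sine tone at evenly spaced instants and store each
    measurement as a 16-bit integer, the way CD-style audio does."""
    samples = []
    for n in range(n_samples):
        t = n / sample_rate                               # time of the n-th measurement, in seconds
        amplitude = math.sin(2 * math.pi * freq_hz * t)   # wave height, between -1.0 and 1.0
        samples.append(round(amplitude * 32767))          # scale into the 16-bit integer range
    return samples


# 100 measurements of concert A (440 Hz), taken 44,100 times per second
tone = sample_tone(440.0, 44100, 100)
print(tone[:5])
```

Each integer is one snapshot of the wave. A keyboard that plays back recorded flutes and cellos is, at bottom, replaying long lists of numbers like these.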

NSynth operates on a slightly higher plane. Rather than crunching those numbers directly, it works from a compact set of “ideas” about what instruments sound like, learned from its training sounds; this is the stuff of cortexes, not brain cells. In effect, it’s a smart synth, and its neatest trick is the ability to blend two sounds in its database into a new one. So when you ask it to blend, say, the sound of a goose honk and a harp, it spits out something that does indeed sound like a strange approximation of those two things. With a little doing, you can import those sounds into your home studio.
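The blending trick can be sketched with simple arithmetic. In the minimal Python sketch below, each sound is represented by a short list of numbers that stands in for its learned “ideas.” The four-number vectors and the `interpolate` helper are hypothetical; NSynth’s real encodings come from a trained neural network and are far larger. Blending means moving partway from one encoding toward the other, and a decoder (not shown) would then turn the result back into audio. Applying the same arithmetic directly to two recordings’ raw samples would simply layer one sound over the other, which is why working with learned encodings matters.

```python
def interpolate(a, b, mix):
    """Move `mix` of the way from encoding `a` toward encoding `b`.

    mix=0.0 gives the values of `a`, mix=1.0 gives the values of `b`,
    and 0.5 lands halfway between them.
    """
    if len(a) != len(b):
        raise ValueError("encodings must have the same length")
    if not 0.0 <= mix <= 1.0:
        raise ValueError("mix must be between 0.0 and 1.0")
    return [(1.0 - mix) * x + mix * y for x, y in zip(a, b)]


# Hypothetical four-number encodings; a real model learns much larger ones.
goose_honk = [0.9, -0.2, 0.4, 0.0]
harp = [0.1, 0.6, -0.4, 0.8]

halfway = interpolate(goose_honk, harp, 0.5)
# A decoder (not shown) would turn `halfway` back into a playable sound.
```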

Playing around with the dials and fader is fun, but it feels more like the past than the future: I’m reminded of my older brother’s little toy Casio in the mid-’80s, and its voice recorder. Was mixing goose honks and a harp much different from recording curse words into a microphone so I could play “Chopsticks” with them? NSynth ultimately feels more like being shown the parts of an airplane wing than like flying.

In the realm of musical AI, Flow Machines feels a bit more like liftoff. Click around on its website, and it will eagerly show you how it learned to write a harmony for Beethoven’s “Ode to Joy” that was directly inspired by Bach chorales. Then watch a video of Flow Machines’ in-house singer-songwriter, Benoît Carré, performing a fairly stupefying one-man rendition of the Beatles’ “I Feel Fine” with a “smart” loop pedal created by the company. Instead of recording what you tell it to, the Reflexive Looper “listens” to your playing and decides, on its own, what to record and loop. It is also intelligent enough to take that looped material and transpose it, on its own, into a new key. Watching Carré test-drive it in real time is breathtaking.

Musician Benoît Carré uses Flow Machines’ Reflexive Looper, which decides what to record and loop on its own as a musician plays and sings into it.

Where Magenta bills itself as a “research project,” Flow Machines is focused on output: a compilation album made by a series of artists using Flow Machines’ software is coming out soon on its own label. François Pachet, the head of Flow Machines, isn’t ready to share specific details on the album project, but he hopes it will kick Flow Machines, and the larger project of AI-assisted music in general, into a new phase, one where the presence of AI ceases to be the talking point.

“When we launched the original tunes, people listened to the technical artifact—now, we’d like to be judged only on the music,” he says, laughing. “It’s going to be hard.”

Flow Machines works differently from Magenta. It is a vast database of songs and styles. You pick a few from its vault as inspirational starting points, not much different from how humans write songs on their own. Then you tell the software a few things about the music you want it to generate: not too many short notes, for example, or not too many chords; medium tempo, with some inverted chord voicings. Then Flow Machines gets to work and starts spitting out ideas, and you keep the ones you like. You can even zero in on part of a melody and ditch the rest, encouraging the software to develop a single idea further.
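The generate-and-filter loop described above can be illustrated with a toy model. The Python sketch below is not Flow Machines’ actual algorithm. It learns a simple first-order Markov chain (which note tends to follow which) from one hypothetical inspiration melody, generates candidate melodies by random walk, and keeps only the candidates that satisfy a hypothetical “not too many short notes” constraint. The note data, the `few_short_notes` rule, and its thresholds are all assumptions made for illustration.

```python
import random
from collections import defaultdict


def learn_transitions(melodies):
    """Record which (pitch, duration) note followed which, across all source melodies."""
    transitions = defaultdict(list)
    for melody in melodies:
        for current, following in zip(melody, melody[1:]):
            transitions[current].append(following)
    return dict(transitions)


def generate(transitions, start, length, rng):
    """Random walk through the learned transitions, beginning at `start`."""
    melody = [start]
    while len(melody) < length:
        options = transitions.get(melody[-1])
        if not options:  # dead end: no note ever followed this one in the sources
            break
        melody.append(rng.choice(options))
    return melody


def few_short_notes(melody, max_short=2, short=0.25):
    """Hypothetical user constraint: at most `max_short` notes lasting `short` beats or less."""
    return sum(1 for _, duration in melody if duration <= short) <= max_short


# Hypothetical inspiration melody as (MIDI pitch, duration in beats) pairs
source = [(60, 1.0), (62, 0.5), (64, 0.5), (65, 1.0), (64, 0.25), (62, 0.25),
          (60, 2.0), (62, 0.5), (64, 1.0), (60, 1.0)]

table = learn_transitions([source])
rng = random.Random(7)  # fixed seed so the run is repeatable
candidates = [generate(table, (60, 1.0), 8, rng) for _ in range(20)]
keepers = [m for m in candidates if few_short_notes(m)]
```

A person would then audition `keepers` and hold on to the ones they like. Filtering after generation is the simplest way to honor such a constraint, though it wastes candidates when the constraint is strict.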

In other words, Flow Machines is a sort of virtual musician. It’s limber and responsive enough to draw extemporaneously on a rich base of references, but it can also collaborate. It can tweak its phrasing, play a little less like this and a little more like that, draw out a phrase or get rid of a frilly turnaround, based on a person’s input.

Flow Machines has to learn both to make music and to make music with others, and Pachet can’t think of any other machine-learning system that resembles it. “You give an idea, and the system fills in the blanks, but then you criticize the results,” he says. I ask him whether voice texting, in which a phone has to recognize and understand every word in the English language and also interpret my peculiar way of speaking it, comes close. “Not really,” he says. “Unless you are asking your phone to take your voice text as a prompt and write a new paragraph in the style of William Faulkner, or Obama.”

A video explaining the overall concepts behind Flow Machines.

There are a couple of well-known thought experiments about machine consciousness. One of them is called “The Chinese Room.” It was first posed by the American philosopher John Searle in 1980, and this is roughly how it goes: You are locked in a room. Someone feeds you pieces of paper with Chinese writing on them through a slot in the door. You don’t speak or understand a word of Chinese, but the room is full of books that tell you which Chinese characters to copy back. You dutifully copy those characters by hand and pass them back through the slot to your mysterious interlocutor. You don’t know what the characters you’re reading say, nor do you know what you are writing back. As far as you know, you’re trading recipes, or comparing bathroom tiles, or debating the nature of the universe.

But the people on the other side of the door would have no idea just how ignorant you are. To them, you are a sophisticated and fluent Chinese speaker participating in a meaningful discussion. This was Searle’s metaphorical argument against conscious machines, which, he argued, played the role of the blank copyists in the room. Manipulation of symbols is one thing, but innate comprehension of those symbols is another.

This is a neat metaphor for the music being made by Magenta and Flow Machines. When I ask both Douglas Eck and François Pachet about their theories of machine consciousness, both are brusquely unsentimental. “This is a technology,” Eck insists. “Try to think of an art form that uses zero technology. In every case that I’ve looked at, when we’ve used new technology, it’s always made us more creative. We should admit that our brains are not all that smart without an environment to lean on. For example, most of us can’t even do long division without technology in the form of paper and pencil.”

Pachet offers a similar answer. He thinks that Flow Machines is capable of generating “perfectly good” songs all by itself, but that truly unique ones only happen with an artist present. “There are so many choices to make when writing a piece of music,” he says. “Only an artist can make the sort of choices that result in a great piece.”

Those decisions—phrasing, timing, feeling—happen at a level that machines don’t know how to aim for, at least not yet. Truly great music is born of a burning desire to communicate, as well as good, old-fashioned accidents of misunderstanding, which humans still seem to have the market cornered on. Excitement, which has been the divining rod leading most human investigations, still seems foreign to processing systems. You can teach machines to spin sugar, but not to crave it.

And yet, Pachet notes, sometimes the software generates stunning music anyway, all on its own. “Sometimes, it will map a guitar line onto a new song, and it works so well; it’s great, and we don’t know why. And the system, to be honest, doesn’t know either—it has no ideas on its own about what’s a ‘good’ generation or a ‘bad’ one.” This doesn’t feel so different from humankind’s relationship to music in general; we love certain sounds and decide we can’t live without them, while others repulse us. Do we really understand why? We try to explain it, to ourselves and to others, but the impulse, and the base-level attraction, occurs somewhere beneath our conscious minds’ reach.

This is why, ultimately, Pachet doesn’t have any truck with grand theories of machine consciousness. Despite working on what feels like one of the final frontiers of human behavior—is anything more irreducibly human than music?—he doesn’t see a ghost in the machine, a latent soul waiting to rise from the lines of code. He compares the modern advancements of musical AI with the birth of the digital synthesizer. “That was just a tool, and this is exactly the same thing,” he says. “The tool does not have any idea about what it creates, and this is just a new generation of tools. We just have to learn how to use them.”

It’s ironic, or maybe appropriate, that the only person I speak to who waxes mystical about the sounds these programs make is a musician. Benoît Carré’s full-time job is to collaborate with Flow Machines, exploring its quirks and taking it for test-drives every day. Intriguingly, working with this sophisticated algorithmic software has plugged him into the more subconscious side of his songwriting brain.

Take “Mr. Shadow,” for instance, a song Flow Machines generated working from hundreds of American jazz standards. The melody was one of countless results that the program spit out—it captured Carré’s imagination, and he built the song from there. He is entranced by the way the software “reads” human voices, and is convinced it taps into something elemental about them. “It’s like if I captured the soul of the singer,” he says. “What I get is not a voice that tells me stories with lyrics. I get an emotion, a style, a way of singing—things that are the essence of the singer. It’s like an abstract song, but very emotional. That’s what I find most interesting about this tool; you recreate a sense of listening to music like if you were a child who didn’t understand every word, but you get the feeling. It’s a new kind of song.”

“Mr. Shadow” is a song generated by Flow Machines based on jazz standards.

Music making is, at its root, a sort of dreaming—in making it, we are closer to our subconscious than we are, arguably, with any other form of artistic creation. The only substance we are pushing is air, and once that air goes still, the medium dies, and all we have are recordings. It is the most ethereal of all the arts, and as machines take over more and more of the “manual labor” of the job, musicians might very well be freed up to delve even deeper into those dreamscapes.

“When I’m playing with Flow Machines, I’m more connected with my unconsciousness and my feelings, and less with my mind,” Carré says. “It’s very instinctive. The music that you make with those tools helps you to go further inside yourself, to explore the shadow side of your creativity. I’m discovering a new way to make music, yes, but maybe I’m also discovering a style. The melodies that come out are not in your style, but they are the result of your choices. But there’s something more, something augmented, something that doesn’t seem like our reality.”

Carré is more likely to think of Flow Machines as “an intelligence,” which is the word he uses to describe playing with the Reflexive Looper. He’s made the most music with these machines, after all, and he is the one with their melodies in his head; they feel quite real to him. “When I work with Flow Machines, I feel like I’m collaborating,” he notes. “I do not feel alone. These programs are proposing melodies to me that I can sing in the shower the next morning. They are melodies that obsess me sometimes, because they are really different. It’s like when you are about to sleep: You have thoughts, but you don’t identify them; they are real, but not real. There is something otherworldly about them.”
