In March, Spotify launched its first AI-powered feature with the debut of its AI DJ, a smart audio guide with a convincingly realistic voice. As it turns out, that AI persona was based on a real person: Spotify’s head of Cultural Partnerships, Xavier “X” Jernigan, who had the honor of becoming the first voice model for the AI feature.
TechCrunch sat down with Jernigan to learn more about the process for training the AI and Spotify’s future plans for its AI DJ efforts.
The new AI DJ personalizes the music listening experience for listeners, curating a selection of music based on their interests. It also has spoken commentary about each track, much like a real radio host.
In addition to Jernigan’s main role at Spotify, he’s also the host of various Spotify podcasts, including “The Window,” “Showstopper” and the now-defunct podcast “The Get Up.” So, he’s used to having his voice heard by millions of listeners. Still, having his voice memorialized as an AI is a novel experience.
Spotify chose Jernigan to be the first voice model because his “voice and personality resonated with a lot of our listeners already,” Jernigan told TechCrunch. “[The company was] fairly confident that I would resonate in this way as well.”
Spotify’s morning show, “The Get Up,” garnered nearly 6 million listeners and was a top 10 podcast on Spotify before it abruptly ended in 2022, demonstrating Jernigan’s pull.
Still, being the voice model for DJ was hard to wrap his head around at first, the podcast host admitted.
“I got pitched on being this voice model for DJ and my mind was blown when it was explained to me,” Jernigan told us. “Imagine if you’re hearing this for the first time, you don’t have anything to look at and I’m just like, ‘Wait, what? It’s gonna be me but it’s not me, and it’s text and voice, but it’ll sound like me, and it’s AI?’”
“For me, it was a new experience working with AI in this way. I was just blown away,” he added.
Spotify says its AI DJ was built using both Sonantic and OpenAI technologies.
Sonantic is an AI startup that Spotify acquired last year. The company’s tech was responsible for building realistic AI-based voices, including the one used for Val Kilmer’s voice in “Top Gun: Maverick.”
Prior to the acquisition, Spotify spent a few years researching AI-powered technology and worked on the DJ feature “in some iteration,” Jernigan noted. He declined to share exactly how long the process took but said integrating the Sonantic technology “really kicked it into high gear.”
Jernigan explained the process of training the AI, which entailed going into a studio, reading off a script and speaking in various cadences and inflections to convey different emotions. He fed the AI certain phrases that only he uses to make it feel as authentic as possible.
“We use words that I say… I don’t say ‘tunes’ for songs. That’s just not how I talk,” he said. “I say, ‘hits’ or ‘bangers.’ So, you’ll hear DJ say those kinds of words,” Jernigan continued. “We even did a whole process of like, how do I say ‘hey,’ how do I say ‘hello.’ I carried around a notebook, and I would just write down these different phrases that were something I would say.”
He added that the Spotify team made sure to keep in his natural pauses and breaths so the AI voice would actually sound human-like.
Even Jernigan’s mom gave the results her stamp of approval.
“[DJ] passed the mama test. I played it for her before it came out, explaining it to her and I’m trying to get her to wrap her mind around it,” he said. “She listened to all my podcasts, so she’s used to hearing my voice recorded and played before and she was like ‘That sounds exactly like you.’ My mama said it sounded like me, so I knew it was spot on.”
Although realistic AI voices already exist, we’d argue that Spotify’s DJ is the calmest and most chill-sounding compared with others we’ve heard. Though Google’s Duplex technology may sound authentic, it’s not necessarily a voice that’s nice to listen to when you’re trying to vibe out to your summer jam playlist.
“For me, doing the performance from a voice acting standpoint, my objective was to connect with people and to converse with people and to think about one person. So, when I was training the AI, I just pictured one person when I was in the studio, talking to them and being their friend,” he added.
In addition to making the AI voice sound friendly to listeners, the design of the DJ itself was also made to feel approachable.
The animated green circle that users see when listening to the DJ is a nod to the Spotify logo and moves like a mouth when the AI talks.
“When it came to the design, we thought about the entire experience: how it works, how it sounds, how it looks and how to make it personal for each user,” Emily Galloway, head of Product Design for Personalization at Spotify, told TechCrunch. “Early on for the visual element, we explored some options that felt more technical (imagine things like soundwaves). Yet this didn’t feel right since we wanted to humanize the AI…”
“We wanted to make it look and feel unique. In fact, it was so unique that it was awarded a design patent,” Galloway added.
Jernigan contributed to DJ in other ways besides recording his voice.
In order for the AI to offer expert commentary about the music, Spotify put together a writers’ room made up of curators, culture experts and music experts.
Jernigan has an extensive background in music, so he was also a participant in the writers’ room. He previously worked for top artists like Diddy, Amy Winehouse and 2 Chainz, among others.
And while Jernigan is the first voice model for DJ, there’s the potential for listeners to hear more voices in the future.
TechCrunch asked Jernigan if the company had any plans to hire voice models who speak other languages.
“Stay tuned,” he hinted.
The AI DJ is currently only available in English for Premium subscribers in the U.S. and Canada. As of February, the DJ feature is still in beta testing.
“We got a whole bunch of really cool new features coming out across the board,” Jernigan said. “We got really dope stuff that’s coming out.”