Stability AI, known for its artificial intelligence text-to-image generator, has stepped into the world of AI music.
The London-headquartered company announced on Wednesday (September 13) the launch of Stable Audio, an AI generator that is the musical equivalent of Stable Diffusion, its image-generating tool that helped make the company an AI unicorn last year.
The new text-to-music generator works by taking a series of text cues entered by a user and turning them into an audio track.
For instance, entering “post-Rock, guitars, drum kit, bass, strings, euphoric, uplifting, moody, flowing, raw, epic, sentimental, 125 BPM” will result in this track.
The free version, which is meant for non-commercial use only, enables users to create 20-second tracks, while the Pro version, meant for commercial products with fewer than 100,000 monthly active users (MAUs), allows for tracks up to 90 seconds long.
Commercial products with more than 100,000 MAUs require an enterprise license.
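As a rough illustration of how those tiers break down, the sketch below encodes only the thresholds reported here. The function names are hypothetical and are not part of any Stability AI product or API. The article does not say which tier covers a product with exactly 100,000 MAUs, nor what track length an enterprise license allows, so both are flagged as assumptions in the code.

```python
from typing import Optional

# Threshold reported in the article.
MAU_THRESHOLD = 100_000


def required_tier(commercial: bool, monthly_active_users: int = 0) -> str:
    """Return the tier name implied by the article's description.

    Hypothetical helper for illustration only.
    """
    if not commercial:
        return "free"
    if monthly_active_users < MAU_THRESHOLD:
        return "pro"
    # Assumption: exactly 100,000 MAUs is treated as enterprise here;
    # the article only says "fewer than" and "more than" 100,000.
    return "enterprise"


def max_track_seconds(tier: str) -> Optional[int]:
    """Maximum track length in seconds, or None where the article gives none."""
    limits = {"free": 20, "pro": 90}
    # The enterprise limit is not stated in the article, so it maps to None.
    return limits.get(tier)


if __name__ == "__main__":
    for commercial, mau in [(False, 0), (True, 5_000), (True, 250_000)]:
        tier = required_tier(commercial, mau)
        print(commercial, mau, tier, max_track_seconds(tier))
```

A hobbyist project lands on the free tier, a small commercial app on Pro, and anything above the MAU threshold on an enterprise license.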
Unlike some other AI products, the AI algorithm behind Stable Audio was trained on licensed content, via a partnership between Stability AI and music library AudioSparx.
The technology behind Stable Audio is similar to the one used in the Stable Diffusion image generator, relying on a “latent diffusion” AI architecture. In simplest terms, this means the algorithm can be trained faster, and can generate content faster, because it maps the data it works with into a compressed, simpler representation (the “latent space”) and does its work there rather than on the raw data.
Stability AI says its technology is “the first in the industry” to use this technique for generating audio.
“Using the latest advancements in diffusion sampling techniques, our flagship Stable Audio model is able to render 95 seconds of stereo audio at a 44.1 kHz sample rate in less than one second on an NVIDIA A100 [graphics processing unit],” Stability AI says on its website.
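To put that figure in perspective, the back-of-the-envelope sketch below works out how much raw audio data the quoted claim implies. It uses only the numbers in the quote (95 seconds, 44.1 kHz, stereo, under one second); the function names are hypothetical, and the one-second render time is an upper bound taken from the quote, not a measured benchmark.

```python
# Figures taken from the Stability AI quote.
DURATION_S = 95
SAMPLE_RATE_HZ = 44_100
CHANNELS = 2  # stereo


def sample_frames(duration_s: int, sample_rate_hz: int) -> int:
    """Number of sample frames (one frame = one sample per channel)."""
    return duration_s * sample_rate_hz


def total_samples(duration_s: int, sample_rate_hz: int, channels: int) -> int:
    """Total individual sample values across all channels."""
    return sample_frames(duration_s, sample_rate_hz) * channels


def realtime_factor(audio_seconds: float, render_seconds: float) -> float:
    """How many seconds of audio are produced per second of rendering."""
    return audio_seconds / render_seconds


if __name__ == "__main__":
    print(sample_frames(DURATION_S, SAMPLE_RATE_HZ))
    print(total_samples(DURATION_S, SAMPLE_RATE_HZ, CHANNELS))
    # "Less than one second" means the real figure is at least this large.
    print(realtime_factor(DURATION_S, 1.0))
```

In other words, the claim amounts to producing roughly 8.4 million sample values at no less than 95 times real-time speed.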
“As the only independent, open and multimodal generative AI company, we are thrilled to use our expertise to develop a product in support of music creators,” Stability AI CEO Emad Mostaque said in a statement.
“Our hope is that Stable Audio will empower music enthusiasts and creative professionals to generate new content with the help of AI, and we look forward to the endless innovations it will inspire.”
The company’s Stable Diffusion product, released in August of last year, has become one of the most popular text-to-image generators on the market, helping to propel Stability AI to a valuation of $1 billion as of last October, based on a funding round in which it raised $101 million. According to a Forbes report this past spring, the company is now seeking to raise funds at a valuation of around $4 billion.
Stability AI has been making its way into the music space for some time now. One of the research groups in its ecosystem is Harmonai, a “community-driven” organization that publishes open-source generative audio tools.
The company also partnered with legendary rocker Peter Gabriel on a competition called “DiffuseTogether” in which participants were invited to submit an AI-generated video set to Gabriel’s music.
Notably, Stability AI hired Ed Newton-Rex, who – among other things – founded and created AI music-making platform Jukedeck. He also worked as Product Director in TikTok’s in-house AI lab, and is now VP of audio at Stability AI.
In an interview with MBW this past spring, Newton-Rex suggested that – far from being a threat – AI technology will be a major boon to musicians and to the music business.
“AI will be at its most powerful [as] a tool used by musicians in countless different ways. Frankly, [that includes] ways that even people like me working in the industry today can’t yet predict,” he said.
In Newton-Rex’s view, “the main benefit [of AI] for the music industry is increasing value for rights holders. That may sound counterintuitive [in the context of debates around AI making music] but when you have AI, the music that you write, or that you own, can become so much more valuable, because it’s no longer just one static thing. It can be modified.
“So maybe a track you’ve written or that you’ve gotten in your library is lengthened to fit a different TV ad, maybe the instrumentation is changed to get the right mood in a video, maybe you change the entire style to fit something totally new.”
There are people “who realize the opportunity that generative AI can bring the music business,” he added.
“I think rightsholders are in a really good position. What if you want… music that reacts to your run [as you exercise]? Rightsholders who own the songs that people love to listen to are in a perfect position [for that]. Because AI isn’t just generative – it’s also adaptiv