Technology has always been a great leveller. From the industrial age to the age of the internet, it has improved the quality of life for the masses and made previously unimaginable things accessible. One only needs to look at a smartphone to understand how communicating with someone thousands of kilometres away has become so common that most people barely think about it. Before Alexander Graham Bell's telephone, such long-distance communication was available only to the rich and influential because of the high costs associated with it.
Such examples are countless. Social media provides connectivity across the world, smartphone apps have digitised tasks that once required physical presence and took hours out of a day, and remote work gives people living far from corporate hubs better earning opportunities. In each case, technology has democratised access. In many ways, generative artificial intelligence (AI) is the next torch-bearer, extending accessibility to new frontiers.
One area where accessibility can make a big impact is the music industry. Streaming platforms such as Spotify, SoundCloud, and Apple Music have made music distribution cheaper, but music creation remains a problem. Today, original background music is a much-needed commodity. From professional artists to social media creators and podcasters, everyone requires music tracks for their content, preferably original, to avoid copyright strikes by platforms (YouTube content creators are well aware of their effect) or a lawsuit.
But creating music is not everyone’s cup of tea. If you have not trained for years to master one or more musical instruments, yet want original and unique music for your professional needs, you are likely stuck with two expensive options: hire a music producer or a session musician, or buy stock music online. Not anymore, because this is where AI has stepped in.
Take the example of Beatoven.ai, an Indian AI-powered music generation platform that lets users write a simple text prompt to generate new and unique background music within ten seconds. To understand how this technology works, its various implications, and the experience of running such an innovative startup, we at Gadgets 360 spoke with Mansoor Rahimat Khan, the co-founder and CEO of Beatoven.ai.
The inception and journey of Beatoven.ai
Mansoor Rahimat Khan comes from the Gwalior-Indore-Dharwad Gharana of Sitar, a famous family of musicians that has played and shaped modern-day Sitar music for seven generations. Khan followed in that tradition, but he also chose a different path owing to another of his passions: technology. “I completed my graduation from the National Institute of Technology (NIT), Goa, in electronics and communication engineering. This was also when I started delving into the space that lies at the intersection of music and technology,” Khan told us.
After working for a few years, Khan met Siddharth Bhardwaj, an alumnus of the Indian Institute of Technology (IIT), Allahabad (now known as Prayagraj), and a music enthusiast. The duo, sharing similar interests, identified the problem of music licensing in content and wanted to build something that could make music more accessible to millions of creators, whether they post on social media or are pursuing a professional career. That was the genesis of Beatoven.ai.
But there was one problem. Even as the duo began working on the product and the startup in 2021, their solution to the problem required generative AI, which was still a year away from reaching the mainstream (in November 2022, ChatGPT arguably started the gen AI race).
“Initially, the prototype we built in 2021 was a very bare-bone platform. Users could select a genre and a tempo and specify a duration, and we would generate an original piece of music. Back then, no large language models (LLM) existed, so we had to build our entire tech stack from scratch. Today, we have our own proprietary tech that we started building back then,” Khan said.
Things became easier once the AI wave arrived. Beatoven.ai benefitted from the availability of LLMs in the market, which helped it better equip its platform to serve its current user base of one million.
The Beatoven.ai platform
The web-only platform is a generative AI-powered music generation tool for content creators. Users, once they have signed up, can write a text prompt to generate original background music. Alternatively, the platform also allows users to pick a tempo, duration, genre, and mood to create music.
Once the user has added the input, the AI takes over and generates four separate tracks. The platform also offers post-generation editing features where users can change an instrument, reduce or increase volume in specific parts, or recompose an entire section of the track. Khan said the platform suggests a maximum length of 15 minutes for a single track. There is no hard upper limit, and the suggested value exists to keep rendering time short. A track of an average length of 1-2 minutes takes about 10 seconds to generate. Based on data shared by the company, Beatoven has generated 15 lakh (1.5 million) soundtracks since inception and recorded 3 lakh (300,000) downloads.
The platform currently does not allow users to make fusion tracks where two or more genres are blended, but Khan told Gadgets 360 exclusively that the company will soon release a new update that will add this feature.
We also tested out the platform and found the music to be quite realistic. The following song was created using the prompt “Create a high-energy EDM anthem with a beat drop that is perfect for a dance party”.
The Beatoven.ai tech-stack
There are two components to the Beatoven platform. The first is the LLM, which lets users type prompts in natural language and then converts that information into a format the music-generation model can understand. The startup uses GPT models for this part.
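To make the idea of "a format the AI can understand" concrete, here is a minimal sketch of what a structured request might look like. It is not Beatoven's implementation: simple keyword matching stands in for the GPT model, and the `MusicRequest` fields, keyword tables, and `parse_prompt` function are all hypothetical. The four fields simply mirror the manual controls the platform exposes (genre, mood, tempo, and duration).

```python
from dataclasses import dataclass

# Hypothetical vocabularies for illustration; the platform's real options may differ.
GENRE_KEYWORDS = {"edm": "electronic", "lo-fi": "lo-fi", "cinematic": "cinematic"}
MOOD_KEYWORDS = {"high-energy": "energetic", "calm": "calm", "sad": "sad"}
TEMPO_BY_MOOD = {"energetic": "fast", "calm": "slow", "sad": "slow"}


@dataclass
class MusicRequest:
    """Structured parameters a downstream music model could consume."""
    genre: str
    mood: str
    tempo: str
    duration_seconds: int


def first_match(text, keywords, default):
    """Return the value of the first keyword found in the text, else the default."""
    for keyword, value in keywords.items():
        if keyword in text:
            return value
    return default


def parse_prompt(prompt, duration_seconds=90):
    """Keyword-matching stand-in for the LLM step: free text in, parameters out."""
    text = prompt.lower()
    genre = first_match(text, GENRE_KEYWORDS, "unspecified")
    mood = first_match(text, MOOD_KEYWORDS, "neutral")
    tempo = TEMPO_BY_MOOD.get(mood, "medium")  # assumed rule: tempo follows mood
    return MusicRequest(genre, mood, tempo, duration_seconds)


request = parse_prompt(
    "Create a high-energy EDM anthem with a beat drop that is perfect for a dance party"
)
print(request)
```

A real LLM would handle phrasing that no keyword table anticipates, which is presumably why the startup relies on one for this step.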
The second component understands the user intent and generates a track that fulfils the parameters. This architecture was built in-house by the company. The AI model uses a contrastive learning architecture to make it happen. Khan highlights that the inspiration for this technique came from OpenAI’s CLIP model, but quickly points out that the OpenAI model was built for text and images, and says Beatoven was the first to use it for sound and music. Because the model is proprietary, the company was also able to optimise the process. For instance, Khan told us that the platform uses CPU inference instead of GPU inference. This is notable given that even small LLMs typically rely on GPU inference to run.
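For readers unfamiliar with contrastive learning, the sketch below shows the kind of symmetric loss that CLIP popularised, applied here to text and audio embeddings. It is an illustration of the general technique only, not Beatoven's proprietary model: the function names, the toy two-dimensional embeddings, and the temperature value are assumptions made for this example.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def cross_entropy_diagonal(logits):
    """Mean cross-entropy where the correct class for row i is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        row_max = max(row)  # subtract the max for numerical stability
        log_sum_exp = row_max + math.log(sum(math.exp(v - row_max) for v in row))
        total += log_sum_exp - row[i]
    return total / len(logits)


def contrastive_loss(text_embeddings, audio_embeddings, temperature=0.07):
    """Symmetric CLIP-style loss over a batch of matched (text, audio) pairs."""
    texts = [l2_normalize(t) for t in text_embeddings]
    audios = [l2_normalize(a) for a in audio_embeddings]
    # logits[i][j] is the scaled similarity between text i and audio clip j.
    logits = [
        [sum(x * y for x, y in zip(t, a)) / temperature for a in audios]
        for t in texts
    ]
    transposed = [list(column) for column in zip(*logits)]
    # Average the text-to-audio and audio-to-text directions.
    return 0.5 * (cross_entropy_diagonal(logits) + cross_entropy_diagonal(transposed))


# Toy batch: pair i of the text list matches pair i of the audio list.
aligned = contrastive_loss([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
shuffled = contrastive_loss([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
print(aligned < shuffled)
```

The loss is low when each description sits closest to its own audio clip and high when the pairs are mismatched. Training on that signal pulls matching text and music together in a shared embedding space, which is what lets a text prompt steer music generation.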
The startup has sourced almost 1,00,000 data samples from independent artists to train the AI model. The company collaborated with nearly 250 artists globally and paid them for exclusive tracks. Khan claimed that the company had ethically sourced all of its training data and did not scour the internet for it. Interestingly, Adobe is reportedly doing the same at present to build an AI video generation model.
However, data has become an incredibly costly resource, and it is required continually to upgrade and improve AI models. Beatoven still collaborates with artists to procure data, but in the future it plans to cut costs by introducing a revenue-sharing model, in which artists would be paid based on the number of generated tracks for which the AI used their song samples or data.
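The company has not said how such payouts would be calculated. As one possible reading, the sketch below splits a revenue pool in proportion to the number of generated tracks that drew on each artist's samples. The pro-rata rule, the `split_revenue` function, the artist names, and the figures are all hypothetical.

```python
from collections import Counter


def split_revenue(pool, usage_log):
    """Split a revenue pool pro rata by how many generated tracks used each artist.

    usage_log holds one entry per generated track: the artists whose samples
    contributed to that track. An artist counts once per track.
    """
    counts = Counter(
        artist for track_artists in usage_log for artist in set(track_artists)
    )
    total = sum(counts.values())
    if total == 0:
        return {}  # nothing to attribute, so nothing to pay out
    return {artist: pool * uses / total for artist, uses in counts.items()}


# Four generated tracks and the (made-up) artists credited for each one.
usage_log = [["asha", "ravi"], ["asha"], ["meera"], ["asha", "meera"]]
payouts = split_revenue(6000, usage_log)
print(payouts)
```

Any scheme of this kind depends on knowing which training samples influenced a given output, and the article does not describe how Beatoven would track that attribution.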
How Beatoven.ai plans to deal with the competition
AI-based music generation is no longer a unique proposition. Many players have entered the segment, recognising the potential. Some include Google with its MusicLM, OpenAI with its Jukebox, and Adobe with its Project Music GenAI Control. However, none of these models is available to the public today, and they remain under development. But competition for Beatoven still exists. A big rival would be Suno AI, which not only creates music but also adds AI-generated voices to offer a full-fledged song.
In answer to this concern, Khan highlighted that the company offers unlimited music generation without a rate limit. He added that the company is building an entire ecosystem. On one side, it caters to users by generating music. On the other, it offers a place for artists to sell their original music. The entire suite of offerings, along with the promise of “ethically sourced and copyright-free unique music”, is what Khan believes gives Beatoven the edge in the market.
A look towards the future
Beatoven is now looking to expand its platform to cater to a global user base. The startup has already begun onboarding artists from different parts of the world, as 70 percent of its user base resides outside India. Khan believes this global outlook, along with a focus on improving the AI model, will be the key to hitting its target of five million users in the next two years.
Technology can often be a double-edged sword. While the benefits of AI-generated music cannot be overstated, the question that arises is whether such easy and affordable music creation can have an adverse impact on aspiring musicians. Is the commodification of music really the right way to go?
Khan believes that while music creation is going to become the next big disruption in the industry, it is unlikely to take away the dreams and livelihood of musicians and singers. “I believe artists are still going to be at the centre of this disruption because AI cannot compete with human creativity,” he said.