AssemblyAI Raises $50M to Develop 'Superhuman' Speech AI, Unlocking New Applications

By Greg Tavarez, December 15, 2023

The global AI market is predicted to reach $1,811.8 billion by 2030, up from $136.6 billion in 2022, with a compound annual growth rate (CAGR) of 38.1%, according to Grand View Research. Under that broad umbrella sits speech AI, which is gaining momentum thanks to industry giants Google, Amazon and Microsoft as well as startups like Deepgram and Otter.ai. Otter.ai, for example, aims to make meetings more productive by transcribing them and taking notes throughout.

Another platform that is accelerating the momentum around speech AI is AssemblyAI, a speech-to-text and speech intelligence platform. To give itself a distinct edge in this competitive landscape, AssemblyAI recently secured $50 million in Series C funding to develop "superhuman" speech AI models, aiming to revolutionize voice-driven applications across industries.

AssemblyAI was founded with the ambition of creating speech AI models that unlock a new wave of applications powered by voice data. Think of the knowledge contained in company meetings, podcasts, videos, customer calls or even voice-based machine interactions. Accurately understanding and analyzing that voice data opens the door to a wide range of new opportunities.

Over the past two years, advancements in data availability, computing power and neural network architectures like the Transformer have significantly propelled AI models across various domains, making the dream of superhuman speech AI models more attainable. As an example, AssemblyAI's latest Conformer-2 model, trained on 1.1 million hours of voice data, improved accuracy and robustness in tasks like speech-to-text and speaker identification. According to the company, it delivers a 43% reduction in errors on noisy data compared to other models and a nearly 50% accuracy improvement over previous generations.

AssemblyAI's current suite of speech-to-text models combines accurate transcription with additional features such as speaker identification, sentiment analysis and chapter detection. The company reports more than 25 million daily inference calls and processes more than 10TB of voice data, serving clients across media, education, healthcare, finance and more.

Taking speech AI further, AssemblyAI is developing its next-generation Universal model, which is intended to perform strongly on multilingual speech AI tasks. The model is being trained on a dataset of over 10 million hours of voice data using Google's new TPU chips. That represents a 1,250-times increase in training data compared to the company's first model, released in 2019.

The emergence of powerful LLMs capable of ingesting accurately recognized speech and generating summaries, insights and classifications also opens new possibilities for voice-data-driven products and workflows. This LLM technology underpins AssemblyAI offerings like Audio Intelligence models for automated chapter detection and content moderation, which support brand safety and content management at scale for leading enterprises. Additionally, the new LeMUR product utilizes LLMs for text generation tasks over recognized speech.

"This new capital will support our ambitious research plans, new model development, training compute, market expansion, as well as help us build our team,” AssemblyAI founder and CEO Dylan Fox stated in the announcement. “We believe that the best way for us to continue to innovate is to bring together some of the best minds in AI. And, with 10,000-plus new organizations signing up for our API every month, we're just scratching the surface of the new voice-powered AI applications we'll see enter the market over the next year.”

The "superhuman" ambition seems to go beyond accuracy. If successful, the development and deployment of "superhuman" speech AI models could have profound implications for various sectors. Imagine classrooms where AI tutors analyze student conversations to personalize learning, or healthcare systems where AI agents interpret medical consultations and identify potential health risks. The possibilities are vast, and AssemblyAI is poised to be at the forefront of this transformative technology.

The round, led by Accel, brings AssemblyAI's total funding to $115 million, 90% of which was raised in the last 22 months as organizations across virtually every industry have raced to embed speech AI capabilities into their products, systems and workflows.

Edited by Alex Passett
