Gladia’s Insight: Real-Time Processing is the Future of Audio Transcription APIs

French startup Gladia, which provides a speech-recognition application programming interface (API), has secured $16 million in a Series A funding round. In essence, Gladia’s API converts any audio file into text with high accuracy and a quick turnaround.

While Amazon, Microsoft, and Google offer speech-to-text APIs as part of their cloud-hosting services, newer models from specialized startups like Gladia have shown better performance.

There has been significant progress in this field, especially after the launch of OpenAI’s Whisper. Gladia competes with other well-funded companies like AssemblyAI, Deepgram, and Speechmatics.

Gladia initially offered an improved version of Whisper’s speech-to-text model with additional features. For example, the startup supports diarization out of the box — detecting multiple speakers in a conversation and separating the recording and transcribed text accordingly.
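To make diarization concrete, here is a minimal sketch of how an application might turn speaker-labeled segments into a readable transcript. The `Segment` structure, its field names, and the speaker labels are hypothetical, chosen for illustration only; they do not reflect Gladia’s actual response format.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    # Hypothetical shape of one diarized segment (not Gladia's real schema).
    speaker: str   # label assigned by the diarization step, e.g. "A"
    start: float   # start time in seconds
    end: float     # end time in seconds
    text: str      # transcribed text for this segment


def merge_turns(segments):
    """Merge consecutive segments from the same speaker into single turns."""
    turns = []
    for seg in sorted(segments, key=lambda s: s.start):
        if turns and turns[-1].speaker == seg.speaker:
            last = turns[-1]
            # Same speaker is still talking: extend the current turn.
            turns[-1] = Segment(
                last.speaker,
                last.start,
                max(last.end, seg.end),
                f"{last.text} {seg.text}",
            )
        else:
            # Speaker changed: start a new turn.
            turns.append(Segment(seg.speaker, seg.start, seg.end, seg.text))
    return turns


def format_transcript(turns):
    """Render turns as one line per speaker turn, with timestamps."""
    return "\n".join(
        f"[{t.start:.1f}-{t.end:.1f}] {t.speaker}: {t.text}" for t in turns
    )


if __name__ == "__main__":
    demo = [
        Segment("A", 0.0, 1.0, "Hi"),
        Segment("A", 1.0, 2.0, "there"),
        Segment("B", 2.0, 3.0, "Hello"),
    ]
    print(format_transcript(merge_turns(demo)))
```

Grouping by consecutive speaker, rather than collecting all of one speaker’s text together, keeps the back-and-forth order of the conversation intact.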

Gladia supports 100 languages and a variety of accents. This reporter can confirm its effectiveness, having used Gladia to transcribe interviews without any accent-related issues.

The startup offers its speech-to-text model as a hosted API for users to integrate into their applications. Over 600 companies, including meeting recorders and note-taking assistants like Attention, Circleback, Method Financial, Recall, Sana, and Veed.io, use Gladia.

With the new funding, Gladia aims to streamline integration by combining audio intelligence and large language model (LLM) tasks in a single API call. This would allow customers to generate conversation summaries from bullet points without relying on third-party LLM APIs.

Gladia is also working on reducing latency for real-time transcription. The company can currently transcribe live conversations with under 300 milliseconds of latency, and it aims to deliver batch-level quality in real time.

XAnge is leading the Series A funding round, with participation from Illuminate Financial, XTX Ventures, Athletico Ventures, Gaingels, Mana Ventures, Motier Ventures, Roosh Ventures, and Soma Capital.

Gladia envisions a future where audio applications experience a “ChatGPT moment.” As companies like Apple and Google build transcription models into consumer apps, the value of automated transcription will become more apparent. Gladia expects developers to add more audio features as a result, with API providers like itself stepping in to meet the demand.
