MCP server for Gladia audio transcription and intelligence. Enables LLMs to transcribe, analyze, and translate audio/video content through Gladia's API.
- Sign up at app.gladia.io
- Navigate to the API Keys section
- A default API key is automatically created for new accounts
Gladia offers 10 hours of free audio transcription per month. No credit card required.
```shell
npm install mcp-gladia
```

Or run directly:
```shell
npx mcp-gladia
```

Set your Gladia API key as an environment variable:
```shell
export GLADIA_API_KEY=your-api-key-here
```

Add to `~/Library/Application Support/Claude/claude_desktop_config.json` (macOS) or `%APPDATA%\Claude\claude_desktop_config.json` (Windows):
```json
{
  "mcpServers": {
    "gladia": {
      "command": "npx",
      "args": ["mcp-gladia"],
      "env": {
        "GLADIA_API_KEY": "your-api-key-here"
      }
    }
  }
}
```

```shell
claude mcp add gladia -- npx mcp-gladia
```

Then set your API key in the environment or a `.env` file.
Any MCP-compatible client can use this server via stdio transport. Set the command to npx mcp-gladia and provide GLADIA_API_KEY as an environment variable.
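As a sketch, for clients that accept a JSON server definition, the stdio entry typically looks like the following (the exact top-level key and wrapper object vary by client, so check your client's documentation):

```json
{
  "command": "npx",
  "args": ["mcp-gladia"],
  "env": { "GLADIA_API_KEY": "your-api-key-here" }
}
```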
Upload an audio/video file to Gladia for transcription.
| Parameter | Type | Required | Description |
|---|---|---|---|
| filePath | string | Yes | Path to the audio/video file |
Supported formats: mp3, wav, m4a, mp4, mov, avi, flac (max 1GB).
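For illustration, a call to this tool takes a single argument object (the path below is a placeholder):

```json
{ "filePath": "/path/to/recording.mp3" }
```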
Submit an audio file for transcription; the server polls automatically until completion. Supports all Gladia audio intelligence features. Returns the completed result, or a job ID if the 5-minute polling timeout is reached.
| Parameter | Type | Required | Description |
|---|---|---|---|
| audioUrl | string | Yes | URL from upload_file |
| language | string | No | Language code (e.g. en, fr). See supported languages |
| detectLanguage | boolean | No | Auto-detect language (default: true) |
| diarization | boolean | No | Enable speaker identification |
| diarizationConfig | object | No | { numberOfSpeakers?, minSpeakers?, maxSpeakers? } |
| subtitles | boolean | No | Generate subtitle files |
| subtitlesConfig | object | No | { formats: ["srt", "vtt"] } |
| customVocabulary | string[] | No | Custom words to improve recognition |
| summarization | boolean | No | Enable transcription summary |
| summarizationConfig | object | No | { type: "general" \| "concise" \| "bullet_points" } |
| sentimentAnalysis | boolean | No | Enable sentiment/emotion analysis |
| namedEntityRecognition | boolean | No | Enable entity detection |
| chapterization | boolean | No | Enable chapter detection with timestamps |
| translation | boolean | No | Enable translation |
| translationConfig | object | No | { targetLanguages: ["fr", "es"], model?: "base" \| "enhanced" } |
| audioToLlm | boolean | No | Enable custom LLM analysis |
| audioToLlmConfig | object | No | { prompts: ["your question about the audio"] } |
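Putting several of the options above together, a transcribe request might look like this (the `audioUrl` value is a placeholder for the URL returned by upload_file):

```json
{
  "audioUrl": "https://your-upload-url-from-upload_file",
  "language": "en",
  "diarization": true,
  "diarizationConfig": { "minSpeakers": 2, "maxSpeakers": 4 },
  "summarization": true,
  "summarizationConfig": { "type": "concise" },
  "subtitles": true,
  "subtitlesConfig": { "formats": ["srt", "vtt"] }
}
```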
Check the status of a transcription job (useful for long-running jobs that timed out).
| Parameter | Type | Required | Description |
|---|---|---|---|
| jobId | string (UUID) | Yes | Job ID from a previous transcribe request |
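Example arguments (the UUID below is a placeholder for a real job ID):

```json
{ "jobId": "00000000-0000-0000-0000-000000000000" }
```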
List past transcription jobs with optional filtering.
| Parameter | Type | Required | Description |
|---|---|---|---|
| offset | number | No | Pagination offset |
| limit | number | No | Max results (default: 20) |
| status | string | No | Filter: queued, processing, done, error |
| afterDate | string | No | Filter by creation date (ISO 8601) |
| beforeDate | string | No | Filter by creation date (ISO 8601) |
| kind | string | No | Filter: pre-recorded, live |
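For example, to fetch the ten most recent completed pre-recorded jobs:

```json
{ "status": "done", "kind": "pre-recorded", "limit": 10, "offset": 0 }
```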
Delete a transcription job and its data.
| Parameter | Type | Required | Description |
|---|---|---|---|
| jobId | string (UUID) | Yes | Job ID to delete |
All intelligence features are enabled as options on the transcribe tool and processed server-side by Gladia.
Generate a summary of the transcription in one of three formats:
| Type | Description |
|---|---|
| `general` | Balanced, comprehensive summary (default) |
| `concise` | Short overview of key points |
| `bullet_points` | Key takeaways as a bullet list |
```json
{ "summarization": true, "summarizationConfig": { "type": "bullet_points" } }
```

Detect sentiment and emotion for each sentence in the transcript, with speaker attribution when diarization is enabled.
Sentiments: positive, negative, neutral, mixed, unknown
Emotions: adoration, anger, joy, fear, surprise, sadness, neutral, and more
```json
{ "sentimentAnalysis": true }
```

Detect and classify entities mentioned in the audio. Supports 50+ entity types across multiple categories:
| Category | Entity Types |
|---|---|
| PII | Name, Email, Phone Number, SSN |
| Location | City, Country, Address |
| Medical (PHI) | Conditions, Drugs, Injuries |
| Financial | Bank Account, Credit Card |
| Demographic | Age, Gender, Occupation |
| Temporal | Date, Time |
Supports GDPR, HIPAA, and CPRA compliance workflows.
```json
{ "namedEntityRecognition": true }
```

Automatically segment audio into logical chapters. Each chapter includes:
- Summary — overview of the chapter content
- Headline — short title
- Gist — one-line bottom line
- Keywords — key terms mentioned
- Timestamps — start and end times
```json
{ "chapterization": true }
```

Translate transcriptions to one or more target languages.
| Model | Description |
|---|---|
| `base` | Fast translation, covers most use cases |
| `enhanced` | Higher quality, better for complex content |
```json
{ "translation": true, "translationConfig": { "targetLanguages": ["fr", "es"], "model": "enhanced" } }
```

Run custom analysis prompts directly against the audio content. No need to post-process transcripts with a separate LLM.
```json
{
  "audioToLlm": true,
  "audioToLlmConfig": {
    "prompts": [
      "Extract the key decisions made in this meeting",
      "What are the action items and who is responsible?"
    ]
  }
}
```

Identify and separate speakers in the audio. Output includes speaker labels on every utterance.
```json
{
  "diarization": true,
  "diarizationConfig": { "minSpeakers": 2, "maxSpeakers": 5 }
}
```

100+ languages are supported for transcription. Use the language code with the `language` parameter, or set `detectLanguage: true` (the default) for automatic detection.
| Language | Code | Language | Code | Language | Code |
|---|---|---|---|---|---|
| Afrikaans | af | Hawaiian | haw | Persian | fa |
| Albanian | sq | Hebrew | he | Polish | pl |
| Amharic | am | Hindi | hi | Portuguese | pt |
| Arabic | ar | Hungarian | hu | Punjabi | pa |
| Armenian | hy | Icelandic | is | Romanian | ro |
| Assamese | as | Indonesian | id | Russian | ru |
| Azerbaijani | az | Italian | it | Sanskrit | sa |
| Bashkir | ba | Japanese | ja | Serbian | sr |
| Basque | eu | Javanese | jw | Shona | sn |
| Belarusian | be | Kannada | kn | Sindhi | sd |
| Bengali | bn | Kazakh | kk | Sinhala | si |
| Bosnian | bs | Khmer | km | Slovak | sk |
| Breton | br | Korean | ko | Slovenian | sl |
| Bulgarian | bg | Lao | lo | Somali | so |
| Catalan | ca | Latin | la | Spanish | es |
| Chinese | zh | Latvian | lv | Sundanese | su |
| Croatian | hr | Lingala | ln | Swahili | sw |
| Czech | cs | Lithuanian | lt | Swedish | sv |
| Danish | da | Luxembourgish | lb | Tagalog | tl |
| Dutch | nl | Macedonian | mk | Tajik | tg |
| English | en | Malagasy | mg | Tamil | ta |
| Estonian | et | Malay | ms | Tatar | tt |
| Faroese | fo | Malayalam | ml | Telugu | te |
| Finnish | fi | Maltese | mt | Thai | th |
| French | fr | Maori | mi | Tibetan | bo |
| Galician | gl | Marathi | mr | Turkish | tr |
| Georgian | ka | Mongolian | mn | Turkmen | tk |
| German | de | Myanmar | my | Ukrainian | uk |
| Greek | el | Nepali | ne | Urdu | ur |
| Gujarati | gu | Norwegian | no | Uzbek | uz |
| Haitian Creole | ht | Nynorsk | nn | Vietnamese | vi |
| Hausa | ha | Occitan | oc | Welsh | cy |
| | | Pashto | ps | Wolof | wo |
| | | | | Yiddish | yi |
| | | | | Yoruba | yo |
Upload my-recording.mp3 and transcribe it
Transcribe this meeting recording with diarization enabled, expecting 3-5 speakers.
Generate a bullet-point summary and extract action items using audio-to-LLM.
Transcribe this podcast, detect the language, translate to English and French,
and run sentiment analysis on the conversation.
Transcribe this customer call with named entity recognition to identify
any PII mentioned (names, emails, phone numbers).
Transcribe this earnings call and use audio-to-LLM with these prompts:
- "What are the key financial metrics mentioned?"
- "What is the company's guidance for next quarter?"
- "Summarize the Q&A section"
| Issue | Solution |
|---|---|
| `GLADIA_API_KEY is required` | Set the `GLADIA_API_KEY` environment variable |
| `Unsupported file format` | Use mp3, wav, m4a, mp4, mov, avi, or flac |
| `File too large` | Files must be under 1GB |
| Transcription timeout | Use `get_transcription_status` with the returned job ID |
| Translation fails | Ensure `translationConfig.targetLanguages` is provided |
| `Invalid uuid` | Job IDs must be valid UUIDs (from `transcribe` or `list_transcription_jobs`) |
```shell
git clone https://github.com/gladiaio/mcp-gladia.git
cd mcp-gladia
npm install
npm run build
npm run dev
```

Requires Node.js 18+.
MIT