Palabra AI Python SDK

Python SDK for Palabra AI's real-time speech-to-speech translation API. Break down language barriers and enable seamless communication across 25+ languages.

Overview

The Palabra AI Python SDK provides a high-level API for integrating real-time speech-to-speech translation into your Python applications.

What can Palabra.ai do?

  • Real-time speech-to-speech translation with near-zero latency
  • Auto voice cloning - speak any language in YOUR voice
  • Two-way simultaneous translation for live discussions
  • Developer API/SDK for building your own apps
  • Works everywhere - Zoom, streams, events, any platform
  • Zero data storage - your conversations stay private

This SDK focuses on making real-time translation simple and accessible:

  • Uses WebRTC and WebSockets under the hood
  • Abstracts away all complexity
  • Simple configuration with source/target languages
  • Supports multiple input/output adapters (microphones, speakers, files, buffers)

How it works:

  1. Configure input/output adapters
  2. SDK handles the entire pipeline
  3. Automatic transcription, translation, and synthesis
  4. Real-time audio stream ready for playback

All with just a few lines of code!

Installation

From PyPI

pip install palabra-ai

macOS SSL Certificate Setup

If you encounter SSL certificate errors on macOS like:

SSLCertVerificationError: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate

Option 1: Install Python certificates (recommended)

/Applications/Python\ $(python3 -c "import sys; print(f'{sys.version_info.major}.{sys.version_info.minor}')")/Install\ Certificates.command

Option 2: Use system certificates

pip install pip-system-certs

This will configure Python to use your system's certificate store.

Quick Start

Real-time microphone translation

from palabra_ai import (PalabraAI, Config, SourceLang, TargetLang,
                        EN, ES, DeviceManager)

palabra = PalabraAI()
dm = DeviceManager()
mic, speaker = dm.select_devices_interactive()
cfg = Config(SourceLang(EN, mic), [TargetLang(ES, speaker)])
palabra.run(cfg)

Set your API credentials as environment variables:

export PALABRA_CLIENT_ID=your_client_id
export PALABRA_CLIENT_SECRET=your_client_secret
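
If you want to fail fast when credentials are missing, a quick pre-flight check helps. This is just a sketch; it assumes nothing beyond the two variable names above, which the SDK reads automatically.

import os

# Sketch: verify the documented credential variables are set before
# starting a session; the SDK itself picks them up from the environment.
for var in ("PALABRA_CLIENT_ID", "PALABRA_CLIENT_SECRET"):
    if not os.environ.get(var):
        raise RuntimeError(f"Missing environment variable: {var}")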

Examples

File-to-file translation

from palabra_ai import (PalabraAI, Config, SourceLang, TargetLang,
                        FileReader, FileWriter, EN, ES)

palabra = PalabraAI()
reader = FileReader("./speech/es.mp3")
writer = FileWriter("./es2en_out.wav")
cfg = Config(SourceLang(ES, reader), [TargetLang(EN, writer)])
palabra.run(cfg)

Multiple target languages

from palabra_ai import (PalabraAI, Config, SourceLang, TargetLang,
                        FileReader, FileWriter, EN, ES, FR, DE)

palabra = PalabraAI()
config = Config(
    source=SourceLang(EN, FileReader("presentation.mp3")),
    targets=[
        TargetLang(ES, FileWriter("spanish.wav")),
        TargetLang(FR, FileWriter("french.wav")),
        TargetLang(DE, FileWriter("german.wav"))
    ]
)
palabra.run(config)

Customizable output

Receive transcriptions of both the source and the translated speech. Configure the output to provide:

  • Audio only
  • Transcriptions only
  • Both audio and transcriptions

from palabra_ai import (
    PalabraAI,
    Config,
    SourceLang,
    TargetLang,
    FileReader,
    EN,
    ES,
)
from palabra_ai.base.message import TranscriptionMessage


async def print_translation_async(msg: TranscriptionMessage):
    print(repr(msg))


def print_translation(msg: TranscriptionMessage):
    print(str(msg))


palabra = PalabraAI()
cfg = Config(
    source=SourceLang(
        EN,
        FileReader("speech/en.mp3"),
        print_translation  # Callback for source transcriptions
    ),
    targets=[
        TargetLang(
            ES,
            # You can use transcription alone, without an audio writer:
            # FileWriter("./test_output.wav"),  # Optional: audio output
            on_transcription=print_translation_async  # Callback for translated transcriptions
        )
    ],
    silent=True,  # Disable verbose logging to the console
)
palabra.run(cfg)

Transcription output options:

1. Audio only (default):

TargetLang(ES, FileWriter("output.wav"))

2. Transcription only:

TargetLang(ES, on_transcription=your_callback_function)

3. Audio and transcription:

TargetLang(ES, FileWriter("output.wav"), on_transcription=your_callback_function)

The transcription callbacks receive TranscriptionMessage objects containing the transcribed text and metadata. Callbacks can be either synchronous or asynchronous functions.
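
For example, instead of printing, a callback can collect messages for later processing. A minimal sketch using only the TranscriptionMessage import and the on_transcription hook shown above:

from palabra_ai.base.message import TranscriptionMessage

collected: list[TranscriptionMessage] = []

def collect_transcription(msg: TranscriptionMessage):
    # Append each message; inspect text and metadata after the run.
    collected.append(msg)

# Pass it when building the target, e.g.:
# TargetLang(ES, on_transcription=collect_transcription)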

Integrate with FFmpeg (streaming)

import io
from palabra_ai import (PalabraAI, Config, SourceLang, TargetLang,
                        BufferReader, BufferWriter, AR, EN, RunAsPipe)

ffmpeg_cmd = [
    'ffmpeg',
    '-i', 'speech/ar.mp3',
    '-f', 's16le',           # 16-bit PCM
    '-acodec', 'pcm_s16le',
    '-ar', '48000',          # 48 kHz
    '-ac', '1',              # mono
    '-'                      # output to stdout
]

pipe_buffer = RunAsPipe(ffmpeg_cmd)
out_buffer = io.BytesIO()

palabra = PalabraAI()
reader = BufferReader(pipe_buffer)
writer = BufferWriter(out_buffer)
cfg = Config(SourceLang(AR, reader), [TargetLang(EN, writer)])
palabra.run(cfg)

print(f"Translated audio written to buffer with size: {out_buffer.getbuffer().nbytes} bytes")
with open("./ar2en_out.wav", "wb") as f:
    f.write(out_buffer.getbuffer())

Using buffers

import io
from palabra_ai import (PalabraAI, Config, SourceLang, TargetLang,
                        BufferReader, BufferWriter, AR, EN)
from palabra_ai.internal.audio import convert_any_to_pcm16

ar_buffer, en_buffer = io.BytesIO(), io.BytesIO()
with open("speech/ar.mp3", "rb") as f:
    ar_buffer.write(convert_any_to_pcm16(f.read()))
palabra = PalabraAI()
reader = BufferReader(ar_buffer)
writer = BufferWriter(en_buffer)
cfg = Config(SourceLang(AR, reader), [TargetLang(EN, writer)])
palabra.run(cfg)
print(f"Translated audio written to buffer with size: {en_buffer.getbuffer().nbytes} bytes")
with open("./ar2en_out.wav", "wb") as f:
    f.write(en_buffer.getbuffer())

Using default audio devices

from palabra_ai import PalabraAI, Config, SourceLang, TargetLang, DeviceManager, EN, ES

dm = DeviceManager()
reader, writer = dm.get_default_readers_writers()

if reader and writer:
    palabra = PalabraAI()
    config = Config(
        source=SourceLang(EN, reader),
        targets=[TargetLang(ES, writer)]
    )
    palabra.run(config)

Async Translation

import asyncio
from palabra_ai import PalabraAI, Config, SourceLang, TargetLang, FileReader, FileWriter, EN, ES

async def translate():
    palabra = PalabraAI()
    config = Config(
        source=SourceLang(EN, FileReader("input.mp3")),
        targets=[TargetLang(ES, FileWriter("output.wav"))]
    )
    result = await palabra.arun(config)
    # Result contains: result.ok, result.exc, result.log_data

if __name__ == "__main__":
    asyncio.run(translate())

Synchronous Translation

from palabra_ai import PalabraAI, Config, SourceLang, TargetLang, FileReader, FileWriter, EN, ES

# Synchronous execution (blocks until complete)
palabra = PalabraAI()
config = Config(
    source=SourceLang(EN, FileReader("input.mp3")),
    targets=[TargetLang(ES, FileWriter("output.wav"))]
)
result = palabra.run(config)
# Result contains: result.ok, result.exc, result.log_data

Signal Handling

# Enable Ctrl+C signal handlers (disabled by default)
result = palabra.run(config, signal_handlers=True)

# Default behavior (signal handlers disabled)
result = palabra.run(config) # signal_handlers=False by default

Result Handling

Both run() and arun() return a RunResult object with status information:

result = palabra.run(config)
# or: result = await palabra.arun(config)

if result.ok:
    print("Translation completed successfully!")
    if result.log_data:
        print(f"Processing stats: {result.log_data}")
    if result.eos:
        print("End of stream signal received")
else:
    print(f"Translation failed: {result.exc}")

I/O Adapters & Mixing

Available adapters

The Palabra AI SDK provides flexible I/O adapters that can be freely combined:

  • FileReader/FileWriter: Read from and write to audio files
  • DeviceReader/DeviceWriter: Use microphones and speakers
  • BufferReader/BufferWriter: Work with in-memory buffers
  • RunAsPipe: Run command and represent as pipe (e.g., FFmpeg stdout)

Mixing examples

Combine any input adapter with any output adapter:

Microphone to file - record translations

config = Config(
    source=SourceLang(EN, mic),
    targets=[TargetLang(ES, FileWriter("recording_es.wav"))]
)

File to speaker - play translations

config = Config(
    source=SourceLang(EN, FileReader("presentation.mp3")),
    targets=[TargetLang(ES, speaker)]
)

Microphone to multiple outputs

config = Config(
    source=SourceLang(EN, mic),
    targets=[
        TargetLang(ES, speaker),                    # Play Spanish through speaker
        TargetLang(ES, FileWriter("spanish.wav")),  # Save Spanish to file
        TargetLang(FR, FileWriter("french.wav"))    # Save French to file
    ]
)

Buffer to buffer - for integration

input_buffer = io.BytesIO(audio_data)
output_buffer = io.BytesIO()

config = Config(
    source=SourceLang(EN, BufferReader(input_buffer)),
    targets=[TargetLang(ES, BufferWriter(output_buffer))]
)

FFmpeg pipe to speaker

pipe = RunAsPipe(ffmpeg_cmd)  # RunAsPipe takes a command, as shown above
config = Config(
    source=SourceLang(EN, BufferReader(pipe)),
    targets=[TargetLang(ES, speaker)]
)

Benchmarking

The SDK includes a powerful benchmarking module for performance analysis and quality testing. Run comprehensive benchmarks with detailed metrics, latency measurements, and trace data export.

# Quick benchmark
uv run python -m palabra_ai.benchmark examples/speech/en.mp3 en es --out ./results

# With Docker
make bench -- examples/speech/en.mp3 en es --out ./results

See Benchmarking Guide for complete documentation including configuration options, output files, and advanced usage.

Features

Real-time translation

Translate audio streams in real-time with minimal latency. Perfect for live conversations, conferences, and meetings.

Voice cloning

Preserve the original speaker's voice characteristics in translations. Enable voice cloning in the configuration.

Device management

Easy device selection with interactive prompts or programmatic access:

dm = DeviceManager()

# Interactive selection
mic, speaker = dm.select_devices_interactive()

# Get devices by name
mic = dm.get_mic_by_name("Blue Yeti")
speaker = dm.get_speaker_by_name("MacBook Pro Speakers")

# List all devices
input_devices = dm.get_input_devices()
output_devices = dm.get_output_devices()

Audio Configuration

Sample Rates by Protocol

The SDK automatically handles audio sample rates based on the connection protocol:

WebSocket (WS) Mode

  • Input (to API): Always 16kHz mono PCM
  • Output (from API): Always 24kHz mono PCM

WebRTC Mode

  • Input (to API): 48kHz mono PCM
  • Output (from API): 48kHz mono PCM

The SDK automatically resamples audio to match these requirements regardless of your input/output device capabilities.
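
In other words, no manual resampling is required. If you want to sanity-check a source file anyway, the standard library is enough; a minimal sketch, assuming a PCM WAV input:

import wave

# Inspect the source file's channel count and sample rate. The SDK
# resamples to the protocol's required rate (16/24 kHz for WS,
# 48 kHz for WebRTC) no matter what the file provides.
with wave.open("speech/en.wav", "rb") as wf:
    print(f"channels={wf.getnchannels()}, rate={wf.getframerate()} Hz")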

Supported languages

Speech recognition languages (Source)

Arabic (AR), Bashkir (BA), Belarusian (BE), Bulgarian (BG), Bengali (BN), Catalan (CA), Czech (CS), Welsh (CY), Danish (DA), German (DE), Greek (EL), English (EN), Esperanto (EO), Spanish (ES), Estonian (ET), Basque (EU), Persian (FA), Finnish (FI), French (FR), Irish (GA), Galician (GL), Hebrew (HE), Hindi (HI), Croatian (HR), Hungarian (HU), Interlingua (IA), Indonesian (ID), Italian (IT), Japanese (JA), Korean (KO), Lithuanian (LT), Latvian (LV), Mongolian (MN), Marathi (MR), Malay (MS), Maltese (MT), Dutch (NL), Norwegian (NO), Polish (PL), Portuguese (PT), Romanian (RO), Russian (RU), Slovak (SK), Slovenian (SL), Swedish (SV), Swahili (SW), Tamil (TA), Thai (TH), Turkish (TR), Uyghur (UG), Ukrainian (UK), Urdu (UR), Vietnamese (VI), Chinese (ZH)

Translation languages (Target)

Arabic (AR), Azerbaijani (AZ), Belarusian (BE), Bulgarian (BG), Bosnian (BS), Catalan (CA), Czech (CS), Welsh (CY), Danish (DA), German (DE), Greek (EL), English (EN), English Australian (EN_AU), English Canadian (EN_CA), English UK (EN_GB), English US (EN_US), Spanish (ES), Spanish Mexican (ES_MX), Estonian (ET), Finnish (FI), Filipino (FIL), French (FR), French Canadian (FR_CA), Galician (GL), Hebrew (HE), Hindi (HI), Croatian (HR), Hungarian (HU), Indonesian (ID), Icelandic (IS), Italian (IT), Japanese (JA), Kazakh (KK), Korean (KO), Lithuanian (LT), Latvian (LV), Macedonian (MK), Malay (MS), Dutch (NL), Norwegian (NO), Polish (PL), Portuguese (PT), Portuguese Brazilian (PT_BR), Romanian (RO), Russian (RU), Slovak (SK), Slovenian (SL), Serbian (SR), Swedish (SV), Swahili (SW), Tamil (TA), Turkish (TR), Ukrainian (UK), Urdu (UR), Vietnamese (VI), Chinese (ZH), Chinese Simplified (ZH_HANS), Chinese Traditional (ZH_HANT)

Available language constants

from palabra_ai import (
    # English variants - 1.5+ billion speakers (including L2)
    EN, EN_AU, EN_CA, EN_GB, EN_US,

    # Chinese variants - 1.3+ billion speakers
    ZH, ZH_HANS, ZH_HANT,  # ZH_HANS and ZH_HANT are translation-only

    # Hindi & Indian languages - 800+ million speakers
    HI, BN, MR, TA, UR,

    # Spanish variants - 500+ million speakers
    ES, ES_MX,

    # Arabic variants - 400+ million speakers
    AR, AR_AE, AR_SA,

    # French variants - 280+ million speakers
    FR, FR_CA,

    # Portuguese variants - 260+ million speakers
    PT, PT_BR,

    # Russian & Slavic languages - 350+ million speakers
    RU, UK, PL, CS, SK, BG, HR, SR, SL, MK, BE,

    # Japanese & Korean - 200+ million speakers combined
    JA, KO,

    # Southeast Asian languages - 400+ million speakers
    ID, VI, MS, FIL, TH,

    # Germanic languages - 150+ million speakers
    DE, NL, SV, NO, DA, IS,

    # Romance languages (other) - 100+ million speakers
    IT, RO, CA, GL,

    # Turkic & Central Asian languages - 200+ million speakers
    TR, AZ, KK, UG,

    # Baltic languages - 10+ million speakers
    LT, LV, ET,

    # Other European languages - 50+ million speakers
    EL, HU, FI, EU, CY, MT,

    # Middle Eastern languages - 50+ million speakers
    HE, FA,

    # African languages - 100+ million speakers
    SW,

    # Asian languages (other) - 50+ million speakers
    MN, BA,

    # Constructed languages
    EO, IA,

    # Other languages
    GA, BS
)

Note: Source languages (for speech recognition) and target languages (for translation) have different support. The SDK automatically validates language compatibility when creating SourceLang and TargetLang objects.
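
For example, ZH_HANS and ZH_HANT are translation-only constants (see the note in the import list above), so a run targeting Simplified Chinese would pick a valid source such as EN. A minimal sketch using the Config API shown earlier:

from palabra_ai import (PalabraAI, Config, SourceLang, TargetLang,
                        FileReader, FileWriter, EN, ZH_HANS)

palabra = PalabraAI()
cfg = Config(
    source=SourceLang(EN, FileReader("speech/en.mp3")),        # EN: valid source language
    targets=[TargetLang(ZH_HANS, FileWriter("zh_hans.wav"))],  # ZH_HANS: target-only
)
palabra.run(cfg)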

Development status

Current status

  • Core SDK functionality
  • GitHub Actions CI/CD
  • Docker packaging
  • Python 3.11, 3.12, 3.13 support
  • PyPI publication
  • Documentation site (coming soon)
  • Code coverage reporting (setup required)

Current dev roadmap

  • TODO: global timeout support for long-running tasks
  • TODO: support for multiple source languages in a single run
  • TODO: fine cancelling on cancel_all_tasks()
  • TODO: error handling improvements

Build status

  • Tests: Running on Python 3.11, 3.12, 3.13
  • Release: Automated releases with Docker images
  • Coverage: Tests implemented, reporting setup needed

Requirements

  • Python 3.11+
  • Palabra AI API credentials (get them at palabra.ai)

Support

License

This project is licensed under the MIT License - see the LICENSE file for details.


(c) Palabra.ai, 2025 | Breaking down language barriers with AI
