Getting Started

How It Works

Two ways to use AethonVoice. Pick the one that fits.

Path 1

REST API (For Developers)

There are two ways to get results: poll for status, or let us call your webhook when the job is ready. There are also two ways to submit: a single job via POST /tts/submit, or many items in one request via POST /tts/batch-submit (see Batch Processing below).

Option A: Polling

Submit a job, check back periodically, download when ready. Simple and reliable.

Polling Flow

1. POST /tts/submit       →  { jobId, status: "queued" }
2. GET  /tts/status/:jobId →  { status: "processing" }
3. GET  /tts/status/:jobId →  { status: "done", downloadUrl: "..." }
4. Download MP3 from downloadUrl (no auth needed)

Option B: Webhook Callback

Provide a callbackUrl and we'll POST the results to your server when the job completes. No polling needed.


Webhook Flow

1. POST /tts/submit  →  { jobId, status: "queued" }
   body: { text, voice, langs, callbackUrl: "https://your-server.com/webhook" }

2. (your app does other work — no polling needed)

3. POST your callbackUrl ← AethonVoice calls you
   {
     "jobId": "Xk9mP2qR7vNw",
     "status": "done",
     "downloadUrl": "https://storage.googleapis.com/...",
     "durationMs": 2100
   }

4. Download MP3 from downloadUrl (no auth needed)

Best for batch jobs and server-to-server integrations. You can still poll GET /tts/status/:jobId as a fallback.

If generation fails, you still receive a callback. The payload carries a top-level error string, so for a batch you don't have to walk the items map. A failed batch callback looks like this:

Failure callback

{
  "batchId": "Bt9xK2pL...",
  "status": "error",
  "error": "All 3 items failed",
  "items": { /* per-item status + error */ }
}

Credits are only debited for items that successfully returned audio — failed items are never charged.
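
On the receiving side, a callback handler can branch on the status field. The sketch below is one way to do it with Python's standard library. The `handle_callback` name and the returned action tuples are illustrative assumptions; only the `jobId`, `batchId`, `status`, `downloadUrl`, and `error` fields come from the payloads shown above.

```python
import json


def handle_callback(raw_body: bytes) -> tuple[str, str, str]:
    """Turn a callback body into (action, id, detail).

    Hypothetical helper: the field names follow the payloads shown
    above, everything else is an assumption for illustration.
    """
    payload = json.loads(raw_body)
    # Single jobs carry jobId; batch callbacks carry batchId.
    ref = payload.get("jobId") or payload.get("batchId") or ""
    status = payload.get("status")
    if status == "done":
        return ("download", ref, payload.get("downloadUrl", ""))
    if status == "error":
        # The top-level error string summarizes the failure.
        return ("failed", ref, payload.get("error", "unknown error"))
    # Anything else (for example "partial" on a batch): look at the items map.
    return ("inspect", ref, status or "")
```

In a real server you would call this from your POST route and reply with a 2xx status once the payload has been stored.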

Quick Start

1. Get an API key

Sign up and create an API key from your dashboard. Keys are issued in the form av_live_<32-bytes-base64url> and shown only once at creation — we store a SHA-256 hash, never the raw key. Send it as Authorization: Bearer <key> on every request (the X-API-Key header is also accepted).
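
Because the key is shown only once, it helps to keep it out of source code. A minimal sketch, assuming you store it in an environment variable: the variable name `AETHONVOICE_API_KEY` and both helper names are our own choices for illustration, not something the API requires.

```python
import os


def load_api_key(env=os.environ) -> str:
    """Read the key from the environment (variable name is an assumption)."""
    key = env.get("AETHONVOICE_API_KEY", "")
    if not key:
        raise RuntimeError("AETHONVOICE_API_KEY is not set")
    return key


def auth_headers(api_key: str, use_x_api_key: bool = False) -> dict[str, str]:
    """Build either of the two accepted auth headers."""
    if use_x_api_key:
        return {"X-API-Key": api_key}
    return {"Authorization": f"Bearer {api_key}"}
```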

2. Submit your first job

curl

curl -X POST https://aethon.lab.ai/api/tts/submit \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -d '{
    "text": "Welcome to AethonVoice.",
    "voice": "aris",
    "langs": ["en"]
  }'

Response:

json

{
  "jobId": "Xk9mP2qR7vNw",
  "status": "queued"
}
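
The same request in Python might look like the sketch below. It uses only the standard library. The helper names `build_submit_request` and `submit` are illustrative and not part of any AethonVoice SDK; the base URL is taken from the curl example above.

```python
import json
import urllib.request

BASE_URL = "https://aethon.lab.ai/api"  # from the curl example


def build_submit_request(api_key, text, voice, langs, callback_url=None):
    """Build (but do not send) a POST /tts/submit request."""
    body = {"text": text, "voice": voice, "langs": list(langs)}
    if callback_url:
        body["callbackUrl"] = callback_url
    return urllib.request.Request(
        f"{BASE_URL}/tts/submit",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


def submit(request):
    """Send the request and return the parsed JSON response."""
    with urllib.request.urlopen(request, timeout=30) as resp:
        return json.load(resp)
```

Calling `submit(build_submit_request("YOUR_API_KEY", "Welcome to AethonVoice.", "aris", ["en"]))` should return the same `jobId` and `status` fields as the curl call.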

3. Poll for status

curl

curl https://aethon.lab.ai/api/tts/status/Xk9mP2qR7vNw \
  -H "Authorization: Bearer YOUR_API_KEY"

Response (when complete):

json

{
  "jobId": "Xk9mP2qR7vNw",
  "status": "done",
  "downloadUrl": "https://storage.googleapis.com/...",
  "durationMs": 2100
}

4. Download

The downloadUrl is a signed URL. Download it directly — no authentication header needed.
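
A minimal download sketch in Python, assuming nothing beyond a plain GET on the signed URL. The `download_mp3` name and the `opener` parameter are illustrative; the parameter exists only so the function can be exercised without a network connection.

```python
import shutil
import urllib.request


def download_mp3(download_url, dest_path, opener=urllib.request.urlopen):
    """Stream the signed URL to disk. No Authorization header is sent."""
    with opener(download_url) as resp, open(dest_path, "wb") as out:
        shutil.copyfileobj(resp, out)
    return dest_path
```

Signed URLs expire (see Output Format), so it is safer to save the file to your own storage soon after the job completes.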

Polling Tips

Wait 2-3 seconds after submit before first poll
Poll every 3-5 seconds
First request may take 30-60 seconds (cold start — model loading into GPU memory). Subsequent requests on a warm worker are much faster (3-15 seconds).
Time out after 120 seconds
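
These tips can be folded into a small polling loop. The sketch below is one possible shape, not an official client. `fetch_status` is a callable you supply that performs GET /tts/status/:jobId and returns the parsed JSON; it, the `sleep` and `clock` parameters, and the function name are assumptions made so the loop is easy to test.

```python
import time


def poll_until_done(fetch_status, job_id, *, initial_wait=2.0, interval=3.0,
                    timeout=120.0, sleep=time.sleep, clock=time.monotonic):
    """Poll until the job reaches "done" or "error", or the timeout passes."""
    deadline = clock() + timeout
    sleep(initial_wait)  # give the job a moment before the first poll
    while True:
        result = fetch_status(job_id)
        if result.get("status") in ("done", "error"):
            return result
        # Stop if another wait would take us past the deadline.
        if clock() + interval > deadline:
            raise TimeoutError(f"job {job_id} not finished after {timeout:.0f}s")
        sleep(interval)
```

The defaults follow the tips above: a 2-second initial wait, 3 seconds between polls, and a 120-second ceiling that leaves room for a cold start.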

Endpoints

Endpoint	Method	Description
`/tts/submit`	POST	Submit single TTS job
`/tts/batch-submit`	POST	Submit batch of TTS items
`/tts/status/:jobId`	GET	Poll single job status
`/tts/batch-status/:batchId`	GET	Poll batch status with per-item progress

Job Status Values

Status	Meaning
`queued`	Job received, waiting for GPU worker
`processing`	Audio is being generated
`done`	Audio ready — `downloadUrl` included
`error`	Generation failed — `error` message included
`partial`	(Batch only) Some items succeeded, some failed

Credits & Credit Gate

1 credit = 1 second of generated audio, rounded up per job. Charges are always based on the actual duration returned — never on your input text length — and failed items are never debited.

Before we queue a job, we run a quick pre-flight estimate based on character count and language speaking rate. If the estimate exceeds your available balance by more than a 30-second tolerance, the request is rejected with 402 Payment Required before any work is done:

402 response

{
  "error": "Insufficient credits",
  "estimatedSec": 420,
  "balance": 85
}

The 30-second tolerance means borderline jobs are allowed through and may leave you with a small negative balance. Your next top-up covers it automatically.
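
The arithmetic can be sketched in a few lines of Python. This mirrors the rules as described here and is not the server's actual code; the function names are illustrative, and the pre-flight estimator itself is not reproduced because its speaking-rate table is not documented on this page.

```python
TOLERANCE_SEC = 30  # tolerance stated above


def credits_for(duration_ms: int) -> int:
    """1 credit per second of audio, rounded up per job."""
    return (duration_ms + 999) // 1000


def passes_credit_gate(estimated_sec: int, balance: int,
                       tolerance: int = TOLERANCE_SEC) -> bool:
    """Rejected only when the estimate exceeds balance by more than the tolerance."""
    return estimated_sec <= balance + tolerance
```

For the 402 example above, `passes_credit_gate(420, 85)` is False because 420 is more than 85 + 30. A 2100 ms result costs `credits_for(2100)`, which is 3 credits.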

Error Codes

Code	Meaning	When
`400`	Bad Request	Missing/invalid `text`, `voice`, or `langs`
`401`	Unauthorized	Missing API key (no `Authorization` or `X-API-Key` header), or key invalid/revoked
`402`	Payment Required	Estimated duration exceeds balance + 30s tolerance
`404`	Not Found	Unknown `jobId` or `batchId`
`429`	Rate Limited	Too many requests in the current window
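
One way a client might react to these codes is sketched below. The action names, the treatment of 5xx responses, and the backoff numbers are all assumptions for illustration; this page does not specify the rate-limit window or a retry policy.

```python
def next_step(status_code: int) -> str:
    """Map an HTTP status to a suggested client action (illustrative)."""
    if 200 <= status_code < 300:
        return "ok"
    if status_code == 429:
        return "retry"        # rate limited: wait, then try again
    if status_code == 402:
        return "top_up"       # not enough credits for the estimate
    if status_code == 401:
        return "check_key"    # key missing, invalid, or revoked
    if status_code in (400, 404):
        return "fix_request"  # bad fields or unknown jobId/batchId
    # 5xx is not listed in the table; retrying is an assumption.
    return "retry" if status_code >= 500 else "give_up"


def backoff_delays(attempts: int, base: float = 1.0, cap: float = 30.0) -> list[float]:
    """Exponential backoff delays in seconds, capped at `cap`."""
    return [min(cap, base * 2 ** i) for i in range(attempts)]
```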

Batch Processing

Submit multiple items in one request. Each item can have different text, voice, and languages — useful for dubbing, dataset generation, or multi-speaker scripts.

curl

curl -X POST https://aethon.lab.ai/api/tts/batch-submit \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -d '{
    "items": [
      { "key": "word-01", "text": "สวัสดีครับ", "voice": "aris", "langs": ["th"] },
      { "key": "word-02", "text": "こんにちは", "voice": "lyra", "langs": ["ja"] },
      { "key": "word-03", "text": "Auf Wiedersehen", "voice": "nolan", "langs": ["de"] }
    ],
    "callbackUrl": "https://your-server.com/webhook"
  }'

Response:

json

{
  "batchId": "Bk8nQ3rS6uMx",
  "status": "queued",
  "itemCount": 3
}

Poll GET /tts/batch-status/:batchId to see per-item progress. Start downloading completed items before the full batch finishes. If you supplied callbackUrl, we POST the final payload to your server — no polling needed.
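
If your items live in a list or spreadsheet, the request body can be assembled programmatically. A minimal sketch: `build_batch_body` is a hypothetical helper, and the duplicate-key check reflects our assumption that keys should be unique within a batch, which this page does not state.

```python
import json


def build_batch_body(items, callback_url=None) -> str:
    """Build the JSON body for POST /tts/batch-submit.

    `items` is an iterable of (key, text, voice, langs) tuples.
    """
    seen = set()
    out = []
    for key, text, voice, langs in items:
        if key in seen:  # assumption: keys identify items, so keep them unique
            raise ValueError(f"duplicate item key: {key}")
        seen.add(key)
        out.append({"key": key, "text": text, "voice": voice, "langs": list(langs)})
    body = {"items": out}
    if callback_url:
        body["callbackUrl"] = callback_url
    # ensure_ascii=False keeps Thai, Japanese, etc. readable in the payload.
    return json.dumps(body, ensure_ascii=False)
```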

Path 2 (Coming Soon)

MCP Server (For Everyone)

No code required. AethonVoice will work as a tool inside any MCP-compatible AI assistant.

What You Need

An AI assistant that supports MCP (Claude Code, Codex, Coworks, Cursor, Windsurf, and others)

An AethonVoice API key

A one-time MCP Server connection setup

How It Works

Once connected, you talk to your AI assistant in plain language:

"Generate Thai pronunciation for สวัสดีครับ using the Aris voice"

The assistant calls AethonVoice, generates the audio, and returns a download link.

"Read this paragraph aloud with the Lyra voice"

Paste any text. The assistant sends it to AethonVoice and gives you the MP3.

"Create audio for this blog post in French"

Long-form content works too. The assistant handles the submission and polling automatically.

"Pronounce this Japanese word: ありがとうございます"

Quick pronunciation lookups — useful for language learning, content creation, or accessibility.

Reference

Input Format

Text

Plain text in any of the 21 supported languages (24 locale variants). Mixed-language text is handled automatically.

Language Tags

For explicit control over language boundaries, wrap portions in XML-style tags:

language tags

<ja>聞いてください</ja> This is English <th>สวัสดีครับ</th>

Tags are optional. Tagged portions use the specified language. Untagged portions are split automatically by character range detection. You can mix tagged and untagged text freely.

When to use tags: When languages share the same script (e.g., Chinese and Japanese both use CJK characters). For most other language pairs, automatic detection works without tags.
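
When you already know the language of each segment, the tagged string can be built mechanically. A small sketch, with `tag_segments` as a hypothetical helper that joins segments with single spaces as in the example above:

```python
def tag_segments(segments) -> str:
    """Join (lang, text) pairs, wrapping a segment when lang is given.

    Pass None as lang to leave a segment untagged for auto-detection.
    """
    parts = []
    for lang, text in segments:
        parts.append(f"<{lang}>{text}</{lang}>" if lang else text)
    return " ".join(parts)
```

For example, `tag_segments([("ja", "聞いてください"), (None, "This is English"), ("th", "สวัสดีครับ")])` reproduces the tagged line shown above.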

Paralinguistic Tags

Insert natural sound effects inline:

Tag	Effect
`[pause]`	Silence (0.7-1.1s, randomized)
`[laugh]`	Laughter from voice's sound bank
`[sigh]`	Sigh from voice's sound bank
`[er]`	Hesitation / filler sound

example

I was thinking [pause] maybe we should [laugh] just go for it

Tags work with both the API and the MCP Server.
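
A typo such as `[laughs]` is easy to miss, so a quick client-side check before submitting can help. The sketch below only knows the four tags in the table; how the service treats other bracketed words is not described here, so treat the check as a convenience, and `unknown_tags` as a hypothetical name.

```python
import re

KNOWN_TAGS = {"pause", "laugh", "sigh", "er"}  # from the table above
_TAG = re.compile(r"\[([^\[\]\s]+)\]")


def unknown_tags(text: str) -> list[str]:
    """Return bracketed words that are not in the known tag list."""
    return [name for name in _TAG.findall(text) if name not in KNOWN_TAGS]
```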

Voices

Voice	ID	Gender	Character
Aris	`aris`	Male	Warm, steady, authoritative
Nolan	`nolan`	Male	Clear, friendly, upbeat
Lyra	`lyra`	Female	Gentle, expressive, nuanced
Senna	`senna`	Female	Calm, articulate, professional

Additional Request Fields

Field	Required	Description
`callbackRef`	No	Opaque reference string stored with the job. Useful for correlating results with your own records.

Language Codes

Specify languages with the langs array. The first element is the primary language. List all languages present in your text.

Code	Language	Code	Language
`en`	English	`id`	Indonesian
`th`	Thai	`ms`	Malay
`ja`	Japanese	`hi`	Hindi
`ko`	Korean	`ar`	Arabic
`zh`	Chinese (Mandarin)	`bn`	Bengali
`yue`	Cantonese	`fa`	Persian
`fr`	French	`ur`	Urdu
`de`	German	`vi`	Vietnamese
`es`	Spanish	`tr`	Turkish
`it`	Italian	`ru`	Russian
`pt`	Portuguese
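
To keep the primary language first and catch unsupported codes early, the langs array can be built with a small helper. This is a sketch: `make_langs` is a hypothetical name, and the set below covers only the 21 codes in the table, not the locale variants, whose codes are not listed on this page.

```python
SUPPORTED = {
    "en", "th", "ja", "ko", "zh", "yue", "fr", "de", "es", "it", "pt",
    "id", "ms", "hi", "ar", "bn", "fa", "ur", "vi", "tr", "ru",
}


def make_langs(primary: str, *others: str) -> list[str]:
    """Return a langs list with the primary language first, no duplicates."""
    langs = [primary]
    for code in others:
        if code not in langs:
            langs.append(code)
    bad = [code for code in langs if code not in SUPPORTED]
    if bad:
        raise ValueError(f"unsupported language code(s): {bad}")
    return langs
```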

Output Format

Parameter	Value
Format	MP3
Bitrate	96 kbps
Sample rate	24 kHz
Channels	Mono
Download URL expiry	7 days

Why 24 kHz / 96 kbps?

These numbers look lower than music streaming, but they suit speech well. Most of the energy that matters for speech intelligibility sits below about 8 kHz, well within the 12 kHz Nyquist limit of a 24 kHz sample rate, so the content that carries the voice is captured.

At 96 kbps MP3, the compressed audio is perceptually transparent for speech: indistinguishable from the uncompressed original in listening tests. The result is smaller files and faster downloads with no audible loss in quality.

For context, OpenAI TTS also outputs at 24 kHz. This reflects the TTS research consensus that 24 kHz is the sweet spot for speech synthesis. Higher sample rates add file size with zero audible benefit for voice.
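
The numbers are easy to check. The sketch below computes the Nyquist limit and a rough file size from the bitrate. It assumes a constant bitrate and ignores MP3 headers and metadata, so real files will differ slightly; both function names are illustrative.

```python
def nyquist_hz(sample_rate_hz: int) -> int:
    """Highest frequency a given sample rate can represent."""
    return sample_rate_hz // 2


def approx_mp3_bytes(duration_ms: int, bitrate_kbps: int = 96) -> int:
    """Rough size: kbps x ms gives bits, divided by 8 for bytes."""
    return duration_ms * bitrate_kbps // 8
```

At 24 kHz the Nyquist limit is 12,000 Hz, and a 2.1-second clip at 96 kbps comes to roughly 25,200 bytes, or about 12 KB per second of audio.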

Download URLs are signed, so no authentication header is needed to download. The cryptographic signature in the URL grants access, and URLs cannot be guessed or enumerated.

Full API Documentation

This page covers the essentials. For complete API reference including all request/response schemas, error codes, rate limits, and advanced usage:

Read the Full API Docs
