API Overview
The Lens Audio TTS API provides high-quality text-to-speech synthesis with support for multiple voices, emotion control, and voice cloning.
Base URL: https://audio-chat.ask-lens.ai
- Convert text to natural-sounding WAV audio
- Stream audio generation via Server-Sent Events
- Clone voices using reference audio files
- Control speech emotion and speed
- Manage API keys and track usage
Authentication
All TTS and voice-clone endpoints require an API key. Pass it using either method:
Bearer Token (Authorization header)
```
Authorization: Bearer ak_your_api_key_here
```
X-API-Key header
```
X-API-Key: ak_your_api_key_here
```
Admin endpoints (key management, usage) require the admin secret instead of an API key.
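The two API-key methods can be wrapped in one small helper. This is a minimal sketch, not part of the API: `auth_headers` and its `scheme` argument are hypothetical names, and it covers API-key endpoints only.

```python
def auth_headers(api_key: str, scheme: str = "bearer") -> dict:
    """Return the HTTP header that carries the API key.

    scheme="bearer"    -> Authorization: Bearer <key>
    scheme="x-api-key" -> X-API-Key: <key>
    (`scheme` is an illustrative parameter, not an API field.)
    """
    if not api_key:
        raise ValueError("api_key must be a non-empty string")
    if scheme == "bearer":
        return {"Authorization": f"Bearer {api_key}"}
    if scheme == "x-api-key":
        return {"X-API-Key": api_key}
    raise ValueError(f"unknown scheme: {scheme!r}")


print(auth_headers("ak_your_api_key_here"))
print(auth_headers("ak_your_api_key_here", scheme="x-api-key"))
```

Either dictionary can be passed as the `headers` argument of an HTTP client call.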
Rate Limits
The API uses queue-based rate limiting. Each TTS request is placed in a processing queue. When the queue is full, requests are rejected with a 429 status.
Use GET /audio/queue-status to check current queue utilization before submitting requests.
{
  "queue_size": 12,
  "max_queue_size": 100,
  "utilization": 0.12
}
Error Code Reference
| Status | Meaning |
|---|---|
| 400 | Bad Request - Missing or invalid parameters |
| 401 | Unauthorized - Missing or invalid API key |
| 403 | Forbidden - Insufficient permissions (admin endpoints) |
| 404 | Not Found - Resource does not exist |
| 429 | Too Many Requests - Queue is full, try again later |
| 500 | Internal Server Error - Unexpected failure |
| 502 | Bad Gateway - Upstream service error (e.g. S3 download failed) |
| 503 | Service Unavailable - TTS engine not ready |
| 504 | Gateway Timeout - Request timed out |
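Some of these statuses describe temporary conditions, so a client may want to retry them with a delay. The sketch below shows one way to do that with exponential backoff. Treating 429, 503, and 504 as retryable is an assumption made for this example, and `call_with_retry`, `backoff_delay`, and the delay values are hypothetical, not documented behavior.

```python
import random
import time
from typing import Callable, Tuple

# Assumption: these statuses signal a temporary condition worth retrying.
RETRYABLE_STATUSES = frozenset({429, 503, 504})


def backoff_delay(attempt: int, base: float = 1.0, cap: float = 30.0) -> float:
    """Seconds to wait after failed attempt number `attempt` (0-based)."""
    return min(cap, base * 2 ** attempt)


def call_with_retry(
    send: Callable[[], Tuple[int, bytes]],
    max_attempts: int = 4,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, bytes]:
    """Call `send` until it returns a non-retryable status or attempts run out.

    `send` performs one HTTP request and returns (status_code, body).
    """
    for attempt in range(max_attempts):
        status, body = send()
        last_attempt = attempt == max_attempts - 1
        if status not in RETRYABLE_STATUSES or last_attempt:
            return status, body
        # A little random jitter keeps many clients from retrying in lockstep.
        sleep(backoff_delay(attempt) + random.uniform(0, 0.25))
    raise ValueError("max_attempts must be at least 1")
```

Here `send` would wrap a single request, for example a function that posts to /audio/tts and returns the status code and response body.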
TTS
POST /audio/tts - Text to Speech
Convert text to speech audio. Returns a binary WAV file with metadata in response headers.
Request Body
| Name | Type | Required | Default | Description |
|---|---|---|---|---|
| text | string | Yes | — | The text to synthesize into speech. |
| voice_id | string | No | zh-Somer | The voice to use for synthesis. |
| emotion | string | No | calm | Emotion style for the speech. Options: happy, sad, angry, calm, surprised, fearful, disgusted, melancholic. |
| emo_vector | float[8] | No | — | Custom 8-dimensional emotion vector for fine-grained control. Overrides the emotion parameter when provided. |
| speed | float | No | 1.0 | Playback speed multiplier. Range: 0.5 - 2.0. |
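The constraints in this table can be checked on the client before a request is queued. The following is a minimal sketch under that idea; `build_tts_payload` is a hypothetical helper, and dropping `emotion` when `emo_vector` is given is a choice made for this example (the server applies the same precedence anyway).

```python
from typing import Optional, Sequence

EMOTIONS = {
    "happy", "sad", "angry", "calm",
    "surprised", "fearful", "disgusted", "melancholic",
}


def build_tts_payload(
    text: str,
    voice_id: Optional[str] = None,
    emotion: Optional[str] = None,
    emo_vector: Optional[Sequence[float]] = None,
    speed: Optional[float] = None,
) -> dict:
    """Build a JSON-serializable request body, validating documented limits."""
    if not text:
        raise ValueError("text is required")
    payload = {"text": text}
    if voice_id is not None:
        payload["voice_id"] = voice_id
    if emo_vector is not None:
        if len(emo_vector) != 8:
            raise ValueError("emo_vector must contain exactly 8 values")
        # emo_vector overrides emotion, so emotion is left out entirely.
        payload["emo_vector"] = [float(x) for x in emo_vector]
    elif emotion is not None:
        if emotion not in EMOTIONS:
            raise ValueError(f"unsupported emotion: {emotion!r}")
        payload["emotion"] = emotion
    if speed is not None:
        if not 0.5 <= speed <= 2.0:
            raise ValueError("speed must be between 0.5 and 2.0")
        payload["speed"] = float(speed)
    return payload


print(build_tts_payload("Hello, world!", emotion="happy", speed=1.25))
```

Fields left as `None` are omitted, so the server-side defaults from the table apply.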
Response
Binary WAV audio data
Response Headers
| Header | Description |
|---|---|
| X-Task-Id | Unique identifier for the TTS task. |
| X-Audio-Duration | Duration of the generated audio in seconds. |
| X-Queue-Size | Current queue size at the time of processing. |
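These headers arrive as strings, so a client will usually convert them. A minimal sketch follows, assuming the duration and queue size are sent as plain decimal numbers such as "2.5" and "3"; `TtsMetadata` and `parse_tts_headers` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass
class TtsMetadata:
    task_id: Optional[str]
    audio_duration: Optional[float]  # seconds
    queue_size: Optional[int]


def parse_tts_headers(headers: Mapping[str, str]) -> TtsMetadata:
    """Extract TTS metadata from response headers; missing headers become None."""
    # HTTP header names are case-insensitive, so normalize before lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    duration = lowered.get("x-audio-duration")
    queue = lowered.get("x-queue-size")
    return TtsMetadata(
        task_id=lowered.get("x-task-id"),
        audio_duration=float(duration) if duration is not None else None,
        queue_size=int(queue) if queue is not None else None,
    )


print(parse_tts_headers({"X-Task-Id": "abc-123", "X-Audio-Duration": "2.5", "X-Queue-Size": "3"}))
```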
Error Codes
| Status | Body | Description |
|---|---|---|
| 400 | {"error": "Missing text"} | The required text field was not provided. |
| 400 | {"error": "Invalid JSON"} | The request body is not valid JSON. |
| 400 | {"error": "Token limit exceeded (max 3000)"} | The input text exceeds the 3000 BPE token limit. |
| 429 | {"error": "Queue full, try again later"} | The processing queue is at capacity. |
| 500 | {"error": "Empty audio returned"} | The TTS engine returned no audio data. |
| 503 | {"error": "TTS engine not ready"} | The TTS engine is still initializing. |
| 504 | {"error": "TTS request timed out"} | The request exceeded the maximum processing time. |
Code Examples
curl -X POST https://audio-chat.ask-lens.ai/audio/tts \
-H "Authorization: Bearer ak_your_api_key" \
-H "Content-Type: application/json" \
-d '{"text": "Hello, world!", "voice_id": "zh-Somer", "emotion": "happy"}' \
--output output.wav
/audio/tts/stream - Text to Speech (Streaming)
Stream text-to-speech synthesis via Server-Sent Events. Provides real-time status updates and base64-encoded audio chunks.
Request Body
| Name | Type | Required | Default | Description |
|---|---|---|---|---|
| text | string | Yes | — | The text to synthesize into speech. |
| voice_id | string | No | zh-Somer | The voice to use for synthesis. |
| emotion | string | No | calm | Emotion style for the speech. Options: happy, sad, angry, calm, surprised, fearful, disgusted, melancholic. |
| emo_vector | float[8] | No | — | Custom 8-dimensional emotion vector for fine-grained control. Overrides the emotion parameter when provided. |
| speed | float | No | 1.0 | Playback speed multiplier. Range: 0.5 - 2.0. |
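Consuming the stream means splitting it into events and base64-decoding each audio chunk. The sketch below shows that client-side parsing, assuming standard Server-Sent Events framing (a blank line ends each event) and the queued, processing, audio, and done events this endpoint emits. `iter_sse_events` and `collect_audio` are hypothetical helper names.

```python
import base64
import json
from typing import Iterable, Iterator, List, Tuple


def iter_sse_events(lines: Iterable[str]) -> Iterator[Tuple[str, dict]]:
    """Yield (event_name, parsed_json_data) for each event in an SSE stream."""
    event, data = None, []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line terminates the current event.
            if data:
                yield (event or "message"), json.loads("\n".join(data))
            event, data = None, []
        elif line.startswith(":"):
            continue  # SSE comment line, often used as a keep-alive
        elif line.startswith("event:"):
            event = line[len("event:"):].strip()
        elif line.startswith("data:"):
            data.append(line[len("data:"):].strip())
    if data:
        # Flush a final event that was not followed by a blank line.
        yield (event or "message"), json.loads("\n".join(data))


def collect_audio(lines: Iterable[str]) -> List[bytes]:
    """Return the decoded audio chunks, stopping at the done event."""
    chunks = []
    for name, payload in iter_sse_events(lines):
        if name == "audio":
            chunks.append(base64.b64decode(payload["audio"]))
        elif name == "done":
            break
    return chunks
```

With the requests library, the lines could come from `response.iter_lines(decode_unicode=True)` on a request made with `stream=True`. The chunks are returned as a list rather than joined, since whether each chunk is a standalone WAV file is not specified here.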
Response
event: queued
data: {"task_id": "abc-123", "queue_position": 3}

event: processing
data: {"task_id": "abc-123"}

event: audio
data: {"task_id": "abc-123", "audio": "<base64-encoded WAV>", "duration": 2.5}

event: done
data: {"task_id": "abc-123"}
GET /audio/voices - List Voices
Retrieve the list of all available voices for TTS synthesis.
Response
{
  "voices": [
    {
      "voice_id": "zh-Somer",
      "file": "zh-Somer.wav"
    },
    {
      "voice_id": "zh-Luna",
      "file": "zh-Luna.wav"
    }
  ],
  "default_voice_id": "zh-Somer",
  "count": 2
}
GET /audio/queue-status - Queue Status
Check the current TTS processing queue size and utilization.
Response
{
  "queue_size": 12,
  "max_queue_size": 100,
  "utilization": 0.12
}
Voice Clone
POST /voice-clone/register - Register Voice
Register a new cloned voice by providing a reference audio file hosted on S3.
Request Body
| Name | Type | Required | Default | Description |
|---|---|---|---|---|
| voice_id | string | Yes | — | Unique identifier for the new voice. |
| s3_url | string | Yes | — | S3 URL of the reference audio file (WAV format recommended). |
Response
{
  "voice_id": "my-custom-voice",
  "cached": true,
  "file": "my-custom-voice.wav"
}
Error Codes
| Status | Body | Description |
|---|---|---|
| 400 | {"error": "Invalid voice_id"} | The voice_id contains invalid characters or is empty. |
| 400 | {"error": "Missing s3_url"} | The required s3_url field was not provided. |
| 400 | {"error": "Unsupported audio format"} | The reference audio file is not in a supported format. |
| 502 | {"error": "S3 download failed"} | Failed to download the reference audio from S3. |
DELETE /voice-clone/{voice_id} - Remove Voice
Delete a previously registered cloned voice.
Response
{
  "voice_id": "my-custom-voice",
  "deleted": true
}
Error Codes
| Status | Body | Description |
|---|---|---|
| 400 | {"error": "Invalid voice_id"} | The voice_id contains invalid characters or is empty. |
| 404 | {"error": "Voice not found"} | No voice with the given voice_id exists. |
API Key Management
/auth/keys - Create API Key
Generate a new API key for a given user.
Request Body
| Name | Type | Required | Default | Description |
|---|---|---|---|---|
| user_id | string | Yes | — | The user identifier to associate with the new key. |
Response
{
  "user_id": "user-123",
  "api_key": "ak_7a52e8882d90ba41ea9222dab0b972c8650cd5ccf6b19064"
}
Error Codes
| Status | Body | Description |
|---|---|---|
| 400 | {"error": "Missing user_id"} | The required user_id field was not provided. |
| 403 | {"error": "Unauthorized"} | The request does not have admin privileges. |
/auth/keys - List API Keys
List all active API keys, optionally filtered by user.
Query Parameters
| Name | Type | Required | Default | Description |
|---|---|---|---|---|
| user_id | string | No | — | Filter keys by user identifier. |
Response
{
  "keys": [
    {
      "user_id": "user-123",
      "api_key": "ak_7a52e...9064",
      "created_at": "2026-03-10T12:00:00"
    }
  ],
  "count": 1
}
/auth/keys/by-key - Revoke API Key
Revoke a specific API key.
Request Body
| Name | Type | Required | Default | Description |
|---|---|---|---|---|
| api_key | string | Yes | — | The API key to revoke. |
Response
{
  "api_key": "ak_7a52e...",
  "revoked": true
}
Error Codes
| Status | Body | Description |
|---|---|---|
| 400 | {"error": "Missing api_key"} | The required api_key field was not provided. |
| 403 | {"error": "Unauthorized"} | The request does not have admin privileges. |
| 404 | {"error": "No active key found"} | The specified API key does not exist or is already revoked. |
/auth/keys/by-user - Revoke All User Keys
Revoke all active API keys for a given user.
Request Body
| Name | Type | Required | Default | Description |
|---|---|---|---|---|
| user_id | string | Yes | — | The user identifier whose keys should be revoked. |
Response
{
  "user_id": "user-123",
  "revoked_count": 3
}
Error Codes
| Status | Body | Description |
|---|---|---|
| 400 | {"error": "Missing user_id"} | The required user_id field was not provided. |
| 403 | {"error": "Unauthorized"} | The request does not have admin privileges. |
| 404 | {"error": "No active keys found"} | The user has no active keys to revoke. |
Usage
/auth/usage - Usage Records
Retrieve usage records with optional filtering by user, API key, and time period.
Query Parameters
| Name | Type | Required | Default | Description |
|---|---|---|---|---|
| user_id | string | No | — | Filter records by user identifier. |
| api_key | string | No | — | Filter records by API key. |
| period | string | No | — | Time period filter. Options: D (day), W (week), M (month). |
| limit | number | No | 100 | Maximum number of records to return. |
| offset | number | No | 0 | Number of records to skip for pagination. |
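The filters and pagination parameters combine into a single query string. The following is a minimal sketch of building that URL; `usage_url` is a hypothetical helper, and it only constructs the URL because the way the admin secret is sent is not described here.

```python
from typing import Optional
from urllib.parse import urlencode

BASE_URL = "https://audio-chat.ask-lens.ai"
PERIODS = {"D", "W", "M"}  # day, week, month


def usage_url(
    user_id: Optional[str] = None,
    api_key: Optional[str] = None,
    period: Optional[str] = None,
    limit: int = 100,
    offset: int = 0,
) -> str:
    """Build the /auth/usage URL, omitting filters that are not set."""
    if period is not None and period not in PERIODS:
        raise ValueError("period must be one of D, W, M")
    if limit < 1 or offset < 0:
        raise ValueError("limit must be >= 1 and offset must be >= 0")
    params = {
        "user_id": user_id,
        "api_key": api_key,
        "period": period,
        "limit": limit,
        "offset": offset,
    }
    # urlencode percent-escapes values, so user IDs with special characters are safe.
    query = urlencode({k: v for k, v in params.items() if v is not None})
    return f"{BASE_URL}/auth/usage?{query}"


# Third page of 50 records for one user over the past week.
print(usage_url(user_id="user-123", period="W", limit=50, offset=100))
```

To page through results, keep `limit` fixed and increase `offset` by `limit` on each request.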
Response
{
  "records": [
    {
      "user_id": "user-123",
      "api_key": "ak_7a52e...9064",
      "endpoint": "/audio/tts",
      "tokens": 42,
      "audio_duration": 3.2,
      "timestamp": "2026-03-20T14:30:00Z"
    }
  ],
  "summary": {
    "total_requests": 156,
    "total_tokens": 8420,
    "total_audio_duration": 1234.5
  },
  "count": 1,
  "limit": 100,
  "offset": 0
}
Code Examples
Full working examples in popular languages.
Python Client
import requests

API_KEY = "ak_your_api_key"
BASE_URL = "https://audio-chat.ask-lens.ai"
headers = {"Authorization": f"Bearer {API_KEY}"}

# --- Text to Speech ---
response = requests.post(
    f"{BASE_URL}/audio/tts",
    headers=headers,
    json={
        "text": "Hello, welcome to Lens Audio!",
        "voice_id": "zh-Somer",
        "emotion": "happy",
        "speed": 1.0,
    },
)
if response.status_code == 200:
    # The body is binary WAV data; write it to disk unchanged.
    with open("output.wav", "wb") as f:
        f.write(response.content)
    print("Audio duration:", response.headers.get("X-Audio-Duration"), "s")
else:
    # Error responses carry a JSON body such as {"error": "..."}.
    print("Error:", response.json())

# --- List Voices ---
voices = requests.get(f"{BASE_URL}/audio/voices", headers=headers).json()
print("Available voices:", [v["voice_id"] for v in voices["voices"]])

# --- Queue Status ---
status = requests.get(f"{BASE_URL}/audio/queue-status", headers=headers).json()
print(f"Queue: {status['queue_size']}/{status['max_queue_size']}")
cURL
# Text to Speech
curl -X POST https://audio-chat.ask-lens.ai/audio/tts \
-H "Authorization: Bearer ak_your_api_key" \
-H "Content-Type: application/json" \
-d '{"text": "Hello, welcome to Lens Audio!", "voice_id": "zh-Somer", "emotion": "happy"}' \
--output output.wav
# List Voices
curl https://audio-chat.ask-lens.ai/audio/voices \
-H "Authorization: Bearer ak_your_api_key"
# Queue Status
curl https://audio-chat.ask-lens.ai/audio/queue-status \
-H "Authorization: Bearer ak_your_api_key"
# Register Voice Clone
curl -X POST https://audio-chat.ask-lens.ai/voice-clone/register \
-H "Authorization: Bearer ak_your_api_key" \
-H "Content-Type: application/json" \
-d '{"voice_id": "my-voice", "s3_url": "https://s3.amazonaws.com/bucket/ref.wav"}'
# Delete Voice Clone
curl -X DELETE https://audio-chat.ask-lens.ai/voice-clone/my-voice \
-H "Authorization: Bearer ak_your_api_key"
Node.js
const fs = require("fs");

const API_KEY = "ak_your_api_key";
const BASE_URL = "https://audio-chat.ask-lens.ai";
const headers = {
  "Authorization": `Bearer ${API_KEY}`,
  "Content-Type": "application/json",
};

// --- Text to Speech ---
async function textToSpeech(text, voiceId = "zh-Somer", emotion = "calm") {
  const response = await fetch(`${BASE_URL}/audio/tts`, {
    method: "POST",
    headers,
    body: JSON.stringify({ text, voice_id: voiceId, emotion }),
  });
  if (response.ok) {
    // The body is binary WAV data; write it to disk unchanged.
    const buffer = await response.arrayBuffer();
    fs.writeFileSync("output.wav", Buffer.from(buffer));
    console.log("Duration:", response.headers.get("X-Audio-Duration"), "s");
  } else {
    const error = await response.json();
    console.error("Error:", error);
  }
}

// --- List Voices ---
async function listVoices() {
  const response = await fetch(`${BASE_URL}/audio/voices`, {
    headers: { "Authorization": `Bearer ${API_KEY}` },
  });
  const data = await response.json();
  console.log("Voices:", data.voices.map((v) => v.voice_id));
}

// --- Queue Status ---
async function queueStatus() {
  const response = await fetch(`${BASE_URL}/audio/queue-status`, {
    headers: { "Authorization": `Bearer ${API_KEY}` },
  });
  const data = await response.json();
  console.log(`Queue: ${data.queue_size}/${data.max_queue_size}`);
}

textToSpeech("Hello, welcome to Lens Audio!", "zh-Somer", "happy");