
API Overview

The Lens Audio TTS API provides high-quality text-to-speech synthesis with support for multiple voices, emotion control, and voice cloning.

Base URL: https://audio-chat.ask-lens.ai

  • Convert text to natural-sounding WAV audio
  • Stream audio generation via Server-Sent Events
  • Clone voices using reference audio files
  • Control speech emotion and speed
  • Manage API keys and track usage

Authentication

All TTS and voice-clone endpoints require an API key. Pass it using either method:

Bearer token (Authorization header): `Authorization: Bearer ak_your_api_key_here`

X-API-Key header: `X-API-Key: ak_your_api_key_here`

Admin endpoints (key management, usage) require the admin secret instead of an API key.
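
As a minimal sketch of the two header styles described above, the helper below builds either form. The function name `auth_headers` and the `use_bearer` flag are illustrative and not part of the API. How the admin secret is transmitted is not covered here, so the sketch leaves it out.

```python
from typing import Dict


def auth_headers(api_key: str, use_bearer: bool = True) -> Dict[str, str]:
    """Build the authentication header for a TTS or voice-clone request.

    `auth_headers` and `use_bearer` are hypothetical names for illustration.
    """
    if not api_key:
        raise ValueError("api_key must be a non-empty string")
    if use_bearer:
        # Style 1: standard Authorization header with the Bearer scheme.
        return {"Authorization": f"Bearer {api_key}"}
    # Style 2: dedicated X-API-Key header.
    return {"X-API-Key": api_key}
```

Either dictionary can be passed as the `headers` argument of an HTTP client call.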

Rate Limits

The API uses queue-based rate limiting. Each TTS request is placed in a processing queue. When the queue is full, requests are rejected with a 429 status.

Use GET /audio/queue-status to check current queue utilization before submitting requests.

json
{
  "queue_size": 12,
  "max_queue_size": 100,
  "utilization": 0.12
}
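
One way a client might act on this is to check the queue before submitting and back off when a 429 comes back. The sketch below is an assumption about reasonable client behaviour, not something the API requires. The 0.9 utilization threshold, the exponential delays, and the `send` callable (any function that performs the request and returns the HTTP status code) are all illustrative.

```python
import time
from typing import Callable


def queue_has_room(status: dict, max_utilization: float = 0.9) -> bool:
    """Decide from a /audio/queue-status payload whether to submit now.

    The 0.9 threshold is an arbitrary illustrative default.
    """
    return (
        status["queue_size"] < status["max_queue_size"]
        and status["utilization"] <= max_utilization
    )


def submit_with_backoff(
    send: Callable[[], int],
    retries: int = 4,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Call send() until it returns something other than 429 or retries run out.

    Delays double on each retry: base_delay, 2 * base_delay, 4 * base_delay, ...
    """
    status = send()
    for attempt in range(retries):
        if status != 429:
            break
        sleep(base_delay * 2 ** attempt)
        status = send()
    return status
```

The `sleep` parameter is injectable so the retry loop can be exercised in tests without real waiting.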

Error Code Reference

| Status | Meaning |
|--------|---------|
| 400 | Bad Request - Missing or invalid parameters |
| 401 | Unauthorized - Missing or invalid API key |
| 403 | Forbidden - Insufficient permissions (admin endpoints) |
| 404 | Not Found - Resource does not exist |
| 429 | Too Many Requests - Queue is full, try again later |
| 500 | Internal Server Error - Unexpected failure |
| 502 | Bad Gateway - Upstream service error (e.g. S3 download failed) |
| 503 | Service Unavailable - TTS engine not ready |
| 504 | Gateway Timeout - Request timed out |
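
A client could use this table to separate transient failures from permanent ones. The sketch below treats 429, 503, and 504 as retryable; that grouping is a judgment call based on the descriptions above, not a documented guarantee. It also assumes error bodies have the `{"error": "..."}` shape shown in the per-endpoint error tables. The names `is_retryable` and `error_message` are illustrative.

```python
import json

# Assumption: these statuses describe temporary conditions worth retrying.
RETRYABLE_STATUSES = {429, 503, 504}


def is_retryable(status: int) -> bool:
    """Return True for statuses that this sketch treats as transient."""
    return status in RETRYABLE_STATUSES


def error_message(status: int, body: bytes) -> str:
    """Extract the "error" field from a JSON error body, if there is one."""
    try:
        payload = json.loads(body)
    except ValueError:  # covers invalid JSON and undecodable bytes
        payload = None
    if isinstance(payload, dict) and "error" in payload:
        return f"{status}: {payload['error']}"
    return f"{status}: (no error message in body)"
```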

TTS

POST /audio/tts

Text to Speech

Convert text to speech audio. Returns a binary WAV file with metadata in response headers.

Request Body

| Name | Type | Required | Default | Description |
|------|------|----------|---------|-------------|
| `text` | string | Yes | | The text to synthesize into speech. |
| `voice_id` | string | No | `zh-Somer` | The voice to use for synthesis. |
| `emotion` | string | No | `calm` | Emotion style for the speech. Options: happy, sad, angry, calm, surprised, fearful, disgusted, melancholic. |
| `emo_vector` | float[8] | No | | Custom 8-dimensional emotion vector for fine-grained control. Overrides the `emotion` parameter when provided. |
| `speed` | float | No | `1.0` | Playback speed multiplier. Range: 0.5 to 2.0. |
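
The constraints in this table can be checked on the client before a request is sent. The following is a sketch only: the function name `build_tts_payload` is illustrative, and dropping `emotion` from the payload when `emo_vector` is supplied is a choice made here for clarity (the server is documented to let `emo_vector` override it either way).

```python
from typing import Optional, Sequence

EMOTIONS = {
    "happy", "sad", "angry", "calm",
    "surprised", "fearful", "disgusted", "melancholic",
}


def build_tts_payload(
    text: str,
    voice_id: str = "zh-Somer",
    emotion: str = "calm",
    emo_vector: Optional[Sequence[float]] = None,
    speed: float = 1.0,
) -> dict:
    """Validate the documented constraints and return a JSON-ready dict."""
    if not text:
        raise ValueError("text is required")
    if not 0.5 <= speed <= 2.0:
        raise ValueError("speed must be between 0.5 and 2.0")
    payload = {"text": text, "voice_id": voice_id, "speed": speed}
    if emo_vector is not None:
        if len(emo_vector) != 8:
            raise ValueError("emo_vector must contain exactly 8 values")
        # emo_vector overrides emotion, so emotion is not sent here.
        payload["emo_vector"] = [float(x) for x in emo_vector]
    else:
        if emotion not in EMOTIONS:
            raise ValueError(f"unknown emotion: {emotion!r}")
        payload["emotion"] = emotion
    return payload
```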

Response

Binary WAV audio data (the response body is not JSON).

Response Headers

| Header | Description |
|--------|-------------|
| `X-Task-Id` | Unique identifier for the TTS task. |
| `X-Audio-Duration` | Duration of the generated audio in seconds. |
| `X-Queue-Size` | Current queue size at the time of processing. |
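
These headers arrive as strings, so a client usually converts them before use. The sketch below assumes `X-Audio-Duration` is a decimal number and `X-Queue-Size` is an integer; the exact wire format is not stated here, so missing or malformed values become `None`. The helper names are illustrative.

```python
from typing import Callable, Mapping, Optional


def _to_number(value: Optional[str], cast: Callable):
    """Convert a header value with `cast`; return None if absent or malformed."""
    if value is None:
        return None
    try:
        return cast(value)
    except ValueError:
        return None


def parse_tts_headers(headers: Mapping[str, str]) -> dict:
    """Pull the TTS metadata headers into typed Python values."""
    return {
        "task_id": headers.get("X-Task-Id"),
        "audio_duration": _to_number(headers.get("X-Audio-Duration"), float),
        "queue_size": _to_number(headers.get("X-Queue-Size"), int),
    }
```

A plain `dict` is case-sensitive, whereas the header mappings exposed by HTTP clients such as `requests` are case-insensitive, so passing `response.headers` directly avoids casing problems.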

Error Codes

| Status | Body | Description |
|--------|------|-------------|
| 400 | `{"error": "Missing text"}` | The required `text` field was not provided. |
| 400 | `{"error": "Invalid JSON"}` | The request body is not valid JSON. |
| 400 | `{"error": "Token limit exceeded (max 3000)"}` | The input text exceeds the 3000 BPE token limit. |
| 429 | `{"error": "Queue full, try again later"}` | The processing queue is at capacity. |
| 500 | `{"error": "Empty audio returned"}` | The TTS engine returned no audio data. |
| 503 | `{"error": "TTS engine not ready"}` | The TTS engine is still initializing. |
| 504 | `{"error": "TTS request timed out"}` | The request exceeded the maximum processing time. |
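
To stay under the 3000-token limit, long input can be split into several requests. The sketch below groups whole sentences into chunks. Because the tokenizer behind the limit is not named here, `count_tokens` is a caller-supplied, hypothetical function that should approximate the server's BPE count conservatively. A single sentence that exceeds the limit on its own is left as one oversized chunk.

```python
import re
from typing import Callable, List


def split_for_tts(
    text: str,
    count_tokens: Callable[[str], int],
    max_tokens: int = 3000,
) -> List[str]:
    """Group sentences into chunks whose token count stays within max_tokens."""
    # Each piece ends after a run of sentence punctuation (plus trailing
    # whitespace) or at the end of the text, so joining pieces restores it.
    sentences = re.findall(r".+?(?:[.!?。！？]+\s*|$)", text, flags=re.S)
    chunks: List[str] = []
    current = ""
    for sentence in sentences:
        if current and count_tokens(current + sentence) > max_tokens:
            chunks.append(current)
            current = sentence
        else:
            current += sentence
    if current:
        chunks.append(current)
    return chunks
```

Each chunk can then be sent as its own `/audio/tts` request and the resulting audio played back in order.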

Code Examples

bash
curl -X POST https://audio-chat.ask-lens.ai/audio/tts \
  -H "Authorization: Bearer ak_your_api_key" \
  -H "Content-Type: application/json" \
  -d '{"text": "Hello, world!", "voice_id": "zh-Somer", "emotion": "happy"}' \
  --output output.wav

POST /audio/tts/stream

Text to Speech (Streaming)

Stream text-to-speech synthesis via Server-Sent Events. Provides real-time status updates and base64-encoded audio chunks.

Request Body

| Name | Type | Required | Default | Description |
|------|------|----------|---------|-------------|
| `text` | string | Yes | | The text to synthesize into speech. |
| `voice_id` | string | No | `zh-Somer` | The voice to use for synthesis. |
| `emotion` | string | No | `calm` | Emotion style for the speech. Options: happy, sad, angry, calm, surprised, fearful, disgusted, melancholic. |
| `emo_vector` | float[8] | No | | Custom 8-dimensional emotion vector for fine-grained control. Overrides the `emotion` parameter when provided. |
| `speed` | float | No | `1.0` | Playback speed multiplier. Range: 0.5 to 2.0. |

Response

text (Server-Sent Events stream; each `data:` line carries JSON)
event: queued
data: {"task_id": "abc-123", "queue_position": 3}

event: processing
data: {"task_id": "abc-123"}

event: audio
data: {"task_id": "abc-123", "audio": "<base64-encoded WAV>", "duration": 2.5}

event: done
data: {"task_id": "abc-123"}
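
The event stream above can be consumed line by line. The sketch below is a minimal parser for the `event:` / `data:` pairs shown in the example; it does not handle every feature of the SSE format (comments, `id:`, `retry:`). It returns the decoded audio chunks as a list, since it is not stated here whether chunks can simply be concatenated into one WAV file. The function names are illustrative.

```python
import base64
import json
from typing import Iterable, Iterator, List, Tuple


def iter_sse_events(lines: Iterable[str]) -> Iterator[Tuple[str, dict]]:
    """Yield (event_name, data) pairs from the text lines of an SSE stream."""
    event, data_lines = "message", []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line.startswith("event:"):
            event = line[len("event:"):].strip()
        elif line.startswith("data:"):
            data_lines.append(line[len("data:"):].strip())
        elif line == "" and data_lines:
            # A blank line terminates the current event.
            yield event, json.loads("\n".join(data_lines))
            event, data_lines = "message", []
    if data_lines:  # stream ended without a trailing blank line
        yield event, json.loads("\n".join(data_lines))


def collect_audio_chunks(lines: Iterable[str]) -> List[bytes]:
    """Decode the base64 payload of every `audio` event until `done`."""
    chunks: List[bytes] = []
    for event, data in iter_sse_events(lines):
        if event == "audio":
            chunks.append(base64.b64decode(data["audio"]))
        elif event == "done":
            break
    return chunks
```

With the `requests` library, the lines could come from `response.iter_lines(decode_unicode=True)` on a response opened with `stream=True`, after setting `response.encoding = "utf-8"`.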

GET /audio/voices

List Voices

Retrieve the list of all available voices for TTS synthesis.

Response

json
{
  "voices": [
    {
      "voice_id": "zh-Somer",
      "file": "zh-Somer.wav"
    },
    {
      "voice_id": "zh-Luna",
      "file": "zh-Luna.wav"
    }
  ],
  "default_voice_id": "zh-Somer",
  "count": 2
}
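
A client can use this listing to confirm a voice exists before requesting synthesis. The sketch below falls back to `default_voice_id` when the requested voice is missing; that fallback is a client-side choice made for illustration, and `resolve_voice` is a hypothetical name.

```python
from typing import Optional


def resolve_voice(listing: dict, wanted: Optional[str] = None) -> str:
    """Return `wanted` if the listing contains it, otherwise the default voice."""
    available = {voice["voice_id"] for voice in listing["voices"]}
    if wanted in available:
        return wanted
    return listing["default_voice_id"]
```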

GET /audio/queue-status

Queue Status

Check the current TTS processing queue size and utilization.

Response

json
{
  "queue_size": 12,
  "max_queue_size": 100,
  "utilization": 0.12
}

Voice Clone

POST /voice-clone/register

Register Voice

Register a new cloned voice by providing a reference audio file hosted on S3.

Request Body

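| Name | Type | Required | Default | Description |
|------|------|----------|---------|-------------|
| `voice_id` | string | Yes | | Unique identifier for the new voice. |
| `s3_url` | string | Yes | | S3 URL of the reference audio file (WAV format recommended). |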

Response

json
{
  "voice_id": "my-custom-voice",
  "cached": true,
  "file": "my-custom-voice.wav"
}

Error Codes

| Status | Body | Description |
|--------|------|-------------|
| 400 | `{"error": "Invalid voice_id"}` | The `voice_id` contains invalid characters or is empty. |
| 400 | `{"error": "Missing s3_url"}` | The required `s3_url` field was not provided. |
| 400 | `{"error": "Unsupported audio format"}` | The reference audio file is not in a supported format. |
| 502 | `{"error": "S3 download failed"}` | Failed to download the reference audio from S3. |

DELETE /voice-clone/{voice_id}

Remove Voice

Delete a previously registered cloned voice.

Response

json
{
  "voice_id": "my-custom-voice",
  "deleted": true
}

Error Codes

| Status | Body | Description |
|--------|------|-------------|
| 400 | `{"error": "Invalid voice_id"}` | The `voice_id` contains invalid characters or is empty. |
| 404 | `{"error": "Voice not found"}` | No voice with the given `voice_id` exists. |

API Key Management

POST /auth/keys

Create API Key

Generate a new API key for a given user.

Request Body

| Name | Type | Required | Default | Description |
|------|------|----------|---------|-------------|
| `user_id` | string | Yes | | The user identifier to associate with the new key. |

Response

json
{
  "user_id": "user-123",
  "api_key": "ak_7a52e8882d90ba41ea9222dab0b972c8650cd5ccf6b19064"
}

Error Codes

| Status | Body | Description |
|--------|------|-------------|
| 400 | `{"error": "Missing user_id"}` | The required `user_id` field was not provided. |
| 403 | `{"error": "Unauthorized"}` | The request does not have admin privileges. |

GET /auth/keys

List API Keys

List all active API keys, optionally filtered by user.

Query Parameters

| Name | Type | Required | Default | Description |
|------|------|----------|---------|-------------|
| `user_id` | string | No | | Filter keys by user identifier. |

Response

json
{
  "keys": [
    {
      "user_id": "user-123",
      "api_key": "ak_7a52e...9064",
      "created_at": "2026-03-10T12:00:00"
    }
  ],
  "count": 1
}

DELETE /auth/keys/by-key

Revoke API Key

Revoke a specific API key.

Request Body

| Name | Type | Required | Default | Description |
|------|------|----------|---------|-------------|
| `api_key` | string | Yes | | The API key to revoke. |

Response

json
{
  "api_key": "ak_7a52e...",
  "revoked": true
}

Error Codes

| Status | Body | Description |
|--------|------|-------------|
| 400 | `{"error": "Missing api_key"}` | The required `api_key` field was not provided. |
| 403 | `{"error": "Unauthorized"}` | The request does not have admin privileges. |
| 404 | `{"error": "No active key found"}` | The specified API key does not exist or is already revoked. |

DELETE /auth/keys/by-user

Revoke All User Keys

Revoke all active API keys for a given user.

Request Body

| Name | Type | Required | Default | Description |
|------|------|----------|---------|-------------|
| `user_id` | string | Yes | | The user identifier whose keys should be revoked. |

Response

json
{
  "user_id": "user-123",
  "revoked_count": 3
}

Error Codes

| Status | Body | Description |
|--------|------|-------------|
| 400 | `{"error": "Missing user_id"}` | The required `user_id` field was not provided. |
| 403 | `{"error": "Unauthorized"}` | The request does not have admin privileges. |
| 404 | `{"error": "No active keys found"}` | The user has no active keys to revoke. |

Usage

GET /auth/usage

Usage Records

Retrieve usage records with optional filtering by user, API key, and time period.

Query Parameters

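| Name | Type | Required | Default | Description |
|------|------|----------|---------|-------------|
| `user_id` | string | No | | Filter records by user identifier. |
| `api_key` | string | No | | Filter records by API key. |
| `period` | string | No | | Time period filter. Options: D (day), W (week), M (month). |
| `limit` | number | No | `100` | Maximum number of records to return. |
| `offset` | number | No | `0` | Number of records to skip for pagination. |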

Response

json
{
  "records": [
    {
      "user_id": "user-123",
      "api_key": "ak_7a52e...9064",
      "endpoint": "/audio/tts",
      "tokens": 42,
      "audio_duration": 3.2,
      "timestamp": "2026-03-20T14:30:00Z"
    }
  ],
  "summary": {
    "total_requests": 156,
    "total_tokens": 8420,
    "total_audio_duration": 1234.5
  },
  "count": 1,
  "limit": 100,
  "offset": 0
}
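
The `limit` and `offset` parameters allow paging through a long usage history. The sketch below walks pages until one comes back shorter than `limit`. The `fetch_page` callable is hypothetical: it stands for any function that performs the authenticated `GET /auth/usage` request with the given `limit` and `offset` and returns the decoded JSON body.

```python
from typing import Callable, Iterator


def iter_usage_records(
    fetch_page: Callable[[int, int], dict],
    limit: int = 100,
) -> Iterator[dict]:
    """Yield every usage record by advancing `offset` one page at a time."""
    offset = 0
    while True:
        page = fetch_page(limit, offset)
        records = page["records"]
        yield from records
        # A short (or empty) page means there is nothing left to fetch.
        if len(records) < limit:
            break
        offset += limit
```

Stopping on a short page costs one extra request when the total is an exact multiple of `limit`, but it avoids relying on any total-count field.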

Code Examples

Complete examples in popular programming languages.

Python Client

python
import requests

API_KEY = "ak_your_api_key"
BASE_URL = "https://audio-chat.ask-lens.ai"

headers = {"Authorization": f"Bearer {API_KEY}"}

# --- Text to Speech ---
response = requests.post(
    f"{BASE_URL}/audio/tts",
    headers=headers,
    json={
        "text": "Hello, welcome to Lens Audio!",
        "voice_id": "zh-Somer",
        "emotion": "happy",
        "speed": 1.0,
    },
)

if response.status_code == 200:
    with open("output.wav", "wb") as f:
        f.write(response.content)
    print("Audio duration:", response.headers.get("X-Audio-Duration"), "s")
else:
    print("Error:", response.json())

# --- List Voices ---
voices = requests.get(f"{BASE_URL}/audio/voices", headers=headers).json()
print("Available voices:", [v["voice_id"] for v in voices["voices"]])

# --- Queue Status ---
status = requests.get(f"{BASE_URL}/audio/queue-status", headers=headers).json()
print(f"Queue: {status['queue_size']}/{status['max_queue_size']}")

cURL

bash
# Text to Speech
curl -X POST https://audio-chat.ask-lens.ai/audio/tts \
  -H "Authorization: Bearer ak_your_api_key" \
  -H "Content-Type: application/json" \
  -d '{"text": "Hello, welcome to Lens Audio!", "voice_id": "zh-Somer", "emotion": "happy"}' \
  --output output.wav

# List Voices
curl https://audio-chat.ask-lens.ai/audio/voices \
  -H "Authorization: Bearer ak_your_api_key"

# Queue Status
curl https://audio-chat.ask-lens.ai/audio/queue-status \
  -H "Authorization: Bearer ak_your_api_key"

# Register Voice Clone
curl -X POST https://audio-chat.ask-lens.ai/voice-clone/register \
  -H "Authorization: Bearer ak_your_api_key" \
  -H "Content-Type: application/json" \
  -d '{"voice_id": "my-voice", "s3_url": "https://s3.amazonaws.com/bucket/ref.wav"}'

# Delete Voice Clone
curl -X DELETE https://audio-chat.ask-lens.ai/voice-clone/my-voice \
  -H "Authorization: Bearer ak_your_api_key"

Node.js

javascript
const fs = require("fs");

const API_KEY = "ak_your_api_key";
const BASE_URL = "https://audio-chat.ask-lens.ai";

const headers = {
  "Authorization": `Bearer ${API_KEY}`,
  "Content-Type": "application/json",
};

// --- Text to Speech ---
async function textToSpeech(text, voiceId = "zh-Somer", emotion = "calm") {
  const response = await fetch(`${BASE_URL}/audio/tts`, {
    method: "POST",
    headers,
    body: JSON.stringify({ text, voice_id: voiceId, emotion }),
  });

  if (response.ok) {
    const buffer = await response.arrayBuffer();
    fs.writeFileSync("output.wav", Buffer.from(buffer));
    console.log("Duration:", response.headers.get("X-Audio-Duration"), "s");
  } else {
    const error = await response.json();
    console.error("Error:", error);
  }
}

// --- List Voices ---
async function listVoices() {
  const response = await fetch(`${BASE_URL}/audio/voices`, {
    headers: { "Authorization": `Bearer ${API_KEY}` },
  });
  const data = await response.json();
  console.log("Voices:", data.voices.map((v) => v.voice_id));
}

// --- Queue Status ---
async function queueStatus() {
  const response = await fetch(`${BASE_URL}/audio/queue-status`, {
    headers: { "Authorization": `Bearer ${API_KEY}` },
  });
  const data = await response.json();
  console.log(`Queue: ${data.queue_size}/${data.max_queue_size}`);
}

textToSpeech("Hello, welcome to Lens Audio!", "zh-Somer", "happy");