Video Generation
Generate AI videos, talking avatar videos, face swaps, localizations, and video edits.
POST /videos/talking-avatar
Generate a talking avatar video with lip-sync. The avatar speaks a script using a selected voice, and the resulting video includes realistic lip movements synchronized to the audio.
You must provide either script + voice_id (to generate a new voiceover on the fly) or voiceover_id (to reuse an existing voiceover). If both are provided, voiceover_id takes precedence.
Credit cost: 20 credits per 250 characters of script (minimum 20, maximum 160).
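The pricing rule can be estimated client-side before submitting a request. The sketch below is illustrative only: the function name is hypothetical, and it assumes that a partial 250-character block is billed as a full block and that characters are counted the way Python's `len()` counts them. Neither detail is confirmed by this reference.

```python
import math

CREDITS_PER_BLOCK = 20  # credits charged per block of script
BLOCK_CHARS = 250       # characters per block
MIN_CREDITS = 20
MAX_CREDITS = 160


def talking_avatar_credits(script: str) -> int:
    """Estimate the credit cost of a talking-avatar script.

    Assumption: a partial block is rounded up to a full block.
    """
    blocks = math.ceil(len(script) / BLOCK_CHARS)
    # Clamp to the documented minimum and maximum.
    return max(MIN_CREDITS, min(MAX_CREDITS, blocks * CREDITS_PER_BLOCK))
```

Under these assumptions a 251-character script is estimated at 40 credits, and anything past 2,000 characters is capped at 160.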
Request Body
| Field | Type | Required | Description |
|---|---|---|---|
| avatar_id | string | Yes | UUID of the avatar to use. |
| avatar_type | string | Yes | One of "platform", "custom", or "product". |
| script | string | Conditional | The text the avatar will speak. Required if voiceover_id is not provided. |
| voice_id | string | Conditional | ElevenLabs voice ID. Required if voiceover_id is not provided. |
| voiceover_id | string | Conditional | UUID of an existing voiceover. Required if script and voice_id are not provided. |
| lipsync_model | string | No | Lip-sync model to use. Default: "heygen-avatar4". Options: "heygen-avatar4", "lipsync-2.0". |
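The conditional fields are easy to get wrong, so it can help to assemble the request body through a small helper. This is a minimal sketch, not part of any official SDK: `build_talking_avatar_payload` is a hypothetical name, and dropping script and voice_id when voiceover_id is present is a client-side choice that mirrors the documented precedence.

```python
from typing import Optional

AVATAR_TYPES = {"platform", "custom", "product"}
LIPSYNC_MODELS = {"heygen-avatar4", "lipsync-2.0"}


def build_talking_avatar_payload(
    avatar_id: str,
    avatar_type: str,
    script: Optional[str] = None,
    voice_id: Optional[str] = None,
    voiceover_id: Optional[str] = None,
    lipsync_model: Optional[str] = None,
) -> dict:
    """Build a request body for POST /videos/talking-avatar (hypothetical helper)."""
    if avatar_type not in AVATAR_TYPES:
        raise ValueError(f"avatar_type must be one of {sorted(AVATAR_TYPES)}")
    if lipsync_model is not None and lipsync_model not in LIPSYNC_MODELS:
        raise ValueError(f"lipsync_model must be one of {sorted(LIPSYNC_MODELS)}")

    payload = {"avatar_id": avatar_id, "avatar_type": avatar_type}
    if voiceover_id:
        # voiceover_id takes precedence, so script and voice_id are left out.
        payload["voiceover_id"] = voiceover_id
    elif script and voice_id:
        payload["script"] = script
        payload["voice_id"] = voice_id
    else:
        raise ValueError("provide script + voice_id, or voiceover_id")

    if lipsync_model is not None:
        payload["lipsync_model"] = lipsync_model
    return payload
```

Omitting lipsync_model leaves the choice to the server default ("heygen-avatar4").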
Response
{
"generation_id": "a1b2c3d4-e5f6-7890-abcd-ef1234567890",
"type": "talking-avatar",
"status": "processing"
}
Examples
curl -X POST https://www.adsumo.ai/api/v1/videos/talking-avatar \
-H "Authorization: Bearer adsumo_sk_..." \
-H "Content-Type: application/json" \
-d '{
"avatar_id": "a1b2c3d4-e5f6-7890-abcd-ef1234567890",
"avatar_type": "platform",
"script": "Welcome to Adsumo. Create stunning AI video ads in minutes.",
"voice_id": "EXAVITQu4vr4xnSDxMaL"
}'
POST /videos/ai-video
Generate an AI video from a text prompt using one of several supported models. Each model has different capabilities, supported aspect ratios, and duration options.
Request Body
| Field | Type | Required | Description |
|---|---|---|---|
| model | string | Yes | One of "sora-2", "sora-2-pro", "veo-3.1", "veo-3.1-fast", "kling-2.6". |
| prompt | string | Yes | Text description of the video to generate. |
| aspect_ratio | string | Yes | Aspect ratio of the output video (e.g. "16:9", "9:16"). |
| duration | number | Yes | Duration in seconds. Valid values depend on the model. |
| resolution | string | No | Output resolution. For example "720p", "1080p", "4k". |
| reference_image_url | string | No | URL of a reference image to guide generation. |
| hd | boolean | No | Enable HD output. Applicable to Sora models only. |
| generate_audio | boolean | No | Generate an audio track alongside the video. |
| first_frame_image_url | string | No | URL of an image to use as the first frame. Applicable to Veo models only. |
| last_frame_image_url | string | No | URL of an image to use as the last frame. Applicable to Veo models only. |
Model Constraints
| Model | Aspect Ratios | Durations (seconds) | Notes |
|---|---|---|---|
| sora-2 | 16:9, 9:16 | 4, 8, 12, 16, 20 | |
| sora-2-pro | 16:9, 9:16 | 4, 8, 12, 16, 20 | Higher quality output |
| veo-3.1 | any | 4, 6, 8 | Supports first_frame_image_url and last_frame_image_url |
| veo-3.1-fast | any | 4, 6, 8 | Faster generation, lower quality |
| kling-2.6 | any | 5, 10 | |
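Because valid aspect ratios and durations depend on the model, a request can be checked against the constraints table before it is sent. The following is a sketch that transcribes the table into a lookup; the function name and the convention of using `None` for "any" are illustrative choices, and the server remains the authority on what it accepts.

```python
from typing import List

# (allowed aspect ratios, allowed durations in seconds); None means "any".
MODEL_CONSTRAINTS = {
    "sora-2": ({"16:9", "9:16"}, {4, 8, 12, 16, 20}),
    "sora-2-pro": ({"16:9", "9:16"}, {4, 8, 12, 16, 20}),
    "veo-3.1": (None, {4, 6, 8}),
    "veo-3.1-fast": (None, {4, 6, 8}),
    "kling-2.6": (None, {5, 10}),
}


def check_ai_video_request(model: str, aspect_ratio: str, duration: int) -> List[str]:
    """Return a list of problems; an empty list means the table is satisfied."""
    if model not in MODEL_CONSTRAINTS:
        return [f"unknown model: {model!r}"]
    ratios, durations = MODEL_CONSTRAINTS[model]
    problems = []
    if ratios is not None and aspect_ratio not in ratios:
        problems.append(f"{model} does not support aspect ratio {aspect_ratio!r}")
    if duration not in durations:
        problems.append(f"{model} does not support a duration of {duration} seconds")
    return problems
```

For example, `check_ai_video_request("veo-3.1", "16:9", 5)` reports one problem, because the Veo models only list durations of 4, 6, and 8 seconds.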
Response
{
"generation_id": "b2c3d4e5-f6a7-8901-bcde-f12345678901",
"type": "ai-video",
"status": "processing"
}
Examples
curl -X POST https://www.adsumo.ai/api/v1/videos/ai-video \
-H "Authorization: Bearer adsumo_sk_..." \
-H "Content-Type: application/json" \
-d '{
"model": "sora-2",
"prompt": "A sleek product bottle rotating on a marble pedestal with soft studio lighting",
"aspect_ratio": "16:9",
"duration": 8,
"hd": true
}'
POST /videos/swap-avatar
Replace the face in an existing video with an avatar or a custom face image. You must provide either avatar_id + avatar_type or image_url to specify the replacement face.
Credit cost: 1 credit per second of video in standard mode, 2 credits per second in pro mode.
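As a rough illustration of the per-second pricing, the cost can be estimated from the video duration and mode. The function name is hypothetical, and rounding fractional seconds up is an assumption; this reference does not say how partial seconds are billed.

```python
import math

CREDITS_PER_SECOND = {"standard": 1, "pro": 2}


def swap_avatar_credits(duration_seconds: float, mode: str) -> int:
    """Estimate the credit cost of a face swap.

    Assumption: fractional seconds are rounded up to the next whole second.
    """
    if mode not in CREDITS_PER_SECOND:
        raise ValueError('mode must be "standard" or "pro"')
    return math.ceil(duration_seconds) * CREDITS_PER_SECOND[mode]
```

Under these assumptions a 10-second video is estimated at 10 credits in standard mode and 20 credits in pro mode.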
Request Body
| Field | Type | Required | Description |
|---|---|---|---|
| video_url | string | Yes | URL of the source video. |
| mode | string | Yes | One of "standard" or "pro". Pro mode produces higher fidelity face replacement. |
| duration | number | Yes | Duration of the video in seconds (1-30). |
| width | number | Yes | Width of the video in pixels. |
| height | number | Yes | Height of the video in pixels. |
| avatar_id | string | Conditional | UUID of the avatar to use as the replacement face. Required if image_url is not provided. |
| avatar_type | string | Conditional | One of "custom", "platform", or "product". Required if avatar_id is provided. |
| image_url | string | Conditional | Direct URL to a face image. Required if avatar_id is not provided. |
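The replacement face comes from either an avatar or an image, and the sketch below makes that choice explicit while also checking the documented duration range. `build_swap_avatar_payload` is a hypothetical helper. This reference does not say what happens when both avatar_id and image_url are sent, so the sketch rejects that case as a conservative assumption.

```python
from typing import Optional


def build_swap_avatar_payload(
    video_url: str,
    mode: str,
    duration: float,
    width: int,
    height: int,
    avatar_id: Optional[str] = None,
    avatar_type: Optional[str] = None,
    image_url: Optional[str] = None,
) -> dict:
    """Build a request body for POST /videos/swap-avatar (hypothetical helper)."""
    if mode not in {"standard", "pro"}:
        raise ValueError('mode must be "standard" or "pro"')
    if not 1 <= duration <= 30:
        raise ValueError("duration must be between 1 and 30 seconds")
    if avatar_id and image_url:
        # Assumption: behavior with both sources is undocumented, so refuse.
        raise ValueError("provide avatar_id + avatar_type or image_url, not both")

    payload = {
        "video_url": video_url,
        "mode": mode,
        "duration": duration,
        "width": width,
        "height": height,
    }
    if avatar_id:
        if avatar_type not in {"custom", "platform", "product"}:
            raise ValueError("avatar_type is required when avatar_id is set")
        payload["avatar_id"] = avatar_id
        payload["avatar_type"] = avatar_type
    elif image_url:
        payload["image_url"] = image_url
    else:
        raise ValueError("provide avatar_id + avatar_type or image_url")
    return payload
```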
Response
{
"generation_id": "c3d4e5f6-a7b8-9012-cdef-123456789012",
"type": "swap-avatar",
"status": "processing"
}
POST /videos/localize
Translate and dub a video into another language. The audio is re-generated in the target language and lip movements are adjusted to match.
Request Body
| Field | Type | Required | Description |
|---|---|---|---|
| video_url | string | Yes | URL of the source video. |
| mode | string | Yes | One of "speed" or "precision". Speed mode is faster; precision mode produces higher quality dubbing. |
| output_language | string | Yes | Target language code (e.g. "es", "fr", "de", "ja"). |
| duration | number | Yes | Duration of the video in seconds (1-120). |
| width | number | Yes | Width of the video in pixels. |
| height | number | Yes | Height of the video in pixels. |
| translate_audio_only | boolean | No | If true, only the audio track is translated; lip movements are not adjusted. |
| speaker_num | number | No | Number of speakers in the video. Helps improve speaker diarization for multi-speaker content. |
| enable_dynamic_duration | boolean | No | If true, allows the output video duration to differ from the input to better accommodate the translated speech. |
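A localization request mixes required fields with optional flags. One way to keep the body tidy is to include an optional field only when the caller sets it, as in this sketch. The helper name is hypothetical, and the defaults the server applies to omitted flags are not stated in this reference.

```python
from typing import Optional


def build_localize_payload(
    video_url: str,
    mode: str,
    output_language: str,
    duration: float,
    width: int,
    height: int,
    translate_audio_only: Optional[bool] = None,
    speaker_num: Optional[int] = None,
    enable_dynamic_duration: Optional[bool] = None,
) -> dict:
    """Build a request body for POST /videos/localize (hypothetical helper)."""
    if mode not in {"speed", "precision"}:
        raise ValueError('mode must be "speed" or "precision"')
    if not 1 <= duration <= 120:
        raise ValueError("duration must be between 1 and 120 seconds")

    payload = {
        "video_url": video_url,
        "mode": mode,
        "output_language": output_language,
        "duration": duration,
        "width": width,
        "height": height,
    }
    optional = {
        "translate_audio_only": translate_audio_only,
        "speaker_num": speaker_num,
        "enable_dynamic_duration": enable_dynamic_duration,
    }
    # Keep explicit False values; drop only the flags the caller left unset.
    payload.update({k: v for k, v in optional.items() if v is not None})
    return payload
```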
Response
{
"generation_id": "d4e5f6a7-b8c9-0123-defa-234567890123",
"type": "localize",
"status": "processing"
}
POST /videos/edit
Add captions and visual effects to a video using an AI caption template. You must provide either video_url (for any video) or generation_id (to edit a previously generated AI video).
Credit cost: 2 credits.
Request Body
| Field | Type | Required | Description |
|---|---|---|---|
| template_id | string | Yes | ID of the caption template to apply. |
| video_url | string | Conditional | URL of the video to edit. Required if generation_id is not provided. |
| generation_id | string | Conditional | UUID of a completed AI video generation. Required if video_url is not provided. |
| language | string | No | Language code for caption generation. Default: "en". |
| auto_approve | boolean | No | Automatically approve the generated captions. Default: true. |
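An edit request needs a template plus one video source, and it can lean on the documented defaults by leaving language and auto_approve out. The sketch below uses a hypothetical helper name. It passes both sources through if both are given, since this reference does not say which one the server would prefer.

```python
from typing import Optional


def build_edit_payload(
    template_id: str,
    video_url: Optional[str] = None,
    generation_id: Optional[str] = None,
    language: Optional[str] = None,
    auto_approve: Optional[bool] = None,
) -> dict:
    """Build a request body for POST /videos/edit (hypothetical helper)."""
    if not (video_url or generation_id):
        raise ValueError("provide video_url or generation_id")

    payload = {"template_id": template_id}
    if video_url:
        payload["video_url"] = video_url
    if generation_id:
        payload["generation_id"] = generation_id
    # Omitted fields fall back to the documented defaults ("en" and true).
    if language is not None:
        payload["language"] = language
    if auto_approve is not None:
        payload["auto_approve"] = auto_approve
    return payload
```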
Response
{
"generation_id": "e5f6a7b8-c9d0-1234-efab-345678901234",
"type": "edit",
"status": "processing"
}
Checking Generation Status
All video generation endpoints return immediately with a "processing" status. To check whether a generation has completed, poll the generation status endpoint (GET /generations/{generation_id}) with the generation_id from the response, or configure a webhook. See the Overview for details on the async generation pattern.
curl https://www.adsumo.ai/api/v1/generations/a1b2c3d4-e5f6-7890-abcd-ef1234567890 \
-H "Authorization: Bearer adsumo_sk_..."
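Polling can be wrapped in a small loop. The sketch below is a minimal illustration using only the Python standard library. It assumes the status endpoint returns a JSON object with a "status" field like the creation responses, and it treats any value other than "processing" as terminal, because the other status values are not listed in this section. The function names, polling interval, and timeout are all hypothetical choices.

```python
import json
import time
import urllib.request
from typing import Callable

BASE_URL = "https://www.adsumo.ai/api/v1"


def fetch_generation(generation_id: str, api_key: str) -> dict:
    """GET /generations/{id}. Assumes the response body is a JSON object."""
    request = urllib.request.Request(
        f"{BASE_URL}/generations/{generation_id}",
        headers={"Authorization": f"Bearer {api_key}"},
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


def wait_for_generation(
    fetch: Callable[[], dict],
    interval: float = 5.0,
    timeout: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Call fetch() until the status is no longer "processing".

    Assumption: every status other than "processing" is terminal.
    """
    deadline = time.monotonic() + timeout
    while True:
        body = fetch()
        if body.get("status") != "processing":
            return body
        if time.monotonic() >= deadline:
            raise TimeoutError("generation is still processing")
        sleep(interval)
```

A typical call would be `wait_for_generation(lambda: fetch_generation(generation_id, api_key))`. Passing the fetcher and the sleep function in as arguments keeps the loop testable without network access; for production use, a webhook avoids polling altogether.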