Isolate dialogue from the music and effects bed in any audio or video content. Use the separated outputs for dubbing prep, podcast cleanup, broadcast post-production, or feeding clean speech into downstream AI models.

Create a Task

This example extracts both the dialogue and the background (music + effects) stems:
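import requests

API_KEY = "your_api_key"
HEADERS = {"Content-Type": "application/json", "x-api-key": API_KEY}

# Request one output per target: the isolated dialogue and the music + effects bed.
response = requests.post(
    "https://api.audioshake.ai/tasks",
    headers=HEADERS,
    json={
        "assetId": "your_asset_id",
        "targets": [
            {"model": "dialogue", "formats": ["wav"]},
            {"model": "music_fx", "formats": ["wav"]},
        ],
    },
    timeout=30,  # avoid hanging indefinitely on a stalled connection
)
response.raise_for_status()  # surface HTTP errors before reading the body

# The task ID is used later to check status and download results.
task_id = response.json()["id"]
print(f"Task created: {task_id}")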
Check Task status to monitor progress and download results, or use webhooks to be notified when each target completes.
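As a rough illustration of the polling approach, the sketch below repeatedly fetches a task until every target has finished. Several details are assumptions rather than confirmed API behavior: the `GET /tasks/{id}` path, the `targets` list in the response, the per-target `status` field, and the status strings `completed`, `failed`, and `error`. The helper names `fetch_task` and `wait_for_task` are hypothetical. Check the Task status reference for the real response shape before relying on any of this.

```python
import time
from typing import Any, Callable, Dict

API_BASE = "https://api.audioshake.ai"

# Assumed status strings; confirm the real values in the Task status reference.
DONE = {"completed"}
FAILED = {"failed", "error"}


def fetch_task(task_id: str, api_key: str) -> Dict[str, Any]:
    """Fetch one task as JSON. The GET /tasks/{id} path is an assumption."""
    import requests  # local import keeps wait_for_task usable without requests

    response = requests.get(
        f"{API_BASE}/tasks/{task_id}",
        headers={"x-api-key": api_key},
        timeout=30,
    )
    response.raise_for_status()
    return response.json()


def wait_for_task(
    fetch: Callable[[], Dict[str, Any]],
    interval: float = 5.0,
    timeout: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Poll until every target is done, any target fails, or the timeout elapses."""
    deadline = time.monotonic() + timeout
    while True:
        task = fetch()
        # Assumed shape: {"targets": [{"model": ..., "status": ...}, ...]}
        statuses = [t.get("status") for t in task.get("targets", [])]
        if any(s in FAILED for s in statuses):
            raise RuntimeError(f"Task failed: {statuses}")
        if statuses and all(s in DONE for s in statuses):
            return task
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Task not finished after {timeout} seconds")
        sleep(interval)
```

With the task created earlier, a call might look like `task = wait_for_task(lambda: fetch_task(task_id, API_KEY))`. Passing the fetch step in as a callable keeps the loop independent of the HTTP client, which also makes it easy to exercise with canned responses.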

Outputs

Target     Role
dialogue   Clean speech; use as reference or feed into translation/dubbing
music_fx   Music + effects bed; layer localized voiceover on top
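To make "layer localized voiceover on top" concrete, here is a minimal sketch of mixing a voiceover into the music_fx bed at the sample level. It assumes both signals are already decoded to 16-bit PCM with the same sample rate and channel layout. The function name `mix_pcm16` and the `voice_gain` parameter are hypothetical and not part of the API. A real dubbing workflow would more likely do this step in a DAW or a dedicated audio library.

```python
from array import array


def mix_pcm16(bed: array, voice: array, voice_gain: float = 1.0) -> array:
    """Sum two 16-bit PCM sample arrays, clipping to the int16 range.

    The shorter input is treated as silence once it runs out.
    """
    length = max(len(bed), len(voice))
    mixed = array("h")
    for i in range(length):
        b = bed[i] if i < len(bed) else 0
        v = voice[i] if i < len(voice) else 0
        sample = b + int(v * voice_gain)
        # Hard-clip so the sum stays within the signed 16-bit range.
        mixed.append(max(-32768, min(32767, sample)))
    return mixed
```

Hard clipping is the simplest way to keep the sum in range. If the mix clips audibly, lowering `voice_gain` or attenuating the bed before summing is the usual first adjustment.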

Use cases

  • Prepare content for localization and foreign-language dubbing
  • Clean up podcast audio by isolating the host’s voice
  • Extract clean dialogue for speech-to-text or AI training data
  • Separate effects and ambience for sound design workflows

Stem Separation

Separate music into individual instruments instead.

Music Detection

Find where music appears before separating.