Models

When making API calls, the model name you choose determines the output of the job. For instance, setting name to vocals in the metadata creates a vocals job.

curl -L -X POST 'https://groovy.audioshake.ai/job/' \
-H 'Content-Type: application/json' \
-H 'Authorization: Bearer <TOKEN>' \
-d '{
"metadata": {
"format": "wav",
"name": "vocals"
},
"assetId": "abc123"
}'

Instrument Stem Separation

| Name | Model Name | Description | Maximum Content Length |
| --- | --- | --- | --- |
| Instrumental | instrumental | Music without vocals, only instruments | 3 Hours |
| Drums | drums | Percussion instruments producing rhythmic beats | 3 Hours |
| Vocals | vocals | Isolates singing and vocal sounds | 3 Hours |
| Bass | bass | Instruments producing low-frequency sounds, typically the bass guitar or synthesizer bass lines | 3 Hours |
| Other | other | Remaining instrumentation after removing vocals, drums, and bass | 3 Hours |
| Guitar | guitar | Instruments from the guitar family, including electric, acoustic, and classical guitars | 3 Hours |
| Other-x-Guitar | other-x-guitar | Remaining instrumentation after removing vocals, drums, bass, and guitar | 3 Hours |
| Piano | piano | Instruments like Rhodes piano, upright piano, grand piano, and keyboard | 3 Hours |
| Wind | wind | Instruments such as flute and saxophone that produce sound by vibrating air | 3 Hours |

Residual

To generate a residual stem, set residual to true in the metadata field.
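
As a rough sketch, the request body for a job that also asks for a residual stem could be built as shown here. It assumes that residual is a JSON boolean placed next to format and name inside metadata; build_residual_payload is a hypothetical helper name, and abc123 is the same placeholder asset ID used in the example above. The arguments are assumed to contain no quotes or backslashes, since no JSON escaping is done.

```shell
# Print the JSON body for a job that also requests a residual stem.
# Assumption: "residual" is a JSON boolean inside "metadata".
build_residual_payload() {
  model_name="$1"
  asset_id="$2"
  printf '{"metadata":{"format":"wav","name":"%s","residual":true},"assetId":"%s"}\n' \
    "$model_name" "$asset_id"
}

# Usage (needs a valid token in $TOKEN, so it is left commented out):
# curl -L -X POST 'https://groovy.audioshake.ai/job/' \
#   -H 'Content-Type: application/json' \
#   -H "Authorization: Bearer $TOKEN" \
#   -d "$(build_residual_payload vocals abc123)"
```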

For more information, contact support@audioshake.ai.

Dialogue, Music, & Effects

| Name | Model Name | Description | Maximum Content Length |
| --- | --- | --- | --- |
| Dialogue | dialogue | Speech or vocals isolated from any other sound | 3 Hours |
| Music removal | music_removal | Removes music from audio while retaining dialogue, background effects, and natural sound | 1 Hour |
| Background (Music & FX) | music_fx | Removes dialogue to extract a clean background stem of music and effects | 3 Hours |
| Music detection | music_detection | Identifies the portions of an audio file that contain music | 3 Hours |
| Multi-Voice | multi_voice | Separates dialogue from multiple speakers in audio recordings, delivering individual audio files per speaker. Available in two_speaker and n_speaker variants, detailed below. | 1 Hour |

Transcription & Alignment

| Name | Model Name | Description | Maximum Content Length |
| --- | --- | --- | --- |
| Transcription | transcription | Text representation of spoken words or audio content | 1 Hour |
| Alignment | alignment | Synchronization of audio and corresponding text or captions | 1 Hour |
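
The limits listed in the tables above can be checked on the client side before a job is submitted. The following is a minimal sketch, assuming that Maximum Content Length refers to audio duration and that a file exactly at the limit is accepted. The function names max_content_seconds and within_limit are hypothetical and are not part of the API.

```shell
# Maximum content length per model, in seconds, copied from the tables above
# (3 Hours = 10800 seconds, 1 Hour = 3600 seconds).
max_content_seconds() {
  case "$1" in
    instrumental|drums|vocals|bass|other|guitar|other-x-guitar|piano|wind)
      echo 10800 ;;
    dialogue|music_fx|music_detection)
      echo 10800 ;;
    music_removal|multi_voice|transcription|alignment)
      echo 3600 ;;
    *)
      echo "unknown model: $1" >&2
      return 1 ;;
  esac
}

# Exit status 0 if a file of the given whole-second duration fits the model's limit.
# Assumption: a duration exactly equal to the limit is accepted.
within_limit() {
  limit="$(max_content_seconds "$1")" || return 2
  [ "$2" -le "$limit" ]
}
```

For example, `within_limit music_removal 5400` fails because 90 minutes exceeds the 1 hour limit, while `within_limit vocals 5400` succeeds.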

Variants

Certain models offer variants optimized for specific audio processing use-cases. To use a variant, include "variant": "<desired_variant>" in the metadata parameters when submitting a job via the API. The available variants are listed below:

| Model | Variant | Description |
| --- | --- | --- |
| multi_voice | two_speaker | Optimized for separating two speakers. (Default) |
| multi_voice | n_speaker | Creates stems for any number of speakers. |
| vocals | high_quality | Higher quality but longer processing time. |
| instrumental | high_quality | Higher quality but longer processing time. |

Example request creating a multi_voice job with the n_speaker variant:

curl -L -X POST 'https://groovy.audioshake.ai/job/' \
-H 'Content-Type: application/json' \
-H 'Authorization: Bearer <TOKEN>' \
-d '{
"metadata": {
"format": "wav",
"name": "multi_voice",
"variant": "n_speaker"
},
"assetId": "abc123"
}'
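
The same request shape works for the other variants. As a sketch only, a small helper could build the body for any model with an optional variant and reject pairs that do not appear in the variants table. build_job_payload is a hypothetical name, the wav format is carried over from the examples, and the arguments are assumed to contain no quotes or backslashes, since no JSON escaping is done.

```shell
# Print the JSON body for a job; the third argument (variant) is optional.
# Model/variant pairs outside the variants table are rejected.
build_job_payload() {
  model="$1"
  asset_id="$2"
  variant="${3:-}"
  case "$model:$variant" in
    *:) ;;  # no variant requested
    multi_voice:two_speaker|multi_voice:n_speaker) ;;
    vocals:high_quality|instrumental:high_quality) ;;
    *)
      echo "unsupported variant '$variant' for model '$model'" >&2
      return 1 ;;
  esac
  if [ -n "$variant" ]; then
    printf '{"metadata":{"format":"wav","name":"%s","variant":"%s"},"assetId":"%s"}\n' \
      "$model" "$variant" "$asset_id"
  else
    printf '{"metadata":{"format":"wav","name":"%s"},"assetId":"%s"}\n' \
      "$model" "$asset_id"
  fi
}

# Usage (needs a valid token in $TOKEN, so it is left commented out):
# curl -L -X POST 'https://groovy.audioshake.ai/job/' \
#   -H 'Content-Type: application/json' \
#   -H "Authorization: Bearer $TOKEN" \
#   -d "$(build_job_payload vocals abc123 high_quality)"
```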