Models

When making API calls, you can choose from the following model names to determine the desired output. For instance, using the model name vocals will create a vocals job.

curl -L -X POST 'https://groovy.audioshake.ai/job' \
  -H 'Content-Type: application/json' \
  -H 'Authorization: Bearer <TOKEN>' \
  -d '{
    "metadata": {
      "format": "wav",
      "name": "vocals"
    },
    "assetId": "abc123"
  }'
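
Because only the model name changes between job types, the request above can be parameterized. The following is a minimal sketch, not an official client: the function names build_job_payload and submit_job and the AUDIOSHAKE_TOKEN environment variable are hypothetical, and the payload builder assumes its arguments contain no characters that need JSON escaping.

```shell
# Hypothetical helper: print the JSON body for a job request.
# Usage: build_job_payload <model> <format> <asset_id>
# Assumes the arguments need no JSON escaping.
build_job_payload() {
  printf '{"metadata":{"format":"%s","name":"%s"},"assetId":"%s"}\n' \
    "$2" "$1" "$3"
}

# Hypothetical wrapper around the curl call from the example request.
# Reads the API token from AUDIOSHAKE_TOKEN (an assumed variable name).
submit_job() {
  curl -L -X POST 'https://groovy.audioshake.ai/job' \
    -H 'Content-Type: application/json' \
    -H "Authorization: Bearer ${AUDIOSHAKE_TOKEN}" \
    -d "$(build_job_payload "$@")"
}

# Print the payload for a drums job without sending anything.
build_job_payload drums wav abc123
```

Calling submit_job drums wav abc123 would then send the same request as the example, with drums in place of vocals.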

Instrument Stem Separation

| Name | Model | Description | Credits / Minute | Max Length |
| --- | --- | --- | --- | --- |
| Instrumental | instrumental | Music without vocals, only instruments. We recommend using the high_quality variant. | 1 | |
| Drums | drums | Percussion instruments producing rhythmic beats | 1 | |
| Vocals | vocals | Isolates singing and vocal sounds. We recommend using the high_quality variant. | 1 | |
| Bass | bass | Instruments producing low-frequency sounds, typically the bass guitar or synthesizer bass lines | 1 | 3 Hours |
| Other | other | Remaining instrumentation after removing vocals, drums, and bass | 1 | 3 Hours |
| Guitar | guitar | Instruments from the guitar family, including electric, acoustic, and classical guitars | 1 | 3 Hours |
| Other-x-Guitar | other-x-guitar | Remaining instrumentation after removing vocals, drums, bass, and guitar | 1 | 3 Hours |
| Piano | piano | Instruments like Rhodes piano, upright piano, grand piano, and keyboard | 1 | 3 Hours |
| Wind | wind | Instruments like flute and saxophone that produce sound by vibrating air | 1 | 3 Hours |
| Strings | strings | Orchestral string instruments like violin, viola, cello, and double bass | 1 | 3 Hours |

Residual

To generate a residual stem, set residual to true in the metadata field of the request.

For more info, contact support@audioshake.ai
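
As an illustration, a request body with the residual flag set might look like the output of the sketch below. The helper name build_residual_payload is hypothetical, and sending residual as a JSON boolean (rather than the string "true") is an assumption; the rest of the body follows the vocals example request.

```shell
# Hypothetical helper: print a job body that also requests a residual stem.
# Usage: build_residual_payload <model> <format> <asset_id>
# Assumption: "residual" is sent as a JSON boolean inside "metadata".
build_residual_payload() {
  printf '{"metadata":{"format":"%s","name":"%s","residual":true},"assetId":"%s"}\n' \
    "$2" "$1" "$3"
}

# Print the body for a vocals job with a residual stem.
build_residual_payload vocals wav abc123
```

The printed JSON can be passed to curl with -d in place of the body in the example request.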

Dialogue, Music, & Effects

| Name | Model | Description | Credits / Minute | Max Length |
| --- | --- | --- | --- | --- |
| Dialogue | dialogue | Isolates speech or vocals from any other sound | 1.5 | 3 Hours |
| Effects | effects | Removes dialogue and music but retains the ambience, sound effects, and environmental noise | 1.5 | 3 Hours |
| Music removal | music_removal | Removes music from audio while retaining dialogue, background effects, and natural sound | N/A | 1 Hour |
| Background (Music & FX) | music_fx | Removes dialogue to extract a clean background stem of music and effects | 1.5 | 3 Hours |
| Music detection | music_detection | Detects the portions of an audio file that contain music | 0.5 | 3 Hours |
| Multi-Voice | multi_voice | Separates dialogue from multiple speakers in audio recordings, delivering individual audio files per speaker. Available in two_speaker and n_speaker variants, detailed below. | N/A | 1 Hour |

Music Removal & Multi-Voice Availability

Currently, Music Removal and Multi-Voice separation are not available via the /tasks route. Please contact support@audioshake.ai for access.

Transcription & Alignment

| Name | Model Name | Description | Credits / Minute | Max Length |
| --- | --- | --- | --- | --- |
| Transcription | transcription | Text representation of spoken words or audio content | 1 | 1 Hour |
| Alignment | alignment | Synchronization of audio and corresponding text or captions | 1 | 1 Hour |

Combined T&A pricing

If you run Transcription and Alignment together (T&A), pricing is Premium at 1.5 credits per minute.
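
To make the per-minute rates concrete, the sketch below multiplies a duration in minutes by a credits-per-minute rate. It assumes simple proportional billing with no rounding or minimum charge, which this page does not confirm; estimate_credits is a hypothetical helper name.

```shell
# Hypothetical helper: credits = minutes * credits-per-minute.
# Assumption: billing is proportional, with no rounding or minimum charge.
# Usage: estimate_credits <minutes> <credits_per_minute>
estimate_credits() {
  awk -v minutes="$1" -v rate="$2" 'BEGIN { printf "%.2f\n", minutes * rate }'
}

estimate_credits 10 1.5   # combined T&A on a 10-minute file: prints 15.00
estimate_credits 10 1     # transcription only on the same file: prints 10.00
```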

Variants

Certain models offer variants optimized for specific audio processing use cases. To use a variant, include "variant": "<desired_variant>" in the metadata object when submitting a job via the API. The available variants are listed below:

| Model | Variant | Description | Plan | Credits / Minute |
| --- | --- | --- | --- | --- |
| multi_voice | two_speaker | Optimized for separating two speakers. (Default) | Premium | N/A |
| multi_voice | n_speaker | Creates stems for any number of speakers. | Advanced | N/A |
| vocals | high_quality | Higher quality but longer processing time. | Premium | 1.5 |
| instrumental | high_quality | Higher quality but longer processing time. | Premium | 1.5 |

Example request creating a multi_voice job with the n_speaker variant:

curl -L -X POST 'https://groovy.audioshake.ai/job' \
  -H 'Content-Type: application/json' \
  -H 'Authorization: Bearer <TOKEN>' \
  -d '{
    "metadata": {
      "format": "wav",
      "name": "multi_voice",
      "variant": "n_speaker"
    },
    "assetId": "abc123"
  }'
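
Since only some models accept a variant, a client may want to check the pair before submitting. The sketch below hard-codes the model and variant combinations from the Variants table; is_known_variant is a hypothetical helper, and the API's behavior for unlisted pairs is not described on this page.

```shell
# Hypothetical helper: succeed only for model/variant pairs in the Variants table.
# Usage: is_known_variant <model> <variant>
is_known_variant() {
  case "$1:$2" in
    multi_voice:two_speaker|multi_voice:n_speaker) return 0 ;;
    vocals:high_quality|instrumental:high_quality) return 0 ;;
    *) return 1 ;;
  esac
}

if is_known_variant multi_voice n_speaker; then
  echo "multi_voice/n_speaker is listed"
fi
if ! is_known_variant drums high_quality; then
  echo "drums/high_quality is not listed"
fi
```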