AudioShake SDK Documentation

Overview

The AudioShake SDK is a powerful audio processing library that enables high-quality audio source separation. It provides a flexible API for separating audio into multiple stems (e.g., vocals, drums, bass, and other instruments) with support for both CPU and GPU processing.

To access the AudioShake SDK and obtain your Client ID and Client Secret, please contact info@audioshake.ai.

System Requirements

Each of our SDKs has different requirements depending on the platform it targets. Below are the hardware and software requirements for each platform.

Hardware Requirements

Linux

  • Architecture: x86_64
  • For GPU processing:
    • CUDA 11.8 compatible GPU
    • Minimum 4GB VRAM (T4 equivalent recommended)
    • Performance scales with GPU capabilities

Windows

  • Architecture: x86_64
  • For GPU processing:
    • DirectX 12 compatible GPU
    • Minimum 4GB VRAM (T4 equivalent recommended)
    • Performance scales with GPU capabilities

Android

  • Architecture: ARM64
  • For GPU processing:
    • OpenGL ES 3.1+ compatible GPU
    • Minimum 2GB VRAM
    • Performance scales with GPU capabilities

Apple (iOS/macOS)

  • Architecture: Apple Silicon (Apple ARM) or Intel x86_64
  • Neural Engine: Optional
  • For GPU processing:
    • Metal-compatible GPU
    • Minimum 2GB VRAM
    • Performance scales with GPU capabilities

Software Dependencies

  • Linux
    • OpenSSL
    • libsndfile
    • CMake 3.22.1 or higher
    • For GPU processing: CUDA 11.8
  • Windows
    • OpenSSL
    • CMake 3.22.1 or higher
    • For GPU processing: DirectX 12 SDK
  • Android: No dependencies needed
  • Apple (iOS/macOS): No dependencies needed

Quick Start Guide

Running the demo application

Each SDK zip package contains a README.md with instructions for integrating and running the provided demo application. The demo is a simple example of how to use the SDK on that platform. For help with more sophisticated integration flows, please contact support@audioshake.ai for assistance from one of our engineers.

API Reference

This section describes the core classes and data structures of our SDK so that you can understand how to best use them.

AudioShakeSeparator Class

The AudioShakeSeparator class is the foundation class for running inference on the device. Below is detailed information about its parameters and configuration options.

Constructor

AudioShakeSeparator(
    const char *clientID,         // Your AudioShake client identifier
    const char *clientSecret,     // Your AudioShake client secret key
    void *model,                  // Pointer to the encrypted model data in memory
    unsigned int modelSizeBytes,  // Size of the model data in bytes
    unsigned int inputSamplerate, // Sample rate of input audio (e.g., 44100 for CD quality)
    unsigned int flags = useFastestBackend | inputFloat | inputNonInterleaved | outputFloat | outputNonInterleaved | chunkNormal
);

Parameters

  • clientID (const char*): Your unique AudioShake client identifier. Required for authentication. Must be a valid string provided by AudioShake. Cannot be null or empty.
  • clientSecret (const char*): Your AudioShake client secret key. Required for secure authentication. Must be a valid string provided by AudioShake. Cannot be null or empty.
  • model (void*): Pointer to the encrypted model data in memory. Must be a valid memory address. Should contain the complete model data. Must match the size specified in modelSizeBytes.
  • modelSizeBytes (unsigned int): Size of the model data in bytes. Must be greater than 0. Must match the actual size of the model data. Used to validate model integrity.
  • inputSamplerate (unsigned int): Sample rate of the input audio in Hz. Common values: 44100 (CD quality), 48000 (professional audio). Must be a positive integer. Will be automatically resampled if needed.
  • flags (unsigned int): Combination of configuration flags. Default: useFastestBackend | inputFloat | inputNonInterleaved | outputFloat | outputNonInterleaved | chunkNormal. Can be customized based on your needs.
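
For reference, a minimal construction sketch is shown below. Loading the encrypted model with std::ifstream, the file path, and the 44.1 kHz input rate are illustrative assumptions; the default flags from the constructor signature above are used.

#include <fstream>
#include <iostream>
#include <iterator>
#include <vector>

// Read the encrypted model file into memory (the path is an assumption).
std::ifstream modelFile("path/to/model.crypt", std::ios::binary);
std::vector<char> modelData((std::istreambuf_iterator<char>(modelFile)),
                            std::istreambuf_iterator<char>());

// Construct the separator with the default flags.
AudioShakeSeparator separator(
    "your_client_id",
    "your_client_secret",
    modelData.data(),
    static_cast<unsigned int>(modelData.size()),
    44100
);

// Check that initialization succeeded (see Error Handling below).
if (const char* error = separator.getInitializationError()) {
    std::cerr << "Initialization failed: " << error << std::endl;
}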

Configuration Flags

The SDK provides several configuration flags to customize processing; a combined example follows the lists below:

Backend Options:

  • useFastestBackend: Uses fastest available hardware (default)
  • useCPUBackendOnly: Forces CPU-only processing

Input Format Options:

  • inputFloat: 32-bit floating point input (default)
  • inputInt16: 16-bit signed integer input
  • inputNonInterleaved: Separate channel buffers (default)
  • inputInterleavedStereo: Interleaved stereo format

Output Format Options:

  • outputFloat: 32-bit floating point output (default)
  • outputInt16: 16-bit signed integer output
  • outputNonInterleaved: Separate channel buffers (default)
  • outputInterleavedStereo: Interleaved stereo format

Processing Chunk Size Options:

  • chunkNormal: Default chunk size (~3 seconds)
  • chunk2X: Half chunk size (~1.5 seconds)
  • chunk4X: Quarter chunk size (~0.75 seconds)
  • chunk8X: Eighth chunk size (~0.375 seconds)
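
For example, here is a minimal sketch that combines flags to force CPU-only processing of interleaved 16-bit input with quarter-size chunks; the client credentials and the modelData buffer are assumed to have been loaded as in the earlier sketch.

// Force CPU processing, accept interleaved 16-bit stereo input, and emit
// non-interleaved float output in quarter-size chunks for lower latency.
unsigned int flags = useCPUBackendOnly
                   | inputInt16 | inputInterleavedStereo
                   | outputFloat | outputNonInterleaved
                   | chunk4X;

AudioShakeSeparator separator(
    "your_client_id",
    "your_client_secret",
    modelData.data(),                             // loaded as in the earlier sketch
    static_cast<unsigned int>(modelData.size()),
    48000,
    flags
);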

Key Methods

Error Handling

const char *getInitializationError();

Returns initialization error message or NULL if successful.

Backend Information

const char *getBackendName();

Returns the name of the ML engine and hardware being used (e.g., "LiteRT GPU" or "CPU").

Stem Information

const char *getStemName(unsigned char stemIndex);  // Index of the stem (0 to getNumberOfStems()-1)
unsigned char getNumberOfStems(); // Returns total number of available stems

Provides information about available stems.

Audio Configuration

unsigned int getNumberOfChannels();    // Returns number of audio channels (typically 2 for stereo)
unsigned int getOutputSamplerate(); // Returns the sample rate of output audio
unsigned int getFramesNeeded(); // Returns number of input frames needed for next processing chunk

Returns audio configuration details.
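
A short sketch of querying these accessors on an initialized separator (the std::cout formatting is illustrative):

std::cout << "Backend: " << separator.getBackendName() << std::endl;

// List the stems this model produces.
for (unsigned char i = 0; i < separator.getNumberOfStems(); ++i) {
    std::cout << "Stem " << static_cast<int>(i) << ": "
              << separator.getStemName(i) << std::endl;
}

std::cout << "Channels: " << separator.getNumberOfChannels()
          << ", output sample rate: " << separator.getOutputSamplerate() << " Hz"
          << ", frames needed next: " << separator.getFramesNeeded() << std::endl;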

Processing

int process(
    void **inputChannels,        // Array of pointers to input channel buffers
    unsigned int numInputFrames, // Number of frames to process
    void ***outputChannels       // Pointer to receive array of output channel buffers
);

Processes input audio and returns separated stems.

Return Values:

  • Positive integer: Number of output frames processed
  • 0: More input is needed
  • -1: Error occurred

Process Method Parameters

  • inputChannels (void**): Array of pointers to input channel buffers. Each pointer should point to a valid audio buffer. The number of channels should match getNumberOfChannels(). Can be NULL to flush remaining audio (end of file).
  • numInputFrames (unsigned int): Number of frames to process. Must be a positive integer. Should not exceed available buffer space. Typical values: 1024, 2048, 4096 frames. Affects processing latency and memory usage.
  • outputChannels (void***): Pointer to receive the array of output channel buffers. Will be populated with separated stem data. May be NULL if more input is needed. Each stem will have getNumberOfChannels() channels. Memory must be allocated by the caller.
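
The sketch below shows one possible processing loop for stereo, non-interleaved float input. readSourceAudio() is a hypothetical helper that fills the channel buffers and returns the number of frames it produced (0 at end of file); buffer handling is simplified for illustration.

std::vector<float> left, right;
void* inputChannels[2];
void** outputChannels = nullptr;

while (true) {
    // Ask the separator how many input frames it wants next.
    unsigned int needed = separator.getFramesNeeded();
    left.resize(needed);
    right.resize(needed);

    // readSourceAudio() is hypothetical; it returns 0 once the source is exhausted.
    unsigned int frames = readSourceAudio(left.data(), right.data(), needed);
    inputChannels[0] = left.data();
    inputChannels[1] = right.data();

    // Pass NULL for the input channels to flush the remaining audio at end of file.
    bool endOfInput = (frames == 0);
    int produced = separator.process(endOfInput ? nullptr : inputChannels,
                                     frames, &outputChannels);

    if (produced < 0) {
        // Error occurred; consult the error-handling methods above.
        break;
    }
    if (produced > 0 && outputChannels != nullptr) {
        // outputChannels now references the separated stem buffers holding
        // `produced` frames each; hand them to your output stage here.
    }
    if (endOfInput) {
        break; // flushed the tail, processing is complete
    }
    // produced == 0 simply means more input is needed before output is emitted.
}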

SourceSeparationTask - Primary Interface

The SourceSeparationTask class provides a high-level wrapper around AudioShakeSeparator, making it easier to perform audio source separation tasks. It handles the complexities of audio input/output management and provides progress tracking. We provide this class as a helper in our demo code.

Note: This class is only used for Apple and Windows SDK packages. Android uses the AudioShakeSeparator directly along with ExoPlayer's buffer management to demonstrate media handling. For further information or questions please contact support@audioshake.ai.

Key Components

Input Classes

  • SourceSeparationInput (Abstract base class)
    • AudioFileReader: Reads audio from files
    • RingBufferInput: Reads audio from a ring buffer

Output Classes

  • SourceSeparationOutput (Abstract base class)
    • WAVOutput: Writes stems to WAV files
    • RingBufferOutput: Writes stems to a ring buffer

Usage Example

// Optional progress callback
void progressCallback(SourceSeparationTask* task, double progress, void* clientData) {
    std::cout << "Progress: " << (progress * 100) << "%" << std::endl;
}

// Create input and output handlers
AudioFileReader* input = new AudioFileReader("input.wav");
WAVOutput* output = new WAVOutput("output_directory");

// Create the separation task
SourceSeparationTask task(
    "your_client_id",
    "your_client_secret",
    input,
    output,
    "path/to/model.crypt"
);

// Run the separation
if (!task.run(progressCallback)) {
    std::cerr << "Error: " << task.getErrorMessage() << std::endl;
}

Constructor

SourceSeparationTask(
    const char* clientID,             // Your AudioShake client identifier
    const char* clientSecret,         // Your AudioShake client secret key
    SourceSeparationInput* input,     // Input audio provider (e.g., AudioFileReader)
    SourceSeparationOutput* output,   // Output audio writer (e.g., WAVOutput)
    const PathStr& modelPath,         // Path to the .crypt model file
    void* model = nullptr,            // Optional: pre-loaded model data
    unsigned int modelSizeBytes = 0,  // Optional: size of pre-loaded model
    unsigned int additionalFlags = 0  // Optional: additional AudioShakeSeparator flags
);

Constructor Parameters

  • clientID (const char*): Your AudioShake client identifier. Required for authentication. Must be a valid string provided by AudioShake. Cannot be null or empty.
  • clientSecret (const char*): Your AudioShake client secret key. Required for secure authentication. Must be a valid string provided by AudioShake. Cannot be null or empty.
  • input (SourceSeparationInput*): Pointer to a SourceSeparationInput implementation. Manages audio input data. Can be AudioFileReader or RingBufferInput. Must be a valid, initialized instance. Ownership is transferred to the task.
  • output (SourceSeparationOutput*): Pointer to a SourceSeparationOutput implementation. Manages audio output data. Can be WAVOutput or RingBufferOutput. Must be a valid, initialized instance. Ownership is transferred to the task.
  • modelPath (const PathStr&): Path to the .crypt model file. Required if model and modelSizeBytes are not provided. Must be a valid file path. File must be readable.
  • model (void*): Optional pre-loaded model data. If provided, modelPath is ignored. Must be valid memory if provided. Should be used with modelSizeBytes.
  • modelSizeBytes (unsigned int): Size of pre-loaded model data. Required if model is provided. Must match actual model size. Ignored if model is nullptr.
  • additionalFlags (unsigned int): Additional configuration flags. Combined with the default flags. See AudioShakeSeparator flags. Optional, defaults to 0.
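
As an alternative to passing a model path, here is a sketch of constructing the task from model data that is already in memory (modelPath is ignored in that case, per the parameter descriptions above). Reading the file with std::ifstream and passing an empty path string are assumptions made for illustration.

#include <fstream>
#include <iterator>
#include <vector>

std::ifstream modelFile("path/to/model.crypt", std::ios::binary);
std::vector<char> modelData((std::istreambuf_iterator<char>(modelFile)),
                            std::istreambuf_iterator<char>());

SourceSeparationTask task(
    "your_client_id",
    "your_client_secret",
    new AudioFileReader("input.wav"),
    new WAVOutput("output_directory"),
    "",                                           // modelPath, ignored here (assumption)
    modelData.data(),                             // pre-loaded model data
    static_cast<unsigned int>(modelData.size())   // size of the model data in bytes
);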

Progress Tracking

double getProgress();  // Returns progress as a value between 0 and 1
bool isFinished(); // Returns true when all audio has been processed

Processing Methods

bool processOneIteration(
    ProgressCallback progressCallback = nullptr, // Optional callback for progress updates
    void* clientData = nullptr                   // Optional user data passed to callback
);

Processes a single chunk of audio. Returns true on success, false on error.
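
A sketch of stepping the task manually instead of calling run(), for example to interleave separation with other work on the same thread:

while (!task.isFinished()) {
    if (!task.processOneIteration()) {
        std::cerr << "Error: " << task.getErrorMessage() << std::endl;
        break;
    }
    // Poll progress between chunks (0.0 to 1.0).
    std::cout << "Progress: " << (task.getProgress() * 100) << "%" << std::endl;

    // ... do other per-chunk work here ...
}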

bool run(
    ProgressCallback progressCallback = nullptr, // Optional callback for progress updates
    void* clientData = nullptr                   // Optional user data passed to callback
);

Processes all audio until completion. Returns true on success, false on error.

Progress Callback Type

typedef void (*ProgressCallback)(
    SourceSeparationTask* task, // Pointer to the task reporting progress
    double progress,            // Progress value between 0.0 and 1.0
    void* clientData            // User data provided in processOneIteration/run
);
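
The clientData pointer can carry arbitrary user state into the callback. A small sketch; the ProgressState struct and the 5% reporting threshold are hypothetical:

struct ProgressState {
    double lastReported = 0.0; // hypothetical user state
};

void onProgress(SourceSeparationTask* task, double progress, void* clientData) {
    ProgressState* state = static_cast<ProgressState*>(clientData);
    if (progress - state->lastReported >= 0.05) { // report roughly every 5%
        std::cout << "Progress: " << (progress * 100) << "%" << std::endl;
        state->lastReported = progress;
    }
}

// Pass the state through run() (or processOneIteration()).
ProgressState state;
task.run(onProgress, &state);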

Input Classes

AudioFileReader

AudioFileReader(const char* filePath);  // Path to the input audio file
  • filePath: Path to the input audio file. Must be a valid file path. The file must exist and be readable. Supports common audio formats (WAV, MP3, etc.).

RingBufferInput

RingBufferInput(
    size_t size,              // Size of the ring buffer in frames
    bool isStereoInterleaved, // True for interleaved stereo, false for separate channels
    int sampleRate            // Sample rate of the audio data
);

  • size: Size of the ring buffer in frames. Must be a positive integer. Affects latency and memory usage. Should be a power of 2 for optimal performance.
  • isStereoInterleaved: Format of stereo data. true: [LRLRLR...] format; false: [LLL...RRR...] format. Must match the input data format.
  • sampleRate: Sample rate of the audio data. Must be a positive integer. Common values: 44100, 48000. Must match the input data sample rate.

Output Classes

WAVOutput

WAVOutput(const char* outputPath);  // Directory path for output WAV files
  • outputPath: Directory path for output WAV files. Must be a valid directory path. The directory must exist and be writable. Each stem will be saved as a separate WAV file.

RingBufferOutput

RingBufferOutput(
    size_t size,             // Size of the ring buffer in frames
    bool isStereoInterleaved // True for interleaved stereo, false for separate channels
);

  • size: Size of the ring buffer in frames. Must be a positive integer. Affects latency and memory usage. Should be a power of 2 for optimal performance.
  • isStereoInterleaved: Format of stereo data. true: [LRLRLR...] format; false: [LLL...RRR...] format. Must match the desired output format.
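
A sketch of wiring a task to ring buffers for streaming use. Only the constructors documented above are used; how audio is actually pushed into the input buffer and pulled from the output buffer follows the ring-buffer interface shipped with the demo code and is not shown here. The buffer sizes and sample rate are illustrative.

RingBufferInput* input = new RingBufferInput(
    16384,   // frames; a power of 2 is recommended
    false,   // non-interleaved channels ([LLL...RRR...])
    48000    // sample rate of the incoming audio
);
RingBufferOutput* output = new RingBufferOutput(
    16384,   // frames
    false    // non-interleaved channels
);

// Ownership of input and output is transferred to the task.
SourceSeparationTask task(
    "your_client_id",
    "your_client_secret",
    input,
    output,
    "path/to/model.crypt"
);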

Performance Considerations

Optimization Tips

  1. Chunk Size: Use appropriate chunk sizes based on your latency requirements
  2. GPU Processing: Consider using GPU processing for better performance
  3. Memory Layout: Use non-interleaved format for better memory access patterns
  4. Memory Management: Process audio in chunks to manage memory usage

Error Handling

The SDK provides error information through:

  1. Constructor initialization errors
  2. Process method return values
  3. Backend-specific error messages

Platform Support

Supported Platforms

  • Linux (x86_64)
  • Windows
  • Android (with platform-specific optimizations)
  • Apple (iOS/macOS)

Platform-Specific Considerations

  • Linux: Full GPU support with CUDA
  • Windows: DirectX support for GPU processing
  • Android: Optimized for mobile GPUs
  • Apple: Metal and Neural Engine support

Security

The SDK implements several security features:

  • Encrypted model loading
  • Secure client authentication
  • Protected API access
  • Date-based expiry

This documentation provides a comprehensive overview of the AudioShake SDK. For specific implementation details or advanced usage scenarios, please contact support@audioshake.ai.