AudioShake SDK Documentation

Overview

The AudioShake SDK is a powerful audio processing library that enables high-quality audio source separation. It provides a flexible API for separating audio into multiple stems (e.g., vocals, drums, bass, and other instruments) with support for both CPU and GPU processing.

To access the AudioShake SDK and obtain your Client ID and Client Secret, please contact info@audioshake.ai.

System Requirements

Each of our SDKs has different requirements depending on the platform it targets. Below are the hardware and software requirements for each platform.

Hardware Requirements

Linux

  • Architecture: x86_64
  • For GPU processing:
    • CUDA 11.8 compatible GPU
    • Minimum 4GB VRAM (T4 equivalent recommended)
    • Performance scales with GPU capabilities

Windows

  • Architecture: x86_64
  • For GPU processing:
    • DirectX 12 compatible GPU
    • Minimum 4GB VRAM (T4 equivalent recommended)
    • Performance scales with GPU capabilities

Android

  • Architecture: ARM64
  • For GPU processing:
    • OpenGL ES 3.1+ compatible GPU
    • Minimum 2GB VRAM
    • Performance scales with GPU capabilities

Apple (iOS/macOS)

  • Architecture: Apple Silicon (Apple ARM) or Intel x86_64
  • Neural Engine: Optional
  • For GPU processing:
    • Metal-compatible GPU
    • Minimum 2GB VRAM
    • Performance scales with GPU capabilities

Software Dependencies

  • Linux
    • OpenSSL
    • libsndfile
    • CMake 3.22.1 or higher
    • For GPU processing: CUDA 11.8
  • Windows
    • OpenSSL
    • CMake 3.22.1 or higher
    • For GPU processing: DirectX 12 SDK
  • Android: No dependencies needed
  • Apple (iOS/macOS): No dependencies needed

Quick Start Guide

Running the demo application

Each SDK zip package contains a README.md with instructions for integrating and running the provided demo application. The demo is a simple example of how to use the SDK on that platform. For help with more sophisticated integration flows, please contact support@audioshake.ai for assistance from one of our engineers.

API Reference

This section describes the core classes and data structures of our SDK so that you can understand how to best use them.

AudioShakeSeparator Class

The AudioShakeSeparator class is the foundation class for running inference on the device. Below is detailed information about its parameters and configuration options.

Constructor

AudioShakeSeparator(
    const char *clientID,         // Your AudioShake client identifier
    const char *clientSecret,     // Your AudioShake client secret key
    void *model,                  // Pointer to the encrypted model data in memory
    unsigned int modelSizeBytes,  // Size of the model data in bytes
    unsigned int inputSamplerate, // Sample rate of input audio (e.g., 44100 for CD quality)
    unsigned int flags = useFastestBackend | inputFloat | inputNonInterleaved | outputFloat | outputNonInterleaved | chunkNormal
);

Parameters

  • clientID (const char*): Your unique AudioShake client identifier. Required for authentication. Must be a valid string provided by AudioShake. Cannot be null or empty.
  • clientSecret (const char*): Your AudioShake client secret key. Required for secure authentication. Must be a valid string provided by AudioShake. Cannot be null or empty.
  • model (void*): Pointer to the encrypted model data in memory. Must be a valid memory address. Should contain the complete model data. Must match the size specified in modelSizeBytes.
  • modelSizeBytes (unsigned int): Size of the model data in bytes. Must be greater than 0. Must match the actual size of the model data. Used to validate model integrity.
  • inputSamplerate (unsigned int): Sample rate of the input audio in Hz. Common values: 44100 (CD quality), 48000 (professional audio). Must be a positive integer. Will be automatically resampled if needed.
  • flags (unsigned int): Combination of configuration flags. Default: useFastestBackend | inputFloat | inputNonInterleaved | outputFloat | outputNonInterleaved | chunkNormal. Can be customized based on your needs.
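
For reference, a minimal construction sketch is shown below. Loading the encrypted model with std::ifstream, the file path, and the 44.1 kHz input rate are illustrative assumptions; the default flags from the constructor signature above are used.

#include <fstream>
#include <iostream>
#include <iterator>
#include <vector>

// Read the encrypted model file into memory (the path is an assumption).
std::ifstream modelFile("path/to/model.crypt", std::ios::binary);
std::vector<char> modelData((std::istreambuf_iterator<char>(modelFile)),
                            std::istreambuf_iterator<char>());

// Construct the separator with the default flags.
AudioShakeSeparator separator(
    "your_client_id",
    "your_client_secret",
    modelData.data(),
    static_cast<unsigned int>(modelData.size()),
    44100
);

// Check that initialization succeeded (see Error Handling below).
if (const char* error = separator.getInitializationError()) {
    std::cerr << "Initialization failed: " << error << std::endl;
}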

Configuration Flags

The SDK provides several configuration flags to customize processing; a combined example follows the lists below:

Backend Options:

  • useFastestBackend: Uses fastest available hardware (default)
  • useCPUBackendOnly: Forces CPU-only processing

Input Format Options:

  • inputFloat: 32-bit floating point input (default)
  • inputInt16: 16-bit signed integer input
  • inputNonInterleaved: Separate channel buffers (default)
  • inputInterleavedStereo: Interleaved stereo format

Output Format Options:

  • outputFloat: 32-bit floating point output (default)
  • outputInt16: 16-bit signed integer output
  • outputNonInterleaved: Separate channel buffers (default)
  • outputInterleavedStereo: Interleaved stereo format

Processing Chunk Size Options:

  • chunkNormal: Default chunk size (~3 seconds)
  • chunk2X: Half chunk size (~1.5 seconds)
  • chunk4X: Quarter chunk size (~0.75 seconds)
  • chunk8X: Eighth chunk size (~0.375 seconds)
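
For example, here is a minimal sketch that combines flags to force CPU-only processing of interleaved 16-bit input with quarter-size chunks; the client credentials and the modelData buffer are assumed to have been loaded as in the earlier sketch.

// Force CPU processing, accept interleaved 16-bit stereo input, and emit
// non-interleaved float output in quarter-size chunks for lower latency.
unsigned int flags = useCPUBackendOnly
                   | inputInt16 | inputInterleavedStereo
                   | outputFloat | outputNonInterleaved
                   | chunk4X;

AudioShakeSeparator separator(
    "your_client_id",
    "your_client_secret",
    modelData.data(),                             // loaded as in the earlier sketch
    static_cast<unsigned int>(modelData.size()),
    48000,
    flags
);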

Key Methods

Error Handling

const char *getInitializationError();

Returns initialization error message or NULL if successful.

Backend Information

const char *getBackendName();

Returns the name of the ML engine and hardware being used (e.g., "LiteRT GPU" or "CPU").

Stem Information

const char *getStemName(unsigned char stemIndex);  // Index of the stem (0 to getNumberOfStems()-1)
unsigned char getNumberOfStems(); // Returns total number of available stems

Provides information about available stems.

Audio Configuration

unsigned int getNumberOfChannels();    // Returns number of audio channels (typically 2 for stereo)
unsigned int getOutputSamplerate(); // Returns the sample rate of output audio
unsigned int getFramesNeeded(); // Returns number of input frames needed for next processing chunk

Returns audio configuration details.
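
A short sketch of querying these accessors on an initialized separator (the std::cout formatting is illustrative):

std::cout << "Backend: " << separator.getBackendName() << std::endl;

// List the stems this model produces.
for (unsigned char i = 0; i < separator.getNumberOfStems(); ++i) {
    std::cout << "Stem " << static_cast<int>(i) << ": "
              << separator.getStemName(i) << std::endl;
}

std::cout << "Channels: " << separator.getNumberOfChannels()
          << ", output sample rate: " << separator.getOutputSamplerate() << " Hz"
          << ", frames needed next: " << separator.getFramesNeeded() << std::endl;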

Processing

int process(
    void **inputChannels,        // Array of pointers to input channel buffers
    unsigned int numInputFrames, // Number of frames to process
    void ***outputChannels       // Pointer to receive array of output channel buffers
);

Processes input audio and returns separated stems.

Return Values:

  • Positive integer: Number of output frames processed
  • 0: More input is needed
  • -1: Error occurred

Process Method Parameters

  • inputChannels (void**): Array of pointers to input channel buffers. Each pointer should point to a valid audio buffer. The number of channels should match getNumberOfChannels(). Can be NULL to flush remaining audio (end of file).
  • numInputFrames (unsigned int): Number of frames to process. Must be a positive integer. Should not exceed available buffer space. Typical values: 1024, 2048, 4096 frames. Affects processing latency and memory usage.
  • outputChannels (void***): Pointer to receive the array of output channel buffers. Will be populated with separated stem data. May be NULL if more input is needed. Each stem will have getNumberOfChannels() channels. Memory must be allocated by the caller.
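
The sketch below shows one possible processing loop for stereo, non-interleaved float input. readSourceAudio() is a hypothetical helper that fills the channel buffers and returns the number of frames it produced (0 at end of file); buffer handling is simplified for illustration.

std::vector<float> left, right;
void* inputChannels[2];
void** outputChannels = nullptr;

while (true) {
    // Ask the separator how many input frames it wants next.
    unsigned int needed = separator.getFramesNeeded();
    left.resize(needed);
    right.resize(needed);

    // readSourceAudio() is hypothetical; it returns 0 once the source is exhausted.
    unsigned int frames = readSourceAudio(left.data(), right.data(), needed);
    inputChannels[0] = left.data();
    inputChannels[1] = right.data();

    // Pass NULL for the input channels to flush the remaining audio at end of file.
    bool endOfInput = (frames == 0);
    int produced = separator.process(endOfInput ? nullptr : inputChannels,
                                     frames, &outputChannels);

    if (produced < 0) {
        // Error occurred; consult the error-handling methods above.
        break;
    }
    if (produced > 0 && outputChannels != nullptr) {
        // outputChannels now references the separated stem buffers holding
        // `produced` frames each; hand them to your output stage here.
    }
    if (endOfInput) {
        break; // flushed the tail, processing is complete
    }
    // produced == 0 simply means more input is needed before output is emitted.
}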

SourceSeparationTask - Primary Interface

The SourceSeparationTask class provides a high-level wrapper around AudioShakeSeparator, making it easier to perform audio source separation tasks. It handles the complexities of audio input/output management and provides progress tracking. We provide this class as a helper in our demo code.

Note: This class is only used for Apple and Windows SDK packages. Android uses the AudioShakeSeparator directly along with ExoPlayer's buffer management to demonstrate media handling. For further information or questions please contact support@audioshake.ai.

Key Components

Input Classes

  • SourceSeparationInput (Abstract base class)
    • AudioFileReader: Reads audio from files
    • RingBufferInput: Reads audio from a ring buffer

Output Classes

  • SourceSeparationOutput (Abstract base class)
    • WAVOutput: Writes stems to WAV files
    • RingBufferOutput: Writes stems to a ring buffer

Usage Example

// Optional progress callback
void progressCallback(SourceSeparationTask* task, double progress, void* clientData) {
    std::cout << "Progress: " << (progress * 100) << "%" << std::endl;
}

// Create input and output handlers
AudioFileReader* input = new AudioFileReader("input.wav");
WAVOutput* output = new WAVOutput("output_directory");

// Create the separation task
SourceSeparationTask task(
    "your_client_id",
    "your_client_secret",
    input,
    output,
    "path/to/model.crypt"
);

// Run the separation
if (!task.run(progressCallback)) {
    std::cerr << "Error: " << task.getErrorMessage() << std::endl;
}

Constructor

SourceSeparationTask(
    const char* clientID,             // Your AudioShake client identifier
    const char* clientSecret,         // Your AudioShake client secret key
    SourceSeparationInput* input,     // Input audio provider (e.g., AudioFileReader)
    SourceSeparationOutput* output,   // Output audio writer (e.g., WAVOutput)
    const PathStr& modelPath,         // Path to the .crypt model file
    void* model = nullptr,            // Optional: pre-loaded model data
    unsigned int modelSizeBytes = 0,  // Optional: size of pre-loaded model
    unsigned int additionalFlags = 0  // Optional: additional AudioShakeSeparator flags
);

Constructor Parameters

  • clientID (const char*): Your AudioShake client identifier. Required for authentication. Must be a valid string provided by AudioShake. Cannot be null or empty.
  • clientSecret (const char*): Your AudioShake client secret key. Required for secure authentication. Must be a valid string provided by AudioShake. Cannot be null or empty.
  • input (SourceSeparationInput*): Pointer to a SourceSeparationInput implementation. Manages audio input data. Can be AudioFileReader or RingBufferInput. Must be a valid, initialized instance. Ownership is transferred to the task.
  • output (SourceSeparationOutput*): Pointer to a SourceSeparationOutput implementation. Manages audio output data. Can be WAVOutput or RingBufferOutput. Must be a valid, initialized instance. Ownership is transferred to the task.
  • modelPath (const PathStr&): Path to the .crypt model file. Required if model and modelSizeBytes are not provided. Must be a valid file path. File must be readable.
  • model (void*): Optional pre-loaded model data. If provided, modelPath is ignored. Must be valid memory if provided. Should be used with modelSizeBytes.
  • modelSizeBytes (unsigned int): Size of pre-loaded model data. Required if model is provided. Must match actual model size. Ignored if model is nullptr.
  • additionalFlags (unsigned int): Additional configuration flags. Combined with the default flags. See AudioShakeSeparator flags. Optional, defaults to 0.
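
As an alternative to passing a model path, here is a sketch of constructing the task from model data that is already in memory (modelPath is ignored in that case, per the parameter descriptions above). Reading the file with std::ifstream and passing an empty path string are assumptions made for illustration.

#include <fstream>
#include <iterator>
#include <vector>

std::ifstream modelFile("path/to/model.crypt", std::ios::binary);
std::vector<char> modelData((std::istreambuf_iterator<char>(modelFile)),
                            std::istreambuf_iterator<char>());

SourceSeparationTask task(
    "your_client_id",
    "your_client_secret",
    new AudioFileReader("input.wav"),
    new WAVOutput("output_directory"),
    "",                                           // modelPath, ignored here (assumption)
    modelData.data(),                             // pre-loaded model data
    static_cast<unsigned int>(modelData.size())   // size of the model data in bytes
);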

Progress Tracking

double getProgress();  // Returns progress as a value between 0 and 1
bool isFinished(); // Returns true when all audio has been processed

Processing Methods

bool processOneIteration(
    ProgressCallback progressCallback = nullptr, // Optional callback for progress updates
    void* clientData = nullptr                   // Optional user data passed to callback
);

Processes a single chunk of audio. Returns true on success, false on error.
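
A sketch of stepping the task manually instead of calling run(), for example to interleave separation with other work on the same thread:

while (!task.isFinished()) {
    if (!task.processOneIteration()) {
        std::cerr << "Error: " << task.getErrorMessage() << std::endl;
        break;
    }
    // Poll progress between chunks (0.0 to 1.0).
    std::cout << "Progress: " << (task.getProgress() * 100) << "%" << std::endl;

    // ... do other per-chunk work here ...
}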

bool run(
    ProgressCallback progressCallback = nullptr, // Optional callback for progress updates
    void* clientData = nullptr                   // Optional user data passed to callback
);

Processes all audio until completion. Returns true on success, false on error.

Progress Callback Type

typedef void (*ProgressCallback)(
    SourceSeparationTask* task, // Pointer to the task reporting progress
    double progress,            // Progress value between 0.0 and 1.0
    void* clientData            // User data provided in processOneIteration/run
);
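
The clientData pointer can carry arbitrary user state into the callback. A small sketch; the ProgressState struct and the 5% reporting threshold are hypothetical:

struct ProgressState {
    double lastReported = 0.0; // hypothetical user state
};

void onProgress(SourceSeparationTask* task, double progress, void* clientData) {
    ProgressState* state = static_cast<ProgressState*>(clientData);
    if (progress - state->lastReported >= 0.05) { // report roughly every 5%
        std::cout << "Progress: " << (progress * 100) << "%" << std::endl;
        state->lastReported = progress;
    }
}

// Pass the state through run() (or processOneIteration()).
ProgressState state;
task.run(onProgress, &state);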

Input Classes

AudioFileReader

AudioFileReader(const char* filePath);  // Path to the input audio file
  • filePath: Path to the input audio file. Must be a valid file path. The file must exist and be readable. Supports common audio formats (WAV, MP3, etc.).

RingBufferInput

RingBufferInput(
    size_t size,              // Size of the ring buffer in frames
    bool isStereoInterleaved, // True for interleaved stereo, false for separate channels
    int sampleRate            // Sample rate of the audio data
);

  • size: Size of the ring buffer in frames. Must be a positive integer. Affects latency and memory usage. Should be a power of 2 for optimal performance.
  • isStereoInterleaved: Format of stereo data. true: [LRLRLR...] format; false: [LLL...RRR...] format. Must match the input data format.
  • sampleRate: Sample rate of the audio data. Must be a positive integer. Common values: 44100, 48000. Must match the input data sample rate.

Output Classes

WAVOutput

WAVOutput(const char* outputPath);  // Directory path for output WAV files
  • outputPath: Directory path for output WAV files. Must be a valid directory path. The directory must exist and be writable. Each stem will be saved as a separate WAV file.

RingBufferOutput

RingBufferOutput(
    size_t size,             // Size of the ring buffer in frames
    bool isStereoInterleaved // True for interleaved stereo, false for separate channels
);

  • size: Size of the ring buffer in frames. Must be a positive integer. Affects latency and memory usage. Should be a power of 2 for optimal performance.
  • isStereoInterleaved: Format of stereo data. true: [LRLRLR...] format; false: [LLL...RRR...] format. Must match the desired output format.
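
A sketch of wiring a task to ring buffers for streaming use. Only the constructors documented above are used; how audio is actually pushed into the input buffer and pulled from the output buffer follows the ring-buffer interface shipped with the demo code and is not shown here. The buffer sizes and sample rate are illustrative.

RingBufferInput* input = new RingBufferInput(
    16384,   // frames; a power of 2 is recommended
    false,   // non-interleaved channels ([LLL...RRR...])
    48000    // sample rate of the incoming audio
);
RingBufferOutput* output = new RingBufferOutput(
    16384,   // frames
    false    // non-interleaved channels
);

// Ownership of input and output is transferred to the task.
SourceSeparationTask task(
    "your_client_id",
    "your_client_secret",
    input,
    output,
    "path/to/model.crypt"
);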

Performance Considerations

Optimization Tips

  1. Chunk Size: Use appropriate chunk sizes based on your latency requirements
  2. GPU Processing: Consider using GPU processing for better performance
  3. Memory Layout: Use non-interleaved format for better memory access patterns
  4. Memory Management: Process audio in chunks to manage memory usage

Error Handling

The SDK provides error information through:

  1. Constructor initialization errors
  2. Process method return values
  3. Backend-specific error messages

Platform Support

Supported Platforms

  • Linux (x86_64)
  • Windows
  • Android (with platform-specific optimizations)
  • Apple (iOS/macOS)

Platform-Specific Considerations

  • Linux: Full GPU support with CUDA
  • Windows: DirectX support for GPU processing
  • Android: Optimized for mobile GPUs
  • Apple: Metal and Neural Engine support

Security

The SDK implements several security features:

  • Encrypted model loading
  • Secure client authentication
  • Protected API access
  • Date-based expiry

This documentation provides a comprehensive overview of the AudioShake SDK. For specific implementation details or advanced usage scenarios, please contact support@audioshake.ai.