Linux Integration

The libAudioShakeSDK.so library provides audio source separation. Its API accepts several types of input and output for processing audio data. The key class is SourceSeparationTask, which works with the input and output classes to read and write audio data.

The Linux library enables the separation of audio into stems via on-device or in-cloud processing.

Specs

  • x86_64 Linux
  • The base library requires libtorch, OpenSSL, and libsndfile.
  • GPU requirement: the library works with the CUDA 11.8 build of libtorch and needs adequate VRAM (about 4 GB, though possibly less). Depending on your performance needs, you may need a T4-equivalent GPU.
  • A Docker image with all required dependencies is included. Base image: ubuntu:20.04
  • The demo project uses CMake.
  • The current model processes float32 stereo 44.1kHz audio data and produces 4 output stems (drums, bass, vocals, other).
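
As a rough sizing aid for the format in the last item, the sketch below computes how much memory a clip occupies as float32 stereo 44.1 kHz input and as four output stems. It assumes each stem is also stereo float32 at the same sample rate, which the spec implies but does not state. The function names are illustrative and not part of the SDK.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Format constants taken from the spec list above.
constexpr std::uint64_t kSampleRate = 44100;  // Hz
constexpr std::uint64_t kChannels = 2;        // stereo
constexpr std::uint64_t kStems = 4;           // drums, bass, vocals, other

// Number of frames (one sample per channel) in `seconds` of audio.
inline std::uint64_t framesFor(std::uint64_t seconds) {
    assert(seconds <= UINT64_MAX / kSampleRate);  // guard against overflow
    return seconds * kSampleRate;
}

// Bytes needed to hold the input as float32 samples.
inline std::uint64_t inputBytes(std::uint64_t seconds) {
    return framesFor(seconds) * kChannels * sizeof(float);
}

// Bytes needed for all stems, ASSUMING each stem is stereo float32
// at the same sample rate as the input.
inline std::uint64_t outputBytes(std::uint64_t seconds) {
    return inputBytes(seconds) * kStems;
}
```

Under these assumptions, a 3-minute clip needs about 63.5 MB for the input and about 254 MB for the four stems, which is worth knowing before choosing between the memory, file, and queue classes.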

Running AudioShakeDemoCLI

  • Navigate to the SDK's directory in the terminal.
  • Build the Docker image: docker build -t audioshake-demo-cli .
  • Run the Docker container: docker run --rm -it -v "$(pwd):/app" audioshake-demo-cli
  • Navigate to the demo app directory inside the container: cd /app/demo/AudioShakeDemoCLI/
  • Build with cmake: cmake -S . -B build && cmake --build build
  • Copy an audio file (e.g. input.mp3) into the AudioShakeDemoCLI directory.
  • Run the app: build/AudioShakeDemoCLI /app/model/model.crypt input.mp3 . output
  • Find the stem audio files in the output directory.

Integration

  • Link against the libAudioShakeSDK.so file.
  • C++ headers are provided.
  • The SourceSeparationTask class performs the source separation. It takes one input and one output.
  • There are three input sources: MemoryInput, FileInput, and QueueInput.
  • There are three output destinations: MemoryOutput, FileOutput, and QueueOutput.
  • The memory input and output process audio held in an in-memory buffer.
  • The file input and output use libsndfile to decode and encode the most popular audio file formats.
  • The queue input and output use a buffer and can stream audio through the source separator.

The following code example separates an audio file and saves the output files to a directory:

void runSourceSeparation(const char* modelFilePath, const char* inputFilePath, const char* outputDirPath, const char* outputFileName) {
    // Decode the input file with libsndfile.
    SndFileAudioFileStreamReader audioFileStreamReader;
    FileInput fileInput(inputFilePath, audioFileStreamReader);

    // Write the stems with libsndfile; 44100 and 2 match the model's 44.1 kHz stereo format.
    SndFileStemFileWriter stemFileWriter(44100, 2);
    FileOutput fileOutput(outputFileName, outputDirPath, 44100, stemFileWriter);

    // Run on the CPU; run() continues until all input frames are processed.
    const ComputeDevice computeDevice = ComputeDevice::ComputeDeviceCPU;
    SourceSeparationTask task(modelFilePath, computeDevice, &fileInput, &fileOutput);
    task.run();
}

Performance Benchmark

When profiled on 3-minute audio on a T4 using libtorch:

  • CUDA 11.8: 37x realtime
  • CPU (4 CPUs): 2.77x realtime
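
To turn these factors into wall-clock estimates, the sketch below divides the audio duration by the realtime factor. It assumes "Nx realtime" means N seconds of audio processed per second of compute. The function name is illustrative and not part of the SDK.

```cpp
#include <cassert>

// Estimated wall-clock seconds needed to process `audioSeconds` of audio,
// ASSUMING a realtime factor of N means N seconds of audio per second of compute.
inline double processingSeconds(double audioSeconds, double realtimeFactor) {
    assert(realtimeFactor > 0.0);
    return audioSeconds / realtimeFactor;
}
```

Under that reading, the 3-minute benchmark clip takes roughly 4.9 seconds on the T4 with CUDA and roughly 65 seconds on 4 CPUs.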

Conclusion

The provided C++ headers and the AudioShake SDK for Linux library make it straightforward to integrate source separation into your projects. If you have any technical questions about the Linux SDK, please contact us at support@audioshake.ai.

Class Descriptions

SourceSeparationTask

The SourceSeparationTask class is responsible for performing source separation on audio data. It takes an input source and an output destination, processes the input to separate different audio sources, and writes the processed data to the output destination.

Constructor:

  • SourceSeparationTask(const char* modelPath, ComputeDevice computeDevice, SourceSeparationInput* input, SourceSeparationOutput* output, StemConfiguration stemConfiguration): Creates a new source separation task.
    • modelPath: The path to the model.
    • computeDevice: The device to run the model on.
    • input: The input for the task.
    • output: The output for the task.
    • stemConfiguration: The configuration of the stems. Optional; if omitted, the default configuration is used.

Key Methods:

  • void run(): The task will read audio data from the input, process it, and write the separated sources to the output. The task will run until the end of the input is reached and all audio frames are processed.

SourceSeparationQueue

The SourceSeparationQueue class is responsible for performing source separation on audio data using a queue-based approach.

Constructor:

  • SourceSeparationQueue(const char* modelFilePath, ComputeDevice computeDevice): Creates a new source separation queue.
    • modelFilePath: The path to the model.
    • computeDevice: The device to run the model on.

Key Methods:

  • bool pushAudioFrames(const float* const* inputBuffer, unsigned int numberOfFrames, bool endOfInput): Pushes audio frames into the queue. inputBuffer contains the audio data, numberOfFrames specifies how many frames are being pushed, and endOfInput indicates whether this is the last chunk of input data.
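
As an illustration of how endOfInput might be used when streaming, the sketch below feeds a buffer to a push function in fixed-size chunks and sets the flag only on the final chunk. It assumes inputBuffer is an array of per-channel pointers (planar layout), which the documentation does not confirm. PushFn and pushInChunks are hypothetical names; PushFn stands in for pushAudioFrames so the block compiles without the SDK headers.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical stand-in for SourceSeparationQueue::pushAudioFrames, using the
// documented parameter order. Bind it to the real method in actual code.
using PushFn = std::function<bool(const float* const* inputBuffer,
                                  unsigned int numberOfFrames,
                                  bool endOfInput)>;

// Feeds `totalFrames` of audio to `push` in chunks of at most `chunkFrames`,
// setting endOfInput on the last chunk only. Returns false as soon as a push
// is rejected. ASSUMED layout: channels[c] points at the samples of channel c.
inline bool pushInChunks(const std::vector<const float*>& channels,
                         std::size_t totalFrames,
                         unsigned int chunkFrames,
                         const PushFn& push) {
    assert(chunkFrames > 0 && totalFrames > 0);
    std::vector<const float*> cursor(channels.size());
    for (std::size_t offset = 0; offset < totalFrames; offset += chunkFrames) {
        const std::size_t n =
            std::min<std::size_t>(chunkFrames, totalFrames - offset);
        // Advance every channel pointer to the start of this chunk.
        for (std::size_t c = 0; c < channels.size(); ++c) {
            cursor[c] = channels[c] + offset;
        }
        const bool last = (offset + n == totalFrames);
        if (!push(cursor.data(), static_cast<unsigned int>(n), last)) {
            return false;
        }
    }
    return true;
}
```

With the real SDK, the callback could be a lambda that forwards its three arguments to pushAudioFrames on a SourceSeparationQueue instance.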

AudioFileStreamReader (Abstract Class)

AudioFileStreamReader is an abstract base class for different file readers. It defines the interface for reading audio data from a file.

NOTE: Our example uses SndFileAudioFileStreamReader, an implementation of this class that reads files with libsndfile. You can subclass AudioFileStreamReader to use any other sound file reader you prefer.

Key Methods:

  • bool readAudioFrames(float** buffer, size_t inputBufferSize, size_t* numberOfFramesRead, bool* endOfInput) override: Reads audio data into the provided buffer.
    • numberOfFramesRead: The number of frames read into the buffer (out).
    • endOfInput: True if the end of the input has been reached (out).

SourceSeparationInput (Abstract Class)

SourceSeparationInput is an abstract base class for different input sources. It defines the interface for reading audio data.

Key Methods:

  • virtual bool readAudioFrames(float** buffer, size_t bufferSize, size_t* numberOfFramesRead, bool* endOfInput) = 0: Reads audio data into the provided buffer. Returns true if the read operation was successful, false otherwise.
    • numberOfFramesRead: The number of frames read into the buffer (out).
    • endOfInput: True if the end of the input has been reached (out).
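
To show how the two out-parameters fit together, the sketch below implements readAudioFrames for audio that is already in memory. The base class is a local stand-in that mirrors only the documented method so the block compiles on its own; the real class comes from the SDK headers and may declare further members. VectorInput is a hypothetical name, and the block assumes that buffer[c] points at channel c and that bufferSize is counted in frames.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Local stand-in mirroring the documented pure virtual method.
// In real code, include the SDK header instead of declaring this.
class SourceSeparationInput {
public:
    virtual ~SourceSeparationInput() = default;
    virtual bool readAudioFrames(float** buffer, std::size_t bufferSize,
                                 std::size_t* numberOfFramesRead,
                                 bool* endOfInput) = 0;
};

// Hypothetical input that serves stereo audio held in two vectors.
// ASSUMPTIONS: buffer[c] points at channel c; bufferSize is in frames.
class VectorInput : public SourceSeparationInput {
public:
    VectorInput(std::vector<float> left, std::vector<float> right)
        : left_(std::move(left)), right_(std::move(right)) {
        assert(left_.size() == right_.size());
    }

    bool readAudioFrames(float** buffer, std::size_t bufferSize,
                         std::size_t* numberOfFramesRead,
                         bool* endOfInput) override {
        if (!buffer || !numberOfFramesRead || !endOfInput) return false;
        // Copy as many frames as fit, up to what is left.
        const std::size_t n = std::min(bufferSize, left_.size() - position_);
        std::copy_n(left_.data() + position_, n, buffer[0]);
        std::copy_n(right_.data() + position_, n, buffer[1]);
        position_ += n;
        *numberOfFramesRead = n;
        *endOfInput = (position_ == left_.size());
        return true;
    }

private:
    std::vector<float> left_;
    std::vector<float> right_;
    std::size_t position_ = 0;
};
```

A caller would invoke readAudioFrames repeatedly and stop once endOfInput is set, using numberOfFramesRead to handle a short final read.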

MemoryInput (Derived from SourceSeparationInput)

MemoryInput reads audio data from a memory buffer.

Constructor:

  • MemoryInput(size_t size, int sampleRate, unsigned int numberOfChannels = 2): Initializes the input with the size of the data, the sample rate, and the channel count (2 by default).

FileInput (Derived from SourceSeparationInput)

FileInput reads audio data from a file.

Constructor:

  • FileInput(const char* filePath, AudioFileStreamReader& audioFileStreamReader): Initializes the input with the path to the audio file and an associated file stream reader.

QueueInput (Derived from SourceSeparationInput)

QueueInput reads audio data from a buffer queue, useful for streaming audio.

Constructor:

  • QueueInput(size_t size): Initializes the input with the size of its ring buffer.

Key Methods:

  • bool push(float* const* buffer, unsigned int numberOfFrames, bool endOfInput): Pushes audio data from the provided buffer into the queue. Returns true if the push operation was successful, false otherwise.

StemFileWriter (Abstract Class)

StemFileWriter is an abstract base class for stem file writers. It defines the interface for writing audio data to stem files.

NOTE: Our example uses SndFileStemFileWriter, an implementation of this class that writes stem files with libsndfile. You can subclass StemFileWriter to use any other sound file writer you prefer.

Key Methods:

  • virtual bool writeToStemFiles(float*** buffer, size_t numberOfFrames) = 0: Writes audio data from the provided buffer to multiple stem files. The buffer is a multidimensional array.
    • numberOfFrames: The number of frames in the buffer.
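
As a sketch of what a custom writer could look like, the block below overrides writeToStemFiles and collects each stem as interleaved samples, the form many encoders expect. The base class is a local stand-in mirroring only the documented method; the real one comes from the SDK headers. InterleavingWriter is a hypothetical name, and the buffer[stem][channel][frame] layout is an assumption that the documentation does not spell out.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Local stand-in mirroring the documented pure virtual method.
// In real code, include the SDK header instead of declaring this.
class StemFileWriter {
public:
    virtual ~StemFileWriter() = default;
    virtual bool writeToStemFiles(float*** buffer, std::size_t numberOfFrames) = 0;
};

// Hypothetical writer that appends each stem to its own vector as
// interleaved samples (L R L R ...), ready to hand to an encoder.
// ASSUMED layout: buffer[stem][channel][frame].
class InterleavingWriter : public StemFileWriter {
public:
    InterleavingWriter(std::size_t stems, std::size_t channels)
        : channels_(channels), stems_(stems) {
        assert(stems > 0 && channels > 0);
    }

    bool writeToStemFiles(float*** buffer, std::size_t numberOfFrames) override {
        if (!buffer) return false;
        for (std::size_t s = 0; s < stems_.size(); ++s) {
            for (std::size_t f = 0; f < numberOfFrames; ++f) {
                for (std::size_t c = 0; c < channels_; ++c) {
                    stems_[s].push_back(buffer[s][c][f]);
                }
            }
        }
        return true;
    }

    // Interleaved samples collected so far for stem `s`.
    const std::vector<float>& stem(std::size_t s) const { return stems_.at(s); }

private:
    std::size_t channels_;
    std::vector<std::vector<float>> stems_;
};
```

A real implementation would pass the collected samples to its encoder of choice instead of keeping them in memory.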

SourceSeparationOutput (Abstract Class)

SourceSeparationOutput is an abstract base class for different output destinations. It defines the interface for writing audio data.

Key Methods:

  • virtual bool writeAudioFrames(float*** buffer, size_t numberOfFrames) override: Writes audio data from the provided buffer.

MemoryOutput (Derived from SourceSeparationOutput)

MemoryOutput writes audio data to a memory buffer.

Constructor:

  • MemoryOutput(size_t size): Initializes the output with the size of the buffer.

FileOutput (Derived from SourceSeparationOutput)

FileOutput writes audio data to a file.

Constructor:

  • FileOutput(const char* fileName, const char* destinationPath, int sampleRate, StemFileWriter& stemFileWriter): Initializes the output with the file name, destination directory, sample rate, and stem file writer.
    • destinationPath: Path to destination directory
    • sampleRate: The sample rate of the audio file
    • stemFileWriter: The stem file writer used to write the audio to file

Key Methods:

  • virtual bool writeAudioFrames(float*** buffer, size_t numberOfFrames) override: Writes data from the provided buffer to the file.

QueueOutput (Derived from SourceSeparationOutput)

QueueOutput writes audio data to a queue, useful for streaming audio.

Constructor:

  • QueueOutput(size_t size): Initializes the output with a size.

Key Methods:

  • bool pull(float*** buffer, unsigned int numberOfFrames): Pulls audio data from the queue to the provided buffer.
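
Because pull takes a float*** buffer, the caller has to supply a three-level pointer structure. The sketch below shows one way to build it on top of a single contiguous allocation. StemBuffer is a hypothetical helper, not part of the SDK, and the view[stem][channel][frame] layout is an assumption that should be checked against the SDK headers.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Owns storage for stems x channels x frames samples and exposes it
// as float***. ASSUMED layout: view[stem][channel][frame].
class StemBuffer {
public:
    StemBuffer(std::size_t stems, std::size_t channels, std::size_t frames)
        : samples_(stems * channels * frames, 0.0f),
          channelPtrs_(stems * channels),
          stemPtrs_(stems) {
        assert(stems > 0 && channels > 0 && frames > 0);
        for (std::size_t s = 0; s < stems; ++s) {
            for (std::size_t c = 0; c < channels; ++c) {
                // Each channel gets its own run of `frames` samples.
                channelPtrs_[s * channels + c] =
                    samples_.data() + (s * channels + c) * frames;
            }
            stemPtrs_[s] = channelPtrs_.data() + s * channels;
        }
    }

    // Copying would leave the pointer tables aimed at the old storage.
    StemBuffer(const StemBuffer&) = delete;
    StemBuffer& operator=(const StemBuffer&) = delete;

    float*** data() { return stemPtrs_.data(); }

private:
    std::vector<float> samples_;
    std::vector<float*> channelPtrs_;
    std::vector<float**> stemPtrs_;
};
```

With 4 stems and stereo audio, a StemBuffer(4, 2, numberOfFrames) instance could then be passed to pull through data(), provided the layout assumption holds.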