callcut.io.load_audio

callcut.io.load_audio(fname, *, sample_rate=None, mono=True, device=None)

Load an audio file.

Supports common audio formats (wav, mp3, flac, ogg, etc.) and video files with audio streams (mp4, etc.) via torchcodec/FFmpeg.

Parameters:

fname (str | Path): Path to the audio file.
sample_rate (int | None): Target sample rate in Hz. If None, the original sample rate is preserved. If specified, the audio is resampled to this rate.
mono (bool): If True, convert multi-channel audio to mono by averaging channels.
device (str | torch.device | None): Device to place the loaded tensor on (e.g., "cpu", "cuda:0", "mps"). If None, uses the default torch device.
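
To make the mono and sample_rate options concrete, the following is a minimal, dependency-free sketch of what "averaging channels" means and of how the sample count scales with the target rate. The helper names downmix_to_mono and resampled_length are hypothetical and are not part of callcut; the real function operates on tensors, and the exact resampled length depends on the resampler it uses.

```python
def downmix_to_mono(channels):
    """Average equally long channels sample by sample (illustration only)."""
    if not channels:
        raise ValueError("expected at least one channel")
    length = len(channels[0])
    if any(len(ch) != length for ch in channels):
        raise ValueError("all channels must have the same length")
    # zip(*channels) yields one tuple per time step, holding one value per channel.
    return [sum(frame) / len(channels) for frame in zip(*channels)]


def resampled_length(num_samples, orig_rate, target_rate):
    """Approximate sample count after resampling; real resamplers may differ by a sample."""
    return round(num_samples * target_rate / orig_rate)


stereo = [[0.5, -0.5, 1.0], [0.0, 0.5, -1.0]]
mono = downmix_to_mono(stereo)  # one channel, same number of samples
one_second_at_16k = resampled_length(48000, 48000, 16000)
```

On a tensor of shape (channels, samples), the same averaging is commonly written as waveform.mean(dim=0, keepdim=True), which keeps the leading dimension so the result has shape (1, samples).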

Returns:

waveform (Tensor): Audio waveform of shape (1, samples) if mono=True, otherwise (channels, samples). Values are normalized to [-1, 1].
sample_rate (int): Sample rate of the returned waveform in Hz.
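
As a rough illustration of the two return values, the sketch below shows one common way integer PCM samples are mapped into the [-1, 1] range, and how a duration follows from the sample count and sample rate. Both helpers are hypothetical. Whether load_audio uses this exact scaling is an assumption, since decoding is delegated to torchcodec/FFmpeg.

```python
def pcm16_to_float(samples):
    """Scale signed 16-bit integers to floats in [-1, 1) by dividing by 2**15."""
    return [s / 32768.0 for s in samples]


def duration_seconds(num_samples, sample_rate):
    """Duration of a waveform, given its length along the samples axis."""
    return num_samples / sample_rate


normalized = pcm16_to_float([-32768, 0, 16384])
seconds = duration_seconds(32000, 16000)
```

With the returned values, the equivalent computation would be waveform.shape[-1] / sample_rate.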

Examples

Load an audio file at its original sample rate:

>>> from callcut.io import load_audio
>>> waveform, sr = load_audio("recording.wav")
>>> waveform.shape
torch.Size([1, 32000])

Load and resample to 16 kHz:

>>> waveform, sr = load_audio("recording.wav", sample_rate=16000)
>>> sr
16000

Load directly to GPU:

>>> waveform, sr = load_audio("recording.wav", device="cuda:0")
>>> waveform.device
device(type='cuda', index=0)