callcut.extractors.SNRExtractor

class callcut.extractors.SNRExtractor(sample_rate, hop_ms=8.0, n_bands=8, *, win_ms=32.0, band_low=300.0, band_high=10000.0, baseline_s=10.0, normalize=True, eps=1e-12)

Multi-band SNR feature extractor.

Extracts signal-to-noise ratio (SNR) features across multiple frequency bands. The pipeline computes:

  1. Short-time Fourier transform (STFT)

  2. Power spectrum in logarithmically-spaced frequency bands

  3. Baseline estimation via median filtering

  4. SNR in decibels relative to baseline

  5. Optional robust normalization per band

All computations run on the input tensor's device, so a waveform that lives on a GPU is processed on that GPU.

Parameters:
sample_rate : int

Expected sample rate of the input waveform, in Hz.

hop_ms : float

STFT hop length in milliseconds. Determines the time resolution of the features.

n_bands : int

Number of logarithmically-spaced frequency bands.

win_ms : float

STFT window length in milliseconds.

band_low : float

Lower frequency bound of the lowest band, in Hz.

band_high : float

Upper frequency bound of the highest band, in Hz.

baseline_s : float

Length of the median-filter window used for baseline estimation, in seconds.

normalize : bool

If True, apply robust z-score normalization per band.

eps : float

Small constant added to the band and baseline energies for numerical stability.
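
To make "logarithmically-spaced" concrete, the following sketch computes geometrically spaced band edges for the default band_low, band_high, and n_bands. The helper log_band_edges is hypothetical and not part of callcut; the edges the extractor actually uses may be placed or rounded differently.

```python
def log_band_edges(band_low=300.0, band_high=10000.0, n_bands=8):
    """Return n_bands + 1 geometrically spaced edges (hypothetical helper)."""
    if not 0 < band_low < band_high:
        raise ValueError("require 0 < band_low < band_high")
    # A constant ratio between consecutive edges gives equal widths on a log axis.
    ratio = (band_high / band_low) ** (1.0 / n_bands)
    return [band_low * ratio**i for i in range(n_bands + 1)]


edges = log_band_edges()
for lo, hi in zip(edges[:-1], edges[1:]):
    print(f"{lo:8.1f} Hz to {hi:8.1f} Hz")
```

With this spacing each band spans the same number of octaves, so low bands are narrow in Hz and high bands are wide.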

Attributes

band_high

Upper frequency bound in Hz.

band_low

Lower frequency bound in Hz.

baseline_s

Baseline estimation window in seconds.

frame_rate

Frame rate in Hz (frames per second).

hop_ms

Hop length in milliseconds.

hop_s

Hop length in seconds.

n_bands

Number of frequency bands (alias for n_features).

n_features

Number of frequency bands.

normalize

Whether normalization is applied.

sample_rate

Expected sample rate in Hz.

win_ms

Window length in milliseconds.
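
The timing attributes are related by simple arithmetic. The sketch below assumes frame_rate is the reciprocal of the hop length, which is consistent with the frames_to_seconds and seconds_to_frames examples on this page; the standalone functions are illustrative stand-ins, not the library's implementation.

```python
hop_ms = 8.0                    # default hop length in milliseconds
hop_s = hop_ms / 1000.0         # hop length in seconds
frame_rate = 1000.0 / hop_ms    # frames per second (assumed to be 1 / hop_s)


def seconds_to_frames(seconds):
    # Rounded to the nearest integer, as documented for seconds_to_frames().
    return int(round(seconds * frame_rate))


def frames_to_seconds(frames):
    return frames / frame_rate


print(frame_rate)               # 125.0
print(seconds_to_frames(2.0))   # 250
print(frames_to_seconds(250))   # 2.0
```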

Methods

__call__(waveform)

Extract features (alias for extract()).

extract(waveform)

Extract multi-band SNR features from a waveform.

frames_to_seconds(frames)

Convert number of frames to duration in seconds.

seconds_to_frames(seconds)

Convert duration in seconds to number of frames.

Notes

The SNR for each frequency band is computed as:

\[\text{SNR}_{\text{dB}} = 10 \cdot \log_{10} \left( \frac{E_{\text{band}} + \epsilon} {E_{\text{baseline}} + \epsilon} \right)\]

where \(E_{\text{band}}\) is the instantaneous band energy and \(E_{\text{baseline}}\) is the median-filtered baseline estimate.
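
As a worked illustration of the formula, the sketch below applies it to a toy energy series for a single band. The running_median helper is hypothetical: it uses a centered window truncated at the edges, which is an assumption and may not match how the extractor pads or positions its median filter.

```python
import math
from statistics import median


def snr_db(e_band, e_baseline, eps=1e-12):
    """SNR in dB for one band and one frame, following the formula above."""
    return 10.0 * math.log10((e_band + eps) / (e_baseline + eps))


def running_median(values, width):
    """Centered running median, truncated at the edges (assumed behavior)."""
    half = width // 2
    return [
        median(values[max(0, i - half): i + half + 1])
        for i in range(len(values))
    ]


# Toy band energies: steady background of 1.0 with a short burst of 100.0.
energy = [1.0, 1.0, 1.0, 100.0, 100.0, 1.0, 1.0, 1.0, 1.0]
baseline = running_median(energy, width=7)
snr = [snr_db(e, b) for e, b in zip(energy, baseline)]
print([round(v, 1) for v in snr])
```

Because the burst is shorter than half the median window, the baseline stays at the background level, so the burst frames come out near 20 dB and the background frames near 0 dB.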

The FFT size is chosen automatically as the smallest power of 2 that is at least as large as the window length in samples.
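
The FFT size rule can be sketched as follows. The helper name is hypothetical, and rounding the window length to the nearest sample is an assumption; the library may truncate instead.

```python
def next_pow2_fft_size(sample_rate, win_ms=32.0):
    """Return (window length in samples, smallest power of two >= that length)."""
    # Assumption: the window length in samples is rounded to the nearest integer.
    win_samples = max(1, round(sample_rate * win_ms / 1000.0))
    n_fft = 1
    while n_fft < win_samples:
        n_fft *= 2
    return win_samples, n_fft


print(next_pow2_fft_size(32000))   # (1024, 1024)
print(next_pow2_fft_size(44100))   # (1411, 2048)
```

At 32 kHz the default 32 ms window is already a power of two, so no zero-padding is needed; at 44.1 kHz the same window would be padded up to 2048 points.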

Examples

Extract features from an audio file:

>>> from callcut.extractors import SNRExtractor
>>> extractor = SNRExtractor(sample_rate=32000, hop_ms=8.0, n_bands=8)
>>> waveform, sr = load_audio("recording.wav", sample_rate=32000)
>>> features, times = extractor(waveform)
>>> features.shape
torch.Size([8, 635])

Customize frequency range and bands:

>>> extractor = SNRExtractor(
...     sample_rate=32000,
...     band_low=500.0,
...     band_high=8000.0,
...     n_bands=4,
... )
>>> features, times = extractor(waveform)
>>> features.shape
torch.Size([4, 635])

__call__(waveform)

Extract features (alias for extract()).

Parameters:
waveform : Tensor

Audio waveform of shape (1, samples) or (samples,).

Returns:
features : Tensor

Extracted features of shape (n_features, n_frames).

times : Tensor

Time axis of shape (n_frames,) in seconds.

extract(waveform)

Extract multi-band SNR features from a waveform.

Parameters:
waveform : Tensor

Audio waveform of shape (1, samples) or (samples,). Should be mono (single channel). Values should be normalized to [-1, 1].

Returns:
features : Tensor

SNR features of shape (n_bands, n_frames). Each row contains the SNR time series for one frequency band. If normalize=True, values are approximately zero-centered with unit scale per band.
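
The documentation does not specify which robust statistics the normalization uses. One common choice, shown here purely as an assumption, is to center each band on its median and scale by the median absolute deviation (MAD). The helper robust_zscore is hypothetical and not part of callcut.

```python
from statistics import median


def robust_zscore(values, eps=1e-12):
    """Median/MAD z-score for one band (hypothetical helper, assumed scaling)."""
    center = median(values)
    mad = median([abs(v - center) for v in values])
    # 1.4826 makes the MAD comparable to a standard deviation for Gaussian data.
    scale = 1.4826 * mad + eps
    return [(v - center) / scale for v in values]


# Toy SNR series for one band, in dB, with a single strong outlier at 20.0.
band_snr = [0.0, 1.0, -1.0, 2.0, -2.0, 20.0, 0.5]
z = robust_zscore(band_snr)
print([round(v, 2) for v in z])
```

Unlike a mean and standard deviation, the median and MAD are barely affected by the outlier, so the background frames stay near zero while the outlier remains clearly separated.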

times : Tensor

Time axis of shape (n_frames,) in seconds, indicating the center time of each frame.

frames_to_seconds(frames)

Convert number of frames to duration in seconds.

Parameters:
frames : int

Number of frames.

Returns:
seconds : float

Duration in seconds.

Examples

>>> extractor = SNRExtractor(sample_rate=32000, hop_ms=8.0)
>>> extractor.frames_to_seconds(250)
2.0

seconds_to_frames(seconds)

Convert duration in seconds to number of frames.

Parameters:
seconds : float

Duration in seconds.

Returns:
frames : int

Number of frames (rounded to nearest integer).

Examples

>>> extractor = SNRExtractor(sample_rate=32000, hop_ms=8.0)
>>> extractor.seconds_to_frames(2.0)
250

property band_high

Upper frequency bound in Hz.

Type:

float

property band_low

Lower frequency bound in Hz.

Type:

float

property baseline_s

Baseline estimation window in seconds.

Type:

float

property frame_rate

Frame rate in Hz (frames per second).

Type:

float

property hop_ms

Hop length in milliseconds.

Type:

float

property hop_s

Hop length in seconds.

Type:

float

property n_bands

Number of frequency bands (alias for n_features).

Type:

int

property n_features

Number of frequency bands.

Type:

int

property normalize

Whether normalization is applied.

Type:

bool

property sample_rate

Expected sample rate in Hz.

Type:

int

property win_ms

Window length in milliseconds.

Type:

float