callcut.extractors.SNRExtractor

class callcut.extractors.SNRExtractor(sample_rate, hop_ms=8.0, n_bands=8, *, win_ms=32.0, band_low=300.0, band_high=10000.0, baseline_s=10.0, normalize=True, eps=1e-12)

Multi-band SNR feature extractor.

Extracts signal-to-noise ratio (SNR) features across multiple frequency bands. The pipeline computes:

1. Short-time Fourier transform (STFT)
2. Power spectrum in logarithmically-spaced frequency bands
3. Baseline estimation via median filtering
4. SNR in decibels relative to baseline
5. Optional robust normalization per band

All computations run on the input tensor’s device, enabling GPU acceleration.
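
The logarithmically-spaced bands in the pipeline above can be illustrated with a short NumPy sketch. It assumes the band edges are geometrically spaced between `band_low` and `band_high`; the helper name `log_band_edges` is hypothetical, and the library's exact band construction may differ.

```python
import numpy as np


def log_band_edges(band_low=300.0, band_high=10000.0, n_bands=8):
    """Return n_bands + 1 geometrically spaced band edges in Hz.

    Assumed layout for illustration only, not the library's code.
    """
    return np.geomspace(band_low, band_high, n_bands + 1)


edges = log_band_edges()
# Adjacent edges share a constant ratio, so every band spans the same
# number of octaves.
ratios = edges[1:] / edges[:-1]
```

With the default parameters this gives nine edges from 300 Hz to 10 kHz, one pair of neighbouring edges per band.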

Parameters:

sample_rate (int): Expected sample rate in Hz.
hop_ms (float): STFT hop length in milliseconds. Determines time resolution.
n_bands (int): Number of logarithmically-spaced frequency bands.
win_ms (float): STFT window length in milliseconds.
band_low (float): Lower frequency bound in Hz.
band_high (float): Upper frequency bound in Hz.
baseline_s (float): Baseline estimation window in seconds.
normalize (bool): If True, apply robust z-score normalization per band.
eps (float): Small constant for numerical stability.

Attributes

`band_high`	Upper frequency bound in Hz.
`band_low`	Lower frequency bound in Hz.
`baseline_s`	Baseline estimation window in seconds.
`frame_rate`	Frame rate in Hz (frames per second).
`hop_ms`	Hop length in milliseconds.
`hop_s`	Hop length in seconds.
`n_bands`	Number of frequency bands (alias for `n_features`).
`n_features`	Number of frequency bands.
`normalize`	Whether normalization is applied.
`sample_rate`	Expected sample rate in Hz.
`win_ms`	Window length in milliseconds.

Methods

`__call__`(waveform)	Extract features (alias for `extract()`).
`extract`(waveform)	Extract multi-band SNR features from a waveform.
`frames_to_seconds`(frames)	Convert number of frames to duration in seconds.
`seconds_to_frames`(seconds)	Convert duration in seconds to number of frames.

Notes

The SNR for each frequency band is computed as:

\[\text{SNR}_{\text{dB}} = 10 \cdot \log_{10} \left( \frac{E_{\text{band}} + \epsilon} {E_{\text{baseline}} + \epsilon} \right)\]

where \(E_{\text{band}}\) is the instantaneous band energy and \(E_{\text{baseline}}\) is the median-filtered baseline estimate.
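
As a minimal sketch of the formula above, the following NumPy function computes the SNR of one band against a running-median baseline. The function name `snr_db`, the edge-replication padding, and the requirement of an odd window length in frames are assumptions made for illustration; the library's median filter may handle window size and boundaries differently.

```python
import numpy as np


def snr_db(band_energy, baseline_frames, eps=1e-12):
    """SNR in dB of each frame relative to a running-median baseline.

    band_energy: 1-D array of non-negative band energies, one per frame.
    baseline_frames: odd median-filter window length in frames (assumed).
    """
    half = baseline_frames // 2
    # Replicate the edge values so the baseline has one value per frame.
    padded = np.pad(band_energy, half, mode="edge")
    windows = np.lib.stride_tricks.sliding_window_view(padded, baseline_frames)
    baseline = np.median(windows, axis=-1)
    # eps keeps the ratio finite when the energy or the baseline is zero.
    return 10.0 * np.log10((band_energy + eps) / (baseline + eps))


# A flat background with a single frame carrying 100x more energy.
energy = np.ones(20)
energy[10] = 100.0
snr = snr_db(energy, baseline_frames=5)
```

Because the median ignores the isolated burst, the baseline stays at the background level and the burst frame comes out at roughly 20 dB while the other frames stay at 0 dB.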

The FFT size is automatically chosen as the smallest power of 2 that is at least as large as the window size.
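
The FFT size rule can be sketched as follows. The helper name `fft_size_for_window` is hypothetical, and rounding the window length to the nearest sample is an assumption; only the "smallest power of 2 not below the window size" rule comes from the note above.

```python
def fft_size_for_window(sample_rate, win_ms):
    """Return (win_length, n_fft) for a window given in milliseconds."""
    # Window length in samples; rounding to nearest is an assumption.
    win_length = int(round(sample_rate * win_ms / 1000.0))
    # Smallest power of 2 that is >= win_length.
    n_fft = 1
    while n_fft < win_length:
        n_fft *= 2
    return win_length, n_fft


default_case = fft_size_for_window(32000, 32.0)
padded_case = fft_size_for_window(44100, 32.0)
```

At 32 kHz the default 32 ms window is already a power of 2 (1024 samples), so no zero-padding is needed; at 44.1 kHz the same window is 1411 samples and would be padded to 2048.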

Examples

Extract features from an audio file:

>>> from callcut.extractors import SNRExtractor
>>> extractor = SNRExtractor(sample_rate=32000, hop_ms=8.0, n_bands=8)
>>> waveform, sr = load_audio("recording.wav", sample_rate=32000)
>>> features, times = extractor(waveform)
>>> features.shape
torch.Size([8, 635])

Customize frequency range and bands:

>>> extractor = SNRExtractor(
...     sample_rate=32000,
...     band_low=500.0,
...     band_high=8000.0,
...     n_bands=4,
... )
>>> features, times = extractor(waveform)
>>> features.shape
torch.Size([4, 635])

__call__(waveform)

Extract features (alias for extract()).

Parameters:

waveform (Tensor): Audio waveform of shape (1, samples) or (samples,).

Returns:

features (Tensor): Extracted features of shape (n_features, n_frames).
times (Tensor): Time axis of shape (n_frames,) in seconds.

extract(waveform)

Extract multi-band SNR features from a waveform.

Parameters:

waveform (Tensor): Audio waveform of shape (1, samples) or (samples,). Should be mono (single channel). Values should be normalized to [-1, 1].

Returns:

features (Tensor): SNR features of shape (n_bands, n_frames). Each row contains the SNR time series for one frequency band. If normalize=True, values are approximately zero-centered with unit scale per band.
times (Tensor): Time axis of shape (n_frames,) in seconds, indicating the center time of each frame.
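
The per-band normalization applied when normalize=True is described only as a robust z-score. One common definition, shown here purely as an assumption, centers each band on its median and scales by the median absolute deviation (MAD); the helper name `robust_zscore` is hypothetical and the library may use a different robust scale estimate.

```python
import numpy as np


def robust_zscore(x, eps=1e-12):
    """Median/MAD z-score along the last (time) axis.

    Assumed definition for illustration, not the library's code.
    x: array of shape (n_bands, n_frames).
    """
    median = np.median(x, axis=-1, keepdims=True)
    mad = np.median(np.abs(x - median), axis=-1, keepdims=True)
    # 1.4826 scales the MAD to match the standard deviation of Gaussian data.
    return (x - median) / (1.4826 * mad + eps)


band = np.array([[1.0, 2.0, 3.0, 4.0, 100.0]])
z = robust_zscore(band)
```

Unlike a mean and standard deviation, the median and MAD are barely affected by the outlier at 100, so the remaining frames keep a sensible scale.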

frames_to_seconds(frames)

Convert number of frames to duration in seconds.

Parameters:

frames (int): Number of frames.

Returns:

seconds (float): Duration in seconds.

Examples

>>> extractor = SNRExtractor(sample_rate=32000, hop_ms=8.0)
>>> extractor.frames_to_seconds(250)
2.0

seconds_to_frames(seconds)

Convert duration in seconds to number of frames.

Parameters:

seconds (float): Duration in seconds.

Returns:

frames (int): Number of frames (rounded to nearest integer).

Examples

>>> extractor = SNRExtractor(sample_rate=32000, hop_ms=8.0)
>>> extractor.seconds_to_frames(2.0)
250
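
The arithmetic behind both conversions can be sketched with plain Python. The standalone function names below are hypothetical, and the formulas are inferred from the documented examples (an 8 ms hop corresponds to 125 frames per second) rather than taken from the library's source.

```python
def frames_to_seconds_sketch(frames, hop_ms=8.0):
    """Duration covered by `frames` hops of `hop_ms` milliseconds each."""
    return frames * hop_ms / 1000.0


def seconds_to_frames_sketch(seconds, hop_ms=8.0):
    """Number of hops in `seconds`, rounded to the nearest integer."""
    return int(round(seconds * 1000.0 / hop_ms))


duration = frames_to_seconds_sketch(250)
n_frames = seconds_to_frames_sketch(2.0)
```

Because the frame count is rounded, converting seconds to frames and back recovers the original duration only when it is a whole multiple of the hop length.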

property band_high

Upper frequency bound in Hz.

Type: float

property band_low

Lower frequency bound in Hz.

Type: float

property baseline_s

Baseline estimation window in seconds.

Type: float

property frame_rate

Frame rate in Hz (frames per second).

Type: float

property hop_ms

Hop length in milliseconds.

Type: float

property hop_s

Hop length in seconds.

Type: float

property n_bands

Number of frequency bands (alias for n_features).

Type: int

property n_features

Number of frequency bands.

Type: int

property normalize

Whether normalization is applied.

Type: bool

property sample_rate

Expected sample rate in Hz.

Type: int

property win_ms

Window length in milliseconds.

Type: float