callcut.extractors.SNRExtractor
- class callcut.extractors.SNRExtractor(sample_rate, hop_ms=8.0, n_bands=8, *, win_ms=32.0, band_low=300.0, band_high=10000.0, baseline_s=10.0, normalize=True, eps=1e-12)
Multi-band SNR feature extractor.
Extracts signal-to-noise ratio (SNR) features across multiple frequency bands. The pipeline computes:
1. Short-time Fourier transform (STFT)
2. Power spectrum in logarithmically-spaced frequency bands
3. Baseline estimation via median filtering
4. SNR in decibels relative to baseline
5. Optional robust normalization per band
All computations run on the input tensor's device, enabling GPU acceleration.
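The last three steps listed above (baseline estimation, SNR in decibels, robust normalization) can be sketched in plain Python for a single band. This is a minimal illustration under stated assumptions, not the library's implementation: the helper names `running_median`, `snr_db`, and `robust_zscore` are hypothetical, the centered window with truncated edges is one possible median filter, and the median/MAD form of the robust z-score is an assumed definition. The STFT and band-summing steps are omitted, and the energies are made up.

```python
from statistics import median
import math


def running_median(values, width):
    """Median over a centered window of `width` frames (shorter at the edges)."""
    half = width // 2
    return [
        median(values[max(0, i - half): i + half + 1])
        for i in range(len(values))
    ]


def snr_db(band_energy, baseline_frames, eps=1e-12):
    """Median baseline, then SNR in dB relative to it, for one band."""
    baseline = running_median(band_energy, baseline_frames)
    return [
        10.0 * math.log10((e + eps) / (b + eps))
        for e, b in zip(band_energy, baseline)
    ]


def robust_zscore(values, eps=1e-12):
    """Assumed form of robust normalization: (x - median) / (1.4826 * MAD)."""
    center = median(values)
    mad = median(abs(v - center) for v in values)
    return [(v - center) / (1.4826 * mad + eps) for v in values]


# Hypothetical energies for one band: a noisy floor near 1.0 with a short burst.
energy = [1.0, 1.2, 0.8, 1.1, 0.9] * 5
energy[12] = energy[13] = 100.0

# A 9-frame window is far shorter than a realistic baseline; it keeps the demo small.
snr = snr_db(energy, baseline_frames=9)
z = robust_zscore(snr)
```

Because the burst occupies only two of the nine frames in each window, the median baseline stays near the noise floor, so the burst frames stand out in `snr` while quiet frames sit near 0 dB.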
- Parameters:
- sample_rate
int Expected sample rate in Hz.
- hop_ms
float STFT hop length in milliseconds. Determines time resolution.
- n_bands
int Number of logarithmically-spaced frequency bands.
- win_ms
float STFT window length in milliseconds.
- band_low
float Lower frequency bound in Hz.
- band_high
float Upper frequency bound in Hz.
- baseline_s
float Baseline estimation window in seconds.
- normalize
bool If True, apply robust z-score normalization per band.
- eps
float Small constant for numerical stability.
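To make the interaction of band_low, band_high, and n_bands concrete, the sketch below computes logarithmically-spaced band edges for the default values. The source only states that the bands are logarithmically spaced, so the geometric spacing used here and the helper name `log_band_edges` are assumptions for illustration.

```python
def log_band_edges(band_low, band_high, n_bands):
    """Return n_bands + 1 edges spaced evenly in log-frequency (geometric spacing)."""
    ratio = band_high / band_low
    return [band_low * ratio ** (i / n_bands) for i in range(n_bands + 1)]


# Defaults from the signature: 300 Hz to 10 kHz split into 8 bands.
edges = log_band_edges(300.0, 10000.0, 8)

# Each band is a (low, high) pair; consecutive edges share a constant ratio.
bands = list(zip(edges[:-1], edges[1:]))
```

With this spacing every band spans the same frequency ratio, so low bands are narrow in Hz and high bands are wide. An STFT cannot represent frequencies above sample_rate / 2, so band_high presumably needs to stay at or below that limit.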
Attributes
Upper frequency bound in Hz.
Lower frequency bound in Hz.
Baseline estimation window in seconds.
Frame rate in Hz (frames per second).
Hop length in milliseconds.
Hop length in seconds.
Number of frequency bands (alias for n_features).
Number of frequency bands.
Whether normalization is applied.
Expected sample rate in Hz.
Window length in milliseconds.
Methods
__call__(waveform) Extract features (alias for extract()).
extract(waveform) Extract multi-band SNR features from a waveform.
frames_to_seconds(frames) Convert number of frames to duration in seconds.
seconds_to_frames(seconds) Convert duration in seconds to number of frames.
Notes
The SNR for each frequency band is computed as:
\[\text{SNR}_{\text{dB}} = 10 \cdot \log_{10} \left( \frac{E_{\text{band}} + \epsilon} {E_{\text{baseline}} + \epsilon} \right)\]
where \(E_{\text{band}}\) is the instantaneous band energy, \(E_{\text{baseline}}\) is the median-filtered baseline estimate, and \(\epsilon\) is the eps constant added for numerical stability.
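As a worked example with hypothetical energies (not taken from the library), a band energy four times its baseline comes out at roughly 6 dB, with the default \(\epsilon = 10^{-12}\) having no visible effect:

```latex
\[\text{SNR}_{\text{dB}} = 10 \cdot \log_{10} \left( \frac{4 \times 10^{-3} + 10^{-12}} {1 \times 10^{-3} + 10^{-12}} \right) \approx 10 \cdot \log_{10}(4) \approx 6.02\]
```

When both energies are exactly zero, the ratio reduces to \(\epsilon / \epsilon = 1\), so the formula yields 0 dB instead of an undefined value.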
The FFT size is automatically chosen as the smallest power of 2 that is at least as large as the window size.
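The FFT size rule can be illustrated with a few lines of Python. The helper names `next_power_of_two` and `fft_size` are hypothetical, and rounding the window length to the nearest sample is an assumption about how milliseconds are converted to samples.

```python
def next_power_of_two(n):
    """Smallest power of 2 that is >= n, for n >= 1."""
    return 1 << (n - 1).bit_length()


def fft_size(sample_rate, win_ms):
    """Return (window length in samples, FFT size) for a window given in ms."""
    win_length = round(sample_rate * win_ms / 1000.0)  # assumed rounding rule
    return win_length, next_power_of_two(win_length)


# Default 32 ms window at 32 kHz: 1024 samples, already a power of 2.
win_32k, nfft_32k = fft_size(32000, 32.0)

# Same window at 44.1 kHz: 1411 samples, padded up to the next power of 2.
win_44k, nfft_44k = fft_size(44100, 32.0)
```

At 32 kHz the window is exactly 1024 samples, so no padding is needed; at 44.1 kHz the 1411-sample window would be zero-padded to 2048 points under this rule.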
Examples
Extract features from an audio file:
>>> extractor = SNRExtractor(sample_rate=32000, hop_ms=8.0, n_bands=8)
>>> waveform, sr = load_audio("recording.wav", sample_rate=32000)
>>> features, times = extractor(waveform)
>>> features.shape
torch.Size([8, 635])
Customize frequency range and bands:
>>> extractor = SNRExtractor(
...     sample_rate=32000,
...     band_low=500.0,
...     band_high=8000.0,
...     n_bands=4,
... )
>>> features, times = extractor(waveform)
>>> features.shape
torch.Size([4, 635])
- extract(waveform)
Extract multi-band SNR features from a waveform.
- Parameters:
- waveform
Tensor Audio waveform of shape (1, samples) or (samples,). Should be mono (single channel). Values should be normalized to [-1, 1].
- Returns:
- frames_to_seconds(frames)
Convert number of frames to duration in seconds.
Examples
>>> extractor = SNRExtractor(sample_rate=32000, hop_ms=8.0)
>>> extractor.frames_to_seconds(250)
2.0
- seconds_to_frames(seconds)
Convert duration in seconds to number of frames.
- Parameters:
- seconds
float Duration in seconds.
- Returns:
- frames
int Number of frames (rounded to nearest integer).
Examples
>>> extractor = SNRExtractor(sample_rate=32000, hop_ms=8.0)
>>> extractor.seconds_to_frames(2.0)
250
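The two conversions follow directly from the hop length. The standalone functions below are a sketch that reproduces the documented examples; they are not the library's code, and the tie-breaking behaviour of Python's built-in `round` (round half to even) is an assumption about how "nearest integer" is resolved.

```python
def frames_to_seconds(frames, hop_ms):
    """Duration covered by `frames` hops of `hop_ms` milliseconds each."""
    return frames * hop_ms / 1000.0


def seconds_to_frames(seconds, hop_ms):
    """Number of hops in `seconds`, rounded to the nearest integer."""
    return round(seconds * 1000.0 / hop_ms)


hop_ms = 8.0
frame_rate = 1000.0 / hop_ms                  # frames per second for an 8 ms hop

duration = frames_to_seconds(250, hop_ms)     # mirrors the documented 250 -> 2.0
n_frames = seconds_to_frames(2.0, hop_ms)     # mirrors the documented 2.0 -> 250
```

Because of the rounding in one direction, a round trip through both functions is exact only for durations that are whole multiples of the hop length.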
- property n_bands
Number of frequency bands (alias for n_features).
- Type: