callcut.nn.BaseDetector
- class callcut.nn.BaseDetector(n_bands, window_frames)[source]
Abstract base class for call detection models.
Subclasses must implement:

- forward: Process input features and return logits.
- receptive_field: Property returning the receptive field in frames.
- _save_config: Return additional constructor kwargs for serialization.

Models accept input of shape (batch, n_bands, time) and return logits of shape (batch, time).
- Parameters:
  - n_bands : int
    Number of input frequency bands.
  - window_frames : int
    Number of frames per input window. This determines the temporal context the model sees during training and inference. The corresponding duration in seconds depends on the feature extractor's hop size:
    window_duration_s = window_frames * hop_ms / 1000
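As a quick arithmetic check of the window-duration formula above (the 10 ms hop size is an assumed value, not a library default):

```python
# Illustration of: window_duration_s = window_frames * hop_ms / 1000
window_frames = 512
hop_ms = 10  # hop size of the (assumed) feature extractor, in milliseconds

window_duration_s = window_frames * hop_ms / 1000
print(window_duration_s)  # 5.12 seconds of temporal context per window
```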
Attributes

- n_bands: Number of input frequency bands.
- receptive_field: Receptive field in frames.
- window_frames: Number of frames per input window.
Methods

- forward(x): Forward pass.
- predict(features, *[, hop_frames]): Run sliding window inference on a full recording.
Notes
The receptive field is the number of input frames that influence a single output prediction. For a CNN, it is typically the sum of (kernel_size - 1) across all convolutional layers. This determines how much temporal context the model uses when making predictions.

Examples
Create a custom model by subclassing BaseDetector:

>>> class MyModel(BaseDetector):
...     def __init__(self, n_bands: int, window_frames: int):
...         super().__init__(n_bands, window_frames)
...         self._conv = nn.Conv1d(n_bands, 1, kernel_size=5, padding=2)
...
...     @property
...     def receptive_field(self) -> int:
...         return 4  # kernel_size - 1
...
...     def forward(self, x: Tensor) -> Tensor:
...         return self._conv(x).squeeze(1)
...
...     def _save_config(self) -> dict:
...         return {}  # no additional constructor args
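As a sketch of the receptive-field arithmetic from the notes: for a plain stride-1, dilation-1 CNN, the receptive field is the sum of (kernel_size - 1) over all convolutional layers. The kernel sizes below are illustrative, not taken from any callcut model:

```python
# Hypothetical three-layer CNN: kernel sizes 7, 5, 3, all stride-1/dilation-1.
kernel_sizes = [7, 5, 3]

# Receptive field = sum of (kernel_size - 1) across layers, per the notes.
receptive_field = sum(k - 1 for k in kernel_sizes)
print(receptive_field)  # (7-1) + (5-1) + (3-1) = 12 frames
```

A subclass built from these layers would return this value from its receptive_field property.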
- predict(features, *, hop_frames=None)[source]
Run sliding window inference on a full recording.
The model is applied to overlapping windows across the recording. Where windows overlap, predictions are averaged to produce smoother, more robust per-frame probability estimates.
- Parameters:
  - features : Tensor
    Input features of shape (n_bands, n_frames). Should be on the same device as the model.
  - hop_frames : int | None
    Hop between consecutive windows in frames. Smaller values produce more overlap and smoother predictions but increase computation time. If None, defaults to window_frames // 4 (75% overlap).
- Returns:
  - probabilities : Tensor
    Per-frame call probabilities of shape (n_frames,). Values are in [0, 1], where higher values indicate higher confidence that a call is present.
Notes
The inference process:

1. Slide a window of size window_frames across the recording with step hop_frames.
2. For each window, run the model to get logits, then apply sigmoid to get probabilities.
3. Accumulate predictions for each frame. Frames covered by multiple windows receive multiple predictions.
4. Average the accumulated predictions to get final per-frame probabilities.

For frames near the end of the recording that don't fit a full window, the window is padded using edge values.
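The steps above can be sketched in NumPy as follows. The stand-in "model" (sigmoid of the per-window band mean) and all sizes are illustrative assumptions, not the library's implementation:

```python
import numpy as np

def sliding_window_predict(features, window_frames, hop_frames=None):
    """Sketch of sliding-window inference with overlap averaging."""
    n_bands, n_frames = features.shape
    if hop_frames is None:
        hop_frames = window_frames // 4  # documented default: 75% overlap

    # Edge-pad the end so the final frames still fit a full window.
    if n_frames > window_frames:
        pad = (-(n_frames - window_frames)) % hop_frames
    else:
        pad = window_frames - n_frames
    padded = np.pad(features, ((0, 0), (0, pad)), mode="edge")

    probs_sum = np.zeros(padded.shape[1])
    counts = np.zeros(padded.shape[1])
    for start in range(0, padded.shape[1] - window_frames + 1, hop_frames):
        window = padded[:, start : start + window_frames]
        logits = window.mean(axis=0)            # stand-in for model(window)
        probs = 1.0 / (1.0 + np.exp(-logits))   # sigmoid: logits -> [0, 1]
        probs_sum[start : start + window_frames] += probs
        counts[start : start + window_frames] += 1

    # Average overlapping predictions; drop the padded tail.
    return (probs_sum / counts)[:n_frames]

features = np.random.randn(16, 1000)
probs = sliding_window_predict(features, window_frames=256)
print(probs.shape)  # (1000,)
```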
Examples
>>> from callcut.pipeline import load_pipeline
>>> from callcut.io import load_audio
>>>
>>> model, extractor, decoder = load_pipeline("pipeline.pt", device="cpu")
>>>
>>> waveform, sr = load_audio("recording.wav", sample_rate=32000)
>>> features, times = extractor(waveform)
>>>
>>> probs = model.predict(features)
>>> probs.shape
torch.Size([1234])
- abstract property receptive_field
Receptive field in frames.
The number of input frames that influence a single output prediction. Used to determine padding requirements during inference.
- Type: int