callcut.nn.TinySegCNN

class callcut.nn.TinySegCNN(n_bands, window_frames, base=32)

Lightweight 1D CNN for call detection.

A small convolutional neural network (on the order of 10K parameters at the default base=32) that processes multi-band SNR features to detect animal calls. The architecture stacks four 1D convolutional layers, which combine the frequency bands and capture temporal patterns along the frame axis.

Parameters:

n_bands (int): Number of input frequency bands.
window_frames (int): Number of frames per input window. The corresponding duration in seconds depends on the feature extractor's hop size: window_duration_s = window_frames * hop_ms / 1000.
base (int): Base number of filters (channels in the hidden layers). Defaults to 32.
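As a quick illustration of the window duration formula above, the following sketch converts a frame count to seconds. The helper name `window_duration_s` and the 10 ms hop size are assumptions chosen for the example; the real hop size comes from whichever feature extractor is in use.

```python
def window_duration_s(window_frames: int, hop_ms: float) -> float:
    """Duration in seconds covered by one input window."""
    return window_frames * hop_ms / 1000


# Hypothetical hop size of 10 ms (an assumption, not a library default).
duration = window_duration_s(window_frames=250, hop_ms=10.0)
print(duration)  # 2.5
```

With these example values, a 250-frame window spans 2.5 seconds of audio.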

Attributes

`base`	Base number of filters.
`receptive_field`	Receptive field in frames.

Methods

forward(x)

Forward pass.

Notes

Architecture:

Input: (batch, n_bands, time)
  -> Conv1d(n_bands, base, kernel=9, padding=4) + ReLU
  -> Conv1d(base, base, kernel=9, padding=4) + ReLU
  -> Conv1d(base, base, kernel=5, padding=2) + ReLU
  -> Conv1d(base, 1, kernel=1)
Output: (batch, time)
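The layer stack listed above can be sketched with plain `torch.nn` modules. This is an illustration under stated assumptions, not the library's source: the helper name `build_sketch` is hypothetical, all layers are assumed to use stride 1, no dilation, and bias terms, and the final `squeeze(1)` is a guess at how the single output channel is dropped.

```python
import torch
from torch import nn


def build_sketch(n_bands: int, base: int = 32) -> nn.Sequential:
    """Hypothetical re-creation of the documented layer stack."""
    return nn.Sequential(
        nn.Conv1d(n_bands, base, kernel_size=9, padding=4),
        nn.ReLU(),
        nn.Conv1d(base, base, kernel_size=9, padding=4),
        nn.ReLU(),
        nn.Conv1d(base, base, kernel_size=5, padding=2),
        nn.ReLU(),
        nn.Conv1d(base, 1, kernel_size=1),  # 1x1 conv: one logit per frame
    )


net = build_sketch(n_bands=8)
x = torch.randn(4, 8, 250)      # (batch, n_bands, time)
logits = net(x).squeeze(1)      # (batch, 1, time) -> (batch, time)
n_params = sum(p.numel() for p in net.parameters())
print(logits.shape, n_params)
```

Because each padded kernel has `padding = (kernel_size - 1) // 2`, the time dimension is preserved through every layer. Under the assumptions above, the sketch has 16,769 trainable parameters for `n_bands=8` and `base=32`; the count for the actual class may differ.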

The receptive field is 21 frames: 1 plus the sum of (kernel_size - 1) over all layers, i.e. 1 + 8 + 8 + 4 + 0 = 21.
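The receptive field figure can be checked with a few lines of arithmetic. The sketch assumes stride 1 and no dilation in every layer, which is consistent with the output keeping the input's time length but is not stated explicitly in the source.

```python
# Kernel sizes of the four Conv1d layers in the architecture listing.
kernel_sizes = [9, 9, 5, 1]

# With stride 1 and no dilation, each layer widens the receptive field
# by (kernel_size - 1) frames, starting from a single frame.
receptive_field = 1 + sum(k - 1 for k in kernel_sizes)
print(receptive_field)  # 21
```

Each output frame therefore depends on the 10 frames before it and the 10 frames after it, plus the frame itself.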

Examples

>>> import torch
>>> from callcut.nn import TinySegCNN
>>> model = TinySegCNN(n_bands=8, window_frames=250)
>>> x = torch.randn(4, 8, 250)  # batch=4, bands=8, time=250
>>> logits = model(x)
>>> logits.shape
torch.Size([4, 250])

forward(x)

Forward pass.

Parameters:

x (Tensor): Input features of shape (batch, n_bands, time).

Returns:

logits (Tensor): Output logits of shape (batch, time).
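Since the method returns raw logits, a caller typically needs to map them to probabilities before deciding which frames contain a call. The sketch below assumes the logits are per-frame binary scores, so that a sigmoid is the appropriate mapping; the 0.5 threshold and the stand-in tensor are illustrative choices, not values taken from the library.

```python
import torch

# Stand-in for model output of shape (batch, time); values are made up.
logits = torch.tensor([[-2.0, 0.0, 3.0]])

probs = torch.sigmoid(logits)  # per-frame call probability (assumed binary task)
mask = probs > 0.5             # hypothetical decision threshold

print(mask.tolist())  # [[False, False, True]]
```

A logit of exactly 0 maps to a probability of 0.5, which the strict comparison treats as "no call".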

property base

Base number of filters.

Type: int

property receptive_field

Receptive field in frames.