callcut.nn.TinySegCNN

class callcut.nn.TinySegCNN(n_bands, window_frames, base=32)

Lightweight 1D CNN for call detection.

A small convolutional neural network (on the order of 10K parameters at the default base=32; the exact count depends on n_bands and base) that processes multi-band SNR features to detect animal calls. The architecture stacks four 1D convolutional layers to capture temporal patterns across frequency bands.

Parameters:
n_bands : int

Number of input frequency bands.

window_frames : int

Number of frames per input window. The corresponding duration in seconds depends on the feature extractor's hop size: window_duration_s = window_frames * hop_ms / 1000.
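
As a quick check of the formula above, the following minimal sketch converts a window length in frames to seconds. The helper name `window_duration_s` and the 4 ms hop size are illustrative assumptions and are not part of callcut.

```python
def window_duration_s(window_frames: int, hop_ms: float) -> float:
    """Duration in seconds of a window of `window_frames` frames."""
    return window_frames * hop_ms / 1000.0


# Hypothetical values: 250 frames at an assumed hop of 4 ms.
duration = window_duration_s(250, 4.0)
print(duration)  # 1.0
```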

base : int

Base number of filters (channels in hidden layers).

Attributes

base

Base number of filters.

receptive_field

Receptive field in frames.

Methods

forward(x)

Forward pass.

Notes

Architecture:

Input: (batch, n_bands, time)
  -> Conv1d(n_bands, base, kernel=9, padding=4) + ReLU
  -> Conv1d(base, base, kernel=9, padding=4) + ReLU
  -> Conv1d(base, base, kernel=5, padding=2) + ReLU
  -> Conv1d(base, 1, kernel=1)
Output: (batch, time)
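
The layer list above can be sketched in PyTorch roughly as follows. This is a stand-in written only from the listing, not the library's actual source: the function name `build_tiny_seg_sketch`, the use of `nn.Sequential`, the default bias terms, and the final `squeeze` are all assumptions.

```python
import torch
from torch import nn


def build_tiny_seg_sketch(n_bands: int, base: int = 32) -> nn.Sequential:
    """Hypothetical stand-in that mirrors the documented layer list."""
    return nn.Sequential(
        nn.Conv1d(n_bands, base, kernel_size=9, padding=4),
        nn.ReLU(),
        nn.Conv1d(base, base, kernel_size=9, padding=4),
        nn.ReLU(),
        nn.Conv1d(base, base, kernel_size=5, padding=2),
        nn.ReLU(),
        nn.Conv1d(base, 1, kernel_size=1),
    )


net = build_tiny_seg_sketch(n_bands=8)
x = torch.randn(4, 8, 250)       # (batch, n_bands, time)
logits = net(x).squeeze(1)       # drop the singleton channel -> (batch, time)

# Parameter count of this sketch; it varies with n_bands and base.
n_params = sum(p.numel() for p in net.parameters())
print(logits.shape, n_params)
```

Because every layer uses stride 1 and padding of (kernel - 1) / 2, the time dimension is preserved from input to output.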

The receptive field is 21 frames: 1 plus the sum of (kernel_size - 1) over all layers, i.e. 1 + 8 + 8 + 4 + 0.
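
For stacked convolutions with stride 1 and no dilation, the receptive field is one plus the sum of (kernel_size - 1) over the layers. The sketch below checks the 21-frame figure against the kernel sizes in the architecture listing; the free function `receptive_field` is a hypothetical helper, separate from the class property of the same name.

```python
def receptive_field(kernel_sizes):
    """Receptive field of stacked stride-1, dilation-1 convolutions."""
    return 1 + sum(k - 1 for k in kernel_sizes)


rf = receptive_field([9, 9, 5, 1])
print(rf)  # 21

# With symmetric padding, each output frame sees this many frames on each side.
context_each_side = (rf - 1) // 2
print(context_each_side)  # 10
```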

Examples

>>> import torch
>>> from callcut.nn import TinySegCNN
>>> model = TinySegCNN(n_bands=8, window_frames=250)
>>> x = torch.randn(4, 8, 250)  # batch=4, bands=8, time=250
>>> logits = model(x)
>>> logits.shape
torch.Size([4, 250])

forward(x)

Forward pass.

Parameters:
x : Tensor

Input features of shape (batch, n_bands, time).

Returns:
logits : Tensor

Output logits of shape (batch, time).
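
Since the output is raw logits, a caller would typically map them to per-frame probabilities before thresholding. The sketch below assumes the logits are meant for a per-frame binary (call / no call) decision via a sigmoid, which this page does not state explicitly; the 0.5 threshold and the hand-written tensor standing in for `model(x)` are illustrative only.

```python
import torch

# Stand-in for model(x): one window with three frames, shape (batch, time).
logits = torch.tensor([[-2.0, 0.0, 3.0]])

probs = torch.sigmoid(logits)   # per-frame probabilities in (0, 1)
mask = probs > 0.5              # assumed threshold; tune for the task at hand

print(mask.tolist())  # [[False, False, True]]
```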

property base

Base number of filters.

Type:

int

property receptive_field

Receptive field in frames.