callcut.pipeline.evaluate_recordings

callcut.pipeline.evaluate_recordings(model, extractor, recordings, decoder, matcher, *, hop_frames=None, boundary_tolerance_ms=None)

Evaluate a trained model on annotated recordings.

For each recording, this function loads the audio, extracts features, predicts frame-level probabilities, decodes them to call intervals, matches the predictions against the ground truth annotations, and computes event-level, frame-level, and boundary metrics.

Results are aggregated across all recordings.

Parameters:
model : BaseDetector

Trained model for call detection. Should already be on the desired device.

extractor : BaseExtractor

Feature extractor matching the model’s expected input.

recordings : list of RecordingInfo

Recordings to evaluate. Each must have a valid annotation file.

decoder : BaseDecoder

Decoder for converting probabilities to call intervals.

matcher : BaseIntervalMatcher

Matcher for pairing predicted and ground truth intervals.

hop_frames : int | None

Hop between inference windows in frames. If None, uses the model's default (75% overlap).

boundary_tolerance_ms : float | None

If set, matched events whose onset or offset boundary error exceeds this tolerance are discarded when computing boundary accuracy statistics.

Returns:
report : EvaluationReport

Evaluation results with per-recording details and aggregate metrics.
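The 75% overlap default for hop_frames can be sketched in plain Python. This is an illustration of the overlap arithmetic only; the helper name and the integer-division behaviour are assumptions, not the library's confirmed implementation:

```python
# Sketch: 75% overlap between successive inference windows means each
# window advances by one quarter of its length. `default_hop` is a
# hypothetical helper, not part of the callcut API.
def default_hop(window_frames: int) -> int:
    """Hop in frames giving 75% window overlap (integer division)."""
    return window_frames // 4

# e.g. a 200-frame window advances 50 frames per step
```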

Examples

>>> from pathlib import Path
>>> from callcut.extractors import SNRExtractor
>>> from callcut.nn import TinySegCNN
>>> from callcut.evaluation import HysteresisDecoder, IoUMatcher
>>> from callcut.io import scan_recordings
>>> from callcut.pipeline import evaluate_recordings
>>>
>>> extractor = SNRExtractor(sample_rate=32000)
>>> model = TinySegCNN(n_bands=8, window_frames=250)
>>> decoder = HysteresisDecoder()
>>> matcher = IoUMatcher()
>>>
>>> recordings = scan_recordings(list(Path("data/").glob("*.wav")))
>>> report = evaluate_recordings(model, extractor, recordings, decoder, matcher)
>>> print(report)
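The boundary_tolerance_ms filtering described under Parameters can be illustrated with a minimal sketch. The dict structure and field names here are hypothetical stand-ins, not the library's actual matched-event types:

```python
def filter_by_boundary_tolerance(matches, tolerance_ms):
    """Keep only matched events whose onset and offset errors both fall
    within the tolerance. Each match is a dict carrying boundary errors
    in milliseconds (an illustrative structure, not the callcut API)."""
    return [
        m for m in matches
        if abs(m["onset_error_ms"]) <= tolerance_ms
        and abs(m["offset_error_ms"]) <= tolerance_ms
    ]

matches = [
    {"onset_error_ms": 4.0, "offset_error_ms": -6.0},   # within 10 ms
    {"onset_error_ms": 15.0, "offset_error_ms": 2.0},   # onset error too large
]
kept = filter_by_boundary_tolerance(matches, tolerance_ms=10.0)
```

Events that fail the check are dropped only from the boundary accuracy statistics; the event-level and frame-level metrics are unaffected.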