callcut.pipeline.evaluate_recordings

callcut.pipeline.evaluate_recordings(model, extractor, recordings, decoder, matcher, *, hop_frames=None, boundary_tolerance_ms=None)

Evaluate a trained model on annotated recordings.

For each recording, this function loads the audio, extracts features, predicts frame-level probabilities, decodes them to call intervals, matches the predictions against the ground truth annotations, and computes event-level, frame-level, and boundary metrics.

Results are aggregated across all recordings.

Parameters:
model : BaseDetector

Trained model for call detection. Should already be on the desired device.

extractor : BaseExtractor

Feature extractor matching the model’s expected input.

recordings : list of RecordingInfo

Recordings to evaluate. Each must have a valid annotation file.

decoder : BaseDecoder

Decoder for converting probabilities to call intervals.

matcher : BaseIntervalMatcher

Matcher for pairing predicted and ground truth intervals.

hop_frames : int | None

Hop between inference windows in frames. If None, uses the model's default (75% overlap).

boundary_tolerance_ms : float | None

If set, matched events whose onset or offset boundary error exceeds this tolerance are discarded when computing boundary accuracy statistics.

Returns:
report : EvaluationReport

Evaluation results with per-recording details and aggregate metrics.
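The 75% overlap default for hop_frames can be sketched in plain Python. This is an illustration of the overlap arithmetic only; the helper name and the integer-division behaviour are assumptions, not the library's confirmed implementation:

```python
# Sketch: 75% overlap between successive inference windows means each
# window advances by one quarter of its length. `default_hop` is a
# hypothetical helper, not part of the callcut API.
def default_hop(window_frames: int) -> int:
    """Hop in frames giving 75% window overlap (integer division)."""
    return window_frames // 4

# e.g. a 200-frame window advances 50 frames per step
```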

Examples

>>> from pathlib import Path
>>> from callcut.extractors import SNRExtractor
>>> from callcut.nn import TinySegCNN
>>> from callcut.evaluation import HysteresisDecoder, IoUMatcher
>>> from callcut.io import scan_recordings
>>> from callcut.pipeline import evaluate_recordings
>>>
>>> extractor = SNRExtractor(sample_rate=32000)
>>> model = TinySegCNN(n_bands=8, window_frames=250)
>>> decoder = HysteresisDecoder()
>>> matcher = IoUMatcher()
>>>
>>> recordings = scan_recordings(list(Path("data/").glob("*.wav")))
>>> report = evaluate_recordings(model, extractor, recordings, decoder, matcher)
>>> print(report)
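The boundary_tolerance_ms filtering described under Parameters can be illustrated with a minimal sketch. The dict structure and field names here are hypothetical stand-ins, not the library's actual matched-event types:

```python
def filter_by_boundary_tolerance(matches, tolerance_ms):
    """Keep only matched events whose onset and offset errors both fall
    within the tolerance. Each match is a dict carrying boundary errors
    in milliseconds (an illustrative structure, not the callcut API)."""
    return [
        m for m in matches
        if abs(m["onset_error_ms"]) <= tolerance_ms
        and abs(m["offset_error_ms"]) <= tolerance_ms
    ]

matches = [
    {"onset_error_ms": 4.0, "offset_error_ms": -6.0},   # within 10 ms
    {"onset_error_ms": 15.0, "offset_error_ms": 2.0},   # onset error too large
]
kept = filter_by_boundary_tolerance(matches, tolerance_ms=10.0)
```

Events that fail the check are dropped only from the boundary accuracy statistics; the event-level and frame-level metrics are unaffected.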