callcut.pipeline.evaluate_recordings🔗
- callcut.pipeline.evaluate_recordings(model, extractor, recordings, decoder, matcher, *, hop_frames=None, boundary_tolerance_ms=None)[source]🔗
Evaluate a trained model on annotated recordings.
For each recording: loads audio, extracts features, predicts frame-level probabilities, decodes to call intervals, matches against ground truth, and computes event-level, frame-level, and boundary metrics.
Results are aggregated across all recordings.
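The event-matching stage described above can be illustrated with a minimal, self-contained sketch: greedy IoU-based pairing of predicted and ground-truth intervals, followed by event-level precision/recall. The function names and the greedy strategy here are illustrative assumptions; the actual `BaseIntervalMatcher` implementations may pair intervals differently.

```python
# Illustrative sketch of event-level matching and metrics (hypothetical
# helper names; not the actual IoUMatcher implementation).

def iou(a, b):
    """Intersection-over-union of two (start, end) intervals in seconds."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = max(a[1], b[1]) - min(a[0], b[0])
    return inter / union if union > 0 else 0.0

def match_events(predicted, ground_truth, iou_threshold=0.5):
    """Greedily pair each prediction with the best unmatched ground truth."""
    matched, used = [], set()
    for p in predicted:
        best, best_iou = None, iou_threshold
        for j, g in enumerate(ground_truth):
            if j not in used and iou(p, g) >= best_iou:
                best, best_iou = j, iou(p, g)
        if best is not None:
            used.add(best)
            matched.append((p, ground_truth[best]))
    tp = len(matched)                       # true positives: matched pairs
    fp = len(predicted) - tp                # unmatched predictions
    fn = len(ground_truth) - tp             # unmatched ground-truth events
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return matched, precision, recall

pred = [(0.1, 0.5), (1.0, 1.4), (2.0, 2.2)]   # predicted call intervals (s)
truth = [(0.12, 0.48), (1.05, 1.5)]           # annotated call intervals (s)
matched, precision, recall = match_events(pred, truth)
# Two predictions match; the third is a false positive.
```

In this sketch the spurious prediction at (2.0, 2.2) lowers precision to 2/3 while recall stays at 1.0, which is the event-level trade-off the aggregate metrics summarize.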
- Parameters:
- model
BaseDetector Trained model for call detection. Should already be on the desired device.
- extractor
BaseExtractor Feature extractor matching the model’s expected input.
- recordings
list of RecordingInfo Recordings to evaluate. Each must have a valid annotation file.
- decoder
BaseDecoder Decoder for converting probabilities to call intervals.
- matcher
BaseIntervalMatcher Matcher for pairing predicted and ground truth intervals.
- hop_frames
int|None Hop between inference windows in frames. If None, uses the model’s default (75% overlap).
- boundary_tolerance_ms
float|None If set, discard matched events where either boundary error exceeds this tolerance when computing boundary accuracy statistics.
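The boundary_tolerance_ms filter can be sketched as follows: a matched event is kept for boundary statistics only if both its onset and offset errors fall within the tolerance. The data layout here (pairs of (start, end) tuples in seconds) is an assumption for illustration; the actual matched-event objects may differ.

```python
# Illustrative sketch of boundary_tolerance_ms filtering (hypothetical
# data layout; the real pipeline's matched-event structure may differ).

def filter_by_boundary_tolerance(matched_pairs, tolerance_ms):
    """Keep matches whose onset AND offset errors are within tolerance."""
    kept = []
    for pred, truth in matched_pairs:  # intervals as (start, end) in seconds
        onset_err_ms = abs(pred[0] - truth[0]) * 1000.0
        offset_err_ms = abs(pred[1] - truth[1]) * 1000.0
        if onset_err_ms <= tolerance_ms and offset_err_ms <= tolerance_ms:
            kept.append((pred, truth))
    return kept

pairs = [((0.10, 0.50), (0.12, 0.48)),   # 20 ms onset/offset errors
         ((1.00, 1.40), (1.05, 1.50))]   # 50 ms onset, 100 ms offset errors
kept = filter_by_boundary_tolerance(pairs, tolerance_ms=30.0)
# Only the first pair survives a 30 ms tolerance.
```

Discarding out-of-tolerance matches keeps boundary statistics from being dominated by events that were detected but localized poorly.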
- Returns:
- report
EvaluationReport Evaluation results with per-recording details and aggregate metrics.
Examples
>>> from pathlib import Path
>>> from callcut.extractors import SNRExtractor
>>> from callcut.nn import TinySegCNN
>>> from callcut.evaluation import HysteresisDecoder, IoUMatcher
>>> from callcut.io import scan_recordings
>>> from callcut.pipeline import evaluate_recordings
>>>
>>> extractor = SNRExtractor(sample_rate=32000)
>>> model = TinySegCNN(n_bands=8, window_frames=250)
>>> decoder = HysteresisDecoder()
>>> matcher = IoUMatcher()
>>>
>>> recordings = scan_recordings(list(Path("data/").glob("*.wav")))
>>> report = evaluate_recordings(model, extractor, recordings, decoder, matcher)
>>> print(report)