callcut.evaluation.compute_frame_metrics

callcut.evaluation.compute_frame_metrics(probabilities, labels, *, threshold=0.5)
Compute frame-level precision, recall, and F1 score.
Frame-level metrics evaluate detection at the individual frame granularity: each frame is classified as either containing a call (positive) or not (negative), and compared against the ground truth labels.
- Parameters:
  - probabilities : Tensor
    Predicted probabilities of shape `(n_frames,)`, with values in `[0, 1]`. Typically from `predict()`.
  - labels : Tensor
    Ground truth labels of shape `(n_frames,)`, with values `0` (no call) or `1` (call). Typically from `intervals_to_frame_labels()`.
  - threshold : float
    Classification threshold. Frames with `probability >= threshold` are classified as positive (call). Default is `0.5`.
- Returns:
  - metrics : FrameMetrics
    Frame-level detection metrics including TP, FP, FN, TN, precision, recall, and F1 score.
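The returned quantities can be reproduced with a short stand-alone sketch. This is an illustration only, assuming a plain torch environment; `FrameMetricsSketch` and `frame_metrics_sketch` are hypothetical names, not callcut's actual `FrameMetrics` class or implementation:

```python
from dataclasses import dataclass

import torch


@dataclass
class FrameMetricsSketch:
    """Stand-in for callcut's FrameMetrics (hypothetical field names)."""
    tp: int
    fp: int
    fn: int
    tn: int
    precision: float
    recall: float
    f1: float


def frame_metrics_sketch(probabilities: torch.Tensor,
                         labels: torch.Tensor,
                         threshold: float = 0.5) -> FrameMetricsSketch:
    # Threshold probabilities into hard 0/1 predictions per frame.
    preds = (probabilities >= threshold).long()
    truth = labels.long()
    # Count the four cells of the frame-level confusion matrix.
    tp = int(((preds == 1) & (truth == 1)).sum())
    fp = int(((preds == 1) & (truth == 0)).sum())
    fn = int(((preds == 0) & (truth == 1)).sum())
    tn = int(((preds == 0) & (truth == 0)).sum())
    # Guard against empty denominators (no predicted or no true positives).
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return FrameMetricsSketch(tp, fp, fn, tn, precision, recall, f1)
```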
Notes
Frame-level metrics are useful for quick sanity checks during training but can be misleading for event detection. A model might achieve high frame-level F1 by correctly predicting the middle of calls while missing boundaries, or by predicting many short false alarms.
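To make that failure mode concrete (hypothetical numbers, pure Python, no callcut APIs): a prediction that covers one 50-frame call perfectly but adds five isolated one-frame false alarms still scores a frame-level F1 of about 0.95, even though only one of its six predicted events is real:

```python
n_frames = 100
labels = [1 if 20 <= i < 70 else 0 for i in range(n_frames)]  # one 50-frame call

# Prediction: the whole call is covered, plus 5 isolated 1-frame false alarms.
preds = list(labels)
for i in (0, 5, 80, 90, 95):
    preds[i] = 1

tp = sum(p and l for p, l in zip(preds, labels))
fp = sum(p and not l for p, l in zip(preds, labels))
fn = sum((not p) and l for p, l in zip(preds, labels))
frame_precision = tp / (tp + fp)   # 50 / 55 ≈ 0.909
frame_recall = tp / (tp + fn)      # 1.0
frame_f1 = 2 * frame_precision * frame_recall / (frame_precision + frame_recall)


def events(seq):
    """Collapse a 0/1 frame sequence into contiguous positive runs."""
    runs, start = [], None
    for i, v in enumerate(seq + [0]):  # sentinel 0 closes a trailing run
        if v and start is None:
            start = i
        elif not v and start is not None:
            runs.append((start, i))
            start = None
    return runs


pred_events = events(preds)              # 6 events: 1 real call + 5 false alarms
event_precision = 1 / len(pred_events)   # ≈ 0.167
```

Frame F1 here is 20/21 ≈ 0.952, while five of the six predicted events are spurious; the event-level view exposes what the frame-level score hides.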
For final evaluation, event-level metrics (from `compute_event_metrics()`) are generally more meaningful.

Examples
>>> import torch
>>> probs = torch.tensor([0.1, 0.8, 0.9, 0.7, 0.2, 0.1])
>>> labels = torch.tensor([0.0, 1.0, 1.0, 1.0, 0.0, 0.0])
>>> metrics = compute_frame_metrics(probs, labels, threshold=0.5)
>>> metrics.precision
1.0
>>> metrics.recall
1.0