callcut.evaluation.compute_frame_metrics

callcut.evaluation.compute_frame_metrics(probabilities, labels, *, threshold=0.5)

Compute frame-level precision, recall, and F1 score.

Frame-level metrics evaluate detection at the individual frame granularity: each frame is classified as either containing a call (positive) or not (negative), and compared against the ground truth labels.
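The frame-level comparison can be sketched as follows. This is an illustrative stand-in, not the library's internal code: the helper name `frame_confusion` is hypothetical, and plain Python lists are used in place of tensors to keep the sketch self-contained.

```python
# Hypothetical sketch of frame-level evaluation: threshold each
# predicted probability, then tally confusion counts against the
# ground-truth labels frame by frame.
def frame_confusion(probabilities, labels, threshold=0.5):
    tp = fp = fn = tn = 0
    for p, y in zip(probabilities, labels):
        pred = 1 if p >= threshold else 0  # frame classified as call?
        if pred == 1 and y == 1:
            tp += 1
        elif pred == 1 and y == 0:
            fp += 1
        elif pred == 0 and y == 1:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn

print(frame_confusion([0.1, 0.8, 0.9, 0.7, 0.2, 0.1],
                      [0, 1, 1, 1, 0, 0]))  # → (3, 0, 0, 3)
```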

Parameters:
probabilities : Tensor

Predicted probabilities of shape (n_frames,), with values in [0, 1]. Typically from predict().

labels : Tensor

Ground truth labels of shape (n_frames,), with values 0 (no call) or 1 (call). Typically from intervals_to_frame_labels().

threshold : float

Classification threshold. Frames with probability >= threshold are classified as positive (call). Default is 0.5.

Returns:
metrics : FrameMetrics

Frame-level detection metrics including TP, FP, FN, TN, precision, recall, and F1 score.
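The derived scores follow the conventional definitions. How FrameMetrics computes its fields internally is an assumption here, but the standard formulas relating the confusion counts to precision, recall, and F1 are (the helper name `prf1` is hypothetical):

```python
# Conventional definitions: precision = TP / (TP + FP),
# recall = TP / (TP + FN), F1 = harmonic mean of the two.
# Zero denominators are mapped to 0.0 by convention.
def prf1(tp, fp, fn):
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

print(prf1(tp=3, fp=1, fn=2))  # precision 0.75, recall 0.6
```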

Notes

Frame-level metrics are useful for quick sanity checks during training but can be misleading for event detection. A model can score a high frame-level F1 while systematically clipping call boundaries, and a handful of short false alarms costs only a few frames of precision even though each one counts as a spurious detected event.

For final evaluation, event-level metrics (from compute_event_metrics()) are generally more meaningful.
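A hypothetical scenario makes the gap concrete. Suppose one true 10-frame call is fully detected, but the model also emits five isolated one-frame false alarms. Frame-level precision stays moderate while event-level precision collapses (the variable names below are illustrative, not part of the API):

```python
# One ground-truth call spanning frames 10-19 out of 100 frames;
# predictions cover the call plus five scattered one-frame false alarms.
n_frames = 100
labels = [1 if 10 <= i < 20 else 0 for i in range(n_frames)]
preds = [1 if 10 <= i < 20 or i in (30, 45, 60, 75, 90) else 0
         for i in range(n_frames)]

tp = sum(p and y for p, y in zip(preds, labels))      # 10 call frames hit
fp = sum(p and not y for p, y in zip(preds, labels))  # 5 false-alarm frames
frame_precision = tp / (tp + fp)                      # 10 / 15 ≈ 0.67

# Event view: 1 correctly detected event vs 5 spurious events.
event_precision = 1 / 6                               # ≈ 0.17
print(frame_precision, event_precision)
```

The same predictions score roughly 0.67 precision per frame but only about 0.17 per event, which is why compute_event_metrics() is preferred for final evaluation.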

Examples

>>> import torch
>>> from callcut.evaluation import compute_frame_metrics
>>> probs = torch.tensor([0.1, 0.8, 0.9, 0.7, 0.2, 0.1])
>>> labels = torch.tensor([0.0, 1.0, 1.0, 1.0, 0.0, 0.0])
>>> metrics = compute_frame_metrics(probs, labels, threshold=0.5)
>>> metrics.precision
1.0
>>> metrics.recall
1.0