callcut.evaluation.compute_event_metrics
- callcut.evaluation.compute_event_metrics(ground_truth, predictions, matches)
Compute event-level precision, recall, and F1 score.
Event-level metrics evaluate detection at the call/event granularity: each ground truth event is either correctly detected (true positive) or missed (false negative), and each prediction is either correct (true positive) or a false alarm (false positive).
- Parameters:
  - ground_truth (list of Interval) – Ground truth call events.
  - predictions (list of Interval) – Predicted call events.
  - matches – Matches between ground truth and predictions, as returned by a matcher's match() method (e.g. IoUMatcher.match).
- Returns:
  - metrics (EventMetrics) – Event-level detection metrics including TP, FP, FN, precision, recall, and F1 score.
Notes
The metrics are computed as follows (a counting sketch appears after this list):
- True Positives (TP): the number of matched pairs. Each match represents a ground truth event that was correctly detected.
- False Positives (FP): predictions without a match. These are “false alarms”: the model predicted a call where there was none.
- False Negatives (FN): ground truth events without a match. These are “missed detections”: real calls that the model failed to detect.
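The counting reduces to simple arithmetic over the match list. A minimal sketch, assuming one-to-one matching, where matches holds one entry per matched (ground truth, prediction) pair (the exact pair type is an assumption, not the library's documented representation):

# Sketch: event-level counts from a one-to-one match list (names assumed).
tp = len(matches)            # each match = one correctly detected event
fp = len(predictions) - tp   # unmatched predictions are false alarms
fn = len(ground_truth) - tp  # unmatched ground truth events are misses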
Precision, recall, and F1 are computed as:

\[\text{Precision} = \frac{TP}{TP + FP}\]

\[\text{Recall} = \frac{TP}{TP + FN}\]

\[F_1 = \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}\]

Examples
>>> from callcut.evaluation import Interval, IoUMatcher, compute_event_metrics
>>>
>>> gt = [Interval(0.0, 1.0), Interval(2.0, 3.0), Interval(4.0, 5.0)]
>>> pred = [Interval(0.1, 0.9), Interval(2.1, 3.1)]  # missed one
>>>
>>> matcher = IoUMatcher(iou_threshold=0.2)
>>> matches = matcher.match(gt, pred)
>>>
>>> metrics = compute_event_metrics(gt, pred, matches)
>>> metrics.tp
2
>>> metrics.fn  # one ground truth was missed
1
>>> metrics.recall
0.666...
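To make the arithmetic concrete, the example's counts can be replayed by hand. A minimal sketch, assuming an EventMetrics container with the fields shown above (the dataclass below is a hypothetical stand-in, not callcut's actual definition):

from dataclasses import dataclass

@dataclass
class EventMetrics:
    """Hypothetical stand-in for callcut's EventMetrics container."""
    tp: int
    fp: int
    fn: int

    @property
    def precision(self) -> float:
        # TP / (TP + FP); defined as 0.0 when there are no predictions
        return self.tp / (self.tp + self.fp) if (self.tp + self.fp) else 0.0

    @property
    def recall(self) -> float:
        # TP / (TP + FN); defined as 0.0 when there is no ground truth
        return self.tp / (self.tp + self.fn) if (self.tp + self.fn) else 0.0

    @property
    def f1(self) -> float:
        # Harmonic mean of precision and recall
        p, r = self.precision, self.recall
        return 2 * p * r / (p + r) if (p + r) else 0.0

m = EventMetrics(tp=2, fp=0, fn=1)  # counts from the example above
print(m.precision, m.recall, m.f1)  # 1.0 0.6666... 0.8

With three ground truth events and one miss, recall drops to 2/3 while precision stays at 1.0, which pulls F1 down to 0.8.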