callcut.evaluation.compute_boundary_accuracy

callcut.evaluation.compute_boundary_accuracy(ground_truth, predictions, matches, *, boundary_tolerance_ms=None)

Compute onset/offset timing errors for matched events.

Measures how accurately the predicted call boundaries align with the ground truth boundaries. Only matched events (true positives) are included in this computation.

Parameters:
ground_truth : list of Interval

Ground truth call intervals.

predictions : list of Interval

Predicted call intervals.

matches : list of Match

Matches between ground truth and predictions, typically from IoUMatcher.

boundary_tolerance_ms : float | None

If set, discard matched events where either the onset or offset error exceeds this tolerance (in milliseconds). Only matches within tolerance contribute to the summary statistics. If None (default), all matches are included.

Returns:
accuracy : BoundaryAccuracy

Boundary accuracy statistics including onset and offset errors in milliseconds, plus summary statistics (median, mean absolute, 95th percentile).

Notes

Errors are computed as predicted - ground_truth:

  • Positive onset error: Prediction started too late (detected late)

  • Negative onset error: Prediction started too early (detected early)

  • Positive offset error: Prediction ended too late (extended too long)

  • Negative offset error: Prediction ended too early (cut off too soon)
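The sign convention above can be illustrated with a minimal standalone sketch. It does not use the callcut classes: intervals are plain (onset, offset) tuples in seconds, matches are (ground truth index, prediction index) pairs, and boundary_errors_ms is a hypothetical helper name. The tolerance check compares the absolute error against the tolerance, which is an assumption about how boundary_tolerance_ms is applied.

```python
def boundary_errors_ms(gt, pred, pairs, tolerance_ms=None):
    """Return (onset_errors, offset_errors) in milliseconds for matched pairs.

    gt, pred : lists of (onset_s, offset_s) tuples in seconds
               (hypothetical stand-ins for Interval objects).
    pairs    : list of (gt_index, pred_index) tuples
               (hypothetical stand-ins for Match objects).
    """
    onset_errors, offset_errors = [], []
    for gt_idx, pred_idx in pairs:
        # Signed error: predicted - ground_truth, converted from s to ms.
        onset_err = (pred[pred_idx][0] - gt[gt_idx][0]) * 1000.0
        offset_err = (pred[pred_idx][1] - gt[gt_idx][1]) * 1000.0
        # Assumption: a match is dropped when the magnitude of either
        # boundary error exceeds the tolerance.
        if tolerance_ms is not None and (
            abs(onset_err) > tolerance_ms or abs(offset_err) > tolerance_ms
        ):
            continue
        onset_errors.append(onset_err)
        offset_errors.append(offset_err)
    return onset_errors, offset_errors


gt = [(1.0, 2.0), (3.0, 4.0)]
pred = [(1.02, 1.98), (3.01, 4.05)]
pairs = [(0, 0), (1, 1)]

# First pair: onset is late (positive), offset is early (negative).
onsets, offsets = boundary_errors_ms(gt, pred, pairs)

# With a 30 ms tolerance the second pair is dropped, because its
# offset error is about +50 ms.
kept_onsets, kept_offsets = boundary_errors_ms(gt, pred, pairs, tolerance_ms=30.0)
```

Because the errors come from floating-point subtraction, the values are only approximately 20, 10, -20 and 50 ms, so comparisons against them should use a small tolerance.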

The summary statistics help characterize the overall boundary accuracy:

  • Median: Typical signed error (positive = systematic late bias)

  • Mean absolute: Average magnitude of errors

  • 95th percentile (of absolute errors): 95% of errors have a magnitude at or below this value
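As a rough illustration of the three summary statistics, the sketch below computes them from a list of signed errors using only the standard library. The helper name summarize_errors and the returned dictionary keys are hypothetical, and the nearest-rank percentile used here is an assumption: the library may interpolate between values instead, which would give slightly different results on small samples.

```python
import math
import statistics


def summarize_errors(errors_ms):
    """Summarize signed boundary errors given in milliseconds.

    Returns the median of the signed errors, the mean of the absolute
    errors, and the 95th percentile of the absolute errors.
    """
    if not errors_ms:
        raise ValueError("at least one error value is required")
    abs_errors = sorted(abs(e) for e in errors_ms)
    # Nearest-rank percentile: the smallest value such that at least
    # 95% of the absolute errors are less than or equal to it.
    rank = math.ceil(0.95 * len(abs_errors))
    return {
        "median_ms": statistics.median(errors_ms),      # signed, shows bias
        "mean_abs_ms": statistics.fmean(abs_errors),    # average magnitude
        "p95_abs_ms": abs_errors[rank - 1],             # tail magnitude
    }


summary = summarize_errors([20.0, 10.0, -30.0, 5.0, -15.0])
```

The median is taken over signed errors so that a consistent early or late bias stays visible, while the other two statistics use magnitudes so that early and late errors do not cancel out.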

Examples

>>> from callcut.evaluation import Interval, IoUMatcher, compute_boundary_accuracy
>>>
>>> gt = [Interval(1.0, 2.0), Interval(3.0, 4.0)]
>>> pred = [Interval(1.02, 1.98), Interval(3.01, 4.05)]  # small errors
>>>
>>> matcher = IoUMatcher(iou_threshold=0.2)
>>> matches = matcher.match(gt, pred)
>>>
>>> accuracy = compute_boundary_accuracy(gt, pred, matches)
>>> accuracy.onset_mean_abs_ms  # average onset error ~15ms
15.0