# Usage
callcut detects animal calls in audio recordings. Each audio file needs a
companion annotation CSV (e.g. `recording_annotations.csv`) with
`start_seconds` and `stop_seconds` columns, giving call onset and offset
times in seconds.
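For example, a minimal annotation file can be written with the standard library. Only the `start_seconds`/`stop_seconds` column names come from the format above; the filename and call times are made up:

```python
import csv

# Hypothetical annotations for one recording: two calls, times in seconds.
rows = [
    {"start_seconds": 1.25, "stop_seconds": 1.80},
    {"start_seconds": 4.10, "stop_seconds": 4.65},
]

with open("recording_annotations.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["start_seconds", "stop_seconds"])
    writer.writeheader()
    writer.writerows(rows)
```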
## Training
```python
from pathlib import Path

import lightning as L

from callcut.evaluation import HysteresisDecoder
from callcut.extractors import SNRExtractor
from callcut.nn import TinySegCNN
from callcut.pipeline import save_pipeline
from callcut.training import (
    BCEWithLogitsLoss,
    CallDataModule,
    CallDetectorModule,
    LoggingCallback,
    SaveBestModelCallback,
)

L.seed_everything(42)

wav_files = sorted(Path("data/").glob("*.wav"))

# Feature extraction: multi-band SNR
extractor = SNRExtractor(sample_rate=32_000)

# Data module: handles train/val/test splitting at the recording level
dm = CallDataModule(recordings=wav_files, extractor=extractor, num_workers=0)
dm.setup("fit")

# Model: lightweight 1D CNN
window_frames = extractor.seconds_to_frames(2.0)
model = TinySegCNN(n_bands=extractor.n_features, window_frames=window_frames)
module = CallDetectorModule(model, loss=BCEWithLogitsLoss())

# Train
trainer = L.Trainer(
    max_epochs=10,
    accelerator="cpu",
    devices=1,
    callbacks=[LoggingCallback(), SaveBestModelCallback("best_weights.pt")],
    enable_checkpointing=False,
)
trainer.fit(module, datamodule=dm)

# Save the full pipeline (model + extractor + decoder config)
decoder = HysteresisDecoder()
save_pipeline(model, extractor, decoder, "pipeline.pt")
```
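The `HysteresisDecoder` is used here with its defaults. In general, hysteresis decoding turns per-frame call probabilities into intervals by opening a call when the probability crosses a high threshold and closing it only once it falls below a lower one, which suppresses flicker around a single cutoff. A minimal sketch of the idea (the function name and thresholds are illustrative, not callcut's API):

```python
def hysteresis_decode(probs, high=0.7, low=0.3):
    """Turn frame probabilities into (start, stop) frame intervals."""
    intervals, start = [], None
    for i, p in enumerate(probs):
        if start is None and p >= high:
            start = i                      # open on the high threshold
        elif start is not None and p < low:
            intervals.append((start, i))   # close on the low threshold
            start = None
    if start is not None:
        intervals.append((start, len(probs)))  # call runs to end of recording
    return intervals

probs = [0.1, 0.8, 0.6, 0.4, 0.2, 0.9, 0.5, 0.1]
print(hysteresis_decode(probs))  # [(1, 4), (5, 7)]
```

Note that the frames with probability 0.6 and 0.4 stay inside the first call even though they are below the opening threshold; a single 0.7 cutoff would have split it.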
## Evaluation
After training, evaluate on the held-out test split using event-level, frame-level, and boundary metrics:
```python
from callcut.evaluation import IoUMatcher
from callcut.pipeline import evaluate_recordings, load_pipeline

model, extractor, decoder = load_pipeline("pipeline.pt")

# dm.test_recordings is available after dm.setup("fit")
report = evaluate_recordings(
    model, extractor, dm.test_recordings, decoder, IoUMatcher()
)
print(report)
```
The `EvaluationReport` contains aggregated `EventMetrics`, `FrameMetrics`, and `BoundaryAccuracy` across all recordings, as well as per-recording results.
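Event-level scoring with `IoUMatcher` pairs predicted and reference calls by temporal overlap. The core quantity is the IoU of two intervals; a self-contained sketch of the idea (the 0.5 threshold and the greedy pairing are illustrative assumptions, not necessarily callcut's defaults):

```python
def interval_iou(a, b):
    """Intersection-over-union of two (onset, offset) intervals in seconds."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0

def match_events(preds, refs, iou_threshold=0.5):
    """Greedily pair each prediction with an unused reference above the threshold."""
    matched, used = [], set()
    for p in preds:
        best, best_iou = None, iou_threshold
        for j, r in enumerate(refs):
            iou = interval_iou(p, r)
            if j not in used and iou >= best_iou:
                best, best_iou = j, iou
        if best is not None:
            used.add(best)
            matched.append((p, refs[best]))
    return matched

preds = [(1.0, 2.0), (4.0, 5.0)]  # second prediction overlaps nothing
refs = [(1.1, 2.1), (7.0, 8.0)]   # second reference is missed
print(match_events(preds, refs))  # [((1.0, 2.0), (1.1, 2.1))]
```

Matched pairs count as true positives; unmatched predictions and references become false positives and false negatives, from which event-level precision and recall follow.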
## Inference on new recordings
To run a trained pipeline on new audio files (no annotations needed):
```python
from pathlib import Path

from callcut.pipeline import load_pipeline, predict_recordings

model, extractor, decoder = load_pipeline("pipeline.pt")
audio_files = sorted(Path("new_data/").glob("*.wav"))
predictions = predict_recordings(model, extractor, audio_files, decoder)

for pred in predictions:
    print(f"{pred.audio_path.name}: {len(pred.intervals)} calls")
    for interval in pred.intervals:
        print(f"  {interval.onset:.3f}s - {interval.offset:.3f}s")
```
Each `RecordingPrediction` contains the detected call `Interval` objects, with `onset` and `offset` times in seconds.
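Because the predicted times are already in seconds, detections can be written straight back into the same `start_seconds`/`stop_seconds` format the annotation CSVs use, for review or re-training. A sketch with plain `(onset, offset)` tuples standing in for `Interval` objects (the helper and output filename are illustrative):

```python
import csv

def write_annotations(intervals, csv_path):
    """Write (onset, offset) pairs, in seconds, as an annotation-style CSV."""
    with open(csv_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["start_seconds", "stop_seconds"])
        for onset, offset in intervals:
            writer.writerow([f"{onset:.3f}", f"{offset:.3f}"])

# Hypothetical detections for one recording.
write_annotations([(1.25, 1.8), (4.1, 4.65)], "detections.csv")
```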