callcut.training.CallDataModule🔗
- class callcut.training.CallDataModule(recordings, extractor, *, train_frac=0.7, val_frac=0.1, test_frac=0.2, window_s=2.0, window_hop_s=0.5, batch_size=32, num_workers=4)[source]🔗
Lightning DataModule for call detection.
Handles data loading, train/val/test splitting (balanced by window count), and DataLoader creation for training and validation. The test split is exposed via
test_recordingsfor use withevaluate_recordings().- Parameters:
- recordings
listofPath|str Paths to audio files. Each audio file should have a corresponding annotation CSV file with the same stem and
_annotations.csvsuffix.- extractor
BaseExtractor Feature extractor instance.
- train_frac
float Target fraction of windows for training (default: 0.7).
- val_frac
float Target fraction of windows for validation (default: 0.1).
- test_frac
float Target fraction of windows for testing (default: 0.2).
- window_s
float Window length in seconds for each sample.
- window_hop_s
float Hop between consecutive windows in seconds.
- batch_size
int Batch size for DataLoaders.
- num_workers
int Number of workers for DataLoaders.
- recordings
Attributes
Feature extractor used by this data module.
Total number of valid recordings.
Test recordings (available after setup or split).
Training dataset (available after setup).
Training recordings (available after setup or split).
Validation dataset (available after setup).
Validation recordings (available after setup or split).
Methods
setup(stage)Set up datasets for the fit stage.
Return the training DataLoader.
Return the validation DataLoader.
Notes
The splitting is done at the recording level, balanced by window count rather than file count. This ensures each split contains approximately the target fraction of training samples, even when recordings have different durations.
For reproducible splits, call
lightning.seed_everything(seed)before instantiating and callingsetup().Only the
"fit"stage is supported forsetup(). For evaluation on the held-out test split, useevaluate_recordings()withtest_recordings.Examples
>>> from pathlib import Path >>> from callcut.extractors import SNRExtractor >>> from callcut.training import CallDataModule >>> from callcut.pipeline import evaluate_recordings >>> import lightning as L >>> >>> extractor = SNRExtractor(sample_rate=32000, hop_ms=8.0, n_bands=8) >>> recordings = list(Path("data/").glob("*.wav")) >>> dm = CallDataModule( ... recordings=recordings, ... extractor=extractor, ... batch_size=32, ... ) >>> trainer = L.Trainer(max_epochs=10) >>> trainer.fit(model, datamodule=dm) >>> >>> # Evaluate on held-out test recordings >>> report = evaluate_recordings( ... model, extractor, dm.test_recordings, decoder, matcher ... )
- setup(stage)[source]🔗
Set up datasets for the fit stage.
- Parameters:
- stage
str Must be
"fit". Other stages are not supported; useevaluate_recordings()for evaluation on held-out test recordings (available viatest_recordings).
- stage
- train_dataloader()[source]🔗
Return the training DataLoader.
- Returns:
- dataloader
DataLoader Training DataLoader with shuffling enabled.
- dataloader
- val_dataloader()[source]🔗
Return the validation DataLoader.
- Returns:
- dataloader
DataLoader|None Validation DataLoader, or
Noneif no validation recordings.
- dataloader
- property extractor🔗
Feature extractor used by this data module.
- Type:
- property test_recordings🔗
Test recordings (available after setup or split).
- Type:
- property train_dataset🔗
Training dataset (available after setup).
- Type:
CallDataset| None
- property train_recordings🔗
Training recordings (available after setup or split).
- Type:
- property val_dataset🔗
Validation dataset (available after setup).
- Type:
CallDataset| None
- property val_recordings🔗
Validation recordings (available after setup or split).
- Type: