# Use Custom Dataloaders

## How the Existing Dataloader Works

MedSegPy contains a built-in data loading pipeline. Understanding how it works is useful if you need to write a custom one.

MedSegPy provides an interface for loading and structuring data stored in different ways (3D volumes, 2D slices, etc.). Data structuring consists of scattering a single element into multiple elements (3D volume -> 2D/3D patches) or gathering multiple elements into a single element (multiple 2D slices -> 3D volume). For example, if data from a 3D scan is saved slice-wise across different h5 files and we want to train a 3D network, we can use MedSegPy's interface to gather the data from those files into a single volume.

MedSegPy's loading/structuring interface is defined by the [`DataLoader`](../modules/data.html#medsegpy.data.data_loader.DataLoader) abstract class. This class extends the Keras [`Sequence`](https://www.tensorflow.org/versions/r1.15/api_docs/python/tf/keras/utils/Sequence) class. Like Sequences, `DataLoader`s implement a `__getitem__` method that fetches batches. For training and validation, we recommend following the Keras API for loading data with sequences.

As mentioned above, medical data often requires structuring/patching. As a result, a batch may contain elements that are subsets of a single scan. For example, a data loader that indexes over the 2D slices of a 3D scan is very useful for training 2D models. However, during inference, metrics are typically calculated per scan, and restructuring data outside of the data loader can be difficult.

To simplify inference and downstream metric calculation, each data loader implements an `inference` method, which takes in a MedSegPy [`Model`](../modules/modeling.html#medsegpy.modeling.model.Model) and keyword arguments that are typically used with [`predict_generator`](https://keras.io/models/sequential/#predict_generator).
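The scatter/gather structuring and the `__getitem__` batching described above can be sketched without MedSegPy or Keras. This is a minimal, hypothetical illustration: `SliceLoader` and `gather` are not MedSegPy APIs, nested lists stand in for arrays, and each scan is assumed to be a list of 2D slices ordered by slice index. `SliceLoader` only mimics the `__len__`/`__getitem__` protocol of a Keras `Sequence`; it does not subclass it.

```python
import math
from typing import Dict, List, Optional, Tuple

# Nested lists stand in for arrays: a 2D slice is H x W, and a scan is a
# list of 2D slices ordered by slice index (an assumption of this sketch).
Slice2D = List[List[float]]


class SliceLoader:
    """Hypothetical loader mimicking the keras Sequence interface
    (__len__ / __getitem__). It scatters every scan into 2D slices, so a
    batch holds individual slices that may come from different scans."""

    def __init__(self, scans: Dict[str, List[Slice2D]], batch_size: int):
        self.scans = scans
        self.batch_size = batch_size
        # Scatter: one (scan_id, slice_index) entry per 2D slice.
        self.index: List[Tuple[str, int]] = [
            (scan_id, z) for scan_id, volume in scans.items() for z in range(len(volume))
        ]

    def __len__(self) -> int:
        # Number of batches; the last batch may be smaller than batch_size.
        return math.ceil(len(self.index) / self.batch_size)

    def __getitem__(self, idx: int) -> List[Tuple[str, int, Slice2D]]:
        entries = self.index[idx * self.batch_size:(idx + 1) * self.batch_size]
        return [(scan_id, z, self.scans[scan_id][z]) for scan_id, z in entries]


def gather(batches, num_slices: Dict[str, int]) -> Dict[str, List[Optional[Slice2D]]]:
    """Gather scattered slices back into one volume per scan, ordered by slice index."""
    volumes: Dict[str, List[Optional[Slice2D]]] = {s: [None] * n for s, n in num_slices.items()}
    for batch in batches:
        for scan_id, z, slice_2d in batch:
            volumes[scan_id][z] = slice_2d
    return volumes


scans = {
    "scan_a": [[[0.0]], [[1.0]], [[2.0]]],  # three 1x1 slices
    "scan_b": [[[3.0]], [[4.0]]],           # two 1x1 slices
}
loader = SliceLoader(scans, batch_size=2)
batches = [loader[i] for i in range(len(loader))]
# Batch 1 mixes the last slice of scan_a with the first slice of scan_b.
print([(s, z) for s, z, _ in batches[1]])  # [('scan_a', 2), ('scan_b', 0)]
volumes = gather(batches, {s: len(v) for s, v in scans.items()})
print(volumes == scans)  # True
```

Because a batch can straddle two scans, per-scan metrics need a gather step like the one above; keeping that bookkeeping inside the loader is the motivation for the `inference` method.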
In `inference`, the data loader does the following:

1. Loads all dataset dictionaries corresponding to a given scan.
2. Structures data in these dictionaries based on the data loader's defined structuring method.
3. Runs inference on the scan data.
4. Reformats the scan data. Images/volumes will have shape `HxWx...`. Semantic segmentation masks and predictions will have shape `HxWx...xC`.
5. Yields the inputs and outputs as a pair of dictionaries.

This method continues to yield input and output data in the MedSegPy format until data for all scans has been yielded. For more information, see [DataLoader](../modules/data.html#medsegpy.data.DataLoader).

## Dataloader example

Below we describe loading data and training a model using the OAI iMorphics 2D dataset, a dataset where 3D volumes are stored as 2D slices. For more information on acceptable dataset h5 files, see [datasets](datasets.html).

The `DefaultDataLoader` handles both 2D single-slice scans and 3D scans stored as 2D slices. For more information on other data loaders, see [medsegpy.data.data_loader](../modules/data.html#medsegpy.data.DataLoader).

```python
from medsegpy.config import UNetConfig
from medsegpy.data import build_loader, DatasetCatalog, DefaultDataLoader
from medsegpy.modeling import get_model

cfg = UNetConfig()
cfg.TAG = "DefaultDataLoader"  # Specify the data loader type
cfg.TRAIN_DATASET = "oai_2d_train"
cfg.VAL_DATASET = "oai_2d_val"
cfg.TEST_DATASET = "oai_2d_test"
cfg.CATEGORIES = (0, (1, 2), 3, (4, 5))
cfg.IMG_SIZE = (384, 384, 1)

model = get_model(cfg)
model.compile(...)  # compile with optimizer, loss, metrics, etc.

# Using built-in methods to create loaders.
# To build them from scratch, see the implementation
# of `build_loader`.
train_loader = build_loader(
    cfg,
    cfg.TRAIN_DATASET,
    batch_size=10,
    is_test=False,
    shuffle=True,
    drop_last=True,
)
val_loader = build_loader(
    cfg,
    cfg.VAL_DATASET,
    batch_size=10,
    is_test=False,
    shuffle=True,
    drop_last=True,
)
test_loader = build_loader(
    cfg,
    cfg.TEST_DATASET,
    batch_size=10,
    is_test=True,  # testing: no shuffling, keep every sample
    shuffle=False,
    drop_last=False,
)

# Start training
model.fit_generator(
    train_loader,
    validation_data=val_loader,
    # ... other arguments (e.g. epochs, callbacks)
)

# Run inference.
for inputs, outputs in test_loader.inference(model):
    # Do inference-related things, e.g. compute per-scan metrics.
    pass
```

## Write a Custom Dataloader

Coming soon!

## Use a Custom Dataloader

If you use [DefaultTrainer](../modules/engine.html#medsegpy.engine.trainer.DefaultTrainer), you can override its `_build_data_loaders` and `build_test_data_loader` methods to use your own data loader. If you write your own training loop, you can also plug in your data loader directly.
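As a rough illustration of plugging a loader into your own training loop, the framework-free sketch below shows a loop that relies only on `len()` and integer indexing, plus a per-scan `inference` generator that mirrors the five steps listed under "How the Existing Dataloader Works". Everything here is hypothetical: `MyDataLoader`, `ToyModel`, `to_hwd`, and the dataset-dictionary keys (`scan_id`, `slice`, `image`, `mask`) are illustrations, not MedSegPy's API or schema, and nested lists stand in for arrays so the example runs without Keras.

```python
import math
from typing import Dict, Iterator, List, Tuple

# Hypothetical dataset dictionaries: one per stored 2D slice. The keys
# ("scan_id", "slice", "image", "mask") are illustrative, not MedSegPy's schema.
DatasetDict = Dict[str, object]


def to_hwd(slices):
    """Reorder a slice-first stack [z][h][w] into H x W x D ([h][w][z])."""
    depth, height, width = len(slices), len(slices[0]), len(slices[0][0])
    return [[[slices[z][h][w] for z in range(depth)] for w in range(width)] for h in range(height)]


class ToyModel:
    """Stand-in for a Keras-style model with train_on_batch / predict."""

    def train_on_batch(self, x, y) -> float:
        return 0.0  # a real model would update its weights and return the loss

    def predict(self, x):
        # Threshold each pixel of each 2D image at 0.5.
        return [[[1.0 if px > 0.5 else 0.0 for px in row] for row in img] for img in x]


class MyDataLoader:
    """Hypothetical custom loader: 2D slice batches for training, whole scans for inference."""

    def __init__(self, dataset_dicts: List[DatasetDict], batch_size: int):
        self.dataset_dicts = dataset_dicts
        self.batch_size = batch_size

    def __len__(self) -> int:
        return math.ceil(len(self.dataset_dicts) / self.batch_size)

    def __getitem__(self, idx: int):
        batch = self.dataset_dicts[idx * self.batch_size:(idx + 1) * self.batch_size]
        return [d["image"] for d in batch], [d["mask"] for d in batch]

    def inference(self, model) -> Iterator[Tuple[dict, dict]]:
        for scan_id in sorted({d["scan_id"] for d in self.dataset_dicts}):
            # 1. Load all dataset dicts for this scan, ordered by slice index.
            dicts = sorted(
                (d for d in self.dataset_dicts if d["scan_id"] == scan_id),
                key=lambda d: d["slice"],
            )
            # 2. Structure the data: here, a stack of 2D slices.
            images = [d["image"] for d in dicts]
            # 3. Run inference on the scan's data.
            preds = model.predict(images)
            # 4. Reformat volumes to H x W x D (this toy version has no channel axis).
            # 5. Yield the per-scan inputs and outputs as a pair of dicts.
            yield (
                {"scan_id": scan_id, "x": to_hwd(images)},
                {"y_true": to_hwd([d["mask"] for d in dicts]), "y_pred": to_hwd(preds)},
            )


dicts = [
    {"scan_id": "s1", "slice": 1, "image": [[0.9, 0.1]], "mask": [[1.0, 0.0]]},
    {"scan_id": "s1", "slice": 0, "image": [[0.2, 0.8]], "mask": [[0.0, 1.0]]},
    {"scan_id": "s2", "slice": 0, "image": [[0.6, 0.4]], "mask": [[1.0, 0.0]]},
]
loader = MyDataLoader(dicts, batch_size=2)
model = ToyModel()

# A hand-written training loop that only uses len() and indexing.
for epoch in range(2):
    for i in range(len(loader)):
        x, y = loader[i]
        model.train_on_batch(x, y)

# Inference yields one (inputs, outputs) pair per scan.
for inputs, outputs in loader.inference(model):
    print(inputs["scan_id"], outputs["y_pred"] == outputs["y_true"])  # s1 True, then s2 True
```

In a real pipeline you would likely subclass MedSegPy's abstract `DataLoader` and train a Keras model instead; this sketch avoids both so that it runs standalone.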