Evaluation

Evaluation is a process that takes a number of input/output/time triplets and aggregates them into summary metrics. You can always use the model directly and parse its inputs/outputs manually to perform evaluation. Alternatively, evaluation is implemented in MedSegPy through the DatasetEvaluator interface.
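
As an illustration of the manual route, a minimal loop might look like the following sketch; the batch structure, the "sem_seg" key, and the dice_score helper are illustrative assumptions, not part of the MedSegPy API:

scores = []
for inputs in val_data_loader:          # assumed to yield batches of examples
  outputs = model(inputs)               # run the model directly on the batch
  for inp, out in zip(inputs, outputs):
    # "sem_seg" and dice_score are hypothetical names used for illustration.
    scores.append(dice_score(inp["sem_seg"], out["sem_seg"]))
mean_dice = sum(scores) / len(scores)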

MedSegPy includes SemSegEvaluator, an extension of DatasetEvaluator that computes popular semantic segmentation metrics for medical images. You can also implement your own DatasetEvaluator that performs some other job with the input/output pairs. For example, to count how many instances are detected on the validation set:

class Counter(DatasetEvaluator):
  def reset(self):
    self.count = 0

  def process(self, inputs, outputs, time_elapsed):
    # Called once per batch with the model inputs, outputs, and inference time.
    for output in outputs:
      self.count += len(output["instances"])

  def evaluate(self):
    # Save self.count somewhere, or print it, or return it.
    return {"count": self.count}
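
The evaluator lifecycle is reset once, then process once per batch, then evaluate at the end. As a sketch, driving Counter by hand (assuming val_data_loader yields batches and the model is called directly on them) could look like:

import time

evaluator = Counter()
evaluator.reset()
for inputs in val_data_loader:
  start = time.perf_counter()
  outputs = model(inputs)
  evaluator.process(inputs, outputs, time.perf_counter() - start)
results = evaluator.evaluate()          # {"count": ...}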

Once you have some DatasetEvaluator, you can run it with inference_on_dataset. For example,

val_results = inference_on_dataset(
    model,
    val_data_loader,
    DatasetEvaluators(SemSegEvaluator(...)),
)
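
The returned value is a dictionary of metric names to values collected from the evaluators. As a sketch, running the Counter defined above the same way and reading its result (the "count" key follows from Counter.evaluate; the DatasetEvaluators call mirrors the convention shown above):

count_results = inference_on_dataset(
    model,
    val_data_loader,
    DatasetEvaluators(Counter()),
)
print(count_results["count"])  # number of instances detected on the validation set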

The inference_on_dataset function also provides accurate speed benchmarks for the given model and dataset.