# Evaluation

Evaluation is a process that takes a number of input/output/time triplets and aggregates them. You can always [use the model](models.html) directly and parse its inputs/outputs manually to perform evaluation. Alternatively, evaluation is implemented in MedSegPy through the [DatasetEvaluator](../modules/evaluation.html#medsegpy.evaluation.DatasetEvaluator) interface.

MedSegPy includes [`SemSegEvaluator`](../modules/evaluation.html#medsegpy.evaluation.SemSegEvaluator), an extension of `DatasetEvaluator` that computes popular semantic segmentation metrics for medical images. You can also implement your own `DatasetEvaluator` that performs some other job using the input/output pairs. For example, to count how many instances are detected on the validation set:

```python
from medsegpy.evaluation import DatasetEvaluator

class Counter(DatasetEvaluator):
    def reset(self):
        self.count = 0

    def process(self, inputs, outputs, time_elapsed):
        # Accumulate a running total across all batches.
        for output in outputs:
            self.count += len(output["instances"])

    def evaluate(self):
        # Save self.count somewhere, print it, or return it.
        return {"count": self.count}
```

Once you have some `DatasetEvaluator`, you can run it with [inference_on_dataset](../modules/evaluation.html#medsegpy.evaluation.inference_on_dataset). For example:

```python
val_results = inference_on_dataset(
    model,
    val_data_loader,
    DatasetEvaluators(SemSegEvaluator(...)),
)
```

The `inference_on_dataset` function also provides accurate speed benchmarks for the given model and dataset.
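As a slightly larger illustration of the `reset`/`process`/`evaluate` pattern, below is a minimal sketch of a custom evaluator that accumulates a mean Dice coefficient. This is not how `SemSegEvaluator` is implemented; the `"y_pred"` and `"y_true"` keys on each output dict are hypothetical and stand in for whatever format your model and data loader actually produce.

```python
import numpy as np

from medsegpy.evaluation import DatasetEvaluator


class DiceEvaluator(DatasetEvaluator):
    """Minimal sketch: accumulates a per-scan Dice coefficient.

    Assumes each output dict carries binary masks under the
    hypothetical keys "y_pred" and "y_true".
    """

    def reset(self):
        self.scores = []

    def process(self, inputs, outputs, time_elapsed):
        for output in outputs:
            pred = np.asarray(output["y_pred"], dtype=bool)   # assumed key
            truth = np.asarray(output["y_true"], dtype=bool)  # assumed key
            intersection = np.logical_and(pred, truth).sum()
            denom = pred.sum() + truth.sum()
            # Dice = 2|A∩B| / (|A| + |B|); define it as 1.0 when
            # both masks are empty.
            dice = 2.0 * intersection / denom if denom > 0 else 1.0
            self.scores.append(dice)

    def evaluate(self):
        return {"dice_mean": float(np.mean(self.scores))}
```

Like `Counter` above, such an evaluator plugs directly into `inference_on_dataset`, either in place of or alongside `SemSegEvaluator` via `DatasetEvaluators`.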