Evaluation
Reference (Ground Truth)
The annotation of each individual slice was performed by a radiology expert and an experienced medical image processing scientist using the open-source program 3D Slicer (see Fig. 2).
Fig. 2: (a-d) Examples of labeling on axial images. (e) Labeling process using multi-planar reconstructions and visualization in 3D Slicer. (f) Visualization of the annotations: (g) portal vein, (h) hepatic vein. (h-i) Quality control and continuity check of the annotated vessels.
Evaluation Metrics
The following evaluation criteria and metrics will be used to assess the results:
- Localization Criterion: Mask Intersection over Union (Mask IoU):
https://metrics-reloaded.dkfz.de/metric?id=mask_iou
IoU measures the overlap between two structures (see above). Combined with a localization threshold, it is a common localization criterion. It is often referred to as Box IoU when comparing bounding boxes, Mask IoU when comparing segmentation masks, or Approx IoU when comparing approximations of objects beyond bounding boxes.
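As a minimal sketch, Mask IoU for two binary masks can be computed directly with NumPy. The masks and the 0.5 localization threshold below are illustrative choices, not values specified by the challenge:

```python
import numpy as np

def mask_iou(pred, gt):
    """Intersection over Union of two binary segmentation masks."""
    inter = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    return inter / union if union else 1.0  # two empty masks: treat as perfect match

pred = np.zeros((8, 8), dtype=bool); pred[2:6, 2:6] = True  # 4x4 square
gt = np.zeros((8, 8), dtype=bool);   gt[3:7, 3:7] = True    # same square shifted by (1, 1)

iou = mask_iou(pred, gt)
print(iou)  # 9 / 23 ≈ 0.391

# Used as a localization criterion: the object counts as detected
# only if the overlap exceeds a chosen threshold.
print(iou >= 0.5)  # False at a 0.5 threshold
```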
- Overlap-based Metric: Centerline Dice (clDice):
https://metrics-reloaded.dkfz.de/metric?id=cl_dice
clDice measures the overlap between two structures, ideally tubular-shaped. The formula is similar to the DSC but relies on topology precision and topology sensitivity, which are defined based on the skeletons of the structures.
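The definition above can be sketched as follows. In practice the skeletons would be extracted with a thinning algorithm (e.g. `skimage.morphology.skeletonize`); to keep the sketch dependency-free, the toy masks and their skeletons are constructed by hand:

```python
import numpy as np

def cl_dice(pred, gt, pred_skel, gt_skel):
    """Centerline Dice from binary masks and their (precomputed) skeletons."""
    # Topology precision: fraction of the predicted skeleton lying inside the reference mask.
    tprec = (pred_skel & gt).sum() / pred_skel.sum()
    # Topology sensitivity: fraction of the reference skeleton lying inside the predicted mask.
    tsens = (gt_skel & pred).sum() / gt_skel.sum()
    # Harmonic mean, analogous to the DSC formula.
    return 2 * tprec * tsens / (tprec + tsens)

# Toy "vessel": a 3-pixel-thick horizontal tube; its skeleton is the middle row.
gt = np.zeros((7, 10), dtype=bool); gt[2:5, :] = True
gt_skel = np.zeros_like(gt);        gt_skel[3, :] = True

# Prediction that misses the last 4 columns of the vessel.
pred = np.zeros_like(gt);      pred[2:5, :6] = True
pred_skel = np.zeros_like(gt); pred_skel[3, :6] = True

print(cl_dice(gt, gt, gt_skel, gt_skel))      # identical masks → 1.0
print(cl_dice(pred, gt, pred_skel, gt_skel))  # 2·(1.0·0.6)/1.6 = 0.75
```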
- Boundary-based Metric: Normalized Surface Distance (NSD):
https://metrics-reloaded.dkfz.de/metric?id=normalized_surface_distance
NSD measures the DSC on boundary pixels with an uncertainty margin. The tolerance parameter τ represents the degree of strictness for what constitutes a correct boundary. Only boundary parts within the border regions defined by τ are counted as true positives (TP). NSD therefore captures known uncertainties in the reference and allows acceptable deviations of the predicted boundary from the reference.
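A minimal 2-D sketch of this definition, using brute-force pairwise distances on toy masks (a real evaluation would use distance transforms on 3-D volumes, with τ given in millimetres):

```python
import numpy as np

def boundary(mask):
    """Foreground pixels with at least one 4-connected background neighbour."""
    p = np.pad(mask, 1, constant_values=False)
    interior = p[:-2, 1:-1] & p[2:, 1:-1] & p[1:-1, :-2] & p[1:-1, 2:]
    return mask & ~interior

def nsd(pred, gt, tau):
    """Normalized Surface Distance with tolerance tau (in pixels here)."""
    bp = np.argwhere(boundary(pred)).astype(float)
    bg = np.argwhere(boundary(gt)).astype(float)
    d = np.linalg.norm(bp[:, None, :] - bg[None, :, :], axis=-1)
    ok_pred = (d.min(axis=1) <= tau).sum()  # predicted boundary pixels counted as TP
    ok_gt = (d.min(axis=0) <= tau).sum()    # reference boundary pixels counted as TP
    return (ok_pred + ok_gt) / (len(bp) + len(bg))

gt = np.zeros((12, 12), dtype=bool);   gt[2:8, 2:8] = True
pred = np.zeros((12, 12), dtype=bool); pred[3:9, 3:9] = True  # shifted by (1, 1)

print(nsd(gt, gt, tau=0))    # perfect prediction → 1.0
print(nsd(pred, gt, tau=0))  # strict: only coinciding boundary pixels count
print(nsd(pred, gt, tau=1))  # larger tau accepts more boundary deviation
```

A larger τ makes the metric more forgiving of small boundary deviations, which is the intended behaviour when the reference annotation itself carries uncertainty.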