Evaluation
Two-Phase Challenge Submission
SegRap2023 is a two-phase challenge. In the first phase (validation phase), participants submit the output of their algorithms as a single zip file via the grand-challenge.org submission system. Each team may make up to 5 submissions to this phase between July 10, 2023 (12:00 AM GMT) and Aug 10, 2023 (11:59 AM GMT). The submitted zip file should be structured like the example below. Make sure the results in the zip file match the validation images one-to-one, with 45 predictions for task1 and 2 predictions for task2 for each case. Otherwise, the submission is considered invalid and no score will be generated.
results/
├─ SegRap_0001
│  ├─ Brain.nii.gz
│  ├─ Brainstem.nii.gz
│  ├─ ...
├─ ...
│  ├─ Brain.nii.gz
│  ├─ Brainstem.nii.gz
│  ├─ ...
├─ SegRap_xxxx
│  ├─ Brain.nii.gz
│  ├─ Brainstem.nii.gz
│  ├─ ...
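Because a submission with missing or mismatched files is rejected, it can help to check the archive locally before uploading. The following is a minimal sketch, not an official tool: the function name `find_missing_predictions` is hypothetical, and it assumes that each prediction is stored as `results/<case>/<structure>.nii.gz`, as the layout above suggests. The case IDs and structure names are supplied by the caller.

```python
import io
import zipfile


def find_missing_predictions(zip_bytes, case_ids, organ_names, root="results"):
    """Return the sorted list of expected archive paths that are absent.

    Assumption: predictions live at <root>/<case>/<organ>.nii.gz inside the zip.
    """
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        present = set(archive.namelist())
    expected = {
        f"{root}/{case}/{organ}.nii.gz"
        for case in case_ids
        for organ in organ_names
    }
    return sorted(expected - present)
```

An empty return value means that every expected file was found. To check a file on disk, pass `open(path, "rb").read()` as `zip_bytes`.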
In the final phase (testing phase), participants will be asked to submit their algorithm as a Docker container before Sept 10, 2023 (12:00 AM GMT). We believe this will allow a more thorough check of reproducibility. After the competition, the submitted Docker containers will be released with the consent of the corresponding participants.
Metrics
Two classical medical image segmentation metrics, the Dice Similarity Coefficient (DSC) and the Normalized Surface Dice (NSD), will be used to assess different aspects of segmentation performance.
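For reference, the DSC of a predicted mask P and a reference mask G is 2|P ∩ G| / (|P| + |G|). The sketch below computes it for binary masks with NumPy. It is an illustration only and not the organizers' evaluation code. The function name is hypothetical, and returning 1.0 when both masks are empty is an assumption. The NSD is not shown, because it depends on surface distances and a tolerance that this page does not specify.

```python
import numpy as np


def dice_similarity_coefficient(prediction, reference):
    """DSC between two binary masks of identical shape."""
    pred = np.asarray(prediction, dtype=bool)
    ref = np.asarray(reference, dtype=bool)
    denominator = pred.sum() + ref.sum()
    if denominator == 0:
        # Assumption: two empty masks count as perfect agreement.
        return 1.0
    overlap = np.logical_and(pred, ref).sum()
    return float(2.0 * overlap / denominator)
```

The same function works unchanged on 3D volumes, since it only counts voxels.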
Ranking Method
First, for each organ, we calculate the average DSC and the average NSD across all patients. Second, each participant is ranked on each organ-level DSC and NSD, so each participant receives 45 × 2 rankings (task1) or 2 × 2 rankings (task2). Finally, we average these rankings and normalize the result by the number of teams. We will also apply a statistical ranking, in which teams share a rank if there is no significant difference between them.
In addition, if a submission is missing results for some test cases, the corresponding organ's DSC and NSD are both set to 0 for ranking. For example, if the prediction for an organ is missing in one test case, that organ's average DSC and NSD decrease, and the team's ranking for that organ may worsen accordingly.
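The ranking steps described above can be sketched as follows. This is only an illustration under stated assumptions, not the official ranking code: the function names and the nested `scores[team][metric][organ]` layout are hypothetical, a missing result is represented by `None`, tied teams share the average of their ranks, and the statistical ranking is not covered.

```python
from statistics import mean


def organ_means(per_case_scores):
    """Average each organ's score over patients; a missing result (None) counts as 0."""
    return {
        organ: mean(0.0 if value is None else value for value in values)
        for organ, values in per_case_scores.items()
    }


def rank_descending(values_by_team):
    """Rank teams by score (1 = best); tied teams share their average rank (assumption)."""
    ordered = sorted(values_by_team, key=values_by_team.get, reverse=True)
    ranks = {}
    i = 0
    while i < len(ordered):
        j = i
        while j + 1 < len(ordered) and values_by_team[ordered[j + 1]] == values_by_team[ordered[i]]:
            j += 1
        shared = (i + 1 + j + 1) / 2  # average of the 1-based positions i+1 .. j+1
        for team in ordered[i : j + 1]:
            ranks[team] = shared
        i = j + 1
    return ranks


def final_scores(scores):
    """Average all organ-level DSC/NSD ranks per team, normalized by the number of teams."""
    teams = list(scores)
    means = {t: {m: organ_means(o) for m, o in scores[t].items()} for t in teams}
    rankings = {t: [] for t in teams}
    reference_team = teams[0]
    for metric in means[reference_team]:
        for organ in means[reference_team][metric]:
            ranks = rank_descending({t: means[t][metric][organ] for t in teams})
            for t in teams:
                rankings[t].append(ranks[t])
    return {t: mean(rankings[t]) / len(teams) for t in teams}


# Hypothetical two-team, two-organ, two-patient example; team_a misses one Brainstem result.
example = {
    "team_a": {
        "DSC": {"Brain": [0.9, 0.8], "Brainstem": [0.7, None]},
        "NSD": {"Brain": [0.8, 0.8], "Brainstem": [0.6, None]},
    },
    "team_b": {
        "DSC": {"Brain": [0.8, 0.8], "Brainstem": [0.6, 0.6]},
        "NSD": {"Brain": [0.9, 0.9], "Brainstem": [0.5, 0.5]},
    },
}
print(final_scores(example))
```

In this example a lower score is better. The missing Brainstem result lowers team_a's organ averages for that structure, so team_a loses both Brainstem rankings to team_b.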
Note: In the training stage, participants are encouraged to explore using non-contrast CT images, contrast-enhanced CT images, or both in the two tasks. In the evaluation stage, we only consider the segmentation results of the submitted methods.