This article explains the COCO evaluation metric, the evaluation metric for the instance segmentation task. The COCO dataset can be found here.
For the instance segmentation task, the evaluation metric decides whether a prediction matches a ground-truth instance by using the intersection over union (IoU). IoU measures the similarity between finite sets and is defined as the size of their intersection divided by the size of their union. Based on the IoU between the ground truth and the prediction, $\text{IoU} = \frac{|gt \cap prediction|}{|gt \cup prediction|}$, a prediction is counted as a True Positive if IoU ≥ threshold, as a False Positive if IoU < threshold, and an undetected ground truth is counted as a False Negative. (True Negative is not applicable: it would mean there is no ground-truth instance and the model did not predict one, and there are infinitely many positions that satisfy this.)
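To make this rule concrete, here is a minimal Python sketch that computes the IoU of two binary masks and labels a prediction as TP or FP against a chosen threshold; the masks and the 0.5 threshold are illustrative assumptions, not values taken from the COCO dataset.

```python
import numpy as np

def mask_iou(gt_mask: np.ndarray, pred_mask: np.ndarray) -> float:
    """IoU between two binary masks: |gt ∩ pred| / |gt ∪ pred|."""
    gt = gt_mask.astype(bool)
    pred = pred_mask.astype(bool)
    intersection = np.logical_and(gt, pred).sum()
    union = np.logical_or(gt, pred).sum()
    return float(intersection) / float(union) if union > 0 else 0.0

# Illustrative 8x8 masks (assumed values, not from the COCO dataset).
gt = np.zeros((8, 8), dtype=bool)
gt[2:6, 2:6] = True          # ground-truth instance
pred = np.zeros((8, 8), dtype=bool)
pred[3:7, 3:7] = True        # predicted instance, partially overlapping

iou = mask_iou(gt, pred)
threshold = 0.5              # example IoU threshold
label = "TP" if iou >= threshold else "FP"
print(f"IoU = {iou:.2f} -> {label}")   # IoU = 0.39 -> FP
```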
To consider every prediction in the evaluation, we could fix the threshold at a certain value, for example 0.5, and go through this process. But that would mean two predictions with IoU 0.6 and 0.9 receive equal weight, so fixing a single threshold introduces a bias into the evaluation metric. One way to solve this problem is to use a range of IoU thresholds and compute an AP value for each of them. The COCO evaluation metric measures segmentation accuracy this way: its IoU thresholds range from 0.5 to 0.95 with a step size of 0.05, written as AP@[0.5:.05:.95] (10 thresholds). An AP value is computed at each IoU threshold, and averaging these AP values gives the segmentation accuracy:
$$AP_{COCO} = \frac{AP_{0.50} + AP_{0.55} + \cdots + AP_{0.95}}{10}$$
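The averaging step itself is simple; the sketch below assumes the ten per-threshold AP values have already been computed (the numbers used here are made up purely for illustration).

```python
import numpy as np

# COCO IoU thresholds: 0.50, 0.55, ..., 0.95 (10 values).
iou_thresholds = np.linspace(0.50, 0.95, 10)

# Hypothetical per-threshold AP values, just to show the shape of the computation;
# in reality each one comes from its own precision-recall curve.
ap_per_threshold = {t: 0.9 - 0.6 * (t - 0.5) for t in iou_thresholds}

ap_coco = sum(ap_per_threshold.values()) / len(ap_per_threshold)
print(f"AP@[0.5:.05:.95] = {ap_coco:.3f}")
```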
To calculate the AP value at each IoU threshold, we need the precision-recall (PR) curve, since the AP value is the area under that curve. To obtain the precision-recall curve, we first build the confusion matrix, and computing the confusion matrix is based on the IoU rule described above. An example is illustrated in the figure below.

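As a rough sketch of this procedure (not the exact COCO implementation, which uses 101-point interpolated precision and per-category, per-area breakdowns): predictions are ranked by confidence, each is marked TP or FP at the fixed IoU threshold, cumulative counts give precision and recall, and the area under the resulting curve is the AP. The scores and TP/FP labels below are assumed values for illustration.

```python
import numpy as np

def average_precision(scores, is_tp, num_gt):
    """AP at one IoU threshold: area under the precision-recall curve.

    scores : confidence score of each prediction
    is_tp  : True if the prediction matched a ground truth (IoU >= threshold)
    num_gt : number of ground-truth instances (unmatched ones are the FNs)
    """
    order = np.argsort(scores)[::-1]                 # rank predictions by confidence
    tp = np.asarray(is_tp, dtype=float)[order]
    fp = 1.0 - tp
    cum_tp, cum_fp = np.cumsum(tp), np.cumsum(fp)
    recall = cum_tp / num_gt
    precision = cum_tp / (cum_tp + cum_fp)
    # Step-wise integration of precision over recall, starting from recall = 0.
    deltas = np.diff(np.concatenate(([0.0], recall)))
    return float(np.sum(deltas * precision))

# Hypothetical predictions: confidences and whether each matched a GT at IoU >= 0.5.
scores = [0.95, 0.90, 0.70, 0.60]
is_tp  = [True, True, False, True]
print(f"AP@0.5 = {average_precision(scores, is_tp, num_gt=5):.3f}")  # 0.550
```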
To sum up, at a fixed IoU threshold each confidence cutoff over the ranked predictions gives one point on the precision-recall curve. Computing the area under each precision-recall curve gives the AP value at that IoU threshold, and the average of these AP values is the segmentation accuracy.
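In practice this pipeline is rarely reimplemented by hand; the official pycocotools package provides it through the COCOeval class. The snippet below is a minimal usage sketch, assuming a COCO-format ground-truth annotation file and a detection results file whose names here are placeholders.

```python
from pycocotools.coco import COCO
from pycocotools.cocoeval import COCOeval

# Placeholder paths: a COCO-format ground-truth file and a results file
# produced by the model (a list of predictions with segmentation and score).
coco_gt = COCO("instances_val2017.json")
coco_dt = coco_gt.loadRes("segmentation_results.json")

# iouType="segm" evaluates mask IoU; use "bbox" for box IoU instead.
coco_eval = COCOeval(coco_gt, coco_dt, iouType="segm")
coco_eval.evaluate()    # match predictions to ground truth over all IoU thresholds
coco_eval.accumulate()  # build the precision-recall arrays
coco_eval.summarize()   # print AP@[0.5:.05:.95], AP@0.5, AP@0.75, AR, etc.
```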