This article explains the COCO evaluation metric, the evaluation metric for the instance segmentation task. The COCO dataset can be found here.
On the instance segmentation task, the evaluation metric solves this problem by using the intersection over union (IoU). IoU measures the similarity between finite sets, defined as the size of their intersection divided by the size of their union. Based on the IoU between the ground truth and a prediction ($IoU = \frac{|\textrm{gt} \cap \textrm{prediction}|}{|\textrm{gt} \cup \textrm{prediction}|}$), a prediction is counted as a True Positive if $\texttt{IoU} \geq \texttt{threshold}$ and as a False Positive if $\texttt{IoU} < \texttt{threshold}$, while an undetected ground-truth instance counts as a False Negative. (True Negative is not applicable: it would mean there is no ground-truth object and the model predicted no box there, and there are infinitely many positions that satisfy this.)
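As a concrete sketch, the IoU computation and the TP/FP rule can be written as follows in Python (a minimal illustration assuming binary NumPy masks; the function names are ours, not from any particular library):

```python
import numpy as np

def mask_iou(gt_mask: np.ndarray, pred_mask: np.ndarray) -> float:
    """IoU between two binary masks: |intersection| / |union|."""
    intersection = np.logical_and(gt_mask, pred_mask).sum()
    union = np.logical_or(gt_mask, pred_mask).sum()
    return float(intersection) / float(union) if union > 0 else 0.0

def classify(gt_mask: np.ndarray, pred_mask: np.ndarray,
             threshold: float = 0.5) -> str:
    """True Positive if IoU >= threshold, False Positive otherwise.
    An unmatched ground-truth instance would count as a False Negative."""
    return "TP" if mask_iou(gt_mask, pred_mask) >= threshold else "FP"
```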
To evaluate every prediction from our task, we could fix the threshold at a single value, say 0.5, and run the process above. But then two predictions with IoUs of 0.6 and 0.9 would carry equal weight, so a single fixed threshold introduces a bias into the evaluation metric. One way to solve this problem is to set a range of IoU threshold values and compute an AP value at each of them. The COCO evaluation metric computes segmentation accuracy this way, taking all predictions into account. Its IoU threshold ranges from 0.5 to 0.95 with a step size of 0.05, written $\texttt{AP@[0.5:.05:.95]}$ (10 thresholds). An AP value is computed at each IoU threshold, and averaging these AP values gives the segmentation accuracy:
$ \textrm{AP}_{\textrm{COCO}}=\frac{\textrm{AP}_{0.50} + \textrm{AP}_{0.55} + \cdots + \textrm{AP}_{0.95}}{10} $
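A minimal sketch of this averaging step, assuming the ten per-threshold AP values have already been computed:

```python
import numpy as np

# The ten COCO IoU thresholds: 0.50, 0.55, ..., 0.95.
IOU_THRESHOLDS = np.arange(0.50, 1.00, 0.05)

def coco_average_ap(ap_values):
    """Average the per-threshold AP values (one per IoU threshold,
    in the same order as IOU_THRESHOLDS)."""
    assert len(ap_values) == len(IOU_THRESHOLDS)
    return float(np.mean(ap_values))
```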
To calculate the AP value at each IoU threshold, we need the precision-recall curve (PR curve), since AP is the area under that curve. The PR curve in turn requires the confusion matrix, which is computed with the IoU rule described above. For example, in Figure \ref{eval}, if the IoU threshold is 0.8, then TP = 1, FP = 1, and FN = 0 at that threshold. The precision at threshold 0.8 is therefore $\frac{1}{1+1} = 0.5$ and the recall is $\frac{1}{1+0} = 1$, giving the point $(\textrm{recall}, \textrm{precision}) = (1, 0.5)$ on the precision-recall curve.
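The step from TP/FP counts to a PR curve and then to AP can be sketched as follows (assuming predictions are already sorted by descending confidence; this uses the common interpolated-area form, while COCO itself samples the curve at 101 fixed recall points):

```python
import numpy as np

def pr_curve(tp_flags, num_gt):
    """Cumulative precision and recall at a fixed IoU threshold.
    tp_flags[i] is True if the i-th highest-confidence prediction
    matched a ground-truth instance (TP), False if it did not (FP)."""
    tp_flags = np.asarray(tp_flags, dtype=bool)
    tp = np.cumsum(tp_flags)
    fp = np.cumsum(~tp_flags)
    precision = tp / (tp + fp)
    recall = tp / num_gt
    return precision, recall

def average_precision(precision, recall):
    """Area under the PR curve, with precision first made
    monotonically non-increasing (the usual interpolation step)."""
    p = np.concatenate(([0.0], precision, [0.0]))
    r = np.concatenate(([0.0], recall, [1.0]))
    for i in range(len(p) - 2, -1, -1):
        p[i] = max(p[i], p[i + 1])
    # Sum rectangle areas wherever recall increases.
    idx = np.where(r[1:] != r[:-1])[0]
    return float(np.sum((r[idx + 1] - r[idx]) * p[idx + 1]))
```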
To sum up: at a fixed IoU threshold, each prediction contributes a point to the precision-recall curve; the area under that curve is the AP value at that threshold; and the average of these AP values over all ten thresholds is the segmentation accuracy.
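In practice, this whole pipeline is implemented by the official pycocotools package; a minimal usage sketch looks like this (the annotation and result file paths are placeholders):

```python
from pycocotools.coco import COCO
from pycocotools.cocoeval import COCOeval

# Placeholder paths: COCO-format ground truth and predicted segmentations.
coco_gt = COCO("annotations/instances_val2017.json")
coco_dt = coco_gt.loadRes("segmentation_results.json")

# iouType="segm" evaluates mask IoU; "bbox" would evaluate box IoU.
evaluator = COCOeval(coco_gt, coco_dt, iouType="segm")
evaluator.evaluate()
evaluator.accumulate()
evaluator.summarize()  # the first line printed is AP@[0.5:0.95]
```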