Generalized Intersection over Union: A Metric and A Loss for Bounding Box Regression
About
Intersection over Union (IoU) is the most popular evaluation metric used in object detection benchmarks. However, there is a gap between optimizing the commonly used distance losses for regressing the parameters of a bounding box and maximizing this metric value. The optimal objective for a metric is the metric itself. In the case of axis-aligned 2D bounding boxes, it can be shown that $IoU$ can be directly used as a regression loss. However, $IoU$ has a plateau, making it infeasible to optimize in the case of non-overlapping bounding boxes. In this paper, we address the weaknesses of $IoU$ by introducing a generalized version as both a new loss and a new metric. By incorporating this generalized $IoU$ ($GIoU$) as a loss into state-of-the-art object detection frameworks, we show a consistent improvement in their performance using both the standard, $IoU$-based, and new, $GIoU$-based, performance measures on popular object detection benchmarks such as PASCAL VOC and MS COCO.
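The abstract does not state the formula itself. As a minimal sketch, assuming the definition given in the paper, $GIoU = IoU - |C \setminus (A \cup B)| / |C|$ with $C$ the smallest axis-aligned box enclosing both $A$ and $B$, the plateau and its fix can be illustrated in plain Python. The function name `iou_and_giou` and the `(x1, y1, x2, y2)` corner format are illustrative choices here, not the authors' code.

```python
def iou_and_giou(box_a, box_b):
    """Return (IoU, GIoU) for two axis-aligned boxes.

    Assumption: each box is (x1, y1, x2, y2) with x1 < x2 and y1 < y2,
    so both areas are strictly positive.
    """
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)

    # Intersection: width and height are clamped to zero when boxes are disjoint.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h

    union = area_a + area_b - inter
    iou = inter / union

    # Smallest axis-aligned box C that encloses both inputs.
    enclose_w = max(ax2, bx2) - min(ax1, bx1)
    enclose_h = max(ay2, by2) - min(ay1, by1)
    area_c = enclose_w * enclose_h

    # Penalize the part of C not covered by the union.
    giou = iou - (area_c - union) / area_c
    return iou, giou


if __name__ == "__main__":
    target = (0, 0, 1, 1)
    for pred in [(2, 0, 3, 1), (5, 0, 6, 1)]:
        iou, giou = iou_and_giou(target, pred)
        # A loss of the form 1 - GIoU keeps changing as the boxes move apart.
        print(pred, "IoU:", iou, "GIoU:", round(giou, 4), "loss:", round(1 - giou, 4))
```

For the two disjoint predictions in the demo, IoU is 0 in both cases, while GIoU drops from about -0.333 to about -0.667 as the predicted box moves further from the target. That difference is what lets a loss such as $1 - GIoU$ still distinguish between non-overlapping predictions, where an IoU-based loss is flat.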
Related benchmarks
| Task | Dataset | Metric | Value | Rank |
|---|---|---|---|---|
| Object Detection | COCO 2017 (val) | AP | 39.1 | 2454 |
| Visual Question Localization and Answering | Overlapping classes (t=1) | Accuracy | 52.88 | 11 |
| Visual Question Localization and Answering | EndoVis18 (t=1) | Accuracy | 50.13 | 11 |
| Visual Question Localization and Answering | N/O classes Old (t=1) | Accuracy | 0.00 | 11 |
| Visual Question Localization and Answering | Old N/O (t=2) | Accuracy | 23 | 10 |
| Visual Question Localization and Answering | EndoVis17 (t=2) | Accuracy | 42.29 | 10 |
| Visual Question Localization and Answering | Average (t=2) | Accuracy | 39.78 | 10 |
| Visual Question Localization and Answering | Average (t=1) | Accuracy | 62.13 | 10 |
| Visual Question Localization and Answering | EndoVis17 (t=1) | Accuracy | 74.13 | 10 |
| Visual Question Localization and Answering | Overlapping (t=2) | Accuracy | 38.9 | 10 |