Higher Order Tracking Accuracy (HOTA) for Multi-Object Tracking

Multi-object tracking (MOT) is the task of detecting multiple objects in a video and linking those detections over time so that each object keeps a consistent identity (ID). MOT is one of the cornerstones of computer vision, but evaluating MOT algorithms has proven very difficult, because tracking is a complex task that requires accurate detection, localization, and association over time. HOTA (Higher Order Tracking Accuracy) evaluates all of these aspects of tracking and is often preferred over alternative MOT metrics such as MOTA and IDF1.

To assess how well a tracker performs, we compare its output with a set of ground-truth tracks. HOTA groups the possible errors between the predicted and ground-truth tracks into three categories: 1) localization, 2) detection, and 3) association. It calculates a score for each category using a Jaccard index, a measure of the similarity between two sets defined as the size of their intersection divided by the size of their union. It then combines these scores into the final HOTA score.
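As a small illustration of the Jaccard index that all three scores build on, here is a minimal Python sketch. The function name `jaccard_index` and the toy sets are made up for this example, and returning 1.0 for two empty sets is an assumption, not something HOTA prescribes.

```python
def jaccard_index(a: set, b: set) -> float:
    """Return |a ∩ b| / |a ∪ b|, the Jaccard index of two sets."""
    union = a | b
    if not union:
        # Assumption for this sketch: two empty sets count as identical.
        return 1.0
    return len(a & b) / len(union)


# Toy example: 3 shared elements out of 6 distinct elements overall.
predicted = {"p1", "p2", "p3", "p4"}
ground_truth = {"p2", "p3", "p4", "p5", "p6"}
score = jaccard_index(predicted, ground_truth)  # 3 / 6 = 0.5
```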

Localization measures the spatial alignment between a predicted detection and a ground-truth detection. It is calculated as the ratio of the overlap (intersection) between the two detections to the total area covered by both (union):

$$\mathrm{Localization\_Score} = \frac{|\mathrm{Intersection}|}{|\mathrm{Union}|}$$

HOTA measures the overall localization accuracy (LocA) by averaging the localization score over all pairs of matching predicted and ground-truth detections in the entire dataset. Writing TP for the set of matched pairs, we have:

$$\mathrm{LocA} = \frac{1}{|TP|} \sum_{c \in TP} \mathrm{Localization\_Score}(c)$$
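For axis-aligned bounding boxes, the localization score and LocA could be sketched in Python as follows. The box layout `(x_min, y_min, x_max, y_max)` and the names `box_iou` and `localization_accuracy` are assumptions made for this example, and the matched pairs are taken as given.

```python
from typing import List, Tuple

# Assumed box layout for this sketch: (x_min, y_min, x_max, y_max).
Box = Tuple[float, float, float, float]


def box_iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def localization_accuracy(matched_pairs: List[Tuple[Box, Box]]) -> float:
    """Average IoU over already-matched (prediction, ground truth) pairs."""
    if not matched_pairs:
        return 0.0  # assumption: no matches gives a score of 0
    return sum(box_iou(p, g) for p, g in matched_pairs) / len(matched_pairs)


pairs = [
    ((0, 0, 2, 2), (1, 0, 3, 2)),  # half-overlapping boxes, IoU = 2 / 6
    ((0, 0, 2, 2), (0, 0, 2, 2)),  # identical boxes, IoU = 1
]
loc_a = localization_accuracy(pairs)  # (1/3 + 1) / 2 = 2/3
```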

HOTA also allows the analysis of localization accuracy.

Detection measures the agreement between the set of all predicted detections and the set of all ground-truth detections. To compute it, we must first decide which predicted detections match which ground-truth detections. We define a localization threshold α and allow a predicted and a ground-truth detection to be matched only if their localization score is at least α. The matched pairs are called true positives (TP) and can be viewed as the intersection of the two detection sets. Unmatched predicted detections are false positives (FP), and unmatched ground-truth detections are false negatives (FN). The detection accuracy (DetA) is then the Jaccard index of the two sets:

$$\mathrm{DetA} = \frac{|TP|}{|TP| + |FN| + |FP|}$$
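The thresholded matching and the resulting detection score can be illustrated with a short Python sketch for a single frame. Note the simplification: the HOTA paper obtains a one-to-one matching with the Hungarian algorithm, whereas this sketch uses a greedy matcher to stay dependency-free. The names `greedy_match` and `detection_score` and the IoU values are invented for illustration.

```python
from typing import List, Tuple


def greedy_match(iou: List[List[float]], alpha: float) -> List[Tuple[int, int]]:
    """One-to-one matching of predictions (rows) to ground truth (columns).

    Simplification: pairs are taken greedily in order of decreasing IoU,
    and only pairs with IoU >= alpha are eligible.
    """
    candidates = sorted(
        (
            (iou[i][j], i, j)
            for i in range(len(iou))
            for j in range(len(iou[i]))
            if iou[i][j] >= alpha
        ),
        reverse=True,
    )
    used_pred, used_gt, matches = set(), set(), []
    for _, i, j in candidates:
        if i not in used_pred and j not in used_gt:
            used_pred.add(i)
            used_gt.add(j)
            matches.append((i, j))
    return matches


def detection_score(n_tp: int, n_fp: int, n_fn: int) -> float:
    """Jaccard-style detection score: TP / (TP + FN + FP)."""
    total = n_tp + n_fp + n_fn
    return n_tp / total if total else 0.0


# Hypothetical frame with 3 predictions and 2 ground-truth objects.
iou = [
    [0.8, 0.1],
    [0.6, 0.3],
    [0.0, 0.0],
]
matches = greedy_match(iou, alpha=0.5)  # only prediction 0 <-> ground truth 0
tp = len(matches)
fp = len(iou) - tp     # unmatched predictions
fn = len(iou[0]) - tp  # unmatched ground-truth detections
det = detection_score(tp, fp, fn)  # 1 / (1 + 1 + 2) = 0.25
```

Raising or lowering `alpha` changes which pairs are eligible, so the same tracker output can yield different TP, FP, and FN counts at different thresholds.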

Association measures how well a tracker links detections of the same identity (ID) over time, relative to the ground-truth tracks. For each matched pair of a predicted detection and a ground-truth detection (a TP), we measure the alignment between the entire predicted track that contains the predicted detection and the entire ground-truth track that contains the ground-truth detection. The intersection of the two tracks is the number of true positive matches between them; these are called true positive associations (TPA). All remaining detections in the predicted track are false positive associations (FPA), and all remaining detections in the ground-truth track are false negative associations (FNA). So, for a matched pair c, we have:

$$\mathrm{Association\_Score}(c) = \frac{|TPA(c)|}{|TPA(c)| + |FNA(c)| + |FPA(c)|}$$

This measures the alignment between the two tracks. The overall association accuracy (AssA) is obtained by averaging the Association_Score over all matched pairs:

$$\mathrm{AssA} = \frac{1}{|TP|} \sum_{c \in TP} \mathrm{Association\_Score}(c)$$
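The association counts can be sketched in Python as below, assuming the detection matching has already been done. The input layout (one `(pred_id, gt_id)` tuple per true positive, plus one ID per detection for each side) and the function name `association_accuracy` are assumptions made for this example.

```python
from collections import Counter
from typing import List, Tuple


def association_accuracy(
    tp_pairs: List[Tuple[str, str]],  # (pred_id, gt_id) for every true positive
    pred_ids: List[str],              # track ID of every predicted detection
    gt_ids: List[str],                # track ID of every ground-truth detection
) -> float:
    """Average of TPA / (TPA + FNA + FPA) over all true positives."""
    if not tp_pairs:
        return 0.0  # assumption: no matches gives a score of 0
    tpa_counts = Counter(tp_pairs)   # matches shared by each (pred, gt) track pair
    pred_counts = Counter(pred_ids)  # length of each predicted track
    gt_counts = Counter(gt_ids)      # length of each ground-truth track
    total = 0.0
    for pred_id, gt_id in tp_pairs:
        tpa = tpa_counts[(pred_id, gt_id)]
        fpa = pred_counts[pred_id] - tpa  # rest of the predicted track
        fna = gt_counts[gt_id] - tpa      # rest of the ground-truth track
        total += tpa / (tpa + fpa + fna)
    return total / len(tp_pairs)


# One ground-truth object seen in 4 frames; the tracker splits it into
# track "a" (frames 1-2) and track "b" (frames 3-4), all detections matched.
tp_pairs = [("a", "g1"), ("a", "g1"), ("b", "g1"), ("b", "g1")]
ass_a = association_accuracy(tp_pairs, ["a", "a", "b", "b"], ["g1"] * 4)
# Every TP has TPA = 2, FPA = 0, FNA = 2, so AssA = 2 / 4 = 0.5
```

In this toy case detection is perfect, yet the track split halves the association score, which is the kind of error this component is meant to expose.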

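To create a single metric, we combine these scores. For each localization threshold α, the score HOTA_α is the geometric mean of the detection score (DetA) and the association score (AssA) computed at that threshold:

$$\mathrm{HOTA}_\alpha = \sqrt{\mathrm{DetA}_\alpha \cdot \mathrm{AssA}_\alpha}$$

The final HOTA score is the average of HOTA_α over a range of thresholds (α from 0.05 to 0.95 in steps of 0.05). Because stricter thresholds accept only well-localized matches, this integration over α brings localization accuracy into the final score.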
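The combination step can be sketched in Python as follows. The per-threshold scores are placeholder values chosen for easy arithmetic, not results from any tracker, and the function names are hypothetical. In a real evaluation the detection and association scores would be recomputed at each threshold.

```python
import math
from typing import Dict, Tuple


def hota_at_alpha(det_a: float, ass_a: float) -> float:
    """Geometric mean of the detection and association scores."""
    return math.sqrt(det_a * ass_a)


def hota(scores_by_alpha: Dict[float, Tuple[float, float]]) -> float:
    """Average HOTA_alpha over all thresholds in the mapping."""
    if not scores_by_alpha:
        return 0.0
    per_alpha = [hota_at_alpha(d, a) for d, a in scores_by_alpha.values()]
    return sum(per_alpha) / len(per_alpha)


# 19 thresholds: 0.05, 0.10, ..., 0.95.
alphas = [round(0.05 * k, 2) for k in range(1, 20)]

# Placeholder (DetA, AssA) pairs, identical at every threshold for simplicity.
scores = {alpha: (0.64, 0.36) for alpha in alphas}
final_hota = hota(scores)  # sqrt(0.64 * 0.36) = 0.48 at every threshold
```

The geometric mean means a tracker cannot score well by excelling at only one of detection or association: if either score is near zero, so is HOTA_α.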

HOTA enables meaningful analysis and comparison of trackers along four error dimensions: missed detections, extra detections, track splits, and track merges. Combined, these yield an overall score for ranking trackers.
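In the HOTA paper these four dimensions correspond to detection recall (DetRe), detection precision (DetPr), association recall (AssRe), and association precision (AssPr). A minimal Python sketch of the underlying ratios is given below. The function names and counts are illustrative, and the association ratios are shown for a single true positive rather than averaged over all of them.

```python
def recall(n_true_pos: int, n_false_neg: int) -> float:
    """TP / (TP + FN): drops as more ground truth is missed."""
    total = n_true_pos + n_false_neg
    return n_true_pos / total if total else 0.0


def precision(n_true_pos: int, n_false_pos: int) -> float:
    """TP / (TP + FP): drops as more spurious output is produced."""
    total = n_true_pos + n_false_pos
    return n_true_pos / total if total else 0.0


# Detection level (hypothetical counts): 8 TP, 2 FP, 4 FN.
det_re = recall(8, 4)     # missed detections lower this: 8 / 12
det_pr = precision(8, 2)  # extra detections lower this: 8 / 10

# Association level for one true positive: TPA = 2, FPA = 0, FNA = 2,
# i.e. the ground-truth track was split across two predicted tracks.
ass_re = recall(2, 2)     # track splits lower this: 2 / 4
ass_pr = precision(2, 0)  # track merges would lower this: 2 / 2
```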

Using HOTA you can see not only how good your tracker is, but also what it’s good at. This is important for understanding the basic behavior of trackers, both when deciding on a tracker for your application and when considering how you can improve your current tracker.

Reference:

Luiten, Jonathon, et al. “HOTA: A higher order metric for evaluating multi-object tracking.” International Journal of Computer Vision 129 (2021): 548-578.
