NMS (non-maximum suppression) is a very popular post-processing method for eliminating redundant object detection windows. I have take Felzenszwalb et al.'s nms.m and made it significantly faster by eliminating an inner loop. 6 years of grad school, 6 years of building large-scale vision systems in Matlab, and you really learn how to vectorize code. The code I call millions of times needs to be fast, and nms is one of those routines I call all the time.
The code is found below as a Github gist -- which was taken from my Exemplar-SVM object recognition library (from my ICCV2011 paper: Ensemble of Exemplar-SVMs for Object Detection and Beyond). The same file, nms.m, can also be found a part of the Exemplar-SVM library on Github. In fact, this code produces the same result as Pedro's code, but is much faster. Here is one timing experiment I performed when performing nms on ~300K windows, where my version is roughly 100 times faster. When you deal with exemplar-SVMs you have to deal with lots of detectors (i.e., lots of detection windows), so fast NMS is money.
>> tic;top = nms_original(bbs,.5);toc
Elapsed time is 58.172313 seconds.
>> tic;top = nms_fast(bbs,.5);toc
Elapsed time is 0.532638 seconds.
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
function top = nms(boxes, overlap) | |
% top = nms_fast(boxes, overlap) | |
% Non-maximum suppression. (FAST VERSION) | |
% Greedily select high-scoring detections and skip detections | |
% that are significantly covered by a previously selected | |
% detection. | |
% NOTE: This is adapted from Pedro Felzenszwalb's version (nms.m), | |
% but an inner loop has been eliminated to significantly speed it | |
% up in the case of a large number of boxes | |
% Tomasz Malisiewicz (tomasz@cmu.edu) | |
if isempty(boxes) | |
top = []; | |
return; | |
end | |
x1 = boxes(:,1); | |
y1 = boxes(:,2); | |
x2 = boxes(:,3); | |
y2 = boxes(:,4); | |
s = boxes(:,end); | |
area = (x2-x1+1) .* (y2-y1+1); | |
[vals, I] = sort(s); | |
pick = s*0; | |
counter = 1; | |
while ~isempty(I) | |
last = length(I); | |
i = I(last); | |
pick(counter) = i; | |
counter = counter + 1; | |
xx1 = max(x1(i), x1(I(1:last-1))); | |
yy1 = max(y1(i), y1(I(1:last-1))); | |
xx2 = min(x2(i), x2(I(1:last-1))); | |
yy2 = min(y2(i), y2(I(1:last-1))); | |
w = max(0.0, xx2-xx1+1); | |
h = max(0.0, yy2-yy1+1); | |
o = w.*h ./ area(I(1:last-1)); | |
I([last; find(o>overlap)]) = []; | |
end | |
pick = pick(1:(counter-1)); | |
top = boxes(pick,:); |
I decided to post this, because I have some of my fellow CMU vision hacker buddies using this. They found it helpful, so why shouldn't you?
ReplyDeleteTry AccelerEyes Jacket for even more speed.
ReplyDeleteI'm pretty sure that both accelereyes jacket and any other GPU library will help out the sliding window part of detection the most. Thanks for the tip!
ReplyDeleteI have a CPP(MEX) version of the same algorithm, i will try to upload it as Well.
ReplyDeleteAwesome! If you make a github gist out of your cpp/mex version, hopefully somebody will run some timing tests.
ReplyDeleteLooking forward to seeing your version.
Hello,
ReplyDeleteThanks for your great matlab codes, I've made a C version here https://gist.github.com/3128130
I just tested some simple cases, ;)
Could you please note the format of input and output?
ReplyDeleteHi Sai,
ReplyDeleteThe input is a collection of bounding boxes represented as a large matrix. Each row is an N-d vector with the first four numbers the bbox coordinates and the last number the score. The middle numbers can be just about anything else you want.
The output is a new matrix of bbs, such that they don't overlap more than the overlap threshold.