Saturday, August 13, 2011

blazing fast nms.m (from exemplar-svm library)

If you care about building large-scale object recognition systems, you have to care about speed.  And every little bit of performance counts -- so why not first optimize the stuff which is slowing you down?

NMS (non-maximum suppression) is a very popular post-processing method for eliminating redundant object detection windows.  I have take Felzenszwalb et al.'s nms.m and made it significantly faster by eliminating an inner loop.  6 years of grad school, 6 years of building large-scale vision systems in Matlab, and you really learn how to vectorize code.  The code I call millions of times needs to be fast, and nms is one of those routines I call all the time.

The code is found below as a Github gist -- which was taken from my Exemplar-SVM object recognition library (from my ICCV2011 paper: Ensemble of Exemplar-SVMs for Object Detection and Beyond).  The same file, nms.m, can also be found a part of the Exemplar-SVM library on Github.  In fact, this code produces the same result as Pedro's code, but is much faster.  Here is one timing experiment I performed when performing nms on ~300K windows, where my version is roughly 100 times faster.  When you deal with exemplar-SVMs you have to deal with lots of detectors (i.e., lots of detection windows), so fast NMS is money.


>> tic;top = nms_original(bbs,.5);toc
Elapsed time is 58.172313 seconds.


>> tic;top = nms_fast(bbs,.5);toc
Elapsed time is 0.532638 seconds.





function top = nms(boxes, overlap)
% top = nms_fast(boxes, overlap)
% Non-maximum suppression. (FAST VERSION)
% Greedily select high-scoring detections and skip detections
% that are significantly covered by a previously selected
% detection.
% NOTE: This is adapted from Pedro Felzenszwalb's version (nms.m),
% but an inner loop has been eliminated to significantly speed it
% up in the case of a large number of boxes
% Tomasz Malisiewicz (tomasz@cmu.edu)
if isempty(boxes)
top = [];
return;
end
x1 = boxes(:,1);
y1 = boxes(:,2);
x2 = boxes(:,3);
y2 = boxes(:,4);
s = boxes(:,end);
area = (x2-x1+1) .* (y2-y1+1);
[vals, I] = sort(s);
pick = s*0;
counter = 1;
while ~isempty(I)
last = length(I);
i = I(last);
pick(counter) = i;
counter = counter + 1;
xx1 = max(x1(i), x1(I(1:last-1)));
yy1 = max(y1(i), y1(I(1:last-1)));
xx2 = min(x2(i), x2(I(1:last-1)));
yy2 = min(y2(i), y2(I(1:last-1)));
w = max(0.0, xx2-xx1+1);
h = max(0.0, yy2-yy1+1);
o = w.*h ./ area(I(1:last-1));
I([last; find(o>overlap)]) = [];
end
pick = pick(1:(counter-1));
top = boxes(pick,:);
view raw nms_fast.m hosted with ❤ by GitHub

8 comments:

  1. I decided to post this, because I have some of my fellow CMU vision hacker buddies using this. They found it helpful, so why shouldn't you?

    ReplyDelete
  2. Try AccelerEyes Jacket for even more speed.

    ReplyDelete
  3. I'm pretty sure that both accelereyes jacket and any other GPU library will help out the sliding window part of detection the most. Thanks for the tip!

    ReplyDelete
  4. I have a CPP(MEX) version of the same algorithm, i will try to upload it as Well.

    ReplyDelete
  5. Awesome! If you make a github gist out of your cpp/mex version, hopefully somebody will run some timing tests.

    Looking forward to seeing your version.

    ReplyDelete
  6. Hello,

    Thanks for your great matlab codes, I've made a C version here https://gist.github.com/3128130

    I just tested some simple cases, ;)

    ReplyDelete
  7. Could you please note the format of input and output?

    ReplyDelete
  8. Hi Sai,
    The input is a collection of bounding boxes represented as a large matrix. Each row is an N-d vector with the first four numbers the bbox coordinates and the last number the score. The middle numbers can be just about anything else you want.

    The output is a new matrix of bbs, such that they don't overlap more than the overlap threshold.

    ReplyDelete