Friday, January 27, 2012

drawing sexy graphs in matlab

Everybody who loves computer science loves graphs, especially the fat 'n juicy ones with complex structure that you just gotta visualize.  To help us enjoy these beautiful data structures, the hackers at AT&T gave the world Graphviz, a powerful tool for visualizing complex graphs in two dimensions.  I do a lot of stuff in Matlab, and a lot of stuff with graphs, so I've put my simple Graphviz Matlab wrappers up on Github so everybody can enjoy them.

My repository, which I'm already using as a submodule in many of my projects, can be found here:
https://github.com/quantombone/graphviz_matlab_magic

Here is a Matlab script (included as a Github gist), which should be run in an empty directory.  It will download a nice mat file, clone my repo, and show the following nice graph.  I perform two Graphviz passes: the first one is used to read the Graphviz coordinates (from the sfdp embedding), and I then use Matlab's jet colormap to color the edges based on distances in this space.  In other words, nearby nodes which are connected will be connected by red (hot) edges and faraway nodes will be connected by blue (cold) edges.
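For readers who want the gist of the coloring step without opening the repo, here is a minimal Python sketch (this is not the Matlab wrapper code).  It assumes the node coordinates have already been read back from a first layout pass.  The names `jet_like`, `edge_colors`, and `to_dot` are hypothetical, and `jet_like` is only a simplified blue-green-red stand-in for a jet-style colormap.  The normalized distance is inverted before the color lookup so that short edges come out red, matching the description above.

```python
import math

def jet_like(t):
    """Map t in [0, 1] to a hex color: blue (0) -> green (0.5) -> red (1).

    A simplified piecewise-linear stand-in for a jet-style colormap.
    """
    t = min(1.0, max(0.0, t))
    r = max(0.0, 2.0 * t - 1.0)
    g = 1.0 - abs(2.0 * t - 1.0)
    b = max(0.0, 1.0 - 2.0 * t)
    return "#{:02x}{:02x}{:02x}".format(round(r * 255), round(g * 255), round(b * 255))

def edge_colors(pos, edges):
    """Color each edge by its length in the embedding: short = hot, long = cold."""
    lengths = [math.dist(pos[u], pos[v]) for u, v in edges]
    lo, hi = min(lengths), max(lengths)
    span = (hi - lo) or 1.0  # avoid dividing by zero when all edges are equal
    # Invert so the shortest edge gets t = 1 (red) and the longest t = 0 (blue).
    return [jet_like(1.0 - (d - lo) / span) for d in lengths]

def to_dot(pos, edges, colors):
    """Emit DOT text with pinned node positions and per-edge colors."""
    lines = ["graph G {"]
    for name, (x, y) in pos.items():
        lines.append(f'  {name} [pos="{x},{y}!"];')
    for (u, v), c in zip(edges, colors):
        lines.append(f'  {u} -- {v} [color="{c}"];')
    lines.append("}")
    return "\n".join(lines)

# Toy coordinates standing in for what the first layout pass would return.
pos = {"a": (0, 0), "b": (1, 0), "c": (10, 0)}
edges = [("a", "b"), ("a", "c")]
colors = edge_colors(pos, edges)
dot_text = to_dot(pos, edges, colors)
print(dot_text)
```

The resulting DOT text would then be handed to a second Graphviz pass for rendering.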



The matrix visualized comes from an electromagnetic model; the details can be found here: http://www.cise.ufl.edu/research/sparse/matrices/Bai/qc324.html

The original picture generated by Yifan Hu is here for comparison:

Enjoy
--Tomasz

Tuesday, January 10, 2012

100,000+ page views on my computer vision blog

I like high-risk / high-reward activity.  While some say that this is my temperament (perhaps a vestige of youth?) I simply say: "that's how I roll."  Maybe I was too young when I read Kuhn's Structure of Scientific Revolutions, or maybe I was born with iconoclastic ideals, but I earnestly believe that life is too short to always do what you've been told.  One of my favorite maxims is the following: "The only limits we have are the ones we impose upon ourselves."

I took a gamble when I started this blog, blurring the line between all things related to computer vision, philosophy, artificial intelligence, machine learning, and other fun things which constitute my intellectual life.  During my PhD I was even discouraged from blogging, because "my superiors" incessantly reminded me that "you get famous by writing CVPR papers" and not by wasting time maintaining a "cute" blog.  Today I'd like to argue that my adventure in blogging has not been a failure at all!

I had multiple reasons for wanting to blog, several of which I list below:
  • I wanted to practice my writing, and what better way to practice writing than by writing!
  • I wanted an outlet to discuss certain ideas which I find invaluable in my pursuit of building intelligence, but which aren't necessarily publishable.  On my blog I am the sole contributor, the sole editor.  If you don't like what I have to say, start your own blog.  I don't need anonymous reviews; the CVPR submission process stresses me out enough for one lifetime.
  • I wanted a medium to advertise my own work as well as other works which I find important for graduate students in Computer Vision to know about.
  • I wanted to expose the field of Computer Vision to a broader audience and hopefully get others excited about this amazing research field.


Today I'm glad to announce that according to statcounter, my computer vision blog has reached over 100,000 views.  In an absolute sense, this really is nothing to be excited about.  But since my CMU homepage has approximately 30,000 views, this means that my blog is roughly 3x as popular as my academic homepage!  Next goal: 1,000,000 page views!

I actually meet more people that know me through my blog than through my research papers, even though I put in 100x the effort in doing the research behind those papers.  I don't plan on taking up blogging full time anytime soon, but it feels good to know that my blogging adventure has paid off.

Here are some of the top keywords which have been used to find my blog:

Here are some of my most popular blog posts of all time:

I encourage anybody who reads my blog to shoot me a quick "yo what's up!" at a local conference or wherever else our paths might cross.  I also encourage everybody to suggest the types of things they would like to read about on my blog.

Tuesday, December 13, 2011

learning to "borrow" examples for object detection. Lim et al, NIPS 2011

Let's say you want to train a cat detector...  If you're anything like me, then you probably have a few labeled cats (~100), as well as a source of non-cat images (~1000).  So what do you do when you can't get any more labeled cats?  (Maybe Amazon's Mechanical Turk service was shut down by the feds, you've got a paper deadline in 48 hours, and money can't get you out of this dilemma.)

Answer: 
1) Realize that there are some labeled dogs/cows/sheep in your dataset!
2) Transform some of the dogs/cows/sheep in your dataset to make them look more like cats. Maybe some dogs are already sufficiently similar to cats! (see cheezburger.com image below)
3) Use a subset of those transformed dogs/cows/sheep examples as additional positives in your cat detector!

Some dogs just look like cats! (and vice-versa)


Using my own internal language, I view this phenomenon as "exemplar theft."  But not the kind of theft which sends you to prison, 'tis the kind of theft which gives you best-paper prizes at your local conference.

Note that this was the answer provided by the vision hackers at MIT in their most recent paper, "Transfer Learning by Borrowing Examples for Multiclass Object Detection," which was just presented at this year's big machine-learning conference, NIPS 2011.  See the illustration from the paper below, which depicts this type of example borrowing (sharing) for some objects in the SUN09 dataset.


The paper empirically demonstrates that instead of doing transfer learning (also known as multi-task learning) the typical way (regularizing weight vectors towards each other), it is beneficial to simply borrow a subset of (transformed) examples from a related class.  Of course the problem is that we do not know a priori which categories to borrow from, nor which instances from those categories will give us a gain in object detection performance.  The goal of the algorithm is to learn which categories to borrow from, and which examples to borrow.  Not all dogs will help the cat detector.

Here are some examples of popular object categories, the categories from which examples are borrowed, and the categories from which examples are shared once we allow transformations to happen.  Notice the improvement in AP (the higher the average precision the better) when you allow sharing.
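If AP is new to you, here is a minimal Python sketch of how average precision can be computed from a ranked list of detections.  This is the plain non-interpolated variant, written for illustration; the PASCAL VOC 2007 benchmark uses an 11-point interpolated version, and this sketch assumes every true positive appears somewhere in the list.  The function name and toy scores are hypothetical.

```python
def average_precision(scores, labels):
    """Non-interpolated AP: mean of the precision at each true positive's rank."""
    # Rank detections from most to least confident.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    hits = 0
    precisions = []
    for rank, i in enumerate(order, start=1):
        if labels[i]:
            hits += 1
            precisions.append(hits / rank)  # precision at this recall level
    return sum(precisions) / len(precisions) if precisions else 0.0

# Four detections: confidence scores and whether each one is a correct hit.
scores = [0.9, 0.8, 0.7, 0.6]
labels = [True, False, True, False]
ap = average_precision(scores, labels)
print(round(ap, 3))
```

Here the two hits sit at ranks 1 and 3, so the AP is the mean of 1/1 and 2/3.  Pushing the second hit up to rank 2 would raise the AP to 1.0, which is the sense in which higher is better.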



They also looked at what happens if you want to improve a single-category badass detector on one particular dataset, such as the PASCAL VOC.  Note that these days just about everybody is using the one-and-only "badass detector" and trying to beat it at its own game.  Here are the different ways you'll hear people talk about the Latent-SVM-based Deformable Part Model baseline: "badass detector" = "state-of-the-art detector" = "Felzenszwalb et al. detector" = "Pedro's detector" = "Deva's detector" = "Pedro/Deva detector" = "LDPM detector" = "DPM detector".

Even if you only care about your favourite dataset, such as PASCAL VOC, you're probably willing to use additional positive data points from another dataset.  In their NIPS paper, the MIT hackers show that simply concatenating datasets is inferior to their clever example borrowing algorithm (mathematical details are found in the paper, but feel free to ask me detailed questions in the comments).  In the figure below, the top row shows cars from one dataset (SUN09), the middle row shows PASCAL VOC 2007 cars, and the bottom row shows which example the SUN09-car detector wants to borrow from PASCAL VOC.

Here is the cross-dataset generalization performance on the SUN09/PASCAL duo.  These results were inspired by the dataset bias work of Torralba and Efros.



In case you're interested, here is the full citation for this excellent NIPS2011 paper:

Joseph J. Lim, Ruslan Salakhutdinov, and Antonio Torralba. "Transfer Learning by Borrowing Examples for Multiclass Object Detection," in NIPS 2011. [pdf]




To get a better understanding of Lim et al.'s paper, it is worthwhile going back in time to CVPR2011 and taking a quick look at the following paper, also from MIT:

Ruslan Salakhutdinov, Antonio Torralba, Josh Tenenbaum. "Learning to Share Visual Appearance for Multiclass Object Detection," in CVPR 2011. [pdf]

Of course, these authors need no introduction (they are all professors at big-time institutions). Ruslan just recently became a Professor and is now back on home turf (where he got his PhD) in Toronto, where he is likely to become the next Hinton.  In my opinion, this "Learning to share" paper was one of the best papers of CVPR 2011.  In this paper they introduced the idea of sharing across rigid classifier templates, and more importantly learning a tree to organize hundreds of object categories.  The tree defines how the sharing is supposed to happen.  The root node is global and shared across all categories, the mid-level nodes can be interpreted as super-categories (e.g., animal, vehicle), and the leaves are the actual object categories (e.g., dog, chair, person, truck).

The coolest thing about the paper is that they use a CRP (Chinese restaurant process) to learn a tree without having to specify the number of super-categories!
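To see why a CRP frees you from fixing the number of groups in advance, here is a minimal Python sketch of the basic, flat Chinese restaurant process.  This is only the textbook sampling scheme, not the paper's tree-learning procedure; the function name, the concentration value `alpha`, and the seed are illustrative assumptions.  Think of each "customer" as an object category and each "table" as a super-category.

```python
import random

def crp_assignments(n_customers, alpha, seed=0):
    """Sample table assignments from a Chinese restaurant process.

    Each customer joins existing table k with probability proportional to its
    current size, or opens a new table with probability proportional to alpha.
    """
    rng = random.Random(seed)
    tables = []       # tables[k] = number of customers seated at table k
    assignments = []  # assignments[i] = table index chosen by customer i
    for _ in range(n_customers):
        weights = tables + [alpha]  # the last slot means "open a new table"
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(tables):
            tables.append(1)
        else:
            tables[k] += 1
        assignments.append(k)
    return assignments, tables

assignments, tables = crp_assignments(n_customers=20, alpha=1.0)
print(len(tables), "tables with sizes", tables)
```

The number of tables is an outcome of the sampling rather than an input: a larger `alpha` tends to produce more tables, a smaller one fewer.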

Finally, we can see some learned weights for three distinct object categories: truck, van, and bucket.  Please see the paper if you want to learn more about sharing -- the clarity of Ruslan's paper is exceptional.




In conclusion, it is pretty clear everybody wants some sort of visual memex. (It is easy to think of the visual memex as a graph where the nodes are individual instances and the edges are relationships between these entities.)  Sharing, borrowing, multi-task regularization, exemplar-svms, and a host of other approaches are hinting at the breakdown of the traditional category-based way of approaching the problem of object recognition.  However, our machine learning tools were designed for supervised machine learning with explicit class information.  So what we, the researchers, do is try to break down those classical tools so that we can more effectively exploit the blurry line between not-so-different object categories.  At the end of the day, rigid categories can only get us so far.  Intelligence requires interpretation at multiple and potentially disparate levels.  When it comes to intelligence, the world is not black and white; there are many flavours of meaningful image interpretation.

Tuesday, December 06, 2011

Graphics meets Big Data meets Machine Learning

We've all played Where's Waldo as children, and at least for me it was quite a fun game.  So today let's play an image-based Big Data version of Where's Waldo.  I will give you a picture, and you have to find it in a large collection of images!  This is a form of image retrieval, and this particular formulation is also commonly called "image matching."


The only catch is that you are only given one picture, and I am free to replace the picture with a painting or a sketch.  Any two-dimensional pattern is a valid query image, but the key thing to note is that there is only a single input image. Life would be awesome if Google's Picasa had this feature built in!


The classical way of solving this problem is via a brute-force nearest neighbor algorithm, one which doesn't match pixel patterns directly but instead compares images using a state-of-the-art image descriptor such as GIST.  Back in 2007, at SIGGRAPH, James Hays and Alexei Efros showed this to work quite well once you have a very large database of images!  But the reason why the database had to be so large is that a naive Nearest Neighbor algorithm is actually quite dumb.  The descriptor might be cleverer than matching raw pixel intensities, but for a machine, an image is nothing but a matrix of numbers, and nobody told the machine which patterns in the matrix are meaningful and which ones aren't.  In short, the brute-force algorithm works if there are similar enough images such that all parts of the input image will match a retrieved image.  But ideally we would like the algorithm to get better matches by automatically figuring out which parts of the query image are meaningful (e.g., the fountain in the painting) and which parts aren't (e.g., the reflections in the water).
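The brute-force baseline described above fits in a few lines.  Here is a minimal Python sketch, assuming each image has already been turned into a fixed-length descriptor vector; the three-number vectors below are toy placeholders, not real GIST descriptors, and the function name is hypothetical.

```python
import math

def nearest_neighbors(query, database, k=1):
    """Return the indices of the k database descriptors closest to the query."""
    ranked = sorted(range(len(database)),
                    key=lambda i: math.dist(query, database[i]))
    return ranked[:k]

# Toy descriptors; a real system would store one vector per database image.
database = [
    [0.0, 0.0, 1.0],
    [1.0, 0.0, 0.0],
    [0.9, 0.1, 0.0],
]
query = [1.0, 0.0, 0.05]
matches = nearest_neighbors(query, database, k=2)
print(matches)
```

Note that every dimension of the descriptor counts equally in the distance, which is exactly the weakness discussed above: nothing tells the algorithm that the fountain matters more than the reflections in the water.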

A modern approach to solve this issue is to collect a large set of related "positive images" and a large set of unrelated "negative images" and then train a powerful classifier which can hopefully figure out the meaningful bits of the image. But in this approach the problem is twofold.  First, when working with a single input image, it is not clear whether standard machine learning tools will have a chance of learning anything meaningful.  The second issue, a significantly worse problem, is that without a category label or tag, how are we supposed to create a negative set?!?  Exemplar-SVMs to the rescue!  We can use a large collection of images from the target domain (the domain we want to find matches from) as the negative set -- as long as the "negative set" contains only a small fraction of potentially related images, learning a linear SVM with a single positive still works.
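To make the "single positive, many negatives" idea concrete, here is a minimal Python sketch of a linear SVM trained on one exemplar.  It is a toy under stated assumptions, not our actual training code: it runs plain subgradient descent on the hinge loss over 2-D toy features, with no real image descriptors and no hard-negative mining, and the function names, feature values, and hyperparameters (`c_pos`, `c_neg`, `lam`, `lr`, `epochs`) are all made up for illustration.

```python
def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))

def train_exemplar_svm(positive, negatives, c_pos=10.0, c_neg=1.0,
                       lam=0.01, lr=0.01, epochs=200):
    """Fit w, b with one positive example and many negatives (hinge loss)."""
    w = [0.0] * len(positive)
    b = 0.0
    # The lone positive gets a larger weight so it is not drowned out.
    data = [(positive, 1.0, c_pos)] + [(x, -1.0, c_neg) for x in negatives]
    for _ in range(epochs):
        for x, y, c in data:
            margin = y * (dot(w, x) + b)
            w = [wi * (1.0 - lr * lam) for wi in w]  # L2 regularization shrink
            if margin < 1.0:  # example violates the margin: push it away
                w = [wi + lr * c * y * xi for wi, xi in zip(w, x)]
                b += lr * c * y
    return w, b

exemplar = [1.0, 1.0]  # descriptor of the single query image
negatives = [[-1.0, -1.0], [-1.0, 0.0], [0.0, -1.0], [-0.5, -1.0]]
w, b = train_exemplar_svm(exemplar, negatives)

# Rank unseen candidates by their score under the learned weights.
candidates = {"lookalike": [0.9, 0.8], "unrelated": [-0.8, -0.2]}
ranked = sorted(candidates, key=lambda n: dot(w, candidates[n]) + b, reverse=True)
print(ranked)
```

The learned weight vector `w` ends up emphasizing the dimensions in which the exemplar differs from the negatives, and retrieval is just scoring other images with `w` instead of using a plain distance.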




Here is an excerpt from a Techcrunch article which summarizes the project concisely:

"Instead of comparing a given image head to head with other images and trying to determine a degree of similarity, they turned the problem around. They compared the target image with a great number of random images and recorded the ways in which it differed the most from them. If another image differs in similar ways, chances are it’s similar to the first image. " -- Techcrunch


Abhinav Shrivastava, Tomasz Malisiewicz, Abhinav Gupta, Alexei A. Efros. Data-driven Visual Similarity for Cross-domain Image Matching. In SIGGRAPH ASIA, December 2011. Project Page



Here is a short listing of some articles which mention our research (thanks, Abhinav!).




Monday, December 05, 2011

An accidental face detector

Disclaimer #1: I don't specialize in faces.  When it comes to learning, I like my objectives to be convex.  When it comes to hacking on vision systems, I like to tackle entry-level object categories.

Fun fact #1: Faces are probably the easiest objects in the world for a machine to localize/detect/recognize.

Note #1: I supplied the images, my algorithm supplied the red boxes.

Note #2: Sorry to all my friends who failed to get detected by my accidental face detector! (see below)

So I was hackplaying with some of my PhD thesis code over Thanksgiving, and I accidentally made a face detector.  oops!  I immediately ran to my screenshot capture tool and ran my code on my Mac desktop while browsing Google Images and Facebook.  It seems to work pretty well on real faces as well as sketches/paintings of faces (see below)!  I even caught two Berkeleyites (an Alyosha and a Jianbo), but you gotta find them for yourself.  The detector is definitely tuned to frontal faces, but runs pretty fast and produces few false positives.  Not too shabby for some midnight hackerdom.










Yes, I'm doing dense multiscale sliding windows here.  Yes, I'm HoGGing the hell outta these images. Yes, I'm using a single frontal-face tuned template.  And yes, I only used faces of myself to train this accidental face detector.
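For the curious, here is a minimal Python sketch of dense multiscale sliding windows with a single template.  It is not the detector from this post: it scores raw pixel values with a dot product instead of HOG features, it shrinks the image by simple integer subsampling, and all names, sizes, and the threshold are toy assumptions.

```python
def downsample(image, factor):
    """Nearest-neighbor subsampling by an integer factor."""
    return [row[::factor] for row in image[::factor]]

def score_window(image, template, top, left):
    """Dot product between the template and the window at (top, left)."""
    return sum(template[i][j] * image[top + i][left + j]
               for i in range(len(template))
               for j in range(len(template[0])))

def detect(image, template, factors=(1, 2), threshold=0.0):
    """Slide the template over every position at every scale."""
    th, tw = len(template), len(template[0])
    detections = []
    for f in factors:
        scaled = downsample(image, f)
        for top in range(len(scaled) - th + 1):
            for left in range(len(scaled[0]) - tw + 1):
                s = score_window(scaled, template, top, left)
                if s > threshold:
                    # Report the box in original-image coordinates.
                    detections.append((s, top * f, left * f, th * f, tw * f))
    return sorted(detections, reverse=True)

# An 8x8 toy image with a bright 4x4 square, and a 2x2 all-ones template.
image = [[1 if 2 <= r <= 5 and 2 <= c <= 5 else 0 for c in range(8)]
         for r in range(8)]
template = [[1, 1], [1, 1]]
detections = detect(image, template, threshold=3.5)
print(len(detections), "windows above threshold")
```

Each detection is (score, top, left, height, width).  The small template fires at several positions inside the square at full resolution, and once more at half resolution where the whole square fits the template; a real detector would follow this with non-maximum suppression.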

Note: If I've used one of your pictures without permission, and you would like a link back to your home on the interwebs, please leave a comment indicating the image and link to original.



Friday, December 02, 2011

Google Scholar, My Citations, a new paradigm for finding great Computer Vision research papers

I have been finding great computer vision research papers by using Google Scholar for the past 2+ years.  My recipe is straightforward and has two key ingredients. First, by finding new papers that cite one of my published papers, I automatically get to read papers which will be relevant to my own research interests.  The best bit is that by using Google Scholar, I'm not limiting my search to a single conference -- Google finds papers from the raw web.

Second, I have a short list of superstar vision researchers (Jitendra Malik, among others) and I basically read anything and everything these gurus publish.  Regularly visiting academic homepages is the best way to do this, but Google Scholar also lets me search by name.  In addition, nobody lists on their homepage their papers' citation counts.  This means if I visit a researcher's personal website, I have to make a decision as to what paper to read based on (title, co-authors, publication venue).  But highly-cited papers are likely to be more important to read first.  I believe that this is a good rule of thumb, and very important if you are new to the field.

I am really glad that Google finally let researchers make public profiles to view their papers and see their citations, etc.  See Google Scholar blog for more information.  I've been using statcounter to monitor my blog's visitors, and now I can use Google Scholar to monitor who is citing my research papers!  I'm not claiming that the only way for me to read one of your papers is for you to cite one of my papers, but believe me, even if we never met at a vision conference, if you cited one of my papers there's a good chance I already know about your research :-)  I would love to see Google Scholar Citations pages one day replace "my publications" sections on academic homepages...



My Citations screenshot



My only complaint with Google Scholar is that I can't seem to get it to recognize my two most recent papers.  I have these papers listed on my homepage, so do my co-authors, but Google isn't picking them up!!!  I manually added them to my Google My Citations page, and using Google Scholar I was able to find at least one other paper which cites one of these two papers.

I read the inclusion guidelines, and I'm still baffled.  The PDFs are definitely over 5MB, but my older papers which were indexed by Google were also over 5MB.  Dear Google, are you seriously not indexing my recent papers because they are over 5MB?  It takes us, researchers, months of hard work to get our work out the door.  We see the sun rise for weeks straight when we are in deadline-mode, and the conferences/journals give us size limitations -- we work hard to make our stuff fit within these limits (something like 20MB per PDF).  And we, researchers, are crazy about Google and what it means for organizing the world's information -- naturally we are jumping on the Google Scholar bandwagon. I really hope there's some silly reason why I can't find my own papers using Google Scholar, but if I can't find my own work, that means others can't find my own work, and until I can be confident that Google Scholar is bug-free, I cannot give it my full recommendation.

Problematic papers for Google Scholar:

Abhinav Shrivastava, Tomasz Malisiewicz, Abhinav Gupta, Alexei A. Efros. Data-driven Visual Similarity for Cross-domain Image Matching. In SIGGRAPH ASIA, December 2011.

Tomasz Malisiewicz, Abhinav Gupta, Alexei A. Efros. Ensemble of Exemplar-SVMs for Object Detection and Beyond. In ICCV, November 2011.

If anybody has any suggestions (there's a chance I'm doing something wrong), or an explanation as to why my papers haven't been indexed, I would love to hear from you.

Wednesday, November 16, 2011

don't throw away old code: github-it!

My thesis experiments on Exemplar-SVMs (my PhD thesis link; note: 33MB) would have taken approximately 20 CPU years to finish.  But not on a fat CMU cluster!  Here is some simple code which helped make things possible in ~1 month of crunching on 200+ cores.  That scale of computation is not quite Google-scale computing, but it was an unforgettable experience as a CMU PhD student.  I've recently had to go back to the SSH / GNU Screen method of starting scripts at MIT, since we do not have torque/pbs there, but I definitely use these scripts.  Fork it, use it, change it, hack it, improve it, break it, learn from it, etc.
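As a quick sanity check on those numbers, here is the back-of-the-envelope arithmetic in Python.  It assumes perfect parallel scaling with no queueing or idle time, which a real cluster never quite delivers.

```python
cpu_years = 20   # approximate total compute for the thesis experiments
cores = 200      # cores crunching in parallel

# Ideal wall-clock time if the work splits evenly across all cores.
wall_clock_days = cpu_years * 365 / cores
print(wall_clock_days, "days")
```

That works out to a bit over five weeks, in line with the roughly one month of crunching mentioned above.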

https://github.com/quantombone/warp_scripts

I used these scripts to drive the experiments in my Exemplar-SVM framework (also on Github).


The basic take-home message is "do not throw away old code" which you found useful at some time.  C'mon ex-PhD students, I know you wrote a lot of code, you graduated and now you feel embarrassed to share your code.  Who cares if you never had a chance to clean it up?  If the world never gets to see it then it will die a silent death from lack of use.  Just put it on Github, and let others take a look.  Git is the world's best source control/versioning system. Its distributed nature makes it perfect for large-scale collaboration.  Now, with Github, sharing is super easy! Sharing is caring.  Let's make the world a better place for hackerdom, one repository at a time.  I've met some great hackers at MIT, such as the great cvondrick, who is still teaching me how to branch like a champ.

Mathematicians share proofs.  Hackers share code.  Embrace technology, embrace Github.  If you ever want to hack with me, it is probably as important for you to know the basics of git as it is for you to be a master of linear algebra.

Additional Reading:
Distributed Version Control: The Future of History, an article about Git by some Kitware software engineers