<?xml version='1.0' encoding='UTF-8'?><?xml-stylesheet href="http://www.blogger.com/styles/atom.css" type="text/css"?><feed xmlns='http://www.w3.org/2005/Atom' xmlns:openSearch='http://a9.com/-/spec/opensearchrss/1.0/' xmlns:georss='http://www.georss.org/georss' xmlns:gd='http://schemas.google.com/g/2005' xmlns:thr='http://purl.org/syndication/thread/1.0'><id>tag:blogger.com,1999:blog-15418143</id><updated>2012-01-27T19:37:45.656-05:00</updated><category term='flash'/><category term='marathon'/><category term='distance function learning'/><category term='object recognition'/><category term='video annotation'/><category term='rob fergus'/><category term='books'/><category term='meaning'/><category term='loopy belief propagation'/><category term='localization'/><category term='iccv'/><category term='non-maximum suppression'/><category term='identification'/><category term='supernatural'/><category term='torque'/><category term='object interpretation'/><category term='spin image'/><category term='abhinav gupta'/><category term='google internship'/><category term='CMU'/><category term='algorithms'/><category term='paradigm shift'/><category term='iccv 2011'/><category term='association'/><category term='exemplar-svm'/><category term='classification'/><category term='perception'/><category term='picasa'/><category term='object detection'/><category term='quine'/><category term='truth'/><category term='academia'/><category term='idealism'/><category term='image retrieval'/><category term='non-parametric'/><category term='soup of segments'/><category term='warp'/><category term='git'/><category term='appearance'/><category term='dennett'/><category term='video'/><category term='transduction'/><category term='visual memex'/><category term='prototypes'/><category term='gibson'/><category term='c++'/><category term='kant'/><category term='rant'/><category term='vocabulary'/><category term='faculty'/><category term='visualization'/><category term='barrow'/><category 
term='context challenge'/><category term='workshop'/><category term='scene understanding'/><category term='talk'/><category term='transfer learning'/><category term='graphics'/><category term='hierarchy'/><category term='definition'/><category term='ruslan salakhutdinov'/><category term='brain'/><category term='data-driven'/><category term='joseph lim'/><category term='philosophy'/><category term='networking'/><category term='newton&apos;s method'/><category term='epistemology'/><category term='text'/><category term='weinberger'/><category term='philosophy of science'/><category term='xiaofeng ren'/><category term='image segmentation'/><category term='mac'/><category term='optimization'/><category term='time travel'/><category term='puzzles'/><category term='associations'/><category term='intentional stance'/><category term='vatic'/><category term='summary'/><category term='photobios'/><category term='california'/><category term='blogging'/><category term='summer internship'/><category term='jay yagnik'/><category term='talks'/><category term='segmentation'/><category term='google'/><category term='3d model'/><category term='graphical models'/><category term='vondrick'/><category term='ontologies'/><category term='ramanan'/><category term='reverse-engineering'/><category term='computer graphics'/><category term='felzenszwalb'/><category term='Microsoft'/><category term='ted adelson'/><category term='svetlana lazebnik'/><category term='cluster'/><category term='fgvc'/><category term='sketches'/><category term='jedi'/><category term='wittgenstein'/><category term='roberts'/><category term='github'/><category term='hacking'/><category term='everything is misc'/><category term='bay area'/><category term='colorado'/><category term='ut austin'/><category term='unified field theory'/><category term='complex numbers'/><category term='rgb-d dataset'/><category term='exemplarsvm'/><category term='grammar'/><category term='minds'/><category term='CRF'/><category 
term='sri'/><category term='copernicus'/><category term='josh tenenbaum'/><category term='gist'/><category term='werewolves'/><category term='scene recognition'/><category term='stanford'/><category term='physics'/><category term='yosemite'/><category term='code'/><category term='popular posts'/><category term='torralba'/><category term='beyond categories'/><category term='multi-task'/><category term='artificial intelligence'/><category term='granada'/><category term='teaching'/><category term='edelman'/><category term='sharing knowledge'/><category term='knowledge'/><category term='tricks'/><category term='post-processing'/><category term='inverse optics'/><category term='citations'/><category term='gnu screen'/><category term='pbs'/><category term='coding n00bs'/><category term='sift'/><category term='realism'/><category term='robotics'/><category term='tenenbaum'/><category term='berkely'/><category term='rgbd'/><category term='startup'/><category term='alyosha efros'/><category term='kitware'/><category term='mac os x'/><category term='ssh'/><category term='indexing'/><category term='william james'/><category term='nn'/><category term='machine perception'/><category term='paintings'/><category term='hackers'/><category term='geometry transfer'/><category term='nms'/><category term='libraries'/><category term='publishing'/><category term='inference'/><category term='filters'/><category term='waterfalls'/><category term='deep learning'/><category term='aude oliva'/><category term='theory of mind'/><category term='mathematics'/><category term='computer vision blog'/><category term='att'/><category term='machine learning'/><category term='image interpretation'/><category term='poselets'/><category term='symposium'/><category term='theories'/><category term='laser'/><category term='datasets'/><category term='sven'/><category term='face memex'/><category term='suns 2009'/><category term='courses'/><category term='multiclass sharing'/><category term='web'/><category 
term='comedy'/><category term='IIT-at-MIT'/><category term='sfm'/><category term='exemplar svms'/><category term='sfdp'/><category term='predictions'/><category term='face detection'/><category term='millions of images'/><category term='kinect'/><category term='siggraph asia'/><category term='big data'/><category term='dalal triggs'/><category term='marvin minsky'/><category term='psychology'/><category term='hiking'/><category term='intelligence'/><category term='image parsing'/><category term='software engineering'/><category term='attributes'/><category term='sports'/><category term='graph cuts'/><category term='guitar'/><category term='seeing'/><category term='descriptors'/><category term='review'/><category term='parts'/><category term='david marr'/><category term='suns 2011'/><category term='antonio torralba'/><category term='paradigm'/><category term='Takeo Kanade'/><category term='gists'/><category term='interns'/><category term='advice'/><category term='poggio'/><category term='density estimation'/><category term='fractals'/><category term='seeing as'/><category term='language'/><category term='pooling ramanan'/><category term='cognitive science'/><category term='ge research'/><category term='multiple segmentations'/><category term='plenoptic function'/><category term='mean face'/><category term='affordances'/><category term='nips 2009'/><category term='joint regulariztion'/><category term='pragmatism'/><category term='geometry'/><category term='convolutions'/><category term='MATLAB'/><category term='1970s'/><category term='concepts'/><category term='Yaroslav Bulatov'/><category term='uwashington'/><category term='fun'/><category term='moshe bar'/><category term='memex'/><category term='internet-scale'/><category term='crowdsourcing'/><category term='categorization'/><category term='aristotle'/><category term='pedro'/><category term='tombone'/><category term='uw'/><category term='discriminative'/><category term='large dataset'/><category term='shimon 
ullman'/><category term='alex berg'/><category term='cvpr'/><category term='novel objects'/><category term='cvpr 2010'/><category term='statcounter'/><category term='blocks world'/><category term='graphs'/><category term='image understanding'/><category term='youtube'/><category term='renaissance'/><category term='meta-data transfer'/><category term='conference'/><category term='phish'/><category term='graphviz'/><category term='rosch'/><category term='internship'/><category term='abhinav shrivastava'/><category term='nips 2011'/><category term='barcelona'/><category term='indoor recognition'/><category term='analogies'/><category term='academics'/><category term='phd'/><category term='graduate student life'/><category term='wordle'/><category term='python'/><category term='induction'/><category term='girshick'/><category term='peer review'/><category term='3d recognition'/><category term='frontal faces'/><category term='internet'/><category term='thesis proposal'/><category term='grouping'/><category term='kernels'/><category term='torralba art'/><category term='peter tu'/><category term='future directions'/><category term='matching'/><category term='lesson'/><category term='svm'/><category term='imitation'/><category term='papers'/><category term='linux'/><category term='image matching'/><category term='sharing'/><category term='computer science'/><category term='primal'/><category term='postdoc'/><category term='vision'/><category term='deva ramanan'/><category term='face tracking'/><category term='google scholar'/><category term='research'/><category term='pittpatt'/><category term='nips'/><category term='2.5d'/><category term='students'/><category term='programming'/><category term='tutorial'/><category term='vampires'/><category term='superpixel'/><category term='exemplars'/><category term='active learning'/><category term='berkeley'/><category term='context'/><category term='interpretation'/><category term='blog'/><category term='MATLAB code'/><category 
term='etymology'/><category term='MIT'/><category term='street view'/><category term='black friday'/><category term='cvpr 2011'/><category term='running'/><category term='computer vision'/><category term='parametric'/><category term='segmentation-driven recognition'/><category term='abstraction'/><category term='typicality effects'/><category term='history'/><category term='article'/><category term='professors'/><category term='facetime'/><category term='parameter estimation'/><category term='james hays'/><category term='wittgentein'/><category term='knol'/><category term='progress'/><category term='kristen grauman'/><title type='text'>tombone's blog</title><subtitle type='html'>The philosophy of computational object recognition, scene understanding, machine learning, and musings on the future of computer vision.</subtitle><link rel='http://schemas.google.com/g/2005#feed' type='application/atom+xml' href='http://quantombone.blogspot.com/feeds/posts/default'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/15418143/posts/default?max-results=100'/><link rel='alternate' type='text/html' href='http://quantombone.blogspot.com/'/><link rel='hub' href='http://pubsubhubbub.appspot.com/'/><link rel='next' type='application/atom+xml' href='http://www.blogger.com/feeds/15418143/posts/default?start-index=101&amp;max-results=100'/><author><name>Tomasz Malisiewicz</name><uri>https://profiles.google.com/107912691630546731185</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='//lh6.googleusercontent.com/-ely56X4eTGQ/AAAAAAAAAAI/AAAAAAAAJ_g/VLmSVgSof5s/s512-c/photo.jpg'/></author><generator version='7.00' 
uri='http://www.blogger.com'>Blogger</generator><openSearch:totalResults>240</openSearch:totalResults><openSearch:startIndex>1</openSearch:startIndex><openSearch:itemsPerPage>100</openSearch:itemsPerPage><entry><id>tag:blogger.com,1999:blog-15418143.post-771583058768352637</id><published>2012-01-27T16:06:00.002-05:00</published><updated>2012-01-27T16:06:27.044-05:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='MATLAB'/><category scheme='http://www.blogger.com/atom/ns#' term='att'/><category scheme='http://www.blogger.com/atom/ns#' term='visualization'/><category scheme='http://www.blogger.com/atom/ns#' term='graphviz'/><category scheme='http://www.blogger.com/atom/ns#' term='sharing'/><category scheme='http://www.blogger.com/atom/ns#' term='github'/><category scheme='http://www.blogger.com/atom/ns#' term='code'/><category scheme='http://www.blogger.com/atom/ns#' term='gist'/><category scheme='http://www.blogger.com/atom/ns#' term='fractals'/><title type='text'>drawing sexy graphs in matlab</title><content type='html'>Everybody who loves computer sciences loves graphs. &amp;nbsp;But the fat 'n juicy graphs, the ones with complex structure you just gotta visualize. &amp;nbsp;To enjoy these beautiful data structures, the hackers at AT&amp;amp;T gave us, the world, Graphviz as a powerful tool for visualizing complex graphs in two dimensions. &amp;nbsp;I do a lot of stuff in Matlab, so I've put my simple graphviz matlab wrappers up on Github so everybody can enjoy them. 
I do a lot of stuff with graphs...&lt;br /&gt;&lt;br /&gt;My repository, which I'm already using as a submodule in many of my projects, can be found here:&lt;br /&gt;&lt;a href="https://github.com/quantombone/graphviz_matlab_magic"&gt;https://github.com/quantombone/graphviz_matlab_magic&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;Here is a matlab script (included as a Github gist) which should be run in an empty directory; it will download a nice mat file, clone my repo, and show the following nice graph. &amp;nbsp;I perform two graphviz passes: the first one is used to read the graphviz coordinates (from the sfdp embedding), and I then use Matlab's jet colormap to color the edges based on distances in this space. &amp;nbsp;In other words, nearby nodes which are connected will be connected by red (hot) edges and faraway nodes will be connected by blue (cold) edges.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://3.bp.blogspot.com/-x6fzsRZgcAk/TyMP1wB9EPI/AAAAAAAAKUE/9gBjbJa4LoI/s1600/circles.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="75" src="http://3.bp.blogspot.com/-x6fzsRZgcAk/TyMP1wB9EPI/AAAAAAAAKUE/9gBjbJa4LoI/s320/circles.png" width="320" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;br /&gt;The matrix visualized comes from an electromagnetic model; the details can be found here:&amp;nbsp;&lt;a href="http://www.cise.ufl.edu/research/sparse/matrices/Bai/qc324.html"&gt;http://www.cise.ufl.edu/research/sparse/matrices/Bai/qc324.html&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;The original picture generated by &lt;a href="http://www2.research.att.com/~yifanhu/GALLERY/GRAPHS/GIF_SMALL/Bai@qc324.html"&gt;Yifan Hu&lt;/a&gt; is here for comparison:&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://www.research.att.com/~yifanhu/GALLERY/GRAPHS/GIF_SMALL/Bai@qc324.gif" imageanchor="1" style="margin-left: 1em; margin-right: 
1em;"&gt;&lt;img border="0" height="68" src="http://www.research.att.com/~yifanhu/GALLERY/GRAPHS/GIF_SMALL/Bai@qc324.gif" width="320" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;br /&gt;Enjoy&lt;br /&gt;--Tomasz&lt;br /&gt;&lt;script src="https://gist.github.com/1690842.js"&gt;&lt;br /&gt;&lt;/script&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/15418143-771583058768352637?l=quantombone.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://quantombone.blogspot.com/feeds/771583058768352637/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://quantombone.blogspot.com/2012/01/drawing-sexy-graphs-in-matlab.html#comment-form' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/15418143/posts/default/771583058768352637'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/15418143/posts/default/771583058768352637'/><link rel='alternate' type='text/html' href='http://quantombone.blogspot.com/2012/01/drawing-sexy-graphs-in-matlab.html' title='drawing sexy graphs in matlab'/><author><name>Tomasz Malisiewicz</name><uri>https://profiles.google.com/107912691630546731185</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='//lh6.googleusercontent.com/-ely56X4eTGQ/AAAAAAAAAAI/AAAAAAAAJ_g/VLmSVgSof5s/s512-c/photo.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://3.bp.blogspot.com/-x6fzsRZgcAk/TyMP1wB9EPI/AAAAAAAAKUE/9gBjbJa4LoI/s72-c/circles.png' height='72' width='72'/><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-15418143.post-8876228578345539116</id><published>2012-01-10T23:46:00.000-05:00</published><updated>2012-01-10T23:46:03.705-05:00</updated><category scheme='http://www.blogger.com/atom/ns#' 
term='statcounter'/><category scheme='http://www.blogger.com/atom/ns#' term='tombone'/><category scheme='http://www.blogger.com/atom/ns#' term='popular posts'/><category scheme='http://www.blogger.com/atom/ns#' term='research'/><category scheme='http://www.blogger.com/atom/ns#' term='computer vision blog'/><category scheme='http://www.blogger.com/atom/ns#' term='blogging'/><category scheme='http://www.blogger.com/atom/ns#' term='paradigm shift'/><title type='text'>100,000+ page views on my computer vision blog</title><content type='html'>I like high-risk / high-reward activity. &amp;nbsp;While some say that this is my temperament (perhaps a vestige of youth?) I simply say: "that's how I roll." &amp;nbsp;Maybe I was too young when I read Kuhn's &lt;a href="http://en.wikipedia.org/wiki/The_Structure_of_Scientific_Revolutions"&gt;Structure of Scientific Revolutions&lt;/a&gt;, or maybe I was born with iconoclastic ideals, but I earnestly believe that life is too short to always do what you've been told. &amp;nbsp;One of my favorite maxims is the following: "The only limits we have are the ones we impose upon ourselves."&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;I took a gamble when I started this blog, blurring the line between all things related to computer vision, philosophy, artificial intelligence, machine learning, and other fun things which constitute my intellectual life. &amp;nbsp;&lt;b&gt;During my PhD I was even discouraged from blogging&lt;/b&gt;, because "my superiors" incessantly reminded me that "you get famous by writing CVPR papers" and not by wasting time maintaining a "cute" blog. 
&amp;nbsp;Today I'd like to argue that my adventure in blogging has not been a failure at all!&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;I had multiple reasons for wanting to blog, several of which I list below:&lt;/div&gt;&lt;div&gt;&lt;ul&gt;&lt;li&gt;I wanted to practice my writing, and what better way to practice writing than by writing!&lt;/li&gt;&lt;li&gt;I wanted an outlet to discuss certain ideas which I find invaluable in my pursuit of building intelligence, but which aren't necessarily publishable. &amp;nbsp;On my blog I am the sole contributor, the sole editor. &amp;nbsp;If you don't like what I have to say, start your own blog. &amp;nbsp;I don't need anonymous reviews; the CVPR submission process stresses me out enough for one lifetime.&lt;/li&gt;&lt;li&gt;I wanted a medium to advertise my own work as well as other works which I find important for graduate students in Computer Vision to know about.&lt;/li&gt;&lt;li&gt;I wanted to expose the field of Computer Vision to a broader audience and hopefully get others excited about this amazing research field.&lt;/li&gt;&lt;/ul&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://3.bp.blogspot.com/-ib_KpgXX6gU/Tw0PUKBDPQI/AAAAAAAAKFY/6VCKrKZVNd0/s1600/100kviews.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="37" src="http://3.bp.blogspot.com/-ib_KpgXX6gU/Tw0PUKBDPQI/AAAAAAAAKFY/6VCKrKZVNd0/s400/100kviews.png" width="400" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Today I'm glad to announce that according to &lt;a href="http://statcounter.com/"&gt;statcounter&lt;/a&gt;, my &lt;a href="http://quantombone.blogspot.com/"&gt;computer vision blog&lt;/a&gt; has reached over 100,000 views. &amp;nbsp;In an absolute sense, this really is nothing to be excited about. 
&amp;nbsp;But since my CMU homepage has approximately 30,000 views, this means that my blog is 3x as popular as my academic homepage! &amp;nbsp;&lt;b&gt;Next goal: 1,000,000 page views!&lt;/b&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;I actually meet more people that know me through my blog than through my research papers, even though I put in 100x the effort in doing the research behind those papers. &amp;nbsp;I don't plan on taking up blogging full time anytime soon, but it feels good to know that my blogging adventure has paid off.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;b&gt;Here are some of the top keywords which have been used to find my blog:&lt;/b&gt;&lt;/div&gt;&lt;div&gt;&lt;a href="https://www.google.com/search?q=computer+vision+blog"&gt;computer vision blog&lt;/a&gt;&lt;/div&gt;&lt;div&gt;&lt;a href="https://www.google.com/search?q=tombone"&gt;tombone&lt;/a&gt;&lt;/div&gt;&lt;div&gt;&lt;a href="https://www.google.com/search?q=cmu+computer+vision+blog"&gt;cmu computer vision blog&lt;/a&gt;&lt;/div&gt;&lt;div&gt;&lt;a href="https://www.google.com/search?q=newton+fractal+matlab"&gt;newton fractal matlab&lt;/a&gt;&lt;/div&gt;&lt;div&gt;&lt;a href="https://www.google.com/search?q=tombone's+blog"&gt;tombone's blog&lt;/a&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;b&gt;Here are some of my most popular blog posts of all time:&lt;/b&gt;&lt;/div&gt;&lt;div&gt;&lt;a href="http://quantombone.blogspot.com/2011/03/computer-vision-is-artificial.html"&gt;Computer Vision is Artificial Intelligence&lt;/a&gt;&lt;/div&gt;&lt;div&gt;&lt;a href="http://quantombone.blogspot.com/2011/08/vision-hacker-culture-at-google.html"&gt;The vision hacker culture at Google&lt;/a&gt;&lt;/div&gt;&lt;div&gt;&lt;a href="http://quantombone.blogspot.com/2010/05/graph-visualizations-as-sexy-as.html"&gt;graph visualizations as sexy as fractals&lt;/a&gt;&lt;/div&gt;&lt;div&gt;&lt;a 
href="http://quantombone.blogspot.com/2009/07/simple-newtons-method-fractal-code-in.html"&gt;Simple Newton's Method Fractal code in MATLAB&lt;/a&gt;&lt;/div&gt;&lt;div&gt;&lt;a href="http://quantombone.blogspot.com/2011/10/kinect-object-datasets-berkeleys-b3do.html"&gt;Kinect Object Datasets: Berkeley's B3DO, UW's RGB-D, and NYU's Depth Dataset&lt;/a&gt;&lt;/div&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;I encourage anybody who reads my blog to shoot me a quick "yo what's up!" at a local conference or wherever else our paths might cross. &amp;nbsp;I also encourage everybody to suggest the types of things they would like to read about on my blog.&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/15418143-8876228578345539116?l=quantombone.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://quantombone.blogspot.com/feeds/8876228578345539116/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://quantombone.blogspot.com/2012/01/100000-page-views-on-my-computer-vision.html#comment-form' title='6 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/15418143/posts/default/8876228578345539116'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/15418143/posts/default/8876228578345539116'/><link rel='alternate' type='text/html' href='http://quantombone.blogspot.com/2012/01/100000-page-views-on-my-computer-vision.html' title='100,000+ page views on my computer vision blog'/><author><name>Tomasz Malisiewicz</name><uri>https://profiles.google.com/107912691630546731185</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='//lh6.googleusercontent.com/-ely56X4eTGQ/AAAAAAAAAAI/AAAAAAAAJ_g/VLmSVgSof5s/s512-c/photo.jpg'/></author><media:thumbnail 
xmlns:media='http://search.yahoo.com/mrss/' url='http://3.bp.blogspot.com/-ib_KpgXX6gU/Tw0PUKBDPQI/AAAAAAAAKFY/6VCKrKZVNd0/s72-c/100kviews.png' height='72' width='72'/><thr:total>6</thr:total></entry><entry><id>tag:blogger.com,1999:blog-15418143.post-3775278291392261459</id><published>2011-12-13T21:07:00.000-05:00</published><updated>2011-12-16T17:41:05.930-05:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='MIT'/><category scheme='http://www.blogger.com/atom/ns#' term='transfer learning'/><category scheme='http://www.blogger.com/atom/ns#' term='joseph lim'/><category scheme='http://www.blogger.com/atom/ns#' term='multi-task'/><category scheme='http://www.blogger.com/atom/ns#' term='antonio torralba'/><category scheme='http://www.blogger.com/atom/ns#' term='granada'/><category scheme='http://www.blogger.com/atom/ns#' term='josh tenenbaum'/><category scheme='http://www.blogger.com/atom/ns#' term='multiclass sharing'/><category scheme='http://www.blogger.com/atom/ns#' term='papers'/><category scheme='http://www.blogger.com/atom/ns#' term='object detection'/><category scheme='http://www.blogger.com/atom/ns#' term='ruslan salakhutdinov'/><category scheme='http://www.blogger.com/atom/ns#' term='cvpr 2011'/><category scheme='http://www.blogger.com/atom/ns#' term='sharing'/><category scheme='http://www.blogger.com/atom/ns#' term='nips 2011'/><title type='text'>learning to "borrow" examples for object detection. Lim et al, NIPS 2011</title><content type='html'>Let's say you want to train a cat detector... &amp;nbsp;If you're anything like me, then you probably have a few labeled cats (~100), as well as a source of non-cat images (~1000). &amp;nbsp;So what do you do when you can't get any more labeled cats? 
&amp;nbsp;(Maybe &lt;a href="https://www.mturk.com/mturk/welcome"&gt;Amazon's Mechanical Turk service&lt;/a&gt; was shut down by the feds, you've got a paper deadline in 48 hours, and money can't get you out of this dilemma.)&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Answer:&amp;nbsp;&lt;/b&gt;&lt;br /&gt;1) Realize that there are some labeled dogs/cows/sheep in your dataset!&lt;br /&gt;2) Transform some of the dogs/cows/sheep in your dataset to make them look more like cats. Maybe some dogs are already sufficiently similar to cats! (see cheezburger.com image below)&lt;br /&gt;3) Use a subset of those transformed dogs/cows/sheep examples as additional positives in your cat detector! &lt;br /&gt;&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;span class="Apple-style-span" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;a href="http://draft.blogger.com/goog_1856928949"&gt;&lt;img border="0" height="216" src="http://images.icanhascheezburger.com/completestore/2009/3/20/128820522179594379.jpg" width="320" /&gt;&lt;/a&gt;&lt;/span&gt;&lt;/div&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;Some dogs just look like cats! (and vice-versa)&lt;/div&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://cheezburger.com/sanfo/lolz/View/1904802560"&gt;Image courtesy of cheezburger.com&lt;/a&gt;&lt;/div&gt;&lt;br /&gt;&lt;br /&gt;Using my own internal language, I view this phenomenon as "&lt;b&gt;exemplar theft&lt;/b&gt;." 
&amp;nbsp;But not the kind of theft which sends you to prison, 'tis the kind of theft which gives you best-paper prizes at your local conference.&lt;br /&gt;&lt;br /&gt;Note that this was the answer provided by the vision hackers at MIT in their most recent paper, "&lt;a href="http://people.csail.mit.edu/lim/paper/lst_nips11.pdf"&gt;Transfer Learning by Borrowing Examples for Multiclass Object Detection&lt;/a&gt;," which was just presented at this year's big machine learning-oriented NIPS conference, &lt;a href="http://nips.cc/Conferences/2011/"&gt;NIPS 2011&lt;/a&gt;. See the illustration from the paper below, which depicts this type of "example borrowing"-sharing for some objects in the SUN09 dataset.&lt;br /&gt;&lt;br /&gt;&lt;div style="text-align: center;"&gt;&lt;img border="0" height="287" src="http://4.bp.blogspot.com/-w3XcTJNlMEc/TugAkX39HwI/AAAAAAAAKEs/li3SvkMrJsw/s400/transform-lim.png" width="400" /&gt;&lt;/div&gt;&lt;br /&gt;The paper empirically demonstrates that instead of doing transfer learning (also known as &lt;a href="http://en.wikipedia.org/wiki/Multi-task_learning"&gt;multi-task learning&lt;/a&gt;) the typical way (regularizing weight vectors towards each other), it is beneficial to simply borrow a subset of (transformed) examples from a related class. &amp;nbsp;Of course the problem is that we do not know a priori which categories to borrow from, nor which instances from those categories will give us a gain in object detection performance. &amp;nbsp;&lt;b&gt;The goal of the algorithm is to learn which categories to borrow from, and which examples to borrow.&lt;/b&gt; &amp;nbsp;Not all dogs will help the cat detector.&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;br /&gt;&lt;/div&gt;Here are some examples of popular object categories, the categories from which examples are borrowed, and the categories from which examples are shared once we allow transformations to happen. 
&amp;nbsp;Notice the improvement in AP (the higher the average precision the better) when you allow sharing.&lt;br /&gt;&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://1.bp.blogspot.com/-TyVO3eq6t1Y/TugAjlaKrMI/AAAAAAAAKEk/ISgSZtPShaw/s1600/table1-lim.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="91" src="http://1.bp.blogspot.com/-TyVO3eq6t1Y/TugAjlaKrMI/AAAAAAAAKEk/ISgSZtPShaw/s320/table1-lim.png" width="320" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;br /&gt;&lt;br /&gt;They also looked at what happens if you want to improve a single-category badass detector on one particular dataset, such as the &lt;a href="http://pascallin.ecs.soton.ac.uk/challenges/VOC/"&gt;PASCAL VOC&lt;/a&gt;. &amp;nbsp;Note that these days just about everybody is using the one-and-only "badass detector" and trying to beat it at its own game. &amp;nbsp; These are the different ways you'll hear people talk about the &lt;a href="http://www.cs.brown.edu/~pff/latent/"&gt;Latent-SVM-based Deformable Part Model baseline&lt;/a&gt;: "badass detector"="state-of-the-art detector"="Felzenszwalb et al. detector"="Pedro's detector"="Deva's detector"="Pedro/Deva detector"="LDPM detector"="DPM detector"&lt;br /&gt;&lt;br /&gt;Even if you only care about your favourite dataset, such as PASCAL VOC, you're probably willing to use additional positive data points from another dataset. &amp;nbsp;In their NIPS paper, the MIT hackers show that simply concatenating datasets is inferior to their clever example borrowing algorithm (mathematical details are found in the paper, but feel free to ask me detailed questions in the comments). 
&amp;nbsp;In the figure below, the top row shows cars from one dataset (SUN09), the middle row shows PASCAL VOC 2007 cars, and the bottom row shows which example the SUN09-car detector wants to borrow from PASCAL VOC.&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://1.bp.blogspot.com/-gde9_nFy8zw/TugAiKCVfnI/AAAAAAAAKEM/-eDPdGow7IU/s1600/figure6-lim.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="125" src="http://1.bp.blogspot.com/-gde9_nFy8zw/TugAiKCVfnI/AAAAAAAAKEM/-eDPdGow7IU/s320/figure6-lim.png" width="320" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;br /&gt;Here is the cross-dataset generalization performance on the SUN09/PASCAL duo. &amp;nbsp;These results were inspired by the &lt;a href="http://people.csail.mit.edu/torralba/research/bias/"&gt;dataset bias work of Torralba and Efros&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://2.bp.blogspot.com/-MTRGa9pDBP0/TugAjSVAG3I/AAAAAAAAKEc/X0Uard0B1Xs/s1600/table4-lim.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="135" src="http://2.bp.blogspot.com/-MTRGa9pDBP0/TugAjSVAG3I/AAAAAAAAKEc/X0Uard0B1Xs/s320/table4-lim.png" width="320" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;br /&gt;&lt;br /&gt;In case you're interested, here is the full citation for this excellent NIPS2011 paper:&lt;br /&gt;&lt;br /&gt;&lt;a href="http://people.csail.mit.edu/lim/"&gt;Joseph J. Lim&lt;/a&gt;, &lt;a href="http://www.utstat.toronto.edu/~rsalakhu/"&gt;Ruslan Salakhutdinov&lt;/a&gt;, and &lt;a href="http://web.mit.edu/torralba/www/"&gt;Antonio Torralba&lt;/a&gt;. "&lt;b&gt;&lt;a href="http://people.csail.mit.edu/lim/paper/lst_nips11.pdf"&gt;Transfer Learning by Borrowing Examples for Multiclass Object Detection&lt;/a&gt;&lt;/b&gt;," in NIPS 2011. 
[&lt;a href="http://people.csail.mit.edu/lim/paper/lst_nips11.pdf"&gt;pdf&lt;/a&gt;]&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;hr /&gt;&lt;br /&gt;To get a better understanding of Lim et al.'s paper, it is worthwhile going back in time to CVPR 2011 and taking a quick look at the following paper, also from MIT:&lt;br /&gt;&lt;br /&gt;&lt;a href="http://www.utstat.toronto.edu/~rsalakhu/"&gt;Ruslan Salakhutdinov&lt;/a&gt;, &lt;a href="http://web.mit.edu/torralba/www/"&gt;Antonio Torralba&lt;/a&gt;, &lt;a href="http://web.mit.edu/cocosci/josh.html"&gt;Josh Tenenbaum&lt;/a&gt;.&amp;nbsp;"&lt;a href="http://people.csail.mit.edu/torralba/publications/sharingCVPR2011.pdf"&gt;Learning to Share Visual Appearance for Multiclass Object Detection&lt;/a&gt;," in CVPR 2011.&amp;nbsp;[&lt;a href="http://people.csail.mit.edu/torralba/publications/sharingCVPR2011.pdf"&gt;pdf&lt;/a&gt;]&lt;br /&gt;&lt;br /&gt;Of course, these authors need no introduction&amp;nbsp;(they are all professors at big-time institutions). Ruslan just recently became a professor and is now back on home turf (where he got his PhD) in Toronto, where he is likely to become the next Hinton. &amp;nbsp;In my opinion, this "Learning to share" paper was one of the best papers of CVPR 2011. &amp;nbsp;In this paper they introduced the idea of sharing across rigid classifier templates and, more importantly, of learning a tree to organize hundreds of object categories. &amp;nbsp;The tree defines how the sharing is supposed to happen. 
&amp;nbsp;The root node is global and shared across all categories, the mid-level nodes can be interpreted as super-categories (e.g., animal, vehicle), and the leaves are the actual object categories (e.g., dog, chair, person, truck).&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://3.bp.blogspot.com/-Ku223tAVZXo/TugAg6t2PTI/AAAAAAAAKD8/-1y0XaUBewI/s1600/figure2-ruslan.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="203" src="http://3.bp.blogspot.com/-Ku223tAVZXo/TugAg6t2PTI/AAAAAAAAKD8/-1y0XaUBewI/s320/figure2-ruslan.png" width="320" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;br /&gt;The coolest thing about the paper is that they use a &lt;a href="http://en.wikipedia.org/wiki/Chinese_restaurant_process"&gt;CRP (Chinese&amp;nbsp;restaurant&amp;nbsp;process)&lt;/a&gt;&amp;nbsp;to learn a tree without having to specify the number of super-categories!&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://4.bp.blogspot.com/-dJvUJeq6OSw/TugAicwWRiI/AAAAAAAAKEU/D0PCyEE_Y34/s1600/figure3-ruslan.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="193" src="http://4.bp.blogspot.com/-dJvUJeq6OSw/TugAicwWRiI/AAAAAAAAKEU/D0PCyEE_Y34/s320/figure3-ruslan.png" width="320" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class="separator" style="clear: both; text-align: left;"&gt;Finally, we can see some learned weights for three distinct object categories: truck, van, and bucket. 
&amp;nbsp;Please see the paper if you want to learn more about sharing -- the clarity of Ruslan's paper is exceptional.&lt;/div&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://1.bp.blogspot.com/-KVCPQeNxaI0/TugAhcMfg2I/AAAAAAAAKEE/YD989eF0Qvk/s1600/figure4-ruslan.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="260" src="http://1.bp.blogspot.com/-KVCPQeNxaI0/TugAhcMfg2I/AAAAAAAAKEE/YD989eF0Qvk/s320/figure4-ruslan.png" width="320" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;br /&gt;&lt;hr /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;br /&gt;&lt;/div&gt;In conclusion, it is pretty clear everybody wants some sort of visual memex. (It is easy to think of the visual memex as a graph where the nodes are individual instances and the edges are relationships between these entities.) &amp;nbsp;Sharing, borrowing, multi-task regularization, Exemplar-SVMs, and a host of other approaches are hinting at the breakdown of the traditional category-based way of approaching the problem of object recognition. &amp;nbsp;However, our machine learning tools were designed for supervised machine learning with explicit class information. &amp;nbsp;So what we researchers do is try to break down those classical tools so that we can more effectively exploit the blurry line between not-so-different object categories. &amp;nbsp;At the end of the day, rigid categories can only get us so far. &amp;nbsp;Intelligence requires interpretation at multiple and potentially disparate levels. 
&amp;nbsp;When it comes to intelligence, the world is not black and white; there are many flavours of meaningful image interpretation.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/15418143-3775278291392261459?l=quantombone.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://quantombone.blogspot.com/feeds/3775278291392261459/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://quantombone.blogspot.com/2011/12/learning-to-borrow-examples-for-object.html#comment-form' title='4 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/15418143/posts/default/3775278291392261459'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/15418143/posts/default/3775278291392261459'/><link rel='alternate' type='text/html' href='http://quantombone.blogspot.com/2011/12/learning-to-borrow-examples-for-object.html' title='learning to &quot;borrow&quot; examples for object detection. 
Lim et al, NIPS 2011'/><author><name>Tomasz Malisiewicz</name><uri>https://profiles.google.com/107912691630546731185</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='//lh6.googleusercontent.com/-ely56X4eTGQ/AAAAAAAAAAI/AAAAAAAAJ_g/VLmSVgSof5s/s512-c/photo.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://4.bp.blogspot.com/-w3XcTJNlMEc/TugAkX39HwI/AAAAAAAAKEs/li3SvkMrJsw/s72-c/transform-lim.png' height='72' width='72'/><thr:total>4</thr:total></entry><entry><id>tag:blogger.com,1999:blog-15418143.post-5214763284029006409</id><published>2011-12-06T17:20:00.001-05:00</published><updated>2011-12-10T04:39:01.831-05:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='image matching'/><category scheme='http://www.blogger.com/atom/ns#' term='james hays'/><category scheme='http://www.blogger.com/atom/ns#' term='image retrieval'/><category scheme='http://www.blogger.com/atom/ns#' term='abhinav gupta'/><category scheme='http://www.blogger.com/atom/ns#' term='alyosha efros'/><category scheme='http://www.blogger.com/atom/ns#' term='svm'/><category scheme='http://www.blogger.com/atom/ns#' term='big data'/><category scheme='http://www.blogger.com/atom/ns#' term='machine learning'/><category scheme='http://www.blogger.com/atom/ns#' term='memex'/><category scheme='http://www.blogger.com/atom/ns#' term='siggraph asia'/><category scheme='http://www.blogger.com/atom/ns#' term='picasa'/><category scheme='http://www.blogger.com/atom/ns#' term='exemplar svms'/><category scheme='http://www.blogger.com/atom/ns#' term='abhinav shrivastava'/><category scheme='http://www.blogger.com/atom/ns#' term='graphics'/><category scheme='http://www.blogger.com/atom/ns#' term='nn'/><title type='text'>Graphics meets Big Data meets Machine Learning</title><content type='html'>We've all played &lt;a href="http://en.wikipedia.org/wiki/Where's_Wally%3F"&gt;Where's Waldo&lt;/a&gt; 
as children, and at least for me it was quite a fun game. &amp;nbsp;So today let's play an image-based&amp;nbsp;&lt;a href="http://www.mckinsey.com/Insights/MGI/Research/Technology_and_Innovation/Big_data_The_next_frontier_for_innovation"&gt;Big Data&lt;/a&gt;&amp;nbsp;version of Where's Waldo. &amp;nbsp;I will give you a picture, and you have to find it in a large collection of images! &amp;nbsp;This is a form of &lt;a href="http://en.wikipedia.org/wiki/Image_retrieval"&gt;image retrieval&lt;/a&gt;, and this particular formulation is also commonly called "image matching."&lt;br /&gt;&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://3.bp.blogspot.com/-6vM72s6OtIs/Tt6WP707WBI/AAAAAAAAKDw/0u1EnvpXuZA/s1600/painting.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="320" src="http://3.bp.blogspot.com/-6vM72s6OtIs/Tt6WP707WBI/AAAAAAAAKDw/0u1EnvpXuZA/s320/painting.png" width="206" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;br /&gt;The only catch is that you are only given &lt;b&gt;one picture&lt;/b&gt;, and I am free to replace the picture with a painting or a sketch. &amp;nbsp;Any two-dimensional pattern is a valid query image, but the key thing to note is that there is only a &lt;b&gt;single input&lt;/b&gt; image. &lt;a href="http://googlephotos.blogspot.com/2011/12/picasa-39-now-with-google-sharing-and.html"&gt;Life would be awesome if Google's Picasa had this feature built in!&lt;/a&gt;&lt;br /&gt;&lt;b&gt;&lt;br /&gt;&lt;/b&gt;&lt;br /&gt;The classical way of solving this problem is a brute-force nearest neighbor search: rather than matching pixel patterns directly, it compares images using a state-of-the-art image descriptor such as GIST. 
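&lt;br /&gt;&lt;br /&gt;As a rough illustration of that brute-force baseline, here is a minimal Python sketch. &amp;nbsp;The three-dimensional toy vectors stand in for real GIST descriptors, and the function names &lt;code&gt;l2_distance&lt;/code&gt; and &lt;code&gt;nearest_neighbors&lt;/code&gt; are made up for this example; nothing here is taken from an actual retrieval system.&lt;br /&gt;&lt;pre&gt;
```python
import math


def l2_distance(a, b):
    # Euclidean distance between two equal-length descriptor vectors.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest_neighbors(query, database, k=1):
    # Brute force: compare the query against every database descriptor,
    # then return the indices of the k closest ones.
    order = sorted(range(len(database)), key=lambda i: l2_distance(query, database[i]))
    return order[:k]


# Toy 3-D "descriptors" (assumed values); a real system would use GIST vectors.
database = [[0.0, 0.0, 1.0], [1.0, 0.0, 0.0], [0.8, 0.3, 0.0]]
print(nearest_neighbors([1.0, 0.1, 0.0], database, k=2))  # prints [1, 2]
```
&lt;/pre&gt;Every query touches every database entry, and every descriptor dimension counts equally toward the distance.&lt;br /&gt;&lt;br /&gt;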
&amp;nbsp;Back in 2007, at SIGGRAPH, &lt;a href="http://www.cs.brown.edu/~hays/"&gt;James Hays&lt;/a&gt; and &lt;a href="http://www.cs.cmu.edu/~efros/"&gt;Alexei Efros&lt;/a&gt; showed this to work quite well once you have a very large database of images! &amp;nbsp;But the reason the database had to be so large is that a naive nearest neighbor algorithm is actually quite dumb. &amp;nbsp;The descriptor might be cleverer than matching raw pixel intensities, but for a machine, an image is nothing but a matrix of numbers, and nobody told the machine which patterns in the matrix are meaningful and which ones aren't. &amp;nbsp;In short, the brute-force algorithm works if the database contains images similar enough that all parts of the input image match a retrieved image. &amp;nbsp;But ideally we would like the algorithm to get better matches by automatically figuring out which parts of the query image are meaningful&amp;nbsp;(e.g., the fountain in the painting)&amp;nbsp;and which parts aren't (e.g., the reflections in the water).&lt;br /&gt;&lt;br /&gt;A modern approach to solving this issue is to collect a large set of related "positive images" and a large set of unrelated "negative images" and then train a powerful classifier which can hopefully figure out the meaningful bits of the image. But this approach has two problems. &amp;nbsp;First, with only a single input image, it is not clear whether standard machine learning tools will have a chance of learning anything meaningful. &amp;nbsp;Second, and significantly worse: without a category label or tag, how are we supposed to create a negative set?!? &amp;nbsp;&lt;a href="http://www.cs.cmu.edu/~tmalisie/projects/iccv11/"&gt;Exemplar-SVMs&lt;/a&gt; to the rescue! 
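&lt;br /&gt;&lt;br /&gt;The trick, as a toy Python sketch: train a linear SVM with the query as the only positive and a pile of other images as negatives. &amp;nbsp;This is not the code from the paper. &amp;nbsp;The 2-D descriptors, the plain full-batch subgradient solver, the function name &lt;code&gt;train_exemplar_svm&lt;/code&gt;, and every parameter value are assumptions chosen to keep the example small.&lt;br /&gt;&lt;pre&gt;
```python
def dot(w, x):
    # Inner product of two equal-length vectors.
    return sum(wi * xi for wi, xi in zip(w, x))


def train_exemplar_svm(positive, negatives, steps=200, lr=0.1, reg=0.01, pos_weight=10.0):
    # Linear SVM with a single positive (the query descriptor) and many
    # negatives. pos_weight makes an error on the lone positive cost more
    # than an error on any one negative. All defaults are toy values.
    data = [(positive, 1.0, pos_weight)] + [(x, -1.0, 1.0) for x in negatives]
    w = [0.0] * len(positive)
    b = 0.0
    for _ in range(steps):
        # Keep the examples whose hinge loss max(0, 1 - y * score) is nonzero.
        active = [(x, y, c) for x, y, c in data
                  if max(0.0, 1.0 - y * (dot(w, x) + b)) != 0.0]
        # Full-batch subgradient step on
        # reg / 2 * |w|^2 + sum of weighted hinge losses.
        pull = [sum(c * y * x[j] for x, y, c in active) for j in range(len(w))]
        w = [wj - lr * (reg * wj - pj) for wj, pj in zip(w, pull)]
        b += lr * sum(c * y for x, y, c in active)
    return w, b


# Toy 2-D descriptors (assumed values): one query, five unrelated negatives.
query = [2.0, 2.0]
negatives = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]]
w, b = train_exemplar_svm(query, negatives)

# Rank unseen candidates by the learned score rather than by raw distance.
candidates = {"near-duplicate": [1.8, 2.1], "distractor": [0.0, 0.5]}
ranked = sorted(candidates, key=lambda name: dot(w, candidates[name]) + b, reverse=True)
print(ranked)
```
&lt;/pre&gt;The learned weights favor the directions in which the query differs from the negatives, so candidates are ranked by what makes the query distinctive rather than by overall distance.&lt;br /&gt;&lt;br /&gt;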
&amp;nbsp;We can use a large collection of images from the target domain (the domain we want to find matches from) as the negative set -- as long as the "negative set" contains only a small fraction of potentially related images, learning a linear SVM with a single positive still works.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;iframe allowfullscreen="" frameborder="0" height="360" src="http://www.youtube.com/embed/PY__Fo4o67I?feature=player_embedded" width="640"&gt;&lt;/iframe&gt;&lt;br /&gt;&lt;br /&gt;Here is an excerpt from a Techcrunch article which summarizes the project concisely:&lt;br /&gt;&lt;br /&gt;"Instead of comparing a given image head to head with other images and trying to determine a degree of similarity, they turned the problem around. They compared the target image with a great number of random images and recorded the ways in which it differed the most from them. If another image differs in similar ways, chances are it’s similar to the first image. " -- &lt;a href="http://techcrunch.com/2011/12/06/cmu-researchers-one-up-google-image-search-and-photosynth-with-visual-similarity-engine/"&gt;Techcrunch&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;a href="http://www.abhinav-shrivastava.info/"&gt;Abhinav Shrivastava&lt;/a&gt;,&amp;nbsp;&lt;a href="http://www.cs.cmu.edu/~tmalisie/"&gt;Tomasz Malisiewicz&lt;/a&gt;,&amp;nbsp;&lt;a href="http://www.cs.cmu.edu/~abhinavg/"&gt;Abhinav Gupta&lt;/a&gt;,&amp;nbsp;&lt;a href="http://www.cs.cmu.edu/~efros/"&gt;Alexei A. 
Efros&lt;/a&gt;.&amp;nbsp;&lt;a href="http://graphics.cs.cmu.edu/projects/crossDomainMatching/"&gt;Data-driven Visual Similarity for Cross-domain Image Matching.&lt;/a&gt;&amp;nbsp;In SIGGRAPH ASIA, December 2011.&amp;nbsp;&lt;a href="http://graphics.cs.cmu.edu/projects/crossDomainMatching/"&gt;Project Page&lt;/a&gt;&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://graphics.cs.cmu.edu/projects/crossDomainMatching/" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="90" src="http://graphics.cs.cmu.edu/projects/crossDomainMatching/images/teaser.png" width="400" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;br /&gt;&lt;hr /&gt;&lt;br /&gt;Here is a short listing of some articles which mention our research (thank &lt;a href="http://www.abhinav-shrivastava.info/"&gt;Abhinav&lt;/a&gt;!).&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;div class="p1"&gt;&lt;span class="s1"&gt;&lt;a href="http://techcrunch.com/2011/12/06/cmu-researchers-one-up-google-image-search-and-photosynth-with-visual-similarity-engine/"&gt;http://techcrunch.com/2011/12/06/cmu-researchers-one-up-google-image-search-and-photosynth-with-visual-similarity-engine/&lt;/a&gt;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class="p2"&gt;&lt;span class="s2"&gt;&lt;/span&gt;&lt;/div&gt;&lt;div class="p1"&gt;&lt;span class="s1"&gt;&lt;a href="http://news.cs.cmu.edu/article.php?a=2858"&gt;http://news.cs.cmu.edu/article.php?a=2858&lt;/a&gt;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class="p2"&gt;&lt;span class="s2"&gt;&lt;/span&gt;&lt;/div&gt;&lt;div class="p1"&gt;&lt;span class="s1"&gt;&lt;a href="http://futureoftech.msnbc.msn.com/_news/2011/12/06/9252228-computer-mimics-human-ability-to-match-images"&gt;http://futureoftech.msnbc.msn.com/_news/2011/12/06/9252228-computer-mimics-human-ability-to-match-images&lt;/a&gt;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class="p2"&gt;&lt;span 
class="s2"&gt;&lt;/span&gt;&lt;/div&gt;&lt;div class="p1"&gt;&lt;span class="s1"&gt;&lt;a href="http://www.physorg.com/news/2011-12-team-computerized-method-images-photos.html"&gt;http://www.physorg.com/news/2011-12-team-computerized-method-images-photos.html&lt;/a&gt;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class="p2"&gt;&lt;span class="s2"&gt;&lt;/span&gt;&lt;/div&gt;&lt;div class="p1"&gt;&lt;span class="s1"&gt;&lt;a href="http://nanopatentsandinnovations.blogspot.com/2011/12/carnegie-mellon-creates-computerized.html"&gt;http://nanopatentsandinnovations.blogspot.com/2011/12/carnegie-mellon-creates-computerized.html&lt;/a&gt;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class="p2"&gt;&lt;span class="s2"&gt;&lt;/span&gt;&lt;/div&gt;&lt;div class="p1"&gt;&lt;span class="s1"&gt;&lt;a href="http://www.sciencecodex.com/read/carnegie_mellon_creates_computerized_method_for_matching_images_in_photos_paintings_sketches-82595"&gt;http://www.sciencecodex.com/read/carnegie_mellon_creates_computerized_method_for_matching_images_in_photos_paintings_sketches-82595&lt;/a&gt;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class="p2"&gt;&lt;span class="s2"&gt;&lt;/span&gt;&lt;/div&gt;&lt;div class="p1"&gt;&lt;span class="s1"&gt;&lt;a href="http://www.cra.org/ccc/rh-imatch.php"&gt;http://www.cra.org/ccc/rh-imatch.php&lt;/a&gt;&lt;/span&gt;&lt;/div&gt;&lt;br /&gt;&lt;br /&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/15418143-5214763284029006409?l=quantombone.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://quantombone.blogspot.com/feeds/5214763284029006409/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://quantombone.blogspot.com/2011/12/graphics-meets-big-data-meets-machine.html#comment-form' title='4 Comments'/><link rel='edit' type='application/atom+xml' 
href='http://www.blogger.com/feeds/15418143/posts/default/5214763284029006409'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/15418143/posts/default/5214763284029006409'/><link rel='alternate' type='text/html' href='http://quantombone.blogspot.com/2011/12/graphics-meets-big-data-meets-machine.html' title='Graphics meets Big Data meets Machine Learning'/><author><name>Tomasz Malisiewicz</name><uri>https://profiles.google.com/107912691630546731185</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='//lh6.googleusercontent.com/-ely56X4eTGQ/AAAAAAAAAAI/AAAAAAAAJ_g/VLmSVgSof5s/s512-c/photo.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://3.bp.blogspot.com/-6vM72s6OtIs/Tt6WP707WBI/AAAAAAAAKDw/0u1EnvpXuZA/s72-c/painting.png' height='72' width='72'/><thr:total>4</thr:total></entry><entry><id>tag:blogger.com,1999:blog-15418143.post-7959575846991504818</id><published>2011-12-05T21:14:00.001-05:00</published><updated>2011-12-05T21:33:47.728-05:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='exemplarsvm'/><category scheme='http://www.blogger.com/atom/ns#' term='iccv'/><category scheme='http://www.blogger.com/atom/ns#' term='alyosha efros'/><category scheme='http://www.blogger.com/atom/ns#' term='face detection'/><category scheme='http://www.blogger.com/atom/ns#' term='sketches'/><category scheme='http://www.blogger.com/atom/ns#' term='paintings'/><category scheme='http://www.blogger.com/atom/ns#' term='frontal faces'/><title type='text'>An accidental face detector</title><content type='html'>&lt;div class="separator" style="clear: both; text-align: left;"&gt;&lt;b&gt;Disclaimer #1: &lt;/b&gt;I don't specialize in faces. &amp;nbsp;When it comes to learning, I like my objectives to be convex. 
&amp;nbsp;When it comes to hacking on vision systems, I like to tackle entry-level object categories.&lt;/div&gt;&lt;div class="separator" style="clear: both; text-align: left;"&gt;&lt;b&gt;&lt;br /&gt;&lt;/b&gt;&lt;/div&gt;&lt;div class="separator" style="clear: both; text-align: left;"&gt;&lt;b&gt;Fun fact #1:&lt;/b&gt; Faces are probably the easiest objects in the world for a machine to localize/detect/recognize.&lt;/div&gt;&lt;div class="separator" style="clear: both; text-align: left;"&gt;&lt;b&gt;&lt;br /&gt;&lt;/b&gt;&lt;/div&gt;&lt;div class="separator" style="clear: both; text-align: left;"&gt;&lt;b&gt;Note #1:&lt;/b&gt; I supplied the images, my algorithm supplied the red boxes.&lt;/div&gt;&lt;div class="separator" style="clear: both; text-align: left;"&gt;&lt;b&gt;&lt;br /&gt;&lt;/b&gt;&lt;/div&gt;&lt;div class="separator" style="clear: both; text-align: left;"&gt;&lt;b&gt;Note #2:&lt;/b&gt; Sorry to all my friends who failed to get detected by my accidental face detector! (see below)&lt;/div&gt;&lt;div class="separator" style="clear: both; text-align: left;"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class="separator" style="clear: both; text-align: left;"&gt;So I was hackplaying with &lt;a href="https://github.com/quantombone/exemplarsvm"&gt;some of my PhD thesis code&lt;/a&gt; over Thanksgiving, and I accidentally made a face detector. &amp;nbsp;oops! &amp;nbsp;I immediately ran to my screenshot capture tool and ran my code on my Mac desktop while browsing Google Images and Facebook. &amp;nbsp;It seems to work pretty well on real faces as well as sketches/paintings of faces (see below)! &amp;nbsp;I even caught two Berkeleyites (an &lt;a href="http://www.cs.cmu.edu/~efros/"&gt;Alyosha&lt;/a&gt;&amp;nbsp;and a &lt;a href="http://www.cis.upenn.edu/~jshi/"&gt;Jianbo&lt;/a&gt;), but you gotta find them for yourself. &amp;nbsp;The detector is definitely tuned to frontal faces, but runs pretty fast and produces few false positives. 
&amp;nbsp;Not too shabby for some midnight hackerdom.&lt;/div&gt;&lt;div class="separator" style="clear: both; text-align: left;"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://3.bp.blogspot.com/-GuCDzM81cEc/Tt16TQ0CliI/AAAAAAAAKCI/nBR2cVikttk/s1600/faces13.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="242" src="http://3.bp.blogspot.com/-GuCDzM81cEc/Tt16TQ0CliI/AAAAAAAAKCI/nBR2cVikttk/s320/faces13.png" width="320" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://2.bp.blogspot.com/-v9wHAg4NWfE/Tt16TlUGDuI/AAAAAAAAKCQ/M8bD3EiHDjA/s1600/faces12.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="238" src="http://2.bp.blogspot.com/-v9wHAg4NWfE/Tt16TlUGDuI/AAAAAAAAKCQ/M8bD3EiHDjA/s320/faces12.png" width="320" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://3.bp.blogspot.com/-DjDJdOwMhY4/Tt16UUjDulI/AAAAAAAAKCg/wDGwdAo6650/s1600/faces10.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="276" src="http://3.bp.blogspot.com/-DjDJdOwMhY4/Tt16UUjDulI/AAAAAAAAKCg/wDGwdAo6650/s320/faces10.png" width="320" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://4.bp.blogspot.com/-NB3aGJ-iA6s/Tt16VK7EGzI/AAAAAAAAKCw/gcnqlKDbOZM/s1600/faces8.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="237" src="http://4.bp.blogspot.com/-NB3aGJ-iA6s/Tt16VK7EGzI/AAAAAAAAKCw/gcnqlKDbOZM/s320/faces8.png" width="320" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a 
href="http://1.bp.blogspot.com/-Yth93WorVgo/Tt16VU3X7bI/AAAAAAAAKC4/XkA2aoMaQDo/s1600/faces7.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="228" src="http://1.bp.blogspot.com/-Yth93WorVgo/Tt16VU3X7bI/AAAAAAAAKC4/XkA2aoMaQDo/s320/faces7.png" width="320" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://4.bp.blogspot.com/-CjRbvFwNOWc/Tt16VhQ57aI/AAAAAAAAKDA/CIbV8emDi4Q/s1600/faces6.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="210" src="http://4.bp.blogspot.com/-CjRbvFwNOWc/Tt16VhQ57aI/AAAAAAAAKDA/CIbV8emDi4Q/s320/faces6.png" width="320" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://1.bp.blogspot.com/-jihBXuWgJkU/Tt16WZ-YOyI/AAAAAAAAKDI/15ezhMw8rjg/s1600/faces5.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="235" src="http://1.bp.blogspot.com/-jihBXuWgJkU/Tt16WZ-YOyI/AAAAAAAAKDI/15ezhMw8rjg/s320/faces5.png" width="320" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://3.bp.blogspot.com/-WMlnHUt5HxY/Tt16XiXJocI/AAAAAAAAKDg/sdAiiUTKZj0/s1600/faces2.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="226" src="http://3.bp.blogspot.com/-WMlnHUt5HxY/Tt16XiXJocI/AAAAAAAAKDg/sdAiiUTKZj0/s320/faces2.png" width="320" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://2.bp.blogspot.com/-F3okHbhfKps/Tt16YEil33I/AAAAAAAAKDo/B_likD0gNd8/s1600/faces1.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="228" src="http://2.bp.blogspot.com/-F3okHbhfKps/Tt16YEil33I/AAAAAAAAKDo/B_likD0gNd8/s320/faces1.png" width="320" 
/&gt;&lt;/a&gt;&lt;/div&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class="separator" style="clear: both; text-align: left;"&gt;Yes, I'm doing dense multiscale sliding windows here. &amp;nbsp;Yes, I'm HoGGing the hell outta these images. Yes, I'm using a single frontal-face tuned template. &amp;nbsp;And yes, I &lt;b&gt;only used faces of myself &lt;/b&gt;to train this accidental face detector.&lt;/div&gt;&lt;div class="separator" style="clear: both; text-align: left;"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class="separator" style="clear: both; text-align: left;"&gt;&lt;b&gt;Note:&lt;/b&gt; If I've used one of your pictures without permission, and you would like a link back to your home on the interwebs, please leave a comment indicating the image and link to original.&lt;/div&gt;&lt;div class="separator" style="clear: both; text-align: left;"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class="separator" style="clear: both; text-align: left;"&gt;&lt;br /&gt;&lt;/div&gt;&lt;br /&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/15418143-7959575846991504818?l=quantombone.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://quantombone.blogspot.com/feeds/7959575846991504818/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://quantombone.blogspot.com/2011/12/accidental-face-detector.html#comment-form' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/15418143/posts/default/7959575846991504818'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/15418143/posts/default/7959575846991504818'/><link rel='alternate' type='text/html' href='http://quantombone.blogspot.com/2011/12/accidental-face-detector.html' title='An accidental face detector'/><author><name>Tomasz 
Malisiewicz</name><uri>https://profiles.google.com/107912691630546731185</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='//lh6.googleusercontent.com/-ely56X4eTGQ/AAAAAAAAAAI/AAAAAAAAJ_g/VLmSVgSof5s/s512-c/photo.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://3.bp.blogspot.com/-GuCDzM81cEc/Tt16TQ0CliI/AAAAAAAAKCI/nBR2cVikttk/s72-c/faces13.png' height='72' width='72'/><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-15418143.post-8288006801574455514</id><published>2011-12-02T11:15:00.001-05:00</published><updated>2011-12-02T11:55:36.314-05:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='siggraph asia'/><category scheme='http://www.blogger.com/atom/ns#' term='blog'/><category scheme='http://www.blogger.com/atom/ns#' term='statcounter'/><category scheme='http://www.blogger.com/atom/ns#' term='iccv'/><category scheme='http://www.blogger.com/atom/ns#' term='exemplar svms'/><category scheme='http://www.blogger.com/atom/ns#' term='citations'/><category scheme='http://www.blogger.com/atom/ns#' term='google scholar'/><category scheme='http://www.blogger.com/atom/ns#' term='google'/><category scheme='http://www.blogger.com/atom/ns#' term='publishing'/><title type='text'>Google Scholar, My Citations, a new paradigm for finding great Computer Vision research papers</title><content type='html'>I have been finding great computer vision research papers by using Google Scholar for the past 2+ years. &amp;nbsp;My recipe is straightforward and has two key ingredients. First, by &lt;a href="http://scholar.google.com/scholar?hl=en&amp;amp;q=tomasz+malisiewicz&amp;amp;btnG=Search&amp;amp;as_sdt=0%2C22&amp;amp;as_ylo=&amp;amp;as_vis=0"&gt;finding new papers that cite one of my published papers&lt;/a&gt;, I automatically get to read papers which will be relevant to my own research interests. 
&amp;nbsp;The best bit is that by using Google Scholar, I'm not limiting my search to a single conference -- Google finds papers from the raw web.&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;div&gt;Second, I have a short list of superstar vision researchers (&lt;a href="http://www.cs.berkeley.edu/~malik/"&gt;Jitendra Malik&lt;/a&gt;, among others) and I basically read anything and everything these gurus publish. &amp;nbsp;Regularly visiting academic homepages is the best way to do this, but Google Scholar also lets me search by name. &amp;nbsp;In addition, nobody lists on their homepage their papers' citation counts. &amp;nbsp;This means if I visit a researcher's personal website, I have to make a decision as to what paper to read based on (title, co-authors, publication venue). &amp;nbsp;But highly-cited papers are likely to be more important to read first. &amp;nbsp;I believe that this is a good rule of thumb, and very important if you are new to the field.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;I am really glad that Google finally let researchers make public profiles to view their papers and see their citations, etc. &lt;a href="http://googlescholar.blogspot.com/2011/11/google-scholar-citations-open-to-all.html"&gt;&amp;nbsp;See Google Scholar blog for more information.&lt;/a&gt; &amp;nbsp;I've been using &lt;a href="http://statcounter.com/"&gt;statcounter&lt;/a&gt; to monitor my blog's visitors, and now I can use Google Scholar to monitor who is citing my research papers! 
&amp;nbsp;I'm not claiming that the only way for me to read one of your papers is for you to cite one of my papers, but believe me, even if we've never met at a vision conference, if you cited one of my papers there's a good chance I already know about your research :-) &amp;nbsp;I would love to see Google Scholar Citations pages one day replace "my publications" sections on academic homepages...&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://scholar.google.com/citations?user=RCTeTV0AAAAJ&amp;amp;hl=en"&gt;&lt;img border="0" height="320" src="http://3.bp.blogspot.com/-6vgceAMjBrc/Ttj65lou05I/AAAAAAAAKB4/WsxcTgnXDJA/s320/mycitations-screenshot.png" width="309" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;My Citations screenshot&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;/div&gt;&lt;/div&gt;&lt;/div&gt;&lt;div&gt;My only complaint with Google Scholar is that I can't seem to get it to recognize my two most recent papers. &amp;nbsp;I have these papers listed on my homepage, and so do my co-authors, but Google isn't picking them up!!! &amp;nbsp;I manually added them to my Google My Citations page, and using Google Scholar I was able to find at least one other paper which cites one of these two papers.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;I read the &lt;a href="http://scholar.google.com/intl/en/scholar/inclusion.html"&gt;inclusion guidelines&lt;/a&gt;, and I'm still baffled. &amp;nbsp;The PDFs are definitely over 5MB, but my older papers which were indexed by Google were also over 5MB. &amp;nbsp;Dear Google, are you seriously not indexing my recent papers because they are over 5MB? &amp;nbsp;It takes us researchers months of hard work to get our work out the door. 
&amp;nbsp;We see the sun rise for weeks straight when we are in deadline-mode, and the conferences/journals give us size limitations -- we work hard to make our stuff fit within these limits (something like 20MB per PDF). &amp;nbsp;And we, researchers, are crazy about Google and what it means for organizing the world's information -- naturally we are jumping on the Google Scholar bandwagon.&amp;nbsp;I really hope there's some silly reason why I can't find my own papers using Google Scholar, but if I can't find my own work, that means others can't find my own work, and until I can be confident that Google Scholar is bug-free, I cannot give it my full recommendation.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Problematic papers for Google Scholar:&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Abhinav Shrivastava, Tomasz Malisiewicz, Abhinav Gupta, Alexei A. Efros. &lt;a href="http://www.cs.cmu.edu/~tmalisie/projects/sa11/shrivastava-sa11.pdf"&gt;Data-driven Visual Similarity for Cross-domain Image Matching.&lt;/a&gt; In SIGGRAPH ASIA, December 2011.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Tomasz Malisiewicz, Abhinav Gupta, Alexei A. Efros. 
&lt;a href="http://www.cs.cmu.edu/~tmalisie/projects/iccv11/exemplarsvm-iccv11.pdf"&gt;Ensemble of Exemplar-SVMs for Object Detection and Beyond.&lt;/a&gt; In ICCV, November 2011.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;If anybody has any suggestions (there's a chance I'm doing something wrong), or an explanation as to why my papers haven't been indexed, I would love to hear from you.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/15418143-8288006801574455514?l=quantombone.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://quantombone.blogspot.com/feeds/8288006801574455514/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://quantombone.blogspot.com/2011/12/google-scholar-my-citations-new.html#comment-form' title='7 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/15418143/posts/default/8288006801574455514'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/15418143/posts/default/8288006801574455514'/><link rel='alternate' type='text/html' href='http://quantombone.blogspot.com/2011/12/google-scholar-my-citations-new.html' title='Google Scholar, My Citations, a new paradigm for finding great Computer Vision research papers'/><author><name>Tomasz Malisiewicz</name><uri>https://profiles.google.com/107912691630546731185</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='//lh6.googleusercontent.com/-ely56X4eTGQ/AAAAAAAAAAI/AAAAAAAAJ_g/VLmSVgSof5s/s512-c/photo.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://3.bp.blogspot.com/-6vgceAMjBrc/Ttj65lou05I/AAAAAAAAKB4/WsxcTgnXDJA/s72-c/mycitations-screenshot.png' height='72' 
width='72'/><thr:total>7</thr:total></entry><entry><id>tag:blogger.com,1999:blog-15418143.post-1949086628856957509</id><published>2011-11-16T18:29:00.001-05:00</published><updated>2011-12-10T03:58:52.615-05:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='torque'/><category scheme='http://www.blogger.com/atom/ns#' term='cluster'/><category scheme='http://www.blogger.com/atom/ns#' term='CMU'/><category scheme='http://www.blogger.com/atom/ns#' term='github'/><category scheme='http://www.blogger.com/atom/ns#' term='gnu screen'/><category scheme='http://www.blogger.com/atom/ns#' term='ssh'/><category scheme='http://www.blogger.com/atom/ns#' term='warp'/><category scheme='http://www.blogger.com/atom/ns#' term='exemplar-svm'/><category scheme='http://www.blogger.com/atom/ns#' term='pbs'/><category scheme='http://www.blogger.com/atom/ns#' term='sharing'/><category scheme='http://www.blogger.com/atom/ns#' term='code'/><category scheme='http://www.blogger.com/atom/ns#' term='graphics'/><category scheme='http://www.blogger.com/atom/ns#' term='kitware'/><category scheme='http://www.blogger.com/atom/ns#' term='hacking'/><title type='text'>don't throw away old code: github-it!</title><content type='html'>My thesis experiments on Exemplar-SVMs&amp;nbsp;(my&amp;nbsp;&lt;a href="http://www.cs.cmu.edu/~tmalisie/thesis/malisiewicz_thesis.pdf"&gt;PhD thesis link&lt;/a&gt;: Note, 33MB)&amp;nbsp;would have taken approximately 20 CPU years to finish. &amp;nbsp;But not on a fat CMU cluster! &amp;nbsp;Here is some simple code which helped make things possible in ~1 month of crunching on 200+ cores. &amp;nbsp;That scale of computation is not quite Google-scale computing, but it was an unforgettable experience as a CMU PhD student. &amp;nbsp;I've recently had to go back to the SSH / GNU Screen method of starting scripts at MIT, since we do not have torque/pbs there, but I definitely use these scripts. 
&amp;nbsp;Fork it, use it, change it, hack it, improve it, break it, learn from it, etc.&lt;br /&gt;&lt;br /&gt;&lt;a href="https://github.com/quantombone/warp_scripts"&gt;https://github.com/quantombone/warp_scripts&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;I used these scripts to drive the experiments in my &lt;a href="https://github.com/quantombone/exemplarsvm"&gt;Exemplar-SVM framework&lt;/a&gt; (also on Github).&lt;br /&gt;&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="https://github.com/" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" src="https://a248.e.akamai.net/assets.github.com/images/modules/header/logov6-hover.svg?1315937721" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;br /&gt;The basic take home message is &lt;b&gt;"do not throw away old code"&lt;/b&gt; which you found useful at some time. &amp;nbsp;C'mon ex-phd students, I know you wrote a lot of code, you graduated and now you feel&amp;nbsp;embarrassed&amp;nbsp;to share your code. &amp;nbsp;Who cares if you never had a chance to clean it up, if the world never gets to see it then it will die a silent death from lack of use. &amp;nbsp;Just put it on &lt;a href="https://github.com/"&gt;Github&lt;/a&gt;, and let others take a look. &amp;nbsp;Git is the world's best source control/versioning system. Its distributed nature makes it perfect for large-scale collaboration. &amp;nbsp;Now with github sharing is super easy! Sharing is caring. &amp;nbsp;Let's make the world a better place for hackerdom, one repository at a time. &amp;nbsp;I've met some great hackers at MIT, such as the great &lt;a href="https://github.com/cvondrick"&gt;cvondrick&lt;/a&gt;, who is still teaching me how to branch like a champ.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Mathematicians share proofs. &amp;nbsp;Hackers share code. &amp;nbsp;Embrace technology, embrace Github. 
&lt;/b&gt;&amp;nbsp;If you ever want to hack with me, it is probably as important for you to know the basics of git as it is for you to be a master of linear algebra.&lt;br /&gt;&lt;br /&gt;Additional Reading:&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://www.kitware.com/index.html" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="66" src="http://www.kitware.com/img/logo_over.png" width="200" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;a href="http://www.kitware.com/products/html/DistributedVersionControlTheFutureOfHistory.html"&gt;Distributed Version Control: The Future of History&lt;/a&gt;, an article about Git by some Kitware software engineers&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/15418143-1949086628856957509?l=quantombone.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://quantombone.blogspot.com/feeds/1949086628856957509/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://quantombone.blogspot.com/2011/11/dont-throw-away-old-code-github-it.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/15418143/posts/default/1949086628856957509'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/15418143/posts/default/1949086628856957509'/><link rel='alternate' type='text/html' href='http://quantombone.blogspot.com/2011/11/dont-throw-away-old-code-github-it.html' title='don&apos;t throw away old code: github-it!'/><author><name>Tomasz Malisiewicz</name><uri>https://profiles.google.com/107912691630546731185</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' 
src='//lh6.googleusercontent.com/-ely56X4eTGQ/AAAAAAAAAAI/AAAAAAAAJ_g/VLmSVgSof5s/s512-c/photo.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-15418143.post-6098771917716620389</id><published>2011-11-08T21:05:00.001-05:00</published><updated>2011-11-08T21:05:46.642-05:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='svetlana lazebnik'/><category scheme='http://www.blogger.com/atom/ns#' term='iccv 2011'/><category scheme='http://www.blogger.com/atom/ns#' term='parts'/><category scheme='http://www.blogger.com/atom/ns#' term='scene recognition'/><title type='text'>scene recognition with part-based models at ICCV 2011</title><content type='html'>&lt;br /&gt;&lt;div class="p1"&gt;Today, I wanted to point everyone's attention to a super-cool paper from day 1 of this year's&lt;a href="http://www.iccv2011.org/"&gt; ICCV 2011 Conference&lt;/a&gt;. &amp;nbsp;Megha Pandey is the lead on this, and Lana Lazebnik (of spatial pyramid fame) is the seasoned vision community member supervising this research. &amp;nbsp;The idea is really simple (and simplicity is a plus!): train a latent deformable part-based model for scenes. &amp;nbsp;Some of the scene models look really cool, and I encourage everybody interested in scene recognition to take a look. 
&amp;nbsp;&lt;/div&gt;&lt;div class="p1"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class="p1"&gt;&lt;/div&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://www.cs.unc.edu/~megha/DPM/index.html" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" src="http://www.cs.unc.edu/~megha/DPM/corridor.jpg" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;A Part-based Scene Model&lt;/div&gt;&lt;br /&gt;&lt;div class="p1"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class="p1"&gt;One of the reasons why I like this paper is because just like our &lt;a href="http://graphics.cs.cmu.edu/projects/crossDomainMatching/"&gt;SIGGRAPH ASIA 2011 paper on cross-domain image matching&lt;/a&gt;, they are using HOG features to represent scenes and applying these models in a sliding-window fashion. &amp;nbsp;This is much different than the traditional image-to-feature-vector mapping used in systems based on the GIST descriptor. &amp;nbsp;These types of approaches allow the detection of a scene inside another image! 
&amp;nbsp;Framing issues are elegantly handled by allowing the model to slide.&lt;/div&gt;&lt;div class="p1"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class="p1"&gt;&lt;b style="font-family: 'times new roman';"&gt;&lt;a href="http://www.cs.unc.edu/~lazebnik/publications/megha_iccv2011.pdf"&gt;Scene Recognition and Weakly Supervised Object Localization with Deformable Part-Based Models.&lt;/a&gt;&amp;nbsp;&lt;/b&gt;&lt;span class="Apple-style-span" style="font-family: 'times new roman';"&gt;&lt;a href="http://www.cs.unc.edu/~megha/"&gt;Megha Pandey&lt;/a&gt; and &lt;a href="http://www.cs.unc.edu/~lazebnik/"&gt;Svetlana Lazebnik&lt;/a&gt;.&amp;nbsp;&lt;/span&gt;&lt;em style="font-family: 'times new roman';"&gt;Proceedings of the IEEE International Conference on Computer Vision&lt;/em&gt;&lt;span class="Apple-style-span" style="font-family: 'times new roman';"&gt;, 2011.&amp;nbsp;&lt;/span&gt;&lt;a href="http://www.cs.unc.edu/~megha/DPM/index.html" style="text-align: center;"&gt;Project Page&lt;/a&gt;&amp;nbsp;&lt;a href="http://www.cs.unc.edu/~lazebnik/publications/megha_iccv2011.pdf"&gt;[pdf]&lt;/a&gt;&lt;/div&gt;&lt;div class="p1"&gt;&lt;i&gt;&lt;br /&gt;&lt;/i&gt;&lt;/div&gt;&lt;div class="p1"&gt;&lt;i&gt;Abstract: Weakly supervised discovery of common visual structure in highly variable, cluttered images is a key problem in recognition. We address this problem using deformable part-based models (DPM’s) with latent SVM training&lt;/i&gt;&lt;i&gt;. These models have been introduced for fully supervised training of object detectors, but we demonstrate that they are also capable of more open-ended learning of latent structure for such tasks as scene recognition and weakly supervised object localization. For scene recognition, DPM’s can capture recurring visual elements and salient objects; in combination with standard global image features, they obtain state-of-the-art results on the MIT 67-category indoor scene dataset. 
For weakly supervised object localization, optimization over latent DPM parameters can discover the spatial extent of objects in cluttered training images without ground-truth bounding boxes. The resulting method outperforms a recent state-of-the-art weakly supervised object localization approach on the PASCAL-07 dataset.&lt;/i&gt;&lt;/div&gt;&lt;div class="p1"&gt;&lt;i&gt;&lt;br /&gt;&lt;/i&gt;&lt;/div&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://www.cs.unc.edu/~megha/DPM/index.html" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" src="http://www.cs.unc.edu/~megha/DPM/bicycle-left.jpg" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;Weakly Supervised Object Localization (see paper for details)&lt;/div&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class="p1"&gt;&lt;i&gt;&lt;br /&gt;&lt;/i&gt;&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/15418143-6098771917716620389?l=quantombone.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://quantombone.blogspot.com/feeds/6098771917716620389/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://quantombone.blogspot.com/2011/11/scene-recognition-with-part-based.html#comment-form' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/15418143/posts/default/6098771917716620389'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/15418143/posts/default/6098771917716620389'/><link rel='alternate' 
type='text/html' href='http://quantombone.blogspot.com/2011/11/scene-recognition-with-part-based.html' title='scene recognition with part-based models at ICCV 2011'/><author><name>Tomasz Malisiewicz</name><uri>https://profiles.google.com/107912691630546731185</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='//lh6.googleusercontent.com/-ely56X4eTGQ/AAAAAAAAAAI/AAAAAAAAJ_g/VLmSVgSof5s/s512-c/photo.jpg'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-15418143.post-2516032852816853183</id><published>2011-11-05T03:27:00.001-05:00</published><updated>2011-11-05T03:27:22.998-05:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='iccv 2011'/><category scheme='http://www.blogger.com/atom/ns#' term='iccv'/><category scheme='http://www.blogger.com/atom/ns#' term='barcelona'/><category scheme='http://www.blogger.com/atom/ns#' term='papers'/><title type='text'>the fun begins at ICCV 2011</title><content type='html'>All the cool vision kids are going, so why aren't you?&lt;br /&gt;&lt;a href="http://www.iccv2011.org/"&gt;http://www.iccv2011.org/&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;This will be my first ICCV ever! And my first trip to Spain! &amp;nbsp;Seriously though, if you need to find me over the next week, come to Barcelona.&amp;nbsp;There are lots of great papers out this year and I'll be sure to write about the few which I find interesting (and haven't already blogged about). &amp;nbsp;If you want to learn more about the craziness behind ExemplarSVMs, or just want to say 'Hi', don't hesitate to find me walking around the conference. 
&amp;nbsp;I'll be there during all the workshop days too.&lt;br /&gt;&lt;br /&gt;If anybody has a favourite ICCV 2011 paper they want me to look at and perhaps write something about (hardcore object recognition please -- I don't care about illumination models), please send me your requests (in the comments below).&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/15418143-2516032852816853183?l=quantombone.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://quantombone.blogspot.com/feeds/2516032852816853183/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://quantombone.blogspot.com/2011/11/fun-begins-at-iccv-2011.html#comment-form' title='4 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/15418143/posts/default/2516032852816853183'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/15418143/posts/default/2516032852816853183'/><link rel='alternate' type='text/html' href='http://quantombone.blogspot.com/2011/11/fun-begins-at-iccv-2011.html' title='the fun begins at ICCV 2011'/><author><name>Tomasz Malisiewicz</name><uri>https://profiles.google.com/107912691630546731185</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='//lh6.googleusercontent.com/-ely56X4eTGQ/AAAAAAAAAAI/AAAAAAAAJ_g/VLmSVgSof5s/s512-c/photo.jpg'/></author><thr:total>4</thr:total></entry><entry><id>tag:blogger.com,1999:blog-15418143.post-1476206257316623231</id><published>2011-10-26T12:05:00.000-05:00</published><updated>2011-10-26T12:07:16.568-05:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='internship'/><category scheme='http://www.blogger.com/atom/ns#' term='blog'/><category scheme='http://www.blogger.com/atom/ns#' term='vision'/><category 
scheme='http://www.blogger.com/atom/ns#' term='Yaroslav Bulatov'/><category scheme='http://www.blogger.com/atom/ns#' term='google'/><category scheme='http://www.blogger.com/atom/ns#' term='machine learning'/><title type='text'>Google Internship in Vision/ML</title><content type='html'>Disclaimer: the following post is cross-posted from &lt;a href="http://yaroslavvb.blogspot.com/"&gt;Yaroslav's "Machine Learning, etc" blog&lt;/a&gt;. Since I always rave about my experiences at Google as an intern (did it twice!), I thought some of my fellow readers would find this information useful. &amp;nbsp;If you are a vision PhD student at CMU or MIT, feel free to ask me more about life at Google. &amp;nbsp;If you have questions regarding the following internship offer, you'll have to ask Yaroslav.&lt;br /&gt;&lt;hr /&gt;Original post at:&amp;nbsp;&lt;a href="http://yaroslavvb.blogspot.com/2011/10/google-internship-in-visionml.html"&gt;http://yaroslavvb.blogspot.com/2011/10/google-internship-in-visionml.html&lt;/a&gt;&lt;br /&gt;&lt;hr /&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: 13px; line-height: 18px;"&gt;My group has intern openings for winter and summer. Winter may be too late (but if you really want winter, ping me and I'll find out feasibility). We use&amp;nbsp;OCR for Google Books, frames from YouTube videos, spam images, unreadable PDFs encountered by the crawler, images from Google's StreetView cameras, Android, and a few other areas.&amp;nbsp;Recognizing individual character candidates is a key step in an OCR system, and one that machines are not very good at. Even with 0 context, humans are better. 
This shall not stand!&lt;br /&gt;&lt;br /&gt;For example, when I showed the picture below to my Taiwanese coworker he immediately said that these were multiple instances of Chinese "one".&lt;br /&gt;&lt;br /&gt;&lt;img src="http://yaroslavvb.com/upload/interns/one-small.png" style="-webkit-box-shadow: rgba(0, 0, 0, 0.0976562) 1px 1px 5px; background-attachment: initial; background-clip: initial; background-image: initial; background-origin: initial; background-position: initial initial; background-repeat: initial initial; border-bottom-color: rgb(238, 238, 238); border-bottom-style: solid; border-bottom-width: 1px; border-left-color: rgb(238, 238, 238); border-left-style: solid; border-left-width: 1px; border-right-color: rgb(238, 238, 238); border-right-style: solid; border-right-width: 1px; border-top-color: rgb(238, 238, 238); border-top-style: solid; border-top-width: 1px; box-shadow: rgba(0, 0, 0, 0.0976562) 1px 1px 5px; padding-bottom: 5px; padding-left: 5px; padding-right: 5px; padding-top: 5px;" /&gt;&lt;br /&gt;&lt;br /&gt;Here are 4 of those images close-up. 
Classical OCR approaches have trouble with these characters.&lt;br /&gt;&lt;br /&gt;&lt;img src="http://yaroslavvb.com/upload/interns/one-large.png" style="-webkit-box-shadow: rgba(0, 0, 0, 0.0976562) 1px 1px 5px; background-attachment: initial; background-clip: initial; background-image: initial; background-origin: initial; background-position: initial initial; background-repeat: initial initial; border-bottom-color: rgb(238, 238, 238); border-bottom-style: solid; border-bottom-width: 1px; border-left-color: rgb(238, 238, 238); border-left-style: solid; border-left-width: 1px; border-right-color: rgb(238, 238, 238); border-right-style: solid; border-right-width: 1px; border-top-color: rgb(238, 238, 238); border-top-style: solid; border-top-width: 1px; box-shadow: rgba(0, 0, 0, 0.0976562) 1px 1px 5px; padding-bottom: 5px; padding-left: 5px; padding-right: 5px; padding-top: 5px;" /&gt;&lt;br /&gt;&lt;br /&gt;This is a common problem for high-noise domains like camera pictures and digital text rasterized at low resolution. 
Some results&amp;nbsp;&lt;a href="http://research.microsoft.com/en-us/um/people/manik/pubs/deCampos09.pdf" style="color: #2288bb; text-decoration: none;"&gt;suggest&lt;/a&gt;&amp;nbsp;that techniques from Machine Vision can help.&lt;br /&gt;&lt;br /&gt;For low-noise domains like Google Books and broken PDF indexing, shortcomings of traditional OCR systems are due to&lt;br /&gt;1) Large number of classes (100k letters in Unicode 6.0)&lt;br /&gt;2) Non-trivial variation within classes&lt;br /&gt;Example of "non-trivial variation"&lt;br /&gt;&lt;img src="http://yaroslavvb.com/upload/interns/ampersands.png" style="-webkit-box-shadow: rgba(0, 0, 0, 0.0976562) 1px 1px 5px; background-attachment: initial; background-clip: initial; background-image: initial; background-origin: initial; background-position: initial initial; background-repeat: initial initial; border-bottom-color: rgb(238, 238, 238); border-bottom-style: solid; border-bottom-width: 1px; border-left-color: rgb(238, 238, 238); border-left-style: solid; border-left-width: 1px; border-right-color: rgb(238, 238, 238); border-right-style: solid; border-right-width: 1px; border-top-color: rgb(238, 238, 238); border-top-style: solid; border-top-width: 1px; box-shadow: rgba(0, 0, 0, 0.0976562) 1px 1px 5px; padding-bottom: 5px; padding-left: 5px; padding-right: 5px; padding-top: 5px;" /&gt;&lt;br /&gt;&lt;br /&gt;I found over 100k distinct instances of digital letter 'A' from just one day's crawl worth of documents from the web. Some more examples are&amp;nbsp;&lt;a href="http://yaroslavvb.blogspot.com/2011/09/notmnist-dataset.html" style="color: #2288bb; text-decoration: none;"&gt;here&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;Chances are that the ideas for human-level classifier are out there. They just haven't been implemented and tested in realistic conditions. 
We need&amp;nbsp;&lt;b&gt;someone with ML/Vision background to come to Google and implement a great character classifier.&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;You'd have a large impact if your ideas become part of Tesseract. Through books alone, your code will be run on books from 42 libraries. And since Tesseract is open-source, you'd be contributing to the main OCR effort in the open-source community.&lt;br /&gt;&lt;br /&gt;You will get a ton of data, resources and smart people around you. It's a very low bureaucracy place. You could run Matlab code on 10k cores if you really wanted, and I know someone who has launched 200k core jobs for a personal project. The infrastructure also makes things easier. Google's MapReduce can sort a petabyte of data (10 trillion strings) with 8000 machines in&amp;nbsp;&lt;a href="http://googleresearch.blogspot.com/2011/09/sorting-petabytes-with-mapreduce-next.html" style="color: #2288bb; text-decoration: none;"&gt;just 30 mins&lt;/a&gt;. Some of the work in our team used features coming from distributed deep belief infrastructure.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;In order to get an internship position, you must pass a general technical screen that I have no control over. If you are interested in more details, you could &lt;a href="http://yaroslavvb.blogspot.com/2011/10/google-internship-in-visionml.html"&gt;contact me directly&lt;/a&gt;. 
&amp;nbsp;-- Yaroslav&lt;br /&gt;&lt;br /&gt;(the link to apply is usually&amp;nbsp;&lt;a href="http://www.google.com/jobs/students/us/internships/eng/software-engineering-intern-phd-winter-north-america-locations/index.html" style="color: #2288bb; text-decoration: none;"&gt;here&lt;/a&gt;, but now it's down, will update when it's fixed)&lt;/span&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/15418143-1476206257316623231?l=quantombone.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://quantombone.blogspot.com/feeds/1476206257316623231/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://quantombone.blogspot.com/2011/10/google-internship-in-visionml.html#comment-form' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/15418143/posts/default/1476206257316623231'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/15418143/posts/default/1476206257316623231'/><link rel='alternate' type='text/html' href='http://quantombone.blogspot.com/2011/10/google-internship-in-visionml.html' title='Google Internship in Vision/ML'/><author><name>Tomasz Malisiewicz</name><uri>https://profiles.google.com/107912691630546731185</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='//lh6.googleusercontent.com/-ely56X4eTGQ/AAAAAAAAAAI/AAAAAAAAJ_g/VLmSVgSof5s/s512-c/photo.jpg'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-15418143.post-7916320402173051128</id><published>2011-10-25T16:20:00.001-05:00</published><updated>2011-10-25T16:20:20.281-05:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='active learning'/><category scheme='http://www.blogger.com/atom/ns#' term='grammar'/><category 
scheme='http://www.blogger.com/atom/ns#' term='girshick'/><category scheme='http://www.blogger.com/atom/ns#' term='github'/><category scheme='http://www.blogger.com/atom/ns#' term='nips 2011'/><category scheme='http://www.blogger.com/atom/ns#' term='vatic'/><category scheme='http://www.blogger.com/atom/ns#' term='vondrick'/><category scheme='http://www.blogger.com/atom/ns#' term='ramanan'/><category scheme='http://www.blogger.com/atom/ns#' term='video annotation'/><category scheme='http://www.blogger.com/atom/ns#' term='felzenszwalb'/><title type='text'>NIPS 2011 preview: person grammars and machines-in-the-loop for video annotation</title><content type='html'>&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://2.bp.blogspot.com/-AtaJLJ4p_FA/TqcinSoq0TI/AAAAAAAAKBc/ZXiAcoa4lGA/s1600/pedro-nips.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="170" src="http://2.bp.blogspot.com/-AtaJLJ4p_FA/TqcinSoq0TI/AAAAAAAAKBc/ZXiAcoa4lGA/s320/pedro-nips.png" width="320" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;div style="text-align: center;"&gt;&lt;a href="http://people.cs.uchicago.edu/~rbg/"&gt;Ross Girshick&lt;/a&gt;, &lt;a href="http://www.cs.brown.edu/~pff/"&gt;Pedro Felzenszwalb&lt;/a&gt;, &lt;a href="http://ttic.uchicago.edu/~dmcallester/"&gt;David McAllester&lt;/a&gt;&lt;/div&gt;&lt;div style="text-align: center;"&gt;&lt;b&gt;Object Detection with Grammar Models&lt;/b&gt;&lt;/div&gt;&lt;div style="text-align: center;"&gt;To appear in NIPS 2011&amp;nbsp;&lt;a href="http://www.cs.brown.edu/~pff/papers/grammar-nips11.pdf"&gt;pdf&lt;/a&gt;&lt;/div&gt;&lt;br /&gt;Today, I want to point out two upcoming NIPS papers which might be of interest to the Computer Vision community. 
&amp;nbsp;First, we have a person detection paper from the hackers who brought you &lt;a href="http://www.cs.brown.edu/~pff/latent/"&gt;Latent Discriminatively Trained Part-based Models&lt;/a&gt; (aka voc-release-3.1 and voc-release-4.0). &amp;nbsp;I personally don't care for grammars (I think &lt;a href="http://www.cs.cmu.edu/~tmalisie/projects/iccv11/"&gt;exemplars are a much more data-driven&lt;/a&gt; and computation-friendly way of modeling visual concepts), but I think any paper with Pedro on the author list is really worth checking out. &amp;nbsp;Maybe after I digest all the details, I'll jump on the grammar bandwagon (but I doubt it). &amp;nbsp;Also of note is the fact that Pedro Felzenszwalb has relocated to Brown University.&lt;br /&gt;&lt;br /&gt;&lt;hr /&gt;&lt;br /&gt;The second paper is by Carl Vondrick and Deva Ramanan (also of latent-svm fame). &amp;nbsp;Carl is the author of &lt;a href="http://mit.edu/vondrick/vatic/"&gt;vatic&lt;/a&gt;&amp;nbsp;and a fellow &lt;a href="https://github.com/cvondrick/"&gt;vision@github hacker&lt;/a&gt;. &amp;nbsp;Carl, like myself, has joined &lt;a href="http://web.mit.edu/torralba/www/"&gt;Antonio Torralba&lt;/a&gt;'s group at MIT this fall. &amp;nbsp;He just started his PhD, so you can only expect the quality of his work to increase without bound over the next ~5 years. &amp;nbsp;&lt;b&gt;&lt;a href="http://mit.edu/vondrick/vatic/"&gt;vatic&lt;/a&gt;&lt;/b&gt; is an online, interactive video annotation tool for computer vision research that crowdsources work to Amazon's Mechanical Turk. Vatic makes it easy to build massive, affordable video data sets and can be deployed on a cloud. Written in Python + C + Javascript, vatic is free and open-source software. The video below showcases the power of vatic.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;iframe allowfullscreen="" frameborder="0" height="315" src="http://www.youtube.com/embed/ljI5pAowACc" width="560"&gt;&lt;/iframe&gt;&lt;br /&gt;In this paper, Vondrick et al. 
use &lt;a href="http://en.wikipedia.org/wiki/Active_learning"&gt;active learning&lt;/a&gt; to select the frames which require human annotation. &amp;nbsp;Rather than simply doing linear interpolation between frames, they are truly putting the "machine-in-the-loop." When doing large-scale video annotation, this approach can supposedly save you tens of thousands of dollars.&lt;br /&gt;&lt;br /&gt;&lt;div style="text-align: center;"&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://1.bp.blogspot.com/-ueOCt7o-ibg/TqcisM9IO1I/AAAAAAAAKBk/hn-OTXXFYqY/s1600/cv.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="241" src="http://1.bp.blogspot.com/-ueOCt7o-ibg/TqcisM9IO1I/AAAAAAAAKBk/hn-OTXXFYqY/s320/cv.png" width="320" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;a href="http://mit.edu/vondrick/"&gt;Carl Vondrick&lt;/a&gt; and &lt;a href="http://www.ics.uci.edu/~dramanan/"&gt;Deva Ramanan&lt;/a&gt;. "&lt;strong&gt;Video Annotation and Tracking with Active Learning&lt;/strong&gt;"&amp;nbsp;&lt;em&gt;Neural Information Processing Systems&lt;/em&gt;&amp;nbsp;(NIPS) Granada, Spain, December 2011.&amp;nbsp;&lt;a href="http://mit.edu/vondrick/vatic/videoalearn.pdf" style="text-decoration: none;"&gt;[paper]&lt;/a&gt;&amp;nbsp;&lt;a href="http://mit.edu/vondrick/vatic/videoalearnslides.pdf" style="text-decoration: none;"&gt;[slides]&lt;/a&gt;&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/15418143-7916320402173051128?l=quantombone.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://quantombone.blogspot.com/feeds/7916320402173051128/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://quantombone.blogspot.com/2011/10/nips-2011-preview-person-grammars-and.html#comment-form' title='2 Comments'/><link rel='edit' type='application/atom+xml' 
href='http://www.blogger.com/feeds/15418143/posts/default/7916320402173051128'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/15418143/posts/default/7916320402173051128'/><link rel='alternate' type='text/html' href='http://quantombone.blogspot.com/2011/10/nips-2011-preview-person-grammars-and.html' title='NIPS 2011 preview: person grammars and machines-in-the-loop for video annotation'/><author><name>Tomasz Malisiewicz</name><uri>https://profiles.google.com/107912691630546731185</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='//lh6.googleusercontent.com/-ely56X4eTGQ/AAAAAAAAAAI/AAAAAAAAJ_g/VLmSVgSof5s/s512-c/photo.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://2.bp.blogspot.com/-AtaJLJ4p_FA/TqcinSoq0TI/AAAAAAAAKBc/ZXiAcoa4lGA/s72-c/pedro-nips.png' height='72' width='72'/><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-15418143.post-964805211522199627</id><published>2011-10-06T13:26:00.000-05:00</published><updated>2011-10-06T17:48:17.712-05:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='uwashington'/><category scheme='http://www.blogger.com/atom/ns#' term='crowdsourcing'/><category scheme='http://www.blogger.com/atom/ns#' term='3d recognition'/><category scheme='http://www.blogger.com/atom/ns#' term='berkeley'/><category scheme='http://www.blogger.com/atom/ns#' term='rgb-d dataset'/><category scheme='http://www.blogger.com/atom/ns#' term='2.5d'/><category scheme='http://www.blogger.com/atom/ns#' term='kinect'/><category scheme='http://www.blogger.com/atom/ns#' term='uw'/><category scheme='http://www.blogger.com/atom/ns#' term='datasets'/><category scheme='http://www.blogger.com/atom/ns#' term='object detection'/><title type='text'>Kinect Object Datasets: Berkeley's B3DO, UW's RGB-D, and NYU's Depth Dataset</title><content type='html'>&lt;div class="separator" style="clear: 
both; text-align: center;"&gt;&lt;b&gt;&lt;span class="Apple-style-span" style="font-size: x-large;"&gt;Why Kinect?&lt;/span&gt;&lt;/b&gt;&lt;/div&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://www.pirobot.org/" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="320" src="http://www.pirobot.org/images/pi-kinect.jpg" width="196" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://www.pirobot.org/"&gt;www.pirobot.org&lt;/a&gt;&lt;/div&gt;&lt;div class="separator" style="clear: both; text-align: left;"&gt;The Kinect, made by Microsoft, is starting to become quite a common item in Robotics and Computer Vision research. &amp;nbsp;While the Robotics community has been using the Kinect as a cheap laser sensor which can be used for obstacle avoidance, the vision community has been excited about using the 2.5D data associated with the Kinect for object detection and recognition. &amp;nbsp;The possibility of building object recognition systems which have access to pixel features as well as 2.5D features is truly exciting for the vision hacker community!&lt;/div&gt;&lt;hr /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;/div&gt;&lt;div style="text-align: center;"&gt;&lt;b&gt;&lt;span class="Apple-style-span" style="font-size: x-large;"&gt;Berkeley's B3DO&lt;/span&gt;&lt;/b&gt;&lt;/div&gt;&lt;div style="text-align: left;"&gt;First of all, I would like to mention that it looks like the &lt;a href="http://www.eecs.berkeley.edu/Research/Projects/CS/vision/"&gt;Berkeley Vision Group&lt;/a&gt; jumped on the Kinect bandwagon. &amp;nbsp;But the data collection effort will be &lt;a href="http://en.wikipedia.org/wiki/Crowdsourcing"&gt;crowdsourced&lt;/a&gt; -- they need your help! 
&amp;nbsp;They need you to use your Kinect to capture your own home/office environments and upload the data to their servers. &amp;nbsp;This way, a very large dataset will be collected, and we, the vision hackers, can use machine learning techniques to learn what sofas, desks, chairs, monitors, and paintings look like. &amp;nbsp;The Berkeley hackers have a paper on this at one of the&lt;a href="http://www.iccv2011.org/"&gt; ICCV 2011 &lt;/a&gt;workshops in Barcelona; here is the paper information:&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://kinectdata.com/" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="243" src="http://kinectdata.com/images/example.png" width="320" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://kinectdata.com/"&gt;kinectdata.com&lt;/a&gt;&lt;/div&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: 'Helvetica Neue', Arial, 'Liberation Sans', FreeSans, sans-serif; font-size: 13px; line-height: 19px;"&gt;&lt;span style="border-bottom-width: 0px; border-color: initial; border-left-width: 0px; border-right-width: 0px; border-style: initial; border-top-width: 0px; font-size: 13px; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; padding-bottom: 0px; padding-left: 0px; padding-right: 0px; padding-top: 0px;"&gt;&lt;a href="http://sergeykarayev.com/work/files/iccv2011.pdf"&gt;A Category-Level 3-D Object Dataset: Putting the Kinect to Work&lt;/a&gt;&lt;/span&gt;&lt;br /&gt;&lt;a href="http://www.eecs.berkeley.edu/~allie/" style="border-bottom-width: 0px; border-color: initial; border-left-width: 0px; border-right-width: 0px; border-style: initial; border-top-width: 0px; font-size: 13px; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; padding-bottom: 0px; padding-left: 0px; 
padding-right: 0px; padding-top: 0px; text-decoration: none;"&gt;Allison Janoch&lt;/a&gt;,&amp;nbsp;&lt;a href="http://sergeykarayev.com/" style="border-bottom-width: 0px; border-color: initial; border-left-width: 0px; border-right-width: 0px; border-style: initial; border-top-width: 0px; font-size: 13px; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; padding-bottom: 0px; padding-left: 0px; padding-right: 0px; padding-top: 0px; text-decoration: none;"&gt;Sergey Karayev&lt;/a&gt;,&amp;nbsp;&lt;a href="http://www.eecs.berkeley.edu/~jiayq/" style="border-bottom-width: 0px; border-color: initial; border-left-width: 0px; border-right-width: 0px; border-style: initial; border-top-width: 0px; font-size: 13px; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; padding-bottom: 0px; padding-left: 0px; padding-right: 0px; padding-top: 0px; text-decoration: none;"&gt;Yangqing Jia&lt;/a&gt;,&amp;nbsp;&lt;a href="http://www.cs.berkeley.edu/~barron/" style="border-bottom-width: 0px; border-color: initial; border-left-width: 0px; border-right-width: 0px; border-style: initial; border-top-width: 0px; font-size: 13px; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; padding-bottom: 0px; padding-left: 0px; padding-right: 0px; padding-top: 0px; text-decoration: none;"&gt;Jonathan T. 
Barron&lt;/a&gt;,&amp;nbsp;&lt;a href="http://www.cs.berkeley.edu/~mfritz/" style="border-bottom-width: 0px; border-color: initial; border-left-width: 0px; border-right-width: 0px; border-style: initial; border-top-width: 0px; font-size: 13px; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; padding-bottom: 0px; padding-left: 0px; padding-right: 0px; padding-top: 0px; text-decoration: none;"&gt;Mario Fritz&lt;/a&gt;,&amp;nbsp;&lt;a href="http://www.icsi.berkeley.edu/~saenko/" style="border-bottom-width: 0px; border-color: initial; border-left-width: 0px; border-right-width: 0px; border-style: initial; border-top-width: 0px; font-size: 13px; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; padding-bottom: 0px; padding-left: 0px; padding-right: 0px; padding-top: 0px; text-decoration: none;"&gt;Kate Saenko&lt;/a&gt;,&amp;nbsp;&lt;a href="http://www.eecs.berkeley.edu/~trevor/" style="border-bottom-width: 0px; border-color: initial; border-left-width: 0px; border-right-width: 0px; border-style: initial; border-top-width: 0px; font-size: 13px; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; padding-bottom: 0px; padding-left: 0px; padding-right: 0px; padding-top: 0px; text-decoration: none;"&gt;Trevor Darrell&lt;/a&gt;&lt;br /&gt;ICCV-W 2011&lt;br /&gt;[&lt;a href="http://sergeykarayev.com/work/files/iccv2011.pdf" style="border-bottom-width: 0px; border-color: initial; border-left-width: 0px; border-right-width: 0px; border-style: initial; border-top-width: 0px; font-size: 13px; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; padding-bottom: 0px; padding-left: 0px; padding-right: 0px; padding-top: 0px; text-decoration: none;"&gt;pdf&lt;/a&gt;] [&lt;a href="http://sergeykarayev.com/work/files/iccv2011.bib" style="border-bottom-width: 0px; border-color: initial; border-left-width: 0px; border-right-width: 0px; border-style: initial; border-top-width: 0px; font-size: 13px; 
margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; padding-bottom: 0px; padding-left: 0px; padding-right: 0px; padding-top: 0px; text-decoration: none;"&gt;bibtex&lt;/a&gt;]&lt;/span&gt;&lt;br /&gt;&lt;hr /&gt;&lt;div style="text-align: center;"&gt;&lt;span class="Apple-style-span" style="font-size: x-large;"&gt;&lt;b&gt;UW's RGB-D Object Dataset&lt;/b&gt;&lt;/span&gt;&lt;/div&gt;&lt;span class="Apple-style-span" style="line-height: 19px;"&gt;On another note, if you want to use 3D for your own object recognition experiments then you might want to check out the following dataset:&amp;nbsp;&lt;a href="http://www.cs.washington.edu/rgbd-dataset/"&gt;University of Washington's RGB-D Object Dataset&lt;/a&gt;. &amp;nbsp;With this dataset you'll be able to compare against UW's current state-of-the-art.&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="line-height: 19px;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://www.cs.washington.edu/rgbd-dataset/" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="176" src="http://www.cs.washington.edu/rgbd-dataset/imgs/rgbd_dataset2.png" width="320" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;span class="Apple-style-span" style="line-height: 19px;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="line-height: 19px;"&gt;In this dataset you will find RGB+Kinect3D data for many household items taken from different views. 
&amp;nbsp;Here is the really cool paper which got me excited about the RGB-D Dataset:&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span"&gt;&lt;a href="http://www.cs.washington.edu/homes/kevinlai/publications/lai_aaai11.pdf" target="_blank"&gt;A Scalable Tree-based Approach for Joint Object and Pose Recognition&lt;/a&gt;&lt;br /&gt;&lt;b&gt;&lt;a href="http://www.cs.washington.edu/homes/kevinlai/index.html"&gt;Kevin Lai&lt;/a&gt;&lt;/b&gt;, &lt;a href="http://www.cs.washington.edu/homes/lfb/"&gt;Liefeng Bo&lt;/a&gt;, &lt;a href="http://www.cs.washington.edu/homes/xren/"&gt;Xiaofeng Ren&lt;/a&gt;, and &lt;a href="http://www.cs.washington.edu/homes/fox/"&gt;Dieter Fox&lt;/a&gt;&lt;br /&gt;In the&amp;nbsp;&lt;i&gt;Twenty-Fifth Conference on Artificial Intelligence (AAAI)&lt;/i&gt;, August 2011.&lt;/span&gt;&lt;br /&gt;&lt;hr /&gt;&lt;b&gt;&lt;/b&gt;&lt;br /&gt;&lt;div style="text-align: center;"&gt;&lt;b&gt;&lt;span class="Apple-style-span" style="font-size: x-large;"&gt;NYU's Depth Dataset&lt;/span&gt;&lt;/b&gt;&lt;br /&gt;&lt;div style="text-align: left;"&gt;I have to admit that I did not know about this dataset (created by &lt;a href="http://cs.nyu.edu/~silberman/site/"&gt;Nathan Silberman&lt;/a&gt; of NYU) until after I blogged about the other two datasets. &amp;nbsp;Check out the &lt;a href="http://cs.nyu.edu/~silberman/site/?page_id=27"&gt;NYU Depth Dataset homepage&lt;/a&gt;. However, the internet is great, and only a few hours after I posted this short blog post, somebody let me know that I left out this really cool NYU dataset. &amp;nbsp;In fact, it looks like this particular dataset might be at the LabelMe-level regarding dense object annotations, but with accompanying Kinect data. 
&amp;nbsp;Rob Fergus &amp;amp; Co strike again!&lt;/div&gt;&lt;/div&gt;&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://cs.nyu.edu/~silberman/site/?page_id=27" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="189" src="http://cs.nyu.edu/~silberman/site/wp-content/uploads/2011/05/nyu_depth_dataset_preview.png" width="320" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://cs.nyu.edu/~silberman/"&gt;Nathan Silberman&lt;/a&gt;, &lt;a href="http://cs.nyu.edu/~fergus/"&gt;Rob Fergus&lt;/a&gt;. &lt;a href="http://cs.nyu.edu/~silberman/papers/indoor_seg_struct_light.pdf"&gt;Indoor Scene Segmentation using a Structured Light Sensor.&lt;/a&gt; To Appear: ICCV 2011 Workshop on 3D Representation and Recognition&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/15418143-964805211522199627?l=quantombone.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://quantombone.blogspot.com/feeds/964805211522199627/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://quantombone.blogspot.com/2011/10/kinect-object-datasets-berkeleys-b3do.html#comment-form' title='6 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/15418143/posts/default/964805211522199627'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/15418143/posts/default/964805211522199627'/><link rel='alternate' type='text/html' href='http://quantombone.blogspot.com/2011/10/kinect-object-datasets-berkeleys-b3do.html' title='Kinect Object Datasets: Berkeley&apos;s B3DO, UW&apos;s RGB-D, and NYU&apos;s Depth 
Dataset'/><author><name>Tomasz Malisiewicz</name><uri>https://profiles.google.com/107912691630546731185</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='//lh6.googleusercontent.com/-ely56X4eTGQ/AAAAAAAAAAI/AAAAAAAAJ_g/VLmSVgSof5s/s512-c/photo.jpg'/></author><thr:total>6</thr:total></entry><entry><id>tag:blogger.com,1999:blog-15418143.post-671220490728351870</id><published>2011-09-29T13:12:00.003-05:00</published><updated>2011-09-29T13:13:00.798-05:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='MIT'/><category scheme='http://www.blogger.com/atom/ns#' term='plenoptic function'/><category scheme='http://www.blogger.com/atom/ns#' term='ted adelson'/><category scheme='http://www.blogger.com/atom/ns#' term='unified field theory'/><category scheme='http://www.blogger.com/atom/ns#' term='etymology'/><title type='text'>plenoptica theoretica: fields vs. particles</title><content type='html'>&lt;div style="text-align: center;"&gt;"All bodies together, and each by itself, give off to the surrounding air an infinite number of images which are all-pervading and each complete, each conveying the nature, colour and form of the body which produces it." --Leonardo da Vinci&lt;/div&gt;&lt;br /&gt;Yesterday, &lt;a href="http://persci.mit.edu/people/adelson"&gt;Edward H. Adelson&lt;/a&gt; ("Ted Adelson") gave a lecture at MIT on the plenoptic function and its role in understanding (and unifying) early vision. &amp;nbsp;Ted has been at MIT for quite some time. 
&amp;nbsp;He is sometimes described as being (1/3 human vision, 1/3 computer vision, and 1/3 computer graphics) and was &lt;a href="http://people.csail.mit.edu/billf/"&gt;Bill Freeman's&lt;/a&gt; advisor.&lt;br /&gt;&lt;br /&gt;&lt;div style="text-align: left;"&gt;&lt;b&gt;What is the plenoptic function?&lt;/b&gt;&lt;/div&gt;Etymology:&amp;nbsp;Plenoptic comes from plenus+optic.&lt;br /&gt;&lt;a href="http://en.wiktionary.org/wiki/plenus"&gt;plenus&lt;/a&gt;: full, filled&lt;br /&gt;&lt;a href="http://en.wiktionary.org/wiki/optic"&gt;optic&lt;/a&gt;: relating to eye or vision&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://1.bp.blogspot.com/-szXtTFRWdfI/ToSrnSKP8kI/AAAAAAAAKBU/-ns5cVApgNM/s1600/plenoptic.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="320" src="http://1.bp.blogspot.com/-szXtTFRWdfI/ToSrnSKP8kI/AAAAAAAAKBU/-ns5cVApgNM/s320/plenoptic.png" width="313" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;br /&gt;Ted Adelson imagined a sort of &lt;a href="http://en.wikipedia.org/wiki/Unified_field_theory"&gt;unified field theory&lt;/a&gt; for vision -- instead of proposing a jungle of atoms such as edges, corners, and peaks, the plenoptic function offers a unifying principle under which color, texture, motion, etc. can all be viewed as gradients of the plenoptic function. &amp;nbsp;The plenoptic function is a complete representation which contains, implicitly, a description of every possible photograph that could be taken of a particular space-time chunk of the world. &amp;nbsp;&lt;i&gt;Omniscience is to knowing as the plenoptic function is to seeing.&lt;/i&gt;&lt;br /&gt;&lt;br /&gt;Ted remarked that if you asked him 20 years ago what he was working on in vision, you might have gotten a confusing answer. &amp;nbsp;"Do you work on texture, motion, stereo, or illumination?" you might ask. &amp;nbsp;"All of them. &amp;nbsp;Aren't they all the same thing?" he might reply. 
&amp;nbsp;Ted argues that vision scientists in the 80s and early 90s tried to cut up the world of vision into neat little "particles" and would develop theories with their favorite particle -- here the particles are early vision concepts such as color, texture, and motion.&lt;br /&gt;&lt;br /&gt;In their seminal paper on the plenoptic function,&amp;nbsp;&lt;a href="http://persci.mit.edu/pub_pdfs/elements91.pdf"&gt;The Plenoptic Function and the Elements of Early Vision&lt;/a&gt;, Adelson and Bergen state that "the elemental operations of early vision involve the measurement of local change along various directions within the plenoptic function." &amp;nbsp;As a theoretical device, the plenoptic function has left a long-standing impression on me. &amp;nbsp;I first came across Ted's ideas back in 2006 -- thanks to Alyosha Efros' course on vision. &amp;nbsp;Having just completed a BS in Physics, I was well aware of unified field theories in physics, and the plenoptic function seemed too cool to forget.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;What the plenoptic function means to me&lt;/b&gt;&lt;br /&gt;However, if the plenoptic function is the Maxwell's equations equivalent for early (low-level) vision, then what I'm ultimately after is the Schrodinger's equation of late (high-level) vision. &amp;nbsp;In his lecture, Ted Adelson acknowledged that vision scientists have a sort of Atom Envy -- they envy the physicists who are able to understand the world in terms of a few fundamental ontologically meaningful entities. &amp;nbsp;First of all, I like particles, but I have no a priori reason to be in the particle camp all of my life. &amp;nbsp;Secondly, the plenoptic function was all about early vision, but my research in vision is all about high-level vision such as object recognition. &amp;nbsp;I might be young and foolish, but the search for a "mind mechanics" has been a part of my research life (at least partially) since ~2003. 
&amp;nbsp;Right now, my best shot at an answer is that exemplars and associations are the basic building blocks of high-level vision -- but unlike the British Empiricists (the champions of associationism), I would argue that the atomic building blocks of associations are object instances, and not ideas such as "roundness" and "blueness". &amp;nbsp;Complex ideas are then the object categories which arise out of the interactions between these concrete elements of experience.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Conclusion&lt;/b&gt;&lt;br /&gt;The Adelson and Bergen paper is a must-read for anybody serious about vision. &amp;nbsp;While it might not offer much in terms of "what next" in vision research, it is nevertheless a useful construct in thinking about vision. &amp;nbsp;I get excited when it comes down to unifying principles and I wish there were more papers like this in vision, especially for high-level vision.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/15418143-671220490728351870?l=quantombone.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://quantombone.blogspot.com/feeds/671220490728351870/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://quantombone.blogspot.com/2011/09/plenoptica-theoretica-fields-vs.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/15418143/posts/default/671220490728351870'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/15418143/posts/default/671220490728351870'/><link rel='alternate' type='text/html' href='http://quantombone.blogspot.com/2011/09/plenoptica-theoretica-fields-vs.html' title='plenoptica theoretica: fields vs. 
particles'/><author><name>Tomasz Malisiewicz</name><uri>https://profiles.google.com/107912691630546731185</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='//lh6.googleusercontent.com/-ely56X4eTGQ/AAAAAAAAAAI/AAAAAAAAJ_g/VLmSVgSof5s/s512-c/photo.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://1.bp.blogspot.com/-szXtTFRWdfI/ToSrnSKP8kI/AAAAAAAAKBU/-ns5cVApgNM/s72-c/plenoptic.png' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-15418143.post-2363371848647449369</id><published>2011-09-28T13:59:00.002-05:00</published><updated>2011-09-28T13:59:26.925-05:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='MIT'/><category scheme='http://www.blogger.com/atom/ns#' term='kant'/><category scheme='http://www.blogger.com/atom/ns#' term='minds'/><category scheme='http://www.blogger.com/atom/ns#' term='intentional stance'/><category scheme='http://www.blogger.com/atom/ns#' term='philosophy'/><category scheme='http://www.blogger.com/atom/ns#' term='tenenbaum'/><category scheme='http://www.blogger.com/atom/ns#' term='dennett'/><category scheme='http://www.blogger.com/atom/ns#' term='reverse-engineering'/><category scheme='http://www.blogger.com/atom/ns#' term='brain'/><category scheme='http://www.blogger.com/atom/ns#' term='physics'/><category scheme='http://www.blogger.com/atom/ns#' term='cognitive science'/><title type='text'>Kant's Intuitions, the intentional stance, and reverse-engineering the mind</title><content type='html'>&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;img border="0" src="http://t2.gstatic.com/images?q=tbn:ANd9GcQU8IhtzIvrylt80Qe2tVHcZOGKLWIzplrRGYwiGJ8EXcXLKlBX8L5dSk5o" /&gt;&lt;img border="0" height="200" src="http://ase.tufts.edu/cogstud/incbios/dennettd/dennettd_files/image003.jpg" width="169" /&gt;&lt;img border="0" 
src="http://www.mathcamp.org/2010/images/josh-tenenbaum-thumbnail.jpg" /&gt;&lt;/div&gt;&lt;div class="" style="clear: both; text-align: center;"&gt;&lt;a href="http://www.philosophypages.com/ph/kant.htm"&gt;Immanuel Kant (1724-1804)&lt;/a&gt;, &lt;a href="http://ase.tufts.edu/cogstud/incbios/dennettd/dennettd.htm"&gt;Daniel Dennett &lt;/a&gt;(1942-), &lt;a href="http://web.mit.edu/cocosci/josh.html"&gt;Josh Tenenbaum&lt;/a&gt; (~1971-)&lt;/div&gt;&lt;div class="MsoNormal"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class="MsoNormal"&gt;“Thoughts without content are empty, intuitions without concepts are blind. The understanding can intuit nothing, the senses can think nothing. Only through their unison can knowledge arise.” -- Immanuel Kant&lt;br /&gt;&lt;br /&gt;“We live in a world that is subjectively open. And we are designed by evolution to be "informavores", epistemically hungry seekers of information, in an endless quest to improve our purchase on the world, the better to make decisions about our subjectively open future.” -- Daniel Dennett&lt;br /&gt;&lt;br /&gt;"For scientists studying how humans come to understand their world, the central challenge is this: How do our minds get so much from so little? We build rich causal models, make strong generalizations, and construct powerful abstractions, whereas the input data are sparse, noisy, and ambiguous—in every way far too limited. A massive mismatch looms between the information coming in through our senses and the outputs of cognition." 
-- Josh Tenenbaum&lt;/div&gt;&lt;div class="MsoNormal"&gt;&lt;/div&gt;&lt;hr /&gt;&lt;div class="MsoNormal"&gt;&lt;b&gt;Organizing by space (space, time, and physics)&lt;/b&gt;&lt;/div&gt;&lt;div class="MsoNormal"&gt;There are two faculties of understanding which it is unlikely we have acquired from experience.&amp;nbsp;The first is that of understanding objects as extended bodies in a 3D space and thus occupying some volume.&amp;nbsp; I believe it was Kant who argued best against the hardcore British Empiricists, who proclaimed that experience is the sole originator of knowledge.&amp;nbsp; Experiences are the pen strokes which fill the Empiricist’s tabula rasa.&amp;nbsp; Kant argued (against Hume) that the concept of a spatially extended object is not acquired from experience – the very notion of experience requires that we already possess the notion of an object in order to have a meaningful percept.&amp;nbsp; It is as if the Empiricists failed to acknowledge that to make strokes on a sheet of paper, we need to already have a pen.&amp;nbsp; &lt;i&gt;Kant’s intuitions are the pens of experience&lt;/i&gt;.&amp;nbsp;The requirement of having suitable intuitions for grouping percepts into experiences is what Kant described as a form of transcendental idealism.&amp;nbsp; “Objectness” is a faculty of human understanding, not something acquired from experience. 
&amp;nbsp;If you are a vision researcher, being aware of this can have drastic implications for your research programme.&lt;/div&gt;&lt;div class="MsoNormal"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class="MsoNormal"&gt;It has also been argued that there are some primitive notions of object dynamics, aka folk-physics, which are possessed by very young children.&amp;nbsp; Given the uniformity of human experience (at least I have no ostensible reason to believe that my colleagues’ experiences differ significantly from my own), and the diversity in our individual upbringing, it is also unlikely that folk-physics is learned from experience. &amp;nbsp;However, I don't want to make any strong claims regarding folk-physics. &amp;nbsp;I feel safe to say that Quantum Mechanics is another story -- it requires years of mathematics and thousands of hours of deliberate problem solving to grasp.&lt;/div&gt;&lt;div class="MsoNormal"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class="MsoNormal"&gt;&lt;b&gt;Organizing by mind (psychology, mind, and intent)&lt;/b&gt;&lt;/div&gt;&lt;div class="MsoNormal"&gt;The second faculty of understanding, which can be found in many aspects of human intelligence, is that of understanding the world in terms of cognitive agents.&amp;nbsp; Humans have an amazing capability when it comes to attributing minds to things.&amp;nbsp; This way of thinking about the world is so common and uniform among children all over the world that the differences in their upbringing cannot be reconciled with the uniformity of their capability to project humanness onto objects.&amp;nbsp; Consider the following video (thanks to J. 
Tenenbaum's videos/lectures for pointing this out).&lt;/div&gt;&lt;div class="MsoNormal"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div style="text-align: center;"&gt;&lt;object class="BLOGGER-youtube-video" classid="clsid:D27CDB6E-AE6D-11cf-96B8-444553540000" codebase="http://download.macromedia.com/pub/shockwave/cabs/flash/swflash.cab#version=6,0,40,0" data-thumbnail-src="http://3.gvt0.com/vi/sZBKer6PMtM/0.jpg" height="266" width="320"&gt;&lt;param name="movie" value="http://www.youtube.com/v/sZBKer6PMtM&amp;fs=1&amp;source=uds" /&gt;&lt;param name="bgcolor" value="#FFFFFF" /&gt;&lt;embed width="320" height="266"  src="http://www.youtube.com/v/sZBKer6PMtM&amp;fs=1&amp;source=uds" type="application/x-shockwave-flash"&gt;&lt;/embed&gt;&lt;/object&gt;&lt;/div&gt;&lt;div class="MsoNormal"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class="MsoNormal"&gt;We cannot just view this video as triangles, dots, and lines. &amp;nbsp;Each one of us understands the story in terms of a narrative based on agents and their intent. 
&amp;nbsp;We are stimulated by the external world, we take as input sense-data, and the brain helps us make sense of it -- it turns the hodgepodge of data into experience.&amp;nbsp; But the brain is a mold: it conforms percepts to some shape defined by the mold.&amp;nbsp; These molds are the faculties of understanding which let us understand things; it is like &lt;b&gt;the faculties of understanding are basis vectors onto which we project all input sense data&lt;/b&gt;.&amp;nbsp; The data is weak and noisy, the priors are strong, and understanding is the result of their union.&amp;nbsp;An experience without a proper basis is blind; it is just a ball of percepts.&amp;nbsp; These faculties allow us to have experience.&amp;nbsp; The experiences, coupled with memory, allow us to obtain understanding – where understanding is the relationship between a given experience and past experiences, either in the form of direct associations between currently-experienced-objects and previously-experienced-objects, or rules abstracted away from previously-experienced-objects being directly applied to the current sense data.&lt;/div&gt;&lt;div class="MsoNormal"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class="MsoNormal"&gt;What I am talking about is what &lt;a href="http://ase.tufts.edu/cogstud/incbios/dennettd/dennettd.htm"&gt;philosopher Daniel Dennett&lt;/a&gt; refers to as the “&lt;a href="http://en.wikipedia.org/wiki/Intentional_stance"&gt;intentional stance&lt;/a&gt;.”&amp;nbsp;Given my background in AI and philosophy of mind, it is very likely that Dennett and I have had the same influences.&amp;nbsp; I like to juxtapose my ideas with those of the classical philosophers such as Descartes, Locke, Kant, Wittgenstein and Pinker -- I’m not sure how Dennett motivates his philosophy nor do I know against whose ideas he juxtaposes his own stance. 
&lt;br /&gt;&lt;br /&gt;&lt;/div&gt;&lt;hr /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://videolectures.net/nips2010_tenenbaum_hgm/" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="211" src="http://1.bp.blogspot.com/-Rt0JNnPqae8/ToNlawl7dfI/AAAAAAAAKBE/4Al23GsDKio/s320/jt.png" width="320" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;div class="MsoNormal"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class="MsoNormal"&gt;At MIT, J. Tenenbaum is pushing these ideas to the next level. &amp;nbsp;I only wish there were more perception in his work -- toy worlds just don't do it for me. &amp;nbsp;I want to build intelligent machines, and really cannot afford to sidestep the issue of perception. &amp;nbsp;Here is&amp;nbsp;&lt;a href="http://videolectures.net/nips2010_tenenbaum_hgm/"&gt;a great talk by Josh Tenenbaum on reverse engineering the mind from NIPS 2010&lt;/a&gt;. The video is on videolectures.net; just click the link.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;div class="MsoNormal"&gt;&lt;b&gt;Implications for Artificial Intelligence and Machine Vision&lt;/b&gt;&lt;/div&gt;&lt;div class="MsoNormal"&gt;Following Josh Tenenbaum, I think that a criticism of classical machine learning is long overdue.&amp;nbsp; Machine Learning, as a field, has been spewing out hardcore empiricists.&amp;nbsp; “Let me download&amp;nbsp;&lt;b&gt;your&lt;/b&gt;&amp;nbsp;features, my machine learning algorithm will take care of the rest,” they say.&amp;nbsp; It is like the glory is in the mathematics, which manipulates N-D vectors.&amp;nbsp; But I argue that intelligence isn’t “in the calculus,” it is what the primitives in the calculus actually represent.&amp;nbsp; As an undergraduate I proclaimed, “I am not a mathematician, I am a physicist.&amp;nbsp; I care about the structure of the world, not the structure of proofs. 
”&amp;nbsp; As a graduate student I proclaimed, “The glory isn’t in the manipulation of vectors, the glory is in understanding the what/why of encoding information about the world into vectors.&amp;nbsp; I am a computer vision researcher, not a machine learning researcher.” &amp;nbsp;That is why the view of the world as coming from K different classes is wrong – this is merely a convenient view if the statistician’s toolbox is at your disposal.&amp;nbsp; It is all about structuring the input to match a researcher’s high-level intuitions about the world.&amp;nbsp;&lt;/div&gt;&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/15418143-2363371848647449369?l=quantombone.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://quantombone.blogspot.com/feeds/2363371848647449369/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://quantombone.blogspot.com/2011/09/kants-intuitions-intentional-stance-and.html#comment-form' title='7 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/15418143/posts/default/2363371848647449369'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/15418143/posts/default/2363371848647449369'/><link rel='alternate' type='text/html' href='http://quantombone.blogspot.com/2011/09/kants-intuitions-intentional-stance-and.html' title='Kant&apos;s Intuitions, the intentional stance, and reverse-engineering the mind'/><author><name>Tomasz Malisiewicz</name><uri>https://profiles.google.com/107912691630546731185</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='//lh6.googleusercontent.com/-ely56X4eTGQ/AAAAAAAAAAI/AAAAAAAAJ_g/VLmSVgSof5s/s512-c/photo.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' 
url='http://1.bp.blogspot.com/-Rt0JNnPqae8/ToNlawl7dfI/AAAAAAAAKBE/4Al23GsDKio/s72-c/jt.png' height='72' width='72'/><thr:total>7</thr:total></entry><entry><id>tag:blogger.com,1999:blog-15418143.post-3719213786975493873</id><published>2011-09-09T13:50:00.000-05:00</published><updated>2011-09-09T13:53:13.880-05:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='MIT'/><category scheme='http://www.blogger.com/atom/ns#' term='shimon ullman'/><category scheme='http://www.blogger.com/atom/ns#' term='courses'/><category scheme='http://www.blogger.com/atom/ns#' term='IIT-at-MIT'/><category scheme='http://www.blogger.com/atom/ns#' term='vision'/><category scheme='http://www.blogger.com/atom/ns#' term='postdoc'/><category scheme='http://www.blogger.com/atom/ns#' term='torralba'/><category scheme='http://www.blogger.com/atom/ns#' term='intelligence'/><category scheme='http://www.blogger.com/atom/ns#' term='aude oliva'/><category scheme='http://www.blogger.com/atom/ns#' term='poggio'/><title type='text'>My first week at MIT: What is intelligence?</title><content type='html'>In case anybody hasn't heard the news, I am no longer a PhD student at CMU. &amp;nbsp;After I handed in my camera-ready dissertation, it didn't take long for &lt;a href="http://www.cs.cmu.edu/~efros/"&gt;my CMU advisor&lt;/a&gt; to promote me from his 'current students' to 'former students' list on his webpage.&amp;nbsp; Even though I doubt there is anyplace in the world which can rival CMU when it comes to computer vision, &amp;nbsp;I've decided to give MIT a shot. &amp;nbsp;I had wanted to come to MIT for a long time, but 6 years ago I decided to choose CMU's RI over MIT's CSAIL for my computer vision PhD. &amp;nbsp;Life is funny because the paths we take in life aren't dead-ends -- I'm glad I had a second chance to come to MIT. 
&lt;br /&gt;&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://www.csail.mit.edu/sites/default/files/logo.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" src="http://www.csail.mit.edu/sites/default/files/logo.jpg" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://libraries.mit.edu/archives/exhibits/seal/mit-seal_400x400.gif" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="200" src="http://libraries.mit.edu/archives/exhibits/seal/mit-seal_400x400.gif" width="200" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;br /&gt;In case you haven't heard, MIT is a little tech school somewhere in Boston. &amp;nbsp;Lots of undergrads can be caught wearing math T-shirts, and posters like the following can be found on the walls of MIT:&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://4.bp.blogspot.com/-KYs1zGGtPKQ/TmpcTHQl-AI/AAAAAAAAKA8/Mf2L_Qg9SvU/s1600/ft.jpeg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="239" src="http://4.bp.blogspot.com/-KYs1zGGtPKQ/TmpcTHQl-AI/AAAAAAAAKA8/Mf2L_Qg9SvU/s320/ft.jpeg" width="320" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;b&gt;A cool (undergrad-targeted) poster I saw at MIT&lt;/b&gt;&lt;/div&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;As of last week I'm officially a postdoc in CSAIL and I'll be working with &lt;a href="http://web.mit.edu/torralba/www/"&gt;Antonio Torralba&lt;/a&gt; and &lt;a href="http://cvcl.mit.edu/aude.htm"&gt;Aude Oliva&lt;/a&gt;. I've been closely following both Antonio's and Aude's work over the last several years, and getting to work with these giants of vision will surely be a treat. 
&amp;nbsp;In case you don't know &lt;a href="http://en.wikipedia.org/wiki/Postdoctoral_research"&gt;what a postdoc is&lt;/a&gt;, it is a generic term used to describe post-PhD researchers with generally short-term (1-3 year) appointments. &amp;nbsp;People generally use the term Postdoctoral Fellow or Postdoctoral Associate to describe their position in a university. I guess 3 years working on vision as an undergrad and 6 years of working on vision as a grad student just wasn't enough for me...&lt;br /&gt;&lt;br /&gt;&lt;hr /&gt;I've been getting adjusted to my daily commute through scenic Boston, learning about all the cool vision projects in the lab, as well as meeting all the PhD students working with Antonio. Today was the first day of a course which I'm sitting in on, titled "What is intelligence?". &amp;nbsp;When I saw a course offered by two computer vision titans (&lt;a href="http://www.wisdom.weizmann.ac.il/~shimon/"&gt;Shimon Ullman&lt;/a&gt; and &lt;a href="http://mcgovern.mit.edu/principal-investigators/tomaso-poggio"&gt;Tomaso Poggio&lt;/a&gt;), I couldn't resist. &amp;nbsp;Here is the information:&lt;br /&gt;&lt;br /&gt;&lt;div style="text-align: center;"&gt;&lt;b&gt;What is intelligence?&lt;/b&gt;&lt;/div&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://upload.wikimedia.org/wikipedia/commons/1/17/ArtificialFictionBrain.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="180" src="http://upload.wikimedia.org/wikipedia/commons/1/17/ArtificialFictionBrain.png" width="200" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;br /&gt;&lt;div style="text-align: center;"&gt;&lt;a href="http://web.mit.edu/9.s915/www/"&gt;"What is intelligence?" 
course homepage&lt;/a&gt;:&amp;nbsp;&lt;a href="http://web.mit.edu/9.s915/www/"&gt;http://web.mit.edu/9.s915/www/&lt;/a&gt;&lt;/div&gt;&lt;br /&gt;&lt;br /&gt;&lt;table&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td align="right"&gt;&lt;b&gt;Class Times:&lt;/b&gt;&lt;/td&gt;&lt;td&gt;Friday 11:00-2:00 pm&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td align="right"&gt;&lt;b&gt;Units:&lt;/b&gt;&lt;/td&gt;&lt;td&gt;3-0-9&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td align="right"&gt;&lt;b&gt;Location:&lt;/b&gt;&lt;/td&gt;&lt;td&gt;46-5193 (NOTE: we had to choose a bigger room)&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td align="right"&gt;&lt;b&gt;Instructors:&lt;/b&gt;&lt;/td&gt;&lt;td&gt;Shimon Ullman and Tomaso Poggio&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;br /&gt;The class was packed -- we had to relocate to a bigger room. &amp;nbsp;Much of today's lecture was given by &lt;a href="http://web.mit.edu/lrosasco/www/"&gt;Lorenzo Rosasco&lt;/a&gt;. Lorenzo is the Team Leader of &lt;a href="http://cbcl.mit.edu/IIT@MIT/IIT@MIT.html"&gt;IIT@MIT&lt;/a&gt;. Here is a blurb from IIT@MIT's website describing what this 'center' is all about:&lt;br /&gt;&lt;br /&gt;&lt;span class="Apple-style-span"&gt;The IIT@MIT lab was founded from an agreement between the&amp;nbsp;&lt;a href="http://www.mit.edu/" target="_blank"&gt;Massachusetts Institute of Technology&lt;/a&gt;&amp;nbsp;(MIT) and the&amp;nbsp;&lt;a href="http://www.iit.it/en/home.html" target="_blank"&gt;Istituto Italiano di Tecnologia&lt;/a&gt;&amp;nbsp;(IIT). The scientific objective is to develop novel learning and perception technologies – algorithms for learning, especially in the visual perception domain, that are inspired by the neuroscience of sensory systems and are developed within the rapidly growing theory of computational learning. 
The ultimate goal of this research is to design artificial systems that mimic the remarkable ability of the primate brain to learn from experience and to interpret visual scenes.&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;hr /&gt;Another cool class offered this semester at MIT is Antonio Torralba's&amp;nbsp;&lt;a href="http://people.csail.mit.edu/torralba/courses/6.870_2011f/6.870.grounding.html"&gt;Grounding Object Recognition and Scene Understanding&lt;/a&gt;.&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://people.csail.mit.edu/torralba/courses/6.870_2011f/teaser.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="74" src="http://people.csail.mit.edu/torralba/courses/6.870_2011f/teaser.jpg" width="320" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;br /&gt;&lt;br /&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/15418143-3719213786975493873?l=quantombone.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://quantombone.blogspot.com/feeds/3719213786975493873/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://quantombone.blogspot.com/2011/09/my-first-week-at-mit-what-is.html#comment-form' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/15418143/posts/default/3719213786975493873'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/15418143/posts/default/3719213786975493873'/><link rel='alternate' type='text/html' href='http://quantombone.blogspot.com/2011/09/my-first-week-at-mit-what-is.html' title='My first week at MIT: What is intelligence?'/><author><name>Tomasz Malisiewicz</name><uri>https://profiles.google.com/107912691630546731185</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' 
height='32' src='//lh6.googleusercontent.com/-ely56X4eTGQ/AAAAAAAAAAI/AAAAAAAAJ_g/VLmSVgSof5s/s512-c/photo.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://4.bp.blogspot.com/-KYs1zGGtPKQ/TmpcTHQl-AI/AAAAAAAAKA8/Mf2L_Qg9SvU/s72-c/ft.jpeg' height='72' width='72'/><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-15418143.post-4528857719726665397</id><published>2011-08-24T08:39:00.000-05:00</published><updated>2011-08-24T08:39:01.894-05:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='c++'/><category scheme='http://www.blogger.com/atom/ns#' term='jedi'/><category scheme='http://www.blogger.com/atom/ns#' term='internship'/><category scheme='http://www.blogger.com/atom/ns#' term='kernels'/><category scheme='http://www.blogger.com/atom/ns#' term='MATLAB'/><category scheme='http://www.blogger.com/atom/ns#' term='facetime'/><category scheme='http://www.blogger.com/atom/ns#' term='svm'/><category scheme='http://www.blogger.com/atom/ns#' term='machine learning'/><category scheme='http://www.blogger.com/atom/ns#' term='jay yagnik'/><category scheme='http://www.blogger.com/atom/ns#' term='youtube'/><category scheme='http://www.blogger.com/atom/ns#' term='picasa'/><category scheme='http://www.blogger.com/atom/ns#' term='interns'/><category scheme='http://www.blogger.com/atom/ns#' term='visualization'/><category scheme='http://www.blogger.com/atom/ns#' term='photobios'/><category scheme='http://www.blogger.com/atom/ns#' term='visual memex'/><category scheme='http://www.blogger.com/atom/ns#' term='face memex'/><category scheme='http://www.blogger.com/atom/ns#' term='google'/><category scheme='http://www.blogger.com/atom/ns#' term='publishing'/><category scheme='http://www.blogger.com/atom/ns#' term='hackers'/><title type='text'>The vision hacker culture at Google ...</title><content type='html'>I sometimes get frustrated when developing machine learning algorithms in C++. 
&amp;nbsp;And since working in object recognition basically means you have to be a machine learning expert, trying something new and exciting in C++ can be extremely painful. &amp;nbsp;I don't miss the C++-heavy workflow for vision projects at Google. &amp;nbsp;C++ is great for building large-scale systems, but not for pioneering object recognition representations. &amp;nbsp;I like to play with pixels and I like to think of everything as matrices. &amp;nbsp;But programming languages, software engineering philosophies, and other coding issues aren't going to be today's topic. &amp;nbsp;Today I want to talk about &lt;i&gt;the one thing that is more valuable than computers, and that is people&lt;/i&gt;. &amp;nbsp;Not just people, but a community of people, and specifically the culture at Google -- in particular, vision@Google. &lt;br /&gt;&lt;b&gt;&lt;br /&gt;&lt;/b&gt;&lt;br /&gt;&lt;div style="text-align: center;"&gt;&lt;b&gt;I miss being around the hacker culture at Google&lt;/b&gt;. &amp;nbsp;&lt;/div&gt;&lt;div style="text-align: center;"&gt;&lt;br /&gt;&lt;/div&gt;The people at Google aren't just hackers, they are &lt;b&gt;Jedi&lt;/b&gt; when it comes to building great stuff -- and that is why I recommend a Google internship to many of my fellow CMU vision Robograds (fyi, Robograds are CMU Robotics Graduate Students). &amp;nbsp;CMU-ers, like Googlers, like to build stuff. 
&amp;nbsp;However, CMU-ers are typically younger.&lt;br /&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://www.rabittooth.com/800x600StarWarsWallpapers2/LukeSkywalkerANHV2Wallpaper.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="240" src="http://www.rabittooth.com/800x600StarWarsWallpapers2/LukeSkywalkerANHV2Wallpaper.jpg" width="320" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;Image from&amp;nbsp;&lt;a href="http://www.rabittooth.com/"&gt;http://www.rabittooth.com/&lt;/a&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;What is a software engineering Jedi, you might ask? 'Tis one who is not afraid of a million cores, one who is not afraid of building something great. &amp;nbsp;While little boys get hurt by the guns 'n knives of C++, Jedi use their tools like ninjas use their swords.&amp;nbsp; You go into Google as a boy, you come out a man. &amp;nbsp;NOTE: I do not recommend going to Google and just toying around in Matlab for 3 months. &amp;nbsp;Build something great, find a Yoda-esque mentor, or at least strive to be a Jedi. &amp;nbsp;&lt;b&gt;There's plenty of time in grad school for Matlab and writing papers. &amp;nbsp;If you get a chance to go to Google, take the opportunity to go large-scale and learn to MapReduce like the pros. &lt;/b&gt;&lt;br /&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Every day I learn about more and more people I respect in vision and learning going to Google, or at least interning there (e.g.,&amp;nbsp;&lt;a href="http://www.cs.ubc.ca/~andrejk/"&gt;Andrej Karpathy&lt;/a&gt;, who is starting his PhD@Stanford, and&amp;nbsp;&lt;a href="http://www.cs.cmu.edu/~santosh/"&gt;Santosh Divvala&lt;/a&gt;, who is a well-known CMU PhD student and vision hacker). &amp;nbsp;And I really can't blame them for choosing Google over places like Microsoft for the summer. 
&amp;nbsp;I can't think of many better places to be -- the culture is inimitable. &amp;nbsp;I spent two summers in &lt;a href="http://research.google.com/pubs/author36197.html"&gt;Jay Yagnik&lt;/a&gt;'s group, and some of the great people I interned with are already full-time Googlers (e.g., &lt;a href="http://research.google.com/pubs/author39634.html"&gt;Luca Bertelli&lt;/a&gt;&amp;nbsp;and&amp;nbsp;&lt;a href="http://research.google.com/pubs/author38262.html"&gt;Mehmet Emre Sargin&lt;/a&gt;). &amp;nbsp;&lt;b&gt;And what is really great about vision@google is that these guys get to publish surprisingly often&lt;/b&gt;! &amp;nbsp;Not just the throw-away-code kind of publishing, but stuff that fits inside large-scale systems -- stuff which is already inside Google products. &amp;nbsp;The technology is often inside the Google product before the paper goes public! &amp;nbsp;Of course it's not easy to publish at a place like Google because there is just way too much exciting large-scale stuff going on. &amp;nbsp;Here is a short list of some cool 2010/2011 vision papers (from vision conferences) with significant Googler contributions.&lt;br /&gt;&lt;br /&gt;&lt;hr /&gt;&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://1.bp.blogspot.com/-hLMkVIGqJNw/TlR0epz2gFI/AAAAAAAAKAY/EMPRNVGdi_g/s1600/bertelli_horses.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="320" src="http://1.bp.blogspot.com/-hLMkVIGqJNw/TlR0epz2gFI/AAAAAAAAKAY/EMPRNVGdi_g/s320/bertelli_horses.png" width="199" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;b&gt;Kernelized Structural SVM Learning&lt;/b&gt;&lt;/div&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: arial, sans-serif; font-size: 13px; line-height: 19px;"&gt;“Kernelized Structural SVM Learning for Supervised Object Segmentation”,&amp;nbsp;&lt;a 
href="http://research.google.com/pubs/author39634.html" style="color: #1155cc; text-decoration: none;"&gt;Luca Bertelli&lt;/a&gt;,&amp;nbsp;&lt;a href="http://research.google.com/pubs/author39635.html" style="color: #1155cc; text-decoration: none;"&gt;Tianli Yu&lt;/a&gt;, Diem Vu, Burak Gokturk,&amp;nbsp;&lt;i&gt;Proceedings of IEEE Conference on Computer Vision and Pattern Recognition 2011&lt;/i&gt;.&lt;br /&gt;[&lt;a href="http://research.google.com/pubs/pub36985.html" style="color: #1155cc; text-decoration: none;"&gt;&lt;strong&gt;abstract&lt;/strong&gt;&lt;/a&gt;] [&lt;a href="http://research.google.com/pubs/archive/36985.pdf" style="color: #1155cc; text-decoration: none;"&gt;&lt;strong&gt;pdf&lt;/strong&gt;&lt;/a&gt;]&lt;/span&gt;&lt;br /&gt;&lt;hr /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://1.bp.blogspot.com/-GfNDQ5dl9vY/TlR0jc4ZRsI/AAAAAAAAKAc/1xCBllPdjuQ/s1600/george_tagprop.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="94" src="http://1.bp.blogspot.com/-GfNDQ5dl9vY/TlR0jc4ZRsI/AAAAAAAAKAc/1xCBllPdjuQ/s320/george_tagprop.png" width="320" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;b&gt;Finding Meaning on YouTube&lt;/b&gt;&lt;/div&gt;&lt;span class="Apple-style-span" style="font-family: arial, sans-serif; font-size: 13px; line-height: 19px;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: arial, sans-serif; font-size: 13px; line-height: 19px;"&gt;“Finding Meaning on YouTube: Tag Recommendation and Category Discovery”,&amp;nbsp;&lt;a href="http://research.google.com/pubs/author38233.html" style="color: #1155cc; text-decoration: none;"&gt;George Toderici&lt;/a&gt;,&lt;a href="http://research.google.com/pubs/author37818.html" style="color: #1155cc; text-decoration: none;"&gt;Hrishikesh Aradhye&lt;/a&gt;,&amp;nbsp;&lt;a 
href="http://research.google.com/pubs/author107.html" style="color: #1155cc; text-decoration: none;"&gt;Marius Pasca&lt;/a&gt;, Luciano Sbaiz,&amp;nbsp;&lt;a href="http://research.google.com/pubs/author36197.html" style="color: #1155cc; text-decoration: none;"&gt;Jay Yagnik&lt;/a&gt;,&amp;nbsp;&lt;i&gt;Computer Vision and Pattern Recognition&lt;/i&gt;, 2010.&lt;br /&gt;[&lt;a href="http://research.google.com/pubs/pub35651.html" style="color: #1155cc; text-decoration: none;"&gt;&lt;strong&gt;abstract&lt;/strong&gt;&lt;/a&gt;] [&lt;a href="http://research.google.com/pubs/archive/35651.pdf" style="color: #1155cc; text-decoration: none;"&gt;&lt;strong&gt;pdf&lt;/strong&gt;&lt;/a&gt;]&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: arial, sans-serif; font-size: 13px; line-height: 19px;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;br /&gt;&lt;hr /&gt;&lt;span class="Apple-style-span" style="font-family: arial, sans-serif; font-size: 13px; line-height: 19px;"&gt;Here is a very exciting and new paper from SIGGRAPH 2011. &amp;nbsp;It is a sort of Visual Memex for faces -- congratulations on this paper, guys! 
&amp;nbsp;Check out the video below.&lt;/span&gt;&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://grail.cs.washington.edu/photobios/" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="70" src="http://grail.cs.washington.edu/photobios/teaser.png" width="400" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;b&gt;&lt;span class="Apple-style-span" style="font-size: x-large;"&gt;Exploring Photobios Movie&lt;/span&gt;&lt;/b&gt;&lt;/div&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;center&gt;&lt;iframe frameborder="0" height="225" src="http://player.vimeo.com/video/23561002?title=0&amp;amp;byline=0&amp;amp;portrait=0" width="400"&gt;&lt;/iframe&gt;&lt;br /&gt;&lt;a href="http://vimeo.com/23561002"&gt;Exploring Photobios&lt;/a&gt; from &lt;a href="http://vimeo.com/kemelmi"&gt;Ira Kemelmacher&lt;/a&gt; on &lt;a href="http://vimeo.com/"&gt;Vimeo&lt;/a&gt;&lt;/center&gt;&lt;span class="Apple-style-span" style="font-family: arial, sans-serif; font-size: 13px; line-height: 19px;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: arial, sans-serif; font-size: 13px; line-height: 19px;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: arial, sans-serif; font-size: 13px; line-height: 19px;"&gt;&lt;span class="Apple-style-span" style="font-family: arial, sans-serif;"&gt;&lt;a href="http://www.cs.washington.edu/homes/kemelmi"&gt;Ira Kemelmacher-Shlizerman&lt;/a&gt;, &lt;a href="http://www.adobe.com/technology/people/seattle/shechtman.html"&gt;Eli Shechtman&lt;/a&gt;, &lt;a href="http://www.cs.washington.edu/homes/rahul"&gt;Rahul Garg&lt;/a&gt;, &lt;a href="http://www.cs.washington.edu/homes/seitz"&gt;Steven M. Seitz&lt;/a&gt;. 
"Exploring Photobios." ACM Transactions on Graphics (SIGGRAPH), Aug 2011. &lt;a href="http://grail.cs.washington.edu/photobios/paper.pdf"&gt;[pdf]&lt;/a&gt;&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: arial, sans-serif; font-size: x-small;"&gt;&lt;span class="Apple-style-span" style="line-height: 19px;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;&lt;hr /&gt;&lt;span class="Apple-style-span" style="font-family: arial, sans-serif; font-size: 13px; line-height: 19px;"&gt;Finally, here is a very mathematical paper with a sexy title from the vision@google team. &amp;nbsp;It will be presented at the upcoming &lt;a href="http://www.iccv2011.org/"&gt;ICCV 2011 Conference&lt;/a&gt;&amp;nbsp;in Barcelona -- the same conference where I'll be presenting my&amp;nbsp;&lt;a href="http://www.cs.cmu.edu/~tmalisie/projects/iccv11/"&gt;Exemplar-SVM paper&lt;/a&gt;. &amp;nbsp;&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: arial, sans-serif; font-size: x-small;"&gt;&lt;span class="Apple-style-span" style="line-height: 19px;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: arial, sans-serif; line-height: 19px;"&gt;&lt;span class="Apple-style-span" style="font-family: sans-serif, Arial, Helvetica; font-size: small; line-height: normal;"&gt;&lt;b style="font-family: sans-serif, Arial, Helvetica;"&gt;The Power of Comparative Reasoning&lt;/b&gt;&lt;br /&gt;&lt;a href="http://research.google.com/pubs/author36197.html" style="background-attachment: initial; background-clip: initial; background-color: transparent; background-image: initial; background-origin: initial; color: #3333cc;"&gt;Jay Yagnik&lt;/a&gt;, Dennis Strelow, David Ross, Ruei-Sung Lin. 
ICCV 2011.&amp;nbsp;&lt;a href="http://www.cs.toronto.edu/~dross/YagnikStrelowRossLin_ICCV2011.pdf" style="background-attachment: initial; background-clip: initial; background-color: transparent; background-image: initial; background-origin: initial; color: #3333cc;"&gt;[PDF]&lt;/a&gt;&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-family: arial, sans-serif;"&gt;&lt;span class="Apple-style-span" style="line-height: 19px;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-family: arial, sans-serif; font-size: 13px; line-height: 19px;"&gt;P.S. If you're a fellow vision blogger, then come find me in Barcelona@iccv2011 -- we'll go grab a beer.&lt;/span&gt;&lt;/div&gt;&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/15418143-4528857719726665397?l=quantombone.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://quantombone.blogspot.com/feeds/4528857719726665397/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://quantombone.blogspot.com/2011/08/vision-hacker-culture-at-google.html#comment-form' title='5 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/15418143/posts/default/4528857719726665397'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/15418143/posts/default/4528857719726665397'/><link rel='alternate' type='text/html' href='http://quantombone.blogspot.com/2011/08/vision-hacker-culture-at-google.html' title='The vision hacker culture at Google ...'/><author><name>Tomasz Malisiewicz</name><uri>https://profiles.google.com/107912691630546731185</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' 
src='//lh6.googleusercontent.com/-ely56X4eTGQ/AAAAAAAAAAI/AAAAAAAAJ_g/VLmSVgSof5s/s512-c/photo.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://1.bp.blogspot.com/-hLMkVIGqJNw/TlR0epz2gFI/AAAAAAAAKAY/EMPRNVGdi_g/s72-c/bertelli_horses.png' height='72' width='72'/><thr:total>5</thr:total></entry><entry><id>tag:blogger.com,1999:blog-15418143.post-7931512409012353534</id><published>2011-08-16T16:16:00.000-05:00</published><updated>2011-08-16T16:16:37.462-05:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='dalal triggs'/><category scheme='http://www.blogger.com/atom/ns#' term='c++'/><category scheme='http://www.blogger.com/atom/ns#' term='MATLAB'/><category scheme='http://www.blogger.com/atom/ns#' term='github'/><category scheme='http://www.blogger.com/atom/ns#' term='software engineering'/><category scheme='http://www.blogger.com/atom/ns#' term='python'/><category scheme='http://www.blogger.com/atom/ns#' term='object recognition'/><category scheme='http://www.blogger.com/atom/ns#' term='code'/><category scheme='http://www.blogger.com/atom/ns#' term='felzenszwalb'/><title type='text'>Question: What makes an object recognition system great?</title><content type='html'>Today, instead of discussing my own perspectives on object recognition or sharing some useful links, I would like to ask a general question geared towards anybody working in the field of computer vision:&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div style="text-align: center;"&gt;&lt;b&gt;What makes an object recognition system great?&lt;/b&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;In particular, I would like to hear a broad range of perspectives regarding &lt;i&gt;what is necessary to provide an impact-creating open-source object recognition system&lt;/i&gt; for the research community to use&lt;i&gt;.&lt;/i&gt; &amp;nbsp;As a graduate student you might be interested in building your own recognition system, as a researcher you might be 
interested in extending or comparing against a current system, and as an educator you might want to direct your students to a fully functional object recognition system which could be used to bootstrap their research.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://2.bp.blogspot.com/-Rb_L4jzn8T8/Tkrdpyy3rBI/AAAAAAAAKAU/e2bGp7ysi0o/s1600/orhog.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="107" src="http://2.bp.blogspot.com/-Rb_L4jzn8T8/Tkrdpyy3rBI/AAAAAAAAKAU/e2bGp7ysi0o/s400/orhog.png" width="400" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;To start the discussion I would like to first enumerate a few elements which I find important in making an object recognition system great.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;b&gt;Open Source&lt;/b&gt;&lt;/div&gt;&lt;div&gt;In order for object recognition to progress, I think releasing binary executables is simply not enough. &amp;nbsp;Allowing others to see your source code means that you gain more scientific credibility and you let others extend your system -- this means letting others &lt;i&gt;both train and test&lt;/i&gt; variants of your system. More people using an object recognition system also translates to a higher citation count, which is favorable for researchers seeking career advancement. &amp;nbsp;Felzenszwalb et al. have released multiple open-source versions of their &lt;a href="http://people.cs.uchicago.edu/~pff/latent/"&gt;Discriminatively Trained Deformable Part Model&lt;/a&gt; -- each time we see a new release it gets better! &amp;nbsp;Such continual development means that we know the authors really care about this problem. 
&amp;nbsp;I feel &lt;a href="https://github.com/"&gt;Github&lt;/a&gt;, with its distributed version control and social-coding features, is a powerful tool the community should adopt, something which I believe is very much needed to take the community's ideas to the next level. &amp;nbsp;In my own research (e.g., the &lt;a href="http://www.cs.cmu.edu/~tmalisie/projects/iccv11/"&gt;Ensemble of Exemplar-SVMs&lt;/a&gt; approach), I have started using Github (for both private and public development) and I love it. &lt;i&gt;Linux might have been started by a single individual, but it took a community to make it great&lt;/i&gt;. &amp;nbsp;Just look at where Linux is now.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;b&gt;Ease of use&lt;/b&gt;&lt;/div&gt;&lt;div&gt;For ease of use, it is important that the system is implemented in a popular language which is known by a large fraction of the vision community. &amp;nbsp;Matlab, Python, C++, and Java are such popular languages, and many good implementations are a combination of Matlab with some highly optimized routines in C++. &amp;nbsp;Good documentation is also important since one cannot expect only experts to be using such a system.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;b&gt;Strong research results&lt;/b&gt;&lt;/div&gt;&lt;div&gt;The YaRS approach, which is the "yet-another-recognition-system" approach, doesn't translate to high usage unless the system actually performs well on a well-accepted object recognition task. &amp;nbsp;Every year at vision conferences, many new recognition frameworks are introduced, but really only a few of them ever pass the test of time. 
&amp;nbsp;Usually an idea withstands time because it is a conceptual contribution to science, but systems such as the &lt;a href="http://pascal.inrialpes.fr/soft/olt/"&gt;HOG-based pedestrian detector of Dalal-Triggs&lt;/a&gt; and the &lt;a href="http://people.cs.uchicago.edu/~pff/latent/"&gt;Latent Deformable Part Model of Felzenszwalb et al.&lt;/a&gt; are actually being used by many other researchers. &amp;nbsp;The ideas in these works are not only good, but the recognition systems are great.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;b&gt;Question:&lt;/b&gt;&lt;/div&gt;&lt;div&gt;So what would &lt;i&gt;you&lt;/i&gt; like to see in the next generation of object recognition systems? &amp;nbsp;I will try my best to reply to any comments posted below. &amp;nbsp;Any really great comment might even trigger a significant discussion; enough to warrant its own blog post. &amp;nbsp;Anybody is welcome to comment/argue/speculate below, either using their real name or anonymously.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/15418143-7931512409012353534?l=quantombone.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://quantombone.blogspot.com/feeds/7931512409012353534/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://quantombone.blogspot.com/2011/08/question-what-makes-object-recognition.html#comment-form' title='9 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/15418143/posts/default/7931512409012353534'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/15418143/posts/default/7931512409012353534'/><link rel='alternate' type='text/html' 
href='http://quantombone.blogspot.com/2011/08/question-what-makes-object-recognition.html' title='Question: What makes an object recognition system great?'/><author><name>Tomasz Malisiewicz</name><uri>https://profiles.google.com/107912691630546731185</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='//lh6.googleusercontent.com/-ely56X4eTGQ/AAAAAAAAAAI/AAAAAAAAJ_g/VLmSVgSof5s/s512-c/photo.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://2.bp.blogspot.com/-Rb_L4jzn8T8/Tkrdpyy3rBI/AAAAAAAAKAU/e2bGp7ysi0o/s72-c/orhog.png' height='72' width='72'/><thr:total>9</thr:total></entry><entry><id>tag:blogger.com,1999:blog-15418143.post-6779025357358138917</id><published>2011-08-15T22:38:00.001-05:00</published><updated>2011-08-15T22:38:40.878-05:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='students'/><category scheme='http://www.blogger.com/atom/ns#' term='progress'/><category scheme='http://www.blogger.com/atom/ns#' term='CMU'/><category scheme='http://www.blogger.com/atom/ns#' term='black friday'/><category scheme='http://www.blogger.com/atom/ns#' term='phd'/><category scheme='http://www.blogger.com/atom/ns#' term='graduate student life'/><category scheme='http://www.blogger.com/atom/ns#' term='robotics'/><category scheme='http://www.blogger.com/atom/ns#' term='faculty'/><title type='text'>CMU's Black Fridays: a graduating PhD student's perspective</title><content type='html'>I've always been amazed that the CMU department &lt;b&gt;really&lt;/b&gt; knows what its PhD students are up to -- the big stuff as well as the little stuff. 
&lt;br /&gt;&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://2.bp.blogspot.com/-FX7Cc61_-wI/Tkni8QNpR9I/AAAAAAAAKAQ/I-hABBs6mJ4/s1600/CarnegieMellon_logo-full.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" src="http://2.bp.blogspot.com/-FX7Cc61_-wI/Tkni8QNpR9I/AAAAAAAAKAQ/I-hABBs6mJ4/s1600/CarnegieMellon_logo-full.jpg" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;br /&gt;Allow me to elaborate. &amp;nbsp;As a PhD student at CMU, you receive a sort of "report card" at the end of every semester on what is known as "Black Friday." You first submit a short summary of your accomplishments and goals for next semester. &amp;nbsp;Then, on this special day, the professors talk to each other about their students (probably in some secret sound-proof discussion room). &amp;nbsp;While faculty are discussing our fates, we, the students, do the opposite. &amp;nbsp;We relax, watch movies, play games, and *imagine* what our superiors are discussing. &amp;nbsp;Black Friday letters serve two roles, a role for the faculty, and a role for the students.&lt;br /&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://diypapers.com/" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="200" src="http://diypapers.com/wedding%20envelopes%20black%20foil%20lined.jpg" width="200" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;Image courtesy of&amp;nbsp;&lt;a href="http://diypapers.com/"&gt;diypapers&lt;/a&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;For the faculty, Black Friday is a way for the department to evaluate and monitor the progress of their PhD students. &amp;nbsp;Faculty members get a chance to discuss their students' hardships as well as their successes. 
&amp;nbsp;For the students, these letters are a way to keep us in *check*. &lt;br /&gt;&lt;br /&gt;&lt;div style="text-align: center;"&gt;"You better check yo self before you wreck yo self" - &lt;a href="http://en.wikipedia.org/wiki/Check_Yo_Self"&gt;Ice Cube&lt;/a&gt;&lt;/div&gt;&lt;br /&gt;The Black Friday letter lets us know explicitly what research qualifiers, writing qualifiers, etc. they expect us to complete next semester. &amp;nbsp;They let us know if they are happy with our progress or unhappy.&lt;br /&gt;&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://wicwoes.com/wp-content/uploads/2011/02/yipee-e1298137200851.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="182" src="http://wicwoes.com/wp-content/uploads/2011/02/yipee-e1298137200851.jpg" width="320" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;span class="Apple-style-span" style="background-color: #b5e6ff; color: #55554e; font-family: Georgia, Arial, Helvetica, sans-serif; font-size: 14px; line-height: 19px;"&gt;Photo by&amp;nbsp;&lt;a href="http://www.sxc.hu/profile/Mattox" rel="nofollow" style="background-attachment: initial; background-clip: initial; background-color: transparent; background-image: initial; background-origin: initial; background-position: initial initial; background-repeat: initial initial; border-bottom-width: 0px; border-color: initial; border-left-width: 0px; border-right-width: 0px; border-style: initial; border-top-width: 0px; color: #889800; font-size: 14px; margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px; outline-color: initial; outline-style: initial; outline-width: 0px; padding-bottom: 0px; padding-left: 0px; padding-right: 0px; padding-top: 0px; text-decoration: none; vertical-align: baseline;"&gt;Mattox&lt;/a&gt;&lt;/span&gt;&lt;/div&gt;&lt;br /&gt;In practice, I found the letters to be a 
combination of "the good" and "the bad." &amp;nbsp;The good (a statement such as "we are happy that you got your paper accepted to ___") is like getting A's in grade school -- a yipee! moment. &amp;nbsp;The bad (a statement such as "we have noticed that you are struggling taking your experimental research to the next level, and feel that you are spending too much time on ___"), is what pushes us to experience those yipee! moments the following semester. &amp;nbsp;In the past, my Black Friday letters have included details regarding elements of my research and coursework that I didn't know any faculty member even knew about. &amp;nbsp;But the faculty care! &amp;nbsp;The students are the future, and there is nothing like critical feedback to help us achieve our goals. &amp;nbsp;There have been several times during my 6 years at CMU's Robotics Institute when my letter helped keep me in check.&lt;br /&gt;&lt;br /&gt;I only wish there was a way for students to provide Black Friday letters to their faculty mentors...&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Further reading:&lt;/b&gt;&lt;br /&gt;&lt;a href="http://www.csdhead.cs.cmu.edu/blog/2007/12/14/black-friday/"&gt;Black Friday&lt;/a&gt; from Peter Lee&lt;br /&gt;&lt;a href="http://matt-welsh.blogspot.com/2011/07/how-do-you-evaluate-your-grad-students.html"&gt;How do you evaluate your grad students?&lt;/a&gt; by Matt Welsh&lt;br /&gt;&lt;br /&gt;&lt;/div&gt;&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/15418143-6779025357358138917?l=quantombone.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://quantombone.blogspot.com/feeds/6779025357358138917/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://quantombone.blogspot.com/2011/08/cmus-black-fridays-graduating-phd.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' 
href='http://www.blogger.com/feeds/15418143/posts/default/6779025357358138917'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/15418143/posts/default/6779025357358138917'/><link rel='alternate' type='text/html' href='http://quantombone.blogspot.com/2011/08/cmus-black-fridays-graduating-phd.html' title='CMU&apos;s Black Fridays: a graduating PhD student&apos;s perspective'/><author><name>Tomasz Malisiewicz</name><uri>https://profiles.google.com/107912691630546731185</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='//lh6.googleusercontent.com/-ely56X4eTGQ/AAAAAAAAAAI/AAAAAAAAJ_g/VLmSVgSof5s/s512-c/photo.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://2.bp.blogspot.com/-FX7Cc61_-wI/Tkni8QNpR9I/AAAAAAAAKAQ/I-hABBs6mJ4/s72-c/CarnegieMellon_logo-full.jpg' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-15418143.post-7910222395546231188</id><published>2011-08-13T19:46:00.000-05:00</published><updated>2011-08-13T19:46:19.191-05:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='exemplars'/><category scheme='http://www.blogger.com/atom/ns#' term='exemplar-svm'/><category scheme='http://www.blogger.com/atom/ns#' term='nms'/><category scheme='http://www.blogger.com/atom/ns#' term='MATLAB'/><category scheme='http://www.blogger.com/atom/ns#' term='non-maximum suppression'/><category scheme='http://www.blogger.com/atom/ns#' term='code'/><category scheme='http://www.blogger.com/atom/ns#' term='object detection'/><category scheme='http://www.blogger.com/atom/ns#' term='pedro'/><category scheme='http://www.blogger.com/atom/ns#' term='post-processing'/><category scheme='http://www.blogger.com/atom/ns#' term='felzenszwalb'/><title type='text'>blazing fast nms.m (from exemplar-svm library)</title><content type='html'>If you care about building large-scale object recognition 
systems, you have to care about speed. &amp;nbsp;And every little bit of performance counts -- so why not first optimize the stuff which is slowing you down?&lt;br /&gt;&lt;br /&gt;NMS (non-maximum suppression) is a very popular post-processing method for eliminating redundant object detection windows. &amp;nbsp;I have taken &lt;a href="http://people.cs.uchicago.edu/~pff/latent/"&gt;Felzenszwalb et al.'s nms.m&lt;/a&gt; and made it significantly faster by eliminating an inner loop. &amp;nbsp;6 years of grad school, 6 years of building large-scale vision systems in Matlab, and you really learn how to &lt;b&gt;vectorize&lt;/b&gt; code. &amp;nbsp;The code I call millions of times needs to be fast, and nms is one of those routines I call all the time.&lt;br /&gt;&lt;br /&gt;The code is found below as a Github gist -- which was taken from my &lt;a href="https://github.com/quantombone/exemplarsvm"&gt;Exemplar-SVM object recognition library&lt;/a&gt; (from my ICCV2011 paper: &lt;a href="http://www.cs.cmu.edu/~tmalisie/projects/iccv11/index.html"&gt;Ensemble of Exemplar-SVMs for Object Detection and Beyond&lt;/a&gt;). &amp;nbsp;The same file, &lt;a href="https://github.com/quantombone/exemplarsvm/blob/master/iccv11/nms.m"&gt;nms.m&lt;/a&gt;, can also be found as part of the Exemplar-SVM library on Github. &amp;nbsp;In fact, this code produces the same result as Pedro's code, but is much faster. &amp;nbsp;Here is one timing experiment I performed when performing nms on ~300K windows, where my version is roughly 100 times faster. 
&amp;nbsp;When you deal with exemplar-SVMs you have to deal with lots of detectors (i.e., lots of detection windows), so &lt;b&gt;fast NMS is money&lt;/b&gt;.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&amp;gt;&amp;gt; tic;top = nms_original(bbs,.5);toc&lt;br /&gt;Elapsed time is&amp;nbsp;&lt;b&gt;58.172313&lt;/b&gt;&amp;nbsp;seconds.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&amp;gt;&amp;gt; tic;top = nms_fast(bbs,.5);toc&lt;br /&gt;Elapsed time is &lt;b&gt;0.532638&lt;/b&gt; seconds.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: arial, sans-serif; font-size: x-small;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;br /&gt;&lt;script src="https://gist.github.com/1144423.js?file=nms_fast.m"&gt;&lt;/script&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/15418143-7910222395546231188?l=quantombone.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://quantombone.blogspot.com/feeds/7910222395546231188/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://quantombone.blogspot.com/2011/08/blazing-fast-nmsm-from-exemplar-svm.html#comment-form' title='5 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/15418143/posts/default/7910222395546231188'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/15418143/posts/default/7910222395546231188'/><link rel='alternate' type='text/html' href='http://quantombone.blogspot.com/2011/08/blazing-fast-nmsm-from-exemplar-svm.html' title='blazing fast nms.m (from exemplar-svm library)'/><author><name>Tomasz Malisiewicz</name><uri>https://profiles.google.com/107912691630546731185</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' 
src='//lh6.googleusercontent.com/-ely56X4eTGQ/AAAAAAAAAAI/AAAAAAAAJ_g/VLmSVgSof5s/s512-c/photo.jpg'/></author><thr:total>5</thr:total></entry><entry><id>tag:blogger.com,1999:blog-15418143.post-4510981646368890383</id><published>2011-08-12T23:51:00.000-05:00</published><updated>2011-08-12T23:51:31.317-05:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='segmentation'/><category scheme='http://www.blogger.com/atom/ns#' term='exemplars'/><category scheme='http://www.blogger.com/atom/ns#' term='geometry transfer'/><category scheme='http://www.blogger.com/atom/ns#' term='exemplarsvm'/><category scheme='http://www.blogger.com/atom/ns#' term='exemplar-svm'/><category scheme='http://www.blogger.com/atom/ns#' term='discriminative'/><category scheme='http://www.blogger.com/atom/ns#' term='parametric'/><category scheme='http://www.blogger.com/atom/ns#' term='object interpretation'/><category scheme='http://www.blogger.com/atom/ns#' term='iccv'/><category scheme='http://www.blogger.com/atom/ns#' term='3d model'/><category scheme='http://www.blogger.com/atom/ns#' term='object detection'/><category scheme='http://www.blogger.com/atom/ns#' term='meta-data transfer'/><title type='text'>Ensemble of Exemplar-SVMs for Object Detection and Beyond</title><content type='html'>Over the next couple of days I will be announcing some very exciting news. &amp;nbsp;As many of you know, &lt;b&gt;I defended my PhD&lt;/b&gt; this past Monday at CMU. &amp;nbsp;My family and friends came for the presentation as I defended 6 years of my life in front of Alyosha Efros, Martial Hebert, Takeo Kanade, and Pietro Perona. 
&amp;nbsp;You might be wondering what I've been up to this past year -- what sort of new vision research have I produced since the &lt;a href="http://www.cs.cmu.edu/~tmalisie/projects/nips09/"&gt;Visual Memex&lt;/a&gt; paper.&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Throughout the last year or so I have slowly abandoned the &lt;i&gt;segment-then-recognize&lt;/i&gt; approach and fully embraced the exemplar-based component of my research. &amp;nbsp;&lt;b&gt;Because once you go exemplar, you don't go back!&lt;/b&gt;&amp;nbsp; If only &lt;a href="http://www.cogs.indiana.edu/nosofsky/"&gt;Nosofsky&lt;/a&gt; were here, he would be proud. &amp;nbsp;Once you have established a good exemplar-detection alignment, problems such as segmentation become trivial. &amp;nbsp;In fact, exemplar association enables a host of meta-data transfer applications. &amp;nbsp;Here is a quick overview of my recent ICCV 2011 paper with &lt;a href="http://www.cs.cmu.edu/~efros/"&gt;Alexei Efros&lt;/a&gt; and &lt;a href="http://www.cs.cmu.edu/~abhinavg/"&gt;Abhinav Gupta&lt;/a&gt; (the super new and exciting professor at CMU who will likely revolutionize the way we, vision researchers, think about the interplay of geometric reasoning and object recognition). &amp;nbsp;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;I will be defending my work to the ICCV crowd this fall in Barcelona. &amp;nbsp;Here is the paper.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;b&gt;Paper:&lt;/b&gt;&lt;/div&gt;&lt;div&gt;Tomasz Malisiewicz, Abhinav Gupta, Alexei A. Efros. Ensemble of Exemplar-SVMs for Object Detection and Beyond. In ICCV, 2011. 
[&lt;a href="http://www.cs.cmu.edu/~tmalisie/projects/iccv11/exemplarsvm-iccv11.pdf"&gt;PDF&lt;/a&gt;] [&lt;a href="http://www.cs.cmu.edu/~tmalisie/projects/iccv11/"&gt;Project Page&lt;/a&gt;]&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;b&gt;Abstract:&lt;/b&gt;&lt;/div&gt;&lt;div&gt;&lt;div&gt;This paper proposes a conceptually simple but surprisingly powerful method which combines the effectiveness of a discriminative object detector with the explicit correspondence offered by a nearest-neighbor approach.&amp;nbsp;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://2.bp.blogspot.com/-rFVW9aAnoew/TkX477bkV2I/AAAAAAAAJ_0/bzHhyu6pDm8/s1600/teaserama4-01.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="320" src="http://2.bp.blogspot.com/-rFVW9aAnoew/TkX477bkV2I/AAAAAAAAJ_0/bzHhyu6pDm8/s320/teaserama4-01.png" width="232" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;b&gt;Exemplar Associations go Beyond Bounding Boxes&lt;/b&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;The method is based on training a separate linear SVM classifier for every exemplar in the training set. 
Each of these Exemplar-SVMs is thus defined by a single positive instance and millions of negatives.&amp;nbsp;&lt;/div&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://2.bp.blogspot.com/-Hv1r3ExFj6Q/TkX5k8uCh_I/AAAAAAAAKAE/mSIkT8EqRmU/s1600/exemplar_classifiers-01.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="83" src="http://2.bp.blogspot.com/-Hv1r3ExFj6Q/TkX5k8uCh_I/AAAAAAAAKAE/mSIkT8EqRmU/s400/exemplar_classifiers-01.png" width="400" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;b&gt;An ensemble of exemplars&lt;/b&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;While each detector is quite specific to its exemplar, we empirically observe that an ensemble of such Exemplar-SVMs offers surprisingly good generalization. Our performance on the PASCAL VOC detection task is on par with the much more complex &lt;a href="http://people.cs.uchicago.edu/~pff/latent/"&gt;latent part-based model of Felzenszwalb et al.&lt;/a&gt;, at only a modest computational cost increase.&amp;nbsp;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://4.bp.blogspot.com/-ZhKmLmLzNiY/TkYBF24LuzI/AAAAAAAAKAM/9Hsc1fivdIU/s1600/esvm_results2-01.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="148" src="http://4.bp.blogspot.com/-ZhKmLmLzNiY/TkYBF24LuzI/AAAAAAAAKAM/9Hsc1fivdIU/s400/esvm_results2-01.png" width="400" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;b&gt;Generalization from a single positive instance&lt;/b&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;But the central benefit of our approach is that it creates an explicit association between each detection and a single training exemplar. 
Because most detections show good alignment to their associated exemplar, it is possible to transfer any available exemplar meta-data (segmentation, geometric structure, 3D model, etc.) directly onto the detections, which can then be used as part of overall scene understanding.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;This paper can be rightfully seen as a marriage of my &lt;a href="http://www.cs.cmu.edu/~tmalisie/projects/cvpr08/"&gt;older work on learning per-exemplar distances&lt;/a&gt; with the &lt;a href="http://people.cs.uchicago.edu/~pff/latent/"&gt;discriminative training method of Felzenszwalb et al&lt;/a&gt;.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Here are some summary pictures from my paper and a short description of each one:&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;b&gt;1.&lt;/b&gt; Going beyond object detection (i.e., produce a category-labeled bounding box), we look at several &lt;b&gt;meta-data transfer&lt;/b&gt; applications. &amp;nbsp;Meta-data transfer is a way of interpreting an object detection in a way which transcends category membership. 
&amp;nbsp;The first task is that of geometry transfer.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://3.bp.blogspot.com/-jwlG60fgvww/TkX5N3NAtiI/AAAAAAAAJ_4/kPi0QkwOr_4/s1600/bus-final-01.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="112" src="http://3.bp.blogspot.com/-jwlG60fgvww/TkX5N3NAtiI/AAAAAAAAJ_4/kPi0QkwOr_4/s400/bus-final-01.png" width="400" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;b&gt;Geometry Transfer&lt;/b&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;b&gt;2.&lt;/b&gt; &lt;b&gt;Segmentation&lt;/b&gt; is a well-known problem in computer vision -- generally tackled with bottom-up approaches which strive to produce coherent regions based on pixel-pixel appearance similarity. &amp;nbsp;We show that a recognize-then-segment approach is possible, and in particular an associate-then-segment approach based on transferring segmentations from exemplars onto detection windows.&amp;nbsp;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://3.bp.blogspot.com/-1XaX98xiLz8/TkX5T4Kza4I/AAAAAAAAJ_8/lJ1y5_GBwXg/s1600/seg-all-onerow-01.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="135" src="http://3.bp.blogspot.com/-1XaX98xiLz8/TkX5T4Kza4I/AAAAAAAAJ_8/lJ1y5_GBwXg/s400/seg-all-onerow-01.png" width="400" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;b&gt;Segmentation Transfer&lt;/b&gt;&lt;/div&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;b&gt;&lt;br /&gt;&lt;/b&gt;&lt;/div&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;b&gt;&lt;br 
/&gt;&lt;/b&gt;&lt;/div&gt;&lt;/div&gt;&lt;div&gt;&lt;b&gt;3. &lt;/b&gt;Object exemplars often show an interplay of objects, suggesting that it is possible to use the recognition of one object to prime the presence of another.&amp;nbsp;&lt;/div&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://2.bp.blogspot.com/-ftIidpN8__Y/TkX-kELPgxI/AAAAAAAAKAI/mOm12FY_WoA/s1600/person_priming-01.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="128" src="http://2.bp.blogspot.com/-ftIidpN8__Y/TkX-kELPgxI/AAAAAAAAKAI/mOm12FY_WoA/s400/person_priming-01.png" width="400" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;b&gt;Related Object Priming&lt;/b&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;P.S. &lt;a href="http://www.cs.cmu.edu/~abhinavg/"&gt;Dr. Abhinav Gupta&lt;/a&gt; is looking for students, so if you are a 1st year CMU visionary (CMU visionary = robotics vision student@CMU), check out his presentation during the RI Immigration Course.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;P.P.S. 
Anonymous Reviewer#3: Not only have you single-handedly saved my paper from the clutches of ICCV death, but you have resurrected a graduate student's faith in the justice of the vision peer review process.&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/15418143-4510981646368890383?l=quantombone.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://quantombone.blogspot.com/feeds/4510981646368890383/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://quantombone.blogspot.com/2011/08/ensemble-of-exemplar-svms-for-object.html#comment-form' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/15418143/posts/default/4510981646368890383'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/15418143/posts/default/4510981646368890383'/><link rel='alternate' type='text/html' href='http://quantombone.blogspot.com/2011/08/ensemble-of-exemplar-svms-for-object.html' title='Ensemble of Exemplar-SVMs for Object Detection and Beyond'/><author><name>Tomasz Malisiewicz</name><uri>https://profiles.google.com/107912691630546731185</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='//lh6.googleusercontent.com/-ely56X4eTGQ/AAAAAAAAAAI/AAAAAAAAJ_g/VLmSVgSof5s/s512-c/photo.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://2.bp.blogspot.com/-rFVW9aAnoew/TkX477bkV2I/AAAAAAAAJ_0/bzHhyu6pDm8/s72-c/teaserama4-01.png' height='72' width='72'/><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-15418143.post-4827153132223101752</id><published>2011-07-24T14:54:00.000-05:00</published><updated>2011-07-24T14:54:36.489-05:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Takeo Kanade'/><category 
scheme='http://www.blogger.com/atom/ns#' term='startup'/><category scheme='http://www.blogger.com/atom/ns#' term='CMU'/><category scheme='http://www.blogger.com/atom/ns#' term='pittpatt'/><category scheme='http://www.blogger.com/atom/ns#' term='phd'/><category scheme='http://www.blogger.com/atom/ns#' term='robotics'/><category scheme='http://www.blogger.com/atom/ns#' term='face detection'/><category scheme='http://www.blogger.com/atom/ns#' term='google'/><title type='text'>CMU Robotics Institute's vision finds a home at Google</title><content type='html'>Congratulations to &lt;a href="http://pittpatt.com/"&gt;PittPatt&lt;/a&gt; for their recent acquisition by Google.&amp;nbsp; PittPatt, a Pittsburgh-based startup, has its roots in CMU's Robotics Institute (where I'm currently a PhD student).&amp;nbsp; &lt;a href="http://www.cs.cmu.edu/%7Ehws/"&gt;Henry Schneiderman&lt;/a&gt;, the CEO of PittPatt, did some truly hardcore computer vision work while doing his PhD under Takeo Kanade.&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://2.bp.blogspot.com/-E1l1aB2cN0s/Tix2X0WyCdI/AAAAAAAAJ_w/IjaEvT2WK_o/s1600/pittpatt.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" src="http://2.bp.blogspot.com/-E1l1aB2cN0s/Tix2X0WyCdI/AAAAAAAAJ_w/IjaEvT2WK_o/s1600/pittpatt.png" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;br /&gt;&lt;br /&gt;Two famous papers by Henry Schneiderman and Takeo Kanade are the following:&lt;br /&gt;&lt;span style=" font-family: Arial;"&gt;&lt;span style="font-family: Arial;"&gt;H. Schneiderman, T. Kanade. "A Statistical Method for 3D Object Detection Applied to Faces and Cars". IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2000) &lt;a href="http://www.cs.cmu.edu/afs/cs.cmu.edu/user/hws/www/CVPR00.pdf"&gt;pdf format&lt;/a&gt;&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;&lt;br/&gt;&lt;br /&gt;&lt;span style="font-family: Arial;"&gt;&lt;span style="font-family: Arial;"&gt;H. 
Schneiderman, T. Kanade. "Probabilistic Modeling of Local Appearance and Spatial Relationships for Object Recognition." IEEE Conference on Computer Vision and Pattern Recognition (CVPR 1998). &lt;a href="http://www.cs.cmu.edu/afs/cs.cmu.edu/user/hws/www/CVPR98.pdf"&gt;pdf format&lt;/a&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="color: navy; font-family: Arial;"&gt;&lt;span style="font-family: Arial;"&gt; &lt;/span&gt;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;Here is what the front page of PittPatt states:&lt;br /&gt;&lt;br /&gt;Joining Google is the next thrilling step in a journey that began  with research at Carnegie Mellon University's Robotics Institute in the  1990s and continued with the launching of Pittsburgh Pattern Recognition  (PittPatt) in 2004. We've worked hard to advance the research and  technology in many important ways and have seen our technology come to  life in some very interesting products. At Google, computer vision  technology is already at the core of many existing products (such as  Image Search, YouTube, Picasa, and Goggles), so it's a natural fit to  join Google and bring the benefits of our research and technology to a  wider audience. 
We will continue to tap the potential of computer vision  in applications that range from simple photo organization to complex  video and mobile applications.&lt;br /&gt;&lt;div style="margin-bottom: 20px;"&gt;We look forward to joining the team at Google!&lt;/div&gt;&lt;div style="margin-bottom: 20px;"&gt;&lt;i&gt;The team at Pittsburgh Pattern Recognition&lt;/i&gt;&lt;/div&gt;&lt;br /&gt;Perhaps Henry's success is yet another reason to come to CMU to get a vision PhD...&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/15418143-4827153132223101752?l=quantombone.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://quantombone.blogspot.com/feeds/4827153132223101752/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://quantombone.blogspot.com/2011/07/cmu-robotics-instititues-vision-finds.html#comment-form' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/15418143/posts/default/4827153132223101752'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/15418143/posts/default/4827153132223101752'/><link rel='alternate' type='text/html' href='http://quantombone.blogspot.com/2011/07/cmu-robotics-instititues-vision-finds.html' title='CMU Robotics Institute&apos;s vision finds a home at Google'/><author><name>Tomasz Malisiewicz</name><uri>https://profiles.google.com/107912691630546731185</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='//lh6.googleusercontent.com/-ely56X4eTGQ/AAAAAAAAAAI/AAAAAAAAJ_g/VLmSVgSof5s/s512-c/photo.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://2.bp.blogspot.com/-E1l1aB2cN0s/Tix2X0WyCdI/AAAAAAAAJ_w/IjaEvT2WK_o/s72-c/pittpatt.png' height='72' 
width='72'/><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-15418143.post-4754246395387206250</id><published>2011-07-13T22:54:00.000-05:00</published><updated>2011-07-13T22:54:10.825-05:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='ge research'/><category scheme='http://www.blogger.com/atom/ns#' term='professors'/><category scheme='http://www.blogger.com/atom/ns#' term='werewolves'/><category scheme='http://www.blogger.com/atom/ns#' term='peter tu'/><category scheme='http://www.blogger.com/atom/ns#' term='vampires'/><category scheme='http://www.blogger.com/atom/ns#' term='academia'/><category scheme='http://www.blogger.com/atom/ns#' term='supernatural'/><category scheme='http://www.blogger.com/atom/ns#' term='research'/><category scheme='http://www.blogger.com/atom/ns#' term='comedy'/><category scheme='http://www.blogger.com/atom/ns#' term='article'/><title type='text'>vampiric professors vs. industry-leading werewolves</title><content type='html'>&lt;div style="text-align: center;"&gt;"&lt;b&gt;Vampires need blood from human donors. Professors need publications, which they extract from their grad students&lt;/b&gt;." 
- Peter Tu&lt;/div&gt;&lt;table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td style="text-align: center;"&gt;&lt;a href="http://www.werewolves.com/werewolves-vs-vampires/" imageanchor="1" style="margin-left: auto; margin-right: auto;"&gt;&lt;img border="0" height="171" src="http://www.werewolves.com/wordpress/wp-content/uploads/2009/11/Vampire_werewolf.jpg" width="320" /&gt;&lt;/a&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td class="tr-caption" style="text-align: center;"&gt;&lt;b&gt;Photo courtesy of Werewolves.com&lt;/b&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;br /&gt;I wanted to share with everybody a short article by &lt;a href="http://ge.geglobalresearch.com/blog/author/peter-tu/"&gt;Peter Tu&lt;/a&gt; of GE Research (and a Computer Vision hacker) in which he compares &lt;b&gt;Professors to Vampires&lt;/b&gt; and &lt;b&gt;Industry Researchers to Werewolves&lt;/b&gt;. 
&amp;nbsp;If you are in the mood for a witty and insightful read, check this out:&lt;br /&gt;&lt;div style="text-align: center;"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div style="text-align: center;"&gt;&lt;a href="http://ge.geglobalresearch.com/blog/computer-vision-and-the-supernatural/"&gt;Computer Vision and the Supernatural&lt;/a&gt;&amp;nbsp;by Peter Tu&lt;/div&gt;&lt;br /&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/15418143-4754246395387206250?l=quantombone.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://quantombone.blogspot.com/feeds/4754246395387206250/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://quantombone.blogspot.com/2011/07/vampiric-professors-vs-industry-leading.html#comment-form' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/15418143/posts/default/4754246395387206250'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/15418143/posts/default/4754246395387206250'/><link rel='alternate' type='text/html' href='http://quantombone.blogspot.com/2011/07/vampiric-professors-vs-industry-leading.html' title='vampiric professors vs. 
industry-leading werewolves'/><author><name>Tomasz Malisiewicz</name><uri>https://profiles.google.com/107912691630546731185</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='//lh6.googleusercontent.com/-ely56X4eTGQ/AAAAAAAAAAI/AAAAAAAAJ_g/VLmSVgSof5s/s512-c/photo.jpg'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-15418143.post-5831574162836321898</id><published>2011-07-07T15:56:00.002-05:00</published><updated>2011-07-07T15:59:52.910-05:00</updated><title type='text'>David Weinberger: Reading Aristotle</title><content type='html'>Here are some quotes from &lt;a href="http://www.hyperorg.com/speaker/bio.html"&gt;David Weinberger&lt;/a&gt;'s &lt;a href="http://www.kmworld.com/Articles/Column/David-Weinberger/Reading-Aristotle-9754.aspx"&gt;Reading Aristotle&lt;/a&gt; -- a short article on KMWorld from 2004.&lt;br /&gt;&lt;br /&gt;"After all, the music of Aristotle's thought comes from his assumption that the principles of knowledge are the same as the principles of the universe. The categories are not "mere" categories of thought for him. They are also the way the cosmos is arranged. If the order of knowledge and the order of the world are not the same, reasoned Aristotle, then knowledge isn't possible. Knowledge is only possible if the universe is ordered in knowable ways."&lt;br /&gt;&lt;br /&gt;"a world apart from the categories of understanding would be by definition unknowable"&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;Weinberger's ideas are very relevant to my thesis on the &lt;a href="http://www.cs.cmu.edu/~tmalisie/projects/nips09/"&gt;Visual Memex&lt;/a&gt; and his writings have been a great influence on me.  
I've recently been on a bit of a Weinberger-reading/listening-binge after I found some of his lectures as Podcasts!&lt;br /&gt;&lt;br /&gt;---&lt;br /&gt;&lt;br /&gt;David Weinberger is the author of Everything Is Miscellaneous: The Power of the New Digital Disorder. The book's accompanying blog can be found here: &lt;a href="http://www.everythingismiscellaneous.com/"&gt;http://www.everythingismiscellaneous.com/&lt;/a&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/15418143-5831574162836321898?l=quantombone.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://quantombone.blogspot.com/feeds/5831574162836321898/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://quantombone.blogspot.com/2011/07/david-weinberger-reading-aristotle.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/15418143/posts/default/5831574162836321898'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/15418143/posts/default/5831574162836321898'/><link rel='alternate' type='text/html' href='http://quantombone.blogspot.com/2011/07/david-weinberger-reading-aristotle.html' title='David Weinberger: Reading Aristotle'/><author><name>Tomasz Malisiewicz</name><uri>https://profiles.google.com/107912691630546731185</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='//lh6.googleusercontent.com/-ely56X4eTGQ/AAAAAAAAAAI/AAAAAAAAJ_g/VLmSVgSof5s/s512-c/photo.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-15418143.post-7029082544474406432</id><published>2011-06-22T02:15:00.003-05:00</published><updated>2011-06-22T02:18:21.847-05:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='segmentation'/><category 
scheme='http://www.blogger.com/atom/ns#' term='conference'/><category scheme='http://www.blogger.com/atom/ns#' term='gibson'/><category scheme='http://www.blogger.com/atom/ns#' term='CMU'/><category scheme='http://www.blogger.com/atom/ns#' term='inference'/><category scheme='http://www.blogger.com/atom/ns#' term='waterfalls'/><category scheme='http://www.blogger.com/atom/ns#' term='papers'/><category scheme='http://www.blogger.com/atom/ns#' term='sfm'/><category scheme='http://www.blogger.com/atom/ns#' term='hiking'/><category scheme='http://www.blogger.com/atom/ns#' term='poselets'/><category scheme='http://www.blogger.com/atom/ns#' term='graphical models'/><category scheme='http://www.blogger.com/atom/ns#' term='affordances'/><category scheme='http://www.blogger.com/atom/ns#' term='imitation'/><category scheme='http://www.blogger.com/atom/ns#' term='cvpr'/><title type='text'>cvpr 2011: highlights from day 1</title><content type='html'>Today was the first main day of the CVPR 2011 conference. 
&amp;nbsp;Here are some papers that I found particularly exciting, each with a brief reason why it is super cool:&lt;br /&gt;&lt;br /&gt;&lt;title&gt;&lt;/title&gt;&lt;style type="text/css"&gt;p.p1 {margin: 0.0px 0.0px 0.0px 0.0px; font: 12.0px Arial}p.p2 {margin: 0.0px 0.0px 0.0px 0.0px; font: 12.0px Arial; color: #3c00af}p.p3 {margin: 0.0px 0.0px 0.0px 0.0px; font: 12.0px Arial; min-height: 14.0px}span.s1 {letter-spacing: 0.0px}span.s2 {text-decoration: underline ; letter-spacing: 0.0px}span.s3 {letter-spacing: 0.0px; color: #000000}&lt;/style&gt;&lt;br /&gt;&lt;div class="p1"&gt;&lt;span class="s1"&gt;Because you can’t &lt;i&gt;afford&lt;/i&gt; to miss this one:&lt;/span&gt;&lt;/div&gt;&lt;div class="p2"&gt;&lt;span class="s2"&gt;&lt;a href="http://www.cs.cmu.edu/~abhinavg/"&gt;Abhinav Gupta&lt;/a&gt;&lt;/span&gt;&lt;span class="s3"&gt;, &lt;a href="http://research.satkin.com/"&gt;&lt;span class="s2"&gt;Scott Satkin&lt;/span&gt;&lt;/a&gt;, &lt;a href="http://www.cs.cmu.edu/~efros/"&gt;&lt;span class="s2"&gt;Alexei A. Efros&lt;/span&gt;&lt;/a&gt; and &lt;a href="http://www.cs.cmu.edu/~hebert/"&gt;&lt;span class="s2"&gt;Martial Hebert&lt;/span&gt;&lt;/a&gt;. &lt;a href="http://www.cs.cmu.edu/~abhinavg/papers/0586.pdf"&gt;&lt;span class="s2"&gt;From Scene Geometry to Human Workspace&lt;/span&gt;&lt;/a&gt;. 
In CVPR 2011.&lt;/span&gt;&lt;/div&gt;&lt;div class="p3"&gt;&lt;span class="s1"&gt;&lt;/span&gt;&lt;/div&gt;&lt;div class="p1"&gt;&lt;span class="s1"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/div&gt;&lt;div class="p1"&gt;&lt;span class="s1"&gt;Because large scale structure from motion meets graphical models:&lt;/span&gt;&lt;/div&gt;&lt;div class="p2"&gt;&lt;span class="s2"&gt;&lt;a href="https://www.cs.indiana.edu/~djcran/"&gt;David Crandall&lt;/a&gt;&lt;/span&gt;&lt;span class="s3"&gt;, &lt;a href="http://people.csail.mit.edu/aho/"&gt;&lt;span class="s2"&gt;Andrew Owens&lt;/span&gt;&lt;/a&gt;, &lt;a href="http://www.cs.cornell.edu/~snavely/"&gt;&lt;span class="s2"&gt;Noah Snavely&lt;/span&gt;&lt;/a&gt;, and &lt;a href="http://www.cs.cornell.edu/~dph/"&gt;&lt;span class="s2"&gt;Dan Huttenlocher&lt;/span&gt;&lt;/a&gt;.&amp;nbsp;&lt;a href="http://people.csail.mit.edu/aho/pubs/dco_crandall_etal_cvpr_2011.pdf"&gt;&lt;span class="s2"&gt;Discrete-Continuous Optimization for Large-Scale Structure from Motion&lt;/span&gt;&lt;/a&gt;.&amp;nbsp;In CVPR 2011.&lt;/span&gt;&lt;/div&gt;&lt;div class="p3"&gt;&lt;span class="s1"&gt;&lt;/span&gt;&lt;/div&gt;&lt;div class="p1"&gt;&lt;span class="s1"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/div&gt;&lt;div class="p1"&gt;&lt;span class="s1"&gt;Because segmentation provides a good basis for object discovery:&lt;/span&gt;&lt;/div&gt;&lt;div class="p2"&gt;&lt;span class="s2"&gt;&lt;a href="http://vision.ucsd.edu/person/carolina-galleguillos"&gt;Galleguillos C.&lt;/a&gt;&lt;/span&gt;&lt;span class="s3"&gt;,&amp;nbsp;&lt;a href="http://vision.ucsd.edu/biblio/author/217"&gt;&lt;span class="s2"&gt;McFee B.&lt;/span&gt;&lt;/a&gt;,&amp;nbsp;&lt;a href="http://vision.ucsd.edu/person/serge-belongie"&gt;&lt;span class="s2"&gt;Belongie S.&lt;/span&gt;&lt;/a&gt;,&amp;nbsp;&lt;a href="http://vision.ucsd.edu/biblio/author/100"&gt;&lt;span class="s2"&gt;Lanckriet G.&lt;/span&gt;&lt;/a&gt;&amp;nbsp;&lt;a 
href="http://vision.ucsd.edu/sites/default/files/0892-1.pdf"&gt;&lt;span class="s2"&gt;From Region Similarity to Category Discovery&lt;/span&gt;&lt;/a&gt;.&amp;nbsp;In CVPR 2011.&lt;/span&gt;&lt;/div&gt;&lt;div class="p3"&gt;&lt;span class="s1"&gt;&lt;/span&gt;&lt;/div&gt;&lt;div class="p1"&gt;&lt;span class="s1"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/div&gt;&lt;div class="p1"&gt;&lt;span class="s1"&gt;Because imitation provides a category-free way of understanding pose:&lt;/span&gt;&lt;/div&gt;&lt;div class="p2"&gt;&lt;span class="s2"&gt;&lt;a href="http://cs.nyu.edu/~gwtaylor/"&gt;Graham Taylor&lt;/a&gt;&lt;/span&gt;&lt;span class="s3"&gt;, Ian Spiro, &lt;a href="http://mrl.nyu.edu/~bregler/"&gt;&lt;span class="s2"&gt;Christoph Bregler&lt;/span&gt;&lt;/a&gt;, and &lt;a href="http://cs.nyu.edu/~fergus/"&gt;&lt;span class="s2"&gt;Rob Fergus&lt;/span&gt;&lt;/a&gt;.&amp;nbsp;&lt;a href="http://cs.nyu.edu/~gwtaylor/publications/cvpr2011/0969.pdf"&gt;&lt;span class="s2"&gt;Learning Invariance through Imitation&lt;/span&gt;&lt;/a&gt;. In CVPR 2011.&amp;nbsp;&lt;a href="http://movement.nyu.edu/imitation"&gt;&lt;span class="s2"&gt;Project page&lt;/span&gt;&lt;/a&gt;&amp;nbsp;(with links to supplementary material)&lt;/span&gt;&lt;/div&gt;&lt;div class="p3"&gt;&lt;span class="s1"&gt;&lt;/span&gt;&lt;/div&gt;&lt;div class="p1"&gt;&lt;span class="s1"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/div&gt;&lt;div class="p1"&gt;&lt;span class="s1"&gt;Because you don’t need to solve intractable graphical model inferences:&lt;/span&gt;&lt;/div&gt;&lt;div class="p2"&gt;&lt;span class="s2"&gt;&lt;a href="http://www.cs.cmu.edu/~sross1/"&gt;S. Ross&lt;/a&gt;&lt;/span&gt;&lt;span class="s3"&gt;, &lt;a href="http://www.cs.cmu.edu/~dmunoz/"&gt;&lt;span class="s2"&gt;D. Munoz&lt;/span&gt;&lt;/a&gt;, &lt;a href="http://www.cs.cmu.edu/~hebert/"&gt;&lt;span class="s2"&gt;M. Hebert&lt;/span&gt;&lt;/a&gt;, J. A. 
Bagnell.&amp;nbsp;&lt;a href="http://www.cs.cmu.edu/~sross1/publications/Ross-CVPR11.pdf"&gt;&lt;span class="s2"&gt;Learning Message-Passing Inference Machines for Structured Prediction.&lt;/span&gt;&lt;/a&gt;&amp;nbsp; In CVPR 2011.&lt;/span&gt;&lt;/div&gt;&lt;div class="p3"&gt;&lt;span class="s1"&gt;&lt;/span&gt;&lt;/div&gt;&lt;div class="p1"&gt;&lt;span class="s1"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/div&gt;&lt;div class="p1"&gt;&lt;span class="s1"&gt;Because a little bit of bottom-up never hurt a lot of top-down:&lt;/span&gt;&lt;/div&gt;&lt;div class="p2"&gt;&lt;span class="s2"&gt;&lt;a href="http://lmb.informatik.uni-freiburg.de/people/brox/index.en.html"&gt;Thomas Brox&lt;/a&gt;&lt;/span&gt;&lt;span class="s3"&gt;, &lt;a href="http://www.cs.berkeley.edu/~lbourdev/"&gt;&lt;span class="s2"&gt;Lubomir Bourdev&lt;/span&gt;&lt;/a&gt;, Subhransu Maji, Jitendra Malik. &lt;a href="http://www.eecs.berkeley.edu/~lbourdev/poselets/bbmm-seg-cvpr11.pdf"&gt;&lt;span class="s2"&gt;Object Segmentation by Alignment of Poselet Activations to Image Contours&lt;/span&gt;&lt;/a&gt;.&amp;nbsp;In CVPR 2011&lt;/span&gt;&lt;/div&gt;&lt;div class="p3"&gt;&lt;span class="s1"&gt;&lt;/span&gt;&lt;/div&gt;&lt;div class="p1"&gt;&lt;span class="s1"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/div&gt;&lt;div class="p1"&gt;&lt;span class="s1"&gt;Because you might want to perform segmentation simultaneously on several related images:&lt;/span&gt;&lt;/div&gt;&lt;div class="p2"&gt;&lt;span class="s2"&gt;&lt;a href="http://www.cs.ucl.ac.uk/staff/S.Vicente/index.html"&gt;Sara Vicente&lt;/a&gt;&lt;/span&gt;&lt;span class="s3"&gt;, &lt;a href="http://research.microsoft.com/en-us/people/carrot/"&gt;&lt;span class="s2"&gt;Carsten Rother&lt;/span&gt;&lt;/a&gt; and &lt;a href="http://www.cs.ucl.ac.uk/staff/V.Kolmogorov/"&gt;&lt;span class="s2"&gt;Vladimir Kolmogorov&lt;/span&gt;&lt;/a&gt;. 
&lt;a href="http://www.cs.ucl.ac.uk/staff/S.Vicente/papers/ObjectCosegmentation_CVPR11.pdf"&gt;&lt;span class="s2"&gt;Object Cosegmentation&lt;/span&gt;&lt;/a&gt;. In CVPR 2011.&lt;/span&gt;&lt;/div&gt;&lt;div class="p2"&gt;&lt;span class="s3"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/div&gt;&lt;div class="p2"&gt;&lt;span class="s3"&gt;I missed some of the posters later in the evening because I went for a short hike organized by &lt;a href="http://www.cis.upenn.edu/~jshi/"&gt;Jianbo Shi&lt;/a&gt;&amp;nbsp;(to &lt;a href="http://www.sevenfalls.com/home/index.cfm"&gt;Seven Falls&lt;/a&gt;). &amp;nbsp;After all, CVPR is in Colorado Springs, and it is silly to stay inside the conference the entire time. &amp;nbsp; Below is a pic of the falls (only a short drive from the CVPR11 conference hotel).&amp;nbsp;&lt;/span&gt;&lt;/div&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://3.bp.blogspot.com/-xiR9DnoyhDA/TgGVLhudnCI/AAAAAAAADJk/M_VL67EHBlg/s1600/photo.JPG" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="239" src="http://3.bp.blogspot.com/-xiR9DnoyhDA/TgGVLhudnCI/AAAAAAAADJk/M_VL67EHBlg/s320/photo.JPG" width="320" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;div class="p2"&gt;&lt;span class="s3"&gt;After several hours of talks it was good to get some fresh air and see some waterfalls in Colorado Springs! 
&amp;nbsp;On Friday, I'm going on the Pikes Peak downhill bike ride with some friends and it should be super-fun.&lt;/span&gt;&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/15418143-7029082544474406432?l=quantombone.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://quantombone.blogspot.com/feeds/7029082544474406432/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://quantombone.blogspot.com/2011/06/cvpr-2011-highlights-from-day-1.html#comment-form' title='3 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/15418143/posts/default/7029082544474406432'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/15418143/posts/default/7029082544474406432'/><link rel='alternate' type='text/html' href='http://quantombone.blogspot.com/2011/06/cvpr-2011-highlights-from-day-1.html' title='cvpr 2011: highlights from day 1'/><author><name>Tomasz Malisiewicz</name><uri>https://profiles.google.com/107912691630546731185</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='//lh6.googleusercontent.com/-ely56X4eTGQ/AAAAAAAAAAI/AAAAAAAAJ_g/VLmSVgSof5s/s512-c/photo.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://3.bp.blogspot.com/-xiR9DnoyhDA/TgGVLhudnCI/AAAAAAAADJk/M_VL67EHBlg/s72-c/photo.JPG' height='72' width='72'/><thr:total>3</thr:total></entry><entry><id>tag:blogger.com,1999:blog-15418143.post-62283540901401076</id><published>2011-06-14T00:39:00.036-05:00</published><updated>2011-06-14T00:52:05.416-05:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='fgvc'/><category scheme='http://www.blogger.com/atom/ns#' term='cvpr 2011'/><category scheme='http://www.blogger.com/atom/ns#' term='research'/><category 
scheme='http://www.blogger.com/atom/ns#' term='colorado'/><category scheme='http://www.blogger.com/atom/ns#' term='cvpr'/><category scheme='http://www.blogger.com/atom/ns#' term='workshop'/><category scheme='http://www.blogger.com/atom/ns#' term='publishing'/><category scheme='http://www.blogger.com/atom/ns#' term='networking'/><title type='text'>cvpr 2011 in colorado springs</title><content type='html'>&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://www.cvpr2011.org/" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="56" src="http://www.cvpr2011.org/sites/default/files/genesis_CVPR_logo.jpg" width="320" /&gt;&lt;/a&gt;&lt;/div&gt;CVPR (&lt;a href="http://www.cvpr2011.org/"&gt;CVPR 2011&lt;/a&gt;) will be held this year in Colorado Springs, and I'll hopefully get a chance to blog about the newest and most exciting ideas coming out of the Computer Vision research community while I'm there. &amp;nbsp;CVPR is a conference which I try to attend every year because 1.) the quality of the work there is superb and 2.) it is a perfect opportunity to network with all the amazing vision researchers I've met over the years.&lt;br /&gt;&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://www.fgvc.org/" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="50" src="https://sites.google.com/site/cvprfgvc/_/rsrc/1304709158658/config/6b744d2d74c16d5e.png" width="320" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;br /&gt;I will also be at the &lt;a href="http://www.fgvc.org/"&gt;Fine Grained Visual Categorization workshop&lt;/a&gt; on June 25th, which has an exciting schedule of talks and posters. 
&amp;nbsp;This is the first ever FGVC workshop, and it is being organized by &lt;a href="http://www.cs.umd.edu/~farrell" rel="nofollow" style="color: #8a8c50; text-decoration: underline;"&gt;Ryan Farrell&lt;/a&gt; (UMD), &lt;a href="http://vision.ucsd.edu/person/steve-branson" rel="nofollow" style="color: #8a8c50; text-decoration: underline;"&gt;Steve Branson&lt;/a&gt; (UCSD), and &lt;a href="http://www.vision.caltech.edu/welinder/" rel="nofollow" style="color: #8a8c50; text-decoration: underline;"&gt;Peter Welinder&lt;/a&gt; (Caltech).&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/15418143-62283540901401076?l=quantombone.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://quantombone.blogspot.com/feeds/62283540901401076/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://quantombone.blogspot.com/2011/06/cvpr-2011-in-colorado-springs.html#comment-form' title='3 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/15418143/posts/default/62283540901401076'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/15418143/posts/default/62283540901401076'/><link rel='alternate' type='text/html' href='http://quantombone.blogspot.com/2011/06/cvpr-2011-in-colorado-springs.html' title='cvpr 2011 in colorado springs'/><author><name>Tomasz Malisiewicz</name><uri>https://profiles.google.com/107912691630546731185</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' 
src='//lh6.googleusercontent.com/-ely56X4eTGQ/AAAAAAAAAAI/AAAAAAAAJ_g/VLmSVgSof5s/s512-c/photo.jpg'/></author><thr:total>3</thr:total></entry><entry><id>tag:blogger.com,1999:blog-15418143.post-1856510902905828103</id><published>2011-04-28T03:54:00.000-05:00</published><updated>2011-04-28T03:54:03.993-05:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='rob fergus'/><category scheme='http://www.blogger.com/atom/ns#' term='deva ramanan'/><category scheme='http://www.blogger.com/atom/ns#' term='filters'/><category scheme='http://www.blogger.com/atom/ns#' term='talks'/><category scheme='http://www.blogger.com/atom/ns#' term='object detection'/><category scheme='http://www.blogger.com/atom/ns#' term='convolutions'/><category scheme='http://www.blogger.com/atom/ns#' term='pooling ramanan'/><category scheme='http://www.blogger.com/atom/ns#' term='deep learning'/><title type='text'>vision talks at CMU are the best</title><content type='html'>At CMU, we get some of the best people in object recognition to visit and give talks. &amp;nbsp;The &lt;a href="http://vasc.ri.cmu.edu/seminar/"&gt;CMU VASC seminar&lt;/a&gt; is a place where all the cool vision researchers come and advertise their own research. &amp;nbsp;Just last week &lt;a href="http://www.ics.uci.edu/~dramanan/"&gt;Deva Ramanan&lt;/a&gt;&amp;nbsp;gave a spectacular talk about his most recent part-based detector. &amp;nbsp;Deva made an everlasting impression on me as a first year PhD student -- 5 years ago he showed the world that visualizations of a person's parts as marginals are much sexier than just boxes. 
&lt;br /&gt;&lt;br /&gt;&lt;div style="text-align: center;"&gt;&lt;b&gt;Deva's Sexy Part Marginals&lt;/b&gt;&lt;/div&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://www.ics.uci.edu/~dramanan/papers/parse/peopleResAll.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="320" src="http://www.ics.uci.edu/~dramanan/papers/parse/peopleResAll.jpg" width="213" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class="separator" style="clear: both; text-align: left;"&gt;Today,&amp;nbsp;&lt;a href="http://www.cs.nyu.edu/~fergus"&gt;Rob Fergus&lt;/a&gt;&amp;nbsp;gave a talk on his most recent deep learning research. &amp;nbsp;This research topic has been promulgated by hardcore machine learning titans such as &lt;a href="http://robotics.stanford.edu/~ang/"&gt;Andrew Ng&lt;/a&gt;, &lt;a href="http://yann.lecun.com/"&gt;Yann LeCun&lt;/a&gt;, and &lt;a href="http://www.cs.toronto.edu/~hinton/"&gt;Geoff Hinton&lt;/a&gt;, so it will be exciting to see how Rob applies these ideas to object recognition. 
&amp;nbsp;Unsupervised Feature Learning seems pretty exciting; unfortunately, I just cannot take results on Caltech101/Caltech256 very seriously :-(&lt;/div&gt;&lt;div class="separator" style="clear: both; text-align: left;"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;b&gt;Deep Learning Architectures&lt;/b&gt;&lt;/div&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://4.bp.blogspot.com/-12P6T1CIr3E/TbkoeNF0moI/AAAAAAAADI8/uXdBW6STA_M/s1600/Architecture.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="178" src="http://4.bp.blogspot.com/-12P6T1CIr3E/TbkoeNF0moI/AAAAAAAADI8/uXdBW6STA_M/s320/Architecture.jpg" width="320" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/15418143-1856510902905828103?l=quantombone.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://quantombone.blogspot.com/feeds/1856510902905828103/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://quantombone.blogspot.com/2011/04/vision-talks-at-cmu-are-best.html#comment-form' title='4 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/15418143/posts/default/1856510902905828103'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/15418143/posts/default/1856510902905828103'/><link rel='alternate' type='text/html' href='http://quantombone.blogspot.com/2011/04/vision-talks-at-cmu-are-best.html' title='vision talks at CMU are the best'/><author><name>Tomasz Malisiewicz</name><uri>https://profiles.google.com/107912691630546731185</uri><email>noreply@blogger.com</email><gd:image 
rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='//lh6.googleusercontent.com/-ely56X4eTGQ/AAAAAAAAAAI/AAAAAAAAJ_g/VLmSVgSof5s/s512-c/photo.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://4.bp.blogspot.com/-12P6T1CIr3E/TbkoeNF0moI/AAAAAAAADI8/uXdBW6STA_M/s72-c/Architecture.jpg' height='72' width='72'/><thr:total>4</thr:total></entry><entry><id>tag:blogger.com,1999:blog-15418143.post-2636156300773236914</id><published>2011-03-25T13:13:00.000-05:00</published><updated>2011-03-25T13:13:51.449-05:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='tricks'/><category scheme='http://www.blogger.com/atom/ns#' term='MATLAB'/><category scheme='http://www.blogger.com/atom/ns#' term='github'/><category scheme='http://www.blogger.com/atom/ns#' term='programming'/><category scheme='http://www.blogger.com/atom/ns#' term='gists'/><title type='text'>Matlab Trick: Counting via Sparse</title><content type='html'>Here is a simple little matlab demo script of how to count items using the command &lt;b&gt;sparse&lt;/b&gt;.  Counting the occurrences of items is a frequently performed task in vision and not many people know that sparse can do this.  Remember, if you're going to be a matlab jedi do not write for loops:&lt;br /&gt;&lt;script src="https://gist.github.com/887292.js?file=count_with_sparse.m"&gt;&lt;/script&gt;&lt;br /&gt;&lt;br /&gt;I am using github's gists to embed this code in style.  
Github lets me choose the language, highlights the syntax accordingly, and the snippet is its own repository!&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/15418143-2636156300773236914?l=quantombone.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://quantombone.blogspot.com/feeds/2636156300773236914/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://quantombone.blogspot.com/2011/03/matlab-trick-counting-via-sparse.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/15418143/posts/default/2636156300773236914'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/15418143/posts/default/2636156300773236914'/><link rel='alternate' type='text/html' href='http://quantombone.blogspot.com/2011/03/matlab-trick-counting-via-sparse.html' title='Matlab Trick: Counting via Sparse'/><author><name>Tomasz Malisiewicz</name><uri>https://profiles.google.com/107912691630546731185</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='//lh6.googleusercontent.com/-ely56X4eTGQ/AAAAAAAAAAI/AAAAAAAAJ_g/VLmSVgSof5s/s512-c/photo.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-15418143.post-6960398232105413584</id><published>2011-03-24T21:04:00.001-05:00</published><updated>2011-03-26T00:03:21.739-05:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='computer vision'/><category scheme='http://www.blogger.com/atom/ns#' term='philosophy'/><category scheme='http://www.blogger.com/atom/ns#' term='artificial intelligence'/><category scheme='http://www.blogger.com/atom/ns#' term='quine'/><category scheme='http://www.blogger.com/atom/ns#' term='cognitive science'/><title type='text'>Computer Vision is Artificial 
Intelligence</title><content type='html'>Computer vision is a diverse field and its researchers have multifaceted interests and aspirations. &amp;nbsp;It should not be surprising that no two vision researchers think about the field in the same way. &amp;nbsp;Different academic backgrounds foster alternative and potentially &lt;a href="http://en.wikipedia.org/wiki/Commensurability_(philosophy_of_science)"&gt;incommensurable&lt;/a&gt; interpretations. &amp;nbsp;It is as if &lt;a href="http://en.wikipedia.org/wiki/W._V._Quine"&gt;W.V.O. Quine&lt;/a&gt;'s thesis that no observation can be "theory-independent" directly applies to vision: a researcher in computer vision cannot uphold a view on their own field that is objective and independent of their own predispositions, upbringing, and educational program. &amp;nbsp;While I cannot speak clearly about the long-term goals of the entire body of researchers in vision, today I would like to discuss my own take on computer vision. &amp;nbsp;I do not offer the world an objective account of why computer vision intrigues me, but by sharing with the world the reasons why I find vision exciting, perhaps &lt;b&gt;together&lt;/b&gt; we can break the boundaries of machine intelligence.&lt;br /&gt;&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;span class="Apple-style-span" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;a href="http://draft.blogger.com/goog_830284386"&gt;&lt;img border="0" height="200" src="http://www.mcgill.ca/files/cogsci/CognitiveScience.gif" width="189" /&gt;&lt;/a&gt;&lt;/span&gt;&lt;/div&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;b&gt;Cognitive Science is a computational study of the mind: &lt;/b&gt;&lt;a href="http://www.mcgill.ca/cogsci/"&gt;McGill Cognitive Science&lt;/a&gt;&lt;/div&gt;&lt;br /&gt;&lt;br /&gt;&lt;div&gt;&lt;div style="margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px;"&gt;One of the biggest 
accomplishments in the field of Artificial Intelligence was when&lt;a href="http://en.wikipedia.org/wiki/Deep_Blue_(chess_computer)"&gt;&amp;nbsp;Deep Blue, a chess-playing program developed at IBM, beat the world chess champion, Garry Kasparov&lt;/a&gt;. &amp;nbsp;But this was in the early days of artificial intelligence -- when computer scientists still weren't sure what it means for a machine to be intelligent. &amp;nbsp;Chess is a well-known thinking-man's game, and at first glance it seems that a machine can only be worthy of being dubbed intelligent if it performs competitively on intelligent-people activities such as chess.&lt;/div&gt;&lt;div style="margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px;"&gt;&lt;b&gt;&lt;br /&gt;&lt;/b&gt;&lt;/div&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://www.slate.com/id/2445/" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" src="http://img.slate.com/media/50000/50917/Stamaty-Kasparov-anim.gif" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;b&gt;Chess: Human vs. 
Machine:&lt;/b&gt;&amp;nbsp;&lt;a href="http://www.slate.com/id/2445/"&gt;Slate article about Deep Blue&lt;/a&gt;&lt;/div&gt;&lt;div style="margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px;"&gt;&lt;b&gt;&lt;br /&gt;&lt;/b&gt;&lt;/div&gt;&lt;div style="margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px;"&gt;&lt;b&gt;&lt;br /&gt;&lt;/b&gt;&lt;/div&gt;&lt;div style="margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px;"&gt;&lt;b&gt;Given the plethora of tasks that humans can effortlessly perform in daily life, is engineering a machine to rival humans on just one such task bringing researchers any closer to building truly intelligent machines?&lt;/b&gt;&lt;/div&gt;&lt;/div&gt;&lt;div&gt;&lt;div style="margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px;"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div style="margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px;"&gt;The problem with chess is that it has a &lt;b&gt;"finite universe problem"&lt;/b&gt; -- there is a finite number of primitives (the chess pieces) which can be manipulated by choosing a move from a finite set of allowable actions. &amp;nbsp;But if we think of normal life (going to work, eating dinner, talking to a friend) as a game, then it is not hard to see that most everyday situations involving humans involve a sea of infinite objects (just look around and name all the different objects you can see around you!) and an equally capacious space of allowable actions (consider all the things you could do with all those objects around you!). 
&amp;nbsp;Intelligence is what allows us to cope with the complexities of the universe by focusing our attention on a limited set of relevant variables -- but the working set of objects/concepts we must consider at any single instant is chosen from a seemingly infinite set of alternatives.&lt;/div&gt;&lt;div style="margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px;"&gt;&lt;br /&gt;&lt;/div&gt;&lt;div style="margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px;"&gt;I believe that everyday human-level visual intelligence is greatly undervalued by people -- and there is a very good reason for this! &amp;nbsp;The ability to make sense of what is going on in a single picture is such a trivial and autonomous task for humans, that we don't even bother quantifying just how good we are at it. &amp;nbsp;But let me reassure you that automated image understanding is no trivial feat. &amp;nbsp;The world is not composed of 20 visual object categories and the space of allowable and interpretable utterances we could associate with a static picture is seemingly infinite. 
&amp;nbsp;While the 20 category object detection task (as popularized by the &lt;a href="http://pascallin.ecs.soton.ac.uk/challenges/VOC/"&gt;PASCAL VOC&lt;/a&gt;) does have a finite universe problem, the grander version of the vision master problem (a combination of detection/recognition/categorization where you can interpret an input any way you like) is much more complex and mirrors the structure of the external world well.&lt;/div&gt;&lt;/div&gt;&lt;br /&gt;&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="https://lh3.googleusercontent.com/-RPnkl30bmL4/TYvvCd_CGcI/AAAAAAAADIc/LBAnCkhoxZE/s1600/bender.gif" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="240" src="https://lh3.googleusercontent.com/-RPnkl30bmL4/TYvvCd_CGcI/AAAAAAAADIc/LBAnCkhoxZE/s320/bender.gif" width="320" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;b&gt;Robotics Challenge: Build a Robot like Bender&lt;/b&gt;&lt;/div&gt;&lt;br /&gt;&lt;br /&gt;Any application which calls for automated analysis of images requires vision. &amp;nbsp;A robot, if it is to be successful interacting with the world and performing useful tasks, needs to perceive the external world and organize it. &amp;nbsp;While some see vision as just one small piece of the "&lt;b&gt;Robotics Challenge&lt;/b&gt;" (build a robot and make it do cool stuff), it is totally unclear to me where to draw the boundary between low-level pixel analysis and high-level cognitive scene understanding. &amp;nbsp;Over the years, I have been thinking more and more about this problem, and I've convinced myself that &lt;b&gt;the interesting part of vision is precisely at the boundary between what is commonly thought of as low-level representation of signal and what is considered high-level representation of visual concepts&lt;/b&gt;. 
&amp;nbsp;While some view computer vision as "applied mathematics" or "applied machine learning" or "image processing in disguise", I passionately believe the following:&lt;br /&gt;&lt;br /&gt;&lt;div style="text-align: center;"&gt;&lt;b&gt;Computer Vision is Artificial Intelligence&lt;/b&gt;&lt;/div&gt;&lt;div&gt;&lt;div style="text-align: center;"&gt;&lt;br /&gt;&lt;/div&gt;I am not promulgating the thesis that all aspects of machine intelligence are visual, but I want to assure you that there are enough high-level semantic capabilities which must be set in place for vision to work, that it is &lt;b&gt;not worthwhile to think of vision as a smaller problem than general purpose intelligence&lt;/b&gt;. &amp;nbsp;I believe that once we have made progress on vision (not in the narrow-universe setting) to the point where generic visual scene understanding is effectively solved, there won't be much left that needs to go into the "ethereal" mind which cognitive scientists want to empower machines with! &amp;nbsp;The only way to make machines truly understand scenes, objects, and their interactions is to make machines know something about the fabric of human life, and it is important for machines to learn this for themselves from real-world experience. &amp;nbsp;This goes beyond representing object appearance because folk physics, folk psychology, causality, spatio-temporal continuity, etc. are all faculties which vision systems will need (at least the vision systems I want to ultimately build) for general purpose scene understanding. &amp;nbsp;I don't want to undermine the efforts of cognitive scientists (who work on many of the theories/ideas I've delineated above), but perhaps only to convince them that I have been a cognitive scientist all along. 
&amp;nbsp;I don't think placing a label on myself, by calling myself a cognitive scientist, a computer vision researcher, or an AI researcher, is very conducive to good research.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/15418143-6960398232105413584?l=quantombone.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://quantombone.blogspot.com/feeds/6960398232105413584/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://quantombone.blogspot.com/2011/03/computer-vision-is-artificial.html#comment-form' title='8 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/15418143/posts/default/6960398232105413584'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/15418143/posts/default/6960398232105413584'/><link rel='alternate' type='text/html' href='http://quantombone.blogspot.com/2011/03/computer-vision-is-artificial.html' title='Computer Vision is Artificial Intelligence'/><author><name>Tomasz Malisiewicz</name><uri>https://profiles.google.com/107912691630546731185</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='//lh6.googleusercontent.com/-ely56X4eTGQ/AAAAAAAAAAI/AAAAAAAAJ_g/VLmSVgSof5s/s512-c/photo.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='https://lh3.googleusercontent.com/-RPnkl30bmL4/TYvvCd_CGcI/AAAAAAAADIc/LBAnCkhoxZE/s72-c/bender.gif' height='72' width='72'/><thr:total>8</thr:total></entry><entry><id>tag:blogger.com,1999:blog-15418143.post-2526986251138168017</id><published>2011-03-23T02:49:00.001-05:00</published><updated>2011-03-23T16:54:38.978-05:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='ut austin'/><category 
scheme='http://www.blogger.com/atom/ns#' term='kristen grauman'/><category scheme='http://www.blogger.com/atom/ns#' term='cvpr 2011'/><category scheme='http://www.blogger.com/atom/ns#' term='computer vision'/><title type='text'>Kristen Grauman's CVPR 2011 papers</title><content type='html'>&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;/div&gt;This year there are lots of &lt;a href="http://www.cs.utexas.edu/~grauman/research/pubs.html"&gt;new papers from Kristen Grauman's Computer Vision Group&lt;/a&gt; at UT-Austin, to be presented at &lt;a href="http://cvpr2011.org/"&gt;CVPR 2011&lt;/a&gt;.&amp;nbsp; There are lots of them, but no abstracts/PDFs are available yet.&amp;nbsp; Here is a melange of pictures to entice us.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;/div&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;/div&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://vision.cs.utexas.edu/projects/easiness/easiness.html" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="145" src="http://vision.cs.utexas.edu/projects/easiness/easiness_files/image004.jpg" width="320" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;span class="Apple-style-span" style="-webkit-border-horizontal-spacing: 2px; -webkit-border-vertical-spacing: 2px;"&gt;&lt;span style="font-family: Arial, sans-serif; font-size: 10pt;"&gt;Learning the Easy Things First: Self-Paced Visual Category Discovery.&amp;nbsp; Y. J. Lee and K. 
Grauman.&amp;nbsp;&amp;nbsp;&lt;/span&gt;&lt;span style="font-family: Arial, sans-serif; font-size: 10pt;"&gt;To appear,&amp;nbsp;&lt;span style="font-style: italic;"&gt;Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition&lt;/span&gt;&amp;nbsp;(CVPR), Colorado Springs, CO, June 2011.&amp;nbsp; [&lt;a href="http://vision.cs.utexas.edu/projects/easiness/easiness.html"&gt;project page&lt;/a&gt;]&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;hr /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;span class="Apple-style-span" style="-webkit-border-horizontal-spacing: 2px; -webkit-border-vertical-spacing: 2px;"&gt;&lt;span style="font-family: Arial, sans-serif; font-size: 10pt;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;/div&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://www.cs.utexas.edu/~grauman/research/ims/bplr.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="129" src="http://www.cs.utexas.edu/%7Egrauman/research/ims/bplr.jpg" width="320" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;span class="Apple-style-span" style="-webkit-border-horizontal-spacing: 2px; -webkit-border-vertical-spacing: 2px;"&gt;&lt;span style="font-family: Arial, sans-serif; font-size: 10pt;"&gt;Boundary-Preserving Dense Local Regions.&amp;nbsp; J. Kim and K. 
Grauman.&amp;nbsp;&amp;nbsp;&lt;/span&gt;&lt;span style="font-family: Arial, sans-serif; font-size: 10pt;"&gt;To appear,&amp;nbsp;&lt;span style="font-style: italic;"&gt;Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition&lt;/span&gt;&amp;nbsp;(CVPR), Colorado Springs, CO, June 2011.&amp;nbsp; (Oral)&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;hr /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;span class="Apple-style-span" style="font-family: Arial, sans-serif; font-size: x-small;"&gt;&lt;span class="Apple-style-span" style="-webkit-border-horizontal-spacing: 2px; -webkit-border-vertical-spacing: 2px;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;/div&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://www.cs.utexas.edu/~grauman/research/ims/livelearning.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="206" src="http://www.cs.utexas.edu/%7Egrauman/research/ims/livelearning.jpg" width="320" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;span class="Apple-style-span" style="-webkit-border-horizontal-spacing: 2px; -webkit-border-vertical-spacing: 2px;"&gt;&lt;span style="font-family: Arial; font-size: 10pt;"&gt;Large-Scale Live Active Learning: Training Object Detectors with Crawled Data and Crowds.&amp;nbsp; S. Vijayanarasimhan and K. 
Grauman.&amp;nbsp; To appear,&amp;nbsp;&lt;span style="font-style: italic;"&gt;Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition&lt;/span&gt;&amp;nbsp;(CVPR), Colorado Springs, CO, June 2011.&lt;/span&gt;&lt;span style="font-family: Arial;"&gt;&amp;nbsp;&amp;nbsp;&lt;small&gt;(Oral)&lt;/small&gt;&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;span class="Apple-style-span" style="-webkit-border-horizontal-spacing: 2px; -webkit-border-vertical-spacing: 2px;"&gt;&lt;span style="font-family: Arial;"&gt;&lt;small&gt;&lt;br /&gt;&lt;/small&gt;&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;hr /&gt;&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://www.cs.utexas.edu/~grauman/research/ims/sharing.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="231" src="http://www.cs.utexas.edu/%7Egrauman/research/ims/sharing.jpg" width="320" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;span class="Apple-style-span" style="-webkit-border-horizontal-spacing: 2px; -webkit-border-vertical-spacing: 2px;"&gt;&lt;span style="font-family: Arial, sans-serif; font-size: 10pt;"&gt;Sharing Features Between Objects and Their Attributes.&amp;nbsp; S. J. Hwang, F. Sha, and K. 
Grauman.&amp;nbsp;&amp;nbsp;&lt;/span&gt;&lt;span style="font-family: Arial, sans-serif; font-size: 10pt;"&gt;To appear,&amp;nbsp;&lt;span style="font-style: italic;"&gt;Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition&lt;/span&gt;&amp;nbsp;(CVPR), Colorado Springs, CO, June 2011.&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;hr /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;span class="Apple-style-span" style="-webkit-border-horizontal-spacing: 2px; -webkit-border-vertical-spacing: 2px;"&gt;&lt;span style="font-family: Arial, sans-serif; font-size: 10pt;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://www.cs.utexas.edu/~chaoyeh/cvpr2011_location.htm" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="291" src="http://www.cs.utexas.edu/%7Egrauman/research/ims/location.jpg" width="320" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;span class="Apple-style-span" style="-webkit-border-horizontal-spacing: 2px; -webkit-border-vertical-spacing: 2px; font-family: Arial, sans-serif; font-size: 13px;"&gt;Clues from the Beaten Path: Location Estimation with Bursty Sequences of Tourist Photos.&amp;nbsp; C.-Y. Chen and K. Grauman.&amp;nbsp; To appear,&amp;nbsp;&lt;span style="font-style: italic;"&gt;Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition&lt;/span&gt;&amp;nbsp;(CVPR), Colorado Springs, CO, June 2011. 
[&lt;a href="http://www.cs.utexas.edu/~chaoyeh/cvpr2011_location.htm"&gt;project page&lt;/a&gt;]&lt;/span&gt;&lt;/div&gt;&lt;hr /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;span class="Apple-style-span" style="-webkit-border-horizontal-spacing: 2px; -webkit-border-vertical-spacing: 2px; font-family: Arial, sans-serif; font-size: 13px;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/div&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://www.cs.utexas.edu/~grauman/research/ims/maxsubgraph.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="106" src="http://www.cs.utexas.edu/%7Egrauman/research/ims/maxsubgraph.jpg" width="320" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;span class="Apple-style-span" style="-webkit-border-horizontal-spacing: 2px; -webkit-border-vertical-spacing: 2px;"&gt;&lt;small&gt;&lt;span style="font-family: Arial;"&gt;Efficient Region Search for Object Detection.&amp;nbsp; S. Vijayanarasimhan and K. 
Grauman.&amp;nbsp;&lt;/span&gt;&lt;/small&gt;&amp;nbsp;&lt;span style="font-family: Arial, sans-serif; font-size: 10pt;"&gt;To appear,&amp;nbsp;&lt;span style="font-style: italic;"&gt;Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition&lt;/span&gt;&amp;nbsp;(CVPR), Colorado Springs, CO, June 2011.&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;hr /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://www.cs.utexas.edu/~grauman/research/ims/attribute_discovery.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="137" src="http://www.cs.utexas.edu/%7Egrauman/research/ims/attribute_discovery.jpg" width="320" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;span class="Apple-style-span" style="-webkit-border-horizontal-spacing: 2px; -webkit-border-vertical-spacing: 2px;"&gt;&lt;span style="font-family: Arial, sans-serif; font-size: 10pt;"&gt;Interactively Building a Discriminative Vocabulary of Nameable Attributes.&amp;nbsp; D. Parikh and K. 
Grauman.&amp;nbsp;&amp;nbsp;&lt;/span&gt;&lt;span style="font-family: Arial, sans-serif; font-size: 10pt;"&gt;To appear,&amp;nbsp;&lt;span style="font-style: italic;"&gt;Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition&amp;nbsp;&lt;/span&gt;(CVPR), Colorado Springs, CO, June 2011.&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;br /&gt;&lt;hr /&gt;&lt;br /&gt;On another note, &lt;a href="http://web.mit.edu/torralba/www/"&gt;Antonio Torralba&lt;/a&gt; has a CVPR 2011 paper with &lt;a href="http://www.mit.edu/~rsalakhu/"&gt;Ruslan Salakhutdinov&lt;/a&gt; (Hinton's ex-PhD student) and Josh Tenenbaum, but I'll post info and pics after I get to read the paper.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Learning to Share Visual Appearance for Multiclass Object Detection &lt;/b&gt; &lt;br /&gt;&lt;a href="http://www.mit.edu/~rsalakhu/index.html"&gt;Ruslan Salakhutdinov&lt;/a&gt;,      &lt;a href="http://web.mit.edu/torralba/www/"&gt; Antonio Torralba &lt;/a&gt;, and  &lt;a href="http://web.mit.edu/cocosci/josh.html"&gt; Josh Tenenbaum&lt;/a&gt;.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/15418143-2526986251138168017?l=quantombone.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://quantombone.blogspot.com/feeds/2526986251138168017/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://quantombone.blogspot.com/2011/03/kristen-graumans-cvpr-2011-papers.html#comment-form' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/15418143/posts/default/2526986251138168017'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/15418143/posts/default/2526986251138168017'/><link rel='alternate' type='text/html' href='http://quantombone.blogspot.com/2011/03/kristen-graumans-cvpr-2011-papers.html' title='Kristen Grauman&apos;s CVPR 2011 
papers'/><author><name>Tomasz Malisiewicz</name><uri>https://profiles.google.com/107912691630546731185</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='//lh6.googleusercontent.com/-ely56X4eTGQ/AAAAAAAAAAI/AAAAAAAAJ_g/VLmSVgSof5s/s512-c/photo.jpg'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-15418143.post-5751380381704955866</id><published>2011-03-23T02:29:00.000-05:00</published><updated>2011-03-23T02:29:02.240-05:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='git'/><category scheme='http://www.blogger.com/atom/ns#' term='github'/><category scheme='http://www.blogger.com/atom/ns#' term='programming'/><title type='text'>vision@github</title><content type='html'>Version control is a &lt;b&gt;must&lt;/b&gt; if you're serious about software development.&amp;nbsp; Like it or not, you have to be serious about software development if you want to build a large computer vision system (and perhaps get a PhD while doing it).&amp;nbsp; Over the years, I've moved from CVS to SVN to git.&amp;nbsp; I refuse to code without version control. 
&lt;br /&gt;&lt;br /&gt;While git lets me easily share my private research code with my colleagues at CMU (much easier than svn), github lets me 'publish' it online in style.&amp;nbsp; This is not the kind of sharing that most researchers do -- most researchers I know just put a tarball online.&amp;nbsp; I've started playing around with &lt;a href="https://github.com/"&gt;github&lt;/a&gt;, and I'm really excited about using it for my own Computer Vision project.&amp;nbsp; &lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="https://github.com/" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" src="https://d3nwyuy0nl342s.cloudfront.net/images/modules/header/logov3-hover.png" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;br /&gt;I think sharing research code in a high-quality environment is necessary if we want to push the boundaries of machine intelligence in a team-like fashion.&amp;nbsp; I think good papers and good code make good research, and while lots of care goes into high quality publications in the field, very few high-quality object recognition programs are published online "well".&amp;nbsp; I think having forums, version control, wikis, etc is important if one wants a project to stand the test-of-time.&amp;nbsp; I think github is the way to go if you're going to share your code online.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/15418143-5751380381704955866?l=quantombone.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://quantombone.blogspot.com/feeds/5751380381704955866/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://quantombone.blogspot.com/2011/03/visiongithub.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' 
href='http://www.blogger.com/feeds/15418143/posts/default/5751380381704955866'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/15418143/posts/default/5751380381704955866'/><link rel='alternate' type='text/html' href='http://quantombone.blogspot.com/2011/03/visiongithub.html' title='vision@github'/><author><name>Tomasz Malisiewicz</name><uri>https://profiles.google.com/107912691630546731185</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='//lh6.googleusercontent.com/-ely56X4eTGQ/AAAAAAAAAAI/AAAAAAAAJ_g/VLmSVgSof5s/s512-c/photo.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-15418143.post-4551810497114139354</id><published>2011-03-15T21:16:00.000-05:00</published><updated>2011-03-15T21:16:59.515-05:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='fun'/><category scheme='http://www.blogger.com/atom/ns#' term='kinect'/><category scheme='http://www.blogger.com/atom/ns#' term='guitar'/><title type='text'>kinect fun</title><content type='html'>Ah, the guitar. 
&amp;nbsp;Ah, a kinect.&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="https://lh6.googleusercontent.com/-j89FQec2Sps/TYAdSl3jSLI/AAAAAAAADIY/X8gRQ9iWWTQ/s1600/guitar_kinect.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="252" src="https://lh6.googleusercontent.com/-j89FQec2Sps/TYAdSl3jSLI/AAAAAAAADIY/X8gRQ9iWWTQ/s320/guitar_kinect.png" width="320" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;br /&gt;Kinect plugged into my Macbook Air.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/15418143-4551810497114139354?l=quantombone.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://quantombone.blogspot.com/feeds/4551810497114139354/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://quantombone.blogspot.com/2011/03/kinect-fun.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/15418143/posts/default/4551810497114139354'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/15418143/posts/default/4551810497114139354'/><link rel='alternate' type='text/html' href='http://quantombone.blogspot.com/2011/03/kinect-fun.html' title='kinect fun'/><author><name>Tomasz Malisiewicz</name><uri>https://profiles.google.com/107912691630546731185</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='//lh6.googleusercontent.com/-ely56X4eTGQ/AAAAAAAAAAI/AAAAAAAAJ_g/VLmSVgSof5s/s512-c/photo.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='https://lh6.googleusercontent.com/-j89FQec2Sps/TYAdSl3jSLI/AAAAAAAADIY/X8gRQ9iWWTQ/s72-c/guitar_kinect.png' height='72' 
width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-15418143.post-8366442318510878311</id><published>2011-01-28T00:55:00.000-05:00</published><updated>2011-01-28T00:55:28.667-05:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='symposium'/><category scheme='http://www.blogger.com/atom/ns#' term='suns 2011'/><title type='text'>snowy in Boston, SUn-ny at MIT</title><content type='html'>&lt;div style="text-align: center;"&gt;Today is &lt;a href="http://suns.mit.edu/"&gt;SUnS 2011 @ MIT&lt;/a&gt;.&lt;/div&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://2.bp.blogspot.com/_6fdO7SbU8eM/TUJZNgcv0YI/AAAAAAAADH8/3bpqMB65OxM/s1600/suns2011.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="236" src="http://2.bp.blogspot.com/_6fdO7SbU8eM/TUJZNgcv0YI/AAAAAAAADH8/3bpqMB65OxM/s320/suns2011.png" width="320" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;Artwork by &lt;a href="http://web.mit.edu/torralba/www/"&gt;Antonio Torralba&lt;/a&gt;&lt;/div&gt;&lt;br /&gt;&lt;a href="http://suns.mit.edu/"&gt;SUnS 2011&lt;/a&gt; is a multi-disciplinary symposium with speakers and poster presenters from a variety of disciplines (neurophysiology, cognitive neuroscience, visual cognition and computer vision) who will address a range of topics related to scene understanding and spatial cognition, object recognition, attention, visual search, etc.&lt;br /&gt;&lt;br /&gt;ORGANIZERS: &lt;a href="http://cvcl.mit.edu/Aude.htm"&gt;Aude Oliva&lt;/a&gt;, &lt;a href="http://research.brown.edu/myresearch/Thomas_Serre"&gt;Thomas Serre&lt;/a&gt;, &lt;a href="http://web.mit.edu/torralba/www/"&gt;Antonio Torralba&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;I will be there and will try to update the world about the most exciting stuff I learn about. 
&amp;nbsp;I'm really excited about &lt;a href="http://barlab.mgh.harvard.edu/"&gt;Moshe Bar&lt;/a&gt;'s talk; his research has been a great influence on me.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/15418143-8366442318510878311?l=quantombone.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://quantombone.blogspot.com/feeds/8366442318510878311/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://quantombone.blogspot.com/2011/01/snowy-in-boston-sun-ny-at-mit.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/15418143/posts/default/8366442318510878311'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/15418143/posts/default/8366442318510878311'/><link rel='alternate' type='text/html' href='http://quantombone.blogspot.com/2011/01/snowy-in-boston-sun-ny-at-mit.html' title='snowy in Boston, SUn-ny at MIT'/><author><name>Tomasz Malisiewicz</name><uri>https://profiles.google.com/107912691630546731185</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='//lh6.googleusercontent.com/-ely56X4eTGQ/AAAAAAAAAAI/AAAAAAAAJ_g/VLmSVgSof5s/s512-c/photo.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://2.bp.blogspot.com/_6fdO7SbU8eM/TUJZNgcv0YI/AAAAAAAADH8/3bpqMB65OxM/s72-c/suns2011.png' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-15418143.post-4404587642202977431</id><published>2011-01-26T18:33:00.000-05:00</published><updated>2011-01-26T18:33:53.586-05:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='MATLAB'/><category scheme='http://www.blogger.com/atom/ns#' term='code'/><category scheme='http://www.blogger.com/atom/ns#' 
term='advice'/><title type='text'>if you are starting your research in the field of object recognition / object detection...</title><content type='html'>If you are an aspiring computer vision graduate student and hope to one day shatter the boundaries of machine perception, a good place to start is on the shoulders of giants. &amp;nbsp;A key ingredient to successful object recognition research is a powerful codebase, which you will hopefully one day outgrow and/or extend. &amp;nbsp;The single best place to get starter-code is at the following work, titled:&lt;br /&gt;&lt;span class="Apple-style-span" style="font-family: times;"&gt;&lt;/span&gt;&lt;br /&gt;&lt;h2 style="text-align: center;"&gt;Discriminatively Trained Deformable Part Models&lt;/h2&gt;&lt;br /&gt;&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://people.cs.uchicago.edu/~pff/latent/" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="212" src="http://people.cs.uchicago.edu/~pff/latent/2007_008221.jpg" width="320" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;br /&gt;Why not start with some easy-to-understand MATLAB code so you can start advancing your research this year, not this decade!?! &amp;nbsp;Also, if you are able to build on this work, you will have an easy time publishing object detection papers that will actually be treated seriously by contemporary vision researchers. &amp;nbsp;So my advice is to get voc-release-3.1, and read the following PAMI paper. &lt;br /&gt;&lt;br /&gt;P. Felzenszwalb, R. Girshick, D. McAllester, D. Ramanan&lt;br /&gt;&lt;b&gt;Object Detection with Discriminatively Trained Part Based Models&lt;/b&gt;&lt;br /&gt;IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 32, No. 
9, September 2010&lt;br /&gt;&lt;a href="http://people.cs.uchicago.edu/~pff/papers/lsvm-pami.pdf"&gt;pdf&lt;/a&gt;&amp;nbsp;&lt;span style="color: red;"&gt;&lt;a href="http://people.cs.uchicago.edu/~pff/latent"&gt;Source code&lt;/a&gt;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Be warned that &lt;a href="http://people.cs.uchicago.edu/~pff/"&gt;pff&lt;/a&gt; is probably smarter than you so you will not be able to understand 100% of everything he says, but because it is well-written code you will not have to understand all of it.&amp;nbsp;If you want to be a Vision Jedi, look at the code, read the paper, discard the downloaded code, and write it yourself.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/15418143-4404587642202977431?l=quantombone.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://quantombone.blogspot.com/feeds/4404587642202977431/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://quantombone.blogspot.com/2011/01/if-you-are-starting-your-research-in.html#comment-form' title='6 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/15418143/posts/default/4404587642202977431'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/15418143/posts/default/4404587642202977431'/><link rel='alternate' type='text/html' href='http://quantombone.blogspot.com/2011/01/if-you-are-starting-your-research-in.html' title='if you are starting your research in the field of object recognition / object detection...'/><author><name>Tomasz Malisiewicz</name><uri>https://profiles.google.com/107912691630546731185</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' 
src='//lh6.googleusercontent.com/-ely56X4eTGQ/AAAAAAAAAAI/AAAAAAAAJ_g/VLmSVgSof5s/s512-c/photo.jpg'/></author><thr:total>6</thr:total></entry><entry><id>tag:blogger.com,1999:blog-15418143.post-5244048245399773268</id><published>2010-12-31T02:12:00.000-05:00</published><updated>2010-12-31T02:12:27.567-05:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='3d recognition'/><category scheme='http://www.blogger.com/atom/ns#' term='berkeley'/><category scheme='http://www.blogger.com/atom/ns#' term='alex berg'/><category scheme='http://www.blogger.com/atom/ns#' term='mac os x'/><category scheme='http://www.blogger.com/atom/ns#' term='kinect'/><category scheme='http://www.blogger.com/atom/ns#' term='rgbd'/><category scheme='http://www.blogger.com/atom/ns#' term='Microsoft'/><category scheme='http://www.blogger.com/atom/ns#' term='xiaofeng ren'/><category scheme='http://www.blogger.com/atom/ns#' term='hacking'/><category scheme='http://www.blogger.com/atom/ns#' term='superpixel'/><title type='text'>why I should be hacking with a kinect</title><content type='html'>It was recently brought to my attention that &lt;a href="http://acberg.com/"&gt;Alex Berg a.k.a. Alexander Berg&lt;/a&gt; is &lt;a href="http://acberg.com/kinect/"&gt;hacking with a Kinect&lt;/a&gt;.&lt;br /&gt;In case you didn't know, Alex Berg is an assistant professor at Stony       Brook University as of Sept 2010.&amp;nbsp; He came out of Jitendra Malik's group, and can be thought of as my academic uncle (because he got his PhD with Jitendra at basically the same time as my advisor, &lt;a href="http://www.cs.cmu.edu/%7Eefros/"&gt;Alyosha Efros&lt;/a&gt;). 
I am a big fan of Alex Berg's work.&amp;nbsp; (See the paper at ECCV 2010: &lt;a href="http://acberg.com/papers/10000_categories_eccv2010.pdf"&gt;&lt;b&gt;What does classifying more than 10,000 image categories tell us?&lt;/b&gt;&lt;/a&gt; and note his upcoming workshop "Large Scale Learning for Vision" at CVPR 2011).&lt;br /&gt;&lt;br /&gt;I had already known that &lt;a href="http://seattle.intel-research.net/%7Exren/"&gt;Xiaofeng Ren&lt;/a&gt; has been hacking with RGB-D cameras such as the Kinect for some time now.&amp;nbsp; Xiaofeng (&lt;a href="http://www.pronouncenames.com/pronounce/xiaofeng"&gt;pronunciation of first name&lt;/a&gt;) Ren has been a research scientist at &lt;a href="http://www.seattle.intel-research.net/"&gt;Intel Labs Seattle&lt;/a&gt; since 2008 and on the affiliate faculty at the CSE department at UW since 2010.&amp;nbsp; He is another one of my many academic uncles and has contributed greatly to the field of Computer Vision.&amp;nbsp; For some of his recent work with Kinects, see his &lt;a href="http://ils.intel-research.net/projects/rgbd"&gt;RGB-D project page&lt;/a&gt;. Xiaofeng Ren's work has also been very influential in my own research -- it is worthwhile to recall that he coined the term "superpixels", which is prevalent in contemporary Computer Vision literature. &lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://www.xbox.com/en-US/kinect" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="179" src="http://nxeassets.xbox.com/shaxam/0201/e8/16/e816cf5b-acd6-4204-b158-142f7df17fb9.JPG?v=1#kinect_product_front.JPG" width="320" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;br /&gt;So when I learned that these bad-ass ex-Berkeley hackers are hacking with Kinects, I figured it was time to acquire one of my own. 
&lt;b&gt;I bought a Kinect today&lt;/b&gt; and plan on playing with &lt;a href="http://acberg.com/kinect/"&gt;Alex Berg's kinect2matlab interface for Mac OS X&lt;/a&gt; soon! &lt;br /&gt;&lt;br /&gt;&lt;b&gt;So, why aren't you hacking with a kinect?&lt;/b&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/15418143-5244048245399773268?l=quantombone.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://quantombone.blogspot.com/feeds/5244048245399773268/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://quantombone.blogspot.com/2010/12/why-i-should-be-hacking-with-kinect.html#comment-form' title='5 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/15418143/posts/default/5244048245399773268'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/15418143/posts/default/5244048245399773268'/><link rel='alternate' type='text/html' href='http://quantombone.blogspot.com/2010/12/why-i-should-be-hacking-with-kinect.html' title='why I should be hacking with a kinect'/><author><name>Tomasz Malisiewicz</name><uri>https://profiles.google.com/107912691630546731185</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='//lh6.googleusercontent.com/-ely56X4eTGQ/AAAAAAAAAAI/AAAAAAAAJ_g/VLmSVgSof5s/s512-c/photo.jpg'/></author><thr:total>5</thr:total></entry><entry><id>tag:blogger.com,1999:blog-15418143.post-4380536407491888016</id><published>2010-11-22T18:53:00.000-05:00</published><updated>2010-11-22T18:53:09.370-05:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='computer graphics'/><category scheme='http://www.blogger.com/atom/ns#' term='stanford'/><category scheme='http://www.blogger.com/atom/ns#' term='context'/><category scheme='http://www.blogger.com/atom/ns#' 
term='philosophy'/><category scheme='http://www.blogger.com/atom/ns#' term='visual memex'/><title type='text'>I, for one, welcome our new Visual Memex-based overlords</title><content type='html'>&lt;div style="text-align: center;"&gt;&lt;b&gt;Welcome to the era of visual intelligence -- the era of Visual Memex-based overlords (now in 3D!)&lt;/b&gt;&lt;br /&gt;&lt;b&gt;&amp;nbsp;&amp;nbsp;&lt;/b&gt;&lt;/div&gt;&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://1.bp.blogspot.com/_6fdO7SbU8eM/TOr6PLs1ZmI/AAAAAAAADHg/A_3evd49Qho/s1600/context_sa.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="115" src="http://1.bp.blogspot.com/_6fdO7SbU8eM/TOr6PLs1ZmI/AAAAAAAADHg/A_3evd49Qho/s400/context_sa.png" width="400" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;div class="" style="clear: both; text-align: center;"&gt;Image from&amp;nbsp;&amp;nbsp;&lt;a href="http://graphics.stanford.edu/~mdfisher/Data/Context.pdf"&gt;Context-Based Search for 3D Models&lt;/a&gt;, by&amp;nbsp;&lt;a href="http://graphics.stanford.edu/~mdfisher/"&gt;Matthew Fisher&lt;/a&gt;&amp;nbsp;and&amp;nbsp;&lt;a href="http://www.graphics.stanford.edu/~hanrahan/"&gt;Pat Hanrahan&lt;/a&gt;&lt;/div&gt;&lt;br /&gt;&lt;br /&gt;The goal of today's post is simple: to empower you, the reader, with an exciting and fresh perspective on the problem of visual reasoning. &amp;nbsp;This simple idea is one of the central tenets promulgated in my upcoming doctoral dissertation -- but I'd like to give this potent &lt;a href="http://en.wikipedia.org/wiki/Meme"&gt;meme&lt;/a&gt; a head start. &amp;nbsp;Visual Memex-style reasoning is not the kind of reasoning that is described in classic graduate-level textbooks on AI (e.g. first-order logic). 
&amp;nbsp;In the case that you've mentally over-fit to a graduate-level CS curriculum, you might even portray my iconoclastic views as the ramblings of a lunatic -- this is okay, I know at least&amp;nbsp;&lt;a href="http://en.wikipedia.org/wiki/Ludwig_Wittgenstein"&gt;Ludwig&lt;/a&gt; would be proud.&lt;br /&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;The Visual Memex is a mentality/perspective which, I believe, can overcome many limitations faced by modern computer vision systems. &amp;nbsp;What the Visual Memex can do for visual intelligence is akin to what the World Wide Web has done for knowledge (see Weinberger's excellent book &lt;a href="http://www.everythingismiscellaneous.com/"&gt;"Everything is&amp;nbsp;Miscellaneous"&lt;/a&gt; for the full argument). &amp;nbsp;It's akin to using Google for acquiring knowledge instead of going to the library -- maybe knowledge was never meant to be embedded in bookshelves. &amp;nbsp;The idea is&amp;nbsp;embarrassingly&amp;nbsp;simple: &lt;b&gt;replace visual object categories with object exemplars and relationships between those exemplars. &amp;nbsp;&lt;/b&gt;Maybe the linguistic categories that we (as humans) cannot seem to live without are mere shadows cast on the wall of a dark cave. &amp;nbsp;Psychologists have long abandoned rigid categories in their models of how humans think about concepts, but the notion of a class is so fundamental to contemporary Machine Learning that many haven't even bothered to question its tenuous foundations. &amp;nbsp;While categories (also referred to as classes) definitely make learning algorithms easier to formalize, maybe it's better to let the data speak for itself. 
&lt;b&gt;&amp;nbsp;Free the data!&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://www.cs.cmu.edu/~tmalisie/projects/nips09/memex2.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="113" src="http://www.cs.cmu.edu/~tmalisie/projects/nips09/memex2.png" width="400" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;br /&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;One upcoming research paper inspired by this category-free mentality is:&amp;nbsp;&lt;a href="http://graphics.stanford.edu/~mdfisher/Data/Context.pdf"&gt;Context-Based Search for 3D Models&lt;/a&gt;, by&amp;nbsp;&lt;a href="http://graphics.stanford.edu/~mdfisher/"&gt;Matthew Fisher&lt;/a&gt; and&amp;nbsp;&lt;a href="http://www.graphics.stanford.edu/~hanrahan/"&gt;Pat Hanrahan&lt;/a&gt;, of Stanford University. &amp;nbsp;This paper will be presented at &lt;a href="http://www.siggraph.org/asia2010/"&gt;SIGGRAPH Asia 2010&lt;/a&gt;. &amp;nbsp;Maybe it is time to abandon those rigid categories and memexify your own research problem?&lt;br /&gt;&lt;br /&gt;Further reading:&lt;br /&gt;&lt;br /&gt;&lt;ul&gt;&lt;li&gt;The Classic:&amp;nbsp;&lt;a href="http://en.wikipedia.org/wiki/Memex"&gt;Vannevar Bush's Memex&lt;/a&gt;&lt;/li&gt;&lt;li&gt;&lt;a href="http://en.wikipedia.org/wiki/Allegory_of_the_Cave"&gt;Plato's Allegory of the Cave&lt;/a&gt;&lt;/li&gt;&lt;li&gt;2D object recognition:&amp;nbsp;&lt;a href="http://www.cs.cmu.edu/~tmalisie/projects/nips09/"&gt;Beyond Categories: The Visual Memex Model for Reasoning About Object Relationships&lt;/a&gt;&lt;/li&gt;&lt;li&gt;3D object modeling:&amp;nbsp;&lt;a href="http://graphics.stanford.edu/~mdfisher/Data/Context.pdf"&gt;Context-Based Search for 3D Models&lt;/a&gt;&lt;/li&gt;&lt;/ul&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' 
src='https://blogger.googleusercontent.com/tracker/15418143-4380536407491888016?l=quantombone.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://quantombone.blogspot.com/feeds/4380536407491888016/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://quantombone.blogspot.com/2010/11/i-for-one-welcome-our-new-visual-memex.html#comment-form' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/15418143/posts/default/4380536407491888016'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/15418143/posts/default/4380536407491888016'/><link rel='alternate' type='text/html' href='http://quantombone.blogspot.com/2010/11/i-for-one-welcome-our-new-visual-memex.html' title='I, for one, welcome our new Visual Memex-based overlords'/><author><name>Tomasz Malisiewicz</name><uri>https://profiles.google.com/107912691630546731185</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='//lh6.googleusercontent.com/-ely56X4eTGQ/AAAAAAAAAAI/AAAAAAAAJ_g/VLmSVgSof5s/s512-c/photo.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://1.bp.blogspot.com/_6fdO7SbU8eM/TOr6PLs1ZmI/AAAAAAAADHg/A_3evd49Qho/s72-c/context_sa.png' height='72' width='72'/><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-15418143.post-3753884459163892021</id><published>2010-11-13T13:27:00.000-05:00</published><updated>2010-11-13T13:27:38.335-05:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='rant'/><category scheme='http://www.blogger.com/atom/ns#' term='cvpr'/><category scheme='http://www.blogger.com/atom/ns#' term='publishing'/><title type='text'>CVPR, the A+'s of yesteryear, and robots need us</title><content type='html'>It is November yet again, and I'm proud to announce my last &lt;a 
href="http://cvpr2011.org/"&gt;CVPR&lt;/a&gt; submission as a graduate student!&amp;nbsp; It is that time of the year again -- the post-CVPR downtime.&amp;nbsp; It is time to mentally tuck away the fruits of our labor (&lt;i&gt;NOTE: you might want to create a readme.txt which explains how to use the 20,000 lines of code you wrote in the 7 days preceding the deadline&lt;/i&gt;), consider the long-term impact of our work, and perhaps even reconsider our position in life.&lt;br /&gt;&lt;br /&gt;I want to build intelligent machines, and I feel &lt;b&gt;vision is the right place to start&lt;/b&gt; -- even roboticists such as &lt;a href="http://people.csail.mit.edu/brooks/"&gt;Rodney Brooks&lt;/a&gt; started out in vision. However, I don't feel churning out 'cute' CVPR papers is going to do much.&amp;nbsp; Perhaps if all one cares about in life is getting tenure at a top ranked university, then proof-of-concept papers might be the path of least resistance.&amp;nbsp; But remember when you were a teen, and you wanted to build a rocket which lets you travel at relativistic speeds -- allowing you to go back in time?&amp;nbsp; Or remember when you wanted to build those humanoid robots that would both entertain your kid sister and help out your mother with house chores?&lt;br /&gt;&lt;br /&gt;So why did so many intelligent people I know abandon those grand dreams and settle for bread crumbs?&amp;nbsp; Getting your paper submitted to a peer-reviewed conference, so that you can pad your CV with another publication, is incommensurable with the dreams you once had.&amp;nbsp; The publication of today is the A+ of yesteryear, and it is just way too easy for us, intellectuals, to stay comfortable with those A's, without asking for more.&amp;nbsp; But &lt;b&gt;robots need us&lt;/b&gt;; CVPR papers won't assemble themselves into intelligent machines.&lt;br /&gt;&lt;br /&gt;But the deadline is over, and now it's time to relax.&amp;nbsp; If my rant did not make sense to you, then I 
envy you.&amp;nbsp; I have to move on to more positive things -- I need to finish reading Pinker's Blank Slate, read some more Wittgenstein (and fully assimilate his criticism of Augustine's theory of language-acquisition), waste two days playing with the Riemann Zeta function (because the Basel problem was only the beginning), play some guitar, etc.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/15418143-3753884459163892021?l=quantombone.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://quantombone.blogspot.com/feeds/3753884459163892021/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://quantombone.blogspot.com/2010/11/cvpr-as-of-yesteryear-and-robots-need.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/15418143/posts/default/3753884459163892021'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/15418143/posts/default/3753884459163892021'/><link rel='alternate' type='text/html' href='http://quantombone.blogspot.com/2010/11/cvpr-as-of-yesteryear-and-robots-need.html' title='CVPR, the A+&apos;s of yesteryear, and robots need us'/><author><name>Tomasz Malisiewicz</name><uri>https://profiles.google.com/107912691630546731185</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='//lh6.googleusercontent.com/-ely56X4eTGQ/AAAAAAAAAAI/AAAAAAAAJ_g/VLmSVgSof5s/s512-c/photo.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-15418143.post-5582856687814396312</id><published>2010-08-25T14:21:00.000-05:00</published><updated>2010-08-25T14:21:24.733-05:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='MIT'/><category scheme='http://www.blogger.com/atom/ns#' term='knowledge'/><category 
scheme='http://www.blogger.com/atom/ns#' term='marvin minsky'/><category scheme='http://www.blogger.com/atom/ns#' term='artificial intelligence'/><category scheme='http://www.blogger.com/atom/ns#' term='david marr'/><title type='text'>Multifaceted Knowledge Representation: Ideas from Marvin Minsky</title><content type='html'>"I think a key to AI is the need for several representations of the knowledge, such that when the system is stuck (using one representation) it can jump to use another.  When &lt;a href="http://en.wikipedia.org/wiki/David_Marr_%28neuroscientist%29"&gt;David Marr&lt;/a&gt; at MIT moved into computer vision, he generated a lot of excitement, but he hit up against the problem of knowledge representation; he had no good representations for knowledge in his vision systems." -- &lt;a href="http://web.media.mit.edu/%7Eminsky/"&gt;Marvin Minsky&lt;/a&gt;&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://web.media.mit.edu/%7Eminsky/minsky.gif" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" src="http://web.media.mit.edu/%7Eminsky/minsky.gif" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;br /&gt;Check out the &lt;a href="http://mitpress.mit.edu/e-books/Hal/chap2/two1.html"&gt;full interview with Marvin Minsky here&lt;/a&gt; -- a must read for anybody serious about building intelligent machines!&amp;nbsp; This interview appears to be a part of a larger volume: &lt;a href="http://mitpress.mit.edu/e-books/hal/"&gt;Hal's Legacy&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;I believe that in order to make the enterprise of computer vision a success, we must seriously broaden our outlook on the problem.&amp;nbsp; Are we seriously expecting algorithms to delineate object boundaries from real images based on statistics of patch descriptors without any sort of model of the world?&lt;br /&gt;&lt;br /&gt;I don't know about you, but I seriously want to build intelligent machines.&amp;nbsp; I don't think 
there will ever be any sort of low-level SIFT-esque algorithm that "solves vision."&amp;nbsp; It is a much grander picture of intelligence that I'm really after -- and successful computer vision will be a result(component?) of a higher-level intelligent machine.&amp;nbsp; Machines need to know about a whole lot more than is found in a single image -- and the necessary conceptual tools might not be present in the computer vision community.&lt;br /&gt;&lt;br /&gt;A recurring theme in my blog is my belief that we must become renaissance men -- a unison of *nix hackers, vision scientists, cognitive scientists, philosophers, athletes, machine learning scientists, skilled orators, and much more -- if we are to have any hope of chiseling away at the problem of computational intelligence.&amp;nbsp; Minsky was a pioneer of computational intelligence, and his words revitalize my own research efforts in this direction.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/15418143-5582856687814396312?l=quantombone.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://quantombone.blogspot.com/feeds/5582856687814396312/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://quantombone.blogspot.com/2010/08/multifaceted-knowledge-representation.html#comment-form' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/15418143/posts/default/5582856687814396312'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/15418143/posts/default/5582856687814396312'/><link rel='alternate' type='text/html' href='http://quantombone.blogspot.com/2010/08/multifaceted-knowledge-representation.html' title='Multifaceted Knowledge Representation: Ideas from Marvin Minsky'/><author><name>Tomasz 
Malisiewicz</name><uri>https://profiles.google.com/107912691630546731185</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='//lh6.googleusercontent.com/-ely56X4eTGQ/AAAAAAAAAAI/AAAAAAAAJ_g/VLmSVgSof5s/s512-c/photo.jpg'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-15418143.post-5747330814074565284</id><published>2010-08-23T12:24:00.001-05:00</published><updated>2010-08-24T02:10:39.683-05:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='MIT'/><category scheme='http://www.blogger.com/atom/ns#' term='image parsing'/><category scheme='http://www.blogger.com/atom/ns#' term='computer graphics'/><category scheme='http://www.blogger.com/atom/ns#' term='blocks world'/><category scheme='http://www.blogger.com/atom/ns#' term='CMU'/><category scheme='http://www.blogger.com/atom/ns#' term='machine perception'/><category scheme='http://www.blogger.com/atom/ns#' term='image understanding'/><category scheme='http://www.blogger.com/atom/ns#' term='object recognition'/><category scheme='http://www.blogger.com/atom/ns#' term='roberts'/><category scheme='http://www.blogger.com/atom/ns#' term='inverse optics'/><category scheme='http://www.blogger.com/atom/ns#' term='appearance'/><category scheme='http://www.blogger.com/atom/ns#' term='geometry'/><title type='text'>Beyond pixel-wise labeling: Blocks World Revisited</title><content type='html'>&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;/div&gt;&lt;div style="text-align: center;"&gt;&lt;b&gt;"Thoughts without content are empty, intuitions without concepts are blind.&lt;/b&gt;" -- &lt;a href="http://en.wikipedia.org/wiki/Immanuel_Kant"&gt;Immanuel Kant&lt;/a&gt;&amp;nbsp;&lt;/div&gt;&lt;br /&gt;The Holy Grail problem of computer vision research is general-purpose image understanding.&amp;nbsp; Given as input a digital image (perhaps from Flickr or from Google Image search), we want 
to recognize the depicted objects (cars, dogs, sheep, Macbook Pros), their functional properties (which of the depicted objects are suitable for sitting), and recover the underlying geometry and spatial relations (which objects are lying on the desk).&amp;nbsp; &lt;br /&gt;&lt;br /&gt;The early days of vision were dominated by the "&lt;b&gt;Image Understanding as Inverse Optics&lt;/b&gt;" mentality.&amp;nbsp; In order to make the problem easier, as well as to cope with the meager computational resources of the 60s, early computer vision researchers tried to recover the 3D geometry of simple scenes consisting of arrangements of &lt;b&gt;blocks&lt;/b&gt;.&amp;nbsp; One of the earlier efforts in this direction is the PhD thesis &lt;a href="http://dspace.mit.edu/handle/1721.1/11589"&gt;Machine Perception of Three-Dimensional Solids&lt;/a&gt; by &lt;a href="http://www.packet.cc/"&gt;Larry Roberts&lt;/a&gt; from MIT back in 1963.&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://www.packet.cc/images/mach-per-fig2.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="320" src="http://www.packet.cc/images/mach-per-fig2.jpg" width="256" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;br /&gt;But wait -- these block-worlds are unlike anything found in the real world!&amp;nbsp; The drastic divide between the imagery that vision researchers were studying in the 60s and what humans observe during their daily experiences ultimately led to the disappearance of block-worlds in computer vision research.&lt;br /&gt;&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;/div&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;/div&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://computerblindness.blogspot.com/" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="197" 
src="http://3.bp.blogspot.com/_IDEWI0P9RbA/TB6VYXUN_CI/AAAAAAAAACA/DllZ1rwWktA/s320/semsegm.png" width="320" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;div style="text-align: center;"&gt;Image Parsing Concept Image from &lt;a href="http://computerblindness.blogspot.com/"&gt;Computer Blindness Blog&lt;/a&gt; &lt;/div&gt;&lt;br /&gt;Over the past couple of decades, we have seen the success of Machine Learning, and it is no surprise that we are currently living in the "&lt;b&gt;Image Understanding as statistical inference&lt;/b&gt;" era.&amp;nbsp; While a single 256x256 grayscale image might have been okay to use in the 1960s, today's computer vision researchers use powerful computer clusters and do serious heavy-lifting on millions of real-world megapixel images.&amp;nbsp; The man-made blocks-world of the 1960s is a thing of the past, and the variety found in random images downloaded from Flickr is the complexity we must now cope with. &lt;br /&gt;&lt;br /&gt;&lt;hr /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://www.cs.cmu.edu/%7Eabhinavg/blocksworld/" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="78" src="http://www.cs.cmu.edu/%7Eabhinavg/blocksworld/img/p3.jpg" width="320" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;br /&gt;While the style of computer vision research has shifted since its early days in the 1960s/1970s,&amp;nbsp; many old ideas (perhaps prematurely considered outdated) are making a comeback! 
&lt;br /&gt;&lt;br /&gt;Assigning basic-level object category labels to pixels is a very popular theme in vision.&amp;nbsp; Unfortunately, to gain a deeper understanding of an image, &lt;b&gt;robots will inevitably have to go beyond pixel-level class labels&lt;/b&gt;.&amp;nbsp; (This is one of the central themes in my thesis -- coming out soon!)&amp;nbsp; Given human-level understanding of a scene, it is trivial to represent it as a pixel-wise labeling map, but given a pixel-wise labeling map it is not trivial to convert it to human-level understanding.&amp;nbsp; &lt;br /&gt;&lt;br /&gt;What sort of questions &lt;b&gt;can be answered&lt;/b&gt; about a scene when the output of an "image understanding" system is represented as a pixel-wise label map?&lt;br /&gt;&lt;br /&gt;1. Is there a car in the image?&lt;br /&gt;2. Is there a person at this location in the image?&lt;br /&gt;&lt;br /&gt;What questions&lt;b&gt; cannot be answered &lt;/b&gt;given a pixel-wise label map?&lt;br /&gt;&lt;br /&gt;1. How many cars are in this image? (While there are some approaches that strive to deal with delineating object instance boundaries, most image parsing approaches fail to recognize boundaries between two instances of the same category)&lt;br /&gt;2. Which surfaces can I sit on?&lt;br /&gt;3. Where can I park my car?&lt;br /&gt;4. 
How geometrically stable are the objects in the scene?&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;While I have more criticisms than tentative solutions, I believe that vision students shouldn't be parochially preoccupied with solely the most recent approach to image understanding.&amp;nbsp; It is valuable to go back several decades in the literature and gain a broader perspective on image understanding.&amp;nbsp; However, some progress is being made!&amp;nbsp; A deeply insightful upcoming paper from ECCV 2010 is the following:&lt;br /&gt;&lt;br /&gt;&lt;a href="http://www.cs.cmu.edu/%7Eabhinavg"&gt;Abhinav Gupta&lt;/a&gt;, &lt;a href="http://www.cs.cmu.edu/%7Eefros"&gt;Alexei A. Efros&lt;/a&gt; and &lt;a href="http://www.cs.cmu.edu/%7Ehebert"&gt;Martial Hebert&lt;/a&gt;, &lt;a href="http://www.cs.cmu.edu/%7Eabhinavg/blocksworld/"&gt;&lt;b&gt;Blocks World Revisited: Image Understanding Using Qualitative Geometry and Mechanics&lt;/b&gt;&lt;/a&gt;, European Conference on Computer Vision, 2010. &lt;a href="http://www.cs.cmu.edu/%7Eabhinavg/blocksworld/blocksworld.pdf"&gt;(PDF)&lt;/a&gt;    &lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://www.cs.cmu.edu/%7Eabhinavg/blocksworld/" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="152" src="http://www.cs.cmu.edu/%7Eabhinavg/blocksworld/img/fig3.png" width="320" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;br /&gt;&lt;br /&gt;What Abhinav Gupta does very elegantly in this paper is connect the blocks-world research of the 1960s with the geometric-class estimation problem, as introduced by &lt;a href="http://www.cs.uiuc.edu/homes/dhoiem/"&gt;Derek Hoiem&lt;/a&gt;.&amp;nbsp; While the final system is evaluated on a Hoiem-like pixel-wise labeling task, the actual scene representation is 3D.&amp;nbsp; The blocks in this approach are more abstract than the Lego-like volumes in the 1960s -- Abhinav's blocks are actually cars, 
buildings, and trees. I included the famous Immanuel Kant quote, because I feel it describes Abhinav's work very well.&amp;nbsp; Abhinav introduces the block as a theoretical construct which glues together a scene's elements and provides a much more solid interpretation -- Abhinav's blocks add the content to geometric image understanding which is lacking in the purely pixel-wise approaches.&lt;br /&gt;&lt;br /&gt;While integrating large-scale categorization into this type of geometric reasoning is still an open problem, Abhinav provides us visionaries with a glimpse of what image understanding should be.&amp;nbsp; The integration of robotics with image understanding technology will surely drive pixel-based "dumb" image understanding approaches to extinction.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/15418143-5747330814074565284?l=quantombone.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://quantombone.blogspot.com/feeds/5747330814074565284/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://quantombone.blogspot.com/2010/08/beyond-pixel-wise-labeling-blocks-world.html#comment-form' title='7 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/15418143/posts/default/5747330814074565284'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/15418143/posts/default/5747330814074565284'/><link rel='alternate' type='text/html' href='http://quantombone.blogspot.com/2010/08/beyond-pixel-wise-labeling-blocks-world.html' title='Beyond pixel-wise labeling: Blocks World Revisited'/><author><name>Tomasz Malisiewicz</name><uri>https://profiles.google.com/107912691630546731185</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' 
src='//lh6.googleusercontent.com/-ely56X4eTGQ/AAAAAAAAAAI/AAAAAAAAJ_g/VLmSVgSof5s/s512-c/photo.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://3.bp.blogspot.com/_IDEWI0P9RbA/TB6VYXUN_CI/AAAAAAAAACA/DllZ1rwWktA/s72-c/semsegm.png' height='72' width='72'/><thr:total>7</thr:total></entry><entry><id>tag:blogger.com,1999:blog-15418143.post-3484069061421598529</id><published>2010-06-17T17:30:00.000-05:00</published><updated>2010-06-17T17:30:47.344-05:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='cvpr 2010'/><category scheme='http://www.blogger.com/atom/ns#' term='papers'/><category scheme='http://www.blogger.com/atom/ns#' term='cvpr'/><title type='text'>more papers to check out from cvpr</title><content type='html'>Here are more CVPR 2010 papers which I either found interesting or plan on reading when I get back to PIT.&amp;nbsp; Enjoy!&lt;br /&gt;&lt;br /&gt;&lt;span id="ctl00_cph_sessionDetails_Label1" style="font-size: small;"&gt;&lt;a href="http://www.socher.org/uploads/Main/SocherFeiFei_CVPR2010.pdf"&gt;&lt;b&gt;Connecting Modalities:                          Semi-supervised Segmentation and Annotation of  Images                          Using Unaligned Text Corpora&lt;/b&gt;&lt;/a&gt; &lt;/span&gt;                                                                  &lt;span id="ctl00_cph_sessionDetails_Label1" style="font-size: small;"&gt;&amp;nbsp;&lt;/span&gt;&lt;br /&gt;&lt;span id="ctl00_cph_sessionDetails_Label1" style="font-size: small;"&gt;Authors:&amp;nbsp; Richard Socher                          (Stanford University) , Li Fei-Fei (Stanford   University)&amp;nbsp;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;b&gt;&lt;a href="http://people.cs.uchicago.edu/%7Erbg/papers/Cascade-Object-Detection-with-Deformable-Part-Models--Felzenszwalb-Girshick-McAllester.pdf"&gt;Cascade Object  Detection                         with Deformable Part Models &amp;nbsp;&lt;/a&gt;&lt;/b&gt;                                      
                            &lt;br /&gt;&lt;span id="ctl00_cph_sessionDetails_Label1" style="font-size: small;"&gt;Authors:&amp;nbsp; Pedro                          Felzenszwalb (University of Chicago) , Ross  Girshick                          (University ) , David McAllester (Toyota   Technological Institute, Chicago)&amp;nbsp;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;a href="http://www.cs.umd.edu/%7Ebehjat/papers/CVPR10.pdf"&gt;&lt;span id="ctl00_cph_sessionDetails_Label1" style="font-size: small;"&gt;&lt;b&gt;Beyond Active Noun  Tagging:                         Modeling Contextual Interactions for  Multi-Class  Active                         Learning &lt;/b&gt;&lt;/span&gt;&lt;/a&gt;                                                                  &lt;span id="ctl00_cph_sessionDetails_Label1" style="font-size: small;"&gt;&amp;nbsp;&lt;/span&gt;&lt;br /&gt;&lt;span id="ctl00_cph_sessionDetails_Label1" style="font-size: small;"&gt;Authors:&amp;nbsp; Behjat Siddiquie                          (UMIACS) , Abhinav Gupta (Carnegie Mellon   University)&amp;nbsp;&amp;nbsp;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;b&gt;&lt;a href="http://people.cs.uchicago.edu/%7Epff/papers/dpseg9.pdf"&gt;Tiered Scene Labeling  with                         Dynamic Programming&amp;nbsp;&amp;nbsp;&lt;/a&gt;&lt;/b&gt;&lt;br /&gt;&lt;span id="ctl00_cph_sessionDetails_Label1" style="font-size: small;"&gt;Authors:&amp;nbsp; Pedro                          Felzenszwalb (University of Chicago) , Olga  Veksler                          (University of Western Ontario)&amp;nbsp;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span id="ctl00_cph_sessionDetails_Label1" style="font-size: small;"&gt;&lt;a href="http://www.ics.uci.edu/%7Edramanan/papers/layers.pdf"&gt;&lt;b&gt;Layered Object  Detection for                         Multi-Class Segmentation&lt;/b&gt;&lt;/a&gt; &lt;/span&gt;                                                                  &lt;span id="ctl00_cph_sessionDetails_Label1" style="font-size: 
small;"&gt;&amp;nbsp;&lt;/span&gt;&lt;br /&gt;&lt;span id="ctl00_cph_sessionDetails_Label1" style="font-size: small;"&gt;Authors:&amp;nbsp; Yi Yang (UCI) ,                          Sam Hallman () , Deva Ramanan () , Charless   Fowlkes (UC                         Irvine)&amp;nbsp;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;b&gt;&lt;span id="ctl00_cph_sessionDetails_Label1" style="font-size: small;"&gt;&lt;a href="http://ai.stanford.edu/%7Epawan/kumar10.pdf"&gt;Efficiently Selecting                          Regions for Scene Understanding&lt;/a&gt;&amp;nbsp;&lt;/span&gt;&lt;span id="ctl00_cph_sessionDetails_Label1" style="font-size: small;"&gt;&amp;nbsp;&lt;/span&gt;&lt;/b&gt;&lt;br /&gt;&lt;span id="ctl00_cph_sessionDetails_Label1" style="font-size: small;"&gt;Authors:&amp;nbsp; M. Pawan Kumar                          (Stanford University) , Daphne Koller (Stanford)                        &lt;/span&gt;&lt;span id="ctl00_cph_sessionDetails_Label1" style="font-size: small;"&gt;&amp;nbsp;&lt;/span&gt;&lt;span id="ctl00_cph_sessionDetails_Label1" style="font-size: small;"&gt;&amp;nbsp;&lt;/span&gt;&lt;span id="ctl00_cph_sessionDetails_Label1" style="font-size: small;"&gt; &lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;a href="http://graphics.stanford.edu/projects/lgl/papers/hgoag-iwcecic-10/hgoag-iwcecic-10.pdf"&gt;&lt;span style="font-size: small;"&gt;&lt;span id="ctl00_cph_sessionDetails_Label1"&gt;&lt;b&gt;Image Webs: Computing and                         Exploiting Connectivity in Image Collections&lt;/b&gt;&lt;/span&gt;&lt;/span&gt;&lt;/a&gt;&lt;br /&gt;&lt;span style="font-size: small;"&gt;&lt;span id="ctl00_cph_sessionDetails_Label1"&gt;&lt;/span&gt;&lt;span id="ctl00_cph_sessionDetails_Label1"&gt;Authors:&amp;nbsp; Kyle Heath                         (Stanford) , Natasha Gelfand (Nokia Research -  Palo                         Alto, CA) , Maks Ovsjanikov (Stanford  University) ,                         Mridul Aanjaneya (Stanford University) ,  Leonidas                      
   Guibas (Stanford University) &lt;/span&gt;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span id="ctl00_cph_sessionDetails_Label1" style="font-size: small;"&gt;&lt;/span&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/15418143-3484069061421598529?l=quantombone.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://quantombone.blogspot.com/feeds/3484069061421598529/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://quantombone.blogspot.com/2010/06/more-papers-to-check-out-from-cvpr.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/15418143/posts/default/3484069061421598529'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/15418143/posts/default/3484069061421598529'/><link rel='alternate' type='text/html' href='http://quantombone.blogspot.com/2010/06/more-papers-to-check-out-from-cvpr.html' title='more papers to check out from cvpr'/><author><name>Tomasz Malisiewicz</name><uri>https://profiles.google.com/107912691630546731185</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='//lh6.googleusercontent.com/-ely56X4eTGQ/AAAAAAAAAAI/AAAAAAAAJ_g/VLmSVgSof5s/s512-c/photo.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-15418143.post-4755842676007115933</id><published>2010-06-13T20:54:00.000-05:00</published><updated>2010-06-13T20:54:58.810-05:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='soup of segments'/><category scheme='http://www.blogger.com/atom/ns#' term='3d recognition'/><category scheme='http://www.blogger.com/atom/ns#' term='graph cuts'/><category scheme='http://www.blogger.com/atom/ns#' term='cvpr 2010'/><category scheme='http://www.blogger.com/atom/ns#' term='image 
segmentation'/><category scheme='http://www.blogger.com/atom/ns#' term='inference'/><category scheme='http://www.blogger.com/atom/ns#' term='multiple segmentations'/><title type='text'>constrained parametric min-cuts: exciting segmentation for the sake of recognition</title><content type='html'>I would like to introduce two papers about &lt;b&gt;Constrained Parametric Min-Cuts&lt;/b&gt; from C. Sminchisescu's group.&amp;nbsp; These papers are very relevant to my research direction (which lies at the intersection of segmentation and recognition).&amp;nbsp; Like my own work, these papers are about &lt;b&gt;segmentation for recognition's sake&lt;/b&gt;.&amp;nbsp; The segmentation algorithm proposed in the paper is a sort of "segment sliding approach", where many binary graph-cuts optimization problems are solved for different Grab-Cut style initializations.&amp;nbsp; These segments are then scored using a learned scoring function -- think regression versus classification.&amp;nbsp; They show that these top segments are actually quite meaningful and correspond to object boundaries really well.&amp;nbsp; Finally, a tractable number of top hypotheses (still overlapping at this stage) are piped into a recognition engine.&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://2.bp.blogspot.com/_6fdO7SbU8eM/TBWJU4qZQ9I/AAAAAAAADFw/aO6QZhYjPFk/s1600/constrained_cuts.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="160" src="http://2.bp.blogspot.com/_6fdO7SbU8eM/TBWJU4qZQ9I/AAAAAAAADFw/aO6QZhYjPFk/s320/constrained_cuts.png" width="320" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;br /&gt;The idea that features derived from segments are better for recognition than features from the spatial support of a sliding rectangle resonates in all of my papers.&amp;nbsp; Regarding these CVPR2010 papers, I like their ideas of learning a category-free "segmentation-function" and the sort of multiple-segmentation 
version of this algorithm is very appealing.&amp;nbsp; If I remember correctly, the idea of learning a segmentation function comes to us from X. Ren, and the idea of using multiple segmentations comes from D. Hoiem. These papers combine both insights in a cool new way.&lt;br /&gt;&lt;br /&gt;&lt;a href="http://sminchisescu.ins.uni-bonn.de/people/carreira.html"&gt;J.&amp;nbsp;Carreira&lt;/a&gt;  and &lt;a href="http://sminchisescu.ins.uni-bonn.de/"&gt;C.&amp;nbsp;Sminchisescu&lt;/a&gt;.   &lt;a href="http://sminchisescu.ins.uni-bonn.de/papers/cs-cvpr10.pdf"&gt;Constrained  Parametric Min-Cuts for Automatic Object Segmentation&lt;/a&gt;.  In CVPR  2010. &lt;br /&gt;&lt;br /&gt;&lt;a href="http://sminchisescu.ins.uni-bonn.de/people/li.html"&gt;F.&amp;nbsp;Li&lt;/a&gt;,  &lt;a href="http://sminchisescu.ins.uni-bonn.de/people/carreira.html"&gt;J.&amp;nbsp;Carreira&lt;/a&gt;,  and &lt;a href="http://sminchisescu.ins.uni-bonn.de/"&gt;C.&amp;nbsp;Sminchisescu&lt;/a&gt;.  &lt;a href="http://sminchisescu.ins.uni-bonn.de/papers/cls-cvpr10.pdf"&gt;  Object Recognition as Ranking Holistic Figure-Ground Hypotheses.&lt;/a&gt;  In  CVPR 2010.&lt;br /&gt;&lt;b&gt;&lt;br /&gt;&lt;/b&gt;&lt;br /&gt;&lt;b&gt;-------&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;Spotlights for these papers are during these tracks at CVPR2010:&lt;br /&gt;&lt;b&gt;Object Recognition III: Similar Shapes&lt;/b&gt;&lt;br /&gt;&lt;b&gt;Segmentation and Grouping II: Semantic Segmentation tracks&lt;/b&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/15418143-4755842676007115933?l=quantombone.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://quantombone.blogspot.com/feeds/4755842676007115933/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://quantombone.blogspot.com/2010/06/constrained-parametric-min-cuts.html#comment-form' title='1 Comments'/><link 
rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/15418143/posts/default/4755842676007115933'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/15418143/posts/default/4755842676007115933'/><link rel='alternate' type='text/html' href='http://quantombone.blogspot.com/2010/06/constrained-parametric-min-cuts.html' title='constrained parametric min-cuts: exciting segmentation for the sake of recognition'/><author><name>Tomasz Malisiewicz</name><uri>https://profiles.google.com/107912691630546731185</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='//lh6.googleusercontent.com/-ely56X4eTGQ/AAAAAAAAAAI/AAAAAAAAJ_g/VLmSVgSof5s/s512-c/photo.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://2.bp.blogspot.com/_6fdO7SbU8eM/TBWJU4qZQ9I/AAAAAAAADFw/aO6QZhYjPFk/s72-c/constrained_cuts.png' height='72' width='72'/><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-15418143.post-8712174900135998739</id><published>2010-06-13T15:25:00.000-05:00</published><updated>2010-06-13T15:25:30.709-05:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='sven'/><category scheme='http://www.blogger.com/atom/ns#' term='perception'/><category scheme='http://www.blogger.com/atom/ns#' term='image segmentation'/><category scheme='http://www.blogger.com/atom/ns#' term='grouping'/><category scheme='http://www.blogger.com/atom/ns#' term='abstraction'/><title type='text'>Sven Dickinson at POCV 2010, ACVHL Tomorrow</title><content type='html'>This morning, &lt;a href="http://www.cs.toronto.edu/%7Esven/"&gt;Sven Dickinson&lt;/a&gt; gave a talk to start the &lt;a href="http://cvl.cse.sc.edu/pocv.html"&gt;POCV 2010 Workshop&lt;/a&gt; at CVPR2010.&amp;nbsp; For those of you who might not know, POCV stands for Perceptual Organization in Computer Vision.&amp;nbsp; While segmentation can be thought of as a perceptual 
grouping process, contiguous regions don't have to be the end product of a meaningful perceptual grouping process.&amp;nbsp; There are many popular and useful algorithms which group non-accidental contours yet fall short of a full-blown image segmentation.&lt;br /&gt;&lt;br /&gt;The title of Dickinson's talk was "The Role of Intermediate Shape Priors in Perceptual Grouping and Image Abstraction." In the beginning of his talk, Sven pointed out how perceptual organization was at its prime in the mid-90s and declined in the 2000s due to the popularity of machine learning and the "detection" task.&amp;nbsp; He believes that good perceptual grouping is what is going to make vision scale -- that is, without first squeezing all that we can out of the bottom level we are doomed to fail.&lt;br /&gt;&lt;br /&gt;Dickinson showed some nice results from his most recent research efforts where objects are broken down into generic "parts" -- this reminded me of Biederman's geons, although Sven's fitting is done in the 2D image plane.&amp;nbsp; Sven emphasized that successful shape primitives must be category-independent if we are to have scalable recognition of thousands of visual concepts in images.&amp;nbsp; This is very different from the mainstream per-category object detection task which has been popularized by contests such as the PASCAL VOC.&lt;br /&gt;&lt;br /&gt;While I personally believe that there is a good place for perceptual organization in vision, I wouldn't view it as the Holy Grail.&amp;nbsp; It is perhaps the Holy Bridge we must inevitably cross on the way to finding the Holy Grail.&amp;nbsp; I believe that for full-grown fully-functional members of society, our ability to effortlessly cope with the world is chiefly due to its simplicity and repeatability, and not due to some amazing internal perceptual organization algorithm.&amp;nbsp; Perhaps it is when we were children -- viewing the world through a psychedelic fog of innocence -- that perceptual grouping 
helped us cut up the world into meaningful entities.&lt;br /&gt;&lt;br /&gt;A common theme in Sven's talk was the idea of Learning to Group in a category-independent way.&amp;nbsp; This means that all of the successes of Machine Learning aren't thrown out the door, and this appears to be quite a different way of grouping than what was done in the 1970s.&lt;br /&gt;&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://www.salon.com/tech/feature/2006/07/24/turks/story.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="219" src="http://www.salon.com/tech/feature/2006/07/24/turks/story.jpg" width="320" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;br /&gt;Tomorrow I will be at the &lt;a href="http://ttic.uchicago.edu/%7Edparikh/acvhl2010.htm"&gt;ACVHL Workshop "Advancing Computer Vision with Humans in the Loop"&lt;/a&gt;.&amp;nbsp; I haven't personally "turked" yet, but I feel I will be jumping on the bandwagon soon.&amp;nbsp; Anyways, the keynote speakers should make for an awesome workshop.&amp;nbsp; They do not need introductions: David Forsyth, Aude Oliva, Fei-Fei Li, Antonio Torralba, and Serge Belongie -- all influential visionaries.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/15418143-8712174900135998739?l=quantombone.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://quantombone.blogspot.com/feeds/8712174900135998739/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://quantombone.blogspot.com/2010/06/sven-dickinson-at-pocv-2010-acvhl.html#comment-form' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/15418143/posts/default/8712174900135998739'/><link rel='self' type='application/atom+xml' 
href='http://www.blogger.com/feeds/15418143/posts/default/8712174900135998739'/><link rel='alternate' type='text/html' href='http://quantombone.blogspot.com/2010/06/sven-dickinson-at-pocv-2010-acvhl.html' title='Sven Dickinson at POCV 2010, ACVHL Tomorrow'/><author><name>Tomasz Malisiewicz</name><uri>https://profiles.google.com/107912691630546731185</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='//lh6.googleusercontent.com/-ely56X4eTGQ/AAAAAAAAAAI/AAAAAAAAJ_g/VLmSVgSof5s/s512-c/photo.jpg'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-15418143.post-8487802295886293980</id><published>2010-06-13T01:23:00.001-05:00</published><updated>2010-06-13T01:24:56.427-05:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='exemplars'/><category scheme='http://www.blogger.com/atom/ns#' term='rosch'/><category scheme='http://www.blogger.com/atom/ns#' term='books'/><category scheme='http://www.blogger.com/atom/ns#' term='concepts'/><category scheme='http://www.blogger.com/atom/ns#' term='weinberger'/><category scheme='http://www.blogger.com/atom/ns#' term='copernicus'/><category scheme='http://www.blogger.com/atom/ns#' term='prototypes'/><category scheme='http://www.blogger.com/atom/ns#' term='knowledge'/><category scheme='http://www.blogger.com/atom/ns#' term='hierarchy'/><category scheme='http://www.blogger.com/atom/ns#' term='visual memex'/><category scheme='http://www.blogger.com/atom/ns#' term='cvpr'/><category scheme='http://www.blogger.com/atom/ns#' term='everything is misc'/><category scheme='http://www.blogger.com/atom/ns#' term='categorization'/><category scheme='http://www.blogger.com/atom/ns#' term='aristotle'/><title type='text'>everything is misc -- torralba cvpr paper to check out</title><content type='html'>Weinberger's&lt;a href="http://www.everythingismiscellaneous.com/"&gt; Everything is Miscellaneous&lt;/a&gt; is a delightful read -- I 
just finished it today while flying from PIT to SFO.&amp;nbsp; It was recommended to me by my PhD advisor, &lt;a href="http://www.cs.cmu.edu/%7Eefros/"&gt;Alyosha&lt;/a&gt;, and now I can see why!&amp;nbsp; Many of the key motivations behind my current research on object representation deeply resonate in Weinberger's book. &lt;br /&gt;&lt;br /&gt;Weinberger motivates Rosch's theory of categorization (the Prototype Model), and explains how it is a significant break from thousands of years of Aristotelian thought.&amp;nbsp; Aristotle gave us the notion of a category -- centered around the notion of a definition.&amp;nbsp; For Aristotle, every object can be stripped to its essential core, and placed in its proper place in a God's-eye objective organization of the world.&amp;nbsp; It was Rosch who showed us that categories are much fuzzier and more hectic than suggested by the rigid Aristotelian system. Just as Copernicus single-handedly stopped the Sun and set the Earth in motion, Rosch disintegrated our neatly organized world-view and demonstrated how an individual's path through life shapes h/er concepts.&lt;br /&gt;&lt;br /&gt;I think it is fair to say that my own ideas as well as Weinberger's aren't so much an extension of the Roschian mode of thought as a significant break from the entire category-based way of thinking.&amp;nbsp; Given that Rosch studied Wittgenstein as a student, I'm surprised her stance wasn't more extreme, more along the anti-category line of thought.&amp;nbsp; I don't want to undermine her contribution to psychology and computer science in any way, and I want to be clear that she should only be lauded for her remarkable research.&amp;nbsp; Perhaps Wittgenstein was as extreme and iconoclastic as I like my philosophers to be, but Rosch provided us with a computational theory and not just a philosophical lecture.&lt;br /&gt;&lt;br /&gt;From my limited expertise in theories of categorization in the field of Psychology, whether it is 
Prototype Models or the more recent data-driven Exemplar Models, these theories are still &lt;b&gt;theories of categories&lt;/b&gt;.&amp;nbsp; Whether the similarity computations are between prototypes and stimuli, or between exemplars and stimuli, the output of a categorization model is still a category.&amp;nbsp; Weinberger is all about modern data-driven notions of knowledge organization, in a way that breaks free from the imprisoning notion of a category.&amp;nbsp; Knowledge is power, so why imprison it in rigid modules called categories?&amp;nbsp; Below is a toy visualization of a web of concepts, as imagined by me.&amp;nbsp; This is very much the web-based view of the world.&amp;nbsp; Wikipedia is a bunch of pages and links.&lt;br /&gt;&lt;div style="text-align: center;"&gt;&lt;b&gt;&lt;br /&gt;&lt;/b&gt;&lt;/div&gt;&lt;div style="text-align: center;"&gt;&lt;b&gt;Artistic rendition of a "web of concepts"&lt;/b&gt;&lt;/div&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://2.bp.blogspot.com/_6fdO7SbU8eM/TBRqj0O7yVI/AAAAAAAADFs/TD_E5_cQbDM/s1600/ball_Graph_lambda2.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="230" src="http://2.bp.blogspot.com/_6fdO7SbU8eM/TBRqj0O7yVI/AAAAAAAADFs/TD_E5_cQbDM/s320/ball_Graph_lambda2.png" width="320" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;br /&gt;I found it valuable to think of the &lt;a href="http://www.cs.cmu.edu/%7Etmalisie/projects/nips09/"&gt;Visual Memex&lt;/a&gt;, the model I'm developing in my thesis research, as &lt;b&gt;an anti-categorization model of knowledge&lt;/b&gt; -- a vast network of object-object relationships.&amp;nbsp; The idea of using little concrete bits of information to create a rich non-parametric web is the recurring theme in Weinberger's book.&amp;nbsp; In my case, the problem of extracting primitives from images, and all of the problems of dealing with real-world images are around to plague me, and the Visual Memex must 
rely on many Computer Vision techniques -- such things are not discussed in Weinberger's book.&amp;nbsp; The "perception" or "segmentation" component of the Visual Memex is not trivial, whereas linking words on the web is much easier.&lt;br /&gt;&lt;br /&gt;&lt;div style="text-align: center;"&gt;&lt;b&gt;CVPR paper to look out for&lt;/b&gt;&lt;/div&gt;&lt;br /&gt;However, the category-based view is all around us.&amp;nbsp; I expect most of this year's CVPR papers to fit in this category-based view of the world. One paper, co-authored by the great &lt;a href="http://web.mit.edu/torralba/www/"&gt;Torralba&lt;/a&gt;, looks relevant to my interests.&amp;nbsp; It is yet another triumph for the category-based mentality in computer vision.&amp;nbsp; In fact, one of the figures in the paper demonstrates the category-based  view of the world very well.&amp;nbsp; Unlike the memex, the organization is explicit in the following figure:&lt;br /&gt;&lt;br /&gt;&lt;div style="text-align: center;"&gt;&lt;a href="http://web.mit.edu/%7Emyungjin/www/HContext.html"&gt;&lt;b&gt;Exploiting Hierarchical Context on a Large Database of Object Categories&lt;/b&gt;&lt;/a&gt;&lt;/div&gt;&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://web.mit.edu/%7Emyungjin/www/images/SUN_tree.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="87" src="http://web.mit.edu/%7Emyungjin/www/images/SUN_tree.jpg" width="400" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;div style="text-align: center;"&gt;&lt;a href="http://web.mit.edu/%7Emyungjin/www/HContext.html"&gt;Exploiting Hierarchical Context on a Large Database of Object Categories&lt;/a&gt;&lt;/div&gt;&lt;div style="text-align: center;"&gt;&lt;a href="http://www.mit.edu/%7Emyungjin/"&gt;Myung Jin Choi&lt;/a&gt;, &lt;a href="http://people.csail.mit.edu/lim/"&gt;Joseph Lim&lt;/a&gt;, &lt;a href="http://web.mit.edu/torralba/www/"&gt;Antonio 
Torralba&lt;/a&gt;, and Alan  S. Willsky. CVPR 2010. &lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/15418143-8487802295886293980?l=quantombone.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://quantombone.blogspot.com/feeds/8487802295886293980/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://quantombone.blogspot.com/2010/06/everything-is-misc-torralba-cvpr-paper.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/15418143/posts/default/8487802295886293980'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/15418143/posts/default/8487802295886293980'/><link rel='alternate' type='text/html' href='http://quantombone.blogspot.com/2010/06/everything-is-misc-torralba-cvpr-paper.html' title='everything is misc -- torralba cvpr paper to check out'/><author><name>Tomasz Malisiewicz</name><uri>https://profiles.google.com/107912691630546731185</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='//lh6.googleusercontent.com/-ely56X4eTGQ/AAAAAAAAAAI/AAAAAAAAJ_g/VLmSVgSof5s/s512-c/photo.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://2.bp.blogspot.com/_6fdO7SbU8eM/TBRqj0O7yVI/AAAAAAAADFs/TD_E5_cQbDM/s72-c/ball_Graph_lambda2.png' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-15418143.post-642563116507536756</id><published>2010-06-11T23:17:00.003-05:00</published><updated>2010-06-11T23:33:53.050-05:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='blogging'/><category scheme='http://www.blogger.com/atom/ns#' term='cvpr'/><category scheme='http://www.blogger.com/atom/ns#' term='sharing knowledge'/><title type='text'>blogging from 
CVPR2010</title><content type='html'>&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://cvl.umiacs.umd.edu/conferences/cvpr2010/images/Mosaic800.jpg"&gt;&lt;img style="display: block; margin: 0px auto 10px; text-align: center; cursor: pointer; width: 458px; height: 112px;" src="http://cvl.umiacs.umd.edu/conferences/cvpr2010/images/Mosaic800.jpg" alt="" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;It might not be one of those glamorous Apple events during which Steve Jobs introduces a new shiny gadget for the masses to desire, but plenty of exciting stuff happens at CVPR which instills desire in our souls, that is, the souls of computer vision scientists.  Wouldn't you rather see the Great Torralba give a talk than some big company's chief executive officer?  For those of you who do not know what &lt;a href="http://cvl.umiacs.umd.edu/conferences/cvpr2010/"&gt;CVPR&lt;/a&gt; is -- it is one of the big Computer Vision conferences during which we (&lt;span style="font-weight: bold;"&gt;the geeks, scientists, engineers, developers, hackers, and mathematicians&lt;/span&gt;) exchange ideas regarding our most recent research in the world of computer vision.&lt;br /&gt;&lt;br /&gt;I am flying to SF tomorrow morning, and will be blogging about some of the cool papers I encounter at this year's CVPR.  I do not have a paper at this year's conference so I'm in full assimilate-knowledge mode where I hope to absorb thousands of ideas related to my field.  I already mentioned some of Kristen Grauman's cool segmentation papers, but expect to see in the next several blog posts many additional discussions of what I think are "exciting" papers.  I am already getting excited and have plenty of papers to read during my flight, in addition to finishing &lt;a href="http://isbn.nu/0805080430"&gt;Everything is Miscellaneous&lt;/a&gt;.  
I will be blogging from CVPR, like an Apple fanboy would at one of those Apple WWDC events -- but I will share math, theory, algorithms, and the like.&lt;br /&gt;&lt;br /&gt;As always, &lt;a href="http://www.cvpapers.com/cvpr2010.html"&gt;the list of CVPR 2010 papers on the web can be found here&lt;/a&gt;.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/15418143-642563116507536756?l=quantombone.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://quantombone.blogspot.com/feeds/642563116507536756/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://quantombone.blogspot.com/2010/06/blogging-from-cvpr2010.html#comment-form' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/15418143/posts/default/642563116507536756'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/15418143/posts/default/642563116507536756'/><link rel='alternate' type='text/html' href='http://quantombone.blogspot.com/2010/06/blogging-from-cvpr2010.html' title='blogging from CVPR2010'/><author><name>Tomasz Malisiewicz</name><uri>https://profiles.google.com/107912691630546731185</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='//lh6.googleusercontent.com/-ely56X4eTGQ/AAAAAAAAAAI/AAAAAAAAJ_g/VLmSVgSof5s/s512-c/photo.jpg'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-15418143.post-5482702638351188938</id><published>2010-05-25T15:49:00.000-05:00</published><updated>2010-05-25T15:49:28.957-05:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='marathon'/><category scheme='http://www.blogger.com/atom/ns#' term='picasa'/><category scheme='http://www.blogger.com/atom/ns#' term='mean face'/><category scheme='http://www.blogger.com/atom/ns#' 
term='torralba art'/><category scheme='http://www.blogger.com/atom/ns#' term='sports'/><category scheme='http://www.blogger.com/atom/ns#' term='running'/><title type='text'>my average face &amp;&amp; second half marathon</title><content type='html'>&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;/div&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://1.bp.blogspot.com/_6fdO7SbU8eM/S_w036aFBJI/AAAAAAAADEI/uH2WQXKANWU/s1600/face_mean.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="320" src="http://1.bp.blogspot.com/_6fdO7SbU8eM/S_w036aFBJI/AAAAAAAADEI/uH2WQXKANWU/s320/face_mean.png" width="271" /&gt;&lt;/a&gt;&lt;/div&gt;In case you didn't know, Picasa now performs face recognition in your photos.&amp;nbsp; I found it amusing to see the progression of my own face over the past several years.&amp;nbsp; Picasa lets you extract these face tiles into an 'export' directory, and it is trivial to load them up in Matlab for additional fun.&amp;nbsp; I produced some &lt;a href="http://web.mit.edu/torralba/www/"&gt;Torralba&lt;/a&gt;Art by averaging over 400 faces of myself (with no alignment whatsoever) collected over the past several years.&amp;nbsp; These photos come from my personal photo collection, so I'm not making them publicly available.&amp;nbsp; But here's the average face!&amp;nbsp; I resized all images to 500x500 before averaging and resized the average to the mean aspect ratio of all images.&amp;nbsp; The "black-eyes" come from the fact that I was wearing black sunglasses in about 10% of the photos.&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://1.bp.blogspot.com/_6fdO7SbU8eM/S_w04JuEdTI/AAAAAAAADEM/yT99wRcxLWw/s1600/tomasz_half_marathon_pgh.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="320" 
src="http://1.bp.blogspot.com/_6fdO7SbU8eM/S_w04JuEdTI/AAAAAAAADEM/yT99wRcxLWw/s320/tomasz_half_marathon_pgh.png" width="241" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;br /&gt;On another note, I ran the &lt;a href="http://www.pittsburghmarathon.com/"&gt;Pittsburgh Half Marathon&lt;/a&gt; this year.&amp;nbsp; This was my second half marathon ever -- my first one was last summer in San Francisco.&amp;nbsp; This time, my finishing time was 1 hour 40 minutes, which happens to be the goal I set for myself (10 minutes faster than my SF time).&amp;nbsp; For the first 20 minutes I was passing everybody in front of me since it was quite crowded.&amp;nbsp; I could probably shave another 2-3 minutes off if I start towards the front of the herd, but I'll need some serious training if I'm going to reach 1:30 in a future race.&amp;nbsp; I might even run a full marathon one of these days...&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/15418143-5482702638351188938?l=quantombone.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://quantombone.blogspot.com/feeds/5482702638351188938/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://quantombone.blogspot.com/2010/05/my-average-face-second-half-marathon.html#comment-form' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/15418143/posts/default/5482702638351188938'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/15418143/posts/default/5482702638351188938'/><link rel='alternate' type='text/html' href='http://quantombone.blogspot.com/2010/05/my-average-face-second-half-marathon.html' title='my average face &amp;&amp; second half marathon'/><author><name>Tomasz Malisiewicz</name><uri>https://profiles.google.com/107912691630546731185</uri><email>noreply@blogger.com</email><gd:image 
rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='//lh6.googleusercontent.com/-ely56X4eTGQ/AAAAAAAAAAI/AAAAAAAAJ_g/VLmSVgSof5s/s512-c/photo.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://1.bp.blogspot.com/_6fdO7SbU8eM/S_w036aFBJI/AAAAAAAADEI/uH2WQXKANWU/s72-c/face_mean.png' height='72' width='72'/><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-15418143.post-6753036520390844390</id><published>2010-05-09T18:45:00.001-05:00</published><updated>2010-05-09T18:48:07.667-05:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='sfdp'/><category scheme='http://www.blogger.com/atom/ns#' term='computer graphics'/><category scheme='http://www.blogger.com/atom/ns#' term='visualization'/><category scheme='http://www.blogger.com/atom/ns#' term='graphviz'/><category scheme='http://www.blogger.com/atom/ns#' term='mathematics'/><category scheme='http://www.blogger.com/atom/ns#' term='fractals'/><title type='text'>graph visualizations as sexy as fractals</title><content type='html'>I love to display mathematical phenomena -- often for me the proof is in the visualization.  If you ever steal one of my personal research notebooks you'll see that the number of graphs I've been drawing over the years has been increasing at a steady rate.  This is a habit I acquired from studying Probabilistic Graphical Models and the machine learning-heavy curriculum at CMU.&lt;br /&gt;&lt;br /&gt;Back in high school I was amazed by the beauty of fractals based on Newton's method for finding roots, but as I've slowly been shifting my mode of thought from continuous optimization problems to discrete ones, automated graph visualization is as close as I've ever gotten to being an artist.  
Here is one such sexy graph visualization from &lt;a href="http://www2.research.att.com/%7Eyifanhu/"&gt;Yifan Hu&lt;/a&gt;'s gallery.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;div style="text-align: center;"&gt;&lt;b&gt;&lt;span style="font-size:small;"&gt;Andrianov/lpl1 via sfdp by Yifan Hu&lt;/span&gt;&lt;/b&gt; &lt;/div&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;/div&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;/div&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;/div&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://www2.research.att.com/%7Eyifanhu/GALLERY/GRAPHS/index1.html" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img src="http://www2.research.att.com/%7Eyifanhu/GALLERY/GRAPHS/GIF_SMALL/Andrianov@lpl1.gif" border="0" height="258" width="320" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;br /&gt;I have been using &lt;a href="http://www.graphviz.org/"&gt;Graphviz&lt;/a&gt; for about 8 years now, and I just can't get enough.  I never thought it would produce anything as beautiful as this!  I generally used graphviz to produce graphs like this:&lt;br /&gt;&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://www.graphviz.org/Gallery/directed/world.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img src="http://www.graphviz.org/Gallery/directed/world.png" border="0" height="178" width="320" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;br /&gt;&lt;br /&gt;Inspired by Yifan Hu and his &lt;a href="http://www2.research.att.com/%7Eyifanhu/research_interest.html#graphDrawing"&gt;amazing multilevel force directed algorithm for visualizing graphs&lt;/a&gt; I've started using sfdp for some of my own visualizations.  
sfdp is now inside graphviz, and can be used with the -K switch as follows (also with overlap=scale):&lt;br /&gt;&lt;br /&gt;&lt;b&gt;$ dot -Ksfdp -Tpdf memex.gv &amp;gt; memex.pdf&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;Inspired by Yifan Hu's coloring scheme based on edge length, I color the edges using a standard matlab jet colormap with shorter edges being red and longer ones being blue.  To get the resulting lengths of edges, I actually run sfdp twice -- once to read off the vertex positions (this is what the graph drawing optimization produces), and once again to assign the edge colors based on those lengths.  I could process the resulting postscript with one run like Yifan, but I don't want to figure out how to parse postscript files today.  Here is an example using some of my own data.&lt;br /&gt;&lt;br /&gt;&lt;div style="text-align: center;"&gt;&lt;b&gt;Car Concept Visual Memex via sfdp by Tomasz&lt;/b&gt; &lt;b&gt;Malisiewicz &lt;/b&gt;&lt;/div&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://www.cs.cmu.edu/%7Etmalisie/images/memex/car_sfdp_memex.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img src="http://www.cs.cmu.edu/%7Etmalisie/images/memex/car_sfdp_memex_small.png" border="0" height="295" width="320" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;br /&gt;This is a visualization of the car subset of the Visual Memex I use as an internal organization of visual concepts to be used for image understanding.  
If you click on this image, you will see a significantly larger PNG.&lt;br /&gt;&lt;br /&gt;As a sanity check, I also created a visualization of a standard UF Sparse Matrix (here are both my result and Yifan's):&lt;br /&gt;&lt;div style="text-align: center;"&gt;&lt;b&gt;UTM1700b via sfdp by Yifan Hu&lt;/b&gt;&lt;b&gt;&lt;/b&gt;&lt;/div&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://www2.research.att.com/%7Eyifanhu/GALLERY/GRAPHS/PDF/TOKAMAK@utm1700b.pdf" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img src="http://www.research.att.com/%7Eyifanhu/GALLERY/GRAPHS/GIF_SMALL/TOKAMAK@utm1700b.gif" border="0" height="234" width="320" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;br /&gt;&lt;div style="text-align: center;"&gt;&lt;b&gt;UTM1700b via sfdp by Tomasz Malisiewicz &lt;/b&gt;&lt;/div&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://www.cs.cmu.edu/%7Etmalisie/images/memex/utm1700b.pdf" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img src="http://www.cs.cmu.edu/%7Etmalisie/images/memex/utm1700b.png" border="0" height="223" width="320" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;br /&gt;As you can see, the graphs are pretty similar, modulo some differences in coloring strategy -- but since the colors are somewhat arbitrary, this is not an issue.  If you click on these pictures you can see the PDFs, which were generated via graphviz.  
Now, if only my real-world computer vision graphs were as structured as these toy problems, others could view me as both an artist and a scientist (like a true Renaissance man).&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/15418143-6753036520390844390?l=quantombone.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://quantombone.blogspot.com/feeds/6753036520390844390/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://quantombone.blogspot.com/2010/05/graph-visualizations-as-sexy-as.html#comment-form' title='10 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/15418143/posts/default/6753036520390844390'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/15418143/posts/default/6753036520390844390'/><link rel='alternate' type='text/html' href='http://quantombone.blogspot.com/2010/05/graph-visualizations-as-sexy-as.html' title='graph visualizations as sexy as fractals'/><author><name>Tomasz Malisiewicz</name><uri>https://profiles.google.com/107912691630546731185</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='//lh6.googleusercontent.com/-ely56X4eTGQ/AAAAAAAAAAI/AAAAAAAAJ_g/VLmSVgSof5s/s512-c/photo.jpg'/></author><thr:total>10</thr:total></entry><entry><id>tag:blogger.com,1999:blog-15418143.post-5904562312023093510</id><published>2010-04-14T23:57:00.001-05:00</published><updated>2010-04-15T00:00:23.044-05:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='street view'/><category scheme='http://www.blogger.com/atom/ns#' term='internet-scale'/><category scheme='http://www.blogger.com/atom/ns#' term='google'/><category scheme='http://www.blogger.com/atom/ns#' term='geometry'/><title type='text'>Internet-scale vision at CVPR 
2010</title><content type='html'>I've been reading some of the recent CVPR 2010 papers (check out the &lt;a href="http://www.cvpapers.com/cvpr2010.html"&gt;CVPR papers on the web page&lt;/a&gt; to see the full list), and I came across a cool video produced by &lt;a href="http://www.cs.washington.edu/homes/furukawa/"&gt;Yasutaka Furukawa&lt;/a&gt;.  I met Yasutaka when I was a visitor at Jean Ponce's WILLOW group in Paris during Spring 2008, and I was truly amazed by some of the cool geometry-based work he has done.  Being a recognition/machine-learning guy myself, I can only appreciate and wonder at the amazing work produced by in-depth knowledge of geometry.  In this particular case, the images aren't ones that Yasutaka collected himself.  The idea behind internet-scale vision is that you can use the millions of photos on sites such as Flickr.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;Here is the video, very much in the spirit of Photosynth.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;object height="385" width="640"&gt;&lt;param name="movie" value="http://www.youtube.com/v/ofHFOr2nRxU&amp;amp;color1=0xb1b1b1&amp;amp;color2=0xcfcfcf&amp;amp;hl=en_US&amp;amp;feature=player_embedded&amp;amp;fs=1"&gt;&lt;param name="allowFullScreen" value="true"&gt;&lt;param name="allowScriptAccess" value="always"&gt;&lt;embed src="http://www.youtube.com/v/ofHFOr2nRxU&amp;amp;color1=0xb1b1b1&amp;amp;color2=0xcfcfcf&amp;amp;hl=en_US&amp;amp;feature=player_embedded&amp;amp;fs=1" type="application/x-shockwave-flash" allowfullscreen="true" allowscriptaccess="always" height="385" width="640"&gt;&lt;/embed&gt;&lt;/object&gt;&lt;br /&gt;&lt;br /&gt;It is also not a surprise to find that Yasutaka is now working at Google.   One can only imagine where Google is going to apply the "Street-View" mentality next.  
Cities like NYC already have nice high-resolution building facades; see the picture below from Google Earth Blog.&lt;br /&gt;&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://www.gearthblog.com/blog/archives/2010/04/new_york_city_gets_lifelike_facades.html" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img src="http://www.gearthblog.com/blog/archives/2010/04/13/nyc-facades.jpg" border="0" height="210" width="320" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;a href="http://www.gearthblog.com/blog/archives/2010/04/new_york_city_gets_lifelike_facades.html"&gt;&lt;br /&gt;&lt;/a&gt;&lt;br /&gt;I want to one day run all of my object recognition experiments on Google Street View, and there are probably only a handful of places in the world that have the computational infrastructure to play with such experiments.  I drool at the idea of one day building a &lt;a href="http://www.cs.cmu.edu/%7Etmalisie/projects/nips09/"&gt;Visual Memex&lt;/a&gt; from billions of online images (and this can only happen at a place like Google).&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/15418143-5904562312023093510?l=quantombone.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://quantombone.blogspot.com/feeds/5904562312023093510/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://quantombone.blogspot.com/2010/04/internet-scale-vision-at-cvpr-2010.html#comment-form' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/15418143/posts/default/5904562312023093510'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/15418143/posts/default/5904562312023093510'/><link rel='alternate' type='text/html' href='http://quantombone.blogspot.com/2010/04/internet-scale-vision-at-cvpr-2010.html' 
title='Internet-scale vision at CVPR 2010'/><author><name>Tomasz Malisiewicz</name><uri>https://profiles.google.com/107912691630546731185</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='//lh6.googleusercontent.com/-ely56X4eTGQ/AAAAAAAAAAI/AAAAAAAAJ_g/VLmSVgSof5s/s512-c/photo.jpg'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-15418143.post-4310673204462244741</id><published>2010-04-05T14:28:00.001-05:00</published><updated>2010-04-05T14:30:40.899-05:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='indexing'/><category scheme='http://www.blogger.com/atom/ns#' term='beyond categories'/><category scheme='http://www.blogger.com/atom/ns#' term='ontologies'/><category scheme='http://www.blogger.com/atom/ns#' term='libraries'/><category scheme='http://www.blogger.com/atom/ns#' term='categorization'/><title type='text'>Ontology is Overrated: Categories, Links, and Tags</title><content type='html'>&lt;div style="text-align: center;"&gt;&lt;a href="http://www.shirky.com/writings/ontology_overrated.html"&gt;Ontology is Overrated: Categories, Links, and Tags&lt;/a&gt;&lt;/div&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://www.shirky.com/writings/ontology_overrated.html" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" height="195" src="http://www.shirky.com/writings/ontology_images/just_links.jpg" width="320" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;br /&gt;This is the title of &lt;a href="http://www.shirky.com/writings/ontology_overrated.html"&gt;a powerful treatise written by Clay Shirky&lt;/a&gt;, in which he strives to "convince you that a lot of what we think we know about categorization is wrong."&amp;nbsp; Much thanks to David Weinberger's blog &lt;a href="http://www.everythingismiscellaneous.com/"&gt;www.everythingismiscellaneous.com&lt;/a&gt; for pointing out this 
article.&amp;nbsp; The take-home message is quite similar to some of the "Beyond Categories" ideas I've tried to promulgate in my meager attempt to understand why progress in computer vision has reached a standstill.&amp;nbsp; For anybody interested in understanding the limitations of classical systems of categorization, this article is worth a read.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/15418143-4310673204462244741?l=quantombone.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://quantombone.blogspot.com/feeds/4310673204462244741/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://quantombone.blogspot.com/2010/04/ontology-is-overrated-categories-links.html#comment-form' title='4 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/15418143/posts/default/4310673204462244741'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/15418143/posts/default/4310673204462244741'/><link rel='alternate' type='text/html' href='http://quantombone.blogspot.com/2010/04/ontology-is-overrated-categories-links.html' title='Ontology is Overrated: Categories, Links, and Tags'/><author><name>Tomasz Malisiewicz</name><uri>https://profiles.google.com/107912691630546731185</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='//lh6.googleusercontent.com/-ely56X4eTGQ/AAAAAAAAAAI/AAAAAAAAJ_g/VLmSVgSof5s/s512-c/photo.jpg'/></author><thr:total>4</thr:total></entry><entry><id>tag:blogger.com,1999:blog-15418143.post-6388270286282305039</id><published>2010-04-05T00:10:00.004-05:00</published><updated>2010-04-05T00:16:16.547-05:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='kristen grauman'/><category scheme='http://www.blogger.com/atom/ns#' term='cvpr 
2010'/><category scheme='http://www.blogger.com/atom/ns#' term='image segmentation'/><category scheme='http://www.blogger.com/atom/ns#' term='papers'/><category scheme='http://www.blogger.com/atom/ns#' term='cvpr'/><title type='text'>Exciting Computer Vision papers from Kristen Grauman's UT-Austin Group</title><content type='html'>Back in 2005, I remember meeting &lt;a href="http://userweb.cs.utexas.edu/~grauman/"&gt;Kristen Grauman&lt;/a&gt; at MIT's accepted PhD student open house. &amp;nbsp;Back then she was a PhD student under &lt;a href="http://www.eecs.berkeley.edu/~trevor/"&gt;Trevor Darrell&lt;/a&gt;&amp;nbsp;(and is known for her work on the &lt;a href="http://userweb.cs.utexas.edu/~grauman/research/projects/pmk/pmk_projectpage.htm"&gt;Pyramid Match Kernel&lt;/a&gt;), but now she has her own vision group at UT-Austin. &amp;nbsp;She is the advisor behind many cool vision projects there, and here are a few segmentation/categorization-related papers from the upcoming CVPR2010 conference. &amp;nbsp;I look forward to checking out these papers because they are relevant to my own research interests. &amp;nbsp;&lt;b&gt;NOTE&lt;/b&gt;: some of the paper links are still not up -- I just used the links from Kristen's webpage.&lt;br /&gt;&lt;br /&gt;&lt;div style="text-align: center;"&gt;&lt;a href="http://users.ece.utexas.edu/~ylee/objectgraph/objectgraph.html"&gt;Object-Graphs for Context-Aware Category Discovery.&lt;/a&gt;&amp;nbsp;&lt;a href="http://users.ece.utexas.edu/~ylee/"&gt;Y. J. Lee&lt;/a&gt; and K. 
Grauman&lt;/div&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://users.ece.utexas.edu/~ylee/objectgraph/objectgraph.html"&gt;&lt;img border="0" height="226" src="http://userweb.cs.utexas.edu/~grauman/research/ims/objgraph.jpg" width="320" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;div class="separator" style="clear: both; text-align: left;"&gt;&lt;br /&gt;&lt;/div&gt;&lt;hr /&gt;&lt;div style="text-align: center;"&gt;&lt;a href="http://userweb.cs.utexas.edu/~grauman/papers/lee_collectcut_cvpr2010.pdf"&gt;Collect-Cut: Segmentation with Top-Down Cues Discovered in Multi-Object Images.&lt;/a&gt;&amp;nbsp;&lt;a href="http://users.ece.utexas.edu/~ylee/"&gt;Y. J. Lee&lt;/a&gt; and K. Grauman&lt;/div&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://userweb.cs.utexas.edu/~grauman/papers/lee_collectcut_cvpr2010.pdf"&gt;&lt;img border="0" src="http://userweb.cs.utexas.edu/~grauman/research/ims/image013.jpg" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;hr /&gt;&lt;div style="text-align: center;"&gt;&lt;a href="http://userweb.cs.utexas.edu/~grauman/papers/kim_cvpr2010.pdf"&gt;Asymmetric Region-to-Image Matching for Comparing Images with Generic Object Categories.&lt;/a&gt;&amp;nbsp;&lt;a href="http://userweb.cs.utexas.edu/~jaechul/"&gt;J. Kim&lt;/a&gt; and K. 
Grauman&lt;/div&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://userweb.cs.utexas.edu/~grauman/papers/kim_cvpr2010.pdf"&gt;&lt;img border="0" height="101" src="http://userweb.cs.utexas.edu/~grauman/research/ims/regions.jpg" width="320" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/15418143-6388270286282305039?l=quantombone.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://quantombone.blogspot.com/feeds/6388270286282305039/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://quantombone.blogspot.com/2010/04/exciting-computer-vision-papers-from.html#comment-form' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/15418143/posts/default/6388270286282305039'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/15418143/posts/default/6388270286282305039'/><link rel='alternate' type='text/html' href='http://quantombone.blogspot.com/2010/04/exciting-computer-vision-papers-from.html' title='Exciting Computer Vision papers from Kristen Grauman&apos;s UT-Austin Group'/><author><name>Tomasz Malisiewicz</name><uri>https://profiles.google.com/107912691630546731185</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='//lh6.googleusercontent.com/-ely56X4eTGQ/AAAAAAAAAAI/AAAAAAAAJ_g/VLmSVgSof5s/s512-c/photo.jpg'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-15418143.post-6637835093764280341</id><published>2010-03-22T01:50:00.007-05:00</published><updated>2010-03-22T12:59:38.439-05:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='graduate student life'/><category scheme='http://www.blogger.com/atom/ns#' term='programming'/><category 
scheme='http://www.blogger.com/atom/ns#' term='lesson'/><category scheme='http://www.blogger.com/atom/ns#' term='coding n00bs'/><title type='text'>PhDs make many smart programmers become software engineering n00bs</title><content type='html'>This is true.  &lt;b&gt;A couple of years in a PhD program -- reading papers and writing throw-away code in Matlab -- and it is easy to become a throw-away programmer, a sort of liability in the real world.&lt;/b&gt;  It is no surprise many companies look down on hiring PhDs.  I've seen kids enter the PhD program with real programming talent and exit as real software engineering &lt;a href="http://www.urbandictionary.com/define.php?term=n00b"&gt;n00bs&lt;/a&gt;.  In graduate school, you might code for 6 years without anybody grading your code.  If you get sloppy, you will be worse off than when you started.&lt;br /&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;The problem is that many advisors don't care about their students writing good code.  Writing good papers and giving good presentations -- you will be told that this is what makes good PhD students.  Who cares about writing good code? -- we'll just have some 'engineering' people re-write it once you become famous.  This is what students across the globe are being fed.  This is no surprise, because your advisor won't get tenure by turning you into a mean mathematically-inclined super hacker.  Then again, your advisor won't care if you go bald, are malnourished, and have no life outside research.  There are many things that you have to take care of yourself, and software development skills aren't any different. &lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;b&gt;Note to the real world looking to hire talent&lt;/b&gt;:  You should grill, I mean really grill, fresh PhDs regarding their software development skills.  Don't become mesmerized by their 4.0s, their long publication lists, and all their 'achievements.'  
If you want to hire a fresh PhD to write code, whether in a research or an engineering setting, then give them one hell-of-an-interview.  I agree with Google's interview process.  I studied for it, I am proud of my own software engineering skills, and I was proud to have been an intern at Google (twice).  But I know of companies who were sorry they hired PhDs only to learn these recent graduates could only dabble on the board and would utterly fail at the terminal.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;b&gt;Note to PhDs looking to one day take our skill-set and impact the real world&lt;/b&gt;:  Never stop learning and never stop writing good code.  Never stop taking care of yourself.  You were the brightest of the brightest before you started your PhD, and now you have 5-6 years to exit as a real superman.  With all the mathematics and presentation skills you will acquire during a PhD, on top of good software engineering skills, you will become invaluable to the real world.  It's a real shame to become less valuable to the outside world after 6 years of a strenuous PhD program.  But nobody will give you the recipe for success.  Nobody will tell you to exercise, but if you want to pound your brain with mental challenges for decades to come, you will need physical exercise in your daily regimen.  Your advisors won't tell you that keeping up to date on the tools of the trade, and being a real hacker, is very valuable in the real world.  You will be told that fast results = many papers and it's not worth writing good code.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;After obtaining a PhD we should be role-models for the entire world.  Seriously, why not?  If a PhD is the highest degree that an institution can grant, then we should feel proud about getting one.  But we are human, and one is only as strong as their weakest link.  
We should become super hackers, fear no quantum mechanics, fear no presentation in front of a crowd, and be all that one can be.  &lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;This is part of a series of posts aimed at finding flaws in the academic/PhD process and how it pertains to building strong/intelligent/confident individuals.&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/15418143-6637835093764280341?l=quantombone.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://quantombone.blogspot.com/feeds/6637835093764280341/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://quantombone.blogspot.com/2010/03/phds-make-many-smart-programmers-become.html#comment-form' title='19 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/15418143/posts/default/6637835093764280341'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/15418143/posts/default/6637835093764280341'/><link rel='alternate' type='text/html' href='http://quantombone.blogspot.com/2010/03/phds-make-many-smart-programmers-become.html' title='PhDs make many smart programmers become software engineering n00bs'/><author><name>Tomasz Malisiewicz</name><uri>https://profiles.google.com/107912691630546731185</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='//lh6.googleusercontent.com/-ely56X4eTGQ/AAAAAAAAAAI/AAAAAAAAJ_g/VLmSVgSof5s/s512-c/photo.jpg'/></author><thr:total>19</thr:total></entry><entry><id>tag:blogger.com,1999:blog-15418143.post-5870578596811610177</id><published>2010-03-18T18:46:00.010-05:00</published><updated>2010-03-19T17:10:27.314-05:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='renaissance'/><category scheme='http://www.blogger.com/atom/ns#' 
term='philosophy'/><category scheme='http://www.blogger.com/atom/ns#' term='mathematics'/><category scheme='http://www.blogger.com/atom/ns#' term='physics'/><title type='text'>Back to basics: Vision Science Transcends Mathematics</title><content type='html'>Vision (a.k.a. image understanding, image interpretation, perception, object recognition) is quite unlike some of the mathematical problems we were introduced to in our youth.  In fact, thinking of vision as a "&lt;span style="font-style: italic;"&gt;mathematical&lt;/span&gt; &lt;span style="font-style: italic;"&gt;problem"&lt;/span&gt; in the traditional sense is questionable.  An important characteristic of such "problems" is that by pointing them out we already have a notion of what it would be like to solve them.  Does a child think of gravity as such a problem?  Well, probably not, because without the necessary mathematical backbone there is no problem with gravity!  It's just the way the world works!  But once a child has been perverted by mathematics and introduced into the intellectual world of science, the world ceases to just be.  The world becomes a massive equation.&lt;br /&gt;&lt;br /&gt;Consider the seemingly elementary problem of &lt;a href="http://www.sosmath.com/algebra/factor/fac11/fac11.html"&gt;finding the roots of a cubic polynomial&lt;/a&gt;.  Many of us can recite the quadratic formula by heart, but not the one for cubics (try deriving the simpler quadratic formula by hand).  If we were given one evening and a whole lot of blank paper, we could try tackling this problem (no Google allowed!).  While the probability of failure is quite high (and arguably most of us would fail), it would still make sense to speak of "coming closer to the solution".  Maybe we could even solve the problem when some terms are missing, etc.  The important thing here is that the notion of having reached a solution is well-defined.  
Also, once we've found the solution it would probably be easier to convince ourselves that it is correct (verification would be easier than actually coming up with the solution).&lt;br /&gt;&lt;br /&gt;Vision is more like theoretical physics, psychology, and philosophy and less like the well-defined math problem I described above.  When dealing with the math problem described above, we know what the symbols mean, we know the valid operations -- the game is already set in place.  In vision, just like physics, psychology and philosophy, the notion of a fundamental operational unit (which happens to be &lt;span style="font-style: italic;"&gt;an object&lt;/span&gt; for vision) isn't as rigidly defined as the &lt;a href="http://www.mtnmath.com/whatth/node7.html"&gt;Platonic Ideals&lt;/a&gt; used throughout mathematics.  &lt;span style="font-weight: bold;"&gt;We know what a circle is, we know what a real-valued variable is, but what is a "car"?&lt;/span&gt;  Consider your mental image of a car.  Now remove a wheel and ask yourself, is this still a car?  Surely!  But what happens as we start removing more and more elements?  At what point does this object cease to be a car and become a motor, a single tire, or a piece of metal?  The circle, a Platonic Ideal, ceases to be a circle once it has suffered the most trivial of all perturbations -- any deviation from perfection, and &lt;span style="font-weight: bold;"&gt;boom!&lt;/span&gt; the circle ceases to be a circle. &lt;br /&gt;&lt;br /&gt;Much of Computer Vision does not ask such metaphysical questions, as objects of the real world are seamlessly mapped to abstract symbols that our mathematically-inclined PhD students love to play with.  I am sad to report that this naive mapping between objects of the real world and mathematical symbols isn't so much a question of style; it is basically &lt;span style="font-style: italic;"&gt;the foundation of modern computer vision research&lt;/span&gt;.  
So what must be done to expand this parochial field of Vision into a mature field?  Wake up and stop coding!  I think Vision needs a sort of a &lt;span style="font-weight: bold;"&gt;mental coup d'état&lt;/span&gt;, a fresh outlook on an old problem.  Sometimes to make progress we have to start with a clean slate -- current visionaries do not possess the right tools for this challenging enterprise.  Instead of throwing higher-level mathematics at the problem, maybe we are barking up the wrong tree?  However, if mathematics is the only thing we are good at, then how are we to have a mature discussion which transcends mathematics?  &lt;span style="font-weight: bold;"&gt;The window through which we peer circumscribes the world we see.&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;I  believe if we are to make progress in this challenging endeavor, we  must first become Renaissance men, a sort of Nietzschean &lt;a href="http://en.wikipedia.org/wiki/%C3%9Cbermensch"&gt;Übermensch&lt;/a&gt;.  We must understand what has been  said about perception, space, time, and the structure of the universe.   We must become better historians.  We must study not only more mathematics, but more physics, more psychology, read more Aristotle and Kant, build better robots,  engineer stabler software, become better sculptors and painters, become more articulate orators, establish better personal relationships, etc. Once we've mastered more domains of reality, and only then, will we have a better set of tools for coping with paradoxes inherent in artificial intelligence.  Because a better grasp on reality -- inching closer to enlightenment -- will result in asking more meaningful questions.&lt;br /&gt;&lt;br /&gt;I am optimistic.  But the enterprise which I've  outlined will require a new type of individual, one worthy of the name  Renaissance Man.  We aren't interested in toy problems here, nor cute  solutions.  If we want to make progress, we must shape our lives and  outlooks around this very fact.  
Two steps backwards and three steps forward.  Rinse, lather, repeat.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/15418143-5870578596811610177?l=quantombone.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://quantombone.blogspot.com/feeds/5870578596811610177/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://quantombone.blogspot.com/2010/03/back-to-basics-vision-science.html#comment-form' title='6 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/15418143/posts/default/5870578596811610177'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/15418143/posts/default/5870578596811610177'/><link rel='alternate' type='text/html' href='http://quantombone.blogspot.com/2010/03/back-to-basics-vision-science.html' title='Back to basics: Vision Science Transcends Mathematics'/><author><name>Tomasz Malisiewicz</name><uri>https://profiles.google.com/107912691630546731185</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='//lh6.googleusercontent.com/-ely56X4eTGQ/AAAAAAAAAAI/AAAAAAAAJ_g/VLmSVgSof5s/s512-c/photo.jpg'/></author><thr:total>6</thr:total></entry><entry><id>tag:blogger.com,1999:blog-15418143.post-7932830919800918909</id><published>2010-03-05T10:48:00.006-05:00</published><updated>2010-03-09T16:27:17.166-05:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='1970s'/><category scheme='http://www.blogger.com/atom/ns#' term='knowledge'/><category scheme='http://www.blogger.com/atom/ns#' term='image understanding'/><category scheme='http://www.blogger.com/atom/ns#' term='barrow'/><category scheme='http://www.blogger.com/atom/ns#' term='history'/><title type='text'>Representation and Use of Knowledge in Vision: Barrow and Tenenbaum's 
Conclusion</title><content type='html'>To gain a better perspective on my research regarding the Visual Memex, I spent some time reading &lt;span style="font-style: italic;"&gt;&lt;a href="http://www.amazon.com/Object-Categorization-Computer-Vision-Perspectives/dp/0521887380"&gt;Object Categorization: Computer and Human Vision Perspectives&lt;/a&gt; &lt;/span&gt;which contains many lovely essays on Computer Vision.  This book contains recently written essays by titans of Computer Vision and contains a great deal of lessons learned from history.  While such a 'looking back' on vision makes for a good read, it is also worthwhile to find old works 'looking forward' and anticipating the successes and failures of the upcoming generations.&lt;br /&gt;&lt;br /&gt;In this 'looking forward' fashion, I want to share a passage regarding image understanding systems, from "&lt;a href="http://www.ai.sri.com/pub_list/1391"&gt;&lt;span style="font-weight: bold;"&gt;Representation and Use of Knowledge in Vision&lt;/span&gt;&lt;/a&gt;," by H. G. Barrow and J. M. Tenenbaum, July 1975.  This is a short paper worth reading for both graduate students and professors interested in pushing Computer Vision research to its limits.  
I enjoyed the succinct and motivational ending so much, it is worth repeating it verbatim:&lt;br /&gt;&lt;br /&gt;--------&lt;br /&gt;&lt;br /&gt;III Conclusion&lt;br /&gt;&lt;br /&gt;We conclude by reiterating some of the major premises underlying this paper:&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;The more knowledge the better.&lt;/span&gt;    &lt;span style="font-weight: bold;"&gt;&lt;br /&gt;The more data, the better.&lt;/span&gt;    &lt;span style="font-weight: bold;"&gt;&lt;br /&gt;Vision is a gigantic optimization problem.&lt;/span&gt;    &lt;span style="font-weight: bold;"&gt;&lt;br /&gt;Segmentation is low-level interpretation using general knowledge.&lt;/span&gt;    &lt;span style="font-weight: bold;"&gt;&lt;br /&gt;Knowledge is incrementally acquired.&lt;/span&gt;    &lt;span style="font-weight: bold;"&gt;&lt;br /&gt;Research should pursue Truth, not Efficiency.&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;A further decade will determine our skill as visionaries.&lt;br /&gt;&lt;br /&gt;-------------&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/15418143-7932830919800918909?l=quantombone.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://quantombone.blogspot.com/feeds/7932830919800918909/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://quantombone.blogspot.com/2010/03/representation-and-use-of-knowledge-in.html#comment-form' title='6 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/15418143/posts/default/7932830919800918909'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/15418143/posts/default/7932830919800918909'/><link rel='alternate' type='text/html' href='http://quantombone.blogspot.com/2010/03/representation-and-use-of-knowledge-in.html' title='Representation and Use of Knowledge in Vision: 
Barrow and Tenenbaum&apos;s Conclusion'/><author><name>Tomasz Malisiewicz</name><uri>https://profiles.google.com/107912691630546731185</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='//lh6.googleusercontent.com/-ely56X4eTGQ/AAAAAAAAAAI/AAAAAAAAJ_g/VLmSVgSof5s/s512-c/photo.jpg'/></author><thr:total>6</thr:total></entry><entry><id>tag:blogger.com,1999:blog-15418143.post-6424395107529671179</id><published>2010-02-19T11:46:00.005-05:00</published><updated>2010-02-19T11:55:42.743-05:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='thesis proposal'/><category scheme='http://www.blogger.com/atom/ns#' term='Takeo Kanade'/><category scheme='http://www.blogger.com/atom/ns#' term='talk'/><title type='text'>Data-Driven Image Parsing With the Visual Memex: Thesis Proposal Complete!</title><content type='html'>&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://4.bp.blogspot.com/_6fdO7SbU8eM/S37Ai4OELwI/AAAAAAAADB4/PL30REega9s/s1600-h/Picture+1.png"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer; width: 400px; height: 265px;" src="http://4.bp.blogspot.com/_6fdO7SbU8eM/S37Ai4OELwI/AAAAAAAADB4/PL30REega9s/s400/Picture+1.png" alt="" id="BLOGGER_PHOTO_ID_5439997105349603074" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;Yesterday, I successfully gave my thesis proposal talk at CMU and it was a great experience.  The feedback I obtained from my committee members was invaluable, especially the comments from Takeo Kanade.  It was a great honor for me to have Takeo Kanade, one of the &lt;span style="font-weight: bold;"&gt;titans&lt;/span&gt; of vision, on my committee.  My external member, Pietro Perona, is also a key player in object recognition, and provided some perceptive comments.&lt;br /&gt;&lt;br /&gt;I gave my talk on my MacBook Pro using Keynote.  
I used the DVI output to connect to the projector and on my screen I was able to see the current slide as well as the upcoming slide.  Using Skype I was able to connect to Pietro in California and share the presentation screen (not my two-slide screen!) with him.  This way, he was able to follow along and see the same slides as everybody in the room.  Skype was a great success!&lt;br /&gt;&lt;br /&gt;I would like to thank everybody who came to my talk!&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/15418143-6424395107529671179?l=quantombone.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://quantombone.blogspot.com/feeds/6424395107529671179/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://quantombone.blogspot.com/2010/02/data-driven-image-parsing-with-visual.html#comment-form' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/15418143/posts/default/6424395107529671179'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/15418143/posts/default/6424395107529671179'/><link rel='alternate' type='text/html' href='http://quantombone.blogspot.com/2010/02/data-driven-image-parsing-with-visual.html' title='Data-Driven Image Parsing With the Visual Memex: Thesis Proposal Complete!'/><author><name>Tomasz Malisiewicz</name><uri>https://profiles.google.com/107912691630546731185</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='//lh6.googleusercontent.com/-ely56X4eTGQ/AAAAAAAAAAI/AAAAAAAAJ_g/VLmSVgSof5s/s512-c/photo.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://4.bp.blogspot.com/_6fdO7SbU8eM/S37Ai4OELwI/AAAAAAAADB4/PL30REega9s/s72-c/Picture+1.png' height='72' 
width='72'/><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-15418143.post-671325185580130200</id><published>2010-01-26T02:28:00.004-05:00</published><updated>2010-01-26T02:55:47.755-05:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='beyond categories'/><category scheme='http://www.blogger.com/atom/ns#' term='psychology'/><category scheme='http://www.blogger.com/atom/ns#' term='philosophy'/><category scheme='http://www.blogger.com/atom/ns#' term='books'/><category scheme='http://www.blogger.com/atom/ns#' term='concepts'/><category scheme='http://www.blogger.com/atom/ns#' term='cognitive science'/><title type='text'>Beyond Categories =? Doing without Concepts</title><content type='html'>The term "beyond category," from my limited knowledge, was originally coined to describe the music of Duke Ellington.  It is a term of praise that acknowledges that one's style is inimitable and transcends barriers.&lt;br /&gt;&lt;br /&gt;"Beyond Categories" was the first part of &lt;a href="http://www.cs.cmu.edu/%7Etmalisie/projects/nips09/"&gt;my NIPS 2009 paper&lt;/a&gt;'s title.  To "go beyond" means to transcend, to abandon or do without some limitation and strive higher -- there is nothing magical about my use of the term.  I used the term &lt;span style="font-style: italic;"&gt;category&lt;/span&gt; to refer to object categories, as are commonly used in computer vision, artificial intelligence, machine learning, as well as psychology, philosophy, and other branches of cognitive science.  One of my research goals is to go beyond the use of categories as the basis for machine perception and visual reasoning.  It has been argued by &lt;a href="http://www.pitt.edu/%7Emachery/"&gt;Machery&lt;/a&gt; that the term category is roughly equivalent to the term concept as used in psychology literature.  
In some sense the title of Machery's recent book, "&lt;a href="http://www.amazon.com/Doing-without-Concepts-Edouard-Machery/dp/0195306880/ref=sr_1_3?ie=UTF8&amp;amp;s=books&amp;amp;qid=1232844568&amp;amp;sr=8-3"&gt;Doing without concepts&lt;/a&gt;," is analogous to the phrase "Beyond categories" but to reassure myself I'll have to finish reading Machery's book.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;div style="text-align: center;"&gt;&lt;a href="http://www.pitt.edu/%7Emachery/"&gt;&lt;img style="width: 322px; height: 322px;" src="http://www.pitt.edu/%7Emachery/final%20cover.jpg" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;/div&gt;&lt;br /&gt;&lt;br /&gt;So far the first chapter has been a delightful exposition into the world of concepts, a term dear to researchers in machine perception (AI) as well as human categorization (psychology).  I look forward to reading the rest of the book, which I accidentally found while looking for &lt;a href="http://www.amazon.com/Classification-Cognition-Oxford-Psychology-William/dp/0195109740/ref=sr_1_2?ie=UTF8&amp;amp;s=books&amp;amp;qid=1264491979&amp;amp;sr=1-2"&gt;Estes' book on categorization&lt;/a&gt;.  I had already digested/assimilated some of Machery's work, in particular his paper titled &lt;a href="http://www.pitt.edu/%7Emachery/papers/concepts%20are%20not%20a%20natural%20kind_machery.pdf"&gt;Concepts are not a natural kind&lt;/a&gt;, so seeing his name on a book at the CMU library piqued my interest. In this 2005 paper, Machery argues that the debate between prototypes vs. exemplars vs. theories in the literature on concepts is not well-founded and there is no reason to believe a single theory should prevail.  
I'll attempt to summarize some of his take-home messages and their relevance to computer vision once I finish this book.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/15418143-671325185580130200?l=quantombone.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://quantombone.blogspot.com/feeds/671325185580130200/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://quantombone.blogspot.com/2010/01/beyond-categories-doing-without.html#comment-form' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/15418143/posts/default/671325185580130200'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/15418143/posts/default/671325185580130200'/><link rel='alternate' type='text/html' href='http://quantombone.blogspot.com/2010/01/beyond-categories-doing-without.html' title='Beyond Categories =? 
Doing without Concepts'/><author><name>Tomasz Malisiewicz</name><uri>https://profiles.google.com/107912691630546731185</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='//lh6.googleusercontent.com/-ely56X4eTGQ/AAAAAAAAAAI/AAAAAAAAJ_g/VLmSVgSof5s/s512-c/photo.jpg'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-15418143.post-243611738278981469</id><published>2010-01-20T14:23:00.014-05:00</published><updated>2010-01-22T22:58:54.353-05:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='MIT'/><category scheme='http://www.blogger.com/atom/ns#' term='Takeo Kanade'/><category scheme='http://www.blogger.com/atom/ns#' term='linux'/><category scheme='http://www.blogger.com/atom/ns#' term='robotics'/><category scheme='http://www.blogger.com/atom/ns#' term='lesson'/><category scheme='http://www.blogger.com/atom/ns#' term='vocabulary'/><title type='text'>Heterarchies and Control Structure in Image Interpretation</title><content type='html'>Several days ago I was reading one of &lt;a href="http://www.ri.cmu.edu/person.html?person_id=136"&gt;Takeo Kanade&lt;/a&gt;'s classic computer vision papers from 1977 titled "&lt;a href="http://www.ri.cmu.edu/pub_files/pub4/kanade_takeo_1977_1/kanade_takeo_1977_1.pdf"&gt;Model Representation and Control Structure in Image Understanding&lt;/a&gt;" and I came across a new term, &lt;span style="font-weight: bold;"&gt;heterarchy&lt;/span&gt;.  I think motivating this concept is as important as its definition. 
At the representational level, Kanade does a good job at advocating the use of multiple levels of representation -- from pixels to patches to regions to subimages to objects.&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://1.bp.blogspot.com/_6fdO7SbU8eM/S1djLROOhmI/AAAAAAAADA8/dLNO_5Trxt4/s1600-h/kanade.png"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer; width: 330px; height: 400px;" src="http://1.bp.blogspot.com/_6fdO7SbU8eM/S1djLROOhmI/AAAAAAAADA8/dLNO_5Trxt4/s400/kanade.png" alt="" id="BLOGGER_PHOTO_ID_5428916921071208034" border="0" /&gt;&lt;/a&gt;  In addition to discussing the representational aspects of image understanding systems, Kanade analyzes different strategies for using knowledge in such systems (he uses the term &lt;span style="font-style: italic;"&gt;control structure&lt;/span&gt; to signify the overall flow of information between subroutines).  On one extreme is &lt;span style="font-style: italic;"&gt;pass-oriented&lt;/span&gt; processing (this is Kanade's term -- I prefer to use the terms &lt;span style="font-style: italic;"&gt;feed-forward &lt;/span&gt;or&lt;span style="font-style: italic;"&gt; bottom-up&lt;/span&gt;) which relies on iteratively building higher levels of interpretation from lower ones.  Marr's vision pipeline is mostly bottom-up, but that discussion will be left for another post.   Another extreme is top-down processing, where the image is analyzed in a global-to-local fashion.  Of course, as of 2010 these ideas are being used on a regular basis in vision.  One example is the paper &lt;a href="http://w3.cs.huji.ac.il/%7Eyweiss/LevinWeissECCV06.pdf"&gt;Learning to Combine Bottom-Up and Top-Down Segmentation&lt;/a&gt; by Levin and Weiss.&lt;br /&gt;&lt;br /&gt;Kanade acknowledges that the flow of a vision algorithm is very much dependent on the representation used.  
For image understanding, bottom-up as well as top-down processing will both be critical components of the entire system.  However, the exact strategy for combining these processes, in addition to countless other mid-level stages, is not very clear.  Directly quoting Kanade, "The ultimate style would be a heterarchy, in which a number of modules work together like a community of experts with no strict central executive control."  According to this line of thought, processing would occur in a loopy and cooperative style.  Kanade attributes &lt;a href="http://dspace.mit.edu/handle/1721.1/40799"&gt;the concept of a heterarchy&lt;/a&gt; to Patrick Winston who worked with robots in the golden days of AI at MIT.  Like Kanade, Winston criticizes a linear flow of information in scene interpretation (this criticism dates back to 1971).  The basic problem outlined by both Kanade and Winston is that modules such as line-finders and region-finders (think segmentation) are simply not good enough to be used in subsequent stages of understanding.  In my own research I have used the concept of multiple image segmentations to bypass some of the issues with relying on the output of low/mid-level processing for high-level processing.  In 1971 Winston envisioned an algorithmic framework that is a melange of subroutines -- a web of algorithms created by different research groups -- that would interact and cooperate to understand an image.   This is analogous to the development of an operating system like Linux.  There is no overall theory developed by a single research group that made Linux a success -- it is the body of hackers and engineers that produced a wide range of software products that make using Linux a success.&lt;br /&gt;&lt;br /&gt;Unfortunately given the tradition of computer vision research, I believe that an open-source-style group effort in this direction will not come out of university-style research (which is overly coupled with the publishing cycle).  
It would be a noble effort, but it would be more a feat of engineering than of science.  Imagine a group of 2-3 people creating an operating system from scratch -- it seems like a crazy idea in 2010.  However, computer vision research is often done in such small teams (actually there is often a single hacker behind a vision project).  But maybe going open-source and allowing several decades of interaction will actually produce usable image understanding systems.  I would like to one day lead such an effort -- being both the theoretical mastermind as well as the hacker behind this vision. I am an INTJ, hear me roar.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/15418143-243611738278981469?l=quantombone.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://quantombone.blogspot.com/feeds/243611738278981469/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://quantombone.blogspot.com/2010/01/heterarchies-and-control-structure-in.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/15418143/posts/default/243611738278981469'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/15418143/posts/default/243611738278981469'/><link rel='alternate' type='text/html' href='http://quantombone.blogspot.com/2010/01/heterarchies-and-control-structure-in.html' title='Heterarchies and Control Structure in Image Interpretation'/><author><name>Tomasz Malisiewicz</name><uri>https://profiles.google.com/107912691630546731185</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='//lh6.googleusercontent.com/-ely56X4eTGQ/AAAAAAAAAAI/AAAAAAAAJ_g/VLmSVgSof5s/s512-c/photo.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' 
url='http://1.bp.blogspot.com/_6fdO7SbU8eM/S1djLROOhmI/AAAAAAAADA8/dLNO_5Trxt4/s72-c/kanade.png' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-15418143.post-6949171888319387417</id><published>2010-01-18T14:28:00.007-05:00</published><updated>2010-01-19T11:03:46.698-05:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='perception'/><category scheme='http://www.blogger.com/atom/ns#' term='image interpretation'/><category scheme='http://www.blogger.com/atom/ns#' term='philosophy of science'/><category scheme='http://www.blogger.com/atom/ns#' term='image understanding'/><category scheme='http://www.blogger.com/atom/ns#' term='truth'/><title type='text'>Understanding versus Interpretation -- a philosophical distinction</title><content type='html'>Today I want to bring up an interesting discussion regarding the connotation of the word "understanding" versus "interpretation," particularly in the context of "scene understanding" versus "scene interpretation."  While many vision researchers use these terms interchangeably, I think it is worthwhile to make the distinction, albeit a philosophical one.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;On Understanding&lt;/span&gt;&lt;br /&gt;While everybody knows that the goal of computer vision is to recognize all of the objects in an image, there is plenty of disagreement about how to represent objects and recognize them in the image.  There is a physicalist account (from &lt;a href="http://en.wikipedia.org/wiki/Physicalism"&gt;Wikipedia&lt;/a&gt;: Physicalism is a philosophical position holding that everything which exists is no more extensive than its physical properties), where the goal of vision is to reconstruct veridical properties of the world.   
This view is consistent with the realist stance in philosophy (think back to Philosophy 101) -- there exists a single observer-independent 'ground-truth' regarding the identities of all of the objects contained in the world.  The notion of vision as measurement is very strong under this physicalist account.  The stuff of the world is out there just waiting to be grasped!   I think the term "understanding" fits very well into this truth-driven account of computer vision.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;On interpretation&lt;/span&gt;&lt;br /&gt;The second view, a postmodern and anti-realist one, is of vision as a way of &lt;span style="font-style: italic;"&gt;interpreting&lt;/span&gt; scenes.  The shift is from veridical recovery of the properties of the world from an image (measurement) to the observer-dependent interpretation of the input stimulus.  Under this account, there is no need to believe in a god's eye 'objective' view of the world.  Image interpretation is the registration of an input image with a vast network of past experience, both visual and abstract.   The same person can vary their own interpretation of an input as time passes and the internal knowledge base has evolved. Under this view, two distinct robots could provide very useful yet distinct 'image interpretations' of the same input image.  The main idea is that different robots could have different interpretation-spaces, that is they could obtain &lt;a href="http://en.wikipedia.org/wiki/Commensurability_%28philosophy_of_science%29"&gt;incommensurable&lt;/a&gt; (yet very useful!) 
interpretations of the same image.&lt;br /&gt;&lt;br /&gt;It has been argued by &lt;a href="http://www.cogsci.uci.edu/personnel/hoffman/hoffman.html"&gt;Donald Hoffman&lt;/a&gt; (&lt;a href="http://www.blogger.com/www.cogsci.uci.edu/%7Eddhoff/interface.pdf"&gt;Interface Theory of Perception&lt;/a&gt;) that there is no reason why we should expect evolution to have driven humans towards veridical perception.  In fact, Hoffman argues that nature drives veridical perception towards extinction and it only makes sense to speak of perception as guiding agents towards pragmatic interpretations of their environment.&lt;br /&gt;&lt;br /&gt;In philosophy of science, there is the debate of whether the field of physics is unraveling some ultimate truth about the world versus physics painting a coherent and pragmatic picture of the world.  I've always viewed science as an art and I embrace my anti-realist stance -- which has been shaped by &lt;a href="http://en.wikipedia.org/wiki/Thomas_Samuel_Kuhn"&gt;Thomas Kuhn&lt;/a&gt;, &lt;a href="http://en.wikipedia.org/wiki/William_James"&gt;William James&lt;/a&gt;, and many others.  
While my scientific interests have currently congealed in computer vision, it is no surprise that I'm finding conceptual agreement between my philosophy of science and my concrete research efforts in object recognition.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/15418143-6949171888319387417?l=quantombone.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://quantombone.blogspot.com/feeds/6949171888319387417/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://quantombone.blogspot.com/2010/01/understanding-versus-interpretation.html#comment-form' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/15418143/posts/default/6949171888319387417'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/15418143/posts/default/6949171888319387417'/><link rel='alternate' type='text/html' href='http://quantombone.blogspot.com/2010/01/understanding-versus-interpretation.html' title='Understanding versus Interpretation -- a philosophical distinction'/><author><name>Tomasz Malisiewicz</name><uri>https://profiles.google.com/107912691630546731185</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='//lh6.googleusercontent.com/-ely56X4eTGQ/AAAAAAAAAAI/AAAAAAAAJ_g/VLmSVgSof5s/s512-c/photo.jpg'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-15418143.post-4424248325637848396</id><published>2010-01-14T00:53:00.004-05:00</published><updated>2010-01-14T00:55:38.743-05:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='summary'/><category scheme='http://www.blogger.com/atom/ns#' term='visualization'/><category scheme='http://www.blogger.com/atom/ns#' term='text'/><category scheme='http://www.blogger.com/atom/ns#' 
term='wordle'/><title type='text'>wordle word summary image</title><content type='html'>&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://4.bp.blogspot.com/_6fdO7SbU8eM/S06xXOobFhI/AAAAAAAADAc/d_Y626KnTQw/s1600-h/Picture+6.png"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer; width: 400px; height: 261px;" src="http://4.bp.blogspot.com/_6fdO7SbU8eM/S06xXOobFhI/AAAAAAAADAc/d_Y626KnTQw/s400/Picture+6.png" alt="" id="BLOGGER_PHOTO_ID_5426469613650777618" border="0" /&gt;&lt;/a&gt;This is the &lt;a href="http://www.wordle.net/"&gt;wordle&lt;/a&gt; summary of the keywords from this blog.  It's a cute visualization of topics.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/15418143-4424248325637848396?l=quantombone.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://quantombone.blogspot.com/feeds/4424248325637848396/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://quantombone.blogspot.com/2010/01/wordle-word-summary-image.html#comment-form' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/15418143/posts/default/4424248325637848396'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/15418143/posts/default/4424248325637848396'/><link rel='alternate' type='text/html' href='http://quantombone.blogspot.com/2010/01/wordle-word-summary-image.html' title='wordle word summary image'/><author><name>Tomasz Malisiewicz</name><uri>https://profiles.google.com/107912691630546731185</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='//lh6.googleusercontent.com/-ely56X4eTGQ/AAAAAAAAAAI/AAAAAAAAJ_g/VLmSVgSof5s/s512-c/photo.jpg'/></author><media:thumbnail 
xmlns:media='http://search.yahoo.com/mrss/' url='http://4.bp.blogspot.com/_6fdO7SbU8eM/S06xXOobFhI/AAAAAAAADAc/d_Y626KnTQw/s72-c/Picture+6.png' height='72' width='72'/><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-15418143.post-4771965188820696819</id><published>2010-01-12T14:27:00.004-05:00</published><updated>2010-01-12T23:15:06.325-05:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='interpretation'/><category scheme='http://www.blogger.com/atom/ns#' term='scene understanding'/><category scheme='http://www.blogger.com/atom/ns#' term='artificial intelligence'/><category scheme='http://www.blogger.com/atom/ns#' term='sri'/><title type='text'>Image Interpretation Objectives</title><content type='html'>An example of a typical complex outdoor natural scene that a general knowledge-based image interpretation system might be expected to understand is shown in Figure 1.  An objective of such systems is to identify semantically meaningful visual entities in a digitized and segmented image of some scene.  That is, to correctly assign semantically meaningful labels (e.g., house, tree, grass, and so on) to regions in an image -- see [29,30].  A computer-based image interpretation system can be viewed as having two major components, a "low-level" component and a "high-level" component [19],[31].  In many respects, the low-level portion of the system is designed to mimic the early stages of visual image processing in human-like systems.  In these early stages, it is believed that scenes are partitioned, to some extent, into regions that are homogeneous with respect to some set of perceivable features (i.e., feature vector) in the scene [6],[40],[39].  To this extent, most low-level general purpose computer vision systems are designed to perform the same task.  An example of a partitioning (i.e., segmentation) of Figure 1 into homogeneous regions is shown in Figure 2.  
The knowledge-based computer vision system we shall describe in this paper is not currently concerned with resegmenting portions of an image.  Rather, its task is to correctly label as many regions as possible in a given segmentation.&lt;br /&gt;&lt;br /&gt;This is a direct quote from a 1984 paper on computer vision.  A great example of segmentation-driven scene understanding.  The content is similar enough to my own line of work that it could have been an excerpt from my own thesis.&lt;br /&gt;&lt;br /&gt;It is actually in a section called Image Interpretation Objectives from "&lt;a href="http://www.ai.sri.com/pub_list/585"&gt;Evidential Knowledge-Based Computer Vision&lt;/a&gt;" by Leonard P. Wesley, 1984.  I found this while reading lots of good &lt;a href="http://www.ai.sri.com/pub_list/technotes.php"&gt;tech reports from SRI International's AI Center&lt;/a&gt; in Menlo Park.  Some good stuff there by Tenenbaum, Barrow, Duda, Hart, Nilsson, Fischler, Pereira, Pentland, Fua, Szeliski, to name a few.  Lots of stuff there is relevant to scene understanding and grounds the problem in robotics (since there was no "internet" vision back in the 70s and 80s).&lt;br /&gt;&lt;br /&gt;On another note, I still haven't been able to find a copy of the classic paper, &lt;span style="font-weight: bold;"&gt;Experiments in Interpretation-Guided Segmentation&lt;/span&gt; by Tenenbaum and Barrow from 1978.  If anybody knows where to find a PDF copy, send me an email. &lt;span style="font-weight: bold;"&gt;UPDATE&lt;/span&gt;: Thanks for the quick reply!  
I have the paper now.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/15418143-4771965188820696819?l=quantombone.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://quantombone.blogspot.com/feeds/4771965188820696819/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://quantombone.blogspot.com/2010/01/image-interpretation-objectives.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/15418143/posts/default/4771965188820696819'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/15418143/posts/default/4771965188820696819'/><link rel='alternate' type='text/html' href='http://quantombone.blogspot.com/2010/01/image-interpretation-objectives.html' title='Image Interpretation Objectives'/><author><name>Tomasz Malisiewicz</name><uri>https://profiles.google.com/107912691630546731185</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='//lh6.googleusercontent.com/-ely56X4eTGQ/AAAAAAAAAAI/AAAAAAAAJ_g/VLmSVgSof5s/s512-c/photo.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-15418143.post-1286886463326167067</id><published>2009-12-10T14:22:00.003-05:00</published><updated>2009-12-10T14:36:09.756-05:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='nips'/><category scheme='http://www.blogger.com/atom/ns#' term='computer vision'/><title type='text'>Computer Vision Papers at NIPS 2009</title><content type='html'>Here are some computer vision papers I found interesting at NIPS 2009 in Vancouver.&lt;br /&gt;&lt;br /&gt;[&lt;a href="http://books.nips.cc/papers/files/nips22/NIPS2009_0004.pdf"&gt;pdf&lt;/a&gt;][&lt;a href="http://books.nips.cc/papers/files/nips22/NIPS2009_0004.bib"&gt;bib&lt;/a&gt;] 
&lt;i&gt;Unsupervised Detection of Regions of Interest Using Iterative Link Analysis&lt;/i&gt;  (NIPS 2009)&lt;br /&gt;&lt;a href="http://www.cs.cmu.edu/%7Egunhee/"&gt;Gunhee Kim&lt;/a&gt;, &lt;a href="http://web.mit.edu/torralba/www/"&gt;Antonio Torralba&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;[&lt;a href="http://books.nips.cc/papers/files/nips22/NIPS2009_0576.pdf"&gt;pdf&lt;/a&gt;][&lt;a href="http://books.nips.cc/papers/files/nips22/NIPS2009_0576.bib"&gt;bib&lt;/a&gt;] &lt;i&gt;Region-based Segmentation and Object Detection&lt;/i&gt; (NIPS 2009)&lt;br /&gt;&lt;a href="http://www.stanford.edu/%7Esgould/"&gt;Stephen Gould&lt;/a&gt;, Tianshi Gao, &lt;a href="http://ai.stanford.edu/%7Ekoller/"&gt;Daphne Koller&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;[&lt;a href="http://books.nips.cc/papers/files/nips22/NIPS2009_1002.pdf"&gt;pdf&lt;/a&gt;][&lt;a href="http://books.nips.cc/papers/files/nips22/NIPS2009_1002.bib"&gt;bib&lt;/a&gt;] &lt;i&gt;Segmenting Scenes by Matching Image Composites&lt;/i&gt; (NIPS 2009)&lt;br /&gt;&lt;a href="http://www.di.ens.fr/%7Erussell/index.html"&gt;Bryan Russell&lt;/a&gt;, &lt;a href="http://www.cs.cmu.edu/%7Eefros/"&gt;Alyosha Efros&lt;/a&gt;, &lt;a href="http://www.di.ens.fr/%7Ejosef/"&gt;Josef Sivic&lt;/a&gt;, &lt;a href="http://people.csail.mit.edu/billf/"&gt;Bill Freeman&lt;/a&gt;, &lt;a href="http://www.robots.ox.ac.uk/%7Eaz/"&gt;Andrew Zisserman&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;[&lt;a href="http://books.nips.cc/papers/files/nips22/NIPS2009_0086.pdf"&gt;pdf&lt;/a&gt;][&lt;a href="http://books.nips.cc/papers/files/nips22/NIPS2009_0086.bib"&gt;bib&lt;/a&gt;] &lt;i&gt;Beyond Categories: The Visual Memex Model for Reasoning About Object Relationships&lt;/i&gt; (NIPS 2009)&lt;br /&gt;&lt;a href="http://www.cs.cmu.edu/%7Etmalisie/"&gt;Tomasz Malisiewicz&lt;/a&gt;, &lt;a href="http://www.cs.cmu.edu/%7Eefros/"&gt;Alyosha Efros&lt;/a&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' 
src='https://blogger.googleusercontent.com/tracker/15418143-1286886463326167067?l=quantombone.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://quantombone.blogspot.com/feeds/1286886463326167067/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://quantombone.blogspot.com/2009/12/computer-vision-papers-at-nips-2009.html#comment-form' title='4 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/15418143/posts/default/1286886463326167067'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/15418143/posts/default/1286886463326167067'/><link rel='alternate' type='text/html' href='http://quantombone.blogspot.com/2009/12/computer-vision-papers-at-nips-2009.html' title='Computer Vision Papers at NIPS 2009'/><author><name>Tomasz Malisiewicz</name><uri>https://profiles.google.com/107912691630546731185</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='//lh6.googleusercontent.com/-ely56X4eTGQ/AAAAAAAAAAI/AAAAAAAAJ_g/VLmSVgSof5s/s512-c/photo.jpg'/></author><thr:total>4</thr:total></entry><entry><id>tag:blogger.com,1999:blog-15418143.post-6038652194790520070</id><published>2009-12-06T15:45:00.004-05:00</published><updated>2009-12-06T16:06:28.762-05:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='peer review'/><category scheme='http://www.blogger.com/atom/ns#' term='computer science'/><category scheme='http://www.blogger.com/atom/ns#' term='publishing'/><title type='text'>On Science: passion towards solving a problem must come from within</title><content type='html'>I'm currently in Chicago while traveling to Vancouver (NIPS 2009 conference) where I'll be defending my research during Tuesday's poster session.  
Instead of delving into the computational challenges that motivate my research, I want to take a step back and criticize what (sometimes? often?) happens during the publishing cycle.&lt;br /&gt;&lt;br /&gt;In my view, good research starts with the passion to solve a particular problem or address a specific concern.  Quite often, good research will raise more questions than it answers.  Unfortunately, when we submit papers to conferences we are judged on the clarity of presentation, level of experimental validation, as well as overall completeness.  This means that the publishing cycle quite often promotes writing "cute" papers that have little long-term impact in the field and can only be viewed as thorough and complete due to their narrow scope.  This is why we should neither rely solely on peer review nor tailor our scientific lives towards pleasing others.  Sometimes being a good scientist means breaking free from the norms that the world around us rigidly follows, sometimes publishing too often skews our research focus, and sometimes falling off the face of the earth for a period of time is necessary to push science in a new direction. &lt;br /&gt;&lt;br /&gt;I want to challenge every scientist to follow their dreams and attempt to solve the problems they truly care about and not just attempt to please peer review.  Maybe some think that the perversion of science (that is, evaluating scientists by the number of publications they have) is okay, but in my book a scientific career which produces a single grand idea is superior to a career saturated with myriad "cute and thorough" papers.  
I'm not particularly upset with the progress of Computer Vision, but I think more people should ponder the negative consequences of pulling the publish-trigger too often.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/15418143-6038652194790520070?l=quantombone.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://quantombone.blogspot.com/feeds/6038652194790520070/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://quantombone.blogspot.com/2009/12/on-science-passion-towards-solving.html#comment-form' title='7 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/15418143/posts/default/6038652194790520070'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/15418143/posts/default/6038652194790520070'/><link rel='alternate' type='text/html' href='http://quantombone.blogspot.com/2009/12/on-science-passion-towards-solving.html' title='On Science: passion towards solving a problem must come from within'/><author><name>Tomasz Malisiewicz</name><uri>https://profiles.google.com/107912691630546731185</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='//lh6.googleusercontent.com/-ely56X4eTGQ/AAAAAAAAAAI/AAAAAAAAJ_g/VLmSVgSof5s/s512-c/photo.jpg'/></author><thr:total>7</thr:total></entry><entry><id>tag:blogger.com,1999:blog-15418143.post-5795304264350554747</id><published>2009-11-24T11:17:00.007-05:00</published><updated>2009-12-02T09:55:29.043-05:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='concepts'/><category scheme='http://www.blogger.com/atom/ns#' term='attributes'/><category scheme='http://www.blogger.com/atom/ns#' term='language'/><category scheme='http://www.blogger.com/atom/ns#' term='categorization'/><title type='text'>Understanding the role of 
categories in object recognition</title><content type='html'>If we set aside our academic endeavors of publishing computer vision and machine learning papers and sincerely ask ourselves, "What is the purpose of recognition?" a very different story emerges.&lt;br /&gt;&lt;br /&gt;Let me first outline the contemporary stance on recognition (that is, object recognition as embraced by the computer vision community), which is actually a bit of a "non-stance" because many people working on recognition haven't bothered to understand the motivations, implications, and philosophical foundations of their work.  &lt;span style="font-weight: bold;"&gt;The standard view of recognition is that it is equivalent to categorization -- assigning an object its "correct" category is the goal of recognition&lt;/span&gt;.  Object recognition, as found in vision papers, is commonly presented as a single-image recognition task that is not tied to an active, mobile agent that must understand and act in the environment around it.  These contrived tasks are partially to blame for making us think that categories are the ultimate truth.  Of course, once we've pinpointed the correct category we can look up information about the object category at hand in some sort of grand encyclopedia.   For example, once we've categorized an object as a bird we can simply recall the fact that "it flies" from such a source of knowledge.&lt;br /&gt;&lt;br /&gt;Most object recognition research is concerned with &lt;span style="font-weight: bold;"&gt;object representations&lt;/span&gt; (&lt;span style="font-style: italic;"&gt;what features to compute from an image&lt;/span&gt;) as well as supervised (and semi-supervised) &lt;span style="font-weight: bold;"&gt;machine learning techniques to learn object models from data&lt;/span&gt; in order to discriminate and thus "recognize" object categories.  
The reason why object recognition has become so popular in the past decade is that many researchers in AI/Robotics envision a successful vision system as a key component in any real-world robotic platform.  &lt;span style="font-weight: bold;"&gt;If you ask humans to describe their environment, they will probably use a bunch of nouns to enumerate the stuff around them, so surely nouns must be the basic building blocks of reality!&lt;/span&gt; In this post I want to question this commonsense assumption that categories are the building blocks of reality and propose a different way of coping with reality, one that doesn't try to directly estimate a category from visual data.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;I argue that just because nouns (and the categories they refer to) are the basis of effability for humans, it doesn't mean that nouns and categories are the quarks and gluons of recognition.&lt;/span&gt;  Language is a relatively recent phenomenon for humans (think evolutionary scale here), and it is absent in many animals inhabiting the earth beside us.  It is absurd to think that animals do not possess a faculty for recognition just because they do not have a language.  Since animals can quite effectively cope with the world around them, there must be hope for understanding recognition in a way that doesn't invoke linguistic concepts.&lt;br /&gt;&lt;br /&gt;Let me make my first disclaimer.  I am not against categories altogether -- they have their place.  The goal of language is human-human communication and intelligent robotic agents will inevitably have to map their internal modes of representation onto human language if we are to understand and deal with such artificial beings.  
I just want to criticize the idea that categories are found deep within our (human) neural architecture and serve as the basis for recognition.&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://bp3.blogger.com/_ZvcL7qhQ3B8/Rrk6X_5BhlI/AAAAAAAAAPY/H1X9omY9VIA/s400/caveman.jpg"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer; width: 177px; height: 177px;" src="http://bp3.blogger.com/_ZvcL7qhQ3B8/Rrk6X_5BhlI/AAAAAAAAAPY/H1X9omY9VIA/s400/caveman.jpg" alt="" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;Imagine a caveman and his daily life which requires quite a bit of "recognition"-abilities to cope with the world around him.   He must differentiate pernicious animals from edible ones, distinguish contentious cavefolk from his peaceful companions, and reason about the plethora of plants around him.  For each object that he recognizes, he must be able to determine whether it is edible, dangerous, poisonous, tasty, heavy, warm, etc.  In short, recognition amounts to predicting a set of attributes associated with an object.  &lt;span style="font-weight: bold;"&gt;Recognition is the linking of perceptible attributes (it is green and the size of my fist) to our past experiences and predicting attributes that are not conveyed by mere appearance. &lt;/span&gt; If we see a tiger, it is solely on our past experiences that we can call it dangerous.&lt;br /&gt;&lt;br /&gt;So imagine a vector space, where each dimension encodes an attribute such as edible, throwable, tasty, poisonous, kind, etc.  Each object can be represented as a point in this attribute space.  It is language that gives us categories as a shorthand to talk about commonly found objects.  Different cultures would give rise to different ways of cutting up the world, and this is consistent with what has been observed by psychologists.  
Viewing &lt;span style="font-weight: bold;"&gt;categories as a way of compressing attribute vectors&lt;/span&gt; not only makes sense but is in agreement with the idea that categories culturally arose much later than the ability for humans to recognize objects. Thus it makes sense to think of category-free recognition.  Since a robotic agent who was programmed to think of the world in terms of categories will have to unroll categories to understand objects in terms of tangible properties if they are to make sense of the world around them, &lt;span style="font-weight: bold;"&gt;why not use the properties/attributes as the primary elements of recognition&lt;/span&gt; in the first place!?&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://4.bp.blogspot.com/_6fdO7SbU8eM/SxWjVhIau7I/AAAAAAAAC6o/4Ed3tlLRC1Q/s1600/attributes.png"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer; width: 196px; height: 255px;" src="http://4.bp.blogspot.com/_6fdO7SbU8eM/SxWjVhIau7I/AAAAAAAAC6o/4Ed3tlLRC1Q/s400/attributes.png" alt="" id="BLOGGER_PHOTO_ID_5410410117421775794" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;div style="text-align: center;"&gt;&lt;a href="http://www.cs.uiuc.edu/homes/afarhad2/index_files/Attributes.pdf"&gt;from Describing objects by their attributes&lt;/a&gt;&lt;br /&gt;&lt;/div&gt;&lt;br /&gt;These ideas are not entirely new.  In Computer Vision, there is a CVPR 2009 paper &lt;a href="http://www.cs.uiuc.edu/homes/afarhad2/index_files/Attributes.pdf"&gt;Describing objects by their attributes&lt;/a&gt; by &lt;a href="http://www.cs.uiuc.edu/homes/afarhad2/"&gt;Farhadi&lt;/a&gt;, Endres, &lt;a href="http://www.cs.uiuc.edu/homes/dhoiem/"&gt;Hoiem&lt;/a&gt;, and Forsyth (from UIUC) which strives to understand objects directly using the ideas discussed above.  
In the domain of thought recognition, the paper &lt;a href="http://www.ri.cmu.edu/pub_files/2009/12/395_paper.pdf"&gt;Zero-Shot Learning with Semantic Output Codes&lt;/a&gt; by &lt;a href="http://www.ri.cmu.edu/person.html?person_id=1584"&gt;Palatucci&lt;/a&gt;, Pomerleau, Hinton, and Mitchell strives to understand concepts using a similar semantic basis.&lt;br /&gt;&lt;br /&gt;I believe the field of computer vision has been conceptually stuck and the vehement reliance on rigid object categories is partially to blame.  We should read more Wittgenstein and focus more on understanding vision as a mere component of artificial intelligence.  If we play the "recognize objects in a static image" game (as Computer Vision is doing!) then we obtain a fragmented view of reality and cannot fully understand the relationship between recognition and intelligence.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/15418143-5795304264350554747?l=quantombone.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://quantombone.blogspot.com/feeds/5795304264350554747/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://quantombone.blogspot.com/2009/11/understanding-role-of-categories-in.html#comment-form' title='16 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/15418143/posts/default/5795304264350554747'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/15418143/posts/default/5795304264350554747'/><link rel='alternate' type='text/html' href='http://quantombone.blogspot.com/2009/11/understanding-role-of-categories-in.html' title='Understanding the role of categories in object recognition'/><author><name>Tomasz Malisiewicz</name><uri>https://profiles.google.com/107912691630546731185</uri><email>noreply@blogger.com</email><gd:image 
rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='//lh6.googleusercontent.com/-ely56X4eTGQ/AAAAAAAAAAI/AAAAAAAAJ_g/VLmSVgSof5s/s512-c/photo.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://bp3.blogger.com/_ZvcL7qhQ3B8/Rrk6X_5BhlI/AAAAAAAAAPY/H1X9omY9VIA/s72-c/caveman.jpg' height='72' width='72'/><thr:total>16</thr:total></entry><entry><id>tag:blogger.com,1999:blog-15418143.post-4868419711935549044</id><published>2009-11-12T18:11:00.003-05:00</published><updated>2009-11-12T18:21:49.454-05:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='graphical models'/><category scheme='http://www.blogger.com/atom/ns#' term='tutorial'/><category scheme='http://www.blogger.com/atom/ns#' term='scene understanding'/><category scheme='http://www.blogger.com/atom/ns#' term='machine learning'/><title type='text'>Learning and Inference in Vision: from Features to Scene Understanding</title><content type='html'>&lt;div style="text-align: center;"&gt;&lt;img style="width: 434px; height: 64px;" src="http://www.ml.cmu.edu/images2/mlheader.gif" /&gt;&lt;br /&gt;&lt;/div&gt;Tomorrow, &lt;a href="http://www.cs.cmu.edu/%7Ejch1/"&gt;Jonathan Huang&lt;/a&gt; and I are giving a Computer Vision tutorial at the &lt;a href="http://www.cs.cmu.edu/%7Emldsym/"&gt;First MLD (Machine Learning Department) Research Symposium&lt;/a&gt; at CMU.  The title of our presentation is &lt;span style="font-weight: bold;"&gt;Learning and Inference in Vision: from Features to Scene Understanding&lt;/span&gt;.&lt;br /&gt;&lt;br /&gt;The goal of the tutorial is to expose Machine Learning students to state-of-the-art object recognition, scene understanding and the inference problems associated with such high-level recognition problems.  Our target audience is graduate students with little or no prior exposure to object recognition who would like to learn more about the use of probabilistic graphical models in Computer Vision.   
We outline the difficulties present in object recognition/detection and outline several different models for jointly reasoning about multiple object hypotheses.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/15418143-4868419711935549044?l=quantombone.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://quantombone.blogspot.com/feeds/4868419711935549044/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://quantombone.blogspot.com/2009/11/learning-and-inference-in-vision-from.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/15418143/posts/default/4868419711935549044'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/15418143/posts/default/4868419711935549044'/><link rel='alternate' type='text/html' href='http://quantombone.blogspot.com/2009/11/learning-and-inference-in-vision-from.html' title='Learning and Inference in Vision: from Features to Scene Understanding'/><author><name>Tomasz Malisiewicz</name><uri>https://profiles.google.com/107912691630546731185</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='//lh6.googleusercontent.com/-ely56X4eTGQ/AAAAAAAAAAI/AAAAAAAAJ_g/VLmSVgSof5s/s512-c/photo.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-15418143.post-2097927310060841809</id><published>2009-11-07T16:23:00.007-05:00</published><updated>2009-11-07T17:27:49.240-05:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='indexing'/><category scheme='http://www.blogger.com/atom/ns#' term='association'/><category scheme='http://www.blogger.com/atom/ns#' term='internet'/><category scheme='http://www.blogger.com/atom/ns#' term='visual memex'/><category 
scheme='http://www.blogger.com/atom/ns#' term='scene understanding'/><category scheme='http://www.blogger.com/atom/ns#' term='google'/><category scheme='http://www.blogger.com/atom/ns#' term='web'/><title type='text'>A model of thought: The Associative Indexing of the Memex</title><content type='html'>The &lt;a href="http://en.wikipedia.org/wiki/Memex"&gt;Memex&lt;/a&gt; ("Memory Extender") is an organizational device, a conceptual device, and a framework for dealing with conceptual relationships in an associative way.  Abandoning the Aristotelian tradition of rooting concepts in definitions, the Memex suggests an association-based, non-parametric, and data-driven representation of concepts.&lt;br /&gt;&lt;br /&gt;Since the mind=software analogy is so deeply engraved in my thoughts, it is hard for me to see intelligent reasoning as anything but a computer program (albeit one which we might never discover/develop).  It is worthwhile to see sketches of the Memex from an era before computers (see the figure below).  However, with the modern Internet, a magnificent example of Bush's ideology, with links denoting the associations between pages, we need no better analogy.  Bush's critique of the artificiality of traditional schemes of indexing resonates in the world wide web.&lt;br /&gt;&lt;br /&gt;&lt;div style="text-align: center;"&gt;&lt;img style="width: 329px; height: 231px;" src="https://atlas.colorado.edu/%7Ehofmocke/digitalpoetry/images/memex.gif" /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;A Mechanical Memex Sketch&lt;br /&gt;&lt;br /&gt;&lt;/span&gt;&lt;/div&gt;By extrapolating Bush's anti-indexing argument to visual object recognition, I realize that the blunder is to assign concepts to rigid categories.  The desire to break free from categorization was the chief motivation for my &lt;a href="http://www.cs.cmu.edu/%7Etmalisie/projects/nips09/"&gt;Visual Memex&lt;/a&gt; paper.  
If Bush's ideas were so successful in predicting the modern Internet, we should ask ourselves, "Why are categories so prevalent in computational models of perception?"  &lt;span style="font-style: italic;"&gt;Maybe it is machine learning, with its own tradition of classes in supervised learning approaches, that has scarred the way we computer scientists see reality.&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;“The human mind does not work that way. It operates by association. &lt;span style="font-weight: bold;"&gt;With one item in its grasp, it snaps instantly to the next that is suggested by the association of thoughts&lt;/span&gt;, in accordance with some intricate web of trails carried by the cells of the brain. It has other characteristics, of course; trails that are not frequently followed are prone to fade, items are not fully permanent, memory is transitory. Yet the speed of action, the intricacy of trails, the detail of mental pictures, is awe-inspiring beyond all else in nature.” -- Vannevar Bush&lt;br /&gt;&lt;br /&gt;Is Google's grasp of the world of information anything more than a Memex?  I'm not convinced that it is.  While the feat of searching billions of web pages in real time has already been demonstrated by Google (and reinforced every day), the best computer vision approaches as of today look nothing like Google's data-driven way of representing concepts. I'm quite interested in pushing this link-based data-driven mentality to the next level in the field of Computer Vision.  Breaking free from the categorization assumptions that plague computational perception might be the key ingredient in the recipe for success.&lt;br /&gt;&lt;br /&gt;Instead of summarizing, here is another link to a well-written article on the &lt;a href="http://theblackx.com/a/memex.html"&gt;Memex&lt;/a&gt; by &lt;a href="http://finnb.net/"&gt;Finn Brunton&lt;/a&gt;.   
Quoting Brunton, "The deepest implications of the Memex would begin to become apparent here: not the speed of retrieval, or even the association as such, but the fact that the association is arbitrary and can be shared, which begins to suggest that, at some level, the data itself is also arbitrary within the context of the Memex; that it may not be “the shape of thought,” emphasis on the the, but that it is the shape of a new thought, a mediated and mechanized thought, one that is described by queries and above all by links."&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/15418143-2097927310060841809?l=quantombone.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://quantombone.blogspot.com/feeds/2097927310060841809/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://quantombone.blogspot.com/2009/11/model-of-thought-associative-indexing.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/15418143/posts/default/2097927310060841809'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/15418143/posts/default/2097927310060841809'/><link rel='alternate' type='text/html' href='http://quantombone.blogspot.com/2009/11/model-of-thought-associative-indexing.html' title='A model of thought: The Associative Indexing of the Memex'/><author><name>Tomasz Malisiewicz</name><uri>https://profiles.google.com/107912691630546731185</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' 
src='//lh6.googleusercontent.com/-ely56X4eTGQ/AAAAAAAAAAI/AAAAAAAAJ_g/VLmSVgSof5s/s512-c/photo.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-15418143.post-8295569609462763414</id><published>2009-11-05T16:43:00.009-05:00</published><updated>2009-11-05T17:14:37.721-05:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='nips'/><category scheme='http://www.blogger.com/atom/ns#' term='context challenge'/><category scheme='http://www.blogger.com/atom/ns#' term='context'/><category scheme='http://www.blogger.com/atom/ns#' term='torralba'/><category scheme='http://www.blogger.com/atom/ns#' term='object recognition'/><category scheme='http://www.blogger.com/atom/ns#' term='visual memex'/><category scheme='http://www.blogger.com/atom/ns#' term='categorization'/><title type='text'>The Visual Memex: Visual Object Recognition Without Categories</title><content type='html'>&lt;div style="text-align: center;"&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://1.bp.blogspot.com/_6fdO7SbU8eM/SvNId_ygK_I/AAAAAAAAC6I/CQVGFEv-wsg/s1600-h/memex_context1.png"&gt;&lt;img style="cursor: pointer; width: 339px; height: 89px;" src="http://1.bp.blogspot.com/_6fdO7SbU8eM/SvNId_ygK_I/AAAAAAAAC6I/CQVGFEv-wsg/s1600/memex_context1.png" alt="" id="BLOGGER_PHOTO_ID_5400740058324020210" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;/div&gt;&lt;div style="text-align: center;"&gt;&lt;span style="font-weight: bold;"&gt;Figure 1&lt;/span&gt;&lt;br /&gt;&lt;/div&gt;I have discussed the limitations of using rigid object categories in computer vision, and my CVPR 2008 work on &lt;a href="http://www.cs.cmu.edu/%7Etmalisie/projects/cvpr08/"&gt;Recognition as Association&lt;/a&gt; was a move towards developing a category-free model of objects.  
I was primarily concerned with local object recognition where the recognition problem was driven by the appearance/shape/texture features derived from within a segment (a region extracted from an image using an image segmentation algorithm).  Recognition of objects was done locally and independently per region, since I did not have a good model of category-free context at that time.  I've given the problem of contextual object reasoning much thought over the past several years, and equipped with the power of graphical models and learning algorithms I now present a model for category-free object relationship reasoning.&lt;br /&gt;&lt;br /&gt;Now it's 2009, and it's no surprise that I have a paper on context.  Context is the new beast and all the cool kids are using it for scene understanding; however, categories are used so often for this problem that their use is rarely questioned.  In my &lt;a href="http://www.cs.cmu.edu/%7Etmalisie/projects/nips09/"&gt;NIPS 2009 paper&lt;/a&gt;, I present a category-free model of object relationships and address the problem of context-only recognition where the goal is to recognize an object solely based on contextual cues.  &lt;span style="font-weight: bold;"&gt;Figure 1&lt;/span&gt; shows an example of such a prediction task.   
Given K objects and their spatial configuration, is it possible to predict the appearance of a hidden object at some spatial location?&lt;br /&gt;&lt;br /&gt;&lt;div style="text-align: center;"&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://www.cs.cmu.edu/%7Etmalisie/projects/nips09/memex2.png"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer; width: 340px; height: 97px;" src="http://www.cs.cmu.edu/%7Etmalisie/projects/nips09/memex2.png" alt="" border="0" /&gt;&lt;/a&gt;&lt;span style="font-weight: bold;"&gt;Figure 2&lt;/span&gt;&lt;br /&gt;&lt;/div&gt;&lt;br /&gt;&lt;br /&gt;I present a model called the Visual Memex (visualized in &lt;span style="font-weight: bold;"&gt;Figure 2&lt;/span&gt;), which is a non-parametric graph-based model of visual concepts and their interactions. Unlike traditional approaches to object-object modeling which learn potentials between every pair of categories (the number of such pairs scales quadratically with the number of categories), I make no category assumptions for context.&lt;br /&gt;&lt;br /&gt;The official paper is out, and can be found on my project page:&lt;br /&gt;&lt;br /&gt;&lt;a href="http://www.cs.cmu.edu/%7Etmalisie/"&gt;Tomasz Malisiewicz&lt;/a&gt;, &lt;a href="http://www.cs.cmu.edu/%7Eefros/"&gt;Alexei A. Efros&lt;/a&gt;. &lt;a href="http://www.cs.cmu.edu/%7Etmalisie/projects/nips09/"&gt;Beyond Categories: The Visual Memex Model for Reasoning About Object Relationships.&lt;/a&gt; In &lt;a href="http://nips.cc/"&gt;NIPS&lt;/a&gt;, December 2009. &lt;a href="http://www.cs.cmu.edu/%7Etmalisie/projects/nips09/malisiewicz_nips09.pdf"&gt;PDF&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;Abstract: The use of context is critical for scene understanding in computer   vision, where the recognition of an object is driven by both local   appearance and the object's relationship to other elements of the   scene (context).  
Most current approaches rely on modeling the   relationships between object categories as a source of context. In   this paper we seek to move beyond categories to provide a richer   appearance-based model of context.  We present an exemplar-based   model of objects and their relationships, the Visual Memex,   that encodes both local appearance and 2D spatial context between   object instances. We evaluate our model on &lt;a href="http://web.mit.edu/torralba/www/"&gt;Torralba&lt;/a&gt;'s   proposed Context Challenge against a baseline category-based system.   Our experiments suggest that moving beyond categories for context   modeling appears to be quite beneficial, and may be the critical   missing ingredient in scene understanding systems.&lt;br /&gt;&lt;br /&gt;I gave a talk about my work yesterday at &lt;a href="http://www.cs.cmu.edu/%7Emisc-read/"&gt;CMU's Misc-read&lt;/a&gt; and received some good feedback.   I'll be at NIPS this December representing this body of research.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/15418143-8295569609462763414?l=quantombone.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://quantombone.blogspot.com/feeds/8295569609462763414/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://quantombone.blogspot.com/2009/11/visual-memex-visual-object-recognition.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/15418143/posts/default/8295569609462763414'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/15418143/posts/default/8295569609462763414'/><link rel='alternate' type='text/html' href='http://quantombone.blogspot.com/2009/11/visual-memex-visual-object-recognition.html' title='The Visual Memex: Visual Object Recognition Without Categories'/><author><name>Tomasz 
Malisiewicz</name><uri>https://profiles.google.com/107912691630546731185</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='//lh6.googleusercontent.com/-ely56X4eTGQ/AAAAAAAAAAI/AAAAAAAAJ_g/VLmSVgSof5s/s512-c/photo.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://1.bp.blogspot.com/_6fdO7SbU8eM/SvNId_ygK_I/AAAAAAAAC6I/CQVGFEv-wsg/s72-c/memex_context1.png' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-15418143.post-986058223144191165</id><published>2009-10-26T14:59:00.006-05:00</published><updated>2009-10-26T18:48:40.081-05:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='wittgentein'/><category scheme='http://www.blogger.com/atom/ns#' term='philosophy'/><category scheme='http://www.blogger.com/atom/ns#' term='meaning'/><category scheme='http://www.blogger.com/atom/ns#' term='concepts'/><category scheme='http://www.blogger.com/atom/ns#' term='categorization'/><title type='text'>Wittgenstein's Critique of Abstract Concepts</title><content type='html'>In his &lt;a href="http://en.wikipedia.org/wiki/Philosophical_Investigations"&gt;Philosophical Investigations&lt;/a&gt;, Wittgenstein argues against abstraction -- via several thought experiments he strives to annihilate the view that during their lives humans develop neat and consistent concepts in their minds (akin to building a dictionary). He criticizes the commonplace notions of meaning and concept formation (as were commonly used in philosophical circles at the time) and has contributed greatly to my own ideas regarding categorization in computer vision.&lt;br /&gt;&lt;br /&gt;Wittgenstein asks the reader to come up with the definition of the concept "game." 
While we can look up &lt;a href="http://www.merriam-webster.com/dictionary/game"&gt;the definition of "game" in a dictionary&lt;/a&gt;, we can't help but feel that any definition will be either too narrow or too broad.  The number of exceptions we would need in a single definition scales with the number of unique games we've been exposed to.  His point wasn't that "game" cannot be defined -- it was that the lack of a formal definition does not prevent us from using the word "game" correctly.   Think of a child growing up and being exposed to multi-player games, single-player games, fun games, competitive games, and games that are primarily characterized by their display of athleticism (aka sports or Olympic Games). Let's not forget activities such as courting and the Stock Market which are also referred to as "games."  Wittgenstein criticizes the idea that during our lives we somehow determine what is common between all of those examples of games and form an abstract concept of game which determines how we categorize novel activities. For Wittgenstein, our concept of game is not much more than our exposure to activities labeled as games and our ability to re-apply the word game in future contexts. &lt;br /&gt;&lt;br /&gt;Wittgenstein's ideas are the antithesis of Platonic Realism and Aristotle's Classical notion of Categories, where concepts/categories are pure, well-defined, and possess neatly defined boundaries. For Wittgenstein, experience is the anchor which allows us to measure the similarity between a novel activity and past activities referred to as games.  
&lt;span style="font-weight: bold;"&gt;Maybe the ineffability of experience isn't because internal concepts are inaccessible to introspection, maybe there is simply no internal library of concepts in the first place.&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;An experience-based view of concepts (or as my advisor would say, a &lt;span style="font-style: italic;"&gt;data-driven theory of concepts&lt;/span&gt;) suggests that there is no surrogate for living a life rich with experience.  While this has implications for how one should live their own life, it also has implications in the field of artificial intelligence.  The modern enterprise of "internet vision" where images are labeled with categories and fed into a classifier has to be questioned.  While I have criticized categories, there are also problems with a purely data-driven large-database-based approach.  It seems that a good place to start is by pruning away redundant bits of information; however, judging what is redundant and how is still an open question.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/15418143-986058223144191165?l=quantombone.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://quantombone.blogspot.com/feeds/986058223144191165/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://quantombone.blogspot.com/2009/10/wittgensteins-critique-of-abstract.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/15418143/posts/default/986058223144191165'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/15418143/posts/default/986058223144191165'/><link rel='alternate' type='text/html' href='http://quantombone.blogspot.com/2009/10/wittgensteins-critique-of-abstract.html' title='Wittgenstein&apos;s Critique of Abstract Concepts'/><author><name>Tomasz 
Malisiewicz</name><uri>https://profiles.google.com/107912691630546731185</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='//lh6.googleusercontent.com/-ely56X4eTGQ/AAAAAAAAAAI/AAAAAAAAJ_g/VLmSVgSof5s/s512-c/photo.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-15418143.post-2854664141789313683</id><published>2009-10-19T16:33:00.007-05:00</published><updated>2009-10-20T00:11:20.534-05:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='indoor recognition'/><category scheme='http://www.blogger.com/atom/ns#' term='prototypes'/><category scheme='http://www.blogger.com/atom/ns#' term='distance function learning'/><category scheme='http://www.blogger.com/atom/ns#' term='torralba'/><category scheme='http://www.blogger.com/atom/ns#' term='gist'/><title type='text'>Scene Prototype Models for Indoor Image Recognition</title><content type='html'>In today's post I want to briefly discuss a computer vision paper which has caught my attention.&lt;br /&gt;&lt;br /&gt;In the paper &lt;a href="http://web.mit.edu/torralba/www/indoor.html"&gt;Recognizing Indoor Scenes&lt;/a&gt;, &lt;a href="http://people.csail.mit.edu/ariadna/"&gt;Quattoni&lt;/a&gt; and &lt;a href="http://web.mit.edu/torralba/www/"&gt;Torralba&lt;/a&gt; build a scene recognition system for categorizing indoor images.  Instead of performing learning directly in descriptor space (such as the GIST over the entire image), the authors use a "distance-space" representation. An image is described by a vector of distances to a large number of scene prototypes.  A scene prototype consists of a root feature (the global GIST) as well as features belonging to a small number of regions associated with the prototype.  
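&lt;br /&gt;&lt;br /&gt;To make the distance-space idea concrete, here is a minimal Python sketch of my own (it is not the authors' code). It assumes each image and each prototype is summarized by a single global descriptor and that plain Euclidean distance is used; the names &lt;code&gt;distance_space_features&lt;/code&gt;, &lt;code&gt;descriptor&lt;/code&gt;, and &lt;code&gt;prototypes&lt;/code&gt; are hypothetical, and the per-region features of the actual model are left out.&lt;br /&gt;&lt;pre&gt;
```python
import math


def distance_space_features(descriptor, prototypes):
    """Describe one image by its distances to every prototype.

    descriptor: sequence of floats (a stand-in for a global descriptor).
    prototypes: list of sequences with the same length as descriptor.
    Returns a list with one Euclidean distance per prototype.
    """
    return [math.dist(descriptor, p) for p in prototypes]


# Toy example: two hypothetical prototypes in a 2-D descriptor space.
prototypes = [(0.0, 0.0), (3.0, 4.0)]
features = distance_space_features((0.0, 0.0), prototypes)
```
&lt;/pre&gt;A classifier would then be trained on vectors like &lt;code&gt;features&lt;/code&gt; rather than on the raw descriptors.&lt;br /&gt;&lt;br /&gt;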
One example of such a prototype might be an office scene with a monitor region in the center of the image and a keyboard region below it -- however the ROIs (which can be thought of as parts of the scene) are often more abstract and do not neatly correspond to a single object.&lt;br /&gt;&lt;div style="text-align: center;"&gt;&lt;a href="http://people.csail.mit.edu/torralba/publications/indoor.pdf"&gt;&lt;img style="width: 360px; height: 216px;" src="http://www.cs.cmu.edu/%7Etmalisie/images/scene_protos.png" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;/div&gt;&lt;br /&gt;The learning problem (which is solved once per category) is then to find the internal parameters of each prototype as well as the per-class prototype distance weights which are used for classification.  From a distance function learning point of view, it is rather interesting to see distances to many exemplars being used as opposed to the distance to a single focal exemplar.&lt;br /&gt;&lt;br /&gt;Although the authors report results on the image categorization task it is worthwhile to ask if scene prototypes could be used for object localization.  While it is easy to be the evil genius and devise an image that is unique enough such that it doesn't conform to any notion of a prototype, &lt;span style="font-style: italic;"&gt;I wouldn't be surprised if 80% of the images we encounter on the internet conform to a few hundred scene prototypes.&lt;/span&gt;  Of course the problem of learning such prototypes from data without prototype-labeling (which requires expert vision knowledge) is still open. 
Overall, I like the direction and ideas contained in this research paper and I'm looking forward to seeing how these ideas develop.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/15418143-2854664141789313683?l=quantombone.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://quantombone.blogspot.com/feeds/2854664141789313683/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://quantombone.blogspot.com/2009/10/scene-prototype-models-for-indoor-image.html#comment-form' title='4 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/15418143/posts/default/2854664141789313683'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/15418143/posts/default/2854664141789313683'/><link rel='alternate' type='text/html' href='http://quantombone.blogspot.com/2009/10/scene-prototype-models-for-indoor-image.html' title='Scene Prototype Models for Indoor Image Recognition'/><author><name>Tomasz Malisiewicz</name><uri>https://profiles.google.com/107912691630546731185</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='//lh6.googleusercontent.com/-ely56X4eTGQ/AAAAAAAAAAI/AAAAAAAAJ_g/VLmSVgSof5s/s512-c/photo.jpg'/></author><thr:total>4</thr:total></entry><entry><id>tag:blogger.com,1999:blog-15418143.post-8514294566909730967</id><published>2009-10-13T13:46:00.008-05:00</published><updated>2009-10-16T16:04:36.087-05:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='image segmentation'/><category scheme='http://www.blogger.com/atom/ns#' term='segmentation-driven recognition'/><title type='text'>What is segmentation-driven object recognition?</title><content type='html'>In this post, I want to discuss what the term "segmentation-driven object recognition" means to me.  
While segmentation-only and object recognition-only research papers are ubiquitous in vision conferences (such as CVPR, ICCV, and ECCV), a new research direction which uses segmentation for recognition has emerged.  Many researchers pushing in this direction are direct descendants of the great J. &lt;a href="http://www.cs.berkeley.edu/%7Emalik/"&gt;Malik&lt;/a&gt; such as &lt;a href="http://cseweb.ucsd.edu/%7Esjb/"&gt;Belongie&lt;/a&gt;, &lt;a href="http://www.cs.cmu.edu/%7Eefros/"&gt;Efros&lt;/a&gt;, &lt;a href="http://www.cs.sfu.ca/%7Emori/"&gt;Mori&lt;/a&gt;, and many others.  The best example of segmentation-driven recognition can be found in Rabinovich's &lt;a href="http://vision.ucsd.edu/sites/default/files/iccv07.pdf"&gt;Objects in Context&lt;/a&gt; paper.  The basic idea in this paper is to compute multiple stable segmentations of an input image using Ncuts and use a dense probabilistic graphical model over segments (combining local terms and segment-segment context) to recognize objects inside those regions.&lt;br /&gt;&lt;br /&gt;Segmentation-only research focuses on the actual image segmentation algorithms -- where the output of a segmentation algorithm is a partition of a 2D image into contiguous regions.  Algorithms such as mean-shift and normalized cuts, as well as 100s of probabilistic graphical models, can be used to produce such segmentations.   The Berkeley group (in an attempt to salvage "mid-level" vision) has been working diligently on boundary detection and image segmentation for over a decade.&lt;br /&gt;&lt;br /&gt;Recognition-only research generally focuses on new learning techniques or building systems to perform well on detection/classification benchmarks.  The sliding window approach coupled with bag-of-words models has dominated vision and is the unofficial method of choice.&lt;br /&gt;&lt;br /&gt;It is easy to relax the bag-of-words model, so let's focus on rectangles for a second.  
If we do not use segmentation, the world of objects will have to conform to sliding rectangles and image parsing will inevitably look like this:&lt;br /&gt;&lt;div style="text-align: center;"&gt;&lt;a href="http://www.di.ens.fr/%7Erussell/projects/recognitionBySceneAlignment/index.html"&gt;&lt;img style="width: 368px; height: 148px;" src="http://www.di.ens.fr/%7Erussell/projects/recognitionBySceneAlignment/banner.jpg" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;/div&gt;(Taken from &lt;a href="http://www.di.ens.fr/%7Erussell/"&gt;Bryan Russell&lt;/a&gt;'s &lt;a href="http://www.di.ens.fr/%7Erussell/projects/recognitionBySceneAlignment/index.html"&gt;Object Recognition by Scene Alignment&lt;/a&gt; paper).&lt;br /&gt;&lt;br /&gt;It has been argued that segmentation is required to move beyond the world of rectangular windows if we are to successfully break up images into their constituent objects.  While some objects can be neatly approximated by a rectangle in the 2D image plane, to explain away an arbitrary image, free-form regions must be used.  I have argued this point extensively in my BMVC 2007 paper, and the interesting result was that multiple segmentations must be used if we want to produce reasonable segments.  
Sadly, &lt;span style="font-weight: bold;"&gt;segmentation is generally not good enough&lt;/span&gt; by itself to produce object-corresponding regions.&lt;br /&gt;&lt;br /&gt;&lt;div style="text-align: center;"&gt;&lt;a href="http://www.cs.cmu.edu/%7Etmalisie/projects/bmvc07/"&gt;&lt;img style="width: 177px; height: 118px;" src="http://balaton.graphics.cs.cmu.edu/bmvc07/31_0459_001_004_895_002.png" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;/div&gt;(Here is an example of the Mean Shift algorithm where, to get a single cow segment, two adjacent regions had to be merged.)&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;The question of how to use segmentation algorithms for recognition is still open.&lt;/span&gt;  If segmentation could tessellate an image into "good" regions in one shot, then the goal of recognition would be to simply label these regions and life would become simple.  This is unfortunately far from reality.  While blobs of homogeneous appearance often correspond to things like sky, grass, and road, many objects do not pop out as a single segment.  I have proposed using a soup of such segments that come from different algorithms being run with different parameters (and even merging pairs and triplets of such segments!) but this produces a large number of regions, thus making the recognition task harder.&lt;br /&gt;&lt;br /&gt;Using a soup of segments, a small fraction of the regions might be of high quality; however, recognition now has to throw away 1000s of misleading segments.  &lt;a href="http://www.cs.cmu.edu/%7Eabhinavg/Home.html"&gt;Abhinav Gupta&lt;/a&gt;, a new addition to the CMU vision community, has pointed out that if we want to model context between segments (and for object-object relationships this means a quadratic dependence on the number of segments), using a large soup of segments is simply not tractable.  
Either the number of segments or the number of context interactions has to be reduced in this case, but non-quadratic object-object context models are an open question.&lt;br /&gt;&lt;br /&gt;In conclusion, the representation used by segmentation (that of free-form regions) is superior to sliding window approaches which utilize rectangular windows.  However, off-the-shelf segmentation algorithms are still lacking with respect to their ability to generate such regions.  Why should an algorithm that doesn't know anything about objects be able to segment out objects?  I suspect that in the upcoming years we will see a flurry of learning-based segmenters that provide a blend of recognition and bottom-up grouping, and I envision such algorithms to be used in a strictly non-feedforward way.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/15418143-8514294566909730967?l=quantombone.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://quantombone.blogspot.com/feeds/8514294566909730967/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://quantombone.blogspot.com/2009/10/what-is-segmentation-driven-object.html#comment-form' title='7 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/15418143/posts/default/8514294566909730967'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/15418143/posts/default/8514294566909730967'/><link rel='alternate' type='text/html' href='http://quantombone.blogspot.com/2009/10/what-is-segmentation-driven-object.html' title='What is segmentation-driven object recognition?'/><author><name>Tomasz Malisiewicz</name><uri>https://profiles.google.com/107912691630546731185</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' 
src='//lh6.googleusercontent.com/-ely56X4eTGQ/AAAAAAAAAAI/AAAAAAAAJ_g/VLmSVgSof5s/s512-c/photo.jpg'/></author><thr:total>7</thr:total></entry><entry><id>tag:blogger.com,1999:blog-15418143.post-400741918172906413</id><published>2009-09-19T03:48:00.004-05:00</published><updated>2009-09-19T21:56:09.339-05:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='yosemite'/><category scheme='http://www.blogger.com/atom/ns#' term='nips 2009'/><category scheme='http://www.blogger.com/atom/ns#' term='joint regulariztion'/><category scheme='http://www.blogger.com/atom/ns#' term='google internship'/><title type='text'>joint regularization slides</title><content type='html'>Trevor Darrell posted his &lt;a href="http://www.eecs.berkeley.edu/%7Etrevor/bavm.pdf"&gt;slides from BAVM about joint regularization across classifier learning&lt;/a&gt;.  I think this is a really cool and promising idea and I plan on applying it to my own research on local distance function learning when I get back to CMU in October.&lt;br /&gt;&lt;br /&gt;The idea is there should be significant overlap between what a cat classifier learns and what a dog classifier learns.  So why independently learn two classifiers?&lt;br /&gt;&lt;a href="http://nips.cc/Conferences/2009/Program/speaker-info.php?ID=6486"&gt;&lt;br /&gt;My paper on the Visual Memex got accepted to NIPS 2009&lt;/a&gt; so I will be there representing my work in December.  Be sure to read future blog posts about this work which strives to break free from using categories in Computer Vision.&lt;br /&gt;&lt;br /&gt;On another note, today was my last day interning at Google (a former Robograd was my mentor) and I will be driving back to Pittsburgh from Mountain View this Sunday.  Yosemite is the first stop!  I plan on doing some light hiking with my new Vibram Five Fingers!  
I've been using them for deadlifting and they've been great for both working out and just chilling/coding around the Googleplex.&lt;br /&gt;&lt;br /&gt;&lt;div style="text-align: center;"&gt;&lt;img style="width: 214px; height: 161px;" src="http://www.vibramfivefingers.com/products/images/products/145//large.jpg" /&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/15418143-400741918172906413?l=quantombone.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://quantombone.blogspot.com/feeds/400741918172906413/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://quantombone.blogspot.com/2009/09/joint-regularization-slides.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/15418143/posts/default/400741918172906413'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/15418143/posts/default/400741918172906413'/><link rel='alternate' type='text/html' href='http://quantombone.blogspot.com/2009/09/joint-regularization-slides.html' title='joint regularization slides'/><author><name>Tomasz Malisiewicz</name><uri>https://profiles.google.com/107912691630546731185</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='//lh6.googleusercontent.com/-ely56X4eTGQ/AAAAAAAAAAI/AAAAAAAAJ_g/VLmSVgSof5s/s512-c/photo.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-15418143.post-2363313132649610763</id><published>2009-08-18T00:22:00.004-05:00</published><updated>2009-08-18T02:41:52.535-05:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='multi-task'/><category scheme='http://www.blogger.com/atom/ns#' term='computer vision'/><category scheme='http://www.blogger.com/atom/ns#' 
term='distance function learning'/><category scheme='http://www.blogger.com/atom/ns#' term='joint regulariztion'/><category scheme='http://www.blogger.com/atom/ns#' term='classification'/><category scheme='http://www.blogger.com/atom/ns#' term='machine learning'/><title type='text'>exciting stuff at BAVM2009 #1: joint regularization</title><content type='html'>There were a couple of cool computer vision ideas that I was exposed to at BAVM2009.  First, &lt;a href="http://www.eecs.berkeley.edu/%7Etrevor/"&gt;Trevor Darrell&lt;/a&gt; mentioned some cool work by &lt;a href="http://people.csail.mit.edu/ariadna/"&gt;Ariadna Quattoni&lt;/a&gt; on &lt;a href="http://conflate.net/icml/paper/2009/475"&gt;L1/L_inf regularization&lt;/a&gt;.  The basic idea, which has also recently been used in other ICML 2009 works such as &lt;a href="http://www.ri.cmu.edu/publication_view.html?pub_id=6390&amp;amp;menu_code=0307"&gt;Han Liu and Mark Palatucci's Blockwise Coordinate Descent&lt;/a&gt;, is that you want to regularize across a bunch of problems.  This is sometimes referred to as multi-task learning.  Imagine solving two SVM optimization problems to find linear classifiers for detecting cars and bicycles in images.  It is reasonable to expect that in high dimensional spaces these two classifiers will have something in common.  To provide more intuition, it might be the case that your feature set provides many irrelevant variables and when learning these classifiers independently much work is spent on removing these dumb variables.  By doing some sort of joint regularization (or joint feature selection), you can share information across seemingly distinct classification problems.&lt;br /&gt;&lt;br /&gt;In fact, when I was talking about my own CVPR08 work &lt;a href="http://robotics.stanford.edu/%7Ekoller/"&gt;Daphne Koller&lt;/a&gt; suggested that this sort of regularization might work for my task of learning distance functions.  
However, I am currently exploiting the independence that I get from not doing any cross-problem regularization by solving the distance function learning problems independently.  While regularization might be desirable, it couples problems and it might be difficult to solve hundreds of thousands of such problems jointly.&lt;br /&gt;&lt;br /&gt;I will mention some other cool works in future posts.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/15418143-2363313132649610763?l=quantombone.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://quantombone.blogspot.com/feeds/2363313132649610763/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://quantombone.blogspot.com/2009/08/exciting-stuff-at-bavm2009-1-joint.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/15418143/posts/default/2363313132649610763'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/15418143/posts/default/2363313132649610763'/><link rel='alternate' type='text/html' href='http://quantombone.blogspot.com/2009/08/exciting-stuff-at-bavm2009-1-joint.html' title='exciting stuff at BAVM2009 #1: joint regularization'/><author><name>Tomasz Malisiewicz</name><uri>https://profiles.google.com/107912691630546731185</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='//lh6.googleusercontent.com/-ely56X4eTGQ/AAAAAAAAAAI/AAAAAAAAJ_g/VLmSVgSof5s/s512-c/photo.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-15418143.post-589920631225332174</id><published>2009-08-14T01:43:00.004-05:00</published><updated>2009-08-14T02:00:55.922-05:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='video'/><category 
scheme='http://www.blogger.com/atom/ns#' term='stanford'/><category scheme='http://www.blogger.com/atom/ns#' term='context'/><category scheme='http://www.blogger.com/atom/ns#' term='bay area'/><category scheme='http://www.blogger.com/atom/ns#' term='california'/><category scheme='http://www.blogger.com/atom/ns#' term='image understanding'/><category scheme='http://www.blogger.com/atom/ns#' term='berkely'/><category scheme='http://www.blogger.com/atom/ns#' term='workshop'/><category scheme='http://www.blogger.com/atom/ns#' term='networking'/><title type='text'>Bay Area Vision Meeting (BAVM 2009): Image and Video Understanding</title><content type='html'>Tomorrow (Friday) afternoon is &lt;a href="http://vision.stanford.edu/bavm2009/index.html"&gt;BAVM 2009&lt;/a&gt;, a Bay Area workshop on Image and Video Understanding, which will be held at Stanford this year.   It is being organized by Juan Carlos Niebles, one of Fei-Fei Li's students, and I will be there representing CMU.   I have a poster about some new research and getting feedback is always good, but I'm really excited about meeting some of the other graduate students who work on image understanding.   The Berkeley group has been pushing hard on segmentation-driven image understanding so seeing what they're up to should be interesting.  There will also be many fellow Googlers and researchers from companies in the Bay Area so it will also be a good place to network.&lt;br /&gt;&lt;br /&gt;&lt;div style="text-align: center;"&gt;&lt;a href="http://vision.stanford.edu/bavm2009/index.html"&gt;&lt;img src="http://vision.stanford.edu/bavm2009/img/bavmlogo_sm.png" alt="BAVM2009" /&gt; &lt;/a&gt;&lt;br /&gt;&lt;/div&gt;&lt;br /&gt;I look forward to hearing the invited speakers and seeing the bleeding-edge stuff during the poster sessions. 
I'll try to blog a little bit about some of the coolest stuff I encounter when I get back.&lt;br /&gt;&lt;span style="text-decoration: underline;"&gt;&lt;/span&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/15418143-589920631225332174?l=quantombone.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://quantombone.blogspot.com/feeds/589920631225332174/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://quantombone.blogspot.com/2009/08/bay-area-vision-meeting-bavm-2009-image.html#comment-form' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/15418143/posts/default/589920631225332174'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/15418143/posts/default/589920631225332174'/><link rel='alternate' type='text/html' href='http://quantombone.blogspot.com/2009/08/bay-area-vision-meeting-bavm-2009-image.html' title='Bay Area Vision Meeting (BAVM 2009): Image and Video Understanding'/><author><name>Tomasz Malisiewicz</name><uri>https://profiles.google.com/107912691630546731185</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='//lh6.googleusercontent.com/-ely56X4eTGQ/AAAAAAAAAAI/AAAAAAAAJ_g/VLmSVgSof5s/s512-c/photo.jpg'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-15418143.post-5761775929037532555</id><published>2009-08-07T23:54:00.007-05:00</published><updated>2009-08-08T00:17:13.346-05:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='exemplars'/><category scheme='http://www.blogger.com/atom/ns#' term='graphs'/><category scheme='http://www.blogger.com/atom/ns#' term='visualization'/><category scheme='http://www.blogger.com/atom/ns#' term='non-parametric'/><title type='text'>Graphviz for Object 
Recognition Research</title><content type='html'>Many of the techniques that I employ for object recognition utilize a non-parametric representation of visual concepts.   In many such non-parametric models, examples of visual concepts are stored in a database as opposed to "abstracted away" as is commonly done when fitting a parametric appearance model.   When designing such non-parametric models, I find it important to &lt;span style="font-weight: bold;"&gt;visualize&lt;/span&gt; the relationships between concepts.  The ability to visualize what you're working on creates an intimate link between you and your ideas and can often drive creativity.&lt;br /&gt;&lt;br /&gt;One way to visualize a database of exemplar objects, or a "soup of concepts," is as a graph.   This generally makes sense when it is meaningful to define an edge between two atoms.  While a vector-drawing utility (such as Illustrator) is great for manually putting together graphs for presentations or papers, automated visualization of large graphs is critical for debugging many graph-based algorithms.&lt;br /&gt;&lt;br /&gt;A really cool (and &lt;span style="font-weight: bold;"&gt;secret&lt;/span&gt;) figure which I generated using &lt;a href="http://www.graphviz.org/"&gt;Graphviz&lt;/a&gt; somewhat recently can be seen below.  I use MATLAB to write a simple .dot file and then call something like &lt;span style="font-style: italic;"&gt;neato&lt;/span&gt; to get the pdf output.  Click on the image to see the vectorized pdf automatically produced by Graphviz.&lt;br /&gt;&lt;br /&gt;&lt;div style="text-align: center;"&gt;&lt;a href="http://www.cs.cmu.edu/%7Etmalisie/images/image_memex_radial.pdf"&gt;&lt;img style="width: 375px; height: 388px;" src="http://www.cs.cmu.edu/%7Etmalisie/images/image_memex_radial.png" alt="Graphviz generated graph" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;span style="font-style: italic;"&gt;What does this graph show?  It's a secret... 
(details coming soon)&lt;/span&gt;&lt;br /&gt;&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/15418143-5761775929037532555?l=quantombone.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://quantombone.blogspot.com/feeds/5761775929037532555/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://quantombone.blogspot.com/2009/08/graphviz-for-object-recognition.html#comment-form' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/15418143/posts/default/5761775929037532555'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/15418143/posts/default/5761775929037532555'/><link rel='alternate' type='text/html' href='http://quantombone.blogspot.com/2009/08/graphviz-for-object-recognition.html' title='Graphviz for Object Recognition Research'/><author><name>Tomasz Malisiewicz</name><uri>https://profiles.google.com/107912691630546731185</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='//lh6.googleusercontent.com/-ely56X4eTGQ/AAAAAAAAAAI/AAAAAAAAJ_g/VLmSVgSof5s/s512-c/photo.jpg'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-15418143.post-9207315743843762050</id><published>2009-07-31T02:58:00.004-05:00</published><updated>2009-07-31T03:32:48.609-05:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='MATLAB'/><category scheme='http://www.blogger.com/atom/ns#' term='MATLAB code'/><category scheme='http://www.blogger.com/atom/ns#' term='code'/><category scheme='http://www.blogger.com/atom/ns#' term='newton&apos;s method'/><category scheme='http://www.blogger.com/atom/ns#' term='fractals'/><title type='text'>Simple Newton's Method Fractal code in MATLAB</title><content type='html'>Due to popular request I've 
decided to share some very simple Newton's Method Fractal code in MATLAB. It produces the following 800x800 image (in about 2.5 seconds on my 2.4 GHz MacBook Pro):&lt;br /&gt;&lt;br /&gt;&lt;span style="font-size:85%;"&gt;&lt;br /&gt;&gt;&gt; [niters,solutions] = matlab_fractal;&lt;br /&gt;&gt;&gt; imagesc(niters)&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://3.bp.blogspot.com/_6fdO7SbU8eM/SnKnCLdGxtI/AAAAAAAACjk/fwat4aphGc4/s1600-h/a.png"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer; width: 400px; height: 300px;" src="http://3.bp.blogspot.com/_6fdO7SbU8eM/SnKnCLdGxtI/AAAAAAAACjk/fwat4aphGc4/s400/a.png" alt="" id="BLOGGER_PHOTO_ID_5364533762028127954" border="0" /&gt;&lt;/a&gt;&lt;span style="font-size:100%;"&gt;function [niters,solutions] = matlab_fractal&lt;br /&gt;%Create Newton's Method Fractal Image&lt;br /&gt;%Tomasz Malisiewicz (tomasz@cmu.edu)&lt;br /&gt;%http://quantombone.blogspot.com/&lt;br /&gt;NITER = 40;&lt;br /&gt;threshold = .001;&lt;br /&gt;&lt;br /&gt;[xs,ys] = meshgrid(linspace(-1,1,800), linspace(-1,1,800));&lt;br /&gt;solutions = xs(:) + 1i*ys(:);&lt;br /&gt;select = 1:numel(xs);&lt;br /&gt;niters = NITER*ones(numel(xs), 1);&lt;br /&gt;&lt;br /&gt;for iteration = 1:NITER&lt;br /&gt;  oldi = solutions(select);&lt;br /&gt; &lt;br /&gt;  %in newton's method we have z_{i+1} = z_i - f(z_i) / f'(z_i)&lt;br /&gt;  solutions(select) = oldi - f(oldi) ./ fprime(oldi);&lt;br /&gt; &lt;br /&gt;  %check for convergence or NaN (division by zero)&lt;br /&gt;  differ = (oldi - solutions(select));&lt;br /&gt;  converged = abs(differ) &lt; threshold;&lt;br /&gt;  problematic = isnan(differ);&lt;br /&gt; &lt;br /&gt;  niters(select(converged)) = iteration;&lt;br /&gt;  niters(select(problematic)) = NITER+1;&lt;br /&gt;  select(converged | problematic) = [];&lt;br /&gt;end&lt;br /&gt;&lt;br /&gt;niters = reshape(niters,size(xs));&lt;br /&gt;solutions = 
reshape(solutions,size(xs));&lt;br /&gt;&lt;br /&gt;function res = f(x)&lt;br /&gt;res = (x.^2).*x - 1;&lt;br /&gt;&lt;br /&gt;function res = fprime(x)&lt;br /&gt;res = 3*x.^2;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;/span&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/15418143-9207315743843762050?l=quantombone.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://quantombone.blogspot.com/feeds/9207315743843762050/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://quantombone.blogspot.com/2009/07/simple-newtons-method-fractal-code-in.html#comment-form' title='5 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/15418143/posts/default/9207315743843762050'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/15418143/posts/default/9207315743843762050'/><link rel='alternate' type='text/html' href='http://quantombone.blogspot.com/2009/07/simple-newtons-method-fractal-code-in.html' title='Simple Newton&apos;s Method Fractal code in MATLAB'/><author><name>Tomasz Malisiewicz</name><uri>https://profiles.google.com/107912691630546731185</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='//lh6.googleusercontent.com/-ely56X4eTGQ/AAAAAAAAAAI/AAAAAAAAJ_g/VLmSVgSof5s/s512-c/photo.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://3.bp.blogspot.com/_6fdO7SbU8eM/SnKnCLdGxtI/AAAAAAAACjk/fwat4aphGc4/s72-c/a.png' height='72' width='72'/><thr:total>5</thr:total></entry><entry><id>tag:blogger.com,1999:blog-15418143.post-204150350326046869</id><published>2009-07-15T22:58:00.008-05:00</published><updated>2009-07-23T15:06:43.486-05:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='laser'/><category 
scheme='http://www.blogger.com/atom/ns#' term='spin image'/><category scheme='http://www.blogger.com/atom/ns#' term='3d recognition'/><category scheme='http://www.blogger.com/atom/ns#' term='localization'/><category scheme='http://www.blogger.com/atom/ns#' term='sift'/><category scheme='http://www.blogger.com/atom/ns#' term='matching'/><category scheme='http://www.blogger.com/atom/ns#' term='descriptors'/><title type='text'>Spin Images for object recognition in 3D Laser Data</title><content type='html'>&lt;div style="text-align: left;"&gt;Today's post is about 3D object recognition, that is, the localization and recognition of objects from 3D laser data (and not the perception/recovery of 3D from 2D images).&lt;/div&gt;&lt;br /&gt;My first exposure to object recognition was in the context of specific object recognition inside 3D laser scans.    In specific object recognition, you are looking for 'stapler X' or 'computer keyboard Y' and not just any stapler/computer keyboard.  If the computer keyboard was black then it will always be black, since we assume intrinsic appearance doesn't change in specific object recognition.  This is a different (and easier!) problem than category-based recognition, where colors and shapes can change due to intra-class variation.&lt;br /&gt;&lt;br /&gt;The problem of specific object 3D recognition I'll be discussing is as follows:&lt;br /&gt;&lt;br /&gt;&lt;span style="color: rgb(0, 102, 0);"&gt;Given M detailed 3D object models, localize all (if any) of these objects (in any spatial configuration) in a 3D laser scan of a scene potentially containing much more stuff than just the objects of interest (aka the clutter).&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;There was actually quite a lot of research in this style of 3D recognition in the 1990s, with the belief that 3D recognition would be much simpler than recognition from 2D images.  
The idea (Marr's idea, actually) was that object recognition in 2D images would be preceded by object-identity independent 3D surface extraction so that 2D recognition would resemble this version of 3D recognition after some initial geometric processing.&lt;br /&gt;&lt;br /&gt;However, it turns out that many of the ambiguities present in 2D imagery were also present in 3D laser data -- the problems of bottom-up perceptual grouping were as difficult in 3D as in 2D.  Just because you have 3D locations associated with parts of an object does not make it any easier to tell where the object begins and where it ends (namely the problem of segmentation).  It is this inability to segment out objects that resulted in the widespread usage of local descriptors such as &lt;a href="http://en.wikipedia.org/wiki/Scale-invariant_feature_transform"&gt;SIFT&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;Many of today's 2D object recognition systems rely on local descriptors which bypass the problem of segmentation, and it isn't surprising that the 3D recognition problem I described above was elegantly approached by A.E. Johnson and M. Hebert as early as 1997 via a local 3D descriptor known as a &lt;a href="http://www.ri.cmu.edu/publication_view.html?pub_id=3598"&gt;Spin Image&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://2.bp.blogspot.com/_6fdO7SbU8eM/SmjCs7-wGPI/AAAAAAAACjc/jox13tXiatQ/s1600-h/si.png"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer; width: 320px; height: 266px;" src="http://2.bp.blogspot.com/_6fdO7SbU8eM/SmjCs7-wGPI/AAAAAAAACjc/jox13tXiatQ/s320/si.png" alt="" id="BLOGGER_PHOTO_ID_5361749433655498994" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;The idea behind a Spin Image is actually very similar to that of a SIFT descriptor used in image-based object recognition.  
&lt;span style="font-weight: bold;"&gt;A spin image is a regional point descriptor used to characterize the shape properties of a 3D object with respect to a single oriented point.&lt;/span&gt;  It is called a "spin" image because the process of creating such a descriptor can be envisioned as spinning a sheet around the axis defined by an oriented point and collecting the contributions of nearby points.  Since a point's normal can be computed fairly robustly given its neighboring points, the spin image is highly robust to rigid transformations when defined with respect to this canonical frame.  Since it is 2D and not 3D, it does lose some discriminative power -- two different yet related surface chunks can have the same spin image.  The idea behind using this descriptor for recognition is that we can compute many of these descriptors all over the surface of our object models as well as the input 3D laser scan.  We then have to perform matching over these descriptors to create some sort of correspondences (potentially spatially verified).&lt;br /&gt;&lt;br /&gt;(For a fairly recent overview of spin images as well as other similar regional shape descriptors and their applications to 3D object recognition, check out Andrea Frome's ECCV 2004 paper, &lt;a href="http://www.ri.cmu.edu/publication_view.html?pub_id=4611"&gt;Recognizing Objects in Range Data Using Regional Point Descriptors&lt;/a&gt;.)&lt;br /&gt;&lt;br /&gt;Spin images aren't a thing of the past; in fact, here is a link to an RSS 2009 paper by &lt;a href="http://www.cs.washington.edu/homes/kevinlai/index.html"&gt;Kevin Lai&lt;/a&gt; and Dieter Fox which uses spin images (and my local distance function learning approach!):&lt;br /&gt;&lt;a href="http://www.cs.washington.edu/ai/Mobile_Robotics/projects/3d-object-recognition/"&gt;3D Laser Scan Classification Using Web Data and Domain Adaptation&lt;/a&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' 
src='https://blogger.googleusercontent.com/tracker/15418143-204150350326046869?l=quantombone.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://quantombone.blogspot.com/feeds/204150350326046869/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://quantombone.blogspot.com/2009/07/spin-images-for-object-recognition-in.html#comment-form' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/15418143/posts/default/204150350326046869'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/15418143/posts/default/204150350326046869'/><link rel='alternate' type='text/html' href='http://quantombone.blogspot.com/2009/07/spin-images-for-object-recognition-in.html' title='Spin Images for object recognition in 3D Laser Data'/><author><name>Tomasz Malisiewicz</name><uri>https://profiles.google.com/107912691630546731185</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='//lh6.googleusercontent.com/-ely56X4eTGQ/AAAAAAAAAAI/AAAAAAAAJ_g/VLmSVgSof5s/s512-c/photo.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://2.bp.blogspot.com/_6fdO7SbU8eM/SmjCs7-wGPI/AAAAAAAACjc/jox13tXiatQ/s72-c/si.png' height='72' width='72'/><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-15418143.post-3307675368405859118</id><published>2009-07-03T13:44:00.003-05:00</published><updated>2009-07-03T14:08:52.590-05:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='william james'/><category scheme='http://www.blogger.com/atom/ns#' term='paradigm'/><category scheme='http://www.blogger.com/atom/ns#' term='epistemology'/><category scheme='http://www.blogger.com/atom/ns#' term='philosophy'/><category scheme='http://www.blogger.com/atom/ns#' term='wittgenstein'/><category 
scheme='http://www.blogger.com/atom/ns#' term='computer science'/><category scheme='http://www.blogger.com/atom/ns#' term='realism'/><category scheme='http://www.blogger.com/atom/ns#' term='idealism'/><title type='text'>Linguistic Idealism</title><content type='html'>I have been an anti-realist since I was a freshman in college.  Due to my lack of philosophical vocabulary I might have even called myself an idealist back then.  However, looking back I think it would have been much better to use the word 'anti-realist.'  I was mainly opposed to the correspondence theory of truth, which presupposes an external, observer-independent reality to which our thoughts and notions are supposed to adhere.  It was in the context of the Philosophy of Science that I acquired my strong anti-realist views (developing my views while taking Quantum Mechanics, Epistemology, and Artificial Intelligence courses at the same time).  Pragmatism -- the offspring of William James -- was the single view which best summarized my philosophical views.  While pragmatism is a rejection of the absolutes, an abandonment of metaphysics, it does not get in the way of making progress in science.  It is merely a new perspective on science, a view that does not undermine the creativity of the creator of scientific theories, a re-rendering of the scientist as more of an artist and less of a machine.&lt;br /&gt;&lt;br /&gt;However, pragmatism is not the anything-goes postmodern philosophy that many believe it to be. It is as if there is something about the world which compels scientists to do science in a similar way and for ideas to converge.  I recently came across the concept of Linguistic Idealism, and being a recent reader of Wittgenstein this is a truly novel concept for me.  Linguistic Idealism is a sort of dependence on language, or the Gamest-of-all-games that we (humans) play.  
It is a sort of epiphany that all statements we make about the world are statements within the customs of language which results in a criticism of the validity of those statements with respect to correspondence to an external reality.  The criticism of statements' validity stems from the fact that they rely on language, a somewhat arbitrary set of customs and rules which we follow when we communicate.  Philosophers such as Sellars have gone as far as to say that all awareness is linguistically mediated.  If we step back, can we say anything at all about perception?&lt;br /&gt;&lt;br /&gt;I'm currently reading a book on Wittgenstein called "Wittgenstein's Copernican Revolution: The Question of Linguistic Idealism."&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/15418143-3307675368405859118?l=quantombone.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://quantombone.blogspot.com/feeds/3307675368405859118/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://quantombone.blogspot.com/2009/07/linguistic-idealism.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/15418143/posts/default/3307675368405859118'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/15418143/posts/default/3307675368405859118'/><link rel='alternate' type='text/html' href='http://quantombone.blogspot.com/2009/07/linguistic-idealism.html' title='Linguistic Idealism'/><author><name>Tomasz Malisiewicz</name><uri>https://profiles.google.com/107912691630546731185</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' 
src='//lh6.googleusercontent.com/-ely56X4eTGQ/AAAAAAAAAAI/AAAAAAAAJ_g/VLmSVgSof5s/s512-c/photo.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-15418143.post-2479742591521146636</id><published>2009-06-29T22:00:00.002-05:00</published><updated>2009-06-29T22:14:52.973-05:00</updated><title type='text'>It's all about the data</title><content type='html'>I'm at Google this summer (Google summer internship round #2) because it's where the data is.  If you want to recognize objects from images you need to learn what objects look like.  If you want to learn what an object looks like you need to have many examples of that object.  You then feed those instances into an algorithm to figure out its essence -- what it is about that object's appearance that makes it that object.  Google has the data and Google has the infrastructure to process that data, so I'm there for the summer.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/15418143-2479742591521146636?l=quantombone.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://quantombone.blogspot.com/feeds/2479742591521146636/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://quantombone.blogspot.com/2009/06/its-all-about-data.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/15418143/posts/default/2479742591521146636'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/15418143/posts/default/2479742591521146636'/><link rel='alternate' type='text/html' href='http://quantombone.blogspot.com/2009/06/its-all-about-data.html' title='It&apos;s all about the data'/><author><name>Tomasz Malisiewicz</name><uri>https://profiles.google.com/107912691630546731185</uri><email>noreply@blogger.com</email><gd:image 
rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='//lh6.googleusercontent.com/-ely56X4eTGQ/AAAAAAAAAAI/AAAAAAAAJ_g/VLmSVgSof5s/s512-c/photo.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-15418143.post-7052125063822978997</id><published>2009-06-19T14:52:00.011-05:00</published><updated>2009-06-19T16:19:49.471-05:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='prototypes'/><category scheme='http://www.blogger.com/atom/ns#' term='distance function learning'/><category scheme='http://www.blogger.com/atom/ns#' term='density estimation'/><category scheme='http://www.blogger.com/atom/ns#' term='typicality effects'/><category scheme='http://www.blogger.com/atom/ns#' term='svm'/><category scheme='http://www.blogger.com/atom/ns#' term='classification'/><category scheme='http://www.blogger.com/atom/ns#' term='categorization'/><title type='text'>A Shift of Focus: Relying on Prototypes versus Support Vectors</title><content type='html'>The goal of today's blog post is to outline an important difference between traditional categorization models in Psychology such as Prototype Models, and Support Vector Machine (SVM) based models.&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://1.bp.blogspot.com/_6fdO7SbU8eM/Sjv_xordmRI/AAAAAAAACdk/w0qz47IC-Q0/s1600-h/svm_vs_proto.001.png"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer; width: 400px; height: 300px;" src="http://1.bp.blogspot.com/_6fdO7SbU8eM/Sjv_xordmRI/AAAAAAAACdk/w0qz47IC-Q0/s400/svm_vs_proto.001.png" alt="" id="BLOGGER_PHOTO_ID_5349150210630981906" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;When solving an SVM optimization problem in the dual (given a kernel function), the answer is represented as a set of weights associated with each of the data-centered kernels.    
In the Figure above, an SVM is used to learn a decision boundary between the blue class (desks) and the red class (chairs).  The sparsity of such solutions means that only a small set of examples is used to define the class decision boundary.  All points on the wrong side of the decision boundary and barely yet correctly classified points (within the margin) have non-zero weights.  Many Machine Learning researchers get excited about the sparsity of such solutions because, in theory, we only need to remember a small number of kernels for test time.  However, &lt;span style="font-weight: bold;"&gt;the decision boundary is defined with respect to the problematic examples&lt;/span&gt; (misclassified and barely classified ones) and not the most typical examples.  The most typical (and easy to recognize) examples are not even necessary to define the SVM decision boundary.  Two data sets that have the same problematic examples, but significant differences in the "well-classified" examples, might result in exactly the same SVM decision boundary.&lt;br /&gt;&lt;br /&gt;My problem with such boundary-based approaches is that by focusing only on the boundary between classes, useful information is lost.  Consider what happens when two points are correctly classified (and fall well beyond the margin on their correct side): the distance-to-decision-boundary is not a good measure of class membership.   By &lt;span style="font-weight: bold;"&gt;failing to capture the "density" of data&lt;/span&gt;, the sparsity of such models can actually be a bad thing.  
As with discriminative methods, reasoning about the support vectors is useful for close-call classification decisions, but we lose fine-scale membership details (aka "density information") far from the decision surface.&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://3.bp.blogspot.com/_6fdO7SbU8eM/Sjv_3V28rBI/AAAAAAAACds/Yd5rYd-4HSQ/s1600-h/svm_vs_proto.002.png"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer; width: 400px; height: 300px;" src="http://3.bp.blogspot.com/_6fdO7SbU8eM/Sjv_3V28rBI/AAAAAAAACds/Yd5rYd-4HSQ/s400/svm_vs_proto.002.png" alt="" id="BLOGGER_PHOTO_ID_5349150308658097170" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;In a single-prototype model (pictured above), a single prototype is used per class and distances-to-prototypes implicitly define the decision surface.  The &lt;span style="font-weight: bold;"&gt;focus is on exactly the 'most confident' examples, which are the prototypes&lt;/span&gt;.  Prototypes are created during training -- if we fit a Gaussian distribution to each class, the mean becomes the prototype.  Notice that by focusing on Prototypes, we gain density information near the prototype at the cost of losing fine details near the decision boundary.  Single-Prototype models generally perform worse on forced-choice classification tasks when compared to their SVM-based discriminative counterparts; however, there are important regimes where &lt;span style="font-weight: bold;"&gt;too much emphasis on the decision boundary is a bad thing&lt;/span&gt;.&lt;br /&gt;&lt;br /&gt;In other words, Prototype Methods are best at what they were designed to do in categorization, namely capture Typicality Effects (see Rosch).  It would be interesting to come up with more applications where handling Typicality Effects and grading membership becomes more important than making close-call classification decisions.  
I suspect that in many real-world information retrieval applications (where high precision is required and low recall tolerated) going beyond boundary-based techniques is the right thing to do.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/15418143-7052125063822978997?l=quantombone.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://quantombone.blogspot.com/feeds/7052125063822978997/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://quantombone.blogspot.com/2009/06/shift-of-focus-relying-on-prototypes.html#comment-form' title='6 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/15418143/posts/default/7052125063822978997'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/15418143/posts/default/7052125063822978997'/><link rel='alternate' type='text/html' href='http://quantombone.blogspot.com/2009/06/shift-of-focus-relying-on-prototypes.html' title='A Shift of Focus: Relying on Prototypes versus Support Vectors'/><author><name>Tomasz Malisiewicz</name><uri>https://profiles.google.com/107912691630546731185</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='//lh6.googleusercontent.com/-ely56X4eTGQ/AAAAAAAAAAI/AAAAAAAAJ_g/VLmSVgSof5s/s512-c/photo.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://1.bp.blogspot.com/_6fdO7SbU8eM/Sjv_xordmRI/AAAAAAAACdk/w0qz47IC-Q0/s72-c/svm_vs_proto.001.png' height='72' width='72'/><thr:total>6</thr:total></entry><entry><id>tag:blogger.com,1999:blog-15418143.post-6772806559968171246</id><published>2009-06-16T15:19:00.014-05:00</published><updated>2009-06-17T13:39:45.565-05:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='william james'/><category 
scheme='http://www.blogger.com/atom/ns#' term='edelman'/><category scheme='http://www.blogger.com/atom/ns#' term='seeing'/><category scheme='http://www.blogger.com/atom/ns#' term='psychology'/><category scheme='http://www.blogger.com/atom/ns#' term='paradigm'/><category scheme='http://www.blogger.com/atom/ns#' term='seeing as'/><category scheme='http://www.blogger.com/atom/ns#' term='philosophy'/><category scheme='http://www.blogger.com/atom/ns#' term='object recognition'/><category scheme='http://www.blogger.com/atom/ns#' term='wittgenstein'/><category scheme='http://www.blogger.com/atom/ns#' term='review'/><category scheme='http://www.blogger.com/atom/ns#' term='concepts'/><category scheme='http://www.blogger.com/atom/ns#' term='theories'/><title type='text'>On Edelman's "On what it means to see"</title><content type='html'>I previously mentioned &lt;a href="http://kybele.psych.cornell.edu/%7Eedelman/"&gt;Shimon Edelman&lt;/a&gt; in my blog and why his ideas are important for the advancement of computer vision.  Today I want to post a review of a powerful and potentially influential 2009 piece written by Edelman.&lt;br /&gt;&lt;br /&gt;Below is a review of the June 16th, 2009 version of this paper:&lt;br /&gt;Shimon Edelman, &lt;a href="http://kybele.psych.cornell.edu/%7Eedelman/Archive/Edelman-what-it-means-to-see-revised.pdf"&gt;&lt;i&gt;On what it means to see, and what we can     do about it&lt;/i&gt;&lt;/a&gt;, in &lt;i&gt;Object Categorization: Computer   and Human Vision Perspectives&lt;/i&gt;, S. Dickinson, A. Leonardis, B.   Schiele, and M. J. Tarr, eds. (Cambridge University Press, 2009, in   press). 
Penultimate draft.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-size:78%;"&gt;I will refer to the article as OWMS (On What it Means to See).&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;The goal of Edelman's article is to demonstrate the limitations of conceptual vision (referred to as "seeing as"), criticize the modern computer vision paradigm as being overly conceptual, and show how providing a richer representation of a scene is required for advancing computer vision.&lt;br /&gt;&lt;br /&gt;Edelman proposes non-conceptual vision, where categorization isn't forced on an input -- "because the input may best be left altogether uninterpreted in the traditional sense." (OWMS)  I have to agree with the author: abstracting away the image into a conceptual map not only yields an impoverished view of the world, but it is also unclear whether such a limited representation is useful for other tasks relying on vision (something like the bottom of Figure 1.2 in OWMS or the Figure seen below and taken from my &lt;a href="http://www.cs.cmu.edu/%7Etmalisie/presentations/vasc_association_talk.pdf"&gt;Recognition by Association&lt;/a&gt; talk).&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;div style="text-align: center;"&gt;&lt;span style="color: rgb(255, 0, 0);"&gt;Building a Conceptual Map = Abstracting Away&lt;/span&gt;&lt;br /&gt;&lt;/div&gt;&lt;br /&gt;&lt;a href="http://3.bp.blogspot.com/_6fdO7SbU8eM/SjktQRHmQWI/AAAAAAAACdE/vRo5MU_R4Yo/s1600-h/left1.png"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: left; cursor: pointer; width: 240px; height: 180px;" src="http://3.bp.blogspot.com/_6fdO7SbU8eM/SjktQRHmQWI/AAAAAAAACdE/vRo5MU_R4Yo/s320/left1.png" alt="" id="BLOGGER_PHOTO_ID_5348355789975601506" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;&lt;a href="http://2.bp.blogspot.com/_6fdO7SbU8eM/SjktWWppcpI/AAAAAAAACdM/CtdRHg2Yedo/s1600-h/right1.png"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer; width: 240px; height: 180px;" 
src="http://2.bp.blogspot.com/_6fdO7SbU8eM/SjktWWppcpI/AAAAAAAACdM/CtdRHg2Yedo/s320/right1.png" alt="" id="BLOGGER_PHOTO_ID_5348355894539809426" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;Drawing on insights from the influential philosopher Wittgenstein, Edelman discusses the difference between "seeing" and "seeing as."  "Seeing as" is the easy-to-formalize map-pixels-to-objects attitude which modern computer vision students are spoon-fed from the first day of graduate school -- and precisely the attitude which Edelman attacks in this wonderful article.  To explain "seeing" Edelman uses some nice prose from Wittgenstein's &lt;span style="font-style: italic;"&gt;Philosophical Investigations&lt;/span&gt;; however, instead of repeating the passages Edelman selected, I will complement the discussion with a relevant passage by William James:&lt;br /&gt;&lt;br /&gt;&lt;span style="font-style: italic;"&gt;The germinal question concerning things brought for the first time before consciousness is not the theoretic "What is that?" but the practical "Who goes there?" or rather, as Horwicz has admirably put it, "What is to be done?" ... In all our discussions about the intelligence of lower animals the only test we use is that of their acting as if for a purpose. &lt;/span&gt;(William James in Principles of Psychology, page 941)&lt;br /&gt;&lt;br /&gt;"Seeing as" is a non-invertible process that abstracts away visual information to produce a lower dimensional conceptual map (see Figure above), whereas "seeing" provides a richer representation of the input scene.  It's not exactly clear how best to operationalize this "seeing" notion in a computer vision system, but the escapability-from-formalization might be one of the subtle points Edelman is trying to make about non-conceptual vision. 
Quoting Edelman, when "seeing" we are "letting the seething mass of categorization processes that in any purposive visual system vie for the privilege of interpreting the input be the representation of the scene, without allowing any one of them to gain the upper hand." (OWMS) Edelman goes on to criticize "seeing as" because vision systems have to be open-ended in the sense that we cannot specify ahead of time all the tasks that vision will be applied to.  According to Edelman, conceptual vision cannot capture the ineffability (or richness) of the human visual experience.  Linguistic concepts capture a mere subset of visual experience, and casting the goal of vision as providing a linguistic (or conceptual) interpretation is limiting.  The sparsity of conceptual understanding is one key limitation of the modern computer vision paradigm.  Edelman also criticizes the notion of a "ground-truth" segmentation in computer vision, arguing that a fragmentation of the scene into useful chunks is in the eye of the beholder.&lt;br /&gt;&lt;br /&gt;To summarize, Edelman points out that "The missing component is the capacity for having rich visual experiences... The visual world is always more complex than can be expressed in terms of a fixed set of concepts, most of which, moreover, only ever exist in the imagination of the beholder." (OWMS) As a pragmatist, I find that many of these words resonate deeply within my soul, and I'm particularly attracted to elements of Edelman's antirealism.&lt;br /&gt;&lt;br /&gt;I have to give two thumbs up to this article for pointing out the flaws in the current way computer vision scientists go about tackling vision problems (in other words, researchers too often &lt;span style="font-weight: bold;"&gt;blindly work inside the current computer vision paradigm&lt;/span&gt; and do not often enough question fundamental assumptions which can help new paradigms arise).   
I have already pointed out many similar concerns regarding Computer Vision on this blog, and it is reassuring to find others pointing to similar paradigmatic weaknesses.  Such insights need to somehow leave the Philosophy/Psychology literature and make a long-lasting impact in the CVPR/NIPS/ICCV/ECCV/ICML communities.  The problem is that too many researchers/hackers actually building vision systems and teaching Computer Vision courses have no clue who Wittgenstein is and that they can gain invaluable insights from Philosophy and Psychology alike.  Computer Vision is simply not lacking computational methods; it is lacking critical insights that cannot be found inside an Emacs buffer.   In order to advance the field, one needs to: read, write, philosophize, as well as mathematize, exercise, diversify, be a hacker, be a speaker, be one with the terminal, be one with prose, be a teacher, always a student, a master of all trades; or simply put, be a &lt;span style="font-weight: bold;"&gt;Computer Vision Jedi&lt;/span&gt;.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/15418143-6772806559968171246?l=quantombone.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://quantombone.blogspot.com/feeds/6772806559968171246/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://quantombone.blogspot.com/2009/06/on-edelmans-on-what-it-means-to-see.html#comment-form' title='3 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/15418143/posts/default/6772806559968171246'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/15418143/posts/default/6772806559968171246'/><link rel='alternate' type='text/html' href='http://quantombone.blogspot.com/2009/06/on-edelmans-on-what-it-means-to-see.html' title='On Edelman&apos;s &quot;On what it means to 
see&quot;'/><author><name>Tomasz Malisiewicz</name><uri>https://profiles.google.com/107912691630546731185</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='//lh6.googleusercontent.com/-ely56X4eTGQ/AAAAAAAAAAI/AAAAAAAAJ_g/VLmSVgSof5s/s512-c/photo.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://3.bp.blogspot.com/_6fdO7SbU8eM/SjktQRHmQWI/AAAAAAAACdE/vRo5MU_R4Yo/s72-c/left1.png' height='72' width='72'/><thr:total>3</thr:total></entry><entry><id>tag:blogger.com,1999:blog-15418143.post-4974943126802188177</id><published>2009-06-12T01:45:00.003-05:00</published><updated>2009-06-12T02:16:17.609-05:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='exemplars'/><category scheme='http://www.blogger.com/atom/ns#' term='nips'/><category scheme='http://www.blogger.com/atom/ns#' term='prototypes'/><category scheme='http://www.blogger.com/atom/ns#' term='phish'/><category scheme='http://www.blogger.com/atom/ns#' term='artificial intelligence'/><category scheme='http://www.blogger.com/atom/ns#' term='categorization'/><title type='text'>Exemplars, Prototypes, and towards a Theory of Concepts for AI</title><content type='html'>While initial musings (and some early theories) on Categorization come from Philosophy (think &lt;a href="http://en.wikipedia.org/wiki/Categories_%28Aristotle%29"&gt;Categories by Aristotle&lt;/a&gt;), most modern research on Categorization which adheres to the scientific method comes from Psychology (&lt;a href="http://en.wikipedia.org/wiki/Concept_learning"&gt;Concept Learning on Wikipedia&lt;/a&gt;).  Two popular models which originate from Psychology literature are Prototype Theory and Exemplar Theory.  Summarizing briefly, categories in Prototype Theory are abstractions which summarize a category while categories in Exemplar Theory are represented nonparametrically.  
While I'm personally a big proponent of Exemplar Theory (see my &lt;a href="http://www.cs.cmu.edu/%7Etmalisie/projects/cvpr08/"&gt;Recognition by Association CVPR2008 paper&lt;/a&gt;), I'm not going to discuss the details of my philosophical stance in this post.  Instead, I want to briefly point out the shortcomings of these two simplified views of concepts.&lt;br /&gt;&lt;br /&gt;Researchers focusing on Categorization are generally dealing with a very simplified (and overly academic) view of the world -- where the task is to categorize a single input stimulus.  The problem is that if we want a Theory of Concepts that will be the backbone of intelligent agents, we have to deal with &lt;span style="font-weight: bold;"&gt;relationships between concepts&lt;/span&gt; with as much fervor as the representations of concepts themselves.  While the debate concerning exemplars vs. prototypes has been restricted to these single-stimulus categorization experiments, it is not clear to me why we should prematurely adhere to one of these polarized views before we consider how we can make sense of inter-category relationships.  In other words, if an exemplar-based view of concepts looks good (so far) yet it is not as useful for modeling relationships as a prototype view, then we have to change our views.  Following &lt;a href="http://en.wikipedia.org/wiki/William_James"&gt;James' pragmatic method&lt;/a&gt;, we should evaluate category representations with respect to a larger system embodied in an intelligent agent (and its ability to cope with the world) and not the overly academic single-stimulus experiments dominating experimental psychology.&lt;br /&gt;&lt;br /&gt;On another note, I submitted my most recent research to &lt;a href="http://nips.cc/"&gt;NIPS&lt;/a&gt; last week (supersecret for now), and went to a few &lt;a href="http://www.phish.com/"&gt;Phish&lt;/a&gt; concerts.  I'm driving to California next week and I start at Google at the end of June.  
I also started reading a book on &lt;a href="http://www.cambridge.org/us/catalogue/catalogue.asp?isbn=0521813158"&gt;James and Wittgenstein&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://www.livephish.com/images/pix/shows/ph090606_01.jpg"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer; width: 412px; height: 189px;" src="http://www.livephish.com/images/pix/shows/ph090606_01.jpg" alt="" border="0" /&gt;&lt;/a&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/15418143-4974943126802188177?l=quantombone.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://quantombone.blogspot.com/feeds/4974943126802188177/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://quantombone.blogspot.com/2009/06/exemplars-prototypes-and-towards-theory.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/15418143/posts/default/4974943126802188177'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/15418143/posts/default/4974943126802188177'/><link rel='alternate' type='text/html' href='http://quantombone.blogspot.com/2009/06/exemplars-prototypes-and-towards-theory.html' title='Exemplars, Prototypes, and towards a Theory of Concepts for AI'/><author><name>Tomasz Malisiewicz</name><uri>https://profiles.google.com/107912691630546731185</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' 
src='//lh6.googleusercontent.com/-ely56X4eTGQ/AAAAAAAAAAI/AAAAAAAAJ_g/VLmSVgSof5s/s512-c/photo.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-15418143.post-4594139548352565179</id><published>2009-03-30T14:51:00.003-05:00</published><updated>2009-03-30T15:27:39.219-05:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='perception'/><category scheme='http://www.blogger.com/atom/ns#' term='predictions'/><category scheme='http://www.blogger.com/atom/ns#' term='psychology'/><category scheme='http://www.blogger.com/atom/ns#' term='associations'/><category scheme='http://www.blogger.com/atom/ns#' term='time travel'/><category scheme='http://www.blogger.com/atom/ns#' term='moshe bar'/><category scheme='http://www.blogger.com/atom/ns#' term='brain'/><category scheme='http://www.blogger.com/atom/ns#' term='analogies'/><title type='text'>Time Travel, Perception, and Mind-wandering</title><content type='html'>Today's post is dedicated to ideas promulgated by &lt;a href="http://barlab.mgh.harvard.edu/index.htm"&gt;Bar&lt;/a&gt;'s most recent article, "&lt;a href="http://barlab.mgh.harvard.edu/papers/RSBar.pdf"&gt;The proactive brain: memory for predictions.&lt;/a&gt;"&lt;br /&gt;&lt;br /&gt;Bar builds on the foundation of his former thesis, namely that the brain's 'default' mode of operation is to daydream, fantasize, and continuously revisit and reshape past memories and experiences.  While it makes sense that traversing the internal network of past experiences is useful when trying to understand a complex novel phenomenon, why exert so much work when just 'chilling out' a.k.a. being in the 'default' mode?  Bar's proposal is that this seemingly wasteful daydreaming is actually crucial for generating virtual experiences and synthesizing not-directly-experienced, yet critically useful memories of alternate scenarios.  
These 'alternate future memories' are how our brain recombines tidbits from actual experiences and helps us understand novel scenarios before they actually happen.  It makes sense that the brain has a method for 'densifying' the network of past experiences, but that this happens in the 'default' mode is a truly bold view held by Bar.&lt;br /&gt;&lt;br /&gt;In the domain of visual perception and scene understanding, the world has much regularity.  The predictions generated by our brain therefore often match the percept, and accurate predictions rid us of the need to exert brainpower on certain predictable aspects of the world.  For example, seeing a bunch of cars on a road along with a bunch of windows on a building pre-sensitizes us so much with respect to seeing a stop sign in an intimate spatial relationship with the other objects that we don't need to perceive much more than a speckle of red for a nanosecond to confirm its presence in the scene.&lt;br /&gt;&lt;br /&gt;Quoting Bar, "we are rarely in the 'now'" since when understanding the visual world we integrate information from multiple points in time.  We use the information perceptible to our senses (the now), memories of former experiences (the past), as well as all of the recombined and synthesized scenarios explored by our brains and encoded as virtual memories (plausible futures).  In each moment of our waking life, our brains provide us with a shortlist of primed (to be expected) objects, contexts, and their configurations related to our immediate perceptible future.  Who says we can't travel through time? 
-- it seems we are already living a few seconds ahead of direct perception (the immediate now).&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/15418143-4594139548352565179?l=quantombone.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://quantombone.blogspot.com/feeds/4594139548352565179/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://quantombone.blogspot.com/2009/03/time-travel-perception-and-mind.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/15418143/posts/default/4594139548352565179'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/15418143/posts/default/4594139548352565179'/><link rel='alternate' type='text/html' href='http://quantombone.blogspot.com/2009/03/time-travel-perception-and-mind.html' title='Time Travel, Perception, and Mind-wandering'/><author><name>Tomasz Malisiewicz</name><uri>https://profiles.google.com/107912691630546731185</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='//lh6.googleusercontent.com/-ely56X4eTGQ/AAAAAAAAAAI/AAAAAAAAJ_g/VLmSVgSof5s/s512-c/photo.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-15418143.post-8482174970124275518</id><published>2009-03-29T11:22:00.005-05:00</published><updated>2009-03-29T11:49:09.124-05:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='large dataset'/><category scheme='http://www.blogger.com/atom/ns#' term='computer vision'/><category scheme='http://www.blogger.com/atom/ns#' term='summer internship'/><category scheme='http://www.blogger.com/atom/ns#' term='paradigm'/><category scheme='http://www.blogger.com/atom/ns#' term='object recognition'/><category scheme='http://www.blogger.com/atom/ns#' 
term='data-driven'/><category scheme='http://www.blogger.com/atom/ns#' term='google'/><title type='text'>My 2nd Summer Internship in Google's Computer Vision Research Group</title><content type='html'>This summer I will be going for my 2nd summer internship at Google's Computer Vision Research Group in Mountain View, CA.  My first real internship ever was last summer at Google -- I loved it.&lt;br /&gt;&lt;br /&gt;There are many reasons for going back for the summer.  Being in the research group and getting to address the same types of vision/recognition related problems as during my PhD is very important for me.  It is not just a typical software engineering internship -- I get a better overall picture of how object recognition research can impact the world at a large scale, the Google scale, before I finish my PhD and become set in my ways.  Being in an environment where one can develop something super cool and weeks later millions of people see a difference in the way they interact with the internet (via Google's services of course) is also super exciting.  Finally, the computing infrastructure that Google has set up for its researchers/engineers is unrivaled when it comes to large scale machine learning.&lt;br /&gt;&lt;br /&gt;Many Google researchers (such as &lt;a href="http://earningmyturns.blogspot.com/"&gt;Fernando Pereira&lt;/a&gt;) are big advocates of the data-driven mentality, where using massive amounts of data coupled with simple algorithms has more promise than complex algorithms with small amounts of training data.  In earlier posts I already mentioned how &lt;a href="http://www.cs.cmu.edu/%7Eefros/"&gt;my advisor at CMU&lt;/a&gt; is a big advocate of this approach in Computer Vision.  
This &lt;a href="http://www.computer.org/portal/cms_docs_intelligent/intelligent/homepage/2009/x2exp.pdf"&gt;Unreasonable Effectiveness of Data&lt;/a&gt; is a powerful mentality yet difficult to embrace with the computational resources offered by one's computer science department.  But this data-driven paradigm is not only viable at Google -- it is the essence of Google.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/15418143-8482174970124275518?l=quantombone.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://quantombone.blogspot.com/feeds/8482174970124275518/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://quantombone.blogspot.com/2009/03/my-2nd-summer-internship-in-googles.html#comment-form' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/15418143/posts/default/8482174970124275518'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/15418143/posts/default/8482174970124275518'/><link rel='alternate' type='text/html' href='http://quantombone.blogspot.com/2009/03/my-2nd-summer-internship-in-googles.html' title='My 2nd Summer Internship in Google&apos;s Computer Vision Research Group'/><author><name>Tomasz Malisiewicz</name><uri>https://profiles.google.com/107912691630546731185</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='//lh6.googleusercontent.com/-ely56X4eTGQ/AAAAAAAAAAI/AAAAAAAAJ_g/VLmSVgSof5s/s512-c/photo.jpg'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-15418143.post-2456016741360002951</id><published>2009-03-26T01:58:00.003-05:00</published><updated>2009-03-26T02:08:34.665-05:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='perception'/><category 
scheme='http://www.blogger.com/atom/ns#' term='rosch'/><category scheme='http://www.blogger.com/atom/ns#' term='future directions'/><category scheme='http://www.blogger.com/atom/ns#' term='novel objects'/><category scheme='http://www.blogger.com/atom/ns#' term='image understanding'/><category scheme='http://www.blogger.com/atom/ns#' term='philosophy'/><category scheme='http://www.blogger.com/atom/ns#' term='pragmatism'/><category scheme='http://www.blogger.com/atom/ns#' term='categorization'/><title type='text'>Beyond Categorization: Getting Away From Object Categories in Computer Vision</title><content type='html'>Natural language evolved over thousands of years to become the powerful tool that it is today. When we say things using language to convey our experiences with the world, we can't help but refer to object categories. When we say things such as "this is a car" what we are actually saying is "this is an instance from the car category." Categories let us get away from referring to individual object instances -- in most cases knowing that something belongs to a particular category is more than enough knowledge to deal with it. This is a type of "understanding by compression" or understanding by abstracting away the unnecessary details. In the words of Rosch, "the task of category systems is to provide maximum information with the least cognitive effort." 
Rosch would probably agree that it only makes sense to talk about the &lt;span style="font-weight: bold;"&gt;utility&lt;/span&gt; of a category system (as a tool for getting a grip on reality) as opposed to the &lt;span style="font-weight: bold;"&gt;truth value&lt;/span&gt; of a category system with respect to how well it aligns with observer-independent reality. The degree of pragmatism expressed by Rosch is something that William James would have been proud of.&lt;br /&gt;&lt;br /&gt;From a very young age we are taught language and soon it takes over our inner world. We 'think' in language. Language provides us with a list of nouns -- a way of cutting up the world into categories. Different cultures have different languages that cut up the world differently and one might wonder how well the object categories contained in any given single language correspond to reality -- &lt;span style="font-style: italic;"&gt;if it even makes sense to talk about an observer-independent reality&lt;/span&gt;. Rosch would argue that human categorization is the result of "psychological principles of categorization" and is more related to how we interact with the world than how the world is. 
If the only substances we ingested for nutrients were types of grass, then categorizing all of the different strains of grass with respect to flavor, vitamin content, color, etc. would be beneficial for us (as a species). Rosch points out in her works that her ideas refer to categorization at the species-level and she calls it human categorization. She is not referring to a personal categorization; for example, the way a child might cluster concepts when he/she starts learning about the world.&lt;br /&gt;&lt;br /&gt;It is not at all clear to me whether we should be using the categories from natural language as the to-be-recognized entities in our image understanding systems. Many animals do not have a language with which they can compress percepts into neat little tokens -- yet they have no problem interacting with the world. 
Of course, if we want to build machines that understand the world around them in a way that they can communicate with us (humans), then language and its inherent categorization will play a crucial role.&lt;br /&gt;&lt;br /&gt;While we ultimately use language to convey our ideas to other humans, how early are the principles of categorization applied to perception? Is the grouping of percepts into categories even essential for perception? I doubt that anybody would argue that language and its inherent categorization is not useful for dealing with the world -- the only question is how it interacts with perception.&lt;br /&gt;&lt;br /&gt;Most computer vision researchers are stuck in the world of categorization and many systems rely on categorization at a very early stage. &lt;span style="font-weight: bold;"&gt;A problem with categorization is its inability to deal with novel categories&lt;/span&gt; -- something which humans must deal with at a very young age. We (humans) can often deal with arbitrary input and, using analogies, can still get a grip on the world around us (even when it is full of novel categories). One hy
