Does Vision Research Drive Deep Learning Startups?

What’s the relationship between academic research and entrepreneurship? Does writing technical papers help startups? Does government support of computer science research translate into commercial innovation? We’d all like to know!

I just spent a week at the Computer Vision and Pattern Recognition (CVPR) Conference in Honolulu. It is one of the three big academic conferences – along with Neural Information Processing Systems (NIPS) and the International Conference on Machine Learning (ICML) – that have embraced and proliferated cutting-edge research in deep learning. CVPR has grown rapidly, with attendance almost doubling annually over the past few years (to more than 5200 this year). More importantly, it has played a key role in encouraging all kinds of computer vision researchers to explore the potential of deep learning methods on almost every standing vision challenge.

So many teams submit worthwhile papers that the conference has adopted a format to expose as many people to as many papers as possible. Roughly 2500 papers were submitted this year, of which only about 30% were accepted. Even so, how can people absorb more than 750 papers? All the accepted papers get into poster sessions, and those sessions are unlike any poster session I’ve seen before. You often find two or three authors surrounded by a crowd of 25 or more, each explaining the gist of the paper over and over again for all comers. Some notable papers are given a chance to be shared in a rapid-fire FOUR-minute summary in one of many parallel tracks. An even smaller handful of papers gets much bigger exposure – a whole TWELVE-minute slot, including Q&A!

Remarkably, the format does work – the short talks serve as useful teasers to draw people to the posters. A session of short talks gives a useful cross section of the key application problems and algorithmic methods across the whole field of computer vision. That broad sampling of the papers confirms the near-total domination of computer vision by deep learning methods.

You can find interesting interactive statistics on the submitted CVPR17 papers here and on the accepted papers here. Ironically, only 12% of the papers are assigned to the “CNN and Deep Learning” subcategory, but in reality, almost every other category, from “Scene Recognition and Scene Understanding” to “Video Events and Activities”, is also dominated by deep learning and CNN methods. Of course, there is still a role for classical image processing and geometry-based vision algorithms, but these topics occupy something of a backwater at CVPR. I believe this radical shift in focus reflects two things. One is the real power of deep learning methods on the inherent complexity (and ambiguity) of many vision tasks. The other is that deep learning is a shiny new “hammer” for many research teams – they want to find every vision problem that remotely resembles a “nail”!

Nevertheless, CVPR is so rich with interesting ideas that the old “drinking from a fire hose” analogy fits well. It would be hard to do justice to the range of ideas and the overall buzz of activity there, but here are a few observations about the show and the content.

  • Academia meets industry – for recruiting. The conference space is built around a central room that serves as poster space, exhibition space and dining area. This means you have a couple of thousand elite computer vision and neural network experts wandering around at any moment. The exhibitors – Microsoft, NVIDIA, Facebook, Google, Baidu and even Apple, plus scores of smaller companies – ring the outside of the poster zone, all working to lure prospective recruits into their booths.
  • Creative applications abound. I was genuinely surprised at how broadly research teams are applying neural network methods. Yes, there is still work on the old workhorses of image classification and segmentation, but much of the work is happening on newer problems: video question answering, processing of 3D scene information, analyzing human action sequences, and using generative adversarial networks and weakly supervised systems to learn from limited data. The speed of progress is also impressive – some papers build on methods themselves first published at this same conference, CVPR 2017! On the other hand, not all the research shows breakthrough results. Even very novel algorithms sometimes show just minor improvements in accuracy over previous work, but improvements of a few percent add up quickly if rival groups release new results every few months.
  • Things are moving too fast for silos to develop. I got a real sense that the crowd was trying to absorb everything, not just focusing on narrow niches. In most big fields like computer vision, researchers tend to narrow their attention to a specific set of problems or algorithms, and those areas develop highly specialized terminology and metrics. At CVPR, I found something closer to a common language across most of the domain, with everyone eager to learn and adopt methods from others, regardless of specialty. That cross-fertilization certainly seems to be aiding the ferocious pace of research.
  • The set of research institutions is remarkably diverse. With so much enthusiasm for deep learning and its applications to vision, an enormous range of universities, research institutes and companies are getting into the research game. To quantify the diversity, I took a significant sample of papers (about 100 out of the 780 published) and looked at the affiliations of the authors. I counted each unique institution on a paper, but did not count the number of authors from the same institution. In my sample, the typical paper had authors from 1.6 institutions, with a maximum of four institutions. I suspect that mixed participation reflects both overt collaboration and movement of people, both between universities and from universities into companies. In addition, one hundred unique institutions are involved in the one hundred papers I sampled. This means we probably have many hundreds of institutions doing world-class research in computer vision and deep learning. While the usual suspects – the leading tech universities – are producing more papers, the overall participation is extraordinarily broad.
  • Research hotspots by country. The geographical distribution of the institutions says a lot about how different academic communities are responding to the deep learning phenomenon. As you’d expect, the US has the greatest number of author institutions, with Europe and China following closely. The rest of Asia lags pretty far behind in sheer numbers. Here’s the makeup of our sample, by country – the tiny slices in this chart are Belgium, Denmark, India, Israel, Sweden, Finland, Netherlands, Portugal, Australia and Austria. Europe altogether (EU with UK ;-), plus Switzerland) produced about 28% of the papers, significantly more than China, but still less than the US.
  • Research hotspots by university. We can also use this substantial sample to get a rough sense of which specific universities are putting the most effort into this field. Here are the top ten institutions worldwide, for this sample of the paper authors – with some key institutions in the UK (Oxford and Cambridge) and Germany (Karlsruhe and Max Planck Institute) rounding out the next ten:
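
The institution counting described in the diversity bullet above (each unique institution counted once per paper, regardless of how many authors share it) can be sketched roughly as follows. This is only an illustration of the counting rule: the paper list, author labels and institution names are hypothetical placeholders, not the actual CVPR 2017 sample.

```python
from collections import Counter

# Hypothetical sample: each paper is a list of (author, institution) pairs.
# These names are placeholders, not data from the CVPR 2017 sample.
papers = [
    [("A1", "Univ X"), ("A2", "Univ X"), ("A3", "Company Y")],
    [("B1", "Univ Z")],
    [("C1", "Univ X"), ("C2", "Institute W")],
]


def institutions_per_paper(paper):
    """Return the set of unique institutions on one paper."""
    return {institution for _author, institution in paper}


def summarize(papers):
    """Apply the per-paper unique-institution rule across the sample."""
    per_paper = [institutions_per_paper(p) for p in papers]
    counts = [len(s) for s in per_paper]
    # An institution appears at most once per paper, however many authors it has.
    appearances = Counter(inst for s in per_paper for inst in s)
    return {
        "mean_institutions": sum(counts) / len(counts),
        "max_institutions": max(counts),
        "unique_institutions": len(appearances),
        "appearances": appearances,
    }


summary = summarize(papers)
```

Ranking `summary["appearances"]` by count would give a top-institutions list of the kind described in the last bullet.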

    All this raises an interesting question – what’s the relationship between academic research and startups in deep learning technology? Do countries with strong research also have strong startup communities? This is a question that this authorship sample, plus the Cognite Ventures deep learning startup database, can try to answer, since we have a pretty good model of where the top computer vision startups are based, and we know where the research is based. In the chart below, I show the fraction of deep vision startups and the fraction of computer vision papers, for the top dozen countries (by number of startups):

    The data appears to tell some interesting stories:

    • US, UK and China really dominate in the combination of research and startups.
    • US participation in computer vision research and startups is fairly balanced, though research actually lags a bit behind startup activity.
    • The UK actually has meaningfully more startup activity than research, relative to other countries. This may reflect good leveraging of worldwide research and a good climate for vision startups.
    • China is fairly strong on research relative to vision startups, suggesting perhaps upside potential for the Chinese startup scene, as the entrepreneurial community leverages more of the local research expertise.
    • Though the numbers are not big enough to reach strong conclusions, it appears that research in Germany, France and Japan significantly exceeds any conversion of research into startups. I think this reflects the fairly strong research tradition, especially in Germany, combined with a less developed overall startup culture and climate.
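
    The comparison behind these observations is simple arithmetic: convert each country's startup count and paper count into shares of the respective totals, then compare the two shares. A minimal sketch, assuming per-country counts are available as dictionaries, is below. The country labels and numbers are made-up placeholders, not figures from the authorship sample or the Cognite Ventures database, and the function names are illustrative.

```python
def shares(counts):
    """Convert raw per-country counts into fractions of the total."""
    total = sum(counts.values())
    return {country: n / total for country, n in counts.items()}


def startup_to_research_ratio(startup_counts, paper_counts):
    """Ratio of startup share to paper share, per country.

    A ratio above 1 means a country holds a larger share of startups
    than of papers; below 1 means research outweighs startup activity.
    """
    s = shares(startup_counts)
    p = shares(paper_counts)
    return {c: s[c] / p[c] for c in s if p.get(c, 0) > 0}


# Placeholder counts for illustration only (not real data).
startup_counts = {"Country A": 50, "Country B": 30, "Country C": 20}
paper_counts = {"Country A": 40, "Country B": 20, "Country C": 40}

ratios = startup_to_research_ratio(startup_counts, paper_counts)
```

    With small samples, ratios like these are noisy, which is why the observations above are hedged.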


    Both the live experience of CVPR and the statistics underscore the real and vital link between research in computer vision and the emergence of a deep understanding of methods and applications for new enterprises. This simple analysis doesn’t directly reveal causality, but it is a pretty good bet that computer vision researchers often become deep learning founders and technologists, fueling the ongoing transformation of the vision space. More research means more and better startups.