Does Vision Research Drive Deep Learning Startups?

What’s the relationship between academic research and entrepreneurship? Does writing technical papers help startups? Does government support of computer science research translate into commercial innovation? We’d all like to know!

I just spent a week at the Computer Vision and Pattern Recognition (CVPR) conference in Honolulu. It is one of the three big academic conferences – along with Neural Information Processing Systems (NIPS) and the International Conference on Machine Learning (ICML) – that have embraced and proliferated cutting-edge research in deep learning. CVPR has grown rapidly, with attendance almost doubling annually over the past few years (to more than 5200 this year). More importantly, it has played a key role in encouraging computer vision researchers of all kinds to explore the potential of deep learning methods on almost every standing vision challenge.

So many teams submit worthwhile papers that CVPR has adopted a format designed to expose as many people to as many papers as possible. Roughly 2500 papers were submitted this year, of which only about 30% were accepted. Even so, how can attendees absorb more than 750 papers? All the accepted papers get poster sessions, and those sessions are unlike any poster session I’ve seen before. You often find two or three authors surrounded by a crowd of 25 or more, each explaining the gist of the work over and over again for all comers. Some notable papers get a chance at a rapid-fire FOUR minute summary in one of many parallel tracks. And an even smaller handful of papers gets much bigger exposure – a whole TWELVE minute slot, including Q&A!

Remarkably, the format does work – the short talks serve as useful teasers that draw people to the posters. A session of short talks gives a useful cross-section of the key application problems and algorithmic methods across the whole field of computer vision. That broad sampling of the papers confirms the near-total domination of computer vision by deep learning methods.

You can find interesting interactive statistics on the submitted CVPR17 papers here and on the accepted papers here. Ironically, only 12% of the papers are assigned to the “CNN and Deep Learning” subcategory, but in reality almost every other category, from “Scene Recognition and Scene Understanding” to “Video Events and Activities”, is also dominated by deep learning and CNN methods. Of course, there is still a role for classical image processing and geometry-based vision algorithms, but these topics occupy something of a backwater at CVPR. I believe this radical shift in focus reflects the power of deep learning methods on the inherent complexity (and ambiguity) of many vision tasks, but deep learning is also a shiny new “hammer” for many research teams – they want to find every vision problem that remotely resembles a “nail”!

Nevertheless, CVPR is so rich with interesting ideas that the old “drinking from a fire hose” analogy fits well. It would be hard to do justice to the range of ideas and the overall buzz of activity there, but here are a few observations about the show and the content.

  • Academia meets industry – for recruiting. The conference space is built around a central room that serves as poster space, exhibition space and dining area. This means you have a couple of thousand elite computer vision and neural network experts wandering around at any moment. The exhibitors – Microsoft, NVidia, Facebook, Google, Baidu and even Apple, plus scores of smaller companies – ring the outside of the poster zone, all working to lure prospective recruits into their booths.
  • Creative applications abound. I was genuinely surprised at how broadly research teams are applying neural network methods. Yes, there is still work on the old workhorses of image classification and segmentation, but much of the work is happening on newer problems: video question answering, processing of 3D scene information, analyzing human action sequences, and using generative adversarial networks and weakly supervised systems to learn from limited data. The speed of progress is also impressive – some papers build on methods themselves first published at this very CVPR! On the other hand, not all the research shows breakthrough results. Even very novel algorithms sometimes show only minor improvements in accuracy over previous work, but improvements of a few percent add up quickly when rival groups release new results every few months.
  • Things are moving too fast for silos to develop. I got a real sense that the crowd was trying to absorb everything, not just focusing on narrow niches. In most big fields like computer vision, researchers tend to narrow their attention to a specific set of problems or algorithms, and those areas develop highly specialized terminology and metrics. At CVPR, I found something closer to a common language across most of the domain, with everyone eager to learn and adopt methods from others, regardless of specialty. That cross-fertilization certainly seems to be aiding the ferocious pace of research.
  • The set of research institutions is remarkably diverse. With so much enthusiasm for deep learning and its applications to vision, an enormous range of universities, research institutes and companies are getting into the research game. To quantify the diversity, I took a significant sample of papers (about 100 out of the 780 published) and looked at the affiliations of the authors. I counted each unique institution on a paper, but did not count the number of authors from the same institution. In my sample, the typical paper had authors from 1.6 institutions, with a maximum of four institutions. I suspect that mixed participation reflects both overt collaboration and movement of people, both between universities and from universities into companies. In addition, one hundred unique institutions appear in the one hundred papers I sampled, which means there are probably many hundreds of institutions doing world-class research in computer vision and deep learning. While the usual suspects – the leading tech universities – are producing more papers, the overall participation is extraordinarily broad.
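The counting behind those numbers is simple enough to sketch. Here is a minimal Python version of the method, using a few invented papers in place of the real CVPR sample (the affiliations below are placeholders, not data from the study):

```python
from statistics import mean

# Hypothetical stand-in for the real sample: each paper is represented
# by the set of unique author affiliations listed on it.
papers = [
    {"Stanford", "Google"},
    {"Tsinghua"},
    {"Oxford", "DeepMind"},
    {"CMU"},
    {"ETH Zurich", "Microsoft"},
]

# Mean number of institutions per paper (1.6 in the CVPR sample, too).
mean_institutions = mean(len(p) for p in papers)

# Unique institutions across the whole sample.
unique_institutions = set().union(*papers)

print(f"mean institutions/paper: {mean_institutions:.1f}")   # 1.6
print(f"unique institutions: {len(unique_institutions)}")    # 8
```

The same two-line aggregation, run over the real affiliation lists, yields the per-paper average and the institution count discussed above.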
  • Research hot spots by country. The geographical distribution of the institutions says a lot about how different academic communities are responding to the deep learning phenomenon. As you’d expect, the US has the greatest number of author institutions, with Europe and China following closely. The rest of Asia lags pretty far behind in sheer numbers. Here’s the makeup of our sample, by country – the tiny slices in this chart are Belgium, Denmark, India, Israel, Sweden, Finland, Netherlands, Portugal, Australia and Austria. Europe altogether (the EU with the UK ;-), plus Switzerland) produced about 28% of the papers, significantly more than China, but still less than the US.
  • Research hotspots by university. We can also use this substantial sample to get a rough sense of which specific universities are putting the most effort into this. Here are the top ten institutions worldwide for this sample of paper authors – with some key institutions in the UK (Oxford and Cambridge) and Germany (Karlsruhe and the Max Planck Institute) rounding out the next ten:

    All this raises an interesting question – what’s the relationship between academic research and startups in deep learning technology? Do countries with strong research also have strong startup communities? This is a question that this authorship sample, plus the Cognite Ventures deep learning startup database, can try to answer, since we have a pretty good model of where the top computer vision startups are based, and we know where the research is based. In the chart below, I show the fraction of deep vision startups and the fraction of computer vision papers for the top dozen countries (by number of startups):

    The data appears to tell some interesting stories:

    • The US, UK and China really dominate the combination of research and startups.
    • US participation in computer vision research and startups is fairly balanced, though research actually lags a bit behind startup activity.
    • The UK actually has meaningfully more startup activity than research, relative to other countries. This may reflect good leveraging of worldwide research and a good climate for vision startups.
    • China is fairly strong in research relative to vision startups, suggesting upside potential for the Chinese startup scene as the entrepreneurial community leverages more of the local research expertise.
    • Though the numbers are not big enough to support strong conclusions, it appears that research in Germany, France and Japan significantly exceeds any conversion of research into startups. I think this reflects a fairly strong research tradition, especially in Germany, combined with a less developed overall startup culture and climate.


    Both the live experience of CVPR and the statistics underscore the real and vital link between research in computer vision and the emergence of new enterprises with a deep understanding of the methods and applications. This simple analysis doesn’t directly reveal causality, but it is a pretty good bet that computer vision researchers often become deep learning founders and technologists, fueling the ongoing transformation of the vision space. More research means more and better startups.

Deep Learning Startups in China: Report from the Leading Edge

Everyone knows the classic Chinese curse, “May you live in interesting times”. Well, it turns out the Chinese origin of this pithy phrase is apocryphal – the British statesman Austen Chamberlain probably popularized the phrase in the 1930s and attributed it to the Chinese to lend it gravity. We do, however, live in interesting times, in no field better epitomized than deep learning, and in no location more poignantly than China.

I have just returned from a ten-day tour of Beijing, Shenzhen, Shanghai and Hangzhou, meeting with deep learning startups and giving a series of talks on the worldwide deep learning market. In the most fundamental ways, neither the technology, nor the applications, nor the startup process is so different from what you find in Silicon Valley or Europe, but the trip was full of little eye-openers about deep learning in China, and about the entrepreneurial process there. It reinforced a few long-standing observations, but also shifted my point of view in important ways.

The most striking reflection on the China startup scene is how much it feels like the Silicon Valley environment, and how much it differs from other Asian markets. First, there seems to be quite a bit of money available – from classic VCs, from industrial sponsors and even from US semiconductor companies: Xilinx and NVidia have investments in high-profile startups in China, and I’m sure other major players do too. Second, deep learning is very active, with much of the same “gold-rush” feeling I observe in the US. This contrasts with the Taiwan, Japan and Korea markets, where the deep learning startup theme is less developed, either because startups are less central to the business environment (Japan) or because the deep learning enthusiasm has not grown so intense (Taiwan, Korea). Ample funding and enthusiasm also mean rapid growth of significant engineering teams – the smallest company I saw had 25 people, the biggest about 400. California teams have evolved office layouts that look like Chinese ones – open offices without cubicle walls – and Chinese teams have adopted the California tradition of endless free food. We are not there yet, but we are closer than ever to a common startup culture spanning the Pacific.

Observation: The Chinese academic and industrial technical community is closely tuned into the explosion of activity in deep learning, and many companies are looking to leverage it in products. Baidu’s heavy investment in deep learning – with a research team of more than 1000 – is already well known. The number of papers on deep learning from Chinese universities and the interest level among startups are also very high. Overall, Chinese industry seems to be gradually shifting from a “cost-down” mindset – focused on taking established products and optimizing the whole bill-of-materials for lower cost – towards greater attention to functional innovation. A strong orientation towards hardware and system products remains: I have found many fewer pure-play cloud software startups in China than in the US or UK. Nevertheless, the original software content in these systems is growing rapidly. Almost every company I visited had polished and impressive demos of vehicle tracking, face recognition, crowd surveillance or demographic assessment.

Observation: Chinese startups are unafraid of building new deep-learning silicon platforms. Quite a few of the software companies I visited are building or planning chips that capture their insights into neural network inference computation. Perhaps one in four Chinese startups is working towards silicon, while only one in 15 worldwide is doing custom silicon. One executive explained that Chinese investors really like to see the potential differentiation that comes from chips, and startups believe that committing to silicon actually helps secure capital. This is in stark contrast to current Silicon Valley wisdom – that investors flee at the mention of chip investment. This striking dichotomy reflects a combination of perceived lower chip development costs in China (because of lower engineering salaries, avoidance of bleeding-edge semiconductor technologies below 20nm, and smaller, niche-oriented designs) and the widespread belief that tying software to silicon protects software value. Ironically, silicon development is now strikingly rare among Silicon Valley startups, driven partly by the high costs and long timelines of chip products, and partly by the comparative attractiveness of cloud software startups for investment, where the upfront costs are so much lower and countless new market niches seem to appear weekly.

Observation: The China startups are almost entirely focused on real-time vision and audio applications, with only a supporting role for cloud-based deployment. Cars, human-machine interface and surveillance are the standout segments. DJI, the world’s biggest consumer drone maker, uses highly sophisticated deep learning for target tracking and gesture control. Most of the companies doing vision have capability and demos in identifying and tracking vehicles, pedestrians and bicycles, which applies to both automotive driver assistance/self-driving vehicles and surveillance. Top startups in the vision space include Cambricon, DeepGlint, Deephi, Emotibot, Megvii, Horizon Robotics, Intellifusion, Minieye, Momenta, MorphX, Rokid, SenseTime, and Zero Zero Robotics. Audio systems are also a big area, with particular emphasis on automated speech recognition, including increasingly embedded real-time speech processing. Top startups here include AISpeech, Mobvoi, and Unisound.

Observation: China has a disproportionately large surveillance market, and correspondingly heavy interest in deep learning-based applications for face identification, pedestrian tracking, and crowd monitoring. China is already the world’s largest video surveillance market and has been among the fastest growing. Chinese suppliers describe usage scenarios of identifying and tracking criminals, but China does not have a serious conventional crime problem (both violent crime and property crime are well below US levels, for example). To some extent, “monitoring crime” is a code word for monitoring political disturbances, a deep obsession of the Chinese Communist Party. This is not the only driver for vision-based applications – face ID for access control and transaction verification are also important.

Over the course of ten days, I saw some particularly interesting startups:

  • Horizon Robotics [Beijing]: Horizon is a deep-learning powerhouse, led by Yu Kai. With 220 people, they are innovating across a broad front on vision systems, including smart home, gaming, voice-vision integration, and self-driving cars. They have also adopted a tight hardware-software integration strategy for more complete and efficient solutions.
  • Intellifusion [Shenzhen]: Intellifusion is a fairly complete video system supplier with close ties to the government organizations deploying public surveillance. They currently deploy their own servers with GPUs and FPGAs, but are moving increasing functionality into edge devices, like the cameras themselves.
  • NextVPU [Shanghai]: NextVPU is the youngest (about 12 months old) and smallest (24 people) of the startups I saw. They are pursuing AR, robotics and ADAS systems, but their first product, an AR headset for the visually impaired, is compelling in both a technical and a social sense. The headset does full scene segmentation for pedestrian navigation and recognizes dozens of key objects – signs, obstacles, and common home and urban elements – to help its users.
  • Deephi [Beijing]: Deephi is one of the most advanced and impressive of all the deep learning startups, with close ties to both the leading technical university in China, Tsinghua, and leading US research universities. They have a particularly sophisticated understanding of what it takes to map leading-edge neural networks into the limited power, compute and memory resources of embedded devices, using world-class compression techniques. They are pursuing both surveillance (vision) and data-center (vision and speech) applications with a range of innovative programmable and optimized inference architectures.
  • Sensetime [Beijing]: Sensetime is one of the biggest and most visible software startups using deep learning for vision. They have impressive demos spanning surveillance, face recognition, demographic and mood analysis, and street-view identification and tracking of vehicles. They are sufficiently competent and confident to have developed their own training framework, Parrots, in lieu of Caffe, TensorFlow and the other standard platforms.
  • Megvii [Beijing]: Megvii is a prominent Chinese “unicorn” – a startup valued at US$1B+ – and is often known by the name of its leading application, Face++. Face++ is a sophisticated and widely used face ID environment, leveraging the Chinese government’s face database. This official and definitive database enables customer verification for transaction systems like Alibaba’s AliPay and DiDi’s leading ride-hailing system. They show an impressive collection of demos for recognition, augmented reality and “augmented photography”. Like many other Chinese companies, Megvii is moving functionality from the cloud to embedded devices to improve latency, availability and security.
  • Bitmain [Beijing]: Bitmain is hardly a startup and is wildly successful in non-deep-learning applications, specifically cryptocurrency mining hardware. They have become the biggest supplier of ASICs for computational hashing, especially for Bitcoin, but are now spreading into rival currencies like Litecoin. Founded in 2013, they hit US$100M in revenue in 2014 and are on track to do US$500M this year. This revenue stream and profitability is allowing them to explore new areas, and deep learning platforms seem to be a candidate for further expansion.

Here’s a more complete list of top Chinese deep-learning startups:

Name Description
4Paradigm Scaled deep learning cloud platform
AISpeech Real-time and cloud-based automated speech recognition for car, home and robot UI
Cambricon Device and cloud processors for AI
DeepGlint 3D computer vision and deep learning for human & vehicle detection, tracking and recognition
Deephi Compressed CNN networks and processors
Emotibot Multi-modal natural interaction interface between human and machine
Face++ Face recognition
Horizon Robotics Smart Home, automotive and Public safety
ICarbonX Individualized health analysis and prediction of health index by machine analysis
Intellifusion Cloud-based deep learning for public safety and industrial monitoring
Minieye ADAS vision cameras and software
Mobvoi Smart watch with voice search using cloud
Momenta AI platform for level 5 autonomous driving
MorpX Commercializes computer vision and deep learning technologies for low-cost/-power platforms
Rokid Home personal assistant – ASR + face/gesture
SeetaTech Open source development platform to enable enterprise vision and machine learning
SenseTime Computer vision
TUPU Image recognition technology and services
tuSimple Software for self-driving cars: detection and tracking, flow, SLAM, segmentation, face analysis
Unisound AI-based speech and text
YITU Technology Computer vision for surveillance, transportation and medical imaging
Zero Zero Robotics Smart following drone camera

Of course, no one can claim to understand everything happening in the vibrant Chinese startup community, least of all a non-native speaker. Nevertheless, everyone I spoke with in China validated this list of the top deep learning startups. Some were a bit surprised at the depth of the list, especially in identifying startups not yet on their radar. Both technically and in exploring market trends, the China startup world is at the cutting edge in many areas. It bears close watching for worldwide impact.

The Cognite 300 Poster

Today I’m rolling out the Cognite 300 poster, a handy guide to the more focused and interesting startup companies in cognitive computing and deep learning.  I wrote about the ideas behind the creation of the list in an earlier blog posting:

Who are the most important start-ups in cognitive computing?

I have updated the on-line list every couple of months since the start of 2017, and will continue to do so, because the list keeps changing. Some companies close, some are acquired, some shift their focus. Most importantly, I continue to discover startups that belong on the list, so I will keep adding those, using approximately the same criteria of focus used for the first batch.

I should underscore that many potentially interesting companies haven’t gone on the list:

  • because it appears that AI is only a modest piece of their value proposition, or
  • because there is too little information on their websites to judge, or
  • because they are doing interesting work on capturing and curating big data sets (that may ultimately require deep learning methods) but don’t emphasize learning work themselves, or
  • because I just failed to understand the significance of the company’s offerings.

A few weeks ago, a venture capital colleague suggested that I do a poster, to make the list more accessible and to communicate a bit of the big picture of segments and focus areas for these companies.  I classified the companies (alas, by hand, not with a neural network 😉) into 16 groups:

  1. Sec – Security, Surveillance, Monitoring
  2. Cars – Autonomous Vehicles
  3. HMI – Human-Machine Interface
  4. Robot – Drones and Robots
  5. Chip – Silicon Platforms
  6. Plat – Deep Learning Cloud Compute/Data Platform and Services
  7. VaaS – Vision as a Service
  8. ALaaS – Audio and Language as a Service
  9. Mark – Marketing and Advertising
  10. CRM – Customer Relationship Management and Human Resources
  11. Manf – Operations, Logistics and Manufacturing
  12. Sat – Aerial Imaging and Mapping
  13. Med – Medicine and Health Care
  14. Media – Social Media, Entertainment and Lifestyle
  15. Fin – Finance, Insurance and Commerce
  16. IT – IT Operations and Security

I’ve also included an overlay of two broader categories, Vision and Embedded.  Many of the 16 categories fall cleanly inside or outside embedded and vision, but some categories include all the combinations.  A few companies span two of the 16 groups, so they are shown in both.
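For what it’s worth, a rough mechanical approximation of such hand labeling is simple keyword matching. The sketch below is only a toy: the category keywords are invented for illustration and are not the actual criteria used for the Cognite 300.

```python
# Toy keyword-based classifier echoing a few of the 16 groups above.
# Keyword lists are invented placeholders, not the real classification rules.
CATEGORY_KEYWORDS = {
    "Cars":  ["autonomous", "driving", "adas"],
    "Chip":  ["processor", "silicon", "asic"],
    "Med":   ["medical", "health", "diagnosis"],
    "Robot": ["drone", "robot"],
}

def classify(description: str) -> list:
    """Return every category whose keywords appear in the description."""
    text = description.lower()
    return [cat for cat, words in CATEGORY_KEYWORDS.items()
            if any(word in text for word in words)]

print(classify("ADAS vision cameras and software"))   # ['Cars']
print(classify("Smart following drone camera"))       # ['Robot']
```

Letting a company match several categories mirrors the poster’s convention of showing a few companies in two groups at once.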

You may download and use the poster as you wish, so long as you reproduce it in its entirety, do not modify it and maintain the attribution to Cognite Ventures.

The Cognite 300 Startup Poster

Finally, I have also updated the list itself, including details on the classification of each startup by the 16 categories and the 2 broader classes, and identifying the primary country of operations.  For US companies I’ve also included the primary state.

The Cognitive Computing Startup List


How to Start an Embedded Vision Company – Part 3

This is the third installment of my thoughts on starting an embedded vision company. In part 1, I focused on the opportunity, especially how the explosion in the number of image sensors is overwhelming human capacity to directly view all the potential image streams, creating a pressing need for an orders-of-magnitude increase in the volume and intelligence of local vision processing. In part 2, I shifted to a discussion of some core startup principles and models for teams in embedded vision or beyond. In this final section, I focus on how the combination of the big vision opportunity and the inherent agility (and weakness!) of startups can guide teams to winning technologies and business approaches.

Consider areas of high leverage on embedded vision problems:

  1. The Full Flow: Every step of the pixel flow – from the sensor interface, through the ISP, to the mix of video analytics (classical and neural network-based) – has impact on vision system performance. Together with choices in training data, user interface, application targeting and embedded-vs.-cloud partitioning, these steps give an enormous range of options on vision quality, latency, cost, power, and functionality. That diversity of choices creates many potential niches where a startup can take root and grow, without having to attack huge obvious markets using the most mainstream techniques initially.
  2. Deep Neural Networks: At this point it is pretty obvious that neural network methods are transforming computer vision. However, applying neural networks in vision is much more than just doing ImageNet classification. It pays to invest in thoroughly understanding the variety of both discrimination methods (object identification, localization, tracking) and generation methods. Neural-network-based image synthesis may start to play a significant role in augmenting or even displacing 3D graphics rendering in some scenarios. Moreover, generative adversarial network methods allow a partially trained discrimination network and a generation network to iterate through refinements that improve both networks automatically.
  3. Data Sets: Finding, creating and repurposing data for better training is half the battle in deep learning. Having access to unique data sets can be the key differentiator for a startup, and brainstorming new problems that can be solved with available large data sets is a useful discipline in startup strategy development. Ways to maximize data leverage may include the following:
    1. Create an engagement model with customers, so that their data can contribute to the training data set for future use. Continuous data bootstrapping, perhaps spurred by free access to a cloud service, may allow creation of large, unique training data collections.
    2. Build photo-realistic simulations of the usage scenes and sequences in your target world. The extracted image sequences are inherently labeled by the underlying scene structures and can generate large training sets to augment real-world captured training data. Moreover, simulation can systematically cover rare but important combinations of object motion, lighting, and camera impairments for added system robustness. For example, the automotive technology startup AImotive builds both sophisticated fused cognition systems from image, LiDAR and radar streams, and sophisticated driving simulators with accurate 3D worlds to train and test neural network-based systems.
    3. Some embedded vision systems can be designed as subsets of bigger, more expensive server-based vision systems, especially when neural networks of heroic scale are developed by cloud-based researchers. If the reference network is sufficiently better than the goals for the embedded system, the behavior of that big model can be used as “ground truth” for the embedded system. This makes generation of large training sets for the embedded version much easier.
    4. Data augmentation is a powerful method. If you have only a moderate amount of training data, you may be able to apply a series of transformations to the data that allow prior labeling to be maintained. (We know a dog is still a dog, no matter how we scale, rotate or flip its image.) Be careful though – neural networks can be so discriminating that a network trained on artificial or augmented data may respond only to such examples, however similar those examples may be to real-world data in human perception.
  4. New Device Types: The low cost and high intelligence of vision subsystems is allowing imaging-based systems in lots of new form-factors. These new device types may create substantially new vision problems. Plausible new devices include augmented reality headsets and glasses, ultra-small always-alert “visual dust” motes, new kinds of vehicles from semi trucks to internal-body “drones”, and cameras embedded in clothing, toys, disposable medical supplies, packaging materials, and other unconventional settings. It may not be necessary for these new devices to deliver fine images or to achieve substantial autonomy. Instead, the imagers may just be the easiest way to get a little more information from the environment or insight about the user.
  5. New Silicon Platforms: Progress in new hardware platforms for vision processing, especially for deep neural network methods, is nothing less than breathtaking. We’re seeing improvements in efficiency of at least 3x per year, which translates into both huge gains in absolute performance at the high end and percolation of significant neural network capacity into low-cost, low-power consumer-class systems. Of course, 200% per year efficiency growth cannot continue for very long, but it does let design teams think big about what’s possible in a given form-factor and budget. This rapid advance in computing capacity appears to be happening in many different product categories – in server-grade GPUs, embedded GPUs, mobile phone apps processors, and deeply embedded platforms for IoT. As just one typical example, the widely used Tensilica Vision DSP IP cores have seen the multiply rate – a reasonable proxy for neural network compute throughput – increase by 16x (64 → 1024 8x8b multiplies per cycle per core) in just over 18 months. Almost every established chip company doing system-on-chip platforms is rolling out significant enhancements or new architectures to support deep learning. In addition, almost 20 new chip startups are taking the plunge with new platforms, typically aiming either at huge throughput to rival high-end GPUs or at ultra-high efficiency to fit into IoT roles. This wealth of new platforms will make choosing a target platform more complex, but will also dramatically increase the potential speed and capability of new embedded vision platforms.
  6. More Than Just Vision: When planning an embedded vision product, it’s important to remember that embedded vision is a technology, not an end application. Some applications will be completely dominated by their vision component, but for many others the vision channel will be combined with many other information channels. These may come from other sensors, especially audio and motion sensors, from user controls, or from background data, especially cloud data. In addition, each vision node may be just one piece of a distributed application, so node-to-node and node-to-cloud-to-node coordination may be critical, especially in developing a wide assessment of a key issue or territory. Once all the channels of data are aggregated and analyzed, for example through convolutional neural networks, what then? Much of the value of vision is in taking action, whether the action is real-time navigation, event alerts, emergency braking, visual or audio response to users, or updating of central event databases. In thinking about the product, map out the whole flow to capture a more complete view of user needs, dependencies on other services, computation and communication latencies and throughput bottlenecks, and competitive differentiators for the total experience.
  7. Point Solution to Platform: In the spirit of “crossing the chasm”, it is often necessary to define the early product as a solution for a narrow constituency’s particular needs. Tight targeting of a point solution may let you stand out in a noisy market of much bigger players, and reduce the integration risks faced by your potential early adopters. However, that also limits the scope of the system to just what you directly engineer. Opening up the interfaces and the business model to let both customers and third parties add functionality has two big benefits. First, it means that the applicability of your core technology can expand to markets and customers that you couldn’t necessarily serve with your finite resources to adapt and extend the product. Second, the more a customer invests their own engineering resources into writing code or developing peripheral hardware around your core product, the more stake they have in your success. Both practical and psychological factors make your product sticky. It turns a point product into a platform. Sometimes, that opening of the technology can leverage an open-source model, so long as some non-open, revenue-generating dimension remains. Proliferation is good, but is not the same as bookings. Some startups start with a platform approach, but that has challenges. It may be difficult to get customers to invest to build your interfaces into their system if you’re too small and unproven, and it may be difficult to differentiate against big players able to simply declare a “de facto industry standard”.
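The efficiency arithmetic in point 5 above is easy to check. A few lines of Python, using only the figures quoted in the text (a 16x multiply-rate increase in roughly 18 months), back out the implied compound annual growth:

```python
# Sanity check of the growth rates quoted in point 5 (figures from the
# text; no new data).

factor = 1024 / 64          # 16x increase in multiplies per cycle per core
years = 1.5                 # "just over 18 months"
annual = factor ** (1 / years)
print(f"{annual:.1f}x per year")   # ~6.3x, well above the 3x/year floor

# At the more conservative 3x/year, capability compounds quickly:
for n in range(1, 4):
    print(n, 3 ** n)               # 3, 9, 27
```

Even the conservative 3x/year figure implies a 27x gain over three years, which is why design teams can afford to "think big" about a fixed form-factor and budget.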

Any startup walks a fine line between replicating what others have done before, and attempting something so novel that no one can appreciate the benefit. One useful way to look for practical novelty is to look at possible innovation around the image stream itself. Here are four ways you might think about new business around image streams:

  1. Take an existing image stream, and apply improved algorithms. For example, build technology that operates on users’ videos and does improved captioning, tagging and indexing.
  2. Take an existing image stream and extract new kinds of data beyond the original intent. For example, use outdoor surveillance video streams to do high resolution weather reporting, or look at traffic congestion.
  3. Take an existing image stream and provide services on it under new business models. For example, build software for user video search that doesn’t charge by copy or by subscription, but by success in finding specific events.
  4. Build new image streams by putting cameras in new places. For example, chemical refiners are installing IR cameras that can identify leaks of otherwise invisible gases. An agricultural automation startup, Blue River, is putting sophisticated imaging on herbicide sprayers, so that herbicides can be applied just on recognized weeds, not on crop plants or bare soil, increasing yields and reducing chemical use.

Thinking beyond just the image stream can be important too. Consider ways that cameras, microphones and natural language processing methods can be combined to get richer insights into the environment and users’ intent.

  • Can the reflected sound of an aerial drone’s blades give additional information for obstacle avoidance?
  • Can the sound of tires on the road surface give clues about driving conditions for autonomous cars?
  • Can the pitch and content of voices give indications of stress levels in drivers, or crowds in public places?

The figure below explores a range of application and function types using multiple modes of sensing and analysis.

| | Autonomous Vehicles and Robotics | Monitoring, Inspection and Surveillance | Human-Machine Interface | Personal Device Enhancement |
|---|---|---|---|---|
| Vision | Multi-sensor: image, depth, speed · Localization and odometry · Full surround views · Obstacle avoidance | Environmental assessment · Attention monitoring | Command interface · Multi-mode automatic speech recognition | Social photography · Augmented reality · Localization and odometry |
| Audio | Ultrasonic sensing | Acoustic surveillance · Health and performance monitoring | Mood analysis · Command interface | ASR in social media context · Hands-free UI · Audio geolocation |
| Natural Language | – | Access control · Sentiment analysis | Sentiment analysis · Command interface | Real-time translation · Local service bots · Enhanced search |

The variety of vision opportunities is truly unbounded. The combination of inexpensive image sensors, huge cognitive computing capacity, rich training data and ubiquitous communications makes this time absolutely unique. Doing a vision startup is hard, just as any startup venture is hard. Finding the right team, the right market, the right product and the right money is never easy, but the rewards – especially the emotional, technical and professional ones – can be enormous.

Good luck!

How to Start an Embedded Vision Company – Part 2

In my previous blog post, I outlined how the explosion of high-resolution, low-cost image sensors was transforming the nature of vision, as we rapidly evolve to a world where most pixels are never seen by humans, but captured, analyzed and used by embedded computing systems. This discontinuity is creating ample opportunities for new technologies, new business models and new companies. In this second part, we look at the basic ingredients of a startup, and two rival models of how to approach building a successful enterprise.

Let’s look at the basic ingredients of starting a technology business – not just a vision venture. We might call this “Startup 101”. The first ingredient is the team.

  • You will need depth of skills. It is impossible to be outstanding in absolutely everything, but success does depend on having world-class capability in one or two disciplines, usually including at least one technology dimension. Without leadership somewhere, it’s hard to differentiate from others, and to see farther down the road on emerging applications.
  • You don’t need to be world-class in everything, but having a breadth of skills across the basics – hardware, software, marketing, sales, fund-raising, strategy, infrastructure – will help enormously in moving forward as a business. The hardware/software need is obvious and usually first priority. You have to be able to develop and deliver something useful, unique and functional. But sooner or later you’ll also need to figure out how to describe it to customers, make the product and company visible, and complete business transactions. You’ll also need to put together some infrastructure, so that you can hire and pay people, get reasonably secure network access and supply toilet paper in the bathrooms.
  • Some level of experience on the team is important. You don’t need to be graybeards with rich startup and big company track records, but some level of real world history is enormously valuable. You need enough to avoid the rookie mistakes and to recognize the difference between a normal pothole and an existential crevasse. Potholes you do your best to avoid, but if you have to survive a bump, you can. A bit of experience can alert you when you’re approaching the abyss, so you can do everything possible to get around it. Is there a simple formula for recognizing those crevasses? Unfortunately, no (but they often involve core team conflict, legal issues, or cash flow). Startups throw a lot of issues, big and small, at the leadership team, so there will be plenty of opportunity to gain experience along the way.
  • The last key dimension of team is perhaps the most important, but also the most nebulous – character. Starting a company is hard work, with plenty of stress and emotion, because of the stakes. A team capable of, and committed to, openness, patience and honesty will perform better, last longer, and have more fun than other teams. It does NOT mean that the team should agree all the time – part of the point of constructing a team with diverse skills is to get good “parallax perspective” on the thorniest problems. It DOES mean trusting one another to do their jobs, being willing to ask tough questions about assumptions and methods, and working hard for the common effort. More than anything, it means putting ego and individual glory aside.

The second key ingredient for a startup is the product. Every startup’s product is different (or it had better be!), but here are four criteria to apply to the product concept:

  1. The product should be unique in at least one major dimension.
  • The uniqueness could be functionality – the product does something that wasn’t possible before, or it does a set of functions together that weren’t possible before.
  • The uniqueness could be performance – it does a known job faster, at lower power, cheaper or in a smaller form-factor than anyone else.
  • The uniqueness could be the business or usage model – it allows a task to be done by a new – usually less sophisticated – customer, or lets the customer pay for it in a different way.
  2. Building the product must be feasible. It isn’t enough just to have a MATLAB model of a great new vision algorithm – you need to make it work at the necessary speed, and fit in the memory of the target embedded platform, with all the interfaces to cameras, networks, storage and other software layers.
  3. The product should be defensible. Once others learn about the product, can they easily copy it? When you work with customers on real needs, will you be able to incorporate improvements more rapidly and more completely than others? Can you gather training data and interesting usage cases more quickly? Can you protect your code, your algorithms, and your hardware design from overt cloners?
  4. You should be able to explain the product relative to the competition. In some ideal world, customers would be able to appreciate the magnificence of your invention without any point of comparison – they would instantly understand how to improve their lives by buying your product. In that ideal world you would have no competition. In the long run, you ideally want to so dominate your segment that no one else comes close. However, if you have no initial reference point – no competition – you may struggle to discover and explain the product’s virtues. Having some competition is not a bad thing – it gives a preexisting reference point by which the performance, functionality and usage model breakthrough can be made vivid to potential customers. In fact, if you think you have no competition, you should probably go find some, at least for the purpose of showing the superiority of your solution.

The third key ingredient for a startup is the target market: the group of users plus the context for use. Think “teenage girls” + “shopping for clothes” or “PCB layout designers” + “high frequency multi-layer board timing closure”.

Finding the right target market for a new technology product faces an inherent dilemma. In the early going, it is not hard to find a group of technology enthusiasts who will adopt the product because it is new, cool and powerful. They have an easy time picturing how it may serve their uses and are comfortable with the hard work of adapting the inherent power of your technology to their needs. Company progress often stalls, however, once this small population of early adopters has embraced the product. The great majority of users are not looking for an application or integration challenge – they just want to get some job done. They may tolerate use of new technology from an untried startup, but only if it clearly addresses their use case. This transition to mainstream users has been characterized by author Geoffrey Moore as “crossing the chasm”. The recognized strategy for getting into wider use is to narrow the focus to a smaller group of mainstream customers, often by solving the problem more completely for one vertical application or for one specific usage scenario. So “going vertical” puts the fear (and hypothetical potential) of the technology into the background and emphasizes the practical and accessible benefits of the superior solution.

This narrowing of focus, however, can sometimes create a dilemma in working with potential investors. Investors, especially big VCs, want to hit home runs by winning huge markets. They don’t want narrow niche plays. The highly successful investor Peter Thiel dramatizes this point of view by saying “competition is for losers”, meaning that growing and dominating a new market can be much more profitable than participating in an existing commodity market.

The answer is to think about, and where appropriate, talk about the market strategy on two levels. First identify huge markets that are still under-served or latent. Then focus on an early niche within that emerging market which can be dominated by a concentrated effort, where the insights and technologies needed to master the niche are excellent preparation for larger and larger surrounding use-cases within the likely huge market. Talking about both the laser focus on the “beachhead” initial market AND the setup for leadership in the eventual market can often resolve the apparent paradox.

The accumulated wisdom of startup methods is evolving continuously, both as teams refine what works, and as the technology and applications create new business models [think advertising], new platforms [think applications as a cloud service], new investment methods [think crowd funding] and new team structures [think gig economy]. The software startup world, in particular, has been dramatically influenced by the “Lean Startup” principle. This idea has evolved over the past fifteen years, spawned by the writing of Steve Blank more than any one individual. It contrasts in key ways with the long-standing model, which we can call “Old School”.

| | Old School | Lean Startup |
|---|---|---|
| Funding | Seed Round based on team and idea, A Round to develop product, B Round after revenue | Develop prototype to get Seed Round, A Round after revenue, B Round, if any, for global expansion |
| Product Types | Hardware/software systems and silicon | Easiest with software |
| Customer Acquisition | Develop sales and marketing organization, to sell direct or build channel | CEO and CTO are chief sales people until product and revenue potential proven in the market |
| Business Models | Mostly B2B with large transactions | Web-centric B2B and B2C with subscriptions and micro-transactions |

In vastly simplified form, the Lean Startup model is built on five elements of guidance:

  1. Rapidly develop a Minimum Viable Product (MVP) – the simplest product-like deliverable that customers will actually use in some form. Getting engaged with customers as early as possible gives you the kind of feedback on real problems that you cannot get from theoretical discussions. It gives you a chance to concentrate on the most customer-relevant features and skip the effort on features that customers are less likely to care about.
  2. Test prototypes on target users early and often – Once you have an MVP, you have a great platform to evolve incrementally. If you can figure out how to float new features into the product without jeopardizing basic functionality, then you can do rapid experimentation on customer usage. This allows the product to evolve more quickly.
  3. Measure market and technical progress dispassionately and scientifically – New markets and technologies often don’t follow old rules-of-thumb, so you may need to develop new, more appropriate metrics of progress for both. Methods like A-B testing of alternative mechanisms can give quick feedback on real customer usage, and enhance a sense of honesty and realism in the effort.
  4. Don’t take too much money too soon – Taking money from investors is an implied commitment to deliver predictable returns in a predictable fashion. If you try to make that promise too early, people won’t believe you, so you won’t get the funding. Even if you can convince investors to fund you, taking money too early may make you commit to a product before you really know what works best. In some areas, like cloud software, powerful ideas can sometimes be developed and launched by small teams, so that little outside funding is necessary in the early days. Startup and funding culture have evolved together so that teams often don’t expect to get outside funding until they have their MVP. Some teams expect to be eating ramen noodles for many months.
  5. Leverage open source and crowd source thinking – It is hard to overstate the impact of open source on many areas of software. The combination of compelling economics, rapid evolution and vetting by a wide base of users makes open source important in two ways – as a building block within your own technical development strategy, and as part of a proliferation strategy that creates a wider market for your product. Crowd sourcing represents an analogous method to harness wide enthusiasm for your mission or product to gather and refine content, generate data and get startup funds.
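To make point 3 concrete, here is a minimal sketch of how an A-B comparison might be scored. It uses a standard two-proportion z-test, and the conversion counts are made-up numbers for illustration only:

```python
# Two-proportion z-test: is the difference between two conversion rates
# likely to be real, or just noise? (Illustrative numbers, not real data.)
import math

def two_proportion_z(conv_a, n_a, conv_b, n_b):
    """z-statistic for the difference between two observed conversion rates."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p_pool = (conv_a + conv_b) / (n_a + n_b)   # pooled rate under "no difference"
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    return (p_b - p_a) / se

# Variant A: 48 sign-ups out of 1000 trials; variant B: 72 out of 1000.
z = two_proportion_z(48, 1000, 72, 1000)
print(f"z = {z:.2f}")   # ~2.26, beyond 1.96, so significant at the 5% level
```

The point is not the particular statistic but the habit: pre-commit to a metric and a threshold before the experiment, so the team cannot rationalize a disappointing result after the fact.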

As these methods have grown up in the cloud software world, they do not all automatically apply to embedded vision startups. Some technologies, like new silicon platforms, require such high upfront investments and expensive iterations that deferring funding or iterating customer prototypes may not be viable. In addition, powerful ideas like live (and unannounced) A-B testing on customers will not be acceptable for some embedded products, especially in safety-critical applications. The lean methods here work most obviously in on-line environments, with large numbers of customers and small transactions.   A design win for an embedded system may have much greater transaction value than any single order in on-line software, so the sales process may be quite different, with a significant role for well-orchestrated sales and marketing efforts with key customers. Nevertheless, we can compare typical “Old School” and “Lean Startup” methods across key areas like funding, product types, methods for getting customers and core business models.

How to Start an Embedded Vision Company — Part 1

Part 1: Why Vision


Since I started Cognite Ventures eight months ago, my activity with startup teams has ramped up dramatically. Many of these groups are targeting some kind of embedded vision application, and many want advice on how to succeed. This conversation developed into an idiosyncratic set of thoughts on vision startup guidance, which in turn spawned a talk at the Embedded Vision Summit which I’m now expanding as a blog. You can find the slides here, but I will also break this conversation down into a three-part article.

Please allow me to start with some caveats! Every startup is different, every team is different, and the market is constantly evolving – so there is no right answer. Moreover, I have had success in my startups, especially Tensilica, but I can hardly claim that I have succeeded just because of following these principles. I have been blessed with an opportunity to work with remarkable teams, whose own talent, energy and principles have been enormously influential on the outcome. To the extent that I have directly contributed to startup success, is it because of applying these ideas? Or in spite of these ideas? Or just dumb luck?

I believe the current energy around new ventures in vision comes from two fundamental technical and market trends. First, the cost of capturing image streams has fallen dramatically. I can buy an HD resolution security camera with IR illumination and an aluminum housing for $13.91 on Amazon. This implies that the core electronics – CMOS sensor, basic image signal processing and video output – probably cost about two dollars at the component level. This reflects the manufacturing learning curve from the exploding volume of cameras. It’s useful to compare the trend for the population of people with the population of cameras on the planet, based on SemiCo data on image sensors from 2014 and assuming each sensor has a useful life in the wild of three years.

What does it say? First, it appears that the number of cameras crossed over the number of people sometime in the last year. This means that even if every human spent every moment of every day and night watching video, a significant fraction of the output of these cameras would go unwatched. Of course, many of these cameras are shut off, or sitting in someone’s pocket, or watching complete darkness at any given time. Nevertheless, it is certain that humans will very rarely see the captured images. If installing or carrying those cameras around is going to make any sense, it will be because we used vision analysis to filter, select or act on the streams without human involvement in every frame.

But the list of implications goes on!

  • We now have more than 10B image sensors installed. If each can produce an HD video stream of 1080p60, we have potential raw content of roughly 100M pixels per second per camera, or 10^18 new pixels per second, or something >10^25 bytes per year of raw pixel data. If, foolishly, we tried to keep all the raw pixels, the storage requirement would exceed the annual production of hard disk plus NAND flash by a factor of roughly 10,000. Even if we compressed the video down to 5Mbps, we would fill up a year’s supply of storage by sometime on January 4 of the next year. Clearly we’re not going to store all that potential content. (Utilization and tolerable compression rates will vary widely by type of camera – the camera on my phone is likely to be less active than a security camera, and some security cameras may get by on less than 5Mbps, but the essential problem remains.)
  • Where do new bits come from? New bits are captured from the real world, or “synthesized” from other data. Synthesized data includes credit card transactions, packet headers, stock trades, emails, and other data created within electronic systems as a byproduct of applications. Real world data can be pixels from cameras, or audio samples from microphones, or accelerometer data from MEMS sensors. Synthetic data is ultimately derived from real world data, through the transformations of human interaction, economic transactions and sharing. Audio and motion sensors are rich sources of data, but their data rates are dramatically less – 3 to 5 orders of magnitude less – than that of even cheap image sensors. So virtually all of the real data of the world – and an interesting fraction of all electronic data – is pixels.
  • The overwhelming volume of pixels has deep implications for computing and communications. Consider that $13.91 video camera. Even if we found a way to ship that continuous video stream up to the cloud, we couldn’t afford to use some x86 or GPU-enabled server to process all that content – over the life of that camera, we could easily spend thousands of dollars on that hardware (and power) dedicated to that video channel. Similarly, 5Mbps of compressed video * 60 seconds * 60 minutes * 24 hours * 30 days is 12,960 Gbits per month. I don’t know about your wireless plan, but that’s more than my cellular wireless plan absorbs easily. So it is pretty clear that we’re not going to be able to either do the bulk of the video analysis on cloud servers, or communicate it via cellular. Wi-Fi networks may have no per-bit charges, and greater overall capacity, but wireless infrastructure will have trouble scaling to the necessary level to handle tens of billions of streams. We must find ways to do most of the computing on embedded systems, so that no video, or only the most salient video, is sent to the cloud for storage, further processing or human review and action.
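The arithmetic behind these bullets is easy to reproduce. This back-of-envelope script uses the round numbers from the text, plus one assumption of mine (one byte per raw pixel):

```python
# Back-of-envelope check of the pixel-volume claims above. All inputs are
# the blog's round figures; "1 byte/pixel" is an added simplifying assumption.

CAMERAS = 10e9                  # ~10B installed image sensors
PIXELS_PER_FRAME = 1920 * 1080  # 1080p
FPS = 60

pixels_per_cam_per_s = PIXELS_PER_FRAME * FPS           # ~1.2e8 pixels/s
global_pixels_per_s = CAMERAS * pixels_per_cam_per_s    # ~1.2e18 pixels/s
seconds_per_year = 365 * 24 * 3600
raw_bytes_per_year = global_pixels_per_s * seconds_per_year  # ~3.9e25 B/year

print(f"{pixels_per_cam_per_s:.2e} pixels/s per camera")
print(f"{global_pixels_per_s:.2e} pixels/s worldwide")
print(f"{raw_bytes_per_year:.2e} bytes/year of raw pixels")

# Compressed to 5 Mb/s per camera, cellular-style monthly volume:
compressed_bits_per_month = 5e6 * 3600 * 24 * 30
print(f"{compressed_bits_per_month / 1e9:,.0f} Gbit/month per camera")  # 12,960
```

The exact totals depend on utilization and bit depth, but even generous assumptions leave the conclusion intact: the raw stream cannot be stored or shipped wholesale, so most of the computing has to happen at the edge.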

The second reason for the enthusiasm for vision is the revolution in computation methods for extracting insights from image streams. In particular, the emergence of convolutional neural networks as a key analytical building block has dramatically improved the potential for vision systems to extract subtle insightful results from complex, noisy image streams. While no product is just a neural network, the increasingly well-understood vocabulary of gathering and labeling large data sets, constructing and training neural networks, and deploying those computational networks onto efficient embedded hardware, has become part of the basic language of vision startups.
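Since convolutional neural networks come up throughout this discussion, it may help to see the core operation concretely. Below is a minimal single-channel 2D convolution in plain Python; real deployments use heavily optimized libraries, and the edge-detection kernel here is just an illustration:

```python
# Minimal 2D convolution (valid padding, single channel) in pure Python.
# This is the primitive that CNN layers apply thousands of times, with
# learned kernels instead of the hand-coded one below.

def conv2d(image, kernel):
    kh, kw = len(kernel), len(kernel[0])
    ih, iw = len(image), len(image[0])
    out = []
    for i in range(ih - kh + 1):
        row = []
        for j in range(iw - kw + 1):
            acc = 0.0
            for di in range(kh):          # slide the kernel over the window
                for dj in range(kw):
                    acc += image[i + di][j + dj] * kernel[di][dj]
            row.append(acc)
        out.append(row)
    return out

# A classic vertical-edge kernel applied to a tiny image whose right half
# is bright: the response peaks at the brightness boundary.
img = [[0, 0, 0, 9, 9, 9]] * 5
sobel_x = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
print(conv2d(img, sobel_x))   # [[0.0, 36.0, 36.0, 0.0]] * 3
```

A trained network stacks many such filters, interleaved with nonlinearities and pooling, and learns the kernel weights from labeled data rather than hand-coding them.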

When we reflect these observations back onto the vision market, we can discern three big useful categories of applications:

  1. Capture of images and video for human consumption. This includes everything from fashion photography and snapshots posted on Facebook to Hollywood films and document scanning. This is the traditional realm of imaging, and much of the current technology base – image signal processing pipelines, video compression methods and video displays – is built around particular characteristics of the human visual system. This area has been the mainstay of digital imaging and video related products for the past two decades. Innovation in new higher resolution formats, new cameras and new image enhancement remains a plausible area for startup activity even today, but it is not as hot as it has been. While this area has been the home of classical image enhancement methods, there is ample technical innovation in this category, for example, in new generative neural network models that can synthesize photo-realistic images.
  2. Capture of images and video, then filtering, reducing and organizing into a concise form for human decision-making. This category includes a wide range of vision processing and analytics technologies, including most activity in video monitoring and surveillance. The key here is often to make huge bodies of video content tagged, indexed and searchable, and to filter out irrelevant content so only a tiny fraction needs to be uploaded, stored, reviewed or more exhaustively analyzed. This area is already active, but we would expect even more activity, especially as teams work to exploit the potential for joint analytics spanning many cameras simultaneously. Cloud applications are particularly important in this area, because of their storage, computing and collaboration flexibility.
  3. Capture of images and video, analyzing and then using insights to take autonomous action. This domain has captured the world’s imagination in recent years, especially with the success of autonomous vehicle prototypes and smart aerial drones. The rapid advances in convolutional neural networks are particularly vivid and important in this area, as vision processing becomes accurate and robust enough to trust with decision making in safety-critical systems. Key characteristics of these systems are short latency, robustness and hard real-time performance. System architects will rely on autonomous vision systems to the extent that the systems can make guarantees of short decision latency and ~100% availability.

Needless to say, some good ideas may be hybrids of these three, especially in systems that use vision for some simple autonomous decision-making, but rely on humans for backup, or for more strategic decisions, based on the consolidated data.

In the next part of the article, we’ll take a look at the ingredients of a startup – team, product and target market – and look at some cogent lessons from the “lean startup” model that rules software entrepreneurship today.


What’s happening in startup funding?

I’ve spent the last few months digging into the intersection between the on-going deep learning revolution and the world-wide opportunity for startups. This little exercise has highlighted both how the startup funding world is evolving, and some of the unique issues and opportunities for deep learning-based startups.

Looking at some basic funding trends is a good place to start. Pitchbook has just published an excellent public summary of key quantitative trends in US startup funding:

These show the growth in the seed funding level and valuation, the stretching out of the pre-seed stage for companies, and a reduction in overall funding activity from the exceedingly frothy levels of 2015.

Let’s look at some key pictures – first seed funding:

That’s generally a pretty comforting trend – seed round funding levels and valuations increasing steadily over time, without direct signs of a funding bubble or “irrational enthusiasm”.   This says that strong teams with great ideas and demonstrated progress on their initial product (their Minimum Viable Product or “MVP”) are learning from early trial customers, getting some measurable traction and able to articulate a differentiated story to seed investors.

A second picture on time-to-funding gives a more sobering angle – time to funding:

This picture suggests that the time-line for progressing through the funding stages is stretching out meaningfully. In particular, it says that it is taking longer to get to seed funding – now more than two years. How do startups operate before seed? I think the answer is pre-seed angel funding, “friends-and-family” investment, credit cards and a steady diet of ramen noodles ;-). This means significant commitment to the minimally-funded startup not as a transitory moment but as a life-style. It takes toughness and faith.

That commitment to toughness has been codified as the concept of the Lean Startup. In the “good old days” a mainstream entrepreneur has an idea, assembles a stellar team, raises money, rents space, buys computers, phone systems, networks and cubicles, builds prototypes, hires sales and marketing people and takes a product to market. And everyone hoped customers would buy it just as they were supposed to. The Lean Startup model turns that around – an entrepreneur has an idea, gathers two talented technical friends, uses their old laptops and an AWS account, builds prototypes and takes themselves to customers. They iterate on customer-mandated features for a few months and take it to market as a cloud-based service. Then they raise money. More ramen-eating for the founding team, less risk for the investors, and better return on investment overall.

Some kinds of technologies and business models fit the Lean Startup model easily – almost anything delivered as software, especially in the cloud or in non-mission-critical roles.  Some models don’t fit so well – it is tough to build new billion-transistor chips on just a ramen noodle budget, and tough to get customers without a working prototype.  So the whole distribution of startups has shifted in favor of business models and technologies that look leaner.

If you’re looking for sobering statistics, the US funding picture shows that funding has retreated a bit from the highs of 2015 and early 2016.

Does that mean that funding is drying up? I don’t think so. It just makes things look like late 2013 and early 2014, and certainly higher than 2011 and 2012. In fact, I believe that most quality startups are going to find adequate funding, though innovation, “leanness” and savvy response to emerging technologies all continue to be critically important.

To get a better idea of the funding trend, I dug a bit deeper into one segment – computer vision and imaging – that I feel may be representative of a broad class of emerging technology-driven applications, especially as investment shifts towards artificial intelligence in all its forms.

For this, I mined Crunchbase, the popular startup funding event database and service, to get a rough picture of what has happened in funding over the past five years. It’s quite hard to get unambiguous statistics from a database like this when your target technology or market criteria don’t neatly fit the predefined categories. You’re forced to resort to description text keyword filtering, which is slow and imperfect. Nevertheless, a systematic set of keyword filters can give good relative measures over time, even if they can’t give very good absolute numbers. Specifically, I looked at the number of funding deals, and the number of reported dollars for fundings in embedded vision (EV) companies in each quarter over the past five years, as reported in Crunchbase and as filtered down to represent the company’s apparent focus. (It’s not trivial. Lots of startups’ descriptions talk, for example, about their “company vision”, but that doesn’t mean they’re in the vision market ;-). The quarter by quarter numbers jump around a lot, of course, but the linear trend is pretty clearly up and to the right. This data seems to indicate a healthy level of activity and funding climate for embedded vision.
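As an illustration of this kind of keyword filtering, a minimal sketch might look like the following. The patterns, field names and sample rows are all hypothetical, not Crunchbase’s actual schema or export format:

```python
# Sketch of description-text keyword filtering for vision companies.
# Include broad vision terms; exclude the "company vision" false positive
# mentioned above. Patterns and data layout are illustrative only.
import re

INCLUDE = re.compile(
    r"\b(computer vision|image recognition|video analytics|embedded vision)\b",
    re.IGNORECASE,
)
EXCLUDE = re.compile(r"\bcompany vision\b|\bvision statement\b", re.IGNORECASE)

def looks_like_ev_company(description: str) -> bool:
    """True if the description matches vision terms and no known false positives."""
    return bool(INCLUDE.search(description)) and not EXCLUDE.search(description)

rows = [
    {"name": "Acme AI", "desc": "Embedded vision chips for drones"},
    {"name": "Foo Corp", "desc": "Our company vision is to disrupt retail"},
]
hits = [r["name"] for r in rows if looks_like_ev_company(r["desc"])]
print(hits)  # ['Acme AI']
```

In practice the include and exclude lists grow by inspection: sample the matches, spot the false positives and misses, and refine, which is exactly why the method is slow and imperfect but still useful for relative trends.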

I’d say that the overall climate for technologies related to cognitive computing – AI, machine learning, neural networks, computer vision, speech recognition, natural language processing and their myriad applications – continues to look healthy as a whole as well.

In parallel with this look at funding, I’ve also been grinding away at additions, improvements, corrections and refinements on the Cognitive Computing Startup List. I’ve just made the third release of that list. Take a look!



A global look at the cognitive computing start-up scene

I published the first version of my cognitive computing startup list about six weeks ago.  As I poked around further, and got some great questions from the community, I discovered a range of new resources on deep learning and AI startups, and literally thousands of new candidates.  In particular, I started using Crunchbase as a resource to spread my net further for serious cognitive computing companies.  If you simply search their database for companies that mention artificial intelligence somewhere in their description, you get about 2200 hits.  Even the Crunchbase category of Artificial Intelligence companies has more than 1400 companies currently.

As I described in the first release, the majority of companies in the AI category, while having generally interesting or even compelling propositions, are using true cognitive computing as just a modest element of some broader product value, or may be playing up the AI angle because it is so sexy right now.  Instead, I really tried to identify those companies operating on inherently huge data analytics and generation problems, that have a tight focus on automated machine learning, and whose blogs and job postings suggest depth of expertise and commitment to machine learning and neural network methods.

I also found other good lists of AI-related startups, like MMC Ventures’ “Artificial Intelligence in the UK: Landscape and learnings from 226 startups” and the Chinese Geekpark A100 list of worldwide startups.

With all this, I could filter the vast range of startups down to about 275 that seem to represent the most focused, the most active and the most innovative, according to my admittedly idiosyncratic criteria.

The geographical distribution is instructive.  Not surprisingly, about half are based in the US, with two-thirds of the US start-ups found in California.  More surprising is that the strong second is the UK, with more than 20% of the total, followed by China and Canada.  I was somewhat surprised to find China with just 8% of the startups, so I asked a number of colleagues to educate me further on cognitive computing startups in China.  That yielded a few more important entrants, but China still lags behind the UK in cognitive computing startups.

I have split the list a number of different ways, identifying those

  • with a significant focus on embedded systems (not just cloud-based software): 82 companies
  • working primarily on imaging and vision-based cognitive computing: 125 companies
  • doing embedded vision: 74 companies

Within embedded vision, you’ll find 10 or more each focused on surveillance, autonomous cars, drones and robotics, human-machine interface, and new silicon platforms for deep learning.  It’s a rich mix.
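These overlapping segments are essentially set intersections over a tagged list. A minimal sketch, with placeholder company names and tags rather than entries from the actual list:

```python
# Illustrative tagging of a startup list into overlapping segments.
# Names and tags are placeholders, not companies from the real list.
companies = {
    "AcmeVision": {"embedded", "vision"},
    "CloudSight": {"vision"},
    "EdgeChip":   {"embedded"},
    "DroneEyes":  {"embedded", "vision"},
}

embedded = {c for c, tags in companies.items() if "embedded" in tags}
vision   = {c for c, tags in companies.items() if "vision" in tags}

# "Embedded vision" companies sit in the intersection of the two segments.
embedded_vision = embedded & vision

print(sorted(embedded_vision))  # → ['AcmeVision', 'DroneEyes']
```

On the real list, this is why the embedded-vision count (74) is smaller than either the embedded (82) or vision (125) counts: it is their overlap.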

Stay tuned for more on individual companies, startup strategies and trends in the different segments of cognitive computing.  And take a look at the list!

Who are the most important start-ups in cognitive computing?

What’s happening with start-ups in cognitive computing? It is hard to know where to begin. Real, dramatic progress on the technology, a surge of creativity in conceiving new applications – and big improvements on existing ones – and a tsunami of public hype around AI all combine to inspire a vast array of cognitive computing startups. Over the past three months I have compiled a working list of cognitive computing startups, as a tool to understand the scope and trends of entrepreneurship in the field.

The current list has 185 entities that look like startups – generally small, privately held organizations, with basic web presence and a stated and serious focus on technology and applications of AI, machine learning and neural-inspired computing. I have tried to omit companies that have been acquired or gone defunct, or are so stealthy that they have no meaningful Internet footprint.   There are many more companies using some form of big data analysis than shown here. Given the hype around cognitive computing, it is certainly popular for many companies to include some mention of AI or machine learning, even when it is fairly tangential to a company’s core activities. Making the judgment to include a name on my list was often a close call – there was no bright line. So in rough terms, the criteria might be summarized as follows:

  • Must be a company or independent organization, not an open-source project.
  • Must have enough information on the Internet (company description on web site, LinkedIn, angel investing sites, job postings) to get at least a crude understanding of the degree of focus on cognitive computing.
  • Focused on developing or using sophisticated machine learning, especially deep learning methods, not just, for example, doing big data management and analytics as a modest part of a general cloud application environment in business intelligence, marketing, CRM, or ecommerce.

I examined four or five hundred companies as candidates for the list, and whittled it down to about 190 that seemed most interesting, most innovative and most focused on advanced cognitive computing technology and applications. The list of candidates came from lots of sources. I have heard about a wide range of vision-centric cognitive computing companies from working intensively in the computer-vision field over the past five years, as well as most of the companies doing specialized processors and basic neural network technology. I also worked from other teams’ excellent published lists. The most useful of these is “The State of Machine Intelligence, 2016”, a list of almost 300 companies put together by Shivon Zilis and James Cham and published in the Harvard Business Review, November 2, 2016. I also used the Machine Learning Startup list as a source of ideas. Finally, I have had scores of conversations with practitioners in the field and read hundreds of articles about startup activity over these three months to put together my list.

Three trends stand out as lessons from this survey exercise, beyond the sheer numbers. First, the group represents tremendous diversity, covering novel ideas from robotics, health care, self-driving cars, enterprise operations, on-line commerce, agriculture and personal productivity. These entrepreneurs all believe they have an opportunity to understand and exploit complex patterns in the masses of available data to yield better insights into how to serve customers and users. The more overwhelming the data, the greater the enthusiasm for deep learning from it. (It remains to be seen, however, which of these teams will actually succeed in systematically uncovering dramatic patterns and in monetizing those insights.)

Second, cloud-based software applications dominate the list. I think this comes both from the relative ease of starting enterprise software companies in the current venture climate and from the remarkable breadth of applicability of the powerful pattern recognition and natural language capabilities of state-of-the-art learning algorithms. So every application niche has an available sub-niche for cognitive computing approaches to that application. On the other hand, hardware startups, especially silicon-intensive startups, are pretty scarce. This reflects the fact that many enterprise-centric uses of cognitive computing are not actually much limited by the cost, power or performance of their cognitive computing algorithms – they are initially more concerned with just getting any consistent insights from their data. There is a healthy number of real-time or embedded applications here, especially in robotics and automotive, but these may be content for a while to build at the systems level, leveraging off-the-shelf sensors, memories, and CPU, GPU and FPGA silicon computing platforms.

Third, the list is dynamic. Since I started looking, a handful has been acquired, and many more have been created. Undoubtedly many will fail to meet their lofty objectives and others will shift focus in response to the market’s relentless education on what’s really wanted. I’m convinced that the cognitive computing trend is not close to peaking, so we’ll see many new rounds of startups, both going deeper into the underlying technology, as it evolves, and going wider into new application niches across all kinds of cloud and embedded systems.

In the future, I expect to see a huge variety of everyday devices sprout cameras, microphones and motion sensors, with sophisticated cognitive computing behind them to understand human interactions and their environment with astonishing detail and apparent sophistication. Similarly, it seems quite safe to forecast systematic cloud-based identification of trends in our health, habits, purchases, sentiment, and activities. At a minimum, this will uncover macroscopic trends of specific populations, but will often come down, for better or for worse, to individual tracking, diagnosis and commercial leveraging.

The current list:

The Fourth Design Wave

The pace of change in technology, especially in electronic systems, is so rapid and relentless, that we rarely get a chance to pause and look at the big picture.  We have experienced such a cascade of smart, mobile, cloud-enabled products in recent years, that the longer-term patterns in design are not always clear.  It is worthwhile, however, to look briefly at the longer arc of history in electronic design, from the emergence of radio and telephone technology to today, and to anticipate the spread of machine learning and artificial intelligence into our daily lives.

At the risk of oversimplifying a rich tapestry of invention, productization, economic transformation and dead-end developments, we discern three waves of essential electronic design, and the onset of the fourth, as shown below.  Each successive wave does not replace the prior dominant design technology, but builds on top of it.

Four Waves of Electronic Design

The first wave is analog circuits, starting with vacuum tube technologies found in early radios, television and radar in the 1930s and 40s, then fully leveraging transistors as they came along, first as discrete devices, then in ICs.  Today, analog circuits remain crucially important in electronic design, with increasing IP reuse as a basic design method for leveraging analog expertise.

The second wave, naturally, is digital design, fully emerging in the 1960s with discrete transistors, and then TTL components.  In the VLSI era, design transitioned to RTL to gain productivity, verifiability, portability and integrability in system-on-chip.  Today, large fractions of the digital content of any design are based on IP reuse, as with analog circuits.  The remarkable longevity of Moore’s Law scaling of cost, power and performance has driven digital designs to extraordinary throughput, complexity and penetration in our lives.

The third wave – processor-based design – really started with digital computers but became a widespread force with the proliferation of the microprocessor and microcontroller in the late 1970s and 1980s.  The underlying digital technology scaling allowed processors to grow roughly one-million-fold in performance, enabling the explosion of software that characterizes the processor-based design wave.  Software has moved inexorably from assembly language coding, to high-level languages and optimizing compilers, to rich software reuse in processor-centric ecosystems, especially around specific operating systems, and to the proliferation of open-source software as a major driver of cost-reduction, creativity and standardization in complex software systems.

We are now on the cusp of the fourth wave – cognitive computing.  The emergence of large data-sets, new hardware and methods for training of complex neural networks, and the need to extract more insight from ambiguous video and audio, all have helped drive this fourth wave.  It will not replace the prior three waves – we will certainly need advanced design capabilities in analog, digital and processors-plus-software, but these will often be the raw building-blocks for constructing cognitive computing systems.  And even when deep learning and other cognitive computing methods form the heart of an electronic system, these other types of design will play complementary roles in communication, storage and conventional computing around the cognitive heart.  The acknowledgement of the power of cognitive computing is a very recent development – deep neural networks were an obscure curiosity four years ago – but we can anticipate rapid development, and perhaps dramatic change.  In fact, it seems likely that many of today’s hot network structures, training  methods, data-sets and applications will be obsoleted several times over in the next ten years.  Nevertheless, the underlying need for such systems is durable.

Archeologists understand that the proliferation, economics, and even culture of a community are often driven by the characteristic tools of the group.  The variety and usefulness of electronic systems is inevitably coupled to the availability of design tools to rapidly and reliably create new systems.  In the figure below, we show a few of the key tools that typify design today in the analog, digital and processor-based layers.

Current key tools for design

The cognitive computing community fully appreciates the need for robust, easy-to-use tool environments, but those emerging tool flows are often still crude, and rarely cover the complete development cycle from concept and data-set selection to deployment, verification and release.  It seems safe to predict that major tool categories will cover training, network structure optimization, automated data curation – with labeling, synthesis and augmentation – and widespread licensing of common large data-sets.  In addition, we might expect to see tools to assist in debug and visualization of networks, environments for debug and regression testing, and new mechanisms to verify the accuracy, robustness, and efficiency of trained networks.  Finally, no complete system or application will consist of a single cognitive engine or neural network – real systems will comprise a rich mix of conventionally programmed hardware/software and multiple cognitive elements working together, often distributed across the physical environment, with some elements close to myriad sensors and others deployed entirely in the cloud.  We can easily see the eventual evolution of tools and methods to manage those highly distributed systems, perhaps relying on data-flows from millions of human users or billions of sensors.

So the fourth wave seems to be here now, but we cannot yet hope to see its ultimate impact on the world.