Deep Learning Startups in China: Report from the Leading Edge

Everyone knows the classic Chinese curse, “May you live in interesting times”. It turns out that the Chinese origin of this pithy phrase is apocryphal – the British statesman Austen Chamberlain probably popularized the phrase in the 1930s and attributed it to the Chinese to lend it gravity. We do, however, live in interesting times, epitomized in no field better than in deep learning, and in no location more poignantly than in China.

I have just returned from a ten-day tour of Beijing, Shenzhen, Shanghai and Hangzhou, meeting with deep learning startups and giving a series of talks on the worldwide deep learning market. In the most fundamental ways, neither the technology, nor the applications, nor the startup process is so different from what you find in Silicon Valley or Europe, but the trip was full of little eye-openers about deep learning in China, and about the entrepreneurial process there. It reinforced a few long-standing observations, but also shifted my point of view in important ways.

The most striking reflection on the China startup scene is how much it feels like the Silicon Valley environment, and how it seems to differ from other Asian markets. First, there seems to be quite a bit of money available, from classic VCs, from industrial sponsors and even from US semiconductor companies – Xilinx and NVidia have investments in these high-profile startups in China, and I’m sure other major players do too. Second, deep learning is very active, with much of the same “gold-rush” feeling I observe in the US. This contrasts with the Taiwan, Japan and Korea markets, where the deep learning startup theme is less developed, either because startups are less central to the business environment (Japan) or because the deep learning enthusiasm has not grown so intense (Taiwan, Korea). Ample funding and enthusiasm also mean rapid growth of significant engineering teams – the smallest company I saw had about 25 people, the biggest had about 400. California teams have evolved office layouts that look like Chinese ones – open offices without cubicle walls – and Chinese teams have developed the California tradition of endless free food. We are not there yet, but we are closer than ever to a common startup culture spanning the Pacific.

Observation: The Chinese academic and industrial technical community is closely tuned into the explosion of activity in deep learning, and many companies are looking to leverage it in products. Baidu’s heavy investment in deep learning – with a research team of more than 1000 – is already well known. The number of papers on deep learning from Chinese universities is very high, as is the interest level among startups. Overall, Chinese industry seems to be gradually shifting from a “cost-down” mindset – focused on how to take established products and optimize the whole bill-of-materials for lower cost – towards greater attention to functional innovation. A strong orientation towards hardware and system products remains: I have found many fewer pure-play cloud software startups in China than in the US or UK. Nevertheless, the original software content in these systems is growing rapidly. Almost every company I visited had polished and impressive demos of vehicle tracking, face recognition, crowd surveillance or demographic assessment.

Observation: Chinese startups are unafraid of doing new deep-learning silicon platforms. Quite a few of the software companies I visited are building or planning chips capturing their insights into neural network inference computation. Perhaps one in four Chinese startups is working towards silicon, while only one in 15 worldwide is doing custom silicon. One executive explained that Chinese investors really like to see the potential differentiation that comes from chips, and startups believe that committing to silicon actually helps secure capital. This is in stark contrast to current Silicon Valley wisdom – that investors flee at the mention of chip investment. This striking dichotomy reflects a combination of perceived lower chip development costs in China (because of lower engineering salaries, avoidance of bleeding-edge semiconductor technologies below 20nm, and smaller, niche-oriented designs) and the widespread belief that tying software to silicon protects software value. Ironically, silicon development is now strikingly rare among Silicon Valley startups, driven partly by the high costs and long timelines for chip products, and partly by the comparative attractiveness of cloud software startups for investment, where the upfront costs are so much lower and countless new market niches seem to appear weekly.

Observation: The China startups are almost entirely focused on real-time vision and audio applications, with only a supporting role for cloud-based deployment. Cars, human-machine interface and surveillance are the standout segments. DJI, the world’s biggest consumer drone maker, uses highly sophisticated deep learning for target tracking and gesture control. Most of the companies doing vision have capability and demos in identifying and tracking vehicles, pedestrians and bicycles, which applies to both automotive driver assistance/self-driving vehicles and surveillance. Top startups in the vision space include Cambricon, DeepGlint, Deephi, Emotibot, Megvii, Horizon Robotics, Intellifusion, Minieye, Momenta, MorpX, Rokid, SenseTime, and Zero Zero Robotics. Audio systems are also a big area, with particular emphasis on automated speech recognition, including, increasingly, embedded real-time speech processing. Top startups here include AISpeech, Mobvoi, and Unisound.

Observation: China has a disproportionately large surveillance market, and correspondingly heavy interest in deep learning-based applications for face identification, pedestrian tracking, and crowd monitoring. China is already the world’s largest video surveillance market and has been among the fastest growing. Chinese suppliers describe usage scenarios of identifying and tracking criminals, but China does not have a serious conventional crime problem (both violent crime and property crime are well below US levels, for example). To some extent, “monitoring crime” is a code word for monitoring political disturbances, a deep obsession of the Chinese Communist Party. This is not the only driver for vision-based applications – face ID for access control and transaction verification is also important.

Over the course of ten days, I saw some particularly interesting startups:

  • Horizon Robotics [Beijing]: Horizon is a deep-learning powerhouse, led by Yu Kai. With 220 people, they are innovating across a broad front on vision systems, including smart home, gaming, voice-vision integration, and self-driving cars. They have also adopted a tight hardware-software integration strategy for more complete and efficient solutions.
  • Intellifusion [Shenzhen]: Intellifusion is a fairly complete video system supplier with close ties to the government organizations deploying public surveillance. They currently deploy their own servers with GPUs and FPGAs, but are moving more and more functionality into edge devices, like the cameras themselves.
  • NextVPU [Shanghai]: NextVPU is the youngest (about 12 months old) and smallest (24 people) of the startups I saw. They are pursuing AR, robotics and ADAS systems, but their first product, an AR headset for the visually impaired, is compelling in both a technical and social sense. Their headset does full scene segmentation for pedestrian navigation and recognizes dozens of key objects – signs, obstacles and common home and urban elements – to help their users.
  • Deephi [Beijing]: Deephi is one of the most advanced and impressive of all the deep learning startups, with close ties to both the leading technical university in China, Tsinghua, and leading US research universities. They have a particularly sophisticated understanding of what it takes to map leading-edge neural networks into the small power, compute and memory resources of embedded devices, using world-class compression techniques. They are pursuing both surveillance (vision) and data-center (vision and speech) applications with a range of innovative programmable and optimized inference architectures.
  • SenseTime [Beijing]: SenseTime is one of the biggest and most visible software startups using deep learning for vision. They have impressive demos spanning surveillance, face recognition, demographic and mood analysis, and street view identification and tracking of vehicles. They are sufficiently competent and confident to have developed their own training framework, Parrots, in lieu of Caffe, TensorFlow and the other standard platforms.
  • Megvii [Beijing]: Megvii is a prominent Chinese “unicorn” – a startup valued at US$1B+ – and is often known by the name of their leading application, Face++. Face++ is a sophisticated and widely used face ID environment, leveraging the Chinese government’s face database. This official and definitive database enables customer verification for transaction systems like Alibaba’s AliPay and DiDi’s leading ride hailing system. They show an impressive collection of demos for recognition, augmented reality and “augmented photography”. Like many other Chinese companies, Megvii is moving functionality from the cloud to embedded devices, to improve latency, availability and security.
  • Bitmain [Beijing]: Bitmain is hardly a startup and is wildly successful in non-deep learning applications, specifically cryptocurrency mining hardware. They have become the biggest supplier of ASICs for computational hashing, especially for Bitcoin, but are now spreading into rival currencies like Litecoin. Founded in 2013, they hit US$100M in revenue in 2014 and are on track to do US$500M this year. This revenue stream and profitability are allowing them to explore new areas, and deep learning platforms seem to be a candidate for further expansion.

Here’s a more complete list of top Chinese deep-learning startups:

Name                Description
4Paradigm           Scaled deep learning cloud platform
AISpeech            Real-time and cloud-based automated speech recognition for car, home and robot UI
Cambricon           Device and cloud processors for AI
DeepGlint           3D computer vision and deep learning for human & vehicle detection, tracking and recognition
Deephi              Compressed CNN networks and processors
Emotibot            A natural interaction interface between human and machine based on multi-modal
Face++              Face recognition
Horizon Robotics    Smart home, automotive and public safety
ICarbonX            Individualized health analysis and prediction of health index by machine analysis
Intellifusion       Cloud-based deep learning for public safety and industrial monitoring
Minieye             ADAS vision cameras and software
Mobvoi              Smart watch with voice search using cloud
Momenta             AI platform for level 5 autonomous driving
MorpX               Commercializes computer vision and deep learning technologies for low-cost/-power platforms
Rokid               Home personal assistant – ASR + face/gesture
SeetaTech           Open source development platform to enable enterprise vision and machine learning
SenseTime           Computer vision
TUPU                Image recognition technology and services
tuSimple            Software for self-driving cars: detection and tracking, flow, SLAM, segmentation, face analysis
Unisound            AI-based speech and text
YITU Technology     Computer vision for surveillance, transportation and medical imaging
Zero Zero Robotics  Smart following drone camera

Of course, no one can claim to understand everything that’s happening in the vibrant Chinese startup community, least of all a non-native speaker. Nevertheless, everyone I spoke with in China validated this list of the top deep learning startups. Some were a bit surprised at the depth of the list, especially in identifying startups that were not yet on their radar. Both technically, and in exploring market trends, the China startup world is at the cutting edge in many areas. It bears close watching for worldwide impact.