Today I’m rolling out the Cognite 300 poster, a handy guide to the more focused and interesting startup companies in cognitive computing and deep learning. I wrote about the ideas behind the creation of the list in an earlier blog post.
I have updated the on-line list every couple of months since the start of 2017, and will continue to do so, because the list keeps changing. Some companies close, some are acquired, some shift their focus. Most importantly, I continue to discover startups that belong on the list, so I will keep adding those, using approximately the same criteria of focus used for the first batch.
I should underscore that many potentially interesting companies haven’t gone on the list:
because it appears that AI is only a modest piece of their value proposition, or
because there is too little information on their websites to judge, or
because they are doing interesting work on capturing and curating big data sets (work that may ultimately require deep learning methods) but don’t emphasize learning work themselves, or
because I simply failed to understand the significance of the company’s offerings.
A few weeks ago, a venture capital colleague suggested that I do a poster, to make the list more accessible and to communicate a bit of the big picture of segments and focus areas for these companies. I classified the companies (alas, by hand, not with a neural network 😉) into 16 groups:
Sec – Security, Surveillance, Monitoring
Cars – Autonomous Vehicles
HMI – Human-Machine Interface
Robot – Drones and Robots
Chip – Silicon Platforms
Plat – Deep Learning Cloud Compute/Data Platform and Services
VaaS – Vision as a Service
ALaaS – Audio and Language as a Service
Mark – Marketing and Advertising
CRM – Customer Relationship Management and Human Resources
Manf – Operations, Logistics and Manufacturing
Sat – Aerial Imaging and Mapping
Med – Medicine and Health Care
Media – Social Media, Entertainment and Lifestyle
Fin – Finance, Insurance and Commerce
IT – IT Operations and Security
I’ve also included an overlay of two broader categories, Vision and Embedded. Many of the 16 categories fall cleanly inside or outside embedded and vision, but some categories include all the combinations. A few companies span two of the 16 groups, so they are shown in both.
You may download and use the poster as you wish, so long as you reproduce it in its entirety, do not modify it and maintain the attribution to Cognite Ventures.
Finally, I have also updated the list itself, including details on the classification of each startup by the 16 categories and the 2 broader classes, and identifying the primary country of operations. For US companies I’ve also included the primary state.
This is the third installment of my thoughts on starting an embedded vision company. In part 1, I focused on the opportunity, especially how the explosion in the number of image sensors was overwhelming human capacity to directly view all the potential image streams, and creating a pressing need for orders-of-magnitude increase in the volume and intelligence of local vision processing. In part 2, I shifted to a discussion of some core startup principles and models for teams in embedded vision or beyond. In this final section, I focus on how the combination of the big vision opportunity, plus the inherent agility (and weakness!) of startups can guide teams to some winning technologies and business approaches.
Consider areas of high leverage on embedded vision problems:
The Full Flow: Every step of the pixel flow, from the sensor interface, through the ISP, to the mix of video analytics (classical and neural-network-based), has an impact on vision system performance. Together with choices in training data, user interface, application targeting and embedded-vs-cloud application partitioning, these steps give an enormous range of options on vision quality, latency, cost, power, and functionality. That diversity of choices creates many potential niches where a startup can take root and grow, without initially having to attack huge obvious markets using the most mainstream techniques.
Deep Neural Networks: At this point it is pretty obvious that neural network methods are transforming computer vision. However, applying neural networks in vision is much more than just doing ImageNet classification. It pays to invest in thoroughly understanding the variety of both discrimination methods (object identification, localization, tracking) and generation methods. Neural-network-based image synthesis may start to play a significant role in augmenting or even displacing 3D graphics rendering in some scenarios. Moreover, Generative Adversarial Network methods allow a partially trained discrimination network and a generation network to iterate through refinements that improve both networks automatically.
Data Sets: To find, create and repurpose data for better training is half the battle in deep learning. Having access to unique data sets can be the key differentiator for a startup, and brainstorming new problems that can be solved with available large data sets is a useful discipline in startup strategy development. Ways to maximize data leverage may include the following:
Create an engagement model with customers, so that their data can contribute to the training data set for future use. Continuous data bootstrapping, perhaps spurred by free access to cloud service, may allow creation of large, unique training data collections.
Build photo-realistic simulations of the usage scenes and sequences in your target world. The extracted image sequences are inherently labeled by the underlying scene structures and can generate large training sets to augment real-world captured training data. Moreover, simulation can systematically cover rare but important combinations of object motion, lighting, and camera impairments for added system robustness. For example, the automotive technology startup AIMotive builds both sophisticated fused cognition systems from image, LiDAR and radar streams, and sophisticated driving simulators with accurate 3D worlds to train and test neural-network-based systems.
Some embedded vision systems can be designed as subsets of bigger, more expensive server-based vision systems, especially when neural networks of heroic scale are developed by cloud-based researchers. If the reference network is sufficiently better than the goals for the embedded system, the behavior of that big model can be used as “ground truth” for the embedded system. This makes generation of large training sets for the embedded version much easier.
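A toy sketch of that teacher-as-ground-truth idea: a large reference model scores unlabeled frames, and its outputs become training targets for the small embedded network. All numbers and the class count here are made up for illustration.

```python
import numpy as np

def softmax(logits, temperature=1.0):
    """Convert raw model scores to probabilities; higher temperature
    softens the distribution, which is common in teacher/student setups."""
    z = np.asarray(logits, dtype=float) / temperature
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

# Hypothetical teacher-network logits for three unlabeled frames, 4 classes.
teacher_logits = np.array([[9.0, 1.0, 0.5, 0.2],
                           [0.3, 8.0, 1.0, 0.1],
                           [0.2, 0.4, 0.1, 7.0]])

# Softened teacher outputs serve as "ground truth" for the embedded student;
# no human labeling is required for these frames.
soft_targets = softmax(teacher_logits, temperature=2.0)
hard_targets = teacher_logits.argmax(axis=1)  # or use plain pseudo-labels
```

The student network would then be trained against `soft_targets` (or `hard_targets`) on far more frames than humans could ever label by hand.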
Data augmentation is a powerful method. If you have only a moderate amount of training data, you may be able to apply a series of transformations to the data while allowing the prior labeling to be maintained. (We know a dog is still a dog, no matter how we scale it, rotate it or flip its image.) Be careful though – neural networks can be so discriminating that a network trained on artificial or augmented data may respond only to such examples, however similar those examples may appear, in human perception, to real-world data.
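As a minimal sketch of label-preserving augmentation (NumPy used for illustration; the helper function is my own invention), a handful of flips and rotations turns one example into several:

```python
import numpy as np

def augment(image, label):
    """Generate label-preserving variants of one training example.

    Flips and 90-degree rotations change the pixels but not the meaning:
    a dog is still a dog after mirroring. Returns a list of (image, label)
    pairs including the original.
    """
    variants = [image, np.fliplr(image), np.flipud(image)]
    variants += [np.rot90(image, k) for k in (1, 2, 3)]
    return [(v, label) for v in variants]

# A toy 4x4 "image": one labeled example becomes six training examples.
img = np.arange(16).reshape(4, 4)
augmented = augment(img, "dog")
```

Real pipelines add scaling, cropping, color jitter and noise as well, but the principle is the same: the label rides along unchanged.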
New Device Types: The low cost and high intelligence of vision subsystems are allowing imaging-based systems in lots of new form-factors. These new device types may create substantially new vision problems. Plausible new devices include augmented reality headsets and glasses, ultra-small always-alert “visual dust” motes, new kinds of vehicles from semi trucks to internal body “drones”, and cameras embedded in clothing, toys, disposable medical supplies, packaging materials, and other unconventional settings. These new devices may not need either to deliver fine images or to achieve substantial autonomy. Instead, the imagers may just be the easiest way to get a little more information from the environment or insight about the user.
New Silicon Platforms: Progress in the new hardware platforms for vision processing, especially for deep neural network methods, is nothing less than breathtaking. We’re seeing improvements in efficiency of at least 3x per year, which translates into both huge gains in absolute performance at the high end, and percolation of significant neural network capacity into low cost and low power consumer-class systems. Of course, 200% per year efficiency growth cannot continue for very long, but it does let design teams think big about what’s possible in a given form-factor and budget. This rapid advance in computing capacity appears to be happening in many different product categories – in server-grade GPUs, embedded GPUs, mobile phone apps processors, and deeply embedded platforms for IoT. As just one typical example, the widely used Tensilica Vision DSP IP cores have seen the multiply rate – a reasonable proxy for neural network compute throughput – increase by 16x (64 → 1024 8x8b multiplies per cycle per core) in just over 18 months. Almost every established chip company doing system-on-chip platforms is rolling out significant enhancements or new architectures to support deep learning. In addition, almost 20 new chip startups are taking the plunge with new platforms, typically aiming at huge throughput to rival high-end GPUs or ultra high efficiency to fit into IoT roles. This wealth of new platforms will make choosing a target platform more complex, but will also dramatically increase the potential speed and capability of new embedded vision platforms.
More Than Just Vision: When planning an embedded vision product, it’s important to remember that embedded vision is a technology, not an end application. Some applications will be completely dominated by their vision component, but for many others the vision channel will be combined with many other information channels: other sensors, especially audio and motion sensors, user controls, or background data, especially cloud data. In addition, each vision node may be just one piece of a distributed application, so node-to-node and node-to-cloud-to-node application coordination may be critical, especially in developing a wide assessment of a key issue or territory. Once all the channels of data are aggregated and analyzed, for example through convolutional neural networks, what then? Much of the value of vision is in taking action, whether the action is real-time navigation, event alerts, emergency braking, visual or audio response to users, or updating of central event databases. In thinking about the product, map out the whole flow to capture a more complete view of user needs, dependencies on other services, computation and communication latencies and throughput bottlenecks, and competitive differentiators for the total experience.
Point Solution to Platform: In the spirit of “crossing the chasm” it is often necessary to define the early product as a solution for a narrow constituency’s particular needs. Tight targeting of a point solution may let you stand out in a noisy market of much bigger players, and reduce the integration risks faced by your potential early adopters. However, that also limits the scope of the system to just what you directly engineer. Opening up the interfaces and the business model to let both customers and third parties add functionality has two big benefits. First, it means that the applicability of your core technology can expand to markets and customers that you couldn’t necessarily serve with your finite resources to adapt and extend the product. Second, the more a customer invests their own engineering resources into writing code or developing peripheral hardware around your core product, the more stake they have in your success. Both practical and psychological factors make your product sticky. It turns a point product into a platform. Sometimes, that opening of the technology can leverage an open-source model, so long as some non-open, revenue-generating dimension remains. Proliferation is good, but is not the same as bookings. Some startups start with a platform approach, but that has challenges. It may be difficult to get customers to invest to build your interfaces into their system if you’re too small and unproven, and it may be difficult to differentiate against big players able to simply declare a “de facto industry standard”.
Any startup walks a fine line between replicating what others have done before, and attempting something so novel that no one can appreciate the benefit. One useful way to look for practical novelty is to look at possible innovation around the image stream itself. Here are four ways you might think about new business around image streams:
Take an existing image stream, and apply improved algorithms. For example, build technology that operates on users’ videos and does improved captioning, tagging and indexing.
Take an existing image stream and extract new kinds of data beyond the original intent. For example, use outdoor surveillance video streams to do high-resolution weather reporting, or to assess traffic congestion.
Take an existing image stream and provide services on it under new business models. For example, build software for user video search that charges not by copy or by subscription, but by success in finding specific events.
Build new image streams by putting cameras in new places. For example, chemical refiners are installing IR cameras that can identify leaks of otherwise invisible gases. An agricultural automation startup, Blue River, is putting sophisticated imaging on herbicide sprayers, so that herbicides can be applied just to recognized weeds, not to crop plants or bare soil, increasing yields and reducing chemical use.
Thinking beyond just the image stream can be important too. Consider ways that cameras, microphones and natural language processing methods can be combined to get richer insights into the environment and users’ intent:
Can the reflected sound of an aerial drone’s blades give additional information for obstacle avoidance?
Can the sound of tires on the road surface give clues about driving conditions for autonomous cars?
Can the pitch and content of voices give indications of stress levels in drivers, or crowds in public places?
The figure below explores a range of applications and function types using multiple modes of sensing and analysis:

Autonomous Vehicles and Robotics
· Multi-sensor: image, depth, speed
· Environmental assessment
· Localization and odometry
· Full surround views
· Obstacle avoidance
· Attention monitoring
· Command interface
· Multi-mode automatic speech recognition
· Ultrasonic sensing

Monitoring, Inspection and Surveillance
· Acoustic surveillance
· Health and performance monitoring
· Audio geolocation
· Access control
· Sentiment analysis
· Command interface

Personal Device Enhancement
· Social photography
· Augmented Reality
· Localization and odometry
· Mood analysis
· Command interface
· ASR in social media context
· Hands-free UI
· Sentiment analysis
· Real-time translation
· Local service bots
· Enhanced search
The variety of vision opportunities is truly unbounded. The combination of inexpensive image sensors, huge cognitive computing capacity, rich training data and ubiquitous communications makes this time absolutely unique. Doing a vision startup is hard, just as any startup venture is hard. Finding the right team, the right market, the right product and the right money is never easy, but the rewards – especially the emotional, technical and professional rewards – can be enormous.
In my previous blog post, I outlined how the explosion of high-resolution, low-cost image sensors was transforming the nature of vision, as we rapidly evolve to a world where most pixels are never seen by humans, but captured, analyzed and used by embedded computing systems. This discontinuity is creating ample opportunities for new technologies, new business models and new companies. In this second part, we look at the basic ingredients of a startup, and two rival models of how to approach building a successful enterprise.
Let’s look at the basic ingredients of starting a technology business – not just a vision venture. We might call this “Startup 101”. The first ingredient is the team.
You will need depth of skills. It is impossible to be outstanding in absolutely everything, but success does depend on having world-class capability in one or two disciplines, usually including at least one technology dimension. Without leadership somewhere, it’s hard to differentiate from others, and to see farther down the road on emerging applications.
You don’t need to be world-class in everything, but having a breadth of skills across the basics – hardware, software, marketing, sales, fund-raising, strategy, infrastructure – will help enormously in moving forward as a business. The hardware/software need is obvious and usually first priority. You have to be able to develop and deliver something useful, unique and functional. But sooner or later you’ll also need to figure out how to describe it to customers, make the product and company visible, and complete business transactions. You’ll also need to put together some infrastructure, so that you can hire and pay people, get reasonably secure network access and supply toilet paper in the bathrooms.
Some level of experience on the team is important. You don’t need to be graybeards with rich startup and big-company track records, but some level of real-world history is enormously valuable. You need enough to avoid the rookie mistakes and to recognize the difference between a normal pothole and an existential crevasse. Potholes you do your best to avoid, but if you have to survive a bump, you can. A bit of experience can alert you when you’re approaching the abyss, so you can do everything possible to get around it. Is there a simple formula for recognizing those crevasses? Unfortunately, no (but they often involve core team conflict, legal issues, or cash flow). Startups throw a lot of issues, big and small, at the leadership team, so there will be plenty of opportunity to gain experience along the way.
The last key dimension of team is perhaps the most important, but also the most nebulous – character. Starting a company is hard work, with plenty of stress and emotion, because of the stakes. A team capable of, and committed to, openness, patience and honesty will perform better, last longer, and have more fun than other teams. It does NOT mean that the team should agree all the time – part of the point of constructing a team with diverse skills is to get good “parallax perspective” on the thorniest problems. It DOES mean trusting one another to do their jobs, being willing to ask tough questions about assumptions and methods, and working hard for the common effort. More than anything, it means putting ego and individual glory aside.
The second key ingredient for a startup is the product. Every startup’s product is different (or it had better be!), but here are four criteria to apply to the product concept:
The product should be unique in at least one major dimension.
The uniqueness could be functionality – the product does something that wasn’t possible before, or it does a set of functions together that weren’t possible before.
The uniqueness could be performance – it does a known job faster, at lower power, cheaper or in a smaller form-factor than anyone else.
The uniqueness could be the business or usage model – it allows a task to be done by a new – usually less sophisticated – customer, or lets the customer pay for it in a different way.
Building the product must be feasible. It isn’t enough just to have a MATLAB model of a great new vision algorithm – you need to make it work at the necessary speed, and fit in the memory of the target embedded platform, with all the interfaces to cameras, networks, storage and other software layers.
The product should be defensible. Once others learn about the product, can they easily copy it? When you work with customers about real needs, will you be able to incorporate improvements more rapidly and more completely than others? Can you gather training data and interesting usage cases more quickly? Can you protect your code, your algorithms, and your hardware design from overt cloners?
You should be able to explain the product relative to the competition. In some ideal world, customers would be able to appreciate the magnificence of your invention without any point of comparison – they would instantly understand how to improve their lives by buying your product. In that ideal world you would have no competition. In the long run, you ideally want to so dominate your segment that no one else comes close. However, if you have no initial reference point – no competition – you may struggle to discover and explain the product’s virtues. Having some competition is not a bad thing – it gives a preexisting reference point by which the performance, functionality and usage-model breakthrough can be made vivid to potential customers. In fact, if you think you have no competition, you should probably go find some, at least for the purpose of showing the superiority of your solution.
The third key ingredient for a startup is the target market: the group of users plus the context for use. Think “teenage girls” + “shopping for clothes” or “PCB layout designers” + “high frequency multi-layer board timing closure”.
Finding the right target market for a new technology product faces an inherent dilemma. In the early going, it is not hard to find a group of technology enthusiasts who will adopt the product because it is new, cool and powerful. They have an easy time picturing how it may serve their uses and are comfortable with the hard work of adapting the inherent power of your technology to their needs. Company progress often stalls, however, once this small population of early adopters has embraced the product. The great majority of users are not looking for an application or integration challenge – they just want to get some job done. They may tolerate use of new technology from an untried startup, but only if it clearly addresses their use case. This transition to mainstream users has been characterized by author Geoffrey Moore as “crossing the chasm”. The recognized strategy for getting into wider use is to narrow the focus to more completely solve the problems of a smaller group of mainstream customers, often by solving the problem more completely for one vertical application or for one specific usage scenario. So “going vertical” puts the fear (and hypothetical potential) of the technology into the background and emphasizes the practical and accessible benefits of the superior solution.
This narrowing of focus, however, can sometimes create a dilemma in working with potential investors. Investors, especially big VCs, want to hit home runs by winning huge markets. They don’t want narrow niche plays. The highly successful investor, Peter Thiel, dramatizes this point of view by saying “competition is for losers”, meaning that growing and dominating a new market can be much more profitable than participating in an existing commodity market.
The answer is to think about, and where appropriate, talk about the market strategy on two levels. First identify huge markets that are still under-served or latent. Then focus on an early niche within that emerging market that can be dominated by a concentrated effort, where the insights and technologies needed to master the niche are excellent preparation for larger and larger surrounding use-cases within the likely huge market. Talking about both the laser focus on the “beachhead” initial market AND the setup for leadership in the eventual market can often resolve the apparent paradox.
The accumulated wisdom of startup methods is evolving continuously, both as teams refine what works, and as the technology and applications create new business models [think advertising], new platforms [think applications as a cloud service], new investment methods [think crowd funding] and new team structures [think gig economy]. The software startup world, in particular, has been dramatically influenced by the “Lean Startup” principle. This idea has evolved over the past fifteen years, spawned by the writing of Steve Blank more than any other individual. It contrasts in key ways with the long-standing model, which we can call “Old School”.
The two models compare roughly as follows:

Funding: Old School – Seed Round based on team and idea, A Round to develop product, B Round after revenue. Lean Startup – develop prototype to get Seed Round, A Round after revenue, B Round, if any, for global expansion.

Product: Old School – hardware/software systems and silicon. Lean Startup – easiest with software.

Customer acquisition: Old School – develop a sales and marketing organization, to sell direct or build a channel. Lean Startup – CEO and CTO are chief salespeople until product and revenue potential are proven in the market.

Business model: Old School – mostly B2B with large transactions. Lean Startup – web-centric B2B and B2C with subscriptions and micro-transactions.
In vastly simplified form, the Lean Startup model is built on five elements of guidance:
Rapidly develop a Minimum Viable Product (MVP) – the simplest product-like deliverable that customers will actually use in some form. Getting engaged with customers as early as possible gives you the kind of feedback on real problems that you cannot get from theoretical discussions. It gives you a chance to concentrate on the most customer-relevant features and skip the effort on features that customers are less likely to care about.
Test prototypes on target users early and often – Once you have an MVP, you have a great platform to evolve incrementally. If you can figure out how to float new features into the product without jeopardizing basic functionality, then you can do rapid experimentation on customer usage. This allows the product to evolve more quickly.
Measure market and technical progress dispassionately and scientifically – New markets and technologies often don’t follow old rules-of-thumb, so you may need to develop new more appropriate metrics of progress for both. Methods like A-B testing of alternative mechanisms can give quick feedback on real customer usage, and enhances a sense of honesty and realism in the effort.
Don’t take too much money too soon – Taking money from investors is an implied commitment to deliver predictable returns in a predictable fashion. If you try to make that promise too early, people won’t believe you, so you won’t get the funding. Even if you can convince investors to fund you, taking money too early may make you commit to a product before you really know what works best. In some areas, like cloud software, powerful ideas can sometimes be developed and launched by small teams, so that little funding is absolutely necessary in the early days. Startup and funding culture have evolved together so that teams often don’t expect to get outside funding until they have their MVP. Some teams expect to be eating ramen noodles for many months.
Leverage open source and crowd source thinking – It is hard to overstate the impact of open source on many areas of software. The combination of compelling economics, rapid evolution and vetting by a wide base of users makes open source important in two ways – as a building block within your own technical development strategy, and as part of a proliferation strategy that creates a wider market for your product. Crowd sourcing represents an analogous method to harness wide enthusiasm for your mission or product to gather and refine content, generate data and get startup funds.
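The A-B testing mentioned above can be made concrete with a standard two-proportion z-test; the conversion counts below are invented purely for illustration.

```python
from math import sqrt, erf

def ab_test(conv_a, n_a, conv_b, n_b):
    """Two-proportion z-test: is variant B's conversion rate
    significantly different from A's? Returns (z, two-sided p-value)."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p = (conv_a + conv_b) / (n_a + n_b)            # pooled conversion rate
    se = sqrt(p * (1 - p) * (1 / n_a + 1 / n_b))   # standard error under H0
    z = (p_b - p_a) / se
    p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))
    return z, p_value

# Hypothetical experiment: 5.0% vs 6.5% conversion on 4,000 users each.
z, p = ab_test(200, 4000, 260, 4000)
```

With these made-up numbers the difference is significant at the 1% level, which is the kind of dispassionate, quantitative readout the Lean Startup approach calls for.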
As these methods have grown up in the cloud software world, they do not all automatically apply to embedded vision startups. Some technologies, like new silicon platforms, require such high upfront investments and expensive iterations that deferring funding or iterating customer prototypes may not be viable. In addition, powerful ideas like live (and unannounced) A-B testing on customers will not be acceptable for some embedded products, especially in safety-critical applications. The lean methods here work most obviously in on-line environments, with large numbers of customers and small transactions. A design win for an embedded system may have much greater transaction value than any single order in on-line software, so the sales process may be quite different, with a significant role for well-orchestrated sales and marketing efforts with key customers. Nevertheless, we can compare typical “Old School” and “Lean Startup” methods across key areas like funding, product types, methods for getting customers and core business models.
Since I started Cognite Ventures eight months ago, my activity with startup teams has ramped up dramatically. Many of these groups are targeting some kind of embedded vision application, and many want advice on how to succeed. This conversation developed into an idiosyncratic set of thoughts on vision startup guidance, which in turn spawned a talk at the Embedded Vision Summit which I’m now expanding as a blog. You can find the slides here, but I will also break this conversation down into a three-part article.
Please allow me to start with some caveats! Every startup is different, every team is different, and the market is constantly evolving – so there is no right answer. Moreover, I have had success in my startups, especially Tensilica, but I can hardly claim that I have succeeded just because of following these principles. I have been blessed with an opportunity to work with remarkable teams, whose own talent, energy and principles have been enormously influential on the outcome. To the extent that I have directly contributed to startup success, is it because of applying these ideas? Or in spite of these ideas? Or just dumb luck?
I believe the current energy around new ventures in vision comes from two fundamental technical and market trends. First, the cost of capturing image streams has fallen dramatically. I can buy an HD-resolution security camera with IR illumination and an aluminum housing for $13.91 on Amazon. This implies that the core electronics – CMOS sensor, basic image signal processing and video output – probably cost about two dollars at the component level. This reflects the manufacturing learning curve from the exploding volume of cameras. It’s useful to compare the trend in the population of people with the population of cameras on the planet, based on SemiCo data on image sensors from 2014 and assuming each sensor has a useful life in the wild of three years.
What does it say? First, it appears that the number of cameras crossed over the number of people sometime in the last year. This means that even if every human spent every moment of every day and night watching video, a significant fraction of the output of these cameras would go unwatched. Of course, many of these cameras are shut off, or sitting in someone’s pocket, or watching complete darkness at any given time. Nevertheless, it is certain that humans will very rarely see the captured images. If installing or carrying those cameras around is going to make any sense, it will be because we use vision analysis to filter, select or act on the streams without human involvement in every frame.
But the list of implications goes on!
We now have more than 10B image sensors installed. If each can produce an HD video stream of 1080p60, we have potential raw content of roughly 100M pixels per second per camera, or 10^18 new pixels per second, or something more than 10^25 bytes per year of raw pixel data. If, foolishly, we tried to keep all the raw pixels, the storage requirement would exceed the annual production of hard disk plus NAND flash by a factor of roughly 10,000. Even if we compressed the video down to 5Mbps, we would fill up a year’s supply of storage by sometime on January 4 of the next year. Clearly we’re not going to store all that potential content. (Utilization and tolerable compression rates will vary widely by type of camera – the camera on my phone is likely to be less active than a security camera, and some security cameras may get by on less than 5Mbps, but the essential problem remains.)
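The arithmetic behind these estimates is easy to reproduce; the figures below are the same rough assumptions used in the text (10B sensors, raw 1080p60, one byte per pixel):

```python
# Back-of-envelope check of the raw-pixel storage claims.
cameras = 10e9                           # ~10 billion installed image sensors
pixels_per_second = 1920 * 1080 * 60     # one raw 1080p60 stream, ~1.2e8 px/s
seconds_per_year = 3600 * 24 * 365

total_pixels_per_second = cameras * pixels_per_second
# At one byte per pixel, yearly raw pixel data:
raw_bytes_per_year = total_pixels_per_second * seconds_per_year
```

This lands at roughly 1.2e18 pixels per second and a few times 10^25 bytes per year, consistent with the orders of magnitude quoted above.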
Where do new bits come from? New bits are captured from the real world, or “synthesized” from other data. Synthesized data includes credit card transactions, packet headers, stock trades, emails, and other data created within electronic systems as a byproduct of applications. Real-world data can be pixels from cameras, or audio samples from microphones, or accelerometer data from MEMS sensors. Synthetic data is ultimately derived from real-world data, through the transformations of human interaction, economic transactions and sharing. Audio and motion sensors are rich sources of data, but their data rates are dramatically less – 3 to 5 orders of magnitude less – than that of even cheap image sensors. So virtually all of the real data of the world – and an interesting fraction of all electronic data – is pixels.
The overwhelming volume of pixels has deep implications for computing and communications. Consider that $13.91 video camera. Even if we found a way to ship that continuous video stream up to the cloud, we couldn’t afford to use some x86 or GPU-enabled server to process all that content – over the life of that camera, we could easily spend thousands of dollars on that hardware (and power) dedicated to that video channel. Similarly, 5Mbps of compressed video * 60 seconds * 60 minutes * 24 hours * 30 days is 12,960 Gbits per month. I don’t know about your wireless plan, but that’s more than my cellular wireless plan absorbs easily. So it is pretty clear that we’re not going to be able either to do the bulk of the video analysis on cloud servers, or to communicate it via cellular. Wi-Fi networks may have no per-bit charges, and greater overall capacity, but wireless infrastructure will have trouble scaling to handle tens of billions of streams. We must find ways to do most of the computing on embedded systems, so that no video, or only the most salient video, is sent to the cloud for storage, further processing or human review and action.
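The per-camera bandwidth figure above works out as follows, assuming a 30-day month:

```python
# One continuously streaming camera at a 5 Mbps compressed rate.
MBPS = 5
seconds_per_month = 60 * 60 * 24 * 30       # 30-day month
gbits_per_month = MBPS * seconds_per_month / 1000
print(gbits_per_month)  # 12960.0 Gbits per camera per month
```

That is roughly three orders of magnitude beyond a typical consumer cellular data allowance, which is the point of the paragraph above.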
The second reason for the enthusiasm for vision is the revolution in computation methods for extracting insights from image streams. In particular, the emergence of convolutional neural networks as a key analytical building block has dramatically improved the potential for vision systems to extract subtle insightful results from complex, noisy image streams. While no product is just a neural network, the increasingly well-understood vocabulary of gathering and labeling large data sets, constructing and training neural networks, and deploying those computational networks onto efficient embedded hardware, has become part of the basic language of vision startups.
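For readers unfamiliar with the building block itself, the core operation of a convolutional layer can be sketched in a few lines of plain Python. This is a deliberately naive, illustrative version – real networks apply many such kernels, learned from data, on highly optimized hardware:

```python
def conv2d(image, kernel):
    """Naive 'valid' 2D cross-correlation - the core operation of a CNN layer."""
    kh, kw = len(kernel), len(kernel[0])
    h, w = len(image), len(image[0])
    out = []
    for i in range(h - kh + 1):
        row = []
        for j in range(w - kw + 1):
            acc = 0.0
            for di in range(kh):
                for dj in range(kw):
                    acc += image[i + di][j + dj] * kernel[di][dj]
            row.append(acc)
        out.append(row)
    return out

# A 1x2 horizontal-gradient kernel responds to the vertical edge in
# this tiny synthetic "image" (0s on the left, 1s on the right).
img = [[0, 0, 1, 1],
       [0, 0, 1, 1],
       [0, 0, 1, 1]]
edge = [[-1, 1]]
print(conv2d(img, edge))  # each row: [0.0, 1.0, 0.0]
```

The output is strong exactly where the pixel values change – a trivial example of the feature detection that trained convolutional networks perform at scale.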
When we reflect these observations back onto the vision market, we can discern three big useful categories of applications:
Capture of images and video for human consumption. This includes everything from fashion photography and snapshots posted on Facebook to Hollywood films and document scanning. This is the traditional realm of imaging, and much of the current technology base – image signal processing pipelines, video compression methods and video displays – is built around particular characteristics of the human visual system. This area has been the mainstay of digital imaging and video related products for the past two decades. Innovation in new higher resolution formats, new cameras and new image enhancement remains a plausible area for startup activity even today, but it is not as hot as it has been. While this area has been the home of classical image enhancement methods, there is ample technical innovation in this category, for example, in new generative neural network models that can synthesize photo-realistic images.
Capture of images and video, then filtering, reducing and organizing into a concise form for human decision-making. This category includes a wide range of vision processing and analytics technologies, including most activity in video monitoring and surveillance. The key here is often to make huge bodies of video content tagged, indexed and searchable, and to filter out irrelevant content so only a tiny fraction needs to be uploaded, stored, reviewed or more exhaustively analyzed. This area is already active, but we would expect even more activity, especially as teams work to exploit the potential for joint analytics spanning many cameras simultaneously. Cloud applications are particularly important in this area, because of its storage, computing and collaboration flexibility.
Capture of images and video, analyzing and then using insights to take autonomous action. This domain has captured the world’s imagination in recent years, especially with the success of autonomous vehicle prototypes and smart aerial drones. The rapid advances in convolutional neural networks are particularly vivid and important in this area, as vision processing becomes accurate and robust enough to trust with decision making in safety-critical systems. One of the key characteristics of these systems is the need for short latency, robustness and hard real-time performance. System architects will rely on autonomous vision systems to the extent that the systems can guarantee short decision latency and ~100% availability.
Needless to say, some good ideas may be hybrids of these three, especially in systems that use vision for some simple autonomous decision-making, but rely on humans for backup, or for more strategic decisions, based on the consolidated data.
In the next part of the article, we’ll take a look at the ingredients of a startup – team, product and target market – and look at some cogent lessons from the “lean startup” model that rules software entrepreneurship today.
I’ve spent the last few months digging into the intersection between the on-going deep learning revolution and the world-wide opportunity for startups. This little exercise has highlighted both how the startup funding world is evolving, and some of the unique issues and opportunities for deep learning-based startups.
These show the growth in seed funding levels and valuations, the stretching out of the pre-seed stage for companies, and a reduction in overall funding activity from the exceedingly frothy levels of 2015.
Let’s look at some key pictures – first seed funding:
That’s generally a pretty comforting trend – seed round funding levels and valuations increasing steadily over time, without direct signs of a funding bubble or “irrational enthusiasm”. This says that strong teams with great ideas and demonstrated progress on their initial product (their Minimum Viable Product or “MVP”) are learning from early trial customers, getting some measurable traction and able to articulate a differentiated story to seed investors.
A second picture on time-to-funding gives a more sobering angle – time to funding:
This picture suggests that the time-line for progressing through the funding stages is stretching out meaningfully. In particular, it says that it is taking longer to get to seed funding – now more than two years. How do startups operate before seed? I think the answer is pre-seed angel funding, “friends-and-family” investment, credit cards and a steady diet of ramen noodles ;-). This means treating the minimally-funded startup not as a transitory moment but as a lifestyle. It takes toughness and faith.
That commitment to toughness has been codified as the concept of the Lean Startup. In the “good old days” a mainstream entrepreneur had an idea, assembled a stellar team, raised money, rented space, bought computers, a phone system, networks and cubicles, built prototypes, hired sales and marketing people and took a product to market. And everyone hoped customers would buy it just as they were supposed to. The Lean Startup model turns that around – an entrepreneur has an idea, gathers two talented technical friends, uses their old laptops and an AWS account, builds prototypes and takes themselves to customers. They iterate on customer-mandated features for a few months and take it to market as a cloud-based service. Then they raise money. More ramen-eating for the founding team, less risk for the investors, and better return on investment overall.
Some kinds of technologies and business models fit the Lean Startup model easily – almost anything delivered as software, especially in the cloud or in non-mission-critical roles. Some models don’t fit so well – it is tough to build new billion-transistor chips on just a ramen noodle budget, and tough to get customers without a working prototype. So the whole distribution of startups has shifted in favor of business models and technologies that look leaner.
If you’re looking for sobering statistics, the US funding picture shows that funding has retreated a bit from the highs of 2015 and early 2016.
Does that mean that funding is drying up? I don’t think so. It just makes things look like late 2013 and early 2014, and certainly higher than 2011 and 2012. In fact, I believe that most quality startups are going to find adequate funding, though innovation, “leanness” and savvy response to emerging technologies all continue to be critically important.
To get a better idea of the funding trend, I dug a bit deeper into one segment – computer vision and imaging – that I feel may be representative of a broad class of emerging technology-driven applications, especially as investment shifts towards artificial intelligence in all its forms.
For this, I mined Crunchbase, the popular startup funding event database and service, to get a rough picture of what has happened in funding over the past five years. It’s quite hard to get unambiguous statistics from a database like this when your target technology or market criteria don’t neatly fit the predefined categories. You’re forced to resort to description-text keyword filtering, which is slow and imperfect. Nevertheless, a systematic set of keyword filters can give good relative measures over time, even if they can’t give very good absolute numbers. Specifically, I looked at the number of funding deals, and the number of reported dollars for fundings in embedded vision (EV) companies in each quarter over the past five years, as reported in Crunchbase and as filtered down to represent the company’s apparent focus. (It’s not trivial. Lots of startups’ descriptions talk, for example, about their “company vision”, but that doesn’t mean they’re in the vision market ;-). The quarter-by-quarter numbers jump around a lot, of course, but the linear trend is pretty clearly up and to the right. This data seems to indicate a healthy level of activity and funding climate for embedded vision.
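A sketch of the kind of keyword filtering described here. The record format and keyword list are illustrative inventions for this example, not actual Crunchbase fields or the actual filter used:

```python
# Multi-word phrases help avoid false positives like "our company vision".
EV_KEYWORDS = {"computer vision", "embedded vision", "image recognition",
               "video analytics", "object detection"}

def looks_like_embedded_vision(description: str) -> bool:
    """Crude description-text filter: match any topical phrase."""
    text = description.lower()
    return any(phrase in text for phrase in EV_KEYWORDS)

# Hypothetical records standing in for database rows:
companies = [
    {"name": "A", "desc": "Embedded vision chips for drones"},
    {"name": "B", "desc": "Our company vision is to disrupt retail"},
]
hits = [c["name"] for c in companies if looks_like_embedded_vision(c["desc"])]
print(hits)  # ['A']
```

A real pass over thousands of descriptions would need iterative tuning of the phrase list, which is exactly why the absolute counts are unreliable even when the trend is meaningful.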
I’d say that the overall climate for technologies related to cognitive computing – AI, machine learning, neural networks, computer vision, speech recognition, natural language processing and their myriad applications – continues to look healthy as a whole as well.
In parallel with this look at funding, I’ve also been grinding away at additions, improvements, corrections and refinements on the Cognitive Computing Startup List. I’ve just made the third release of that list. Take a look!
I published the first version of my cognitive computing startup list about six weeks ago. As I poked around further, and got some great questions from the community, I discovered a range of new resources on deep learning and AI startups, and literally thousands of new candidates. In particular, I started using Crunchbase as a resource to spread my net further for serious cognitive computing companies. If you simply search their database for companies that mention artificial intelligence somewhere in their description, you get about 2200 hits. Even the Crunchbase category of Artificial Intelligence companies has more than 1400 companies currently.
As I described in the first release, the majority of companies in the AI category, while having generally interesting or even compelling propositions, are using true cognitive computing as just a modest element of some broader product value, or may be playing up the AI angle, because it is so sexy right now. Instead, I really tried to identify those companies operating on inherently huge data analytics and generation problems, which have a tight focus on automated machine learning, and whose blogs and job postings suggest depth of expertise and commitment to machine learning and neural network methods.
I also found other good lists of AI-related startups, like MMC Ventures’ “Artificial Intelligence in the UK: Landscape and learnings from 226 startups”:
and the Chinese Geekpark A100 list of worldwide startups:
With all this, I could filter the vast range of startups down to about 275 that seem to represent the most focused, the most active and the most innovative, according to my admittedly idiosyncratic criteria.
The geographical distribution is instructive. Not surprisingly, about half are based in the US, with two-thirds of the US start-ups found in California. More surprising is that the strong second is the UK, with more than 20% of the total, followed by China and Canada. I was somewhat surprised to find China with just 8% of the startups, so I asked a number of colleagues to educate me more on cognitive computing startups in China. This yielded a few more important entrants, but China still lags behind the UK in cognitive computing startups.
I have split the list a number of different ways, identifying those
with a significant focus on embedded systems (not just cloud-based software): 82 companies
working primarily on imaging and vision-based cognitive computing: 125 companies
doing embedded vision: 74 companies
Within embedded vision, you’ll find 10 or more each focused on surveillance, autonomous cars, drones and robotics, human-machine interface, and new silicon platforms for deep learning. It’s a rich mix.
Stay tuned for more on individual companies, startup strategies and trends in the different segments of cognitive computing. And take a look at the list!
What’s happening with start-ups in cognitive computing? It is hard to know where to begin. The combination of the real dramatic progress on the technology, the surge of creativity in conceiving new applications – and big improvements on existing ones – and the tsunami of public hype around AI all combine to inspire a vast array of cognitive computing startups. Over the past three months I have compiled a working list of cognitive computing startups, as a tool to understand the scope and trends of entrepreneurship in the field.
The current list has 185 entities that look like startups – generally, small, privately held organizations, with basic web presence and a stated and serious focus on technology and applications of AI, machine learning and neural-inspired computing. I have tried to omit companies that have been acquired or gone defunct, or are so stealthy that they have no meaningful Internet footprint. There are many more companies using some form of big data analysis than shown here. Given the hype around cognitive computing, it is certainly popular for many companies to include some mention of AI or machine learning, even when it is fairly tangential to a company’s core activities. Making the judgment to include a name on my list was often a close call – there was no bright line. So in rough terms, the criteria might be summarized as follows:
Must be a company or independent organization, not an open-source project.
Must have enough information on the Internet (company description on web site, LinkedIn, angel investing sites, job postings) to get at least a crude understanding of the degree of focus on cognitive computing
Focused on developing or using sophisticated machine learning, especially deep learning methods, not just, for example, doing big data management and analytics as a modest part of a general cloud application environment in business intelligence, marketing, CRM, or ecommerce.
I examined four or five hundred companies as candidates for the list, and whittled it down to about 190 that seemed most interesting, most innovative and most focused on advanced cognitive computing technology and applications. The list of candidates came from lots of sources. I have heard about a wide range of vision-centric cognitive computing companies from working intensively in the computer-vision field over the past five years, as well as most of the companies doing specialized processors and basic neural network technology. I also worked from other teams’ excellent published lists. The most useful of these is “The State of Machine Intelligence, 2016”, a list of almost 300 companies put together by Shivon Zilis and James Cham and published in the Harvard Business Review, November 2, 2016. I also used the Machine Learning Startup list from angel.co as a source of ideas. Finally, I have had scores of conversations with practitioners in the field and read hundreds of articles about startup activity over these three months to put together my list.
Three trends stand out as lessons from this survey exercise, beyond the sheer numbers. First, the group represents tremendous diversity, covering novel ideas from robotics, health care, self-driving cars, enterprise operations, on-line commerce, agriculture and personal productivity. These entrepreneurs all believe they have an opportunity to understand and exploit complex patterns in the masses of available data to yield better insights into how to serve customers and users. The more overwhelming the data, the greater the enthusiasm for deep learning from it. (It remains to be seen, however, which of these teams will actually succeed in systematically uncovering dramatic patterns and in monetizing those insights.)
Second, cloud-based software applications dominate the list. I think this comes both from the relative ease of starting enterprise software companies in the current venture climate and from the remarkable breadth of applicability of the powerful pattern recognition and natural language capabilities of state-of-the-art learning algorithms. So every application niche has an available sub-niche in cognitive compute approaches to that application. On the other hand, hardware startups, especially silicon-intensive startups, are pretty scarce. This reflects the fact that many enterprise-centric uses of cognitive computing are not actually much limited by the cost, power or performance of their cognitive computing algorithms – they are initially more concerned with just getting any consistent insights from their data. There is a healthy number of real-time or embedded applications here, especially in robotics and automotive, but these may be content for a while to build at the systems level leveraging off-the-shelf sensors, memories, and CPU, GPU and FPGA silicon computing platforms.
Third, the list is dynamic. Since I started looking, a handful has been acquired, and many more have been created. Undoubtedly many will fail to meet their lofty objectives and others will shift focus in response to the market’s relentless education on what’s really wanted. I’m convinced that the cognitive computing trend is not close to peaking, so we’ll see many new rounds of startups, both going deeper into the underlying technology, as it evolves, and going wider into new application niches across all kinds of cloud and embedded systems.
In the future, I expect to see a huge variety of every-day devices sprout cameras, microphones and motion sensors, with sophisticated cognitive computing behind them to understand human interactions and their environment with astonishing detail and apparent sophistication. Similarly, it seems quite safe to forecast systematic cloud-based identification of trends in our health, habits, purchases, sentiment, and activities. At a minimum, this will uncover macroscopic trends of specific populations, but will often come down, for better or for worse, to individual tracking, diagnosis and commercial leveraging.
The pace of change in technology, especially in electronic systems, is so rapid and relentless, that we rarely get a chance to pause and look at the big picture. We have experienced such a cascade of smart, mobile, cloud-enabled products in recent years, that the longer-term patterns in design are not always clear. It is worthwhile, however, to look briefly at the longer arc of history in electronic design, from the emergence of radio and telephone technology to today, and to anticipate the spread of machine learning and artificial intelligence into our daily lives.
At the risk of oversimplifying a rich tapestry of invention, productization, economic transformation and dead-end developments, we discern three waves of essential electronic design, and the onset of the fourth, as shown below. Each successive wave does not replace the prior dominant design technology, but builds on top of it.
The first wave is analog circuits, starting with vacuum tube technologies found in early radios, televisions and radar in the 1930s and 40s, but fully leveraging transistors as they came along, first as discrete devices, then in ICs. Today, analog circuits remain crucially important in electronic design, with increasing IP reuse as a basic design method for leveraging analog expertise.
The second wave, naturally, is digital design, fully emerging in the 1960s, with discrete transistors, and then TTL components. In the VLSI era, design transitioned to RTL to gain productivity, verifiability, portability and integratability in system-on-chip. Today, large fractions of the digital content of any design are based on IP reuse, as with analog circuits. The remarkable longevity of Moore’s Law scaling of cost, power and performance has driven digital designs to extraordinary throughput, complexity and penetration in our lives.
The third wave – processor-based design – really started with digital computers but became a widespread force with the proliferation of the microprocessor and microcontroller in the late 1970s and 1980s. The underlying digital technology scaling allowed processors to grow roughly one million-fold in performance, enabling the explosion of software that characterizes the processor-based design wave. Software has moved inexorably from assembly language coding, to the use of high-level languages and optimizing compilers, to rich software reuse in processor-centric ecosystems, especially around specific operating systems, and to the proliferation of open-source software as a major driver for cost-reduction, creativity and standardization in complex software systems.
We are now on the cusp of the fourth wave – cognitive computing. The emergence of large data-sets, new hardware and methods for training of complex neural networks, and the need to extract more insight from ambiguous video and audio, all have helped drive this fourth wave. It will not replace the prior three waves – we will certainly need advanced design capabilities in analog, digital and processors-plus-software, but these will often be the raw building-blocks for constructing cognitive computing systems. And even when deep learning and other cognitive computing methods form the heart of an electronic system, these other types of design will play complementary roles in communication, storage and conventional computing around the cognitive heart. The acknowledgement of the power of cognitive computing is a very recent development – deep neural networks were an obscure curiosity four years ago – but we can anticipate rapid development, and perhaps dramatic change. In fact, it seems likely that many of today’s hot network structures, training methods, data-sets and applications will be obsoleted several times over in the next ten years. Nevertheless, the underlying need for such systems is durable.
Archeologists understand that the proliferation, economics, and even culture of a community is often driven by the characteristic tools of the group. The variety and usefulness of electronic systems is inevitably coupled to the availability of design tools to rapidly and reliably create new systems. In the figure below, we show a few of the key tools that typify design today in the analog, digital and processor-based layers.
The cognitive computing community fully appreciates the need for robust, easy-to-use tool environments, but those emerging tool flows are often still crude, and rarely cover the complete design development cycle from concept and data set selection, to deployment, verification and release. It seems safe to predict that major categories will cover training, network structure optimization, automated data curation, with labeling, synthesis and augmentation, and widespread licensing of common large data-sets. In addition, we might expect to see tools to assist in debug and visualization of networks, environments for debug and regression testing, and new mechanisms to verify the accuracy, robustness, and efficiency of trained networks. Finally, no complete system or application will consist of a single cognitive engine or neural network – real systems will comprise a rich mix of conventionally programmed hardware/software and multiple cognitive elements working together, and often distributed across the physical environment, with some elements close to myriad sensors and others deployed entirely in the cloud. We can easily see the eventual evolution of tools and methods to manage those highly distributed systems, perhaps relying on data-flows from millions of human users or billions of sensors.
So the fourth wave seems to be here now, but we cannot yet hope to see its ultimate impact on the world.
The technology landscape has been utterly transformed in the past decade by one technology above all others – mobile wireless data services. It has enabled the global smart phone revolution, with millions of apps and new business models based on ubiquitous high-bandwidth data access, especially using 3G, 4G and WiFi. It has also transformed computing infrastructure, through the impetus for continuous real-time applications served up from massive aggregated data of social connections, transportation, commerce, and crowd-sourced entertainment.
So what’s next? One direction is obvious and important – yet better wireless data services, especially ubiquitous cellular or 5G wireless. It is clear that the global appetite for data – higher bandwidth, more reliable, lower latency, more universal data – is huge. If the industry can find ways to deliver improved data services at reasonable costs, the demand will likely drive an enormous variety of new applications. And that trend alone is an interesting fundamental technology story.
The next decade will also witness another revolution – widespread development and deployment of cognitive computing. Neural-inspired computing methods already play a key role in advanced driver assistance systems, speech recognition and face recognition, and are likely to sweep through many robotics, social media, finance, commerce and health care applications. Triggered by availability of new data-sets and bigger, better trained neural networks, cognitive computing will likely show rapid improvements in effectiveness in complex recognition and decision-making scenarios.
But how will these two technologies – 5G and cognitive computing – interact? How will this new basic computing model shift the demands on wireless networks? Let’s explore a bit.
Let’s start by looking at some of the proposed attributes of 5G:
Total capacity scaling via small cells
Much larger number of users and much higher bandwidth per user
Reduced latency – down to a few tens of milliseconds
Native machine-to-machine connectivity without the capacity, bandwidth and latency constraints of the basestation as intermediary
More seamless integration across fiber and wireless for better end-to-end throughput
These functions will not be easy to achieve – sustained algorithmic and implementation innovation in massive MIMO, new modulation and frequency/time-division multiplexing, exploitation of millimeter-wave frequency bands, and collaborative multi-cell protocols are likely to be needed. Even changes to the device architectures themselves, to make them better suited for low-latency machine-to-machine connectivity, will be mandatory.
Some of the attributes of 5G are driven by a simple extrapolation from today’s mobile data use models. For example, it is reasonable to expect that media content streaming, especially video streaming, will be a major downlink use-case, driven by more global viewing, and by higher resolution video formats. However, the number of video consumers cannot grow indefinitely – the human population is increasing at only about 1.2% per year. So how far can downlink traffic grow if it is driven primarily by conventional video consumption? Certainly by an order of magnitude, but probably not by two or three orders of magnitude.
On the other hand, we are seeing rapid increase in the number of data sensors, especially cameras, in the world – cameras in mobile phones, security cameras, flying drones and toys, smart home appliances, cars, industrial control nodes and other real-time devices. Image sensors produce so much more data than other common sensor types (e.g. microphones for audio and accelerometers for motion) that virtually 100% of all newly produced raw data is image/video data. Cameras are increasing at a much faster rate than human eyeballs – at a rate of more than 20% per year. In fact, I estimate that the number of cameras in the world exceeded the number of humans starting in 2016. By 2020, we could see more than 20B image sensors active in the world, each theoretically capturing 250MB per second or more of data (1080p60 4:2:2 video capture).
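The 250MB-per-second figure follows directly from the video format, since 4:2:2 chroma subsampling averages two bytes per pixel:

```python
# Uncompressed 1080p60 capture with 4:2:2 chroma subsampling.
bytes_per_frame = 1920 * 1080 * 2      # 4:2:2 averages 2 bytes per pixel
bytes_per_sec = bytes_per_frame * 60   # 60 frames per second
print(bytes_per_sec / 1e6)             # ~248.8 MB/s, i.e. "250MB per second"
```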
This raises two key questions.
First, how do we move that much video content? Even with good (but still lossy) video compression (assuming a 6Mbps encoding rate), that number of cameras running continuously implies more than 10^17 bits per second of required uplink capacity. That translates into a requirement for the equivalent of hundreds of millions of current 4G base stations, just to handle the video uplink. This implies that 5G wireless needs to look at least as hard at uplink capacity as at downlink capacity. Of course, not all those cameras will be working continuously, but it is easily possible to imagine that the combination of self-driving cars, ubiquitous cameras in public areas for safety and more immersive social media could increase the total volume by several orders of magnitude.
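The uplink arithmetic above can be checked directly. The per-base-station uplink figure here is an illustrative assumption for scale, not a 4G specification:

```python
# Aggregate uplink demand: 20B cameras streaming at a 6 Mbps compressed rate.
cameras = 20e9
bps_per_camera = 6e6
total_bps = cameras * bps_per_camera          # 1.2e17 bits/s, i.e. >10^17

# Assume ~1 Gbps of usable uplink per base station (illustrative figure).
station_uplink_bps = 1e9
stations_needed = total_bps / station_uplink_bps
print(f"{total_bps:.1e} b/s => ~{stations_needed:.1e} base stations")
```

That works out to on the order of a hundred million base stations for the video uplink alone, consistent with the rough claim above.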
Second, who is going to look at the output of all those cameras? It can’t be people – there simply aren’t enough eyeballs – so we need some other audience – “virtual eyeballs” – to monitor and extract the images, events or sequences of particular interest, to automatically make decisions using that image or video flow, or to distill it down to just the rare and relevant events for human evaluation.
In many cases, the latency demands on decision making are so intense – in automotive systems for example – that only real-time machine vision will be fast enough to respond. This implies that computer vision will be a key driving force in distributed real-time systems. In some important scenarios, the vision intelligence will be implemented in the infrastructure, either in cloud servers or in a new category of cloud-edge computing nodes. This suggests a heavy emphasis on capacity scaling and latency. In many other scenarios, the interpretation of the video must be local to the device to make it fast and robust enough for mission-critical applications. In these cases, the recognition tasks are local, and only the more concise and abstract stream of recognized objects or events needs to be communicated over the wireless network.
So let’s recap some observations about the implications of the simultaneous evolution of 5G wireless and cognitive computing.
We will see a “tug-of-war” between device-side and cloud-side cognitive computing, based on bandwidth and latency demands, concerns of robustness in the face of network outages, and the risks of exposure of raw data.
Device: low latency, low bandwidth consumption, lowest energy
Cloud: Training, fast model updates, data aggregation, flexibility
Current wireless networks are not fast enough for cloud-based real-time vision, but 5G capacity gains, especially when combined with good cloud-edge computing, may come close enough.
The overwhelming number of image sensors makes computer vision’s “virtual eyeballs” necessary. This, in turn, implies intense demands on both uplink capacity and local intelligence.
Machine-to-machine interactions will not happen on low-level raw sensor data – cars will not exchange pixel-level information to avoid accidents. The machines will need to exchange abstract data in order to provide high robustness and low latency, even with 5G networks.
Wireless network operations will be highly complex, with big opportunities for adaptive behavior to improve service, trim costs and reduce energy consumption. Sophisticated pattern recognition and response using cognitive computing may ultimately play a significant role in real-time network management.
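The machine-to-machine point above can be made concrete with a rough bandwidth comparison between raw video and a distilled object stream. The message format and sizes here are hypothetical, sketched only to show the order of magnitude:

```python
# Compare raw video bandwidth against an abstract per-frame object list,
# the kind of distilled data cars might actually exchange.
import json

RAW_VIDEO_BPS = 6e6   # assumed compressed camera stream (6 Mbps)
FRAME_RATE_HZ = 30

# A hypothetical per-frame message: recognized objects, not pixels.
# Fields (class, position, velocity) are invented for illustration.
frame_msg = {
    "t": 1694000000.033,
    "objects": [
        {"cls": "car",        "x": 12.4, "y": -1.2, "vx": -3.1, "vy": 0.0},
        {"cls": "pedestrian", "x":  4.0, "y":  2.5, "vx":  0.2, "vy": -1.1},
    ],
}

abstract_bps = len(json.dumps(frame_msg)) * 8 * FRAME_RATE_HZ
print(f"Abstract stream: ~{abstract_bps / 1e3:.0f} kbps")
print(f"Reduction vs raw video: ~{RAW_VIDEO_BPS / abstract_bps:.0f}x")
```

A compact binary encoding would shrink the abstract stream further, but even naive JSON is two orders of magnitude below the raw video rate, which is what makes robust, low-latency machine-to-machine exchange plausible.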
So it’s time to buckle in and get ready for an exciting ride – a decade of dramatic innovation in systems, especially systems at the confluence of 5G wireless and deep learning. For more insights, please see my presentation from the IEEE 5G Summit, September 29, 2016: rowen-5g-meets-deep-learning-v2
Neural networks and the broader category of cognitive computing have certainly blossomed in the past couple of years. After more than three decades of academic investment, neural networks are an overnight success. I think three forces have triggered this explosion of new technology (and hype).
First, the Internet has aggregated previously unimaginable reservoirs of raw data, capturing a vivid, comprehensive, but incoherent picture of the real world and human activity. This becomes the foundation from which we can train models of reality, unprejudiced by oversimplified synopses.
Second, progress in computing and storage has made it practicable to implement large-scale model training processes, and to deploy useful inference-based applications using those trained models. Amid hand-wringing over the so-called “death of Moore’s Law,” we actually find that a combination of increasingly efficient engines and massively parallel training and inference installations is giving us sustained scaling of compute capability for neural networks. Today, GPUs and FPGAs are the leading hardware platforms for training and deployment, but we can safely bet that new platform architectures, built from direct experience with neural network algorithms, are just around the corner.
Third, we have seen a rapid expansion in understanding of the essential mechanisms and applications of neural networks for cognition. Universities, technology companies and end-users have quickly developed enthusiasm for the promised benefits, even where the depth of knowledge is weak. This excitement translates into funding, exploratory projects and pioneering product development.
These three triggers – massive data availability, seriously parallel computing hardware, and wide enthusiasm – set the scene for the real work of bringing neural networks into the mainstream. Already we see a range of practical deployments, in voice processing, automated translation, facial recognition and automated driving, but the real acceleration is still ahead of us. We are likely to see truly smart deployments in finance, energy, retail, health care, transportation, public safety and agriculture in the next five years.
The rise of cognitive computing will not be smooth. It is perfectly safe to predict two types of hurdles. On one hand, the technology will sometimes fail to deliver on promises, and some once-standard techniques will be discredited and abandoned in favor of new network structures, training methods, deployment platforms and application frameworks. We may even think sometimes that the cognitive computing revolution has failed. On the other hand, there will be days when the technology appears so powerful as to be a threat to our established patterns of work and life. It will sometimes appear to achieve a level of intelligence, independence and mastery that frightens people. We will ask, sometimes justifiably, if we want to put decision making on key issues of morality, liberty, privacy and empathy into the hands of artificial intelligences.
Nevertheless, I remain an optimist, on the speed of progress and depth of impact, as well as on our ability and willingness to shape this technology to fully serve human ends.