All posts by rowen

How to Start an Embedded Vision Company — Part 1

Part 1: Why Vision

 

Since I started Cognite Ventures eight months ago, my activity with startup teams has ramped up dramatically. Many of these groups are targeting some kind of embedded vision application, and many want advice on how to succeed. Those conversations developed into an idiosyncratic set of thoughts on vision startup guidance, which in turn spawned a talk at the Embedded Vision Summit that I'm now expanding into a blog series. You can find the slides here, but I will also break the discussion down into a three-part article.

Please allow me to start with some caveats! Every startup is different, every team is different, and the market is constantly evolving – so there is no right answer. Moreover, I have had success in my startups, especially Tensilica, but I can hardly claim that I have succeeded just because of following these principles. I have been blessed with an opportunity to work with remarkable teams, whose own talent, energy and principles have been enormously influential on the outcome. To the extent that I have directly contributed to startup success, is it because of applying these ideas? Or in spite of these ideas? Or just dumb luck?

I believe the current energy around new ventures in vision comes from two fundamental technical and market trends. First, the cost of capturing image streams has fallen dramatically. I can buy an HD-resolution security camera with IR illumination and an aluminum housing for $13.91 on Amazon. This implies that the core electronics – CMOS sensor, basic image signal processing and video output – probably cost about two dollars at the component level. This reflects the manufacturing learning curve from the exploding volume of cameras. It's useful to compare the trend for the population of people with the population of cameras on the planet, based on SemiCo data on image sensors from 2014 and assuming each sensor has a useful life in the wild of three years.

What does it say? First, it appears that the number of cameras crossed over the number of people sometime in the last year. This means that even if every human spent every moment of every day and night watching video, a significant fraction of the output of these cameras would go unwatched. Of course, many of these cameras are shut off, or sitting in someone's pocket, or watching complete darkness at any given time. Nevertheless, it is certain that humans will very rarely see the captured images. If installing or carrying those cameras around is going to make any sense, it will be because we use vision analysis to filter, select or act on the streams without human involvement in every frame.

But the list of implications goes on!

  • We now have more than 10B image sensors installed. If each can produce an HD video stream of 1080p60, we have potential raw content of roughly 100M pixels per second per camera, or about 10^18 new pixels per second worldwide, or more than 10^25 bytes per year of raw pixel data. If, foolishly, we tried to keep all the raw pixels, the storage requirement would exceed the annual production of hard disk plus NAND flash by a factor of roughly 10,000. Even if we compressed the video down to 5 Mbps, we would fill up a year's supply of storage by sometime on January 4 of the next year. Clearly we're not going to store all that potential content. (Utilization and tolerable compression rates will vary widely by type of camera – the camera on my phone is likely to be less active than a security camera, and some security cameras may get by on less than 5 Mbps – but the essential problem remains.)
  • Where do new bits come from? New bits are captured from the real world, or "synthesized" from other data. Synthesized data includes credit card transactions, packet headers, stock trades, emails, and other data created within electronic systems as a byproduct of applications. Real-world data can be pixels from cameras, audio samples from microphones, or accelerometer data from MEMS sensors. Synthetic data is ultimately derived from real-world data, through the transformations of human interaction, economic transactions and sharing. Audio and motion sensors are rich sources of data, but their data rates are dramatically less – 3 to 5 orders of magnitude less – than that of even cheap image sensors. So virtually all of the real data of the world – and an interesting fraction of all electronic data – is pixels.
  • The overwhelming volume of pixels has deep implications for computing and communications. Consider that $13.91 video camera. Even if we found a way to ship its continuous video stream up to the cloud, we couldn't afford to use an x86 or GPU-enabled server to process all that content – over the life of that camera, we could easily spend thousands of dollars on the hardware (and power) dedicated to that one video channel. Similarly, 5 Mbps of compressed video * 60 seconds * 60 minutes * 24 hours * 30 days is 12,960 Gbits per month. I don't know about your wireless plan, but that's far more than my cellular plan can absorb. So it is pretty clear that we're not going to be able to either do the bulk of the video analysis on cloud servers or communicate it via cellular. Wi-Fi networks may have no per-bit charges and greater overall capacity, but wireless infrastructure will have trouble scaling to handle tens of billions of streams. We must find ways to do most of the computing on embedded systems, so that no video, or only the most salient video, is sent to the cloud for storage, further processing or human review and action. (The sketch after this list walks through the arithmetic.)
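
For anyone who wants to check or adjust these numbers, here is a minimal back-of-the-envelope sketch in Python. The camera count, resolution, bytes-per-pixel figure and compression rate are all assumptions taken from the discussion above, not measured data.

    # Back-of-the-envelope arithmetic for the camera data deluge.
    # Every input below is an assumption from the discussion above, not a measurement.
    CAMERAS           = 10e9               # ~10B installed image sensors
    PIXELS_PER_SECOND = 1920 * 1080 * 60   # 1080p60: ~124M pixels/s per camera
    BYTES_PER_PIXEL   = 1.5                # rough figure for raw 4:2:0 capture
    COMPRESSED_BPS    = 5e6                # 5 Mbps compressed stream
    SECONDS_PER_YEAR  = 3600 * 24 * 365

    raw_pixels_per_second = CAMERAS * PIXELS_PER_SECOND
    raw_bytes_per_year    = raw_pixels_per_second * BYTES_PER_PIXEL * SECONDS_PER_YEAR
    compressed_gbit_per_camera_month = COMPRESSED_BPS * 3600 * 24 * 30 / 1e9

    print(f"Raw pixels per second, worldwide:  {raw_pixels_per_second:.1e}")  # ~1e18
    print(f"Raw bytes per year, worldwide:     {raw_bytes_per_year:.1e}")     # >1e25
    print(f"Compressed Gbit per camera-month:  {compressed_gbit_per_camera_month:,.0f}")  # ~12,960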

The second reason for the enthusiasm for vision is the revolution in computation methods for extracting insights from image streams. In particular, the emergence of convolutional neural networks as a key analytical building block has dramatically improved the potential for vision systems to extract subtle insightful results from complex, noisy image streams. While no product is just a neural network, the increasingly well-understood vocabulary of gathering and labeling large data sets, constructing and training neural networks, and deploying those computational networks onto efficient embedded hardware, has become part of the basic language of vision startups.
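
As a concrete, toy-sized illustration of that vocabulary, here is a minimal sketch of constructing and training a small convolutional network in PyTorch. The architecture, the random stand-in data and the hyperparameters are placeholders chosen for brevity, not a recommendation for any real product; a production flow would substitute a curated, labeled dataset and add steps such as quantization before deploying onto an embedded target.

    import torch
    import torch.nn as nn

    # A toy convolutional network: two conv/pool blocks feeding a small classifier.
    class TinyConvNet(nn.Module):
        def __init__(self, num_classes: int = 10):
            super().__init__()
            self.features = nn.Sequential(
                nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
                nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            )
            self.classifier = nn.Linear(32 * 8 * 8, num_classes)  # assumes 32x32 inputs

        def forward(self, x):
            return self.classifier(self.features(x).flatten(1))

    # Stand-in for a gathered, labeled dataset: random 32x32 RGB images and labels.
    images = torch.randn(64, 3, 32, 32)
    labels = torch.randint(0, 10, (64,))

    model = TinyConvNet()
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
    loss_fn = nn.CrossEntropyLoss()

    for step in range(5):  # a few gradient steps, just to show the training loop
        optimizer.zero_grad()
        loss = loss_fn(model(images), labels)
        loss.backward()
        optimizer.step()
        print(f"step {step}: loss {loss.item():.3f}")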

When we reflect these observations back onto the vision market, we can discern three big useful categories of applications:

  1. Capture of images and video for human consumption. This includes everything from fashion photography and snapshots posted on Facebook to Hollywood films and document scanning. This is the traditional realm of imaging, and much of the current technology base – image signal processing pipelines, video compression methods and video displays – is built around particular characteristics of the human visual system. This area has been the mainstay of digital imaging and video-related products for the past two decades. Innovation in new higher-resolution formats, new cameras and new image enhancement remains a plausible area for startup activity even today, but it is not as hot as it has been. While this area has been the home of classical image enhancement methods, there is ample technical innovation in this category – for example, in new generative neural network models that can synthesize photo-realistic images.
  2. Capture of images and video, then filtering, reducing and organizing it into a concise form for human decision-making. This category includes a wide range of vision processing and analytics technologies, including most activity in video monitoring and surveillance. The key here is often to make huge bodies of video content tagged, indexed and searchable, and to filter out irrelevant content so only a tiny fraction needs to be uploaded, stored, reviewed or more exhaustively analyzed. This area is already active, but we expect even more activity, especially as teams work to exploit the potential for joint analytics spanning many cameras simultaneously. Cloud applications are particularly important in this area because of their storage, computing and collaboration flexibility.
  3. Capture of images and video, then analyzing and using the insights to take autonomous action. This domain has captured the world's imagination in recent years, especially with the success of autonomous vehicle prototypes and smart aerial drones. The rapid advances in convolutional neural networks are particularly vivid and important in this area, as vision processing becomes accurate and robust enough to trust with decision making in safety-critical systems. Key characteristics of these systems are low latency, robustness and hard real-time performance. System architects will rely on autonomous vision systems only to the extent that those systems can guarantee short decision latency and ~100% availability.

Needless to say, some good ideas may be hybrids of these three, especially in systems that use vision for some simple autonomous decision-making, but rely on humans for backup, or for more strategic decisions, based on the consolidated data.

In the next part of the article, we’ll take a look at the ingredients of a startup – team, product and target market – and look at some cogent lessons from the “lean startup” model that rules software entrepreneurship today.

 

What’s happening in startup funding?

I've spent the last few months digging into the intersection between the ongoing deep learning revolution and the worldwide opportunity for startups. This little exercise has highlighted both how the startup funding world is evolving, and some of the unique issues and opportunities for deep learning-based startups.

Looking at some basic funding trends is a good place to start. Pitchbook has just published an excellent public summary of key quantitative trends in US startup funding: http://pitchbook.com/news/reports/2016-annual-vc-valuations-report

These show growth in seed funding levels and valuations, the stretching out of the pre-seed stage for companies, and a reduction in overall funding activity from the exceedingly frothy levels of 2015.

Let’s look at some key pictures – first seed funding:

That’s generally a pretty comforting trend – seed round funding levels and valuations increasing steadily over time, without direct signs of a funding bubble or “irrational enthusiasm”.   This says that strong teams with great ideas and demonstrated progress on their initial product (their Minimum Viable Product or “MVP”) are learning from early trial customers, getting some measurable traction and able to articulate a differentiated story to seed investors.

A second picture, on time-to-funding, gives a more sobering angle:

This picture suggests that the timeline for progressing through the funding stages is stretching out meaningfully. In particular, it says that it is taking longer to get to seed funding – now more than two years. How do startups operate before seed? I think the answer is pre-seed angel funding, "friends-and-family" investment, credit cards and a steady diet of ramen noodles ;-). This means treating the minimally funded startup not as a transitory moment but as a lifestyle. It takes toughness and faith.

That commitment to toughness has been codified as the concept of the Lean Startup.  In the "good old days" a mainstream entrepreneur had an idea, assembled a stellar team, raised money, rented space, bought computers, a phone system, networks and cubicles, built prototypes, hired sales and marketing people and took a product to market.  And everyone hoped customers would buy it just as they were supposed to.  The Lean Startup model turns that around – an entrepreneur has an idea, gathers two talented technical friends, uses their old laptops and an AWS account, builds prototypes and takes them to customers.  They iterate on customer-mandated features for a few months and take it to market as a cloud-based service.  Then they raise money.  More ramen-eating for the founding team, less risk for the investors, and better return on investment overall.

Some kinds of technologies and business models fit the Lean Startup model easily – almost anything delivered as software, especially in the cloud or in non-mission-critical roles.  Some models don’t fit so well – it is tough to build new billion-transistor chips on just a ramen noodle budget, and tough to get customers without a working prototype.  So the whole distribution of startups has shifted in favor of business models and technologies that look leaner.

If you’re looking for sobering statistics, the US funding picture shows that funding has retreated a bit from the highs of 2015 and early 2016.

Does that mean that funding is drying up? I don’t think so. It just makes things look like late 2013 and early 2014, and certainly higher than 2011 and 2012. In fact, I believe that most quality startups are going to find adequate funding, though innovation, “leanness” and savvy response to emerging technologies all continue to be critically important.

To get a better idea of the funding trend, I dug a bit deeper into one segment – computer vision and imaging – that I feel may be representative of a broad class of emerging technology-driven applications, especially as investment shifts towards artificial intelligence in all its forms.

For this, I mined Crunchbase, the popular startup funding event database and service, to get a rough picture of what has happened in funding over the past five years. It's quite hard to get unambiguous statistics from a database like this when your target technology or market criteria don't neatly fit the predefined categories. You're forced to resort to keyword filtering on description text, which is slow and imperfect. Nevertheless, a systematic set of keyword filters can give good relative measures over time, even if they can't give very good absolute numbers. Specifically, I looked at the number of funding deals, and the number of reported dollars, for embedded vision (EV) companies in each quarter over the past five years, as reported in Crunchbase and as filtered down to represent each company's apparent focus. (It's not trivial – lots of startups' descriptions talk, for example, about their "company vision", but that doesn't mean they're in the vision market.) The quarter-by-quarter numbers jump around a lot, of course, but the linear trend is pretty clearly up and to the right. This data seems to indicate a healthy level of activity and a healthy funding climate for embedded vision.
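
For the curious, the filtering step amounts to little more than keyword matching over funding-round descriptions, followed by a quarter-by-quarter tally. The sketch below shows the general idea; the file name, column names and keyword lists are hypothetical placeholders, and any real pass over Crunchbase data still needs the manual review described above.

    import csv

    # Hypothetical keyword filters: require a vision term, reject the "company vision" trap.
    VISION_TERMS  = ["computer vision", "image recognition", "video analytics",
                     "embedded vision", "image sensor"]
    EXCLUDE_TERMS = ["company vision", "vision statement"]

    def looks_like_vision_company(description: str) -> bool:
        text = description.lower()
        if any(term in text for term in EXCLUDE_TERMS):
            return False
        return any(term in text for term in VISION_TERMS)

    # 'funding_rounds.csv' is a placeholder for an exported funding-event table
    # with 'description', 'quarter' and 'amount_usd' columns.
    per_quarter = {}
    with open("funding_rounds.csv", newline="") as f:
        for row in csv.DictReader(f):
            if looks_like_vision_company(row["description"]):
                deals, dollars = per_quarter.get(row["quarter"], (0, 0.0))
                per_quarter[row["quarter"]] = (deals + 1, dollars + float(row["amount_usd"] or 0))

    for quarter in sorted(per_quarter):
        deals, dollars = per_quarter[quarter]
        print(f"{quarter}: {deals} deals, ${dollars / 1e6:.1f}M reported")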

I'd say that the overall climate for technologies related to cognitive computing – AI, machine learning, neural networks, computer vision, speech recognition, natural language processing and their myriad applications – continues to look healthy as a whole as well.

In parallel with this look at funding, I’ve also been grinding away at additions, improvements, corrections and refinements on the Cognitive Computing Startup List. I’ve just made the third release of that list. Take a look!

 

 

A global look at the cognitive computing start-up scene

I published the first version of my cognitive computing startup list about six weeks ago.  As I poked around further, and got some great questions from the community, I discovered a range of new resources on deep learning and AI startups, and literally thousands of new candidates.  In particular, I started using Crunchbase as a resource to spread my net further for serious cognitive computing companies.  If you simply search their database for companies that mention artificial intelligence somewhere in their description, you get about 2200 hits.  Even the Crunchbase category of Artificial Intelligence companies has more than 1400 companies currently.

As I described in the first release, the majority of companies in the AI category, while having generally interesting or even compelling propositions, are using true cognitive computing as just a modest element of some broader product value, or may be playing up the AI angle because it is so sexy right now.  Instead, I really tried to identify those companies operating on inherently huge data analytics and generation problems, which have a tight focus on automated machine learning, and whose blogs and job postings suggest depth of expertise and commitment to machine learning and neural network methods.

I also found other good lists of AI-related startups, like MMC Ventures' "Artificial Intelligence in the UK: Landscape and learnings from 226 startups":

https://medium.com/mmc-writes/artificial-intelligence-in-the-uk-landscape-and-learnings-from-226-startups-70b9551f3e4c#.l7elokutt

and the Chinese Geekpark A100 list of worldwide startups:

http://www.geekpark.net/topics/217003

With all this, I could filter the vast range of startups down to about 275 that seem to represent the most focused, the most active and the most innovative, according to my admittedly idiosyncratic criteria.

The geographical distribution is instructive.  Not surprisingly, about half are based in the US, with two-thirds of the US start-ups found in California.  More surprising is that the UK is a strong second, with more than 20% of the total, followed by China and Canada.  I was somewhat surprised to find China with just 8% of the startups, so I asked a number of colleagues to educate me further on cognitive computing startups in China.  That yielded a few more important entrants, but China still lags behind the UK in cognitive computing startups.

I have split the list a number of different ways, identifying those

  • with a significant focus on embedded systems (not just cloud-based software): 82 companies
  • working primarily on imaging and vision-based cognitive computing: 125 companies
  • doing embedded vision: 74 companies

Within embedded vision, you’ll find 10 or more each focused on surveillance, autonomous cars, drones and robotics, human-machine interface, and new silicon platforms for deep learning.  It’s a rich mix.

Stay tuned for more on individual companies, startup strategies and trends in the different segments of cognitive computing.  And take a look at the list!

Who are the most important start-ups in cognitive computing?

What’s happening with start-ups in cognitive computing? It is hard to know where to begin. The combination of the real dramatic progress on the technology, the surge of creativity in conceiving new applications – and big improvements on existing ones – and the tsunami of public hype around AI all combine to inspire a vast array of cognitive computing startups. Over the past three months I have compiled a working list of cognitive computing startups, as a tool to understand the scope and trends of entrepreneurship in the field.

The current list has 185 entities that look like startups – generally small, privately held organizations, with a basic web presence and a stated and serious focus on technology and applications of AI, machine learning and neural-inspired computing. I have tried to omit companies that have been acquired or gone defunct, or are so stealthy that they have no meaningful Internet footprint. There are many more companies using some form of big data analysis than shown here. Given the hype around cognitive computing, it is certainly popular for many companies to include some mention of AI or machine learning, even when it is fairly tangential to the company's core activities. Making the judgment to include a name on my list was often a close call – there was no bright line. So in rough terms, the criteria might be summarized as follows:

  • Must be a company or independent organization, not an open-source project.
  • Must have enough information on the Internet (company description on web site, LinkedIn, angel investing sites, job postings) to get at least a crude understanding of the degree of focus on cognitive computing
  • Focused on developing or using sophisticated machine learning, especially deep learning methods, not just, for example, doing big data management and analytics as a modest part of a general cloud application environment in business intelligence, marketing, CRM, or ecommerce.

I examined four or five hundred companies as candidates for the list, and whittled it down to about 190 that seemed most interesting, most innovative and most focused on advanced cognitive computing technology and applications. The list of candidates came from many sources. From working intensively in the computer-vision field over the past five years, I have heard about a wide range of vision-centric cognitive computing companies, as well as most of the companies doing specialized processors and basic neural network technology. I also worked from other teams' excellent published lists. The most useful of these is "The State of Machine Intelligence, 2016", a list of almost 300 companies put together by Shivon Zilis and James Cham and published in the Harvard Business Review, November 2, 2016. I also used the Machine Learning Startup list from angel.co as a source of ideas. Finally, I have had scores of conversations with practitioners in the field and read hundreds of articles about startup activity over these three months to put together my list.

Three trends stand out as lessons from this survey exercise, beyond the sheer numbers. First, the group represents tremendous diversity, covering novel ideas from robotics, health care, self-driving cars, enterprise operations, on-line commerce, agriculture and personal productivity. These entrepreneurs all believe they have an opportunity to understand and exploit complex patterns in the masses of available data to yield better insights into how to serve customers and users. The more overwhelming the data, the greater the enthusiasm for deep learning from it. (It remains to be seen, however, which of these teams will actually succeed in systematically uncovering dramatic patterns and in monetizing those insights.)

Second, cloud-based software applications dominate the list. I think this comes both from the relative ease of starting enterprise software companies in the current venture climate and from the remarkable breadth of applicability of the powerful pattern recognition and natural language capabilities of state-of-the-art learning algorithms. So every application niche has an available sub-niche in cognitive compute approaches to that application. On the other hand, hardware startups, especially silicon-intensive startups, are pretty scarce. This reflects the fact that many enterprise-centric uses of cognitive computing are not actually much limited by the cost, power or performance of their cognitive computing algorithms – they are initially more concerned with just getting any consistent insights from their data. There is a healthy number of real-time or embedded applications here, especially in robotics and automotive, but these may be content for a while to build at the systems level leveraging off-the-shelf sensors, memories, and CPU, GPU and FPGA silicon computing platforms.

Third, the list is dynamic. Since I started looking, a handful has been acquired, and many more have been created. Undoubtedly many will fail to meet their lofty objectives and others will shift focus in response to the market’s relentless education on what’s really wanted. I’m convinced that the cognitive computing trend is not close to peaking, so we’ll see many new rounds of startups, both going deeper into the underlying technology, as it evolves, and going wider into new application niches across all kinds of cloud and embedded systems.

In the future, I expect to see a huge variety of every-day devices sprout cameras, microphones and motion sensors, with sophisticated cognitive computing behind them to understand human interactions and their environment with astonishing detail and apparent sophistication. Similarly, it seems quite safe to forecast systematic cloud-based identification of trends in our health, habits, purchases, sentiment, and activities. At a minimum, this will uncover macroscopic trends of specific populations, but will often come down, for better or for worse, to individual tracking, diagnosis and commercial leveraging.

The current list:  http://www.cogniteventures.com/the-cognitive-computing-startup-list/

The Fourth Design Wave

The pace of change in technology, especially in electronic systems, is so rapid and relentless, that we rarely get a chance to pause and look at the big picture.  We have experienced such a cascade of smart, mobile, cloud-enabled products in recent years, that the longer-term patterns in design are not always clear.  It is worthwhile, however, to look briefly at the longer arc of history in electronic design, from the emergence of radio and telephone technology to today, and to anticipate the spread of machine learning and artificial intelligence into our daily lives.

At the risk of oversimplifying a rich tapestry of invention, productization, economic transformation and dead-end developments, we discern three waves of essential electronic design, and the onset of a fourth, as shown below.  Each successive wave does not replace the prior dominant design technology, but builds on top of it.

Four Waves of Electronic Design

The first wave is analog circuits, starting with the vacuum tube technologies found in early radios, television and radar in the 1930s and 40s, but coming to fully leverage transistors as they arrived, first as discrete devices, then in ICs.  Today, analog circuits remain crucially important in electronic design, with increasing IP reuse as a basic design method for leveraging analog expertise.

The second wave, naturally, is digital design, fully emerging in the 1960s with discrete transistors and then TTL components.  In the VLSI era, design transitioned to RTL to gain productivity, verifiability, portability and integratability in system-on-chip.  Today, large fractions of the digital content of any design are based on IP reuse, as with analog circuits.  The remarkable longevity of Moore's Law scaling of cost, power and performance has driven digital designs to extraordinary throughput, complexity and penetration in our lives.

The third wave – processor-based design – really started with digital computers but became a widespread force with the proliferation of the microprocessor and microcontroller in the late 1970s and 1980s.  The underlying digital technology scaling has allowed processors to grow roughly a million-fold in performance, enabling the explosion of software that characterizes the processor-based design wave.  Software has moved inexorably from assembly language coding, to high-level languages and optimizing compilers, to rich software reuse in processor-centric ecosystems, especially around specific operating systems, and to the proliferation of open-source software as a major driver for cost reduction, creativity and standardization in complex software systems.

We are now on the cusp of the fourth wave – cognitive computing.  The emergence of large data-sets, new hardware and methods for training of complex neural networks, and the need to extract more insight from ambiguous video and audio, all have helped drive this fourth wave.  It will not replace the prior three waves – we will certainly need advanced design capabilities in analog, digital and processors-plus-software, but these will often be the raw building-blocks for constructing cognitive computing systems.  And even when deep learning and other cognitive computing methods form the heart of an electronic system, these other types of design will play complementary roles in communication, storage and conventional computing around the cognitive heart.  The acknowledgement of the power of cognitive computing is a very recent development – deep neural networks were an obscure curiosity four years ago – but we can anticipate rapid development, and perhaps dramatic change.  In fact, it seems likely that many of today’s hot network structures, training  methods, data-sets and applications will be obsoleted several times over in the next ten years.  Nevertheless, the underlying need for such systems is durable.

Archeologists understand that the proliferation, economics, and even culture of a community are often driven by the characteristic tools of the group.  The variety and usefulness of electronic systems is inevitably coupled to the availability of design tools to rapidly and reliably create new systems.  In the figure below, we show a few of the key tools that typify design today in the analog, digital and processor-based layers.

Current key tools for design

The cognitive computing community fully appreciates the need for robust, easy-to-use tool environments, but those emerging tool flows are often still crude, and rarely cover the complete design development cycle from concept and data set selection, to deployment, verification and release.  It seems safe to predict that major categories will cover training, network structure optimization, automated data curation (with labeling, synthesis and augmentation), and widespread licensing of common large data-sets.  In addition, we might expect to see tools to assist in debug and visualization of networks, environments for regression testing, and new mechanisms to verify the accuracy, robustness and efficiency of trained networks.  Finally, no complete system or application will consist of a single cognitive engine or neural network – real systems will comprise a rich mix of conventionally programmed hardware/software and multiple cognitive elements working together, often distributed across the physical environment, with some elements close to myriad sensors and others deployed entirely in the cloud.  We can easily see the eventual evolution of tools and methods to manage those highly distributed systems, perhaps relying on data flows from millions of human users or billions of sensors.

So the fourth wave seems to be here now, but we cannot yet hope to see its ultimate impact on the world.

 

5G Wireless Meets AI

The technology landscape has been utterly transformed in the past decade by one technology above all others – mobile wireless data services.  It has enabled the global smart phone revolution, with millions of apps and new business models based on ubiquitous high-bandwidth data access, especially using 3G, 4G and WiFi.  It has also transformed computing infrastructure, through the impetus for continuous real-time  applications served up from massive aggregated data of social connections, transportation, commerce, and  crowd-sourced entertainment.

So what’s next?  One direction is obvious and important – yet better wireless data services, especially ubiquitous cellular or 5G wireless.  It is clear that the global appetite for data – higher bandwidth, more reliable, lower latency, more universal data  – is huge.  If the industry can find ways to deliver improved data services at reasonable costs, the demand will likely drive an enormous variety of new applications.  And that trend alone is an interesting fundamental technology story.
The next decade will also witness another revolution – widespread development and deployment of cognitive computing.  Neural-inspired computing methods already play a key role in advanced driver assistance systems, speech recognition and face recognition, and are likely to sweep through many robotics, social media, finance, commerce and health care applications.  Triggered by the availability of new data-sets and bigger, better-trained neural networks, cognitive computing will likely show rapid improvements in effectiveness in complex recognition and decision-making scenarios.
But how will these two technologies – 5G and cognitive computing – interact?  How will this new basic computing model shift the demands on wireless networks?  Let’s explore a bit.
Let’s start by looking at some of the proposed attributes of 5G:
  • Total capacity scaling via small cells
  • Much larger number of  users and much higher bandwidth per user
  • Reduced latency – down to a few tens of milliseconds
  • Native machine-to-machine connectivity without the capacity, bandwidth and latency constraints of the basestation as intermediary
  • More seamless integration across fiber and wireless for better end-to-end throughput
These functions will not be easy to achieve – sustained algorithmic and implementation innovation in massive MIMO, new modulation and frequency/time-division multiplexing, exploitation of millimeter-wave frequency bands, and collaborative multi-cell protocols are likely to be needed.  Even changes to the device architectures themselves, to make them better suited to low-latency machine-to-machine connectivity, will be mandatory.
Some of the attributes of 5G are driven by a simple extrapolation from today's mobile data use models.  For example, it is reasonable to expect that media content streaming, especially video streaming, will be a major downlink use-case, driven by more global viewing and by higher resolution video formats.  However, the number of video consumers cannot grow indefinitely – the human population is increasing at only about 1.2% per year.  So how far can downlink traffic grow if it is driven primarily by conventional video consumption?  Certainly by an order of magnitude, but probably not by 2 or 3 orders of magnitude.
On the other hand, we are seeing a rapid increase in the number of data sensors, especially cameras, in the world – cameras in mobile phones, security cameras, flying drones and toys, smart home appliances, cars, industrial control nodes and other real-time devices.  Image sensors produce so much more data than other common sensor types (e.g. microphones for audio and accelerometers for motion) that virtually 100% of all newly produced raw data is image/video data.  Cameras are increasing at a much faster rate than human eyeballs – more than 20% per year.  In fact, I estimate that the number of cameras in the world exceeded the number of humans starting in 2016.  By 2020, we could see more than 20B image sensors active in the world, each theoretically capturing 250MB per second or more of data (1080p60 4:2:2 video capture).
This raises two key questions.
First, how do we move that much video content?  Even with good (but still lossy) video compression (assuming a 6 Mbps encoding rate), that number of cameras running continuously implies more than 10^17 bits per second of required uplink capacity.  That translates into a requirement for the equivalent of hundreds of millions of current 4G base stations, just to handle the video uplink.  This implies that 5G wireless needs to look at least as hard at uplink capacity as at downlink capacity.  Of course, not all those cameras will be working continuously, but it is easy to imagine that the combination of self-driving cars, ubiquitous cameras in public areas for safety, and more immersive social media could increase the total volume by several orders of magnitude.
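
To make that uplink arithmetic easy to reproduce, here is a short sketch using the figures above. The roughly 1 Gbps per-base-station uplink capacity is my own rough assumption for illustration, not a 4G specification.

    # Rough uplink arithmetic under the assumptions stated above.
    CAMERAS         = 20e9                   # projected active image sensors by 2020
    ENCODED_BPS     = 6e6                    # 6 Mbps compressed video per camera
    RAW_BYTES_PER_S = 1920 * 1080 * 60 * 2   # 1080p60, 4:2:2 (2 bytes/pixel): ~250 MB/s raw
    BASESTATION_BPS = 1e9                    # assumed ~1 Gbps usable uplink per 4G base station

    total_uplink_bps    = CAMERAS * ENCODED_BPS
    basestations_needed = total_uplink_bps / BASESTATION_BPS

    print(f"Raw capture per camera:      {RAW_BYTES_PER_S / 1e6:.0f} MB/s")
    print(f"Aggregate compressed uplink: {total_uplink_bps:.1e} bits/s")   # >10^17
    print(f"Equivalent 4G base stations: {basestations_needed:.1e}")       # ~10^8
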
Second, who is going to look at the output of all those cameras?  It can't be people – there simply aren't enough eyeballs – so we need some other audience, "virtual eyeballs", to monitor the streams and extract the images, events or sequences of particular interest – to automatically make decisions using that image or video flow, or to distill it down to just the rare and relevant events for human evaluation.
In many cases, the latency demands on decision making are so intense – in automotive systems, for example – that only real-time machine vision will be fast enough to respond.  This implies that computer vision will be a key driving force in distributed real-time systems.  In some important scenarios, the vision intelligence will be implemented in the infrastructure, either in cloud servers or in a new category of cloud-edge computing nodes.  This suggests a heavy emphasis on capacity scaling and latency.  In many other scenarios, the interpretation of the video must be local to the device to make it fast and robust enough for mission-critical applications.  In these cases, the recognition tasks are local, and only the more concise and abstract stream of recognized objects or events needs to be communicated over the wireless network.
So let's recap some observations about the implications of the simultaneous evolution of 5G wireless and cognitive computing.
 
  • We will see a “tug-of-war” between in device-side or cloud-side cognitive computing, based on bandwidth and latency demands, concerns of robustness in the face of network outage, and the risks of exposure of raw data.
    1. Device: low latency, low bandwidth consumption, lowest energy
    2. Cloud: Training, fast model updates, data aggregation, flexibility
  • Current wireless networks are not fast enough for cloud-based real-time vision, but 5G capacity gains, especially combined with good cloud-edge computing, may be close enough.
  • The overwhelming number of image sensors makes computer vision’s “virtual eyeballs” necessary.  This, in turn, implies intense demands on both uplink capacity and local intelligence.  
  • Machine-to-machine interactions will not happen on low-level raw sensor data – cars will not exchange pixel-level information to avoid accidents.  The machines will need to exchange  abstract data in order to provide high robustness and low latency, even with 5G networks.
  • Wireless network operations will be highly complex, with big opportunities for adaptive behavior to improve service, trim costs and reduce energy consumption.  Sophisticated pattern recognition and response using cognitive computing may ultimately play a significant role in real-time network management.

So it's time to buckle in and get ready for an exciting ride – a decade of dramatic innovation in systems, especially systems at the confluence of 5G wireless and deep learning.  For more insights, please see my presentation from the IEEE 5G Summit, September 29, 2016: rowen-5g-meets-deep-learning-v2

 

Cognitive Computing: Why Now and What Next?

Neural networks and the broader category of cognitive computing have certainly blossomed in the past couple of years.  After more than three decades of academic investment, neural networks are an overnight success.  I think three forces have triggered this explosion of new technology (and hype).

First, the Internet has aggregated previously unimaginable reservoirs of raw data, capturing a vivid, comprehensive, but incoherent picture of the real world and human activity.  This becomes the foundation from which we can train models of reality, unprejudiced by oversimplified synopses.

Second, progress in computing and storage has made it practical to implement large-scale model training processes, and to deploy useful inference-based applications using those trained models.  Amid hand-wringing over the so-called "death of Moore's Law", we actually find that a combination of increasingly efficient engines and massively parallel training and inference installations is giving us sustained scaling of compute capability for neural networks.  Today, GPUs and FPGAs are leading hardware platforms for training and deployment, but we can safely bet that new platform architectures, built from direct experience with neural network algorithms, are just around the corner.

Third, we have seen rapid expansion of understanding of the essential mechanisms and applications of neural networks for cognition.  Universities, technology companies and end-users have quickly developed enthusiasm for the proposed benefits, even if the depth of knowledge is weak.  This excitement translates into funding, exploratory developments and pioneering product developments.

These three triggers – massive data availability, seriously parallel computing hardware, and wide enthusiasm – set the scene for the real work of bringing neural networks into the mainstream.  Already we see a range of practical deployments, in voice processing, automated translation, facial recognition and automated driving, but the real acceleration is still ahead of us.  We are likely to see truly smart deployments in finance, energy, retail, health care, transportation, public safety and agriculture in the next five years.

The rise of cognitive computing will not be smooth. It is perfectly safe to predict two types of hurdles.  On one hand, the technology will sometimes fail to deliver on promises, and some once-standard techniques will be discredited and abandoned in favor of new network structures, training methods, deployment platforms and application frameworks.  We may even think sometimes that the cognitive computing revolution has failed.  On the other hand, there will be days when the technology appears so powerful as to be a threat to our established patterns of work and life.  It will sometimes appear to achieve a level of intelligence, independence and mastery that frightens people.  We will ask, sometimes justifiably, if we want to put decision making on key issues of morality, liberty, privacy and empathy into the hands of artificial intelligences.

Nevertheless, I remain an optimist, on the speed of progress and depth of impact, as well as on our ability and willingness to shape this technology to fully serve human ends.

 

Hello, Cognite Ventures

Welcome to Cognite Ventures.  I have created Cognite to discover, advise and invest in novel technologies and business models for cognitive computing.  The name, of course, refers to the broad area of cognitive computing, but imagines a new fundamental material, "cognite", from which intelligent systems can be built.