To launch automated vehicles, the general thinking is that LiDAR is required to accomplish highly precise long-distance ranging so as to ensure safety. But LiDARs are seen as expensive in automotive circles. Costs will likely come down over time, but what can be done now to deploy systems without breaking the bank? Today NODAR, an early stage startup based in Cambridge, Massachusetts, emerged from stealth to announce an approach based on long-baseline stereo vision. What exactly is this? Is it the answer?
Disclosure: I am an Advisor to NODAR.
Stereopsis Does What Mono Cameras Can’t
If you follow the self-driving vehicle world even marginally, you’re aware of the trifecta by which self-driving vehicles perceive the driving environment: camera, radar, and LiDAR. Radar and LiDAR bounce energy off the outer world and process the returning energy to detect objects, measuring range and velocity. Essential tracking and context for these objects is derived by processing data from individual cameras. Less well known is stereo vision, which relies on the relative displacement (disparity) of objects as seen from two cameras looking in roughly the same direction to measure range to objects ahead, a technique called stereopsis. Our eyes and brains do this, too; try maneuvering swiftly through a complex space with one eye covered and you’ll see how important that second eye is! Because cameras are super-cheap and high-performance radars and LiDARs are more expensive, stereo vision offers the potential to achieve high quality perception at a much lower cost.
If a couple of low cost standard automotive cameras can provide highly accurate long range sensing, as NODAR claims, it’s a game changer. But the details are a little complicated; first I need to unpack a few factors.
Range measurement can actually be provided by a single camera, using various assumptions and approximations. This technique is used for basic Advanced Driver Assistance Systems (ADAS), having been successfully commercialized by Mobileye. Tesla uses mono-cameras plus radar for Autopilot perception. However, ADAS systems relying on mono-cameras are highly limited in range and can suffer from “object ambiguity,” among other limitations. Therefore, they are not viewed as supporting long range sensing for advanced ADAS or most Automated Driving Systems (ADS). Texas Instruments recently published a very readable short report contrasting mono- and stereo cameras for perception.
In a stereo vision system, range capability is “built in” but is bound by the laws of optics. The larger the separation between the two cameras, the longer the range. We humans have good depth perception out to about ten meters due to the roughly six centimeter separation of our eyes. Now, consider the vehicle case. If two forward looking cameras were placed at the edges of a 1.5 meter wide passenger car (integrated into the headlight assembly, for instance), range measurements could be made for objects at 250 meters. Range would be greater with cameras placed at the edges of a big truck. That’s good, because trucks have a longer braking distance than cars.
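The geometry behind these numbers follows the standard stereo ranging relation Z = f·B/d, where Z is range, f is focal length in pixels, B is the baseline between the cameras, and d is the disparity in pixels. A minimal sketch, assuming a hypothetical camera with a 1,000-pixel focal length and a 6-pixel minimum usable disparity (both illustrative values chosen only to roughly reproduce the figures above, not any vendor's specs):

```python
# Sketch of the stereo ranging relation Z = f * B / d.
# FOCAL_PX and MIN_DISPARITY_PX are illustrative assumptions, not real camera specs.

FOCAL_PX = 1000.0        # assumed focal length, expressed in pixels
MIN_DISPARITY_PX = 6.0   # assumed smallest disparity that can be measured reliably

def max_range_m(baseline_m, focal_px=FOCAL_PX, min_disp_px=MIN_DISPARITY_PX):
    """Range at which disparity shrinks to the minimum usable value."""
    return focal_px * baseline_m / min_disp_px

for label, b in (("human eyes", 0.06), ("windshield module", 0.20), ("full car width", 1.50)):
    print(f"{label} ({b:.2f} m baseline) -> ~{max_range_m(b):.0f} m")
```

With these assumed numbers, a 6 cm baseline gives about 10 m and a 1.5 m baseline about 250 m, consistent with the figures above; the key point is simply that range scales linearly with baseline.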
But there’s a catch: if images from stereo cameras are randomly shaking around relative to one another, which will happen due to road vibrations when cameras are far apart, the image data can’t be correlated. Garbage in, garbage out. This appears to pretty much kill the idea of “cameras in each headlight assembly” for long range. The engineering compromise is to mount the two cameras on a single, mechanically stable assembly. Given engineering, space, and styling limitations, this typically results in a small unit placed forward of the rear-view mirror with cameras spaced only about 20cm apart, as shown in the image below. At this “short-baseline” spacing, reliable ranging of only 50 meters or less can be achieved. But this is still useful, and short-baseline stereo camera pairs have been incorporated into basic ADAS for years as an alternative or complement to radar. Suppliers such as Bosch, Continental, Denso, Hitachi, Texas Instruments, and Veoneer provide Adaptive Cruise Control, Emergency Brake Assist, and similar systems, based on stereo vision, to passenger car OEMs selling worldwide. Originally available only in luxury cars, stereo vision has been popularized for the masses by Subaru with their Eyesight system. Introduced in 2008, Eyesight is now standard across all Subaru models.
How To Un-Vibe The Vibe For Longer Range?
Implementing long-baseline stereo vision to achieve that longer range is a tantalizing goal. Whoever figures out how to process vision data from independently jostling cameras will turn heads in the ADAS/ADS world. Automotive suppliers, looking to gain new OEM business, have been trying to crack this nut for years.
Now, NODAR says they have solved these inter-camera calibration issues so that cameras no longer need to be packaged on a single assembly. This allows the separation between cameras to be as wide as the vehicle platform. Using “untethered” stereo cameras in this way allows detection range sufficient to support advanced ADAS and automated driving.
I asked NODAR CEO Dr. Leaf Jiang what led him to start this company. “It started with the core belief that advanced automotive safety systems should be safer than human drivers and accessible to everyone,” he said. “I worked for 13 years at the forefront of LiDAR technology at MIT Lincoln Laboratory, inventing new LiDAR systems, and deploying LiDAR and other electro-optic sensor systems on automated vehicles for defense applications. After thinking about the problem space for so long, I found a path to commercializing high-fidelity 3D sensing at a passenger-vehicle price point.”
NODAR’s announcement includes an online demonstration of its Hammerhead 3D vision system, which they say produces high-density 3D point-clouds at ranges up to 1,000 meters, providing object detection at “previously impossible” ranges. Jiang describes their approach this way, “We turned a mechanical engineering problem into a software problem.”
The online demonstration highlights the ability of NODAR’s system to detect and accurately measure the distance to a typical 10cm wide brick, on the road surface, at a distance of 150 meters. The brick serves to emulate a small object or debris in an actual road situation.
Providing technical context for this scenario, LiDAR expert Sabbir Rangwala notes that, “At a range of 150 meters, you need good enough angular resolution with a LiDAR to see at least 3-4 points on a 10 cm object to enable high confidence identification under bright and light starved conditions. If cameras are able to achieve this type of 3D sensing without active laser illumination, it provides a significant disruption to LiDAR.” In this respect, Jiang says, “In this NODAR Hammerhead demo, the system detects 5 pixels in width by 11 pixels in length on the brick, more than satisfying this criterion.”
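How many pixels a 10 cm object subtends at 150 meters depends entirely on the camera's angular resolution. A back-of-the-envelope sketch, assuming a hypothetical 3,840-pixel-wide imager with a 30-degree horizontal field of view (both illustrative guesses, not NODAR's actual optics):

```python
import math

def pixels_on_target(size_m, range_m, sensor_px, hfov_deg):
    """Pixels subtended by a small object, using the small-angle approximation."""
    ifov_rad = math.radians(hfov_deg) / sensor_px  # angle covered by one pixel
    return (size_m / range_m) / ifov_rad

# A 10 cm brick at 150 m, for the assumed camera:
print(f"~{pixels_on_target(0.10, 150.0, 3840, 30.0):.1f} pixels across")
```

With these assumptions the brick spans roughly five pixels horizontally, in line with the width figure Jiang quotes; a different lens or sensor shifts the number proportionally.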
The NODAR press release explains that being able to detect an unknown object at 150 meters means that the vehicle (or driver) has ample time (about four seconds at highway speeds) to safely react to the obstacle. Further, detection will be more reliable than with existing mono-camera approaches: “Mono-camera solutions relying on deep learning to estimate depth are inherently limited by finite training sets, compute requirements, and known object ambiguity (adult vs. child can introduce range error of 50%).” Addressing LiDAR sensing, NODAR notes that “LiDARs rely on scanning beams and can easily miss small objects. The LiDAR scanning process takes precious time, whereas a camera-based system offers >20X the area coverage rate with the reliability, robustness, and low price of high volume solid-state cameras. NODAR produces frame-by-frame disparity maps every 33 milliseconds while single-camera systems and LiDAR must aggregate and analyze data before producing results, causing significantly slower performance.” NODAR’s blog provides a deep dive on the physics and processing involved.
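The “about four seconds at highway speeds” figure is simple kinematics: the time to cover 150 meters at a given speed. A quick check across typical highway speeds:

```python
# Time available to react to an object detected 150 m ahead, at various speeds.
DETECTION_RANGE_M = 150.0

for kmh in (100, 120, 130):
    mps = kmh / 3.6  # convert km/h to m/s
    print(f"{kmh} km/h -> {DETECTION_RANGE_M / mps:.1f} s")
```

At 120-130 km/h this works out to roughly 4 to 4.5 seconds, matching the press release's figure (ignoring, of course, perception and actuation latency).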
Unlike LiDAR and radar, cameras are passive sensors. For long-range forward sensing, they must work in low light, possibly beyond the headlight illumination range. Jiang says part of the solution lies in advances in today’s CMOS cameras, “which are starting to meet or exceed human vision.” He notes that Hammerhead can process data from multi-spectral cameras, including infrared for nighttime driving. Camera selection is an OEM decision.
What about computational intensity? Will the image processing requirements using NODAR’s approach require expensive high-end chips? Not a problem, according to Jiang, “Our calibration algorithms require a fraction of the computing resources needed for running the stereo correspondence algorithm, so as long as you have a processor capable of standard old-school stereo vision, you already have the computational requirements needed for NODAR software.”
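For context on what “standard old-school stereo vision” means computationally, here is a deliberately naive block-matching sketch: for each pixel in the left image, slide a small window along the same row of the right image and keep the horizontal shift (disparity) with the lowest sum-of-squared-differences cost. This is an illustration of the classic technique, not NODAR's algorithm; production systems use heavily optimized variants of this search.

```python
import numpy as np

def disparity_ssd(left, right, max_disp=16, block=5):
    """Naive SSD block matching on rectified grayscale image pairs.

    For each left-image pixel, search disparities 0..max_disp-1 along the
    same row of the right image and keep the one with minimum SSD cost.
    """
    h, w = left.shape
    half = block // 2
    disp = np.zeros((h, w), dtype=np.int32)
    for y in range(half, h - half):
        for x in range(half + max_disp, w - half):
            patch = left[y - half:y + half + 1, x - half:x + half + 1].astype(np.float64)
            costs = [
                np.sum((patch - right[y - half:y + half + 1,
                                      x - d - half:x - d + half + 1].astype(np.float64)) ** 2)
                for d in range(max_disp)
            ]
            disp[y, x] = int(np.argmin(costs))  # lowest-cost shift wins
    return disp
```

Running this on a synthetic pair where the right image is the left shifted by a known number of pixels recovers that shift at every well-textured pixel; real images add noise, occlusions, and textureless regions, which is where the real engineering lives.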
NODAR is not alone in this space. Israel-based Foresight, founded in 2015, develops both “in-line-of-sight” vision systems and “beyond-line-of-sight” cellular-based applications. The Foresight website describes their approach for automatic calibration of untethered cameras. Their sensor, called QuadSight, uses two thermal and two visible-light cameras. The website notes that, “Through sensor fusion, QuadSight leverages reflected light from visible-light cameras with thermal energy captured by long-wave infrared cameras for robust accurate object detection of any shape, form or material, in all weather and lighting conditions – including complete darkness, rain, haze, fog and glare.”
Foresight contends that QuadSight is the only sensor which generates an infrared-based point cloud. The company announced last September it has completed the development of a commercial version of its automatic calibration software and filed for U.S. patents on their technique. Foresight has partnered with FLIR Systems, Inc. to exclusively source FLIR thermal imagers, with FLIR leveraging its substantial presence in automotive product to identify customers for QuadSight. Foresight has signed a distribution agreement with Cornes Technologies for OEM and Tier 1 customers in Japan.
Foresight is also a partner in the European Commission’s newly funded All Weather Autonomous Real logistics operations and Demonstrations (AWARD) consortium, which will develop and operate safe autonomous heavy-duty vehicles in harsh weather conditions.
NODAR is offering a software-only capability that can be adapted to a stereo camera approach being implemented by a supplier or OEM, whereas Foresight is offering a complete sensor package. Both approaches have advantages in the current marketplace.
While the space for mono-camera and LiDAR solutions is crowded, the universe for long-baseline stereo appears to be quite small: Foresight and NODAR. Interestingly, truck ADS developer Plus has stated they are also employing untethered stereo vision, but they haven’t provided further details.
I would think that the automotive supplier leaders in first generation stereo vision are hard at work to develop techniques similar to those touted by Foresight and NODAR. Or have the suppliers tried and failed, opening the door to upstart players?
My OEM and supplier contacts tell me that “if” stereo vision can be deployed taking advantage of the entire width of the vehicle, this would open up a range of design options, reducing cost, and providing a redundant perception path for Level 3 and 4 automation. This is a big if, and they want to get their hands on these systems for evaluation. Foresight press releases indicate they have had some success in selling prototype systems to vehicle OEMs for evaluation. NODAR is currently engaged in proof-of-concept projects with several automotive OEMs and Tier 1 suppliers. Jiang expects that in 2021, NODAR will engage with one or more commercial partners to integrate Hammerhead into a production system.
Fisheye Vision Of An Entirely Different Sort
Safety systems relying on stereo vision are now on millions of cars protecting their occupants in safety-critical short range scenarios. Cameras on cars are increasingly ubiquitous: in the front, on the sides, and especially in the rear. What can they see together that they can’t see alone? Could side-looking Tesla cameras enhance perception for their “Summon” parking lot maneuvers, for instance? If the calibration issues for long-baseline stereo are truly solved, the possibilities abound.
NODAR’s Hammerhead shark analogy is clever, since the hammerhead shark has the best depth perception in the animal kingdom due to the wide separation between its eyes. NODAR is offering perception in which “the distance to every pixel in every frame is known with precision.” What tech developer wouldn't want that?
CEO Jiang asserts that, “A camera-based solution is the only way to deliver on the performance, safety, and pricing requirements of the mainstream automotive market.” Like most startups in this space, they and Foresight are now on the long and arduous path of proving their tech works within the constraints of automotive product.
Sharks are known to be stealthy, but this is no longer the case for NODAR. Being “out” in the larger world will no doubt invite some interesting debate from the mono-camera and LiDAR camps. I’m keen to hear what they have to say, at the same time closely watching the auto industry’s reception for the “new stereo guys” in town.