CES 2023 (4) – VoxelSensors 3D Perception, Fast and Accurate

Attending AR/VR/MR and Photonics West Conference

I will attend the AR/VR/MR conference next week (as I write this article). Several people have already asked whether I will be there, and I enjoy meeting blog readers. I have been reviewing the AR/VR/MR program, and there are many interesting talks. This is the best conference for learning about AR technology and meeting technical people in AR.

Porotech, the subject of the previous article, will be presenting "Simplifying the manufacture and design of optical systems for AR glasses with DynamicPixelTuning® microLED technology" on 30 January 2023 at 4:40 PM in Room 3022. Porotech will also be publicly exhibiting on Jan. 31 and Feb. 1.

VoxelSensors, the subject of this article, will be presenting "Low power, low latency 3D perception for XR" on 30 January 2023 at 3:20 PM PST in Room 3024.

Introduction

This series on CES 2023 is starting with new foundation technologies for AR/MR rather than products. The first three parts covered Meta Materials’ non-polarizing dimming, Porotech’s MicroLED display technologies, and Addoptics 3-D Printable Optics Molds. This article discusses VoxelSensors, a new 3-D perception technology.

While I am familiar with the basic concepts of 3D perception, I have not studied it extensively. So I may slip up on the terminology and be unaware of some methods used in AR and VR. Based on what I know, VoxelSensors seems to have major advances over the current 3D perception methods.

To help understand how the VoxelSensors approach is so different, a short background on the common existing 3D perception methods and their limitations is included toward the end of this article.

VoxelSensors’ Switching Pixels™ – A New Type of “Camera”

VoxelSensors is a fabless semiconductor company making a new type of “camera” in which each sensing element, which they call a Switching Pixel™, captures the time at which a scanning-laser “event” occurs. The sensor can send out events at a rate of 100 MHz.

The VoxelSensor works differently than an “event camera,” as such devices are commonly known. However, existing event cameras were used to develop the algorithms before the specialized, better-performing VoxelSensor was created.

Because the VoxelSensor is designed to look for a scanning laser, it can be very sensitive to that laser’s light while rejecting all other light. And because laser and IR output is limited by eye-safety requirements, better sensor sensitivity in 3D perception translates into better range and resolution.

VoxelSensors 3D Perception Method

The VoxelSensors 3D perception method starts with Lissajous scanning of a laser beam. While the scanning does not have to be Lissajous, a Lissajous pattern can quickly produce a sparse image of the whole area. If everything is stationary, the resolution keeps building as the scanning process continues. Conceptually, the high-resolution perception is built from a series of up-to-date sparse detections. The figure below from VoxelSensors illustrates how the sparse scanning process detects objects faster than a common time-of-flight-camera LiDAR, while also providing a higher-resolution image at any point in time.
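To illustrate how a Lissajous scan sparsely covers the whole field of view almost immediately and then keeps densifying, here is a minimal sketch. The mirror frequencies, grid size, and function names are my own illustrative choices, not VoxelSensors’ or Oqmented’s actual parameters:

```python
import numpy as np

def lissajous_scan(fx, fy, phase, n_samples, duration):
    """Sample a Lissajous trajectory: x = sin(2*pi*fx*t), y = sin(2*pi*fy*t + phase)."""
    t = np.linspace(0.0, duration, n_samples)
    return np.sin(2 * np.pi * fx * t), np.sin(2 * np.pi * fy * t + phase)

def coverage(x, y, grid=32):
    """Fraction of a grid x grid field of view the scan has touched so far."""
    ix = ((x + 1) / 2 * (grid - 1)).round().astype(int)
    iy = ((y + 1) / 2 * (grid - 1)).round().astype(int)
    return len(set(zip(ix.tolist(), iy.tolist()))) / (grid * grid)

# Two non-harmonic mirror frequencies (illustrative values): the elliptical
# pattern precesses, so coverage builds quickly and keeps densifying.
x, y = lissajous_scan(fx=10_000, fy=9_100, phase=np.pi / 2,
                      n_samples=100_000, duration=0.01)
early = coverage(x[:10_000], y[:10_000])  # sparse image after ~1 ms of scanning
full = coverage(x, y)                     # resolution keeps building over 10 ms
print(f"coverage after 1 ms: {early:.0%}; after 10 ms: {full:.0%}")
```

The key property this shows: a short slice of the scan already touches cells across the whole field of view (a sparse image everywhere), and continuing the scan only adds detail rather than starting over.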

Figure by VoxelSensors illustrating how the Lissajous scanning process is faster and higher resolution

More than 10X Faster Detection than Typical LiDARs

The VoxelSensors method produces a sparse image of the whole field of view in less than 1 millisecond. In contrast, typical LiDAR methods can take 16 milliseconds or more, plus whatever processing time is required, before detecting anything. By the time a typical LiDAR detects anything, its scan is already stale by more than 16 milliseconds.

Using the output of two VoxelSensors, 3D distance can be determined via simple triangulation; no complex image processing is required. Unlike LiDAR, the distance accuracy is not based on measuring the speed of light. Thus, the distance resolution can be higher, and the system does not require the degree of calibration needed to measure distances based on the speed of light. The figure below outlines VoxelSensors’ 3D perception process.
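As a sketch of that triangulation step: each sensor reports the bearing at which it saw the laser spot, and the two bearing rays are intersected. The coordinate frame, baseline value, and function names below are my own illustrative choices, not VoxelSensors’ actual algorithm:

```python
import math

def triangulate(baseline, theta_left, theta_right):
    """Intersect two bearing rays to the same laser spot.

    The sensors sit at (0, 0) and (baseline, 0); each angle is measured
    from the positive x-axis (the baseline) to the ray toward the spot.
    Returns (x, z) of the spot. No time-of-flight measurement is needed.
    """
    t1 = math.tan(theta_left)
    t2 = math.tan(theta_right)
    x = baseline * t2 / (t2 - t1)  # intersection of z = t1*x and z = t2*(x - baseline)
    z = t1 * x
    return x, z

# Hypothetical example: a spot 1 m away, centered between sensors 10 cm apart.
x_true, z_true = 0.05, 1.0
theta_l = math.atan2(z_true, x_true - 0.0)   # bearing seen by the left sensor
theta_r = math.atan2(z_true, x_true - 0.1)   # bearing seen by the right sensor
x, z = triangulate(0.1, theta_l, theta_r)
print(round(x, 3), round(z, 3))  # recovers 0.05 1.0
```

Because both sensors time-stamp the same scanning-laser event, matching the two bearings is trivial compared with the correspondence search that stereo cameras need, which is why the depth calculation can be this simple.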

VoxelSensors’ CES 2023 Suite Video

Demo Hardware

VoxelSensors produced a 1-minute video showing their suite at CES 2023 and some information on their technology. They used a visible red laser (rather than IR) so people could see the process.

The current demonstration, briefly seen in the video, is big and bulky and has a long way to go before fitting in an AR or VR headset. But this is common for an early-stage demonstration. In a finished product, with just the two small Switching Pixels sensors, VoxelSensors expects everything to fit easily in an AR/VR headset.

Oqmented Lissajous Scanner Used (Forget LBS Displays)

Oqmented Laser Scanning Display at AWE 2021

VoxelSensors is in the process of shrinking everything and announced in December 2022 that they are working with Oqmented, which makes Lissajous-scanning MEMS mirrors. In addition to 3D perception, Oqmented has for several years been trying to use its Lissajous scanning process to make laser scanning displays.

As I wrote and showed in AWE 2021 Part 2: Laser Scanning – Oqmented, Dispelix, and ST Micro, I don’t think Lissajous laser beam scanning (LBS) displays will ever make sense or be competitive. But in that same article, I thought that Oqmented and Lissajous scanning would make tremendous sense in 3D sensing/perception (perhaps I had an inkling of this 3D perception method). To be blunt, Oqmented should concentrate on perception and forget about display applications.

Background: Common AR/VR 3D Perception Methods

The basic goal of 3D sensing is to develop a point cloud of voxels in terms of horizontal (X), vertical (Y), and depth (Z) coordinates of physical objects in the real world. Some of the common 3D perception techniques used in VR/AR are:

  1. Optical (visible and IR) tracking – The image(s) from one or more cameras is combined with image processing and structural assumptions (like the relationship of fingers to the hand and the hand to the arm) to determine 2D and 3D locations. With more than one camera or over multiple frames over time, depth information can be teased out of the images.
    • While this is the most common and lowest-cost way to get some level of 3D perception, the resolution and accuracy, particularly in Z/depth, are low and require a significant amount of processing.
    • Speed is limited by the camera frame rate and the time to process the images. Any depth sensing is slow and may take multiple frames to build.
  2. Structured Light – One or more pattern(s) of (typically IR) light is projected and then captured by one or more (often IR) camera(s). The distortion of the structured light pattern is then processed to extract the 3D information. Structured light was first made famous by its use in Microsoft’s Kinect. PrimeSense, which developed the Kinect’s 3D perception technology, was acquired by Apple in 2013. Apple then started adding structured light 3D perception to iPhones and iPads, starting with the iPhone X.
    • Structured light is typically better at determining Z/distance and takes less processing. With a single projector and camera (example: iPhone X), it can require moving the projector and camera to gain accuracy.
  3. Scanning LiDAR (scanning with a single time-of-flight sensor) works by sending out a beam of IR light, usually from a laser (some low-end systems use a highly focused LED), and measuring the time it takes for the light to return to one or a few sensors, with the distance calculated based on the speed of light. One or more pulsing beams are scanned in X and Y; the X and Y locations are determined by synchronizing with the scanning process, and the Z/distance is determined by the round-trip time of the pulsed light. There are many ways the beam can be scanned, including motors rotating an assembly of multiple lasers, MEMS scanning mirror(s), and vibrating diffraction gratings.
    • Typically the scanning process is relatively slow (slower than 1/60th of a second).
    • The Z/distance accuracy is limited by the ability to accurately emit light, sense it, and measure the round-trip time (light travels about 1 foot, or 30 cm, per nanosecond).
    • As the typical IR laser light must be limited in output for eye safety (with any method using lasers), distance, speed, resolution, and the sensor’s sensitivity and the ability to reject noise, all become interrelated.
    • Intel’s spec for its MEMS-mirror LiDAR (being discontinued) gives some idea of the capabilities and trade-offs of MEMS scanning LiDAR, with a frame rate of about 1/30th of a second and resolution varying with distance. Interestingly, the Intel LiDAR was planned for the Meta Quest Pro.
  4. Solid State, Time of Flight Camera (ToF), LiDAR – Instead of scanning the narrow (laser) beam, the area is flooded with a single wide beam or an array of smaller beams from a laser through diffractive gratings or the like. A time-of-flight camera with an X-Y array of small sensors is used to measure the time and, thus, distance. Solid-state LiDAR is often seen in cell phones, VR, and AR systems.
    • The resolution is limited at any given instance to the resolution of the ToF camera (higher resolution can be built up over time by combining with motion information).
    • The individual sensors typically require more photons to sense, which means it takes more time and thus a slower frame rate the farther away the system wants to detect objects.
    • The accuracy of the time measurement limits the Z/distance resolution.
    • The speed in AR/VR is typically slow. For example, the HoloLens 2 can take up to one second to determine far distances.
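The time-of-flight arithmetic behind methods 3 and 4 above can be made concrete with a small sketch (the function names and example values are mine, for illustration only):

```python
C = 299_792_458.0  # speed of light in m/s

def tof_distance(round_trip_ns):
    """Distance implied by a LiDAR round-trip time.

    The light travels out to the object and back, so the one-way
    distance is half the round trip.
    """
    return C * (round_trip_ns * 1e-9) / 2

def depth_error(timing_jitter_ns):
    """Depth uncertainty caused by timing jitter in the emitter/sensor chain."""
    return C * (timing_jitter_ns * 1e-9) / 2

print(tof_distance(6.67))  # an object ~1 m away returns light in ~6.67 ns
print(depth_error(1.0))    # 1 ns of timing error means ~15 cm of depth error
```

This is why time-of-flight systems need picosecond-class timing calibration to reach centimeter accuracy, whereas a triangulation-based approach sidesteps the timing problem entirely.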

The following table is a comparison made by VoxelSensors:

Conclusions / Summary

What I see as major standout capabilities of VoxelSensors over existing 3D perception methods:

  • Much faster initial detection of movement (~1 millisecond vs. typical 16-33 milliseconds)
  • Simple triangulation, rather than speed-of-light measurement or massive stereo image processing, gives advantages in the accuracy, speed (and power) of the depth calculation.
  • Highly sensitive Switching Pixels™ enable higher frame rate, longer distances, and better utilization of eye-safe laser light.

We will have to wait and see how the VoxelSensors system will shrink and what the Switching Pixels VoxelSensors will cost in production. I assume VoxelSensors will eventually be competitive with time-of-flight cameras in both size and cost but offer the above advantages.

Karl Guttag

3 Comments

  1. Hi Karl,
    Thanks for your recent articles. Appreciate the different angles.

    Speaking of Photonics West, could you or Brad please visit LetinAR’s booth?
    It seems reflective optics are the way out.
    Would be amazing to hear your take on LetinAR’s PinTilt enhancement that was recently publicized.
    https://www.youtube.com/watch?v=GL9e-uVjAcI

    Cheers

    • I’m going to Photonics West and the AR/VR/MR conference within the conference, but Brad is not. Brad and I discussed LetinAR in the video we recorded about CES. We recorded over 2 hours, and Brad is releasing it in 20 to 30-minute parts over the next few weeks (https://www.youtube.com/playlist?list=PLK1pU2H_tAvD8R8p6hoE0DhrvdIXObBif). Thanks for the video. It does a good job of explaining the newer LetinAR design. By using TIR and tilting the focusing mirror, they can also tilt the pin mirrors at a shallower angle, reducing their effect on the real world.

      LetinAR pin mirrors work and are much more efficient than waveguides (and thus can work with Micro-OLEDs), but I have reservations about the technique. First, technically, the pin mirrors should be spaced the size of the pupil to get a continuous image without a double image, but since the pupil varies in size with lighting and from person to person, there is no perfect spacing. You can also still see dimming circles and diffractive/blurring effects when you look at the real world.
