304 North Cardinal St.
Dorchester Center, MA 02124
I apologize for yet another delay in the next part about Field of View, but there was a bit of news I wanted to address on a timely basis. According to Nikkei Tech, on October 8, 2019, Sony announced an AR experience, Ghostbusters In the Park, running from October 12, 2019, to December 8, 2019, at Ginza Sony Park (warning: much of the Nikkei Tech article is behind a paywall). On the same day, Sony put a 1-minute commercial-type video on YouTube called “Ghostbusters AR Experience, Ginza Park, Japan (Sony).”
On various web forums that discuss displays this weekend, people were wondering what technology Sony was using in this new AR headset. None of the pictures I could find, nor the frames in the Sony Ghostbusters AR video, gave enough detail to verify the technology. My big break occurred when Adrian Stannard told me that he had seen the headset at SID 2019 during a paper presentation. Adrian was formerly the Director of Research and Lead Optical Engineer at Daqri.
Also, to get this article out quickly, I didn’t send it to Ron Padzensky. Ron often reviews these articles and catches many errors. I apologize in advance.
Adrian only got a quick look at the Sony headset after the presentation and thought it looked like a Lumus optical engine because he could see the partial mirror slats. From the rough pictures I had seen up to that point, it did look a lot like Lumus’s OE Vision 1080p optical engine that was shown at AWE 2019 in a Lenovo prototype (see the comparison on the left).
While it looked close, even in the lower-resolution images I had up to that point, the edges of the optics looked more squared off than those of the Lumus engine. Also curious was that the Sony video showed an LCD shutter in use (more on that later). Needing a shutter suggests that the display is not bright enough. Yet I knew the Lumus engine outputs several thousand nits, making it one of the brightest optical engines available.
With the lead from Adrian, I looked up the Sony papers at the 2019 SID Display Week Digest of Papers and found paper 2-24, “Latency Compensation for Optical See-Through Head-Mounted with Scanned Display” (behind a paywall if you are not a member of SID). As the paper’s title suggests, it mostly discusses motion prediction and pre-compensation for an AR headset. The display they are using is a Sony 1280 by 768 pixel interlaced Micro-OLED.
There was almost nothing on the optics other than a picture of the headset, but that picture was of high enough resolution to clearly show the optical combiner. It was a perfect match for the headset in the video (see below).
On the left, I have zoomed in on Sony’s combiner and on a picture I took of the Lumus optics. In these pictures, you can tell not only that the shape of the edges is different, but also that they are very different technologies.
First, the Sony optics are blocking a large percentage of the light, whereas the Lumus optics are nearly transparent, so the “slats” are barely visible in the Lumus waveguide. While it is hard to tell from this one angle, the Sony optics look like they are 3 to 4 times thicker than the Lumus.
In the Lumus design, the light bounces many times through the waveguide using total internal reflection (TIR). The slats are known as pupil expanders because they act to increase the pupil/eye-box. In the Lumus structure, the light ends up passing through the upper slats to get to the bottom louver. If the slats blocked as much light as seen in the Sony optics, almost no light would reach the bottom louver.
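To put rough numbers on that last point, here is a minimal Python sketch. It assumes a deliberately simplified model in which each slat passes a fixed fraction of the light on to the next one. The function name, the slat count, and the transmission values are all illustrative assumptions, not measurements of the Lumus or Sony optics.

```python
def light_reaching_slat(transmission: float, slats_passed: int) -> float:
    """Fraction of the injected light left after passing through
    `slats_passed` slats, each letting `transmission` (0..1) through.

    Simplified model: ignores TIR losses and multiple passes.
    """
    if not 0.0 <= transmission <= 1.0:
        raise ValueError("transmission must be between 0 and 1")
    if slats_passed < 0:
        raise ValueError("slats_passed must not be negative")
    return transmission ** slats_passed


# Hypothetical values: a nearly transparent slat versus one that
# blocks most of the light, with five slats above the bottom louver.
for label, transmission in (("nearly transparent", 0.9),
                            ("heavily blocking", 0.3)):
    remaining = light_reaching_slat(transmission, slats_passed=5)
    print(f"{label} slats: {remaining:.2%} of the light reaches the bottom")
```

Even in this crude model, the losses compound with every slat, which is why heavily blocking slats are incompatible with a structure where light must pass through the upper slats first.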
Sony is using a different mechanism than Lumus. The Sony optics can’t work like the waveguides used by Lumus, Hololens, Magic Leap, Digilens, and Waveoptics (to name a few), which rely on many TIR reflections. The Sony design is likely using at most a single TIR off the front surface, more like what is known as freeform optics. While each of the slats in the Sony design acts like a pupil expander similar to Lumus, there has to be a direct path, with perhaps one TIR kiss off the front glass, to each of the slats. To give a more direct path, the glass must be a lot thicker. Also, Sony’s thicker glass and slats will help with light throughput.
The type of Micro-OLEDs in the Sony Headset output on the order of 1000 nits. Even very bright (and expensive) OLEDs only go to 5,000 nits. 1,000 to 5,000 nits may sound like a bright display compared to a 600 nit smartphone or 200 nit computer monitor. But when you are making a transparent display, the AR combiner optics often relay much less than 10% of the nits to the eye.
With DLP and LCOS projectors, the light output can be well over 1 million nits as they can highly collimate LED light. High nits are the reason why DLP and LCOS are commonly used with waveguides while you never see OLEDs being used with waveguide optics. Laser scanning, as used on the Hololens 2, has a beam that at any instant in time puts out many millions of nits (enough to burn through the retina if the beam stops).
In addition to other losses, the output in nits of a waveguide is reduced by at least the ratio of the eye-box area at the output to the area of the input. Without getting into all the messy math and details, with a 1.8mm thin waveguide like Lumus, this reduction can be in the hundreds to 1, and Lumus can be almost 10 times better than typical diffractive waveguides. Thus if they used a 1000 nit Micro-OLED as the input, they would get well under 10 nits out, which is far too dim for anything but nighttime use. This is why you don’t see Micro-OLED displays paired with common waveguides, and why Sony used a different structure.
Even with the optical combiner Sony is using, they are likely going to have losses in nits of greater than 10 to 1. So if starting with a ~1000 nit Micro-OLED, they are probably only getting about 100 nits or less out to the eye. The light loss from the Sony type of combiner or, worse yet, from waveguide optical combiners is why companies like nReal went with a much simpler birdbath structure when using Micro-OLEDs.
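The brightness arithmetic from the last few paragraphs can be sketched in a few lines of Python. The display brightness and the "N to 1" ratios are the rough figures quoted above; the function and variable names are just for illustration. Since the text gives the losses as "more than" these ratios, the results should be read as upper bounds.

```python
def nits_to_eye(display_nits: float, loss_ratio: float) -> float:
    """Nits reaching the eye when the optics lose `loss_ratio` to 1."""
    if loss_ratio <= 0:
        raise ValueError("loss_ratio must be positive")
    return display_nits / loss_ratio


MICRO_OLED_NITS = 1000.0  # "on the order of 1000 nits"

# Rough loss ratios taken from the text; real losses are likely higher.
scenarios = {
    "thin waveguide (100 to 1)": 100.0,
    "Sony-type combiner (10 to 1)": 10.0,
}

for name, ratio in scenarios.items():
    eye_nits = nits_to_eye(MICRO_OLED_NITS, ratio)
    print(f"{name}: at most about {eye_nits:.0f} nits to the eye")
```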
The video at about 24 seconds shows them turning on what looks like an LCD shutter in front of the combiner to block most of the real-world light. I have captured frames from the video before and after the shutter is turned on.
So on top of Sony’s combiner optics blocking a lot of light, they have added an LCD shutter to block most of the ambient light for outdoor use.
The focus of Sony’s paper was on pre-compensation to deal with “motion to photon” delay. The picture of the headset that helped me identify the combiner structure was almost an afterthought. The Micro-OLED display they were using further complicates matters because it used 120Hz Interlaced video. Without any correction, in addition to other delay problems, they would see a comb-like effect on any edges where there is motion.
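The comb effect is easy to see in a toy model. The Python sketch below is only an illustration, not anything from Sony’s paper: it draws a vertical edge as text, with the even lines taken from the first field and the odd lines from the second field, by which time the edge has moved. All names and values are hypothetical.

```python
def render_row(edge_x: int, width: int) -> str:
    """One scan line: '#' left of the edge, '.' right of it."""
    edge_x = max(0, min(edge_x, width))  # keep the edge on screen
    return "#" * edge_x + "." * (width - edge_x)


def interlaced_frame(edge_x: int, pixels_per_field: int,
                     height: int, width: int) -> list:
    """Weave two fields into one frame.

    Even lines belong to the first field. Odd lines belong to the
    second field, shown one field period later, after the edge has
    moved by `pixels_per_field` pixels.
    """
    rows = []
    for y in range(height):
        field = y % 2  # 0 = first field, 1 = second field
        rows.append(render_row(edge_x + field * pixels_per_field, width))
    return rows


for speed in (0, 3):
    print(f"edge moving {speed} pixels per field:")
    print("\n".join(interlaced_frame(4, speed, height=6, width=12)))
```

With no motion, the edge is a clean vertical line; with motion, alternate lines disagree about where the edge is, which produces the comb.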
The paper says their hardware has a “warper [that] applies the inverse displacements to the rendered image on the frame buffer” (see picture below). Additionally, the Warper is doing operations to correct for lens distortion. Key here is that they are doing some significant processing of the rendered image in the frame buffer before it is sent to the display. Processing that is done after the frame buffer is known as “back-end” processing.
The basic idea is that to reduce the “motion-to-photon” latency, they are correcting for any user motion after the rendering has been done into the frame buffer and thus nearer to the time when the pixels will be seen. Furthermore, because the rendered image is being changed after the frame buffer, any lens distortion correction must be after this transform and thus must also happen after the frame buffer.
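As a rough illustration of correcting after the frame buffer, here is a minimal Python sketch that handles only a whole-pixel translation. It is not Sony’s warper, which also has to deal with lens distortion and presumably rotation; the function name and the idea of expressing head motion as a pixel displacement `(dx, dy)` are assumptions made for the example.

```python
def warp_frame(frame, dx, dy, fill=0):
    """Apply the inverse of a view displacement to a rendered frame.

    `frame` is a list of rows of pixel values. `(dx, dy)` is how far,
    in pixels, the view has moved since the frame was rendered. Each
    output pixel is fetched from where it sat in the rendered frame,
    so the content shifts by (-dx, -dy). Uncovered pixels get `fill`.
    """
    height, width = len(frame), len(frame[0])
    out = [[fill] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            src_x, src_y = x + dx, y + dy
            if 0 <= src_x < width and 0 <= src_y < height:
                out[y][x] = frame[src_y][src_x]
    return out


# A rendered frame with a one-pixel-wide vertical line in column 2.
rendered = [
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
]

# The view moved one pixel to the right after rendering, so the line
# must be shown one pixel to the left to stay fixed in the world.
for row in warp_frame(rendered, dx=1, dy=0):
    print(row)
```

The point of doing this at the back end is that `(dx, dy)` can come from the latest motion estimate, just before the pixels go to the display, rather than from the older pose used when rendering.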
It so happens that also at SID 2019, ARM was announcing their Mali-D77 “Display Processing Unit.” At ARM’s press conference, ARM talked about many of the same “motion to photon” issues. It also happens that the Sony design uses an ARM-based core with a GPU and display controller. As of this writing, I have not been able to confirm whether the Sony AR design uses the Mali-D77.
The whole idea of back-end processing (processing after the frame buffer) goes back over 40 years. Prior to working on displays, I worked for the better part of 20 years on both front- and back-end graphics processing, so I found the Sony processing and ARM’s Mali-D77 announcement personally interesting. I am going to indulge in a bit of personal history. You can stop here if you don’t want to hear about it.
The first device I worked on, way back in 1977, was the TMS9918 Video Display Processor (VDP). The TMS9918 was the very first device with Sprites. I was one of two engineers who figured out sprite processing, and I designed all of the sprite processing logic. The TMS9918 was different and sophisticated for the time, as we went through a series of processing steps rather than having hardware dedicated to what was known as “player graphics” processing.
The TI Home Computer, Colecovision, and the first generation MSX Computers in Japan all used the TMS9918 or its variants, the 9928 and 9929. Both Nintendo and Sega wrote software for Colecovision and were involved with the MSX computers in Japan. When these two companies designed their own home game machines, they used superset clones of the TMS9918’s Sprite processing.
Back in the late 1970s, only very high-end CAD computers could afford a frame buffer. The TMS9918 was the first consumer device to use DRAM, but we still had to generate the display “on the fly” as video games could not afford enough DRAM for a complete frame buffer. So, in essence, everything on the TMS9918 was back-end processing, including the Sprites.
In the early 1980s, I led the definition and design of the TMS340 family, the first fully programmable graphics processors. The TMS340 family had “SIMD” graphics operations almost 10 years before Intel put them on the Pentium as “MMX” instructions. The TMS340 family of processors was designed to work into frame buffers. Essentially, the TMS340 processors were front-end processors. Early on in the TMS340 program, we talked about doing bitmapped Sprites, what Xerox PARC called BitBlts.
Starting in the mid-1990s, Microsoft had a program known as “Talisman” that was going to do away with the frame buffer entirely and do everything on-the-fly from little cache buffers. Microsoft thought they would revolutionize the way computer graphics were done and hired many of the biggest names in computer graphics.
Talisman worked on 32 by 32 chunks of the image. Talisman called the objects they worked on “Sprites.”
One of the main underlying premises of the Talisman program was that DRAM, the only affordable type of frame buffer memory, was going to be too slow and that processing was going to be fast enough to support processing their Sprites on the fly. But then came Synchronous DRAM and later GDDR, something I also had a hand in helping create in the late 1980s, that dramatically improved the speed of DRAM. The block transfer speed of SDRAM made it more practical to keep doing all processing into a frame buffer and undercut Talisman’s back-end approach.
ARM’s Mali architecture splits the work into front-end processing before the frame buffer and back-end processing after it. With this architecture, they now have significant front- and back-end processing. In a way, we have come full circle. ARM’s Display Processing Unit (DPU) also uses a series of cache buffers to deal with issues like rotation without requiring massive buffering. ARM is still doing the bulk of all the rendering in the front-end. But with today’s technology, they are trying to intelligently split the work to reduce the motion-to-photon latency by moving some tasks to back-end (sprite) processing. As they say, “everything old is new again.”