Busting, Sony’s Ghostbusters AR Display (and a Little Sprite History)

Introduction

I apologize for yet another delay in the next part about Field of View, but there was a bit of news I wanted to address on a timely basis. According to Nikkei Tech, on October 8, 2019, Sony announced an AR Experience with Ghostbusters In the Park, running from October 12 to December 8, 2019, at Ginza Sony Park (warning: much of the Nikkei Tech article is behind a paywall). On the same day, Sony put a 1-minute commercial-type video on YouTube called “Ghostbusters AR Experience, Ginza Park, Japan (Sony).”

On various web forums that discuss displays this weekend, people were wondering what technology Sony was using in this new AR headset. None of the pictures I could find, nor frames in the Sony Ghostbusters AR video, gave enough detail to verify the technology. My big break came when Adrian Stannard told me that he had seen the headset at SID 2019 during a paper presentation. Adrian was formerly the Director of Research and Lead Optical Engineer at Daqri.

Also, to get this article out quickly, I didn’t send it to Ron Padzensky. Ron often reviews these articles and catches many errors. I apologize in advance.

Identifying the Technology

Adrian only got a quick look at the Sony headset after the presentation and thought it looked like a Lumus optical engine because he could see the partial mirror slats. From the rough pictures I had seen up to that point, it did look a lot like Lumus’s OE Vision 1080p optical engine that was shown at AWE 2019 in a Lenovo prototype (see the comparison on the left).

While it looked close, even in the lower-resolution images I had up to that point, the edges of the optics looked more squared off than the Lumus engine. Also curious was that the Sony video showed them using an LCD shutter (more on that later). Needing a shutter was indicative of the display not being bright enough. Yet I knew the Lumus engine outputs several thousand nits, making it one of the brightest optical engines available.

With the lead from Adrian, I looked up the Sony papers at the 2019 SID Display Week Digest of Papers and found paper 2-24, “Latency Compensation for Optical See-Through Head-Mounted with Scanned Display” (behind a paywall if you are not a member of SID). As the paper’s title suggests, it mostly discusses motion prediction and pre-compensation for an AR headset. The display they are using is a Sony 1280 by 768 pixel interlaced Micro-OLED.

There was almost nothing on the optics other than a picture of the headset, but that picture was of high enough resolution to clearly show the optical combiner. It was a perfect match for the headset in the video (see below).

Comparing the Sony and Lumus Combiner Optics

On the left, I have zoomed in on Sony’s combiner and on a picture I took of the Lumus optics. In these pictures, you can tell that not only is the shape of the edges different, but these are very different technologies.

First, the Sony optics are blocking a large percentage of the light, whereas the Lumus optics are nearly transparent. Therefore the slats are barely visible in the Lumus waveguide. While hard to tell from this one angle, the Sony optics look like they are 3 to 4 times thicker than the Lumus.

With the Lumus design, the light bounces many times through what is called a waveguide using total internal reflection (TIR). The slats are known as pupil expanders because they act to increase the pupil/eye-box. In the Lumus structure, the light ends up passing through the upper slats to get to the bottom louver. If the slats blocked as much light as seen in the Sony optics, almost no light would reach the bottom louver.

Sony is using a different mechanism than Lumus. It can’t work like the waveguides used by Lumus, Hololens, Magic Leap, Digilens, and Waveoptics (to name a few), which rely on many TIR reflections. The Sony design is likely using at most a single TIR bounce off the front surface, more like what are known as freeform optics. Each of the slats in the Sony design acts as a pupil expander similar to Lumus’s, but there has to be a direct path, with perhaps one TIR kiss off the front glass, to each of the slats. To give that more direct path, the glass must be a lot thicker. Sony’s thicker glass and slats will also help with light throughput.

OLEDs are Not Even Close to Being Bright Enough for Thin Waveguides

The type of Micro-OLED in the Sony headset outputs on the order of 1,000 nits. Even very bright (and expensive) OLEDs only go to 5,000 nits. 1,000 to 5,000 nits may sound like a bright display compared to a 600-nit smartphone or 200-nit computer monitor. But when you are making a transparent display, the AR combiner optics often relay much less than 10% of the nits to the eye.

With DLP and LCOS projectors, the light output can be well over 1 million nits, as they can highly collimate LED light. This high brightness is the reason why DLP and LCOS are commonly used with waveguides, while you never see OLEDs being used with waveguide optics. Laser scanning, as used on the Hololens 2, has a beam that at any instant in time puts out many millions of nits (enough to burn through the retina if the beam stops).

In addition to other losses, the output in nits of a waveguide is reduced by at least the ratio of the area of the input to the area of the eye-box at the output. Without getting into all the messy math and details, with a 1.8mm thin waveguide like Lumus’s, this reduction can be more than 100 to 1, and Lumus can be almost 10 times better than typical diffractive waveguides. Thus if they used a 1,000-nit Micro-OLED as the input, they would get well less than 10 nits out, which is far too dim for anything but nighttime use. This is why you don’t see Micro-OLED displays paired with common waveguides, and why Sony used a different structure.
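The area-ratio argument above can be sketched as a back-of-the-envelope calculation. The input and eye-box areas below are illustrative assumptions for a thin waveguide, not measured numbers from Lumus or Sony:

```python
# Back-of-the-envelope waveguide brightness: output nits drop by at least the
# ratio of the input aperture area to the eye-box area, on top of any other
# optical losses. All numbers here are illustrative assumptions.

def output_nits(input_nits, input_area_mm2, eyebox_area_mm2, other_efficiency=1.0):
    """Optimistic upper bound on nits reaching the eye after pupil expansion."""
    return input_nits * (input_area_mm2 / eyebox_area_mm2) * other_efficiency

# Hypothetical 100:1 eye-box-to-input area ratio for a thin waveguide:
print(output_nits(1_000, 4.0, 400.0))      # 1,000-nit Micro-OLED in -> 10.0 nits out
print(output_nits(1_000_000, 4.0, 400.0))  # 1M-nit collimated LCOS/DLP in -> 10000.0 nits out
```

This is why a million-nit collimated projector survives the waveguide loss while a Micro-OLED does not.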

Sony’s LCD Shutter

Even with the optical combiner Sony is using, they are likely going to have losses in nits of greater than 10 to 1. So starting with a ~1,000-nit Micro-OLED, they are probably only getting about 100 nits or less out to the eye. The light loss from Sony-type combiners, and worse still from waveguide optical combiners, is why companies like Nreal went with a much simpler birdbath structure when using Micro-OLEDs.

The video at about 24 seconds shows them turning on what looks like an LCD shutter in front of the combiner to block most of the real-world light. I have captured frames from the video from before and after the shutter is turned on.

So on top of Sony’s combiner optics blocking a lot of light, they have added an LCD shutter to block most of the ambient light for outdoor use.

Sony’s Latency Compensation and Optics Correction After Frame Buffer – Could They Be Using ARM’s Mali-D77 Display Processing Unit?

The focus of Sony’s paper was on pre-compensation to deal with “motion to photon” delay. The picture of the headset that helped me identify the combiner structure was almost an afterthought. The Micro-OLED display they were using further complicates matters because it uses 120Hz interlaced video. Without any correction, in addition to other delay problems, they would see a comb-like effect on any edges where there is motion.

The paper says their hardware has a “warper [that] applies the inverse displacements to the rendered image on the frame buffer” (see picture below). Additionally, the warper performs operations to correct for lens distortion. The key here is that they are doing significant processing of the rendered image in the frame buffer before it is sent to the display. Processing that is done after the frame buffer is known as “back-end” processing.

The basic idea is that to reduce the “motion-to-photon” latency, they are correcting for any user motion after the rendering has been done into the frame buffer and thus nearer to the time when the pixels will be seen. Furthermore, because the rendered image is being changed after the frame buffer, any lens distortion correction must be after this transform and thus must also happen after the frame buffer.
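As a minimal sketch of this kind of post-frame-buffer warp, assuming a simple translational displacement and a single radial distortion term (both hypothetical; the paper does not spell out Sony’s actual warp model):

```python
import numpy as np

def backend_warp(frame, predicted_dx, predicted_dy, k1=0.0):
    """Sketch of a back-end 'warper': shift the already-rendered frame by the
    inverse of the predicted head displacement (in pixels), then apply a
    simple radial lens-distortion correction (k1 term). Real hardware would
    do this just-in-time, near scan-out, with proper filtering; this uses
    whole-frame nearest-neighbor sampling for clarity."""
    h, w = frame.shape[:2]
    ys, xs = np.mgrid[0:h, 0:w].astype(np.float32)
    # Inverse displacement: sample where the content was originally rendered.
    xs_src = xs + predicted_dx
    ys_src = ys + predicted_dy
    # Radial distortion correction about the image center.
    xn = (xs_src - w / 2) / (w / 2)
    yn = (ys_src - h / 2) / (h / 2)
    r2 = xn * xn + yn * yn
    xs_src = xn * (1 + k1 * r2) * (w / 2) + w / 2
    ys_src = yn * (1 + k1 * r2) * (h / 2) + h / 2
    xi = np.clip(np.round(xs_src).astype(int), 0, w - 1)
    yi = np.clip(np.round(ys_src).astype(int), 0, h - 1)
    return frame[yi, xi]
```

Because this runs after rendering, it can use a head-pose prediction made milliseconds before scan-out rather than the pose known at render time.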

It so happens that also at SID 2019, ARM was announcing their Mali-D77 “Display Processing Unit.” At ARM’s press conference, ARM talked about many of the same “motion to photon” issues. It also happens that the Sony design uses an ARM-based core with a GPU and display controller. As of this writing, I have not been able to confirm whether the Sony AR design uses the Mali-D77.

Some Sprite-Processing (Personal) History and Context

The whole idea of back-end processing (processing after the frame buffer) goes back over 40 years. Prior to working on displays, I worked for the better part of 20 years on both front- and back-end graphics processing, so I found the Sony processing and ARM’s Mali-D77 announcement personally interesting. I am going to indulge in a bit of personal history; you can stop here if you don’t want to hear about it.

The first device I worked on, way back in 1977, was the TMS9918 Video Display Processor (VDP). The TMS9918 was the very first device with Sprites. I was one of two engineers that figured out sprite processing, and I designed all of the sprite processing logic. The TMS9918 was different and sophisticated for the time, as it went through a series of processing steps rather than having dedicated hardware for what was known as “player graphics” processing.

The TI Home Computer, Colecovision, and the first-generation MSX computers in Japan all used the TMS9918 or its variants, the 9928 and 9929. Both Nintendo and Sega wrote software for Colecovision and were involved with the MSX computers in Japan. When these two companies designed their own home game machines, they used superset clones of the TMS9918’s Sprite processing.

Back in the late 1970s, only very high-end CAD computers could afford a frame buffer. The TMS9918 was the first consumer device to use DRAM, but we still had to generate the display “on the fly” as video games could not afford enough DRAM for a complete frame buffer. So, in essence, everything on the TMS9918 was back end processing, including the Sprites.

In the early 1980s, I led the definition and design of the TMS340 family, the first fully programmable graphics processors. The TMS340 family had “SIMD” graphics operations almost 10 years before Intel put them on the Pentium as “MMX” instructions. The TMS340 processors were designed to render into frame buffers; essentially, they were front-end processors. Early on in the TMS340 program, we talked about doing bitmapped Sprites, what Xerox PARC called BitBlts.

Starting in the mid-1990s, Microsoft had a program known as “Talisman” that was going to do away with the frame buffer entirely and do everything on the fly from little cache buffers. Microsoft thought it would revolutionize the way computer graphics were done and hired many of the biggest names in computer graphics.

Talisman worked on 32 by 32 chunks of the image, and it called the objects it worked on “Sprites.”

One of the main underlying premises of the Talisman program was that DRAM, the only affordable type of frame buffer memory, was going to be too slow, and that processing was going to be fast enough to generate their Sprites on the fly. But then came Synchronous DRAM and later GDDR, something I also had a hand in helping create in the late 1980s, which dramatically improved the speed of DRAM. The block transfer speed of SDRAM made it more practical to keep doing all processing into a frame buffer and undercut Talisman’s back-end approach.

ARM’s Mali architecture splits front-end processing, before the frame buffer, from back-end processing after it. With this architecture, they now have significant front- and back-end processing; in a way, we have come full circle. ARM’s Display Processing Unit (DPU) also uses a series of cache buffers to deal with issues like rotation without requiring massive buffering. ARM is still doing the bulk of the rendering in the front end, but with today’s technology, they are trying to intelligently split the work to reduce the motion-to-photon latency by moving some tasks to back-end (sprite) processing. As they say, “everything old is new again.”

Karl Guttag

13 Comments

  1. How does the microOLED Sony was using compare to Emagin’s?

    • From what I remember (I don’t have time to check them right now), the specs are similar. The main difference is that Sony is making Micro-OLEDs in high volume with many design wins. If you see a camera with an OLED viewfinder, it is almost for sure Sony. The same goes for other consumer products, including AR, that I have found. So far as I can tell, eMagin has mostly been a player in defense-related products. There are also China-based companies that are showing Micro-OLED devices.

      • Thanks for the update. The device in their paper is 1280 by 768. Certainly, a 3X improvement would help, and this project would probably have access to Sony’s best panels, but I suspect they are still sub-100 nits to the eye and maybe a lot less.

        Karl

      • Adrian Stannard

        Hi Karl, there are some significant differences in the latest uOLEDs from Sony – the structure was redesigned at the wafer level to minimize the optical path length through color filters (eliminating color bleaching that was prevalent on their earlier panels), and more useful still (something some of us minnows have been asking micro-display manufacturers for years): adding an MLA (micro lens array) to reduce the NA to make it close to the FOV you want. Their cone angle is ~40° – convenient for most HMDs, since, as you allude to in your previous excellent article, much more than that is largely wasted – and it also increases luminance since light is not radiated over the full solid angle. If you exceed +/-20° then the intensity rolls off (and color changes). At Display Week they demonstrated these panels, giving 3000 and 5000 cd/m^2 (the latter being quarter area due to electronic drive reasons) – they are indeed impressively bright – some 3000-nit samples I received were closer to 4000 nits. Emagin panels, although impressive, need to remove the color filters in order to achieve similar luminance, which as you can imagine results in a considerably reduced occupation of the CIE color space.

        Sony’s [edited at author’s request] would certainly make a good photo-emissive projector option (ignoring persistence artifacts!) whilst we all wait for suitable microLEDs to reach maturity.

      • Thanks for the update on Sony’s uOLEDs, Adrian,

        I may be a bit out of date on what they are offering on the market. Also, an internal Sony AR headset might have access to newer technology than the broad market. When working on AR and wanting high see-through, typically over 90% of the nits-in are lost in the combining optics, so even 5,000 nits does not change things much for broad-market AR. There are exceptions to the 90+% loss “rule,” particularly in high-cost applications where notch filters are used. Most notably, green notch-filter HUD combiners have been used in the military since the 1960s (in the early days they used narrow-spectrum phosphor CRTs; today it can be green uOLEDs).

        I did consider the Sony paper, “High-Efficiency OLED Microdisplay with Microlens Array,” also at SID, which is where they use MLAs, and noted that it was a recent paper and was only a lower-resolution 640×360 device. In most cases like this, it is several years before the technology migrates to the mainstream product line at higher resolutions. Still, it could be that the technology migrated earlier to a special internal Sony AR headset.

        Regardless, even starting with 5,000 nits, they were in a very tight box where they needed a special combiner structure that blocks a lot of light, and even with this, Sony added an LCD shutter to block even more ambient light. All this suggests an AR headset without a lot of nits output.

      • Thanks Carl

  2. The pictured Sony waveguide really reminds me of the Optinvent design.

  3. What’s your take on Gallium hmd from kura.tech?

    • While interesting, I’m more than a bit skeptical about the Gallium, particularly its time frame. They are trying to make big strides in both optics and displays and they are a small startup. Even if what they are doing is possible, I would find it highly unlikely they could have it up and running in 2020.

      I know a bit more but I have limits as to what I can say at this point.

  4. Dear Karl, I just wanted to thank you for your in-depth articles. This really helps to understand the AR industry and make informed decisions about, e.g., buying new equipment for AR and VR development.
    Thanks again!

    p.s. What is your opinion on the Nreal Light FOV? It is now reported to have only a 52° field of view. That would be a clear step backwards (?) coming from dreamglasses with their approx. 90°…

    • I think Nreal’s FOV is a good match for a 1080p display. It supports about 1.4 arcminutes per pixel, which is reasonably good for supporting fine text and detail (1 arcminute per pixel is nearing the eye’s limit, what Apple calls “Retina”). As I pointed out (using Thad Starner’s info) in the article, the “best” seat in a movie theater typically has about a 35-degree FOV horizontally (about 40 degrees diagonally), and IMAX is more like 45 degrees horizontally and 50 degrees diagonally. So by this measure, Nreal is giving the FOV of an IMAX theater. The image is also very high in contrast (when looking at a black background through the glasses) and has minimal distortion. Nreal is also much smaller and lighter than, say, Dream Glasses.

      As I wrote in my FOV Obsession article (https://www.kguttag.com/2019/10/07/fov-obsession/), I think that people are putting too much emphasis on FOV. I think to some degree this is a carryover from VR where it was easy to support a very wide FOV. A wide FOV with low angular resolution would seem to be a gaming requirement. I tend to think that if the application is gaming, VR type headsets make more sense.

      DreamWorld (I think you mean that rather than “dreamglasses”: https://www.dreamworldvision.com/) uses a large off-axis combiner. This design achieves a wide FOV because it uses a large flat-panel display. While it has a large FOV, the angular resolution is only about 3.5 arcminutes per pixel, which means that you will see the individual pixels (screen-door effect). Because it is off-axis, there will be both distortion and a tilted focus plane. You can “correct” for the distortion, but with a loss in resolution; you can’t fix the tilted focus.
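      The arcminutes-per-pixel figures in this reply are simple arithmetic; a quick sketch (assuming the 52° FOV is measured on the diagonal of a 1080p panel):

```python
import math

def arcmin_per_pixel(fov_deg, pixels):
    """Angular resolution along one axis: arcminutes of FOV per pixel."""
    return fov_deg * 60 / pixels

# Nreal Light: ~52-degree diagonal FOV across a 1080p panel's diagonal.
diag_pixels = math.hypot(1920, 1080)                # ~2203 pixels on the diagonal
print(round(arcmin_per_pixel(52, diag_pixels), 2))  # -> 1.42 arcmin/pixel

# Pixels needed on that diagonal for ~1 arcmin/pixel ("Retina"):
print(round(52 * 60 / 1.0))                         # -> 3120
```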

      The key question is “what is the application?”

  5. […] have a massive transmission burden and will have to have better back end display processing (ex., ARM’s Mali-D77) to adjust for “motion to photon” time lags. Even taking all the processing burdens away, there […]
