Varjo Foveated Display and Passthrough AR at AWE 2018

Introduction

I’m long overdue to report on some things I saw at AWE in late May 2018. Having written two articles on Varjo (article one and article two) back in 2017, I was curious to see their demonstration. In recent news, Varjo announced that they raised $31 million in a Series B round. Varjo was nice enough to invite me to their suite to see some demos and let me take whatever pictures I wanted.

I wanted to see A) how it looked with my own eyes, and B) whether they had solved the eye tracking and foveated display image movement I was skeptical about back in 2017 (see my 2017 article two). The answer to A) is that when you look straight ahead it looks very good, but you can see the boundaries if you look around. The answer to B) was “not so much.”

While it was disappointing for me not to see any demonstration of eye tracking and display movement from Varjo, I did enjoy their pass-through (camera) AR demo that was stunning in some ways. Varjo said they had just gotten the AR demo up and running right before the show. While it was a bit crude, it was still effective in demonstrating the concept.

Foveated Display (Quick Intro)

Foveated display technology is based on the fact that human vision has high resolution only at the center of the eye (the fovea). The concept is to put a high-resolution image wherever the eye is looking and fill in the rest with a lower-resolution image to impart a feeling of fuller immersion. Varjo combines a large flat panel display, of the kind used in VR headsets, for the periphery with a small OLED microdisplay for the “foveated” (center) image, and uses a beam-splitting mirror to combine the two images.

In theory, the beam splitter is supposed to move, causing the image from the OLED microdisplay to track the eye. While conceptually easy, this is very hard to solve in practice, and it is the part that Varjo chose not to demonstrate. When I asked at the show why they were not demonstrating it, I was given a somewhat waffling answer to the effect that their lab prototype was noisy and that people were so thrilled with the static (fixed mirror, no eye tracking) version that they decided to concentrate on it.

Static Foveated Display

The demo started with their static foveated display which consists of a modified VR headset with a static beam combiner and microdisplay. It does “work” if you look straight ahead, and the image quality is consistent with the pictures Varjo released in 2017 that were taken through the actual optics.

Varjo demonstrated a series of images, and as I have often written on this blog, a simple black-and-white test pattern is often the toughest thing for a color display to get right. Shown below is such a test pattern. The camera tends to exaggerate the transition, because when you are looking straight ahead, this region is meant to be seen by parts of the eye with much lower resolution.

One issue I did notice in the central region is some chromatic aberration. A fundamental issue for a foveated display is that both the peripheral (low-resolution) and foveated (high-resolution) images must go through the same main lens (see diagram above). Varjo is using a lens that may be just good enough for the peripheral image but is too low in quality for the foveated image, hence the visible chromatic aberration.

Ron Padzensky, who reviewed and helped edit this article, commented that while this demo impressed him, the transition from high to low resolution was so abrupt that he noticed it. This is an issue I wrote about in the Part 2 article on Varjo. Additionally, for the same reason a foveated display is supposed to work (only the fovea sees in high resolution), the lack of eye tracking means the eye can turn and bring the fovea onto the transition, where it will notice it.

The next photo shows a more complicated scene, and none of the problems can be seen. The complexity of the image helps hide flaws that would show up on, say, a flat white background. At the same time, it demonstrates the high resolution of the foveated region (outlined in red dots).

Passthrough (Camera) AR Display

I found Varjo’s pass-through AR demo particularly interesting. The headset had two cameras on the front, and the room was rigged for motion tracking. I believe they are still using an optically modified Oculus Rift headset (see picture on the left) for this demo rig.

When you put the rig on, you see a 3-D rendered motorcycle in the room that you can walk up to and around, with the motorcycle convincingly behaving as if it is in the room. Unlike optical AR (e.g., Magic Leap and HoloLens), the virtual objects look solid, not ghostly. It is hard to tell the difference between what is real and what is virtual.

When you look straight ahead, you see what appears to be an extremely high-resolution image, which adds to the sense of the motorcycle being there. Some features of the motorcycle’s computer model, however, look like they could use more polygons, as some curves turn into visible line segments (for example, the back of the motorcycle’s mirror).

On the crop of the picture through the optics (left) and on the larger picture below, I have outlined the rough boundary of the foveated image with a dotted red line. You can see the transition with the camera, but you cannot if you are looking straight ahead.

While I didn’t notice significant lag between head movement and the image changing, I was not rigorously checking for it during the demo. I also didn’t check how the real world looked in the foveated display region, but it would clearly be limited by the cameras’ resolution.

What’s Hard with Optical AR Is Often Easy with Passthrough AR and Vice Versa

I want to make clear that passthrough AR has its advantages and disadvantages compared to optical AR. Some of the hardest challenges with Optical AR are almost trivial with Passthrough AR.

Passthrough AR has major advantages regarding opacity and hard edge occlusion. It is also much easier to balance the brightness between the virtual and real worlds. The optics are generally also much simpler.

There are also many serious downsides to passthrough AR compared to optical AR. No display can match the human visual system. There will always be some lag between input (the cameras) and output (the display). The real world, as seen through the cameras and display, does not vary in focus properly, which causes vergence-accommodation conflict. Passthrough AR headsets are also going to be bulkier, and they entirely block the user’s direct view of the real world, thus isolating the user.

Static Foveated Display is 99% of the Visual Impact, But Maybe Only 1% of the Solution

According to Varjo at AWE 2018, they have delayed perfecting the eye tracking and display movement to concentrate on building static foveated displays. This is despite what Varjo has said previously and what is on their website: “By tracking your eyes in real-time, Bionic Display™ delivers a flawless and completely accurate image that far surpasses anything on the market today.”

Certainly the “static” (non-eye-tracking) foveated display makes a good first impression, with the user perceiving a very high-resolution image. But it only works if the user stares straight ahead with little eye movement. If there were detail across the FOV, such as text, the user would notice that only the center is sharp.

Unfortunately, accurately tracking the eye and getting everything to line up and work optically, including keeping everything in focus, is by far the hardest part of the foveated display problem. There are a number of difficult problems to be solved regarding eye tracking, software/algorithms, moving the foveated part of the image, and the overall optics.

Conclusion: Racing to Get There Ahead of Higher Resolution Flat Panels for VR

There are good reasons to believe foveated rendering will be used in the future to reduce the computational load of VR headsets. Companies no less than Nvidia and Microsoft have published papers on studies using large flat panels (see my first article on foveated displays). These studies used conventional flat panels and simply varied the computation based on eye tracking.

Small flat panel displays have continued to shrink their pixel sizes, which in turn can be used to improve angular resolution. While first-generation VR headsets (à la Oculus Rift) were at a very chunky ~4.4 arcminutes per pixel, the Oculus Quest is at ~3 arcminutes per pixel. Though most designers’ goal is about 1 arcminute per pixel, somewhere around 1.5 arcminutes per pixel is often considered “good enough” for most practical uses, particularly game playing. The inevitable “crank turning” of flat panel displays is expected to keep resolution improving and closing in on these goals.
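To see what these targets mean in pixel counts, the average angular resolution is just the FOV divided by the number of pixels across it. The Python sketch below assumes a hypothetical 100-degree FOV along one axis and averages over the whole field; real headset optics do not spread pixels evenly, so this is a rough first-order estimate, not a headset specification.

```python
import math

def arcmin_per_pixel(fov_deg: float, pixels_across: int) -> float:
    """Average angular size of a pixel, in arcminutes, over the field of view."""
    return fov_deg * 60.0 / pixels_across

def pixels_needed(fov_deg: float, target_arcmin: float) -> int:
    """Pixels needed across fov_deg to average target_arcmin per pixel."""
    return math.ceil(fov_deg * 60.0 / target_arcmin)

if __name__ == "__main__":
    fov = 100.0  # hypothetical FOV in degrees, one axis
    for target in (4.4, 3.0, 1.5, 1.0):
        print(f"{target} arcmin/pixel needs {pixels_needed(fov, target)} pixels across {fov:.0f} deg")
```

Under this averaging assumption, going from 1.5 to 1 arcminute per pixel over 100 degrees means going from 4,000 to 6,000 pixels per axis.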

The question becomes whether a “physical” foveated display, with an eye-tracked, optically moving high-resolution region, will become practical before flat panels reach “good enough” angular resolution on their own.

Acknowledgment

I would like to thank Ron Padzensky for reviewing and making corrections to this article.

Karl Guttag

11 Comments

  1. […] Varjo foveated display and passthrough AR at AWE 2018 […]

  2. Having read your previous blog entries, I had come to the conclusion that there were some hard limits when increasing the resolution of small displays. Here you say that it’s just a matter of time and “crank turning” before the displays hit the magic number.
    My question is, what is the hard limit of current technology as far as arcminute resolution goes?

    • Thanks,

      I should have made a distinction on pixel size and resolution between VR and “AR-Passthrough” on one hand and full (see-through) AR on the other. With VR/Passthrough, they are better able to “ride the curve.” The pixel sizes still have to come down considerably, and there is a chance progress could slow, as they are already pushing what a person can see with direct-view displays. At some point, VR will have to carry the whole development cost of making pixels smaller rather than living off the improvements made for cell phones.

      Another factor is the distance of the display from the eye. If you take the same flat panel, move it farther from the eye, and adjust the optics accordingly, the FOV is reduced but the angular resolution improves. You can imagine putting a lightweight OLED flat panel a bit farther from the eye to gain angular resolution and making it a bit bigger to get back the FOV.
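A minimal sketch of this trade-off, assuming an idealized magnifier with the panel at the focal plane of a thin lens of focal length f (a longer f stands in for moving the panel farther from the eye). Under that assumption, the FOV is 2·atan(W/2f) and a center pixel of pitch p subtends 2·atan(p/2f). The panel widths, the 30-micron pitch, and the focal lengths are all hypothetical.

```python
import math

def magnifier_view(panel_width_mm: float, pixel_pitch_um: float, focal_length_mm: float):
    """Idealized magnifier: panel at the focal plane of a thin lens.
    Returns (full FOV in degrees, arcminutes subtended by the center pixel)."""
    fov_deg = math.degrees(2.0 * math.atan(panel_width_mm / (2.0 * focal_length_mm)))
    pitch_mm = pixel_pitch_um / 1000.0
    center_arcmin = math.degrees(2.0 * math.atan(pitch_mm / (2.0 * focal_length_mm))) * 60.0
    return fov_deg, center_arcmin

if __name__ == "__main__":
    # Hypothetical 30-micron-pitch panels; a longer f ~ panel farther from the eye.
    for width_mm, f_mm in ((60.0, 40.0), (60.0, 60.0), (90.0, 60.0)):
        fov, res = magnifier_view(width_mm, 30.0, f_mm)
        print(f"W={width_mm:.0f} mm, f={f_mm:.0f} mm: FOV {fov:.1f} deg, {res:.2f} arcmin/pixel")
```

With these made-up numbers, going from f = 40 mm to 60 mm shrinks the FOV from about 74° to about 53° while improving center resolution from about 2.6 to 1.7 arcminutes per pixel; widening the panel to 90 mm at the same pitch restores the ~74° FOV at the finer resolution.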

      It is hard to know how many crank turns (years) it will take for flat panels. Some of this depends on whether cell phones keep pushing more pixels, as they are already getting close to the limit of even people with extremely good eyesight. Cell phones have the kind of volume that enables the R&D. Without cell phones driving it, the rate of reduction in pixel size could slow down. Another issue is the technology. I’m a big long-term believer that MicroLED (inorganic, not OLED) will eventually take over everything, but it could be years away from high-volume production. MicroLEDs might change the equation, but I suspect they will continue the path set by OLED phone displays in terms of pixel size, as they will use similar driving substrates but change the technology built on those substrates.

      BTW, it is a very different issue for see-through displays, where a cellphone-sized display really does not work well (due to weight, form factor, and brightness).

      As you can see, I can argue either side of foveated displays versus pixels getting smaller. The question becomes whether they can solve a true foveated display (eye tracking and a moving high-res image) at a cost and reliability that works before the flat panels solve the problem on their own.

  3. Excellent thoughts. Very holistic view of the market.
    I suppose the question of “What is the smallest possible pixel you can make” is somewhat moot in this discussion. Given enough time and resources, you could make LEDs at the molecular scale. In fact, here is an article about just that from 2014. https://www.eitdigital.eu/news-events/blog/article/pixel-as-small-as-it-can-get
    This doesn’t really matter for the kinds of things you are discussing here though. It’s not a question of “Can we”, but more of “Is this a financially sound decision over the next few years”. Obviously someone at Varjo thinks so, but someone at Samsung might think otherwise. Maybe the market for XR will have enough momentum to keep the display crank turning for long enough to make Varjo obsolete. Maybe ultra high resolution displays will stay expensive enough to give them a reasonable business edge. Maybe a third party with more experience with tiny displays, tracking and adjusting optics (I’m thinking of a camera manufacturer and image stabilization) will sweep them. As you said here, they only have a small percentage of the solution figured out so far.
    Maybe a static foveated display is good enough. As a commenter on your previous post about Varjo said, they would prefer to have static now than wait for eye tracking. Players in the XR hardware market are eager to distinguish their offerings from the competition and given a low enough price point in volume, a small patch of readability might be the key to a successful marketing campaign.
    As always, good to read your work. Thanks for posting.

    • There are also issues with trying to make the transistors for each pixel smaller. It gets to be a challenge, particularly for OLEDs and MicroLEDs, which have to drive current for each pixel and thus need more circuitry (LCDs are essentially driven with passive/static voltages). As you scale down, you need not only smaller but also better transistors. This means going to more expensive technology and, for very small displays, ICs. There is a saddle curve where very large pixels for, say, TVs get expensive (thus you pay a lot more for a 70″ than a 55″ TV), the pixels get more expensive as they get smaller for cell phones, and then they get much more expensive on ICs (microdisplays).

      There is a big gap that has yet to be filled between the smallest cell phone pixels, at about a 50-micron pitch or 2,500 square microns (the iPhone Xs is about 57 microns), and microdisplay pixels, typically at a 4- to 10-micron pitch or 16 to 100 square microns. The gap is caused by the transistor sizes and the jump between “low-temperature polysilicon” and crystalline (IC) transistors. I talked about this gap in https://www.kguttag.com/2017/06/07/gaps-in-pixel-sizes/
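The pitch and area figures are related by simple arithmetic, assuming square pixels. The sketch below converts pixels per inch to pitch (25,400 microns per inch) and shows the size of the gap in area terms; the 508 ppi value is used only because it lands exactly on a 50-micron pitch, not because any particular phone uses it.

```python
MICRONS_PER_INCH = 25_400.0

def pitch_um_from_ppi(ppi: float) -> float:
    """Pixel pitch in microns for a given pixels-per-inch density."""
    return MICRONS_PER_INCH / ppi

def pixel_area_um2(pitch_um: float) -> float:
    """Area of one pixel in square microns, assuming square pixels."""
    return pitch_um * pitch_um

if __name__ == "__main__":
    phone = pixel_area_um2(pitch_um_from_ppi(508))  # 50-micron pitch
    for micro_pitch in (10.0, 4.0):
        micro = pixel_area_um2(micro_pitch)
        print(f"{phone:.0f} um^2 vs {micro:.0f} um^2: {phone / micro:.2f}x larger area")
```

A 50-micron phone pixel has 25 times the area of a 10-micron microdisplay pixel and about 156 times that of a 4-micron one.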

      One of the big problems as they scale LEDs smaller is light extraction or getting the photons out toward the eye. Typically, very small LEDs have issues with light extraction.

      It is both an issue of market driving forces and the inherent difficulty of making improvements. As you approach physical limits, the problems can get exponentially more difficult and expensive. Without a market driver, it becomes financially difficult to justify spending on tough-to-solve problems. One reason I believe in MicroLEDs, in addition to their technical superiority to OLEDs, is that there is so much R&D going on with LEDs today due to their use in lighting. While not directly relevant, the understanding of the physics and manufacturing/yield is likely to spin off into other areas.

  4. Karl, great read as always. I have a question about pixel size and density. If I were to use a display very close to the eye, what would the pixel pitch and density need to be to obtain effects such as those in the fictional viewer of Tony Stark, a.k.a. Iron Man? I would think the trick would be to go super high-res directly in front of the eye, which should also provide infinite depth of field. What are your thoughts? Also, I would imagine there would need to be a certain distance between the pixels to preserve angular resolution. It may be impossible with our current state of technology, but humor me, please.

    • It is best to think in terms of “angular resolution” when it comes to near-eye displays. The common metric is the arcminute (1/60th of a degree). One arcminute per pixel is considered “retinal resolution,” as made famous by Apple. 300 pixels per inch at “reading distance” (about 10 to 12 inches) is roughly 1 arcminute per pixel. Most of the optics in a near-eye display don’t so much enlarge the image as allow the eye to focus at close distances.
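The 300 ppi rule of thumb can be checked directly: the angle a pixel subtends is roughly its pitch divided by the viewing distance. A small Python sketch, assuming the pixel is viewed head-on:

```python
import math

def arcmin_per_pixel_direct(ppi: float, distance_in: float) -> float:
    """Angle subtended by one pixel, in arcminutes, for a direct-view display
    with ppi pixels per inch seen head-on from distance_in inches."""
    pitch_in = 1.0 / ppi
    angle_rad = 2.0 * math.atan(pitch_in / (2.0 * distance_in))
    return math.degrees(angle_rad) * 60.0

if __name__ == "__main__":
    for d in (10.0, 12.0):
        print(f"300 ppi at {d:.0f} in: {arcmin_per_pixel_direct(300, d):.2f} arcmin/pixel")
```

At 12 inches this gives about 0.95 arcminutes, and at 10 inches about 1.15, consistent with “roughly 1 arcminute per pixel.”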

      While 1 arcminute is considered “retinal,” a person with very good vision can detect things like jagged steps in a very high-contrast image down to about 0.5 arcminutes. For most practical uses, 1.5 arcminutes per pixel is probably good enough, particularly for AR, where the contrast is not going to be great against an arbitrary background.

      The eye’s visual acuity falls off rapidly from the center, but the eye moves around and the human visual system “stitches” together the view a person perceives. The idea of foveated displays is that if you track the eye and move the image so the high-resolution part is where the eye is looking, you can get away with vastly fewer pixels and less computation.
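A back-of-the-envelope pixel budget shows why. The sketch below compares a uniform display at 1 arcminute per pixel with a two-display foveated design. The square 100°×100° FOV, the 20°×20° inset at 1 arcminute, and the 4-arcminute periphery are all hypothetical numbers, and the count ignores the eye tracking and moving optics that make a real foveated display hard.

```python
def pixels_uniform(fov_deg: float, arcmin_per_px: float) -> int:
    """Pixels in a square fov_deg x fov_deg field at a uniform angular resolution."""
    per_side = round(fov_deg * 60 / arcmin_per_px)
    return per_side * per_side

def pixels_foveated(fov_deg: float, periph_arcmin: float,
                    inset_deg: float, inset_arcmin: float) -> int:
    """Two-display design: a low-res panel covering the whole field (including
    behind the inset) plus a high-res inset covering inset_deg x inset_deg."""
    return pixels_uniform(fov_deg, periph_arcmin) + pixels_uniform(inset_deg, inset_arcmin)

if __name__ == "__main__":
    uniform = pixels_uniform(100, 1.0)
    foveated = pixels_foveated(100, 4.0, 20, 1.0)
    print(f"uniform: {uniform:,}  foveated: {foveated:,}  ratio: {uniform / foveated:.1f}x")
```

With these assumptions, the uniform display needs 36 million pixels per eye versus about 3.7 million for the foveated pair, roughly a 10× reduction.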

      It sounds like you are confusing depth of field with resolution. Depth of field is a matter of how the light rays are focused, not how many pixels there are: a super-high-resolution image that is out of focus will still be out of focus. There are ways, such as using laser light either to scan the image or to illuminate a panel display (DLP and LCOS, for example), where the image will have a very large depth of field because the light rays from each pixel are highly collimated and thus don’t require focusing (very roughly speaking).

  5. Very nice review of Varjo’s foveated display system.
    In order to achieve angular resolution ~60 ppd and FOV >100° simultaneously, a display panel with over 6K×6K pixels would be ultimately needed for each eye. Such a huge increase of pixel number would bring several challenges and difficulties:
    1. Panel fabrication. For both liquid crystal displays (LCD) and organic light-emitting diode (OLED) displays, integrating over 36 million (6K×6K) pixels on a small panel is definitely not an easy task. With the tremendous efforts of panel manufacturers, VR display resolution has recently been pushed to around 4K×4K, although there is still a long way to go for low-cost mass production.
    2. Driving electronics. To drive a high-resolution 6K×6K display panel with over 90 Hz refresh rate, the addressing time would be much shorter for each scan line. New driving circuitry designs are needed.
    3. Data transport. Assuming a 90 Hz 6K×6K display panel, we need to deliver more than 70 Gb/s to the display panel, not to mention light field displays with space-, time-, or polarization-multiplexing.
    4. Image rendering. It would be extremely challenging to render high-resolution images and videos in real-time, due to the limitation from the computing power. More efficient image rendering pipeline must be developed as well.
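Points 2 and 3 follow from straightforward arithmetic. A rough sketch, assuming uncompressed 24-bit RGB, a single panel, and no blanking intervals (real display links add overhead and often use compression):

```python
def raw_data_rate_gbps(width: int, height: int, refresh_hz: float,
                       bits_per_pixel: int = 24) -> float:
    """Uncompressed pixel data rate in Gb/s, ignoring blanking and link overhead."""
    return width * height * refresh_hz * bits_per_pixel / 1e9

def line_time_us(height: int, refresh_hz: float) -> float:
    """Time available to address each scan line, in microseconds, with no blanking."""
    return 1e6 / (refresh_hz * height)

if __name__ == "__main__":
    print(f"{raw_data_rate_gbps(6000, 6000, 90):.1f} Gb/s")
    print(f"{line_time_us(6000, 90):.2f} us per line")
```

For 6K×6K at 90 Hz this gives about 77.8 Gb/s and roughly 1.85 microseconds per scan line, in line with the “more than 70 Gb/s” figure above.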

    • You make some good points. For VR/non-see-through displays, it would seem the trend in display panels will get there in about 10 years, give or take 5 years. This is just judging from trend lines in pixel size and pixel count. The 5-year spread depends on whether there is a dedicated effort, as display manufacturers may judge VR as too small a market to make a major effort. Without a market, it may take even longer.

      Per your #4, they could use foveated rendering with a very-high-resolution panel, but not a foveated display; basically, just add eye tracking. They could combine this with variable-resolution driving, driving groups of pixels in the user’s peripheral vision together, to help with your #2 above. Similarly, foveated rendering could reduce the bandwidth requirements, and there are already companies working on this problem.
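How much grouping peripheral pixels could save can be sketched by counting the distinct pixel values driven per frame. The scheme and numbers here are hypothetical: a 6000×6000 panel, a 1200×1200 full-resolution region around the gaze point (about 20° at 1 arcminute per pixel), and the rest driven in 4×4 blocks. The count does not depend on where the eye-tracked region sits, as long as it aligns with the block grid.

```python
def driven_values(panel_px: int, fovea_px: int, group: int) -> int:
    """Distinct pixel values per frame on a square panel_px x panel_px panel when a
    fovea_px x fovea_px region is driven at full resolution and the remaining
    pixels are driven together in group x group blocks (hypothetical scheme)."""
    full_res = fovea_px * fovea_px
    periphery = panel_px * panel_px - full_res
    # Round up so leftover pixels still count as one more block
    # (an approximation; exact when the region aligns with the block grid).
    blocks = -(-periphery // (group * group))
    return full_res + blocks

if __name__ == "__main__":
    full = 6000 * 6000
    grouped = driven_values(6000, 1200, 4)
    print(f"{grouped:,} values per frame vs {full:,} ({full / grouped:.1f}x fewer)")
```

That works out to 3.6 million values per frame instead of 36 million, about a 10× reduction in data to render and transmit, before any compression.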

      For AR/see-through/small-and-light-weight, it is a different story. I don’t see any trends that would get them to 6K by 6K with a pixel size that will work for optics in less than 20 years. Something like a foveated display may be the only way to get to ~1 arcminute with a > 90-degree FOV.

  6. Good day sir!
    Excellent blog! Although this specific post is almost a year old, I’m still interested in some technical details of this pass-through system (especially the cameras). On Varjo’s website, you can read that they are using 2×12MP cameras at 90 Hz, which sounds great. Do you have any insights about them (manufacturer, lens types, etc.)?
    Please keep up this great source!
    Best wishes
    Frank

    • Thanks,

      Sorry, I don’t have any details on the cameras. The “usual suspects” for camera chips would be Sony or Omnivision, but then there are many companies making the optics.
