304 North Cardinal St.
Dorchester Center, MA 02124
As discussed in Part 1, the basic concept of a foveated display should, in theory, provide high angular resolution with a wide FOV. No single near-to-eye display technology today provides both. Microdisplays (LCOS, DLP, and OLED) support high angular resolution but not a wide FOV, while larger flat panel displays (OLED and LCD) support a wide FOV but only with low angular resolution.
The image above left includes crops from the picture on Varjo’s website called “VR Scene Detail” (the whole annotated image appears toward the end of this article). Varjo included both the foveated and un-foveated image from the center of the display. The top rectangle in red is taken from the top edge of the picture, where we can just see the transition starting from the foveated image to what Varjo calls the “context” or lower resolution image. Blending is used to avoid an abrupt transition that the eye might notice.
The topic of foveation gathered additional interest with Apple’s acquisition of SMI, the eye tracking technology company that provided the eye tracking for Nvidia’s foveated rendering HMD study (see below). It is not clear at this time why Apple bought SMI; it could be for foveated rendering (f-rendering) and/or foveated display (f-display).
The common human visual acuity charts (right) give some feel for why foveation (f-rendering and/or f-display) works. But these graphs are for static images of high-contrast black and white line pairs. While 1 arcminute per pixel (roughly 300 dpi at about 10 inches) is commonly cited as good normal vision, people can detect detail down to about 1/2 arcminute, and a long, single, high-contrast line down to about 1/4 of an arcminute. The point here is that these graphs are a one-dimensional slice of a multi-dimensional issue.
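As a quick check on the rule of thumb above, the angle a pixel subtends follows from the pixel pitch and the viewing distance. The sketch below (function names are illustrative) assumes the pixel is viewed head-on. By this calculation, 300 dpi at 10 inches is about 1.15 arcminutes per pixel, and a true 1 arcminute at 10 inches needs roughly 344 dpi, so the rule of thumb is an approximation.

```python
import math


def arcmin_per_pixel(dpi: float, distance_in: float) -> float:
    """Angle one pixel subtends (arcminutes) at `distance_in` inches, viewed head-on."""
    pitch_in = 1.0 / dpi  # pixel pitch in inches
    angle_rad = 2 * math.atan(pitch_in / (2 * distance_in))
    return math.degrees(angle_rad) * 60


def dpi_for_arcmin(arcmin: float, distance_in: float) -> float:
    """Pixel density needed for one pixel to subtend `arcmin` at `distance_in` inches."""
    half_angle_rad = math.radians(arcmin / 60) / 2
    pitch_in = 2 * distance_in * math.tan(half_angle_rad)
    return 1.0 / pitch_in


print(f"300 dpi at 10 in: {arcmin_per_pixel(300, 10):.2f} arcmin/pixel")  # about 1.15
for a in (1.0, 0.5, 0.25):
    print(f"{a} arcmin at 10 in needs about {dpi_for_arcmin(a, 10):.0f} dpi")
```

Halving the detectable angle roughly doubles the required density, which is why the 1/2 and 1/4 arcminute cases get expensive quickly.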
For reference, Varjo’s high resolution display has slightly less than 1 arcminute/pixel, and the context display in their prototype has about 4.7 arcminutes/pixel. More importantly, their high resolution display covers about 20 degrees horizontally and 15 degrees vertically, which, based on the visual acuity graphs, is within the range where people could see errors if they are high in contrast.
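Because the two displays are described in different units, it helps to convert between arcminutes/pixel and pixels/degree. This minimal sketch uses Varjo’s stated figure of at least 70 pixels/degree for the foveated display and about 4.7 arcminutes/pixel for the context display; the ratio it computes (about 5.5x per axis) is the size of the resolution step that the blend region has to hide.

```python
def arcmin_to_ppd(arcmin_per_px: float) -> float:
    """Convert arcminutes per pixel to pixels per degree."""
    return 60.0 / arcmin_per_px


def ppd_to_arcmin(ppd: float) -> float:
    """Convert pixels per degree to arcminutes per pixel."""
    return 60.0 / ppd


foveal_arcmin = ppd_to_arcmin(70)              # Varjo's stated minimum, ~0.86 arcmin/pixel
context_arcmin = 4.7                           # context/periphery display
context_ppd = arcmin_to_ppd(context_arcmin)    # ~12.8 pixels/degree

linear_ratio = context_arcmin / foveal_arcmin  # ~5.5x per axis
area_ratio = linear_ratio ** 2                 # ~30x pixels per square degree

print(f"foveal:  {foveal_arcmin:.2f} arcmin/px")
print(f"context: {context_ppd:.1f} px/deg")
print(f"step: {linear_ratio:.1f}x linear, {area_ratio:.0f}x area")
```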
Varjo will be blending to reduce the contrast difference and thus make the transition less noticeable. But on the negative side, with any movement of the eyes, the image on the foveated display will change and the visual system tends to amplify any movement/change.
F-rendering varies the detail/resolution/quality/processing based on where the eyes are looking. This is seen as key not only to reducing the computing requirement but also to reducing power consumption. F-rendering has been shown to work in many human studies, including those done as part of Microsoft’s 2012 and Nvidia’s 2016 papers. F-rendering becomes ever more important as resolution increases.
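To give a feel for the computing savings, the sketch below estimates the fraction of pixels that must be shaded when a circular foveal region is rendered at full rate and everything else at a reduced rate. The buffer size and foveal radius are hypothetical, the reduction factor is assumed to be per axis (so a factor of 4 means 1/16 as many samples), and the cost of the blend region is ignored.

```python
import math


def shaded_fraction(width: int, height: int, fovea_radius_px: float,
                    periph_linear_factor: float) -> float:
    """Fraction of full-rate shading work for a circular full-rate fovea.

    Assumes the fovea circle lies entirely inside the buffer, the periphery is
    shaded at 1/factor resolution per axis (1/factor**2 of the samples), and
    the blend region costs nothing extra.
    """
    total = width * height
    fovea = min(math.pi * fovea_radius_px ** 2, total)
    periphery = (total - fovea) / periph_linear_factor ** 2
    return (fovea + periphery) / total


# Hypothetical per-eye buffer and fovea size, for illustration only
frac = shaded_fraction(2000, 2000, fovea_radius_px=300, periph_linear_factor=4)
print(f"shaded: {frac:.1%} of full rate")  # about 12.9%
```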
F-rendering uses a single high resolution display and changes the level of rendering detail. It then blends between the detail levels to avoid abrupt changes that the eye might detect. As the Microsoft and Nvidia papers point out, the eye is particularly sensitive to changes/movement.
In the case of the often-cited Microsoft 2012 study, they used 3 levels of detail with two “blend masks” between them, as illustrated in their paper (see right). This gave them a very gradual and wide transition, but 3 resolution levels with wide bands of transition are “luxuries” that Varjo can’t have. Varjo has only two possible levels of detail and, as will be shown, can only afford a narrow transition/blend region. The Microsoft 2012 study used only a 1920×1080 monitor, with a lower resolution central region than Varjo’s (about half the resolution), and then 3 blending regions so broad that they would be totally impractical for an f-display.
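A three-level scheme like the one described above can be sketched as two nested linear ramps driven by eccentricity (angular distance from the gaze point). The ramp shape, radii, and widths below are hypothetical and are not taken from the Microsoft paper; the point is that each extra level and each wide blend band is just another term in the formula, which is easy in f-rendering but not available to a two-display f-display.

```python
def ramp(e: float, start: float, width: float) -> float:
    """Linear ramp from 0 at `start` to 1 at `start + width`, clamped; a step if width <= 0."""
    if width <= 0:
        return 0.0 if e < start else 1.0
    return min(max((e - start) / width, 0.0), 1.0)


def composite_three_levels(inner: float, middle: float, outer: float, e: float,
                           r_inner: float, w_inner: float,
                           r_mid: float, w_mid: float) -> float:
    """Blend three rendered detail levels at eccentricity e (degrees).

    Assumes r_mid >= r_inner + w_inner so the two blend bands do not overlap.
    """
    t1 = ramp(e, r_inner, w_inner)   # blend mask 1: inner -> middle
    t2 = ramp(e, r_mid, w_mid)       # blend mask 2: middle -> outer
    beyond_inner = (1 - t2) * middle + t2 * outer
    return (1 - t1) * inner + t1 * beyond_inner


# Hypothetical radii/widths (degrees); layer values stand in for rendered pixels
for e in (0, 7.5, 12, 20, 30):
    print(e, composite_three_levels(1.0, 0.5, 0.0, e, 5, 5, 15, 10))
```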
Nvidia’s 2016 study (which cites Microsoft 2012) simplified this to two levels of detail, fovea and periphery, with sampling factors of 1 and 4 and a simpler linear blending between the two detail levels. Unfortunately, most of Nvidia’s study was done with a very low angular resolution Oculus headset display, with about 4.7 arcminutes/pixel and a little over 1,000 by 1,000 pixels per eye, the same display Varjo uses for the low resolution part of their image. Most of the graphs and discussion in the paper were with respect to this low angular resolution headset.
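The two-level, linearly blended scheme can be illustrated on a single row of pixels. In this sketch, the periphery is approximated by averaging 4-pixel blocks and replicating the result (a stand-in for rendering at a reduced sampling rate, not Nvidia’s actual pipeline), and the blend weight rises linearly from 0 at a hypothetical fovea radius to 1 at the end of the blend band. A one-pixel on/off pattern keeps full contrast in the fovea and averages to flat gray in the periphery.

```python
def render_reduced(row, factor):
    """Stand-in for low-rate rendering: average each block of `factor` pixels
    and replicate the average across the block."""
    out = []
    for i in range(0, len(row), factor):
        block = row[i:i + factor]
        avg = sum(block) / len(block)
        out.extend([avg] * len(block))
    return out


def foveated_row(row, gaze_x, fovea_radius, blend_width, factor=4):
    """Linearly blend full-rate (factor 1) and reduced-rate rendering by distance
    from the gaze point. `blend_width` must be > 0."""
    full = list(row)                       # sampling factor 1
    reduced = render_reduced(row, factor)  # sampling factor `factor`
    out = []
    for x, (hi, lo) in enumerate(zip(full, reduced)):
        e = abs(x - gaze_x)                # eccentricity in pixels
        t = min(max((e - fovea_radius) / blend_width, 0.0), 1.0)
        out.append((1 - t) * hi + t * lo)
    return out


stripes = [1.0, 0.0] * 16  # finest possible detail: one-pixel on/off
result = foveated_row(stripes, gaze_x=16, fovea_radius=4, blend_width=4)
print([round(v, 2) for v in result])
```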
Nvidia 2016 also did some study of a 27″ (diagonal) 2560×1440 monitor with the user 81 cm away, resulting in an angular resolution of about 1 arcminute and a horizontal FOV of 40 degrees, which would be more applicable to Varjo’s case. Unfortunately, as the paper states about their user study, “We only evaluate the HMD setup, since the primary goal of our desktop study in Section 3.2 was to confirm our hypothesis for a higher density display.” The only clue they give for the higher resolution system is that “We set the central foveal radius for this setup to 7.5°.” I could find no discussion of how they set the size of the blend region, so it is only a data point.
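The 1 arcminute and 40 degree figures for the desktop setup can be checked from the monitor geometry. This sketch assumes square pixels and a flat screen viewed on-axis; it also converts the 7.5° foveal radius into an approximate pixel radius using the center pixel size (a small-angle approximation).

```python
import math


def monitor_geometry(diag_in, h_px, v_px, distance_cm):
    """Horizontal FOV (degrees) and center-pixel size (arcmin) for a flat monitor
    viewed on-axis, assuming square pixels."""
    pitch_cm = diag_in * 2.54 / math.hypot(h_px, v_px)
    width_cm = pitch_cm * h_px
    h_fov_deg = math.degrees(2 * math.atan((width_cm / 2) / distance_cm))
    center_arcmin = math.degrees(math.atan(pitch_cm / distance_cm)) * 60
    return h_fov_deg, center_arcmin


fov, arcmin = monitor_geometry(27, 2560, 1440, 81)
# Approximate pixel radius of a 7.5 degree foveal region near screen center
fovea_radius_px = 7.5 * 60 / arcmin
print(f"FOV {fov:.1f} deg, {arcmin:.2f} arcmin/px, fovea radius ~{fovea_radius_px:.0f} px")
```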
Comment/Request: I looked around for a study that would be more applicable to Varjo’s case. I was expecting to find a foveated rendering study using, say, a 4K (3840×2160) television, which would support 1 arcminute over 64 by 36 degrees, but I did not find one. If you know of such a study, let me know.
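The 64 by 36 degree figure is the linear approximation (pixels × arcminutes per pixel ÷ 60). On a flat screen, pixels toward the edges subtend less angle than the center pixel, so if the center pixel is about 1 arcminute the actual field of view is somewhat smaller, roughly 58 by 35 degrees. The sketch below computes both, assuming a flat panel viewed on-axis.

```python
import math


def fov_linear(pixels: int, arcmin_per_px: float) -> float:
    """Small-angle estimate: every pixel subtends the same angle."""
    return pixels * arcmin_per_px / 60.0


def fov_flat_screen(pixels: int, center_arcmin_per_px: float) -> float:
    """Flat screen viewed on-axis, distance set so a pixel at the center subtends
    about `center_arcmin_per_px`. Lengths are in units of the pixel pitch."""
    distance = 1.0 / math.tan(math.radians(center_arcmin_per_px / 60))
    half_extent = pixels / 2
    return math.degrees(2 * math.atan(half_extent / distance))


for px in (3840, 2160):  # 4K UHD television, horizontal and vertical
    print(px, fov_linear(px, 1.0), round(fov_flat_screen(px, 1.0), 1))
```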
Even if we had an f-rendering study of a system with ~1 arcminute peak resolution, it would still give us only some insight into the f-display issues. F-rendering, while conceptually similar and likely to be required to support an f-display, is significantly simpler.
With f-rendering, everything beyond the detection of the eye movement is mathematical. The high resolution region, the lower resolution region(s), and the blend region(s) can be of arbitrary size to reduce detection, and can even be dynamic based on content. The alignment between resolutions is perfectly registered. The color and contrast between resolutions are identical. The rendering of the high resolution area does not have to be scaled/re-sampled to match the background.
Things are much tougher for f-display, as there are two physically different displays and the high resolution display has to be optically aligned/moved based on the movement of the eye. The alignment of the display resolutions is limited by the optics’ ability to move the apparent location of the high resolution part of the image. There is likely to be some vibration/movement even when aligned. The potential size of the high resolution display, as well as the size of the transition region, is limited by the size/cost of the microdisplay used. There can be only a single transition. The brightness, color, and contrast will be different between the two physically different displays (even if both are, say, OLED, the brightness and colors will not be exactly the same). Additionally, the high resolution display’s image will have to be remapped after any optical distortion to match the context/peripheral image; this will both reduce the effective resolution and introduce movement into the highest resolvable (by the eye) part of the FOV as the foveated display tracks the eye over what otherwise should be, say, a stationary image.
When asked, Varjo has said that they have more capable systems in the lab than the fixed f-display prototype they are showing. But they stopped short of saying whether they have a fully working system, and they have provided no results from any human studies.
The bottom line here is that there are many more potential issues with f-display that could prove very hard, if not practically impossible, to solve. A major problem is getting the high resolution image to optically move and stop without the eye noticing it. It is impossible to fully understand how well it will work without a full-blown working system and a human study covering a wide variety of content and user conditions, including the user moving their head and the reaction of the display and optics.
Varjo is currently demoing a proof-of-concept system with the foveated/high-resolution image fixed rather than tracking the center of vision. The diagram below shows the 100 by 100 degree FOV of the current Varjo demonstration system. For the moment at least, let’s assume their next step will be a version of this where the center/foveated image moves.
Shown in the figure above is roughly the size of the foveated display region (green rectangle), which covers about 27.4 by 15.4 degrees. The dashed red rectangle shows the area covered by the pictures provided by Varjo, which does not even fully cover the foveated area (in the pictures they show just the start of the transition/blending from high to low resolution).
Also shown is a dashed blue circle for the 7.5 degree “central foveal radius” (15 degree diameter) of the Nvidia 2016 high angular resolution system. It is interesting that it is pretty close to the angle covered vertically by the Varjo display.
Varjo’s foveated display should appear to the human eye as having much higher resolution than a non-foveated display with the same resolution as Varjo’s context/periphery display. It is certainly going to work well when totally stationary (as in Varjo’s demo system).
My major concern (and something that can’t be tested without a full-blown system) is what happens when everything moves. The evidence above suggests that there may be visible moving noise at the boundaries between the foveated and context images.
Some of the factors that could affect the results:
One should also note that the moving foveated image will by necessity have to be mapped onto the stationary low resolution image. Assuming the rendering pipeline first generates an image in rectangular coordinates and then re-samples it to adjust for the placement and optical distortion of the foveated image, the net effective resolution will be about half that of the “native” display due to the re-sampling.
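The resolution loss from re-sampling can be seen with the finest possible pattern, alternating one-pixel on/off columns. This sketch re-samples a row with linear interpolation at a fractional offset (standing in for the sub-pixel shifts that come with moving the foveated image and correcting distortion) and measures Michelson contrast: a quarter-pixel shift halves the contrast and a half-pixel shift erases it. How much is lost in practice depends on the filter actually used, and because the offset changes as the eye moves, the loss would vary from frame to frame.

```python
def resample_linear(row, shift):
    """Sample `row` at positions x + shift with linear interpolation (edges clamped)."""
    n = len(row)
    out = []
    for x in range(n):
        p = min(max(x + shift, 0.0), n - 1.0)
        i = int(p)
        j = min(i + 1, n - 1)
        f = p - i
        out.append((1 - f) * row[i] + f * row[j])
    return out


def michelson_contrast(values):
    """(max - min) / (max + min); 0 for an all-black input."""
    hi, lo = max(values), min(values)
    return (hi - lo) / (hi + lo) if hi + lo else 0.0


stripes = [1.0, 0.0] * 8  # one-pixel on/off columns: the finest detail the display can show
for shift in (0.0, 0.25, 0.5):
    interior = resample_linear(stripes, shift)[1:-1]  # ignore clamped edges
    print(shift, michelson_contrast(interior))
```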
In theory, this re-sampling loss could be avoided/reduced by computing the high resolution image with the foveated image already remapped, but with “conventional” pipelines this would add a lot of complexity. In the long run, however, this type of display would likely be used in combination with foveated rendering, where this may not add much more to the pipeline (just something to deal with the distortion).
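Computing the foveated image “already remapped” means working backward: for each physical pixel of the foveated display, apply the inverse of the placement offset and optical distortion and evaluate the scene at that location, so there is no intermediate raster to re-sample. The sketch below uses a hypothetical analytic scene and a hypothetical one-coefficient radial distortion model. Plain point sampling like this keeps full contrast but can alias; a real renderer would need some anti-aliasing, which gives back part of the sharpness.

```python
import math


def scene(u: float, v: float) -> float:
    """Hypothetical analytic scene: vertical stripes one scene unit wide."""
    return 1.0 if math.floor(u) % 2 == 0 else 0.0


def display_to_scene(x: float, y: float, offset, k1: float):
    """Hypothetical inverse mapping from a foveated-display pixel (centered
    coordinates) to scene coordinates: one-coefficient radial distortion,
    then the current placement offset of the foveated inset."""
    scale = 1.0 + k1 * (x * x + y * y)
    return x * scale + offset[0], y * scale + offset[1]


def render_remapped(width: int, height: int, offset, k1: float):
    """Evaluate the scene once per display pixel; no intermediate raster."""
    image = []
    for row in range(height):
        line = []
        for col in range(width):
            x, y = col - width / 2, row - height / 2
            u, v = display_to_scene(x, y, offset, k1)
            line.append(scene(u, v))
        image.append(line)
    return image


# A half-unit offset that would blur a re-sampled raster leaves these samples at full contrast
img = render_remapped(8, 2, offset=(0.5, 0.0), k1=1e-4)
print(img[0])
```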
First, I want to compliment Varjo for putting actual high-resolution, through-the-optics images on their website (note: click on their “Full size JPG version”). By Varjo’s own admission, these pictures were taken crudely with a consumer camera, so the image quality is worse than you would see looking into the optics directly. In particular, there are chromatic aberrations clearly visible in the full size image that are likely caused by the camera and how it was used, and not necessarily a problem with Varjo’s optics. If you click on the image below, it will bring up the full size image (over 4,000 by 4,000 pixels and about 4.5 megabytes) in a new tab.
The angles given in the figure were calculated based on the known pixel size of the Oculus CV1 display (its pixels are clearly visible in the non-foveated picture). For the “foveated display” (green rectangle), I used Varjo’s statement that it was at least 70 pixels/degree (but I suspect not much more than that either).
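The angular size of the foveated region follows directly from pixel count divided by pixel density. Assuming a 1920×1080 microdisplay (the resolution is not stated in this article; it is an assumption here), 70 pixels/degree gives the 27.4 by 15.4 degree region shown in the figure; a higher density would shrink the region proportionally.

```python
def angular_extent_deg(pixels: int, pixels_per_degree: float) -> float:
    """Angular size of a span of pixels at a uniform pixel density."""
    return pixels / pixels_per_degree


# Hypothetical microdisplay resolution: 1920 x 1080 is an assumption, not stated in the text
for px in (1920, 1080):
    print(px, round(angular_extent_deg(px, 70), 1))
```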
Next time on this topic, I plan to discuss how f-displays may or may not compete in the future with higher resolution single displays.