One thing that continually confounds me is the high percentage of people who, when discussing AR, seem to care only about the display’s field of view (FOV), as if it were the only specification that matters. This obsession with the display’s FOV ignores many other major image-quality factors, including angular resolution, brightness, contrast, transmissivity, image artifacts, distortion, and uniformity.
As best I can tell, this FOV obsession started with VR headsets. I would contend that most “instant experts” had never heard of FOV before they learned about VR. FOV is an important issue, just not to the exclusion of all else. Like having too much ice cream, with wider FOV, one should consider the side effects.
For today’s article, I am going to concentrate on the FOV of the displayed image. In a follow-on article, I intend to address how AR affects the “see-through FOV” of human vision. As a preview of that article, I have included below a figure taken from a paper by Bernard Kress that I annotated and colored.
I’m largely going to be using information from two vocal advocates for AR that have somewhat different perspectives on the usage model. Starner’s Photonics West 2019 (PW2019) presentation was the initial inspiration for this article.
Thad Starner, who is a Georgia Tech Professor and one of the Architects of Google Glass, inspired this blog article with his presentation at Photonics West in February 2019. Starner has continuously worn AR glasses since 1993. He is somewhat of a contrarian today in the world of AR in that he advocates for a smaller display FOV to gain other aspects he sees as more critical, like size, weight, comfort, and discretion when wearing.
Starner appears to be an “AR minimalist” and contends that the virtues of AR are being lost in trying to do too much. At both PW2019 and AWE 2019, Starner was wearing, and even advocating for, his North Focals AR glasses. The North Focals have only about a 15-degree FOV, a tiny eyebox requiring custom fitting, and poor image quality by any objective measure, yet they work for him.
To be sure, by simply being willing to wear AR glasses continuously since 1993, Starner is not an average person and may perceive things differently. At the same time, rather than just theorizing about what people might want (in my experience, too often the case in AR), he lives it and has a wealth of practical experience.
In preparing this article, I found that the figures and descriptions in a July 2019 SPIE article and a related video by Bernard Kress, Microsoft Hololens Partner Optical Architect, contained relevant information. Before Hololens, Kress worked on Google Glass (with birdbath optics) and on waveguides at DigiLens (previously SBG-Labs).
In addition to his many papers and lectures (many of which can be searched for on the internet), Kress has put together what I affectionately call the “Bernard Kress Traveling Circus” of panel sessions with some excellent panelists. The panel session at PW2019 that included Thad Starner was one of these panels.
I suspect that the focus on FOV stems from VR, with people then being surprised by the much narrower FOV of most AR devices. Many seem to want AR to be a superset of VR and expect it to address the same applications. But before we even get to the FOV, we should ask, “what is the application for the AR headset?” What other factors are we willing to sacrifice to reach a wider FOV? Then there are the many non-display factors, such as eye relief, size, weight, battery, cost, and durability, that are going to be impacted by a larger FOV.
A wide FOV helps create total immersion in a virtual world, which was the goal of VR. This immersion requirement of VR seems to have been transferred to AR. At the same time, many AR applications require detailed, high-resolution content.
Most VR headsets have an angular resolution on the order of 5 arcminutes per pixel, while the eye can resolve better than 1 arcminute per pixel. On such VR headsets, text content is by necessity big and chunky (with a screen-door effect), and it takes a large amount of the FOV to show readable letters and numbers. Additionally, the eye has to move much more than is comfortable to read anything more than a few numbers or words.
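To put rough numbers on this gap, a quick back-of-the-envelope calculation helps. The headset specifications below are illustrative assumptions of my own, not measurements of any particular product:

```python
def arcmin_per_pixel(fov_deg: float, pixels: int) -> float:
    """Approximate angular resolution, assuming pixels are spread
    evenly across the FOV (a simplification; real optics distort
    the mapping, especially near the edges)."""
    return fov_deg * 60.0 / pixels

# Hypothetical VR headset: 1440 horizontal pixels per eye over ~100 degrees
print(arcmin_per_pixel(100, 1440))  # ~4.2 arcminutes per pixel

# Hypothetical microdisplay AR headset: 1280 pixels over ~30 degrees
print(arcmin_per_pixel(30, 1280))   # ~1.4 arcminutes per pixel
```

Either way, the VR case comes out several times coarser than the roughly 1-arcminute limit of the eye.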
If the goal is to convey detailed information such as text, numbers, how-to guides, and directions efficiently, then one needs higher angular resolution. In short, for the typical content on a cell phone, computer, or even smartwatch, the characteristics of a typical VR display are terrible.
Almost all VR headsets are built with an inexpensive cell-phone-sized flat panel and simple (often cheap) optics. This structure makes it cheap and easy to render wide FOVs, but with poor angular resolution. The headsets end up being big and bulky, like ski goggles with large and heavy hardware added onto them.
AR headsets designed to be reasonably light and open to the real world typically start with small microdisplays (displays made on computer chips). The pixels are nominally 10x smaller linearly and 100x smaller in area than those of a cell-phone panel. The optics have to be more expensive to resolve the smaller pixels. On top of this, an AR headset needs combiner optics to mix the real world with the display image.
Microdisplays with small pixels inherently support high resolution, but it becomes challenging to support a very wide FOV without uneconomically large microdisplays. AR headsets using microdisplays thus have angular resolutions in the 1 to 1.5 arcminute range but struggle to support FOVs much beyond 50 degrees. In addition to the display itself, the physics of waveguide-type combiner optics can put a limit on the FOV.
For reference, 1 arcminute per pixel corresponds to about 300 pixels per inch held at reading distance. Apple famously referred to their first cell phone display exceeding 300 pixels per inch as a “Retina Display.”
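As a sanity check on the 300-pixels-per-inch figure, the angle one pixel subtends can be computed directly. The 12-inch reading distance below is an assumption on my part (10 to 12 inches is commonly cited for Apple’s claim):

```python
import math

# Angle subtended by one pixel of a 300-pixels-per-inch display
# held at an assumed 12-inch reading distance
pixel_pitch = 1.0 / 300.0   # inches per pixel
distance = 12.0             # inches
arcmin = math.degrees(math.atan(pixel_pitch / distance)) * 60.0
print(arcmin)  # ~0.95 arcminute, just under the eye's ~1 arcmin limit
```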
Inexpensive-to-make “fish-eye” headsets have been built using a cell-phone-like display and a pair of spherical combiners (e.g., Meta 2, Mira, DreamWorld, iGlass). These provide a wide FOV, but with very low angular resolution similar to typical VR headsets. They also suffer from image distortion and have non-uniform focus from the top to the bottom of the image. Almost ironically, while the combiner optics are huge, they often result in headsets that are about the same size as or smaller than headsets using eyeglasses-like waveguide optics.
At PW2019, Thad Starner contended that in addition to compromising other critical factors like size, weight, cost, eye-comfort, and social acceptance, a very large FOV could be counterproductive. That a person with 26 years of personal experience wearing AR headsets was countering “the bigger FOV, the better” philosophy made it stand out.
Starner practices what he preaches. During his talk and throughout the sessions, Starner was wearing North Focals. He was still wearing them when I took the photo on the left at AWE a few weeks later. The North Focals have a relatively tiny FOV of about 15 degrees, poor image quality, and a tiny eyebox (and thus need custom fitting).
Starner made the point, “the best display is the one you have with you.” This is his twist on the old cell-phone-camera expression, “the best camera is the one you have with you” (made famous by photographer Chase Jarvis). The point is that Starner is willing to sacrifice many other factors to have a display that is with him and almost unnoticeable when in use.
By counterproductive, Starner is referring to the augmentation covering up what the user needs to see in the real world, such as a surgeon’s hands during an operation. Starner went on to suggest that putting images slightly to the side of the user’s forward vision is often better.
Whether the extra content is enhancing the AR application, is a distraction, or is dangerous is an important question. For example, is the display blocking the user’s peripheral vision to the extent that they cannot see what is going on around them? As we often say in displays, “peripheral vision has low resolution but is sensitive to motion and is there to keep you from getting eaten by the tiger,” or in more modern times, to help keep you from getting hit by a vehicle.
Starner points out that most people grossly overestimate the field of view they get from display devices, as well as in movie theaters. People are generally surprised (and you can usually win a bar bet) that what THX considers the “optimum” seat in a movie theater has only about a 35- to 40-degree horizontal FOV. He points out that people don’t want to play “head ping-pong” by sitting in the front rows of a theater. Below is Starner’s table with my annotations and additions in red.
While Starner makes a good point with this table, I did find it somewhat biased toward smaller numbers by picking things like “minimums” and by using a basic iPhone and an iPad Mini rather than the more popular “Plus” sizes. But Starner’s broader point is sound: people grossly overestimate the FOV they need.
Starner contends that many applications don’t need more than about a 30-degree display FOV (equivalent to typical HDTV usage), and a device can be very functional with an even much smaller FOV. The subject of “what is the application” becomes important, as Starner is considering AR applications for conveying information. As he points out, most people, even with very large LCD TVs, end up sitting with horizontal FOVs of only 20 to 30 degrees.
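The 20-to-30-degree claim is easy to verify with basic trigonometry. The screen size and viewing distance below are my own illustrative assumptions, not numbers from Starner:

```python
import math

def horizontal_fov_deg(screen_width_in: float, distance_in: float) -> float:
    """Horizontal angle subtended by a flat screen viewed head-on."""
    return 2.0 * math.degrees(math.atan(screen_width_in / (2.0 * distance_in)))

# A 65-inch 16:9 TV is about 56.7 inches wide; assume viewing from
# 9 feet (108 inches) away
print(horizontal_fov_deg(56.7, 108.0))  # ~29 degrees
```

Even a large living-room TV at a normal viewing distance lands at the top of Starner’s 20-to-30-degree range.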
If the purpose of the AR headset is to convey information rather than VR-like environments, then it becomes crucial to understand how much of the field of view is useful for detailed information.
The figure below is taken from a paper by Bernard Kress. The paper has useful information and figures about the human visual field of view (at least once you get past the marketing fluff at the beginning, particularly the self-serving subjective table 1 in the paper).
You should note that in the figure above, the high-resolution fovea covers only about 3 degrees. Other sources consider the fovea to be up to 5 degrees. Part of this difference is the sharp fall-off in resolution in the area around the fovea, and part is that the size and shape of the fovea differ from person to person. Either way, the fovea is very small compared to the human FOV. The fovea is the only part of a normal eye that has 20/20 vision.
A very interesting observation in Kress’s paper is that the size and location of the fovea may be a function of early development. Quoting Kress, “It is interesting to note that the human fovea grows slowly in early life based on specific human visual behavior, and is not a feature of our visual system given at birth. Therefore, the location of the fovea might drift to new positions on the retina with novel visual behaviors radically different from millennia of human evolution, such as the use of small digital displays held at close range by toddlers. Another severe change would be early childhood myopia due to the same cause.” He cites the paper “Ocular effects of virtual reality headset wear in young adults“
Studies have shown that the area around the fovea, the parafovea, while lower in resolution, still aids in processing details like text. The eye continuously moves in a rapid series of jump-like motions known as saccades. The human visual system then builds up what one sees by combining a series of “snapshots” at various resolutions from each fixation. It is an amazingly complex process that is imperceptible most of the time; usually, only display devices will betray the complex behavior of the human visual system.
The figure below is taken from Starner’s PW2019 presentation (with my annotations in red). The left side shows the well-known chart of how the resolution of the eye falls off dramatically with eccentricity, presenting in graphical form the information from Kress’s regions above. The right side of Starner’s slide is from a 2005 Lund University study of eye movements and fixations while reading text.
As Starner emphasizes in the title of the slide and elsewhere in his presentation, a typical newspaper column of text is only 6.6 degrees wide when held at reading distance. The point is that an image only 6.6 degrees wide is enough to communicate considerable text information.
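The 6.6-degree figure checks out with simple trigonometry. The column width and reading distance below are my own assumptions, not Starner’s stated numbers:

```python
import math

# Angle subtended by a newspaper column (assumed ~2.1 inches wide)
# held at an assumed ~18-inch reading distance
width = 2.1
distance = 18.0
angle = 2.0 * math.degrees(math.atan(width / (2.0 * distance)))
print(angle)  # ~6.7 degrees, close to Starner's 6.6-degree figure
```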
There is a region within which the eye can comfortably (for a short time; more on this later) move the fovea, which Kress refers to as “the fixed foveated region.” Kress shows this as being about 40 to 50 degrees, but it varies from person to person, as does what is considered “comfortable” and for how long. Different studies use a similar concept but report somewhere between 25 and 40 degrees; Kress’s 50 degrees is on the larger side.
Outside the fixed foveated region, a person is prone to turn their head rather than just move their eyes. Some people are more prone to turn their head sooner than others, and it can depend on other factors, so it is a rough range. At some point, the person is going to turn their head, and this is a profound difference with a head-mounted display that is attached to the head versus a monitor.
Furthermore, as Starner’s presentation discusses, even if the eye can comfortably move within the fixed foveated region for short intervals, this range of movement cannot be sustained for long periods without eye discomfort.
Going back to Kress’s “fixed foveated region” of about 40 to 50 degrees as roughly as much as the eye can comfortably move, Starner shows that this much movement will cause discomfort in less than 30 minutes. Starner cites the results from a 2017 thesis by Haynes, which included a study of FOV and eye discomfort when reading text over a range of angles.
Haynes’ thesis states (page 8 of 303 pages), “Thus, eye physiology provides some basic parameters for potential HWD design. A display can be no more than 55° horizontally from the normal line of sight based on oculomotor mechanical limits. However, the effective oculomotor range places a de facto limit at 45°. Further, COMR and saccadic accuracy suggest visually comfortable display locations may be no more than [plus or minus] 10-20° from the primary position of gaze.”
In practical terms, when one is reading on a large computer monitor (not head-worn), the eyes will tend to move through a total of about 20 to 30 degrees before the head, rather than the eyes, moves. This has implications for AR.
The practical application of this study suggests that there should not be constant content outside of about 20 degrees on which the user is supposed to concentrate. So we are in a bit of a box, that if we put detailed content in the center of the user’s vision, we may be blocking their view of critical things they should see (such as their hands), but the content can’t be too far from center, or it will cause eye discomfort.
The way a human uses a head-mounted display is quite different from the way they use a monitor. A major difference is that when a person’s head turns, the physical display moves with it. As pointed out above, somewhere around plus or minus 15 degrees horizontally from the center of view, people turn their heads. Larger screens support immersion and head panning.
Using techniques such as SLAM and eye tracking, one can lock the display content so that it somewhat behaves like a fixed monitor, but the human vision system still gets clues that things are not right. One obvious clue is that focus does not change. Less obvious is that the whole head-and-eye saccade interaction acts differently. All this suggests that user interfaces for head-mounted displays need to be different from those for fixed screens.
When Varjo first started talking about “Foveated displays” back in 2017 (June 2017 and July 2017 articles on Varjo’s original Foveated Display concept), they were planning on tracking the eye and moving the high-resolution “fovea” display using a mirror based on eye-tracking. Back in 2017, I explained that by far, the hardest problem with making the original Varjo foveated display concept work was to have the high-resolution display track the eye. I commented that having a high-resolution display in the center was about 1% of the difficulty of making the displayed image track the eye.
By the time I saw Varjo at AWE 2018, they were no longer talking about moving the foveated display region. Also, the angular resolution of the foveated display had changed from 70 pixels per degree to 60 pixels per degree (1 pixel per arcminute), so the center display covers a bit more area. Varjo’s VR-1 specs list a 1080p center display. These specs translate to covering 32 degrees horizontally by 18 degrees vertically. These numbers show why the Varjo VR-1 fixed foveated display works most of the time, but also why one can still see the transition to the lower-resolution region if one tries. If the foveated region extended to somewhere between 40 and 50 degrees, it might be enough to make the transition go away in almost all cases.
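The 32-by-18-degree coverage follows directly from the pixel counts, assuming 60 pixels per degree (1 pixel per arcminute):

```python
# Varjo VR-1 center display: 1920x1080 pixels at 60 pixels per degree
h_fov = 1920 / 60   # 32 degrees horizontally
v_fov = 1080 / 60   # 18 degrees vertically
print(h_fov, v_fov)
```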
While on the subject of Foveated Displays, Apple patent application 20190285897 was recently published that includes using multiple scanned laser projectors, and even scanned arrays of lasers, to generate a foveated near-eye display. I don’t have time here to go into details, but the whole thing looks like more of a researcher’s dream than anything that will be produced anytime soon. I just wanted to mention it because it pertains to foveated displays and many Apple and AR news sources have been reporting on it.
Different perspectives on the usage model for AR should result in different requirements for both the FOV and the importance of angular resolution. AR designers seem caught in the trap of having “instant experts” including those in the media demanding that they have the FOV of today’s VR headsets. The issue of angular resolution necessary to reasonably read text seems to be almost totally overlooked by the “instant experts.”
AR developers, in addition to serving wide FOV and high angular resolution, also want the headset to be sleek and wearable for long periods. The result so far has been “jack of all trades, master of none” headsets. On top of the display and optical requirements, AR designers are cramming processing power, communication hardware, and a battery into the headset itself or a corded processing module. The result is a headset that only a relatively small part of the “enterprise market” can justify and that few will want to wear for very long. The units end up requiring a large case for protection and carrying when not in use.
Thad Starner may go a bit to the other extreme with his more minimalist approach to AR. For example, I have trouble seeing the utility of the North Focals, while Thad swears by them. Still, he makes some good points that developers should consider.
I’m working as Chief Science Officer (CSO) at RAVN, a company developing AR headsets for first responders and military applications. RAVN might be considered a competitor to Microsoft’s Hololens.
I want to thank Ron Padzensky for reviewing and making corrections to this article.