Apple Vision Pro (Part 5A) – Why Monitor Replacement is Ridiculous


As I wrote in Apple Vision Pro (Part 1) regarding the media coverage of the Apple Vision Pro, “Unfortunately, I saw very little technical analysis and very few with deep knowledge of the issues of virtual and augmented reality. At least they didn’t mention what seemed to me to be obvious issues and questions.”

I have been working for the last month on an article to quantify why it is ridiculous to think that a VR headset, even one from Apple, will be a replacement for a physical monitor. In writing the article, I felt the need to include a lot of background material and other information as part of the explanation. As the article was getting long, I decided to break it into two parts, this being the first part.

The issues will be demonstrated using the Meta Quest Pro (MQP) because that is the closest headset available, and it also claims to be for monitor replacement and uses similar pancake optics. I will then translate these results to the higher, but still insufficient, resolution of the Apple Vision Pro (AVP). The AVP will have to address all the same issues as the MQP.

Office applications, including word processing, spreadsheets, presentations, and internet browsing, mean dealing with text. As this article will discuss, text has always been treated as a special case with some “cheating” (“hints” for grid fitting) to improve sharpness and readability. This article will also deal with resolution issues with trying to fit a virtual monitor in a 3-D space.

For this set of articles, I will suspend my disbelief in the many other human factor problems caused by trying to simulate a fixed monitor in VR so that I can concentrate on the readability of text.

Back to the Future with Very Low Pixels Per Degree (ppd) with the Apple Vision Pro

Working on this article reminded me of lessons learned in the mid-1980s when I was the technical leader of the TMS34010, the first fully programmable graphics processor. The TMS340 development started in 1982 before an Apple Macintosh (1984) or Lisa (1983) existed (and they were only 1-bit per pixel). But like those products, my work on the 34010 was influenced by Xerox PARC. At that time, only very expensive CAD and CAM systems had “bitmapped graphics,” and all PC/Home Computer text was single-size and monospaced. They were very low resolution if they had color graphics (~320×200 pixels). IBM introduced VGA (640×480) and XGA (1024×768) in 1987, which were their first IBM PC square pixel color monitors.

The original XGA monitor, considered “high resolution” at the time, had a 16” diagonal and 82ppi, which translates to 36 to 45 pixels per degree (ppd) at 0.5 meters to 0.8 meters (typical monitor viewing distance), respectively. Factoring in the estimated FOV and resolutions, the Apple Vision Pro is between 35 and 40 ppd or about the same as a 1987 monitor.

So it is time to dust off the DeLorean and go Back to the Future of the mid-1980s and the technical issues with low ppd displays. Only it is worse this time because, in the 1980s, we didn’t have to resample/rescale everything in 3-D space when the user’s head moves to give the illusion that the monitor isn’t moving.

For more about my history in 1980s computer graphics and GPUs, see Appendix 1: My 1980s History with Bitmapped Fonts and Multiple Monitors.

The question is, “Would People?” Not “Could People?” Use an Apple Vision Pro (AVP) as a Computer Monitor

With their marketing and images (below), Apple and Meta suggest that their headsets will work as a monitor replacement. Yes, they will “work” as a monitor if you are desperate and have nothing else, but having multiple terrible monitors is not a solution many people will want. These marketing concepts fail to convey that each virtual monitor will have low effective resolution forcing the text to be blown up to be readable and thus have less content per monitor. They also fail to convey that the text looks grainy and shimmers (more on this in a bit).

Meta Quest Pro (left) and Apple Vision Pro (right) have similar multiple monitor concepts.

Below is a through-the-lens picture of MQP’s Horizons Virtual Desktop. It was taken through the left eye’s optics with the camera centered for best image quality and shows more of the left side of the binocular FOV. Almost all the horizontal FOV for the left eye is shown in the picture, but the camera slightly cuts off the top and bottom.

MQP Horizon Desktop – Picture via the Left Eye Optics (camera FOV 80°x64°)

Below for comparison is my desktop setup with a 34” 22:9 3440×1440 monitor on the left and a 27” 4K monitor on the right. The combined cost of the two monitors is less than $1,000 today. The 22:9 monitor display setting is 100% scale (in Windows display settings) and has 11pt fonts in the spreadsheet. The right-hand monitor is set for 150% scaling with 11pt fonts, netting fonts that are physically the same size.

My office setup – 34” 22:9 3440×1440 (110 PPI) widescreen (left) & 27” 16:9 4K (163 PPI) Monitor (right)

Sitting 0.5 to 0.8 meters away (typical desktop monitor distance), I would judge the 11pt font on either of the physical monitors as much more easily readable than the 11pt font on the Meta Quest Pro with the 150% scaling, even though the MQP’s “11pt” is angularly about 1.5x bigger (as measured via the camera). The MQP’s text is fuzzier, grainier, and scintillates/shimmers. I could fit over six times the legible text on the 34” 22:9 monitor and over four times on the 27” 4K as on the MQP. With higher angular resolution, the AVP will be better than the MQP but still well below the physical monitors in the amount of legible text.
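To put the physical monitors on the same angular footing as the headsets, here is a minimal sketch of the pixels-per-degree arithmetic. It assumes a flat monitor viewed head-on at its center; the function name is my own for illustration, and the PPI values are the ones in the caption above.

```python
import math

def pixels_per_degree(ppi, distance_m):
    """Approximate ppd at the center of a flat monitor viewed head-on (an assumption)."""
    distance_in = distance_m / 0.0254      # meters to inches
    pixel_pitch_in = 1.0 / ppi             # width of one pixel in inches
    # Angle subtended by a single pixel, in degrees
    deg_per_pixel = math.degrees(2 * math.atan(pixel_pitch_in / (2 * distance_in)))
    return 1.0 / deg_per_pixel

for name, ppi in [("34-inch 3440x1440 (110 PPI)", 110), ("27-inch 4K (163 PPI)", 163)]:
    for distance in (0.5, 0.8):
        print(f"{name} at {distance} m: {pixels_per_degree(ppi, distance):.0f} ppd")
```

Under these assumptions, the 110 PPI monitor works out to roughly 38 to 60 ppd and the 163 PPI monitor to roughly 56 to 90 ppd over the 0.5 to 0.8 meter range.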

Note on Windows Scaling

In Windows, 100% means a theoretical 96 dots per inch. Windows factors in the information reported to it by the monitor (in this case, by the MQP’s software) to give a “Scale and Layout” recommendation (right). The resolution reported to Windows by the MQP’s Horizon’s virtual monitor is 1920×1200, and the recommended scaling was 150%. This setting is what I used for most pictures other than the ones called out as being at 100% or 175%.

For more on the subject of how font “points” are defined, see Appendix 3: Confabulating typeface “points” (pt) with Pixels – A Brief History.
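As a rough illustration of how the scaling percentage turns a point size into pixels, the sketch below uses the nominal 96 dots per inch at 100% and 72 points per inch. The function name is hypothetical, and the result is the nominal font size (em height), not the height of any particular letter.

```python
def font_px(points, scale_percent=100, base_dpi=96):
    """Nominal font size in pixels for a given point size and Windows scale setting."""
    return points / 72 * base_dpi * scale_percent / 100

for scale in (100, 150, 175):
    print(f"11pt at {scale}%: {font_px(11, scale):.1f} px")
```

With these assumptions, an 11pt font is a little under 15 pixels at 100%, 22 pixels at 150%, and a little under 26 pixels at 175%.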


I’m not going to go into everything wrong with VR optics, and this article deals with being able to read text in office applications. VR optics have a lot of constraints in terms of cost, space, weight, and wide FOV. While pancake optics are a major improvement over the more common Fresnel lenses, to date, they still are poor optically (we will have to see about the AVP).

While not bad in the center of the FOV, they typically have severe pincushion distortion and chromatic (color) aberrations. Pancake optics are more prone to collecting and scattering light, causing objects to glow on dark backgrounds, contrast reduction, and ghosts (out-of-focus reflections). I discussed these issues with Pancake Optics in Meta (aka Facebook) Cambria Electrically Controllable LC Lens for VAC. With computer monitors, there are no optics to cause these problems.

Optical Distortion

As explained in Meta Quest Pro (Part 1) – Unbelievably Bad AR Passthrough, the Meta Quest Pro rotates the two displays for the eyes ~20° to clear the nose. The optics also have very large pincushion distortion. The display processor on the MQP pre-corrects digitally for the display optics’ severe pincushion distortion. This correction comes at some loss of fidelity in the resampling process.

The top right image shows the video feed to the displays. The distortion and rotation have been digitally corrected in the lower right image, but other optical problems are not shown (see through-the-lens pictures in this article).

There is also an optical “cropping” of the left and right eye displays, indicated by the Cyan and Red dashed lines, respectively. The optical cropping shown is based on my observations and photographs.

The pre-distortion correction is certainly going to hurt the image quality. It is likely that the AVP, using similar pancake optics, will have similar needs for pre-correction. Even though the MQP displays are rotated (no word on the AVP), there are so many other transforms/rescalings, including the transforms in 3-D space required to make the monitor(s) appear stationary, that if the rotation is combined with them (rather than done as a separate transform), the effect of the display’s rotation on resolution may be negligible. The optical distortion and the loss of text resolution when transformed in 3-D space are more problematic.

Binocular Overlap and Rivalry

One of the ways to improve the overall FOV with a binocular system is to have the FOV of the left and right eye only partially overlap (see figure below). The paper Perceptual Guidelines for Optimizing Field of View in Stereoscopic Augmented Reality Displays and the article Understanding Binocular Overlap and Why It’s Important for VR Headsets discuss the issues with binocular overlap (also known as “Stereo Overlap”). Most optical AR/MR systems have a full or nearly full overlap, whereas VR headsets often have a significant amount of partial overlap.

Partial overlap increases the total FOV when combining both eyes. The problem with partial overlap occurs at the boundary where one FOV ends in the middle of the other eye’s FOV. One eye sees the image fade out to black, whereas the other sees the image. This is a form of Binocular Rivalry, and it is left to the visual cortex to sort out what is seen. The visual cortex will mostly sort it out in a desirable way, but there will be artifacts. Most often, the visual cortex will pick the eye that appears brighter (i.e., the cortex picks one and does not average), but there can be problems with the transition area. Additionally, where one is concentrating can affect what is seen/perceived.

In the case of the MQP, the region of binocular overlap is slightly less than the width of the center monitor in Meta’s Horizon’s Desktop when viewed from the starting position. Below left shows the view through the left eye when centering the monitor in the binocular FOV.

When concentrating on a cell in the center, I didn’t notice a problem, but when I took in the whole image, I could see these rings, particularly in the lighter parts of the image.  

The Meta Quest 2 appears to have substantially more overlap. On the left is a view through the left eye with the camera positioned similarly to the MQP (above left). Note how the left eye’s FOV overlaps the whole central monitor. I didn’t notice the transition “rings” with the Meta Quest 2 as I did with the MQP.

Binocular overlap is not one of those things VR companies like to specify; they would rather talk about the bigger FOV.

In the case of the AVP, it will be interesting to see the amount of binocular overlap in their optics and if it affects the view of the virtual monitors. One would like the overlap to be more than the width of a “typical” virtual monitor, but what does “typical” mean if the monitors can be of arbitrary size and positioned anywhere in 3-D space, as suggested in the AVP’s marketing material?

Inscribing a virtual landscape-oriented monitor uses about half of the vertical pixels of the headset.

The MQP’s desktop illustrates the basic issues of inscribing a virtual monitor into the VR FOV while keeping the monitor stationary. There is some margin for allowing head movement without cutting off the monitor, which would be distracting. Additionally, the binocular overlap cutting off the monitor is discussed above.

As discussed in more detail, the MQP uses a 16:10 aspect ratio, 1920×1200 pixel “virtual monitors” (the size it reports to Windows). The multiple virtual monitors are mapped into the MQP’s 1920×1800 physical display. Looking straight ahead, sitting at the desktop, you see the central monitor and about 30% of the two side monitors.

The center monitor’s center uses about 880 pixels, or about half of the 1800 vertical pixels of the MQP’s physical display. The central monitor behaves like it is about 1.5 meters (5 feet) away, or about 2 to 3 times the distance of a typical computer monitor. This makes “head zooming” (leaning in to make the image bigger) ineffective.

Apple’s AVP has a similar FOV and will have similar limitations in fitting virtual monitors. There is the inevitable compromise between showing the whole monitor and leaving some latitude for the user to move their head without cutting off the sides of the monitor.

Simplified Scaling Example – Rendering a Pixel Size Dot

Typical readable text has many high-resolution, high-contrast features on the order of one pixel wide, such as the stroke and dot in the letter “i.” Drawing a single pixel-size dot in 3-D space illustrates some of the problems.

Consider drawing a small circular dot that, after all the 3-D transforms, is the size of about one pixel. In the figure below, the pixel boundaries are shown with blue lines. The four columns in the figure below show a few of an infinite number of relationships between a rendered dot and the pixel grid.

The first row shows the four dots relative to the grid. The nearest pixel is turned on in the second row based on the centroid. In row three, a simple average is used to draw the pixel where the average of 4 pixels should equal the brightness of one pixel. The fourth row shows a low-pass filter of the virtual dots. The fifth row renders the pixels based on the average value of the low-pass filtered version of the dots.

The centroid method is the sharpest and keeps the size of the dot the same, but the location will tend to jump around with the slightest head movement. If many dots formed an object, the shape would appear to wriggle. With the simple average, the “center of mass” is more accurate than the centroid method, but the dot changes shape dramatically based on alignment/movement. The average of the low-pass filter method is better in terms of center of mass, and the shape changes less based on alignment, but now a single pixel size circle is blurred out over 9 pixels.
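The tradeoff between these methods can be sketched in a few lines. This is a simplified illustration, not what any headset actually runs: it assumes a square dot rather than a circular one, and it stands in for the low-pass filter by simply widening the dot to two pixels before averaging. The function names are my own.

```python
import math

def nearest_pixel(cx, cy):
    """Centroid method: light only the pixel that contains the dot's center."""
    return {(math.floor(cx), math.floor(cy)): 1.0}

def area_average(cx, cy, size=1.0):
    """Simple average: each pixel gets the fraction of the (square) dot covering it."""
    half = size / 2
    x0, x1 = cx - half, cx + half
    y0, y1 = cy - half, cy + half
    weights = {}
    for py in range(math.floor(y0), math.ceil(y1)):
        for px in range(math.floor(x0), math.ceil(x1)):
            w = min(x1, px + 1) - max(x0, px)   # horizontal overlap with pixel px
            h = min(y1, py + 1) - max(y0, py)   # vertical overlap with pixel py
            if w > 0 and h > 0:
                weights[(px, py)] = w * h / (size * size)
    return weights

# Pixel (0, 0) covers x in [0, 1) and y in [0, 1).
print(area_average(0.5, 0.5))             # dot aligned to the grid
print(area_average(1.0, 1.0))             # dot straddling a pixel corner
print(area_average(0.5, 0.5, size=2.0))   # widened (blurred) dot
print(nearest_pixel(0.9, 0.5), nearest_pixel(1.1, 0.5))  # small move, pixel jumps
```

An aligned dot lights one pixel at full brightness, the same dot on a pixel corner becomes four pixels at a quarter brightness each, and the widened dot spreads over nine pixels even when aligned. The nearest-pixel version keeps a single full-brightness pixel but jumps a whole pixel for a small change in position.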

There are many variations to resampling/scaling, but they all make tradeoffs. A first-order tradeoff is between wiggling with movement (changing in shape and location) and sharpness. A big problem with text rendered on low-ppd displays, including the Apple Vision Pro, is that many features, from periods to the dots of letters to the stroke width of small text fonts, will be close to 1 pixel.

Scaling text – 40+ Years of Computer Font Grid Fitting (“Cheating”) Exposed

Since the beginning, personal computers have dealt with low pixels-per-inch monitors, translating into low pixels per degree based on typical viewing distances. Text is full of fine detail and often has perfectly horizontal and vertical strokes that, even with today’s higher PPI monitors, cause pixel alignment issues. Text is so important and so common that it gets special treatment. Everyone “cheats” to make text look better.

The fonts need to be recognizable without being so big that the eye has to move a lot to read words and the content becomes less dense, with less information on a single screen. Big fonts produce less content per display and more eye movement, making the muscles sore.

In the early to mid-1980s, PCs moved from rough-looking fixed-space text to proportionally spaced text with carefully hand-crafted fonts, and only a few font sizes were available. Font edges are also smoothed (antialiased) to make them look better. Today, most fonts are rendered from a model with “hints” that help the fonts look better on a pixel grid. TrueType, originally developed by Apple as a workaround to paying royalties to Adobe, is used by both Apple and MS Windows and includes “hints” in the font definitions for grid fitting (see: Windows hinting and Apple hinting).

Simplistically, grid fitting tries to make horizontal and vertical strokes of a font land on the pixel grid by slightly modifying the shape and location (vertical and horizontal spacing) of the font. Doing so requires less smoothing/antialiasing without making the font look jagged. This works because computer monitor pixels are on a rectangular grid, and in most text applications, the fonts are drawn in horizontal rows.
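A toy version of the idea, for one vertical stroke, is sketched below. It is only an illustration of snapping to the grid; real TrueType hinting is far more elaborate, and the function names and the example stroke position are my own assumptions.

```python
import math

def column_coverage(left, width):
    """Fraction of each pixel column covered by a stroke spanning [left, left + width)."""
    right = left + width
    coverage = {}
    for col in range(math.floor(left), math.ceil(right)):
        c = min(right, col + 1) - max(left, col)
        if c > 0:
            coverage[col] = c
    return coverage

def grid_fit(left, width):
    """Snap the stroke's left edge and width to whole pixels (at least one pixel wide)."""
    return round(left), max(1, round(width))

stroke = (3.4, 1.2)                         # hypothetical stroke: starts at x=3.4, 1.2 pixels wide
print(column_coverage(*stroke))             # unfitted: two partly covered (gray) columns
print(column_coverage(*grid_fit(*stroke)))  # fitted: one fully covered column
```

The unfitted stroke covers about 60% of each of two columns and would need antialiasing to look right, while the fitted stroke lands exactly on one column. The cost is that the stroke has moved and changed width slightly, which is the “cheating.”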

Almost all font rendering uses grid fitting, just some more than others (see the 2007 article Font rendering philosophies of Windows & Mac OS X). Apple (and Adobe) have historically tried to keep the text size and spacing more accurate at some loss in font sharpness and readability on low PPI monitors (an easy solution for Apple as they expect you to buy a higher PPI monitor). MS Windows with ClearType and Apple with their LCD font smoothing have options to try and improve fonts further by taking advantage of LCDs with side-by-side red-green-blue subpixels.

But this whole grid fitting scheme falls apart when the monitors are virtualized. Horizontal and vertical strokes transform into diagonal lines. Because grid fitting won’t work, the display of a virtual monitor needs to be much higher in angular resolution than a physical monitor to show a font of the same size with similar sharpness. Yet today and for the foreseeable future, VR displays are much lower resolution.

For more on the definition of font “Points” and their history with Windows and Macs, see Appendix 3: Confabulating typeface “points” (pt) with Pixels – A Brief History.

Rendering Options: Virtual Monitors Fixed in 3-D Space Breaks the “Pixel Grid.”

The slightest head movement means that everything has to be re-rendered. The “grid” to which you want to render text is not the virtual monitor but that of the headset’s display. There are at least two main approaches:

  1. Re-render everything from scratch every frame – This will give the best theoretical image quality but is very processor intensive and will not be supported by most legacy applications. Simply put, these applications are structured to draw in terms of physical pixels of a fixed size and orientation rather than everything drawn virtually.
  2. Render to a “higher” resolution (if possible) and then scale to the headset’s physical pixels.
    • One would like the rendering to be at least 2X (linearly, 4X the pixels) of the physical pixels of the headset covering the same area to keep from having significant degradation in image quality after the scaling-down process.
    • The higher-resolution virtual image is then transformed onto the surface (which might be curved itself) of the virtual monitor in 3-D space. Virtual monitor processing can become complex if the user can put multiple monitors here, there, and everywhere that can be viewed from any angle and distance. The rendering resolution needed for each virtual monitor depends on the virtual distance from the eye.
    • Even with this approach, there are “application issues” from the legacy of 40+ years of PCs dealing with fixed pixel grids.
    • The grid stretching (font hinting) becomes counterproductive since it stretches the text to the grid of the virtual display rather than that of the physical display.

Systems will end up with a hybrid of the two approaches mixing “new” 3-D applications with legacy office applications.
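For the second approach, the 2X rule of thumb can be turned into rough numbers. The sketch below uses the roughly 880 headset rows that the MQP’s center monitor occupies and the 1200 rows it reports to Windows, both from this article. It assumes the whole monitor is scaled like its center, which the curved virtual monitor does not strictly do, and the function names are hypothetical.

```python
import math

def rows_needed(headset_rows, oversample=2.0):
    """Rendered rows wanted for a virtual monitor covering `headset_rows` physical rows."""
    return math.ceil(headset_rows * oversample)

def oversample_factor(rendered_rows, headset_rows):
    """How many rendered rows are available per physical headset row."""
    return rendered_rows / headset_rows

print(rows_needed(880))                          # rows wanted at 2X
print(round(oversample_factor(1200, 880), 2))    # what a 1200-row virtual monitor provides
```

By this estimate, a 1200-row virtual monitor falls well short of the 2X (linear) oversampling one would like before scaling down.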

Inscribing a virtual landscape-oriented monitor uses about half of the vertical pixels of the headset.

The MQP’s Horizons appears to render the virtual monitor(s) and then re-render them in 3-D space along with the cylindrical effect plus pre-correction for their Pancake lens distortion.

The MQP uses a 16:10 aspect ratio, 1920×1200 pixel “virtual monitors.” The multiple virtual monitors are mapped into the MQP’s 1920×1800 physical display. Looking straight ahead, sitting at the desktop, you see the central monitor and about 30% of the two side monitors.

The virtual monitor’s center uses about 880 pixels, or about half of the 1800 vertical pixels of the MQP’s physical display, or 64% of the 1200 vertical pixels reported to Windows, with the user at the desktop.

The central monitor behaves like it is about 1.5 meters (5 feet) away or about 2 to 3 times the distance of a typical computer monitor. This makes “head zooming” (leaning in to make the image bigger) much less effective (by a factor of 2 to 3X).

Apple’s AVP has a similar FOV and will have similar limitations in fitting virtual monitors. There is the inevitable compromise between showing the whole monitor and leaving some latitude for the user to move their head without cutting off the sides of the monitor.

The pre-distortion correction is certainly going to hurt the image. It is possible that the AVP, using similar pancake optics, will have similar needs for pre-correction (most, if not all, VR optics have significant pincushion distortion – a side effect of trying to support a wide FOV). The MQP displays are rotated to clear the nose (no word on the AVP). However, this can be rolled into the other transformations and probably does not significantly impact the processing requirement or image quality.

A simplified example of scaling text

The image below, one cell of a test pattern with two lines of text and some 1- and 2-pixel-wide lines, shows a simulation (in Photoshop) of the scaling process. For this test, I chose a 175% scaled 11pt font, which should have roughly the same number of pixels as an 11pt font at 100% on an Apple Vision Pro. This simulation greatly simplifies the issue but shows what is happening with the pixels. The MQP and AVP must support resampling with 6 degrees of freedom in the virtual world and pre-correct for the distortion of the optics (and, in the case of MQP’s Horizons, curve the virtual monitor).

Source Cell (left), Simulated 64% scaling (right)
  • Sidenote: This one test pattern accidentally has an “i” rather than a “j” between the g & k that I discovered late into editing.

The pixels have been magnified by 600% (in the full-size image), and a grid has been added to show the individual pixels. On the top right, the source has been scaled by 64%, about the same amount MQP Horizons scales the center of the 1920×1200 virtual monitor when sitting at the desktop. The bottom right image is scaled by 64% and rotated by 1° to simulate some head tilt.

If you look carefully at the scaled one- and two-pixel-wide lines in the simulation, you will notice that sometimes the one-pixel-wide lines are as wide as the 2-pixel lines but dimmer. You will also see that what started as identical fonts from line to line look different when scaled, even without any rotation. When viewed through the lens, the fonts have further degradation/softening as they are displayed on color subpixels.
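The effect on the thin lines can be reproduced in one dimension. The sketch below scales a row of pixels by 64% using plain area averaging, which is an assumption on my part (Photoshop and the headset will use other filters), and the function name is hypothetical.

```python
import math

def scale_row(row, scale):
    """Area-average (box filter) resample of a 1-D row of pixel intensities."""
    out = []
    for i in range(math.ceil(len(row) * scale)):
        lo, hi = i / scale, (i + 1) / scale     # source span covered by output pixel i
        total = 0.0
        for j in range(max(0, math.floor(lo)), min(len(row), math.ceil(hi))):
            total += row[j] * (min(hi, j + 1) - max(lo, j))
        out.append(total * scale)               # average over the span (its length is 1/scale)
    return out

def line(start, width, length=10):
    """A white line of `width` pixels starting at `start` on a black row."""
    return [1.0 if start <= j < start + width else 0.0 for j in range(length)]

for label, row in [("1-px line at 0", line(0, 1)),
                   ("1-px line at 1", line(1, 1)),
                   ("2-px line at 4", line(4, 2))]:
    print(label, [round(v, 2) for v in scale_row(row, 0.64)])
```

With these assumptions, the one-pixel line starting at 0 stays one pixel wide, while the identical line starting at 1 is spread over two dimmer pixels, making it as wide as the scaled two-pixel line. Identical source features end up looking different purely because of where they fall relative to the output grid.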

Below is what the 11pt 175% fonts look like via the lens of the MQP in high enough resolution to see the color subpixels. By the time the fonts have gone through all the various scaling, they are pretty rounded off. If you look closely at the same character in different locations (say the “7” or the decimal point), you will notice every instance is different, whereas, on a conventional physical monitor, they would all be identical due to grid fitting.

MQP 175% Scaled 11pt fonts

For reference, the full test pattern and the through-the-lens picture of the virtual monitor are given below (Click on the thumbnails to see the full-resolution images). The camera’s exposure was set low so the subpixels would not blow out and lose all their color.

Scintillating Text

When looking through the MQP, the text scintillates/sparkles. This occurs because no one can keep their head perfectly still, and every text character is being redrawn on each frame with slightly different alignments to the physical pixels causing the text to wriggle and scintillate.

Scaling/resampling can be done with sharper or softer processing. Unfortunately, the sharper the image after resampling, the more it will wriggle with movement. The only way to avoid this wriggling and have sharp images is to have a much higher ppd. MQP has only 22.5ppd, and the AVP has about 40ppd and should be better, but I think they would need about 80ppd (about the limit of good vision and what Apple Retina monitors support) to eliminate the problems.
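To get a feel for how far these headsets are from that target, here is the back-of-the-envelope arithmetic using the ppd figures above. It assumes the FOV stays the same, so the pixel count grows with the square of the linear factor; the 80ppd target is my estimate, and the function names are hypothetical.

```python
TARGET_PPD = 80.0   # the estimate given above, not a measured requirement

def linear_factor(ppd, target=TARGET_PPD):
    """How much the linear (per-axis) resolution would have to grow."""
    return target / ppd

def pixel_factor(ppd, target=TARGET_PPD):
    """How much the total pixel count would have to grow for the same FOV."""
    return linear_factor(ppd, target) ** 2

for name, ppd in [("MQP", 22.5), ("AVP", 40.0)]:
    print(f"{name}: {linear_factor(ppd):.2f}x linear, {pixel_factor(ppd):.1f}x pixels")
```

At about 40ppd, the AVP would need about double the linear resolution, or four times the pixels, while the MQP at 22.5ppd would need more than twelve times the pixels.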

The MQP (and most displays) uses spatial color with individual red, green, and blue subpixels, so the wriggling is at the subpixel level. The picture below shows the same text with the headset moving slightly between shots.

Below is a video from two pictures taken with the headset moved slightly between shots to demonstrate the scintillation effect. The 14pt font on the right has about the same number of pixels as an 11pt font with the resolution of the Apple Vision Pro.

Scintillation/wiggle of two frames (right-click > “loop” -> play triangle to see the effect)


This will not be a close call, and using any VR headset, including the MQP and Apple Vision Pro, as a computer monitor replacement fails any serious analysis. It might impress people who don’t understand the issues and can be wowed by a flashy short demo, and it might be better than nothing. But it will be a terrible replacement for a physical monitor/display.

I can’t believe Apple seriously thinks a headset display with about 40ppd will make a good virtual monitor. Even if some future VR headset has 80ppd and over a 100-degree FOV, double the AVP linearly or 4X the pixels, it will still have problems.

Part 5B of this series will include more examples and more on my conclusions.

Appendix 1: My 1980s History with Bitmapped Fonts and Multiple Monitors

All this discussion of fonts and 3-D rendering reminded me of those early days when the second-generation TMS34020 almost got designed into the color Macintosh (1985 faxed letter from Steve Perlman from that era – right). I also met with Steve Jobs at NeXT and mentioned Pixar to him before Jobs bought them (discussed in my 2011 blog article) and with John Warnock, a founder of Adobe, who was interested in doing a port of PostScript to the 34010 in that same time frame.

In the 1980s, I was the technical leader for a series of programs that led to the first fully programmable graphics processor, the TMS34010, and the Multi-ported Video DRAM (which led to today’s SDRAM and GDRAM) at Texas Instruments (TI) (discussed a bit more here and in Jon Peddie’s 2019 IEEE article and his 2022 book “The History of the GPU – Steps to Invention”).

In the early 1980s, Xerox PARC’s work influenced my development of the TMS34010, including Warnock’s 1980 paper (while still at PARC), “The Display of Characters Using Gray Level Sample Arrays,” and the series of PARC’s articles in BYTE Magazine, particularly the August 1981 edition on Smalltalk which discussed bit/pixel aligned transfers (BitBlt) and the use of a “mouse” which had to be explained to BYTE readers as, “a small mechanical box with wheels that lets you quickly move the cursor around the screen.”

When defining the 34010, I had to explain to TI managers that the mouse would be the next big input device for ergonomic reasons, not the lightpen (used on CAD terminals at TI in the early 1980s), which requires the user to keep their arm floating in the air, which quickly becomes tiring. Most AR headset user interfaces make users suffer with having to float their hands to point, select, and type, so the lessons of the past are being relearned.

In the late 1980s, a systems engineer for a company I had never heard of called “Bloomberg,” who wanted to support 2 to 4 monitors per PC graphics board, came to see us at TI. In a time when a single 1024×768 graphics card could cost over $1,200 (about $3,000 in 2023 dollars), this meeting stood out. The Bloomberg engineer explained how Wall Street traders would pay a premium to get as much information as possible in front of them, and a small advantage on a single trade would pay for the system. It was my first encounter with someone wanting multiple high-resolution monitors per PC.

I used to have a life designing cutting-edge products from blank sheets of paper (back then, it was physical paper) through production and marketing; in contrast, I blog about other people’s designs today. And I have dealt with pixels and fonts for over 40 years.


Below is one of my early presentations on what was then called the “Intelligent Graphics Controller” (for internal political reasons, we could not call it a “processor”), which became the TMS34010 Graphics System Processor. You can also see the state of 1982 presentation technology with a fixed-spaced font and the need to cut and paste hand drawings. This slide was created in Feb 1982. The Apple Lisa didn’t come out until 1983, and the Mac in 1984.

1986 and the Battle with Intel for Early Graphics Processor Dominance

We announced the TMS34010 in 1986, and our initial main competitor was the Intel 82786. But the Intel chip was “hardware” and lacked the 34010’s programmability, and to top it off, the Intel chip had many bugs. In just a few months, the 82786 was a non-factor. The copies of a few of the many articles below capture the events.

In 1986, we wrote two articles on the 34010 in the IEEE CG&A magazine. You can see from the front pages of the articles the importance we put on drawing text. Copies of these articles are available online (click on the thumbnails below to be linked to the full articles). You may note the similarity of the IEEE CG&A article’s first figure to the one in the 1981 Byte Smalltalk article, where we discussed extending “BitBlt” to the color “PixBlt.”

Around 1980 we started publishing a 3rd party guide of all the companies developing hardware and software for the 340 family of products, and the June 1990 4th Edition contained over 200 hardware and software products.

Below is a page from the TMS340 TIGA Graphics Library, including the font library. In the early 1980s, everyone had to develop their own font libraries. There was insufficient power to render fonts with “hints” on the fly. We also did well just to have bitmapped fonts with little or no antialiasing/smoothing. From about

Sadly, we were a bit before our time, and Texas Instruments had, by the late 1980s, fallen far behind TSMC and many other companies in semiconductor technology for making processors. Our competitors, such as ATI (NVidia wasn’t founded until 1993), could get better semiconductor processing at a lower cost from the then-new semiconductor 3rd party fabs such as TSMC (founded in 1987).

Appendix 2: Notes on Pictures

All the MQP pictures in these two articles were taken through the left eye optics using either the Canon R5 (45mp) with an RF16mmf2.8 or 28mmf2.8 “pancake” lens or the lower resolution Olympus E-M5D-3 (20mp) with 9-18mm zoom lens at 9mm. Both cameras have a “pixel shift” mode that moves the sensor, giving 405mp (24,576 x 16,384) for the R5 and 80mp (10,368 x 7,776 pixels) for the M5D-3. All the pictures used this feature as it gave better resolution, even if the images were later scaled down.

High-resolution pictures of computer monitors with color subpixels and any scaling or compression cause issues with color and intensity moiré (false patterning) due to the “beat frequency” between the camera’s color sensor and the display device. In this case, there are many different beat frequencies between both the pixels and color subpixels of the MQP’s displays and the cameras. Additionally, the issues of the MQP’s optics (which are poor compared to a camera lens) vary the resolution radially. I found for the whole FOV image, the lower-resolution Olympus camera didn’t have nearly as severe a moiré issue (only a little in intensity and almost none in color). In contrast, it was unavoidable with the R5 with the 16mm lens (see comparison below).

The R5 with the 28mmf2.8 lens and pixel shift mode could capture the MQP’s individual red, green, and blue subpixels (right). In the picture above, the two “7s” on the far right have horizontal and diagonal strokes a little over 1 pixel wide. The two 7s are formed by different subpixels because they are aligned slightly differently in 3D space. The MQP’s displays are rotated by about 20°; thus, the subpixels are on a 20° diagonal (about the same as the lower stroke on the 7s). Capturing at this resolution, where the individual red, green, and blue subpixels are visible, necessitated underexposing the overall image by about 8X (3 camera stops). Otherwise, some color dots (particularly green) will “blow out” and shift the color balance.

As seen in the full-resolution crop above, each color dot in the MQP’s display device covers about 1/8th of the area of a pixel, with the other two colors and black filling the rest of the area of a pixel. Note how the scaled-down version of the same pixels on the right look dim when the subpixels are averaged together. The camera exposure had to be set about three stops lower (8 times in brightness as stops are a power of two) to avoid blowing out the subpixels.
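The arithmetic in this paragraph can be checked with a small sketch. The helper names are made up for illustration, and the area-averaging model (a single lit dot at full level, with the rest of the pixel area black) is a simplifying assumption.

```python
def stops_to_factor(stops: float) -> float:
    """Brightness ratio for a given number of camera stops (each stop is 2x)."""
    return 2.0 ** stops


def area_averaged_level(fill_fraction: float, dot_level: float = 1.0) -> float:
    """Average level over a pixel when only fill_fraction of its area is lit."""
    return fill_fraction * dot_level


print(stops_to_factor(3))          # 8.0
print(area_averaged_level(1 / 8))  # 0.125
```

Under these assumptions, the roughly 1/8th area coverage per color dot and the three-stop (8X) exposure reduction are numerically consistent with each other: a dot exposed just below clipping averages out to about one-eighth of that level once the pixel is scaled down.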

Appendix 3: Conflating typeface “points” (pt) with Pixels – A Brief History

Making a monitor appear locked in 3-D space breaks everything about how PCs have dealt with rendering text and most other objects. Since the beginning of PC bitmap graphics, practical compromises (and shortcuts) have been made to reduce processing and to make images look better on affordable computer monitors. A classic compromise is the font “point,” defined (since 1517) at ~1/72nd of an inch.

So, in theory, when rendering text, a computer should consider the physical size of the monitor’s pixels. Early bitmapped graphics monitors in the mid-1980s had about 60 to 85 ppi. PC developers lacked the processing power to deal with this and needed to get on with making products, so they conflated “points” and “pixels.” The exception was Adobe with its PostScript printers; Adobe’s founders came from Xerox PARC, which also influenced Apple. Display font “scaling” helps correct this early transgression.
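To make the size of that shortcut concrete, here is a minimal sketch of what a physically correct conversion would look like, assuming the ~1/72nd-inch point and the 60 to 85 ppi monitor range mentioned above. The function name and the 12pt example size are illustrative only.

```python
POINTS_PER_INCH = 72  # the ~1/72nd-inch typographic point


def points_to_pixels(size_pt: float, pixels_per_inch: float) -> float:
    """Convert a font size in points to pixels for a monitor's physical ppi."""
    return size_pt * pixels_per_inch / POINTS_PER_INCH


# A 12pt font on monitors spanning the mid-1980s range; 72 ppi is the only
# case where treating 1 point as 1 pixel gives the physically correct size.
for ppi in (60, 72, 85):
    print(f"12pt at {ppi} ppi -> {points_to_pixels(12, ppi):.1f} pixels")
```

Treating one point as one pixel always yields 12 pixels for 12pt text, which would be too tall on a 60 ppi monitor and too short on an 85 ppi one.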

Many decades ago, MS Windows decided that a (virtual) 96 dots per inch (DPI) would be their default “100%” font scaling. A Wikipedia article discusses the convoluted logic that led to Microsoft’s decision. Conversely, Apple stuck with 72 PPI as their basis for fonts and made compromises with font readability on lower-resolution monitors with smaller fonts. Adherence to 72 PPI may explain why a modern Apple Mac 27” monitor is 5K to reach 218 ppi (within rounding of 3×72=216). In contrast, the much more common and affordable 27” 4K monitor has 163 ppi, not an integer multiple of 72, and Macs have scaling issues with 3rd party monitors, including the very common 27” 4K.
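The ppi figures above follow from the screen diagonal and the pixel counts. The sketch below assumes the common 5120×2880 (“5K”) and 3840×2160 (“4K”) resolutions, which the text does not spell out, and uses a hypothetical helper name.

```python
import math


def pixels_per_inch(width_px: int, height_px: int, diagonal_in: float) -> float:
    """Physical pixel density from the pixel counts and the screen diagonal."""
    return math.hypot(width_px, height_px) / diagonal_in


# Assumed resolutions for the two 27-inch monitors discussed in the text.
for name, width, height in (("27in 5K", 5120, 2880), ("27in 4K", 3840, 2160)):
    ppi = pixels_per_inch(width, height, 27.0)
    print(f"{name}: {ppi:.0f} ppi, {ppi / 72:.2f} x 72")
```

With these assumptions, the 5K panel comes out very close to three times 72, while the 4K panel lands between two and three times 72, which is where non-integer scaling enters.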

Microsoft and Apple have tried to improve text by varying the intensity of the color subpixels. Below is an example from MS Windows with “ClearType” for a series of different-size fonts. Note particularly the horizontal strokes at the bottom of the numbers 1, 2, and 7 below. They are 1 pixel wide with no smoothing from Calibri 9pt to 14pt; at 18pt, the strokes jump to 2 pixels wide with a little smoothing; and at 20pt, they become 2 pixels wide with no smoothing vertically.

Apple has a similar function known as “LCD Font Smoothing.” Apple has put low-ppd text rendering issues in its rearview mirror with “retinal resolution” displays for Mac laptops and monitors. “Retinal resolution” translates to more than 80ppd when viewed normally, which is from about 12” (0.3 meters) for handheld devices (ex. iPhone) or about 0.5 to 0.8 meters for a computer.

The chart was edited for space, and ppd information was added.

Apple today sells “retina monitors” with a high 218 PPI, which makes text grid fitting less of an issue. But as the chart from Mac external displays for designers and developers (right) shows, Mac systems have resolution and performance issues with in-between resolution monitors.

The Apple Vision Pro has less than 40 ppd, much lower than any of these monitors at normal viewing distance. And that is before all the issues with making the virtual monitor seem stationary as the user moves.
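For readers who want to reproduce ppd figures like the ones in this appendix, here is a minimal sketch that converts a monitor’s PPI and viewing distance into pixels per degree. It assumes a flat screen viewed head-on and measures one degree centered on the line of sight; the function name is hypothetical, and results from this simple formula may differ slightly from the chart’s values.

```python
import math

METERS_PER_INCH = 0.0254


def pixels_per_degree(ppi: float, distance_m: float) -> float:
    """Approximate ppd at the center of a flat screen viewed head-on."""
    distance_in = distance_m / METERS_PER_INCH
    # Width on the screen covered by one degree centered on the line of sight.
    inches_per_degree = 2.0 * distance_in * math.tan(math.radians(0.5))
    return ppi * inches_per_degree


# A 218 PPI monitor over the 0.5 to 0.8 meter viewing range used in the text.
for distance_m in (0.5, 0.65, 0.8):
    print(f"{distance_m} m: {pixels_per_degree(218, distance_m):.0f} ppd")
```

Because ppd scales linearly with viewing distance in this approximation, the same monitor covers a wide range of angular resolutions depending on how far away the user sits.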

Karl Guttag


  1. Obviously a VR headset is not a good replacement for a real monitor. I am sure also Apple knows that. But they had to show some use cases and they could not reveal their true use cases yet (assuming they have some). If they had done that, those would be copied and running on other VR glasses long before the AVP becomes available. So let’s just ignore anything they said during the reveal and wait for the product release.

    • It apparently is not that obvious to everyone based on the number of people that said that it was “the future.” There are those that think Apple can do no wrong and if Apple says so it must be true. It seemed like 90% of the people that got to try on the AVP are in these categories.

      There is a large group of people that think the whole AVP is silly (such as the reactions from The Verge).

      There is also a group of people who think Apple is fishing for use cases, hoping that developers will identify them.

      Your take seems to be a bit different in that all the failed use cases are just a ruse to throw people off Apple’s “true” use case (if there is one).

      I don’t see the need to wait to start analyzing with the available information. The way you sharpen your analytical skills is to theorize and predict and then see which predictions prove to be correct. In this case, there are many things that seem obviously wrong with the AVP. I’m still floored by the horrible placement of the passthrough cameras so they could support the winking-blinking effect of the front display.

      • Sorry, I didn’t mean to undermine this article. Your analysis is spot on and very thorough.
        Still, it just seems absurd to me anyone who knows more about this would seriously consider that a VR device will be a proper replacement for a monitor/desk setup.
        I’m not saying Apple placed the AVP as monitor replacement, so they can throw people or competition off. It is just that they had to place it somehow during the announcement and in my mind they cannot reveal their true use cases yet – so long before market availability. That would make no sense.

    • This doesn’t seem at all like an Apple concern. With the admittedly massive, company-defining exception of the iPhone, modern Apple does absolutely nothing but enter established segments and do them either better or “better” than the existing players. Independently, the lead times on HMDs mean we know nothing on the AVP’s tier (except existing Varjo stuff) will be available before its launch anyway.

  2. Thanks for this great article, Karl! It helped a lot with what we are doing to increase ppi in microdisplays beyond 10k ppi.
    As you showed in this article, it looks like XR with head movement needs much smaller pixels, or higher ppd, than a normal stationary display, which is where LCoS has an advantage.
    I’m thinking that the image information (resolution), however, does not necessarily need to be higher, but the pixels need to be small enough for the text rendering manipulation to be effective.
    So, what do you think about upscaling in XR displays? For example, the information for the user doesn’t need to be 8K resolution, but the pixels need to be small enough for 60 ppd… The display information would be 2 million pixels (like 1920×1080 resolution), but the display would have 8 million pixels, which could reduce the screen door effect and allow more sophisticated text rendering while in motion?

  3. Swave Photonics spin-off from the IMEC
    Swave uses a conventional alloy of germanium, antimony and tellurium (GST)
    IMEC looked at a lot of technologies, DLP, MEMS, microLED, but at best they were around a 2-micron pixel pitch.” Not only does GST get to 225nm pixel-pitch
    By making arrays with a pixel-pitch below the wavelength of visible light it is possible to program the visual field to produce an optical wavefront, as if from a real object.
    Swave expects to exhibit behind closed doors at the CES event in Las Vegas January 2024

  4. You’re right that it’s not a compelling monitor replacement for text. Them showing that felt like padding out the presentation and trying to justify the high rumored price when the one realistic use case is displaying large immersive video and still images in a portable product.

  5. This is the one and only good article about it that I found. Thanks for your article.
    I am thinking about this as a concept, not a great end product to be honest. It looks like a really massive brand testing things out.

  6. Apple does not necessarily have to fish for the use case. There are plenty of use cases that are maybe not obvious – reading text and browsing the net do not belong to them.

    If you work in a studio environment, like sound design, film post production, 3D design – just to name a few – you need as much screen real estate as possible. Virtualizing them in an AVP sounds really compelling to me, having worked 35 years in these fields.

    My ‘regular’ work setup is between 4-5 screens of 30″ and above dimension. I have to change physical position to be able to see the different screens, while still rotating / scrolling content in different screens (like channels/tracks), asset lists / libraries.

    With an AVP, that is pretty easily accomplished by merely rotating or tilting my head; combined with gestures, content literally flows.

    For the most part, I cannot take my studio setup with me on location. Yet an AVP would give me full access and more, even sitting in an airplane.

    So there are cases and workflows will adapt to the tools available.

    I think nobody is expecting the AVP to be the end to all the questions, the answer to all the problems.

    • Since I was working on a “high end” graphics processor, I tended to have good monitors. I went for many years with an SXGA (1280×1024) monitor. That was about all you could get at a rational cost out of CRT technology before scanning rates and beam control accuracy to support >75Hz refresh rates started getting ridiculous. Note there was 1920x1080I (interlaced) and 1280x720P (progressive) for HDTV; that was all about CRT scan rates that could be rationally supported. Sadly, the HDTV standards were messed up because they started out accommodating CRTs that would be obsolete by the time HDTV took off in the US. But computer monitors couldn’t be interlaced due to flicker and besides had much better color with a wide viewing angle.

      Resolution didn’t take off again until LCD monitors became practical and everything jumped to 1920x1080P (60Hz) or 1920x1200P (1280×720 or 1280×800 for the lower end).

  7. I’m really curious what you think about the immersed headset that is being marketed and will be sold as a monitor replacement.

    • I assume you are talking about the Immersed virtual desktop software. I don’t think they have a “headset.”
      I haven’t tried it, as I was getting the information I needed on the MQP’s displays using Meta’s Horizons, and it was reported by some reviewers as buggy. Immersed is more flexible, which is both good and bad, as it means more things to check. I’ve been thinking about trying it out later.

  8. What is the PPD for a 1920×1080 27 inch monitor when compared to the IBM XGA monitor used as an illustrative example? Are they comparable? I use a 1080 monitor for work and it’s more than adequate.

    • A 27″ 1920×1080 works out to about 31 to 47ppd if you sit 0.5 to 0.8 meters from the monitor. The bigger the monitor and the bigger the pixels, the farther a person will tend to sit. The IBM XGA at 16″ and 1024×768 works out to 28ppd to 45ppd from 0.5m to 0.8m, respectively.

      Here is a handy calculator for translating monitor screen diagonals and pixel resolution into PPI and ppd.

      A big point of the article is that once you “virtualize” the monitor and try to make it appear stationary in 3-D space, it hurts the readability of the text. Text that is good enough at a given angular resolution on a fixed monitor will not be on a virtualized monitor due to the rescaling and the inability to “grid fit” the text like most PC/Mac text generators perform.

      • While I can believe it, is the statistically overwhelmingly likely difference between the two eyes’ grids purely a negative? Do you get effectively some supersampling (as well as suppression of scintillation) when both foveae are covering the same zone on the informational texture and head micromovements are inevitable?

      • One eye dominates in any given region of the display. There is no “additive effect” or smoothing of scintillation that I notice (the center of the image seems to scintillate as much with one or two eyes). But then, I know from personal experience with vision tests that there is some advantage in reading text when using both eyes.

        I don’t know to what degree of accuracy the vision system picks or whether it factors in other things such as focus, but it would make an interesting study (likely out there someplace). Everyone has a “dominant eye” that will tend to “win.” Monocular optics systems need to put the virtual display on the dominant eye.

  9. Apple doesn’t use hinting anymore. Or sub-pixel antialiasing. They render everything at a multiple of screen resolution and scale down.

    Looks terrible if you don’t have a 4K or better display.

  10. Really interesting article, thank you. Unfortunately it pretty much confirms what I’ve been thinking about AVP.

    I’ve also been following the development of lighter alternatives such as the nReal (I think they rebranded to xReal) Air and Lenovo Glasses T1 but have also come to the conclusion that the resolutions on offer are simply insufficient. However: In principle these should be more suitable because they have fixed screen positions and you can therefore use subsampling/hinting etc. without having as many issues. They’re also much lighter and can be powered by the phone/tablet/laptop they’re connected to indefinitely. Plug in a single USB-C cable and you’re good to go.

    I think 4k per eye (they’re currently at 1080p IIRC) should be decent for a single 3840×2160 (or maybe 3840×3840?) display on those types of glasses – that’s when I’ll start considering them for actual text-based work. I do most of my work on a 15.6″ 4k laptop at roughly 125% scaling (it’s actually Gnome at 100% with slightly increased font sizes, but 125% is what I was running on Windows before) and a 40″ 4k desktop monitor at 100%, and I’m not looking for a downgrade in terms of screen real estate – if I want a 1080p screen I’ll just use a 12-13″ laptop – the minimum useful size is limited by the keyboard anyway.

    AR/VR type glasses on the other hand, with a view port coupled to accelerometers and other sensors, present a much bigger challenge and likely won’t be able to be used like Apple’s marketing mockup video suggests until they hit 8k or 12k per eye.
