Canon R5 Mk ii Drops Pixel Shift High Res. – Is Canon Missing the AI Big Picture?

Introduction

Sometimes, companies make what seem, on the surface, to be technically poor decisions. I consider this the case with Canon’s new R5 Mark ii (and R1) dropping support for sensor pixel-shifting high resolution (what Canon calls IBIS High Res). Canon removed the IBIS High Res mode, which captures (as I will demonstrate) more real information, and seemingly added in-camera AI upscaling, which creates fake information. AI upscaling, if desired, can be done better and more conveniently on a computer, but Pixel Shift/IBIS High Res cannot.

The historical reason for pixel shift is to give higher resolution in certain situations. Still, because the cameras combine the images “in-camera” with the camera’s limited processing and memory resources plus simple firmware algorithms, they can’t deal with either camera or subject motion. Additionally, while the Canon R5 can take 20 frames per second (the R5 Mark ii can take 30 frames per second), taking the nine frames takes about half a second, but then it takes another ~8 seconds for the camera to process them. Rather than putting more restrictions on shooting, it would have been much easier and faster to save the raw frames (with original sensor subpixels) to the flash drive for processing later by a much more capable computer using better algorithms that can constantly be improved.

Canon’s competitors, Sony and Nikon, already save raw files with their pixel-shift modes. I hoped Canon would see the light with the new R5 mark ii (R5m2) and support IBIS HR by saving the raw frames. Instead, Canon went in the wrong direction; they dropped IBIS High Res altogether and added an in-camera “AI upscaling.” The first-generation R5 didn’t have IBIS High Res at launch, but a later firmware release added this capability. I’m hoping the same will happen with the R5 Mark ii, only this time saving the RAW frames rather than creating an in-camera JPEG.

Features Versus Capabilities

I want to distinguish between a “feature” and a “capability.” Take, for example, high dynamic range. The classical photography problem is taking a picture in a room with a window with a view; you can expose for the inside of the room, in which case the view out the window will be blown out, or you can expose for the view out the window, in which case the room will look nearly black. The Canon R5 has an “HDR Mode” that takes multiple frames at different exposure settings and lets you save either only a single processed image or the processed image along with all the source frames. The “feature” is making a single HDR image; the “capability” is rapidly taking multiple frames with different exposures and saving those frames.
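To make the “capability” concrete, here is a minimal exposure-fusion sketch in Python/NumPy. It is a toy illustration of the general idea, not Canon’s in-camera algorithm: each pixel is blended from the bracketed frames, weighted by how close each frame exposed that pixel to mid-gray (the well-exposedness weight used in Mertens-style fusion). All names and values are illustrative.

```python
import numpy as np

def fuse_exposures(frames, sigma=0.2):
    """Blend bracketed exposures (values in 0..1) by favoring, per pixel,
    the frame whose value is closest to mid-gray (0.5)."""
    frames = np.stack(frames).astype(float)
    # Well-exposedness weight: Gaussian centered on mid-gray.
    weights = np.exp(-((frames - 0.5) ** 2) / (2 * sigma ** 2))
    weights /= weights.sum(axis=0, keepdims=True)
    return (weights * frames).sum(axis=0)

# Two toy "exposures" of the room-with-a-window scene:
# one crushes the room to near-black, the other blows out the window.
under = np.array([[0.05, 0.40]])  # room too dark, window well exposed
over  = np.array([[0.45, 1.00]])  # room well exposed, window blown out
fused = fuse_exposures([under, over])
```

In the fused result, each pixel is dominated by whichever frame exposed it well, which is the whole point of saving all the bracketed frames rather than only one.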

The Canon R5 made IBIS High Res a feature when it only offered a single JPEG output without the capability of saving individual frames with the sensor shifted by sub-pixel amounts. By saving raw frames, software could combine the frames better. Additionally, the software could deal with camera and subject motion, which produce unsalvageable artifacts in an IBIS high-res JPEG. As such, when I use IBIS High Res, I typically take three pictures just in case, as one of the pictures often has unfixable problems that can only be seen once viewed on a computer monitor. It would also be desirable to select how many frames to save; for example, saving more than one cycle of frames would help deal with subject or camera motion.

Cameras today support some aspects of “computational photography.” Saving multiple images can be used for panoramic stitching, high dynamic range, focus stacking (to support larger depths of focus than possible with a single picture), and astrophotography image stacking (using interval timers to take many shots that are added together). Many cameras, like the R5, have even added modes to support taking multiple pictures for focus stacking, high dynamic range, and interval timers. So for the R5 mk. ii to have dropped sensor pixel shifting seems like a step backward in the evolution of photography.

This Blog’s Use of Pixel Shifting for Higher Resolution

In the past year, I started using my “personal camera,” the Canon R5 (45MP “full frame” 35mm), to take pictures of VR/Passthrough-AR and optical AR glasses (where possible). I also use my older Olympus E-M5 Mark iii (20MP Micro 4/3rd) because it is a smaller camera with smaller lenses, which lets it get into the optimum optical location in smaller form factor AR glasses.

Both cameras have “In-Body Image Stabilization” (IBIS) that normally moves the camera sensor based on motion detection to reduce camera/lens motion blur. They both also support a high-resolution mode where, instead of using the IBIS for stabilization, they use it to shift the sensor by a fraction of a pixel to take a higher-resolution image. Canon calls this capability “IBIS High Res.” The R5 in-camera combines nine images, each shifted by 1/3rd of a pixel, to make a 405MP JPEG image. The E-M5 combines four images, each shifted by a half pixel.
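The sub-pixel shifting described above can be illustrated with a toy NumPy sketch (my own illustration, not Canon’s actual algorithm): nine simulated exposures, each offset by 1/3 of a sensor pixel, interleave onto a 3× finer grid.

```python
import numpy as np

def capture(scene, dy, dx, pitch=3):
    """Simulate one exposure: each photosite averages a pitch x pitch
    patch of the finer 'scene,' offset by (dy, dx) scene samples,
    i.e., by (dy/pitch, dx/pitch) of a sensor pixel."""
    rows = (scene.shape[0] - dy) // pitch
    cols = (scene.shape[1] - dx) // pitch
    s = scene[dy:dy + rows * pitch, dx:dx + cols * pitch]
    return s.reshape(rows, pitch, cols, pitch).mean(axis=(1, 3))

def combine(scene, pitch=3):
    """Interleave pitch*pitch shifted captures onto a 'pitch'-times finer grid."""
    rows = scene.shape[0] // pitch - 1
    cols = scene.shape[1] // pitch - 1
    hi = np.zeros((rows * pitch, cols * pitch))
    for dy in range(pitch):
        for dx in range(pitch):
            hi[dy::pitch, dx::pitch] = capture(scene, dy, dx, pitch)[:rows, :cols]
    return hi

# A smooth horizontal ramp stands in for real-world detail.
scene = np.tile(np.arange(30.0), (30, 1))
single = capture(scene, 0, 0)   # 10x10: one ordinary exposure
hi_res = combine(scene)         # 27x27: nine interleaved captures
```

In the toy ramp, `hi_res` samples the gradient three times more densely than `single`. Real combining must also handle demosaicing and any camera or subject motion between frames, which is exactly why having the RAW frames (rather than an in-camera JPEG) matters.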

The cameras and lenses I use most are shown on the right, except for the large RF15-35mm lens on the R5 camera, which is shown for comparison. To take pictures through the optics and get inside the eye box/pupil, the lens has to be physically close to the image sensor in the camera, which limits lens selection. Thus, while the RF15-35mm lens is “better” than the fixed focal length 28mm and 16mm lenses, it won’t work for taking a headset picture. The RF28mm and RF16mm lenses are the only full-frame Canon lenses I found to work. Cell phones with small lenses “work,” but they don’t have the resolution of a dedicated camera, nor the aperture control and shutter speed control necessary to get good pictures through headsets.

Moiré

Via Big Screen Beyond

In addition to photography being my hobby, I take tens of thousands of pictures a year via the optics of AR and VR headsets, which pose particular challenges for this blog. Because I’m shooting displays that have a regular pattern of pixels with a camera that has its own regular pattern of pixels, there is a constant chance of moiré due to the beat frequencies between the pixels and color subpixels of the camera and the display device, as magnified by the camera and headset optics (left). To keep within the eye box/pupil of the headset, I am limited to simpler lenses that are physically short to keep the distance from the headset optics to the camera short, which limits the focal lengths and thus the magnification available to combat moiré. In-camera pixel shifting has proven to be a way to not only improve resolution but also greatly reduce moiré effects.

Issues with moiré are not limited to taking pictures via AR and VR headsets; moiré is a problem in real-world pictures that include things like patterns in clothing, fences (famously, from a distance where they form a fine pattern), and other objects with a regular pattern (see typical photographic moiré problems below).
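The beat-frequency effect is easy to reproduce numerically. In this illustrative NumPy sketch (a toy model, not a simulation of any particular camera), a grating slightly finer than the pixel grid, at 0.95 cycles per pixel, is sampled once per pixel; because that is above the 0.5 cycles/pixel Nyquist limit, it shows up as a slow 0.05 cycles/pixel moiré band:

```python
import numpy as np

pixels = np.arange(400)
# A display grating slightly finer than the camera's pixel grid:
# 0.95 cycles per pixel, well above the 0.5 cycles/pixel Nyquist limit.
captured = np.cos(2 * np.pi * 0.95 * pixels)

# The sampled image is indistinguishable from a slow 0.05 cycles/pixel
# pattern, since cos(2*pi*0.95*n) == cos(2*pi*0.05*n) for integer n.
spectrum = np.abs(np.fft.rfft(captured))
freqs = np.fft.rfftfreq(captured.size)
alias_freq = freqs[np.argmax(spectrum)]   # 0.05, not the true 0.95
```

The slow 0.05 cycles/pixel band is the visible moiré: a ~20-pixel-wide false pattern created entirely by the interaction of the two pixel grids.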

Anti-Aliasing

Those who know signal theory know that a low-pass filter before sampling reduces/avoids aliasing (moiré is a form of aliasing). Cameras have long used “anti-aliasing” filters, which very slightly blur the image to reduce aliasing, but at the expense of resolution. In the past, with lower-resolution sensors, it was more likely that real-world things in a picture would cause aliasing, and the anti-aliasing filters were more necessary.

As the resolution of sensors has increased, it is less likely that something in focus in a typical picture will be fine enough to alias, and better algorithms can detect and reduce the effects of moiré. Still, while moiré can sometimes be fixed in post-processing, in critical or difficult situations, it would be better if additional frames were stored to clue the software into processing it as aliasing/moiré rather than “real” information.
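Here is a hedged NumPy sketch of what an optical anti-aliasing filter does (a simplified numerical stand-in, not a model of any real filter): pre-blurring the light before the pixel grid samples it suppresses the false low-frequency moiré that sampling alone would create.

```python
import numpy as np

oversample = 10                               # fine grid standing in for "the optics"
t = np.arange(400 * oversample) / oversample  # position in units of sensor pixels
fine_detail = np.cos(2 * np.pi * 0.95 * t)    # detail beyond the sensor's Nyquist limit

def sample(signal):
    """One sample per sensor pixel."""
    return signal[::oversample]

def box_blur(signal, width):
    """Crude stand-in for an optical low-pass (anti-aliasing) filter."""
    return np.convolve(signal, np.ones(width) / width, mode='same')

def alias_amplitude(image):
    """Amplitude of the 0.05 cycles/pixel moiré component."""
    spectrum = np.abs(np.fft.rfft(image)) / (image.size / 2)
    return spectrum[int(0.05 * image.size)]

no_filter = sample(fine_detail)                              # strong moiré
with_filter = sample(box_blur(fine_detail, 2 * oversample))  # moiré suppressed
```

The filtered capture has only a small fraction of the moiré amplitude, and the cost is exactly what the text describes: the blur also removes some legitimate fine detail.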

Camera Pixels and Bayer Filter (and misunderstanding)

Most cameras today (including Canon’s) use a Bayer filter pattern (below right) with two green-filtered photosites for each red or blue one. When producing an image for a person to view, the camera or a computer’s RAW conversion software, in a process often called “debayering” or “demosaicing,” generates a full-color pixel by combining the information from many (8 or more) surrounding single-color photosites, with the total number of full-color pixels equaling the number of photosites.
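A minimal bilinear demosaic in NumPy makes this concrete. This is a simplified sketch of the idea, not any camera maker’s actual algorithm: an RGGB mosaic keeps one color per photosite, and each missing color is filled from the average of its known neighbors, yielding one full-color pixel per photosite.

```python
import numpy as np

def mosaic(rgb):
    """Simulate an RGGB Bayer sensor: one color sample per photosite."""
    h, w, _ = rgb.shape
    raw = np.zeros((h, w))
    raw[0::2, 0::2] = rgb[0::2, 0::2, 0]   # red photosites
    raw[0::2, 1::2] = rgb[0::2, 1::2, 1]   # green
    raw[1::2, 0::2] = rgb[1::2, 0::2, 1]   # green
    raw[1::2, 1::2] = rgb[1::2, 1::2, 2]   # blue
    return raw

def fill_channel(raw, mask):
    """Bilinear interpolation: estimate one color everywhere from the
    average of that color's known samples in each 3x3 neighborhood."""
    h, w = raw.shape
    values = np.pad(raw * mask, 1)
    counts = np.pad(mask.astype(float), 1)
    num = np.zeros((h, w))
    den = np.zeros((h, w))
    for dy in range(3):
        for dx in range(3):
            num += values[dy:dy + h, dx:dx + w]
            den += counts[dy:dy + h, dx:dx + w]
    return num / den

h = w = 8
r_mask = np.zeros((h, w), bool); r_mask[0::2, 0::2] = True
b_mask = np.zeros((h, w), bool); b_mask[1::2, 1::2] = True
g_mask = ~(r_mask | b_mask)

rgb = np.full((h, w, 3), 0.5)               # flat gray test scene
raw = mosaic(rgb)
full_color = np.stack([fill_channel(raw, m)
                       for m in (r_mask, g_mask, b_mask)], axis=-1)
# One full-color pixel per photosite: full_color.shape == (8, 8, 3)
```

On a flat gray scene, the reconstruction is exact; real demosaicers add edge-aware logic, but the photosite-to-full-color-pixel accounting is the same.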

Camera makers count every photosite as a pixel even though the camera only captured “one color” at that photosite. Some people, somewhat mistakenly, think the resolution is one-quarter of that claimed, since only one-quarter of the photosites are red and one-quarter blue. After all, with a color monitor, we don’t count the red, green, and blue subpixels as three pixels but just one. However, Microsoft’s ClearType does gain some resolution from the color subpixels to render text better.

It turns out that except for extreme cases, including special test patterns, the effective camera resolution is close to the number of photosites (and not 1/4th or 1/2). There are several reasons why this is true. First, note the red, green, and blue filters’ frequency responses for a color camera sensor (above left – taken from a Sony sensor as it was available). Notice how their spectra are wide and overlapping. The wide spectral nature of these filters is necessary to capture the continuous spectrum of colors in the real world (everything we call “red” does not have the same wavelength). If the filters were very narrow and only captured a single wavelength, then any colors that were not that wavelength would appear black. Each photosite captures intensity information for all colors, but the filtering biases it toward a band of colors.

Almost everything (other than spectral lines from plasmas, lasers, and some test patterns) that can be seen in the real world is not a single wavelength but a mix of wavelengths. There is even the unusual case of magenta, which does not have a wavelength (and thus, many claim it is not a color) but is a mix of blue and red. With a typical photo, we have wide-spectrum filters capturing wide-spectrum colors.

It turns out that humans sense resolution mostly in intensity and not color. This fact was exploited to reduce the bandwidth of early color television and is used to reduce data in video and image compression algorithms. Thanks to the overlap in the camera’s color filters, there is considerable intensity information in the various color pixels.
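This intensity-over-color bias is easy to see in a chroma-subsampling sketch (an illustration using the standard BT.601 luma weights, not any specific codec’s implementation): fine gray detail lives entirely in the luma channel, so throwing away three-quarters of the chroma samples, as 4:2:0 video does, loses nothing for it.

```python
import numpy as np

def luma_chroma(rgb):
    """BT.601-style luma (Y) and blue/red difference chroma, values 0..1."""
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    y = 0.299 * r + 0.587 * g + 0.114 * b
    return y, (b - y) / 1.772, (r - y) / 1.402

def subsample(c):
    """4:2:0-style: keep one chroma sample (the mean) per 2x2 block."""
    h, w = c.shape
    return c.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

# Pixel-scale gray checkerboard: maximum intensity detail, zero color.
gray = (np.indices((8, 8)).sum(axis=0) % 2).astype(float)
rgb = np.stack([gray] * 3, axis=-1)
y, cb, cr = luma_chroma(rgb)
# Luma carries the full-resolution detail; both chroma planes are flat,
# so 2x2 chroma subsampling is lossless for this pattern.
```

A purely chromatic pattern of the same spatial frequency would be the opposite case: its detail lives in the chroma planes and would be averaged away by the same subsampling, yet viewers barely notice, which is why the trade-off works.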

Human Vision and Color

If you think the camera sensor’s Bayer pattern and color filter spectral overlaps are bad, consider the human retina. On average, humans have 7 million cones in the retina, of which ~64% are long (L) wavelength (red), ~32% medium (M – green), and ~2% short (S – blue). However, these percentages vary widely from person to person, particularly the percentage of short/blue cones. The cones, which sense color and support high resolution, are concentrated in the center of vision.

Notice the spectral response of the so-called red, green, and blue cones (below left) and compare it to the camera sensor filters’ response above. Note how much the “red” and “green” responses overlap. On the right is a typical distribution of cones near the fovea (center) of vision, and note there are zero “blue”/short cones in the very center of the fovea; it makes the Bayer pattern look great😁.

Acuity of the Eye

Next, we have the fact that the cones are concentrated in the center of vision and that visual acuity falls off rapidly. The charts below show the distribution of rods and cones in the eye (left) and the sharp fall-off in visual acuity from the center of vision.

Saccadic Eye Movement – The Eyes’ “Pixel Shifting”

Looking at the distribution of cones and the lack of visual acuity outside the fovea, you might wonder how humans see anything in detail. The eye constantly moves in a mix of large and small steps known as saccades. The eye tends to blank while it moves and then takes a metaphorical snapshot. The visual cortex takes the saccade’s “snapshots” and forms a composite image. In effect, the human visual system is doing “pixel shifting.”

My Use of Pixel Shifting (IBIS High-Res)

I am a regular user of IBIS High-Resolution on this blog. Taking pictures of displays with their regular pixel patterns is particularly prone to moiré. Plus, with the limited lenses I can use, which are all wide-angle (and thus low magnification), it helps to get some more resolution. With IBIS High Res, a single 405MP (24,576 by 16,384 pixels) image can capture a ~100-degree wide FOV and yet see details of individual pixels from a 4K display device.

The feature seems a bit of an afterthought on the R5, with its JPEG-only output. Even with the camera on a tripod, it sometimes screws up, so I usually take three shots just in case, because I will only know later, when I look at the results blown up on a monitor, whether one of them messed up. The close-in crops (right) are from two back-to-back shots with IBIS High Res. In the bad shot, you can see how the edges look feathered/jagged (particularly comparing vertical elements like the “l” in Arial). I would much rather have had the IBIS HR output the nine RAW images.

IBIS High-Res Comparison to Native Resolution

IBIS High Res provides higher resolution and can significantly reduce moiré. I can often scale the IBIS High Res output down, and the image has much less moiré and is a bit sharper than a “native” resolution picture, even when scaled down to the same size, as shown below.

The crops below show the IBIS High Res image at full resolution and the native resolution scaled up to match, along with insets of the IBIS High Res picture scaled down to match the native resolution.

The Image below was taken in IBIS High Resolution and then scaled down by 33.33% for publication on this blog (from the article AWE 2024 VR – Hypervision, Sony XR, Big Screen, Apple, Meta, & LightPolymers).

The crops below compare the IBIS High Res at full resolution to a native image upscaled by 300%. Notice how the IBIS High Res has better color detail. If you look at the white tower on a diagonal in the center of the picture (pointed to by the red arrow), you can see the red (on the left) and blue chroma aberrations caused by the headset’s optics, but these and other color details are lost in the native shot.

Conclusions

While my specific needs are a little special, I think Canon is missing out on a wealth of computational photography options by not supporting IBIS High-Res with RAW output. The obvious benefits are helping with moiré and getting higher-resolution still lifes. By storing RAW, there is also the opportunity to deal with movement in the scene, which may even be hand-held. It would be great to have the option to control the shift amount (shift by 1/3 and 1/2 would be good options) and the number of pictures. For example, it would be good to capture more than one “cycle” to help deal with motion.

Smartphones are cleaning up on dedicated cameras with “computational photography,” making small sensors with mediocre optics look very good. Imagine what could be done with better lenses and cameras. Sony, a leader in cell phone sensors, knows this and has pixel shift with RAW output. I don’t understand why Canon is ceding pixel shift to Sony and Nikon. Hopefully, it will come in a firmware update, as it did on the original R5. Only this time, please save the RAW/cRAW files.

In related news, I’m working on an article about Texas Instruments’ renewed thrust into AR with DLP. TI DLP has been working with poLight to support Pixel Shift (link to video with poLight) for resolution enhancement with AR glasses (see also Cambridge Mechatronics and poLight Optics Micromovement (CES/PW Pt. 6))

Karl Guttag

2 Comments

  1. Nice story! I was skeptical about the quality of your pixel shift through the lens photos and this post addresses many of my concerns.

    The story I’ve heard about pixel shift is that various camera vendors have patents on flawed pixel shift systems, if they somehow combined them they might produce something better, but they can’t because of patents. This could well be a reason the feature got removed from your camera.

    I can say though, as a photographer, that I’m skeptical about increasing resolution. Occasionally I get a picture with my Sony α7iv where the depth of field is good enough and the image looks like pixel art in that I can resolve individual bricks at the top of an 11 story building when I’m at the base, each one is a few pixels and each pixel looks meaningful. In that case a better sensor or a pixel shift really would give me more meaningful pixels.

    I take a lot of macro photographs of flowers, on the other hand, and generally I can’t get the whole flower “tack sharp” at any settings because of depth of field limitations. The photo is beautiful and most people wouldn’t complain but I see photos that people produce with focus stacking and I can’t get it out of my mind that this is possible. In circumstances like that pixel shift can at best give me more useful pixels in the disc but do nothing for the ray, unless instead of focus stacking 10 images I pixel shift and focus stack and take more like 100 images.

    I thought about switching to the α7R series but the burden of massive RAW files seemed too much. I like doing sports photography and it’s enough of a burden to come back from a game with 400 large RAW files, it would be more of a burden to come back with 400 XXL RAW files. So I think there are a lot of reasons why many photogs might say “Nein Danke” to pixel shift.

    • Thanks,
      Hard to know whether the patent issue is an urban legend or true without seeing the patents. The concept of pixel shifting has to be very old, and if there are patents, they should be at or nearing expiration. It’s hard to believe that pixel shifting would have better patent coverage than focus stacking, and there are many different stacking applications available.

      I haven’t looked much at Sony cameras after I decided to go with the Canon RF system. My sense is that there is a lot of “grass is greener” in that each system has advantages and disadvantages. It looks like a few of the latest Sony cameras have focus bracketing. Camera-controlled focus bracketing is a huge advantage for macro shots and handy, but less critical, for large depth of field landscapes. I have used it several times on my R5, and it is amazing what you can get. In one case, I turned the focus-bracketed macro pictures into a “movie,” which was interesting. Unless the subject is dead and locked down, I can’t imagine combining focus stacking and pixel shifting.

      I think focus bracketing might be useful as an alternative to pixel shifting to prevent/reduce moiré and maybe improve resolution. It would take some “smart” software to combine the main image with a few pictures at different focuses. In its simplest concept, the moiré will change with the focus change, and the software could use that information to know that moiré/aliasing is occurring. With focus breathing, you would also be getting a bit of pixel shift (albeit radial, with none in the center). Unfortunately, the R5 and R5m2 both monotonically increase the focus when “bracketing.” What you would want is for it to act more like exposure bracketing, where the focus moves back and forth from the original focus point.

      I think for the potential of pixel shift to be unlocked, they have to save the RAW files. Whenever I use pixel shift on the R5, I always take 3 shots, and typically one of them will be messed up, but I won’t know it until I see it on my computer highly magnified. It would be so much better for the camera to take, say, 18 pictures (two sets of 9) and then be able to use all that information to deal with any motion. I don’t understand at a technical level why Sony pixel shifts by whole pixels for 4 shots or by half pixels for 16 shots. Because of the overlapping color spectra of the color filters, you don’t gain nearly as much as people think by shifting by whole pixels, other than perhaps reducing color moiré; but if they shifted 4 times by half a pixel, they would gain both resolution and color information. The 16-shot mode seems overkill. They should have a 4-shot, shift-by-1/2-pixel mode. I’m not sure if Canon’s shifting by 1/3rd 9 times would have much advantage over shifting by 1/2 4 times (it would be interesting to compare). Pixel shifting quickly reaches diminishing returns as it shifts by finer amounts while the size of the pixel stays the same.
