WebP vs AVIF at equal SSIM: a 100-photo compression benchmark

July 2, 2026 · 9 min read · by Maya Chen

I ran 100 real product photos through Atlas’s image engine four ways. At equal measured quality, SSIM-targeted WebP came out 56% smaller than a flat quality-85 export and AVIF 76% smaller. The numbers, the caveats, and the one change that backfired.

The engine already shipped those modes. The export path just was not using them.

Why I ran this

Atlas optimizes images. That is the pitch, and it sits right on the pricing page: "SSIM-based compression." So the fair question to ask of the product is simple. When a real user exports a real library, how good is the compression, actually? Not on one hand-picked sample. On a batch, with a quality metric, reproducibly.

I had a realistic test set on hand: 100 studio product photos from an auto-accessories e-commerce catalog. Jump starters, floor mats, dash cams, detailing kits. Consistent lighting, high-resolution source files averaging 4.6 MB each (100 files, 461 MB total). That is the "a photographer just uploaded a full shoot" scenario, so I used it as the benchmark corpus.

What is SSIM, and why does a compressor need it?

You cannot talk about image compression honestly without a quality metric, because you can always make a file smaller by making it worse. The metric most teams reach for is SSIM (Structural Similarity), introduced by Wang, Bovik, Sheikh, and Simoncelli in 2004. Instead of measuring per-pixel error like PSNR, SSIM compares local luminance, contrast, and structure, which tracks human judgment more closely. It returns a score that in practice falls between 0 and 1, where 1 means identical. A multi-scale variant, MS-SSIM (Wang, Simoncelli, and Bovik, 2003), evaluates that structure across several resolutions and tracks human judgment even more closely. Atlas's engine implements both. Sources: Wang et al. 2004, "Image Quality Assessment: From Error Visibility to Structural Similarity," and Wang, Simoncelli, and Bovik 2003, "Multiscale Structural Similarity for Image Quality Assessment."
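
To make the definition concrete, here is a minimal TypeScript sketch of the SSIM formula for one window of grayscale pixels. It is an illustration, not Atlas's implementation: the name ssimWindow is made up for this example, it uses plain unweighted statistics over a single patch, and a full SSIM score averages this value over many sliding windows. The constants 0.01 and 0.03 are the ones given in the 2004 paper.

```typescript
// Single-window SSIM for two equally sized grayscale patches.
// dynamicRange is 255 for 8-bit pixels. K1 = 0.01 and K2 = 0.03 follow
// Wang et al. (2004). Simplification: unweighted population statistics.
function ssimWindow(x: number[], y: number[], dynamicRange = 255): number {
  if (x.length !== y.length || x.length === 0) {
    throw new Error("patches must be non-empty and the same size");
  }
  const n = x.length;
  const c1 = (0.01 * dynamicRange) ** 2; // stabilizes the luminance term
  const c2 = (0.03 * dynamicRange) ** 2; // stabilizes the contrast/structure term

  const meanX = x.reduce((sum, v) => sum + v, 0) / n;
  const meanY = y.reduce((sum, v) => sum + v, 0) / n;

  let varX = 0;
  let varY = 0;
  let covXY = 0;
  for (let i = 0; i < n; i++) {
    const dx = x[i] - meanX;
    const dy = y[i] - meanY;
    varX += dx * dx;
    varY += dy * dy;
    covXY += dx * dy;
  }
  varX /= n;
  varY /= n;
  covXY /= n;

  const numerator = (2 * meanX * meanY + c1) * (2 * covXY + c2);
  const denominator = (meanX * meanX + meanY * meanY + c1) * (varX + varY + c2);
  return numerator / denominator;
}
```

Identical patches score exactly 1. Patches whose structure is inverted can score below 0, which is why "0 to 1" is a practical range for photos rather than a mathematical bound.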

Here is why it matters for a compressor. If you fix the quality target and let the encoder find the smallest file that clears it, you get small files and honest quality at once. Every image is different, so a flat quality number overpays on easy images and underdelivers on hard ones. A search that says "hit SSIM 0.995, then stop" adapts per image. That is the mode I wanted to measure.

What the engine ships vs what the export was using

Atlas's image engine exposes an optimizeImage() call with several modes:

level: "custom" with a fixed quality. One encode at that number, fast, no adaptation.
level: "auto". A binary search on quality that finds the lowest setting still clearing an SSIM target (0.995 by default, which is visually lossless).
level: "perceptual". The same search, judged with 5-scale MS-SSIM instead of single-scale SSIM.
AVIF output, fully supported, and the engine's own format recommender even suggests it.
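
The "auto" mode's search can be sketched in a few lines. This is a generic binary search under stated assumptions, not the engine's actual code: searchQuality and the encode callback are hypothetical names, the quality range of 1 to 100 is assumed, and the search only works if SSIM does not decrease as quality goes up.

```typescript
interface Trial {
  quality: number;
  bytes: number;
  ssim: number;
}

// Hypothetical callback: encode at a given quality, report size and SSIM.
type EncodeFn = (quality: number) => { bytes: number; ssim: number };

// Find the lowest quality whose SSIM still clears the target.
// Assumes SSIM is non-decreasing in quality.
function searchQuality(
  encode: EncodeFn,
  ssimTarget = 0.995,
  minQuality = 1,
  maxQuality = 100
): Trial {
  // Fallback if nothing clears the target: the maximum-quality encode.
  let best: Trial = { quality: maxQuality, ...encode(maxQuality) };
  let lo = minQuality;
  let hi = maxQuality;
  while (lo <= hi) {
    const mid = Math.floor((lo + hi) / 2);
    const trial = encode(mid);
    if (trial.ssim >= ssimTarget) {
      best = { quality: mid, ...trial }; // good enough, try lower
      hi = mid - 1;
    } else {
      lo = mid + 1; // too lossy, go higher
    }
  }
  return best;
}
```

Each step is a full trial encode plus an SSIM computation, so a search over 1 to 100 costs roughly seven or eight encodes per image instead of one.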

Then I read how Atlas actually called it. Every bulk and cached path passed level: "custom" with a flat quality: 85 for the ZIP export, and 72 for the in-app preview and grid thumbnails. The SSIM search was wired to exactly one endpoint, the single-image on-demand optimize. MS-SSIM and AVIF never made it into the export or derivative paths at all.

So the real question was not "does Atlas compress well." It was "how much is Atlas leaving on the table by not using the modes it already ships?"

The experiment

Same 100 images. Same output dimensions (resized to 1600px on the long edge, which is what the preview and export paths target, so only the strategy varies). Four encodes per image against the exact installed engine version (v0.3.1), measuring output bytes, SSIM, the quality the search chose, and encode time.

optimizeImage(file, { format: 'webp', level: 'auto', ssimTarget: 0.995, maxWidth: 1600 })

The four strategies:

Export default today: WebP, flat quality 85.
SSIM-auto: WebP, level:"auto", SSIM target 0.995.
AVIF SSIM-auto: AVIF, level:"auto", SSIM target 0.995.
MS-SSIM perceptual: WebP, level:"perceptual".
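
Written out as option objects, the four strategies look roughly like this. The shape follows the optimizeImage() call quoted above, but the 'avif' format value and the interface itself are assumptions, and the optimize function is injected as a parameter so the sketch does not depend on the real engine. It is synchronous only to keep the example short.

```typescript
// Option shape inferred from the quoted optimizeImage() call.
// The 'avif' format value is an assumption.
interface OptimizeOptions {
  format: "webp" | "avif";
  level: "custom" | "auto" | "perceptual";
  quality?: number;
  ssimTarget?: number;
  maxWidth: number;
}

interface EncodeResult {
  bytes: number;
  ssim: number;
  quality: number;
}

interface BenchRecord extends EncodeResult {
  file: string;
  strategy: string;
  encodeMs: number;
}

const strategies: { name: string; opts: OptimizeOptions }[] = [
  { name: "WebP q85", opts: { format: "webp", level: "custom", quality: 85, maxWidth: 1600 } },
  { name: "WebP SSIM-auto", opts: { format: "webp", level: "auto", ssimTarget: 0.995, maxWidth: 1600 } },
  { name: "AVIF SSIM-auto", opts: { format: "avif", level: "auto", ssimTarget: 0.995, maxWidth: 1600 } },
  { name: "WebP MS-SSIM perceptual", opts: { format: "webp", level: "perceptual", maxWidth: 1600 } },
];

// Run every strategy on every file and record size, SSIM, chosen quality,
// and wall-clock time. `optimize` is a hypothetical stand-in for the engine.
function runBenchmark(
  files: string[],
  optimize: (file: string, opts: OptimizeOptions) => EncodeResult
): BenchRecord[] {
  const records: BenchRecord[] = [];
  for (const file of files) {
    for (const { name, opts } of strategies) {
      const start = Date.now();
      const result = optimize(file, opts);
      records.push({ file, strategy: name, encodeMs: Date.now() - start, ...result });
    }
  }
  return records;
}
```

Holding maxWidth at 1600 in every entry is what keeps the comparison fair: only format and level change between rows.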

Results

Strategy	Mean size/image	Total (100 images)	Mean SSIM	Encode time/image
WebP q85 (Atlas export today)	94.5 KB	9.68 MB	0.9964	1.3 s
WebP SSIM-auto	41.5 KB	4.25 MB	0.9951	6.5 s
AVIF SSIM-auto	22.3 KB	2.29 MB	0.9952	7.8 s
WebP MS-SSIM perceptual	19.3 KB	1.98 MB	0.9922	7.3 s

Read the SSIM column first, because it is the honesty check. SSIM-auto and AVIF-auto land at 0.9951 and 0.9952, essentially the same perceived quality as the flat q85's 0.9964. The gap between 0.9964 and 0.9951 is far below the threshold where a person notices a difference on a screen. At that equal quality:

SSIM-auto WebP is 56% smaller than the flat q85 export.
AVIF SSIM-auto is 76% smaller than the flat q85 export.

Not one of the 100 images came out larger than its source in any mode. The perceptual mode went further (80% smaller) but dropped to SSIM 0.9922 and pushed the WebP quality to its floor, so I treat that as an aggressive option rather than a free win.

For a library of 10,000 product images, the WebP change alone turns roughly 945 MB of served derivatives into 415 MB. The AVIF change turns it into 223 MB. That is bandwidth, storage, and Largest Contentful Paint. Image bytes are one of the biggest LCP levers in Google's Core Web Vitals guidance, and AVIF or WebP over JPEG is their standard recommendation (web.dev: serve modern image formats). Faster images move Core Web Vitals, and Core Web Vitals move conversion. That is why this benchmark belongs on a product blog and not in a drawer.
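
The percentages and the 10,000-image projection are plain arithmetic on the mean sizes in the results table, and a few lines make them easy to re-check. The function names are made up for this sketch, and it treats 1 MB as 1,000 KB, which is what the projection above does.

```typescript
// Mean output size per image in KB, copied from the results table.
const meanKB = {
  flatQ85: 94.5,
  webpAuto: 41.5,
  avifAuto: 22.3,
  perceptual: 19.3,
};

// Percent reduction of `after` relative to `before`, rounded to a whole percent.
function percentSmaller(before: number, after: number): number {
  return Math.round((1 - after / before) * 100);
}

// Projected size of a library in MB, treating 1 MB as 1,000 KB.
function libraryMB(meanKBPerImage: number, imageCount: number): number {
  return (meanKBPerImage * imageCount) / 1000;
}

const webpSaving = percentSmaller(meanKB.flatQ85, meanKB.webpAuto);
const avifSaving = percentSmaller(meanKB.flatQ85, meanKB.avifAuto);
const perceptualSaving = percentSmaller(meanKB.flatQ85, meanKB.perceptual);
const flatLibraryMB = libraryMB(meanKB.flatQ85, 10000);
```

One thing the arithmetic surfaces: the table's totals are not mean × 100 in the same units (94.5 KB × 100 is 9.45 MB, not 9.68 MB). The figures are consistent with the means being in units of 1,024 bytes and the totals in units of 1,000,000 bytes, though that is an inference from the numbers. The percentages are unaffected either way because they are ratios.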

The honest caveats

A benchmark without caveats is marketing. Three things temper the numbers above:

Encode time. The SSIM search is about 5x slower than a single flat encode (6.5s vs 1.3s per image), and AVIF is slower still, because each search runs several trial encodes plus an SSIM computation. For a cached derivative, encoded once and served thousands of times, that cost amortizes to nothing, which is exactly why I targeted the pre-generated paths. For a live, uncached bulk export the user is waiting on, it is a real latency trade.
Content dependence. These are clean studio shots on white backgrounds, which compress unusually well. The SSIM search settled around WebP q41 on them. Busy, textured, noisy real-world photos will not compress nearly as hard, so expect smaller-but-still-meaningful savings on a mixed library.
The target is a choice. "56% smaller" is measured at SSIM 0.995. Choose a stricter target and you save less. Choose a looser one and you save more but risk visible loss. The number is a function of the quality bar, and I picked the visually-lossless bar on purpose.
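
The encode-time caveat is easier to weigh with the amortization written down. This sketch uses the measured per-image times and mean sizes from the table. The serve count of 1,000 is a hypothetical figure for illustration, and both function names are made up.

```typescript
// Extra encode seconds the SSIM search costs per image, spread across
// every time the cached file is served.
function amortizedExtraSeconds(searchSec: number, flatSec: number, serves: number): number {
  if (serves <= 0) {
    throw new Error("serves must be positive");
  }
  return (searchSec - flatSec) / serves;
}

// Total KB saved on the wire across all serves of one image.
function totalSavedKB(flatKB: number, searchKB: number, serves: number): number {
  return (flatKB - searchKB) * serves;
}

const serves = 1000; // hypothetical serve count, not a measured figure
const extraSecondsPerServe = amortizedExtraSeconds(6.5, 1.3, serves);
const savedKBPerImage = totalSavedKB(94.5, 41.5, serves);
```

With serves set to 1 the same functions describe the live, uncached export: the full extra 5.2 seconds lands on a user who is waiting, in exchange for one download's worth of savings.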

What I shipped, and what I deliberately did not

Findings are cheap. So I changed the code and re-measured the exact paths I changed.

Atlas pre-generates two derivatives in the background at upload time: a 1600px preview (opened on every asset view) and a 640px grid thumbnail (rendered on every library page). Both were flat WebP q72. Both are encode-once, serve-many, so the search's extra time is free in the way that matters. I flipped the preview to SSIM-auto and benchmarked the delta:

Path	Before (flat q72)	After (SSIM-auto)	Result
1600px preview	40.3 KB, SSIM 0.9955	33.5 KB, SSIM 0.9951	17% smaller
640px thumbnail	11.8 KB, SSIM 0.9944	14.4 KB, SSIM 0.9951	22% larger

The preview change is a clean 17% cut at equal quality, and it shipped. The thumbnail result is the interesting one. Switching it to SSIM-auto made the files bigger. Why? At 640px, flat q72 was already producing SSIM 0.9944, below the 0.995 target. So the search, asked to reach 0.995, correctly picked a higher quality and produced larger, slightly-better-looking files. The SSIM-auto win only shows up where the old flat quality was sitting above the target and paying for quality nobody could see. At the small thumbnail size, q72 was already under the line.

So I kept the thumbnail on flat q72 and left a comment explaining exactly why. That is the whole argument for measuring instead of assuming: the same one-line change was a 17% win on the 1600px preview and a 22% regression on the 640px thumbnail, 960 pixels narrower. The change adds a smart flag to the derivative spec, routes the preview through the SSIM search, and bumps the cache version so existing previews regenerate at the smaller size. Exports that pick an explicit quality stay untouched.
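
The measure-then-decide step can be written as a small rule over the before and after numbers from the table. This is a sketch of the reasoning, not the shipped code: only the smart flag is mentioned above, and the rest of the DerivativeSpec shape, along with the decide function, is hypothetical.

```typescript
interface Measurement {
  meanKB: number;
  meanSsim: number;
}

interface PathResult {
  path: string;
  before: Measurement; // flat q72
  after: Measurement; // SSIM-auto at target 0.995
}

// `smart` mirrors the flag described in the post; the other fields are hypothetical.
interface DerivativeSpec {
  path: string;
  smart: boolean;
  changePercent: number; // negative means smaller files
}

// Route a path through the SSIM search only if it measurably shrinks the files.
function decide(result: PathResult): DerivativeSpec {
  const delta = result.after.meanKB - result.before.meanKB;
  const changePercent = Math.round((delta / result.before.meanKB) * 100);
  return { path: result.path, smart: changePercent < 0, changePercent };
}

const measured: PathResult[] = [
  {
    path: "1600px preview",
    before: { meanKB: 40.3, meanSsim: 0.9955 },
    after: { meanKB: 33.5, meanSsim: 0.9951 },
  },
  {
    path: "640px thumbnail",
    before: { meanKB: 11.8, meanSsim: 0.9944 },
    after: { meanKB: 14.4, meanSsim: 0.9951 },
  },
];

const specs = measured.map(decide);
```

The rule deliberately favors bytes over hitting the target: it leaves the thumbnail at SSIM 0.9944, slightly under 0.995, because that is the trade made here. A team that cares more about the quality floor than the size could flip the comparison and accept the larger file.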

What this means if you optimize images anywhere

The practical takeaways generalize past one codebase:

A flat quality number leaves money on the table. If you encode everything at "quality 85," you overpay on the images that could go lower and you have no floor on the ones that should go higher. Target a quality metric and let the encoder find the size.
AVIF is the single biggest lever when your pipeline can afford the encode time and your audience's browsers support it (all current major browsers do). 76% smaller than flat-quality WebP at the same measured quality is not a rounding error.
Amortize the expensive search onto cached artifacts. The SSIM search is too slow to run inline on every request, and ideal for anything you encode once and serve forever.
Measure the specific thing you are about to change. The thumbnail regression was invisible in theory and obvious in the data.

If you run one thing this week, point your cached derivative path at an SSIM target of 0.995 and re-measure the file size before and after on a real batch. If that path currently encodes at a flat quality 85, that single change is where the 56% lives.

Reproduce it

The corpus was 100 studio product PNGs (avg 4.6 MB). Each was encoded with Atlas's image engine (v0.3.1) at maxWidth: 1600, preserveAspect: true, varying only format and level. I recorded optimized size, SSIM, chosen quality, and wall-clock encode time per image, then aggregated across the set. The SSIM target for auto is 0.995; for perceptual it is a 5-scale MS-SSIM target of 0.985. Run it on your own library and the ranking holds even if the exact percentages move.
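
The aggregation step is simple enough to sketch. Given one record per image per strategy, this groups by strategy and computes the columns of the results table. The record shape and the summarize name are assumptions for illustration. Sizes are kept in bytes so that any unit conversion happens once, at display time.

```typescript
interface ImageRecord {
  strategy: string;
  bytes: number;
  ssim: number;
  quality: number;
  encodeMs: number;
}

interface Summary {
  strategy: string;
  count: number;
  meanBytes: number;
  totalBytes: number;
  meanSsim: number;
  meanQuality: number;
  meanEncodeMs: number;
}

// Group per-image records by strategy and average each measured column.
function summarize(records: ImageRecord[]): Summary[] {
  const groups: { [strategy: string]: ImageRecord[] } = {};
  for (const record of records) {
    (groups[record.strategy] = groups[record.strategy] || []).push(record);
  }
  return Object.keys(groups).map((strategy) => {
    const rows = groups[strategy];
    const sum = (pick: (r: ImageRecord) => number): number =>
      rows.reduce((total, r) => total + pick(r), 0);
    const totalBytes = sum((r) => r.bytes);
    return {
      strategy,
      count: rows.length,
      meanBytes: totalBytes / rows.length,
      totalBytes,
      meanSsim: sum((r) => r.ssim) / rows.length,
      meanQuality: sum((r) => r.quality) / rows.length,
      meanEncodeMs: sum((r) => r.encodeMs) / rows.length,
    };
  });
}
```

Means hide the spread, so on a mixed library it is worth also looking at the minimum SSIM per strategy before trusting the average.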

FAQ

Does compressing images this way make them look worse? No, at the target used here. Both SSIM-targeted modes held mean SSIM at 0.995 or above, which is visually lossless on a screen. Only the more aggressive perceptual mode dipped lower, to 0.9922. SSIM is a perceptual quality metric, so holding it constant means holding perceived quality roughly constant while the file size drops.

How much smaller are the files, really? On this 100-image benchmark, at equal measured quality: SSIM-targeted WebP was 56% smaller than flat WebP quality 85, and SSIM-targeted AVIF was 76% smaller. On a mixed real-world library expect less, because clean studio shots compress unusually well.

Is AVIF or WebP better for product photos? AVIF produced the smallest files at equal quality in this benchmark (about 76% smaller than flat WebP q85), at the cost of slower encoding. WebP is the safer default for maximum compatibility and faster encodes. AVIF wins on bytes when you can cache the result.

Why not just always use the most aggressive compression? Because "most aggressive" (the perceptual mode at its floor) dropped measured quality to SSIM 0.9922 and can introduce visible loss on some images. The right setting is a quality target, not a maximum-compression switch.

Why was one change a win and a nearly identical change a regression? The SSIM search only saves bytes when the previous flat quality was above the SSIM target and paying for invisible quality. At the 1600px preview, flat q72 was above target, so the search went lower and saved 17%. At the 640px thumbnail, flat q72 was already below target, so the search went higher and added 22%.

What quality metric should I target for web images? SSIM around 0.99 to 0.995 is a good visually-lossless range for most web imagery. Go higher for archival or print-adjacent needs, lower only when you have measured that the loss is acceptable for your content.
