How many bytes is “normal” for a web font: a study using Google fonts

January 23rd, 2024. Tagged: font-face, performance

For images I think we (web developers) have a sense of how many bytes we can expect an image we see on a page to be. A JPEG photo? 100-ish K is ok for a decent quality. Less is nice. How about 200K? Hmmm..., ok. Half a meg? This must be a Hero of some sort. 2 megs? That better be a downloadable hi-res photo of Neptune or something.

But file sizes of web fonts? I personally don't have a gut feeling for how much is too much and how much is to be expected. So here's my attempt to find out.

Data set

Turns out one can download all Google Fonts from GitHub. Under a gigabyte of stuff, lots of fonts. For my purposes I decided to only look into regular fonts (no bold, italics), which is still plenty. I took only the TTF files that have "Regular" in the name and that's 1128 files.

  find /gfonts -type f -iname "*regular*" -print0 | xargs -0 cp -t ../regulars

Tools I used

Glyphhanger is a nice and easy Node.js library and CLI that uses Python's fonttools and makes it trivial to subset fonts, while also converting them to WOFF2, which is the format that will end up on the web.

Fontkit is also a Node.js library. It can inspect a font file and tell you some metadata such as the number of characters and the number of glyphs (those two are not synonymous, it turns out). And there's also a nice crisp web UI on top of fontkit for all your font introspection needs.

US ASCII subset

Because I was sure some of these fonts would be wild (big sizes, tons of glyphs), I thought I'd level the playing field by subsetting each font to only the 95 printable characters of basic English, so no umlauts and so on. This is the Unicode range U+0020-007E, also conveniently called US_ASCII in Glyphhanger.
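To see exactly which characters that range covers, here's a small sketch in plain Node.js. It uses no libraries, and the variable names are my own, nothing Glyphhanger-specific:

```javascript
// Build the US_ASCII subset from its Unicode range, U+0020 to U+007E.
const FIRST = 0x20; // space
const LAST = 0x7e;  // tilde (~)

const chars = [];
for (let codePoint = FIRST; codePoint <= LAST; codePoint++) {
  chars.push(String.fromCodePoint(codePoint));
}

const asciiSubset = chars.join('');
console.log(asciiSubset.length); // 95
console.log(asciiSubset);
```

The second line of output starts with a space and ends with a tilde, with digits, punctuation and both letter cases in between.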

Converting all fonts is a one-liner:

$ glyphhanger --subset="*.ttf" --US_ASCII --formats=woff2

Randomly inspecting some fonts, I saw that some have just a handful of characters, not the expected 95. The reason is that some fonts, say Japanese-only ones, have very few characters in the US_ASCII Unicode range. So I decided to keep only the fonts that have all 95 characters.

The complete script is available, but the salient parts are just looping all files, reading the content and passing each one to fontkit for introspection:

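const fs = require('fs');
const path = require('path');
const fontkit = require('fontkit');

// directory with the subset files (an assumption; adjust to your setup)
const fontDirectory = '.';

// all files
fs.readdir(fontDirectory, (err, files) => {
  if (err) throw err;
  files.filter((file) => file.endsWith('.woff2')).forEach((file) => {
    const fontPath = path.join(fontDirectory, file);
    fs.readFile(fontPath, (err, fontBuffer) => {
      if (err) throw err;
      const font = fontkit.create(fontBuffer);

      // and now some handy properties are available:
      font.familyName;
      font.numGlyphs;
      font.characterSet;
    });
  });
});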

font.characterSet.length lets us work only with the fonts that have 95 characters and discard the rest. This results in a total of 1074 files for us to draw general conclusions from. And here are the results...
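That filter can be sketched roughly like this. The `hasFullAscii` helper and the sample objects are made up for illustration; in the real script the objects would come from `fontkit.create()`, where `characterSet` is an array of code points.

```javascript
const US_ASCII_COUNT = 0x7e - 0x20 + 1; // 95 characters

// Hypothetical helper: true when the subset kept every US_ASCII character
function hasFullAscii(font) {
  return font.characterSet.length === US_ASCII_COUNT;
}

// Stand-ins for fontkit font objects (made-up data, not real fonts)
const fonts = [
  {
    familyName: 'Complete Font',
    characterSet: Array.from({ length: 95 }, (_, i) => 0x20 + i),
  },
  { familyName: 'Sparse Font', characterSet: [0x20, 0x41, 0x42] },
];

const kept = fonts.filter(hasFullAscii);
console.log(kept.map((font) => font.familyName)); // [ 'Complete Font' ]
```

Checking the length alone is enough here only because the files were already subset to US_ASCII, so a font cannot have more than 95 characters left.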

Results

  • Average File Size: 19751.88 bytes
  • Median File Size: 12380 bytes
  • Average Glyph Count: 144.92
  • Median Glyph Count: 107
  • Number of font files: 1074

As you can see there are usually a few more glyphs than there are characters.
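For reference, the average and median can be computed along these lines. This is a minimal sketch, not the script I ran: the helper names are my own, and the sample is just five byte sizes from the stats table further down.

```javascript
// Arithmetic mean of an array of numbers
function mean(values) {
  return values.reduce((sum, value) => sum + value, 0) / values.length;
}

// Median: the middle value of the sorted list,
// or the mean of the two middle values when the count is even
function median(values) {
  const sorted = [...values].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 1
    ? sorted[mid]
    : (sorted[mid - 1] + sorted[mid]) / 2;
}

// Gudea, Grey Qo, Griffy, Gruppo, Gupter
const sizes = [4248, 16088, 47676, 14660, 13760];
console.log(mean(sizes));   // 19286.4
console.log(median(sizes)); // 14660
```

Even in this tiny sample a single heavy file (Griffy) pulls the average well above the median, which is why I lean on the median for the conclusions.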

And so, a conclusion: the median font file with English-only subset of characters should be around 12K. If you look at your network requests and your font is much larger, well there's work for you to do.

Stats

The full stats are available here in CSV format but here's a taste...

Num chars  Num glyphs  Bytes  File                                 Font name
...        ...         ...    ...                                  ...
95         175         40260  GreatVibes-Regular-subset.woff2      Great Vibes
95         96          4248   Gudea-Regular-subset.woff2           Gudea
95         116         16088  GreyQo-Regular-subset.woff2          Grey Qo
95         96          47676  Griffy-Regular-subset.woff2          Griffy
95         123         14660  Gruppo-Regular-subset.woff2          Gruppo
95         107         13760  Gupter-Regular-subset.woff2          Gupter
95         156         17964  Gulzar-Regular-subset.woff2          Gulzar
95         116         24364  Gwendolyn-Regular-subset.woff2       Gwendolyn
95         213         14468  HachiMaruPop-Regular-subset.woff2    Hachi Maru Pop
95         98          10452  Halant-Regular-subset.woff2          Halant
95         98          6648   Habibi-Regular-subset.woff2          Habibi
95         96          10736  HammersmithOne-Regular-subset.woff2  Hammersmith One
95         96          10696  Handlee-Regular-subset.woff2         Handlee
95         107         34260  Hanalei-Regular-subset.woff2         Hanalei
95         107         16448  HanaleiFill-Regular-subset.woff2     Hanalei Fill
95         96          8356   Gurajada-Regular-subset.woff2        Gurajada
95         96          14912  HeadlandOne-Regular-subset.woff2     HeadlandOne
...        ...         ...    ...                                  ...
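If you want to produce a CSV like this yourself, one row per font is all it takes. A rough sketch follows; `csvField` and `toCsvRow` are hypothetical helpers of mine, not part of fontkit, and the property names are simply ones I picked to match the columns above.

```javascript
// Quote a field only when it contains a comma, a quote, or a newline
function csvField(value) {
  const text = String(value);
  return /[",\n]/.test(text) ? `"${text.replace(/"/g, '""')}"` : text;
}

// Hypothetical row builder; the column order follows the table above
function toCsvRow({ numChars, numGlyphs, bytes, file, familyName }) {
  return [numChars, numGlyphs, bytes, file, familyName].map(csvField).join(',');
}

const row = toCsvRow({
  numChars: 95,
  numGlyphs: 96,
  bytes: 4248,
  file: 'Gudea-Regular-subset.woff2',
  familyName: 'Gudea',
});
console.log(row); // 95,96,4248,Gudea-Regular-subset.woff2,Gudea
```

The quoting is there because font family names are free-form text and nothing stops one from containing a comma.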

Outliers

What about some font files far away from the median?

Some small files (2K) are hardly useable:

Others (also 2K) are perfectly fine, though simple:

And even 3k can "buy" you a fine font that makes your visitors say, hey this website is not like the others:

On the larger side (250K) we have


(what happened to the capital F?)

and

I suspect more hole-y fonts are more complicated to draw and therefore weigh more, compared to simple strokes, like an old-timey digital watch.

LATIN

Alright, 95 characters is fine and all, but you're one Voilà! away from embarrassment, because your font doesn't have an à. So how about a more character-complete LATIN subset? Glyphhanger's LATIN is a more involved set of Unicode ranges:

U+0000-00FF
U+0131
U+0152-0153
U+02BB-02BC
U+02C6
U+02DA
U+02DC
U+2000-206F
U+2074
U+20AC
U+2122
U+2191
U+2193
U+2212
U+2215
U+FEFF
U+FFFD

I'm not going to pretend I understand why this is the range, but I can tell you these are 385 characters in total, I checked.

let count = 13; // single chars: U+0131, U+02C6, etc

for (let codePoint = 0x0000; codePoint <= 0x00FF; codePoint++) {
  count++;
}
for (let codePoint = 0x0152; codePoint <= 0x0153; codePoint++) {
  count++;
}
for (let codePoint = 0x02BB; codePoint <= 0x02BC; codePoint++) {
  count++;
}
for (let codePoint = 0x2000; codePoint <= 0x206F; codePoint++) {
  count++;
}
console.log(count); // 385
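The same count can be done more generically by parsing the range list itself. This is a sketch under a simplifying assumption: it handles only the `U+XXXX` and `U+XXXX-YYYY` forms used above, not wildcard ranges such as `U+4??`. The `countCodePoints` name is my own.

```javascript
// Count the code points covered by a comma-separated list of Unicode ranges,
// e.g. "U+0020-007E" or "U+0131,U+0152-0153"
function countCodePoints(unicodeRange) {
  return unicodeRange
    .split(',')
    .map((part) => part.trim().replace(/^U\+/i, ''))
    .reduce((total, part) => {
      // a single code point is a range that starts and ends at the same place
      const [start, end = start] = part.split('-');
      return total + (parseInt(end, 16) - parseInt(start, 16) + 1);
    }, 0);
}

const LATIN =
  'U+0000-00FF,U+0131,U+0152-0153,U+02BB-02BC,U+02C6,U+02DA,U+02DC,' +
  'U+2000-206F,U+2074,U+20AC,U+2122,U+2191,U+2193,U+2212,U+2215,U+FEFF,U+FFFD';

console.log(countCodePoints('U+0020-007E')); // 95
console.log(countCodePoints(LATIN));         // 385
```

The breakdown is 256 + 2 + 2 + 112 code points from the four ranges, plus the 13 single code points.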

Subsetting to LATIN is just as easy as US_ASCII:

$ glyphhanger --subset="*.ttf" --LATIN --formats=woff2

With US_ASCII we had 95 characters in most fonts and removed the ones with fewer characters to keep it all equal. Here, rarely, if ever, is there a font that has all 385 characters. Most have a little over 200. So I somewhat arbitrarily picked 200 as the cutoff: a font with fewer characters is not considered for the comparison. We still have over 1000 font files to compare, but there's a little caveat: not all fonts support the same characters. (I did keep the number of characters in the stats, see below.)

Results

  • Average File Size: 29045.30 bytes
  • Median File Size: 19092 bytes
  • Average Glyph Count: 287.03
  • Median Glyph Count: 236
  • (Update Oct 28, 2024) Median Character Count: 219
  • Number of font files: 1009

Conclusion: the median font file with Latin-extended subset of characters should be a little under 20K. If you look at your network requests and your font is much larger, well there's work for you to do.
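To turn "much larger" into something you can run, here's one possible check. It's only a sketch: the `overBudget` helper is hypothetical, and the factor of 2 is an arbitrary threshold I picked for illustration, not a recommendation that falls out of the data.

```javascript
// Hypothetical helper: list fonts whose size exceeds `factor` times the median.
// The default factor of 2 is an arbitrary choice for illustration.
function overBudget(fonts, medianBytes, factor = 2) {
  return fonts.filter((font) => font.bytes > medianBytes * factor);
}

const LATIN_MEDIAN_BYTES = 19092; // median from the results above

// Sizes taken from three rows of the LATIN stats table
const fonts = [
  { file: 'Arya-Regular-subset.woff2', bytes: 15884 },
  { file: 'AreYouSerious-Regular-subset.woff2', bytes: 40712 },
  { file: 'YujiMai-Regular-subset.woff2', bytes: 107016 },
];

const flagged = overBudget(fonts, LATIN_MEDIAN_BYTES);
console.log(flagged.map((font) => font.file));
```

With these numbers the cutoff is 38184 bytes, so Are You Serious and Yuji Mai get flagged and Arya passes.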

Stats

The full stats are available here in CSV format but here's a taste...

Num chars  Num glyphs  Bytes   File                                       Font name
262        315         15884   Arya-Regular-subset.woff2                  Arya
224        260         32052   Arizonia-Regular-subset.woff2              Arizonia
224        247         40712   AreYouSerious-Regular-subset.woff2         Are You Serious
235        236         17488   Armata-Regular-subset.woff2                Armata
209        210         16920   Arvo-Regular-subset.woff2                  Arvo
228        233         23044   Asar-Regular-subset.woff2                  Asar
216        217         24424   Artifika-Regular-subset.woff2              Artifika
231        350         23464   Arsenal-Regular-subset.woff2               Arsenal
231        348         21244   AsapCondensed-Regular-subset.woff2         Asap Condensed
230        261         20792   Athiti-Regular-subset.woff2                Athiti
...        ...         ...     ...                                        ...
221        340         12504   ZenKakuGothicAntique-Regular-subset.woff2  Zen Kaku Gothic Antique
216        229         15872   ZenLoop-Regular-subset.woff2               Zen Loop
227        921         107016  YujiMai-Regular-subset.woff2               Yuji Mai
221        340         12516   ZenKakuGothicNew-Regular-subset.woff2      Zen Kaku Gothic New
226        350         15928   ZenKurenaido-Regular-subset.woff2          Zen Kurenaido
226        348         15564   ZenMaruGothic-Regular-subset.woff2         Zen Maru Gothic
221        341         34128   ViaodaLibre-Regular-subset.woff2           Viaoda Libre
226        350         19696   ZenOldMincho-Regular-subset.woff2          Zen Old Mincho
225        590         43104   ZillaSlab-Regular-subset.woff2             Zilla Slab
227        921         94288   YujiSyuku-Regular-subset.woff2             Yuji Syuku
216        317         32700   ZenTokyoZoo-Regular-subset.woff2           Zen Tokyo Zoo
229        595         43912   ZillaSlabHighlight-Regular-subset.woff2    Zilla Slab Highlight

Next time...

So here it is, folks, a web font file that supports extended Latin characters, your Às and your Ás and Â, Ã, Ä, Å... should weigh around 20K. Anything a little over (or a lot over) 20K is up to you to decide. Is the font worth it, can it be subset, etc, etc.

That's, of course, just, like, my opinion. Curious to see other folks' thoughts and/or further experimentation.

As a follow-up I want to see how much subsetting really helps. Stay tuned.
