For images I think we (web developers) have a sense of how many bytes we can expect an image we see on a page to be. A JPEG photo? 100-ish K is ok for a decent quality. Less is nice. How about 200K? Hmmm..., ok. Half a meg? This must be a Hero of some sort. 2 megs? That better be a downloadable hi-res photo of Neptune or something.
But file sizes of web fonts? I personally don't have a gut feeling how much is too much and how much is to be expected. So here's my attempt to find out.
Data set
Turns out one can download all Google Fonts from GitHub. Under a gigabyte of stuff, lots of fonts. For my purposes I decided to only look into regular fonts (no bold, italics), which is still plenty. I took only the TTF files that have "Regular" in the name and that's 1128 files.
find /gfonts -type f -iname "*regular*" -print0 | xargs -0 cp -t ../regulars
Tools I used
Glyphhanger is a nice and easy Node.js library and CLI that uses Python's fonttools and makes it trivial to subset fonts, while also converting to WOFF2, which is the format that will end up on the web.
Fontkit is also a Node.js library. It can inspect a font file and tell you some metadata such as the number of characters and the number of glyphs (those two are not synonymous, turns out). And there's also a nice crisp web UI on top of fontkit for all your font introspection needs.
US ASCII subset
Because I was sure some of these fonts may be wild (big sizes, tons of glyphs), I thought I'd level the playing field by subsetting each font to only the 95 characters in basic English, so no umlauts and so on. This is the Unicode range U+0020-007E, also conveniently called US_ASCII in Glyphhanger.
Converting all fonts is a one-liner:
$ glyphhanger --subset="*.ttf" --US_ASCII --formats=woff2
Randomly inspecting some fonts, I saw that some have just a handful of characters, not the expected 95. The reason is that some fonts, say Japanese-only ones, have very few characters in the US_ASCII Unicode range. So I thought I should keep only those that have all 95 characters.
The complete script is available, but the salient parts are just looping all files, reading the content and passing each one to fontkit for introspection:
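const fs = require('fs');
const path = require('path');
const fontkit = require('fontkit');
// assumed here: fontDirectory is the folder holding the subset fonts
// all files
fs.readdir(fontDirectory, (err, files) => {
  files.forEach((file) => {
    const fontPath = path.join(fontDirectory, file);
    fs.readFile(fontPath, (err, fontBuffer) => {
      const font = fontkit.create(fontBuffer);
      // and now some handy properties are available:
      font.familyName
      font.numGlyphs
      font.characterSet
    });
  });
});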
font.characterSet.length lets us work only with the fonts that have 95 characters and discard the rest. This results in a total of 1074 files for us to draw general conclusions from. And here are the results...
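The filtering step could look roughly like the sketch below. It is not the script mentioned above, just a minimal illustration: the function names `keepFontsWithAtLeast` and `loadFonts` are made up for this example, and the only fontkit features it relies on are `fontkit.create()` and the `characterSet` array already shown.

```javascript
const fs = require('fs');
const path = require('path');

// Keep only fonts whose character set covers at least `minChars` code points.
// `fonts` is any array of objects exposing a `characterSet` array, which is
// what fontkit font objects provide.
function keepFontsWithAtLeast(fonts, minChars) {
  return fonts.filter((font) => font.characterSet.length >= minChars);
}

// Hypothetical helper: open every .woff2 file in a directory with fontkit.
// fontkit is required lazily so the filter above has no dependencies.
function loadFonts(fontDirectory) {
  const fontkit = require('fontkit');
  return fs
    .readdirSync(fontDirectory)
    .filter((file) => file.endsWith('.woff2'))
    .map((file) => fontkit.create(fs.readFileSync(path.join(fontDirectory, file))));
}
```

Calling `keepFontsWithAtLeast(loadFonts(someDirectory), 95)` would then return the fully covered fonts. Since a US_ASCII subset should not contain more than 95 characters, "at least 95" and "exactly 95" amount to the same filter here.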
Results
- Average File Size: 19751.88 bytes
- Median File Size: 12380 bytes
- Average Glyph Count: 144.92
- Median Glyph Count: 107
- Number of font files: 1074
As you can see there are usually a few more glyphs than there are characters.
And so, a conclusion: the median font file with English-only subset of characters should be around 12K. If you look at your network requests and your font is much larger, well there's work for you to do.
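For anyone who wants to rerun the numbers, here is a small sketch of how the averages and medians can be computed. It assumes each font has already been turned into an object with `bytes` and `glyphs` fields; those field names and the `summarize` function are made up for this example.

```javascript
// Arithmetic mean of an array of numbers.
function mean(values) {
  return values.reduce((sum, value) => sum + value, 0) / values.length;
}

// Median: the middle value of the sorted list, or the average of the two
// middle values when the list has an even length.
function median(values) {
  const sorted = [...values].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 0
    ? (sorted[mid - 1] + sorted[mid]) / 2
    : sorted[mid];
}

// Summarize rows shaped like { bytes, glyphs } (hypothetical field names).
function summarize(rows) {
  const bytes = rows.map((row) => row.bytes);
  const glyphs = rows.map((row) => row.glyphs);
  return {
    averageFileSize: mean(bytes),
    medianFileSize: median(bytes),
    averageGlyphCount: mean(glyphs),
    medianGlyphCount: median(glyphs),
    numberOfFiles: rows.length,
  };
}
```

The median is the more useful of the two here, because a handful of very large fonts pulls the average up quite a bit.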
Stats
The full stats are available here in CSV format but here's a taste...
Num chars | Num glyphs | Bytes | File | Font name |
---|---|---|---|---|
... | ... | ... | ... | ... |
95 | 175 | 40260 | GreatVibes-Regular-subset.woff2 | Great Vibes |
95 | 96 | 4248 | Gudea-Regular-subset.woff2 | Gudea |
95 | 116 | 16088 | GreyQo-Regular-subset.woff2 | Grey Qo |
95 | 96 | 47676 | Griffy-Regular-subset.woff2 | Griffy |
95 | 123 | 14660 | Gruppo-Regular-subset.woff2 | Gruppo |
95 | 107 | 13760 | Gupter-Regular-subset.woff2 | Gupter |
95 | 156 | 17964 | Gulzar-Regular-subset.woff2 | Gulzar |
95 | 116 | 24364 | Gwendolyn-Regular-subset.woff2 | Gwendolyn |
95 | 213 | 14468 | HachiMaruPop-Regular-subset.woff2 | Hachi Maru Pop |
95 | 98 | 10452 | Halant-Regular-subset.woff2 | Halant |
95 | 98 | 6648 | Habibi-Regular-subset.woff2 | Habibi |
95 | 96 | 10736 | HammersmithOne-Regular-subset.woff2 | Hammersmith One |
95 | 96 | 10696 | Handlee-Regular-subset.woff2 | Handlee |
95 | 107 | 34260 | Hanalei-Regular-subset.woff2 | Hanalei |
95 | 107 | 16448 | HanaleiFill-Regular-subset.woff2 | Hanalei Fill |
95 | 96 | 8356 | Gurajada-Regular-subset.woff2 | Gurajada |
95 | 96 | 14912 | HeadlandOne-Regular-subset.woff2 | HeadlandOne |
... | ... | ... | ... | ... |
Outliers
What about the font files far from the median, on either end?
Some small files (2K) are hardly useable:
Others (also 2K) are perfectly fine, though simple:
And even 3k can "buy" you a fine font that makes your visitors say, hey this website is not like the others:
On the larger side (250K) we have
(what happened to the capital F?)
and
I suspect more hole-y fonts are more complicated to draw and therefore weigh more, compared to simple strokes, like an old-timey digital watch.
LATIN
Alright, 95 characters is fine and all, but you're one Voilà! away from embarrassment, because your font doesn't have an à. So how about a more character-complete LATIN subset? Glyphhanger's LATIN is a more involved set of Unicode ranges:
U+0000-00FF U+0131 U+0152-0153 U+02BB-02BC U+02C6 U+02DA U+02DC U+2000-206F U+2074 U+20AC U+2122 U+2191 U+2193 U+2212 U+2215 U+FEFF U+FFFD
I'm not going to pretend I understand why this is the range, but I can tell you these are 385 characters in total, I checked.
let count = 13; // single chars: U+0131, U+02C6, etc
for (let codePoint = 0x0000; codePoint <= 0x00FF; codePoint++) {
  count++;
}
for (let codePoint = 0x0152; codePoint <= 0x0153; codePoint++) {
  count++;
}
for (let codePoint = 0x02BB; codePoint <= 0x02BC; codePoint++) {
  count++;
}
for (let codePoint = 0x2000; codePoint <= 0x206F; codePoint++) {
  count++;
}
console.log(count); // 385
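The same check can be done without hand-counting the single code points, by parsing the range string itself. This is only a sketch: `countCodePoints` is a name made up for this example, and it handles just the plain `U+XXXX` and `U+XXXX-YYYY` forms used in this post.

```javascript
// Count the distinct code points covered by a space- or comma-separated
// list of ranges such as "U+0020-007E U+0131".
function countCodePoints(unicodeRange) {
  const seen = new Set();
  for (const token of unicodeRange.trim().split(/[\s,]+/)) {
    const match = /^U\+([0-9A-F]+)(?:-([0-9A-F]+))?$/i.exec(token);
    if (!match) {
      throw new Error(`Unrecognized range token: ${token}`);
    }
    const start = parseInt(match[1], 16);
    // A token without a dash is a single code point.
    const end = match[2] ? parseInt(match[2], 16) : start;
    for (let codePoint = start; codePoint <= end; codePoint++) {
      seen.add(codePoint);
    }
  }
  return seen.size;
}

const US_ASCII = 'U+0020-007E';
const LATIN =
  'U+0000-00FF U+0131 U+0152-0153 U+02BB-02BC U+02C6 U+02DA U+02DC ' +
  'U+2000-206F U+2074 U+20AC U+2122 U+2191 U+2193 U+2212 U+2215 U+FEFF U+FFFD';

console.log(countCodePoints(US_ASCII)); // 95
console.log(countCodePoints(LATIN)); // 385
```

Using a Set means overlapping ranges would not be counted twice, which does not matter for these two ranges but keeps the helper safe for others.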
Subsetting to LATIN is just as easy as US_ASCII:
$ glyphhanger --subset="*.ttf" --LATIN --formats=woff2
With US_ASCII we had 95 characters in most fonts and removed the ones with fewer characters to keep it all equal. Here, rarely, if ever, is there a font that has all 385 characters. Most have a little over 200. So I somewhat arbitrarily picked 200 as the number under which a font is not considered for comparison. We still have over 1000 font files to compare, but there's a small caveat: not all fonts support the same characters. (I did keep the number of characters in the stats, see below.)
Results
- Average File Size: 29045.30 bytes
- Median File Size: 19092 bytes
- Average Glyph Count: 287.03
- Median Glyph Count: 236
- (Update Oct 28, 2024) Median Character Count: 219
- Number of font files: 1009
Conclusion: the median font file with Latin-extended subset of characters should be a little under 20K. If you look at your network requests and your font is much larger, well there's work for you to do.
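If you'd rather check your own font files than eyeball network requests, something like the sketch below can do it. Everything in it is an assumption made for illustration: the function names, the `{ file, bytes }` shape, and especially the factor of 2 standing in for "much larger", which is an arbitrary pick and not a recommendation.

```javascript
const fs = require('fs');
const path = require('path');

// Median size of the LATIN subsets measured in this post, in bytes.
const MEDIAN_LATIN_BYTES = 19092;

// Hypothetical helper: list the .woff2 files in a directory with their sizes.
function listFontSizes(fontDirectory) {
  return fs
    .readdirSync(fontDirectory)
    .filter((file) => file.endsWith('.woff2'))
    .map((file) => ({
      file,
      bytes: fs.statSync(path.join(fontDirectory, file)).size,
    }));
}

// Return the fonts larger than `factor` times the budget, biggest first.
// The default factor of 2 is an arbitrary stand-in for "much larger".
function findHeavyFonts(fonts, budgetBytes = MEDIAN_LATIN_BYTES, factor = 2) {
  return fonts
    .filter((font) => font.bytes > budgetBytes * factor)
    .sort((a, b) => b.bytes - a.bytes);
}
```

For example, `findHeavyFonts(listFontSizes(someDirectory))` would return the files worth a second look.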
Stats
The full stats are available here in CSV format but here's a taste...
Num chars | Num glyphs | Bytes | File | Font name |
---|---|---|---|---|
262 | 315 | 15884 | Arya-Regular-subset.woff2 | Arya |
224 | 260 | 32052 | Arizonia-Regular-subset.woff2 | Arizonia |
224 | 247 | 40712 | AreYouSerious-Regular-subset.woff2 | Are You Serious |
235 | 236 | 17488 | Armata-Regular-subset.woff2 | Armata |
209 | 210 | 16920 | Arvo-Regular-subset.woff2 | Arvo |
228 | 233 | 23044 | Asar-Regular-subset.woff2 | Asar |
216 | 217 | 24424 | Artifika-Regular-subset.woff2 | Artifika |
231 | 350 | 23464 | Arsenal-Regular-subset.woff2 | Arsenal |
231 | 348 | 21244 | AsapCondensed-Regular-subset.woff2 | Asap Condensed |
230 | 261 | 20792 | Athiti-Regular-subset.woff2 | Athiti |
... | ... | ... | ... | ... |
221 | 340 | 12504 | ZenKakuGothicAntique-Regular-subset.woff2 | Zen Kaku Gothic Antique |
216 | 229 | 15872 | ZenLoop-Regular-subset.woff2 | Zen Loop |
227 | 921 | 107016 | YujiMai-Regular-subset.woff2 | Yuji Mai |
221 | 340 | 12516 | ZenKakuGothicNew-Regular-subset.woff2 | Zen Kaku Gothic New |
226 | 350 | 15928 | ZenKurenaido-Regular-subset.woff2 | Zen Kurenaido |
226 | 348 | 15564 | ZenMaruGothic-Regular-subset.woff2 | Zen Maru Gothic |
221 | 341 | 34128 | ViaodaLibre-Regular-subset.woff2 | Viaoda Libre |
226 | 350 | 19696 | ZenOldMincho-Regular-subset.woff2 | Zen Old Mincho |
225 | 590 | 43104 | ZillaSlab-Regular-subset.woff2 | Zilla Slab |
227 | 921 | 94288 | YujiSyuku-Regular-subset.woff2 | Yuji Syuku |
216 | 317 | 32700 | ZenTokyoZoo-Regular-subset.woff2 | Zen Tokyo Zoo |
229 | 595 | 43912 | ZillaSlabHighlight-Regular-subset.woff2 | Zilla Slab Highlight |
Next time...
So here it is, folks, a web font file that supports extended Latin characters, your Às and your Ás and Â, Ã, Ä, Å... should weigh around 20K. Anything a little over (or a lot over) 20K is up to you to decide. Is the font worth it, can it be subset, etc, etc.
That's, of course, just, like, my opinion. Curious to see other folks' thoughts and/or further experimentation.
As a follow-up I want to see how much subsetting really helps. Stay tuned.