Inverse font subsetting

November 25th, 2024. Tagged: font-face, performance

While at the most recent performance.now() conference, I had a little chat with Andy Davies about fonts, and he mentioned it'd be cool if, while subsetting, you could easily create a second subset file that contains all the "rejects": all the characters that were not included in the initially desired subset.

And as the flight from Amsterdam is pretty long, I hacked on just that. Say hello to a new script, available as an NPM package, called...

inverse-subset

Initially I was thinking of wrapping Glyphhanger and doing both subsets, but decided that there's no point in wrapping Glyphhanger to do what Glyphhanger already does. So the initial subset is left to the user to do in any way they see fit. What I set out to do was take The Source (the complete font file) and The Subset and produce an inversion, where

The Inverted Subset = The Source - The Subset

This way if your subset is all Latin characters, the inversion will be all non-Latin characters.
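The subtraction above is plain set difference over code points. Here is a minimal sketch in Python, purely to illustrate the arithmetic. It is not how inverse-subset is implemented (the tool works on real font files), and the tiny "source" coverage below is made up for the example.

```python
def invert_subset(source: set, subset: set) -> set:
    """Code points covered by the source font but missing from the subset."""
    return source - subset


# Hypothetical coverage: printable ASCII plus a few extra characters.
source = set(range(0x20, 0x7F)) | {0xE0, 0xE9, 0x20AC}
# The subset matches unicode-range U+0020-007E.
subset = set(range(0x20, 0x7F))

inverse = invert_subset(source, subset)
print([f"U+{cp:04X}" for cp in sorted(inverse)])
# ['U+00E0', 'U+00E9', 'U+20AC']
```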

When you craft the @font-face declaration, you can use the Unicode range of the subset, like

@font-face {
    font-family: "Oxanium";
    src: url("Oxanium-subset.woff2") format("woff2");
    unicode-range: U+0020-007E;
}

(Unicode range generated by wakamaifondue.com/beta)

Then for the occasional character that is not in this range, you can let the browser load the inverted subset. That should be rare, though: a character that is needed often belongs in the original subset.

Save on HTTP requests and bytes (in 99% of cases) and still take care of all the characters your font supports for that extra special 1% of cases.

Unicode-optional

Wakamaifondue can generate the Unicode range for the inverted subset too, but it's not required (it's too long!) as long as the inverted declaration comes first. In other words, if you have:

@font-face {
  font-family: "Oxanium";
  src: url("Oxanium-inverse-subset.woff2") format("woff2");
}
@font-face {
  font-family: "Oxanium";
  src: url("Oxanium-subset.woff2") format("woff2");
  unicode-range: U+0020-007E;
}

... and only Latin characters on the page, then Oxanium-inverse-subset.woff2 is NOT going to be downloaded, because the second declaration takes precedence over the first and already covers every character on the page.

Test page is here

If you flip the two @font-face blocks, the inversion will be loaded because it claims to support everything. And the Latin will be loaded too, because the inversion proves inadequate.
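To make the ordering behavior concrete, here is a rough simulation in Python. It is a simplified model, not browser code: it assumes faces of the same family are checked from the last declared to the first, that a face without a unicode-range claims every character, and that a file is fetched as soon as a face claims a character. The `Face` class and `downloads` function are hypothetical names invented for this sketch, and the glyph sets are made up.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Face:
    url: str
    glyphs: set                            # code points the file really contains
    unicode_range: Optional[range] = None  # None: no unicode-range, claims everything


def downloads(faces: list, text: str) -> list:
    """Return the font files fetched for `text`, in fetch order."""
    fetched = []
    for ch in text:
        cp = ord(ch)
        # Later declarations are checked first.
        for face in reversed(faces):
            if face.unicode_range is not None and cp not in face.unicode_range:
                continue  # the range excludes this character: no download
            if face.url not in fetched:
                fetched.append(face.url)
            if cp in face.glyphs:
                break  # glyph found, stop looking
    return fetched


latin = Face("Oxanium-subset.woff2", set(range(0x20, 0x7F)), range(0x20, 0x7F))
inverse = Face("Oxanium-inverse-subset.woff2", {0xE0})

print(downloads([inverse, latin], "Hello"))       # inverse declared first
print(downloads([latin, inverse], "Hello"))       # blocks flipped
print(downloads([inverse, latin], "Voil\u00e0"))  # text with an à
```

With the inversion declared first, Latin-only text fetches just the subset. Flipped, both files are fetched. With an à in the text, the inversion is fetched only when that character shows up.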

If you cannot guarantee the order of the @font-face blocks for some reason, specifying a scary-looking Unicode range for the inversion is advisable:

@font-face {
    font-family: "Oxanium";
    src: url("Oxanium-inverse-subset.woff2") format("woff2");
    unicode-range: U+0000, U+000D, U+00A0-0107, U+010C-0113, U+0116-011B,
        U+011E-011F, U+0122-0123, U+012A-012B, U+012E-0131, U+0136-0137,
        U+0139-013E, U+0141-0148, U+014C-014D, U+0150-015B, U+015E-0165,
        U+016A-016B, U+016E-0173, U+0178-017E, U+0192, U+0218-021B, U+0237,
        U+02C6-02C7, U+02C9, U+02D8-02DD, U+0300-0304, U+0306-0308,
        U+030A-030C, U+0326-0328, U+03C0, U+1E9E, U+2013-2014, U+2018-201A,
        U+201C-201E, U+2020-2022, U+2026, U+2030, U+2039-203A, U+2044, U+2070,
        U+2074, U+2080-2084, U+20AC, U+20BA, U+20BD, U+2113, U+2122, U+2126,
        U+212E, U+2202, U+2206, U+220F, U+2211-2212, U+2215, U+2219-221A,
        U+221E, U+222B, U+2248, U+2260, U+2264-2265, U+25CA, U+F000,
        U+FB01-FB02;
}
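A range like the one above is just a sorted list of code points with consecutive runs collapsed. Wakamaifondue generates it for you; the following Python sketch only shows the idea. The function name `to_unicode_range` is made up for this example, and the input is a small hand-picked sample rather than the full Oxanium coverage.

```python
def to_unicode_range(code_points) -> str:
    """Collapse code points into a CSS unicode-range value."""
    runs = []
    for cp in sorted(set(code_points)):
        if runs and cp == runs[-1][1] + 1:
            runs[-1][1] = cp      # extend the current run
        else:
            runs.append([cp, cp])  # start a new run
    parts = [
        f"U+{lo:04X}" if lo == hi else f"U+{lo:04X}-{hi:04X}"
        for lo, hi in runs
    ]
    return ", ".join(parts)


sample = [0x0, 0xD, *range(0xA0, 0x108), 0x192, 0xFB01, 0xFB02]
print(to_unicode_range(sample))
# U+0000, U+000D, U+00A0-0107, U+0192, U+FB01-FB02
```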

What embarrassment looks like

If you don't load the extended characters and someone uses your CMS to add a wee bit of je ne sais quoi, you get a fallback font:

Test page is here

(Note the à shown in a fallback font)

But if you do load the inversion, all is fine with the UI once again.

Test page

Thank you!

... and happy typesetting, subsetting, and inverse subsetting!

Here's a view of the tool in action:
