Letter frequency in the Bulgarian language

November 1st, 2024. Tagged: misc hackery

In an earlier post, I talked about the letter frequency in English, as presented in Peter Norvig's research. And then I thought... what about my own mother tongue?

So I got a corpus of 5000 books (832,260 words), a mix of Bulgarian authors and translations, and counted the letter frequency. Here's the result in CSV format: letters.csv
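For the curious, here's roughly how a count like this could be done. It's a minimal Python sketch, not the exact script behind letters.csv: the function names, the `letter,percent` column layout and the choice to ignore everything outside the 30 letters of the modern Bulgarian alphabet (Latin letters, digits, punctuation) are all my assumptions for illustration.

```python
from collections import Counter
import csv
import io

# The 30 letters of the modern Bulgarian alphabet, lowercase.
BULGARIAN_LETTERS = "абвгдежзийклмнопрстуфхцчшщъьюя"


def letter_frequencies(text):
    """Return {letter: share} where share is a fraction of all Bulgarian letters in text."""
    # Lowercase first so that "Б" and "б" are counted together,
    # then keep only characters that belong to the alphabet above.
    counts = Counter(ch for ch in text.lower() if ch in BULGARIAN_LETTERS)
    total = sum(counts.values())
    if total == 0:
        return {letter: 0.0 for letter in BULGARIAN_LETTERS}
    return {letter: counts[letter] / total for letter in BULGARIAN_LETTERS}


def to_csv(freqs):
    """Render the frequencies as CSV text (assumed layout: letter,percent)."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["letter", "percent"])
    for letter, share in freqs.items():
        writer.writerow([letter, f"{share * 100:.2f}"])
    return buf.getvalue()


if __name__ == "__main__":
    sample = "Баба, hello 123!"  # toy input; a real run would read the book files
    print(to_csv(letter_frequencies(sample)))
```

For a real corpus you'd concatenate (or stream) the text of every book and feed it through `letter_frequencies`; sorting the resulting dictionary by value gives the frequency-ordered view.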


Here are the results (in alphabetical order) in a graph:

And another graph, with data sorted by the frequency of letters:

ChatGPT gives a different result, even startlingly so (o is the winner at ~9.1% and a is third with 7.5%), which makes me like my letter count research even more 😀
