In an earlier post, I talked about the letter frequency in English presented in Peter Norvig's research. And then I thought... what about my own mother tongue?
So I got a corpus of 5,000 books (832,260 words), a mix of Bulgarian authors and translations, and counted how often each letter appears. Here's the result in CSV format: letters.csv
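For the curious, here's roughly what such a count can look like in Python. This is only a minimal sketch, not necessarily the script behind the numbers in this post. The names `BULGARIAN_ALPHABET`, `letter_frequencies` and `to_csv` are made up for illustration, and so is the CSV layout (a `letter,percent` header), which may not match letters.csv. It assumes the text is already loaded as a string, lowercases it, and ignores everything that isn't one of the 30 letters of the modern Bulgarian alphabet.

```python
import csv
import io
from collections import Counter

# The 30 letters of the modern Bulgarian alphabet, in alphabetical order.
BULGARIAN_ALPHABET = "абвгдежзийклмнопрстуфхцчшщъьюя"


def letter_frequencies(text):
    """Return {letter: percentage} for every Bulgarian letter, in alphabetical order."""
    # Lowercase first so that "Б" and "б" are counted as the same letter.
    counts = Counter(ch for ch in text.lower() if ch in BULGARIAN_ALPHABET)
    total = sum(counts.values())
    if total == 0:
        return {letter: 0.0 for letter in BULGARIAN_ALPHABET}
    return {letter: 100 * counts[letter] / total for letter in BULGARIAN_ALPHABET}


def to_csv(frequencies):
    """Serialize the frequencies as CSV text (hypothetical layout: letter,percent)."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["letter", "percent"])
    for letter, percent in frequencies.items():
        writer.writerow([letter, f"{percent:.2f}"])
    return buffer.getvalue()


if __name__ == "__main__":
    sample = "Аз обичам да чета книги."  # stand-in for the real corpus
    frequencies = letter_frequencies(sample)
    print(to_csv(frequencies))

    # Same data, most frequent letters first.
    by_frequency = sorted(frequencies.items(), key=lambda item: item[1], reverse=True)
    print(by_frequency[:5])
```

For a real corpus you'd read the books from disk and feed the counter file by file instead of building one giant string, but the counting itself stays the same.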
Here are the results (in alphabetical order) in a graph:
And another graph, with data sorted by the frequency of letters:
ChatGPT gives a different result, startlingly so (it has o as the winner at ~9.1% and a in third place with 7.5%), which makes me like my own letter count research even more 😀