Not just words, but the invisible structures behind them. See how famous authors shape their sentences through an analysis of the punctuation patterns in their works.
This project started a few years ago as an attempt to better understand punctuation trends in famous books. I wanted to see how punctuation usage compared across authors, and whether there was an interesting way to visualize it.
The data behind this project started with me simply downloading the 50 most popular copyright-free books from the Project Gutenberg website. Project Gutenberg is an open source public archive (and the oldest eBook library) that aims to 'encourage the creation and distribution of eBooks'. If you are interested in the data source, you can find more information about it here.
This workflow was fine at a small, experimental scale, but it needed a lot more work to cover a larger collection. I used this Kaggle dataset to identify the (allegedly) 10,000 most popular books on Goodreads. I then developed a custom search and extraction workflow to check whether each of these books is available on Project Gutenberg.
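The availability check could be sketched roughly as follows. This is an illustration only, not the workflow actually used: it assumes the Gutenberg catalog has already been loaded into a list of (ID, title) pairs, and the names `normalize_title` and `find_available`, the matching rules, and the sample IDs are all hypothetical.

```python
import re
import unicodedata


def normalize_title(title: str) -> str:
    """Reduce a title to a loose comparison key (assumed matching rule)."""
    # Strip accents so that accented and unaccented spellings compare equal.
    ascii_title = (
        unicodedata.normalize("NFKD", title).encode("ascii", "ignore").decode("ascii")
    )
    # Keep only the main title: drop subtitles and edition notes in parentheses.
    main = re.split(r"[:;(]", ascii_title, maxsplit=1)[0]
    # Lowercase and replace anything that is not a letter, digit, or space.
    main = re.sub(r"[^a-z0-9 ]+", " ", main.lower()).strip()
    # Ignore a leading article.
    main = re.sub(r"^(the|a|an) ", "", main)
    return " ".join(main.split())


def find_available(popular_titles, catalog):
    """Map each popular title to a catalog ID when a normalized match exists.

    `catalog` is assumed to be an iterable of (book_id, title) pairs.
    """
    index = {}
    for book_id, title in catalog:
        # Keep the first ID seen for each normalized title.
        index.setdefault(normalize_title(title), book_id)
    matches = {}
    for title in popular_titles:
        key = normalize_title(title)
        if key in index:
            matches[title] = index[key]
    return matches


# Placeholder catalog rows; the IDs are made up for the example.
catalog = [
    (101, "Pride and Prejudice"),
    (102, "Frankenstein; Or, The Modern Prometheus"),
]
popular = [
    "Pride and Prejudice (Penguin Classics)",
    "Frankenstein",
    "The Hunger Games",
]
available = find_available(popular, catalog)
```

Matching on a normalized main title is deliberately loose, so it can pair different books that share a title. A stricter version would also compare author names.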
For this analysis, the full text of each selected eBook was systematically processed to identify and quantify every punctuation mark. Metrics such as total word count, character count, and the proportion of punctuation to overall text were calculated for each work. Additionally, the frequency of individual punctuation symbols was examined to reveal distinctive stylistic patterns.
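A minimal sketch of how these metrics could be computed for a single text is shown below. The function name `punctuation_profile`, the exact set of characters treated as punctuation, and the choice to divide by character count are assumptions made for the example, not details taken from the project.

```python
import string
from collections import Counter

# ASCII punctuation plus common typographic marks found in eBooks:
# curly quotes, em dash, en dash, and ellipsis (an assumed set).
PUNCTUATION = set(string.punctuation) | set(
    "\u201c\u201d\u2018\u2019\u2014\u2013\u2026"
)


def punctuation_profile(text: str) -> dict:
    """Count words, characters, and punctuation marks in a text."""
    marks = Counter(ch for ch in text if ch in PUNCTUATION)
    punctuation_count = sum(marks.values())
    char_count = len(text)
    return {
        "word_count": len(text.split()),  # whitespace-delimited tokens
        "char_count": char_count,  # includes spaces and line breaks
        "punctuation_count": punctuation_count,
        # Share of all characters that are punctuation marks.
        "punctuation_ratio": punctuation_count / char_count if char_count else 0.0,
        "frequencies": dict(marks),  # per-symbol counts
    }


profile = punctuation_profile("Hello, world! Is it you?")
```

Comparing the `frequencies` of two books, scaled by their `punctuation_count`, gives a rough view of how differently their authors lean on commas, semicolons, or dashes.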
This project serves no commercial purpose and was created solely for fun, to see how punctuation patterns are used across famous books.
Hang with your favorite writers.
See how they build their stories.