The numbers, one to 50 thousand
A book of found typography from historic texts (2025)
November 2025
The primary rule of National Novel Generation Month is to generate a book of 50,000 words, so I generated a book of the numbers one to fifty thousand.
About the project
The project extracts number images from copyright-free books on the Internet Archive, using OCR data to locate and crop individual digit sequences. For any numbers missing from the corpus, it composes them from extracted primitives.
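One way the composition step could work is sketched below. This is an illustration under assumptions, not the project's actual code: the function name `compose` and the idea of treating each extracted crop as a digit string in a set called `available` are hypothetical. It splits a missing number into the fewest pieces that already exist as crops.

```python
from typing import List, Optional, Set


def compose(target: str, available: Set[str]) -> Optional[List[str]]:
    """Split `target` into the fewest pieces that all appear in `available`.

    Returns None when no combination of available pieces spells the target.
    """
    # best[i] is the shortest list of pieces covering target[:i], or None.
    best: List[Optional[List[str]]] = [None] * (len(target) + 1)
    best[0] = []
    for end in range(1, len(target) + 1):
        for start in range(end):
            piece = target[start:end]
            prev = best[start]
            if prev is None or piece not in available:
                continue
            candidate = prev + [piece]
            current = best[end]
            if current is None or len(candidate) < len(current):
                best[end] = candidate
    return best[len(target)]


# Hypothetical set of digit strings for which crops were found.
found = {"1", "2", "3", "12", "45", "450"}
pieces = compose("12345", found)
```

Preferring fewer pieces keeps more of each source's original typesetting intact; a real implementation might also weigh matching typefaces or ink tone when choosing between candidates.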
The final PDF contains pages with the numbers one through fifty thousand, laid out in a visually interesting way that preserves the original typography, ink texture, and paper quality from the source materials.
Technical notes
The project uses hOCR (HTML-based OCR) data from Internet Archive digitisation to precisely locate text on page images. Python scripts handle downloading, extraction, composition, and final PDF generation.
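As a rough sketch of the locating step, the snippet below reads word boxes from hOCR markup using only the standard library. hOCR stores each word in a `span` with class `ocrx_word` and a `title` attribute containing `bbox x0 y0 x1 y1`. The class name `NumberWordParser`, the helper `find_numbers`, and the sample markup are made up for illustration, and the sketch assumes word spans have no nested spans.

```python
import re
from html.parser import HTMLParser

# hOCR puts the box in the title attribute, e.g. "bbox 90 12 130 40; x_wconf 91".
BBOX_RE = re.compile(r"bbox (\d+) (\d+) (\d+) (\d+)")


class NumberWordParser(HTMLParser):
    """Collect (text, bbox) pairs for hOCR words made only of ASCII digits."""

    def __init__(self):
        super().__init__()
        self.words = []     # list of (text, (x0, y0, x1, y1))
        self._bbox = None   # box of the word currently being read, if any
        self._text = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if tag == "span" and "ocrx_word" in classes:
            match = BBOX_RE.search(attrs.get("title") or "")
            if match:
                self._bbox = tuple(int(v) for v in match.groups())
                self._text = []

    def handle_data(self, data):
        if self._bbox is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "span" and self._bbox is not None:
            text = "".join(self._text).strip()
            if text.isascii() and text.isdigit():
                self.words.append((text, self._bbox))
            self._bbox = None


def find_numbers(hocr: str):
    parser = NumberWordParser()
    parser.feed(hocr)
    parser.close()
    return parser.words


# Made-up sample line, not taken from a real scan.
SAMPLE = (
    '<span class="ocr_line" title="bbox 10 10 500 40">'
    '<span class="ocrx_word" title="bbox 10 10 80 40; x_wconf 96">Chapter</span> '
    '<span class="ocrx_word" title="bbox 90 12 130 40; x_wconf 91">1842</span>'
    "</span>"
)
numbers = find_numbers(SAMPLE)
```

Each box is in (left, top, right, bottom) pixel order, which is the same order an image library such as Pillow expects for cropping, so the tuple could be passed straight to a crop call on the page image.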
Suggested Internet Archive collections with good hOCR coverage include americana, toronto, cdl, and internetarchivebooks.
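To list candidate items from one of these collections, a query against the Internet Archive's advanced search endpoint is one option. The sketch below only builds the URL and makes no network request; the helper name `search_url` and the choice of returned field are assumptions, and the project may discover items differently.

```python
from urllib.parse import urlencode


def search_url(collection: str, rows: int = 50) -> str:
    """Build an advanced search URL listing text items in a collection."""
    params = [
        ("q", f"collection:{collection} AND mediatype:texts"),
        ("fl[]", "identifier"),   # return only the item identifier
        ("rows", str(rows)),
        ("output", "json"),
    ]
    return "https://archive.org/advancedsearch.php?" + urlencode(params)


url = search_url("americana", rows=10)
```

Whether a given item actually has hOCR data still has to be checked per item, since the search result says nothing about which derived files exist.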
Remix
Hugo van Kemenade created π with 50,000 digits, a remix celebrating the release of Python 3.14.