The numbers, one to 50 thousand

A book of found typography from historic texts (2025)

November 2025

The primary rule of National Novel Generation Month is to generate a book of 50,000 words, so I generated a book of the numbers one to fifty thousand.

Source code

Detail showing extracted number typography from historic texts

About the project

The project extracts number images from copyright-free books on the Internet Archive, using OCR data to locate and crop individual digit sequences. For any numbers missing from the corpus, it composes them from extracted primitives.
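One way to picture the composition step is as a string-splitting problem: given the digit sequences that were actually found in the corpus, split a missing number into as few of those pieces as possible. The sketch below is illustrative only. The function name `split_into_pieces` and the "fewest pieces" rule are assumptions, not the project's actual method.

```python
from functools import lru_cache
from typing import Optional


def split_into_pieces(target: str, available: set) -> Optional[list]:
    """Split `target` into the fewest substrings that all occur in `available`.

    Hypothetical helper: returns None when no split exists.
    """

    @lru_cache(maxsize=None)
    def best(start: int):
        # Reaching the end of the string means nothing is left to cover.
        if start == len(target):
            return ()
        best_split = None
        # Try longer pieces first so ties favour fewer visible joins early on.
        for end in range(len(target), start, -1):
            piece = target[start:end]
            if piece not in available:
                continue
            rest = best(end)
            if rest is None:
                continue
            candidate = (piece,) + rest
            if best_split is None or len(candidate) < len(best_split):
                best_split = candidate
        return best_split

    result = best(0)
    return list(result) if result is not None else None


# Example: "47119" was never found whole, but "471" and "19" were.
pieces = split_into_pieces("47119", {"4", "7", "9", "19", "471"})
print(pieces)
```

Each returned piece would then map to a cropped image, and the images would be placed side by side to form the missing number.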

The final PDF contains pages with the numbers one through fifty thousand, laid out in a visually interesting way that preserves the original typography, ink texture, and paper quality from the source materials.

Sample spreads showing numbers extracted from various historic texts
Generated table of contents
A table of contents helps the reader find their favorite number

Technical notes

The project uses hOCR (HTML-based OCR) data produced by the Internet Archive's digitisation process to locate text precisely on page images. Python scripts handle downloading, extraction, composition, and final PDF generation.
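As a rough illustration of how hOCR can locate numbers, the sketch below scans an hOCR document for `ocrx_word` spans whose text is all digits and reads the `bbox x0 y0 x1 y1` values from each span's `title` attribute. It uses only the Python standard library. The names `DigitWordParser` and `find_digit_words` and the sample markup are made up for this example, and it assumes word spans contain no nested spans; the project's own scripts may work differently.

```python
import re
from html.parser import HTMLParser

# hOCR stores a word's pixel box in the title attribute, e.g. "bbox 170 200 230 230".
BBOX_RE = re.compile(r"bbox (\d+) (\d+) (\d+) (\d+)")


class DigitWordParser(HTMLParser):
    """Collect (text, bbox) pairs for words made up only of ASCII digits."""

    def __init__(self):
        super().__init__()
        self.found = []
        self._bbox = None  # set while inside an ocrx_word span
        self._text = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if tag == "span" and "ocrx_word" in classes:
            match = BBOX_RE.search(attrs.get("title") or "")
            if match:
                self._bbox = tuple(int(v) for v in match.groups())
                self._text = []

    def handle_data(self, data):
        if self._bbox is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        # Assumes no spans are nested inside a word span.
        if tag == "span" and self._bbox is not None:
            word = "".join(self._text).strip()
            if word.isascii() and word.isdigit():
                self.found.append((word, self._bbox))
            self._bbox = None


def find_digit_words(hocr: str):
    parser = DigitWordParser()
    parser.feed(hocr)
    parser.close()
    return parser.found


# Made-up hOCR fragment for demonstration.
SAMPLE = """
<div class='ocr_page' title='bbox 0 0 1000 1500'>
  <span class='ocr_line' title='bbox 90 190 500 240'>
    <span class='ocrx_word' title='bbox 100 200 160 230; x_wconf 96'>Page</span>
    <span class='ocrx_word' title='bbox 170 200 230 230; x_wconf 93'>1847</span>
  </span>
</div>
"""

print(find_digit_words(SAMPLE))
```

Each bounding box is in (left, top, right, bottom) order, which is the order an image library's crop function typically expects, provided the page image has the same pixel dimensions as the one the OCR was run on.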

Suggested Internet Archive collections with good hOCR coverage include americana, toronto, cdl, and internetarchivebooks.
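To browse one of these collections, a query URL for the Internet Archive's advanced search endpoint can be built as in the sketch below. This is an assumption about how one might list candidate books, not a description of the project's download script; the helper name `collection_search_url` and the choice of returned field are illustrative.

```python
from urllib.parse import urlencode


def collection_search_url(collection: str, rows: int = 50, page: int = 1) -> str:
    """Build a search URL listing text items in one collection (hypothetical helper)."""
    params = [
        ("q", f"collection:{collection} AND mediatype:texts"),
        ("fl[]", "identifier"),  # only return item identifiers
        ("rows", rows),
        ("page", page),
        ("output", "json"),
    ]
    return "https://archive.org/advancedsearch.php?" + urlencode(params)


for name in ["americana", "toronto", "cdl", "internetarchivebooks"]:
    print(collection_search_url(name, rows=10))
```

Fetching the URL is left out so the example runs offline. Not every item returned will have hOCR data, so each identifier would still need to be checked.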

Remix

Hugo van Kemenade created π with 50,000 digits, a remix celebrating the release of Python 3.14.

Spreads from the π remix showing digits of pi extracted from historic texts