Talkie is a 1930 language model with a modern contamination problem
A 13B model trained on pre-1931 text is less a nostalgia demo than a practical test bed for clean data, synthetic tuning, and what language models really learn from the web.
news, tips, and reviews that make thinking machines useful
Tag archive
Everything we’ve published under Data Contamination so far.