
Talkie: A ‘Vintage’ AI model that lives in 1930 and knows nothing about the present!

Have you ever imagined having a conversation with someone from the distant past? Someone who has never heard of the iPhone, knows nothing about the internet, and isn’t even aware that World War II occurred? This is exactly what the Talkie project offers. It is a Large Language Model (LLM) with 13 billion parameters, trained exclusively on historical texts dating back to before 1931. It is not just an exciting technical experiment, but a serious attempt to build what are called “Vintage Language Models,” which aim to simulate the knowledge, culture, and language as they were in a bygone era, free from the “contamination” of modern data that floods today’s models.

From Phonegram: A man sits at a desk holding a laptop, listening to an avatar of a 'vintage' AI discussing inventions, technology, and communication symbols in an antique study room.


Why do we need an AI from the past?

Training an AI model to be “ignorant” of the present might seem strange, but the scientific benefits of this approach are remarkable. The core idea behind vintage models is to study how knowledge evolves and to test the limits of prediction. By training Talkie only on pre-1931 texts, researchers can probe the model’s ability to “predict” the future. For example, could a model trained on texts up to 1911 deduce the General Theory of Relativity, which Einstein published in 1915?

From Phonegram: Two men discussing at a desk with scientific papers; behind them are images of Albert Einstein, equations, a steam train, the Titanic, and a vintage AI symbolizing science and historical innovation.

Furthermore, modern models suffer from what is called “contamination”: they have often already seen benchmark test questions or programming solutions during training on web data. Vintage models are inherently free from this contamination; they have never seen a single line of Python code, because the language did not exist back then. Yet experiments have shown that Talkie can learn programming when given just a few examples in the context of a conversation, demonstrating that the model generalizes and reasons rather than merely memorizing.
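This kind of in-context learning can be sketched as a few-shot prompt: a handful of worked examples are spliced into the conversation before the new task. The examples and the helper below are purely illustrative assumptions; they do not reflect Talkie’s actual interface.

```python
# A minimal sketch of few-shot prompting: worked instruction/answer
# pairs are concatenated ahead of a new task, so the model can
# generalize from the examples in its context window. All names and
# examples here are hypothetical, not the project's real API.

EXAMPLES = [
    ("Write a function that doubles a number.",
     "def double(n):\n    return n * 2"),
    ("Write a function that reverses a string.",
     "def reverse(s):\n    return s[::-1]"),
]

def build_few_shot_prompt(task: str) -> str:
    """Concatenate the example pairs, then append the new task."""
    parts = []
    for instruction, answer in EXAMPLES:
        parts.append(f"Instruction: {instruction}\nAnswer:\n{answer}\n")
    parts.append(f"Instruction: {task}\nAnswer:\n")
    return "\n".join(parts)

prompt = build_few_shot_prompt("Write a function that squares a number.")
print(prompt)
```

The model never needs to have seen Python during training: the pattern in the examples alone is enough for it to attempt the new task.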


Talkie 13B: A digital time machine

Talkie is considered the largest vintage language model currently available, having been trained on 260 billion tokens of historical English texts, including books, newspapers, scientific journals, and patents. The result is an amazing conversation partner; it can write Gothic horror stories in the style of the 19th century, or describe the impressions of a traveler visiting Cairo for the first time in the Victorian era using poetic language we no longer use today.

From Phonegram: Four vintage book covers about etiquette, recipes, behavior, games, and letter writing, each featuring ornate typography and illustrations.

What is exciting is that this model was not instruction-tuned on modern “chat” data; rather, it was fine-tuned on old etiquette books, letter-writing guides from the turn of the 20th century, and classic cookbooks. As a result, it reflects the culture and values of the era it represents, with all their linguistic and social characteristics, making it an invaluable tool for historians, sociologists, and writers seeking historical authenticity.


Training challenges: From poor text quality to ‘temporal leakage’

Building a model that lives in 1930 is not as easy as it seems. One of the biggest challenges is data quality; since the texts were not digital, they had to be converted using Optical Character Recognition (OCR) techniques. The problem is that these techniques often make significant errors in reading old fonts, which reduces the model’s learning efficiency. Researchers found that models trained on human-digitized texts significantly outperform those relying on traditional OCR, which is pushing them to develop OCR systems specifically for historical documents.

From Phonegram: A man in vintage clothing examining documents at a desk, surrounded by old books and newspapers, while digital analysis overlays—powered by a vintage AI model—highlight historical events and texts.

The other challenge is “temporal leakage”: modern text sometimes sneaks into the dataset, such as an introduction written by an editor in 2020 for a book published in 1920. This caused early versions of the model to know, to the researchers’ surprise, about Roosevelt’s presidency, which began in 1933, or even about World War II. The team is therefore working on advanced filters to ensure that Talkie remains, technically speaking, a prisoner of its golden age before the 1930s.
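The simplest form such a filter could take is a date heuristic: reject any document that mentions a calendar year after the cutoff. The sketch below assumes only this naive rule; a real pipeline would also need metadata checks, edition detection, and likely classifier models, none of which are shown here.

```python
import re

# A minimal sketch of a temporal-leakage filter: flag any document
# mentioning a four-digit year later than the 1930 cutoff. This is an
# illustrative heuristic only, not the Talkie team's actual filter.

CUTOFF_YEAR = 1930
YEAR_PATTERN = re.compile(r"\b(1[0-9]{3}|20[0-9]{2})\b")  # years 1000-2099

def leaks_future(text: str, cutoff: int = CUTOFF_YEAR) -> bool:
    """Return True if the text mentions any year after the cutoff."""
    return any(int(year) > cutoff for year in YEAR_PATTERN.findall(text))

docs = [
    "A treatise on steam locomotion, London, 1885.",
    "Editor's foreword (2020) to the 1920 first edition.",
]
flagged = [d for d in docs if leaks_future(d)]
print(flagged)  # only the foreword mentioning 2020 is flagged
```

A heuristic this blunt would also flag legitimate pre-1931 texts that happen to discuss future dates (a 1905 essay speculating about the year 2000, say), which is exactly why leakage filtering is harder than it first appears.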

If you had the chance, what question would you ask an AI that believes we are still in 1930?

Source:

talkie-lm.com
