Oscar Robinson

London, UK-based Software and Data Engineer. I love using Kotlin, Scala, Kafka, Postgres and Snowflake.

Will generative AIs curate our history?

History is what humans have recorded over the last 6 millennia. For the majority of this timespan, these records are disparate and sparse: diaries, business records, funeral rites, censuses. Historians curate these snippets of evidence to create a picture of life in our past.

However, the last few decades have ushered in the information age. The amount of data now recorded about our lives is staggering. It’s estimated we collectively produce trillions of megabytes of data every day. Historians of the future will face two problems: how will this digital information be preserved, and how will it be curated to create a digestible picture of life at a given moment in history?

2022 saw generative AI hit the mainstream, with November bringing the release of ChatGPT, an interface to OpenAI’s GPT-3.5 language model. Its predecessor, GPT-3, was trained on, among other things, a filtered version of the Common Crawl dataset, which captures a large share of the publicly available web. Common Crawl's monthly snapshots are hundreds of terabytes in size, yet the GPT-3 model itself was less than 1TB. This raises an interesting proposition: could we use such models to collect and preserve the vast expanse of data we produce so that it can be explored by future generations? After all, the prospect of preserving a terabyte of data for centuries or millennia sounds much more plausible than preserving many petabytes of distributed data.
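
As a rough sanity check on that "less than 1TB" figure, here is a back-of-the-envelope sketch. It assumes GPT-3's publicly reported size of about 175 billion parameters and counts only the raw weights at two common numeric precisions; the function name and constants are my own, for illustration.

```python
# Back-of-the-envelope size of a language model's weights.
# Assumption (not from this post): GPT-3 has roughly 175 billion
# parameters, as publicly reported by OpenAI. Only the raw weights are
# counted; file format overhead is ignored.

GPT3_PARAMETERS = 175_000_000_000
BYTES_PER_TERABYTE = 10**12  # decimal terabytes


def weights_size_tb(parameters: int, bytes_per_parameter: int) -> float:
    """Return the raw size of the weights in terabytes."""
    return parameters * bytes_per_parameter / BYTES_PER_TERABYTE


half_precision_tb = weights_size_tb(GPT3_PARAMETERS, 2)  # 16-bit floats
full_precision_tb = weights_size_tb(GPT3_PARAMETERS, 4)  # 32-bit floats

print(f"16-bit weights: {half_precision_tb:.2f} TB")
print(f"32-bit weights: {full_precision_tb:.2f} TB")
```

Under these assumptions the weights come to well under a terabyte at either precision, which is the gap this argument leans on.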

ChatGPT has its shortcomings. Most of the criticism concerns the accuracy of its answers. How these shortcomings will be addressed over the coming months and years remains to be seen, but it seems likely that they will be.

Now imagine such an improved model, produced each year, or even each month, that captures the state of life on Earth in that time period: a natural language model that can be queried conversationally about culture, home life, politics, anything that is happening in the world. It would offer not just the view of a specific person of influence, but that of everyone on Earth who is producing information. This model would be able to provide the perspective of anyone from that time and describe what the world was like for them. It could condense the vast amounts of information into a single queryable interface for a future historian to explore.

Historians of today curate primary sources to help paint a picture of the past. Perhaps the tools we now have can curate our current lives and paint a picture of our present for the historians of the future?