AI Tackles The Vatican’s Secrets

2018-03-22

Even church archivists don’t know what mysteries lie hidden in the Vatican Secret Archives, since many of the documents have never been transcribed. A machine-vision system for medieval text is about to change that.

The Vatican Secret Archives are the stuff of legend. Reportedly filling some 85 kilometers of shelving in Vatican City, they contain the private letters and other documents of past popes, some of them dating back to the eighth century.

The archives are closely guarded. Since 1881, though, scholars have had limited access to some of the documents, and even this has revealed much.

For example, there’s a 60-meter scroll detailing the trials of the Knights Templar, which started in 1307 and lasted several years. There are letters to various popes from Michelangelo; from Henry VIII, requesting a marriage annulment; and from Mary, Queen of Scots, begging for intercession before her beheading.

The archives also contain more recent correspondence, such as letters from Abraham Lincoln and Jefferson Davis attempting to persuade Pope Pius IX to favor the Union and the Confederacy, respectively. Then there are the records relating to Pope Pius XII and his dealings with the Nazi regime during World War II, which have never been published. Indeed, all records from 1939 onward are entirely secret.

While publication of the records is forbidden, the archives have their own photographic and conservation studios. And like many historic archives around the world, they have begun to save images of certain documents to preserve them and allow further study.

But the Vatican’s records are so voluminous that transcribing them by hand within any reasonable period is impractical. Could machine vision help?

Today we get an answer thanks to the work of Donatella Firmani at Roma Tre University in Italy and a few colleagues. These folks have begun a project called In Codice Ratio (Latin, roughly, for “reason in the codex”), which aims to automatically transcribe a part of the Vatican Secret Archives called the Vatican Registers.

This corpus consists of more than 18,000 pages of official 13th-century correspondence between the Catholic Church and kings, queens, and political and religious institutions across Europe. “Never having been transcribed in the past, these documents are of unprecedented historical relevance,” say Firmani and co.

Medieval texts pose a unique problem for machine vision. Conventional optical character recognition algorithms do not work well, because the manuscripts are written in varying styles with different ligatures (characters that combine adjacent letters) and with idiosyncratic abbreviations.

To get around that, scholars have developed machine vision systems that recognize entire words rather than letters. But this is far from satisfactory, because most words appear only a few times even in long documents. So creating data sets for machines to learn from is hard.

Now Firmani and co have come up with a new way to train an optical character recognition system that divides each word into a series of strokes, which fit together like a jigsaw puzzle. “Our goal is to develop a full-fledged system that transcribes as much as possible from the manuscripts,” they say.

Having divided each word into strokes, the system attempts to fit them together to form known letters and then analyzes all potential permutations of letters. Finally, it rules out all those that are non-grammatical.

For example, a common pattern of strokes can be interpreted as “iii” or as “m,” but the former can be ruled out as grammatically inadmissible. The same strokes might also represent “in” or “ni,” and to decide between them, the system must study the word and its context more closely.
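To make the ambiguity concrete, here is a minimal Python sketch of the enumerate-then-filter idea. It is a toy under stated assumptions, not Firmani and co's system: the stroke counts per letter, the function names, and the three-word lexicon are all invented for illustration, and a simple word-list lookup stands in for whatever language knowledge the real system applies.

```python
# Toy illustration only: the stroke inventory and lexicon are assumptions,
# not data from the In Codice Ratio project.

# Hypothetical number of vertical strokes (minims) used to write each letter.
MINIMS = {"i": 1, "n": 2, "u": 2, "m": 3}


def readings(stroke_count):
    """Return every letter sequence whose minims add up to stroke_count."""
    if stroke_count == 0:
        return [""]
    results = []
    for letter, width in MINIMS.items():
        if width <= stroke_count:
            for rest in readings(stroke_count - width):
                results.append(letter + rest)
    return results


def plausible(prefix, stroke_count, suffix, lexicon):
    """Keep only the readings that form a word found in the lexicon."""
    words = (prefix + r + suffix for r in readings(stroke_count))
    return sorted(w for w in words if w in lexicon)


# A made-up mini lexicon standing in for a real Latin word list.
LEXICON = {"in", "ni", "animus"}

# Three bare strokes admit six letter readings; the lexicon removes "iii"
# and others but cannot choose between "in" and "ni" without more context.
print(readings(3))
print(plausible("", 3, "", LEXICON))

# Eight ambiguous strokes between a clear "a" and a clear "s".
print(plausible("a", 8, "s", LEXICON))
```

Even in this toy, the word list alone leaves “in” and “ni” tied, which is why the surrounding context matters.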

Firmani and co’s first step was to create a data set to train a machine-vision system based on a neural network. This data set must be labeled so that the system can learn what letters are represented by different combinations of strokes.

Firmani and co used crowdsourcing to complete this annotation. They presented the jigsaw segmentation of words as a pattern recognition problem, rather like CAPTCHA puzzles, to 120 high school students. Together, the students manually labeled a 15,000-character training data set in a couple of hours.

The results are impressive. “We were able to generate the exact transcription for 65 percent of the word images of our dataset,” say Firmani and co.

That’s an important step forward for medieval text transcription and for historians in general. However, there is more work ahead. For example, the transcription currently works only with lowercase letters, so an important next goal is to extend the vocabulary to include uppercase letters and some of the many abbreviations used in medieval text.

Just how the Vatican Secret Archives will use this technology isn’t yet clear. Nor is it known whether the Vatican Registers will be published when they are transcribed.

But even if they aren’t, the tools that Firmani and co are developing should allow scholars to make progress. For instance, they allow data-driven studies of historical documents that could look at word or phrase frequencies and how they change over time. That can provide important cultural insights.
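As a rough idea of what such a frequency study could look like once transcriptions exist, here is a short Python sketch. The function name, the whitespace tokenization, and the sample texts and years are all assumptions made up for illustration; none of them come from the Vatican Registers.

```python
from collections import defaultdict


def relative_frequency_by_year(documents, term):
    """Return {year: share of tokens equal to term} for (year, text) pairs."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for year, text in documents:
        tokens = text.lower().split()  # naive whitespace tokenization
        totals[year] += len(tokens)
        hits[year] += tokens.count(term)
    return {year: hits[year] / totals[year]
            for year in sorted(totals) if totals[year]}


# Invented sample data, not real transcriptions.
SAMPLE = [
    (1220, "rex et regina"),
    (1220, "papa et rex"),
    (1250, "papa papa et imperator"),
]

for year, share in relative_frequency_by_year(SAMPLE, "papa").items():
    print(year, round(share, 3))
```

A real study would need to handle spelling variants and abbreviations before counting, which is part of the remaining work described above.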

It will be fascinating to see whether the Vatican decides to share this knowledge—or hide it.

This feature is from MIT Technology Review.
