By Catherine Elvy
Researchers at Dartmouth College turned to the Bible as part of a high-tech effort to improve computer-based text translations. The team produced algorithms that trained on various versions of Scripture to convert written material into a variety of styles.
While Internet tools to translate text between languages are widely available, style translators have lagged behind. Until recently, application developers have been frustrated in their efforts to improve style translators because of the difficulty of acquiring enough high-quality data.
Dartmouth scholars turned to the Bible, which provided “a large, previously untapped dataset of aligned parallel text.”
“The English-language Bible comes in many different written styles, making it the perfect source text to work with for style translation,” said Keith Carlson, a doctoral student in computer science and lead author of the research initiative.
Each version of the Bible contains 31,000-plus verses, which the researchers used to produce more than 1.5 million distinct pairings of source and target verses for machine-learning training sets.
One of the Dartmouth researchers highlighted what the team’s work could mean for both technology and literature, as well as for future applications.
“While we present these data as a style-transfer corpus, we believe that it is of unmatched quality and may be useful for other natural language tasks as well,” wrote Carlson for Royal Society Open Science.
Carlson teamed up with Dartmouth Professor Daniel Rockmore and Indiana University Assistant Professor Allen Riddell to pen “Evaluating Prose Style Transfer with the Bible” for the October issue of Royal Society Open Science.
Among the other architects of the high-profile project, Rockmore, Princeton ’84, Harvard Ph.D. ’89, is Dartmouth’s associate dean for the sciences. As for Riddell, the Indiana University academician was a fellow in Dartmouth’s Neukom Institute from 2013 to 2016.
In late 2018, the trio generated headlines after being the first team of researchers to utilize the Bible to advance the field of computer-generated style translation.
While so-called parallel datasets are not novel, the Dartmouth effort was the first to harness the linguistic content within the wide body of scriptural translations. Other text sources, including Shakespearean plays and Wikipedia entries, proved sub-optimal as datasets for computer-generated style translations.
As an added benefit, the Bible is already thoroughly indexed by consistent book, chapter, and verse numbers. The predictable organization of the material across versions eliminates the risk of alignment errors that could be triggered by automatic methods of matching textual versions.
Rockmore went so far as to call the Good Book a “divine data set” for this and future projects. “Humans have been performing the task of organizing Bible texts for centuries, so we didn’t have to put our faith into less reliable alignment algorithms,” he said.
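To make the alignment idea concrete, here is a minimal Python sketch of pairing verses by their shared book, chapter, and verse reference. It is an illustration only, not the researchers’ code: the version names, the placeholder verse texts, and the `aligned_pairs` helper are all hypothetical.

```python
from itertools import permutations

# Hypothetical toy corpus: each version maps a (book, chapter, verse)
# reference to that verse's text. The texts are placeholders, not quotations.
versions = {
    "version_a": {
        ("Genesis", 1, 1): "placeholder text A for Genesis 1:1",
        ("Genesis", 1, 2): "placeholder text A for Genesis 1:2",
    },
    "version_b": {
        ("Genesis", 1, 1): "placeholder text B for Genesis 1:1",
        ("Genesis", 1, 2): "placeholder text B for Genesis 1:2",
    },
    "version_c": {
        # This version lacks Genesis 1:2, so that verse cannot be paired with it.
        ("Genesis", 1, 1): "placeholder text C for Genesis 1:1",
    },
}


def aligned_pairs(versions):
    """Return (source, target, reference, source_text, target_text) tuples.

    Verses are matched purely by their shared reference key, so no
    alignment algorithm is needed.
    """
    pairs = []
    # Each ordered (source, target) combination of two different versions.
    for source, target in permutations(versions, 2):
        shared_refs = versions[source].keys() & versions[target].keys()
        for ref in sorted(shared_refs):
            pairs.append(
                (source, target, ref, versions[source][ref], versions[target][ref])
            )
    return pairs


pairs = aligned_pairs(versions)
for source, target, ref, _, _ in pairs:
    print(source, "->", target, ref)
```

Because the key is the reference itself, a verse missing from one version simply produces no pair for that version instead of shifting every later verse out of alignment.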
For the project, the team initially looked at 34 stylistically distinct Bible translations, ranging in linguistic complexity from the King James Version to the Bible in Basic English. While the King James Version features a distinctive, archaic voice, the Bible in Basic English is meant to be understood by readers with a limited vocabulary.
The academicians then took into account some practical and legal considerations.
Upon closer examination, the men “found that seven of these collected versions are in the public domain and thus can be freely distributed,” Carlson noted. Additionally, the license for the Lexham English Bible allows for free distribution.
Thus, eight freely distributable versions made up the corpus at the heart of the Dartmouth style-translation project.
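As a rough check on the scale involved, the figures reported in this article can be combined in a few lines of Python. This is back-of-the-envelope arithmetic, not the researchers’ method: it assumes every ordered source-to-target combination of the eight versions is counted and that each version contributes about 31,000 verses. The variable names are illustrative.

```python
# Figures taken from the article; the counting scheme is an assumption.
NUM_VERSIONS = 8
VERSES_PER_VERSION = 31_000  # the article says "31,000-plus"

# Ordered (source, target) combinations of two different versions.
ordered_version_pairs = NUM_VERSIONS * (NUM_VERSIONS - 1)

# Rough estimate of verse pairings if every verse appeared in every version.
estimated_pairings = ordered_version_pairs * VERSES_PER_VERSION

print(ordered_version_pairs)  # 56
print(estimated_pairings)     # 1736000
```

Under those assumptions the estimate lands in the same range as the reported total of more than 1.5 million distinct pairings. Verses that are missing from some versions or worded identically in two versions would lower the count.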
The Dartmouth trio took on the ground-breaking initiative because individual languages offer multiple ways to convey similar concepts.
“Our systems aim to produce text with the same meaning as the original, but do so with different words,” said Carlson.
Dartmouth College has a rich history of innovation in computer science. The term “artificial intelligence” is closely tied to a 1956 conference at Dartmouth that is widely regarded as the founding event of the discipline. Other advancements include the design of BASIC, an early general-purpose programming language created to be accessible to beginners, and the Dartmouth Time-Sharing System, which contributed to the modern operating system.
This story originally appeared in Christian Union: The Magazine. Christian Union’s work is focused in three areas: developing bold Christian leaders at the most strategic and profoundly influential universities in America; building networks of Christian leaders in cities; and promoting national revival through the Christian Union Day & Night online ministry.