Looking into the Effects of Machine Translation on Interlingual Subtitles

By Hanna Hagström

Hanna Hagström holds a BA and two MAs in Translation Studies. She is a PhD student at the Institute for Interpreting and Translation Studies at Stockholm University where she researches subtitling and teaches translation, subtitling, and translation theory. Her main research interest is interlingual subtitling, particularly in relation to new technology. Hanna has worked as a professional translator, subtitler and audio describer since 2006.

The increasing number of TV programs and films in need of subtitles has posed a great challenge for the subtitling industry. In order to meet the demand, changes in workflow have become necessary. Therefore, in the spring of 2020, the subtitling industry in Sweden was introduced to a new process: post-editing machine translated subtitles. This change meant that instead of translating the content themselves, usually via templates, subtitlers were now supposed to revise and edit machine translated subtitles for some jobs. The implementation of this new technology was initially not embraced by the subtitling community. On the one hand, the Swedish subtitling union blacklisted companies that cut rates for these new subtitling jobs, and on the other, subtitlers worried about their future. Subtitlers’ main worry, apart from the risk of lower remuneration, was that they would not be able to produce high quality subtitles. Among other things, concerns were raised regarding the quality of translations and target language. In short, it was indeed a turbulent time.

Photo by Erik Mclean on Unsplash

In light of the strong reactions and opinions caused by this change, Jan Pedersen and I decided to investigate these new subtitles. We wanted to see if they were in fact different from traditional ones, where subtitlers translate themselves. We assembled a corpus of Swedish subtitles (translated from English) to analyze. At the time, this new process was limited to reality series and documentaries, or daytime TV, so we assembled 26 episodes from those two genres. Half of them were translated by subtitlers via templates, or EMT files, and were commissioned during the 2010s. EMT stands for English Master Template; it is a spotted and segmented file with transcribed (or translated) dialogue. The other half of the episodes were machine translated (via EMTs) and then post-edited by professional subtitlers and were commissioned in the early 2020s. For that reason, we refer to them as 2010 episodes and 2010 subtitles and 2020 episodes and 2020 subtitles, respectively. We wanted the two sets of episodes to be closely matched, so we chose episodes from the same series, before and after the new process was used, or, when that was not possible, from similar series. All in all, we had just under 12,000 subtitles in our corpus, and they were commissioned by and produced for the same company.

We investigated two aspects of the subtitles: we assessed quality, and we looked for potential patterns in the 2020 subtitles that we could not find in the 2010 ones. For the quality assessment, we used a model called the FAR model (Pedersen 2017). This model was developed for assessing the quality of three aspects of interlingual subtitles – Functional equivalence: how good the translation is, Acceptability: how good the target language is, and Readability: how easily viewers can read and understand the subtitles. We also did a close reading of the subtitles and made notes of any patterns we detected. Since quality is a very tricky thing to assess and highly subjective, we needed a yardstick with which we could determine quality. We did not have access to the in-house guidelines of the company responsible for the subtitles; instead, we used the Swedish national guidelines for subtitling (available at medietextarna.se). These were appropriate in this study since they were co-created by most of the subtitling companies (and other stakeholders) in Sweden, including the company responsible for the subtitles in this study.

Photo by CAR GIRL on Unsplash

The results of the quality assessment showed that quality was higher in the 2010 subtitles, and this was true for all three aspects we investigated. The 2020 subtitles contained almost seven times as many errors. For instance, translation errors, unidiomatic language use, and unconventional line-breaks were frequent in these subtitles. Also, several patterns were found in the 2020 subtitles. The number of subtitles (subtitle density) was much higher, almost three more subtitles per minute, and the number of one-liners was also high (according to Swedish norms, two-liners are preferable when possible). There were also examples of important information missing in the subtitles, errors in punctuation where speaker dashes were used in a misleading way, and several of the subtitles had a kind of telegraphic style, where full stops were used instead of conjunctions even though there were no pauses in the dialogue. Also, more spoken language features were transferred to the 2020 subtitles. All in all, the two analyses resulted in fourteen features of the 2020 subtitles that can be categorized in accordance with the three aspects of the FAR model:

Functional equivalence
1. Semantic errors
2. Infelicitous omissions

Acceptability
3. Stylistic errors
4. Grammar errors
5. Spelling errors
6. Idiomaticity errors

Readability
7. Unsyntactic segmentation
8. Spotting errors
9. High reading speed, cps
10. High subtitle density
11. High percentage of 1-liners
12. Increased orality
13. Punctuation errors
14. Low cohesion

Results from Hagström & Pedersen’s (2022) study
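Three of the readability features listed above are simple counts: reading speed in characters per second (cps), subtitle density (subtitles per minute), and the percentage of one-liners. The sketch below shows one way such measures could be computed from timed subtitles. It is an illustration only and not the procedure used in the study; the `Subtitle` class, the function names, and the choice to count spaces and punctuation as characters are all assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass
class Subtitle:
    """One subtitle (hypothetical structure): in/out times in seconds plus its text lines."""
    start: float
    end: float
    lines: list[str]


def reading_speed_cps(sub: Subtitle) -> float:
    """Characters per second for one subtitle.

    Assumption: every character, including spaces and punctuation, is counted.
    """
    chars = sum(len(line) for line in sub.lines)
    return chars / (sub.end - sub.start)


def subtitle_density(subs: list[Subtitle], programme_minutes: float) -> float:
    """Number of subtitles per minute of programme time."""
    return len(subs) / programme_minutes


def one_liner_percentage(subs: list[Subtitle]) -> float:
    """Percentage of subtitles that consist of a single line."""
    one_liners = sum(1 for sub in subs if len(sub.lines) == 1)
    return 100 * one_liners / len(subs)


# Invented example data, not taken from the corpus.
demo = [
    Subtitle(0.0, 2.0, ["Hej!"]),
    Subtitle(2.5, 6.5, ["Vi ses i morgon.", "Ta med nycklarna."]),
]

for sub in demo:
    print(f"{reading_speed_cps(sub):.2f} cps")
print(f"{subtitle_density(demo, programme_minutes=0.5):.1f} subtitles per minute")
print(f"{one_liner_percentage(demo):.0f}% one-liners")
```

Comparing these figures between two sets of episodes would give the kind of density and one-liner contrasts described above, although guidelines differ in how they count characters and measure duration.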

Based on this result, we concluded that the 2020 subtitles in our study were lower in quality, less complete, and less cohesive. They were also characterized by increased orality in the sense that spoken language features like repetitions and filler words, which usually would be omitted, were present in the subtitles. There can be many reasons for the differences in the subtitles we investigated. However, we did not investigate the process so we cannot offer any explanations. Perhaps future studies will shed light on this aspect. For now, I will continue to investigate these new subtitles from a different perspective – namely the viewers’. In my PhD thesis, I’m conducting a reception study of how Swedish viewers react to features of machine translated and post-edited interlingual subtitles. I will investigate how specific features of these subtitles affect viewers’ comprehension and experience while watching subtitled programs.

Sources

Hagström, H., & Pedersen, J. (2022). Subtitles in the 2020s: The Influence of Machine Translation. Journal of Audiovisual Translation, 5(1), 207–225. https://doi.org/10.47476/jat.v5i1.2022.195

Pedersen, J. (2017). The FAR model: Assessing quality in interlingual subtitling. Journal of Specialised Translation, 28, 210–229.
