In this talk, we will present and discuss some interim findings from an ongoing study of functionally defined recurrent n-grams in translation. Focusing on the empirical case of English-to-Polish translations, our study uses both parallel and comparable corpora to address the following three research questions:
1) How are recurrent n-grams in source texts translated in target texts?
2) How idiomatic (i.e. 'nativelike') are the translations of these n-grams?
3) To what extent are the discourse functions carried out by functionally complex n-grams 'preserved' in translation?
We investigate questions 1 and 2 by carrying out quantitative and qualitative corpus-based analyses of two pre-selected sets of n-grams, one consisting of items with a stance-taking function ('it is clear that', 'the problem is that', and 'it is hard to'), and the other consisting of items with a text-structuring function ('is not just about', 'one of the reasons why', and 'the question of whether'). Question 3 is addressed by means of a more fine-grained analysis of the English 6-gram 'at the end of the day' and its Polish translations.
We find, firstly, that the recurrent English n-grams in our study are translated into Polish in a surprisingly wide variety of ways; secondly, that some of these translational units are rarely used in native contemporary Polish texts of the same genre, and are thus arguably not idiomatic choices in Polish; and thirdly, that the discourse function of the source phrase 'at the end of the day' is not preserved in its Polish translation in about a quarter of cases. We will conclude by discussing the implications of these findings, and by looking ahead to further work.