Register investigation on Czech: Designing an MDA-based experimental study

abstract

  • CRC Register project A03
    One way to investigate registers is to perform a multidimensional analysis (MDA) on a sufficiently large and balanced corpus (Biber, 1988; Sharoff, 2021; Cvrček et al., 2020). Another is to approach registers experimentally. In our talk, we briefly describe how we combine the MDA method with an experimental investigation to address our overall research question: how does register knowledge relate to the grammatical aspects of linguistic knowledge in Czech? We then discuss the differences between MDA on Czech and on English, and finally present the results of an experimental pre-study on situational contexts of language use in Czech.

    Which register distinctions can be reliably identified in Czech text corpora, and which linguistic features are characteristic of the different registers? Following Biber's approach (Biber, 1988), Cvrček et al. (2020) carried out an MDA of the Czech corpus Koditex (Zasina et al., 2018) and established 8 dimensions of variation based on 122 linguistic features. The selection and grouping of features differ profoundly between English and Czech, since Czech is a Slavic language with rich inflection, distinctive morphology, and a sociolinguistic situation bordering on diglossia. The Czech feature list emphasizes morphological variation, lexical variation, and type-based features, which complement the more commonly used frequency-based features. The first two dimensions identified in the Czech MDA are labeled (1) dynamic (+)/static (-) and (2) spontaneous (+)/prepared (-). These two dimensions are the focus of our attention, since they explain the largest proportion of shared variance. The first dimension reflects the language user's strategy either to elaborate clause members in detail (static pole) or to add new clauses and move on to new topics in conversation (dynamic pole).
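To make the notion of a dimension more concrete, the following Python sketch shows how dimension scores are commonly computed in Biber-style MDA once factor analysis has produced the loadings: each feature rate is standardized across texts, and a text's score on a dimension is the sum of the z-scores of its salient positive features minus the sum of the z-scores of its salient negative features. The feature names, rates, and loadings are invented for illustration, and the salience cutoff of 0.35 is taken from Biber (1988). We assume, without confirming it, that the Czech MDA scores texts in a similar way; this is not a reproduction of the procedure of Cvrček et al. (2020).

```python
from statistics import mean, pstdev

# Hypothetical feature rates per text (e.g. per 1,000 words); names and values are invented.
texts = {
    "conversation": {"finite_verbs": 180.0, "adjectives": 40.0, "genitive_nouns": 10.0},
    "academic": {"finite_verbs": 90.0, "adjectives": 95.0, "genitive_nouns": 60.0},
    "fiction": {"finite_verbs": 150.0, "adjectives": 60.0, "genitive_nouns": 25.0},
}

# Hypothetical loadings on one dimension: positive = dynamic pole, negative = static pole.
loadings = {"finite_verbs": 0.72, "adjectives": -0.41, "genitive_nouns": -0.65}

SALIENCE = 0.35  # Biber's (1988) cutoff; the Czech MDA may use a different value.


def standardize(texts, feature):
    """Return z-scores of one feature across all texts."""
    values = [rates[feature] for rates in texts.values()]
    mu, sd = mean(values), pstdev(values)
    return {name: (rates[feature] - mu) / sd for name, rates in texts.items()}


def dimension_scores(texts, loadings, salience=SALIENCE):
    """Add z-scores of salient positive features, subtract salient negative ones."""
    z = {feature: standardize(texts, feature) for feature in loadings}
    scores = {}
    for name in texts:
        score = 0.0
        for feature, loading in loadings.items():
            if loading >= salience:
                score += z[feature][name]
            elif loading <= -salience:
                score -= z[feature][name]
            # Features whose loading is below the cutoff do not contribute.
        scores[name] = score
    return scores


scores = dimension_scores(texts, loadings)
for name, score in sorted(scores.items(), key=lambda item: -item[1]):
    print(f"{name:12s} {score:+.2f}")
```

With these invented numbers, the conversation text lands toward the dynamic pole and the academic text toward the static pole, with fiction in between.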
Each dimension is associated with linguistic features that have different positive or negative loadings. Based on these MDA findings, we ask: is there a systematic relationship between the MDA's assignment of certain linguistic features to dimensions and the perception of these features by native speakers? To approach this question, we designed an experiment combining two techniques: a forced-choice task and a rating task on a 7-point Likert-type scale. The experimental study is guided by two subordinate research questions: (1) Do speakers' intuitions reveal a situation-dependent relevance of certain linguistic features that corresponds to the MDA-based feature distribution in the corpus (forced-choice task)? (2) Do speakers' intuitions reveal a preference for properties of utterances with certain linguistic features that corresponds to the interpretation of the dimensions formed by the MDA of the corpus (rating task)? While the second question can be investigated in a rating study without pretesting any experimental materials, the first raises an issue: how can we determine situations for the experiment so that they refer to the MDA-based dimensions? To answer this, we designed an exploratory pre-study on situations. Since each MDA dimension is labeled by two terms that can be placed at opposite ends of an imaginary scale (e.g. dynamic vs. static), we created descriptions of various situations and assigned each to one of the poles of these scales. Participants rated the situations on 7-point Likert-type scales representing the MDA-based dimensions. We then identified the best situation descriptions for the main study by comparing the median ratings across items and the variance across participants. A descriptive analysis showed that some situations evoked the intended poles of the first two dimensions better than others.
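The item-selection step of the pre-study can be sketched in Python as follows. The abstract states only that situations were compared by their median ratings and by the variance between participants; the concrete rule used here (keep items whose median lies on the intended side of the scale midpoint and whose sample variance does not exceed a threshold, then rank by distance from the midpoint) is our assumption, as are the situation descriptions, the ratings, the scale orientation (1 = static, 7 = dynamic), and the threshold of 1.5.

```python
from statistics import median, variance

MIDPOINT = 4  # centre of a 7-point scale
MAX_VARIANCE = 1.5  # hypothetical cutoff for disagreement between participants

# Hypothetical pre-study data: intended pole and one rating per participant
# on an assumed 1 (static) ... 7 (dynamic) scale.
items = {
    "phone chat with a friend": ("dynamic", [7, 6, 7, 6, 5, 7]),
    "writing a lab report": ("static", [1, 2, 1, 2, 2, 1]),
    "giving directions": ("dynamic", [5, 3, 6, 2, 4, 6]),
    "drafting a contract": ("static", [2, 1, 3, 2, 1, 2]),
    "describing a photo": ("static", [4, 5, 3, 5, 4, 4]),
}


def summarize(items):
    """Return (name, pole, median, distance towards intended pole, variance)."""
    rows = []
    for name, (pole, ratings) in items.items():
        med = median(ratings)
        # Positive when the median lies on the intended side of the midpoint.
        towards_pole = med - MIDPOINT if pole == "dynamic" else MIDPOINT - med
        rows.append((name, pole, med, towards_pole, variance(ratings)))
    return rows


def select(items, max_variance=MAX_VARIANCE):
    """Keep items that evoke their intended pole with low disagreement."""
    kept = [row for row in summarize(items) if row[3] > 0 and row[4] <= max_variance]
    # Strongest evocation of the pole first; ties broken by lower variance.
    return sorted(kept, key=lambda row: (-row[3], row[4]))


selected = [name for name, *_ in select(items)]
print(selected)
```

In this invented example, "giving directions" is dropped because participants disagree strongly (high variance), and "describing a photo" is dropped because its median sits exactly at the midpoint, so it does not evoke the intended static pole.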
The pre-study offers insight into how native speakers perceive the situational context of language use in terms of preparedness, subjectivity, and interactivity.

    References
    Biber, D. (1988). Variation across speech and writing. Cambridge UP.
    Cvrček, V., Laubeová, Z., Lukeš, D., Poukarová, P., Řehořková, A., & Zasina, A. J. (2020). Registry v češtině. NLN.
    Sharoff, S. (2021). Genre annotation for the web: Text-external and text-internal perspectives. Register Studies, 3(1), 1–32.
    Zasina, A. J., Lukeš, D., Komrsková, Z., Poukarová, P., & Řehořková, A. (2018). Koditex: A corpus of diversified texts. Institute of the Czech National Corpus, Faculty of Arts, Charles University. www.korpus.cz