Projects & Grants

Internal Grant Competition DGC

Quantitative analysis of texts of CzeSL-SGT corpus
Project ID: SGS06/FF/2022
Principal investigator: Mgr. Miroslav Kubát, Ph.D.
Period: 1/2022 - 12/2022
Provider: Specific University Research (Specifický VŠ výzkum)
Annotation: The project will focus on the quantitative analysis of the texts in the CzeSL-SGT corpus in order to obtain data on texts at individual language levels, to model the development of these texts, and to analyze the process of learning Czech as a foreign language. The corpus contains over 8,000 texts written by learners of Czech as a foreign language at all language levels. We will analyze the texts using the QuitaUP and UDPipe software, which allow us to compute various properties of the texts. In particular, we will be interested in the average length of tokens, the descriptivity of the text, verb distances, sentence length, lexical richness, the number of clauses per sentence, and the syntactic characteristics of dependency trees. This project is the first phase of research towards M. Hanušková's PhD thesis, which focuses on the analysis of texts written by non-native speakers of Czech. The methods are also used in the MA theses of M. Nogolová and M. Guńková. The results of the research will be presented at linguistic conferences and in scientific articles.