Programming for Corpus Linguistics with Python and Dataframes
- 114 pages
- 4 hours of reading
Focusing on advanced programming techniques, this Element equips intermediate or experienced programmers with algorithms tailored for Corpus Linguistic (CL) analysis using Python dataframes. It showcases methods for handling large datasets and demonstrates practical applications such as creating concordances, identifying collocates, and conducting key feature analysis. Additionally, it introduces algorithms for constructing dataframe corpora, incorporating tokenization, part-of-speech tagging, and lemmatization with spaCy, enabling innovative analyses beyond traditional corpus software capabilities.
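To make the idea of a dataframe corpus concrete, here is a minimal sketch of a concordance and a collocate count built on a one-token-per-row pandas dataframe. It is an illustration under stated assumptions, not the Element's own algorithms: the whitespace split stands in for the spaCy tokenization, tagging, and lemmatization the description mentions, and the function names `concordance` and `collocates`, the `token` column, and the `window` parameter are hypothetical.

```python
import pandas as pd

# Toy text; whitespace splitting is an assumption standing in for a real tokenizer.
text = "the cat sat on the mat and the dog sat on the rug"

# One row per token: the default integer index doubles as the token position.
corpus = pd.DataFrame({"token": text.split()})


def concordance(df: pd.DataFrame, node: str, window: int = 2) -> pd.DataFrame:
    """Return a keyword-in-context table for every occurrence of `node`."""
    tokens = df["token"].tolist()
    rows = []
    for pos in df.index[df["token"] == node]:
        pos = int(pos)
        rows.append({
            "position": pos,
            "left": " ".join(tokens[max(0, pos - window):pos]),
            "node": tokens[pos],
            "right": " ".join(tokens[pos + 1:pos + 1 + window]),
        })
    return pd.DataFrame(rows, columns=["position", "left", "node", "right"])


def collocates(df: pd.DataFrame, node: str, window: int = 2) -> pd.Series:
    """Count the tokens found within `window` positions of `node`."""
    tokens = df["token"].tolist()
    collected = []
    for pos in df.index[df["token"] == node]:
        pos = int(pos)
        collected.extend(tokens[max(0, pos - window):pos])      # left context
        collected.extend(tokens[pos + 1:pos + 1 + window])      # right context
    return pd.Series(collected, dtype="object").value_counts()


kwic = concordance(corpus, "sat")
coll = collocates(corpus, "sat")
print(kwic)
print(coll)
```

Keeping one token per row means that further annotation layers, such as part-of-speech tags or lemmas, could be added as extra columns and filtered with the same boolean indexing.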
