Cover -- Title Page -- Copyright Page -- Natural Language Processing for Corpus Linguistics -- Contents -- Accessing the Code Notebooks -- 1 Computational Linguistic Analysis -- 1.1 Scaling Up Corpus Linguistics -- 1.2 The Case Studies -- 1.3 Categorization Problems -- 1.4 Comparison Problems -- 1.5 Language in Vector Space -- 1.6 Ethics: Data Rights -- 2 Text Classification -- 2.1 Evaluating Classifiers -- 2.2 Representing Content -- 2.3 Representing Structure -- 2.4 Representing Context -- 2.5 Representing Sentiment -- 2.6 Logistic Regression -- 2.7 Feed-Forward Networks
2.8 Ethics: Implicit Bias -- 3 Text Similarity -- 3.1 Categorization and Cognition -- 3.2 Measuring Corpus Similarity -- 3.3 Measuring Document Similarity -- 3.4 Measuring Word Similarity Using Association -- 3.5 Measuring Word Similarity in Vector Space -- 3.6 Clustering by Similarity -- 3.7 Ethics: Model Discrimination -- 4 Validation and Visualization -- 4.1 Reporting Results for Political Speech Prediction -- 4.2 Ensuring Validity Using Box Plots -- 4.3 Unmasking Pseudonymous Authors Using Line Plots -- 4.4 Comparing Word Embeddings Using Heat Maps
4.5 Following Linguistic Diversity using Choropleth Maps -- 4.6 Ethics: Equal Access -- 5 Conclusions -- References -- Acknowledgments -- Data Availability Statement
Summary
Corpus analysis can be expanded and scaled up by incorporating computational methods from natural language processing. This Element shows how text classification and text similarity models can extend our ability to undertake corpus linguistics across very large corpora.
Notes
Online resource; title from digital title page (viewed on May 10, 2022)