E-book
Author Dunn, Jonathan, author

Title Natural language processing for corpus linguistics / Jonathan Dunn
Published Cambridge, United Kingdom ; New York, NY : Cambridge University Press, 2022
©2022

Description 1 online resource (84 pages) : illustrations, maps
Series Cambridge elements. Elements in corpus linguistics
Contents Cover -- Title Page -- Copyright Page -- Natural Language Processing for Corpus Linguistics -- Contents -- Accessing the Code Notebooks -- 1 Computational Linguistic Analysis -- 1.1 Scaling Up Corpus Linguistics -- 1.2 The Case Studies -- 1.3 Categorization Problems -- 1.4 Comparison Problems -- 1.5 Language in Vector Space -- 1.6 Ethics: Data Rights -- 2 Text Classification -- 2.1 Evaluating Classifiers -- 2.2 Representing Content -- 2.3 Representing Structure -- 2.4 Representing Context -- 2.5 Representing Sentiment -- 2.6 Logistic Regression -- 2.7 Feed-Forward Networks
2.8 Ethics: Implicit Bias -- 3 Text Similarity -- 3.1 Categorization and Cognition -- 3.2 Measuring Corpus Similarity -- 3.3 Measuring Document Similarity -- 3.4 Measuring Word Similarity Using Association -- 3.5 Measuring Word Similarity in Vector Space -- 3.6 Clustering by Similarity -- 3.7 Ethics: Model Discrimination -- 4 Validation and Visualization -- 4.1 Reporting Results for Political Speech Prediction -- 4.2 Ensuring Validity Using Box Plots -- 4.3 Unmasking Pseudonymous Authors Using Line Plots -- 4.4 Comparing Word Embeddings Using Heat Maps
4.5 Following Linguistic Diversity using Choropleth Maps -- 4.6 Ethics: Equal Access -- 5 Conclusions -- References -- Acknowledgments -- Data Availability Statement
Summary Corpus analysis can be expanded and scaled up by incorporating computational methods from natural language processing. This Element shows how text classification and text similarity models can extend our ability to undertake corpus linguistics across very large corpora.
Notes Online resource; title from digital title page (viewed on May 10, 2022)
Subject Corpora (Linguistics) -- Data processing
Natural language processing (Computer science)
Natural Language Processing
Form Electronic book
ISBN 9781009070447
1009070444