Author Silge, Julia, author

Title Text mining with R : a tidy approach / Julia Silge and David Robinson
Edition First edition
Published Sebastopol, CA : O'Reilly Media, 2017
Online access available from:
Safari O'Reilly books online


Description 1 online resource
Contents Copyright; Table of Contents; Preface; Outline; Topics This Book Does Not Cover; About This Book; Conventions Used in This Book; Using Code Examples; O'Reilly Safari; How to Contact Us; Acknowledgements
Chapter 1. The Tidy Text Format; Contrasting Tidy Text with Other Data Structures; The unnest_tokens Function; Tidying the Works of Jane Austen; The gutenbergr Package; Word Frequencies; Summary
Chapter 2. Sentiment Analysis with Tidy Data; The sentiments Dataset; Sentiment Analysis with Inner Join; Comparing the Three Sentiment Dictionaries; Most Common Positive and Negative Words; Wordclouds; Looking at Units Beyond Just Words; Summary
Chapter 3. Analyzing Word and Document Frequency: tf-idf; Term Frequency in Jane Austen's Novels; Zipf's Law; The bind_tf_idf Function; A Corpus of Physics Texts; Summary
Chapter 4. Relationships Between Words: N-grams and Correlations; Tokenizing by N-gram; Counting and Filtering N-grams; Analyzing Bigrams; Using Bigrams to Provide Context in Sentiment Analysis; Visualizing a Network of Bigrams with ggraph; Visualizing Bigrams in Other Texts; Counting and Correlating Pairs of Words with the widyr Package; Counting and Correlating Among Sections; Examining Pairwise Correlation; Summary
Chapter 5. Converting to and from Nontidy Formats; Tidying a Document-Term Matrix; Tidying DocumentTermMatrix Objects; Tidying dfm Objects; Casting Tidy Text Data into a Matrix; Tidying Corpus Objects with Metadata; Example: Mining Financial Articles; Summary
Chapter 6. Topic Modeling; Latent Dirichlet Allocation; Word-Topic Probabilities; Document-Topic Probabilities; Example: The Great Library Heist; LDA on Chapters; Per-Document Classification; By-Word Assignments: augment; Alternative LDA Implementations; Summary
Chapter 7. Case Study: Comparing Twitter Archives; Getting the Data and Distribution of Tweets; Word Frequencies; Comparing Word Usage; Changes in Word Use; Favorites and Retweets; Summary
Chapter 8. Case Study: Mining NASA Metadata; How Data Is Organized at NASA; Wrangling and Tidying the Data; Some Initial Simple Exploration; Word Co-ocurrences and Correlations; Networks of Description and Title Words; Networks of Keywords; Calculating tf-idf for the Description Fields; What Is tf-idf for the Description Field Words?; Connecting Description Fields to Keywords; Topic Modeling; Casting to a Document-Term Matrix; Ready for Topic Modeling; Interpreting the Topic Model; Connecting Topic Modeling with Keywords; Summary
Chapter 9. Case Study: Analyzing Usenet Text; Preprocessing; Preprocessing Text; Words in Newsgroups; Finding tf-idf Within Newsgroups; Topic Modeling; Sentiment Analysis; Sentiment Analysis by Word; Sentiment Analysis by Message; N-gram Analysis; Summary
Bibliography; Index; About the Authors; Colophon
Bibliography Includes bibliographical references and index
Notes Online resource; title from PDF title page (EBSCO, viewed June 20, 2017)
Subject Data mining.
R (Computer program language)
Form Electronic book
Author Robinson, David (Data scientist), author
ISBN 1491981601 (electronic bk.)
1491981628 (electronic bk.)
9781491981603 (electronic bk.)
9781491981627 (electronic bk.)