Description |
1 online resource |
Contents |
Copyright; Table of Contents; Preface; Outline; Topics This Book Does Not Cover; About This Book; Conventions Used in This Book; Using Code Examples; O'Reilly Safari; How to Contact Us; Acknowledgements; Chapter 1. The Tidy Text Format; Contrasting Tidy Text with Other Data Structures; The unnest_tokens Function; Tidying the Works of Jane Austen; The gutenbergr Package; Word Frequencies; Summary; Chapter 2. Sentiment Analysis with Tidy Data; The sentiments Dataset; Sentiment Analysis with Inner Join; Comparing the Three Sentiment Dictionaries; Most Common Positive and Negative Words |
|
Wordclouds; Looking at Units Beyond Just Words; Summary; Chapter 3. Analyzing Word and Document Frequency: tf-idf; Term Frequency in Jane Austen's Novels; Zipf's Law; The bind_tf_idf Function; A Corpus of Physics Texts; Summary; Chapter 4. Relationships Between Words: N-grams and Correlations; Tokenizing by N-gram; Counting and Filtering N-grams; Analyzing Bigrams; Using Bigrams to Provide Context in Sentiment Analysis; Visualizing a Network of Bigrams with ggraph; Visualizing Bigrams in Other Texts; Counting and Correlating Pairs of Words with the widyr Package |
|
Counting and Correlating Among Sections; Examining Pairwise Correlation; Summary; Chapter 5. Converting to and from Nontidy Formats; Tidying a Document-Term Matrix; Tidying DocumentTermMatrix Objects; Tidying dfm Objects; Casting Tidy Text Data into a Matrix; Tidying Corpus Objects with Metadata; Example: Mining Financial Articles; Summary; Chapter 6. Topic Modeling; Latent Dirichlet Allocation; Word-Topic Probabilities; Document-Topic Probabilities; Example: The Great Library Heist; LDA on Chapters; Per-Document Classification; By-Word Assignments: augment; Alternative LDA Implementations |
|
Summary; Chapter 7. Case Study: Comparing Twitter Archives; Getting the Data and Distribution of Tweets; Word Frequencies; Comparing Word Usage; Changes in Word Use; Favorites and Retweets; Summary; Chapter 8. Case Study: Mining NASA Metadata; How Data Is Organized at NASA; Wrangling and Tidying the Data; Some Initial Simple Exploration; Word Co-occurrences and Correlations; Networks of Description and Title Words; Networks of Keywords; Calculating tf-idf for the Description Fields; What Is tf-idf for the Description Field Words?; Connecting Description Fields to Keywords; Topic Modeling |
|
Casting to a Document-Term Matrix; Ready for Topic Modeling; Interpreting the Topic Model; Connecting Topic Modeling with Keywords; Summary; Chapter 9. Case Study: Analyzing Usenet Text; Preprocessing; Preprocessing Text; Words in Newsgroups; Finding tf-idf Within Newsgroups; Topic Modeling; Sentiment Analysis; Sentiment Analysis by Word; Sentiment Analysis by Message; N-gram Analysis; Summary; Bibliography; Index; About the Authors; Colophon |
Bibliography |
Includes bibliographical references and index |
Notes |
Online resource; title from PDF title page (EBSCO, viewed June 20, 2017) |
Subject |
Data mining.
|
|
R (Computer program language)
|
Form |
Electronic book
|
Author |
Robinson, David (Data scientist), author
|
ISBN |
1491981601 (electronic bk.) |
|
1491981628 (electronic bk.) |
|
1491981652 |
|
9781491981603 (electronic bk.) |
|
9781491981627 (electronic bk.) |
|
9781491981658 |
|