Description |
1 online resource (xx, 248 pages) : illustrations (some color) |
Series |
Synthesis lectures on human language technologies, 1947-4059 ; #13 |
|
Synthesis lectures on human language technologies ; lecture #13.
|
Contents |
Preface -- Acknowledgments -- 1. Representations and linguistic data -- Sequential prediction -- Sequence segmentation -- Word classes and sequence labeling -- Morphological disambiguation -- Chunking -- Syntax -- Semantics -- Coreference resolution -- Sentiment analysis -- Discourse -- Alignment -- Text-to-text transformations -- Types -- Why linguistic structure is a moving target -- Conclusion -- 2. Decoding: making predictions -- Definitions -- Five views of decoding -- Probabilistic graphical models -- Polytopes -- Parsing with grammars -- Graphs and hypergraphs -- Weighted logic programs -- Dynamic programming -- Shortest or minimum-cost path -- Semirings -- DP as logical deduction -- Solving DPs -- Approximate search -- Reranking and coarse-to-fine decoding -- Specialized graph algorithms -- Bipartite matchings -- Spanning trees -- Maximum flow and minimum cut -- Conclusion -- 3. Learning structure from annotated data -- Annotated data -- Generic formulation of learning -- Generative models -- Decoding rule -- Multinomial-based models -- Hidden Markov models -- Probabilistic context-free grammars -- Other generative multinomial-based models -- Maximum likelihood estimation by counting -- Maximum a posteriori estimation -- Alternative parameterization: log-linear models -- Comments -- Conditional models -- Globally normalized conditional log-linear models -- Logistic regression -- Conditional random fields -- Feature choice -- Maximum likelihood estimation -- Maximum a posteriori estimation -- Pseudolikelihood -- Toward discriminative learning -- Large margin methods -- Binary classification -- Perceptron -- Multi-class support vector machines -- Structural SVM -- Optimization -- Discussion -- Conclusion -- 4. Learning structure from incomplete data -- Unsupervised generative models -- Expectation maximization -- Word clustering -- Hard and soft k-means -- The structured case -- Hidden Markov models -- EM iterations improve likelihood -- Extensions and improvements -- Log-linear EM -- Contrastive estimation -- Bayesian unsupervised learning -- Empirical Bayes -- Latent Dirichlet allocation -- EM in the empirical Bayesian setting -- Inference -- Nonparametric Bayesian methods -- Discussion -- Hidden variable learning -- Generative models with hidden variables -- Conditional log-linear models with hidden variables -- Large margin methods with hidden variables -- Conclusion -- 5. Beyond decoding: inference -- Partition functions: summing over Y -- Summing by dynamic programming -- Other summing algorithms -- Feature expectations -- Reverse DPs -- Another interpretation of reverse values -- From reverse values to expectations -- Deriving the reverse DP -- Non-DP expectations -- Minimum Bayes risk decoding -- Cost-augmented decoding -- Decoding with hidden variables -- Conclusion -- A. Numerical optimization -- A.1. The hill-climbing analogy -- A.2. Coordinate ascent -- A.3. Gradient ascent -- Subgradient methods -- Stochastic gradient ascent -- A.4. Conjugate gradient and quasi-Newton methods -- Conjugate gradient -- Newton's method -- Limited memory BFGS -- A.5. "Aggressive" online learners -- A.6. Improved iterative scaling -- B. Experimentation -- B.1. Methodology -- Training, development, and testing -- Cross-validation -- Comparison without replication -- Oracles and upper bounds -- B.2. Hypothesis testing and related topics -- Terminology -- Standard error -- Beyond standard error for sample means -- Confidence intervals -- Hypothesis tests -- Closing notes -- C. Maximum entropy -- D. Locally normalized conditional models -- Probabilistic finite-state automata -- Maximum entropy Markov models -- Directional effects -- Comparison to globally normalized models -- Decoding -- Theory vs. practice -- Bibliography -- Author's biography -- Index |
Summary |
A major part of natural language processing now depends on the use of text data to build linguistic analyzers. We consider statistical, computational approaches to modeling linguistic structure and seek a unified view across many approaches and many kinds of linguistic structure. Assuming a basic understanding of natural language processing and/or machine learning, we aim to bridge the gap between the two fields. The focus is on approaches to decoding (i.e., carrying out linguistic structure prediction) and on supervised and unsupervised learning of models that predict discrete structures as outputs. We also survey natural language processing problems to which these methods are being applied, and we address related topics in probabilistic inference, optimization, and experimental methodology |
Analysis |
Natural language processing |
|
Computational linguistics |
|
Machine learning |
|
Decoding |
|
Supervised learning |
|
Unsupervised learning |
|
Structured prediction |
|
Probabilistic inference |
|
Statistical modeling |
Bibliography |
Includes bibliographical references (pages 209-240) and index |
Subject |
Natural language processing (Computer science)
|
|
Computational linguistics.
|
|
Linguistic analysis (Linguistics) -- Data processing
|
|
COMPUTERS -- Natural Language Processing.
|
Form |
Electronic book
|
ISBN |
9781608454068 |
|
1608454061 |
|
9783031021435 |
|
3031021436 |
|