Author Dror, Rotem, author.

Title Statistical significance testing for natural language processing / Rotem Dror, Lotem Peled-Cohen, Segev Shlomov, and Roi Reichart, Technion--Israel Institute of Technology
Published Cham, Switzerland : Springer, [2020]
©2020

Description 1 online resource (xvii, 98 pages) : illustrations
Series Synthesis lectures on human language technologies ; #45
Synthesis lectures on human language technologies ; lecture #45.
Contents Introduction -- Statistical Hypothesis Testing -- Hypothesis Testing -- P-Value in the World of NLP -- Statistical Significance Tests -- Preliminaries -- Parametric Tests -- Nonparametric Tests -- Statistical Significance in NLP -- NLP Tasks and Evaluation Measures -- Decision Tree for Significance Test Selection -- Matching Between Evaluation Measures and Statistical Significance Tests -- Significance with Large Test Samples -- Deep Significance -- Performance Variance in Deep Neural Network Models -- A Deep Neural Network Comparison Framework -- Existing Methods for Deep Neural Network Comparison -- Almost Stochastic Dominance -- Empirical Analysis -- Error Rate Analysis -- Summary -- Replicability Analysis -- The Multiplicity Problem -- A Multiple Hypothesis Testing Framework for Algorithm Comparison -- Replicability Analysis with Partial Conjunction Testing -- Replicability Analysis: Counting -- Replicability Analysis: Identification -- Synthetic Experiments -- Real-World Data Applications -- Applications and Data -- Statistical Significance Testing -- Results -- Results Summary and Overview -- Open Questions and Challenges
Summary Data-driven experimental analysis has become the main evaluation tool of Natural Language Processing (NLP) algorithms. In fact, in the last decade, it has become rare to see an NLP paper, particularly one that proposes a new algorithm, that does not include extensive experimental analysis, and the number of involved tasks, datasets, domains, and languages is constantly growing. This emphasis on empirical results highlights the role of statistical significance testing in NLP research: if we, as a community, rely on empirical evaluation to validate our hypotheses and reveal the correct language processing mechanisms, we had better be sure that our results are not coincidental. The goal of this book is to discuss the main aspects of statistical significance testing in NLP. Our guiding assumption throughout the book is that the basic question NLP researchers and engineers deal with is whether or not one algorithm can be considered better than another. This question drives the field forward, as it allows the constant progress of developing better technology for language processing challenges. In practice, researchers and engineers would like to draw the right conclusion from a limited set of experiments, and this conclusion should hold for other experiments with datasets they do not have at their disposal or that they cannot perform due to limited time and resources. The book hence discusses the opportunities and challenges in using statistical significance testing in NLP, from the point of view of experimental comparison between two algorithms. We cover topics such as choosing an appropriate significance test for the major NLP tasks, dealing with the unique aspects of significance testing for non-convex deep neural networks, accounting for a large number of comparisons between two NLP algorithms in a statistically valid manner (multiple hypothesis testing), and, finally, the unique challenges posed by the nature of the data and practices of the field.
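The core question stated in the summary, deciding whether one algorithm is better than another on a shared test set, is typically answered with a paired significance test of the kind the book surveys. As a minimal illustrative sketch (not code from the book; the function name, the 80%/72% accuracies, and the synthetic per-example scores are all hypothetical), a paired approximate randomization test over per-example correctness could look like this:

```python
import random

def paired_permutation_test(scores_a, scores_b, trials=2000, seed=0):
    """Approximate randomization test for the difference in mean score
    between two systems evaluated on the same test examples.

    Under the null hypothesis the systems are interchangeable, so each
    pair of per-example scores can be swapped at random without changing
    the distribution of the test statistic.
    """
    rng = random.Random(seed)
    n = len(scores_a)
    observed = abs(sum(scores_a) - sum(scores_b)) / n
    extreme = 0
    for _ in range(trials):
        diff = 0.0
        for a, b in zip(scores_a, scores_b):
            if rng.random() < 0.5:  # randomly swap this pair's system labels
                a, b = b, a
            diff += a - b
        if abs(diff) / n >= observed:
            extreme += 1
    # Add-one smoothing keeps the p-value estimate strictly positive.
    return (extreme + 1) / (trials + 1)

# Synthetic per-example scores (1 = correct, 0 = wrong) for two systems.
random.seed(42)
scores_a = [1 if random.random() < 0.80 else 0 for _ in range(500)]
scores_b = [1 if random.random() < 0.72 else 0 for _ in range(500)]
p = paired_permutation_test(scores_a, scores_b)
print(f"estimated p-value: {p:.4f}")
```

The test is nonparametric, so it makes no normality assumption about the evaluation measure; the book's decision tree for test selection covers when such a test is preferable to a parametric one.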
Bibliography Includes bibliographical references
Notes Online resource; title from digital title page (viewed on April 27, 2020)
Subject Natural language processing (Computer science)
Form Electronic book
Author Peled-Cohen, Lotem, author.
Shlomov, Segev, author.
Reichart, Roi, 1980- author.
ISBN 1681737965
9781681738307
1681738309
9781681737966
9783031021749
3031021746