E-book
Author Wachsmuth, Henning.

Title Text analysis pipelines : towards ad-hoc large scale text mining / Henning Wachsmuth
Published Cham : Springer, [2015]
©2015

Description 1 online resource (xx, 302 pages) : illustrations
Series LNCS sublibrary. SL 1, Theoretical computer science and general issues
Lecture notes in computer science, 1611-3349 ; 9383
Contents Intro -- Foreword -- Preface -- Symbols -- Contents -- 1 Introduction -- 1.1 Information Search in Times of Big Data -- 1.1.1 Text Mining to the Rescue -- 1.2 A Need for Efficient and Robust Text Analysis Pipelines -- 1.2.1 Basic Text Analysis Scenario -- 1.2.2 Shortcomings of Traditional Text Analysis Pipelines -- 1.2.3 Problems Approached in This Book -- 1.3 Towards Intelligent Pipeline Design and Execution -- 1.3.1 Central Research Question and Method -- 1.3.2 An Artificial Intelligence Approach -- 1.4 Contributions and Outline of This Book --
1.4.1 New Findings in Ad-Hoc Large-Scale Text Mining -- 1.4.2 Contributions to the Concerned Research Fields -- 1.4.3 Structure of the Remaining Chapters -- 1.4.4 Published Research Within This Book -- 2 Text Analysis Pipelines -- 2.1 Foundations of Text Mining -- 2.1.1 Text Mining -- 2.1.2 Information Retrieval -- 2.1.3 Natural Language Processing -- 2.1.4 Data Mining -- 2.1.5 Development and Evaluation -- 2.2 Text Analysis Tasks, Processes, and Pipelines -- 2.2.1 Text Analysis Tasks -- 2.2.2 Text Analysis Processes -- 2.2.3 Text Analysis Pipelines -- 2.3 Case Studies in This Book --
2.3.1 InfexBA: Information Extraction for Business Applications -- 2.3.2 ArguAna: Argumentation Analysis in Customer Opinions -- 2.3.3 Other Evaluated Text Analysis Tasks -- 2.4 State of the Art in Ad-Hoc Large-Scale Text Mining -- 2.4.1 Text Analysis Approaches -- 2.4.2 Design of Text Analysis Approaches -- 2.4.3 Efficiency of Text Analysis Approaches -- 2.4.4 Robustness of Text Analysis Approaches -- 3 Pipeline Design -- 3.1 Ideal Construction and Execution for Ad-Hoc Text Mining -- 3.1.1 The Optimality of Text Analysis Pipelines --
3.1.2 Paradigms of Designing Optimal Text Analysis Pipelines -- 3.1.3 Case Study of Ideal Construction and Execution -- 3.1.4 Discussion of Ideal Construction and Execution -- 3.2 A Process-Oriented View of Text Analysis -- 3.2.1 Text Analysis as an Annotation Task -- 3.2.2 Modeling the Information to Be Annotated -- 3.2.3 Modeling the Quality to Be Achieved by the Annotation -- 3.2.4 Modeling the Analysis to Be Performed for Annotation -- 3.2.5 Defining an Annotation Task Ontology -- 3.2.6 Discussion of the Process-Oriented View -- 3.3 Ad-Hoc Construction via Partial Order Planning --
3.3.1 Modeling Algorithm Selection as a Planning Problem -- 3.3.2 Selecting the Algorithms of a Partially Ordered Pipeline -- 3.3.3 Linearizing the Partially Ordered Pipeline -- 3.3.4 Properties of the Proposed Approach -- 3.3.5 An Expert System for Ad-Hoc Construction -- 3.3.6 Evaluation of Ad-Hoc Construction -- 3.3.7 Discussion of Ad-Hoc Construction -- 3.4 An Information-Oriented View of Text Analysis -- 3.4.1 Text Analysis as a Filtering Task -- 3.4.2 Defining the Relevance of Portions of Text -- 3.4.3 Specifying a Degree of Filtering for Each Relation Type
Summary This monograph proposes a comprehensive and fully automatic approach to designing text analysis pipelines for arbitrary information needs that are optimal in terms of run-time efficiency and that robustly mine relevant information from text of any kind. Based on state-of-the-art techniques from machine learning and other areas of artificial intelligence, novel pipeline construction and execution algorithms are developed and implemented in prototypical software. Formal analyses of the algorithms and extensive empirical experiments underline that the proposed approach represents an essential step towards the ad-hoc use of text mining in web search and big data analytics. Both web search and big data analytics aim to fulfill people's needs for information in an ad-hoc manner. The information sought is often hidden in large amounts of natural language text. Instead of simply returning links to potentially relevant texts, leading search and analytics engines have started to directly mine relevant information from the texts. To this end, they execute text analysis pipelines that may consist of several complex information-extraction and text-classification stages. Due to practical requirements of efficiency and robustness, however, the use of text mining has so far been limited to anticipated information needs that can be fulfilled with rather simple, manually constructed pipelines.
Notes English
Subject Data mining.
Text processing (Computer science)
Computer science.
Computers.
Logic, Symbolic and mathematical.
Database management.
Information retrieval.
Artificial intelligence.
Word processing operations.
Word processing.
Electronic data processing.
Electronic digital computers.
data processing.
computer science.
computers.
information retrieval.
artificial intelligence.
Mathematical theory of computation.
Databases.
User interface design & usability.
Computers -- Information Technology.
Computers -- Intelligence (AI) & Semantics.
Mathematics -- Logic.
Computers -- Database Management -- General.
Computers -- Machine Theory.
Computers -- System Administration -- Storage & Retrieval.
Word processing operations
Word processing
Electronic digital computers
Electronic data processing
Artificial intelligence
Computer science
Computers
Data mining
Database management
Information retrieval
Logic, Symbolic and mathematical
Text processing (Computer science)
Form Electronic book
ISBN 9783319257419
3319257412
3319257404
9783319257402