E-book
Author Gipp, Béla, author.

Title Citation-based plagiarism detection : detecting disguised and cross-language plagiarism using citation pattern analysis / Bela Gipp
Published Wiesbaden : Springer Vieweg, 2014

Description 1 online resource (xxvi, 350 pages) : illustrations
Contents Machine generated contents note: 1. Introduction -- 1.1. Problem Setting -- 1.2. Motivation -- 1.3. Research Objective -- 1.4. Thesis Outline -- 2. Plagiarism Detection -- 2.1. Academic Plagiarism -- 2.1.1. Definition -- 2.1.2. Forms of Academic Plagiarism -- 2.1.3. Prevalence of Plagiarism in the Academic Environment -- 2.2. Plagiarism Detection Approaches -- 2.2.1. Generic Detection Approach -- 2.2.2. Overview of Plagiarism Detection Approaches -- 2.2.3. Fingerprinting -- 2.2.4. Term Occurrence Analysis -- 2.2.5. Stylometry -- 2.2.6. Cross-Language Plagiarism Detection -- 2.3. Plagiarism Detection Systems -- 2.3.1. Evaluations of PDS -- 2.3.2. Technical Weaknesses of PDS -- 2.4. Conclusion -- 3. Citation-based Document Similarity -- 3.1. Terminology -- 3.1.1. Citation vs. Reference -- 3.1.2. Similarity vs. Relatedness -- 3.1.3. Dimensions of Similarity: Lexical, Semantic, Structural -- 3.2. Citation-based Similarity Measures -- 3.2.1. Direct Citation -- 3.2.2. Bibliographic Coupling -- 3.2.3. Co-citation -- 3.2.4. Amsler -- 3.2.5. Co-citation Proximity-based Methods -- 3.3. Conclusion -- 4. Citation-based Plagiarism Detection -- 4.1. Concept -- 4.1.1. Citing Behavior -- 4.2. Citation Characteristics Considered -- 4.2.1. Bibliographic Coupling Strength -- 4.2.2. Probability of Citation Co-occurrence -- 4.2.3. Order and Proximity of Citations -- 4.3. Challenges to Citation Pattern Identification -- 4.3.1. Unknown Pattern Constituents -- 4.3.2. Transpositions -- 4.3.3. Scaling -- 4.3.4. Insertions or Substitutions of Citations -- 4.4. Design of Citation-based Detection Algorithms -- 4.4.1. Bibliographic Coupling (BC) -- 4.4.2. Longest Common Citation Sequence (LCCS) -- 4.4.3. Greedy Citation Tiling (GCT) -- 4.4.4. Citation Chunking (Cit-Chunk) -- 4.5. Projected Suitability of CbPD Algorithms for Plagiarism Forms -- 4.6. Assessment of Identified Citation Patterns -- 4.6.1. Citing Frequency-Score (CF-Score) -- 4.6.2. Continuity-Score (Cont.-Score) -- 4.7. Conclusion -- 5. Prototype: CitePlag -- 5.1. Document Parser -- 5.2. Database -- 5.2.1. Consolidation of Reference Identifiers -- 5.3. Detector -- 5.4. Frontend -- 5.5. Conclusion -- 6. Quantitative and Qualitative Evaluation -- 6.1. Methodology -- 6.1.1. Test Collection Requirements -- 6.1.2. Test Collection Challenges -- 6.1.3. GuttenPlag Wiki -- 6.1.4. VroniPlag Wiki -- 6.1.5. PubMed Central OAS -- 6.1.6. Summary and Comparison of Test Collections -- 6.2. Evaluation using GuttenPlag Wiki -- 6.3. Evaluation using VroniPlag Wiki -- 6.3.1. Evaluation: Random Sample of Sources -- 6.3.2. Evaluation: Translated Plagiarism -- 6.3.3. Evaluation: Plagiarism Case Heun -- 6.3.4. Conclusion VroniPlag Wiki -- 6.4. Evaluation using PubMed Central OAS -- 6.4.1. Methodology -- 6.4.2. Results -- 6.4.3. Conclusion of PMC OAS Evaluation -- 6.5. Conclusion of Evaluations -- 7. Summary & Future Work -- 7.1. Summary -- 7.2. Contributions -- 7.3. Future Work -- 7.3.1. General Research Need -- 7.3.2. Improvements to Detection Accuracy -- 7.3.3. Additional Applications -- 7.3.4. Further Evaluations -- References -- Appendix -- A. Preliminary PMC OAS Corpus Analysis -- A.1. Bibliographic Coupling -- A.2. Longest Common Citation Sequence -- A.3. Greedy Citation Tiling -- A.4. Citation Chunking -- A.5. Character-based PDS Sherlock -- A.6. Character-based PDS Encoplot -- B. Technical Details of the CitePlag Prototype -- B.1. Sentence-Word-Tagger (SW-Tagger) -- B.2. Data Parser -- B.3. Consolidation of Reference Identifiers -- B.4. Database Documentation -- C. Data and Source-code Downloads -- D. Related Publications -- E. Patent Application -- F. User Study Feedback -- G. Reactions of Contacted Authors -- H. Empirical Studies on Plagiarism Frequencies -- I. Studies on Citation-based Similarity Measures -- J. Overview of Selected PDS
Summary Plagiarism is a problem with far-reaching consequences for the sciences. However, even today's best software-based systems can only reliably identify copy & paste plagiarism. Disguised plagiarism forms, including paraphrased text, cross-language plagiarism, and structural and idea plagiarism, often remain undetected. This weakness of current systems results in a large percentage of scientific plagiarism going undetected. Bela Gipp provides an overview of the state of the art in plagiarism detection and an analysis of why these approaches fail to detect disguised plagiarism forms. The author proposes Citation-based Plagiarism Detection to address this shortcoming. Unlike character-based approaches, this approach does not rely on text comparisons alone, but analyzes citation patterns within documents to form a language-independent "semantic fingerprint" for similarity assessment. The practicability of Citation-based Plagiarism Detection was demonstrated by its ability to identify plagiarism in scientific publications that had so far not been detectable by machine. Contents: current state of plagiarism detection approaches and systems; Citation-based Plagiarism Detection. Target groups: readers interested in the problem of plagiarism in the sciences; faculty and students from all disciplines, but especially computer science. The author: Bela Gipp is a postdoctoral researcher at the University of California, Berkeley.
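To make the idea of comparing citation patterns concrete, here is a minimal sketch (not the author's implementation) of two measures named in the contents note: bibliographic coupling strength, counted here as the number of distinct references two documents share, and a longest common citation sequence, computed here as a standard longest common subsequence over the order of in-text citations. It assumes citations have already been extracted and consolidated to shared identifiers; the identifiers ("R1" etc.) and function names are hypothetical.

```python
from typing import Sequence


def bibliographic_coupling(a: Sequence[str], b: Sequence[str]) -> int:
    """Number of distinct references cited by both documents."""
    return len(set(a) & set(b))


def longest_common_citation_sequence(a: Sequence[str], b: Sequence[str]) -> list[str]:
    """Longest common subsequence of two in-text citation sequences.

    Standard O(len(a) * len(b)) dynamic programming. Matches need not be
    contiguous, so citations inserted between shared ones are skipped.
    """
    n, m = len(a), len(b)
    # dp[i][j] holds the LCS length of the suffixes a[i:] and b[j:].
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if a[i] == b[j]:
                dp[i][j] = dp[i + 1][j + 1] + 1
            else:
                dp[i][j] = max(dp[i + 1][j], dp[i][j + 1])
    # Walk the table forward to recover one optimal sequence.
    seq: list[str] = []
    i, j = 0, 0
    while i < n and j < m:
        if a[i] == b[j]:
            seq.append(a[i])
            i += 1
            j += 1
        elif dp[i + 1][j] >= dp[i][j + 1]:
            i += 1
        else:
            j += 1
    return seq


if __name__ == "__main__":
    # Hypothetical citation sequences, in order of appearance in the text.
    source = ["R1", "R2", "R3", "R4", "R5"]
    suspect = ["R2", "R9", "R3", "R5", "R1"]
    print(bibliographic_coupling(source, suspect))            # 4
    print(longest_common_citation_sequence(source, suspect))  # ['R2', 'R3', 'R5']
```

Because the sequence must preserve order, the relocated citation R1 counts toward coupling but not toward the common sequence; transpositions are among the challenges listed in the contents note, and the other algorithms named there are not sketched here.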
Notes "Dissertation Otto-von-Guericke University Magdeburg, Germany, 2013."
Bibliography Includes bibliographical references and index
Notes Online resource; title from PDF title page (SpringerLink, viewed July 7, 2014)
Subject Optical pattern recognition.
Information retrieval.
Plagiarism.
Bibliographical citations.
COMPUTERS -- General.
Form Electronic book
ISBN 9783658063948
3658063947
3658063939
9783658063931