Intro; Preface; Organization; Contents; Data Acquisition and Annotation; Effects of Training Data Size and Class Imbalance on the Performance of Classifiers; 1 Introduction; 2 Classifiers and Performance Criterion; 2.1 Classifiers; 2.2 Performance Metric; 2.3 Datasets; 2.4 Experimental Design; 2.5 Predictive Performance and Discussion; 2.6 Ranks of Classifiers and Discussion; 3 Conclusions and Future Work; References; SentiRusColl: Russian Collocation Lexicon for Sentiment Analysis; 1 Introduction; 2 Previous Work; 3 Text Corpora; 4 Creating the Sentiment Collocation Lexicon
5 Results and Discussion; 6 Conclusion; References; An Approach to Inter-annotator Agreement Evaluation for the Named Entities Annotation Task at OpenCorpora; 1 Introduction; 1.1 Named Entities Annotation in OpenCorpora; 1.2 The Purpose of Measuring the IAA for the NEA Task; 2 Overview of Approaches Towards IAA Evaluation; 3 A Procedure for IAA Evaluation for the NEA Case; 4 Approbation; 4.1 Setting; 4.2 Evaluation of IAA for Full Corpus; 4.3 An Experiment with Motivated Volunteers; 4.4 Detailed Evaluation of IAA for Subcorpora by Tags Groups; 4.5 Evaluation of IAA for a Corpus of Recipes
5 Conclusion; References; Human-Computer Interaction; Soft Estimates of User Protection from Social Engineering Attacks; 1 Introduction; 2 Related Works; 3 Models of the "Critical Documents-Information System-User-Malefactor" Complex; 4 Approach for User Security Estimation; 5 Example; 6 Discussion; 7 Conclusion; References; Retrieval of Visually Shared News; 1 Introduction; 2 Culturally Distinctive Keywords; 2.1 Annotation; 2.2 Retrieval Analysis; 2.3 Anomalies Due to Midsummer Coverage by Sputnik; 3 Identification of Visually Shared Mass Media Content; 3.1 The Image Dataset; 3.2 The Model
3.3 Model Training and Hyper-parameter Tuning; 3.4 The Final Deduplicated Dataset; 3.5 Evaluation; 4 Visual Shares; 5 Conclusion; References; Statistical Natural Language Processing; Comparative Analysis of Scientific Papers Collections via Topic Modeling and Co-authorship Networks; 1 Introduction; 2 Methods; 3 Experimental Part; 4 Conclusions; References; Experimental Comparison of Unsupervised Approaches in the Task of Separating Specializations Within Professions in Job Vacancies; 1 Introduction; 2 Related Work; 3 Dataset Description; 4 Experimental Setup; 4.1 Vector Space Models
4.2 Clustering Methods; 4.3 Evaluation; 5 Results and Discussion; 6 Conclusion; References; Usage of HMM-Based Speech Recognition Methods for Automated Determination of a Similarity Level Between Languages; 1 Introduction; 2 Data; 3 Experiment; 4 Euclidean Metrics and Its Improvements; 5 Kullback-Leibler Divergence; 6 Discussion and Conclusions; References; Prosodic Boundaries Prediction in Russian Using Morphological and Syntactic Features; 1 Introduction; 2 Experimental Material; 3 Features; 4 Modeling; 4.1 Rule Governed Method; 4.2 Conditional Random Fields; 4.3 BiLSTM