Title Analyzing network data in biology and medicine : an interdisciplinary textbook for biological, medical and computational scientists / edited by Nataša Pržulj, University College London

Published Cambridge, United Kingdom ; New York, NY : Cambridge University Press, 2019

Description 1 online resource

Contents Cover -- Half Title -- Title Page -- Copyright Information -- Dedication -- Contents -- Preface -- 1 From Genetic Data to Medicine: From DNA Samples to Disease Risk Prediction in Personalized Genetic Tests -- 1.1 Background -- 1.2 Genetic Tests in Healthcare -- 1.2.1 Types of Genetic Tests -- 1.2.2 Genetic Tests Providers -- 1.3 Common Technologies and Algorithms for SNPs Identification -- 1.3.1 Microarrays -- 1.3.1.1 Affymetrix SNP Microarrays -- 1.3.1.2 Illumina SNP BeadChips -- 1.3.1.3 Algorithms for Genotyping -- 1.3.2 Next Generation Sequencing -- 1.3.2.1 The Illumina NGS Platform -- 1.3.2.2 Algorithms for SNP Calling and Genotyping -- 1.3.3 Pros and Cons of Microarrays and NGS -- 1.4 Algorithms to Predict SNP-Disease Association -- 1.4.1 Single-SNP Association Studies -- 1.4.2 Multi-SNP Association Studies -- 1.4.2.1 Logistic Regression Models -- 1.4.2.2 Support Vector Machines (SVMs) -- 1.4.2.3 Random Forests (RFs) -- 1.4.2.4 Bayesian Networks (BNs) -- 1.4.3 Predictive genetic risk models in DTC services -- 1.5 Perspectives and Recommendations -- 1.6 Exercises -- 1.7 Acknowledgments -- References -- 2 Epigenetic Data and Disease -- 2.1 Background -- 2.2 DNA Methylation and its Role in Genome Regulation -- 2.2.1 DNA Demethylation and its Role in Genomic Profiles -- 2.2.2 Different Experimental Strategies for the DNA Methylation Analysis -- 2.2.3 Processing and Analysis Methods and Tools for DNA Methylation Data from Bisulfite Based Assays -- 2.2.3.1 Bisulfite Conversion -- 2.2.3.2 Methylation Microarrays -- 2.3 The Post-Translational Modifications of Histones -- 2.3.1 Experimental Evaluation of Post-Translational Modifications of Histones -- 2.3.2 ChIP-seq Data Analysis -- 2.4 Higher Order Chromatin Organization -- 2.4.1 Technologies to Study Chromatin Conformation -- 2.4.1.1 The 3C, 4C, 5C, and ChIA-PET Technologies

2.4.1.2 The Hi-C Technology -- 2.4.2 Bioinformatic Methods of Hi-C Analysis -- 2.4.3 Mapping and Filtering -- 2.4.4 Normalization -- 2.4.5 Statistical Analysis -- 2.4.6 Visualization of Hi-C Data -- 2.4.7 Topological Associated Domain Identification from Hi-C Data -- 2.5 Long Non-Coding RNAs, Novel Molecular Regulators -- 2.5.1 The Implications of lncRNAs in Precision Medicine -- 2.5.2 Bioinformatic Tools for lncRNAs Analysis -- 2.5.3 Analysis of Annotated lncRNAs -- 2.5.4 Analysis of Unannotated lncRNAs -- 2.6 Epigenetic Databases -- 2.6.1 Encyclopedia of DNA Elements in the Human Genome -- 2.6.2 The Roadmap Epigenomics Project -- 2.6.3 Functional Annotation of the Mammalian Genome -- 2.6.4 BLUEPRINT Epigenome -- 2.6.5 The International Human Epigenome Consortium -- 2.7 Conclusion and Final Remarks -- 2.8 Exercises -- 2.9 Acknowledgements -- References -- 3 Introduction to Graph and Network Theory -- 3.1 Motivation -- 3.2 Background -- 3.2.1 Mathematical Background -- 3.2.1.1 Matrix Operations -- 3.2.1.2 Special Matrices -- 3.2.1.3 Sets of Vectors -- 3.2.1.4 Matrix Spectral Decomposition -- 3.2.2 Computational Complexity -- 3.3 Graph Theory -- 3.3.1 Definitions -- 3.3.2 Degree and Neighborhood -- 3.3.3 Subgraphs and Connectedness -- 3.3.4 Types of Graphs -- 3.3.5 Classic Graph Theory Problems -- 3.3.5.1 Eulerian Circuit -- 3.3.5.2 Hamiltonian Paths -- 3.3.5.3 Matching -- 3.3.6 Data Structures and Search Algorithms for Graphs -- 3.3.6.1 Data Structures -- 3.3.6.2 Graph Search Algorithms -- 3.3.7 Spectral Graph Theory -- 3.4 Network Measures -- 3.4.1 Network Properties -- 3.4.2 Network Models -- 3.4.2.1 Erdős-Rényi Random Graphs -- 3.4.2.2 Scale-free Networks -- 3.4.2.3 Geometric Networks -- 3.4.2.4 Stickiness Index Based Networks -- 3.5 Summary -- 3.6 Exercises -- 3.7 Acknowledgments -- References

4 Protein-Protein Interaction Data, their Quality, and Major Public Databases -- 4.1 Protein-Protein Interactions: Introduction and Motivation -- 4.2 Experimental Detection and Computational Prediction of PPIs -- 4.2.1 Experimental Methods -- 4.2.2 Computational Methods -- 4.2.3 Errors and Challenges -- 4.2.3.1 Error Rates -- 4.2.3.2 Biases -- 4.3 Challenges of Data Integration -- 4.3.1 Heterogeneity in Biological Data -- 4.4 Protein-Protein Interaction Databases -- 4.4.1 Curated Databases -- 4.4.2 Prediction Databases -- 4.4.2.1 Integrated Databases -- 4.4.3 PPI Context Annotation -- 4.4.3.1 Subcellular Localization -- 4.4.3.2 Tissue Annotation -- 4.4.3.3 Disease -- 4.5 Protein-Protein Interaction Networks and their Properties -- 4.5.1 Short Introduction to Networks -- 4.5.2 Network Construction -- 4.5.3 Properties of PPI Networks -- 4.5.3.1 Degree and Betweenness Centrality -- 4.5.3.2 Articulation Points -- 4.5.3.3 Graph Density -- 4.5.3.4 Distance -- 4.5.3.5 Clustering Coefficient -- 4.5.3.6 Cliques -- 4.5.3.7 Other Properties -- 4.5.4 PPI Network Annotations and Visualization -- 4.5.4.1 Qualitative Annotations -- 4.5.4.2 Quantitative Annotations -- 4.6 Applications of PPI Network Analysis -- 4.6.1 Identification of Disease-associated Genes -- 4.6.2 Improvement of Gene Signatures -- 4.6.3 Prediction of Drug Targets -- 4.6.4 Annotation of Protein Functions -- 4.7 Integrative Computational Biology Workflow -- 4.8 Closing Remarks -- 4.9 Exercises -- 4.10 Acknowledgments -- References -- 5 Graphlets in Network Science and Computational Biology -- 5.1 Introduction -- 5.2 Graphlets and Graphlet-based Measures of Network Topology -- 5.2.1 Graphlets -- 5.2.1.1 Original Graphlets -- 5.2.1.2 Directed Graphlets -- 5.2.1.3 Dynamic Graphlets -- 5.2.1.4 Heterogeneous Graphlets -- 5.2.1.5 Ordered Graphlets

5.2.2 Graphlet-based Measures of Topological Position of Individual Nodes, Edges, or Non-edges -- 5.2.2.1 Graphlet Orbits -- 5.2.2.2 Graphlet Degree Vector (GDV) -- 5.2.2.3 GDV-similarity -- 5.2.2.4 GDV-centrality -- 5.2.3 Graphlet-based Measures of Entire Network Topology -- 5.2.3.1 Graphlet Frequency Vector (GFV) -- 5.2.3.2 Graphlet Degree Distributions (GDDs) -- 5.2.3.3 Graphlet Correlation Matrix (GCM) -- 5.3 Computational Approaches Based on the Graphlet Measures -- 5.3.1 Clustering of Nodes or Edges in a Network -- 5.3.2 Dominating Set of a Network -- 5.3.3 Link Prediction -- 5.3.4 Network Comparison -- 5.3.4.1 Alignment-free Network Comparison -- 5.3.4.2 Alignment-based Network Comparison: Network Alignment (NA) -- 5.4 Biological Applications of the Graphlet Measures -- 5.4.1 Protein Function Prediction -- 5.4.2 Aging -- 5.4.2.1 Static Analysis of the Human PPI Network in the Context of Aging -- 5.4.2.2 Dynamic Analysis of the Human PPI Network at Different Ages -- 5.4.2.3 Transfer of Aging-related Knowledge from Model Species to Human via Network Alignment (NA) -- 5.4.3 Disease -- 5.4.3.1 Cancer -- 5.4.3.2 Pathogenicity -- 5.4.4 Health-related Applications Beyond Computational Biology: Social Networks -- 5.5 Graphlet-based Software Tools -- 5.5.1 General-purpose Software for Graphlet Counting -- 5.5.2 Task-specific Graphlet-based Software -- 5.6 Exercises -- 5.7 Acknowledgment -- References -- 6 Unsupervised Learning: Cluster Analysis -- 6.1 Formal Definitions -- 6.1.1 Clustering -- 6.1.2 Data Formats -- 6.2 Cluster Analysis -- 6.3 Preprocessing -- 6.3.1 Normalization and Standardization -- 6.3.2 Feature Selection -- 6.3.3 Principal Component Analysis -- 6.4 Proximity Calculation -- 6.4.1 Continuous Variables -- 6.4.1.1 Euclidean Distance -- 6.4.1.2 Minkowski Distance -- 6.4.1.3 Correlation -- 6.4.2 Categorical Values

6.4.2.1 Boolean Variables -- 6.4.2.2 General Categorical Variables -- 6.4.3 Practical Issues -- 6.5 Clustering Algorithms -- 6.5.1 Cluster Approaches -- 6.5.2 k-means -- 6.5.2.1 Algorithm -- 6.5.2.2 Initialization Strategies -- 6.5.2.3 Other Variants -- 6.5.3 Hierarchical Clustering -- 6.5.3.1 Algorithm -- 6.5.3.2 Linkage Functions -- 6.5.3.3 Discussion -- 6.5.4 DBSCAN -- 6.5.4.1 Algorithm -- 6.5.4.2 Discussions -- 6.5.5 Transitivity Clustering -- 6.5.5.1 Transitive Graph Projection Problem -- 6.5.5.2 Heuristic Solution -- 6.5.6 Discussion -- 6.6 Cluster Evaluation -- 6.6.1 External Cluster Evaluation -- 6.6.2 Internal Cluster Evaluation -- 6.6.3 Optimization Strategies -- 6.6.3.1 k is not a Parameter -- 6.6.3.2 k as a Parameter -- 6.6.3.3 The Gap Statistic -- 6.7 Final Remarks -- 6.8 Exercises -- References -- 7 Machine Learning for Data Integration in Cancer Precision Medicine: Matrix Factorization Approaches -- 7.1 Introduction -- 7.2 Precision Medicine -- 7.3 The Different Types of Data Integration Methods -- 7.3.1 Homogeneous and Heterogeneous Data Integration -- 7.3.2 Early, Intermediate, and Late Integration -- 7.3.3 Supervised, Unsupervised, and Semi-supervised Data Integration -- 7.4 Summary of Data-Integration Methods -- 7.4.1 Network-based Data Integration -- 7.4.2 Bayesian Approaches -- 7.4.3 Kernel-based Methods -- 7.5 Homogeneous Data Integration with Non-Negative Matrix Factorization -- 7.5.1 Principles and Properties -- 7.5.2 Solving NMF -- 7.5.3 Homogeneous Data Integration with NMF -- 7.5.3.1 Simultaneous Decomposition -- 7.5.3.2 Graph Regularization -- 7.6 Heterogeneous Data Integration with Non-Negative Matrix Tri-Factorization -- 7.6.1 Principle and Properties -- 7.6.2 Solving NMTF -- 7.6.2.1 Optimizing Non-Linear Constrained Continuous Optimization Problems -- 7.6.2.2 Applying KKT Conditions to NMTF

Summary The increased and widespread availability of large network data resources in recent years has resulted in a growing need for effective methods for their analysis. The challenge is to detect patterns that provide a better understanding of the data. However, this is not a straightforward task because of the size of the data sets and the computer power required for the analysis. The solution is to devise methods for approximately answering the questions posed, and these methods will vary depending on the data sets under scrutiny. This cutting-edge text introduces biological concepts and biotechnologies producing the data, graph and network theory, cluster analysis and machine learning, before discussing the thought processes and creativity involved in the analysis of large-scale biological and medical data sets, using a wide range of real-life examples. Bringing together leading experts, this text provides an ideal introduction to and insight into the interdisciplinary field of network data analysis in biomedicine.

Bibliography Includes bibliographical references

Notes Print version record

Subject Medical informatics -- Data processing

Bioinformatics.

Medical Informatics

Computational Biology

Bioinformatics

Form Electronic book

Author Pržulj, Nataša, editor

ISBN 9781108377706

110837770X
