Quantitative Methods

New submissions

New submissions for Mon, 13 Jul 20

[1]  arXiv:2007.05050 [pdf]
Title: PepSIRF: a flexible and comprehensive tool for the analysis of data from highly-multiplexed DNA-barcoded peptide assays
Comments: 5 pages, 1 figure
Subjects: Quantitative Methods (q-bio.QM); Genomics (q-bio.GN)

By coupling peptides with DNA tags (i.e., 'barcodes'), it is now possible to harness high-throughput sequencing (HTS) technologies to enable highly multiplexed peptide-based assays, which have a variety of potential applications including broad characterization of the epitopes recognized by antibodies. While the processing of HTS data, in general, is already well supported, there are very few software tools that have been developed for working with data generated in these highly-multiplexed peptide assays. In order to fill this gap, we present PepSIRF (Peptide-based Serological Immune Response Framework), which is a flexible and comprehensive software package designed specifically for the analysis of HTS data from highly-multiplexed peptide-based assays.

[2]  arXiv:2007.05401 [pdf, other]
Title: Learning Heat Diffusion for Network Alignment
Comments: 4 Pages, 2 figures
Journal-ref: Presented at the ICML 2020 Workshop on Computational Biology (WCB)
Subjects: Quantitative Methods (q-bio.QM); Physics and Society (physics.soc-ph); Molecular Networks (q-bio.MN)

Networks are abundant in the life sciences. Outstanding challenges include how to characterize similarities between networks and, by extension, how to integrate information across networks. Network alignment therefore remains a core algorithmic problem. Here, we present a novel learning algorithm called evolutionary heat diffusion-based network alignment (EDNA) to address this challenge. EDNA uses the diffusion signal as a proxy for computing node similarities between networks. Comparing EDNA with state-of-the-art algorithms on a popular protein-protein interaction network dataset, using four different evaluation metrics, we achieve (i) the most accurate alignments, (ii) increased robustness against noise, and (iii) superior scaling capacity. The EDNA algorithm is versatile in that other available network alignments/embeddings can be used as an initial baseline alignment; EDNA then works as a wrapper around them by running the evolutionary diffusion on top of them. In conclusion, EDNA outperforms state-of-the-art methods for network alignment, thus setting the stage for large-scale comparison and integration of networks.
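
The idea of using a diffusion signal as a proxy for node similarity can be illustrated with a minimal sketch. This is not the EDNA algorithm, whose details the abstract does not give; it only assumes the generic graph heat equation du/dt = -L u, integrated with explicit Euler steps, and compares how much heat each node retains at a few time points. The function names (`diffuse`, `heat_signature`, `best_match`), the time points, and the adjacency-list representation are all hypothetical choices made for illustration.

```python
def diffuse(adj, source, t, steps=200):
    """Approximate the heat distribution at time t from a unit source.

    adj is an undirected graph as a list of neighbour lists.
    Uses explicit Euler steps on du/dt = -L u (assumes t/steps is
    small relative to 1/max_degree so the scheme stays stable).
    """
    u = [0.0] * len(adj)
    u[source] = 1.0
    dt = t / steps
    for _ in range(steps):
        # Net heat flow into v is the sum of (u_w - u_v) over neighbours w.
        u = [u[v] + dt * sum(u[w] - u[v] for w in adj[v])
             for v in range(len(adj))]
    return u


def heat_signature(adj, node, times=(0.1, 0.5, 1.0)):
    """Heat retained at `node` after diffusing from it (illustrative time points)."""
    return [diffuse(adj, node, t)[node] for t in times]


def best_match(adj_a, adj_b, node_a):
    """Return the node of graph B whose heat signature is closest to node_a's."""
    sig_a = heat_signature(adj_a, node_a)

    def dist(b):
        sig_b = heat_signature(adj_b, b)
        return sum((x - y) ** 2 for x, y in zip(sig_a, sig_b))

    return min(range(len(adj_b)), key=dist)


if __name__ == "__main__":
    # Two copies of a 4-node star graph with different node labels.
    star_a = [[1, 2, 3], [0], [0], [0]]
    star_b = [[2], [2], [0, 1, 3], [2]]
    print(best_match(star_a, star_b, 0))  # index of the hub in star_b
```

In this toy case the hub loses heat faster than the leaves, so its signature separates it from the other nodes. On real networks a signature this short would leave many nodes indistinguishable, which is why it should be read only as an illustration of the diffusion-as-similarity idea.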

Cross-lists for Mon, 13 Jul 20

[3]  arXiv:2007.05055 (cross-list from cs.LG) [pdf]
Title: SARS-CoV-2 virus RNA sequence classification and geographical analysis with convolutional neural networks approach
Authors: Selcuk Yazar
Subjects: Machine Learning (cs.LG); Quantitative Methods (q-bio.QM)

Covid-19 infection, which has spread worldwide since December 2019 and is still active, has caused more than 250 thousand deaths to date. Research on this subject has focused on analyzing the genetic structure of the virus, developing vaccines, the course of the disease, and its source. In this study, RNA sequences belonging to the SARS-CoV-2 virus are transformed into gene motifs with two basic image processing algorithms and classified with convolutional neural network (CNN) models. The CNN models achieved an average Area Under Curve (AUC) value of 98% on RNA sequences classified as Asia, Europe, America, and Oceania. The resulting artificial neural network model was used for phylogenetic analysis of the variant of the virus isolated in Turkey. The classification results were compared with gene alignment values in the GISAID database, where SARS-CoV-2 virus records from all over the world are kept. Our experimental results suggest that detecting the geographic distribution of the virus with CNN models might serve as an efficient method.

[4]  arXiv:2007.05114 (cross-list from stat.ME) [pdf, other]
Title: Analyzing the Effects of Observation Function Selection in Ensemble Kalman Filtering for Epidemic Models
Comments: 25 pages, 11 figures
Subjects: Methodology (stat.ME); Quantitative Methods (q-bio.QM); Applications (stat.AP); Computation (stat.CO)

The Ensemble Kalman Filter (EnKF) is a Bayesian filtering algorithm used to estimate unknown model states and parameters for nonlinear systems. An important component of the EnKF is the observation function, which connects the unknown system variables with the observed data. These functions take different forms based on modeling assumptions with respect to the available data and relevant system parameters. The goal of this research is to analyze the effects of observation function selection in the EnKF in the setting of epidemic modeling, where a variety of observation functions are used in the literature. In particular, four observation functions of different forms and various levels of complexity are examined in connection with the classic Susceptible-Infectious-Recovered (SIR) model. Results demonstrate that the corresponding EnKF estimates depend strongly on choosing an observation function that interprets the available data well. This holds in several filtering scenarios, including state estimation with known parameters, and combined state and parameter estimation with both constant and time-varying parameters. Numerical experiments further illustrate how modifying the observation noise covariance matrix in the filter can help to account for uncertainty in the observation function in certain cases.
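
To make the role of the observation function concrete, the following is a minimal sketch of one EnKF analysis step for an SIR state, assuming a scalar observation and the stochastic (perturbed-observation) variant of the filter. It is not the paper's code. The names `sir_step`, `observe_infectious`, and `enkf_update`, the forward-Euler discretization, and the choice of observing the infectious fraction directly are all assumptions made for illustration; the abstract does not say which four observation functions were studied.

```python
import random


def sir_step(state, beta, gamma, dt):
    """One forward-Euler step of the SIR model on population fractions (S, I, R)."""
    s, i, r = state
    new_inf = beta * s * i * dt   # S -> I transitions
    new_rec = gamma * i * dt      # I -> R transitions
    return (s - new_inf, i + new_inf - new_rec, r + new_rec)


def observe_infectious(state):
    """Hypothetical observation function: the data are the infectious fraction."""
    return state[1]


def enkf_update(ensemble, y_obs, obs_fn, obs_var, rng):
    """Perturbed-observation EnKF analysis step for a scalar observation."""
    n, dim = len(ensemble), len(ensemble[0])
    preds = [obs_fn(x) for x in ensemble]
    x_mean = [sum(x[k] for x in ensemble) / n for k in range(dim)]
    p_mean = sum(preds) / n
    # Sample cross-covariance between each state component and the prediction.
    cov_xy = [sum((x[k] - x_mean[k]) * (p - p_mean)
                  for x, p in zip(ensemble, preds)) / (n - 1)
              for k in range(dim)]
    # Sample variance of the predicted observation.
    var_y = sum((p - p_mean) ** 2 for p in preds) / (n - 1)
    # Kalman gain; obs_var plays the role of the observation noise covariance.
    gain = [c / (var_y + obs_var) for c in cov_xy]
    updated = []
    for x, p in zip(ensemble, preds):
        y_pert = y_obs + rng.gauss(0.0, obs_var ** 0.5)  # perturbed observation
        updated.append(tuple(x[k] + gain[k] * (y_pert - p) for k in range(dim)))
    return updated


if __name__ == "__main__":
    rng = random.Random(0)
    prior = [(0.9 - 0.01 * j, 0.05 + 0.01 * j, 0.05) for j in range(10)]
    forecast = [sir_step(x, beta=0.3, gamma=0.1, dt=1.0) for x in prior]
    posterior = enkf_update(forecast, 0.12, observe_infectious, 1e-4, rng)
    print(sum(x[1] for x in posterior) / len(posterior))
```

Swapping `observe_infectious` for a different function (for example, one that reports new cases per step) changes only the `preds` line, which is the sense in which the observation function is a separate modeling choice. Increasing `obs_var` shrinks the gain and makes the filter trust the data less, the simplest form of the covariance adjustment the abstract mentions.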

[5]  arXiv:2007.05351 (cross-list from cs.CV) [pdf, other]
Title: Are pathologist-defined labels reproducible? Comparison of the TUPAC16 mitotic figure dataset with an alternative set of labels
Comments: 10 pages, submitted to LABELS@MICCAI 2020
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Image and Video Processing (eess.IV); Quantitative Methods (q-bio.QM)

Pathologist-defined labels are the gold standard for histopathological data sets, despite well-known limitations in consistency for some tasks. To date, some datasets on mitotic figures are available and have been used to develop promising deep learning-based algorithms. In order to assess the robustness of those algorithms and the reproducibility of their methods, it is necessary to test on several independent datasets. The influence of the different labeling methods used for these available datasets is currently unknown. To tackle this, we present an alternative set of labels for the images of the auxiliary mitosis dataset of the TUPAC16 challenge. In addition to manual mitotic figure screening, we used a novel, algorithm-aided labeling process that minimized the risk of missing rare mitotic figures in the images. All potential mitotic figures were independently assessed by two pathologists. The novel, publicly available set of labels contains 1,999 mitotic figures (+28.80%) and additionally includes 10,483 labels of cells with high similarities to mitotic figures (hard examples). We found a significant difference in F_1 scores between the original label set (0.549) and the new alternative label set (0.735) using a standard deep learning object detection architecture. The models trained on the alternative set showed higher overall confidence values, suggesting a higher overall label consistency. Findings of the present study show that pathologist-defined labels may vary significantly, resulting in a notable difference in model performance. Comparison of deep learning-based algorithms between independent datasets with different labeling methods should be done with caution.

Replacements for Mon, 13 Jul 20

[6]  arXiv:2005.06239 (replaced) [pdf, other]
Title: Classification of particle trajectories in living cells: machine learning versus statistical testing hypothesis for fractional anomalous diffusion
Comments: 32 pages, 5 figures
Subjects: Quantitative Methods (q-bio.QM); Biological Physics (physics.bio-ph)

[7]  arXiv:2003.00110 (replaced) [pdf]
Title: Technology dictates algorithms: Recent developments in read alignment
Subjects: Genomics (q-bio.GN); Quantitative Methods (q-bio.QM)

[8]  arXiv:2004.13178 (replaced) [pdf]
Title: What Can We Estimate from Fatality and Infectious Case Data using the Susceptible-Infected-Removed (SIR) model? A case Study of Covid-19 Pandemic
Subjects: Populations and Evolution (q-bio.PE); Quantitative Methods (q-bio.QM)