We gratefully acknowledge support from
the Simons Foundation and member institutions.
Full-text links:

Download:

Current browse context:

q-bio.QM

Change to browse by:

References & Citations

Bookmark

(what is this?)
CiteULike logo BibSonomy logo Mendeley logo del.icio.us logo Digg logo Reddit logo ScienceWISE logo

Quantitative Biology > Quantitative Methods

Title: SARS-Cov-2 RNA Sequence Classification Based on Territory Information

Authors: Jingwei Liu
Abstract: CovID-19 genetics analysis is critical to determine virus type,virus variant and evaluate vaccines. In this paper, SARS-Cov-2 RNA sequence analysis relative to region or territory is investigated. A uniform framework of sequence SVM model with various genetics length from short to long and mixed-bases is developed by projecting SARS-Cov-2 RNA sequence to different dimensional space, then scoring it according to the output probability of pre-trained SVM models to explore the territory or origin information of SARS-Cov-2. Different sample size ratio of training set and test set is also discussed in the data analysis. Two SARS-Cov-2 RNA classification tasks are constructed based on GISAID database, one is for mainland, Hongkong and Taiwan of China, and the other is a 6-class classification task (Africa, Asia, Europe, North American, South American\& Central American, Ocean) of 7 continents. For 3-class classification of China, the Top-1 accuracy rate can reach 82.45\% (train 60\%, test=40\%); For 2-class classification of China, the Top-1 accuracy rate can reach 97.35\% (train 80\%, test 20\%); For 6-class classification task of world, when the ratio of training set and test set is 20\% : 80\% , the Top-1 accuracy rate can achieve 30.30\%. And, some Top-N results are also given.
Comments: 7 figures
Subjects: Quantitative Methods (q-bio.QM); Machine Learning (cs.LG); Computation (stat.CO)
Cite as: arXiv:2101.03323 [q-bio.QM]
  (or arXiv:2101.03323v1 [q-bio.QM] for this version)

Submission history

From: Jingwei Liu [view email]
[v1] Sat, 9 Jan 2021 09:12:27 GMT (8215kb)

Link back to: arXiv, form interface, contact.