
Title: Bacteriophage classification for assembled contigs using Graph Convolutional Network

Abstract: Motivation: Bacteriophages (aka phages), which mainly infect bacteria, play key roles in the biology of microbes. Phages are the most abundant biological entities on the planet, and those discovered so far are only the tip of the iceberg. Recently, many new phages have been revealed using high-throughput sequencing, particularly metagenomic sequencing. Taxonomic classification of phages lags seriously behind the fast accumulation of phage-like sequences. The high diversity and abundance of phages, together with the limited number of known phages, pose great challenges for taxonomic analysis. In particular, alignment-based tools have difficulty classifying the fast-accumulating contigs assembled from metagenomic data. Results: In this work, we present a novel semi-supervised learning model, named PhaGCN, to conduct taxonomic classification for phage contigs. In this learning model, we construct a knowledge graph by combining the DNA sequence features learned by a convolutional neural network (CNN) and the protein sequence similarity obtained from a gene-sharing network. Then we apply a graph convolutional network (GCN) to utilize both the labeled and unlabeled samples in training to enhance the learning ability. We tested PhaGCN on both simulated and real sequencing data. The results clearly show that our method competes favorably against available phage classification tools.
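
To make the semi-supervised idea in the abstract concrete, here is a minimal NumPy sketch of a two-layer GCN forward pass over a small graph, with the loss computed on labeled nodes only. It is an illustration under stated assumptions, not the PhaGCN implementation: the symmetric normalization with self-loops is a commonly used GCN formulation assumed here, and the toy graph, the random features standing in for CNN-derived sequence features, the layer sizes, and all function names are hypothetical.

```python
import numpy as np

def normalize_adjacency(adj):
    """Symmetric normalization D^-1/2 (A + I) D^-1/2 (assumed, common GCN choice)."""
    a_hat = adj + np.eye(adj.shape[0])      # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def gcn_forward(a_norm, x, w1, w2):
    """Two graph-convolution layers followed by a row-wise softmax."""
    h = np.maximum(a_norm @ x @ w1, 0.0)    # layer 1 + ReLU
    logits = a_norm @ h @ w2                # layer 2
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    p = np.exp(logits)
    return p / p.sum(axis=1, keepdims=True)

def masked_cross_entropy(probs, labels, labeled_mask):
    """Cross-entropy averaged over labeled nodes only."""
    idx = np.flatnonzero(labeled_mask)
    return float(-np.log(probs[idx, labels[idx]] + 1e-12).mean())

# Toy graph: 4 contigs; edges stand in for gene-sharing similarity (hypothetical).
adj = np.array([[0, 1, 0, 0],
                [1, 0, 1, 0],
                [0, 1, 0, 1],
                [0, 0, 1, 0]], dtype=float)

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 3))          # stand-in for CNN-derived node features
w1 = rng.normal(size=(3, 5)) * 0.1   # untrained weights, for illustration only
w2 = rng.normal(size=(5, 2)) * 0.1

labels = np.array([0, 1, 0, 0])      # entries for unlabeled nodes are placeholders
labeled_mask = np.array([True, True, False, False])

a_norm = normalize_adjacency(adj)
probs = gcn_forward(a_norm, x, w1, w2)
loss = masked_cross_entropy(probs, labels, labeled_mask)
```

Unlabeled nodes contribute no loss term, yet their features still flow through `a_norm` into the representations of their labeled neighbors, which is how a GCN can draw on both labeled and unlabeled samples during training.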
Comments: 15 pages, 10 figures
Subjects: Genomics (q-bio.GN); Machine Learning (cs.LG)
Journal reference: Bioinformatics, Volume 37, Issue Supplement_1, July 2021, Pages 25-33
DOI: 10.1093/bioinformatics/btab293
Cite as: arXiv:2102.03746 [q-bio.GN]
  (or arXiv:2102.03746v2 [q-bio.GN] for this version)

Submission history

From: Jiayu Shang
[v1] Sun, 7 Feb 2021 08:58:35 GMT (1823kb,D)
[v2] Sat, 4 Sep 2021 10:08:25 GMT (14852kb,D)