Title: ItLnc-BXE: a Bagging-XGBoost-ensemble method with multiple features for identification of plant lncRNAs

Abstract: Motivation: Since long non-coding RNAs (lncRNAs) are involved in a wide range of cellular and developmental processes, an increasing number of methods have been proposed for distinguishing lncRNAs from coding RNAs. However, most existing methods are designed for lncRNAs in animal systems, and only a few focus on plant lncRNA identification. Plant lncRNAs have characteristics distinct from those of animal lncRNAs, so a computational method for accurate and robust identification of plant lncRNAs is desirable. Results: Herein, we present ItLnc-BXE, a plant lncRNA identification method that utilizes multiple features and an ensemble learning strategy. First, a diverse set of lncRNA features is collected and filtered by feature selection to represent RNA transcripts. Then, several base learners are trained and combined into a single meta-learner by ensemble learning, yielding an ItLnc-BXE model. ItLnc-BXE models are evaluated on datasets of six plant species. The results show that ItLnc-BXE outperforms other state-of-the-art plant lncRNA identification methods, achieving better and more robust performance (AUC > 95.91%). We also perform cross-species lncRNA identification experiments, and the results indicate that dicot-based and monocot-based models can accurately identify lncRNAs in lower plant species, such as mosses and algae. Availability: source code is available at this https URL Contact: zhangwen@mail.hzau.edu.cn or zhangwen@whu.edu.cn Supplementary information: Supplementary data are available at Bioinformatics online.
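Illustrative sketch (not the authors' code): the bagging idea in the abstract, where several base learners are trained on resampled data and then combined, can be outlined in plain Python. Everything here is an assumption made for illustration: the k-mer frequency feature, the nearest-centroid base learner standing in for XGBoost, the class-stratified bootstrap, the vote averaging used in place of a learned meta-learner, and the toy sequences, which carry no biological meaning. The feature selection step is omitted.

```python
import random
from itertools import product


def kmer_freqs(seq, k=2):
    """Normalized k-mer frequencies over ACGU (a hypothetical example feature)."""
    seq = seq.upper().replace("T", "U")
    kmers = ["".join(p) for p in product("ACGU", repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    total = 0
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if kmer in counts:  # skip k-mers containing ambiguous bases
            counts[kmer] += 1
            total += 1
    return [counts[km] / total if total else 0.0 for km in kmers]


class CentroidLearner:
    """Stand-in base learner (nearest centroid); the paper uses XGBoost."""

    def fit(self, X, y):
        self.centroids = {}
        for label in set(y):
            rows = [x for x, t in zip(X, y) if t == label]
            self.centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
        return self

    def predict(self, x):
        def sq_dist(c):
            return sum((a - b) ** 2 for a, b in zip(x, c))
        return min(self.centroids, key=lambda lab: sq_dist(self.centroids[lab]))


def bagging_fit(X, y, n_learners=5, seed=0):
    """Train one learner per class-stratified bootstrap sample."""
    rng = random.Random(seed)
    by_class = {}
    for i, label in enumerate(y):
        by_class.setdefault(label, []).append(i)
    learners = []
    for _ in range(n_learners):
        # Resample with replacement inside each class so both classes appear.
        idx = [rng.choice(members)
               for members in by_class.values()
               for _member in members]
        learners.append(CentroidLearner().fit([X[i] for i in idx],
                                              [y[i] for i in idx]))
    return learners


def bagging_predict(learners, x):
    """Fraction of learners voting for class 1 (simple vote averaging)."""
    votes = [m.predict(x) for m in learners]
    return sum(votes) / len(votes)


# Toy data: label 1 = "lncRNA-like", label 0 = "coding-like" (synthetic).
train = [("AUAUAUAUAU", 1), ("UAUAUAUAUA", 1), ("AUAUAUAU", 1), ("UAUAUAUA", 1),
         ("GCGCGCGCGC", 0), ("CGCGCGCGCG", 0), ("GCGCGCGC", 0), ("CGCGCGCG", 0)]
X = [kmer_freqs(s) for s, _ in train]
y = [label for _, label in train]
learners = bagging_fit(X, y)
score = bagging_predict(learners, kmer_freqs("AUAUAUAUAUAU"))
```

Here `score` is the share of base learners that label the query sequence as class 1. A real pipeline would replace the stand-in learner with gradient-boosted trees and the toy k-mer feature with the selected feature set described in the paper.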
Comments: 7 pages, 3 figures, 4 tables
Subjects: Genomics (q-bio.GN); Machine Learning (cs.LG)
Cite as: arXiv:1911.00185 [q-bio.GN]
  (or arXiv:1911.00185v2 [q-bio.GN] for this version)

Submission history

From: Ziru Liu
[v1] Fri, 1 Nov 2019 02:28:19 GMT (979kb)
[v2] Fri, 24 Jan 2020 09:04:28 GMT (1319kb)