We gratefully acknowledge support from
the Simons Foundation and member institutions.
Full-text links:

Download:

Current browse context:

cs.LG

Change to browse by:

References & Citations

DBLP - CS Bibliography

Bookmark

(what is this?)
CiteULike logo BibSonomy logo Mendeley logo del.icio.us logo Digg logo Reddit logo

Computer Science > Machine Learning

Title: FsNet: Feature Selection Network on High-dimensional Biological Data

Abstract: Biological data are generally high-dimensional and require efficient machine learning methods that are well generalized and scalable to discover their complex nonlinear patterns. The recent advances in the domain of artificial intelligence and machine learning can be attributed to deep neural networks (DNNs) because they accomplish a variety of tasks in computer vision and natural language processing. However, standard DNNs are not suitable for handling high-dimensional data and data with small number of samples because they require a large pool of computing resources as well as plenty of samples to learn a large number of parameters. In particular, although interpretability is important for high-dimensional biological data such as gene expression data, a nonlinear feature selection algorithm for DNN models has not been fully investigated. In this paper, we propose a novel nonlinear feature selection method called the Feature Selection Network (FsNet), which is a scalable concrete neural network architecture, under high-dimensional and small number of samples setups. Specifically, our network consists of a selector layer that uses a concrete random variable for discrete feature selection and a supervised deep neural network regularized with the reconstruction loss. Because a large number of parameters in the selector and reconstruction layer can easily cause overfitting under a limited number of samples, we use two tiny networks to predict the large virtual weight matrices of the selector and reconstruction layers. The experimental results on several real-world high-dimensional biological datasets demonstrate the efficacy of the proposed approach.
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
Cite as: arXiv:2001.08322 [cs.LG]
  (or arXiv:2001.08322v1 [cs.LG] for this version)

Submission history

From: Makoto Yamada [view email]
[v1] Thu, 23 Jan 2020 00:49:57 GMT (162kb,D)
[v2] Tue, 29 Sep 2020 06:46:11 GMT (275kb,D)
[v3] Fri, 18 Dec 2020 00:48:37 GMT (275kb,D)

Link back to: arXiv, form interface, contact.