Probabilistically Sampled and Spectrally Clustered Plant Genotypes using Phenotypic Characteristics

Shastri, Aditya A.; Ahuja, Kapil; Ratnaparkhe, Milind B.; Busnel, Yann

(Submitted on 18 Sep 2020)

Abstract: Clustering genotypes based on their phenotypic characteristics is used to obtain diverse sets of parents for breeding programs. The Hierarchical Clustering (HC) algorithm is the current standard for clustering phenotypic data, but it suffers from low accuracy and high computational complexity. To address the accuracy challenge, we propose using the Spectral Clustering (SC) algorithm. To make the algorithm computationally cheap, we propose using sampling, specifically Pivotal Sampling, which is probability based. Since the application of sampling to phenotypic data has not been explored much, we also adapt another sampling technique, Vector Quantization (VQ), to this data for effective comparison. VQ has recently given promising results for genome data.
The novelty of our SC with Pivotal Sampling algorithm lies in constructing the crucial similarity matrix for the clustering algorithm and in defining the probabilities for the sampling technique. Although our algorithm can be applied to any plant genotypes, we test it on phenotypic data obtained from about 2400 Soybean genotypes. SC with Pivotal Sampling achieves substantially higher accuracy (in terms of Silhouette Values) than all the other competing clustering-with-sampling algorithms we propose (i.e., SC with VQ, HC with Pivotal Sampling, and HC with VQ). The complexities of our SC with Pivotal Sampling algorithm and these three variants are almost the same because of the sampling involved. In addition, SC with Pivotal Sampling outperforms the standard HC algorithm in both accuracy and computational complexity. We experimentally show that our algorithm is up to 45% more accurate than HC in terms of clustering accuracy, and its computational complexity is more than an order of magnitude lower than that of HC.
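
To illustrate the sampling step named in the abstract, the following is a minimal sketch of sequential pivotal sampling in Python with NumPy. It is not the authors' implementation: the abstract does not state how the inclusion probabilities are defined, so the equal probabilities used here are a placeholder assumption, and the function name `pivotal_sample` is hypothetical. The sketch only shows how two units at a time "duel" until each unit's probability is resolved to 0 or 1.

```python
import numpy as np

def pivotal_sample(pi, rng):
    """Sequential pivotal sampling (illustrative sketch).

    pi  : inclusion probabilities in [0, 1]; their sum should be an integer,
          which is then the sample size.
    rng : a numpy.random.Generator.
    Returns the indices of the selected units.
    """
    pi = np.asarray(pi, dtype=float).copy()
    eps = 1e-12
    i = 0  # index of the currently "active" (undecided) unit
    for j in range(1, len(pi)):
        a, b = pi[i], pi[j]
        if a < eps or a > 1 - eps:   # active unit already decided: move on
            i = j
            continue
        if b < eps or b > 1 - eps:   # challenger already decided: skip it
            continue
        s = a + b
        if s < 1:
            # One unit is rejected; the other carries the combined probability.
            if rng.random() < a / s:
                pi[i], pi[j] = s, 0.0
            else:
                pi[i], pi[j] = 0.0, s
                i = j
        else:
            # One unit is selected; the other keeps the leftover probability.
            if rng.random() < (1 - b) / (2 - s):
                pi[i], pi[j] = 1.0, s - 1
                i = j
            else:
                pi[i], pi[j] = s - 1, 1.0
    # Any residual fraction (from floating-point error) is rounded.
    return np.flatnonzero(pi > 0.5)

# Placeholder assumption: equal inclusion probabilities, 5 of 20 units.
rng = np.random.default_rng(0)
sample_idx = pivotal_sample(np.full(20, 0.25), rng)
```

In a pipeline like the one the abstract describes, the rows indexed by `sample_idx` would be passed to the clustering step in place of the full data set; the similarity matrix and the probability definition would have to come from the paper itself.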

Comments:	16 Pages, 3 Figures, and 6 Tables
Subjects:	Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
MSC classes:	92B05, 68T09
ACM classes:	I.2.1; J.3
Cite as:	arXiv:2009.09028 [cs.LG]
	(or arXiv:2009.09028v1 [cs.LG] for this version)

Submission history

From: Kapil Ahuja
[v1] Fri, 18 Sep 2020 18:59:00 GMT (38kb)
