Title: Building alternative consensus trees and supertrees using k-means and Robinson and Foulds distance

Abstract: Each gene has its own evolutionary history, which can differ substantially from the evolutionary histories of other genes. For example, some individual genes or operons can be affected by specific horizontal gene transfer and recombination events. Thus, the evolutionary history of each gene should be represented by its own phylogenetic tree, which may display different evolutionary patterns from the species tree that accounts for the main patterns of vertical descent. The output of traditional consensus tree or supertree inference methods is a single consensus tree or supertree. Here, we describe a new efficient method for inferring multiple alternative consensus trees and supertrees to best represent the most important evolutionary patterns of a given set of phylogenetic trees (i.e. additive trees or X-trees). We show how a specific version of the popular k-means clustering algorithm, based on some interesting properties of the Robinson and Foulds topological distance, can be used to partition a given set of trees into one (when the data are homogeneous) or multiple (when the data are heterogeneous) cluster(s) of trees. We adapt the popular Caliński-Harabasz, Silhouette, Ball and Hall, and Gap cluster validity indices to tree clustering with k-means. Special attention is paid to the relevant but very challenging problem of inferring alternative supertrees, built from phylogenies constructed for different, but mutually overlapping, sets of taxa. The use of the Euclidean approximation in the objective function of the method makes it faster than the existing tree clustering techniques, and thus perfectly suitable for the analysis of large genomic datasets. In this study, we apply it to discover alternative supertrees characterizing the main patterns of evolution of SARS-CoV-2 and the related betacoronaviruses.
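
As a rough illustration of why the Robinson and Foulds (RF) distance fits a k-means setting, the sketch below computes RF distances between small trees and checks one well-known property: if each tree is encoded as a 0/1 vector over its non-trivial bipartitions, the squared Euclidean distance between two such vectors equals their RF distance. This is only a minimal sketch under stated assumptions, not the authors' implementation. The nested-tuple tree encoding, the helper names (`clades`, `bipartitions`, `rf_distance`, `indicator_vectors`, `squared_euclidean`), and the example trees are all hypothetical, and the abstract does not say which properties of the RF distance the method actually relies on.

```python
def clades(tree):
    """Return (leaf set of this subtree, set of leaf sets of all internal nodes).

    Assumption: a tree is a nested tuple whose leaves are taxon-name strings.
    """
    if isinstance(tree, str):
        return frozenset([tree]), set()
    leaves, found = frozenset(), set()
    for child in tree:
        child_leaves, child_found = clades(child)
        leaves |= child_leaves
        found |= child_found
    found.add(leaves)
    return leaves, found


def bipartitions(tree):
    """Non-trivial bipartitions of the tree, read as an unrooted tree.

    Each bipartition is stored as the side that does NOT contain a fixed
    reference taxon, so that the same split always gets the same key.
    """
    taxa, found = clades(tree)
    ref = min(taxa)
    splits = set()
    for clade in found:
        side = taxa - clade if ref in clade else clade
        # Keep only splits with at least two taxa on each side.
        if 1 < len(side) < len(taxa) - 1:
            splits.add(side)
    return splits


def rf_distance(t1, t2):
    """RF distance: number of bipartitions present in exactly one tree."""
    return len(bipartitions(t1) ^ bipartitions(t2))


def indicator_vectors(trees):
    """Encode each tree as a 0/1 vector over all bipartitions seen in `trees`."""
    all_splits = sorted(set().union(*(bipartitions(t) for t in trees)), key=sorted)
    return [[1 if s in bipartitions(t) else 0 for s in all_splits] for t in trees]


def squared_euclidean(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))


# Hypothetical example trees on the same five taxa.
t1 = ((("A", "B"), "C"), ("D", "E"))
t2 = ((("A", "C"), "B"), ("D", "E"))
t3 = ((("A", "D"), "B"), ("C", "E"))

vectors = indicator_vectors([t1, t2, t3])
print(rf_distance(t1, t2), squared_euclidean(vectors[0], vectors[1]))
print(rf_distance(t1, t3), squared_euclidean(vectors[0], vectors[2]))
```

Under this encoding, a standard k-means routine applied to the indicator vectors would minimise sums of RF distances to a (generally fractional) centroid vector. How the method described in the abstract maps such centroids back to consensus trees or supertrees, and how it handles trees on different taxon sets, is not specified here and is not covered by this sketch.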
Comments: submitted
Subjects: Populations and Evolution (q-bio.PE)
Cite as: arXiv:2103.13343 [q-bio.PE]
  (or arXiv:2103.13343v1 [q-bio.PE] for this version)

Submission history

From: Vladimir Makarenkov
[v1] Wed, 24 Mar 2021 17:04:00 GMT (1658kb)