Title: Metric Dimension of Hamming Graphs and Applications to Computational Biology

Authors: Lucas Laird
Abstract: Genetic sequencing has become an increasingly affordable and accessible source of genomic data in computational biology. This data is often represented as $k$-mers, i.e., strings of some fixed length $k$ with symbols chosen from a reference alphabet. In contrast, some of the most effective and well-studied machine learning algorithms require numerical representations of the data. The metric dimension of the so-called Hamming graphs offers a promising way to address this issue. A subset of vertices in a graph is said to be resolving when the distances to those vertices uniquely characterize every vertex in the graph. The metric dimension of a graph is the size of a smallest resolving subset of vertices. Finding the metric dimension of a general graph is a challenging problem, in fact NP-complete. Recently, an efficient algorithm for finding resolving sets in Hamming graphs has been proposed, which suffices to uniquely embed $k$-mers into a real vector space. Since the dimension of the embedding is the cardinality of the associated resolving set, determining whether a node can be removed from a resolving set while keeping it resolving is of great interest. This can be quite challenging for large graphs, since only a brute-force approach is known for checking whether a set is resolving. In this thesis, we characterize resolvability of Hamming graphs in terms of a linear system over a finite domain: a set of nodes is resolving if and only if the linear system has only the trivial solution over said domain. We can represent the domain as the roots of a polynomial system, so the apparatus of Gröbner bases can be used to determine whether a set of nodes is resolving. As a proof of concept, we study the resolvability of Hamming graphs associated with octapeptides, i.e., protein sequences of length eight.
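
To make the definitions in the abstract concrete, the following is a minimal sketch of the brute-force resolvability check it mentions, not the thesis's linear-system or Gröbner-basis method. It assumes the standard fact that graph distance in a Hamming graph equals the Hamming distance between strings; the function names `hamming`, `embed`, and `is_resolving` are hypothetical and chosen only for illustration.

```python
from itertools import product


def hamming(u, v):
    """Number of positions where two equal-length sequences differ."""
    return sum(a != b for a, b in zip(u, v))


def embed(vertex, landmarks):
    """Map a k-mer to its vector of distances to each landmark vertex."""
    return tuple(hamming(vertex, r) for r in landmarks)


def is_resolving(landmarks, alphabet, k):
    """Brute-force check: the landmarks are resolving iff no two k-mers
    over the alphabet share the same distance vector.
    Enumerates all len(alphabet)**k vertices, so it only scales to
    small alphabets and small k."""
    seen = set()
    for vertex in product(alphabet, repeat=k):
        vec = embed(vertex, landmarks)
        if vec in seen:
            return False  # two distinct vertices collide
        seen.add(vec)
    return True


# Binary strings of length 2 (the 4-cycle):
print(is_resolving(["00", "01"], "01", 2))  # two landmarks suffice here
print(is_resolving(["00"], "01", 2))        # "01" and "10" both sit at distance 1
```

When the set is resolving, `embed` gives the injective numerical representation of $k$-mers described above, with dimension equal to the number of landmarks. The exhaustive enumeration is what becomes infeasible for cases such as octapeptides over the 20-letter amino acid alphabet, which motivates the algebraic characterization.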
Subjects: Combinatorics (math.CO); Quantitative Methods (q-bio.QM)
Cite as: arXiv:2007.01337 [math.CO]
  (or arXiv:2007.01337v1 [math.CO] for this version)

Submission history

From: Lucas Laird
[v1] Thu, 2 Jul 2020 18:36:51 GMT (430kb,D)