Mathematics > Combinatorics
Title: Metric Dimension of Hamming Graphs and Applications to Computational Biology
(Submitted on 2 Jul 2020)
Abstract: Genetic sequencing has become an increasingly affordable and accessible source of genomic data in computational biology. This data is often represented as $k$-mers, i.e., strings of some fixed length $k$ with symbols chosen from a reference alphabet. In contrast, some of the most effective and well-studied machine learning algorithms require numerical representations of the data. The concept of metric dimension of the so-called Hamming graphs presents a promising way to address this issue. A subset of vertices in a graph is said to be resolving when the distances to those vertices uniquely characterize every vertex in the graph. The metric dimension of a graph is the size of a smallest resolving subset of vertices. Finding the metric dimension of a general graph is a challenging problem, in fact NP-complete. Recently, an efficient algorithm for finding resolving sets in Hamming graphs has been proposed, which suffices to uniquely embed $k$-mers into a real vector space. Since the dimension of the embedding is the cardinality of the associated resolving set, determining whether or not a node can be removed from a resolving set while keeping it resolving is of great interest. This can be quite challenging for large graphs, since only a brute-force approach is known for checking whether or not a set is resolving. In this thesis, we characterize resolvability of Hamming graphs in terms of a linear system over a finite domain: a set of nodes is resolving if and only if the linear system has only the trivial solution over said domain. We can represent the domain as the roots of a polynomial system, so the apparatus of Gr\"obner bases can be used to determine whether or not a set of nodes is resolving. As a proof of concept, we study the resolvability of Hamming graphs associated with octapeptides, i.e., protein sequences of length eight.
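To make the definition of a resolving set concrete, here is a minimal Python sketch of the brute-force check mentioned in the abstract. It is not the linear-system or Gr\"obner basis method developed in the thesis. The function names `hamming` and `is_resolving` are illustrative assumptions, and the sketch assumes the vertices of the Hamming graph are all strings of length $k$ over the given alphabet, with graph distance equal to Hamming distance.

```python
from itertools import product


def hamming(u, v):
    """Number of positions at which two equal-length sequences differ."""
    return sum(a != b for a, b in zip(u, v))


def is_resolving(candidates, k, alphabet):
    """Brute-force check (illustrative, not the thesis's method).

    Returns True when the vector of Hamming distances to `candidates`
    is different for every k-mer over `alphabet`.
    """
    seen = set()
    # Enumerate all |alphabet|**k vertices; feasible only for small graphs.
    for vertex in product(alphabet, repeat=k):
        signature = tuple(hamming(vertex, r) for r in candidates)
        if signature in seen:
            # Two distinct vertices share a distance vector.
            return False
        seen.add(signature)
    return True


# Small example: binary strings of length 2 (a 4-cycle).
print(is_resolving(["00"], 2, "01"))        # one node cannot separate 01 and 10
print(is_resolving(["00", "01"], 2, "01"))  # two adjacent nodes suffice
```

The loop visits every vertex, so the cost grows as $|\text{alphabet}|^k$ times the size of the candidate set. For octapeptides over a 20-letter amino acid alphabet that is $20^8$ vertices, which illustrates why an algebraic characterization is attractive.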