Title: Generating Tokenizers with Flat Automata

Authors: Hans de Nivelle (School of Engineering and Digital Sciences, Nazarbayev University, Nursultan-City, Kazakhstan), Dina Muktubayeva (School of Engineering and Digital Sciences, Nazarbayev University, Nursultan-City, Kazakhstan)
Abstract: We introduce flat automata for automatic generation of tokenizers. Flat automata are a simple representation of standard finite automata. Using the flat representation, automata can be easily constructed, combined and printed.
Due to the use of border functions, flat automata are more compact than standard automata in the case where intervals of characters are attached to transitions, and the standard algorithms on automata are simpler.
We give the standard algorithms for tokenizer construction with automata, namely construction using regular operations, determinization, and minimization. We prove their correctness. The algorithms work with intervals of characters, but are not more complicated than their counterparts on single characters. It is easy to generate C++ code from the final deterministic automaton. All procedures have been implemented in C++ and are publicly available. The implementation has been used in applications and in teaching.
Comments: In Proceedings GandALF 2022, arXiv:2209.09333. An implementation of flat automata can be found at www.compiler-tools.eu
Subjects: Formal Languages and Automata Theory (cs.FL)
ACM classes: F.1.1; F.4.3; B.1.4; D.3.4
Journal reference: EPTCS 370, 2022, pp. 66-80
DOI: 10.4204/EPTCS.370.5
Cite as: arXiv:2209.10313 [cs.FL]
  (or arXiv:2209.10313v1 [cs.FL] for this version)
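
To make the idea of interval-labelled transitions concrete, the following is a minimal sketch of how a border function might be represented. It is not the interface of the authors' library. It assumes that a border function stores only the characters at which its value changes, so that the value for a character c is the one attached to the largest border not exceeding c. The names `BorderFunction`, `Dfa`, `make_identifier_dfa` and `longest_match` are hypothetical, as is the choice of absolute target states with -1 meaning "no transition".

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <map>
#include <string_view>
#include <vector>

// Hypothetical illustration only; names and layout are assumptions.
// Maps every 8-bit character to an int, storing only the "borders"
// (characters at which the value changes).
class BorderFunction {
    std::map<unsigned char, int> borders_;  // always contains key 0
public:
    explicit BorderFunction(int initial) { borders_[0] = initial; }

    // Value at c: the value of the largest border <= c.
    int operator()(unsigned char c) const {
        auto it = borders_.upper_bound(c);
        assert(it != borders_.begin());  // key 0 guarantees this
        return std::prev(it)->second;
    }

    // Set the value on the closed interval [lo, hi], keeping all
    // values outside the interval unchanged.
    void assign(unsigned char lo, unsigned char hi, int v) {
        assert(lo <= hi);
        const bool has_next = hi < 255;
        const int after =
            has_next ? (*this)(static_cast<unsigned char>(hi + 1)) : 0;
        borders_.erase(borders_.lower_bound(lo), borders_.upper_bound(hi));
        borders_[lo] = v;
        if (has_next) borders_[static_cast<unsigned char>(hi + 1)] = after;
    }

    std::size_t size() const { return borders_.size(); }
};

// A deterministic automaton: one border function per state.
// Assumption: values are absolute target states, -1 means "stuck".
struct Dfa {
    std::vector<BorderFunction> delta;
    std::vector<bool> accepting;
};

// Letters and underscore lead to state 1.
void add_letters(BorderFunction& f) {
    f.assign('A', 'Z', 1);
    f.assign('a', 'z', 1);
    f.assign('_', '_', 1);
}

// Automaton for identifiers: [A-Za-z_][A-Za-z0-9_]*
Dfa make_identifier_dfa() {
    BorderFunction start(-1), rest(-1);
    add_letters(start);
    add_letters(rest);
    rest.assign('0', '9', 1);
    Dfa dfa;
    dfa.delta = {start, rest};
    dfa.accepting = {false, true};
    return dfa;
}

// Length of the longest prefix of input accepted by dfa (0 if none).
std::size_t longest_match(const Dfa& dfa, std::string_view input) {
    int state = 0;
    std::size_t best = 0;
    for (std::size_t i = 0; i < input.size(); ++i) {
        const auto c = static_cast<unsigned char>(input[i]);
        state = dfa.delta[static_cast<std::size_t>(state)](c);
        if (state < 0) break;
        if (dfa.accepting[static_cast<std::size_t>(state)]) best = i + 1;
    }
    return best;
}
```

In this sketch, calling `longest_match(make_identifier_dfa(), text)` runs the usual longest-match loop of a tokenizer. Each state stores a handful of borders rather than one entry per character, which is the kind of compactness the abstract attributes to border functions. Adjacent intervals with equal values are not merged here, a simplification a real implementation would likely avoid.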

Submission history

From: EPTCS
[v1] Wed, 21 Sep 2022 12:44:23 GMT (41kb)
