Naive Bayes Classifiers and One-hot Encoding of Categorical Variables

Williams, Christopher K. I.

(Submitted on 28 Apr 2024)

Abstract: This paper investigates the consequences of encoding a $K$-valued categorical variable incorrectly as $K$ bits via one-hot encoding, when using a Naïve Bayes classifier. This gives rise to a product-of-Bernoullis (PoB) assumption, rather than the correct categorical Naïve Bayes classifier. The differences between the two classifiers are analysed mathematically and experimentally. In our experiments using probability vectors drawn from a Dirichlet distribution, the two classifiers are found to agree on the maximum a posteriori class label for most cases, although the posterior probabilities are usually greater for the PoB case.
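
The contrast described in the abstract can be illustrated with a small numerical sketch. This is not the paper's code. It assumes that the PoB model uses, for each one-hot bit, a Bernoulli parameter equal to the class-conditional probability of the corresponding category (what fitting independent Bernoullis to one-hot data would estimate). The function names, array shapes, and the Dirichlet concentration of 1 are hypothetical choices made for illustration.

```python
import numpy as np

def categorical_log_lik(theta, x):
    """Correct categorical Naive Bayes log-likelihood.

    theta: (C, D, K) array, theta[c, d, k] = p(x_d = k | class c).
    x:     (D,) integer array of observed category indices.
    Returns a (C,) array of log p(x | c).
    """
    d_idx = np.arange(theta.shape[1])
    return np.log(theta[:, d_idx, x]).sum(axis=1)

def pob_log_lik(theta, x):
    """Product-of-Bernoullis log-likelihood on the one-hot encoding.

    Assumption: bit k of feature d is Bernoulli with parameter theta[c, d, k].
    Each "off" bit therefore contributes an extra factor (1 - theta[c, d, k]).
    """
    K = theta.shape[2]
    onehot = np.eye(K)[x]  # (D, K) one-hot encoding of x
    terms = onehot * np.log(theta) + (1.0 - onehot) * np.log1p(-theta)
    return terms.sum(axis=(1, 2))

def posterior(log_lik, log_prior):
    """Normalise log-likelihood plus log-prior into class posteriors."""
    z = log_lik + log_prior
    z = z - z.max()  # subtract the max for numerical stability
    p = np.exp(z)
    return p / p.sum()

rng = np.random.default_rng(0)
C, D, K = 2, 3, 4                                # classes, features, categories
theta = rng.dirichlet(np.ones(K), size=(C, D))   # shape (C, D, K)
log_prior = np.log(np.full(C, 1.0 / C))          # uniform class prior
x = rng.integers(0, K, size=D)                   # one observed data point

ll_cat = categorical_log_lik(theta, x)
ll_pob = pob_log_lik(theta, x)
post_cat = posterior(ll_cat, log_prior)
post_pob = posterior(ll_pob, log_prior)

print("categorical posterior:", post_cat)
print("PoB posterior:        ", post_pob)
print("same MAP label:", post_cat.argmax() == post_pob.argmax())
```

Under this assumption the PoB likelihood equals the categorical likelihood multiplied by one factor (1 - theta) for every "off" bit. When K = 2 each such factor equals the probability of the observed category, so the PoB log-likelihood is exactly twice the categorical one. That leaves the MAP label unchanged under a uniform prior while sharpening the posterior.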

Comments:	7 pages, 3 figures
Subjects:	Machine Learning (cs.LG); Machine Learning (stat.ML)
Cite as:	arXiv:2404.18190 [cs.LG]
	(or arXiv:2404.18190v1 [cs.LG] for this version)

Submission history

From: Chris Williams
[v1] Sun, 28 Apr 2024 14:04:58 GMT (205kb,D)
