Title: Automated Metadata Harmonization Using Entity Resolution & Contextual Embedding

Abstract: The ML data curation process typically involves heterogeneous and federated source systems with varied schema structures, so curation must standardize metadata from the different schemas into an interoperable schema. Performed manually, this metadata harmonization and cataloging step slows the ML-Ops lifecycle. We demonstrate automation of this step using entity resolution methods and the Cognitive Database's Db2Vec embedding approach, which captures hidden inter-column and intra-column relationships to detect metadata similarity and then predict mappings of metadata columns from source schemas to any standardized schema. Beyond matching schemas, we demonstrate that the approach can also infer the correct ontological structure of the target data model.
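
As a rough illustration of the column-matching idea, here is a minimal Python sketch that maps each source column to its most similar target column. It is not the paper's method and does not implement Db2Vec. It assumes a toy bag-of-tokens profile built from each column's name and sample values as a stand-in for learned embeddings, a token-overlap (Jaccard) score on column names as a simple stand-in for entity resolution, and a hypothetical weighting parameter `alpha`. All function names are hypothetical.

```python
import math
import re
from collections import Counter


def tokens(text):
    """Split a column name or value into lowercase alphanumeric tokens.
    camelCase boundaries and non-alphanumeric separators both split."""
    text = re.sub(r"([a-z])([A-Z])", r"\1 \2", text)
    return [t for t in re.split(r"[^a-z0-9]+", text.lower()) if t]


def column_profile(name, values):
    """Toy stand-in for a column embedding (NOT Db2Vec): token counts
    from the column name plus its sample values."""
    bag = Counter(tokens(name))
    for v in values:
        bag.update(tokens(str(v)))
    return bag


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    na = math.sqrt(sum(x * x for x in a.values()))
    nb = math.sqrt(sum(x * x for x in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def jaccard(a, b):
    """Token-set overlap between two column names."""
    sa, sb = set(a), set(b)
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def match_columns(source, target, alpha=0.5):
    """Map each source column to the highest-scoring target column.
    source/target: dict of column name -> list of sample values.
    score = alpha * name Jaccard + (1 - alpha) * profile cosine."""
    target_profiles = {t: column_profile(t, vals) for t, vals in target.items()}
    mapping = {}
    for s, vals in source.items():
        sp = column_profile(s, vals)
        mapping[s] = max(
            target,
            key=lambda t: alpha * jaccard(tokens(s), tokens(t))
            + (1 - alpha) * cosine(sp, target_profiles[t]),
        )
    return mapping


if __name__ == "__main__":
    source = {"cust_name": ["Alice Smith", "Bob Jones"],
              "custEmail": ["a@x.com", "b@y.org"]}
    target = {"customer_name": ["Carol White"],
              "email_address": ["c@z.com"],
              "order_id": ["1001"]}
    print(match_columns(source, target))
    # prints {'cust_name': 'customer_name', 'custEmail': 'email_address'}
```

The sketch picks the best target independently for each source column; a practical system would likely substitute learned embeddings for `column_profile` and add a similarity threshold or a one-to-one assignment constraint, neither of which is attempted here.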
Comments: Paper accepted at the Computing Conference 2021 (research conference formerly called the Science and Information (SAI) Conference). This replacement updates the conference status to "Accepted".
Subjects: Databases (cs.DB); Machine Learning (cs.LG)
Cite as: arXiv:2010.11827 [cs.DB]
  (or arXiv:2010.11827v2 [cs.DB] for this version)

Submission history

From: Kunal Sawarkar
[v1] Sat, 17 Oct 2020 02:14:15 GMT (4087kb,D)
[v2] Tue, 1 Dec 2020 16:23:05 GMT (2083kb,D)