Databases

New submissions

[ total of 6 entries: 1-6 ]

New submissions for Wed, 26 Jan 22

[1]  arXiv:2201.10179 [pdf, other]
Title: Flexible skylines, regret minimization and skyline ranking: a comparison to know how to select the right approach
Authors: Vittorio Fabris
Subjects: Databases (cs.DB)

Recent studies have pointed out some limitations of classic top-k queries and skyline queries. Ranking queries require the user to provide a specific scoring function, which can lead to the exclusion of interesting results when the assigned weights are estimated inaccurately. The skyline approach makes it difficult to always retrieve an accurate result, in particular when the user has to deal with a dataset whose tuples are defined by semantically different attributes. Therefore, to improve the quality of the final solutions, new techniques have been developed and proposed: here we discuss the flexible skyline, regret minimization and skyline ranking approaches. We present a comparison of the three operators, recalling how each behaves and defining a guideline that helps readers decide which of the three is the best technique for their problem.
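
As a rough illustration of the contrast drawn in this abstract, the sketch below computes a skyline and a weighted top-k over a small, made-up table of hotels. The data, the weights, and the function names (`dominates`, `skyline`, `top_k`) are assumptions for illustration and are not taken from the paper; lower values are assumed to be better on both attributes.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def dominates(a: Point, b: Point) -> bool:
    """True if a is no worse than b everywhere and strictly better somewhere.

    Assumption: lower values are better on every attribute.
    """
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def skyline(points: Sequence[Point]) -> List[Point]:
    """Return the tuples that no other tuple dominates (quadratic scan)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


def top_k(points: Sequence[Point], weights: Sequence[float], k: int) -> List[Point]:
    """Return the k tuples with the lowest weighted sum of attributes."""
    def score(p: Point) -> float:
        return sum(w * x for w, x in zip(weights, p))
    return sorted(points, key=score)[:k]


# Hypothetical data: (price, distance to centre in km).
hotels = [(50.0, 8.0), (80.0, 2.0), (60.0, 5.0), (90.0, 9.0), (55.0, 8.5)]

sky = skyline(hotels)
best_two = top_k(hotels, weights=(1.0, 0.1), k=2)
print("skyline:", sky)
print("top-2:  ", best_two)
```

With these price-heavy weights, the top-2 result contains the dominated tuple (55.0, 8.5) and misses the skyline tuple (80.0, 2.0), which is the kind of exclusion the abstract attributes to poorly estimated weights. The skyline needs no weights, but it gives the user no control over how many tuples come back.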

[2]  arXiv:2201.10217 [pdf]
Title: Survey on Poisson's CDF applied to Flexible Skylines
Comments: 9 pages, 2 figures
Subjects: Databases (cs.DB)

The evolution of skyline and ranking queries has created new archetypes like flexible skylines, which have proven to be an efficient method to select relevant data from large datasets using multi-objective optimization. This paper aims to study the possible applications of the Poisson distribution's mass function as a monotonic scoring function in flexible skyline processes, especially those featuring schemas whose attributes can be translated to constant mean rates. Moreover, a method to express users' requirements by means of the F-dominant set of tuples is proposed using parametric variations in F[1]; algorithm construction and potential applications are also studied.

[3]  arXiv:2201.10442 [pdf, other]
Title: Serving Deep Learning Models with Deduplication from Relational Databases
Subjects: Databases (cs.DB)

There are significant benefits to serving deep learning models from relational databases. First, features extracted from databases do not need to be transferred to any decoupled deep learning system for inference, and thus the system management overhead can be significantly reduced. Second, in a relational database, data management along the storage hierarchy is fully integrated with query processing, and thus model serving can continue even if the working set size exceeds the available memory. Applying model deduplication can greatly reduce the storage space, memory footprint, cache misses, and inference latency. However, existing data deduplication techniques are not applicable to deep learning model serving applications in relational databases. They consider neither the impact on model inference accuracy nor the inconsistency between tensor blocks and database pages. This work proposes synergistic storage optimization techniques for duplication detection, page packing, and caching, to enhance database systems for model serving. We implemented the proposed approach in netsDB, an object-oriented relational database. Evaluation results show that our proposed techniques significantly improve the storage efficiency and the model inference latency, and that serving models from relational databases outperforms existing deep learning frameworks when the working set size exceeds the available memory.
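
To make the idea of block-level model deduplication concrete, here is a minimal sketch that stores each distinct weight block once and keeps a per-model list of block keys. It is deliberately simpler than what the abstract describes: it detects exact duplicates only, through content hashes, and ignores inference accuracy, page packing, and caching. Every name (`split_blocks`, `block_key`, `deduplicate`, `rebuild`) and the toy weights are hypothetical; none of this is the netsDB API.

```python
import hashlib
import struct
from typing import Dict, List, Sequence, Tuple

Block = Tuple[float, ...]


def split_blocks(weights: Sequence[float], block_size: int) -> List[Block]:
    """Cut a flat weight vector into fixed-size blocks (the last may be shorter)."""
    return [tuple(weights[i:i + block_size])
            for i in range(0, len(weights), block_size)]


def block_key(block: Block) -> str:
    """Content hash of a block, computed over its little-endian float64 bytes."""
    raw = struct.pack(f"<{len(block)}d", *block)
    return hashlib.sha256(raw).hexdigest()


def deduplicate(models: Dict[str, Sequence[float]], block_size: int):
    """Store each distinct block once and record the key sequence of each model."""
    store: Dict[str, Block] = {}
    layout: Dict[str, List[str]] = {}
    for name, weights in models.items():
        keys = []
        for block in split_blocks(weights, block_size):
            key = block_key(block)
            store.setdefault(key, block)  # keep only the first copy
            keys.append(key)
        layout[name] = keys
    return store, layout


def rebuild(store: Dict[str, Block], keys: Sequence[str]) -> List[float]:
    """Reassemble one model's weights from the shared block store."""
    return [x for key in keys for x in store[key]]


# Hypothetical models that share their first two blocks.
models = {
    "a": [0.1, 0.2, 0.3, 0.4, 0.5, 0.6],
    "b": [0.1, 0.2, 0.3, 0.4, 0.9, 1.0],
}
store, layout = deduplicate(models, block_size=2)
print(len(store), "distinct blocks stored instead of 6")
```

Exact matching leaves the weights unchanged, so it cannot affect accuracy, but it also finds far fewer duplicates than a scheme that merges near-identical blocks. That trade-off is the accuracy concern the abstract raises.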

Cross-lists for Wed, 26 Jan 22

[4]  arXiv:2201.10066 (cross-list from cs.CL) [pdf, other]
Title: Documenting Geographically and Contextually Diverse Data Sources: The BigScience Catalogue of Language Data and Resources
Comments: 8 pages plus appendix and references
Subjects: Computation and Language (cs.CL); Databases (cs.DB)

In recent years, large-scale data collection efforts have prioritized the amount of data collected in order to improve the modeling capabilities of large language models. This prioritization, however, has resulted in concerns with respect to the rights of data subjects represented in data collections, particularly when considering the difficulty in interrogating these collections due to insufficient documentation and tools for analysis. Mindful of these pitfalls, we present our methodology for a documentation-first, human-centered data collection project as part of the BigScience initiative. We identified a geographically diverse set of target language groups (Arabic, Basque, Chinese, Catalan, English, French, Indic languages, Indonesian, Niger-Congo languages, Portuguese, Spanish, and Vietnamese, as well as programming languages) for which to collect metadata on potential data sources. To structure this effort, we developed our online catalogue as a supporting tool for gathering metadata through organized public hackathons. We present our development process; analyses of the resulting resource metadata, including distributions over languages, regions, and resource types; and our lessons learned in this endeavor.

[5]  arXiv:2201.10459 (cross-list from cs.LG) [pdf, other]
Title: FRAMED: Data-Driven Structural Performance Analysis of Community-Designed Bicycle Frames
Subjects: Machine Learning (cs.LG); Databases (cs.DB)

This paper presents a data-driven analysis of the structural performance of 4500 community-designed bicycle frames. We present FRAMED -- a parametric dataset of bicycle frames based on bicycles designed by bicycle practitioners from across the world. To support our data-driven approach, we also provide a dataset of structural performance values such as weight, displacements under load, and safety factors for all the bicycle frame designs. By exploring a diverse design space of frame design parameters and a set of ten competing design objectives, we present an automated way to analyze the structural performance of bicycle frames. Our structural simulations are validated against physical experimentation on bicycle frames. Through our analysis, we highlight overall trends in bicycle frame designs created by community members, study several bicycle frames under different loading conditions, identify non-dominated design candidates that perform well on multiple objectives, and explore correlations between structural objectives. Our analysis shows that over 75% of bicycle frames created by community members are infeasible, motivating the need for AI agents to support humans in designing bicycles. This work aims to simultaneously serve researchers focusing on bicycle design as well as researchers focusing on the development of data-driven design algorithms, such as surrogate models and Deep Generative Methods. The dataset and code are provided at this http URL

Replacements for Wed, 26 Jan 22

[6]  arXiv:2110.08633 (replaced) [pdf, other]
Title: Hydra: A System for Large Multi-Model Deep Learning
Comments: 12 pages including references. Preprint
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Databases (cs.DB); Machine Learning (cs.LG)