%PDF-1.4
%
2 0 obj
<< /Type /OutputIntent /DestOutputProfile 1 0 R
/Info (sRGB IEC61966 v2.1 with black scaling)
/OutputConditionIdentifier (sRGB_IEC61966-2-1_black_scaled)
/RegistryName (http://www.color.org) /S /GTS_PDFA1 >>
endobj
3 0 obj
<< /Type /Metadata /Subtype /XML /Length 6624 >>
stream
http://ns.adobe.com/pdfx/1.3/
pdfx
PDF/X Schema
external
URL to an online version or preprint
AuthoritativeDomain
Text
PRISM metadata
http://prismstandard.org/namespaces/basic/2.2/
prism
aggregationType
Text
external
The type of publication. If defined, must be one of book, catalog, feed, journal, magazine, manual, newsletter, pamphlet.
pdfTeX
application/pdf
Leveraging Schema Labels to Enhance Dataset Search
Zhiyu Chen
Haiyan Jia
Jeff Heflin
Brian D. Davison
Copyright © 2020 Zhiyu Chen
A search engine's ability to retrieve desirable datasets is important for data sharing and reuse. Existing dataset search engines typically rely on matching queries to dataset descriptions. However, a user may not have enough prior knowledge to write a query using terms that match the description text. We propose a novel schema label generation model that generates candidate schema labels from the content of dataset tables. We incorporate the generated schema labels into a mixed ranking model that considers not only the relevance between the query and the dataset metadata but also the similarity between the query and the generated schema labels. To evaluate our method on real-world datasets, we create a new benchmark specifically for the dataset retrieval task. Experiments show that our approach can improve precision and NDCG on the dataset retrieval task compared with baseline methods. We also evaluate on a collection of Wikipedia tables and show that features derived from the generated schema labels can improve both unsupervised and supervised web table retrieval.
dataset search
table retrieval
text normalization
data fusion
1
B
LaTeX with hyperref package
2022-02-13T14:29:16+00:00
2022-02-13T14:29:16+00:00
2022-02-13T14:29:16+00:00
True
Copyright © 2020 Zhiyu Chen
uuid:DF005B57-2A06-8753-7167-92D8CE935460
uuid:E85103CA-16DF-FF3D-C6F4-1CF6F89C9C99
endstream
endobj
7 0 obj
<< /D (chapter.1) /S /GoTo >>
endobj
10 0 obj
(Leveraging Schema Labels to Enhance Dataset Search)
endobj
11 0 obj
<< /D [ 12 0 R /Fit ] /S /GoTo >>
endobj
24 0 obj
<< /Filter /FlateDecode /Length 2278 >>
stream