We gratefully acknowledge support from
the Simons Foundation and member institutions.
Full-text links:

Download:

Current browse context:

cs.LG

Change to browse by:

References & Citations

DBLP - CS Bibliography

Bookmark

(what is this?)
CiteULike logo BibSonomy logo Mendeley logo del.icio.us logo Digg logo Reddit logo

Computer Science > Machine Learning

Title: Using Pre-training and Interaction Modeling for ancestry-specific disease prediction in UK Biobank

Abstract: Recent genome-wide association studies (GWAS) have uncovered the genetic basis of complex traits, but show an under-representation of non-European descent individuals, underscoring a critical gap in genetic research. Here, we assess whether we can improve disease prediction across diverse ancestries using multiomic data. We evaluate the performance of Group-LASSO INTERaction-NET (glinternet) and pretrained lasso in disease prediction focusing on diverse ancestries in the UK Biobank. Models were trained on data from White British and other ancestries and validated across a cohort of over 96,000 individuals for 8 diseases. Out of 96 models trained, we report 16 with statistically significant incremental predictive performance in terms of ROC-AUC scores (p-value < 0.05), found for diabetes, arthritis, gall stones, cystitis, asthma and osteoarthritis. For the interaction and pretrained models that outperformed the baseline, the PRS score was the primary driver behind prediction. Our findings indicate that both interaction terms and pre-training can enhance prediction accuracy but for a limited set of diseases and moderate improvements in accuracy
Subjects: Machine Learning (cs.LG); Quantitative Methods (q-bio.QM); Applications (stat.AP); Computation (stat.CO)
Cite as: arXiv:2404.17626 [cs.LG]
  (or arXiv:2404.17626v2 [cs.LG] for this version)

Submission history

From: Thomas Le Menestrel [view email]
[v1] Fri, 26 Apr 2024 16:39:50 GMT (5347kb,D)
[v2] Tue, 7 May 2024 16:21:28 GMT (5889kb,D)

Link back to: arXiv, form interface, contact.