Malmö University Publications
A combined strategy of feature selection and machine learning to identify predictors of prediabetes
Department of Clinical Sciences, Faculty of Medicine, Lund University, Lund, Sweden; Department of General Practice, School of Primary and Allied Health Care, Faculty of Medicine, Nursing, and Health Sciences, Monash University, Notting Hill, Australia.
Malmö University, Faculty of Odontology (OD). Swedish Dental Service of Skåne, Lund, Sweden. ORCID iD: 0000-0001-8298-539X
Division of Epidemiology and Community Health, School of Public Health, University of Minnesota, Minneapolis, Minnesota, USA.
2020 (English) In: JAMIA Journal of the American Medical Informatics Association, ISSN 1067-5027, E-ISSN 1527-974X, Vol. 27, no 3, p. 396-406. Article in journal (Refereed), Published
Abstract [en]

OBJECTIVE: To identify predictors of prediabetes using feature selection and machine learning on a nationally representative sample of the US population.

MATERIALS AND METHODS: We analyzed n = 6346 men and women enrolled in the National Health and Nutrition Examination Survey 2013-2014. Prediabetes was defined using American Diabetes Association guidelines. The sample was randomly partitioned into training (n = 3174) and internal validation (n = 3172) sets. Feature selection algorithms were run on training data containing 156 preselected exposure variables. Four machine learning algorithms were applied to 46 exposure variables in original and resampled training datasets built using 4 resampling methods. Predictive models were tested on internal validation data (n = 3172) and external validation data (n = 3000) prepared from National Health and Nutrition Examination Survey 2011-2012. Model performance was evaluated using area under the receiver operating characteristic curve (AUROC). Predictors were assessed by odds ratios in logistic models and variable importance in others. The Centers for Disease Control (CDC) prediabetes screening tool was the benchmark to compare model performance.
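
The random partition and the AUROC metric named above can be illustrated with a minimal Python sketch. It is not the authors' code: the function names `random_partition` and `auroc` are hypothetical, the seed is arbitrary, and the feature selection, resampling, and model fitting steps are not reproduced. Only the sample sizes (6346, 3174, 3172) come from the abstract.

```python
import random


def random_partition(ids, n_train, seed=0):
    """Randomly split ids into a training set of size n_train and a validation set.

    Hypothetical helper; the abstract only states that the split was random.
    """
    rng = random.Random(seed)
    shuffled = list(ids)
    rng.shuffle(shuffled)
    return shuffled[:n_train], shuffled[n_train:]


def auroc(labels, scores):
    """AUROC via the rank-sum (Mann-Whitney U) identity.

    Equals the probability that a randomly chosen positive case receives a
    higher score than a randomly chosen negative case, with ties counted as 0.5.
    labels are 0/1; scores are any real-valued risk estimates.
    """
    pairs = sorted(zip(scores, labels))
    n = len(pairs)
    ranks = [0.0] * n
    i = 0
    while i < n:
        # Find the run of tied scores starting at i and give each the average rank.
        j = i
        while j + 1 < n and pairs[j + 1][0] == pairs[i][0]:
            j += 1
        average_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[k] = average_rank
        i = j + 1

    n_pos = sum(1 for _, y in pairs if y == 1)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUROC needs at least one positive and one negative case")

    rank_sum_pos = sum(r for r, (_, y) in zip(ranks, pairs) if y == 1)
    u_statistic = rank_sum_pos - n_pos * (n_pos + 1) / 2
    return u_statistic / (n_pos * n_neg)


# Sizes taken from the abstract: 6346 participants, 3174 for training.
train_ids, validation_ids = random_partition(range(6346), n_train=3174)

# Toy labels and scores, purely illustrative (not study data).
toy_auc = auroc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
```

In this sketch an AUROC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation, so the abstract's "≥ 70% AUROC" threshold would be `auroc(...) >= 0.70` on this scale.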

RESULTS: Prediabetes prevalence was 23.43%. The CDC prediabetes screening tool produced 64.40% AUROC. Seven optimal (≥ 70% AUROC) models identified 25 predictors including 4 potentially novel associations; 20 by both logistic and other nonlinear/ensemble models and 5 solely by the latter. All optimal models outperformed the CDC prediabetes screening tool (P < 0.05).

DISCUSSION: Combined use of feature selection and machine learning increased predictive performance, outperforming the recommended screening tool. A range of predictors of prediabetes was identified.

CONCLUSION: This work demonstrated the value of combining feature selection with machine learning to identify a wide range of predictors that could enhance prediabetes prediction and clinical decision-making.

Place, publisher, year, edition, pages
Oxford University Press, 2020. Vol. 27, no 3, p. 396-406
Keywords [en]
NHANES, feature selection, machine learning, prediabetes, predictors
National Category
Endocrinology and Diabetes
Identifiers
URN: urn:nbn:se:mau:diva-14267
DOI: 10.1093/jamia/ocz204
ISI: 000548302800007
PubMedID: 31889178
Scopus ID: 2-s2.0-85079353320
OAI: oai:DiVA.org:mau-14267
DiVA, id: diva2:1417786
Available from: 2020-03-30 Created: 2020-03-30 Last updated: 2024-06-17 Bibliographically approved


Authority records

Jönsson, Daniel
