University of Tasmania

Single nucleotide polymorphism genes and mitochondrial DNA haplogroups as biomarkers for early prediction of knee osteoarthritis structural progressors: use of supervised machine learning classifiers

journal contribution
posted on 2023-05-21, 14:37 authored by Bonakdari, H, Pelletier, JP, Blanco, FJ, Rego-Perez, I, Duran-Sotuela, A, Dawn Aitken, Graeme Jones, Cicuttini, F, Jamshidi, A, Abram, F, Martel-Pelletier, J

Background: Knee osteoarthritis is the most prevalent chronic musculoskeletal debilitating disease. Current treatments are only symptomatic, and to improve this, we need a robust prediction model to stratify patients at an early stage according to the risk of joint structure disease progression. Some genetic factors, including single nucleotide polymorphism (SNP) genes and mitochondrial (mt)DNA haplogroups/clusters, have been linked to this disease. For the first time, we aim to determine, by using machine learning, whether some SNP genes and mtDNA haplogroups/clusters alone or combined could predict early knee osteoarthritis structural progressors.

Methods: Participants (901) were first classified for the probability of being structural progressors. Genotyping included SNP genes TP63, FTO, GNL3, DUS4L, GDF5, SUPT3H, MCF2L, and TGFA; mtDNA haplogroups H, J, T, Uk, and others; and clusters HV, TJ, KU, and C-others. They were considered for prediction with major risk factors of osteoarthritis, namely, age and body mass index (BMI). Seven supervised machine learning methodologies were evaluated. The support vector machine was used to generate gender-based models. The best input combination was assessed using sensitivity and synergy analyses. Validation was performed using tenfold cross-validation and an external cohort (TASOAC).
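The tenfold cross-validation step named in the Methods can be illustrated with a minimal sketch. This is not the authors' pipeline: the data are synthetic, the 0/1/2 genotype coding is a hypothetical choice, and a simple nearest-centroid rule stands in for the support vector machine so that the example needs only the Python standard library. All function names are illustrative.

```python
import random

def kfold_indices(n_samples, k=10, seed=0):
    """Shuffle indices 0..n_samples-1 and split them into k near-equal folds."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def fit_centroids(X, y):
    """Stand-in classifier (NOT the paper's SVM): per-class feature means."""
    sums, counts = {}, {}
    for row, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(row))
        for j, value in enumerate(row):
            acc[j] += value
        counts[label] = counts.get(label, 0) + 1
    return {c: [s / counts[c] for s in sums[c]] for c in sums}

def predict(centroids, row):
    """Assign the class whose centroid is nearest in squared Euclidean distance."""
    return min(
        centroids,
        key=lambda c: sum((a - b) ** 2 for a, b in zip(row, centroids[c])),
    )

def cross_validate(X, y, k=10, seed=0):
    """Train on k-1 folds, test on the held-out fold, return per-fold accuracy."""
    scores = []
    for held_out in kfold_indices(len(X), k, seed):
        held = set(held_out)
        train = [i for i in range(len(X)) if i not in held]
        model = fit_centroids([X[i] for i in train], [y[i] for i in train])
        hits = sum(predict(model, X[i]) == y[i] for i in held_out)
        scores.append(hits / len(held_out))
    return scores

# Synthetic, hypothetical data: age, BMI, and one genotype coded 0/1/2.
rng = random.Random(42)
X, y = [], []
for _ in range(200):
    label = rng.randint(0, 1)           # 1 = "progressor" in this toy setup
    age = rng.gauss(60 + 5 * label, 3)
    bmi = rng.gauss(27 + 3 * label, 2)
    genotype = rng.randint(0, 2)        # assumed coding, carries no signal here
    X.append([age, bmi, float(genotype)])
    y.append(label)

scores = cross_validate(X, y, k=10)
mean_accuracy = sum(scores) / len(scores)
print(f"mean tenfold accuracy: {mean_accuracy:.3f}")
```

Each participant is held out exactly once, so the mean of the ten fold accuracies estimates performance on unseen data; an external cohort such as TASOAC then provides a check that is independent of the training sample.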

Results: From 277 models, two were defined. Both used age and BMI. The first additionally used the SNP genes TP63, DUS4L, GDF5, and FTO, with an accuracy of 85.0%; the second combined mtDNA haplogroups with the SNP genes FTO and SUPT3H, with an accuracy of 82.5%. The highest impact was associated with the haplogroup H, the presence of CT alleles for rs8044769 at FTO, and the absence of AA for rs10948172 at SUPT3H. Validation accuracy with the cross-validation (about 95%) and the external cohort (90.5% and 85.7%, respectively) was excellent for both models.

Conclusions: This study introduces a novel source of decision support in precision medicine in which, for the first time, two models were developed: (i) age, BMI, TP63, DUS4L, GDF5, and FTO, and (ii) age, BMI, mtDNA haplogroup, FTO, and SUPT3H, the optimum one as it has one fewer variable. Such a framework is translational and would benefit patients at risk of structural progressive knee osteoarthritis.


Publication title

BMC Medicine



Article number









Menzies Institute for Medical Research


BioMed Central

Place of publication

United Kingdom

Rights statement

© The Author(s) 2022. Open Access. This article is licensed under a Creative Commons Attribution 4.0 International (CC BY 4.0) License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made.

Repository Status

  • Open

Socio-economic Objectives

Determinants of health
