Machine learning algorithms could support early diagnosis of HIV, Hepatitis B, and Hepatitis C in primary care settings
Researchers from the Clinical Effectiveness Group have collaborated on the first study to investigate the use of machine learning algorithms with primary care data to support testing for blood-borne viruses.
Image: Blood samples in a laboratory. Solarseven from iStock.com.
The study, published in BMC Infectious Diseases, explores the feasibility of integrating machine learning with routine primary care data to improve testing for HIV, Hepatitis B, and Hepatitis C within the general population. The research is a collaboration with colleagues at the University of Oxford and the UK Health Security Agency.
“Our findings suggest that combining digital technologies via tailored machine learning algorithms with routinely available primary care data has the potential to improve the efficiency, targeting, and timeliness of blood-borne viruses testing,” says Professor Jasmina Panovska-Griffiths, University of Oxford, co-lead author.
In high-income countries, testing for HIV, Hepatitis B, and Hepatitis C is generally performed in specialist settings such as drug misuse treatment centres and sexual health and antenatal clinics, which has led to considerable progress. However, people with undiagnosed infection also exist outside these settings: in England, 50% of chronic Hepatitis B cases remain undiagnosed.
“Testing in primary care settings is largely opportunistic, highly variable and poorly targeted,” says Dr Werner Leber, Clinical Lecturer in Primary Care at Queen Mary University of London and study co-lead.
“Our recent systematic review of blood-borne virus risk prediction methods showed the lack of machine learning algorithms relevant to primary care and across HIV, Hepatitis B and Hepatitis C. To fill this gap, in this study we developed a suite of machine learning methods to estimate the likelihood that people with risk factors are likely to be living with one of these three blood-borne viruses.”
Improving testing strategies within the general population would ultimately support better patient outcomes and mitigate the public health burden associated with these infections. The study, funded by the NIHR School of Primary Care Research and co-produced with people living with blood-borne viruses and their representatives, is the first to explore the use of machine learning to support blood-borne virus testing in primary care.
Identifying the risk factors associated with infection
The study’s authors used a large-scale, anonymised dataset of 1.9 million GP patients in North London to train and test three machine learning algorithms, evaluating their performance in predicting positivity to HIV, Hepatitis B and Hepatitis C, or a combination thereof, and identifying risk factors associated with infection.
The researchers identified age as the most important risk factor for all infections considered. Other key factors associated with a heightened risk of testing positive for blood-borne viruses were sex, GP-recorded ethnicity, drug and alcohol misuse, imprisonment, sexual behaviour, tattoo and transfusion records, associated co-morbidities, homelessness and migration status.
Several predictors were shared between two infections, such as Black African ethnicity (HIV and Hepatitis B), liver disease (Hepatitis B and C), and opiate or cocaine use (Hepatitis B and C).
Among all individual infections, Hepatitis C was the most accurately predicted across all models. While no shared risk factors were identified across all three infections, the authors suggest that key predictors are largely a combination of established risk factors for individual blood-borne virus infection.
“We used the Balanced Random Forest Classifier as it is robust to overfitting, we fitted an AdaBoost model which addresses data imbalance and also explored Logistic Regression with balanced class weights as a simpler predictive approach”, said co-lead author Harrison Manley, UK Health Security Agency.
None of the models emerged as a clear winner in predicting individual HIV, Hepatitis B or Hepatitis C positivity, but the Logistic Regression model achieved robust performance whilst also offering practical advantages, as it could be readily implemented in other settings.
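To illustrate what "balanced class weights" means in practice, here is a minimal sketch of one common heuristic, the one scikit-learn applies for `class_weight="balanced"`: each class is weighted by the total number of samples divided by the number of classes times that class's count. The labels below are made up for illustration, and the function name `balanced_class_weights` is hypothetical; this is not the study's code, and the article does not say which software the authors used.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Return a weight per class: n_samples / (n_classes * class_count).

    Rare classes receive larger weights, so each class contributes
    equally in total when a model is fitted with these weights.
    """
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


# Hypothetical toy labels (assumption): 1 = tested positive, 0 = tested negative.
labels = [0] * 95 + [1] * 5
weights = balanced_class_weights(labels)

# Total weighted contribution of each class is the same after re-weighting.
weighted_totals = {cls: weights[cls] * labels.count(cls) for cls in weights}
```

With these toy labels, the rare positive class is weighted far more heavily than the negative class, which is how a simple model can be kept from ignoring infrequent positive results.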
“Evaluating different machine learning algorithms and applying a broad set of accuracy criteria when using digital technology is necessary for improved accuracy in real-life application of precision medicine”, adds Professor Panovska-Griffiths.
Future implications
The study does not intend to redefine diagnostic criteria for blood-borne viruses. Instead, it aims to improve testing recommendations by identifying risk combinations that are not included in current guidelines and do not currently trigger a blood-borne virus test.
The authors see this work as the first step towards identifying additional and more specific cohorts for blood-borne virus testing in general settings, strengthening the role of primary care in identifying individuals at risk and linking them to a wider network of specialist and community-based services.
According to the authors, participating primary care practices would be able to identify patients with at least one recorded blood-borne virus risk factor in their Electronic Health Record, and use the predictive algorithms developed in this study to generate a personalised risk score. GP practices would then be able to prioritise the most at-risk patients for blood testing. Patients with a positive test would then be referred to specialist services – including sexual health, hepatology or substance use services – for further assessment, appropriate care and support.
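The workflow described above could be sketched roughly as follows: keep patients with at least one recorded risk factor, compute a risk score for each, and prioritise the highest-scoring patients for testing. Everything in this sketch is an assumption made for illustration: the `Patient` structure, the risk-factor names, the logistic scoring function and, in particular, the coefficients, which are invented and not taken from the study.

```python
import math
from dataclasses import dataclass, field

# Hypothetical coefficients for illustration only; NOT values from the study.
INTERCEPT = -4.0
COEFFICIENTS = {"opiate_use": 1.2, "liver_disease": 0.9, "imprisonment": 0.7}


@dataclass
class Patient:
    patient_id: str
    risk_factors: set = field(default_factory=set)


def risk_score(patient):
    """Logistic score in (0, 1) from the patient's recorded risk factors."""
    z = INTERCEPT + sum(COEFFICIENTS.get(f, 0.0) for f in patient.risk_factors)
    return 1.0 / (1.0 + math.exp(-z))


def prioritise(patients, top_n):
    """Keep patients with at least one risk factor, ranked by descending score."""
    eligible = [p for p in patients if p.risk_factors]
    ranked = sorted(eligible, key=risk_score, reverse=True)
    return ranked[:top_n]


# Made-up example records (assumption).
patients = [
    Patient("A", {"opiate_use", "liver_disease"}),
    Patient("B", set()),  # no recorded risk factor, so not scored
    Patient("C", {"imprisonment"}),
    Patient("D", {"opiate_use"}),
]
shortlist = prioritise(patients, top_n=2)
shortlist_ids = [p.patient_id for p in shortlist]
```

In a real deployment the score would come from a fitted and validated model rather than hand-set coefficients, and the cut-off for offering a test would be a clinical decision.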
More information
- Read the paper: Application of machine-learning algorithms to identify the key determinants of risk for HIV, hepatitis C and hepatitis B in primary care settings. Harrison Manley, Werner Leber, Kelvin Smith, Hamzah Z. Farooq, Manish Pareek, Rebecca F. Baggaley, Jane Anderson, Leo Loman, Chris Griffiths, John Robson, Jasmina Panovska-Griffiths. BMC Infectious Diseases.
- For media information, please contact press@qmul.ac.uk