Projects

Our group is interested in finding new ways to identify common preventable diseases early. To do that, we develop statistical and deep learning approaches and apply them to millions of health records from electronic health record systems and national health registries (e.g. ML4health). We then integrate registry-based information with genetic and proteomic information from large biobank-based studies (e.g. FinnGen) to help identify groups of individuals who can benefit most from existing pharmacological interventions. Finally, we aim to implement these approaches by recontacting individuals and establishing prospective, possibly randomized, clinical studies.

These are some examples of the projects we are running in the lab. There are many others; if you are interested in a specific project, contact andrea.ganna@helsinki.fi.

Human Genetics and -omics

Removing genomics from biomarkers to improve disease prediction

This project integrates genomic data with blood-based and clinical biomarkers to strengthen biomarker–disease associations and improve downstream prediction. Biomarker measurements, including plasma proteins and laboratory values, reflect a mixture of inherited and non-genetic influences. When a biomarker is correlated with disease but not causally involved, its genetic determinants can introduce unwanted variability and obscure clinically relevant signals. We therefore adjust biomarker measurements for their genetically predicted component (e.g., using polygenic scores) to better isolate non-genetic variation that may reflect modifiable biology or disease processes. This can reduce genetically driven confounding and, in some settings, sharpen associations and improve predictive performance. Beyond risk modeling, a key application is clinical trial design: genetics-adjusted biomarkers may enable cleaner participant stratification and more sensitive biomarker endpoints for detecting treatment effects.
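
As a minimal sketch of the adjustment described above, the snippet below regresses a biomarker on a polygenic score by ordinary least squares and keeps the residual as the genetics-adjusted value. The function name, the single-covariate linear model and the toy numbers are illustrative assumptions, not the project's actual pipeline.

```python
from statistics import mean


def adjust_for_polygenic_score(biomarker, pgs):
    """Return biomarker residuals after regressing out a polygenic score.

    Hypothetical helper: fits biomarker = a + b * pgs by ordinary least
    squares and returns biomarker - (a + b * pgs) for each individual.
    """
    if len(biomarker) != len(pgs) or len(pgs) < 2:
        raise ValueError("need two equal-length sequences with >= 2 values")
    pgs_mean, bio_mean = mean(pgs), mean(biomarker)
    sxx = sum((x - pgs_mean) ** 2 for x in pgs)
    if sxx == 0:
        raise ValueError("polygenic score has no variance")
    sxy = sum((x - pgs_mean) * (y - bio_mean) for x, y in zip(pgs, biomarker))
    slope = sxy / sxx
    intercept = bio_mean - slope * pgs_mean
    # The residual is the part of the biomarker not explained by the score.
    return [y - (intercept + slope * x) for x, y in zip(pgs, biomarker)]


# Toy data (made up): standardized polygenic scores and a measured biomarker.
pgs = [-1.2, -0.5, 0.0, 0.4, 1.3]
biomarker = [4.1, 4.9, 5.2, 5.0, 6.3]
adjusted = adjust_for_polygenic_score(biomarker, pgs)
```

By construction the adjusted values have zero mean and zero linear correlation with the polygenic score, so any remaining association with disease cannot be attributed to the linear effect of that score.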

Quantifying the impact of genetic variation on healthcare costs - The GenCOST consortium

We are coordinating the GenCOST consortium, a large multinational consortium comprising over 2 million individuals from cohorts across Europe, America and Australia, including several cohorts of diverse ancestry. The consortium aims to analyse the effects of genetic variation on healthcare costs in a diverse meta-analysis. The study will cover a range of healthcare costs, including hospitalizations, medications, medical procedures and GP visits, to estimate the total cost of an individual to a healthcare system. This will be done by leveraging numerous biobanks and incorporating registry data. The study aims to find the genetic factors that are responsible for increased healthcare costs for an individual, analysing them from both a biological and a public health perspective. It will also analyse the results from the perspective of socioeconomic status and in relation to several important disease outcomes. By understanding the genetic factors that contribute to healthcare costs, it may be possible to identify individuals who are at higher risk for certain health conditions and to develop more personalized and cost-effective treatment strategies.

New methods to study the genetics of longitudinal clinical laboratory measures

Repeated clinical laboratory measurements serve as powerful indicators of an individual's health status and disease progression. These measurements are not only sensitive and objective but also provide valuable insights into underlying biological mechanisms. While it's relatively straightforward to identify genetic variants linked to the average or variability of these measurements through GWAS, the question of whether the trajectory of an individual's lab values is also genetically determined remains largely unexplored. In this project, we aim to uncover the genetic factors that shape the trajectory of lab values, leveraging 221 million measurements from 482,287 Finnish individuals in the FinnGen cohort. By gaining a deeper understanding of these genetic influences, we hope to develop clinically meaningful disease predictors that can better inform healthcare practices.
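
One simple way to turn repeated measurements into a trajectory phenotype is a per-individual least-squares slope of the lab value over time. The sketch below illustrates that idea only; the function name, the record layout and the toy values are assumptions, and the project may well rely on richer models than a straight-line fit.

```python
from collections import defaultdict


def trajectory_slopes(measurements):
    """Map each individual to the least-squares slope of value over time.

    `measurements` is an iterable of (individual_id, time_in_years, value)
    tuples. Individuals with fewer than two distinct time points are skipped
    because a slope is undefined for them.
    """
    by_person = defaultdict(list)
    for person, time, value in measurements:
        by_person[person].append((time, value))

    slopes = {}
    for person, points in by_person.items():
        n = len(points)
        t_mean = sum(t for t, _ in points) / n
        v_mean = sum(v for _, v in points) / n
        stt = sum((t - t_mean) ** 2 for t, _ in points)
        if stt == 0:
            continue  # single time point: no trajectory to estimate
        stv = sum((t - t_mean) * (v - v_mean) for t, v in points)
        slopes[person] = stv / stt
    return slopes


# Toy records (made up): a lab value measured over several years.
records = [
    ("A", 0.0, 90.0), ("A", 1.0, 88.0), ("A", 2.0, 86.0),
    ("B", 0.0, 75.0), ("B", 2.0, 75.0),
    ("C", 1.0, 60.0),
]
slopes = trajectory_slopes(records)
```

The resulting slope per individual could then be treated as a quantitative trait in a GWAS, alongside the mean and variability of the same measurement.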

Machine learning to enhance phenotypes in GWAS

In many genetic analyses, the current approach is to define a phenotype by comparing diseased and healthy individuals. However, when phenotypes are derived from registry or electronic health record data, misclassification can occur. We use the comprehensive phenotype information in FinnGen to train a machine learning algorithm (gradient boosted trees) to improve phenotype classification. The binary classification is replaced by probability values. Studying genetic associations with a continuous disease liability instead of a binary phenotype can help identify new genetic variants and improve the predictive power of polygenic risk scores.
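
The idea of replacing a binary label with a predicted probability can be sketched with scikit-learn's gradient boosting classifier. Everything below is an illustrative assumption: the data are simulated, the feature and label names are made up, and the hyperparameters are arbitrary rather than those used in the project.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import cross_val_predict

rng = np.random.default_rng(0)

# Simulated stand-in data: 400 individuals, 5 made-up phenotype features.
n_individuals, n_features = 400, 5
features = rng.normal(size=(n_individuals, n_features))
liability = features[:, 0] + 0.5 * features[:, 1] + rng.normal(size=n_individuals)
registry_label = (liability > 1.0).astype(int)  # noisy binary case/control label

model = GradientBoostingClassifier(n_estimators=100, max_depth=3, random_state=0)

# Out-of-fold predictions, so no individual is scored by a model that was
# trained on that individual's own label.
probabilities = cross_val_predict(
    model, features, registry_label, cv=5, method="predict_proba"
)[:, 1]
```

Here `probabilities` holds one value between 0 and 1 per individual, which could be used as a continuous phenotype in place of `registry_label` in a downstream association analysis.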

The nature of nurture: the interplay between genetics and the social environment in impacting health

The goal of the project is to explore the interaction between genetic endowments and socio-economic conditions in shaping health and socio-economic outcomes over the life course. We do so by taking advantage of rich Finnish administrative data on socio-economic status, health, and genes. Many common diseases have a strong genetic component, but to better understand how such health conditions develop, we need to identify which environmental factors contribute to their emergence. Moreover, health conditions can themselves be attenuated or enhanced by socio-economic outcomes. It is therefore important to take both health and socio-economic factors into account in the same setting. To explore these questions, we have linked approximately 180,000 individuals with available genetic information to detailed health and socio-economic information from Statistics Finland, including income, education and labour market information. This project is carried out in collaboration with Stefano Lombardi at VATT, THL and Blood Service Biobank. See privacy notice.

Beyond Chromosomes: Unveiling the Sex-Specific Proteome and Its Impact on Human Health

This project aims to investigate the sex-specific human proteome and the underlying factors of variability both between and within the sexes. By analyzing a large-scale proteomics dataset from multiple studies, this project evaluates sex biases in proteins, analyzes correlation structures that drive sex differences in the proteome, and develops a machine-learning model to quantify proteomic sex and its variability. The goal is to deepen our understanding of how sex-specific proteomic profiles influence biological diversity and disease susceptibility, enhancing knowledge of sex differences beyond genetic factors alone.

AI for health

ML4health - AI foundation models for nationwide health and socio-economic data

ML4health combines health data with a wide range of other information from nearly the whole Finnish population. The research data include information about diagnoses, treatment of diseases, medications, laboratory values, medical notes, home care, and institutional care. In addition, they include basic personal information, living history, marriage history, pregnancies and births, education, job position, social assistance, and dates and causes of death. By leveraging these high-resolution longitudinal data, we aim to develop self-supervised health data foundation models and other modern AI approaches to improve prediction of phenotypes with scarce training data, such as rare cancers or disease progression and recurrence. These models are known to benefit from large, heterogeneous training data, making the Finnish registries ideal for their development. With our European collaborators, we will also study the performance of such models across different European healthcare systems. See privacy policy.

Improving the transferability of multimodal AI for disease risk prediction

The rising burden of non-communicable diseases underscores the urgent need for more effective preventive healthcare strategies to reduce growing societal and economic costs. Electronic Health Records (EHRs) offer valuable opportunities for predicting future disease risk by drawing on individuals' past health trajectories. However, current prediction models often rely heavily on country-specific coding systems and healthcare structures, limiting their transferability across different settings. At the same time, the integration of diverse data modalities—such as clinical, genomic, and lifestyle data—into these models remains limited, hindering their predictive power. This project aims to address both challenges by developing methods to enhance the portability of AI models across healthcare systems and to expand their capacity to incorporate multimodal data. Special attention will be paid to ensuring fairness in prediction, particularly in light of socioeconomic disparities that may affect model performance when applied across populations and countries.

PRIMUS consortium: machine learning for doctor-patient networks

In the PRIMUS consortium, we're setting out to create new ways to understand large-scale networks and to build learning models that are tailored to each user, mainly to assist doctors in making better decisions. A big part of this project looks at how doctors' own backgrounds and behaviors might affect their decisions and the outcomes for their patients. We'll use machine learning to predict patient outcomes based on things like what doctors specialize in, what medicines they prescribe, and their own life experiences. We're also interested in seeing how major events in a doctor's life might change the way they care for patients. This project is done in collaboration with Sami Kaski at Aalto University/FCAI and Markus Perola at THL. See privacy notice.

Can AI identify people with the highest need for a blood test checkup?

Clinical laboratory tests are essential to monitor overall health and diagnose various diseases. Yet current testing strategies are inefficient and, because they are not applied systematically, exacerbate societal inequalities. In this project, we develop and prospectively validate a deep-learning approach to identify individuals who would benefit most from targeted screening for clinically actionable biomarkers. We use nationwide data from FinRegistry (N=5.2 million) and genetic information from FinnGen (N=520,000) to develop an AI model that predicts laboratory value trajectories, and we prospectively validate the model by inviting a random subset of individuals with predicted abnormal kidney function for screening.