Projects

Our group is interested in finding new ways to identify common preventable diseases early. To do that, we develop statistical and deep learning approaches and apply them to health information on millions of individuals from electronic health records and national health registries (e.g. FinRegistry). We then integrate registry-based information with genetic and proteomic information from large biobank-based studies (e.g. FinnGen) to help identify groups of individuals who can benefit most from existing pharmacological interventions. Finally, we aim to implement these approaches by recontacting individuals and establishing prospective, possibly randomized, clinical studies.

These are some examples of the projects we are running in the lab. There are many others; if you are interested in a specific project, contact aganna@broadinstitute.org

 
 

Human Genetics-related projects

 

INTERVENE consortium

We are coordinating INTERVENE, an international and interdisciplinary consortium that seeks to leverage vast but underused data resources to generate clinically actionable knowledge, improving the understanding of diseases and of treatment options tailored to individuals. Specifically, INTERVENE seeks to advance AI-facilitated analyses of complex medical data to develop genetic risk scores, which summarize the estimated effect of an individual’s genetic makeup on the risk of developing a particular disease. A central aim is to meet the urgent need for clinically validated risk scores that have predictive value for complex and rare diseases, are applicable to disease screening, and are understandable to clinicians and citizens. The INTERVENE consortium consists of 17 leading research and other organizations representing seven EU member states (Finland, Germany, Italy, Estonia, Austria, Belgium, and the Netherlands) as well as Norway, the United Kingdom and the USA.
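
As a rough illustration of what a genetic risk score is, the sketch below computes a score as a weighted sum of effect-allele counts. The variant names, weights and individuals are invented for this example and do not come from INTERVENE; real scores involve many more variants and are usually standardized against a reference population.

```python
def polygenic_score(weights, dosages):
    """Weighted sum of effect-allele dosages (0, 1 or 2 copies per variant).

    weights: variant -> estimated per-allele effect (hypothetical values here)
    dosages: variant -> number of effect alleles carried by one individual
    """
    return sum(beta * dosages[variant] for variant, beta in weights.items())


# Hypothetical per-allele effect sizes, for illustration only.
weights = {"variant_A": 0.12, "variant_B": -0.05, "variant_C": 0.30}

# Hypothetical genotypes for two individuals.
individuals = {
    "person_1": {"variant_A": 2, "variant_B": 1, "variant_C": 0},
    "person_2": {"variant_A": 0, "variant_B": 2, "variant_C": 1},
}

scores = {pid: polygenic_score(weights, d) for pid, d in individuals.items()}

for pid, score in scores.items():
    print(pid, round(score, 3))
```

A higher score corresponds to a higher estimated genetic predisposition under the assumed weights; it says nothing on its own about absolute risk.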

 

Quantify the impact of genetic variation on healthcare costs - The GenCOST consortium

We are coordinating the GenCOST consortium, a large multinational consortium comprising over 2 million individuals from cohorts across Europe, America and Australia, including several cohorts of diverse ancestry. The consortium aims to analyse the effects of genetic variation on healthcare costs in a diverse meta-analysis. The study will cover a range of healthcare costs, including hospitalizations, medications, medical procedures and GP visits, to estimate the total cost of an individual to a healthcare system. This will be done by leveraging numerous biobanks and incorporating registry data. The study aims to find the genetic factors responsible for increased healthcare costs for an individual, analysing them from both a biological and a public health perspective. It will also analyse the results from the perspective of socioeconomic status and in relation to several important disease outcomes. By understanding the genetic factors that contribute to healthcare costs, it may be possible to identify individuals who are at higher risk for certain health conditions and to develop more personalized and cost-effective treatment strategies.

 

New methods to study the genetic architecture of disease progression

Understanding disease progression is of high biological and clinical interest. Unlike disease susceptibility, whose genetic basis has been studied extensively, less is known about the genetics of disease progression and its overlap with disease susceptibility. In this project, we develop new methods to better study the genetic variation underlying disease progression. For example, we evaluate longitudinal changes in biomarkers and their genetic bases.

 

The GENEROOS study: a randomized diet intervention study in overweight individuals with high or low genetic risk for obesity

GENEROOS aims to determine whether a body mass index (BMI) polygenic score affects the effectiveness of a dietary/lifestyle intervention in reducing BMI among overweight individuals. This study leverages Finland's distinctive research environment, which allows access to a vast pool of individuals who can be recontacted and whose genetic data are stored in biobanks. Our goal in GENEROOS is to enlist approximately 200 participants to demonstrate the practicality of genetics-based recontacting methods. Additionally, we plan to investigate variations in blood biomarkers and high-throughput proteomics before and after the dietary interventions.

 

Genetic modifiers of GLP1-RA efficacy - The GLP1WL consortium

Glucagon-like peptide-1 receptor agonists (GLP1-RAs) represent a growing class of treatments for type 2 diabetes, gaining popularity due to their notable effects on weight loss. While randomized controlled trials have established the body weight-lowering efficacy of GLP1-RAs, understanding the magnitude of this effect in real-world patients remains a subject requiring further investigation.

The primary objective of this consortium, which combines data from multiple international biobanks, is to investigate the genetic bases of heterogeneity in weight loss outcomes associated with the use of GLP1-RA. The secondary objective is to assess and compare the weight loss effects of the pharmacological treatment with those of conservative surgical intervention.

 

Machine learning to enhance phenotypes in GWAS

In many genetic analyses, the current approach is to define a phenotype by comparing diseased and healthy individuals. However, when phenotypes are derived from registry or electronic health record data, misclassification can occur. We use the comprehensive phenotype information in FinnGen to train a machine learning algorithm (gradient-boosted trees) to improve phenotype classification. The binary classification is replaced by probability values. Studying genetic associations with a continuous disease liability instead of a binary phenotype can help identify new genetic variants and improve the predictive power of polygenic risk scores.
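
The idea of replacing a binary label with a probability can be sketched with a deliberately small, dependency-free gradient boosting routine. This is a toy under stated assumptions, not the pipeline used in FinnGen: the two features (counts of disease-related drug purchases and visits), the data and all function names are hypothetical, and the trees are single-split stumps fitted to log-loss pseudo-residuals.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_stump(X, residuals):
    """Least-squares fit of a one-split tree to the pseudo-residuals."""
    best = None
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            left = [r for row, r in zip(X, residuals) if row[j] <= t]
            right = [r for row, r in zip(X, residuals) if row[j] > t]
            if not left or not right:
                continue  # a split must separate the data
            lm, rm = sum(left) / len(left), sum(right) / len(right)
            sse = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
            if best is None or sse < best[0]:
                best = (sse, j, t, lm, rm)
    return best[1:]  # (feature index, threshold, left value, right value)


def stump_predict(stump, row):
    j, t, left_value, right_value = stump
    return left_value if row[j] <= t else right_value


def fit_boosted(X, y, n_rounds=50, learning_rate=0.3):
    """Gradient boosting on log-loss: each stump fits y - p, the negative gradient."""
    prevalence = sum(y) / len(y)
    base_score = math.log(prevalence / (1 - prevalence))
    scores = [base_score] * len(X)
    stumps = []
    for _ in range(n_rounds):
        residuals = [yi - sigmoid(s) for yi, s in zip(y, scores)]
        stump = fit_stump(X, residuals)
        stumps.append(stump)
        scores = [s + learning_rate * stump_predict(stump, row)
                  for s, row in zip(scores, X)]
    return base_score, stumps, learning_rate


def predict_proba(model, row):
    base_score, stumps, learning_rate = model
    return sigmoid(base_score + learning_rate * sum(stump_predict(s, row) for s in stumps))


# Hypothetical features: [disease-related drug purchases, disease-related visits].
# Labels are registry-based case (1) / control (0) status and may be misclassified.
X = [[5, 3], [5, 3], [5, 3], [5, 3], [0, 0], [0, 0], [0, 0], [0, 0]]
y = [1, 1, 1, 0, 0, 0, 0, 1]

model = fit_boosted(X, y)
liability_strong = predict_proba(model, [5, 3])  # profile of the registry "control" in row 4
liability_none = predict_proba(model, [0, 0])    # profile of the registry "case" in row 8

print(round(liability_strong, 2), round(liability_none, 2))
```

In this toy, the registry control with strong supporting evidence receives a high probability rather than a hard 0, and the registry case without supporting evidence a low probability rather than a hard 1; these continuous values are what would stand in for the binary phenotype in an association analysis.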

 

Genetically-informed emulated clinical trials

Using nationwide information on drug prescriptions and purchases and on disease outcomes in FinRegistry, we aim to answer causal questions regarding the effects of pharmacotherapies. We investigate potential adverse effects to inform medical practice and explore beneficial effects, including the potential for drug repurposing. Our main approach is emulating target trials using observational data. Our semi-automated framework allows both hypothesis-free exploration of effects across diseases and quick analysis of pre-specified questions. In parallel, we implement various study designs, including machine learning, within-family designs, and self-controlled studies, to rigorously assess causal effects. To strengthen our causal inferences, we leverage genetics within FinnGen as instrumental variables, allowing us to detect and control for residual confounding. This integration also sheds light on the genetic components influencing drug responses and disease outcomes, giving a more complete picture of the relationships between pharmacotherapies, genetic factors, and disease outcomes.
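
To illustrate why a genetic instrument can help with residual confounding, the sketch below simulates an exposure and an outcome that share an unmeasured confounder, then compares a naive regression slope with the Wald ratio, one of the simplest instrumental-variable estimators. Everything here is simulated and assumed for the example (the effect sizes, allele frequency, continuous exposure, and function names); it is not the framework described above.

```python
import random


def covariance(a, b):
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    return sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b)) / (n - 1)


def simulate(n, true_effect, seed=0):
    """Simulate genotype Z, exposure X and outcome Y with an unmeasured confounder U."""
    rng = random.Random(seed)
    genotype, exposure, outcome = [], [], []
    for _ in range(n):
        z = sum(rng.random() < 0.3 for _ in range(2))    # effect-allele count: 0, 1 or 2
        u = rng.gauss(0, 1)                              # confounder of X and Y
        x = 0.5 * z + u + rng.gauss(0, 1)                # exposure depends on Z and U
        y = true_effect * x + u + rng.gauss(0, 1)        # outcome depends on X and U only
        genotype.append(z)
        exposure.append(x)
        outcome.append(y)
    return genotype, exposure, outcome


def naive_slope(x, y):
    """Ordinary regression slope of Y on X; biased when U is unmeasured."""
    return covariance(x, y) / covariance(x, x)


def wald_ratio(z, x, y):
    """IV estimate: effect of Z on Y divided by effect of Z on X."""
    return covariance(z, y) / covariance(z, x)


TRUE_EFFECT = 0.3
z, x, y = simulate(n=20000, true_effect=TRUE_EFFECT)
naive = naive_slope(x, y)
iv = wald_ratio(z, x, y)

print("naive:", round(naive, 2), "IV:", round(iv, 2), "truth:", TRUE_EFFECT)
```

The naive slope is pulled away from the simulated effect by the confounder, while the Wald ratio stays close to it. This only holds because the simulated genotype satisfies the instrumental-variable assumptions by construction: it affects the exposure, is independent of the confounder, and influences the outcome only through the exposure.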

 

The nature of nurture: the interplay between genetics and the social environment in impacting health

The goal of the project is to explore the interaction between genetic endowments and socio-economic conditions in shaping health and socio-economic outcomes in life. We do so by taking advantage of rich Finnish administrative data on socio-economic status, health, and genes. Many common diseases have a strong genetic component, but to better understand how such health conditions develop, we need to identify which environmental factors are responsible for their emergence. Moreover, health conditions can themselves be attenuated or exacerbated by socio-economic outcomes. Thus, it becomes important to take both health and socio-economic factors into account in the same setting. To explore these questions, we have linked approximately 180,000 individuals who have genetic information available with detailed health and socio-economic information from Statistics Finland, including income, education and labour market data. This project is carried out in collaboration with Stefano Lombardi at VATT and the Blood Service Biobank.

 
 

Electronic health records-related projects

 

Develop deep learning approaches to model and generate disease trajectories from nation-wide registries

We aim to develop novel deep-learning approaches based on long short-term memory recurrent neural networks that leverage nation-wide information about diagnoses, medications, familial risk and socio-demographic indicators at an unprecedented scale to provide an accurate risk assessment of cardiometabolic diseases before “the patient steps into the doctor’s office”.

Moreover, for younger individuals who have had limited contact with the healthcare system, or for individuals with specific health trajectories, we aim to study whether genetic information can provide additional predictive value. Finally, recognizing the privacy challenges of using nation-wide data, we will use deep-learning-based methods that minimize privacy loss. In particular, we will generate synthetic health trajectories using generative adversarial networks.

 

FinRegistry

FinRegistry uses nationwide registry data to better understand and predict the onset of diseases in the Finnish population. We combine health data with a wide range of other information on nearly the whole Finnish population. The research data include information about diagnoses, treatment of diseases, medications, home care, and institutional care. In addition, they include basic personal information, family relations, residence history, marriage history, pregnancies and births, education, occupation, social assistance, and dates and causes of death. By leveraging these high-resolution longitudinal data, we aim to develop new ways to model the complex relationships between health and risk factors. FinRegistry is a joint research project with the Finnish Institute for Health and Welfare (THL). Check the project website: www.finregistry.fi

 

PRIMUS consortium: machine learning for doctor-patient networks

In the PRIMUS consortium, we're setting out to create new ways to understand large-scale networks and make learning models that are tailored to each user, mainly to assist doctors in making better decisions. A big part of this project looks at how doctors' own backgrounds and behaviors might affect their decisions and the outcomes for their patients. We'll use machine learning to predict how patients might do based on things like what doctors specialize in, what medicines they use, and their own life experiences. We're also interested in seeing how major events in a doctor's life might change the way they care for patients. This project is done in collaboration with Sami Kaski at Aalto University/FCAI and Markus Perola at THL.

 

Can AI identify people with the highest need for a blood test checkup?

Clinical laboratory tests are essential to monitor overall health and diagnose various diseases. Yet, current testing strategies are inefficient and, because they are not applied systematically, exacerbate societal inequalities. In this project, we develop and prospectively validate a deep-learning approach to identify individuals who would benefit most from targeted screening for clinically actionable biomarkers. We use the nationwide data from FinRegistry (N=5.2 million) and genetic information from FinnGen (N=520,000) to develop an AI model that predicts laboratory value trajectories, and we prospectively validate the model by inviting a random subset of individuals with predicted abnormal kidney function for screening.

 

Learning graph representations of familial relationships to understand health and diseases

Electronic health records (EHRs) and registry data are widely studied using deep learning approaches because they contain detailed information about a patient's medical history. A nationwide EHR system spanning multiple generations presents new opportunities for studying a connected network of medical histories for entire families. Family history is predictive of disease risk because it implicitly captures shared genetic, environmental and lifestyle factors of disease. This information has an underlying geometric structure that is itself informative for disease prediction. Specifically, the closeness of genetic relationships between relatives and attributes of each relative such as a history of the disease or associated risk factors can be represented as graph-structured data. In this project, we will use deep learning approaches to study these graph structures and the individual contributions of each family member to a patient's risk of disease.
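
A minimal sketch of such graph-structured family data is shown below: each person is a node with a disease attribute, and each edge carries a relatedness weight. The one-step, relatedness-weighted aggregation at the end is a hand-written stand-in for what a graph neural network layer would learn; the names, the relatedness values (0.5 for first-degree, 0.25 for second-degree relatives) and the family itself are assumptions made for this example.

```python
from dataclasses import dataclass, field


@dataclass
class Person:
    person_id: str
    has_disease: bool
    # Maps relative's id -> relatedness weight (edge attribute).
    relatives: dict = field(default_factory=dict)


def add_relationship(graph, a, b, relatedness):
    """Add an undirected, weighted edge between two family members."""
    graph[a].relatives[b] = relatedness
    graph[b].relatives[a] = relatedness


def weighted_family_history(graph, person_id):
    """Sum of relatives' disease status, each weighted by relatedness."""
    person = graph[person_id]
    return sum(weight * graph[relative_id].has_disease
               for relative_id, weight in person.relatives.items())


# Hypothetical family: node attributes are disease histories.
family = {
    "index": Person("index", has_disease=False),
    "mother": Person("mother", has_disease=True),
    "sibling": Person("sibling", has_disease=False),
    "grandfather": Person("grandfather", has_disease=True),
}
add_relationship(family, "index", "mother", 0.5)
add_relationship(family, "index", "sibling", 0.5)
add_relationship(family, "index", "grandfather", 0.25)
add_relationship(family, "mother", "grandfather", 0.5)
add_relationship(family, "mother", "sibling", 0.5)

index_score = weighted_family_history(family, "index")
print(index_score)
```

A learned model would replace the fixed weights with parameters and could propagate information over several hops, which is how the contribution of each family member to a patient's risk could be estimated rather than assumed.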

 

The Risteys portal

We created the Risteys web portal to give users insights into the distribution of health outcomes and medications in the Finnish population. The Finnish healthcare system has unique and extensive nationwide registries. We leverage this vast amount of data as part of FinnGen and have developed a data analysis pipeline that integrates expert knowledge on health registries. The output of this pipeline is then delivered on the web portal, where users can explore the data with the help of visualizations and interactive elements. The Risteys web portal will be expanded to include nationwide data. Check the project website: https://risteys.finngen.fi/