
Can Artificial Intelligence Prevent Diseases? – With the help of self-learning methods, researchers analyze which risk of disease lies in our genes

Portrait of Prof. Dr. Christoph Lippert
Photo: Tobias Hopfgarten
Prof. Dr. Christoph Lippert has been a professor at the University of Potsdam and a research group leader at the HPI since 2018.

About 1.7 million people from Europe and the US provide the database for a large-scale interdisciplinary research project. Their genetic information and health data are precisely analyzed and interlinked. With the help of new artificial intelligence methods, researchers want to find out how genetic predispositions influence the risk of disease and how medicine can make use of this knowledge.

It was a mammoth project in the history of genetics and took more than ten years: in 1990, a research consortium set out to completely decode the human genome. Over 1,000 scientists from 40 countries took part in the Human Genome Project. Biochemist Craig Venter’s private corporation Celera also accepted the challenge and worked on sequencing the genetic material at the same time as the research teams. In 2001, both the corporation and the researchers could pop the corks: they had arrived at the result by different routes and deciphered the exact sequence of the approximately 3.4 billion base pairs that make up human DNA – albeit still somewhat incompletely.

The interplay of genes and living conditions determines the risk of disease

More than 20 years later, the sequencing of the human genome has become routine in biotechnology laboratories. Even the gaps have largely been closed thanks to improved procedures and technologies. Today, sequencing takes only 24 hours and, at a few hundred dollars, costs only a fraction of the initial sum. This opens up completely new possibilities for medicine – after all, our more than 19,000 genes also hold a lot of health information.

“If I sequence your genome, I will most likely discover risks for certain diseases,” explains bioinformatician Christoph Lippert, Professor of Digital Health and Machine Learning at the University of Potsdam and research group leader at the Digital Health Center of the Hasso Plattner Institute (HPI). Which diseases these are, how high the risk of disease is, how it can be reduced through prevention and how the disease is best treated – the scientist investigates all this in the INTERVENE research project, which is funded by the European Union with ten million euros over five years and which includes 17 institutes from all over Europe and the US.

Whether cardiovascular diseases, diabetes, breast cancer, or prostate cancer, numerous diseases have a strong genetic component. Mutations in certain parts of the genetic material increase the risk of disease. Changes in the so-called “breast cancer genes” BRCA1 and BRCA2, for example, which are responsible for the onset of the disease in 5–10% of all breast cancer patients, are well known and researched. Women with certain genetic changes in these high-risk genes have a 50–80% higher risk of developing breast cancer. In addition, they develop the disease about 20 years earlier than women who do not have these mutations. However, it is often not only individual genetic variants that determine whether we get diabetes or cancer. Rather, the risk of disease is influenced by the interaction of numerous genetic components and living conditions.

Health data from several decades

To link and decode all this health-relevant information, the INTERVENE researchers rely on artificial intelligence. Their goal is to develop new methods to accurately measure the risk of developing certain diseases that can be read from the genome. To do this, the researchers can draw on genome and health data from a total of 1.7 million people from Europe and the US.

These data come from so-called “biobanks”, which are comprehensive databases from large health studies with voluntary test subjects. The two largest, with data sets of 500,000 people each, are from Great Britain and Finland. In addition, there are smaller biobanks from other European countries and the US, which represent a cross-section of the population. The subjects are usually observed over a period of several decades and undergo medical checkups on a regular basis. This includes taking blood or saliva samples, from which the genome is read. This provides researchers with very comprehensive data on various diseases, living conditions, and risk factors, which can be linked to genetic information.

Artificial intelligence identifies important biomarkers

“The big goal is to improve medical care,” says senior medical scientist Dr. Henrike Heyne. She and her team take a close look at genetic risk factors for various diseases. “We are investigating polygenic risk scores,” she explains. In this process, the researchers examine thousands of common genetic variants that individually do not increase the risk of disease. Taken together, however, such small mutations can have a major impact on the onset of diseases such as cancer, diabetes, or heart attacks. “If we are able to better predict who has a higher risk of falling ill, we can optimize screenings and prevention programs,” Heyne explains. This is not only of great benefit to the individual, but also to the community.
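The idea behind a polygenic risk score can be illustrated with a minimal Python sketch. It assumes the common additive form, in which each variant contributes its weight multiplied by the number of risk-allele copies a person carries (0, 1, or 2). The variant names and weights below are hypothetical placeholders, not values from INTERVENE, and the article does not say which scoring method the project uses.

```python
from typing import Dict

# Hypothetical per-variant weights (e.g. log odds ratios from an association
# study). These numbers are illustrative only, not INTERVENE results.
EFFECT_SIZES: Dict[str, float] = {
    "variant_a": 0.05,
    "variant_b": 0.02,
    "variant_c": -0.03,
}


def polygenic_risk_score(dosages: Dict[str, int],
                         effect_sizes: Dict[str, float]) -> float:
    """Additive score: sum over variants of weight * risk-allele copies."""
    score = 0.0
    for variant, weight in effect_sizes.items():
        # Simplifying assumption: a variant missing from the genotype
        # counts as zero copies of the risk allele.
        copies = dosages.get(variant, 0)
        if copies not in (0, 1, 2):
            raise ValueError(f"invalid allele count for {variant}: {copies}")
        score += weight * copies
    return score


# One hypothetical person: two copies of variant_a, one of variant_b.
person = {"variant_a": 2, "variant_b": 1, "variant_c": 0}
score = polygenic_risk_score(person, EFFECT_SIZES)
```

Each weight here is small, which mirrors the point made above: no single common variant moves the score much, but thousands of them added together can separate people with higher and lower genetic risk.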

Remo Monti, who is doing his doctorate in the project, examines the genetic foundations of diseases and analyzes them with artificial intelligence. With the help of AI models, he has identified genes from biobank data that are associated with certain blood biomarkers. These biomarkers are the body’s own signal substances and molecules that can indicate diseases. In total, Monti identified 117 genes in which genetic variants can potentially affect blood biomarkers such as cholesterol. “The great thing is that, with this method and the large database, we can also analyze how very rare mutations influence blood biomarkers,” Prof. Lippert emphasizes. These rare gene mutations can have a major impact on a person’s health, but have hardly been researched because the available data are still insufficient. New models are intended to close this gap and help to predict the functions of these rare mutations at the molecular level.

Ethical questions are the most difficult aspect

It is a first, encouraging result that the Potsdam researchers can already record after just one year of work. While machine learning tools and theories on artificial intelligence are being developed, further optimized, and applied in Potsdam, other INTERVENE groups are working on concrete clinical applications of these tools. In pilot studies, patients who have a high genetic risk for breast cancer and certain cardiovascular diseases are to be informed about this and given medical care. In these intervention studies, those who are already ill will be treated with adapted therapies, and those who are still healthy but have a high risk of disease will receive preventive medical supervision. Finally, the comparison of high-risk and low-risk groups is expected to show whether the measures are successful and can reduce disease rates.

“I am a computer scientist. For the biological and medical questions and for interpreting our results, it is necessary to cooperate with other research partners,” Lippert emphasizes. Accordingly, he closely cooperates with Charité Berlin, for example. But the most difficult questions to answer – according to the researcher – are the ethical ones. This involves data protection and privacy, but also how resources for early screening are best distributed, at what point treatment actually becomes necessary, and how each individual deals with the knowledge of his or her personal risk of disease. “A lot of research and learning is still to be done,” Lippert says.

The Project

“INTERVENE (International consortium for integrative genomics prediction)” is an international and interdisciplinary consortium of 17 leading research and other organizations. The researchers are developing new technologies to better diagnose, treat, and prevent diseases. To do this, they use data from genetic information, which they analyze with new AI-based methods.

Funding: the European Union's Horizon 2020 research and innovation program
Duration: 2021–2025
Participants: University of Helsinki, European Molecular Biology Laboratory, University of Siena, Norwegian University of Science and Technology, University of Tartu, BBMRI-ERIC, Technical University of Munich, CSC – IT Center for Science, Hasso Plattner Institute, Aalto University, HUS Helsinki Biobank, University of Cambridge, Massachusetts General Hospital, University of Turin, European Cancer Patient Coalition, Ttopstart, Queen Mary University of London

The Researchers

Prof. Dr. Christoph Lippert studied bioinformatics at Ludwig-Maximilians-Universität München and obtained his doctoral degree at the University of Tübingen. Since 2018, he has been Professor for Digital Health and Machine Learning at the University of Potsdam and research group leader at the HPI.
Email: office-lippert@hpi.de

Dr. Henrike Heyne studied medicine at Leipzig University and obtained her doctoral degree at the Max Planck Institute for Evolutionary Anthropology. Since 2020, she has been Senior Researcher at the HPI (University of Potsdam) where she is a research group leader.
Email: Henrike.Heyne@hpi.de

Remo Monti studied biotechnology at the ETH in Basel. He is doing his doctorate in the group of Prof. Lippert at the HPI and Prof. Ohler at the Max Delbrück Center.
Email: remo.monti@hpi.de


This text was published in the university magazine Portal Wissen - One 2023 “Learning” (PDF).


Online editorial

Sabine Schwarz


Susanne Voigt