For most people, “big data” is just an abstract term, an obscure mass of numbers, data, and formulas. Emmanuel Müller, however, is at home in this world. He is able to make connections visible, identify patterns in a vast array of highly complex data, and gain new insight for science and industry. Müller is a data mining expert. He uses statistical methods to extract information from large data resources and focuses on humans, who ought to understand hidden patterns and unknown relationships in these data.
“I’m sorry; I’m a bit tired. It was a short night.” Emmanuel Müller became a father again only a few days ago. His second son just spent his first night at home – apparently quite a restless one. Müller nevertheless receives us in his office at 10 p.m.; the 35-year-old shows no signs of tiredness. Of course, he would also like to be with his family now, but “as a researcher, you live for your profession.”
“Our hypotheses come from the machine”
Two years ago, at only 33 years old, Müller came to Potsdam from Karlsruhe and became a professor at the Hasso Plattner Institute of the University of Potsdam (HPI). He heads the Knowledge Discovery and Data Mining Chair, a joint research group of HPI and the German Research Centre for Geosciences (GFZ). A computer scientist working as a geoscientist? Is that possible? “We are researching and developing data mining methods,” Müller explains. He and his team extract new patterns from big data and make unexpected connections in the data visible. There is a high demand for this in almost every research field. Whether the data come from gene sequencing or from research on climate or energy is initially secondary. “Data researchers do not research for a specific domain; they work across disciplines.” For Müller, this is precisely the appeal of his field of research. In geosciences, he is particularly interested in remote sensing data, which capture various phenomena on Earth, for example vegetation or greenhouse gases. His task is to develop methods to analyze them, bridging computer science and geosciences.
The available data are more extensive and complex than ever before. It is now even possible not only to test existing working hypotheses with data but also to derive new ones. Data analysis makes hidden structures and contexts visible – and opens up new perspectives. “Our hypotheses come from the machine,” Müller puts it in a nutshell. Humans are nevertheless the focus of his work. “The goal is to make these patterns comprehensible for individuals and to have humans verify them. Data science does not mean replacing humans.”
The idea for this approach came, however, from industry rather than science. In 2008, Müller – a research assistant at RWTH Aachen University at the time – was analyzing data that a car company had provided for a bachelor’s thesis as part of a joint project with the university. What followed surprised the researcher. “The anomalies in the data were not simple measurement errors but had to be manually examined and verified by the company.” Unfortunately, the company was initially unable to draw conclusions from the data mining results, because the existing methods were incapable of describing the anomalies to the user. The statistical method lacked the intuitive aspect and, thus, the human factor.
Change points indicate when something is changing in the system
This industrial project made the researcher aware of a new, widespread problem. Discussions with other industry partners and experts showed that the known data analysis and data mining methods often fail to meet their needs – a problem that the Helmholtz Association has recognized. Müller mentions the GFZ as a good example. “Scholars have to understand and question causal connections.” In the age of big data, correlations and predictions – such as potential buying behavior in advertising – are no longer enough to provide a deep understanding of the connections. Researchers need to know exactly what is actually in the data, which are divided into cohorts by algorithms and supplemented with information on statistical probabilities. Müller wants to strengthen the role of the human in big data science – using methods that more readily detect anomalies in data and enable a more sensitive evaluation. “This is when it started getting exciting for us! We addressed this subject in three doctoral theses and in several research projects, and we continue to research it,” Müller says.
Methods capable of identifying changes – so-called change points – are at the center of Müller’s field of research. They indicate that something in the system is changing – be it data on vegetation recorded by satellite or a hospital patient’s vital signs recorded by measuring devices. The procedures Müller is developing are intended not only to describe what is changing but also to explain why. “There are still too few procedures for this, and that motivates us.” Data science is a rapidly growing field. There are still only a few courses of study that train the experts so sought after by industry and academia. “We will only be able to meet this demand if we train at all levels – our students as well as company employees,” Müller asserts. The number of places at universities, however, is increasing. “We will close the gap in the coming years.”
Big data, data mining, and deep learning – there is an increasing number of technical terms in the world of data. Few people, however, actually know what they mean. Data will become increasingly significant in the coming years. Their analysis facilitates many things, generates new knowledge, and reveals previously unknown connections. The researcher emphasizes, however, that there is one thing algorithms are still unable to do: “Algorithms can support decision-making of, for example, corporate executives, politicians, or scholars, but humans still have to make the decision.”
Prof. Dr. Emmanuel Müller studied computer science at RWTH Aachen University. Since 2015, he has been Professor for Knowledge Discovery and Data Mining at GFZ and HPI, which formed the joint Digital Engineering Faculty with the University of Potsdam in April 2017.
Text: Heike Kampe
Translation: Susanne Voigt
Published online by: Marieke Bäumer
Contact for the online editorial office: onlineredaktion@uni-potsdam.de
Read this and other articles on research at the University of Potsdam in our research magazine Portal Wissen.