Completed Projects

Machine Learning for On-Board Diagnostics

Principal Investigator: Tobias Scheffer
Funding: IAV GmbH
Duration: 2016-2018

This project explores embedded predictive models for on-board diagnostics of SCR systems.

Risk Analysis for Peer-to-Peer Loans

Principal Investigator: Tobias Scheffer
Funding: Bitbond GmbH
Duration: 2015-2019

In this project, we analyze the risk of online peer-to-peer loans based on PayPal and eBay account histories and other available information.

Automatic Generation of Sales Leads

Principal Investigator: Tobias Scheffer
Funding: Datalovers AG
Duration: 2015-2018

We create a technology that analyzes CRM data and a wide range of corporate data in order to identify leads with a high chance of conversion.

Prediction Games: Parallel Robust Machine Learning Algorithms

Principal Investigator: Tobias Scheffer
Funding: German Science Foundation DFG
Duration: 2014-2017

Machine learning addresses problem settings that involve automatic model building from data and predicting the future behavior of the system that is reflected in the data. Most results of this field are based on the assumption that available data and future behavior of the system are governed by the same probability distribution. This assumption is an oversimplification for applications in which an active adversary can influence the future behavior of the system. This is the case, for instance, for detecting phishing emails and fraudulent credit card transactions.

In the preceding project, we modeled adversarial learning problems using paradigms of game theory. We identified conditions under which non-zero-sum prediction games have a unique equilibrium point; such points are an optimal solution when learner and adversary both aim at minimizing their own cost functions. We derived primal and dual learning algorithms for static prediction games in which learner and adversary act simultaneously, without knowing the action that their opponent chooses. We also derived learning methods for problems in which the adversary can react to the model chosen by the learner, and for problems in which the learner is uncertain about the exact cost function that the adversary is trying to minimize. In the context of email spam filtering, we observe empirically that predictive models generated by game-theoretic learning algorithms maintain high accuracy for longer periods of time than models generated by learning algorithms that do not account for an active adversary. However, game-theoretic learning algorithms have to solve complex optimization problems; these methods are not immediately practical for large data sets.

In the succeeding project, we want to focus on scalable solutions to robust learning problems that can be executed in parallel on GPU and cluster architectures. The highest degree of parallel execution can be attained by algorithms that first solve subproblems entirely in parallel, and then aggregate the solutions to these subproblems into an overall model in a single, final aggregation step. However, not all optimization problems can be solved by algorithms that have this structure. The goals of the project therefore are the theoretical analysis of this approach to parallel, robust learning and the development and empirical analysis of scalable, parallel, robust (in particular, game-theoretic) machine learning algorithms. A central element of the investigation is the analysis of whether, and how quickly, algorithms that follow this structure converge towards the optimal solution of the underlying optimization problem. We will explore several approaches to splitting robust learning problems into subproblems that can be solved in parallel. We will study properties of parallel solutions to zero-sum, static non-zero-sum, and Stackelberg prediction games both theoretically and empirically.
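The "solve subproblems in parallel, then aggregate once" structure can be illustrated with a minimal sketch. It is not one of the project's algorithms: it swaps in plain one-dimensional ridge regression with parameter averaging as the aggregation step, and the function names, the synthetic data, and the regularization constant are all assumptions made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor
import random


def fit_ridge_1d(shard, reg=0.1):
    """Closed-form ridge solution for y ~ w * x on one shard of (x, y) pairs."""
    sxx = sum(x * x for x, _ in shard)
    sxy = sum(x * y for x, y in shard)
    return sxy / (sxx + reg)


def one_shot_average(data, n_shards=4, reg=0.1):
    """Fit each shard independently, then aggregate once by averaging."""
    shards = [data[i::n_shards] for i in range(n_shards)]
    # The shard fits share no state, so they can run fully in parallel.
    with ThreadPoolExecutor(max_workers=n_shards) as pool:
        local = list(pool.map(lambda s: fit_ridge_1d(s, reg), shards))
    # Single, final aggregation step.
    return sum(local) / len(local)


# Synthetic data (assumed): true slope 2.0 plus Gaussian noise.
rng = random.Random(0)
data = []
for _ in range(400):
    x = rng.uniform(-1.0, 1.0)
    data.append((x, 2.0 * x + rng.gauss(0.0, 0.1)))

w_avg = one_shot_average(data)   # parallel fits + one aggregation
w_full = fit_ridge_1d(data)      # reference: fit on all data at once
print(w_avg, w_full)
```

Even in this simple case the averaged solution is close to, but generally not identical to, the solution obtained on the full data set, which is why the convergence of such schemes towards the true optimum needs to be analyzed.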

Not Too Long; Did Read

Principal Investigator: Tobias Scheffer
Funding: Golem Media
Duration: 2016-2017

The goal of this project is to create a technology that is able to automatically expand documents by identifying related information that fits a given context.

Efficient Algorithms for Embedded Face Identification

Principal Investigator: Tobias Scheffer
Funding: Asaphus Vision GmbH and IBB Business Team
Duration: 2015-2016

The goal of this project is to develop and improve algorithms for efficient landmark localization and real-time face identification on embedded systems.

Prediction Games

Principal Investigator: Tobias Scheffer
Funding: German Science Foundation DFG
Duration: 2010-2014

Most results in machine learning rely on the assumption that training data reflect the future behavior of the system under investigation. This assumption oversimplifies reality when an active adversary can exercise some control over the future behavior of the system. This is the case, for instance, with the identification of phishing attacks or credit card fraud. Here, model building becomes a game between learner and adversary. Game theory models such interactions as interleaved optimization problems. Since data-dependent optimization criteria are not a focus of game theory, many questions remain open today. Based on game-theoretic paradigms that model various patterns of interaction between players, the project aims at analyzing prediction games. In particular, the project will investigate learning models that constitute optimal solutions to prediction games under defined circumstances.

Multimedia Retrieval

Principal Investigator: Tobias Scheffer
Funding: STRATO AG
Duration: 2009-2013

The goal of this project is to evaluate and develop technology that enables intuitive and intelligent ways of navigating large photo and video collections.

Scalable Ranking of Online Ads

Principal Investigator: Tobias Scheffer
Funding: nugg.ad AG
Duration: 2007-2013

In this project, we investigate efficient algorithms that predict which ad a user is most likely to click on, based on that user's past clicking behavior and all other information that is available.

Modelling and Optimization of Dialysis Treatment

Principal Investigator: Tobias Scheffer
Funding: Fresenius-affiliate NephroCare e-Services GmbH
Duration: 2008-2012

We investigate model-building and the generation of actionable knowledge from records of dialysis treatments.

Differing Training and Test Distributions in Active Learning

Principal Investigator: Tobias Scheffer
Funding: Google Research Award
Duration: 2009

Active learning reduces the labeling effort incurred by applying machine learning algorithms. Active learning procedures direct the attention of a labeler towards examples whose label is believed to convey a maximum of information. Labeled samples in active learning are governed by a distribution that differs from the natural test distribution for multiple reasons. An initial labeled sample may be compiled from auxiliary data sources; the natural input distribution may change over time, or may be altered by an adversary. In addition, and specific to active learning, an active instance selection procedure creates a labeled sample that is biased by the selection criterion. Treating the artificially selected sample in active learning as if it were governed by the test distribution is not necessarily the best course of action. We will understand, develop, and evaluate systematic approaches to active learning that account for this discrepancy between labeled training and test distributions.
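One standard way to account for a sample that is biased by the selection criterion is importance weighting: each drawn instance is weighted by the ratio of its test probability to its selection probability. The sketch below is not the project's method; it only illustrates the effect on a toy pool. The pool, the loss values, and the selection rule that prefers high-loss instances are assumptions made for illustration.

```python
import random


def naive_and_weighted_mean(losses, select_weights, n_draws, seed=0):
    """Estimate the mean loss under a uniform test distribution from a
    sample drawn according to a biased selection distribution."""
    rng = random.Random(seed)
    total = sum(select_weights)
    q = [w / total for w in select_weights]   # selection distribution
    p = 1.0 / len(losses)                      # uniform test distribution
    drawn = rng.choices(range(len(losses)), weights=q, k=n_draws)

    # Naive estimate: treats the selected sample as if it came from p.
    naive = sum(losses[i] for i in drawn) / n_draws

    # Self-normalized importance-weighted estimate with weights p / q.
    iw = [p / q[i] for i in drawn]
    weighted = sum(w * losses[i] for w, i in zip(iw, drawn)) / sum(iw)
    return naive, weighted


# Toy pool (assumed): instance i has loss i / 1000; the selector
# prefers instances with a high loss.
losses = [i / 1000 for i in range(1000)]
select_weights = [loss + 0.1 for loss in losses]
true_mean = sum(losses) / len(losses)

naive, weighted = naive_and_weighted_mean(losses, select_weights, n_draws=5000)
print(true_mean, naive, weighted)
```

The naive estimate is pulled towards the instances that the selector favors, whereas the weighted estimate stays close to the mean under the test distribution.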

Mining Jazz Data to Assess Development Processes

Principal Investigators: Andreas Zeller, Tobias Scheffer
Funding: IBM, Jazz Faculty Grant
Duration: 01/2008-12/2008

What is it that makes a good development process? We want to develop a plug-in that learns from collaboration and defect data as tracked by Jazz, relates features of the collaborative development process to the defect density of individual components, and thereby automatically predicts code quality. For instance, the plug-in might advise that package P should be reviewed more, because a new dependency on compiler internals has been added shortly before the release date by a developer who is new to the team.

Text Mining: Knowledge Discovery in Text Databases and Efficient Document Processing

(German project title: Text Mining: Wissensentdeckung in Textsammlungen und Effizienz von Dokumentenverarbeitungsprozessen)

Principal Investigator: Tobias Scheffer
Funding: German Science Foundation DFG, Emmy Noether Program
Duration: June 2003 through June 2008

The number of documents available in archives and on the web is growing exponentially. This growth induces a demand for methods that automatically analyze large volumes of documents and that discover and utilize valuable knowledge contained in them. A substantial part of our working processes consists of processing (i.e., reading, writing, manipulating) documents. Many tools support the administration of text documents, such as file systems, databases, or document management systems. Much greater effort (and expense), however, is imposed by the actual document manipulation processes, such as writing documents. Any support of document manipulation processes requires substantial knowledge; it is therefore much more difficult to support document processing than document administration.

The goal of the "Text Mining" project is to develop and study text mining algorithms that discover knowledge in large document archives, and utilize this knowledge to support future text manipulation processes.

  • One project goal is to develop, and to study the properties of, efficient active learning algorithms that generate sequence models from example documents. Such statistical models are able to segment, classify, and extract information from documents. Statistical sequence models can be used in several ways to support, and enhance the efficiency of, document manipulation processes.
  • While text mining methods make it possible to extract information from textual documents and to translate this information into a structured representation, data mining algorithms are able to discover knowledge (e.g., patterns or rules) in structured databases. Our goal is to study how these steps can be interleaved automatically, allowing discovery of knowledge that is hidden in textual archives.
  • Our ultimate project goal is to develop and study methods that discover knowledge in document archives, and utilize this knowledge to effectively support future document manipulation processes. As a prototype, we will develop a sentence completion function for natural language. Based on stored documents, the system is to generate a domain- and user-specific language model representing frequently used phrases and their semantic context. In the application phase, the system will analyze text fragments entered by the user of, for instance, a word processing system. Based on the analysis of the text fragment entered so far, the system will propose the most likely completion of a sentence, if this completion can be derived from the acquired knowledge.
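The sentence completion idea in the last item can be sketched with a very small bigram model. This is only an illustration under simplifying assumptions, not the project's system: the model ignores semantic context, the example sentences are invented, and the names train_bigrams, complete, and the min_prob threshold are hypothetical. The threshold stands in for the condition that a completion is proposed only if it can be derived from the acquired knowledge.

```python
from collections import Counter, defaultdict

END = "</s>"  # end-of-sentence marker


def train_bigrams(sentences):
    """Count, for every word, which words follow it in the stored documents."""
    model = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.lower().split() + [END]
        for prev, nxt in zip(tokens, tokens[1:]):
            model[prev][nxt] += 1
    return model


def complete(model, fragment, max_words=10, min_prob=0.5):
    """Greedily extend the fragment with the most likely next word, and
    stop as soon as the model is not confident enough."""
    tokens = fragment.lower().split()
    if not tokens:
        return ""
    completion = []
    last = tokens[-1]
    for _ in range(max_words):
        counts = model.get(last)
        if not counts:
            break  # word never seen: nothing can be derived
        word, freq = counts.most_common(1)[0]
        if word == END or freq / sum(counts.values()) < min_prob:
            break
        completion.append(word)
        last = word
    return " ".join(completion)


# Invented example corpus standing in for a user's stored documents.
corpus = [
    "please find the report attached",
    "please find the invoice attached",
    "please find the report attached",
    "thank you for your message",
]
model = train_bigrams(corpus)
print(complete(model, "please find"))    # the report attached
print(complete(model, "thank you for"))  # your message
```

For a fragment that ends in a word the model has never seen, the function returns an empty string and therefore proposes nothing.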

Data and Text Mining in Quality and Service

Principal Investigator: Tobias Scheffer
Funding: DaimlerChrysler AG
Duration: 08/2005-07/2008

We study the problem of discovering trends and new developments in production and warranty databases as well as in workshop reports. We develop technologies that automatically identify such trends and discover their hidden causes. The goal of this project is the constructive analysis of data mining methods that lead to improved service processes by integrating and analyzing textual information and data from multiple, heterogeneous and distributed databases.