
I See What You See - Potsdam cognitive scientists develop models for predicting eye movements

Prof. Ralf Engbert. | Photo: Tobias Hopfgarten

For some time, cognitive scientists have been focusing on the human gaze, especially when it comes to reading and viewing images and objects. We know more and more about why we look at a specific spot and how we grasp what we are looking at. Researchers have now gone a step further, however, and are investigating whether the movements of our eyes may also reveal what we are going to look at next. Matthias Zimmermann took part in an eye-tracking experiment himself - and talked to cognitive scientist Prof. Ralf Engbert.

“Top left, right, down, up, bottom left, I have no idea.” The symbols that appear on the screen about five meters in front of me are getting smaller and smaller. At some point I have to guess. I wonder if others can still recognize what has already become blurred before my eyes. Just when I think I have failed, the friendly colleague explains to me that my visual acuity is perfect. Lucky me! Actually, I am here today to participate in an experiment on eye movement measurement. But before we really get started, they have to test whether I am allowed to participate. Next comes a cognitive test. I have to match symbols and numbers to each other within a time limit and then do a vocabulary test. I feel a little like I am back at school, but my ambition has been aroused and I give it my all. The experiment itself, however, has not even begun. Then it’s time for looking again! My eyes are thoroughly checked: focusing - okay, depth of focus - works, color vision - perfect, eye dominance - on the right. At some point I am completely registered, get a number for anonymization, and then I’m ready to hold my eyes up to the camera. But why? What the researchers will see when they see what I see, I do not know yet. I hope I will find out soon.

“Eye movement measurements are a versatile methodological tool that can be used in different disciplines,” says Ralf Engbert. The physicist is Professor of General and Biological Psychology and an expert on the mathematical modeling of eye movements and attention processes. “In Potsdam, we have established many scientific applications for eye movement analyses in psychology and linguistics.”
It is therefore hardly surprising that Engbert is involved in the eye movement projects of two DFG Collaborative Research Centers. Within the CRC 1287 “Limits of Variability in Language”, he and Prof. Shravan Vasishth investigate what eye movements reveal about language parsing processes. The project in the CRC 1294 “Data Assimilation” focuses on theoretical models that can describe eye movements for texts or pictures and how to predict where we look when we look at a scene. “There are already functioning models for static scenes,” Engbert says. The advertising industry, for example, already uses heat maps, which document the areas of images, graphics, or websites we are looking at particularly intensively. “We are working on dynamic cognitive models.” With their help, it should be possible not only to reconstruct on average but to intrinsically predict where someone will look next - and not only when looking at a photo but eventually in a real, changing environment. “The potential applications for such a prediction of eye movements in real time are enormous,” says Engbert. “Especially for human-machine interaction. An appropriately equipped assistance system in a car could warn you if you overlook the pedestrian on the side of the road because you are looking elsewhere.”

The laboratory looks unspectacular: a sober room lined with black curtains, two computers, chairs, and inconspicuous gray plastic goggles. That’s it. I’m still not impressed. That changes when the experimenter, Daniel Backhaus, puts the goggles, which have no lenses but are wired, on my nose and turns them on. A program window opens on one of the screens, and there I see what I see through the goggles. I shake my head and the live broadcast shakes as well. It gets even more bizarre as I step closer and see the screen within the screen, and another smaller one inside that, and one more and another, until they are too small to identify. I am reminded of a hall of mirrors. The goggles are calibrated before and also during the test series, altogether at least 20 times. For this I have to focus on three consecutive black dots on a white background. Everything is adjusted to the millimeter. Even the canvas is adjusted to my height, so I really look straight ahead and not up or down.

The eye-tracking glasses are the technical heart and the experiment’s “all-seeing eye”. They are equipped with several small cameras, some of which are aligned to the front and record what the subject sees. The others focus on the pupils and register their movements - down to the smallest detail and in real time. When they are combined, visual tracks are created that document where you look in the picture. The constantly repeated calibration ensures that the visual track and the viewed image ultimately fit together exactly.

Finally, the test begins. The experimenter places me on two so-called wobble boards, i.e. round plates mounted on a half ball, so that I must constantly shift my weight to keep my balance - in other words, more difficult conditions. About three meters in front of me is a large, white canvas, and directly above me a projector.
Wobbly like a surfer trying to stand on the board for the first time, I have to look at 15 photos. I see landscapes, streets, houses - and in between, animals: elephants in the savannah, seagulls in the harbor, monkeys in the zoo, dogs, horses, sheep, sometimes single animals, sometimes large groups. My job is to count how many animals I see in the photos. I have ten seconds for each picture. Then I can choose from three possible answers and must say my answer aloud. If I’m correct, the screen lights up in green and my “experiment account” is credited a few cents. If I am wrong, there is no nasty “eeh” as in a game show, but the screen turns red. Experiment or not: I cannot deny that I feel a bit under pressure. My count is often correct, but again and again I am wrong - and that annoys me.

Research on eye movement measurements has been around for some time, Engbert explains, but so far it has usually taken place under laboratory conditions. “Visual perception, however, serves as a preparation for actions. We look at a cup and then pick it up; we look at things and reflect upon their function.” Visual perception depends very much on the task and can therefore only be meaningfully examined in these contexts. “Our goal is to get the eye movement analysis out of the lab - without sacrificing scientific precision.”
The researchers in Potsdam do this in two ways. On the one hand, it is now possible to simulate more natural conditions thanks to the latest technology. For a long time, permanently installed trackers were used, where you had to put your chin on a support and were not allowed to move your head during the experiment. The new tracking glasses offer more mobility. Standing or turning your head - all that is no longer a problem. “It is already possible to walk across the campus with an eye tracker on your head,” says Engbert. “With this first step we make our subjects move within our different settings to create conditions that enable scientific conclusions about our natural behavior.”
On the other hand, the researchers simply “ensure” that the gaze of the test subjects is meaningful by giving them a task. In this case it is counting animals. That makes their eye movements not only realistic but also comparable. This, in turn, is a prerequisite for a good model that they can feed with the datasets and then refine. Of course, experiments become increasingly complex as they become more realistic. That does not make it easier for researchers. “Natural conditions present challenges in many fields,” says Engbert. He refers to the complexity of the models, for example. The more parameters have to be taken into account in an experiment, the more data must be incorporated into the modeling. “The only thing that helps to take advantage of the wealth of data without getting lost in its thicket is good theory.”

Finally, the first round is over and, relieved, I step down from my “surfboard”. A short rest, but I am still far from completing the experiment. Another 45 pictures lie ahead of me. For the next 15 pictures I stand firmly on my feet, so it should be a piece of cake. Nevertheless, I cannot relax; the pictures are demanding. I search in vain for some animals. Again and again red screens, and in between, I remember that measuring eye movements is a bit like mind reading. Experimenter Daniel Backhaus sees exactly where I am looking - including my glances at the slowly growing “reward account”, which I deny myself from now on. I am here to report on a research project and not to earn money. But then it crosses my mind: What if the whole experiment is a psychological study to find out who keeps eyeing the money and who does not? I pull myself together, I have to count animals. Eight sheep, none, three elephants. Wrong, damn!

Although the CRC 1294, to which I am also contributing with “my” experiment, was only started in autumn 2017, Engbert and his team of cognitive scientists have already achieved a lot in their subprojects. “Our mathematical models have improved considerably,” he says with some pride. Thanks to data assimilation, the dynamic cognitive models, which he developed together with the project partners, can do more than depict the sequential structure of vision. “With each new data point, we are better able to predict the next fixation point from the recorded sequences, that is, the point in a picture where the viewer looks next.”
But that’s not all. Improved modeling through data assimilation has enabled them to make more accurate predictions with less data. “Up to now, measurement data from many subjects was needed to formulate relatively static and general statements about eye movements. We have already come to a point where we are able to make individual predictions with data sets from individual subjects.”

Two more “photo rounds” are next. At least I am allowed to sit now, first on a kind of bar stool and then on a chair at a table. I have to put my head on a small supporting frame - and must not move it. This means I cannot say aloud how many animals I see. Instead, I’m shown three answer options: I have to look at one of the numbers and then close my eyes for a moment - and that is how I indicate my decision. Two ducks - blink - correct. I would like to order my food in a restaurant like this, I think, and wait for the next photo.

The fact that their work has been so successful is not least because of the particularly fruitful research climate at the CRC, says Engbert with enthusiasm. “People exchange their ideas over a long period of time - both on a theoretical and practical level.” Sometimes it becomes evident that some model approaches have more in common than was previously thought, even though they may come from earthquake research and eye movement analysis. Ultimately, everyone benefits from the joint work of experienced researchers in various disciplines and young doctoral students and postdocs. “The joint supervision of two principal investigators ensures that you always deal with different perspectives and new questions. And when young, interested people are ready to try new things, we all end up learning something.”

Finally, the test series is done, and I am completely exhausted. I am quite proud that I found so many animals, even if the task was actually just a sideline. Experimenter Daniel Backhaus finally asks me about my strategy for searching for the animals. I ponder. I scanned, looked for big fixed points, then looked at unclear spots more intensively, and then repeated everything again. In fact, after a few minutes, I had developed a procedure that seemed the most sensible to me for “animal searching in seconds”. Did others do it the same way? I hope I will find out when the experiment has been evaluated.

The Project

Ralf Engbert is involved in two subprojects of the Collaborative Research Center 1294 “Data Assimilation”. Project B03 – “Parameter inference and model comparison in dynamical cognitive models”, which he heads together with mathematician Prof. Sebastian Reich, examines data assimilation for dynamic cognitive models. The project focuses on improving mathematical models of eye movement control in reading, scene perception, and fixational eye movements. The goal is to develop efficient algorithms for data assimilation and model comparison in order to be able to predict eye movements in real time. Project B05 – “Attention selection and recognition in scene-viewing”, which Engbert heads together with computer scientist Tobias Scheffer, develops algorithms and mathematical models that describe eye movements while considering the individual characteristics of the viewer. A second goal is to derive discriminative models from such generative models of fixation sequences, making it possible to extract latent properties of the observer from the recorded fixation sequences. From the exact analysis of eye movements, it could be inferred, for example, whether the viewer is familiar with depicted persons or other images. In the long run, such models could be used in e-learning and in criminology.

The Researcher

Prof. Dr. Ralf Engbert studied physics at RWTH Aachen. Since 2008, he has been Professor of Experimental and Biological Psychology at the University of Potsdam.
Mail: ralf.engbert@uni-potsdam.de


This text was published in the university magazine Portal Wissen - Two 2019 “Data”.