COSMOS consists of 15 distinct projects working on related themes in the analysis of complex signals, towards the shared goal of creating a single software package. This final homogenization of competences is the most important goal of COSMOS. The originality of the COSMOS research programme lies in the idea of matching data-based modelling with model-based data analysis. These two approaches have so far evolved quite independently, so that the techniques currently in use are often selected on the basis of personal preference rather than objective criteria.

Complex signals are the result of the (nonlinear) combination of several components, which do not necessarily correspond to well-defined subunits in real space. For this reason, identifying the relevant variables requires a careful mixture of top-down approaches, where knowledge of the microscopic dynamical rules allows different conjectures to be formulated and tested, and bottom-up approaches, which infer correlation properties among suitably chosen observables.

However, for the COSMOS approach to prove effective, it is not sufficient to validate given methods in specific contexts. It is also necessary:

- to identify which methods prove superior and under which conditions;
- to be able to properly integrate top-down with bottom-up approaches in order to make them available to a broad audience (including researchers with little, if any, expertise in the theory of dynamical systems and statistical mechanics).

**The top-down approach** will explore various mathematical models that are typically used to describe complex oscillatory systems: ordinary differential equations, differential-delay equations, coupled maps and cellular automata, embedded either in regular lattices or in network structures, with short- and long-range interactions. The main objective of this approach is the development of “dimension-reduction” techniques, that is, techniques for identifying a smaller subset of variables (e.g., the slow variables in the language of statistical physics) that can capture the relevant dynamics of the system. This will be achieved by combining tools and concepts from both nonlinear dynamics and statistical physics.
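
A minimal sketch of what a dimension reduction can look like, using adiabatic elimination of a fast variable, a textbook technique chosen here purely for illustration (it is not claimed to be the method the projects will use). The two-variable model, its parameters `eps` and `dt`, and the function names are hypothetical. Because the fast variable `y` relaxes quickly onto `y ≈ x**2`, the slow variable `x` alone approximately obeys the one-variable equation dx/dt = 1 - x - x^2.

```python
import math


def simulate_full(eps=0.01, dt=0.001, t_end=5.0):
    """Euler integration of a hypothetical fast-slow system:
        dx/dt       = 1 - x - y    (slow variable)
        eps * dy/dt = x**2 - y     (fast variable, relaxes onto y = x**2)
    Returns the trajectory of the slow variable x."""
    x, y = 0.0, 0.0
    xs = [x]
    for _ in range(round(t_end / dt)):
        dx = 1.0 - x - y
        dy = (x * x - y) / eps
        x, y = x + dt * dx, y + dt * dy
        xs.append(x)
    return xs


def simulate_reduced(dt=0.001, t_end=5.0):
    """Reduced one-variable model obtained by eliminating y (setting y = x**2)."""
    x = 0.0
    xs = [x]
    for _ in range(round(t_end / dt)):
        x += dt * (1.0 - x - x * x)
        xs.append(x)
    return xs


full = simulate_full()
reduced = simulate_reduced()
max_gap = max(abs(a - b) for a, b in zip(full, reduced))
fixed_point = (math.sqrt(5.0) - 1.0) / 2.0  # positive root of 1 - x - x**2 = 0
print(f"max |x_full - x_reduced| = {max_gap:.4f}")
print(f"final x = {full[-1]:.4f}, fixed point of reduced model = {fixed_point:.4f}")
```

Increasing `eps`, i.e. making `y` less fast, widens the gap between the two trajectories, which gives a simple check of when such a reduction stops being reliable.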

When the equations of motion for a set of variables are known, it might be sufficient to simulate the model on a computer. In most cases, however, explicit forms of these equations are not available, and/or there is not sufficient computational capacity to properly simulate the full dynamical model and/or to characterize its response as the model parameters vary over a huge parameter space. In particular, it is crucial to identify the variables that are most useful for understanding the collective dynamics, with a view to extracting them from experimental data.

A generic multi-component nonlinear system is characterized by many variables (they may be associated with different observables/components, or they may arise from the spatial dependence of some other observable, from the presence of delayed feedback, and so on). Many of the variables formally present in the original model may, however, correspond to inactive degrees of freedom; some may be corrupted by noise, some may give rise to intrinsically chaotic behaviour and be practically indistinguishable from noise, others may cluster together because of synchronization phenomena, and, finally, some variables may spontaneously emerge from a pool of seemingly unrelated degrees of freedom, in connection with the onset of a collective behaviour that is not encoded in the evolution equations of the single elements. The problem tackled by the top-down approach is precisely that of shedding light on these phenomena in generic contexts. Priority will be given to the following items:

- Presence of noise (both external and endogenous). Although it is well known how to include noise in the mathematical equations, the combination of noise with deterministic chaos may give rise to complex scenarios, with noise possibly present over different spatial and temporal scales. On the other hand, it is not obvious to what extent irregular dynamics can truly be assimilated to noise rather than being the signature of some not-yet-understood high-level computation. These issues will be explored by project 8, with the crucial help of tools taken from non-equilibrium statistical mechanics, and partly by project 4.
- Role of delay. Delay is unavoidably present in multi-component systems, due to the finite time needed for information to travel. In the context of neural systems it is well known that even a relatively small delay may drastically affect the stability of some states; moreover, from a mathematical point of view, delayed interactions add infinitely many new degrees of freedom to the system. How many of them are important enough to modify the behaviour of a multi-component system? This will be one of the issues tackled by project 3 and project 1.
- Connection topology. The topology and type of connections between the various components are yet another key element: in particular, it is important to identify those sub-systems that are effectively decoupled, or to understand the direction of coupling, so as to separate the contribution of internal degrees of freedom from the effect of the “environment”. To some extent, this question is implicitly present in all top-down projects. Project 13 will explicitly address this problem by studying various classes of networks (purely random, small-world, scale-free). The importance of coupling directionality will also be addressed in project 7, which deals with cell dynamics, a test case for various ideas.
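
The claim that delay adds infinitely many degrees of freedom can be made concrete: to advance a delay-differential equation one must store the whole history over one delay interval. The sketch below assumes the Mackey-Glass equation, a standard delay benchmark that is not taken from projects 1 or 3, with commonly used parameter values and a plain Euler scheme; the function name and defaults are illustrative. The discretized state is a buffer of m = tau/dt past values, so refining the time step enlarges the state without bound.

```python
from collections import deque


def mackey_glass(tau=17.0, dt=0.1, t_end=500.0, beta=0.2, gamma=0.1, n=10, x0=1.2):
    """Euler integration of the Mackey-Glass delay equation
        dx/dt = beta * x(t - tau) / (1 + x(t - tau)**n) - gamma * x(t),
    with constant initial history x(t) = x0 for t <= 0.
    Returns (trajectory, m), where m is the number of stored past values."""
    m = round(tau / dt)
    # The "state" is the whole history on [t - tau, t): m samples, oldest first.
    history = deque([x0] * m, maxlen=m)
    x = x0
    trajectory = [x]
    for _ in range(round(t_end / dt)):
        x_delayed = history[0]        # x(t - tau)
        dx = beta * x_delayed / (1.0 + x_delayed ** n) - gamma * x
        history.append(x)             # x(t) enters; x(t - tau) drops out (maxlen)
        x += dt * dx
        trajectory.append(x)
    return trajectory, m


# Refining the time step enlarges the discretized state without bound.
for dt in (0.1, 0.05, 0.01):
    _, m = mackey_glass(dt=dt, t_end=50.0)
    print(f"dt = {dt}: {m} stored values")   # 170, 340, 1700
```

A plain `deque` with `maxlen` is used as a ring buffer so that each step costs O(1); higher-order schemes would additionally need interpolation between stored samples.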

Finally, as mentioned above, one of the remarkable features of large ensembles of oscillators is the emergence of non-trivial collective behaviour. This phenomenon is essentially the extension of the concept of phase transitions to out-of-equilibrium systems, and owes its importance to the fact that similar or analogous scenarios can emerge from rather different “microscopic” environments. The underlying questions are so challenging that they have not yet been clarified even in systems of phase oscillators, i.e. dynamical systems characterized by a single variable: the phase. Particularly appealing are chimera states, which result from an unexpected symmetry breaking whereby a single ensemble of identical oscillators splits into two subpopulations with qualitatively different dynamics (synchronous vs. asynchronous). This problem will be explored by project 1, starting from simple setups with a view to developing a general theory, and by project 11 in the presence of heterogeneous structures, to clarify how such phenomena can shed light on the behaviour of interacting populations.
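
The emergence of collective synchronization in phase oscillators is usually introduced through the Kuramoto model, used here only as a standard illustration; the chimera states mentioned above require a different setup (e.g. non-local coupling with a phase lag) and are not reproduced. The sketch assumes Gaussian natural frequencies with standard deviation `sigma`, for which Kuramoto's theory predicts the onset of synchrony at K_c = 2/(π g(0)) ≈ 1.6 sigma in the limit of infinitely many oscillators. The order parameter r is close to 0 for incoherent phases (up to finite-size fluctuations) and close to 1 for full synchrony; `N`, `dt` and `t_end` are arbitrary choices.

```python
import cmath
import math
import random


def kuramoto_order(K, N=300, sigma=1.0, dt=0.05, t_end=60.0, seed=1):
    """Mean-field Euler integration of the Kuramoto model
        dtheta_i/dt = omega_i + K * r * sin(psi - theta_i),
    where r * exp(i * psi) = (1/N) * sum_j exp(i * theta_j).
    Returns the order parameter r averaged over the second half of the run."""
    rng = random.Random(seed)
    omega = [rng.gauss(0.0, sigma) for _ in range(N)]            # natural frequencies
    theta = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(N)]  # random initial phases
    steps = round(t_end / dt)
    r_values = []
    for step in range(steps):
        z = sum(cmath.exp(1j * th) for th in theta) / N          # complex order parameter
        r, psi = abs(z), cmath.phase(z)
        theta = [th + dt * (w + K * r * math.sin(psi - th))
                 for th, w in zip(theta, omega)]
        if step >= steps // 2:
            r_values.append(r)
    return sum(r_values) / len(r_values)


for K in (0.5, 4.0):
    print(f"K = {K}: r = {kuramoto_order(K):.2f}")
```

The mean-field form makes each step O(N) instead of O(N²), which is only possible because the coupling here is global; network or non-local coupling would require summing over each oscillator's neighbours.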

This last problem is quite general and can be framed as that of understanding how different scales are mutually related: coarse graining is a standard technique for understanding the behaviour of entire populations, and it is desirable to define protocols to predict/determine the large-scale evolution on the basis of microscopic models. Besides being addressed in project 11, this problem will also be explored in project 12.

**The bottom-up approach** aims to identify the proper variables starting from univariate and multivariate time series, with a specific focus on the constraints arising from the presence of both intrinsic and external noise. Different methods for variable identification will be tested on self-generated synthetic signals (to a large extent, those generated in the top-down projects), on experimental time series from the partners' own labs, and on signals obtained through the various relationships each partner has with external bodies.

In many experiments no (sufficiently accurate) mathematical model is available, and one is faced with the “inverse” problem of reconstructing the dynamics from raw data. This is certainly the case for biological time series, such as the physiological and neural signals that we plan, among others, to study. Such signals will also be investigated with a view to improving diagnostic techniques and developing prototypes of novel approaches to curing diseases possibly related to the malfunctioning of complex oscillatory systems, such as Parkinson’s tremor and epilepsy.

Inverse problems are notoriously hard to deal with, and they are even harder when only a subset of observables can be reliably measured. In this case, one needs a combination of methods: some to identify the putatively relevant variables and others to infer their mutual interactions.

The first task can be undertaken by identifying substructures characterized by relatively strong mutual synchronization. Project 9 will apply various spike-train distance measures to multi-neuron recordings in order to quantify the degree of synchrony within and across neuronal subpopulations and possibly to estimate the amount of information encoded therein.
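
As one well-established example of a spike-train distance (which measures project 9 will actually use is not specified here), the sketch below computes the Victor-Purpura distance: the minimal cost of transforming one spike train into another, where inserting or deleting a spike costs 1 and shifting a spike by Δt costs q·|Δt|. It uses the standard dynamic-programming recursion; the spike times and the cost parameter `q` in the example are illustrative.

```python
def victor_purpura(spikes_a, spikes_b, q):
    """Victor-Purpura distance between two spike trains (lists of spike times).

    Inserting or deleting a spike costs 1; shifting a spike by dt costs q * |dt|.
    G[i][j] is the distance between the first i spikes of a and the first j of b."""
    a, b = sorted(spikes_a), sorted(spikes_b)
    n, m = len(a), len(b)
    G = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        G[i][0] = float(i)          # delete all i spikes
    for j in range(1, m + 1):
        G[0][j] = float(j)          # insert all j spikes
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            G[i][j] = min(
                G[i - 1][j] + 1.0,                                # delete a[i-1]
                G[i][j - 1] + 1.0,                                # insert b[j-1]
                G[i - 1][j - 1] + q * abs(a[i - 1] - b[j - 1]),   # shift a[i-1] onto b[j-1]
            )
    return G[n][m]


print(victor_purpura([0.1, 0.5, 0.9], [0.12, 0.5], q=2.0))
```

The parameter q sets the temporal precision: for q = 0 the distance only counts the difference in spike numbers, while for large q two spikes are matched only if they are nearly coincident, since a shift larger than 2/q costs more than deleting one spike and inserting another. Averaging pairwise distances within and across groups of neurons is one simple way to compare synchrony inside and between subpopulations.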

As for the inference of mutual interactions, nonlinear interdependence measures can help to quantify the presence of directional couplings, but bivariate measures are known to often fail to distinguish direct from indirect coupling. For this reason, it is necessary to go further and extend these techniques to multivariate signals. This will be the task of project 10. On the other hand, one cannot aim to analyse all the variables at once: the required statistics would be enormous. It is therefore desirable to restrict the multivariate analysis to relatively limited sets, although the price to pay is that the outcome is generally affected by the presence of “hidden”, non-measurable variables. For this reason, it is important to develop methods that distinguish endogenous dynamical properties, due to internal mutual interactions, from the effect of external forcing. A relatively simple setup in which this problem can be tackled is that of chronotaxic systems, a class of non-autonomous systems, where the hidden variable is an external modulation signal. Project 6 will work within this framework, with the goal of revealing the presence of external degrees of freedom from the analysis of given irregular time series. Project 5 will instead explore the potential of combining all these methods to infer the structure of relatively large networks (going beyond the few elements considered so far).
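
The failure of bivariate measures can be seen in a toy example. The sketch below uses linear Granger causality as a simple stand-in for the nonlinear interdependence measures mentioned above (it is not the method of project 10), on a hypothetical chain in which X drives Y and Y drives Z, with no direct link from X to Z. The index log(var_restricted / var_full) measures how much the past of X improves the prediction of Z; the lag order `p = 2`, the coefficients and the helper names are assumptions made for illustration.

```python
import math
import random


def _solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    sol = [0.0] * n
    for r in range(n - 1, -1, -1):
        sol[r] = (M[r][n] - sum(M[r][c] * sol[c] for c in range(r + 1, n))) / M[r][r]
    return sol


def residual_variance(target, predictors):
    """Mean squared residual of a least-squares fit of target on predictors (+ intercept)."""
    cols = [[1.0] * len(target)] + predictors
    k = len(cols)
    xtx = [[sum(u * v for u, v in zip(cols[i], cols[j])) for j in range(k)] for i in range(k)]
    xty = [sum(u * v for u, v in zip(cols[i], target)) for i in range(k)]
    beta = _solve(xtx, xty)
    resid = [t - sum(beta[i] * cols[i][n] for i in range(k)) for n, t in enumerate(target)]
    return sum(e * e for e in resid) / len(resid)


def lagged(series, lag, p):
    """series shifted by `lag` samples, aligned with the targets series[p:]."""
    return series[p - lag:len(series) - lag]


def granger_index(source, target, conditioning=(), p=2):
    """log(var_restricted / var_full): gain in predicting `target` from the past of
    `source`, given the past of `target` and of any conditioning series."""
    future = target[p:]
    restricted = [lagged(target, lag, p) for lag in range(1, p + 1)]
    for w in conditioning:
        restricted += [lagged(w, lag, p) for lag in range(1, p + 1)]
    full = restricted + [lagged(source, lag, p) for lag in range(1, p + 1)]
    return math.log(residual_variance(future, restricted) / residual_variance(future, full))


# Hypothetical chain X -> Y -> Z with no direct X -> Z coupling.
rng = random.Random(0)
T = 5000
x = [rng.gauss(0.0, 1.0) for _ in range(T)]
y = [0.0] * T
z = [0.0] * T
for t in range(1, T):
    y[t] = 0.8 * x[t - 1] + rng.gauss(0.0, 1.0)   # X drives Y
    z[t] = 0.8 * y[t - 1] + rng.gauss(0.0, 1.0)   # Y drives Z

bivariate = granger_index(x, z)                      # ignores Y
conditional = granger_index(x, z, conditioning=[y])  # multivariate: conditions on Y
print(f"bivariate X->Z: {bivariate:.3f}, conditional on Y: {conditional:.3f}")
```

For this model the population value of the bivariate index is log(2.05/1.64) ≈ 0.22, i.e. a spurious X → Z coupling, while conditioning on Y brings it to zero up to sampling fluctuations. If Y were a hidden, unmeasured variable, the spurious link could not be removed, which is precisely the difficulty discussed above.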

In many cases, the relevant properties of a multi-component system emerge from the phase dynamics, which is more susceptible to external and internal perturbations than the corresponding amplitudes; the latter play the role of enslaved (inactive) degrees of freedom. Under these circumstances, powerful methods based on the reconstruction of the phase response curve are available, and better performance can be achieved. Project 2 will deal with network reconstruction in the case of phase oscillators. Project 14 will be devoted to exploiting a new inference technique based on *derivative-variable correlations*. This method has the drawback of involving the numerically unstable computation of (time) derivatives, but is otherwise fairly general and looks rather promising, especially in the context of metabolic networks.
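
The core relation behind derivative-variable correlations can be illustrated in the simplest linear, noise-driven case: for dx/dt = A x + noise, the correlation matrices B = ⟨ẋ xᵀ⟩ and C = ⟨x xᵀ⟩ satisfy B = A C, so the coupling matrix can be estimated as A ≈ B C⁻¹. This is a minimal two-node sketch under that linearity assumption; the formulation used in project 14 may be more general. The coupling matrix `A_true`, the noise level and the step size are hypothetical.

```python
import random


def simulate_linear_network(A, sigma=1.0, dt=0.01, steps=200_000, seed=0):
    """Euler-Maruyama integration of dx/dt = A x + sigma * white noise for two nodes.
    Returns the list of sampled states [x0, x1]."""
    rng = random.Random(seed)
    noise_scale = sigma * dt ** 0.5
    x = [0.0, 0.0]
    samples = []
    for _ in range(steps):
        samples.append(x)
        drift = [A[0][0] * x[0] + A[0][1] * x[1],
                 A[1][0] * x[0] + A[1][1] * x[1]]
        x = [x[i] + dt * drift[i] + noise_scale * rng.gauss(0.0, 1.0) for i in range(2)]
    return samples


def infer_coupling(samples, dt):
    """Estimate A as B C^{-1}, with B = <xdot x^T> and C = <x x^T>,
    where xdot is a forward finite difference of the sampled states."""
    n = len(samples) - 1
    B = [[0.0, 0.0], [0.0, 0.0]]
    C = [[0.0, 0.0], [0.0, 0.0]]
    for t in range(n):
        x, x_next = samples[t], samples[t + 1]
        xdot = [(x_next[i] - x[i]) / dt for i in range(2)]
        for i in range(2):
            for j in range(2):
                B[i][j] += xdot[i] * x[j] / n
                C[i][j] += x[i] * x[j] / n
    det = C[0][0] * C[1][1] - C[0][1] * C[1][0]
    C_inv = [[C[1][1] / det, -C[0][1] / det],
             [-C[1][0] / det, C[0][0] / det]]
    return [[sum(B[i][k] * C_inv[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


# Hypothetical network: node 0 drives node 1, but not vice versa.
A_true = [[-1.0, 0.0],
          [0.5, -1.0]]
dt = 0.01
A_hat = infer_coupling(simulate_linear_network(A_true, dt=dt), dt)
for row in A_hat:
    print([round(a, 2) for a in row])
```

With clean Euler-Maruyama data the forward difference is unbiased, but measurement noise of standard deviation s added to `x` produces finite-difference errors of order s/dt, which is the numerical instability mentioned above.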

Overall, one is often unavoidably confronted with a lack of information, and yet it may be necessary to make predictions. Going beyond the above-mentioned case of chronotaxic dynamics, it becomes inevitable to introduce simplifying assumptions, such as modelling some degrees of freedom as stochastic processes, with the consequent use of maximum entropy approaches extended to a dynamical, out-of-equilibrium context. The limits and potential of such a strategy will be explored by project 4 under various controlled conditions. Finally, project 15 will focus on model reconstruction of physiological coupling, exploring the possibility of applying nonlinear data analysis to improve health prognosis.
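
The maximum entropy principle itself can be shown in its simplest, static form: among all distributions compatible with the available information, choose the one with the largest entropy. The toy sketch below assumes a six-valued variable whose only known property is its mean (4.5); the solution then has the Gibbs form p_k ∝ exp(λ v_k), with λ fixed by the constraint. The function names are hypothetical, and the dynamical, out-of-equilibrium extensions that project 4 will explore are not shown.

```python
import math


def maxent_given_mean(values, target_mean, lo=-50.0, hi=50.0, tol=1e-12):
    """Maximum-entropy distribution over `values` with a prescribed mean
    (target_mean must lie strictly between min(values) and max(values)).
    The solution has the Gibbs form p_k proportional to exp(lam * v_k); since the mean
    of this family increases monotonically with lam, lam is found by bisection."""
    def gibbs(lam):
        shift = max(lam * v for v in values)      # avoid overflow in exp
        w = [math.exp(lam * v - shift) for v in values]
        total = sum(w)
        return [wi / total for wi in w]

    def mean(lam):
        return sum(p * v for p, v in zip(gibbs(lam), values))

    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if mean(mid) < target_mean:
            lo = mid
        else:
            hi = mid
    return gibbs(0.5 * (lo + hi))


def entropy(p):
    """Shannon entropy in nats."""
    return -sum(pk * math.log(pk) for pk in p if pk > 0)


faces = [1, 2, 3, 4, 5, 6]
p = maxent_given_mean(faces, 4.5)
print([round(pk, 4) for pk in p], round(entropy(p), 4))
```

Any other distribution with the same mean has lower entropy; for instance, uniform weights on the values 3 to 6 also have mean 4.5 but entropy ln 4.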

**Integration**. This activity will tackle the problem of integrating the knowledge acquired across the various projects, with the following primary goals:

- to critically compare the different approaches that have been developed to tackle similar problems;
- to widen the competences of the ESRs beyond those required by their specific projects, enabling them to familiarize themselves with other techniques;
- to homogenize the results, expressing them in a common language coherent with the planned objectives, so that they can be more readily used by the community of potential users.

For instance, several topics will be addressed from different points of view in more than one project, such as network reconstruction, the effect of noise, a qualitative comparison of small vs. large networks, the dynamics of neural systems and the emergence of collective behaviour. One of the objectives of integration will be to check the consistency of the know-how gained in the different projects and to integrate it into general, coherent schemes.

Finally, we plan to transform the tools developed during the COSMOS activity into well-documented protocols and software packages that should help anyone who needs to interpret a set of multivariate time series and to formulate hypotheses about specific features, without needing to be a specialist in either the theory of dynamical systems or statistical mechanics. More specifically, we plan to develop a common platform for implementing the various steps, accompanied by tutorials that familiarize users with the basic concepts underlying the evolution of complex signals. The first stage in this direction would be cross-testing the developed toolbox on the data sets of the COSMOS partners, including physiological time series, weather-related time series, active agents and brain dynamics data. At this stage, the non-academic partners will actively participate in transforming the methods developed by the academic sector into powerful software toolboxes.