Datenanalyse und Stochastische Modellierung
12. Causality

### More than one input

In this course we have mainly used a single time series to model the dynamics and predict future steps

For better predictions one would usually also use information from variables that potentially correlate with the output

• One can then check whether adding a specific feature improves the predictions or not
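As a minimal sketch of this idea (with synthetic, hypothetical data: a series `y` that is partly driven by a second series `x`), one can fit two linear autoregressive predictors and compare their errors:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: y depends on its own past and on the extra input x
n = 500
x = rng.normal(size=n)
y = np.zeros(n)
for t in range(1, n):
    y[t] = 0.5 * y[t - 1] + 0.8 * x[t - 1] + 0.1 * rng.normal()

# Model 1: predict y_t from y_{t-1} only; Model 2: additionally use x_{t-1}
A1 = np.column_stack([np.ones(n - 1), y[:-1]])
A2 = np.column_stack([np.ones(n - 1), y[:-1], x[:-1]])
target = y[1:]

mse = {}
for name, A in {"y only": A1, "y + x": A2}.items():
    coef, *_ = np.linalg.lstsq(A, target, rcond=None)
    mse[name] = np.mean((A @ coef - target) ** 2)
    print(name, mse[name])
```

If the mean squared error drops clearly when `x` is included, the feature carries useful information for the prediction.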

### Black-box models and interpretability

• The weights in a neural network are difficult to link directly to feature importance and the structure of the dynamics, so the neural network is considered a black-box model
• In this case it is particularly important to have methods that make the model explainable after it has been fitted
• This is related but not equivalent to measures of cross-correlations and causality; in this chapter we will discuss several measures which all have different interpretations

### Cross Correlation

Cross-covariance: $\langle (x_i-\langle x_i\rangle )(y_j-\langle y_j\rangle ) \rangle$

Cross-correlation matrix: $\left(\begin{array}{c c c c} \langle x_1y_1\rangle & \langle x_1y_2\rangle & ... & \langle x_1y_n\rangle \\ \langle x_2y_1\rangle & \langle x_2y_2\rangle & ... & \langle x_2y_n\rangle \\ \vdots & \vdots & ... & \vdots\\ \langle x_my_1\rangle & \langle x_my_2\rangle & ... & \langle x_my_n\rangle \end{array}\right)$

Zero mean: for zero-mean series, cross-covariance and cross-correlation coincide, and the correlation can be computed as a convolution of the two series

Stationary series – time averages: $K_\Delta = \operatorname{mean}_t\, x_{t+\Delta}\, y_t$
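The time-averaged cross-correlation $K_\Delta$ can be sketched directly in numpy; the data here are hypothetical zero-mean series where `x` is a noisy copy of `y` shifted by 3 steps, so the correlation should peak at $\Delta = 3$:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical stationary zero-mean series: x_t ≈ y_{t-3}
n = 2000
y = rng.normal(size=n)
x = np.roll(y, 3) + 0.1 * rng.normal(size=n)

def cross_corr(x, y, delta):
    """Time average K_delta = mean_t x_{t+delta} y_t over the overlapping part."""
    if delta >= 0:
        return np.mean(x[delta:] * y[: len(y) - delta])
    return np.mean(x[:delta] * y[-delta:])

deltas = list(range(-10, 11))
K = [cross_corr(x, y, d) for d in deltas]
best = deltas[int(np.argmax(K))]
print("peak at delta =", best)
```

Scanning $K_\Delta$ over a range of lags like this is a simple way to detect at which delay two series are most strongly related.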

### Correlation versus causality

Classic example: the apparent benefit of eating chocolate (a correlation between chocolate consumption and a positive outcome does not imply that chocolate causes it)

### Granger causality

"if X can predict (portions of) Y, X causes Y"

Compare two autoregressive models for predicting x: one using only past values of x, and one augmenting the prediction with a second time series y

$x_t = a_0 + \sum_i a_i x_{t-i} + \xi_t$

$x_t = a_0 + \sum_i a_i x_{t-i} + \sum_j B_j y_{t-j} + \xi_t$

Perform a hypothesis test with the null hypothesis "y does not Granger-cause x". If the model performs significantly better when y is included, we have found Granger causality

One can use neural networks instead of autoregressive models

Neural Granger causality, A. Tank, I. Covert, N. Foti, A. Shojaie, E. B. Fox (2021)
• One can include more than one additional input; in this case regularizing methods like lasso regression can be used to select the relevant ones
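The classical (linear) Granger test above can be sketched as follows, using synthetic, hypothetical data in which y genuinely drives x, two lags, and an F-test comparing the restricted and the full model:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)

# Hypothetical coupled series: y Granger-causes x
n, p = 1000, 2  # p = number of lags
y = rng.normal(size=n)
x = np.zeros(n)
for t in range(2, n):
    x[t] = 0.4 * x[t - 1] + 0.5 * y[t - 1] + 0.2 * rng.normal()

# Restricted model: past x only; full model: past x and past y
rows = n - p
X_r = np.column_stack([np.ones(rows)] + [x[p - k : n - k] for k in range(1, p + 1)])
X_f = np.column_stack([X_r] + [y[p - k : n - k] for k in range(1, p + 1)])
target = x[p:]

rss = {}
for name, A in {"restricted": X_r, "full": X_f}.items():
    coef, *_ = np.linalg.lstsq(A, target, rcond=None)
    rss[name] = np.sum((A @ coef - target) ** 2)

# F-test for H0: "y does not Granger-cause x"
df1, df2 = p, len(target) - X_f.shape[1]
F = ((rss["restricted"] - rss["full"]) / df1) / (rss["full"] / df2)
p_value = 1 - stats.f.cdf(F, df1, df2)
print("F =", F, "p =", p_value)
```

A small p-value rejects the null hypothesis, i.e. the past of y significantly improves the prediction of x.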

### Counterfactuals

“x causes y” means that changing x alone changes y
• Find a new input x′ such that a desired output y′ is achieved from a minimally changed input
• Many different definitions and models exist
• Example of one definition: $\min_{x^\prime} \max_\lambda \left[ \lambda (f(x^\prime)-y^\prime)^2 + \sum_j \frac{|x_j - x_j^\prime|}{\operatorname{median}(|x_j-\operatorname{median}(x_j)|)} \right]$
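A minimal sketch of this objective, under loud assumptions: `f` is a hypothetical linear model, the MAD values are taken as given, and the inner minimization is done by crude random local search while $\lambda$ is increased on a fixed schedule (a real implementation would use a proper optimizer):

```python
import numpy as np

# Hypothetical linear model f and a desired target output y'
w = np.array([1.0, -2.0, 0.5])
f = lambda x: x @ w

x0 = np.array([1.0, 1.0, 1.0])   # original input, f(x0) = -0.5
y_target = 2.0                   # desired output y'
mad = np.array([0.5, 1.0, 2.0])  # assumed median absolute deviations per feature

def objective(x_new, lam):
    # lambda * squared prediction gap + MAD-weighted L1 distance to x0
    return lam * (f(x_new) - y_target) ** 2 + np.sum(np.abs(x0 - x_new) / mad)

rng = np.random.default_rng(3)
x_cf = x0.copy()
for lam in [0.1, 1.0, 10.0, 100.0]:   # increase lambda to enforce f(x') ≈ y'
    for _ in range(2000):
        cand = x_cf + 0.05 * rng.normal(size=3)
        if objective(cand, lam) < objective(x_cf, lam):
            x_cf = cand

print("counterfactual:", x_cf, "prediction:", f(x_cf))
```

The MAD weighting makes changes in features with large natural spread cheap, so the counterfactual prefers to move the coordinates that vary a lot anyway.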

### Full analysis of causality

• One can draw graphs of causality from the full analysis
• Look for inconsistencies to find external influences

### Permutation importance

• Randomly permute the values of one input feature and measure how much the model performance decreases
• Can be performed on training set and/or on test set
• Only if the model did not overfit are the results the same for the training set and the test set
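Permutation importance needs nothing beyond the fitted model and the data; here is a minimal sketch with a hypothetical linear model on synthetic data, where feature 0 is informative and feature 1 is pure noise:

```python
import numpy as np

rng = np.random.default_rng(4)

# Hypothetical data: feature 0 matters, feature 1 is noise
X = rng.normal(size=(400, 2))
y = 2.0 * X[:, 0] + 0.1 * rng.normal(size=400)

# Fit a simple linear model (stand-in for any fitted black-box model)
coef, *_ = np.linalg.lstsq(np.column_stack([np.ones(400), X]), y, rcond=None)
predict = lambda X: coef[0] + X @ coef[1:]

def mse(X, y):
    return np.mean((predict(X) - y) ** 2)

base = mse(X, y)
importance = []
for j in range(X.shape[1]):
    Xp = X.copy()
    Xp[:, j] = rng.permutation(Xp[:, j])  # shuffle one feature column
    importance.append(mse(Xp, y) - base)  # performance drop = importance
print(importance)
```

The informative feature shows a large error increase when shuffled, while the noise feature barely changes the error.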

### Shapley values

• Concept from game theory
• Each feature is a player
• Players work together to achieve best possible prediction
• Calculate predictions and the corresponding errors for all combinations of features (from no features, i.e. predicting the mean, to the full model with all features) by setting the remaining features to random input
• Contribution of feature i (coalition of other inputs: S; total number of inputs: n; measure of prediction improvement: v) is $v(S\cup \{i\})-v(S)$; the Shapley value is $\sum_{S \subseteq N\setminus\{i\}} \frac{(n-1-|S|)!\,|S|!}{n!}\left(v(S\cup \{i\})-v(S)\right)$
• 'Fair' attribution, but computationally costly, since the number of coalitions grows exponentially with n
• Example: General phase-structure relationship in polar rod-shaped liquid crystals: Importance of shape anisotropy and dipolar strength
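For small n the Shapley formula can be evaluated exactly by enumerating all coalitions. The coalition values `v(S)` below are made-up numbers for a hypothetical 3-feature example (in practice they would come from refitting or masking the model):

```python
import itertools
import math

# Hypothetical coalition values v(S): prediction improvement of feature set S
v = {
    (): 0.0,
    (0,): 0.4, (1,): 0.3, (2,): 0.0,
    (0, 1): 0.6, (0, 2): 0.5, (1, 2): 0.3,
    (0, 1, 2): 0.7,
}
n = 3

def shapley(i):
    """Exact Shapley value of feature i: sum over all coalitions S without i,
    weighted by (n-1-|S|)! |S|! / n!."""
    others = [j for j in range(n) if j != i]
    total = 0.0
    for size in range(n):
        for S in itertools.combinations(others, size):
            weight = math.factorial(n - 1 - size) * math.factorial(size) / math.factorial(n)
            total += weight * (v[tuple(sorted(S + (i,)))] - v[S])
    return total

vals = [shapley(i) for i in range(n)]
print(vals)
```

A useful sanity check is the efficiency property: the Shapley values sum exactly to the value of the full coalition, here $v(\{0,1,2\}) = 0.7$.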