# Improved Pseudolikelihood Regularization and Decimation methods on Non-linearly Interacting Systems with Continuous Variables

### Submission summary

• As Contributors: Luca Leuzzi
• Arxiv Link: https://arxiv.org/abs/1708.00787v4
• Date accepted: 2018-06-02
• Date submitted: 2018-05-25
• Submitted by: Leuzzi, Luca
• Submitted to: SciPost Physics
• Domain(s): Theor. & Comp.
• Subject area: Statistical and Soft Matter Physics

### Abstract

We propose and test improvements to state-of-the-art techniques of Bayesian statistical inference based on pseudolikelihood maximization with $\ell_1$ regularization and with decimation. In particular, we present a method to determine the best value of the regularizer parameter starting from a hypothesis testing technique. Concerning the decimation, we also analyze the worst case scenario in which there is no sharp peak in the tilded-pseudolikelihood function, firstly defined as a criterion to stop the decimation. Techniques are applied to noisy systems with non-linear dynamics, mapped onto multi-variable interacting Hamiltonian effective models for waves and phasors. Results are analyzed varying the number of available samples and the externally tunable temperature-like parameter mimicking real data noise. Finally, the behavior of the inference procedures described is tested against a wrong hypothesis: non-linearly generated data are analyzed with a pairwise interacting hypothesis. Our analysis shows that, looking at the behavior of the inverse graphical problem as data size increases, the methods presented allow one to rule out a wrong hypothesis.


We thank referee 2 for her/his observations and we give a detailed answer in the following.

"Prior to publication, I would appreciate if the authors could again consider the comment I made in my initial report, that is, point 4 in their rebuttal letter. It is hard for me to understand why inference with a wrong model leads to vanishing couplings, and not to something complicated and essentially inconsistent, but that fits the data! I find the answer rather not convincing on this aspect. Let me be clear: I am not saying that the authors are wrong and their study is incorrect, I am simply saying that I do not understand their findings and that they do not present compelling arguments allowing me to get what is going on. If I take a bunch of data generated by model 1 and fit with model 2, I will get a set of meaningless parameters for model 2, fitting as best as possible the data. Why should these parameter vanish? It will depend on the expressive power of model 2 with respect to model 1, won't it?"

The referee takes for granted that "if I take a bunch of data generated by model 1 and fit with model 2, I will get a set of meaningless parameters for model 2, fitting as best as possible the data". Although this is a possible outcome, it is not the only outcome of an inference procedure. Another possibility is to obtain meaningful parameters for model 2, i.e., parameters tending to zero.
In particular, in our analysis above the critical point of our model, the outcome provides small pairwise couplings, decreasing as $M$ is increased.

We try to clarify the results obtained in a more formal way.
Let us first stress that, when the temperature is above the critical temperature, in a $4$-body system all two-point correlation functions are zero, even if the two variables belong to the same quadruplet. On the contrary, four-point correlations of four variables in the same interacting quadruplet are non-zero.
To see how a vanishing correlation between $i$ and $j$ affects the value of the coupling $J_{ij}$ between them, let us compute the two-point correlation explicitly under the wrong pairwise hypothesis. We assume a Hamiltonian of the form Eq. (37):
$$H = -\sum_{i,j \in E_{ij} } J_{ij} \cos{\left(\phi_i - \phi_j\right)}$$
where $E_{ij}$ indicates the edges actually present in the graph (an edge is there when $J_{ij}\neq 0$).
In this case the Pseudolikelihood $P_i(\phi_i | \gv{\phi}_{\backslash i})$ is:
$P_i(\phi_i | \gv{\phi}_{\backslash i}) = \frac{1}{Z_i} \exp{\bigl[H_i^x \left(\gv{\phi}_{\backslash i}\right) \cos(\phi_i) + H_i^y \left(\gv{\phi}_{\backslash i}\right) \sin(\phi_i)\bigr] }$
where we use the shortening
$H^x_i(\gv{\phi}_{\backslash i}) := \sum_{j\in \partial i} J_{ij}\cos(\phi_j)$
$H^y_i(\gv{\phi}_{\backslash i}) := \sum_{j\in \partial i} J_{ij}\sin(\phi_j)$
and
$Z_i = 2 \pi I_0( H_i)$
$H_i \equiv \sqrt{(H^x_i)^2 + (H^y_i)^2}$
where $I_0$ is the modified Bessel function defined in Sec. 3.1.1 and $\partial i$ denotes the neighbors of $i$. We have rescaled the $J$s to include $\beta$.
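As an illustration only (this sketch is not part of the manuscript), the conditional probability above can be evaluated numerically and its normalization $Z_i = 2\pi I_0(H_i)$ checked by direct integration over $\phi_i$. The function names, the three-variable coupling matrix and the phase values are hypothetical choices made for the example; $I_0$ is computed from its power series so that only the Python standard library is needed.

```python
import math

def bessel_i0(x, terms=60):
    """Modified Bessel function I_0(x) from its power series (moderate x only)."""
    total, term = 0.0, 1.0
    for k in range(terms):
        total += term
        term *= (x / 2.0) ** 2 / ((k + 1) ** 2)
    return total

def local_fields(i, phi, J):
    """Return (H^x_i, H^y_i); beta is assumed to be absorbed in J."""
    hx = sum(J[i][j] * math.cos(phi[j]) for j in range(len(phi)) if j != i)
    hy = sum(J[i][j] * math.sin(phi[j]) for j in range(len(phi)) if j != i)
    return hx, hy

def conditional(i, phi_i, phi, J):
    """P_i(phi_i | all other phases) for the pairwise cosine Hamiltonian."""
    hx, hy = local_fields(i, phi, J)
    z = 2.0 * math.pi * bessel_i0(math.hypot(hx, hy))
    return math.exp(hx * math.cos(phi_i) + hy * math.sin(phi_i)) / z

# Hypothetical symmetric couplings and phases, for illustration only.
J = [[0.0, 0.7, -0.3], [0.7, 0.0, 0.5], [-0.3, 0.5, 0.0]]
phi = [0.4, 1.9, 4.2]

# Riemann sum over one period of phi_0; the result should be close to 1.
n = 2000
norm = sum(conditional(0, 2.0 * math.pi * k / n, phi, J) for k in range(n))
norm *= 2.0 * math.pi / n
print(norm)
```

The integral of the conditional over $\phi_i \in [0, 2\pi)$ coming out equal to one is a direct check that $2\pi I_0(H_i)$ is the correct normalization.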

Let us consider the case in which we are given $M$ independent observations $\{\gv{\phi}^{(\alpha)}\}_{\alpha=1}^M$. The
log-pseudolikelihood is defined as:
\begin{eqnarray}
\label{eq:pl}
L_i & := &\frac{1}{M}\sum_{\alpha=1}^M\log{ P_i(\phi^{(\alpha)}_i | \gv{\phi}^{(\alpha)}_{\backslash i})}\\
& = & \frac{1}{M}\sum_{\alpha=1}^M \left\{H_i^x \left(\gv{\phi}^{\alpha}_{\backslash i}\right)
\cos(\phi^\alpha_i) + H_i^y \left(\gv{\phi}^{\alpha}_{\backslash i}\right) \sin(\phi^\alpha_i)
- \ln \left[2 \pi I_0(H_i^\alpha)\right]\right\}
\nonumber
\end{eqnarray}
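A minimal sketch of how the log-pseudolikelihood $L_i$ defined above could be evaluated on a set of samples is given here for illustration. The function names and the toy samples are assumptions of the example, not code from the manuscript, and $\beta$ is taken as absorbed in the couplings.

```python
import math

def bessel_i0(x, terms=60):
    """Modified Bessel function I_0(x) from its power series (moderate x only)."""
    total, term = 0.0, 1.0
    for k in range(terms):
        total += term
        term *= (x / 2.0) ** 2 / ((k + 1) ** 2)
    return total

def log_pseudolikelihood(i, samples, J):
    """L_i = (1/M) sum over samples of log P_i(phi_i | other phases)."""
    total = 0.0
    for phi in samples:
        hx = sum(J[i][j] * math.cos(phi[j]) for j in range(len(phi)) if j != i)
        hy = sum(J[i][j] * math.sin(phi[j]) for j in range(len(phi)) if j != i)
        h = math.hypot(hx, hy)
        total += (hx * math.cos(phi[i]) + hy * math.sin(phi[i])
                  - math.log(2.0 * math.pi * bessel_i0(h)))
    return total / len(samples)

# Hypothetical toy data: two samples of three phases.
samples = [[0.3, 1.2, 2.5], [4.0, 0.1, 5.5]]
zero_J = [[0.0] * 3 for _ in range(3)]

# With all couplings set to zero the conditional is uniform, so L_i = -ln(2*pi).
value_at_zero = log_pseudolikelihood(0, samples, zero_J)

# Two aligned phases with J_01 = 1: L_0 = 1 - ln(2*pi*I_0(1)).
aligned = log_pseudolikelihood(0, [[0.0, 0.0]], [[0.0, 1.0], [1.0, 0.0]])
print(value_at_zero, aligned)
```

The zero-coupling value $-\ln 2\pi$ is the baseline against which any inferred set of couplings has to improve.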

Within the decimation procedure we minimize ${-}\sum_i L_i$.
Evaluating the first derivative with respect to $J_{l m}$ and setting the result to zero, we obtain the equation
\begin{eqnarray}
\label{eq:corr_jj}
\sum_{\alpha=1}^M \cos{\left(\phi^\alpha_l - \phi^\alpha_m\right)} =
\frac{1}{2}\sum_{\alpha=1}^{M} && \!\!\! \Biggl[\frac{I_1(H_l^\alpha)}{I_0(H_l^\alpha)}\frac{\cos \phi^\alpha_m H^x_l+\sin \phi^\alpha_m H^y_l}{H^\alpha_l}
\\
\nonumber
&& + \frac{I_1(H_m^\alpha)}{I_0(H_m^\alpha)}\frac{\cos \phi^\alpha_l H^x_m+\sin \phi^\alpha_l H^y_m}{H^\alpha_m}\Biggr]
\end{eqnarray}
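The stationarity condition above can be checked numerically. The sketch below, given for illustration under stated assumptions, compares the analytic derivative of $-\sum_i L_i$ with respect to a symmetric coupling $J_{lm}=J_{ml}$ against a central finite difference. The random phases and couplings are hypothetical test data (not samples of the $4$-body model), and all function names are choices of the example.

```python
import math
import random

def bessel_i(order, x, terms=60):
    """I_order(x) = sum_k (x/2)^(2k+order) / (k! (k+order)!), for moderate x."""
    total = 0.0
    term = (x / 2.0) ** order / math.factorial(order)
    for k in range(terms):
        total += term
        term *= (x / 2.0) ** 2 / ((k + 1) * (k + 1 + order))
    return total

def fields(i, phi, J):
    hx = sum(J[i][j] * math.cos(phi[j]) for j in range(len(phi)) if j != i)
    hy = sum(J[i][j] * math.sin(phi[j]) for j in range(len(phi)) if j != i)
    return hx, hy

def objective(samples, J):
    """-sum_i L_i, the quantity minimized in the text."""
    total = 0.0
    for phi in samples:
        for i in range(len(phi)):
            hx, hy = fields(i, phi, J)
            total += (hx * math.cos(phi[i]) + hy * math.sin(phi[i])
                      - math.log(2.0 * math.pi * bessel_i(0, math.hypot(hx, hy))))
    return -total / len(samples)

def gradient(samples, J, l, m):
    """Analytic d(objective)/dJ_lm for symmetric J; zero when the equation holds."""
    total = 0.0
    for phi in samples:
        total += 2.0 * math.cos(phi[l] - phi[m])
        for a, b in ((l, m), (m, l)):
            hx, hy = fields(a, phi, J)
            h = math.hypot(hx, hy)
            # I_1(h) / (I_0(h) h) tends to 1/2 as h -> 0
            ratio = 0.5 if h < 1e-12 else bessel_i(1, h) / (bessel_i(0, h) * h)
            total -= ratio * (math.cos(phi[b]) * hx + math.sin(phi[b]) * hy)
    return -total / len(samples)

def finite_difference(samples, J, l, m, eps=1e-6):
    def shifted(delta):
        K = [row[:] for row in J]
        K[l][m] += delta
        K[m][l] += delta
        return objective(samples, K)
    return (shifted(eps) - shifted(-eps)) / (2.0 * eps)

rng = random.Random(0)
n, M = 4, 50
samples = [[rng.uniform(0.0, 2.0 * math.pi) for _ in range(n)] for _ in range(M)]
J = [[0.0] * n for _ in range(n)]
for a in range(n):
    for b in range(a + 1, n):
        J[a][b] = J[b][a] = rng.uniform(-1.0, 1.0)

grad_analytic = gradient(samples, J, 0, 2)
grad_numeric = finite_difference(samples, J, 0, 2)
print(grad_analytic, grad_numeric)
```

Setting `gradient` to zero reproduces the equation above up to the overall factor $1/M$; agreement with the finite difference confirms the factor $1/2$ and the two Bessel-ratio terms.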

Recalling that $I_0(x)\to 1$ and $I_1(x) \simeq x/2$ for small $x$, we find that for small $J$'s

$$\sum_{\alpha=1}^M \cos{\left(\phi^\alpha_l - \phi^\alpha_m\right)} \simeq \frac{1}{4}\sum_{\alpha=1}^{M}\left[\cos \phi^\alpha_m H^x_l+\sin \phi^\alpha_m H^y_l + \cos \phi^\alpha_l H^x_m+\sin \phi^\alpha_l H^y_m \right]$$

In the paramagnetic phase of the $4$-body Hamiltonian system, that is, above the critical temperature,
the correlation between the nodes $l$ and $m$ on the left-hand side turns out to be zero.
Therefore, if $M$ is large enough, we expect the right-hand side of Eq. \eqref{eq:corr_jj} to be zero as well: all the $J$'s move to lower and lower values as $M$ increases and eventually tend to zero.
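The argument can be made concrete in the simplest setting of two phasors coupled by a single $J$. There $H_l = H_m = |J|$, and the stationarity condition reduces to $\frac{1}{M}\sum_\alpha \cos(\phi^\alpha_l - \phi^\alpha_m) = \mathrm{sign}(J)\, I_1(|J|)/I_0(|J|)$. The sketch below is an illustration under stated assumptions, not the procedure used in the manuscript. It draws independent uniform phases as a stand-in for data with vanishing two-point correlation (they are not samples of the $4$-body model) and solves this equation for $J$ by bisection. Function names, the seed and the sample sizes are hypothetical choices.

```python
import math
import random

def bessel_ratio(x, terms=60):
    """A(x) = I_1(x) / I_0(x), both from their power series (moderate x only)."""
    i0 = i1 = 0.0
    t0, t1 = 1.0, x / 2.0
    for k in range(terms):
        i0 += t0
        i1 += t1
        t0 *= (x / 2.0) ** 2 / ((k + 1) ** 2)
        t1 *= (x / 2.0) ** 2 / ((k + 1) * (k + 2))
    return i1 / i0

def infer_coupling(c, hi=10.0, iters=60):
    """Solve sign(J) * A(|J|) = c for J by bisection (A is increasing on [0, hi])."""
    target = min(abs(c), bessel_ratio(hi))
    lo = 0.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if bessel_ratio(mid) < target:
            lo = mid
        else:
            hi = mid
    return math.copysign(0.5 * (lo + hi), c)

rng = random.Random(1)
estimates = {}
for M in (100, 10_000, 200_000):
    # Sample mean of cos(phi_l - phi_m) for independent uniform phases.
    c = sum(math.cos(rng.uniform(0.0, 2.0 * math.pi) - rng.uniform(0.0, 2.0 * math.pi))
            for _ in range(M)) / M
    estimates[M] = infer_coupling(c)
    print(M, c, estimates[M])
```

Since the sample correlation of uncorrelated phases fluctuates around zero with standard deviation $\sqrt{1/(2M)}$, and $I_1(J)/I_0(J) \simeq J/2$ for small $J$, the inferred coupling in this toy case is expected to shrink roughly as $1/\sqrt{M}$.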

Fig. 18 reflects this result for the ML-graph (right panel), for which we are always above $T_c \sim 0.5$.

On the contrary, for the ER-graph (left panel of Fig. 18), for which $T_c \sim 2.3$, we see that the algorithm tries to accommodate some effective non-zero $J$'s that can describe $\overline{\cos{\left(\phi_l-\phi_m\right)}}$ at $T\lesssim T_c$. The same behavior is shown in Fig. 19.

### List of changes

Appendices: minor changes

Ref. [78] replaced.

### Submission & Refereeing History

Resubmission 1708.00787v4 on 25 May 2018
Resubmission 1708.00787v3 on 6 March 2018
Submission 1708.00787v2 on 25 September 2017

## Reports on this Submission

### Report

I am grateful to the authors for their explanations, and I think the manuscript can now be published as it stands.
