SciPost Submission Page
Improved Pseudolikelihood Regularization and Decimation methods on Non-linearly Interacting Systems with Continuous Variables
by Alessia Marruzzo, Payal Tyagi, Fabrizio Antenucci, Andrea Pagnani, Luca Leuzzi
This Submission thread is now published as SciPost Phys. 5, 002 (2018)
Submission summary
Authors (as registered SciPost users): | Luca Leuzzi |
Submission information | |
---|---|
Preprint Link: | https://arxiv.org/abs/1708.00787v4 (pdf) |
Date accepted: | 2018-06-02 |
Date submitted: | 2018-05-25 02:00 |
Submitted by: | Leuzzi, Luca |
Submitted to: | SciPost Physics |
Ontological classification | |
---|---|
Academic field: | Physics |
Specialties: | |
Approaches: | Theoretical, Computational |
Abstract
We propose and test improvements to state-of-the-art techniques of Bayesian statistical inference based on pseudolikelihood maximization with $\ell_1$ regularization and with decimation. In particular, we present a method to determine the best value of the regularizer parameter starting from a hypothesis testing technique. Concerning the decimation, we also analyze the worst-case scenario in which there is no sharp peak in the tilded-pseudolikelihood function, first defined as a criterion to stop the decimation. Techniques are applied to noisy systems with non-linear dynamics, mapped onto multi-variable interacting Hamiltonian effective models for waves and phasors. Results are analyzed varying the number of available samples and the externally tunable temperature-like parameter mimicking real data noise. Finally, the behavior of the inference procedures described is tested against a wrong hypothesis: non-linearly generated data are analyzed with a pairwise interacting hypothesis. Our analysis shows that, looking at the behavior of the inverse graphical problem as data size increases, the methods presented allow one to rule out a wrong hypothesis.
Author comments upon resubmission
We thank referee 2 for her/his observations and we give a detailed answer in the following.
The Referee comments:
"Prior to publication, I would appreciate if the authors could again consider the comment I made in my initial report, that is, point 4 in their rebuttal letter. It is hard for me to understand why inference with a wrong model leads to vanishing couplings, and not to something complicated and essentially inconsistent, but that fits the data! I find the answer rather not convincing on this aspect. Let me be clear: I am not saying that the authors are wrong and their study is incorrect, I am simply saying that I do not understand their findings and that they do not present compelling arguments allowing me to get what is going on. If I take a bunch of data generated by model 1 and fit with model 2, I will get a set of meaningless parameters for model 2, fitting as best as possible the data. Why should these parameters vanish? It will depend on the expressive power of model 2 with respect to model 1, won't it?"
Authors reply:
The referee takes for granted that "if I take a bunch of data generated by model 1 and fit with model 2, I will get a set of meaningless parameters for model 2, fitting as best as possible the data". Although this is a possible outcome, it is not the only outcome of an inference procedure. Another possibility is to get meaningful parameters for model 2, i.e., parameters tending to zero. In particular, in our analysis above the critical point of our model, the inference yields small pairwise couplings that decrease as $M$ is increased.
We try to clarify the results obtained in a more formal way. Let us first stress that, when the temperature is above the critical temperature, in a $4$-body system all two-point correlation functions are zero, even if the two variables belong to the same quadruplet. On the contrary, four-point correlations of four variables in the same interacting quadruplet are non-zero. To see how a vanishing correlation between $i$ and $j$ affects the value of the coupling $J_{ij}$ between them, let us compute the two-point correlation explicitly in the wrong pairwise hypothesis. We assume a Hamiltonian of the form given in Eq. (37).
Let us consider the case in which we are given $M$ independent observations $\{\boldsymbol{\phi}^{(\alpha)}\}_{\alpha=1}^M$. The log-pseudolikelihood is defined as:
\begin{eqnarray}
\label{eq:pl}
L_i & := & \frac{1}{M}\sum_{\alpha=1}^M\log{ P_i\left(\phi^{(\alpha)}_i \,\middle|\, \boldsymbol{\phi}^{(\alpha)}_{\backslash i}\right)}
\\
& = & \frac{1}{M}\sum_{\alpha=1}^M \left\{H_i^x \left(\boldsymbol{\phi}^\alpha_{\backslash i}\right) \cos(\phi^\alpha_i) + H_i^y \left(\boldsymbol{\phi}^\alpha_{\backslash i}\right) \sin(\phi^\alpha_i) - \ln \left[2 \pi I_0(H_i^\alpha)\right]\right\}
\nonumber
\end{eqnarray}
Within the decimation procedure we minimize ${-}\sum_i L_i$.
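A minimal numerical sketch of this log-pseudolikelihood follows. Eq. (37) is not reproduced here, so the sketch assumes the usual pairwise form of the local fields, $H_i^x=\sum_{j\neq i}J_{ij}\cos\phi_j$ and $H_i^y=\sum_{j\neq i}J_{ij}\sin\phi_j$, with $H_i=\sqrt{(H_i^x)^2+(H_i^y)^2}$ and the inverse temperature absorbed into $J$. All function names and the toy data are hypothetical.

```python
import math

def bessel_i0(x, terms=40):
    """I_0(x) from its power series, sum_k (x/2)^(2k) / (k!)^2."""
    half = x / 2.0
    return sum(half ** (2 * k) / math.factorial(k) ** 2 for k in range(terms))

def local_field(J, phi, i):
    """Assumed pairwise local field acting on site i: (H_i^x, H_i^y)."""
    hx = sum(J[i][j] * math.cos(phi[j]) for j in range(len(phi)) if j != i)
    hy = sum(J[i][j] * math.sin(phi[j]) for j in range(len(phi)) if j != i)
    return hx, hy

def log_pseudolikelihood(J, samples, i):
    """L_i averaged over the M samples; each sample is a list of phases."""
    total = 0.0
    for phi in samples:
        hx, hy = local_field(J, phi, i)
        h = math.hypot(hx, hy)
        total += hx * math.cos(phi[i]) + hy * math.sin(phi[i])
        total -= math.log(2.0 * math.pi * bessel_i0(h))
    return total / len(samples)

# Hypothetical toy data: 3 phases, 2 samples, a single non-zero coupling.
J = [[0.0, 0.7, 0.0], [0.7, 0.0, 0.0], [0.0, 0.0, 0.0]]
samples = [[0.1, 0.4, 2.0], [1.5, 1.2, 0.3]]
print([log_pseudolikelihood(J, samples, i) for i in range(3)])
```

With all couplings set to zero the conditional distribution is uniform, so every $L_i$ reduces to $-\ln 2\pi$; this is a convenient sanity check on any implementation.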
Evaluating the first derivative with respect to $J_{l m}$ and setting the result to zero, we obtain the equation
\begin{eqnarray}
\label{eq:corr_jj}
\sum_{\alpha=1}^M \cos{\left(\phi^\alpha_l - \phi^\alpha_m\right)} =
\frac{1}{2}\sum_{\alpha=1}^{M} && \!\!\! \Biggl[\frac{I_1(H_l^\alpha)}{I_0(H_l^\alpha)}\frac{\cos \phi^\alpha_m H^x_l+\sin \phi^\alpha_m H^y_l}{H^\alpha_l}
\\
\nonumber
&& + \frac{I_1(H_m^\alpha)}{I_0(H_m^\alpha)}\frac{\cos \phi^\alpha_l H^x_m+\sin \phi^\alpha_l H^y_m}{H^\alpha_m}\Biggr]
\end{eqnarray}
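The stationarity condition above can be cross-checked numerically. The sketch below assumes symmetric couplings and the pairwise local fields $H_i^x=\sum_{j\neq i}J_{ij}\cos\phi_j$, $H_i^y=\sum_{j\neq i}J_{ij}\sin\phi_j$ (an assumption, since Eq. (37) is not shown here). It compares twice the per-sample gap between the two sides of the condition with a finite-difference derivative of $\sum_i L_i$ with respect to $J_{lm}$. Function names and the random toy data are hypothetical.

```python
import math
import random

def bessel_i(n, x, terms=40):
    """I_n(x) from its power series, sum_k (x/2)^(2k+n) / (k! (k+n)!)."""
    half = x / 2.0
    return sum(half ** (2 * k + n) / (math.factorial(k) * math.factorial(k + n))
               for k in range(terms))

def local_field(J, phi, i):
    """Assumed pairwise local field acting on site i: (H_i^x, H_i^y)."""
    hx = sum(J[i][j] * math.cos(phi[j]) for j in range(len(phi)) if j != i)
    hy = sum(J[i][j] * math.sin(phi[j]) for j in range(len(phi)) if j != i)
    return hx, hy

def sum_log_pl(J, samples):
    """sum_i L_i, i.e. minus the objective minimized in the decimation."""
    total = 0.0
    for phi in samples:
        for i in range(len(phi)):
            hx, hy = local_field(J, phi, i)
            h = math.hypot(hx, hy)
            total += hx * math.cos(phi[i]) + hy * math.sin(phi[i])
            total -= math.log(2.0 * math.pi * bessel_i(0, h))
    return total / len(samples)

def bessel_term(J, phi, a, b):
    """[I_1(H_a)/I_0(H_a)] (cos(phi_b) H_a^x + sin(phi_b) H_a^y) / H_a."""
    hx, hy = local_field(J, phi, a)
    h = math.hypot(hx, hy)
    if h == 0.0:
        return 0.0  # the term vanishes together with the local field
    ratio = bessel_i(1, h) / bessel_i(0, h)
    return ratio * (math.cos(phi[b]) * hx + math.sin(phi[b]) * hy) / h

def stationarity_gap(J, samples, l, m):
    """(Left-hand side minus right-hand side of the condition) divided by M."""
    gap = 0.0
    for phi in samples:
        gap += math.cos(phi[l] - phi[m])
        gap -= 0.5 * (bessel_term(J, phi, l, m) + bessel_term(J, phi, m, l))
    return gap / len(samples)

def finite_difference(J, samples, l, m, eps=1e-6):
    """Central difference of sum_i L_i with respect to the symmetric J_lm."""
    def shifted(delta):
        Jd = [row[:] for row in J]
        Jd[l][m] += delta
        Jd[m][l] += delta
        return sum_log_pl(Jd, samples)
    return (shifted(eps) - shifted(-eps)) / (2.0 * eps)

# Hypothetical toy data: random symmetric couplings and random phases.
random.seed(0)
N, M = 4, 5
J = [[0.0] * N for _ in range(N)]
for a in range(N):
    for b in range(a + 1, N):
        J[a][b] = J[b][a] = random.uniform(-1.0, 1.0)
samples = [[random.uniform(0.0, 2.0 * math.pi) for _ in range(N)] for _ in range(M)]
print(2.0 * stationarity_gap(J, samples, 0, 1), finite_difference(J, samples, 0, 1))
```

The two printed numbers should agree up to finite-difference error, and both vanish at a stationary point of $\sum_i L_i$.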
Recalling that $I_0(x)\to 1$ and $I_1(x) \simeq x/2$ for small $x$, we find that for small $J$'s
\begin{equation}
\sum_{\alpha=1}^M \cos{\left(\phi^\alpha_l - \phi^\alpha_m\right)} \simeq \frac{1}{4}\sum_{\alpha=1}^{M}\left[\cos \phi^\alpha_m H^x_l+\sin \phi^\alpha_m H^y_l + \cos \phi^\alpha_l H^x_m+\sin \phi^\alpha_l H^y_m \right]
\end{equation}
In the paramagnetic phase of the $4$-body Hamiltonian system, that is, above the critical temperature, the correlation between the nodes $l$ and $m$ on the left-hand side turns out to be zero. Therefore, if $M$ is large enough, we expect the right-hand side of Eq. \eqref{eq:corr_jj} to be zero as well: all the $J$'s will move to lower and lower values as $M$ increases and eventually tend to zero.
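The small-argument behaviour of the Bessel functions used in this expansion can be checked numerically. The sketch below evaluates $I_0$ and $I_1$ from their power series and compares the ratio $I_1(x)/I_0(x)$ with its leading term $x/2$. The helper names are hypothetical.

```python
import math

def bessel_i(n, x, terms=40):
    """I_n(x) from its power series, sum_k (x/2)^(2k+n) / (k! (k+n)!)."""
    half = x / 2.0
    return sum(half ** (2 * k + n) / (math.factorial(k) * math.factorial(k + n))
               for k in range(terms))

def bessel_ratio(x):
    """I_1(x) / I_0(x), the factor multiplying each term of the stationarity condition."""
    return bessel_i(1, x) / bessel_i(0, x)

# The ratio approaches x/2 as x shrinks; the leading correction is of order x^3.
for x in (0.5, 0.1, 0.01):
    print(x, bessel_ratio(x), x / 2.0)
```

Since the local fields $H$ are linear in the couplings, small $J$'s imply small arguments, which is the regime where this linearization applies.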
Fig. 18 reflects this result for the ML-graph (right panel), for which we are always above $T_c \sim 0.5$.
On the contrary, for the ER-graph (left panel of Fig. 18), for which $T_c \sim 2.3$, we see that the algorithm tries to accommodate some effective non-zero $J$'s that can describe $\overline{\cos{\left(\phi_l-\phi_m\right)}}$ at $T\lesssim T_c$. The same behavior is shown in Fig. 19.
List of changes
Section 5. Last paragraph added.
Appendices: minor changes
Ref. [78] replaced.
Published as SciPost Phys. 5, 002 (2018)