# An objective criterion for cluster detection in stochastic epidemic models

### Submission summary

• Contributors: Eugenio Lippiello
• Preprint link: scipost_202103_00025v1
• Date submitted: 2021-03-25 00:01
• Submitted by: Lippiello, Eugenio
• Submitted to: SciPost Physics
• Academic field: Physics
• Specialties: Statistical and Soft Matter Physics
• Approach: Theoretical

### Abstract

The correct identification of clusters is crucial for accurate monitoring of the spread of a disease, and also in many other natural, social and physical phenomena which exhibit an epidemic structure. Nevertheless, even when an accurate mathematical model is available, no simple tool exists which allows one to identify how many independent clusters are present and to link elements to the appropriate clusters. Here we develop an automatic method for the detection of the internal structure of the clusters and their number, independently of the model that describes the dynamics of the phenomenon. It is based on the difference in log-likelihood, $\delta {\cal LL}$, between the case in which all elements are connected and the case in which they are grouped into clusters. As a function of the number of connected elements, $\delta {\cal LL}$ presents a change of slope and a singularity, both of which can be used in cluster identification. Our method is validated for an epidemic model with a minimal temporal structure and for the Epidemic Type Aftershock Sequence model describing the spatio-temporal clustering of earthquakes.

###### Current status:
Editor-in-charge assigned

### Submission & Refereeing History

Submission scipost_202103_00025v1 on 25 March 2021

## Reports on this Submission

### Report

This work describes an intriguing method for reconstructing the clusters of an epidemic model. It is based on a clever use of a log-likelihood difference and provides indicators of the best number of links for optimally detecting “immigrants” (nodes without ancestors) and clusters. I think that the paper deserves publication in SciPost because it opens a new direction in the much-used field of unsupervised methods for clustering. Nevertheless, it first needs some clarifications.

– Literature: some papers by Ogata have already used the log-likelihood for declustering seismicity (probably refs. 8 and 9). This should be discussed in the text.

– Log-likelihood: is there any rigorous reason for using the log-likelihood difference (2), or is it more about intuition?

– Algorithm: it is not clear how the algorithm is implemented in practice and how it can scale linearly with the number of links. Is there an ordering of the q_ij involved? The paper would become much clearer with a pseudo-code of the main algorithm.

– Parametrization: in what sense can the number of links be used to parametrize the partitions Y? Partitions with the same number of links can be very different.

– Symbol j: the symbol “j”, used on page 4 as an increment of the number of links, could probably be replaced with something that does not read like the index of an event.

– R: after Eq. (3), I do not understand the sentence saying that R drops fast to zero after n1*. Why is this so? Should it not be after n2*?

– Averages: it is not clear in what sense averages are taken. Over different realizations? See e.g. the sentences before Eq. (4).

– Notation: numbers in the form 2E5 are unusual; normally one sees $2 \times 10^5$ (as on the figures’ axes).

– Figures: vertical lines at the true number of clusters and of links would be a good guide to the eye.

– Model: the realization of the first model is not totally clear, especially concerning the immigrants’ spreading.
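
To make the parametrization question above concrete, here is a minimal sketch, not taken from the manuscript, of two link sets with the same number of links that produce different partitions. The helper `clusters_from_links` and the six-event link sets are hypothetical, and the sketch assumes each link joins a parent event to one offspring, so that the links form a forest.

```python
def clusters_from_links(n, links):
    """Group n events (labelled 0..n-1) into clusters given parent-offspring links."""
    parent = list(range(n))  # union-find forest: each event starts as its own cluster

    def find(i):
        # Follow pointers to the cluster representative, compressing the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in links:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj  # merge the two clusters

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Hypothetical example: six events, three links in both cases.
links_a = [(0, 1), (1, 2), (2, 3)]  # one chain of four events plus two isolated events
links_b = [(0, 1), (2, 3), (4, 5)]  # three separate pairs

partition_a = clusters_from_links(6, links_a)
partition_b = clusters_from_links(6, links_b)
print(partition_a)
print(partition_b)
```

Under the forest assumption, both link sets give the same number of clusters (events minus links), yet the partitions themselves differ, which is the ambiguity the question refers to.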
