Estimation of the geometric measure of entanglement with Wehrl Moments through Artificial Neural Networks

In recent years, artificial neural networks (ANNs) have become an increasingly popular tool for studying problems in quantum theory, and in particular entanglement theory. In this work, we analyse to what extent ANNs can accurately predict the geometric measure of entanglement of symmetric multiqubit states using only a limited number of Wehrl moments (moments of the Husimi function of the state) as input, which represents partial information about the state. We consider both pure and mixed quantum states. We compare the results we obtain by training ANNs with the informed use of convergence acceleration methods. We find that even some of the most powerful convergence acceleration algorithms do not compete with ANNs when given the same input data, provided that enough data is available to train these ANNs. We also provide an experimental protocol for measuring Wehrl moments, which is state-independent. More generally, this work opens up perspectives for the estimation of entanglement measures and other SU(2)-invariant quantities, such as the Wehrl entropy, in a way that is more accessible in experiments than by means of full state tomography.


Introduction
Entanglement is at the heart of quantum physics and constitutes a crucial resource for most quantum technologies [1]. Detecting and estimating the entanglement of a system is usually a challenging task, both theoretically and experimentally, and the development of theoretical methods and experimental protocols is essential in this context. The detection of entanglement has already been explored around specific symmetric multiqubit states [2,3] or using criteria based on collective measurements [3] or PPT mixtures [4] that are able to detect certain classes of entanglement. In this work, we propose a method for estimating the entanglement of symmetric multiqubit states that makes no a priori assumptions about the form of the states or their entanglement.
More precisely, we tackle the problem of estimating entanglement via the use of artificial neural networks (ANNs). Over the past few years, deep learning methods have gained momentum in quantum physics [5,6]. In the context of quantum state tomography, they have been used to reconstruct density matrices from measurement results [7,8] and to find an optimal measurement basis [9]. In quantum optics, artificial neural networks have been trained to detect multimode Wigner negativity [10]. Deep reinforcement learning and recurrent neural networks have also been exploited for quantum information theory purposes, such as quantum state preparation [11] and quantum error correction [12,13].
In the context of entanglement theory, ANNs have been used to quantify the amount of entanglement in multipartite quantum systems [14,15] and to classify the entanglement in pure states [16] and mixed states [17]. In [14], the authors trained complex-valued ANNs to predict the geometric measure of entanglement (GME) of symmetric states. To do so, they reformulated the GME computational problem as the search for the best rank-one tensor approximation of complex tensors, for which they used ANNs. Other authors have used deep learning methods to compute the concurrence and mutual information from an incomplete tomography of mixed qubit states [15]. In quantum many-body physics, convolutional neural networks have been employed to compute, e.g., the entanglement entropy from the variance of the number of particles in an electron chain [18].
More specifically, the general question posed in this work, which is along these lines, is: To what extent is it possible to estimate the geometric measure of entanglement of symmetric multiqubit states using only partial information in the form of some of their Wehrl moments? Wehrl moments are the moments of the Husimi Q function of a state [19]. They have been used to define measures of non-classicality, chaoticity or entropy of quantum states [19-21], and are relevant in various contexts, such as the characterization of quantum phase transitions [21,22]. Importantly, Wehrl moments are experimentally accessible quantities, as we show in this work, from projection measurements of collective observables (see [24] for a full state tomography protocol). On the other hand, there is currently no protocol to determine the GME experimentally other than by full state tomography, and its calculation, even for pure symmetric states, cannot generally be performed analytically and requires numerical optimisation. A good estimate of the GME on the basis of partial information that is more readily available than the full quantum state is therefore of theoretical and practical interest, and motivates our approach. In this work, assuming the knowledge of a few Wehrl moments of symmetric multiqubit states, we present and compare three different approaches to estimate their GME, one of which is an ANN, which we found to be the most efficient. Note that similar but distinct issues to the one addressed in this work have recently been studied with respect to the detection and certification of entanglement from the Peres-Horodecki criterion based on the first moments of the partial transpose of a state [25,26].
Our paper is organised as follows. In Sec. 2, we define the Husimi function, the Wehrl moments, the GME and their relations to each other for pure symmetric multiqubit states. In Sec. 3, we present how we generated the datasets of Wehrl moments used throughout this work. In Sec. 4, we introduce the three different approaches to estimate the GMEs of the dataset: i) a first one based on the two highest known successive Wehrl moments, ii) a second one based on a convergence acceleration algorithm applied to the sequence of the known Wehrl moments and iii) a third one based on a trained ANN. In Sec. 5, we compare and analyse our results. In Sec. 6, we consider the more complex case of mixed states. In Sec. 7, we propose a protocol for the experimental determination of Wehrl moments based on the measurement of a set of collective observables, the number of which varies only quadratically with the number of qubits. In Sec. 8, we conclude and present perspectives of our work. Finally, this manuscript ends with a series of technical appendices, one of which presents a semidefinite program for the calculation of the GME of mixed multiqubit symmetric states (Appendix E).

Wehrl moments and geometric measure of entanglement
In this section, we define multiqubit symmetric states, the Husimi function and the associated Wehrl moments, the GME, and present how these quantities are related to each other.

Multiqubit symmetric states
A multiqubit state is said to be symmetric if it is invariant under any permutation of the qubits. Let $|\psi\rangle$ be an $N$-qubit symmetric state. We can always write this state in terms of $N$ normalized single-qubit states $|\epsilon_i\rangle$ as
$$|\psi\rangle = \mathcal{N}_{|\psi\rangle} \sum_{\sigma \in S_N} |\epsilon_{\sigma(1)}\rangle \otimes \cdots \otimes |\epsilon_{\sigma(N)}\rangle, \tag{1}$$
where $\mathcal{N}_{|\psi\rangle}$ is a normalization constant and $S_N$ is the symmetric group on $N$ elements. Since a one-qubit state, up to a phase factor, can be represented by a point on the Bloch sphere, any symmetric multiqubit state can be represented geometrically by a constellation of $N$ points, each associated with one of the $|\epsilon_i\rangle$, on the same sphere [27]. In the following, we will refer to these points as the Majorana points of $|\psi\rangle$.
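Alternatively, a symmetric state of $N$ qubits can be expanded in the symmetric Dicke state basis as
$$|\psi\rangle = \sum_{k=0}^{N} d_k\, |D_N^{(k)}\rangle, \tag{2}$$
where the symmetric Dicke states $|D_N^{(k)}\rangle$ are given by Eq. (1) with $N-k$ of the single-qubit states equal to $|0\rangle$ and the remaining $k$ equal to $|1\rangle$. The Dicke states $|D_N^{(k)}\rangle$ can be thought of as angular momentum eigenstates once we introduce the collective spin operators associated with the $N$-qubit system,
$$J_\alpha = \frac{1}{2}\sum_{i=1}^{N} \sigma_\alpha^{(i)}, \qquad \alpha = x, y, z,$$
where $\sigma_\alpha^{(i)}$ denotes the Pauli operator $\sigma_\alpha$ acting on qubit $i$.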

Husimi function
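For a spin $j$, the Husimi function of an arbitrary state $|\psi_j\rangle$ is defined as
$$Q_{|\psi_j\rangle}(\Omega) = |\langle \Omega|\psi_j\rangle|^2,$$
where $|\Omega\rangle$ is a spin-coherent state with $\Omega$ specifying a point on the unit sphere of $\mathbb{R}^3$ [28]. The Husimi function $Q_{|\psi_j\rangle}(\Omega)$ is an infinitely differentiable function on the sphere $S^2$. In what follows, we will mainly use the notation $Q_{|\psi_j\rangle}(\theta,\phi)$, where $\theta \in [0,\pi]$ and $\phi \in [0, 2\pi[$ are the polar and azimuthal angles associated with a point on the unit sphere. The Husimi function is normalized according to [28]
$$\frac{2j+1}{4\pi}\int_{S^2} Q_{|\psi_j\rangle}(\Omega)\, d\Omega = 1. \tag{3}$$
For multiqubit symmetric states, the Husimi function $Q_{|\psi\rangle}(\theta,\phi)$ of an $N$-qubit state $|\psi\rangle$ is similarly defined as the squared overlap of $|\psi\rangle$ with a symmetric separable pure state $|\epsilon\rangle^{\otimes N}$,
$$Q_{|\psi\rangle}(\theta,\phi) = \left|\langle \epsilon|^{\otimes N} |\psi\rangle\right|^2,$$
where $\Omega = (\theta,\phi)$ are the coordinates of the point on the Bloch sphere associated with the single-qubit state $|\epsilon\rangle \equiv |\theta,\phi\rangle$. The Husimi function of any state $|\psi\rangle$ is normalized according to (3) with $|\psi_j\rangle \to |\psi\rangle$ and $2j \to N$. Using Eq. (1), we can expand it as
$$Q_{|\psi\rangle}(\theta,\phi) = |\mathcal{N}_{|\psi\rangle}|^2\, (N!)^2 \prod_{i=1}^{N} |\langle \epsilon|\epsilon_i\rangle|^2. \tag{4}$$
The Husimi functions of three different symmetric states of $N = 8$ qubits are shown in Figure 1.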

Wehrl moments -explicit expressions
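The Wehrl moment $W^{(q)}_{|\psi\rangle}$ of integer order $q$ is the SU(2) invariant defined as
$$W^{(q)}_{|\psi\rangle} = \frac{N+1}{4\pi}\int_{S^2} \left[Q_{|\psi\rangle}(\Omega)\right]^q d\Omega, \tag{5}$$
with $d\Omega = \sin\theta\, d\theta\, d\phi$.

Figure 1: Husimi Q function of symmetric 8-qubit states taken from the three different data subsets introduced in Sec. 3. From left to right (subsets 1 to 3), the GME is 0.717, 0.211 and 0.620 respectively. The more uniform the Husimi function, the higher the GME.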
A tight upper bound for Wehrl moments of order $q > 1$ that is valid for any state is given by [28]
$$W^{(q)}_{|\psi\rangle} \leqslant \frac{N+1}{Nq+1}, \tag{6}$$
where the equality holds only for coherent states [29].
An explicit expression for the Wehrl moments of symmetric multiqubit states in terms of the expansion coefficients $d_k$ in the Dicke state basis has been given by Gnutzmann and Życzkowski [19], and reads in our notation
$$W^{(q)}_{|\psi\rangle} = \frac{N+1}{Nq+1} \sum_{m=0}^{Nq} \binom{Nq}{m}^{-1} \left| \sum_{i_1,\ldots,i_q} \prod_{k=1}^{q} d_{i_k} \sqrt{\binom{N}{i_k}} \right|^2, \tag{7}$$
where the inner sum goes from 0 to $N$ for each $i_k$ with the restriction $\sum_{k=1}^{q} i_k = m$. This relation is exact and allows us to calculate the Wehrl moments when we know the expansion (2) of a symmetric state. In Appendix A, we give an alternative expression of Wehrl moments in terms of permanents of Gram matrices of the constituent states $\{|\epsilon_i\rangle\}_{i=1}^{N}$, see Eq. (A.11). The latter expression is more appropriate when a symmetric state is known in the form of Eq. (1) rather than Eq. (2).
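As an illustration, the Dicke-basis expression for the Wehrl moments can be evaluated with a $q$-fold discrete convolution: writing $a_k = d_k \sqrt{\binom{N}{k}}$, the constrained inner sum over $i_1 + \cdots + i_q = m$ is the $m$-th entry of $a$ convolved with itself $q$ times, and $W^{(q)} = \frac{N+1}{Nq+1}\sum_m |A_m|^2 / \binom{Nq}{m}$. The following is a minimal sketch under that reading; the function name `wehrl_moment` is ours and is not taken from the original work.

```python
from math import comb

import numpy as np


def wehrl_moment(d, q):
    """Wehrl moment of integer order q >= 1 for a symmetric N-qubit state.

    d : Dicke-basis coefficients d_0, ..., d_N (assumed normalized).
    """
    d = np.asarray(d, dtype=complex)
    N = len(d) - 1
    # a_k = d_k * sqrt(binom(N, k))
    a = d * np.sqrt([comb(N, k) for k in range(N + 1)])
    # q-fold convolution: A_m = sum over i_1 + ... + i_q = m of prod_k a_{i_k}
    A = np.array([1.0 + 0.0j])
    for _ in range(q):
        A = np.convolve(A, a)
    # Weights 1 / binom(N q, m) for m = 0, ..., N q
    weights = np.array([1.0 / comb(N * q, m) for m in range(N * q + 1)])
    return (N + 1) / (N * q + 1) * float(np.sum(weights * np.abs(A) ** 2))
```

For a coherent state such as $|D_N^{(0)}\rangle$ this reproduces the bound $(N+1)/(Nq+1)$, and $W^{(1)} = 1$ for any normalized state.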

Geometric measure of entanglement
The geometric measure of entanglement (GME) of an $N$-qubit pure state $|\psi\rangle$, denoted by $E_G(|\psi\rangle)$, quantifies how far $|\psi\rangle$ is from the set of separable states. Just like the Wehrl moments, it is an SU(2)-invariant quantity, defined as [30]
$$E_G(|\psi\rangle) = 1 - \max_{|\phi_1\rangle,\ldots,|\phi_N\rangle} \left|\langle \phi_1| \otimes \cdots \otimes \langle \phi_N|\psi\rangle\right|^2, \tag{8}$$
where the maximization is performed over the $N$ single-qubit states $|\phi_i\rangle$. The GME is always smaller than 1 and is equal to 0 only when $|\psi\rangle$ is separable. In the case of symmetric states, the maximization appearing in Eq. (8) can be replaced by the simpler maximization where all single-qubit states $|\phi_i\rangle$ are identical, i.e. $|\phi_i\rangle = |\epsilon\rangle$ for $i = 1, \ldots, N$ [31]. We are thus left with the problem of finding the maximum of the Husimi function of $|\psi\rangle$ on the sphere $S^2$, that is
$$E_G(|\psi\rangle) = 1 - \max_{\theta,\phi} Q_{|\psi\rangle}(\theta,\phi). \tag{9}$$
The GME is zero for all product states and non-zero for all entangled states. A (non-tight) upper bound on the GME of $N$-qubit symmetric states is given by [32]
$$E_G(|\psi\rangle) \leqslant \frac{N}{N+1}. \tag{10}$$
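For a symmetric state given by its Dicke coefficients, the maximization of the Husimi function over the sphere can be sketched numerically as follows. This is only a coarse grid search, not the optimisation procedure used in this work. It assumes the parametrization $|\epsilon\rangle = \cos(\theta/2)|0\rangle + e^{i\phi}\sin(\theta/2)|1\rangle$ and Dicke states with $k$ qubits in $|1\rangle$, so that $\langle \epsilon|^{\otimes N}|D_N^{(k)}\rangle = \sqrt{\binom{N}{k}}\cos^{N-k}(\theta/2)\sin^{k}(\theta/2)e^{-ik\phi}$. The function names and grid sizes are illustrative choices.

```python
from math import comb

import numpy as np


def husimi(d, theta, phi):
    """Husimi function of a symmetric state with Dicke coefficients d."""
    d = np.asarray(d, dtype=complex)
    N = len(d) - 1
    k = np.arange(N + 1)
    root_binom = np.sqrt([comb(N, j) for j in k])
    c = np.cos(np.asarray(theta) / 2)[..., None]
    s = np.sin(np.asarray(theta) / 2)[..., None]
    phase = np.exp(-1j * k * np.asarray(phi)[..., None])
    # Overlap of the state with the product state |eps>^{(x)N}
    amp = np.sum(d * root_binom * c ** (N - k) * s ** k * phase, axis=-1)
    return np.abs(amp) ** 2


def gme_grid(d, n_theta=181, n_phi=360):
    """Grid estimate of E_G = 1 - max Q; accuracy is limited by the grid."""
    theta = np.linspace(0.0, np.pi, n_theta)
    phi = np.linspace(0.0, 2.0 * np.pi, n_phi, endpoint=False)
    T, P = np.meshgrid(theta, phi, indexing="ij")
    return 1.0 - float(husimi(d, T, P).max())
```

In practice, the grid maximum would typically serve as the starting point of a local optimiser, since a grid alone can only overestimate the GME.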

Bounds on GME from Wehrl moments
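For any integers $q > p \geqslant 1$ and any state $|\psi\rangle$, it holds that
$$\max_{\theta,\phi} Q_{|\psi\rangle}(\theta,\phi) \geqslant \left(\frac{W^{(q)}_{|\psi\rangle}}{W^{(p)}_{|\psi\rangle}}\right)^{\frac{1}{q-p}}. \tag{11}$$
This is a consequence of the integral Hölder's inequality [33],
$$\|fg\|_m \leqslant \|f\|_r\, \|g\|_s, \tag{12}$$
where $\|f\|_r = \left(\int_X |f|^r d\mu\right)^{1/r}$, $1/m = 1/r + 1/s$, and $f$ and $g$ are functions defined on $X$. By taking $f = Q_{|\psi\rangle}^{\,q-p}$, $g = Q_{|\psi\rangle}^{\,p}$, $X = S^2$, $d\mu = d\Omega/4\pi$, $r = \infty$ and $m = 1$ (hence $s = 1$), we readily get Eq. (11) by noting that $\|f\|_\infty = \max_X f$, where $\|\cdot\|_\infty$ denotes the supremum norm. Equation (11) provides us with a chain of better and better upper bounds for the GME as $q$ and $p$ increase. In fact, defining the sequence (for integer $q > 1$)
$$S_{|\psi\rangle}(q) = \frac{W^{(q)}_{|\psi\rangle}}{W^{(q-1)}_{|\psi\rangle}}, \tag{13}$$
we have that
$$S_{|\psi\rangle}(q) \leqslant \max_{\theta,\phi} Q_{|\psi\rangle}(\theta,\phi) = 1 - E_G(|\psi\rangle) \tag{14}$$
and
$$\lim_{q\to\infty} S_{|\psi\rangle}(q) = \max_{\theta,\phi} Q_{|\psi\rangle}(\theta,\phi) = 1 - E_G(|\psi\rangle). \tag{15}$$
Equation (15) shows that the geometric measure of entanglement $E_G$ can be extracted from the limit of the sequence $S_{|\psi\rangle}(q)$ of ratios of successive Wehrl moments.

The Wehrl moments admit in some cases simple analytical expressions. For instance, for symmetric Dicke states, they are given by [19]
$$W^{(q)}_{|D_N^{(k)}\rangle} = \frac{N+1}{Nq+1}\, \frac{\binom{N}{k}^q}{\binom{Nq}{kq}}. \tag{16}$$
This then leads to
$$E_G(|D_N^{(k)}\rangle) = 1 - \binom{N}{k} \left(\frac{k}{N}\right)^{k} \left(\frac{N-k}{N}\right)^{N-k}, \tag{17}$$
in agreement with known results for the geometric entanglement of Dicke states [30]. It is also instructive to analyze how the sequence $S_{|D_N^{(k)}\rangle}(q)$ converges to its limit. From Eq. (16) for $0 < k < N$, we find that the sequence $S_{|D_N^{(k)}\rangle}(q)$ is monotonically increasing, as required by Eq. (14), and converges asymptotically to its limit as
$$S_{|D_N^{(k)}\rangle}(q) = \left[1 - E_G(|D_N^{(k)}\rangle)\right] \left[1 - \frac{1}{2q} + \mathcal{O}\!\left(\frac{1}{q^2}\right)\right]. \tag{18}$$
For separable states ($k = 0$ or $k = N$), we have [19]
$$W^{(q)}_{|D_N^{(0)}\rangle} = W^{(q)}_{|D_N^{(N)}\rangle} = \frac{N+1}{Nq+1} \tag{19}$$
and
$$S_{|D_N^{(0)}\rangle}(q) = \frac{N(q-1)+1}{Nq+1} = 1 - \frac{1}{q} + \mathcal{O}\!\left(\frac{1}{q^2}\right). \tag{20}$$
In both cases, the dominant correction scales as $1/q$. The asymptotic scaling as $1/q$ of the dominant correction of $S_{|\psi\rangle}(q)$ is actually a general feature of the sequence, valid for any state $|\psi\rangle$. Indeed, the asymptotic scaling of the Wehrl moments (5) can be calculated using Laplace's method (see Appendix B for a detailed derivation); the result, Eq. (21), involves a constant $c_{|\psi\rangle}$ independent of $q$ and a remainder expressed in little-o notation $o(\cdot)$. From the definition (13) and the properties of the little-o and big-O notations, one then obtains the asymptotic form (22) of $S_{|\psi\rangle}(q)$. In Section 4.2, we show how to generalize this analysis and how to take advantage of the knowledge of the asymptotic behavior of the sequence $S_{|\psi\rangle}(q)$ to estimate its limit from a finite number of terms.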

Datasets and performance metrics
As our objective is to compare different methods to determine the best estimate of the GME of a state from its first few Wehrl moments, we need a set of representative pure multiqubit states on which to test these methods and calculate some metrics to compare their respective performances (see Sec. 4). This section aims to explain how we generated these representative multiqubit states and what our performance measures are.

Generation of datasets
In order to obtain a dataset whose GME values are as widely distributed as possible, we generate three different subsets of states. Subset 1 is made of symmetric states with randomly and uniformly distributed Majorana points on the Bloch sphere. Subset 2 is made of random states for which degenerate Majorana points are uniformly distributed on the Bloch sphere, with random degeneracy tuples drawn uniformly from all partitions of $N$. Finally, subset 3 is made of superpositions [Eq. (23)] of the state $|\mathrm{GHZ}\rangle = (|D_N^{(0)}\rangle + |D_N^{(N)}\rangle)/\sqrt{2}$ and the Dicke states $|D_N^{(k)}\rangle$, with a random real number $\alpha \in [0, 1]$ setting the weights and a random integer $k$ between 0 and $N$. For each number of qubits $N$, 20000 states are randomly drawn for each subset. All these states are then divided into two equally sized sets: one for training the ANN and the other for testing the three different methods in the estimation of the GME. The Wehrl moments up to $q_{\max} = 8$ and $E_G$ are computed for all states. The number of states in the datasets is large enough to generate a similar GME distribution for the training and test sets. For $N = 8$, the maximal GME is $E_G \approx 0.816$ [34], while Eq. (10) gives the upper bound $E_G(|\psi\rangle) \leqslant 8/9 \approx 0.889$.
Figure 2 shows the GME probability distribution of training states (left) and test states (right) for $N = 8$. We find that these three subsets have very different entanglement distributions and are therefore a good set of training and test data. In particular, subset 2 (yellow histograms) is mostly made up of weakly entangled states, while subset 3 (red histograms) contains a significant proportion of very highly entangled states.

Performance metrics
In order to compare the different methods to estimate the GME, such as convergence acceleration processes and ANNs, we first define the relative difference between the predicted GME and the actual GME as
$$\delta_i = \frac{E_G^{\mathrm{pred}}(|\psi_i\rangle) - E_G(|\psi_i\rangle)}{E_G(|\psi_i\rangle)}, \tag{24}$$
where $E_G^{\mathrm{pred}}(|\psi_i\rangle)$ stands for the predicted GME of state $|\psi_i\rangle$ of the test dataset. Then, we define the mean absolute relative difference, hereafter called mean relative error (MRE),
$$\Delta = \frac{1}{M}\sum_{i=1}^{M} |\delta_i|, \tag{25}$$
where we sum over all states of the test dataset of size $M = 30\,000$. As the distribution of the absolute relative difference $|\delta_i|$ is not Gaussian, the standard deviation is not a good estimate for error bars. Instead, we calculate a low error bar and a high error bar so as to include 68.2% of the $|\delta_i|$ distribution in the error bar and have 15.9% of the distribution below (above) the low (high) error bar, as would be the case for an interval of one standard deviation centred around the mean for a Gaussian distribution.
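The two metrics can be computed in a few lines. The sketch below follows the description above: the relative difference is taken as (predicted minus actual) divided by actual, the MRE is the mean of its absolute value, and the error bars are the 15.9th and 84.1st percentiles of $|\delta_i|$. The function name is an illustrative choice.

```python
import numpy as np


def mre_with_error_bars(predicted, actual):
    """Mean relative error and percentile-based error bars.

    Returns (mre, low, high), where 15.9% of the |delta_i| lie below `low`
    and 15.9% lie above `high`, so that [low, high] contains 68.2% of them.
    """
    predicted = np.asarray(predicted, dtype=float)
    actual = np.asarray(actual, dtype=float)
    delta = (predicted - actual) / actual  # signed relative difference
    abs_delta = np.abs(delta)
    low, high = np.percentile(abs_delta, [15.9, 84.1])
    return float(abs_delta.mean()), float(low), float(high)
```

Note that the relative difference is undefined for states with vanishing GME, so such states would have to be excluded or treated separately.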

Estimation of the geometric measure of entanglement
In this section, we estimate the GME of the states of the test dataset presented previously based on the knowledge of their Wehrl moments of orders $q = 1, \ldots, q_{\max}$, expecting a better estimate of the GME as $q_{\max}$ increases. We use and compare three different methods: i) a crude one based on the ratio of the two highest known Wehrl moments, ii) a second one based on a convergence acceleration algorithm applied to the set of known Wehrl moments and iii) a third one based on a trained ANN. We are particularly interested in the performance of the different methods as a function of the highest considered order $q_{\max}$ and of the number of qubits $N$.

Figure 3: Predictions based on the ratio of Wehrl moments for $N = 4$ and different maximal orders $q_{\max}$ (top) and for $q_{\max} = 4$ and different numbers of qubits $N$ (bottom). Left panels: Predicted value versus actual value of GME for all states of the test dataset. Middle panels: Probability to predict the GME with a certain relative difference. The bin size is 0.5%. Right panels: Mean relative error (25) as a function of $q_{\max}$ and $N$. The grey solid line in the top right panel shows a fit of equation $\Delta(q_{\max}) = A/q_{\max}$ with $A \approx 102$, which is the expected behaviour at large $q_{\max}$ according to Eq. (22).

Wehrl moments ratios
As the ratios of successive Wehrl moments (13) converge to the maximum of the Husimi function when $q \to \infty$ [see Eq. (15)], a first estimate of the GME of the test states based on these ratios is given by
$$E_G^{\mathrm{pred}}(|\psi\rangle) = 1 - S_{|\psi\rangle}(q_{\max}). \tag{26}$$
The predictive power of (26) is illustrated in Fig. 3 for different maximal orders $q_{\max}$ and numbers of qubits $N$. As expected from the inequality (14), we observe that the estimate (26) is always larger than the actual value of the GME (left panels), which results in a positive relative difference (middle panels). As $q_{\max}$ increases, the estimate improves, with a decrease in mean relative error (MRE) as a function of $q_{\max}$ (top right panel). However, even with $q_{\max} = 8$, the MRE remains above 10%. The MRE increases slightly with $N$ before stabilising quickly, as shown in the bottom right panel.

Convergence acceleration algorithms
Convergence acceleration algorithms consist in transforming a sequence into another sequence that converges faster to its limit, by taking as inputs only the first terms of the original sequence. Different algorithms exist in the literature and differ from each other in how the terms of the initial sequence are combined to generate the new sequence.
We focus here on the recursive E-algorithm [35], which showed the best performance among the algorithms we tested. Our goal, by applying it to the sequence $S_{|\psi\rangle}(q)$ [Eq. (13)], is to obtain a better estimate of its limit $S_{|\psi\rangle}(\infty)$, and thus of the GME of the states through Eq. (15).
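The recursive E-algorithm makes it possible to accelerate sequences $f(q)$ with asymptotic expansions of the general form
$$f(q) = f(\infty) + \sum_{i \geqslant 1} \lambda_i\, g_i(q), \tag{27}$$
where $g_i(q)$ are known (or postulated) scaling functions ordered such that
$$\lim_{q\to\infty} \frac{g_{i+1}(q)}{g_i(q)} = 0,$$
i.e., so that $g_1(q)$ corresponds to the dominant asymptotic scaling of the sequence $f(q)$, and with arbitrary (and potentially unknown) coefficients $\lambda_i$. According to the recursive E-algorithm, a better estimate of the limit $f(\infty)$ can be obtained by computing via recurrence the quantities
$$E_k^{(q)} = E_{k-1}^{(q)} - \frac{E_{k-1}^{(q+1)} - E_{k-1}^{(q)}}{g_{k-1,k}^{(q+1)} - g_{k-1,k}^{(q)}}\; g_{k-1,k}^{(q)},$$
taking $E_0^{(q)} = f(q)$ as the initial conditions and the coefficients
$$g_{k,i}^{(q)} = g_{k-1,i}^{(q)} - \frac{g_{k-1,i}^{(q+1)} - g_{k-1,i}^{(q)}}{g_{k-1,k}^{(q+1)} - g_{k-1,k}^{(q)}}\; g_{k-1,k}^{(q)}, \qquad g_{0,i}^{(q)} = g_i(q).$$
A quick inspection shows that $E_k^{(q)}$ is a function of the set $\{f(q), f(q+1), \ldots, f(q+k)\}$. In practice, increasing the order $k$ of the algorithm generally provides a better estimate $E_k^{(q)}$ of the limit $f(\infty)$ of the initial sequence $f(q)$, but requires knowing and combining more terms of the sequence.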
The recursive E-algorithm is particularly suited for the acceleration of the sequence $S_{|\psi\rangle}(q)$, for which we have an idea of the form of the scaling functions $g_i(q)$ defined in Eq. (27). Indeed, motivated by the general asymptotic behaviour of $S_{|\psi\rangle}(q)$ given by Eq. (22) and the two particular cases (18) and (20) studied in Sec. 2.4, we consider here the following ansatz:
$$S_{|\psi\rangle}(q) = S_{|\psi\rangle}(\infty) + \sum_{i \geqslant 1} \frac{\lambda_i}{q^i},$$
i.e., the general expansion (27) with $g_i(q) = q^{-i}$. In Fig. 3, we showed the GMEs of the states of the test dataset estimated via the crude estimate $S_{|\psi\rangle}(q_{\max})$. In order to have a fair comparison, we estimate here the GMEs of these states with $E^{(2)}_{q_{\max}-2}$, which exploits all the first terms of the sequence $S_{|\psi\rangle}(q)$ up to $q = q_{\max}$, i.e., $\{S_{|\psi\rangle}(q) : q = 2, \ldots, q_{\max}\}$. Figure 4 shows the results for different $N$ and $q_{\max}$. As expected, the estimates of the GME obtained with the convergence acceleration algorithm are better than the crude estimate $S_{|\psi\rangle}(q_{\max})$, especially for low $q_{\max}$. In particular, the skewness of the distributions of predicted GMEs compared to actual GMEs is much less pronounced. For $N = 4$, we can see in the top right panel that the MRE is already reduced to only about 10% for $q_{\max} = 3$ [one order of magnitude lower than for the estimate $1 - S_{|\psi\rangle}(q_{\max})$]. For larger $q_{\max}$, we find a behaviour compatible with an exponential decrease of the MRE.
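The recursion can be sketched compactly for the scaling functions $g_i(q) = q^{-i}$. The code below implements the standard form of the E-algorithm, $E_k^{(q)} = E_{k-1}^{(q)} - g_{k-1,k}^{(q)}\,\Delta E_{k-1}^{(q)}/\Delta g_{k-1,k}^{(q)}$ with the same update for the auxiliary quantities $g_{k,i}^{(q)}$, where $\Delta$ is the forward difference in $q$. It is a minimal illustration and not the code used to produce the results of this work; the function name and argument layout are ours.

```python
import numpy as np


def e_algorithm(f, q0, order):
    """Return E_order^{(q0)} for the terms f = [f(q0), f(q0 + 1), ...].

    Assumes scaling functions g_i(q) = q**(-i); at least order + 1 terms
    are needed, and only the first order + 1 terms enter the result.
    """
    E = np.asarray(f, dtype=float).copy()
    if len(E) < order + 1:
        raise ValueError("need at least order + 1 terms of the sequence")
    q = q0 + np.arange(len(E), dtype=float)
    # g[i - 1] stores g_{k,i}^{(q)} on the current (shrinking) grid of q values
    g = [q ** (-float(i)) for i in range(1, order + 1)]
    for k in range(1, order + 1):
        gk = g[k - 1]
        factor = gk[:-1] / np.diff(gk)  # g_{k-1,k} / Delta g_{k-1,k}
        E = E[:-1] - factor * np.diff(E)
        for i in range(k, order):  # update g_{k,i} for i > k
            g[i] = g[i][:-1] - factor * np.diff(g[i])
    return float(E[0])
```

With ratios $S(2), \ldots, S(q_{\max})$ stored in a list `S`, the estimate discussed above would correspond to `1 - e_algorithm(S, q0=2, order=len(S) - 1)`. The transformation is exact whenever the sequence is a finite sum of the assumed scaling functions, which gives a simple sanity check.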
Note that we also compared the results of the E-algorithm to those obtained with the θ-algorithm, a popular convergence acceleration algorithm which has the advantage of not requiring knowledge of the asymptotic scaling of the accelerated sequence, but we did not find better performance (data not shown).
Figure 4: Same representation and parameters as in Fig. 3, but with predictions based on the recursive E-algorithm. The grey solid line in the top right panel shows a decreasing exponential fit of equation $\Delta(q_{\max}) \approx 8.667 \exp(-0.204\, q_{\max})$.

Artificial neural networks
One of the great advantages of ANNs is their predictive power in non-linear regression problems. Here we are interested in the ability of an ANN to predict the GME based on a few Wehrl moments. Basically, a neural network is a set of layers (see e.g. Fig. 5), indexed by $l$, containing a given number $N_l$ of nodes, indexed by $i$, each containing a real value $y_i^{(l)}$. The nodes of successive layers are connected by weights $w_{ij}^{(l)}$, and the input layer ($l = 0$) holds the input data, here the Wehrl moments. Therefore, the values of the nodes in the first layer are as follows
$$y_i^{(1)} = \sum_{j} w_{ij}^{(1)}\, y_j^{(0)}.$$
To increase the capability and predictive power of the network, a bias $b_i^{(l)}$ can be added to each node and, in order to obtain a non-linear regression, a non-linear function $f$ can be applied to each value in a given layer. Thus, the general form of the values contained in layer $l$ is
$$y_i^{(l)} = f\Big(\sum_{j} w_{ij}^{(l)}\, y_j^{(l-1)} + b_i^{(l)}\Big).$$
By feeding the nodes of one layer with the values of the previous layer, the input data flows through the network and finally the last layer contains the value of the regression, in this case an estimate of the GME. Initially, the weights and biases are chosen randomly. In the training process, the neural network updates them using the gradient descent algorithm in order to minimise a given loss function that compares the expected result and the value of the last layer.
For the learning process, we take a batch size of 500 and, for each $q_{\max} \in [2, 8]$ and $N \in [2, 10]$, we train the ANN in a supervised manner for 5000 epochs with the Adam optimizer. Our loss function is the squared difference averaged over the batch. Remarkably, even after 5000 epochs, no overfitting is observed (see Fig. 11 and the additional discussion in Appendix C).
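To make the layer-by-layer propagation and the batch-averaged squared loss concrete, here is a minimal NumPy sketch of a fully connected regression network. It is not the architecture used in this work: the layer sizes, the ReLU activation and the random initialisation are assumptions made for illustration only, and the training loop (Adam updates) is deliberately omitted.

```python
import numpy as np


def init_params(sizes, rng):
    """Random weights and zero biases for layers of the given sizes."""
    return [
        (rng.normal(0.0, 1.0 / np.sqrt(n_in), size=(n_out, n_in)), np.zeros(n_out))
        for n_in, n_out in zip(sizes[:-1], sizes[1:])
    ]


def forward(params, x):
    """Propagate a batch x of shape (batch, n_inputs) through the network."""
    y = x
    for layer, (W, b) in enumerate(params):
        y = y @ W.T + b  # weighted sum of the previous layer plus bias
        if layer < len(params) - 1:
            y = np.maximum(y, 0.0)  # ReLU on hidden layers (assumed choice)
    return y[:, 0]  # single output node: the GME estimate


def batch_loss(params, x, target):
    """Squared difference between prediction and target, averaged over the batch."""
    return float(np.mean((forward(params, x) - target) ** 2))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    params = init_params([7, 32, 32, 1], rng)  # e.g. 7 Wehrl moments as inputs
    x = rng.random((500, 7))  # placeholder batch, not real Wehrl moments
    print(batch_loss(params, x, rng.random(500)))
```

In an actual training run, the gradient of `batch_loss` with respect to all weights and biases would be obtained by backpropagation, typically through an automatic differentiation library.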
We show in Fig. 6 the results of the different trainings applied to the test dataset. We find that ANNs give quite reliable predictions already for $q_{\max} = 3$ with an MRE of 1%, one order of magnitude less than with the convergence acceleration. More surprisingly, even on the basis of the first non-trivial Wehrl moment $W^{(2)}_{|\psi\rangle}$, ANNs give a good estimate for weakly and strongly entangled states. When we take into account more Wehrl moments, the ANNs are able to predict the GME more accurately. For a fixed number $q_{\max} = 4$ of Wehrl moments (see Appendix C for $q_{\max} = 8$), we find that the MRE increases as we increase the number of qubits but eventually saturates. We believe that for a higher number of qubits, there is a greater spectrum of states with the same first Wehrl moments but different GMEs. This would imply that the input to the ANN is not sufficient to distinguish between these different states and would explain the observed increase in error. We also observe that at $q_{\max} = 4$, the MRE saturates at about 1% for $N \gtrsim 5$. This result is quite remarkable as it shows that with ANNs the MRE seems to scale very favourably with $N$.

Discussion of the main results
We will now summarise our main results. We show in Fig. 7 the mean relative error for the different methods investigated in Sec. 4, for a wide range of maximum orders $q_{\max}$ and numbers of qubits $N$. The relative performance of the different methods of obtaining estimates for the GME is clearly evident. We consistently find that the MRE on the GME is lowest for the ANNs, then for the convergence acceleration algorithm and finally for the Wehrl moment ratios. The differences in performance are quite large, with ANNs outperforming the other methods by at least an order of magnitude. For the methods based on ANNs and convergence acceleration algorithms, the MRE decreases very rapidly from $q_{\max} = 2$ to $q_{\max} = 4$. Then, the MRE decreases exponentially at roughly the same rate for both methods. For $q_{\max} = 4$, the MRE obtained with ANNs seems to quickly saturate to about 1% for a large number of qubits ($N \gtrsim 6$, see right panel). We have also tested the ANN on a set of pure states that have been dynamically generated from spin squeezing. This set is characterised by a GME distribution that differs strongly from those used to train the ANN (see Appendix C for more details). In this case, we find that the ANN also works very well with similar performance, demonstrating its great flexibility under variations of the input data. Furthermore, we show in Appendix D that an ANN trained on noisy Wehrl moments is still able to predict the GME quite accurately.

Figure 6: Same representation and parameters as in Fig. 3, but with predictions based on trained ANNs. The grey solid line in the top right panel shows a decreasing exponential fit of equation $\Delta(q_{\max}) \approx 0.989 \exp(-0.179\, q_{\max})$.

Extension to mixed states
Under experimental conditions, the quantum state of a system is never perfectly pure due to the interaction of the system with its environment, resulting for example in depolarisation. It is therefore important to address the case of mixed states as well. Although the relationship (15) between Wehrl moments and GME is only valid for pure states, Wehrl moments can nevertheless provide valuable information about mixed states and potentially also about their entanglement. Therefore, it is still interesting to try to train ANNs to predict the GME of mixed states on the basis of their Wehrl moments. Note that for a mixed state $\rho$, the Wehrl moments are defined as in Eq. (5) with the Husimi function now given by $Q_\rho(\Omega) = \langle\Omega|\rho|\Omega\rangle$.

GME for mixed states
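The geometric measure of entanglement of a mixed state $\rho$ is defined based on the convex roof construction
$$E_G(\rho) = \min_{\{p_i, |\psi_i\rangle\}} \sum_i p_i\, E_G(|\psi_i\rangle),$$
where the minimum is taken over all pure state decompositions $\{p_i, |\psi_i\rangle\}$ of $\rho$. In [37], it was shown that this definition is equivalent to another definition based on the distance of $\rho$ to the convex set $\mathcal{S}$ of separable mixed states,
$$E_G(\rho) = 1 - \max_{\sigma \in \mathcal{S}} F(\rho, \sigma), \tag{36}$$
where
$$F(\rho, \sigma) = \left(\mathrm{Tr}\sqrt{\sqrt{\rho}\,\sigma\sqrt{\rho}}\right)^2$$
is Uhlmann's fidelity between any two mixed states $\rho$ and $\sigma$. The form (36) allows us to compute the GME of mixed states using a semidefinite program, as we explain in Appendix E (see also [38]).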

Results for depolarized states
For training the network, we generated a set of 1000 depolarised mixed states for each $N \in \{2, 3, 4\}$ and reduced the batch size to 50. The mixed states were obtained by drawing pure random states $|\psi\rangle$ according to the Haar measure and mixing them with the maximally mixed state $\rho_0 = \mathbb{1}/(N+1)$ as follows
$$\rho = (1-k)\, |\psi\rangle\langle\psi| + k\, \rho_0, \tag{38}$$
where $k \in [0, 1]$ is a parameter quantifying the degree of depolarisation. The results on the test data are represented in Fig. 8 for $k = 0.05$ by yellow diamonds. We see that for depolarised states, the MRE is around 0.1% or even below for $N \in \{2, 3\}$ and below 1% for $N = 4$ for $q_{\max} \geqslant 4$. For comparison, we also show the lower MRE obtained for pure random states (see Section 4.3) by blue dots. The data displayed in Fig. 8 show that Wehrl moments remain useful quantities for predicting the entanglement of mixed states in multiqubit systems. It is interesting to note that even for highly mixed states of the form (38), ANNs are still able to predict the GME with high accuracy. This is shown in Fig. 9, where we consider higher degrees of depolarisation $k$. Counter-intuitively, we find that the predictions of the GME improve as $k$ increases (see middle and right panels). This is probably due to the specific class of mixed states we have considered and the fact that the range of GME values that the ANN has to account for decreases with $k$ (see left panel). It does, however, show that for a typical decoherence model such as depolarisation, Wehrl moments still contain essential information for predicting the GME even for highly mixed states.

Figure caption (fragment): Eqs. (38) and (39) are used to generate the mixed states forming the test dataset.
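The construction of depolarised symmetric states and the evaluation of their Husimi function can be sketched as follows. The sketch works in the $(N+1)$-dimensional Dicke basis and makes three assumptions that are ours: Haar-random pure states are drawn as normalized complex Gaussian vectors in that basis, the mixing rule is $\rho = (1-k)|\psi\rangle\langle\psi| + k\,\mathbb{1}/(N+1)$, and coherent states have components $\sqrt{\binom{N}{j}}\cos^{N-j}(\theta/2)\sin^{j}(\theta/2)e^{ij\phi}$. The function names are illustrative.

```python
from math import comb

import numpy as np


def random_symmetric_pure_state(N, rng):
    """Haar-random vector of the (N + 1)-dimensional symmetric subspace."""
    v = rng.normal(size=N + 1) + 1j * rng.normal(size=N + 1)
    return v / np.linalg.norm(v)


def depolarise(psi, k):
    """Mix |psi><psi| with the maximally mixed state, with weight k on the latter."""
    dim = len(psi)
    return (1.0 - k) * np.outer(psi, psi.conj()) + k * np.eye(dim) / dim


def coherent_state(N, theta, phi):
    """Dicke-basis components of the spin-coherent state |Omega> (assumed convention)."""
    j = np.arange(N + 1)
    root_binom = np.sqrt([comb(N, i) for i in j])
    return root_binom * np.cos(theta / 2) ** (N - j) * np.sin(theta / 2) ** j * np.exp(1j * j * phi)


def husimi_mixed(rho, theta, phi):
    """Husimi function Q_rho(Omega) = <Omega| rho |Omega>."""
    w = coherent_state(rho.shape[0] - 1, theta, phi)
    return float(np.real(w.conj() @ rho @ w))
```

Under this mixing rule the Husimi function is simply $(1-k)\,Q_{|\psi\rangle}(\Omega) + k/(N+1)$, so depolarisation flattens the Husimi function towards the constant value $1/(N+1)$.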
We also trained ANNs on 1000 mixed states obtained by drawing pure random states $|\psi\rangle$ and mixed random states $\rho$ and mixing them as follows
$$\rho' = (1-k)\, |\psi\rangle\langle\psi| + k\, \rho. \tag{39}$$
The results on the test data are represented in Fig. 8 by red squares. This time, the error is systematically higher than that obtained for the depolarised states (38), but it remains at an acceptable level for $k = 0.05$ and $q_{\max} \geqslant 4$.

Protocol for measuring Wehrl moments
In this section, we propose a simple protocol based on spherical $t$-designs that allows the experimental determination of Wehrl moments of various orders from the same set of measurement outcomes of Stern-Gerlach experiments. A spherical $t$-design is a set of $n_t$ points on the unit sphere, located at angles $\{\Omega_k = (\theta_k, \phi_k)\}_{k=1}^{n_t}$, such that
$$\frac{1}{n_t}\sum_{k=1}^{n_t} P(\Omega_k) = \frac{1}{4\pi}\int_{S^2} P(\Omega)\, d\Omega \tag{40}$$
for all trigonometric polynomials $P$ of degree at most $t$. Taking $P(\Omega) = [Q_\rho(\Omega)]^q$, and assuming for the moment that $t$ is sufficiently large, we obtain by combining Eqs. (5) and (40)
$$W^{(q)}_\rho = \frac{N+1}{n_t}\sum_{k=1}^{n_t} \left[Q_\rho(\Omega_k)\right]^q,$$
from which we can conclude that it is sufficient to measure the Husimi function in a finite number $n_t$ of directions to determine the Wehrl moments. The Husimi function at $\Omega$ can be rewritten as
$$Q_\rho(\Omega) = \langle\Omega|\rho|\Omega\rangle = \langle D_N^{(0)}|R^\dagger(\Omega)\,\rho\,R(\Omega)|D_N^{(0)}\rangle = \langle D_N^{(0)}|\rho_\Omega|D_N^{(0)}\rangle = p_0^{\Omega},$$
where $R(\Omega)$ is the non-entangling rotation operator which maps the separable Dicke state $|D_N^{(0)}\rangle$ onto the coherent state $|\Omega\rangle$, $\rho_\Omega = R^\dagger(\Omega)\,\rho\,R(\Omega)$ is the rotated state and $p_0^{\Omega}$ is the probability that the system in state $\rho_\Omega$ is found in state $|D_N^{(0)}\rangle$. The latter probability can be measured from a Stern-Gerlach experiment giving access to the full set of probabilities $\{p_k^{\Omega} = \langle D_N^{(k)}|\rho_\Omega|D_N^{(k)}\rangle : k = 0, \ldots, N\}$ or, in the case of an atomic system, by driving a dipole transition to an auxiliary energy level and then observing the resonance fluorescence to obtain $p_0^{\Omega}$ [41]. This protocol, which measures the Husimi function by using rotations to determine the probability of a multiqubit state being in different pure separable states, is a fairly common technique, see e.g. [24,42,43], and can be applied to single spin systems, collections of two-level systems and even light polarization. In fact, it has already been routinely implemented in several experiments [44-46], e.g. using half-wave plates and polarising beam splitters in the case of multiphoton polarization states [46].
The advantage of our protocol, which consists of measuring the Husimi function in a finite number of directions and extracting the Wehrl moments, is that it is totally independent of the state under consideration. Indeed, the Husimi function of any $N$-qubit symmetric state is a polynomial function of degree $N$. By choosing $t = Nq$, all Wehrl moments can be extracted exactly, up to order $q$, irrespective of the state $\rho$. As regards spherical designs, it has been shown numerically that $n_t \approx t^2/2$ [47], so to extract the Wehrl moments up to order $q$, we should measure the Husimi function at $\approx (Nq)^2/2$ points. This quadratic scaling with $N$ is clearly more favourable than the cubic scaling of full state tomography for multiqubit symmetric states [48,49]. Note also that our protocol is not necessarily optimal and that there might be clever ways of using the full set of probabilities $\{p_k^{\Omega}\}$ obtained in Stern-Gerlach experiments (instead of only $p_0^{\Omega}$) to find a better approximation of the Wehrl moments.
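The design-based estimate can be illustrated with a small, well-known example: the 12 vertices of a regular icosahedron form a spherical 5-design, so they integrate exactly any polynomial of degree at most 5. For $N = 2$ qubits and $q = 2$, the integrand $[Q_\rho]^q$ has degree $Nq = 4$, and the average over the 12 directions reproduces $W^{(2)}$ exactly. The sketch below uses a coherent state pointing along a direction $\mathbf{n}$, whose Husimi function is $[(1 + \mathbf{n}\cdot\mathbf{r})/2]^N$. In an experiment the Husimi values would be measured probabilities rather than computed numbers. The design is chosen by us for illustration (the results reported below rely on a design with $t = 13$), and the function names are ours.

```python
import numpy as np


def icosahedron_design():
    """Unit vectors of the 12 icosahedron vertices (a spherical 5-design)."""
    g = (1.0 + np.sqrt(5.0)) / 2.0  # golden ratio
    pts = []
    for a in (-1.0, 1.0):
        for b in (-g, g):
            pts += [(0.0, a, b), (a, b, 0.0), (b, 0.0, a)]
    pts = np.array(pts)
    return pts / np.linalg.norm(pts, axis=1, keepdims=True)


def wehrl_from_design(husimi_values, N, q):
    """Estimate W^(q) as (N + 1) times the design average of Q**q."""
    return (N + 1) * float(np.mean(np.asarray(husimi_values) ** q))


def coherent_husimi(points, direction, N):
    """Husimi function of the coherent state along `direction`, at the given points."""
    n = np.asarray(direction, dtype=float)
    n = n / np.linalg.norm(n)
    return ((1.0 + points @ n) / 2.0) ** N


if __name__ == "__main__":
    pts = icosahedron_design()
    Q = coherent_husimi(pts, [0.3, -0.5, 0.8], N=2)
    print(wehrl_from_design(Q, N=2, q=2))  # compare with (N + 1)/(N q + 1) = 0.6
```

For $q = 3$ the integrand has degree 6, beyond the strength of this design, so the same 12 values then give only an approximation of $W^{(3)}$.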

Results from approximate Wehrl moments
Since ANNs are, to a certain extent, intrinsically robust to noise, it is not necessary to have a perfect determination of the Wehrl moments in order to obtain good estimates of the GME (see Appendix D for more details). This suggests the possibility of using spherical designs of order $t$ less than $Nq$ to obtain approximate Wehrl moments up to order $q$ via
$$W^{(q)}_\rho \approx \frac{N+1}{n_t}\sum_{k=1}^{n_t} \left[Q_\rho(\Omega_k)\right]^q. \tag{42}$$
Equation (42) approximates all Wehrl moments with $q > t/N$ from the same set of Husimi function values. Therefore, as long as this improves the prediction of the ANN, we can give it approximate Wehrl moments of increasing order.
Figure 10: Same representation and parameters as in Fig. 3, but with predictions based on Wehrl moments obtained from Eq. (42) with $\Omega_k$ the points defining a spherical $t$-design with $t = 13$ and $n_t = 94$. For the lower panels, we took $q_{\max} = 10$ instead of $q_{\max} = 4$.
We show in Fig. 10 the results of the training of ANNs based on the spherical $t$-design with $t = 13$ and the same test set of pure states as presented in Sec. 3. They show that the MRE can be brought down to a level of 1% with $t = 13$ even for a number of qubits up to 10. We chose this particular value of $t$ because the spherical design contains antipodal points and the Husimi function at two antipodal points can be measured by a single Stern-Gerlach experiment. The number of directions in which the Stern-Gerlach experiment must be performed can therefore be halved in this case (from $n_t = 94$ to 47).
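The two points made above, namely that a design of order $t < Nq$ still gives usable approximate moments and that an antipodal design halves the number of measurement settings, can be illustrated on a small example. This is only a sketch under stated assumptions: the icosahedron ($t = 5$, 12 points) stands in for the $t = 13$ design with 94 points, the state is a symmetric product state pointing along $z$ (sphere average of $Q^q$ equal to $1/(Nq+1)$), and the helper names are hypothetical. The halving relies on the fact that the two extreme outcomes of a Stern-Gerlach measurement along one axis correspond to the two opposite coherent states.

```python
import math

def icosahedron():
    """Twelve vertices of the regular icosahedron: an antipodal spherical 5-design."""
    phi = (1.0 + math.sqrt(5.0)) / 2.0
    norm = math.sqrt(1.0 + phi * phi)
    pts = []
    for a in (1.0, -1.0):
        for b in (phi, -phi):
            pts.append((0.0, a / norm, b / norm))
            pts.append((a / norm, b / norm, 0.0))
            pts.append((b / norm, 0.0, a / norm))
    return pts

def antipodal_pairs(points, tol=1e-12):
    """Group design points into (r, -r) index pairs; one setting per pair."""
    pairs, used = [], set()
    for i, p in enumerate(points):
        if i in used:
            continue
        for j in range(i + 1, len(points)):
            if j not in used and all(abs(a + b) < tol for a, b in zip(p, points[j])):
                pairs.append((i, j))
                used.update((i, j))
                break
    return pairs

def husimi_product_state(n, r, N):
    """Husimi function at direction r of the product state with Bloch vector n."""
    dot = sum(ni * ri for ni, ri in zip(n, r))
    return ((1.0 + dot) / 2.0) ** N

def relative_error(points, n, N, q):
    """Relative error of the design estimate of the sphere average of Q^q."""
    approx = sum(husimi_product_state(n, r, N) ** q for r in points) / len(points)
    exact = 1.0 / (N * q + 1)
    return abs(approx - exact) / exact

design = icosahedron()
print("measurement settings:", len(antipodal_pairs(design)), "for", len(design), "points")
for q in (1, 2, 3, 4):
    print("q =", q, "relative error =", relative_error(design, (0.0, 0.0, 1.0), 4, q))
```

For $N = 4$ the estimate is exact only at $q = 1$ (degree $4 \leq 5$); higher orders are approximate, which is the regime in which the robustness of the ANN to imperfect inputs is exploited.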

Conclusion
In this work, we have studied how ANNs can be used to give an accurate estimate of the geometric measure of entanglement (GME) of pure and mixed symmetric multiqubit states based on their first Wehrl moments (moments of their Husimi function). We also used convergence acceleration methods to estimate the GME. More specifically, we implemented the E-algorithm informed by the asymptotic behaviour of the Wehrl moments, which we determined analytically. We found that even this powerful convergence acceleration algorithm is outperformed by ANNs when fed with the same input data. We proposed an experimental protocol for measuring Wehrl moments that offers a gain over full state tomography and we showed that it can be coupled with ANNs to obtain a good estimate of the GME. This provides opportunities for the experimental estimation and certification of entanglement on the basis of a few Wehrl moments.
This work opens up several perspectives. First, while we have focused on the determination of the GME, our approach could equally be used to determine e.g. the Wehrl entropy [50,51], as both the GME and this entropy are based on Wehrl moments, opening up characterizations of quantum chaos and phase transitions via ANNs. Secondly, it is known that determining the GME of a quantum state is a considerably more complex task for mixed states than for pure states. Nevertheless, as we have shown in Sec. 6, the GME of a depolarised state can still be predicted with high accuracy from its first Wehrl moments. Remarkably, we even found that GME predictions improve as the state purity decreases, probably because the entanglement also decreases in this case. It would be of great interest to know for which other types of mixed states ANNs also give reliable estimates of the GME. In addition, our approach could be generalized to non-symmetric many-body quantum states where one is confronted with the exponential many-body wall, as it can be expected that ANNs will also perform well in this context [52]. More generally, an approach similar to the one used in this work could be followed to estimate the maximum or minimum of a continuous (quasi)probability distribution other than the Husimi function from its first moments, such as the Wigner function, to explore the non-classicality of quantum spin states.

B Asymptotic behaviour of the Wehrl moments
In this Appendix, we derive the asymptotic scaling of the Wehrl moments, Eq. (21), using Laplace's approximation for evaluating integrals, following [61].
First, let us rewrite without restriction the Wehrl moments (5) as an integral of $\exp[-q\, f_{|\psi\rangle}(\Omega)]$ over the sphere, Eq. (B.1), where $f_{|\psi\rangle}(\Omega) = -\ln Q_{|\psi\rangle}(\Omega)$. For large $q$, we expect the integrand to be non-negligible only around the minimum of $f_{|\psi\rangle}(\Omega)$ (the maximum of $Q_{|\psi\rangle}(\Omega)$). For simplicity, we consider here the generic case where the minimum is unique, which is however not the case for all states. The idea to obtain the asymptotic behaviour of the Wehrl moments as $q \to \infty$ is to perform a series expansion of $f_{|\psi\rangle}(\Omega)$ around its minimum. For convenience, we carry out this expansion to second order in local coordinates around the minimum, with a remainder expressed through the standard Euclidean norm $\|\cdot\|$ and the little-o notation $o(\cdot)$. The Wehrl moment (B.1) then takes a form that involves the determinant $\det(\cdot)$ of the quadratic part of the expansion. For large $q$, the region of integration tends to $\mathbb{R}^2$, and the integral becomes a standard 2D Gaussian integral. Hence, the Wehrl moments asymptotically follow Eq. (21), with a prefactor that is a constant independent of $q$.
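As a hedged illustration of this argument (not the derivation of this Appendix), the leading-order Laplace estimate can be written out and checked on a symmetric product state. The symbols $\mathbf{x}$ (local orthonormal coordinates centred on the maximum of $Q$), $H$ (Hessian of $f = -\ln Q$ at its minimum) and $Q_{\max}$ are notation assumed for this sketch, and the overall normalization of Eq. (5) is left out.

```latex
% Sketch of Laplace's method on the sphere (assumed notation: x, H, Q_max).
\begin{align}
\int e^{-q f(\Omega)}\, d\Omega
  &\approx e^{-q f_{\min}} \int_{\mathbb{R}^2}
     e^{-\frac{q}{2}\,\mathbf{x}^{T} H\, \mathbf{x}}\, d^2x
   = \frac{2\pi}{q\sqrt{\det H}}\; Q_{\max}^{\,q}, \\
% Check on a product (coherent) state, Q(\theta) = \cos^{2N}(\theta/2), Q_max = 1:
f(\theta) &= -2N \ln\cos(\theta/2) \approx \frac{N\theta^2}{4}
  \;\Rightarrow\; H = \frac{N}{2}\,\mathbb{1}, \quad
  \frac{2\pi}{q\sqrt{\det H}} = \frac{4\pi}{Nq}, \\
\int Q^q\, d\Omega &= 2\pi \int_0^{\pi} \cos^{2Nq}(\theta/2)\,\sin\theta\, d\theta
  = \frac{4\pi}{Nq+1} \;\xrightarrow{\,q\to\infty\,}\; \frac{4\pi}{Nq}.
\end{align}
```

In this example the exact integral and the Laplace estimate agree to leading order in $1/q$, and both display the $Q_{\max}^{\,q}/q$ dependence expected from a 2D Gaussian integral.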

C Additional information on ANNs
Figure 11 shows an example of the evolution of the loss function on the test dataset throughout the training of the ANN for different numbers of qubits and $q_{\max} = 4$. We observe no overfitting, with the loss function decreasing even after a large number of epochs.

Figure 12: Same as Fig. 6 for $N = 8$ (top) and $q_{\max} = 8$ (bottom). The grey solid line in the top right panel shows a decreasing exponential fit of equation $\Delta(q_{\max}) \approx 1.919 \exp(-0.197\, q_{\max})$.
Figure 12 shows the performance of the ANNs for a larger number of qubits and a larger maximal order than the results presented in the main text.For the top panels N = 8 and for the bottom panels q max = 8.The same general observations as in the main text apply in this case, in particular the fact that the mean relative error is below 1% already for q max = 4.
In order to further test the performance of ANNs, we generated another set of states resulting from the dynamical evolution corresponding to a spin squeezing. We calculated the time evolution of the initial coherent/product state $|D_N^{(0)}\rangle$ under the Hamiltonian
$$H = \chi_x J_x^2 + \chi_y J_y^2 + \chi_z J_z^2,$$
where $J_x$, $J_y$, $J_z$ are the collective spin operators and $\chi_x$, $\chi_y$, $\chi_z$ are squeezing rates along the three spatial directions. At regular times, we sampled the state of the system and calculated its Wehrl moments and GME. After 500 time steps of duration $\Delta t = 0.1$, we ended the evolution and started again from the same initial state. The $\chi_\alpha$ rates were chosen randomly between 0 and 1 at the beginning of each evolution. In this way, we generated 30 000 states on which we tested the previously trained ANNs. The results are presented in Fig. 13. We find that the ANNs still predict $E_G$ very well even though they have never handled this type of state before. This shows that the training set was sufficiently large and representative to obtain ANNs capable of inferring beyond the states on which they have been trained.

D Noisy Wehrl moments
In our previous developments, we used the exact value of the Wehrl moments for each multiqubit state. However, the Wehrl moments may not be known exactly, e.g. because of the noise that is inevitably present in an experiment or because they can only be calculated approximately. This provides an incentive to test ANNs with noisy inputs. As a first approach, we applied Gaussian noise to our inputs $S_{|\psi\rangle}(q)$ (from the same training and test data sets as before). More precisely, for each $q$, we first calculated the average value of the ratio of Wehrl moments over the whole data set, $\overline{S_{|\psi\rangle}(q)}$. Based on this value, we defined a normal distribution with a mean value of zero and a standard deviation given by
$$\sigma = \eta\, \overline{S_{|\psi\rangle}(q)}, \qquad \text{(D.1)}$$
where $\eta$ is a real number that quantifies the magnitude of the noise. Then we applied noise, sampled from the normal distribution, to each Wehrl moment ratio and fed these noisy Wehrl moments to ANNs trained in two different ways: ANNs trained as before on noiseless Wehrl moments and ANNs trained directly on noisy Wehrl moments.

The results are shown in Fig. 14 for $\eta = 0.01$. We find that the least satisfactory predictions are obtained from ANNs that have not been trained on noisy Wehrl moments (red squares). Our explanation is that ANNs trained on noiseless Wehrl moments become excellent at predicting the GME from such data but are unable to generalise to noisy data (a phenomenon similar to overfitting). However, ANNs trained on noisy Wehrl moments work much better and give a low mean relative error, around 1%, for $q_{\max} \geqslant 4$ (yellow diamonds). For a higher noise level, the MRE increases and is of the order of 2.6% for $\eta = 0.03$ with $q_{\max} = 4$ and $N = 4$.
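The noise model of Eq. (D.1) can be sketched in a few lines. This is a minimal illustration, not the code used for Fig. 14: it assumes the inputs are stored as a list of rows (one row per state, one column per order $q$), and the function name `add_gaussian_noise` and the `seed` parameter are hypothetical.

```python
import random

def add_gaussian_noise(ratios, eta, seed=None):
    """Add zero-mean Gaussian noise to Wehrl moment ratios.

    ratios : list of rows, one per state; row[j] is the ratio for the j-th order q.
    eta    : noise magnitude; the standard deviation for each q is eta times
             the data-set average of the ratio at that q, as in Eq. (D.1).
    Returns the noisy ratios and the list of standard deviations used.
    """
    rng = random.Random(seed)
    n_states = len(ratios)
    q_max = len(ratios[0])
    # Average of each column (fixed q) over the whole data set.
    means = [sum(row[j] for row in ratios) / n_states for j in range(q_max)]
    sigmas = [eta * m for m in means]
    noisy = [
        [row[j] + rng.gauss(0.0, sigmas[j]) for j in range(q_max)]
        for row in ratios
    ]
    return noisy, sigmas

# Toy data: three states, ratios for q = 1, 2, 3 (values are made up).
data = [[0.50, 0.60, 0.70], [0.40, 0.55, 0.65], [0.45, 0.50, 0.60]]
noisy, sigmas = add_gaussian_noise(data, eta=0.01, seed=0)
print(sigmas)
print(noisy[0])
```

Note that the same $\sigma$ is applied to every state at a given $q$, so states with small ratios receive relatively more noise than states with large ratios.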

E Semidefinite program for calculating the GME of mixed multiqubit symmetric states
The computation of the geometric measure of entanglement (GME) of a mixed state $\rho$, which can be defined as
$$E_G(\rho) = 1 - \max_{\sigma_{\mathrm{sep}} \in S} F(\rho, \sigma_{\mathrm{sep}}), \qquad \text{(E.1)}$$
where $F(\rho, \sigma_{\mathrm{sep}})$ is Uhlmann's fidelity between $\rho$ and $\sigma_{\mathrm{sep}}$, involves an optimization on the convex set $S$ of separable states. In Ref. [62], a method was derived to compute the maximum fidelity between a state $\rho$ and an arbitrary convex set of states $D$ using semidefinite programming (SDP). This method is based on the equivalence between the problem of finding $\max_{\sigma \in D} F(\rho, \sigma)$ and the SDP problem (E.2), in which $X$ is a matrix with complex entries. Therefore, we only need a parametrization (even approximate) of the set of separable states $D \equiv S$ to be used in the SDP program (E.2) in order to be able to calculate the (approximate) value of the GME of mixed states. By Carathéodory's theorem (see e.g. [28]), we know that any separable symmetric state of $N$ qubits can be expressed as a convex sum of $(N + 1)^2$ pure symmetric product states with weights $p_i$, where $p_i \geqslant 0$ and $\sum_i p_i = 1$. In our SDP problem, the $p_i$ and the entries of the $X$ matrix are then the variables to be optimised over. To perform the optimization, we used the Convex.jl package [54] written in Julia with the SCS optimizer [55]. We have verified that our SDP

Figure 2: Frequency distributions of GME of the training set (left) and test set (right) for $N = 8$ qubits, where the three subsets of states are represented by different colors. The number of states in the data sets is large enough to generate a similar GME distribution for the training and test sets. For $N = 8$, the maximal GME is $E_G \approx 0.816$ [34], while Eq. (10) gives the upper bound $E_G(|\psi\rangle) \leqslant 8/9 \approx 0.889$.

Figure 3: Predictions of $E_G$ based directly on Wehrl moment ratios $S_{|\psi\rangle}(q_{\max}) = W$ …

… node is linked to the nodes in the nearest layers by weights $w^{(l)}_{ij}$. The values contained in the first layer are the input data $y^{(0)}_i$. In this work, $y$ … $|\psi\rangle$. Each value of these nodes is propagated to the nodes of the next layer by multiplying it by the weight connecting the two nodes.

Figure 5: Representation of the ANN architecture used in this work.

Figure 6: Same representation and parameters as in Fig. 3, but with predictions based on trained ANNs. The grey solid line in the top right panel shows a decreasing exponential fit of equation $\Delta(q_{\max}) \approx 0.989 \exp(-0.179\, q_{\max})$.

Figure 7: Comparison of the mean relative error (MRE) on the GME obtained with the bare Wehrl moment ratios (blue dots), with the recursive E-algorithm for convergence acceleration (yellow diamonds) and with ANNs (red squares).Left panel: MRE as a function of q max for N = 4. Right panel: MRE as a function of N for q max = 4.

Figure 8: Results of the training of the ANNs on mixed states of the form (38) (yellow diamonds) and (39) (red squares) for $k = 0.05$. The blue dots represent the MRE for the predictions of the ANNs trained in Section 4.3 and applied to the pure states used in Eqs. (38) and (39) to generate the mixed states forming the test dataset.

Figure 11: Loss function (averaged squared error, see Sec. 4.3) of the test dataset as a function of the number of training epochs for a maximal order q max = 4 and different numbers of qubits N .

Figure 13: Left panel: Frequency distribution of GME of 30 000 squeezed states generated for $N = 8$ qubits. Middle and right panels: mean relative error on the estimate of the GME obtained from ANNs for $N = 4$ and $q_{\max} = 4$ respectively. The grey solid line shows a decreasing exponential fit of equation $\Delta(q_{\max}) \approx 1.745 \exp(-0.189\, q_{\max})$.

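$$\sum_{i=1}^{(N+1)^2} p_i\, |\boldsymbol{\alpha}_i\rangle\langle\boldsymbol{\alpha}_i|, \qquad \text{(E.3)}$$
with $|\boldsymbol{\alpha}_i\rangle \equiv |\alpha_i\rangle^{\otimes N}$, where $|\alpha_i\rangle$ are single-qubit states. But since the $|\alpha_i\rangle$ in (E.3) are a priori not known, we can construct an ansatz for separable states by taking the convex combination of a large number $n_{\max} \gg (N + 1)^2$ of fixed pure product states $|\boldsymbol{\alpha}_i^{\mathrm{rand}}\rangle$ drawn at random, i.e.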

Figure 14: Mean relative error (MRE) on the GME obtained from ANNs fed with noisy input data.The red and yellow symbols give the MRE for ANNs trained respectively on noiseless and noisy Wehrl moments.For comparison, the blue dots give the MRE for ANNs trained and tested on noiseless Wehrl moments (see Fig. 6).