SciPost Submission Page
Symmetry meets AI
by Gabriela Barenboim, Johannes Hirn, Veronica Sanz
This is not the latest submitted version.
Submission summary

Authors (as registered SciPost users): Veronica Sanz

Submission information
Preprint Link: https://arxiv.org/abs/2103.06115v1 (pdf)
Date submitted: 2021-04-13 08:21
Submitted by: Sanz, Veronica
Submitted to: SciPost Physics

Ontological classification
Academic field: Physics
Specialties:
Abstract
We explore whether Neural Networks (NNs) can *discover* the presence of symmetries as they learn to perform a task. For this, we train hundreds of NNs on a *decoy task* based on well-controlled Physics templates, where no information on symmetry is provided. We use the output from the last hidden layer of all these NNs, projected to fewer dimensions, as the input for a symmetry classification task, and show that information on symmetry had indeed been identified by the original NN without guidance. As an interdisciplinary application of this procedure, we identify the presence and level of symmetry in artistic paintings from different styles such as those of Picasso, Pollock and Van Gogh.
Reports on this Submission
Report #3 by Luigi Del Debbio (Referee 2) on 2021-5-14 (Invited Report)
- Cite as: Luigi Del Debbio, Report on arXiv:2103.06115v1, delivered 2021-05-14, doi: 10.21468/SciPost.Report.2916
Report
This manuscript presents a very interesting study of the possibility of using ML to 'detect' symmetries in a given image. The focus here is on understanding how a NN encodes the information about the image that it is learning, and whether this information allows us to say something about the symmetries of the image. The manuscript should be accepted for publication; the authors may wish to consider some of the remarks below.
The key idea of this work is to analyse the last hidden layer of the NN and determine whether some information about the symmetries of the image is encoded there. The output of the 200 neurons in this layer is projected onto two dimensions using PCA. The result is an image that is fed into a convolutional neural network that will learn the symmetry classification. The training is done using physics potentials with known symmetry properties, before applying the trained machinery to paintings.
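To make this pipeline concrete, here is a minimal sketch (not the authors' code; array shapes, the 64-bin rasterisation, and the random activations are illustrative assumptions) of projecting 200-neuron activations onto two principal components and binning them into an image a CNN could then classify:

```python
import numpy as np

# Hypothetical activations: one 200-dim last-hidden-layer vector per sample.
rng = np.random.default_rng(0)
activations = rng.normal(size=(1000, 200))  # (n_samples, n_neurons)

# PCA via SVD: centre the data, project onto the two leading components.
centered = activations - activations.mean(axis=0)
_, _, vt = np.linalg.svd(centered, full_matrices=False)
projected = centered @ vt[:2].T  # shape (1000, 2)

# Rasterise the 2D point cloud into an image suitable for a CNN classifier.
image, _, _ = np.histogram2d(projected[:, 0], projected[:, 1], bins=64)
```

The same projection can of course be done with scikit-learn's `PCA(n_components=2)`; the SVD form is used here only to keep the sketch dependency-free.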
The results are tantalising and should inspire further studies.
I have a few comments that I would like the authors to address.
1) It would be useful to specify clearly in the manuscript what the *exact input* and output of the FCNN used for the decoy task are. What are the details of the training? Is there a training set and a validation set?
2) As usual with these kinds of studies, it would be good to have some reassurance about the potential systematic errors in the procedure. How strongly do the results depend on tuning some of the hyper-parameters? There are currently a few comments at the end of section II in the manuscript, but it would be useful to expand and provide some more quantitative information. For instance, when saying that 'the CNN [...] does not manage to reach the same accuracy...', can this be quantified with a few examples? This point in particular could be important, since the authors suggest that a 'less-perfect' learning would rely on symmetries to encode the image more than a perfect reproduction. This is obviously a very interesting suggestion, which deserves more detailed scrutiny. Surely, if the training is relaxed too much, then information about the image will be lost. There must be an ideal window where the system works best. Is it possible to explore this feature quantitatively?
Requested changes
I would suggest that the authors consider the remarks above.
Report #2 by Anonymous (Referee 3) on 2021-5-10 (Invited Report)
- Cite as: Anonymous, Report on arXiv:2103.06115v1, delivered 2021-05-10, doi: 10.21468/SciPost.Report.2895
Report
The authors propose a methodology to extract the underlying mathematical symmetries from the learning procedure of a neural network. They initially train a fully connected neural network (FCNN) with images generated using certain symmetries. Then the authors propose a CNN-based learning procedure in which the first two principal components collected from the FCNN are fed in to classify the underlying symmetries. The paper is well-written, relevant for both physics and AI applications, and the results are explained in detail. I would suggest publication in SciPost if the authors can clarify the following points.
I. In section II.1, the authors design a decoy task procedure using an FCNN: after a step of preprocessing, the template images (fig 2) are given to the FCNN. This requires a particular procedure for flattening the image. If a large enough network is used, the flattening procedure should not affect the outcome. I was wondering if the authors have investigated different flattening options and their effects on the outcome. Since this will affect the relationship between neighbouring pixels, I wonder if it affects learning the underlying symmetries.
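The point about flattening can be illustrated with a toy example (hypothetical; the manuscript does not specify which ordering is used): row-major and column-major flattening contain the same pixel values, but they change which pixels end up adjacent in the FCNN's input vector.

```python
import numpy as np

img = np.arange(12).reshape(3, 4)  # toy 3x4 "image"

# Two common flattening conventions: both feed the same values to the
# FCNN, but pixel neighbourhoods in the 1D input differ.
row_major = img.flatten(order="C")  # rows concatenated
col_major = img.flatten(order="F")  # columns concatenated
```

For a fully connected first layer the two orderings are related by a fixed permutation of the weights, which is why a sufficiently flexible network should in principle be insensitive to the choice.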
II. Please disregard this request in case I missed it in the manuscript. For reproducibility purposes, can the authors clarify the specification of the FCNN that has been used, such as the activation functions, loss functions, etc.?
III. In section II.2, the authors use ResNet18 for the classification of the symmetries. I wondered if smaller network options have been investigated, and if the reason for choosing such an extensive network is purely to increase accuracy.
IV. Can the authors clarify the origin of the error bars of the symmetry bins in figures 4 to 8? Do they arise from running the algorithm many times, as described on page 5? Please correct me if I am missing something, but a trained neural network will produce the same output every time unless it has a Bayesian layer. Is this because the authors are using multiple distorted template images to classify one symmetry, so that a class is decided by the statistical significance of this collection?
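If the second reading is correct, the error bars would simply be the spread of per-image scores over a collection of distorted variants, as in this sketch (scores are simulated; the loc/scale values and the 50-variant collection size are assumptions, not taken from the paper):

```python
import numpy as np

# Hypothetical symmetry scores from 50 distorted copies of one template:
# a deterministic network gives one score per image, and the variability
# across the collection supplies the error bar.
rng = np.random.default_rng(1)
scores = rng.normal(loc=0.8, scale=0.05, size=50)

mean = scores.mean()          # central value of the symmetry bin
err = scores.std(ddof=1)      # error bar from the spread across variants
```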
V. I am aware that it is hard to release public code, but if possible, can the authors also release the analysis code used for this study? I am sure that the community would appreciate it and further develop related studies using their code.
Report #1 by Tilman Plehn (Referee 1) on 2021-5-7 (Invited Report)
- Cite as: Tilman Plehn, Report on arXiv:2103.06115v1, delivered 2021-05-07, doi: 10.21468/SciPost.Report.2880
Report
The paper is very interesting, even though there does not appear to be all that much physics-specific content. So I recommend it for publication, with a set of comments included in the pdf file (in red). One aspect that could use some more work is the list of references; one of the relevant ML keywords might be 'representation learning'.