SciPost Submission Page
Tensorization of neural networks for improved privacy and interpretability
by José Ramón Pareja Monturiol, Alejandro Pozas-Kerstjens, David Pérez-García
Submission summary
Authors (as registered SciPost users): José Ramón Pareja Monturiol

Submission information
Preprint Link: scipost_202503_00007v1 (pdf)
Code repository: https://github.com/joserapa98/tensorization-nns
Date submitted: March 4, 2025, 5:56 p.m.
Submitted by: Pareja Monturiol, José Ramón
Submitted to: SciPost Physics

Ontological classification
Academic field: Physics
Specialties:
Approaches: Theoretical, Computational
Abstract
We present a tensorization algorithm for constructing tensor train/matrix product state (MPS) representations of functions, drawing on sketching and cross interpolation ideas. The method only requires black-box access to the target function and a small set of sample points defining the domain of interest. Thus, it is particularly well-suited for machine learning models, where the domain of interest is naturally defined by the training dataset. We show that this approach can be used to enhance the privacy and interpretability of neural network models. Specifically, we apply our decomposition to (i) obfuscate neural networks whose parameters encode patterns tied to the training data distribution, and (ii) estimate topological phases of matter that are easily accessible from the MPS representation. Additionally, we show that this tensorization can serve as an efficient initialization method for optimizing MPS in general settings, and that, for model compression, our algorithm achieves a superior trade-off between memory and time complexity compared to conventional tensorization methods of neural networks.
Author indications on fulfilling journal expectations
- Provide a novel and synergetic link between different research areas.
- Open a new pathway in an existing or a new research direction, with clear potential for multi-pronged follow-up work
- Detail a groundbreaking theoretical/experimental/computational discovery
- Present a breakthrough on a previously-identified and long-standing research stumbling block
Current status:
Reports on this Submission
Strengths
- The problem of fitting data to a matrix product state is of clear interest and timely.
Weaknesses
- This is not a physics paper.
- The performance of the approach is unclear, and the algorithm is insufficiently characterised. (At first glance, the performance looks inferior to that of existing algorithms.)
- No comparison with other approaches has been performed.
Report
The resulting RSS algorithm is applied to the compression of neural networks and in particular to the problem of privacy (making the training data unretrievable from the neural network) and interpretability.
I had a hard time figuring out whether the method is working or not. For instance, I assume that Fig. 3.1 plots R(s) and that the left panel is multiplied by 10^-6. But what about the middle panels? If they are also multiplied by 10^-6, then the method works fine; I am just surprised that it works better on the test data than on the training data. But I am puzzled by the statement in the “slater” subsection: “Although these results are less conclusive than those of the previous case, the bottom row of Fig. 3.1 shows that the errors decrease as N increases. Remarkably, even with a higher discretization level of l=100, we achieve test errors on the order of 10^-2 for N > 70”, which does not match the above interpretation (10^-2 >> 10^-6). In any case, since the first example is exactly of finite rank, the error should drop to machine precision, and it clearly does not; there is something I am missing here. Also, how N translates into the number of function calls is unclear.
The second “slater” example is the function exp(-r) with r = \sqrt{x^2 + y^2 + z^2 + …}. I may be mistaken, but my feeling is that a technique like Tensor Cross Interpolation would give much better results than a mere two digits. Overall, the authors should clarify the status of these results.
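To make the question about function calls concrete, here is a minimal sketch of such a Slater-type target wrapped as a black-box callable with an evaluation counter. This is my own illustration, not code from the manuscript; the grid size l, the domain, and all names are arbitrary assumptions.

```python
import numpy as np

def make_slater(l=100, a=5.0):
    """Black-box Slater-type target exp(-r), r = ||x||, on an l-point grid per axis."""
    grid = np.linspace(-a, a, l)          # illustrative domain [-a, a]
    counter = {"calls": 0}

    def f(indices):
        counter["calls"] += 1             # count black-box evaluations
        x = grid[np.asarray(indices)]     # map integer indices to coordinates
        return np.exp(-np.sqrt(np.sum(x ** 2)))

    return f, counter

# Evaluate the 3-variable Slater function at one grid point.
f, counter = make_slater(l=100)
value = f((10, 50, 99))
print(value, counter["calls"])
```

Reporting the final value of such a counter alongside N would make the comparison with TT-CI directly interpretable.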
I think that the application to interpretability and privacy does not make much sense if the results are poor. I had an even harder time interpreting these results, but, for example, the test-error column in the lower panel of Fig. 3.2 (on a log scale) looks pretty bad to me. The same holds for later figures: accuracies at the 50% level are not really useful. The authors argue that this may be a baseline to be improved later for “privacy” purposes. Perhaps, but I am not fully convinced. The AKLT example is not fully convincing either: it can be recovered easily by other methods.
So overall, I found that the article lacks clarity as far as the results are concerned, and I am afraid that this lack of clarity simply hides the fact that the results are not so good.
Requested changes
- Compare the performance to other algorithms such as Tensor Cross Interpolation.
- Clarify the performance of RSS and include less trivial examples.
- Some figures (e.g., 3.1, 3.2, …) have no clear x-axis and y-axis labels. Please update.
- No study of the behaviour of the approximation versus c is shown, so we do not know the limits of the approach. Please update.
- Same for the number of function calls.
Recommendation
Ask for major revision
Strengths
- Introduces a practical heuristic algorithm for determining a TT approximation from sampled values
- Demonstration across a varied range of tasks, with a special focus on privacy and interpretability
Weaknesses
- The advantages of TT-RSS over TT-RS are not stated
- No direct comparison of efficiency with similar techniques, such as Tensor Train Cross Interpolation (TT-CI)
Report
The proposed algorithm constructs a TT representation from the values of a function at selected sampling points. As in previous studies, the algorithm uses the Core Determining Equations (CDEs) to determine the TT cores, but it employs sketches and random projections to make solving the CDEs feasible. Overall, the algorithm resembles TT-RS (Ref. 63), with the main differences being the choice of sketching matrices and the random projections used.
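To fix intuition for readers unfamiliar with the sketching step, the following is a schematic of the generic sketch-then-SVD idea in NumPy. It is my own illustration, not the authors' exact TT-RS/TT-RSS procedure, and all names, shapes, and the oversampling choice are made up.

```python
import numpy as np

def sketched_left_factor(samples, rank, oversample=5, seed=0):
    """Schematic sketch-then-SVD step.

    samples: matrix of function values, shape (m, n), with m grouped left
             indices and n sampled right configurations.
    Returns an orthonormal basis for the approximate column space, i.e. the
    kind of left factor from which a TT core could be read off.
    """
    rng = np.random.default_rng(seed)
    m, n = samples.shape
    # Random projection: mix the n sampled columns down to a few.
    T = rng.standard_normal((n, rank + oversample))
    sketched = samples @ T                          # shape (m, rank + oversample)
    # SVD of the sketched matrix; keep only the left factor.
    U, s, _ = np.linalg.svd(sketched, full_matrices=False)
    return U[:, :rank]

# Toy usage on a random rank-5 matrix of "function values".
A = np.random.default_rng(1).standard_normal((40, 5))
B = np.random.default_rng(2).standard_normal((5, 200))
U = sketched_left_factor(A @ B, rank=5)
print(U.shape)  # (40, 5)
```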
The manuscript is accessible to readers who are not already familiar with TT-RS, and its preliminary applications to machine learning will likely attract interest from researchers outside of physics. I believe the manuscript meets the criterion of “providing a novel and synergetic link between different research areas” along with other general acceptance criteria. However, I would like to hear reviews from applied mathematicians and machine learning experts as well, if possible.
I recommend that the authors make some modifications to clarify what is new in their approach. Below are my specific comments:
1. In Ref. 63, Hur et al. proved conditions for suitable sketch matrices (Theorem 3). Does the choice of sketch matrices in the present study satisfy these conditions?
2. The authors state that TT-CI does not apply to continuous functions. However, Ref. 77 (Appendix A.1) presents a TT-CI algorithm for continuous functions. Please clarify this discrepancy.
3. I did not understand the point of the sentence: “Although we do not formally establish properties that the selected pivots must satisfy to yield good results, our proposal of choosing pivots from the training set…” Is the intended meaning that, when approximating a neural network model trained on a dataset, choosing pivots randomly from that same training set is empirically effective? If so, what is a robust method for selecting good pivots from a black-box function? Random sampling in an exponentially large space is likely to suffer from the curse of dimensionality.
4. In Equation (2.10), the authors introduce random square projection matrices. The role of these projections is unclear. They appear to mix the columns of \bar{\Phi} in Equation (2.13), yet after the singular value decomposition in Equation (2.14), the right matrix, C_k, is discarded. Please clarify the purpose and impact of this projection on the overall algorithm.
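As a numerical illustration of the point in comment 4 (my own check, not code from the manuscript): right-multiplying \bar{\Phi} by an invertible square matrix leaves its column space unchanged, so when the SVD in Equation (2.14) is not truncated below the exact rank, the span of the retained left factor is unaffected by the projection.

```python
import numpy as np

rng = np.random.default_rng(0)
Phi = rng.standard_normal((50, 8)) @ rng.standard_normal((8, 20))   # rank-8 stand-in for Phi-bar
T = rng.standard_normal((20, 20))                                   # random square "projection"
assert np.linalg.matrix_rank(T) == 20                               # invertible with high probability

# Left singular vectors of Phi and of Phi @ T, truncated at the exact rank.
U1 = np.linalg.svd(Phi, full_matrices=False)[0][:, :8]
U2 = np.linalg.svd(Phi @ T, full_matrices=False)[0][:, :8]

# Compare the orthogonal projectors onto the two column spaces.
P1, P2 = U1 @ U1.T, U2 @ U2.T
print(np.allclose(P1, P2))   # True: the spanned subspace is the same
```

When the SVD is truncated below the exact rank, the retained subspace can depend on the mixing, so it would help if the authors stated explicitly which regime they are in and what the projection contributes there.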
Requested changes
1. Section 2 is very similar to the description in Ref. 63. To clarify the differences and fill in some gaps in the logic, please refer to Ref. 63 more frequently. For example, the introduction of the A matrix in Equation (2.15) appears rather abrupt without additional context.
2. The statement, “Moreover, Ref. [73] demonstrated that Equation (2.29) yields a TT representation of f if the interpolation sets …”, is misleading. When the pivots are nested, the TT-CI formula interpolates the function at the pivots. If they are not nested, the TT-CI formula only approximates the function.
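For reference, the standard TT-CI formula that this comment refers to reads, in the generic notation of the TT-CI literature (this is not a quotation of the manuscript's Equation (2.29)),

f(x_1, \dots, x_N) \approx f(x_1, \mathcal{J}_1)\,[f(\mathcal{I}_1, \mathcal{J}_1)]^{-1}\, f(\mathcal{I}_1, x_2, \mathcal{J}_2)\,[f(\mathcal{I}_2, \mathcal{J}_2)]^{-1} \cdots f(\mathcal{I}_{N-1}, x_N),

where \mathcal{I}_k and \mathcal{J}_k are the row and column pivot multi-index sets at cut k. Only when these sets are nested, \mathcal{I}_k \subset \mathcal{I}_{k-1} \times \mathcal{X}_k and \mathcal{J}_k \subset \mathcal{X}_{k+1} \times \mathcal{J}_{k+1}, does the formula reproduce f exactly at the pivot configurations; otherwise it is merely an approximation.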
Recommendation
Ask for minor revision