Quantum Approximate Optimization for Hard Problems in Linear Algebra

The Quantum Approximate Optimization Algorithm (QAOA) by Farhi et al. is a framework for hybrid quantum/classical optimization. In this paper, we explore using QAOA for binary linear least squares; a problem that can serve as a building block of several other hard problems in linear algebra. Most of the previous efforts in quantum computing for solving these problems were done using the quantum annealing paradigm. For the scope of this work, our experiments were done on the QISKIT simulator and an IBM Q 5 qubit machine. We highlight the possibilities of using QAOA and QAOA-like variational algorithms for solving such problems, where the result outputs produced are classical. We find promising numerical results, and point out some of the challenges involved in current-day experimental implementations of this technique on a cloud-based quantum computer.

Abstract-The Quantum Approximate Optimization Algorithm (QAOA) by Farhi et al. is a framework for hybrid quantum/classical optimization. In this paper, we explore using QAOA for binary linear least squares; a problem that can serve as a building block of several other hard problems in linear algebra. Most of the previous efforts in quantum computing for solving these problems were done using the quantum annealing paradigm. For the scope of this work, our experiments were done on the QISKIT simulator and an IBM Q 5 qubit machine. We highlight the possibilities of using QAOA and QAOA-like variational algorithms for solving such problems, where the result outputs produced are classical. We find promising numerical results, and point out some of the challenges involved in currentday experimental implementations of this technique on a cloudbased quantum computer.

I. INTRODUCTION
The application of quantum computing to hard optimization problems is a candidate where quantum computing may eventually outperform classical computation [1]- [6]. At the time of writing this paper, Noisy Intermediate Scale Quantum (NISQ) computers [5] are being developed by several firms and research groups [7]- [13]. The two main approaches to quantum optimization are (i) the Quantum Annealing (QA) physical heuristic [6] and (ii) Quantum Approximate Optimization Algorithm (QAOA) [4] on the gate-model quantum computer [1].
In this paper, we are going to explore and propose the use of QAOA for hard problems in linear algebra. In particular, we are going to focus on the problem of binary linear least squares (BLLS). The reason for choosing BLLS is because it can be a building block for other hard problems in linear algebra 1 . Previous works in quantum computing for such problems were done with quantum annealing [14]- [17]. We hope that our work provides insights to fellow researchers to further explore the use of NISQ era methods [4], [18] for problems in linear algebra and numerical computation. In Section 2, we cover the necessary background and related work for our paper. Section 3 is about formulating the BLLS problem for the QAOA ansatz. The experiments, results and discussion are detailed in Section 4. We finally conclude our paper in Section 5. We also 1

explained in Section II-A1
have Appendices to complement and support the information in the main paper as needed.

II. BACKGROUND AND RELATED WORK
A. Background 1) The binary linear least squares problem : Given a matrix A ∈ R m×n , a column vector of variables x ∈ {0, 1} n and a column vector b ∈ R m (Where m > n). The linear BLLS problem is to find the x that would minimize Ax − b the most. In other words, it can be described as: arg min The motivation behind choosing the BLLS problem to be applied to QAOA is twofold: firstly, its an NP-Hard problem [19] that makes it a suitable candidate for QAOA. Secondly, it can act as a building block for other hard problems in linear algebra, such as the Non-negative Binary Matrix Factorization [15]. Another reason why one may view BLLS a building block for other problems is because multiple binary variables can be clubbed together for a fixed point approximation of a real variable [14], [16], [20]- [23]. Amongst these, there are some problems that are NP-hard for which an approximate solution would be acceptable [16], [23]. In these cases, QAOA may be able to provide an improvement in approximation ratio (compared to classical solvers) and even increase the probability of sampling the best solution.
2) Non-negative Binary Matrix Factorization (NBMF) : NBMF is a specialized version of the Non-negative Matrix Factorization (NMF) problem. Given a matrix V ∈ R m×n ≥0 , the problem is to factorize it into matrices W ∈ R m×r ≥0 and H ∈ {0, 1} r×n (H would have non-negative real entries in NMF).
NMF and its variants are used in multiple disciplines such as computer vision [24], astronomy [25] and data mining [26], just to name a few. BLLS can be used in order to solve the NBMF variant by using the alternating least squares method [27].
In Algorithm 1, line 5 is solved classically since efficient algorithms exist for it [28], it's line 7 that is solved using BLLS (where QAOA would be applied).
Algorithm 1 Alternating least squares for NMF 1: procedure MAIN(V ) V is the matrix to be factorized 2: Randomly initialize the matrix H ∈ {0, 1} r×n 3: while not converged do 4: for row i from 1 to n do 5: for column j from 1 to m do 7: In the past, quantum annealing was used as a subroutine within this algorithm to solve NBMF and other NMF related problems [15], [16]. Based on our work with this paper, QAOA can be an alternative to quantum annealing for NBMF, which can be explored in the future.
3) Quadratic Unconstrained Binary Optimization (QUBO): The QUBO Objective function is as follows, where q a ∈ {0, 1}, v a and w ab are real coefficients for the linear and quadratic parts of the function respectively. The QUBO objective function is NP-hard in nature [29]. The advantage of this objective function is that many application domain problems map naturally to QUBO [14]- [16], [30], [31]. In the process of applying BLLS to gate model quantum devices, we use the QUBO formulation as an intermediate stage of expressing the problem. 4) The Quantum Approximate Optimization Algorithm (QAOA): In 2014 Farhi et al. proposed an algorithm that uses both quantum and classical computation for solving optimization problems [4]. The potential advantage of using this algorithm is that it can be implemented by using low depth quantum circuits [32], making it suitable for NISQ devices. We here briefly summarize the QAOA formalism applied to binary optimization problems. For the required preliminaries of quantum computing, the authors recommend the textbook by Nielsen and Chuang [1].
One popular method of encoding an optimization problem to be solved using QAOA, is to first formulate the problem as an Ising Objective function.
where σ a = 2q a − 1 Where σ a ∈ {−1, 1}, h and J are coefficients associated with individual and coupled binary variables respectively. The Ising model is a popular statistical mechanics model, associated primarily with ferromagnetism [33]. Because it has been shown to be NP-Complete in nature [34], the objective function associated with it can be used to represent hard problems [35]. Moreover, if any NP-complete problem has a polynomial time algorithm, all problems in NP do, which makes this a tempting target to solve on a quantum computer in polynomial time (although there are no formal proofs that this is generally possible). The problem then would be to maximize or minimize Eqn(3), depending on how it is set up. The quantum Ising Hamiltonian, which naturally maps the Ising objective Eqn(3) to qubits, can be expressed as: Here, indices a, b, i label the qubits, n is the total number of qubits,σ (z) is the Pauli Z operator and I is the identity operator. The other type of Hamiltonian in the QAOA process is a summation of individual Pauli X operators for each qubit involved in the process, which intuitively represents a transverse field in the Ising model: whereσ andσ (x) = 0 1 1 0 In QAOA, the qubits are first put in a uniform superposition over the computational basis states by applying a Hadamard gate, which maps |0 → (|0 + |1 )/ √ 2, on every qubit. Then, the Hamiltonian pairĈ andB is applied p number of times using a set of angles γ and β, where, for 1 ≤ l ≤ p, each 2 γ l ∈ [0, 2π] and β l ∈ [0, π] [4]. The expectation of the resultant state |ψ p,γ,β is calculated with respect to the HamiltonianĈ as ψ p,γ,β |Ĉ |ψ p,γ,β . A classical black-box optimizer then uses the expectation as its input and suggests new γ and β sets (of length p each). The hope is that as the number of qubits (more specifically, variables) n involved in the optimization increases, if for circuit depth p n we are able to efficiently sample the best solution, we would have an advantage in using QAOA over classical methods. Although algorithm 2 is a summary of the QAOA method, we recommend readers the original paper [4] for further details. 5) Implicit filtering optimization: As mentioned before, QAOA requires us to give it the sets of angles γ and β in order to change the state of the quantum system. The most common way to do this is to use classical black-box optimization techniques that do not need the derivative information of the problem [9], [36], [37]. Since the expected value of the objective function cost (or energy) would be approximate in nature, we need an optimization technique that can handle noisy data. The technique of our choice for this work is the Implicit Filtering algorithm [38].
In essence, Implicit Filtering or ImFil is a derivative-free, bounded black-box optimization technique that accommodates while (β, γ) can be further optimized, or a limit is reached do 5: Initialize res set ← {∅} 6: for a fixed number of shots do 7: res set ← res set ∪ QAOA(B,Ĉ, β, γ, p) 8: From res set, calculate the expected value and store in expt val 9: Based on the expt val, pick new 2p angles (β, γ) by classical optimization 10: From the final res set, set the result with lowest energy, best res ← min(res set) 11: return best res This is the approx. soln 12: procedure QAOA(B,Ĉ, β, γ, t) The quantum procedure 13: Initialize n qubits, |ψ ← |0 j ← 1 16: while j ≤ t do 17: |ψ ← e −iγjĈ |ψ 18: |ψ ← e −iβjB |ψ 19: j ← j + 1 20: Measure |ψ in standard basis and store in a classical register o 21: return o noise when it tries to suggest the best parameters to minimize the objective function. Various other techniques for noisy optimization exist, such as Bayesian Optimization [39], COMPASS [40], SPSA [41], etc. However, we found Implicit Filtering the best for our current efforts. For further details, we recommend the book by C.T Kelly on the topic [38].

B. Related work
One of the first applications of quantum computing for solving problems in the field of linear algebra is the HHL algorithm for solving a system of linear equations [42]. This was followed by works for solving linear least squares [43], preconditioned system of linear equations [44], recommendation systems [45] and many others [46]- [48]. Although the classical counterparts of the above mentioned algorithms run in polynomial time, the quantum algorithms mentioned above run in the polylog time complexity.
However, there are some caveats with such kind of algorithms [49]. Among the many caveats, we'd like to emphasize on the two that affect the practicality of their utility in the near future. Firstly, they require fault tolerant quantum computers whereas, at the time of writing this paper, we have just entered the NISQ era [5]. Secondly, for the algorithms focused on linear system of equations [42], [44], [46] and least squares [43], [47], the output data is encoded as a normalized vector of a quantum state |x (which means that the probability amplitudes of the basis states encode the data). This means that we need an efficient method to prepare the input data as a quantum state; and the output will be a quantum state as well, which means it wouldn't be available for us in the classical world directly by performing measurement in the standard computational basis. This can be mitigated by either measuring the final state in a basis of our choice if our goal is to know some statistical information about x [42], [50] or learning certain values in x (though that will eliminate the exponential speedup [49]).
With respect to quantum annealing, O'Malley and Vesselinov's paper in 2016 [14] was one of the first that proposed to solve linear least squares. Other works in this domain were for solving specific NMF problems [15], [16], polynomial system of equations [20], underdetermined binary linear systems [17] and polynomial least squares [22]. It's hard to speculate about speedups analytically with (i) D-wave's noisy implementation of quantum annealing [51] and (ii) the problem of exponential gap-closing between the problem Hamiltonian's ground state and its excited states [52]. The above mentioned quantum annealing techniques use the Ising objective function for problem formulation. This means that measuring the post annealing quantum state in computational basis gives us a classical x, unlike most gate-model algorithms so far, including the ones mentioned above [42]- [44], [46], [47] that encode the solution in the amplitudes of |x .
NISQ-compatible algorithms for efficiently solving linear algebra problems are highly desirable as of the time of writing this paper. The work by Chen et. al [53] proposes a hybrid algorithm that uses quantum random walks for solving a particular type of linear system, producing a classical result in O(n log n). However, the closest related works to ours are the recent papers that employ variational algorithms [54], [55]. The major difference however, is that, in those papers : i) The output is encoded as the vector of probability amplitudes if the quantum state |x and ii) The problems explored thus far are convex in nature and solved in polynomial time classically.
We in this paper implement QAOA on similar problems which were implemented on D-wave's quantum annealer previously, and therefore briefly mention a comparison here. While QAOA can behave like a discretized version of the annealing process [56], it need not do so in order to be effective. Which means that adherence to adiabatic evolution is not a necessity [3], [4]. One of the promises of QAOA is that it can variationally find optimal paths through the complicated cost Hamiltonian spectrum, in a shorter time/depth than annealing. The associated set of challenges and opportunities for QAOA are different from quantum annealing; in this work, we make a first attempt at facing some of those challenges.

III. QAOA FOR BLLS A. Problem formulation
O'Malley and Vesselinov first gave a QUBO formulation for the BLLS problem [14]. The details of how that is done is in Appendix A. Referring back to Eqn(1), if A ∈ R m×n , b ∈ R m and x ∈ {0, 1} n , we can refine Eqn(2) to be where and Which means that the number of qubits depends only upon the size of the column vector x. All the rows in Matrix A and vector b are preprocessed classically in order to produce the coefficients of the QUBO problem. By the equivalence stated in Eqn (4), we can then convert the problem into an Ising objective function (plus an offset value, irrelevant for optimization) B. Mapping to quantum gates Using the h,J coefficients from Eqn (14) along with the mapping to a quantum Ising Hamiltonian given in Eqn(6) we get:Ĉ Because the individual components of Eqn(15) commute [4], we can express the Hamiltonian simulation ofĈ with an angle γ l as follows Similarly, the exponential of hamiltonian B can be broken down as In order to realize Eqn (16) and Eqn (17), we use the following gates While Eqn (18) is the only gate needed to realize Eqn (17), Eqn (19) alone can merely help with the single qubit components of Eqn (16). For the components that require two qubit interaction, the following gate combination (expressed diagrammatically) is used as a template While Eqn (21) shows the ZZ interactions for adjacent qubits, this strategy can be generalized to any pair of qubits in the system. Appendix B provides an example of a QAOA circuit for BLLS. 1) For IBM Q specific gates: Our experiments were done on an IBM Q device (ibmq_london) available to us through the IBM Q Network. This machine has the following basis gates: {U 1 , U 2 , U 3 , CNOT, I}. The first three gates in the set can be described as We can implement Eqn (18) and Eqn (19) [57] as Another practical consideration to be taken is the qubit connectivity of a real quantum computer. As the number of qubits increase, it is safe to assume that full connectivity between physical qubits is not feasible to engineer. This means that for distant-qubits to interact with each other, we would need logical qubit replacement using SWAP operations. Appendix C elaborates on this with a demonstration with IBM Q gates.

IV. EXPERIMENTS A. Experiment methods
The dataset used in our experiments was randomly generated (seeded for reproducibility) consisting of A ∈ R 40×n , b ∈ R 40 and x ∈ {0, 1} n where n ∈ {3, 4, 5, 9, 10} is the size of the problem. All values for A in the dataset are generated by uniformly sampling floating point approximations of real values in the interval [−1.0, 1.0), and then rounding the values to 3 decimal places. For each value of n, we generate 100 test cases with 40 cases in which Ax * = b, where the best solution x * is sampled randomly and b ← Ax * . The other 60 cases have Ax * = b, where b is generated similarly to A and the best solution x * is found by going through all 2 n possible values for x. This is done to cover both scenarios of the least squares problem. The matrix A is a sparse matrix having density of 0.2, this was done because sparse matrices have a lot of applications in numerical computation and machine learning [58]- [60].
We use the QISKIT [61] SDK to write our own implementation of the QAOA algorithm. As mentioned before, ImFil [38] is our black-box optimizer of choice. The only parameter of ImFil we access is the budget, which governs the maximum iteration limit. The rest of the ImFil parameters for our experiments use their default values. Similarly, unless explicitly stated, all qiskit parameters values taken are default as well. All classical simulations were conducted on standard x86-64 based laptops. Following is a list of the experiments we conducted.
1) Experiments with no noise: Our first set of experiments on the dataset were done on a simulator with the statevector backend, giving us the exact waveform. This means that we are able to compute the exact expectation ψ p,γ,β |Ĉ |ψ p,γ,β for the set of angles γ and β. These experiments help us assess the performance of QAOA in a perfectly noiseless environment for a large dataset.
The above set of experiments were done for p =1, 2 and 3 with random starting points: 20 for p =1, 40 for 2 and 60 for 3 (seeded for reproducibility). Our preliminary study suggested a budget of 200 iterations for p = 1, 2 and 400 iterations for p = 3 respectively. This ensured that at least 70% of our tests converged within the budget while being computationally feasible. At the end of the process, the best results from all the starting points is chosen and recorded.
2) Experiments to compare no noise and shot-noise performance: For our next set of experiments, we use measurement based results on the simulator. Each circuit is run a number of times, specified by the 'shots' parameter. This means that the expectation we get for a given γ and β is approximate in nature. Thus, while quantum circuit simulation itself is noiseless and deterministic in producing the same wavefunction before taking each shot, a finite number of shots is sampled from the resulting wavefunction output probability distribution, introducing a stochastic component. In a real quantum device, one is always limited to this finite tomography, as one has no direct access to the qubit register's quantum wavefunction.
Since in a real quantum device, we do not have access to the qubit register's waveform, simulations with shot-noise are important to conduct. Each experiment was done 10 times per shot value. The shot values chosen for these experiments are in the set {2 i |n − 2 ≤ i ≤ n + 2, i ∈ Z}. We chose this range in order to observe the performance in the limit of perfectly reproducing the wavefunction.
Also, the problem instances chosen for this set of experiments are a random subset of the original dataset. For each problem size n ∈ {3, 4, 5, 9, 10}, we randomly choose 5 problems of the 100 problems (while maintaining the 2 : 3 ratio of the problems by their type). This is done because doing the shot-noise experiments on the original dataset would be computationally infeasible for the limited computational resources at our disposal, since each shot-noise experiment is at least 50 times slower than its statevector counterpart.
The parameters of these experiments have also been modified accordingly. They were done for p =1, 2 and 3, for a budget of 200 iterations with random starting points: 5 for p =1, comparison, this subset of problem instances was also run with the statevector backend for the same parameters.
3) Experiments on an IBM Q device: Based on the results of the first two sets of experiments, we design our experiments for the 5 qubit IBM Q device 'ibmq_london'. In a real device like this one, the qubits face decoherence issues, coherent gate errors, control errors, incoherent gate errors, leakage, cross-talk, readout noise and more. The first set of IBM Q experiments was to run QAOA for problems with n =5, for parameters p =1, budget of 200 iterations and a shot value of 1024. The reason for choosing these parameters for QAOA is to take into account the gate depth limitation and noisy computation, thus choosing the minimal number of qubits while still covering a non-trivial problem graph structure, which can still be easily verified with classical methods at this size. The next set of experiments was to take the γ and β from the results of the statevector experiments done in Section IV-A2 (where n =5) and to try and recreate the distribution and expectation values using the quantum computer.

B. Results
Our two main metrics to assess the performance of our method in our experiments are (i) the probability of sampling the best possible solution (or the ground state of Hamiltonian C) and (ii) the relative error of the expectation value with respect to the ground state energy as relative error = expectation energy − ground state energy ground state energy (27)  1) Optimization trajectory for QAOA: In Figure 1, we see an example of how QAOA with ImFil performs on a BLLS problem. As the iterations progress, the fluctuations in the energy expectation also reduces. This happens either till the black box optimizer converges to a solution (depending on default internal parameters in our case) or the iterations have reached the maximum threshold (governed by the budget).
Here the experiments done with the statevector backend, which has access to the exact energy expectation, sets the baseline for the other modes of experiments. While our experiments containing shot-noise due to measurement do relatively well against statevector results, the experiments on a real quantum device are mixed. At the time of writing, the IBM Q device we tested on did not approximate the theoretically-optimal QAOA result distribution very well, but it still finds the best solution every time. We have discussed this further in Section IV-B6.
2) QAOA results with no-noise: Before we study the results of QAOA for BLLS with shot-noise, it is important to evaluate the theoretical performance of the same without any noise at all. Figure 2 shows the relative error growth with respect to the problem size n for the experiments described in Section IV-A1. We use Median as measure of central tendency and Median Absolute Deviation (MAD) for our error bars. Simulations larger than p = 3 take a lot more time for the complete dataset and were computationally infeasible for this project.
The line graphs in the figure seem to suggest a nonexponential growth of the relative error to problem size. You can see that going from p = 1 to p = 2 decreases the relative error moderately. The difference in performance between p = 2 and p = 3 is more modest, particularly for the larger problem sizes n. There may be room for further improvement if we allow a larger simulation time budget, for example by tightening the classical optimizers convergence parameters and increasing the number of initial starting points for the optimizer. Further rigorous experimentation would be required to draw definite conclusions about the scaling based on such numerics. Fig. 3: Comparison of converged final relative error (median), for p ∈ {1, 2, 3}, as a function of problem size n ∈ {3, 4, 5, 9, 10}, done on 5 problem instances per n with a budget of 200 iterations. (TOP) shows the results from the optimization having access to the exact statevector simulator, while (BOTTOM) shows the results using a shot-noise simulator with 2 n+2 shots. For drawing the bottom plot, we use the best angles found using shot-based optimization and used the statevector backend for one more run at those angles in order to compute the exact expectation value and corresponding relative error. These results are from the experiments described in Section IV-A2. Figure 3 shows us how QAOA with ImFil performs for the parameters described in Section IV-A2 for statevector and measurement based results for 2 n+2 shots. We have 5 different problem instances per n ∈ {3, 4, 5, 9, 10}. The reason for choosing 2 n+2 shots for this comparison was to try and see if the optimizer could replicate the statevector results given plentiful shots. In subsequent figures, we'll be showing the performance of the optimizer with fewer number of shots.

3) No noise vs Shot-noise optimization:
We can see the similarities between the top and bottom plots in Figure 3. The main difference however, seems to be the result and error bar overlap between the results of p =1 and 2. While the two lines are close to each other in the statevector results uptil problem size of 9, the measurement-based results for the two parameters are extremely close to each other (when considering median and MAD). This could be attributed to the noise due to approximate results, or due to the small number of experiments we average over, as detailed in Section IV-A2. Another effect of the smaller dataset here is that the relative error's growth doesn't seem fully monotonic to the problem size, unlike in Section IV-B2. However, it still shows a general upward trajectory. Simulations for p = 4 and upwards become computationally infeasible due to the time required, even for the smaller dataset we worked with in Figure 3. This can be attributed to the exponential growth in runtime as a function of the circuit depth p [4].  Figure 4, we use the outputs of the statevector experiments described in Section IV-A1, and look at the success probabilities of finding the ground state for each of the problems in our dataset. We use box-plots to represent the errors for this figure. We can see how even with p =1, QAOA performs better than standard random sampling.
As the data suggests, the probability of finding the ground state goes down rapidly as the problem size is increased. One reason for this is the exponentially-increasing state space with problem size n. While for uniformly random sampling this In this bar graph, we collect the number of experiment instances in which we observe the exact ground state bitstring, at least once (on the Y axis) for n ∈ {3, 4, 5, 9, 10}. Each n has 5 problem instances, which is repeated 10 times for a given shot value. We compare the QAOA shot-noise simulator experiments (colour-coded with the number of binary variables, n), with the results one would expect randomly sampling from a uniform distribution shots times (black bars, labeled with rand, x-axis positioning corresponding to its colour-labeled partner). The QAOA data in this figure comes from sampling the circuit with optimized angle sets γ * and β * , after acquiring them by optimizing for a circuit depth of p = 3 and a budget of 200 iterations, as described in Section IV-A2. would imply exponentially decreasing success probabilities, the ground state probability amplitudes in QAOA can be polynomially or exponentially amplified and this would still show as a decreasing trend as a function of system size. Larger-scale simulations would be required to extrapolate the expected performance scaling as a function of n. Another important factor is that the problem graph for BLLS requires all-to-all connectivity [14], [15] (also see Appendix A). Recent research has shown how having a high problem density can be a challenge for QAOA to optimize over [62]. In our discussion section, we discuss how we can mitigate this (along with other strategies for potential future work).
It is one thing to calculate probability, it is another to sample the best solution (or ground state) from a quantum state after QAOA. Figure 5 displays the number of experimental instances where we sample the ground state, at least once, for a particular set of parameters, across various problem sizes and instances (for shot-noise experiments in Section IV-A2. We contrast this with the analytical results of getting the ground state by uniform random sampling. Here, we see that for optimization done with upto 2 n shots, QAOA has a clear advantage over random sampling. This can be explained by the mechanism of QAOA, which selectively amplifies those bitstring sampling probabilities which have the lowest energy, while suppressing those with higher energy. In this way, the success probabilities may be greatly enhanced over the naive random sampling from the uniform probability distribution.

5) Effect of shot number on optimization:
For QAOA to become practical, the shot number chosen for the computation has to be far less than the number of eigenstates for our cost Hamiltonian (2 n for our case). For this work, we chose not to randomly guess a shot number value but rather get an understanding of the optimization performance for a set of shot numbers in {2 i |n − 2 ≤ i ≤ n + 2, i ∈ Z}. We hope this helps all future research work in finding better estimates for the least amount of shots required for QAOA, especially for these type of applications. In Figure 6 we see the optimization result for a problem instance where n = 5. The shot optimization is compared with the statevector optimization. Here it is important to point that we optimized using the stochastic blackbox (using a given shot number) and then calculate the exact expectation value using wavefunction method (with the statevector backend) in order to assess the true value of relative error. For the most part, our experiments show that as the problem size increases, we see the optimizer do well even with 2 n−2 shots. This seems to indicate that the number of shots required to get a good optimization may not be exponential in terms of the problem size, or at least with a smaller exponent than applying random sampling from a uniform distribution. Further research is needed.
6) IBM Q device performance: We briefly mentioned our real device results in Section IV-B1. The good news is that IBM Q was always able to find the best solution for our optimization experiments. But that came at the price of taking 1024 shots for each QAOA iteration, which is relatively expensive for a problem size of 5 qubits. When we lowered the shot number, the optimal bitstring was not always sampled and the convergence deteriorated further. The immediate cause of why the optimization process on the device was not close to the simulation results, is the inability of the device to approximate the distribution of the measured bitstrings (for a given circuit). We provide an example of this in Appendix D for the readers.
It is crucial to consider the entire context here. Firstly, the problem graph of our use case is fully connected. Due to the sparse connectivity of the on-chip qubits in the device, logical qubits have to be swapped around a number of times for them to be able to entangle with each other (for the ZZ interactions of our problem). This makes the average gate depth for the final (transpiled) circuits that run on the ibmq_london machine to be about 35. Since two qubit gate fidelity is still low (at the time of writing), the error propagates across the circuit. Secondly, due to the large circuit depth on the real device, we need to take decoherence into account. Thirdly, readout-errors were not considered here and they have significant impact on the noise in the qubit measurement results.
It should also be emphasized that in this work, we primarily focused on how to model the BLLS problem using QAOA. Thus, the experiments on the real devices were done "as is", in order to demonstrate the near-term implementability, without any error mitigation [63]. This could be looked at for future work.

C. Discussion
We can see the various possibilities and potential advantages QAOA may provide in solving BLLS and similar problems. However, there are challenges that need to be addressed. These are both theoretical and practical in nature.
One theoretical challenge is the proper pre-processing of the problem Hamiltonian by scaling and shifting the coefficients of the objective function, such that we optimally make use of the parameter space β l ∈ [0, π], γ l ∈ [0, 2π] (most of the problems in the dataset did not suffer from this issue, as we found the default scaling to work well already). However, scaling the problem way beyond necessity also creates issues as the energy landscape is periodic in nature [4]. Thus, one possible way is to use scaling as a heuristic within the QAOA process, and treat it as a hyperparameter to optimize over.
Another challenge, which is both theoretical and practical in nature, is the full connectivity in our problem and in most hard optimization problems in general [62].Computationally non-trivial problems typically require a high degree of graph connectivity (for instance, non-planar graphs are easy to solve classically [64]). Simultaneously, a high connectivity poses a challenge in quantum chip implementation because not all gatesets implement non-nearest neighbour interactions natively. Those need then be implemented effectively by means of a swap network approach [65]. For future work, we can suggest to modify the problem formulation by not considering the ZZ interactions of a pair of qubits, if its coefficient's magnitude falls below a user defined threshold. This can potentially make it easier for QAOA to run, but its effectiveness in finding the ground state would come under scrutiny. Nonetheless, it can make for interesting future work. Also, error mitigation techniques and readout error correction will also help in improving the results [65]. It would take a combination of the above mentioned approaches to improve performance on a real device.
Challenges aside, one of the next steps would be to explore if QAOA can be valuable in applications that require BLLS as a subroutine, such as NBMF [15]. Another step can be to try the BLLS problem on other types of quantum computers [9]- [13] to see how different hardware implementations fare.

V. CONCLUSION
In this work, we described the implementation of a binary optimization problem, relevant to hard problems in linear algebra, on a gate-based quantum computer via a QAOA approach suitable for NISQ devices. We discussed practical implementation considerations, and shown the expected performance on some particular examples. We compared the solutions found using QISKIT, on an exact quantum wavefunction simulator, a shot-based simulator, and using a IBM Q cloud-based quantum processor based on superconducting qubits. In this first implementation, we showed promising mapping of the problems to the QAOA solver, good theoretical performance compared to random sampling, but found it is still challenging to implement linear-depth, high-connectivity circuits on the latest hardware available today. In future work, it would be interesting to try to compare directly the performance and scaling of this method as compared to the most competitive classical alternatives, even though this quantum-classical comparison is sometimes hard to realize in practice. Also, we expect a future experimental implementation would benefit greatly from gate-error mitigation techniques and post-processing readout errors. It would furthermore be very interesting to see what other hard problems in linear algebra may be implemented using the QAOA and what their expected performance would be.

APPENDIX A DETAILED QUBO FORMULATION FOR BINARY LINEAR LEAST SQUARES
In this section of the Appendix, we describe the method by which O'Malley and Vesselinov [14] formulated the binary linear least squares (BLLS) problem. This QUBO formulation will be converted into its equivalent Ising objective function and used in QAOA. Let us begin by writing out Ax − b which would help us in minimizing x and thereby solve Eqn(1) Taking the 2 norm square of the resultant vector of Eqn(29), we get Because we are dealing with real numbers for A and b, (|.|) 2 = (.) 2 And thus, the coefficients in Eqn(2) are found by expanding Eqn(31) to be You will notice that there is a constant value from Eqn(31) that we leave out of Eqn (32) and Eqn (33). Because this value is not a coefficient for any of the variables, we can't optimize over it and it's left as is, which is b 2 2 . Also, the ground state energy (QUBO) for when Ax * − b 2 = 0 where x * is the best solution, is − b 2 2 . APPENDIX B EXAMPLE QAOA CIRCUIT FOR BLLS Let us consider a simple problem, without loss of generality, to demonstrate how quantum circuits for BLLS can be designed. Consider the following problem, Find This particular problem is more appropriately categorized as a linear system of equations (A ∈ R n 2 ) and has a solution x = (1, 1, 0) T , such that Ax * = b or Ax * − b 2 = 0. However, our problem formulation does not change.
In order to solve this problem using QAOA, we require 3 qubits. Using the formulation process detailed in the Appendix A, the QUBO formulation we get is The constant value that didn't make it to the QUBO here is b 2 2 = 18. Converting the QUBO into Ising using Eqn(4), we get The offset when going from QUBO to Ising is -11. Therefore, the Ising ground state for this problem is −18 − (−11) = −7. Now let us assume that we are designing a circuit for QAOA where p = 1. So for a given pair of angles (β, γ) a circuit would look like the one shown down below in Eqn (38).
The results of this circuit will be used to calculate the expectation. Based on the expectation, a new pair of β and γ will be calculated using a classical black box optimization algorithm (like ImFil). These new angles will be fed into another such circuit till the optimization loop converges.

|0
H The results from Eqn (38) would be classical bitstrings when measured in the standard basis. In order to calculate the energy or cost of a particular bitstring with respect to the Ising Cost function Eqn (3), we would first need to substitute a 1 for each 0 and -1 for each 1 in the bitstring. In short, this is becausê σ (z) describes a quantum state to have an energy of +1 for |0 and -1 for |1 (in arbitrary units). For our example, if we measure a bitstring ξ to be {ξ 1 = 0, ξ 2 = 0, ξ 3 = 1} the equivalent ising set would be {σ 1 = 1, σ 2 = 1, σ 3 = −1}.

APPENDIX C IMPLEMENTING TWO-QUBIT INTERACTIONS ON A QPU
As mentioned in Section III-B1, most practical quantum computers would not have all to all qubit connectivity. But if the problem that we need to solve on a quantum device requires dense connectivity (such as the BLLS), we need SWAP gates for allowing distant qubits to interact with one another. Let us first describe the SWAP gate in its matrix form Diagrammatically, we can decompose it with CNOT gates as To illustrate how this takes place, consider a hypothetical device where every qubit is only connected to the adjacent qubit in a line. Thus in order to realize the gates described on the LHS of Eqn (41), one way is to SWAP between the top and the middle qubit so that it could interact with the bottom qubit.
After our desired two-qubit interaction takes place, the top and middle qubits are swapped again, returning the logical qubits to their original place. Of course, there are other methods (involving SWAP) that the compiler may take to realize the original unitary operations to be performed. In Eqn (41), we also show the decomposition of the R z gate as defined in Eqn (26).  We compare the result for IBM Q device (TOP) versus the qiskit shot-based simulator (BOTTOM), for the same problem instance for a total of 10240 shots. This figure represents data from an experiment done in Section IV-A3.

APPENDIX D PROBABILITY DISTRIBUTION OF RUNNING
At the time of writing, our experiments on the quantum processor (ibmq_london) are unable to produce results similar to what a simulator produces. Figure 7 shows an example of the distribution we get from running a QAOA circuit with optimized (and fixed) β and γ angles. The simulation suggests a few bitstrings with a high probability of being measured (with the ground state having the highest), whereas the distribution from the quantum processor is more evenly spread out, with the ground state not having a significantly high probability of being measured.