Machine Learning and Quantum Physics I

An exploration of the most recent innovations in applying machine learning to the simulation of many-body quantum states. It covers general implementations of Variational Monte Carlo methods using Jastrow wavefunctions, Restricted Boltzmann Machines, and Feed-Forward Neural Networks.

By Richard Ou

03 March, 2021

Even now I can hardly believe that it was only two years ago that I began to grasp the fundamentals of Quantum Mechanics, and that I am now presented with the opportunity to work on a project that sits on the very frontier of the field. It is also within this very time-span that we have seen significant developments at the intersection of Quantum Physics and Machine Learning.

The premise of this project, as set out by my supervisor Dr MJ Bhaseen,¹ Co-Head of the Theory and Simulation of Condensed Matter Group at King’s College London, is to explore the utility of machine learning in the context of quantum many-body systems. The core attraction of this project is that it tackles one of the greatest challenges of quantum mechanics: the modelling of interacting many-body quantum states.

In principle, this task is seemingly intractable, as describing a many-body state requires an amount of information that grows exponentially with system size. This is compounded by a similar issue that plagues machine learning, the curse of dimensionality, whereby the amount of training data required for a given machine to learn the desired information increases exponentially with dimension. Despite these seemingly insurmountable challenges, it is widely agreed that combining the developments of these two fields will yield new, more efficient and effective solutions, and thereby help resolve the countless problems that lie at the heart of condensed matter, nuclear, and chemical physics.

Introduction
Exploration with NetKet
Conclusion

Introduction

The aim of this writing is to give myself and others a means of grappling with the diverse challenges faced in exploring this intersection between fields. This is particularly true for myself in my quest to grasp the almost entirely foreign concepts of machine learning. In this spirit, I shall begin by reviewing the research that I have conducted thus far, which forms the basis of my understanding.

We start with the principal approaches to this non-trivial challenge, of which there are two main classical categories. The first is the representation of quantum states in stochastic Variational Monte Carlo (VMC) calculations, Jastrow wavefunctions being the prime example. Such states can carry high entanglement but offer limited variational freedom. The second, more recent approach uses tensor networks for non-stochastic variational optimisation, and is restricted to entanglement-limited variational wavefunctions. In response to the limitations of these approaches, recent developments have turned to Artificial Neural Networks (ANNs) to tackle these issues.

Arguably the best-known development in this area, devised by Carleo and Troyer, introduces a new variational representation of quantum states based on the Restricted Boltzmann Machine (RBM), itself a specialised ANN.² (Carleo, G. and Troyer, M., 2017. Solving the quantum many-body problem with artificial neural networks. Science, 355(6325), pp.602-606.) An RBM has a bipartite arrangement of two layers of neurons, a visible layer and a hidden layer. Its lack of intra-layer connections is what makes it a “Restricted” Boltzmann Machine, and is what enables more efficient training with algorithms such as Stochastic Gradient Descent (SGD), which I will cover later on. This Neural-network Quantum States (NQS) approach couples the parameterisation of wavefunction amplitudes as Feed-Forward Neural Networks (FFNNs) with a reinforcement-learning scheme, and achieves high accuracy in determining the equilibrium and dynamical properties of interacting spin models in both one and two dimensions.

NQS by itself remains relatively shallow; however, further lines of research have integrated recent advances in machine learning. In particular, Sharir et al. implement a deep network architecture to increase the expressive power of NQS by enabling more efficient representations of highly entangled many-body quantum states.³ (Sharir, O., Levine, Y., Wies, N., Carleo, G. and Shashua, A., 2020. Deep Autoregressive Models for the Efficient Variational Simulation of Many-Body Quantum Systems. Physical Review Letters, 124(2).) This is achieved by resolving two inherent challenges: the first is the reliance on computationally expensive Markov-Chain Monte-Carlo (MCMC) sampling to estimate quantum expectation values; the second is the intrinsic optimisation bottleneck that results from a large number of parameters. To this end, they proposed a specialised neural network architecture that supports efficient and exact sampling, which circumvents the need for MCMC and thereby enables accurate results at previously inaccessible system sizes.

The proposed architecture is heavily inspired by generative machine learning models and FFNNs, particularly Neural Autoregressive Distribution Estimation (NADE) models, and establishes Neural Autoregressive Quantum States (NAQS). Introduced by Larochelle and Murray, NADE models use the probability product rule and the weight-sharing scheme from RBMs to achieve a tractable estimator with good generalisation performance.⁴ (Larochelle, H. and Murray, I., 2011. The Neural Autoregressive Distribution Estimator. Journal of Machine Learning Research, (15), pp.29-37.) This sidesteps the limitations of MCMC that arise from large autocorrelation times and lack of ergodicity. However, NADE tends to become computationally expensive, for two main reasons: first, the addition of multiple layers of hidden units results in further processing; second, the introduction of extra non-linear calculations increases the time complexity to cubic in the number of units per layer.⁵ (As a result of an inability to share hidden-layer computations after the non-linearity in the first layer.) Despite these limitations, it is important to highlight that this development underpins the numerous advantages that can be achieved by applying methods and models that are well established within the machine learning field.
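To make the probability product rule concrete, here is a minimal sketch of an autoregressive distribution over four spins. It is not the NADE or NAQS architecture itself: the logistic conditional, the random weights, and all function names are illustrative assumptions. What it shows is that a product of normalised conditionals is automatically normalised, and that independent samples can be drawn spin by spin without any Markov chain.

```python
import math
import random
from itertools import product


def conditional_up_probability(previous_spins, weights, bias):
    """p(s_i = +1 | s_1..s_{i-1}) from a single logistic unit (toy assumption)."""
    activation = bias + sum(w * s for w, s in zip(weights, previous_spins))
    return 1.0 / (1.0 + math.exp(-activation))


def probability(spins, weights, biases):
    """p(s) = prod_i p(s_i | s_<i), i.e. the probability product rule."""
    p = 1.0
    for i, s in enumerate(spins):
        p_up = conditional_up_probability(spins[:i], weights[i], biases[i])
        p *= p_up if s == 1 else 1.0 - p_up
    return p


def sample(weights, biases, rng):
    """Draw one exact, independent sample by fixing the spins in order."""
    spins = []
    for i in range(len(biases)):
        p_up = conditional_up_probability(spins, weights[i], biases[i])
        spins.append(1 if rng.random() < p_up else -1)
    return tuple(spins)


n = 4
rng = random.Random(0)
# Spin i is conditioned on the i spins before it, so it needs i weights.
weights = [[rng.gauss(0.0, 1.0) for _ in range(i)] for i in range(n)]
biases = [rng.gauss(0.0, 1.0) for _ in range(n)]

# Summing p(s) over all 2^n configurations gives 1 without a partition function.
total = sum(probability(s, weights, biases) for s in product((1, -1), repeat=n))
samples = [sample(weights, biases, rng) for _ in range(5)]
```

Each sample costs one pass over the spins, which is the sense in which autoregressive sampling avoids the autocorrelation problems of MCMC.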

Having explored a diverse selection of recent major developments in this fast-paced research environment, we now have a good understanding of how these machine learning methods can help tackle the challenge at hand. The natural next step in this exploration is to build an understanding of how to apply these techniques to increasingly complex quantum many-body systems.

Exploration with NetKet

With the development of NQS came further innovations in the tools used to apply them. Carleo et al. introduced NetKet, a comprehensive open-source framework for the study of many-body quantum systems. Given its low barrier to entry, it provides a good starting point for exploring the implementation of NQS, SGD, VMC, and supervised and unsupervised learning. As such, I will be making full use of its Python library at this point in the project.

In this endeavour, I explore approaches to approximating quantum states by tracing the aforementioned developments in chronological order, with a particular focus on finding the ground state. I therefore begin by defining the Hamiltonian of the system, before exploring the stochastic VMC approach in the form of a Jastrow wavefunction.

I. Hamiltonian

In the following section, I will explore the ground state of an anti-ferromagnetic $J_{1}$-$J_{2}$ Heisenberg spin chain with an assumed length of $22$ and periodic boundary conditions. I consider a lattice system whose wavefunction can be expanded as: $\Psi = \sum_{\vec{\sigma}} C[\vec{\sigma}]\ket{\vec{\sigma}}$, where $\ket{\vec{\sigma}} = \ket{\vec{\sigma}_{1} \dots \vec{\sigma}_{L}}$ are $S^{z}_{i}$ eigenstates spanning the Hilbert space of the spin configurations. Consequently, the task at hand is to efficiently approximate the function $C[\vec{\sigma}]$ using far fewer parameters than the dimension of the Hilbert space. Given the scope of the exploration, the Hilbert space in this instance is constrained to the zero magnetisation sector. The Hamiltonian that will be considered in this instance is as follows:

$$H = \sum_{i=1}^{L} \left( J_{1}\,\vec{\sigma}_{i} \cdot \vec{\sigma}_{i+1} + J_{2}\,\vec{\sigma}_{i} \cdot \vec{\sigma}_{i+2} \right)$$

As the assumed length of our 1-dimensional lattice is only $22$, it is possible to calculate the exact ground-state energy using the Lanczos algorithm as a method of exact diagonalisation. NetKet provides the wrappers necessary for implementing the algorithm, which readily gives an exact ground-state energy of $-39.148$ for the given Hamiltonian.
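To illustrate what exact diagonalisation in the zero magnetisation sector involves, the following is a minimal pure-Python sketch for very small chains. It is not NetKet's Lanczos wrapper: it uses plain power iteration on a shifted Hamiltonian, and the function names, the Néel starting vector, and the couplings in the example are my own assumptions. It relies on the identity $\vec{\sigma}_{i} \cdot \vec{\sigma}_{j} = 2P_{ij} - 1$, where $P_{ij}$ swaps the spins on sites $i$ and $j$.

```python
import math


def apply_hamiltonian(vec, L, J1, J2):
    """Return H|vec> for the periodic J1-J2 chain written with Pauli operators.

    vec maps a spin configuration (tuple of +1/-1) to its amplitude.
    """
    out = {}
    for config, amp in vec.items():
        for i in range(L):
            for coupling, shift in ((J1, 1), (J2, 2)):
                if coupling == 0.0:
                    continue
                j = (i + shift) % L
                if config[i] == config[j]:
                    # Aligned spins: sigma_i . sigma_j acts as +1.
                    out[config] = out.get(config, 0.0) + coupling * amp
                else:
                    # Anti-aligned spins: diagonal -1 plus an exchange with weight 2.
                    out[config] = out.get(config, 0.0) - coupling * amp
                    swapped = list(config)
                    swapped[i], swapped[j] = swapped[j], swapped[i]
                    key = tuple(swapped)
                    out[key] = out.get(key, 0.0) + 2.0 * coupling * amp
    return out


def ground_state_energy(L, J1, J2, iterations=1000):
    """Estimate the lowest energy by power iteration on (shift - H)."""
    # |sigma . sigma| <= 3, so this shift makes (shift - H) positive semi-definite.
    shift = 3.0 * L * (abs(J1) + abs(J2))
    neel = tuple(1 if i % 2 == 0 else -1 for i in range(L))
    vec = {neel: 1.0}  # the exchange terms never leave the zero magnetisation sector
    for _ in range(iterations):
        h_vec = apply_hamiltonian(vec, L, J1, J2)
        keys = set(vec) | set(h_vec)
        new = {c: shift * vec.get(c, 0.0) - h_vec.get(c, 0.0) for c in keys}
        norm = math.sqrt(sum(a * a for a in new.values()))
        vec = {c: a / norm for c, a in new.items()}
    h_vec = apply_hamiltonian(vec, L, J1, J2)
    return sum(vec[c] * h_vec.get(c, 0.0) for c in vec)  # Rayleigh quotient


# Number of basis states for L = 22 in the zero magnetisation sector.
sector_dimension = math.comb(22, 11)
# A four-site ring with J2 = 0 (illustrative couplings, not the ones used in the text).
energy_l4 = ground_state_energy(4, 1.0, 0.0)
```

For the four-site ring with $J_{2} = 0$ the exact answer is $-8$ in these Pauli units, which makes a convenient sanity check. At $L = 22$ the sector contains 705,432 basis states, which is why a sparse Lanczos routine is used in practice rather than a loop like this one.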

II. Jastrow Wavefunction

I began by implementing the first of the two aforementioned traditional approaches to the representation of trial wavefunctions: VMC calculations, the most notable example being the Jastrow wavefunction.⁶ (Jastrow, R., 1955. Many-Body Problem with Strong Forces. Physical Review, 98(5), pp.1479-1484.) This approach, introduced by Jastrow in 1955, is summarised as the product of Slater determinants and a totally symmetric, non-negative Jastrow correlation factor. The former are usually obtained from accurate Local Density Approximation (LDA) or Hartree-Fock calculations, while the latter is an explicit function of electron-electron separations.⁷ (Foulkes, W., Mitas, L., Needs, R. and Rajagopal, G., 2001. Quantum Monte Carlo simulations of solids. Reviews of Modern Physics, 73(1), pp.33-83.) For the spin system considered here, the wavefunction is defined as follows:

$$\Psi(s_1,\dots, s_N) = e^{\sum_{ij} s_i W_{ij} s_j}$$
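As a small worked instance of this formula, the sketch below evaluates the Jastrow amplitude for two spins. The function names and the example matrix $W$ are illustrative assumptions, not NetKet's implementation.

```python
import math


def jastrow_log_amplitude(spins, W):
    """log Psi(s) = sum_ij s_i W_ij s_j for spins given as +1/-1."""
    n = len(spins)
    return sum(spins[i] * W[i][j] * spins[j] for i in range(n) for j in range(n))


def jastrow_amplitude(spins, W):
    return math.exp(jastrow_log_amplitude(spins, W))


# A symmetric two-site coupling matrix (hypothetical values).
W = [[0.0, 0.5],
     [0.5, 0.0]]
aligned = jastrow_amplitude((1, 1), W)        # exponent +1
anti_aligned = jastrow_amplitude((1, -1), W)  # exponent -1
```

With a positive off-diagonal $W_{ij}$ the aligned configuration receives the larger amplitude; a negative value would instead favour anti-aligned pairs. Working with the logarithm of the amplitude, as above, avoids overflow when the exponent grows with system size.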

In order to implement the trial wavefunction, we must also consider two other primary components: the optimiser and the sampler. The optimiser updates the variational parameters so as to minimise the objective function (here, the energy); the most popular example is the above-noted Stochastic Gradient Descent. A Gradient Descent method, in general, is an iterative method that follows the downhill direction of the surface defined by the objective function until a minimum is reached. SGD, in particular, introduces a randomised element by performing a parameter update for each training example rather than for the full data set; this noise can help the optimisation escape local minima, thereby increasing the chances of reaching the global minimum. Meanwhile, the sampling method defines how elements are chosen out of an entire set. One of the most popular approaches is the Metropolis-Hastings algorithm, an MCMC method that obtains a sequence of random samples from a probability distribution.⁸ (Hastings, W., 1970. Monte Carlo sampling methods using Markov chains and their applications. Biometrika, 57(1), pp.97-109.)
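The two components can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not NetKet's sampler or optimiser: the proposal rule (swap any two sites), the toy log-amplitude, and all names are hypothetical. The sampler draws configurations with probability proportional to $|\Psi|^{2}$ for a real-valued log-amplitude, and the optimiser performs a single gradient step.

```python
import math
import random


def metropolis_exchange_step(spins, log_amplitude, rng):
    """One Metropolis-Hastings step that proposes swapping two sites.

    Swapping preserves the total magnetisation, so the chain stays in
    whichever magnetisation sector it started in.
    """
    i, j = rng.sample(range(len(spins)), 2)
    if spins[i] == spins[j]:
        return spins  # swapping equal spins changes nothing
    proposal = list(spins)
    proposal[i], proposal[j] = proposal[j], proposal[i]
    proposal = tuple(proposal)
    # Acceptance probability min(1, |Psi(new) / Psi(old)|^2).
    log_ratio = 2.0 * (log_amplitude(proposal) - log_amplitude(spins))
    if log_ratio >= 0.0 or rng.random() < math.exp(log_ratio):
        return proposal
    return spins


def sgd_step(parameters, gradient, learning_rate):
    """Move each parameter a small step against its gradient."""
    return [p - learning_rate * g for p, g in zip(parameters, gradient)]


def toy_log_amplitude(spins):
    """Hypothetical log-amplitude that favours anti-aligned neighbours."""
    n = len(spins)
    return -0.2 * sum(spins[k] * spins[(k + 1) % n] for k in range(n))


rng = random.Random(0)
state = (1, -1, 1, -1, 1, -1)  # start in the zero magnetisation sector
chain = []
for _ in range(200):
    state = metropolis_exchange_step(state, toy_log_amplitude, rng)
    chain.append(state)

updated = sgd_step([1.0, 2.0], [10.0, -10.0], learning_rate=0.1)
```

In a full VMC loop the gradient passed to `sgd_step` would be estimated from the sampled configurations, which is where the stochastic element of the optimisation enters.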

Implementing the trial wavefunction in NetKet is straightforward, as it provides Jastrow, Metropolis Exchange, and SGD classes. In this instance, the learning rate for the SGD method was set to $0.1$, while the parameters were randomly initialised with a sigma value of $0.01$. The resultant ground-state energy calculation across iterations compared against the exact energy is presented in the graph below.

The Jastrow wavefunction yielded a ground-state energy of $(-39.123 + 0.000i)\pm 0.017$, with a variance of $0.217$ and range of $1.003$. As a means for further comparison, I shall note that this particular calculation took $35$ seconds.

III. Restricted Boltzmann Machine

We will now consider another celebrated approach, the Restricted Boltzmann Machine. As described before, the RBM consists of two layers, which in this instance are a visible layer representing the spin-1/2 degrees of freedom and a hidden layer of $M$ hidden units. The relationship between the visible and hidden nodes is fully described by the following ansatz:

$$\Psi_{\rm RBM} (\sigma_1^z,\sigma_2^z, \dots, \sigma_L^z) = \exp \left( \sum_{i=1}^L a_i \sigma_i^z \right) \prod_{i=1}^M \cosh \left( b_i + \sum_j W_{ij} \sigma^z_j \right)$$

Here, $a_{i}$ and $b_{i}$ are the visible and hidden biases respectively. Together with the weights $W_{ij}$, they are the variational parameters that are optimised to minimise the energy. NetKet conveniently provides control over the important parameters in this ansatz, such as $M$, through its RBM spin machine class.

In this instance, I have defined the hidden-unit density $\alpha=M/L$, and invoke the RBM ansatz in NetKet with as many hidden units as there are visible units, i.e. $\alpha = 1$.
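The ansatz above can be evaluated directly, as in the following minimal sketch. It is a plain-Python illustration rather than NetKet's RBM class, and the function names and example values are assumptions. It computes the logarithm of $\Psi_{\rm RBM}$ and counts the variational parameters for a given $L$ and $\alpha$.

```python
import math


def rbm_log_amplitude(spins, a, b, W):
    """log Psi_RBM = sum_i a_i s_i + sum_i log cosh(b_i + sum_j W_ij s_j).

    a: L visible biases, b: M hidden biases, W: M rows of L weights.
    """
    visible = sum(a_i * s for a_i, s in zip(a, spins))
    hidden = 0.0
    for b_i, row in zip(b, W):
        theta = b_i + sum(w * s for w, s in zip(row, spins))
        hidden += math.log(math.cosh(theta))
    return visible + hidden


def rbm_parameter_count(L, alpha):
    """Visible biases + hidden biases + weights, with M = alpha * L hidden units."""
    M = alpha * L
    return L + M + M * L


# Two visible spins and one hidden unit (hypothetical values).
log_psi = rbm_log_amplitude((1, -1), a=[0.1, -0.2], b=[0.0], W=[[0.5, 0.5]])
n_parameters = rbm_parameter_count(L=22, alpha=1)
```

For $L = 22$ and $\alpha = 1$ the count is $22 + 22 + 22 \times 22 = 528$ parameters.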

The resultant calculation for $300$ iterations attained a ground-state energy of $(-39.145 + 0.029i) \pm 0.041$, with a variance of $1.245$ and range of $1.003$, taking a total of $255$ seconds. In this instance, with the given parameters, the RBM is both less efficient (it takes roughly seven times longer) and, judging by its larger statistical error and variance, less effective than the Jastrow wavefunction at calculating the ground-state energy. Adjusting the hyperparameters would likely improve the result, but at the cost of increased computational time. One way of improving the approach is to take advantage of the symmetries of the Hamiltonian in the ANN construction, thereby reducing the number of parameters to optimise. A symmetric RBM can easily be implemented using NetKet’s RBM Spin Symmetry class. Given the exact same hyperparameters, the results can be extracted and plotted yet again.

The attained value in this case was $(-39.132 + 0.004i) \pm 0.012$, with a variance of $0.031$ and range of $1.011$, taking $81$ seconds. The symmetric RBM is shown to be more efficient (by iterations) and, with its smaller statistical error and variance, more effective at reaching the exact ground-state energy than either the Jastrow or the standard RBM. Most significantly, the application of symmetry reduced the number of parameters from 528 to just 24. It is evident now that we are fast approaching the current cutting-edge developments in this area.
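One way to see where the reduction from 528 to 24 parameters can come from is to share the weights across lattice translations. The sketch below assumes that translation is the symmetry being used, which is consistent with the parameter count but is my assumption; it is not NetKet's RBM Spin Symmetry class, and all names are illustrative.

```python
import math
import random


def symmetric_rbm_log_amplitude(spins, a, b, W):
    """Translation-symmetric RBM on a periodic chain.

    a: one shared visible bias; b: alpha hidden biases;
    W: alpha filters of length L, each applied to every translation of the spins.
    """
    L = len(spins)
    total = a * sum(spins)
    for b_f, filt in zip(b, W):
        for t in range(L):
            theta = b_f + sum(filt[j] * spins[(j + t) % L] for j in range(L))
            total += math.log(math.cosh(theta))
    return total


def symmetric_rbm_parameter_count(L, alpha):
    """One visible bias + alpha hidden biases + alpha filters of L weights."""
    return 1 + alpha + alpha * L


rng = random.Random(1)
L = 6
spins = (1, -1, 1, 1, -1, -1)
translated = spins[1:] + spins[:1]  # the same configuration shifted by one site
a = 0.05
b = [0.1]
W = [[rng.gauss(0.0, 0.1) for _ in range(L)]]

original_value = symmetric_rbm_log_amplitude(spins, a, b, W)
translated_value = symmetric_rbm_log_amplitude(translated, a, b, W)
n_parameters = symmetric_rbm_parameter_count(L=22, alpha=1)
```

Because every translation of the configuration is summed over, the amplitude is unchanged when the chain is shifted, and for $L = 22$ with $\alpha = 1$ the count is $1 + 1 + 22 = 24$.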

IV. Feed Forward Neural Networks

In 2017, Cai and Liu showed that a Feed-Forward Neural Network (FFNN) with a small number of hidden layers can be trained to approximate the ground states of quantum many-body systems with high precision.⁹ (Cai, Z. and Liu, J., 2018. Approximating quantum many-body wave functions using artificial neural networks. Physical Review B, 97(3).) The FFNN was the first type of ANN devised and is distinguished by the acyclic nature of its edges: information flows only forward from the input nodes.

My initial implementation begins with a simple structure: a first, fully connected layer, which takes an $L$-dimensional input, applies a bias, and outputs $2L$ values; an activation layer, which applies $\ln\cosh$; and a final SumOutput layer, which produces a single number for the wavefunction coefficient associated with the input basis state.
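The three layers can be written out explicitly, as in this minimal sketch. It is a plain-Python stand-in under my own naming rather than NetKet's layer classes, and it treats the network output as the logarithm of the wavefunction coefficient, which is an assumption.

```python
import math


def ffnn_log_amplitude(spins, weights, biases):
    """Dense layer (L -> 2L, with bias), ln cosh activation, then a sum.

    weights: 2L rows of L weights; biases: 2L values.
    """
    total = 0.0
    for row, bias in zip(weights, biases):
        pre_activation = bias + sum(w * s for w, s in zip(row, spins))
        total += math.log(math.cosh(pre_activation))  # activation layer
    return total  # the sum over hidden units plays the role of the output layer


def ffnn_parameter_count(L, width_factor=2):
    """Weights plus biases of the single dense layer."""
    hidden = width_factor * L
    return hidden * L + hidden


L = 3
spins = (1, -1, 1)
zero_weights = [[0.0] * L for _ in range(2 * L)]
zero_biases = [0.0] * (2 * L)
log_psi_zero = ffnn_log_amplitude(spins, zero_weights, zero_biases)
n_parameters = ffnn_parameter_count(L=22)
```

For $L = 22$ the dense layer carries $44 \times 22 + 44 = 1012$ parameters. Written this way, the network has the same functional form as the hidden part of the RBM ansatz, with twice as many hidden units and no visible bias.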

The first FFNN attempt produced a ground-state energy of $(-39.125 + 0.010i) \pm 0.010$, with a variance of $0.125$ and range of $0.999$. This result is marginally better than that of the Jastrow wavefunction, but does not surpass that of the symmetric RBM. The largest downside of this attempt is that it required 1012 parameters, resulting in a computation time of $490$ seconds, significantly longer than any of the prior approaches.
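For a side-by-side view, the short script below collects the figures quoted above and computes each method's relative error against the exact energy of $-39.148$. The numbers are taken from the text; the dictionary layout and variable names are my own.

```python
exact_energy = -39.148

# (mean energy, statistical error, run time in seconds), as quoted in the text.
results = {
    "Jastrow": (-39.123, 0.017, 35),
    "RBM": (-39.145, 0.041, 255),
    "Symmetric RBM": (-39.132, 0.012, 81),
    "FFNN": (-39.125, 0.010, 490),
}

relative_error = {
    name: abs(energy - exact_energy) / abs(exact_energy)
    for name, (energy, _, _) in results.items()
}

# Methods ordered from the smallest to the largest deviation of the mean.
ranking = sorted(relative_error, key=relative_error.get)

for name in ranking:
    energy, error, seconds = results[name]
    print(f"{name:14s} E = {energy:.3f} +/- {error:.3f}  "
          f"rel. err = {relative_error[name]:.1e}  time = {seconds} s")
```

Ranking by the mean alone should be read with care: the statistical error of the standard RBM is larger than its deviation from the exact value, so the error bars and variances quoted above matter as much as the mean.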

Conclusion

The real power of these machine learning techniques in this domain has only been partially demonstrated through the applications of Jastrow, RBM, and FFNN via NetKet; there remain many more sophisticated techniques that I shall seek to explore further.

In the weeks ahead, I would like to explore the implementation of the deep autoregressive models for variational simulations of many-body quantum systems. Doing so will enable direct comparisons to be drawn between these methodologies.

Finally, I hope to develop a deeper understanding of variational studies with NQS, as the origin of the empirical successes obtained so far with these methods is not as well understood as it is for other families of variational states, such as tensor networks.
