The discovery of the Higgs boson in 2012 completed the roster of elementary particles predicted by the current Standard Model. Soon after, CERN opened the Higgs boson machine learning challenge [19] to the public. The goal of this challenge was to help experimentalists better distinguish the signal of Higgs boson decay from background noise using machine learning (ML) methods, and ML proved extremely effective in this task.
As a branch of artificial intelligence, ML technology has driven intelligent applications in many areas of industry, such as autonomous driving, smart mobile devices, and the internet. Because scientists often deal with large quantities of data or require information that is difficult to obtain using traditional methods, they have applied ML techniques to fundamental research, and physics is no exception. On the whole, the applications of ML in physics fall into two categories. The first is the replacement of physical models with ML models when the latter are more effective for specific problems. By training ML models with a large quantity of data, a map can be constructed between two or more physical quantities. For instance, by training neural networks to approximately imitate wave functions, one can construct a map between the potential and a particle's energy without solving the Schrödinger equation [20] or the quantum many-body problem [21]. The second is the recognition of a target signal (such as a physical phenomenon) against a noisy background. For example, deep neural networks can help distinguish the Higgs boson or other exotic particles of interest (signal) from other particles (background) [22]. In short, ML algorithms can be used to handle regression and classification problems in physics. As a branch of ML, deep learning (DL) has become the most popular AI method in physics research. In the field of relativistic heavy-ion collisions, DL has been applied to problems of the QCD phase transition [23–25], relativistic hydrodynamics [26], jet structure [27, 28], the search for the chiral magnetic effect [29], and recognition of the initial clustering structure in nuclei [30]. Among the various DL algorithms, the deep neural network (DNN), especially the convolutional neural network (CNN), is the most commonly used.
The DNN is one of the early DL models. Owing to its remarkable ability to realize nonlinear mappings with a comparatively simple structure, the DNN remains a preferred first model for most regression tasks. A DNN (Fig. 1), also known as a multi-layer perceptron (MLP), is composed of an input layer, enough hidden layers to be considered 'deep', and an output layer.
Figure 1. (color online) Architecture of the multi-layer perceptron (MLP) used in this study. It contains four hidden layers with 512, 256, 128, and 64 neurons, respectively.
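As a concrete illustration, a minimal PyTorch sketch of this four-hidden-layer MLP follows. The paper fixes only the layer widths; the input dimension `n_input` (set by how the event observables are flattened) and the choice of ReLU activations are assumptions made here for illustration.

```python
import torch
import torch.nn as nn

class MLP(nn.Module):
    """Four-hidden-layer perceptron matching Fig. 1 (512-256-128-64 neurons)."""
    def __init__(self, n_input: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_input, 512), nn.ReLU(),
            nn.Linear(512, 256), nn.ReLU(),
            nn.Linear(256, 128), nn.ReLU(),
            nn.Linear(128, 64), nn.ReLU(),
            nn.Linear(64, 1),  # single output: the impact parameter b
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x).squeeze(-1)  # shape (batch,)
```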
The CNN commonly appears in 2D image-related tasks. A typical CNN consists of the input layer, convolutional layers, pooling layers, fully connected layers, and the output layer. Our CNN architecture is shown in Fig. 2.
Figure 2. (color online) Structure of the convolutional neural network (CNN). It contains two convolutional layers and four fully connected layers.
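A hedged PyTorch sketch of such an architecture is shown below. The figure specifies only the number of convolutional and fully connected layers; the channel counts, kernel sizes, pooling, and the assumed 1×32×32 input grid are illustrative choices, not the paper's exact settings.

```python
import torch
import torch.nn as nn

class CNN(nn.Module):
    """Two convolutional + four fully connected layers, as in Fig. 2.
    Channel counts, kernel sizes, and the 1 x 32 x 32 input grid are
    illustrative assumptions."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),  # 32x32 -> 16x16
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),  # 16x16 -> 8x8
        )
        self.regressor = nn.Sequential(
            nn.Flatten(),
            nn.Linear(32 * 8 * 8, 256), nn.ReLU(),
            nn.Linear(256, 128), nn.ReLU(),
            nn.Linear(128, 64), nn.ReLU(),
            nn.Linear(64, 1),  # predicted impact parameter
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.regressor(self.features(x)).squeeze(-1)
```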
As a supervised learning regression task, impact parameter determination aims at constructing a map between the input observables and a single value, that is, the impact parameter of an event. Thus, it is appropriate to choose the mean squared error (MSE) as the loss function to quantify the performance of the learning models. It is defined as
$ \begin{equation} {\rm Loss} = \frac{1}{N_{\rm batch}} \sum\limits_{i=1}^{N_{\rm batch}}\left(y_i^{\rm pred} - \hat{y}_i^{\rm true}\right)^2 , \tag{1} \end{equation} $
where $ \hat{y}_i^{\rm true} $ is the true value of the impact parameter of an event in a batch of $ N_{\rm batch} $ events, and $ y_i^{\rm pred} $ is the output of the MLP/CNN model, i.e., the predicted value.
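In code, Eq. (1) is simply the batch mean of squared residuals. A minimal sketch of one training step follows, assuming a PyTorch model; the Adam optimizer and learning rate in the usage note are our illustrative choices, not values stated in the paper.

```python
import torch

def train_step(model, batch_x, batch_b, optimizer):
    """One gradient step; the loss is Eq. (1): the batch-averaged
    squared difference between predicted and true impact parameters."""
    optimizer.zero_grad()
    b_pred = model(batch_x)                     # y_i^pred, shape (N_batch,)
    loss = torch.mean((b_pred - batch_b) ** 2)  # equivalent to nn.MSELoss()
    loss.backward()
    optimizer.step()
    return loss.item()

# Usage sketch (hypothetical hyperparameters):
# optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
# loss = train_step(model, x, b_true, optimizer)
```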
DL algorithms have demonstrated a strong capability to construct a map between input data and a target; as a result, various regression or classification tasks can be accomplished without prior knowledge. However, most widely used DL models are not interpretable. Because of their 'complexity' and 'dimensionality', it is difficult to understand how these models work and to extract instructive information from them [39]. Therefore, these DL models are viewed as 'black boxes' in most cases. Considerable effort has been made to open the 'black boxes' of DL algorithms. By analyzing DNNs on the information plane [40], one can understand the training and learning processes and hence the internal representations of DNNs [41]. In addition, for CNNs, which are usually used in visual recognition tasks, there have been a number of studies aimed at visualizing them. By applying global average pooling and defining a quantity measuring the importance of neurons in a CNN, one can generate a class activation map (CAM) [42], with which the regions of a 2D input that are crucial for a CNN to succeed in classification tasks can be localized. Based on the CAM method, R. R. Selvaraju et al. proposed a CNN interpretation method known as gradient-weighted class activation mapping (Grad-CAM) [43]. Compared with the former method, Grad-CAM can be applied to more types of CNNs and provides more information about what is learned by the neural network.
In the CAM method, the class activation map $ M^{c} $ for class c (in our case, c is trivial and can be suppressed because we have a regression problem instead of a classification problem) is defined as
$ \begin{equation} M^{c}_{x,y}(A) = \sum\limits_k \omega_k^c f_k(A;x,y). \tag{2} \end{equation} $
Here, $ f_k(A;x,y) $ represents the activation of unit k in the final convolutional layer at spatial location $ (x,y) $ of input data A, and $ \omega_k^c $ is the weight measuring the importance of unit k for class c. In Grad-CAM, $ \omega_k^c $ is defined as the result of performing global average pooling on the gradient of the score for class c with respect to the activations $ f_k $:
$ \begin{equation} \omega_k^c = \frac{1}{Z} \sum\limits_x \sum\limits_y \frac{\partial y^c}{\partial f_k(A; x,y)}, \tag{3} \end{equation} $
where $ (1/Z)\sum\limits_x \sum\limits_y $ represents the operation of global average pooling. The gradient-weighted class activation map is then given as
$ \begin{equation} L^c = {\rm ReLU} \left(\sum\limits_k \omega_k^c f_k \right). \tag{4} \end{equation} $
CAM and Grad-CAM have been successful in classification problems; however, several adjustments to the definitions of the maps are required to apply them to our CNN for regression. In classification tasks, only the regions positively correlated with the class of interest should be preserved, which is why a ReLU operation is performed on the linear combination of maps. For regression problems, however, both positively and negatively correlated features should be considered. Consequently, we redefine the activation map as the absolute value of the linear combination of maps:
$ \begin{equation} L^c = {\rm Abs} \left(\sum\limits_k \omega_k^c f_k \right). \tag{5} \end{equation} $
We obtain averaged 'attention' maps for impact parameters in the interval $[2,11]$ fm, where our CNN predicts well (see Fig. 11). It turns out that, compared with central collisions, the CNN turns its 'attention' to regions of larger transverse momenta for peripheral collisions. The CNN assigns large weights to charged particles with small transverse momenta when 'recognizing' the impact parameter of a central event. With increasing b, the peripheral area of the energy spectrum begins to attract the CNN's attention and becomes more important for distinguishing peripheral events from central ones; that is, the CNN focuses on particles with larger transverse momenta for peripheral events.
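To make the construction concrete, a minimal PyTorch sketch of this regression variant of Grad-CAM follows: hooks capture the activations and gradients of the final convolutional layer, the weights of Eq. (3) are obtained by global average pooling of the gradients, and Eq. (5) replaces ReLU with an absolute value. The hook-based mechanics and the `conv_layer` handle are implementation assumptions, not the paper's exact code.

```python
import torch

def grad_cam_regression(model, conv_layer, x):
    """Attention map for a scalar regression output, following
    Eqs. (3) and (5): weights are global-average-pooled gradients,
    and Abs replaces the ReLU used in classification Grad-CAM."""
    store = {}
    h_fwd = conv_layer.register_forward_hook(
        lambda mod, inp, out: store.update(acts=out))
    h_bwd = conv_layer.register_full_backward_hook(
        lambda mod, gin, gout: store.update(grads=gout[0]))
    b_pred = model(x)        # predicted impact parameter(s)
    b_pred.sum().backward()  # gradients of the scalar output
    h_fwd.remove(); h_bwd.remove()
    # Eq. (3): omega_k = (1/Z) sum_{x,y} d b_pred / d f_k(x,y)
    w = store['grads'].mean(dim=(2, 3), keepdim=True)
    # Eq. (5): L = Abs( sum_k omega_k * f_k )
    cam = (w * store['acts']).sum(dim=1)
    return cam.abs().detach()
```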