Chaotic time series prediction with deep belief networks: an empirical evaluation

Chaotic time series are widespread in many real world areas such as finance, environment, meteorology, traffic flow, and weather. A chaotic time series is considered to be generated by the deterministic dynamics of a nonlinear system. A chaotic system is sensitive to initial conditions; points that are arbitrarily close initially become exponentially farther apart as time progresses. Therefore, it is challenging to make accurate predictions for chaotic time series. Conventional techniques such as statistical methods, the k-nearest neighbors algorithm, Multi-Layer Perceptron (MLP) neural networks, Recurrent Neural Networks, Radial Basis Function (RBF) networks and Support Vector Machines do not give reliable prediction results for chaotic time series. In this paper, we investigate the use of a deep learning method, the Deep Belief Network (DBN), combined with chaos theory to forecast chaotic time series. First, the chaotic time series is analyzed by calculating the largest Lyapunov exponent, reconstructing the time series by phase-space reconstruction, and determining the best embedding dimension and the best delay time. When the forecasting model is constructed, the deep belief network is used for feature learning and the neural network is used for prediction. We also compare the DBN-based method to the RBF network-based method, which is the state-of-the-art method for forecasting chaotic time series. The predictive performance of the two models is examined using mean absolute error (MAE), mean squared error (MSE) and mean absolute percentage error (MAPE). Experimental results on several synthetic and real world chaotic datasets reveal that the DBN model is applicable to the prediction of chaotic time series, since it achieves better performance than the RBF network.


INTRODUCTION
Time series in several real world areas such as finance, environment, meteorology, and weather are characterized as chaotic in nature. A chaotic time series is generated from the deterministic dynamics of a nonlinear system ( 1,2 ). A chaotic system is sensitive to initial conditions; points that are arbitrarily close initially become exponentially farther apart as time progresses. Therefore, it is challenging to make accurate predictions for chaotic time series. Conventional techniques such as statistical methods, the k-nearest neighbors algorithm, Multi-Layer Perceptron (MLP) neural networks, Recurrent Neural Networks, Radial Basis Function (RBF) networks and Support Vector Machines (SVMs) do not give reliable prediction results for chaotic time series. Deep learning models, such as Deep Belief Networks (DBNs), have recently attracted the interest of many researchers in applications of big data analysis. A DBN is a generative neural network model with many hidden layers, introduced by Hinton et al. 3 along with a greedy layer-wise learning algorithm. The building block of a DBN is a probabilistic model called the Restricted Boltzmann Machine (RBM). DBNs and RBMs have already been applied successfully to many problems, such as classification, dimensionality reduction and image processing. There have been several research works applying DBNs to predict time series data in finance ( 4,5 ), meteorology ( 6,7 ), and industry ( 8 ). However, so far there have been very few works applying DBNs to forecasting chaotic time series. Kuremoto et al. in 2014 [6] studied the application of a DBN which combines RBMs and a multi-layer perceptron (MLP) to predict chaotic time series data. The hyperparameters of the deep network were determined by the particle swarm optimization (PSO) algorithm. Despite the simple and effective structure of the proposed DBN, this work has three weaknesses.
First, the paper does not make clear that the DBN model is combined with chaos theory when dealing with chaotic time series prediction. Second, the work tested the DBN model on only two synthetic chaotic time series datasets: Lorenz and Henon map. Without testing the proposed model on real world chaotic time series datasets, the high performance of the DBN model in practical applications of chaotic time series prediction cannot be validated. Third, the work compared the DBN model with only the MLP model, a simple form of shallow neural network. In this work, we present and extensively evaluate a method of chaotic time series prediction using the DBN model with the same structure as given in 9 . However, there are three focus points in our work which make it different from the previous work by Kuremoto et al. in 2014. i) We combine the DBN model with chaos theory, namely phase-space reconstruction, in dealing with chaotic time series prediction. ii) We compare the performance of the DBN model to that of the Radial Basis Function (RBF) network, a special kind of shallow neural network which can bring better forecasting precision than the MLP neural network in chaotic time series prediction 10 . iii) To verify the effectiveness of DBN, we evaluate the performance of DBN on several benchmark chaotic time series datasets. In the experiments, we use three synthetic time series datasets: Lorenz, Mackey-Glass, and Rossler, and four real world time series datasets: Sunspots and some financial/economic datasets. The predictive performance of the two models is examined using mean absolute error (MAE), mean squared error (MSE) and mean absolute percentage error (MAPE). Experimental results under the three evaluation criteria reveal that DBN outperforms the RBF network on most of the datasets. The remainder of the paper is organized as follows. Section 2 provides some basic background about DBN and chaos theory. In Section 3, the method using DBN for forecasting chaotic time series is introduced. Section 4 reports the experiments comparing the prediction accuracy of the DBN method to that of the RBF network model.
Finally, Section 5 gives some conclusions and future work.

BACKGROUND AND RELATED WORKS
Deep Belief Network
Restricted Boltzmann Machines (RBMs) are often used to construct deeper models such as DBNs. An RBM is a kind of stochastic artificial neural network with two connected layers: a layer of binary visible units (v, whose states are observed) and a layer of binary hidden units (h, whose states cannot be observed). The hidden units act as latent variables (features) that allow the RBM to model a probability distribution over state vectors (see Figure 1). The hidden units are conditionally independent given the visible units. Given an energy function E(v, h) on the whole set of visible and hidden units, the joint probability is given by:

P(v, h) = exp(−E(v, h)) / Z

where Z is a normalization constant called the partition function, obtained by summing exp(−E(v, h)) over all possible (v, h) configurations.
For binary units h_j ∈ {0, 1} and v_i ∈ {0, 1}, the energy function of the whole configuration is:

E(v, h) = −Σ_i a_i v_i − Σ_j b_j h_j − Σ_{i,j} v_i w_ij h_j

where a_i and b_j are the biases of the visible and hidden units and w_ij are the connection weights. The posterior probability of one layer given the other is easy to compute by the two following equations:

P(h_j = 1 | v) = σ(b_j + Σ_i v_i w_ij)

P(v_i = 1 | h) = σ(a_i + Σ_j w_ij h_j)

where σ is the sigmoid function, σ(x) = 1 / (1 + e^(−x)). Inference of the hidden factors h given the observed v can be done because the h_j are conditionally independent given v.
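The conditional probabilities above can be sketched in a few lines of Python; the network size and the randomly initialized parameters below are purely illustrative, not values from the paper:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Illustrative RBM with 6 visible and 3 hidden binary units.
rng = np.random.default_rng(0)
n_visible, n_hidden = 6, 3
W = rng.normal(scale=0.1, size=(n_visible, n_hidden))  # weights w_ij
b_v = np.zeros(n_visible)   # visible biases a_i
b_h = np.zeros(n_hidden)    # hidden biases b_j

def energy(v, h):
    # E(v, h) = -sum_i a_i v_i - sum_j b_j h_j - sum_ij v_i w_ij h_j
    return -(b_v @ v) - (b_h @ h) - (v @ W @ h)

def p_h_given_v(v):
    # P(h_j = 1 | v) = sigma(b_j + sum_i v_i w_ij); factorizes over j
    return sigmoid(b_h + v @ W)

def p_v_given_h(h):
    # P(v_i = 1 | h) = sigma(a_i + sum_j w_ij h_j); factorizes over i
    return sigmoid(b_v + W @ h)

v = rng.integers(0, 2, size=n_visible).astype(float)
print(p_h_given_v(v))  # vector of per-unit activation probabilities
```

Because the posterior factorizes, each hidden unit can be evaluated independently, which is what makes inference in an RBM cheap.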
A DBN is a generative model with an input layer and an output layer, separated by l layers of hidden stochastic units. This multilayer neural network can be efficiently trained by composing RBMs in such a way that the feature activations of one layer are used as the training data for the next layer.
An energy-based model such as the RBM can be trained by performing gradient ascent on the log-likelihood of the training data with respect to the RBM parameters. This gradient is difficult to compute analytically, and Markov Chain Monte Carlo methods are well suited to approximating it for RBMs. One iteration of the Markov chain corresponds to the following Gibbs sampling procedure:

v_0 → h_0 ~ P(h | v_0) → v_1 ~ P(v | h_0) → h_1 ~ P(h | v_1) → ... → v_k

The rough estimation of the gradient obtained with this procedure is denoted CD-k, where CD-k represents the Contrastive Divergence algorithm 3 performing k iterations of the Markov chain up to v_k.
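A minimal sketch of one CD-k parameter update, assuming the binary RBM and the conditionals described above; the learning rate, sizes and initialization are illustrative:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd_k_update(W, b_v, b_h, v0, k=1, lr=0.05, rng=None):
    """One CD-k parameter update for a binary RBM (sketch).

    Gibbs chain: v0 -> h0 ~ P(h|v0) -> v1 ~ P(v|h0) -> ... -> vk.
    The log-likelihood gradient is approximated by the difference
    between the <v h> correlations at the data and at step k.
    """
    if rng is None:
        rng = np.random.default_rng(0)
    ph0 = sigmoid(b_h + v0 @ W)                          # P(h | v0)
    vk, phk = v0, ph0
    for _ in range(k):
        hk = (rng.random(phk.shape) < phk).astype(float) # sample h
        pvk = sigmoid(b_v + hk @ W.T)                    # P(v | h)
        vk = (rng.random(pvk.shape) < pvk).astype(float) # sample v
        phk = sigmoid(b_h + vk @ W)
    # positive phase uses the data; negative phase uses the chain's end
    W += lr * (np.outer(v0, ph0) - np.outer(vk, phk))
    b_v += lr * (v0 - vk)
    b_h += lr * (ph0 - phk)
    return W, b_v, b_h

rng = np.random.default_rng(0)
W = rng.normal(scale=0.1, size=(6, 3))
b_v, b_h = np.zeros(6), np.zeros(3)
v0 = rng.integers(0, 2, size=6).astype(float)
W, b_v, b_h = cd_k_update(W, b_v, b_h, v0, k=1, rng=rng)
```

Setting k=2 in this sketch corresponds to the CD-2 pre-training used later in the experiments.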

Chaos Theory
Given a univariate time series x_t, where t = 1, 2, ..., N, the phase space can be reconstructed using the method of delays 1 . The essence of this method is that the evolution of any single variable of a system is determined by the other variables with which it interacts. Information about the relevant variables is thus implicitly contained in the history of any single variable. On the basis of this idea, an equivalent phase space can be constructed by assigning an element of the time series x_t and its successive delays as coordinates of a new vector:
X_t = (x_t, x_{t+τ}, x_{t+2τ}, ..., x_{t+(m−1)τ})

where the X_t are the points of the phase space, τ is the delay time and m is the embedding dimension. The dimension m of the reconstructed phase space is considered sufficient for recovering the object without distorting any of its topological properties; thus it may be different from the true dimension of the space where the object lies. Both the τ and m parameters must be determined from the time series.
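As a minimal sketch, the delay embedding above can be implemented directly; the function name and the toy series used below are illustrative:

```python
import numpy as np

def phase_space_reconstruct(x, m, tau):
    """Delay embedding: row t is X_t = (x_t, x_{t+tau}, ..., x_{t+(m-1)tau})."""
    x = np.asarray(x)
    n_vectors = len(x) - (m - 1) * tau  # number of reconstructed points
    if n_vectors <= 0:
        raise ValueError("series too short for this (m, tau)")
    return np.column_stack(
        [x[i * tau : i * tau + n_vectors] for i in range(m)]
    )

X = phase_space_reconstruct(np.arange(10), m=3, tau=2)
print(X.shape)  # (6, 3); first row is (x_0, x_2, x_4)
```

Each row of the result is one point of the reconstructed phase space and can serve directly as one input vector to a forecasting model.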
To determine a reasonable time delay τ, we can apply the mutual information method proposed by Fraser and Swinney 11 . To determine the minimum sufficient embedding dimension m, we can apply the false nearest neighbor method proposed by Kennel et al. 2 .
To check whether a time series is chaotic or not, one needs to calculate the largest Lyapunov exponent; a positive value indicates chaotic behavior. Rosenstein et al. 12 proposed a method to calculate the largest Lyapunov exponent from an observed time series.

Related Work
In our previous work 10 , we proposed an efficient method of chaotic time series prediction using the Radial Basis Function (RBF) network, a special kind of shallow neural network. The RBF network is characterized by a set of inputs and a set of outputs. Between the inputs and outputs there is a layer of hidden units, each of which implements a radial basis function. Various functions have been tested as activation functions for RBF networks; the Gaussian function is most often used to activate the hidden layer. The nodes in the hidden layer operate on the distance from an applied input vector to an internal parameter vector called a center. The output layer implements a weighted sum of the hidden-unit outputs. The mapping function is given by:

p_j(X) = Σ_{i=1..n} w_ij φ_i(X), for j = 1, 2, ..., l

where X is the m-dimensional input vector, p_j(X) is the output of the j-th output unit, w_ij is the output weight from the i-th hidden unit to the j-th output unit, n is the number of hidden units, and φ_i is the radial basis function at the i-th hidden node. If the Gaussian function is used as the radial basis function, φ_i is defined as follows:

φ_i(X) = exp(−‖X − c_i‖² / (2σ_i²))

where c_i is the center and σ_i is the width of the i-th hidden unit. The RBF network is trained using both unsupervised and supervised learning methods: the unsupervised method is applied between the input and hidden layers, and the supervised method between the hidden and output layers. In the first stage of training, clustering algorithms are capable of finding cluster centers that best represent the distribution of the data. There are two alternative heuristics for finding the width factors ( 13 ). In 10 , we compared the performance of the RBF network to that of the MLP network on several real and synthetic chaotic time series datasets. Experimental results revealed that the RBF network with phase-space reconstruction outperforms MLP networks in chaotic time series prediction.
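To make the mapping concrete, the following sketch computes the output of a Gaussian RBF network; the centers, widths and weights here are hypothetical toy values, not the trained parameters from our experiments:

```python
import numpy as np

def rbf_forward(X, centers, widths, W):
    """Forward pass of a Gaussian RBF network (sketch).

    X:       (m,) input vector
    centers: (n, m) one center c_i per hidden unit
    widths:  (n,) one width sigma_i per hidden unit
    W:       (n, l) output weights w_ij
    Returns the l outputs p_j(X) = sum_i w_ij * phi_i(X).
    """
    d2 = np.sum((centers - X) ** 2, axis=1)      # squared distances to centers
    phi = np.exp(-d2 / (2.0 * widths ** 2))      # Gaussian activations phi_i(X)
    return phi @ W

# Toy parameters: 2 hidden units in 2-D input space, 1 output.
centers = np.array([[0.0, 0.0], [1.0, 1.0]])
widths = np.array([0.5, 0.5])
W = np.array([[1.0], [1.0]])
print(rbf_forward(np.array([0.0, 0.0]), centers, widths, W))
```

In training, the centers and widths would be found in the unsupervised stage (e.g. by clustering) and the weights W in the supervised stage.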

DEEP LEARNING METHOD FOR CHAOTIC TIME SERIES PREDICTION: DBN
Inspired by the work of Kuremoto et al. 9 , we use a DBN model which combines one or two RBMs and an MLP to forecast chaotic time series. The RBMs are used for feature learning and the MLP is used for prediction. The forecasting DBN model is described in Figure 2.
As for the number of input nodes (at the first visible layer of the RBM(s)) of the DBN, we determine this parameter by using the embedding dimension (m) that we obtained when applying the phase-space reconstruction method to each chaotic dataset. Each input node has one external input which represents an element of X_t, i.e., x_t, x_{t+τ}, x_{t+2τ}, ..., x_{t+(m−1)τ}. That means we combine the DBN model with chaos theory in chaotic time series prediction. As for the activation function used in the DBN, we use the ReLU function, which is described by the following formula:

ReLU(x) = max(0, x)

The training algorithm for our proposed DBN consists of two stages: unsupervised learning and supervised learning. The unsupervised learning stage is the Contrastive Divergence (CD) algorithm 3 used for training the RBM(s). The CD algorithm progresses on a layer-by-layer basis. First, an RBM is trained directly on the input data, so that the neurons in its hidden layer capture the important features of the input data. The activations of the trained features are then used as "input data" to train a second RBM.
The supervised learning stage is the back-propagation algorithm used for training the MLP.
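The two-stage procedure can be sketched as follows: greedy layer-wise CD-1 pre-training of the RBM(s), whose hidden activations serve as features for the next layer (and would finally initialize the MLP, fine-tuned by back-propagation). All sizes, learning rates and data below are toy values, not the paper's settings:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def pretrain_rbm(data, n_hidden, epochs=5, lr=0.05, rng=None):
    """Unsupervised stage (sketch): CD-1 on one RBM. Returns the weights,
    hidden biases, and the hidden activations used as input to the next layer."""
    if rng is None:
        rng = np.random.default_rng(0)
    n_visible = data.shape[1]
    W = rng.normal(scale=0.1, size=(n_visible, n_hidden))
    b_v, b_h = np.zeros(n_visible), np.zeros(n_hidden)
    for _ in range(epochs):
        for v0 in data:
            ph0 = sigmoid(b_h + v0 @ W)
            h0 = (rng.random(n_hidden) < ph0).astype(float)
            v1 = sigmoid(b_v + h0 @ W.T)       # mean-field reconstruction
            ph1 = sigmoid(b_h + v1 @ W)
            W += lr * (np.outer(v0, ph0) - np.outer(v1, ph1))
            b_v += lr * (v0 - v1)
            b_h += lr * (ph0 - ph1)
    return W, b_h, sigmoid(b_h + data @ W)     # features for the next layer

# Greedy layer-wise stacking: the features of one RBM feed the next.
rng = np.random.default_rng(1)
data = rng.integers(0, 2, size=(20, 8)).astype(float)
W1, b1, feat1 = pretrain_rbm(data, n_hidden=5, rng=rng)
W2, b2, feat2 = pretrain_rbm(feat1, n_hidden=3, rng=rng)
# feat2 would then feed the MLP, fine-tuned with back-propagation.
print(feat2.shape)  # (20, 3)
```

The supervised back-propagation stage is omitted here; any standard MLP trainer can take the pre-trained weights as its starting point.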

EXPERIMENTAL EVALUATION
In this experiment, we compare the DBN method for chaotic time series forecasting to the method using the RBF network. We implemented the DBN forecasting method with the TensorFlow framework (using the Python language) 14 and the RBF method with Microsoft Visual C# on .NET Framework 4.5, and conducted the experiments on a Core i5 2.4 GHz PC with 8 GB RAM. In this study, the mean absolute error (MAE), the mean squared error (MSE) and the mean absolute percentage error (MAPE) are used as evaluation criteria. The formulas for MAE, MSE and MAPE are given as follows:

MAE = (1/n) Σ_{t=1..n} |y_t − ŷ_t|

MSE = (1/n) Σ_{t=1..n} (y_t − ŷ_t)²

MAPE = (100/n) Σ_{t=1..n} |(y_t − ŷ_t) / y_t|

where n is the number of observations, y_t is the actual value in time period t, and ŷ_t is the forecast value for time period t.
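The three criteria follow directly from their definitions; this sketch computes them on illustrative numbers:

```python
import numpy as np

def mae(y, yhat):
    # mean absolute error
    return np.mean(np.abs(y - yhat))

def mse(y, yhat):
    # mean squared error
    return np.mean((y - yhat) ** 2)

def mape(y, yhat):
    # mean absolute percentage error, in percent
    return 100.0 * np.mean(np.abs((y - yhat) / y))

y = np.array([100.0, 200.0, 400.0])      # actual values
yhat = np.array([110.0, 190.0, 440.0])   # forecasts
print(mae(y, yhat), mse(y, yhat), mape(y, yhat))
```

Note that MAPE is scale-free but undefined when an actual value y_t is zero, which is one reason to report all three criteria together.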

Datasets and Parameter Setting
The main purpose of this study is to evaluate the performance of DBN in forecasting not only synthetic but also real world chaotic time series. This follows the tradition of evaluating proposed methods in chaotic time series prediction on benchmark data. Here, the tested datasets consist of 3 synthetic chaotic time series datasets and 4 real world chaotic time series datasets. All these datasets are commonly used by the research community in chaotic time series prediction. They are described as follows.
1. This dataset is derived from the Lorenz system, given by the three differential equations:

dx/dt = a(y − x)
dy/dt = x(b − z) − y
dz/dt = xy − cz

where a = 10, b = 28, and c = 8/3. This time series consists of 1000 data points.
2. This dataset is derived from the Mackey-Glass system, given by the following delay differential equation:

dx/dt = β x(t − τ) / (1 + x(t − τ)^n) − γ x(t)

3. This dataset is derived from the Rossler system, given by the three differential equations:

dx/dt = −y − z
dy/dt = x + ay
dz/dt = b + z(x − c)

Figure 3 shows the plots of the three synthetic datasets. Figure 4 shows the plots of the four real world datasets. For the four real world datasets (Sunspots, CPI, USD/GBP, IBM), we use the largest Lyapunov exponent to check whether each time series is chaotic. The test shows that all four datasets possess chaotic characteristics.
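As an illustration, a Lorenz series like the one used here can be generated by numerically integrating the Lorenz system with the stated parameters a = 10, b = 28, c = 8/3; the Euler step size and initial condition below are assumptions, since the sampling details are not specified:

```python
import numpy as np

def lorenz_series(n, dt=0.01, a=10.0, b=28.0, c=8.0 / 3.0):
    """Generate n points of the Lorenz x-component by simple Euler
    integration (a sketch; step size and initial state are assumed)."""
    x, y, z = 1.0, 1.0, 1.0
    out = np.empty(n)
    for i in range(n):
        dx = a * (y - x)
        dy = x * (b - z) - y
        dz = x * y - c * z
        x, y, z = x + dt * dx, y + dt * dy, z + dt * dz
        out[i] = x
    return out

series = lorenz_series(1000)
print(series.shape)  # (1000,)
```

A higher-order integrator (e.g. Runge-Kutta) would be preferable for longer trajectories; Euler suffices for this short illustrative series.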
In this work, we estimate the embedding dimension and compute Lyapunov exponents using the tseriesChaos package in the R software (website: https://CRAN.R-project.org/package=tseriesChaos).
In the experiment, we use the RBF network in two versions: RBF with chaos theory (denoted RBF-2) and RBF without chaos theory (denoted RBF-1). For RBF-2, we have to determine the embedding dimension and the delay time by phase-space reconstruction for each dataset. For all datasets, the RBF network has its maximum number of learning iterations set to 1000. The parameter values for the two versions of the RBF network on all datasets are reported in Table 1. In Table 1, η 3 is the learning rate for output weights, η 2 is the learning rate for centers and η 1 is the learning rate for width factors.

Science & Technology Development Journal -Engineering and Technology, 3(SI1):SI102-SI112
The parameter values of the DBN model for all datasets are reported in Table 2. As suggested in 9 , we set the number of RBMs in the DBN model to 1 or 2: we use one RBM for every dataset except Sunspots, for which we use two RBMs. As for the number of input nodes (at the first visible layer of the RBM(s)) of the DBN, we determine this parameter by using the embedding dimension (m) that we obtained when applying phase-space reconstruction to each chaotic time series. That means the number of input nodes of the DBN model is the same as that of the RBF network with chaos theory (RBF-2) for each dataset. The number of units in the hidden layer of the RBM, the number of units in the hidden layer of the MLP and all the other remaining parameters have to be determined through experiment. As for the CD-k algorithm used to pre-train the DBN, we set the number of iterations k to 2. Each dataset is divided into two sets: a training set and a test set. The training set and test set for each of the seven datasets are given in Table 3.

Experimental Results
The experimental results on prediction accuracy are reported in Table 4. The prediction errors of the DBN model are shown in column 3. The prediction errors of the two versions of the RBF network method are shown in columns 4 and 5.
From the experimental results in Table 4, we can see that the prediction errors MAE, MSE and MAPE of the DBN method are much lower than those of the two RBF methods on most of the tested datasets (5 out of 7). In terms of MAE and MSE, the DBN method achieves the best prediction accuracy on all seven datasets. However, for the USD/GBP dataset (length = 295) and the IBM dataset (length = 255), although DBN still achieves lower MAE and MSE errors, its MAPE error is slightly higher than that of the two other methods. We conjecture that on these two time series of small size, DBN cannot manifest its strength in comparison to the RBF network, a shallow form of neural network. This observation implies that deep neural networks, such as DBNs, exhibit their superior performance especially when working with big data rather than small datasets. The training times (in milliseconds) of the three methods over the seven datasets are given in Table 5. From Table 5, we can see that the training time of DBN is much higher than those of the two versions of the RBF method on most of the datasets.

DISCUSSION
On all tested time series, the RBF-2 model shows better prediction performance than the RBF-1 model. This implies that by applying chaos theory (namely, phase-space reconstruction), the RBF-2 model outperforms the RBF-1 model, which does not use chaos theory, in chaotic time series prediction. It should be noted that in this work the parameters of the RBF networks (RBF-1, RBF-2) and the hyperparameters of the DBN model were selected after only a few trial-and-error tests on a limited number of parameter values. As for the DBN model, some previous works pointed out that its hyperparameters, such as the learning rate of the RBM, the number of hidden units of the MLP, the learning rate of the MLP, etc., have a great influence on the performance of this deep learning model ( 9 ). Therefore, the hyperparameters of the DBN model should be fine-tuned to their optimum before it is used. One way to handle this problem is to apply some meta-heuristics to find suitable values of the hyperparameters of DBNs for forecasting time series.

CONCLUSIONS
In this paper, we present and extensively evaluate a method using DBN in phase-space prediction of chaotic time series. We compare the performance of DBN with that of the RBF network on three synthetic datasets and four real world datasets of chaotic time series. The experimental results reveal that the DBN method performs better than the RBF network, a shallow form of neural network, on most of the datasets. The results also suggest that deep neural networks, such as DBNs, exhibit their superior performance especially when working with big data rather than small datasets. As for future work, we intend to apply some meta-heuristics such as Particle Swarm Optimization (PSO) or Harmony Search to find suitable values of the hyperparameters of DBNs for forecasting chaotic time series.

COMPETING INTERESTS
The authors have declared that no competing interests exist.

AUTHOR CONTRIBUTION
Duong Tuan Anh proposed the main ideas of the article, collected the datasets and wrote the article. Ta Ngoc Huy Nam implemented the comparative models, conducted the experiments and reported the experimental results.