Autoregressive Integrated Moving Average vs. Artificial Neural Network in Predicting COVID-19 Cases in Malaysia

INTRODUCTION
An outbreak caused by a new coronavirus was declared by the Chinese government in December 2019 (Saba & Elsheikh, 2020). The virus was first detected in China and has since spread across the globe. On March 11, 2020, the World Health Organisation (WHO) officially declared the outbreak a pandemic (Katris, 2021). The novel coronavirus, related to the viruses behind SARS and MERS, began its spread from Wuhan, China, and has since reached almost every country in the world. As the first cases outside China were reported in mid-January in Japan, South Korea, and Thailand, the Chinese government responded by announcing an emergency lockdown starting on 23 January 2020, with compulsory self-isolation and a prohibition on travel out of the region (Sweeny et al., 2020). As reported by the WHO on 10 April 2020, confirmed cases and fatalities had increased significantly, exceeding 1 million positive cases and 100,000 deaths within one month of the pandemic declaration (Saba & Elsheikh, 2020).
On 25 January 2020, Malaysia declared its first case of Covid-19, after testing close contacts of Chinese nationals who had arrived in Malaysia from Singapore and tested positive (Muhamad, Zainon, Nawi & Ghazali, 2020). As of March 15, 2020, there were 190 new confirmed cases of Covid-19 in Malaysia, bringing the cumulative total of positive cases to 428, at that point the highest in Southeast Asia; this was also the largest daily increase in the initial period of the pandemic in Malaysia (Kamaludin et al., 2020). Malaysia's first major measure to reduce positive cases was the Movement Control Order (MCO) (Muhamad et al., 2020). The MCO's measures included a full ban on leaving home and on mass gatherings, as well as a restriction on all domestic and international travel. Academic institutions, along with public and private buildings, were shut down. The Royal Malaysian Police was called in to assist with enforcement of the restrictions during this phase.
Despite the continuous implementation of the movement control order, the number of affected cases did not decrease. Many concerns loom over the spread of Covid-19 as the number of positive cases keeps reaching new daily highs. How many people will be infected in the coming days? Will the curve keep rising, or will it flatten? Are there mathematical models that could provide an answer? Under these circumstances, it is very important to predict the potential trend of this disease so that the government, public health authorities, and citizens can be better prepared to deal with an upcoming emergency.

METHODOLOGY
In this study, two forecasting techniques are compared to determine the most effective model for predicting upcoming Covid-19 cases: the first is Autoregressive Integrated Moving Average (ARIMA) modelling, and the second is Multilayer Perceptron Neural Network (MPNN) modelling, an artificial neural network (ANN) approach. A total of 394 observations of daily Covid-19 cases in Malaysia, covering March 1, 2020, to March 29, 2021, were gathered from the online database www.kaggle.com.my.

Autoregressive Integrated Moving Average (ARIMA)
Autoregressive integrated moving average (ARIMA) is a model derived from the Box-Jenkins methodology, introduced in 1976 by George E. P. Box and Gwilym M. Jenkins (Lazim, 2011). ARIMA models have been used in various fields such as economics, social science, epidemiology, and medicine. Fattah, Ezzine, Aman, Moussami and Lachhab (2018) proposed a study on predicting the demand of a food company using an ARIMA model, and Apergis, Mervar, and Payne (2017) conducted research to determine the most appropriate model for producing precise predictions of tourist arrivals in Croatia.
In a study conducted by Alzahrani, Aljamaan, and Al-Fakih (2020), four time-series models were applied. The main purpose of the study was to observe and forecast the spread of Covid-19 in Saudi Arabia using historical data of daily cases. Following the Box-Jenkins methodology, Autoregressive (AR), Moving Average (MA), Autoregressive Moving Average (ARMA), and ARIMA models were compared to find the best-fit model. The study concluded that ARIMA was the most suitable model for prediction purposes. Three major stages are involved in determining a suitable ARIMA model: 1) model identification, 2) model estimation and diagnostic checking, and 3) model application.
Step 1: Model Identification
The first stage in the development of an ARIMA model is determining the three parameters 'p', 'd', and 'q' for an appropriate ARIMA(p,d,q) form. The parameter 'p' is determined by the Autoregressive (AR) process and refers to the number of lagged terms of the dependent variable. The parameter 'd' denotes the order of differencing required to transform a non-stationary series into a stationary one. Lastly, the parameter 'q', determined by the Moving Average (MA) process, refers to the order of the moving average.
Prior to identifying a model, we must decide whether the series is stationary. If the stationarity condition is not fulfilled by the data series, differencing is necessary to transform it into a stationary series. Several procedures are used to check stationarity. First, the time series is plotted to see whether the series is constant around its mean value. Second, the correlograms, namely the autocorrelation function (ACF) and the partial autocorrelation function (PACF), are analyzed. Finally, the two most popular unit root tests, the Augmented Dickey-Fuller (ADF) test and the Kwiatkowski-Phillips-Schmidt-Shin (KPSS) test, are computed. The ADF test uses the Ordinary Least Squares (OLS) procedure to estimate the regression

Δy_t = α + βt + γy_{t−1} + δ_1 Δy_{t−1} + … + δ_{p−1} Δy_{t−p+1} + ε_t,

where p is the number of lags of Δy_t, with Δy_t = y_t − y_{t−1}, and ε_t is white noise with mean zero and variance σ². The null hypothesis of the ADF test, which presupposes that the series is not stationary, must be rejected. In contrast, the null hypothesis of the KPSS test assumes that the data are stationary, so for this test we wish to avoid rejecting the null hypothesis. If both tests indicate that the series is not stationary, differencing is repeated until the data become stationary. The number of differencing operations is represented by the parameter 'd' in the ARIMA model. After successive unit root tests, we are able to identify all of the 'p', 'd', and 'q' parameters of the ARIMA model.
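As an illustrative sketch of the differencing step (in Python rather than the RStudio workflow used in the study; the helper name `difference` is hypothetical), one pass of differencing replaces each value with its change from the previous value:

```python
def difference(series, d=1):
    """Apply d-th order differencing: each pass replaces y_t with y_t - y_{t-1}."""
    for _ in range(d):
        series = [curr - prev for prev, curr in zip(series, series[1:])]
    return series

# A trending (non-stationary-looking) series loses its trend after one difference:
trend = [10, 13, 17, 22, 28]
diffed = difference(trend, d=1)   # differences: [3, 4, 5, 6]
```

Each pass shortens the series by one observation, which is why 'd' is kept as small as the unit root tests allow.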

Step 2: Model Validation and Diagnostic Checking
Model creation is an iterative process. Numerous alternative models are evaluated before choosing the one that generally performs "best" according to a criterion such as Akaike's Information Criterion (AIC) (Chatfield & Xing, 2019). In this study, the fitness of an ARIMA model is evaluated using the AIC; the model with the lowest AIC is considered the most appropriate. Mathematically, it is formulated as

AIC = n ln(σ̂²) + 2k,

where k = p + q is the total number of AR and MA terms, n is the number of observations in the data, and the penalty term 2k serves to avoid overfitting the model. Besides the AIC, another common statistical measure used to validate ARIMA models is the Ljung-Box statistic, given as

Q = n(n + 2) Σ_{k=1}^{h} ρ̂_k² / (n − k),

where n is the number of observations in the time series, h is the maximum lag being tested, ρ̂_k is the k-th sample autocorrelation of the residual terms, p is the order of the AR terms, q is the order of the MA terms, and d is the degree of differencing; Q is compared against a chi-squared distribution with h − p − q degrees of freedom.
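Both criteria above can be sketched in plain Python (an illustrative sketch, not the RStudio implementation the study uses; `aic` and `ljung_box_q` are hypothetical helper names):

```python
import math

def aic(n, residual_variance, k):
    """AIC = n*ln(sigma^2) + 2k; lower is better, and 2k penalizes extra AR/MA terms."""
    return n * math.log(residual_variance) + 2 * k

def ljung_box_q(residuals, h):
    """Ljung-Box Q = n(n+2) * sum_{k=1..h} rho_k^2 / (n-k) on model residuals."""
    n = len(residuals)
    mean = sum(residuals) / n
    c0 = sum((r - mean) ** 2 for r in residuals)   # lag-0 autocovariance (unscaled)
    q = 0.0
    for k in range(1, h + 1):
        ck = sum((residuals[i] - mean) * (residuals[i - k] - mean)
                 for i in range(k, n))
        rho_k = ck / c0                            # k-th sample autocorrelation
        q += rho_k ** 2 / (n - k)
    return n * (n + 2) * q

# With identical fit, the model with one extra parameter pays a penalty of exactly 2:
penalty = aic(100, 4.0, 5) - aic(100, 4.0, 4)
# Strongly autocorrelated residuals give a large Q (they are not white noise):
q_alternating = ljung_box_q([1.0, -1.0] * 10, h=1)
```

A small Q (relative to the chi-squared critical value) means the residuals look like white noise, which is what a well-specified ARIMA model should leave behind.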

Step 3: Model Application
Once the model's fitness has been confirmed, it is ready to be used to produce forecasts, and the accuracy of the predicted output against the actual output is assessed. This final stage can be done using RStudio. The general steps in creating an ARIMA model are shown in Figure 1.

Multilayer Perceptron Neural Network (MPNN)
The second approach employed in this study is the Multilayer Perceptron Neural Network (MPNN), a feed-forward neural network developed from the simple perceptron. It can represent non-linear functions by incorporating one or more hidden layers. A wide range of learning algorithms have been proposed to train multilayer perceptron networks. The back-propagation algorithm was the initial learning method designed for this purpose, and it is now used in nearly all business applications (Rodrigues & Carpinetti, 2019). Ranjan, Majhi, Kalli and Managi (2021) used a Multilayer Perceptron model to predict gross domestic product (GDP) in eight countries. Their results showed that the MPNN model was able to forecast GDP figures accurately, with a lower mean absolute percentage error (MAPE), and performed well on economic problems. Additionally, Slimani, Sbiti and Amghar (2019) compared a variety of neural network models, including the Perceptron, Adaline, Radial Basis Function (RBF), NoProp, and Multilayer Perceptron (MPNN), to address the traffic jam problem. With the smallest error on both the training set and the test set, the Multilayer Perceptron (MPNN) generated the most accurate forecasts among the models compared. Another study was done by Zealand, Burn, and Simonovic (2000) on streamflow forecasting. It showed that the neural network model gave the most accurate results, with a lower root mean squared error (RMSE) than the Winnipeg Flow Forecasting System (WIFFS) model. They concluded that data-driven methods such as neural networks are suitable for handling complex problems.
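To make the feed-forward structure concrete, the following is a minimal sketch of a one-hidden-layer perceptron in Python (an illustration only, not Alyuda's implementation; the function name and all weight values are hypothetical):

```python
import math

def mlp_forward(x, w_hidden, b_hidden, w_out, b_out):
    """One-hidden-layer perceptron: inputs -> tanh hidden layer -> linear output.
    The tanh activation is what lets the network represent non-linear functions."""
    hidden = [math.tanh(sum(wi * xi for wi, xi in zip(w, x)) + b)
              for w, b in zip(w_hidden, b_hidden)]
    return sum(wo * h for wo, h in zip(w_out, hidden)) + b_out

# A [2-1-1] network: two inputs, one hidden node, one output (hypothetical weights).
y = mlp_forward([0.5, -0.2], w_hidden=[[0.8, 0.3]], b_hidden=[0.1],
                w_out=[1.5], b_out=0.05)
```

Training (e.g. by back-propagation) amounts to adjusting `w_hidden`, `b_hidden`, `w_out`, and `b_out` to minimize the error between `y` and the target values.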
In this work, the Alyuda NeuroIntelligence software is used to model the multilayer perceptron. Six main steps must be executed to obtain the best neural network model; the stages are depicted in Figure 2.
Step 1: Analyzing data
The data must be checked before being imported into the Alyuda programme to remove anomalies such as outliers or drifts that would adversely affect the network's performance. The dataset is examined and divided into three parts: the training set, the testing set, and the validation set. To reduce overfitting, more data is allocated to the training set, in a ratio of 75:25.
Step 2: Pre-processing data
In the second stage, the outcome of data pre-processing is checked. The data must be clean before entering the network. Data cleaning involves fixing or removing incorrect, corrupted, incorrectly formatted, duplicate, or incomplete data within the dataset.
Step 3: Designing network architecture
Designing the network requires deciding the number of nodes in the hidden layer and the number of candidate architectures. Using Kolmogorov's superposition theorem as a heuristic, the number of hidden nodes can be calculated as

h = 2n + 1,

where h represents the total number of nodes in the hidden layer and n the total number of nodes in the input layer.
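Assuming the 2n + 1 heuristic above, the hidden-layer size can be enumerated as follows (a sketch; `kolmogorov_hidden_nodes` is a hypothetical helper, and in practice candidate architectures with fewer hidden nodes are also tried):

```python
def kolmogorov_hidden_nodes(n_inputs):
    """Heuristic upper bound on hidden nodes from Kolmogorov's superposition
    theorem: h = 2n + 1, where n is the number of input nodes."""
    return 2 * n_inputs + 1

# For a network with two inputs, the heuristic suggests at most 5 hidden nodes;
# candidate architectures would then range over 1..5 hidden nodes.
max_hidden = kolmogorov_hidden_nodes(2)
```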
Figure 2: Analyzing data → Pre-processing data → Designing network architecture → Training the network → Testing the network → Validation

Step 4: Train network
At this stage, the optimal algorithm for network training is determined as the one that gives the lowest absolute error. There are seven training algorithms available in Alyuda NeuroIntelligence: Conjugate Gradient Descent, Quick Propagation, Quasi-Newton, Levenberg-Marquardt, Limited Memory Quasi-Newton, Batch Back Propagation, and Online Back Propagation. When choosing among these training algorithms, many aspects are considered, including interpretability, the number of data points and features, the data format, and more.
Step 5: Test network
At this stage, the data is validated and tested. Once the training phase is completed and the model parameters have been fitted, the validation set is used to assess how well the model generalizes. Overfitting and underfitting are identified by checking validation metrics such as accuracy and loss: the model is still underfit while validation loss is decreasing, and overfit once validation loss starts increasing. This study uses dropout layers to handle overfitting. In the testing phase, the targeted and predicted values are compared to assess the model's performance and quantify the error between them. The dataset's mean absolute error (MAE) is obtained from the equation

MAE = (1/n) Σ_{i=1}^{n} |ŷ_i − y_i|,

where ŷ_i is the prediction, y_i is the target value, and n is the total number of data points.
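The MAE formula above translates directly into code (an illustrative sketch; `mean_absolute_error` is a hypothetical helper name, not part of the software used in the study):

```python
def mean_absolute_error(predictions, targets):
    """MAE = (1/n) * sum of |prediction_i - target_i| over all n data points."""
    n = len(predictions)
    return sum(abs(p - t) for p, t in zip(predictions, targets)) / n

# Two forecasts off by 10 and 5 cases give an MAE of (10 + 5) / 2 = 7.5:
mae = mean_absolute_error([110.0, 95.0], [100.0, 100.0])
```

Because MAE is in the same units as the data (daily cases here), it is easy to interpret when comparing models.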

FINDINGS AND DISCUSSIONS

Autoregressive Integrated Moving Average (ARIMA) model
The data is split into two sections during the model-building process: estimation and evaluation. The estimation component, or training data, is used to fit the model, while the evaluation component, or test data, is used to evaluate the model's accuracy. The data is split 75 percent to 25 percent, giving 295 training observations and 99 testing observations. All the training data (1 March 2020 until 20 December 2020) are transformed into a time series data type. Figure 4 shows the time series plot of Covid-19 cases in Malaysia. The series shows an upward trend over the period from March 1, 2020, until December 20, 2020, indicating that the series is not stationary. To satisfy the stationarity requirement, the trend must be eliminated; thus, differencing is applied until the series is stationary.
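The chronological 75:25 split described above can be sketched as follows (in Python rather than the study's RStudio workflow; `chronological_split` and the stand-in data are hypothetical). Note that time series data is split in order, never shuffled, so the test set always lies in the future relative to the training set:

```python
def chronological_split(series, train_fraction=0.75):
    """Split a time series into train/test sets without shuffling (order matters)."""
    cut = int(len(series) * train_fraction)
    return series[:cut], series[cut:]

daily_cases = list(range(394))         # stand-in for the 394 daily observations
train, test = chronological_split(daily_cases)
# int(394 * 0.75) = 295, leaving 99 test points, matching the paper's split
```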

Figure 4: Time Series Plot of Covid-19 cases in Malaysia
First-order differencing of the time series is implemented, and the results are shown in Figure 5, Figure 6, and Figure 7. From Figure 5, it can be seen that the first-differenced series fluctuates randomly around zero: it shows no growth or decline over time and no trend component, so the series can be said to be stationary. To verify its stationarity, the two statistical tests, ADF and KPSS, are conducted. The results of both tests after first-order differencing are presented in Table 1. Both tests show that the series is stationary. Therefore, further differencing is not required, and we adopt d=1 in the ARIMA(p,d,q) model since differencing was performed only once.
The sample autocorrelation function (ACF) and the sample partial autocorrelation function (PACF) are the main tools for identifying the initial ARIMA model for a given stationary time series. The parameters 'p' and 'q' in ARIMA(p,d,q) are identified from the number of significant spikes that exceed the two significance limits (blue lines) in the ACF and PACF plots in Figure 6 and Figure 7, respectively. According to Figure 6, there are significant spikes at lags 1, 2, 11, 12, 15, 17, 18, 19, and 21. This suggests a moving average (MA) term of order 9 (q=9). Referring to Figure 7, the significant spikes that extend beyond the limits are at lags 1, 2, 3, 4, 6, 10, 14, 15, 16, 18, 21, and 24. This suggests an autoregressive (AR) term of order 12 (p=12). As a result, ARIMA(12,1,9) is suggested as an initial model. However, ARIMA(12,1,9) might not be the best fit; therefore, to find the best model, other candidate models are identified by considering every possible combination of 'p' and 'q'.
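The ACF spikes and significance limits used above can be sketched in plain Python (an illustration only; `sample_acf` and `significance_limit` are hypothetical helpers, and the ±1.96/√n bound is the usual large-sample 95% approximation):

```python
import math

def sample_acf(series, max_lag):
    """Sample autocorrelations rho_1 .. rho_max_lag of a (stationary) series."""
    n = len(series)
    mean = sum(series) / n
    c0 = sum((y - mean) ** 2 for y in series)      # lag-0 autocovariance (unscaled)
    return [sum((series[i] - mean) * (series[i - k] - mean) for i in range(k, n)) / c0
            for k in range(1, max_lag + 1)]

def significance_limit(n):
    """Approximate 95% limits: a spike beyond +/- 1.96/sqrt(n) is 'significant'."""
    return 1.96 / math.sqrt(n)

# An alternating series has strong negative lag-1 and positive lag-2 correlation:
acf = sample_acf([0.0, 1.0] * 50, max_lag=2)
limit = significance_limit(100)
significant_lags = [k + 1 for k, r in enumerate(acf) if abs(r) > limit]
```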
A well-fitted model produces the lowest Akaike's Information Criterion (AIC), and its residuals are expected to be independently distributed. Based on the Ljung-Box test, there are 25 models with independently distributed residuals (the errors are white noise), as shown in Table 2. A comparison of their AIC values is then performed to select the single best model among these well-specified models. Based on the smallest AIC, the result points to ARIMA(4,1,5); thus, ARIMA(4,1,5) is the best-fitting model. The mean absolute error of ARIMA(4,1,5) is then calculated to assess the accuracy of its forecasts, and the result is 1096.799.

Multilayer Perceptron Neural Network (MPNN)
The dataset for this study is partitioned into three parts, a training set, a testing set, and a validation set, as shown in Table 3. Since neural networks can only process numeric inputs, all data are in numerical format. Neural networks also require the inputs to be scaled consistently; therefore, the data is standardized to a distribution with a mean of 0 and a standard deviation of 1. The structure of the neural network consists of input, hidden, and output neurons. The most effective network in this study is [2-1-1], based on the minimum AIC; [2-1-1] refers to a design with two inputs, one hidden node, and one output, chosen from the 10 candidate architectures shown in Table 4. Next, the [2-1-1] architecture is trained with seven training algorithms, as shown in Table 5. Quasi-Newton gives the smallest training absolute error, so this algorithm is applied to the [2-1-1] network in the testing phase. In the final (testing) phase, the performance of the [2-1-1] neural network model is evaluated on the testing dataset. Based on the testing result in Table 6, we may conclude that the Multilayer Perceptron Neural Network is appropriate for time series forecasting, given its low mean absolute error of 334.59.

Comparison between the ARIMA and MPNN models
Figure 8, Figure 9, and Table 7 provide graphical and statistical results for evaluating the forecast performance of ARIMA and MPNN. Based on Table 7, the mean absolute error (MAE) of the MPNN model is the lowest, demonstrating its better performance over the ARIMA(4,1,5) model. The outcomes in Figures 8 and 9 also support this conclusion.
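The standardization step described above (zero mean, unit standard deviation) can be sketched as follows (an illustration in Python, not Alyuda's internal preprocessing; `standardize` is a hypothetical helper using the population standard deviation):

```python
import math

def standardize(values):
    """Rescale values to zero mean and unit (population) standard deviation."""
    n = len(values)
    mean = sum(values) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    return [(v - mean) / std for v in values]

scaled = standardize([10.0, 20.0, 30.0, 40.0])
```

The same mean and standard deviation estimated from the training set would be reused to scale the test and validation sets, so that no information leaks from future data.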

Forecasting the future Covid-19 cases in Malaysia
Next, the best MPNN model, [2-1-1], is used to predict the number of new cases over the next 30 days. Figure 9 shows an upward trend in the number of cases, with positive cases steadily increasing from April 2021. There are several potential causes for this increase, but the establishment of community clusters is most certainly one of them. Several ideas could enhance future research. First, the same techniques should be applied to real-time data updates; in terms of forecasting accuracy, a larger amount of data would yield more accurate model forecasts. Finally, as suggested by Aggarwal et al. (2020) and Phan and Nguyen (2020), a hybrid ARIMA-ANN model would be worth proposing.