Short Term Forecast of COVID-19 cases in Japan Using Time Series Analysis Models

The novel coronavirus disease (COVID-19) was first identified in Wuhan, China, in late December 2019. The virus has since spread to countries all over the world, including Japan. The World Health Organization (WHO) declared COVID-19 a pandemic on 11 March 2020 because of the daily increase in confirmed cases and deaths. The outbreak has affected Japan adversely, and the number of confirmed cases in Japan continues to increase day by day. On 7 April 2020, Japan declared a state of emergency to prevent the pandemic from worsening. This study forecasts new daily confirmed cases of COVID-19 in Japan over a short-term period. Four univariate time series models were applied: the Naïve Model, the Mean Model, the Autoregressive Integrated Moving Average (ARIMA) Model and the Exponential State Space Model. The study analyses daily data from 22 January to 10 April 2020 collected from the Our World in Data website. The analysis involves five phases, and each model is fitted on five different partitions of the data into estimation and evaluation parts to ensure the accuracy of the forecast values. R and RStudio were used to analyze the data. The results reveal that the Naïve Model with 99 percent of the data in the estimation part and 1 percent in the evaluation part produces the lowest error measures for Mean Error (ME), Root Mean Square Error (RMSE), Mean Absolute Error (MAE), Mean Absolute Percentage Error (MAPE), and Mean Absolute Scaled Error (MASE).


INTRODUCTION
The coronavirus disease (COVID-19) pandemic affects every country in the world. Starting in December 2019, the world witnessed the COVID-19 outbreak in the city of Wuhan, China. The outbreak became an epidemic, and the disease quickly spread beyond China's borders to the rest of the world. The Japanese Ministry of Health reported the first case of COVID-19 on 16 January 2020 in a person who returned from Wuhan in late December 2019. Japan declared a state of emergency on 7 April 2020, and as of 27 May 2020 Japan ranked third among Western Pacific nations, with a total of 16,651 confirmed cases and 886 deaths.
The COVID-19 outbreak has negatively impacted Japan's society and economy as the number of confirmed cases in Japan has increased daily. The economy of Japan shrank by 7.8 percent in the second quarter of the year, making it the worst recession in the history of the country. This was primarily due to the economic difficulties triggered by the COVID-19 pandemic (Times, 2020). Many sectors were forced to close temporarily, business performance dropped, events were canceled, and the Summer Olympics scheduled to be held in Tokyo were postponed. Tourism and restaurant companies suffered a major setback because of a significant loss of profits (Yamamura & Tsutsui, 2020).
A time series is a sequence of observations recorded at regular time intervals. Many studies use time series models to forecast such data. Chintalapudi et al. (2020) used the ARIMA model to predict registered and recovered cases after 60 days of lockdown in Italy. Aimran and Afthanorhan (2015) compared four exponential smoothing techniques to determine the best model for predicting the Malaysian population. Dehesh et al. (2020) used ARIMA models to predict new cases of COVID-19 in several countries: Italy, China, South Korea, Iran, and Thailand. Duan and Zhang (2020) used the ARIMA model to forecast daily new confirmed cases of the COVID-19 outbreaks in Japan and South Korea. They found that the estimated ARIMA model captures the dependence structure of the daily new confirmed cases time series very well. Building on this work, this paper compares four univariate time series models to identify the most suitable model for forecasting the number of COVID-19 cases in Japan. The findings will benefit other parties, such as the Government, the Ministry of Health and business operators, in developing action plans to stop the spread of COVID-19.

METHODOLOGY
This study used secondary data on the early COVID-19 outbreak in Japan, covering daily cases from January 22 to April 10, 2020. The analysis involves five phases, as shown in Figure 1. The process starts with data cleaning, to ensure the data are free from missing values or outliers that could affect the accuracy of the forecast values. The subsequent phases are then carried out until the final phase, in which the best model is used to produce a short-term forecast of daily confirmed cases of COVID-19 in Japan. The forecast values are compared with the actual data to see how well the model performs on unseen data and to determine its accuracy.
The Naïve Forecast, or Naïve Model, was invented by Michael Gilliland (Snapp, 2012). The model is said to work well when the historical data show no discernible pattern such as trend or fluctuation. Otherwise, the predicted values will be less accurate. The equation for the Naïve Model is given in Equation 1.
$$F_{t+m} = y_t \qquad (1)$$
where $F_{t+m}$ is the forecast value for m-step-ahead made at period t and $y_t$ is the actual value observed at period t.
For the Mean Model, the forecast is
$$F_{t+m} = \bar{y}$$
where $F_{t+m}$ is the forecast value for m-step-ahead made at period t and $\bar{y}$ refers to the mean of the actual historical time series.
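The two simplest models can be illustrated with a minimal sketch. It is written in Python purely for illustration (the study itself used R), the function names are hypothetical, and the sample counts are made up rather than taken from the study's data.

```python
def naive_forecast(history, steps):
    """Naive model: every m-step-ahead forecast equals the last observed value."""
    if not history:
        raise ValueError("history must not be empty")
    return [history[-1]] * steps


def mean_forecast(history, steps):
    """Mean model: every m-step-ahead forecast equals the historical mean."""
    if not history:
        raise ValueError("history must not be empty")
    average = sum(history) / len(history)
    return [average] * steps


# Hypothetical daily case counts (not the study's data).
cases = [10, 12, 9, 15, 14]
print(naive_forecast(cases, 3))  # [14, 14, 14]
print(mean_forecast(cases, 3))   # [12.0, 12.0, 12.0]
```

Both forecasts are flat lines over the horizon; they differ only in whether the level comes from the latest observation or from the whole history.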
The Autoregressive Integrated Moving Average (ARIMA) Model, also known as the Box-Jenkins model, was developed by George Box and Gwilym Jenkins in 1976 (Lazim, 2018). The model is written in general as ARIMA(p, d, q), where p is the order of the autoregressive part, q is the order of the moving average part and d is the number of differences required to make the data stationary. The ARIMA model is applied when the stationarity assumption for the variable is not met. In its general form, the ARIMA(p, d, q) model can be written as
$$\phi_p(B)\,(1 - B)^d\, y_t = \mu + \theta_q(B)\,\varepsilon_t$$
where $B$ is the backshift operator ($B y_t = y_{t-1}$), $\phi_p(B) = 1 - \phi_1 B - \dots - \phi_p B^p$ is the autoregressive polynomial, $\theta_q(B) = 1 - \theta_1 B - \dots - \theta_q B^q$ is the moving average polynomial, $\mu$ is a constant and $\varepsilon_t$ is a white noise error term.
Hyndman et al. (2002) proposed the State Space Models for Exponential Smoothing (ETS) framework, which included all 18 exponential smoothing models. This advancement simplifies forecasting by automatically generating prediction intervals, likelihood, and model selection criteria based on the model framework. Hyndman et al. (2008) depict the model framework.
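Two building blocks behind these models can be sketched in a few lines: the differencing step that the d in ARIMA(p, d, q) counts, and simple exponential smoothing, which is the most basic member of the exponential smoothing family. This is an illustrative Python sketch, not the fitting procedure used in the study; the function names, the initialisation of the level with the first observation and the value of alpha are all assumptions.

```python
def difference(series, d=1):
    """Apply first-order differencing d times (the 'I' part of ARIMA)."""
    out = list(series)
    for _ in range(d):
        out = [later - earlier for earlier, later in zip(out, out[1:])]
    return out


def simple_exp_smoothing(series, alpha):
    """One-step-ahead forecast from simple exponential smoothing.

    The level is updated as alpha * y_t + (1 - alpha) * previous level,
    starting from the first observation (an assumed initialisation).
    """
    if not series:
        raise ValueError("series must not be empty")
    if not 0 < alpha <= 1:
        raise ValueError("alpha must lie in (0, 1]")
    level = series[0]
    for y in series[1:]:
        level = alpha * y + (1 - alpha) * level
    return level


# A hypothetical series with a linear trend: one difference removes the trend.
trend = [2, 4, 6, 8, 10]
print(difference(trend, d=1))  # [2, 2, 2, 2]
print(difference(trend, d=2))  # [0, 0, 0]
print(simple_exp_smoothing([10, 20], alpha=0.5))  # 15.0
```

In practice the orders p, d, q and the smoothing parameters are estimated from the data by the forecasting software rather than set by hand.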
A common approach to evaluating forecast accuracy is to split the data into two parts: estimation and evaluation (Lazim, 2018). The estimation part is used to estimate the model parameters, while the evaluation part is used to assess the accuracy of the resulting forecasts. Five different partitions of the data are used to ensure the accuracy of the forecast values.
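The partitioning step can be sketched as a chronological split, as below. This is a hedged Python illustration: the function name is hypothetical, the series is a placeholder, and of the five fractions listed only the 99/1 and 80/20 splits are stated in the paper; the intermediate values are assumptions.

```python
def split_series(series, estimation_fraction):
    """Split a time series chronologically into estimation and evaluation parts."""
    if len(series) < 2:
        raise ValueError("need at least two observations to split")
    if not 0 < estimation_fraction < 1:
        raise ValueError("estimation_fraction must lie strictly between 0 and 1")
    cut = round(len(series) * estimation_fraction)
    # Keep at least one observation on each side of the cut.
    cut = max(1, min(cut, len(series) - 1))
    return series[:cut], series[cut:]


# Placeholder for 80 daily observations (not the study's data).
series = list(range(80))

# Only 0.99 and 0.80 are taken from the paper; the others are assumed.
for fraction in (0.99, 0.95, 0.90, 0.85, 0.80):
    estimation, evaluation = split_series(series, fraction)
    print(fraction, len(estimation), len(evaluation))
```

The split is never shuffled: the evaluation part always follows the estimation part in time, so the model is tested on data it has not seen.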

Model Selection Criteria
The best model is the one that produces the smallest error measures calculated on the out-of-sample (evaluation) data (see Equations (7) and (8)).
Figure 2 shows an upward trend with an additive relationship in the daily cases of COVID-19 in Japan. The data set from January 22 to April 10, 2020 contains 80 observations, with a maximum value of 636 and a minimum value of 0. That is, in the data collected as of April 11, 2020, the highest number of new confirmed cases of COVID-19 in Japan was 636, recorded on 10 April 2020, and Japan recorded no new cases on the following dates: 2, 3, 6, 7 and 9 February 2020.
The four time series models (Naïve, Mean, ARIMA and ETS) were analyzed in RStudio. Tables 3 and 4 present the model summaries for the five sets of each model for the estimation and evaluation parts, respectively. Based on Table 3, all the models fit best, with the lowest error measures, under Set 5 (80% estimation and 20% evaluation). However, the "win" model or data partition is selected based on the smallest error measures in the evaluation part (Lazim, 2011). Therefore, based on Table 4, the best set is Set 1 for the Naïve Model, Set 5 for the Mean Model, Set 1 for the ARIMA Model and Set 4 for the Exponential State Space Model.
The next step is to identify the best of the four models used in this study. The "win" set of each model is collected and compared in Table 5, and the best model is again the one that produces the lowest error measures. Table 5 shows that the Naïve Model has the lowest value for four out of five error measures. Therefore, the Naïve Model has been selected as the best model and can be used to forecast future daily confirmed cases of COVID-19 in Japan.
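The five error measures used for model selection can be sketched as follows. This Python illustration follows common conventions that are assumptions here, not definitions recovered from the paper: errors are taken as actual minus forecast, MASE is scaled by the in-sample mean absolute error of the one-step naive forecast, and days with zero actual cases are skipped in MAPE because the percentage error is undefined there. The function name and sample numbers are hypothetical.

```python
from math import sqrt


def error_measures(actual, forecast, training):
    """Compute ME, RMSE, MAE, MAPE and MASE for out-of-sample forecasts."""
    if len(actual) != len(forecast) or not actual:
        raise ValueError("actual and forecast must be non-empty and equal in length")
    if len(training) < 2:
        raise ValueError("training needs at least two observations for MASE")

    errors = [a - f for a, f in zip(actual, forecast)]  # assumed sign convention
    n = len(errors)
    me = sum(errors) / n
    rmse = sqrt(sum(e * e for e in errors) / n)
    mae = sum(abs(e) for e in errors) / n

    # MAPE is undefined when the actual value is zero; those days are skipped.
    pct = [abs(e / a) * 100 for e, a in zip(errors, actual) if a != 0]
    mape = sum(pct) / len(pct) if pct else float("nan")

    # MASE scale: in-sample MAE of the one-step naive forecast.
    steps = [abs(b - a) for a, b in zip(training, training[1:])]
    scale = sum(steps) / len(steps)
    mase = mae / scale if scale else float("nan")

    return {"ME": me, "RMSE": rmse, "MAE": mae, "MAPE": mape, "MASE": mase}


# Hypothetical numbers: a naive forecast of 15 for two evaluation days.
measures = error_measures(actual=[16, 20], forecast=[15, 15],
                          training=[10, 12, 11, 15])
for name, value in measures.items():
    print(name, round(value, 4))
```

Because the series in this study contains days with zero new cases, the handling of zeros in MAPE affects the reported value, so the convention used should be stated alongside the results.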
The 3-step-ahead forecasts and the actual numbers of new confirmed cases in Japan from 11 to 13 April 2020 are shown in Table 6. The forecasts produced by the Naïve Model are close to the actual data, with predictive accuracy ranging from 32.89 percent to 86.21 percent. Therefore, this model is suitable for forecasting the new daily confirmed cases of COVID-19 in Japan over a short-term period.

CONCLUSION AND RECOMMENDATIONS
This study was conducted to produce a short-term forecast of new daily confirmed cases of COVID-19 in Japan based on data from January 22, 2020 to April 10, 2020. We fitted and compared four univariate time series models: the Naïve Model, the Mean Model, the ARIMA Model and the Exponential State Space Model.
For each model, five sets of data partitioning were used to ensure the accuracy of the forecast values. Five error measures (ME, RMSE, MAE, MAPE, and MASE) were calculated to determine model performance: the lower the value, the better the forecasting model.
The Naïve Model is reliable for forecasting future new daily confirmed cases because of its high forecast accuracy. However, the forecast accuracy could change over time whenever there are additions to or interruptions in the data. This implies that the model is sensitive to fluctuations in the new daily confirmed cases of COVID-19. Therefore, more information is needed to predict COVID-19 cases over a long period and to increase the model's accuracy (Shaharudin et al., 2021). This study is valid only for the COVID-19 data set for Japan over the short period considered (January 22, 2020 to April 10, 2020). Future studies should use a long-term data set (a larger sample size) and partition the data using a cross-validation technique to compare the performance and accuracy of the forecast models.
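One way the recommended cross-validation could be carried out for time series is rolling-origin evaluation, sketched below for the Naïve Model. This is an assumption about how future work might proceed, not a procedure used in this study; the Python function name, the minimum training length and the sample series are hypothetical.

```python
def rolling_origin_mae(series, min_train, horizon=1):
    """Mean absolute error of the naive forecast over rolling forecast origins.

    At each origin the model sees series[:origin], forecasts `horizon` steps
    ahead with the last observed value, and is scored on the matching actual.
    """
    if min_train < 1:
        raise ValueError("min_train must be at least 1")
    abs_errors = []
    for origin in range(min_train, len(series) - horizon + 1):
        train = series[:origin]
        forecast = train[-1]                   # naive forecast
        actual = series[origin + horizon - 1]  # value `horizon` steps ahead
        abs_errors.append(abs(actual - forecast))
    if not abs_errors:
        raise ValueError("series too short for the requested settings")
    return sum(abs_errors) / len(abs_errors)


# Hypothetical daily case counts (not the study's data).
series = [10, 12, 11, 15, 16, 20]
print(rolling_origin_mae(series, min_train=3))             # 3.0
print(rolling_origin_mae(series, min_train=3, horizon=2))  # 5.0
```

Unlike a single estimation and evaluation split, every origin contributes an out-of-sample error, so the comparison between models depends less on where one particular cut happens to fall.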