Comparison of Fuzzy Time Series and ARIMA to Forecast Tourist Arrivals to Homestay in Pahang

Predictions of future events must be incorporated into the decision-making process. In tourism demand, forecasting is very important because it helps directors and investors make operational, tactical, and strategic decisions. This study compares the forecasting performance of Fuzzy Time Series and the Autoregressive Integrated Moving Average (ARIMA) model for tourist arrivals at homestays in Pahang. Its main objective is to identify the better of the two methods using secondary data on tourist arrivals at homestays in Pahang from January 2015 to December 2018. ARIMA models are flexible and widely used in time-series analysis, whereas Fuzzy Time Series does not require large samples or long historical series. The two methods were compared using the mean square error (MSE) and the mean absolute percentage error (MAPE) as measures of forecast accuracy. The results show that Fuzzy Time Series outperforms ARIMA, giving the lowest MSE and MAPE values of 2192305.89 and 11.92256, respectively.


INTRODUCTION
Tourism is defined by the World Tourism Organization (UNWTO) as the activity of people travelling to and staying in places outside their usual environment, for a particular purpose, for no more than one year. People travel for different reasons, including leisure, recreation, holidays, business, visiting relatives, and medical purposes. Tourism is more than people's general idea of it; it is closely connected to many other sectors, such as politics, economics, religion, agriculture, the environment, health, finance, transportation, society, immigration, and education. In fact, the tourism industry has been acknowledged as one of the world's most significant service industries (Bhuiyan et al., 2013). It plays a key role in economic growth as an important earner of foreign exchange, and it will continue to do so for many countries across the globe (Salman & Hasim, 2012).
Besides hotels and resorts, homestays are one of the important accommodation services in the tourism industry, allowing tourists to experience the country's rich culture. Because homestays have the potential to attract both domestic and international tourists, the Ministry of Tourism Malaysia has made the Homestay Program a top priority. Homestays can greatly help promote local attractions to international tourists because they focus on the local lifestyle, culture, and economic activities. Homestays also offer opportunities for cross-cultural exchange between hosts and guests, and cultural differences can potentially affect the selection of destination attributes (Fathilah et al., 2011).
Gunter & Onder (2014) compared the accuracy of several univariate and multivariate models in forecasting international city tourist arrivals in Paris from its most important foreign source markets, namely Germany, Italy, Japan, the United Kingdom, and the United States. The methods were EC-ADLM, classical and Bayesian VAR, TVP, ARMA, ETS, and I model. The root mean squared forecast error (RMSFE) and the mean absolute error (MAE) were used to assess accuracy, and the univariate ARMA(1,1) and ETS models were found to be the most accurate for the US and UK source markets. Claveria & Torra (2013) identified the best method for forecasting tourist arrivals in Catalonia by comparing an artificial neural network (ANN) with two time series models, ARIMA and self-exciting threshold autoregression (SETAR). They used statistical data from 2001 to 2009 on tourist arrivals and overnight stays from different countries to Catalonia. Comparing the RMSFE, they found that ARIMA was more accurate than ANN and SETAR for forecasting tourist arrivals, because it showed a significantly lower forecasting error for most countries, whereas SETAR and ANN showed significantly lower errors for six and two countries, respectively.
Kumar & Sharma (2016) used monthly data from 2003 to 2013 to forecast tourist arrivals in Singapore, applying SARIMA, ARIMA, and Holt-Winters models to forecast the tourist inflow. For SARIMA, the series was initially non-stationary and seasonal, and it was decomposed into trend, seasonal, and irregular components. With a mean absolute percentage error (MAPE) of 3.21, SARIMA was concluded to be the most accurate model compared with ARIMA and Holt-Winters. The highest estimated seasonal factor was for July and the lowest for January, and tourist arrivals in Singapore showed an increasing trend.
Loganathan et al. (2010) studied tourism demand modelling in Malaysia with the purpose of forecasting international arrivals one period ahead, using quarterly data from 1995 to 2008 to forecast 2009. ARIMA models were used to generate forecasts of international tourist arrivals. A formal stationarity test showed that the series is stationary, after which the autoregressive and moving average terms were identified. The study also forecast future tourist arrivals using the ARMA model, which combines the AR and MA components. Finally, the study concluded that the ARIMA(1,0,1) model cannot be used to predict tourism demand because tourism demand is not affected by seasonality.
Muainuddin & Hasan (2018) forecast the number of domestic tourists using homestays in Pahang, based on data from January 2009 to December 2016. They applied the single exponential smoothing technique, Box-Jenkins (ARIMA), and Holt's method, and concluded that the single exponential and Holt's methods were more suitable than the Box-Jenkins model, with the single exponential method being the best based on MAPE. For future studies, the researchers recommended using more forecasting methods.
Li et al. (2016) forecast the total number of tourists visiting the Xi'an Museum, using monthly tourist arrivals from 2011 to 2014 as secondary data. They compared the existing Fuzzy Time Series model, the traditional Grey Model, and a time series model (ARMA(2,1)) with their proposed hybrid Fuzzy Time Series model based on entropy and Markov chain optimization, and concluded that the entropy-based method is the most suitable for forecasting tourist arrivals to the Xi'an Museum.
Chou et al. (2010) proposed an Improved Fuzzy Time Series Model for forecasting tourism demand and verified it by comparing forecasting accuracy on tourist datasets. The results showed that the improved model outperformed the compared approaches, with a lower mean absolute percentage error.
In addition, Lee et al. (2012)  Lastly, Sarahintu & Tarmudi (2015) studied the application of the Fuzzy Time Series method to predict tourist arrivals to Sabah. The steps of the method were defining the fuzzy sets based on the universe of discourse, fuzzification, establishing fuzzy logical relationship groups, defuzzification, and computing the forecasts. Forecasting accuracy was determined from the average forecasting errors, and the study concluded that Fuzzy Time Series is a suitable method for forecasting the number of tourist arrivals.
This study considers tourist arrivals at homestays in Pahang. Ultimately, the comparison presented here of two forecasting methods, Fuzzy Time Series and ARIMA, may help people make more accurate tourism forecasts and support planning for various tourism activities.

METHODOLOGY
This study compares Fuzzy Time Series and ARIMA for forecasting. The data for this study are the numbers of domestic tourists who used homestays in Pahang, Malaysia. The selected forecasting methods are described below.

Fuzzy Time Series
Fuzzy Time Series has been used in previous research to forecast real-world situations, and the concept is a popular choice for research and applications in the social sciences (Chou, 2018). The data used for Fuzzy Time Series were organised into two variables, Date and Tarrivals (Table 1).
Step 1: Using Microsoft Excel, the monthly data were converted into percentage changes with Equation 1, where $y_t$ is the actual number of domestic tourist arrivals at time $t$ and $y_{t-1}$ is the observed number of domestic tourist arrivals at time $t-1$:
$$p_t = \frac{y_t - y_{t-1}}{y_{t-1}} \times 100\% \qquad (1)$$
Step 2: The maximum and minimum values of the percentage changes, $D_{\max}$ and $D_{\min}$, are identified. The universe of discourse $U$ is then defined by Equation 2, where $D_1$ and $D_2$ are positive numbers that need to be assigned:
$$U = [D_{\min} - D_1,\; D_{\max} + D_2] \qquad (2)$$
Step 3: $U$ is partitioned into seven intervals $u_i$ ($i = 1, 2, \ldots, 7$) of equal length
$$l = \frac{(D_{\max} + D_2) - (D_{\min} - D_1)}{7}$$
The bounds of the intervals were obtained by successively adding the interval length $l$ (the fuzzification of intervals), and the frequency distribution of the percentage changes over the intervals was generated using Microsoft Excel.
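Steps 1 to 3 can be sketched in Python as follows. This is a minimal illustration, not the authors' Excel workbook: the function names are hypothetical, $D_1$ and $D_2$ are supplied by the user, and treating each interval as half-open (with the last interval closed on the right) is our own assumption.

```python
from typing import List, Tuple


def percentage_changes(y: List[float]) -> List[float]:
    """Equation 1: month-on-month percentage change (the first month has none)."""
    return [(y[t] - y[t - 1]) / y[t - 1] * 100.0 for t in range(1, len(y))]


def universe_of_discourse(changes: List[float], d1: float, d2: float) -> Tuple[float, float]:
    """Equation 2: U = [D_min - D1, D_max + D2]; D1 and D2 are positive and user-chosen."""
    return min(changes) - d1, max(changes) + d2


def equal_intervals(lower: float, upper: float, k: int = 7) -> List[Tuple[float, float]]:
    """Split U into k equal-length intervals by repeatedly adding the interval length."""
    length = (upper - lower) / k
    return [(lower + i * length, lower + (i + 1) * length) for i in range(k)]


def frequencies(changes: List[float], intervals: List[Tuple[float, float]]) -> List[int]:
    """Count how many percentage changes fall into each interval."""
    counts = [0] * len(intervals)
    last = len(intervals) - 1
    for c in changes:
        for i, (lo, hi) in enumerate(intervals):
            # Intervals are half-open [lo, hi); the last one also includes its upper bound.
            if lo <= c < hi or (i == last and c == hi):
                counts[i] += 1
                break
    return counts


# Hypothetical monthly arrivals, for illustration only.
changes = percentage_changes([100, 110, 99, 120])
low, high = universe_of_discourse(changes, d1=5, d2=5)
intervals = equal_intervals(low, high)
print(changes, frequencies(changes, intervals))
```

Because $D_1, D_2 > 0$, every observed percentage change lies strictly inside $U$, so each change is counted in exactly one interval.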
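Step 4: Based on Step 2, the intervals $v_1, v_2, v_3, \ldots, v_n$ were generated. Each fuzzy set is expressed in trapezoidal form as
$$A_i = \frac{b_1}{v_1} + \frac{b_2}{v_2} + \cdots + \frac{b_n}{v_n}$$
where $b_k \in [0, 1]$ is the membership value of interval $v_k$ in $A_i$ (in this notation, "/" pairs a membership value with an interval and "+" denotes union).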
Step 5: All the data are listed as percentage changes, and each value is classified into one of the intervals generated in Step 4. Based on this classification, the fuzzy logical relationships (FLRs) are generated, each written as
$$A_i \rightarrow A_j$$
where $A_i$ is the fuzzy set of the current (actual) data and $A_j$ is that of the upcoming data.
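Steps 4 and 5 can be sketched as below. The source does not list its membership values $b_k$, so the vector used here (1 on the fuzzy set's own interval, 0.5 on its neighbours, 0 elsewhere) is a hypothetical choice that is common in fuzzy time series work. With that choice, assigning a value to the set whose core interval contains it is the same as picking the set with the highest membership. All names are illustrative.

```python
from typing import List, Tuple


def membership_vector(i: int, n: int) -> List[float]:
    """Hypothetical b-values of A_i over v_1..v_n: 1 on v_i, 0.5 on neighbours, else 0."""
    return [1.0 if k == i else 0.5 if abs(k - i) == 1 else 0.0 for k in range(n)]


def fuzzify(value: float, intervals: List[Tuple[float, float]]) -> int:
    """Return the index i of the fuzzy set A_i whose core interval v_i contains value."""
    for i, (lo, hi) in enumerate(intervals):
        if lo <= value < hi:
            return i
    if value == intervals[-1][1]:  # the last interval is closed on the right
        return len(intervals) - 1
    raise ValueError(f"{value} lies outside the universe of discourse")


def fuzzy_logical_relationships(states: List[int]) -> List[Tuple[int, int]]:
    """Pair consecutive fuzzified values as A_i (time t-1) -> A_j (time t)."""
    return list(zip(states[:-1], states[1:]))


# Illustrative intervals and percentage changes (hypothetical values).
intervals = [(-15.0, -5.0), (-5.0, 5.0), (5.0, 15.0)]
states = [fuzzify(c, intervals) for c in [10.0, -10.0, 2.0, 12.0]]
print(states, fuzzy_logical_relationships(states))  # [2, 0, 1, 2] [(2, 0), (0, 1), (1, 2)]
```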
Step 6: The fuzzy logical relationship rules are created from the fuzzy logical relations in Step 5 by arranging all relations that share the same left-hand side into groups, such as
$$A_i \rightarrow A_{j_1}, A_{j_2}, \ldots, A_{j_k}$$
Step 7: Every fuzzy logical relationship group must be classified into one of three different types of rules.
The forecast value is then calculated according to the rule type of the corresponding group.
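The paper does not reproduce its forecasting formula, so the sketch below substitutes the well-known midpoint rules of Chen's fuzzy time series method as a stand-in for Steps 6 and 7; the rule numbering and all names are our own assumptions. A group with no successors forecasts the midpoint of its own interval, a group with one successor forecasts that successor's midpoint, and a group with several successors forecasts the average of their midpoints. To forecast month $t$, `current` is the fuzzified state of month $t-1$.

```python
from typing import Dict, List, Tuple


def build_groups(states: List[int]) -> Dict[int, List[int]]:
    """Group FLRs A_i -> A_j by their left-hand side A_i (repeated relations counted once)."""
    groups: Dict[int, List[int]] = {}
    for left, right in zip(states[:-1], states[1:]):
        rhs = groups.setdefault(left, [])
        if right not in rhs:
            rhs.append(right)
    return groups


def forecast_change(current: int, groups: Dict[int, List[int]],
                    midpoints: List[float]) -> Tuple[float, int]:
    """Forecast the next percentage change from state A_current.

    Returns (forecast, rule_type) using Chen-style midpoint rules.
    """
    rhs = groups.get(current, [])
    if not rhs:  # Type 1: no relation starts from A_current, use its own midpoint
        return midpoints[current], 1
    if len(rhs) == 1:  # Type 2: A_current -> A_k, use the midpoint of v_k
        return midpoints[rhs[0]], 2
    # Type 3: A_current -> A_k1, ..., A_kp, average the midpoints of their intervals
    return sum(midpoints[k] for k in rhs) / len(rhs), 3


states = [0, 1, 0, 2, 1]            # hypothetical fuzzified history
midpoints = [-5.0, 0.0, 5.0, 10.0]  # hypothetical interval midpoints
groups = build_groups(states)
print(groups, forecast_change(states[-1], groups, midpoints))  # {0: [1, 2], 1: [0], 2: [1]} (-5.0, 2)
```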

Autoregressive Integrated Moving Average (ARIMA)
The Box-Jenkins approach is synonymous with general ARIMA modelling; its extension to seasonal data is SARIMA. The autoregressive (AR) terms are lags of the differenced series, the moving average (MA) terms are lags of the errors, and the integrated (I) order is the number of differences used to make the time series stationary. The ARIMA procedure is as follows.
Step 1: Because R is used as the platform for finding the ARIMA model, the data are first imported from Excel and then converted into time series format.
Step 2: To identify the ARIMA model, 70% of the Tarrival data were used for estimation and the remaining 30% for evaluation. Since the data consist of 48 observations, 34 were used for estimation and the remaining 14 for evaluation.
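The 70/30 split can be sketched as a chronological cut with no shuffling, since the evaluation months must come after the estimation months. Rounding $0.7 \times 48 = 33.6$ to the nearest integer reproduces the 34/14 split in the text; the rounding rule and the function name are our assumptions.

```python
from typing import List, Sequence, Tuple


def chronological_split(series: Sequence[float],
                        train_fraction: float = 0.7) -> Tuple[List[float], List[float]]:
    """Split a time series into estimation and evaluation parts, preserving time order."""
    n_train = int(round(len(series) * train_fraction))
    return list(series[:n_train]), list(series[n_train:])


estimation, evaluation = chronological_split(list(range(48)))
print(len(estimation), len(evaluation))  # 34 14
```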
Step 3: The autocorrelation function (ACF) and partial autocorrelation function (PACF) are plotted to gather more conclusive evidence on whether the series is stationary.
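The values behind the ACF and PACF plots can be computed as sketched below, a minimal pure-Python version rather than what R does internally. The PACF uses the Durbin-Levinson recursion on the sample autocorrelations, and bars outside roughly $\pm 1.96/\sqrt{n}$ are conventionally read as significant. A slowly decaying ACF is the usual visual sign of non-stationarity.

```python
import math
from typing import List


def acf(x: List[float], max_lag: int) -> List[float]:
    """Sample autocorrelations r_1..r_max_lag (lag 0 omitted)."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    c0 = sum(d * d for d in dev)
    return [sum(dev[t] * dev[t - k] for t in range(k, n)) / c0
            for k in range(1, max_lag + 1)]


def pacf(x: List[float], max_lag: int) -> List[float]:
    """Sample partial autocorrelations via the Durbin-Levinson recursion."""
    r = acf(x, max_lag)  # r[k - 1] holds the lag-k autocorrelation
    result: List[float] = []
    phi: List[float] = []  # AR coefficients of the previous order
    for k in range(1, max_lag + 1):
        if k == 1:
            phi_kk = r[0]
            new_phi = [phi_kk]
        else:
            num = r[k - 1] - sum(phi[j] * r[k - 2 - j] for j in range(k - 1))
            den = 1.0 - sum(phi[j] * r[j] for j in range(k - 1))
            phi_kk = num / den
            new_phi = [phi[j] - phi_kk * phi[k - 2 - j] for j in range(k - 1)] + [phi_kk]
        result.append(phi_kk)
        phi = new_phi
    return result


def significance_bound(n: int) -> float:
    """Approximate 95% band (+/-) for white-noise autocorrelations."""
    return 1.96 / math.sqrt(n)
```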
Step 4: The Augmented Dickey-Fuller (ADF) and KPSS tests are then used to check stationarity. The null hypothesis of the KPSS test is stationarity, so the series is considered stationary if the KPSS p-value is more than 5%. The null hypothesis of the ADF test is a unit root, so the series is considered not stationary if the ADF p-value is more than 5%.
Step 5: If the ACF and PACF correlograms and the results of the ADF and KPSS tests show that the series is not stationary, first differencing is performed. The ACF and PACF correlograms of the differenced series are plotted, and the ADF and KPSS tests are repeated. Once the series is stationary, model identification can proceed.
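Steps 4 and 5 can be sketched as a differencing helper plus the decision rule for the two tests. The p-values themselves come from the tests run in R (for example, the tseries package provides adf.test and kpss.test) and are passed in here as inputs. How to proceed when the two tests disagree is not specified in the text, so treating the series as stationary only when both tests agree is our assumption.

```python
from typing import List


def difference(x: List[float], d: int = 1) -> List[float]:
    """Apply d rounds of first differencing: x_t - x_{t-1}."""
    for _ in range(d):
        x = [x[t] - x[t - 1] for t in range(1, len(x))]
    return x


def looks_stationary(adf_p: float, kpss_p: float, alpha: float = 0.05) -> bool:
    """ADF (H0: unit root) must reject, and KPSS (H0: stationarity) must not reject."""
    return adf_p < alpha and kpss_p > alpha


print(difference([1, 4, 9, 16]))  # [3, 5, 7]
```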
Step 6: The equation of the ARIMA model is identified using the estimation data, and the error measures of the model are determined using the evaluation data. Finally, the model is checked for mis-specification using the Box-Pierce and Ljung-Box Q-statistics.
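The two portmanteau statistics in Step 6 can be sketched on the model residuals $e_t$ as $Q = n\sum_{k=1}^{h} r_k^2$ (Box-Pierce) and $Q^* = n(n+2)\sum_{k=1}^{h} r_k^2/(n-k)$ (Ljung-Box), where $r_k$ is the lag-$k$ residual autocorrelation. Each is compared with a chi-square distribution with $h - p - q$ degrees of freedom; a large value suggests leftover autocorrelation, i.e. a mis-specified model. In R, the stats function Box.test offers both types; the pure-Python version below is only an illustration.

```python
from typing import List


def _autocorrelations(e: List[float], max_lag: int) -> List[float]:
    """Sample autocorrelations r_1..r_max_lag of the residuals."""
    n = len(e)
    mean = sum(e) / n
    dev = [v - mean for v in e]
    c0 = sum(d * d for d in dev)
    return [sum(dev[t] * dev[t - k] for t in range(k, n)) / c0
            for k in range(1, max_lag + 1)]


def box_pierce(e: List[float], h: int) -> float:
    """Q = n * sum_{k=1}^{h} r_k^2."""
    r = _autocorrelations(e, h)
    return len(e) * sum(rk ** 2 for rk in r)


def ljung_box(e: List[float], h: int) -> float:
    """Q* = n (n + 2) * sum_{k=1}^{h} r_k^2 / (n - k)."""
    n = len(e)
    r = _autocorrelations(e, h)
    return n * (n + 2) * sum(rk ** 2 / (n - k) for k, rk in enumerate(r, start=1))
```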

Mean Square Error (MSE)
The forecasting performance of the models can be compared using the mean square error (MSE). Because the errors are squared, MSE penalises large errors heavily, and it is easy to calculate and understand. In this study, MSE is calculated to measure the error and to determine which method gives the lowest error. It is given by
$$\text{MSE} = \frac{1}{n}\sum_{t=1}^{n} \left(y_t - \hat{y}_t\right)^2$$
where $y_t$ is the actual observed number of tourist arrivals at time $t$, $\hat{y}_t$ is the forecast value, and $n$ is the number of forecasts.
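The MSE formula can be sketched as below. Months without a forecast (such as January and February 2015 for Fuzzy Time Series) are passed as None and skipped; the text does not say how the authors handled those months, so this is an assumption.

```python
from typing import Optional, Sequence


def mse(actual: Sequence[float], forecast: Sequence[Optional[float]]) -> float:
    """Mean square error over the months that have a forecast (None entries are skipped)."""
    pairs = [(y, f) for y, f in zip(actual, forecast) if f is not None]
    return sum((y - f) ** 2 for y, f in pairs) / len(pairs)


print(mse([100, 200], [110, 190]))  # 100.0
```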

Mean Absolute Percentage Error (MAPE)
The mean absolute percentage error (MAPE) is a popular unit-free measure of the prediction accuracy of a forecasting method (Armstrong and Collopy, 1992). MAPE expresses accuracy as a percentage and can be written as
$$\text{MAPE} = \frac{1}{n}\sum_{t=1}^{n} \left|\frac{e_t}{y_t}\right| \times 100\%, \qquad e_t = y_t - \hat{y}_t$$
where $e_t$ is the error, $y_t$ is the actual value, and $\hat{y}_t$ is the forecast value.
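A matching sketch of MAPE follows, with the same assumption that months without a forecast are passed as None and skipped. Because each error is divided by $y_t$, the measure is undefined for months with zero actual arrivals.

```python
from typing import Optional, Sequence


def mape(actual: Sequence[float], forecast: Sequence[Optional[float]]) -> float:
    """Mean absolute percentage error in percent; actual values must be non-zero."""
    pairs = [(y, f) for y, f in zip(actual, forecast) if f is not None]
    return 100.0 * sum(abs((y - f) / y) for y, f in pairs) / len(pairs)


print(mape([100, 200], [110, 190]))  # about 7.5 (10% and 5% errors averaged)
```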

Data Collection
The data for this study are the numbers of domestic tourists who used homestays in Pahang, Malaysia.
The monthly data were obtained from Tourism Pahang Malaysia and cover the four years from January 2015 to December 2018.

Data Analysis
Both Fuzzy Time Series and ARIMA can be applied to forecast tourism demand as well as other types of data. In this study, Fuzzy Time Series was implemented in Microsoft Excel and ARIMA in R (RStudio), and the two methods were compared using two measures of accuracy: mean square error (MSE) and mean absolute percentage error (MAPE).

FINDINGS AND DISCUSSIONS
Microsoft Excel and R were used to analyse and forecast the collected tourist arrival data with Fuzzy Time Series and ARIMA, respectively. The best model is selected as the one with the smallest error measures.

Fuzzy Time Series
The forecast percentage change in tourist arrivals for both July 2016 and November 2018 was 22.06%. The same calculation was carried out for every fuzzy logical relationship according to its rule type. Each forecast percentage change was then converted into a forecast number of tourist arrivals at homestays in Pahang. No forecasts could be calculated for January 2015 and February 2015 because there was not enough input data. Figure 1 shows the actual and forecast numbers of tourist arrivals at homestays in Pahang using Fuzzy Time Series from January 2015 to December 2018.
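The conversion from a forecast percentage change back to a number of arrivals can be sketched by inverting Equation 1, which gives $\hat{y}_t = y_{t-1}(1 + \hat{p}_t/100)$. Using the actual previous-month count as the base is our reading of the method, not something the text states. The first month has no percentage change at all, and the second has no earlier fuzzy state to forecast from, which is why January and February 2015 have no forecast. The counts below are hypothetical.

```python
from typing import List, Optional


def changes_to_arrivals(actual: List[float],
                        forecast_changes: List[Optional[float]]) -> List[Optional[float]]:
    """Turn forecast percentage changes into forecast arrival counts.

    forecast_changes[t] is the forecast change (in %) for month t, or None when no
    forecast exists; the count is y[t-1] * (1 + change / 100).
    """
    result: List[Optional[float]] = []
    for t, change in enumerate(forecast_changes):
        if t == 0 or change is None:
            result.append(None)
        else:
            result.append(actual[t - 1] * (1 + change / 100.0))
    return result


# Hypothetical counts; months 1 and 2 have no forecast, as for January and February 2015.
print(changes_to_arrivals([1000, 1200, 1100], [None, None, 22.06]))
```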

ARIMA
Figure 2 shows the time series plot of tourist arrivals at homestays in Pahang, produced in R. The data set of tourist arrivals at homestays in Pahang was created in Microsoft Excel and imported into R under the name DataArrival; the command View(DataArrival) displays the tourist arrival data from January 2015 to December 2018. Based on the ADF and KPSS tests, the series required differencing, and the best model was ARIMA(1,1,0).
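An ARIMA(1,1,0) model without drift, $(1 - \phi B)(1 - B)y_t = \varepsilon_t$, can be sketched by regressing each first difference on the previous one (conditional least squares) and then forecasting the differences as $\phi^h$ times the last observed difference. This is an illustration only: R's arima() estimates by maximum likelihood by default, so its $\phi$ will generally differ somewhat, and the function names are hypothetical.

```python
from typing import List


def fit_arima_110(y: List[float]) -> float:
    """Conditional least-squares estimate of phi for ARIMA(1,1,0) with no drift."""
    dy = [y[t] - y[t - 1] for t in range(1, len(y))]
    num = sum(dy[t] * dy[t - 1] for t in range(1, len(dy)))
    den = sum(dy[t - 1] ** 2 for t in range(1, len(dy)))
    return num / den


def forecast_arima_110(y: List[float], phi: float, steps: int) -> List[float]:
    """Multi-step forecasts: each difference is phi times the previous one, then summed."""
    level = y[-1]
    diff = y[-1] - y[-2]
    out: List[float] = []
    for _ in range(steps):
        diff = phi * diff
        level = level + diff
        out.append(level)
    return out


phi = fit_arima_110([10, 12, 13, 13.5])  # hypothetical series
print(phi, forecast_arima_110([10, 12, 13, 13.5], phi, 2))  # 0.5 [13.75, 13.875]
```

In the study's setting, the model would be fitted to the 34 estimation months and its forecasts for the 14 evaluation months scored with MSE and MAPE; whether the authors used multi-step or rolling one-step forecasts is not stated.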

Comparisons between Fuzzy Time Series and ARIMA
The visitor arrival predictions in the testing period were produced with the two forecasting methods. To compare the prediction performance of the two approaches for the period January 2015 to December 2018, two measures of accuracy were calculated: mean square error (MSE) and mean absolute percentage error (MAPE). Fuzzy Time Series forecasts this series more accurately than ARIMA because it yields the lower MSE and MAPE. Table 2 compares the two error measures for both methods.

CONCLUSION AND RECOMMENDATIONS
This paper investigates Fuzzy Time Series and ARIMA methods for predicting visitor arrivals at homestays in Pahang. To find the best method for forecasting the number of tourist arrivals at homestays in Pahang, this study compared the performance of two methods: Fuzzy Time Series and the Autoregressive Integrated Moving Average (ARIMA). For both methods, the actual and forecast data were analysed to identify the differences between them, and the best method was determined from the error measures of each. Fuzzy Time Series is good for predicting visitor arrivals, as it gives the smaller MSE and MAPE values of 2192305.89 and 11.92256, respectively. Thus, the main objective of this research, which is to compare the two methods, is achieved.
Many other methods can be applied to forecast tourist arrivals at homestays in Pahang, such as Holt-Winters, Artificial Neural Networks, Neural Network Autoregression, and SARIMA. Future researchers can use these methods to compare two or more approaches for forecasting tourist arrivals at homestays in Pahang and determine which prediction method is the best. In addition, researchers who wish to forecast using Fuzzy Time Series are recommended to compare the accuracy of the classical Fuzzy Time Series method with that of an Improved Fuzzy Time Series method. Lastly, researchers interested in tourist arrivals could study the number of tourist arrivals in Pahang, the number of domestic tourist arrivals in Pahang, or the effect of tourist arrivals on its expenditure.