Comparison of Fuzzy Time Series and ARIMA Model for Predicting Stock Prices

___


INTRODUCTION
The term stock price refers to the current price that a share of stock is trading for on the market. When shares of a publicly traded firm are issued, their value is assigned at a price that, ideally, represents the worth of the company itself. The intrinsic value of the stock could increase or decrease. Finding equities that are currently undervalued is the aim of most stock investors.
The stock market is highly volatile and difficult to predict: share prices can rise and fall for reasons that sometimes cannot be explained. Because of this unpredictability, many investors regard the stock market as a risky prospect. Many factors can affect stock prices, such as supply and demand, interest rates, exchange rate fluctuations, political upheaval, natural calamities and many more (Csiszar, 2020). All these factors can affect investors' yields. However, investors who study the market in detail and understand it clearly can decide the best time to buy or sell stock and earn a good return. Timing the market well is the key to successful investing. Researchers have introduced various models to anticipate stock prices, such as the Autoregressive Integrated Moving Average (ARIMA), Residual Income Model (RIM), Integrated Artificial Neural Network (ANN), Long Short-Term Memory (LSTM), and Fuzzy Time Series (FTS). Each of these mathematical models has its own set of benefits and drawbacks. This study aims to compare the performance of the ARIMA model and the FTS model in predicting the stock prices of Top Glove Corporation Bhd.

Previous Research on ARIMA model and FTS model
Time series analysis is a specific way of analyzing a sequence of data points collected over an interval of time. Data points are recorded at consistent intervals over a set period of time. Time series analysis is important in various fields such as economics, social science, epidemiology, medicine and many more. It can be used for forecasting, which is predicting future data or likelihood of future events based on historical data.
In a study conducted by Alzahrani, Aljamaan, and Al-Fakih (2020), four time series models, namely Autoregressive (AR), Moving Average (MA), Autoregressive Moving Average (ARMA), and ARIMA, were applied to find the best-fitting model. The main purpose of the study was to observe and forecast the spread of Covid-19 in Saudi Arabia using historical data on daily cases. The study concluded that ARIMA was the most suitable model for prediction, as shown by the root mean square error (RMSE), root mean squared relative error (RMSRE), mean absolute percentage error (MAPE), mean absolute error (MAE), and coefficient of determination (R²). The ARIMA model was also utilised by Sahai, Rath, Sood, and Singh (2020) to forecast Covid-19 cases in five nations: Brazil, Russia, the United States, Spain, and India. The results showed that the ARIMA models achieved good accuracy for Covid-19 cases in those five countries and helped their governments prepare better strategies for managing the pandemic during the critical period.
According to Phan and Nguyen (2020), time series models such as ARIMA are most efficient for linear time series forecasting but less efficient for non-linear series. The authors therefore proposed combining two methods, ARIMA and machine learning (ML), to build a water level forecasting model. The results showed that the hybrid model provides better accuracy than forecasting with a single model. In conclusion, hybridizing a linear and a non-linear time series model produces robust forecasting results.
Specifically for predicting stock prices, Kumar Meher et al. (2021) used the ARIMA model to anticipate the share prices of pharmaceutical companies in India. Their findings revealed that the reliability of the forecasts varied from company to company. For example, the agreement between the actual and anticipated share prices appeared more dependable for Sun Pharmaceutical than for Dr Reddy Laboratories. Furthermore, they found that the ARIMA model might be more trustworthy, with higher R-squared and adjusted R-squared values, if it is constructed with fewer periods.
Another approach to time series analysis is Fuzzy Time Series (FTS). According to Zadeh (1965), who first introduced fuzzy set theory, the theory offers a wide range of scientific applications. The FTS approach, based on fuzzy set theory, was introduced as an alternative to traditional time series models. Zhang et al. (2010) stated that FTS can solve forecasting problems by combining people's subjective attitudes with objective historical values. Their study applied FTS to short-term crude oil price prediction. They examined West Texas Intermediate oil and used the root mean square error to assess the performance of their method, and their findings show that FTS can produce good forecast results.
Jilani & Burney (2008) studied a basic time-variant fuzzy time series forecasting algorithm. The suggested method employs a heuristic approach to define frequency-density-based partitions of the universe of discourse. They developed a fuzzy metric to apply frequency-density-based partitioning, and the forecast is calculated using a trend predictor in the proposed fuzzy metric. They applied the method to forecast enrolments at the University of Alabama, and the results demonstrated that it is more accurate than other fuzzy time series methods. Lee & Suhartono (2012) proposed a new weighted fuzzy time series model, based on the exponential smoothing approach and graphical order selection, to increase forecast accuracy for seasonal data. Their research demonstrates how the graphical order fuzzy relationship may be used to quickly determine the best order for fuzzy time series.
FTS models have been used to forecast stock market prices because they can extract pertinent information from big data sets without relying on any model assumptions. They can also be used to predict individual stock prices, establish the trend of the stock market based on the anticipated open, high, low, and close values, and help traders decide whether to buy or sell a stock (Hwang & Oh, 2010). Building on the findings of these earlier studies, this study compares ARIMA and FTS in order to evaluate the trend movement of Top Glove Corporation Berhad, the largest glove supplier in the world and a key player in the Covid-19 global pandemic.

METHODOLOGY
This study analyzes the share prices of Top Glove Corporation Bhd from January 2017 to August 2021 using ARIMA and FTS. Top Glove Corporation Bhd is a healthcare company that manufactures and sells medical equipment and supplies. The company manufactures, researches, and trades gloves and rubber items, and also provides e-commerce services for healthcare supplies and glove trading. The data were gathered from Bursa Market Place, a website providing real-time stock quotes and market news in Malaysia.

Autoregressive Integrated Moving Average (ARIMA)
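ARIMA is one of the most widely used forecasting algorithms in time series analysis. The development of an ARIMA model is divided into three stages: model identification, model estimation, and model validation (Lazim, 2005). ARIMA has three inputs, p, d, and q, and is written in general form as ARIMA(p, d, q): p is the number of lag observations, d is the degree of differencing, and q is the size of the moving average. Differencing is required to make the data stationary if it is not already stationary; it removes the trend pattern from the actual data. The order of differencing indicates the number of times the data must be differenced before they become stationary. The mathematical formula is as follows:

$$\Delta^d y_t = (1 - B)^d y_t$$

where $\Delta$ is the difference operator and $d$ is the number of differences. The backward shift operator $B$ is a useful notation when differencing. In the following formula, the power of $B$ represents the number of backward steps:

$$B^k y_t = y_{t-k}$$

First-order differencing:

$$\Delta y_t = y_t - y_{t-1} = (1 - B) y_t$$

The Autoregressive (AR) model:

$$y_t = \mu + \phi_1 y_{t-1} + \phi_2 y_{t-2} + \dots + \phi_p y_{t-p} + \varepsilon_t$$

where $\mu$ and $\phi$ are constant parameters, $y_t$ is the current value, $y_{t-p}$ is the $p$th-order lagged value, and $\varepsilon_t$ is the error term.

By backward shift operator:

$$(1 - \phi_1 B - \phi_2 B^2 - \dots - \phi_p B^p) y_t = \mu + \varepsilon_t$$

The Moving Average (MA) model:

$$y_t = \mu + \varepsilon_t - \theta_1 \varepsilon_{t-1} - \theta_2 \varepsilon_{t-2} - \dots - \theta_q \varepsilon_{t-q}$$

where $\mu$ is the mean, $\theta$ is the moving average parameter, and $\varepsilon_t$ is the error term.

By backward shift operator:

$$y_t = \mu + (1 - \theta_1 B - \theta_2 B^2 - \dots - \theta_q B^q) \varepsilon_t$$

The general form of ARIMA(p, d, q), where d is the number of times the series needs to be differenced to achieve stationarity, is illustrated by ARIMA(1, 1, 1):

$$w_t = \mu + \phi_1 w_{t-1} - \theta_1 \varepsilon_{t-1} + \varepsilon_t$$

where $w_t = y_t - y_{t-1}$ is the first difference, or it can be written as

$$(1 - \phi_1 B)(1 - B) y_t = \mu + (1 - \theta_1 B) \varepsilon_t$$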
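The differencing operator and the ARIMA(1, 1, 1) recursion can be illustrated with a minimal Python sketch. The function names `difference` and `arima_111_step`, the symbols `mu`, `phi1` and `theta1`, and the sample prices are assumptions made for illustration; they are not taken from the study, and the parameters would normally be estimated from data.

```python
def difference(series, d=1):
    """Apply the differencing operator (1 - B) to a series d times."""
    out = list(series)
    for _ in range(d):
        out = [out[i] - out[i - 1] for i in range(1, len(out))]
    return out


def arima_111_step(y_prev, y_prev2, eps_prev, mu, phi1, theta1):
    """One-step ARIMA(1,1,1) forecast of y_t, with the new error set to 0.

    w_t = mu + phi1 * w_{t-1} - theta1 * eps_{t-1}, then y_t = y_{t-1} + w_t.
    """
    w_prev = y_prev - y_prev2  # previous first difference w_{t-1}
    w_hat = mu + phi1 * w_prev - theta1 * eps_prev
    return y_prev + w_hat


# Made-up prices, used only to show the calls.
prices = [5.0, 5.2, 5.1, 5.6, 6.0]
first_diff = difference(prices, d=1)
next_price = arima_111_step(prices[-1], prices[-2], eps_prev=0.0,
                            mu=0.0, phi1=0.5, theta1=0.0)
```

Setting `theta1=0` reduces the step to ARIMA(1, 1, 0), and setting `phi1=0` reduces it to ARIMA(0, 1, 1).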

Model Identification
The dataset is first split into estimation and evaluation parts. Afterward, the line charts of the autocorrelation function (ACF), and the partial autocorrelation function (PACF) are plotted to observe the stationary behaviour and identify at least five versions of the ARIMA general model that are most appropriate.
The patterns in the ACF and PACF suggest the form of the model:

ACF pattern | PACF pattern | Suggested model
Decay | Spikes | AR(p); p is the number of spikes in the PACF.
Spikes | Decay | MA(q); q is the number of spikes in the ACF.
Spikes | Spikes | ARMA(p, q); p and q are the number of spikes in the PACF and ACF.

The selected model should have a relatively high adjusted $R^2$, and its Q-statistics and correlogram should show no significant pattern in the autocorrelation functions (ACFs) and partial autocorrelation functions (PACFs) of the residuals, which means the residuals of the selected model are white noise.
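The identification step can be sketched in Python as follows. This is only an illustration: the function names are hypothetical, the PACF is computed with the Durbin-Levinson recursion, and the approximate 95% bound of 1.96/sqrt(n) for deciding which lags count as spikes is an assumption, since the study reads the spikes from the EViews correlogram.

```python
import math


def acf(series, max_lag):
    """Sample autocorrelations r_0 .. r_max_lag."""
    n = len(series)
    mean = sum(series) / n
    dev = [x - mean for x in series]
    c0 = sum(v * v for v in dev)
    return [sum(dev[t] * dev[t - k] for t in range(k, n)) / c0
            for k in range(max_lag + 1)]


def pacf(series, max_lag):
    """Sample partial autocorrelations via the Durbin-Levinson recursion."""
    r = acf(series, max_lag)
    out = [1.0]
    prev = []  # coefficients phi_{k-1,1} .. phi_{k-1,k-1}
    for k in range(1, max_lag + 1):
        num = r[k] - sum(prev[j] * r[k - 1 - j] for j in range(k - 1))
        den = 1.0 - sum(prev[j] * r[j + 1] for j in range(k - 1))
        phi_kk = num / den
        prev = [prev[j] - phi_kk * prev[k - 2 - j] for j in range(k - 1)] + [phi_kk]
        out.append(phi_kk)
    return out


def significant_lags(values, n):
    """Lags (excluding lag 0) whose value exceeds the approximate 95% bound."""
    bound = 1.96 / math.sqrt(n)
    return [k for k, v in enumerate(values) if k > 0 and abs(v) > bound]
```

Counting the significant lags of the PACF and ACF of the differenced series gives candidate values for p and q.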

Fuzzy Time Series (FTS) Using Cheng's Model Algorithm
Step 1: Define the universe of discourse and the rule abstraction intervals. The universe of discourse can be characterized as $U = [\text{minimum value}, \text{maximum value}]$. If the number of observations falling in a linguistic interval is greater than the average across all linguistic intervals, that interval must be split further to achieve high forecasting accuracy. Sturges' formula can be used to calculate the number of linguistic intervals $w$:

$$w = 1 + 3.322 \log_{10} n$$

while the formula below can be used to calculate the length $l$ of the linguistic intervals:

$$l = \frac{D_{\max} - D_{\min}}{w}$$

where $n$ is the total number of observations, and $D_{\max}$ and $D_{\min}$ are the maximum and minimum values of the data.
Step 2: Create an associated fuzzy set (linguistic value) for each observation in the training dataset. The fuzzy sets $A_1, A_2, \dots, A_k$ for the universe of discourse are defined in this stage on the intervals $u_1, u_2, \dots, u_m$ obtained with Sturges' formula:

$$A_i = \frac{a_{i1}}{u_1} + \frac{a_{i2}}{u_2} + \dots + \frac{a_{im}}{u_m}$$

where the value of $a_{ij}$ denotes the grade of membership of $u_j$ in fuzzy set $A_i$, with $a_{ij} \in [0,1]$, $1 \le i \le k$ and $1 \le j \le m$. This process determines the degree to which each stock price belongs to each $A_i$ ($i = 1, \dots, k$). If the stock price's maximum membership falls under $A_i$, then the fuzzified stock price is labelled $A_i$ (Chen, 1996).
Step 3: Establish the fuzzy logical relationships (FLRs) from the fuzzified series. Two consecutive fuzzy sets, $A_i(t-1)$ and $A_j(t)$, can be combined into a single FLR as $A_i \to A_j$.
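Step 4: Create a fuzzy logical relationship group (FLRG) for each FLR. An FLRG is formed by grouping the FLRs that share the same left-hand side (LHS). For example, $A_i \to A_{j_1}$, $A_i \to A_{j_2}$, $A_i \to A_{j_3}$ can be grouped as $A_i \to A_{j_1}, A_{j_2}, A_{j_3}$. A fluctuation-type matrix is then constructed from all FLRs.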
Step 5: Assign a weight to each item. The fluctuation-type matrix from Step 4 is further standardized to $W_n(t)$. The weight matrix $W(t) = [W'_1, W'_2, \dots, W'_k]$ is normalized with the standardized weight matrix equation:

$$W_n(t) = [W_1, W_2, \dots, W_k] = \left[ \frac{W'_1}{\sum_{h=1}^{k} W'_h}, \frac{W'_2}{\sum_{h=1}^{k} W'_h}, \dots, \frac{W'_k}{\sum_{h=1}^{k} W'_h} \right]$$

Step 6: Multiply the weight matrix, $W_n(t-1)$, by the defuzzified matrix, $L_{df}(t-1)$, to get the initial forecast value. The entries of the defuzzified matrix are the midpoints of the linguistic intervals, so $L_{df} = [m_1, m_2, \dots, m_k]$ is defined, where $m_i$ is the midpoint of linguistic interval $i$. The initial forecast is calculated as:

$$F(t) = L_{df}(t-1) \cdot W_n(t-1)$$

Step 7: Calculate the value of the adaptive forecast:

$$\text{adaptive forecast}(t) = P(t-1) + h \times (F(t) - P(t-1))$$

where $P(t-1)$ is the current stock index at time $t-1$, $F(t)$ is the initial forecast value from Step 6, $h$ is the weighting parameter, and adaptive forecast$(t)$ is the final forecast value for the future stock price at time $t$.
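Steps 1 to 7 can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the implementation used in the study: the function name `cheng_fts_forecast` is hypothetical, each price is labelled with the single interval it falls in, the further splitting of dense intervals in Step 1 is omitted, the weights are taken as simple FLR counts, and the forecasts are in-sample.

```python
import math


def cheng_fts_forecast(prices, h=0.2):
    """In-sample adaptive forecasts for prices[1:], one per observed transition."""
    n = len(prices)
    lo, hi = min(prices), max(prices)
    if hi == lo:
        raise ValueError("prices must not all be equal")

    # Step 1: Sturges' formula for the number of intervals, then equal lengths.
    w = max(1, round(1 + 3.322 * math.log10(n)))
    length = (hi - lo) / w
    midpoints = [lo + (j + 0.5) * length for j in range(w)]

    # Step 2: label each price with the interval (fuzzy set) it falls in.
    states = [min(int((p - lo) / length), w - 1) for p in prices]

    # Steps 3-4: count each FLR A_i -> A_j, grouped by left-hand side i.
    counts = [[0] * w for _ in range(w)]
    for i, j in zip(states, states[1:]):
        counts[i][j] += 1

    forecasts = []
    for t in range(1, n):
        row = counts[states[t - 1]]
        total = sum(row)  # always > 0 here: the state at t-1 has a successor
        # Step 5: standardize the weights of this FLRG so they sum to 1.
        weights = [c / total for c in row]
        # Step 6: initial forecast = standardized weights times midpoints.
        initial = sum(wj * m for wj, m in zip(weights, midpoints))
        # Step 7: adaptive forecast with weighting parameter h.
        forecasts.append(prices[t - 1] + h * (initial - prices[t - 1]))
    return forecasts
```

With `h=0` the forecast reduces to the previous price, and with `h=1` it equals the initial forecast from Step 6, so `h` controls how far the forecast moves from the last observed price.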

FINDINGS AND DISCUSSIONS

Data Analysis with ARIMA Model
Visual inspection of the time series plot reveals patterns such as trends, seasonal variations, and cyclical changes. Therefore, a transformation is required to make the time series data stationary. After first differencing, the time series data appear stationary at first glance. A formal technique, the Augmented Dickey-Fuller (ADF) test, is then used to verify the stationarity of the data. Table 2 shows the outcomes of this test, which was conducted using EViews at the 5% level of significance. The p-value of the ADF test statistic for the first-order difference is 0.0017, which is less than the significance level of 0.05, so the null hypothesis of a unit root is rejected. This indicates that the data become stationary after first differencing. A series is said to be stationary if it shows no growth or decline over time, that is, the data series has no trend component (Lazim, 2005).
Figure 1 shows the Autocorrelation Function (ACF) and the Partial Autocorrelation Function (PACF) of the data after first differencing. There is one spike in the ACF at lag 1; the value drops at lag 2 and decays to almost zero by lag 3. The PACF also has one spike at lag 1 and a decaying pattern similar to that of the ACF. These features suggest ARIMA(1,1,1) as an initial model. As this model might not be the best fit, other possible models are identified by considering the combinations of p and q closest to ARIMA(1,1,1). As a result, three further models are identified: ARIMA(0,1,0), ARIMA(0,1,1) and ARIMA(1,1,0). EViews is used again to calculate the non-seasonal AR and MA parameter estimates, and the results are shown in Table 3.
Based on the results in Table 3, the residuals of all models are white noise, meaning that there are no significant autocorrelation or partial autocorrelation coefficients (Lazim, 2005). ARIMA(1,1,0) is considered the best fit among the candidate models because it has the lowest values of AIC, BIC, Hannan-Quinn, and SE of regression. In addition, its R-squared is the highest among all the suggested models. Thus, ARIMA(1,1,0) is selected to predict the share price of Top Glove, as shown in Figure 2.
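The selection rule used here, choosing the model with the lowest information criteria, can be sketched in Python. The sketch assumes Gaussian errors and computes the criteria from a model's residuals; the function names, the model labels and the residual values are hypothetical and are not the study's results. Some packages report these criteria divided by the number of observations, which does not change the ranking when all models use the same sample.

```python
import math


def information_criteria(residuals, k):
    """Gaussian log-likelihood with AIC, BIC and Hannan-Quinn for k parameters."""
    n = len(residuals)
    ssr = sum(e * e for e in residuals)
    loglik = -0.5 * n * (1 + math.log(2 * math.pi) + math.log(ssr / n))
    return {
        "loglik": loglik,
        "aic": -2 * loglik + 2 * k,
        "bic": -2 * loglik + k * math.log(n),
        "hq": -2 * loglik + 2 * k * math.log(math.log(n)),
    }


def best_by(criterion, candidates):
    """Return the name of the candidate with the lowest value of a criterion."""
    return min(candidates, key=lambda name: candidates[name][criterion])


# Hypothetical residuals only; these are not the study's data.
candidates = {
    "model_a": information_criteria([0.5, -0.4, 0.3, -0.2, 0.1, -0.3], k=1),
    "model_b": information_criteria([0.6, -0.6, 0.5, -0.4, 0.3, -0.5], k=2),
}
chosen = best_by("aic", candidates)
```

Each criterion penalises the number of estimated parameters `k` differently, so agreement among AIC, BIC and Hannan-Quinn, as reported for ARIMA(1,1,0), makes the choice less sensitive to the penalty used.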

Data Analysis with FTS Model
The universe of discourse and the rule abstraction intervals for the data are shown in Table 4. The number of intervals formed determines the linguistic values used in the fuzzification stage, depending on the effective interval obtained. Table 5 shows the result of the fuzzification using Cheng's model, expressed as linguistic values. Next, the Fuzzy Logical Relationship Groups (FLRGs) are formed by grouping the fuzzy sets that share the same current state together with their next states, as shown in Table 6. The fuzzy time series forecasting process involves two stages: the first is finding the middle value for each period, and the second is calculating the predicted values. Table 7 shows the defuzzification result obtained using the FLRGs, from which the initial forecast value (Yt) is calculated. The actual data are compared with the forecasted share price in Figure 3. Cheng's method has a weighting value, known as the adaptive value, which ranges from 0 to 1. This study uses an adaptive value of 0.2 because it provides the best predictions.
The error measures of the two models are shown in Table 8. The error measures of the ARIMA model are lower than those of the FTS model for all three measurements. Consequently, the ARIMA model performs better than the Fuzzy Time Series model in anticipating the share price of Top Glove Corporation.
There are a variety of approaches for comparing and determining which method is best for forecasting.
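A comparison of this kind can be sketched in Python. The text does not name the three error measures, so RMSE, MAE and MAPE are assumed here because they are commonly used for this purpose; the function names are hypothetical, and MAPE requires that no actual value is zero.

```python
import math


def rmse(actual, predicted):
    """Root mean square error."""
    return math.sqrt(
        sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)
    )


def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mape(actual, predicted):
    """Mean absolute percentage error, in percent; actual values must be non-zero."""
    return 100.0 * sum(
        abs((a - p) / a) for a, p in zip(actual, predicted)
    ) / len(actual)


def compare(actual, forecasts_by_model):
    """Return a dict of error measures for each model's forecasts."""
    return {
        name: {"rmse": rmse(actual, f), "mae": mae(actual, f), "mape": mape(actual, f)}
        for name, f in forecasts_by_model.items()
    }
```

The model with the lower value on every measure is preferred; when the measures disagree, the choice depends on which kind of error matters more for the application.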

CONCLUSION AND RECOMMENDATIONS
To get more precise findings, various algorithms may be used to train the projected value. Therefore, future researchers can apply and evaluate other forecasting techniques, such as the Residual Income Model (RIM), Integrated Artificial Neural Network (ANN), Long Short-Term Memory (LSTM), and many more, on any sample data to obtain a comprehensive overview of the accuracy of all methods and to choose the most effective forecasting approach.