Fuzzy Time Series for Projecting School Enrolment in Malaysia

There are a variety of approaches to the problem of predicting educational enrolment. However, none of them can be used when the historical data are linguistic values. Fuzzy time series is an efficient and effective tool for dealing with such problems. In this paper, the enrolment of pre-primary, primary, secondary, and tertiary schools in Malaysia is forecast using a fuzzy time series approach. A fuzzy time series model is developed using a historical dataset collected from the United Nations Educational, Scientific, and Cultural Organization (UNESCO) for the years 1981 to 2018. A complete procedure is proposed, which includes fuzzifying the historical dataset, developing a fuzzy time series model, and calculating and interpreting the outputs. The accuracy of the model is examined using the Mean Squared Error (MSE), Mean Absolute Percent Error (MAPE) and Mean Absolute Deviation (MAD). The lower the value of an error measure, the higher the accuracy of the model. The results show that the fuzzy time series model developed for primary school enrolment is the most accurate, with the lowest error measures: an MSE of 0.38, a MAPE of 0.43 and a MAD of 0.43.


INTRODUCTION
It is crucial to forecast future school enrolment in Malaysia with reasonable accuracy because many decisions about the education system and resource planning depend on these forecasts. For this reason, many researchers have proposed various methods to predict school enrolments. Fuzzy time series is one of the popular approaches used by researchers in forecasting problems due to its high accuracy and effectiveness compared to other methods. The idea was formulated by Song and Chissom in 1993. It has many applications, such as forecasting wheat production (Stevenson & Porter, 2009), forecasting the rainfall distribution of a region (Dani & Sharma, 2013) and forecasting short-term electric load (Huang, 2015).
A higher order fuzzy time series forecasting model based on adaptive expectation and an artificial neural network (ANN) was suggested by Hakan et al. (2010). The forecasted values were adjusted through adaptive expectation and a feedforward ANN. Fuzzy relationships in the higher order fuzzy time series are defined easily using this approach. As a result, the forecasting precision increased significantly, as presented in their results. Chen et al. (2009) used fuzzy time series in projecting the enrolments of the University of Alabama. Automatic clustering techniques and fuzzy logical relationships were used to analyse the data. This method achieved a smaller MSE value and was found to be more accurate than the previous methods. Mahdzar et al. (2015) used fuzzy time series in predicting the number of tourist arrivals in Sabah. The study was conducted to assist the government in effectively planning strategies that would maintain and increase the production of tourism in Malaysia. P. Ramesh & K. Razak used a narrative time-invariant fuzzy time series for forecasting college admissions. They found that fuzzy time series has a significant impact on achieving better forecasting accuracy.
From the literature, we can see that researchers have used fuzzy time series in various fields. Therefore, this paper demonstrates the application of fuzzy time series in projecting the number of school enrolments in Malaysia.

METHODOLOGY
In this study, secondary data is taken from the United Nations Educational, Scientific, and Cultural Organization (UNESCO) institute. It consists of statistics of gross school enrolments at all stages of education in Malaysia from the year 1981 to 2018, that is, time-series data. The 35-year data from 1983 to 2018 is treated as the estimation period, which is then used to forecast the number of school enrolments for the preceding years. The data is chosen just for simulation purposes to demonstrate the accuracy of the methods applied.
According to Hakan et al. (2014), the fuzzy time series approach can deal with very small datasets and does not require the linearity assumption. This method makes the calculation straightforward (Dani & Sharma, 2013) and is easy to apply to many problems. There are seven steps to be followed, which can be summarized as follows.
Step 1: All data for pre-primary, primary, secondary, and tertiary schools for the years 1981 to 2018 is imported into MS Excel. There are 38 sets of data for each educational level, thus there are 152 sets of data in total. The data is analysed and converted into percentages using MS Excel based on Equation 1.1, and the results are presented in Table 1.
Step 2: Based on the values in Table 1, the universe of discourse, $U$, was determined in this step as shown in Table 2.

Table 2. Universe of discourse by level of education (columns: Level of education; Universe of discourse).

Step 3: The fuzzy set $U_i$ is partitioned into several intervals of equal length. In this study, $i$ can be anywhere in the range between 1 and 7 because the data in each education level are different every year. Then, the fuzzification of the intervals and the frequency distribution of each interval are calculated. The length of the interval for fuzzification is calculated using Equation (...). After that, the intervals are fuzzified, where the value of this length is added to each interval. The actual values of the interval length for all education stages are presented in Table 3. After the fuzzification process, the frequency distribution is generated for each interval as shown in Table 4. The equal length of the interval is divided by the number of data points in each interval. An example of the calculation for pre-primary school is: $v_1 = [-30.00, -18.57)$ with a length of 11.43, $v_2 = [-18.57, -14.76)$, ..., $v_4 = [-10.95, -7.14)$ with a length of 3.81, etc.
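The conversion to percentages and the equal-length partitioning described in the steps above can be sketched in Python. This is a minimal illustration only, assuming that the percentage in Equation 1.1 is the year-over-year change in enrolment (the negative interval bounds in the example suggest this, but the equation itself is not reproduced here). The function names and the sample figures are hypothetical and are not taken from the UNESCO dataset.

```python
from bisect import bisect_right


def percent_change(values):
    """Year-over-year change in percent (assumed form of Equation 1.1).

    The result has one element fewer than the input.
    """
    return [(curr - prev) / prev * 100.0 for prev, curr in zip(values, values[1:])]


def equal_intervals(low, high, n):
    """Return the n + 1 edges that split [low, high] into n equal-length intervals."""
    step = (high - low) / n
    return [low + k * step for k in range(n + 1)]


def interval_index(x, edges):
    """0-based index of the half-open interval [edges[k], edges[k+1]) holding x.

    The upper bound of the universe is assigned to the last interval.
    """
    if not edges[0] <= x <= edges[-1]:
        raise ValueError(f"{x} lies outside the universe of discourse")
    return min(bisect_right(edges, x) - 1, len(edges) - 2)


def frequencies(data, edges):
    """Count how many data points fall in each interval."""
    counts = [0] * (len(edges) - 1)
    for x in data:
        counts[interval_index(x, edges)] += 1
    return counts


# Hypothetical enrolment counts, for illustration only.
enrolment = [100.0, 110.0, 99.0, 99.0]
changes = percent_change(enrolment)
edges = equal_intervals(-30.0, 10.0, 4)
counts = frequencies(changes, edges)
```

The frequency counts are what a later repartitioning step would use to decide which intervals need to be subdivided further.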
Step 5: All the data, listed in percentages, are classified according to the intervals generated in Step 4. The fuzzy set $A_i$ represents a linguistic value: if a data point is found in the range $v_j$, it is assigned the fuzzy number $A_j$. Then, the fuzzy logical relationships are generated based on the classified data. A fuzzy logical relationship is symbolised as $A_i \rightarrow A_j$, where $A_i$ is the present state and $A_j$ is the upcoming state. The fuzzified data is categorized into the corresponding fuzzy numbers as shown in Table 5. The fuzzy logical relationships are then constructed for all datasets. An example of the logical relationships is shown in Table 6.
Step 6: Fuzzy relationship groups are generated in this step. The fuzzy logical relationship rules are arranged in groups.
Step 7: Each fuzzy relationship rule group is classified into one of three different types of rule. The predicted value for each group differs according to the rule that applies. In these rules, $n$ is the number of $A_i$ in the group.
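The construction of fuzzy logical relationships and their grouping by present state can be sketched as follows. This is a minimal Python illustration, assuming the fuzzified series is given as a list of integer state indices (for example, 3 standing for $A_3$). The function names and the sample sequence are hypothetical and do not reproduce Table 5 or Table 6.

```python
def logical_relationships(states):
    """Pair each state with its successor: (present state, upcoming state)."""
    return list(zip(states, states[1:]))


def relationship_groups(relations):
    """Group relationships by present state, keeping each upcoming state once.

    Upcoming states are stored in order of first appearance.
    """
    groups = {}
    for present, upcoming in relations:
        targets = groups.setdefault(present, [])
        if upcoming not in targets:
            targets.append(upcoming)
    return groups


# Hypothetical fuzzified series: A3, A4, A4, A2, A3, A5.
states = [3, 4, 4, 2, 3, 5]
relations = logical_relationships(states)
groups = relationship_groups(relations)
```

In this sketch, a present state with a single upcoming state corresponds to a one-to-one group, and a present state with several upcoming states corresponds to a one-to-many group.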

Example of Calculation for Step 7
The data in this study are classified under two types of rule, namely Rule 2 and Rule 3. For Rule 2, the fuzzy logical relationship group for the years 1995 to 1996 is taken as the example, and Step 4 was repeated to obtain the trapezoidal number of $A_4$. The process is repeated to forecast the number of school enrolments for the remaining fuzzy logical relationship groups.
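As a rough illustration of how a forecast can be read off a relationship group, the sketch below uses the widely known midpoint-averaging rule from the fuzzy time series literature. It is not the rule set of this study, which works with trapezoidal fuzzy numbers and whose rules are not reproduced here. The function names, the interval edges and the groups are all hypothetical.

```python
def midpoint(edges, k):
    """Midpoint of the k-th interval [edges[k], edges[k+1])."""
    return (edges[k] + edges[k + 1]) / 2.0


def forecast(present, groups, edges):
    """Midpoint-averaging forecast (a substitute rule, assumed for illustration).

    If the present state has no relationship group, the midpoint of its own
    interval is returned; otherwise the midpoints of all upcoming states in
    the group are averaged.
    """
    targets = groups.get(present) or [present]
    return sum(midpoint(edges, k) for k in targets) / len(targets)


# Hypothetical intervals (0-based) and relationship groups.
edges = [-30.0, -20.0, -10.0, 0.0, 10.0]
groups = {0: [1], 1: [1, 3]}

one_to_one = forecast(0, groups, edges)   # single upcoming state
one_to_many = forecast(1, groups, edges)  # several upcoming states
no_group = forecast(2, groups, edges)     # state without a group
```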

Accuracy Test
In this study, the accuracy of the forecasted value is evaluated using the Mean Squared Error (MSE), Mean Absolute Percent Error (MAPE) and Mean Absolute Deviation (MAD).
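The three error measures can be computed as in the following Python sketch. It uses the standard textbook definitions, which are assumed here to match those applied in the study. The sample values are hypothetical.

```python
def mse(actual, forecast):
    """Mean Squared Error: average of the squared forecast errors."""
    return sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual)


def mape(actual, forecast):
    """Mean Absolute Percent Error, in percent. Actual values must be non-zero."""
    return sum(abs((a - f) / a) for a, f in zip(actual, forecast)) / len(actual) * 100.0


def mad(actual, forecast):
    """Mean Absolute Deviation: average of the absolute forecast errors."""
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)


# Hypothetical actual and forecasted values.
actual = [100.0, 200.0]
predicted = [110.0, 190.0]
errors = (mse(actual, predicted), mape(actual, predicted), mad(actual, predicted))
```

All three measures are zero for a perfect forecast, so a lower value indicates a more accurate model.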

FINDINGS AND DISCUSSIONS
The forecasted values in percentages are converted back into total school enrolments. Table 8 shows the comparison between the actual number of school enrolments and the forecasted values for each year using FTS for all levels of schools in Malaysia. The forecasted values for the years 1981 and 1982 cannot be evaluated because there is no previous input data for these two years. Figure 1 illustrates the up-and-down movement of the actual and the forecasted values. The horizontal axis of the graph displays the timeframe from the year 1981 to 2018 and the vertical axis represents the total number of school enrolments. Overall, most of the forecasted values are close to the actual data. To quantify this, the error values are calculated. Table 9 shows the values of MSE, MAPE and MAD for the four levels of education.
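The conversion of a forecasted percentage back into an enrolment count can be sketched as below. This assumes that the percentage is the year-over-year change relative to the previous year's actual enrolment, which is an interpretation and is not confirmed by the text. The function name and the figures are hypothetical.

```python
def to_enrolment(previous_actual, forecast_percent):
    """Turn a forecasted percentage change into a forecasted enrolment count.

    Assumes the percentage is measured against the previous year's actual value.
    """
    return previous_actual * (1.0 + forecast_percent / 100.0)


# Hypothetical example: 200 students last year, forecasted change of -5 percent.
forecast_count = to_enrolment(200.0, -5.0)
```

Under this assumption a forecast can only be produced for a year whose previous-year enrolment is known.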

CONCLUSION AND RECOMMENDATIONS
The important part of the prediction is identifying the smallest error in order to determine the mathematical model that best fits the actual data. In this study, the fuzzy time series approach is used to generate a mathematical model of school enrolment for pre-primary, primary, secondary, and tertiary schools in Malaysia from 1981 to 2018. The actual data and the forecasted data are analysed to determine the difference between them. By comparing the values of MSE, MAPE and MAD for each stage of education, the mathematical model with the best accuracy was determined. The best result came from the model constructed for primary school enrolment, which has the lowest values of MSE, MAPE and MAD, namely 0.38, 0.43 and 0.43 respectively. Overall, the error measures of the models for all education levels are still reasonable and acceptable, indicating that the fuzzy time series approach is an effective and accurate way to predict the number of school enrolments.
This study has been done with only one method, the fuzzy time series approach. In future, the same data could be analysed using other approaches, such as an Artificial Neural Network (ANN) or a hybrid method that combines two different models, which might produce a better result.