5         The TellFuture Gas Load Forecast System

In the gas industry, daily or hourly load forecasting is critical for companies to optimize the distribution of energy to their customers. For pipeline companies, load forecasting helps determine the operational impact on their systems, so that load demands are met and future capacity availability can be estimated. It also helps them minimize the operational cost of meeting demand, that is, to decide how many compressors to run, at what level, and where. A decision also has to be made regarding underground storage use, such as whether to inject or withdraw gas to meet demand. Another reason for load forecasting is that not all gas moving on the system is nominated. In other words, customers have certain rights to take or leave some gas in the system without having to nominate those transactions, and this capacity demand has to be planned for. For these reasons, a reliable load forecast is a prerequisite for any kind of operational planning.

Gas load forecasting with neural networks borrows ideas from electric load forecasting, in which neural networks have been used extensively for short-term forecasts of one hour up to a maximum of 24 hours [14]. Although gas load forecasting has similarities with forecasting in other utilities (water, electric power), it has different objectives: it produces forecasts for periods that match the planning cycles. Usually, this implies a forecast for the next three to five days.

In this chapter, a gas load forecast system, the TellFuture system, is built to show how neural networks work in forecasting. The system was implemented in Java, an object-oriented and platform-independent programming language. The details of the system are described below.

5.1      Factors Affecting Load

The most difficult part of building a good model is choosing and collecting the training and testing input data. A number of research papers [14][15][16][17][18][19][20] show that the following factors influence load demand:

1) Weather Conditions

These include temperature, wind velocity, cloud cover, dew point, rainfall, and snowfall. It has been widely observed that, in most cases, there is a strong correlation between weather (especially temperature and wind velocity) and load demand. In most situations, as the temperature goes down, the demand for gas goes up, and vice versa. However, this relation is highly non-linear. Other weather effects influence the load to a lesser extent.

2) Calendar

This includes hour-of-day, day-of-week, month-of-year, weekend, and holiday effects. Most gas load patterns show a very consistent dependence on the calendar. For example, assuming all other factors remain constant, the demand for energy at 1:00 AM, when most people are sleeping, is expected to differ from that at 6:00 AM, when most people are getting up. Similar observations hold for the day-of-week. Although it cannot be generalized, the middle days of the week (Tuesdays, Wednesdays, Thursdays and Fridays) behave differently from the remaining days. The month-of-year captures the seasonal effect. Holidays are special days that tend to produce behavior more like that of a weekend day.

3) Economic Information

This includes the market gas price, the price differential between gas and oil, and the price differential between competitors’ prices. In many situations, the effect of economic factors on gas demand is non-trivial. A direct influence can be observed in some instances, such as when the customer has storage fields for injection or withdrawal. If the market gas price is low, there can be a high demand for gas even when the temperature is high, because the customer is injecting gas into the storage field. Similarly, if the gas price is high, customers may use gas from storage instead of buying new gas from pipeline companies even when the temperature is low, thus decreasing the load demand. The price differential between gas and oil plays an important role in demand when the customer is a dual-fuel power plant, which can increase or reduce its gas consumption depending on that differential.

The above effects can be quantified and hence are candidates for use as inputs for neural network training and forecasting. Other factors, such as contractual obligations, definitely influence gas demand but are too difficult to quantify and therefore cannot be included as input variables. In addition, a number of other factors, such as maintenance or accidents on competitors’ lines, influence gas demand but can at best be explained only qualitatively.

5.2      The Load Forecast Model

5.2.1      Input Data

The input data used in this model came from one of the author's clients, a pipeline company. The company stored all of its historical data in separate tables in an Oracle database: weather information in one table and hourly load history in another. The author retrieved these data and stored them in text files with the following format:

    Date        Hour  Temperature  Wind  Load
    02-08-1998  00    37           3     1168
    02-08-1998  01    37           9     1213
    02-08-1998  02    37           6     1316
    02-08-1998  03    37           3     1417
    02-08-1998  04    37           3     1534
    02-08-1998  05    37           5     1680
    02-08-1998  06    36           5     1819
    02-08-1998  07    34           6     1967

5.2.2      Data Processing

Given the above input data, the model can be set up to reflect up to the following six effects:

1) Temperature: represented by its actual value.

2) Wind velocity: represented by its actual value.

3) Hour-of-day: the 24 hours of one day are represented by 0, 1, 2, …, 23.

4) Weekday: Sunday, Monday, Tuesday, Wednesday, Thursday, Friday, and Saturday are represented by 0, 1, 2, 3, 4, 5, and 6 respectively.

5) Weekend: Monday through Friday are represented by 0; Saturday and Sunday by 1.

6) Month-of-year: the twelve months of the year are represented by 0 to 11 respectively.

Effects 1) and 2) are ordinal variables and can be presented to the network by their actual values. Effects 3) through 6) are categorical variables. From Section 4.1 we know that categorical variables can be encoded either with the 1-of-c method (which would use up to 1+1+24+7+2+12 = 47 input units) or with the one-effect-one-unit method (which uses only up to 1+1+1+1+1+1 = 6 input units). Here we adopt the second method. Although it imposes an artificial ordering on some of the data (Effects 3 through 6), it dramatically decreases the input size and thus simplifies the system.
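The difference between the two encodings can be illustrated with a minimal sketch. It is not taken from the TellFuture source; the class EncodingSketch and its method names are hypothetical and only reproduce the unit counts quoted above.

```java
import java.util.Arrays;

/** Illustrative comparison of the two encodings; all names here are hypothetical. */
final class EncodingSketch {

    /** 1-of-c: one unit per category, with a single 1.0 at the category index. */
    static double[] oneOfC(int category, int numCategories) {
        if (category < 0 || category >= numCategories) {
            throw new IllegalArgumentException("category out of range: " + category);
        }
        double[] units = new double[numCategories];
        units[category] = 1.0;
        return units;
    }

    /** One-effect-one-unit: the category index itself is the single input value. */
    static double[] singleUnit(int category) {
        return new double[] { category };
    }

    /** Total number of input units when each effect contributes the given number of units. */
    static int totalUnits(int... unitsPerEffect) {
        return Arrays.stream(unitsPerEffect).sum();
    }

    public static void main(String[] args) {
        // Order: temperature, wind, hour (24), weekday (7), weekend (2), month (12)
        System.out.println(totalUnits(1, 1, 24, 7, 2, 12)); // 47
        System.out.println(totalUnits(1, 1, 1, 1, 1, 1));   // 6
        System.out.println(Arrays.toString(oneOfC(6, 7)));  // weekday index 6 as seven units
        System.out.println(Arrays.toString(singleUnit(6))); // the same weekday as one unit
    }
}
```

With 1-of-c encoding no category is "closer" to another, at the cost of many more weights to train; the single-unit encoding trades that neutrality for a much smaller input layer.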

The data set is created automatically from the input data file using the above encoding scheme (a class called DataSetfactory handles this). The training set for the above input data is encoded in the following format:

    Temperature  Wind  Hour  Weekday  Weekend  Month  Load
    37           3     00    6        1        1      1168
    37           9     01    6        1        1      1213
    37           6     02    6        1        1      1316
    37           3     03    6        1        1      1417
    37           3     04    6        1        1      1534
    37           5     05    6        1        1      1680
    36           5     06    6        1        1      1819
    34           6     07    6        1        1      1967
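As an illustration of the encoding scheme, the following sketch converts one raw record into the six input values. It is not the DataSetfactory source: the class and method names are hypothetical, and the date is assumed to be in month-day-year order. The weekday follows the numbering stated in the list of effects (Sunday = 0). For 02-08-1998, a Sunday, this gives 0, whereas the sample rows list 6, so the actual class may number the days differently.

```java
import java.time.DayOfWeek;
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.Arrays;

/** Hypothetical encoder for one raw record; not the original DataSetfactory. */
final class RecordEncoderSketch {

    // Assumption: dates in the text files are written as month-day-year.
    private static final DateTimeFormatter DATE_FORMAT =
            DateTimeFormatter.ofPattern("MM-dd-uuuu");

    /** Returns {temperature, wind, hour, weekday, weekend, month}. */
    static double[] encode(String date, int hour, double temperature, double wind) {
        LocalDate day = LocalDate.parse(date, DATE_FORMAT);
        DayOfWeek dow = day.getDayOfWeek();
        // ISO numbering is Monday=1 .. Sunday=7; modulo 7 gives Sunday=0 .. Saturday=6.
        int weekday = dow.getValue() % 7;
        int weekend = (dow == DayOfWeek.SATURDAY || dow == DayOfWeek.SUNDAY) ? 1 : 0;
        int month = day.getMonthValue() - 1; // January=0 .. December=11
        return new double[] { temperature, wind, hour, weekday, weekend, month };
    }

    public static void main(String[] args) {
        // First sample row: 02-08-1998, hour 00, temperature 37, wind 3
        System.out.println(Arrays.toString(encode("02-08-1998", 0, 37, 3)));
    }
}
```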

 

The DataSetfactory class also acts as a preliminary data filter that eliminates any outliers or bad data. All inputs to the model are linearly scaled to between 0 and 1, using the minimum and maximum values of the corresponding input.
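The linear scaling can be sketched as follows. This is a minimal illustration rather than the DataSetfactory code; the class name is hypothetical, and mapping a constant column to 0 is an assumption made here to avoid division by zero.

```java
import java.util.Arrays;

/** Hypothetical min-max scaler: maps each input column linearly onto [0, 1]. */
final class MinMaxScalerSketch {

    static double[][] scaleColumns(double[][] rows) {
        int columns = rows[0].length;
        double[] min = new double[columns];
        double[] max = new double[columns];
        Arrays.fill(min, Double.POSITIVE_INFINITY);
        Arrays.fill(max, Double.NEGATIVE_INFINITY);

        // First pass: find the minimum and maximum of every column.
        for (double[] row : rows) {
            for (int c = 0; c < columns; c++) {
                min[c] = Math.min(min[c], row[c]);
                max[c] = Math.max(max[c], row[c]);
            }
        }

        // Second pass: scaled = (value - min) / (max - min).
        double[][] scaled = new double[rows.length][columns];
        for (int r = 0; r < rows.length; r++) {
            for (int c = 0; c < columns; c++) {
                double range = max[c] - min[c];
                // Assumption: a constant column carries no information and is set to 0.
                scaled[r][c] = (range == 0.0) ? 0.0 : (rows[r][c] - min[c]) / range;
            }
        }
        return scaled;
    }

    public static void main(String[] args) {
        // Temperature and wind values from three of the sample rows
        double[][] raw = { { 37, 3 }, { 36, 9 }, { 34, 6 } };
        for (double[] row : scaleColumns(raw)) {
            System.out.println(Arrays.toString(row));
        }
    }
}
```

The same minimum and maximum values have to be reused when scaling test and forecast inputs, otherwise the network sees values on a different scale from the one it was trained on.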

5.2.3      Network Architecture

The network consists of one input layer, one hidden layer, and one output layer. There is only one output unit, the load. The number of input units is also fixed; it depends on how many factors are included in the model and how they are encoded. The number of hidden units has to be determined by training with test sets. Figure 5‑1 shows the architecture of the load forecast model with all six effects mentioned above included.

Figure 5‑1 Load forecast model

The network requires enough hidden units to learn the general features of the relationship. Too many hidden units cause overfitting, while too few lead to underfitting (Section 3.3.2). The goal is to use as few units in the hidden layer as possible while still retaining the network's ability to learn the relationships among the data. As mentioned earlier, including more than one hidden layer does not significantly improve the accuracy of the predictions.

The activation functions of the hidden units are sigmoid functions, while the output activation function can be either a sigmoid function or a linear function, as selected by the user.

5.2.4      Implementation of The Back-Propagation Algorithm

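The network is trained using the back-propagation algorithm. The standard sum-of-squares error function is used:

$$E = \frac{1}{2} \sum_{k} (y_k - t_k)^2 \tag{5.1}$$

where $y_k$ is the output of output unit $k$ and $t_k$ is the corresponding target value.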

Here is the Java code for the error function, which is one of the methods within the NeuralNetwork class:

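    // Sum-of-squares error (5.1) between two vectors of equal length,
    // for example the network outputs x and the target values y.
    public double errorFunction(double[] x, double[] y) {
        double sum = 0.0;
        for (int i = 0; i < x.length; i++) {
            double diff = x[i] - y[i];
            sum += diff * diff;
        }
        return 0.5 * sum;
    }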

As mentioned above, the activation function for the hidden units is the sigmoid function:

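$$g(a) = \frac{1}{1 + e^{-a}} \tag{5.2}$$

A very useful feature of this function is that its derivative can be expressed in terms of the function value itself:

$$g'(a) = g(a)\,\bigl(1 - g(a)\bigr) \tag{5.3}$$

The above two equations can be easily coded: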

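    // Sigmoid activation (5.2); large |x| is clamped to avoid needless exp() calls.
    public double sigmoid(double x) {
        if (x >  50.) return 1.0;
        if (x < -50.) return 0.0;
        return 1.0 / (1.0 + Math.exp(-x));
    }

    // Derivative (5.3). Note that x here is the unit's output, i.e. the value
    // already returned by sigmoid(), not the weighted input sum.
    public double sigmoidDerivative(double x) {
        return x * (1.0 - x);
    }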
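Because the derivative is written in terms of the unit's output rather than its input, it is easy to call it with the wrong argument. The following sketch, which is an added check and not part of the TellFuture code, compares the closed form with a central finite-difference estimate. The class and method names are hypothetical.

```java
/** Hypothetical numerical check of the sigmoid derivative identity. */
final class SigmoidCheckSketch {

    static double sigmoid(double x) {
        return 1.0 / (1.0 + Math.exp(-x));
    }

    /** Derivative expressed through the unit's output y = sigmoid(x). */
    static double sigmoidDerivativeFromOutput(double y) {
        return y * (1.0 - y);
    }

    /** Central finite-difference estimate of the derivative of sigmoid at x. */
    static double numericalDerivative(double x, double h) {
        return (sigmoid(x + h) - sigmoid(x - h)) / (2.0 * h);
    }

    public static void main(String[] args) {
        double[] points = { -2.0, 0.0, 0.5, 3.0 };
        for (double x : points) {
            double analytic = sigmoidDerivativeFromOutput(sigmoid(x));
            double numeric = numericalDerivative(x, 1e-5);
            System.out.println("x=" + x + " analytic=" + analytic + " numeric=" + numeric);
        }
    }
}
```

The two columns agree to many decimal places, which confirms that the argument must be the sigmoid output, as in the calls sigmoidDerivative(hiddens[j]) and sigmoidDerivative(outputs[k]) in the training code.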

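The first step of back-propagation is forward propagation:

    void feedForward() {
        // Hidden units: weighted sum of the inputs plus a bias, passed through the sigmoid
        for (int i = 0; i < numberOfHiddenUnit; i++) {
            double sum = 0.0;
            for (int j = 0; j < numberOfInputUnit; j++) {
                sum += weightLayer1[j][i] * inputs[j];
            }
            sum += weightLayer1[numberOfInputUnit][i]; // bias term (last row of the matrix)
            hiddens[i] = sigmoid(sum);
        }

        // Output units: weighted sum of the hidden activations (no bias term)
        for (int i = 0; i < numberOfOutputUnit; i++) {
            double sum = 0.0;
            for (int j = 0; j < numberOfHiddenUnit; j++) {
                sum += weightLayer2[j][i] * hiddens[j];
            }
            // Sigmoid output when outputActivationType == 1 (the convention used in
            // backpropagation); otherwise the output unit is linear.
            outputs[i] = (outputActivationType == 1) ? sigmoid(sum) : sum;
        }
    }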

The second step is error back-propagation. Using the expression derived from (3.13) and (3.16), we obtain the following results. For the output units, the δ’s are given by

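$$\delta_k = g'(a_k)\,(y_k - t_k) \tag{5.4}$$

while for units in the hidden layer the δ’s are found using

$$\delta_j = z_j\,(1 - z_j) \sum_{k} w_{kj}\,\delta_k \tag{5.5}$$

The derivatives with respect to the first layer and second layer weights are then given by

$$\frac{\partial E}{\partial w_{ji}} = \delta_j\,x_i \quad \text{and} \quad \frac{\partial E}{\partial w_{kj}} = \delta_k\,z_j \tag{5.6}$$

Here $x_i$ are the inputs, $z_j$ the hidden-unit outputs, $y_k$ the network outputs, $t_k$ the targets, and $w_{kj}$ the second-layer weights; $g'(a_k)$ equals $y_k(1 - y_k)$ for a sigmoid output unit and 1 for a linear one.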

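We use the gradient descent algorithm with momentum (3.18) to update the weights. In the code, delta2 holds target minus output, the negative of the error derivative, so each weight change is added to the weight: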

    void backpropagation(double rate, double alpha) {
        double[] delta1 = new double[numberOfHiddenUnit];
        double[] delta2 = new double[numberOfOutputUnit];

        // Output-layer error terms: target minus output (negative error derivative)
        for (int j = 0; j < numberOfOutputUnit; j++) {
            delta2[j] = targets[j] - outputs[j];
        }

        // Hidden-layer error terms, propagated back through the second-layer weights
        for (int j = 0; j < numberOfHiddenUnit; j++) {
            double sum = 0.0;
            for (int k = 0; k < numberOfOutputUnit; k++) {
                double term = delta2[k] * weightLayer2[j][k];
                // A sigmoid output unit contributes its derivative y * (1 - y)
                if (outputActivationType == 1) term *= sigmoidDerivative(outputs[k]);
                sum += term;
            }
            delta1[j] = sum;
        }

        // Update the second-layer weights: gradient step plus momentum term
        for (int i = 0; i < numberOfHiddenUnit; i++) {
            for (int j = 0; j < numberOfOutputUnit; j++) {
                double delta = delta2[j] * hiddens[i];
                if (outputActivationType == 1) delta *= sigmoidDerivative(outputs[j]);
                double weightChange = rate * delta + alpha * momentum2[i][j];
                weightLayer2[i][j] += weightChange;
                momentum2[i][j] = weightChange; // remembered for the next update
            }
        }

        // Update the first-layer weights; row numberOfInputUnit holds the bias weights
        for (int i = 0; i < numberOfInputUnit + 1; i++) {
            for (int j = 0; j < numberOfHiddenUnit; j++) {
                if (i != numberOfInputUnit && inputs[i] == 0) {
                    // Zero input: the gradient is zero, and the momentum is reset
                    // rather than carried forward
                    momentum1[i][j] = 0.;
                } else {
                    double delta = delta1[j] * sigmoidDerivative(hiddens[j]);
                    if (i != numberOfInputUnit) delta *= inputs[i]; // the bias input is 1
                    double weightChange = rate * delta + alpha * momentum1[i][j];
                    weightLayer1[i][j] += weightChange;
                    momentum1[i][j] = weightChange;
                }
            }
        }
    }
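The effect of the momentum term in the weight update can be seen in isolation on a toy problem. The sketch below is an added illustration under stated assumptions, not part of the TellFuture code: it minimizes the one-dimensional error E(w) = 0.5 (w - 3)^2 with the same update rule, weightChange = rate × (negative gradient) + alpha × previous weightChange. The class name, the toy error function, and the stopping test are all hypothetical.

```java
/** Hypothetical one-weight demonstration of gradient descent with momentum. */
final class MomentumSketch {

    /** Gradient of the toy error E(w) = 0.5 * (w - 3)^2. */
    static double gradient(double w) {
        return w - 3.0;
    }

    /**
     * Runs the update until both the gradient and the last weight change are
     * smaller than tol. Returns the number of updates used, or maxSteps if the
     * tolerance was not reached.
     */
    static int stepsToConverge(double rate, double alpha, double tol, int maxSteps) {
        double w = 0.0;
        double previousChange = 0.0;
        for (int step = 1; step <= maxSteps; step++) {
            double change = -rate * gradient(w) + alpha * previousChange;
            w += change;
            previousChange = change;
            if (Math.abs(gradient(w)) < tol && Math.abs(change) < tol) {
                return step;
            }
        }
        return maxSteps;
    }

    public static void main(String[] args) {
        System.out.println("rate 0.01, no momentum:  " + stepsToConverge(0.01, 0.0, 1e-3, 10000));
        System.out.println("rate 0.01, momentum 0.9: " + stepsToConverge(0.01, 0.9, 1e-3, 10000));
    }
}
```

On this toy error surface a small learning rate needs far fewer updates when a large momentum is added. Real error surfaces are not quadratic, so the numbers only indicate the trend.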

The batch learning method was adopted to train the networks.

5.2.5      Network Generalization

The split-sample (or hold-out validation) method [5] is used to estimate the generalization error. With this method, part of the data is reserved as a test set that is not used in training. After training, the network is run on the test set. The error on the test set provides an unbiased estimate of the generalization error, from which we can decide whether the model is sufficiently general.
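A minimal sketch of the hold-out procedure is given below. How the TellFuture data were actually divided is not stated here, so the random shuffle, the fixed seed, and all names in this class are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

/** Hypothetical hold-out split and test-set error measure. */
final class HoldOutSketch {

    /** Returns a two-element list: index 0 is the training set, index 1 the test set. */
    static <T> List<List<T>> split(List<T> rows, double testFraction, long seed) {
        if (testFraction < 0.0 || testFraction > 1.0) {
            throw new IllegalArgumentException("testFraction must be in [0, 1]");
        }
        List<T> shuffled = new ArrayList<>(rows);
        Collections.shuffle(shuffled, new Random(seed)); // assumption: random assignment
        int testSize = (int) Math.round(shuffled.size() * testFraction);

        List<List<T>> parts = new ArrayList<>();
        parts.add(new ArrayList<>(shuffled.subList(testSize, shuffled.size()))); // training set
        parts.add(new ArrayList<>(shuffled.subList(0, testSize)));               // test set
        return parts;
    }

    /** Average squared error between network predictions and actual values. */
    static double averageSquaredError(double[] predicted, double[] actual) {
        double sum = 0.0;
        for (int i = 0; i < predicted.length; i++) {
            double diff = predicted[i] - actual[i];
            sum += diff * diff;
        }
        return sum / predicted.length;
    }

    public static void main(String[] args) {
        List<Integer> rowIds = new ArrayList<>();
        for (int i = 0; i < 10; i++) rowIds.add(i);
        List<List<Integer>> parts = split(rowIds, 0.3, 42L);
        System.out.println("training rows: " + parts.get(0));
        System.out.println("test rows:     " + parts.get(1));
    }
}
```

The network is trained on the first list only, and averageSquaredError is then evaluated on its predictions for the second list to estimate the generalization error.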

5.3      Features of The System

The TellFuture load forecast system has several useful features:

·        It checks the importance of each effect

·        It helps find the optimal number of hidden units

·        It checks the influence of the learning rate and momentum

·        It displays the training and forecasting results in graphic and tabular form

·        It displays training errors

Figure 5‑2 is the main screen of the TellFuture system.

Figure 5‑2 TellFuture forecast system main screen

5.3.1      Effects Setup

We can specify which effects are included in the model by selecting the Setup->Effect Setup menu; a dialog as shown in Figure 5‑3 will appear. Selecting or deselecting a factor helps us evaluate its importance. This provides an efficient way to decide whether an effect should be included in the model when we are not sure. For this model, the default setting, which includes all the effects, is the best.

Figure 5‑3 Effects setup

5.3.2      Network Parameters Setup

Once the input units and the output units are known, we can use the “Network Setup” (Figure 5‑4) to find the optimal number of hidden units. This is a trial-and-error process. We start with a large number of epochs and a small number of hidden units. Next, we train the network and record the minimum training error that the network reaches. We then gradually increase the number of hidden units and repeat the process until the training error no longer changes significantly.
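The stopping rule of this trial-and-error search can be written down as a short sketch. In TellFuture the search is carried out by hand through the dialog; the class below, its method name, and the improvement threshold are hypothetical, and the error values in main are made up purely to exercise the rule.

```java
/** Hypothetical helper that applies the "no big change in training error" rule. */
final class HiddenUnitSearchSketch {

    /**
     * errors[i] is the minimum training error reached with (firstCount + i) hidden units.
     * Returns the smallest unit count for which adding one more unit improves the
     * error by less than minImprovement; if no such count exists, the last one tried.
     */
    static int chooseHiddenUnits(int firstCount, double[] errors, double minImprovement) {
        for (int i = 0; i + 1 < errors.length; i++) {
            if (errors[i] - errors[i + 1] < minImprovement) {
                return firstCount + i;
            }
        }
        return firstCount + errors.length - 1;
    }

    public static void main(String[] args) {
        // Made-up errors for 2, 3, 4, 5 and 6 hidden units (not measured results)
        double[] errors = { 0.010, 0.006, 0.004, 0.0039, 0.0038 };
        System.out.println(chooseHiddenUnits(2, errors, 0.0005));
    }
}
```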

We can also use this setup to study the influence of the learning rate and momentum on the training time, and to choose different activation functions for the output units to see whether they perform similarly.

Figure 5‑4 Network setup

5.3.3      Training the Network

From the main screen menu File->Load Training Set, training data, which include both input values and target values, can be loaded into the system. The data are stored in the format described previously. After setting up the effects and network parameters, we can click the Run->Training menu to start the training process. The process stops upon reaching the error tolerance or the epoch limit. The training error for each epoch is shown in the prompt while training is in progress. Once training finishes, we can click the Display->Display Training Results menu to show the training results in graphic form (Figure 5‑5) or tabular form (Figure 5‑6). The hourly training errors are shown in column 4 (Figure 5‑6).
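The two stopping conditions can be sketched as a simple loop. This is an assumed structure rather than the TellFuture source: the class is hypothetical, and one training epoch is represented by a DoubleSupplier that returns the error after that epoch.

```java
import java.util.function.DoubleSupplier;

/** Hypothetical training loop with an error tolerance and an epoch limit. */
final class TrainingLoopSketch {

    /**
     * Calls runOneEpoch until it reports an error at or below errorTolerance,
     * or until epochLimit epochs have been run. Returns the number of epochs used.
     */
    static int train(DoubleSupplier runOneEpoch, double errorTolerance, int epochLimit) {
        for (int epoch = 1; epoch <= epochLimit; epoch++) {
            double error = runOneEpoch.getAsDouble();
            if (error <= errorTolerance) {
                return epoch; // tolerance met
            }
        }
        return epochLimit; // limit reached without meeting the tolerance
    }

    public static void main(String[] args) {
        // Stand-in for a real epoch: the error simply halves each time.
        double[] error = { 1.0 };
        int epochs = train(() -> { error[0] *= 0.5; return error[0]; }, 0.0005, 9999);
        System.out.println("epochs used: " + epochs);
    }
}
```

A return value equal to the epoch limit is ambiguous in this sketch: it covers both a run that met the tolerance in the very last epoch and one that never did.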

Figure 5‑5 Training result graphic display

Figure 5‑6 Training result tabular display

5.3.4      Testing the Network

Once we finish training the network, we need to run a generalization test to see whether the model is sufficiently general. This can be done by first loading the test set from the menu File->Load Test Set, then clicking the Run->Testing menu to run the system, and finally clicking the Display->Display Testing Results menu to show the testing results. The results can be displayed in a format similar to Figure 5‑5 and Figure 5‑6.

5.3.5      Future Load Forecasting

After the network has been well trained and has passed the generalization test, we can load the predicted future input data from the menu File->Load Predicted Inputs, and then click the Run->Predict menu to get the forecasted loads. The results can be displayed in graphic form (Figure 5‑7) or in tabular form, which is similar to Figure 5‑6 but without the “Actual Load” and “Error%” columns.

Figure 5‑7 Load forecast graphic display

5.4      Analysis of Results

1. Multi-layered feed-forward networks perform very well for short-term gas load forecasting. The forecast accuracy has been in excess of 90% for this model. Weather and calendar information, especially temperature, have a great impact on the load.

2. A reasonable number of hidden units for this model is between 7 and 12, and we recommend 10. Performance is poor with fewer than 5 hidden units, while using more than 12 yields no significant decrease in the training error (Figure 5‑8).

Figure 5‑8 Average squared error vs. hidden units

3. Learning rate and momentum have a significant influence on the training process with the gradient descent method. Table 5‑1 shows the number of epochs that the training process took to meet the error tolerance (an average squared error of 0.0005) or reach the epoch limit (9999) for different values of learning rate and momentum, where each pair was tested 10 times. Learning rates that are too large or too small converge slowly, while a high momentum helps a small learning rate converge faster. The best learning rate and momentum for this model are 0.8 and 0.1 respectively.

4. There are no big differences between using a sigmoid activation function and a linear activation function for the output unit.

 


Table 5‑1 Training epochs vs. learning rate and momentum

 



           




Copyright ©2000-2007 Zhanshou Yu. All Rights Reserved.