|
|
In the gas industry, daily or hourly load forecasting
is critical for companies to optimize the distribution of their energy to their
customers. For pipeline companies, load
forecasting can help determine the operational impacts on their systems so that
load demands are met and the estimate of future capacity availability may be
made. It can also help them to find the best possible way to minimize the
operational costs of meeting the demand, i.e., decisions about how many compressors to run,
at what level, and where. A decision also has to be made
regarding underground storage use, such as whether to inject or withdraw gas to
meet demand. Another reason for load forecasting is that not all gas moving on the
system is nominated. In other words, customers have certain rights to take
or leave some gas in the system without having to nominate those transactions,
and this capacity demand has to be planned. It is apparent from the above
reasons that a reliable load forecast is a must for any kind of operational
planning.

Gas load
forecasting with neural networks borrows ideas from electric load
forecasting, in which neural networks have been used extensively for
short-term forecasts, from one hour up to a maximum of 24 hours [14]. Gas load
forecasting, though it has similarities with forecasting in other utilities
(water, electric power), has different objectives. It obtains forecasts for
periods that match the planning cycles. Usually, this implies a forecast for
the next three to five days.

In this chapter, a gas load forecast system, the TellFuture system, is built to show how neural networks work in forecasting. The system was implemented in Java, an object-oriented and platform-independent programming language. Its details are described below.

5.1 Factors Affecting Load

The most difficult
part of building a good model is to choose and collect the training and testing
input data. A number of research papers [14][15][16][17][18][19][20] show that
the following factors influence load demand:

1) Weather Conditions

This includes temperature, wind velocity, cloud
cover, dew point, rainfall, and snowfall. It has been widely observed that, in
most cases, there is a strong correlation between weather (especially
temperature and wind velocity) and load demand. In most situations, as the
temperature goes down, the demand for gas goes up and vice-versa. However, this
relation is highly non-linear. Other weather effects influence the load to a
lesser extent.

2) Calendar

This includes hour-of-day, day-of-week,
month-of-year, weekend and holiday effects. Most gas load patterns show a very
consistent dependence on the calendar. For example, with all other factors
remaining constant, the demand for energy at 1:00 AM when most people are sleeping
is expected to be different from that at 6:00 AM when most people are getting
up. Similar observations exist for the day-of-week. Although this cannot be
generalized, the middle days of the week (Tuesdays, Wednesdays, Thursdays and
Fridays) behave differently from the remaining days. The month-of-year captures
the seasonal effect. Holidays are also special days; they tend to produce
behavior that is more like a weekend day.

3) Economic Information

This includes market gas price, price
differential between gas and oil, and price differential between competitors’
prices. In many situations, the effect of economic factors on gas demand is
non-trivial. A direct influence of economic factors on gas demand can be
observed in some instances such as when the customer has storage fields for
injection or withdrawal. If the market gas price is low, even if the
temperature is high, there can be a high demand for gas if the customer is
injecting gas into the storage field. Similarly, even if the temperature is low,
if the gas price is high, customers may use the gas in the storage instead of
buying new gas from pipeline companies, thus decreasing the demand for load.
The price differential between gas and oil plays an important role in the demand
when the customer is a dual-fuel power plant. Here, depending on the price
differential between gas and oil, the customer can increase or reduce gas
consumption.

The above effects are those that can be quantified and hence are possible candidates for inputs to neural network training and forecasting. Other factors, such as contractual obligations, definitely influence gas demand but are too difficult to quantify and therefore cannot be included as input variables. In addition, a number of other factors, such as maintenance or accidents on competitors’ lines, influence the demand for gas but can at best be explained only qualitatively.

5.2 The Load Forecast Model

5.2.1 Input Data

The input data used in this model came from one of the author's clients, a pipeline company. The company stored all of its historical data in separate tables in an Oracle database: weather information in one table and hourly load history in another. The author retrieved these data and stored them in text files with the following format:
5.2.2 Data Processing

Given the above input data, the model can be set up to reflect up to the following six effects:

1) Temperature: represented by its actual value.
2) Wind velocity: represented by its actual value.
3) Hour-of-day: represents the 24 hours of one day by 0, 1, 2, …, 23.
4) Weekday: represents Sunday, Monday, Tuesday, Wednesday, Thursday, Friday, and Saturday by 0, 1, 2, 3, 4, 5, and 6 respectively.
5) Weekend: represents Monday, Tuesday, Wednesday, Thursday and Friday as 0; Saturday and Sunday as 1.
6) Month-of-year: represents the twelve months in one year by 0 to 11 respectively.

Obviously, effect 1) and effect 2) are ordinal variables. They can be presented to the network by their actual values. Effects 3), 4), 5) and 6) are categorical variables. From Section 4.1 we know that for categorical variables we can use either the 1-of-c encoding method (which will use up to 1+1+24+7+2+12 = 47 input units) or the one-effect-one-unit method (which will use only up to 1+1+1+1+1+1 = 6 input units). Here we adopt the second encoding method. Although this method imposes an artificial ordering on some data (Effects 3 to 6), it dramatically decreases the input size and thus simplifies the system.

The data set is created automatically from the input data file using the above encoding scheme (a class called DataSetfactory handles this). The training set for the above input data will be encoded in the following format:
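As a rough illustration of the one-effect-one-unit encoding listed above, the sketch below maps one hourly record to the six input values. It is not the original DataSetfactory code and does not reproduce the data file format; the class name EffectEncoder, the method encode, and the use of java.time are assumptions made for this example.

```java
import java.time.LocalDateTime;
import java.util.Arrays;

// Hypothetical helper for illustration; not the original DataSetfactory class.
class EffectEncoder {
    /** Returns {temperature, windVelocity, hourOfDay, weekday, weekend, monthOfYear}. */
    static double[] encode(LocalDateTime time, double temperature, double windVelocity) {
        // java.time numbers Monday..Sunday as 1..7; the text uses Sunday=0 .. Saturday=6
        int weekday = time.getDayOfWeek().getValue() % 7;
        int weekend = (weekday == 0 || weekday == 6) ? 1 : 0;
        return new double[] {
            temperature,               // actual value
            windVelocity,              // actual value
            time.getHour(),            // 0..23
            weekday,                   // 0..6
            weekend,                   // 0 or 1
            time.getMonthValue() - 1   // 0..11
        };
    }

    public static void main(String[] args) {
        // Saturday, 15 January 2000, 06:00 (made-up weather values)
        double[] x = encode(LocalDateTime.of(2000, 1, 15, 6, 0), -3.5, 12.0);
        System.out.println(Arrays.toString(x)); // [-3.5, 12.0, 6.0, 6.0, 1.0, 0.0]
    }
}
```

Each record yields one value per effect, so the network needs at most six input units regardless of how many categories an effect has.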
The DataSetfactory class also acts as a
preliminary data filter to eliminate any outliers or bad data that are
present. All inputs to the model are linearly scaled between 0 and 1, using the
minimum and maximum values of the corresponding input.

5.2.3 Network Architecture

The network
consists of one input layer, one output layer and one hidden layer. Obviously,
there is only one output unit: the load. The number of input units is also
fixed; it depends on how many factors are included in the model and how the
factors are encoded. The number of hidden units needs to be decided by training
with some test sets. Figure 5‑1
shows the architecture of the load forecast model
including all six of the effects mentioned above.
Figure 5‑1 Load
forecast model

The network requires enough hidden units to learn the
general features of the relationship. Too many hidden units will cause
overfitting, while too few will lead
to underfitting (Section 3.3.2). The goal is to use as few units in the hidden layer
as possible while still retaining the network's ability to learn the
relationships among the data. As mentioned
earlier, including more than a single middle layer does not significantly
improve the accuracy of the predictions.

The activation functions of the hidden units are sigmoid functions, while the output activation function can be either a sigmoid function or a linear function, as selected by the user.

5.2.4 Implementation of the Back-Propagation Algorithm

The network is
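trained using the back-propagation algorithm. The standard sum-of-squares error
function is used:

$$E = \frac{1}{2} \sum_{k} (y_k - t_k)^2$$

where $y_k$ is the network output and $t_k$ the target value for output unit $k$. Here is the Java
code for the error function, which is one of the methods within the NeuralNetwork class:

    // Sum-of-squares error between two equal-length vectors (outputs and targets)
    public double errorFunction(double[] x, double[] y) {
        double sum = 0.0;
        for (int i = 0; i < x.length; i++)
            sum += (x[i] - y[i]) * (x[i] - y[i]);
        return 0.5 * sum;
    }

As mentioned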
above, the activation function for the hidden units is the sigmoid function:

$$g(a) = \frac{1}{1 + e^{-a}}$$

This function has a
very useful feature: its derivative can be expressed in the following form:

$$g'(a) = g(a)\,(1 - g(a))$$

The above two equations can be easily coded:

    public double sigmoid(double x) {
        // Clamp extreme arguments instead of evaluating Math.exp
        if (x > 50.) return 1.0;
        if (x < -50.) return 0.0;
        return 1.0 / (1.0 + Math.exp(-x));
    }

    // Note: x is the sigmoid output g(a), not the pre-activation a
    public double sigmoidDerivative(double x) {
        return x * (1.0 - x);
    }

The first step for the
back-propagation is forward propagation:

    void feedForward() {
        // For hidden units
        for (int i = 0; i < numberOfHiddenUnit; i++) {
            double sum = 0.0;
            for (int j = 0; j < numberOfInputUnit + 1; j++) {
                if (j == numberOfInputUnit)
                    sum += weightLayer1[j][i]; // Include the bias term
                else
                    sum += weightLayer1[j][i] * inputs[j];
            }
            hiddens[i] = sigmoid(sum);
        }
        // For output units
        for (int i = 0; i < numberOfOutputUnit; i++) {
            double sum = 0.0;
            for (int j = 0; j < numberOfHiddenUnit; j++) {
                sum += weightLayer2[j][i] * hiddens[j];
            }
            // Sigmoid output when outputActivationType == 1 (matching the check
            // in backpropagation), linear output otherwise
            outputs[i] = (outputActivationType == 1) ? sigmoid(sum) : sum;
        }
    }

The second step is error back-propagation. Using the
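expression derived from (3.13) and (3.16), we obtain the following results. For
the output units, the δ’s are given by

$$\delta_k = y_k - t_k$$

for a linear output unit (for a sigmoid output unit this is multiplied by $y_k (1 - y_k)$), while for units in the
hidden layer the δ’s are found using

$$\delta_j = z_j (1 - z_j) \sum_{k} w_{kj}\, \delta_k$$

where $z_j$ is the output of hidden unit $j$, $y_k$ and $t_k$ are the output and target of output unit $k$, and $w_{kj}$ is the second-layer weight from hidden unit $j$ to output unit $k$. The derivatives with
respect to the first layer and second layer weights are then given by

$$\frac{\partial E}{\partial w_{ji}} = \delta_j x_i, \qquad \frac{\partial E}{\partial w_{kj}} = \delta_k z_j$$

where $x_i$ is the $i$-th input and $w_{ji}$ is the first-layer weight from input $i$ to hidden unit $j$. We use the gradient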
descent algorithm with momentum (3.18) to update the weights:

    void backpropagation(double rate, double alpha) {
        double[] delta1 = new double[numberOfHiddenUnit];
        double[] delta2 = new double[numberOfOutputUnit];
        // Delta for second layer (target minus output, so that adding
        // rate * delta to a weight moves down the error gradient)
        for (int j = 0; j < numberOfOutputUnit; j++) {
            delta2[j] = targets[j] - outputs[j];
        }
        // Delta for first layer
        for (int j = 0; j < numberOfHiddenUnit; j++) {
            double sum = 0.0;
            for (int k = 0; k < numberOfOutputUnit; k++) {
                double term = delta2[k] * weightLayer2[j][k];
                if (outputActivationType == 1)
                    term *= sigmoidDerivative(outputs[k]);
                sum += term;
            }
            delta1[j] = sum;
        }
        // Update the second layer weights
        for (int i = 0; i < numberOfHiddenUnit; i++) {
            for (int j = 0; j < numberOfOutputUnit; j++) {
                double delta = delta2[j] * hiddens[i];
                if (outputActivationType == 1)
                    delta *= sigmoidDerivative(outputs[j]);
                double weightChange = rate * delta + alpha * momentum2[i][j];
                weightLayer2[i][j] += weightChange;
                momentum2[i][j] = weightChange;
            }
        }
        // Update the first layer weights
        for (int i = 0; i < numberOfInputUnit + 1; i++) {
            for (int j = 0; j < numberOfHiddenUnit; j++) {
                if (i != numberOfInputUnit && inputs[i] == 0) {
                    momentum1[i][j] = 0.;
                } else {
                    double delta = delta1[j] * sigmoidDerivative(hiddens[j]);
                    if (i != numberOfInputUnit)
                        delta *= inputs[i];
                    double weightChange = rate * delta + alpha * momentum1[i][j];
                    weightLayer1[i][j] += weightChange;
                    momentum1[i][j] = weightChange;
                }
            }
        }
    }

The batch learning method was adopted to train the networks.

5.2.5 Network Generalization

The split-sample (or hold-out validation) method [5]
is used to estimate generalization error. With this method, part of the
data is reserved as a test set that will not be used in the training. After
training, run the network on the test set. The error on the test set provides
us an unbiased estimate of the generalization error, with which we can decide
whether the model is sufficiently general.

5.3 Features of the System

The TellFuture load forecast system has several useful features:

· It checks the importance of each effect
· It helps find the optimal number of hidden units
· It checks the influence of the learning rate and momentum
· It displays the training and forecasting results in graphic and tabular form
· It displays training errors

Figure 5‑2 is the main screen of the TellFuture system.
Figure 5‑2 TellFuture forecast system main screen

5.3.1 Effects Setup

We can specify which effects are included in the model by selecting the Setup->Effect Setup menu; a dialog as shown in Figure 5‑3 will appear. Selecting or deselecting each factor helps us evaluate its importance. This provides an efficient way to decide whether an effect should be included in the model when we are not sure. For this model, the default setting, which includes all the effects, is the best.
5.3.2 Network Parameters Setup

Once the input units and the output units are known, we can use the “Network Setup” dialog (Figure 5‑4) to find the optimal number of hidden units. This is a trial-and-error process. We start with a large number of epochs and a small number of hidden units, train the network, and record the minimum training error that the network can reach. We then gradually increase the number of hidden units and repeat the process until the training error no longer changes significantly. We can also use this setup to study the influence of the learning rate and momentum on the training time, and to choose different activation functions for the output units to see whether they perform similarly.
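The trial-and-error search described above can be sketched in code. This is a minimal illustration rather than part of TellFuture: the class HiddenUnitSearch, the method select, and the minImprovement threshold are hypothetical, and the trainError callback is assumed to train a fresh network with the given number of hidden units and return the minimum training error it reached.

```java
import java.util.function.IntToDoubleFunction;

// Hypothetical helper illustrating the trial-and-error search; not part of TellFuture.
class HiddenUnitSearch {
    /**
     * Increases the hidden-unit count from minUnits to maxUnits and returns the
     * last count whose addition still reduced the minimum training error by at
     * least minImprovement.
     */
    static int select(int minUnits, int maxUnits, double minImprovement,
                      IntToDoubleFunction trainError) {
        int best = minUnits;
        double previous = trainError.applyAsDouble(minUnits);
        for (int h = minUnits + 1; h <= maxUnits; h++) {
            double current = trainError.applyAsDouble(h);
            if (previous - current < minImprovement) {
                return best; // going to h units no longer helps noticeably
            }
            best = h;
            previous = current;
        }
        return best;
    }

    public static void main(String[] args) {
        // Made-up error curve for 1..5 hidden units, for demonstration only
        double[] errors = {0.9, 0.5, 0.3, 0.29, 0.289};
        System.out.println(select(1, 5, 0.05, h -> errors[h - 1])); // 3
    }
}
```

In practice the callback would wrap a full training run, so each step of the loop is expensive; the threshold plays the role of "no big change" in the manual procedure.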
5.3.3 Training the Network

From the main screen menu File->Load Training Set, training data, which include both input values and target values, can be loaded into the system. The data are stored in the format described previously. After setting up the effects and network parameters, we can click the Run->Training menu to start the training process. The process stops upon reaching the error tolerance or the epoch limit. The training error for each epoch is shown in the prompt while training is in progress. Once the training process finishes, we can click the Display->Display Training Results menu to show the training results in graphic form (Figure 5‑5) or tabular form (Figure 5‑6). The hourly training errors are shown in column 4 (Figure 5‑6).
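The stopping rule described above (error tolerance or epoch limit, whichever comes first) can be sketched as a simple loop. The class TrainingLoop, the method train, and the runEpoch callback are hypothetical names for this illustration; runEpoch is assumed to perform one pass over the training set and return the resulting training error.

```java
import java.util.function.DoubleSupplier;

// Hypothetical sketch of the stopping rule; names are illustrative only.
class TrainingLoop {
    /**
     * Runs epochs until the epoch error is at or below errorTolerance,
     * or until epochLimit epochs have been run.
     * Returns the number of epochs actually run.
     */
    static int train(double errorTolerance, int epochLimit, DoubleSupplier runEpoch) {
        int epoch = 0;
        while (epoch < epochLimit) {
            double error = runEpoch.getAsDouble();
            epoch++;
            System.out.println("epoch " + epoch + " error " + error);
            if (error <= errorTolerance) {
                break; // tolerance reached before the epoch limit
            }
        }
        return epoch;
    }

    public static void main(String[] args) {
        // Stand-in for a real training pass: the error halves each epoch
        double[] error = {1.0};
        int epochs = train(0.1, 9999, () -> error[0] *= 0.5);
        System.out.println("stopped after " + epochs + " epochs"); // 4 epochs
    }
}
```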
Figure 5‑5 Training
result graphic display
Figure 5‑6 Training result tabular display

5.3.4 Testing the Network

Once we finish training the network, we need to run a generalization test to see whether the model is sufficiently general. This can be done by first loading the test set from the menu File->Load Test Set, then clicking the Run->Testing menu to run the system, and finally clicking the Display->Display Testing Results menu to show the testing results. The results are displayed in a format similar to Figure 5‑5 and Figure 5‑6.

5.3.5 Future Load Forecasting

After the network has been well trained and has passed the generalization test, we can load the future predicted input data from the menu File->Load Predicted Inputs, and then click the Run->Predict menu to get the forecasted loads. The results can be displayed in graphic form (Figure 5‑7) or tabular form, which is similar to Figure 5‑6 but without the “Actual Load” and “Error%” columns.
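The “Error%” column in the training and testing tables compares actual and forecast loads. How TellFuture computes it is not stated here; the sketch below assumes it is the absolute difference expressed as a percentage of the actual load, and the class ForecastError and its methods are hypothetical.

```java
// Hypothetical helper; assumes Error% = 100 * |actual - forecast| / |actual|.
class ForecastError {
    /** Absolute percentage error for one hour; actual must be non-zero. */
    static double percentError(double actual, double forecast) {
        return 100.0 * Math.abs(actual - forecast) / Math.abs(actual);
    }

    /** Mean of the hourly percentage errors over equal-length arrays. */
    static double meanPercentError(double[] actual, double[] forecast) {
        double sum = 0.0;
        for (int i = 0; i < actual.length; i++) {
            sum += percentError(actual[i], forecast[i]);
        }
        return sum / actual.length;
    }

    public static void main(String[] args) {
        // Made-up loads for demonstration only
        double[] actual = {200.0, 100.0};
        double[] forecast = {190.0, 110.0};
        System.out.println(meanPercentError(actual, forecast)); // 7.5
    }
}
```

Under this assumption, a mean percentage error below 10 would correspond to a forecast accuracy above 90%.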
Figure 5‑7 Load forecast graphic display

5.4 Analysis of Results

1. The multi-layered feed-forward networks perform very well for short-term gas load forecasting. The forecast accuracy has been in excess of 90% for this model. The weather and the calendar information have a great impact on the load, especially the temperature.

2. A reasonable number of hidden units for this model is between 7 and 12, and we recommend 10. Performance is poor with fewer than 5 hidden units, while using more than 12 gives no significant decrease in the training error (Figure 5‑8).
Figure 5‑8 Average squared error vs. hidden units

3. Learning rate and momentum have a significant influence on the training process with the gradient descent method. Table 5‑1 shows the number of epochs that the training process took to meet the error tolerance (average squared error of 0.0005) or reach the epoch limit (9999) for different values of learning rate and momentum, where each pair was tested 10 times. It is easy to see that learning rates that are too large or too small converge slowly, while a high momentum helps a small learning rate converge faster. The best learning rate and momentum term are 0.8 and 0.1 respectively for this model.

4. There are no big differences between using a sigmoid activation function and a linear activation function for the output unit.

Table 5‑1 Training epochs vs. learning rate and momentum
Table 5‑1 (Continued)
|
|