2         Fundamentals of Neural Networks

Neural networks, sometimes referred to as connectionist models, are parallel-distributed models that have several distinguishing features [3]:

1)      A set of processing units;

2)      An activation state for each unit, which is equivalent to the output of the unit;

3)      Connections between the units. Generally each connection is defined by a weight $w_{jk}$ that determines the effect that the signal of unit j has on unit k;

4)      A propagation rule, which determines the effective input of the unit from its external inputs;

5)      An activation function, which determines the new level of activation based on the effective input and the current activation;

6)      An external input (bias, offset) for each unit;

7)      A method for information gathering (learning rule);

8)      An environment within which the system operates, and which provides input signals and, if necessary, error signals.

2.1      Processing Unit

A processing unit (Figure 2‑1), also called a neuron or node, performs a relatively simple job; it receives inputs from neighbors or external sources and uses them to compute an output signal that is propagated to other units.

Figure 2‑1 Processing unit

Within a neural system there are three types of units:

1)      Input units, which receive data from outside of the network;

2)      Output units, which send data out of the network;

3)      Hidden units, whose input and output signals remain within the network.

Each unit j can have one or more inputs $x_0, x_1, x_2, \ldots, x_n$, but only one output $z_j$. An input to a unit is either data from outside the network, the output of another unit, or its own output.

2.2      Combination Function

Each non-input unit in a neural network combines the values fed into it via synaptic connections from other units, producing a single value called the net input. The function that combines these values is called the combination function, and it is defined by a propagation rule. In most neural networks we assume that each unit provides an additive contribution to the input of the units to which it is connected. The total input to unit j is then simply the weighted sum of the separate outputs $z_i$ of the connected units plus a threshold or bias term $\theta_j$:

$$\mathrm{net}_j = \sum_i w_{ji} z_i + \theta_j \tag{2.1}$$

A contribution with positive $w_{ji}$ is considered an excitation, and one with negative $w_{ji}$ an inhibition. We call units with the above propagation rule sigma units.
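The weighted-sum rule of a sigma unit can be sketched in a few lines of Python. This is only an illustration: the function name `net_input` and the sample numbers are assumptions and do not come from the original text.

```python
def net_input(weights, inputs, bias):
    """Net input of one sigma unit: weighted sum of its inputs plus a bias."""
    if len(weights) != len(inputs):
        raise ValueError("weights and inputs must have the same length")
    return sum(w * x for w, x in zip(weights, inputs)) + bias


# Hypothetical unit with three incoming connections.
# The second weight is negative, so that connection is inhibitory.
weights = [0.5, -1.0, 2.0]
inputs = [1.0, 1.0, 0.5]
print(net_input(weights, inputs, bias=0.1))
```

Raising the input on the second connection lowers the net input, while raising either of the others increases it.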

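In some cases more complex rules for combining inputs are used. One such propagation rule, known as sigma-pi, has the following form [3]:

$$\mathrm{net}_j = \sum_i w_{ji} \prod_k z_{i_k} + \theta_j \tag{2.2}$$

where each weight $w_{ji}$ multiplies the product of a group of unit outputs $z_{i_k}$, and $\theta_j$ is the bias term of unit j.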
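A sigma-pi unit is commonly described as one in which each weight multiplies the product of a group of inputs, and the weighted products are then summed together with the bias. The sketch below assumes that form. The function name `sigma_pi_input` and the grouping of inputs into lists are illustrative choices, not notation from the original text.

```python
from math import prod


def sigma_pi_input(weights, input_groups, bias):
    """Net input of a sigma-pi unit.

    Each weight multiplies the product of one group of inputs;
    the weighted products are summed and the bias is added.
    """
    if len(weights) != len(input_groups):
        raise ValueError("need exactly one weight per input group")
    return sum(w * prod(group) for w, group in zip(weights, input_groups)) + bias


# Two hypothetical groups of two inputs each: 1.0*(2*3) + 2.0*(0.5*4) + 1.0
print(sigma_pi_input([1.0, 2.0], [[2.0, 3.0], [0.5, 4.0]], bias=1.0))
```

When every group contains a single input, the rule reduces to the ordinary weighted sum of a sigma unit.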

Many combination functions use a "bias" or "threshold" term in computing the net input to the unit. For a linear output unit, a bias term is equivalent to an intercept in a regression model. It is needed in much the same way as the constant polynomial ‘1’ is required for approximation by polynomials.
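To illustrate the role of the bias as an intercept, consider a single linear unit with one input. The sketch below is a toy example with hypothetical names and values: without a bias the unit's output at zero input is always zero, whatever the weight, so it can only represent lines through the origin.

```python
def linear_unit(x, weight, bias=0.0):
    """Output of a linear unit with one input: weight * x + bias."""
    return weight * x + bias


# Without a bias the output at x = 0 is 0 for every choice of weight.
print([linear_unit(0.0, w) for w in (-2.0, 0.5, 3.0)])

# With a bias the line is shifted, exactly like a regression intercept.
print(linear_unit(0.0, weight=3.0, bias=2.0))
```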

2.3      Activation Function

Most units in a neural network transform their net inputs by using a scalar-to-scalar function called an activation function, yielding a value called the unit's activation. Except possibly for output units, the activation value is fed to one or more other units. Activation functions with a bounded range are often called squashing functions. Some of the most commonly used activation functions are [4]:

1)       Identity function (Figure 2‑2)

$$g(x) = x \tag{2.3}$$

Input units use the identity function, since they simply pass their data on unchanged. Sometimes the net input is multiplied by a constant to form a linear function.

Figure 2‑2 Identity function

2) Binary step function (Figure 2‑3)

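Also known as the threshold function or Heaviside function. The output of this function is limited to one of two values:

$$g(x) = \begin{cases} 1 & \text{if } x \ge \theta \\ 0 & \text{if } x < \theta \end{cases} \tag{2.4}$$

where $\theta$ is the threshold value.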

This kind of function is often used in single layer networks. 

Figure 2‑3 Binary step function

3) Sigmoid function (Figure 2‑4)

$$g(x) = \frac{1}{1 + e^{-x}} \tag{2.5}$$

This function is especially advantageous for use in neural networks trained by back-propagation because it is easy to differentiate, which can substantially reduce the computational burden of training. It suits applications whose desired output values lie between 0 and 1.

Figure 2‑4 Sigmoid function

4) Bipolar sigmoid function (Figure 2‑5)

$$g(x) = \frac{1 - e^{-x}}{1 + e^{-x}} \tag{2.6}$$

This function has properties similar to those of the sigmoid function. It works well for applications that yield output values in the range [-1,1].

Figure 2‑5 Bipolar sigmoid function
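The four activation functions listed above can be sketched in Python as follows. The definitions are the commonly used textbook forms with unit steepness. The threshold default of 0 and all function names are assumptions made for illustration. The last function shows why the sigmoid is easy to differentiate: its derivative can be written in terms of its own output.

```python
import math


def identity(x):
    """Identity activation: the output equals the net input."""
    return x


def binary_step(x, theta=0.0):
    """Binary step: 1 at or above the threshold theta, otherwise 0."""
    return 1.0 if x >= theta else 0.0


def sigmoid(x):
    """Logistic sigmoid with outputs in (0, 1).

    Note: math.exp overflows for very large negative x (below about -709).
    """
    return 1.0 / (1.0 + math.exp(-x))


def bipolar_sigmoid(x):
    """Bipolar sigmoid with outputs in (-1, 1): (1 - e^-x) / (1 + e^-x)."""
    return 2.0 * sigmoid(x) - 1.0


def sigmoid_derivative(x):
    """Derivative of the sigmoid, computed from its output: s * (1 - s)."""
    s = sigmoid(x)
    return s * (1.0 - s)


for x in (-2.0, 0.0, 2.0):
    print(x, identity(x), binary_step(x), sigmoid(x), bipolar_sigmoid(x))
```

Because the derivative reuses the already computed activation, a back-propagation pass needs only one extra multiplication per unit.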

Activation functions for the hidden units are needed to introduce non-linearity into the network. Without them, hidden layers would add nothing, because a composition of linear functions is again a linear function. It is the non-linearity (i.e., the capability to represent nonlinear functions) that makes multi-layer networks so powerful. Almost any nonlinear function does the job, although for back-propagation learning it must be differentiable and it helps if the function is bounded (see Section 3.4). The sigmoid functions are the most common choices [5].
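The claim that stacked linear layers collapse into a single linear layer can be checked numerically. The sketch below uses small hypothetical weight matrices and omits biases for brevity (with biases the composition is still a single affine map). All names and values are illustrative.

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    columns = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in columns] for row in a]


# Hypothetical weights: 2 inputs -> 3 linear hidden units -> 1 linear output.
W1 = [[1.0, 2.0], [0.0, -1.0], [3.0, 1.0]]
W2 = [[1.0, -1.0, 0.5]]
x = [2.0, -3.0]

two_layer = matvec(W2, matvec(W1, x))   # pass through both layers in turn
collapsed = matvec(matmul(W2, W1), x)   # one equivalent single layer

print(two_layer, collapsed)
```

Both computations give the same output, so the hidden layer contributes no extra representational power unless a nonlinear activation is applied between the two multiplications.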

For the output units, activation functions should be chosen to suit the distribution of the target values. We have already seen that for binary (0 or 1) outputs, the sigmoid function is an excellent choice. For continuous-valued targets with a bounded range, the sigmoid functions are again useful, provided that either the outputs or the targets are scaled to the range of the output activation function. But if the target values have no known bounded range, it is better to use an unbounded activation function, most often the identity function (which amounts to no activation function). If the target values are positive but have no known upper bound, an exponential output activation function can be used [5].
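Scaling bounded targets to the range of a sigmoid output unit can be done with a simple min-max transformation. The following is a minimal sketch under the assumption that the minimum and maximum of the training targets bound the range. The helper names are hypothetical.

```python
def fit_scaler(targets):
    """Return the (low, high) bounds of the training targets."""
    low, high = min(targets), max(targets)
    if high == low:
        raise ValueError("targets must span a non-zero range")
    return low, high


def scale(target, low, high):
    """Map a target from [low, high] to [0, 1], the range of the sigmoid."""
    return (target - low) / (high - low)


def unscale(output, low, high):
    """Map a network output in [0, 1] back to the original target range."""
    return low + output * (high - low)


targets = [10.0, 20.0, 30.0]
low, high = fit_scaler(targets)
scaled = [scale(t, low, high) for t in targets]
print(scaled)
print([unscale(s, low, high) for s in scaled])
```

For a bipolar sigmoid output unit the same idea applies, with the scaled value mapped to [-1, 1] instead.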

2.4      Network Topologies

The topology of a network is defined by the number of layers, the number of units per layer, and the interconnection patterns between layers. Networks are generally divided into two categories based on the pattern of connections:

1) Feed-forward networks (Figure 3‑1), where the data flow from input units to output units is strictly feed-forward. The data processing can extend over multiple layers of units, but no feedback connections are present. That is, connections extending from outputs of units to inputs of units in the same layer or previous layers are not permitted. Feed-forward networks are the main focus of this thesis. Details will be described in Chapter 3.

2) Recurrent networks (Figure 2‑6), which contain feedback connections. Contrary to feed-forward networks, the dynamical properties of the network are important. In some cases, the activation values of the units undergo a relaxation process such that the network will evolve to a stable state in which activation does not change further. In other applications in which the dynamical behavior constitutes the output of the network, the changes of the activation values of the output units are significant.

Figure 2‑6 Recurrent neural network
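The relaxation process of a recurrent network can be illustrated with a toy example. The sketch below assumes a small, fully hypothetical network of two units with tanh activations and weak feedback weights. The weights are chosen so that the repeated update is guaranteed to settle. It is not a model of any specific architecture from the text.

```python
import math


def relax(weights, external, tol=1e-9, max_steps=1000):
    """Update all units repeatedly until the activations stop changing.

    weights[j][i] is the feedback weight from unit i to unit j;
    external[j] is the constant external input to unit j.
    Returns the stable state and the number of update steps used.
    """
    state = [0.0] * len(external)
    for step in range(max_steps):
        new_state = [
            math.tanh(sum(w * s for w, s in zip(row, state)) + x)
            for row, x in zip(weights, external)
        ]
        change = max(abs(n - s) for n, s in zip(new_state, state))
        state = new_state
        if change < tol:
            return state, step + 1
    raise RuntimeError("network did not settle within max_steps")


# Weak feedback weights, so each update shrinks the change in activation.
weights = [[0.0, 0.5], [-0.5, 0.0]]
external = [0.3, -0.2]
state, steps = relax(weights, external)
print(state, steps)
```

With stronger feedback weights the same loop may oscillate or fail to settle, which is why the dynamical properties of recurrent networks need separate attention.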

2.5      Network Learning

The functionality of a neural network is determined by the combination of the topology (number of layers, number of units per layer, and the interconnection pattern between the layers) and the weights of the connections within the network. The topology is usually held fixed, and the weights are determined by a training algorithm. The process of adjusting the weights to make the network learn the relationship between the inputs and targets is called learning, or training. Many learning algorithms have been invented to help find an optimum set of weights that solves the problem at hand. They can roughly be divided into two main groups:

1)      Supervised Learning - The network is trained by providing it with inputs and desired outputs (target values). These input-output pairs are provided by an external teacher, or by the system containing the network. The difference between the actual outputs and the desired outputs is used by the algorithm to adapt the weights in the network (Figure 2‑7). Supervised learning is often posed as a function approximation problem: given training data consisting of pairs of an input pattern x and a corresponding target t, the goal is to find a function f(x) that matches the desired response for each training input.

Figure 2‑7 Supervised learning model

2)      Unsupervised Learning - With unsupervised learning, there is no feedback from the environment to indicate whether the outputs of the network are correct. The network must discover features, regularities, correlations, or categories in the input data automatically. In fact, for most varieties of unsupervised learning, the targets are the same as the inputs. In other words, unsupervised learning usually performs the same task as an auto-associative network, compressing the information from the inputs.
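The supervised case, in which the difference between desired and actual output drives the weight changes, can be sketched with a single linear unit trained by a simple error-correction (delta) rule. This is one illustrative learning rule chosen for brevity, not necessarily the algorithm used elsewhere in this work. The data, learning rate, and names are assumptions.

```python
def train_linear_unit(samples, learning_rate=0.05, epochs=2000):
    """Fit output = weight * x + bias to (input, target) pairs.

    After each pattern, the weight and bias are moved in proportion to
    the error between the target and the actual output.
    """
    weight, bias = 0.0, 0.0
    for _ in range(epochs):
        for x, target in samples:
            output = weight * x + bias
            error = target - output          # desired minus actual output
            weight += learning_rate * error * x
            bias += learning_rate * error
    return weight, bias


# Hypothetical training pairs generated by the function t = 2x + 1.
samples = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]
weight, bias = train_linear_unit(samples)
print(weight, bias)
```

On this noise-free data the learned weight and bias approach 2 and 1, so the unit reproduces the target for each training input.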

2.6      Objective Function

To train a network and measure how well it performs, an objective function (or cost function) must be defined to provide an unambiguous numerical rating of system performance. Selection of an objective function is very important because the function represents the design goals and determines which training algorithms can be used. Developing an objective function that measures exactly what we want is not an easy task. A few basic functions are very commonly used. One of them is the sum of squares error function,

$$E = \sum_p \sum_i (t_{pi} - y_{pi})^2 \tag{2.7}$$

where p indexes the patterns in the training set, i indexes the output nodes, and $t_{pi}$ and $y_{pi}$ are, respectively, the target and actual network output for the i-th output unit on the p-th pattern. In real-world applications, it may be necessary to complicate the function with additional terms to control the complexity of the model.
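The sum of squares error can be computed directly from its description: squared differences between target and output are added up over all output units and all patterns. The sketch below assumes the plain double sum with no scaling factor (some texts include a factor of 1/2). The function name and sample values are illustrative.

```python
def sum_of_squares_error(targets, outputs):
    """Sum of squared differences over all patterns p and output units i.

    targets[p][i] and outputs[p][i] are the target and the actual network
    output for output unit i on pattern p.
    """
    if len(targets) != len(outputs):
        raise ValueError("targets and outputs must cover the same patterns")
    return sum(
        (t - y) ** 2
        for t_row, y_row in zip(targets, outputs)
        for t, y in zip(t_row, y_row)
    )


# Two hypothetical patterns with two output units each.
targets = [[1.0, 0.0], [0.0, 1.0]]
outputs = [[0.5, 0.0], [0.0, 1.0]]
print(sum_of_squares_error(targets, outputs))
```

The error is zero only when every output matches its target exactly, and it grows quadratically with the size of each deviation.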



Copyright ©2000-2007 Zhanshou Yu. All Rights Reserved.