- Introduction to neural networks
- Blog main content
- Neural network framework
- Forward propagation in neural networks
- Backpropagation in neural networks
- Summary
- References

Today, neural networks have become a very large subject area [1]. Their contents cannot be summarized by a single "algorithm" or "framework". From early neurons (neuron), to the perceptron, then to BP neural networks, and on to deep learning, this is the general line of evolution. Although the models differ from era to era, the ideas of propagation, such as forward propagation of values and backward propagation of errors, are the same.

This blog mainly introduces the forward propagation and error backpropagation processes of a neural network. Using a simple neural network with a single hidden layer, a detailed numerical example demonstrates each step. For each step, I also provide the TensorFlow [2] implementation code.

For a single neuron:

TensorFlow code:

```
def multilayer_perceptron(x, weights, biases):
    layer_1 = tf.add(tf.matmul(x, weights["h1"]), biases["b1"])
    layer_1 = tf.nn.sigmoid(layer_1)
    out_layer = tf.add(tf.matmul(layer_1, weights["out"]), biases["out"])
    layer_2 = tf.nn.sigmoid(out_layer)
    return layer_2
```


(1) Determine the input data and the ground-truth labels

```
X = [[1, 2], [3, 4]]
Y = [[0, 1], [0, 1]]
```


Clearly batch_size = 2. The first sample of the batch is

x1 = 1

x2 = 2

(2) Initialize the weights and biases

Figure 1 shows that the number of weights is 8

```
weights = {
    'h1': tf.Variable([[0.15, 0.16], [0.17, 0.18]], name="h1"),
    'out': tf.Variable([[0.15, 0.16], [0.17, 0.18]], name="out")
}
biases = {
    'b1': tf.Variable([0.1, 0.1], name="b1"),
    'out': tf.Variable([0.1, 0.1], name="out")
}
```


(3) A forward propagation example

Taking the first hidden-layer neuron as an example, its weighted input is

net_h1 = 0.15 × 1 + 0.17 × 2 + 0.1 = 0.59

and its activation is

out_h1 = sigmoid(0.59) ≈ 0.64336514

Similarly, for the second hidden-layer neuron,

net_h2 = 0.16 × 1 + 0.18 × 2 + 0.1 = 0.62, out_h2 = sigmoid(0.62) ≈ 0.65021855

Finally, the output of the hidden layer is

M = [[0.64336514, 0.65021855]]

TensorFlow code for forward propagation:

```
import tensorflow as tf
import numpy as np

x = [1, 2]
weights = [[0.15, 0.16], [0.17, 0.18]]
b = [0.1, 0.1]

X = tf.placeholder("float", [None, 2])
W = tf.placeholder("float", [2, 2])
bias = tf.placeholder("float", [None, 2])
mid_value = tf.add(tf.matmul(X, W), bias)
result = tf.nn.sigmoid(mid_value)

with tf.Session() as sess:
    x = np.array(x).reshape(1, 2)
    b = np.array(b).reshape(1, 2)
    mid_value, result = sess.run([mid_value, result],
                                 feed_dict={X: x, W: weights, bias: b})
    print(mid_value)
    print(result)
```


In the same way, we get the prediction of the output layer:

pred = [[0.57616305, 0.57931882]]

(4) Error calculation

Many error functions are available; this example uses the mean squared error (mean squared error, MSE) function.

Note the difference between the mean squared error and the sum-squared error (sum-squared error): the former divides the sum of squared errors by the number of outputs.

The error produced by the mean squared error is

loss = ((0.57616305 − 0)² + (0.57931882 − 1)²) / 2 ≈ 0.254468
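The forward pass and this loss can be checked with a small NumPy sketch (the variable names here are mine, not from the original TensorFlow code):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x = np.array([[1.0, 2.0]])                    # first sample of the batch
y = np.array([[0.0, 1.0]])                    # its ground-truth label
W1 = np.array([[0.15, 0.16], [0.17, 0.18]])   # hidden-layer weights
b1 = np.array([0.1, 0.1])
W2 = np.array([[0.15, 0.16], [0.17, 0.18]])   # output-layer weights
b2 = np.array([0.1, 0.1])

hidden = sigmoid(x @ W1 + b1)                 # hidden-layer output M
pred = sigmoid(hidden @ W2 + b2)              # output-layer prediction
loss = np.mean((pred - y) ** 2)               # mean squared error
print(hidden, pred, loss)
```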

Given the network's actual output and the expected output, we need to optimize the parameters of the neural network according to this error. The BP (backpropagation) algorithm is the core of neural networks; almost all neural network models are optimized with this algorithm or an improved version of it. The BP algorithm is based on the gradient descent (gradient descent) strategy, which adjusts each parameter in the negative direction of the gradient of the error with respect to that parameter.

We use a given learning rate of 0.5 for every update.

(1) First, update the weights (weight) of the output layer

According to the chain rule, for an output-layer weight w (taking the weight 0.15 connecting the first hidden neuron to the first output neuron as an example),

∂loss/∂w = ∂loss/∂out_o1 × ∂out_o1/∂net_o1 × ∂net_o1/∂w

The first term is the derivative of the mean squared error function:

∂loss/∂out_o1 = out_o1 − y1 = 0.57616305 − 0 = 0.57616305

The second term is the derivative of the activation function:

∂out_o1/∂net_o1 = out_o1 × (1 − out_o1) ≈ 0.24420

The third term is the corresponding hidden-layer output:

∂net_o1/∂w = out_h1 = 0.64336514

So

∂loss/∂w ≈ 0.57616305 × 0.24420 × 0.64336514 ≈ 0.09052

Updating with learning rate 0.5:

w ← 0.15 − 0.5 × 0.09052 ≈ 0.10474

In other words, the weight moves in the negative direction of its gradient. The other output-layer weights are updated in the same way.
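This output-layer gradient step can be sketched in NumPy (variable names are mine; intermediate values may differ slightly from a hand calculation because of rounding):

```python
import numpy as np

out_h = np.array([0.64336514, 0.65021855])   # hidden-layer outputs
pred = np.array([0.57616305, 0.57931882])    # output-layer predictions
y = np.array([0.0, 1.0])                     # ground truth

# delta for each output neuron: (pred - y) times the sigmoid derivative
delta_o = (pred - y) * pred * (1.0 - pred)

# gradient for the weight from hidden neuron 1 to output neuron 1
grad_w = out_h[0] * delta_o[0]
new_w = 0.15 - 0.5 * grad_w                  # gradient-descent step, lr = 0.5
print(grad_w, new_w)
```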

(2) Update the biases of the output layer

According to the chain rule, and noting that ∂net_o1/∂b = 1,

∂loss/∂b = ∂loss/∂out_o1 × ∂out_o1/∂net_o1 × 1 ≈ 0.57616305 × 0.24420 ≈ 0.14070

Update:

b ← 0.1 − 0.5 × 0.14070 ≈ 0.02965

The other output-layer bias is updated in the same way.

(3) Next, update the weights of the hidden layer

Take the weight 0.15 connecting input x1 to the first hidden neuron as an example. The hidden neuron's output out_h1 affects the total error through both output neurons, so for the total error,

∂loss/∂out_h1 = cost1 + cost2

where cost1 is the contribution through the first output neuron and cost2 the contribution through the second.

First, cost1:

cost1 = ∂loss/∂out_o1 × ∂out_o1/∂net_o1 × ∂net_o1/∂out_h1 ≈ 0.14070 × 0.15 ≈ 0.02110

(some of these terms have been worked out before, so we reuse them directly).

Then cost2:

cost2 = ∂loss/∂out_o2 × ∂out_o2/∂net_o2 × ∂net_o2/∂out_h1 ≈ (−0.10252) × 0.16 ≈ −0.01640

In total,

∂loss/∂out_h1 ≈ 0.02110 − 0.01640 = 0.00470

Then calculate the second term, the derivative of the activation function:

∂out_h1/∂net_h1 = out_h1 × (1 − out_h1) ≈ 0.22944

And the third term, the input:

∂net_h1/∂w = x1 = 1

Merging the calculation,

∂loss/∂w ≈ 0.00470 × 0.22944 × 1 ≈ 0.00108

Update:

w ← 0.15 − 0.5 × 0.00108 ≈ 0.14946

Similarly for the other hidden-layer weights.
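The hidden-layer gradient can likewise be sketched in NumPy (again with my own variable names, reusing the forward-pass values):

```python
import numpy as np

out_h = np.array([0.64336514, 0.65021855])   # hidden-layer outputs
pred = np.array([0.57616305, 0.57931882])    # output-layer predictions
y = np.array([0.0, 1.0])
W2 = np.array([[0.15, 0.16], [0.17, 0.18]])  # hidden-to-output weights
x1 = 1.0                                     # first input feature

delta_o = (pred - y) * pred * (1.0 - pred)   # output-layer deltas

# error flowing back into the first hidden neuron through both outputs
d_out_h1 = delta_o[0] * W2[0, 0] + delta_o[1] * W2[0, 1]   # cost1 + cost2
d_net_h1 = d_out_h1 * out_h[0] * (1.0 - out_h[0])          # times sigmoid'
grad_w1 = d_net_h1 * x1
new_w1 = 0.15 - 0.5 * grad_w1
print(grad_w1, new_w1)
```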

(4) Update the biases of the hidden layer

Again by the chain rule, and since ∂net_h1/∂b = 1,

∂loss/∂b ≈ 0.00470 × 0.22944 × 1 ≈ 0.00108

Update:

b ← 0.1 − 0.5 × 0.00108 ≈ 0.09946

The other hidden-layer bias is updated in the same way.

(1) At this point all the parameters have been updated. For the next batch [3, 4], forward propagation with the new parameters gives

loss = 0.238827

which is smaller than the first loss of 0.254468, and illustrates the effectiveness of gradient descent.
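The whole update step can be sketched end-to-end in NumPy; here I re-evaluate the loss on the same sample after one gradient-descent step to confirm that it decreases (the original post evaluates on the next batch instead):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, W1, b1, W2, b2):
    h = sigmoid(x @ W1 + b1)
    return h, sigmoid(h @ W2 + b2)

x = np.array([[1.0, 2.0]])
y = np.array([[0.0, 1.0]])
W1 = np.array([[0.15, 0.16], [0.17, 0.18]]); b1 = np.array([0.1, 0.1])
W2 = np.array([[0.15, 0.16], [0.17, 0.18]]); b2 = np.array([0.1, 0.1])
lr = 0.5

h, pred = forward(x, W1, b1, W2, b2)
loss_before = np.mean((pred - y) ** 2)

# backpropagation by the chain rule
delta_o = (pred - y) * pred * (1 - pred)     # output-layer deltas
delta_h = (delta_o @ W2.T) * h * (1 - h)     # hidden-layer deltas

W2 -= lr * h.T @ delta_o; b2 -= lr * delta_o[0]
W1 -= lr * x.T @ delta_h; b1 -= lr * delta_h[0]

_, pred2 = forward(x, W1, b1, W2, b2)
loss_after = np.mean((pred2 - y) ** 2)
print(loss_before, loss_after)
```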

(2) My hand-calculated results and the code's results differ by about 0.01 in some places; I guess this comes from rounding error in one of my numerical calculations.

If you find a mistake, please give me a hint. Thank you.

(3) Code details

The optimization method used in TensorFlow:

`optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate).minimize(cost)`


Cost definition

`cost = tf.reduce_mean(tf.pow(pred - y, 2))`


(4) The vanishing gradient problem (vanishing gradient problem)

The vanishing gradient problem means that during training the gradient becomes very small, or even 0, so that the parameter updates become too slow. The vanishing gradient is related to the activation function; usually we use ReLU or ReLU variants (such as PReLU [3]) to reduce it. In our example we use the logistic function, whose derivative is at most 0.25, always less than 1. From the previous calculations we can see the problem in the backward propagation process: each layer multiplies the gradient by another derivative less than 1.

As a result, the gradient becomes smaller and smaller as it propagates backwards whenever each derivative is less than 1. So one remedy for this problem is to use ReLU (whose derivative is 1 for positive inputs) instead of logistic-type functions.
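A small sketch of why the logistic derivative causes this (the 0.25 bound and the shrinking product are standard facts, not numbers from this post):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_deriv(z):
    s = sigmoid(z)
    return s * (1.0 - s)

# the logistic derivative peaks at z = 0 with value 0.25
zs = np.linspace(-10, 10, 10001)
max_deriv = sigmoid_deriv(zs).max()
print(max_deriv)

# one such factor per layer shrinks the gradient geometrically:
# even in the best case, 10 layers multiply in 0.25**10 (about 1e-6)
factors = [sigmoid_deriv(0.0)] * 10
print(np.prod(factors))
```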
