Batch normalization
Batch normalization (BN) is a method used to make the training of artificial neural networks faster and more stable by normalizing each layer's inputs through re-centering and re-scaling. It was proposed by Sergey Ioffe and Christian Szegedy in 2015.
Mechanism:
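- It normalizes the activations of each layer, feature by feature, using the mean and variance computed over the current mini-batch.
- It adds two trainable parameters per normalized feature, gamma and beta, which scale and shift the normalized output.
- It introduces some noise into the output of each layer, because the mean and variance are estimated from a different mini-batch at every step.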
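Written out for a single feature, the steps above look as follows. This is a sketch in the notation of the code further down (m data points in the mini-batch, scale gamma, offset beta, small constant epsilon); the symbols z_i, \mu_B and \sigma_B^2 are introduced here for illustration only.

```latex
\[
\mu_B = \frac{1}{m}\sum_{i=1}^{m} z_i,
\qquad
\sigma_B^2 = \frac{1}{m}\sum_{i=1}^{m} \left(z_i - \mu_B\right)^2
\]
\[
\hat{z}_i = \frac{z_i - \mu_B}{\sqrt{\sigma_B^2 + \epsilon}},
\qquad
y_i = \gamma\,\hat{z}_i + \beta
\]
```

With gamma = 1 and beta = 0 the output of each feature has approximately zero mean and unit variance over the mini-batch; training gamma and beta lets the network undo the normalization where that helps.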
Pros:
- It makes training faster and more stable, which often improves the final performance of the model.
- The noise from the mini-batch statistics acts as a mild regularizer, which can help reduce overfitting.
- It helps reduce the dependence on weight initialization.
- It works well with common optimization methods (SGD with momentum, RMSprop, Adam).
- It allows for higher learning rates.
Cons:
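- It adds computation and memory overhead to every layer it is applied to.
- It can cause problems with small batch sizes, because the batch mean and variance become noisy estimates.
- It is not well suited to RNNs / LSTMs.
- It needs a different calculation for training and testing: training uses the statistics of the current mini-batch, while testing uses running averages collected during training.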
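The last point is the one that most often trips people up, so here is a minimal NumPy sketch of it. The class name BatchNorm1D, the training flag, and the momentum value of 0.9 are assumptions made for illustration, not part of the original code; real frameworks differ in details such as the default momentum and whether the running variance is bias-corrected.

```python
import numpy as np


class BatchNorm1D:
    """Illustrative batch-norm layer that tracks running statistics."""

    def __init__(self, n_features, momentum=0.9, epsilon=1e-5):
        self.gamma = np.ones((1, n_features))    # trainable scale
        self.beta = np.zeros((1, n_features))    # trainable offset
        self.running_mean = np.zeros((1, n_features))
        self.running_var = np.ones((1, n_features))
        self.momentum = momentum                 # assumed value
        self.epsilon = epsilon

    def forward(self, Z, training):
        if training:
            # Training: use the statistics of the current mini-batch
            mean = Z.mean(axis=0, keepdims=True)
            var = Z.var(axis=0, keepdims=True)
            # Keep exponential moving averages for use at test time
            self.running_mean = (self.momentum * self.running_mean
                                 + (1 - self.momentum) * mean)
            self.running_var = (self.momentum * self.running_var
                                + (1 - self.momentum) * var)
        else:
            # Testing: use the stored averages, so even a single
            # example can be normalized
            mean, var = self.running_mean, self.running_var
        Z_norm = (Z - mean) / np.sqrt(var + self.epsilon)
        return self.gamma * Z_norm + self.beta


rng = np.random.default_rng(0)
bn = BatchNorm1D(n_features=3)

# Simulate training on mini-batches drawn with mean 5 and variance 4
for _ in range(200):
    batch = rng.normal(loc=5.0, scale=2.0, size=(64, 3))
    bn.forward(batch, training=True)

# At test time a single example is normalized with the running averages
single = np.array([[5.0, 5.0, 5.0]])
out = bn.forward(single, training=False)
```

Normalizing the single example with its own batch statistics would give a variance of zero, which is why the stored averages are used at test time.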
Examples:
- In a convolutional neural network (CNN), batch normalization can be applied after each convolutional layer.
- In a deep neural network (DNN), batch normalization can be applied after each fully connected layer.
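For the convolutional case, the usual convention is to keep one mean, variance, gamma and beta per channel and to pool the statistics over the batch and both spatial dimensions. A minimal NumPy sketch, assuming feature maps stored as (batch, height, width, channels); the function name batch_norm_conv and that layout are assumptions for illustration:

```python
import numpy as np


def batch_norm_conv(X, gamma, beta, epsilon=1e-5):
    """Normalize feature maps of shape (batch, height, width, channels)."""
    # One mean and variance per channel, pooled over the batch
    # and over every spatial position
    mean = X.mean(axis=(0, 1, 2), keepdims=True)
    var = X.var(axis=(0, 1, 2), keepdims=True)
    X_norm = (X - mean) / np.sqrt(var + epsilon)
    # gamma and beta have shape (1, 1, 1, channels) and broadcast
    return gamma * X_norm + beta


rng = np.random.default_rng(0)
X = rng.normal(loc=3.0, scale=4.0, size=(8, 5, 5, 16))
gamma = np.ones((1, 1, 1, 16))
beta = np.zeros((1, 1, 1, 16))
Y = batch_norm_conv(X, gamma, beta)
```

For a fully connected layer the same idea reduces to taking the statistics over axis 0 only, which is what the NumPy function in the next section does.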
Code using numpy:
import numpy as np


def batch_norm(Z, gamma, beta, epsilon):
    """
    Normalizes an unactivated layer output using batch normalization.

    @Z is a numpy.ndarray of shape (m, n) that should be normalized,
        where m is the number of data points
        and n is the number of features in Z
    @gamma is a numpy.ndarray of shape (1, n)
        containing the scales used for batch normalization
    @beta is a numpy.ndarray of shape (1, n)
        containing the offsets used for batch normalization
    @epsilon is a small number used to avoid division by zero
    Returns: the normalized Z matrix
    """
    # Calculate the per-feature mean and variance over the batch
    mean = np.mean(Z, axis=0)
    var = np.var(Z, axis=0)
    # Normalize Z to zero mean and unit variance
    Z_norm = (Z - mean) / np.sqrt(var + epsilon)
    # Scale and shift the normalized Z using gamma and beta
    Z_norm_scaled_shifted = gamma * Z_norm + beta
    return Z_norm_scaled_shifted
Code using tensorflow (1.x API; tf.contrib is not available in TensorFlow 2):
import tensorflow as tf


def create_batch_norm_layer(prev, n, activation):
    """
    @prev is the activated output of the previous layer
    @n is the number of nodes in the layer to be created
    @activation is the activation function that
        should be used on the output of the layer
    Returns: a tensor of the activated output for the layer
    """
    # Pass 'prev' through a dense layer with n nodes.
    # No bias is needed, because beta already shifts the output.
    k_init = tf.contrib.layers.variance_scaling_initializer(mode="FAN_AVG")
    layer = tf.layers.Dense(units=n,
                            kernel_initializer=k_init,
                            use_bias=False)(prev)
    # Initialize trainable parameters 'gamma' and 'beta' as vectors of 1
    # and 0 respectively, with shape (1, n)
    gamma = tf.Variable(initial_value=tf.ones(shape=(1, n)), name="gamma")
    beta = tf.Variable(initial_value=tf.zeros(shape=(1, n)), name="beta")
    # Calculate the batch mean and variance of the dense layer's output
    # using the tf.nn.moments() function
    mean, variance = tf.nn.moments(layer, axes=[0])
    # Normalize with tf.nn.batch_normalization(), which takes
    # the offset (beta) before the scale (gamma), and epsilon=1e-8
    layer_norm = tf.nn.batch_normalization(layer,
                                           mean,
                                           variance,
                                           beta,
                                           gamma,
                                           1e-8)
    # Apply the activation function to the normalized output
    output = activation(layer_norm)
    return output