Batch normalization

Batch normalization (BN) is a method used to make the training of artificial neural networks faster and more stable by normalizing the layers' inputs, i.e. re-centering and re-scaling them. It was proposed by Sergey Ioffe and Christian Szegedy in 2015.

Mechanism:

  • It normalizes each layer's inputs per feature, using the mean and variance of the current mini-batch (see the transform below).
  • It adds two trainable parameters per normalized feature, gamma and beta, which scale and shift the normalized output.
  • Because the statistics change from one mini-batch to the next, it introduces some noise into the output of each layer, which acts as a mild regularizer.
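
Concretely, for an input x over a mini-batch B with mean μ_B and variance σ²_B, the transform from the original paper is:

x̂ = (x − μ_B) / √(σ²_B + ε)
y = γ · x̂ + β

where ε is a small constant added to avoid division by zero, and γ and β are the trainable scale and offset.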

Pros:

  • It helps improve the performance of machine learning models by making training faster and more stable.
  • It helps reduce overfitting.
  • It helps reduce the dependence on initialization.
  • It works well with other optimization methods (SGD with momentum, RMSprop, Adam).
  • It allows higher learning rates to be used.

Cons:

  • It adds some computational overhead at both training and inference time.
  • It can cause problems when used with small batch sizes, because the mini-batch statistics become noisy estimates.
  • It does not work well with recurrent networks such as RNNs and LSTMs.
  • It requires a different calculation at training time (mini-batch statistics) and at test time (running averages of those statistics); see the sketch after this list.
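
To make the last point concrete, here is a minimal numpy sketch of the two modes. The names running_mean, running_var and momentum are illustrative choices, not something defined elsewhere in this post: training normalizes with the mini-batch statistics and updates exponential moving averages, while testing reuses those averages.

import numpy as np

def batch_norm_train(Z, gamma, beta, running_mean, running_var,
                     momentum=0.9, epsilon=1e-8):
    """Training mode: normalize with the mini-batch statistics and
    update the running averages that will be used at test time."""
    mean = np.mean(Z, axis=0)
    var = np.var(Z, axis=0)
    # exponential moving averages of the batch statistics
    running_mean = momentum * running_mean + (1 - momentum) * mean
    running_var = momentum * running_var + (1 - momentum) * var
    Z_norm = (Z - mean) / np.sqrt(var + epsilon)
    return gamma * Z_norm + beta, running_mean, running_var

def batch_norm_test(Z, gamma, beta, running_mean, running_var, epsilon=1e-8):
    """Test mode: reuse the running averages, no mini-batch statistics."""
    Z_norm = (Z - running_mean) / np.sqrt(running_var + epsilon)
    return gamma * Z_norm + beta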

Examples:

  • In a convolutional neural network (CNN), batch normalization can be applied after each convolutional layer.
  • In a deep neural network (DNN), batch normalization can be applied after each fully connected layer (a Keras sketch follows this list).
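
As a sketch of where the layer usually goes in practice, here is a small Keras model. This assumes TensorFlow 2.x / tf.keras; the layer sizes and input shape are just illustrative. The bias is disabled on the layers that feed into batch normalization because beta already plays that role.

import tensorflow as tf

model = tf.keras.Sequential([
    # convolutional block: Conv -> BN -> activation
    tf.keras.layers.Conv2D(32, 3, use_bias=False, input_shape=(28, 28, 1)),
    tf.keras.layers.BatchNormalization(),
    tf.keras.layers.ReLU(),
    tf.keras.layers.Flatten(),
    # fully connected block: Dense -> BN -> activation
    tf.keras.layers.Dense(64, use_bias=False),
    tf.keras.layers.BatchNormalization(),
    tf.keras.layers.ReLU(),
    tf.keras.layers.Dense(10, activation="softmax"),
])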

Code using numpy:

import numpy as np


def batch_norm(Z, gamma, beta, epsilon):
    """
    @Z is a numpy.ndarray of shape (m, n) that should be normalized
        @m is the number of data points
        @n is the number of features in Z
    @gamma is a numpy.ndarray of shape (1, n)
    containing the scales used for batch normalization
    @beta is a numpy.ndarray of shape (1, n)
    containing the offsets used for batch normalization
    @epsilon is a small number used to avoid division by zero
    Returns: the normalized Z matrix
    """
    # Calculate the mean and variance of Z
    mean = np.mean(Z, axis=0)
    var = np.var(Z, axis=0)

    # Normalize Z
    Z_norm = (Z - mean) / np.sqrt(var + epsilon)

    # Scale and shift the normalized Z using gamma and beta
    Z_norm_scaled_shifted = gamma * Z_norm + beta

    return Z_norm_scaled_shifted
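
A quick sanity check of the function above on random data (the shapes and seed are just illustrative): with gamma set to ones and beta set to zeros, every feature of the output should have a mean close to 0 and a standard deviation close to 1.

import numpy as np

np.random.seed(0)
Z = np.random.randn(32, 4) * 10 + 5   # mini-batch of 32 samples, 4 features
gamma = np.ones((1, 4))
beta = np.zeros((1, 4))

Z_out = batch_norm(Z, gamma, beta, 1e-8)
print(Z_out.mean(axis=0))   # ~0 for every feature
print(Z_out.std(axis=0))    # ~1 for every feature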

Code using tensorflow:

# Note: this uses the TensorFlow 1.x API (tf.contrib, tf.layers)
import tensorflow as tf


def create_batch_norm_layer(prev, n, activation):
    """
    @prev is the activated output of the previous layer
    @n is the number of nodes in the layer to be created
    @activation is the activation function that
    should be used on the output of the layer
    Returns: a tensor of the activated output for the layer
    """
    # Initialize the base layer with 'Dense' function from tensorflow
    # The 'prev' input is passed through a dense layer with n nodes
    k_init = tf.contrib.layers.variance_scaling_initializer(mode="FAN_AVG")
    layer = tf.layers.Dense(units=n,
                            kernel_initializer=k_init,
                            use_bias=False)(prev)

    # Initialize trainable parameters 'gamma' and 'beta' as vectors of 1
    # and 0 respectively, with shape (1, n)
    gamma = tf.Variable(initial_value=tf.ones(shape=(1, n)), name="gamma")
    beta = tf.Variable(initial_value=tf.zeros(shape=(1, n)), name="beta")

    # Calculate the batch mean and variance of the previous layer
    # using the tf.nn.moments() function
    mean, variance = tf.nn.moments(layer, axes=[0])

    # Create a batch normalization layer using the
    # tf.nn.batch_normalization() function
    # with gamma, beta, mean, variance and epsilon=1e-8 as arguments
    # Apply the activation function to
    # the output of the batch normalization layer
    # The final result is a tensor of the activated output for the layer
    layer_norm = tf.nn.batch_normalization(layer,
                                           mean,
                                           variance,
                                           beta,
                                           gamma,
                                           1e-8)
    output = activation(layer_norm)
    return output
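
One possible way to stack such layers in a TensorFlow 1.x graph (the function above relies on tf.contrib and tf.layers, so it needs TF 1.x; the placeholder shape and layer sizes below are just examples):

x = tf.placeholder(tf.float32, shape=(None, 784))
a1 = create_batch_norm_layer(x, 256, tf.nn.tanh)
a2 = create_batch_norm_layer(a1, 64, tf.nn.tanh)
y_pred = create_batch_norm_layer(a2, 10, tf.nn.softmax)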

