Gradient descent with momentum
Gradient descent with momentum is an optimization algorithm used to update the weights of a neural network during training. It extends standard gradient descent by adding a momentum term to the update rule. The momentum term helps accelerate the gradient vector in the right direction, leading to faster convergence.
The basic idea of gradient descent with momentum is to compute an exponentially weighted average of past gradients and use this average to update the weights instead of using only the current gradient. This smooths out fluctuations in the gradient and helps prevent oscillations during optimization.
How does it work?
v_t = beta * v_{t-1} + (1 - beta) * grad(w)
w = w - alpha * v_t
Here alpha is the learning rate, beta is the momentum weight, and grad(w) is the gradient of the loss with respect to the weights w. This is the exponentially-weighted-average form used in the code below; the classical form folds the learning rate into the velocity instead: v_t = beta * v_{t-1} + alpha * grad(w), w = w - v_t.
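To make the update rule concrete, here is a minimal sketch (not from the original post) that applies the two equations by hand to a toy quadratic loss L(w) = w^2; the loss, starting point, and hyperparameter values are arbitrary choices for illustration:

# Toy loss L(w) = w^2, so its gradient is 2 * w (illustrative choice)
grad = lambda w: 2 * w

alpha, beta = 0.1, 0.9   # learning rate and momentum weight
w, v = 5.0, 0.0          # start away from the optimum at w = 0

for step in range(1, 6):
    v = beta * v + (1 - beta) * grad(w)  # exponentially weighted average of gradients
    w = w - alpha * v                    # step along the smoothed gradient
    print(f"step {step}: v = {v:.4f}, w = {w:.4f}")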
Pros:
- It typically converges much faster than standard gradient descent.
- Less oscillation around ravines and local optima
Cons:
- It requires tuning of an additional hyperparameter (momentum) which can be time-consuming.
- The accumulated momentum can cause the parameters to overshoot the optimum.
- With the classical form of the update, the effective step size grows by roughly 1 / (1 - beta), so the learning rate has to be scaled down by a factor of (1 - beta) to compensate.
Code:
def update_variables_momentum(alpha, beta1, var, grad, v):
    """
    @alpha is the learning rate
    @beta1 is the momentum weight
    @var is a numpy.ndarray containing the variable to be updated
    @grad is a numpy.ndarray containing the gradient of var
    @v is the previous first moment of var
    Returns: the updated variable and the new moment, respectively
    """
    # Compute the first moment estimate (exponentially weighted average)
    v_new = beta1 * v + (1 - beta1) * grad
    # Update the variable by stepping along the smoothed gradient
    var_new = var - alpha * v_new
    return var_new, v_new
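As a usage sketch (the objective, its gradient, and the hyperparameter values below are made up for illustration), the helper can be called in a simple training loop:

import numpy as np

# Hypothetical objective: minimize L(w) = ||w||^2, whose gradient is 2 * w
w = np.array([3.0, -2.0])
v = np.zeros_like(w)              # the first moment starts at zero

for epoch in range(100):
    grad = 2 * w                  # gradient of the illustrative loss
    w, v = update_variables_momentum(alpha=0.1, beta1=0.9,
                                     var=w, grad=grad, v=v)

print(w)  # ends up close to the optimum [0, 0]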
Code using TensorFlow:
import tensorflow as tf


def create_momentum_op(loss, alpha, beta1):
    """
    @loss is the loss of the network
    @alpha is the learning rate
    @beta1 is the momentum weight
    Returns: the momentum optimization operation
    """
    # Create a MomentumOptimizer object with the
    # specified learning rate and momentum weight
    optimizer = tf.train.MomentumOptimizer(learning_rate=alpha,
                                           momentum=beta1)
    # Create the training operation by calling the minimize
    # method on the optimizer object with the loss as an argument
    train_op = optimizer.minimize(loss)
    return train_op
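Note that tf.train.MomentumOptimizer belongs to the TensorFlow 1.x API. As a rough sketch of the TensorFlow 2.x equivalent (the function name here is hypothetical), SGD with a momentum argument plays the same role:

import tensorflow as tf


def create_momentum_optimizer(alpha, beta1):
    """
    @alpha is the learning rate
    @beta1 is the momentum weight
    Returns: a tf.keras optimizer configured with momentum
    """
    return tf.keras.optimizers.SGD(learning_rate=alpha, momentum=beta1)


# Example use with a Keras model (model definition not shown):
# model.compile(optimizer=create_momentum_optimizer(0.01, 0.9), loss="mse")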