Gradient descent with momentum
Gradient descent with momentum is an optimization algorithm used to update the weights of a neural network during training. It extends standard gradient descent by adding a momentum term to the update rule. The momentum term helps accelerate the gradient vector in the right direction, leading to faster convergence.
The basic idea of gradient descent with momentum is to compute an exponentially weighted average of past gradients and use this average to update the weights instead of using only the current gradient. This smooths out fluctuations in the gradient and helps prevent oscillations during optimization.
How does it work?
v_t = beta * v_{t-1} + (1 - beta) * grad(w)
w = w - alpha * v_t
Here alpha is the learning rate, beta is the momentum weight, and grad(w) is the gradient of the loss with respect to the weights w. This is the exponentially-weighted-average form used in the code below; the classical form folds the learning rate into the velocity instead: v_t = beta * v_{t-1} + alpha * grad(w), w = w - v_t.
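To make the update rule concrete, here is a minimal sketch (not from the original post) that applies the two equations by hand to a toy quadratic loss L(w) = w^2; the loss, starting point, and hyperparameter values are arbitrary choices for illustration:

# Toy loss L(w) = w^2, so its gradient is 2 * w (illustrative choice)
grad = lambda w: 2 * w

alpha, beta = 0.1, 0.9   # learning rate and momentum weight
w, v = 5.0, 0.0          # start away from the optimum at w = 0

for step in range(1, 6):
    v = beta * v + (1 - beta) * grad(w)  # exponentially weighted average of gradients
    w = w - alpha * v                    # step along the smoothed gradient
    print(f"step {step}: v = {v:.4f}, w = {w:.4f}")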
Pros:
- It typically converges much faster than standard gradient descent.
- Less oscillation around ravines and local optima
Cons:
- It requires tuning of an additional hyperparameter (momentum) which can be time-consuming.
- The accumulated momentum can cause the parameters to overshoot the optimum.
- With the classical form of the update, the effective step size grows by roughly 1 / (1 - beta), so the learning rate has to be scaled down by a factor of (1 - beta) to compensate.
Code:
def update_variables_momentum(alpha, beta1, var, grad, v):
    """
    @alpha is the learning rate
    @beta1 is the momentum weight
    @var is a numpy.ndarray containing the variable to be updated
    @grad is a numpy.ndarray containing the gradient of var
    @v is the previous first moment of var
    Returns: the updated variable and the new moment, respectively
    """
    # Compute the first moment estimate (exponentially weighted average)
    v_new = beta1 * v + (1 - beta1) * grad
    # Update the variable by stepping along the smoothed gradient
    var_new = var - alpha * v_new
    return var_new, v_new
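As a usage sketch (the objective, its gradient, and the hyperparameter values below are made up for illustration), the helper can be called in a simple training loop:

import numpy as np

# Hypothetical objective: minimize L(w) = ||w||^2, whose gradient is 2 * w
w = np.array([3.0, -2.0])
v = np.zeros_like(w)              # the first moment starts at zero

for epoch in range(100):
    grad = 2 * w                  # gradient of the illustrative loss
    w, v = update_variables_momentum(alpha=0.1, beta1=0.9,
                                     var=w, grad=grad, v=v)

print(w)  # ends up close to the optimum [0, 0]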
Code using TensorFlow:
import tensorflow as tf


def create_momentum_op(loss, alpha, beta1):
    """
    @loss is the loss of the network
    @alpha is the learning rate
    @beta1 is the momentum weight
    Returns: the momentum optimization operation
    """
    # Create a MomentumOptimizer object with the
    # specified learning rate and momentum weight
    optimizer = tf.train.MomentumOptimizer(learning_rate=alpha,
                                           momentum=beta1)
    # Create the training operation by calling the minimize
    # method on the optimizer object with the loss as an argument
    train_op = optimizer.minimize(loss)
    return train_op
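Note that tf.train.MomentumOptimizer belongs to the TensorFlow 1.x API. As a rough sketch of the TensorFlow 2.x equivalent (the function name here is hypothetical), SGD with a momentum argument plays the same role:

import tensorflow as tf


def create_momentum_optimizer(alpha, beta1):
    """
    @alpha is the learning rate
    @beta1 is the momentum weight
    Returns: a tf.keras optimizer configured with momentum
    """
    return tf.keras.optimizers.SGD(learning_rate=alpha, momentum=beta1)


# Example use with a Keras model (model definition not shown):
# model.compile(optimizer=create_momentum_optimizer(0.01, 0.9), loss="mse")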