Mini-Batch Gradient Descent
Mini-Batch Gradient Descent is a variation of the Gradient Descent algorithm that is commonly used in deep learning. In practice it is often referred to simply as Stochastic Gradient Descent (SGD), although strictly speaking SGD is the special case with a batch size of 1.
To implement Mini-Batch Gradient Descent, we first randomly shuffle the training data and then divide it into batches of a fixed size. We then loop over the batches and perform the following steps for each batch (a minimal generic sketch follows the list):
- Compute the gradients of the loss function with respect to the model parameters using the current batch of data.
- Update the model parameters using the computed gradients and the learning rate.
- Repeat the whole pass over the dataset a fixed number of times (epochs).
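As a rough, framework-free illustration of that loop, here is a minimal NumPy sketch. The names grad_fn, params, and alpha are hypothetical placeholders for the model's gradient function, its parameter list, and the learning rate; they are not part of the TensorFlow code shown later in this post.

import numpy as np

def mini_batch_gd(X, Y, grad_fn, params, alpha=0.01, batch_size=32, epochs=5):
    """Generic mini-batch gradient descent loop (illustrative sketch)."""
    m = X.shape[0]
    for _ in range(epochs):
        # Shuffle X and Y with the same permutation at the start of each epoch
        perm = np.random.permutation(m)
        X_shuf, Y_shuf = X[perm], Y[perm]
        # Step through the shuffled data in fixed-size batches
        for start in range(0, m, batch_size):
            X_batch = X_shuf[start:start + batch_size]
            Y_batch = Y_shuf[start:start + batch_size]
            # Gradients of the loss w.r.t. each parameter, on this batch only
            grads = grad_fn(params, X_batch, Y_batch)
            # Take one gradient descent step with learning rate alpha
            params = [p - alpha * g for p, g in zip(params, grads)]
    return params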
Mechanism:
- It updates the model in small batches instead of one large batch.
- Compared with single-example SGD, it reduces the variance of the parameter updates, which can lead to more stable convergence (illustrated numerically in the sketch after this list).
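The variance-reduction point can be checked numerically. The toy script below (an illustrative sketch, separate from the training code later in this post) estimates the gradient of a mean-squared-error loss for a one-parameter linear model on random batches of different sizes and prints the spread of those estimates; larger batches should show a noticeably smaller spread.

import numpy as np

rng = np.random.default_rng(0)
# Toy regression data: y = 3x + noise
X = rng.normal(size=10000)
Y = 3 * X + rng.normal(scale=0.5, size=10000)
w = 0.0  # current value of the single model parameter

def grad_estimate(batch_size):
    """Gradient of the MSE loss w.r.t. w, estimated on one random mini-batch."""
    idx = rng.choice(X.shape[0], size=batch_size, replace=False)
    xb, yb = X[idx], Y[idx]
    return np.mean(2 * (w * xb - yb) * xb)

for bs in (1, 32, 1024):
    grads = [grad_estimate(bs) for _ in range(1000)]
    print("batch_size={:5d}  std of gradient estimate: {:.3f}".format(bs, np.std(grads)))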
Pros:
- It can make use of highly optimized matrix optimizations common to state-of-the-art deep learning libraries that make computing the gradient with respect to a mini-batch very efficient.
- It can converge faster than stochastic gradient descent.
- It can be more computationally efficient than batch gradient descent.
- It has more stable convergence than pure stochastic gradient descent.
Cons:
- It can be sensitive to learning rate tuning.
- It can be sensitive to batch size selection.
- It can struggle to escape saddle points.
- It can get stuck in ravines and local optima.
Code:
import tensorflow as tf


def train_mini_batch(X_train, Y_train, X_valid, Y_valid,
                     batch_size=32, epochs=5, load_path="/tmp/model.ckpt",
                     save_path="/tmp/model.ckpt"):
    """
    Trains a loaded neural network model using mini-batch gradient descent.

    Relies on a shuffle_data(X, Y) helper being available in the module
    (a minimal sketch of such a helper is given after this function).
    """
    # Start a TensorFlow session to run the training operations
    with tf.Session() as sess:
        # Load the model graph and restore the session
        saver = tf.train.import_meta_graph(load_path + ".meta")
        saver.restore(sess, load_path)
        # Retrieve the tensors and operations stored in the graph collections
        x = tf.get_collection("x")[0]
        y = tf.get_collection("y")[0]
        accuracy = tf.get_collection("accuracy")[0]
        loss = tf.get_collection("loss")[0]
        train_op = tf.get_collection("train_op")[0]
        m = X_train.shape[0]
        # Calculate the number of batches (the last one may be smaller)
        if m % batch_size == 0:
            n_batches = m // batch_size
        else:
            n_batches = m // batch_size + 1
        # Train the model for the given number of epochs
        for i in range(epochs + 1):
            # Calculate the loss and accuracy for the training set
            cost_train = sess.run(loss, feed_dict={x: X_train, y: Y_train})
            accuracy_train = sess.run(accuracy,
                                      feed_dict={x: X_train, y: Y_train})
            # Calculate the loss and accuracy for the validation set
            cost_val = sess.run(loss, feed_dict={x: X_valid, y: Y_valid})
            accuracy_val = sess.run(accuracy,
                                    feed_dict={x: X_valid, y: Y_valid})
            # Print the training and validation results for the current epoch
            print("After {} epochs:".format(i))
            print("\tTraining Cost: {}".format(cost_train))
            print("\tTraining Accuracy: {}".format(accuracy_train))
            print("\tValidation Cost: {}".format(cost_val))
            print("\tValidation Accuracy: {}".format(accuracy_val))
            # Train the model using mini-batches (skipped after the final epoch)
            if i < epochs:
                # Shuffle the training data at the start of each epoch
                shuffled_X, shuffled_Y = shuffle_data(X_train, Y_train)
                for b in range(n_batches):
                    # Slice out the next mini-batch
                    start = b * batch_size
                    end = min((b + 1) * batch_size, m)
                    X_mini_batch = shuffled_X[start:end]
                    Y_mini_batch = shuffled_Y[start:end]
                    next_train = {x: X_mini_batch, y: Y_mini_batch}
                    # Run a training step with the mini-batch
                    sess.run(train_op, feed_dict=next_train)
                    # Print the mini-batch results every 100 batches
                    if (b + 1) % 100 == 0:
                        loss_mini_batch = sess.run(loss, feed_dict=next_train)
                        acc_mini_batch = sess.run(accuracy,
                                                  feed_dict=next_train)
                        print("\tStep {}:".format(b + 1))
                        print("\t\tCost: {}".format(loss_mini_batch))
                        print("\t\tAccuracy: {}".format(acc_mini_batch))
        # Save the trained model and return the path to the checkpoint
        return saver.save(sess, save_path)
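The function above relies on a shuffle_data helper that is not shown in this post. A minimal sketch of what such a helper could look like, assuming X and Y are NumPy arrays with the same number of rows:

import numpy as np

def shuffle_data(X, Y):
    """Shuffle X and Y along the first axis with the same permutation."""
    perm = np.random.permutation(X.shape[0])
    return X[perm], Y[perm]

Note that train_mini_batch is written against the TensorFlow 1.x API (tf.Session, tf.train.import_meta_graph, tf.get_collection); under TensorFlow 2.x it would have to be run through the tf.compat.v1 module with eager execution disabled.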