Feature Scaling

Feature scaling is a data preprocessing technique that transforms the values of the features in a dataset to a similar scale. This ensures that all features contribute comparably to the model and prevents features with larger values from dominating it. Feature scaling is not strictly necessary for all machine learning models; tree-based models such as decision trees and random forests, for instance, are insensitive to the scale of individual features.

There are two main feature scaling techniques: the min-max scaler and the standard scaler. Min-max scaling works well for features whose distributions are not Gaussian, while standardization works well for features with approximately Gaussian distributions.

Rescaling/min-max normalization:
  • x_p = (x - min(x)) / (max(x) - min(x))
Standardization or Z-score normalization:
  • x_p = (x - m) / s, where m is the mean and s is the standard deviation of the feature
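
As a quick illustration, both formulas can be applied directly with NumPy. This is a minimal sketch; the sample feature values are made up:

import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])  # made-up sample feature values

# Rescaling / min-max normalization: maps values into [0, 1]
x_minmax = (x - x.min()) / (x.max() - x.min())

# Standardization / Z-score normalization: zero mean, unit variance
x_zscore = (x - x.mean()) / x.std()

print(x_minmax)  # [0.   0.25 0.5  0.75 1.  ]
print(x_zscore)  # mean ~0, standard deviation ~1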

Pros:

  • It helps improve the performance of machine learning models by ensuring that all features contribute equally.
  • It prevents features with larger values from dominating the model.
  • It speeds up gradient descent by improving its convergence rate.

Cons:

  • It adds an extra preprocessing step, which can be computationally expensive on large datasets.
  • Min-max scaling is sensitive to outliers: a single extreme value compresses all the other values into a narrow range.

Examples:

Feature scaling can be used in many machine learning algorithms to improve their performance. For example, some algorithms where feature scaling matters are:

  • K-nearest neighbors (KNN) with a Euclidean distance measure is sensitive to feature magnitudes, so the features should be scaled so that all of them weigh in equally (see the sketch after this list).
  • K-means clustering also relies on the Euclidean distance measure, so feature scaling matters there as well.
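
To see why distance-based algorithms need scaling, consider a hypothetical two-feature dataset where one feature's magnitude dwarfs the other's. The points and ranges below are made up for the example:

import numpy as np

# Two hypothetical points described by (age, salary);
# salary's scale dwarfs age's
a = np.array([25.0, 50000.0])
b = np.array([50.0, 51000.0])

# Unscaled, the Euclidean distance is driven almost entirely by salary:
# sqrt(25**2 + 1000**2) is about 1000.3, so the 25-year age gap barely counts
d_raw = np.linalg.norm(a - b)

# After min-max scaling (assuming age spans [20, 60] and salary spans
# [30000, 90000] in the full dataset), both features weigh in comparably
a_scaled = np.array([(25 - 20) / 40, (50000 - 30000) / 60000])
b_scaled = np.array([(50 - 20) / 40, (51000 - 30000) / 60000])
d_scaled = np.linalg.norm(a_scaled - b_scaled)

print(d_raw)     # ~1000.31
print(d_scaled)  # ~0.63, now dominated by the (scaled) age difference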
Here’s an example of how feature scaling can be applied to data: suppose we have students’ weight data, and the weights span [160 pounds, 200 pounds]. To rescale this data, we subtract 160 (the minimum) from each student’s weight and divide the result by 40 (the difference between the maximum and minimum weights), as shown below.
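
A minimal sketch of that arithmetic (the 180-pound weight is a made-up sample value):

# Min-max rescale a 180-pound weight from the range [160, 200] into [0, 1]
weight = 180
weight_scaled = (weight - 160) / (200 - 160)  # (180 - 160) / 40 = 0.5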

Code:

def normalize(X, m, s):
    """Standardize (Z-score normalize) a data matrix feature-wise.

    X is a numpy.ndarray of shape (d, nx) to normalize
        d is the number of data points
        nx is the number of features
    m is a numpy.ndarray of shape (nx,) that contains
        the mean of all features of X
    s is a numpy.ndarray of shape (nx,) that contains the standard deviation
        of all features of X

    Returns: The normalized X matrix
    """
    # NumPy broadcasting subtracts each feature's mean and divides by its
    # standard deviation, one column at a time
    return (X - m) / s
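
A usage sketch for the function above; the sample matrix and its values are made up for illustration:

import numpy as np

# 3 data points (d = 3), 2 features (nx = 2)
X = np.array([[1.0, 100.0],
              [2.0, 200.0],
              [3.0, 300.0]])

m = X.mean(axis=0)  # per-feature means, shape (nx,)
s = X.std(axis=0)   # per-feature standard deviations, shape (nx,)

X_norm = normalize(X, m, s)
print(X_norm.mean(axis=0))  # ~[0. 0.]
print(X_norm.std(axis=0))   # ~[1. 1.]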
