Performance Metrics in Machine Learning

Chanchala Gorale
4 min read · Jun 12, 2024


Performance metrics are critical in evaluating the effectiveness of machine learning models. They provide quantitative measures to assess how well a model performs on a given dataset, guiding model selection and tuning.

Types of Performance Metrics

Regression Metrics:

  • Mean Absolute Error (MAE)
  • Mean Squared Error (MSE)
  • Root Mean Squared Error (RMSE)
  • R-squared (R²)

Classification Metrics:

  • Accuracy
  • Precision
  • Recall
  • F1 Score
  • Area Under the ROC Curve (AUC-ROC)
  • Confusion Matrix

Clustering Metrics:

  • Silhouette Score
  • Davies-Bouldin Index
  • Adjusted Rand Index (ARI)

Regression Metrics

Mean Absolute Error (MAE):

  • Definition: The average of the absolute differences between predicted and actual values.
  • Use Case: When you need a straightforward interpretation of errors.
  • Formula: MAE = (1/n) Σ |yᵢ − ŷᵢ|
  • Example (scikit-learn):
from sklearn.metrics import mean_absolute_error
# y_true: actual target values, y_pred: model predictions
mae = mean_absolute_error(y_true, y_pred)

Mean Squared Error (MSE):

  • Definition: The average of the squared differences between predicted and actual values.
  • Use Case: When larger errors need to be penalized more severely.
  • Formula: MSE = (1/n) Σ (yᵢ − ŷᵢ)²
  • Example (scikit-learn):
from sklearn.metrics import mean_squared_error
mse = mean_squared_error(y_true, y_pred)

Root Mean Squared Error (RMSE):

  • Definition: The square root of the mean squared error.
  • Use Case: Similar to MSE but provides results in the same unit as the target variable.
  • Formula: RMSE = √MSE
  • Example (scikit-learn):
from sklearn.metrics import mean_squared_error
rmse = mean_squared_error(y_true, y_pred, squared=False)

R-squared (R²):

  • Definition: The proportion of the variance in the dependent variable that is predictable from the independent variables.
  • Use Case: To understand the proportion of the variance explained by the model.
  • Formula: R² = 1 − Σ (yᵢ − ŷᵢ)² / Σ (yᵢ − ȳ)²
  • Example (scikit-learn):
from sklearn.metrics import r2_score
r2 = r2_score(y_true, y_pred)
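
To see the regression metrics side by side, here is a minimal sketch that fits a plain linear model on synthetic data and reports all four scores. The dataset and model (make_regression, LinearRegression) are illustrative assumptions, not part of the original snippets.

# Minimal sketch: evaluate one regression model with all four metrics
import numpy as np
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

# Synthetic data purely for illustration
X, y = make_regression(n_samples=500, n_features=5, noise=10.0, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

model = LinearRegression().fit(X_train, y_train)
y_pred = model.predict(X_test)

print("MAE :", mean_absolute_error(y_test, y_pred))
print("MSE :", mean_squared_error(y_test, y_pred))
print("RMSE:", np.sqrt(mean_squared_error(y_test, y_pred)))
print("R²  :", r2_score(y_test, y_pred))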

Classification Metrics

Accuracy:

  • Definition: The ratio of correctly predicted instances to the total instances.
  • Use Case: When the classes are balanced.
  • Formula: Accuracy = (TP + TN) / (TP + TN + FP + FN)
  • Example (scikit-learn):
from sklearn.metrics import accuracy_score
accuracy = accuracy_score(y_true, y_pred)

Precision:

  • Definition: The ratio of true positive predictions to the total predicted positives.
  • Use Case: When the cost of false positives is high.
  • Formula: Precision = TP / (TP + FP)
  • Example (scikit-learn):
from sklearn.metrics import precision_score
precision = precision_score(y_true, y_pred, average='binary')

Recall:

  • Definition: The ratio of true positive predictions to the total actual positives.
  • Use Case: When the cost of false negatives is high.
  • Formula: Recall = TP / (TP + FN)
  • Example (scikit-learn):
from sklearn.metrics import recall_score
recall = recall_score(y_true, y_pred, average='binary')

F1 Score:

  • Definition: The harmonic mean of precision and recall.
  • Use Case: When a balance between precision and recall is needed.
  • Formula: F1 = 2 · (Precision · Recall) / (Precision + Recall)
  • Example (scikit-learn):
from sklearn.metrics import f1_score
f1 = f1_score(y_true, y_pred, average='binary')

Area Under the ROC Curve (AUC-ROC):

  • Definition: The area under the receiver operating characteristic (ROC) curve.
  • Use Case: Evaluating the trade-off between true positive rate and false positive rate.
  • Formula: Area under the curve of true positive rate plotted against false positive rate across classification thresholds
  • Example (scikit-learn):
from sklearn.metrics import roc_auc_score
# y_pred_proba: positive-class probabilities, e.g. model.predict_proba(X_test)[:, 1]
auc = roc_auc_score(y_true, y_pred_proba)

Confusion Matrix:

  • Definition: A table that cross-tabulates actual labels against predicted labels for each class.
  • Use Case: Detailed breakdown of true positives, false positives, true negatives, and false negatives.
  • Example (scikit-learn):
from sklearn.metrics import confusion_matrix
cm = confusion_matrix(y_true, y_pred)
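
For binary problems, the four cells of the matrix can be unpacked directly. The small sketch below uses made-up labels purely for illustration; for binary labels {0, 1}, scikit-learn's convention is rows = actual, columns = predicted, so ravel() yields TN, FP, FN, TP in that order.

from sklearn.metrics import confusion_matrix

# Made-up binary labels for illustration (1 = positive class)
y_true = [0, 1, 1, 0, 1, 0, 1, 1]
y_pred = [0, 1, 0, 0, 1, 1, 1, 1]

# ravel() flattens the 2x2 matrix into (TN, FP, FN, TP) for binary labels
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print(f"TN={tn}  FP={fp}  FN={fn}  TP={tp}")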

Clustering Metrics

Silhouette Score:

  • Definition: Measures how similar an object is to its own cluster compared to other clusters.
  • Use Case: To validate the consistency within clusters.
  • Formula: s = (b − a) / max(a, b), where a is the mean intra-cluster distance and b is the mean distance to points in the nearest other cluster
  • Example (scikit-learn):
from sklearn.metrics import silhouette_score
# X: feature matrix, labels: cluster assignments produced by the clustering model
score = silhouette_score(X, labels)

Davies-Bouldin Index:

  • Definition: The average similarity between each cluster and the cluster most similar to it, where similarity weighs within-cluster scatter against the distance between centroids.
  • Use Case: To evaluate cluster separation; lower values indicate better-defined clusters.
  • Formula: Computed from within-cluster scatter and distances between cluster centroids
  • Example (scikit-learn):
from sklearn.metrics import davies_bouldin_score
db_index = davies_bouldin_score(X, labels)

Adjusted Rand Index (ARI):

  • Definition: Measures the similarity between two data clusterings while adjusting for chance.
  • Use Case: To compare a clustering against reference (ground-truth) labels or another clustering.
  • Formula: Rand Index corrected for chance agreement (1 = identical clusterings, values near 0 = random labeling)
  • Example (scikit-learn):
from sklearn.metrics import adjusted_rand_score
ari = adjusted_rand_score(labels_true, labels_pred)
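
To see all three clustering metrics on one run, here is a minimal sketch that clusters synthetic blobs with KMeans. The dataset and model (make_blobs, KMeans) are illustrative assumptions, not part of the original snippets.

# Minimal sketch: cluster synthetic data and score the result three ways
from sklearn.datasets import make_blobs
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score, davies_bouldin_score, adjusted_rand_score

# Synthetic blobs with known ground-truth labels, purely for illustration
X, labels_true = make_blobs(n_samples=300, centers=3, random_state=42)
labels_pred = KMeans(n_clusters=3, n_init=10, random_state=42).fit_predict(X)

print("Silhouette Score    :", silhouette_score(X, labels_pred))        # higher is better
print("Davies-Bouldin Index:", davies_bouldin_score(X, labels_pred))    # lower is better
print("Adjusted Rand Index :", adjusted_rand_score(labels_true, labels_pred))  # 1 = perfect match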

Choosing the Right Metric

Regression:

  • MAE: When you need a simple, interpretable measure of error.
  • MSE/RMSE: When you want to penalize larger errors more severely.
  • R²: When you need to understand the proportion of variance explained by the model.

Classification:

  • Accuracy: When classes are balanced.
  • Precision/Recall: When dealing with imbalanced classes and when false positives or false negatives are more costly.
  • F1 Score: When a balance between precision and recall is important.
  • AUC-ROC: When evaluating the trade-off between true positive rate and false positive rate.

Clustering:

  • Silhouette Score: To validate consistency within clusters.
  • Davies-Bouldin Index: To evaluate cluster separation (lower is better).
  • ARI: To compare a clustering against reference labels or another clustering.
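
As a rough illustration of why accuracy alone can mislead on imbalanced classes, consider a contrived sketch with a 95/5 class split and a "classifier" that always predicts the majority class. The labels below are invented purely to make the point.

# Sketch: accuracy looks good while precision, recall, and F1 reveal the failure
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

y_true = [1] * 5 + [0] * 95   # only 5% of samples are positive
y_pred = [0] * 100            # degenerate model that never predicts the positive class

print("Accuracy :", accuracy_score(y_true, y_pred))                    # 0.95, looks strong
print("Precision:", precision_score(y_true, y_pred, zero_division=0))  # 0.0
print("Recall   :", recall_score(y_true, y_pred))                      # 0.0
print("F1 Score :", f1_score(y_true, y_pred, zero_division=0))         # 0.0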
