Methods Of Evaluating the Performance Parameters of Machine Learning Models

Tejas Raj
May 25, 2021

In this blog, we will look at the various ways of evaluating the performance of a model, and how to decide which metric is best suited for our dataset.

The evaluation of model performance can happen once we are done with feature engineering and feature selection, have built our models, and have made predictions with them (for classification, typically the probability of each output class). Once these steps are done, we can move on to deciding which model fits our dataset best.

The performance parameters we use to test our models change with the type of model. First, let's look at the types of models we will cover in this post and get to know a little more about them:

Regression Models — Regression models are predictive models that find a relationship between a target variable and its predictors. Regression models are generally used for continuous data, as in time series modelling and forecasting. Think of regression models as algorithms that predict a specific value, e.g. the sales of a product on a particular day.

Classification Models — Classification models are predictive models as well, but unlike regression models they do not predict a specific value; they predict the label or class that the output will fall under, e.g. whether the sales of a product will be high or low on a particular day. It is worth noting that classification algorithms can further be divided into two types: class output and probability output. Class-output algorithms, like Support Vector Machines and K-Nearest Neighbors, return a class directly: for each class the output takes two possible values, 1 if the observation belongs to that class and 0 if it does not. Probability-output algorithms, like Logistic Regression and Gradient Boosting, instead return a probability for each class. These probabilities can be converted to classes by choosing a threshold above which a probability is mapped to the class.
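As a minimal sketch of that thresholding step (the array values here are made up; in scikit-learn such probabilities would come from something like model.predict_proba):

```python
import numpy as np

# Hypothetical predicted probabilities for the positive class,
# e.g. the second column of predict_proba(X) in scikit-learn
probs = np.array([0.91, 0.40, 0.73, 0.15, 0.55])

threshold = 0.5  # a common default; in practice tuned to the problem
classes = (probs >= threshold).astype(int)
print(classes)  # [1 0 1 0 1]
```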

Now that we have a better understanding of the types of models, let us take a look at the performance metrics for different models —

Classification Metrics

  • Prediction Accuracy

Classification accuracy is calculated by dividing the number of correct predictions by the total number of input samples. To explain this further, let us think of the output class of a dataset as being either 'true' or 'false'. The prediction made after fitting the model to the data will therefore also be either 'true' or 'false'. This leads to one of four possible situations for each prediction:

  • Prediction is true and output class is also true, which makes this prediction a “True Positive” or TP
  • Prediction is false and output class is also false, which makes this prediction a “True Negative” or TN
  • Prediction is true but output class is false, which makes this prediction a “False Positive” or FP
  • Prediction is false but output class is true, which makes this a “False Negative” or FN

Classification accuracy is therefore (TP + TN) / (TP + TN + FP + FN).

Accuracy is a powerful metric for evaluating the performance of a model when the classes are equally represented in the sample; however, it can be misleading when they are not. For example, on a 95/5 class split, a model that always predicts the majority class scores 95% accuracy while learning nothing.
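Here is a short sketch of computing accuracy with scikit-learn; the label arrays are made up for illustration:

```python
from sklearn.metrics import accuracy_score

# Hypothetical actual labels and model predictions
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

# TP = 3, TN = 3, FP = 1, FN = 1, so (TP + TN) / total = 6 / 8
print(accuracy_score(y_true, y_pred))  # 0.75
```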

  • ROC Curve

The ROC (Receiver Operating Characteristic) Curve is built from two quantities: the True Positive Rate (TPR), also called Sensitivity, and the False Positive Rate (FPR), which equals 1 minus Specificity.

  • TPR — The true positive rate is TP divided by the sum of TP and FN. TPR gives us the proportion of actual positives that were correctly classified as positive.
  • FPR — The false positive rate is FP divided by the sum of FP and TN. FPR gives us the proportion of actual negatives that were incorrectly classified as positive.

Both TPR and FPR take values between 0 and 1. Plotting TPR against FPR as the classification threshold (a float between 0 and 1) is varied gives us the ROC curve, which shows a classifier's behaviour at every threshold.

ROC Curve displaying TPR vs FPR at various thresholds
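A sketch of computing ROC curve points and the area under the curve with scikit-learn, again on made-up labels and scores:

```python
from sklearn.metrics import roc_curve, roc_auc_score

# Hypothetical actual labels and predicted probabilities for the positive class
y_true = [0, 0, 1, 1, 0, 1]
y_score = [0.10, 0.40, 0.35, 0.80, 0.20, 0.70]

fpr, tpr, thresholds = roc_curve(y_true, y_score)  # one point per threshold
print(fpr, tpr, thresholds)

# The area under the ROC curve summarizes the whole curve in one number
print(roc_auc_score(y_true, y_score))  # ~0.89 here
```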
  • Confusion Matrix

A confusion matrix is a matrix of size N x N, where N is the number of classes in the dataset, summarizing the model's predictions against the actual classes. The confusion matrix is not a single summary metric by itself; rather, it gives us the four basic counts used across all the other performance metrics: TP, FP, TN and FN. That makes the confusion matrix one of the most important tools for performance evaluation.

Confusion Matrix for a dataset of 10 classes (music genres)
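For the binary example used above, a minimal scikit-learn sketch (scikit-learn places actual classes on the rows and predicted classes on the columns):

```python
from sklearn.metrics import confusion_matrix

y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

cm = confusion_matrix(y_true, y_pred)
print(cm)
# [[3 1]
#  [1 3]]

# In the binary case the four basic counts can be unpacked directly
tn, fp, fn, tp = cm.ravel()
```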
  • Classification Report: Precision, Recall, F1 Score

The classification report gives us a number of metrics that can be used to evaluate the performance of a model. We have already gone through the counts these metrics are built from; here is a quick recap of the information available in a classification report —

  • Precision — Precision is the number of TP divided by the number of all positive predictions made by the model (TP + FP).
  • Recall — Recall is the number of TP divided by the number of positives actually present in the sample (TP + FN).
  • F1 score — The F1 score is the harmonic mean of precision and recall. It tells us the preciseness as well as the robustness of our model; a higher F1 score generally signifies a better model. A scikit-learn sketch of all three follows this list.
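A sketch of all three metrics with scikit-learn, using the same made-up labels as before:

```python
from sklearn.metrics import (classification_report, precision_score,
                             recall_score, f1_score)

y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

print(precision_score(y_true, y_pred))  # TP / (TP + FP) = 3 / 4 = 0.75
print(recall_score(y_true, y_pred))     # TP / (TP + FN) = 3 / 4 = 0.75
print(f1_score(y_true, y_pred))         # harmonic mean of the two = 0.75

# classification_report prints all of the above per class in one table
print(classification_report(y_true, y_pred))
```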

Regression Metrics

  • Mean Absolute Error

Mean absolute error is a performance metric used in regression models, where the target is a continuous value. MAE is the average of the absolute differences between actual and predicted values over all the observations. A smaller MAE generally signifies a better model.

  • Mean Squared Error

Unlike MAE, MSE takes the average of the squared differences between actual and predicted values. Squaring the difference ensures that the larger the gap between an actual and a predicted value, the larger its effect on the final value of MSE, so big misses are penalized heavily.

  • Root Mean Squared Error

RMSE is the square root of MSE. Taking the root undoes the effect of squaring, bringing the error back into the same units as the target variable.
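A sketch computing all three regression metrics on made-up values:

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error

# Hypothetical actual and predicted values
y_true = [3.0, 5.0, 2.5, 7.0]
y_pred = [2.5, 5.0, 4.0, 8.0]

mae = mean_absolute_error(y_true, y_pred)  # average of |actual - predicted|
mse = mean_squared_error(y_true, y_pred)   # average of squared differences
rmse = np.sqrt(mse)                        # back in the target's own units

print(mae, mse, rmse)  # 0.75 0.875 ~0.935
```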

  • R Square

R Square (R²), also known as the coefficient of determination, measures the proportion of the variance in the target variable that the model explains. A value of 1 means the predictions match the actual values perfectly, a value of 0 means the model does no better than always predicting the mean, and a higher R² generally signifies a better fit.

  • Adjusted R Square

Adjusted R Square corrects R² for the number of predictors in the model. Plain R² never decreases when another predictor is added, even a useless one; Adjusted R² penalizes extra predictors, so it increases only when a new predictor genuinely improves the model.
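As a rough sketch, R² is available directly in scikit-learn, while Adjusted R² is computed from it using the sample count n and predictor count p (p = 1 below is purely illustrative):

```python
from sklearn.metrics import r2_score

y_true = [3.0, 5.0, 2.5, 7.0]
y_pred = [2.5, 5.0, 4.0, 8.0]

r2 = r2_score(y_true, y_pred)
print(r2)  # ~0.72

# Adjusted R² = 1 - (1 - R²) * (n - 1) / (n - p - 1)
n, p = len(y_true), 1  # p = number of predictors, assumed here
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - p - 1)
print(adj_r2)
```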
