Confusion Matrix

Siri Chandana
5 min read · Jun 26, 2021

Task 05 👨🏻‍💻

Task Description 📄

📌 Create a blog/article/video about cyber crime cases where they talk about confusion matrix or its two types of error.

This article is about the confusion matrix.

What is a confusion matrix?

A confusion matrix is a table that is often used to describe the performance of a classification model (or “classifier”) on a set of test data for which the true values are known. The confusion matrix itself is relatively simple to understand, but the related terminology can be confusing.

The numbers of correct and incorrect predictions are summarized as counts and broken down by class. This gives you insight not only into how many errors your classifier makes but, more importantly, into the types of errors it makes.

The following four terms are the basic terminology that will help us determine the metrics we are looking for.

  • True Positives (TP): when the actual value is Positive and the prediction is also Positive.
  • True Negatives (TN): when the actual value is Negative and the prediction is also Negative.
  • False Positives (FP): when the actual value is Negative but the prediction is Positive. Also known as the Type 1 error.
  • False Negatives (FN): when the actual value is Positive but the prediction is Negative. Also known as the Type 2 error.

Let’s take an example:

We have a total of 20 cats and dogs, and our model predicts whether each animal is a cat or not. Here “cat” is the positive class and “dog” is the negative class.

Actual values = ['dog', 'cat', 'dog', 'cat', 'dog', 'dog', 'cat', 'dog', 'cat', 'dog', 'dog', 'dog', 'dog', 'cat', 'dog', 'dog', 'cat', 'dog', 'dog', 'cat']
Predicted values = ['dog', 'dog', 'dog', 'cat', 'dog', 'dog', 'cat', 'cat', 'cat', 'cat', 'dog', 'dog', 'dog', 'cat', 'dog', 'dog', 'cat', 'dog', 'dog', 'cat']

True Positive (TP) = 6

You predicted positive and it’s true. You predicted that an animal is a cat and it actually is.

True Negative (TN) = 11

You predicted negative and it’s true. You predicted that an animal is not a cat and it actually is not (it’s a dog).

False Positive (Type 1 Error) (FP) = 2

You predicted positive and it’s false. You predicted that an animal is a cat but it actually is not (it’s a dog).

False Negative (Type 2 Error) (FN) = 1

You predicted negative and it’s false. You predicted that an animal is not a cat but it actually is.
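The four counts above can be checked with a short script. This is a minimal sketch in plain Python; the function name `confusion_counts` and its `positive` parameter are illustrative choices, not part of any library.

```python
# Actual and predicted labels from the example above.
actual = ['dog', 'cat', 'dog', 'cat', 'dog', 'dog', 'cat', 'dog', 'cat', 'dog',
          'dog', 'dog', 'dog', 'cat', 'dog', 'dog', 'cat', 'dog', 'dog', 'cat']
predicted = ['dog', 'dog', 'dog', 'cat', 'dog', 'dog', 'cat', 'cat', 'cat', 'cat',
             'dog', 'dog', 'dog', 'cat', 'dog', 'dog', 'cat', 'dog', 'dog', 'cat']


def confusion_counts(actual, predicted, positive='cat'):
    """Count TP, TN, FP, FN, treating `positive` as the positive class."""
    tp = tn = fp = fn = 0
    for a, p in zip(actual, predicted):
        if p == positive and a == positive:
            tp += 1  # predicted cat, really a cat
        elif p != positive and a != positive:
            tn += 1  # predicted not a cat, really a dog
        elif p == positive and a != positive:
            fp += 1  # predicted cat, really a dog (Type 1 error)
        else:
            fn += 1  # predicted not a cat, really a cat (Type 2 error)
    return tp, tn, fp, fn


tp, tn, fp, fn = confusion_counts(actual, predicted)
print(tp, tn, fp, fn)  # 6 11 2 1
```

The four counts always add up to the number of samples, here 20, which is a quick sanity check.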

From our confusion matrix, we can calculate five different metrics measuring the validity of our model.

  1. Accuracy (all correct / all) = (TP + TN) / (TP + TN + FP + FN)
  2. Misclassification or Error (all incorrect / all) = (FP + FN) / (TP + TN + FP + FN)
  3. Precision (true positives / predicted positives) = TP / (TP + FP)
  4. Sensitivity or Recall (true positives / all actual positives) = TP / (TP + FN)
  5. Specificity (true negatives / all actual negatives) = TN / (TN + FP)
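Plugging the counts from the cat and dog example (TP = 6, TN = 11, FP = 2, FN = 1) into these five formulas gives concrete numbers. The helper `metrics` below is a hypothetical name used only for this sketch, and it assumes none of the denominators is zero.

```python
def metrics(tp, tn, fp, fn):
    """Compute the five metrics from confusion-matrix counts.

    Assumes every denominator is non-zero (no guard against division by zero).
    """
    total = tp + tn + fp + fn
    return {
        'accuracy': (tp + tn) / total,
        'error': (fp + fn) / total,
        'precision': tp / (tp + fp),
        'recall': tp / (tp + fn),
        'specificity': tn / (tn + fp),
    }


m = metrics(tp=6, tn=11, fp=2, fn=1)
for name, value in m.items():
    print(f"{name}: {value:.3f}")
# accuracy: 0.850
# error: 0.150
# precision: 0.750
# recall: 0.857
# specificity: 0.846
```

Accuracy and error always sum to 1, since every prediction is either correct or incorrect.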

CYBER CRIME CASES AND CONFUSION MATRIX

Cyber-crime covers all illegal activities that are carried out using technology. With the help of the internet, cyber-criminals break into users’ personal computers and smartphones and steal personal details from social media, business secrets, national secrets, and other important personal data. Hackers are the criminals who perform these illegal, malicious activities on the internet. Though some agencies are trying to tackle the problem, it keeps growing, and many people have become victims of identity theft, hacking, and malicious software. Let’s find out more about cyber-crimes.

Cybercrime is committed for various reasons:

  • Stealing personal data
  • Identity theft
  • Stealing organizational data
  • Stealing bank card details
  • Hacking emails to gain information

When we get the data, after data cleaning, pre-processing, and wrangling, the first step is to feed it to a model and get output as probabilities. But hold on! How can we measure the effectiveness of our model? The better the effectiveness, the better the performance, and that is exactly what we want. This is where the confusion matrix comes into the limelight. A confusion matrix is a performance measurement for machine learning classification.

Thus, detecting the various cyber-attacks in a network is essential. This is where applying a machine learning model to build an effective Intrusion Detection System (IDS) comes into play. A binary classification model can be used to identify what is happening in the network, i.e., whether there is an attack or not.

Understanding the raw security data is the first step in building an intelligent security model that makes predictions about future incidents. There are two categories: normal and anomaly. Taking the selected security features into account and performing all preprocessing steps, we train a model that can detect whether a test case is normal or an anomaly. One of the metrics used to evaluate the model is the confusion matrix.
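As a rough illustration of that evaluation step, the sketch below tallies a confusion matrix for a normal/anomaly classifier. The ten labels are made up for this example and do not come from any real IDS or dataset; "anomaly" is treated as the positive class.

```python
from collections import Counter

# Hypothetical labels for ten network connections (illustration only, not real data).
actual = ['normal', 'anomaly', 'normal', 'normal', 'anomaly',
          'normal', 'anomaly', 'normal', 'normal', 'anomaly']
predicted = ['normal', 'anomaly', 'anomaly', 'normal', 'normal',
             'normal', 'anomaly', 'normal', 'normal', 'anomaly']

# Count how often each (actual, predicted) combination occurs.
pairs = Counter(zip(actual, predicted))

labels = ['anomaly', 'normal']
for a in labels:
    for p in labels:
        print(f"actual={a:<8} predicted={p:<8} count={pairs[(a, p)]}")
```

With these made-up labels, three anomalies are caught, one anomaly is missed, one normal connection is flagged by mistake, and five normal connections are correctly left alone.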

A confusion matrix exposes two types of errors: Type I and Type II.

Type I error:

This is a False Positive: the model predicted an attack, but in reality the system was safe. This type of error is not very dangerous. The team gets notified and checks for any malicious activity, which does not cause any harm. These cases can be termed False Alarms.

Type II error:

This is a False Negative: the system predicted no attack, but an attack actually took place. This type of error can prove to be very dangerous, because no notification reaches the security team and nothing can be done to prevent the attack. One of the aims of the model is therefore to minimize this value.
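The balance between missed attacks and false alarms can be expressed as two rates. The sketch below uses invented counts purely for illustration; the function name `ids_error_rates` and its parameter names are hypothetical, and "attack" is treated as the positive class.

```python
def ids_error_rates(detected_attacks, missed_attacks, false_alarms, correct_normals):
    """Return (miss_rate, false_alarm_rate) for an attack detector.

    miss_rate        = FN / (FN + TP): share of real attacks that went unnoticed
    false_alarm_rate = FP / (FP + TN): share of normal traffic flagged as an attack
    """
    miss_rate = missed_attacks / (missed_attacks + detected_attacks)
    false_alarm_rate = false_alarms / (false_alarms + correct_normals)
    return miss_rate, false_alarm_rate


# Hypothetical counts, for illustration only.
miss, alarm = ids_error_rates(detected_attacks=45, missed_attacks=5,
                              false_alarms=30, correct_normals=920)
print(f"miss rate: {miss:.1%}, false alarm rate: {alarm:.1%}")
# miss rate: 10.0%, false alarm rate: 3.2%
```

In this made-up case the detector raises more false alarms (30) than it misses attacks (5), yet the missed attacks are the costlier mistake, because nobody is alerted when they happen.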
