Logistic regression is one of the most popular machine learning algorithms. It’s a simple and effective way to classify whether something is one thing or another.

So tell me – are you in or are you out?

🤖 Table of Contents 🤖
What is logistic regression?
How logistic regression works
Logistic regression vs linear regression
Multiclass classification
Examples of logistic regression in the wild

What is logistic regression?

Logistic regression is a supervised machine learning algorithm that predicts the probability of an outcome. In contrast to linear regression, where theoretically any number could be predicted, in logistic regression there are only two possible outcomes, 0 or 1. This means the final prediction will be discrete: something is or isn’t. An email is either spam or it isn’t. The candidate will win or they won’t. Tomorrow, it will rain or it won’t.

This is what’s known as a classification algorithm which classifies data into categories. Considering its simplicity, logistic regression is extremely powerful and widely used. Uses include:

  • Spam filters
  • Weather predictions
  • Election predictions
  • Sports predictions
  • Image classification (kitten or ice cream?)
  • Classifying types of words (hello NLP 👋)

And many more.

How logistic regression works

Let’s say that you run an online store and you just sent a promotional email out to your customer list. Of the customers that are on your newsletter list, you know their age and whether or not they bought a product as a direct result of reading your promotional email. You know this because you can tell which link(s) they clicked in your newsletter and you have tracking set up to see what that user did once they landed on your website.

Now you want to know whether someone of a certain age (e.g. 25 years old) will buy from you in the future. This is a binary question – we can predict that the user either will buy or they will not buy, which means that logistic regression is a 👌 algorithm to use to create this model.

First let’s take a look at our data:

We can see that the observed actions (a person who bought something from your online store) are plotted in red. There are no data points in the middle because we already know whether each individual did or did not buy. The data is binary.

Across the horizontal axis we have age and across the vertical axis we have two ticks, 1 and 0. Remember that 1 means someone purchased something after reading your email and 0 means someone did not purchase anything. Another way of looking at this is that 1 is 100% likelihood that someone bought something and 0 is 0% likelihood.

So logistic regression measures the relationship between our dependent variable (whether someone bought something) and independent variable (their age). Rather than trying to predict exactly what any given user is going to do, we can use logistic regression to predict the probability, or likelihood, that a person will buy based on their age. These probabilities are then transformed into binary values that allow us to make a yes-no prediction.

You might be asking how we convert these values – enter the sigmoid function.

The sigmoid function is an S-shaped curve that can scale any real number to be between 0 and 1, but never exactly reach 0 or 1. Once the numbers are mapped to sit between 0 and 1 they are transformed into either a 0 (no) or 1 (yes) using a threshold value.
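
For the code-minded, here’s a minimal Python sketch of the sigmoid function. The function name is just for illustration, and the two branches are only there so that very large negative inputs don’t cause an overflow.

```python
import math

def sigmoid(z):
    """Squash any real number into the range between 0 and 1."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    # For negative z, use the equivalent form e^z / (1 + e^z) to avoid overflow.
    ez = math.exp(z)
    return ez / (1.0 + ez)

# Large negative inputs land near 0, large positive inputs near 1,
# and an input of 0 lands exactly in the middle.
for z in (-6, -2, 0, 2, 6):
    print(z, round(sigmoid(z), 4))
```

Notice that the outputs get closer and closer to 0 or 1 as the input moves away from zero, which is what gives the curve its S shape.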

Ultimately, the threshold value is a choice you make rather than a fixed rule, but in practice it’s usually set to 0.5, or 50%. This provides symmetry, as it’s the number in the middle of the range.

Any value below the threshold line will be projected onto the 0 line and classified as a ‘no’, while any value above the threshold line will be projected onto the 1 line and classified as a ‘yes’ 👍
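
In code, the threshold step might look something like the sketch below. The function name and the sample probabilities are made up for illustration, and treating a value that lands exactly on the threshold as a ‘yes’ is an assumption on my part.

```python
def classify(probability, threshold=0.5):
    """Return 1 ('yes') if the probability reaches the threshold, else 0 ('no')."""
    return 1 if probability >= threshold else 0

# Made-up probabilities, as if they had come out of the sigmoid function.
probabilities = [0.08, 0.35, 0.62, 0.91]
print([classify(p) for p in probabilities])  # → [0, 0, 1, 1]
```

Raising the threshold, for example `classify(0.62, threshold=0.7)`, makes the model pickier about saying ‘yes’.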

Ok, so we now know whether our model predicts a customer will buy something. But how likely are they to buy?

Since logistic regression works on probabilities we can project the data point onto the vertical axis and obtain the probability of something happening.

Here’s the formula for logistic regression:

y = e^(b0 + b1*x) / (1 + e^(b0 + b1*x))

y is the dependent variable or predicted output. It is a probability between 0 and 1, which in our example is the probability that a customer will make a purchase.

x is the independent variable or predictor. We assume that the independent variable is causing the dependent variable to change in some way. Here we’re talking about age.

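b1 is the coefficient for the independent variable. This controls the steepness and direction of our curve and expresses how a unit change in x (one year older) changes the prediction. A positive b1 means the probability of buying goes up with age, while a negative b1 means it goes down. Strictly speaking, each unit change in x shifts the log-odds of buying by b1.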

b0 is the constant term, also called the intercept. It sets the prediction when x is 0 and shifts the S-shaped curve left or right along the horizontal axis.

e represents Euler’s number, a mathematical constant roughly equal to 2.718. The value that we’re actually looking to transform with the sigmoid function is the linear part, b0 + b1*x.
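
To see the formula in action, here’s a small Python sketch that plugs in our 25-year-old customer. The values for b0 and b1 are hypothetical, picked only so the numbers are easy to follow. They are not fitted to any real data.

```python
import math

# Hypothetical coefficients, chosen only for illustration (not fitted to real data).
b0 = -4.0  # constant term
b1 = 0.1   # coefficient for age

def probability_of_purchase(age):
    """Apply y = e^(b0 + b1*x) / (1 + e^(b0 + b1*x)) with x = age."""
    linear_part = b0 + b1 * age
    return math.exp(linear_part) / (1.0 + math.exp(linear_part))

p = probability_of_purchase(25)
print(round(p, 3))  # roughly 0.182 with these made-up coefficients
```

With these made-up coefficients the linear part for age 25 is -4.0 + 0.1 * 25 = -1.5, which the formula turns into a probability of about 18%. That sits below a 0.5 threshold, so the yes-no prediction would be ‘no’.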

Pretty rad, right?

Logistic regression vs linear regression

You might be like “hold the phone – didn’t we already talk about a regression?” and yes, yes we did – here.

Regression analysis is all about estimating and understanding the relationship between variables. Because of this, linear regression and logistic regression are similar. In fact, you might even recognize elements of the linear regression formula in the logistic regression formula. That’s because logistic regression is a linear method that has been transformed using the logistic function (aka sigmoid function).
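
One way to spot the linear regression hiding inside: if you convert the predicted probability y into log-odds, log(y / (1 - y)), you get the straight line b0 + b1*x back. Here’s a quick Python check, again with hypothetical coefficients that are not fitted to anything.

```python
import math

b0, b1 = -4.0, 0.1  # hypothetical coefficients, for illustration only

def logistic(x):
    """The logistic regression formula: e^(b0 + b1*x) / (1 + e^(b0 + b1*x))."""
    z = b0 + b1 * x
    return math.exp(z) / (1.0 + math.exp(z))

for age in (20, 40, 60):
    y = logistic(age)
    log_odds = math.log(y / (1.0 - y))
    # The last two columns should match: the log-odds are the linear part.
    print(age, round(y, 3), round(log_odds, 3), round(b0 + b1 * age, 3))
```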

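At the end of the day, both algorithms are doing much the same thing: they use an equation to find the shape that best fits the variables found in a dataset. For linear regression that shape is a straight line, and for logistic regression it’s an S-shaped curve.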
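
To make “finding the best fit” a bit more concrete, here’s a rough Python sketch that fits b0 and b1 to a tiny dataset using plain gradient descent. Everything here is an assumption for illustration: the ages and purchases are made up, the function names are mine, and real libraries typically use more sophisticated optimizers than this loop.

```python
import math

def sigmoid(z):
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

# Made-up data: customer ages and whether they bought (1) or not (0).
ages = [18, 22, 25, 30, 35, 40, 45, 50, 55, 60]
bought = [0, 0, 0, 0, 1, 0, 1, 1, 1, 1]

# Rescale age so gradient descent behaves well.
mean_age = sum(ages) / len(ages)
std_age = math.sqrt(sum((a - mean_age) ** 2 for a in ages) / len(ages))

def scale(age):
    return (age - mean_age) / std_age

def fit(xs, ys, learning_rate=0.1, steps=5000):
    """Nudge b0 and b1 step by step to reduce the average prediction error."""
    b0, b1 = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        grad_b0 = 0.0
        grad_b1 = 0.0
        for x, y in zip(xs, ys):
            error = sigmoid(b0 + b1 * x) - y  # predicted probability minus truth
            grad_b0 += error
            grad_b1 += error * x
        b0 -= learning_rate * grad_b0 / n
        b1 -= learning_rate * grad_b1 / n
    return b0, b1

b0, b1 = fit([scale(a) for a in ages], bought)

def predict_probability(age):
    return sigmoid(b0 + b1 * scale(age))

for age in (20, 40, 60):
    print(age, round(predict_probability(age), 2))
```

Because the older customers in this made-up data mostly bought, the fitted b1 comes out positive and the predicted probability climbs with age.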

Linear regression, however, predicts a continuous outcome that is measured along a sliding numeric scale, like house prices, while logistic regression predicts a discrete, or binary, outcome, which means the outcome will always be one thing or another. The email is spam or it is not spam. IRL a linear regression might predict that a house will sell for $100,000, whereas a logistic regression would predict 1, aka yes, that email is indeed spam.

Multiclass classification

What happens if you want to classify something with more than two possible outcomes? Well, luckily, logistic regression’s got your back. Using a method called one-vs-all classification, which is one way of doing multiclass classification, you can make the same kind of prediction as a logistic regression, but across multiple classes.

In multiclass classification, you train multiple logistic regression classifiers – one for each class in your data set. Your multiclass classifier will then pick the logistic regression classifier that outputs the highest probability. Easy.
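
Here’s a small Python sketch of the “pick the highest probability” step. The class names, the two input features, and all of the weights are hypothetical. In a real project each classifier would be trained on your data rather than typed in by hand.

```python
import math

def sigmoid(z):
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

# One hypothetical logistic regression classifier per class:
# (constant term, [one weight per feature]).
classifiers = {
    "cat": (0.0, [1.0, -1.0]),
    "dog": (0.0, [-1.0, 1.0]),
    "bird": (-1.0, [0.5, 0.5]),
}

def class_probabilities(features):
    """Ask every classifier: how likely is it that this is my class?"""
    scores = {}
    for label, (b0, weights) in classifiers.items():
        z = b0 + sum(w * x for w, x in zip(weights, features))
        scores[label] = sigmoid(z)
    return scores

def predict_class(features):
    """Pick the class whose classifier reports the highest probability."""
    scores = class_probabilities(features)
    return max(scores, key=scores.get)

print(predict_class([2.0, 0.0]))  # → cat
print(predict_class([0.0, 2.0]))  # → dog
```

Note that the per-class probabilities don’t have to add up to 1, because each classifier only answers “my class or not?” on its own.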

Examples of logistic regression in the wild

Have you already done a really rad regression analysis? I’d love to see it! Share your regressions with the hashtag #machinesgonewild 🌿