Introduction The chi-square statistic is a useful tool for understanding the relationship between two categorical variables. For the sake of example, let’s say you work for a tech company that has rolled out a new product and you want to assess the relationship between this product and customer churn. In the age of data, tech […]

# Author Archives: lessonsindatascience

## How to Visualize Multiple Regression in 3D

Introduction No matter your exposure to data science & the world of statistics, you’ve very likely at least heard of regression. In this post we’ll be talking about multiple regression; as a precursor, you’ll definitely want some familiarity with simple linear regression. If you aren’t familiar, you can start here! Otherwise, let’s dive […]

## Visualizing Multiple Linear Regression with Heatmaps

Introduction No matter your exposure to data science & the world of statistics, it’s likely that at some point, you’ve at the very least heard of regression. As a precursor to this quick lesson on multiple regression, you should have some familiarity with simple linear regression. If you aren’t, you can start here! Otherwise, let’s […]

## The Intuitive Explanation of Logistic Regression

Introduction Logistic regression can be pretty difficult to understand! As such, I’ve put together a very intuitive explanation of the why, what, and how of logistic regression. We’ll start with some building blocks that should lend themselves well to a clearer understanding, so hang in there! Through the course of the post, I hope to send you […]

## Multiple Regression in R

Introduction No matter your exposure to data science & the world of statistics, it’s likely that at some point, you’ve at the very least heard of regression. As a precursor to this quick lesson on multiple regression, you should have some familiarity with simple linear regression. If you aren’t, you can start here! Otherwise, let’s […]

## Leverage Anti-joins

Introduction Assuming you already have some background with the more common types of joins (inner, left, right, and outer), adding semi and anti joins can prove incredibly useful, saving you what could otherwise have taken multiple steps. In a previous post, I outlined the benefits of semi-joins and how to use them. Here I’ll be […]

## Getting Started with Data Science

Introduction When it comes to getting started in data science, it can be a bit overwhelming. You need to know statistics, programming, machine learning… within each of those domains there are many, many subdomains that can dominate a person’s focus, and once they’re done reading everything there is to know about one thing, […]

## Leverage Semi-joins in R

Introduction Assuming you already have some background with the more common types of joins (inner, left, right, and outer), adding semi and anti joins can prove incredibly useful, saving you what could otherwise have taken multiple steps. In this post, I’ll be focusing on just semi-joins; with that said, there is a lot of overlap […]

## Kmeans clustering

Introduction Clustering is a machine learning technique that falls into the unsupervised learning category. Without going into a ton of detail on different machine learning categories, I’ll give a high-level description of unsupervised learning. To put it simply, rather than pre-determining what we want our algorithm to find, we provide the algorithm little to […]

## What Every Data Scientist Needs to Know About Clustering

Introduction to Machine Learning Machine learning is a frequently buzzed-about term, yet there is often a lack of understanding of its different areas. One of the first distinctions made within machine learning is between what’s called supervised and unsupervised learning. Having a basic understanding of this distinction and the purposes/applications of each will be […]