Building a Regression Model with Categorical Factors

Introduction: Regression is a staple in the world of data science, and as such it’s useful to understand it in its simplest form. I recently wrote a post that went into more detail on regression. You can find that here. To follow up on the ideas that we explored there, today we will be exploring the […]

Build, Evaluate, and Interpret a Linear Regression Model in Minutes

Intro: Regression is central to so many of the statistical analysis & machine learning tools that we leverage as data scientists. Stated simply, we use regression techniques to model Y through some function of X. We’ll take a look at some additional ideas to set up the premise of regression, and then we’ll take a […]

Understanding The General Modeling Framework

When we build statistical models, we do so with the purpose of understanding or approximating some aspect of our world. The concept of the general modeling framework lends itself well to breaking down the purposes and approaches that we might take to generate that understanding. What is the General Modeling Framework? Take a look […]

COVID-19: Data Visualization Mastery

I recently made a post where we explored the data put out by Johns Hopkins University on COVID-19. While we were able to make some interesting discoveries, it seemed pertinent to gather data that provided a fuller picture. In my search, I came across the following dataset acquired and distributed by Tableau. This […]

Guide to Exploratory Data Analysis with JHU COVID-19 Data

There is a lot of pandemonium and energy around COVID-19 and its potential implications. There are many parties out there saying many things. One of the amazing things about being a data scientist is having the ability to dive into available data on your own. Let’s dive into some data currently being accumulated by Johns Hopkins […]

Git Essentials for a Data Scientist

Version Control 101: Version control is all about managing changes to files and directories by one or many contributors. Git is an incredibly popular system for version control and the one we will be running through for this course. There are many benefits to version control, and to Git specifically, including a view of historical changes […]

Why Bias in Covid-19 Reporting Will Drive New Risks & Challenges

How incomplete information & bias are driving bad assumptions and inappropriate action. Right now the world is in pandemonium about the risks associated with COVID-19, most of which appear to be less about virus symptoms and more about the larger social implications of the panic. What are the current data limitations? Our information is currently […]

Don’t Miss The Bias-Variance Tradeoff Question in Your Next Interview

Why Do Interviewers Ask About It? Questions about the bias-variance tradeoff come up very frequently in interviews for data scientist positions. They often serve to distinguish a data scientist who is seasoned and knows their stuff from one who is junior and, more specifically, unfamiliar with their options for mitigating prediction […]

Become a Master of Data Wrangling in R

The dplyr package has a rich set of tools & functions that you can use for data wrangling, exploratory data analysis, feature engineering, and the like. In the next few minutes, we’ll run through the functions that are absolutely pivotal and that you’ll find yourself using every day as a data scientist. Select: Surface the […]

Intro to Bayesian Statistics

Bayesian Statistics at the Heart of Data Science: Data science has deep roots in Bayesian statistics. Rather than giving the historical background of Thomas Bayes, I’ll give you a high-level perspective on Bayesian statistics, Bayes’ theorem, and how to leverage it as a tool in your work! Bayesian statistics are rooted in […]