Evaluating a Random Forest model

The Random Forest is a powerful tool for classification problems, but as with many machine learning algorithms, it can take a little effort to understand exactly what is being predicted and what it means in context. Luckily, scikit-learn makes it pretty easy to run a Random Forest and interpret the results. In this post I’ll […]

Hypothesis-testing the discount bump

I bet you know this feeling: an item you need is on sale, so you gleefully add it to your cart and start thinking, “What else can I buy with all this money I just saved?” A few clicks (or turns around the store) later, and you’ve got a lot more in your cart than […]

Cleaning house data

For my first project as a Flatiron School data science bootcamper, I was asked to analyze data about the sale prices of houses in King County, Washington, in 2014 and 2015. The dataset is well known to students of data science because it lends itself to linear regression modeling. You can take a look at […]

Seven ways to scatterplot

You know scatterplots, those sprinkles of points that help you get an initial sense of how two variables relate to one another. If you have data to analyze, you’ll probably be making a scatterplot sooner or later. In this post, I’ll run through seven ways to make scatterplots using a variety of tools in Excel, Python, […]

I dig data science

Hi! I’m Jenny, and over the last few years I’ve been slowly pivoting the focus of my work from Roman archaeology to data science. In August 2019, I started a full-time data science bootcamp offered by Flatiron School. At about 50 hours of coursework per week, the program is going to help me build solid […]

For starters

Here I blog about what I’m learning, what I’m doing, and what I’m experiencing as a humanities PhD who flew the coop. (The coop was on fire.) It’s mostly professional, but also personal, because I am a person. If you want to read about my day job as a program manager/project manager for a collaborative […]