Earlier this year I completed Flatiron School’s full-time online data science bootcamp. The program took about 40 hours per week for 5 months, and it covered pretty much everything I expected from a data science bootcamp (plus some things I hadn’t even heard of before I started!). It’s impossible, of course, for any bootcamp to cover everything a data scientist needs to know, and that’s not really the point. Flatiron School’s stated objective for the bootcamp is to introduce students to a wide range of material and teach them how to learn new skills on their own.
After graduating and while starting my job search, I set out to fill some gaps in my preparation. Reading job ads taught me a lot about what employers are looking for and what specific skills are required for the sorts of jobs that appeal to me most. In this post I’ll share some of the resources I have used post-bootcamp to broaden and deepen my data science knowledge.
The bootcamp curriculum
Flatiron School is always updating its data science curriculum, but just for context, here is a general outline of what we covered:
- General Python programming, with particular focus on NumPy, Pandas, and Matplotlib
- Git, GitHub, and principles of version control
- Basics of data visualization
- Regression (linear, logistic, multiple linear, polynomial, etc.)
- SQL and general principles of database design
- Basics of object-oriented programming
- APIs, web scraping, and working with JSON files
- A broad survey of statistical topics, including combinatorics, permutations, distributions, central limit theorem, hypothesis testing, and Bayesian stats
- Time series analysis
- Basics of linear algebra
- Survey of machine learning algorithms and techniques, including k nearest neighbors, decision trees, random forests and other ensemble methods, support vector machines, PCA, k means, and recommendation systems
- A little introduction to Spark via PySpark
- Basics of graph theory/network analysis
- Basics of NLP
- Survey of deep learning topics, including RNNs, CNNs, and transfer learning
- A little intro to AWS and how to deploy a machine learning algorithm into production
A whirlwind tour! I felt (and still feel) really good about the breadth and depth of what we covered. But once I really started reading job ads, I got inspired to sharpen my skills or build new ones in a few areas, described below.
SQL
We covered SQL in my bootcamp, and I used it to extract data from a database for one of my portfolio projects. But after graduation, I began to realize that SQL is really, really important in the professional world of data science. Nearly every job ad mentions some form of SQL, and SQL is especially prominent in data analyst ads. Even though I could discuss the merits of various machine learning models in detail, I was worried that I wouldn’t be able to get through a technical interview in SQL.
So I made it my first priority to improve my SQL skills after graduation. I started off trying to solve problems on HackerRank, but I found that a little discouraging. While I could view a solution to a tricky problem, there was no explanation of why the solution worked.
I had a better experience working through the “SQL for Business Analysts” series of courses on DataCamp. Some of the courses focused on the same dataset throughout, which made it easier to build knowledge of the data, which in turn made it easier to focus on getting my queries right. After completing this series, I knew I at least had an awareness of most things I might be asked to do in a technical interview, even if I couldn’t necessarily do them correctly on the first try.
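To give a flavor of the kind of query that tends to come up in this sort of practice, here is a minimal sketch of a join plus aggregation, run through Python’s built-in sqlite3 module. The tables, column names, and rows are all hypothetical and invented for illustration; they are not taken from any course or interview.

```python
import sqlite3

# Hypothetical in-memory database; table names and rows are made up.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
    INSERT INTO customers VALUES (1, 'Ada'), (2, 'Grace'), (3, 'Linus');
    INSERT INTO orders VALUES
        (1, 1, 20.0), (2, 1, 35.0), (3, 2, 15.0), (4, 2, 5.0), (5, 2, 10.0);
""")

# LEFT JOIN keeps customers with no orders; COALESCE turns their NULL sum into 0.
query = """
    SELECT c.name,
           COUNT(o.id) AS n_orders,
           COALESCE(SUM(o.amount), 0) AS total_spent
    FROM customers AS c
    LEFT JOIN orders AS o ON o.customer_id = c.id
    GROUP BY c.id, c.name
    ORDER BY total_spent DESC, c.name;
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

The detail worth noticing is the LEFT JOIN: with an inner join, the customer who never ordered would silently drop out of the result.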
Natural Language Processing
NLP is one of my favorite data science topics, and I designed my bootcamp capstone project to focus on NLP techniques. After graduation, I wanted to continue building my knowledge.
I had referred to Aurélien Géron’s Hands-On Machine Learning with Scikit-Learn and TensorFlow while working on my capstone project. After bootcamp, I took some time to reread sections that interested me. I also bought two new books on NLP: Deep Learning for Natural Language Processing by Stephan Raaijmakers, and Natural Language Processing in Action by Hobson Lane, Cole Howard, and Hannes Hapke. I thought that neither was as well-written as Géron’s book, but it’s always useful to get different perspectives and see different explanations of the same topic.
Post-bootcamp life was also the perfect time to get some hands-on experience with NLP techniques we didn’t cover in class. My favorite of these has been topic modeling, which I recently applied to music reviews from Pitchfork. During bootcamp I certainly didn’t have time to explore a lot of topics outside the curriculum, but now I do!
Data visualization and dashboards
In bootcamp we covered Matplotlib and Seaborn, and I had some previous exposure to Bokeh, Plotly, and ggplot2. I noticed that many data analyst job ads feature dashboarding as a major job responsibility, so I wanted to beef up my skills in that area.
I had already done some tinkering in Tableau, so I took the opportunity after bootcamp to create a Tableau-centric project. Using a dataset I developed for the project, I built a dashboard to communicate key features of the data. I used four different types of plots and included a drop-down menu so users could manipulate the plots themselves. I manipulated the colors and fonts to suit the project’s theme (pinball machines) and made custom layouts so the dashboard would look good on a variety of devices. (You can view my pinball dashboard here or read how I made it in this post.) After completing this project, I felt much more confident about my ability to use Tableau, and I have something fun to show for my efforts.
Domain-specific analysis
When I first graduated from bootcamp, I just wanted a job–pretty much any job! Although I had read Emily Robinson and Jacqueline Nolis’s Build a Career in Data Science and learned about the major categories of data science jobs, I still didn’t have a very clear idea of which ones were right for me. Reading a lot of job ads helped me form a clearer idea of what sorts of data work interested me most. I also learned what sorts of analyses were important to these roles.
To make myself a little better prepared for the marketing and product roles that appealed to me most, I did little projects to practice the types of analysis common in those domains. In particular, I learned how to do RFM analysis, customer segmentation, and churn modeling, working mostly with datasets I found on Kaggle. Not only was this great practice, but it also has given me more things to talk about in interviews! The fact that I taught myself how to do these analyses outside of bootcamp probably counts for something, too.
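For readers who haven’t met RFM analysis before, here is a minimal sketch of the core idea: for each customer, compute recency (days since the last purchase), frequency (number of purchases), and monetary value (total spend). The transactions, the snapshot date, and the function name rfm_table are all hypothetical, and this is not the code from the projects mentioned above. It uses only the standard library to stay self-contained; on a real dataset, a pandas groupby would probably be the more natural tool.

```python
from collections import defaultdict
from datetime import date

# Hypothetical transactions: (customer_id, purchase_date, amount).
transactions = [
    ("a", date(2021, 1, 5), 40.0),
    ("a", date(2021, 3, 1), 60.0),
    ("b", date(2020, 11, 20), 15.0),
    ("c", date(2021, 2, 14), 200.0),
    ("c", date(2021, 2, 28), 50.0),
    ("c", date(2021, 3, 10), 75.0),
]
snapshot = date(2021, 3, 15)  # assumed reference date for measuring recency


def rfm_table(rows, as_of):
    """Return {customer: (recency_days, frequency, monetary)}."""
    last_seen = {}
    frequency = defaultdict(int)
    monetary = defaultdict(float)
    for customer, when, amount in rows:
        # Track the most recent purchase date per customer.
        last_seen[customer] = max(when, last_seen.get(customer, when))
        frequency[customer] += 1
        monetary[customer] += amount
    return {
        c: ((as_of - last_seen[c]).days, frequency[c], monetary[c])
        for c in last_seen
    }


rfm = rfm_table(transactions, snapshot)
for customer, (recency, freq, money) in sorted(rfm.items()):
    print(customer, recency, freq, money)
```

This only produces the raw values. A common next step is to bin each of the three columns into scores (quartiles or quintiles, for example) and use the combined scores to group customers into segments.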
More computer science and stats
I have also been dabbling in some topics that will expand my horizons a little bit. MIT has an open-access mini-course on computer science basics called the Missing Semester of Your Computer Science Education. It’s a brief introduction to practical topics that are really important to any programmer but that don’t often get covered explicitly or in detail in degree programs and bootcamps. I haven’t finished it yet, but I found the first few lessons useful, with lots of stuff worth returning to later.
Causal inference was another topic I wanted to explore, mostly just so I could understand how it could be applied in business contexts. I found A Crash Course in Causality, an online course offered by Penn on Coursera, and worked through the first unit in about a day. This little bit of work was enough to let me discuss how causal inference could help with a business problem during a recent job interview.
And that brings me to the one overarching point I want to make here. I’m finding, as I go through job interviews, that I truly don’t need to know everything about data science, just enough to speak generally about how things work and what they’re good for. Of course I’ll need to learn some things on the job–everybody does! But I think my post-bootcamp studies are helping me grow and showing potential employers that I can grow into a valuable employee.