I took two free MOOCs and stopped using Excel for data analysis.
I recently started learning Python as my first venture into developing some data analysis skills. The two courses I took were Udacity’s Intro to Python Programming and Intro to Data Analysis, and so far they’ve been all I’ve needed to completely stop using Excel for my reports.
Part of what I produce at work are product engagement activity reports for the sales and customer success teams. I look at login activity and the quality of the engagements users have with our product. This helps sales teams stay aware of which accounts are doing well, but also who is a churn risk. This is one of my favorite aspects of my job, and a skill area I can see myself growing into.
Until recently, I created these reports with some backend data that I would import into Excel. I would copy and paste user audit logs and various outputs from basic queries into my spreadsheets, then manipulate away to synthesize the reports using filters, pivot charts, and conditional formatting. This was pretty time consuming, and several other reports needed nearly identical workflows. In essence, there was a lot of repetition, and my hands would end up cramping from all the selecting and clicking.
My previous experience going through a coding bootcamp taught me that if you perform the same operation more than once, you could probably write a function to do it. So I reached out to my Data Scientist friend to explain my challenge, and she said, ‘you should probably learn Python or R’.
I started my journey with Udacity’s Intro to Python Programming, which takes you through the basics: language conventions, datatypes, operators, functions, control flow, etc. I was already familiar with some of these concepts from other programming languages, but the only advantage that gave me was moving faster through some of the content. If this is your first venture into programming in general, the course is a great place to start.
It took me about a week to get through the entire course, which was fairly painless. By the end of it I had gained some confidence and felt my brain grow into the programming mindset. I knew that I needed more practice and that lists and dictionaries with for loops would only get me so far for what I was trying to accomplish. I had read about other tools and knew I needed to learn some more powerful libraries, NumPy and Pandas specifically.
2. Udacity’s Intro to Data Analysis (Also FREE!)
So before starting any projects, I took Udacity’s Intro to Data Analysis. The course first takes you through how to set up an environment on your computer, and it suggests a short course within the course, called Anaconda and Jupyter Notebooks, that I highly recommend. (I only installed Miniconda, by the way, so you don’t need to install the full package for now.) If you don’t have experience with these tools, you should go through the lessons. Even though the Intro to Data Analysis course gives you a browser-based coding environment to practice in, I would highly recommend completing as many of the lessons as you can in a Jupyter Notebook running on your computer.
Then the course introduces you to NumPy first and then Pandas. You look at manipulating entire columns and rows to filter, sort, perform mathematical operations, and even compute some basic correlations. We also learned some basic plotting with Matplotlib, but the lessons don’t go into visualization beyond a couple of plots.
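To give a flavor of what working on whole columns looks like, here is a minimal sketch. The account names, column names, and numbers are all invented for illustration; they are not from the course or from any real report.

```python
import numpy as np
import pandas as pd

# Hypothetical engagement data; every name and number here is made up.
df = pd.DataFrame({
    "account": ["A", "B", "C", "D"],
    "logins": [12, 3, 25, 8],
    "actions": [40, 5, 90, 20],
})

# Math on a whole column at once, no for loop needed.
df["actions_per_login"] = df["actions"] / df["logins"]

# Filter rows with a boolean mask, then sort the result.
active = df[df["logins"] >= 8].sort_values("logins", ascending=False)

# Basic correlation between two columns (Pearson's r by default).
r = df["logins"].corr(df["actions"])

# The same vectorized idea in plain NumPy.
logins = df["logins"].to_numpy()
standardized = (logins - logins.mean()) / logins.std()
```

Each of these lines replaces what would otherwise be a loop over a list or dictionary.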
Maybe more importantly, the major takeaway from the course is learning to import CSV files and how to work through the framework of data analysis: 1) Question 2) Wrangle 3) Explore 4) Draw Conclusions 5) Communicate. I frequently refer back to this framework for my projects now, as it provides an outline for tackling them. When I feel overwhelmed, I go back to the framework.
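As a rough sketch of how a CSV import and the five steps might fit together in Pandas: the data below is a made-up stand-in for a real file (normally you would pass a file path to `pd.read_csv`), and the column names are hypothetical.

```python
import io
import pandas as pd

# Stand-in for a CSV file on disk; the columns and values are hypothetical.
csv_text = """user,account,login_date
ann,Acme,2019-01-02
bob,Acme,2019-01-03
ann,Acme,2019-01-05
cat,Beta,
"""

# 1) Question: how many logins did each account have?

# 2) Wrangle: load the CSV, parse the dates, drop rows with no login date.
logs = pd.read_csv(io.StringIO(csv_text), parse_dates=["login_date"])
logs = logs.dropna(subset=["login_date"])

# 3) Explore: count logins per account.
logins_per_account = logs.groupby("account")["login_date"].count()

# 4) Draw conclusions and 5) Communicate: here, just print the summary.
print(logins_per_account)
```

Writing the steps as comments first, then filling in the code under each one, is one simple way to keep a project anchored to the framework.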
As far as time to complete the course, the concepts were newer to me so it took me about two weeks to complete. I really took my time knowing that these were fundamental concepts that I should mindfully grasp to have some level of proficiency.
By the end of this course, I definitely felt ready to tackle projects. I knew that I just needed more exposure, and I’ve always learned best by trying things myself. I started with the famous Titanic dataset, which can be downloaded from Kaggle, and created some problems to work through following the five-step data analysis framework.
So after a little over a month, with basically all my spare time dedicated to this, I was ready to tackle my work projects. All that prep work paid off, because it took me only a couple of days to create all my reports in Python.
I still have to use Excel to create the CSV files that I import into my functions, but I no longer type in long formulas or select large areas of cells to apply filters and identify duplicates!
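For illustration, the filtering and duplicate checks described above might look something like this in Pandas. The function names, columns, and data are hypothetical and are not the actual report code.

```python
import pandas as pd

def flag_duplicates(df, key_columns):
    """Return every row whose values in key_columns appear more than once."""
    return df[df.duplicated(subset=key_columns, keep=False)]

def filter_low_activity(df, column, threshold):
    """Return the rows where `column` is below `threshold`."""
    return df[df[column] < threshold]

# Made-up data standing in for an imported CSV.
report = pd.DataFrame({
    "email": ["a@x.com", "b@x.com", "a@x.com", "c@x.com"],
    "logins": [10, 1, 10, 2],
})

dupes = flag_duplicates(report, ["email"])          # both a@x.com rows
at_risk = filter_low_activity(report, "logins", 3)  # the b@x.com and c@x.com rows
```

Once checks like these live in functions, the same code can be reused on every similar report instead of repeating the clicks.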
For anyone looking to level up their game in data analysis, whether it’s your full time job or not, I would highly recommend trying to learn how to program. It’s also super fun!