In the last post, I showed how I built a web scraper that retrieves data from wine.com’s best-rated wine list. The scraper produced a dataset with 23,822 rows of wine data, including wine name, vintage year, origin, price, average rating, and total number of ratings. Refer here to see how I did this.
This post focuses on cleaning the results from the web scraper. The data looks like this:
I’m in the midst of working on a project that scrapes, cleans, analyzes, and visualizes data from the online wine retailer wine.com. Specifically, I collected information on over 23,000 wines rated 94 points and above to see if I could answer some questions or glean any insights. In this post, I focus on the web scraping part only.
I was interested in web scraping because I thought it was a stealthy way of obtaining a unique dataset to work with. For me it’s the closest thing I’ve ever done to ‘hacking’ if you think about it in…
I am currently making my way through a statistics course in Python and this is my cheat sheet when it comes to interpreting OLS results.
The screenshots below are model output from the statsmodels v0.13.0.dev0 (+34) library.
For the complete project source code, see [GitHub project link]. There are other models in there, but they aren’t detailed in this article yet.
For the following examples, I will be using wind turbine data from the USGS website. I will explore two different methods of inferential statistics: linear regression and logistic regression.
We will be using the Ordinary Least Squares (OLS) method for…
I recently started learning Python as my first venture into developing some data analysis skills. The two courses I took were Udacity’s Intro to Python Programming and Intro to Data Analysis, and so far they have been all I’ve needed to completely stop using Excel for my reports.
Part of what I produce at work is a set of product engagement activity reports for the sales and customer success teams. I look at login activity and the quality of the engagements users have with our product. This helps sales teams stay aware of which accounts are doing well, but also who is a churn…
Products + Customers = Data. What cheese?