End-to-End Recipe Data Analysis


This project analyzes recipe data scraped from Skinnytaste, a popular recipe website. The analysis covers ratings, nutritional values, preparation time, ingredient lists, courses, cuisines, and categories.

You can find all code files for this project on GitHub.

Project Presentation Video

Watch an overview of the complete project in my project presentation video.

Data Collection

I collected data on all recipes from the website Skinnytaste. For each recipe, I extracted the recipe ID, name, URL, rating, ingredient list, number of servings, nutritional information, Weight Watchers value, duration, course, cuisine, and category list. I used Scrapy, a fast, high-level web crawling and web scraping framework, to collect the data. The fields were defined in an item schema and cleaned during parsing with Scrapy's ItemLoaders.
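To illustrate the "item schema plus cleaning during parsing" pattern, here is a minimal, dependency-free sketch. It does not use Scrapy: it only mimics the idea behind the MapCompose and TakeFirst processors used with Scrapy's ItemLoaders. The `RecipeItem` schema (a subset of the fields listed above), the helper names, and the cleaning rules (collapse whitespace, drop empty strings) are all hypothetical.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, List, Optional


def normalize_space(text: str) -> str:
    """Collapse runs of whitespace and trim both ends."""
    return re.sub(r"\s+", " ", text).strip()


def drop_empty(text: str) -> Optional[str]:
    """Return None for empty strings so they are filtered out."""
    return text or None


def map_compose(*funcs: Callable) -> Callable:
    """Apply each function to every value in turn, dropping None results
    (similar in spirit to MapCompose, not a reimplementation of it)."""
    def process(values: List[str]) -> List[str]:
        for func in funcs:
            values = [out for v in values if (out := func(v)) is not None]
        return values
    return process


def take_first(values: List[str]) -> Optional[str]:
    """Keep only the first cleaned value (similar in spirit to TakeFirst)."""
    return values[0] if values else None


@dataclass
class RecipeItem:
    """Hypothetical schema covering a subset of the scraped fields."""
    recipe_id: Optional[str] = None
    name: Optional[str] = None
    url: Optional[str] = None
    rating: Optional[str] = None
    ingredients: List[str] = field(default_factory=list)
    course: List[str] = field(default_factory=list)


LIST_FIELDS = {"ingredients", "course"}  # multi-valued fields keep every value
clean = map_compose(normalize_space, drop_empty)


def load_item(raw: dict) -> RecipeItem:
    """Clean raw extracted strings (a list per field) into a RecipeItem."""
    item = RecipeItem()
    for name, values in raw.items():
        if not hasattr(item, name):
            raise KeyError(f"unknown field: {name}")  # enforce the schema
        cleaned = clean(values)
        setattr(item, name, cleaned if name in LIST_FIELDS else take_first(cleaned))
    return item
```

For example, `load_item({"name": ["  Turkey   Chili "], "ingredients": [" 1 lb ground turkey ", " ", "2 cloves  garlic"]})` yields an item whose `name` is `"Turkey Chili"` and whose `ingredients` are `["1 lb ground turkey", "2 cloves garlic"]`; the blank entry is dropped.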

You can find the scraped data files for this project on GitHub.

Data Cleaning and Processing

The raw dataset required extensive cleaning:

  • Split the 'rating' column into 'rating value' and 'votes number'
  • Created individual columns for each nutrition type from the 'nutrition_info' column
  • Split the 'duration' column into separate columns for each duration type
  • Used the 'ingredient parser' Python package for parsing ingredient names from ingredient sentences
  • Cleaned the 'course', 'cuisine', and 'categories' columns
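The rating, nutrition, and duration steps above are essentially string parsing. The sketch below shows one way to do them with regular expressions, assuming hypothetical raw formats such as `"4.56 from 32 votes"`, `{"Calories": "250kcal"}`, and `"Prep: 10 mins, Cook: 1 hr 5 mins"`; the strings actually scraped from Skinnytaste may differ, and output names such as `total_minutes` are illustrative. Ingredient-name parsing is left to the ingredient parser package and is not shown.

```python
import re
from typing import Dict, Optional, Tuple

# Assumed rating format: "<value> from <n> votes"
RATING_RE = re.compile(r"(?P<value>\d+(?:\.\d+)?)\s+from\s+(?P<votes>\d+)\s+votes?")


def split_rating(raw: str) -> Tuple[Optional[float], Optional[int]]:
    """Split a rating string into (rating value, number of votes)."""
    match = RATING_RE.search(raw or "")
    if not match:
        return None, None
    return float(match["value"]), int(match["votes"])


def nutrition_columns(info: Dict[str, str]) -> Dict[str, Optional[float]]:
    """Turn {'Calories': '250kcal', ...} into one numeric value per nutrient."""
    columns = {}
    for nutrient, amount in info.items():
        number = re.search(r"\d+(?:\.\d+)?", amount)
        key = nutrient.strip().lower().replace(" ", "_")
        columns[key] = float(number.group()) if number else None
    return columns


def to_minutes(text: str) -> int:
    """Convert '1 hr 5 mins' style text to a whole number of minutes."""
    hours = re.search(r"(\d+)\s*(?:hr|hour)", text)
    mins = re.search(r"(\d+)\s*min", text)
    return (int(hours.group(1)) * 60 if hours else 0) + (int(mins.group(1)) if mins else 0)


def split_duration(raw: str) -> Dict[str, int]:
    """Split 'Prep: 10 mins, Cook: 1 hr 5 mins' into minutes per duration type."""
    result = {}
    for part in raw.split(","):
        if ":" in part:
            label, value = part.split(":", 1)
            result[label.strip().lower() + "_minutes"] = to_minutes(value)
    return result
```

Each function returns plain Python values, so its output can be expanded into new dataframe columns, one per rating part, nutrient, or duration type.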

Data Modeling

Separate dataframes holding the unique ingredients, courses, cuisines, and categories were created and saved as CSV files. Mapping files were also created to relate each recipe to its ingredients, courses, cuisines, and categories.
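This modeling step amounts to normalizing each list-valued column into a lookup table of unique values plus a many-to-many mapping table. The project used dataframes; this stdlib sketch uses lists of dicts only to stay dependency-free. It assumes each recipe is a dict with a `recipe_id` and a list-valued column, and the surrogate integer ids and column names (for example `cuisine_id`, `cuisine_name`) are hypothetical, not the project's actual CSV schema.

```python
import csv
from pathlib import Path
from typing import Dict, List, Tuple


def build_lookup_and_mapping(
    recipes: List[dict], column: str, entity: str
) -> Tuple[List[Dict], List[Dict]]:
    """Normalize one list-valued recipe column into two tables.

    lookup:  one row per unique value, keyed by a surrogate integer id
    mapping: one row per (recipe_id, <entity>_id) link (many-to-many)
    """
    ids: Dict[str, int] = {}
    mapping: List[Dict] = []
    for recipe in recipes:
        # dict.fromkeys drops repeated values within a recipe, keeping order
        for value in dict.fromkeys(recipe[column]):
            if value not in ids:
                ids[value] = len(ids) + 1  # ids follow order of first appearance
            mapping.append({"recipe_id": recipe["recipe_id"], f"{entity}_id": ids[value]})
    lookup = [{f"{entity}_id": i, f"{entity}_name": v} for v, i in ids.items()]
    return lookup, mapping


def write_csv(path: Path, rows: List[Dict]) -> None:
    """Write a list of same-keyed dicts to a CSV file with a header row."""
    if not rows:
        return
    with path.open("w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=list(rows[0]))
        writer.writeheader()
        writer.writerows(rows)
```

Usage, with hypothetical file names: `lookup, mapping = build_lookup_and_mapping(recipes, "cuisine", "cuisine")`, then `write_csv(Path("cuisines.csv"), lookup)` and `write_csv(Path("recipe_cuisines.csv"), mapping)`. Keeping the mapping in its own table lets a BI tool such as Tableau join recipes to any number of cuisines without duplicating recipe rows.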

You can find all data files here.

Data Visualization

Two interactive dashboards were built in Tableau to analyze and share insights, using bar charts, line charts, word clouds, scatter plots, and tree maps. Filters on category, course, and cuisine name let viewers interact with the visualizations.

  • Compared cuisines by rating and courses by duration using a tree map and an area map
  • Analyzed the distribution of commonly used ingredients using word clouds
  • Examined the correlation between the number of servings and total recipe duration using a scatter plot
  • Visualized the nutritional value distribution across different courses using line charts and bar charts
  • Investigated the Weight Watchers values for different categories and how each nutrition type contributes to the Weight Watchers values
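The dashboards themselves were built in Tableau. As a hedged illustration of the summaries behind three of the views above, the plain-Python sketch below computes ingredient frequencies (the word cloud), mean total time per course, and the Pearson correlation between servings and total duration (the scatter plot). The record fields (`ingredients`, `course`, `servings`, `total_minutes`) are assumptions, and the sample records in the usage note are made-up toy data, not project results.

```python
from collections import Counter, defaultdict
from math import sqrt
from statistics import mean
from typing import Dict, List, Tuple


def ingredient_counts(recipes: List[dict], k: int = 10) -> List[Tuple[str, int]]:
    """Top-k ingredients, counting each ingredient at most once per recipe."""
    counts = Counter(
        name for r in recipes for name in dict.fromkeys(r["ingredients"])
    )
    return counts.most_common(k)


def mean_total_by_course(recipes: List[dict]) -> Dict[str, float]:
    """Average total time in minutes for each course a recipe belongs to."""
    minutes_by_course = defaultdict(list)
    for r in recipes:
        for course in r["course"]:
            minutes_by_course[course].append(r["total_minutes"])
    return {course: mean(mins) for course, mins in minutes_by_course.items()}


def pearson(xs: List[float], ys: List[float]) -> float:
    """Pearson correlation coefficient between two equal-length samples."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two samples of equal length >= 2")
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / (sx * sy)
```

With toy records such as `{"ingredients": ["salt", "garlic"], "course": ["Sauce"], "servings": 8, "total_minutes": 90}`, the correlation is `pearson([r["servings"] for r in recipes], [r["total_minutes"] for r in recipes])`. Counting each ingredient once per recipe keeps a recipe that lists salt twice from inflating its frequency.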

Insights

  1. Kosher salt, garlic, and olive oil are the three most commonly used ingredients across recipes, indicating their essential role in a variety of dishes.
  2. Among all courses, recipes categorized under 'sauce' take the longest total time, suggesting that sauces often require more preparation and cooking time than other courses.
  3. Carbohydrates and protein appear to be the main contributors to Weight Watchers values. This finding may be influenced by the available data, and further analysis is needed to confirm it.

Note: Because the dashboards are interactive, further insights can be found by changing the filters.

For detailed visualizations and further insights, please check out the dashboards on Tableau.