Vaibhav Mali's Documents

Retail sale deals

June 10, 2021
Uploaded through Retail - Category: Data Files » Structured Data » Labeled Data - Tags: #retail #retail_sales #retail_sales_prediction

Context

RedFlagDeals is a forum where users can post product sales that they come across. The "All Hot Deals" section of the forum was scraped for relevant information on July 17, 2020.

I have supplied a kernel showing how to clean the data and will follow up with some analyses for identifying promising deals. I will continue updating the data set with new posts from the forum should there be sufficient interest, which I will evaluate based on the number of downloads and upvotes.

Content

Three tables are supplied.

Each row in the main table corresponds to a post. Columns hold post information such as the title, the net vote score (up-votes minus down-votes), a link to the referenced deal, and more.

The comments table stores all comments made in response to the scraped posts. Titles in the 'title' column serve as foreign keys and link comments to the corresponding posts found in the main table.
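The link between the two tables can be sketched as a simple left join on the title. This is a minimal sketch using only the Python standard library; the sample rows and every column name other than 'title' (here `score` and `comment`) are hypothetical stand-ins, since the actual column names are not listed in this description. Note that titles are not guaranteed to be unique, so posts sharing a title would receive the same comments under this approach.

```python
from collections import defaultdict

# Hypothetical sample rows; only the 'title' column is named in the description.
posts = [
    {"title": "Laptop 30% off", "score": 12},
    {"title": "Free shipping on shoes", "score": -1},
]
comments = [
    {"title": "Laptop 30% off", "comment": "Great price, thanks!"},
    {"title": "Laptop 30% off", "comment": "Out of stock already."},
    {"title": "Unknown post", "comment": "Orphaned comment."},
]

def group_comments_by_title(comment_rows):
    """Index comment texts by the 'title' foreign key."""
    grouped = defaultdict(list)
    for row in comment_rows:
        grouped[row["title"]].append(row["comment"])
    return grouped

def attach_comments(post_rows, comment_rows):
    """Left join: every post is kept, with a (possibly empty) list of its comments."""
    grouped = group_comments_by_title(comment_rows)
    return [dict(post, comments=grouped.get(post["title"], [])) for post in post_rows]

joined = attach_comments(posts, comments)
for post in joined:
    print(post["title"], len(post["comments"]))
```

Comments whose title matches no post (the "Unknown post" row above) are dropped by this join, which is one way to spot inconsistencies between the two tables.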

Lastly, a cleaned version of the main table is supplied for those who do not want to deal with data wrangling. The corresponding code can be found in the Kernel section.

Inspiration

After data-wrangling of the main table, the set should be fairly simple to analyze and may contain some interesting deals. Since links to the sales are included, you may come across offerings that interest you.

The comments table can be used for natural language processing and more robust sentiment analysis. You may want to consider applying PCA.
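Before any such text analysis, the comments have to be turned into numeric features. The following is a minimal bag-of-words sketch using only the Python standard library; the sample comments and the tokenization rule are illustrative assumptions, not the author's method. The resulting count vectors form the kind of matrix to which a dimensionality reduction such as PCA could then be applied with a numerical library.

```python
import re
from collections import Counter

# Hypothetical comments standing in for rows of the comments table.
sample_comments = [
    "Great deal, thanks for posting!",
    "Not a great price. Saw it cheaper last week.",
]

def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def bag_of_words(texts):
    """Return a sorted vocabulary and one term-count vector per text."""
    counts = [Counter(tokenize(t)) for t in texts]
    vocabulary = sorted(set().union(*counts))
    vectors = [[c[word] for word in vocabulary] for c in counts]
    return vocabulary, vectors

vocabulary, vectors = bag_of_words(sample_comments)
print(vocabulary)
print(vectors)
```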

Happy sales hunting!

Some questions you may want to answer:

  • Which users generate the most discussed posts or the highest number of upvotes?
  • What types of products do top users post?
  • What products offer the biggest savings?
  • What are the most popular product categories posted on the forum?
  • Which retailers are most frequently represented?
  • Which retailers generate the highest number of replies per post?

License Type: Open Data Commons
Data Original Source Attribution: https://www.kaggle.com/jahnic/data-on-sales-posted-on-redflagdeals
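
Several of the questions listed above reduce to a group-and-aggregate over the main table. As a minimal sketch, the replies-per-post question could be approached as follows; the column names `retailer` and `replies` and the sample rows are hypothetical, since the actual schema is not given in this description.

```python
from collections import defaultdict

# Hypothetical rows; 'retailer' and 'replies' are assumed column names.
posts = [
    {"retailer": "Store A", "replies": 10},
    {"retailer": "Store A", "replies": 4},
    {"retailer": "Store B", "replies": 9},
]

def mean_replies_per_retailer(rows):
    """Average the reply count over all posts of each retailer."""
    totals = defaultdict(lambda: [0, 0])  # retailer -> [reply sum, post count]
    for row in rows:
        totals[row["retailer"]][0] += row["replies"]
        totals[row["retailer"]][1] += 1
    return {name: total / count for name, (total, count) in totals.items()}

averages = mean_replies_per_retailer(posts)
# Rank retailers from most to fewest replies per post.
ranking = sorted(averages.items(), key=lambda item: item[1], reverse=True)
print(ranking)
```

The same pattern, with a different grouping key or aggregate, covers the questions about top users and the most frequently represented retailers.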