Case Problem: Heavenly Chocolates Website Transactions.
Case #1, Fall 2020 – Descriptive Analytics
Due: Case reports and supporting files due Tuesday, October 6th, 2020; end of day (11:59pm) for Blackboard submission
Submit: 2 files – Managerial report and Python code written in editor
(.py) In your Python code, use comments (#) to indicate which problem you are addressing.
You are required to complete this assignment using Python except for part 5 where you will be using Excel. If you use any other tool (such as Excel) for parts 1-4 and 6,
75% of your grade will be deducted for those parts. Also, all Python code should be submitted to get credit for the results you found.
Case Problem: Heavenly Chocolates Website Transactions
Heavenly Chocolates manufactures and sells quality chocolate products at its plant and retail store located in Saratoga Springs, New York. Two years ago, the company developed a web site and began selling its products over the Internet. Web site sales have exceeded the company’s expectations, and management is now considering strategies to increase sales even further. To learn more about the web site customers, a sample of 50 Heavenly chocolate transactions was selected from the previous month’s sales. Data showing the day of the week each transaction was made, the type of browser the customer used, the time spent on the web site, the number of web pages viewed, and the amount spent by each of the 50 customers are contained in the file named HeavenlyChocolates. A portion of the data is shown in the table that follows:
||Amount Spent ($)
Heavenly Chocolates would like to first understand the customer behavior of the online shoppers in terms of the time they spend on the web site, the number of pages they view and the amount of money they spend; and then use the sample data to determine whether online shoppers who spend more time and view more pages also spend more money during their visit to the web site. The company would also like to investigate the effect that the day of the week and the type of the browser have on sales.
Use the methods of descriptive analytics we learned in class to learn about the customers who visit the Heavenly Chocolates web site. For each part of your report, include the results you found (either by using Python or Excel) and a discussion of your findings. As a separate file, submit your Python code where you use comments (#) to identify the parts you are solving. The report should be easy to read and written in a professional tone. Include the following in your report:
- (15 pts) Data Cleansing: (Python + Report) Start by loading the data to Python as a Pandas data frame. There are two problems with the data we have received from Heavenly Chocolates. The first problem is the data is out of order. Sort the data frame in ascending order by the column “Customer” (make sure to modify the saved data frame). Then print your data and visually inspect. Notice that we have some duplicate rows. Make a list of the customers that have duplicate rows in the data and include them in your report. Then remove the duplicate rows. Print your data to verify that duplicate rows have been removed. Also notice that the indices for our data frame (row numbers) is now out of order. Run the following code to make sure they are again in order and verify:
df.reset_index(drop=True, inplace=True) For this part, your report should just contain a summary of the data cleansing operations you have performed.
- (20 pts) Descriptive Analytics: (Python + Report) Describe your data by using Pandas library in Python. For the columns “Time (min)”, “Pages Viewed” and “Amount Spent ($)”, find the mean, median, standard deviation, variance, 25th and 75th percentiles, minimum and maximum values. Put your results into a tabular format and include in your report. Discuss what you learn about Heavenly Chocolates’ online shoppers from these numerical summaries.
- (15 pts) Associations between variables: (Python + Report) Compute the sample correlation coefficient between the time spent on the website, the number of pages viewed, and the dollar amount spent. Include your findings in your report and comment on your findings. You should calculate pairwise correlation between these three variables (three correlation coefficients in total) and comment on all. What types of relationships exist between these variables?
- (15 pts) Average time spent per page: (Python + Report) We would like to estimate the average time spent on a page per customer. Start by creating a Numpy array of shape (50,1) and call this array AvgTimeperPage – you will fill this array to include the average time spent on a page by each customer in our data set (“Time (min)” / “Pages viewed”). Iterate through the customers in the data set and for each customer, calculate the time they spent on the website divided by the pages viewed to obtain the average time spent on a page. For ith customer, save the resulting value in the ith row of the array you created. You must use a for loop for this part. Finally, calculate the average time spent per page over all customers and include this value in your report.
Hint: To iterate over the rows of the data frame, you can use the d.loc[RowIndex,”Column name”] function that we used to calculate the measures for specific columns.
5. (15 pts) Showing associations visually using Excel:(Report)
- Plot a scatter chart between the time spent on the website and dollar amount spent. Use the horizontal axis for the time spent on the website. Embed your chart in your report. Discuss what you learn from the scatter chart. Comment on how this relates to your findings on correlation in part 3?
- Plot a scatter chart between the number of web pages viewed and dollar amount spent. Use the horizontal axis for the number of web pages viewed. Embed your chart in your report. Discuss what you learn from the scatter chart. Comment on how this relates to your findings on correlation in part 3?
- How do the two scatter charts compare and what does that tell you about the relationship between these variables? (Hint: Try fitting trendlines to your data to visualize the relationship.) Is this supported by your findings in part 3?
- (10 pts) Mean amount spent by browser: (Python + Report) Find the mean amount spent ($) by Browser (Chrome, Firefox and Other) using Python. Put your results into a tabular format and include in your report. Discuss the observations you can make about Heavenly Chocolates’ business based on the browser used. Hint: There are multiple ways to solve this part. You can use a combination of for and if loops to recursively calculate the total amount spent for each browser type and divide that by the number of observations for that group to obtain the mean. Another way to do this is to use the groupby function in Pandas – do research to find out how this works.
- (10 pts) Quality of the end result: Report writing style, clarity of the discussions and arguments, presentation of the report, clarity of the Python code and comments included within the code. Use proper grammar, punctuation, and spelling and write effective sentences that make logical sense. Consider this to be a business communication and write in a clear and professional tone. Avoid clichés and informal language.
Case Problem: Heavenly Chocolates Website Transactions.