Question: Data Mining: Exploring and Comparing Supervised Methods for a Prediction Task

28 May 2024, 9:27 AM

You will be assigned a single dataset with an associated data mining problem to solve (e.g., a regression problem). You should first use data exploration techniques to explore the data, conduct appropriate data preparation, and then choose two supervised data mining techniques available in KNIME to predict certain data values and evaluate and compare their performance. You will need to select appropriate techniques, justify your choices made at different stages of your workflow, and demonstrate that you have knowledge of the necessary underlying data mining techniques.

You should write a 2,500-word structured report that includes the following headings (more details on how the report will be assessed are provided below):

• Introduction - introduce the prediction problem.

• Data mining theory - provide a theoretical description of the two supervised data mining methods used in the workflow (for example, the classification or regression techniques that have been used), why they are appropriate to the prediction task, and how their performance can be assessed. This should include citations to relevant prior literature.

• Data exploration and preparation - describe the approaches used in the workflow to explore the data and to perform feature selection, transformation and normalisation, where appropriate.

• Experimental setup - describe the experimental setup and the evaluation measures used in the workflow and how the data has been handled to ensure that the models were not over-fitted. You should explain which nodes were used in KNIME and provide a rationale for the various parameter settings that were used. You should not, however, simply list all the modules in your workflow and their parameters - be selective and discuss the modules most critical to solving the data mining task.

• Results - present the results for each data mining method and compare the performance of the different methods using graphs and tables. What insights can you gain from the models? For example, which features are most important, and are there any outliers in the predictions?

• Conclusion and reflections - summarise the main findings of your report and reflect on the methods used.

Charts and tables (and their associated captions), references and appendices are not included in the word count.
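The evaluation and over-fitting requirements above can be illustrated outside KNIME. The following is a minimal, standard-library Python sketch, not a prescribed part of the assignment, assuming numeric feature rows and a binary 0/1 target. It runs k-fold cross-validation (similar in spirit to a KNIME X-Partitioner/X-Aggregator loop), fits min-max normalisation on each training fold only so that test-fold statistics do not leak into training, and scores each held-out fold with accuracy, precision, recall and F1. The `majority_baseline` model is a hypothetical placeholder for whichever two supervised methods are actually chosen.

```python
import random
from typing import Callable, Dict, List, Sequence, Tuple

Row = List[float]
# A model is any function (X_train, y_train, X_test) -> predicted labels for X_test.
Model = Callable[[List[Row], List[int], List[Row]], List[int]]


def k_fold_indices(n: int, k: int, seed: int = 42) -> List[List[int]]:
    """Shuffle row indices once, then deal them into k roughly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def min_max_fit(rows: Sequence[Row]) -> List[Tuple[float, float]]:
    """Learn per-column (min, max) bounds from the training rows only."""
    return [(min(col), max(col)) for col in zip(*rows)]


def min_max_apply(rows: Sequence[Row], bounds: List[Tuple[float, float]]) -> List[Row]:
    """Scale columns with training bounds; test values may fall outside [0, 1]."""
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0 for v, (lo, hi) in zip(row, bounds)]
        for row in rows
    ]


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, precision, recall and F1 for a 0/1 target (1 = positive class)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": (tp + tn) / len(pairs), "precision": precision,
            "recall": recall, "f1": f1}


def cross_validate(X: List[Row], y: List[int], model: Model,
                   k: int = 5, seed: int = 42) -> List[Dict[str, float]]:
    """Score `model` on each of k held-out folds; normalisation is refit per fold."""
    scores = []
    for test_idx in k_fold_indices(len(X), k, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(X)) if i not in held_out]
        bounds = min_max_fit([X[i] for i in train_idx])
        X_train = min_max_apply([X[i] for i in train_idx], bounds)
        X_test = min_max_apply([X[i] for i in test_idx], bounds)
        y_pred = model(X_train, [y[i] for i in train_idx], X_test)
        scores.append(binary_metrics([y[i] for i in test_idx], y_pred))
    return scores


def majority_baseline(X_train: List[Row], y_train: List[int], X_test: List[Row]) -> List[int]:
    """Hypothetical placeholder model: always predict the training fold's majority class."""
    majority = 1 if 2 * sum(y_train) > len(y_train) else 0
    return [majority] * len(X_test)


if __name__ == "__main__":
    # Invented toy data, not taken from the Titanic files.
    X = [[float(i), float(i % 3)] for i in range(20)]
    y = [1 if i % 4 == 0 else 0 for i in range(20)]
    folds = cross_validate(X, y, majority_baseline, k=5)
    mean_acc = sum(f["accuracy"] for f in folds) / len(folds)
    print(f"mean accuracy over {len(folds)} folds: {mean_acc:.3f}")
```

A trivial baseline like this is a useful reference row in a results table: a real classifier that cannot beat the majority class on held-out folds is unlikely to be learning anything useful.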

Your report should be a critical evaluation of the workflow in the context of the data mining problem posed; it should not be merely a description of what was done.

You will choose a single dataset to base your analyses and report on. The datasets have been derived from Kaggle competitions.

Titanic Dataset (Binary Classification)

The data is split across two files, each of which contains 1,204 entries representing 1,204 passengers, although it should be noted that the passengers are not necessarily the same in the two files. The two files are titanic_ticket_data.csv and titanic_personal_data.csv. The aim of this challenge is to build a model that is able to predict whether or not a passenger will survive the sinking of the Titanic.
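Because the two Titanic files do not necessarily describe the same passengers, they have to be joined on a shared identifier before modelling, and an inner join drops any passenger present in only one file. The minimal, standard-library Python sketch below illustrates this. The key column `PassengerId` and the other column names are assumptions for illustration, not confirmed columns of `titanic_ticket_data.csv` or `titanic_personal_data.csv`, and the rows are invented toy data. In KNIME the equivalent step would typically be a Joiner node configured for an inner join.

```python
import csv
import io
from typing import Dict, List

Row = Dict[str, str]


def read_rows(text: str) -> List[Row]:
    """Parse CSV text with a header line into a list of dicts."""
    return list(csv.DictReader(io.StringIO(text)))


def inner_join(left: List[Row], right: List[Row], key: str) -> List[Row]:
    """Keep only rows whose `key` appears in both tables, merging their columns.

    If `key` is duplicated in `right`, the last occurrence wins (a simplification).
    """
    right_by_key = {row[key]: row for row in right}
    return [{**row, **right_by_key[row[key]]} for row in left if row[key] in right_by_key]


# Invented toy rows; "PassengerId", "Fare" and "Age" are hypothetical column names.
ticket_csv = "PassengerId,Fare\n1,10.0\n2,20.0\n3,30.0\n"
personal_csv = "PassengerId,Age\n2,40\n3,25\n4,33\n"

ticket = read_rows(ticket_csv)
personal = read_rows(personal_csv)
joined = inner_join(ticket, personal, key="PassengerId")
unmatched = len(ticket) - len(joined)
print(f"{len(joined)} passengers matched, {unmatched} ticket rows had no personal record")
```

Reporting how many rows fail to match is worthwhile, since an inner join silently shrinks the 1,204-row files; an outer join would instead keep unmatched passengers, leaving missing values to be handled during data preparation.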

Song Popularity Dataset (Regression)

The data is split across two files, each of which contains 2,000 entries, representing 2,000 popular songs from the Spotify platform. The two files are song_details.csv and song_acoustic_analysis.csv. The aim of the challenge is to build a model to predict the popularity of each song on Spotify.
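For the song-popularity variant, which is a regression task, predictions are usually compared against observed popularity with error measures such as mean absolute error (MAE), root mean squared error (RMSE) and the coefficient of determination (R²). The following standard-library Python sketch shows those formulas; the function name `regression_metrics` is hypothetical and the example values are invented. KNIME's Numeric Scorer node reports comparable statistics, though its exact outputs should be checked against the KNIME documentation.

```python
import math
from typing import Dict, Sequence


def regression_metrics(y_true: Sequence[float], y_pred: Sequence[float]) -> Dict[str, float]:
    """MAE, RMSE and R^2 between observed and predicted values."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and the same length")
    n = len(y_true)
    errors = [p - t for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    mean_true = sum(y_true) / n
    ss_res = sum(e * e for e in errors)
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    # R^2 is undefined when the target is constant; report 0.0 in that case.
    r2 = 1.0 - ss_res / ss_tot if ss_tot else 0.0
    return {"mae": mae, "rmse": rmse, "r2": r2}


# Invented popularity scores, for illustration only.
print(regression_metrics([3.0, 5.0, 7.0], [2.0, 5.0, 8.0]))
```

RMSE penalises large errors more heavily than MAE, so a wide gap between the two suggests that a few songs have unusually large prediction errors, which connects to the question about outliers in the predictions.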
