Group Assignment - Data Analysis Report
Due Sunday, October 13, 11.55pm
Objective:
The objective of this assignment is to develop the skills necessary for a comprehensive data analysis process, including the selection of an appropriate dataset, formulation of a meaningful research question, application of relevant statistical methods, and clear communication of findings through a full report. By completing this project, you will gain hands-on experience in data cleaning, exploratory analysis, statistical modeling, and reporting, preparing you to analyze and interpret data effectively in support of data-driven decision-making.
Instructions:
Dataset: Use the dataset provided inside the Kaggle competition via the link below. You are required to use this dataset for your analysis. You may use additional datasets if necessary, but the primary focus should be on the dataset provided.
Formulate a research question: Develop a research question that can be answered using the provided dataset. The research question should be specific, relevant, and clearly defined. It should also be answerable using the statistical methods covered in this course.
Conduct data analysis: Perform a comprehensive data analysis to answer the research question you have formulated. This should include data cleaning, exploratory analysis, and statistical modeling as appropriate. You should use the tools and techniques covered in this course to analyze the data and draw meaningful conclusions.
Write a report: Prepare a report summarizing your findings on how well you can predict who will leave the bank. The report should include an introduction, data description, methodology, results, and conclusion. It should be well-organized, clearly written, and include appropriate visualizations to support your analysis. Fold your code in your rmd/qmd report by setting the following yaml header.
---
title: "xxx"
author: "xxx"
format:
  html:
    code-fold: true
---
Submit your report: Submit your report as an HTML file by the deadline. Be sure to include your name and the names of your group members on the report.
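As a rough illustration of the "Conduct data analysis" step, here is a minimal sketch in R. It runs on simulated data rather than the competition data, and the column names (`age`, `balance_k`, `exited`) are hypothetical. Logistic regression is used only as an example of a model for a binary outcome, so check that whatever method you choose is among those covered in the course.

```r
# Simulated stand-in for the bank churn data; all column names are hypothetical.
set.seed(1)
n <- 500
bank <- data.frame(
  age       = round(runif(n, 18, 80)),
  balance_k = round(rexp(n, rate = 1 / 50), 2)  # balance in thousands
)
# Simulate churn so that older customers are more likely to leave.
bank$exited <- rbinom(n, size = 1, prob = plogis(-3 + 0.05 * bank$age))

# Data cleaning: drop rows with missing values (none here, shown for completeness).
bank <- na.omit(bank)

# Exploratory analysis: overall churn rate and a summary of the predictors.
mean(bank$exited)
summary(bank[, c("age", "balance_k")])

# Hold out 20% of the rows to assess predictions on unseen data.
test_idx <- sample(seq_len(nrow(bank)), size = 0.2 * nrow(bank))
train <- bank[-test_idx, ]
test  <- bank[test_idx, ]

# Statistical modeling: logistic regression for a binary outcome.
fit <- glm(exited ~ age + balance_k, data = train, family = binomial)
summary(fit)

# Predicted probabilities of leaving, and accuracy at a 0.5 cut-off.
test$prob <- predict(fit, newdata = test, type = "response")
accuracy <- mean((test$prob > 0.5) == (test$exited == 1))
accuracy
```

The hold-out split is one simple way to judge how well the model predicts who will leave. Your own report should justify the choice of method, the evaluation measure, and any cut-off used.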
Datasets:
Bank Churn Data
Evaluation:
Your report will be evaluated based on the following criteria:
Introduction: Provide background information on your dataset and clearly state your research question and hypotheses. Include a data dictionary for the variables used. Is the research question clearly defined and relevant?
Data Description: Describe the dataset, including the variables used in your analysis. Are the data cleaning and preprocessing steps clearly explained?
Methodology: Describe the statistical methods used to analyze the data. You are required to use only the tools and methodologies taught during the course; techniques or software not covered in the curriculum are not permitted. Are the methods appropriate for the research question? Are the assumptions of these methods met?
For postgraduate students ONLY: In addition to the requirement above, you must research one more methodology, compare it with your chosen method, and discuss the comparison. This tests your ability to think critically, to research independently, and to make well-reasoned judgements and decisions.
Results: Present the results of your analysis, including any visualizations or tables where appropriate. Are the results clearly presented and relevant to the research question?
Discussion: Discuss the implications of your findings and their relevance to the research question.
Conclusion: Summarize your findings and discuss their implications. Are the conclusions supported by the data and analysis?
Code: Include any code used to conduct the analysis in an appendix. Is the code well-documented and easy to follow?
Presentation: Finally, prepare a presentation summarizing your project, including your research question, methodology, main results, and conclusions. Make sure to highlight the most interesting aspects of your project and be prepared to discuss your findings.
Word limit: Maximum of 1500 words for undergraduates and 2000 words for postgraduates.
Generative AI: You are permitted to use generative AI tools to stimulate ideas and enhance your understanding of the assignment. However, directly copying or reproducing content generated by AI is not allowed. All submissions must be original work, reflecting your own understanding and effort. Proper attribution should be given if AI tools are used to assist in brainstorming or idea development.
Submission:
Submit your report as an HTML file by the deadline. Be sure to include your name and the names of your group members on the report. You should also submit your code as a separate file.
Submit your predictions via the Kaggle competition website. Attach proof of your submission to your report.
How to participate in the private Kaggle competition
Join the competition: https://www.kaggle.com/t/c0c9fcc6785347a38b5e6e508eb2f901
Make sure your group name is the same as the one you registered on the Moodle page. Otherwise, your submission will not be counted.
Caution
I chose data from the Kaggle competition website because it has a lot of examples for you to refer to. In fact, I encourage you to look at the examples and learn from them. However, you must not copy the analysis from the examples; you must do the analysis yourself. All submissions will go through a plagiarism check. If you are found to have copied the analysis from the examples, you will receive zero for the assignment.
Note:
You may work in groups of up to 3 students.
You may use only R to conduct the analysis.
You must submit your report as an HTML file and also submit your predictions via the Kaggle competition link above.
Late submissions will not be accepted.
If you have any questions or need clarification on the assignment, please post them on the discussion forum.
Good luck! ♥️