MIS772 – PREDICTIVE ANALYTICS
Subject Code – MIS772
Subject Name – Predictive Analytics
University Name – Deakin Business School, Australia
Predictive Analytics uses data, statistical algorithms, and machine learning techniques to estimate the likelihood of future outcomes from historical data. The goal is to provide the best possible assessment of what will happen in the future, rather than only describing what has already happened.
Types of Predictive Models
Simple linear regression
A statistical method for describing the linear relationship between two continuous variables: one predictor (X) and one response (Y).
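As an illustration of how the line is estimated, the sketch below computes the ordinary least-squares intercept and slope from their closed-form formulas. It is plain Python rather than RapidMiner, and the toy data points are made up so that they lie exactly on Y = 2 + 3X.

```python
from statistics import mean


def fit_simple_linear(xs, ys):
    """Return (b0, b1) of the least-squares line Y = b0 + b1 * X."""
    x_bar, y_bar = mean(xs), mean(ys)
    # Slope = sum of cross-deviations / sum of squared X deviations.
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sxy / sxx
    # The fitted line always passes through the point (x_bar, y_bar).
    b0 = y_bar - b1 * x_bar
    return b0, b1


# Toy data lying exactly on Y = 2 + 3X.
b0, b1 = fit_simple_linear([1, 2, 3, 4], [5, 8, 11, 14])
print(b0, b1)  # 2.0 3.0
```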
Multiple linear regression
A statistical method for describing the linear relationship between one continuous response variable and two or more predictor variables.
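With several predictors the coefficients no longer have a one-line formula. A common approach, sketched below under the assumption of a small, well-conditioned dataset, is to solve the normal equations (XᵀX)b = Xᵀy. The function name and the noise-free toy data (generated from Y = 1 + 2X1 + 3X2) are illustrative only.

```python
def fit_multiple_linear(rows, ys):
    """Least-squares coefficients [b0, b1, ..., bp] for Y = b0 + b1*X1 + ... + bp*Xp.

    Solves the normal equations (X^T X) b = X^T y by Gauss-Jordan elimination.
    """
    X = [[1.0, *row] for row in rows]  # leading 1 gives the intercept b0
    n, p = len(X), len(X[0])
    # Augmented matrix [X^T X | X^T y].
    A = [
        [sum(X[k][i] * X[k][j] for k in range(n)) for j in range(p)]
        + [sum(X[k][i] * ys[k] for k in range(n))]
        for i in range(p)
    ]
    for c in range(p):
        # Partial pivoting: swap in the row with the largest entry in column c.
        piv = max(range(c, p), key=lambda r: abs(A[r][c]))
        A[c], A[piv] = A[piv], A[c]
        # Eliminate column c from every other row.
        for r in range(p):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    return [A[i][p] / A[i][i] for i in range(p)]


# Toy data generated from Y = 1 + 2*X1 + 3*X2 (no noise).
rows = [(0, 0), (1, 0), (0, 1), (1, 1), (2, 1)]
ys = [1 + 2 * x1 + 3 * x2 for x1, x2 in rows]
coefs = fit_multiple_linear(rows, ys)
print([round(b, 6) for b in coefs])
```

For real data, a numerically robust solver (for example a QR-based least-squares routine) is preferable to forming XᵀX explicitly.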
Polynomial regression
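When the residuals of a straight-line fit show a nonlinear pattern against a predictor, the relationship between that predictor and the response is itself nonlinear. A polynomial regression model, which adds powers of the predictor up to degree h, can be used to capture this: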
$$Y = \beta_0 + \beta_1 X + \beta_2 X^2 + \dots + \beta_h X^h + \epsilon$$
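Although the fitted curve is nonlinear in X, the model above is linear in the coefficients β0 to βh, so it can be estimated by ordinary least squares on the expanded features 1, X, X², …, X^h. The sketch below does exactly that; the function name and the toy points (taken from Y = 1 − 2X + 0.5X²) are assumptions for illustration.

```python
def polynomial_fit(xs, ys, degree):
    """Least-squares coefficients [b0, b1, ..., bh] for Y = b0 + b1*X + ... + bh*X^h."""
    # Expand each x into 1, x, x^2, ..., x^h; the model is linear in these features.
    X = [[x ** d for d in range(degree + 1)] for x in xs]
    n, p = len(X), degree + 1
    # Augmented normal-equation matrix [X^T X | X^T y].
    A = [
        [sum(X[k][i] * X[k][j] for k in range(n)) for j in range(p)]
        + [sum(X[k][i] * ys[k] for k in range(n))]
        for i in range(p)
    ]
    # Solve by Gauss-Jordan elimination with partial pivoting.
    for c in range(p):
        piv = max(range(c, p), key=lambda r: abs(A[r][c]))
        A[c], A[piv] = A[piv], A[c]
        for r in range(p):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    return [A[i][p] / A[i][i] for i in range(p)]


# Points on Y = 1 - 2X + 0.5X^2; a degree-2 fit should recover the coefficients.
xs = [-2, -1, 0, 1, 2, 3]
ys = [1 - 2 * x + 0.5 * x ** 2 for x in xs]
coefs = polynomial_fit(xs, ys, 2)
print([round(b, 6) for b in coefs])
```

Choosing h too large lets the curve chase noise in the training data, so the degree is usually kept small and checked on held-out data.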
Support vector regression
Support Vector Regression (SVR) is another regression method. It is derived from the Support Vector Machine (SVM), a classification method, and with a few minor variations applies the same principles to predicting a continuous value.
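A full SVR fit requires solving a constrained optimization problem, which is beyond a short sketch. The snippet below shows only the epsilon-insensitive loss that SVR uses, next to the squared loss of ordinary regression: errors smaller than epsilon cost nothing, and larger errors are penalized linearly. The tube width `epsilon = 0.5` and the toy values are illustrative assumptions; in a full SVR this loss is minimized together with a penalty that keeps the fitted function as flat as possible.

```python
def epsilon_insensitive_loss(y_true, y_pred, epsilon=0.5):
    """Total SVR-style loss: errors inside the epsilon tube cost nothing."""
    # Only the part of each absolute error that exceeds epsilon is penalized.
    return sum(max(0.0, abs(t - p) - epsilon) for t, p in zip(y_true, y_pred))


def squared_loss(y_true, y_pred):
    """Ordinary least-squares loss, shown for comparison."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred))


y_true = [10.0, 12.0, 15.0]
y_pred = [10.3, 11.6, 16.0]
eps_loss = epsilon_insensitive_loss(y_true, y_pred)
sq_loss = squared_loss(y_true, y_pred)
print(eps_loss)  # 0.5 (only the third error, 1.0, exceeds the tube)
print(sq_loss)
```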
Decision tree regression
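Decision tree models use a tree-like structure of if-then splits on attribute values to perform classification or regression. In a regression tree, each leaf predicts a continuous value, typically the mean response of the training records that reach that leaf.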
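The core step of growing a regression tree can be sketched as a search for the single split that most reduces squared error, with each side predicting the mean of its records. The function below builds only this one-level tree (a "stump"); real tree learners repeat the search recursively over all attributes. The function name and toy numbers are hypothetical.

```python
def best_split(xs, ys):
    """Find the threshold on one predictor that minimizes the summed squared error
    when each side predicts the mean of its training targets (a one-level tree)."""
    def sse(vals):
        m = sum(vals) / len(vals)
        return sum((v - m) ** 2 for v in vals)

    pairs = sorted(zip(xs, ys))
    best = None
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no split possible between identical predictor values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        cost = sse(left) + sse(right)
        if best is None or cost < best[0]:
            best = (cost, threshold, sum(left) / len(left), sum(right) / len(right))
    return best[1:]  # (threshold, left-leaf prediction, right-leaf prediction)


# Toy data: the response jumps once the predictor exceeds 3.
x = [1, 2, 3, 4, 5, 6]
y = [80, 85, 90, 150, 155, 160]
threshold, left_pred, right_pred = best_split(x, y)
print(threshold, left_pred, right_pred)  # 3.5 85.0 155.0
```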
The purpose of this assignment is to develop your ability to:
(i) Analyze patterns in a business dataset utilizing descriptive data mining concepts, and
(ii) Develop predictive models to address questions relevant to a particular business.
The business context for this assignment is the international tourism sector, focusing on providers of tourist accommodation. Organizations such as Airbnb provide a digital platform that tourists can use to rent properties in particular locations around the world. The properties are owned by private individuals (property hosts), and Airbnb takes a commission on bookings made via its digital platform.
Globally, the tourism sector has been heavily impacted by the COVID-19 pandemic. Because of restrictions on international travel, the sector is under financial pressure and must make prudent decisions to remain viable. Against this background, Airbnb has approached you to generate recommendations for its rental listings in Denmark. Airbnb has provided a dataset of 23,941 rental listings covering November 2016 to October 2019, which reflects the pre-COVID period.
Create a geospatial (map-based) visualization of all rental properties, using their geo-locations to automatically categorize those located on the Danish island of Sealand versus those in the rest of Denmark. For your visualization, use the following ranges of longitude and latitude to identify Sealand properties:
• Longitude >= 10.99 and < 13, and
• Latitude < 56.25.
Using these ranges in combination, you should be able to generate a new attribute (say “Sealand”) that determines whether the property is on the island of Sealand (true) or if the property is located in the rest of Denmark (false).
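The Sealand rule is a simple Boolean combination of the two ranges. The sketch below expresses it in Python, outside RapidMiner; the listing names and coordinates are hypothetical examples (approximate locations of Copenhagen, Aarhus and Bornholm), not records from the dataset.

```python
def is_sealand(longitude, latitude):
    """True if a listing falls inside the box used to approximate Sealand:
    10.99 <= longitude < 13 and latitude < 56.25."""
    return 10.99 <= longitude < 13 and latitude < 56.25


# Hypothetical listings as (longitude, latitude) pairs.
listings = {
    "copenhagen_flat": (12.57, 55.68),
    "aarhus_house": (10.20, 56.16),    # longitude below 10.99
    "bornholm_cabin": (14.90, 55.10),  # longitude 13 or above
}
sealand = {name: is_sealand(lon, lat) for name, (lon, lat) in listings.items()}
print(sealand)
```

Because the rule is a rectangle, it is only an approximation of the island's outline; listings near the box edges are worth checking on the map.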
As Sealand includes the capital city of Copenhagen, Airbnb wants to know whether Sealand properties differ from those in the rest of Denmark. Explore this from the perspective of tourists staying at the rentals (define which aspects of the stay this perspective covers).
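One way to start such a comparison is to summarize tourist-facing attributes separately for each group. In the sketch below, the attributes (nightly price, review score) and all values are hypothetical stand-ins; the actual attributes should be chosen from the dataset according to the perspective you define.

```python
from statistics import mean, median

# Hypothetical listings: (is_sealand, nightly_price, review_score). Illustrative values only.
rows = [
    (True, 120, 95), (True, 140, 92), (True, 110, 97),
    (False, 80, 96), (False, 95, 94), (False, 70, 98),
]


def summarize(flag):
    """Count, median price and mean review score for one group of listings."""
    group = [r for r in rows if r[0] == flag]
    return {
        "count": len(group),
        "median_price": median(r[1] for r in group),
        "mean_review": mean(r[2] for r in group),
    }


for flag, label in ((True, "Sealand"), (False, "Rest of Denmark")):
    print(label, summarize(flag))
```

The median is used for price because nightly rates tend to be skewed by a few expensive listings.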
Given the financial pressure on the tourism sector, Airbnb wants to advertise properties that are NOT located on Sealand and have been popular with tourists in the past. Define a new attribute that can be used to classify whether these non-Sealand properties are popular or not, using appropriate attributes in the dataset. Develop two different classification models that can be used by Airbnb managers to predict if a particular non-Sealand rental property is likely popular or not. Evaluate the performance of each model, indicating the best predictive model.
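The sketch below illustrates two parts of this task: deriving a popularity label and comparing two classifiers on held-out data. The rule "at least 20 reviews means popular" is a hypothetical choice, and the two prediction lists are toy outputs standing in for whichever models you build; neither reflects the real dataset.

```python
def popular_label(number_of_reviews, threshold=20):
    """Hypothetical popularity rule: 'popular' if the listing has at least
    `threshold` reviews. Both the attribute and the cut-off are assumptions."""
    return number_of_reviews >= threshold


def evaluate(actual, predicted):
    """Accuracy, precision and recall for a binary (popular / not popular) classifier."""
    tp = sum(a and p for a, p in zip(actual, predicted))
    tn = sum(not a and not p for a, p in zip(actual, predicted))
    fp = sum(not a and p for a, p in zip(actual, predicted))
    fn = sum(a and not p for a, p in zip(actual, predicted))
    return {
        "accuracy": (tp + tn) / len(actual),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


reviews = [3, 25, 40, 12, 60, 8]
actual = [popular_label(r) for r in reviews]
# Toy held-out predictions from two hypothetical classifiers.
model_a = [False, True, True, True, True, False]
model_b = [False, True, False, False, True, False]
for name, preds in (("Model A", model_a), ("Model B", model_b)):
    print(name, evaluate(actual, preds))
```

Here both models have the same accuracy, but Model A finds every popular listing while Model B never flags an unpopular one, so the "best" model depends on which error matters more to Airbnb. If the label is derived from an attribute such as review counts, that attribute should be excluded from the models' inputs; otherwise the classifier simply rediscovers the labeling rule.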
i) the final report according to the submission template (as a PDF file)
ii) all RapidMiner files (in the RMP format) combined as a single ZIP file.