For this project, I explored a dataset of Airbnb properties in NYC, using Python and the Seaborn library to visualise the data. The exploration highlighted the wide spread of prices and, as expected, the impact of neighbourhood and room type on price. To model property price from the independent variables, I used the scikit-learn library to fit multiple linear regression and random forest regressors. Because the dataset contains only a limited number of independent variables, model accuracy was limited, especially for the higher-priced properties. A more elaborate model could be built using a neural network and natural language processing of the property reviews to improve predictive performance.
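The modelling step described above can be sketched roughly as follows. This is a minimal illustration rather than the project's actual code: the data are synthetic, and the column names (`neighbourhood_group`, `room_type`, `minimum_nights`, `price`), the helper `build_pipeline`, the 80/20 split, and the choice of mean absolute error as the metric are all assumptions made for the example.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

# Synthetic stand-in for the listings table (assumed columns, made-up values).
rng = np.random.default_rng(0)
n = 500
listings = pd.DataFrame({
    "neighbourhood_group": rng.choice(["Manhattan", "Brooklyn", "Queens"], size=n),
    "room_type": rng.choice(["Entire home/apt", "Private room", "Shared room"], size=n),
    "minimum_nights": rng.integers(1, 30, size=n),
})
base = listings["neighbourhood_group"].map({"Manhattan": 150, "Brooklyn": 100, "Queens": 80})
room = listings["room_type"].map({"Entire home/apt": 1.5, "Private room": 1.0, "Shared room": 0.6})
# Log-normal noise gives a right-skewed price distribution.
listings["price"] = base * room * rng.lognormal(0.0, 0.4, size=n)

X = listings.drop(columns="price")
y = listings["price"]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)


def build_pipeline(model):
    """One-hot encode the categorical columns, pass numeric ones through, then fit `model`."""
    encode = ColumnTransformer(
        [("cat", OneHotEncoder(handle_unknown="ignore"), ["neighbourhood_group", "room_type"])],
        remainder="passthrough",
    )
    return Pipeline([("encode", encode), ("model", model)])


models = {
    "linear_regression": LinearRegression(),
    "random_forest": RandomForestRegressor(n_estimators=100, random_state=0),
}

# Fit each regressor on the training split and score it on the held-out split.
scores = {}
for name, model in models.items():
    pipe = build_pipeline(model)
    pipe.fit(X_train, y_train)
    scores[name] = mean_absolute_error(y_test, pipe.predict(X_test))
    print(f"{name}: MAE = {scores[name]:.1f}")
```

Wrapping the encoder and the regressor in a single pipeline keeps the one-hot encoding fitted on the training split only, so both models are compared on the same held-out rows.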