Project2

For this project, we read in data on daily bike rentals, along with weather and calendar information for each day (for example, whether it was a weekday or a holiday). Marcus and I then removed the columns we did not use in our analysis, split the data into training and test sets, and performed some exploratory data analysis, which involved creating summary statistics and plots. We then fit four models, two linear and two ensemble, to the training portion of the bicycle data and compared them based on their RMSE on the test set.

If I were to do the project again, I would include interaction and quadratic terms in the linear models. It would have been cool to make the models more complex. I would also have coded the project so that we were working off the same data objects throughout. As it was, because Marcus removed some of the columns that I wanted to use, I made a copy of the data with those columns retained.

The most difficult part for me was remembering exactly how to predict on the test data and then compare the models via a common metric. Frankly, another student asked a question on Slack that biased me towards an incorrect understanding, which didn’t help. They referred to two of the algorithms as classification algorithms, and their question kind of assumed the comparison would be hard. I should have ignored that, because once I had done it, the comparisons based on RMSE made perfect sense to me.

My big takeaways from the project are that R is a very powerful language. It was fairly straightforward to create summary statistics, plots, and models, and to successfully test those models. Also, there is data available that can be used in a meaningful way for a large variety of phenomena and topics. This rental bike data allows for prediction of rental levels from values mostly relating to the weather.
I could imagine similar methods being used to predict restaurant or retail store patronage, demand for a product being sold online, or attendance at concerts. Data science is fascinating, and statistics as “the grammar of science” is so flexible and powerful!

This is the link to the repository: here

This is the link to the pages page: here
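
For anyone curious about the predict-then-compare step described above, here is a minimal sketch in base R. It is not the project's actual code: the data are simulated, the column names (`temp`, `hum`, `cnt`) and the 70/30 split are assumptions made for illustration, and only two linear models are shown. The second model adds the kind of quadratic and interaction terms mentioned earlier. The same pattern (fit on the training set, `predict()` on the test set, compute RMSE) should carry over to ensemble models fit with other packages.

```r
# Hypothetical stand-in data: simulated, NOT the bike rental data from the project.
set.seed(42)
n <- 500
bikes <- data.frame(
  temp = runif(n, 0, 35),   # made-up temperature values
  hum  = runif(n, 20, 100)  # made-up humidity values
)
# Made-up relationship between the predictors and the rental count
bikes$cnt <- 1000 + 150 * bikes$temp - 3 * bikes$temp^2 - 5 * bikes$hum +
  rnorm(n, sd = 200)

# 70/30 split into training and test sets (the split ratio is an assumption)
train_idx <- sample(seq_len(n), size = floor(0.7 * n))
train <- bikes[train_idx, ]
test  <- bikes[-train_idx, ]

# Fit both models on the training set only
fit_main <- lm(cnt ~ temp + hum, data = train)
fit_quad <- lm(cnt ~ temp + hum + I(temp^2) + temp:hum, data = train)

# RMSE: square root of the mean squared prediction error
rmse <- function(actual, predicted) sqrt(mean((actual - predicted)^2))

# Predict on the held-out test set and collect one RMSE per model
results <- data.frame(
  model = c("main effects", "quadratic + interaction"),
  rmse  = c(
    rmse(test$cnt, predict(fit_main, newdata = test)),
    rmse(test$cnt, predict(fit_quad, newdata = test))
  )
)

# Lower RMSE means better predictions on data the model has not seen
print(results[order(results$rmse), ])
```

Because RMSE is computed from numeric predictions and is in the same units as the response, it gives a common yardstick for any model that predicts a count, whatever algorithm produced the prediction.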

Written on July 10, 2021