I wanted to see how well a linear regression algorithm could predict how many people were likely to RSVP to a particular event. I started with the following code to build a data frame containing some potential predictors:
F-statistic: 8.934 on 31 and 12 DF, p-value: 0.0001399
As I understand it, we can look at the R-squared value to see how much of the variance in the data is explained by the model – in this case it’s 85%.
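To make the R-squared idea concrete, here is a minimal sketch of where that number comes from. It uses made-up toy data rather than the meetup data, and the names `toy`, `fit`, `r2` and `r2_manual` are just placeholders for illustration.

```r
# Toy data, made up for illustration; this is not the meetup data.
set.seed(42)
toy <- data.frame(x = 1:20)
toy$y <- 3 + 2 * toy$x + rnorm(20, sd = 4)

fit <- lm(y ~ x, data = toy)
s <- summary(fit)

# R-squared as reported by summary(): share of variance in y explained by the model
r2 <- s$r.squared

# The same quantity by hand: 1 - (residual sum of squares / total sum of squares)
rss <- sum(residuals(fit)^2)
tss <- sum((toy$y - mean(toy$y))^2)
r2_manual <- 1 - rss / tss

# Adjusted R-squared, also in the summary, penalises each extra predictor
adj_r2 <- s$adj.r.squared

print(c(r2 = r2, r2_manual = r2_manual, adj_r2 = adj_r2))
```

The adjusted value is probably worth a glance whenever a model has many predictors relative to the number of rows, since plain R-squared can only go up as predictors are added.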
A lot of the coefficients are tied to specific event names, which seems too specific to generalise, so I wanted to see what would happen if I derived a feature which indicated whether a session was practical:
F-statistic: 3.049 on 17 and 26 DF, p-value: 0.005187
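For anyone wanting to try something similar, a "practical" flag like the one described above could be derived roughly as follows. This is only a sketch under assumptions: the data frame, the `event.name` and `rsvps` columns, the example rows and the keyword list are all hypothetical, not the data or code behind the results in this post.

```r
# Hypothetical events, made up for illustration.
events <- data.frame(
  event.name = c("Intro to Graphs", "Hands On Graph Modelling",
                 "Graph Hackathon", "Intro to Graphs"),
  rsvps = c(40, 25, 30, 55),
  stringsAsFactors = FALSE
)

# Flag sessions whose name suggests a practical format.
# The keyword list is an assumption; adjust it to the real event names.
events$practical <- grepl("hands on|hackathon|workshop", events$event.name,
                          ignore.case = TRUE)

# Use the derived flag as a predictor in place of the raw event name
fit <- lm(rsvps ~ practical, data = events)
print(coef(fit))
```

A single logical column costs the model one coefficient, whereas using the event name directly costs one coefficient per distinct name.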
Again, none of the coefficients are statistically significant, which is disappointing. I think the main problem may be that I have very few data points (only 44, going by the degrees of freedom reported for the models), which makes it difficult to come up with a general model.
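One rough way to see how thin the data is: recover the number of rows each fit used from the degrees of freedom in its F-statistic line, then compare that to the number of coefficients being estimated. The sketch below assumes both models were fitted with `lm()` and include an intercept; the `fits` data frame and its column names are placeholders.

```r
# Degrees of freedom copied from the two F-statistic lines in this post
fits <- data.frame(
  model        = c("first model", "second model"),
  numerator_df = c(31, 17),  # coefficients estimated, excluding the intercept
  residual_df  = c(12, 26)
)

# For lm() with an intercept: rows used = numerator df + residual df + 1
fits$rows_used <- fits$numerator_df + fits$residual_df + 1

# Rows available per estimated coefficient (intercept included)
fits$rows_per_coef <- fits$rows_used / (fits$numerator_df + 1)

print(fits)
```

With only a handful of rows per coefficient, wide standard errors and non-significant coefficients are not that surprising.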
I think my next step is to look for other features that could affect the number of RSVPs, such as other events on the same day or the weather.
I’m a novice at this but trying to learn more, so if you have any ideas about what I should do next, please let me know.