A reaction to a challenging example in multiple regression analysis
Keywords: Linear regression analysis, Regression illustrative example, Hazards in regression analysis, Robust regression, Mixture models
In a very stimulating paper, Preece gives an artificial dataset that illustrates the hazards of multiple regression and challenges the reader to spot the simple inbuilt features of these data. The present note aims to work out how Preece generated the whole dataset. First, an OLS regression model is fitted to the data; checking the model assumptions raises doubts about the validity of the OLS fit, so robust regression estimators are considered as an alternative. These give discordant coefficient estimates but, on closer analysis, agree in highlighting two subsets within the dataset: 9 cases generated by one model and the remaining 8 cases generated by a second model. A mixture model recognizes the same pattern in the data.
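The workflow described above (pooled OLS, then a robust fit that isolates a majority subset) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the data are synthetic and noise-free, not Preece's dataset; the coefficient vectors `beta_a` and `beta_b` are hypothetical; and the robust step is a simple elemental-subset search in the spirit of least trimmed squares, standing in for whichever robust estimators the note actually uses.

```python
from itertools import combinations

import numpy as np

# Synthetic stand-in data (NOT Preece's dataset): 17 cases, two predictors,
# 9 cases lying exactly on one plane and 8 cases exactly on another.
rng = np.random.default_rng(0)
n_a, n_b = 9, 8
X = np.column_stack([np.ones(n_a + n_b), rng.uniform(0, 5, size=(n_a + n_b, 2))])
beta_a = np.array([1.0, 2.0, 3.0])    # hypothetical model for the first 9 cases
beta_b = np.array([30.0, -1.0, 1.0])  # hypothetical model for the other 8 cases
y = np.concatenate([X[:n_a] @ beta_a, X[n_a:] @ beta_b])

# Step 1: pooled OLS on all 17 cases; the residuals are far from zero
# because no single plane fits both subsets.
beta_ols, *_ = np.linalg.lstsq(X, y, rcond=None)
ols_resid = y - X @ beta_ols

# Step 2: robust fit. Fit an exact plane through every triple of cases and
# keep the plane with the smallest sum of its h smallest squared residuals
# (a least-trimmed-squares style criterion, with h = 9 assumed here).
h = 9
best_crit, best_beta = np.inf, None
for idx in combinations(range(len(y)), X.shape[1]):
    rows = list(idx)
    beta, *_ = np.linalg.lstsq(X[rows], y[rows], rcond=None)
    crit = np.sort((y - X @ beta) ** 2)[:h].sum()
    if crit < best_crit:
        best_crit, best_beta = crit, beta

# Step 3: cases with (numerically) zero residual form the first subset;
# the remaining cases are refitted by OLS to recover the second model.
on_plane = np.abs(y - X @ best_beta) < 1e-6
group_1 = np.flatnonzero(on_plane)
group_2 = np.flatnonzero(~on_plane)
beta_2, *_ = np.linalg.lstsq(X[group_2], y[group_2], rcond=None)

print("pooled OLS coefficients:", beta_ols)
print("subset 1:", group_1, "coefficients:", best_beta)
print("subset 2:", group_2, "coefficients:", beta_2)
```

With only 17 cases and three coefficients, the exhaustive search covers 680 triples, so no random subsampling is needed. With noisy data the zero-residual threshold in step 3 would have to be replaced by a scale-based cutoff.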