*** A brief guide to using the Regression Diagnostics ***
{T J B Holland and S A T Redfern (1997) "Unit cell refinement from
powder diffraction data: the use of regression diagnostics".
Mineralogical Magazine, 61: 65-77.}
The original reference on regression diagnostics is Belsley, Kuh and Welsch
(1980) Regression Diagnostics: Identifying Influential Data and Sources of
Collinearity. J. Wiley.
They were introduced to least squares (LSQ) problems in geology by Powell
(1985) J. Met. Geol. 3, 231-243, and are briefly described there.
Regression diagnostics are numbers, calculated during the regression,
which furnish valuable information on the influence of each observation on
the least squares result and on the estimated parameters. Usually it is
deletion diagnostics which are calculated, and these give information on the
changes which would result from deletion of each observation from the
regression. In the context of the least squares programs used here, the main
diagnostics are briefly described below:
(in what follows n=number of observations and p=number of parameters)
* Hat. Hat values are listed for each observation and give information on the
amount of influence each observation has on the least squares result. A hat
value of 0.0 implies no influence whatever, whereas a hat of 1.0 implies
extreme influence (that observation is effectively fixing one parameter in
the regression). The sum of the hat values is equal to the number of
parameters being estimated, so the average hat value is p/n. Hat values which
are greater than a cutoff value of 2p/n are flagged as potential leverage
points (highly influential).
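As an illustration (not the program's own code), the hat values can be
computed in a few lines of Python/NumPy; the straight-line data here are
invented, with one point deliberately placed far from the rest:

```python
import numpy as np

# Hypothetical data: n=5 observations, p=2 parameters (straight line).
# The point at x=10 sits far from the others and should have high leverage.
X = np.column_stack([np.ones(5), np.array([0.0, 1.0, 2.0, 3.0, 10.0])])
n, p = X.shape

# Diagonal of the hat matrix H = X (X'X)^-1 X'
hat = np.diag(X @ np.linalg.inv(X.T @ X) @ X.T)

print(hat.sum())         # sums to p (= 2), so the average hat is p/n
print(hat > 2 * p / n)   # only the x=10 point exceeds the 2p/n cutoff
```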
* Rstudent. Ordinary residuals (y-ycalc) are not always very useful because
influential data often have very small residuals. Rstudent takes influence
into account by dividing each residual by sig(i)*sqrt(1-h), where sig(i) is
the standard error of the fit with observation i deleted (full definitions
are given in Belsley et al. 1980 and Powell 1985). A suitable cutoff at the
95% confidence level is 2.0; any observation whose Rstudent exceeds this in
magnitude may be potentially deleterious.
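A sketch of the Rstudent calculation, again in Python/NumPy with invented
data (a straight-line fit with one deliberate outlier at x=2); the closed
form for sig(i)^2 avoids refitting n times:

```python
import numpy as np

# Hypothetical data: y follows x closely except for the outlier y[2].
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([0.1, 1.0, 4.0, 3.1, 3.9, 5.0])
X = np.column_stack([np.ones_like(x), x])
n, p = X.shape

beta, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta
hat = np.diag(X @ np.linalg.inv(X.T @ X) @ X.T)

# sig(i)^2 = (SSE - e_i^2/(1-h_i)) / (n-p-1): the fit variance with
# observation i deleted, obtained without re-running the regression.
s2_i = (resid @ resid - resid**2 / (1 - hat)) / (n - p - 1)

# Rstudent: residual divided by sig(i)*sqrt(1-h)
rstudent = resid / np.sqrt(s2_i * (1 - hat))

print(np.abs(rstudent) > 2.0)   # only the outlier at x=2 is flagged
```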
* Dfits. This deletion diagnostic measures the change in the predicted value
of y upon deletion of an observation. The value printed gives the change in
calculated y upon deletion of that observation as a multiple of the standard
deviation of the calculated value. Values greater than the cutoff of
2*sqrt(p/n) should be treated as potentially suspicious.
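Dfits combines Rstudent with leverage: dfits_i = rstudent_i * sqrt(h_i/(1-h_i)).
A Python/NumPy sketch on the same invented outlier data as above:

```python
import numpy as np

# Hypothetical straight-line data with one outlier at x=2.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([0.1, 1.0, 4.0, 3.1, 3.9, 5.0])
X = np.column_stack([np.ones_like(x), x])
n, p = X.shape

beta, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta
hat = np.diag(X @ np.linalg.inv(X.T @ X) @ X.T)
s2_i = (resid @ resid - resid**2 / (1 - hat)) / (n - p - 1)
rstudent = resid / np.sqrt(s2_i * (1 - hat))

# Dfits: change in the predicted y on deletion, in standard deviations
dfits = rstudent * np.sqrt(hat / (1 - hat))
print(np.abs(dfits) > 2 * np.sqrt(p / n))   # only the outlier is flagged
```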
* sig(i). This is simply the value that sigmafit would take upon deletion of
observation i. If this value falls significantly below sigmafit (the standard
error of the fit), then deleting that observation would improve the overall
fit.
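The same deletion formula used for Rstudent gives sig(i) directly; a
Python/NumPy sketch on the invented outlier data:

```python
import numpy as np

# Hypothetical straight-line data with one outlier at x=2.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([0.1, 1.0, 4.0, 3.1, 3.9, 5.0])
X = np.column_stack([np.ones_like(x), x])
n, p = X.shape

beta, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta
hat = np.diag(X @ np.linalg.inv(X.T @ X) @ X.T)

# sigmafit: standard error of the full fit
sigmafit = np.sqrt(resid @ resid / (n - p))
# sig(i): standard error of the fit with observation i deleted
sig_i = np.sqrt((resid @ resid - resid**2 / (1 - hat)) / (n - p - 1))

# sig_i[2] falls far below sigmafit: deleting the outlier improves the fit
print(sigmafit, sig_i)
```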
* DFbetai. The change in each fitted parameter upon deletion of observation i
is flagged by this diagnostic. In the output it is given as a percentage of
that parameter's standard error. Observations which would cause any parameter
to change by more than 30% of its standard error are flagged by the program
as potentially suspicious.
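The parameter shift on deleting observation i has the closed form
beta - beta(i) = (X'X)^-1 x_i e_i / (1-h_i). A Python/NumPy sketch of the
DFbeta calculation, expressed as a percentage of the standard error as in the
program's output (data invented; the 30% cutoff is the one quoted above):

```python
import numpy as np

# Hypothetical straight-line data with one outlier at x=2.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([0.1, 1.0, 4.0, 3.1, 3.9, 5.0])
X = np.column_stack([np.ones_like(x), x])
n, p = X.shape

XtX_inv = np.linalg.inv(X.T @ X)
beta = XtX_inv @ X.T @ y
resid = y - X @ beta
hat = np.diag(X @ XtX_inv @ X.T)

sigmafit = np.sqrt(resid @ resid / (n - p))
se_beta = sigmafit * np.sqrt(np.diag(XtX_inv))   # standard error of each parameter

# Change in each parameter on deleting observation i (p x n matrix):
# column i is (X'X)^-1 x_i e_i / (1 - h_i)
dbeta = (XtX_inv @ X.T) * (resid / (1 - hat))
dfbeta_pct = 100 * dbeta / se_beta[:, None]

# Flag observations shifting any parameter by more than 30% of its s.e.
print(np.any(np.abs(dfbeta_pct) > 30, axis=0))
```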
The usefulness of diagnostics is that without re-running the regression it is
possible to gain an understanding of which observations may be deleterious to
the analysis. Outliers (large residuals) may not be a problem if they have a
low influence (small hat). It may be a good strategy to remove the offending
observations one at a time until you are satisfied that the deleterious data
have been removed. However, these are single-observation diagnostics and cannot
detect deleterious effects of several observations acting together - there
may be a masking effect.