UnitCell - What it is

UnitCell is a least-squares refinement program for retrieving unit cell constants from diffraction data. The user supplies indexed reflections from crystal diffraction patterns.

The program is further described in:

T J B Holland and S A T Redfern (1997) "Unit cell refinement from powder diffraction data: the use of regression diagnostics". Mineralogical Magazine, 61: 65-77.

** New feature: a zero shift (a small systematic error in 2theta or energy) may now be refined from the data, if desired.

Data file construction

Data input files have the form:
Synthetic spinel
1 1 1  19.000
2 2 0  31.272
3 1 1  36.846
2 2 2  38.548
4 0 0  44.810
4 2 2  55.656
5 1 1  59.358
4 4 0  65.236
5 3 1  68.628
0 0 0

The first line is a title (fewer than 255 characters). Each subsequent line gives the hkl indices of a reflection followed by its measured value (2theta, d-spacing, or beam energy). The end of the file is flagged by a line with 0 0 0 for hkl.
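For readers who want to check or generate input files programmatically, the layout above can be read with a few lines of Python. This is a minimal sketch, not the parser used by UnitCell itself; the function name parse_unitcell_input is hypothetical, and it assumes whitespace-separated fields with one reflection per line.

```python
def parse_unitcell_input(text):
    """Read a UnitCell-style data file: a title line, then 'h k l value'
    lines, ended by a line whose hkl is 0 0 0."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    if not lines:
        raise ValueError("empty input")
    title = lines[0].strip()
    if len(title) >= 255:
        raise ValueError("title must be shorter than 255 characters")
    reflections = []
    for ln in lines[1:]:
        fields = ln.split()
        h, k, l = (int(f) for f in fields[:3])
        if (h, k, l) == (0, 0, 0):
            return title, reflections  # end-of-file flag reached
        if len(fields) < 4:
            raise ValueError(f"missing measurement on line: {ln!r}")
        reflections.append(((h, k, l), float(fields[3])))
    raise ValueError("missing '0 0 0' terminator")


SPINEL = """Synthetic spinel
1 1 1  19.000
2 2 0  31.272
3 1 1  36.846
2 2 2  38.548
4 0 0  44.810
4 2 2  55.656
5 1 1  59.358
4 4 0  65.236
5 3 1  68.628
0 0 0
"""

title, reflections = parse_unitcell_input(SPINEL)
print(title, len(reflections), reflections[0])
```

The sketch stops at the first 0 0 0 line and ignores anything after it, which matches the end-of-file flag described above.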

••• A Note on Errors •••

Earlier versions of the program used unweighted least squares, generating uncertainties from the residuals in the data. However, when the observed 2theta values happened to match the calculated values too closely, the resulting cell parameter uncertainties were unreasonably small. This can also happen when there are too few indexed reflections. A minimum error on 2theta is therefore now assumed. If the user judges this to be too small, the errors can be scaled directly (see below). It is now the user's responsibility to manage the uncertainties.

The program now weights each hkl reflection, so that regressing on 2theta, energy, d-spacing or Q gives essentially the same result. A default minimum uncertainty of sigma(2theta) = 0.005 deg is assumed for the weighting. This value is transformed into the equivalent uncertainty in energy, d-spacing or Q for regressions that minimise those quantities.
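As an illustration of how an uncertainty in 2theta maps onto d-spacing and Q, the sketch below applies first-order error propagation through Bragg's law (lambda = 2 d sin(theta)). It is not taken from the UnitCell source. It assumes Q = 1/d^2 and an angle-dispersive experiment with a known wavelength; the function name and the Cu K-alpha1 wavelength in the example are assumptions, and the energy case is not covered.

```python
import math


def sigma_d_and_q(two_theta_deg, wavelength, sigma_two_theta_deg=0.005):
    """Propagate sigma(2theta) into sigma(d) and sigma(Q), with Q = 1/d^2.

    From d = wavelength / (2 sin(theta)):
        sigma(d) = d * cot(theta) * sigma(theta)
        sigma(Q) = 2 * Q * cot(theta) * sigma(theta)
    where sigma(theta) = sigma(2theta) / 2, in radians.
    """
    theta = math.radians(two_theta_deg) / 2.0
    sigma_theta = math.radians(sigma_two_theta_deg) / 2.0
    d = wavelength / (2.0 * math.sin(theta))
    sigma_d = d / math.tan(theta) * sigma_theta
    q = 1.0 / d ** 2
    sigma_q = 2.0 * q / math.tan(theta) * sigma_theta
    return d, sigma_d, q, sigma_q


# Example: the 1 1 1 reflection at 19.000 deg, with an ASSUMED wavelength
# of 1.5406 Angstrom (Cu K-alpha1); the wavelength is not given in the text.
d, sigma_d, q, sigma_q = sigma_d_and_q(19.000, 1.5406)
print(f"d = {d:.4f} +/- {sigma_d:.4f}   Q = {q:.6f} +/- {sigma_q:.6f}")
```

Because cot(theta) falls with increasing angle, the same 0.005 deg translates into a smaller relative error in d for high-angle reflections, which is why those reflections carry more weight in a fit on d-spacing or Q.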

How to evaluate uncertainties in cell parameters:
Examine the value of sigmafit in the output from UnitCell. If it is significantly larger than 1.0 (say 1.7), then two possibilities exist:
a) there is a poor fit to the data; in this case the residuals and regression diagnostics should be examined to see if any particular reflections are the source of the bad fit.
b) the errors on the input data may be larger than the default, in which case the errors on the cell parameters should be adjusted upwards in proportion. A doubling of the errors on 2theta leads directly to a doubling of cell parameter errors and a halving of sigmafit. Multiplying the errors by sigmafit yields the same result as an unweighted regression (and returns sigmafit to 1.0). There is no need to rerun the regression - with constant weights the cell parameters remain unchanged.
If sigmafit is less than 1.0, the resulting uncertainties on the cell parameters should not be adjusted downwards, unless you can robustly justify input 2theta errors smaller than 0.005 deg.
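The adjustment described in case (b) is a single multiplication, which can be sketched as follows. The function name is hypothetical and the 0.0004 uncertainty in the example is a placeholder, not a result from any real refinement. The sketch never scales downwards, following the advice for sigmafit below 1.0.

```python
def adjust_cell_errors(cell_sigmas, sigmafit, sigma_two_theta=0.005):
    """Scale cell-parameter uncertainties by sigmafit when sigmafit > 1.

    Returns the scaled uncertainties and the 2theta error (in degrees)
    that the scaling implies. Values of sigmafit below 1 leave the
    uncertainties unchanged.
    """
    factor = max(1.0, sigmafit)
    scaled = [s * factor for s in cell_sigmas]
    return scaled, sigma_two_theta * factor


# Placeholder uncertainty of 0.0004 on one cell edge, with sigmafit = 1.7
scaled, implied_sigma = adjust_cell_errors([0.0004], 1.7)
print(scaled, implied_sigma)
```

The cell parameters themselves are not touched, since a common scale factor on all the errors leaves the least-squares solution unchanged.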


••• A brief guide to using the Regression Diagnostics •••

The original reference on regression diagnostics is Belsley, Kuh and Welsch (1980) Regression Diagnostics: Identifying Influential Data and Sources of Collinearity. J Wiley. Regression diagnostics were introduced to least squares (LSQ) problems in geology by Powell (1985) J. Met. Geol. 3, 231-243, and are briefly described there.

Regression diagnostics are numbers, calculated during the regression, which furnish valuable information on the influence of each observation on the least squares result and on the estimated parameters. Usually it is deletion diagnostics which are calculated, and these give information on the changes which would result from deletion of each observation from the regression. In the context of the least squares programs used here, the main diagnostics are briefly described below:

(in what follows n=number of observations and p=number of parameters)

• Hat. Hat values are listed for each observation and give information on the amount of influence each observation has on the least squares result. A hat value of 0.0 implies no influence whatever, whereas a hat of 1.0 implies extreme influence (that observation is effectively fixing one parameter in the regression). The sum of the hat values is equal to the number of parameters being estimated, so an average hat value is p/n. Hat values which are greater than a cutoff value of 2p/n are flagged as potential leverage points (highly influential).

• Rstudent. Ordinary residuals (y-ycalc) are not always very useful because influential data often have very small residuals. Rstudent is designed to take influence into account through division of the residual by sqrt(1-h). It is defined in Belsley et al. (1980) and Powell (1985). A suitable cutoff for the 95% confidence level is 2.0; any value of Rstudent above this magnitude may signify a potentially deleterious observation.

• Dfits. This deletion diagnostic measures the change in the predicted value of y upon deletion of an observation. The value printed is that change expressed as a multiple of the standard deviation of the calculated value. Values greater than the cutoff of 2sqrt(p/n) are to be treated as potentially suspicious.

• sig(i). This is simply the value that sigmafit would take on upon deletion of observation i. If this value falls significantly below the value for sigmafit (the standard error of the fit) then deletion of that observation would cause an improvement in the overall fit.

• DFbetai. The change in each fitted parameter upon deletion of observation i is flagged by this diagnostic. In the output it is given in terms of a percentage of the standard error. Observations which would cause any parameter to change by more than 30% of its standard error are flagged by the program as potentially suspicious.
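To make the five diagnostics concrete, here is a minimal sketch for the simplest possible case: a cubic cell, where Q = 1/d^2 = (h^2+k^2+l^2)/a^2 is a one-parameter (p=1) weighted fit through the origin. It uses the standard closed-form expressions of Belsley et al. (1980) and is not the UnitCell code. The function name, the Cu K-alpha1 wavelength, and the choice of sig(i) as the scale for the parameter change are all assumptions; the reflections are those of the example data file.

```python
import math


def deletion_diagnostics(x, y, sigma):
    """Single-observation diagnostics for the weighted fit y = beta * x."""
    n, p = len(x), 1
    u = [xi / si for xi, si in zip(x, sigma)]  # weighted regressor
    v = [yi / si for yi, si in zip(y, sigma)]  # weighted observation
    suu = sum(ui * ui for ui in u)
    beta = sum(ui * vi for ui, vi in zip(u, v)) / suu
    e = [vi - beta * ui for ui, vi in zip(u, v)]  # weighted residuals
    sse = sum(ei * ei for ei in e)
    sigmafit = math.sqrt(sse / (n - p))
    rows = []
    for ui, ei in zip(u, e):
        hat = ui * ui / suu
        # sigmafit after deleting this observation (no refit needed)
        sig_i = math.sqrt((sse - ei * ei / (1.0 - hat)) / (n - p - 1))
        rstudent = ei / (sig_i * math.sqrt(1.0 - hat))
        dfits = rstudent * math.sqrt(hat / (1.0 - hat))
        # change in beta on deletion, as a percentage of its standard error
        d_beta = ui * ei / ((1.0 - hat) * suu)
        dfbeta_pct = 100.0 * d_beta / (sig_i / math.sqrt(suu))
        rows.append({"hat": hat, "rstudent": rstudent, "dfits": dfits,
                     "sig_i": sig_i, "dfbeta_pct": dfbeta_pct})
    return beta, sigmafit, rows


WAVELENGTH = 1.5406   # ASSUMED (Cu K-alpha1, Angstrom); not stated in the text
SIGMA_2THETA = 0.005  # default minimum error on 2theta, degrees
data = [((1, 1, 1), 19.000), ((2, 2, 0), 31.272), ((3, 1, 1), 36.846),
        ((2, 2, 2), 38.548), ((4, 0, 0), 44.810), ((4, 2, 2), 55.656),
        ((5, 1, 1), 59.358), ((4, 4, 0), 65.236), ((5, 3, 1), 68.628)]

x, y, sigma = [], [], []
for (h, k, l), two_theta in data:
    theta = math.radians(two_theta) / 2.0
    q = (2.0 * math.sin(theta) / WAVELENGTH) ** 2  # Q = 1/d^2
    x.append(h * h + k * k + l * l)
    y.append(q)
    # sigma(Q) propagated from sigma(2theta) through Bragg's law
    sigma.append(2.0 * q / math.tan(theta) * math.radians(SIGMA_2THETA) / 2.0)

beta, sigmafit, rows = deletion_diagnostics(x, y, sigma)
n, p = len(x), 1
print(f"a = {1.0 / math.sqrt(beta):.4f}  sigmafit = {sigmafit:.2f}")
for (hkl, _), r in zip(data, rows):
    flags = [name for name, bad in (
        ("hat", r["hat"] > 2.0 * p / n),
        ("rstudent", abs(r["rstudent"]) > 2.0),
        ("dfits", abs(r["dfits"]) > 2.0 * math.sqrt(p / n)),
        ("dfbeta", abs(r["dfbeta_pct"]) > 30.0)) if bad]
    print(hkl, f"{r['hat']:.3f} {r['rstudent']:+.2f} {r['dfits']:+.2f} "
               f"{r['sig_i']:.2f} {r['dfbeta_pct']:+.0f}", flags)
```

Each diagnostic is obtained from the full fit alone, using the hat value and the residual, so no observation is actually deleted and refitted. For lower-symmetry cells the same quantities follow from the full hat matrix, which this one-parameter sketch does not attempt.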

The value of the diagnostics is that, without re-running the regression, it is possible to gain an understanding of which observations may be deleterious to the analysis. Outliers (large residuals) may not be a problem if they have a low influence (small hat). It may be a good strategy to remove the offending observations one at a time until you are satisfied that the deleterious data have been removed. However, these are single-observation diagnostics and cannot detect the deleterious effects of several observations acting together - there may be a masking effect.


Have fun!
Tim Holland & Simon Redfern. Updated 16th Feb 2006.

(e-mail: tjbh@esc.cam.ac.uk satr@esc.cam.ac.uk)