Why Bother Rescaling Your Peak List?
The idea behind rescaling your peak list data is that the various indexing programs work best in the 500 - 1500 A**3 range. Thus, for large or very small cells that fail to index with the native peak list data, rescaling into this range (by fudging, i.e. changing, the wavelength) may help index the cell, which can then be unscaled back to its real size later.
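Why does changing the wavelength rescale the cell? By Bragg's law, d = lambda / (2 sin theta). If the 2-theta list is kept unchanged and the declared wavelength is multiplied by a factor k, every d-spacing (and therefore every cell edge) is multiplied by k, and the cell volume by k**3. The minimal Python sketch below shows this. The factor name `k` and the convention "rescaled edge = k * real edge" are assumptions made for illustration only. The exact convention used internally by Crysfire's RS option is not stated here.

```python
import math

def d_spacing(two_theta_deg: float, wavelength: float) -> float:
    """Bragg's law: d = lambda / (2 sin(theta)), with theta = (2-theta) / 2."""
    theta = math.radians(two_theta_deg) / 2.0
    return wavelength / (2.0 * math.sin(theta))

def fudged_wavelength(wavelength: float, k: float) -> float:
    """Hypothetical helper: the wavelength to declare so that, with the
    2-theta list unchanged, every d-spacing (and cell edge) is scaled by k."""
    return wavelength * k

# Illustrative peak positions only; k = 0.5 halves the cell edges,
# so the apparent cell volume is scaled by k**3 = 0.125.
wavelength = 1.5406
k = 0.5
two_thetas = [5.0, 7.1, 8.7]
real_d = [d_spacing(t, wavelength) for t in two_thetas]
scaled_d = [d_spacing(t, fudged_wavelength(wavelength, k)) for t in two_thetas]
for d, ds in zip(real_d, scaled_d):
    print(f"{d:8.4f} -> {ds:8.4f}")
```

Because only the wavelength changes, the peak list itself needs no editing, and the scaling is exactly reversible once a cell has been found.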
Rescaling (a tenfold reduction) is the strategy used by Bob Von Dreele and co-workers for indexing protein powder X-ray diffraction data. (R. B. Von Dreele, P. W. Stephens, G. D. Smith and R. H. Blessing, "The first protein crystal structure determined from high-resolution X-ray powder diffraction data: a variant of T3R3 human insulin-zinc complex produced by grinding", Acta Cryst. (2000). D56, 1549-1553. Synopsis: High-resolution synchrotron X-ray powder diffraction data have been used to solve the crystal structure of a new variant T3R3 human insulin-zinc complex produced by mechanical grinding of a polycrystalline sample.)
Using rescaling within Crysfire
Within Crysfire, type RS to run the rescale command.
What is the optimum cell size to rescale into?
Disclaimer: the following incorporates a lot of educated guesswork.
From: "ROBIN SHIRLEY (USER)" [R.Shirley@surrey.ac.uk]
Organization: Psychology Dept, Surrey Univ. U.K.
To: "L. Cranswick" [L.M.D.Cranswick@dl.ac.uk]
Date: Thu, 14 Dec 2000 13:29:53 GMT
Subject: Re: rescale

From Lachlan:
> Also, can you recommend what the optimum cell size is to rescale
> into when dealing with large cells.

I reckon around 500 - 1500 A**3 would make a pretty good target (i.e. 1000 +- 500).
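Hitting the suggested 1000 +- 500 A**3 target comes down to choosing a linear factor k with V * k**3 close to 1000, i.e. k = (1000 / V)**(1/3). The Python sketch below assumes that a rough volume estimate V is already available (for example from a trial solution or a density-based guess). The true volume is of course unknown before indexing, and the source does not say how the estimate should be obtained. The function names and the convention "rescaled edge = k * real edge" are hypothetical.

```python
def rescale_factor(estimated_volume: float, target_volume: float = 1000.0) -> float:
    """Hypothetical helper: linear factor k (rescaled edge = k * real edge)
    chosen so that estimated_volume * k**3 equals target_volume."""
    if estimated_volume <= 0 or target_volume <= 0:
        raise ValueError("volumes must be positive")
    return (target_volume / estimated_volume) ** (1.0 / 3.0)

def in_target_range(volume: float, low: float = 500.0, high: float = 1500.0) -> bool:
    """True if a (rescaled) volume lies in the suggested 500 - 1500 A**3 window."""
    return low <= volume <= high

# A hypothetical 64,000 A**3 cell: k = (1000 / 64000) ** (1/3), i.e. 0.25
# to within floating-point rounding.
k = rescale_factor(64000.0)
print(round(k, 6), round(64000.0 * k ** 3, 3))
```

Because the target window spans a factor of three in volume, it corresponds to only about a 44% spread in k, so a fairly crude volume estimate is usually enough to land inside it.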
From Robin Shirley
From: "Robin Shirley" [R.Shirley@surrey.ac.uk]
Organization: Psychology Dept, Surrey Univ. U.K.
To: "L. Cranswick" [L.M.D.Cranswick@dl.ac.uk]
Date: Wed, 13 Dec 2000 14:56:15 GMT
Subject: Unscale program
CC: firstname.lastname@example.org (Bob Von Dreele)

Lachlan (and Bob)

Referring to my last email...

> I'm confident that I can knock off a plain-vanilla standalone
> scaleback utility in FORTRAN as requested, and hope to find time
> for that before Xmas. The need to put together a new PowerPoint
> statistics presentation for this lunchtime rather squeezed it out
> from this weekend I'm afraid.

Well, that's now been done. The resulting filter program is called UNSCALE and is attached. It reads a rescaled dataset's summary file (.SUM) and outputs the corresponding unscaled solutions to the .SUM and .SMH (with header) files for a dataset name of one's choice. The input and output dataset names must be different, as it can't safely overwrite the same file that it is reading. The user must also supply the original rescale factor R that is to be removed.

For each solution in the list, the volume (/R-cubed), direct cell sides (/R) and powder constants (*R-squared) are all unscaled, with the rest of the data fields left verbatim. The description field at the end of each solution line may also optionally be changed (globally, for all solutions). Thus, in case the rescaling is referred to in the description, this can be updated at unscale time.

The algorithms have been chosen to maximise precision. Thus the powder constants and volume are recalculated from the direct cell, since this is the version that will usually have the most digits of precision (for R>1). In the unlikely event that R<1, the opposite will be done, so that precision is always conserved. I already had a set of subroutines to carry out these calculations, so that part was no sweat.
The program can be used either interactively via a console dialogue, or via command-line arguments, in which case it can be included in a batch script. The format for use in command-line mode is:

UNSCALE <dset> <udset> <R>

where dset is the input .SUM file for the original rescaled cell, udset is the dataset name to be used for the new .SUM and .SMH files with the rescale factor stripped, and R is the rescale factor that was used for the indexing runs and now needs to be stripped. Otherwise, just type UNSCALE and follow the dialogue. Hopefully most potential errors are trapped in my usual paranoid style.

So far I've only had time to test it against the R=2 rescaled Zr114 data that was originally used to validate CRYS's RS (rescale) option. The corresponding dset and udset summary files are also attached (called ZR114RS2.SUM and US.SUM/.SMH respectively). Please go ahead and validate it on a wider set of cases (especially Bob, who is probably the person most in need of it). Let me know in due course whether you hit any snags - I'm not expecting any, but who knows.

With best wishes for a cool Yule
Robin
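The unscaling arithmetic described in the email above (cell sides divided by R, volume by R-cubed, powder constants multiplied by R-squared, with volume and powder constants recomputed from the direct cell) can be sketched in Python as follows. This is not UNSCALE itself. It does not read or write .SUM/.SMH files, and it does not reproduce the reverse-direction handling used when R<1. The function names are hypothetical, and the powder-constant convention is an assumption: Q(hkl) = h^2 A + k^2 B + l^2 C + kl D + hl E + hk F in A**-2, taken from the reciprocal metric tensor. Crysfire's files may order or scale these constants differently.

```python
import math

def metric_tensor(a, b, c, alpha, beta, gamma):
    """Direct metric tensor G from cell edges (A) and angles (degrees)."""
    ca, cb, cg = (math.cos(math.radians(x)) for x in (alpha, beta, gamma))
    return [[a * a, a * b * cg, a * c * cb],
            [a * b * cg, b * b, b * c * ca],
            [a * c * cb, b * c * ca, c * c]]

def det3(m):
    """Determinant of a 3x3 matrix (cofactor expansion along the first row)."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

def inverse3(m):
    """Inverse of a 3x3 matrix via the adjugate (transposed cofactors / det)."""
    d = det3(m)
    cof = [[0.0] * 3 for _ in range(3)]
    for i in range(3):
        for j in range(3):
            rows = [r for r in range(3) if r != i]
            cols = [col for col in range(3) if col != j]
            minor = (m[rows[0]][cols[0]] * m[rows[1]][cols[1]]
                     - m[rows[0]][cols[1]] * m[rows[1]][cols[0]])
            cof[i][j] = (-1) ** (i + j) * minor
    return [[cof[j][i] / d for j in range(3)] for i in range(3)]

def unscale_solution(cell, r):
    """Hypothetical helper: strip rescale factor r from a direct cell
    (a, b, c, alpha, beta, gamma). Edges are divided by r and angles kept;
    volume and powder constants are recomputed from the unscaled cell."""
    a, b, c, alpha, beta, gamma = cell
    a, b, c = a / r, b / r, c / r
    g = metric_tensor(a, b, c, alpha, beta, gamma)
    volume = math.sqrt(det3(g))          # V = sqrt(det G)
    gs = inverse3(g)                     # reciprocal metric tensor G*
    # Assumed convention: (A, B, C, D, E, F) with D, E, F = 2 * off-diagonals of G*
    powder = (gs[0][0], gs[1][1], gs[2][2],
              2 * gs[1][2], 2 * gs[0][2], 2 * gs[0][1])
    return (a, b, c, alpha, beta, gamma), volume, powder

# Example: a cubic cell indexed at twice its real size (r = 2).
cell, volume, powder = unscale_solution((20.0, 20.0, 20.0, 90.0, 90.0, 90.0), 2.0)
print(cell, round(volume, 6), [round(p, 8) for p in powder])
```

Recomputing the volume and powder constants from the direct cell, rather than scaling stored values, follows the precision argument in the email: for R>1 the direct cell usually carries the most significant digits.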
What are the volume ranges that the various indexing programs have been optimised for?
Disclaimer: the following incorporates a lot of educated guesswork.
From: "ROBIN SHIRLEY (USER)" [R.Shirley@surrey.ac.uk]
Organization: Psychology Dept, Surrey Univ. U.K.
To: "L. Cranswick" [L.M.D.Cranswick@dl.ac.uk]
Date: Thu, 14 Dec 2000 13:24:44 GMT
Subject: Re: Unscale program
CC: email@example.com (Bob Von Dreele)

From Lachlan:
> This brings up the query - what are the volume ranges that the
> various indexing programs have been optimised for?

That's a good question. If I had a proper answer I'd have been less vague about the issue. The best general guide I can suggest is that nearly all the mature indexing programs have their roots in the 1970s, when it was unusual to tackle cells whose actual volumes were above c. 1500 A**3, and very unusual actually to exceed 5000 A**3 (although pseudo-solutions with such volumes were often reported). Thus I'd suggest considering rescaling for cases in the 5000-10,000 A**3 range, and very seriously considering it for anything larger.

When it comes to individual programs, the limitations are harder to pin down. I'd guess that ITO is one of the most affected by implicit volume assumptions, and that TREOR and LZON are perhaps among the least affected (since I know that Per-Eric Werner was successfully solving cells with volumes above 5000 A**3 in the 1980s); the algorithms used by LZON and LOSH should be relatively scale-independent. DICVOL uses some of the same underlying methodology as LZON, but has modified it to search in shells of increasing volume, which makes small cells fast at the expense of making high-volume cases time-consuming. TAUP is probably relatively insensitive to cell volume, though in practice computing time limits it to higher-symmetry cases, down to orthorhombic.

I'm not sure about KOHL. It contains a mass of poorly-documented heuristic optimisations which speed it up greatly, but which make it hard to know the effect of issues like cell volume without systematic experimentation.
In fact, systematic testing of all the different programs regarding (a) high volume and (b) a small number of observed lines would be both interesting and useful.

Best wishes
Robin
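The general volume guidance in the email above can be condensed into a rough rule of thumb. The Python sketch below is only a hypothetical encoding of that advice. The band boundaries are taken from the email, and the function name and wording of the returned hints are assumptions. The email gives no explicit advice for the 1500-5000 A**3 band, so the sketch just flags it as above the usual range.

```python
def rescale_advice(volume: float) -> str:
    """Hypothetical helper: map a cell volume (A**3) onto the rough
    rescaling guidance in the email; not a program-specific rule."""
    if volume <= 0:
        raise ValueError("volume must be positive")
    if volume <= 1500:
        return "within the range most 1970s-era programs were built around"
    if volume < 5000:
        return "above c. 1500 A**3: unusual for older programs; no explicit advice"
    if volume <= 10000:
        return "consider rescaling"
    return "very seriously consider rescaling"

for v in (800, 3000, 7500, 40000):
    print(v, "->", rescale_advice(v))
```

As the email stresses, individual programs differ (ITO is probably more volume-sensitive than TREOR or LZON), so these thresholds are a starting point rather than a rule.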