SHELXL-93 Manual

                               SHELXL-93
                               ---------

SHELXL-93 is a FORTRAN-77 program for the refinement of crystal structures
from diffraction data, and is primarily designed for single crystal X-ray data
at atomic resolution.  It is intended to be easy to install and use on a wide
variety of computers, and replaces the structure-refining part of SHELX-76.

   SHELXL-93 is general and efficient for all space groups in all settings and
there are no arbitrary limits to the size of problems which can be handled,
except for the total memory available to the program.  All instructions are
in machine independent free format, with extensive use of default settings to
minimize the amount of input required from the user.  Instructions and data
are taken from two standard (ASCII) text files, so that input files can easily
be transferred between different computers.  SHELXL-93 is provided both in
source form and as precompiled PC versions.  An application form is reproduced
in Appendix F.

   The program is available free to academics (and for a modest license fee
to commercial institutions) subject to the condition that it is acknowledged
in all publications which report structures refined with it.

   SHELXL-93 has been designed particularly for optimum performance on
computers with vector and pipelined architectures, and most vectorizing
compilers achieve a high degree of vectorization for all important routines
without special action on the part of the user. The rate-determining routines
are provided in a separate file so that they may be compiled with full
optimization; it may well prove counter-productive to optimize or vectorize
the rest of the program!  The distribution and installation of the program are
discussed further in Appendix E.

   Two auxiliary programs are provided for use with SHELXL-93.  PDBINS reads
a PDB-format file for a protein and interactively generates a SHELXL-93 '.ins'
input file, and CIFTAB reads the '.cif' output file from SHELXL-93 and
produces various tables.  Users are encouraged to adapt CIFTAB and PDBINS to
local circumstances.   PDBINS and CIFTAB are described in Appendices B and C
respectively.


                   PROGRAM AND FILE ORGANIZATION

The way of running SHELXL-93 and the conventions for filenames will of course
vary for different computers and operating systems, but the following general
concept should be adhered to as much as possible.  SHELXL-93 may be run
on-line by means of the command:

                           shelxl name

where 'name' defines the first component of the filename for all files which
correspond to a particular crystal structure.  On some systems, 'name' may not
be longer than 8 characters.  Batch operation will normally require the use of
a short batch file containing the above command etc.

   Before starting SHELXL-93, two ASCII input files must be prepared. The file
'name.ins' contains instructions, crystal and atom data etc.  The reflection
data file 'name.hkl' contains one line per reflection in the same fixed format
as for SHELX-76; batch numbers, wavelength (for Laue data) and 'direction
cosines' (for absorption corrections using SHELXA) are optional.  Although
both files are essentially upwardly compatible with SHELX-76 and the Siemens
SHELXTL system, there are many new facilities and some important philosophical
differences.  When converting '.ins' files from these programs to SHELXL-93,
it is a good idea to DELETE or modify all WGHT, OMIT, BLOC, ILSF and MERG
instructions (because of changes in their specifications), to review all AFIX
instructions for possible differences or more appropriate new options, and to
free any coordinates which have been fixed to anchor the molecule in polar
space groups (the program has a better way of doing this).  The Fourier grid
is now larger and the asymmetric unit is found automatically, so FMAP and GRID
should be replaced by e.g. 'FMAP 2'. Free variables are no longer required for
special position constraints or handling multiply occupied sites (see EXYZ
and EADP) but are still legal and are interpreted in the same way.  Disordered
groups will probably require the addition of PART instructions and may benefit
from some of the new restraints (SAME, SADI, FLAT, DELU and SIMU etc.).

   A brief summary of the progress of the structure refinement appears on the
console (i.e. the standard FORTRAN output), and a full listing is written to
a file 'name.lst', which can be printed or examined with a text editor.  After
each refinement cycle a file 'name.res' is (re)written; it is similar to
'name.ins', but has updated values for all refined parameters.  It may be
copied or edited to 'name.ins' for the next refinement run.  Optionally, further
files 'name.cif' (refinement results) and 'name.fcf' (reflection data) may be
created (using the ACTA instruction) in CIF format for direct publication,
archiving and input to other programs (e.g. CIFTAB - see Appendices C and D).
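
   The file conventions described above can be summarized in a short sketch.
The helper name 'shelxl_files' is hypothetical and is not part of SHELXL-93;
it only maps the first filename component to the files named in this section.

```python
# Illustrative only: the extensions are those named in the text above.
INPUT_EXTENSIONS = ("ins", "hkl")                  # prepared by the user
OUTPUT_EXTENSIONS = ("lst", "res", "cif", "fcf")   # cif/fcf only with ACTA


def shelxl_files(name):
    """Return {extension: filename} for one crystal structure."""
    if not name or "." in name or " " in name:
        raise ValueError("expected the first filename component only")
    extensions = INPUT_EXTENSIONS + OUTPUT_EXTENSIONS
    return {ext: name + "." + ext for ext in extensions}


files = shelxl_files("ags4")
print(files["ins"], files["hkl"], files["res"])
```

   A wrapper script could use such a mapping to copy 'name.res' to 'name.ins'
between refinement runs.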

   Two mechanisms are provided for interaction with a SHELXL-93 job which is
already running.  The first, which it is not possible to implement on all
computer systems, applies to 'on-line' runs.  If the Ctrl-I key combination
is hit, the job terminates almost immediately, but without the loss of output
buffers etc. which can happen with Ctrl-C etc.  Usually the Tab key may be
used as an alternative to Ctrl-I.  If the Esc key is hit during least-squares
refinement, the program completes the current cycle and then, instead of
further refinement cycles, continues with the final structure-factor
calculation, tables and Fourier etc.  Otherwise Esc has no effect.  On
computer consoles with no Esc key, F11 or Ctrl-[ usually have the same
effect.

   The second mechanism requires the user to create the file 'name.fin' (the
contents of this file are irrelevant); the program tries at regular intervals
to delete it, and if it succeeds it takes the same action as after Esc.  The
name.fin file is also deleted (if found) at the start of a job in case it has
been accidentally left over from a previous job.  This approach may be used
with batch jobs under most operating systems.
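
   The 'name.fin' handshake can be mimicked as follows.  This is a minimal
sketch of the create-then-delete idea only; the function names are
hypothetical and the code is not taken from the SHELXL-93 source.

```python
import os
import tempfile


def request_stop(name, directory="."):
    """User side: create an empty 'name.fin' (its contents are irrelevant)."""
    path = os.path.join(directory, name + ".fin")
    with open(path, "w"):
        pass
    return path


def stop_requested(name, directory="."):
    """Program side: try to delete 'name.fin'; success means 'stop'."""
    try:
        os.remove(os.path.join(directory, name + ".fin"))
    except FileNotFoundError:
        return False
    return True


if __name__ == "__main__":
    workdir = tempfile.mkdtemp()
    stop_requested("ags4", workdir)      # clear any file left from an old job
    request_stop("ags4", workdir)
    print(stop_requested("ags4", workdir))
```

   Because the signal is simply the existence of a file, the same approach
works for batch jobs that have no console attached.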

   The UNIX version of SHELXL-93 is able to read the '.ins' and '.hkl' files
in either UNIX or DOS format, and writes the '.res', '.cif' and '.fcf' files
in DOS format, so that PCs can access such files via a shared disk without
the need for conversion programs such as DOS2UNIX etc.  The program may be
compiled without this option if necessary.  For reasons of efficiency the
'.lst' file is always in the local format.  Note that for UNIX systems all
filenames associated with SHELXL-93 should be in lower case.
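
   The idea of accepting either line-ending convention on input and always
writing DOS format on output can be sketched as below.  The function names
are hypothetical; the real program does this inside its own FORTRAN I/O
routines.

```python
def read_records(raw):
    """Split raw ASCII bytes into lines, accepting LF or CR LF endings."""
    return raw.decode("ascii").splitlines()


def write_dos(lines):
    """Join lines with CR LF, as for the '.res', '.cif' and '.fcf' files."""
    return ("\r\n".join(lines) + "\r\n").encode("ascii")


unix_text = b"TITL test\nHKLF 4\n"
dos_text = b"TITL test\r\nHKLF 4\r\n"
print(read_records(unix_text) == read_records(dos_text))
```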

   The program uses two large arrays A and B dynamically, so the limits on the
size of structure which can be handled are determined by the dimensions of
these two arrays and also of the array C; A, B and C are defined as separate
COMMON blocks.  The standard version of the program is dimensioned for up to
1500 parameters in each full-matrix block and roughly 5000 atoms (assuming a
generous number of restraints etc.), and is suitable for a typical (UNIX)
workstation (or mainframe) with 8MB or more physical memory.  The standard
precompiled PC version is similarly dimensioned but will automatically run as
a virtual memory program if less memory is available; it thus requires 8MB of
free contiguous disk space (plus another 2MB or so for scratch files) and an
80586, 80486 or 80386/80387 processor.  A real mode precompiled PC version
PCSHELXL.EXE is also available which should run on virtually ANY PC with a
coprocessor and 640K memory; however it is restricted to 300 full-matrix
parameters and is somewhat slower.

   It may be necessary to redimension A, B and C and recompile the program for
specific installations, e.g. to fit within a given job category on a
mainframe.  The highest elements of A and B actually used for the various
calculations are printed out by the program (after 'Memory required =').  The
program will try to use all available physical (and virtual) memory rather
than performing its own disk I/O, thereby achieving longer vector 'runs',
which enhances performance on vector and pipelined systems.  In some cases,
e.g. when a large structure is refined on a MicroVAX or PC with limited
physical memory (or allocation of physical memory to a given process in the
case of the VAX) this strategy may cause excessive 'paging' and disk I/O.  If
this happens, the maximum vector run length can be reduced by setting the 4th
parameter on the L.S. instruction or by reducing the value of the variable IV
in the main program and recompiling; it may also be more efficient to 'block'
the refinement or use the CGLS option.


            THE '.ins' INSTRUCTION FILE - GENERAL ORGANIZATION

All instructions commence with a four (or fewer) character word (which may be
an atom name); numbers and other information follow in free format, separated
by one or more spaces.  Upper and lower case input may be freely mixed; with
the exception of the text string input using TITL, the input is converted to
upper case for internal use in SHELXL-93.  The TITL, CELL, ZERR, LATT (if
required), SYMM (if required), SFAC, DISP (if required) and UNIT instructions
must be given in that order; all remaining instructions, atoms, etc. should
come between UNIT and the last instruction, which is always HKLF (to read in
reflection data).
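
   The ordering rule above lends itself to a simple pre-flight check.  The
following is a sketch under stated assumptions: 'check_ins_order' is a
hypothetical helper, REM lines and '!' comments are ignored, and include
files ('+filename') are not followed.

```python
HEADER_ORDER = ["TITL", "CELL", "ZERR", "LATT", "SYMM", "SFAC", "DISP", "UNIT"]
OPTIONAL = {"LATT", "SYMM", "DISP"}          # "if required" in the text above


def check_ins_order(lines):
    """Return a list of problems; an empty list means the order looks right."""
    words = []
    for line in lines:
        text = line.split("!", 1)[0].strip()
        if not text:
            continue
        word = text.split()[0].upper()       # input is case-insensitive
        if word != "REM":
            words.append(word)
    if "UNIT" not in words:
        return ["UNIT instruction is missing"]
    problems = []
    header = words[: words.index("UNIT") + 1]
    ranks = []
    for word in header:
        if word in HEADER_ORDER:
            ranks.append(HEADER_ORDER.index(word))
        else:
            problems.append(word + " appears before UNIT")
    if ranks != sorted(ranks):
        problems.append("header instructions are out of order")
    for word in HEADER_ORDER:
        if word not in OPTIONAL and word not in header:
            problems.append(word + " is missing")
    if words[-1] != "HKLF":
        problems.append("last instruction is not HKLF")
    return problems
```

   An empty result does not guarantee a valid file; it only confirms the
sequence of the header instructions and the position of HKLF.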

   There is also a facility (which may not be possible under some operating
systems) for reading instructions from (possibly nested) 'include files' by
inserting the line '+filename' at the appropriate place in the '.ins' file.

   A number of instructions allow atom names to be referenced; use of such
instructions without any atom names means 'all non-hydrogen atoms' (in the
current residue, if one has been defined).  A list of atom names may also be
abbreviated to the first atom, the symbol '>' (separated by spaces), and then
the last atom; this means 'all atoms between and including the two named atoms
but excluding hydrogens'.  For further details of the atom list syntax, see
'RESI' as well as the following examples.
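
   The '>' abbreviation can be illustrated with a small sketch.  It assumes
that each atom is available as a (name, element) pair in '.ins' file order;
'expand_atom_list' is a hypothetical helper, and residues are not handled.

```python
def expand_atom_list(tokens, atoms):
    """Expand an atom list that may contain 'first > last' ranges.

    tokens: names as typed, e.g. ['C1', '>', 'C4'] (empty = all non-H atoms)
    atoms:  (name, element) pairs in the order they appear in the file
    """
    atoms = [(name.upper(), element.upper()) for name, element in atoms]
    names = [name for name, _ in atoms]
    if not tokens:
        return [name for name, element in atoms if element != "H"]
    tokens = [token.upper() for token in tokens]
    selected = []
    i = 0
    while i < len(tokens):
        if i + 2 < len(tokens) and tokens[i + 1] == ">":
            first = names.index(tokens[i])
            last = names.index(tokens[i + 2])
            # Inclusive range, with hydrogens skipped
            selected.extend(name for name, element in atoms[first:last + 1]
                            if element != "H")
            i += 3
        else:
            selected.append(tokens[i])
            i += 1
    return selected


atoms = [("C1", "C"), ("H1", "H"), ("C2", "C"), ("O1", "O"), ("C3", "C")]
print(expand_atom_list(["C1", ">", "O1"], atoms))
```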


               EXAMPLES OF SHELXL-93 STRUCTURE REFINEMENTS

The two test structures supplied with the program are intended to provide a
good illustration of routine structure refinement with SHELXL-93.  The output
discussed here should not differ significantly from that of the test jobs,
except that it has been abbreviated and there may be slight differences in
the last decimal place caused by rounding errors.

==============================================================================

FIRST EXAMPLE (ags4):

   The first example (provided as the files 'ags4.ins' and 'ags4.hkl') is the
final refinement job for the polymeric inorganic structure Ag(NCSSSSCN)2 AsF6.
This structure is described by H.W. Roesky, T. Gries, J. Schimkowiak and P.G.
Jones in Angew. Chem. 98 (1986) 93-94 [Int. Edn. 25 (1986) 84-85] and was also
used as the cover picture for the SHELXS-86 manual.  Each ligand bridges two
Ag+ ions so each silver is tetrahedrally coordinated by four nitrogen atoms.
The silver, arsenic and one of the fluorine atoms lie on special positions.
Normally the four unique heavy atoms (from Patterson interpretation using
SHELXS) would have been refined first isotropically and the remaining atoms
found in a difference synthesis, and possibly an intermediate job would have
been performed with the heavy atoms anisotropic and the light atoms isotropic.
For test purposes we shall simply input the atomic coordinates, in which case
the program assumes isotropic U's of 0.05.  In this job all atoms are to be
made anisotropic (ANIS).  We shall further assume that a previous job has
recommended the weighting scheme used here (WGHT) and shown that one
reflection is to be suppressed in the refinement because it is clearly
erroneous (OMIT).

   The first 9 instructions (TITL...UNIT) are the same for any SHELXS and
SHELXL-93 job for this structure and define the cell dimensions, symmetry and
contents.  The Siemens SHELXTL program XPREP can be used to generate these
instructions automatically for any space group etc.  SHELXL-93 knows the
scattering factors for the first 94 neutral atoms in the Periodic Table.
Ten least-squares cycles are to be performed, and the ACTA instruction ensures
that the CIF files 'ags4.cif' and 'ags4.fcf' will be written for archiving and
publication purposes.  ACTA also sets up the calculation of bond lengths and
angles (BOND) and a final difference electron density synthesis (FMAP 2) with
peak search (PLAN 20).  The HKLF 4 instruction terminates the file and
initiates the reading of the 'ags4.hkl' intensity data file.

   Users migrating from SHELX-76 should note that it is still legal to set up
special position constraints on the x,y,z-coordinates, occupation factors, and
Uij components (for upwards compatibility).  However it is totally unnecessary
because the program will do this automatically for any special position in any
space group, conventional or otherwise.  Similarly the program recognizes
polar space groups (P-4, used in this example, is non-polar) and applies
appropriate restraints (H.D. Flack and D. Schwarzenbach, Acta Cryst., A44
(1988) 499-506), so it is no longer necessary to worry about fixing one or
more coordinates to prevent the structure drifting along polar axes.  It is
not necessary to set
the overall scale factor using an FVAR instruction for this initial job,
because the program will itself estimate a suitable starting value.  Comments
may be included in the '.ins' file either as REM instructions or as the rest
of a line following '!'; this latter facility has been used to annotate this
example.


TITL AGS4 in P-4                         ! title of up to 76 characters
CELL 0.71073 8.381 8.381 6.661 90 90 90  ! wavelength and unit-cell
ZERR 1 .002 .002 .001 0 0 0              ! Z (formula-units/cell), cell esd's
LATT -1                              ! non-centrosymmetric primitive lattice
SYMM -X, -Y, Z
SYMM Y, -X, -Z              ! symmetry operators (x,y,z must be left out)
SYMM -Y, X, -Z
SFAC C AG AS F N S          ! define scattering factor numbers
UNIT 4 1 1 6 4 8            ! unit cell contents in same order

L.S. 10                     ! 10 cycles full-matrix least-squares
ACTA                        ! CIF-output, bonds, Fourier, peak search
OMIT -2 3 1                 ! suppress bad reflection
ANIS                        ! convert all (non-H) atoms to anisotropic
WGHT 0.037 0.31             ! weighting scheme
AG  2  .000  .000  .000
AS  3  .500  .500  .000
S1  6  .368  .206  .517     ! atom name, SFAC number, x, y, z (usually
S2  6  .614  .966  .736     ! followed by sof and U(iso) or Uij); the
C   1  .278  .095  .337     ! program automatically generates special
N   5  .211  .030  .214     ! position constraints
F1  4  .596  .325 -.007
F2  4  .500  .500  .246
HKLF 4                      ! read h,k,l,Fo^2,sigma(Fo^2) from 'ags4.hkl'
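
   As a worked check on the CELL instruction in this listing, the unit-cell
volume can be computed from the six cell parameters.  The sketch below uses
the standard triclinic volume formula (a textbook result, not quoted from
this manual); 'cell_volume' is a hypothetical helper name.

```python
from math import cos, radians, sqrt


def cell_volume(a, b, c, alpha, beta, gamma):
    """Cell volume in cubic Angstroms from edges (A) and angles (degrees)."""
    ca, cb, cg = (cos(radians(angle)) for angle in (alpha, beta, gamma))
    return a * b * c * sqrt(1.0 - ca * ca - cb * cb - cg * cg
                            + 2.0 * ca * cb * cg)


# CELL 0.71073 8.381 8.381 6.661 90 90 90  (the first number is the wavelength)
volume = cell_volume(8.381, 8.381, 6.661, 90, 90, 90)
print(round(volume, 1))
```

   For this tetragonal cell the result is simply a*a*c, about 467.9 cubic
Angstroms.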

The '.lst' listing file starts with a header followed by an echo of the above
'.ins' file. After reading TITL...UNIT the program calculates the cell volume,
F(000), absorption coefficient, cell weight and density.  If the density is
unreasonable, perhaps the unit-cell contents have been given incorrectly.
The next items in the '.lst' file are the connectivity table and the symmetry
operations used to include a shell of symmetry equivalent atoms (so that all
unique bond lengths and angles can be found):

------------------------------------------------------------------------------

Covalent radii and connectivity table for  AGS4 in P-4

C    0.770
AG   1.440
AS   1.210
F    0.640
N    0.700
S    1.030

Ag - N N_$4 N_$5 N_$3
As - F2 F2_$6 F1_$7 F1_$6 F1_$1 F1
S1 - C S2_$1
S2 - S2_$2 S1_$1
C - N S1
N - C Ag
F1 - As
F2 - As


Operators for generating equivalent atoms:

$1   -x+1, -y+1, z
$2   -x+1, -y+2, z
$3   -x, -y, z
$4   y, -x, -z
$5   -y, x, -z
$6   y, -x+1, -z
$7   -y+1, x, -z

------------------------------------------------------------------------------

Note that in addition to symmetry operations generated by the program, one
can also define operations with the EQIV instruction and then refer to the
corresponding atoms with _$n in the same way.  Thus:

 EQIV $1 1-x, 1-y, z
 EQIV $2 x, y-1, z
 EQIV $3 1-x, -y, z
 CONF S1 S2_$1 S2_$2 S1_$3

could have been included in 'ags4.ins' to calculate the S-S-S-S torsion angle.
Only one new operator would have been required if S2 were bonded to S1 in the
original atom list.  If EQIV instructions are used, the program renumbers the
other symmetry operators accordingly.
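
   To see what an operator such as '1-x, 1-y, z' does to an atom, the
following sketch applies it to fractional coordinates.  The parser is a
deliberately small illustration with a hypothetical name ('apply_symop'); it
handles only signed x, y, z terms and numeric translations, as in the
operators listed above.

```python
import re
from fractions import Fraction


def apply_symop(op, xyz):
    """Apply an operator like '1-x, 1-y, z' to fractional coordinates."""
    coords = dict(zip("xyz", xyz))
    result = []
    for part in op.lower().replace(" ", "").split(","):
        total = 0.0
        # Split each component into signed terms, e.g. '1-x' -> '1', '-x'
        for term in re.findall(r"[+-]?[^+-]+", part):
            sign = -1.0 if term.startswith("-") else 1.0
            body = term.lstrip("+-")
            if body in coords:
                total += sign * coords[body]
            else:
                total += sign * float(Fraction(body))   # e.g. '1' or '1/2'
        result.append(total)
    return tuple(result)


s2 = (0.614, 0.966, 0.736)                 # S2 from 'ags4.ins'
print(apply_symop("1-x, 1-y, z", s2))      # position of S2_$1
```

   The two spellings '1-x' and '-x+1' are equivalent, which is why the EQIV
lines above and the operators printed by the program describe the same
symmetry operations.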

   The next part of the output is concerned with the data reduction:

------------------------------------------------------------------------------

   1475  Reflections read, of which     0  rejected

  0 =< h =< 10,     -9 =< k =< 10,      0 =< l =<  8,   Max. 2-theta =   55.00

      0  Systematic absence violations


Inconsistent equivalents etc.

  h   k   l       Fo^2    Sigma(Fo^2)  Esd of mean(Fo^2)

  3   4   0      387.25       8.54       47.78

      1  Inconsistent equivalents

    904  Unique reflections, of which      1  suppressed

R(int) = 0.0165     R(sigma) = 0.0202      Friedel opposites not merged

Maximum memory for data reduction =   955 /    9083

------------------------------------------------------------------------------

Throughout this documentation, Sigma with a capital S means a summation, and
sigma with a small s is an esd.  Fo^2 means the EXPERIMENTAL measurement, and
so, despite the square, may possibly be slightly negative if the background is
higher than the peak as a result of statistical fluctuations etc.  R(int) and
R(sigma) are defined as follows:

          R(int) = Sigma | Fo^2 - Fo^2(mean) | / Sigma [ Fo^2 ]

where both summations involve all input reflections for which more than one
symmetry equivalent is averaged, but not the remaining reflections, and:

            R(sigma) = Sigma [ sigma(Fo^2) ] / Sigma [ Fo^2 ]

over all reflections in the merged list.  Since these R-indices are based on
F^2, they will tend to be about twice as large as the corresponding indices
based on F.  The 'esd of the mean' (in the table of inconsistent equivalents)
is the rms deviation from the mean divided by the square root of (n-1), where
n equivalents are combined for a given reflection.  In estimating the
sigma(F^2) of a merged reflection, the program uses the value obtained by
combining the sigma(F^2) values of the individual contributors, unless the esd
of the mean is larger, in which case it is used instead.
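
   The definitions above can be followed step by step in a short sketch.  Two
points are assumptions made for illustration, since the text does not specify
them: the merged value is taken as an unweighted mean, and the individual
sigmas are combined as sqrt(Sum sigma^2)/n.  'merge_equivalents' is a
hypothetical name.

```python
from math import sqrt


def merge_equivalents(groups):
    """groups: one list of (Fo^2, sigma) pairs per unique reflection.

    Returns (merged list of (mean Fo^2, sigma), R(int), R(sigma)).
    """
    merged = []
    numerator = denominator = 0.0
    for observations in groups:
        n = len(observations)
        mean = sum(f for f, _ in observations) / n
        # ASSUMPTION: unweighted mean with independent errors
        sigma = sqrt(sum(s * s for _, s in observations)) / n
        if n > 1:
            rms = sqrt(sum((f - mean) ** 2 for f, _ in observations) / n)
            esd_of_mean = rms / sqrt(n - 1)
            sigma = max(sigma, esd_of_mean)     # use the larger estimate
            # R(int) sums only over reflections with averaged equivalents
            numerator += sum(abs(f - mean) for f, _ in observations)
            denominator += sum(f for f, _ in observations)
        merged.append((mean, sigma))
    r_int = numerator / denominator if denominator else 0.0
    r_sigma = sum(s for _, s in merged) / sum(f for f, _ in merged)
    return merged, r_int, r_sigma


groups = [[(10.0, 1.0), (12.0, 1.0)], [(5.0, 0.5)]]
merged, r_int, r_sigma = merge_equivalents(groups)
print(merged, r_int, r_sigma)
```

   In this toy data set the two equivalents disagree by more than their
sigmas suggest, so the esd of the mean replaces the combined sigma for the
first reflection.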

   The memory statistics which appear at various points in the output give the
highest elements of the A and B arrays used for the given calculation.
Although it is easy to adjust these dimensions, it requires recompiling the
program and will rarely be required.  For example there is no limit on the
number of reflections in this sort/merge stage - if there is less physical
memory the program makes more use of the disk, which of course is slower.

   Special position constraints are then generated and the statistics from the
first least-squares cycle are listed (the output has been compacted to fit the
page). The maximum vector length refers to the number of reflections processed
simultaneously in the rate-determining calculations; usually the program
utilizes all available memory to make this as large as possible, subject to a
maximum of 511.  This maximum may be reduced (but not increased) by means of
the fourth parameter on the L.S. (or CGLS) instruction; this may be required
to prevent unnecessary disk transfers when large structures are refined on
virtual memory systems with limited physical memory.  The number of parameters
refined in the current cycle is followed by the total number of refinable
parameters (here both are 55).

------------------------------------------------------------------------------

Special position constraints for Ag
x =  0.0000         y =  0.0000         z =  0.0000         U22 = 1.0 * U11
U23 = 0             U13 = 0             U12 = 0             sof = 0.25000

Special position constraints for As
x =  0.5000         y =  0.5000         z =  0.0000         U22 = 1.0 * U11
U23 = 0             U13 = 0             U12 = 0             sof = 0.25000

Special position constraints for F2
x =  0.5000         y =  0.5000         U23 = 0             U13 = 0
sof = 0.50000


Least-squares cycle 1  Maximum vector length =511  Memory required =1095/82388

wR2 =  0.5042 before cycle   1 for    903 data and   55 /   55 parameters

GooF = S =     3.480;     Restrained GooF =      3.480  for      0 restraints

Weight = 1/[sigma^2(Fo^2)+(0.0370*P)^2+0.31*P] where P=(Max(Fo^2,0)+2*Fc^2)/3

** Shifts scaled down to reduce maximum shift/esd from   17.32  to   15.00 **

    N      value        esd    shift/esd  parameter

    1     2.38015     0.04260    32.401    OSF
    2     0.08362     0.00224    14.993    U11 Ag
    5     0.02864     0.00580    -3.679    U33 As
   11     0.08546     0.00781     4.543    U33 S1
   23    -0.01788     0.00444    -4.027    U12 S2
   47     0.14422     0.01515     6.218    U33 F1
   52     0.13288     0.02330     3.558    U11 F2

Mean shift/esd =   2.053    Maximum =  32.401 for  OSF

Max. shift = 0.055 A for C      Max. dU = 0.049 for F2

------------------------------------------------------------------------------

Only the largest shift/esd's are printed. More output could have been obtained
using 'MORE 2' or 'MORE 3'.  The largest correlation matrix elements are
printed after the last cycle, in which the mean and maximum shift/esd have
been reduced to 0.002 and 0.012 respectively.  This is followed by the full
table of refined coordinates and Uij's with esd's (too large to include here,
but similar to the corresponding table in SHELX-76 except that Ueq and its esd
are also printed) and by a final structure factor calculation:

------------------------------------------------------------------------------

Final Structure Factor Calculation for  AGS4 in P-4

Total number of l.s. parameters = 55        Maximum vector length = 511

wR2 =  0.0779 before cycle  11 for    903 data and    2 /   55 parameters

GooF = S =     1.063;     Restrained GooF =      1.063  for      0 restraints

Weight = 1/[sigma^2(Fo^2)+(0.0370*P)^2+0.31*P] where P=(Max(Fo^2,0)+2*Fc^2)/3

R1 =  0.0322 for    818 Fo > 4.sigma(Fo)  and  0.0370 for all    904 data
wR2 =  0.0834,  GooF = S =   1.138,  Restrained GooF =    1.138  for all data

Flack x parameter =   0.0224   with esd  0.0260   (expected values are 0
(within 3 esd's) for correct and +1 for inverted absolute structure)

------------------------------------------------------------------------------

There are some important points to note here.  The weighted R-index based on
Fo^2 is (for compelling statistical reasons) much higher than the conventional
R-index based on Fo with a threshold of say Fo > 4.sigma(Fo).  For comparison
with structures refined against F the latter is therefore printed as well (as
R1).  Despite the fact that wR2 and not R1 is the quantity minimized, R1 has
the advantage that it is relatively insensitive to the weighting scheme, and
so is more difficult to manipulate.
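
   The quantities in this output can be reproduced in outline as follows.
The weight expression is copied from the listing above; the formulas used for
wR2, GooF and R1 are the usual definitions of these indices, which this
section does not spell out, so they should be read as assumptions.  The
function names are hypothetical, and the 4.sigma(Fo) threshold is left to the
caller.

```python
from math import sqrt


def weight(fo2, fc2, sigma, a, b):
    """w = 1/[sigma^2(Fo^2) + (a*P)^2 + b*P], P = (Max(Fo^2,0) + 2*Fc^2)/3."""
    p = (max(fo2, 0.0) + 2.0 * fc2) / 3.0
    return 1.0 / (sigma ** 2 + (a * p) ** 2 + b * p)


def wr2_and_goof(data, a, b, n_parameters):
    """data: (Fo^2, Fc^2, sigma(Fo^2)) triples.  Returns (wR2, GooF).

    ASSUMED definitions:
      wR2  = sqrt( Sum[w(Fo^2-Fc^2)^2] / Sum[w(Fo^2)^2] )
      GooF = sqrt( Sum[w(Fo^2-Fc^2)^2] / (n - p) )
    """
    data = list(data)
    numerator = denominator = 0.0
    for fo2, fc2, sigma in data:
        w = weight(fo2, fc2, sigma, a, b)
        numerator += w * (fo2 - fc2) ** 2
        denominator += w * fo2 ** 2
    return (sqrt(numerator / denominator),
            sqrt(numerator / (len(data) - n_parameters)))


def r1(fo, fc):
    """ASSUMED definition: R1 = Sum| |Fo| - |Fc| | / Sum|Fo|."""
    return (sum(abs(abs(o) - abs(c)) for o, c in zip(fo, fc))
            / sum(abs(o) for o in fo))


# WGHT 0.037 0.31 from 'ags4.ins', applied to one made-up reflection
print(weight(43.53, 7.44, 3.0, 0.037, 0.31))
```

   Since the weights enter wR2 and GooF but not R1, changing the WGHT
parameters alters the first two indices while leaving R1 untouched, which is
the insensitivity referred to above.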

   Since the structure is non-centrosymmetric, the program has automatically
estimated the Flack absolute structure parameter x in the final structure
factor summation.  In this example x is within one esd of zero, and its esd
is also relatively small.  This provides strong evidence that the absolute
structure has been assigned correctly, so that no further action is required.
The program would have printed a warning here if it had been necessary to
'invert' the structure.  For further details see the section on 'absolute
structure' below.  The two parameters 'refined' ( 2 / 55 ) but not applied in
the final structure factor cycle in this case are related to the overall scale
and the Flack x parameter; no parameters are 'refined' in the final structure
factor cycle for a centrosymmetric structure.
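
   The rule of thumb quoted in the output (x is expected to be 0 within 3
esd's for the correct absolute structure and +1 for the inverted one) can be
written as a small sketch.  The function name and the 'indeterminate'
category are illustrative assumptions, not program output.

```python
def interpret_flack(x, esd, n_esd=3.0):
    """Classify a Flack x parameter using the 'within 3 esd' rule of thumb."""
    near_zero = abs(x) <= n_esd * esd
    near_one = abs(x - 1.0) <= n_esd * esd
    if near_zero and not near_one:
        return "correct"
    if near_one and not near_zero:
        return "inverted"
    # Neither value (or both, when the esd is very large) is supported
    return "indeterminate"


print(interpret_flack(0.0224, 0.0260))     # values from the listing above
```

   A large esd makes both tests pass at once, which is why the sketch treats
that case as indeterminate rather than as confirmation.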

   This is followed by a list of principal mean square displacements U for all
anisotropic atoms.  It will be seen that none of the smallest components (in
the third column) are in danger of going negative [which would make the atom
'non positive definite' (NPD)] but that the motion of the two unique fluorine
atoms is highly anisotropic (not unusual for an AsF6 anion).  The program
suggests that the fluorine motion is so extended in one direction that it
would be possible to represent each of the two fluorine atoms as disordered
over two sites, for which x, y and z coordinates are given; this may safely be
ignored here (although there may well be some truth in it).  The two suggested
new positions for each 'split' atom are placed equidistant from the current
position along the direction (and reverse direction) corresponding to the
largest eigenvalue of the anisotropic displacement tensor.

   This list is followed by the analysis of variance (reproduced here in
squashed form), recommended weighting scheme (to give a flat analysis of
variance in terms of Fc^2), and a list of the most disagreeable reflections
(which clearly shows that the one reflection suppressed by OMIT is indeed an
aberration).  For a discussion of the analysis of variance see the second
example.

------------------------------------------------------------------------------

Principal mean square atomic displacements U

  0.1067   0.1067   0.0561   Ag
  0.0577   0.0577   0.0386   As
  0.1038   0.0659   0.0440   S1
  0.0986   0.0515   0.0391   S2
  0.0779   0.0729   0.0391   C
  0.1004   0.0852   0.0474   N
  0.3029   0.0954   0.0473   F1
     may be split into  0.5965  0.3173  0.0288  and  0.5946  0.3324 -0.0369
  0.4778   0.1671   0.0457   F2
     may be split into  0.5320  0.5089  0.2462  and  0.4680  0.4911  0.2462

Analysis of variance for reflections employed in refinement
K = Mean[Fo^2] / Mean[Fc^2]  for group

Fc/Fc(max)     0.000 0.026 0.039 0.051 0.063 0.082 0.103 0.147 0.202 0.306 1.0

Number in group    94.   89.   90.   91.   89.   91.   89.   91.   88.   91.

           GooF  1.096 1.101 0.997 1.078 1.187 1.069 1.173 0.922 1.019 0.966

            K    1.560 1.053 1.010 1.004 1.007 1.021 1.026 1.002 0.997 0.984


Resolution(A)  0.77  0.81  0.85  0.90  0.95  1.02  1.10  1.22  1.40  1.74  inf

Number in group    97.   84.   92.   91.   89.   90.   89.   90.   93.   88.

           GooF  1.067 0.959 0.935 0.895 1.035 1.040 1.115 1.149 1.161 1.228

            K    1.047 1.010 1.009 0.991 1.004 0.996 0.989 1.012 0.997 0.982

            R1   0.166 0.100 0.069 0.059 0.051 0.036 0.033 0.027 0.020 0.020


Recommended weighting scheme:  WGHT   0.0329   0.3591


Most Disagreeable Reflections (* if suppressed)

    h   k   l       Fo^2        Fc^2  Delta(F^2)/esd  Fc/Fc(max)  Resolution(A)

*  -2   3   1         43.53          7.44      11.14       0.029       2.19
    4   4   4         18.32         33.30       3.51       0.062       1.11
   -4   1   3         15.79          4.17       3.39       0.022       1.50
    0   2   2         41.60         57.32       3.16       0.082       2.61
    2   5   0        124.72        100.33       3.06       0.108       1.56
    2   3   0         64.43         48.46       3.03       0.075       2.32
   -5   4   1         11.04          2.57       2.90       0.017       1.28
    2   5   3         42.27         55.48       2.60       0.080       1.27
    6   5   2          6.43          1.02       2.56       0.011       1.02
    4   6   2         20.16         11.98       2.55       0.037       1.10
    6   1   1         55.45         42.28       2.51       0.070       1.35
    6   0   5        104.65        126.19       2.49       0.121       0.96
    4   1   2        139.30        116.95       2.44       0.117       1.74
    9   0   3         39.34         26.06       2.44       0.055       0.86
    2   4   4        371.53        327.01       2.36       0.195       1.24
    4   3   5         55.69         43.02       2.33       0.071       1.04
   -3   6   0          7.51          3.10       2.25       0.019       1.25
   -1   4   2        142.05        120.53       2.22       0.119       1.74
    0  10   1          2.01          8.31       2.21       0.031       0.83
   -2   1   2       1497.02       1361.86       2.20       0.399       2.49

------------------------------------------------------------------------------

After the table of bond lengths and angles (BOND was implied by the ACTA
instruction), the data are merged (again) for the Fourier calculation after
correcting for dispersion (because the electron density is real).  In contrast
to the initial data reduction, Friedel's law is assumed here; the aim is to
set up a unique reflection list so that the (difference) electron density can
be calculated on an absolute scale.

   The algorithm for generating the 'asymmetric unit' for the Fourier
calculations is general for all space groups, in conventional settings or
otherwise.  The rms electron density (averaged over all grid points) is
printed as well as the maximum and minimum values so that the significance of
the latter can be assessed.  Since PLAN 20 was assumed, only a peak list is
printed (and written to the .res file), followed by a list of shortest
distances between peaks (not shown below); PLAN -20 would have produced a more
detailed analysis with 'printer plots' of the structure. The last 10 peaks
and some of the interatomic distances have been deleted here to save space.
In this table, 'distances to nearest atoms' takes symmetry equivalents into
account.

------------------------------------------------------------------------------

Bond lengths and angles        [severely squashed to fit 80 columns!]

Ag -   Distance     Angles
N     2.279(0.006)
N_$4  2.279(0.006) 113.08(0.15)
N_$5  2.279(0.006) 113.08(0.15) 102.47(0.29)
N_$3  2.279(0.006) 102.47(0.29) 113.08(0.16) 113.08(0.15)
         Ag -        N            N_$4         N_$5

As -   Distance     Angles
F2    1.640(0.007)
F2_$6 1.640(0.007)180.00(0.00)
F1_$7 1.672(0.004) 89.08(0.41) 90.92(0.41)
F1_$6 1.672(0.004) 89.08(0.41) 90.92(0.41)178.18(0.82)
F1_$1 1.672(0.004) 90.92(0.41) 89.08(0.41) 90.01(0.01) 90.01(0.01)
F1    1.672(0.004) 90.92(0.41) 89.08(0.41) 90.01(0.01) 90.01(0.01)178.18(0.82)
         As -        F2          F2_$6       F1_$7       F1_$6       F1_$1

S1 -   Distance     Angles
C     1.682(0.007)
S2_$1 2.063(0.003)  98.61(0.20)
         S1 -        C

S2 -   Distance     Angles
S2_$2 2.011(0.003)
S1_$1 2.063(0.003) 105.37(0.07)
         S2 -        S2_$2

C -    Distance     Angles
N     1.147(0.007)
S1    1.682(0.007) 175.67(0.49)
         C -         N

N -    Distance     Angles
C     1.147(0.007)
Ag    2.279(0.006) 152.38(0.45)
         N -         C

F1 -   Distance     Angles
As    1.672(0.004)
         F1 -

F2 -   Distance     Angles
As    1.640(0.007)
         F2 -


FMAP and GRID set by program

FMAP   2   3  18
GRID    -3.333  -2  -1     3.333   2   1

R1 =  0.0370 for    590 unique reflections after merging for Fourier
Highest memory used    768 /    6109


Electron density synthesis with coefficients Fo-Fc

Maximum = 0.32,   Minimum = -0.35 e/A^3,   Highest memory used = 768/13827
Mean = 0.00,   Rms deviation from mean = 0.07 e/A^3


Fourier peaks appended to .res file

         x       y       z       sof     U    Peak  Dist to nearest atoms

Q1  1  0.0000  0.0000  0.5000  0.25000  0.05  0.32  2.60 N  2.69 C  3.33 AG
Q2  1  0.5691  0.3728  0.1623  1.00000  0.05  0.27  1.20 F1  1.34 F2  1.62 AS
Q3  1  0.5685  0.3851 -0.1621  1.00000  0.05  0.24  1.19 F1  1.25 F2  1.56 AS
Q4  1  0.4075  0.4717  0.2378  1.00000  0.05  0.23  0.81 F2  1.78 AS  1.79 F1
Q5  1  0.5848  0.2667  0.0312  1.00000  0.05  0.23  0.55 F1  2.09 AS  2.47 F1
Q6  1  0.5495  0.3425 -0.1122  1.00000  0.05  0.21  0.83 F1  1.57 AS  1.65 F2
Q7  1  0.2617 -0.1441  0.1446  1.00000  0.05  0.20  1.59 N  2.17 F1  2.40 C
Q8  1  0.7221  0.1898  0.0030  1.00000  0.05  0.20  1.55 F1  2.39 N  2.54 N
Q9  1  0.1997  0.0293  0.1024  1.00000  0.05  0.19  0.75 N  1.79 C  1.82 AG
Q10 1  0.5394  1.0113  0.8165  1.00000  0.05  0.19  0.91 S2  1.41 S2  2.82 S1

==============================================================================

SECOND EXAMPLE (sigi):

In the second example (provided as the files 'sigi.ins' and 'sigi.hkl') a
small organic structure is refined in the space group P-1.  Only the features
that are different from the ags4 refinement will be discussed in detail.  The
structure consists of a five-membered lactone [-C7-C11-C8-C4(O1)-O3-] with a
-CH2-OH group [-C5-O2] attached to C7 and a =C(CH3)(NH2) unit [=C9(C10)N6]
double-bonded to C8.

   Of particular interest here is the placing and refinement of the 11
hydrogen atoms via HFIX instructions.  The two -CH2- groups (C5 and C11) and
one tertiary CH (C7) can be placed geometrically by standard methods; the
algorithms have been improved relative to those used in SHELX-76, and the
hydrogen atoms are now idealized before each refinement cycle (and after the
last).  Since N6 is attached to a conjugated system, it is reasonable to
assume that the -NH2 group is coplanar with the C8=C9(C10)-N6 unit, which
enables these two hydrogens to be placed as ethylenic hydrogens, which
requires HFIX (or AFIX) 9n; the program takes into account that they are
bonded to nitrogen in setting the default bond lengths.  All these hydrogens
are to be refined using a 'riding model' (HFIX or AFIX m3) for x, y and z.
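
   As a rough illustration of how the two-part codes above are read, the
following Python sketch splits an HFIX (or AFIX) code 'mn' into m (the
geometry type) and n (the refinement mode).  The function name split_afix is
hypothetical and is not part of SHELXL-93; the sketch simply assumes that n is
the last decimal digit, as the 'm3' and '9n' notation suggests.

```python
def split_afix(code):
    """Split an AFIX/HFIX code 'mn' into (m, n), with n the last decimal digit."""
    if code < 0:
        raise ValueError("AFIX/HFIX codes are assumed non-negative here")
    return divmod(code, 10)


# Codes used in the sigi example: m selects the geometry, n = 3 is the riding
# model and n = 7 the torsion-refining mode used for -OH and -CH3.
for code in (13, 23, 93, 137, 147):
    m, n = split_afix(code)
    print(f"HFIX {code:3d}: m = {m:2d}, n = {n}")
```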

   The -OH and -CH3 groups are trickier, in the latter case because C9 is
sp2-hybridized, so the potential barrier to rotation is low and there is no
fully staggered conformation available as the obvious choice.  Since the data
are reasonable, the initial torsion angles for these two groups can be found
by means of difference electron density syntheses calculated around the
circles which represent the loci of all possible hydrogen atom positions.  The
torsion angles are then refined during the least-squares refinement.  Note
that in subsequent cycles (and jobs) these groups will be re-idealized
geometrically with RETENTION of the current torsion angle; the circular
Fourier calculation is performed only once.  Two 'free variables' (2 and 3 -
yes, they still exist!) have been assigned to refine common isotropic
displacement parameters for the 'rigid' and 'rotating' hydrogens respectively.
If these had not been specified, the default action would have been to hold
the hydrogen U values at 1.2 times the equivalent isotropic U of the atoms to
which they are attached (1.5 for the -OH and methyl groups).

   The 'sigi.ins' file (which is provided as a test job) is as follows.  Note
that for instructions with both numerical parameters and atom names such as
HFIX and MPLA, it does not matter whether numbers or atoms come first, but the
order of the numerical parameters themselves (and in some cases the order of
the atoms) is important.

------------------------------------------------------------------------------

TITL SIGI in P-1
CELL 0.71073 6.652 7.758 8.147 73.09 75.99 68.40
ZERR 2 .002 .002 .002 .03 .03 .03
SFAC C H N O
UNIT 14 22 2 6           ! no LATT and SYMM needed for space group P-1

L.S. 4
EXTI 0.001               ! refine an isotropic extinction parameter
WGHT .060 0.15           ! (suggested by program in last job);  WGHT
OMIT 2 8 0               ! and OMIT are also based on previous output

BOND $H                  ! include H in bond lengths / angles table
CONF                     ! all torsion angles except involving hydrogen
FMAP 2                   ! Fo-Fc Fourier
PLAN -20                 ! printer plots and full analysis of peak list

HFIX 147 31 O2           ! initial location of -OH and -CH3 hydrogens from
HFIX 137 31 C10          ! circular Fourier, then refine torsion, U(H)=fv(3)

HFIX 93 21 N6            ! -NH2 in plane, xyz ride on N, U(H)=fv(2)
HFIX 23 21 C5 C11        ! two -CH2- groups, xyz ride on C, U(H)=fv(2)
HFIX 13 21 C7            ! tertiary CH, xyz ride on C, U(H)=fv(2)

EQIV $1 X-1, Y, Z        ! define symmetry operation and tabulate H-bond
RTAB H..O H2 O1_$1       ! distance and angle to symmetry equivalent of O1
RTAB XHY O2 H2 O1_$1      ! 'H..O' and 'XHY' are table headings

RTAB H..O H6A O1         ! include intramolecular H-bond in tables
RTAB XHY N6 H6A O1

EQIV $2 X+1, Y, Z-1      ! include a further intermolecular H-bond in the
RTAB H..O H6B O2_$2      ! same tables; involves symmetry equivalent of O2
RTAB XHY N6 H6B O2_$2
                                    ! l.s. planes through 5-ring and through
MPLA 5 C7 C11 C8 C4 O3 O1 N6 C9 C10 ! CNC=CCC moiety, then find deviations
MPLA 6 C10 N6 C9 C8 C11 C4 O1 O3 C7 ! of last 4 and 3 named atoms resp. too

FVAR 1 .06 .07           ! overall scale and free variables for U(H)

REM name sfac# x y z sof(+10 to fix it) U11 U22 U33 U23 U13 U12 follow

O1      4   0.30280   0.17175   0.68006  11.00000   0.02309   0.04802 =
        0.02540  -0.00301  -0.00597  -0.01547
O2      4  -0.56871   0.23631   0.96089  11.00000   0.02632   0.04923 =
        0.02191  -0.00958   0.00050  -0.02065
O3      4  -0.02274   0.28312   0.83591  11.00000   0.02678   0.04990 =
        0.01752  -0.00941  -0.00047  -0.02109
C4      1   0.10358   0.23458   0.68664  11.00000   0.02228   0.02952 =
        0.01954  -0.00265  -0.00173  -0.01474
C5      1  -0.33881   0.18268   0.94464  11.00000   0.02618   0.03480 =
        0.01926  -0.00311  -0.00414  -0.01624
N6      3   0.26405   0.17085   0.33925  11.00000   0.03003   0.04232 =
        0.02620  -0.01312   0.00048  -0.01086
C7      1  -0.25299   0.33872   0.82228  11.00000   0.02437   0.03111 =
        0.01918  -0.00828  -0.00051  -0.01299
C8      1  -0.03073   0.27219   0.55976  11.00000   0.02166   0.02647 =
        0.01918  -0.00365  -0.00321  -0.01184
C9      1   0.05119   0.24371   0.39501  11.00000   0.02616   0.02399 =
        0.02250  -0.00536  -0.00311  -0.01185
C10     1  -0.10011   0.29447   0.26687  11.00000   0.03877   0.04903 =
        0.02076  -0.01022  -0.00611  -0.01800
C11     1  -0.26553   0.36133   0.63125  11.00000   0.02313   0.03520 =
        0.01862  -0.00372  -0.00330  -0.01185

HKLF 4     ! read intensity data from 'sigi.hkl'; terminates '.ins' file

------------------------------------------------------------------------------

   The data reduction reports 1904 reflections read with -7 <= h <= 7,
-8 <= k <= 9 and -9 <= l <= 9.  Note that these are the limiting index values;
in fact only about 1.5 times the unique volume of reciprocal space was
measured.  The maximum 2-theta was 50.00, and there were no systematic absence
violations, 34 (not seriously) inconsistent equivalents, and 1297 unique data,
of which 1 was suppressed (by OMIT).  R(int) was 0.0196 and R(sigma) 0.0151.

   It will be seen that the program uses different default distances to
hydrogen for different bonding situations (these may be overridden by the user
if desired, of course).  These defaults depend on the temperature (set using
TEMP) in order to allow for librational effects.  The list of default X-H
distances is followed by the (squashed) circular difference electron density
syntheses to determine the C-OH and C-CH3 initial torsion angles:

------------------------------------------------------------------------------

Default effective X-H distances for T =   20.0 C

AFIX m =    1     2     3     4   4[N]  3[N]  15[B]  8[O]   9   9[N]   16
d(X-H) =  0.98  0.97  0.96  0.93  0.86  0.89  1.10  0.82  0.93  0.86  0.93


Difference electron density (eA^-3x100) at 15 degree intervals for AFIX 147
group attached to O2.  The center of the range is eclipsed (cis) to C7 and
rotation is clockwise looking down C5 to O2
  -2  0  1  0  0  0 -1 -5 -8 -9 -6 -2  2  5  9 16 29 42 48 39 23  9  0 -2


Difference electron density (eA^-3x100) at 15 degree intervals for AFIX 137
group attached to C10.  The center of the range is eclipsed (cis) to N6 and
rotation is clockwise looking down C9 to C10
  34 37 39 41 38 30 20 15 19 28 39 47 50 43 29 15 12 19 29 35 33 27 25 29

After local symmetry averaging:   21  28  36  41  40  33  24  20

------------------------------------------------------------------------------

It can be seen that the hydroxyl hydrogen is very clearly defined, but that
the methyl group is rotating fairly freely (low potential barrier).  After
three-fold averaging, however, there is a single difference electron density
maximum.  The (squashed) least-squares refinement output follows:

------------------------------------------------------------------------------

Least-squares cycle 1  Maximum vector length =511 Memory required =1771/135569

wR2 =  0.1138 before cycle   1 for   1296 data and  105 /  105 parameters

GooF = S =     1.134;     Restrained GooF =      1.134  for      0 restraints

Weight = 1/[sigma^2(Fo^2)+(0.0600*P)^2+0.15*P] where P=(Max(Fo^2,0)+2*Fc^2)/3

    N      value        esd    shift/esd  parameter

    1     0.97914     0.00386    -5.406    OSF
    2     0.03486     0.00263    -9.959   FVAR  2
    3     0.07515     0.00396     1.048   FVAR  3
    4     0.02334     0.00951     2.349   EXTI

Mean shift/esd =   0.911    Maximum =  -9.959 for FVAR  2

Max. shift = 0.038 A for H10C      Max. dU =-0.026 for H5A

     .......... etc (cycles 2 and 3 omitted) .........


Least-squares cycle 4  Maximum vector length =511 Memory required =1771/135569

wR2 =  0.1044 before cycle   4 for   1296 data and  105 /  105 parameters

GooF = S =     1.025;     Restrained GooF =      1.025  for      0 restraints

Weight = 1/[sigma^2(Fo^2)+(0.0600*P)^2+0.15*P] where P=(Max(Fo^2,0)+2*Fc^2)/3

    N      value        esd    shift/esd  parameter

    1     0.97903     0.00361    -0.001    OSF
    2     0.03607     0.00178     0.022   FVAR  2
    3     0.07346     0.00379    -0.009   FVAR  3
    4     0.02502     0.01089    -0.004   EXTI

Mean shift/esd =   0.006    Maximum =  -0.182 for tors H10A

Max. shift = 0.003 A for H10B      Max. dU = 0.000 for H5A


Largest correlation matrix elements

    0.509 U12 O2 / U22 O2                   0.506 U12 O3 / U11 O3
    0.508 U12 O2 / U11 O2                   0.500 U12 O3 / U22 O3



Idealized hydrogen atom generation before cycle   5

Name     x       y       z    AFIX  d(X-H)  shift  Bonded   Conformation
                                                    to      determined by
H2   -0.6017  0.2095  0.8833  147   0.820   0.000   O2        C5  H2
H5A  -0.2721  0.0676  0.9001   23   0.970   0.000   C5        O2  C7
H5B  -0.2964  0.1554  1.0576   23   0.970   0.000   C5        O2  C7
H6A   0.3572  0.1389  0.4085   93   0.860   0.000   N6        C9  C8
H6B   0.3073  0.1559  0.2347   93   0.860   0.000   N6        C9  C8
H7   -0.3331  0.4598  0.8575   13   0.980   0.000   C7        O3  C5  C11
H10A -0.2044  0.4191  0.2694  137   0.960   0.000   C10       C9  H10A
H10B -0.1761  0.2034  0.2962  137   0.960   0.000   C10       C9  H10A
H10C -0.0176  0.2950  0.1525  137   0.960   0.000   C10       C9  H10A
H11A -0.3575  0.2948  0.6198   23   0.970   0.000   C11       C8  C7
H11B -0.3198  0.4943  0.5737   23   0.970   0.000   C11       C8  C7

------------------------------------------------------------------------------

   The final structure factor calculation, analysis of variance etc. produces
the following edited output:

------------------------------------------------------------------------------

Final Structure Factor Calculation for  SIGI in P-1

Total number of l.s. parameters = 105    Maximum vector length =  511

wR2 =  0.1044 before cycle   5 for   1296 data and    0 /  105 parameters

GooF = S =     1.025;     Restrained GooF =      1.025  for      0 restraints

Weight = 1/[sigma^2(Fo^2)+(0.0600*P)^2+0.15*P] where P=(Max(Fo^2,0)+2*Fc^2)/3

R1 =  0.0365 for   1189 Fo > 4.sigma(Fo)  and  0.0399 for all   1297 data
wR2 =  0.1060,  GooF = S =   1.042,  Restrained GooF =    1.042  for all data


Principal mean square atomic displacements U

  0.0504   0.0254   0.0188   O1
  0.0491   0.0229   0.0190   O2
  0.0513   0.0194   0.0165   O3
  0.0326   0.0208   0.0159   C4
  0.0375   0.0204   0.0190   C5
  0.0440   0.0320   0.0214   N6
  0.0329   0.0201   0.0185   C7
  0.0276   0.0190   0.0181   C8
  0.0288   0.0220   0.0191   C9
  0.0494   0.0353   0.0181   C10
  0.0353   0.0215   0.0183   C11


Analysis of variance for reflections employed in refinement
K = Mean[Fo^2] / Mean[Fc^2]  for group

Fc/Fc(max)     0.000 0.009 0.017 0.027 0.038 0.049 0.065 0.084 0.110 0.156 1.0

Number in group    135.  125.  130.  139.  119.  133.  130.  128.  131.  126.

           GooF   1.110 1.006 1.082 1.046 1.093 1.014 0.923 0.996 1.027 0.930

            K     1.521 1.121 0.966 1.023 1.008 0.990 0.998 0.998 1.008 1.010


Resolution(A)  0.84  0.88  0.90  0.95  0.99  1.06  1.14  1.25  1.44  1.79  inf

Number in group    136.  127.  128.  128.  136.  124.  128.  130.  130.  129.

           GooF   1.007 0.890 0.865 0.867 0.864 0.921 0.874 1.095 1.256 1.432

            K     1.024 1.013 1.017 0.990 0.991 0.989 1.013 0.995 1.037 1.004

            R1    0.062 0.049 0.051 0.046 0.034 0.034 0.031 0.039 0.039 0.037


Recommended weighting scheme:  WGHT   0.0548   0.1468

------------------------------------------------------------------------------

The analysis of variance should be examined carefully for indications of
systematic errors.  If the Goodness of Fit is significantly higher than unity
and the scale factor K is appreciably lower than unity in the extreme right
columns in terms of both Fc and resolution, then an extinction parameter
should be refined (the program prints a warning in such a case).  This does
not show here because an extinction parameter is already being refined.  The
scale factor is a little high for the weakest reflections in this example;
this may well be a statistical artifact and may be ignored (selecting the
groups on Fc will tend to make Fo^2 greater than Fc^2 for this range).  The
increase in the GooF at low resolution (the 1.79 to infinity range) is caused
in part by systematic errors in the model such as the use of scattering
factors based on spherical atoms which ignore bonding effects, and is normal
for purely light-atom structures (this interpretation is confirmed by the fact
that difference electron density peaks are found in the middle of bonds). In
extreme cases the lowest or highest resolution ranges can be conveniently
suppressed by means of the SHEL instruction; this is normal practice in
macromolecular refinements.

   The weighting scheme suggested by the program is designed to produce a flat
analysis of variance in terms of Fc, but makes no attempt to fit the
resolution dependence of the Goodness of Fit.  It is also written to the end
of the .res file, so that it is easy to update it before the next job. In the
early stages of refinement it is better to retain the default scheme of
WGHT 0.1; the updated parameters should not be incorporated in the next '.ins'
file until all atoms have been found and at least the heavier atoms refined
anisotropically.
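
   The weight expression printed in the refinement output can be sketched in
Python as follows.  The function name shelxl_weight and the reflection values
are invented for illustration; the formula is the one shown in the output
above, and the default b = 0 for the scheme 'WGHT 0.1' is an assumption.

```python
def shelxl_weight(fo2, fc2, sigma_fo2, a=0.1, b=0.0):
    """Weight 1/[sigma^2(Fo^2) + (a*P)^2 + b*P], P = (max(Fo^2, 0) + 2*Fc^2)/3."""
    p = (max(fo2, 0.0) + 2.0 * fc2) / 3.0
    return 1.0 / (sigma_fo2 ** 2 + (a * p) ** 2 + b * p)


# WGHT .060 0.15 as in the sigi job; the reflection values are made up.
w = shelxl_weight(fo2=100.0, fc2=100.0, sigma_fo2=2.0, a=0.06, b=0.15)
print(f"w = {w:.5f}")
```

Note that a negative Fo^2 contributes nothing to P, so for a reflection with
Fc^2 = 0 the weight reduces to 1/sigma^2(Fo^2).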

   The list of most disagreeable reflections and tables of bond lengths and
angles (BOND $H - omitted here) and torsion angles (CONF) are followed by the
RTAB and MPLA tables:

------------------------------------------------------------------------------

Selected torsion angles

 -175.08 ( 0.12)  C7 - O3 - C4 - O1
    5.72 ( 0.15)  C7 - O3 - C4 - C8
  109.70 ( 0.12)  C4 - O3 - C7 - C5
  -11.64 ( 0.15)  C4 - O3 - C7 - C11
  171.12 ( 0.10)  O2 - C5 - C7 - O3
  -72.04 ( 0.15)  O2 - C5 - C7 - C11
   -1.47 ( 0.24)  O1 - C4 - C8 - C9
  177.61 ( 0.12)  O3 - C4 - C8 - C9
 -176.27 ( 0.14)  O1 - C4 - C8 - C11
    2.81 ( 0.16)  O3 - C4 - C8 - C11
    3.09 ( 0.22)  C4 - C8 - C9 - N6
  176.93 ( 0.13)  C11 - C8 - C9 - N6
 -177.23 ( 0.13)  C4 - C8 - C9 - C10
   -3.38 ( 0.22)  C11 - C8 - C9 - C10
  176.04 ( 0.13)  C9 - C8 - C11 - C7
   -9.39 ( 0.14)  C4 - C8 - C11 - C7
   12.36 ( 0.14)  O3 - C7 - C11 - C8
 -104.74 ( 0.13)  C5 - C7 - C11 - C8


Distance H..O

     2.041 (0.003)  H2 - O1_$1
     2.225 (0.002)  H6A - O1
     2.172 (0.002)  H6B - O2_$2


Angle XHY

  174.03 (2.37)  O2 - H2 - O1_$1
  129.29 (0.05)  N6 - H6A - O1
  155.07 (0.05)  N6 - H6B - O2_$2



Least-squares planes (x,y,z in crystal coordinates) and deviations from them
(* indicates atom used to define plane)

 2.344 (0.004) x + 7.411 (0.004) y - 0.015 (0.005) z = 1.978 (0.004)

*   -0.074 (0.001)  C7
*    0.068 (0.001)  C11
*   -0.042 (0.001)  C8
*   -0.006 (0.001)  C4
*    0.054 (0.001)  O3
    -0.006 (0.002)  O1
    -0.098 (0.003)  N6
    -0.056 (0.002)  C9
    -0.031 (0.003)  C10

Rms deviation of fitted atoms =   0.055


 2.544 (0.004) x + 7.349 (0.004) y - 0.166 (0.004) z = 1.863 (0.003)

Angle to previous plane (with approximate esd) =  2.45 ( 0.07 )

*    0.005 (0.001)  C10
*    0.008 (0.001)  N6
*   -0.005 (0.001)  C9
*   -0.034 (0.001)  C8
*    0.013 (0.001)  C11
*    0.012 (0.001)  C4
     0.057 (0.002)  O1
     0.021 (0.002)  O3
    -0.154 (0.002)  C7

Rms deviation of fitted atoms =   0.016

------------------------------------------------------------------------------

All esd's printed by the program are calculated rigorously from the full
covariance matrix, except for the angle between two least-squares planes,
which involves some approximations.  The contributions to the esds in bond
lengths, angles and torsion angles also take the errors in the unit-cell
parameters (as input on the ZERR instruction) rigorously into account; an
approximate treatment is used to obtain the (rather small) contributions of
the cell errors to the esds involving least-squares planes.

   The free torsional motion of H2 is virtually at right angles to the fairly
linear hydrogen bond, so the O-H..O angle has a large esd.  On the other hand
the 'riding model' constraint applied to the N-H bonds effectively prevents
the estimation of a meaningful esd in the two N-H..O angles, hence the
unrealistically small values for these two esds.
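
   The H..O distances in the RTAB table can be checked approximately by hand.
The Python sketch below is not the program's own code: it uses the metric
tensor of the unit cell to convert a difference of fractional coordinates into
a distance, and the function names are hypothetical.  Because it combines the
idealized H2 position with the O1 coordinates as input in 'sigi.ins' (not the
final refined values), the result should only be close to, not identical with,
the tabulated 2.041 Angstroms.

```python
import math


def metric_tensor(a, b, c, alpha, beta, gamma):
    """Return the 3x3 metric tensor for cell lengths (A) and angles (degrees)."""
    ca, cb, cg = (math.cos(math.radians(x)) for x in (alpha, beta, gamma))
    return [
        [a * a, a * b * cg, a * c * cb],
        [a * b * cg, b * b, b * c * ca],
        [a * c * cb, b * c * ca, c * c],
    ]


def distance(cell, frac1, frac2):
    """Distance in Angstroms between two points given in fractional coordinates."""
    g = metric_tensor(*cell)
    d = [p - q for p, q in zip(frac1, frac2)]
    return math.sqrt(sum(g[i][j] * d[i] * d[j] for i in range(3) for j in range(3)))


cell = (6.652, 7.758, 8.147, 73.09, 75.99, 68.40)   # from the CELL instruction
h2 = (-0.6017, 0.2095, 0.8833)                      # idealized H2 position
o1 = (0.30280, 0.17175, 0.68006)                    # O1 as input in 'sigi.ins'
o1_eq = (o1[0] - 1.0, o1[1], o1[2])                 # EQIV $1  X-1, Y, Z
print(f"H2...O1_$1 = {distance(cell, h2, o1_eq):.3f} A")
```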

   There follows the difference electron density synthesis and line printer
'plot' of the structure and peaks.  The highest and lowest features are 0.28
and -0.17 eA^-3 respectively, and the rms difference electron density is 0.04.
These values confirm that the treatment of the hydrogen atoms was adequate,
and are indeed typical for routine structure analysis of small organic
molecules.  This output is too voluminous to give here, and indeed users of
the Siemens SHELXTL molecular graphics program XP will almost always suppress
it by use of the default option of a positive number on the PLAN instruction,
and employ interactive graphics instead for analysis of the peak list.

==============================================================================


                  THE REFLECTION DATA FILE 'name.hkl'

The '.hkl' file consists of one line per reflection in FORMAT(3I4,2F8.2,I4)
for h,k,l,Fo^2,sigma(Fo^2), and batch number.  This file should be terminated
by a record with all items zero; individual data sets within the file should
NOT be separated from one another - the batch numbers serve to distinguish
between groups of reflections for which separate scale factors are to be
refined (see the BASF instruction).  The reflection order and the batch number
order are unimportant.  This '.hkl' file is read each time the program is run;
unlike SHELX-76, there is no facility for intermediate storage of binary data.
This enhances computer independence and eliminates several possible sources of
confusion.  The '.hkl' file is read after the HKLF instruction (which
terminates the '.ins' file) has been interpreted.  The HKLF instruction
specifies the format of the '.hkl' file, and allows scale factors and a
reorientation matrix to be applied.  For further details see the specification
of the HKLF instruction.
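
   For readers who wish to inspect a '.hkl' file outside the program, the
fixed-width layout described above can be read with a few lines of Python.
This is a minimal sketch, not part of SHELXL-93: the function names and the
sample reflections are invented, and it assumes that every Fo^2 and sigma
field contains an explicit decimal point.

```python
def parse_hkl_line(line):
    """Parse one FORMAT(3I4,2F8.2,I4) record into (h, k, l, fo2, sigma, batch)."""
    line = line.rstrip("\n").ljust(32)
    h, k, l = (int(line[i:i + 4]) for i in (0, 4, 8))
    fo2 = float(line[12:20])
    sigma = float(line[20:28])
    batch_field = line[28:32].strip()
    batch = int(batch_field) if batch_field else 0
    return h, k, l, fo2, sigma, batch


def read_hkl(lines):
    """Collect reflections up to the terminating record with all items zero."""
    reflections = []
    for line in lines:
        if not line.strip():
            continue
        rec = parse_hkl_line(line)
        if rec[:3] == (0, 0, 0) and rec[3] == 0.0 and rec[4] == 0.0:
            break
        reflections.append(rec)
    return reflections


# Two made-up reflections followed by the terminating record.
sample = [
    "   1   0   0  123.45    2.50   1",
    "  -2   3  -4   -1.20    0.80   1",
    "   0   0   0    0.00    0.00   0",
]
for rec in read_hkl(sample):
    print(rec)
```

The second sample record has a negative Fo^2, which is legitimate input and
must not be discarded.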

   Lorentz, polarization and absorption corrections are assumed to have been
applied to the data in the '.hkl' file.  If SHELXA is used for the absorption
corrections, it will have read a file name.raw (containing direction cosines)
and written 'name.hkl' (without cosines).  Since SHELXA can read a SHELXL-93
'.ins' file, empirical absorption corrections (which require SHELXA to
calculate Fc) may be applied more than once to the original data in the course
of a structure determination simply by running SHELXA immediately before
SHELXL-93 with the same '.ins' file.  Note that there are special extensions
to the '.hkl' format for Laue and powder data, as well as for twinned crystals
which cannot be handled by a TWIN instruction alone.

   In general the '.hkl' file should contain all measured reflections without
rejection of systematic absences or merging of equivalents.  The systematic
absences and R(int) for equivalents provide an excellent check on the space
group assignment and consistency of the input data.  Since complex scattering
factors are used throughout by SHELXL-93 it is important NOT to average
Friedel opposites in preparing this file.


             WHY DOES SHELXL-93 REFINE AGAINST F-SQUARED ?

Traditionally most crystal structures have been refined against F.  For a
well-behaved structure the geometrical parameters and their esd's are almost
identical for refinement based on all Fo^2 values and for an old-fashioned
refinement against F ignoring data with Fo less than (say) 3.sigma(Fo).  For
weakly diffracting crystals and in particular for pseudosymmetry problems the
refinement against all data is demonstrably superior.  The esd's are reduced
because more experimental information is used, and the chance of getting stuck
in a local minimum is reduced.  In addition, the use of a threshold introduces
a systematic error which biases the displacement parameters Uij.
On the other hand, it is impossible to refine on F using ALL data, because it
would involve taking the square root of a negative number for reflections with
negative Fo^2 (i.e. background higher than the peak as a result of statistical
fluctuations), and because the estimation of sigma(Fo) from sigma(Fo^2) for
small or negative Fo^2 is a difficult statistical problem which requires the
assumption of a probability distribution function for the F-values.  In the
case of pseudosymmetric structures - i.e. the very case where the weak
reflections are most important - this distribution function is not known a
priori, making it impossible to derive 'correct' sigma(Fo) values and hence
correct weights.

   The diffraction experiment measures intensities and their standard
deviations, which after the various corrections give Fo^2 and sigma(Fo^2). If
your data reduction program only outputs Fo and sigma(Fo), which as explained
above involves serious approximations for weak reflections, you MUST CORRECT
YOUR DATA REDUCTION PROGRAM, not simply write a routine to square the Fo
values or use HKLF 3 to input Fo and sigma(Fo) to SHELXL-93 (although the
latter is legal).  Note that if an Fo^2 value is too large to fit format F8.2,
then format F8.0 may be used instead - the decimal point overrides the FORTRAN
format specification.

   The use of a threshold for ignoring weak reflections may introduce bias
which primarily affects the atomic displacement parameters; it is only
justified to speed up the early stages of refinement.  In the final refinement
ALL DATA should be used except for reflections known to suffer from systematic
error (i.e. in the final refinement the OMIT instruction may be used to omit
specific reflections - although not without good reason - but not ALL
reflections below a given threshold).  Anyone planning to ignore this advice
should read F. L. Hirshfeld and D. Rabinovich, Acta Cryst., A29 (1973) 510-513
and L. Arnberg, S. Hovmoller and S. Westman, Acta Cryst., A35 (1979) 497-499
first.  Refinement against F^2 also facilitates the treatment of twinned and
powder data, and the determination of absolute structure.

   One cosmetic disadvantage of refinement against F^2 is that R-indices based
on F^2 are larger than (often about double) those based on F.  For comparison
with older refinements based on F and an OMIT threshold, a conventional index
R1 based on observed F values larger than 4.sigma(Fo) is also printed.  The
deviation of the Goodness of Fit (S) from unity also tends to be magnified
when calculated with F^2.

   Throughout the output, R indices based on F^2 are denoted R2 and those
based on F are denoted R1, e.g.

 wR2 = [ Sigma[w(Fo^2-Fc^2)^2] / Sigma[w(Fo^2)^2] ]^0.5

 R1 = Sigma||Fo|-|Fc|| / Sigma|Fo|

For details of the weights w see 'WGHT' below.  The Goodness of Fit (S) is
always based on F^2:

 GooF = S = [ Sigma [ w(Fo^2-Fc^2)^2 ] / (n-p) ]^0.5

where n is the number of reflections and p is the total number of parameters
refined.  In the 'Restrained Goodness of Fit', Sigma[w(yt-y)^2] is added
to the numerator and the number of restraints is added to the denominator.
This corresponds to treating each restraint as an extra observational equation
with weight w = 1/sigma^2.  y is the quantity (e.g. a bond length) being
restrained and yt is its target value.  In these expressions, Sigma is written
with a capital S to indicate a summation and a small s for an estimated
standard deviation (corresponding to the use of capital and small Greek
letters for sigma).
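
   The definitions above translate directly into code.  The following Python
sketch is an illustration only: the function names are hypothetical, the
weights are taken as given, and each restraint is passed as a (y, yt, sigma)
tuple so that it is weighted by 1/sigma^2 as described.

```python
import math


def wr2(fo2, fc2, w):
    """wR2 = [Sigma w(Fo^2-Fc^2)^2 / Sigma w(Fo^2)^2]^0.5."""
    num = sum(wi * (o - c) ** 2 for o, c, wi in zip(fo2, fc2, w))
    den = sum(wi * o ** 2 for o, wi in zip(fo2, w))
    return math.sqrt(num / den)


def r1(fo, fc):
    """R1 = Sigma||Fo|-|Fc|| / Sigma|Fo|."""
    return sum(abs(abs(o) - abs(c)) for o, c in zip(fo, fc)) / sum(abs(o) for o in fo)


def goof(fo2, fc2, w, n_params, restraints=()):
    """GooF (S); pass (y, y_target, sigma) tuples to get the restrained GooF."""
    num = sum(wi * (o - c) ** 2 for o, c, wi in zip(fo2, fc2, w))
    num += sum(((yt - y) / s) ** 2 for y, yt, s in restraints)
    return math.sqrt(num / (len(fo2) - n_params + len(restraints)))


# Made-up values for two reflections, one parameter and one bond restraint.
fo2, fc2, w = [10.0, 20.0], [9.0, 22.0], [1.0, 1.0]
print(f"wR2  = {wr2(fo2, fc2, w):.4f}")
print(f"GooF = {goof(fo2, fc2, w, n_params=1):.4f}")
print(f"Restrained GooF = {goof(fo2, fc2, w, 1, [(1.50, 1.54, 0.02)]):.4f}")
```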

   In general most statistical quantities are defined as in the I.U.Cr.
Commission's report:  'Statistical Descriptors in Crystallography',
D. Schwarzenbach et al., Acta Cryst., A45 (1989) 63-75.


                            CIF ARCHIVE FORMAT

   The CIF format represents a major step forward in the archiving,
publication and communication of crystallographic data.  At last it is
possible to publish crystal structures and incorporate structural data into
the crystallographic databases without the expensive and error-prone retyping
of tables by hand.  CIF format also provides a convenient method of
transferring data from one program system to another.  The ACTA instruction
instructs SHELXL-93 to write two CIF-format files: 'name.fcf' contains the
reflection data and 'name.cif' all other data.  These files contain all the
items needed for archiving the structure; those values not known to SHELXL-93
(e.g. the color of the crystal) are left as a question mark.  In general the
final 'name.cif' file should be edited using any text editor to replace most
of these question marks.  The file is then suitable for deposition in the
CSD (organic) and ICSD (inorganic crystal structure) databases.

   For publication via electronic mail it will normally be necessary to add
the authors' names, title, text etc., which may also be done in CIF-format;
this is followed by the edited contents of one or more '.cif' files each
describing one structure (or possibly the same structure at different
temperatures etc.).  An example of a paper submitted to Acta Cryst. in this
way is provided in Appendix D.  At the time of writing it is necessary to
send the diagram and Fo/Fc tables by post, though in principle the '.fcf'
file is suitable for the direct submission of the Fo/Fc data in CIF-format.
SHELXL-93 users are strongly recommended to familiarize themselves with the
definitive paper by the I.U.Cr. Commission on Crystallographic Data:  S.R.
Hall, F.H. Allen and I.D. Brown, Acta Cryst., A47 (1991) 655-685.

   The auxiliary program CIFTAB is provided with SHELXL-93 to facilitate the
transition to CIF.  It enables the '.cif' output file from SHELXL-93 to be
extended by adding CIF information from other (e.g. diffractometer data
processing) programs, and enables a variety of tables to be produced (e.g.
crystal data, coordinates, bond lengths and angles, and structure factors)
for padding out Ph.D. theses and submission to Journals that have not yet
seen the light.  Further details of CIFTAB may be found in Appendix C.



                        TREATMENT OF HYDROGEN ATOMS

   It is difficult to locate hydrogen atoms accurately using X-ray data
because of their low scattering power and lack of core electrons, and because
the valence electron density is asymmetrical and is not centered at the
position of the nucleus (which can be determined by neutron diffraction).  In
addition hydrogen atoms tend to have larger vibrational and librational
amplitudes than other atoms.  For many purposes it is preferable to calculate
the hydrogen positions according to well-established geometrical criteria and
then to adopt a refinement procedure which ensures that a sensible geometry is
retained.

   SHELXL-93 provides a bewildering selection of (AFIX and HFIX) options for
positioning and refining hydrogen atoms, as detailed in the section 'atom
lists and least-squares constraints'.  For routine refinement, however, the
riding model is a good choice for tertiary CH (HFIX 13), secondary CH2
(HFIX 23), ethylenic =CH2 (HFIX 93), acetylenic CH (HFIX 163), BH in
polyhedral boranes (HFIX 153), and aromatic CH or amide NH (HFIX 43).  The
hydrogen coordinates are re-idealized before each cycle, and 'ride' on the
atoms to which they are attached (i.e. the coordinate shifts are the same for
both).  In this riding model, the C-H vector remains constant in magnitude and
direction, but its origin, i.e. the position of the carbon atom in the
unit-cell, may move.  Both C and H contribute to the derivative calculations,
which improves convergence.  Alternatively AFIX (or HFIX) 14 etc. performs a
similar riding refinement but allows the C-H distance to vary as well (keeping
the C-H distances equal within a CH2 or CH3 group). It is possible to use SADI
or DFIX to restrain chemically equivalent C-H distances involving different
carbons to be equal.

   Methyl and hydroxyl groups are more difficult to position accurately.  If
good (low-temperature) data are available the method of choice is HFIX 137 for
-CH3 and HFIX 147 for -OH groups; in this approach, a difference electron
density synthesis is calculated around the circle which represents the loci of
possible hydrogen positions (for a fixed X-H distance and Y-X-H angle).  The
maximum electron density (in the case of a methyl group after local threefold
averaging) is then taken as the starting position for the hydrogen atom(s).
In subsequent refinement cycles (and in further least-squares jobs) the
hydrogens are re-idealized at the start of each cycle, but the current torsion
angle is retained; the torsion angles are allowed to refine whilst keeping
the X-H distance and Y-X-H angle fixed.  If unusually high quality data
are available, AFIX 138 would allow the refinement of a common C-H distance
for a methyl group but not allow it to tilt; a variable metric rigid group
refinement (AFIX 9 for the carbon followed by AFIX 135 before the first H)
would allow it to tilt as well, but still retain tetrahedral H-C-H angles and
equal C-H distances within the group.
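
   The local threefold averaging used for methyl groups can be sketched in
Python as follows, using the circular difference density printed for the C10
methyl group in the sigi example.  This is an illustration under stated
assumptions rather than the program's own algorithm: the function name is
hypothetical, and because the printed densities are already rounded to
integers, the averages agree with the program's 'After local symmetry
averaging' line only to within rounding.

```python
def threefold_average(density):
    """Average values 120 degrees apart on a circle sampled at equal intervals."""
    if len(density) % 3:
        raise ValueError("number of samples must be divisible by 3")
    third = len(density) // 3
    return [
        (density[i] + density[i + third] + density[i + 2 * third]) / 3.0
        for i in range(third)
    ]


# Circular difference density for the methyl group on C10 in the sigi example
# (e/A^3 x 100, 15 degree steps, 24 samples around the circle).
methyl = [34, 37, 39, 41, 38, 30, 20, 15, 19, 28, 39, 47,
          50, 43, 29, 15, 12, 19, 29, 35, 33, 27, 25, 29]
averaged = threefold_average(methyl)
best = max(range(len(averaged)), key=averaged.__getitem__)
print([round(v, 1) for v in averaged])
print(f"maximum at sample {best} of the 120 degree range")
```

The raw circle shows three broad maxima of unequal height; after averaging a
single maximum remains, which fixes the starting torsion angle.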

   If the data quality is less good, then the refinement of torsion angles
may not converge very well.  In such cases the hydrogens can be positioned
geometrically and refined using a riding model by HFIX 33 for methyl and
HFIX 83 for hydroxyl groups.  This staggers the methyl groups, and -OH groups
attached to saturated carbons, as well as possible; -OH groups attached to
aromatic rings are placed in one of the two positions in the plane.  In either
-OH case the choice of hydrogen position is determined by the best hydrogen
bond (to an N, O, Cl or F atom) which can be created.  For disordered methyl
groups (with two sites rotated by 60 degrees from one another) HFIX 123 is
recommended, possibly with refinement of the corresponding site occupation
factors via a 'free variable' so that their sum is unity (e.g. 21 and -21).
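
   The way codes such as 21 and -21 tie two occupancies to one free variable
can be sketched in Python.  The decoding rule used here (a value 10*m+p with
|p| < 5 is fixed at p for m = 1, equals p*fv(m) for m > 1 and p*(fv(-m)-1) for
m < -1) is an assumption as far as this section is concerned; it should be
checked against the description of free variables elsewhere in this manual.
The function name and the value 0.7 are invented.

```python
def decode_parameter(value, fvar):
    """Interpret a coded parameter 10*m + p (|p| < 5) against free variables.

    fvar[0] is free variable 1 (the overall scale factor), fvar[1] is 2, etc.
    Returns (resolved_value, description).
    """
    m = int(round(value / 10.0))
    p = value - 10.0 * m
    if m == 0:
        return p, "refined freely"
    if m == 1:
        return p, "fixed"
    if m > 1:
        return p * fvar[m - 1], f"p * fv({m})"
    if m < -1:
        return p * (fvar[-m - 1] - 1.0), f"p * (fv({-m}) - 1)"
    raise ValueError("m = -1 is not covered by this sketch")


fvar = [1.0, 0.7]                 # hypothetical: fv(2) = 0.7
occ_a, _ = decode_parameter(21, fvar)
occ_b, _ = decode_parameter(-21, fvar)
print(f"site A = {occ_a:.2f}, site B = {occ_b:.2f}, sum = {occ_a + occ_b:.2f}")
```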

   The choice of a suitable (default) O-H distance is very difficult.  O-H
internuclear distances for isolated molecules in the gas phase are about 0.96
Angstroms (cf. 1.10 for C-H), but the appropriate distance to use for X-ray
diffraction must be appreciably shorter to allow for the displacement of the
center of gravity of the electron distribution towards the oxygen atom, and
also for librational effects.  Although the (temperature dependent) value
assumed by the program fits reasonably well for O-H groups in predominantly
organic molecules, appreciably longer O-H distances are appropriate for low
temperature studies of strongly (cooperatively) hydrogen bonded systems -
short H..O distances are always associated with long O-H distances.  If there
are many such O-H groups and good quality data are available, HFIX 88 (or 148)
plus SADI restraints to make all the O-H distances approximately equal (with
an esd of say 0.01) is a good approach.

   Hydrogen atoms may also 'ride' on atoms in rigid groups (unlike SHELX-76);
for example HFIX 43 could reference carbon atoms in a rigid phenyl ring.  In
such a case further geometrical restraints (SADI, SAME, DFIX, FLAT) are not
permitted on the hydrogen atoms; this is the only exception to the general
rule that any number of restraints may be applied to any atom, whatever
constraints are also being applied to it.  This is much more general than in
SHELX-76.

   If the hydrogen atoms are generated using HFIX, the standard option is to
set the isotropic U's to -1.2 (-1.5 for methyl and hydroxyl) which is
interpreted as 1.2 (or 1.5) times the equivalent isotropic displacement
parameter of the last atom which did not use this facility. A good alternative
is to use 'free variables' to constrain the U values of chemically equivalent
hydrogens to be equal.
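
   The negative-U convention can be illustrated with a short Python sketch.
The function name is hypothetical, and treating every negative entry as a
multiplier is an assumption made for illustration; only the values -1.2 and
-1.5 are taken from the text above.

```python
def resolve_riding_u(u_code, u_eq_parent):
    """Return the isotropic U implied by a U entry in the atom list.

    Assumption for this sketch: a negative entry is a multiplier of the
    equivalent isotropic U of the last atom that carried its own U value.
    """
    if u_code < 0:
        return -u_code * u_eq_parent
    return u_code


# Illustrative parent atom with Ueq = 0.040
u_aromatic_h = resolve_riding_u(-1.2, 0.040)   # 1.2 times Ueq of the parent
u_methyl_h = resolve_riding_u(-1.5, 0.040)     # 1.5 times Ueq of the parent
print(u_aromatic_h, u_methyl_h)
```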

   Hydrogen atoms are identified as such by their scattering factor numbers,
which must correspond to a SFAC name H (or $H).  Other elements which need
to be specifically identified (e.g. so that HFIX 43 can use different default
C-H and N-H distances) are defined similarly.  However for the output of the
PLAN instruction, hydrogen atoms are identified as those atoms with a radius
of less than 0.4 Angstroms (this is not as illogical as it may sound; the PLAN
output is concerned with potential hydrogen bonds etc., not with the
scattering power of an atom, and SHELXL-93 has to handle neutron as well as
X-ray data).

   OMIT $H (or OMIT_* $H if residues are employed) combined with L.S. 0,
FMAP 2 and PLAN -100 enables an 'omit map' to be calculated, which is a
convenient way of checking whether there are actually electron density peaks
close to the calculated hydrogen positions.  In this omit map, the hydrogen
atoms are retained but do not contribute to Fc; if a non-zero electron density
appears in the 'Peak' column for one of these hydrogens in the Fourier output,
then there was an actual peak in the difference electron density synthesis
within 0.31 Angstroms of the expected hydrogen position.

   There are a number of operations in SHELXL-93 in which hydrogen atoms are
treated specially, for example in the connectivity array, in atom lists
defined using the '>' and '<' symbols, in the atoms following the SAME, ANIS
and AFIX instructions, and in the output generated by PLAN.  This approach
is very convenient for the vast majority of structure refinements.  However it
may be useful to know how the program decides which atoms are 'hydrogens'
in order to be able to treat hydrogens as normal atoms.  The program scans the
SFAC instructions (either format) for an element named 'H', and if one is
found, treats all atoms with this scattering factor number specially.  If two
or more scattering factors are named 'H', only the last one gets this special
treatment, which provides a way of tricking the program into allowing both
'normal' and 'special' hydrogens.  Similarly for neutron data, where an SFAC
instruction is needed for each element anyway, one could if desired suppress
the special treatment of hydrogens by labeling their SFAC instruction 'Hyd' or
even 'D'.


           RESTRAINTS, CONSTRAINTS AND GROUP FITTING, AND DISORDER

In crystal structure refinement, there is an important distinction between a
'constraint' and a 'restraint'.  A constraint is an exact mathematical
condition which enables one or more least-squares variables to be expressed
exactly in terms of other variables or constants, and hence eliminated.  An
example is the fixing of the x, y and z coordinates of an atom on an inversion
center.  A restraint takes the form of additional information which is not
exact but is subject to a probability distribution; for example we could
restrain two chemically but not crystallographically equivalent bonds to be
approximately equal, with an effective standard deviation of (say) 0.01
Angstroms.

   A restraint is incorporated in the least-squares refinement as if it were
an additional experimental observation;  w(yt-y)^2 is added to the quantity
Sigma[w(Fo^2-Fc^2)^2] to be minimized, where a quantity y (which is a function
of the least-squares parameters) is to be restrained to a target value yt, and
the weight w (for either a restraint or a reflection) is 1/sigma^2.  In the
case of a reflection sigma^2 is estimated using a weighting scheme; for a
restraint sigma is simply the effective standard deviation.  In SHELXL-93 the
restraint weights are multiplied by the square of the Goodness of Fit for the
reflection data, which allows for the possibility that the reflection weights
may be relative rather than absolute, and also gives the restraints more
influence at the early stages of refinement (when the Goodness of Fit is
invariably much greater than unity), which improves convergence.
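
   The contribution of a single restraint to the minimized quantity can be
sketched in Python as follows.  The function and argument names are
hypothetical; the sketch only restates the formula w(yt-y)^2 with w =
1/sigma^2 and the scaling by the square of the Goodness of Fit described
above, and is not taken from the program source.

```python
def restraint_term(y, y_target, esd, goof=1.0):
    """Return w * (yt - y)**2 for one restraint.

    w = 1 / esd**2, multiplied by the square of the Goodness of Fit of the
    reflection data (goof), as described in the text.
    """
    weight = goof ** 2 / esd ** 2
    return weight * (y_target - y) ** 2


# A bond of 1.52 A restrained to 1.54 A with an esd of 0.01 A
late = restraint_term(1.52, 1.54, 0.01, goof=1.0)
early = restraint_term(1.52, 1.54, 0.01, goof=2.0)
print(late, early)
```

With a Goodness of Fit of 2 the same discrepancy counts four times as much
as at a Goodness of Fit of unity, which is why the restraints have more
influence in the early cycles.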

   Most of the constraints and restraints available in SHELXL-93 have already
been widely used in other programs, especially for macromolecular refinement.
In SHELXL-93 an effort has been made to make them simple to understand and
use, while at the same time avoiding the bias which is introduced when
specific target values etc. have to be assumed.  For example it is more
realistic to assume that a phenyl group is planar and has mm (C2v) symmetry
(in both cases within a reasonable tolerance) rather than that it is an
exactly regular hexagon with a bond length of 1.39 Angstroms; however both
approaches may conveniently be applied using SHELXL-93.  The following general
categories of constraints and restraints are available using SHELXL-93:

1. Constraints for the coordinates and anisotropic displacement parameters for
atoms on special positions: these are generated automatically by the program
for ALL special positions in ALL space groups, in conventional settings or
otherwise.  If the user applies (correct or incorrect) special position
constraints using free variables etc., the program assumes this has been done
with intent; it reports the correct constraints but does not apply them.  Thus
the accidental application of a free variable to a Uij term of an atom on a
special position can lead to the refinement 'blowing up' !

2. Two or more atoms sharing the same site: the xyz and Uij parameters may be
equated using the EXYZ and EADP constraints respectively (or by using 'free
variables').  The occupation factors may be expressed in terms of a 'free
variable' so that their sum is constrained to be constant (e.g. 1.0). If more
than two different chemical species share a site, a linear free variable
restraint (SUMP) is required to restrain the sum of occupation factors.  EADP
is also useful for equating the Uij of 'opposite' fluorines of disordered
-CF3 groups.

3. Floating origin restraints: these are generated automatically by the
program as and when required by the method of H.D. Flack and D. Schwarzenbach,
Acta Cryst., A44 (1988) 499-506, so the user should not attempt to fix the
origin in such cases by fixing the coordinates of a heavy atom.

4. Geometrical constraints: these include rigid-group refinements (AFIX 6),
variable-metric rigid-group refinements (AFIX 9) and various riding models
(AFIX/HFIX) for hydrogen atom refinement, for example torsional refinement of
a methyl group about the local threefold axis.

5. Fragments of known geometry may be fitted to target atoms (e.g. from a
previous Fourier peak search), and the coordinates generated for any missing
atoms.  Four standard groups are available: regular pentagon, regular hexagon,
naphthalene and pentamethylcyclopentadienyl; any other group may be used
simply by specifying orthogonal or fractional coordinates in a given cell
(AFIX mn with m > 16 and FRAG...FEND).  This is usually, but not always, a
preliminary to rigid group refinement.

6. Geometrical restraints: a particularly useful restraint is to make
chemically but not crystallographically equivalent distances equal (subject
to a given or assumed esd) without having to invent a value for this distance
(SADI).  The SAME instruction can be used to generate such restraints
automatically, e.g. when chemically identical molecules or residues are
present.  This has the same effect as making equivalent bond lengths and
angles but not torsion angles equal.  The FLAT instruction restrains a group
of atoms to lie in a plane (but the plane is free to move and rotate).  DFIX
and CHIV restrain distances and chiral volumes respectively to target values.
When 'free variables' are used for the target values, it is possible to
restrain different distances etc. to be equal and to refine their mean value
(for which an esd is thus obtained).  ALL types of geometrical restraints may
involve ANY atom, even if it is part of a rigid group or a symmetry equivalent
generated using EQIV $n ... and referenced by _$n, except for hydrogen atoms
which ride on rigid group atoms (see preceding section).

7. 'Anti-bumping' restraints may be applied individually, by means of DFIX
distance restraints with the distance given as a negative number, or generated
automatically by means of the BUMP instruction, which operates on all atoms
which have been designated by 'CONN 0' instructions (and so are excluded from
the connectivity array).  DFIX restraints with negative distance d are ignored
if the two atoms are further from one another than |d| in the current
refinement cycle; if they are closer than |d|, a restraint is applied to
increase the distance to |d| with the given (or assumed) esd.  The automatic
generation of anti-bumping restraints takes all possible symmetry equivalents
into account, and allows a safety margin of 0.5 A so that atoms which move
towards one another during the refinement are also covered.  In combination
with the SWAT instruction for diffuse solvent, BUMP provides a very effective
way of handling solvent water in macromolecules.

8. Restraints on anisotropic displacement parameters: three different types of
restraint may be applied to Uij values.  DELU applies a 'rigid-bond' restraint
to Uij of two bonded (or 1,3) atoms; the anisotropic displacement
components of the two atoms along the line joining them are restrained to be
equal.  This restraint was suggested by J.S. Rollett (in Crystallographic
Computing, Ed. F.R. Ahmed, S.R. Hall and C.P. Huber, Munksgaard, Copenhagen,
(1970) pp. 167-181), and corresponds to the rigid-bond criterion for testing
whether anisotropic displacement parameters are physically reasonable (F.L.
Hirshfeld, Acta Cryst., A32 (1976) 239-244; K.N. Trueblood and J.D. Dunitz,
Acta Cryst., B39 (1983) 120-133).  J.J. Didisheim and D. Schwarzenbach (Acta
Cryst., A43 (1987) 226-232) have shown that in many but not all cases, rigid-
bond restraints are equivalent to the TLS description of rigid body motion in
the limit of zero esd's; however this requires that (almost) all atom pairs
are restrained in this way, which for molecules with conformational
flexibility is unlikely to be appropriate.  An extensive study (E. Irmer,
Ph.D. Thesis, University of Goettingen, 1990) has shown that this condition is
fulfilled within the experimental error for routine X-ray studies of bonds and
1,3-distances between two first-row elements (B to F inclusive), and so may be
applied as a 'hard' restraint (low esd).  A rigid bond restraint is not
suitable for systems with unresolved disorder, e.g. AsF6- anions and dynamic
Jahn-Teller effects, although it may be useful in detecting such effects.

   Isolated (e.g. solvent water) atoms may be restrained to be approximately
isotropic, e.g. to prevent them going 'non-positive-definite'; this is a rough
approximation and so should be applied as a 'soft' restraint with a large esd
(ISOR).  Similarly the assumption of 'similar' Uij values for spatially
adjacent atoms (SIMU) is useful so that (for example) the thermal ellipsoids
increase and change direction gradually going along a side-chain in a
polypeptide, but this treatment is approximate and thus also appropriate only
for a soft restraint; it is also useful for partially overlapping atoms of
disordered groups.  A simple way to apply SIMU to all such overlapping atoms
is to give a SIMU instruction with no atoms (i.e. all atoms implied) and the
third number set to a distance less than the shortest bond, i.e.

 SIMU 0.02 0.04 0.8

which applies the restraint to all pairs of atoms separated by less than 0.8
Angstroms.  Additional SIMU restraints may be included in the same job.

   SHELXL-93 does not permit DELU, SIMU and ISOR restraints to reference
symmetry generated atoms, although this is allowed for all geometrical
restraints.  To permit such references for displacement parameter restraints
as well would considerably complicate the program, and is rarely required in
practice.

9. 'Shift limiting restraints' may be applied in SHELXL-93 by the Marquardt
algorithm (D.W. Marquardt, J. Soc. Ind. Appl. Math., 11 (1963) 431-441).
Terms proportional to a 'damping factor' (the first parameter on the DAMP
instruction) are added
to the least-squares matrix before inversion.  Shift limiting restraints are
particularly useful in the refinement of structures with a poor data to
parameter ratio, and for pseudosymmetric problems. The 'damping factor' should
be reduced towards the end of the refinement, otherwise the least-squares
estimates of the esd's in the less well determined parameters will be too low
(the program does however make a first order correction to the esds for this
effect).  The shifts are also scaled down if the maximum shift/esd exceeds the
second DAMP parameter.  In addition, if the actual and target values for a
particular restraint differ by more than 100 times the given esd, the program
will temporarily increase the esd to limit the influence of this restraint in
any one cycle to that produced by a discrepancy of 100 times the esd.  This
helps to prevent a bad initial model and tight restraints from causing
dangerously large shifts in the first cycle.

10. Further constraints may be applied to atom coordinates, occupation and
displacement parameters, and to restrained distances (DFIX) and chiral volumes
(CHIV), by the use of 'free variables'.  Linear combinations of free variables
may in turn be restrained (SUMP).  Free variables were required for special
position constraints and for refining more than one atom on the same site in
SHELX-76; their use in this way is allowed (for upwards compatibility) in
SHELXL-93, but it is more convenient to use the fully automatic handling of
special positions in SHELXL-93, and atoms on multiply occupied sites may be
constrained using EXYZ and EADP.  For further details see the description of
the FVAR instruction.
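
   The one-sided behaviour of an anti-bumping restraint (item 7 above) can be
sketched in Python.  The function name is hypothetical and the scaling of
restraint weights by the Goodness of Fit is omitted; the sketch only shows
that a DFIX restraint with a negative distance d contributes nothing unless
the two atoms are closer than |d|.

```python
def anti_bump_term(distance, d, esd):
    """Return the weighted squared discrepancy for a DFIX restraint with d < 0.

    The restraint is ignored when the atoms are at least |d| apart; otherwise
    it pushes the distance back towards |d| with weight 1 / esd**2.
    """
    limit = abs(d)
    if distance >= limit:
        return 0.0
    return ((limit - distance) / esd) ** 2


# Illustrative values: d = -3.0 A with an esd of 0.1 A
print(anti_bump_term(3.4, -3.0, 0.1))   # atoms far enough apart: no restraint
print(anti_bump_term(2.8, -3.0, 0.1))   # atoms too close: restraint is active
```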

   A major advantage of applying chemically reasonable restraints is that a
subsequent difference electron density synthesis is often more revealing,
because the parameters were not allowed to 'mop up' any residual effects.  The
refinement of pseudosymmetric structures, where the X-ray data may not be able
to determine all of the parameters, is also considerably facilitated, at the
cost of making it much easier to refine a structure in a space group of
unnecessarily low symmetry!

   By way of example, assume that the structure contains a cyclopentadienyl
(Cp) ring pi-bonded to a metal atom, and that as a result of the high thermal
motion of the ring only three of the atoms could be located in a difference
electron density map.  We wish to fit a regular pentagon (default C-C 1.42 A)
in order to place the remaining two atoms, which are input as dummy atoms with
zero coordinates.  Since the C-C distance is uncertain (there may well be an
appreciable librational shortening in such a case) we refine the C5-ring as a
'variable metric' rigid group, i.e. it remains a regular pentagon but the C-C
distance is free to vary.  In SHELXL-93 this may all be achieved by inserting
one instruction (AFIX 59) before the five carbons and one (AFIX 0) after them:

 AFIX 59                 ! AFIX mn with m = 5 to fit pentagon (default C-C
 C1 1 .6755 .2289 .0763  ! 1.42 A) and n = 9 for v-m rigid-group refinement
 C2 1 .7004 .2544 .0161
 C3 1 0 0 0              ! the coordinates for C3 and C4 are obtained by the
 C4 1 0 0 0              ! fit of the other 3 atoms to a regular pentagon
 C5 1 .6788 .1610 .0766
 AFIX 0                  ! terminates rigid group

Since Uij values were not specified, the atoms would refine isotropically
starting from U = 0.05.  To refine with anisotropic displacement parameters in
the same or a subsequent job, the instruction:

 ANIS C1 > C5

should be inserted anywhere before C1 in the '.ins' file.  The SIMU and ISOR
restraints on the Uij would be inappropriate for such a group, but:

 DELU C1 > C5

could be applied if the anisotropic refinement proved unstable.  The five
hydrogen atoms could be added and refined with the 'riding model' by means of:

 HFIX 43 C1 > C5

anywhere before C1 in the input file.  For good data, in view of possible
librational effects, a possible alternative would be:

 HFIX 44 C1 > C5
 SADI 0.02 C1 H1 C2 H2 C3 H3 C4 H4 C5 H5

(which retains a riding model but allows the C-H bond lengths to refine,
subject to the restraint that they should be equal within about 0.02 A).

   In analogous manner it is possible to generate missing atoms and perform
rigid group refinements for phenyl rings (AFIX 66) and Cp* groups (AFIX 109).
Very often it is possible and desirable to remove the rigid group constraints
(by simply deleting the AFIX instructions) in the final stages of refinement;
there is good experimental evidence that the ipso-angles of phenyl rings
differ systematically from 120 degrees [P.G. Jones, J. Organomet. Chem., 345
(1988) 405; T. Maetzke and D. Seebach, Helv. Chim. Acta, 72 (1989) 624-630;
A. Domenicano, Accurate Molecular Structures, eds. Domenicano and Hargittai,
Chapter 18, OUP 1992].

   As a second example, assume that the structure contains two molecules of
poorly defined THF solvent, and that we have managed to identify the oxygen
atoms.  A rigid pentagon would clearly be inappropriate here, except possibly
for placing missing atoms, since THF molecules are not planar.  However we can
RESTRAIN the 1,2- and the 1,3-distances in the two molecules to be similar by
means of a 'similarity restraint' (SAME).  Assume that the molecules are
numbered O11 C12 ... C15 and O21 C22 ... C25, and that the atoms are given in
this order in the atom list.  Then we can either insert the instruction:

 SAME O21 > C25

before the first molecule, or:

 SAME O11 > C15

before the second.  These SAME instructions define a group of five atoms
which are considered to be the same as the five (non-hydrogen) atoms which
immediately follow the SAME instruction.  The entries in the connectivity
table for the latter are used to define the 1,2- and 1,3-distances, so the
SAME instruction should be inserted before the group with the best geometry.
This one SAME instruction restrains five pairs of 1,2- and five pairs of 1,3-
distances to be nearly equal, i.e.

 d(O11-C12) = d(O21-C22),  d(C12-C13) = d(C22-C23),  d(C13-C14) = d(C23-C24),
 d(C14-C15) = d(C24-C25),  d(C15-O11) = d(C25-O21),  d(O11-C13) = d(O21-C23),
 d(C12-C14) = d(C22-C24),  d(C13-C15) = d(C23-C25),  d(C14-O11) = d(C24-O21),
 and  d(C15-C12) = d(C25-C22).

In addition, it would also be reasonable to restrain the distances on opposite
sides of the same ring to be equal.  This can be achieved with one further
SAME instruction in which we count the other way around the ring.  For example
we could insert:

 SAME O11 C15 < C12

before the first ring.  The symbol '<' indicates that one must count up the
atom list instead of down.  The above instruction is exactly equivalent to:

 SAME O11 C15 C14 C13 C12

This generates 10 further restraints, but two of them [d(C13-C14) = d(C14-C13)
and d(C12-C15) = d(C15-C12)]  are identities, and each of the others appears
twice, so only four are independent and the rest are ignored.  It is not
necessary to add a similar instruction before the second ring, because the
program also automatically generates all 'implied' restraints, i.e. restraints
which can be derived by combining two existing distance restraints which refer
to the same atom pair.
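
   The counting in the last paragraph can be checked with a short Python
sketch.  It assumes an idealized five-membered ring whose atoms are listed
in bonded order, rather than the program's real connectivity table, and the
function names are hypothetical.

```python
def ring_pairs(atoms):
    """Return the 1,2- and 1,3-atom pairs of a ring listed in bonded order."""
    n = len(atoms)
    return [(atoms[i], atoms[(i + step) % n]) for step in (1, 2) for i in range(n)]


def same_restraints(named, following):
    """Pair each distance in the following group with its SAME counterpart.

    'named' are the atoms given on the SAME instruction; 'following' are the
    atoms that immediately follow it in the atom list.
    """
    mapping = dict(zip(following, named))
    restraints = []
    for a, b in ring_pairs(following):
        restraints.append((frozenset((a, b)), frozenset((mapping[a], mapping[b]))))
    return restraints


following = ["O11", "C12", "C13", "C14", "C15"]
named = ["O11", "C15", "C14", "C13", "C12"]      # SAME O11 C15 < C12
generated = same_restraints(named, following)
identities = [r for r in generated if r[0] == r[1]]
independent = {frozenset(r) for r in generated if r[0] != r[1]}
print(len(generated), len(identities), len(independent))
```

The three counts reported are 10 generated restraints, 2 identities and 4
independent restraints, in agreement with the text.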

   In contrast to other restraint instructions, the SAME instructions must be
inserted at the correct positions in the atom list.  These similarity
restraints provide a very general and powerful way of exploiting non-
crystallographic symmetry; in this example two instructions suffice to
restrain the THF molecules so that they have (within an assumed standard
deviation) twofold symmetry and are the same as each other.  However we have
not imposed planarity on the rings nor restricted any of the torsion angles.

   To complicate matters, let us assume that the two molecules are two
alternative conformations of a THF molecule disordered on a single site.  We
must then ensure that the site occupation factors of the two molecules add to
unity, and that no spurious bonds linking them are added to the connectivity
table.  The former is achieved by employing site occupation factors of 21
(i.e. 1 times free-variable 2) for the first molecule and -21 [ 1*(1-fv(2)) ]
for the five atoms of the second molecule.  Free variable 2 is then the
occupation factor of the first molecule; its starting value must be specified
on the FVAR instruction.  The possibility of spurious bonds is eliminated by
inserting 'PART 1' before the first molecule, 'PART 2' before the second, and
'PART 0' after it.  Hydrogen atoms can be inserted in the usual way using the
HFIX instruction since the connectivity table is 'correct'; they will
automatically be assigned the site occupation factors of the atoms to which
they are bonded.
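
   The way the codes 21 and -21 tie the two occupancies together can be
sketched in Python.  The function name is hypothetical, and the general
splitting of a code into 10*m + p is an assumption based on the free
variable convention described with the FVAR instruction; only the two codes
21 and -21 are taken from the text above.

```python
def decode_free_variable(code, fvar):
    """Return the parameter value encoded by a free-variable code.

    Assumed convention: |code| = 10*m + p with m >= 2; a positive code gives
    p * fv(m) and a negative code gives p * (1 - fv(m)).
    'fvar' maps the free-variable number m to its current value.
    """
    magnitude = abs(code)
    m = int((magnitude + 5) // 10)
    p = magnitude - 10 * m
    if m < 2:
        raise ValueError("code does not reference a free variable")
    if code > 0:
        return p * fvar[m]
    return p * (1.0 - fvar[m])


fvar = {2: 0.75}                         # illustrative starting value
occ_first = decode_free_variable(21, fvar)
occ_second = decode_free_variable(-21, fvar)
print(occ_first, occ_second, occ_first + occ_second)
```

Whatever value free variable 2 refines to, the two occupancies always add
to unity, so only one extra least-squares parameter is required.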

   Finally we would like to refine with anisotropic displacement parameters
because the thermal motion of such solvent molecules is certainly not
isotropic, but the refinement will be unstable unless we restrain the
anisotropic displacement parameters to behave 'reasonably' by means of rigid
bond restraints (DELU) and 'similar Uij' restraints (SIMU); fortunately the
program can set up these restraints automatically.  The DELU restraints
restrain the differences in the components of the displacement parameters of
two atoms to zero along the 1,2- and 1,3-vector directions, and are derived
with the help of the connectivity table.  Since the SIMU restraints are much
more approximate, we restrict them to atoms which, because of the disorder,
are almost overlapping (i.e. are within 0.7 A of each other).  Note that the
SIMU restraints ignore the connectivity table and are based directly on a
distance criterion specifically because this is a sensible way of handling
disorder.  In order to specify a non-standard distance cutoff which is the
third SIMU parameter, we must also give the first two parameters which are the
restraint esd's for distances involving non-terminal atoms (0.02) and at least
one terminal atom (0.04) respectively.  The '.ins' file now contains:

 HFIX 23 C12 > C15 C22 > C25
 ANIS O11 > C25
 DELU O11 > C25
 SIMU 0.02 0.04 0.7 O11 > C25
 FVAR ..... 0.75
 ....
 PART 1
 SAME O21 > C25
 SAME O11 C15 < C12
 O11 4 ..... ..... ..... 21
 C12 1 ..... ..... ..... 21
 C13 1 ..... ..... ..... 21
 C14 1 ..... ..... ..... 21
 C15 1 ..... ..... ..... 21
 PART 2
 O21 4 ..... ..... ..... -21
 C22 1 ..... ..... ..... -21
 C23 1 ..... ..... ..... -21
 C24 1 ..... ..... ..... -21
 C25 1 ..... ..... ..... -21
 PART 0

   An alternative type of disorder common for THF molecules and proline
residues in proteins is when one atom (say C14) can flip between two positions
(i.e. it is the flap of an envelope conformation). If we assign C14 to PART 1,
C14' to PART 2, and the remaining ring atoms to PART 0 then the program will
be able to generate the correct connectivity, and so we can also generate
hydrogen atoms for both disordered components (with AFIX, not HFIX):

 SIMU C14 C14'
 ANIS O11 > C14'
 FVAR ..... 0.7
 ....
 SAME O11 C12 C13 C14' C15
 O11 4 ..... ..... .....
 C12 1 ..... ..... .....
 AFIX 23
 H12A 2 ..... ..... .....
 H12B 2 ..... ..... .....
 AFIX 0
 C13 1 ..... ..... .....
 PART 1
 AFIX 23
 H13A 2 ..... ..... ..... 21
 H13B 2 ..... ..... ..... 21
 PART 2
 AFIX 23
 H13C 2 ..... ..... ..... -21
 H13D 2 ..... ..... ..... -21
 AFIX 0
 PART 1
 C14 1 ..... ..... ..... 21
 AFIX 23
 H14A 2 ..... ..... ..... 21
 H14B 2 ..... ..... ..... 21
 AFIX 0
 PART 0
 C15 1 ..... ..... .....
 PART 1
 AFIX 23
 H15A 2 ..... ..... ..... 21
 H15B 2 ..... ..... ..... 21
 PART 2
 AFIX 23
 H15C 2 ..... ..... ..... -21
 H15D 2 ..... ..... ..... -21
 AFIX 0
 C14' 1 ..... ..... ..... -21
 AFIX 23
 H14C 2 ..... ..... ..... -21
 H14D 2 ..... ..... ..... -21
 AFIX 0
 PART 0

   It will be seen that six hydrogens belong to one conformation, six to the
other, and two are common.  The generation of the idealized hydrogen positions
is based on the connectivity table but also takes the PART numbers into
account.  These procedures should be able to set up the correct hydrogen atoms
for all cases of two overlapping disordered groups.  In cases of more than two
overlapping groups the program will usually still be able to generate the
hydrogen atoms correctly by making reasonable assumptions when it finds that
an atom is 'bonded' to atoms with different PART numbers, but it is possible
that there are examples of very complex disorder which can only be handled by
using dummy atoms constrained (EXYZ and EADP) to have the same positional and
displacement parameters as atoms with different PART numbers (in practice it
may be easier - and quite adequate - to ignore hydrogens except on the two
components with the highest occupancies!).

   When the site symmetry is high, it may be simpler to apply similarity
restraints using SADI or DFIX rather than SAME.  For example the following
three instruction sets would all restrain a perchlorate ion (CL,O1,O2,O3,O4)
to be a regular tetrahedron:

 SAME CL O2 O3 O4 O1
 SADI  O1 O2  O1 O3

followed immediately by the atoms CL, O1... O4; the SAME restraint makes all
the Cl-O bonds equal but introduces only FOUR independent restraints involving
the O..O distances, which allows the tetrahedron to distort retaining only one
-4 axis, so one further restraint must be added using SADI.

or:

 SADI  CL O1  CL O2  CL O3  CL O4
 SADI  O1 O2  O1 O3  O1 O4  O2 O3  O2 O4  O3 O4

or:

 DFIX 31  CL O1  CL O2  CL O3  CL O4
 DFIX 31.6330  O1 O2  O1 O3  O1 O4  O2 O3  O2 O4  O3 O4

in the case of DFIX, one extra least-squares variable (free variable 3) is
needed, but it is the mean Cl-O bond length and refining it directly means
that its esd is also obtained directly.  If the perchlorate ion lies on a
three-fold axis through CL and O1, the SADI method would require the use
of symmetry equivalent atoms (EQIV $1 y, z, x  and  O2_$1  etc. for R3 on
rhombohedral axes) so DFIX would be simpler (same DFIX instructions as above
with distances involving O3 and O4 deleted)  [the number 1.6330 in the above
example is of course twice the sine of half the tetrahedral angle].
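
   The factor 1.6330 and the two DFIX targets can be verified with a short
Python sketch.  The Cl-O value used is purely illustrative and the function
name is hypothetical; the codes 31 and 31.6330 are read as 1 and 1.6330
times free variable 3, as described above.

```python
import math

# Tetrahedral angle: cos(theta) = -1/3, so theta is about 109.47 degrees
tetrahedral_angle = math.acos(-1.0 / 3.0)

# Ratio of the O..O distance to the Cl-O distance in a regular tetrahedron
ratio = 2.0 * math.sin(tetrahedral_angle / 2.0)


def perchlorate_targets(fv3):
    """Return the (Cl-O, O..O) target distances implied by free variable 3."""
    return 1.0 * fv3, ratio * fv3


print(round(math.degrees(tetrahedral_angle), 2), round(ratio, 4))
print(perchlorate_targets(1.44))   # illustrative Cl-O distance in Angstroms
```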


   If you wish to test whether you have understood the full implications of
these restraints, try the following problems:

(a) A C-O-H group is being refined with AFIX 87 so that the torsion angle
about the C-O bond is free.  How can we restrain it to make the 'best'
hydrogen-bond to a specific Cl- ion, so that the H..Cl distance is minimized
and the O-H..Cl angle maximized, using only one restraint instruction (it may
be assumed that the initial geometry is reasonably good) ?

(b) Restrain a C6 ring to an ideal chair conformation using one SAME and one
SADI instruction.  Hint: all 1-2, 1-3 and 1-4 distances are respectively equal
for a chair conformation, which also includes a regular planar hexagon as a
special case.  A non-planar boat conformation does not have equal 1-4
distances.  To force the ring to be non-planar, the ratio of the 1-2 and 1-3
distances would have to be restrained using DFIX and a free variable.


     MACROMOLECULES AND OTHER STRUCTURES WITH A POOR DATA/PARAMETER RATIO

   Macromolecules often contain regions of disordered solvent and do not
usually diffract to as high a resolution as small molecules.  On the other
hand they often contain repeated chemical units which we can exploit by means
of similarity restraints to improve the effective data to parameter ratio and
hence the precision of the structure.  These provide an effective way of
incorporating 'non-crystallographic symmetry' into structure refinement.  To
simplify the application of restraints etc. SHELXL-93 allows a structure to be
subdivided into residues, each of which is defined by a residue number and
(optionally) a residue class (up to 4 characters).  Different residues of the
same chemical type may be assigned to the same class and also use identical
atom names, but must have different residue numbers. Thus for example the beta
carbon atoms in all phenylalanine residues (class PHE) in a polypeptide may
all be called 'CB'.  Only one instruction would then be needed to add the
appropriate idealized hydrogens to all of them and refine them with a 'riding
model':
          HFIX_phe 23 CB

To apply 'similarity' distance restraints to all phenylalanines, all that is
required is one SAME instruction, which should be inserted before the first
atom of the residue with the best geometry (so that its connectivity array
may be used to define the 1,2- and 1,3-distances):

 RESI 23 phe
 SAME_phe N > CZ              [Note: there is of course no restriction on the
 N 3 ..... ..... .....        order of the atoms in a residue, but it must be
 ...                          the same for all residues of the same class]
 CZ 1 ..... ..... .....

It would also be sensible to apply a planarity restraint to these side chains:

 FLAT_phe CB > CZ

The code '_*' is used to refer to all residues.  For example it would be
possible to use FLAT in this way to ensure that all peptide carbonyl carbons
have planar coordination, but it is easier to do this by restraining their
chiral volumes to zero (because the three bonded atoms do not then need to be
named explicitly):

 CHIV_* C 0

assuming that these are the only atoms named 'C'; since the default chiral
volume is zero it could be left out.  In some cases it is necessary to refer
to specific residues, in which case residue numbers should be used.  For
example the following instruction calculates the torsion angle of a disulfide
bridge linking Cys_56 and Cys_124:

 CONF CB_56 SG_56 SG_124 CB_124

Protein crystallographers will have noticed that SHELXL-93 is fully compatible
with the usual protein atom naming conventions, except that all atom names
MUST begin with a letter, so the PDB convention of starting some hydrogen atom
names with a digit is not allowed; similarly residue classes must begin with a
letter and residue numbers must be pure numbers. The auxiliary program PDBINS
is provided to generate a SHELXL-93 '.ins' file from a PDB file, incorporating
restraints etc. taken from the dictionary file SHELXL.DIC.

   The general approach to the refinement of large structures with limited
reflection data is to proceed GRADUALLY, using all appropriate restraints
(and possibly rigid group constraints) in the early stages of refinement, and
relaxing them as far as possible only when the refinement has more or less
converged.  Although full-matrix refinement is normally recommended for
small-molecule refinements, it is more efficient in terms of computer
resources to use the Konnert-Hendrickson conjugate gradient approach (CGLS)
for macromolecular refinement, with judicious insertion of large full-matrix
blocks to help to resolve problem areas (e.g. solvent disorder).  A final
refinement with overlapping full-matrix blocks, possibly restricted to the
x, y and z coordinates only, would then be required to obtain the esds in e.g.
torsion angles.  For a very small protein or polynucleotide with fewer than 500
non-hydrogen atoms (excluding solvent) a single final xyz-block would suffice.
The CGLS refinement is usually very stable; erratic behavior can usually be
tracked down to one or more atoms with unreasonably large isotropic or
anisotropic displacement parameters, or to refinement of more parameters than
the data and restraints can support.

   If the second number on the L.S. or CGLS instruction is negative (-N) then
every Nth reflection is ignored in the least-squares refinement, but is used
instead for the calculation of independent R-values when the final structure
factor cycle is performed.  This enables 'R(free)' to be used to calibrate the
sigmas for the various restraints and to check on possible 'over-refinement'
(e.g. the refinement of noise peaks from a difference electron density map as
solvent atoms).  For details see A.T. Brunger, Nature 355 (1992) 472-475.
Note the use of the DEFS instruction to change the default sigmas globally!
A particularly effective application of R(free) is the decision as to whether
the data justify (restrained) anisotropic refinement rather than isotropic.
After the structure has more or less reached convergence after isotropic
refinement in the usual way, two jobs are run with (for example) 'CGLS 20 -10'
so that every 10th reflection is ignored in the refinement but is used instead
for calculating R(free).  One of the jobs should also contain ANIS (before the
first atom), DELU and SIMU (without atom names), and ISOR (for the solvent
water, e.g. 'ISOR O1 > LAST').  Only if R(free) is significantly lower for the
ANIS job is further anisotropic refinement justified.  This is more likely to
be the case if the data have been collected to higher resolution (i.e. the
data to parameter ratio is higher), but the quality of the data is also
important.  In general the effective resolution should be better than (very
roughly) 1.5 Angstroms for proteins and polynucleotides before anisotropic
refinement is justified.  It is sensible to apply this R(free) test and - if
justified - initiate anisotropic refinement BEFORE attempting to resolve
discrete side-chain disorder unless the components of the disorder are well
separated spatially, because anisotropic motion can be regarded as an
alternative to isotropic motion with discrete disorder for small separations.
On the other hand it is a good idea to try to locate as many solvent atoms as
possible before applying the test (see below).
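
   The effect of a negative second number on L.S. or CGLS can be illustrated
with a short Python sketch.  It is not taken from the program: the function
names are hypothetical, the reflections are simply counted from one in the
order supplied (the counting actually used by SHELXL-93 is not specified
here), and the conventional unweighted R-factor on F is assumed for the
'independent R-values'.

```python
def split_free(reflections, n):
    """Split a reflection list into (work, free) sets.

    Every n-th reflection (counting from 1; an assumption) goes to the
    free set and is left out of the refinement.
    """
    if n < 1:
        raise ValueError("n must be a positive integer")
    work, free = [], []
    for i, refl in enumerate(reflections, start=1):
        (free if i % n == 0 else work).append(refl)
    return work, free


def r_factor(reflections):
    """Unweighted R = sum(||Fo| - |Fc||) / sum(|Fo|) for (Fo, Fc) pairs."""
    num = sum(abs(abs(fo) - abs(fc)) for fo, fc in reflections)
    den = sum(abs(fo) for fo, _ in reflections)
    return num / den


# Hypothetical (Fo, Fc) pairs, for illustration only.
data = [(10.0, 9.0), (20.0, 21.0), (5.0, 5.5), (8.0, 8.0), (12.0, 10.0),
        (7.0, 7.7), (30.0, 29.0), (15.0, 15.0), (9.0, 8.1), (4.0, 5.0)]
work, free = split_free(data, 5)      # analogous to 'CGLS 20 -5'
r_work, r_free = r_factor(work), r_factor(free)
```

For the isotropic versus anisotropic test described above, the two jobs would
be compared on r_free only, since r_work can always be lowered by adding
parameters.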

   The similarity restraints on the geometry are unbiased in the sense that
no arbitrary numbers in the form of standard bond lengths and angles are
required.  Thus it should never be necessary to repeat a refinement because
more precise values of these quantities are available.  If R(free) is used to
establish optimal esd's for the restraints, the weights may also be regarded
as objective.  The only assumption being made is that chemically equivalent
bond lengths and angles (i.e. 1,3-distances) are equal in a statistical sense.
Similarly the planarity restraints and the restraints on isotropic and
anisotropic displacement parameters do not require the use of preconceived
(and possibly erroneous) numbers (except zero!).  This approach should be used
whenever the type of problem (e.g. the extent of the non-crystallographic
similarity) and the extent of the data permit.

   The geometrical similarity approach works very well for 'small-molecule'
structures which have become large because there are several chemically
identical molecules in the crystallographic asymmetric unit, and well for
polynucleotides which may also contain several examples of each repeating
unit (especially when divided up into base, furanose and phosphate units).
A further advantage of the similarity approach for polynucleotides is that the
state of protonation of the bases may be uncertain, making it difficult to
know which standard bond lengths etc. to use as target values or in fitting
rigid groups; it is safer to assume that the equivalent bases have the same
(partial) protonation states, i.e. the 1,2- and 1,3-distances are 'similar'
but unknown.

   On the other hand in proteins some amino-acids may be present many more
times (and so will be better refined) than others, and geometric similarity
does not help for an amino-acid which is only present once.  Thus the
recommended approach for proteins and large polypeptides is to use DFIX
instructions to restrain 1,2- and 1,3-distances to standard values, with
SAME/SADI (and small sigmas) to restrain the components of disordered
residues to be similar.  FLAT restraints are useful for aromatic residues and
(with larger sigmas) for the five atoms involved in each main-chain peptide
linkage.  It is also very convenient to impose planarity on carbonyl and
carboxyl carbons using CHIV (with a chiral volume of zero).  All these
restraints are set up automatically when the program PDBINS (Appendix B) is
used to convert a PDB file for a protein into SHELXL-93 '.ins' format; the
restraints are taken from the dictionary file SHELXL.DIC which users are
encouraged to extend and adapt to local circumstances.  Alternatively a text
editor may be used to incorporate the appropriate parts of SHELXL.DIC into
the .ins file.

   Standard (restraint) bond lengths based on the CSD have been tabulated by
F.H.Allen, O. Kennard, D.G. Watson, L. Brammer, A.G. Orpen and R. Taylor in
Sections 9.5 and 9.6 of Volume C of International Tables for Crystallography
(1992), Ed. A.J.C. Wilson, Kluwer Academic Publishers, Dordrecht, pp. 685-791.
Suitable parameters for proteins have been given by R.A. Engh and R. Huber,
Acta Cryst., A47 (1991) 392-400.  For nucleic acids the necessary parameters
may be taken from R. Taylor and O. Kennard, J. Mol. Struct., 78 (1982) 1-28
(bases and phosphates) and S. Arnott and D.W.L. Hukins, Biochem. J., 130
(1972) 453-465 (furanose rings).  Taylor and Kennard found no evidence that
the bases are non-planar, so FLAT can safely be used.  With poor resolution
data it might be better to fit the bases to the orthogonal coordinates given
by R. Taylor and O. Kennard, J. Am. Chem. Soc., 104 (1982) 3209-3212, and then
refine them as rigid groups (FRAG...FEND - possibly in an 'include file' -
followed by AFIX 176 etc.).

   It appears that the optimal restraint esds are very nearly independent of
the type of structure and the resolution of the data, so normally the default
values may be used.  These have been established by R(free) and other tests on
a variety of structures. The default values may if necessary be reset globally
by a DEFS instruction before the individual restraints.  The default esds are:
all SAME and SADI distances, and DFIX with positive d: 0.03 A (first DEFS
parameter);  FLAT and CHIV: 0.2 A^3 (second DEFS parameter);  DELU: 0.01 A^2
(third DEFS parameter), SIMU: 0.05 (fourth DEFS parameter) if neither atom
terminal, otherwise 0.1 (or twice the fourth DEFS parameter);  ISOR: 0.1 if
atom not bonded to exactly one other atom, otherwise 0.2;  DFIX -d (anti-
bumping restraints) 0.1 A.  The ISOR and DFIX -d defaults are not set by DEFS.

   Although the above default restraint esds give good results for small
molecules and proteins which diffract to 1.2 Angstroms or better, there may
be discrepancies involving the rigid bond restraints, indicating that the
harmonic model is not such a good approximation, i.e. an ensemble (molecular
dynamics) approach may be a better description.  In such cases DELU and SIMU
can be relaxed to about 0.03 and 0.10 respectively for anisotropic refinement,
and this model may well give the lowest value for the free R-factor. Some care
is needed, because if the restraints are relaxed too far the refinement may
become unstable.

   The refinement may also become unstable (e.g. oscillate rather than
converge) if one or more solvent atoms have unreasonably high displacement
parameters, in which case they can be deleted.  Otherwise either 'DAMP 100'
(with L.S.) or 'SLIM .3 .1' (with CGLS) should be tried to damp the refinement
(which will then require more cycles for convergence).

   A further facility primarily intended for macromolecules but also useful
for smaller structures is the production of tables using RTAB.  When used in
conjunction with residues, RTAB provides a convenient way of tabulating
standard torsion angles, chiral volumes, and distances and angles involved in
(for example) hydrogen bonds.  Examples of the latter involving symmetry
generated atoms were included in the second test structure (sigi) discussed
above.  The following instructions would produce sorted tables of the standard
protein torsion angles and chiral volumes for the alpha-carbon atoms, assuming
that the residues are numbered consecutively (CA_- means the atom CA with the
residue number decreased by one):

 RTAB_* Omeg CA_- C_- N CA
 RTAB_* Phi C_- N CA C
 RTAB_* Psi N CA C N_+
 RTAB_* Chi N CA CB CG
 RTAB_* Cvol CA

If RTAB_* is not appropriate for a particular residue, e.g. some torsion
angles involving the terminal residues, or chi and chiral volume for glycine,
the residues in question are simply left out of the tables.  The _+ and _-
notation may also be used for cyclic peptides by assigning an 'alias' to the
first and last residues; for example the residues in a cyclic pentapeptide
could be numbered 2 to 6 inclusive, with alias 7 assigned to residue 2 and
alias 1 to residue 6, so that all the torsion angles would be tabulated using
the above RTAB instructions.
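
   The way in which _+ and _- are resolved, including the alias device for
cyclic peptides, can be sketched in Python as follows.  This is only an
illustration of the bookkeeping described above; the function name and data
structures are hypothetical and do not correspond to anything in the program.

```python
def neighbour(resnum, step, present, alias=None):
    """Return the residue number addressed by _+ (step=+1) or _- (step=-1).

    `present` is the set of residue numbers in the structure; `alias` maps
    an alias number to the real residue it stands for.  Returns None when
    no such residue exists, in which case the table entry is left out.
    """
    target = resnum + step
    if alias and target in alias:
        target = alias[target]
    return target if target in present else None


# Cyclic pentapeptide from the text: residues 2..6, alias 7 -> 2, 1 -> 6.
residues = set(range(2, 7))
alias = {7: 2, 1: 6}
next_res = {r: neighbour(r, +1, residues, alias) for r in sorted(residues)}
prev_res = {r: neighbour(r, -1, residues, alias) for r in sorted(residues)}
```

Without the aliases, residue 6 has no _+ partner and residue 2 has no _-
partner, so the corresponding torsion angles would be missing from the tables.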

   The SWAT option introduces one variable and one fixed parameter which
enable diffuse solvent to be modeled by Babinet's principle (R. Langridge,
D.A. Marvin, W.E. Seeds, H.R. Wilson, C.W. Hooper, M.H.F. Wilkins and L.D.
Hamilton, J. Mol. Biol. 2 (1960) 38-64; H. Driessen, M.I.J. Haneef, G.W.
Harris, B. Howlin, G. Khan and D.S. Moss, J. Appl. Cryst. 22 (1989) 510-516).
This usually produces a significant but not dramatic improvement for the very
low order data in macromolecular refinements.

   One of the most difficult and potentially time-consuming aspects of
macromolecular structure refinement is the treatment of solvent water.  The
relatively diffuse solvent atoms contribute primarily to the lower order
reflections and so often constitute a local region in the least-squares
parameter space in which there are more parameters than data, i.e. there may
be many plausible sets of parameters which fit the data equally well.  Thus
anisotropic refinement of fully occupied atoms or isotropic refinement of a
larger number of water molecules with fractional occupation factors may well
fit the data equally well and involve about the same number of parameters in
total.  The advantage of the former approach is that chemically sensible
restraints can be applied to the distances between the waters (and between the
solvent and protein atoms).  Even when the data only permit an isotropic
refinement, it is recommended that the water be refined with full occupancies
and 'anti-bumping' restraints until no more waters can be found, and then if
necessary (e.g. when there are strong difference Fourier peaks closer than
say 2.3 Angstroms to waters with relatively high U values) partial occupancies
can be assigned.

   SHELXL-93 enables anti-bumping restraints to be input by hand (DFIX -d) but
they will usually be generated automatically by the program (by using the BUMP
instruction and flagging the (water) atoms on which it is to operate by
'CONN 0').  The anti-bumping restraints are generated between all water atoms,
and between all water and all other atoms, including all possible symmetry
equivalents and taking atom types into account (thus potential hydrogen bonds
are allowed to be shorter than O..C distances etc.).

   The following iterative procedure proves effective in practice at building
up a network of fully-occupied water molecules, with an acceptable pattern of
hydrogen-bonded distances, that is also consistent with the diffraction data.
The SWAT and BUMP instructions should be included throughout, with CONN 0 to
flag the water molecules and inhibit the generation of accidental bonds (which
can for example upset the reidealization of hydrogen atoms in each refinement
cycle).  If the waters are anisotropic 'ISOR 0.1 O1 > LAST' is advisable.
After each refinement job, water molecules with (an)isotropic displacement
parameters which are too high (e.g. all three principal components greater
than 1.2 or 1.4 A^2) should be deleted, and (FMAP 2 / PLAN 200 2.3) difference
peaks which make sensible hydrogen bonding distances to water molecules or to
other electronegative atoms added; these will not necessarily be the highest
peaks.  The final table of distances between peaks should be checked to ensure
that there are no short distances between the chosen peaks (PLAN 200 2.3 does
this automatically).  The list of 'disagreeable restraints' after the final
refinement cycle in each job should also be checked for short contacts and if
necessary one of the offending waters removed.  At lower resolution it would
be necessary to use a graphical display of the Fo-Fc or 2Fo-Fc electron
density to locate the new trial water molecules.  This procedure converges
after a few jobs when no further water molecules can be eliminated or added.
At this point the remaining difference electron density peaks should be
inspected carefully to see if it is necessary to add partially occupied
discrete solvent atoms in the vicinity of disordered side-chains (if any).  An
advantage of the full occupancy / antibumping approach is that it prevents
water molecules from diffusing into protein regions and thus facilitates
remodeling of disordered side-chains etc. In summary, for modeling the solvent
the following instructions would be typical:

CGLS 10
SWAT 2 2            ! will be updated by the program in the .res file
BUMP                ! automatic antibumping restraints generated
CONN 0 O1 > LAST    ! flag water for antibumping and exclude from connectivity
ISOR 0.1 O1 > LAST  ! for anisotropic waters (ignored for isotropic atoms)
FMAP 2              ! Fo-Fc map
PLAN 200 2.3    ! difference peaks only written to .res for potential waters

and after each job waters would be deleted on editing .res to .ins if bad
contacts remain (see final restraints summary) or if U or Ueq have risen to
too high a value; selected (or perhaps all) potential waters in the peak-list
are then renamed and moved to before the HKLF instruction. It is also possible
to monitor progress using the free R factor (CGLS 10 -10). Even if anisotropic
refinement is planned, it is a good idea (and it usually makes the eventual
R-free test for the anisotropic refinement more favorable) to optimize the
water structure in this way first. If this extension of the water is continued
after going anisotropic, then an ANIS instruction is needed before the first
new water (oxygen) atom.
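
   The editing step between jobs (deleting poor waters and promoting suitable
difference peaks) could be automated along the following lines.  This Python
sketch rests on several assumptions that are not part of SHELXL-93: the
coordinates are Cartesian in Angstroms, symmetry equivalents are ignored,
only isotropic U values are tested, and the upper distance limit d_max = 3.2
is an arbitrary illustrative value.  The limits 1.2 A^2 and 2.3 A are taken
from the text above.

```python
import math


def prune_waters(waters, u_max=1.2):
    """Keep waters whose isotropic U (A^2) does not exceed u_max."""
    return [w for w in waters if w["u"] <= u_max]


def pick_peaks(peaks, acceptors, d_min=2.3, d_max=3.2):
    """Select difference peaks as trial waters.

    A peak is accepted if its nearest neighbour among `acceptors` (waters
    and other electronegative atoms) and the peaks already accepted lies
    between d_min and d_max Angstroms away.
    """
    chosen = []
    for p in peaks:                       # peaks assumed sorted by height
        others = list(acceptors) + chosen
        if not others:
            continue
        nearest = min(math.dist(p, q) for q in others)
        if d_min <= nearest <= d_max:
            chosen.append(p)
    return chosen


# Hypothetical data: one good water, one with too high a U value.
waters = [{"name": "O1", "xyz": (0.0, 0.0, 0.0), "u": 0.4},
          {"name": "O2", "xyz": (5.0, 0.0, 0.0), "u": 1.5}]
kept = prune_waters(waters)
peaks = [(2.8, 0.0, 0.0), (3.5, 0.0, 0.0), (9.0, 9.0, 9.0)]
new_waters = pick_peaks(peaks, [w["xyz"] for w in kept])
```

Contacts to carbon and other non-acceptor atoms are not tested here; in a
real job these are handled by the anti-bumping restraints and should still be
checked in the list of 'disagreeable restraints'.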

   Other useful features for macromolecules include an 'omit map' (OMIT
atomnames followed by FMAP), the SHEL instruction for ignoring high and low
resolution data, the use of 'include files' for accessing standard fragments
or restraint libraries, and provision for synchrotron data at various
wavelengths (DISP) as well as Laue data (LAUE plus HKLF 2).

   The amount of '.lst' file output produced may be reduced substantially by
putting 'MORE 0' before the first atom in the '.ins' file, but this facility
should only be used when one is sure that the '.ins' file is correct; it might
be better to edit (or write a little program to extract information from) the
full '.lst' file instead, so that diagnostic information is still available if
required. The UNIX 'more' command is useful for browsing through '.lst' files.

   In contrast to standard macromolecular refinement programs, SHELXL-93 is
able to provide reliable estimates of the standard deviations of all refined
parameters and of all derived quantities, subject of course to any assumptions
implied by the restraints employed (in keeping with the Bayesian philosophy).
For example tight geometrical 'similarity restraints' effectively determine
mean bond lengths and angles and their esd's, but leave the torsion angles
free to refine independently; thus the torsion angles - and their esds -
retain their diagnostic value.

   In summary, a typical refinement of a small protein would take the
following course.  First the auxiliary program PDBINS would be used to
convert the atom coordinates into SHELXL-93 '.ins' format and to extract the
necessary restraints from a residue dictionary file (based on 'shelxl.dic'
which is provided as a model).  This is especially convenient if XPLOR has
been used for the structure solution by molecular replacement and/or the
initial refinement.  Some editing of the '.ins' file may be needed if disorder
or non-standard residues are present.  Different components of disordered
groups should be assigned different 'PART' numbers, and the occupation factors
of two components may be refined as p and (1-p) by the use of a free variable
(e.g. by setting them to 21 and -21, in which case a starting value for free
variable number 2 should be given as the second parameter on the 'FVAR'
instruction).
The first SHELXL-93 runs serve to build up a consistent network of fully
occupied solvent molecules as explained above.  At this point the hydrogen
atoms are inserted by removing 'REM ' which precedes the HFIX instructions
from PDBINS and the dictionary file.  Attachment of hydrogens to more than one
component of each disordered group is best performed in a subsequent job by
inserting the appropriate AFIX instructions.  If the resolution is very good
(ca. 1.5 A or better) the R(free) test should now be performed to see whether
anisotropic refinement is justified (i.e. two 'CGLS 20 -10' jobs should be
run, differing only in that one contains an 'ANIS' instruction).  It is a
mistake to model discrete disorder (unless the components are very clearly
separated), or to include partially occupied solvent, until this test is
applied, because anisotropic refinement may well provide an alternative way
of modeling these effects.  Subsequent anisotropic refinement (if justified)
may be combined with improvement of the solvent model and possible modeling of
discrete disorder; very often the better phase estimates resulting from the
restrained anisotropic refinement give a much clearer difference electron
density.  Towards the end of this procedure partially occupied solvent may be
introduced; where possible the occupation factors should be coupled (using
free variables) to those of neighboring disordered side-chains, or an atom
may be split into two components with occupancies fixed at 0.5 (i.e. set as
10.5), either as recommended by the program (see the list of principal
displacement components) or as deduced from an Fo-Fc Fourier.  This maintains
the anti-bumping restraints with other solvent and side-chain atoms, but not
between disordered components for which the occupancies add up to less than
1.1 (slightly greater than one to allow for hydrogen atom contributions etc.).
At various stages in the refinement one of the LIST options can be used to
write a phased reflection list to the .fcf file for input into another
program that generates macromolecular FFT maps for a graphics system.  When the
refinement has converged, it may be desired to run an xyz-only refinement with
overlapping blocks (L.S./BLOC) to obtain esds on the torsion angles and
hydrogen bonding distances (the antibumping list may be used to set up tables
using RTAB and EQIV - see the sigi test example).  Torsion angles and
hydrogen-bonding distances are not usually restrained in the refinement, and
so their esds have some meaning.  Finally 'ACTA 2' and/or WPDB may be used to
archive the results.
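
   The occupancy codes mentioned in this summary (21 and -21 for p and 1-p
tied to free variable 2, and 10.5 for an occupancy fixed at 0.5) may be
decoded as in the following Python sketch.  The function is hypothetical and
only the three cases quoted above are taken from this section; the general
rule code = +/-(10*m + p) with |p| < 5, and the treatment of m = 0 as a
freely refined value, are assumptions based on the usual SHELX free-variable
convention.

```python
def occupancy(code, fvar):
    """Decode a site occupation factor as written in the '.ins' file.

    fvar holds the FVAR values, so free variable m is fvar[m - 1].
    Returns (value, refined); refined is False for a fixed occupancy.
    """
    m = int((abs(code) + 5) // 10)        # code = +/-(10*m + p), |p| < 5
    p = abs(code) - 10 * m
    if code < 0 and m < 2:
        raise ValueError("negative codes need a free variable number >= 2")
    if m == 0:
        return code, True                 # plain value, refined freely
    if m == 1:
        return p, False                   # e.g. 10.5 -> fixed at 0.5
    fv = fvar[m - 1]
    value = p * fv if code > 0 else p * (1.0 - fv)
    return value, True


fvar = [1.0, 0.7]                         # osf, then free variable 2
major = occupancy(21, fvar)               # p
minor = occupancy(-21, fvar)              # 1 - p
fixed = occupancy(10.5, fvar)             # fixed at one half
```

The two disorder components therefore always sum to unity however free
variable 2 refines, which is the property relied upon above.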


                            ABSOLUTE STRUCTURE

Even if determination of absolute configuration is not one of the aims of the
structure determination, it is important to refine ANY non-centrosymmetric
structure as the correct 'absolute structure' in order to avoid introducing
systematic errors into the bond lengths etc.  In some cases the absolute
structure will be known with certainty (e.g. proteins), but in others it has
to be deduced from the X-ray data.  Generally speaking, a single phosphorus or
heavier atom suffices to determine an absolute structure using Cu-K(alpha)
radiation, and with accurate high-resolution low-temperature data including
Friedel opposites such an atom may even suffice for Mo-K(alpha).

   In the course of the final structure factor calculation the program
calculates the Flack absolute structure parameter x and its esd (it is a bonus
of the refinement against F^2 that this calculation is a 'hole in one' and
doesn't require expensive iteration).  A comparison of x with its esd provides
an indication as to whether the refined absolute structure is correct or
whether it has to be 'inverted' (the program prints a suitable warning should
this be necessary).  This attempt to refine x 'on the cheap' is reliable when
the true value of x is close to zero, but may produce a (possibly severe)
underestimate of x for structures which have to be inverted, because x is
correlated with positional and other parameters which have not been allowed to
vary.  Effectively these parameters have adapted themselves to compensate for
the wrong (zero) value of x in the course of the refinement, and need to be
refined with x to eliminate the effects of correlation.  These effects will
tend to be greater when the correlation terms are greater, e.g. for pseudo-
symmetric structures and for poor data to parameter ratios (say less than 8:1).
x can be refined at the same time as all the other parameters using the TWIN
and BASF instructions; this implies racemic twinning and so is discussed
under TWIN below (see also H.D. Flack, Acta Cryst., A39 (1983) 876-881).

   For most space groups 'inversion' of the structure simply involves
inserting an instruction  'MOVE 1 1 1 -1'  before the first atom.  Where the
space group is one of the 11 enantiomorphous pairs [e.g. P3(1) and P3(2)] the
translation parts of the symmetry operators need to be inverted as well to
generate the other member of the pair.  There are seven cases for which, if
the standard setting of the International Tables for Crystallography has been
used, inversion in the origin does NOT lead to the inverted absolute structure
(in fact, in some cases it leads to a totally different structure: H.D. Flack,
personal communication, 1992)!  This problem was drawn to the author's
attention by D. Rogers in about 1980, but was probably first discussed in
print by E. Parthe and L.M. Gelato, Acta Cryst., A40 (1984) 169-183 and by G.
Bernardinelli and H.D. Flack, Acta Cryst., A41 (1985) 500-511.  The offending
space groups and corresponding correct MOVE instructions are:

 Fdd2      MOVE .25 .25 1 -1           I4(1)cd   MOVE 1 .5 1 -1
 I4(1)     MOVE 1 .5 1 -1              I-42d     MOVE 1 .5 .25 -1
 I4(1)22   MOVE 1 .5 .25 -1            F4(1)32   MOVE .25 .25 .25 -1
 I4(1)md   MOVE 1 .5 1 -1
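
   The effect of these MOVE instructions on the fractional coordinates can be
sketched in Python.  The sketch assumes that 'MOVE dx dy dz sign' replaces x
by dx + sign * x (and similarly for y and z); this definition is not given in
the present section and should be checked against the description of MOVE.
The function name is hypothetical, and the change of symmetry operators
needed for the enantiomorphous pairs is not handled.

```python
# MOVE parameters (dx, dy, dz, sign) for the seven space groups listed above;
# every other space group uses the plain inversion 'MOVE 1 1 1 -1'.
SPECIAL_MOVE = {
    "Fdd2":    (0.25, 0.25, 1.0, -1),
    "I4(1)":   (1.0, 0.5, 1.0, -1),
    "I4(1)22": (1.0, 0.5, 0.25, -1),
    "I4(1)md": (1.0, 0.5, 1.0, -1),
    "I4(1)cd": (1.0, 0.5, 1.0, -1),
    "I-42d":   (1.0, 0.5, 0.25, -1),
    "F4(1)32": (0.25, 0.25, 0.25, -1),
}


def invert(xyz, space_group):
    """Apply x' = dx + sign * x (likewise y, z) to fractional coordinates."""
    dx, dy, dz, sign = SPECIAL_MOVE.get(space_group, (1.0, 1.0, 1.0, -1))
    x, y, z = xyz
    return (dx + sign * x, dy + sign * y, dz + sign * z)


atom = (0.1, 0.2, 0.3)
general = invert(atom, "P2(1)")           # ordinary case, MOVE 1 1 1 -1
special = invert(atom, "I-42d")           # MOVE 1 .5 .25 -1
```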


            TWINNED CRYSTALS AND REFINEMENT AGAINST POWDER DATA

   Twinned crystals are refined in SHELXL-93 by the method of Pratt, Coyle
and Ibers, J. Chem. Soc. 1971, 2146-2151 (see also Jameson, Acta Cryst., A38
(1982) 817-820).  The sum of the Fc^2 values of the individual twin domains,
each multiplied by its fractional contribution, is fitted to the observed
Fo^2.  Since the n fractional contributions must sum to unity, only n-1 of
them can be refined independently; the fraction of component 1 is set equal to
one minus the sum of the other fractional contributions.

   Refinement of twinned crystals and refinement against F^2-values derived
from powder data are similar in that several reflections with different
indices may contribute to a single F^2 observation.  For powder data this
requires some small adjustments to the format of the '.hkl' file; the batch
number becomes the multiplicity m, and where several reflections contribute to
the same observation the multiplicity is made positive for the last reflection
in the group and negative for the rest.  A similar approach is possible for
twinned crystals, except that the batch number is replaced by the twin
component number, and the batch scale factors (BASF) may be refined to
determine the fractional contributions of components 2, 3, ...;  k1, the
fraction of component 1, is then given by  ( 1 - k2 - k3 - ... ).  In simple
cases, i.e. when the lattices of all components are always coincident, the
normal format can be retained in the '.hkl' file, and the index transformation
specified with a TWIN instruction.  Although SHELXL-93 may be useful for some
high symmetry and hence reasonably well resolved powder and fibre diffraction
patterns - the various restraints and constraints should be exploited in full
to make up for the poor data/parameter ratio - for normal powder data a
Rietveld refinement program would be much more appropriate.

   For both powder (HKLF 6) and twinned data (HKLF 5 or TWIN with HKLF 4), the
reflection data are reduced to the 'prime' component, by multiplying Fo^2 and
Fc^2 by the ratio  Fc^2(prime) / Fc^2(total), before performing the analysis
of variance and the Fourier calculations.  Similarly  'OMIT h k l'  refers to
the indices of the prime component.  The prime component is the one for which
the indices have not been transformed by the TWIN instruction (i.e. m = 1 ),
or in the case of HKLF 5 or HKLF 6 the component given with positive m (i.e.
the last contributor to a given intensity measurement, not necessarily with
|m| = 1).

   For powder data the least-squares refinement fits the overall scale factor
(osf^2 where osf is given on the FVAR instruction) times the multiplicity
weighted sum of calculated intensities to Fo^2:

 (Fc^2)* = osf^2 [ m(1) * Fc(1)^2 + m(2) * Fc(2)^2 + m(3) * Fc(3)^2 + ... ]

where the multiplicities of the contributors are given in the place of the
batch numbers in the '.hkl' file.  Since it is then not possible to define
batch numbers as well, 'BASF' cannot be used with powder data.

   For twinned data (TWIN or HKLF 5) the expression becomes:

 (Fc^2)* = osf^2 [ k(1) * Fc(1)^2 + k(2) * Fc(2)^2 + k(3) * Fc(3)^2 + ... ]

where the starting values for the k(2), k(3), ... are given on the BASF
instruction, and k(1) is defined such that  Sigma[k(m)] = 1.  If no BASF
instruction is used, all the k(m) are made equal.  m is the component number
given in the place of the batch number in the '.hkl' file; if TWIN is used to
generate the components, m is 1 for the initial indices, 2 after applying the
TWIN matrix once, 3 after applying it twice, etc.  The parameter ncomp must be
given on the TWIN instruction if the matrix is to be applied more than once.
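
   As a worked illustration of the expression for twinned data, the following
Python sketch derives the k(m) from the BASF values and forms (Fc^2)*.  The
function names and the numerical values are hypothetical; only the formula
and the rules for k(1) and for a missing BASF instruction are taken from the
text above.

```python
def twin_fractions(ncomp, basf=None):
    """Return [k(1), ..., k(ncomp)] with the fractions summing to one.

    basf holds k(2), k(3), ... as given on BASF; if it is omitted all
    components are made equal.
    """
    if not basf:
        return [1.0 / ncomp] * ncomp
    if len(basf) != ncomp - 1:
        raise ValueError("expected ncomp - 1 BASF values")
    return [1.0 - sum(basf)] + list(basf)


def fc2_total(fc2, osf, k):
    """(Fc^2)* = osf^2 * sum over components of k(m) * Fc(m)^2."""
    if len(fc2) != len(k):
        raise ValueError("one Fc^2 value is needed per component")
    return osf ** 2 * sum(km * f for km, f in zip(k, fc2))


k = twin_fractions(3, [0.2, 0.3])         # as for 'BASF 0.2 0.3'
total = fc2_total([100.0, 50.0, 20.0], 2.0, k)
equal = twin_fractions(2)                 # no BASF instruction
```

With the multiplicities m(i) in place of the k(i), the same sum gives the
expression quoted for powder data.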

   The following cases are relatively common:

(a) The lower symmetry trigonal, tetragonal, hexagonal or cubic Laue groups
may be twinned so that they look (more) like the corresponding higher symmetry
Laue groups (assuming c unique except for cubic):

 TWIN  0 1 0  1 0 0  0 0 -1

plus one BASF parameter if the twin components are not equal in scattering
power.

(b) Orthorhombic with a and b approximately equal may emulate tetragonal:

 TWIN  0 1 0  1 0 0  0 0 -1

plus one BASF parameter for unequal components.

(c) Monoclinic with beta approximately 90 degrees may emulate orthorhombic:

 TWIN  1 0 0  0 -1 0  0 0 -1

plus one BASF parameter for unequal components.

(d) Monoclinic with a and c approximately equal and beta approximately 120
degrees may emulate hexagonal [P2(1)/c would give absences and possibly also
intensity statistics corresponding to P6(3)].  There are three components, so
ncomp must be specified and the matrix is applied once to generate the indices
of the second component and twice for the third component.  In German this is
called a 'Drilling' as opposed to a 'Zwilling' (with two components):

 TWIN  0 0 1  0 1 0  -1 0 -1  3

plus TWO BASF parameters for unequal components.  If the data were collected
using a hexagonal cell, then an HKLF matrix would also be required to
transform them to a setting with b unique:

 HKLF  4  1  1 0 0  0 0 1  0 -1 0

(e) Refinement of racemic twinning may be performed by adding the following
two instructions to the '.ins' file (and retaining HKLF 4):

 TWIN
 BASF 0.5

since the default TWIN matrix inverts the indices.  In this example, the BASF
coefficient is the Flack absolute structure parameter x (H.D. Flack, Acta
Cryst., A39 (1983) 876-881; G. Bernardinelli and H.D. Flack, Acta Cryst., A41
(1985) 500-511).  Refinement of racemic twinning should normally only be
attempted after all non-hydrogen atoms have been located AND the program
suggests that it would be advisable.  If racemic twinning is refined in this
way, the automatic calculation of the Flack x parameter in the final structure
factor cycle is suppressed, since the BASF parameter is x.

   If general and racemic twinning are to be refined simultaneously, ncomp
should be doubled and given a negative sign, and there should be |ncomp|-1
BASF twin component factors (or none, in the unlikely event that all are to be
fixed as equal).  The inverted components follow those generated using the
TWIN matrix, in the same order.  In such a case a single Flack x parameter is
no longer appropriate; the program will still estimate a value, which should
be zero since the effect has already been taken into account, but its esd
gives a guide to the reliability of the racemic refinement.
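
   The generation of the component indices by repeated application of the
TWIN matrix, and the ordering of the inverted components for negative ncomp,
can be sketched in Python as follows.  The function is hypothetical, and it
is assumed that the nine TWIN numbers are read row by row so that
h' = r11*h + r12*k + r13*l etc.; this convention is not stated in the present
section.

```python
def twin_indices(hkl, matrix=(-1, 0, 0, 0, -1, 0, 0, 0, -1), ncomp=2):
    """List the indices of every twin component for one reflection.

    matrix is the nine TWIN numbers taken row by row (an assumption).
    A negative ncomp adds the inverted components after those generated
    with the matrix, in the same order.
    """
    n_matrix = abs(ncomp) // 2 if ncomp < 0 else ncomp
    r = [matrix[0:3], matrix[3:6], matrix[6:9]]
    comps = [tuple(hkl)]
    for _ in range(n_matrix - 1):
        h = comps[-1]
        comps.append(tuple(sum(a * b for a, b in zip(row, h)) for row in r))
    if ncomp < 0:
        comps += [tuple(-i for i in c) for c in comps]
    return comps


racemic = twin_indices((1, 2, 3))                     # default TWIN
drilling = twin_indices((1, 2, 3),
                        (0, 0, 1, 0, 1, 0, -1, 0, -1), 3)   # case (d)
both = twin_indices((1, 2, 3),
                    (0, 1, 0, 1, 0, 0, 0, 0, -1), -4)       # general + racemic
```

For case (d) a third application of the matrix returns the original indices,
as expected for a threefold twin operation.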

   The HKLF 5 and 6 instructions force MERG 0, i.e. neither a transformation
of reflection indices into a standard form nor a sort-merge is performed.  If
twinning is specified using the TWIN instruction, any MERG instruction may be
used and the default remains MERG 2.  Although this is always safe for racemic
twinning, there may be other forms of twinning for which it is not permissible
to sort-merge first.  Whether or not MERG is used, the program ignores all
systematically absent contributions, with the result that a reflection is
excluded from the data if it is systematically absent for all components.

   Twinning usually arises for good structural reasons.  When the heavy atom
positions correspond to a higher symmetry space group it may be difficult or
impossible to distinguish between twinning and disorder (of the light atoms);
see W. Hoenle and H.G. von Schnering, Z. Krist., 184 (1988) 301-305.  Since
refinement as a twin usually requires only two extra instructions and one
extra parameter, in such cases it should be attempted first, before investing
many hours in a detailed interpretation of the 'disorder'!



             THE '.ins' INSTRUCTION FILE - DETAILED SPECIFICATION

The rest of this documentation should be regarded as a reference manual rather
than light reading!

   Defaults are given in square brackets in this documentation; '#' indicates
that the program will generate a suitable default value based on the rest of
the available information.  Continuation lines are flagged by '=' at the end
of a line, the instruction being continued on the next line which must start
with four spaces.  Other lines beginning with four spaces are treated as
comments, so blank lines may be added to improve readability.  All characters
following '!' or '=' in an instruction line are ignored, except after TITL,
SYMM or EQIV (for which continuation lines are not allowed).

   The '.ins' file may include an instruction of the form:  +filename
(the '+' character MUST be in column 1).  This causes further input to be
taken from the named file until an 'END' instruction is encountered in that
file, whereupon the file is closed and instructions are taken from the next
line of the '.ins' file.  The input instructions from such an 'include' file
are not echoed to the '.lst' and '.res' files, and may NOT contain FVAR, BASF,
EXTI or SWAT instructions or atoms (except inside a FRAG...FEND section)
since this would prevent the '.res' file from being used unchanged for the
next refinement job (after renaming as '.ins').

   The '+filename' facility enables standard fragment coordinates or long
lists of restraints etc. to be read from the same files for each refinement
job, and for different structures to access the same fragment or restraint
files.  One could also for example store the LATT and SYMM instructions for
different space groups, or neutron scattering factors for particular elements,
or LAUE instructions followed by wavelength-dependent scattering factors, in
suitably named files.  Since these 'include' files are not echoed, it is a
good idea to test them as part of an '.ins' file first, to check for possible
syntax errors.  Such 'include' files may be nested; the maximum allowed depth
depends upon the operating system and compiler used.  Note that on some (e.g.
IBM mainframe) computers, 'filename' is a dummy name (DDNAME) which must be
defined in the JCL or REXX macro used to submit the job.
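
   The include mechanism described above can be sketched in Python as follows.
This is only an illustration under stated assumptions, not the code used by
the program: the name expand_includes, the read_file callback and the
max_depth limit are hypothetical, and the checks for instructions that are
not allowed in 'include' files (FVAR, BASF, EXTI, SWAT, atoms) are left out.

```python
def expand_includes(lines, read_file, depth=0, max_depth=8):
    """Return the instruction lines with every '+filename' line expanded.

    read_file is a hypothetical callback that maps a file name to a list
    of lines; max_depth stands in for the system-dependent nesting limit.
    """
    expanded = []
    for line in lines:
        if line.startswith("+"):                  # '+' MUST be in column 1
            if depth >= max_depth:
                raise RecursionError("include files nested too deeply")
            name = line[1:].strip()
            expanded.extend(
                expand_includes(read_file(name), read_file, depth + 1, max_depth))
        elif depth > 0 and line[:4].upper().rstrip() == "END":
            break                                 # END closes an include file
        else:
            expanded.append(line)
    return expanded
```

Lines that follow END in an 'include' file are never read, and reading resumes
with the line after '+filename' in the calling file.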


TITL [   ]

Title of up to 76 characters, to appear at suitable places in the output.
The characters '!' and '=', if present, are part of the title rather than
having a special significance.


CELL  lambda  a b c alpha beta gamma

Wavelength and unit-cell dimensions in Angstroms and degrees.


ZERR  Z  esd(a) esd(b) esd(c) esd(alpha) esd(beta) esd(gamma)

Z value (number of formula units per cell) followed by the estimated standard
deviations in the unit-cell dimensions.


LATT N [1]

Lattice type: 1=P, 2=I, 3=rhombohedral obverse on hexagonal axes, 4=F, 5=A,
6=B, 7=C.  N must be made negative if the structure is non-centrosymmetric.


SYMM symmetry operation

Symmetry operators, i.e. coordinates of the general positions as given in
International Tables.  The operator x, y, z is always assumed, so MUST NOT be
input.  If the structure is centrosymmetric, the origin MUST lie on a center
of symmetry.  Lattice centering should be indicated by LATT, not SYMM.  The
symmetry operators may be specified using decimal or fractional numbers, e.g.
0.5-x, 0.5+y, -z  or  Y-X, -X, Z+1/6; the three components are separated by
commas.
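
   As an illustration of this notation, the following Python sketch converts
an operator string into a rotation matrix and a translation vector.  The
function name parse_symm is hypothetical, and the sketch assumes that every
term is either a bare x, y or z or a number (decimal or fraction); it is not
the parser used by the program.

```python
from fractions import Fraction
import re


def parse_symm(op):
    """Parse e.g. 'Y-X, -X, Z+1/6' into (rotation rows, translations)."""
    rows, trans = [], []
    for comp in op.replace(" ", "").upper().split(","):
        row, shift = [0, 0, 0], Fraction(0)
        # Split the component into signed terms: '0.5-X' -> '0.5', '-X'
        for term in re.findall(r"[+-]?[^+-]+", comp):
            sign = -1 if term.startswith("-") else 1
            body = term.lstrip("+-")
            if body in ("X", "Y", "Z"):
                row["XYZ".index(body)] += sign
            else:
                shift += sign * Fraction(body)   # accepts '1/6' and '0.5'
        rows.append(row)
        trans.append(shift)
    if len(rows) != 3:
        raise ValueError("a symmetry operator needs three components")
    return rows, trans
```

For the two operators quoted above this gives the rows (-1 0 0), (0 1 0),
(0 0 -1) with translations 1/2, 1/2, 0, and the rows (-1 1 0), (-1 0 0),
(0 0 1) with translations 0, 0, 1/6.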


SFAC elements

Element symbols which define the order of scattering factors to be employed
by the program.  The first 94 elements of the periodic system are recognized.
The element name may be preceded by '$' but this is not obligatory (the '$'
character is allowed for logical consistency but is ignored). The program uses
the neutral atom scattering factors, f' and f" and absorption coefficients
from International Tables for Crystallography, Volume C (1992), Ed. A.J.C.
Wilson, Kluwer Academic Publishers, Dordrecht: Tables 6.1.1.4 (pp. 500-502),
4.2.6.8 (pp. 219-222) and 4.2.4.2 (pp. 193-199) respectively.  The covalent
radii stored in the program are based on experience rather than taken from a
specific source, and are deliberately overestimated for elements which tend to
have variable coordination numbers so that 'bonds' are not missed, at the cost
of generating the occasional 'non-bond'.  The default radii (not those set for
individual atoms by CONN) are printed before the connectivity table.


SFAC label a1 b1 a2 b2 a3 b3 a4 b4 c f' f" mu r wt

Scattering factor in the form of an exponential series, followed by real and
imaginary dispersion terms, linear absorption coefficient, covalent radius and
atomic weight.  Except for the 'label' and atomic weight the format is the
same as that used in SHELX-76.  label consists of up to 4 characters beginning
with a letter (e.g. Ca2+) and should be included before a1; the first label
character may be a '$', but this would be ignored (note however that the '$',
if used, counts as one of the four characters).  The two SFAC formats may be
used in the same '.ins' file; the order of the SFAC instructions (and the
order of element names in the first type of SFAC instruction) define the
scattering factor numbers which are referenced by atom instructions.  The
units of mu should be barns/atom, as in Table 4.2.4.2 of International Tables,
Volume C (see above).


DISP  E  f' f" [#]  mu [#]

The DISP instruction allows the dispersion and (optionally) the absorption
coefficient of a particular element E (the name may be optionally prefaced by
'$') to be read in without having to use the full form of the SFAC
instruction.  It will typically be used for synchrotron data where the
wavelength does not correspond to the values (for Cu, Mo and Ag radiation)
for which these terms are stored in the program.  All other terms on the SFAC
instruction are independent of the wavelength, so its short form may then be
used.  DISP instructions, if present, MUST come between SFAC and UNIT.


UNIT n1 n2 ...

Number of atoms of each type in the unit-cell, in SFAC order.


LAUE E

Wavelength-dependent values of f' and f" may be defined for an element E by
means of the LAUE instruction, which is used in conjunction with the HKLF 2
reflection data format (in which the wavelength is given separately for each
reflection).  This is primarily intended for refinement of structures against
Laue data collected using synchrotron radiation, but could also be used for
refinement of a structure using data collected at different wavelengths
employing other sources.  There is no provision for handling overlapping
reflection orders.  Scaling for the source intensity distribution and Lp,
absorption corrections etc. must have been performed before using SHELXL-93.
A dummy wavelength of say 0.7 Angstrom should be given on the CELL
instruction, and the absorption coefficient estimated by the program should be
ignored.

   The element symbol may be preceded by '$' but this is optional; it must
be followed by at least one blank or the end of the line.  Any remaining
information on the LAUE instruction line is ignored.  The line immediately
following the LAUE instruction is always ignored, and so may be used for
headings.  The following lines contain values of wavelength (in Angstroms),
f' and f" in FORMAT(F7.3,2F8.3); further information (e.g. mu) may follow on
the same line but will be ignored.  The wavelength values must be in ascending
order and will be linearly interpolated; the wavelength intervals do not need
to be equal (but it is more efficient if most of them are) and should indeed
be smaller in the region of an absorption edge. This list is terminated by a
record in which all three values are given as zero.  There should only be one
LAUE instruction for each element type; if a reflection wavelength is outside
the range specified, the constant f' and f" values defined by the
corresponding SFAC instruction are used instead.
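
   The table lookup described above might be sketched in Python as follows.
The names read_laue_table and dispersion are hypothetical, explicit decimal
points are assumed in every field, and the list of lines is assumed to start
after the ignored heading line; the program itself may differ in detail.

```python
from bisect import bisect_right


def _field(text):
    """Read one fixed-width numeric field; a blank field counts as zero."""
    text = text.strip()
    return float(text) if text else 0.0


def read_laue_table(lines):
    """Read (wavelength, f', f") records in FORMAT(F7.3,2F8.3).

    Reading stops at the record in which all three values are zero.
    """
    table = []
    for line in lines:
        line = line.ljust(23)
        wl, f1, f2 = _field(line[0:7]), _field(line[7:15]), _field(line[15:23])
        if wl == 0.0 and f1 == 0.0 and f2 == 0.0:
            break
        table.append((wl, f1, f2))
    return table


def dispersion(table, wavelength, sfac_f1, sfac_f2):
    """Linearly interpolate f' and f"; outside the table use the SFAC values."""
    if not table or not (table[0][0] <= wavelength <= table[-1][0]):
        return sfac_f1, sfac_f2
    wavelengths = [row[0] for row in table]     # must be in ascending order
    i = bisect_right(wavelengths, wavelength)
    if i == len(table):                         # exactly the last entry
        return table[-1][1], table[-1][2]
    (w0, a0, b0), (w1, a1, b1) = table[i - 1], table[i]
    t = (wavelength - w0) / (w1 - w0)
    return a0 + t * (a1 - a0), b0 + t * (b1 - b0)
```

Because the intervals need not be equal, the sketch uses a binary search
rather than computing the table index directly from the wavelength.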

   A LAUE instruction must be preceded by (normal) SFAC and UNIT instructions
referencing the elements in question, and by all atoms.  Thus the LAUE
instruction(s) are usually the last instructions before HKLF 2 (or -2) at the
end of the '.ins' file (which facilitates editing). The +filename construction
may conveniently be used to read long LAUE tables from 'include' files without
echoing them.  If computer memory is restricted (e.g. in the real-mode PC
version of SHELXL-93) then the LAUE tables should not cover a larger range
than is strictly necessary for the reflection data employed.


REM

Followed by a comment on the same line.  This comment is copied to the results
file ('.res').  A line beginning with at least one blank may also be used as
a comment, but such comments are only copied to the .res file if the line is
completely blank; REM comments are always copied.  Comments may also be
included on the same line as any instruction following the character '!', and
are copied to the .res file (except in the case of atoms and FVAR, EXTI, SWAT
and BASF instructions).


MORE m [1]

MORE sets the amount of (printer) output; m takes a value in the range 0
(least) to 3 (most verbose).  MORE 0 also suppresses the echoing to the '.lst'
file of any instructions or atoms which follow it (until the next MORE
instruction).


TIME t [#]

If the time t (measured in seconds from the start of the job) is exceeded,
SHELXL-93 performs no further least-squares cycles, but goes on to the final
structure factor calculation followed by bond lengths, Fourier calculations
etc.  The default value of t is installation dependent, and is either set to
'infinity' or to a little less than the maximum time allocation for a
particular class of job.  Usually t is 'CPU time', but on some simpler
computer systems (e.g. PC's) the elapsed time may have to be used instead.


END

END is used to terminate an 'include' file, and may also be included after HKLF
in the '.ins' file (for compatibility with SHELX-76).



                  REFLECTION DATA INPUT AND MASSAGING

Before running SHELXL-93, a reflection data file 'name.hkl' must have been
prepared.  The HKLF command tells the program which format has been chosen
for this file, and allows the indices to be transformed using the 3x3 matrix
r11...r33, so that the new h is r11*h + r12*k + r13*l etc.  The program will
not accept matrices with negative or zero determinants.  It is essential that
the cell, symmetry and atom coordinates in the '.ins' file correspond to the
indices AFTER transformation using this matrix.
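
   A minimal Python sketch of this index transformation is given below.  The
names det3 and transform_hkl are hypothetical; the nine matrix elements are
taken in the order r11 r12 r13 r21 ... r33, as they appear on the HKLF
instruction.

```python
def det3(m):
    """Determinant of a 3x3 matrix given as three rows."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


def transform_hkl(matrix9, hkl):
    """Apply r11...r33 to (h, k, l): new h = r11*h + r12*k + r13*l etc."""
    rows = [matrix9[0:3], matrix9[3:6], matrix9[6:9]]
    if det3(rows) <= 0:
        raise ValueError("matrix determinant must be positive")
    h, k, l = hkl
    return tuple(r[0] * h + r[1] * k + r[2] * l for r in rows)
```

With the matrix 1 0 0  0 0 1  0 -1 0, for example, the reflection 1 2 3
becomes 1 3 -2.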

HKLF  n [0]  s [1]  r11...r33 [1 0 0 0 1 0 0 0 1]  wt [1]  m [0]

n is negative if reflection data follow, otherwise they are read from the
'.hkl' file.  The data are read in FORMAT(3I4,2F8.2,I4) (except for |n| < 3)
subject to FORTRAN-77 conventions.  The data are terminated by a record with
h, k and l all zero (except |n| = 1, which contains a terminator and a
checksum).  In the reflection formats given below, BN stands for batch number.
If BN is greater than one, Fc^2 is multiplied by the (BN-1)'th coefficient
specified by means of BASF instructions (see below).  If BN is zero or absent,
it is reset to one.  The multiplicative scale s multiplies both Fo^2 and
sigma(Fo^2) (or Fo and sigma(Fo) for n = 1 or 3).  The multiplicative weight
wt multiplies all 1/sigma^2 values and m is an integer 'offset' needed to read
'condensed data' (HKLF 1); both are included for compatibility with SHELX-76.
Negative n is also only retained for upwards compatibility; it is much better
to keep the reflection data in the 'name.hkl' file, otherwise the data can
easily get lost when editing 'name.res' to 'name.ins' for the next job.
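
   For users who need to read such a file outside SHELXL-93, the standard
FORMAT(3I4,2F8.2,I4) record can be decoded with a few lines of Python.  This
is a sketch only: the names parse_hkl_record and read_hkl are hypothetical,
and every record is assumed to contain explicit decimal points and all five
leading fields.

```python
def parse_hkl_record(line):
    """Decode one FORMAT(3I4,2F8.2,I4) record into (h, k, l, Fo^2, sigma, BN)."""
    line = line.rstrip("\n").ljust(32)
    h, k, l = (int(line[i:i + 4]) for i in (0, 4, 8))
    fo2 = float(line[12:20])
    sigma = float(line[20:28])
    bn_field = line[28:32].strip()
    bn = int(bn_field) if bn_field else 0
    if bn == 0:                      # BN zero or absent is reset to one
        bn = 1
    return h, k, l, fo2, sigma, bn


def read_hkl(lines):
    """Collect records up to the terminator with h, k and l all zero."""
    records = []
    for line in lines:
        record = parse_hkl_record(line)
        if record[:3] == (0, 0, 0):
            break
        records.append(record)
    return records
```

The fields are cut by column position rather than split on blanks, because
adjacent numbers in a fixed-format record need not be separated by spaces.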

n = 1:  SHELX-76 condensed data (BN is set to one).  'Condensed data' impose
        unnecessary index restrictions and can introduce rounding errors;
        although they still have their uses (email!), SHELXL-93 cannot
        generate condensed data and their use is discouraged.

n = 2:  h k l Fo^2 sigma(Fo^2) BN [1] lambda [#] in FORMAT(3I4,2F8.2,I4,F8.4)
        for refinement based on singlet reflections from Laue photographs.
        The data are assumed to be scaled for source intensity distribution
        and geometric factors and (if necessary) corrected for absorption.
        If lambda is zero or absent the value from the CELL instruction is
        used.  n = 2 switches off the merging of equivalent reflections
        BEFORE l.s. refinement (i.e. sets MERG 0); equivalents and
        measurements of the same reflections at different wavelengths are
        merged after least-squares refinement and the subsequent application
        of a dispersion correction, but before Fourier calculations.

The remaining options (n > 2) all require FORMAT(3I4,2F8.2,I4); as is normal
for a FORTRAN program, other formats (e.g. F8.0) may be used for the floating
point numbers provided that eight columns are used in all and a decimal point
is present.

n = 3:  h k l Fo sigma(Fo) BN [1] (if BN is absent or zero it is set to 1).
        The use of data corresponding to this format is NOT RECOMMENDED,
        since the generation of Fo and sigma(Fo) from Fo^2 and sigma(Fo^2) is
        a tricky statistical problem and could introduce bias.

n = 4:  h k l Fo^2 sigma(Fo^2) BN [1]  for the standard reflection data file.
        Since Fo^2 is obtained as the difference of the experimental peak and
        background counts, it may be positive or (occasionally) negative.

n = 5:  h k l Fo^2 sigma(Fo^2) m  where m is the twin component number.  Each
        measured Fo^2 value is fitted to the sum of k[|m|]*Fc[|m|]^2 over all
        contributing components, multiplied by the overall scale factor.  m
        should be given as positive for the last contributing component and
        negative for the remaining ones (if any). The values of Fo^2 and
        sigma(Fo^2) are taken from the last ('prime') reflection in a group,
        and may simply be set equal for each component, but the indices h,k,l
        will in general take on different values for each component.  The
        starting values of the twin factors k[2]..k[max(m)] are specified on
        BASF instruction(s); k[1] is given by one minus the sum of the other
        twin factors.  Note that many simple forms of twinning can also be
        handled with HKLF 4 and a TWIN instruction to generate the indices of
        the remaining twin component(s); HKLF 5 is required if the reciprocal
        space lattices of the components cannot be superimposed exactly.
        HKLF 5 sets MERG 0.

n = 6:  h k l Fo^2 sigma(Fo^2) m.  As for n = 5, there may be one or more sets
        of reflection indices corresponding to a single Fo^2 value.  The last
        reflection in a group has a positive m value and the previous members
        of the group have negative m.  The values of Fo^2 and sigma(Fo^2) are
        taken from the last ('prime') reflection in a group, and may simply
        be set to the same values for the others.  m is here the reflection
        MULTIPLICITY, and is defined as the number of equivalent permutations
        of the given h, k and l values, not counting Friedel opposites. This
        is intended for fitting resolved powder data for high symmetry crystal
        systems.  For example, in a powder diagram of a crystal in the higher
        cubic Laue class (m3m) the reflections 3 0 0 (with multiplicity 3) and
        2 2 1 (multiplicity 12) would contribute to the same measured Fo^2.
        HKLF 6 sets MERG 0.  HKLF 6 may not be used with BASF.

   THERE MAY ONLY BE ONE HKLF INSTRUCTION AND IT MUST COME LAST, except when
HKLF -n is followed by reflection data in the '.ins' file, in which case the
file is terminated by the end of the reflection data.  Negative n is retained
for compatibility with SHELX-76 but is not recommended!


OMIT  s [-3]  2-theta(lim) [180]

s is a threshold for flagging reflections as 'unobserved'.  Note that if no
OMIT instruction is given, ALL reflections except those with large negative
Fo^2 [i.e. Fo^2 < -3.sigma(Fo^2)] are treated as 'observed'.  Unobserved data
are not used for least-squares refinement or Fourier calculations, but are
retained for the calculation of R-indices based on all data, and may also
appear (flagged with an asterisk) in the list of reflections for which Fo^2
and Fc^2 disagree significantly.  Internally in the program s is halved and
applied to Fo^2, so for positive Fo^2 the test is roughly equivalent to
suppressing all reflections with Fo < s * sigma(Fo), as required for
consistency with SHELX-76.  Note that s may be set to 0 (to suppress
reflections with negative Fo^2) or (as in the default setting) to a negative
threshold (to suppress very negative Fo^2) which has no equivalent in
SHELX-76.  An OMIT instruction with a positive s value is NOT ALLOWED in
combination with ACTA, because it may introduce a bias in the final refined
parameters; individual aberrant reflections may still be suppressed using
OMIT h k l, even when ACTA is used.

   2-theta(lim) defines a limiting 2-theta above which reflections are totally
ignored;  they are rejected immediately on reading in.  This facility may be
used to save computer time in the early stages of structure refinement, and is
also sometimes useful for macromolecules, but should not be used without very
good reason!  The SHEL command may be used to flag reflections as 'unobserved'
(but retain them in the data set) above or below particular 2-theta limits.

   OMIT followed by atom names but no numbers may be used to calculate an
'omit map' and is described in the section 'Atom Lists ...'.


OMIT h k l

The reflection h k l is flagged as 'unobserved' in the list of merged
reflections after data reduction.  Since there may be perfectly justified
reasons for ignoring individual reflections (e.g. when a reflection is
truncated by the beam stop) this form of OMIT is allowed with ACTA; however it
should not be used indiscriminately.  OMIT takes effect after MERG, so if the
default MERG 2 is used, OMIT must refer to the indices in the final reflection
list, not necessarily as input.  It will always be safe to use the indices as
given in the list of reflections which do not agree well that is printed
after least-squares refinement; however if no sort-merge is performed, OMIT
suppresses all reflections with matching indices.


SHEL lowres [infinite] highres [0]

Reflections outside the specified resolution range in Angstroms are flagged as
'unobserved' in the list of merged reflections after data reduction.  This
instruction may be useful for macromolecules.
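
   A minimal sketch of such a resolution test, assuming that the resolution d
is obtained from Bragg's law, d = lambda / (2 sin(theta)), is given below.
The function names are hypothetical, and it is an assumption of the sketch
that reflections lying exactly on a limit count as inside the range.

```python
import math


def d_spacing(wavelength, two_theta_deg):
    """Resolution in Angstroms from Bragg's law."""
    return wavelength / (2.0 * math.sin(math.radians(two_theta_deg / 2.0)))


def shel_observed(d, lowres=math.inf, highres=0.0):
    """True if d lies inside the SHEL range (limits taken as inclusive)."""
    return highres <= d <= lowres
```

With the defaults (lowres infinite, highres 0) every reflection passes.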


BASF scale factors

Relative batch scale factors are included in the least-squares refinement
based on the batch numbers in the '.hkl' file.  For batch number BN, the Fc^2
value is multiplied by the (BN-1)'th scale factor from the BASF instruction,
as well as by the overall scale factor.  For batch number one (or zero), Fc is
multiplied by the overall scale factor, but not by a batch scale factor.  The
least-squares matrix will be singular if there are no reflections with BN=1
(or zero), so the program considers this to be an error.  Note that BASF scale
factors, unlike the overall scale factor (see FVAR) are relative to F^2, not
F.  For twinned crystals, i.e. when either TWIN or HKLF 5 is employed, BASF
specifies the fractional contributions of the various twin components.
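
   The relation between batch numbers and BASF coefficients (for batch
scaling, not twinning) can be illustrated by the following Python sketch.
The function names are hypothetical; basf is simply the list of numbers given
on the BASF instruction(s).

```python
def batch_factor(bn, basf):
    """Factor applied to Fc^2 for batch number bn (0 or absent counts as 1)."""
    if bn <= 1:
        return 1.0                     # batch one carries no BASF factor
    return basf[bn - 2]                # the (BN-1)'th coefficient, counted from 1


def check_batches(batch_numbers):
    """Raise if no reflection has BN = 1 (or 0), i.e. the singular case."""
    if not any(bn <= 1 for bn in batch_numbers):
        raise ValueError("no reflections with batch number 1")
```

Thus 'BASF 0.9 1.1' would apply 0.9 to batch 2 and 1.1 to batch 3, and would
leave batch 1 scaled by the overall scale factor alone.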


TWIN  3x3 matrix [-1 0 0 0 -1 0 0 0 -1]  ncomp [2]

ncomp is the number of twin components (2 or greater) and the matrix is
applied (iteratively if |ncomp| > 2) to generate the indices of the twin
components from the input reflection indices, which apply to the first (prime)
component.  If a transformation matrix is also given on the HKLF instruction,
it is applied first before the (iterative) application of the TWIN matrix.
This method of defining twinning allows the standard HKLF 4 format to be used
for the '.hkl' file, but can only be used when the reciprocal lattices for all
twinned components are metrically superimposable. In other cases HKLF 5 format
must be used.  The Fo^2 values are fitted to the sum of k[m]*Fc[m]^2
multiplied by the overall scale factor, where k[1] is one minus the sum of
k[2], k[3], .. and the starting values for the remaining twin fractions k[2],
k[3], .. are specified on a BASF instruction.  Only ONE TWIN instruction is
allowed.  If BASF is omitted the TWIN factors are all assumed to be equal
(i.e. 'perfect' twinning).

   For example, if a structure in the space group P2(1)/c with a and c almost
equal and beta close to 120 degrees is pseudohexagonally twinned so that the
space group appears to be P6(3) (with the pseudo-6(3) axis along b),
refinement could be performed with the instructions:

 TWIN  0 0 1  0 1 0  -1 0 -1  3
 BASF .35 .25

The CELL, LATT and SYMM instructions would give the true monoclinic cell (in
the conventional setting with the 2(1) axis along b). A full set of monoclinic
data would be a prerequisite for a satisfactory refinement.  If the twinning is
'perfect', the BASF instruction would be left out, and a unique hexagonal set
of data should suffice.  If the data had been collected on a hexagonal cell in
this example, an HKLF conversion matrix would be needed as well to make b the
2(1) axis first, e.g.:

 HKLF 4  1  1 0 0  0 0 1  0 -1 0
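
   The way in which the TWIN matrix generates the indices of the further
components, and BASF the twin fractions, can be sketched in Python as
follows.  The function names are hypothetical, only positive ncomp is
handled, and it is an assumption of the sketch that the matrix acts on the
indices in the same way as the HKLF matrix (new h = r11*h + r12*k + r13*l).

```python
def twin_indices(hkl, matrix9, ncomp=2):
    """Indices of all twin components, starting with the prime component."""
    rows = [matrix9[0:3], matrix9[3:6], matrix9[6:9]]
    components = [tuple(hkl)]
    for _ in range(ncomp - 1):         # matrix applied iteratively
        h, k, l = components[-1]
        components.append(tuple(r[0] * h + r[1] * k + r[2] * l for r in rows))
    return components


def twin_fractions(basf, ncomp=2):
    """k[1]..k[ncomp]; k[1] is one minus the sum of the BASF values."""
    if not basf:                       # no BASF: 'perfect' twinning
        return [1.0 / ncomp] * ncomp
    if len(basf) != ncomp - 1:
        raise ValueError("expected ncomp - 1 BASF values")
    return [1.0 - sum(basf)] + list(basf)
```

For the pseudohexagonal example above, the reflection 1 2 3 generates the
components 1 2 3, 3 2 -4 and -4 2 1 (a further application of the matrix
would return to 1 2 3), and 'BASF .35 .25' corresponds to twin fractions of
0.40, 0.35 and 0.25.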

   Refinement of racemic twinning may be performed with:

 TWIN  -1 0 0  0 -1 0  0 0 -1  2  (or just TWIN, since these are the defaults)
 BASF 0.4

so that the BASF coefficient is the Flack absolute structure parameter x (H.D.
Flack, Acta Cryst., A39 (1983) 876-881; G. Bernardinelli and H.D. Flack, Acta
Cryst., A41 (1985) 500-511).  In this case the program does not calculate a
separate Flack parameter in the final structure factor calculation, but uses
the BASF parameter and its esd for the Flack parameter in the '.cif' output.

   If the racemic twinning is present at the same time as normal twinning,
ncomp should be doubled (because there are twice as many components as before)
and given a negative sign (to indicate to the program that the inversion
operator is to be applied multiplicatively with the specified TWIN matrix).
The number of BASF parameters, if any, should be increased from m-1 to 2m-1
where m is the original number of components (equal to the new |ncomp| divided
by 2).  The TWIN matrix is applied m-1 times to generate components 2 ... m
from the prime reflection (component 1); components m+1 ... 2m are then
generated as the Friedel opposites of components 1 ... m.  In such a case the
program will estimate a Flack parameter in the final structure factor cycle,
but it should be zero because it has already been taken into account.  This
is done because the esd of this number is still of interest even when there is
no longer a single racemic twinning parameter. It should be noted that because
of the way the twin component factors are defined, there will inevitably be
very large (e.g. 0.99) correlation coefficients between BASF parameters j and
j+m in this treatment of combined normal and racemic twinning.
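
   The ordering of the components for a negative ncomp may be easier to
follow from a short Python sketch.  The function name twin_components is
hypothetical and the matrix convention (new h = r11*h + r12*k + r13*l) is an
assumption carried over from HKLF; the sketch only lists the indices and does
not reproduce the refinement.

```python
def twin_components(hkl, matrix9, ncomp):
    """List the indices of all components for positive or negative ncomp."""
    rows = [matrix9[0:3], matrix9[3:6], matrix9[6:9]]
    m = abs(ncomp) // 2 if ncomp < 0 else ncomp
    components = [tuple(hkl)]                     # component 1 (prime)
    for _ in range(m - 1):                        # components 2 ... m
        h, k, l = components[-1]
        components.append(tuple(r[0] * h + r[1] * k + r[2] * l for r in rows))
    if ncomp < 0:                                 # components m+1 ... 2m
        components += [tuple(-x for x in c) for c in components]
    return components
```

With the threefold matrix 0 0 1  0 1 0  -1 0 -1 and ncomp = -6, six
components result and up to five BASF parameters (2m-1 with m = 3) would be
given.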

   For both the TWIN and HKLF 5 treatments, the data are reduced to the prime
component by multiplying Fo^2 and Fc^2 by the ratio  Fc^2(prime) / Fc^2(total)
before performing the analysis of variance and Fourier calculations. Similarly
'OMIT h k l'  refers to the indices of the prime component.  The prime
component is the one for which the indices have not been transformed by the
TWIN instruction (i.e. m = 1 ), or in the case of HKLF 5 the component given
with positive m (i.e. last, but not necessarily with |m| = 1).


EXTI x [0]

An extinction parameter x is refined by least-squares, where Fc is multiplied
by:
         k [ 1 + 0.001 * x * Fc^2 * lambda^3 / sin(2theta) ] ^ (-1/4)

where k is the overall scale factor.  Note that it has been necessary to
change this expression from SHELX-76 (which used an even cruder approximation)
and SHELXTL (which used 0.002 instead of 0.001*lambda^3).  The wavelength
dependence is needed for HKLF 2 (Laue) data.  The program will print a warning
if extinction (or SWAT - see below) may be worth refining, but it is not
normally advisable to introduce it until all the non-hydrogen atoms have been
found.  For twinned and powder data, the Fc^2 value used in the above
expression is based on the total calculated intensity summed over all
components rather than the individual contributions, which would be easier to
justify theoretically (but makes little difference in practice).  For the
analysis of variance and '.fcf' output file, the Fo^2 values are brought onto
the absolute scale of Fc^2 by dividing them by the scale factor(s) and the
extinction factor.  The above expression for the extinction is empirical and
represents a compromise to cover both primary and secondary extinction; it
has been shown to work well in practice but does not appear to correspond
exactly to any of the expressions discussed in the literature.  The article by
A.C. Larson in Crystallographic Computing (1970), Ed. F.R. Ahmed, Munksgaard,
Copenhagen, pp. 291-294 comes closest and should be consulted for further
information.
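
   As a numerical illustration, the extinction expression above can be
written as a small Python function.  The name extinction_factor is
hypothetical; the function returns only the bracketed term raised to the
power -1/4, so the result must still be multiplied by the overall scale
factor k.

```python
import math


def extinction_factor(x, fc2, wavelength, two_theta_deg):
    """[1 + 0.001 * x * Fc^2 * lambda^3 / sin(2theta)] ** (-1/4)."""
    sin_2theta = math.sin(math.radians(two_theta_deg))
    return (1.0 + 0.001 * x * fc2 * wavelength ** 3 / sin_2theta) ** -0.25
```

With x = 0 the factor is exactly one; for positive x it is less than one and
reduces the strongest reflections most.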


SWAT g [0] U [2]

The SWAT option introduces one variable g and one fixed parameter U which
enable diffuse solvent to be modeled by Babinet's principle (R. Langridge,
D.A. Marvin, W.E. Seeds, H.R. Wilson, C.W. Hooper, M.H.F. Wilkins and L.D.
Hamilton, J. Mol. Biol. 2 (1960) 38-64; H. Driessen, M.I.J. Haneef, G.W.
Harris, B. Howlin, G. Khan and D.S. Moss, J. Appl. Cryst. 22 (1989) 510-516).
The real part of the scattering factor for each non-hydrogen atom is modified
as follows:

            f(new) = f(old) - g . exp [ -8pi^2.U.(sintheta/lambda)^2 ]

The large value of U ensures that only the low theta f and hence Fc^2 values
are affected.  Subtracting the term in g in this way from the occupied regions
of the structure is equivalent to adding a corresponding diffuse scattering
term in the (empty) solvent regions in its effect on all calculated Fc^2
values except F(000) (which is calculated ignoring g).  For proteins g usually
refines to a value between 2 and 4; for small molecules without significant
diffuse solvent regions it should refine to zero.
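
   The effect of the correction on a scattering factor can be illustrated by
the Python sketch below.  The function name swat_f is hypothetical; stol
denotes sin(theta)/lambda in reciprocal Angstroms.

```python
import math


def swat_f(f_old, g, stol, u=2.0):
    """f(new) = f(old) - g * exp(-8 pi^2 U (sin(theta)/lambda)^2)."""
    return f_old - g * math.exp(-8.0 * math.pi ** 2 * u * stol ** 2)
```

With U = 2 the exponential term has already fallen to a negligible value at
sin(theta)/lambda = 0.5, so that only the low angle data are changed.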

   Since both extinction and diffraction from diffuse solvent tend to affect
primarily the strong reflections at low diffraction angle, they tend to show
the same symptoms in the analysis of variance, and so a combined warning
message is printed.  It will however be obvious from the type of structural
problem which of the two should be applied.  The program does not permit the
simultaneous refinement of SWAT and EXTI.


MERG n [2]

If n is equal to 2 the reflections are sorted and merged before refinement;
if the structure is non-centrosymmetric the Friedel opposites are not combined
before refinement (necessary distinction from SHELXS).  If n is 1 the indices
are converted to a 'standard setting' in which l is maximized first, followed
by k, and then h; if n is zero, the data are neither sorted nor converted to a
standard setting.  n = 3 is the same as n = 2 except that Friedel opposites
are also merged (this introduces small systematic errors and should only be
used for good reason, e.g. to speed up the early stages of a refinement of a
light atom structure before performing the final stages with MERG 2).  Note
that the reflections are always merged, and Friedel opposites combined, before
performing Fourier calculations in SHELXL-93 so that the (difference) electron
density is correctly scaled.  Even with n = 0 the program will change the
reflection order within each data block to optimize the vectorization of the
structure factor calculations (it is shuffled back into the MERG order for
LIST 4 output).  Note that MERG may not be used in conjunction with TWIN or
HKLF 5 or 6.  In SHELX-76, MERG 3 had a totally different meaning, namely the
determination of inter-batch scale factors; in SHELXL-93, these may be
included in the refinement using the BASF instruction.


              ATOM LISTS AND LEAST-SQUARES CONSTRAINTS

Atom instructions begin with an atom name (up to 4 characters which do not
correspond to any of the ca. 80 SHELXL-93 or SHELXA command names, and
terminated by at least one blank) followed by a scattering factor number
(which refers to the list defined by the SFAC instruction(s)), x, y, and z in
fractional coordinates, and (optionally) a site occupation factor (s.o.f.) and
an isotropic U or six anisotropic Uij components (both in Angstroms^2).  Note
that different program systems may differ in their order of Uij components;
SHELXL-93 uses the same order as SHELX-76 and SHELXTL.  The exponential factor
takes the form  exp(-8.pi^2.U.[sin(theta)/lambda]^2)  for an isotropic
displacement parameter U and:

 exp ( -2.pi^2.[ h^2.(a*)^2.U11 + k^2.(b*)^2.U22 + ... + 2hk.a*.b*.U12 ] )

for anisotropic Uij.  An atom is specified as follows in the '.ins' file:

atomname sfac x y z sof [11] U [0.05] or U11 U22 U33 U23 U13 U12

   The atom name must be unique, except that atoms in different residues - see
RESI - may have the same names; in contrast to SHELX-76 it is not necessary to
pad out the atom name to 4 characters with blanks.  To fix any atom
parameter, add 10.  Thus the site occupation factor is normally given as 11
(i.e. fixed at 1).  The site occupation factor for an atom in a special
position should be multiplied by the multiplicity of that position (as given
in International Tables, Volume A) and divided by the multiplicity of the
general position for that space group.  This is the same definition as in
SHELX-76 and is retained for upwards compatibility; it might have been less
confusing to keep the multiplicity and occupation factor separate.  An atom on
a fourfold axis for example will usually have s.o.f. = 10.25.

   If any atom parameter is given as (10*m+p), where abs(p) is less than 5 and
m is an integer, it is interpreted as p*fv(m), where fv(m) is the mth 'free
variable' (see FVAR).  Note that there is no fv(1), since this position on an
FVAR instruction is occupied by the overall scale factor, and m=1 corresponds
to fixing an atom by adding 10. If m is negative, the parameter is interpreted
as p*(fv(-m)-1).  Thus to constrain two occupation factors to add up to 0.25
(for two elements occupying the same fourfold special position) they could be
given as 20.25 and -20.25, i.e. 0.25*fv(2) and 0.25*(1-fv(2)), which
correspond to p=0.25, m=2 and p=-0.25, m=-2 respectively.
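
   The decoding rule can be made concrete with the following Python sketch.
It is an illustration only: the name decode_parameter and the returned labels
are hypothetical, the way in which the program itself separates m and p is
not documented here (rounding to the nearest multiple of 10 is an assumption
of the sketch), and the case m = -1 is not covered.

```python
def decode_parameter(value, fvar):
    """Decode an atom parameter coded as 10*m + p.

    fvar is the list of numbers on FVAR, so fvar[0] is the overall scale
    factor and fvar[m - 1] is free variable fv(m).  Returns (number, kind).
    """
    m = int(round(value / 10.0))
    p = value - 10.0 * m
    if abs(p) >= 5.0:
        raise ValueError("abs(p) must be less than 5")
    if m == 0:
        return p, "refined"                       # ordinary free parameter
    if m == 1:
        return p, "fixed"                         # 10 added: fixed at p
    if m >= 2:
        return p * fvar[m - 1], "p*fv(%d)" % m
    if m <= -2:
        return p * (fvar[-m - 1] - 1.0), "p*(fv(%d)-1)" % -m
    raise ValueError("m = -1 is not covered by this sketch")
```

With fv(2) = 0.7, the values 20.25 and -20.25 decode to 0.175 and 0.075,
which add up to 0.25 as required.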

   In SHELX-76, it was necessary to use free variables and coordinate fixing
in this way to set up the appropriate constraints for refinement of atoms on
special positions.  In SHELXL-93, this is allowed (for upwards compatibility)
but is NOT NECESSARY: the program will automatically work out and apply the
appropriate positional, s.o.f. and Uij constraints for any special position
in any space group, in a conventional setting or otherwise.  Thus all that
is necessary is to specify atomname, sfac, x, y and z, and leave the rest to
the program; when the atom is (later) made anisotropic using the ANIS command,
the appropriate Uij constraints will be added.  For a well-behaved structure,
the list of atom coordinates (from direct methods and/or difference electron
density syntheses) suffices.  If the multiplicity factor (s.o.f.) is left out,
it will be fixed at the appropriate value of 1 for a general position and less
than 1 for a special position.  Since SHELXL-93 automatically generates origin
restraints for polar space groups, no atom coordinates should be fixed by the
user for this purpose (in contrast to SHELX-76).

   It may still be necessary to apply constraints by hand to handle disorder;
a common case is that there are two possible positions for a group of atoms,
in which the first set should all have s.o.f.'s of (say) 21, and the second
set -21, with the result that the sum of the two occupation factors is fixed
at 1, but the individual values may refine as fv(2) and 1-fv(2).  Similarly if
a special position with 2/m symmetry is occupied by Ca2+ and Ba2+, the two
ions could be given the s.o.f.'s 30.25 and -30.25 respectively.  In this case
it would be desirable to use the EADP instruction to equate the Ca2+ and Ba2+
(anisotropic) displacement parameters.

   If U is given as -T, where T is in the range 0.5 < T < 5, it is fixed at T
times U(eq) of the previous atom not constrained in this way.  The resulting
value is not refined independently but is updated after every least-squares
cycle.


SPEC del [0.2]

All following atoms (until the next SPEC instruction) are considered to lie
on special positions (for the purpose of automatic constraint generation) if
they lie within del Angstroms of a special position.  The coordinates of such
an atom are also adjusted so that it lies exactly on the special position.


RESI class [    ] number [0] alias

Until the next RESI instruction, all atoms are considered to be in the
specified 'residue', which may be defined by a class (up to four characters,
beginning with a letter) or number (up to four digits) or both.  The same atom
names may be employed in different residues, enabling them to be referenced
globally or selectively.  The residue number should be unique to a particular
residue, but the class may be used to refer to a class of similar residues,
e.g. a particular type of amino acid in a polypeptide.

   Residues may be referenced by any instruction which allows atom names; the
reference takes the form of the character '_' followed by either the residue
class or number without intervening spaces.  If an instruction codeword is
followed immediately by a residue number, all atom names referred to in the
instruction are assumed to belong to that residue unless they are themselves
immediately followed by '_' and a residue number, which is then used instead.
Thus:

 RTAB_4 Ang N H0 O_11

would cause the calculation of an angle N_4 - H0_4 - O_11, where the first
two atoms are in residue 4 and the third is in residue 11.

   If the instruction codeword is followed immediately by a residue class, the
instruction is effectively duplicated for all residues of that class.  '_*'
may be used to match all residue classes; this includes the default class
'    ' (residue number 0) which applies until the first RESI instruction is
encountered.  Thus:

 MPLA_phe CB > CZ

would calculate least-squares planes through atoms CB to CZ inclusive of all
residues of class 'phe' (phenylalanine).  In the special case of HFIX, only
the FIRST instruction which applies to a given atom is applied.  Thus:

 HFIX_1 33 N
 HFIX_* 43 N

would add hydrogens to the N-terminal nitrogen (residue 1) of a polypeptide
to generate a -NH3+ group, but all other (amide) nitrogens would become -NH-.

   Individual atom names in an instruction may be followed by '_' and a
residue number, but not by '_*' or '_' and a residue class.  If an atom name
is not followed by a residue number, the current residue is assumed (unless
overridden by a global residue number or class appended to the instruction
codeword).  The symbols '_+' meaning 'the next residue' and '_-' meaning 'the
preceding residue' (i.e. residues number n+1 and n-1 if the current residue
number is n) may be appended to atom names but not to instruction codewords.
Thus the instruction:

 RTAB_* Omeg CA_- C_- N CA

could be used to calculate all the peptide (omega) torsion angles in a protein
or polypeptide.  If (as at the N-terminus in this example) some or all of the
named atoms cannot be found for a particular residue, the instruction is
simply ignored for that residue.

   '_$n' does not refer to a residue; it uses the symmetry operation $n
defined by a preceding 'EQIV $n' instruction to generate an equivalent of the
named atom (see EQIV).

   alias specifies an alternative value of the residue number so that cyclic
chains of residues may be created; for a cyclic pentapeptide (residue numbers
2,3,..6) it could be set to 1 for residue 6 and to 7 for residue 2.  If more
than one RESI instruction refers to the same number, alias only needs to be
specified once.  alias is referenced only by the _+ and _- operations (see
above), and a value used for alias may not be used as a residue number on a
RESI instruction.  Note that if there is more than one cyclic peptide in the
asymmetric unit, it is a good idea to leave a gap of TWO residue numbers
between them.  E.g. a cyclic pentapeptide with two molecules in the asymmetric
unit would be numbered 2 to 6 and 9 to 13, with aliases 7 on RESI 2, 1 on RESI
6, 14 on RESI 9 and 10 on RESI 13.

   It will generally be found convenient for applying restraints etc. to use
the same names for atoms in identical residues.
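
   A sketch of the residue numbering for the cyclic pentapeptide described
above, with the parameters written in the order class, number, alias given
on the RESI header line (the residue classes and atom lines are invented
for illustration):

 RESI Gly 2 7
 N   .....
 CA  .....
 :
 RESI Ala 3
 :
 RESI Val 6 1
 N   .....
 CA  .....
 :

With these aliases, CA_- for residue 2 refers to CA of residue 6, and N_+ for
residue 6 refers to N of residue 2.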


MOVE dx [0] dy [0] dz [0] sign [1]

The coordinates of the following atoms are changed to:  x = dx + sign * x,
y = dy + sign * y,  z = dz + sign * z  until superseded by a further MOVE.
MOVE should not be used at the same time as the specification of zero
coordinates to indicate that an atom should not be used in fitting a
fragment of known geometry (e.g. AFIX 66), because after the move the
coordinates will no longer be zero!
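
   For example, assuming the aim is to invert the following atoms through the
point (0.5, 0.5, 0.5) (the atom line is invented for illustration):

 MOVE 1 1 1 -1
 C1 1 0.2000 0.3000 0.4000 11 0.05

C1 would then be treated as if it had been given at x = 1 - 0.2 = 0.8,
y = 1 - 0.3 = 0.7 and z = 1 - 0.4 = 0.6.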


ANIS n

The next n isotropic non-hydrogen atoms are made anisotropic, generating
appropriate special position constraints for the Uij if required.  Intervening
atoms which are already anisotropic are not counted.  A negative n has the
same effect.


ANIS names

The named atoms are made anisotropic (if not already), generating the
appropriate constraints for special positions.  Note that names may include
'$' followed by a scattering factor name (see SFAC);  'ANIS $CL' would make
all chlorine atoms anisotropic.  Since ANIS, like other instructions, applies
to the current residue unless otherwise specified, ANIS_* $S would be required
to make the sulfur atoms in all residues anisotropic (for example).  ANIS MUST
precede the atoms to which it is to be applied.  ANIS on its own, with neither
a number nor names as parameters, makes all FOLLOWING non-hydrogen atoms (in
all residues) anisotropic.  The L.S. and CGLS instructions provide the option
of delaying the conversion to anisotropic of all atoms specified by ANIS until
a given number of least-squares cycles has been performed.


AFIX mn d [#] sof [11] U [10.08]

AFIX applies constraints and/or generates idealized coordinates for all atoms
until the next AFIX instruction is read.  The digits mn of the AFIX code
control two logically quite separate operations.  Although this is confusing
for new users, it has been retained for upwards compatibility with SHELX-76,
and because it provides a very concise notation.  m refers to geometrical
operations which are performed before the first refinement cycle (hydrogen
atoms are idealized before every cycle), and n sets up constraints which are
applied throughout the least-squares refinement.  n is always a single digit;
m may be two, one or zero digits (the last corresponds to m = 0).
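
   A few of the AFIX codes used elsewhere in this section, split into their
m and n parts (the comments after '!' are for illustration only):

 AFIX 6      ! m = 0,  n = 6
 AFIX 43     ! m = 4,  n = 3
 AFIX 66     ! m = 6,  n = 6
 AFIX 109    ! m = 10, n = 9
 AFIX 137    ! m = 13, n = 7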

   The options for idealizing hydrogen atom positions depend on the
connectivity table which is set up using CONN, BIND, FREE and PART; with
experience, this can also be used to generate hydrogen atoms attached to
disordered groups and to atoms on special positions.  d determines the bond
lengths in the idealized groups, and sof and U OVERRIDE the values in the atom
list for all atoms until the next AFIX instruction.  U is not applied if the
atom is already anisotropic, but is used if an isotropic atom is to be made
anisotropic using ANIS.  Any legal U value may be used, e.g. 31 (a free
variable reference) or -1.2 (1.2 times Ueq of the preceding normal atom). Each
AFIX instruction must be followed by the required number of hydrogen or other
atoms.  The individual AFIX options are as follows; the default X-H distances
depend on both the chemical environment and the temperature (to allow for
librational effects) which is specified by means of the TEMP instruction.

m = 0  No action.

m = 1  Idealized tertiary C-H with all X-C-H angles equal.  There must be
       three and only three other bonds in the connectivity table to the
       immediately preceding atom, which is assumed to be carbon.  m = 1 is
       often combined with a riding model refinement (n = 3).

m = 2  Idealized secondary CH2 with all X-C-H and Y-C-H angles equal, and
       H-C-H determined by X-C-Y (i.e. approximately tetrahedral, but widened
       if X-C-Y is much less than tetrahedral).  This option is also suitable
       for riding refinement (n = 3).

m = 3  Idealized CH3 group with tetrahedral angles.  The group is staggered
       with respect to the shortest other bond to the atom to which the -CH3
       is attached.  If there is no such bond (e.g. an acetonitrile solvent
       molecule) this method cannot be used (but m = 13 is still viable).

m = 4  Aromatic C-H or amide N-H with the hydrogen atom on the external
       bisector of the X-C-Y or X-N-Y angle.  m = 4 is suitable for a riding
       model refinement, i.e. AFIX 43 before the H atom.

m = 5  Next five non-hydrogen atoms are fitted to a regular pentagon, default
       d = 1.42 A.

m = 6  Next six non-hydrogen atoms are fitted to a regular hexagon, default
       d = 1.39 A.

m = 7  Identical to m = 6 (included for upwards compatibility from SHELX-76).
       In SHELX-76 only the first, third and fifth atoms of the six-membered
       ring were used as target atoms; in SHELXL-93 this will still be the
       case if the other three are given zero coordinates, but the procedure
       is more general because any one, two or three atoms may be left out by
       giving them zero coordinates.

m = 8  Idealized OH group, with X-O-H angle tetrahedral.  If the oxygen is
       attached to a saturated carbon, all three staggered positions are
       considered for the hydrogen.  If it is attached to an aromatic ring,
       both positions in the plane are considered.  The final choice is based
       on forming the 'best' hydrogen bond to a nitrogen, oxygen, chlorine or
       fluorine atom.  The algorithm involves generating a potential position
       for such an atom by extrapolating the O-H vector, then finding the
       nearest N, O, F or Cl atom to this position, taking symmetry
       equivalents into account.  If another atom which (according to the
       connectivity table) is bonded to the N, O, F or Cl atom is nearer to
       the ideal position, the N, O, F or Cl atom is not considered. Note that
       m = 8 had a different effect in SHELX-76 (but was rarely employed).

m = 9  Idealized terminal X=CH2 or X=NH2+ with the hydrogen atoms in the plane
       of the nearest substituent on the atom X.  Suitable for riding model
       refinement (AFIX 93 before the two H atoms).

m = 10 Idealized pentamethylcyclopentadienyl (Cp*). This AFIX must be followed
       by the 5 ring carbons and then the 5 methyl carbons in cyclic order, so
       that the first methyl group (atom 6) is attached to the first carbon
       (atom 1).  The default d is 1.42 A, with the C-CH3 distance set to
       1.063d.  A variable-metric rigid group refinement (AFIX 109) would be
       appropriate, and would allow for librational shortening of the bonds.
       Hydrogen atoms (e.g. with AFIX 37 or 127) may be included after the
       corresponding carbon atoms, in which case AFIX 0 or 5 (in the case of a
       rigid group refinement) must be inserted before the next carbon atom.

m = 11 Idealized naphthalene group with equal bonds (default d = 1.39 A).
       The atoms should be numbered as a symmetrical figure of eight, starting
       with the alpha C and followed by the beta, so that the first six atoms
       (and also the last six) describe a hexagon in cyclic order.   m = 11 is
       also appropriate for rigid group refinement (AFIX 116).

m = 12 Idealized disordered methyl group; as m = 3 but with two positions
       rotated from each other by 60 degrees.  The corresponding occupation
       factors should normally be set to add up to one, e.g. by giving them as
       21 (i.e. 1*fv(2) ) and -21 ( 1*(1-fv(2)) ). If HFIX is used to generate
       an AFIX instruction with m=12, the occupation factors are fixed at 0.5.
       AFIX 12n is suitable for a para methyl on a phenyl group with no meta
       substituents, and should be followed by 6 half hydrogen atoms (first
       the three belonging to one -CH3 component, then the three belonging to
       the other, so that hydrogens n and n+3 are opposite one another).
       Disordered -CF3 groups may also be generated in this way (with d=1.32).

m = 13 Idealized CH3 group with tetrahedral angles.  If the coordinates of
       the first hydrogen atom are non-zero, they define the torsion angle
       of the methyl group.  Otherwise (or if the AFIX instruction is being
       generated via HFIX) a structure-factor calculation is performed (of
       course only once, even if many hydrogens are involved) and the torsion
       angle is set which maximizes the sum of the electron density at the
       three calculated hydrogen positions.  Since even this is not an
       infallible method of getting the correct torsion angle, it should
       normally be combined with a rigid or rotating group refinement for the
       methyl group (e.g. mn = 137 before the first H).  In subsequent least-
       squares cycles the group is re-idealized retaining the current torsion
       angle.  -CF3 groups may be generated in the same way (with d = 1.32).

m = 14 Idealized OH group, with X-O-H angle tetrahedral.  If the coordinates
       of the hydrogen atom are non-zero, they are used to define the torsion
       angle.  Otherwise (or if HFIX was used to set up the AFIX instruction)
       the torsion angle is chosen which maximizes the electron density (see
       m = 13).  Since this torsion angle is unlikely to be very accurate, the
       use of a rotating group refinement is recommended (i.e. mn = 147 before
       the H atom).

m = 15 BH group in which the boron atom is bonded to either four or five other
       atoms as part of a polyhedral fragment.  The hydrogen atom is placed
       on the vector which represents the negative sum of the unit vectors
       along the four or five other bonds to the boron atom.

m = 16 Acetylenic C-H, with X-C-H linear.  Usually refined with the riding
       model, i.e. AFIX 163.

m > 16 A group defined in a FRAG...FEND section with code = m is fitted,
       usually as a preliminary to rigid group refinement.  The FRAG...FEND
       section MUST precede the corresponding AFIX instruction in the '.ins'
       file, but there may be any number of AFIX instructions with the same
       m corresponding to a single FRAG...FEND section.

When a group is fitted (m = 5, 6, 10 or 11, or m > 16), atoms with non-zero
coordinates are used as target atoms with equal weight.  Atoms with all three
coordinates zero are ignored.  Any three or more non-collinear atoms may be
used as target atoms.

   'Riding' (n = 3, 4) and 'rotating' (n = 7, 8) hydrogen atoms, but not other
idealized groups, are re-idealized (if m is 1, 2, 3, 4, 8, 9, 12, 13, 14, 15
or 16) before each refinement cycle (after the first cycle, the coordinates of
the first hydrogen of a group are always non-zero, so the torsion angle is
retained on re-idealizing).  For n = 4 and 8, the angles are re-idealized but
the (refined) X-H bond length is retained, unless the hydrogen coordinates are
all zero, in which case d (on the AFIX instruction) or (if d is not given) a
standard value which depends on the chemical environment and temperature
(TEMP) is used instead.

n = 0  No action.

n = 1  The coordinates, s.o.f. and U or Uij are fixed.

n = 2  The s.o.f. and U (or Uij) are fixed, but the coordinates are free to
       refine.

n = 3  The coordinates, but not the s.o.f. or U (or Uij), 'ride' on the
       coordinates of the previous atom with n not equal to 3.  The same
       shifts are applied to the coordinates of both atoms, and both
       contribute to the derivative calculation.  The atom on which riding
       is performed may not itself be a riding atom, but it may be in a rigid
       group (n = 5, 6 or 9).

n = 4  This constraint is the same as n = 3 except that the X-H distance is
       free to refine.  The X-H vector direction does not change.  This
       constraint requires better quality reflection data than n = 3, but
       allows for variations in apparent X-H distances caused by libration
       and bonding effects.  If there is more than one equivalent hydrogen,
       the same shift is applied to each equivalent X-H distance (e.g. to all
       three C-H bonds in a methyl group).  n = 4 may be combined with DFIX
       or SADI restraints (to restrain chemically equivalent X-H distances to
       be equal) or embedded inside a rigid (n = 6) group, in which case the
       next atom (if any) in the same rigid group must follow an explicit AFIX
       instruction with n = 5.  Note that n = 4 had a different effect in
       SHELX-76.

n = 5  The next atom(s) are 'dependent' atoms in a rigid group.  Note that
       this is automatically generated for the atoms following an n = 6 or
       n = 9 atom, so does not need to be included specifically unless m has
       to be changed (e.g. AFIX 35 before the first hydrogen of a rigid methyl
       group with AFIX 6 or 9 before the preceding carbon).

n = 6  The next atom is the 'pivot atom' of a NEW rigid group, i.e. the other
       atoms in the rigid group rotate about this atom, and the same
       translational shifts are applied to all atoms in the rigid group.

n = 7  The following (usually hydrogen) atoms (until the next AFIX with n not
       equal to 7) are allowed to ride on the immediately preceding atom X and
       rotate about the Y-X bond; X must be bonded to one and only one atom Y
       in the connectivity list, ignoring the n = 7 atoms (which, if they are
       F rather than H, may be present in the connectivity list).  The motion
       of the atoms of this 'rotating group' is a combination of riding motion
       (cf. n = 3) on the atom X plus a tangential component perpendicular
       to the Y-X and X-H bonds, so that the X-H distances, Y-X-H and H-X-H
       angles remain unchanged.  This constraint is intended for -OH, -CH3 and
       possibly -CF3 groups.  X may be part of a rigid group, which may be
       resumed with an AFIX n = 5 following the n = 7 atoms.

n = 8  This constraint is similar to n = 7 except that the X-H distances
       may also vary, the same shifts being applied along all the X-H bonds.
       Thus only the Y-X-H and H-X-H angles are held constant; the
       relationship of n = 8 to n = 7 corresponds to that of n = 4 to n = 3.
       DFIX and SADI restraints may be useful for the X-H distances.  This
       constraint is useful for -CF3 groups or for -CH3 groups with good data.

n = 9  The first (pivot) atom of a new 'variable metric' rigid group.  Such a
       group retains its 'shape' but may shrink or expand uniformly.  It is
       useful for C5H5 and BF4 groups, which may show appreciable librational
       shortening of the bond lengths.  Subsequent atoms of this type of
       rigid group should have n = 5, which is generated automatically by the
       program if no other AFIX instruction is inserted between the atoms.
       Riding atoms are not permitted inside this type of rigid group.  Only
       the pivot atom coordinates may be fixed (by adding 10) or tied to
       free variables, and only the pivot atom may lie on a special position
       (for the automatic generation of special position constraints).
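
   The commonly used riding and rotating group constraints might be set up by
hand as follows, in the same way as HFIX would do automatically (atom names
and scattering factor numbers are invented; the dots stand for coordinates
and other parameters that are not shown):

 C3   1  .....
 AFIX 43                      ! aromatic C-H, riding on C3
 H3   2  ...  ...  ...  11 -1.2
 AFIX 0
 C8   1  .....
 AFIX 137                     ! methyl group, riding on C8 and free to rotate
 H8A  2  ...  ...  ...  11 -1.5
 H8B  2  ...  ...  ...  11 -1.5
 H8C  2  ...  ...  ...  11 -1.5
 AFIX 0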

Although there are many possible combinations of m and n, in practice only a
small number is used extensively, as discussed in the section on hydrogen
atoms.  Rigid group fitting and refinement (e.g. AFIX 66 followed by six
atoms of a phenyl ring or AFIX 109 in front of a Cp* group) is particularly
useful in the initial stages of refinement; atoms not found in the structure
solution may be given zero coordinates, in which case they will be generated
from the rigid group fit.

   A rigid group or set of dependent hydrogens must ALWAYS be followed by
'AFIX 0' (or another AFIX instruction)!  Leaving out 'AFIX 0' by mistake is a
common cause of error; the program is able to detect and correct some obvious
cases, but in many cases this is not logically possible.
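
   As an illustration, a phenyl ring for which only three atoms were located
could be completed and refined as a rigid group as follows (names and
coordinates are invented; C2, C4 and C6 are given zero coordinates so that
they are generated from the fit to C1, C3 and C5):

 AFIX 66
 C1 1 0.3512 0.2210 0.1045 11 0.05
 C2 1 0      0      0      11 0.05
 C3 1 0.4876 0.1432 0.2318 11 0.05
 C4 1 0      0      0      11 0.05
 C5 1 0.3120 0.0925 0.2961 11 0.05
 C6 1 0      0      0      11 0.05
 AFIX 0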


HFIX  mn  U [#]  d [#]  atomnames

HFIX generates AFIX instructions and dummy hydrogen atoms bonded to the named
atoms, the AFIX parameters being as specified on the HFIX instruction.  This
is exactly equivalent to the corresponding editing of the atom list.  The atom
names may reference residues (by appending '_n' to the name, where n is the
residue number), or SFAC names (preceded by a '$' sign).  U may be any legal
value for the isotropic temperature factor, e.g. 21 to tie a group of hydrogen
U values to free variable 2, or -1.5 to fix U at 1.5 times U(eq) of the
preceding normal atom.  HFIX MUST precede the atoms to which it is to be
applied.  If more than one HFIX instruction references a given atom, only the
FIRST is applied.  'HFIX 0' is legal, and may be used to switch off following
HFIX instructions for a given atom (which is useful if the latter involve '_*'
or a global reference to a residue class).
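
   For example, in a polypeptide in which the N-terminal nitrogen of residue 1
should receive no hydrogen atoms at all (an assumption made purely for this
illustration), one might write:

 HFIX 0 N_1
 HFIX_* 43 N

Since only the first HFIX which references a given atom is applied, N_1 is
skipped and the atoms named N in all other residues receive riding amide
hydrogens.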


FRAG code [17] a [1] b [1] c [1] alpha [90] beta [90] gamma [90]

Enables a fragment to be input using a cell and coordinates taken from the
literature.  Orthogonal coordinates may also be input in this way.  Such a
fragment may be fitted to the set of atoms following an AFIX instruction with
m = code (code must be greater than 16); there must be the same number of
atoms in this set as there are following FRAG, and they must be in the same
order.  Only the coordinates of the FRAG fragment are actually used; atom
names, sfac numbers, sof and Uij are IGNORED.  A FRAG fragment may be given
anywhere between UNIT and HKLF or END, and must be terminated by a FEND
instruction, but must precede any AFIX instruction which refers to it.  This
'rigid fit' is often a preliminary to a rigid group refinement (AFIX with n =
6 or 9).


FEND

This must immediately follow the last atom of a FRAG fragment.
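
   A sketch of a fragment given in orthogonal Angstrom coordinates (using the
default FRAG cell) and then fitted and refined as a rigid group.  The
trigonal planar geometry with a bond length of 1.24 A, the atom names and
the scattering factor numbers are all assumptions made for illustration;
the dots stand for the (non-zero) coordinates of the target atoms:

 FRAG 17
 N1  3   0.0000   0.0000  0.0000
 O1  4   1.2400   0.0000  0.0000
 O2  4  -0.6200   1.0739  0.0000
 O3  4  -0.6200  -1.0739  0.0000
 FEND
 :
 AFIX 176
 N5   3  .....
 O51  4  .....
 O52  4  .....
 O53  4  .....
 AFIX 0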


EXYZ atomnames

The same x, y and z parameters are used for all the named atoms.  This is
useful when atoms of different elements share the same site, e.g. in minerals
(in which case EADP will probably be used as well).  The coordinates (and
possibly free variable references) are taken from the named atom which
precedes the others in the atom list, and the actual values, free variable
references etc. given for the x, y and z of the other atoms are ignored.
An atom should not appear in more than one EXYZ instruction.


EADP atomnames

The same isotropic or anisotropic displacement parameters are used for all
the named atoms.  The displacement parameters (and possibly free variable
references) are taken from the named atom which precedes the others in the
atom list, and the actual values, free variable references etc. given for the
Uij of the other atoms are ignored.  The atoms involved must either be all
isotropic or all anisotropic.  An atom should not appear in more than one EADP
instruction.  'Opposite' fluorines of PF6 or disordered -CF3 groups are good
candidates for EADP, e.g.

 EADP F11 F14
 EADP F12 F15
 EADP F13 F16
 C1 .......
 PART 1
 F11 ...... 21 ......
 F12 ...... 21 ......
 F13 ...... 21 ......
 PART 2
 F14 ...... -21 ......
 F15 ...... -21 ......
 F16 ...... -21 ......
 PART 0

EADP applies an (exact) CONSTRAINT.  The SIMU instruction RESTRAINS the Uij
components of neighboring atoms to be approximately equal with an appropriate
(usually fairly large) esd.


EQIV  $n  symmetry operation

Defines symmetry operation $n for referencing symmetry equivalent atoms on any
instruction which allows atom names, by appending '_$n' (where n is an integer
between 1 and 511 inclusive) to the atom name.  Such a symmetry operation must
be defined before it is used; it does not have to be an allowed operation of
the space group, but the same notation is used as on the SYMM instruction.
The same $n may not appear on two separate EQIV instructions.  Thus:

 EQIV $2 1-x, y, 1-z
 CONF C1 C2 C2_$2 C1_$2

could be used to calculate a torsion angle across a crystallographic twofold
axis (note that this may be required because CONF with no atom names only
generates torsion angles automatically which involve the unique atom list and
a one atom deep shell of symmetry equivalents).  If the instruction codeword
refers to a residue, this is applied to the named atoms before any symmetry
operation specified with '_$n'.  Thus:

 RTAB_23 O..O OG_12 O_$3

would calculate the (hydrogen bond) distance between OG_12 and (O_23)_$3, i.e.
between OG in residue 12 and the equivalent obtained by applying the symmetry
operation defined by EQIV $3 to the atom O in residue 23.


OMIT  atomnames

The named atoms are retained in the atom list but ignored in the structure
factor calculation and least-squares refinement. This instruction may be used,
together with L.S. 0 and FMAP 2, to create an 'OMIT map' to get a clearer
picture of disordered regions of the structure; this concept will be familiar
to macromolecular crystallographers.  In particular, 'OMIT $H' can be used to
check the hydrogen atom assignment of -OH groups etc.  If an actual peak is
present within 0.31 A of the calculated hydrogen atom position, the electron
density appears in the 'Peak' column of the PLAN output.  OMIT_* $H must be
used for this if residues are employed.
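
   A minimal set of instructions for such a check might be (the number of
peaks requested on PLAN is an arbitrary choice for illustration):

 L.S. 0
 FMAP 2
 PLAN 20
 OMIT $H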



                       THE CONNECTIVITY LIST

The connectivity list is a list of 'bonds' which is set up automatically,
and may be edited using BIND and FREE.  It is used to define idealized
hydrogen atom positions, for the BOND and PLAN output of bond lengths and
angles, and by the instructions DELU, CHIV, SAME and SIMU.  Hydrogen atoms
are excluded from the connectivity list (except when introduced by hand
using BIND).


CONN  bmax [12]  r [#]  atomnames     or    CONN  bmax [12]

The CONN instruction fine tunes the generation of the connectivity table and
is particularly useful when pi-bonded ligands or metal ions are present in the
structure.  For the purposes of the connectivity table (which is always
generated), bonds are all distances between non-hydrogen atoms less than
r1 + r2 + 0.5 Angstroms, where r1 and r2 are the covalent radii of the atoms
in question (taking PART into consideration as explained below).  A shell of
symmetry equivalent atoms is also generated, so that all unique bonds are
represented at least once in the list.  All bonds, including those to
symmetry equivalent atoms, may be deleted or added using the FREE or BIND
instructions.

   Default values of r (identified by the scattering factor type) are stored
in the program.  These defaults may be changed (for both the connectivity
table AND the PLAN -n output) by using the full form of the SFAC instruction.
Alternatively the defaults may be overridden for the named atoms by specifying
r on a CONN instruction, in which case r is used in the generation of the
connectivity list but not by the  PLAN instruction. '$' followed by an element
name (the same as on a SFAC instruction) may also be employed on a CONN
instruction (and also does not apply to PLAN).  The second form of the CONN
instruction may be used to change the maximum coordination number bmax for all
atoms (which defaults to 12 if there is no CONN instruction).

   If, after generating bonds as above and editing with FREE and BIND, there
are more than bmax bonds to a given atom, the list is pruned so that only the
shortest bmax are retained.  A harmless side-effect of this pruning of the
connectivity list is that symmetry operations may be stored and printed that
are never actually used.  Note that this option only removes one entry for a
bond from the connectivity list, not both, except in the case of 'CONN 0'
which ensures that there are no bonds to or from the named atoms.  In some
cases it will be necessary to use FREE to remove a 'bond' from a light atom to
an alkali metal atom (for example) in order to generate hydrogen atoms
correctly.

   'CONN 0' is frequently used to prevent the solvent water in macromolecular
structures from making additional 'bonds' to the macromolecule which confuse
the generation of idealized hydrogen atoms etc., and it is also required if
BUMP is used to generate 'antibumping' restraints in such cases.  Refinements
of macromolecules will often include BUMP and 'CONN 0 O1 > LAST', where 'LAST'
may be used to indicate the last atom in the file (which saves trouble when
adding extra waters).

   The CONN instruction, like ANIS and HFIX, MUST precede the atoms to which
it is to be applied.  Repeated CONN instructions are allowed; the LAST
relevant CONN preceding a particular atom is the one which is actually applied.
CONN without atom names changes the default value of bmax for all following
atoms.  The following example illustrates the use of CONN:

 CONN Fe 0
 MPLA 5 C11 > C15 Fe
 MPLA 5 C21 > C25 Fe
 Fe  .....
 C11 .....
 .........
 C25 .....

which would prevent bonds being generated from the iron atom to all 10 carbons
in ferrocene. In this example the distances of the iron atom from the two ring
planes would be calculated instead.


PART  n  sof

The following atoms belong to part n of a disordered group.  The automatic
bond generation ignores bonds between atoms with different PART numbers,
unless one of them is zero (the value before the first PART instruction). If
a site occupation factor (sof) is specified on the PART instruction, it
overrides the value on the following atom instructions (even if set via an
AFIX instruction) until a further PART instruction, e.g. 'PART 0', is
encountered.

   If n is negative, the generation of special position constraints is
suppressed and bonds to symmetry generated atoms with the same or a different
non-zero PART number are excluded; this is suitable for a solvent molecule
disordered on a special position of higher symmetry than the molecule can take
(e.g. a toluene molecule on an inversion center).  A PART instruction remains
in force until a further PART instruction is read; 'PART 0' should be used to
continue with the non-disordered part of the structure.
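
   For the toluene example, assuming that the whole molecule is given in the
atom list with half occupancy, the input might look like this (the atom
names are invented and the dots stand for parameters that are not shown):

 PART -1 10.5
 C1T  1  .....
 C2T  1  .....
 :
 C7T  1  .....
 PART 0

The sof of 10.5 on the PART instruction fixes the occupancy of all seven
atoms at 0.5.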

   Some care is necessary in generating hydrogen atoms where disordered groups
are involved.  If the hydrogen atoms are assigned a PART number, then even if
the atom to which they are attached has no part number (i.e. PART 0) the above
rules may be used by the program to work out the correct connectivity for
calculating the hydrogen atom positions.  HFIX hydrogens are assigned the PART
number of the atom to which they are attached.  If the hydrogens and the atom
to which they are attached belong to PART zero but the latter atom is bonded
to atoms with non-zero PART, the LOWEST of these non-zero PART numbers is
assumed to be the major component and is used to calculate the hydrogen
positions.  As an example, assume that one of the valine residues (Val32) in a
small protein is disordered so that one of the methyl groups is common to both
components and the other is disordered unequally over the two remaining
positions.  Hydrogens could be added for the major component only as follows:

  HFIX_val 37 CG1 CG2
  HFIX_val 13 CB
  :
  RESI 32 Val
  N    .....
  CA   .....
  C    .....
  O    .....
  CB   .....
  CG1  .....
  PART 1
  CG2  1  ...  ...  ...  21  ...
  PART 2
  CG2' 1  ...  ...  ... -21  ...
  PART 0
  :

where free variable 2 is the occupation factor for PART 1 (say 0.7) and the
occupation factor of the second component is tied to 1-fv(2) (i.e. 0.3).  The
value for this free variable is set on the FVAR instruction and is free to
refine.  If there were more than two components, a linear free variable
restraint (SUMP) could be used to restrain the sum of occupation factors to
(e.g.) 1.  The hydrogens for the second component could be added in a
subsequent job with the help of AFIX instructions:

  :
  CB   .....
  PART 1                           ! This hydrogen for the major component was
  AFIX 13                          ! generated in the previous run (but Part 1
  HB   2  ...  ...  ...  21 -1.2   ! must be added and its sof changed now !).
  PART 2                           !!
  AFIX 13                          !! These four lines are added now; HFIX
  HB'  2  ...  ...  ... -21 -1.2   !! would not be a valid alternative.
  PART 0                           !!
  AFIX 0
  CG1  .....
  AFIX 37                          !
  HG1A 2  ...  ...  ...  11 -1.5   ! Generated from HFIX in previous run.
  HG1B 2  ...  ...  ...  11 -1.5   !
  HG1C 2  ...  ...  ...  11 -1.5   !
  AFIX 0                           !
  PART 1
  CG2  1  ...  ...  ...  21  ...
  AFIX 37                          !
  HG2A 2  ...  ...  ...  21 -1.5   ! Generated from HFIX in previous run.
  HG2B 2  ...  ...  ...  21 -1.5   !
  HG2C 2  ...  ...  ...  21 -1.5   !
  AFIX 0                           !
  PART 2
  CG2' 1  ...  ...  ... -21  ...
  AFIX 37                          !!
  HG2D 2  ...  ...  ... -21 -1.5   !! These five lines are added now - in this
  HG2E 2  ...  ...  ... -21 -1.5   !! case HFIX 37 CG2'_32 could also be used.
  HG2F 2  ...  ...  ... -21 -1.5   !!
  AFIX 0                           !!
  PART 0
  :
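
   The occupancy codes used in the valine example above (11, 21 and -21) can
be decoded with a short sketch such as the following.  It is an illustration
only, not the program's own routine: the function name decode_sof and the
general rule (a code of 10*m+p means p*fv(m), -(10*m+p) means p*[1-fv(m)], and
adding 10 fixes a value) are assumptions generalized from the examples in this
manual, and only positive codes are handled for fixed values.

```python
import math

def decode_sof(code, fvar):
    """Interpret a coded site occupation factor from an atom line.

    fvar holds the numbers on FVAR: fvar[0] is the overall scale factor
    and fvar[m - 1] is free variable m (so fvar[1] is fv(2)).
    Returns a (kind, value) pair.
    """
    m = math.floor(abs(code) / 10 + 0.5)   # nearest whole number of tens
    p = abs(code) - 10 * m                 # multiplier that is left over
    if m == 0:
        return ("refined", code)           # plain starting value
    if m == 1:
        return ("fixed", code - 10)        # assumed rule: adding 10 fixes
    fv = fvar[m - 1]
    if code > 0:
        return ("tied", p * fv)            # e.g. 21 -> fv(2)
    return ("tied", p * (1.0 - fv))        # e.g. -21 -> 1 - fv(2)

# Free variable 2 set to 0.7 on FVAR, as in the example above.
for code in (11, 21, -21):
    print(code, decode_sof(code, [1.0, 0.7]))
```

With fv(2) = 0.7 the two components of CG2 therefore receive occupation
factors of about 0.7 and 0.3, and the common CG1 hydrogens stay fixed at 1.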


BIND atom1 atom2

The specified 'bond' (which may be of any length) is added to the connectivity
list if it is not there already.  Only one of the two atoms may be an
equivalent atom (i.e. have the extension _$n).


FREE atom1 atom2

The specified 'bond' is deleted from the connectivity list (if present).  Only
one of the two atoms may be an equivalent atom (i.e. have the extension _$n).



                      LEAST-SQUARES RESTRAINTS


DFIX  d  s [#]  atom pairs

The distance between the first and second named atom, the third and fourth,
fifth and sixth etc. (if present) is restrained to a target value d with an
estimated standard deviation s.  d may refer to a 'free variable', otherwise
it is considered to be fixed.  Fixing d by adding 10 is not allowed, so the
value may lie between 0 and 15.

   If d is given a negative sign, the restraint is applied ONLY if the current
distance between the two atoms is LESS than |d|.  This is an 'antibumping'
restraint, and may be used to prevent solvent (water) molecules from
approaching too close to one another or to a macromolecule.  Antibumping
restraints may also be generated automatically using the BUMP instruction (see
below).

   The default value of s is 0.03 when d is positive and 0.1 when d is
negative.  The default s for positive d may be changed by means of a preceding
DEFS instruction (see below).
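
   The sign convention for d can be summarized in a few lines of Python.  This
is a minimal sketch rather than the program's implementation: the function
name dfix_term and the use of (target - distance)/s as the weighted residual
are assumptions made for illustration; the default values of s are those
quoted above.

```python
def dfix_term(d, distance, s=None, default_sd=0.03):
    """Weighted residual for one DFIX atom pair (illustrative convention).

    d         target distance; a negative sign requests antibumping
    distance  current interatomic distance in Angstroms
    s         effective standard deviation (None selects the default)
    """
    if s is None:
        # 0.03 (or the DEFS value) for positive d, 0.1 for negative d
        s = default_sd if d > 0 else 0.1
    if d < 0 and distance >= abs(d):
        return 0.0                      # atoms far enough apart: no restraint
    return (abs(d) - distance) / s

print(dfix_term(-3.0, 3.4))   # antibumping restraint inactive
print(dfix_term(-3.0, 2.8))   # antibumping restraint active
```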


BUMP s [0.1] d1 [#] d2 [#] d3 [#] ...

'Antibumping' restraints are generated automatically for all (solvent water)
atoms which have been flagged with 'CONN 0'.  The restraints can be generated
between CONN 0 atoms and all other non-hydrogen atoms, and appear in
subsequent tables as DFIX instructions with negative d and effective standard
deviation s.  The values to be used for d are given in SFAC order as d1, d2,
d3, ...; the default values are 3.2 for C, 2.7 for N, 2.6 for O and 3.5 for
ALL other elements. If the structure contains atoms of other elements (e.g.
explicit cations in polynucleotides) that can interact with the solvent, it
will be necessary to specify the appropriate distances on the BUMP
instruction.  The restraints are also set up for all symmetry equivalents
automatically; however if the sum of occupancies of the two atoms is less than
1.1, no restraint is generated.  Iterative refinement with antibumping
restraints, followed by deletion of atoms which persist in giving unacceptable
distances or for which the (equivalent isotropic) displacement parameters
become larger than say 1.0 to 1.2 A^2, and insertion of new potential solvent
atoms from difference electron density syntheses, provides a reliable
procedure for building up a solvent model with acceptable hydrogen bonding
distances which is consistent with the diffraction data; 'PLAN 200 2.3' would
be appropriate.  If there are more than 15 different SFAC types, d15 is used
for those with numbers greater than 15.
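
   The selection rules described above can be sketched as follows.  The names
bump_distance and bump_applies are hypothetical, and it is assumed here that
the distance is chosen according to the element of the atom approached by the
CONN 0 atom; the default distances and the occupancy threshold of 1.1 are
taken from the text.

```python
DEFAULT_BUMP = {"C": 3.2, "N": 2.7, "O": 2.6}   # Angstroms
OTHER_ELEMENTS = 3.5                            # default for everything else

def bump_distance(element, overrides=None):
    """Antibumping distance |d| for an element, honouring user overrides."""
    table = dict(DEFAULT_BUMP)
    if overrides:
        table.update(overrides)     # values given on the BUMP instruction
    return table.get(element, OTHER_ELEMENTS)

def bump_applies(occupancy_a, occupancy_b):
    """No restraint is generated if the occupancies sum to less than 1.1."""
    return occupancy_a + occupancy_b >= 1.1

print(bump_distance("O"), bump_distance("Na"))
print(bump_applies(1.0, 0.5), bump_applies(0.5, 0.5))
```

The second test prevents two half-occupied alternative solvent sites from
being pushed apart.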


SAME  s1 [0.03]  s2 [0.03]  atomnames

The list of atoms (which may include the symbol '>' meaning all intervening
non-hydrogen atoms in a forward direction, or '<' meaning all intervening
non-hydrogen atoms in a backward direction) is compared with the same number
of atoms which follow the SAME instruction.  All bonds in the connectivity
list for which both atoms are present in the SAME list are restrained to be
the same length as those between the corresponding following atoms (with an
effective standard deviation s1).  The same applies to 1-3 distances (defined
by two bonds in the connectivity list which share a common atom), with
standard deviation s2.  If s2 is absent it is given the same value as s1.
s1 or s2 may be set to zero to switch off the corresponding restraints.  The
program automatically sets up the n*(n-1)/2 restraint equations required when
n interatomic distances should be equal.  This ensures optimum efficiency and
avoids arbitrary unequal weights.  Only the minimum set of restraints needs to
be specified in the '.ins' file;  redundant restraints are ignored by the
program, provided that they have the same sigma values as the unique set of
restraints.  See also SADI.
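
   The count of n*(n-1)/2 restraint equations is simply the number of distinct
pairs that can be formed from n distances.  The following sketch (the function
name and the distance labels are hypothetical) enumerates them:

```python
from itertools import combinations

def equal_distance_pairs(distances):
    """All pairs of distances that must be restrained to be equal."""
    return list(combinations(distances, 2))

# Four chemically equivalent bonds give 4*3/2 = 6 restraint equations.
pairs = equal_distance_pairs(["d1", "d2", "d3", "d4"])
print(len(pairs))
```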

   The position of a SAME instruction in the input file is critical.  If (say)
all the phenylalanine residues in a protein are to be restrained to have the
same 1,2 and 1,3 distances, and all have the same atom names (in the same
order!), and the same residue name (PHE), but different residue numbers, then
ONE SAME instruction suffices:

 SAME_phe N > CZ

where the first atom in each phenylalanine is labeled 'N' and the last 'CZ'.
This instruction should be inserted before the first atom (N) of the phenyl-
alanine with the best geometry, because the connectivity table for this
residue will be used to define the 1,2 and 1,3 distances.  This phenylalanine
does not have to be the first in the atom list.  In this case it would also be
reasonable to impose local twofold symmetry for the phenyl ring, so a further
SAME instruction could be added before the beta (benzylic) carbon (CB) of the
same residue:

 SAME CB CG CD2 CD1 CE2 CE1 CZ

where the order of the immediately following atoms is:  CB CG CD1 CD2 CE1 CE2
CZ. Note that these two SAME restraints are all that is required, however many
PHE residues are present; the program will generate all indirectly implied 1,2
and 1,3 equal distance restraints!  In this case it would also be sensible to
make the carbon atoms of the benzyl groups coplanar by a FLAT restraint.


SADI  s [0.03]  atom pairs

The distances between the first and second named atoms, the third and fourth,
fifth and sixth etc. (if present) are restrained to be equal with an
effective standard deviation s.  The SAME and SADI restraints are analyzed
together by the program to find redundant and implied restraints.  The same
effect as is obtained using SADI can also be produced by using DFIX with d
tied to a free variable, but the latter costs one more least-squares parameter
(but in turn produces a value and esd for this parameter).  The default
effective standard deviations for SAME and SADI may be changed by means of a
DEFS instruction before the instruction in question.


CHIV  V [0]  s [0.2]  atomnames

The chiral volumes of the named atoms are restrained to the value V (in cubic
Angstroms) with standard deviation s.  The chiral volume is defined as the
volume of the tetrahedron formed by the three bonds to each named atom, which
must be bonded to three and only three non-hydrogen atoms in the connectivity
list; the order in the connectivity list, which is determined by the order of
increasing bond lengths, defines the sign of the chiral volume.  Note that
RTAB may be used to list chiral volumes defined in the same way but without
restraining them.  The chiral volume is positive for the alpha-carbon (CA) of
an L-amino-acid if the order of the three bond lengths is CA-N, CA-C, CA-CB
(as would be expected for an accurate structure).  Note that 'CHIV 0' (or just
CHIV since the default V is zero) may be used to impose a planarity restraint
on an atom which is bonded to three others (by making the chiral volume zero),
and is mathematically equivalent to a FLAT instruction which names the four
atoms explicitly.


FLAT  s [0.2]  four or more atoms

If precisely four atoms are named, they are restrained to be coplanar (within
the effective standard deviation s) by restraining the volume of the
tetrahedron with the four atoms as corners to zero. The edges of this
tetrahedron do NOT have to appear as bonds in the connectivity list.  The
algebra involved is the same as for CHIV (!), and so the units of s are
Angstroms^3.  If more than four atoms are specified, the fourth and all
remaining atoms are used in turn as the fourth corner of a tetrahedron
involving the first three atoms, for which the volume is again restrained to
be zero.  The first three atoms should be chosen so that they define a
triangle for which the area is as large as possible; for example alternate
atoms could be used for a six-membered ring.
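
   The quantity restrained by FLAT can be illustrated with a short sketch.  It
assumes Cartesian coordinates in Angstroms and uses hypothetical function
names; it shows only how the tetrahedra are formed from the first three atoms
and each remaining atom in turn, not how the program applies the restraint.

```python
def tetrahedron_volume(a, b, c, d):
    """Signed volume of the tetrahedron with corners a, b, c and d."""
    u = [b[i] - a[i] for i in range(3)]
    v = [c[i] - a[i] for i in range(3)]
    w = [d[i] - a[i] for i in range(3)]
    triple = (u[0] * (v[1] * w[2] - v[2] * w[1])
              - u[1] * (v[0] * w[2] - v[2] * w[0])
              + u[2] * (v[0] * w[1] - v[1] * w[0]))
    return triple / 6.0

def flat_volumes(atoms):
    """One volume per atom after the third; all are restrained to zero."""
    first, second, third = atoms[0], atoms[1], atoms[2]
    return [tetrahedron_volume(first, second, third, x) for x in atoms[3:]]

planar = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (1, 1, 0), (2, 3, 0)]
print(flat_volumes(planar))
```

For five atoms two volumes are produced, and both vanish when the atoms are
exactly coplanar.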

   Although it might be objected that this method could cause the first three
atoms to be 'more coplanar' than the others, in practice FLAT is a very simple
and effective way of restraining a group of atoms to be approximately coplanar.
Alternative methods involve either (a) biasing the plane towards the existing
least-squares plane, (b) extra least-squares variables (which cost computer
time), or (c) complicated calculations and problems with numerical precision.
A further objection to this algorithm (and to some others) is that it
will tend to cause the atoms to be 'attracted' towards one another (since this
also reduces the volume of the tetrahedron).  However this error becomes
negligible as the atoms become nearly coplanar; tests with typical phenyl-
alanine residues in a polypeptide showed that the bias introduced was of the
order of 0.0001 Angstroms.  The default value of s for CHIV and FLAT may be
changed by a preceding DEFS instruction.


DELU  s1 [0.01]  s2 [0.01]  atomnames

All bonds in the connectivity list connecting atoms on the same DELU
instruction are subject to a 'rigid bond' restraint, i.e. the components of
the (anisotropic) displacement parameters in the direction of the bond are
restrained to be equal within an effective standard deviation s1.  The same
type of restraint is applied to 1-3 distances as defined by the connectivity
list (atoms 1, 2 and 3 must all be defined on the same DELU instruction).  If
s2 is omitted it is given the same value as s1.  A zero value for s1 or s2
switches off the corresponding restraint.  If no atoms are specified, all non-
hydrogen atoms are assumed.  DELU is ignored if (in the refinement cycle in
question) one or both of the atoms concerned is isotropic; in this case a
'hard' restraint is inappropriate, but SIMU may be used in the usual way as a
'soft' restraint.  DELU without atomnames applies to all non-hydrogen atoms
(in the current residue); DELU_* without atoms applies to all non-hydrogen
atoms in all residues.  SFAC element names may also be referenced, preceded by
the symbol '$'.  The default values of s1 and s2 may be changed by means of a
preceding DEFS instruction.


SIMU  s [0.05]  st [0.1]  dmax [1.7]  atomnames

Atoms closer than dmax are RESTRAINED with effective standard deviation s to
have the same Uij components.  If (according to the connectivity table, i.e.
ignoring attached hydrogens) one or both of the two atoms involved is terminal
(or not bonded at all), st is used instead as the esd.  If s but not st is
specified, st is set to twice s. If no atoms are given, all non-hydrogen atoms
are understood.  SIMU_* with no atoms applies to all non-hydrogen atoms in all
residues.  SFAC element names may also be referenced, preceded by '$'.  The
interatomic distance for testing against dmax is calculated from the atom
coordinates without using the connectivity table (though the latter is used
for deciding if an atom is terminal or makes no bonds).

   Note that SIMU should in general be given a much larger esd (and hence
lower weight) than DELU; whereas there is good evidence that DELU restraints
should hold accurately for most covalently bonded systems, SIMU (and ISOR) are
only rough approximations to reality.  s or st may be set to zero to switch
off the appropriate restraints.

   SIMU is intended for use for larger structures with poorer resolution and
data to parameter ratios than are required for full unrestrained anisotropic
refinement.  It is based on the observation that the Uij values on neighboring
atoms in larger molecules tend to be both similar and (when the resolution is
poor) significantly correlated with one another.  By applying a very weak
restraint of this type, we allow a gradual increase and change in direction
of the anisotropic displacement parameters as we go out along a side-chain,
and we restrain the motion of atoms perpendicular to a planar group (which
DELU cannot influence).  The use of a distance criterion directly rather than
via the connectivity table enables the restraints to be applied automatically
to partially overlapping disordered atoms, for which it is an excellent
approach.  dmax can be set so that coordination distances to metal ions etc.
are excluded.  Terminal atoms tend to show the largest deviations from equal
Uij's and so st should be set higher than s (or made equal to zero to switch
off the restraints altogether).  SIMU restraints are NOT recommended for SMALL
molecules and ions, especially if free rotation or torsion is possible (e.g.
C5H5-groups, AsF6- ions).  For larger molecular fragments, the effective
rotation angles are smaller, and the assumption of equal Uij for neighboring
atoms is more appropriate: both translation and libration of a large fragment
will result in relatively similar Uij components on adjacent atoms.  SIMU may
be combined with ISOR, which applies a further soft but quite different
restraint on the Uij components.  SIMU may also be used when one or both of
the atoms concerned is isotropic.  The default value of s may be changed by a
preceding DEFS instruction (st is then set to twice s).


DEFS  sd [0.03]  sf [0.2]  su [0.01]  ss [0.05]  maxsof [1]

DEFS may be used to change the default effective standard deviations for the
following DFIX, SAME, SADI, CHIV, FLAT, DELU and SIMU restraints, and is
useful when these are to be varied systematically to establish the optimum
values for a large structure (e.g. using R(free)).  sd is the default for s
in the SADI and DFIX instructions (excluding DFIX instructions with negative
d, for which the default s remains at 0.1), and also for s1 and s2 in the SAME
instruction.  sf is the default effective standard deviation for CHIV and
FLAT, su is the default for both s1 and s2 in DELU, and ss is the default s
for SIMU.  The default st for SIMU is set to twice the default s.

   maxsof is the maximum allowed value that an occupation factor can refine
to; occupation factors that are fixed or tied to free variables are not
restricted.  It is possible to change this parameter (to say 1.1 to allow for
hydrogen atoms) when refining both occupation factors and U's for solvent
water in proteins (a popular but not uncontroversial way of improving the R
factor).


ISOR  s [0.1]  st [0.2]  atomnames

The named atoms are RESTRAINED with effective standard deviation s so that
their Uij components approximate to isotropic behavior; however the
corresponding isotropic U is free to vary.  ISOR is often applied, perhaps
together with SIMU, to allow anisotropic refinement of large organic molecules
when the data are not adequate for unrestrained refinement of all the Uij; in
particular ISOR can be applied to solvent water for which DELU and SIMU are
inappropriate.  ISOR should in general be applied as a weak restraint, i.e.
with relatively large sigmas, for the reasons discussed above (see SIMU);
however it is also useful for preventing individual atoms from becoming 'non-
positive-definite' (n.p.d.).  It should not be used indiscriminately for this
purpose without investigating whether there are reasons (e.g. disorder, wrong
scattering factor type etc.) for the atom going n.p.d.  If (according to the
connectivity table, i.e. ignoring attached hydrogens) the atom is terminal (or
makes no bonds), st is used instead as the esd.  If s but not st is specified,
st is set to twice s.  If no atoms are given, all non-hydrogen atoms are
understood.  SFAC element names may also be referenced, preceded by '$'.  s or
st may be set to zero to switch off the appropriate restraints.  ISOR without
atom names (or ISOR_* if residues are used) applies this restraint to all non-
hydrogen atoms.  Note also the use of the keyword 'LAST' to indicate the last
atom in the .ins file; an anisotropic refinement of a macromolecule will often
include 'ISOR 0.1 O1 > LAST', which assumes that the solvent water is in
residue 0 at the end of the atom list.

   Note that ISOR should in general be given a much larger esd (and hence
lower weight) than DELU; whereas there is good evidence that DELU restraints
should hold accurately for most covalently bonded systems, ISOR (and SIMU) are
only rough approximations to reality.


SUMP  c  sigma  c1  m1  c2  m2 ...

The linear restraint:   c = c1*fv(m1) + c2*fv(m2) + ...   is applied to the
specified free variables.  This enables more than two atoms to be assigned to
a particular site, with the sum of site occupation factors restrained to be a
constant.  It also enables linear relations to be imposed between distances
used on DFIX restraints, for example to restrain a group of atoms to be
collinear.  sigma is the effective standard deviation.  By way of example,
assume that a special position on a four-fold axis is occupied by a mixture of
sodium, calcium, aluminium and potassium cations so that the average charge
is +2 and the site is fully occupied. The necessary restraints and constraints
could be set up as follows (the program will take care of the special position
constraints on the coordinates and Uij of course):

 SUMP 1.0 0.01 1.0 2 1.0 3 1.0 4 1.0 5   ! site fully occupied
 SUMP 2.0 0.01 1.0 2 2.0 3 3.0 4 1.0 5   ! mean charge = +2
 EXYZ Na1 Ca1 Al1 K1             ! common x, y and z coordinates
 EADP Na1 Ca1 Al1 K1             ! common U or Uij
 FVAR ... 0.20 0.30 0.35 0.15    ! starting values for free variables 2..5
 ...
 Na1 ... ... ... ... 20.25 ...   ! 0.25 * fv(2)  [the 0.25 is required for
 Ca1 ... ... ... ... 30.25 ...   ! 0.25 * fv(3)  a special position on a
 Al1 ... ... ... ... 40.25 ...   ! 0.25 * fv(4)  four-fold axis, i.e. site
 K1  ... ... ... ... 50.25 ...   ! 0.25 * fv(5)  symmetry 4]

Similar SUMP restraints may be used when elements are distributed over several
sites in minerals so that the elemental composition corresponds (within
suitable standard deviations) to an experimental chemical analysis.
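
   As a check on the example above, the starting values given on FVAR can be
substituted into the two SUMP equations.  The sketch below uses a hypothetical
function name and an arbitrary overall scale factor of 1.0; it merely
evaluates c - [c1*fv(m1) + c2*fv(m2) + ...].

```python
def sump_residual(c, terms, fvar):
    """Return c minus the weighted sum of free variables.

    terms  list of (coefficient, free variable number) pairs
    fvar   numbers on FVAR; fvar[m - 1] is free variable m
    """
    total = sum(coefficient * fvar[m - 1] for coefficient, m in terms)
    return c - total

fvar = [1.0, 0.20, 0.30, 0.35, 0.15]   # osf, then fv(2) to fv(5)
occupancy = sump_residual(1.0, [(1.0, 2), (1.0, 3), (1.0, 4), (1.0, 5)], fvar)
charge = sump_residual(2.0, [(1.0, 2), (2.0, 3), (3.0, 4), (1.0, 5)], fvar)
print(occupancy, charge)
```

Both residuals are zero to within rounding error, so the starting values
already satisfy the two restraints.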



                     LEAST-SQUARES ORGANIZATION


L.S.  nls [0]  nrf [0]  nextra [0]  maxvec [511]

nls cycles of full-matrix least-squares refinement are performed, followed
by a structure factor calculation.  When L.S. (or CGLS) is combined with BLOC,
each cycle involves refinement of a block of parameters which may be set up
differently in different cycles.  If no L.S. or CGLS instruction is given,
'L.S. 0' is assumed.

   If nrf is positive it is the number of these cycles which should be
performed before applying ANIS.  This two-stage refinement is particularly
suitable for the early stages of least-squares refinement; experience
indicates that it is not advisable to let everything go at once!

   Negative nrf indicates which reflections should be ignored during the
refinement but used instead for the calculation of independent R-factors in
the final structure factor summation; for example L.S. 4 -10 would ignore
every 10th reflection for refinement purposes.  The selection is based on the
(merged) reflection list before applying OMIT and SHEL, and so should be
independent of the operation of these two instructions (however only data
which have not been suppressed by OMIT or SHEL contribute to the independent
R-factors).  This strategy should also make the selection of reflections to
ignore independent of the computer.  It is desirable to use the same negative
value of nrf throughout, so that the values of 'R1(free)' and 'wR2(free)' are
not biased by the 'memory' of the contribution of these reflections to
earlier refinements.  These independent R-factors may be used to calibrate the
sigmas for the various classes of restraint, and provide a check as to whether
the data are being 'over-refined' (primarily a problem for macromolecules with
a poor data to parameter ratio).  For further details see A.T. Brunger, Nature
355 (1992) 472-475.  In SHELXL-93, these ignored reflections are treated in
the same way as reflections suppressed with OMIT except for the calculation of
R1(free) and wR2(free), i.e. they are used in the calculation of R-indices
based on all reflections, but not used for Fourier calculations.
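
   The effect of a negative nrf on the reflection list can be pictured with
the following sketch.  It is an assumption that counting starts at the first
reflection of the merged list, so that reflections 10, 20, 30 ... are set
aside for 'L.S. 4 -10'; the manual does not state the offset actually used,
and the function name is hypothetical.

```python
def split_reflections(reflections, nrf):
    """Divide a merged reflection list into working and free sets."""
    if nrf >= 0:
        return list(reflections), []     # no reflections are set aside
    step = -nrf
    work, free = [], []
    for number, reflection in enumerate(reflections, start=1):
        if number % step == 0:
            free.append(reflection)      # used only for R1(free), wR2(free)
        else:
            work.append(reflection)      # used for refinement
    return work, free

work, free = split_reflections(list(range(1, 31)), -10)
print(len(work), free)
```

Because the split depends only on the position in the merged list, the same
reflections are selected in every job that uses the same negative nrf.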

   nextra is the number of additional parameters which were derived from the
data when performing empirical absorption corrections etc.  It should be set
to 8 (LAMI), 12 (HOPE) or 18 (EMPI) if SHELXA was used for this purpose, and
to 44 for DIFABS (or 34 without the theta correction; N. Walker and D. Stuart,
Acta Cryst., A39 (1983) 158-166).  It ensures that the standard deviations and
GooF are estimated correctly; they would be underestimated if the number of
extra parameters is not specified.  nextra is zero (and so can be omitted) if
extra information in the form of indexed crystal faces or psi-scan data was
used to apply an absorption correction.

   maxvec refers to the maximum number of reflections processed simultaneously
in the rate-determining calculations.  Usually the program utilizes all
available memory to process as many reflections as possible simultaneously,
subject to a maximum of maxvec, which may not be larger than 511.  For
complicated reasons involving the handling of suppressed and 'R(free)'
reflections and input/output buffering, some blocks may be smaller than the
maximum, especially if the facilities for refinement against twinned or powder
data are being used.  It may be desirable to set maxvec to a smaller number
than 511 to prevent unnecessary disk transfers when large structures are
refined on virtual memory systems with limited physical memory.


CGLS  nls [0]  nrf [0]  nextra [0]  maxvec [511]

As L.S., but the Konnert-Hendrickson conjugate-gradient algorithm is employed
instead of the full-matrix approach.  Although BLOC may be used with CGLS, in
practice it is much better to refine all parameters at once.  CGLS is much
faster than L.S. for a large number of parameters, and so will be the method
of choice for most macromolecular refinements.  The convergence properties of
CGLS are good in the early stages (especially if there are many restraints),
but cannot compete with L.S. in the final stages for structures which are
small enough for full-matrix refinement.  The major disadvantage of CGLS is
that it does not provide estimated standard deviations, so when a large
structure has been refined to convergence using CGLS it may be worth
performing a blocked full-matrix refinement (L.S./BLOC) to obtain the standard
deviations in quantities of interest (e.g. torsion angles, in which case only
xyz blocks would be required).  A further disadvantage of CGLS is its
propensity for getting stuck in a local minimum in situations where L.S./BLOC
would find the global minimum; for this reason a mixed CGLS/L.S. alternative
is provided (CGLS with negative nls) which performs CGLS refinement in the
odd numbered cycles and L.S. in the even numbered.  When this option is used,
it will be normal to provide BLOC instructions for the even numbered cycles
only.  The other parameters have the same meaning as with L.S.; CGLS is
entirely suitable for R(free) tests (negative nrf), and since it requires
much less memory than L.S. there will rarely be any reason to change maxvec
from its default value.

   The CGLS algorithm is based closely on the procedure described by W.A.
Hendrickson and J.H. Konnert (Computing in Crystallography, Ed. R. Diamond,
S. Ramaseshan and K. Venkatesan, I.U.Cr. and Indian Academy of Sciences,
Bangalore 1980, pp. 13.01-13.25).  The structure-factor derivatives contribute
only to the diagonal elements of the least-squares matrix, but all 'additional
observational equations' (restraints) contribute in full to diagonal and off-
diagonal terms, although neither the l.s. matrix A nor the Jacobian J are ever
generated.  The preconditioning recommended by Hendrickson and Konnert is used
to speed up the convergence of the internal conjugate gradient iterations, and
has the additional advantage of preventing the excessive damping of poorly
determined parameters characteristic of other conjugate gradient algorithms
(D.E. Tronrud, Acta Cryst. A48 (1992) 912-916).

   A further refinement in the CGLS approach is to save the parameter shifts
from the previous full CGLS cycle, and to use them to estimate a shift
multiplication factor independently for each parameter.  This factor is
larger when a parameter appears to 'creep' in the same direction in successive
cycles, and smaller when it oscillates.  This technique significantly improves
the convergence properties of the CGLS approach, because it indirectly takes
into account the correlation terms which were ignored (to save time and
space); however it cannot be used with BLOC or 'CGLS -nls'.  The maximum and
minimum shifts are set by the SLIM instruction; usually it will not be
necessary to change them, but if a CGLS refinement appears to be unstable,
both parameters should be reduced; in such a case it would be even better to
track down and fix the cause of the instability, e.g. trying to refine a
structure in the wrong space group!


SLIM  f1 [0.8]  f2 [0.2]

Maximum and minimum shift multiplication factors for CGLS refinement as
described in the previous paragraph.  These numbers have no effect on L.S.
refinement, but for full-matrix refinement the program still reduces the
shifts on parameters that appear to oscillate.


BLOC  n1  n2  atomnames

If n1 or n2 is positive, the x, y and z parameters of the named atoms are
refined in the corresponding cycle.  If n1 or n2 is negative, the occupation
and displacement parameters are refined in cycle |n1| or |n2|.  Not more than
two such parameters may be specified on a single BLOC instruction, but the
same atoms may be mentioned in any number of BLOC instructions.  To refine
x, y and z as well as displacement parameters for an atom in the same block,
n1 and n2 should specify the same cycle number, but with opposite signs.  A BLOC
instruction with no atom names refines all atoms in the specified cycles.
The pattern of blocks is repeated after the maximum block number has been
reached if the number of L.S. refinement cycles is larger than the maximum
BLOC |n1| or |n2|.  If a cycle number less than the maximum |n1| or |n2| is
not mentioned in any BLOC instruction, it is treated as full-matrix.  The
overall scale, batch/twin scale factors, extinction coefficient, SWAT g
parameter and free variables (if present) are refined in every block.  Riding
(hydrogen) atoms and atoms in rigid groups are included in the same blocks as
the atoms on which they ride.

   For example, a polypeptide consisting of 30 residues (residue numbers 1..30
set by RESI instructions) could be refined efficiently as follows (all
non-hydrogen atoms assumed anisotropic):

 BLOC 1
 BLOC -2 N_1 > O_16
 BLOC -3 N_14 > O_30

which would ensure 3 roughly equally sized blocks of about 800 parameters each
and some overlap between the two anisotropic blocks to avoid problems where
they join.  The geometric parameters would refine in cycles 1,4,7 .. and the
anisotropic displacement parameters in the remaining cycles.  An alternative
good blocking strategy would be to divide the structure into three overlapping
blocks of xyz and Uij parameters, and to add a fourth cycle in which all xyz
but no Uij values are refined (these four blocks would then also each contain
about 800 parameters), i.e.:

 BLOC 1 -1 N_1 > O_11
 BLOC 2 -2 N_10 > O_21
 BLOC 3 -3 N_20 > O_30
 BLOC 4
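
   The way in which the pattern of blocks repeats can be expressed in one
line.  The sketch below (hypothetical function name) returns the block number
used in a given refinement cycle; for the first example above, with a maximum
block number of 3, it reproduces the statement that the geometric parameters
refine in cycles 1, 4, 7 ...

```python
def block_for_cycle(cycle, max_block):
    """Block number (1..max_block) used in a given refinement cycle."""
    return (cycle - 1) % max_block + 1

# Cycles in which block 1 (all xyz in the first example) is refined.
xyz_cycles = [n for n in range(1, 10) if block_for_cycle(n, 3) == 1]
print(xyz_cycles)
```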


DAMP  damp [1]  limse [15]

damp is usually left at the default value unless there is severe correlation,
e.g. when trying to refine a pseudo-centrosymmetric structure, or refining
with few data per parameter (e.g. from powder data).  A value in the range
1-10000 might then be appropriate.  The diagonal elements of the least-squares
matrix are multiplied by (1+damp/1000) before inversion; this is a version of
the Marquardt algorithm (J. Soc. Ind. Appl. Math., 11 (1963) 431-441).  A
side-effect of damping is that the standard deviations of poorly determined
parameters will be artificially reduced; it is recommended that a final least-
squares cycle be performed with little or no damping in order to improve these
estimated standard deviations.  Theoretically, damping only serves to improve
the convergence properties of the refinement, and can be gradually reduced as
the refinement converges; it should not influence the final parameter values.
However in practice damping also deals effectively with rounding error
problems in the (single-precision) least-squares matrix algebra, which can
present problems when the number of parameters is large and/or restraints are
used (especially when the latter have small esd's), and so it may not prove
possible to lift the damping entirely even for a well converged refinement.

   If the maximum shift/esd (excluding the overall scale factor) is greater
than limse, all the shifts are scaled down by the same numerical factor so
that the maximum is equal to limse.  If the maximum shift/esd is smaller than
limse no action is taken.  This helps to prevent excessive shifts in the early
stages of refinement.
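
   The two operations controlled by DAMP can be sketched as follows.  This is
a simplified illustration with hypothetical function names; the matrix is a
plain list of rows, and the caller is assumed to have excluded the overall
scale factor from the shifts passed to limit_shifts.

```python
def damp_diagonal(matrix, damp=1.0):
    """Copy of the matrix with each diagonal element times (1 + damp/1000)."""
    factor = 1.0 + damp / 1000.0
    damped = [list(row) for row in matrix]
    for i in range(len(damped)):
        damped[i][i] *= factor
    return damped

def limit_shifts(shifts, esds, limse=15.0):
    """Scale all shifts equally so that the largest shift/esd equals limse."""
    worst = max(abs(shift / esd) for shift, esd in zip(shifts, esds))
    if worst <= limse:
        return list(shifts)             # nothing to do
    scale = limse / worst
    return [shift * scale for shift in shifts]

print(damp_diagonal([[2.0, 1.0], [1.0, 4.0]], damp=1000.0))
print(limit_shifts([15.0, 1.0], [0.5, 0.5]))
```

In the second call the largest shift/esd is 30, so every shift is halved.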


WGHT  a [0.1]  b [0]  c [0]  d [0]  e [0]  f [.33333]

The weighting scheme is defined as follows:

w = q / [ sigma^2(Fo^2) + (a*P)^2 + b*P + d + e*sin(theta) ]

where P = [ f * max(0, Fo^2) + (1-f) * Fc^2 ].  It is possible for the
experimental Fo^2 value to be negative because the background is higher than
the peak; such negative values are replaced by 0 to avoid possibly dividing by
a very small or even negative number in the expression for w.

For twinned and powder data, the Fc^2 value used in the expression for P is
the total calculated intensity obtained as a sum over all components.

q is 1 when c is zero,  exp[c*(sin(theta)/lambda)^2]  when c is positive, and
1 - exp[c*(sin(theta)/lambda)^2]  when c is negative.
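
   The weighting scheme defined above can be written out as a short function.
This is a sketch only: the function name is hypothetical, theta is assumed to
be given in radians, and the default wavelength of 0.71073 Angstroms (Mo
K-alpha) is an assumption that matters only when c is non-zero.

```python
import math

def wght_weight(fo2, fc2, sigma_fo2, theta=0.0, wavelength=0.71073,
                a=0.1, b=0.0, c=0.0, d=0.0, e=0.0, f=0.33333):
    """Weight w for one reflection, following the expressions above."""
    p = f * max(0.0, fo2) + (1.0 - f) * fc2     # negative Fo^2 replaced by 0
    s2 = (math.sin(theta) / wavelength) ** 2
    if c == 0:
        q = 1.0
    elif c > 0:
        q = math.exp(c * s2)
    else:
        q = 1.0 - math.exp(c * s2)
    denominator = (sigma_fo2 ** 2 + (a * p) ** 2 + b * p
                   + d + e * math.sin(theta))
    return q / denominator

# Default scheme (WGHT 0.1) for Fo^2 = Fc^2 = 100 with sigma(Fo^2) = 2
print(wght_weight(100.0, 100.0, 2.0))
```

With the default parameters the weight reduces to 1 / [sigma^2(Fo^2) +
(0.1*P)^2], which for this reflection is close to 1/104.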

The use of P rather than (say) Fo^2 reduces statistical bias (A.J.C. Wilson,
Acta Cryst., A32 (1976) 994-996).  The weighting scheme is NOT refined if a
is negative (contrast SHELX-76).  The parameters can be set by trial and error
so that the variance shows no marked systematic trends with the magnitude of
Fc^2 or of resolution; the program suggests a suitable WGHT instruction after
the analysis of variance.  This scheme is chosen to give a flat analysis of
variance in terms of Fc^2, but does not take the resolution dependence into
account.  It is usually advisable to retain default weights (WGHT 0.1) until
all atoms have been found, when the scheme suggested by the program can be
used for the next refinement job by replacing the WGHT instruction (if any)
by the one output by the program towards the end of the .res file.  This
procedure is adequate for most routine refinements.

   It may be desirable to use a scheme which does not give a flat analysis of
variance to emphasize particular features in the refinement; for example c =
+10 or -10 would weight up data at higher 2-theta, e.g. to perform a 'high-
angle' refinement (uncontaminated by hydrogen atoms which contribute little at
higher diffraction angle) prior to a difference electron density synthesis
(FMAP 2) to locate the hydrogens.  The exponential weights which are obtained
when c is positive were advocated by J.D. Dunitz and P. Seiler, Acta Cryst.,
B29 (1973) 589-595.  Weighting up the high angle reflections will in general
give X-ray atomic coordinates which are closer to those from neutron
diffraction.

   Refinement against F^2 requires different weights to refinement against F;
in particular, making all the weights equal ('unit weights'), although useful
in the initial stages of refinement against F, is NEVER a sensible option for
F^2.  If the program suspects that an unsuitable WGHT instruction has been
accidentally retained for a structure which had been refined previously with
SHELX-76 or the XLS program in the Siemens SHELXTL system, it will output a
warning message.


FVAR  osf [1]  free variables

The overall scale factor is followed by the values of the 'free variables'
fv(2) ...  The overall scale factor is given throughout as the square root of
the scale factor which multiplies Fc^2 in the least-squares refinement
[to make it similar to the scale factor in SHELX-76 which multiplied Fc], i.e.
osf^2*Fc^2 is fitted to Fo^2.
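
   To illustrate what fitting osf^2*Fc^2 to Fo^2 means, the following Python
sketch finds the value k = osf^2 which minimizes the weighted sum of
w*(Fo^2 - k*Fc^2)^2 for fixed Fc^2, and returns its square root.  This is
ordinary linear least squares for a single parameter and is NOT the procedure
used inside SHELXL-93, which refines the scale factor together with the other
parameters; the function name and arguments are hypothetical.

```python
import math

def estimate_osf(fo_sq, fc_sq, weights):
    """Weighted least-squares estimate of osf, where osf^2*Fc^2 is fitted to Fo^2."""
    num = sum(w * fo * fc for w, fo, fc in zip(weights, fo_sq, fc_sq))
    den = sum(w * fc * fc for w, fc in zip(weights, fc_sq))
    if den <= 0.0 or num <= 0.0:
        raise ValueError("cannot estimate a positive scale factor from these data")
    k = num / den                       # k multiplies Fc^2
    return math.sqrt(k)                 # osf is quoted as the square root of k
```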

   SHELXL-93 goes to some trouble to ensure that the initial value of the
scale factor has very little influence.  Firstly a quick structure factor
summation with a small fraction of the total number of reflections is
performed to estimate a new scale factor.  If the values differ substantially
then the new value is used.  Secondly the scale factor is factored out of the
least-squares algebra so that, although it is still refined, the only
influence the previous value has is an indirect one via the weighting scheme
and extinction correction.

   Before calculating electron density maps and the analysis of variance, and
writing the structure factor file ('name.fcf'), the observed F^2 values and
esd's are brought onto an absolute scale by dividing by the scale factor.

   The free variables allow extra constraints to be applied to the atoms, e.g.
for common site occupation factors or isotropic displacement parameters, and
may be used in conjunction with the SUMP, DFIX and CHIV restraints.  If there
is more than one FVAR instruction, they are concatenated; they may appear
anywhere between UNIT and HKLF (or END).



                           LISTS AND TABLES

The esds in bond lengths, angles and torsion angles, chiral volumes, Ueq, and
coefficients of least-squares planes and deviation of atoms from them, are
estimated rigorously from the full correlation matrix (an approximate
treatment is used for the angles between least-squares planes).  The errors in
the unit-cell dimensions (specified on the ZERR instruction) are taken into
account exactly in estimating the esds in bond lengths, bond angles, torsion
angles and chiral volumes.  Correlation coefficients between the unit-cell
dimensions are ignored except when determined by crystal symmetry (so that for
a cubic crystal the cell esds contribute to errors in bond lengths and chiral
volumes but not to the errors in bond angles or torsion angles).  The (rather
small) contributions of the unit-cell errors to the esds of quantities
involving least-squares planes are estimated using an isotropic approximation.

   For full-matrix refinement, the esds are calculated after the final
refinement cycle.  In the case of BLOC'ed refinement, the esds are calculated
after every cycle (except that esds in geometric parameters are not calculated
after pure Uij/sof cycles etc.), and the maximum estimate of each esd is
printed.  This prevents some esds being underestimated because not all of the
relevant atoms were refined in the last cycle, but at the cost of
overestimating all the esds if the R-factor drops appreciably during the
refinement.  Thus large structures should first be refined almost to
convergence (either by CGLS or L.S./BLOC), and then a separate final blocked
refinement job performed to obtain the final parameters and their esds.  It
is important that there is sufficient overlap between the blocks to enable
every esd to be estimated with all contributing atoms refining in at least
one of the refinement cycles.


BOND  atomnames

BOND outputs bond lengths for all bonds defined in the connectivity list which
involve two atoms named on the same BOND instruction.  Angles are output for
all pairs of such bonds involving a common atom.  Numerical parameters on a
BOND instruction are ignored, but not treated as errors (for compatibility
with SHELX-76).  A BOND instruction with no parameters outputs bond lengths
(and the corresponding angles) for ALL bonds in the connectivity table, and
'BOND $H' on its own includes all bonds to hydrogens as well (but - as usual -
the hydrogens are not included in the connectivity table, so bonds involving
symmetry equivalent hydrogens are not included).  Other element names may also
be referenced globally by preceding them with a '$' on a BOND instruction.
BOND is set automatically by ACTA, and the bond lengths and angles are written
to the .cif file.


CONF  atomnames

The named atoms define a chain of at least four atoms.  CONF generates a list
of torsion angles with esd's for all torsion angles defined by this chain.
CONF is often used to specify an n-membered ring, in which case the first
three atoms must be named twice (n+3 names in all). If no atoms are specified,
all possible torsion angles not involving hydrogen are generated from the
connectivity array.  The torsion angles generated by CONF are also written to
the .cif file if an ACTA instruction is present. All torsion angles calculated
by SHELXL-93 follow the conventions defined by F.H. Allen and D. Rogers, Acta
Cryst., B25 (1969) 1326.
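
   The 'n+3 names' rule for rings is easily automated when preparing an .ins
file.  The Python sketch below (not part of SHELXL-93; both function names are
invented for this example) builds the CONF instruction for a ring and lists
the successive groups of four atoms along a chain.

```python
def conf_for_ring(ring_atoms):
    """Return a CONF instruction for an n-membered ring (n+3 atom names)."""
    if len(ring_atoms) < 3:
        raise ValueError("a ring needs at least three atoms")
    chain = list(ring_atoms) + list(ring_atoms[:3])   # repeat the first three names
    return "CONF " + " ".join(chain)

def torsions_in_chain(chain):
    """Return every run of four consecutive atom names in the chain."""
    return [tuple(chain[i:i + 4]) for i in range(len(chain) - 3)]
```

For a six-membered ring C1 ... C6 the instruction contains nine names, and the
nine-atom chain contains six runs of four consecutive atoms, i.e. one torsion
angle about each ring bond.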


MPLA  na  atomnames

A least-squares mean plane is calculated through the first na of the named
atoms, and the equation of the plane and the deviations of all the named
atoms from the plane are listed with estimated standard deviations (from the
full covariance matrix).  The angle to the previous least-squares plane (if
any) is also calculated, but some approximations are involved in estimating
its esd.  na must be at least 3.  If na is omitted the plane is fitted to all
the atoms specified.


RTAB  codename  atomnames

Chiral volumes (one atomname), bonds (two), angles (three) and torsion angles
(four atomnames) are tabulated compactly against residue name and number.
codename is used to identify the quantity being printed; it must begin with a
letter and not be longer than 4 characters (e.g. 'Psi' or 'omeg').  There may
not be more than 4 atom names.  It is assumed that the atoms have the same
names in all the required residues.  For chiral volumes only, the necessary
bonds must be present in the connectivity list (the same conventions are
employed as for CHIV).  Since the atoms do not themselves have to be in the
same residue (it is sufficient that the names match), the residue name (if
any) is printed as that of the first named atom for distances, the second for
angles, and the third in the case of torsion angles.  The latter should be
consistent with generally accepted conventions for proteins.  A typical
application of RTAB for small-molecule structures is the tabulation of
hydrogen-bonded distances and angles (with esd's) since these will not usually
appear in the tables created automatically by BOND.

   If RTAB refers to more than one residue (e.g. RTAB_*), it is ignored for
those residues in which not all the required atoms can be found (e.g. some of
the main chain torsional angles for the terminal residues in a protein).


LIST  m [#]  mult [1]

m = 0:  No action.

m = 1:  Write h,k,l, Fo, Fc and phase (in degrees) to .fcf in XPLOR format.
        Only unique reflections after removing systematic absences, scaling
        [to an absolute scale of F(calc)], applying dispersion and extinction
        or SWAT corrections (if any), and merging equivalents including
        Friedel opposites are included.  If Fo^2 was negative, F(obs) is set
        to zero.  Reflections suppressed by OMIT or SHEL [or reserved for
        R(free)] are not included.

m = 2:  Write h,k,l, Fo, sigma(Fo) and phase angle in degrees in FORMAT(3I4,
        2F8.2,I4) for the reflection list as defined for m = 1.

m = 3:  Write h,k,l, Fo, sigma(Fo), A(real) and B(imag) in FORMAT(3I4,4F8.2),
        the reflections being processed exactly as for m = 1.

m = 4:  Write h,k,l, Fc^2, Fo^2, sigma(Fo^2) and a one-character status flag.
        Fo^2 are scaled to Fc^2 and possibly corrected for extinction, but no
        corrections have been made for dispersion and no further merging has
        been performed.  FORMAT(3I4,2F12.2,F10.2,1X,A1) is employed.  The
        status flag is 'o' (observed), 'x' [observed but suppressed using
        'OMIT h k l', SHEL or reserved for R(free)], or '<' (Fo^2 is less than
        t*sigma(Fo^2), where t is one half of the F-threshold s specified on
        an OMIT instruction).

m = 5:  Write h,k,l, Fo, Fc, and phase in degrees in FORMAT(3I4,2F10.2,F7.2)
        for the reflection list as defined for m = 1.  Like the m = 1 option,
        this is intended for input to standard macromolecular FFT programs
        (such as W. Furey's PHASES program), thereby providing a route to a
        graphical display of the electron density.

For m = 4 only, mult is a constant multiplicative factor applied to all the
quantities output (except the reflection indices!), and may be used if there
are scaling problems.  For other m options mult is ignored.  For m = 2, 3 or
4 only, a blank line is included at the end of the file as a terminator.  The
reflection list is written to the file 'name.fcf', which is in CIF format for
m = 3 or 4; however the actual reflections are always in fixed format except
for m = 1.  The program CIFTAB can - amongst other options - read the m = 4
output and print Fo/Fc/sigma(F) tables in compact form on an HP-compatible
laser-printer (see Appendix C).
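
   Because the reflection records are in fixed format, they are best read by
column position rather than by splitting on blanks (adjacent fields may run
together).  The Python sketch below handles one m = 4 record laid out as
FORMAT(3I4,2F12.2,F10.2,1X,A1); it is an illustration only, the function
names are hypothetical, and any header lines in the .fcf file must be skipped
by the caller.

```python
def parse_list4_record(line):
    """Split one LIST 4 reflection record into its fields by column."""
    h, k, l = (int(line[i:i + 4]) for i in (0, 4, 8))   # 3I4
    fc_sq = float(line[12:24])                          # F12.2
    fo_sq = float(line[24:36])                          # F12.2
    sigma = float(line[36:46])                          # F10.2
    flag = line[47:48]                                  # 1X,A1: 'o', 'x' or '<'
    return h, k, l, fc_sq, fo_sq, sigma, flag

def format_list4_record(h, k, l, fc_sq, fo_sq, sigma, flag):
    """Write the same fields back in the same fixed layout."""
    return f"{h:4d}{k:4d}{l:4d}{fc_sq:12.2f}{fo_sq:12.2f}{sigma:10.2f} {flag}"
```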


ACTA  labelcode [1]

A 'Crystallographic Information File' file 'name.cif' is created in self-
defining STAR format.  This ASCII file is suitable for data archiving, network
transmission, and (with suitable additions) for direct submission for
publication.  ACTA automatically sets the BOND, FMAP 2, PLAN and LIST 4
instructions, and may not be used with other FMAP or LIST instructions or with
a positive OMIT s threshold.  A warning message appears if the cell contents
on the UNIT instruction are not consistent with the atom list, because they
are used to calculate the density etc. which appears in the '.cif' output
file.

   If labelcode is set to one (or is absent) the atom labels in the cif file
are generated assuming that typical small-molecule atom names have been used,
i.e. CE1 is translated as Ce1, and if residue numbers are used they are
appended directly, e.g. C2B_3; residue classes are not included in the atom
labels.  A labelcode of 2 implies that atoms have been named in a typical
protein manner; CE1 in residue number 34 which is of class PHE generates the
atom label C_E1_Phe_34.


SIZE dx dy dz

dx, dy and dz are the three principal dimensions of the crystal in mm, as
usually quoted in publications.  This information is written to the '.cif'
file (and also used by SHELXA).


TEMP T [20]

Sets the temperature T of the data collection in degrees Celsius.  This is
reported to the .cif file and used to set the default isotropic U values for
all atoms.  TEMP must come before all atoms in the .ins file.  TEMP also sets
the default X-H bond lengths (see AFIX) which depend slightly on the
temperature because of librational effects.  The default C-H bond lengths and
default U-values are rounded to two decimal places so that they may be quoted
more easily.


WPDB n [1]

Writes the refined coordinates to a '.pdb' file.  If n is positive hydrogen
atoms are omitted; if |n| is 1 all atoms are converted to isotropic and ATOM
statements generated, and if |n| is 2 ANISOU statements are also generated
(but the equivalent B value is still used on the ATOM statement).  The atom
names and residue classes and numbers should conform to PDB conventions. This
provides a direct link to XPLOR and other programs which use the official
(Brookhaven) dialect of the PDB format.  Note however that XPLOR requires that
solvent (water) atoms are each placed in a separate residue; this is not
standard PDB format and is not generated by SHELXL-93.
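
   The B values on ATOM statements are related to displacement parameters U
by the standard relation B = 8*pi^2*U.  The Python sketch below shows this
conversion; it is not taken from the SHELXL-93 source, the function names are
hypothetical, and the equivalent U is computed here on the assumption that
the diagonal elements supplied refer to an orthogonalized Uij tensor.

```python
import math

def u_to_b(u):
    """Convert a displacement parameter U (Angstrom^2) to B = 8*pi^2*U."""
    return 8.0 * math.pi ** 2 * u

def u_equivalent(u11, u22, u33):
    """One third of the trace of an orthogonalized Uij tensor (assumed input)."""
    return (u11 + u22 + u33) / 3.0
```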



               FOURIER, PEAK SEARCH AND LINE PRINTER PLOTS


FMAP  code [2]  axis [#]  nl [53]

The unique unit of the cell for performing the Fourier calculation is set up
automatically unless specified by the user using FMAP and GRID; the value of
axis must be non-zero to suppress the automatic selection.  The program
chooses a 53 x 53 x nl or 103 x 103 x nl grid depending on the resolution of
the data.  axis is 1, 2 or 3 to define the direction perpendicular to the
layers.  Dispersion corrections are applied (so that the resulting electron
density is real) and Friedel opposites are merged after the least-squares
refinement and analysis of variance but before calculating the Fourier
synthesis.  This will improve the map (and bring the maximum and minimum
residual density closer to zero) compared with SHELX-76.  In addition, since
usually all the data are employed, reflections with sigma(F) relatively large
compared with Fc are weighted down.  This should be better than the use of an
arbitrary cutoff on Fo/sigma(F).  The rms fluctuation of the map relative to
the mean density is also calculated; in the case of a difference map this
gives an estimate of the 'noise level' and so may be used to decide whether
individual peaks are significant.

   If code is made negative, both positive and negative peaks are included in
the list, sorted on the absolute value of the peak height. This is intended to
be useful for neutron diffraction data.

code = 2: Difference electron density synthesis with coefficients (Fo-Fc) and
          phases phi(calc).

code = 3: Electron density synthesis with coefficients Fo and phases phi(calc).

code = 4: Electron density synthesis with coefficients (2Fo-Fc) and phases
          phi(calc).

F(000) is included in the Fourier summations for code = 3 and 4.
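
   The three syntheses differ only in the amplitude attached to the calculated
phase.  The Python sketch below forms the complex Fourier coefficient for one
reflection; it is an illustration with a hypothetical function name, and it
omits the scaling, merging and down-weighting of weak reflections which the
program applies before the synthesis.

```python
import math

def fmap_coefficient(code, fo, fc, phase_deg):
    """Return the complex coefficient for FMAP code 2, 3 or 4 (sign of code ignored)."""
    amplitudes = {2: fo - fc, 3: fo, 4: 2.0 * fo - fc}
    try:
        amplitude = amplitudes[abs(code)]
    except KeyError:
        raise ValueError("code must be +/-2, +/-3 or +/-4") from None
    phi = math.radians(phase_deg)       # phi(calc) is supplied in degrees
    return complex(amplitude * math.cos(phi), amplitude * math.sin(phi))
```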


GRID  sl [#]  sa [#]  sd [#]  dl [#]  da [#]  dd [#]

Fourier grid, when not set automatically.  Starting points and increments
multiplied by 100.  s means starting value, d increment, l is the direction
perpendicular to the layers, a is across the paper from left to right, and d
is down the paper from top to bottom.  Note that the grid is 53 x 53 x nl
points, i.e. twice as large as in SHELX-76, and that sl and dl need not be
integral.  The 103 x 103 x nl grid is only available when it is set
automatically by the program (see above).


PLAN  npeaks [20]  d1 [#]  d2 [#]

If npeaks is positive a Fourier peak list is printed and written to the .res
file; if it is negative molecule assembly and line printer plots are also
performed.  For negative npeaks, distances involving peaks which are less than
r1+r2+d1 (the covalent radii r are defined via SFAC; 1 and 2 refer to the two
atoms concerned) are printed and used to define 'molecules' for the line
printer plots.  Distances involving atoms and/or peaks which are less than
r1+r2+|d2| are considered to be 'non-bonded interactions'; however distances
in which both atoms are hydrogen or at least one is carbon (recognised by SFAC
label 'C') are ignored.  The default values of d1 and d2 (for negative npeaks)
are 0.5 and 2.0 resp.  These non-bonded interactions are ignored when defining
molecules, but the corresponding atoms and distances are included in the line
printer output.  Thus an atom or peak may appear in more than one map, or more
than once on the same map. A table of the appropriate coordinates and symmetry
transformations appears at the end of each molecule.

   Negative d2 includes hydrogen atoms in the line printer plots, otherwise
they are left out (but included in the distance tables).  For the purposes of
the PLAN instruction, a hydrogen atom is one with a radius of less than 0.4
Angstrom.  Peaks are assigned the radius of SFAC type 1, which is usually set
to carbon.  Peaks appear on the printout as numbers, but in the .res file they
are given names beginning with 'Q' and followed by the same numbers.  Since
only three digits are available for the number, the absolute value of npeaks
may not exceed 999.  Peak heights are also written to the .res file (after the
sof and dummy U values) in electrons per cubic Angstrom.  See also MOLE for
forcing molecules (and their environments) to be printed separately.

   A default npeaks of +20 is set by FMAP; to obtain line printer plots, an
explicit PLAN instruction with negative npeaks is required.  If npeaks is
positive the nearest unique atoms to each peak are tabulated, together with
the corresponding distances.  A table of shortest distances between peaks is
also produced.  If npeaks is positive d1 and d2 have a different meaning.  The
default of d1 is then -1 and causes the full peaklist to appear in the .res
file.  If it is positive (say 2.3) then the full peaklist is still printed in
the .lst file, but only suitable candidates for (full occupancy) water
molecules appear in the .res file (with SFAC 4 and U set to 0.75).  The water
molecules must be less than 4 Angstroms from an atom which begins with 'O', 'N'
or 'W', and may not be less than d2 (default 3.0) from any atom which does not
begin with 'O', 'N', 'W' or 'H', and may not be nearer than d1 to any 'O', 'N'
or 'W' atom or potential waters which have larger peak heights.  This facility
is intended for extending the water structure of proteins in connection with
BUMP and SWAT.  To include the waters in the next refinement job, their names
need to be changed and they need to be moved to before the HKLF instruction at
the end of the atom list in the new .ins file. It is recommended that the last
water is called 'LAST' on the ISOR and CONF instructions so that this name
does not need to be updated each job.
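
   The distance criteria for potential waters can be summarized by the Python
sketch below.  It is a simplified illustration and not the code used by
SHELXL-93: coordinates are assumed to be Cartesian (in Angstroms), symmetry
equivalents are ignored, 'potential waters which have larger peak heights' is
interpreted as peaks already accepted, and all names are hypothetical.

```python
import math

POLAR = ("O", "N", "W")                 # first letters of possible H-bond partners

def select_waters(peaks, atoms, d1=2.3, d2=3.0, max_polar=4.0):
    """peaks: [(height, (x, y, z))]; atoms: [(name, (x, y, z))]; distances in Angstroms."""
    accepted = []
    for height, xyz in sorted(peaks, key=lambda p: -p[0]):     # highest peak first
        polar = [math.dist(xyz, pos) for name, pos in atoms
                 if name[0].upper() in POLAR]
        other = [math.dist(xyz, pos) for name, pos in atoms
                 if name[0].upper() not in POLAR + ("H",)]
        near_polar = any(d < max_polar for d in polar)         # within 4 A of O, N or W
        clash_polar = any(d < d1 for d in polar)               # too close to O, N or W
        clash_other = any(d < d2 for d in other)               # too close to e.g. carbon
        clash_water = any(math.dist(xyz, w) < d1 for _, w in accepted)
        if near_polar and not (clash_polar or clash_other or clash_water):
            accepted.append((height, xyz))
    return accepted
```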


MOLE n

Forces the following atoms, and atoms or peaks that are bonded to them, into
molecule n of the PLAN output.  n may not be greater than 99.  n = 99 has a
special meaning: the 'line printer plot' is suppressed for the following
atoms, but the table of distances is still printed.  This is sometimes useful
for saving paper, e.g. for solvent water in protein structures.



                        FURTHER INFORMATION

The author may be contacted via email ( gsheldr @ ibm.gwdg.de ) or fax
(Germany  [from the US 01149-, from the UK 01049- and from most other
countries 0049-] -551-393373).  He would be particularly interested to hear of
any problems in the installation and use of the program and of any errors or
lack of clarity in this instruction summary.

   A satirical description of the SHELX programming philosophy may be found in
Current Contents (Physical, Chemical and Earth Sciences), Vol. 29, Number 41
(1989) page 14.  The task of writing this program was made considerably
easier by the incorporation of many ideas and algorithms proposed by Durward
Cruickshank, Howard Flack, Wayne Hendrickson, John Konnert, John Rollett,
Dieter Schwarzenbach and David Watkin, that had already been tested by these
authors in their own excellent programs.  Ward Robinson proved invaluable in
making the documentation comprehensible.  My research group in Goettingen had
no choice but to suffer the 'alpha-test', and ca. 100 colleagues at different
institutions around the world provided invaluable feedback in the 'beta-' and
'gamma-tests'.  Syd Hall was most helpful and persuasive in the implementation
of CIF format.  The IUCr kindly gave permission for scattering factors,
absorption coefficients etc. from the new Volume C of International Tables to
be used prior to publication, and should also have the final word of caution:

   "Thoughtless use of established procedures in widely distributed software
may be as harmful as the natural tendency of most people to prefer results in
agreement with preconceived ideas",  D. Schwarzenbach et al., Report of the
IUCr Subcommittee on Statistical Descriptors, Acta Cryst., A45 (1989) 63-75.

                                         George Sheldrick




             Appendix A - Absorption corrections with SHELXA-93
             --------------------------------------------------

   SHELXA-93 is currently under development.  Licensed users of SHELXL-93
will be informed when it is ready for release.



          Appendix B - PDB to '.ins' format conversion with PDBINS
          --------------------------------------------------------

   The auxiliary program PDBINS reads a PDB format file and writes a '.ins'
file for SHELXL-93.  There are no command line options and the program runs
interactively, asking the user to supply missing information.  It is assumed
that the PDB file conforms to the specifications in the PDB documentation
'Atomic Coordinate and Bibliographic Entry Format Description' of Feb. 1992.
Atom lists from XPLOR and some other programs can be read, but if the
conventions for transforming from orthogonal to crystallographic coordinates
do not correspond to those in the above document, then appropriate SCALEn
records must precede the ATOM or HETATM records.  Note that SHELXL-93, unlike
XPLOR, does not allow the same residue number to be used with different
residue classes, and that the SHELXL-93 residue classes must begin with a
letter and the SHELXL-93 residue numbers must be pure numbers and may not
contain non-digits.  XPLOR usually requires each solvent molecule to be
defined as a separate residue, whereas for SHELXL-93 the solvent may either
be treated in this way or assigned to the (default) residue number zero.
PDBINS may renumber the residues if the current numbering scheme would be
inconvenient for SHELXL-93.  If there is more than one chain (or molecule) in
the asymmetric unit, different residue numbers are required and there should
be a gap of one or more residue numbers between different chains (otherwise
there are problems with C_- and N_+ etc.).

   PDBINS reads restraints and other standard instructions from a residue
dictionary file 'shelxl.dic'.  Users are encouraged to use this file as a
model for their own dictionary files (which should be given different names).
If the protein contains disordered or non-standard residues, some editing
of the resulting '.ins' file will be required before SHELXL-93 can be run.
The order of atoms in each residue is irrelevant for SHELXL-93, but for PDBINS
there are advantages in putting the atom named 'N' (the peptide nitrogen)
first in each residue (as is normal practice).  Some restraints may be missing
if non-standard residue or atom names are used; note that PDBINS expects the
C-terminal oxygen to be called OXT and not to be put in a separate residue.
PDBINS converts OT to OXT for the C-terminus and CD to CD1 in isoleucine in
accordance with PDB rules.  In addition, PDBINS generates restraints for
disulfide bridges linking residues CYS (or CSS).  PDBINS uses the first
character of the atom name as the element name, and recognizes two character
element names if they start one column to the left, in accordance with PDB
rules.
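
   The column rule for element names may be illustrated as follows.  The
Python sketch below works on the four-character atom name field of a PDB
record (columns 13 to 16); it is not the PDBINS code, the function name is
hypothetical, and it would misread four-character hydrogen names (such as
HD11) which also start in the first column of the field.

```python
def element_from_pdb_name(name_field):
    """Guess the element from the 4-character PDB atom name field."""
    field = name_field.ljust(4)[:4]
    if field[0].isalpha():
        # Name starts one column to the left: two-character element (e.g. FE)
        return field[:2].strip().upper()
    # Usual case: the element is the single character in the second column
    return field[1].upper()
```

Thus ' CA ' is read as an alpha-carbon (element C), but 'CA  ' as calcium.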

   SHELXL-93 may be run with Friedel opposites NOT averaged (in which case no
MERG instruction is needed; this is the correct option when significant
anomalous scatterers e.g. iron (using CuKa) are present) or using MERG 4 to
average Friedel opposites and set the dispersion terms f" to zero.  Note that
XPLOR and many other macromolecular programs, unlike SHELXL-93, require a
reflection list in which Friedel opposites have already been merged.

   PDBINS creates a '.pdl' file which gives residue names and compositions
and other useful information; this file should be printed out and retained for
reference during the refinement.

   PDBINS is written in essentially machine independent FORTRAN-77.  Before
compiling it the comments in the source should be consulted and the following
three changes may be required: (a) it is advisable to open the .ins and .pdl
files with CARRIAGECONTROL='LIST' for VAX/VMS systems, (b) the only format
statement in subroutine GETANS may be changed to end with ',$' to tidy up
the console output for VAX/VMS and some other systems and (c) the variable CR
may be set to CHAR(13) in the first executable statement in the main program
if the program is to run on a UNIX machine which shares a disk with MSDOS
machines using NFS network software.  This enables DOS programs such as the
Siemens SHELXTL-PC system to read the files directly.  For pure UNIX and VMS
systems CR should be set to CHAR(32).



   Appendix C - Tables production from '.cif' and '.fcf' files using CIFTAB
   ------------------------------------------------------------------------

   The auxiliary program CIFTAB is provided with SHELXL-93 to facilitate the
transition to CIF.  In contrast to the actual SHELX programs, the FORTRAN code
is not intended to be treated as sacrosanct; it may be modified and extended
by users as the need arises.  CIFTAB should be run interactively from a
console; there are no command line options.  The program is available as an
essentially computer independent FORTRAN-77 source, or in precompiled form
for certain hardware configurations.  Before compiling it on a new system the
comments in the source should be consulted (see also the last paragraph of the
PDBINS description immediately above).  CIFTAB reads a '.cif' or '.fcf' file
written by SHELXL-93 and provides the following facilities, which may be
selected from a menu:

1. Some of the items which are unknown to SHELXL-93 and so are present as '?'
in the .cif file may be replaced by the corresponding items from other CIF
files, written for example by diffractometer control, data reduction or
absorption correction programs.  Only non-looped '?' items are resolved in
this way.

2. Structure factor tables may be output in compressed form on HP-Laserjet
compatible printers (or to file in the corresponding format for network
transmission to such a printer).

3. Tables of crystal data, atom parameters, bond lengths and angles,
anisotropic displacement parameters and hydrogen atom coordinates may be
produced in a format specified in a file 'ciftab.???' (where ??? is any three
letter combination).  A standard ASCII file 'ciftab.def' is provided; users
may use it as a model for preparing files to conform with the various journal
requirements etc.  This means that it is not necessary to modify and recompile
the program each time a journal changes its rules.  Extra files are provided
for users of the Siemens SHELXTL system which produce '.tex' files for
printing via the Siemens XTEXT program; this includes the production of tables
in German and provides much more flexibility in the handling of Greek and
other special characters.

   The format file is simply copied to the output file, except that directives
(lines beginning with '?' or '$') have a special meaning, '\n\' (where n is a
number) is replaced by the ASCII character n (e.g. \12\ starts a new page),
and CIF identifiers (which begin with the character '_') are replaced by the
appropriate number or string from the CIF file. CIF identifiers may optionally
be followed (without an intervening space) by one or more of: '<n', '>n', ':n'
and '=n' where n is an integer; the CIF identifier (including qualifier) must
be terminated by one space, which is not copied to the output file.  '<n' left
justifies a string so that it starts in column n of the output file.  '>n'
right justifies a string or justifies a number so that the figure immediately
to the left of the decimal point appears in column n; if there is no decimal
point then the last digit appears in column n.  In either case the standard
deviation (if any) extends to the right with brackets but without intervening
spaces.  If '<n' and '>n' are both absent, the CIF item is inserted at the
current position.  If ':n' is absent the item is treated as a
string (see above), otherwise it is treated as a number; n is the power of 10
with which the CIF item should be multiplied, and is useful for converting
Angstroms to pm or printing coordinates as integers; n may be negative, zero
or positive.  '=n' rounds the CIF item (after application of ':n') so that
there are not more than n figures after the decimal point; n must be zero or
positive.
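
   The combined effect of the ':n' and '=n' qualifiers on a number can be
sketched in Python as shown below.  This is an illustration only and not the
CIFTAB code: the function name is hypothetical, standard deviations are not
handled, and trailing zeros are retained although CIFTAB may treat them
differently.

```python
def scale_and_round(value, power, decimals):
    """Multiply value by 10**power (':n'), then round to decimals places ('=n')."""
    if decimals < 0:
        raise ValueError("the number of decimal places must be zero or positive")
    scaled = value * 10.0 ** power      # e.g. power = 2 converts Angstroms to pm
    return f"{scaled:.{decimals}f}"
```

For example, a bond length of 1.5234 Angstroms with ':2' and '=1' would be
printed in pm with one decimal place, and a coordinate with ':4' and '=0' as
an integer.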

   A line beginning with 'loop_' is repeated until the corresponding loop in
the CIF file is exhausted; all the CIF items in the line must be in the same
loop in the CIF input file.  All CIF data names and the string 'loop_' must be
given as lower case in the format file; in the CIF input file the standard CIF
rules apply.

   A line containing at least 4 consecutive underscores is copied to the
output file unchanged, and may be used for drawing a horizontal line.  There
are also two pseudo-CIF-identifiers: '_tabno' is the number of the table, and
'_comno' is a number to identify the compound.  Both may be set via the
CIFTAB menu.  '_tabno' but not '_comno' is incremented each time it is used.

   An underscore '_' followed by a space may be used to continue on the next
line without creating a new line in the output file.  Lines beginning with
question marks are output to the console (without the leading question mark)
as questions; if the answer to the question is not 'Y' or 'y', everything in
the format file is skipped until the next line which begins with a question
mark.  Lines beginning with a dollar '$' are not interpreted as text, but are
scanned for the following strings (upper or lower case, quotes not essential):

 'xtext': output should be formatted for the Siemens SHELXTL xtext program.
 'xtext,deutsch': as above, but translated into German.

The above directive, if present, should be the first line of the format file.

   The directive $symops:n, where n is an integer, prints the symmetry
operations used to generate equivalent atoms, starting each line of text in
column n.  These operators are referenced by '#m' (where m is an integer)
after the atom name.  The line beginning '$symops:n' usually follows the table
of selected bond lengths and angles, but could also be used for a torsion
angles table.

   The remaining directives may appear at any point in the format file except
immediately after a continuation line marker, but always on a line beginning
with '$'.

 'h=none': leave out all hydrogen atoms.
 'h=only': leave out all non-hydrogen atoms.
 'h=free': leave out riding or rigid group hydrogens but include the rest.
 'h=all': include all hydrogen and all other atoms.

The hydrogen atom directives apply only to coordinates tables; hydrogen atoms
are recognised by the .._type_symbol 'H'.  The publication flags can be used
to control which hydrogen atoms appear in tables of bond lengths, angles etc.

 'brack': Atom names should include brackets (if present in the CIF file).
 'nobrack': Brackets are deleted from the atom names.
 'flag': Only output items for which the publication flag is 'Y' or 'y'.
 'noflag': Output all items, ignoring the publication flag.

The default settings are '$h=none,brack,flag'.  The standard tables file
'ciftab.def' illustrates the use of most of these facilities.  CIFTAB extends
some of the standard CIF codes to make them more suitable for tables, and also
takes special action when items such as _refine_ls_extinction_coef are missing
or undefined.



       Appendix D - Example of an Acta Cryst. paper in '.cif' format
       -------------------------------------------------------------

The following example is based on a paper submitted to Acta Crystallographica
in CIF format; it has been edited slightly since submission.

data_global

#============================================================================

# 1. SUBMISSION DETAILS

_publ_contact_author          # Name and address of author for correspondence
;
      Ehmke Pohl
      Institut f\"ur Anorganische Chemie
      Universit\"at G\"ottingen
      Tammannstr. 4
      3400 G\"ottingen
      Bundesrepublik Deutschland
;
_publ_contact_author_phone        '049 551 393075'
_publ_contact_author_fax          '049 551 393373'
_publ_contact_author_email        epohl@ibm.gwdg.de

_publ_requested_journal           'Acta Crystallographica C'
_publ_requested_coeditor_name     ?

_publ_contact_letter
;
  Please consider this CIF submission for publication as a Regular Structure
  Paper in Acta Crystallographica C.
;

#============================================================================

# 2. PROCESSING SUMMARY (IUCr Office Use Only)

_journal_date_recd_electronic     ?

_journal_date_to_coeditor         ?
_journal_date_from_coeditor       ?
_journal_date_accepted            ?

_journal_date_printers_first      ?
_journal_date_printers_final      ?
_journal_date_proofs_out          ?
_journal_date_proofs_in           ?

_journal_coeditor_name            ?
_journal_coeditor_code            ?
_journal_coeditor_notes
 ?

_journal_techeditor_code          ?
_journal_techeditor_notes
 ?

_journal_coden_ASTM               ?
_journal_name_full                ?
_journal_year                     ?
_journal_volume                   ?
_journal_issue                    ?
_journal_page_first               ?
_journal_page_last                ?

_journal_suppl_publ_number        ?
_journal_suppl_publ_pages         ?

#============================================================================

# 3. TITLE AND AUTHOR LIST

_publ_section_title
;
Structures of Aminotriphenylphosphonium Bromide and Hexachloroantimonate
;

# The loop structure below should contain the names and addresses of all
# authors, in the required order of publication. Repeat as necessary.

loop_
 _publ_author_name
 _publ_author_address
     'Pohl, Ehmke'
;     Institut f\"ur Anorganische Chemie
      Universit\"at G\"ottingen
      Tammannstr. 4
      3400 G\"ottingen
      Bundesrepublik Deutschland
;
     'Gosink, Hans J.'
;     Institut f\"ur Anorganische Chemie
      Universit\"at G\"ottingen
      Tammannstr. 4
      3400 G\"ottingen
      Bundesrepublik Deutschland
;
     'Herbst-Irmer, Regine'
;     Institut f\"ur Anorganische Chemie
      Universit\"at G\"ottingen
      Tammannstr. 4
      3400 G\"ottingen
      Bundesrepublik Deutschland
;
     'Noltemeyer, Mathias'
;     Institut f\"ur Anorganische Chemie
      Universit\"at G\"ottingen
      Tammannstr. 4
      3400 G\"ottingen
      Bundesrepublik Deutschland
;
     'Roesky, Herbert W.'
;     Institut f\"ur Anorganische Chemie
      Universit\"at G\"ottingen
      Tammannstr. 4
      3400 G\"ottingen
      Bundesrepublik Deutschland
;
     'Sheldrick, George M.'
;     Institut f\"ur Anorganische Chemie
      Universit\"at G\"ottingen
      Tammannstr. 4
      3400 G\"ottingen
      Bundesrepublik Deutschland
;

#============================================================================

# 4. TEXT

_publ_section_abstract
;
The structures of aminotriphenylphosphonium bromide and hexachloroantimonate
are stabilized by hydrogen bonds.
;
_publ_section_comment
;
The aminotriphenylphosphonium bromide (I) and hexachloroantimonate (II)
have been structurally characterized. There are two formula units of (II)
in the asymmetric unit. Both compounds form hydrogen bonds from the amino
hydrogen atoms to the anions. The positions of the amino hydrogen atoms were
refined with distance restraints for the N-H distances.  The N-Br distances
in I are 3.310(2) and 3.373(2) \%A; the N-Cl distances in II are 3.594(4),
3.563(4), 3.740(5) and 3.537(5) \%A.  All other distances and angles are
generally as expected. They correspond well with values found in the
aminotriphenylphosphonium chloride (Hursthouse, Walker, Warrens @ Woollins,
1985), the aminotriphenylphosphonium (1,2,-bis(benzamid-2'-olato)phenyl-
N,N',O,O')nitrido osmium (IV) (Barner, Collins, Mapes @ Santasiero, 1986)
and the aminotriphenylphosphonium (di(thiazane)-3-eno-N,S)-thiosulfato-
triphenyl-phosphine platinum (Hursthouse, Short, Kelly @ Woollins, 1988).
;

_publ_section_experimental
;
Data were collected by the real-time learnt profile method (Clegg, 1981).
Scattering factors, dispersion corrections and absorption coefficients were
taken from International Tables for Crystallography, Vol. C. (1992), tables
6.1.1.4, 4.2.6.8 and 4.2.4.2 respectively.  Since I crystallizes in a polar
space group, polar axis restraints were applied by the method of Flack @
Schwarzenbach (1988) and the absolute structure of the crystal used for the
investigation was established as described by Flack (1983).
;

_publ_section_references
;
Barner, J.C., Collins, T.J., Mapes, B.E. @ Santasiero, B.D. (1986).
Inorg. Chem. 25, 4322-4323.

Clegg, W. (1981). Acta Cryst. A37, 22-28.

Flack, H.D. (1983). Acta Cryst. A39, 876-881.

Flack, H.D. @ Schwarzenbach, D. (1988). Acta Cryst. A44, 499-506.

Hursthouse, M.B., Short, R.L., Kelly, P.F. @ Woollins, J.D. (1988).
Acta Cryst. C44, 1731-1733.

Hursthouse, M.B., Walker, N.P.C., Warrens, C.P. @ Woollins, J.D. (1985).
J. Chem. Soc., Dalton Trans., 1043-1047.

International Tables for Crystallography (1992). Vol. C. Dordrecht: Kluwer
Academic Publishers.

Sheldrick, G.M. (1990). Acta Cryst. A46, 467-473.

Sheldrick, G.M. (1993). In preparation for J. Appl. Cryst.
;

_publ_section_figure_captions
;
Fig.1 : Structure of I showing 50 % probability displacement ellipsoids.
The hydrogen atoms are omitted for clarity.

Fig.2 : Structure of II showing 50 % probability displacement ellipsoids.
The hydrogen atoms are omitted for clarity.
;

_publ_section_acknowledgements
;
This work was supported by the Deutsche Forschungsgemeinschaft and the
Fonds der Chemischen Industrie.
;

#============================================================================

data_alge

_audit_creation_method            SHELXL

_chemical_name_systematic
;
 Amino(triphenyl)phosphonium Bromide
;
_chemical_name_common             ?
_chemical_formula_moiety          ?
_chemical_formula_structural      ?
_chemical_formula_analytical      ?
_chemical_formula_sum             'C18 H17 Br N P'
_chemical_formula_weight          358.21
_chemical_melting_point           ?
_chemical_compound_source         ?

loop_
 _atom_type_symbol
 _atom_type_description
 _atom_type_scat_dispersion_real
 _atom_type_scat_dispersion_imag
 _atom_type_scat_source
 'C'  'C'   0.0033   0.0016
 'International Tables Vol C Tables 4.2.6.8 and 6.1.1.4'
 'H'  'H'   0.0000   0.0000
 'International Tables Vol C Tables 4.2.6.8 and 6.1.1.4'
 'P'  'P'   0.1023   0.0942
 'International Tables Vol C Tables 4.2.6.8 and 6.1.1.4'
 'N'  'N'   0.0061   0.0033
 'International Tables Vol C Tables 4.2.6.8 and 6.1.1.4'
 'Br'  'Br'  -0.2901   2.4595
 'International Tables Vol C Tables 4.2.6.8 and 6.1.1.4'

_symmetry_cell_setting            Orthorhombic
_symmetry_space_group_name_H-M    Pna2(1)

loop_
 _symmetry_equiv_pos_as_xyz
 'x, y, z'
 '-x, -y, z+1/2'
 'x+1/2, -y+1/2, z'
 '-x+1/2, y+1/2, z+1/2'

_cell_length_a                    10.978(2)
_cell_length_b                    9.628(2)
_cell_length_c                    15.530(3)
_cell_angle_alpha                 90.00
_cell_angle_beta                  90.00
_cell_angle_gamma                 90.00
_cell_volume                      1641.5(3)
_cell_formula_units_Z             4
_cell_measurement_temperature     153(2)
_cell_measurement_reflns_used     56
_cell_measurement_theta_min       10
_cell_measurement_theta_max       12.5

_exptl_crystal_description        'Transparent blocks'
_exptl_crystal_colour             Colourless
_exptl_crystal_size_max           0.4
_exptl_crystal_size_mid           0.2
_exptl_crystal_size_min           0.2
_exptl_crystal_density_meas       ?
_exptl_crystal_density_diffrn     1.449
_exptl_crystal_density_method     ?
_exptl_crystal_F_000              728
_exptl_absorpt_coefficient_mu     2.595
_exptl_absorpt_correction_type    empirical
_exptl_absorpt_correction_T_min   0.783
_exptl_absorpt_correction_T_max   0.952

_exptl_special_details
;
 ?
;

_diffrn_ambient_temperature       153(2)
_diffrn_radiation_wavelength      0.71073
_diffrn_radiation_type            MoK\a
_diffrn_radiation_source          'fine-focus sealed tube'
_diffrn_radiation_monochromator   graphite
_diffrn_measurement_device        'Stoe-Siemens AED 4-circle-diffractometer'
_diffrn_measurement_method        'Profile fitted 2\q/\w scans (Clegg, 1981)'
_diffrn_standards_number          3
_diffrn_standards_interval_count  ?
_diffrn_standards_interval_time   90
_diffrn_standards_decay_%         0
_diffrn_reflns_number             3776
_diffrn_reflns_av_R_equivalents   0.0068
_diffrn_reflns_av_sigmaI/netI     0.0196
_diffrn_reflns_limit_h_min        -15
_diffrn_reflns_limit_h_max        15
_diffrn_reflns_limit_k_min        -11
_diffrn_reflns_limit_k_max        13
_diffrn_reflns_limit_l_min        -21
_diffrn_reflns_limit_l_max        21
_diffrn_reflns_theta_min          4.23
_diffrn_reflns_theta_max          29.98
_reflns_number_total              3704
_reflns_number_observed           3385
_reflns_observed_criterion        >2sigma(I)

_computing_data_collection        'Stoe DIF4'
_computing_cell_refinement        'Stoe DIF4'
_computing_data_reduction         'Stoe REDU4'
_computing_structure_solution     'SHELXS-86 (Sheldrick, 1990)'
_computing_structure_refinement   'SHELXL-93 (Sheldrick, 1993)'
_computing_molecular_graphics     SHELXTL-Plus
_computing_publication_material   SHELXL-93

_refine_special_details
;
 Refinement on F^2^ for ALL reflections except for 3 with very negative F^2^
 or flagged by the user for potential systematic errors.  Weighted R-factors
 wR and all goodnesses of fit S are based on F^2^, conventional R-factors R
 are based on F, with F set to zero for negative F^2^. The observed criterion
 of F^2^ > 2sigma(F^2^) is used only for calculating _R_factor_obs etc. and is
 not relevant to the choice of reflections for refinement.  R-factors based
 on F^2^ are statistically about twice as large as those based on F, and R-
 factors based on ALL data will be even larger.
;

_refine_ls_structure_factor_coef  Fsqd
_refine_ls_matrix_type            full
_refine_ls_weighting_scheme
 'calc w=1/[s^2^(Fo^2^)+( 0.0241P)^2^+0.6395P] where P=(Fo^2^+2Fc^2^)/3'
_atom_sites_solution_primary      'heavy-atom method'
_atom_sites_solution_secondary    difmap
_atom_sites_solution_hydrogens    geom
_refine_ls_extinction_method      SHELXL-93
_refine_ls_extinction_expression
 'Fc^*^=kFc[1+0.001xFc^2^\l^3^/sin(2\q)]^-1/4^'
_refine_ls_extinction_coef        0.0050(3)
_refine_ls_abs_structure_details
 'Flack H D (1983), Acta Cryst. A39, 876-881'
_refine_ls_abs_structure_Flack    -0.016(7)
_refine_ls_number_reflns          3701
_refine_ls_number_parameters      216
_refine_ls_number_restraints      106
_refine_ls_R_factor_all           0.0327
_refine_ls_R_factor_obs           0.0258
_refine_ls_wR_factor_all          0.0598
_refine_ls_wR_factor_obs          0.0547
_refine_ls_goodness_of_fit_all    1.102
_refine_ls_goodness_of_fit_obs    1.066
_refine_ls_restrained_S_all       1.095
_refine_ls_restrained_S_obs       1.049
_refine_ls_shift/esd_max          0.001
_refine_ls_shift/esd_mean         0.000

loop_
 _atom_site_label
 _atom_site_type_symbol
 _atom_site_fract_x
 _atom_site_fract_y
 _atom_site_fract_z
 _atom_site_U_iso_or_equiv
 _atom_site_thermal_displace_type
 _atom_site_occupancy
 _atom_site_calc_flag
 _atom_site_refinement_flags
 _atom_site_disorder_group
Br1 Br 0.38157(2) 0.27359(2) 0.50000(2) 0.02694(9) Uani 1 d . .
P1 P 0.14371(4) 0.49898(7) 0.65684(4) 0.0173(2) Uani 1 d . .
N1 N 0.1273(2) 0.3578(2) 0.60192(13) 0.0234(8) Uani 1 d D .
H1A H 0.1853(23) 0.3285(33) 0.5715(17) 0.031(6) Uiso 1 d D .
H1B H 0.0581(20) 0.3392(33) 0.5819(18) 0.031(6) Uiso 1 d D .
C11 C 0.1653(2) 0.6551(2) 0.59526(13) 0.0209(9) Uani 1 d . .
C12 C 0.2659(2) 0.6611(3) 0.5394(2) 0.0295(11) Uani 1 d D .
H12 H 0.3171(6) 0.5838(9) 0.5328(2) 0.036(4) Uiso 1 calc RD .
C13 C 0.2886(2) 0.7828(3) 0.4941(2) 0.0374(11) Uani 1 d D .
H13 H 0.3568(8) 0.7880(3) 0.4576(5) 0.039(4) Uiso 1 calc RD .
C14 C 0.2121(2) 0.8966(3) 0.5018(2) 0.0356(12) Uani 1 d D .
H14 H 0.2281(3) 0.9781(10) 0.4706(4) 0.039(5) Uiso 1 calc RD .
C15 C 0.1116(2) 0.8897(3) 0.5560(2) 0.0346(12) Uani 1 d D .
H15 H 0.0591(7) 0.9666(9) 0.5612(2) 0.039(4) Uiso 1 calc RD .
C16 C 0.0882(2) 0.7688(3) 0.6029(2) 0.0282(10) Uani 1 d D .
H16 H 0.0203(8) 0.7644(3) 0.6396(4) 0.036(4) Uiso 1 calc RD .
C21 C 0.0109(2) 0.5193(2) 0.72264(13) 0.0191(9) Uani 1 d . .
C22 C 0.0224(2) 0.5529(3) 0.8098(2) 0.0261(10) Uani 1 d D .
H22 H 0.0997(9) 0.5658(3) 0.8342(3) 0.036(4) Uiso 1 calc RD .
C23 C -0.0819(2) 0.5672(3) 0.8600(2) 0.0336(12) Uani 1 d D .
H23 H -0.0748(3) 0.5898(4) 0.9186(7) 0.039(4) Uiso 1 calc RD .
C24 C -0.1958(2) 0.5483(3) 0.8241(2) 0.0303(11) Uani 1 d D .
H24 H -0.2660(8) 0.5578(3) 0.8586(4) 0.039(5) Uiso 1 calc RD .
C25 C -0.2077(2) 0.5154(3) 0.7374(2) 0.0253(9) Uani 1 d D .
H25 H -0.2856(9) 0.5026(3) 0.7134(3) 0.039(4) Uiso 1 calc RD .
C26 C -0.1045(2) 0.5011(3) 0.68582(14) 0.0210(9) Uani 1 d D .
H26 H -0.1122(2) 0.4796(4) 0.6272(7) 0.036(4) Uiso 1 calc RD .
C31 C 0.2764(2) 0.4780(2) 0.72269(13) 0.0187(9) Uani 1 d . .
C32 C 0.3141(2) 0.3456(3) 0.7437(2) 0.0314(13) Uani 1 d D .
H32 H 0.2720(5) 0.2682(9) 0.7220(3) 0.036(4) Uiso 1 calc RD .
C33 C 0.4146(3) 0.3265(3) 0.7973(2) 0.0390(15) Uani 1 d D .
H33 H 0.4408(4) 0.2364(11) 0.8113(2) 0.039(4) Uiso 1 calc RD .
C34 C 0.4753(2) 0.4401(3) 0.8297(2) 0.0291(10) Uani 1 d D .
H34 H 0.5433(8) 0.4270(3) 0.8658(4) 0.039(5) Uiso 1 calc RD .
C35 C 0.4378(2) 0.5722(3) 0.8101(2) 0.0351(13) Uani 1 d D .
H35 H 0.4788(5) 0.6489(9) 0.8335(3) 0.039(4) Uiso 1 calc RD .
C36 C 0.3386(2) 0.5924(3) 0.7553(2) 0.0294(11) Uani 1 d D .
H36 H 0.3139(4) 0.6829(11) 0.7406(2) 0.036(4) Uiso 1 calc RD .

loop_
 _atom_site_aniso_label
 _atom_site_aniso_U_11
 _atom_site_aniso_U_22
 _atom_site_aniso_U_33
 _atom_site_aniso_U_23
 _atom_site_aniso_U_13
 _atom_site_aniso_U_12
Br1 0.02225(9) 0.03460(11) 0.02397(9) 0.00572(14) 0.00420(11) 0.01067(9)
P1 0.0154(2) 0.0186(2) 0.0178(2) -0.0018(2) -0.0004(2) 0.0001(2)
N1 0.0170(8) 0.0269(10) 0.0264(9) -0.0103(8) 0.0014(7) -0.0005(7)
C11 0.0196(9) 0.0230(11) 0.0202(9) 0.0008(8) -0.0029(8) -0.0008(8)
C12 0.0285(11) 0.0312(14) 0.0289(11) 0.0012(10) 0.0048(9) 0.0008(10)
C13 0.0365(11) 0.0443(14) 0.0314(13) 0.011(2) 0.0016(13) -0.0098(11)
C14 0.0403(12) 0.0326(12) 0.0339(11) 0.014(2) -0.0132(13) -0.0100(10)
C15 0.0338(12) 0.0251(13) 0.0450(14) 0.0077(11) -0.0121(11) 0.0006(11)
C16 0.0240(10) 0.0287(12) 0.0318(11) 0.0037(10) -0.0030(9) 0.0024(10)
C21 0.0184(9) 0.0183(11) 0.0205(9) -0.0031(8) 0.0008(7) -0.0005(8)
C22 0.0243(10) 0.0307(12) 0.0233(10) -0.0080(10) 0.0023(8) -0.0069(10)
C23 0.0345(12) 0.039(2) 0.0279(12) -0.0154(11) 0.0076(10) -0.0094(12)
C24 0.0259(11) 0.0293(13) 0.0356(13) -0.0099(11) 0.0126(10) -0.0038(10)
C25 0.0183(9) 0.0230(12) 0.0346(12) -0.0022(10) 0.0039(8) -0.0010(8)
C26 0.0207(9) 0.0215(10) 0.0209(9) -0.0020(8) 0.0000(7) 0.0017(8)
C31 0.0178(9) 0.0201(10) 0.0182(9) 0.0001(8) 0.0003(7) -0.0003(7)
C32 0.0387(13) 0.0204(12) 0.0350(12) -0.0046(10) -0.0143(10) 0.0049(10)
C33 0.0459(15) 0.0287(14) 0.0423(15) -0.0028(12) -0.0163(13) 0.0132(13)
C34 0.0220(10) 0.0391(14) 0.0263(11) 0.0047(10) -0.0069(8) 0.0012(10)
C35 0.0332(13) 0.0320(14) 0.0401(13) 0.0080(12) -0.0153(11) -0.0125(11)
C36 0.0318(11) 0.0200(11) 0.0364(12) 0.0042(10) -0.0125(10) -0.0072(10)

_geom_special_details
;
 All esds (except the esd in the dihedral angle between two l.s. planes)
 are estimated using the full covariance matrix.  The cell esds are taken
 into account individually in the estimation of esds in distances, angles
 and torsion angles; correlations between esds in cell parameters are only
 used when they are defined by crystal symmetry.  An approximate (isotropic)
 treatment of cell esds is used for estimating esds involving l.s. planes.

Hydrogen bond details:

 H1A..BR1 2.481(22)
 N1..BR1 3.310(2)
 N1-H1A..BR1 168(3)
 H1B..BR1' 2.560(23)
 N1..BR1' 3.373(2)
 N1-H1B..BR1' 163(3)
;

loop_
 _geom_bond_atom_site_label_1
 _geom_bond_atom_site_label_2
 _geom_bond_distance
 _geom_bond_site_symmetry_2
 _geom_bond_publ_flag
P1 N1 1.615(2) . yes
P1 C21 1.791(2) . yes
P1 C31 1.791(2) . yes
P1 C11 1.797(2) . yes
N1 H1A 0.84(2) . yes
N1 H1B 0.84(2) . yes
C11 C16 1.388(4) . ?
C11 C12 1.406(3) . ?
C12 C13 1.389(4) . ?
C12 H12 0.939(11) . ?
C13 C14 1.385(4) . ?
C13 H13 0.941(11) . ?
C14 C15 1.389(4) . ?
C14 H14 0.939(11) . ?
C15 C16 1.397(4) . ?
C15 H15 0.942(11) . ?
C16 H16 0.940(11) . ?
C21 C22 1.397(3) . ?
C21 C26 1.401(3) . ?
C22 C23 1.392(3) . ?
C22 H22 0.937(11) . ?
C23 C24 1.381(4) . ?
C23 H23 0.939(11) . ?
C24 C25 1.389(3) . ?
C24 H24 0.943(11) . ?
C25 C26 1.395(3) . ?
C25 H25 0.940(11) . ?
C26 H26 0.937(11) . ?
C31 C32 1.380(3) . ?
C31 C36 1.392(3) . ?
C32 C33 1.394(3) . ?
C32 H32 0.939(11) . ?
C33 C34 1.376(4) . ?
C33 H33 0.939(11) . ?
C34 C35 1.371(4) . ?
C34 H34 0.942(11) . ?
C35 C36 1.396(3) . ?
C35 H35 0.938(11) . ?
C36 H36 0.941(11) . ?

loop_
 _geom_angle_atom_site_label_1
 _geom_angle_atom_site_label_2
 _geom_angle_atom_site_label_3
 _geom_angle
 _geom_angle_site_symmetry_1
 _geom_angle_site_symmetry_3
 _geom_angle_publ_flag
N1 P1 C21 107.58(10) . . yes
N1 P1 C31 107.31(10) . . yes
C21 P1 C31 110.39(10) . . yes
N1 P1 C11 115.96(11) . . yes
C21 P1 C11 108.64(11) . . yes
C31 P1 C11 106.93(10) . . yes
P1 N1 H1A 120(2) . . yes
P1 N1 H1B 118(2) . . yes
H1A N1 H1B 114(3) . . yes
C16 C11 C12 119.9(2) . . ?
C16 C11 P1 122.3(2) . . ?
C12 C11 P1 117.8(2) . . ?
C13 C12 C11 119.2(3) . . ?
C13 C12 H12 120.4(2) . . ?
C11 C12 H12 120.38(15) . . ?
C14 C13 C12 120.9(3) . . ?
C14 C13 H13 119.5(2) . . ?
C12 C13 H13 119.5(2) . . ?
C13 C14 C15 119.8(3) . . ?
C13 C14 H14 120.1(2) . . ?
C15 C14 H14 120.1(2) . . ?
C14 C15 C16 120.1(3) . . ?
C14 C15 H15 120.0(2) . . ?
C16 C15 H15 120.0(2) . . ?
C11 C16 C15 120.1(2) . . ?
C11 C16 H16 119.96(14) . . ?
C15 C16 H16 120.0(2) . . ?
C22 C21 C26 120.4(2) . . ?
C22 C21 P1 120.3(2) . . ?
C26 C21 P1 119.3(2) . . ?
C23 C22 C21 119.4(2) . . ?
C23 C22 H22 120.30(14) . . ?
C21 C22 H22 120.30(13) . . ?
C24 C23 C22 120.4(2) . . ?
C24 C23 H23 119.82(14) . . ?
C22 C23 H23 119.82(14) . . ?
C23 C24 C25 120.4(2) . . ?
C23 C24 H24 119.78(14) . . ?
C25 C24 H24 119.78(14) . . ?
C24 C25 C26 120.2(2) . . ?
C24 C25 H25 119.91(14) . . ?
C26 C25 H25 119.91(13) . . ?
C25 C26 C21 119.2(2) . . ?
C25 C26 H26 120.39(13) . . ?
C21 C26 H26 120.39(12) . . ?
C32 C31 C36 119.9(2) . . ?
C32 C31 P1 118.9(2) . . ?
C36 C31 P1 121.2(2) . . ?
C31 C32 C33 120.0(2) . . ?
C31 C32 H32 119.99(13) . . ?
C33 C32 H32 120.0(2) . . ?
C34 C33 C32 119.8(3) . . ?
C34 C33 H33 120.1(2) . . ?
C32 C33 H33 120.1(2) . . ?
C35 C34 C33 120.7(2) . . ?
C35 C34 H34 119.66(14) . . ?
C33 C34 H34 119.7(2) . . ?
C34 C35 C36 120.0(2) . . ?
C34 C35 H35 120.02(14) . . ?
C36 C35 H35 120.0(2) . . ?
C31 C36 C35 119.6(2) . . ?
C31 C36 H36 120.18(14) . . ?
C35 C36 H36 120.2(2) . . ?

_refine_diff_density_max    0.248
_refine_diff_density_min   -0.230
_refine_diff_density_rms    0.057

#============================================================================

data_dada

_audit_creation_method            SHELXL

_chemical_name_systematic
;
Amino(triphenyl)phosphonium Hexachloroantimonate
;

  ... etc. as for the first structure ...

_refine_diff_density_max    0.428
_refine_diff_density_min   -0.348
_refine_diff_density_rms    0.058

#============================================================================

_eof  # End of Crystallographic Information File
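
Several of the numbers in a CIF of this kind can be cross-checked by hand.
The Python sketch below, which is not part of SHELXL-93 or CIFTAB, recomputes
the cell volume, F(000) and the calculated density of the first structure
(data_alge) from the cell lengths, Z and the formula given above.  The
function name and the value of Avogadro's number are supplied for this
example; atomic numbers are used as electron counts, so dispersion terms are
ignored.

```python
AVOGADRO = 6.02214e23  # mol^-1 (assumed value, not given in the CIF)

# Values copied from data_alge above.
a, b, c = 10.978, 9.628, 15.530        # Angstrom, all angles 90 degrees
z = 4
formula_weight = 358.21
formula = {"C": 18, "H": 17, "Br": 1, "N": 1, "P": 1}
atomic_number = {"C": 6, "H": 1, "Br": 35, "N": 7, "P": 15}


def check_cell():
    """Recompute volume (A^3), F(000) and density (g cm^-3) for the cell."""
    volume = a * b * c  # valid because the cell is orthorhombic
    f000 = z * sum(n * atomic_number[el] for el, n in formula.items())
    # 1 A^3 = 1e-24 cm^3, so the density comes out in g cm^-3
    density = z * formula_weight / (AVOGADRO * volume * 1.0e-24)
    return volume, f000, density


volume, f000, density = check_cell()
print(f"V = {volume:.1f} A^3, F(000) = {f000}, Dx = {density:.3f} g/cm^3")
```

The three results agree with _cell_volume, _exptl_crystal_F_000 and
_exptl_crystal_density_diffrn in the example.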



          Appendix E - Distribution and installation of SHELXL-93
          -------------------------------------------------------

SHELXL-93 is usually distributed in self-extracting packed form on an MSDOS
diskette; the distribution contains the following files:

shelxl.for - SHELXL source for VAX/VMS 'front end'; this should be compiled
without vectorization or optimization.

shelxl.f - SHELXL front end for UNIX (and some other) systems.

shelxlv.f (or shelxlv.for) - sources for routines which should be fully
optimized and/or vectorized.  The VAX/VMS and UNIX versions are identical.

tyme.c - C routines for the date and time for those UNIX systems which do not
provide them (e.g. IBM RS/6000 series).  If these are used, all calls from
shelxl.f to 'TIME' must be changed to 'TYME' to avoid a clash of names.  A
special version of 'shelxl.f' (called 'shelxl.ibm') is available in which this
has been done.

shelxl1.f, shelxl2.f, shelxl3.f and shelxl4.f (or shelxl1.for etc. for VMS) -
the remaining sources for the main body of the program, which should be
compiled without vectorization or optimization.  The VAX/VMS and UNIX versions
are identical.

shelxl.hlp - this documentation.  This MUST be read before attempting to
install or use the programs !

sigi.ins, sigi.hkl, ags4.ins, ags4.hkl - test input files for SHELXL-93
(discussed in detail earlier in this documentation).

pdbins.f - essentially computer-independent source of PDBINS (see Appendix B).

shelxl.dic - restraints dictionary (currently only for proteins) which is read
by PDBINS.  This may be used as a model for local 'restraints dictionaries'.

ciftab.f - essentially computer-independent source of CIFTAB (see Appendix C).

ciftab.def - standard format file for input to CIFTAB.  This may be used as a
model for users to produce modified tables formats for specific journals etc.

ciftab.ang, ciftab.met and ciftab.ger - special format files for tables
production using the XTEXT program in the Siemens SHELXTL system.

Usually pdbins and ciftab may be compiled and linked with standard compiler
options; before installation the comments in these program sources should be
consulted !


Compilation on VAX/VMS systems
------------------------------

(a) VaxStation, MicroVAX etc.

$ FOR SHELXL,SHELXLV,SHELXL1,SHELXL2,SHELXL3,SHELXL4
$ LINK SHELXL,SHELXLV,SHELXL1,SHELXL2,SHELXL3,SHELXL4
$ SET PROT=(W:E) SHELXL.EXE

(b) VAX 9000 (vector processor) etc.

$ FORT SHELXL,SHELXL1,SHELXL2,SHELXL3,SHELXL4
$ FORT/VECTOR/ASSUME=(NOACCU,NODUMM)/MATH=FAST/SHOW=ALL SHELXLV
$ LINK SHELXL,SHELXLV,SHELXL1,SHELXL2,SHELXL3,SHELXL4
$ SET PROT=(W:E) SHELXL.EXE

In both cases the following symbol should also be defined for each session in
which the program is used:

$ SHELXL :== $ DISK:[USER]SHELXL

where DISK and USER define where the file SHELXL.EXE is located, and will need
to be replaced by the appropriate names for your system.  This line may be
included in the LOGIN.COM file for individual users, or - better - a global
symbol SHELXL can be defined in the file which is executed when the system is
started.


Compilation on IRIS (and many other UNIX) systems
-------------------------------------------------

   For UNIX systems all filenames associated with SHELX should be lower case.
The name of the compiler and the optimization switches etc. differ for
different systems.  There is no need - and it will probably prove counter-
productive - to optimize 'shelxl.f', 'shelxl1.f' etc., but it is important to
compile 'shelxlv.f' with the highest available optimization level.  Typical
instructions to compile and link would be:

# f77 shelxlv.f -c -O2
# f77 shelxl.f shelxl1.f shelxl2.f shelxl3.f shelxl4.f shelxlv.o -o shelxl

The executable program shelxl may then be copied into a directory such as
/usr/bin/ for general use (which will require superuser privileges). The UNIX
version of SHELXL-93 is able to read the '.ins' and '.hkl' files in either
UNIX or DOS format, and may be set up to write the '.res', '.cif' and '.fcf'
files in DOS format, so that PC's can access such files via a shared disk
without the need for conversion programs such as DOS2UNIX etc. To compile the
program with this option the first executable statement in shelxl.f should be
KD=CHAR(13) (see the comments in the source).  For reasons of efficiency the
'.lst' file is always in the local format (it can still be printed directly
from a PC using SPRINT - see below).
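
The two text formats differ only in the line terminator: DOS files end each
line with a carriage return (character 13, the value used in KD=CHAR(13))
followed by a line feed, whereas UNIX files use the line feed alone.  As an
illustration only, reading either format and writing DOS format can be
sketched in Python as follows; this is not the FORTRAN of shelxl.f, and the
function names are invented for the example.

```python
CR = bytes([13])   # carriage return, i.e. CHAR(13)
LF = bytes([10])   # line feed


def read_lines(data):
    """Split file contents into lines, accepting DOS (CR LF) or UNIX (LF)."""
    lines = data.split(LF)
    if lines and lines[-1] == b"":
        lines.pop()            # no empty line after the final terminator
    return [line.rstrip(CR) for line in lines]


def write_dos(lines):
    """Join lines with CR LF terminators so that a PC can read the result."""
    return b"".join(line + CR + LF for line in lines)


unix_text = b"TITL test\nCELL 0.71073 10.978 9.628 15.530 90 90 90\n"
dos_text = write_dos(read_lines(unix_text))
# Both formats give the same lines once the terminators are removed.
assert read_lines(dos_text) == read_lines(unix_text)
```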


Compilation on IBM RS/6000 series
---------------------------------

   Since IBM do not (yet) provide FORTRAN-callable DATE and TIME routines, it
is necessary to include FORTRAN-callable C routines DATE and TYME provided in
the file tyme.c.  They may be compiled as follows:

# xlc tyme.c -c

The rest of the program is then compiled and linked as follows:

# xlf shelxlv.f -c -O
# xlf shelxl1.f -c -NT50000 -NQ50000
# xlf shelxl2.f -c -NT50000 -NQ50000
# xlf shelxl3.f -c -NT50000 -NQ50000
# xlf shelxl4.f -c -NT50000 -NQ50000
# xlf shelxl.f shelxlv.o shelxl1.o shelxl2.o shelxl3.o shelxl4.o tyme.o
  -o shelxl

Here shelxl.f is a special version in which all the calls to TIME have been
changed to TYME; the file 'shelxl.ibm' in which this change has been made may
be copied to 'shelxl.f'.  Note that the file 'shelxl.ibm' also specifically
closes all files before terminating to work around a known system problem.


Compilation on Convex computers
-------------------------------

'shelxl.f' must be modified by replacing the statement T=... in subroutine
SXTI with:

      T=ETIME(TX)

where TX has been declared as a REAL array of dimension 2, i.e. the statement
REAL TX(2) is included after the first statement of the subroutine.


Compilation on Sun Sparcstations running SUNOS
----------------------------------------------

The same alteration must be made as for Convex, and the -lV77 switch used on
the f77 compile/link instruction (n.b. lower case 'L', not digit '1').


Compilation on the Cray Y-MP running UNICOS
-------------------------------------------

The line: T=.01*REAL(MCLOCK()) should be changed to:

      T=SECOND()

It is simpler (and safe) to vectorize all routines:

# cf77 -l nag -Zp -o shelxl shelxl.f shelxlv.f shelxl1.f shelxl2.f shelxl3.f
  shelxl4.f


Compilation on other computers
------------------------------

   Although the UNIX and VAX/VMS versions are almost identical, it will
probably prove easier to adapt the UNIX version.  The program is standard
FORTRAN-77 except for the following routines.  Sometimes compiler and linker
switches should be set for VAX/VMS-compatibility.  Usually it is best to
compile and link without optimization first to see if there are unresolved
subroutine references.  These should only refer to the following non-standard
FORTRAN-77 routines, and will have to be replaced by calls to local
alternatives.  There are a very small number of such calls, and all are to be
found in the 'front-end' routines in shelxl.f (or shelxl.for for VMS).  A
pitfall for the unwary is the possibility that the same names are used for
local routines but with different specifications, which can cause the program
to appear to compile and link correctly but to abort when started; the most
likely culprit is 'TIME'.

TIME and DATE - these subroutines should return the current time and date as
strings 'HH:MM:SS' and 'DD-MON-YY' (where HH is the hours part of the time as
two digits, and MON is a three character abbreviation for the month, etc.).
It does not matter which characters are used as separators.

MCLOCK - this function should return the current time as an INTEGER number of
1/100 seconds from an arbitrary starting time.  The function ETIME (see above)
(which returns the number of seconds as a REAL) is a common alternative.

IARGC and GETARG - the INTEGER function IARGC should return the number of
command line parameters, and the subroutine call GETARG(IARGC(),string) is
then used by SHELXL-93 to extract the last of these (it is used to set up
all the file names by adding the appropriate extensions).  This avoids the
confusion over where to start counting parameters !
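
For anyone writing local replacements, the behaviour expected of these
routines can be summarised by the following Python sketch.  It only
illustrates the formats described above; the function names are invented for
the example, and a real replacement would have to be written in FORTRAN or as
FORTRAN-callable C (as in tyme.c).

```python
import sys
import time
from datetime import datetime

MONTHS = ("JAN", "FEB", "MAR", "APR", "MAY", "JUN",
          "JUL", "AUG", "SEP", "OCT", "NOV", "DEC")


def time_string(now):
    """Equivalent of TIME: 'HH:MM:SS'."""
    return f"{now.hour:02d}:{now.minute:02d}:{now.second:02d}"


def date_string(now):
    """Equivalent of DATE: 'DD-MON-YY' with a three-letter month."""
    return f"{now.day:02d}-{MONTHS[now.month - 1]}-{now.year % 100:02d}"


def mclock():
    """Equivalent of MCLOCK: integer count of 1/100 s from an arbitrary origin."""
    return int(time.monotonic() * 100)


def last_argument(argv):
    """Equivalent of GETARG(IARGC(),string): the last command line parameter.

    argv[0] is the program name, which IARGC does not count.
    """
    return argv[-1] if len(argv) > 1 else ""


if __name__ == "__main__":
    now = datetime.now()
    print(time_string(now), date_string(now), mclock(), last_argument(sys.argv))
```

Taking the last parameter rather than a numbered one mirrors the design
choice described above: it gives the same result whether or not a system
counts the program name as a parameter.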


Precompiled PC versions
-----------------------

   Two precompiled PC versions are provided. 'SHELXL.EXE' is a 32-bit version
which runs in 'protected mode' on 80386SX, 80486SX, 80386DX, 80486DX and
Pentium processors.  For the first three a numeric coprocessor must also be
present.  The program contains a built-in (Phar Lap) protected mode loader and
so runs as a stand-alone program.  It runs as a virtual memory program if the
available extended memory is less than about 5MB, which means that about 10MB
disk space should be free for scratch files. Since the program is particularly
efficient as regards disk input/output, it is usually better to leave extended
memory free for the program rather than to install a disk cache.  On systems
with limited physical memory it may be necessary to remove other resident
programs and protected mode drivers (in 'AUTOEXEC.BAT' and 'CONFIG.SYS', then
reboot).  If the program fails to load properly it usually means that either
not enough memory is available, or that a memory manager or other resident
user of extended memory conflicts and has to be removed first.  The Lahey /
Phar Lap banner which appears when the program is started gives the amount of
extended memory available to the program; it may be suppressed by the
statement:

SET DOS=-NOSIGNON

which can be included in 'AUTOEXEC.BAT'.  MSDOS version 5.0 or later is
recommended but not absolutely essential.

   For older personal computers and systems with too little extended memory
the 16-bit 'real mode' version 'PCSHELXL.EXE' is also provided.  It contains
all facilities of 'SHELXL.EXE' but is somewhat slower and has limited memory,
so for example is restricted to 300 full-matrix parameters.  BLOC or CGLS may
be used to refine larger structures, provided that there is room for all
atoms, restraints etc.; since the memory allocation is dynamic, there are no
individual limits except on the number of least-squares parameters.  This
version should run on ANY personal computer with an 8088 or compatible CPU
and the corresponding coprocessor (not required for 80486DX etc.) and 640 kB
of memory.  The scratch disk space required depends on the size of the
structure; at least 2MB is recommended (a RAM-disk may be used - the scratch
files are set up in the current directory).  If there is not enough memory to
run the program it may be necessary to remove resident drivers (network
software is particularly greedy).  Although this program should be compatible
with MSDOS 2.10 and all subsequent versions, version 5.0 or later is
recommended.

   Although both PC versions are tolerated by MS-WINDOWS and other
multitasking interfaces, there is usually a considerable price to pay in terms
of performance degradation.  If PC's and RISC machines are linked by an NFS
network, they may be run using the same files provided that these are in DOS
format, because the UNIX version of SHELXL-93 can read DOS format .ins and
.hkl files and can be set up to write DOS format .res, .cif and .fcf files
(by setting KD=CHAR(13) as the first executable statement in shelxl.f - see
comments in the source).  If this has been done, all editing may be performed
on the PC's; any text editor may be used.

   For PC's which are connected to a HP Laserjet or compatible printer, a
special program SPRINT is provided for printing the documentation and listing
files.  SPRINT can print both DOS and UNIX format text files (the latter
facility is useful for NFS networks).  If no filename extension is specified,
.lst is assumed.  If .lst is given or assumed, the output is compressed;
otherwise a margin is left on the left hand side.  SPRINT should be able to
handle read-only files and printers running out of paper. Examples:

SPRINT SHELXL.HLP  (to print this manual; similarly READ.ME and REFEREE.MSG)
SPRINT AGS4  (to print AGS4.LST in compressed mode after running the test job)


Memory requirements, paging etc.
--------------------------------

   The program uses two large arrays A and B dynamically, so the limits on the
size of structure which can be handled are determined by the dimensions of
these two arrays and also of the array C; A, B and C are defined as separate
COMMON blocks.  The standard version of the program is dimensioned for up to
1500 parameters in each full-matrix block and roughly 5000 atoms (assuming a
generous number of restraints etc.), and is suitable for a typical (UNIX)
workstation (or mainframe) with 8MB or more physical memory; the precompiled
(protected mode) PC version 'SHELXL.EXE' is similarly dimensioned.

   It may be necessary to redimension A, B and C and recompile the program for
specific installations, e.g. to fit within a given job category on a
mainframe.  The highest elements of A and B actually used for the various
calculations are printed out by the program (after 'Memory required =').  The
program will try to use all available physical (and virtual) memory rather
than performing its own disk I/O, thereby achieving longer vector 'runs',
which enhances performance on vector and pipelined systems.  In some cases,
e.g. when a large structure is refined on a MicroVAX or PC with limited
physical memory (or with a limited allocation of physical memory to a given
process in the case of the VAX), this strategy may cause excessive 'paging'
and disk I/O.  If this happens, the maximum vector run length can be reduced
by setting the 4th parameter on the L.S. instruction or by reducing the value
of the variable IV in the main program and recompiling; it may also be more
efficient to 'block' the refinement or use CGLS (except in the final
refinement).
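
   As a rough illustration of why the size of a full-matrix block dominates
the memory requirement, the Python sketch below counts the unique elements of
a symmetric least-squares matrix.  It ASSUMES that only one triangle of the
matrix is held and that each element occupies 4 bytes; neither assumption is
taken from the program source, so the figures are an order-of-magnitude guide
only, and the function names are illustrative.

```python
def full_matrix_elements(n_params: int) -> int:
    """Unique elements of a symmetric n x n matrix (one triangle plus diagonal)."""
    return n_params * (n_params + 1) // 2


def full_matrix_megabytes(n_params: int, bytes_per_element: int = 4) -> float:
    """Approximate storage in MB (10**6 bytes) under the assumptions above.

    bytes_per_element = 4 is an assumed single-precision word size.
    """
    return full_matrix_elements(n_params) * bytes_per_element / 1.0e6


if __name__ == "__main__":
    # 1500 parameters is the standard block dimension quoted in the text;
    # 750 shows the effect of splitting the refinement into smaller blocks.
    for n in (1500, 750):
        print(n, full_matrix_elements(n), full_matrix_megabytes(n))
```

Under these assumptions a 1500-parameter block needs 1125750 elements, and
halving the number of parameters per block cuts the matrix storage to about a
quarter, which is why 'blocking' the refinement can relieve paging.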



                     Appendix F - Application form
                     -----------------------------


SHELXL-93 USER REGISTRATION FORM            | Do not write here !
--------------------------------            | Date sent:
Title/Name:                                 | Version:
                                            --------------------------------
Full postal address:
                                            EMAIL address (if available):


                                            Tel:

                                            Fax:
----------------------------------------------------------------------------
Please tick ALL relevant boxes, sign and return to: Prof. George Sheldrick,
Institut fuer Anorg. Chemie, Tammannstrasse 4, D-37077 Goettingen, Germany.
Fax: +49-551-393373;   Email: gsheldr @ shelx.uni-ac.gwdg.de
----------------------------------------------------------------------------
SHELXL-93 is supplied ONLY on MSDOS format diskettes, in the form of self-
extracting packed files containing MSDOS executables and sources for UNIX
and VMS systems, documentation and test data.  The programs PDBINS (PDB to
SHELXL format conversion) and CIFTAB (tables from SHELXL CIF output) are
included.  The license fee of DM 4999 for for-profit institutions covers
use for an unlimited time on an unlimited number of computers at a specified
firm or institution at a single geographical site.  SHELXL-93 is currently
available free of charge to academics for non-commercial use only; it may
prove necessary to change this policy if the license fees for for-profit
institutions fail to cover the total costs involved.  Academic institutions
willing and able to contribute to the costs of developing and distributing
the SHELX programs are of course welcome to do so (we suggest DM 99).
Please make out checks to "Institut fuer Anorg. Chemie, Prof. Sheldrick".
If you wish to pay by direct bank transfer please ask us to send an invoice.
----------------------------------------------------------------------------
[ ] I wish to license SHELXL-93 for use at the following for-profit firm or
institution.  I agree that within three months I will either destroy all
copies of the program in my possession or pay the license fee of DM 4999.


----------------------------------------------------------------------------
[ ] The program will be used exclusively for non-commercial purposes at the
following not-for-profit institution only:


----------------------------------------------------------------------------
[ ] Please send me an invoice for DM

[ ] Please send me a receipt for the enclosed payment of DM

[ ] I agree to cite SHELXL-93 in all publications reporting results obtained
using it.

[ ] I accept that the author has no liabilities in respect of errors in the
programs and documentation.

Please supply SHELXL-93 on [ ] 1.44 MB, [ ] 1.2 MB, [ ] either 1.2 or 1.44
MB MSDOS diskettes.  [ ] I already possess a copy of SHELXL-93.
----------------------------------------------------------------------------

  Signed:                                      Date: