SEQ, SEQUENCE


NAME
SEQ, SEQUENCE - manipulate the content of the sequence buffer.

SYNOPSIS
SEQ = three_letters_code
SEQ LOAD filename
SEQ READ filename
SEQ FROM structure_identifier
SEQ COPY
SEQ SAVE filename
SEQ SWN filename
SEQ RESET

DESCRIPTION
The command SEQ (long form: SEQUENCE) manipulates the content of the main sequence buffer. Garlic maintains two sequence buffers: the main buffer and the reference buffer. The main sequence buffer is used to prepare the average hydrophobicity plot, the hydrophobic moment plot and the helical wheel plot, and for other operations which require sequence information. The reference sequence buffer is used for sequence comparison and for other operations which require two sequences.

Both buffers store the following sequence information:
(1) The number of residues.
(2) The sequence in three-letter code, written in uppercase letters.
(3) Disulfide bond flag, if information about disulfide bonds is available.
(4) Residue serial numbers.
(5) Raw hydrophobicity values (replaced by average value for exotic residues).

In addition, the main sequence buffer contains the following information:
(6) The average hydrophobicity.
(7) The hydrophobic moment.
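
The buffer contents listed above can be pictured as a pair of records. The Python sketch below is illustrative only: the class names, field names and the dict-based hydrophobicity scale are hypothetical, not garlic's internal C structures. It also shows one reading of item (5), in which a residue missing from the scale receives the scale's average value.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class SequenceBuffer:
    """Items (1)-(5), stored by both the main and the reference buffer.

    Hypothetical layout for illustration only, not garlic's C structure.
    """
    residues_n: int = 0                                        # (1) number of residues
    names: List[str] = field(default_factory=list)             # (2) uppercase three-letter codes
    disulfide: List[bool] = field(default_factory=list)        # (3) disulfide flags, if known
    serials: List[int] = field(default_factory=list)           # (4) residue serial numbers
    hydrophobicity: List[float] = field(default_factory=list)  # (5) raw values


@dataclass
class MainSequenceBuffer(SequenceBuffer):
    """The main buffer adds the derived profiles, items (6) and (7)."""
    average_hydrophobicity: List[float] = field(default_factory=list)  # (6)
    hydrophobic_moment: List[float] = field(default_factory=list)      # (7)


def raw_hydrophobicity(names: List[str], scale: Dict[str, float]) -> List[float]:
    """Item (5): codes missing from the scale get the scale's average value."""
    average = sum(scale.values()) / len(scale)
    return [scale.get(name, average) for name in names]
```

With a made-up two-entry scale {"ALA": 1.0, "PHE": 3.0} (toy numbers, not a published scale), an unrecognized code such as XYZ would receive 2.0.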

As sequence information may be given independently of any structure, atomic coordinates are not required for most sequence manipulation routines. Thus, garlic may also be used as a sequence analysis tool.

All versions of the command SEQ except one are used to manipulate the content of the main sequence buffer. The only exception is SEQ COPY, which copies the content of the main sequence buffer to the reference buffer. This is the only way to store information in the reference buffer.

SEQ = three_letters_code
The command SEQ may be used with the keyword = (equal sign) to define a sequence at the garlic command prompt. This is practical for defining a short sequence fragment, which may then be used for a helical wheel plot or to locate the given fragment in the structure being investigated. The syntax:

SEQ = three_letters_code

Example:
seq = ala phe tyr trp asn
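
A fragment like the one in this example could be tokenized roughly as below. This is a hypothetical Python sketch (the function name parse_inline_sequence is invented), assuming the codes are separated by whitespace; it performs only the uppercase conversion and no validation.

```python
def parse_inline_sequence(command: str) -> list:
    """Return the uppercase codes that follow '=' in a 'seq = ...' command.

    Assumes whitespace-separated codes; nothing is validated, so
    non-standard codes pass through unchanged (apart from case).
    """
    _, sep, codes = command.partition("=")
    if not sep:
        raise ValueError("expected '=' after SEQ")
    return [code.upper() for code in codes.split()]
```

For the example above, parse_inline_sequence("seq = ala phe tyr trp asn") returns ['ALA', 'PHE', 'TYR', 'TRP', 'ASN'].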

The sequence fragment is converted to uppercase. The sequence is not checked for exotic residues, so non-standard codes may be used. However, the routine which assigns hydrophobicity values will not recognize them; these residues are assigned the average hydrophobicity value (calculated for the current scale). At present, 23 codes are recognized:


SEQ LOA filename
SEQ LOAD filename
SEQ REA filename
SEQ READ filename
The keyword LOAD (or READ) may be used to read a sequence from the specified file. Garlic recognizes two input file formats: FASTA files (one-letter code) and files which contain three-letter code in free format.

If the input file contains the symbol > (greater than) in the first column of the first useful line, the file is treated as a protein sequence in one-letter code (FASTA format). Empty lines are ignored. Lines beginning with the symbol # (number sign) in the first column are treated as comments and are ignored too. Thus, lines which are not empty and do not contain the symbol # in the first column are treated as useful.
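
The FASTA test described above depends only on the first useful line. A minimal Python sketch of that rule follows; the helper names are hypothetical, and treating a line that contains only whitespace as empty is an assumption.

```python
def first_useful_line(lines):
    """Return the first line that is not empty and has no '#' in column 1."""
    for line in lines:
        text = line.rstrip("\r\n")
        # Assumption: whitespace-only lines count as empty.
        if text.strip() and not text.startswith("#"):
            return text
    return None


def is_fasta(lines) -> bool:
    """True if the first useful line has '>' in the first column."""
    line = first_useful_line(lines)
    return line is not None and line.startswith(">")
```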

If the input file is not recognized as a FASTA file, it is expected to contain three-letter code in free format. Empty lines and all lines which contain # in the first column are ignored. All other lines are treated as useful. Digits (serial numbers, for example) are ignored. The following characters are treated as separators:
(1) space
(2) tab
(3) comma (,)
(4) semicolon (;)
(5) newline (line feed)
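
The free-format rules above (skip empty and comment lines, ignore digits, split on the listed separators) could be sketched as follows. The function name is hypothetical, and converting the codes to uppercase is an assumption carried over from the SEQ = behavior.

```python
import re

# Space, tab, comma and semicolon; newlines are handled by splitlines().
SEPARATORS = re.compile(r"[ \t,;]+")


def tokenize_free_format(text: str) -> list:
    """Split free-format three-letter code into uppercase residue names."""
    codes = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # empty lines and comment lines are ignored
        line = re.sub(r"[0-9]", "", line)  # digits (e.g. serial numbers) are ignored
        codes.extend(token.upper() for token in SEPARATORS.split(line) if token)
    return codes
```

For instance, the text "1 ile, pro; glu" followed by a line "4 tyr<TAB>val" yields ['ILE', 'PRO', 'GLU', 'TYR', 'VAL'].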

If the input file contains at least one bad code (a residue name which consists of four letters, for example), reading will fail. The hard-coded maximum number of residues is 50000, but it may easily be changed (see MAXRESIDUES in the header file defines.h).

Example:
seq load sample.fasta
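
The failure conditions for LOAD/READ could be checked along these lines. This sketch assumes that any code longer than three characters is bad; garlic's actual test may differ. MAX_RESIDUES and validate_codes are hypothetical names, with MAX_RESIDUES using the documented default of MAXRESIDUES.

```python
MAX_RESIDUES = 50000  # mirrors the documented default of MAXRESIDUES (defines.h)


def validate_codes(codes: list) -> list:
    """Reject the whole input if any code is bad or there are too many.

    Assumption: a code is 'bad' when it is longer than three characters;
    garlic's actual test may be stricter.
    """
    if len(codes) > MAX_RESIDUES:
        raise ValueError(f"too many residues: {len(codes)} > {MAX_RESIDUES}")
    for position, code in enumerate(codes, start=1):
        if len(code) > 3:
            raise ValueError(f"bad residue code {code!r} at position {position}")
    return codes
```

A single bad code rejects the entire input rather than being skipped, matching the statement that reading will fail.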

SEQ FRO structure_identifier
SEQ FROM structure_identifier
The keyword FROM may be used to copy the sequence from the specified structure to the main sequence buffer. Only selected residues are copied. A residue is treated as selected if its first atom is selected; for proteins, this is typically N (nitrogen). Residue insertion codes are ignored! Thus, the same residue serial number may appear more than once in the array of residue serial numbers.

Example:
seq from 1
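
The selection rule for SEQ FROM, and the effect of dropping insertion codes, can be illustrated with a small model of a structure. The Atom and Residue classes and the function sequence_from_structure are invented for this sketch and do not correspond to garlic's data structures.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Atom:
    name: str
    selected: bool


@dataclass
class Residue:
    name: str            # three-letter code
    serial: int          # residue serial number
    insertion_code: str  # e.g. "A" for residue 52A; not copied
    atoms: List[Atom]


def sequence_from_structure(residues: List[Residue]) -> List[Tuple[int, str]]:
    """Copy (serial, name) of each residue whose FIRST atom is selected.

    The insertion code is dropped, so serial numbers may repeat.
    """
    return [
        (residue.serial, residue.name.upper())
        for residue in residues
        if residue.atoms and residue.atoms[0].selected
    ]
```

Given residues 52, 52A and 53, where only residue 53 has an unselected N atom, the result is [(52, 'ALA'), (52, 'GLY')]: serial number 52 appears twice, and residue 53 is skipped even though its CA atom is selected.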

SEQ COP
SEQ COPY
The command SEQ COPY copies the sequence from the main sequence buffer to the reference buffer. This is the only way to initialize the reference buffer, so SEQ COPY must be executed before any command which requires two sequences. The main sequence buffer may be initialized before SEQ COPY by using one of the keywords described above (=, LOAD or FROM).

Example:
seq copy
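
The copy semantics can be sketched with a dict-based buffer layout; the key names and the function seq_copy are hypothetical. Only the fields stored by both buffers, items (1) to (5), are copied. The deep copy keeps the reference buffer unaffected by later changes to the main buffer, which is an assumption about the intended behavior.

```python
import copy

# Items (2)-(5) are per-residue lists kept by both buffers; item (1) is the count.
SHARED_LISTS = ("names", "disulfide", "serials", "hydrophobicity")


def seq_copy(main: dict) -> dict:
    """Return a new reference buffer holding an independent copy of the
    fields that both buffers share (hypothetical dict-based layout)."""
    reference = {"residues_n": main.get("residues_n", 0)}
    for key in SHARED_LISTS:
        reference[key] = copy.deepcopy(main.get(key, []))
    return reference
```

Fields that exist only in the main buffer, such as the average hydrophobicity, are not carried over.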

SEQ SAV filename
SEQ SAVE filename
The command SEQ SAVE saves the sequence to the specified file. Ten codes (each consisting of up to three letters) are written per line, separated by spaces. Serial numbers are not included (use the keyword SWN to include them).

Example:
seq save 9pap.seq
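
The SAVE layout, ten codes per line separated by spaces, can be sketched like this. Whether garlic pads codes to a fixed width is not stated, so single spaces are an assumption; the function names are hypothetical.

```python
def format_seq_save(names: list, per_line: int = 10) -> str:
    """Ten codes per line, separated by single spaces, no serial numbers."""
    lines = [" ".join(names[i:i + per_line]) for i in range(0, len(names), per_line)]
    return "\n".join(lines) + ("\n" if lines else "")


def seq_save(path: str, names: list) -> None:
    """Write the formatted sequence to the given file."""
    with open(path, "w") as handle:
        handle.write(format_seq_save(names))
```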

SEQ SWN filename
The command SEQ SWN saves the sequence to the specified file, together with residue serial numbers. Both residue names and serial numbers are written to the output file, but insertion codes will be missing! Five pairs of serial number and residue name are written per line, separated by spaces.

Example:
seq swn 9pap.seq
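
A similar sketch for SWN writes five serial number and name pairs per line. The order within each pair (serial number first), the single-space separators and the function name are assumptions.

```python
def format_seq_swn(serials: list, names: list, per_line: int = 5) -> str:
    """Five 'serial name' pairs per line; insertion codes are not available."""
    pairs = [f"{serial} {name}" for serial, name in zip(serials, names)]
    lines = [" ".join(pairs[i:i + per_line]) for i in range(0, len(pairs), per_line)]
    return "\n".join(lines) + ("\n" if lines else "")
```

If the layout is as sketched, such a file should also be readable by SEQ LOAD, because the free-format reader ignores digits; the serial numbers themselves would be lost on reloading.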

SEQ RES
SEQ RESET
Reset (clear) the main sequence buffer. The command SEQ RESET sets the number of residues in the main sequence buffer to zero. The storage is not freed, so the buffer may be used again later.

Example:
seq reset
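
The difference between clearing the residue count and freeing the storage can be shown with a fixed-capacity buffer. The class below is a hypothetical Python model, not garlic's implementation.

```python
class FixedSequenceBuffer:
    """Storage is allocated once; only the residue count changes."""

    def __init__(self, capacity: int):
        self.names = [""] * capacity  # preallocated storage
        self.residues_n = 0

    def load(self, codes: list) -> None:
        if len(codes) > len(self.names):
            raise ValueError("sequence does not fit in the buffer")
        self.names[:len(codes)] = codes
        self.residues_n = len(codes)

    def reset(self) -> None:
        """Like SEQ RESET: forget the sequence but keep the storage."""
        self.residues_n = 0

    def sequence(self) -> list:
        return self.names[:self.residues_n]
```

After reset(), sequence() returns an empty list while the preallocated list is kept and can be filled again by the next load().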

RELATED COMMANDS
PLOT prepares the average hydrophobicity and/or hydrophobic moment plot. COMPARE compares two sequences. VENN draws a Venn diagram. WHEEL draws a helical wheel plot. SEL SEQ selects portions of the structure which contain the sequence stored in the main sequence buffer. To use any of these commands, the main sequence buffer must be initialized by using the command SEQ; COMPARE requires both buffers to be initialized.