Format of the `sequest.params` file and parameters to dqst

dqst is a program very similar to Sequest™. It enhances some aspects of the search, gives better results because many errors have been corrected, and, last but not least, is extensible.

Parameters

Please read the output of the command "dqst -H" to check whether later changes have made this information obsolete.

-A / +A
-C
-DQST
-F
-FileIO
-G / +G
-H
-L
-M
-O
-P
-R
-S
-Sequest
operation modes
-Q
-X

Usually the last occurrence wins for parameters that set a value. Example:
-Cdir1 -Cdir2
effectively selects "-Cdir2".

Environment variables with an effect:

DQST
SEQUEST4DQST
DQST_PRESERVE_FILES
PATH
TMP, TEMP, TEMPDIR, TMPDIR

Please note that in Sequest™ the section [SEQUEST_ENZYME_INFO] must follow the section [SEQUEST]. dqst has no such limitation.

Parameter -A / +A

This flag isn't used in pure dqst. The flag is passed to Sequest™ in sequest4dqst mode instead.
+A selects the use of FFT, -A omits its use. +A is the default.

Parameter -C

An argument is required for this parameter. It is the name of a directory to which the program should change before processing any file.
The argument must follow the parameter immediately for Sequest™. There is no such restriction in dqst. You may use "-C dir".

Parameter -DQST

The purpose of this parameter is to act as an antagonist to -Sequest. The whole behaviour is similar. All values are set by default.

This parameter is unknown to Sequest™.

Parameter -F

This flag isn't used in pure dqst currently, but the flag is passed to Sequest™ in sequest4dqst mode.
This flag selects XML format for the output.

Parameter -FileIO

Normally Sequest™ uses plain file IO while dqst uses an advanced technique called memory mapped files by default. You can select plain file IO explicitly by using "-FileIO".
You should select -FileIO if your free real memory is smaller than your database size.
The antagonist of this flag is -M which reverses access (back) to memory mapped files.
Neither -FileIO nor -M is passed to an inferior Sequest™ program in sequest4dqst mode. The generated file isn't big and it isn't reused.

This parameter is unknown to Sequest™.

Parameter -G / +G

This flag isn't used by dqst in native modes.
+G selects the use of revision 1.2 index files, -G omits its use. -G is the default.
In sequest4dqst mode this flag isn't used either because the generated file is a plain database.

Parameter -H

This flag requests online help from the program. Use "-HA" if you want everything about your Sequest™ program.

Parameter -L

This flag isn't used in pure dqst. The flag is passed to Sequest™ in sequest4dqst mode instead.
-L selects LCQ binning. Binning is used in discrete Fourier Transformation.
dqst uses a different binning method in native dqst mode. The compatibility mode uses more or less the original binning method.

Parameter -M

Normally Sequest™ uses plain file IO while dqst uses an advanced technique called memory mapped files by default. You can select memory mapped files by using "-M" even for Sequest™.
You should select -FileIO if your free real memory is smaller than your database size.
The antagonist of this flag is -FileIO which reverses access (back) to plain file IO.
Neither -FileIO nor -M is passed to an inferior Sequest™ program in sequest4dqst mode. The generated file isn't big and it isn't reused.

Parameter -O

This parameter overrides the value of create_output_files and sets it to 0.

dqst either writes to standard output or it writes an output file. Sequest™ always writes to standard output.

This parameter is unknown to Sequest™.

Parameter -P

An argument is required for this parameter. It is the name of the parameter file and defaults to "sequest.params".
The argument must follow the parameter immediately for Sequest™. There is no such restriction in dqst. You may use "-P dqst.params".

Parameter -R

An argument is required for this parameter. It is the name of a file that contains one DTA file name to process on each line.
The argument must follow the parameter immediately for Sequest™. There is no such restriction in dqst. You may use "-R filelist".

Parameter -S

This parameter lets dqst and Sequest™ skip a DTA file if a corresponding .OUT file already exists.
This parameter keeps its purpose even if create_output_files is set to 0 or -O is set.

Parameter -Sequest

This parameter works with and without an argument. The argument must be appended with a preceding equal sign. No space is allowed. Examples are

-Sequest
-Sequest=1
-Sequest=5

-Sequest is equivalent to -Sequest=4. The number is a combination of flags that change the general behaviour. Flags are combined by adding their values: "5", for example, is the combination of "1" and "4".

Initially all flags are 0; they can be set step by step with this parameter. The antagonist parameter -DQST resets the selected flags to 0. Thus, the parameter sequence "-Sequest=5 -DQST" effectively leaves only flag "1" set, because "-DQST" is equivalent to "-DQST=4" and 5-4 is 1.

This parameter is unknown to Sequest™.

Meaning of the flags:

flag meaning

1 select sequest4dqst mode

2 select sequest compatibility mode

4 format output as most compatible to Sequest™ as possible.
dqst emits another column with the original peptide which helps to identify changed amino acid codes from a pattern. Example:
... Peptide       OriginalPeptide ...
... -------       --------------- ...
... R.GIDVHNAEF.F R.GIDVXNAEF.F   ...
Here, X has been changed to H. The compatible output format shows the original peptide in the column "Peptide".

8 Don't use windows for normalizing; usually alignment is done in 10 windows. This is an experimental flag. Use it at your own risk.

16 Don't ignore peaks smaller than 5% of the maximum abundance. This is an experimental flag. Use it at your own risk.

32 Compare all theoretical peaks in the FFT. Normally only matching peaks are compared. The flag takes precedence over flag value 64. This is an experimental flag. Use it at your own risk.

64 Compare all core peaks and all matching neutral loss peaks in the FFT. Normally only matching peaks are compared. The core peaks are all standard peaks of the selected series without any neutral loss.

32768 An experimental flag for testing purpose.
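
The flag arithmetic described for -Sequest and -DQST can be sketched with bit operations. This is only an illustration in Python, not dqst's own code; the constant and function names are made up, and the default of 4 for a bare switch follows the description above.

```python
# Flag values from the table above; the names are hypothetical.
SEQUEST4DQST_MODE = 1
COMPATIBILITY_MODE = 2
COMPATIBLE_OUTPUT = 4


def apply_switches(switches, flags=0):
    """Apply -Sequest[=n] and -DQST[=n] switches from left to right."""
    for switch in switches:
        name, _, value = switch.partition("=")
        # A bare -Sequest or -DQST is treated like "=4".
        mask = int(value) if value else COMPATIBLE_OUTPUT
        if name == "-Sequest":
            flags |= mask    # set the selected flags
        elif name == "-DQST":
            flags &= ~mask   # reset the selected flags
        else:
            raise ValueError(f"unexpected switch: {switch}")
    return flags


# "-Sequest=5" sets flags 1 and 4, "-DQST" resets flag 4 again.
print(apply_switches(["-Sequest=5", "-DQST"]))
```

Setting and clearing bits instead of adding and subtracting keeps the result correct even if the same flag is selected twice.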

operation modes

dqst has three different operation modes which can be selected by flags. A specific mode performs the whole analysis step. These modes are pluggable. Not every mode may have been distributed by every software release. If the desired mode is unavailable, another mode is selected automatically.

operation mode sequest4dqst

This mode is also called the proxy mode. An original Sequest™ program is needed to use this mode. dqst will perform every pre-selection of every peptide candidate. This includes the replacement of joker amino acid letters by matching amino acids. Then, a temporary database is created with the matching peptides. A new parameter file compatible with Sequest™ is generated. Finally, the Sequest™-program is called. A trick is used to force this original program to detect proper cleavage points by inserting X letters with a mass of 10000. No valid X letter can occur normally, because all have been replaced. This cheating will force the called program to separate the peptides at the correct positions.

dqst needs to know what the original program is. This is accomplished either by setting an environment variable called SEQUEST4DQST or a parameter file's entry called sequest4dqst. Either select the pure file name if the program can be found in the current path or select Sequest™'s complete path by setting the environment variable SEQUEST4DQST to e.g. C:\SEQUESTPROGS\SEQUEST27.EXE.

Note that using the environment variable automatically selects "-Sequest=1". This can be overridden by further command line parameters.

The value of the environment variable takes precedence over the value entered in the parameter file.

This mode is selected by setting the command line parameter "-Sequest=1".
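
The lookup order described above (environment variable first, then the parameter file entry, then a search along PATH for a bare file name) might look like the following Python sketch. The function name and the dictionary of parsed parameters are assumptions for illustration; dqst's actual lookup code is not shown in this help.

```python
import os
import shutil


def resolve_sequest_program(params, environ=os.environ):
    """Return the external Sequest program to call, or None if unset.

    params  -- values parsed from the [SEQUEST] section (hypothetical dict)
    environ -- mapping of environment variables
    """
    # The environment variable takes precedence over the parameter file.
    name = environ.get("SEQUEST4DQST") or params.get("sequest4dqst")
    if not name:
        return None
    if os.path.dirname(name):
        return name  # a path was given, use it as it is
    # A bare file name is searched along PATH.
    return shutil.which(name, path=environ.get("PATH"))
```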

operation mode sequest compatibility

This mode is intended as a direct replacement for Sequest™. It merges the new peptide digester with amino acid replacement already used in mode sequest4dqst with the most Sequest™-compatible scoring algorithm but without commonly known bugs. No further program is used.

This mode takes precedence over mode sequest4dqst if both have been selected by "-Sequest=3".

This mode is selected by using the command line parameter "-Sequest=2".

operation mode dqst standard mode

This is the most advanced mode. It is intended to be a replacement for Sequest™, too, but it uses

a different scoring algorithm
a different ranking
a different output format (this can be changed by "-Sequest=4")

The already described new peptide digester is also used, of course.

scoring algorithm

The new scoring algorithm uses a completely different approach from Sequest™. The algorithm itself is beyond the scope of this help. Look at the published paper about this topic.

ranking

The new ranking incorporates both the preliminary score and the cross-correlation value by multiplying them and ranking the product.

The preliminary score is called SP further on, but uses a different scheme of calculation. It is used for preliminary ranking like the value computed by Sequest™, though.

The cross-correlation score is called XCorr further on, but is completely different.
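
Ranking by the product of both scores can be sketched as follows. The tuple layout and the numbers are invented for the illustration; how dqst computes SP and XCorr themselves is not covered here.

```python
def rank_candidates(candidates):
    """Sort (peptide, sp, xcorr) tuples best first by the product sp * xcorr."""
    return sorted(candidates, key=lambda c: c[1] * c[2], reverse=True)


# Made-up scores, only to show the ordering.
hits = [("PEPTIDEA", 300.0, 1.5), ("PEPTIDEB", 200.0, 3.0), ("PEPTIDEC", 100.0, 2.0)]
for peptide, sp, xcorr in rank_candidates(hits):
    print(peptide, sp * xcorr)
```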

output format

If show_fragment_ions is selected, the output differs from that of Sequest™. A list of ions sorted by mass is shown. Each row has five or six columns separated by semicolons.

The m/z value of the ion.
The signed z value (charge) of the ion.
The fragment's base name, e.g. "y4".
The expected normalized intensity of this ion. The algorithms used perform a peak height prediction. The results are displayed here.
The modifications of this ion shown in parentheses. This includes every variable modification character and a list of neutral losses or gains.
This optional column contains either nothing or "matches" with the obvious meaning.

Parameter -Q

This flag isn't used in pure dqst currently, but the flag is passed to Sequest™ in sequest4dqst mode.
This flag selects output compatible with a SequestQueue application.

Parameter -X

This flag isn't used in pure dqst currently, but the flag is passed to Sequest™ in sequest4dqst mode.
This flag selects output compatible with a Bioworks Browser.

Environment variables

Various environment variables influence the behaviour of dqst. They can be changed with the usual method of the operating system in use, e.g. "SET DQST=whatever" or "export DQST=whatever", or with more sophisticated techniques.

DQST

This environment variable may contain any command line parameter. It is very helpful for passing additional switches in environments where other programs invoke dqst.

Assume you have a SequestBrowser in use and you want to select a Sequest™ compatible output. Under Windows you should either set the environment variable in the system preferences or you should set it on the command line followed by an invocation of the browser.

The content of the environment variable should be "-Sequest=4" in this case.

All command line parameters that should be passed to dqst must be entered in the content of the environment variable. Every element of the variable may be quoted to ensure proper determination. An example is

SET DQST='-DC:\My Files\db.fasta' "-RC:\My Files\list"

The content of this environment variable is analyzed first. Then the command line is checked and may override any values passed in the environment variable. Values entered using the environment variable are equivalent to command line parameters after this step. Thus, they override the content of the parameter file in most cases.
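
The order of evaluation can be pictured as one argument list in which the tokens from DQST come first and the real command line follows, so that the usual "last occurrence wins" rule gives the command line priority. The Python sketch below uses shlex for the quoting, which is an assumption; dqst's own tokenizer may treat quotes differently. Both function names are hypothetical.

```python
import shlex


def effective_arguments(env_value, command_line):
    """Tokens from the DQST variable first, then the real command line."""
    return shlex.split(env_value or "") + list(command_line)


def last_value(arguments, switch):
    """Value of the last occurrence of a switch with an attached argument.

    Naive prefix matching: only suitable for switches such as -P or -C
    whose name is not the prefix of another switch.
    """
    value = None
    for argument in arguments:
        if argument.startswith(switch):
            value = argument[len(switch):]
    return value


args = effective_arguments("-Sequest=4 '-Pmy params'", ["-Pdqst.params"])
print(args)
print(last_value(args, "-P"))
```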

SEQUEST4DQST

See operation mode sequest4dqst.

DQST_PRESERVE_FILES

This environment variable can be set to any value. No temporary file is deleted as long as this environment variable exists. Currently, only the operation mode sequest4dqst creates temporary files.

PATH

This standard environment variable affects how programs are found. In operation mode sequest4dqst an external program is invoked, named by the environment variable SEQUEST4DQST or the parameter sequest4dqst. The directories listed in PATH are searched if no path is passed along with the program's name. Consult your favourite operating system manual for more information about PATH.

TMP, TEMP, TEMPDIR, TMPDIR

These standard environment variables affect the creation of temporary files. The preferred variable name depends on the operating system. Some temporary files are created in operation mode sequest4dqst using at least one of these environment variables. Consult your favourite operating system manual for more information about temporary files and which environment variable selects the actual path.

sequest.params

file's structure

The file is line oriented. It doesn't matter whether line feed (ASCII 10) or carriage return/line feed (ASCII 13/ASCII 10) is used.

The file is divided into sections. Each section starts with a section header in brackets. Only values in the correct section are recognized. Example:

[SEQUEST]

values

One value is entered on exactly one line. A value is introduced by its name, followed by an equal sign followed by the value's content which may be empty. This is an example:

num_output_lines = 5

The original Sequest™ program needs exactly one space before the equal sign. This is not the case for dqst. You can use an arbitrary number of blanks, tabs, etc. Using no blanks at all is allowed, too.

The original Sequest™ program has no special syntax for a string. dqst also allows strings enclosed in single or double quotes. Sequest™ uses the first word after the equal sign if it tries to scan a non-number. Without quotes, dqst uses everything except a comment, omitting leading and trailing blanks. Quotes always delimit the true content of the string.

comments

Empty lines and lines starting with a hash (#) are treated as comment lines. dqst additionally accepts lines starting with a semicolon as comment lines.

Sequest™ accidentally accepts comments after the value's content because it ignores everything after analyzing the value. dqst actively accepts comments in any line after a hash or a semicolon. Normally, there is no difference except for strings. The following line is recognized as a file name of "#comment" in Sequest™ while dqst recognizes an empty value.

first_database_name = #comment
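
The lenient rules dqst applies (any line ending, free spacing around the equal sign, comments after a hash or a semicolon, optional quotes, last entry wins) can be summarised in a small reader. This is a sketch in Python written from the description above, not dqst's parser; the function names are hypothetical.

```python
def strip_comment(text):
    """Cut the text at the first '#' or ';' that is not inside quotes."""
    quote = None
    for index, char in enumerate(text):
        if quote:
            if char == quote:
                quote = None
        elif char in "'\"":
            quote = char
        elif char in "#;":
            return text[:index]
    return text


def parse_params(text):
    """Return {section: {name: value}} for a sequest.params style text."""
    sections, current = {}, None
    for raw in text.splitlines():  # handles LF as well as CR/LF
        line = strip_comment(raw).strip()
        if not line:
            continue
        if line.startswith("[") and line.endswith("]"):
            current = sections.setdefault(line[1:-1], {})
            continue
        if current is None or "=" not in line:
            continue
        name, _, value = line.partition("=")
        value = value.strip()
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "'\"":
            value = value[1:-1]  # quotes delimit the true content
        current[name.strip()] = value  # a later entry replaces an earlier one
    return sections
```

With this reader the line "first_database_name = #comment" yields an empty value, as described for dqst.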

[SEQUEST]

This section contains various elements. If an element is given more than once, the last one wins.

The following elements exist.

The order is irrelevant. Values may be omitted.

add_A_Alanine
add_B_avg_NandD
add_C_Cysteine
add_Cterm_peptide
add_Cterm_protein
add_D_Aspartic_Acid
add_E_Glutamic_Acid
add_F_Phenylalanine
add_G_Glycine
add_H_Histidine
add_I_Isoleucine
add_J_avg_IandL
add_K_Lysine
add_L_Leucine
add_M_Methionine
add_N_Asparagine
add_Nterm_peptide
add_Nterm_protein
add_O_Ornithine
add_P_Proline
add_Q_Glutamine
add_R_Arginine
add_S_Serine
add_T_Threonine
add_U_Selenocysteine
add_V_Valine
add_W_Tryptophan
add_X_LorI
add_Y_Tyrosine
add_Z_avg_QandE
amino_acids_per_proton
compatible
create_output_files
database_name
diff_search_options
enzyme_number
first_database_name
fragment_ion_tolerance
ion_cutoff_percentage
ion_series
mass_type_fragment
mass_type_parent
match_peak_allowed_error
match_peak_count
match_peak_tolerance
max_consecutive_X
max_num_differential_AA
max_num_differential_AA_per_mod
max_num_internal_cleavage_sites
max_replacements_X
min_num_differential_AA
min_num_differential_AA_per_mod
nucleotide_reading_frame
num_description_lines
num_output_lines
num_results
partial_sequence
peptide_mass_tolerance
print_duplicate_references
protein_mass_filter
rare_ions_percentage
remove_precursor_peak
second_database_name
sequence_header_filter
sequest4dqst
show_fragment_ions

`add_A_Alanine`

Modify the static mass of this amino acid. The value is added to the standard mass of the entry. Sequest™ just accepts a number which is 0.0 most of the time. dqst accepts not just a number. The format of the value is

mass[':'percentage]['!']['['mass[':'percentage]['!']']']

Some valid examples:

add_A_Alanine = 0
add_A_Alanine = 12.00!
add_A_Alanine = 12.00
add_A_Alanine = 12.00[-CO]
add_A_Alanine = COH+13-N![-CO:0.2]
add_A_Alanine = COH+13-N:60%![-CO:20%]

These examples are contrived, but they show the possibilities.

The main mass difference is written first. The neutral loss follows in brackets. If no neutral loss is given, no neutral loss can happen. A dynamic modification overrides the neutral loss behaviour.

An exclamation mark indicates an affinity to protons. The default values implement an exclamation mark to the amino acids H, K, R. Overwrite it with a value of 0[0].

The default values implement neutral losses with an intensity of 16.6% for the amino acids

K, R, Q, N   -NH3
S, T, E, D   -H2O
any          -CO

The special -CO neutral loss simulates an occurrence of an A ion and happens only once if no B ion series is selected. Other neutral losses may occur more often and in combination.
Overwrite this behaviour with a value of 0[0].

The expected intensity of a particular peak can be selected after a colon either as a factor or in percent notation. Other influences may multiply another factor, too. This results in peak heights of various intensities.
It is always a good idea to choose the intensity correctly. Either the main peak or the neutral loss is less abundant very often.

The mass can be any combination of a number and a chemical notation. The latter isn't understood by Sequest™ directly (the sequest4dqst mode translates it!), but it is very handy because a switch from monoisotopic to average mode needs no change of the value. Finally, it rules out calculation errors by the user.

Element codes understood so far are:
H Li Be B C N O F Na Mg Al Si P S Cl K Ca Br I
Additionally, these codes are accepted, too:
p for a proton
n for a neutron
e for an electron
D for deuterium
Please note that the molecule composition isn't understood by Sequest™.
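
The value format given above, mass[':'percentage]['!']['['mass[':'percentage]['!']']'], can be split into its parts with a regular expression. The following Python sketch only separates the fields and converts the intensity; it does not evaluate the mass expression. The field names are chosen for this illustration and are not dqst identifiers.

```python
import re

# One "mass[:intensity][!]" part; {p} is a prefix for the group names.
_PART = r"(?P<{p}mass>[^:!\[\]]+?)(?::(?P<{p}intensity>[0-9.]+%?))?(?P<{p}proton>!)?"
VALUE = re.compile(
    r"^\s*" + _PART.format(p="") + r"(?:\[" + _PART.format(p="loss_") + r"\])?\s*$"
)


def to_factor(text):
    """Convert '20%' or '0.2' to a factor; None if no intensity was given."""
    if text is None:
        return None
    return float(text[:-1]) / 100 if text.endswith("%") else float(text)


def parse_modification(value):
    match = VALUE.match(value)
    if match is None:
        raise ValueError(f"not a valid modification value: {value!r}")
    loss = match.group("loss_mass")
    return {
        "mass": match.group("mass").strip(),
        "intensity": to_factor(match.group("intensity")),
        "proton_affinity": match.group("proton") is not None,
        "loss_mass": loss.strip() if loss is not None else None,
        "loss_intensity": to_factor(match.group("loss_intensity")),
        "loss_proton_affinity": match.group("loss_proton") is not None,
    }


print(parse_modification("COH+13-N:60%![-CO:20%]"))
```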

`add_B_avg_NandD`

This value must be 0. dqst replaces code "B" either by N or by D.

`add_C_Cysteine`

See add_A_Alanine.

A common static modification is "add_C_Cysteine = CH2CONH" meaning carbamidomethylation. A value of "57" is imprecise, and "57.0215" cannot be correct for both the monoisotopic and the average case. In Sequest™, however, one has to use either the latter or "57.0513" because that program does not understand molecular formulas.
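
Why a formula is safer than a number can be seen by evaluating CH2CONH with both mass tables. The Python sketch below handles only plain formulas (element symbols with optional counts), not the mixed notation with "+" and "-" that dqst accepts, and it lists only the four elements needed here. The function name is hypothetical; the atomic masses are standard values rounded to seven and five decimals.

```python
import re

# Atomic masses for the elements used in this example only.
MONOISOTOPIC = {"H": 1.0078250, "C": 12.0, "N": 14.0030740, "O": 15.9949146}
AVERAGE = {"H": 1.00794, "C": 12.0107, "N": 14.0067, "O": 15.9994}


def formula_mass(formula, table):
    """Sum the masses of a plain formula such as 'CH2CONH'."""
    total = 0.0
    for symbol, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        total += table[symbol] * (int(count) if count else 1)
    return total


print(round(formula_mass("CH2CONH", MONOISOTOPIC), 4))
print(round(formula_mass("CH2CONH", AVERAGE), 4))
```

The two results correspond to the two numbers a Sequest™ user has to choose between by hand.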

`add_Cterm_peptide`

The standard mass of the C-terminus is the mass of the molecule "OH". Every modification should consider this as the base. Replacing the terminus by foobar should result in

add_Cterm_peptide = foobar-OH

See add_A_Alanine for the syntax.

`add_Cterm_protein`

The standard mass of the C-terminus of the protein is the mass of the C-terminus of the peptide. Every modification should consider this as the base. See add_A_Alanine for the syntax.

`add_J_avg_IandL`

This value must be 0. dqst replaces code "J" either by I or by L.

`add_K_Lysine`

See add_A_Alanine.

`add_L_Leucine`

See add_A_Alanine.

`add_M_Methionine`

See add_A_Alanine.

`add_N_Asparagine`

See add_A_Alanine.

`add_Nterm_peptide`

The standard mass of the N-terminus is the mass of the atom "H". Every modification should consider this as the base. Replacing the terminus by foobar should result in

add_Nterm_peptide = foobar-H

See add_A_Alanine for the syntax.

`add_Nterm_protein`

The standard mass of the N-terminus of the protein is the mass of the N-terminus of the peptide. Every modification should consider this as the base. See add_A_Alanine for the syntax.

`add_O_Ornithine`

This value has an initial value of 0 and is ignored. Feel free to set it to a reasonable value like N2C5H10O for Ornithine. The standard doesn't mention O for Ornithine therefore we set the standard value to 0. Sequest™ uses the mass of Ornithine for the code O. See add_A_Alanine.

`add_P_Proline`

See add_A_Alanine.

`add_Q_Glutamine`

See add_A_Alanine.

`add_R_Arginine`

See add_A_Alanine.

`add_S_Serine`

See add_A_Alanine.

`add_T_Threonine`

See add_A_Alanine.

`add_U_Selenocysteine`

This code isn't used in Sequest™ but dqst supports it. See add_A_Alanine.

`add_V_Valine`

See add_A_Alanine.

`add_W_Tryptophan`

See add_A_Alanine.

`add_X_LorI`

This value must be 0. dqst replaces code "X" with every possible amino acid.

`add_Y_Tyrosine`

See add_A_Alanine.

`add_Z_avg_QandE`

This value must be 0. dqst replaces code "Z" either by Q or by E.

`amino_acids_per_proton`

This value is unknown to Sequest™.

This value must be an integer in the range 0 - 144 currently. The default value is 1.

Meaning: When computing possible fragments and their charge this value selects the number of consecutive amino acids that are likely to bind another proton. "Another" means neither counted by the proton binding amino acids nor that one always counted at the terminus.

`compatible`

This value is unknown to Sequest™.

This value must be an integer in the range 0 - 65535 currently. The default value is 0.

Meaning: See -Sequest. The command line parameter takes precedence over this value, although only flags set explicitly by -Sequest or -DQST take precedence.

`create_output_files`

This value must be an integer and either 0 or 1. The default value is 0.

Meaning: If set to 1 an output file is created. If an output file is created, dqst no longer shows results on standard output, in contrast to Sequest™.

The command line parameter -O takes precedence over this value.

`database_name`

This value must be a string. Most people omit this value and prefer to set first_database_name which is a synonym of this value.

`diff_search_options`

This value consists of pairs of variable modifications. Each pair has two elements: A mass and a list of amino acids.

Meaning: Each variable modification is tested whether it can be applied or not to the current peptide. If so, the peptide is scored. Only the complete mass of the peptide is the decision switch what to do. The unmodified peptide is scored if the peptide's mass matches without a variable modification, but the modified peptide is scored if the modification plus the peptide's mass is within the range of the peptide's tolerance.

Example of work:
Let the peptide's precursor mass be 1000. The unmodified mass of the peptide shall be 984 and the peptide has the form "...MM...".
Then these two peptides are scored (* is the modification indicator of a mass adduct of 16):
"...M*M..."
"...MM*..."
If the peptide mass tolerance value is bigger than 16, some other modifications are tested:
"...MM..."
"...M*M*..."

The mass of each pair of variable modification can be entered as described at add_A_Alanine.
The list of amino acids letters must be entered without blanks between them.
An invalid amino acid letter may be used only if the mass is 0.
dqst allows up to 8 pairs (configurable).
Sequest™ allows up to 3 pairs. No error is shown if more pairs are entered.
Sequest™ won't apply any modification if the first two are null. Always put valid modifications first. Don't use "0 X 0 X 16 M"
dqst allows the re-use of an amino acid letter in another variable modification. Thus, "44 S 80 ST" is valid and works as expected.
Sequest™ only applies the last modification character without warning. Don't use "44 S 80 ST". It is interpreted as "80 ST".
Sequest™ doesn't look for all permutations of variable modifications. Sometimes it misses something.
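
The "Example of work" above can be reproduced with a short enumeration: every number of modifications whose total mass falls within the tolerance of the precursor mass is expanded into all placements. This Python sketch handles a single variable modification and ignores the per-modification limits; the function name, the sample peptide "AMMK" and the defaults are assumptions for illustration.

```python
from itertools import combinations


def modified_forms(peptide, base_mass, precursor_mass, tolerance,
                   mod_mass=16.0, residues="M"):
    """List all placements of one variable modification that fit the mass."""
    sites = [i for i, aa in enumerate(peptide) if aa in residues]
    forms = []
    for count in range(len(sites) + 1):
        # Only the complete mass decides whether this count is tested.
        if abs(base_mass + count * mod_mass - precursor_mass) > tolerance:
            continue
        for chosen in combinations(sites, count):
            forms.append("".join(
                aa + ("*" if i in chosen else "") for i, aa in enumerate(peptide)
            ))
    return forms


print(modified_forms("AMMK", 984, 1000, tolerance=1))
print(modified_forms("AMMK", 984, 1000, tolerance=17))
```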

A common variable modification is "O M" meaning oxygen may be added to methionine. A value of "16 M" is imprecise, and "15.995 M" cannot be correct for both the monoisotopic and the average case. In Sequest™, however, one has to use either the latter or "15.9994" because that program does not understand molecular formulas.

`enzyme_number`

This value must be an integer describing the enzyme to use. The default value is 0.

Meaning: This value selects the enzyme in use out of the list of enzymes in section [SEQUEST_ENZYME_INFO]. In that example a value of 1 would select Trypsin.

Values without corresponding entry bind to the enzymeless mode. Sequest™ won't give any warning in this case.

Sequest™ gives very poor results in enzymeless mode, in contrast to dqst.

`first_database_name`

This value must be a string. The value must be enclosed in quotes if it contains spaces, but this is incompatible with Sequest™.

This value is a synonym of database_name.

Meaning: The value gives the name of the FASTA-database to search like the command line parameter -D. The command line parameter takes precedence over this value. Every file name is accepted that matches the rules of the operating system.

`fragment_ion_tolerance`

This value must be a value between 0 and 10. The default value is 0 which leads effectively to 0.49999 for dqst.

Meaning: This value selects the radius of the m/z value of the fragment. A theoretical peak is considered matching if it matches the fragment tolerance range of any experimental peak.

Example: A value of 0.1 selects a range [x-0.1, x+0.1] around each peak with a m/z value of x.

Sequest™ uses 0.0 for a special case where binning is involved heavily. This means that any value used in the process (here fragment peaks) are rounded to an integer using some special rounding rules before comparing.
dqst never uses binning for detecting matching fragments, but for compatibility reasons it uses 0.49999 if the original value is 0. This leads to a behaviour more or less compatible with Sequest™.
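
The matching rule and the substitution of 0.49999 for 0 can be written down in a few lines. This is a Python sketch of the rule as described here, with hypothetical function names; whether the interval bounds are inclusive in dqst is an assumption.

```python
def effective_tolerance(fragment_ion_tolerance):
    """A configured value of 0 is replaced by 0.49999, as described for dqst."""
    return 0.49999 if fragment_ion_tolerance == 0 else fragment_ion_tolerance


def is_matched(theoretical_mz, experimental_mzs, fragment_ion_tolerance):
    """True if the theoretical peak lies within the radius of any experimental peak."""
    radius = effective_tolerance(fragment_ion_tolerance)
    return any(abs(theoretical_mz - mz) <= radius for mz in experimental_mzs)


print(is_matched(500.3, [500.0], 0.1))  # outside [499.9, 500.1]
print(is_matched(500.3, [500.0], 0))    # radius 0.49999 applies
```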

This value has no influence on the XCorr result of a match. The important sp value of a match is calculated using the fragment ion tolerance, though.

The algorithm for calculating sp differs between dqst and Sequest™. See operation modes for more information.

`ion_cutoff_percentage`

This value must be a value between 0 and 0.9999. The default value is 0.

Meaning: This value selects a cut-off below which a matching peptide is rejected. The value compared with this value is the ratio
(# matching theoretical fragment peaks) / (# total theoretical fragment peaks)
which means that the user can select a minimum coverage of matching peaks.
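
Expressed as code, the filter is a single comparison of the coverage ratio with the configured cut-off. The Python sketch below assumes that a ratio equal to the cut-off still passes and that a peptide without theoretical peaks is rejected; both details are not stated in this help, and the function name is hypothetical.

```python
def passes_ion_cutoff(matching_peaks, total_peaks, ion_cutoff_percentage):
    """Keep a peptide only if enough theoretical fragment peaks match."""
    if total_peaks == 0:
        return False  # assumption: nothing to compare means rejection
    return matching_peaks / total_peaks >= ion_cutoff_percentage


print(passes_ion_cutoff(6, 10, 0.5))
print(passes_ion_cutoff(4, 10, 0.5))
```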

Another very similar filter is provided by the parameter values match_peak_count and match_peak_allowed_error.

This value is influenced by fragment_ion_tolerance.

The algorithm for calculating matching peaks differs between dqst and Sequest™. See operation modes for more information.

`ion_series`

This value is a list of numbers. All numbers default to 0 meaning that nothing is done at all. Therefore this value must be set. The numbers must be entered in this order:

0 or 1 whether neutral losses of series A should be honoured. (1 = yes)
0 or 1 whether neutral losses of series B should be honoured. (1 = yes)
0 or 1 whether neutral losses of series Y should be honoured. (1 = yes)
factor for series A
factor for series B
factor for series C
factor for series D
factor for series V
factor for series W
factor for series X
factor for series Y
factor for series Z

Meaning: A factor for any series must be between 0 and 100. 0 means ignoring this series, 1 is the standard value. The factor is an intensity prediction. If in doubt, use 1! There is hardly any reason to use a value other than 0 or 1, because 1 is the normalized standard maximum of any prediction.

Neutral losses are ignored if the corresponding series is disabled. Neutral losses normally have a lower intensity prediction than the normal peaks, but this can be switched off or modified.

A value list of

0 1 1  0 0.5 0 0 0 0 0 1.0 0

with the default neutral loss assignment may create these peaks

peak name	peak intensity prediction
b2	0.5
b2-CO	0.08333 (=16.7% of 0.5)
y2	1.0
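
Reading the twelve numbers in the documented order and deriving the b2-CO prediction from the table can be sketched in Python as follows. The function name is hypothetical, and the neutral loss share of one sixth is an assumption chosen because it reproduces the 0.08333 shown in the table.

```python
SERIES = "ABCDVWXYZ"  # order of the nine series factors


def parse_ion_series(value):
    """Split an ion_series value into neutral loss switches and series factors."""
    numbers = value.split()
    if len(numbers) != 12:
        raise ValueError("ion_series needs exactly 12 numbers")
    losses = {name: float(n) != 0 for name, n in zip("ABY", numbers[:3])}
    factors = {name: float(n) for name, n in zip(SERIES, numbers[3:])}
    return losses, factors


losses, factors = parse_ion_series("0 1 1  0 0.5 0 0 0 0 0 1.0 0")
print(losses)
print(factors["B"], factors["Y"])
# b2-CO: series factor of B multiplied by the assumed default loss share of 1/6.
print(round(factors["B"] / 6, 5))
```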

The algorithm for calculating matching peaks differs between dqst and Sequest™. See operation modes for more information.

`mass_type_fragment`

This value must be either 0 or 1. The default value is 1.

Meaning: A value of 1 selects monoisotopic masses, 0 selects average masses for calculating fragment peaks. The DTA file contains the precursor mass in its first line and the fragments in the following lines. Only the fragments are compared using this value's mass class, in contrast to mass_type_parent.

Most elements have several stable isotopes. These isotopes have one or more additional neutrons. The most abundant isotopes are normally those with the fewest neutrons. But with increasing mass the probability of a neutron gain rises, and it is very likely that a summed mass of 2000 amu contains at least one additional neutron. An average value is the mean over the probable isotopes. The average value normally cannot be matched exactly because fractions of neutrons don't occur.

The generating program of the DTA file usually creates either monoisotopic or average masses. But differences between precursor and fragments are possible. In either case this value must match the properties of the creating process, otherwise nonsense will be computed.

dqst uses an accuracy of 4 fractional digits on 32-bit-machines and 11 fractional digits on 64-bit-machines.

`mass_type_parent`

This value must be either 0 or 1. The default value is 1.

Meaning: A value of 1 selects monoisotopic masses, 0 selects average masses for calculating precursor peaks. The DTA file contains the precursor mass in its first line and the fragments in the following lines. Only the precursor mass is compared using this value's mass class, in contrast to mass_type_fragment.

dqst uses an accuracy of 4 fractional digits on 32-bit-machines and 11 fractional digits on 64-bit-machines.

`match_peak_allowed_error`

This value must be an integer between 0 and 100. The default value is 0.

See match_peak_count for a description.

`match_peak_count`

This value must be an integer between 0 and 100. The default value is 0. Sequest™ has a maximum of 5.

Meaning: The most abundant experimental peaks are checked whether they are matched by the theoretical ones. match_peak_count is the number of the most abundant peaks to check. At most match_peak_allowed_error of them may fail this test.

See match_peak_tolerance for some preliminary information about the selection of the most abundant peaks.

See ion_cutoff_percentage for an alternate approach for picking top peaks.

`match_peak_tolerance`

This value must be between 0 and 100. The default value is 0.

Meaning: While selecting the topmost abundant peaks that should be matched using match_peak_count the topmost raw peaks are discriminated. That means, the experimental peaks must have a minimum space between them. This is the maximum of this value and fragment_ion_tolerance. In other words: This value is never smaller than fragment_ion_tolerance.
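
How match_peak_count, match_peak_allowed_error and match_peak_tolerance could work together is sketched below in Python. This is one reading of the descriptions above, not dqst's implementation: the function names are hypothetical, and treating a count of 0 as "filter disabled" is an assumption.

```python
def top_peaks(peaks, count, min_spacing):
    """Pick the `count` most intense (mz, intensity) peaks at least min_spacing apart."""
    chosen = []
    if count <= 0:
        return chosen
    for mz, intensity in sorted(peaks, key=lambda p: p[1], reverse=True):
        if all(abs(mz - other) >= min_spacing for other, _ in chosen):
            chosen.append((mz, intensity))
        if len(chosen) == count:
            break
    return chosen


def passes_match_peak_filter(peaks, theoretical_mzs, match_peak_count,
                             match_peak_allowed_error, match_peak_tolerance,
                             fragment_ion_tolerance):
    if match_peak_count == 0:
        return True  # assumption: 0 disables the filter
    # The spacing is never smaller than fragment_ion_tolerance.
    spacing = max(match_peak_tolerance, fragment_ion_tolerance)
    missed = 0
    for mz, _ in top_peaks(peaks, match_peak_count, spacing):
        if not any(abs(mz - t) <= fragment_ion_tolerance for t in theoretical_mzs):
            missed += 1
    return missed <= match_peak_allowed_error
```

In this reading a peak at 100.2 next to a stronger peak at 100.0 is skipped when the spacing is 1.0, so the next candidate further away is taken instead.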

`max_consecutive_X`

This value is unknown to Sequest™.

This value must be between 0 and 100. The default value is 1.

Meaning: Several joker codes for amino acids exist. Most of them stand for just 2 different amino acids, so both candidates are tested, including every possible modification.

However, the code X can stand for any of the 20 amino acids, and each of them is tested. Obviously, the chance of a false positive rises significantly if more than one consecutive X is replaced, as in XXX. This value limits the number of adjacent X codes. If the number of consecutive X codes is bigger than this value, the next cleavage position that is not based on X is sought and the algorithm continues with the peptide beginning there.

`max_num_differential_AA`

This value is unknown to Sequest™.

This value must be an integer in the range 0 - 144 currently. The default value is 12.

Meaning: Using diff_search_options it is possible to select variable modifications.

This value limits the maximum total number of variable modifications in one peptide.

`max_num_differential_AA_per_mod`

This value is unknown to Sequest™.

This value must be an integer in the range 0 - 144 currently. The default value is 4.

Meaning: Using diff_search_options it is possible to select variable modifications.

This value limits the maximum total number of each single variable modification in one peptide.

`max_num_internal_cleavage_sites`

This value must be an integer in the range 0 - 143 currently. The default value is 5 and should be changed to a smaller value.

This value is not used in enzymeless mode.

Meaning: Enzymes split proteins following several rules described in section [SEQUEST_ENZYME_INFO]. An enzyme sometimes does not cleave at a possible cleavage point. This may result either from a short incubation time or from a modification near the cleavage point. Other reasons exist, too.

This value is the number of cleavage positions that may have been ignored by the enzyme. Note that every number of missed cleavage sites from 0 up to this value is considered.

`max_replacements_X`

This value is unknown to Sequest™.

This value must be a non-negative integer number. The default value is 1.

Meaning: Several joker codes for amino acids exist. Most of them stand for just 2 different amino acids, so both candidates are tested, including every possible modification.

However, the code X can stand for any of the 20 amino acids, and each of them is tested. Obviously, the chance of a false positive rises significantly if more than one X is replaced in a peptide. This value limits the number of X codes in a peptide. If the number of X codes is bigger than this value, the next cleavage position that is not based on X is sought and the algorithm continues with the peptide beginning there.
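The two X limits, max_consecutive_X and max_replacements_X, can be pictured as a simple acceptance test on a candidate peptide. The sketch below only shows the counting; how dqst actually skips ahead to the next cleavage position is not modelled, and the function names are hypothetical.

```python
def longest_x_run(peptide):
    """Length of the longest run of adjacent X codes in the peptide."""
    longest = run = 0
    for amino_acid in peptide:
        run = run + 1 if amino_acid == "X" else 0
        longest = max(longest, run)
    return longest


def x_limits_ok(peptide, max_consecutive_x=1, max_replacements_x=1):
    """True if the peptide respects both X limits (defaults as documented)."""
    return (longest_x_run(peptide) <= max_consecutive_x
            and peptide.count("X") <= max_replacements_x)


for candidate in ("AXK", "AXXK", "AXGXK"):
    print(candidate, x_limits_ok(candidate))
```

With the default values only "AXK" is accepted: "AXXK" has two adjacent X codes and "AXGXK" has two X codes in total.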

`min_num_differential_AA`

This value is unknown to Sequest™.

This value must be an integer in the range 0 - 144 currently. The default value is 0.

Meaning: Using diff_search_options it is possible to select variable modifications.

This value sets the minimum total number of variable modifications in one peptide.

`min_num_differential_AA_per_mod`

This value is unknown to Sequest™.

This value must be an integer in the range 0 - 144 currently. The default value is 0.

Meaning: Using diff_search_options it is possible to select variable modifications.

This value limits the minimum total number of each single variable modification in one peptide.
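Taken together, the four limits max_num_differential_AA, max_num_differential_AA_per_mod, min_num_differential_AA and min_num_differential_AA_per_mod describe a range check on the modifications of a candidate peptide. A minimal sketch, assuming the modifications of a peptide are available as a list of labels (the labels, the `configured_mods` argument and the function name are hypothetical):

```python
from collections import Counter


def mod_counts_ok(applied_mods, configured_mods,
                  min_total=0, max_total=12,
                  min_per_mod=0, max_per_mod=4):
    """Check total and per-modification counts against the four limits.

    `applied_mods` lists one label per modified position in the peptide;
    `configured_mods` lists every variable modification that is switched on.
    The defaults are the documented default values.
    """
    if not (min_total <= len(applied_mods) <= max_total):
        return False
    per_mod = Counter(applied_mods)  # missing labels count as 0
    return all(min_per_mod <= per_mod[mod] <= max_per_mod
               for mod in configured_mods)


print(mod_counts_ok(["ox", "ox", "phos"], ["ox", "phos"]))
```

Raising min_per_mod to 1, for example, would reject every peptide that lacks one of the configured modifications.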

`nucleotide_reading_frame`

This value must be an integer in the range 0 - 9. The default value is 0.

FASTA databases may consist either of DNA sequences or of amino acid codes. dqst is able to translate DNA sequences into amino acid codes. This value describes what to do with DNA codes. Meaning of the values:

value	meaning
0	The FASTA file contains amino acid codes. No translation is needed. This is the best and fastest case.
1	The DNA sequence is scanned left to right (forward direction). The amino acid code starts with the first DNA code.
2	The DNA sequence is scanned left to right (forward direction). The amino acid code starts with the second DNA code.
3	The DNA sequence is scanned left to right (forward direction). The amino acid code starts with the third DNA code.
4	The DNA sequence is scanned right to left (backward direction for the complementary strand). The amino acid code starts with the first DNA code.
5	The DNA sequence is scanned right to left (backward direction for the complementary strand). The amino acid code starts with the second DNA code.
6	The DNA sequence is scanned right to left (backward direction for the complementary strand). The amino acid code starts with the third DNA code.
7	Use each of the DNA translations of the codes 1, 2, 3.
8	Use each of the DNA translations of the codes 4, 5, 6.
9	Use each of the DNA translations of the codes 1, 2, 3, 4, 5, 6.
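The values 1 to 9 can be read as a selection of (strand, offset) pairs. The following sketch translates a DNA string accordingly, using the standard genetic code. It assumes that "right to left for the complementary strand" means the reverse complement, marks stop codons with `*` and unknown codons with `X`; how dqst treats these cases is not stated here, and all names are illustrative.

```python
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code, built from the usual TCAG ordering.
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")

# value -> list of (use reverse complement?, offset of the first codon)
FRAMES = {
    1: [(False, 0)], 2: [(False, 1)], 3: [(False, 2)],
    4: [(True, 0)], 5: [(True, 1)], 6: [(True, 2)],
}
FRAMES[7] = FRAMES[1] + FRAMES[2] + FRAMES[3]
FRAMES[8] = FRAMES[4] + FRAMES[5] + FRAMES[6]
FRAMES[9] = FRAMES[7] + FRAMES[8]


def translate(dna, nucleotide_reading_frame):
    """Return one amino acid string per selected reading frame."""
    if nucleotide_reading_frame == 0:
        raise ValueError("0 means the database already contains amino acids")
    translations = []
    for reverse, offset in FRAMES[nucleotide_reading_frame]:
        strand = dna.translate(COMPLEMENT)[::-1] if reverse else dna
        codons = (strand[i:i + 3] for i in range(offset, len(strand) - 2, 3))
        translations.append("".join(CODON_TABLE.get(c, "X") for c in codons))
    return translations


print(translate("ATGGCC", 7))
```

Values 7, 8 and 9 multiply the search space by 3, 3 and 6, which is why 0 is described as the best and fastest case.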

`num_description_lines`

This value must be a non-negative integer number. The default value is 0.

Meaning: At most this many description lines are emitted after the results in the output file.

A description line is limited to 79 characters.

`num_output_lines`

This value must be a non-negative integer number. The default value is 1.

Meaning: The top-scored results are printed. This value is the number of result lines shown. Each line contains the following fields.

name	meaning
#	Position of the results. This value increments by one starting from 1.
Rank/Sp	The positions of the result regarding XCorr and Sp are listed in this field, separated by a slash (`/`). Sequest™'s top XCorr value is always listed first while dqst selects the best global value in the `#`-column. Each of these two position fields may occur more than once. The top XCorr and Sp rows are shown in the results, too.
Id#	This value is 0.
(M+H)+	The mass of the theoretical peptide is listed here including one proton for a single charged molecule.
deltCn	This value is an indicator of the difference to the top XCorr candidate. The top XCorr candidate has a value of 0, the worst a theoretical value of 1.0.
XCorr	The XCorr value shows the similarity of the theoretical spectrum and the experimental data. A value lower than 1.5 usually indicates a poor result.
Sp	This value is the preliminary score value. dqst uses a plain value and Sequest™ uses a percentage. Therefore, a factor of 100 must be applied to compare values of these two output modes. A value lower than 1.5 (dqst) or 150 (Sequest™) usually indicates a poor result.
Ions	This value shows the ratio of the found and expected peaks.
Reference	This value contains the first characters of the protein's name. Optionally, the number of other proteins containing this peptide is added, too.
Peptide	This value contains the peptide's sequence. The cleavage points are shown as dots. The adjacent amino letters are shown, too. A dash (`-`) indicates a terminus of the protein.
OriginalPeptide	This value is shown in dqst mode only. This value is the original value of Peptide. A difference occurs only if a replacement of a joker amino acid has happened. This value will include B, J, X and Z, but Peptide will show the used values. Sequest™ always displays this value as Peptide.
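The deltCn column can be pictured with the normalisation commonly used for this score, the XCorr difference to the top candidate divided by the top XCorr. It has the properties described in the table (0 for the top candidate, approaching 1.0 for the worst), but whether dqst computes exactly this expression is an assumption.

```python
def delta_cn(xcorr_values):
    """Relative XCorr distance of each candidate to the best candidate.

    Assumes the best XCorr value is positive.
    """
    top = max(xcorr_values)
    return [(top - xcorr) / top for xcorr in xcorr_values]


print(delta_cn([4.0, 3.0, 2.0]))
```

A large gap between the first and the second entry suggests that the top candidate stands out clearly from the alternatives.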

`num_results`

This value must be an integer between 1 and 100000. The default value is 500.

Meaning: This is the size of the scoring buffer. The smaller the buffer, the faster the algorithm runs and the less memory is used. On the other hand, it becomes more likely that a candidate with a low Sp score but a high XCorr value is thrown away.

The top-scored Sp results are collected in a buffer of this value's size. Each element of this buffer gets an XCorr value after the preliminary scoring of all peptides.
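The two-stage scoring can be sketched as a bounded shortlist: only the num_results best candidates by Sp survive and are scored with XCorr afterwards. The sketch below shows the shortlist step only and uses `heapq.nlargest` from the Python standard library; the candidate representation is hypothetical and dqst's own buffer handling may differ.

```python
import heapq


def shortlist_by_sp(candidates, num_results=500):
    """Keep the `num_results` candidates with the highest preliminary score.

    `candidates` is an iterable of (peptide, sp) pairs. Anything that does
    not make it into the shortlist never receives an XCorr value.
    """
    return heapq.nlargest(num_results, candidates, key=lambda c: c[1])


scored = [("PEPTIDEA", 1.0), ("PEPTIDEB", 3.0), ("PEPTIDEC", 2.0)]
print(shortlist_by_sp(scored, num_results=2))
```

With a buffer of 2, "PEPTIDEA" is dropped before XCorr scoring, even if it would have achieved the best XCorr. This is the trade-off described above.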

`partial_sequence`

This value is a space delimited list of amino acid sequences that must occur in the theoretical spectra.

Meaning: This is a very useful tool to reduce computation time. The longer a sequence is, the better the overall performance.

Example: A value of "A CM" looks for the occurrence of A and either CM or MC.

This list can be given after the charge value of a DTA file, too. Just add it in the first line. Note that the parameter file's value takes precedence, though.
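The example "A CM" suggests that every listed sequence must occur and that each one is accepted in forward or reversed order. A minimal sketch of such a test, assuming a plain substring search on the peptide sequence (the function name is hypothetical and dqst may apply the filter differently):

```python
def has_partial_sequences(peptide, partial_sequence):
    """True if every listed sequence occurs in the peptide, forward or reversed."""
    for tag in partial_sequence.split():
        if tag not in peptide and tag[::-1] not in peptide:
            return False
    return True


for candidate in ("ACMK", "AMCK", "GMCK", "ACK"):
    print(candidate, has_partial_sequences(candidate, "A CM"))
```

"ACMK" and "AMCK" pass, "GMCK" fails because A is missing, and "ACK" fails because neither CM nor MC occurs.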

`peptide_mass_tolerance`

This value must be a value between 0 and 100. The default value is 0 which should be changed to a reasonable value.

Meaning: This value selects the radius around the experimental precursor mass. A theoretical peptide is considered a match if its mass lies within this radius of the experimental precursor mass.

This value is used for peptide selection only. For fragments another entry called fragment_ion_tolerance is used.

`print_duplicate_references`

This value must be either 0 or 1.

Meaning: If set to 1, all proteins containing the peptide are listed in the output. With a value of 0, only the first protein containing a particular peptide is listed.

`protein_mass_filter`

This value must be either
mean percentage "%" or
min max

Meaning: The mean value, min and max have to be integer values. The percentage must be in the range 0 - 100.0. The percentage value describes a radius around the mean value. Thus, "200 10%" is equivalent to "180 220".
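The equivalence of the two notations can be checked with a small parser. This is a sketch of the arithmetic only; the function name is hypothetical and the tolerance for a blank before the percent sign is an assumption.

```python
def parse_protein_mass_filter(value):
    """Return (min, max) for either 'mean percentage%' or 'min max'."""
    # Detach the percent sign so that '10%' and '10 %' are treated alike.
    tokens = value.replace("%", " % ").split()
    if tokens[-1] == "%":
        mean, percentage = float(tokens[0]), float(tokens[1])
        radius = mean * percentage / 100.0
        return (mean - radius, mean + radius)
    return (float(tokens[0]), float(tokens[1]))


print(parse_protein_mass_filter("200 10%"))
print(parse_protein_mass_filter("180 220"))
```

Both calls yield the same range, 180 to 220.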

Currently, both Sequest™ and dqst need an entry "[MASS=value]" in the description line of the proteins to let this test do its work. In contrast to Sequest™, dqst uses the complete line, though.

This feature should not be used in the current state of implementation.

`rare_ions_percentage`

This value is unknown to Sequest™.

This value must be an integer in the range 0 - 100. The default value is 100.

Meaning: The intensity of rare theoretical peaks is multiplied by this percentage to predict the expected fragment peaks better.

A rare ion is one whose charge is higher than the number of proton acceptors seen so far plus one.

This value has a strong relation to amino_acids_per_proton.

`remove_precursor_peak`

This value must be either 0 or 1.

Meaning: If set to 1 the peaks near the precursor are removed. Sequest™ removes peaks with a distance lower than 15 amu, dqst uses 10 amu.

`second_database_name`

This value must be a string. The value must be enclosed in quotes if it contains spaces, but this is incompatible with Sequest™. However, Sequest™ ignores this value in either case.

Meaning: The value gives the name of a second FASTA-database to search. Every file name is accepted that matches the rules of the operating system.

`sequence_header_filter`

This value must be a list of strings. Everything from a hash (#) or a semicolon (;) onward, including that character, is treated as a comment.

Meaning: Several elements can be separated by spaces. Each element can be introduced by an exclamation mark (!), meaning that this element, without the exclamation mark, must not appear in the header of a protein or the protein will be skipped. This test is done first.

Next, all other elements are tested. The protein is processed if one of these filter strings matches the header string.

A filter string may contain a tilde (~). This is replaced by a blank during comparison.
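The filter rules can be sketched as follows. Several details are assumptions that the text does not settle: matching is treated as a case-sensitive substring test, and a filter that consists of exclusions only lets every remaining protein pass. The function name is hypothetical.

```python
import re


def header_passes(header, sequence_header_filter):
    """Apply the exclusion test first, then the inclusion test."""
    # Everything from '#' or ';' onward is a comment.
    spec = re.split(r"[#;]", sequence_header_filter, maxsplit=1)[0]
    required, forbidden = [], []
    for element in spec.split():
        is_exclusion = element.startswith("!")
        text = (element[1:] if is_exclusion else element).replace("~", " ")
        if text:
            (forbidden if is_exclusion else required).append(text)
    if any(text in header for text in forbidden):
        return False
    # Assumption: with no positive elements, nothing else is demanded.
    return not required or any(text in header for text in required)


rule = "Homo~sapiens !keratin # human proteins without keratins"
print(header_passes("sp|P2 Homo sapiens albumin", rule))
print(header_passes("sp|P1 Homo sapiens keratin", rule))
```

The tilde makes it possible to filter for "Homo sapiens" as one element although elements are separated by spaces.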

`sequest4dqst`

This value is unknown to Sequest™.

This value must be a string. The value must be enclosed in quotes if it contains spaces.

Meaning: The value gives the name of a Sequest™ program that dqst shall use in the operation mode sequest4dqst. The file name may include a path. Otherwise the standard search path is searched for the given program.

One scenario may be this: Rename sequest27.exe to orig_sequest.exe. Rename dqst.exe to sequest27.exe. Set this value to orig_sequest.exe and set the environment variable DQST to "-Sequest=4". Then use your favorite tools.

This value can be entered using the environment variable SEQUEST4DQST, too. That value takes precedence over the parameter file's value.

Note that using the environment variable automatically selects "-Sequest=1". This can be overridden by further command line parameters.

`show_fragment_ions`

This value must be either 0 or 1.

Meaning: If set to 1 the fragment peaks of the top scored peptide are listed at the end of the output.

Sequest™'s output is wrong in many cases. dqst performs well both in compatibility mode and in standard mode.

In compatibility mode this list is block oriented, one block for each charge. Basically, each line of a block contains one element of each selected series. The element contains the m/z value and optionally a "+" for a matching peak.

In standard mode we have just one list. Each list entry has a separate output line. The line format is described here.

[SEQUEST_ENZYME_INFO]

This section must follow the section [SEQUEST] in the original Sequest™ program. The order has no influence for dqst.

The format in this section is very different from that of other sections. There is no value assignment. Instead, a numbered list is presented. The entries need not appear in ascending order and numbers may be skipped. However, the item with the number 0 should describe a missing enzyme.

Each line consists of 5 elements and optionally a comment. The elements are separated by at least one blank or tab.

element	meaning
order	This element gives a user selectable number of the digestion enzyme followed immediately by a dot.
name	This element is the name of the enzyme. Only one word is allowed.
direction	This element is either 0 or 1. 1 means that the enzyme scans the protein from N to the C-terminus. 0 describes an enzyme scanning from C to the N-terminus.
break after	This element is either a dash (-) or a list of amino acids after which a cleavage happens. A dash means: cleave after every amino acid.
not before	This element is either a dash (-) or a list of amino acids before which no cleavage can occur. A dash means: There is no amino acid that prevents a cleavage before it. Therefore a combination of "- -" for "break after" and "not before" means an unspecific cleavage.

Some examples:

0.      No_Enzyme               0       -               -
1.      Trypsin                 1       KR              P
2.      Trypsin(KRLNH)          1       KRLNH           -
3.      Chymotrypsin            1       FWYL            -
4.      Chymotrypsin(FWY)       1       FWY             P
5.      Clostripain             1       R               -
6.      Cyanogen_Bromide        1       M               -
7.      IodosoBenzoate          1       W               -
8.      Proline_Endopept        1       P               -
9.      Staph_Protease          1       E               -
10.     Trypsin_K               1       K               P
11.     Trypsin_R               1       R               P
12.     GluC/V8                 1       ED              -
13.     LysC                    1       K               P
14.     AspN                    0       D               P
15.     Elastase                1       ALIV            P
16.     Elastase/Tryp/Chymo     1       ALIVKRWFY       P

Trypsin scans a protein from the N-terminus and cleaves after every K or R, but not before a P.

AspN scans starting from the C-terminus and cleaves after a D, but not before a P.
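The enzyme rules and max_num_internal_cleavage_sites can be combined into a small digestion sketch. It parses one line of this section and enumerates the peptides for an enzyme with direction 1. Direction 0 is deliberately not modelled because its exact semantics are not spelled out here. All function names are hypothetical and the code is not taken from dqst.

```python
def parse_enzyme_line(line):
    """Split an enzyme line into its 5 elements; trailing text is ignored."""
    order, name, direction, break_after, not_before = line.split()[:5]
    return int(order.rstrip(".")), name, int(direction), break_after, not_before


def cleavage_points(protein, break_after, not_before):
    """Indices i such that the enzyme cuts between protein[i-1] and protein[i]."""
    points = []
    for i in range(1, len(protein)):
        after_ok = break_after == "-" or protein[i - 1] in break_after
        before_ok = not_before == "-" or protein[i] not in not_before
        if after_ok and before_ok:
            points.append(i)
    return points


def digest(protein, break_after, not_before, max_missed=0):
    """All peptides with 0 up to `max_missed` ignored cleavage sites."""
    bounds = [0] + cleavage_points(protein, break_after, not_before) + [len(protein)]
    last = len(bounds) - 1
    peptides = []
    for start in range(last):
        for end in range(start + 1, min(start + 1 + max_missed, last) + 1):
            peptides.append(protein[bounds[start]:bounds[end]])
    return peptides


_, _, _, after, before = parse_enzyme_line("1.  Trypsin  1  KR  P")
print(digest("AKRPGKC", after, before))
print(digest("AKRPGKC", after, before, max_missed=1))
```

In "AKRPGKC" Trypsin cuts after the first K and after the last K, but not after R because a P follows. Allowing one missed cleavage adds the peptides that span one ignored site.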
