IDFilter

Filters protein identification engine results by different criteria.

Potential predecessor tools          -> IDFilter ->   Potential successor tools
MascotAdapter (or other ID engines)                   PeptideIndexer
IDFileConverter                                       ProteinInference
FalseDiscoveryRate                                    IDMapper
ConsensusID

This tool filters the identifications found by a peptide/protein identification tool such as Mascot. Different filters can be applied; each one is described in the parameter listing below.

To enable any of the filters, just change their default value. All active filters will be applied in order.
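
As a rough illustration of "enabling a filter by changing its default", the following Python sketch assembles an IDFilter command line in which only the explicitly requested options appear, so every other filter keeps its default and stays inactive. The helper name 'build_idfilter_command' and the file names are hypothetical; the option names are taken from the parameter listing of this tool. The sketch only builds the argument list and does not run anything.

```python
def build_idfilter_command(in_file, out_file, filters=None, flags=()):
    """Assemble an IDFilter argument list (hypothetical helper, not part of OpenMS).

    filters: mapping of option name (without the leading '-') to its value,
             e.g. {"score:pep": 0.05}. Options not listed keep their defaults.
    flags:   names of switches that take no value, e.g. ("unique",).
    """
    cmd = ["IDFilter", "-in", in_file, "-out", out_file]
    for name, value in (filters or {}).items():
        cmd += ["-" + name, str(value)]
    for name in flags:
        cmd.append("-" + name)
    return cmd


# Example: peptides of length >= 6, at most one hit per spectrum,
# duplicate hits within a PSM collapsed.
cmd = build_idfilter_command(
    "input.idXML",      # hypothetical file name
    "filtered.idXML",   # hypothetical file name
    filters={"min_length": 6, "best:n_peptide_hits": 1},
    flags=("unique",),
)
print(" ".join(cmd))
```

If IDFilter is installed and on the PATH, a list like this could be handed to Python's subprocess.run; that step is left out here so the sketch has no external dependency.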

The command line parameters of this tool are:

IDFilter -- Filters results from protein or peptide identification engines based on different criteria.
Version: 1.11.1 Nov 14 2013, 11:18:15, Revision: 11976

Usage:
  IDFilter <options>

Options (mandatory options marked with '*'):
  -in <file>*                       Input file  (valid formats: 'idXML')
  -out <file>*                      Output file  (valid formats: 'idXML')

Filtering by peptide/protein score. To enable any of the filters below, just change their default value. All 
active filters will be applied in order.:
  -score:pep <score>                The score which should be reached by a peptide hit to be kept. The score 
                                    is dependent on the most recent(!) preprocessing - it could be Mascot
                                    scores (if a MascotAdapter was applied before), or an FDR (if
                                    FalseDiscoveryRate was applied before), etc. (default: '0')
  -score:prot <score>               The score which should be reached by a protein hit to be kept. (default: 
                                    '0')

Filtering by significance threshold:
  -thresh:pep <fraction>            Keep a peptide hit only if its score is above this fraction of the
                                    peptide significance threshold. (default: '0')
  -thresh:prot <fraction>           Keep a protein hit only if its score is above this fraction of the
                                    protein significance threshold. (default: '0')

Filtering by whitelisting (only instances also present in a whitelist file can pass):
  -whitelist:proteins <file>        Filename of a FASTA file containing protein sequences.
                                    All peptides that are not a substring of a sequence in this file are
                                    removed.
                                    All proteins whose accession is not present in this file are removed.
                                    (valid formats: 'fasta')
  -whitelist:by_seq_only            Match peptides with FASTA file by sequence instead of accession and
                                    disable protein filtering.

Filtering by blacklisting (only instances not present in a blacklist file can pass):
  -blacklist:peptides <file>        Peptides having the same sequence as any peptide in this file will be 
                                    filtered out
                                    (valid formats: 'idXML')

Filtering by RT predicted by 'RTPredict':
  -rt:p_value <float>               Retention time filtering by the p-value predicted by RTPredict. (default:
                                    '0' min: '0' max: '1')
  -rt:p_value_1st_dim <float>       Retention time filtering by the p-value predicted by RTPredict for first 
                                    dimension. (default: '0' min: '0' max: '1')

Filtering by mz:
  -mz:error <float>                 Filtering by deviation to theoretical mass (disabled for negative
                                    values). (default: '-1')
  -mz:unit <String>                 Absolute or relative error. (default: 'ppm' valid: 'Da', 'ppm')

Filtering best hits per spectrum (for peptides) or from proteins:
  -best:n_peptide_hits <integer>    Keep only the 'n' highest scoring peptide hits per spectrum (for n>0). 
                                    (default: '0' min: '0')
  -best:n_protein_hits <integer>    Keep only the 'n' highest scoring protein hits (for n>0). (default: '0' 
                                    min: '0')
  -best:strict                      Keep only the highest scoring peptide hit.
                                    Similar to n_peptide_hits=1, but if there are two or more highest
                                    scoring hits, none are kept.

  -min_length <integer>             Keep only peptide hits with a length greater or equal this value. Value 
                                    0 will have no filter effect. (default: '0' min: '0')
  -max_length <integer>             Keep only peptide hits with a length less or equal this value. Value 0 
                                    will have no filter effect. Value is overridden by min_length, i.e. if
                                    max_length < min_length, max_length will be ignored. (default: '0' max:
                                    '0')
  -min_charge <integer>             Keep only peptide hits for tandem spectra with charge greater or equal 
                                    this value. (default: '1' min: '1')
  -var_mods                         Keep only peptide hits with variable modifications (fixed modifications 
                                    from SearchParameters will be ignored).
  -unique                           If a peptide hit occurs more than once per PSM, only one instance is
                                    kept.
  -unique_per_protein               Only peptides matching exactly one protein are kept. Remember that
                                    isoforms count as different proteins!
  -keep_unreferenced_protein_hits   Proteins not referenced by a peptide are retained in the idXML.
                                    
Common TOPP options:
  -ini <file>                       Use the given TOPP INI file
  -threads <n>                      Sets the number of threads allowed to be used by the TOPP tool (default: 
                                    '1')
  -write_ini <file>                 Writes the default configuration file
  --help                            Shows options
  --helphelp                        Shows all options (including advanced)
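
The interaction between -min_length and -max_length described in the help text above can be sketched in Python as follows. This is a minimal reading of the documented rules (0 disables a limit; max_length is ignored when it is smaller than min_length), not the OpenMS implementation. The function name is hypothetical, and the sketch assumes that peptide length is simply the number of residues in an unmodified sequence string.

```python
def passes_length_filter(sequence, min_length=0, max_length=0):
    """Return True if the peptide passes the documented length limits.

    Assumption: length is the number of characters in a plain
    (unmodified) sequence string.
    """
    n = len(sequence)
    # A min_length of 0 means "no lower limit".
    if min_length > 0 and n < min_length:
        return False
    # max_length is ignored when it is 0 or smaller than min_length.
    if max_length > 0 and max_length >= min_length and n > max_length:
        return False
    return True


peptides = ["PEPK", "PEPTIDEK", "LONGPEPTIDESEQUENCEK"]
kept = [p for p in peptides if passes_length_filter(p, min_length=6, max_length=10)]
print(kept)  # ['PEPTIDEK']
```

With min_length=6 and max_length=3 the upper limit would be dropped, so the 20-residue peptide in the example would pass as well.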

INI file documentation of this tool:

Legend: required parameters are marked with '*'.

IDFilter                            Filters results from protein or peptide identification engines based on different criteria.
  version                           Version of the tool that generated this parameters file. (default: '1.11.1')
  1                                 Instance '1' section for 'IDFilter'
    in*                             Input file (valid formats: 'idXML')
    out*                            Output file (valid formats: 'idXML')
    min_length                      Keep only peptide hits with a length greater or equal this value. Value 0 will have no filter effect. (default: '0' min: '0')
    max_length                      Keep only peptide hits with a length less or equal this value. Value 0 will have no filter effect. Value is overridden by min_length, i.e. if max_length < min_length, max_length will be ignored. (default: '0' max: '0')
    min_charge                      Keep only peptide hits for tandem spectra with charge greater or equal this value. (default: '1' min: '1')
    var_mods                        Keep only peptide hits with variable modifications (fixed modifications from SearchParameters will be ignored). (default: 'false' valid: 'true', 'false')
    unique                          If a peptide hit occurs more than once per PSM, only one instance is kept. (default: 'false' valid: 'true', 'false')
    unique_per_protein              Only peptides matching exactly one protein are kept. Remember that isoforms count as different proteins! (default: 'false' valid: 'true', 'false')
    keep_unreferenced_protein_hits  Proteins not referenced by a peptide are retained in the idXML. (default: 'false' valid: 'true', 'false')
    log                             Name of log file (created only when specified)
    debug                           Sets the debug level (default: '0')
    threads                         Sets the number of threads allowed to be used by the TOPP tool (default: '1')
    no_progress                     Disables progress logging to command line (default: 'false' valid: 'true', 'false')
    test                            Enables the test mode (needed for internal use only) (default: 'false' valid: 'true', 'false')
    score                           Filtering by peptide/protein score. To enable any of the filters below, just change their default value. All active filters will be applied in order.
      pep                           The score which should be reached by a peptide hit to be kept. The score is dependent on the most recent(!) preprocessing - it could be Mascot scores (if a MascotAdapter was applied before), or an FDR (if FalseDiscoveryRate was applied before), etc. (default: '0')
      prot                          The score which should be reached by a protein hit to be kept. (default: '0')
    thresh                          Filtering by significance threshold
      pep                           Keep a peptide hit only if its score is above this fraction of the peptide significance threshold. (default: '0')
      prot                          Keep a protein hit only if its score is above this fraction of the protein significance threshold. (default: '0')
    whitelist                       Filtering by whitelisting (only instances also present in a whitelist file can pass)
      proteins                      Filename of a FASTA file containing protein sequences.
                                    All peptides that are not a substring of a sequence in this file are removed.
                                    All proteins whose accession is not present in this file are removed.
                                    (valid formats: 'fasta')
      by_seq_only                   Match peptides with FASTA file by sequence instead of accession and disable protein filtering. (default: 'false' valid: 'true', 'false')
    blacklist                       Filtering by blacklisting (only instances not present in a blacklist file can pass)
      peptides                      Peptides having the same sequence as any peptide in this file will be filtered out. (valid formats: 'idXML')
    rt                              Filtering by RT predicted by 'RTPredict'
      p_value                       Retention time filtering by the p-value predicted by RTPredict. (default: '0' min: '0' max: '1')
      p_value_1st_dim               Retention time filtering by the p-value predicted by RTPredict for first dimension. (default: '0' min: '0' max: '1')
    mz                              Filtering by mz
      error                         Filtering by deviation to theoretical mass (disabled for negative values). (default: '-1')
      unit                          Absolute or relative error. (default: 'ppm' valid: 'Da', 'ppm')
    best                            Filtering best hits per spectrum (for peptides) or from proteins
      n_peptide_hits                Keep only the 'n' highest scoring peptide hits per spectrum (for n>0). (default: '0' min: '0')
      n_protein_hits                Keep only the 'n' highest scoring protein hits (for n>0). (default: '0' min: '0')
      strict                        Keep only the highest scoring peptide hit.
                                    Similar to n_peptide_hits=1, but if there are two or more highest scoring hits, none are kept.
                                    (default: 'false' valid: 'true', 'false')
n_to_m_peptide_hits: peptide hit rank range to extract
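
The difference between best:n_peptide_hits=1 and best:strict, as described in the parameter listing, can be sketched in Python like this. It is an illustration of the documented behaviour only, not the OpenMS code. The function names and the (sequence, score) tuples are hypothetical, and two things are assumptions: that a higher score is better, and that a tie under n_peptide_hits is resolved by keeping the first hit in input order.

```python
def best_hits(hits, n=1, higher_is_better=True):
    """Keep the n top-scoring (sequence, score) hits of one spectrum.

    Assumption: ties are resolved by input order (sorted() is stable).
    """
    ranked = sorted(hits, key=lambda h: h[1], reverse=higher_is_better)
    return ranked[:n]


def best_hit_strict(hits, higher_is_better=True):
    """Keep the single best hit, or nothing if the top score is shared."""
    ranked = sorted(hits, key=lambda h: h[1], reverse=higher_is_better)
    if not ranked:
        return []
    # Two or more hits share the best score: keep none of them.
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return []
    return ranked[:1]


tied = [("AAK", 42.0), ("CCK", 42.0), ("DDK", 17.5)]
clear = [("AAK", 42.0), ("DDK", 17.5)]

print(best_hits(tied, n=1))    # [('AAK', 42.0)]
print(best_hit_strict(tied))   # []
print(best_hit_strict(clear))  # [('AAK', 42.0)]
```

For scores where lower is better (for example an FDR after FalseDiscoveryRate), the sketch would be called with higher_is_better=False.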

OpenMS / TOPP release 1.11.1 Documentation generated on Thu Nov 14 2013 11:19:24 using doxygen 1.8.5