FileFilter

Extracts portions of the data from an mzML, featureXML or consensusXML file.

Potential predecessor tools: any tool yielding output in mzML, featureXML or consensusXML format
    -> FileFilter ->
Potential successor tools: any tool that profits from reduced input

With this tool it is possible to extract m/z, retention time and intensity ranges from an input file and to write all data that lies within the given ranges to an output file.
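
The range options take the form '[min]:[max]', where either bound may be omitted. The following Python sketch illustrates how such a range could be interpreted and applied to (RT, m/z, intensity) triples. It is not the tool's implementation: the function names are hypothetical, and treating both bounds as inclusive is an assumption.

```python
import math


def parse_range(spec):
    """Parse a '[min]:[max]' string; an omitted bound means unbounded."""
    low_text, sep, high_text = spec.partition(":")
    if not sep:
        raise ValueError(f"expected '[min]:[max]', got {spec!r}")
    low = float(low_text) if low_text.strip() else -math.inf
    high = float(high_text) if high_text.strip() else math.inf
    if low > high:
        raise ValueError(f"min exceeds max in {spec!r}")
    return low, high


def in_range(value, bounds):
    """Assumption: both bounds are inclusive."""
    low, high = bounds
    return low <= value <= high


def filter_peaks(peaks, rt=":", mz=":", intensity=":"):
    """Keep (rt, mz, intensity) triples that lie inside all three ranges."""
    rt_b, mz_b, int_b = parse_range(rt), parse_range(mz), parse_range(intensity)
    return [
        p for p in peaks
        if in_range(p[0], rt_b) and in_range(p[1], mz_b) and in_range(p[2], int_b)
    ]


if __name__ == "__main__":
    peaks = [(100.0, 400.0, 10.0), (200.0, 500.0, 50.0), (300.0, 900.0, 5.0)]
    print(filter_peaks(peaks, rt="150:", mz=":600"))
```

The default ':' leaves a dimension unrestricted, which matches the defaults listed for 'rt', 'mz' and 'int'.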

Depending on the input file type, additional specific operations are possible:

The priority of the id-flags is (decreasing order): remove_annotated_features / remove_unannotated_features -> remove_clashes -> keep_best_score_id -> sequences_whitelist / accessions_whitelist
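
To see why the priority order matters, the following Python sketch applies the id-flags to toy features in the stated order. It is an illustration only: the data layout, the function name, the rule that a higher score is better, and substring matching for the sequence whitelist are all assumptions, not the tool's actual behaviour.

```python
def apply_id_filters(features, *, remove_annotated=False, remove_unannotated=False,
                     remove_clashes=False, keep_best_score_id=False,
                     sequences_whitelist=(), accessions_whitelist=()):
    """Apply the id-flags to toy features in the documented priority order."""
    kept = []
    for feature in features:
        ids = list(feature["ids"])
        # 1. remove_annotated_features / remove_unannotated_features
        if remove_annotated and ids:
            continue
        if remove_unannotated and not ids:
            continue
        # 2. remove_clashes: different sequences mapped to one feature
        if remove_clashes and len({i["sequence"] for i in ids}) > 1:
            continue
        # 3. keep_best_score_id (assumption: higher score is better)
        if keep_best_score_id and ids:
            ids = [max(ids, key=lambda i: i["score"])]
        # 4. sequences_whitelist / accessions_whitelist
        if sequences_whitelist and not any(
                w in i["sequence"] for i in ids for w in sequences_whitelist):
            continue
        if accessions_whitelist and not any(
                a in i["accessions"] for i in ids for a in accessions_whitelist):
            continue
        kept.append({**feature, "ids": ids})
    return kept


if __name__ == "__main__":
    features = [
        {"name": "f1", "ids": [
            {"sequence": "LYSNLVER", "score": 0.9, "accessions": ["sp|P02662|CASA1_BOVIN"]},
            {"sequence": "PEPTIDEK", "score": 0.5, "accessions": ["X"]}]},
        {"name": "f2", "ids": []},
    ]
    result = apply_id_filters(features, keep_best_score_id=True,
                              sequences_whitelist=["LYSNLVER"])
    print([f["name"] for f in result])
```

In this sketch, a feature carrying two different sequences is dropped by remove_clashes before keep_best_score_id could reduce it to a single id, so enabling both flags gives a different result than reversing their order would.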

MS2 and higher spectra can be filtered according to precursor m/z (see 'pc_mz'). This flag can be combined with the 'rt' range to filter precursors by RT and m/z. If you want to extract an MS1 region with untouched MS2 spectra included, you will need to split the dataset by MS level, use the 'mz' option for the MS1 data and the 'pc_mz' option for the MS2 data, and then merge the results again. RT can be filtered at any step.
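
The split-and-filter part of this workflow can be sketched as two FileFilter calls. The Python snippet below only builds and prints the command lines; it uses only options documented on this page ('-in', '-out', '-rt', '-mz', '-pc_mz', '-peak_options:level'). The function name and the output file names are hypothetical, and the final merge is done by a separate tool that is not shown here.

```python
import shlex


def build_split_filter_commands(in_file, mz_range, rt_range=":"):
    """Build one FileFilter call per MS level (output names are hypothetical)."""
    ms1_cmd = [
        "FileFilter", "-in", in_file, "-out", "ms1_filtered.mzML",
        "-peak_options:level", "1",  # MS1 only
        "-mz", mz_range,             # removes peaks outside the m/z range
        "-rt", rt_range,
    ]
    ms2_cmd = [
        "FileFilter", "-in", in_file, "-out", "ms2_filtered.mzML",
        "-peak_options:level", "2",  # MS2 only
        "-pc_mz", mz_range,          # selects by precursor m/z, peaks stay untouched
        "-rt", rt_range,
    ]
    return ms1_cmd, ms2_cmd


if __name__ == "__main__":
    for cmd in build_split_filter_commands("input.mzML", "400:600", "1000:2000"):
        print(shlex.join(cmd))
```

Keeping 'mz' out of the MS2 call is the point of the split: 'mz' applies to all MS levels and would cut peaks out of the fragment spectra.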

The command line parameters of this tool are:

FileFilter -- Extracts or manipulates portions of data from peak, feature or consensus-feature files.
Version: 1.11.1 Nov 14 2013, 11:18:15, Revision: 11976

Usage:
  FileFilter <options>

This tool has algorithm parameters that are not shown here! Please check the ini file for a detailed description
or use the --helphelp option.

Options (mandatory options marked with '*'):
  -in <file>*                                         Input file  (valid formats: 'mzML', 'featureXML', 'cons
                                                      ensusXML')
  -in_type <type>                                     Input file type -- default: determined from file extens
                                                      ion or content
                                                      (valid: 'mzML', 'featureXML', 'consensusXML')
  -out <file>*                                        Output file (valid formats: 'mzML', 'featureXML', 'cons
                                                      ensusXML')
  -out_type <type>                                    Output file type -- default: determined from file exten
                                                      sion or content
                                                      (valid: 'mzML', 'featureXML', 'consensusXML')
  -rt [min]:[max]                                     Retention time range to extract (default: ':')
  -mz [min]:[max]                                     M/z range to extract (applies to ALL ms levels!) (defau
                                                      lt: ':')
  -pc_mz [min]:[max]                                  MSn (n>=2) precursor filtering according to their m/z 
                                                      value. Do not use this flag in conjunction with 'mz',
                                                      unless you want to actually remove peaks in spectra
                                                      (see 'mz'). RT filtering is covered by 'rt' and compati
                                                      ble with this flag. (default: ':')
  -int [min]:[max]                                    Intensity range to extract (default: ':')
  -sort                                               Sorts the output according to RT and m/z.

Peak data options:
  -peak_options:sn <s/n ratio>                        Write peaks with S/N > 'sn' values only (default: '0')
  -peak_options:rm_pc_charge i j ...                  Remove MS(2) spectra with these precursor charges. All 
                                                      spectra without precursor are kept!
  -peak_options:level i j ...                         MS levels to extract (default: '[1 2 3]')
  -peak_options:sort_peaks                            Sorts the peaks according to m/z.
  -peak_options:no_chromatograms                      No conversion to space-saving real chromatograms, e.g. 
                                                      from SRM scans.
  -peak_options:remove_chromatograms                  Removes chromatograms stored in a file.
  -peak_options:mz_precision 32 or 64                 Store base64 encoded m/z data using 32 or 64 bit precis
                                                      ion. (default: '64' valid: '32', '64')
  -peak_options:int_precision 32 or 64                Store base64 encoded intensity data using 32 or 64 bit 
                                                      precision. (default: '32' valid: '32', '64')

Remove spectra or select spectra (removing all others) with certain properties.:
  -spectra:remove_zoom                                Remove zoom (enhanced resolution) scans
  -spectra:remove_mode <mode>                         Remove scans by scan mode
                                                      (valid: 'Unknown', 'MassSpectrum', 'MS1Spectrum', 'MS
                                                      nSpectrum', 'SelectedIonMonitoring', 'SelectedReactionM
                                                      onitoring', 'ConsecutiveReactionMonitoring', 'ConstantN
                                                      eutralGain', 'ConstantNeutralLoss', 'Precursor', 'Enhan
                                                      cedMultiplyCharged', 'TimeDelayedFragmentation', 'Elect
                                                      romagneticRadiation', 'Emission', 'Absorbtion')

                                                      

Remove spectra or select spectra (removing all others) with certain properties.:
  -spectra:remove_activation <activation>             Remove MSn scans where any of its precursors features 
                                                      a certain activation method
                                                      (valid: 'Collision-induced dissociation', 'Post-sourc
                                                      e decay', 'Plasma desorption', 'Surface-induced dissoci
                                                      ation', 'Blackbody infrared radiative dissociation',
                                                      'Electron capture dissociation', 'Infrared multiphoton
                                                      dissociation', 'Sustained off-resonance irradiation',
                                                      'High-energy collision-induced dissociation', 'Low-ener
                                                      ...
                                                      ion')
  -spectra:remove_collision_energy [min]:[max]        Remove MSn scans with a collision energy in the given 
                                                      interval. (default: ':')
  -spectra:remove_isolation_window_width [min]:[max]  Remove MSn scans whose isolation window width is in
                                                      the given interval. (default: ':')

                                                      

Remove spectra or select spectra (removing all others) with certain properties.:
  -spectra:select_zoom                                Select zoom (enhanced resolution) scans
  -spectra:select_mode <mode>                         Selects scans by scan mode
                                                      (valid: 'Unknown', 'MassSpectrum', 'MS1Spectrum', 'MS
                                                      nSpectrum', 'SelectedIonMonitoring', 'SelectedReactionM
                                                      onitoring', 'ConsecutiveReactionMonitoring', 'ConstantN
                                                      eutralGain', 'ConstantNeutralLoss', 'Precursor', 'Enhan
                                                      cedMultiplyCharged', 'TimeDelayedFragmentation', 'Elect
                                                      romagneticRadiation', 'Emission', 'Absorbtion')
  -spectra:select_activation <activation>             Select MSn scans where any of its precursors features 
                                                      a certain activation method
                                                      (valid: 'Collision-induced dissociation', 'Post-sourc
                                                      e decay', 'Plasma desorption', 'Surface-induced dissoci
                                                      ation', 'Blackbody infrared radiative dissociation',
                                                      'Electron capture dissociation', 'Infrared multiphoton
                                                      dissociation', 'Sustained off-resonance irradiation',
                                                      'High-energy collision-induced dissociation', 'Low-ener
                                                      ...
                                                      ion')
  -spectra:select_collision_energy [min]:[max]        Select MSn scans with a collision energy in the given 
                                                      interval. (default: ':')
  -spectra:select_isolation_window_width [min]:[max]  Select MSn scans whose isolation window width is in
                                                      the given interval. (default: ':')

                                                      

Feature data options:
  -feature:q [min]:[max]                              Overall quality range to extract [0:1] (default: ':')

                                                      

Consensus feature data options:
  -consensus:map i j ...                              Maps to be extracted from a consensus
  -consensus:map_and                                  Consensus features are kept only if they contain exactl
                                                      y one feature from each map (as given above in 'map').

Black or white listing of MS2 spectra by consensus features.:
  -consensus:blackorwhitelist:blacklist               True: remove matched MS2. False: retain matched MS2 
                                                      spectra. Other levels are kept. (default: 'true' valid:
                                                      'false', 'true')
  -consensus:blackorwhitelist:file <file>             Input file containing consensus features whose correspo
                                                      nding MS2 spectra should be removed from the mzML file!
                                                      Matching tolerances are taken from 'consensus:blackorw
                                                      hitelist:rt' and 'consensus:blackorwhitelist:mz' option
                                                      s.
                                                      If consensus:blackorwhitelist:maps is specified, only
                                                      these will be used.
                                                      (valid formats: 'consensusXML')
  -consensus:blackorwhitelist:maps i j ...            Maps used for black/white list filtering.
  -consensus:blackorwhitelist:rt tolerance            Retention tolerance [s] for precursor to consensus feat
                                                      ure position (default: '60' min: '0')
  -consensus:blackorwhitelist:mz tolerance            M/z tolerance [Th] for precursor to consensus feature 
                                                      position (default: '0.01' min: '0')
  -consensus:blackorwhitelist:use_ppm_tolerance       If ppm tolerance should be used. Otherwise Da are used.
                                                      (default: 'false' valid: 'false', 'true')

                                                      

Feature & Consensus data options:
  -f_and_c:charge [min]:[max]                         Charge range to extract (default: ':')
  -f_and_c:size [min]:[max]                           Size range to extract (default: ':')
  -f_and_c:remove_meta <name> 'lt|eq|gt' <value>      Expects a 3-tuple (=3 entries in the list), i.e. <name>
                                                      'lt|eq|gt' <value>; the first is the name of meta valu
                                                      e, followed by the comparison operator (equal, less or
                                                      greater) and the value to compare to. All comparisons
                                                      are done after converting the given value to the corres
                                                      ponding data value type of the meta value (for lists,
                                                      this simply compares length, not content!)!

                                                      

ID options. The Priority of the id-flags is: remove_annotated_features / remove_unannotated_features -> remov
e_clashes -> keep_best_score_id -> sequences_whitelist / accessions_whitelist.:
  -id:keep_best_score_id                              In case of multiple peptide identifications, keep only 
                                                      the id with best score
  -id:sequences_whitelist <sequence>                  Keep only features with white listed sequences, e.g. 
                                                      LYSNLVER or the modification (Oxidation)
  -id:accessions_whitelist <accessions>               Keep only features with white listed accessions, e.g. 
                                                      sp|P02662|CASA1_BOVIN
  -id:remove_annotated_features                       Remove features with annotations
  -id:remove_unannotated_features                     Remove features without annotations
  -id:remove_unassigned_ids                           Remove unassigned peptide identifications
  -id:blacklist <file>                                Input file containing MS2 identifications whose corresp
                                                      onding MS2 spectra should be removed from the mzML file
                                                      !
                                                      Matching tolerances are taken from 'id:rt' and 'id:mz'
                                                      options.
                                                      This tool will require all IDs to be matched to an MS2
                                                      spectrum, and quit with error otherwise. Use 'id:black
                                                      list_imperfect' to allow for mismatches. (valid formats
                                                      : 'idXML')
  -id:rt tolerance                                    Retention tolerance [s] for precursor to id position 
                                                      (default: '0.1' min: '0')
  -id:mz tolerance                                    M/z tolerance [Th] for precursor to id position (defaul
                                                      t: '0.001' min: '0')
  -id:blacklist_imperfect                             Allow for mismatching precursor positions (see 'id:blac
                                                      klist')

                                                      
                                                      
Common TOPP options:
  -ini <file>                                         Use the given TOPP INI file
  -threads <n>                                        Sets the number of threads allowed to be used by the 
                                                      TOPP tool (default: '1')
  -write_ini <file>                                   Writes the default configuration file
  --help                                              Shows options
  --helphelp                                          Shows all options (including advanced)

The following configuration subsections are valid:
 - algorithm   S/N algorithm section

You can write an example INI file using the '-write_ini' option.
Documentation of subsection parameters can be found in the doxygen documentation or the INIFileEditor.
Have a look at the OpenMS documentation for more information.
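
The 'consensus:blackorwhitelist' options above combine an RT tolerance in seconds with an m/z tolerance that is read either as an absolute value or as ppm. The following Python sketch shows one way such matching and the black/white list switch could work. It is not the tool's code: the function names and data layout are hypothetical, and computing the ppm window relative to the feature m/z and treating the tolerances as inclusive are assumptions.

```python
def precursor_matches(precursor_rt, precursor_mz, feature_rt, feature_mz,
                      rt_tol=60.0, mz_tol=0.01, use_ppm=False):
    """True if the precursor lies within both tolerances of the feature position."""
    # Assumption: a ppm tolerance is taken relative to the feature m/z.
    mz_delta = feature_mz * mz_tol * 1e-6 if use_ppm else mz_tol
    return (abs(precursor_rt - feature_rt) <= rt_tol
            and abs(precursor_mz - feature_mz) <= mz_delta)


def filter_ms2(spectra, features, blacklist=True, **tolerances):
    """blacklist=True drops matched MS2 spectra; False keeps only matched ones.

    Spectra of other MS levels are always kept.
    """
    kept = []
    for spectrum in spectra:
        if spectrum["ms_level"] != 2:
            kept.append(spectrum)
            continue
        matched = any(
            precursor_matches(spectrum["rt"], spectrum["precursor_mz"],
                              f["rt"], f["mz"], **tolerances)
            for f in features
        )
        if matched != blacklist:
            kept.append(spectrum)
    return kept


if __name__ == "__main__":
    features = [{"rt": 1030.0, "mz": 500.0}]
    spectra = [
        {"ms_level": 1, "rt": 1000.0},
        {"ms_level": 2, "rt": 1000.0, "precursor_mz": 500.005},
        {"ms_level": 2, "rt": 2000.0, "precursor_mz": 700.0},
    ]
    print(len(filter_ms2(spectra, features, blacklist=True)))
    print(len(filter_ms2(spectra, features, blacklist=False)))
```

At m/z 500, a tolerance of 0.01 Th corresponds to 20 ppm, so switching 'use_ppm_tolerance' without adjusting the 'mz' value changes the window considerably.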

INI file documentation of this tool:

Legend:
required parameter
advanced parameter
+FileFilter: Extracts or manipulates portions of data from peak, feature or consensus-feature files.
  version: Version of the tool that generated this parameters file. (default: '1.11.1')
++1: Instance '1' section for 'FileFilter'
  in: input file (tags: input file; valid: '*.mzML', '*.featureXML', '*.consensusXML')
  in_type: input file type -- default: determined from file extension or content (valid: 'mzML', 'featureXML', 'consensusXML')
  out: output file (tags: output file; valid: '*.mzML', '*.featureXML', '*.consensusXML')
  out_type: output file type -- default: determined from file extension or content (valid: 'mzML', 'featureXML', 'consensusXML')
  rt: retention time range to extract (default: ':')
  mz: m/z range to extract (applies to ALL ms levels!) (default: ':')
  pc_mz: MSn (n>=2) precursor filtering according to their m/z value. Do not use this flag in conjunction with 'mz', unless you want to actually remove peaks in spectra (see 'mz'). RT filtering is covered by 'rt' and compatible with this flag. (default: ':')
  int: intensity range to extract (default: ':')
  sort: sorts the output according to RT and m/z. (default: 'false' valid: 'true', 'false')
  log: Name of log file (created only when specified)
  debug: Sets the debug level (default: '0')
  threads: Sets the number of threads allowed to be used by the TOPP tool (default: '1')
  no_progress: Disables progress logging to command line (default: 'false' valid: 'true', 'false')
  test: Enables the test mode (needed for internal use only) (default: 'false' valid: 'true', 'false')
+++peak_options: Peak data options
  sn: write peaks with S/N > 'sn' values only (default: '0')
  rm_pc_charge: Remove MS(2) spectra with these precursor charges. All spectra without precursor are kept! (default: '[]')
  level: MS levels to extract (default: '[1, 2, 3]')
  sort_peaks: sorts the peaks according to m/z. (default: 'false' valid: 'true', 'false')
  no_chromatograms: No conversion to space-saving real chromatograms, e.g. from SRM scans. (default: 'false' valid: 'true', 'false')
  remove_chromatograms: Removes chromatograms stored in a file. (default: 'false' valid: 'true', 'false')
  mz_precision: Store base64 encoded m/z data using 32 or 64 bit precision. (default: '64' valid: '32', '64')
  int_precision: Store base64 encoded intensity data using 32 or 64 bit precision. (default: '32' valid: '32', '64')
+++spectra: Remove spectra or select spectra (removing all others) with certain properties.
  remove_zoom: Remove zoom (enhanced resolution) scans (default: 'false' valid: 'true', 'false')
  remove_mode: Remove scans by scan mode (valid: 'Unknown', 'MassSpectrum', 'MS1Spectrum', 'MSnSpectrum', 'SelectedIonMonitoring', 'SelectedReactionMonitoring', 'ConsecutiveReactionMonitoring', 'ConstantNeutralGain', 'ConstantNeutralLoss', 'Precursor', 'EnhancedMultiplyCharged', 'TimeDelayedFragmentation', 'ElectromagneticRadiation', 'Emission', 'Absorbtion')
  remove_activation: Remove MSn scans where any of its precursors features a certain activation method (valid: 'Collision-induced dissociation', 'Post-source decay', 'Plasma desorption', 'Surface-induced dissociation', 'Blackbody infrared radiative dissociation', 'Electron capture dissociation', 'Infrared multiphoton dissociation', 'Sustained off-resonance irradiation', 'High-energy collision-induced dissociation', 'Low-energy collision-induced dissociation', 'Photodissociation', 'Electron transfer dissociation', 'Pulsed q dissociation')
  remove_collision_energy: Remove MSn scans with a collision energy in the given interval. (default: ':')
  remove_isolation_window_width: Remove MSn scans whose isolation window width is in the given interval. (default: ':')
  select_zoom: Select zoom (enhanced resolution) scans (default: 'false' valid: 'true', 'false')
  select_mode: Selects scans by scan mode (valid: 'Unknown', 'MassSpectrum', 'MS1Spectrum', 'MSnSpectrum', 'SelectedIonMonitoring', 'SelectedReactionMonitoring', 'ConsecutiveReactionMonitoring', 'ConstantNeutralGain', 'ConstantNeutralLoss', 'Precursor', 'EnhancedMultiplyCharged', 'TimeDelayedFragmentation', 'ElectromagneticRadiation', 'Emission', 'Absorbtion')
  select_activation: Select MSn scans where any of its precursors features a certain activation method (valid: 'Collision-induced dissociation', 'Post-source decay', 'Plasma desorption', 'Surface-induced dissociation', 'Blackbody infrared radiative dissociation', 'Electron capture dissociation', 'Infrared multiphoton dissociation', 'Sustained off-resonance irradiation', 'High-energy collision-induced dissociation', 'Low-energy collision-induced dissociation', 'Photodissociation', 'Electron transfer dissociation', 'Pulsed q dissociation')
  select_collision_energy: Select MSn scans with a collision energy in the given interval. (default: ':')
  select_isolation_window_width: Select MSn scans whose isolation window width is in the given interval. (default: ':')
+++feature: Feature data options
  q: Overall quality range to extract [0:1] (default: ':')
+++consensus: Consensus feature data options
  map: maps to be extracted from a consensus (default: '[]')
  map_and: Consensus features are kept only if they contain exactly one feature from each map (as given above in 'map'). (default: 'false' valid: 'true', 'false')
++++blackorwhitelist: Black or white listing of MS2 spectra by consensus features.
  blacklist: True: remove matched MS2. False: retain matched MS2 spectra. Other levels are kept. (default: 'true' valid: 'false', 'true')
  file: Input file containing consensus features whose corresponding MS2 spectra should be removed from the mzML file! Matching tolerances are taken from 'consensus:blackorwhitelist:rt' and 'consensus:blackorwhitelist:mz' options. If consensus:blackorwhitelist:maps is specified, only these will be used. (tags: input file; valid: '*.consensusXML')
  maps: maps used for black/white list filtering. (default: '[]')
  rt: retention tolerance [s] for precursor to consensus feature position (default: '60' min: '0')
  mz: m/z tolerance [Th] for precursor to consensus feature position (default: '0.01' min: '0')
  use_ppm_tolerance: If ppm tolerance should be used. Otherwise Da are used. (default: 'false' valid: 'false', 'true')
+++f_and_c: Feature & Consensus data options
  charge: charge range to extract (default: ':')
  size: size range to extract (default: ':')
  remove_meta: Expects a 3-tuple (=3 entries in the list), i.e. <name> 'lt|eq|gt' <value>; the first is the name of meta value, followed by the comparison operator (equal, less or greater) and the value to compare to. All comparisons are done after converting the given value to the corresponding data value type of the meta value (for lists, this simply compares length, not content!)! (default: '[]')
+++id: ID options. The Priority of the id-flags is: remove_annotated_features / remove_unannotated_features -> remove_clashes -> keep_best_score_id -> sequences_whitelist / accessions_whitelist.
  remove_clashes: remove features with id clashes (different sequences mapped to one feature) (default: 'false' valid: 'true', 'false')
  keep_best_score_id: in case of multiple peptide identifications, keep only the id with best score (default: 'false' valid: 'true', 'false')
  sequences_whitelist: keep only features with white listed sequences, e.g. LYSNLVER or the modification (Oxidation) (default: '[]')
  accessions_whitelist: keep only features with white listed accessions, e.g. sp|P02662|CASA1_BOVIN (default: '[]')
  remove_annotated_features: remove features with annotations (default: 'false' valid: 'true', 'false')
  remove_unannotated_features: remove features without annotations (default: 'false' valid: 'true', 'false')
  remove_unassigned_ids: remove unassigned peptide identifications (default: 'false' valid: 'true', 'false')
  blacklist: Input file containing MS2 identifications whose corresponding MS2 spectra should be removed from the mzML file! Matching tolerances are taken from 'id:rt' and 'id:mz' options. This tool will require all IDs to be matched to an MS2 spectrum, and quit with error otherwise. Use 'id:blacklist_imperfect' to allow for mismatches. (tags: input file; valid: '*.idXML')
  rt: retention tolerance [s] for precursor to id position (default: '0.1' min: '0')
  mz: m/z tolerance [Th] for precursor to id position (default: '0.001' min: '0')
  blacklist_imperfect: Allow for mismatching precursor positions (see 'id:blacklist') (default: 'false' valid: 'true', 'false')
+++algorithm: S/N algorithm section
++++SignalToNoise
  max_intensity: maximal intensity considered for histogram construction. By default, it will be calculated automatically (see auto_mode). Only provide this parameter if you know what you are doing (and change 'auto_mode' to '-1')! All intensities EQUAL/ABOVE 'max_intensity' will be added to the LAST histogram bin. If you choose 'max_intensity' too small, the noise estimate might be too small as well. If chosen too big, the bins become quite large (which you could counter by increasing 'bin_count', which increases runtime). In general, the Median-S/N estimator is more robust to a manual max_intensity than the MeanIterative-S/N. (default: '-1' min: '-1')
  auto_max_stdev_factor: parameter for 'max_intensity' estimation (if 'auto_mode' == 0): mean + 'auto_max_stdev_factor' * stdev (default: '3' min: '0' max: '999')
  auto_max_percentile: parameter for 'max_intensity' estimation (if 'auto_mode' == 1): auto_max_percentile th percentile (default: '95' min: '0' max: '100')
  auto_mode: method to use to determine maximal intensity: -1 --> use 'max_intensity'; 0 --> 'auto_max_stdev_factor' method (default); 1 --> 'auto_max_percentile' method (default: '0' min: '-1' max: '1')
  win_len: window length in Thomson (default: '200' min: '1')
  bin_count: number of bins for intensity values (default: '30' min: '3')
  min_required_elements: minimum number of elements required in a window (otherwise it is considered sparse) (default: '10' min: '1')
  noise_for_empty_window: noise value used for sparse windows (default: '1e+20')

For the parameters of the S/N algorithm section see the class documentation there:
peak_options:sn
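
The three 'auto_mode' settings of the S/N algorithm section can be illustrated with a small Python sketch. It follows the parameter descriptions only and is not the library's estimator: the function names are hypothetical, and the use of the population standard deviation, a nearest-rank percentile and a histogram starting at zero are assumptions.

```python
import statistics


def estimate_max_intensity(intensities, auto_mode=0, max_intensity=-1.0,
                           auto_max_stdev_factor=3.0, auto_max_percentile=95):
    """Pick the upper histogram limit according to 'auto_mode'."""
    if auto_mode == -1:
        # Manual mode: use the given 'max_intensity' as is.
        return max_intensity
    if auto_mode == 0:
        # mean + 'auto_max_stdev_factor' * stdev (assumption: population stdev)
        return (statistics.fmean(intensities)
                + auto_max_stdev_factor * statistics.pstdev(intensities))
    if auto_mode == 1:
        # 'auto_max_percentile' th percentile (assumption: nearest-rank style)
        ordered = sorted(intensities)
        index = min(len(ordered) - 1, int(len(ordered) * auto_max_percentile / 100))
        return ordered[index]
    raise ValueError(f"auto_mode must be -1, 0 or 1, got {auto_mode}")


def histogram_bin(intensity, max_intensity, bin_count=30):
    """Map a non-negative intensity to a bin; values >= max land in the last bin."""
    if intensity >= max_intensity:
        return bin_count - 1
    return int(intensity / max_intensity * bin_count)


if __name__ == "__main__":
    data = [1.0, 2.0, 3.0, 4.0, 5.0]
    print(estimate_max_intensity(data, auto_mode=0))
    print(estimate_max_intensity(data, auto_mode=1))
    print(histogram_bin(150.0, 100.0))
```

A larger 'max_intensity' widens every bin for a fixed 'bin_count', which is the trade-off the 'max_intensity' description warns about.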

Todo:
add tests for selecting modes (port remove modes) (Andreas)

OpenMS / TOPP release 1.11.1 Documentation generated on Thu Nov 14 2013 11:19:24 using doxygen 1.8.5