Tool to estimate the probability that a peptide hit is incorrectly assigned.
potential predecessor tools | IDPosteriorErrorProbability | potential successor tools |
MascotAdapter (or other ID engines) | | ConsensusID |
By default an estimation is performed using the (inverse) Gumbel distribution for incorrectly assigned sequences and a Gaussian distribution for correctly assigned sequences. The probabilities are calculated by using Bayes' law, similar to PeptideProphet. Alternatively, a second Gaussian distribution can be used for incorrectly assigned sequences. At the moment, IDPosteriorErrorProbability is able to handle X!Tandem, Mascot, MyriMatch and OMSSA scores.
No target/decoy information needs to be provided, since the model fits are done on the mixed distribution.
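The Bayes step described above can be illustrated with a minimal sketch. It assumes the standard Gumbel density form for incorrect assignments and a Gaussian for correct ones; the function names, the parameter values and the prior are hypothetical and hand-picked for illustration, not fitted values and not the tool's actual implementation.

```python
import math


def gumbel_pdf(x, mu, beta):
    """Gumbel density with location mu and scale beta (assumed form)."""
    z = (x - mu) / beta
    if z < -700.0:
        # exp(-z) would overflow; the density is effectively zero here.
        return 0.0
    return math.exp(-z - math.exp(-z)) / beta


def gauss_pdf(x, mu, sigma):
    """Gaussian density with mean mu and standard deviation sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def posterior_error_probability(score, incorrect, correct, prior_incorrect):
    """Bayes' rule on a two-component mixture.

    incorrect = (mu, beta) of the Gumbel component,
    correct = (mu, sigma) of the Gaussian component,
    prior_incorrect = mixture weight of the incorrect component.
    """
    p_inc = prior_incorrect * gumbel_pdf(score, *incorrect)
    p_cor = (1.0 - prior_incorrect) * gauss_pdf(score, *correct)
    total = p_inc + p_cor
    if total == 0.0:
        # Both densities underflowed; report the conservative value.
        return 1.0
    return p_inc / total


# Hypothetical, hand-picked parameters (higher score = better hit).
INCORRECT = (1.0, 0.5)
CORRECT = (4.0, 1.0)
PRIOR_INCORRECT = 0.6

pep_low = posterior_error_probability(1.0, INCORRECT, CORRECT, PRIOR_INCORRECT)
pep_high = posterior_error_probability(5.0, INCORRECT, CORRECT, PRIOR_INCORRECT)
```

With these assumed parameters, a score near the incorrect component gets an error probability close to 1 and a score near the correct component gets one close to 0. Taking 1 minus the returned value gives the probability of a correct identification, which corresponds to what the -prob_correct flag reports.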
In order to validate the computed probabilities one can adjust the fit_algorithm subsection.
The plot is controlled by three parameters. The parameter 'output_plots' is false by default; if set to true, the plot is created. The scores are plotted as bins. Each bin covers a score range of (highest_score - smallest_score)/number_of_bins (if all scores have positive values), and the position of a bin is the mean of the scores it contains. Finally, the parameter 'output_name' should be used to give the plot a unique name. Two files are created: one with the binned scores and one with all steps of the estimation.

If 'top_hits_only' is set, only the top hit of each PeptideIdentification is used for the estimation process. Additionally, if 'top_hits_only' is set, target/decoy information is available, and a False Discovery Rate run was performed beforehand, an additional plot with target and decoy bins is created ('output_plots' must be true in the fit_algorithm subsection). A peptide hit is assumed to be a target if its q-value is smaller than fdr_for_targets_smaller.

The plots are saved as a gnuplot file. To visualize them, run gnuplot on that file, e.g. gnuplot file_name. This should produce a PostScript file that contains all steps of the estimation.
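The binning rule can be sketched as follows. This is an illustration under stated assumptions, not the tool's code: the function name `bin_scores` is hypothetical, empty bins are simply skipped, and the highest score is placed in the last bin.

```python
def bin_scores(scores, number_of_bins):
    """Group scores into equal-width bins.

    Returns a list of (mean_of_scores_in_bin, count) pairs, one per
    non-empty bin, ordered from lowest to highest score range.
    """
    if not scores or number_of_bins < 1:
        raise ValueError("need at least one score and one bin")
    lowest, highest = min(scores), max(scores)
    # Bin width as described: (highest_score - smallest_score) / number_of_bins
    width = (highest - lowest) / number_of_bins
    members = [[] for _ in range(number_of_bins)]
    for score in scores:
        if width == 0:
            index = 0  # all scores identical: everything lands in one bin
        else:
            # Clamp so the highest score falls into the last bin.
            index = min(int((score - lowest) / width), number_of_bins - 1)
        members[index].append(score)
    # The plotted position of a bin is the mean of the scores it holds.
    return [(sum(m) / len(m), len(m)) for m in members if m]


example_bins = bin_scores([0.0, 1.0, 2.0, 3.0, 4.0, 10.0], 5)
```

For the six example scores and five bins, the bin width is 2, so the scores 0 and 1 share the first bin and are plotted at their mean 0.5.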
The command line parameters of this tool are:
IDPosteriorErrorProbability -- Estimates probabilities for incorrectly assigned peptide sequences and a set of search engine scores using a mixture model.
Version: 1.11.1 Nov 14 2013, 11:18:15, Revision: 11976

Usage:
  IDPosteriorErrorProbability <options>

This tool has algoritm parameters that are not shown here! Please check the ini file for a detailed description or use the --helphelp option.

Options (mandatory options marked with '*'):
  -in <file>*            Input file (valid formats: 'idXML')
  -out <file>*           Output file (valid formats: 'idXML')
  -output_name <file>*   Gnuplot file as txt (valid formats: 'txt')
  -split_charge          The search engine scores are split by charge if this flag is set. Thus, for each charge state a new model will be computed.
  -top_hits_only         If set only the top hits of every PeptideIdentification will be used
  -ignore_bad_data       If set errors will be written but ignored. Useful for pipelines with many datasets where only a few are bad, but the pipeline should run through.
  -prob_correct          If set scores will be calculated as 1-ErrorProbabilities and can be interpreted as probabilities for correct identifications.

Common TOPP options:
  -ini <file>            Use the given TOPP INI file
  -threads <n>           Sets the number of threads allowed to be used by the TOPP tool (default: '1')
  -write_ini <file>      Writes the default configuration file
  --help                 Shows options
  --helphelp             Shows all options (including advanced)

The following configuration subsections are valid:
  - fit_algorithm        Algorithm parameter subsection

You can write an example INI file using the '-write_ini' option.
Documentation of subsection parameters can be found in the doxygen documentation or the INIFileEditor.
Have a look at the OpenMS documentation for more information.
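As an illustration of how the options listed above combine, the following sketch assembles an argument list for the tool. The helper name `build_command` and the file names are hypothetical; only the option names are taken from the help text above.

```python
def build_command(in_file, out_file, plot_file,
                  split_charge=False, top_hits_only=False,
                  ignore_bad_data=False, prob_correct=False):
    """Assemble an argument list using the options from the help text."""
    command = [
        "IDPosteriorErrorProbability",
        "-in", in_file,            # mandatory: input idXML
        "-out", out_file,          # mandatory: output idXML
        "-output_name", plot_file  # mandatory: gnuplot file as txt
    ]
    # Optional flags take no value; append them only when requested.
    flags = [
        ("-split_charge", split_charge),
        ("-top_hits_only", top_hits_only),
        ("-ignore_bad_data", ignore_bad_data),
        ("-prob_correct", prob_correct),
    ]
    command.extend(name for name, enabled in flags if enabled)
    return command


# Hypothetical file names, for illustration only.
example_command = build_command("search.idXML", "pep.idXML", "pep_plot.txt",
                                top_hits_only=True, prob_correct=True)
```

The resulting list could be passed to `subprocess.run` from the Python standard library, assuming the tool is installed and available on the PATH.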
INI file documentation of this tool:
For the parameters of the algorithm section, see the algorithm's documentation:
fit_algorithm
OpenMS / TOPP release 1.11.1 | Documentation generated on Thu Nov 14 2013 11:19:24 using doxygen 1.8.5 |