Home  · Classes  · Annotated Classes  · Modules  · Members  · Namespaces  · Related Pages
FeatureFinderAlgorithmPicked.h
Go to the documentation of this file.
1 // --------------------------------------------------------------------------
2 // OpenMS -- Open-Source Mass Spectrometry
3 // --------------------------------------------------------------------------
4 // Copyright The OpenMS Team -- Eberhard Karls University Tuebingen,
5 // ETH Zurich, and Freie Universitaet Berlin 2002-2013.
6 //
7 // This software is released under a three-clause BSD license:
8 // * Redistributions of source code must retain the above copyright
9 // notice, this list of conditions and the following disclaimer.
10 // * Redistributions in binary form must reproduce the above copyright
11 // notice, this list of conditions and the following disclaimer in the
12 // documentation and/or other materials provided with the distribution.
13 // * Neither the name of any author or any participating institution
14 // may be used to endorse or promote products derived from this software
15 // without specific prior written permission.
16 // For a full list of authors, refer to the file AUTHORS.
17 // --------------------------------------------------------------------------
18 // THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
19 // AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
20 // IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
21 // ARE DISCLAIMED. IN NO EVENT SHALL ANY OF THE AUTHORS OR THE CONTRIBUTING
22 // INSTITUTIONS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
23 // EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
24 // PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS;
25 // OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY,
26 // WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR
27 // OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF
28 // ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
29 //
30 // --------------------------------------------------------------------------
31 // $Maintainer: Oliver Kohlbacher, Stephan Aiche $
32 // $Authors: Marc Sturm $
33 // --------------------------------------------------------------------------
34 
35 #ifndef OPENMS_TRANSFORMATIONS_FEATUREFINDER_FEATUREFINDERALGORITHMPICKED_H
36 #define OPENMS_TRANSFORMATIONS_FEATUREFINDER_FEATUREFINDERALGORITHMPICKED_H
37 
43 
44 #include <OpenMS/FORMAT/MzMLFile.h>
46 #include <OpenMS/FORMAT/TextFile.h>
54 
55 #include <boost/math/special_functions/fpclassify.hpp>
56 
57 #include <numeric>
58 #include <fstream>
59 #include <algorithm>
60 
61 #include <QtCore/QDir>
62 
63 #ifdef _OPENMP
64 #include <omp.h>
65 #endif
66 
67 
68 namespace OpenMS
69 {
83  template <class PeakType, class FeatureType>
85  public FeatureFinderAlgorithm<PeakType, FeatureType>,
86  public FeatureFinderDefs
87  {
88 public:
90 
96 
101 
102 protected:
108 
109 public:
112  FeatureFinderAlgorithm<PeakType, FeatureType>(),
113  map_(),
114  log_()
115  {
116  //debugging
117  defaults_.setValue("debug", "false", "When debug mode is activated, several files with intermediate results are written to the folder 'debug' (do not use in parallel mode).");
118  defaults_.setValidStrings("debug", StringList::create("true,false"));
119  //intensity
120  defaults_.setValue("intensity:bins", 10, "Number of bins per dimension (RT and m/z). The higher this value, the more local the intensity significance score is.\nThis parameter should be decreased, if the algorithm is used on small regions of a map.");
121  defaults_.setMinInt("intensity:bins", 1);
122  defaults_.setSectionDescription("intensity", "Settings for the calculation of a score indicating if a peak's intensity is significant in the local environment (between 0 and 1)");
123  //mass trace search parameters
124  defaults_.setValue("mass_trace:mz_tolerance", 0.03, "Tolerated m/z deviation of peaks belonging to the same mass trace.\nIt should be larger than the m/z resolution of the instument.\nThis value must be smaller than that 1/charge_high!");
125  defaults_.setMinFloat("mass_trace:mz_tolerance", 0.0);
126  defaults_.setValue("mass_trace:min_spectra", 10, "Number of spectra that have to show a similar peak mass in a mass trace.");
127  defaults_.setMinInt("mass_trace:min_spectra", 1);
128  defaults_.setValue("mass_trace:max_missing", 1, "Number of consecutive spectra where a high mass deviation or missing peak is acceptable.\nThis parameter should be well below 'min_spectra'!");
129  defaults_.setMinInt("mass_trace:max_missing", 0);
130  defaults_.setValue("mass_trace:slope_bound", 0.1, "The maximum slope of mass trace intensities when extending from the highest peak.\nThis parameter is important to seperate overlapping elution peaks.\nIt should be increased if feature elution profiles fluctuate a lot.");
131  defaults_.setMinFloat("mass_trace:slope_bound", 0.0);
132  defaults_.setSectionDescription("mass_trace", "Settings for the calculation of a score indicating if a peak is part of a mass trace (between 0 and 1).");
133  //Isotopic pattern search parameters
134  defaults_.setValue("isotopic_pattern:charge_low", 1, "Lowest charge to search for.");
135  defaults_.setMinInt("isotopic_pattern:charge_low", 1);
136  defaults_.setValue("isotopic_pattern:charge_high", 4, "Highest charge to search for.");
137  defaults_.setMinInt("isotopic_pattern:charge_high", 1);
138  defaults_.setValue("isotopic_pattern:mz_tolerance", 0.03, "Tolerated m/z deviation from the theoretical isotopic pattern.\nIt should be larger than the m/z resolution of the instument.\nThis value must be smaller than that 1/charge_high!");
139  defaults_.setMinFloat("isotopic_pattern:mz_tolerance", 0.0);
140  defaults_.setValue("isotopic_pattern:intensity_percentage", 10.0, "Isotopic peaks that contribute more than this percentage to the overall isotope pattern intensity must be present.", StringList::create("advanced"));
141  defaults_.setMinFloat("isotopic_pattern:intensity_percentage", 0.0);
142  defaults_.setMaxFloat("isotopic_pattern:intensity_percentage", 100.0);
143  defaults_.setValue("isotopic_pattern:intensity_percentage_optional", 0.1, "Isotopic peaks that contribute more than this percentage to the overall isotope pattern intensity can be missing.", StringList::create("advanced"));
144  defaults_.setMinFloat("isotopic_pattern:intensity_percentage_optional", 0.0);
145  defaults_.setMaxFloat("isotopic_pattern:intensity_percentage_optional", 100.0);
146  defaults_.setValue("isotopic_pattern:optional_fit_improvement", 2.0, "Minimal percental improvement of isotope fit to allow leaving out an optional peak.", StringList::create("advanced"));
147  defaults_.setMinFloat("isotopic_pattern:optional_fit_improvement", 0.0);
148  defaults_.setMaxFloat("isotopic_pattern:optional_fit_improvement", 100.0);
149  defaults_.setValue("isotopic_pattern:mass_window_width", 25.0, "Window width in Dalton for precalculation of estimated isotope distributions.", StringList::create("advanced"));
150  defaults_.setMinFloat("isotopic_pattern:mass_window_width", 1.0);
151  defaults_.setMaxFloat("isotopic_pattern:mass_window_width", 200.0);
152  defaults_.setValue("isotopic_pattern:abundance_12C", 98.93, "Rel. abundance of the light carbon. Modify if labeled.", StringList::create("advanced"));
153  defaults_.setMinFloat("isotopic_pattern:abundance_12C", 0.0);
154  defaults_.setMaxFloat("isotopic_pattern:abundance_12C", 100.0);
155  defaults_.setValue("isotopic_pattern:abundance_14N", 99.632, "Rel. abundance of the light nitrogen. Modify if labeled.", StringList::create("advanced"));
156  defaults_.setMinFloat("isotopic_pattern:abundance_14N", 0.0);
157  defaults_.setMaxFloat("isotopic_pattern:abundance_14N", 100.0);
158 
159  defaults_.setSectionDescription("isotopic_pattern", "Settings for the calculation of a score indicating if a peak is part of a isotopic pattern (between 0 and 1).");
160  //Seed settings
161  defaults_.setValue("seed:min_score", 0.8, "Minimum seed score a peak has to reach to be used as seed.\nThe seed score is the geometric mean of intensity score, mass trace score and isotope pattern score.\nIf your features show a large deviation from the averagene isotope distribution or from an gaussian elution profile, lower this score.");
162  defaults_.setMinFloat("seed:min_score", 0.0);
163  defaults_.setMaxFloat("seed:min_score", 1.0);
164  defaults_.setSectionDescription("seed", "Settings that determine which peaks are considered a seed");
165  //Fitting settings
166  defaults_.setValue("fit:epsilon_abs", 0.0001, "Absolute epsilon used for convergence of the fit.", StringList::create("advanced"));
167  defaults_.setMinFloat("fit:epsilon_abs", 0.0);
168  defaults_.setValue("fit:epsilon_rel", 0.0001, "Relative epsilon used for convergence of the fit.", StringList::create("advanced"));
169  defaults_.setMinFloat("fit:epsilon_rel", 0.0);
170  defaults_.setValue("fit:max_iterations", 500, "Maximum number of iterations of the fit.", StringList::create("advanced"));
171  defaults_.setMinInt("fit:max_iterations", 1);
172  defaults_.setSectionDescription("fit", "Settings for the model fitting");
173  //Feature settings
174  defaults_.setValue("feature:min_score", 0.7, "Feature score threshold for a feature to be reported.\nThe feature score is the geometric mean of the average relative deviation and the correlation between the model and the observed peaks.");
175  defaults_.setMinFloat("feature:min_score", 0.0);
176  defaults_.setMaxFloat("feature:min_score", 1.0);
177  defaults_.setValue("feature:min_isotope_fit", 0.8, "Minimum isotope fit of the feature before model fitting.", StringList::create("advanced"));
178  defaults_.setMinFloat("feature:min_isotope_fit", 0.0);
179  defaults_.setMaxFloat("feature:min_isotope_fit", 1.0);
180  defaults_.setValue("feature:min_trace_score", 0.5, "Trace score threshold.\nTraces below this threshold are removed after the model fitting.\nThis parameter is important for features that overlap in m/z dimension.", StringList::create("advanced"));
181  defaults_.setMinFloat("feature:min_trace_score", 0.0);
182  defaults_.setMaxFloat("feature:min_trace_score", 1.0);
183  defaults_.setValue("feature:min_rt_span", 0.333, "Minimum RT span in relation to extended area that has to remain after model fitting.", StringList::create("advanced"));
184  defaults_.setMinFloat("feature:min_rt_span", 0.0);
185  defaults_.setMaxFloat("feature:min_rt_span", 1.0);
186  defaults_.setValue("feature:max_rt_span", 2.5, "Maximum RT span in relation to extended area that the model is allowed to have.", StringList::create("advanced"));
187  defaults_.setMinFloat("feature:max_rt_span", 0.5);
188  defaults_.setValue("feature:rt_shape", "symmetric", "Choose model used for RT profile fitting. If set to symmetric a gauss shape is used, in case of asymmetric an EGH shape is used.", StringList::create("advanced"));
189  defaults_.setValidStrings("feature:rt_shape", StringList::create("symmetric,asymmetric"));
190  defaults_.setValue("feature:max_intersection", 0.35, "Maximum allowed intersection of features.", StringList::create("advanced"));
191  defaults_.setMinFloat("feature:max_intersection", 0.0);
192  defaults_.setMaxFloat("feature:max_intersection", 1.0);
193  defaults_.setValue("feature:reported_mz", "monoisotopic", "The mass type that is reported for features.\n'maximum' returns the m/z value of the highest mass trace.\n'average' returns the intensity-weighted average m/z value of all contained peaks.\n'monoisotopic' returns the monoisotopic m/z value derived from the fitted isotope model.");
194  defaults_.setValidStrings("feature:reported_mz", StringList::create("maximum,average,monoisotopic"));
195  defaults_.setSectionDescription("feature", "Settings for the features (intensity, quality assessment, ...)");
196  //user-specified seed settings
197  defaults_.setValue("user-seed:rt_tolerance", 5.0, "Allowed RT deviation of seeds from the user-specified seed position.");
198  defaults_.setMinFloat("user-seed:rt_tolerance", 0.0);
199  defaults_.setValue("user-seed:mz_tolerance", 1.1, "Allowed m/z deviation of seeds from the user-specified seed position.");
200  defaults_.setMinFloat("user-seed:mz_tolerance", 0.0);
201  defaults_.setValue("user-seed:min_score", 0.5, "Overwrites 'seed:min_score' for user-specified seeds. The cutoff is typically a bit lower in this case.");
202  defaults_.setMinFloat("user-seed:min_score", 0.0);
203  defaults_.setMaxFloat("user-seed:min_score", 1.0);
204  defaults_.setSectionDescription("user-seed", "Settings for user-specified seeds.");
205  //debug settings
206  defaults_.setValue("debug:pseudo_rt_shift", 500.0, "Pseudo RT shift used when .", StringList::create("advanced"));
207  defaults_.setMinFloat("debug:pseudo_rt_shift", 1.0);
208  this->defaultsToParam_();
209  }
210 
211  // docu in base class
212  virtual void setSeeds(const FeatureMapType& seeds)
213  {
214  seeds_ = seeds;
215  }
216 
218  virtual void run()
219  {
220  //-------------------------------------------------------------------------
221  //General initialization
222  //---------------------------------------------------------------------------
223 
224  //quality estimation
225  DoubleReal min_feature_score = param_.getValue("feature:min_score");
226  //charges to look at
227  SignedSize charge_low = (Int)param_.getValue("isotopic_pattern:charge_low");
228  SignedSize charge_high = (Int)param_.getValue("isotopic_pattern:charge_high");
229  //fitting settings
230  UInt max_iterations = param_.getValue("fit:max_iterations");
231  DoubleReal epsilon_abs = param_.getValue("fit:epsilon_abs");
232  DoubleReal epsilon_rel = param_.getValue("fit:epsilon_rel");
233 
234  Size max_isotopes = 20;
235 
236  // check if non-natural isotopic abundances are set. If so modify
237  DoubleReal abundance_12C = param_.getValue("isotopic_pattern:abundance_12C");
238  DoubleReal abundance_14N = param_.getValue("isotopic_pattern:abundance_14N");
239 
240  const Element* carbon_const = ElementDB::getInstance()->getElement("Carbon");
241  Element* carbon = const_cast<Element*>(carbon_const);
242 
243  if (param_.getValue("isotopic_pattern:abundance_12C") != defaults_.getValue("isotopic_pattern:abundance_12C"))
244  {
245  max_isotopes += 1000;
246  IsotopeDistribution isotopes;
247  std::vector<std::pair<Size, double> > container;
248  container.push_back(std::make_pair(12, abundance_12C / 100.0));
249  container.push_back(std::make_pair(13, 1.0 - (abundance_12C / 100.0)));
250  isotopes.set(container);
251  carbon->setIsotopeDistribution(isotopes);
252  }
253 
254  const Element* nitrogen_const = ElementDB::getInstance()->getElement("Nitrogen");
255  Element* nitrogen = const_cast<Element*>(nitrogen_const);
256 
257  if (param_.getValue("isotopic_pattern:abundance_14N") != defaults_.getValue("isotopic_pattern:abundance_14N"))
258  {
259  max_isotopes += 1000;
260  IsotopeDistribution isotopes;
261  std::vector<std::pair<Size, double> > container;
262  container.push_back(std::make_pair(14, abundance_14N / 100.0));
263  container.push_back(std::make_pair(15, 1.0 - (abundance_14N / 100.0)));
264  isotopes.set(container);
265  nitrogen->setIsotopeDistribution(isotopes);
266  }
267 
268  // initialize trace fitter parameters here to avoid
269  // bug https://sourceforge.net/apps/trac/open-ms/ticket/147
270  Param trace_fitter_params;
271  trace_fitter_params.setValue("max_iteration", max_iterations);
272  trace_fitter_params.setValue("epsilon_abs", epsilon_abs);
273  trace_fitter_params.setValue("epsilon_rel", epsilon_rel);
274 
275  //copy the input map
277 
278  //flag for user-specified seed mode
279  bool user_seeds = (seeds_.size() > 0);
280  if (user_seeds)
281  {
282  seeds_.sortByMZ();
283  }
284  DoubleReal user_rt_tol = param_.getValue("user-seed:rt_tolerance");
285  DoubleReal user_mz_tol = param_.getValue("user-seed:mz_tolerance");
286  DoubleReal user_seed_score = param_.getValue("user-seed:min_score");
287 
288  //reserve space for calculated scores
289  UInt charge_count = charge_high - charge_low + 1;
290  for (Size s = 0; s < map_.size(); ++s)
291  {
292  Size scan_size = map_[s].size();
293  map_[s].getFloatDataArrays().resize(3 + 2 * charge_count);
294  map_[s].getFloatDataArrays()[0].setName("trace_score");
295  map_[s].getFloatDataArrays()[0].assign(scan_size, 0.0);
296  map_[s].getFloatDataArrays()[1].setName("intensity_score");
297  map_[s].getFloatDataArrays()[1].assign(scan_size, 0.0);
298  map_[s].getFloatDataArrays()[2].setName("local_max");
299  map_[s].getFloatDataArrays()[2].assign(scan_size, 0.0);
300  //create isotope pattern score arrays
301  UInt charge = charge_low;
302  for (Size i = 3; i < 3 + charge_count; ++i)
303  {
304  map_[s].getFloatDataArrays()[i].setName(String("pattern_score_") + charge);
305  map_[s].getFloatDataArrays()[i].assign(scan_size, 0.0);
306  ++charge;
307  }
308  //create overall score arrays
309  charge = charge_low;
310  for (Size i = 3 + charge_count; i < 3 + 2 * charge_count; ++i)
311  {
312  map_[s].getFloatDataArrays()[i].setName(String("overall_score_") + charge);
313  map_[s].getFloatDataArrays()[i].assign(scan_size, 0.0);
314  ++charge;
315  }
316  }
317 
318  int gl_progress = 0;
319  debug_ = ((String)(param_.getValue("debug")) == "true");
320  //clean up / create folders for debug information
321  if (debug_)
322  {
323  QDir dir(".");
324  dir.mkpath("debug/features");
325  log_.open("debug/log.txt");
326  }
327 
328  //---------------------------------------------------------------------------
329  //Step 1:
330  //Precalculate intensity scores for peaks
331  //---------------------------------------------------------------------------
332  if (debug_) log_ << "Precalculating intensity thresholds ..." << std::endl;
333  //new scope to make local variables disappear
334  {
335  ff_->startProgress(0, intensity_bins_ * intensity_bins_, "Precalculating intensity scores");
336  DoubleReal rt_start = map_.getMinRT();
337  DoubleReal mz_start = map_.getMinMZ();
340  intensity_thresholds_.resize(intensity_bins_);
341  for (Size rt = 0; rt < intensity_bins_; ++rt)
342  {
343  intensity_thresholds_[rt].resize(intensity_bins_);
344  DoubleReal min_rt = rt_start + rt * intensity_rt_step_;
345  DoubleReal max_rt = rt_start + (rt + 1) * intensity_rt_step_;
346  std::vector<DoubleReal> tmp;
347  for (Size mz = 0; mz < intensity_bins_; ++mz)
348  {
349  ff_->setProgress(rt * intensity_bins_ + mz);
350  DoubleReal min_mz = mz_start + mz * intensity_mz_step_;
351  DoubleReal max_mz = mz_start + (mz + 1) * intensity_mz_step_;
352  //std::cout << "rt range: " << min_rt << " - " << max_rt << std::endl;
353  //std::cout << "mz range: " << min_mz << " - " << max_mz << std::endl;
354  tmp.clear();
355  for (typename MapType::ConstAreaIterator it = map_.areaBeginConst(min_rt, max_rt, min_mz, max_mz); it != map_.areaEndConst(); ++it)
356  {
357  tmp.push_back(it->getIntensity());
358  }
359  //init vector
360  intensity_thresholds_[rt][mz].assign(21, 0.0);
361  //store quantiles (20)
362  if (!tmp.empty())
363  {
364  std::sort(tmp.begin(), tmp.end());
365  for (Size i = 0; i < 21; ++i)
366  {
367  Size index = (Size) std::floor(0.05 * i * (tmp.size() - 1));
368  intensity_thresholds_[rt][mz][i] = tmp[index];
369  }
370  }
371  }
372  }
373 
374  //store intensity score in PeakInfo
375  for (Size s = 0; s < map_.size(); ++s)
376  {
377  for (Size p = 0; p < map_[s].size(); ++p)
378  {
379  map_[s].getFloatDataArrays()[1][p] = intensityScore_(s, p);
380  }
381  }
382  ff_->endProgress();
383  }
384 
385  //---------------------------------------------------------------------------
386  //Step 2:
387  //Precalculate mass trace scores and local trace maximum for each peak
388  //---------------------------------------------------------------------------
389  //new scope to make local variables disappear
390  {
391  Size end_iteration = map_.size() - std::min((Size) min_spectra_, map_.size());
392  ff_->startProgress(min_spectra_, end_iteration, "Precalculating mass trace scores");
393  // skip first and last scans since we cannot extend the mass traces there
394  for (Size s = min_spectra_; s < end_iteration; ++s)
395  {
396  ff_->setProgress(s);
397  const SpectrumType& spectrum = map_[s];
398  //iterate over all peaks of the scan
399  for (Size p = 0; p < spectrum.size(); ++p)
400  {
401  std::vector<DoubleReal> scores;
402  scores.reserve(2 * min_spectra_);
403 
404  DoubleReal pos = spectrum[p].getMZ();
405  Real inte = spectrum[p].getIntensity();
406 
407  //if(debug_) log_ << std::endl << "Peak: " << pos << std::endl;
408  bool is_max_peak = true; //checking the maximum intensity peaks -> use them later as feature seeds.
409  for (Size i = 1; i <= min_spectra_; ++i)
410  {
411  try
412  {
413  Size spec_index = map_[s + i].findNearest(pos);
414  DoubleReal position_score = positionScore_(pos, map_[s + i][spec_index].getMZ(), trace_tolerance_);
415  if (position_score > 0 && map_[s + i][spec_index].getIntensity() > inte) is_max_peak = false;
416  scores.push_back(position_score);
417  }
418  catch (...) //no peaks in the spectrum
419  {
420  scores.push_back(0.0);
421  }
422  }
423  for (Size i = 1; i <= min_spectra_; ++i)
424  {
425  try
426  {
427  Size spec_index = map_[s - i].findNearest(pos);
428  DoubleReal position_score = positionScore_(pos, map_[s - i][spec_index].getMZ(), trace_tolerance_);
429  if (position_score > 0 && map_[s - i][spec_index].getIntensity() > inte) is_max_peak = false;
430  scores.push_back(position_score);
431  }
432  catch (...) //no peaks in the spectrum
433  {
434  scores.push_back(0.0);
435  }
436  }
437  //Calculate a consensus score out of the scores calculated before
438  DoubleReal trace_score = std::accumulate(scores.begin(), scores.end(), 0.0) / scores.size();
439 
440  //store final score for later use
441  map_[s].getFloatDataArrays()[0][p] = trace_score;
442  map_[s].getFloatDataArrays()[2][p] = is_max_peak;
443  }
444  }
445  ff_->endProgress();
446  }
447 
448  //---------------------------------------------------------------------------
449  //Step 2.5:
450  //Precalculate isotope distributions for interesting mass ranges
451  //---------------------------------------------------------------------------
452  //new scope to make local variables disappear
453  {
454  DoubleReal max_mass = map_.getMaxMZ() * charge_high;
455  Size num_isotopes = std::ceil(max_mass / mass_window_width_) + 1;
456  ff_->startProgress(0, num_isotopes, "Precalculating isotope distributions");
457 
458  //reserve enough space
459  isotope_distributions_.resize(num_isotopes);
460 
461  //calculate distribution if necessary
462  for (Size index = 0; index < num_isotopes; ++index)
463  {
464  //if(debug_) log_ << "Calculating iso dist for mass: " << 0.5*mass_window_width_ + index * mass_window_width_ << std::endl;
466  d.setMaxIsotope(max_isotopes);
468  //trim left and right. And store the number of isotopes on the left, to reconstruct the monoisotopic peak
469  Size size_before = d.size();
471  isotope_distributions_[index].trimmed_left = size_before - d.size();
473 
474  for (IsotopeDistribution::Iterator it = d.begin(); it != d.end(); ++it)
475  {
476  isotope_distributions_[index].intensity.push_back(it->second);
477  //if(debug_) log_ << " - " << it->second << std::endl;
478  }
479 
480  //determine the number of optional peaks at the beginning/end
481  Size begin = 0;
482  Size end = 0;
483  bool is_begin = true;
484  bool is_end = false;
485 
486  for (Size i = 0; i < isotope_distributions_[index].intensity.size(); ++i)
487  {
488  if (isotope_distributions_[index].intensity[i] < intensity_percentage_)
489  {
490  if (!is_end && !is_begin) is_end = true;
491  if (is_begin) ++begin;
492  else if (is_end) ++end;
493  }
494  else if (is_begin)
495  {
496  is_begin = false;
497  }
498  }
499  isotope_distributions_[index].optional_begin = begin;
500  isotope_distributions_[index].optional_end = end;
501  //scale the distibution to a maximum of 1
502  DoubleReal max = 0.0;
503  for (Size i = 0; i < isotope_distributions_[index].intensity.size(); ++i)
504  {
505  if (isotope_distributions_[index].intensity[i] > max)
506  {
507  max = isotope_distributions_[index].intensity[i];
508  }
509  }
510  isotope_distributions_[index].max = max;
511  for (Size i = 0; i < isotope_distributions_[index].intensity.size(); ++i)
512  {
513  isotope_distributions_[index].intensity[i] /= max;
514  }
515 
516  //if(debug_) log_ << " - optinal begin/end:" << begin << " / " << end << std::endl;
517  }
518 
519  ff_->endProgress();
520  }
521 
522  //-------------------------------------------------------------------------
523  //Step 3:
524  //Charge loop (create seeds and features for each charge separately)
525  //-------------------------------------------------------------------------
526  Int plot_nr_global = -1; //counter for the number of plots (debug info)
527  Int feature_nr_global = 0; //counter for the number of features (debug info)
528  for (SignedSize c = charge_low; c <= charge_high; ++c)
529  {
530  UInt meta_index_isotope = 3 + c - charge_low;
531  UInt meta_index_overall = 3 + charge_count + c - charge_low;
532 
533  Size feature_candidates = 0;
534  std::vector<Seed> seeds;
535 
536  //-----------------------------------------------------------
537  //Step 3.1: Precalculate IsotopePattern score
538  //-----------------------------------------------------------
539  ff_->startProgress(0, map_.size(), String("Calculating isotope pattern scores for charge ") + String(c));
540  for (Size s = 0; s < map_.size(); ++s)
541  {
542  ff_->setProgress(s);
543  const SpectrumType& spectrum = map_[s];
544  for (Size p = 0; p < spectrum.size(); ++p)
545  {
546  DoubleReal mz = spectrum[p].getMZ();
547 
548  //get isotope distribution for this mass
549  const TheoreticalIsotopePattern& isotopes = getIsotopeDistribution_(mz * c);
550  //determine highest peak in isotope distribution
551  Size max_isotope = std::max_element(isotopes.intensity.begin(), isotopes.intensity.end()) - isotopes.intensity.begin();
552  //Look up expected isotopic peaks (in the current spectrum or adjacent spectra)
553  Size peak_index = spectrum.findNearest(mz - ((DoubleReal)(isotopes.size() + 1) / c));
554  IsotopePattern pattern(isotopes.size());
555 
556  for (Size i = 0; i < isotopes.size(); ++i)
557  {
558  DoubleReal isotope_pos = mz + ((DoubleReal)i - max_isotope) / c;
559  findIsotope_(isotope_pos, s, pattern, i, peak_index);
560  }
561 
562  DoubleReal pattern_score = isotopeScore_(isotopes, pattern, true);
563 
564  //update pattern scores of all contained peaks (if necessary)
565  if (pattern_score > 0.0)
566  {
567  for (Size i = 0; i < pattern.peak.size(); ++i)
568  {
569  if (pattern.peak[i] >= 0 && pattern_score > map_[pattern.spectrum[i]].getFloatDataArrays()[meta_index_isotope][pattern.peak[i]])
570  {
571  map_[pattern.spectrum[i]].getFloatDataArrays()[meta_index_isotope][pattern.peak[i]] = pattern_score;
572  }
573  }
574  }
575  }
576  }
577  ff_->endProgress();
578  //-----------------------------------------------------------
579  //Step 3.2:
580  //Find seeds for this charge
581  //-----------------------------------------------------------
582  Size end_of_iteration = map_.size() - std::min((Size) min_spectra_, map_.size());
583  ff_->startProgress(min_spectra_, end_of_iteration, String("Finding seeds for charge ") + String(c));
584 
585  DoubleReal min_seed_score = param_.getValue("seed:min_score");
586  //do nothing for the first few and last few spectra as the scans required to search for traces are missing
587  for (Size s = min_spectra_; s < end_of_iteration; ++s)
588  {
589  ff_->setProgress(s);
590 
591  //iterate over peaks
592  for (Size p = 0; p < map_[s].size(); ++p)
593  {
594  FloatDataArrays& meta = map_[s].getFloatDataArrays();
595  DoubleReal overall_score = std::pow(meta[0][p] * meta[1][p] * meta[meta_index_isotope][p], 1.0f / 3.0f);
596  meta[meta_index_overall][p] = overall_score;
597 
598  //add seed to vector if certain conditions are fulfilled
599  if (meta[2][p] != 0.0) // local maximum of mass trace is prerequisite for all features
600  {
601  //automatic seeds: overall score greater than the min seed score
602  if (!user_seeds && overall_score >= min_seed_score)
603  {
604  Seed seed;
605  seed.spectrum = s;
606  seed.peak = p;
607  seed.intensity = map_[s][p].getIntensity();
608  seeds.push_back(seed);
609  }
610  //user-specified seeds: overall score greater than USER min seed score
611  else if (user_seeds && overall_score >= user_seed_score)
612  {
613  //only consider seeds, if they are near a user-specified seed
614  FeatureType tmp;
615  tmp.setMZ(map_[s][p].getMZ() - user_mz_tol);
616  for (typename FeatureMapType::const_iterator it = std::lower_bound(seeds_.begin(), seeds_.end(), tmp, typename FeatureType::MZLess()); it < seeds_.end(); ++it)
617  {
618  if (it->getMZ() > map_[s][p].getMZ() + user_mz_tol)
619  {
620  break;
621  }
622  if (fabs(it->getMZ() - map_[s][p].getMZ()) < user_mz_tol &&
623  fabs(it->getRT() - map_[s].getRT()) < user_rt_tol)
624  {
625  Seed seed;
626  seed.spectrum = s;
627  seed.peak = p;
628  seed.intensity = map_[s][p].getIntensity();
629  seeds.push_back(seed);
630  break;
631  }
632  }
633  }
634  }
635  }
636  }
637  //sort seeds according to intensity
638  std::sort(seeds.rbegin(), seeds.rend());
639  //create and store seeds map and selected peak map
640  if (debug_)
641  {
642  //seeds
643  FeatureMap<> seed_map;
644  seed_map.reserve(seeds.size());
645  for (Size i = 0; i < seeds.size(); ++i)
646  {
647  Size spectrum = seeds[i].spectrum;
648  Size peak = seeds[i].peak;
649  const FloatDataArrays& meta = map_[spectrum].getFloatDataArrays();
650  Feature tmp;
651  tmp.setIntensity(seeds[i].intensity);
652  tmp.setOverallQuality(meta[meta_index_overall][peak]);
653  tmp.setRT(map_[spectrum].getRT());
654  tmp.setMZ(map_[spectrum][peak].getMZ());
655  tmp.setMetaValue("intensity_score", meta[1][peak]);
656  tmp.setMetaValue("pattern_score", meta[meta_index_isotope][peak]);
657  tmp.setMetaValue("trace_score", meta[0][peak]);
658  seed_map.push_back(tmp);
659  }
660  FeatureXMLFile().store(String("debug/seeds_") + String(c) + ".featureXML", seed_map);
661  }
662 
663  ff_->endProgress();
664  std::cout << "Found " << seeds.size() << " seeds for charge " << c << "." << std::endl;
665 
666  //------------------------------------------------------------------
667  //Step 3.3:
668  //Extension of seeds
669  //------------------------------------------------------------------
670 
671  // We do not want to store features whose seeds lie within other
672  // features with higher intensity. We thus store this information in
673  // the map seeds_in_features which contains for each seed i a vector
674  // of other seeds that are contained in the corresponding feature i.
675  //
676  // The features are stored in an temporary feature map until it is
677  // decided whether they are contained within a seed of higher
678  // intensity.
679  std::map<Size, std::vector<Size> > seeds_in_features;
680  typedef std::map<Size, FeatureType> FeatureMapType;
681  FeatureMapType tmp_feature_map;
682  gl_progress = 0;
683  ff_->startProgress(0, seeds.size(), String("Extending seeds for charge ") + String(c));
684 #ifdef _OPENMP
685 #pragma omp parallel for
686 #endif
687  for (SignedSize i = 0; i < (SignedSize)seeds.size(); ++i)
688  {
689  //------------------------------------------------------------------
690  //Step 3.3.1:
691  //Extend all mass traces
692  //------------------------------------------------------------------
693 
694  const SpectrumType& spectrum = map_[seeds[i].spectrum];
695  const PeakType& peak = spectrum[seeds[i].peak];
696 
698  {
699  ff_->setProgress(gl_progress++);
700 
701  if (debug_)
702  {
703  log_ << std::endl << "Seed " << i << ":" << std::endl;
704  //If the intensity is zero this seed is already uses in another feature
705  log_ << " - Int: " << peak.getIntensity() << std::endl;
706  log_ << " - RT: " << spectrum.getRT() << std::endl;
707  log_ << " - MZ: " << peak.getMZ() << std::endl;
708  }
709  }
710 
711  //----------------------------------------------------------------
712  //Find best fitting isotope pattern for this charge (using averagine)
713  IsotopePattern best_pattern(0);
714  DoubleReal isotope_fit_quality = findBestIsotopeFit_(seeds[i], c, best_pattern);
715 
716  if (isotope_fit_quality < min_isotope_fit_)
717  {
718  abort_(seeds[i], "Could not find good enough isotope pattern containing the seed");
719  //continue;
720  }
721  else
722  {
723 
724  //extend the convex hull in RT dimension (starting from the trace peaks)
725  MassTraces traces;
726  traces.reserve(best_pattern.peak.size());
727  extendMassTraces_(best_pattern, traces, meta_index_overall);
728 
729  //check if the traces are still valid
730  DoubleReal seed_mz = map_[seeds[i].spectrum][seeds[i].peak].getMZ();
731 
732  if (!traces.isValid(seed_mz, trace_tolerance_))
733  {
734  abort_(seeds[i], "Could not extend seed");
735  //continue;
736  }
737  else
738  {
739 
740  //------------------------------------------------------------------
741  //Step 3.3.2:
742  //Gauss/EGH fit (first fit to find the feature boundaries)
743  //------------------------------------------------------------------
744  Int plot_nr = -1;
745 
746 #ifdef _OPENMP
747 #pragma omp critical (FeatureFinderAlgorithmPicked_PLOTNR)
748 #endif
749  {
750  plot_nr = ++plot_nr_global;
751  }
752 
753  //------------------------------------------------------------------
754 
755  //TODO try fit with baseline term once more
756  //baseline estimate
757  traces.updateBaseline();
758  traces.baseline = 0.75 * traces.baseline;
759 
760  traces[traces.max_trace].updateMaximum();
761 
762  // choose fitter
763  double egh_tau = 0.0;
764  TraceFitter<PeakType>* fitter = chooseTraceFitter_(egh_tau);
765 
766  fitter->setParameters(trace_fitter_params);
767  fitter->fit(traces);
768 
769 #if 0
771  Param alt_p;
772  alt_p.setValue("max_iteration", max_iterations);
773  alt_p.setValue("epsilon_abs", epsilon_abs);
774  alt_p.setValue("epsilon_rel", epsilon_rel);
775 
776  alt_fitter->setParameters(alt_p);
777  alt_fitter->fit(traces);
778 
779  LOG_DEBUG << "EGH: " << fitter->getCenter() << " " << fitter->getHeight() << std::endl;
780  LOG_DEBUG << "GAUSS: " << alt_fitter->getCenter() << " " << alt_fitter->getHeight() << std::endl;
781 #endif
782  // what should come out
783  // left "sigma"
784  // right "sigma"
785  // x0 .. "center" position of RT fit
786  // height .. "height" of RT fit
787 
788  //------------------------------------------------------------------
789 
790  //------------------------------------------------------------------
791  //Step 3.3.3:
792  //Crop feature according to RT fit (2.5*sigma) and remove badly fitting traces
793  //------------------------------------------------------------------
794  MassTraces new_traces;
795  cropFeature_(fitter, traces, new_traces);
796 
797  //------------------------------------------------------------------
798  //Step 3.3.4:
799  //Check if feature is ok
800  //------------------------------------------------------------------
801  String error_msg = "";
802 
803  DoubleReal fit_score = 0.0;
804  DoubleReal correlation = 0.0;
805  DoubleReal final_score = 0.0;
806 
807  bool feature_ok = checkFeatureQuality_(fitter, new_traces, seed_mz, min_feature_score, error_msg, fit_score, correlation, final_score);
808 #ifdef _OPENMP
809 #pragma omp critical (FeatureFinderAlgorithmPicked_DEBUG)
810 #endif
811  {
812  //write debug output of feature
813  if (debug_)
814  {
815  writeFeatureDebugInfo_(fitter, traces, new_traces, feature_ok, error_msg, final_score, plot_nr, peak);
816  }
817  }
818  traces = new_traces;
819 
820 
821  //validity output
822  if (!feature_ok)
823  {
824  abort_(seeds[i], error_msg);
825  //continue;
826  }
827  else
828  {
829 
830  //------------------------------------------------------------------
831  //Step 3.3.5:
832  //Feature creation
833  //------------------------------------------------------------------
834  Feature f;
835  //set label
836  f.setMetaValue(3, plot_nr);
837  f.setCharge(c);
838  f.setOverallQuality(final_score);
839  f.setMetaValue("score_fit", fit_score);
840  f.setMetaValue("score_correlation", correlation);
841  f.setRT(fitter->getCenter());
842  f.setWidth(fitter->getFWHM());
843 
844  // Extract some of the model parameters.
845  if (egh_tau != 0.0)
846  {
847  egh_tau = (static_cast<EGHTraceFitter<PeakType>*>(fitter))->getTau();
848  f.setMetaValue("EGH_tau", egh_tau);
849  f.setMetaValue("EGH_height", (static_cast<EGHTraceFitter<PeakType>*>(fitter))->getHeight());
850  f.setMetaValue("EGH_sigma", (static_cast<EGHTraceFitter<PeakType>*>(fitter))->getSigmaSquare());
851  }
852 
853  // Calculate the mass of the feature: maximum, average, monoisotopic
854  if (reported_mz_ == "maximum")
855  {
856  f.setMZ(traces[traces.getTheoreticalmaxPosition()].getAvgMZ());
857  }
858  else if (reported_mz_ == "average")
859  {
860  DoubleReal total_intensity = 0.0;
861  DoubleReal average_mz = 0.0;
862  for (Size t = 0; t < traces.size(); ++t)
863  {
864  for (Size p = 0; p < traces[t].peaks.size(); ++p)
865  {
866  average_mz += traces[t].peaks[p].second->getMZ() * traces[t].peaks[p].second->getIntensity();
867  total_intensity += traces[t].peaks[p].second->getIntensity();
868  }
869  }
870  average_mz /= total_intensity;
871  f.setMZ(average_mz);
872  }
873  else if (reported_mz_ == "monoisotopic")
874  {
875  DoubleReal mono_mz = traces[traces.getTheoreticalmaxPosition()].getAvgMZ();
876  mono_mz -= (Constants::PROTON_MASS_U / c) * (traces.getTheoreticalmaxPosition() + best_pattern.theoretical_pattern.trimmed_left);
877  f.setMZ(mono_mz);
878  }
879 
880  // Calculate intensity based on model only
881  // - the model does not include the baseline, so we ignore it here
882  // - as we scaled the isotope distribution to
883  f.setIntensity(
884  fitter->getFeatureIntensityContribution() // was 2.5 * fitter->getHeight() * sigma
885  / getIsotopeDistribution_(f.getMZ()).max);
886 
887  // we do not need the fitter anymore
888  delete fitter;
889 
890  //add convex hulls of mass traces
891  for (Size j = 0; j < traces.size(); ++j)
892  {
893  f.getConvexHulls().push_back(traces[j].getConvexhull());
894  }
895 
896 #ifdef _OPENMP
897 #pragma omp critical (FeatureFinderAlgorithmPicked_TMPFEATUREMAP)
898 #endif
899  {
900  tmp_feature_map[i] = f;
901  }
902 
903  //----------------------------------------------------------------
904  //Remember all seeds that lie inside the convex hull of the new feature
906  for (Size j = i + 1; j < seeds.size(); ++j)
907  {
908  DoubleReal rt = map_[seeds[j].spectrum].getRT();
909  DoubleReal mz = map_[seeds[j].spectrum][seeds[j].peak].getMZ();
910  if (bb.encloses(rt, mz) && f.encloses(rt, mz))
911  {
912 #ifdef _OPENMP
913 #pragma omp critical (FeatureFinderAlgorithmPicked_SEEDSINFEATURES)
914 #endif
915  {
916  seeds_in_features[i].push_back(j);
917  }
918  }
919  }
920  }
921  }
922  } // three if/else statements instead of continue (disallowed in OpenMP)
923  } // end of OPENMP over seeds
924 
925  // Here we have to evaluate which seeds are already contained in
926  // features of seeds with higher intensities. Only if the seed is not
927  // used in any feature with higher intensity, we can add it to the
928  // features_ list.
929  std::vector<Size> seeds_contained;
930  for (typename std::map<Size, FeatureType>::iterator iter = tmp_feature_map.begin(); iter != tmp_feature_map.end(); ++iter)
931  {
932  Size seed_nr = iter->first;
933  bool is_used = false;
934  for (Size i = 0; i < seeds_contained.size(); ++i)
935  {
936  if (seed_nr == seeds_contained[i]) { is_used = true; break; }
937  }
938  if (!is_used)
939  {
940  ++feature_candidates;
941 
942  //re-set label
943  iter->second.setMetaValue(3, feature_nr_global);
944  ++feature_nr_global;
945  features_->push_back(iter->second);
946 
947  std::vector<Size> curr_seed = seeds_in_features[seed_nr];
948  for (Size k = 0; k < curr_seed.size(); ++k)
949  {
950  seeds_contained.push_back(curr_seed[k]);
951 
952  }
953  }
954  }
955 
957  std::cout << "Found " << feature_candidates << " feature candidates for charge " << c << "." << std::endl;
958  }
959  // END OPENMP
960 
961  //------------------------------------------------------------------
962  //Step 4:
963  //Resolve contradicting and overlapping features
964  //------------------------------------------------------------------
965  ff_->startProgress(0, features_->size() * features_->size(), "Resolving overlapping features");
966  if (debug_) log_ << "Resolving intersecting features (" << features_->size() << " candidates)" << std::endl;
967  //sort features according to m/z in order to speed up the resolution
968  features_->sortByMZ();
969  //precalculate BBs and maximum mz span
970  std::vector<DBoundingBox<2> > bbs(features_->size());
971  DoubleReal max_mz_span = 0.0;
972 
973  for (Size i = 0; i < features_->size(); ++i)
974  {
975  bbs[i] = (*features_)[i].getConvexHull().getBoundingBox();
976  if (bbs[i].height() > max_mz_span)
977  {
978  max_mz_span = bbs[i].height();
979  }
980  }
981 
982  Size removed(0);
983  //intersect
984  for (Size i = 0; i < features_->size(); ++i)
985  {
986  Feature& f1((*features_)[i]);
987  for (Size j = i + 1; j < features_->size(); ++j)
988  {
989  ff_->setProgress(i * features_->size() + j);
990  Feature& f2((*features_)[j]);
991  //features that are more than 2 times the maximum m/z span apart do not overlap => abort
992  if (f2.getMZ() - f1.getMZ() > 2.0 * max_mz_span) break;
993  //do nothing if one of the features is already removed
994  if (f1.getIntensity() == 0.0 || f2.getIntensity() == 0.0) continue;
995  //do nothing if the overall convex hulls do not overlap
996  if (!bbs[i].intersects(bbs[j])) continue;
997  //act depending on the intersection
998  DoubleReal intersection = intersection_(f1, f2);
999 
1000  if (intersection >= max_feature_intersection_)
1001  {
1002  ++removed;
1003 
1004  if (debug_) log_ << " - Intersection (" << (i + 1) << "/" << (j + 1) << "): " << intersection << std::endl;
1005  if (f1.getCharge() == f2.getCharge())
1006  {
1007  if (f1.getIntensity() * f1.getOverallQuality() > f2.getIntensity() * f2.getOverallQuality())
1008  {
1009  if (debug_) log_ << " - same charge -> removing duplicate " << (j + 1) << std::endl;
1010  f1.getSubordinates().push_back(f2);
1011  f2.setIntensity(0.0);
1012  }
1013  else
1014  {
1015  if (debug_) log_ << " - same charge -> removing duplicate " << (i + 1) << std::endl;
1016  f2.getSubordinates().push_back(f1);
1017  f1.setIntensity(0.0);
1018  }
1019  }
1020  else if (f2.getCharge() % f1.getCharge() == 0)
1021  {
1022  if (debug_) log_ << " - different charge (one is the multiple of the other) -> removing lower charge " << (i + 1) << std::endl;
1023  f2.getSubordinates().push_back(f1);
1024  f1.setIntensity(0.0);
1025  }
1026  else if (f1.getCharge() % f2.getCharge() == 0)
1027  {
1028  if (debug_) log_ << " - different charge (one is the multiple of the other) -> removing lower charge " << (i + 1) << std::endl;
1029  f1.getSubordinates().push_back(f2);
1030  f2.setIntensity(0.0);
1031  }
1032  else
1033  {
1034  if (f1.getOverallQuality() > f2.getOverallQuality())
1035  {
1036  if (debug_) log_ << " - different charge -> removing lower score " << (j + 1) << std::endl;
1037  f1.getSubordinates().push_back(f2);
1038  f2.setIntensity(0.0);
1039  }
1040  else
1041  {
1042  if (debug_) log_ << " - different charge -> removing lower score " << (i + 1) << std::endl;
1043  f2.getSubordinates().push_back(f1);
1044  f1.setIntensity(0.0);
1045  }
1046  }
1047  }
1048  }
1049  }
1050  LOG_INFO << "Removed " << removed << " overlapping features." << std::endl;
1051  //finally remove features with intensity 0
1052  FeatureMap<> tmp;
1053  tmp.reserve(features_->size());
1054  for (Size i = 0; i < features_->size(); ++i)
1055  {
1056  if (features_->operator[](i).getIntensity() != 0.0)
1057  {
1058  tmp.push_back(features_->operator[](i));
1059  }
1060  }
1062  //sort features by intensity
1063  features_->sortByIntensity(true);
1064  ff_->endProgress();
1065  std::cout << features_->size() << " features left." << std::endl;
1066 
1067  //Abort reasons
1068  std::cout << std::endl;
1069  std::cout << "Abort reasons during feature construction:" << std::endl;
1070  for (std::map<String, UInt>::const_iterator it = aborts_.begin(); it != aborts_.end(); ++it)
1071  {
1072  std::cout << "- " << it->first << ": " << it->second << std::endl;
1073  }
1074  if (debug_)
1075  {
1076  //store map of abort reasons for failed seeds
1077  FeatureMap<> abort_map;
1078  abort_map.reserve(abort_reasons_.size());
1079  Size counter = 0;
1080  for (typename std::map<Seed, String>::iterator it2 = abort_reasons_.begin(); it2 != abort_reasons_.end(); ++it2, ++counter)
1081  {
1082  Feature f;
1083  f.setRT(map_[it2->first.spectrum].getRT());
1084  f.setMZ(map_[it2->first.spectrum][it2->first.peak].getMZ());
1085  f.setIntensity(map_[it2->first.spectrum][it2->first.peak].getIntensity());
1086  f.setMetaValue("label", it2->second);
1087  f.setUniqueId(counter); // ID = index
1088  abort_map.push_back(f);
1089  }
1090  abort_map.setUniqueId();
1091  FeatureXMLFile().store("debug/abort_reasons.featureXML", abort_map);
1092 
1093  //store input map with calculated scores (without overall score)
1094  for (Size s = 0; s < map_.size(); ++s)
1095  {
1096  map_[s].getFloatDataArrays().erase(map_[s].getFloatDataArrays().begin() + 2);
1097  }
1098  MzMLFile().store("debug/input.mzML", map_);
1099  }
1100 
1101  }
1102 
1104  {
1105  return new FeatureFinderAlgorithmPicked();
1106  }
1107 
1108  static const String getProductName()
1109  {
1110  return "centroided";
1111  }
1112 
1113 protected:
1117  mutable std::ofstream log_;
1119  bool debug_;
1121  std::map<String, UInt> aborts_;
1123  std::map<Seed, String> abort_reasons_;
1126 
1128 
1145 
1146 
1148 
1154  std::vector<std::vector<std::vector<DoubleReal> > > intensity_thresholds_;
1156 
1158  std::vector<TheoreticalIsotopePattern> isotope_distributions_;
1159 
1160  // Docu in base class
1161  virtual void updateMembers_()
1162  {
1163  pattern_tolerance_ = param_.getValue("mass_trace:mz_tolerance");
1164  trace_tolerance_ = param_.getValue("isotopic_pattern:mz_tolerance");
1165  min_spectra_ = (UInt) std::floor((DoubleReal)param_.getValue("mass_trace:min_spectra") * 0.5);
1166  max_missing_trace_peaks_ = param_.getValue("mass_trace:max_missing");
1167  slope_bound_ = param_.getValue("mass_trace:slope_bound");
1168  intensity_percentage_ = (DoubleReal)param_.getValue("isotopic_pattern:intensity_percentage") / 100.0;
1169  intensity_percentage_optional_ = (DoubleReal)param_.getValue("isotopic_pattern:intensity_percentage_optional") / 100.0;
1170  optional_fit_improvement_ = (DoubleReal)param_.getValue("isotopic_pattern:optional_fit_improvement") / 100.0;
1171  mass_window_width_ = param_.getValue("isotopic_pattern:mass_window_width");
1172  intensity_bins_ = param_.getValue("intensity:bins");
1173  min_isotope_fit_ = param_.getValue("feature:min_isotope_fit");
1174  min_trace_score_ = param_.getValue("feature:min_trace_score");
1175  min_rt_span_ = param_.getValue("feature:min_rt_span");
1176  max_rt_span_ = param_.getValue("feature:max_rt_span");
1177  max_feature_intersection_ = param_.getValue("feature:max_intersection");
1178  reported_mz_ = param_.getValue("feature:reported_mz");
1179  }
1180 
1182  void abort_(const Seed& seed, const String& reason)
1183  {
1184  if (debug_) log_ << "Abort: " << reason << std::endl;
1185  aborts_[reason]++;
1186  if (debug_) abort_reasons_[seed] = reason;
1187  }
1188 
1193  DoubleReal intersection_(const Feature& f1, const Feature& f2) const
1194  {
1195  //calculate the RT range sum of feature 1
1196  DoubleReal s1 = 0.0;
1197  const std::vector<ConvexHull2D>& hulls1 = f1.getConvexHulls();
1198  for (Size i = 0; i < hulls1.size(); ++i)
1199  {
1200  s1 += hulls1[i].getBoundingBox().width();
1201  }
1202 
1203  //calculate the RT range sum of feature 2
1204  DoubleReal s2 = 0.0;
1205  const std::vector<ConvexHull2D>& hulls2 = f2.getConvexHulls();
1206  for (Size j = 0; j < hulls2.size(); ++j)
1207  {
1208  s2 += hulls2[j].getBoundingBox().width();
1209  }
1210 
1211  //calculate overlap
1212  DoubleReal overlap = 0.0;
1213  for (Size i = 0; i < hulls1.size(); ++i)
1214  {
1215  DBoundingBox<2> bb1 = hulls1[i].getBoundingBox();
1216  for (Size j = 0; j < hulls2.size(); ++j)
1217  {
1218  DBoundingBox<2> bb2 = hulls2[j].getBoundingBox();
1219  if (bb1.intersects(bb2))
1220  {
1221  if (bb1.minPosition()[0] <= bb2.minPosition()[0] &&
1222  bb1.maxPosition()[0] >= bb2.maxPosition()[0]) //bb1 contains bb2
1223  {
1224  overlap += bb2.width();
1225  }
1226  else if (bb2.minPosition()[0] <= bb1.minPosition()[0] &&
1227  bb2.maxPosition()[0] >= bb1.maxPosition()[0]) //bb2 contains bb1
1228  {
1229  overlap += bb1.width();
1230  }
1231  else if (bb1.minPosition()[0] <= bb2.minPosition()[0] &&
1232  bb1.maxPosition()[0] <= bb2.maxPosition()[0]) //the end of bb1 overlaps with bb2
1233  {
1234  overlap += bb1.maxPosition()[0] - bb2.minPosition()[0];
1235  }
1236  else if (bb2.minPosition()[0] <= bb1.minPosition()[0] &&
1237  bb2.maxPosition()[0] <= bb1.maxPosition()[0]) //the end of bb2 overlaps with bb1
1238  {
1239  overlap += bb2.maxPosition()[0] - bb1.minPosition()[0];
1240  }
1241  }
1242  }
1243  }
1244 
1245  return overlap / std::min(s1, s2);
1246  }
1247 
1250  {
1251  //calculate index in the vector
1252  Size index = (Size) std::floor(mass / mass_window_width_);
1253 
1254  if (index >= isotope_distributions_.size())
1255  {
1256  throw Exception::InvalidValue(__FILE__, __LINE__, __PRETTY_FUNCTION__, "IsotopeDistribution not precalculated. Maximum allowed index is " + String(isotope_distributions_.size()), String(index));
1257  }
1258 
1259  //Return distribution
1260  return isotope_distributions_[index];
1261  }
1262 
1270  DoubleReal findBestIsotopeFit_(const Seed& center, UInt charge, IsotopePattern& best_pattern) const
1271  {
1272  if (debug_) log_ << "Testing isotope patterns for charge " << charge << ": " << std::endl;
1273  const SpectrumType& spectrum = map_[center.spectrum];
1274  const TheoreticalIsotopePattern& isotopes = getIsotopeDistribution_(spectrum[center.peak].getMZ() * charge);
1275  if (debug_) log_ << " - Seed: " << center.peak << " (mz:" << spectrum[center.peak].getMZ() << ")" << std::endl;
1276 
1277  //Find m/z boundaries of search space (linear search as this is local and we have the center already)
1278  DoubleReal mass_window = (DoubleReal)(isotopes.size() + 1) / (DoubleReal)charge;
1279  if (debug_) log_ << " - Mass window: " << mass_window << std::endl;
1280  Size end = center.peak;
1281  while (end < spectrum.size() &&
1282  spectrum[end].getMZ() < spectrum[center.peak].getMZ() + mass_window)
1283  {
1284  ++end;
1285  }
1286  --end;
1287 
1288  //search begin
1289  SignedSize begin = center.peak;
1290  while (begin >= 0 &&
1291  spectrum[begin].getMZ() > spectrum[center.peak].getMZ() - mass_window)
1292  {
1293  --begin;
1294  }
1295  ++begin;
1296  if (debug_) log_ << " - Begin: " << begin << " (mz:" << spectrum[begin].getMZ() << ")" << std::endl;
1297  if (debug_) log_ << " - End: " << end << " (mz:" << spectrum[end].getMZ() << ")" << std::endl;
1298 
1299  //fit isotope distribution to peaks
1300  DoubleReal max_score = 0.0;
1301  for (Size start = begin; start <= end; ++start)
1302  {
1303  //find isotope peaks for the current start peak
1304  Size peak_index = start;
1305  IsotopePattern pattern(isotopes.size());
1306  if (debug_) log_ << " - Fitting at " << start << " (mz:" << spectrum[start].getMZ() << ")" << std::endl;
1307  for (Size iso = 0; iso < isotopes.size(); ++iso)
1308  {
1309  DoubleReal pos = spectrum[start].getMZ() + iso / (DoubleReal)charge;
1310  findIsotope_(pos, center.spectrum, pattern, iso, peak_index);
1311  }
1312 
1313  //check if the seed is contained, otherwise abort
1314  bool seed_contained = false;
1315  for (Size iso = 0; iso < pattern.peak.size(); ++iso)
1316  {
1317  if (pattern.peak[iso] == (Int)center.peak && pattern.spectrum[iso] == center.spectrum)
1318  {
1319  seed_contained = true;
1320  break;
1321  }
1322  }
1323  if (!seed_contained)
1324  {
1325  if (debug_) log_ << " - aborting: seed is not contained!" << std::endl;
1326  continue;
1327  }
1328 
1329  DoubleReal score = isotopeScore_(isotopes, pattern, false);
1330 
1331  //check if the seed is still contained, otherwise abort
1332  seed_contained = false;
1333  for (Size iso = 0; iso < pattern.peak.size(); ++iso)
1334  {
1335  if (pattern.peak[iso] == (Int)center.peak &&
1336  pattern.spectrum[iso] == center.spectrum)
1337  {
1338  seed_contained = true;
1339  break;
1340  }
1341  }
1342  if (!seed_contained)
1343  {
1344  if (debug_) log_ << " - aborting: seed was removed during isotope fit!" << std::endl;
1345  continue;
1346  }
1347 
1348  if (debug_) log_ << " - final score: " << score << std::endl;
1349  if (score > max_score)
1350  {
1351  max_score = score;
1352  best_pattern = pattern;
1353  }
1354  }
1355  if (debug_) log_ << " - best score : " << max_score << std::endl;
1356  best_pattern.theoretical_pattern = isotopes;
1357  return max_score;
1358  }
1359 
1367  void extendMassTraces_(const IsotopePattern& pattern, MassTraces& traces, Size meta_index_overall) const
1368  {
1369  //find index of the trace with the maximum intensity
1370  DoubleReal max_int = 0.0;
1371  Size max_trace_index = 0;
1372  for (Size p = 0; p < pattern.peak.size(); ++p)
1373  {
1374  if (pattern.peak[p] < 0) continue; //skip missing and removed traces
1375  if (map_[pattern.spectrum[p]][pattern.peak[p]].getIntensity() > max_int)
1376  {
1377  max_int = map_[pattern.spectrum[p]][pattern.peak[p]].getIntensity();
1378  max_trace_index = p;
1379  }
1380  }
1381 
1382  //extend the maximum intensity trace to determine the boundaries in RT dimension
1383  Size start_index = pattern.spectrum[max_trace_index];
1384  const PeakType* start_peak = &(map_[pattern.spectrum[max_trace_index]][pattern.peak[max_trace_index]]);
1385  DoubleReal start_mz = start_peak->getMZ();
1386  DoubleReal start_rt = map_[start_index].getRT();
1387  if (debug_) log_ << " - Trace " << max_trace_index << " (maximum intensity)" << std::endl;
1388  if (debug_) log_ << " - extending from: " << map_[start_index].getRT() << " / " << start_mz << " (int: " << start_peak->getIntensity() << ")" << std::endl;
1389  //initialize the trace and extend
1390  MassTrace max_trace;
1391  max_trace.peaks.push_back(std::make_pair(start_rt, start_peak));
1392  extendMassTrace_(max_trace, start_index, start_mz, false, meta_index_overall);
1393  extendMassTrace_(max_trace, start_index, start_mz, true, meta_index_overall);
1394 
1395  DoubleReal rt_max = max_trace.peaks.back().first;
1396  DoubleReal rt_min = max_trace.peaks.begin()->first;
1397  if (debug_) log_ << " - rt bounds: " << rt_min << "-" << rt_max << std::endl;
1398  //Abort if too few peak were found
1399  if (!max_trace.isValid() || max_trace.peaks.size() < 2 * min_spectra_ - max_missing_trace_peaks_)
1400  {
1401  if (debug_) log_ << " - could not extend trace with maximum intensity => abort" << std::endl;
1402  return;
1403  }
1404  for (Size p = 0; p < pattern.peak.size(); ++p)
1405  {
1406  if (debug_) log_ << " - Trace " << p << std::endl;
1407  if (p == max_trace_index)
1408  {
1409  if (debug_) log_ << " - previously extended maximum trace" << std::endl;
1410  traces.push_back(max_trace);
1411  traces.back().theoretical_int = pattern.theoretical_pattern.intensity[p];
1412  traces.max_trace = traces.size() - 1;
1413  continue;
1414  }
1415  Seed starting_peak;
1416  starting_peak.spectrum = pattern.spectrum[p];
1417  starting_peak.peak = pattern.peak[p];
1418  if (pattern.peak[p] == -2)
1419  {
1420  if (debug_) log_ << " - removed during isotope fit" << std::endl;
1421  continue;
1422  }
1423  else if (pattern.peak[p] == -1)
1424  {
1425  if (debug_) log_ << " - missing" << std::endl;
1426  continue;
1427  }
1428  starting_peak.intensity = map_[starting_peak.spectrum][starting_peak.peak].getIntensity();
1429  if (debug_) log_ << " - trace seed: " << map_[starting_peak.spectrum].getRT() << " / " << map_[starting_peak.spectrum][starting_peak.peak].getMZ() << " (int: " << map_[starting_peak.spectrum][starting_peak.peak].getIntensity() << ")" << std::endl;
1430 
1431  //search for nearby maximum of the mass trace as the extension assumes that it starts at the maximum
1432  Size begin = std::max((Size)0, starting_peak.spectrum - min_spectra_);
1433  Size end = std::min(starting_peak.spectrum + min_spectra_, (Size)map_.size());
1434  DoubleReal mz = map_[starting_peak.spectrum][starting_peak.peak].getMZ();
1435  DoubleReal inte = map_[starting_peak.spectrum][starting_peak.peak].getIntensity();
1436  for (Size spectrum_index = begin; spectrum_index < end; ++spectrum_index)
1437  {
1438  //find better seeds (no-empty scan/low mz diff/higher intensity)
1439  SignedSize peak_index = -1;
1440  try
1441  {
1442  peak_index = map_[spectrum_index].findNearest(map_[starting_peak.spectrum][starting_peak.peak].getMZ());
1443  }
1444  catch (...) //no peaks in the spectrum
1445  {
1446  peak_index = -1;
1447  }
1448 
1449  if (peak_index < 0 ||
1450  map_[spectrum_index][peak_index].getIntensity() <= inte ||
1451  std::fabs(mz - map_[spectrum_index][peak_index].getMZ()) >= pattern_tolerance_
1452  )
1453  {
1454  continue;
1455  }
1456 
1457  starting_peak.spectrum = spectrum_index;
1458  starting_peak.peak = peak_index;
1459  inte = map_[spectrum_index][peak_index].getIntensity();
1460  }
1461  if (debug_) log_ << " - extending from: " << map_[starting_peak.spectrum].getRT() << " / " << map_[starting_peak.spectrum][starting_peak.peak].getMZ() << " (int: " << map_[starting_peak.spectrum][starting_peak.peak].getIntensity() << ")" << std::endl;
1462 
1463  //------------------------------------------------------------------
1464  //Extend seed to a mass trace
1465  MassTrace trace;
1466  const PeakType* seed = &(map_[starting_peak.spectrum][starting_peak.peak]);
1467  //initialize trace with seed data and extend
1468  trace.peaks.push_back(std::make_pair(map_[starting_peak.spectrum].getRT(), seed));
1469  extendMassTrace_(trace, starting_peak.spectrum, seed->getMZ(), false, meta_index_overall, rt_min, rt_max);
1470  extendMassTrace_(trace, starting_peak.spectrum, seed->getMZ(), true, meta_index_overall, rt_min, rt_max);
1471 
1472  //check if enough peaks were found
1473  if (!trace.isValid())
1474  {
1475  if (debug_) log_ << " - could not extend trace " << std::endl;
1476  //Missing traces in the middle of a pattern are not acceptable => fix this
1477  if (p < traces.max_trace)
1478  {
1479  traces.clear(); //remove earlier traces
1480  continue;
1481  }
1482  else if (p > traces.max_trace)
1483  {
1484  break; //no more traces are possible
1485  }
1486  }
1487  traces.push_back(trace);
1488  traces.back().theoretical_int = pattern.theoretical_pattern.intensity[p];
1489  }
1490  }
1491 
1510  void extendMassTrace_(MassTrace& trace, SignedSize spectrum_index, DoubleReal mz, bool increase_rt, Size meta_index_overall, DoubleReal min_rt = 0.0, DoubleReal max_rt = 0.0) const
1511  {
1512  //Reverse peaks if we run the method for the second time (to keep them in chronological order)
1513  if (increase_rt)
1514  {
1515  ++spectrum_index;
1516  std::reverse(trace.peaks.begin(), trace.peaks.end());
1517  }
1518  else
1519  {
1520  --spectrum_index;
1521  }
1522 
1523  //check if boundaries are set
1524  bool boundaries = false;
1525  if (max_rt != min_rt)
1526  {
1527  boundaries = true;
1528  }
1529 
1530  //Relax slope theshold if there is a hard boundary for the extension
1531  DoubleReal current_slope_bound = (1.0 + (DoubleReal)boundaries) * slope_bound_;
1532 
1533  Size delta_count = min_spectra_;
1534  std::vector<DoubleReal> deltas(delta_count - 1, 0);
1535 
1536  DoubleReal last_observed_intensity = trace.peaks.back().second->getIntensity();
1537 
1538  UInt missing_peaks = 0;
1539  Size peaks_before_extension = trace.peaks.size();
1540  String abort_reason = "";
1541 
1542  while ((!increase_rt && spectrum_index >= 0) || (increase_rt && spectrum_index < (SignedSize)map_.size()))
1543  {
1544  if (boundaries &&
1545  ((!increase_rt && map_[spectrum_index].getRT() < min_rt) ||
1546  (increase_rt && map_[spectrum_index].getRT() > max_rt))
1547  )
1548  {
1549  abort_reason = "Hit upper/lower boundary";
1550  break;
1551  }
1552 
1553  SignedSize peak_index = -1;
1554 
1555  try
1556  {
1557  peak_index = map_[spectrum_index].findNearest(mz);
1558  }
1559  catch (...) //no peaks in the spectrum
1560  {
1561  peak_index = -1;
1562  }
1563 
1564  // check if the peak is "missing"
1565  if (
1566  peak_index < 0 // no peak found
1567  || map_[spectrum_index].getFloatDataArrays()[meta_index_overall][peak_index] < 0.01 // overall score is to low
1568  || positionScore_(mz, map_[spectrum_index][peak_index].getMZ(), trace_tolerance_) == 0.0 // deviation of mz is too big
1569  )
1570  {
1571  ++missing_peaks;
1572 
1573  if (missing_peaks > max_missing_trace_peaks_)
1574  {
1575  abort_reason = "too many peaks missing";
1576  break;
1577  }
1578  }
1579  else
1580  {
1581  missing_peaks = 0;
1582 
1583  //add found peak to trace
1584  trace.peaks.push_back(std::make_pair(map_[spectrum_index].getRT(), &(map_[spectrum_index][peak_index])));
1585 
1586  //update deltas and intensities
1587  deltas.push_back((map_[spectrum_index][peak_index].getIntensity() - last_observed_intensity) / last_observed_intensity);
1588  last_observed_intensity = map_[spectrum_index][peak_index].getIntensity();
1589 
1590  //Abort if the average delta is too big (as intensity increases then)
1591  DoubleReal average_delta = std::accumulate(deltas.end() - delta_count, deltas.end(), 0.0) / (DoubleReal)delta_count;
1592  if (average_delta > current_slope_bound)
1593  {
1594  abort_reason = String("Average delta above threshold: ") + average_delta + "/" + current_slope_bound;
1595 
1596  //remove last peaks as we extended too far
1597  Size remove = std::min((Size)(trace.peaks.size() - peaks_before_extension), delta_count - 1);
1598  trace.peaks.erase(trace.peaks.end() - remove, trace.peaks.end());
1599  break;
1600  }
1601  }
1602 
1603  //increase/decrease scan index
1604  if (increase_rt) ++spectrum_index;
1605  else --spectrum_index;
1606  }
1607  if (debug_) log_ << " - Added " << (trace.peaks.size() - peaks_before_extension) << " peaks (abort: " << abort_reason << ")" << std::endl;
1608  }
1609 
1611  template <typename SpectrumType>
1612  Size nearest_(DoubleReal pos, const SpectrumType& spec, Size start) const
1613  {
1614  Size index = start;
1615  DoubleReal distance = std::fabs(pos - spec[index].getMZ());
1616  ++index;
1617  while (index < spec.size())
1618  {
1619  DoubleReal new_distance = std::fabs(pos - spec[index].getMZ());
1620  if (new_distance < distance)
1621  {
1622  distance = new_distance;
1623  ++index;
1624  }
1625  else
1626  {
1627  break;
1628  }
1629  }
1630  return --index;
1631  }
1632 
1642  void findIsotope_(DoubleReal pos, Size spectrum_index, IsotopePattern& pattern, Size pattern_index, Size& peak_index) const
1643  {
1644  if (debug_) log_ << " - Isotope " << pattern_index << ": ";
1645 
1646  DoubleReal intensity = 0.0;
1647  DoubleReal pos_score = 0.0;
1648  UInt matches = 0;
1649 
1650  //search in the center spectrum
1651  const SpectrumType& spectrum = map_[spectrum_index];
1652  peak_index = nearest_(pos, spectrum, peak_index);
1653  DoubleReal mz_score = positionScore_(pos, spectrum[peak_index].getMZ(), pattern_tolerance_);
1654  pattern.theoretical_mz[pattern_index] = pos;
1655 
1656  if (mz_score != 0.0)
1657  {
1658  if (debug_) log_ << String::number(spectrum[peak_index].getIntensity(), 1) << " ";
1659  pattern.peak[pattern_index] = peak_index;
1660  pattern.spectrum[pattern_index] = spectrum_index;
1661  intensity += spectrum[peak_index].getIntensity();
1662  pos_score += mz_score;
1663  ++matches;
1664  }
1665 
1666  //previous spectrum
1667  if (spectrum_index != 0 && !map_[spectrum_index - 1].empty())
1668  {
1669  const SpectrumType& spectrum_before = map_[spectrum_index - 1];
1670  Size index_before = spectrum_before.findNearest(pos);
1671  DoubleReal mz_score = positionScore_(pos, spectrum_before[index_before].getMZ(), pattern_tolerance_);
1672  if (mz_score != 0.0)
1673  {
1674  if (debug_) log_ << String::number(spectrum_before[index_before].getIntensity(), 1) << "b ";
1675  intensity += spectrum_before[index_before].getIntensity();
1676  pos_score += mz_score;
1677  ++matches;
1678 
1679  if (pattern.peak[pattern_index] == -1)
1680  {
1681  pattern.peak[pattern_index] = index_before;
1682  pattern.spectrum[pattern_index] = spectrum_index - 1;
1683  }
1684  }
1685  }
1686  //next spectrum
1687  if (spectrum_index != map_.size() - 1 && !map_[spectrum_index + 1].empty())
1688  {
1689  const SpectrumType& spectrum_after = map_[spectrum_index + 1];
1690  Size index_after = spectrum_after.findNearest(pos);
1691  DoubleReal mz_score = positionScore_(pos, spectrum_after[index_after].getMZ(), pattern_tolerance_);
1692  if (mz_score != 0.0)
1693  {
1694  if (debug_) log_ << String::number(spectrum_after[index_after].getIntensity(), 1) << "a ";
1695  intensity += spectrum_after[index_after].getIntensity();
1696  pos_score += mz_score;
1697  ++matches;
1698 
1699  if (pattern.peak[pattern_index] == -1)
1700  {
1701  pattern.peak[pattern_index] = index_after;
1702  pattern.spectrum[pattern_index] = spectrum_index + 1;
1703  }
1704  }
1705  }
1706  //no isotope found
1707  if (matches == 0)
1708  {
1709  if (debug_) log_ << " missing" << std::endl;
1710  pattern.peak[pattern_index] = -1;
1711  pattern.mz_score[pattern_index] = 0.0;
1712  pattern.intensity[pattern_index] = 0.0;
1713  }
1714  else
1715  {
1716  if (debug_) log_ << "=> " << intensity / matches << std::endl;
1717  pattern.mz_score[pattern_index] = pos_score / matches;
1718  pattern.intensity[pattern_index] = intensity / matches;
1719  }
1720  }
1721 
1723  DoubleReal positionScore_(DoubleReal pos1, DoubleReal pos2, DoubleReal allowed_deviation) const
1724  {
1725  DoubleReal diff = fabs(pos1 - pos2);
1726  if (diff <= 0.5 * allowed_deviation)
1727  {
1728  return 0.1 * (0.5 * allowed_deviation - diff) / (0.5 * allowed_deviation) + 0.9;
1729  }
1730  else if (diff <= allowed_deviation)
1731  {
1732  return 0.9 * (allowed_deviation - diff) / (0.5 * allowed_deviation);
1733  }
1734  return 0.0;
1735  }
1736 
1738  DoubleReal isotopeScore_(const TheoreticalIsotopePattern& isotopes, IsotopePattern& pattern, bool consider_mz_distances) const
1739  {
1740  if (debug_) log_ << " - fitting " << pattern.intensity.size() << " peaks" << std::endl;
1741  //Abort if a core peak is missing
1742  for (Size iso = 0 + isotopes.optional_begin; iso < pattern.peak.size() - isotopes.optional_end; ++iso)
1743  {
1744  if (pattern.peak[iso] == -1)
1745  {
1746  if (debug_) log_ << " - aborting: core peak is missing" << std::endl;
1747  return 0.0;
1748  }
1749  }
1750  //Find best isotope fit
1751  // - try to leave out optional isotope peaks to improve the fit
1752  // - do not allow gaps inside the pattern
1753  DoubleReal best_int_score = 0.01; //Not 0 as this would result in problems when checking for the percental improvement
1754  Size best_begin = 0;
1755  for (Size i = isotopes.optional_begin; i > 0; --i)
1756  {
1757  if (pattern.peak[i - 1] == -1)
1758  {
1759  best_begin = i;
1760  break;
1761  }
1762  }
1763  Size best_end = 0;
1764  for (Size i = isotopes.optional_end; i > 0; --i)
1765  {
1766  if (pattern.peak[pattern.peak.size() - i] == -1)
1767  {
1768  best_end = i;
1769  break;
1770  }
1771  }
1772  if (debug_) log_ << " - best_begin/end: " << best_begin << "/" << best_end << std::endl;
1773  for (Size b = best_begin; b <= isotopes.optional_begin; ++b)
1774  {
1775  for (Size e = best_end; e <= isotopes.optional_end; ++e)
1776  {
1777  //Make sure we have more than 2 peaks (unless in the first loop interation, there we allow two points)
1778  if (isotopes.size() - b - e > 2 || (b == best_begin &&
1779  e == best_end &&
1780  isotopes.size() - b - e > 1))
1781  {
1782  DoubleReal int_score = Math::pearsonCorrelationCoefficient(isotopes.intensity.begin() + b, isotopes.intensity.end() - e, pattern.intensity.begin() + b, pattern.intensity.end() - e);
1783  if (boost::math::isnan(int_score)) int_score = 0.0;
1784  if (isotopes.size() - b - e == 2 && int_score > min_isotope_fit_) int_score = min_isotope_fit_; //special case for the first loop iteration (otherwise the score is 1)
1785  if (debug_) log_ << " - fit (" << b << "/" << e << "): " << int_score;
1786  if (int_score / best_int_score >= 1.0 + optional_fit_improvement_)
1787  {
1788  if (debug_) log_ << " - new best fit ";
1789  best_int_score = int_score;
1790  best_begin = b;
1791  best_end = e;
1792  }
1793  if (debug_) log_ << std::endl;
1794  }
1795  }
1796  }
1797 
1798  //if the best fit is empty, abort
1799  if (pattern.mz_score.size() - best_begin - best_end == 0)
1800  {
1801  return 0.0;
1802  }
1803 
1804  //remove left out peaks from the beginning
1805  for (Size i = 0; i < best_begin; ++i)
1806  {
1807  pattern.peak[i] = -2;
1808  pattern.intensity[i] = 0.0;
1809  pattern.mz_score[i] = 0.0;
1810  }
1811  //remove left out peaks from the end
1812  for (Size i = 0; i < best_end; ++i)
1813  {
1814  pattern.peak[isotopes.size() - 1 - i] = -2;
1815  pattern.intensity[isotopes.size() - 1 - i] = 0.0;
1816  pattern.mz_score[isotopes.size() - 1 - i] = 0.0;
1817  }
1818  //calculate m/z score (if required)
1819  if (consider_mz_distances)
1820  {
1821  best_int_score *= std::accumulate(pattern.mz_score.begin() + best_begin, pattern.mz_score.end() - best_end, 0.0) / (pattern.mz_score.size() - best_begin - best_end);
1822  }
1823 
1824  //return final score
1825  OPENMS_POSTCONDITION(best_int_score >= 0.0, (String("Internal error: Isotope score (") + best_int_score + ") should be >=0.0").c_str())
1826  OPENMS_POSTCONDITION(best_int_score <= 1.0, (String("Internal error: Isotope score (") + best_int_score + ") should be <=1.0").c_str())
1827  return best_int_score;
1828  }
1829 
1840  DoubleReal intensityScore_(Size spectrum, Size peak) const
1841  {
1842  // calculate (half) bin numbers
1843  DoubleReal intensity = map_[spectrum][peak].getIntensity();
1844  DoubleReal rt = map_[spectrum].getRT();
1845  DoubleReal mz = map_[spectrum][peak].getMZ();
1846  DoubleReal rt_min = map_.getMinRT();
1847  DoubleReal mz_min = map_.getMinMZ();
1848  UInt rt_bin = std::min(2 * intensity_bins_ - 1, (UInt) std::floor((rt - rt_min) / intensity_rt_step_ * 2.0));
1849  UInt mz_bin = std::min(2 * intensity_bins_ - 1, (UInt) std::floor((mz - mz_min) / intensity_mz_step_ * 2.0));
1850  // determine mz bins
1851  UInt ml, mh;
1852  if (mz_bin == 0 || mz_bin == 2 * intensity_bins_ - 1)
1853  {
1854  ml = mz_bin / 2;
1855  mh = mz_bin / 2;
1856  }
1857  else if (Math::isOdd(mz_bin))
1858  {
1859  ml = mz_bin / 2;
1860  mh = mz_bin / 2 + 1;
1861  }
1862  else
1863  {
1864  ml = mz_bin / 2 - 1;
1865  mh = mz_bin / 2;
1866  }
1867  // determine rt bins
1868  UInt rl, rh;
1869  if (rt_bin == 0 || rt_bin == 2 * intensity_bins_ - 1)
1870  {
1871  rl = rt_bin / 2;
1872  rh = rt_bin / 2;
1873  }
1874  else if (Math::isOdd(rt_bin))
1875  {
1876  rl = rt_bin / 2;
1877  rh = rt_bin / 2 + 1;
1878  }
1879  else
1880  {
1881  rl = rt_bin / 2 - 1;
1882  rh = rt_bin / 2;
1883  }
1884  // calculate distances to surrounding bin centers (normalized to [0,1])
1885  DoubleReal drl = std::fabs(rt_min + (0.5 + rl) * intensity_rt_step_ - rt) / intensity_rt_step_;
1886  DoubleReal drh = std::fabs(rt_min + (0.5 + rh) * intensity_rt_step_ - rt) / intensity_rt_step_;
1887  DoubleReal dml = std::fabs(mz_min + (0.5 + ml) * intensity_mz_step_ - mz) / intensity_mz_step_;
1888  DoubleReal dmh = std::fabs(mz_min + (0.5 + mh) * intensity_mz_step_ - mz) / intensity_mz_step_;
1889  // Calculate weights for the intensity scores based on the distances to the
1890  // bin center(the nearer to better)
1891  DoubleReal d1 = std::sqrt(std::pow(1.0 - drl, 2) + std::pow(1.0 - dml, 2));
1892  DoubleReal d2 = std::sqrt(std::pow(1.0 - drh, 2) + std::pow(1.0 - dml, 2));
1893  DoubleReal d3 = std::sqrt(std::pow(1.0 - drl, 2) + std::pow(1.0 - dmh, 2));
1894  DoubleReal d4 = std::sqrt(std::pow(1.0 - drh, 2) + std::pow(1.0 - dmh, 2));
1895  DoubleReal d_sum = d1 + d2 + d3 + d4;
1896  // Final score .. intensityScore in the surrounding bins, weighted by the distance of the
1897  // bin center to the peak
1898  DoubleReal final = intensityScore_(rl, ml, intensity) * (d1 / d_sum)
1899  + intensityScore_(rh, ml, intensity) * (d2 / d_sum)
1900  + intensityScore_(rl, mh, intensity) * (d3 / d_sum)
1901  + intensityScore_(rh, mh, intensity) * (d4 / d_sum);
1902 
1903  OPENMS_POSTCONDITION(final >= 0.0, (String("Internal error: Intensity score (") + final + ") should be >=0.0").c_str())
1904  OPENMS_POSTCONDITION(final <= 1.0001, (String("Internal error: Intensity score (") + final + ") should be <=1.0").c_str())
1905  return final;
1906  }
1907 
1915  {
1916  // choose fitter
1917  if (param_.getValue("feature:rt_shape") == "asymmetric")
1918  {
1919  LOG_DEBUG << "use asymmetric rt peak shape" << std::endl;
1920  tau = -1.0;
1921  return new EGHTraceFitter<PeakType>();
1922  }
1923  else // if (param_.getValue("feature:rt_shape") == "symmetric")
1924  {
1925  LOG_DEBUG << "use symmetric rt peak shape" << std::endl;
1926  return new GaussTraceFitter<PeakType>();
1927  }
1928  }
1929 
1930  DoubleReal intensityScore_(Size rt_bin, Size mz_bin, DoubleReal intensity) const
1931  {
1932  // interpolate score value according to quantiles(20)
1933  const std::vector<DoubleReal>& quantiles20 = intensity_thresholds_[rt_bin][mz_bin];
1934  // get iterator pointing to quantile that is >= intensity
1935  std::vector<DoubleReal>::const_iterator it = std::lower_bound(quantiles20.begin(), quantiles20.end(), intensity);
1936  // bigger than the biggest value => return 1.0
1937  if (it == quantiles20.end())
1938  {
1939  return 1.0;
1940  }
1941  // interpolate inside the bin
1942  DoubleReal bin_score = 0.0;
1943  if (it == quantiles20.begin())
1944  {
1945  bin_score = 0.05 * intensity / *it;
1946  }
1947  else
1948  {
1949  // (intensity - vigintile_low) / (vigintile_high - vigintile_low)
1950  bin_score = 0.05 * (intensity - *(it - 1)) / (*it - *(it - 1));
1951  }
1952 
1953  DoubleReal final = bin_score +
1954  0.05 * ((it - quantiles20.begin()) - 1.0); // determine position of lower bound in the vector
1955 
1956  //fix numerical problems
1957  if (final < 0.0) final = 0.0;
1958  if (final > 1.0) final = 1.0;
1959 
1960  // final = 1/20 * [ index(vigintile_low) + (intensity-vigintile_low) / (vigintile_high - vigintile_low) ]
1961  return final;
1962  }
1963 
1970 
1980  const MassTraces& traces,
1981  MassTraces& new_traces)
1982  {
1983  DoubleReal low_bound = fitter->getLowerRTBound();
1984  DoubleReal high_bound = fitter->getUpperRTBound();
1985 
1986  if (debug_) log_ << " => RT bounds: " << low_bound << " - " << high_bound << std::endl;
1987  for (Size t = 0; t < traces.size(); ++t)
1988  {
1989  const MassTrace& trace = traces[t];
1990  if (debug_) log_ << " - Trace " << t << ": (" << trace.theoretical_int << ")" << std::endl;
1991 
1992  MassTrace new_trace;
1993  //compute average relative deviation and correlation
1994  DoubleReal deviation = 0.0;
1995  std::vector<DoubleReal> v_theo, v_real;
1996  for (Size k = 0; k < trace.peaks.size(); ++k)
1997  {
1998  //consider peaks when inside RT bounds only
1999  if (trace.peaks[k].first >= low_bound && trace.peaks[k].first <= high_bound)
2000  {
2001  new_trace.peaks.push_back(trace.peaks[k]);
2002 
2003  DoubleReal theo = traces.baseline + fitter->computeTheoretical(trace, k);
2004 
2005  v_theo.push_back(theo);
2006  DoubleReal real = trace.peaks[k].second->getIntensity();
2007  v_real.push_back(real);
2008  deviation += std::fabs(real - theo) / theo;
2009  }
2010  }
2011  DoubleReal fit_score = 0.0;
2012  DoubleReal correlation = 0.0;
2013  DoubleReal final_score = 0.0;
2014  if (!new_trace.peaks.empty())
2015  {
2016  fit_score = deviation / new_trace.peaks.size();
2017  correlation = std::max(0.0, Math::pearsonCorrelationCoefficient(v_theo.begin(), v_theo.end(), v_real.begin(), v_real.end()));
2018  final_score = std::sqrt(correlation * std::max(0.0, 1.0 - fit_score));
2019  }
2020  if (debug_) log_ << " - peaks: " << new_trace.peaks.size() << " / " << trace.peaks.size() << " - relative deviation: " << fit_score << " - correlation: " << correlation << " - final score: " << correlation << std::endl;
2021  //remove badly fitting traces
2022  if (!new_trace.isValid() || final_score < min_trace_score_)
2023  {
2024  if (t < traces.max_trace)
2025  {
2026  new_traces = MassTraces();
2027  if (debug_) log_ << " - removed this and previous traces due to bad fit" << std::endl;
2028  new_traces.clear(); //remove earlier traces
2029  continue;
2030  }
2031  else if (t == traces.max_trace)
2032  {
2033  new_traces = MassTraces();
2034  if (debug_) log_ << " - aborting (max trace was removed)" << std::endl;
2035  break;
2036  }
2037  else if (t > traces.max_trace)
2038  {
2039  if (debug_) log_ << " - removed due to bad fit => omitting the rest" << std::endl;
2040  break; //no more traces are possible
2041  }
2042  }
2043  //add new trace
2044  else
2045  {
2046  new_trace.theoretical_int = trace.theoretical_int;
2047  new_traces.push_back(new_trace);
2048  if (t == traces.max_trace)
2049  {
2050  new_traces.max_trace = new_traces.size() - 1;
2051  }
2052  }
2053  }
2054  new_traces.baseline = traces.baseline;
2055  }
2056 
2081  MassTraces& feature_traces,
2082  const DoubleReal& seed_mz, const DoubleReal& min_feature_score,
2083  String& error_msg, DoubleReal& fit_score, DoubleReal& correlation, DoubleReal& final_score)
2084  {
2085  bool feature_ok = true;
2086 
2087  //check if the sigma fit was ok (if it is larger than 'max_rt_span')
2088  if (feature_ok)
2089  {
2090  // 5.0 * sigma > max_rt_span_ * region_rt_span
2091  if (fitter->checkMaximalRTSpan(max_rt_span_))
2092  {
2093  feature_ok = false;
2094  error_msg = "Invalid fit: Fitted model is bigger than 'max_rt_span'";
2095  }
2096  }
2097 
2098  //check if the feature is valid
2099  if (!feature_traces.isValid(seed_mz, trace_tolerance_))
2100  {
2101  feature_ok = false;
2102  error_msg = "Invalid feature after fit - too few traces or peaks left";
2103  }
2104 
2105  //check if x0 is inside feature bounds
2106  if (feature_ok)
2107  {
2108  std::pair<DoubleReal, DoubleReal> rt_bounds = feature_traces.getRTBounds();
2109  if (fitter->getCenter() < rt_bounds.first || fitter->getCenter() > rt_bounds.second)
2110  {
2111  feature_ok = false;
2112  error_msg = "Invalid fit: Center outside of feature bounds";
2113  }
2114  }
2115 
2116  //check if the remaining traces fill out at least 'min_rt_span' of the RT span
2117  if (feature_ok)
2118  {
2119  std::pair<DoubleReal, DoubleReal> rt_bounds = feature_traces.getRTBounds();
2120  if (fitter->checkMinimalRTSpan(rt_bounds, min_rt_span_))
2121  {
2122  feature_ok = false;
2123  error_msg = "Invalid fit: Less than 'min_rt_span' left after fit";
2124  }
2125  }
2126 
2127  //check if feature quality is high enough (average relative deviation and correlation of the whole feature)
2128  if (feature_ok)
2129  {
2130  std::vector<DoubleReal> v_theo, v_real;
2131  DoubleReal deviation = 0.0;
2132  for (Size t = 0; t < feature_traces.size(); ++t)
2133  {
2134  MassTrace& trace = feature_traces[t];
2135  for (Size k = 0; k < trace.peaks.size(); ++k)
2136  {
2137  // was DoubleReal theo = new_traces.baseline + trace.theoretical_int * height * exp(-0.5 * pow(trace.peaks[k].first - x0, 2) / pow(sigma, 2) );
2138  DoubleReal theo = feature_traces.baseline + fitter->computeTheoretical(trace, k);
2139  v_theo.push_back(theo);
2140  DoubleReal real = trace.peaks[k].second->getIntensity();
2141  v_real.push_back(real);
2142  deviation += std::fabs(real - theo) / theo;
2143  }
2144  }
2145  fit_score = std::max(0.0, 1.0 - (deviation / feature_traces.getPeakCount()));
2146  correlation = std::max(0.0, Math::pearsonCorrelationCoefficient(v_theo.begin(), v_theo.end(), v_real.begin(), v_real.end()));
2147  final_score = std::sqrt(correlation * fit_score);
2148 
2149  if (final_score < min_feature_score)
2150  {
2151  feature_ok = false;
2152  error_msg = "Feature quality too low after fit";
2153  }
2154 
2155  //quality output
2156  if (debug_)
2157  {
2158  log_ << "Quality estimation:" << std::endl;
2159  log_ << " - relative deviation: " << fit_score << std::endl;
2160  log_ << " - correlation: " << correlation << std::endl;
2161  log_ << " => final score: " << final_score << std::endl;
2162  }
2163  }
2164 
2165  return feature_ok;
2166  }
2167 
2182  const MassTraces& traces,
2183  const MassTraces& new_traces,
2184  bool feature_ok, const String error_msg, const DoubleReal final_score, const Int plot_nr, const PeakType& peak,
2185  const String path = "debug/features/")
2186  {
2187 
2188  DoubleReal pseudo_rt_shift = param_.getValue("debug:pseudo_rt_shift");
2189  TextFile tf;
2190  //gnuplot script
2191  String script = String("plot \"") + path + plot_nr + ".dta\" title 'before fit (RT: " + String::number(fitter->getCenter(), 2) + " m/z: " + String::number(peak.getMZ(), 4) + ")' with points 1";
2192  //feature before fit
2193  for (Size k = 0; k < traces.size(); ++k)
2194  {
2195  for (Size j = 0; j < traces[k].peaks.size(); ++j)
2196  {
2197  tf.push_back(String(pseudo_rt_shift * k + traces[k].peaks[j].first) + "\t" + traces[k].peaks[j].second->getIntensity());
2198  }
2199  }
2200  tf.store(path + plot_nr + ".dta");
2201  //fitted feature
2202  if (new_traces.getPeakCount() != 0)
2203  {
2204  tf.clear();
2205  for (Size k = 0; k < new_traces.size(); ++k)
2206  {
2207  for (Size j = 0; j < new_traces[k].peaks.size(); ++j)
2208  {
2209  tf.push_back(String(pseudo_rt_shift * k + new_traces[k].peaks[j].first) + "\t" + new_traces[k].peaks[j].second->getIntensity());
2210  }
2211  }
2212 
2213  tf.store(path + plot_nr + "_cropped.dta");
2214  script = script + ", \"" + path + plot_nr + "_cropped.dta\" title 'feature ";
2215 
2216  if (!feature_ok)
2217  {
2218  script = script + " - " + error_msg;
2219  }
2220  else
2221  {
2222  script = script + (features_->size() + 1) + " (score: " + String::number(final_score, 3) + ")";
2223  }
2224  script = script + "' with points 3";
2225  }
2226  //fitted functions
2227  tf.clear();
2228  for (Size k = 0; k < traces.size(); ++k)
2229  {
2230  char fun = 'f';
2231  fun += (char)k;
2232  tf.push_back(fitter->getGnuplotFormula(traces[k], fun, traces.baseline, pseudo_rt_shift * k));
2233  //tf.push_back(String(fun)+"(x)= " + traces.baseline + " + " + fitter->getGnuplotFormula(traces[k], pseudo_rt_shift * k));
2234  script = script + ", " + fun + "(x) title 'Trace " + k + " (m/z: " + String::number(traces[k].getAvgMZ(), 4) + ")'";
2235  }
2236 
2237  //output
2238  tf.push_back("set xlabel \"pseudo RT (mass traces side-by-side)\"");
2239  tf.push_back("set ylabel \"intensity\"");
2240  tf.push_back("set samples 1000");
2241  tf.push_back(script);
2242  tf.push_back("pause -1");
2243  tf.store(path + plot_nr + ".plot");
2244  }
2245 
2247 private:
2248 
2253  };
2254 
2255 } // namespace OpenMS
2256 
2257 #endif // OPENMS_TRANSFORMATIONS_FEATUREFINDER_FEATUREFINDERALGORITHMPICKED_H
void sortByMZ()
Sort features by m/z position.
Definition: FeatureMap.h:322
const double k
bool encloses(const PositionType &position) const
Checks whether this range contains a certain point.
Definition: DBoundingBox.h:159
std::vector< std::vector< std::vector< DoubleReal > > > intensity_thresholds_
Precalculated intensity 20-quantiles (binned)
Definition: FeatureFinderAlgorithmPicked.h:1154
bool debug_
debug flag
Definition: FeatureFinderAlgorithmPicked.h:1119
bool isOdd(UInt x)
Returns true if the given interger is odd.
Definition: MathFunctions.h:124
void setMetaValue(const String &name, const DataValue &value)
sets the DataValue corresponding to a name
virtual DoubleReal getFeatureIntensityContribution()=0
MapType::SpectrumType SpectrumType
Definition: FeatureFinderAlgorithmPicked.h:93
const ChargeType & getCharge() const
Non-mutable access to charge state.
Size size() const
returns the size of the distribtion which is the number of isotopes in the distribution ...
virtual String getGnuplotFormula(FeatureFinderAlgorithmPickedHelperStructs::MassTrace< PeakType > const &trace, const char function_name, const DoubleReal baseline, const DoubleReal rt_shift)=0
ContainerType::iterator Iterator
Definition: IsotopeDistribution.h:70
FeatureMapType * features_
Output data pointer.
Definition: FeatureFinderAlgorithm.h:144
float Real
Real type.
Definition: Types.h:109
Param defaults_
Container for default parameters. This member should be filled in the constructor of derived classes!...
Definition: DefaultParamHandler.h:155
DoubleReal positionScore_(DoubleReal pos1, DoubleReal pos2, DoubleReal allowed_deviation) const
Calculates a score between 0 and 1 for the m/z deviation of two peaks.
Definition: FeatureFinderAlgorithmPicked.h:1723
DoubleReal intersection_(const Feature &f1, const Feature &f2) const
Definition: FeatureFinderAlgorithmPicked.h:1193
void setValue(const String &key, const DataValue &value, const String &description="", const StringList &tags=StringList())
Sets a value.
TraceFitter< PeakType > * chooseTraceFitter_(double &tau)
Choose a the best trace fitter for the current mass traces based on the user parameter (symmetric...
Definition: FeatureFinderAlgorithmPicked.h:1914
A more convenient string class.
Definition: String.h:56
DoubleReal intensity_percentage_
Isotope pattern intensity contribution of required peaks.
Definition: FeatureFinderAlgorithmPicked.h:1134
DoubleReal intensityScore_(Size spectrum, Size peak) const
Compute the intensity score for the peak peak in spectrum spectrum.
Definition: FeatureFinderAlgorithmPicked.h:1840
Size spectrum
Spectrum index.
Definition: FeatureFinderAlgorithmPickedHelperStructs.h:65
DoubleReal min_isotope_fit_
Mimimum isotope pattern fit for a feature.
Definition: FeatureFinderAlgorithmPicked.h:1139
FeatureFinder * ff_
Pointer to the calling FeatureFinder that is used to access the feature flags.
Definition: FeatureFinderAlgorithm.h:147
A 2-dimensional raw data point or peak.
Definition: Peak2D.h:55
Size size() const
Definition: MSExperiment.h:117
void setMZ(CoordinateType coordinate)
Mutable access to the m/z coordinate (index 1)
Definition: Peak2D.h:197
FeatureFinderAlgorithm for picked peaks.
Definition: FeatureFinderAlgorithmPicked.h:84
QualityType getOverallQuality() const
Non-mutable access to the overall quality.
#define LOG_INFO
Macro if a information, e.g. a status should be reported.
Definition: LogStream.h:455
void cropFeature_(TraceFitter< PeakType > *fitter, const MassTraces &traces, MassTraces &new_traces)
Creates new mass traces new_traces based on the fitting result and the original traces traces...
Definition: FeatureFinderAlgorithmPicked.h:1979
Size max_trace
Maximum intensity trace.
Definition: FeatureFinderAlgorithmPickedHelperStructs.h:269
Size findNearest(CoordinateType mz) const
Binary search for the peak nearest to a specific m/z.
Definition: MSSpectrum.h:476
void extendMassTraces_(const IsotopePattern &pattern, MassTraces &traces, Size meta_index_overall) const
Definition: FeatureFinderAlgorithmPicked.h:1367
Param param_
Container for current parameters.
Definition: DefaultParamHandler.h:148
ConvexHull2D & getConvexHull() const
Returns the overall convex hull of the feature (calculated from the convex hulls of the mass traces) ...
void findIsotope_(DoubleReal pos, Size spectrum_index, IsotopePattern &pattern, Size pattern_index, Size &peak_index) const
Searches for an isotopic peak in the current spectrum and the adjacent spectra.
Definition: FeatureFinderAlgorithmPicked.h:1642
IntensityType getIntensity() const
Definition: Peak2D.h:161
DoubleReal mass_window_width_
Width of the isotope pattern mass bins.
Definition: FeatureFinderAlgorithmPicked.h:1137
Helper struct for a collection of mass traces used in FeatureFinderAlgorithmPicked.
Definition: FeatureFinderAlgorithmPickedHelperStructs.h:153
std::ofstream log_
Output stream for log/debug info.
Definition: FeatureFinderAlgorithmPicked.h:1117
DoubleReal pattern_tolerance_
Stores mass_trace:mz_tolerance.
Definition: FeatureFinderAlgorithmPicked.h:1129
void store(const String &filename, const FeatureMap<> &feature_map)
stores the map feature_map in file with name filename.
Abstract fitter for RT profile fitting.
Definition: TraceFitter.h:62
DoubleReal min_rt_span_
Minimum RT range that has to be left after the fit.
Definition: FeatureFinderAlgorithmPicked.h:1141
bool isValid(DoubleReal seed_mz, DoubleReal trace_tolerance)
Checks if still valid (seed still contained and enough traces)
Definition: FeatureFinderAlgorithmPickedHelperStructs.h:174
A container for features.
Definition: FeatureMap.h:111
Iterator end()
Definition: IsotopeDistribution.h:187
#define OPENMS_POSTCONDITION(condition, message)
Postcondition macro.
Definition: Macros.h:114
unsigned int UInt
Unsigned integer type.
Definition: Types.h:92
virtual bool checkMinimalRTSpan(const std::pair< DoubleReal, DoubleReal > &rt_bounds, const DoubleReal min_rt_span)=0
Size optional_begin
Number of optional peaks at the beginning of the pattern.
Definition: FeatureFinderAlgorithmPickedHelperStructs.h:282
Helper structure for a theoretical isotope pattern used in FeatureFinderAlgorithmPicked.
Definition: FeatureFinderAlgorithmPickedHelperStructs.h:277
Abstract base class for FeatureFinder algorithms.
Definition: FeatureFinderAlgorithm.h:74
const TheoreticalIsotopePattern & getIsotopeDistribution_(DoubleReal mass) const
Returns the isotope distribution for a certain mass window.
Definition: FeatureFinderAlgorithmPicked.h:1249
std::vector< DoubleReal > mz_score
m/z score of peak (0 if peak index is -1 or -2)
Definition: FeatureFinderAlgorithmPickedHelperStructs.h:309
Forward iterator for an area of peaks in an experiment.
Definition: AreaIterator.h:58
Isotope distribution class.
Definition: IsotopeDistribution.h:61
Size size() const
Returns the size.
Definition: FeatureFinderAlgorithmPickedHelperStructs.h:290
std::map< Seed, String > abort_reasons_
Array of abort reasons.
Definition: FeatureFinderAlgorithmPicked.h:1123
FeatureFinderAlgorithm< PeakType, FeatureType >::FeatureMapType FeatureMapType
Definition: FeatureFinderAlgorithmPicked.h:92
bool encloses(DoubleReal rt, DoubleReal mz) const
Returns if the mass trace convex hulls of the feature enclose the position specified by rt and mz...
Size setUniqueId()
Assigns a new, valid unique id. Always returns 1.
Definition: UniqueIdInterface.h:150
static String number(DoubleReal d, UInt n)
returns a string for d with exactly n decimal places
virtual DoubleReal getLowerRTBound() const =0
void setSectionDescription(const String &key, const String &description)
Sets a description for an existing section.
String reported_mz_
The mass type that is reported for features. &#39;maximum&#39; returns the m/z value of the highest mass trac...
Definition: FeatureFinderAlgorithmPicked.h:1144
DoubleReal trace_tolerance_
Stores isotopic_pattern:mz_tolerance.
Definition: FeatureFinderAlgorithmPicked.h:1130
ptrdiff_t SignedSize
Signed Size type e.g. used as pointer difference.
Definition: Types.h:151
bool intersects(const DBoundingBox &bounding_box) const
Definition: DBoundingBox.h:180
std::vector< DoubleReal > intensity
Peak intensity (0 if peak index is -1 or -2)
Definition: FeatureFinderAlgorithmPickedHelperStructs.h:307
Size getPeakCount() const
Returns the peak count of all traces.
Definition: FeatureFinderAlgorithmPickedHelperStructs.h:163
void resize(Size s)
Definition: MSExperiment.h:122
virtual DoubleReal computeTheoretical(const FeatureFinderAlgorithmPickedHelperStructs::MassTrace< PeakType > &trace, Size k)=0
#define IF_MASTERTHREAD
Definition: OpenSwathChromatogramExtractor.C:48
void setIsotopeDistribution(const IsotopeDistribution &isotopes)
sets the isotope distribution of the element
CoordinateType getMaxMZ() const
returns the maximal m/z value
Definition: MSExperiment.h:527
const double c
std::vector< DoubleReal > theoretical_mz
Theoretical m/z value of the isotope peak.
Definition: FeatureFinderAlgorithmPickedHelperStructs.h:311
void setParameters(const Param &param)
Sets the parameters.
DoubleReal intensity_rt_step_
RT bin width.
Definition: FeatureFinderAlgorithmPicked.h:1150
ConstAreaIterator areaEndConst() const
Returns an non-mutable invalid area iterator marking the end of an area.
Definition: MSExperiment.h:337
#define LOG_DEBUG
Macro for general debugging information.
Definition: LogStream.h:459
The purpose of this struct is to provide definitions of classes and typedefs which are used throughou...
Definition: FeatureFinderDefs.h:51
CoordinateType getMinMZ() const
returns the minimal m/z value
Definition: MSExperiment.h:521
SpectrumType::FloatDataArrays FloatDataArrays
Definition: FeatureFinderAlgorithmPicked.h:94
CoordinateType getMinRT() const
returns the minimal retention time value
Definition: MSExperiment.h:533
const double PROTON_MASS_U
void setIntensity(IntensityType intensity)
Non-mutable access to the data point intensity (height)
Definition: Peak2D.h:167
std::vector< FloatDataArray > FloatDataArrays
Float data array vector type.
Definition: MSSpectrum.h:113
TheoreticalIsotopePattern theoretical_pattern
Theoretical isotope pattern.
Definition: FeatureFinderAlgorithmPickedHelperStructs.h:313
void store(const String &filename)
Writes the data to a file.
static const String getProductName()
Definition: FeatureFinderAlgorithmPicked.h:1108
Fitter for RT profiles using a gaussian background model.
Definition: GaussTraceFitter.h:53
std::vector< Size > spectrum
Spectrum index (undefined if peak index is -1 or -2)
Definition: FeatureFinderAlgorithmPickedHelperStructs.h:305
Size nearest_(DoubleReal pos, const SpectrumType &spec, Size start) const
Returns the index of the peak nearest to m/z pos in spectrum spec (linear search starting from index ...
Definition: FeatureFinderAlgorithmPicked.h:1612
bool isValid() const
Checks if this Trace is valid (has more than 2 points)
Definition: FeatureFinderAlgorithmPickedHelperStructs.h:142
File adapter for MzML files.
Definition: MzMLFile.h:58
Representation of an element.
Definition: Element.h:54
UInt intensity_bins_
Number of bins (in RT and MZ) for intensity significance estimation.
Definition: FeatureFinderAlgorithmPicked.h:1138
DoubleReal max_rt_span_
Maximum RT range the model is allowed to span.
Definition: FeatureFinderAlgorithmPicked.h:1142
void writeFeatureDebugInfo_(TraceFitter< PeakType > *fitter, const MassTraces &traces, const MassTraces &new_traces, bool feature_ok, const String error_msg, const DoubleReal final_score, const Int plot_nr, const PeakType &peak, const String path="debug/features/")
Creates several files containing plots and viewable data of the fitted mass trace.
Definition: FeatureFinderAlgorithmPicked.h:2181
void endProgress() const
Ends the progress display.
FeatureFinderAlgorithmPickedHelperStructs::IsotopePattern IsotopePattern
Definition: FeatureFinderAlgorithmPicked.h:107
CoordinateType width() const
Returns the width of the area i.e. the difference of dimension zero (X).
Definition: DIntervalBase.h:292
const DataValue & getValue(const String &key) const
Returns a value of a parameter.
void setRT(CoordinateType coordinate)
Mutable access to the RT coordinate (index 0)
Definition: Peak2D.h:209
static DoubleReal pearsonCorrelationCoefficient(IteratorType1 begin_a, IteratorType1 end_a, IteratorType2 begin_b, IteratorType2 end_b)
Calculates the Pearson correlation coefficient for the values in [begin_a, end_a) and [begin_b...
Definition: StatisticFunctions.h:285
ConstAreaIterator areaBeginConst(CoordinateType min_rt, CoordinateType max_rt, CoordinateType min_mz, CoordinateType max_mz) const
Returns a non-mutable area iterator for area.
Definition: MSExperiment.h:327
DoubleReal theoretical_int
Theoretical intensity value (scaled to [0,1])
Definition: FeatureFinderAlgorithmPickedHelperStructs.h:91
const std::vector< Feature > & getSubordinates() const
immutable access to subordinate features
void setMaxIsotope(Size max_isotope)
sets the maximal isotope with max_isotope
const Element * getElement(const String &name) const
FeatureFinderAlgorithmPicked()
default constructor
Definition: FeatureFinderAlgorithmPicked.h:111
DoubleReal optional_fit_improvement_
Minimal imrovment for leaving out optional isotope.
Definition: FeatureFinderAlgorithmPicked.h:1136
const std::vector< ConvexHull2D > & getConvexHulls() const
Non-mutable access to the convex hulls.
virtual bool checkMaximalRTSpan(const DoubleReal max_rt_span)=0
bool checkFeatureQuality_(TraceFitter< PeakType > *fitter, MassTraces &feature_traces, const DoubleReal &seed_mz, const DoubleReal &min_feature_score, String &error_msg, DoubleReal &fit_score, DoubleReal &correlation, DoubleReal &final_score)
Checks the feature based on different score thresholds and model constraints.
Definition: FeatureFinderAlgorithmPicked.h:2080
Iterator begin()
Definition: IsotopeDistribution.h:185
virtual DoubleReal getHeight() const =0
virtual DoubleReal getFWHM() const =0
void store(const String &filename, const MapType &map) const
Stores a map in a MzML file.
Definition: MzMLFile.h:126
void trimLeft(DoubleReal cutoff)
Trims the left side of the isotope distribution to isotopes with a significant contribution.
DoubleReal max_feature_intersection_
Maximum allowed feature intersection (if larger, that one of the feature is removed) ...
Definition: FeatureFinderAlgorithmPicked.h:1143
DoubleReal min_trace_score_
Minimum quality of a traces.
Definition: FeatureFinderAlgorithmPicked.h:1140
virtual DoubleReal getUpperRTBound() const =0
static StringList create(const String &list, const char splitter= ',')
Returns a list that is created by splitting the given (comma-separated) string (String are not trimme...
void estimateFromPeptideWeight(double average_weight)
Estimate Peptide Isotopedistribution from weight and number of isotopes that should be reported...
void trimRight(DoubleReal cutoff)
Trims the right side of the isotope distribution to isotopes with a significant contribution.
DBoundingBox< 2 > getBoundingBox() const
returns the bounding box of the feature hull points
DoubleReal findBestIsotopeFit_(const Seed &center, UInt charge, IsotopePattern &best_pattern) const
Finds the best fitting position of the isotopic pattern estimate defined by center.
Definition: FeatureFinderAlgorithmPicked.h:1270
DoubleReal baseline
Estimated baseline in the region of the feature (used for the fit)
Definition: FeatureFinderAlgorithmPickedHelperStructs.h:271
An LC-MS feature.
Definition: Feature.h:66
FeatureFinderAlgorithm< PeakType, FeatureType >::MapType MapType
Definition: FeatureFinderAlgorithmPicked.h:91
Helper structure for a found isotope pattern used in FeatureFinderAlgorithmPicked.
Definition: FeatureFinderAlgorithmPickedHelperStructs.h:300
FeatureFinderAlgorithmPickedHelperStructs::Seed Seed
Definition: FeatureFinderAlgorithmPicked.h:103
void setWidth(WidthType fwhm)
Set the width of the feature (FWHM)
DoubleReal isotopeScore_(const TheoreticalIsotopePattern &isotopes, IsotopePattern &pattern, bool consider_mz_distances) const
Calculates a score between 0 and 1 for the correlation between theoretical and found isotope pattern...
Definition: FeatureFinderAlgorithmPicked.h:1738
void setValidStrings(const String &key, const std::vector< String > &strings)
Sets the valid strings for the parameter key.
virtual void fit(FeatureFinderAlgorithmPickedHelperStructs::MassTraces< PeakType > &traces)=0
virtual void run()
Main method for actual FeatureFinder.
Definition: FeatureFinderAlgorithmPicked.h:218
Management and storage of parameters / INI files.
Definition: Param.h:69
CoordinateType getMZ() const
Returns the m/z coordinate (index 1)
Definition: Peak2D.h:191
Invalid value exception.
Definition: Exception.h:336
DoubleReal intensity_percentage_optional_
Isotope pattern intensity contribution of optional peaks.
Definition: FeatureFinderAlgorithmPicked.h:1135
Representation of a mass spectrometry experiment.
Definition: MSExperiment.h:68
void swapFeaturesOnly(FeatureMap &from)
Swaps the feature content (plus its range information) of this map with the content of from...
Definition: FeatureMap.h:377
UInt max_missing_trace_peaks_
Stores mass_trace:max_missing.
Definition: FeatureFinderAlgorithmPicked.h:1132
std::map< String, UInt > aborts_
Array of abort reasons.
Definition: FeatureFinderAlgorithmPicked.h:1121
DoubleReal slope_bound_
Max slope of mass trace intensities.
Definition: FeatureFinderAlgorithmPicked.h:1133
PositionType const & minPosition() const
Accessor to minimum position.
Definition: DIntervalBase.h:121
FeatureFinderAlgorithmPickedHelperStructs::MassTraces< PeakType > MassTraces
Definition: FeatureFinderAlgorithmPicked.h:105
This class provides Input/Output functionality for feature maps.
Definition: FeatureXMLFile.h:59
void setOverallQuality(QualityType q)
Set the overall quality.
std::vector< TheoreticalIsotopePattern > isotope_distributions_
Vector of precalculated isotope distributions for several mass windows.
Definition: FeatureFinderAlgorithmPicked.h:1158
static const ElementDB * getInstance()
returns a pointer to the singleton instance of the element db
Definition: ElementDB.h:78
CoordinateType getMaxRT() const
returns the maximal retention time value
Definition: MSExperiment.h:539
void setMinInt(const String &key, Int min)
Sets the minimum value for the integer or integer list parameter key.
size_t Size
Size type e.g. used as variable which can hold result of size()
Definition: Types.h:144
std::vector< std::pair< DoubleReal, const PeakType * > > peaks
Contained peaks (pair of RT and pointer to peak)
Definition: FeatureFinderAlgorithmPickedHelperStructs.h:94
UInt min_spectra_
Number of spectra that have to show the same mass (for finding a mass trace)
Definition: FeatureFinderAlgorithmPicked.h:1131
Size optional_end
Number of optional peaks at the end of the pattern.
Definition: FeatureFinderAlgorithmPickedHelperStructs.h:284
DoubleReal intensityScore_(Size rt_bin, Size mz_bin, DoubleReal intensity) const
Definition: FeatureFinderAlgorithmPicked.h:1930
void startProgress(SignedSize begin, SignedSize end, const String &label) const
Initializes the progress display.
bool empty() const
Definition: MSExperiment.h:127
void extendMassTrace_(MassTrace &trace, SignedSize spectrum_index, DoubleReal mz, bool increase_rt, Size meta_index_overall, DoubleReal min_rt=0.0, DoubleReal max_rt=0.0) const
Extends a single mass trace in one RT direction.
Definition: FeatureFinderAlgorithmPicked.h:1510
MapType map_
editable copy of the map
Definition: FeatureFinderAlgorithmPicked.h:1115
void setCharge(const ChargeType &ch)
Set charge state.
Size trimmed_left
The number of isotopes trimmed on the left side. This is needed to reconstruct the monoisotopic peak...
Definition: FeatureFinderAlgorithmPickedHelperStructs.h:288
FeatureFinderAlgorithmPickedHelperStructs::MassTrace< PeakType > MassTrace
Definition: FeatureFinderAlgorithmPicked.h:104
void setProgress(SignedSize value) const
Sets the current progress.
PositionType const & maxPosition() const
Accessor to maximum position.
Definition: DIntervalBase.h:127
Size getTheoreticalmaxPosition() const
Returns the theoretical maximum trace index.
Definition: FeatureFinderAlgorithmPickedHelperStructs.h:195
A RT Profile fitter using an Exponential Gaussian Hybrid background model.
Definition: EGHTraceFitter.h:58
virtual void updateMembers_()
This method is used to update extra member variables at the end of the setParameters() method...
Definition: FeatureFinderAlgorithmPicked.h:1161
A D-dimensional bounding box.
Definition: DBoundingBox.h:51
DoubleReal getRT() const
Definition: MSSpectrum.h:215
FeatureMapType seeds_
User-specified seed list.
Definition: FeatureFinderAlgorithmPicked.h:1125
void updateBaseline()
Sets the baseline to the lowest contained peak of the trace.
Definition: FeatureFinderAlgorithmPickedHelperStructs.h:216
virtual DoubleReal getCenter() const =0
DoubleReal intensity_mz_step_
m/z bin width
Definition: FeatureFinderAlgorithmPicked.h:1152
int Int
Signed integer type.
Definition: Types.h:100
void set(const ContainerType &distribution)
overwrites the container which holds the distribution using distribution
void setMinFloat(const String &key, DoubleReal min)
Sets the minimum value for the floating point or floating point list parameter key.
static FeatureFinderAlgorithm< PeakType, FeatureType > * create()
Definition: FeatureFinderAlgorithmPicked.h:1103
virtual void setSeeds(const FeatureMapType &seeds)
Sets a reference to the calling FeatureFinder.
Definition: FeatureFinderAlgorithmPicked.h:212
std::vector< DoubleReal > intensity
Vector of intensity contributions.
Definition: FeatureFinderAlgorithmPickedHelperStructs.h:280
void abort_(const Seed &seed, const String &reason)
Writes the abort reason to the log file and counts occurrences for each reason.
Definition: FeatureFinderAlgorithmPicked.h:1182
void sortByIntensity(bool reverse=false)
Sorts the peaks according to ascending intensity.
Definition: FeatureMap.h:297
std::pair< DoubleReal, DoubleReal > getRTBounds() const
Returns the RT boundaries of the mass traces.
Definition: FeatureFinderAlgorithmPickedHelperStructs.h:246
Size peak
Peak index.
Definition: FeatureFinderAlgorithmPickedHelperStructs.h:67
std::vector< SignedSize > peak
Peak index (-1 if peak was not found, -2 if it was removed to improve the isotope fit) ...
Definition: FeatureFinderAlgorithmPickedHelperStructs.h:303
This class provides some basic file handling methods for text files.
Definition: TextFile.h:47
FeatureFinderAlgorithmPickedHelperStructs::TheoreticalIsotopePattern TheoreticalIsotopePattern
Definition: FeatureFinderAlgorithmPicked.h:106
void defaultsToParam_()
Updates the parameters after the defaults have been set in the constructor.
Real intensity
Intensity.
Definition: FeatureFinderAlgorithmPickedHelperStructs.h:69
FeatureFinderAlgorithmPicked & operator=(const FeatureFinderAlgorithmPicked &)
Not implemented.
void setMaxFloat(const String &key, DoubleReal max)
Sets the maximum value for the floating point or floating point list parameter key.
Helper struct for mass traces used in FeatureFinderAlgorithmPicked.
Definition: FeatureFinderAlgorithmPickedHelperStructs.h:83
Helper structure for seeds used in FeatureFinderAlgorithmPicked.
Definition: FeatureFinderAlgorithmPickedHelperStructs.h:62
double DoubleReal
Double-precision real type.
Definition: Types.h:118

OpenMS / TOPP release 1.11.1 Documentation generated on Thu Nov 14 2013 11:19:13 using doxygen 1.8.5