Class that implements a suffix array for a String. It can be used to find peptide candidates for an MS spectrum.
#include <OpenMS/DATASTRUCTURES/SuffixArrayTrypticCompressed.h>
Public Member Functions | |
SuffixArrayTrypticCompressed (const String &st, const String &filename, const WeightWrapper::WEIGHTMODE weight_mode=WeightWrapper::MONO) | |
constructor taking the string and the filename for writing or reading | |
SuffixArrayTrypticCompressed (const SuffixArrayTrypticCompressed &sa) | |
copy constructor | |
virtual | ~SuffixArrayTrypticCompressed () |
destructor | |
String | toString () |
transforms suffix array to a printable String | |
void | findSpec (std::vector< std::vector< std::pair< std::pair< SignedSize, SignedSize >, DoubleReal > > > &candidates, const std::vector< DoubleReal > &spec) |
the function that will find all peptide candidates for a given spectrum | |
bool | save (const String &file_name) |
saves the suffix array to disc | |
bool | open (const String &file_name) |
opens the suffix array | |
void | setTolerance (DoubleReal t) |
setter for tolerance | |
DoubleReal | getTolerance () const |
getter for tolerance | |
bool | isDigestingEnd (const char aa1, const char aa2) const |
returns whether the enzyme cuts after the first character (aa1) | |
void | setTags (const std::vector< String > &tags) |
setter for tags | |
const std::vector< String > & | getTags () |
getter for tags | |
void | setUseTags (bool use_tags) |
setter for use_tags | |
bool | getUseTags () |
getter for use_tags | |
void | setNumberOfModifications (Size number_of_mods) |
setter for number of modifications | |
Size | getNumberOfModifications () |
getter for number of modifications | |
void | printStatistic () |
outputs statistics | |
Public Member Functions inherited from SuffixArray | |
SuffixArray (const String &st, const String &filename) | |
constructor taking the string and the filename for writing or reading | |
SuffixArray (const SuffixArray &sa) | |
copy constructor | |
virtual | ~SuffixArray ()=0 |
destructor | |
SuffixArray () | |
constructor | |
Public Member Functions inherited from WeightWrapper | |
WeightWrapper () | |
constructor | |
WeightWrapper (const WEIGHTMODE weight_mode) | |
constructor | |
virtual | ~WeightWrapper () |
destructor | |
WeightWrapper (const WeightWrapper &source) | |
copy constructor | |
void | setWeightMode (const WEIGHTMODE mode) |
Sets the weight mode (MONO or AVERAGE) | |
WEIGHTMODE | getWeightMode () const |
Gets the weight mode (MONO or AVERAGE) | |
DoubleReal | getWeight (const AASequence &aa) const |
returns the monoisotopic or average weight, depending on the weight mode | |
DoubleReal | getWeight (const EmpiricalFormula &ef) const |
returns the monoisotopic or average weight, depending on the weight mode | |
DoubleReal | getWeight (const Residue &r, Residue::ResidueType res_type=Residue::Full) const |
returns the monoisotopic or average weight, depending on the weight mode | |
Protected Member Functions | |
SuffixArrayTrypticCompressed () | |
constructor | |
SignedSize | getNextSep_ (const SignedSize p) const |
gets the index of the next separator for a given index | |
SignedSize | getLCP_ (const std::pair< SignedSize, SignedSize > &last_point, const std::pair< SignedSize, SignedSize > &current_point) |
gets the lcp for two strings described as pairs of ints | |
SignedSize | findFirst_ (const std::vector< DoubleReal > &spec, DoubleReal &m) |
binary search for finding the index of the first element of the spectrum that matches the desired mass within the tolerance. | |
SignedSize | findFirst_ (const std::vector< DoubleReal > &spec, DoubleReal &m, SignedSize start, SignedSize end) |
binary search for finding the index of the first element of the spectrum that matches the desired mass within the tolerance. It searches recursively. | |
void | parseTree_ (SignedSize start_index, SignedSize stop_index, SignedSize depth, SignedSize walked_in, SignedSize edge_len, std::vector< std::pair< SignedSize, SignedSize > > &out_number, std::vector< std::pair< SignedSize, SignedSize > > &edge_length, std::vector< SignedSize > &leafe_depth) |
treats the suffix array as a tree and parses the tree using post-order traversal. This is realised by a recursive algorithm. | |
bool | hasMoreOutgoings_ (SignedSize start_index, SignedSize stop_index, SignedSize walked_in) |
indicates if a node during traversal has more outgoing edges | |
Protected Attributes | |
const String & | s_ |
the string with which the suffix array is built | |
DoubleReal | tol_ |
mass tolerance for finding candidates | |
std::vector< std::pair < SignedSize, SignedSize > > | indices_ |
vector of pairs of ints describing all relevant suffixes | |
std::vector< SignedSize > | lcp_ |
vector of ints with lcp values | |
std::vector< SignedSize > | skip_ |
vector of ints with skip values | |
DoubleReal | masse_ [256] |
mass table | |
Size | number_of_modifications_ |
number of allowed modifications | |
std::vector< String > | tags_ |
all given tags | |
bool | use_tags_ |
indicates whether tags are used or not | |
SignedSize | progress_ |
Additional Inherited Members | |
Public Types inherited from WeightWrapper | |
enum | WEIGHTMODE { AVERAGE = 0, MONO, SIZE_OF_WEIGHTMODE } |
Class that implements a suffix array for a String. It can be used to find peptide candidates for an MS spectrum.
This class implements a suffix array that is used only for finding peptide candidates for a given MS spectrum within a certain mass tolerance. The suffix array can be saved to disc for reuse, so it has to be built only once. It consists of a vector of pairs of ints (one pair per suffix), a vector of LCP values and a so-called skip vector. Only the suffixes that satisfy the function isDigestingEnd are created. In addition, a suffix does not run to the end of the string but only to the next occurrence of the separator ($). Thus only the relevant suffixes are stored, which reduces the space used.
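The selection of suffixes described above can be sketched without OpenMS. This is a minimal illustration under stated assumptions, not the library's implementation: `collectTrypticSuffixes` is a hypothetical helper, the local `isDigestingEnd` assumes the usual trypsin rule (cleave after K or R unless the next residue is P), a suffix is also assumed to start directly after each separator, and each pair is assumed to hold (start, length).

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Assumed trypsin rule: the enzyme cuts after K or R unless P follows.
bool isDigestingEnd(char aa1, char aa2)
{
  return (aa1 == 'K' || aa1 == 'R') && aa2 != 'P';
}

// Hypothetical helper: collects (start, length) pairs for every suffix that
// begins right after a separator or a cleavage site. Each suffix ends just
// before the next '$' instead of running to the end of the string.
std::vector<std::pair<std::ptrdiff_t, std::ptrdiff_t>>
collectTrypticSuffixes(const std::string& s)
{
  std::vector<std::pair<std::ptrdiff_t, std::ptrdiff_t>> indices;
  for (std::size_t i = 1; i < s.size(); ++i)
  {
    if (s[i] == '$') continue; // a separator never starts a suffix
    if (s[i - 1] == '$' || isDigestingEnd(s[i - 1], s[i]))
    {
      std::size_t next_sep = s.find('$', i);
      if (next_sep == std::string::npos) next_sep = s.size();
      indices.emplace_back(static_cast<std::ptrdiff_t>(i),
                           static_cast<std::ptrdiff_t>(next_sep - i));
    }
  }
  return indices;
}
```

For the input `$AAKGGRP$KPA$` this keeps three suffixes (`AAKGGRP`, `GGRP`, `KPA`) instead of one per character, which is where the space saving comes from.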
SuffixArrayTrypticCompressed (const String &st, const String &filename, const WeightWrapper::WEIGHTMODE weight_mode=WeightWrapper::MONO) | |
constructor taking the string and the filename for writing or reading
st | the string as const reference with which the suffix array will be built |
filename | the filename for writing or reading the suffix array |
weight_mode | if monoisotopic weight should not be used, this parameter can be set to AVERAGE |
Exception::InvalidValue | if the string does not start with the separator character ($) |
FileNotFound | is thrown if the given file was not found |
The constructor checks whether a suffix array with the given filename (without file extension) exists. If it does, it is simply loaded; otherwise it is built. Building the suffix array consists of several steps. First, all indices for a digesting enzyme (defined by the function isDigestingEnd) are created as a vector of SignedSize pairs. After all relevant indices have been created, they are sorted and the LCP and skip vectors are created.
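The sorting and LCP steps can be illustrated with a small standalone sketch. The names `Index`, `compareSuffixes`, `sortSuffixes` and `buildLcp` are hypothetical, pairs are assumed to be (start, length), and `lcp[i]` is assumed to hold the longest common prefix of entry `i` and its predecessor. The skip vector is left out because this page does not define its contents.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

using Index = std::pair<std::ptrdiff_t, std::ptrdiff_t>; // assumed (start, length)

// Lexicographic comparison of the two substrings of s described by a and b.
int compareSuffixes(const std::string& s, const Index& a, const Index& b)
{
  return s.compare(static_cast<std::size_t>(a.first),
                   static_cast<std::size_t>(a.second),
                   s,
                   static_cast<std::size_t>(b.first),
                   static_cast<std::size_t>(b.second));
}

// Sorts the index pairs by the substrings they describe.
void sortSuffixes(const std::string& s, std::vector<Index>& indices)
{
  std::sort(indices.begin(), indices.end(),
            [&s](const Index& a, const Index& b)
            { return compareSuffixes(s, a, b) < 0; });
}

// lcp[i] = length of the common prefix of entries i-1 and i; lcp[0] = 0.
std::vector<std::ptrdiff_t> buildLcp(const std::string& s,
                                     const std::vector<Index>& indices)
{
  std::vector<std::ptrdiff_t> lcp(indices.size(), 0);
  for (std::size_t i = 1; i < indices.size(); ++i)
  {
    const Index& a = indices[i - 1];
    const Index& b = indices[i];
    const std::ptrdiff_t n = std::min(a.second, b.second);
    std::ptrdiff_t k = 0;
    while (k < n && s[static_cast<std::size_t>(a.first + k)] ==
                    s[static_cast<std::size_t>(b.first + k)])
    {
      ++k;
    }
    lcp[i] = k;
  }
  return lcp;
}
```

Comparing through `std::string::compare` on the original string avoids copying substrings, which matters when the sequence database is large.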
SuffixArrayTrypticCompressed (const SuffixArrayTrypticCompressed &sa) | |
copy constructor
virtual | ~SuffixArrayTrypticCompressed () |
virtual |
destructor
SuffixArrayTrypticCompressed () |
protected |
constructor
SignedSize | findFirst_ (const std::vector< DoubleReal > &spec, DoubleReal &m) |
protected |
binary search for finding the index of the first element of the spectrum that matches the desired mass within the tolerance.
spec | const reference to spectrum |
m | mass |
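SignedSize | findFirst_ (const std::vector< DoubleReal > &spec, DoubleReal &m, SignedSize start, SignedSize end) |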
protected |
binary search for finding the index of the first element of the spectrum that matches the desired mass within the tolerance. It searches recursively.
spec | const reference to spectrum |
m | mass |
start | start index |
end | end index |
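A recursive binary search of this kind might look as follows. This is a sketch, not the OpenMS code: `findFirst` is a hypothetical free function, the tolerance is passed explicitly rather than read from `tol_`, and returning -1 when nothing matches is an assumed convention.

```cpp
#include <cstddef>
#include <vector>

// Returns the index of the first element of the ascending vector `spec`
// that lies in [m - tol, m + tol], or -1 if no element matches.
std::ptrdiff_t findFirst(const std::vector<double>& spec, double m, double tol,
                         std::ptrdiff_t start, std::ptrdiff_t end)
{
  if (start > end) return -1;
  const std::ptrdiff_t mid = start + (end - start) / 2;
  const double v = spec[static_cast<std::size_t>(mid)];
  if (v < m - tol) return findFirst(spec, m, tol, mid + 1, end);
  if (v > m + tol) return findFirst(spec, m, tol, start, mid - 1);
  // spec[mid] matches, but an earlier match may exist further left.
  const std::ptrdiff_t left = findFirst(spec, m, tol, start, mid - 1);
  return left != -1 ? left : mid;
}

// Convenience overload that searches the whole spectrum.
std::ptrdiff_t findFirst(const std::vector<double>& spec, double m, double tol)
{
  return findFirst(spec, m, tol, 0,
                   static_cast<std::ptrdiff_t>(spec.size()) - 1);
}
```

Continuing to the left after a hit is what makes this return the first match rather than an arbitrary one.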
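void | findSpec (std::vector< std::vector< std::pair< std::pair< SignedSize, SignedSize >, DoubleReal > > > &candidates, const std::vector< DoubleReal > &spec) |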
virtual |
the function that will find all peptide candidates for a given spectrum
spec | const reference of DoubleReal vector describing the spectrum |
candidates | output parameter which contains the candidates of the masses given in spec |
InvalidValue | if the spectrum is not sorted in ascending order |
For every mass in the spectrum, all candidates are returned, each described as a pair of ints. All masses are searched at the same time in a single suffix array traversal. To accelerate the traversal, the skip and LCP tables are used. The mass is not recalculated for each entry; instead it is updated during the traversal using a stack data structure.
Implements SuffixArray.
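To make the idea of an incrementally updated mass concrete, here is a deliberately naive baseline. It is not the traversal described above: it does not use the skip or LCP tables or a stack, it ignores modifications, tags and the check for a digesting end, and it returns only (start, length) pairs. The function name `findCandidatesNaive`, the (start, length) layout and the caller-supplied mass table are all assumptions made for illustration.

```cpp
#include <algorithm>
#include <array>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

using Index = std::pair<std::ptrdiff_t, std::ptrdiff_t>; // assumed (start, length)

// For each mass in `spec`, collects all prefixes of the given suffixes whose
// summed residue mass lies within `tol` of that mass.
// Assumes all entries of mass_table are non-negative.
std::vector<std::vector<Index>> findCandidatesNaive(
    const std::string& s, const std::vector<Index>& suffixes,
    const std::array<double, 256>& mass_table,
    const std::vector<double>& spec, double tol)
{
  if (!std::is_sorted(spec.begin(), spec.end()))
  {
    throw std::invalid_argument("spectrum must be sorted in ascending order");
  }
  std::vector<std::vector<Index>> candidates(spec.size());
  if (spec.empty()) return candidates;

  for (const Index& suffix : suffixes)
  {
    double mass = 0.0; // running mass, extended by one residue per step
    for (std::ptrdiff_t len = 1; len <= suffix.second; ++len)
    {
      const std::size_t pos = static_cast<std::size_t>(suffix.first + len - 1);
      mass += mass_table[static_cast<unsigned char>(s[pos])];
      if (mass > spec.back() + tol) break; // longer prefixes only get heavier
      // First spectrum entry that is >= mass - tol.
      auto it = std::lower_bound(spec.begin(), spec.end(), mass - tol);
      for (; it != spec.end() && *it <= mass + tol; ++it)
      {
        candidates[static_cast<std::size_t>(it - spec.begin())]
            .emplace_back(suffix.first, len);
      }
    }
  }
  return candidates;
}
```

The baseline recomputes the running mass from scratch for every suffix. Sharing that work between suffixes with a common prefix is, per the description above, what the LCP and skip tables are for.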
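SignedSize | getLCP_ (const std::pair< SignedSize, SignedSize > &last_point, const std::pair< SignedSize, SignedSize > &current_point) |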
protected |
gets the lcp for two strings described as pairs of ints
last_point | const pair of ints describing a substring |
current_point | const pair of ints describing a substring |
SignedSize | getNextSep_ (const SignedSize p) const |
protected |
gets the index of the next separator for a given index
p | const SignedSize describing a position in the string |
Size | getNumberOfModifications () |
virtual |
getter for number of modifications
Implements SuffixArray.
const std::vector< String > & | getTags () |
virtual |
getter for tags
DoubleReal | getTolerance () const |
virtual |
getter for tolerance
bool | getUseTags () |
virtual |
getter for use_tags
bool | hasMoreOutgoings_ (SignedSize start_index, SignedSize stop_index, SignedSize walked_in) |
protected |
indicates if a node during traversal has more outgoing edges
start_index | SignedSize describing the start index in indices_ vector |
stop_index | SignedSize describing the end index in indices_ vector |
walked_in | how many characters we have seen from the root to the current position |
bool | isDigestingEnd (const char aa1, const char aa2) const |
virtual |
returns whether the enzyme cuts after the first character (aa1)
aa1 | const char as first amino acid |
aa2 | const char as second amino acid |
Implements SuffixArray.
bool | open (const String &file_name) |
virtual |
opens the suffix array
file_name | const reference string describing the filename |
FileNotFound |
Implements SuffixArray.
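void | parseTree_ (SignedSize start_index, SignedSize stop_index, SignedSize depth, SignedSize walked_in, SignedSize edge_len, std::vector< std::pair< SignedSize, SignedSize > > &out_number, std::vector< std::pair< SignedSize, SignedSize > > &edge_length, std::vector< SignedSize > &leafe_depth) |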
protected |
treats the suffix array as a tree and parses the tree using post-order traversal. This is realised by a recursive algorithm.
start_index | SignedSize describing the start index in indices_ vector |
stop_index | SignedSize describing the end index in indices_ vector |
depth | the depth of the traversal at the current position |
walked_in | how many characters we have seen from the root to the current position |
edge_len | how many characters we have seen from the last node to the current position |
out_number | reference to vector of pairs of ints. For every node it will be filled with the number of outgoing edges of the node, depending on its depth |
edge_length | will be filled with the edge lengths, depending on depth |
leafe_depth | will be filled with the depth of every leaf |
void | printStatistic () |
virtual |
outputs statistics
Implements SuffixArray.
bool | save (const String &file_name) |
virtual |
saves the suffix array to disc
file_name | const reference string describing the filename |
Exception::UnableToCreateFile | if the file could not be created (e.g. if you lack write permissions) |
Implements SuffixArray.
void | setNumberOfModifications (Size number_of_mods) |
virtual |
setter for number of modifications
void | setTags (const std::vector< String > &tags) |
virtual |
setter for tags
tags | const vector of strings containing tags of length 3 each |
InvalidValue | if at least one tag does not have size of 3 |
Implements SuffixArray.
void | setTolerance (DoubleReal t) |
virtual |
setter for tolerance
t | DoubleReal with tolerance |
Exception::InvalidValue | if tolerance is negative |
Implements SuffixArray.
void | setUseTags (bool use_tags) |
virtual |
setter for use_tags
use_tags | indicating whether tags should be used or not |
Implements SuffixArray.
String | toString () |
virtual |
transforms suffix array to a printable String
Implements SuffixArray.
std::vector< std::pair < SignedSize, SignedSize > > | indices_ |
protected |
vector of pairs of ints describing all relevant suffixes
std::vector< SignedSize > | lcp_ |
protected |
vector of ints with lcp values
DoubleReal | masse_ [256] |
protected |
mass table
Size | number_of_modifications_ |
protected |
number of allowed modifications
SignedSize | progress_ |
protected |
const String & | s_ |
protected |
the string with which the suffix array is built
std::vector< SignedSize > | skip_ |
protected |
vector of ints with skip values
std::vector< String > | tags_ |
protected |
all given tags
DoubleReal | tol_ |
protected |
mass tolerance for finding candidates
bool | use_tags_ |
protected |
indicates whether tags are used or not
OpenMS / TOPP release 1.11.1 | Documentation generated on Thu Nov 14 2013 11:19:29 using doxygen 1.8.5 |