Class FMIndex
java.lang.Object
com.compomics.util.experiment.personalization.ExperimentObject
com.compomics.util.experiment.identification.protein_inference.fm_index.FMIndex
- All Implemented Interfaces:
FastaMapper, ProteinDetailsProvider, SequenceProvider, Serializable
public class FMIndex extends ExperimentObject implements FastaMapper, SequenceProvider, ProteinDetailsProvider
The FM index.
- Author:
- Dominik Kopczynski, Marc Vaudel
- See Also:
- Serialized Form
-
Field Summary
Fields
Modifier and Type, Field
    Description
static char DELIMITER
    Character used as the delimiter between protein sequences.
ArrayList<Integer> indexStringLengths
    Length of the indexed string (all concatenated protein sequences).
ArrayList<int[]> lessTablesPrimary
    Less table for doing an update step according to the LF step.
ArrayList<int[]> lessTablesReversed
    Less table for doing an update step according to the LF step reversed.
int maxPTMsPerPeptide
    Maximal number of PTMs per peptide.
ArrayList<WaveletTree> occurrenceTablesPrimary
    Wavelet tree for storing the Burrows-Wheeler transform.
ArrayList<WaveletTree> occurrenceTablesReversed
    Wavelet tree for storing the Burrows-Wheeler transform reversed.
boolean onlyTrypticPeptides
    Flag for only considering tryptic digested peptides.
static char SENTINEL
    Sentinel character necessary for computation of the suffix array.
-
Constructor Summary
Constructors
Constructor
    Description
FMIndex()
    Empty default constructor.
FMIndex(File fastaFile, FastaParameters fastaParameters, WaitingHandler waitingHandler, boolean displayProgress, PeptideVariantsParameters peptideVariantsPreferences, SearchParameters searchParameters)
    Constructor.
FMIndex(File fastaFile, FastaParameters fastaParameters, WaitingHandler waitingHandler, boolean displayProgress, IdentificationParameters identificationParameters)
    Constructor.
-
Method Summary
Modifier and Type, Method
    Description
void addModificationPattern(Modification modification)
    Adds a modification pattern for bitwise pattern search.
boolean checkModificationPattern(PeptideProteinMapping peptideProteinMapping)
    Checks whether a peptide-protein mapping should be discarded due to a modification pattern conflict.
int[] computeMappingRanges(double mass)
    Compute mapping ranges.
double computeMassTolerance(double currentTollerance, double refMass)
    Compute the inverse mass value.
double computeMassValue(double currentMass, double refMass)
    Compute the mass value.
Collection<String> getAccessions()
    Returns all accessions loaded in the provider.
long getAllocatedBytes()
    Computes the number of allocated bytes.
HashSet<String> getDecoyAccessions()
    Returns the decoy accessions.
String getDescription(String accession)
    Returns the description of the protein with the given accession.
String getGeneName(String accession)
    Returns the gene name for the given protein.
String getHeader(String proteinAccession)
    Returns the FASTA header of the protein as found in the FASTA file.
ProteinDatabase getProteinDatabase(String accession)
    Returns the protein database for the given protein.
Integer getProteinEvidence(String accession)
    Returns an integer representing the protein evidence level as indexed by UniProt.
ArrayList<PeptideProteinMapping> getProteinMapping(Tag tag, SequenceMatchingParameters sequenceMatchingPreferences)
    Returns the protein mappings for the given peptide sequence.
ArrayList<PeptideProteinMapping> getProteinMapping(String peptide, SequenceMatchingParameters sequenceMatchingParameters)
    Returns the protein mapping in the FASTA file loaded in the sequence factory for the given peptide sequence in a map: peptide sequence found in the FASTA file | protein accession | list of indexes of the peptide sequence on the protein sequence.
ArrayList<PeptideProteinMapping> getProteinMappingWithoutVariants(Tag tag, SequenceMatchingParameters sequenceMatchingPreferences, int indexPart)
    Mapping tags against proteome without variants.
ArrayList<PeptideProteinMapping> getProteinMappingWithoutVariants(String peptide, SequenceMatchingParameters seqMatchPref, int indexPart)
    Exact mapping of peptides against the proteome.
ArrayList<PeptideProteinMapping> getProteinMappingWithVariants(Tag tag, SequenceMatchingParameters sequenceMatchingPreferences, int indexPart)
    Mapping tags against proteome with variants.
ArrayList<PeptideProteinMapping> getProteinMappingWithVariantsFixed(String peptide, SequenceMatchingParameters seqMatchPref, int indexPart)
    Variant-tolerant mapping of peptides against the proteome.
ArrayList<PeptideProteinMapping> getProteinMappingWithVariantsGeneric(String peptide, SequenceMatchingParameters seqMatchPref, int indexPart)
    Variant-tolerant mapping of peptides against the proteome.
ArrayList<PeptideProteinMapping> getProteinMappingWithVariantsSpecific(String peptide, SequenceMatchingParameters seqMatchPref, int indexPart)
    Variant-tolerant mapping of peptides against the proteome.
String getSequence(String proteinAccession)
    Returns the protein sequence for the given accession.
String getSimpleDescription(String accession)
    Returns the simple description of the protein with the given accession.
String getSubsequence(String accession, int start, int end)
    Returns the subsequence of the sequence of a given protein.
String getTaxonomy(String accession)
    Returns the taxonomy for the given protein.
void mapTagToProteinTermini(MatrixContent cell, double combinationMass, boolean CTermDirection, LinkedList<MatrixContent>[] matrix, int k, int leftIndex, int rightIndex)
double pepMass(String peptide)
    Computes the mass of a peptide.
char prefixCharacter(String proteinAccession, int index)
    Backward propagation of the BWT to get the previous character.
void reconstructFasta(File file)
    Reconstructs the FASTA file stored in the index.
char suffixCharacter(String proteinAccession, int index, int length)
    Forward propagation of the BWT to get the n'th consecutive character.
boolean withinMassTolerance(double mass, int numX)
    Looks up whether the mass can be described as a combination of numX different amino acids.
Methods inherited from class com.compomics.util.experiment.personalization.ExperimentObject
addUrParam, asLong, clearParametersMap, getId, getUrParam, getUrParams, removeUrParam, setId, setUrParams
-
Field Details
-
maxPTMsPerPeptide
public int maxPTMsPerPeptide
Maximal number of PTMs per peptide.
-
onlyTrypticPeptides
public boolean onlyTrypticPeptides
Flag for only considering tryptic digested peptides.
-
DELIMITER
public static char DELIMITER
Character used as the delimiter between protein sequences.
-
SENTINEL
public static char SENTINEL
Sentinel character necessary for computation of the suffix array.
-
occurrenceTablesPrimary
Wavelet tree for storing the Burrows-Wheeler transform.
-
occurrenceTablesReversed
Wavelet tree for storing the Burrows-Wheeler transform reversed.
-
lessTablesPrimary
Less table for doing an update step according to the LF step.
-
lessTablesReversed
Less table for doing an update step according to the LF step reversed.
-
indexStringLengths
Length of the indexed string (all concatenated protein sequences).
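
The fields above are the usual ingredients of an FM index: a Burrows-Wheeler transform of the concatenated sequences, an occurrence (rank) structure over it, and a less table. The following is a minimal, self-contained sketch of how these pieces can work together in a backward search. It is not the FMIndex implementation: the class name FmIndexSketch, the values '$' and '/' for the sentinel and delimiter, the naive suffix sorting, and the linear rank scan (used here in place of a wavelet tree) are all assumptions made for illustration.

```java
import java.util.Arrays;

public class FmIndexSketch {
    // Assumed sentinel value; it only needs to sort before every other character.
    static final char SENTINEL = '$';

    final char[] bwt;                 // Burrows-Wheeler transform of text + SENTINEL
    final int[] less = new int[128];  // less[c] = number of characters in the text smaller than c

    public FmIndexSketch(String text) {
        final String t = text + SENTINEL;
        int n = t.length();
        Integer[] suffixArray = new Integer[n];
        for (int i = 0; i < n; i++) {
            suffixArray[i] = i;
        }
        // Naive suffix sorting, fine for a toy example only.
        Arrays.sort(suffixArray, (a, b) -> t.substring(a).compareTo(t.substring(b)));
        bwt = new char[n];
        int[] counts = new int[128];
        for (int i = 0; i < n; i++) {
            // The BWT character is the one preceding each sorted suffix.
            bwt[i] = t.charAt((suffixArray[i] + n - 1) % n);
            counts[bwt[i]]++;
        }
        for (int c = 1; c < 128; c++) {
            less[c] = less[c - 1] + counts[c - 1];
        }
    }

    // Occurrences of c in bwt[0..i); a wavelet tree would answer this without a scan.
    int rank(char c, int i) {
        int r = 0;
        for (int j = 0; j < i; j++) {
            if (bwt[j] == c) {
                r++;
            }
        }
        return r;
    }

    // Backward search: one LF update of the interval per pattern character, right to left.
    public int count(String pattern) {
        int left = 0;
        int right = bwt.length;
        for (int k = pattern.length() - 1; k >= 0 && left < right; k--) {
            char c = pattern.charAt(k);
            left = less[c] + rank(c, left);
            right = less[c] + rank(c, right);
        }
        return right - left;
    }

    public static void main(String[] args) {
        // Two toy protein sequences joined by an assumed '/' delimiter.
        FmIndexSketch index = new FmIndexSketch("TESTPEPTIDE/PEPTIDEK");
        System.out.println(index.count("PEPTIDE")); // prints 2
    }
}
```

In this sketch the interval [left, right) shrinks with every character, so the cost of a query depends on the pattern length and the rank structure rather than on the size of the indexed text.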
-
-
Constructor Details
-
FMIndex
public FMIndex()
Empty default constructor.
-
FMIndex
public FMIndex(File fastaFile, FastaParameters fastaParameters, WaitingHandler waitingHandler, boolean displayProgress, PeptideVariantsParameters peptideVariantsPreferences, SearchParameters searchParameters) throws IOException, OutOfMemoryError, RuntimeException, IllegalArgumentException
Constructor. If modification settings are provided the index will contain modification information, ignored if null.
- Parameters:
fastaFile - the FASTA file to index
fastaParameters - the parameters for the FASTA file parsing
waitingHandler - the waiting handler
displayProgress - if true, the progress is displayed
peptideVariantsPreferences - contains all parameters for variants
searchParameters - the search parameters
- Throws:
IOException - exception thrown if an error occurs while iterating the FASTA file
OutOfMemoryError
RuntimeException
IllegalArgumentException
-
FMIndex
public FMIndex(File fastaFile, FastaParameters fastaParameters, WaitingHandler waitingHandler, boolean displayProgress, IdentificationParameters identificationParameters) throws IOException, OutOfMemoryError, RuntimeException, IllegalArgumentException
Constructor. If modification settings are provided the index will contain modification information, ignored if null.
- Parameters:
fastaFile - the FASTA file to index
fastaParameters - the parameters for the FASTA file parsing
waitingHandler - the waiting handler
displayProgress - if true, the progress is displayed
identificationParameters - contains all identification parameters
- Throws:
IOException - exception thrown if an error occurs while iterating the FASTA file
OutOfMemoryError
RuntimeException
IllegalArgumentException
-
-
Method Details
-
computeMassValue
public double computeMassValue(double currentMass, double refMass)
Compute the mass value.
- Parameters:
currentMass - the current mass
refMass - the reference mass
- Returns:
- the mass value
-
computeMassTolerance
public double computeMassTolerance(double currentTollerance, double refMass)
Compute the inverse mass value.
- Parameters:
currentTollerance - the current mass
refMass - the reference mass
- Returns:
- the inverse mass value
-
addModificationPattern
Adds a modification pattern for bitwise pattern search.
- Parameters:
modification - modification object
-
checkModificationPattern
Checks whether a peptide-protein mapping should be discarded due to a modification pattern conflict.
- Parameters:
peptideProteinMapping - the peptide protein mapping
- Returns:
- either yes or no
-
computeMappingRanges
public int[] computeMappingRanges(double mass)
Compute mapping ranges.
- Parameters:
mass - the mass
- Returns:
- the mapping ranges
-
getAllocatedBytes
public long getAllocatedBytes()
Computes the number of allocated bytes.
- Returns:
- allocated bytes
-
getProteinMapping
public ArrayList<PeptideProteinMapping> getProteinMapping(String peptide, SequenceMatchingParameters sequenceMatchingParameters)
Description copied from interface: FastaMapper
Returns the protein mapping in the FASTA file loaded in the sequence factory for the given peptide sequence in a map: peptide sequence found in the FASTA file | protein accession | list of indexes of the peptide sequence on the protein sequence. 0 is the first amino acid.
- Specified by:
getProteinMapping in interface FastaMapper
- Parameters:
peptide - the peptide sequence
sequenceMatchingParameters - the sequence matching preferences
- Returns:
- the peptide to protein mapping: peptide sequence > protein accession > index in the protein. An empty map if not found.
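
To make the shape of the result concrete, here is a small sketch that produces (protein accession, 0-based index) pairs for a peptide using a plain string scan. It deliberately swaps the FM index for a naive `String.indexOf` loop and ignores the sequence matching parameters; the class NaiveMappingSketch and its nested Hit type are hypothetical stand-ins, not part of the library.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class NaiveMappingSketch {

    /** Hypothetical stand-in for one mapping entry: accession plus 0-based start index. */
    public static final class Hit {
        public final String accession;
        public final int index;

        public Hit(String accession, int index) {
            this.accession = accession;
            this.index = index;
        }
    }

    /** Finds every occurrence of the peptide in every protein (exact matching only). */
    public static List<Hit> map(String peptide, Map<String, String> proteins) {
        List<Hit> hits = new ArrayList<>();
        if (peptide.isEmpty()) {
            return hits;
        }
        for (Map.Entry<String, String> entry : proteins.entrySet()) {
            String sequence = entry.getValue();
            // Advance by one after each hit so overlapping occurrences are also reported.
            for (int i = sequence.indexOf(peptide); i >= 0; i = sequence.indexOf(peptide, i + 1)) {
                hits.add(new Hit(entry.getKey(), i));
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        Map<String, String> proteins = new LinkedHashMap<>();
        proteins.put("P1", "TESTPEPTIDE");
        proteins.put("P2", "PEPTIDEKPEPTIDE");
        for (Hit hit : map("PEPTIDE", proteins)) {
            System.out.println(hit.accession + " @ " + hit.index);
        }
    }
}
```

An empty list is returned when the peptide is not found, mirroring the "empty if not found" behaviour described above.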
-
getProteinMappingWithoutVariants
public ArrayList<PeptideProteinMapping> getProteinMappingWithoutVariants(String peptide, SequenceMatchingParameters seqMatchPref, int indexPart)
Exact mapping of peptides against the proteome.
- Parameters:
peptide - the peptide
seqMatchPref - the sequence matching preferences
indexPart - the index part
- Returns:
- the mapping
-
getProteinMappingWithVariantsFixed
public ArrayList<PeptideProteinMapping> getProteinMappingWithVariantsFixed(String peptide, SequenceMatchingParameters seqMatchPref, int indexPart)
Variant-tolerant mapping of peptides against the proteome.
- Parameters:
peptide - the peptide
seqMatchPref - the sequence matching preferences
indexPart - the index part
- Returns:
- the mapping
-
getProteinMappingWithVariantsGeneric
public ArrayList<PeptideProteinMapping> getProteinMappingWithVariantsGeneric(String peptide, SequenceMatchingParameters seqMatchPref, int indexPart)
Variant-tolerant mapping of peptides against the proteome.
- Parameters:
peptide - the peptide
seqMatchPref - the sequence matching preferences
indexPart - the index part
- Returns:
- the mapping
-
getProteinMappingWithVariantsSpecific
public ArrayList<PeptideProteinMapping> getProteinMappingWithVariantsSpecific(String peptide, SequenceMatchingParameters seqMatchPref, int indexPart)
Variant-tolerant mapping of peptides against the proteome.
- Parameters:
peptide - the peptide
seqMatchPref - the sequence matching preferences
indexPart - the index part
- Returns:
- the mapping
-
pepMass
Computes the mass of a peptide.
- Parameters:
peptide - the peptide
- Returns:
- the peptide mass
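
As an illustration of what a peptide mass computation typically involves, the sketch below sums standard monoisotopic residue masses and adds one water. Whether pepMass itself adds the water or accounts for modifications is not stated on this page, so treat both as assumptions; the class PeptideMassSketch is hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

public class PeptideMassSketch {
    // Monoisotopic mass of water, added once per peptide (assumption about the convention).
    static final double WATER = 18.010565;

    // Standard monoisotopic residue masses in Dalton for the 20 common amino acids.
    static final Map<Character, Double> RESIDUE = new HashMap<>();
    static {
        RESIDUE.put('G', 57.021464);  RESIDUE.put('A', 71.037114);
        RESIDUE.put('S', 87.032028);  RESIDUE.put('P', 97.052764);
        RESIDUE.put('V', 99.068414);  RESIDUE.put('T', 101.047679);
        RESIDUE.put('C', 103.009185); RESIDUE.put('L', 113.084064);
        RESIDUE.put('I', 113.084064); RESIDUE.put('N', 114.042927);
        RESIDUE.put('D', 115.026943); RESIDUE.put('Q', 128.058578);
        RESIDUE.put('K', 128.094963); RESIDUE.put('E', 129.042593);
        RESIDUE.put('M', 131.040485); RESIDUE.put('H', 137.058912);
        RESIDUE.put('F', 147.068414); RESIDUE.put('R', 156.101111);
        RESIDUE.put('Y', 163.063329); RESIDUE.put('W', 186.079313);
    }

    /** Sums the residue masses of an unmodified peptide and adds one water. */
    public static double mass(String peptide) {
        double total = WATER;
        for (char aa : peptide.toCharArray()) {
            Double residue = RESIDUE.get(aa);
            if (residue == null) {
                throw new IllegalArgumentException("Unknown residue: " + aa);
            }
            total += residue;
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(mass("PEPTIDE"));
    }
}
```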
-
withinMassTolerance
public boolean withinMassTolerance(double mass, int numX)
Looks up whether the mass can be described as a combination of numX different amino acids.
- Parameters:
mass - to be described
numX - number of Xs
- Returns:
- decision
-
mapTagToProteinTermini
public void mapTagToProteinTermini(MatrixContent cell, double combinationMass, boolean CTermDirection, LinkedList<MatrixContent>[] matrix, int k, int leftIndex, int rightIndex)
-
getProteinMapping
public ArrayList<PeptideProteinMapping> getProteinMapping(Tag tag, SequenceMatchingParameters sequenceMatchingPreferences)
Description copied from interface: FastaMapper
Returns the protein mappings for the given peptide sequence. Peptide sequence | Protein accession | Index in the protein. An empty map if not found.
- Specified by:
getProteinMapping in interface FastaMapper
- Parameters:
tag - the tag to look for in the tree. Must contain a consecutive amino acid sequence at least as long as the initialTagSize of the tree
sequenceMatchingPreferences - the sequence matching preferences
- Returns:
- the protein mapping for the given peptide sequence
-
getProteinMappingWithoutVariants
public ArrayList<PeptideProteinMapping> getProteinMappingWithoutVariants(Tag tag, SequenceMatchingParameters sequenceMatchingPreferences, int indexPart)
Mapping tags against proteome without variants.
- Parameters:
tag - the tag
sequenceMatchingPreferences - the sequence matching preferences
indexPart - the index part
- Returns:
- the protein mapping
-
getProteinMappingWithVariants
public ArrayList<PeptideProteinMapping> getProteinMappingWithVariants(Tag tag, SequenceMatchingParameters sequenceMatchingPreferences, int indexPart)
Mapping tags against proteome with variants.
- Parameters:
tag - the tag
sequenceMatchingPreferences - the sequence matching preferences
indexPart - the index part
- Returns:
- the protein mapping
-
reconstructFasta
Reconstructs the FASTA file stored in the index.
- Parameters:
file - the output FASTA file object
-
prefixCharacter
Backward propagation of the BWT to get the previous character.
- Parameters:
proteinAccession - the accession
index - the index in the suffix array / BWT
- Returns:
- the previous character
-
suffixCharacter
Forward propagation of the BWT to get the n'th consecutive character.
- Parameters:
proteinAccession - the accession
index - the index in the suffix array / BWT
length - number of forward steps
- Returns:
- the n'th next character
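
The idea of propagating through the BWT can be sketched with the LF mapping: from a row of the sorted suffixes, the BWT character is the previous character of the text, and one LF step moves to the row of the suffix that starts with it. Repeating this walks the text backwards, which is also how a stored text can be reconstructed from the transform. The code below is a toy illustration under assumptions: the class LfMappingSketch, the '$' sentinel, and the naive suffix sorting are hypothetical and not taken from FMIndex.

```java
import java.util.Arrays;

public class LfMappingSketch {
    static final char SENTINEL = '$'; // assumed; must sort before every other character

    /** Builds the BWT of text + SENTINEL with naive suffix sorting. */
    public static String bwt(String text) {
        final String t = text + SENTINEL;
        int n = t.length();
        Integer[] suffixArray = new Integer[n];
        for (int i = 0; i < n; i++) {
            suffixArray[i] = i;
        }
        Arrays.sort(suffixArray, (a, b) -> t.substring(a).compareTo(t.substring(b)));
        StringBuilder out = new StringBuilder(n);
        for (int i = 0; i < n; i++) {
            out.append(t.charAt((suffixArray[i] + n - 1) % n));
        }
        return out.toString();
    }

    /** Recovers the original text (without sentinel) by repeated LF steps. */
    public static String invert(String bwt) {
        int n = bwt.length();
        int[] counts = new int[128];
        int[] rankAt = new int[n]; // occurrences of bwt[i] strictly before position i
        for (int i = 0; i < n; i++) {
            rankAt[i] = counts[bwt.charAt(i)]++;
        }
        int[] less = new int[128]; // less[c] = number of characters smaller than c
        for (int c = 1; c < 128; c++) {
            less[c] = less[c - 1] + counts[c - 1];
        }
        char[] text = new char[n - 1];
        int row = 0; // row 0 is the suffix consisting of the sentinel only
        for (int k = n - 2; k >= 0; k--) {
            char previous = bwt.charAt(row);     // character preceding the current suffix
            text[k] = previous;
            row = less[previous] + rankAt[row];  // LF step to the suffix starting with it
        }
        return new String(text);
    }

    public static void main(String[] args) {
        System.out.println(invert(bwt("MKT/PEPTIDE"))); // prints MKT/PEPTIDE
    }
}
```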
-
getSequence
Description copied from interface: SequenceProvider
Returns the protein sequence for the given accession.
- Specified by:
getSequence in interface SequenceProvider
- Parameters:
proteinAccession - the accession of the protein
- Returns:
- the sequence of the protein
-
getSubsequence
Description copied from interface: SequenceProvider
Returns the subsequence of the sequence of a given protein. Indexes are 0-based, as for strings. No exception is thrown if indexes are out of bounds; the substring is trimmed instead.
- Specified by:
getSubsequence in interface SequenceProvider
- Parameters:
accession - the accession of the protein
start - the start index
end - the end index
- Returns:
- the subsequence as string
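
The trimming behaviour described above can be sketched as a clamped substring. This is an illustration only: the class SubsequenceSketch is hypothetical, and treating the end index as exclusive (as in `String.substring`) is an assumption, since this page does not say whether end is inclusive.

```java
public class SubsequenceSketch {

    /** Returns sequence[start, end) with both bounds clamped to the valid range. */
    public static String subsequence(String sequence, int start, int end) {
        int from = Math.max(0, start);
        int to = Math.min(sequence.length(), end); // end treated as exclusive (assumption)
        return from >= to ? "" : sequence.substring(from, to);
    }

    public static void main(String[] args) {
        System.out.println(subsequence("PEPTIDE", 2, 5));    // prints PTI
        System.out.println(subsequence("PEPTIDE", -3, 100)); // prints PEPTIDE
    }
}
```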
-
getAccessions
Description copied from interface: SequenceProvider
Returns all accessions loaded in the provider.
- Specified by:
getAccessions in interface SequenceProvider
- Returns:
- all accessions loaded in the provider
-
getDecoyAccessions
Description copied from interface: SequenceProvider
Returns the decoy accessions.
- Specified by:
getDecoyAccessions in interface SequenceProvider
- Returns:
- the decoy accessions
-
getHeader
Description copied from interface: SequenceProvider
Returns the FASTA header of the protein as found in the FASTA file.
- Specified by:
getHeader in interface SequenceProvider
- Parameters:
proteinAccession - the accession of the protein
- Returns:
- the FASTA header of the protein as found in the FASTA file
-
getDescription
Description copied from interface: ProteinDetailsProvider
Returns the description of the protein with the given accession.
- Specified by:
getDescription in interface ProteinDetailsProvider
- Parameters:
accession - the accession of the protein
- Returns:
- the description of the protein with the given accession
-
getSimpleDescription
Description copied from interface: ProteinDetailsProvider
Returns the simple description of the protein with the given accession.
- Specified by:
getSimpleDescription in interface ProteinDetailsProvider
- Parameters:
accession - the accession of the protein
- Returns:
- the description of the protein with the given accession
-
getProteinDatabase
Description copied from interface: ProteinDetailsProvider
Returns the protein database for the given protein.
- Specified by:
getProteinDatabase in interface ProteinDetailsProvider
- Parameters:
accession - the accession of the protein
- Returns:
- the name of the protein database
-
getGeneName
Description copied from interface: ProteinDetailsProvider
Returns the gene name for the given protein.
- Specified by:
getGeneName in interface ProteinDetailsProvider
- Parameters:
accession - the accession of the protein
- Returns:
- the gene name for the given protein
-
getTaxonomy
Description copied from interface: ProteinDetailsProvider
Returns the taxonomy for the given protein.
- Specified by:
getTaxonomy in interface ProteinDetailsProvider
- Parameters:
accession - the accession of the protein
- Returns:
- the taxonomy for the given protein
-
getProteinEvidence
Description copied from interface: ProteinDetailsProvider
Returns an integer representing the protein evidence level as indexed by UniProt.
- Specified by:
getProteinEvidence in interface ProteinDetailsProvider
- Parameters:
accession - the protein accession
- Returns:
- an integer representing the protein evidence level as indexed by UniProt
-