public class FMIndex extends ExperimentObject implements FastaMapper, SequenceProvider, ProteinDetailsProvider
Modifier and Type | Field and Description |
---|---|
static char | DELIMITER - Character used as the delimiter between protein sequences. |
ArrayList<Integer> | indexStringLengths - Length of the indexed string (all concatenated protein sequences). |
ArrayList<int[]> | lessTablesPrimary - Less table for doing an update step according to the LF step. |
ArrayList<int[]> | lessTablesReversed - Less table for doing an update step according to the LF step, reversed. |
int | maxPTMsPerPeptide - Maximal number of PTMs per peptide. |
ArrayList<WaveletTree> | occurrenceTablesPrimary - Wavelet tree for storing the Burrows-Wheeler transform. |
ArrayList<WaveletTree> | occurrenceTablesReversed - Wavelet tree for storing the Burrows-Wheeler transform, reversed. |
boolean | onlyTrypticPeptides - Flag for considering only tryptically digested peptides. |
static char | SENTINEL - Sentinel character necessary for computation of the suffix array. |
Fields inherited from class ExperimentObject: NO_KEY
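The onlyTrypticPeptides flag restricts matching to peptides produced by tryptic digestion. As an illustration of what that means, here is a minimal sketch of in silico trypsin cleavage. It assumes the common rule "cleave after K or R unless the next residue is P" and no missed cleavages; the class name `TrypticDigestSketch` and the method `digest` are hypothetical, and the rule FMIndex itself applies is not stated on this page.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of FMIndex.
class TrypticDigestSketch {

    /** Splits a protein after every K or R that is not followed by P (no missed cleavages). */
    static List<String> digest(String protein) {
        List<String> peptides = new ArrayList<>();
        int start = 0;
        for (int i = 0; i < protein.length(); i++) {
            char c = protein.charAt(i);
            boolean cleavageResidue = c == 'K' || c == 'R';
            boolean blockedByProline = i + 1 < protein.length() && protein.charAt(i + 1) == 'P';
            if (cleavageResidue && !blockedByProline) {
                peptides.add(protein.substring(start, i + 1));
                start = i + 1;
            }
        }
        if (start < protein.length()) {
            peptides.add(protein.substring(start)); // remaining C-terminal peptide
        }
        return peptides;
    }

    public static void main(String[] args) {
        // R at position 7 is followed by P, so it is not a cleavage site.
        System.out.println(digest("MKWVTFRPLLKAA")); // [MK, WVTFRPLLK, AA]
    }
}
```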
Constructor and Description |
---|
FMIndex() - Empty default constructor. |
FMIndex(File fastaFile, FastaParameters fastaParameters, WaitingHandler waitingHandler, boolean displayProgress, IdentificationParameters identificationParameters) - Constructor. |
FMIndex(File fastaFile, FastaParameters fastaParameters, WaitingHandler waitingHandler, boolean displayProgress, PeptideVariantsParameters peptideVariantsPreferences, SearchParameters searchParameters) - Constructor. |
Modifier and Type | Method and Description |
---|---|
void | addModificationPattern(Modification modification) - Adds a modification pattern for bitwise pattern search. |
boolean | checkModificationPattern(PeptideProteinMapping peptideProteinMapping) - Checks whether the peptide-protein mapping should be discarded due to a modification pattern conflict. |
int[] | computeMappingRanges(double mass) - Compute mapping ranges. |
double | computeMassTolerance(double currentTollerance, double refMass) - Compute the inverse mass value. |
double | computeMassValue(double currentMass, double refMass) - Compute the mass value. |
Collection<String> | getAccessions() - Returns all accessions loaded in the provider. |
long | getAllocatedBytes() - Computes the number of allocated bytes. |
HashSet<String> | getDecoyAccessions() - Returns the decoy accessions. |
String | getDescription(String accession) - Returns the description of the protein with the given accession. |
String | getGeneName(String accession) - Returns the gene name for the given protein. |
String | getHeader(String proteinAccession) - Returns the FASTA header of the protein as found in the FASTA file. |
String | getOrganismIdentifier(String accession) - Returns the organism identifier for the given protein. |
ProteinDatabase | getProteinDatabase(String accession) - Returns the protein database for the given protein. |
Integer | getProteinEvidence(String accession) - Returns an integer representing the protein evidence level as indexed by UniProt. |
ArrayList<PeptideProteinMapping> | getProteinMapping(String peptide, SequenceMatchingParameters sequenceMatchingParameters) - Returns the protein mapping in the FASTA file loaded in the sequence factory for the given peptide sequence in a map: peptide sequence found in the FASTA file -> protein accession -> list of indexes of the peptide sequence on the protein sequence. |
ArrayList<PeptideProteinMapping> | getProteinMapping(Tag tag, SequenceMatchingParameters sequenceMatchingPreferences) - Returns the protein mappings for the given peptide sequence. |
ArrayList<PeptideProteinMapping> | getProteinMappingWithoutVariants(String peptide, SequenceMatchingParameters seqMatchPref, int indexPart) - Exact mapping of peptides against the proteome. |
ArrayList<PeptideProteinMapping> | getProteinMappingWithoutVariants(Tag tag, SequenceMatchingParameters sequenceMatchingPreferences, int indexPart) - Mapping of tags against the proteome without variants. |
ArrayList<PeptideProteinMapping> | getProteinMappingWithVariants(Tag tag, SequenceMatchingParameters sequenceMatchingPreferences, int indexPart) - Mapping of tags against the proteome with variants. |
ArrayList<PeptideProteinMapping> | getProteinMappingWithVariantsFixed(String peptide, SequenceMatchingParameters seqMatchPref, int indexPart) - Variant-tolerant mapping of peptides against the proteome. |
ArrayList<PeptideProteinMapping> | getProteinMappingWithVariantsGeneric(String peptide, SequenceMatchingParameters seqMatchPref, int indexPart) - Variant-tolerant mapping of peptides against the proteome. |
ArrayList<PeptideProteinMapping> | getProteinMappingWithVariantsSpecific(String peptide, SequenceMatchingParameters seqMatchPref, int indexPart) - Variant-tolerant mapping of peptides against the proteome. |
String | getSequence(String proteinAccession) - Returns the protein sequence for the given accession. |
String | getSimpleDescription(String accession) - Returns the simple description of the protein with the given accession. |
String | getSubsequence(String accession, int start, int end) - Returns the subsequence of the sequence of a given protein. |
String | getTaxonomy(String accession) - Returns the taxonomy for the given protein. |
void | mapTagToProteinTermini(MatrixContent cell, double combinationMass, boolean CTermDirection, LinkedList<MatrixContent>[] matrix, int k, int leftIndex, int rightIndex) |
double | pepMass(String peptide) - Computes the mass of a peptide. |
char | prefixCharacter(String proteinAccession, int index) - Backward propagation of the BWT to get the previous character. |
void | reconstructFasta(File file) - Reconstructs the FASTA file stored in the index. |
char | suffixCharacter(String proteinAccession, int index, int length) - Forward propagation of the BWT to get the n-th consecutive character. |
boolean | withinMassTolerance(double mass, int numX) - Checks whether the mass can be described as a combination of numX amino acids. |
Methods inherited from class ExperimentObject: addUrParam, asLong, clearParametersMap, getId, getUrParam, getUrParams, removeUrParam, setId, setUrParams
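The exact mapping methods in the summary above rest on the standard FM-index backward search. The sketch below shows that technique on a tiny example, not the FMIndex implementation itself. The class `BackwardSearchSketch` is hypothetical, occurrence counts are obtained by a plain scan instead of a wavelet tree, and the characters '/' and '$' are assumed values for the delimiter and sentinel.

```java
// Hypothetical illustration of backward search; not part of FMIndex.
class BackwardSearchSketch {

    private final String bwt;
    // less[c] = number of characters in the text that sort before c (ASCII only).
    private final int[] less = new int[128];

    BackwardSearchSketch(String bwt) {
        this.bwt = bwt;
        int[] counts = new int[128];
        for (char c : bwt.toCharArray()) {
            counts[c]++;
        }
        int sum = 0;
        for (int c = 0; c < 128; c++) {
            less[c] = sum;
            sum += counts[c];
        }
    }

    /** Occurrences of c in bwt[0, end). A wavelet tree answers this without scanning. */
    private int rank(char c, int end) {
        int n = 0;
        for (int i = 0; i < end; i++) {
            if (bwt.charAt(i) == c) {
                n++;
            }
        }
        return n;
    }

    /** Counts the occurrences of the pattern by narrowing a suffix array range right to left. */
    int count(String pattern) {
        int lo = 0;
        int hi = bwt.length();
        for (int i = pattern.length() - 1; i >= 0 && lo < hi; i--) {
            char c = pattern.charAt(i);
            lo = less[c] + rank(c, lo);
            hi = less[c] + rank(c, hi);
        }
        return hi - lo;
    }

    public static void main(String[] args) {
        // "KMK$A/AK" is the BWT of "AKM/KAK$" (proteins AKM and KAK).
        BackwardSearchSketch index = new BackwardSearchSketch("KMK$A/AK");
        System.out.println(index.count("AK")); // 2
        System.out.println(index.count("MA")); // 0
    }
}
```

Each pattern character costs two rank queries, which is why the occurrence structure matters for speed.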
public int maxPTMsPerPeptide
public boolean onlyTrypticPeptides
public static char DELIMITER
public static char SENTINEL
public ArrayList<WaveletTree> occurrenceTablesPrimary
public ArrayList<WaveletTree> occurrenceTablesReversed
public ArrayList<int[]> lessTablesPrimary
public ArrayList<int[]> lessTablesReversed
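The less tables and occurrence tables together support the LF step mentioned in the field descriptions. As a rough illustration, the sketch below uses the LF mapping to walk backwards through a Burrows-Wheeler transform and recover the original text. The class `LfMappingSketch` is hypothetical, it assumes a unique sentinel '$' that sorts before every other character, and it replaces the wavelet tree with running counters.

```java
// Hypothetical illustration of the LF mapping; not part of FMIndex.
class LfMappingSketch {

    static final char SENTINEL = '$'; // assumed value

    /** Reconstructs the text from its BWT by following the LF mapping from the sentinel row. */
    static String invert(String bwt) {
        int n = bwt.length();

        // less[c] = number of characters in the text that sort before c (ASCII only).
        int[] counts = new int[128];
        for (char c : bwt.toCharArray()) {
            counts[c]++;
        }
        int[] less = new int[128];
        int sum = 0;
        for (int c = 0; c < 128; c++) {
            less[c] = sum;
            sum += counts[c];
        }

        // lf[i] = less[bwt[i]] + number of occurrences of bwt[i] before position i.
        int[] lf = new int[n];
        int[] seen = new int[128];
        for (int i = 0; i < n; i++) {
            char c = bwt.charAt(i);
            lf[i] = less[c] + seen[c]++;
        }

        char[] text = new char[n];
        text[n - 1] = SENTINEL;
        int row = 0; // row 0 is the suffix that consists of the sentinel only
        for (int pos = n - 2; pos >= 0; pos--) {
            text[pos] = bwt.charAt(row); // the BWT holds the character preceding each suffix
            row = lf[row];               // move to the row of the suffix starting at that character
        }
        return new String(text);
    }

    public static void main(String[] args) {
        System.out.println(invert("KMK$A/AK")); // AKM/KAK$
    }
}
```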
public FMIndex()
public FMIndex(File fastaFile, FastaParameters fastaParameters, WaitingHandler waitingHandler, boolean displayProgress, PeptideVariantsParameters peptideVariantsPreferences, SearchParameters searchParameters) throws IOException, OutOfMemoryError, RuntimeException, IllegalArgumentException
Parameters:
fastaFile - the FASTA file to index
fastaParameters - the parameters for the FASTA file parsing
waitingHandler - the waiting handler
displayProgress - if true, the progress is displayed
peptideVariantsPreferences - contains all parameters for variants
searchParameters - the search parameters
Throws:
IOException - exception thrown if an error occurs while iterating the FASTA file.
OutOfMemoryError
RuntimeException
IllegalArgumentException
public FMIndex(File fastaFile, FastaParameters fastaParameters, WaitingHandler waitingHandler, boolean displayProgress, IdentificationParameters identificationParameters) throws IOException, OutOfMemoryError, RuntimeException, IllegalArgumentException
Parameters:
fastaFile - the FASTA file to index
fastaParameters - the parameters for the FASTA file parsing
waitingHandler - the waiting handler
displayProgress - if true, the progress is displayed
identificationParameters - contains all identification parameters
Throws:
IOException - exception thrown if an error occurs while iterating the FASTA file.
OutOfMemoryError
RuntimeException
IllegalArgumentException
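The constructors index the FASTA file. The field descriptions mention a concatenated string of all protein sequences, a delimiter, a sentinel and a suffix array, so the following sketch shows how those pieces typically fit together. Everything here is an assumption made for illustration: the class `IndexStringSketch`, the values '/' and '$', the exact layout of the concatenated string, and the naive suffix sorting, which is only suitable for tiny inputs.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration of index construction; not part of FMIndex.
class IndexStringSketch {

    static final char DELIMITER = '/'; // assumed value
    static final char SENTINEL = '$';  // assumed value; must sort before every other character

    /** Joins the protein sequences with the delimiter and terminates the string with the sentinel. */
    static String concatenate(List<String> proteins) {
        return String.join(String.valueOf(DELIMITER), proteins) + SENTINEL;
    }

    /** Naive suffix array: sorts the suffix start positions by comparing the suffixes as strings. */
    static int[] suffixArray(String text) {
        Integer[] order = new Integer[text.length()];
        for (int i = 0; i < order.length; i++) {
            order[i] = i;
        }
        Arrays.sort(order, (a, b) -> text.substring(a).compareTo(text.substring(b)));
        int[] sa = new int[order.length];
        for (int i = 0; i < sa.length; i++) {
            sa[i] = order[i];
        }
        return sa;
    }

    /** The BWT is the character preceding each suffix, taken in suffix array order. */
    static String bwt(String text) {
        StringBuilder sb = new StringBuilder(text.length());
        for (int start : suffixArray(text)) {
            sb.append(text.charAt((start + text.length() - 1) % text.length()));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String text = concatenate(List.of("AKM", "KAK"));
        System.out.println(text);                                // AKM/KAK$
        System.out.println(Arrays.toString(suffixArray(text))); // [7, 3, 5, 0, 6, 4, 1, 2]
        System.out.println(bwt(text));                           // KMK$A/AK
    }
}
```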
public double computeMassValue(double currentMass, double refMass)
Parameters:
currentMass - the current mass
refMass - the reference mass
public double computeMassTolerance(double currentTollerance, double refMass)
Parameters:
currentTollerance - the current tolerance
refMass - the reference mass
public void addModificationPattern(Modification modification)
Parameters:
modification - modification object
public boolean checkModificationPattern(PeptideProteinMapping peptideProteinMapping)
Parameters:
peptideProteinMapping - the peptide protein mapping
public int[] computeMappingRanges(double mass)
Parameters:
mass - the mass
public long getAllocatedBytes()
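The descriptions of computeMassValue and computeMassTolerance do not say which units they convert between. One common pattern in mass spectrometry code, shown here purely as an assumption, is converting between an absolute mass difference in Da and a relative value in ppm with respect to a reference mass. The class `PpmSketch` and its methods are hypothetical and may not match what the two FMIndex methods compute.

```java
// Hypothetical illustration of ppm conversions; not part of FMIndex.
class PpmSketch {

    /** Deviation of a mass from the reference mass, in parts per million. */
    static double toPpm(double mass, double refMass) {
        return (mass - refMass) / refMass * 1e6;
    }

    /** Absolute tolerance in Da that corresponds to a ppm tolerance at the reference mass. */
    static double ppmToDalton(double tolerancePpm, double refMass) {
        return tolerancePpm / 1e6 * refMass;
    }

    /** True if the mass lies within the ppm tolerance around the reference mass. */
    static boolean within(double mass, double refMass, double tolerancePpm) {
        return Math.abs(toPpm(mass, refMass)) <= tolerancePpm;
    }

    public static void main(String[] args) {
        // 10 ppm at 1000 Da corresponds to roughly 0.01 Da.
        System.out.println(ppmToDalton(10.0, 1000.0));
        System.out.println(within(1000.005, 1000.0, 10.0));
    }
}
```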
public ArrayList<PeptideProteinMapping> getProteinMapping(String peptide, SequenceMatchingParameters sequenceMatchingParameters)
Specified by: getProteinMapping in interface FastaMapper
Parameters:
peptide - the peptide sequence
sequenceMatchingParameters - the sequence matching preferences
public ArrayList<PeptideProteinMapping> getProteinMappingWithoutVariants(String peptide, SequenceMatchingParameters seqMatchPref, int indexPart)
Parameters:
peptide - the peptide
seqMatchPref - the sequence matching preferences
indexPart - the index part
public ArrayList<PeptideProteinMapping> getProteinMappingWithVariantsFixed(String peptide, SequenceMatchingParameters seqMatchPref, int indexPart)
Parameters:
peptide - the peptide
seqMatchPref - the sequence matching preferences
indexPart - the index part
public ArrayList<PeptideProteinMapping> getProteinMappingWithVariantsGeneric(String peptide, SequenceMatchingParameters seqMatchPref, int indexPart)
Parameters:
peptide - the peptide
seqMatchPref - the sequence matching preferences
indexPart - the index part
public ArrayList<PeptideProteinMapping> getProteinMappingWithVariantsSpecific(String peptide, SequenceMatchingParameters seqMatchPref, int indexPart)
Parameters:
peptide - the peptide
seqMatchPref - the sequence matching preferences
indexPart - the index part
public double pepMass(String peptide)
Parameters:
peptide - the peptide
public boolean withinMassTolerance(double mass, int numX)
Parameters:
mass - the mass to be described
numX - number of Xs
public void mapTagToProteinTermini(MatrixContent cell, double combinationMass, boolean CTermDirection, LinkedList<MatrixContent>[] matrix, int k, int leftIndex, int rightIndex)
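For pepMass, the usual way to compute the mass of an unmodified peptide is to sum the monoisotopic residue masses and add one water. The sketch below follows that convention. Whether pepMass includes the water or handles modifications is not stated on this page, so the class `PeptideMassSketch` is a hypothetical stand-in and not the method's actual behaviour.

```java
// Hypothetical illustration of a peptide mass calculation; not part of FMIndex.
class PeptideMassSketch {

    static final double WATER = 18.01056; // monoisotopic mass of H2O in Da

    // Monoisotopic residue masses in Da, indexed by letter; 0.0 marks an unsupported letter.
    private static final double[] RESIDUE = new double[26];

    static {
        RESIDUE['G' - 'A'] = 57.02146;
        RESIDUE['A' - 'A'] = 71.03711;
        RESIDUE['S' - 'A'] = 87.03203;
        RESIDUE['P' - 'A'] = 97.05276;
        RESIDUE['V' - 'A'] = 99.06841;
        RESIDUE['T' - 'A'] = 101.04768;
        RESIDUE['C' - 'A'] = 103.00919;
        RESIDUE['L' - 'A'] = 113.08406;
        RESIDUE['I' - 'A'] = 113.08406;
        RESIDUE['N' - 'A'] = 114.04293;
        RESIDUE['D' - 'A'] = 115.02694;
        RESIDUE['Q' - 'A'] = 128.05858;
        RESIDUE['K' - 'A'] = 128.09496;
        RESIDUE['E' - 'A'] = 129.04259;
        RESIDUE['M' - 'A'] = 131.04049;
        RESIDUE['H' - 'A'] = 137.05891;
        RESIDUE['F' - 'A'] = 147.06841;
        RESIDUE['R' - 'A'] = 156.10111;
        RESIDUE['Y' - 'A'] = 163.06333;
        RESIDUE['W' - 'A'] = 186.07931;
    }

    /** Sum of the residue masses plus one water; rejects letters without a defined mass. */
    static double mass(String peptide) {
        double sum = WATER;
        for (char aa : peptide.toCharArray()) {
            double residue = (aa >= 'A' && aa <= 'Z') ? RESIDUE[aa - 'A'] : 0.0;
            if (residue == 0.0) {
                throw new IllegalArgumentException("Unsupported residue: " + aa);
            }
            sum += residue;
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(mass("AK"));
    }
}
```

Ambiguous letters such as X have no single mass, which is presumably why withinMassTolerance takes the number of Xs as a separate argument.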
public ArrayList<PeptideProteinMapping> getProteinMapping(Tag tag, SequenceMatchingParameters sequenceMatchingPreferences)
Specified by: getProteinMapping in interface FastaMapper
Parameters:
tag - the tag to look for in the tree. Must contain a consecutive amino acid sequence at least as long as the initialTagSize of the tree
sequenceMatchingPreferences - the sequence matching preferences
public ArrayList<PeptideProteinMapping> getProteinMappingWithoutVariants(Tag tag, SequenceMatchingParameters sequenceMatchingPreferences, int indexPart)
Parameters:
tag - the tag
sequenceMatchingPreferences - the sequence matching preferences
indexPart - the index part
public ArrayList<PeptideProteinMapping> getProteinMappingWithVariants(Tag tag, SequenceMatchingParameters sequenceMatchingPreferences, int indexPart)
Parameters:
tag - the tag
sequenceMatchingPreferences - the sequence matching preferences
indexPart - the index part
public void reconstructFasta(File file)
Parameters:
file - the output FASTA file object
public char prefixCharacter(String proteinAccession, int index)
Parameters:
proteinAccession - the accession
index - the index in the suffix array / BWT
public char suffixCharacter(String proteinAccession, int index, int length)
Parameters:
proteinAccession - the accession
index - the index in the suffix array / BWT
length - number of forward steps
public String getSequence(String proteinAccession)
Specified by: getSequence in interface SequenceProvider
Parameters:
proteinAccession - the accession of the protein
public String getSubsequence(String accession, int start, int end)
Specified by: getSubsequence in interface SequenceProvider
Parameters:
accession - the accession of the protein
start - the start index
end - the end index
public Collection<String> getAccessions()
Specified by: getAccessions in interface SequenceProvider
public HashSet<String> getDecoyAccessions()
Specified by: getDecoyAccessions in interface SequenceProvider
public String getHeader(String proteinAccession)
Specified by: getHeader in interface SequenceProvider
Parameters:
proteinAccession - the accession of the protein
public String getDescription(String accession)
Specified by: getDescription in interface ProteinDetailsProvider
Parameters:
accession - the accession of the protein
public String getSimpleDescription(String accession)
Specified by: getSimpleDescription in interface ProteinDetailsProvider
Parameters:
accession - the accession of the protein
public ProteinDatabase getProteinDatabase(String accession)
Specified by: getProteinDatabase in interface ProteinDetailsProvider
Parameters:
accession - the accession of the protein
public String getGeneName(String accession)
Specified by: getGeneName in interface ProteinDetailsProvider
Parameters:
accession - the accession of the protein
public String getTaxonomy(String accession)
Specified by: getTaxonomy in interface ProteinDetailsProvider
Parameters:
accession - the accession of the protein
public String getOrganismIdentifier(String accession)
Specified by: getOrganismIdentifier in interface ProteinDetailsProvider
Parameters:
accession - the accession of the protein
public Integer getProteinEvidence(String accession)
Specified by: getProteinEvidence in interface ProteinDetailsProvider
Parameters:
accession - the protein accession
Copyright © 2021. All rights reserved.