java.lang.Object
com.compomics.util.experiment.personalization.ExperimentObject
com.compomics.util.experiment.identification.protein_inference.fm_index.FMIndex
All Implemented Interfaces:
FastaMapper, ProteinDetailsProvider, SequenceProvider, Serializable

public class FMIndex extends ExperimentObject implements FastaMapper, SequenceProvider, ProteinDetailsProvider
The FM index.
Author:
Dominik Kopczynski, Marc Vaudel
  • Field Details

    • maxPTMsPerPeptide

      public int maxPTMsPerPeptide
      Maximal number of PTMs per peptide.
    • onlyTrypticPeptides

      public boolean onlyTrypticPeptides
      Flag for only considering tryptic digested peptides.
    • DELIMITER

      public static char DELIMITER
      Character used as delimiter between protein sequences.
    • SENTINEL

      public static char SENTINEL
      Sentinel character necessary for computation of the suffix array.
    • occurrenceTablesPrimary

      public ArrayList<WaveletTree> occurrenceTablesPrimary
      Wavelet trees for storing the Burrows-Wheeler transform.
    • occurrenceTablesReversed

      public ArrayList<WaveletTree> occurrenceTablesReversed
      Wavelet trees for storing the Burrows-Wheeler transform of the reversed text.
    • lessTablesPrimary

      public ArrayList<int[]> lessTablesPrimary
      Less table for doing an update step according to the LF step.
    • lessTablesReversed

      public ArrayList<int[]> lessTablesReversed
      Less table for doing an update step according to the LF step reversed.
    • indexStringLengths

      public ArrayList<Integer> indexStringLengths
      Length of the indexed string (all concatenated protein sequences).
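The fields above (sentinel, delimiter, Burrows-Wheeler transform, less table) fit together as in the following minimal sketch. It is not the FMIndex implementation: the class name FmIndexSketch, the characters '$' and '/', and the plain char array standing in for the wavelet tree are all assumptions made for illustration, and the suffix sort is deliberately naive.

```java
import java.util.Arrays;

// Illustrative sketch only; names and character choices are assumptions.
class FmIndexSketch {
    static final char SENTINEL = '$';  // assumed value; must sort before every other character
    static final char DELIMITER = '/'; // assumed value; separates protein sequences

    final char[] bwt;                  // Burrows-Wheeler transform (a wavelet tree in a real index)
    final int[] less = new int[128];   // less[c] = number of characters in the text smaller than c (ASCII only)

    FmIndexSketch(String... proteins) {
        // Concatenate all sequences and terminate with the sentinel.
        String text = String.join(String.valueOf(DELIMITER), proteins) + SENTINEL;
        int n = text.length();
        Integer[] suffixArray = new Integer[n];
        for (int i = 0; i < n; i++) suffixArray[i] = i;
        // Naive suffix sort, fine for a demonstration but far too slow for a proteome.
        Arrays.sort(suffixArray, (a, b) -> text.substring(a).compareTo(text.substring(b)));
        bwt = new char[n];
        int[] counts = new int[128];
        for (int i = 0; i < n; i++) {
            // BWT character = character preceding the i-th smallest suffix.
            bwt[i] = text.charAt((suffixArray[i] + n - 1) % n);
            counts[text.charAt(i)]++;
        }
        for (int c = 1; c < 128; c++) less[c] = less[c - 1] + counts[c - 1];
    }

    // Number of occurrences of c in bwt[0, i); a wavelet tree answers this in logarithmic time.
    int rank(char c, int i) {
        int r = 0;
        for (int j = 0; j < i; j++) if (bwt[j] == c) r++;
        return r;
    }

    // Backward search: one LF update per pattern character, read right to left.
    int count(String pattern) {
        int left = 0, right = bwt.length; // half-open suffix array interval
        for (int i = pattern.length() - 1; i >= 0 && left < right; i--) {
            char c = pattern.charAt(i);
            left = less[c] + rank(c, left);
            right = less[c] + rank(c, right);
        }
        return right - left;
    }
}
```

For example, new FmIndexSketch("PEPTIDE", "TIDEPEP").count("PEP") counts the occurrences of PEP across both sequences. The final interval [left, right) is what a full index would translate into protein accessions and positions.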
  • Constructor Details

  • Method Details

    • computeMassValue

      public double computeMassValue(double currentMass, double refMass)
      Compute the mass value.
      Parameters:
      currentMass - the current mass
      refMass - the reference mass
      Returns:
      the mass value
    • computeMassTolerance

      public double computeMassTolerance(double currentTollerance, double refMass)
      Compute the inverse mass value.
      Parameters:
      currentTollerance - the current tolerance
      refMass - the reference mass
      Returns:
      the inverse mass value
    • addModificationPattern

      public void addModificationPattern(Modification modification)
      Adds a modification pattern for bitwise pattern search.
      Parameters:
      modification - modification object
    • checkModificationPattern

      public boolean checkModificationPattern(PeptideProteinMapping peptideProteinMapping)
      Checks whether a peptide-protein mapping should be discarded due to a modification pattern conflict.
      Parameters:
      peptideProteinMapping - the peptide protein mapping
      Returns:
      either yes or no
    • computeMappingRanges

      public int[] computeMappingRanges(double mass)
      Compute mapping ranges.
      Parameters:
      mass - the mass
      Returns:
      the mapping ranges
    • getAllocatedBytes

      public long getAllocatedBytes()
      Computes the number of allocated bytes.
      Returns:
      allocated bytes
    • getProteinMapping

      public ArrayList<PeptideProteinMapping> getProteinMapping(String peptide, SequenceMatchingParameters sequenceMatchingParameters)
      Description copied from interface: FastaMapper
      Returns the protein mapping in the FASTA file loaded in the sequence factory for the given peptide sequence in a map: peptide sequence found in the FASTA file | protein accession | list of indexes of the peptide sequence on the protein sequence. 0 is the first amino acid.
      Specified by:
      getProteinMapping in interface FastaMapper
      Parameters:
      peptide - the peptide sequence
      sequenceMatchingParameters - the sequence matching preferences
      Returns:
      the peptide to protein mapping: peptide sequence > protein accession > index in the protein. An empty map if not found.
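To make the shape of the result concrete, the sketch below produces the same kind of information (protein accession and 0-based start indexes) with a naive scan. It is a baseline for comparison, not how FMIndex searches, and it ignores the sequence matching parameters. The class NaivePeptideMapper and its map method are hypothetical names.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: exact matching by linear scan, for illustration only.
class NaivePeptideMapper {

    // proteins maps accession -> sequence; the result maps accession -> 0-based start indexes.
    static Map<String, List<Integer>> map(String peptide, Map<String, String> proteins) {
        Map<String, List<Integer>> result = new LinkedHashMap<>();
        if (peptide.isEmpty()) {
            return result; // an empty peptide maps nowhere
        }
        for (Map.Entry<String, String> entry : proteins.entrySet()) {
            String sequence = entry.getValue();
            int index = sequence.indexOf(peptide);
            while (index >= 0) {
                result.computeIfAbsent(entry.getKey(), k -> new ArrayList<>()).add(index);
                // Continue one position further so overlapping occurrences are found too.
                index = sequence.indexOf(peptide, index + 1);
            }
        }
        return result; // empty if the peptide is not found in any protein
    }
}
```

Proteins without a match are absent from the result, which mirrors the "empty map if not found" convention described above.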
    • getProteinMappingWithoutVariants

      public ArrayList<PeptideProteinMapping> getProteinMappingWithoutVariants(String peptide, SequenceMatchingParameters seqMatchPref, int indexPart)
      Exact mapping of peptides against the proteome.
      Parameters:
      peptide - the peptide
      seqMatchPref - the sequence matching preferences
      indexPart - the index part
      Returns:
      the mapping
    • getProteinMappingWithVariantsFixed

      public ArrayList<PeptideProteinMapping> getProteinMappingWithVariantsFixed(String peptide, SequenceMatchingParameters seqMatchPref, int indexPart)
      Variant-tolerant mapping of peptides against the proteome.
      Parameters:
      peptide - the peptide
      seqMatchPref - the sequence match preferences
      indexPart - the index part
      Returns:
      the mapping
    • getProteinMappingWithVariantsGeneric

      public ArrayList<PeptideProteinMapping> getProteinMappingWithVariantsGeneric(String peptide, SequenceMatchingParameters seqMatchPref, int indexPart)
      Variant-tolerant mapping of peptides against the proteome.
      Parameters:
      peptide - the peptide
      seqMatchPref - the sequence match preferences
      indexPart - the index part
      Returns:
      the mapping
    • getProteinMappingWithVariantsSpecific

      public ArrayList<PeptideProteinMapping> getProteinMappingWithVariantsSpecific(String peptide, SequenceMatchingParameters seqMatchPref, int indexPart)
      Variant-tolerant mapping of peptides against the proteome.
      Parameters:
      peptide - the peptide
      seqMatchPref - the sequence matching preferences
      indexPart - the index part
      Returns:
      the mapping
    • pepMass

      public double pepMass(String peptide)
      Computing the mass of a peptide.
      Parameters:
      peptide - the peptide
      Returns:
      the peptide mass
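As a point of reference, an unmodified peptide mass is commonly computed as the sum of the monoisotopic residue masses plus one water. The sketch below shows that calculation only; whether pepMass also accounts for fixed modifications or other settings is not stated here. PeptideMassSketch and monoisotopicMass are hypothetical names.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper: monoisotopic mass of an unmodified peptide.
class PeptideMassSketch {
    static final double WATER = 18.010565; // H2O, added once for the free termini
    static final Map<Character, Double> RESIDUE = new HashMap<>();

    static {
        // Monoisotopic residue masses in Da for the 20 standard amino acids.
        RESIDUE.put('G', 57.02146);  RESIDUE.put('A', 71.03711);
        RESIDUE.put('S', 87.03203);  RESIDUE.put('P', 97.05276);
        RESIDUE.put('V', 99.06841);  RESIDUE.put('T', 101.04768);
        RESIDUE.put('C', 103.00919); RESIDUE.put('L', 113.08406);
        RESIDUE.put('I', 113.08406); RESIDUE.put('N', 114.04293);
        RESIDUE.put('D', 115.02694); RESIDUE.put('Q', 128.05858);
        RESIDUE.put('K', 128.09496); RESIDUE.put('E', 129.04259);
        RESIDUE.put('M', 131.04049); RESIDUE.put('H', 137.05891);
        RESIDUE.put('F', 147.06841); RESIDUE.put('R', 156.10111);
        RESIDUE.put('Y', 163.06333); RESIDUE.put('W', 186.07931);
    }

    static double monoisotopicMass(String peptide) {
        double mass = WATER;
        for (char aa : peptide.toCharArray()) {
            Double residue = RESIDUE.get(aa);
            if (residue == null) {
                throw new IllegalArgumentException("Unknown amino acid: " + aa);
            }
            mass += residue;
        }
        return mass;
    }
}
```

With these values, PEPTIDE comes out at roughly 799.36 Da.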
    • withinMassTolerance

      public boolean withinMassTolerance(double mass, int numX)
      Looks up whether the mass can be described as a combination of numX amino acids.
      Parameters:
      mass - to be described
      numX - number of Xs
      Returns:
      decision
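One way to picture this lookup is a bounded search over residue masses, as in the sketch below. This is an assumption about the idea, not the FMIndex code: the explicit tolerance parameter, the permission to reuse the same amino acid, and the names MassCombinationSketch and canCompose are all hypothetical.

```java
// Hypothetical helper: can `mass` be written as a sum of numX residue masses?
class MassCombinationSketch {
    // Distinct monoisotopic residue masses in Da, sorted ascending (L and I share one entry).
    static final double[] RESIDUE_MASSES = {
        57.02146, 71.03711, 87.03203, 97.05276, 99.06841, 101.04768, 103.00919,
        113.08406, 114.04293, 115.02694, 128.05858, 128.09496, 129.04259,
        131.04049, 137.05891, 147.06841, 156.10111, 163.06333, 186.07931
    };

    static boolean canCompose(double mass, int numX, double tolerance) {
        return search(mass, numX, tolerance, 0);
    }

    // Picks residues in non-decreasing mass order so each multiset is tried once.
    private static boolean search(double remaining, int picksLeft, double tolerance, int from) {
        if (picksLeft == 0) {
            return Math.abs(remaining) <= tolerance;
        }
        for (int i = from; i < RESIDUE_MASSES.length; i++) {
            double m = RESIDUE_MASSES[i];
            // Every remaining pick weighs at least m, so heavier choices cannot fit either.
            if (m * picksLeft > remaining + tolerance) {
                break;
            }
            if (search(remaining - m, picksLeft - 1, tolerance, i)) {
                return true;
            }
        }
        return false;
    }
}
```

A production index would more likely precompute the reachable masses per numX than search on every call; the recursion is kept here because it is easy to verify.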
    • mapTagToProteinTermini

      public void mapTagToProteinTermini(MatrixContent cell, double combinationMass, boolean CTermDirection, LinkedList<MatrixContent>[] matrix, int k, int leftIndex, int rightIndex)
    • getProteinMapping

      public ArrayList<PeptideProteinMapping> getProteinMapping(Tag tag, SequenceMatchingParameters sequenceMatchingPreferences)
      Description copied from interface: FastaMapper
      Returns the protein mappings for the given peptide sequence. Peptide sequence | Protein accession | Index in the protein. An empty map if not found.
      Specified by:
      getProteinMapping in interface FastaMapper
      Parameters:
      tag - the tag to look for in the tree. Must contain a consecutive amino acid sequence at least as long as the initialTagSize of the tree
      sequenceMatchingPreferences - the sequence matching preferences
      Returns:
      the protein mapping for the given peptide sequence
    • getProteinMappingWithoutVariants

      public ArrayList<PeptideProteinMapping> getProteinMappingWithoutVariants(Tag tag, SequenceMatchingParameters sequenceMatchingPreferences, int indexPart)
      Mapping tags against proteome without variants.
      Parameters:
      tag - the tag
      sequenceMatchingPreferences - the sequence matching preferences
      indexPart - the index part
      Returns:
      the protein mapping
    • getProteinMappingWithVariants

      public ArrayList<PeptideProteinMapping> getProteinMappingWithVariants(Tag tag, SequenceMatchingParameters sequenceMatchingPreferences, int indexPart)
      Mapping tags against proteome with variants.
      Parameters:
      tag - the tag
      sequenceMatchingPreferences - the sequence matching preferences
      indexPart - the index part
      Returns:
      the protein mapping
    • reconstructFasta

      public void reconstructFasta(File file)
      Reconstructs the FASTA file stored in the index.
      Parameters:
      file - the output FASTA file object
    • prefixCharacter

      public char prefixCharacter(String proteinAccession, int index)
      Backward propagation of the BWT to get the previous character.
      Parameters:
      proteinAccession - the accession
      index - the index in the suffix array / BWT
      Returns:
      the previous character
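The backward step can be illustrated on a plain string: the BWT character in a row is the character preceding that row's suffix, and the LF mapping gives the row of the suffix extended by it. Repeating the step recovers the whole text, which is the principle behind rebuilding sequences from the index. The sketch assumes ASCII text ending in a unique, smallest sentinel; LfMappingSketch, bwt and invert are hypothetical names and do not take a protein accession as the method above does.

```java
import java.util.Arrays;

// Hypothetical helper: walk a BWT backwards with the LF mapping.
class LfMappingSketch {

    // Builds the BWT with a naive suffix sort; text must end with a unique, smallest sentinel.
    static String bwt(String text) {
        int n = text.length();
        Integer[] suffixArray = new Integer[n];
        for (int i = 0; i < n; i++) suffixArray[i] = i;
        Arrays.sort(suffixArray, (a, b) -> text.substring(a).compareTo(text.substring(b)));
        StringBuilder transformed = new StringBuilder(n);
        for (int pos : suffixArray) transformed.append(text.charAt((pos + n - 1) % n));
        return transformed.toString();
    }

    // Recovers the original text from its BWT, one previous character per LF step.
    static String invert(String bwt, char sentinel) {
        int n = bwt.length();
        int[] counts = new int[128];  // ASCII only
        int[] rank = new int[n];      // rank[i] = occurrences of bwt[i] in bwt[0, i)
        for (int i = 0; i < n; i++) rank[i] = counts[bwt.charAt(i)]++;
        int[] less = new int[128];    // less[c] = number of characters smaller than c
        for (int c = 1; c < 128; c++) less[c] = less[c - 1] + counts[c - 1];

        char[] text = new char[n];
        text[n - 1] = sentinel;
        int row = 0;                  // row 0 holds the suffix that starts with the sentinel
        for (int k = n - 2; k >= 0; k--) {
            char previous = bwt.charAt(row);    // character preceding the current suffix
            text[k] = previous;
            row = less[previous] + rank[row];   // LF step: row of the suffix extended by `previous`
        }
        return new String(text);
    }
}
```

For instance, invert(bwt("PEPTIDE/TIDEPEP$"), '$') returns the original string.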
    • suffixCharacter

      public char suffixCharacter(String proteinAccession, int index, int length)
      Forward propagation of the BWT to get the n-th consecutive character.
      Parameters:
      proteinAccession - the accession
      index - the index in the suffix array / BWT
      length - number of forward steps
      Returns:
      the n-th next character
    • getSequence

      public String getSequence(String proteinAccession)
      Description copied from interface: SequenceProvider
      Returns the protein sequence for the given accession.
      Specified by:
      getSequence in interface SequenceProvider
      Parameters:
      proteinAccession - the accession of the protein
      Returns:
      the sequence of the protein
    • getSubsequence

      public String getSubsequence(String accession, int start, int end)
      Description copied from interface: SequenceProvider
      Returns a subsequence of the sequence of a given protein. Indexes are 0-based, as for strings. No exception is thrown if indexes are out of bounds; the substring is trimmed instead.
      Specified by:
      getSubsequence in interface SequenceProvider
      Parameters:
      accession - the accession of the protein
      start - the start index
      end - the end index
      Returns:
      the subsequence as string
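The trimming behaviour described above can be sketched on a plain string. The sketch assumes that end is exclusive, as in String.substring, which the description does not state explicitly; SubsequenceSketch and trimmedSubsequence are hypothetical names.

```java
// Hypothetical helper: substring that clamps indexes instead of throwing.
class SubsequenceSketch {

    static String trimmedSubsequence(String sequence, int start, int end) {
        // Clamp start into [0, length] and end into [start, length] (end assumed exclusive).
        int from = Math.max(0, Math.min(start, sequence.length()));
        int to = Math.max(from, Math.min(end, sequence.length()));
        return sequence.substring(from, to);
    }
}
```

With this clamping, a request that overshoots the end of PEPTIDE, such as (2, 100), yields PTIDE, and an inverted range yields an empty string.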
    • getAccessions

      public Collection<String> getAccessions()
      Description copied from interface: SequenceProvider
      Returns all accessions loaded in the provider.
      Specified by:
      getAccessions in interface SequenceProvider
      Returns:
      all accessions loaded in the provider
    • getDecoyAccessions

      public HashSet<String> getDecoyAccessions()
      Description copied from interface: SequenceProvider
      Returns the decoy accessions.
      Specified by:
      getDecoyAccessions in interface SequenceProvider
      Returns:
      the decoy accessions
    • getHeader

      public Header getHeader(String proteinAccession)
      Returns the header corresponding to the given accession.
      Parameters:
      proteinAccession - The accession.
      Returns:
      The corresponding header.
    • getHeaderAsString

      public String getHeaderAsString(String proteinAccession)
      Description copied from interface: SequenceProvider
      Returns the FASTA header of the protein as found in the FASTA file.
      Specified by:
      getHeaderAsString in interface SequenceProvider
      Parameters:
      proteinAccession - the accession of the protein
      Returns:
      the FASTA header of the protein as found in the FASTA file
    • getDescription

      public String getDescription(String proteinAccession)
      Description copied from interface: ProteinDetailsProvider
      Returns the description of the protein with the given accession.
      Specified by:
      getDescription in interface ProteinDetailsProvider
      Parameters:
      proteinAccession - the accession of the protein
      Returns:
      the description of the protein with the given accession
    • getSimpleDescription

      public String getSimpleDescription(String proteinAccession)
      Description copied from interface: ProteinDetailsProvider
      Returns the simple description of the protein with the given accession.
      Specified by:
      getSimpleDescription in interface ProteinDetailsProvider
      Parameters:
      proteinAccession - the accession of the protein
      Returns:
      the simple description of the protein with the given accession
    • getProteinDatabase

      public ProteinDatabase getProteinDatabase(String proteinAccession)
      Description copied from interface: ProteinDetailsProvider
      Returns the protein database for the given protein.
      Specified by:
      getProteinDatabase in interface ProteinDetailsProvider
      Parameters:
      proteinAccession - the accession of the protein
      Returns:
      the name of the protein database
    • getGeneName

      public String getGeneName(String proteinAccession)
      Description copied from interface: ProteinDetailsProvider
      Returns the gene name for the given protein.
      Specified by:
      getGeneName in interface ProteinDetailsProvider
      Parameters:
      proteinAccession - the accession of the protein
      Returns:
      the gene name for the given protein
    • getTaxonomy

      public String getTaxonomy(String proteinAccession)
      Description copied from interface: ProteinDetailsProvider
      Returns the taxonomy for the given protein.
      Specified by:
      getTaxonomy in interface ProteinDetailsProvider
      Parameters:
      proteinAccession - the accession of the protein
      Returns:
      the taxonomy for the given protein
    • getOrganismIdentifier

      public String getOrganismIdentifier(String proteinAccession)
      Description copied from interface: ProteinDetailsProvider
      Returns the organism identifier for the given protein.
      Specified by:
      getOrganismIdentifier in interface ProteinDetailsProvider
      Parameters:
      proteinAccession - the accession of the protein
      Returns:
      the organism identifier for the given protein
    • getProteinEvidence

      public Integer getProteinEvidence(String proteinAccession)
      Description copied from interface: ProteinDetailsProvider
      Returns an integer representing the protein evidence level as indexed by UniProt.
      Specified by:
      getProteinEvidence in interface ProteinDetailsProvider
      Parameters:
      proteinAccession - the protein accession
      Returns:
      an integer representing the protein evidence level as indexed by UniProt