Class IdentificationFeaturesGenerator

java.lang.Object
com.compomics.util.experiment.identification.features.IdentificationFeaturesGenerator

public class IdentificationFeaturesGenerator
extends Object
This class provides identification features and stores them in cache.
Author:
Marc Vaudel, Harald Barsnes
  • Constructor Details

  • Method Details

    • setMassErrorDistribution

      public void setMassErrorDistribution​(String spectrumFile, double[] precursorMzDeviations)
      Sets a mass error distribution in the massErrorDistribution map.
      Parameters:
      spectrumFile - The spectrum file of interest.
      precursorMzDeviations - The sorted array of precursor mass errors.
    • getMassErrorDistribution

      public NonSymmetricalNormalDistribution getMassErrorDistribution​(String spectrumFile)
      Returns the precursor mass error distribution of validated peptides in a spectrum file.
      Parameters:
      spectrumFile - the name of the file of interest
      Returns:
      the precursor mass error distribution of validated peptides in a spectrum file
    • getCoverableAA

      public double[] getCoverableAA​(long proteinMatchKey)
      Returns an array of the likelihood to identify a given amino acid in the protein sequence. 0 is the first amino acid.
      Parameters:
      proteinMatchKey - the key of the protein of interest
      Returns:
      an array of the likelihood to identify each amino acid in the protein sequence
    • getAACoverage

      public int[] getAACoverage​(long proteinMatchKey)
      Indicates the validation level of every amino acid in the given protein.
      Parameters:
      proteinMatchKey - the key of the protein of interest
      Returns:
      an array giving the validation level of every amino acid in the protein sequence
    • updateCoverableAA

      public void updateCoverableAA​(long proteinMatchKey)
      Updates the array of the likelihood to identify each amino acid in the protein sequence. Used when the main key for a protein has been altered.
      Parameters:
      proteinMatchKey - the key of the protein of interest
    • getFoundModifications

      public TreeSet<String> getFoundModifications()
      Returns the variable modifications found in the currently loaded dataset.
      Returns:
      the variable modifications found in the currently loaded dataset
    • estimateAACoverage

      public int[] estimateAACoverage​(long proteinMatchKey, boolean enzymatic)
      Returns the amino acid coverage of this protein by enzymatic or non-enzymatic peptides only. The result is an array giving, for every amino acid, the index of the best validation level among the peptides covering it. 0 is the first amino acid.
      Parameters:
      proteinMatchKey - the key of the protein match
      enzymatic - if true, only enzymatic peptides are considered; if false, only non-enzymatic peptides are considered
      Returns:
      the identification coverage of the protein sequence
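The shape of the returned array can be pictured with a small stand-alone sketch. It does not call the library: the class name, the `{start, length, level}` encoding of a peptide, and the convention that a larger integer is a better validation level are all assumptions made for illustration.

```java
final class AaCoverageSketch {

    /**
     * Builds a per-residue array holding the best (here: highest) validation
     * level among the peptides covering each amino acid. Index 0 is the first
     * amino acid; residues covered by no peptide keep the value 0.
     *
     * Each hypothetical hit is encoded as {0-based start, length, validation level}.
     */
    static int[] bestLevelPerResidue(int proteinLength, int[][] hits) {
        int[] coverage = new int[proteinLength];
        for (int[] hit : hits) {
            int start = Math.max(0, hit[0]);
            int end = Math.min(proteinLength, hit[0] + hit[1]); // exclusive
            for (int i = start; i < end; i++) {
                coverage[i] = Math.max(coverage[i], hit[2]);
            }
        }
        return coverage;
    }
}
```

Filtering the hits to enzymatic or non-enzymatic peptides before calling such a helper would mirror the role of the `enzymatic` flag.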
    • getValidatedSequenceCoverage

      public double getValidatedSequenceCoverage​(long proteinMatchKey)
      Returns the validated sequence coverage of the protein of interest.
      Parameters:
      proteinMatchKey - the key of the protein of interest
      Returns:
      the sequence coverage
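One way to relate a per-residue validation array, such as the one described for getAACoverage, to a single coverage value is to count the residues at or above a chosen level. The sketch below is only an illustration under that assumption; the class name and the `minLevel` threshold are hypothetical, and the library's own computation may differ (for example in whether it reports a fraction or a percentage).

```java
final class SequenceCoverageSketch {

    /**
     * Returns the fraction (0.0 to 1.0) of residues whose validation level is
     * at least minLevel. An empty protein yields 0.0.
     */
    static double coverage(int[] aaLevels, int minLevel) {
        if (aaLevels.length == 0) {
            return 0.0;
        }
        int covered = 0;
        for (int level : aaLevels) {
            if (level >= minLevel) {
                covered++;
            }
        }
        return (double) covered / aaLevels.length;
    }
}
```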
    • validatedSequenceCoverageInCache

      public boolean validatedSequenceCoverageInCache​(long proteinMatchKey)
      Indicates whether the validated sequence coverage is in cache.
      Parameters:
      proteinMatchKey - the key of the protein match
      Returns:
      true if the sequence coverage is in cache
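The get / InCache / update triplets documented on this page follow a common caching pattern: a caller can test for a cached value before asking for one that may need to be computed. The following is a minimal stand-alone sketch of that pattern, not the library's implementation; `FeatureCacheSketch` and its `estimator` function are hypothetical names.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.LongToDoubleFunction;

final class FeatureCacheSketch {

    private final Map<Long, Double> cache = new HashMap<>();
    private final LongToDoubleFunction estimator; // hypothetical expensive computation

    FeatureCacheSketch(LongToDoubleFunction estimator) {
        this.estimator = estimator;
    }

    /** True if a value for this key is already cached. */
    boolean inCache(long key) {
        return cache.containsKey(key);
    }

    /** Returns the cached value, computing and storing it on first access. */
    double get(long key) {
        return cache.computeIfAbsent(key, k -> estimator.applyAsDouble(k));
    }

    /** Recomputes the value and overwrites whatever was cached. */
    void update(long key) {
        cache.put(key, estimator.applyAsDouble(key));
    }
}
```

A user interface could, for instance, call `inCache` first and fall back to a background task when the value is not yet available.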
    • getSequenceCoverage

      public HashMap<Integer,​Double> getSequenceCoverage​(long proteinMatchKey)
      Returns the sequence coverage of the protein of interest.
      Parameters:
      proteinMatchKey - the key of the protein of interest
      Returns:
      the sequence coverage
    • sequenceCoverageInCache

      public boolean sequenceCoverageInCache​(long proteinMatchKey)
      Indicates whether the sequence coverage is in cache.
      Parameters:
      proteinMatchKey - the key of the protein match
      Returns:
      true if the sequence coverage is in cache
    • getNonEnzymatic

      public long[] getNonEnzymatic​(long proteinMatchKey, DigestionParameters digestionPreferences)
      Returns a list of non-enzymatic peptides for a given protein match.
      Parameters:
      proteinMatchKey - the key of the protein match
      digestionPreferences - the digestion preferences
      Returns:
      a list of non-enzymatic peptides for a given protein match
    • updateSequenceCoverage

      public void updateSequenceCoverage​(long proteinMatchKey)
      Updates the sequence coverage of the protein of interest.
      Parameters:
      proteinMatchKey - the key of the protein of interest
    • getNormalizedSpectrumCounting

      public double getNormalizedSpectrumCounting​(long proteinMatchKey)
      Returns the spectrum counting metric of the protein match of interest, normalized to the injected protein amount, using the spectrum counting preferences of the identification features generator.
      Parameters:
      proteinMatchKey - the key of the protein match of interest
      Returns:
      the corresponding spectrum counting metric normalized in the metrics prefix of mol
    • getNormalizedSpectrumCounting

      public double getNormalizedSpectrumCounting​(long proteinMatchKey, SpectrumCountingParameters spectrumCountingPreferences, Metrics metrics)
      Returns the spectrum counting metric of the protein match of interest, normalized to the injected protein amount, using the given spectrum counting preferences.
      Parameters:
      proteinMatchKey - the key of the protein match of interest
      spectrumCountingPreferences - the spectrum counting preferences
      metrics - the metrics on the dataset
      Returns:
      the corresponding spectrum counting metric normalized in the metrics prefix of mol
    • getNormalizedSpectrumCounting

      public double getNormalizedSpectrumCounting​(long proteinMatchKey, UnitOfMeasurement unit, SpectrumCountingMethod method)
      Returns the spectrum counting metric of the protein match of interest, normalized to the injected protein amount, using the given unit and method.
      Parameters:
      proteinMatchKey - the key of the protein match of interest
      unit - the unit to use for the normalization
      method - the method to use
      Returns:
      the corresponding spectrum counting metric normalized in the metrics prefix of mol
    • getNormalizedSpectrumCounting

      public double getNormalizedSpectrumCounting​(long proteinMatchKey, Metrics metrics, UnitOfMeasurement unit, Double referenceMass, SpectrumCountingMethod method)
      Returns the spectrum counting metric of the protein match of interest, normalized to the injected protein amount, using the given metrics, unit, reference mass, and method.
      Parameters:
      proteinMatchKey - the key of the protein match of interest
      metrics - the metrics on the dataset
      unit - the unit to use for the normalization
      method - the method to use
      referenceMass - the reference mass if abundance normalization is chosen
      Returns:
      the corresponding spectrum counting metric normalized in the metrics prefix of mol
    • getSpectrumCounting

      public double getSpectrumCounting​(long proteinMatchKey)
      Returns the spectrum counting metric of the protein match of interest using the preference settings.
      Parameters:
      proteinMatchKey - the key of the protein match of interest
      Returns:
      the corresponding spectrum counting metric
    • getSpectrumCounting

      public Double getSpectrumCounting​(long proteinMatchKey, SpectrumCountingMethod method)
      Returns the spectrum counting metric of the protein match of interest for the given method.
      Parameters:
      proteinMatchKey - the key of the protein match of interest
      method - the method to use
      Returns:
      the corresponding spectrum counting metric
    • spectrumCountingInCache

      public boolean spectrumCountingInCache​(long proteinMatchKey)
      Indicates whether the default spectrum counting value is in cache for a protein match.
      Parameters:
      proteinMatchKey - the key of the protein match of interest
      Returns:
      true if the data is cached
    • estimateSpectrumCounting

      public static double estimateSpectrumCounting​(Identification identification, SequenceProvider sequenceProvider, long proteinMatchKey, SpectrumCountingParameters spectrumCountingPreferences, int maxPepLength, IdentificationParameters identificationParameters)
      Returns the spectrum counting index based on the project settings.
      Parameters:
      identification - the identification
      sequenceProvider - a provider for the protein sequences
      proteinMatchKey - the protein match key
      spectrumCountingPreferences - the spectrum counting preferences
      maxPepLength - the maximal length accepted for a peptide
      identificationParameters - the identification parameters
      Returns:
      the spectrum counting index
    • getObservableCoverage

      public double getObservableCoverage​(long proteinMatchKey)
      Returns the best protein coverage possible according to the given cleavage settings.
      Parameters:
      proteinMatchKey - the key of the protein match of interest
      Returns:
      the best protein coverage possible according to the given cleavage settings while estimating the probability to observe an amino acid
    • observableCoverageInCache

      public boolean observableCoverageInCache​(long proteinMatchKey)
      Indicates whether the observable coverage of a protein match is in cache.
      Parameters:
      proteinMatchKey - the key of the protein match
      Returns:
      true if the data is in cache
    • updateObservableCoverage

      public void updateObservableCoverage​(long proteinMatchKey)
      Updates the best protein coverage possible according to the given cleavage settings. Used when the main key for a protein has been altered.
      Parameters:
      proteinMatchKey - the key of the protein match of interest
    • getNValidatedProteins

      public int getNValidatedProteins()
      Returns the number of validated proteins. Note that this value is only available after getSortedProteinKeys has been called.
      Returns:
      the number of validated proteins
    • getNConfidentProteins

      public int getNConfidentProteins()
      Returns the number of confident proteins. Note that this value is only available after getSortedProteinKeys has been called.
      Returns:
      the number of confident proteins
    • getNUniquePeptides

      public int getNUniquePeptides​(long proteinMatchKey)
      Returns the number of unique peptides for this protein match. Note, this is independent of the validation status.
      Parameters:
      proteinMatchKey - the key of the match
      Returns:
      the number of unique peptides
    • getNUniqueValidatedPeptides

      public int getNUniqueValidatedPeptides​(long proteinMatchKey)
      Returns the number of unique validated peptides for this protein match.
      Parameters:
      proteinMatchKey - the key of the match
      Returns:
      the number of unique validated peptides
    • hasEnzymaticPeptides

      public boolean hasEnzymaticPeptides​(long proteinMatchKey)
      Returns true if the leading protein of the given group has any enzymatic peptides.
      Parameters:
      proteinMatchKey - the protein match
      Returns:
      true if the protein has any enzymatic peptides
    • getNEnzymaticTermini

      public int getNEnzymaticTermini​(Peptide peptide, String accession)
      Returns the maximal number of enzymatic termini for the given peptide on the given protein. Returns 0 if the peptide is not found on the protein. Returns 2 if no enzyme was used.
      Parameters:
      peptide - the peptide
      accession - the accession of the protein
      Returns:
      the maximal number of enzymatic termini for the given peptide on the given protein
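To make the idea of counting enzymatic termini concrete, here is a stand-alone sketch. It assumes a simplified trypsin-like rule (cleavage after K or R, with protein ends counted as enzymatic) and works on plain strings; the class and method names are hypothetical and the library's own logic, which relies on the configured digestion settings, may differ.

```java
final class EnzymaticTerminiSketch {

    /** Simplified, assumed cleavage rule: the enzyme cuts after K or R. */
    static boolean cleavesAfter(char aa) {
        return aa == 'K' || aa == 'R';
    }

    /**
     * Returns the maximal number of enzymatic termini (0, 1 or 2) over all
     * occurrences of the peptide in the protein, or 0 if it is not found.
     */
    static int maxEnzymaticTermini(String protein, String peptide) {
        if (peptide.isEmpty()) {
            return 0;
        }
        int best = 0;
        int start = protein.indexOf(peptide);
        while (start >= 0) {
            int end = start + peptide.length(); // exclusive
            int termini = 0;
            // N-terminus: protein start, or the preceding residue is a cleavage site.
            if (start == 0 || cleavesAfter(protein.charAt(start - 1))) {
                termini++;
            }
            // C-terminus: protein end, or the last residue of the peptide is a cleavage site.
            if (end == protein.length() || cleavesAfter(protein.charAt(end - 1))) {
                termini++;
            }
            best = Math.max(best, termini);
            start = protein.indexOf(peptide, start + 1);
        }
        return best;
    }
}
```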
    • getNValidatedPeptides

      public int getNValidatedPeptides​(long proteinMatchKey)
      Returns the number of validated peptides for a given protein match.
      Parameters:
      proteinMatchKey - the key of the protein match
      Returns:
      the number of validated peptides
    • getNConfidentPeptides

      public int getNConfidentPeptides​(long proteinMatchKey)
      Returns the number of confident peptides for a given protein match.
      Parameters:
      proteinMatchKey - the key of the protein match
      Returns:
      the number of confident peptides
    • updateNConfidentPeptides

      public void updateNConfidentPeptides​(long proteinMatchKey)
      Updates the number of confident peptides for a given protein match.
      Parameters:
      proteinMatchKey - the key of the protein match
    • updateNConfidentSpectra

      public void updateNConfidentSpectra​(long proteinMatchKey)
      Updates the number of confident spectra for a given protein match.
      Parameters:
      proteinMatchKey - the key of the protein match
    • nValidatedPeptidesInCache

      public boolean nValidatedPeptidesInCache​(long proteinMatchKey)
      Indicates whether the number of validated peptides is in cache for a given protein match.
      Parameters:
      proteinMatchKey - the key of the protein match
      Returns:
      true if the information is in cache
    • getNSpectra

      public Integer getNSpectra​(long proteinMatchKey)
      Estimates the number of spectra for the given protein match.
      Parameters:
      proteinMatchKey - the key of the given protein match
      Returns:
      the number of spectra for the given protein match
    • nSpectraInCache

      public boolean nSpectraInCache​(long proteinMatchKey)
      Indicates whether the number of spectra for a given protein match is in cache.
      Parameters:
      proteinMatchKey - the key of the protein match
      Returns:
      true if the data is in cache
    • getMaxNSpectra

      public int getMaxNSpectra()
      Returns the maximum number of spectra accounted for by a single peptide match among all those found in a protein match.
      Returns:
      the maximum number of spectra accounted for by a single peptide match among all those found in a protein match
    • getNValidatedSpectra

      public int getNValidatedSpectra​(long proteinMatchKey)
      Returns the number of validated spectra for a given protein match.
      Parameters:
      proteinMatchKey - the key of the protein match
      Returns:
      the number of validated spectra
    • getNConfidentSpectra

      public int getNConfidentSpectra​(long proteinMatchKey)
      Returns the number of confident spectra for a given protein match.
      Parameters:
      proteinMatchKey - the key of the protein match
      Returns:
      the number of confident spectra
    • nValidatedSpectraInCache

      public boolean nValidatedSpectraInCache​(long proteinMatchKey)
      Indicates whether the number of validated spectra is in cache for the given protein match.
      Parameters:
      proteinMatchKey - the key of the protein match
      Returns:
      true if the data is in cache
    • getNValidatedSpectraForPeptide

      public int getNValidatedSpectraForPeptide​(long peptideMatchKey)
      Returns the number of validated spectra for a given peptide match.
      Parameters:
      peptideMatchKey - the key of the peptide match
      Returns:
      the number of validated spectra
    • getNConfidentSpectraForPeptide

      public int getNConfidentSpectraForPeptide​(long peptideMatchKey)
      Returns the number of confident spectra for a given peptide match.
      Parameters:
      peptideMatchKey - the key of the peptide match
      Returns:
      the number of confident spectra
    • updateNConfidentSpectraForPeptide

      public void updateNConfidentSpectraForPeptide​(long peptideMatchKey)
      Updates the number of confident spectra for a given peptide match.
      Parameters:
      peptideMatchKey - the key of the peptide match
    • nValidatedSpectraForPeptideInCache

      public boolean nValidatedSpectraForPeptideInCache​(long peptideMatchKey)
      Indicates whether the number of validated spectra for a peptide match is in cache.
      Parameters:
      peptideMatchKey - the key of the peptide match
      Returns:
      true if the data is in cache
    • clearSpectrumCounting

      public void clearSpectrumCounting()
      Clears the spectrum counting data in cache.
    • getConfidentModificationSites

      public String getConfidentModificationSites​(IdentificationMatch identificationMatch, String sequence)
      Returns a summary of all modifications present on the sequence confidently assigned to an amino acid. Example: SEQVEM<mox>CE gives Oxidation of M (M6).
      Parameters:
      identificationMatch - the identification match
      sequence - the sequence
      Returns:
      a modification summary for the given match
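The summary format in the example above (SEQVEM<mox>CE giving Oxidation of M (M6)) can be reproduced with a short stand-alone sketch. The input map of modification name to 1-based sites, the class name, and the separators used when there are several sites or several modifications are assumptions for illustration; only the single-site format is taken from the documented example.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class ConfidentSitesSketch {

    /**
     * Formats each modification as "name (X6, X9)", where X is the residue at
     * the 1-based site. Modifications are listed alphabetically and joined
     * with "; " (an assumed separator).
     */
    static String summarize(String sequence, Map<String, List<Integer>> sitesByModification) {
        StringBuilder summary = new StringBuilder();
        Map<String, List<Integer>> sorted = new TreeMap<String, List<Integer>>(sitesByModification);
        for (Map.Entry<String, List<Integer>> entry : sorted.entrySet()) {
            if (summary.length() > 0) {
                summary.append("; ");
            }
            summary.append(entry.getKey()).append(" (");
            List<Integer> sites = new ArrayList<Integer>(entry.getValue());
            Collections.sort(sites);
            for (int i = 0; i < sites.size(); i++) {
                int site = sites.get(i);
                if (i > 0) {
                    summary.append(", ");
                }
                summary.append(sequence.charAt(site - 1)).append(site);
            }
            summary.append(')');
        }
        return summary.toString();
    }
}
```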
    • getConfidentModificationSitesNumber

      public String getConfidentModificationSitesNumber​(IdentificationMatch identificationMatch)
      Returns the number of confidently localized variable modifications.
      Parameters:
      identificationMatch - the identification match
      Returns:
      a modification summary for the given match
    • getAmbiguousModificationSites

      public String getAmbiguousModificationSites​(IdentificationMatch identificationMatch, String sequence)
      Returns a list of the modifications present on the sequence ambiguously assigned to an amino acid grouped by representative site followed by secondary ambiguous sites. Example: SEQVEM<mox>CEM<mox>K returns M6 {M9}.
      Parameters:
      identificationMatch - the identification match
      sequence - the sequence
      Returns:
      a modification summary for the given match
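The "representative site followed by secondary ambiguous sites" layout from the example above (M6 {M9}) can be sketched as follows. This is a stand-alone illustration: the map from representative site to secondary sites, the class name, and the separators between several groups or several secondary sites are assumptions, not the library's documented behavior.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class AmbiguousSitesSketch {

    /** Labels a 1-based site with its residue, e.g. site 6 of SEQVEMCEMK becomes "M6". */
    static String label(String sequence, int site) {
        return sequence.charAt(site - 1) + Integer.toString(site);
    }

    /**
     * Formats each group as "representative {secondary sites}", ordered by
     * representative site. Groups are joined with ", " and secondary sites
     * with a space (both assumed).
     */
    static String summarize(String sequence, Map<Integer, List<Integer>> secondaryByRepresentative) {
        StringBuilder summary = new StringBuilder();
        Map<Integer, List<Integer>> sorted = new TreeMap<Integer, List<Integer>>(secondaryByRepresentative);
        for (Map.Entry<Integer, List<Integer>> group : sorted.entrySet()) {
            if (summary.length() > 0) {
                summary.append(", ");
            }
            summary.append(label(sequence, group.getKey()));
            StringBuilder secondary = new StringBuilder();
            for (int site : group.getValue()) {
                if (secondary.length() > 0) {
                    secondary.append(' ');
                }
                secondary.append(label(sequence, site));
            }
            summary.append(" {").append(secondary).append('}');
        }
        return summary.toString();
    }
}
```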
    • getAmbiguousModificationSiteNumber

      public String getAmbiguousModificationSiteNumber​(IdentificationMatch identificationMatch)
      Returns a summary of the number of modifications present on the sequence ambiguously assigned to an amino acid grouped by representative site followed by secondary ambiguous sites. Example: SEQVEM<mox>CEM<mox>K returns M6 {M9}.
      Parameters:
      identificationMatch - the identification match
      Returns:
      a modification summary for the given match
    • getConfidentModificationSites

      public String getConfidentModificationSites​(IdentificationMatch match, String sequence, ArrayList<String> targetedModifications)
      Returns a summary of the modifications present on the peptide sequence confidently assigned to an amino acid with focus on given list of modifications.
      Parameters:
      match - the identification match
      sequence - the sequence
      targetedModifications - the modifications to include in the summary
      Returns:
      a modification summary for the given match
    • getConfidentModificationSitesNumber

      public String getConfidentModificationSitesNumber​(IdentificationMatch match, ArrayList<String> targetedModifications)
      Returns the number of confidently localized variable modifications.
      Parameters:
      match - the identification match
      targetedModifications - the modifications to include in the summary
      Returns:
      a modification summary for the given match
    • getAmbiguousModificationSites

      public String getAmbiguousModificationSites​(IdentificationMatch match, String sequence, ArrayList<String> targetedModifications)
      Returns a list of the modifications present on the sequence ambiguously assigned to an amino acid grouped by representative site followed by secondary ambiguous sites. Example: SEQVEM<mox>CEM<mox>K returns M6 {M9}.
      Parameters:
      match - the identification match
      sequence - the sequence
      targetedModifications - the modifications to include in the summary
      Returns:
      a modification summary for the given match
    • getAmbiguousModificationSiteNumber

      public String getAmbiguousModificationSiteNumber​(IdentificationMatch match, ArrayList<String> targetedModifications)
      Returns a summary of the number of modifications present on the sequence ambiguously assigned to an amino acid grouped by representative site followed by secondary ambiguous sites. Example: SEQVEM<mox>CEM<mox>K returns M6 {M9}.
      Parameters:
      match - the identification match
      targetedModifications - the modifications to include in the summary
      Returns:
      a modification summary for the given match
    • getModifiedSequence

      public String getModifiedSequence​(IdentificationMatch identificationMatch, String sequence)
      Returns the match sequence annotated with modifications.
      Parameters:
      identificationMatch - the identification match
      sequence - the sequence of the match
      Returns:
      the match sequence annotated with modifications
    • getValidatedProteins

      public long[] getValidatedProteins​(FilterParameters filterPreferences)
      Returns the list of validated protein keys. Returns null if the proteins have yet to be validated.
      Parameters:
      filterPreferences - the filtering preferences used, can be null
      Returns:
      the list of validated protein keys
    • getValidatedProteins

      public long[] getValidatedProteins​(WaitingHandler waitingHandler, FilterParameters filterPreferences)
      Returns the list of validated protein keys. Returns null if the proteins have yet to be validated.
      Parameters:
      filterPreferences - the filtering preferences used, can be null
      waitingHandler - the waiting handler, can be null
      Returns:
      the list of validated protein keys
    • getProcessedProteinKeys

      public long[] getProcessedProteinKeys​(WaitingHandler waitingHandler, FilterParameters filterPreferences)
      Returns the sorted list of protein keys.
      Parameters:
      filterPreferences - the filtering preferences used, can be null
      waitingHandler - the waiting handler, can be null
      Returns:
      the sorted list of protein keys
    • getProteinKeys

      public long[] getProteinKeys​(WaitingHandler waitingHandler, FilterParameters filterPreferences)
      Returns the ordered protein keys to display when no filtering is applied.
      Parameters:
      waitingHandler - the waiting handler, can be null
      filterPreferences - the filtering preferences used, can be null
      Returns:
      the ordered protein keys to display when no filtering is applied.
    • getSortedPeptideKeys

      public long[] getSortedPeptideKeys​(long proteinKey)
      Returns a sorted list of peptide keys from the protein of interest.
      Parameters:
      proteinKey - the key of the protein of interest
      Returns:
      a sorted list of the corresponding peptide keys
    • getSortedPsmKeys

      public long[] getSortedPsmKeys​(long peptideKey, boolean sortOnRt, boolean forceUpdate)
      Returns the ordered list of spectrum keys for a given peptide.
      Parameters:
      peptideKey - the key of the peptide of interest
      sortOnRt - if true, the PSMs are sorted in retention time, false sorts on PSM score
      forceUpdate - if true, the sorted list is recreated even if not needed
      Returns:
      the ordered list of spectrum keys
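The effect of the `sortOnRt` flag can be illustrated with a stand-alone sketch. The `Psm` holder class, its fields, the choice of best-score-first ordering, and the tie-break on key are assumptions made for this example; the library's actual ordering rules may differ.

```java
import java.util.Arrays;
import java.util.Comparator;

final class PsmSortSketch {

    /** Hypothetical minimal PSM: a key, a retention time, and a score. */
    static final class Psm {
        final long key;
        final double retentionTime;
        final double score;

        Psm(long key, double retentionTime, double score) {
            this.key = key;
            this.retentionTime = retentionTime;
            this.score = score;
        }
    }

    /**
     * Returns the PSM keys ordered by increasing retention time when sortOnRt
     * is true, otherwise by decreasing score. Ties are broken by key so the
     * result is deterministic.
     */
    static long[] sortedKeys(Psm[] psms, boolean sortOnRt) {
        Comparator<Psm> order = sortOnRt
                ? Comparator.comparingDouble((Psm p) -> p.retentionTime)
                : Comparator.comparingDouble((Psm p) -> p.score).reversed();
        return Arrays.stream(psms)
                .sorted(order.thenComparingLong(p -> p.key))
                .mapToLong(p -> p.key)
                .toArray();
    }
}
```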
    • getNValidatedPsms

      public int getNValidatedPsms()
      Returns the number of validated PSMs for the last selected peptide. Note that this value is only available after getSortedPsmKeys has been called.
      Returns:
      the number of validated PSMs for the last selected peptide
    • setProteinKeys

      public void setProteinKeys​(long[] proteinList)
      Sets the ordered protein list.
      Parameters:
      proteinList - the ordered protein list
    • getIdentificationFeaturesCache

      public IdentificationFeaturesCache getIdentificationFeaturesCache()
      Returns the identification features cache.
      Returns:
      the identification features cache
    • setIdentificationFeaturesCache

      public void setIdentificationFeaturesCache​(IdentificationFeaturesCache identificationFeaturesCache)
      Sets the identification features cache.
      Parameters:
      identificationFeaturesCache - the new identification features cache
    • getMetrics

      public Metrics getMetrics()
      Returns the metrics.
      Returns:
      the metrics
    • setSpectrumCountingPreferences

      public void setSpectrumCountingPreferences​(SpectrumCountingParameters spectrumCountingPreferences)
      Sets the spectrum counting preferences.
      Parameters:
      spectrumCountingPreferences - the spectrum counting preferences
    • getNValidatedProteinGroups

      public int getNValidatedProteinGroups​(long peptideKey)
      Returns the number of validated protein groups in which the given peptide is found.
      Parameters:
      peptideKey - the peptide key of interest
      Returns:
      the number of validated protein groups in which the peptide is found
    • getNValidatedProteinGroups

      public int getNValidatedProteinGroups​(long peptideKey, WaitingHandler waitingHandler)
      Returns the number of validated protein groups in which the given peptide is found.
      Parameters:
      peptideKey - the peptide key of interest
      waitingHandler - waiting handler allowing the canceling of the progress
      Returns:
      the number of validated protein groups in which the peptide is found