java.lang.Object
com.compomics.util.experiment.personalization.ExperimentObject
com.compomics.util.experiment.io.biology.protein.Header
All Implemented Interfaces:
Serializable, Cloneable

public class Header
extends ExperimentObject
implements Cloneable
This class represents the header for a Protein instance. It is meant to work closely with FASTA format notation. The Header class knows how to handle commonly used header formats, such as SwissProt- and NCBI-formatted FASTA headers.
Note that the Header class is its own factory, and should be used as such.
Author:
Lennart Martens, Harald Barsnes, Marc Vaudel
See Also:
Serialized Form
  • Method Details

    • parseFromFASTA

      public static Header parseFromFASTA​(String aFASTAHeader) throws StringIndexOutOfBoundsException
      Factory method that constructs a Header instance based on a FASTA header line.
      Parameters:
      aFASTAHeader - the String with the original FASTA header line.
      Returns:
      a Header instance representing the given header line. A standard SwissProt- or NCBI-formatted header is parsed into its fields; in all other cases the returned Header is plain.
      Throws:
      StringIndexOutOfBoundsException - thrown if the header line cannot be parsed
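      The "own factory" idea can be illustrated with a minimal, self-contained sketch. MiniHeader and its fields are hypothetical stand-ins, not the compomics Header class, and the only format recognized here is the UniProt-style ">sp|accession|entry name ..." layout; how the real parser recognizes formats is not shown in this documentation.

```java
final class FactorySketch {

    /** Hypothetical, much-reduced stand-in for Header; not the compomics class. */
    static final class MiniHeader {
        final String database;  // "sp" or "tr", or null for a plain header
        final String accession; // null for a plain header
        final String rest;      // whatever was not parsed into a field

        // Private constructor: instances are only created through parse().
        private MiniHeader(String database, String accession, String rest) {
            this.database = database;
            this.accession = accession;
            this.rest = rest;
        }

        /** Static factory: parses known layouts, falls back to a plain header. */
        static MiniHeader parse(String line) {
            String s = line.startsWith(">") ? line.substring(1) : line;
            String[] parts = s.split("\\|", 3);
            if (parts.length == 3 && (parts[0].equals("sp") || parts[0].equals("tr"))) {
                return new MiniHeader(parts[0], parts[1], parts[2]);
            }
            // Assumption: unknown layouts keep the whole line as "rest".
            return new MiniHeader(null, null, s);
        }
    }

    public static void main(String[] args) {
        MiniHeader known = MiniHeader.parse(">sp|P11021|GRP78_HUMAN 78 kDa glucose-regulated protein");
        MiniHeader plain = MiniHeader.parse(">my custom protein");
        System.out.println(known.accession); // P11021
        System.out.println(plain.accession); // null
    }
}
```

      With the real class, callers would go through Header.parseFromFASTA(String) in the same way rather than calling a constructor.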
    • getID

      public String getID()
      Returns the ID. Null if not set.
      Returns:
      the ID
    • setID

      public void setID(String aID)
      Sets the ID.
      Parameters:
      aID - the ID
    • getForeignID

      public String getForeignID()
      Returns the foreign ID. Null if not set.
      Returns:
      the foreign ID
    • setForeignID

      public void setForeignID​(String aForeignID)
      Sets the foreign ID.
      Parameters:
      aForeignID - the foreign ID
    • getAccession

      public String getAccession()
      Returns the accession. Null if not set.
      Returns:
      the accession
    • setAccession

      public void setAccession​(String aAccession)
      Sets the accession.
      Parameters:
      aAccession - the accession
    • getAccessionOrRest

      public String getAccessionOrRest()
      Returns the accession or, if the accession is null, the rest of the header. This is a fallback for unsupported custom headers.
      Returns:
      the accession or, if the accession is null, the rest of the header
    • getDatabaseType

      public ProteinDatabase getDatabaseType()
      Returns the database type as inferred from the header structure.
      Returns:
      the database type
    • setDatabaseType

      public void setDatabaseType​(ProteinDatabase aDatabaseType)
      Sets the database type.
      Parameters:
      aDatabaseType - the database type
    • getForeignAccession

      public String getForeignAccession()
      Returns the foreign accession. Null if not set.
      Returns:
      the foreign accession
    • setForeignAccession

      public void setForeignAccession​(String aForeignAccession)
      Sets the foreign accession.
      Parameters:
      aForeignAccession - the foreign accession
    • getDescription

      public String getDescription()
      Returns the description. Null if not set.
      Returns:
      the description
    • setDescription

      public void setDescription​(String aDescription)
      Sets the description.
      Parameters:
      aDescription - the description
    • getDescriptionShort

      public String getDescriptionShort()
      Returns the short description. Null if not set.
      Returns:
      the short description
    • setDescriptionShort

      public void setDescriptionShort​(String aDescriptionShort)
      Sets the short description.
      Parameters:
      aDescriptionShort - the short description
    • getDescriptionProteinName

      public String getDescriptionProteinName()
      Returns the protein name as inferred from the description.
      Returns:
      the protein name
    • setDescriptionProteinName

      public void setDescriptionProteinName​(String aDescriptionProteinName)
      Sets the protein name.
      Parameters:
      aDescriptionProteinName - the protein name
    • getGeneName

      public String getGeneName()
      Returns the gene name.
      Returns:
      the gene name
    • setGeneName

      public void setGeneName​(String aGeneName)
      Sets the gene name.
      Parameters:
      aGeneName - the gene name
    • getProteinEvidence

      public Integer getProteinEvidence()
      Returns the protein evidence level as indexed in UniProt. Null if not available.
      Returns:
      the protein evidence level
    • setProteinEvidence

      public void setProteinEvidence​(Integer aProteinEvidence)
      Sets the protein evidence level.
      Parameters:
      aProteinEvidence - the protein evidence level
    • getTaxonomy

      public String getTaxonomy()
      Returns the taxonomy.
      Returns:
      the taxonomy
    • setTaxonomy

      public void setTaxonomy​(String aTaxonomy)
      Sets the taxonomy.
      Parameters:
      aTaxonomy - the taxonomy
    • getForeignDescription

      public String getForeignDescription()
      Returns the foreign description.
      Returns:
      the foreign description
    • setForeignDescription

      public void setForeignDescription​(String aForeignDescription)
      Sets the foreign description.
      Parameters:
      aForeignDescription - the foreign description
    • getRest

      public String getRest()
      Returns the rest of the header.
      Returns:
      the rest of the header
    • setRest

      public void setRest​(String aRest)
      Sets the rest of the header.
      Parameters:
      aRest - the rest of the header
    • getRawHeader

      public String getRawHeader()
      Returns the entire header.
      Returns:
      the entire header
    • setRawHeader

      public void setRawHeader​(String aRawHeader)
      Sets the entire header.
      Parameters:
      aRawHeader - the entire header
    • getSimpleProteinDescription

      public String getSimpleProteinDescription()
      Returns a simplified protein description for a UniProt header. For example, "GRP78_HUMAN 78 kDa glucose-regulated protein OS=Homo sapiens GN=HSPA5 PE=1 SV=2" becomes "78 kDa glucose-regulated protein [GRP78_HUMAN]". For non-UniProt headers the normal protein description is returned.
      Returns:
      a simplified protein description for a UniProt header
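      The transformation in the example above can be reproduced with a small self-contained sketch. The class and method names are hypothetical, and the rule used here (entry name up to the first space, protein name up to " OS=") is an assumption derived only from that one example, not from the library's implementation.

```java
final class SimpleDescriptionSketch {

    /**
     * Turns "ENTRY_NAME protein name OS=..." into "protein name [ENTRY_NAME]".
     * Descriptions that do not match this layout are returned unchanged.
     */
    static String simplify(String description) {
        int firstSpace = description.indexOf(' ');
        int osIndex = description.indexOf(" OS=");
        if (firstSpace < 0 || osIndex <= firstSpace) {
            return description; // assumption: treat as a non-UniProt description
        }
        String entryName = description.substring(0, firstSpace);
        String proteinName = description.substring(firstSpace + 1, osIndex);
        return proteinName + " [" + entryName + "]";
    }

    public static void main(String[] args) {
        String uniprot = "GRP78_HUMAN 78 kDa glucose-regulated protein "
                + "OS=Homo sapiens GN=HSPA5 PE=1 SV=2";
        System.out.println(simplify(uniprot)); // 78 kDa glucose-regulated protein [GRP78_HUMAN]
        System.out.println(simplify("hypothetical protein")); // hypothetical protein
    }
}
```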
    • getAbbreviatedFASTAHeader

      public String getAbbreviatedFASTAHeader()
      This method returns an abbreviated version of the Header, suitable for inclusion in FASTA formatted files.
      The abbreviated header is composed in the following way:
      >[ID]|[accession_string]|([foreign_ID]|[foreign_accession_string]|[foreign_description] )[description]
      Returns:
      String with the abbreviated header.
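      The composition rule above can be sketched as follows. This is an illustration only: the method and parameter names are hypothetical, and it assumes that the parenthesized foreign part is emitted only when a foreign ID is present, which the documentation implies but does not state. The field values in main are placeholders.

```java
final class AbbreviatedHeaderSketch {

    /**
     * Builds ">[ID]|[accession]|([foreignID]|[foreignAccession]|[foreignDescription] )[description]".
     * The parenthesized group is optional and skipped when foreignId is null.
     */
    static String abbreviate(String id, String accession,
                             String foreignId, String foreignAccession,
                             String foreignDescription, String description) {
        StringBuilder sb = new StringBuilder(">");
        sb.append(id).append('|').append(accession).append('|');
        if (foreignId != null) {
            sb.append(foreignId).append('|')
              .append(foreignAccession).append('|')
              .append(foreignDescription).append(' ');
        }
        sb.append(description);
        return sb.toString();
    }

    public static void main(String[] args) {
        // Without a foreign part.
        System.out.println(abbreviate("db", "ACC1", null, null, null, "some protein"));
        // With a foreign part.
        System.out.println(abbreviate("db", "ACC1", "fdb", "FACC1", "foreign text", "some protein"));
    }
}
```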
    • getAbbreviatedFASTAHeader

      public String getAbbreviatedFASTAHeader​(String decoyTag)
      This method returns an abbreviated version of the Header with the given decoy tag added, suitable for inclusion in FASTA formatted files.
      The abbreviated header is composed in the following way:
      >[ID]|[accession_string]|([foreign_ID]|[foreign_accession_string]|[foreign_description] )[description]
      Parameters:
      decoyTag - the decoy tag to add
      Returns:
      String with the abbreviated header.
    • toString

      public String toString()
      This method returns the entire processed (not raw) header. To get the raw header, use getRawHeader() instead.
      Overrides:
      toString in class Object
      Returns:
      String with the full header.
    • toString

      public String toString​(String decoyTag)
      This method returns the entire processed (not raw) header, with the given decoy tag added. To get the raw header, use getRawHeader() instead.
      Parameters:
      decoyTag - the decoy tag to add
      Returns:
      String with the full header.
    • getScore

      public int getScore()
      This method assigns a score to the current header, based on the following scoring list:
      • SwissProt : 4
      • IPI, SwissProt reference : 3
      • IPI, TrEMBL or REFSEQ_NP reference : 2
      • IPI, without SwissProt, TrEMBL or REFSEQ_NP reference : 1
      • NCBI, SwissProt reference : 2
      • NCBI, other reference : 1
      • Unknown header format : 0
      Returns:
      int with the header score. The higher the score, the more interesting a Header is.
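      The scoring list above maps directly onto a small lookup. The sketch below is a self-contained illustration; the Db and Ref enums are hypothetical and are not the library's ProteinDatabase type, and how the real method determines the cross-reference is not documented here.

```java
final class HeaderScoreSketch {

    /** Hypothetical header formats, for illustration only. */
    enum Db { SWISSPROT, IPI, NCBI, UNKNOWN }

    /** Hypothetical cross-reference kinds found inside an IPI or NCBI header. */
    enum Ref { SWISSPROT, TREMBL, REFSEQ_NP, OTHER }

    /** Returns the score from the documented list; higher means more interesting. */
    static int score(Db db, Ref ref) {
        switch (db) {
            case SWISSPROT:
                return 4;
            case IPI:
                if (ref == Ref.SWISSPROT) {
                    return 3;
                }
                if (ref == Ref.TREMBL || ref == Ref.REFSEQ_NP) {
                    return 2;
                }
                return 1;
            case NCBI:
                return ref == Ref.SWISSPROT ? 2 : 1;
            default:
                return 0; // unknown header format
        }
    }

    public static void main(String[] args) {
        System.out.println(score(Db.IPI, Ref.TREMBL));    // 2
        System.out.println(score(Db.NCBI, Ref.SWISSPROT)); // 2
        System.out.println(score(Db.UNKNOWN, Ref.OTHER)); // 0
    }
}
```

      A typical use of such a score is to pick, among several headers describing the same sequence, the one with the highest value.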
    • getCoreHeader

      public String getCoreHeader()
      This method reports on the core information for the header, which consists of the ID and the accession String:
           [ID]|[accession_string]
       
      This is mostly useful for appending this core as an addendum to another header.
      Returns:
      String with the header core data ([ID]|[accession_string]).
    • addAddendum

      public void addAddendum​(String aAddendum)
      This method allows the addition of an addendum to the list. If the addendum already starts with '^A', it is added as is; otherwise '^A' is prepended before it is added to the list.
      Parameters:
      aAddendum - String with the addendum, optionally preceded by '^A'.
    • getAddenda

      public String getAddenda()
      This method allows the caller to retrieve all addenda for the current header, or 'null' if there aren't any.
      Returns:
      String with the addenda, or 'null' if there aren't any.
    • hasAddenda

      public boolean hasAddenda()
      This method reports on the presence of addenda for this header.
      Returns:
      boolean indicating whether addenda are present.
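      The behavior of addAddendum, getAddenda and hasAddenda described above can be sketched with a small stand-alone class. Two assumptions are made here that the documentation does not confirm: '^A' is treated as the literal two-character string (caret followed by 'A'), and getAddenda is assumed to return the stored addenda concatenated in insertion order.

```java
import java.util.ArrayList;
import java.util.List;

final class AddendaSketch {

    // Assumption: the marker is the literal text "^A", not a control character.
    private static final String MARKER = "^A";

    private final List<String> addenda = new ArrayList<>();

    /** Stores the addendum, prepending the marker only when it is missing. */
    void addAddendum(String addendum) {
        addenda.add(addendum.startsWith(MARKER) ? addendum : MARKER + addendum);
    }

    /** Reports whether at least one addendum has been stored. */
    boolean hasAddenda() {
        return !addenda.isEmpty();
    }

    /** Returns all addenda joined together, or null if there are none. */
    String getAddenda() {
        return addenda.isEmpty() ? null : String.join("", addenda);
    }

    public static void main(String[] args) {
        AddendaSketch sketch = new AddendaSketch();
        System.out.println(sketch.getAddenda()); // null
        sketch.addAddendum("db|ACC2");           // marker is prepended
        sketch.addAddendum("^Adb|ACC3");         // added as is
        System.out.println(sketch.getAddenda()); // ^Adb|ACC2^Adb|ACC3
    }
}
```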
    • getFullHeaderWithAddenda

      public String getFullHeaderWithAddenda()
      This method reports on the full header, with the addenda (if present). If no addenda are present, this method reports the same information as the 'toString()' method.
      Returns:
      String with the header and addenda (if any).
    • getAbbreviatedFASTAHeaderWithAddenda

      public String getAbbreviatedFASTAHeaderWithAddenda()
      This method returns an abbreviated version of the Header, suitable for inclusion in FASTA formatted files.
      The abbreviated header is composed in the following way:
      >[ID]|[accession_string]|([foreign_ID]|[foreign_accession_string]|[foreign_description] )[description]([addenda])
      Note that the output of this method is identical to that of getAbbreviatedFASTAHeader() if no addenda are present.
      Returns:
      String with the abbreviated header and addenda (if any).
    • setLocation

      public void setLocation​(int aStart, int aEnd)
      This method allows the caller to add information to the header about the location of the sequence within a master sequence.
      This information is typically specified right after the accession number:
           [id]|[accession_string] ([startindex]-[endindex])|...
       
      Please note the following:
      • If an index is already present, it is removed and replaced.
      • If the header is of unknown format, the indices are appended to the end of the header.
      Parameters:
      aStart - int with the startindex.
      aEnd - int with the endindex.
    • getStartLocation

      public int getStartLocation()
      This method reports on the start index of the header. It returns '-1' if no location is specified.
      Returns:
      int with the start location, or '-1' if none was defined.
    • getEndLocation

      public int getEndLocation()
      This method reports on the end index of the header. It returns '-1' if no location is specified.
      Returns:
      int with the end location, or '-1' if none was defined.
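      The location notation used by setLocation, getStartLocation and getEndLocation can be illustrated on a bare accession string. The sketch below is hypothetical and works on strings only; whether the real class stores the indices as fields or re-parses the header is not stated in this documentation.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class LocationSketch {

    // Matches a trailing " (start-end)" suffix on an accession string.
    private static final Pattern LOCATION = Pattern.compile(" \\((\\d+)-(\\d+)\\)$");

    /** Appends " (start-end)", replacing any location that is already present. */
    static String withLocation(String accession, int start, int end) {
        String bare = LOCATION.matcher(accession).replaceFirst("");
        return bare + " (" + start + "-" + end + ")";
    }

    /** Returns the start index, or -1 if no location is specified. */
    static int startOf(String accession) {
        Matcher m = LOCATION.matcher(accession);
        return m.find() ? Integer.parseInt(m.group(1)) : -1;
    }

    /** Returns the end index, or -1 if no location is specified. */
    static int endOf(String accession) {
        Matcher m = LOCATION.matcher(accession);
        return m.find() ? Integer.parseInt(m.group(2)) : -1;
    }

    public static void main(String[] args) {
        String located = withLocation("ACC1 (1-10)", 19, 654);
        System.out.println(located);         // ACC1 (19-654)
        System.out.println(startOf(located)); // 19
        System.out.println(endOf("ACC1"));    // -1
    }
}
```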
    • clone

      public Object clone()
      This method provides a deep copy of the Header instance.
      Overrides:
      clone in class Object
      Returns:
      Object Header that is a deep copy of this Header.
    • getDatabaseTypesAsString

      public static String[] getDatabaseTypesAsString()
      Returns the implemented database types as an array of String.
      Returns:
      the implemented database types as an array of String
    • getDatabaseTypeAsString

      public static String getDatabaseTypeAsString​(ProteinDatabase databaseType)
      Convenience method returning the database name as a String.
      Parameters:
      databaseType - the database type
      Returns:
      the name
    • getProteinEvidencAsString

      public static String getProteinEvidencAsString​(Integer type)
      Returns the UniProt protein evidence type as text.
      Parameters:
      type - the type of evidence
      Returns:
      the protein evidence type as text
    • asGenericHeader

      public String asGenericHeader()
      Returns the header in generic format.
      Returns:
      the header in generic format