Class Header
java.lang.Object
com.compomics.util.experiment.personalization.ExperimentObject
com.compomics.util.experiment.io.biology.protein.Header
- All Implemented Interfaces:
Serializable, Cloneable
This class represents the header for a Protein instance. It is meant to work
closely with FASTA format notation. The Header class knows how to handle
certain often-used headers such as SwissProt and NCBI formatted FASTA
headers.
Note that the Header class is its own factory, and should be used as such.
- Author:
- Lennart Martens, Harald Barsnes, Marc Vaudel
Field Summary
Fields inherited from class com.compomics.util.experiment.personalization.ExperimentObject
NO_KEY
-
Method Summary
Modifier and Type, Method, Description:
- void addAddendum(String aAddendum): This method allows the addition of an addendum to the list.
- asGenericHeader(): Returns the header in generic format.
- clone(): This method provides a deep copy of the Header instance.
- getAbbreviatedFASTAHeader(): This method returns an abbreviated version of the Header, suitable for inclusion in FASTA formatted files.
- getAbbreviatedFASTAHeader(String decoyTag): This method returns an abbreviated version of the Header, suitable for inclusion in FASTA formatted files.
- getAbbreviatedFASTAHeaderWithAddenda(): This method returns an abbreviated version of the Header, suitable for inclusion in FASTA formatted files.
- getAccession(): Returns the accession.
- getAccessionOrRest(): Returns the accession or, if the accession is null, the rest of the header.
- getAddenda(): This method allows the caller to retrieve all addenda for the current header, or 'null' if there aren't any.
- getCoreHeader(): This method reports on the core information for the header, which consists of the ID and the accession String.
- getDatabaseType(): Returns the database type as inferred from the header structure.
- static String getDatabaseTypeAsString(ProteinDatabase databaseType): Convenience method returning the database name as a String.
- static String[] getDatabaseTypesAsString(): Returns the implemented database types as an array of String.
- getDescription(): Returns the description.
- getDescriptionProteinName(): Returns the protein name as inferred from the description.
- getDescriptionShort(): Returns the short description.
- int getEndLocation(): This method reports on the end index of the header.
- getForeignAccession(): Returns the foreign accession.
- getForeignDescription(): Returns the foreign description.
- getForeignID(): Returns the foreign ID.
- getFullHeaderWithAddenda(): This method reports on the full header, with the addenda (if present).
- getGeneName(): Returns the gene name.
- getID(): Returns the ID.
- getOrganismIdentifier(): Returns the organism identifier.
- static String getProteinEvidencAsString(type): Returns the UniProt protein evidence type as text.
- getProteinEvidence(): Returns the protein evidence level as indexed in UniProt.
- getRawHeader(): Returns the entire header.
- getRest(): Returns the rest of the header.
- int getScore(): This method will attribute a score to the current header, based on the following scoring list: SwissProt : 4; IPI, SwissProt reference : 3; IPI, TrEMBL or REFSEQ_NP reference : 2; IPI, without SwissProt, TrEMBL or REFSEQ_NP reference : 1; NCBI, SwissProt reference : 2; NCBI, other reference : 1; Unknown header format : 0.
- getSimpleProteinDescription(): Returns a simplified protein description for a UniProt header.
- int getStartLocation(): This method reports on the start index of the header.
- getTaxonomy(): Returns the taxonomy.
- boolean hasAddenda(): This method reports on the presence of addenda for this header.
- static Header parseFromFASTA(String aFASTAHeader): Factory method that constructs a Header instance based on a FASTA header line.
- void setAccession(String aAccession): Sets the accession.
- void setDatabaseType(ProteinDatabase aDatabaseType): Sets the database type.
- void setDescription(String aDescription): Sets the description.
- void setDescriptionProteinName(String aDescriptionProteinName): Sets the protein name.
- void setDescriptionShort(String aDescriptionShort): Sets the short description.
- void setForeignAccession(String aForeignAccession): Sets the foreign accession.
- void setForeignDescription(String aForeignDescription): Sets the foreign description.
- void setForeignID(String aForeignID): Sets the foreign ID.
- void setGeneName(String aGeneName): Sets the gene name.
- void setID(aID): Sets the ID.
- void setLocation(int aStart, int aEnd): This method allows the caller to add information to the header about location of the sequence in a certain master sequence.
- void setOrganismIdentifier(String aOrganismIdentifier): Sets the organism identifier.
- void setProteinEvidence(Integer aProteinEvidence): Sets the protein evidence level.
- void setRawHeader(String aRawHeader): Sets the entire header.
- void setRest(aRest): Sets the rest of the header.
- void setTaxonomy(String aTaxonomy): Sets the taxonomy.
- toString(): This method reports on the entire processed(!) header.
- toString(decoyTag): This method reports on the entire processed(!) header, with the given decoy tag added.
Methods inherited from class com.compomics.util.experiment.personalization.ExperimentObject
addUrParam, asLong, clearParametersMap, getId, getUrParam, getUrParams, removeUrParam, setId, setUrParams
-
Method Details
-
parseFromFASTA
Factory method that constructs a Header instance based on a FASTA header line.
- Parameters:
aFASTAHeader- the String with the original FASTA header line.
- Returns:
- Header with the Header instance representing the given header. The object returned will have been parsed correctly if it is a standard SwissProt or NCBI formatted header, and will be plain in all other cases.
- Throws:
StringIndexOutOfBoundsException- thrown if issues occur during the parsing
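To give a feel for what parsing a standard header involves, the stand-alone sketch below splits a line of the form >db|accession|rest into three parts and falls back to a plain description otherwise. It does not call the Header class and does not reproduce its parsing rules; the class name FastaHeaderSketch, the returned array layout and the accession X00001 are assumptions made up for this example.

```java
import java.util.Arrays;

// Hypothetical stand-alone sketch; NOT the parsing logic of the Header class.
class FastaHeaderSketch {

    /** Returns {id, accession, rest}; id and accession are null for unrecognised headers. */
    static String[] split(String fastaHeader) {
        // Drop the leading '>' if present.
        String line = fastaHeader.startsWith(">") ? fastaHeader.substring(1) : fastaHeader;
        // Expect "db|accession|rest"; the limit of 3 keeps any further '|' inside the rest.
        String[] parts = line.split("\\|", 3);
        if (parts.length < 3) {
            // Unknown format: keep the whole line as a plain description.
            return new String[]{null, null, line};
        }
        return new String[]{parts[0], parts[1], parts[2]};
    }

    public static void main(String[] args) {
        // X00001 is a made-up accession used only for illustration.
        System.out.println(Arrays.toString(
                split(">sp|X00001|GRP78_HUMAN 78 kDa glucose-regulated protein")));
    }
}
```

With the real class, the equivalent entry point is the static factory parseFromFASTA(String aFASTAHeader) documented above.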
-
getID
Returns the ID. Null if not set.
- Returns:
- the ID
-
setID
Sets the ID.
- Parameters:
aID- the ID
-
getForeignID
Returns the foreign ID. Null if not set.
- Returns:
- the foreign ID
-
setForeignID
Sets the foreign ID.
- Parameters:
aForeignID- the foreign ID
-
getAccession
Returns the accession. Null if not set.
- Returns:
- the accession
-
setAccession
Sets the accession.
- Parameters:
aAccession- the accession
-
getAccessionOrRest
Returns the accession or, if the accession is null, the rest of the header. This is a quick fix for unsupported custom headers.
- Returns:
- the accession or, if the accession is null, the rest of the header
-
getDatabaseType
Returns the database type as inferred from the header structure.
- Returns:
- the database type
-
setDatabaseType
Sets the database type.
- Parameters:
aDatabaseType- the database type
-
getForeignAccession
Returns the foreign accession. Null if not set.
- Returns:
- the foreign accession
-
setForeignAccession
Sets the foreign accession.
- Parameters:
aForeignAccession- the foreign accession
-
getDescription
Returns the description. Null if not set.
- Returns:
- the description
-
setDescription
Sets the description.
- Parameters:
aDescription- the description
-
getDescriptionShort
Returns the short description. Null if not set.
- Returns:
- the short description
-
setDescriptionShort
Sets the short description.
- Parameters:
aDescriptionShort- the short description
-
getDescriptionProteinName
Returns the protein name as inferred from the description.
- Returns:
- the protein name
-
setDescriptionProteinName
Sets the protein name.
- Parameters:
aDescriptionProteinName- the protein name
-
getGeneName
Returns the gene name.
- Returns:
- the gene name
-
setGeneName
Sets the gene name.
- Parameters:
aGeneName- the gene name
-
getProteinEvidence
Returns the protein evidence level as indexed in UniProt. Null if not available.
- Returns:
- the protein evidence level
-
setProteinEvidence
Sets the protein evidence level.
- Parameters:
aProteinEvidence- the protein evidence level
-
getTaxonomy
Returns the taxonomy.
- Returns:
- the taxonomy
-
setTaxonomy
Sets the taxonomy.
- Parameters:
aTaxonomy- the taxonomy
-
getOrganismIdentifier
Returns the organism identifier.
- Returns:
- the organism identifier
-
setOrganismIdentifier
Sets the organism identifier.
- Parameters:
aOrganismIdentifier- the organism identifier
-
getForeignDescription
Returns the foreign description.
- Returns:
- the foreign description
-
setForeignDescription
Sets the foreign description.
- Parameters:
aForeignDescription- the foreign description
-
getRest
Returns the rest of the header.
- Returns:
- the rest of the header
-
setRest
Sets the rest of the header.
- Parameters:
aRest- the rest of the header
-
getRawHeader
Returns the entire header.
- Returns:
- the entire header
-
setRawHeader
Sets the entire header.
- Parameters:
aRawHeader- the entire header
-
getSimpleProteinDescription
Returns a simplified protein description for a UniProt header. For example, "GRP78_HUMAN 78 kDa glucose-regulated protein OS=Homo sapiens GN=HSPA5 PE=1 SV=2" becomes "78 kDa glucose-regulated protein [GRP78_HUMAN]". For non-UniProt headers, the normal protein description is returned.
- Returns:
- a simplified protein description for a UniProt header
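The transformation in the example above can be reproduced with a few string operations. The sketch below is one possible way to do it, not the library's implementation: the class name SimpleDescriptionSketch and the rule "entry name is the first token, protein name runs up to ' OS='" are assumptions chosen to match the documented example.

```java
// Hypothetical stand-alone sketch; NOT the code behind getSimpleProteinDescription().
class SimpleDescriptionSketch {

    static String simplify(String description) {
        int firstSpace = description.indexOf(' ');
        int osIndex = description.indexOf(" OS=");
        // Assumption: a UniProt-style description has an entry name, a protein name and an " OS=" field.
        if (firstSpace < 0 || osIndex <= firstSpace) {
            return description; // not UniProt-like: return the description unchanged
        }
        String entryName = description.substring(0, firstSpace);
        String proteinName = description.substring(firstSpace + 1, osIndex);
        return proteinName + " [" + entryName + "]";
    }

    public static void main(String[] args) {
        System.out.println(simplify(
                "GRP78_HUMAN 78 kDa glucose-regulated protein OS=Homo sapiens GN=HSPA5 PE=1 SV=2"));
        // prints: 78 kDa glucose-regulated protein [GRP78_HUMAN]
    }
}
```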
-
getAbbreviatedFASTAHeader
This method returns an abbreviated version of the Header, suitable for inclusion in FASTA formatted files.
The abbreviated header is composed in the following way:
>[ID]|[accession_string]|([foreign_ID]|[foreign_accession_string]|[foreign_description] )[description]
- Returns:
- String with the abbreviated header.
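As an illustration of the composition pattern above, the following sketch builds such a string from its parts. It reads the parenthesised group as optional, that is, present only when a foreign ID exists; that reading, the class name AbbreviatedHeaderSketch and the method signature are assumptions, and the code is independent of the Header class.

```java
// Hypothetical stand-alone sketch; NOT the code behind getAbbreviatedFASTAHeader().
class AbbreviatedHeaderSketch {

    static String compose(String id, String accession, String foreignId,
                          String foreignAccession, String foreignDescription,
                          String description) {
        StringBuilder sb = new StringBuilder(">");
        sb.append(id).append('|').append(accession).append('|');
        // Assumption: the foreign block is emitted only when a foreign ID is set.
        if (foreignId != null) {
            sb.append(foreignId).append('|')
              .append(foreignAccession).append('|')
              .append(foreignDescription).append(' ');
        }
        sb.append(description);
        return sb.toString();
    }

    public static void main(String[] args) {
        // All values below are made up for illustration.
        System.out.println(compose("sw", "X00001", null, null, null, "Example protein"));
        System.out.println(compose("ipi", "X00002", "sw", "X00001", "Foreign text", "Example protein"));
    }
}
```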
-
getAbbreviatedFASTAHeader
This method returns an abbreviated version of the Header, suitable for inclusion in FASTA formatted files.
The abbreviated header is composed in the following way:
>[ID]|[accession_string]|([foreign_ID]|[foreign_accession_string]|[foreign_description] )[description]
- Parameters:
decoyTag- the decoy tag to add
- Returns:
- String with the abbreviated header.
-
toString
This method reports on the entire processed(!) header. To get the raw header use getRawHeader instead.
-
toString
This method reports on the entire processed(!) header, with the given decoy tag added. To get the raw header use getRawHeader instead.
- Parameters:
decoyTag- the decoy tag to add
- Returns:
- String with the full header.
-
getScore
public int getScore()
This method will attribute a score to the current header, based on the following scoring list:
- SwissProt : 4
- IPI, SwissProt reference : 3
- IPI, TrEMBL or REFSEQ_NP reference : 2
- IPI, without SwissProt, TrEMBL or REFSEQ_NP reference : 1
- NCBI, SwissProt reference : 2
- NCBI, other reference : 1
- Unknown header format : 0
- Returns:
- int with the header score. The higher the score, the more interesting a Header is.
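The scoring list above maps directly onto a small lookup. The sketch below encodes it with two hypothetical enums (Db for the header format, Ref for the kind of cross-reference); these types and the class name HeaderScoreSketch are assumptions for illustration and are not part of the library, which derives the score from the parsed header itself.

```java
// Hypothetical stand-alone sketch of the scoring list; NOT the code behind getScore().
class HeaderScoreSketch {

    enum Db { SWISSPROT, IPI, NCBI, UNKNOWN }
    enum Ref { SWISSPROT, TREMBL, REFSEQ_NP, OTHER }

    static int score(Db db, Ref ref) {
        switch (db) {
            case SWISSPROT:
                return 4;
            case IPI:
                if (ref == Ref.SWISSPROT) return 3;
                if (ref == Ref.TREMBL || ref == Ref.REFSEQ_NP) return 2;
                return 1; // IPI without SwissProt, TrEMBL or REFSEQ_NP reference
            case NCBI:
                return ref == Ref.SWISSPROT ? 2 : 1;
            default:
                return 0; // unknown header format
        }
    }

    public static void main(String[] args) {
        System.out.println(score(Db.IPI, Ref.TREMBL)); // prints 2
    }
}
```

A score like this is typically used to pick the most informative of several headers that describe the same sequence, since a higher score marks a more interesting header.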
-
getCoreHeader
This method reports on the core information for the header, which consists of the ID and the accession String:
[ID]|[accession_string]
This is mostly useful for appending this core as an addendum to another header.
- Returns:
- String with the header core data ([ID]|[accession_string]).
-
addAddendum
This method allows the addition of an addendum to the list. If the addendum already starts with '^A', it is added as is; otherwise '^A' is prepended before it is added to the list.
- Parameters:
aAddendum- String with the addendum, optionally preceded by '^A'.
-
getAddenda
This method allows the caller to retrieve all addenda for the current header, or 'null' if there aren't any.
- Returns:
- String with the addenda, or 'null' if there aren't any.
-
hasAddenda
public boolean hasAddenda()
This method reports on the presence of addenda for this header.
- Returns:
- boolean whether addenda are present.
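The contract of addAddendum, getAddenda and hasAddenda can be mimicked with a small holder class. The sketch below assumes that '^A' is the literal two-character string (caret followed by A) and that the addenda are stored concatenated; both points, like the class name AddendaSketch, are assumptions and may differ from the actual Header implementation.

```java
// Hypothetical stand-alone sketch of the addenda contract; NOT the Header implementation.
class AddendaSketch {

    // Assumption: '^A' is the literal two-character marker, not a control character.
    private static final String MARKER = "^A";

    private StringBuilder addenda = null;

    void addAddendum(String addendum) {
        if (addenda == null) {
            addenda = new StringBuilder();
        }
        // Prepend the marker only when the caller did not supply it.
        if (!addendum.startsWith(MARKER)) {
            addenda.append(MARKER);
        }
        addenda.append(addendum);
    }

    /** Returns all addenda, or null if there aren't any. */
    String getAddenda() {
        return addenda == null ? null : addenda.toString();
    }

    boolean hasAddenda() {
        return addenda != null;
    }

    public static void main(String[] args) {
        AddendaSketch sketch = new AddendaSketch();
        sketch.addAddendum("sw|X00001");   // marker is prepended
        sketch.addAddendum("^Agi|12345");  // marker already present, added as is
        System.out.println(sketch.getAddenda());
    }
}
```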
-
getFullHeaderWithAddenda
This method reports on the full header, with the addenda (if present). If no addenda are present, this method reports the same information as the 'toString()' method.
- Returns:
- String with the header and addenda (if any).
-
getAbbreviatedFASTAHeaderWithAddenda
This method returns an abbreviated version of the Header, suitable for inclusion in FASTA formatted files.
The abbreviated header is composed in the following way:
>[ID]|[accession_string]|([foreign_ID]|[foreign_accession_string]|[foreign_description] )[description]([addenda])
Note that the output of this method is identical to that of getAbbreviatedFASTAHeader() if no addenda are present.
- Returns:
- String with the abbreviated header and addenda (if any).
-
setLocation
public void setLocation(int aStart, int aEnd)
This method allows the caller to add information to the header about the location of the sequence in a certain master sequence.
This information is typically specified right after the accession number:
[id]|[accession_string] ([startindex]-[endindex])|...
Please note the following:
- If an index is already present, it is removed and replaced.
- If the header is of unknown format, the indices are appended to the end of the header.
- Parameters:
aStart- int with the startindex.
aEnd- int with the endindex.
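The placement rules above can be sketched on a plain header string with a regular expression. This is only an approximation under stated assumptions: a "known" format is taken to be anything starting with id|accession|, the accession is assumed to contain no spaces or parentheses, and the class name LocationSketch is invented for the example. The real method works on the parsed Header fields rather than on a string.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical stand-alone sketch of the location rules; NOT the code behind setLocation().
class LocationSketch {

    // "id|accession", optionally followed by " (start-end)", then '|'.
    private static final Pattern CORE =
            Pattern.compile("^([^|]*\\|[^|( ]*)( \\(\\d+-\\d+\\))?\\|");

    static String withLocation(String header, int start, int end) {
        String location = " (" + start + "-" + end + ")";
        Matcher m = CORE.matcher(header);
        if (m.find()) {
            // Known layout: place the location right after the accession,
            // dropping any location that was already there.
            return m.group(1) + location + "|" + header.substring(m.end());
        }
        // Unknown layout: strip an existing trailing location, then append the new one.
        return header.replaceFirst(" \\(\\d+-\\d+\\)$", "") + location;
    }

    public static void main(String[] args) {
        System.out.println(withLocation("sw|X00001|Example protein", 5, 20));
        System.out.println(withLocation("sw|X00001 (1-3)|Example protein", 5, 20));
        System.out.println(withLocation("custom header", 5, 20));
    }
}
```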
-
getStartLocation
public int getStartLocation()
This method reports on the start index of the header. It returns '-1' if no location is specified.
- Returns:
- int with the start location, or '-1' if none was defined.
-
getEndLocation
public int getEndLocation()
This method reports on the end index of the header. It returns '-1' if no location is specified.
- Returns:
- int with the end location, or '-1' if none was defined.
-
clone
This method provides a deep copy of the Header instance.
-
getDatabaseTypesAsString
Returns the implemented database types as an array of String.
- Returns:
- the implemented database types as an array of String
-
getDatabaseTypeAsString
Convenience method returning the database name as a String.
- Parameters:
databaseType- the database type
- Returns:
- the name
-
getProteinEvidencAsString
Returns the UniProt protein evidence type as text.
- Parameters:
type- the type of evidence
- Returns:
- the protein evidence type as text
-
asGenericHeader
Returns the header in generic format.
- Returns:
- the header in generic format
-