Class Header
java.lang.Object
com.compomics.util.experiment.personalization.ExperimentObject
com.compomics.util.experiment.io.biology.protein.Header
- All Implemented Interfaces:
Serializable, Cloneable
public class Header extends ExperimentObject implements Cloneable
This class represents the header for a Protein instance. It is meant to work
closely with FASTA format notation. The Header class knows how to handle
certain often-used headers such as SwissProt and NCBI formatted FASTA
headers.
Note that the Header class is its own factory (see parseFromFASTA), and should be used as such.
- Author:
- Lennart Martens, Harald Barsnes, Marc Vaudel
- See Also:
- Serialized Form
-
Method Summary
Modifier and Type Method Description
void addAddendum(String aAddendum)
This method allows the addition of an addendum to the list.
String asGenericHeader()
Returns the header in generic format.
Object clone()
This method provides a deep copy of the Header instance.
String getAbbreviatedFASTAHeader()
This method returns an abbreviated version of the Header, suitable for inclusion in FASTA formatted files.
String getAbbreviatedFASTAHeader(String decoyTag)
This method returns an abbreviated version of the Header, with the given decoy tag added, suitable for inclusion in FASTA formatted files.
String getAbbreviatedFASTAHeaderWithAddenda()
This method returns an abbreviated version of the Header, including any addenda, suitable for inclusion in FASTA formatted files.
String getAccession()
Returns the accession.
String getAccessionOrRest()
Returns the accession or, if this is null, the rest of the header.
String getAddenda()
This method allows the caller to retrieve all addenda for the current header, or 'null' if there aren't any.
String getCoreHeader()
This method reports on the core information for the header, which consists of the ID and the accession string: [ID]|[accession_string].
ProteinDatabase getDatabaseType()
Returns the database type as inferred from the header structure.
static String getDatabaseTypeAsString(ProteinDatabase databaseType)
Convenience method returning the database name as a String.
static String[] getDatabaseTypesAsString()
Returns the implemented database types as an array of String.
String getDescription()
Returns the description.
String getDescriptionProteinName()
Returns the protein name as inferred from the description.
String getDescriptionShort()
Returns the short description.
int getEndLocation()
This method reports on the end index of the header.
String getForeignAccession()
Returns the foreign accession.
String getForeignDescription()
Returns the foreign description.
String getForeignID()
Returns the foreign ID.
String getFullHeaderWithAddenda()
This method reports on the full header, with the addenda (if present).
String getGeneName()
Returns the gene name.
String getID()
Returns the ID.
static String getProteinEvidencAsString(Integer type)
Returns the UniProt protein evidence type as text.
Integer getProteinEvidence()
Returns the protein evidence level as indexed in UniProt.
String getRawHeader()
Returns the entire header.
String getRest()
Returns the rest of the header.
int getScore()
This method attributes a score to the current header, based on the following scoring list: SwissProt: 4; IPI, SwissProt reference: 3; IPI, TrEMBL or REFSEQ_NP reference: 2; IPI, without SwissProt, TrEMBL or REFSEQ_NP reference: 1; NCBI, SwissProt reference: 2; NCBI, other reference: 1; unknown header format: 0.
String getSimpleProteinDescription()
Returns a simplified protein description for a UniProt header.
int getStartLocation()
This method reports on the start index of the header.
String getTaxonomy()
Returns the taxonomy.
boolean hasAddenda()
This method reports on the presence of addenda for this header.
static Header parseFromFASTA(String aFASTAHeader)
Factory method that constructs a Header instance based on a FASTA header line.
void setAccession(String aAccession)
Sets the accession.
void setDatabaseType(ProteinDatabase aDatabaseType)
Sets the database type.
void setDescription(String aDescription)
Sets the description.
void setDescriptionProteinName(String aDescriptionProteinName)
Sets the protein name.
void setDescriptionShort(String aDescriptionShort)
Sets the short description.
void setForeignAccession(String aForeignAccession)
Sets the foreign accession.
void setForeignDescription(String aForeignDescription)
Sets the foreign description.
void setForeignID(String aForeignID)
Sets the foreign ID.
void setGeneName(String aGeneName)
Sets the gene name.
void setID(String aID)
Sets the ID.
void setLocation(int aStart, int aEnd)
This method allows the caller to add information to the header about the location of the sequence within a master sequence.
void setProteinEvidence(Integer aProteinEvidence)
Sets the protein evidence level.
void setRawHeader(String aRawHeader)
Sets the entire header.
void setRest(String aRest)
Sets the rest of the header.
void setTaxonomy(String aTaxonomy)
Sets the taxonomy.
String toString()
This method reports on the entire processed(!) header.
String toString(String decoyTag)
This method reports on the entire processed(!) header, with the given decoy tag added.
Methods inherited from class com.compomics.util.experiment.personalization.ExperimentObject
addUrParam, asLong, clearParametersMap, getId, getUrParam, getUrParams, removeUrParam, setId, setUrParams
-
Method Details
-
parseFromFASTA
Factory method that constructs a Header instance based on a FASTA header line.
- Parameters:
aFASTAHeader
- the String with the original FASTA header line.
- Returns:
- the Header instance representing the given header. The object returned will have been parsed correctly if it is a standard SwissProt or NCBI formatted header, and will be plain in all other cases.
- Throws:
StringIndexOutOfBoundsException
- thrown if an issue occurs during parsing
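To illustrate what "parsed" versus "plain" means here, the following is a minimal, hypothetical sketch that splits a UniProt-style header (>sp|ACCESSION|...) into an ID, an accession and a description, and keeps any other header as a plain "rest" value. The class FastaHeaderSketch, its field layout, and the rule that only the sp| and tr| prefixes are recognized are assumptions for illustration; this is not the compomics parsing logic, which also handles NCBI and other formats.

```java
/**
 * Hypothetical, simplified sketch of a static factory that splits a
 * UniProt-style FASTA header (">sp|ACCESSION|ENTRY_NAME description") into
 * fields and falls back to a "plain" header otherwise. NOT the compomics
 * Header.parseFromFASTA implementation.
 */
final class FastaHeaderSketch {

    final String id;          // database prefix, e.g. "sp" or "tr"; null for plain headers
    final String accession;   // e.g. "P11021"; null for plain headers
    final String description; // text after the accession field; null for plain headers
    final String rest;        // whole header for unrecognized formats; null otherwise

    private FastaHeaderSketch(String id, String accession, String description, String rest) {
        this.id = id;
        this.accession = accession;
        this.description = description;
        this.rest = rest;
    }

    /** Static factory in the spirit of parseFromFASTA (simplified assumption). */
    static FastaHeaderSketch parse(String line) {
        // Drop the leading '>' of a FASTA header line, if present.
        String header = line.startsWith(">") ? line.substring(1) : line;
        if (header.startsWith("sp|") || header.startsWith("tr|")) {
            // Split into at most three fields: ID, accession, remainder.
            String[] parts = header.split("\\|", 3);
            if (parts.length == 3) {
                return new FastaHeaderSketch(parts[0], parts[1], parts[2].trim(), null);
            }
        }
        // Unrecognized format: keep the header "plain".
        return new FastaHeaderSketch(null, null, null, header);
    }

    /** Mirrors the documented getAccessionOrRest() fallback. */
    String accessionOrRest() {
        return accession != null ? accession : rest;
    }
}
```

With this sketch, >sp|P11021|GRP78_HUMAN ... yields the accession P11021, while a custom header such as >my_custom_protein_42 has no accession at all, which is the situation getAccessionOrRest() is meant to cover.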
-
getID
Returns the ID. Null if not set.
- Returns:
- the ID
-
setID
Sets the ID.
- Parameters:
aID
- the ID
-
getForeignID
Returns the foreign ID. Null if not set.
- Returns:
- the foreign ID
-
setForeignID
Sets the foreign ID.
- Parameters:
aForeignID
- the foreign ID
-
getAccession
Returns the accession. Null if not set.
- Returns:
- the accession
-
setAccession
Sets the accession.
- Parameters:
aAccession
- the accession
-
getAccessionOrRest
Returns the accession or, if this is null, the rest of the header. This is a quick fix for unsupported custom headers.
- Returns:
- the accession or, if this is null, the rest of the header
-
getDatabaseType
Returns the database type as inferred from the header structure.
- Returns:
- the database type
-
setDatabaseType
Sets the database type.
- Parameters:
aDatabaseType
- the database type
-
getForeignAccession
Returns the foreign accession. Null if not set.
- Returns:
- the foreign accession
-
setForeignAccession
Sets the foreign accession.
- Parameters:
aForeignAccession
- the foreign accession
-
getDescription
Returns the description. Null if not set.
- Returns:
- the description
-
setDescription
Sets the description.
- Parameters:
aDescription
- the description
-
getDescriptionShort
Returns the short description. Null if not set.
- Returns:
- the short description
-
setDescriptionShort
Sets the short description.
- Parameters:
aDescriptionShort
- the short description
-
getDescriptionProteinName
Returns the protein name as inferred from the description.
- Returns:
- the protein name
-
setDescriptionProteinName
Sets the protein name.
- Parameters:
aDescriptionProteinName
- the protein name
-
getGeneName
Returns the gene name.
- Returns:
- the gene name
-
setGeneName
Sets the gene name.
- Parameters:
aGeneName
- the gene name
-
getProteinEvidence
Returns the protein evidence level as indexed in UniProt. Null if not available.
- Returns:
- the protein evidence level
-
setProteinEvidence
Sets the protein evidence level.
- Parameters:
aProteinEvidence
- the protein evidence level
-
getTaxonomy
Returns the taxonomy.
- Returns:
- the taxonomy
-
setTaxonomy
Sets the taxonomy.
- Parameters:
aTaxonomy
- the taxonomy
-
getForeignDescription
Returns the foreign description.
- Returns:
- the foreign description
-
setForeignDescription
Sets the foreign description.
- Parameters:
aForeignDescription
- the foreign description
-
getRest
Returns the rest of the header.
- Returns:
- the rest of the header
-
setRest
Sets the rest of the header.
- Parameters:
aRest
- the rest of the header
-
getRawHeader
Returns the entire header.
- Returns:
- the entire header
-
setRawHeader
Sets the entire header.
- Parameters:
aRawHeader
- the entire header
-
getSimpleProteinDescription
Returns a simplified protein description for a UniProt header. For example, "GRP78_HUMAN 78 kDa glucose-regulated protein OS=Homo sapiens GN=HSPA5 PE=1 SV=2" becomes "78 kDa glucose-regulated protein [GRP78_HUMAN]". For non-UniProt headers, the normal protein description is returned.
- Returns:
- a simplified protein description for a UniProt header
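The transformation in the example above can be sketched as follows. This is a hypothetical re-implementation, not the compomics code: it assumes that a UniProt description is recognizable by its " OS=" field, that the entry name is the first space-separated token, and that everything from " OS=" onward is dropped.

```java
/**
 * Hypothetical sketch of the transformation described for
 * getSimpleProteinDescription(); not the compomics implementation.
 */
final class SimpleDescriptionSketch {

    private SimpleDescriptionSketch() {
    }

    /**
     * Turns "ENTRY_NAME protein name OS=... GN=... PE=... SV=..." into
     * "protein name [ENTRY_NAME]". Descriptions without an " OS=" field
     * (or without an entry name before it) are treated as non-UniProt and
     * returned unchanged.
     */
    static String simplify(String description) {
        int osIndex = description.indexOf(" OS=");
        int firstSpace = description.indexOf(' ');
        if (osIndex < 0 || firstSpace < 0 || firstSpace >= osIndex) {
            return description; // assumed non-UniProt: return as is
        }
        // Entry name is the first token, e.g. "GRP78_HUMAN".
        String entryName = description.substring(0, firstSpace);
        // Protein name is everything between the entry name and " OS=".
        String proteinName = description.substring(firstSpace + 1, osIndex).trim();
        return proteinName + " [" + entryName + "]";
    }
}
```

Applied to the documented example, simplify returns "78 kDa glucose-regulated protein [GRP78_HUMAN]".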
-
getAbbreviatedFASTAHeader
This method returns an abbreviated version of the Header, suitable for inclusion in FASTA formatted files.
The abbreviated header is composed in the following way:
>[ID]|[accession_string]|([foreign_ID]|[foreign_accession_string]|[foreign_description] )[description]
- Returns:
- String with the abbreviated header.
-
getAbbreviatedFASTAHeader
This method returns an abbreviated version of the Header, suitable for inclusion in FASTA formatted files.
The abbreviated header is composed in the following way:
>[ID]|[accession_string]|([foreign_ID]|[foreign_accession_string]|[foreign_description] )[description]
- Parameters:
decoyTag
- the decoy tag to add
- Returns:
- String with the abbreviated header.
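A minimal sketch of how the documented layout could be assembled, treating the parenthesized foreign block as optional. The class and method names are hypothetical, the rule that the foreign block is written only when a foreign ID is set is an assumption, and decoy-tag handling is left out because its placement is not documented here.

```java
/**
 * Hypothetical sketch of the documented layout
 * >[ID]|[accession_string]|([foreign_ID]|[foreign_accession_string]|[foreign_description] )[description]
 * where the parenthesized group is only emitted when a foreign ID is present
 * (an assumption). Decoy tags are not handled.
 */
final class AbbreviatedHeaderSketch {

    private AbbreviatedHeaderSketch() {
    }

    static String abbreviate(String id, String accession, String foreignId,
                             String foreignAccession, String foreignDescription,
                             String description) {
        StringBuilder sb = new StringBuilder(">");
        sb.append(id).append('|').append(accession).append('|');
        if (foreignId != null) {
            // Optional foreign block, terminated by a single space.
            sb.append(foreignId).append('|')
              .append(foreignAccession).append('|')
              .append(foreignDescription).append(' ');
        }
        sb.append(description);
        return sb.toString();
    }
}
```

Without foreign data, abbreviate("sp", "P11021", null, null, null, "GRP78_HUMAN ...") produces >sp|P11021|GRP78_HUMAN ..., i.e. only the mandatory part of the layout.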
-
toString
This method reports on the entire processed(!) header. To get the raw header, use getRawHeader instead.
-
toString
This method reports on the entire processed(!) header, with the given decoy tag added. To get the raw header, use getRawHeader instead.
- Parameters:
decoyTag
- the decoy tag to add
- Returns:
- String with the full header.
-
getScore
public int getScore()
This method will attribute a score to the current header, based on the following scoring list:
- SwissProt: 4
- IPI, SwissProt reference: 3
- IPI, TrEMBL or REFSEQ_NP reference: 2
- IPI, without SwissProt, TrEMBL or REFSEQ_NP reference: 1
- NCBI, SwissProt reference: 2
- NCBI, other reference: 1
- Unknown header format: 0
- Returns:
- int with the header score. The higher the score, the more interesting a Header is.
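The scoring list reads as a simple lookup table. The sketch below encodes it with a hypothetical HeaderKind enum (the real class infers the category from the parsed header, which is not shown here) and uses the score to pick the most interesting of several headers, as the return description suggests.

```java
import java.util.Comparator;
import java.util.List;

/** Hypothetical sketch of the documented header scoring table. */
final class HeaderScoreSketch {

    /** Header categories named after the documented scoring list (assumed enum). */
    enum HeaderKind {
        SWISSPROT,
        IPI_SWISSPROT_REFERENCE,
        IPI_TREMBL_OR_REFSEQ_NP_REFERENCE,
        IPI_OTHER_REFERENCE,
        NCBI_SWISSPROT_REFERENCE,
        NCBI_OTHER_REFERENCE,
        UNKNOWN
    }

    private HeaderScoreSketch() {
    }

    /** Returns the score from the documented list; higher means more interesting. */
    static int score(HeaderKind kind) {
        switch (kind) {
            case SWISSPROT:
                return 4;
            case IPI_SWISSPROT_REFERENCE:
                return 3;
            case IPI_TREMBL_OR_REFSEQ_NP_REFERENCE:
            case NCBI_SWISSPROT_REFERENCE:
                return 2;
            case IPI_OTHER_REFERENCE:
            case NCBI_OTHER_REFERENCE:
                return 1;
            default:
                return 0; // unknown header format
        }
    }

    /** Picks the header kind with the highest score (UNKNOWN for an empty list). */
    static HeaderKind best(List<HeaderKind> kinds) {
        return kinds.stream()
                .max(Comparator.comparingInt(HeaderScoreSketch::score))
                .orElse(HeaderKind.UNKNOWN);
    }
}
```

For example, among an NCBI header with another reference, an IPI header with a SwissProt reference and an unknown header, best returns the IPI one (score 3).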
-
getCoreHeader
This method reports on the core information for the header, which consists of the ID and the accession string: [ID]|[accession_string]
This is mostly useful for appending this core as an addendum to another header.
- Returns:
- String with the header core data ([ID]|[accession_string]).
-
addAddendum
This method adds an addendum to the list. If the addendum already starts with '^A', it is added as is; otherwise, '^A' is prepended before it is added to the list.
- Parameters:
aAddendum
- String with the addendum, optionally preceded by '^A'.
-
getAddenda
This method allows the caller to retrieve all addenda for the current header, or 'null' if there aren't any.
- Returns:
- String with the addenda, or 'null' if there aren't any.
-
hasAddenda
public boolean hasAddenda()
This method reports on the presence of addenda for this header.
- Returns:
- boolean indicating whether addenda are present.
-
getFullHeaderWithAddenda
This method reports on the full header, with the addenda (if present). If no addenda are present, this method reports the same information as the 'toString()' method.
- Returns:
- String with the header and addenda (if any).
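The addenda methods above, together with getCoreHeader, can be summarized in a small hypothetical sketch. It assumes that '^A' is the literal two-character string shown in the documentation, that addenda are concatenated in insertion order, and that the full header is simply the processed header followed by the addenda; none of this is taken from the compomics source.

```java
/**
 * Hypothetical sketch of the documented addenda behavior: addenda carry a
 * "^A" prefix (assumed to be the literal characters '^' and 'A'),
 * getAddenda() returns null when there are none, and the full header is the
 * processed header followed by the addenda.
 */
final class AddendaSketch {

    private static final String SEPARATOR = "^A";

    private final String header;   // processed header, as toString() would report it
    private StringBuilder addenda; // null until the first addendum is added

    AddendaSketch(String header) {
        this.header = header;
    }

    /** Core header "[ID]|[accession_string]", e.g. for use as another header's addendum. */
    static String coreHeader(String id, String accession) {
        return id + "|" + accession;
    }

    void addAddendum(String addendum) {
        if (addenda == null) {
            addenda = new StringBuilder();
        }
        // Prepend the separator unless the caller already included it.
        addenda.append(addendum.startsWith(SEPARATOR) ? addendum : SEPARATOR + addendum);
    }

    boolean hasAddenda() {
        return addenda != null;
    }

    /** Returns all addenda, or null if there are none. */
    String getAddenda() {
        return addenda == null ? null : addenda.toString();
    }

    /** Same as the plain header when no addenda are present. */
    String getFullHeaderWithAddenda() {
        return addenda == null ? header : header + addenda;
    }
}
```

In this sketch, adding coreHeader("sp", "ACC2") and "^Atr|ACC3|EXTRA" to a header yields the addenda "^Asp|ACC2^Atr|ACC3|EXTRA": the separator is added once for the first call and left alone for the second.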
-
getAbbreviatedFASTAHeaderWithAddenda
This method returns an abbreviated version of the Header, suitable for inclusion in FASTA formatted files.
The abbreviated header is composed in the following way:
>[ID]|[accession_string]|([foreign_ID]|[foreign_accession_string]|[foreign_description] )[description]([addenda])
Note that the output of this method is identical to that of getAbbreviatedFASTAHeader() if no addenda are present.
- Returns:
- String with the abbreviated header and addenda (if any).
-
setLocation
public void setLocation(int aStart, int aEnd)
This method allows the caller to add information to the header about the location of the sequence within a master sequence.
This information is typically specified right after the accession number: [id]|[accession_string] ([startindex]-[endindex])|...
Please note the following:
- If an index is already present, it is removed and replaced.
- If the header is of unknown format, the indices are appended to the end of the header.
- Parameters:
aStart
- int with the start index.
aEnd
- int with the end index.
-
getStartLocation
public int getStartLocation()
This method reports on the start index of the header. It returns '-1' if no location is specified.
- Returns:
- int with the start location, or '-1' if none was defined.
-
getEndLocation
public int getEndLocation()
This method reports on the end index of the header. It returns '-1' if no location is specified.
- Returns:
- int with the end location, or '-1' if none was defined.
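The location convention described for setLocation, getStartLocation and getEndLocation can be sketched on a plain header string. This is a hypothetical illustration, not the compomics implementation: it assumes a pipe-separated header whose second field is the accession, represents the location as a " (start-end)" suffix on that field, and, for headers of unknown format, appends the same suffix to the end of the header.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch of the "[id]|[accession] ([start]-[end])|..." location convention. */
final class LocationSketch {

    // Matches a trailing " (start-end)" location suffix.
    private static final Pattern LOCATION = Pattern.compile(" \\((\\d+)-(\\d+)\\)$");

    private LocationSketch() {
    }

    /** Adds the location to a header, replacing any location already present. */
    static String setLocation(String header, int start, int end) {
        String location = " (" + start + "-" + end + ")";
        String[] parts = header.split("\\|", 3);
        if (parts.length < 2) {
            // Unknown format: append the location to the end of the header.
            return LOCATION.matcher(header).replaceFirst("") + location;
        }
        // Remove any existing location from the accession field, then add the new one.
        parts[1] = LOCATION.matcher(parts[1]).replaceFirst("") + location;
        return String.join("|", parts);
    }

    /** Returns the start index, or -1 if no location is present. */
    static int getStartLocation(String header) {
        return locationGroup(header, 1);
    }

    /** Returns the end index, or -1 if no location is present. */
    static int getEndLocation(String header) {
        return locationGroup(header, 2);
    }

    private static int locationGroup(String header, int group) {
        String[] parts = header.split("\\|", 3);
        String field = parts.length < 2 ? header : parts[1];
        Matcher m = LOCATION.matcher(field);
        return m.find() ? Integer.parseInt(m.group(group)) : -1;
    }
}
```

Calling setLocation("sp|P11021|GRP78_HUMAN", 10, 20) gives "sp|P11021 (10-20)|GRP78_HUMAN"; calling it again with (5, 8) replaces the old range instead of adding a second one.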
-
clone
This method provides a deep copy of the Header instance.
-
getDatabaseTypesAsString
Returns the implemented database types as an array of String.
- Returns:
- the implemented database types as an array of String
-
getDatabaseTypeAsString
Convenience method returning the database name as a String.
- Parameters:
databaseType
- the database type
- Returns:
- the name
-
getProteinEvidencAsString
Returns the UniProt protein evidence type as text.
- Parameters:
type
- the type of evidence
- Returns:
- the protein evidence type as text
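A hypothetical sketch of reading the UniProt PE= field from a description and turning the level into text. The labels used here are UniProt's protein existence categories (levels 1 to 5); the exact strings returned by getProteinEvidencAsString are not documented here and may differ, and returning null for missing or unknown levels is an assumption.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical sketch: extract the UniProt "PE=" level from a description and
 * map it to UniProt's protein existence labels. Not the compomics code.
 */
final class ProteinEvidenceSketch {

    // "PE=" followed by a single digit, delimited by word boundaries.
    private static final Pattern PE = Pattern.compile("\\bPE=(\\d)\\b");

    private ProteinEvidenceSketch() {
    }

    /** Returns the PE level, or null if the description has none. */
    static Integer parseProteinEvidence(String description) {
        Matcher m = PE.matcher(description);
        return m.find() ? Integer.valueOf(m.group(1)) : null;
    }

    /** Maps a PE level to UniProt's label; null for missing or unknown levels (assumption). */
    static String asString(Integer level) {
        if (level == null) {
            return null;
        }
        switch (level) {
            case 1: return "Evidence at protein level";
            case 2: return "Evidence at transcript level";
            case 3: return "Inferred from homology";
            case 4: return "Predicted";
            case 5: return "Uncertain";
            default: return null;
        }
    }
}
```

For the GRP78_HUMAN description used elsewhere on this page (PE=1), the sketch yields level 1 and the label "Evidence at protein level".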
-
asGenericHeader
Returns the header in generic format.
- Returns:
- the header in generic format
-