Class ProteinGeneDetailsProvider
java.lang.Object
com.compomics.util.experiment.biology.genes.ProteinGeneDetailsProvider
public class ProteinGeneDetailsProvider extends Object
Class used to map proteins to gene information.
- Author:
- Marc Vaudel, Harald Barsnes
-
Field Summary
Modifier and Type | Field | Description
static String | GENE_MAPPING_FILE_SUFFIX | The suffix to use for files containing gene mappings.
static String | GO_MAPPING_FILE_SUFFIX | The suffix to use for files containing GO mappings.
static String | SEPARATOR | The separator used to separate line contents.
-
Constructor Summary
Constructor | Description
ProteinGeneDetailsProvider() | Constructor.
-
Method Summary
Modifier and Type | Method | Description
void | createDefaultGeneMappingFilesGeneric(String jarFilePath, File sourceEnsemblVersionsFile, File sourceGoDomainsFile, boolean updateEqualVersion) | Copies the given gene mapping files to the gene mappings folder.
void | downloadGeneMappings(String ensemblType, String ensemblSchemaName, String ensemblDatasetName, String ensemblVersion, WaitingHandler waitingHandler) | Downloads the gene mappings.
boolean | downloadGeneSequences(File destinationFile, String ensemblType, String ensemblSchemaName, String ensemblDbName, WaitingHandler waitingHandler) | Downloads the gene sequence mappings.
boolean | downloadGoMappings(String ensemblType, String ensemblSchemaName, String ensemblDbName, boolean swissProtMapping, WaitingHandler waitingHandler) | Downloads the GO mappings.
boolean | downloadMappings(WaitingHandler waitingHandler, Integer taxon) | Tries to download the gene and GO mappings for the currently selected species.
HashMap<String,String> | getEnsemblSpeciesVersions(File ensemblVersionsFile) | Gets the information contained in the Ensembl species file.
String | getEnsemblVersion(Integer taxon) | Returns the Ensembl version for a given species.
Integer | getEnsemblVersionFromFile(File ensemblVersionsFile, String species) | Gets the Ensembl version of a given species from a file.
static File | getEnsemblVersionsFile() | Returns the Ensembl version file.
static File | getGeneMappingFile(String ensemblDatasetName) | Returns the gene mapping file.
static File | getGeneMappingFolder() | Returns the path to the folder containing the gene mapping files.
GeneMaps | getGeneMaps(GeneParameters genePreferences, FastaSummary fastaSummary, SequenceProvider sequenceProvider, ProteinDetailsProvider proteinDetailsProvider, WaitingHandler waitingHandler) | Returns the gene maps for the given proteins.
static File | getGoDomainsFile() | Returns the GO domains file.
static File | getGoMappingFile(String ensemblDatasetName) | Returns the GO mapping file.
void | initialize(String jarFilePath) | Initializes the factory.
void | loadEnsemblSpeciesVersions(File ensemblVersionsFile) | Loads the given Ensembl species file.
boolean | newVersionExists(Integer taxon) | Returns true if a newer version of the species mapping exists in Ensembl.
boolean | queryEnsembl(String requestXml, File destinationFile, String ensemblType) | Sends an XML query to Ensembl and writes the result in a text file.
boolean | queryEnsembl(String requestXml, File destinationFile, String ensemblType, WaitingHandler waitingHandler) | Sends an XML query to Ensembl and writes the result in a text file.
boolean | queryEnsembl(String requestXml, String waitingText, File destinationFile, String ensemblType, WaitingHandler waitingHandler) | Sends an XML query to Ensembl and writes the result in a text file.
static void | setGeneMappingFolder(String geneMappingFolder) | Sets the folder where gene mappings are saved.
void | updateEnsemblVersion(String ensemblDatasetName, String ensemblVersion) | Updates the Ensembl version for the given species in the local map and in the Ensembl versions file.
-
Field Details
-
SEPARATOR
The separator used to separate line contents.
- See Also:
- Constant Field Values
-
GENE_MAPPING_FILE_SUFFIX
The suffix to use for files containing gene mappings.
- See Also:
- Constant Field Values
-
GO_MAPPING_FILE_SUFFIX
The suffix to use for files containing GO mappings.
- See Also:
- Constant Field Values
-
-
Constructor Details
-
ProteinGeneDetailsProvider
public ProteinGeneDetailsProvider()
Constructor.
-
-
Method Details
-
initialize
Initializes the factory. Note: the species factory must be initialized first.
- Parameters:
jarFilePath - the path to the jar file
- Throws:
IOException - thrown if an error occurs while reading the species mapping
-
getGeneMaps
public GeneMaps getGeneMaps(GeneParameters genePreferences, FastaSummary fastaSummary, SequenceProvider sequenceProvider, ProteinDetailsProvider proteinDetailsProvider, WaitingHandler waitingHandler)
Returns the gene maps for the given proteins. For every protein, the species must be given as well as the gene name, in the format used in a UniProt FASTA file.
- Parameters:
genePreferences - the gene preferences
fastaSummary - summary information on the FASTA file containing the proteins
sequenceProvider - the protein sequence provider
proteinDetailsProvider - the protein details provider
waitingHandler - waiting handler displaying progress for the download and allowing the process to be canceled
- Returns:
- the gene maps for the FASTA file loaded in the factory
-
downloadGeneSequences
public boolean downloadGeneSequences(File destinationFile, String ensemblType, String ensemblSchemaName, String ensemblDbName, WaitingHandler waitingHandler) throws MalformedURLException, IOException
Downloads the gene sequence mappings.
- Parameters:
destinationFile - the destination file where the gene sequences are saved
ensemblType - the Ensembl type, e.g., default or plants
ensemblSchemaName - the Ensembl schema name, e.g., default or plants_mart_18
ensemblDbName - the Ensembl DB name of the selected species
waitingHandler - waiting handler displaying progress and allowing the process to be canceled
- Returns:
- true if downloading went OK
- Throws:
MalformedURLException - if a MalformedURLException occurs
IOException - if an IOException occurs
-
downloadGoMappings
public boolean downloadGoMappings(String ensemblType, String ensemblSchemaName, String ensemblDbName, boolean swissProtMapping, WaitingHandler waitingHandler) throws MalformedURLException, IOException
Downloads the GO mappings.
- Parameters:
ensemblType - the Ensembl type, e.g., default or plants
ensemblSchemaName - the Ensembl schema name, e.g., default or plants_mart_18
ensemblDbName - the Ensembl DB name of the selected species
swissProtMapping - if true, use the uniprotswissprot_accession parameter; if false, use the uniprotsptrembl parameter
waitingHandler - waiting handler displaying progress and allowing the process to be canceled
- Returns:
- true if downloading went OK
- Throws:
MalformedURLException - if a MalformedURLException occurs
IOException - if an IOException occurs
-
queryEnsembl
public boolean queryEnsembl(String requestXml, File destinationFile, String ensemblType) throws MalformedURLException, IOException
Sends an XML query to Ensembl and writes the result in a text file.
- Parameters:
requestXml - the XML request
destinationFile - the file where the results are saved
ensemblType - the Ensembl type, e.g., default or plants
- Returns:
- true if downloading went OK
- Throws:
MalformedURLException - if a MalformedURLException occurs
IOException - if an IOException occurs
-
queryEnsembl
public boolean queryEnsembl(String requestXml, File destinationFile, String ensemblType, WaitingHandler waitingHandler) throws MalformedURLException, IOException
Sends an XML query to Ensembl and writes the result in a text file.
- Parameters:
requestXml - the XML request
destinationFile - the file where the results are saved
ensemblType - the Ensembl type, e.g., default or plants
waitingHandler - waiting handler displaying progress and allowing the process to be canceled
- Returns:
- true if downloading went OK
- Throws:
MalformedURLException - if a MalformedURLException occurs
IOException - if an IOException occurs
-
queryEnsembl
public boolean queryEnsembl(String requestXml, String waitingText, File destinationFile, String ensemblType, WaitingHandler waitingHandler) throws MalformedURLException, IOException
Sends an XML query to Ensembl and writes the result in a text file.
- Parameters:
requestXml - the XML request
destinationFile - the file where the results are saved
ensemblType - the Ensembl type, e.g., default or plants
waitingHandler - waiting handler displaying progress and allowing the process to be canceled
waitingText - the text to display if a progress dialog is used
- Returns:
- true if downloading went OK
- Throws:
MalformedURLException - if a MalformedURLException occurs
IOException - if an IOException occurs
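The requestXml argument is documented only as "the XML request". As an illustration, the sketch below assembles a request in the general BioMart XML query layout. How ProteinGeneDetailsProvider builds its own requests is not described on this page, so the class BiomartQuerySketch, the method buildRequestXml, and the attribute names are hypothetical; the dataset name hsapiens_gene_ensembl and the schema name default are taken from the parameter descriptions on this page.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of compomics-utilities.
class BiomartQuerySketch {

    /**
     * Builds a BioMart-style XML query for a single dataset.
     * The attribute names are supplied by the caller; none are assumed here.
     */
    static String buildRequestXml(String schemaName, String datasetName, List<String> attributes) {
        StringBuilder xml = new StringBuilder();
        xml.append("<?xml version=\"1.0\" encoding=\"UTF-8\"?>");
        xml.append("<!DOCTYPE Query>");
        xml.append("<Query virtualSchemaName=\"").append(schemaName)
           .append("\" formatter=\"TSV\" header=\"0\" uniqueRows=\"1\" count=\"\" datasetConfigVersion=\"0.6\">");
        xml.append("<Dataset name=\"").append(datasetName).append("\" interface=\"default\">");
        // One Attribute element per requested output column.
        for (String attribute : attributes) {
            xml.append("<Attribute name=\"").append(attribute).append("\"/>");
        }
        xml.append("</Dataset></Query>");
        return xml.toString();
    }

    public static void main(String[] args) {
        // The attribute names below are example values only.
        String requestXml = buildRequestXml("default", "hsapiens_gene_ensembl",
                Arrays.asList("ensembl_gene_id", "external_gene_name"));
        System.out.println(requestXml);
    }
}
```

A string built this way could then be passed as requestXml to one of the queryEnsembl overloads, together with a destination file and the Ensembl type.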
-
downloadGeneMappings
public void downloadGeneMappings(String ensemblType, String ensemblSchemaName, String ensemblDatasetName, String ensemblVersion, WaitingHandler waitingHandler) throws MalformedURLException, IOException, IllegalArgumentException
Downloads the gene mappings.
- Parameters:
ensemblType - the Ensembl type, e.g., default or plants
ensemblSchemaName - the Ensembl schema name, e.g., default or plants_mart_18
ensemblDatasetName - the Ensembl dataset name of the selected species
ensemblVersion - the Ensembl version
waitingHandler - the waiting handler
- Throws:
MalformedURLException - if a MalformedURLException occurs
IOException - if an IOException occurs
IllegalArgumentException - if an IllegalArgumentException occurs
-
getGeneMappingFolder
Returns the path to the folder containing the gene mapping files.
- Returns:
- the gene mapping folder
-
setGeneMappingFolder
Sets the folder where gene mappings are saved.
- Parameters:
geneMappingFolder - the folder where gene mappings are saved
-
createDefaultGeneMappingFilesGeneric
public void createDefaultGeneMappingFilesGeneric(String jarFilePath, File sourceEnsemblVersionsFile, File sourceGoDomainsFile, boolean updateEqualVersion)
Copies the given gene mapping files to the gene mappings folder. If mapping files already exist, they are overwritten according to updateEqualVersion.
- Parameters:
jarFilePath - the path to the jar file
sourceEnsemblVersionsFile - the Ensembl versions file
sourceGoDomainsFile - the GO domains file
updateEqualVersion - if true, the files are also updated when the version numbers are equal; if false, they are updated only if the new version is newer
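The updateEqualVersion rule can be read as a small decision function. The sketch below is one interpretation of the parameter description, not the library's code; the class VersionUpdateSketch and the method shouldOverwrite are hypothetical names, and versions are assumed to be comparable integers.

```java
// Hypothetical helper, not part of compomics-utilities.
class VersionUpdateSketch {

    /**
     * Decides whether an existing mapping should be overwritten.
     * A strictly newer source version always wins; an equal version
     * wins only when updateEqualVersion is true.
     */
    static boolean shouldOverwrite(int localVersion, int sourceVersion, boolean updateEqualVersion) {
        if (sourceVersion > localVersion) {
            return true;
        }
        return updateEqualVersion && sourceVersion == localVersion;
    }

    public static void main(String[] args) {
        // The version numbers are example values only.
        System.out.println(shouldOverwrite(99, 100, false));
        System.out.println(shouldOverwrite(100, 100, false));
        System.out.println(shouldOverwrite(100, 100, true));
    }
}
```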
-
updateEnsemblVersion
public void updateEnsemblVersion(String ensemblDatasetName, String ensemblVersion) throws IOException
Updates the Ensembl version for the given species in the local map and in the Ensembl versions file.
- Parameters:
ensemblDatasetName - the dataset name of the species to update, e.g., hsapiens_gene_ensembl
ensemblVersion - the new Ensembl version
- Throws:
IOException - if an IOException occurs
-
getEnsemblVersionFromFile
public Integer getEnsemblVersionFromFile(File ensemblVersionsFile, String species) throws IOException
Gets the Ensembl version of a given species from a file.
- Parameters:
ensemblVersionsFile - the Ensembl versions file
species - the species of interest
- Returns:
- the Ensembl version
- Throws:
IOException - thrown if an error occurs while reading the file
-
getEnsemblSpeciesVersions
public HashMap<String,String> getEnsemblSpeciesVersions(File ensemblVersionsFile) throws FileNotFoundException, IOException
Gets the information contained in the Ensembl species file.
- Parameters:
ensemblVersionsFile - the Ensembl species file to read
- Returns:
- the Ensembl versions for each species
- Throws:
FileNotFoundException - if a FileNotFoundException occurs
IOException - if an IOException occurs
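To illustrate the kind of species-to-version map this method returns, the sketch below parses a separator-delimited text source into a HashMap<String,String>. The file layout is not specified on this page, so the two-column layout, the tab character used as separator, and the names EnsemblVersionsSketch and readVersions are assumptions made for the example.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.HashMap;

// Hypothetical helper, not part of compomics-utilities.
class EnsemblVersionsSketch {

    // Assumption: columns are tab-separated. The real SEPARATOR value is not given here.
    static final String SEPARATOR = "\t";

    /**
     * Reads "name<separator>version" lines into a map.
     * Blank lines and lines with fewer than two columns are skipped.
     */
    static HashMap<String, String> readVersions(Reader source) {
        HashMap<String, String> versions = new HashMap<>();
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            while ((line = reader.readLine()) != null) {
                line = line.trim();
                if (line.isEmpty()) {
                    continue;
                }
                String[] parts = line.split(SEPARATOR);
                if (parts.length >= 2) {
                    versions.put(parts[0], parts[1]);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return versions;
    }

    public static void main(String[] args) {
        // Example content only; the version numbers are made up.
        String content = "hsapiens_gene_ensembl\t100\nmmusculus_gene_ensembl\t99\n";
        HashMap<String, String> versions = readVersions(new StringReader(content));
        System.out.println(versions.get("hsapiens_gene_ensembl"));
    }
}
```

Reading from a Reader rather than a File keeps the sketch testable without touching the file system; a FileReader could be passed in for a real file.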
-
loadEnsemblSpeciesVersions
Loads the given Ensembl species file.
- Parameters:
ensemblVersionsFile - the Ensembl species file to load
- Throws:
IOException - if an IOException occurs
-
downloadMappings
Tries to download the gene and GO mappings for the currently selected species.
- Parameters:
waitingHandler - the waiting handler
taxon - the NCBI taxon of the species
- Returns:
- true if the download was successful
- Throws:
IOException - thrown if an error occurs while reading the mapping files
-
getGeneMappingFile
Returns the gene mapping file.
- Parameters:
ensemblDatasetName - the Ensembl dataset name
- Returns:
- the gene mapping file
-
getGoMappingFile
Returns the GO mapping file.
- Parameters:
ensemblDatasetName - the Ensembl dataset name
- Returns:
- the GO mapping file
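Both lookups take a dataset name and return a File, and the class exposes one suffix constant per mapping type. One plausible reading, sketched below, is that the file name is the dataset name followed by the suffix, resolved against the gene mapping folder. This is an assumption, not documented behavior: the class MappingFileSketch, the method mappingFile, the folder name, and the suffix values are all hypothetical.

```java
import java.io.File;

// Hypothetical helper, not part of compomics-utilities.
class MappingFileSketch {

    // Hypothetical suffix values; the real constants are not listed on this page.
    static final String GENE_MAPPING_FILE_SUFFIX = "_gene_mappings";
    static final String GO_MAPPING_FILE_SUFFIX = "_go_mappings";

    /** Resolves "<datasetName><suffix>" inside the given folder. */
    static File mappingFile(File folder, String datasetName, String suffix) {
        return new File(folder, datasetName + suffix);
    }

    public static void main(String[] args) {
        File folder = new File("gene_mappings"); // example folder name
        System.out.println(mappingFile(folder, "hsapiens_gene_ensembl", GENE_MAPPING_FILE_SUFFIX));
        System.out.println(mappingFile(folder, "hsapiens_gene_ensembl", GO_MAPPING_FILE_SUFFIX));
    }
}
```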
-
getEnsemblVersionsFile
Returns the Ensembl version file.
- Returns:
- the Ensembl version file
-
getGoDomainsFile
Returns the GO domains file.
- Returns:
- the GO domains file
-
getEnsemblVersion
Returns the Ensembl version for a given species.
- Parameters:
taxon - the NCBI taxon of the species
- Returns:
- the Ensembl version for the given species
-
newVersionExists
Returns true if a newer version of the species mapping exists in Ensembl.
- Parameters:
taxon - the NCBI taxon of the species
- Returns:
- true if a newer version of the species mapping exists in Ensembl
-