PSORTb
Introduction
The PSORT approach uses amino acid sequence information to generate an overall prediction of a protein's localization site. Its rules are derived from experimental observations. For example, when analysing a Gram-negative organism, the possible localization sites are: cytoplasm, cytoplasmic membrane, periplasm, outer membrane and extracellular space.
Blast2GO can assign subcellular localization sites to proteins based on their amino acid sequence via PSORTb. PSORTb is an algorithm that can be applied to bacterial or archaeal protein sequences and uses a probabilistic system to predict the most probable localization. Once sites are predicted, their corresponding cellular component GO terms can be merged with the existing Blast2GO annotations.
Parameters and Execution
Starting with a previously loaded Blast2GO project with protein sequences, the PSORTb tool can be found under Toolbox > Blast2GO > Psortb > PSORTb.
If the loaded project contains nucleotide sequences, the Translate Longest ORF tool can be used first to obtain the predicted protein sequences that PSORTb requires.
The wizard allows the algorithm parameters to be adjusted, and it performs different analyses depending on the Organism Type and the Gram Stain. It can be used with sequences from Gram-positive bacteria, Gram-negative bacteria or archaea. For more details on the core algorithm, visit http://www.psortb.org.
The algorithm returns a score between 0 and 10 for each localization site. The Cutoff parameter sets a minimum score: localizations scoring above it are considered possible localizations.
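The effect of the Cutoff parameter can be illustrated with a minimal Python sketch. This is not Blast2GO or PSORTb code: the function name, the site labels, the example scores and the cutoff value of 7.5 are all hypothetical, and treating a score equal to the cutoff as passing is an assumption, since the text above does not say whether the comparison is strict.

```python
def localizations_above_cutoff(scores, cutoff):
    """Return (site, score) pairs whose score reaches the cutoff, best first.

    Assumption: a score equal to the cutoff counts as passing.
    """
    passing = [(site, score) for site, score in scores.items() if score >= cutoff]
    return sorted(passing, key=lambda pair: pair[1], reverse=True)


# Hypothetical scores (0 to 10) for one protein from a Gram-negative organism.
example_scores = {
    "Cytoplasmic": 0.01,
    "CytoplasmicMembrane": 0.02,
    "Periplasmic": 0.48,
    "OuterMembrane": 9.45,
    "Extracellular": 0.04,
}

candidates = localizations_above_cutoff(example_scores, cutoff=7.5)
print(candidates)
```

With a high cutoff only the strongest site remains; lowering the cutoff lets additional sites through, which is how more than one possible localization can appear for a sequence.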
Results
The tool iterates over the input sequences and analyses each of them with PSORTb. The process opens a new tab, and the results are shown in a table as they come back.
The table contains one row for each sequence, where the columns have the following meaning:
- Sequence name: shows each sequence identifier.
- Final localization: contains the name of the predicted localization.
- Final score: represents the prediction score for the localization.
- GO ID: the Gene Ontology ID associated with the localization.
- Secondary Localization: a possible secondary localization when there is more than one score above the cutoff.
- The next 6 columns, hidden by default, show the score for all possible localizations.
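How the Final localization, Final score and Secondary Localization columns relate to the per-site scores can be sketched roughly as follows. This is an illustration only, not the tool's implementation: the function name, the dictionary keys, the example scores and the use of "Unknown" when no site reaches the cutoff are assumptions, and the GO ID column is left out because the site-to-GO mapping is not given here.

```python
def summarize_prediction(name, scores, cutoff):
    """Build one table row from per-site scores (hypothetical layout)."""
    # Rank all sites from highest to lowest score.
    ranked = sorted(scores.items(), key=lambda pair: pair[1], reverse=True)
    # Keep only the sites that reach the cutoff (assumed inclusive).
    passing = [(site, score) for site, score in ranked if score >= cutoff]

    if passing:
        final_site, final_score = passing[0]
    else:
        # Assumption: no site above the cutoff is reported as "Unknown".
        final_site, final_score = "Unknown", None

    # A secondary localization exists only if a second site passed.
    secondary = passing[1][0] if len(passing) > 1 else None

    return {
        "Sequence name": name,
        "Final localization": final_site,
        "Final score": final_score,
        "Secondary Localization": secondary,
    }


# Hypothetical scores in which two sites exceed a cutoff of 4.0.
row = summarize_prediction(
    "seq1",
    {
        "Cytoplasmic": 0.1,
        "CytoplasmicMembrane": 5.2,
        "Periplasmic": 4.5,
        "OuterMembrane": 0.1,
        "Extracellular": 0.1,
    },
    cutoff=4.0,
)
print(row)
```

In this sketch the secondary column stays empty whenever only one site passes the cutoff, which matches the description of that column above.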
Merge GO information
The GO IDs from the prediction can be merged into the original Blast2GO project as cellular component annotations of the sequences. The merge wizard asks for the Blast2GO project file into which the GO results should be merged and adds the GO information to that project, matching entries by Sequence Name.
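Conceptually, the merge is a join on the sequence name. The following Python sketch shows the idea under stated assumptions: the function name and data layout are hypothetical, the GO IDs are used purely as example data, and the handling of names that are absent from the project (collected and skipped here) is an assumption rather than documented Blast2GO behaviour.

```python
def merge_go_terms(project_annotations, psortb_results):
    """Add predicted GO IDs to existing annotations, matching by sequence name.

    project_annotations: dict mapping sequence name -> set of GO IDs
    psortb_results:      dict mapping sequence name -> predicted GO ID
    Returns the merged annotations and the list of unmatched names.
    """
    # Copy the sets so the original project data is left untouched.
    merged = {name: set(terms) for name, terms in project_annotations.items()}
    unmatched = []
    for name, go_id in psortb_results.items():
        if name in merged:
            merged[name].add(go_id)
        else:
            # Assumption: results without a matching sequence are skipped.
            unmatched.append(name)
    return merged, unmatched


# Example data: existing annotations plus hypothetical prediction results.
project = {"seq1": {"GO:0003677"}, "seq2": set()}
predictions = {
    "seq1": "GO:0005737",  # cytoplasm
    "seq2": "GO:0009279",  # cell outer membrane
    "seq9": "GO:0005576",  # extracellular region; not present in the project
}

merged, unmatched = merge_go_terms(project, predictions)
print(merged)
print(unmatched)
```

Using sets means that a GO ID already present on a sequence is not duplicated by the merge.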