Miscellaneous

Example Data

Add some Blast2GO example data to the Navigation Area to play around with.

Annex (legacy)

Annex [Myhre et al., 2006] was developed by the Norwegian University of Science and Technology and is essentially a set of manually curated relationships between the three Gene Ontology categories. The approach uses univocal relationships between GO terms to add implicit annotation. The Annex dataset consists of more than 6,000 manually reviewed relations between molecular function terms that are "involved in" biological processes and molecular function terms "acting in" cellular components. Annex-based GO term augmentation can be run on any annotation loaded in Blast2GO. Generally, between 10% and 15% extra annotation is achieved, and around 30% of GO term confirmations are obtained through the Annex dataset.
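The augmentation idea can be illustrated with a minimal sketch. The relation table, the term identifiers and the function name below are hypothetical placeholders and are not taken from the Annex dataset or from Blast2GO itself.

```python
# Hypothetical Annex-style relations: molecular function term -> implied terms.
# The identifiers are placeholders, not real Annex entries.
ANNEX_RELATIONS = {
    "MF:0001": {"BP:0101"},             # function "involved in" a process
    "MF:0002": {"CC:0201", "BP:0102"},  # function "acting in" a component
}


def augment(annotations, relations=ANNEX_RELATIONS):
    """Return (augmented_terms, added, confirmed) for one sequence."""
    annotations = set(annotations)
    implied = set()
    for term in annotations:
        implied |= relations.get(term, set())
    added = implied - annotations       # new implicit annotations
    confirmed = implied & annotations   # implied terms that were already present
    return annotations | added, added, confirmed
```

In this sketch, terms that are implied but already assigned count as confirmations, while the remaining implied terms are the extra annotation.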

Annotation Table

This function creates a CLC-bio Annotation Table containing the Gene Ontology terms generated with Blast2GO. The table can be used in combination with Add Annotations to perform Hypergeometric Tests on Annotations and Gene Set Enrichment Analysis (Toolbox > Microarray and Small RNA Analysis > Annotation Test).

Remove First Level Annotations

For each sequence, this function removes the three main (root or top-level) GO terms (molecular function, biological process and cellular component), if present, since they do not provide any relevant information.
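As a rough illustration, the operation amounts to a set difference per sequence. The function name and the dictionary layout are assumptions made for this sketch; the three identifiers are the GO root terms.

```python
# The three GO root terms.
ROOT_TERMS = {
    "GO:0003674",  # molecular_function
    "GO:0008150",  # biological_process
    "GO:0005575",  # cellular_component
}


def remove_first_level(annotations):
    """annotations: dict mapping sequence name -> iterable of GO IDs.

    Returns a new dict with the root terms removed from every sequence.
    """
    return {seq: set(terms) - ROOT_TERMS for seq, terms in annotations.items()}
```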

Validate Annotations

This function validates the annotation result and removes redundant GO terms from the dataset. It ensures that only the most specific annotations for a given sequence are saved, so that two or more GO terms lying on the same GO branch are never assigned to the same sequence. The Gene Ontology "true path rule" states that all terms lying on the path from a term up to the root (top level) must always be true for a given gene product. Therefore, a term is considered redundant and is removed if a more specific (child) term is also assigned to the same sequence.

This function can be run independently; however, Blast2GO always applies it automatically after an existing annotation is modified, for example after merging GO terms from an InterProScan search, after Annex augmentation or upon manual curation.
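The redundancy check can be sketched on a toy ontology. The term names, the PARENTS table and both function names are hypothetical; a real run would use the full Gene Ontology graph.

```python
# Toy ontology: term -> set of direct parent terms (hypothetical names).
PARENTS = {
    "root": set(),
    "binding": {"root"},
    "ion binding": {"binding"},
    "metal ion binding": {"ion binding"},
    "catalytic activity": {"root"},
}


def ancestors(term, parents=PARENTS):
    """All terms on any path from `term` up to the root, excluding `term`."""
    seen = set()
    stack = list(parents.get(term, ()))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, ()))
    return seen


def most_specific(terms, parents=PARENTS):
    """Drop every term that is an ancestor of another term in the set."""
    terms = set(terms)
    redundant = set()
    for term in terms:
        redundant |= ancestors(term, parents) & terms
    return terms - redundant
```

Here "binding" would be dropped for a sequence that also carries "metal ion binding", whereas "catalytic activity" lies on a different branch and is kept.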

Find Duplicates

This function helps to identify (and optionally delete) sequences in a dataset that are 100% identical.
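One simple way to find exact duplicates is to group sequence names by their sequence string, as in the sketch below. The function name is hypothetical, and ignoring upper/lower case is an assumption of this sketch, not documented Blast2GO behaviour.

```python
from collections import defaultdict


def find_duplicates(sequences):
    """sequences: dict mapping name -> sequence string.

    Returns a list of name groups that share an identical sequence.
    Case is ignored here (an assumption of this sketch).
    """
    groups = defaultdict(list)
    for name, seq in sequences.items():
        groups[seq.upper()].append(name)
    return [names for names in groups.values() if len(names) > 1]
```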

Set to Sense

Converts sequences whose Blast top hit has a negative reading frame to anti-sense, i.e. each such query sequence is replaced by its reverse complement (e.g. ATTG > CAAT).

Also adds "_antisense" to the sequence name; use Batch Rename to revert the name change afterwards.
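The conversion can be sketched as follows. The function names and the top_hit_frame parameter are assumptions for illustration; only the reverse complement and the "_antisense" suffix come from the description above.

```python
# Translation table for complementing DNA bases (upper and lower case).
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Complement every base, then reverse the string."""
    return seq.translate(_COMPLEMENT)[::-1]


def set_to_sense(name, seq, top_hit_frame):
    """Return (name, sequence); reverse-complement and rename if the
    top hit was found in a negative reading frame."""
    if top_hit_frame < 0:
        return name + "_antisense", reverse_complement(seq)
    return name, seq
```

For example, reverse_complement("ATTG") gives "CAAT", matching the example above.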

Translate Longest ORF

Converts each nucleotide sequence to its longest open reading frame.

Also adds "_ORF" to the sequence name; use Batch Rename to revert the name change afterwards.
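A minimal sketch of a longest-ORF search is shown below. It assumes that an ORF runs from an ATG start codon to the first in-frame stop codon and that only the three forward frames are scanned; the definition Blast2GO actually uses is not stated here and may differ.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_orf(seq):
    """Longest ATG..stop stretch over the three forward frames.

    The stop codon is included in the result; returns "" if no ORF is found.
    """
    seq = seq.upper()
    best = ""
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i                      # open a candidate ORF
            elif start is not None and codon in STOP_CODONS:
                orf = seq[start:i + 3]         # close it at the stop codon
                if len(orf) > len(best):
                    best = orf
                start = None
    return best
```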

Batch Rename

Renames all selected sequences by either adding text or replacing text patterns (regular expressions can be used). Sequence names can also be converted to lower or upper case.
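The same kind of renaming can be sketched with Python's re module. The function and its parameters are hypothetical and do not mirror the options of the Blast2GO dialog; the regular-expression syntax of the tool may also differ from Python's.

```python
import re


def batch_rename(names, pattern=None, replacement="", prefix="", suffix="", case=None):
    """Rename a list of sequence names.

    pattern/replacement: optional regular-expression substitution.
    prefix/suffix: text added before/after each name.
    case: "upper", "lower" or None (leave unchanged).
    """
    renamed = []
    for name in names:
        if pattern is not None:
            name = re.sub(pattern, replacement, name)
        name = prefix + name + suffix
        if case == "upper":
            name = name.upper()
        elif case == "lower":
            name = name.lower()
        renamed.append(name)
    return renamed
```

For instance, the pattern `_(antisense|ORF)$` with an empty replacement would strip the suffixes added by Set to Sense and Translate Longest ORF.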

BDA

The primary goal of Blast2GO is to assign functional labels in the form of GO terms to nucleotide or protein sequences. However, a meaningful description for novel sequences is also desirable. A common approach is to transfer the "Best-BLAST-hit" description directly to the novel sequence. Best-hit descriptions, however, frequently consist of uninformative text such as "unknown", "putative" or "hypothetical", while the descriptions of other BLAST hits for the same sequence do contain informative keywords. For this reason, a text-mining feature called the BLAST Description Annotator (BDA) has been included in Blast2GO. It analyses the set of sequence descriptions of a given BLAST result and, depending on frequency of occurrence and information content, selects the most suitable description from the collection of words. This simple approach avoids sequence descriptions such as "hypothetical", "putative" or "unknown protein" when a more informative and representative description is available. These descriptions are exploratory only and do not carry the same weight of evidence as the functional labels.
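The general idea can be illustrated with a simple frequency-based sketch. This is not the BDA algorithm itself, whose scoring is not specified here: the low-information word list, the scoring formula and the function name are all assumptions.

```python
from collections import Counter

# Words treated as uninformative (an assumed list for this sketch).
LOW_INFORMATION = {"unknown", "putative", "hypothetical", "protein", "predicted"}


def pick_description(hit_descriptions):
    """Pick the hit description whose informative words occur most often
    across all hit descriptions (average count per informative word)."""

    def informative(desc):
        return [w for w in desc.lower().split() if w not in LOW_INFORMATION]

    # In how many descriptions does each informative word appear?
    counts = Counter(w for d in hit_descriptions for w in set(informative(d)))

    def score(desc):
        words = informative(desc)
        return sum(counts[w] for w in words) / len(words) if words else 0.0

    return max(hit_descriptions, key=score)
```

With the hits "hypothetical protein", "zinc finger protein", "zinc finger domain protein" and "unknown", this sketch selects "zinc finger protein" rather than the uninformative best hit.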

Retrieve Blast Top-Hit

This feature uses Blast result information to look up the top hit at NCBI, Ensembl or UniProt (via web services). It is then possible to replace the original query sequence with its Blast top hit or to extract the information to a new project (various scenarios are possible).

A possible use case is a so-called "Double-Blast" (Figure 1): the Blast results of a first run are used to replace the sequence data for a second run, which then uses a different set of query sequences. Imagine an RNA-seq dataset with a high percentage of sequences without any alignments against a protein database (e.g. blastx against NR). This feature could be used to select the sequences without hits (the red ones) and extract them into a new project. These sequences could then be blasted (blastn) against a set of EST sequences, and the initially unaligned sequences replaced with the matching ESTs (Retrieve Blast Top-Hit). In the last step, these sequences are blasted against NR and will hopefully return valid protein hits, so that the functional annotation pipeline can continue with GO Mapping and GO Annotation.

For each top hit (the first significant alignment from an already performed BLAST), the filters in the bottom part of the dialog are applied and the hit is looked up online in the corresponding database.
The Action option determines whether the sequences in your dataset are replaced or whether they are extracted into a new dataset. You can also decide whether to keep the original sequence names or to rename the sequences to the names of the downloaded sequences. The latter adds a small note with the original name to the sequence description.
The last option determines whether your sequences are replaced with the downloaded ones (default) or whether only their names are retrieved.

Figure 1: Example workflow for "Double-Blast", using Retrieve Blast Top-Hit.