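The genomic data in ggKbase is organized into the following data types, arranged in a hierarchy from largest to smallest: projects (grouped into project groups), organisms (bins), contigs (scaffolds), and features (genes).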
You can reach the Projects landing page from anywhere on the website by clicking the Projects link in the top menu. The dropdown menu (shown as a list icon) next to Projects is a shortcut to the list of projects you can access.
You can also go to the Projects page by clicking on the green "Explore projects" button on the home page.
Note: The list of projects that you can see on the Projects page depends on your user account permissions.
Projects are organized into "Project Groups" based on factors such as study site or project type. A project can belong to only one project group.
Each row of the table is one project. The first two columns give the project's name and description. The following columns give the number of organisms, contigs, and features in the project, which are the other three basic data types in ggKbase.
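To make the nesting concrete, here is a minimal Python sketch of the hierarchy and of how the three counts in a Projects-table row could be derived from it. The class names (`Project`, `Organism`, `Contig`, `Feature`) and the `summary_row` method are hypothetical illustrations, not ggKbase's actual internal schema or API.

```python
from dataclasses import dataclass, field

# Hypothetical classes illustrating the ggKbase hierarchy; not ggKbase's real schema.

@dataclass
class Feature:  # a gene: the smallest building block
    name: str

@dataclass
class Contig:  # a contig (scaffold) made up of features
    name: str
    features: list[Feature] = field(default_factory=list)

@dataclass
class Organism:  # an organism (bin) made up of contigs
    name: str
    contigs: list[Contig] = field(default_factory=list)

@dataclass
class Project:
    name: str
    description: str
    group: str  # a project belongs to only one project group
    organisms: list[Organism] = field(default_factory=list)

    def summary_row(self) -> tuple[str, str, int, int, int]:
        """Name, description, and organism/contig/feature counts, as in one Projects-table row."""
        n_contigs = sum(len(o.contigs) for o in self.organisms)
        n_features = sum(len(c.features) for o in self.organisms for c in o.contigs)
        return (self.name, self.description, len(self.organisms), n_contigs, n_features)


# Usage: two bins, three contigs, three features in total.
proj = Project("demo", "example project", "Site A", [
    Organism("bin_1", [Contig("c1", [Feature("g1"), Feature("g2")]),
                       Contig("c2", [Feature("g3")])]),
    Organism("bin_2", [Contig("c3")]),
])
print(proj.summary_row())  # ('demo', 'example project', 2, 3, 3)
```

Storing the project group as a single field, rather than a list, mirrors the rule that a project belongs to only one group.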
Selecting a project takes you to the Project page, which displays all the organisms in the project. There are two types of projects in ggKbase: binning projects and analysis projects.
Most projects in ggKbase are binning projects. A binning project contains a Metagenome (or "unknown") bin that holds all of the sample's raw genomic data. The Metagenome bin undergoes binning, a process that aims to identify and separate the sequences of individual organisms. The resulting bins are listed in the table under the "Bins" header, sorted by organism abundance.
An analysis project has no Metagenome bin. Instead, it usually contains bins from many different sources, including external reference genomes, that share some common characteristics. Grouping genomes this way makes it easy, for example, to produce a genome summary for a particular collection of bins.
When you click one of the organisms in the Project page table, you are taken to the Organism page, which provides a wealth of information about that organism. At the summary level, four boxes below the organism's name show its Metagenomic Context, Taxonomy, Properties, and Genome completeness.
Below these boxes is a set of tabbed tables, each tab showing its number of entries, for the organism's contigs, genes, RNAs, tRNAs, ribosomal proteins, bacterial SCGs (single-copy genes), and archaeal SCGs, along with a taxonomy table. Each table can be sorted by its column headers; for example, the contigs table can be sorted by name, number of features, GC content, or coverage.
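As an illustration of the contigs-table sort options, the sketch below computes GC content (the percentage of G and C bases in a sequence) and sorts a few rows by name, GC content, and coverage. The `ContigRow` class and the sample values are hypothetical; ggKbase may compute GC content differently (for example, by excluding ambiguous bases such as N), whereas this sketch simply divides by the full sequence length.

```python
from dataclasses import dataclass

@dataclass
class ContigRow:  # hypothetical row of the Organism page's contigs table
    name: str
    num_features: int
    sequence: str
    coverage: float

    @property
    def gc_content(self) -> float:
        """Percentage of G and C bases, counted over every base in the sequence."""
        seq = self.sequence.upper()
        if not seq:
            return 0.0
        return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


rows = [
    ContigRow("scaffold_2", 3, "ATATGC", 12.5),   # GC = 2/6 -> 33.3%
    ContigRow("scaffold_1", 5, "GCGCAT", 40.0),   # GC = 4/6 -> 66.7%
    ContigRow("scaffold_3", 1, "AAAATT", 50.0),   # GC = 0/6 -> 0.0%
]

by_name = sorted(rows, key=lambda r: r.name)
by_gc = sorted(rows, key=lambda r: r.gc_content, reverse=True)
by_coverage = sorted(rows, key=lambda r: r.coverage, reverse=True)

print([r.name for r in by_name])      # ['scaffold_1', 'scaffold_2', 'scaffold_3']
print([r.name for r in by_gc])        # ['scaffold_1', 'scaffold_2', 'scaffold_3']
print([r.name for r in by_coverage])  # ['scaffold_3', 'scaffold_1', 'scaffold_2']
```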
Contig or Scaffold
When you click one of the contigs on the Organism page, you are taken to the Contig (a.k.a. Scaffold) page, which displays all the genes, or features, that make up that contig.
The first box below the contig's name shows the taxonomy winner at each taxonomic level. Each gene on the contig has been taxonomically classified based on its annotations, and the taxonomy winner at a given level is the classification shared by a majority (>50%) of those genes. Each classification is shown in its own color.
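The majority rule can be sketched in a few lines of Python. This is an illustration under stated assumptions, not ggKbase's implementation: the function names are hypothetical, each gene is represented as a dictionary mapping taxonomic level to classification, and unclassified genes (missing or `None`) are assumed to still count toward the total, so a level has no winner unless one classification covers more than half of all genes.

```python
from collections import Counter
from typing import Optional

def taxonomy_winner(gene_labels: list[Optional[str]]) -> Optional[str]:
    """Return the classification held by more than 50% of genes, else None.

    Assumption: None marks an unclassified gene and still counts toward the total.
    """
    if not gene_labels:
        return None
    counts = Counter(label for label in gene_labels if label is not None)
    if not counts:
        return None
    label, n = counts.most_common(1)[0]
    return label if n > len(gene_labels) / 2 else None

def winners_by_level(genes: list[dict[str, str]], levels: list[str]) -> dict[str, Optional[str]]:
    """Apply the majority rule independently at each taxonomic level."""
    return {level: taxonomy_winner([g.get(level) for g in genes]) for level in levels}


# Usage: four genes; all are Bacteria, but only 2 of 4 are Proteobacteria (not >50%).
genes = [
    {"domain": "Bacteria", "phylum": "Proteobacteria"},
    {"domain": "Bacteria", "phylum": "Proteobacteria"},
    {"domain": "Bacteria", "phylum": "Firmicutes"},
    {"domain": "Bacteria"},
]
print(winners_by_level(genes, ["domain", "phylum"]))
# {'domain': 'Bacteria', 'phylum': None}
```

Using a strict "greater than half" test means two tied classifications can never both win, so the result is unambiguous at every level.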
The Contig page also provides several actions, such as Download DNA, Blast sequence, and Taxonomy visualization.
Feature or Gene
Features, or genes, are the smallest building blocks in ggKbase. They are first annotated by BLASTing them against the NCBI database; researchers may then add further annotations manually.
The Feature page shows all the details about a feature, including its Top 3 Annotations, Cross References, Similarities, and Sequences. You can also add the feature to a list and attach notes to it.