Help Page

 

GOLD Documentation

GOLD Project Entry Help Document

GOLD Standardized Metagenome Naming Document

Guidance on submitting public NCBI genomes and metagenomes into GOLD and IMG

Standardized Metagenome-Assembled Genome (MAG) naming in GOLD

 Contact us with Feedback or Questions

Report an issue or send a message to GOLD

GOLD Terminology

Term Description
Analysis Project Analysis Project is the informatics processing of a Sequencing Project. It describes how the assembly and annotation of a Sequencing Project were performed. Individual submissions in IMG represent individual Analysis Projects.
Analysis Project Type Analysis Project Type is determined by the Sequencing Project type and describes the annotation process applied. Common examples are isolate genome analysis, metagenome analysis, and metatranscriptome analysis.
Biome GOLD Biome represents the environmental sample selected for sequencing.
Biosample GOLD Biosample corresponds to the physical material collected from the environment and, in effect, serves as the descriptor of the metadata associated with an environmental sample. A GOLD Biosample is applicable to metagenome/metatranscriptome projects only. Note that GOLD’s definition of Biosample is conceptually different from NCBI’s BioSample, which encompasses both organism and environmental samples.
Contig Contig is a consensus sequence of a contiguous DNA fragment generated from a set of overlapping reads; as a result, the bases, their order, and the length of the fragment are known with high confidence. This is in contrast to a scaffold, which is reconstructed from information generated by partial sequencing of DNA fragments (e.g., end-sequencing of clones), and therefore has gaps represented by Ns and has uncertain length.
Culture Type Represents the type of culture from which the organism was obtained. Culture Type has two possible values: isolate or co-culture.
Draft Draft is a type of "sequencing status" of a genome project which is at an incomplete or "draft" stage.
Ecosystem Classification Paths Ecosystem classification paths describe the environment from which an environmental sample or an organism was collected. This five-level hierarchical classification system was described by Ivanova et al. in a paper titled "A call for standardized classification of metagenome projects." Ecosystem, at the top of the hierarchy, describes the broader environment (Environmental, Engineered, or Host-associated), and Specific Ecosystem, at the bottom, refers to a specific feature of the environment, as shown in the following example.
Five Levels: Ecosystem -> Ecosystem Category -> Ecosystem Type -> Ecosystem Subtype -> Specific Ecosystem
Example Path: Environmental -> Aquatic -> Marine -> Oceanic -> Aphotic zone
Ecosystem An ecosystem is a combination of a physical environment (abiotic factors) and all the organisms (biotic factors) that interact with this environment. The abiotic factors play a profound role in determining the type and composition of organisms in a given environment. The GOLD Ecosystem, at the top of the five-level classification system, is aimed at capturing the broader environment from which an organism or environmental sample is collected. The three broad groups under Ecosystem are Environmental, Host-associated, and Engineered. They represent samples collected from a natural environment, from another organism, or from engineered environments like bioreactors, respectively.
Ecosystem Category Ecosystem categories represent divisions within the ecosystem based on specific characteristics of the environment from which an organism or sample is isolated. For example, the Environmental ecosystem is divided into Air, Aquatic, and Terrestrial. Ecosystem categories for Host-associated samples can be individual hosts or phyla, and for Engineered samples they may be manipulated environments like bioreactors, solid waste, etc.
Ecosystem Type Ecosystem types group environments that share common characteristics within an Ecosystem Category. This grouping is still broad, but it is specific to the characteristics of a given environment. For example, the Aquatic ecosystem category may have ecosystem types like Marine or Thermal springs. The ecosystem category Air may have Indoor air or Outdoor air as different Ecosystem Types. In the case of Host-associated samples, ecosystem type can represent Respiratory system, Digestive system, Roots, etc.
Ecosystem Subtype Ecosystem subtypes represent a further subdivision of Ecosystem Types into more distinct groups. For example, Ecosystem Type Marine (Environmental -> Aquatic -> Marine) is further divided into Intertidal zone, Coastal, Pelagic, etc. at the Ecosystem Subtype level.
Specific Ecosystem Specific ecosystems represent specific features of the environment like aphotic zone in an ocean or gastric mucosa within a host digestive system. They help to define samples based on very specific characteristics of an environment under the five-level classification system.
Ecotype Ecotype is a population of a species that survives as a distinct group through environmental selection and isolation and that is comparable with a taxonomic subspecies, but not yet classified as a subspecies.
Finished Finished represents the quality of a sequencing project when the genome sequences have less than 1 error per 100,000 base pairs and each replicon is assembled into a single contiguous sequence, with a minimal number of possible exceptions commented in the submission record. All sequences are complete and have been reviewed and edited, all known mis-assemblies have been resolved, and repetitive sequences have been ordered and correctly assembled. The definition follows community standards as defined here.
GPTS Proposal ID This is a unique legacy JGI proposal ID assigned to old JGI sequencing projects.
Habitat Natural environment of an organism or biosample; the place that is natural for the life and growth of an organism or a general description of the place where a biosample was collected from. E.g. Wetland, Human skin etc.
IMG Submission ID This is a unique ID a dataset receives when submitted to the IMG annotation pipeline.
ITS Proposal ID This is a unique ID assigned to all proposals approved for sequencing at the JGI.
ITS SPID This is a unique ID assigned to all of the JGI’s sequencing projects.
JGI Genome Portal This is a centralized resource at the JGI for data download and can be accessed here
MAG Metagenome-Assembled Genomes (MAGs) are genomes that have been reconstructed from metagenomes through assembly and binning.
Metagenome The study of genetic material isolated directly from environmental samples, such as water, soil, or sediments. It may also be referred to as environmental genomics, ecogenomics, or community genomics.
Metagenome - Cell Enrichment Metagenome - Cell Enrichment is a draft metagenome assembly derived from a cell enrichment (> 1 cell) sample. A cell enrichment is generally obtained by physical separation of a biologically relevant unit, such as microcolonies. Due to the low biomass for cell enrichments, the extracted DNA is typically amplified using whole-genome amplification prior to sequencing.
Metagenome - Single Particle Sort Metagenome - Single Particle Sort is a draft genome or metagenome assembly derived from a single particle isolated via flow cytometry. A single particle sort can consist of a single cell or an aggregate of multiple cells, not necessarily of the same phylogenetic background. The extracted DNA is amplified using whole-genome amplification prior to sequencing. No amplicon-based 16S rRNA gene information is available for single particle sorts.
Metatranscriptome The study of the expressed portion of genomes, mRNAs, isolated directly from an environmental sample that may be transcribed into cDNAs for high-throughput sequencing.
Organism An individual living thing. It can be a plant, fungus, microbe, etc.
Organism Type This refers to the origins of the organism and can be any of the following terms: Natural, Genetically modified, Hybrid, and Synthesized.
Permanent Draft This is a status of a genome project which indicates no other sequencing improvements or gap closures are planned.
Proportal This is an IMG DataMart focused on the analysis of Prochlorococcus (and related) species datasets. More information is available here.
Proportal Clade Clade is a taxonomy designation for some Cyanobacterial lineages only, as specified by external submitters.
Scaffold Scaffolds consist of ordered contigs separated by gaps.
Sequencing Project Sequencing Project is the individual organism or sample that is targeted for sequencing. An individual sequencing project may be composed of more than one sequencing reaction and/or sequencing technology. A sequencing project may be an isolate genome, a metagenome sample, a transcriptome, a metatranscriptome, a 16S survey, etc. From a single Biosample, multiple different sequencing projects may be performed. For JGI projects, one sequencing project must always be correlated with a single SPID.
Sequencing Quality This represents community-defined categories of standards that better reflect the quality of the genome sequence, based on our understanding of the technologies, available assemblers, and efforts to improve upon drafted genomes. The values are based on the Chain et al. publication.
Study Study is an umbrella Project and represents the list of sequencing projects that are part of the original research proposal. Proposal is a synonym to Study. E.g., HMP study, GEBA study.
Specimen GOLD specimen refers to the source of the sequenced material, either an Organism or a Biome.
Type Strain This is typically an alphanumeric string designating the type strain status of an isolate genome. The type strain is the strain that was used when the species was first described, and it is typically deposited in and retrievable from service culture collections like DSMZ, ATCC, etc.
Uncultured Type Denotes how an uncultured organism was obtained. This applies both to real uncultured organisms and to virtual organisms of metagenomic origin. The Uncultured Type can be Single Cell, Pooled Single Cells, Population enrichment, Metagenomic, etc.
WGS This is Whole Genome Sequencing.
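
The five-level ecosystem classification path described in the table above can be sketched as a small data structure. This is only an illustrative sketch, not part of GOLD or its API: the names `EcosystemPath`, `TOP_LEVEL_ECOSYSTEMS`, and `parse_ecosystem_path` are hypothetical, and the only validation applied is what the definitions above state (exactly five levels, and one of three top-level ecosystems).

```python
from typing import NamedTuple

# The three top-level ecosystems named in the terminology table.
TOP_LEVEL_ECOSYSTEMS = {"Environmental", "Engineered", "Host-associated"}


class EcosystemPath(NamedTuple):
    """Hypothetical container for the five levels of a classification path."""
    ecosystem: str
    ecosystem_category: str
    ecosystem_type: str
    ecosystem_subtype: str
    specific_ecosystem: str


def parse_ecosystem_path(text: str) -> EcosystemPath:
    """Split an 'A -> B -> C -> D -> E' string into its five levels."""
    parts = [part.strip() for part in text.split("->")]
    if len(parts) != len(EcosystemPath._fields) or not all(parts):
        raise ValueError(f"expected 5 non-empty levels, got {parts!r}")
    if parts[0] not in TOP_LEVEL_ECOSYSTEMS:
        raise ValueError(f"unknown top-level ecosystem: {parts[0]!r}")
    return EcosystemPath(*parts)


# The example path from the terminology table.
path = parse_ecosystem_path(
    "Environmental -> Aquatic -> Marine -> Oceanic -> Aphotic zone"
)
print(path.ecosystem_type)
```

A named tuple keeps the levels in their fixed order while still allowing access by level name, which mirrors the strict top-to-bottom hierarchy of the classification.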

Frequently Asked Questions

  • Q: What is GOLD?
    A: - The Genomes OnLine Database (GOLD) is a centralized catalogue of sequencing projects from around the world, along with their associated metadata.

  • Q: How do I cite GOLD in my publication?
    A: - If you use GOLD to assist in your research publications, please cite Mukherjee et al., 2020.

  • Q: How is GOLD related to IMG and JGI Genome Portal?
    A: - The Integrated Microbial Genomes (IMG) system provides tools for comparative analysis of genomes and metagenomes. The JGI Genome Portal provides unified access to all JGI datasets, from which users can download sequence data. Users who wish to have sequences annotated in IMG must first define their project and provide the necessary metadata in GOLD. They can then upload sequences to IMG for annotation. Finally, the annotated sequences can be downloaded from the Genome Portal.

  • Q: Can I download sequence data from GOLD?
    A: - GOLD does not host any sequence data. You can download both raw and analyzed sequences from the JGI Genome Portal.

  • Q: Do I submit any sequences in GOLD?
    A: - You do not submit any sequence in GOLD. Please enter metadata about your projects in GOLD and then upload sequence data to the IMG submission system for analysis.

  • Q: What do the different Project Statuses in GOLD mean?
    A: - The Project Status of a GOLD Sequencing Project is based on the completion level of the genome. Depending on the stage of sequencing, a GOLD Project Status may be 'Proposed', 'In Progress', 'Abandoned', 'Permanent Draft', 'Complete and Published', etc. A GOLD Project Status is 'Complete and Published' when all the sequencing is done and the genome has a chromosome-based public assembly in GenBank. The Project Status is 'Permanent Draft' when sequencing is complete but the sequences are not assembled into chromosomes, and the corresponding WGS-based assembly is publicly available in GenBank or IMG.

  • Q: My Analysis Project has a review status “Waiting For User.” What does it mean?
    A: - It means that your project has been reviewed and it is either missing key metadata or doesn’t follow standardized naming conventions. In such cases we notify the submitter and wait for their response. Please check your inbox and spam folders for emails from a member of the GOLD Team.

  • Q: I need to correct some metadata related to my project, but I cannot do it. What do I do?
    A: - After you finish entering a project in GOLD, only a limited number of fields can be edited. The number of fields a user can edit is further limited after an Analysis Project is approved. If you need to update a name or any metadata after approval, please contact GOLD here.

  • Q: I want to grant access to my projects to my colleague(s). How can I do it?
    A: - Please provide GOLD with the email addresses associated with your colleagues' GOLD accounts and the IDs of the relevant projects.

  • Q: I have over a hundred projects to submit. Do I have to create them manually?
    A: - If you have more than 10 projects, please contact us here and one of our team members will assist you with batch submission of your projects.

  • Q: I created a project in GOLD and submitted sequences to IMG. When will my data be available for analysis?
    A: - Once you submit sequence data to IMG, annotation may take 2 to 4 weeks. You will be notified once the analysis is done. You can check the status on the IMG submissions site.

  • Q: I see this new bacterial genome in NCBI but it is not in GOLD/IMG. When will it be available?
    A: - We import projects from NCBI to GOLD on a regular basis, but given the scale of the projects at NCBI, there is a time lag before a genome is added to GOLD/IMG. Please contact us with the relevant NCBI project details (for example, BioProject Accession, BioSample Accession, GenBank ID, SRA IDs, etc.) and we will prioritize it for you.

  • Q: How do I download and install a local copy of GOLD?
    A: - We do not have a mechanism to support a local copy or installation of GOLD. However, we are happy to work with users individually and to generate custom reports based on your needs. Please refer to our usage policy for more details.
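
The two Project Status rules spelled out in the FAQ above ('Complete and Published' and 'Permanent Draft') can be read as a simple decision function. The sketch below is purely illustrative and is not how GOLD assigns statuses internally: the function name `infer_project_status`, its parameters, and the string values for the assembly level are all hypothetical, and any case not covered by the two stated rules is left undetermined (`None`) rather than guessed.

```python
from typing import Iterable, Optional


def infer_project_status(
    sequencing_complete: bool,
    assembly_level: str,          # hypothetical values: "chromosome" or "wgs"
    public_in: Iterable[str],     # repositories holding the public assembly
) -> Optional[str]:
    """Apply only the two status rules stated in the FAQ answer."""
    repositories = set(public_in)
    if not sequencing_complete:
        return None  # 'Proposed', 'In Progress', etc. are not modeled here
    # Rule 1: sequencing done and a chromosome-based public assembly in GenBank.
    if assembly_level == "chromosome" and "GenBank" in repositories:
        return "Complete and Published"
    # Rule 2: not assembled into chromosomes, and the WGS-based assembly
    # is public in GenBank or IMG.
    if assembly_level == "wgs" and repositories & {"GenBank", "IMG"}:
        return "Permanent Draft"
    return None


print(infer_project_status(True, "chromosome", ["GenBank"]))
print(infer_project_status(True, "wgs", ["IMG"]))
```

Returning `None` for uncovered cases keeps the sketch from implying rules for statuses such as 'Proposed' or 'Abandoned', which the FAQ names but does not define.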

Trainings and Workshops

GOLD-IMG Webinar: Data Submission & Management

Microbial Genomics & Metagenomics Workshops

©1997-2023 The Regents of the University of California