Term | Description |
---|---|
Analysis Project | Analysis Project is the informatics processing of a Sequencing Project. It describes how the assembly and annotation of a Sequencing Project were performed. Individual submissions in IMG represent individual Analysis Projects. |
Analysis Project Type | The analysis project type is determined by the sequencing project type and describes the annotation process applied. Common examples are isolate genome analysis, metagenome analysis, and metatranscriptome analysis. |
Biome | A GOLD Biome represents the environmental sample selected for sequencing. |
Biosample | A GOLD Biosample corresponds to the physical material collected from the environment and therefore serves as the descriptor for the metadata associated with an environmental sample. A GOLD Biosample applies to metagenome and metatranscriptome projects only. Note that GOLD’s definition of Biosample differs conceptually from NCBI’s BioSample, which encompasses both organism and environmental samples. |
Contig | A contig is a consensus sequence of a continuous DNA fragment generated from a set of overlapping reads; as a result, the bases, their order, and the length of the fragment are known with high confidence. This contrasts with a scaffold, which is reconstructed from information generated by partial sequencing of DNA fragments (e.g., end-sequencing of clones) and therefore contains gaps represented by Ns and has an uncertain length. |
Culture Type | Represents the type of culture from which the organism was obtained. Culture Type has two values: isolate or co-culture. |
Draft | Draft is a type of "sequencing status" of a genome project which is at an incomplete or "draft" stage. |
Ecosystem Classification Paths | Ecosystem classification paths describe the environment from which an environmental sample or an organism was collected. This five-level hierarchical classification system was described by Ivanova et al. in the paper "A call for standardized classification of metagenome projects." The Ecosystem at the top of the hierarchy describes the broad environment (Environmental, Engineered, or Host-associated), and the Specific Ecosystem at the bottom refers to a specific feature of the environment. Five levels: Ecosystem -> Ecosystem Category -> Ecosystem Type -> Ecosystem Subtype -> Specific Ecosystem. Example path: Environmental -> Aquatic -> Marine -> Oceanic -> Aphotic zone. |
Ecosystem | An ecosystem is the combination of a physical environment (abiotic factors) and all the organisms (biotic factors) that interact with it. Abiotic factors strongly influence the type and composition of organisms in a given environment. The GOLD Ecosystem, at the top of the five-level classification system, captures the broad environment from which an organism or environmental sample was collected. Its three broad groups, Environmental, Host-associated, and Engineered, represent samples collected from a natural environment, from another organism, or from an engineered environment such as a bioreactor, respectively. |
Ecosystem Category | Ecosystem categories are divisions within an Ecosystem based on specific characteristics of the environment from which an organism or sample was isolated. For example, the Environmental ecosystem is divided into Air, Aquatic, and Terrestrial. Ecosystem categories for Host-associated samples can be individual hosts or phyla, and for Engineered samples they may be manipulated environments such as bioreactors or solid waste. |
Ecosystem Type | Ecosystem types group environments that share common characteristics within an Ecosystem Category. This grouping is still broad but specific to the characteristics of a given environment. For example, the Aquatic ecosystem category includes ecosystem types such as Marine and Thermal springs, and the Air category includes Indoor air and Outdoor air. For Host-associated samples, the ecosystem type can represent the Respiratory system, Digestive system, Roots, etc. |
Ecosystem Subtype | Ecosystem subtypes further subdivide Ecosystem Types into more distinct groups. For example, the Ecosystem Type Marine (Environmental -> Aquatic -> Marine) is divided into subtypes such as Intertidal zone, Coastal, Pelagic, and Oceanic. |
Specific Ecosystem | Specific ecosystems represent specific features of the environment, such as the aphotic zone of the ocean or the gastric mucosa of a host digestive system. They define samples by very specific characteristics of an environment within the five-level classification system. |
Ecotype | Ecotype is a population of a species that survives as a distinct group through environmental selection and isolation and that is comparable with a taxonomic subspecies, but not yet classified as a subspecies. |
Finished | Finished denotes the quality of a sequencing project in which the genome sequence has less than 1 error per 100,000 base pairs and each replicon is assembled into a single contiguous sequence, with a minimal number of possible exceptions commented in the submission record. All sequences are complete and have been reviewed and edited, all known mis-assemblies have been resolved, and repetitive sequences have been ordered and correctly assembled. This definition follows community standards for genome sequence quality. |
GPTS Proposal ID | This is a unique legacy JGI proposal ID assigned to old JGI sequencing projects. |
Habitat | Natural environment of an organism or biosample; the place that is natural for the life and growth of an organism or a general description of the place where a biosample was collected from. E.g. Wetland, Human skin etc. |
IMG Submission ID | This is a unique ID a dataset receives when submitted to the IMG annotation pipeline. |
ITS Proposal ID | This is a unique ID assigned to every proposal approved for sequencing at the JGI. |
ITS SPID | This is a unique ID assigned to each JGI sequencing project. |
JGI Genome Portal | The JGI Genome Portal is JGI's centralized resource for data downloads. |
MAG | Metagenome-Assembled Genome: a genome reconstructed from metagenomic data through assembly and binning. |
Metagenome | The study of genetic material isolated directly from environmental samples such as water, soil, or sediments; also referred to as environmental genomics, ecogenomics, or community genomics. |
Metagenome - Cell Enrichment | Metagenome - Cell Enrichment is a draft metagenome assembly derived from a cell enrichment (> 1 cell) sample. A cell enrichment is generally obtained by physical separation of a biologically relevant unit, such as microcolonies. Due to the low biomass for cell enrichments, the extracted DNA is typically amplified using whole-genome amplification prior to sequencing. |
Metagenome - Single Particle Sort | Metagenome - Single Particle Sort is a draft genome or metagenome assembly derived from a single particle isolated via flow cytometry. A single particle sort can consist of a single cell or an aggregate of multiple cells, not necessarily of the same phylogenetic background. The extracted DNA is amplified using whole-genome amplification prior to sequencing. No amplicon-based 16S rRNA gene information is available for single particle sorts. |
Metagenome - SIP specific Terms: | |
SIP | Stable Isotope Probing: an experimental design in which active growth is analyzed through the assimilation of stable isotope-labeled media, compared against unlabeled controls. |
Isotopic Label | The stable isotope (e.g., 13C, 15N, 18O) incorporated into the growth media, or "unlabeled" for control conditions, on the assumption that active growth will incorporate the isotope into biomass. |
Metagenome - SIP | A type of SIP experimental design in which active microbial population growth is analyzed through the assimilation of stable isotope-labeled media, or of unlabeled control media, into DNA (e.g., 13C or 15N labels with 12C or 14N controls). |
SIP Fractionation | A process of SIP experimental design where SIP Parent samples are separated into multiple “Fraction” samples via density gradient. |
SIP Parent | Samples from a SIP experimental design that were grown either with a labeled media supply or under unlabeled control conditions. The term “Parent” indicates that the sample represents the pre-fractionation state within the SIP experimental design. Each SIP Parent yields more than one associated SIP Fraction. |
SIP Fraction | A sample that is the result of density gradient fractionation of a SIP Parent sample. |
Fraction Density | The density (g/mL) of a SIP Fraction sampled from the density gradient during SIP fractionation of a SIP Parent sample. Fraction density gives the relative position of SIP Fraction samples in the gradient, indicating their relative isotopic enrichment. |
SIP Sample Group Name | Name used to associate groups of connected SIP Parent samples, i.e., a set of corresponding treatment and control replicates. |
qSIP | Quantitative Stable Isotope Probing, analysis of Metagenome - SIP data that estimates levels of isotopic enrichment in metagenome assembled genomes (MAGs) recovered from Metagenome-SIP projects. |
Metatranscriptome | The study of the expressed portion of genomes (mRNAs) isolated directly from an environmental sample; the mRNAs may be reverse-transcribed into cDNA for high-throughput sequencing. |
Organism | An individual living thing, such as a plant, fungus, or microbe. |
Organism Type | The origin of the organism, which can be one of the following terms: Natural, Genetically modified, Hybrid, or Synthesized. |
Permanent Draft | This is a status of a genome project which indicates no other sequencing improvements or gap closures are planned. |
Proportal | This is an IMG DataMart focused on the analysis of Prochlorococcus (and related) species datasets. |
Proportal Clade | Clade is a taxonomy designation for some Cyanobacterial lineages only, as specified by external submitters. |
Scaffold | Scaffolds consist of contigs that have been ordered and oriented relative to one another, with the gaps between them represented by Ns. |
Sequencing Project | A Sequencing Project is the individual organism or sample targeted for sequencing. An individual genome project may comprise more than one sequencing reaction and/or sequencing technology. A sequencing project may be an isolate genome, a metagenome sample, a transcriptome, a metatranscriptome, a 16S survey, etc. Multiple different sequencing projects may be performed on a single Biosample. For JGI projects, each sequencing project must always correspond to a single SPID. |
Sequencing Quality | Community-defined categories of standards that better reflect the quality of a genome sequence, based on current understanding of sequencing technologies, available assemblers, and efforts to improve draft genomes. The values are based on the Chain et al. publication. |
Study | A Study is an umbrella project representing the list of sequencing projects that are part of the original research proposal. Proposal is a synonym for Study. E.g., HMP study, GEBA study. |
Specimen | A GOLD Specimen refers to the source of the sequenced material: either an Organism or a Biome. |
Temperature Range | Temperature Range describes the broad band of temperatures at which a particular Organism can grow. We use controlled vocabulary (CV) terms to describe temperature ranges: Psychrophile (below 0 to 9.99 °C); Psychrotrophic (10 to 19.99 °C); Mesophile (20 to 45 °C); Thermophile (45.1 to 79.99 °C); and Hyperthermophile (80 °C and above). The value Psychrotolerant is used for Organisms that survive at low temperatures but can also grow at elevated temperatures. The value Thermotolerant is used for Organisms that grow like Mesophiles but can also tolerate temperatures above 45 °C. |
Type Strain | This is typically an alphanumeric string designating the type strain status of an isolate genome. The type strain is the strain that was used when the species was first described; it is typically deposited in, and retrievable from, service culture collections such as DSMZ and ATCC. |
Uncultured Type | Denotes how an uncultured organism was obtained. This applies both to truly uncultured organisms and to virtual organisms of metagenomic origin. The Uncultured Type can be Single Cell, Pooled Single Cells, Population enrichment, Metagenomic, etc. |
WGS | Whole Genome Sequencing. |
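The Study, Biosample, Sequencing Project, and Analysis Project entries in the glossary describe a nested hierarchy: a Study groups sequencing projects, a single Biosample can yield several sequencing projects, and each Analysis Project is the informatics processing of one Sequencing Project. The sketch below models that nesting in Python under stated assumptions: the class and field names are hypothetical illustrations, not a GOLD or IMG API, and it covers only the metagenome/metatranscriptome path, since a GOLD Biosample applies to those projects only (isolate genome projects, whose specimen is an Organism, are left out).

```python
from dataclasses import dataclass, field

# Hypothetical, simplified model of GOLD entities; names are illustrative only.


@dataclass
class AnalysisProject:
    name: str
    analysis_type: str  # e.g. "metagenome analysis"


@dataclass
class SequencingProject:
    name: str
    project_type: str  # e.g. "Metagenome", "Metatranscriptome"
    analysis_projects: list[AnalysisProject] = field(default_factory=list)


@dataclass
class Biosample:
    name: str
    ecosystem_path: str
    # A single Biosample may give rise to several sequencing projects
    sequencing_projects: list[SequencingProject] = field(default_factory=list)


@dataclass
class Study:
    name: str
    biosamples: list[Biosample] = field(default_factory=list)

    def sequencing_projects(self) -> list[SequencingProject]:
        """Collect every sequencing project under this study's biosamples."""
        return [sp for bs in self.biosamples for sp in bs.sequencing_projects]

    def analysis_projects(self) -> list[AnalysisProject]:
        """Collect every analysis project under this study's sequencing projects."""
        return [ap for sp in self.sequencing_projects() for ap in sp.analysis_projects]
```

Keeping the links one-directional (parent holds children) keeps the sketch small; a real database would more likely store foreign keys such as the IDs listed in the glossary.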
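The Contig and Scaffold entries in the glossary describe a scaffold as a set of contigs with the gaps between them represented by Ns. As an illustration only, the Python sketch below splits a scaffold sequence back into its contigs. It assumes gaps are encoded as runs of `N` characters; the `split_scaffold` helper and its `min_gap` threshold are hypothetical, included because a single N can also denote an ambiguous base rather than a gap.

```python
import re


def split_scaffold(scaffold: str, min_gap: int = 1) -> list[str]:
    """Split a scaffold into contigs at runs of at least min_gap N bases.

    Hypothetical helper; treating runs shorter than min_gap as ambiguous
    bases (not gaps) is an assumption.
    """
    gap = re.compile(f"[Nn]{{{min_gap},}}")
    # Splitting on gap runs leaves the gap-free segments; drop empty edges
    return [contig for contig in gap.split(scaffold) if contig]


if __name__ == "__main__":
    print(split_scaffold("ACGTACNNNNNGGTTAANNNCCA"))
    # ['ACGTAC', 'GGTTAA', 'CCA']
```

Note that the reverse operation is lossy: the number of Ns in a scaffold gap is only an estimate, which is why the glossary describes scaffold length as uncertain.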
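The Ecosystem Classification Paths entry in the glossary writes a path as five levels joined by `->`. A minimal Python sketch of parsing such a string into named levels follows; the `EcosystemPath` class and its field names are hypothetical, and the only validation assumed is the level count and the three top-level groups named in the Ecosystem entry (Environmental, Host-associated, Engineered).

```python
from dataclasses import dataclass

# The five levels of the GOLD ecosystem classification, top to bottom
LEVELS = (
    "ecosystem",
    "ecosystem_category",
    "ecosystem_type",
    "ecosystem_subtype",
    "specific_ecosystem",
)

# The three broad groups at the top (Ecosystem) level
TOP_LEVEL = {"Environmental", "Host-associated", "Engineered"}


@dataclass(frozen=True)
class EcosystemPath:
    ecosystem: str
    ecosystem_category: str
    ecosystem_type: str
    ecosystem_subtype: str
    specific_ecosystem: str

    @classmethod
    def parse(cls, path: str) -> "EcosystemPath":
        """Parse 'A -> B -> C -> D -> E' into the five named levels."""
        parts = [part.strip() for part in path.split("->")]
        if len(parts) != len(LEVELS):
            raise ValueError(f"expected {len(LEVELS)} levels, got {len(parts)}: {path!r}")
        if parts[0] not in TOP_LEVEL:
            raise ValueError(f"unknown top-level ecosystem: {parts[0]!r}")
        return cls(*parts)

    def __str__(self) -> str:
        # Rebuild the path in the same 'A -> B -> ...' notation
        return " -> ".join(getattr(self, level) for level in LEVELS)


if __name__ == "__main__":
    path = EcosystemPath.parse("Environmental -> Aquatic -> Marine -> Oceanic -> Aphotic zone")
    print(path.ecosystem_type)  # Marine
```

A frozen dataclass makes each parsed path hashable, so paths can be used directly as dictionary keys when counting samples per environment.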
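The SIP entries in the glossary describe a small data model: each SIP Parent carries an isotopic label, belongs to a SIP Sample Group, and yields more than one SIP Fraction, each with a fraction density in g/mL. The Python sketch below records that structure under stated assumptions. All class, field, and function names are hypothetical. Ordering fractions by density reflects the glossary's point that fraction density indicates relative isotopic enrichment, with heavier fractions expected to carry more enriched DNA in labeled treatments.

```python
from collections import defaultdict
from dataclasses import dataclass, field

# Hypothetical record types for the SIP terms; names are illustrative only.


@dataclass
class SipFraction:
    name: str
    density_g_per_ml: float  # Fraction Density, in g/mL


@dataclass
class SipParent:
    name: str
    isotopic_label: str  # e.g. "13C", or "unlabeled" for a control
    sample_group: str  # SIP Sample Group Name shared by treatment and control replicates
    fractions: list[SipFraction] = field(default_factory=list)

    def fractions_by_density(self) -> list[SipFraction]:
        """Return this parent's fractions from lightest to heaviest."""
        if len(self.fractions) < 2:
            raise ValueError(f"{self.name}: a SIP Parent should yield more than one fraction")
        return sorted(self.fractions, key=lambda f: f.density_g_per_ml)


def group_parents(parents: list[SipParent]) -> dict[str, list[SipParent]]:
    """Group SIP Parent samples by their SIP Sample Group Name."""
    groups: dict[str, list[SipParent]] = defaultdict(list)
    for parent in parents:
        groups[parent.sample_group].append(parent)
    return dict(groups)
```

Grouping by sample group name is what lets a labeled treatment be compared with its matching unlabeled control, which is the comparison a qSIP analysis relies on.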
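The Temperature Range entry in the glossary defines five CV terms by temperature bands. A minimal Python sketch of mapping a single growth temperature to one of those terms follows. The `temperature_range` function is hypothetical, and the handling of the small gaps between published bands (for example between 45 and 45.1 °C) is an assumption: each band is taken to run up to, but not include, the next band's lower bound. Psychrotolerant and Thermotolerant describe tolerance beyond the growth range, so they cannot be assigned from a single temperature and are left out.

```python
def temperature_range(growth_temp_c: float) -> str:
    """Map a growth temperature in °C to a GOLD Temperature Range CV term.

    Assumption: each band runs up to (but not including) the next band's
    lower bound, so values in the gaps between published bands
    (e.g. 9.995 or 45.05) fall into the lower band.
    """
    if growth_temp_c < 10:
        return "Psychrophile"  # below 0 to 9.99 °C
    if growth_temp_c < 20:
        return "Psychrotrophic"  # 10 to 19.99 °C
    if growth_temp_c < 45.1:
        return "Mesophile"  # 20 to 45 °C
    if growth_temp_c < 80:
        return "Thermophile"  # 45.1 to 79.99 °C
    return "Hyperthermophile"  # 80 °C and above
```

Checking the bands in ascending order means each test only needs an upper bound, which keeps the published boundaries visible in one place.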