General
Many genome projects have generated a deluge of whole-genome data from modern domesticated goats,
ancient goats, and their wild relatives in order to identify genetic diversity, selective regions, and
historic introgression events that were involved in domestication, subsequent intensive breeding, or
climate-driven adaptation. Nevertheless, there is no dedicated database integrating these resources.
Here we used a uniform pipeline to develop a comprehensive goat database (GGVD) which focuses
on high-quality SNPs, indels, selective regions, and introgression from 208 modern domestic goats,
24 bezoars, 46 ibex, and 82 ancient goats. A total of ~41.44 M SNPs and ~5.14 M indels were identified
in modern goat samples. Selected loci were identified using eight statistical tests (FST, Pi ratio,
XP-EHH, Pi, Hp, Tajima’s D, CLR, and iHS). Introgression regions between ibex species and domestic goats
were also integrated into the database. Users can freely visualize the frequency of genomic variations
in geographical maps, selective regions in interactive tables, Manhattan plots or line charts, and
especially, the SNP genotype patterns in heatmaps. Ancient goats were used to track the spatiotemporal
trajectories of each genetic variant. Moreover, we have integrated the UCSC Genome Browser, BLAT,
BLAST, LiftOver, and pcadapt into the database. GGVD allows users to identify breeding-associated candidate genes
and variants, and track the state of variants before, during and following selection and introgression
events. GGVD will be a useful archive for future genetic studies and goat breeding.
Related articles:
1. Zheng Z., Wang X., Li M., Li Y., ... Chen Y & Jiang Y. (2020) The origin of domestication genes in
goats. Science Advances, 6, eaaz5216.
2. Cai Y., Fu W., Cai D., Heller R., Zheng Z., ... Jiang Y & Wang X. (2020) Ancient genomes reveal the
evolutionary history and origin of cashmere producing goats in China. Molecular Biology and Evolution,
37, 2099-2109.
Manual
I. Samples and population structure
Our database integrates 208 modern domestic goats, 24 bezoars, 82 ancient goats,
and 46 ibex from published genetic studies. Principal component analysis (PCA) and a neighbor-joining (NJ)
phylogenetic tree revealed a clear genetic structure: samples from the same geographical origin cluster
together, except for African dairy goats, which is roughly consistent with our previous results. Given that
dairy goats mainly originated in Europe, it is reasonable that this group clusters with European goats to form
one large branch. Combined with our previous classification, the samples can be roughly assigned, according to
the PCA and NJ tree, to eight geographically distributed subgroups: Bezoars, Africa, Africa (dairy), Europe,
Australia, Southwest Asia, South Asia, and East Asia.
Fig 1. Population genetic structure and data processing pipeline of 360 goat individuals.
II. Variation search
GGVD allows users to retrieve information on SNPs and indels by searching for a specific variant rs ID, a gene symbol, or a genomic region. Users can filter SNPs and indels further with "Advanced Search", in which parameters such as minor allele frequency and consequence type can be set; this option lets users narrow down the items of interest efficiently and intuitively. The results are presented in an interactive table and graph. From the returned results, users can obtain details including variant position, alleles, minor allele frequency, variant effect, rs ID, and the allele frequency distribution in 25 geographically distributed goat populations or in the eight subgroups defined by genetic structure.
SNPs or indels search
Fig 2. Screenshots of a SNP data search and example results.
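The minor allele frequency filter offered by "Advanced Search" can be illustrated offline with a small Python sketch. This is not GGVD's implementation: the function names, the record layout, and the example variants are hypothetical, and genotypes are assumed to be diploid, biallelic, and coded as the number of alternate alleles (0, 1, or 2, with None for a missing call).

```python
from typing import Dict, List, Optional, Sequence


def minor_allele_frequency(genotypes: Sequence[Optional[int]]) -> Optional[float]:
    """Return the minor allele frequency of one biallelic variant.

    Each genotype is the alternate-allele count of a diploid sample
    (0, 1 or 2); None marks a missing call and is ignored.
    """
    called = [g for g in genotypes if g is not None]
    if not called:
        return None  # no called genotypes, so the frequency is undefined
    alt_freq = sum(called) / (2 * len(called))
    return min(alt_freq, 1.0 - alt_freq)


def filter_by_maf(variants: Dict[str, Sequence[Optional[int]]],
                  min_maf: float) -> List[str]:
    """Keep the ids of variants whose MAF is defined and at least min_maf."""
    kept = []
    for variant_id, genotypes in variants.items():
        maf = minor_allele_frequency(genotypes)
        if maf is not None and maf >= min_maf:
            kept.append(variant_id)
    return kept


# Hypothetical example data: variant id -> genotypes of five samples.
example_variants = {
    "var_a": [0, 1, 2, 1, None],  # 4 alternate alleles among 8 called alleles
    "var_b": [0, 0, 0, 1, 0],     # 1 alternate allele among 10 called alleles
    "var_c": [2, 2, 2, 2, 2],     # fixed for the alternate allele, MAF is 0
}
common = filter_by_maf(example_variants, min_maf=0.05)
```

Applying the same function to the samples of each population separately would give a per-population frequency table of the kind shown in the search results.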
III. Signature search
For a basic search, users can select a specific gene symbol or genomic region, one of the statistical methods (Pi, Hp, Tajima's D, CLR, iHS, FST, Pi ratio, XP-EHH), and a specific goat group to view the selection scores. The results are returned in tabular format. When users click the "show" button in the table, selective signals are displayed in Manhattan plots, where the target region or gene is highlighted in red. For an advanced search, users can select any combination of groups and methods to view the selection scores in line charts, which typically serve as an enlarged view of part of a selective sweep. In our database, the selection scores are pre-processed with transformations such as the Z-transform and logarithm, and p values are calculated according to the distribution of the data.
Fig 3. Screenshots of a selective signature search and representation of the P2RY1 gene.
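The pre-processing of selection scores mentioned in this section (Z-transform, logarithm, p values) can be sketched as follows. The exact procedure used by GGVD is not specified here, so this is only one plausible reading: it assumes per-window scores, a one-sided p value under a normal approximation, and a rank-based alternative for scores that are far from normal. All function names and the example scores are hypothetical.

```python
import math
from statistics import mean, pstdev
from typing import List, Sequence


def z_transform(scores: Sequence[float]) -> List[float]:
    """Standardize scores to zero mean and unit (population) variance."""
    mu = mean(scores)
    sigma = pstdev(scores)
    if sigma == 0:
        raise ValueError("scores have zero variance")
    return [(s - mu) / sigma for s in scores]


def upper_tail_p(z: float) -> float:
    """One-sided p value P(Z >= z) under a standard normal distribution."""
    return 0.5 * math.erfc(z / math.sqrt(2))


def empirical_p(scores: Sequence[float]) -> List[float]:
    """Rank-based p value: the fraction of windows scoring at least as high."""
    n = len(scores)
    return [sum(1 for other in scores if other >= s) / n for s in scores]


# Hypothetical scores for five genomic windows; the last one is an outlier.
window_scores = [0.02, 0.03, 0.01, 0.04, 0.35]
z_scores = z_transform(window_scores)
p_values = [upper_tail_p(z) for z in z_scores]
# -log10(p) is the quantity usually drawn on the y axis of a Manhattan plot.
neg_log10_p = [-math.log10(p) for p in p_values]
```

The normal approximation suits roughly symmetric statistics, whereas the rank-based version makes no distributional assumption but cannot yield a p value smaller than one divided by the number of windows.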
IV. Introgression browse
In this section, only putative introgressed haplotypes with a frequency higher than 0.1 in goats are retained. Users can browse introgression segments that facilitated goat domestication by viewing segment length, introgression frequency, similarity between ibex species and domestic goats, and gene annotation. Moreover, the introgressed haplotypes of all samples can be displayed in Gbrowse to help identify the most likely source species and lineages.
Fig 4. Screenshots of an introgression data search and representation of the MUC6 gene.
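The frequency threshold described in this section can be expressed as a simple filter. The sketch below is illustrative only: the Segment record, its field names, and the example segments are hypothetical, and frequency is assumed to mean the fraction of domestic goat haplotypes carrying the introgressed segment (416 haplotypes would correspond to 208 diploid samples).

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Segment:
    chrom: str
    start: int               # 0-based start coordinate
    end: int                 # exclusive end coordinate
    carrier_haplotypes: int  # haplotypes carrying the introgressed segment
    total_haplotypes: int    # all haplotypes examined

    @property
    def length(self) -> int:
        return self.end - self.start

    @property
    def frequency(self) -> float:
        return self.carrier_haplotypes / self.total_haplotypes


def keep_frequent(segments: List[Segment], min_frequency: float = 0.1) -> List[Segment]:
    """Keep segments whose frequency is strictly higher than min_frequency."""
    return [s for s in segments if s.frequency > min_frequency]


# Hypothetical segments, assuming 416 haplotypes in total.
segments = [
    Segment("chr1", 1_000_000, 1_050_000, 100, 416),
    Segment("chr2", 2_000_000, 2_020_000, 41, 416),
    Segment("chr3", 3_000_000, 3_080_000, 42, 416),
]
kept = keep_frequent(segments)
```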
V. GGVD tools
1. Local UCSC genome browser
Users can search with a gene symbol, a transcript name, or a genomic region to view SNPs, indels, genomic
signatures, genotype patterns, and conserved elements in the global view. Currently, 90 tracks have been released
for the goat ARS1 assembly. The "PDF/PS" item under the "View" menu of the navigation bar can be used to generate
a high-quality image in PostScript or PDF format.
2. Alignment search tools (BLAT/BLAST)
We introduced two sequence alignment tools, webBlat and ViroBLAST. The webBlat tool can be used to quickly
search for homologous regions of a DNA or mRNA sequence, which can then be displayed in the browser.
ViroBLAST finds regions of local similarity between sequences, which can be used to infer functional
and evolutionary relationships between sequences.
3. Genome coordinate conversion tool (liftOver)
We also introduced a genome coordinate conversion tool, liftOver. The liftOver tool translates
genomic coordinates from one assembly version to another and can also retrieve putative orthologous regions
in other species. Our database provides six liftOver chain files (ASM_vs_CHIR_1.0.minimap2.chain.gz,
ASM_vs_CHIR_2.0.minimap2.chain.gz, CHIR_1.0_vs_ASM.minimap2.chain.gz, CHIR_1.0_vs_CHIR_2.0.minimap2.chain.gz,
CHIR_2.0_vs_ASM.minimap2.chain.gz, and CHIR_2.0_vs_CHIR_1.0.minimap2.chain.gz) and offers online coordinate
conversion between any two of these versions of the goat genome.
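To show what a chain file encodes, the following Python sketch parses chain text in the UCSC chain format and converts a single 0-based position. It is a toy illustration rather than a substitute for the liftOver program: it scans blocks linearly, converts single positions instead of intervals, and assumes the source (target) strand is "+". The function names and the two-block example chain are hypothetical and are not taken from the GGVD chain files.

```python
from typing import List, Optional, Tuple

# (t_name, t_start, t_end, q_name, q_start, q_strand, q_size)
Block = Tuple[str, int, int, str, int, str, int]


def parse_chain_blocks(chain_text: str) -> List[Block]:
    """Collect the ungapped aligned blocks of every chain in the text."""
    blocks: List[Block] = []
    t_name = q_name = q_strand = ""
    t_pos = q_pos = q_size = 0
    for line in chain_text.splitlines():
        fields = line.split()
        if not fields:
            continue  # a blank line separates chains
        if fields[0] == "chain":
            # chain score tName tSize tStrand tStart tEnd
            #       qName qSize qStrand qStart qEnd id
            t_name, t_strand, t_pos = fields[2], fields[4], int(fields[5])
            q_name, q_size = fields[7], int(fields[8])
            q_strand, q_pos = fields[9], int(fields[10])
            if t_strand != "+":
                raise ValueError("this sketch assumes a '+' source strand")
            continue
        size = int(fields[0])
        blocks.append((t_name, t_pos, t_pos + size, q_name, q_pos, q_strand, q_size))
        if len(fields) == 3:  # "size dt dq": skip the gaps after this block
            t_pos += size + int(fields[1])
            q_pos += size + int(fields[2])
    return blocks


def lift_position(blocks: List[Block], chrom: str, pos: int) -> Optional[Tuple[str, int]]:
    """Map a 0-based source position to the destination assembly, or None."""
    for t_name, t_start, t_end, q_name, q_start, q_strand, q_size in blocks:
        if t_name == chrom and t_start <= pos < t_end:
            q_pos = q_start + (pos - t_start)
            if q_strand == "-":
                # Coordinates on "-" are given on the reverse complement.
                q_pos = q_size - 1 - q_pos
            return q_name, q_pos
    return None  # the position falls in a gap or outside every chain


# Hypothetical chain: two aligned blocks separated by a gap.
toy_chain = """\
chain 1000 chr1 1000 + 100 260 chr1 1200 + 300 470 1
50 10 20
100
"""
blocks = parse_chain_blocks(toy_chain)
lifted = lift_position(blocks, "chr1", 120)
```

A position that lies in an alignment gap returns None, which corresponds to a coordinate that cannot be converted between the two assemblies.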
4. Genome selective signature scans tool (pcadapt)
GGVD also provides a web tool, pcadapt, for performing PCA and detecting selective signatures in user-defined
individuals and groups. The pcadapt R package is wrapped by the backend interface, and the results
are sent to the user's mailbox after the submitted task is completed.