
Documentation

General

The rapid advancement of next-generation sequencing technology has yielded a deluge of worldwide horse genomic data for characterizing population genetic diversity and for genomic selection. However, efficient storage, querying and visualization of such large datasets remain challenging. Here, we developed a comprehensive Horse Genome Variation Database (HGVD) that provides six main functionalities: Gene Quick Search, Variation Search, Genomic Signature Search, Genome Browser, Alignment Search Tools (BLAT/BLAST) and Genome Coordinate Conversion Tool (LiftOver).

Related articles:

1. Dawei Cai, Siqi Zhu, Mian Gong, Naifan Zhang, Jia Wen, Qiyao Liang, Weilu Sun, Xinyue Shao, Yaqi Guo, Yudong Cai, Zhuqing Zheng, Wei Zhang, Songmei Hu, Xiaoyang Wang, He Tian, Youqian Li, Wei Liu, Miaomiao Yang, Jian Yang, Duo Wu, Ludovic Orlando, Yu Jiang (2022) Radiocarbon and genomic evidence for the survival of Equus Sussemionus until the late Holocene. eLife 11:e73346.

Manual

I. Samples and population structure

Our database integrates resequencing data from published cattle genetic studies, giving a total of 432 samples representing 54 breeds. The set contains 10 geographic groups: 108 West European cattle, 83 Central South European cattle, 9 Middle East cattle, 9 Tibetan cattle, 28 Northeast Asian cattle, 47 North and Central Chinese cattle, 33 South Chinese cattle, 24 Indo-Pakistani cattle and 70 African cattle. Principal component analysis (PCA) and ADMIXTURE analysis revealed clear genetic structure, with samples from each geographic region clustering together. Six geographically distributed ancestral components can be roughly ascribed to African taurine, European taurine, Eurasian taurine, East Asian taurine, Chinese indicine and Indian indicine ancestry.

 

Fig 1. Geographic distribution and population genetics analyses of 393 horse individuals.

 
II. Gene quick search

We integrated gene information from NCBI, AmiGO 2 and KEGG. Users can input a gene symbol to view basic gene information (e.g., genomic location, transcript and protein sequences, GO IDs and GO terms, and relevant KEGG pathways), gene variation information (e.g., SNPs and indels) and selection signatures (e.g., FST, XP-CLR, XP-EHH, Pi, Hp, iHS). We also provide links to GBrowse and to the external databases (NCBI, AmiGO 2 and KEGG) so that users can obtain further information, such as gene/mRNA/protein sequences, KEGG Orthology (KO) entries and motifs.

III. Variation search

The HGVD allows users to retrieve SNPs and indels by searching for a specific gene or genomic region in any of three versions of the horse genome. Users can further filter SNPs and indels with "Advanced Search", in which parameters such as minor allele frequency and consequence type can be set; this option lets users narrow down the items of interest efficiently and intuitively. The results are presented as an interactive table and graph. For each SNP or indel, users can obtain details including variant position, alleles, minor allele frequency, variant effect, rsID, and the allele frequency distribution across 75 worldwide horse breeds or seven "core" horse groups.
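The filtering behind "Advanced Search" can be illustrated with a minimal Python sketch. It assumes, hypothetically, that each variant is stored with its chromosome, position, consequence label and per-sample genotypes coded as alternate-allele counts (0, 1 or 2); the `Variant` class, the `advanced_search` function, the chromosome name and the consequence labels are illustrative only, not the HGVD's actual implementation or API. Minor allele frequency is taken as the smaller of the two allele frequencies across all diploid samples.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Variant:
    chrom: str
    pos: int
    ref: str
    alt: str
    consequence: str        # hypothetical label, e.g. "missense_variant"
    genotypes: List[int]    # alt-allele count per diploid sample: 0, 1 or 2

def minor_allele_frequency(genotypes):
    """Frequency of the less common allele across all diploid samples."""
    if not genotypes:
        raise ValueError("no genotypes")
    alt_freq = sum(genotypes) / (2 * len(genotypes))
    return min(alt_freq, 1 - alt_freq)

def advanced_search(variants, chrom, start, end, min_maf=0.0, consequences=None):
    """Variants on chrom within [start, end] that pass the MAF and consequence filters."""
    hits = []
    for v in variants:
        # Region query first (gene or genomic interval), then the optional filters.
        if v.chrom != chrom or not start <= v.pos <= end:
            continue
        if minor_allele_frequency(v.genotypes) < min_maf:
            continue
        if consequences is not None and v.consequence not in consequences:
            continue
        hits.append(v)
    return hits

variants = [
    Variant("chr3", 1000, "A", "G", "missense_variant", [0, 1, 2, 1]),
    Variant("chr3", 1500, "C", "T", "intron_variant", [0, 0, 0, 1]),
    Variant("chr3", 9000, "G", "A", "missense_variant", [1, 1, 1, 1]),
]
hits = advanced_search(variants, "chr3", 500, 2000,
                       min_maf=0.2, consequences={"missense_variant"})
print([v.pos for v in hits])  # [1000]
```

Here the variant at 1500 is dropped by the MAF threshold (MAF = 0.125) and the one at 9000 falls outside the queried region.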

1. SNP or indel search

IV. Signature search

Users can select a specific gene symbol or genomic region, one of the statistical methods (Pi, Hp, iHS, FST, XP-CLR, XP-EHH) and a specific "core" horse group to view the selection scores. The selection scores in the database are pre-processed with transformations commonly used in published studies (Z-transformation and logarithm). The results are returned in tabular format. When users click the "show" button in the table, selection signals are displayed as Manhattan plots or other standard graphics, with the target region or gene highlighted in red or blue.
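As an illustration of the Z-transformation step, the following minimal Python sketch computes pooled heterozygosity (Hp) for a few hypothetical windows using the commonly cited formula Hp = 2·ΣnMAJ·ΣnMIN / (ΣnMAJ + ΣnMIN)², where nMAJ and nMIN are the major and minor allele counts at each SNP in the window, and then standardizes the window scores as Z = (x - μ) / σ. The window counts are made up, and using the sample standard deviation is an assumption; the exact pre-processing applied in the HGVD may differ.

```python
import statistics

def pooled_heterozygosity(major_counts, minor_counts):
    """Hp for one window from per-SNP major/minor allele counts pooled over a group."""
    n_maj, n_min = sum(major_counts), sum(minor_counts)
    total = n_maj + n_min
    if total == 0:
        raise ValueError("window has no allele counts")
    return 2 * n_maj * n_min / total ** 2

def z_transform(scores):
    """Standardize raw scores: Z = (x - mean) / sample standard deviation."""
    mu = statistics.mean(scores)
    sd = statistics.stdev(scores)  # requires at least two windows
    return [(x - mu) / sd for x in scores]

# Hypothetical windows: (major allele counts per SNP, minor allele counts per SNP)
windows = [
    ([10, 12], [10, 8]),
    ([20, 20], [0, 0]),
    ([15, 15], [5, 5]),
]
hp = [pooled_heterozygosity(maj, mn) for maj, mn in windows]
zhp = z_transform(hp)
print(hp)                   # [0.495, 0.0, 0.375]
print(zhp.index(min(zhp)))  # 1: the window with no minor alleles
```

Under this convention, windows with strongly negative ZHp (reduced heterozygosity) are the usual candidates for selective sweeps.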

 
V. HGVD tools

1. Alignment search tools (BLAT/BLAST)

We provide two sequence alignment tools, webBlat and NCBI wwwBLAST. webBlat quickly searches for regions homologous to a DNA or mRNA query sequence, and the hits can then be displayed in the genome browser. BLAST finds regions of local similarity between sequences, which can be used to infer functional and evolutionary relationships between them.
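The notion of "local similarity" can be made concrete with the Smith-Waterman dynamic-programming algorithm, sketched below in Python with a linear gap penalty. BLAST and BLAT do not run this exact algorithm over a whole genome; they rely on fast word-based (seed-and-extend) heuristics, so this sketch only illustrates what a local alignment is. The scoring values (match +2, mismatch -1, gap -2) are arbitrary choices for the example.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment of a and b; returns (score, aligned_a, aligned_b)."""
    def sub(x, y):
        return match if x == y else mismatch

    rows, cols = len(a) + 1, len(b) + 1
    # H[i][j]: best score of a local alignment ending at a[i-1] and b[j-1]
    H = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            H[i][j] = max(0,
                          H[i - 1][j - 1] + sub(a[i - 1], b[j - 1]),
                          H[i - 1][j] + gap,
                          H[i][j - 1] + gap)
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # Trace back from the highest-scoring cell until a zero cell is reached.
    i, j = best_pos
    out_a, out_b = [], []
    while i > 0 and j > 0 and H[i][j] > 0:
        if H[i][j] == H[i - 1][j - 1] + sub(a[i - 1], b[j - 1]):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))

print(smith_waterman("GGACGTACGTT", "ACGTACG"))  # (14, 'ACGTACG', 'ACGTACG')
```

Because scores are floored at zero, the alignment can start and end anywhere, which is what lets it pick out the shared "ACGTACG" segment while ignoring the flanking bases.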

Project organizers

Yu Jiang

Northwest A&F University, Yangling, Shaanxi, China

Email: yu.jiang@nwafu.edu.cn

Chuzhao Lei

Northwest A&F University, Yangling, Shaanxi, China

Email: leichuzhao1118@nwafu.edu.cn