https://doi.org/10.1016/j.cbpc.2016.04.004. There was a region of chromosome 4 with drastically fewer variants in our study (Appendix Fig. 4) that was also reported in (Butler et al. https://doi.org/10.1038/nature12111, Howe DG, Bradford YM, Eagle A et al (2017) The Zebrafish Model Organism Database: new support for human disease models, mutation details, gene expression phenotypes and searching. National Academies Press (US), Washington, D.C., pp 1–13, Bowen ME, Henke K, Siegfried KR et al (2012) Efficient mapping and cloning of mutations in zebrafish by low-coverage whole-genome sequencing. The adjustment of the MQ threshold from GATK’s recommendation of 40 to 35 accounted for the difference in quality score reporting between the aligner suggested by GATK (BWA) and Bowtie 2. In addition to discovering more variants, the design allowed us to estimate allele frequencies for a population more accurately than was previously possible, because estimates based on read frequencies in a pool are biased (Raineri et al. So from a single cell the day they're born, they will have a head, and a tail, and a beating heart within 24 hours. We establish T5D as a model that is representative of diversity levels within laboratory zebrafish lines and demonstrate that experimental design and analysis can exert major effects when characterizing genetic diversity in heterogeneous populations. Haploid DNA contents (C-values, in picograms) are currently available for 6222 species (3793 vertebrates and 2429 non-vertebrates) based on 8004 records from 786 published sources. J Gerontol Ser A 70:1470–1478. For these samples, 350 ng of DNA was used in the library preparation. d Proportion of indels for discrete alternate allele frequencies.
The authors wish to thank the Center for Genome Research and Biocomputing (CGRB) at Oregon State University for providing core support to conduct the sequencing studies and the Bioinformatics Consulting and Services Core (BCSC) at North Carolina State University for bioinformatics support. https://doi.org/10.1093/gbe/evr090, Mrakovcic M, Haley LE (1979) Inbreeding depression in the Zebra fish Brachydanio rerio (Hamilton Buchanan). GC content for each sample was ~37%, which is consistent with the zebrafish genome (Han and Zhao 2008). This low-variability region lies within an area of the genome that has primarily zebrafish-specific genes not homologous to other species (Howe et al. Toxicol Sci 137:212–233. The abundance of sites with non-reference alleles per T5D zebrafish could imply that within a population, zebrafish are more genetically variable than humans. Here, we characterize salient features of population genetic architecture of the Tropical 5D (T5D) line as a representative laboratory population of zebrafish. BMC Genom. a Venn diagram of SNP sites (in millions) compared to the Zv9 reference genome. This latest assembly has been refined by the addition of nearly 1000 finished clone sequences and the resolution of more than 400 genome issues. All authors have stated that there are no conflicts of interest. T5D was found to have more variants compared to results from studies using pooled sequencing and smaller sample sizes (Fig. All library preparation and sequencing were performed at Oregon State University’s Center for Genome Research and Biocomputing (http://cgrb.oregonstate.edu/core). The 20.1 M SNPs equate to 13.4 SNPs per 1 kb of genomic sequence. This was performed with the aims of characterizing genomic variability in the outbred, T5D wild-type zebrafish population, discovering the type of variation (common SNPs versus rare variants, etc.) Nucmer was run with the --mum option. Bioinform Appl NOTE 25:2078–2079.
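The density figure quoted above (20.1 M SNPs, 13.4 SNPs per 1 kb) can be checked with simple arithmetic. The sketch below is illustrative only: the function names are hypothetical, and the genome length used as the denominator is not stated in this text, so it is back-solved from the two quoted numbers instead of assumed.

```python
def snps_per_kb(snp_count: float, genome_bp: float) -> float:
    """Average number of SNP sites per 1,000 bp of sequence."""
    return snp_count / genome_bp * 1_000


def implied_genome_bp(snp_count: float, density_per_kb: float) -> float:
    """Sequence length (bp) implied by a SNP count and a per-kb density."""
    return snp_count / density_per_kb * 1_000


# Figures quoted in the text: 20.1 M SNPs at 13.4 SNPs per kb.
length_bp = implied_genome_bp(20.1e6, 13.4)
print(f"implied sequence length: {length_bp / 1e9:.2f} Gb")
print(f"density check: {snps_per_kb(20.1e6, length_bp):.1f} SNPs per kb")
```

Under these numbers the implied denominator is about 1.5 Gb; a different assumed genome length would give a different density.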
2016), so exposure would not have altered constitutive DNA sequence. Genome Res 20:1297–1303. 2016), an individual zebrafish within the T5D population may vary more from the current zebrafish reference genome than individuals from certain human ethnic populations vary compared to the human reference genome. VEP (McLaren et al. Betts K, Shelton-Davenport M (2016) Interindividual Variability: New Ways to Study and Implications for decision making: workshop in brief. There are also long-term benefits associated with creating a database of known SNPs in zebrafish populations. Commonly used to understand gene function. Mamm Genome 23:713–718. The NHGRI-1 line was derived from one mating pair of TAB-5 (a TU and AB cross), where the founding male was previously sequenced at 52× coverage and the female at 47× (LaFave et al. Indeed, the zebrafish model is gaining tractability as a human disease model (Howe et al. To use the CC mice in an infrastructure more similar to naturally occurring populations with heterozygosity, an outbred population was created. A major difference from many model organisms is that standard husbandry practices in zebrafish are designed to maintain population diversity. We further show that regulatory interactions ancestral to vertebrates con… https://doi.org/10.1038/ncomms1248. This downsampling approach resulted in a twofold reduction in variant calling capability, providing evidence that sequencing design could be a major driver of variability differences among zebrafish lines. The Zebrafish Information Network (ZFIN) is the database of genetic and genomic data for the zebrafish (Danio rerio) as a model organism. ZFIN provides a wide array of expertly curated, organized and cross-referenced zebrafish research data. Nature 496:498–503.
The zebrafish (Danio rerio) has long been appreciated as a unique model system for vertebrate genetics and developmental biology. The zebrafish genome is about half the size of most mammalian genomes, containing some 4.6 pg of DNA distributed across 25 pairs of chromosomes (2n = 50). Initial comparisons of zebrafish and mammalian gene maps have revealed extensive … Zebrafish variant comparisons. b Allele frequency spectrum for common human variants. The zebrafish genome project at the Wellcome Sanger Institute produced the zebrafish reference assembly of the Tuebingen strain. Nature 2013; 496 (7446): 498-503. Comparisons between named strains and inter-lab populations of zebrafish have shown variability in several phenotypes, providing the rationale that constitutive genetic variation may contribute to the variability in exposure response (Lange et al. To address the impact of sequence design on comparisons between T5D and other lines that used pooled sequencing, a portion of the T5D data was used as a simulated pool. A VCF file for NHGRI-1 (LaFave et al. https://doi.org/10.1089/zeb.2012.0848, Raineri E, Ferretti L, Esteve-Codina A et al (2012) SNP calling by sequencing pooled samples. Estimates in other species have been similar (4.9 SNPs per kb in sheep, 5.5 SNPs per kb in chickens, 10.1 SNPs per kb in fly, and 13.9 SNPs per kb in mouse), though they have been based on combined line/breed data (Ka-Shu Wong et al. Subsampling to simulate a pooled sequencing approach showed that T5D variation is in line with the more variable zebrafish laboratory strains (Fig. https://doi.org/10.1007/s11051-009-9740-9. Genetics 114:1291–1308, Yang H, Wang JR, Didion JP et al (2011) Subspecific origin and haplotype diversity in the laboratory mouse. To filter T5D variants accordingly, the repeat-masked annotation of Zv9 was downloaded from http://hgdownload.soe.ucsc.edu/goldenPath/danRer7/database/rmsk.txt.gz.
Approximately 45 M single nucleotide polymorphisms (SNPs) segregate in the CC and DO populations, four times more than in any single laboratory mouse strain (Yang et al. Its use as a laboratory animal was pioneered by the American molecular biologist George Streisinger and his colleagues at the University of Oregon in the 1970s and 1980s; Streisinger's zebrafish clones were among the earliest successful vertebrate clones created. Zebrafish is a model organism widely used for the understanding of gene function, including the fundamental basis of human disease, enabled by the presence in its genome of a high number of orthologs to human genes. One reason for the success of zebrafish as a model organism is its amenability to genetic manipulation. Using the Tanguay lab Tropical 5D zebrafish line (T5D), we performed whole genome sequencing on a large group (n = 276) of individual zebrafish embryos. Size of genome: 1.41e+9 bp; 26,206 protein-coding genes. BWA outputs a larger range of mapping quality scores, averaging 60 for high confidence reads, whereas the maximum quality score for Bowtie 2 is 42, indicating a perfectly aligned read. Additionally, the VCF files had masked variants in non-complex regions of the genome. The T5D allele frequencies are based on 276 individual whole genome sequences. https://doi.org/10.1093/toxsci/kft235, Unckless RL, Rottschaefer SM, Lazzaro BP (2015) A genome-wide association study for nutritional indices in Drosophila. Ivanov DK, Escott-Price V, Ziehm M et al (2015) Longevity GWAS using the Drosophila genetic reference panel. The T5D zebrafish are housed at Sinnhuber Aquatic Research Laboratory (SARL) at Oregon State University and maintained in accordance with their Institutional Animal Care and Use Committee protocols.
Thus, this model could be used for large-scale studies of chemical bioactivity that include genetic information on response mechanisms during development of exposed individuals (Baer et al. https://doi.org/10.1038/nature10811, McKenna A, Hanna M, Banks E et al (2010) The genome analysis toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data. The zebrafish genome reference assembly: useful URLs. To ensure consistency between datasets, we performed the same masking procedure on the AB, TU, TL, and WIK datasets even though masking had been previously performed. The y axis displays the variant count partitioned into 1 Mb bins of genomic sequence (x axis). Environ Sci Technol 44:5979–5985. Genome Biol. Wellcome Trust Sanger Institute, Family ties: Relationship between human and zebrafish genomes. For the previously discovered variants in AB, TU, TL, and WIK, SNPs in TU followed a slightly different read frequency distribution, with fewer fixed SNPs. Patowary A, Purkanti R, Singh M et al (2013) A sequence-based variation map of zebrafish. T5D variant counts and proportions of non-reference reads moved closer to those observed in other lines (Fig. The genome of the zebrafish Danio rerio is to be deciphered by a dedicated team at Britain's Sanger Centre. For these populations, each isogenic line has been sequenced. This can be explained in part by the heavy reliance of the reference genome sequence on TU zebrafish. DOI: https://doi.org/10.1007/s00335-018-9735-x. Nat Genet 36:1133–1137. c Number of models per disease category stacked by organism (from https://monarchinitiative.org). These results were compared to the other species (human and mouse). Nucleic Acids Res 45:D758–D768.
The project to sequence the zebrafish's 1.7 × 10^9 base-pair genome - about half the size of that of the mouse or human genome - is expected to take three years. For example, in Caucasians an average of 3.3 M SNPs and 0.49 M indels with non-reference alleles were identified per individual (Shen et al. (All other zebrafish data refer to the reference genome and publicly available data). Samples were sheared to ~320 bp, and 100 ng was used in the WaferGen robotic DNA library prep. We observed more intron variants in T5D, and synonymous gene transcript variant percentages fell between mouse and human (Fig. The estimate of 20.1 M SNPs segregating in the population (10.3 M in non-repetitive regions of the genome used for zebrafish line comparisons) included non-reference allele frequencies from 0.1 to 99.8%. https://doi.org/10.1007/s00335-007-9045-1, Sakharkar MK, Perumal BS, Sakharkar KR, Kangueane P (2005) An analysis on gene architecture in human and mouse genomes. Often a single fish can give you somewhere between 20 and 200 offspring in a single breeding, which is for geneticists just absolutely great. Nat Genet 43:491–498. The proportions of the types of SNPs found in T5D were similar to those reported by the dbSNP variant sites in both human and mouse. The mouse has been extensively used to mechanistically model human disease, but until the inception of a major recombinant inbred line (RIL) panel, the lack of variability within any single inbred strain did not sufficiently model human genetic variability (Churchill et al. With regard to habitat, zebrafish are typically found in shallow ponds, canals, and streams (stagnant or slow-flowing waters of between 18 and 24 degrees Celsius). By 72 hours their brains are working, and fins and trunk are twitching, and by five days old they are swimming around and they're hunting and they're fully viable organisms.
https://doi.org/10.1534/g3.114.016477, Usenko CY, Harper SL, Tanguay RL (2007) In vivo evaluation of carbon fullerene toxicity using embryonic zebrafish. J Fish Biol 15:323–327, Nasiadka A, Clark MD (2012) Zebrafish breeding in the laboratory environment. Their rapid development allows for high-throughput studies that can expand scientific discovery on several axes related to differential susceptibility. In order to create a RIL panel representing the genetic diversity among a more general populace of mice, the collaborative cross (CC) (Chesler et al. The effect of the variants on genes and transcripts and consequences on protein sequence were annotated for each species using Ensembl variant effect predictor (VEP) (McLaren et al. Model organisms have long been utilized to study genetic determinants underlying human disease susceptibility, because experiments can exert necessary controls over factors such as diet, lifestyle, and environment that would be impossible in a human setting. Nonetheless, isogenic models of any species fail to model the influence of genetic diversity on toxicity responses, a critical factor in human responses to toxicants. Of these, 6.85 M overlap with the SNPs discovered in T5D. Because select individuals or entire communities may be especially susceptible to adverse health effects from chemical exposure through common consumer products, occupational hazards, environmental emergencies, or geographic location (Brette et al. 2011) hard filtering recommendations for SNPs and indels (filter SNPs with quality by depth (QD) < 2, phred-scaled Fisher’s exact test p-value (FS) > 60, root mean square mapping quality (MQ) < 35, mapping quality Mann–Whitney Rank-Sum < −12.5, or read position Mann–Whitney Rank-Sum < −8, strand odds ratio (SOR) > 3; filter indels with QD < 2, FS > 100, read position Mann–Whitney Rank-Sum < −20, SOR > 10).
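The hard-filter thresholds listed above can be written as a small rule table. The following is a minimal sketch for illustration, not the authors' pipeline: the function name and the annotation keys (QD, FS, MQ, MQRankSum, ReadPosRankSum, SOR) are assumptions based on common abbreviations for these statistics, and the sketch assumes that a missing annotation does not trigger its rule.

```python
# Thresholds are taken from the filtering description above.
# Keys are assumed annotation names; each rule is (comparison, threshold),
# and a variant is filtered when the comparison holds.
SNP_RULES = {
    "QD": ("<", 2.0),
    "FS": (">", 60.0),
    "MQ": ("<", 35.0),
    "MQRankSum": ("<", -12.5),
    "ReadPosRankSum": ("<", -8.0),
    "SOR": (">", 3.0),
}
INDEL_RULES = {
    "QD": ("<", 2.0),
    "FS": (">", 100.0),
    "ReadPosRankSum": ("<", -20.0),
    "SOR": (">", 10.0),
}


def failed_filters(annotations: dict, variant_type: str) -> list:
    """Return the names of the hard-filter rules that a variant fails."""
    if variant_type == "SNP":
        rules = SNP_RULES
    elif variant_type == "INDEL":
        rules = INDEL_RULES
    else:
        raise ValueError(f"unknown variant type: {variant_type!r}")

    failed = []
    for name, (comparison, threshold) in rules.items():
        value = annotations.get(name)
        if value is None:
            # Assumption: an absent annotation cannot fail its rule.
            continue
        if comparison == "<" and value < threshold:
            failed.append(name)
        elif comparison == ">" and value > threshold:
            failed.append(name)
    return failed


# The same annotation values pass as an indel but fail two SNP rules.
example = {"QD": 5.0, "FS": 80.0, "SOR": 4.0}
print(failed_filters(example, "SNP"))
print(failed_filters(example, "INDEL"))
```

A variant with an empty result would be retained; keeping the failed rule names, instead of a single pass/fail flag, makes it easier to see which threshold removes the most sites.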
Our observations suggest that interindividual genetic diversity (i.e., natural variation) within laboratory populations may be higher than currently estimated and may have implications for differential susceptibility observed in toxicological studies. c Venn diagram of indel sites (in millions). Genome Biol Evol 3:1187–1196. J Nanopart Res 12:1645–1654. a Genome size, known variant count in dbSNP, variant effect, and consequences of transcript variants. The zebrafish genome. Mammalian Genome https://doi.org/10.1242/dev.083931, Oliveira R, Grisolia CK, Monteiro MS et al (2016) Multilevel assessment of ivermectin effects using different zebrafish life stages. Considerably more SNPs and indels were discovered through individual whole genome sequencing of a large T5D sample than in other zebrafish studies, even exceeding the current build of dbSNP. https://doi.org/10.1007/s00204-015-1554-1, Roberts A, Pardo-Manuel de Villena F, Wang W et al (2007) The polymorphism architecture of mouse genetic resources elucidated using genome-wide resequencing data: implications for QTL discovery and systems genetics. They're not very susceptible to disease. In order to assess the similarity of T5D variation to a hybrid population that has previously employed an individual sequencing approach, SNP sites were compared to NHGRI-1 SNP sites. Thus, inclusion of knowledge regarding constitutive genetic diversity will benefit all translational applications of the zebrafish model, from the mechanistic to the ecological to the clinical. The red box contains the variant effects for the 20.1 M SNPs found in T5D. These were screened out of the indel files to minimize the inclusion of microsatellite differences and other potential variants that may be more individual-based than population-based.