Petromyzon marinus

Overview
Full Name: Petromyzon marinus
Genus: Petromyzon
Species: marinus
Common Name: Sea Lamprey
Abbreviation: Pmar

The sea lamprey is a member of an ancient lineage that diverged from the vertebrate stem approximately 550 million years ago (MYA). By virtue of this deep evolutionary perspective, lamprey has served as a critical model for understanding the evolution of several conserved and derived features that are relevant to broad fields of biology and biomedicine. Studies have used lampreys to provide perspective on the evolution of developmental pathways that define vertebrate embryogenesis, vertebrate nervous and neuroendocrine systems, genome structure, immunity, clotting and others. These studies reveal aspects of vertebrate biology that have been conserved over deep evolutionary time and reveal evolutionary modifications that gave rise to novel features that emerged within the jawed vertebrate lineage (gnathostomes). Lampreys also possess several features that are not observed in gnathostomes, which could represent either aspects of ancestral vertebrate biology that have not been conserved in the gnathostomes or features that arose since the divergence of the ancestral lineages that gave rise to lampreys and gnathostomes. These include the ability to achieve full functional recovery after complete spinal cord transection, deployment of evolutionarily independent yet functionally equivalent adaptive immune receptors, and the physical restructuring of the genome during development known as programmed genome rearrangement (PGR).

Programmed genome rearrangement results in the physical elimination of ~0.5 Gb of DNA from the sea lamprey's ~2.3 Gb genome. The elimination events that mediate PGR are initiated at the 7th embryonic cell division and are essentially complete by 3 days post fertilization. As a result, lampreys are effectively chimeric, with germ cells possessing a full complement of genes and all other cell types possessing a smaller, reproducible fraction of the germline genome. Previous analyses support the idea that the somatic genome lacks several genes that contribute to the development and maintenance of germ cells but are potentially deleterious if misexpressed in somatic lineages. However, our understanding of the mechanisms and consequences of PGR remains incomplete, as only a fraction of the germline genome has been sequenced to date.

In contrast to the germline genome, the somatically retained portions of the genome are relatively well characterized. Because PGR was not known to occur in lampreys prior to 2009, sequencing efforts focused on somatic tissues from which DNA or intact nuclei could be readily obtained (e.g. blood and liver). Sequencing of the sea lamprey somatic genome followed an approach that had proven successful for other vertebrate genomes prior to the advent of next generation sequencing technologies (Sanger sequencing of clone ends, fosmid ends and BAC ends). Due to the abundance of highly-identical interspersed repetitive elements and moderately high levels of polymorphism (approaching 1%), assembly of the somatic genome resulted in a consensus sequence that was substantially more fragmentary than other Sanger-based vertebrate assemblies. Nonetheless, this initial assembly yielded significant improvements in our understanding of the evolution of vertebrate genomes and fundamental aspects of vertebrate neurobiology, immunity and development.

This assembly represents the first assembly of the sea lamprey germline genome. Through extensive optimization of assembly pipelines, we identified a computational solution that allowed us to generate an assembly from next-generation sequence data (Illumina and Pacific Biosciences reads) that surpasses the existing Sanger-based somatic assembly. Analysis of the resulting assembly has revealed several hundred genes that are eliminated from somatic tissues by PGR and sheds new light on the evolution of genes and functional elements in the wake of ancient large-scale duplication events.

Dr. Robb Krumlauf's group, here at the Stowers Institute for Medical Research, is studying the Hox gene clusters in the sea lamprey and comparing their organisation with those of jawed vertebrates. Hox transcription factors are encoded by genes that reside in genomic clusters, and they play key anterior-posterior patterning roles in multiple tissues during embryonic development. Hox genes exhibit segmental domains of activity in the developing head and neck that are remarkably similar between different vertebrate species. A striking feature of Hox gene expression during embryogenesis is that the order and timing of expression along the embryonic anterior-posterior axis correlate with the relative positions of the genes along the Hox cluster. It is widely held that the emergence of gene regulatory networks governing Hox-dependent patterning of the embryonic brain and pharynx was a fundamentally important event during the evolution of the complex vertebrate head. Thus, it is important to understand when and how these molecular cascades evolved in early vertebrates. As one of the few living jawless vertebrates, the sea lamprey is a crucial model organism for such endeavours, and research into the sea lamprey Hox genes is illuminating these important questions.




EMBARGO ON DATA USE: We are delighted to make lamprey genomes available to the broader community. We encourage others to use these data, but in doing so we also expect that they will respect our right to first presentation (including journal publications, pre-prints such as in bioRxiv, public conference talks, and press releases) of a genome-wide analysis of these assemblies. This includes the use of genome-wide data for phylogenetic and evolutionary analysis, on behalf of ourselves as data producers, the sample providers and collaborators. Therefore, please respect the embargo on the presentation of analyses using pre-publication data that we release via this website and relevant archives. Similar fair use statements apply to other assemblies such as those released by the Sanger Institute and the VGP.

Exceptions to the policy are for analyses of either a single locus, or a single gene family, or a maximum of 5 gene loci across multiple species, or for use as a reference for mapping reads from independent studies. Individual genomes and datasets will be considered released from this embargo when they are expressly published, and the relevant reference(s) will be added to this website at that time. For any queries about using the data, referencing/publishing analyses based on pre-publication data, or interest in contributing to genome-scale analyses, please contact jjsmit3@uky.edu. Use of data for sea lamprey is also covered under the VGP data use policies.


The following assemblies are currently under embargo: Petromyzon marinus (kPetMar1), Entosphenus tridentatus (ETR male, ETR female), Lampetra richardsoni (LPT). gPmar100 was published in Nature Genetics (2018).




Funding Statement: Funding from several sources contributed to the assembly and annotation of the sea lamprey genome, including funding from the Great Lakes Fishery Commission to Jeramiah Smith and Erich Jarvis; funding from the NSF under grant number MCB-1818012 to J Smith; and funding from the NIH under grant number R35GM130349 to J Smith. The support and resources from the Center for High Performance Computing at the University of Utah are gratefully acknowledged. The computational resources used were partially funded by the NIH Shared Instrumentation Grant 1S10OD021644-01A1.

Community Annotation

Crowd Sourced Curation

Join the community effort to improve the gene models. We would like to generate a high quality set of gene models. This is only possible with manual gene annotation. The more curators, the more likely this task can be completed. To request an invitation to help curate, please email us at simrbase@stowers.org

Our Curation Editor

We use Apollo, an open source software project. Apollo is a web application that allows for many curators to edit gene models at the same time. You can watch other edits going on live even when they are happening across the country or even across the world. To request an invitation to help curate, please email us at simrbase@stowers.org

Apollo Documentation

Find information on how to use Apollo in the official Apollo User Guide. It covers how to get started and gives detailed instructions for manual gene editing. Send any questions to simrbase@stowers.org

RULES

To ensure that all annotations are trustworthy and of the highest standard, certain information is required so that each curator's process is transparent.

All new annotations must have the following:

  1. Gene Information
    • Name: Gene Name
    • Symbol: Short name or symbol
    • Description: Informative description
    • Attributes:
      • Attribute: "Supported By"
      • Value: for "Supported By":
        • DNA Sequencing Reads,
        • RNA Sequencing Reads,
        • BLAST Alignment,
        • or custom value
    • Comments: Any comments that document the process of curating this sequence feature

    Annotations can have the following if they exist

    • Any Custom Attributes and values
    • DBXRefs: Accessions of this sequence feature in another database of the same species.
      • DB: Database name like GenBank
      • GenBank ID or other database ID for this feature in this species
    • Pubmed IDs: Pubmed IDs of any article that mentions this sequence feature in this organism
    • Gene Ontology IDs: Any GO ID that is associated with this sequence feature in this organism
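
The required items listed above can be checked mechanically before an annotation is submitted. The following is a minimal sketch only: it assumes an annotation has been copied into a plain Python dictionary, and the key names (`name`, `symbol`, `description`, `comments`, `attributes`) are hypothetical choices made for illustration, not Apollo's own data format or API.

```python
# Hypothetical dictionary keys standing in for the required Gene Information
# fields (Name, Symbol, Description, Comments). These are NOT Apollo field names.
REQUIRED_FIELDS = ("name", "symbol", "description", "comments")


def missing_required(annotation):
    """Return the required items that are absent or empty in an annotation dict."""
    problems = []
    for field in REQUIRED_FIELDS:
        value = annotation.get(field)
        if value is None or not str(value).strip():
            problems.append(field)
    # "Supported By" may hold one of the suggested values or a custom value,
    # so only its presence is checked, not its content.
    attributes = annotation.get("attributes") or {}
    if not attributes.get("Supported By"):
        problems.append('attribute "Supported By"')
    return problems


example = {
    "name": "Example gene",
    "symbol": "exg1",
    "description": "Placeholder entry used only to demonstrate the check",
    "comments": "Exon boundaries adjusted to match RNA-seq coverage",
    "attributes": {"Supported By": ["RNA Sequencing Reads"]},
}
print(missing_required(example))           # an empty list means nothing is missing
print(missing_required({"name": "only"}))  # lists every item still to be filled in
```

Optional items (custom attributes, DBXRefs, Pubmed IDs, Gene Ontology IDs) are deliberately left out of the check because they apply only when they exist.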
    Data Analyses

    All data loaded into SIMRbase has an "Analysis Page". These pages provide information about the methods and provide a link to download the files.

    Name | Program | Source
    Petromyzon marinus Genome Assembly (kPetmar1) | kPetmar1 | GCF_010993605.1_kPetMar1.pri_genomic.fna
    Petromyzon marinus Germline Genome Assembly FASTA (gPmar100) | Dovetail | gPmar100.fa
    Petromyzon marinus Germline Specific Regions | BWA-MEM | PM_germ_enrich_highcov.bw
    Petromyzon marinus HOX Genes Manual Curation cDNA FASTA | Apollo | Annotations.cdna.12142017.fasta
    Petromyzon marinus HOX Genes Manual Curation GFF | Apollo | Annotations.12142017.gff3
    Petromyzon marinus kPetMar1 BLASTX Cavefish e!100 | BLASTX | cavefish.tar.gz
    Petromyzon marinus kPetMar1 BLASTX Celegans e!100 | BLASTX | celegan.tar.gz
    Petromyzon marinus kPetMar1 BLASTX Fly e!100 | BLASTX | fly.tar.gz
    Petromyzon marinus kPetMar1 BLASTX Human e!100 | BLASTX | human.tar.gz
    Petromyzon marinus kPetMar1 BLASTX Lamprey e!100 | BLASTX | lamprey.tar.gz
    Petromyzon marinus kPetMar1 BLASTX Medaka e!100 | BLASTX | medaka.tar.gz
    Petromyzon marinus kPetMar1 BLASTX Mouse e!100 | BLASTX | mouse.tar.gz
    Petromyzon marinus kPetMar1 BLASTX Nematostella e!Metazoa46 | BLASTX | nematostella.tar.gz
    Petromyzon marinus kPetMar1 BLASTX Smed Smesg | BLASTX | smed.tar.gz
    Petromyzon marinus kPetMar1 BLASTX Uniprot | BLASTX | uniprot.tar.gz
    Petromyzon marinus kPetMar1 BLASTX Xenopus e!100 | BLASTX | xenopus.tar.gz
    Petromyzon marinus kPetMar1 BLASTX Yeast e!100 | BLASTX | yeast.tar.gz
    Petromyzon marinus kPetMar1 BLASTX Zebrafish e!100 | BLASTX | zebrafish.tar.gz
    Petromyzon marinus kPetMar1 IPRSCAN | IPRSCAN | pmz_iprscan.gz
    Petromyzon marinus kPetMar1 Mapped GFF | MAKER | pmz.genes.gff
    Petromyzon marinus kPetMar1 Protein FASTA | MAKER | pmz.proteins.fasta
    Petromyzon marinus kPetMar1 Transcript FASTA | MAKER | pmz.transcripts.fasta
    Petromyzon marinus lncRNA | GSNAP | P_marinus_lncrna.fasta
    Petromyzon marinus PMZ_v3.1 Gene Models FASTA | MAKER | PMZ_v3.1_transcripts.fa
    Petromyzon marinus PMZ_v3.1 Gene Models GFF | MAKER | PMZ_v3.1_genes.gff3
    Petromyzon marinus repeats | RepeatMasker | germ1_update4_RModeler_union_velvet29_from_t165_bl180.fa.gz
    Downloads
    gPmar1.0 Genome Assembly: gPmar100
    Transcripts: PMZ_v3.1 Transcripts
    Proteins: PMZ_v3.1 Proteins



    See the Data Analyses page for a listing of the analyses performed on the gPmar100 genome and on the PMZ_v3.1 gene model transcripts and proteins. Downloads for each analysis can be found on the individual analysis pages, along with a description of the methodologies used.

    Feature Summary
    The following features are currently present for this organism
    Feature Type | Count
    exon | 259,426
    polypeptide | 54,772
    mRNA | 34,991
    CDS | 34,991
    gene | 34,991
    three_prime_UTR | 13,831
    supercontig | 13,511
    five_prime_UTR | 12,502
    chromosome | 86
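
A per-type tally like the Feature Summary above can be approximated from a downloaded GFF3 file by counting the values in the third (feature type) column. The sketch below is a hedged illustration, not the method used to build this page: it assumes a standard tab-delimited GFF3 file, and counts taken from a file may differ from the database counts shown here (for example, if some feature types are generated when the data are loaded).

```python
from collections import Counter


def count_feature_types(gff3_lines):
    """Tally the feature type (third column) of each record in GFF3 text lines."""
    counts = Counter()
    for line in gff3_lines:
        if line.startswith("##FASTA"):
            break  # an embedded sequence section follows; no more feature records
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, directives and comments
        columns = line.rstrip("\n").split("\t")
        if len(columns) >= 3:
            counts[columns[2]] += 1
    return counts


# Tiny made-up records used only to show the expected input shape.
sample = [
    "##gff-version 3\n",
    "chr1\tMAKER\tgene\t1\t100\t.\t+\t.\tID=g1\n",
    "chr1\tMAKER\tmRNA\t1\t100\t.\t+\t.\tID=m1;Parent=g1\n",
    "chr1\tMAKER\texon\t1\t50\t.\t+\t.\tParent=m1\n",
    "chr1\tMAKER\texon\t60\t100\t.\t+\t.\tParent=m1\n",
    "##FASTA\n",
    ">chr1\n",
    "ACGT\n",
]
for feature_type, count in count_feature_types(sample).most_common():
    print(feature_type, count)
```

To run it on a real download, pass an open file handle instead of the sample list, for example the PMZ_v3.1_genes.gff3 file named in the Data Analyses listing, assuming it has been saved locally under that name.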
    Genome Consortium

    The work of many diverse groups made this germline genome assembly and annotation possible.

    Genome Properties
    Property Name | Value
    Assembly Scaffold Count | 12,077
    Assembly Version | gPmar1.0
    Chromosome Number | 99 pairs
    Genome Size | ~2.3 Gb
    N50 | 34 contigs contain 12 Mb
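
For readers unfamiliar with the N50 entry above: N50 is conventionally the sequence length at which the longest sequences, added together from largest to smallest, first reach half of the total assembly length, and the number of sequences needed to get there is often called L50. One plausible reading of the entry is therefore an N50 of about 12 Mb reached with 34 sequences, but that interpretation is an assumption and is not stated on this page. The sketch below shows the conventional calculation on made-up lengths; it is not the pipeline used for gPmar1.0.

```python
def n50_and_l50(lengths):
    """Return (N50, L50) for a collection of sequence lengths."""
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= half_total:
            # 'length' is the N50; 'count' is how many sequences were needed (L50).
            return length, count
    raise ValueError("no sequence lengths given")


# Toy lengths, not lamprey data: total is 32, so half is 16.
toy_lengths = [2, 2, 2, 3, 3, 4, 8, 8]
print(n50_and_l50(toy_lengths))
```

With the toy lengths, the two longest sequences (8 + 8) already reach half of the total, so the function reports an N50 of 8 and an L50 of 2.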
    Germline Genome Manuscript

    The sea lamprey germline genome provides insights into programmed genome rearrangement and vertebrate evolution

    Jeramiah J. Smith, Nataliya Timoshevskaya, Chengxi Ye, Carson Holt, Melissa C. Keinath, Hugo J. Parker, Malcolm E. Cook, Jon E. Hess, Shawn R. Narum, Francesco Lamanna, Henrik Kaessmann, Vladimir A. Timoshevskiy, Courtney K. M. Waterbury, Cody Saraceno, Leanne M. Wiedemann, Sofia M. C. Robb, Carl Baker, Evan E. Eichler, Dorit Hockman, Tatjana Sauka-Spengler, Mark Yandell, Robb Krumlauf, Greg Elgar & Chris T. Amemiya


    The sea lamprey (Petromyzon marinus) serves as a comparative model for reconstructing vertebrate evolution. To enable more informed analyses, we developed a new assembly of the lamprey germline genome that integrates several complementary data sets. Analysis of this highly contiguous (chromosome-scale) assembly shows that both chromosomal and whole-genome duplications have played significant roles in the evolution of ancestral vertebrate and lamprey genomes, including chromosomes that carry the six lamprey HOX clusters. The assembly also contains several hundred genes that are reproducibly eliminated from somatic cells during early development in lamprey. Comparative analyses show that gnathostome (mouse) homologs of these genes are frequently marked by polycomb repressive complexes (PRCs) in embryonic stem cells, suggesting overlaps in the regulatory logic of somatic DNA elimination and bivalent states that are regulated by early embryonic PRCs. This new assembly will enhance diverse studies that are informed by lampreys’ unique biology and evolutionary/comparative perspective.


    Nature Genetics (2018) doi:10.1038/s41588-017-0036-1

    Go to Nature Genetics to Download