Petromyzon marinus PMZ_v3.1 Gene Models GFF
Overview
Identification of Repetitive Elements: Repeats were identified within assembled scaffolds using RepeatModeler and annotated using RepeatMasker version open-4.0.5 (see URLs) and a library of vertebrate repeats from Repbase (repeatmaskerlibraries-20140131).

Identification of Coding Sequences: Genome annotations were produced using the MAKER genome annotation pipeline, which supports re-annotation using pre-existing gene models as input. Previous Petromyzon marinus gene models (WUGSC 7.0/petMar2 assembly) were mapped against the new genome assembly, output in GFF3 format, and used as prior model input to MAKER for re-annotation. SNAP and Augustus were also run within MAKER and were trained using the pre-existing lamprey gene models. Additional input to MAKER included previously published mRNA-seq reads derived from lamprey embryos and testes (refs. 10, 12, 13), assembled using Trinity, as well as mRNA-seq reads (NextSeq, 75-100 bp paired-end) derived from whole embryos and dissected heads at Tahara stage 20 and from dissected embryonic dorsal neural tubes at Tahara stages 18, 20 and 21.

The following protein datasets were also used: Ciona intestinalis (sea squirt), Lottia gigantea (limpet), Nematostella vectensis (sea anemone), Takifugu rubripes (pufferfish), Branchiostoma floridae (lancelet), Callorhinchus milii (elephant shark), Xenopus tropicalis (western clawed frog), Drosophila melanogaster (fruit fly), Homo sapiens (human), Mus musculus (mouse), Danio rerio (zebrafish), Hydra magnipapillata, Trichoplax adhaerens, and the UniProt/Swiss-Prot protein database. Protein domains were identified in the final gene models using the InterProScan domain identification pipeline, and putative gene functions were assigned based on BLASTP-identified homology to the UniProt/Swiss-Prot protein database.

Properties
Additional information about this analysis:
|