Abstract
The Colorado potato beetle (Leptinotarsa decemlineata) is one of the most notorious insect pests of potatoes globally. Here, we generated a high-quality chromosome-level genome assembly of L. decemlineata by combining PacBio HiFi sequencing with Hi-C scaffolding. The genome assembly (~1,008 Mb) is anchored to 18 chromosomes (17 + XO), with a scaffold N50 of 58.32 Mb. It contains 676 Mb of repeat sequences and 29,606 protein-coding genes. This chromosome-level assembly of L. decemlineata will be a valuable resource for the beetle and invasion biology research communities.
Background & Summary
The Colorado potato beetle (CPB), Leptinotarsa decemlineata, is one of the most successful globally invasive insects. Its current range covers over 16 million km² across North America, Europe and Asia and continues to expand1. Both adults and larvae consume entire leaves, which makes CPB one of the most destructive insect pests of potato. A single larva has been estimated to consume approximately 40 cm² of potato foliage over the course of larval development2,3. Chemical pesticides have been used to control CPB since the 1860s4. However, strong selection pressure has driven the emergence of highly insecticide-resistant CPB populations over recent decades5,6. Since the middle of the last century, the beetle has developed resistance to 52 different insecticide compounds.
Whole-genome sequencing is a fundamental tool for addressing important questions in biological research, because it provides the complete set of gene resources of a given species. The first genome assembly of L. decemlineata, based on Illumina short reads, was published in 20187 and was followed by an improved version, Ldec_2.0. These two CPB assemblies have provided useful gene resources for the beetle community8,9. However, because short reads are of limited use in genome assembly, the quality of the CPB genome still needed to be improved.
To this end, we applied PacBio HiFi sequencing and high-throughput chromosome conformation capture (Hi-C) to generate a chromosome-level genome assembly of L. decemlineata (Fig. 2). The new assembly has a total scaffold length of 1,008.42 Mb anchored to 18 chromosomes (17 + XO). Compared with the published version Ldec_2.0, the scaffold N50 increased from 139 kb to 58.32 Mb, and Benchmarking Universal Single-Copy Orthologs (BUSCO) completeness increased from 92.1% to 98.0% (Table 1). A total of 676 Mb of repeat sequences, representing 67.04% of the genome, were identified. This is substantially more than in Ldec_2.0, suggesting that the new assembly resolves repetitive regions more completely. Of these repeat sequences, 72.47% were classified as known repeat elements (Table 2). In addition, the number of protein-coding genes increased from 24,671 to 29,606, indicating that a more complete gene set was obtained. Most protein-coding genes identified in the previous version were recovered in the new annotation. Genes were assigned to functional categories based on the Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway and Gene Ontology (GO) databases (Table 3).
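The scaffold N50 quoted above is the length L such that scaffolds of length L or longer together contain at least half of the total assembly length. A minimal Python sketch of that definition follows; the scaffold lengths in the example are toy values, not CPB data.

```python
def n50(lengths):
    """Return the N50 of a list of sequence lengths (in bp).

    Lengths are sorted in descending order and accumulated; the N50 is the
    length at which the running total first reaches half of the total size.
    """
    if not lengths:
        raise ValueError("lengths must not be empty")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        # Compare 2 * running with total to avoid fractional halves.
        if 2 * running >= total:
            return length


# Toy scaffold lengths (hypothetical): total 300, half 150.
print(n50([100, 80, 60, 40, 20]))  # 80
```

Sorting once and stopping at the halfway point keeps the calculation linear after the sort, which is adequate even for assemblies with many thousands of scaffolds.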
A total of 418 single-copy orthologous genes were identified among CPB and 15 other insect species (Table S1). These single-copy orthologs were used to construct a phylogenetic tree. The phylogeny placed L. decemlineata in a cluster with the other Chrysomelidae beetles. Anoplophora glabripennis (family: Cerambycidae) diverged from L. decemlineata (family: Chrysomelidae) approximately 96.5 million years ago (mya), and Tribolium castaneum (family: Tenebrionidae) diverged from L. decemlineata approximately 152.5 mya9.
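Single-copy orthologs are orthogroups in which every species contributes exactly one gene. The Python sketch below shows that filter on a simplified, hypothetical input (a dict mapping orthogroup IDs to (species, gene) pairs); it does not parse Broccoli's actual output files, and the species codes are illustrative.

```python
from collections import Counter


def single_copy_orthogroups(orthogroups, species):
    """Keep orthogroups with exactly one gene from each required species.

    orthogroups: dict of orthogroup ID -> list of (species, gene_id) tuples
    species: iterable of species names that must all be present
    Returns a dict of orthogroup ID -> {species: gene_id}.
    """
    required = set(species)
    kept = {}
    for og_id, members in orthogroups.items():
        counts = Counter(sp for sp, _ in members)
        # Same species set as required, and no species duplicated.
        if set(counts) == required and all(n == 1 for n in counts.values()):
            kept[og_id] = dict(members)
    return kept


# Hypothetical example with three species codes.
ogs = {
    "OG1": [("Ldec", "g1"), ("Tcas", "t1"), ("Agla", "a1")],
    "OG2": [("Ldec", "g2"), ("Ldec", "g3"), ("Tcas", "t2"), ("Agla", "a2")],
    "OG3": [("Ldec", "g4"), ("Agla", "a3")],
}
print(single_copy_orthogroups(ogs, ["Ldec", "Tcas", "Agla"]))
# {'OG1': {'Ldec': 'g1', 'Tcas': 't1', 'Agla': 'a1'}}
```

OG2 is rejected because of a duplicated CPB gene and OG3 because one species is missing; only orthogroups like OG1 are suitable for building a species tree.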
In total, 14,446 gene clusters were identified across the 16 species. Compared with the other insect species, CPB had 1,260 expanded and 716 contracted gene families (Fig. 3, Table S2). REVIGO analysis indicated that the expanded orthogroups were enriched in DNA integration, macroautophagy, regulation of the adenosine receptor signalling pathway and diverse other biological processes (Fig. 4a, Table S3). In contrast, the contracted orthogroups were significantly enriched in L-ornithine transmembrane transporter activity and virus receptor activity (Fig. 4a, Table S4).
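CAFE estimates, for each gene family, a p-value and the change in gene count along each branch. The sketch below shows how families could be split into significantly expanded and contracted sets at the 0.05 threshold given in Methods, using a simplified, hypothetical input (family ID mapped to a p-value and the change on the CPB branch). It is not a parser for CAFE 5's output files.

```python
def classify_families(results, alpha=0.05):
    """Split gene families into significantly expanded/contracted lists.

    results: dict of family ID -> (p_value, change), where change is the
    gain (+) or loss (-) in gene count on the focal (CPB) branch.
    Families with p >= alpha, or with no change, are ignored.
    """
    expanded, contracted = [], []
    for family, (p_value, change) in results.items():
        if p_value >= alpha:
            continue
        if change > 0:
            expanded.append(family)
        elif change < 0:
            contracted.append(family)
    return sorted(expanded), sorted(contracted)


# Hypothetical families, not CPB results.
demo = {"F1": (0.001, +3), "F2": (0.02, -2), "F3": (0.30, +5), "F4": (0.01, 0)}
print(classify_families(demo))  # (['F1'], ['F2'])
```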
The whole genomes of Tribolium castaneum (Tenebrionidae) and Anthonomus grandis (Curculionidae) have been published10,11; we therefore performed whole-genome synteny analysis between L. decemlineata and these two species. Numerous chromosomal fission and fusion events were identified between L. decemlineata and the other two beetles, suggesting a high degree of chromosomal divergence between Chrysomelidae and these lineages. CPB has an XO sex-determination system12. Synteny analysis also showed that CPB Chromosome 6 (Chr 6) shares high sequence synteny with the X chromosome of T. castaneum (Fig. 5). In addition, the gene LdVssc, previously reported as X-linked13, is located on Chr 6. Taken together, these lines of evidence indicate that CPB Chr 6 is the X chromosome.
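The synteny part of the X-chromosome assignment can be summarized as the share of each CPB chromosome's syntenic sequence that aligns to the T. castaneum X. A minimal Python sketch follows, assuming synteny blocks have already been reduced to hypothetical (CPB chromosome, start, end, T. castaneum chromosome) tuples; the chromosome names, including "TcasX", are illustrative.

```python
from collections import defaultdict


def x_synteny_share(blocks, ref_x="TcasX"):
    """Share of each query chromosome's syntenic length that hits the ref X.

    blocks: iterable of (query_chr, start, end, ref_chr) with 0-based,
    half-open query coordinates. ref_x is a hypothetical chromosome name.
    Returns (share_by_chromosome, chromosome_with_highest_share).
    """
    total = defaultdict(int)
    on_x = defaultdict(int)
    for q_chr, start, end, ref_chr in blocks:
        span = end - start
        total[q_chr] += span
        if ref_chr == ref_x:
            on_x[q_chr] += span
    share = {c: on_x[c] / total[c] for c in total if total[c] > 0}
    if not share:
        raise ValueError("no syntenic sequence in input")
    return share, max(share, key=share.get)


demo = [("Chr6", 0, 900, "TcasX"), ("Chr6", 900, 1000, "Tcas3"),
        ("Chr1", 0, 1000, "Tcas2"), ("Chr1", 1000, 1100, "TcasX")]
shares, top = x_synteny_share(demo)
print(top, round(shares["Chr6"], 2))  # Chr6 0.9
```

Using a length-weighted share rather than a block count avoids letting many short, possibly spurious blocks dominate the assignment; the marker-gene evidence (LdVssc) remains an independent check.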
As the first high-quality chromosome-level genome assembly in Chrysomelidae, this assembly of L. decemlineata not only illuminates the genetic architecture of this important agricultural pest, providing a basis for identifying new gene targets for control measures, but also enables exploration of the biological characteristics of Chrysomelidae beetles.
Methods
Sample collection and sequencing
Leptinotarsa decemlineata adults were collected in Xinjiang, China. The adults were fed fresh potato leaves and maintained at 26 ± 1 °C and 85% ± 5% relative humidity under a 14:10 h light:dark photoperiod.
Genomic DNA was extracted from a single female pupa using the QIAamp DNA Mini Kit (QIAGEN). Pupal sex was determined by examining the 7th visible sternite14: in female pupae this sternite is divided in the middle by a suture, whereas in male pupae it is undivided and depressed in the centre. DNA integrity and purity were verified by agarose gel electrophoresis (AGE) and with a NanoDrop 2000. Eight micrograms of genomic DNA were sheared using g-TUBEs (Covaris) and concentrated with AMPure PB magnetic beads. Each SMRTbell library was constructed using the Pacific Biosciences SMRTbell Template Prep Kit 1.0. The libraries were size-selected for 8–12 kb molecules using the Sage ELF system, followed by primer annealing and binding of the SMRTbell templates to polymerase with the DNA Polymerase Binding Kit. Sequencing was carried out on the Pacific Biosciences Sequel II platform (Annoroad Gene Technology Co., Ltd, Beijing, China).
Chromosome-level genome assembly of L. decemlineata
HiFi reads were generated in circular consensus sequencing (CCS) mode on the PacBio long-read platform. In total, 31 Gb of HiFi reads (~30× coverage) were produced, with an average read length of 19,479 bp. De novo assembly of the HiFi reads was performed using Hifiasm v0.1314.
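The coverage figure follows from dividing the total HiFi bases by the assembly size. A short Python check using the rounded values from the text (31 Gb of reads, a 1,008.42 Mb assembly, a 19,479 bp mean read length) is below; because 31 Gb is itself rounded, both results are approximate.

```python
def sequencing_depth(total_bases, genome_size_bp):
    """Mean depth = total sequenced bases / genome size."""
    return total_bases / genome_size_bp


def approx_read_count(total_bases, mean_read_length):
    """Approximate number of reads implied by total bases and mean length."""
    return total_bases / mean_read_length


depth = sequencing_depth(31e9, 1008.42e6)
reads = approx_read_count(31e9, 19479)
print(f"depth ~{depth:.1f}x, ~{reads / 1e6:.2f} million HiFi reads")
# depth ~30.7x, ~1.59 million HiFi reads
```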
Hi-C libraries were constructed using a standard procedure15 and sequenced on the Illumina HiSeq X Ten platform (Annoroad Gene Technology Co., Ltd, Beijing, China). Clean reads were first aligned to the genome assembly using bowtie 2 v2.2.316. Unmapped reads consisted mainly of chimeric fragments spanning the ligation junction. For each unmapped read, the ligation site was identified with HiC-Pro v2.7.817, and the 5′ fragment was then realigned to the genome assembly. The results of both mapping steps were merged into a single alignment file. Reads with low mapping quality, multiple matches in the assembly, singletons and reads from mitochondrial DNA were discarded. The remaining valid interaction pairs were used to scaffold the assembled contigs into 18 pseudochromosomes using LACHESIS v2e27abb18. The number of pseudochromosomes is consistent with the L. decemlineata karyotype (n = 17 + XO)19. The chromosome interaction matrix was visualized as a heatmap, in which strong linkage appears as diagonal blocks (Fig. 2a). The quality and completeness of the assembled genome were evaluated using BUSCO v5.020.
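Two of the Hi-C processing steps above can be illustrated with a small Python sketch: trimming an unmapped chimeric read at the ligation junction so that only its 5′ fragment is realigned, and discarding read pairs that are singletons, multi-mapped, low-quality or mitochondrial. The restriction enzyme is not stated in the text, so the DpnII/MboI junction GATCGATC is an assumption, as are the MAPQ cutoff of 30, the mitochondrial contig name "chrM" and the simplified read records; HiC-Pro's own implementation and defaults may differ.

```python
LIGATION_SITE = "GATCGATC"  # assumption: DpnII/MboI junction (enzyme not stated)


def trim_to_5prime_fragment(seq, site=LIGATION_SITE):
    """Return the 5' part of a read, cut inside the first ligation junction.

    The first half of the junction (the restriction half-site belonging to
    the 5' fragment) is kept. Reads without a junction are returned as-is.
    """
    idx = seq.find(site)
    if idx == -1:
        return seq
    return seq[: idx + len(site) // 2]


def is_valid_pair(r1, r2, min_mapq=30, mito_name="chrM"):
    """Decide whether an aligned read pair is kept for scaffolding.

    r1, r2: dicts with keys 'chrom', 'mapq' and 'multi' (True if the read has
    multiple equally good alignments), or None if that mate is unmapped.
    min_mapq and mito_name are hypothetical defaults.
    """
    if r1 is None or r2 is None:
        return False  # singleton: only one mate aligned
    for read in (r1, r2):
        if read["mapq"] < min_mapq or read["multi"]:
            return False  # low mapping quality or multi-mapped
        if read["chrom"] == mito_name:
            return False  # mitochondrial DNA
    return True


print(trim_to_5prime_fragment("ACGTGATCGATCTTTT"))  # ACGTGATC
```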
Gene prediction and functional annotation
A de novo repeat library was constructed with RepeatModeler221, and repeat elements were then annotated with RepeatMasker v4.1.022 by homology searching with default parameters. After masking repeats, the results of de novo, transcriptome-based and homology-based predictions were combined to predict gene models23. De novo gene models were generated using BRAKER2 v.2.1.524. Thirteen CPB transcriptomes were downloaded from the NCBI SRA database (SRR12121893, SRR13510813, SRR13510819, SRR13510821, SRR13510823, SRR9667707, SRR12121892, SRR13510812, SRR13510818, SRR13510820, SRR13510822, SRR9667699.1, SRR9667708). The transcriptomes were processed using Trimmomatic25, HISAT2 v.2.1.026 and StringTie2 v.2.1.527 to generate transcript assemblies. Homologous protein sequences from all insect species were obtained from OrthoDB28, and homology-based evidence was generated using GenomeThreader v.1.7.129. Finally, the three types of evidence were integrated into consensus gene models using EVidenceModeler30.
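Once repeats are annotated, their genomic proportion can be read from a soft-masked assembly (RepeatMasker's -xsmall option writes repeats in lowercase). The Python sketch below counts lowercase bases in FASTA lines. Whether gap (N) bases are excluded from the denominator is a choice made here for illustration; the 67.04% reported in the text corresponds to 676 Mb divided by the full 1,008.42 Mb assembly length.

```python
def repeat_fraction(fasta_lines, exclude_gaps=True):
    """Fraction of bases that are soft-masked (lowercase) in FASTA lines.

    Header lines starting with '>' are skipped. If exclude_gaps is True,
    N/n bases are left out of both the numerator and the denominator.
    """
    masked = total = 0
    for line in fasta_lines:
        if line.startswith(">"):
            continue
        for base in line.strip():
            if exclude_gaps and base in "Nn":
                continue
            total += 1
            if base.islower():
                masked += 1
    return masked / total if total else 0.0


toy = [">s1", "ACGTacgt", "NNnn", ">s2", "aaAA"]  # hypothetical sequence
print(repeat_fraction(toy))  # 0.5
```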
The functions of protein-coding genes were annotated using DIAMOND BLASTP against the Swiss-Prot protein database (https://www.uniprot.org/) and Pfam database (http://pfam.xfam.org/). The predicted genes were classified into functional categories based on KEGG (https://www.genome.jp/kegg) and GO (https://www.uniprot.org/) (Table 3).
Phylogenetic analysis
We selected 15 coleopteran species for phylogenomic analysis, with Chrysoperla carnea (order: Neuroptera) as the outgroup. Protein sequences of all taxa except CPB were downloaded from NCBI and InsectBase 2.023 (Table S1).
A total of 418 single-copy orthogroups were extracted using Broccoli v1.231. The protein sequences in each orthogroup were extracted using seqkit v2.2.032, independently aligned using MAFFT v7.47133 and filtered using trimAl v1.434 with default parameters. The phylogenetic tree was constructed using IQ-TREE v1.6.1035 with the parameters -nt AUTO -m TEST -bb 1000, and branch support was assessed with 1,000 ultrafast bootstrap replicates. Divergence times were estimated using MCMCtree in the PAML package v4.9j36. Three calibration points based on fossil records in the Paleobiology Database (www.paleobiodb.org) were applied: (a) stem Chrysomeloidea at 93.5–99.6 mya; (b) stem Coccinellidae at 166.1–168.3 mya; (c) stem Coleoptera at 295.5–298.9 mya.
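MCMCtree takes fossil calibrations as soft bounds written on nodes of the input tree, commonly expressed in units of 100 million years. The sketch below converts the three calibration ranges listed above into MCMCtree's B(lower, upper) notation under that assumed time unit; placing the bounds on nodes in the tree file and choosing tail probabilities are left out and should follow the PAML documentation.

```python
# Calibration ranges (a), (b), (c) from the text, in millions of years.
CALIBRATIONS_MYA = {
    "a": (93.5, 99.6),
    "b": (166.1, 168.3),
    "c": (295.5, 298.9),
}


def to_mcmctree_bound(lower_mya, upper_mya, unit_myr=100.0):
    """Format a soft-bound calibration as B(lower,upper) in the given unit.

    unit_myr=100.0 assumes the tree's time unit is 100 Myr (an assumption).
    """
    if not 0 < lower_mya < upper_mya:
        raise ValueError("expected 0 < lower < upper")
    return f"B({lower_mya / unit_myr:.3f},{upper_mya / unit_myr:.3f})"


for label, (low, high) in CALIBRATIONS_MYA.items():
    print(label, to_mcmctree_bound(low, high))
# a B(0.935,0.996)
# b B(1.661,1.683)
# c B(2.955,2.989)
```

Keeping the bounds in one consistent unit matters because MCMCtree's rate and root-age priors are interpreted in the same time unit.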
Gene family expansion and contraction
The expansion and contraction of gene families were inferred using CAFE v5.0.02937, with the dated phylogenetic tree as input. Families with p < 0.05 were considered significantly expanded or contracted. Gene Ontology (GO) enrichment of the expanded and contracted orthogroups of L. decemlineata was analysed and visualized with REVIGO38. Only GO terms with a dispensability value (i.e., redundancy with respect to the chosen representative GO term) below 0.1 were retained.
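The enrichment and REVIGO steps above can be sketched in two parts: a one-sided hypergeometric test, which is one common way to compute GO-term enrichment p-values (the text does not name the test used before REVIGO), and the dispensability cutoff of 0.1 applied to REVIGO's output. The record layout, term names and numbers are hypothetical.

```python
from math import comb


def enrichment_p(k, n, K, N):
    """One-sided hypergeometric p-value P(X >= k).

    k: study-set genes annotated with the GO term
    n: size of the study set (e.g. genes in expanded families)
    K: background genes annotated with the term
    N: size of the background gene set
    """
    upper = min(n, K)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)


def representative_terms(terms, max_dispensability=0.1):
    """Keep REVIGO-style rows whose dispensability is below the cutoff.

    terms: list of dicts with 'go_id' and 'dispensability' keys (hypothetical).
    """
    return [t for t in terms if t["dispensability"] < max_dispensability]


print(enrichment_p(1, 1, 1, 10))  # 0.1
rows = [{"go_id": "term_A", "dispensability": 0.0},
        {"go_id": "term_B", "dispensability": 0.45}]
print([t["go_id"] for t in representative_terms(rows)])  # ['term_A']
```

In practice the per-term p-values would also be corrected for multiple testing before being passed to REVIGO.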
Chromosomal synteny analysis
Whole-genome synteny among the three species was analysed using satsuma2 (https://github.com/bioinfologics/satsuma2). Synteny blocks across chromosomes were plotted using CIRCOS39.
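Circos draws synteny as links read from a plain-text file in which each line holds two genomic intervals (chromosome, start and end for each side), with chromosome IDs matching those in the karyotype file. A minimal Python sketch that writes such lines from synteny blocks is below; the block tuples, species prefixes and minimum block length are hypothetical, and converting satsuma2's own output into these tuples is not shown.

```python
def circos_link_lines(blocks, prefix_a="ld", prefix_b="tc", min_len=0):
    """Format synteny blocks as single-line Circos links.

    blocks: iterable of (chr_a, start_a, end_a, chr_b, start_b, end_b).
    Prefixes keep chromosome IDs from the two species distinct (both may
    use names like 'Chr1'); they must match the IDs in the karyotype file.
    Blocks shorter than min_len on side A are skipped.
    """
    lines = []
    for chr_a, s_a, e_a, chr_b, s_b, e_b in blocks:
        if e_a - s_a < min_len:
            continue
        lines.append(f"{prefix_a}{chr_a} {s_a} {e_a} {prefix_b}{chr_b} {s_b} {e_b}")
    return lines


demo_blocks = [("Chr6", 0, 1000, "X", 500, 1500), ("Chr1", 0, 50, "2", 0, 50)]
print("\n".join(circos_link_lines(demo_blocks, min_len=100)))
# ldChr6 0 1000 tcX 500 1500
```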
Identification of sex chromosomes
To identify the X chromosome, the X-linked locus LdVssc was aligned to the 18 CPB chromosomes using BLASTN with default parameters.
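Mapping LdVssc amounts to finding the chromosome with the strongest BLASTN hit. Assuming tabular output (-outfmt 6, whose default columns end with evalue and bitscore), a minimal Python sketch that keeps the best hit per query by bit score is below; the hit lines are invented for illustration.

```python
def best_hits(tab_lines):
    """Return {query: (subject, bitscore, evalue)} for the top-scoring hit.

    Expects BLAST -outfmt 6 default columns:
    qseqid sseqid pident length mismatch gapopen qstart qend sstart send
    evalue bitscore
    """
    best = {}
    for line in tab_lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        query, subject = fields[0], fields[1]
        evalue, bitscore = float(fields[10]), float(fields[11])
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore, evalue)
    return best


# Invented example hits, not real alignment results.
hits = [
    "LdVssc\tChr6\t99.8\t2000\t4\t0\t1\t2000\t10\t2009\t0.0\t3650",
    "LdVssc\tChr3\t85.0\t300\t45\t2\t10\t309\t5\t304\t1e-50\t200",
]
print(best_hits(hits))  # {'LdVssc': ('Chr6', 3650.0, 0.0)}
```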
Data Records
The PacBio and Hi-C sequencing data used for the genome assembly have been deposited in the NCBI Sequence Read Archive under accession numbers SRR2051912440,41 and SRR2109553642, under BioProject accession number PRJNA854273. The chromosome-level assembly has been deposited in GenBank under accession number JANJPO00000000043. The annotated genes have been deposited in InsectBase 2.0 under ID IBG_0081844.
Technical Validation
The chromosome-level genome assembly is 1,008 Mb in length, with a scaffold N50 of 58.32 Mb. BUSCO assessment showed that 98.0% of BUSCO genes (insecta_odb10) were identified in the assembly (Table 1), indicating a highly complete assembly of the L. decemlineata genome.
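BUSCO reports completeness as C (complete, split into single-copy S and duplicated D), F (fragmented) and M (missing), each as a percentage of the n genes searched. The sketch below computes that summary string from raw counts. The insecta_odb10 set contains 1,367 BUSCOs, but the S/D/F/M counts used here are hypothetical values chosen only to illustrate a 98.0% completeness score; they are not the breakdown reported in Table 1.

```python
def busco_summary(single, duplicated, fragmented, missing):
    """Format BUSCO-style percentages from counts of each category."""
    n = single + duplicated + fragmented + missing
    if n == 0:
        raise ValueError("no BUSCOs counted")

    def pct(count):
        return 100.0 * count / n

    return (
        f"C:{pct(single + duplicated):.1f}%"
        f"[S:{pct(single):.1f}%,D:{pct(duplicated):.1f}%],"
        f"F:{pct(fragmented):.1f}%,M:{pct(missing):.1f}%,n:{n}"
    )


# Hypothetical counts summing to 1,367.
print(busco_summary(1300, 40, 10, 17))
# C:98.0%[S:95.1%,D:2.9%],F:0.7%,M:1.2%,n:1367
```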
The Hi-C heatmap showed a well-organized contact pattern, with strong interactions concentrated along the diagonal of each chromosome (Fig. 1), which provides indirect support for the accuracy of the chromosome assembly.
Code availability
All software and pipelines were run according to the manuals and protocols of the published bioinformatic tools. Software versions and parameters are described in the Methods.
References
Zehnder, G. W. Timing of Insecticides for Control of Colorado Potato Beetle (Coleoptera: Chrysomelidae) in Eastern Virginia Based on Differential Susceptibility of Life Stages. J. Econ. Entomol. 79, 851–856 (1986).
Logan, P. A., Casagrande, R. A., Faubert, H. H. & Drummond, F. A. Temperature-dependent development and feeding of immature Colorado potato beetles, Leptinotarsa decemlineata (Say)(Coleoptera: Chrysomelidae). Environ. Entomol. 14, 275–283 (1985).
Ferro, D. N., Alyokhin, A. V. & Tobin, D. B. Reproductive status and flight activity of the overwintered Colorado potato beetle. Entomol. Exp. Appl. 91, 443–448 (1999).
Harris, C. R. & Svec, H. J. Colorado potato beetle resistance to carbofuran and several other insecticides in Quebec. J. Econ. Entomol. 74, 421–424 (1981).
Gauthier, N. L., Hofmaster, R. N. & Semel, M. History of Colorado potato beetle control. In “Advances in potato pest management” (Lashomb, J. H. & Casagrande, R. Eds.). Hutchinson Ross, Stroudsburg, PA. 13–33 (1981).
Alyokhin, A., Baker, M., Mota-Sanchez, D., Dively, G. & Grafius, E. Colorado Potato Beetle Resistance to Insecticides. Am. J. Potato Res. 85, 395–413 (2008).
Schoville, S. D. et al. A model species for agricultural pest genomics: the genome of the Colorado potato beetle, Leptinotarsa decemlineata (Coleoptera: Chrysomelidae). Sci. Rep. 8, 1931 (2018).
Dunn, N. A. et al. Apollo: Democratizing genome annotation. PLOS Comput. Biol. 15, e1006790 (2019).
Thomas, G. W. C. et al. Gene content evolution in the arthropods. Genome Biol. 21, 15 (2020).
Tribolium Genome Sequencing Consortium. The genome of the model beetle and pest Tribolium castaneum. Nature 452, 949–955 (2008).
Cohen, Z. P. et al. Insight into weevil biology from a reference quality genome of the boll weevil, Anthonomus grandis grandis Boheman (Coleoptera: Curculionidae). G3 Genes|Genomes|Genetics jkac309 (2022).
Hsiao, T. H. & Hsiao, C. Chromosomal analysis of Leptinotarsa and Labidomera species (Coleoptera: Chrysomelidae). Genetica 60, 139–150 (1983).
Hawthorne, D. J. AFLP-Based Genetic Linkage Map of the Colorado Potato Beetle Leptinotarsa decemlineata: Sex Chromosomes and a Pyrethroid-Resistance Candidate Gene. Genetics 158, 695–700 (2001).
Pelletier, Y. A method for sex determination of the Colorado potato beetle pupa, Leptinotarsa decemlineata (Coleoptera: Chrysomelidae). Entomol. News 104, 140–142 (1993).
Cheng, H., Concepcion, G. T., Feng, X., Zhang, H. & Li, H. Haplotype-resolved de novo assembly using phased assembly graphs with hifiasm. Nat. Methods 18, 170–175 (2021).
Shi, J. et al. Chromosome conformation capture resolved near complete genome assembly of broomcorn millet. Nat. Commun. 10, 464 (2019).
Langmead, B. & Salzberg, S. L. Fast gapped-read alignment with Bowtie 2. Nat. Methods 9, 357–359 (2012).
Servant, N. et al. HiC-Pro: an optimized and flexible pipeline for Hi-C data processing. Genome Biol. 16, 259 (2015).
Burton, J. N. et al. Chromosome-scale scaffolding of de novo genome assemblies based on chromatin interactions. Nat. Biotechnol. 31, 1119–1125 (2013).
Petitpierre, E., Segarra, C., Yadav, J. S. & Virkki, N. Chromosome Numbers and Meioformulae of Chrysomelidae. in Biology of Chrysomelidae (eds. Jolivet, P., Petitpierre, E. & Hsiao, T. H.) 161–186, https://doi.org/10.1007/978-94-009-3105-3_10 (Springer Netherlands, 1988).
Krzywinski, M. et al. Circos: An information aesthetic for comparative genomics. Genome Res. 19, 1639–1645 (2009).
Flynn, J. M. et al. RepeatModeler2 for automated genomic discovery of transposable element families. Proc. Natl. Acad. Sci. 117, 9451–9457 (2020).
Tarailo‐Graovac, M. & Chen, N. Using RepeatMasker to Identify Repetitive Elements in Genomic Sequences. Curr. Protoc. Bioinforma. 25 (2009).
Mei, Y. et al. InsectBase 2.0: a comprehensive gene resource for insects. Nucleic Acids Res. 50, D1040–D1045 (2022).
Brůna, T., Hoff, K. J., Lomsadze, A., Stanke, M. & Borodovsky, M. BRAKER2: automatic eukaryotic genome annotation with GeneMark-EP+ and AUGUSTUS supported by a protein database. NAR Genomics Bioinforma. 3, lqaa108 (2021).
Bolger, A. M., Lohse, M. & Usadel, B. Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics 30, 2114–2120 (2014).
Kubota, A. et al. Cytochrome P450 CYP2 genes in the common cormorant: Evolutionary relationships with 130 diapsid CYP2 clan sequences and chemical effects on their expression. Comp. Biochem. Physiol. Part C Toxicol. Pharmacol. 153, 280–289 (2011).
Kovaka, S. et al. Transcriptome assembly from long-read RNA-seq alignments with StringTie2. Genome Biol. 20, 1–13 (2019).
Kuznetsov, D. et al. OrthoDB v11: annotation of orthologs in the widest sampling of organismal diversity. Nucleic Acids Res. gkac998 (2022).
Gremme, G., Brendel, V., Sparks, M. E. & Kurtz, S. Engineering a software tool for gene structure prediction in higher organisms. Inf. Softw. Technol. 47, 965–978 (2005).
Haas, B. J. et al. Automated eukaryotic gene structure annotation using EVidenceModeler and the Program to Assemble Spliced Alignments. Genome Biol. 9, R7 (2008).
Derelle, R., Philippe, H. & Colbourne, J. K. Broccoli: Combining Phylogenetic and Network Analyses for Orthology Assignment. Mol. Biol. Evol. 37, 3389–3396 (2020).
Shen, W., Le, S., Li, Y. & Hu, F. SeqKit: A Cross-Platform and Ultrafast Toolkit for FASTA/Q File Manipulation. PLOS ONE 11, e0163962 (2016).
Katoh, K. & Standley, D. M. MAFFT multiple sequence alignment software version 7: improvements in performance and usability. Mol. Biol. Evol. 30, 772–780 (2013).
Capella-Gutiérrez, S., Silla-Martínez, J. M. & Gabaldón, T. trimAl: a tool for automated alignment trimming in large-scale phylogenetic analyses. Bioinformatics 25, 1972–1973 (2009).
Nguyen, L.-T., Schmidt, H. A., Von Haeseler, A. & Minh, B. Q. IQ-TREE: a fast and effective stochastic algorithm for estimating maximum-likelihood phylogenies. Mol. Biol. Evol. 32, 268–274 (2015).
Yang, Z. PAML 4: Phylogenetic Analysis by Maximum Likelihood. Mol. Biol. Evol. 24, 1586–1591 (2007).
Mendes, F. K., Vanderpool, D., Fulton, B. & Hahn, M. W. CAFE 5 models variation in evolutionary rates among gene families. Bioinformatics 36, 5516–5518 (2021).
Supek, F., Bošnjak, M., Škunca, N. & Šmuc, T. REVIGO Summarizes and Visualizes Long Lists of Gene Ontology Terms. PLOS ONE 6, e21800 (2011).
Krzywinski, M. I. et al. Circos: An information aesthetic for comparative genomics. Genome Res. 19, 1639–1645 (2009).
NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRR20519124 (2022).
NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRR21095536 (2022).
NCBI GenBank https://identifiers.org/nucleotide:JANJPO000000000 (2022).
Leptinotarsa decemlineata in InsectBase 2.0 http://v2.insect-genome.com/Ldec
Manni, M., Berkeley, M. R., Seppey, M., Simão, F. A. & Zdobnov, E. M. BUSCO Update: Novel and Streamlined Workflows along with Broader and Deeper Phylogenetic Coverage for Scoring of Eukaryotic, Prokaryotic, and Viral Genomes. Mol. Biol. Evol. 38, 4647–4654 (2021).
Akdemir, K. C. & Chin, L. HiCPlotter integrates genomic data with interaction matrices. Genome Biol. 16, 1–8 (2015).
Acknowledgements
This work was supported by the Guangdong Major Project of Basic and Applied Basic Research (2021B0301030004), the National Key Research and Development Program of China (2018YFD0200802) and the National Natural Science Foundation of China (32102271).
Author information
Contributions
Y.G. and F.L. conceived the research project. J.Y., X.D. and M.Z. led the collection of samples and population metadata. C.Z., Z.H. and F.L. performed the bioinformatic analyses. Y.G., F.L. and R.Z. wrote the manuscript.
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Yan, J., Zhang, C., Zhang, M. et al. Chromosome-level genome assembly of the Colorado potato beetle, Leptinotarsa decemlineata. Sci Data 10, 36 (2023). https://doi.org/10.1038/s41597-023-01950-5
DOI: https://doi.org/10.1038/s41597-023-01950-5