Background & Summary

The Colorado potato beetle (CPB), Leptinotarsa decemlineata, is one of the most successful globally-invasive insects. Its current habitat ranges over 16 million km2 across North America, Europe and Asia and continues to expand globally1. Both adults and larvae devour entire leaves. This makes CPB one of the most destructive insect pests. It has been estimated that a single larva can destroy approximately 40 cm2 of potato leaves over the stage2,3. Chemical pesticides have been used to control CPB since the 1860s4. However, high selection pressures have promoted the emergence of high level insecticide resistant CPB populations over the last decades5,6. Since the middle of the last century, the beetle has developed resistance to 52 different insecticides compounds.

Whole-genome sequencing is a fundamental tool to address important scientific issues in biological research, by providing a whole set of gene resources of a given species. The first genome assembly of L. decemlineata based on Illumina short reads was published in 20187, followed by an improved version Ldec_2.0. These two versions of CPB genomes have provided useful gene resources for the beetle community8,9. However, due to the limitation of short reads in genome assembly, the quality of the CPB genome still need be improved.

To this end, we applied the PacBio HiFi sequencing and High-throughput chromosome conformation capture technologies (Hi-C), to generate a high-quality chromosome-level genome assembly of L. decemlineata (Fig. 2). This produced a new CPB genome with high quality at chromosome level, which has a total scaffold length of 1,008.42 Mb mapping to 18 chromosomes (17 + XO). Compared to the published version Ldec_2.0, the scaffold N50 increased from 139 Kb to 58.32 Mb. Benchmarking Universal Single-Copy Orthologs (BUSCO) analysis showed that gene coverage increased from 92.1% to 98.0% (Table 1). A total of 676 Mb repeat sequences representing 67.04% of whole genome were identified, much more than that found in Ldec_2.0, suggesting the new version of CPB genome is more complete. Among these repeat sequences, 72.47% were classified as known repeat elements (Table 2). In addition, protein-coding genes increased from 24,671 to 29,606, showing that a more complete set of genes were obtained. Most protein-coding genes identified in the previous version can be found in the new annotation. Functional categories were classified based on the Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway and Gene ontology (GO) databases (Table 3).

Fig. 1
figure 1

The Colorado potato beetle, Leptinotarsa decemlineata.

Fig. 2
figure 2

Heatmap of genome-wide Hi-C data (resolution = 500,000 bp) and overview of the genomic landscape of Leptinotarsa decemlineata. (a) The heatmap of chromosome interactions in L. decemlineata was visualized by HiCPlotter45. The frequency of Hi-C interaction links is represented by colours, which ranges from white (low) to red (high). (b) Circos plot of distribution of the genomic elements in L. decemlineata was visualized by Circos46. From the outer ring to the inner circle, blue, red and green represent GC content, repeat sequence coverage and gene density of each chromosome, respectively.

Table 1 Comparison of two Leptinotarsa decemlineata genome assemblies.
Table 2 Statistics of repeat elements of Leptinotarsa decemlineata.
Table 3 Repeat elements of Leptinotarsa decemlineata.

A total of 418 single-copy orthologous genes were found among CPB and other 15 insect species (Table S1). These 1:1:1 orthologous gene were used to construct a phylogenetic tree. The evolutionary analysis results showed that L. decemlineata and other Chrysomelidae beetles formed a cluster. Anoplophora glabripennis (family: Cerambycidae) diverged from L. decemlineata (family: Chrysomelidae) approximately 96.5 million years ago (mya), and Tribolium castaneum (family: Tenebrionidae) diverged from L. decemlineata (family: Chrysomelidae) approximately 152.5 mya9.

In total, 14,446 gene clusters were identified across the 16 species. Compared with other insect species, CPB had 1,260 expanded and 716 contracted gene families (Fig. 3, Table S2). REVIGO analysis indicated that expanded orthogroups are enriched in DNA integration, macroautophagy, regulation of adenosine receptor signalling pathway and diverse biological process (Fig. 4a, Table S3). In contrast, the contracted orthogroups were significantly enriched in L-ornithine transmembrane, transporter activity, virus receptor activity (Fig. 4a, Table S4).

Fig. 3
figure 3

Phylogenetic tree of Leptinotarsa decemlineata and 15 other insect species. The numbers of expanded gene families (green) and contracted gene families (red) are shown to the right of each species branch44.

Fig. 4
figure 4

Gene ontology (GO) enrichment of expanded and contracted orthogroups in Leptinotarsa decemlineata.

The whole genome of Tribolium castaneum and Anthonomus grandis in Chrysomelidae were publicly reported10,11, thus, we performed whole-genome synteny analysis of L. decemlineata with these two species. A large number of fission and fusion events were identified between L. decemlineata and the other two beetles, suggesting that the beetle family Chrysomelidae have undergone a high degree of divergence. CPB has XO sex determining system12. Synteny analysis also showed that the CPB Chromosome 6 (Chr 6) shared high sequence synteny with X chromosome of T. castaneum (Fig. 5). The gene LdVssc has been reported as X-linked13, and this gene can be found in Chr 6. Combining these evidences, the CPB Chr 6 is regarded as X chromosome.

Fig. 5
figure 5

Comparative analysis of synteny among Leptinotarsa decemlineata, Tribolium castaneum and Anthonomus grandis. (a) Whole-genome synteny between Leptinotarsa decemlineata and Tribolium castaneum. (b) Whole-genome synteny between Leptinotarsa decemlineata and Anthonomus grandis.

As the first high-quality chromosome level genome assembly in Chrysomelidae, the chromosome-level genome assembly of L. decemlineata not only illuminate the genetic architecture of this important agricultural pests, providing a powerful approach to identify new gene targets for control measures, but also allows for exploration of biological characteristics of Chrysomelidae beetles.

Methods

Sample collection and sequencing

Leptinotarsa decemlineata adults were collected from Xinjiang Province, China. The adults were fed with fresh potato leaves and maintained at 26 ± 1 °C, under a 14:10-hr (light–dark) photoperiod cycle and 85% ± 5% relative humidity.

Genomic DNA was extracted from one female pupa using the QIAamp DNA Mini Kit (QIAGEN). Sex of the CPB pupa is identified by observing the 7th visible sternite14. The 7th visible sternite in the female pupa is separated in the middle by a suture, while the male pupa is complete and depressed in the centre. The integrity and purity of DNA was verified with agarose gel electrophoresis (AEG) and Nanodrop 2000. Eight micrograms of genomic DNA were sheared using g-Tubes (Covaris), and concentrated with AMPure PB magnetic beads. Each SMRT bell library was constructed using the Pacific Biosciences SMRT bell template prep kit 1.0. The constructed library was size-selected using the Sage ELF system for molecules 8–12 Kb, followed by primer annealing and the binding of SMRT bell templates to polymerases with the DNA Polymerase Binding Kit. Sequencing was carried out on the Pacific Bioscience Sequel II platform (Annoroad Gene Technology Co., Ltd, Beijing, China).

Chromosome-level genome assembly of L. decemlineata

HiFi reads were produced using the circular consensus sequencing (CCS) mode on the PacBio long-read systems. 31 Gb HiFi reads (30×) were produced with an average length of 19,479 bp. De novo assembly of PacBio HiFi reads was performed using Hifiasm v0.1314.

Hi-C libraries were constructed and sequenced on the Illumina HiSeq X Ten platform (Annoroad Gene Technology Co., Ltd, Beijing, China), using a standard procedure15. The clean reads were first aligned to the genome assembly using bowtie 2 v2.2.316. Unmapped reads were mainly composed of the chimeric regions spanning across the ligation junction. The ligation site of an unmapped read was determined with HiC-Pro v2.7.817. Then, its 5′ fractions were aligned back with the genome assembly. A single alignment file which merged the results of both mapping steps was generated. Reads that had low mapping quality, multiple matches in the assembly, singletons and mitochondrial DNA were discarded. The valid interaction pairs were used to scaffold assembled contigs into 18 pseudo-chromosomes using LACHESIS v2e27abb18. The number of pseudochromosomes was consistent with the data of L. decemlineata karyotype (n = 17 + XO)19. The chromosome matrix was visualized as a heatmap in the form of diagonal patches of strong linkage (Fig. 2a). The quality and completeness of the assembled genome was evaluated using BUSCO v5.020.

Gene prediction and functional annotation

A repeat database was used to train RepeatModeler221. Then, the repeat elements were annotated using the RepeatMasker v4.1.022 by homology searching with default parameters. After filtering the repeat sequences, the results of de novo prediction, transcriptome-based and homolog-based methods were combined to predict gene composition23. De novo gene models were generated using BRAKER2 v.2.1.524. Thirteen CPB transcriptomes were downloaded from the NCBI SRA database (SRR12121893, SRR13510813, SRR13510819, SRR13510821, SRR13510823, SRR9667707, SRR12121892, SRR13510812, SRR13510818, SRR13510820, SRR13510822, SRR9667699.1, SRR9667708). The transcriptomes were processed using Trimmomatic25, HISAT2 v.2.1.026 and StringTie2 v.2.1.527 to generate transcripts assemblies. The Homology proteins from all insect species were from OrthoDB28. Homology-based evidence was generated using GenomeThreader v.1.7.129. Finally, gene models were predicted after integrating results of the three methods of predictions using EVidenceModeler30.

The functions of protein-coding genes were annotated using DIAMOND BLASTP against the Swiss-Prot protein database (https://www.uniprot.org/) and Pfam database (http://pfam.xfam.org/). The predicted genes were classified into functional categories based on KEGG (https://www.genome.jp/kegg) and GO (https://www.uniprot.org/) (Table 3).

Phylogenetic analysis

We selected 15 coleopteran species for phylogenomic analysis, with Chrysoperla carnea (Order: Neuroptera) as an out-group. The protein sequences except CPB of these taxa were downloaded from NCBI and InsectBase 2.023 (Table S1).

A total of 418 single-copy orthogroups were extracted using Broccoli v1.231.The protein sequences in each orthogroup were extracted using seqkit v2.2.032, independently aligned using MAFFT v7.47133 and filtered using trimAl v1.434 with default parameters. The phylogenetic tree was constructed using iq-tree v1.6.1035 with the following parameters: -nt AUTO -m TEST -bb 1000. Branch support values were obtained from 1,000 bootstrap replicates. The divergence time among different species was estimated using the MCMCtree in the PAML package v4.9j36. Three standard divergence time points based on fossil records in the Paleobiology Database (www.paleobiodb.org) were applied: (a) stem Chrysomeloidea at 93.5–99.6 mya (b) stem Coleoptera at 166.1–168.3 mya (c) stem Coccinellidae at 295.5–298.9 mya.

Gene family expansion and contraction

The expansion and contraction of gene families were determined using CAFE v5.0.02937. The results from the phylogenetic tree with divergence times were used as inputs. A p-value of 0.05 was used to identify families that were significantly expanded and contracted. Gene ontology (GO) enrichment of expanded and contracted orthogroups of L.decemlineata were analysed and visualized by REVIGO38. The dispensability (i.e., redundancy with respect to the chosen representative GO term) of GO terms was less than 0.1.

Chromosomal synteny analysis

The whole-genome synteny analysis among the three species, was carried out using satsuma2 (https://github.com/bioinfologics/satsuma2). Synteny blocks were plotted across chromosomes using CIRCOS39.

Identification of sex chromosomes

To determine X chromosome, Blastn was used to map the X-linked locus LdVssc with 18 CPB chromosomes with default parameters.

Data Records

The PacBio and Hi-C sequencing data that were used for the genome assembly have been deposited in the NCBI Sequence Read Archive with accession number SRR2051912440,41 and SRR2109553642 and under BioProject accession number PRJNA854273. The chromosomal assembly has been deposited at GenBank with accession nember JANJPO00000000043. The annotated genes have been deposited in InsectBase 2.0 with ID IBG_0081844.

Technical Validation

The chromosome-level genome assembly was 1,008 Mb with a scaffold N50 of 58.32 Mb. For quantitative assessment of genome assembly, BUSCO assessment showed that 98.0% of BUSCO genes (insecta_odb10) were successfully identified in the genome assembly (Table 1), suggesting a remarkably complete assembly of the L. decemlineata genome.

The Hi-C heatmap revealed a well-organized interaction contact pattern along the diagonals within/around the chromosome inversion region (Fig. 1), which indirectly confirmed the accuracy of the chromosome assembly.