Research Article
Corresponding author: Karolina Fučíková (karolina.fucikova@gmail.com)
Academic editor: Brecht Verstraete
© 2023 Karolina Fučíková, Melissa Taylor, Louise A. Lewis, Brian K. Niece, Aleeza S. Isaac, Nicole Pietrasiak.
This is an open access article distributed under the terms of the Creative Commons Attribution License (CC BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Citation:
Fucikova K, Taylor M, Lewis LA, Niece BK, Isaac AS, Pietrasiak N (2023) Johansenicoccus eremophilus gen. et sp. nov., a novel evolutionary lineage in Chlorophyceae with unusual genomic features. Plant Ecology and Evolution 156(3): 311-325. https://doi.org/10.5091/plecevo.105762
Background – Green algae are a diverse group of photosynthetic eukaryotes, yet are still vastly understudied compared to land plants. For many years, green algae were characterized based on their morphology and life cycles. More recently, phylogenetic and genomic analyses have been added to the phycological toolkit for a better understanding of algal biodiversity and evolutionary history.
Material and methods – A desert strain of green algae was isolated from Joshua Tree National Park (JTNP) in southern California as part of a larger biodiversity survey. The alga’s nuclear rRNA genes as well as the chloroplast genome were sequenced, annotated, and analysed in addition to a morphological assessment.
Results – Morphologically this strain is especially similar to Pseudomuriella and Rotundella, and its lipid profile resembles that of other soil algae, but phylogenomic analyses demonstrate that it is a distinct evolutionary lineage in Chlorophyceae. The alga exhibits several unusual genomic features, the most remarkable being its highly derived yet apparently functional nuclear rRNA genes, 18S and 28S. Both genes are GC-rich and bear many compensatory base changes that maintain a secondary structure similar to that of other green algae. The chloroplast genome differs in gene order and repeat arrangement from other published green algal plastomes, but contains the expected genes and also provides phylogenetically informative data.
Conclusion – We conclude that the strain should be placed into a new species and genus in the class Chlorophyceae, and propose the name Johansenicoccus eremophilus for this new taxon. Johansenicoccus eremophilus exemplifies science’s insufficient understanding of the range of genomic variations among inconspicuous soil algae.
biological soil crust, chlorophyte, coccoid, direct repeat, inverted repeat, microalga
Green algae span a great diversity of body plans, from simple solitary unicells to complex filaments and thalli with differentiated cell types, and they inhabit a broad range of environments in nearly every corner of Earth (
Studies of green algae from soils have shown remarkable adaptations. Desert soil algae have distinct mechanisms for coping with environmental stresses (
Within the green algal phylum Chlorophyta, the class Trebouxiophyceae is especially known to harbour numerous soil-dwelling species and includes many desert-dwelling lineages among them (
The class Chlorophyceae recently received detailed systematic attention and the taxonomic structure within the class was examined using chloroplast genome data (
In the present study, we focus on a single enigmatic isolate from a biological soil crust originally collected from the Joshua Tree National Park (JTNP) by
The material was collected as part of a larger study focused on the biodiversity of JTNP biological soil crusts (
A 1 g subsample of the composite soil was rehydrated in 100 ml Bold’s Basal Medium (BBM) (
The morphology of the strain WJT24VFNP31 was first assessed using light microscopy to survey and measure cell size, shape, and visible internal structures. An Olympus BH-2 photomicroscope with Nomarski DIC optics was used for observations, and photographs were obtained using an Olympus DP25 camera (Olympus, Center Valley, CA, USA). Cells from 2–4-week-old cultures were observed. To determine the number of nuclei in young and mature vegetative cells, the cells were first left overnight in a permeation buffer containing 0.1% Triton X and 0.05% Tween 20. Cells were washed three times with phosphate-buffered saline (PBS), then fixed using 4% formalin for at least 2 hours. Samples were washed and combined with DAPI (diluted from stock 1:1000) for 7–10 minutes. Samples were then washed five times with PBS to remove any residual DAPI that would cause background staining. Cells were mounted onto a microscope slide in distilled H2O and visualized using a Nikon Eclipse 90i microscope (Nikon Instruments Inc., Melville, NY, USA) with an Andor camera (Andor Technology Ltd., Belfast, Northern Ireland).
DNA was extracted from WJT24VFNP31 using the MO BIO PowerPlant Pro DNA Isolation Kit (Qiagen, Germantown, MD, USA) following the manufacturer’s protocol, except that distilled water was used instead of Buffer 7. The total genomic DNA was sequenced using Illumina MiSeq (Illumina Inc., San Diego, CA, USA) technology (150 bp paired reads). A total of 2,078,788 reads were obtained from two separate MiSeq runs.
The sequences of the 18S and 28S genes were initially obtained using PCR and Sanger sequencing, following the protocols described in
The Illumina reads were paired, trimmed and assembled de-novo using Geneious v.6 (Biomatters Inc., Boston, MA, USA; details in
After the full contigs for rDNA and the chloroplast genome were obtained, a reference assembly was performed in Geneious to visually inspect coverage and its distribution along the contigs. Stringent conditions were applied in the reference assembly, allowing no more than 1% mismatches and 1% gaps, with a word size of 20 nucleotides. Areas of low coverage were inspected by eye to rule out mis-assemblies.
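The coverage inspection described above can be illustrated with a minimal Python sketch. It is not the Geneious procedure used in the study: the function names, the toy read coordinates, and the low-coverage threshold are all hypothetical, and reads are assumed to be already mapped and given as 0-based, end-exclusive intervals.

```python
from statistics import mean


def per_base_coverage(contig_length, read_intervals):
    """Return a list with the number of reads covering each contig position.

    read_intervals holds (start, end) pairs, 0-based and end-exclusive.
    """
    # Difference array: +1 where a read starts, -1 just past where it ends.
    diff = [0] * (contig_length + 1)
    for start, end in read_intervals:
        diff[max(start, 0)] += 1
        diff[min(end, contig_length)] -= 1
    coverage, depth = [], 0
    for delta in diff[:contig_length]:
        depth += delta
        coverage.append(depth)
    return coverage


def low_coverage_regions(coverage, threshold):
    """Return (start, end) half-open runs where depth falls below threshold."""
    regions, run_start = [], None
    for pos, depth in enumerate(coverage):
        if depth < threshold and run_start is None:
            run_start = pos
        elif depth >= threshold and run_start is not None:
            regions.append((run_start, pos))
            run_start = None
    if run_start is not None:
        regions.append((run_start, len(coverage)))
    return regions


# Toy data (hypothetical): a 20 bp contig and four mapped reads.
toy_reads = [(0, 10), (5, 15), (8, 20), (8, 12)]
toy_coverage = per_base_coverage(20, toy_reads)
summary = (min(toy_coverage), max(toy_coverage), mean(toy_coverage))
flagged = low_coverage_regions(toy_coverage, threshold=2)
print(summary, flagged)
```

Regions returned by `low_coverage_regions` correspond to the stretches that would then be inspected by eye for possible mis-assembly.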
The NCBI nucleotide database was searched for closest matches for the rRNA gene sequences of WJT24VFNP31 using BLAST (
The secondary structure of the 18S gene was visualized using the Chlamydomonas reinhardtii P.A.Dangeard model from https://bioinformatics.psb.ugent.be/webtools/rRNA/secmodel/Crei_SSU.html (
The secondary structure of the internal transcribed spacer 2 (ITS2) was predicted using the ITS2 Database (
Nucleotide sequences of 59 chloroplast genes were aligned with the existing data from
The evolution of the 18S GC content was reconstructed using fastAnc in the R package phytools v.1.2 (
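The trait mapped in this analysis is the per-taxon GC percentage of the 18S gene. As a hedged illustration of how such a value can be derived from a sequence, the Python sketch below counts G and C among unambiguous bases. The function name and the toy sequences are hypothetical, and the ancestral-state reconstruction itself (fastAnc in phytools) is not reproduced here.

```python
def gc_content(sequence):
    """Return percent G+C among unambiguous bases (A, C, G, T/U)."""
    seq = sequence.upper()
    gc = sum(seq.count(base) for base in "GC")
    at = sum(seq.count(base) for base in "ATU")
    total = gc + at
    if total == 0:
        raise ValueError("sequence contains no unambiguous bases")
    return 100.0 * gc / total


# Toy fragments (hypothetical), not real 18S sequences.
toy_18s_fragments = {"taxon_A": "GGCCGATCGG", "taxon_B": "ATATGCATAT"}
gc_by_taxon = {name: gc_content(seq) for name, seq in toy_18s_fragments.items()}
print(gc_by_taxon)
```

Ambiguity codes and gap characters are ignored in the denominator, which is one possible convention; a different choice would shift the percentages slightly for sequences containing such characters.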
To examine whether GC content in the 18S gene is correlated with the temperature of the locality of origin, we first conducted a Pearson correlation test in base R on the uncorrected temperature and 18S GC data, which were available for all but two of the chlorophycean species included in our phylogenetic data set. We then regressed the 18S GC content against the average temperature of the warmest month at the locality of origin and corrected for the effect of phylogenetic relatedness using the phylogenetically independent contrasts (PIC) method (
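The two steps described above, an uncorrected Pearson correlation followed by a regression on phylogenetically independent contrasts, can be sketched in Python. This is a stand-in for the R analysis reported in the study, not a reproduction of it. The tree, the trait values, and all function names are hypothetical. The sketch assumes a fully bifurcating tree with strictly positive branch lengths, encoded as nested tuples.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def contrasts(tree, trait):
    """Standardized independent contrasts (Felsenstein's algorithm).

    tree is a tip name (str) or ((left, left_len), (right, right_len)).
    Returns (node_value, extra_branch_length, list_of_contrasts).
    """
    if isinstance(tree, str):
        return trait[tree], 0.0, []
    (left, v_left), (right, v_right) = tree
    x_l, extra_l, c_l = contrasts(left, trait)
    x_r, extra_r, c_r = contrasts(right, trait)
    # Branches below internal nodes are lengthened to reflect estimation error.
    v_l, v_r = v_left + extra_l, v_right + extra_r
    contrast = (x_l - x_r) / sqrt(v_l + v_r)
    node_value = (x_l / v_l + x_r / v_r) / (1 / v_l + 1 / v_r)
    extra = v_l * v_r / (v_l + v_r)
    return node_value, extra, c_l + c_r + [contrast]


def slope_through_origin(xs, ys):
    """Least-squares slope with no intercept, as is standard for contrasts."""
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)


# Hypothetical four-taxon tree with unit branch lengths.
toy_tree = (((("A", 1.0), ("B", 1.0)), 1.0), ((("C", 1.0), ("D", 1.0)), 1.0))
toy_temp = {"A": 10.0, "B": 14.0, "C": 30.0, "D": 34.0}  # warmest-month temperature
toy_gc = {"A": 46.0, "B": 48.0, "C": 50.0, "D": 52.0}    # 18S GC, percent

tips = sorted(toy_temp)
raw_r = pearson_r([toy_temp[t] for t in tips], [toy_gc[t] for t in tips])
temp_pic = contrasts(toy_tree, toy_temp)[2]
gc_pic = contrasts(toy_tree, toy_gc)[2]
pic_slope = slope_through_origin(temp_pic, gc_pic)
print(raw_r, pic_slope)
```

Because the sign of each contrast depends on an arbitrary left/right ordering, the regression on contrasts is forced through the origin; the same tree traversal is used for both traits so that the contrasts stay paired.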
Three replicate cultures were grown for three weeks or longer until sufficient biomass was produced for extraction. A minimum of 15 mg of wet algal biomass was harvested and suspended in gas-chromatography vials with methanol. The open vials were placed in a glass desiccator overnight with Drierite desiccant. After weighing the vials for their dry mass the next day (optimum dry mass being above 5 mg), transesterification was conducted using 200 μl of 2:1 CHCl3:methanol, 300 μl of 0.6 M HCl:methanol, and 25 μl of C13:0 methyl ester internal standard. The vials were tightly sealed with parafilm to prevent evaporation. After heating the samples at 85°C for an hour in a water bath, the vials were cooled and 1 ml of hexane was added to each vial. The vials were then vortexed, allowed to rest for 1–4 hours, then 500 μl of the top layer was removed and placed in a new vial (
All lipids were quantitated by a Varian CP-3800 Gas Chromatograph using a Saturn 2000 Mass Spectrometry detector (Varian, Palo Alto, CA, USA). The gas chromatography analysis was conducted using a Phenomenex ZB-WAX capillary column (Phenomenex, Torrance, CA, USA) with helium as carrier gas with the following parameters: helium flow 1 mL/min, temperature program: 100°C for 1 min, 25°C/min heating up to 200°C, hold for 1 min, 5°C/min heating up to 250°C, hold for 7 min. For mass spectrometry, a two-minute solvent delay was followed by electron impact ionization and detection of masses from 45–400 m/z. Quantitation was accomplished with the Total Ion Chromatogram for all lipids except for C22:5 (quant ions 79 + 91 m/z) and C24:0 (quant ion 382 m/z) because their peaks overlapped.
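As a quick check on the temperature program above, the total oven time per injection can be worked out from the stated holds and ramp rates. The helper below is a hypothetical illustration; only the temperatures, rates, and hold times come from the text.

```python
def oven_program_minutes(start_temp, initial_hold, ramps):
    """Total run time of a GC oven program in minutes.

    ramps is a list of (rate_C_per_min, target_temp_C, hold_min) steps.
    """
    total = initial_hold
    current = start_temp
    for rate, target, hold in ramps:
        # Time spent heating to the target, plus the hold at that temperature.
        total += (target - current) / rate + hold
        current = target
    return total


# 100 °C for 1 min; 25 °C/min to 200 °C, hold 1 min; 5 °C/min to 250 °C, hold 7 min.
run_time = oven_program_minutes(100, 1, [(25, 200, 1), (5, 250, 7)])
print(run_time)
```

Under these parameters the program lasts 1 + 4 + 1 + 10 + 7 = 23 minutes, of which the first two fall within the solvent delay.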
The lipid content, composition, and saturation indexes were calculated in MS Excel. The percent total lipid content of the samples was calculated using the initial dry mass of the algae after they were desiccated. Based on total lipid content, the estimated percent composition of the various fatty acids was determined.
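The calculations described above were done in a spreadsheet, and the exact formulas are not given. The Python sketch below shows one common way such values are derived from an internal standard, under loudly stated assumptions: every fatty acid methyl ester is assumed to have the same detector response as the C13:0 standard, and all peak areas, the standard mass, and the dry mass are hypothetical numbers rather than data from the study.

```python
def fame_masses(peak_areas, standard_area, standard_mass_ug):
    """Estimate each fatty acid mass (µg) from its peak area.

    Assumes equal response factors relative to the internal standard.
    """
    return {
        name: area / standard_area * standard_mass_ug
        for name, area in peak_areas.items()
    }


def percent_total_lipid(masses_ug, dry_mass_mg):
    """Total fatty acid mass as a percentage of dry biomass."""
    return 100.0 * sum(masses_ug.values()) / (dry_mass_mg * 1000.0)


def percent_composition(masses_ug):
    """Share of each fatty acid in the total fatty acid mass, in percent."""
    total = sum(masses_ug.values())
    return {name: 100.0 * mass / total for name, mass in masses_ug.items()}


# Hypothetical peak areas for one replicate.
toy_areas = {"C16:0": 500.0, "C18:1": 200.0, "C18:3": 1300.0}
toy_masses = fame_masses(toy_areas, standard_area=1000.0, standard_mass_ug=50.0)
toy_total = percent_total_lipid(toy_masses, dry_mass_mg=5.0)
toy_composition = percent_composition(toy_masses)
print(toy_total, toy_composition)
```

Averaging `toy_composition` over replicate extractions would give per-fatty-acid means comparable in form to those reported for the strain.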
Cells were solitary, spherical or nearly so (Fig.
Light micrographs capturing the morphology of Johansenicoccus eremophilus gen. et sp. nov. A, B. Young and mature vegetative cells. C. Vegetative cells and autospore formation. D. Mature vegetative cells with clear multiple chloroplasts and extra-plastidic inclusions. E. Detail of vegetative cell; arrows point to unidentified extra-plastidic inclusions. F, G. Irregularly shaped cells, possibly in the process of spore production. Scale bars represent 10 µm, second scale bar pertains to E–G.
The phylogenetic relationships among Chlorophyceae, derived from the sequences of 59 chloroplast genes, were largely consistent with the phylogenies of
Bayesian consensus phylogeny based on an analysis of 59 concatenated protein-coding chloroplast genes, with the nucleotide data partitioned by codon position. Numbers on branches represent Bayesian Posterior Probability. Solid branches represent relationships that were also recovered in the corresponding Bayesian analysis of the amino acid data. Dashed branches indicate relationships that were not recovered in the amino acid analysis. Scale bar represents the estimated number of nucleotide substitutions per site.
Under the stringent reference assembly conditions, which gave a conservative (low) estimate, the chloroplast genome coverage ranged from 12 to 139 (mean 75.9). The chloroplast genome was assembled in two equally probable structures: either in a single circular contig with two direct (not inverted) repeats containing ribosomal genes, or in two circular contigs containing one of the repeats each. Without long-read sequencing data, it is not possible to determine the true structure of the genome, and it is also possible that the genome exists in both forms. Nevertheless, all expected genes were detected in the contig(s) and upon visual inspection there were no areas of suspected mis-assembly. Therefore, we concluded that the chloroplast genome sequence is likely complete, despite the uncertainty about its structure.
The chloroplast genome of WJT24VFNP31 (GenBank OQ849777) is 235,854 bp in size. Of the two possible structural configurations we present the single circular molecule with two direct repeats. The repeats were confirmed by their twofold read coverage compared to the rest of the plastome, and contain ribosomal RNA genes, two tRNA genes (trnA-UGC and trnI-GAU), and part of the petA gene. No introns were found in the repeats (Supplementary material
All genes expected in Sphaeropleales and Chlorophyceae incertae sedis were detected in the WJT24VFNP31 plastome. In addition, the uncommon (in Chlorophyceae) trnR-UCG gene was present, as was a trnT-AGU gene. The genome also contained a trans-spliced psaA gene, split into two exons at position 269, as is common in the SV clade, but contiguous at position 89, a condition only reported from a handful of SV taxa. A full account of gene and intron content across analysed chlorophyceans is presented in Supplementary material
The full nuclear ribosomal gene set is deposited in GenBank under accession number OQ849776. The top BLAST match for the 18S gene was 84.1% similar (Heterochlamydomonas inaequalis Ed.R.Cox & T.R.Deason), and others in the 83–84% range included other Volvocales, Sphaeropleales (e.g., Mychonastes P.D.Simpson & S.D.Van Valkenburg), and even Trebouxiophyceae (Neocystis Hindák), further underscoring the uniqueness and uncertain phylogenetic position of WJT24VFNP31 and its rDNA sequence.
The GC content in the 18S gene was 54.9%, which is 3 percentage points higher than the next highest value in the data set, found in Jenufa perforata Němcová, M.Eliáš, Škaloud & Neustupa (51.9%). The third highest was Borodinellopsis texensis Dykstra (51.1%), illustrating that high GC content is not strictly clade-specific, although high 18S GC seems to be more common among incertae sedis taxa (lineages outside of OCC, Volvocales, and Scenedesminia, Fig.
The ITS2 structure was significantly different from those currently deposited in the ITS2 Database. The Database tools readily detected the proximal stem and the borders of the spacer; however, the initial search for suitable templates did not yield any results. Two of the putatively closest relatives of the strain WJT24VFNP31 were therefore selected as templates for structural folding: Spermatozopsis (represented in the database only by S. exsultans), and Ankyra judayi. No high-quality model could be derived from the Spermatozopsis template. Percentages of helix transfer were 64, 35, 50, and 22 for the four helices, respectively. Mapping was also performed onto the Ankyra template and yielded helix transfer percentages of 66, 37, 60, and 50. Ankyra judayi was thus a better fit as a template for WJT24VFNP31. The predicted ITS2 structure modelled onto the Ankyra template contained many extra bulges (unpaired regions), and therefore alternative structures of individual helices were explored. Mfold (
Figure
In contrast to the nuclear ribosomal GC content, the GC content of the chloroplast coding regions in WJT24VFNP31 is 38.5%, which is not remarkably high compared to other Chlorophyceae. Both species of Mychonastes have over 40% GC, and the volvocalean Stephanosphaera pluvialis Cohn has the highest chloroplast coding GC of 44%. Supplementary material
The Pearson correlation test yielded a significant positive correlation between the 18S GC content and maximum temperature at the locality of origin (p = 0.0014), and the corresponding plot showed the strain WJT24VFNP31 as an outlier with extremely high 18S GC, even given the warm, desert climate at its locality of origin (Fig.
Regression analysis of the relationship between temperature at the site of origin and GC content in the 18S gene across 59 species of green algae. Temperature refers to the average temperature in the warmest month at or near the site of the species/strain’s collection. Algae from different habitats (aquatic, snow, soil) are represented with differently shaped symbols. The strain WJT24VFNP31 (Johansenicoccus eremophilus gen. et sp. nov.) is labelled. Grey line represents the fitted regression line.
The chloroplast ribosomal genes were not investigated in depth, as data for several taxa are not available due to the partial nature of the published genome sequences. In most algae in our data set, the GC content in chloroplast ribosomal genes was 10–20% higher than the general coding GC in the chloroplast. The strain WJT24VFNP31 and Bracteacoccus giganteus H.W.Bischoff & Bold had the highest GC content in the rrs gene: 53.7%. All three included species of Bracteacoccus had rrs GC over 53%, while most other taxa had rrs GC < 52%. There was a weak positive correlation between temperature and rrs GC: r = 0.2733 and p = 0.0499. The data set is available in our GitHub repository for further exploration (https://doi.org/10.5281/zenodo.8011298).
The most abundant fatty acid in the strain WJT24VFNP31 was alpha-linolenic acid (C18:3 n-3, 56.0% on average), followed by palmitic (C16:0, 24.7%) and vaccenic (C18:1 trans-11, 9.7%) acids (Supplementary material
Johansenicoccus eremophilus Fučíková & Pietrasiak.
Spherical, broadly oval or irregular cells with multiple parietal chloroplasts in maturity, without pyrenoids. Reproduction via autospores. Resembles several coccoid genera, especially Bracteacoccus Tereg, Chromochloris Kol & F.Chodat, Pseudomuriella N.Hanagata, and Rotundella Fučíková, P.O.Lewis & L.A.Lewis, but is distinct phylogenetically as evidenced by analyses of 18S rDNA and rbcL data (GenBank accessions OQ849776 and OQ849777, respectively).
Named in honour of Dr Jeffrey R. Johansen, the Brontosaurus of desert soil algae.
USA – California • Joshua Tree National Park; 33°45’47”N, 115°47’46”W; 13 Jun. 2006; holotype: fixed algae on microscope slide, CONN [CONN00234349]; authentic culture: UTEX B 3223.
Cells solitary, spherical or rarely oval or irregular, 4–12 μm in diameter. Chloroplasts parietal without pyrenoids. In young cells, chloroplast single and cup-shaped or lobed, nucleus single. Multiple chloroplasts and nuclei in mature cells. Older cells contain dark reddish-brown cytoplasmic granules outside the plastids. Cell wall thin and smooth, not thickening appreciably with age. Reproduction via four or more autospores; zoospore production uncertain. In culture, the cells form elevated colonies with lighter, well-defined margins, which grow and merge into a firm, glossy lawn. Senescent cultures accumulate orange pigments, turning from green to olive and eventually to deep orange in colour.
So far only known from the type locality, Joshua Tree National Park, USA.
Biological soil crust, desert.
The name reflects the desert-dwelling nature of the species, as erêmos means desert in ancient Greek and phílos means loving.
The morphology of J. eremophilus is a fairly common one among Chlorophyceae. Spherical cells with multiple parietal, pyrenoid-free chloroplasts are also found in Bracteacoccus, Bracteamorpha Fučíková, P.O.Lewis & L.A.Lewis, Chromochloris, Pseudomuriella, Rotundella, and Tumidella Fučíková, P.O.Lewis & L.A.Lewis. These genera are all made up of soil-dwelling species and include numerous desert-dwellers. Additionally, the aquatic genus Dictyococcus Gerneck can be considered morphologically similar but is readily distinguished by the inflexed edges of its chloroplasts (
The genus Tumidella was not represented in our phylogenetic data set but was shown as related to Bracteacoccus and other Scenedesminia by
A confident identification of any of the aforementioned genera should involve the sequencing of at least one molecular marker. We recommend the chloroplast gene rbcL, as we cannot be sure whether the anomalous nuclear rDNA is unique to the strain WJT24VFNP31 or whether similar sequences would be found in any Johansenicoccus isolates discovered in the future.
Spermatozopsis similis appears to be the closest relative of J. eremophilus. Their relationship received absolute statistical support in our analyses, although the two taxa are separated by considerable genetic distance (Fig.
The sister clade to S. similis and J. eremophilus is the family Sphaeropleaceae, represented in our data by Ankyra judayi and Atractomorpha echinata L.R.Hoffmann. The position of S. similis in the phylogenetic proximity of Sphaeropleaceae is consistent with
The nuclear ribosomal region of Johansenicoccus eremophilus contains the 18S gene, 5.8S gene, 28S gene, and two internal transcribed spacers in the expected order, but the rRNA genes are extremely GC-rich compared to other Chlorophyceae (Figs
One previously explored hypothesis is that high GC content may be an adaptation to hot climates because of the higher thermal stability of the G-C pairing compared to the A-T/A-U pairing. While there does not seem to be a correlation between GC content and temperature for whole genomes or the (more or less) freely evolving third codon positions, the pattern has been consistently recovered for structural RNA in prokaryotes (
In our study we uncovered a significant positive correlation between 18S GC content and warmest-month temperature at the locality of origin (Fig.
This uniqueness in ribosomal genes all but precludes phylogenetic analysis using 18S or 28S, which are otherwise very commonly used in algal systematics. Using secondary structure to guide the alignment and analysis, as recommended by e.g.
The chloroplast genome of Johansenicoccus eremophilus is nearly 236 kbp, which makes it the largest among its closest relatives, Spermatozopsis (135 kbp,
Unlike the unsurprising gene content, the structure of the J. eremophilus plastome is noteworthy, being either a single circular molecule with two large direct repeats containing blocks of rRNA genes (Supplementary material
Microalgal lipid production is the subject of a growing body of literature, as microalgal cultivation promises advances in biofuels, feedstock, and nutritional supplements. Green algal biomass tends to be rich in 16- and 18-carbon fatty acids, and often contains significant amounts of unsaturated lipids. Our results are consistent with this body of literature. For example, Tetradesmus obliquus, a commonly used algal model, was found to contain fatty acids similar to those found in our study (
The evolutionary significance of lipid composition has been well studied in vascular plants. Such studies have shown that plant cells can desaturate their membrane lipids in response to lower temperature and vice versa. The mechanisms of the maintenance of membrane fluidity are complex, especially in environments where temperature fluctuates rapidly on a daily basis, as can be true in alpine environments and deserts (
In summary, we have used multiple lines of data to characterize a novel lineage of green algae, Johansenicoccus eremophilus. The alga has several unusual genomic properties, which contrast with its simple and common morphology. Some of the unique genomic features may be adaptive to the desert lifestyle of the alga. So far, ours is the only find of this taxon worldwide with no known close relatives, but additional surveys of soil algae may well discover new species of Johansenicoccus in the future. Such discoveries would allow for better understanding of the gene and genome evolution in this lineage.
We thank the Joshua Tree National Park for permission to survey biological soil crusts in 2006 and characterize the park’s soil algal flora. This strain was obtained under national park permit #JOTR-2006-SCI-0018. Sampling efforts in Joshua Tree National Park were supported by the California Desert Research Fund at The Community Foundation, Robert Lee Graduate Student Research Grant, and the Phycological Society Grant in Aid of Research awarded to NP during her Ph.D. studies. We thank Dr Valerie R. Flechtner for spearheading the collection and study of WJT green algae. Genomic sequencing was supported by NSF grant DEB-1036448. Lipid analysis materials and equipment, as well as the summer support for MT were provided by the Assumption University Department of Biological and Physical Sciences. Summer support for AI was provided by the Assumption University Honors Program. Analyses were carried out at the Computational Biology Core Facility of the University of Connecticut.
Putative structure of the chloroplast genome of Johansenicoccus eremophilus gen. et sp. nov. The chloroplast genome may be present in the cells in one of two configurations, indistinguishable by our short-read data. The structure shown in this figure is a single circular molecule with direct repeats containing the block of rRNA genes. Two smaller circular molecules, each with one ribosomal block, are an equally plausible structure based on our data.
Gene content and intron positions across chlorophycean plastomes. Asterisks (*) indicate that the gene is located in an inverted repeat, or in the case of WJT24VFNP31 in a direct repeat. Numbers refer to insertion positions of introns, if present. TR = trans-spliced intron, C = cis-spliced introns (if both cis- and trans-spliced are present in the same gene, otherwise cis is assumed). Blue colour of cells indicates the complete sequence of the gene is available; light blue indicates a partial sequence where uncertainty may exist about intron presence/absence. Grey colour indicates an incomplete genome assembly where the gene in question is missing but may be expected to be present based on its presence in related taxa. White cells indicate gene is absent from the completely assembled plastome.
18S rRNA secondary structure of Johansenicoccus eremophilus gen. et sp. nov. (GenBank accession number OQ849776) with CBCs, HCBCs, and other base-pairing changes highlighted in colour, and substitutions and structural differences unique to Johansenicoccus highlighted with arrows, compared to Chlamydomonas reinhardtii.
18S secondary structure of Chlamydomonas reinhardtii (GenBank accession number M32703) with highlighted differences (with colour and arrows) from Johansenicoccus eremophilus gen. et sp. nov.
Results of the nuclear internal transcribed spacer 2 (ITS2) secondary structure modeling in Mfold and ITS2 Database. a: secondary structures of the ITS2 helices of Johansenicoccus eremophilus gen. et sp. nov. predicted by Mfold. b: secondary structure of J. eremophilus predicted by ITS2 Database using Ankyra judayi as template. c: Ankyra judayi ITS2 secondary structure. d: Spermatozopsis exsultans ITS2 secondary structure.
Evolution of GC content in the 18S nuclear ribosomal gene (left) and chloroplast coding regions (right) mapped onto the chloroplast-derived phylogeny (based on data from 59 chloroplast genes), colour representation.
Regression analysis of the relationship between temperature at the site of origin and GC content in the 18S gene across 59 species of green algae, corrected for phylogenetic relatedness using phylogenetically independent contrasts (PICs). Temperature refers to the average temperature in the warmest month at or near the site of the species/strain’s collection. Dashed line represents the fitted regression line.
Lipid composition in Johansenicoccus eremophilus gen. et sp. nov. Percentages were determined from three replicate extractions and analyses. Saturated and unsaturated fatty acids are marked; the “other” category was a mix of both. Bar graph on the bottom shows the same data but adds error bars (standard error of mean) to show variation among the three replicate extractions.