Research Article
Barking up the wrong tree: the dangers of taxonomic misidentification in molecular phylogenetic studies
Rafael Felipe de Almeida§, Marco O.O. Pellegrini§, Isa L. de Morais, Rosangela Simão-Bianchini|, Pantamith Rattanakrajang§, Martin Cheek§, Ana Rita G. Simões§
‡ Universidade Estadual de Goiás, Quirinópolis, Brazil
§ Royal Botanic Gardens, Kew, Richmond, United Kingdom
| Instituto de Pesquisas Ambientais, São Paulo, Brazil
¶ Mahidol University, Bangkok, Thailand
Open Access

Abstract

Background and aims – Keraunea is a Brazilian endemic genus that has sat uncomfortably in Convolvulaceae, where it was placed due to an enlarged and adnate fruit bract typical of Neuropeltis. A recent molecular phylogeny suggested that two of its five morphologically almost identical species actually belong to two different families, Malpighiaceae (superrosids) and Ehretiaceae (superasterids). Later studies have demonstrated that Keraunea effectively belongs to Ehretiaceae, but the proposal of one species belonging to Malpighiaceae has remained problematic. In this study, we re-assess this hypothesis, discuss the issues that have led to this assumption, and offer insights on the importance of carefully using herbarium collections and incorporating morphological evidence in systematic studies.

Material and methods – Sequences of matK, rbcL, and ITS for all 77 currently accepted genera of Malpighiaceae, K. brasiliensis and Elatinaceae (outgroup) were compiled from GenBank and analysed with Maximum Likelihood and Bayesian Inference criteria for nuclear, plastid and combined datasets. Additional database and herbarium studies were performed to locate and analyse all duplicates of the holotype of K. brasiliensis to check for misidentified or contaminated material.

Key results – Our examination of expanded DNA datasets and herbarium sheets of all K. brasiliensis isotypes revealed that a mistake in tissue sampling was, in fact, what led to this species being proposed to belong in Malpighiaceae. Kew’s isotype had a leaf of Malpighiaceae (likely Mascagnia cordifolia) stored in the fragment capsule, which was sampled and sequenced instead of the actual leaves of K. brasiliensis. Recently published studies have settled the placement of Keraunea in Ehretiaceae (Boraginales) and proposed three additional species.

Conclusions – DNA sequences can be helpful in classifying taxa when morphology is conflicting or of a doubtful interpretation, with molecular phylogenetic placement being established as a popular tool accelerating the discovery of systematic relationships. Nonetheless, molecular techniques are also susceptible to methodological mistakes, which necessitates building a solid foundation of plant morphology and taxonomy to avoid artefacts in phylogenetic studies.

Keywords

Boraginales, collections, Convolvulaceae, DNA extraction, Malpighiales, phylogenetics, taxonomy

Introduction

For the past four decades, phylogenetic studies have relied on biological collections, such as herbaria, as an essential source of DNA samples. These collections provide easy access to a wealth of specimens that could otherwise only be obtained via costly or difficult fieldwork, such as: 1) specimens from a wide range of geographical locations (e.g. different continents); 2) scarce plant material available (e.g. only the type specimen); 3) threatened or extinct species; and 4) geographically restricted populations (Shepherd and Perrie 2014; Bieker and Martin 2018). However, DNA in mounted herbarium specimens decays very fast (for comparison, six times faster than in animal bones) and is usually available in small amounts and highly degraded into short fragments (Weiß et al. 2016; Bieker and Martin 2018). Although some herbarium specimens are simply too degraded to be used in traditional sequencing methods (i.e. Sanger sequencing), Next Generation Sequencing methods, which enrich DNA extracts with extremely short (40–100 bp) DNA fragments, pushed the boundaries of what could be sequenced from very low-quality genetic material (Bieker and Martin 2018). These new advances in DNA sequencing from herbarium specimens have been reflected in the last decade of published phylogenetic studies, with an increase of 50% in the number of studies mainly or solely relying on herbarium specimens for DNA extraction (Bieker and Martin 2018).

A common issue with using herbarium samples for DNA extraction is contamination and misidentification, usually occurring during laboratory procedures (Wang 2018) or originating from biological collections when specimens are not identified by taxonomic experts. Recent studies have proposed new statistical methods to detect and remove contaminants from animal (Weissensteiner et al. 2021; Owen et al. 2022), bacterial (Pightling et al. 2019), or environmental (Sepulveda et al. 2021) phylogenomic datasets. However, before these recent efforts to identify contaminants in phylogenomic datasets were developed, previous phylogenetic studies that included sequences of contaminated or misidentified specimens generated erroneous phylogenetic trees in different groups of the plant tree of life (e.g. Apiaceae: Downie et al. 2010; Betulaceae: Wang et al. 2016; Juncaceae: Elliot et al. 2023; Laurales: Smith and Brown 2018; Menispermaceae: Ortiz et al. 2007; Rubiaceae: McCartha et al. 2019; and Tofieldiaceae: Chen et al. 2013). The impact on subsequent secondary analyses of these genetic data is extensive, including molecular dating estimates, biogeographic inferences, or systematic studies re-classifying organisms solely based on the contaminant sequences (Wang et al. 2014). Nonetheless, to the best of our knowledge, no study to date has addressed the impact of taxonomic misidentification of herbarium specimens used for DNA extraction in plant molecular systematics. In this work, we focus on a recent issue with re-classifying the genus Keraunea Cheek & Sim.-Bianch. as an example of the implications of taxonomic misidentifications in DNA samples obtained from herbarium specimens for phylogenetic studies.

Keraunea was first published by Cheek and Simão-Bianchini (2013), describing a single species endemic to Brazil, K. brasiliensis Cheek & Sim.-Bianch. At the time, it was proposed that it belonged to the family Convolvulaceae, based on the presence of a superior ovary with two carpels, bifid stigma, gamopetalous corolla, epipetalous stamens, climbing habit, and alternate, exstipulate, pinnately-nerved, simple and entire leaves. It also presented an unusual fruit, with much-enlarged bracts, adnate to the pedicels (Fig. 1), characteristic of the Convolvulaceae genera Neuropeltis Wall. and Neuropeltopsis Ooststr., in tribe Poraneae (sensu Staples and Brummitt 2007). Another three genera of Convolvulaceae in this tribe, Calycobolus Willd. ex Schult., Dipteropeltis Hallier f., and Rapona Baill., present superficially similar wind-dispersed analogous structures. However, these enlarged leaf-like structures embracing the fruit are, in fact, sepals and not bracts (as found in Keraunea, Neuropeltis, and Neuropeltopsis). The presence of a single style, rather than a bifid style, as is characteristic of Neuropeltis and allied genera, and the genus’s clear distinctiveness from the other members of Poraneae, especially regarding stigma shape, resulted in Keraunea uncomfortably sitting within Convolvulaceae since its description. Only a few months later, a second species of Keraunea was described, with only relatively minor morphological differences to K. brasiliensis (e.g. indumentum density, inflorescence structure, the length of the calyx and corolla lobes, and corolla length), K. capixaba Lombardi (Lombardi 2014). The description of the genus and the two species was based solely on macromorphological characters. The molecular phylogenetic relationships between the species and genus to the rest of Convolvulaceae remained unconfirmed.

Figure 1. 

Field photographs of the fruits of (A) Keraunea brasiliensis Cheek & Simão-Bianchini (Ehretiaceae), photo by Domingos Cardoso; (B) Neuropeltis racemosa Wall. (Convolvulaceae), photo by Pantamith Rattanakrajang; (C) Calycobolus campanulatus (K.Schum. ex Hallier f.) Heine (Convolvulaceae), photo by Olivier Lachenaud, and (D) Mascagnia cordifolia (A.Juss.) Griseb. (Malpighiaceae), photo by Marco O.O. Pellegrini.

A recent large-scale phylogenetic study of the family, using nuclear genomic data with target capture techniques (Simões et al. 2022), proposed, for the first time, that Keraunea did not belong to Convolvulaceae. However, no discussion on morphological characters supporting this placement was added, nor was an alternative family classification proposed, leaving Keraunea as “incertae sedis”. Another recently published molecular phylogenetic study (Muñoz-Rodríguez et al. 2022), based on phylogenetic analysis of plastid (matK, rbcL) and nuclear (ITS) regions, restated that the genus was placed outside of Convolvulaceae, supporting Simões et al. (2022). Both species of the genus (i.e. K. brasiliensis and K. capixaba) were sampled alongside sequences of a wide range of taxa across Convolvulaceae, with the results leading the authors to presume that Keraunea was polyphyletic (Muñoz-Rodríguez et al. 2022): the type species (i.e. K. brasiliensis) would belong to Malpighiaceae (Malpighiales, superrosids) “despite several morphological anomalies”, while the second species (i.e. K. capixaba) should belong to Ehretiaceae (Boraginales, superasterids). The authors also stated that the isotype of K. brasiliensis (Passos et al. 5263) should be placed in Malpighiaceae. At the same time, one of the paratypes of the same species (Lombardi 1819) belonged to Ehretiaceae alongside K. capixaba. Consequently, these results raised concerns about possible methodological issues around the sampling of Passos et al. 5263. However, Muñoz-Rodríguez et al. (2022) concluded that “[their] molecular results strongly suggest Passos et al. 5263 belongs to Malpighiaceae, most likely within Mascagnia” but refrained from proposing any taxonomic changes at that time.

We have found these results difficult to reconcile with the existing taxonomic knowledge of these plant groups, especially considering how evolutionarily distant Malpighiaceae and Ehretiaceae are and how both species of Keraunea are remarkably morphologically similar to each other. Hence, in order to explain the polyphyly of Keraunea, the occurrence of an exceptional morphological convergence between members of these two very distinct and distantly-related families would be necessary. As a result, our goal is to provide a plausible explanation for these curious results with a re-analysis of all available evidence. In this study, we test the placement of Passos et al. 5263 (“Keraunea brasiliensis”) in Malpighiaceae through a set of comprehensive phylogenetic and herbarium analyses of the same specimens and expanded DNA datasets as Muñoz-Rodríguez et al. (2022), shedding new light on this taxonomic conundrum.

Material and methods

Phylogenetics

Muñoz-Rodríguez et al. (2022) used an outdated generic sampling for Malpighiaceae based on Davis and Anderson (2010). Since then, several phylogenetic and taxonomic studies have been published, including the synonymy of several genera and the publication of two new genera for Malpighiaceae (Davis et al. 2020; Almeida and van den Berg 2021). In order to accurately test the phylogenetic placement of Keraunea brasiliensis within Malpighiaceae, we downloaded sequences of the markers matK, rbcL, and ITS stored on GenBank for all 77 currently accepted genera of Malpighiaceae according to POWO (2023) and the sequences of K. brasiliensis (Passos et al. 5263) presented as supplementary material by Muñoz-Rodríguez et al. (2022). Sequences of Elatine L. and Bergia L. (Elatinaceae), the sister group of Malpighiaceae (Davis and Chase 2004; Cai et al. 2016), were also used to root our analysis. Additionally, we performed a secondary analysis to test the placement of K. brasiliensis within the genus Mascagnia (Bertero ex DC.) Bertero, including 18 accepted species (out of 58) with available sequences on GenBank, four species of Amorimia W.R.Anderson as the outgroup, and Ectopopterys W.R.Anderson to root the analysis.

Datasets were compiled, for each marker, using Geneious v.4.8 (Kearse et al. 2012) and aligned using Muscle v.1.0 (Edgar 2004), with subsequent adjustments in the preliminary matrices by visual inspection. Separate and combined analyses of plastid, nuclear, and plastid + nuclear regions were performed using Bayesian Inference (BI) and Maximum Likelihood (ML) criteria for phylogenetic reconstruction. Both model-based methods were conducted with a mixed model (GTR+G+I) and unlinked parameters selected using jModelTest 2 (Darriba et al. 2012), and the analyses were done with MrBayes v.3.1.2 (Ronquist and Huelsenbeck 2003) and raxmlGUI 2.0 (Edler et al. 2021). For the BI, the Markov Chain Monte Carlo (MCMC) was run using two simultaneous independent runs with four chains each (one cold and three heated), saving one tree every 1,000 generations for ten million generations. We excluded 20% of the retained trees as “burnin” and checked for a stationary phase of the likelihood, checking for ESS values higher than 200 for all parameters on Tracer v.1.6 (Rambaut et al. 2014). The clades’ posterior probabilities (PP) were based on the 50% majority rule consensus, using the stored trees, and calculated with MrBayes v.3.1.2 (Ronquist and Huelsenbeck 2003). Bootstrap values are shown above the branches, while the posterior probabilities values are shown below. All datasets and consensus trees are available as supplementary files (https://doi.org/10.6084/m9.figshare.21961550.v2).
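The sampling scheme described above implies a fixed number of retained trees, which can be worked out with a small sketch. The function name `mcmc_sample_counts` is hypothetical and is not part of MrBayes. The calculation assumes that the 20% burn-in is applied per run and ignores any extra tree saved at generation 0.

```python
def mcmc_sample_counts(generations, sample_every, runs, burnin_fraction):
    """Return (sampled per run, discarded per run, retained per run, retained in total).

    Hypothetical helper: it only does the arithmetic implied by the MCMC
    settings and does not call or emulate MrBayes.
    """
    sampled_per_run = generations // sample_every
    discarded_per_run = round(sampled_per_run * burnin_fraction)
    retained_per_run = sampled_per_run - discarded_per_run
    return (sampled_per_run, discarded_per_run,
            retained_per_run, retained_per_run * runs)


# Settings taken from the text: ten million generations, one tree saved
# every 1,000 generations, two independent runs, 20% burn-in.
counts = mcmc_sample_counts(10_000_000, 1_000, 2, 0.20)
print(counts)  # (10000, 2000, 8000, 16000)
```

Under these assumptions, each run would contribute 8,000 post-burn-in trees to the majority-rule consensus, or 16,000 across both runs.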

Herbarium studies

Images of the holotype and all isotypes of Keraunea brasiliensis (Passos et al. 5263) were searched in online specimen databases, such as GBIF (https://www.gbif.org), JSTOR (https://plants.jstor.org), Jabot (http://jabot.jbrj.gov.br), Reflora (https://reflora.jbrj.gov.br), and speciesLink (https://specieslink.net). The Kew isotype (K000979156) was consulted in person at the Kew herbarium (Rafael Almeida, Ana Rita Simões, and Martin Cheek), and the Brazilian duplicates (ALCB, CEPEC, HRCB, HUEFS, and SPF; acronyms according to Thiers 2023) were accessed by Rosangela Simão-Bianchini and local collaborators among the staff of the above-cited herbaria. At Kew herbarium, morphological details were photographed using a Leica S9i stereomicroscope with a coupled digital camera.

Results

Phylogenetics

The topologies of the nuclear, plastid, and combined phylogenetic trees were found to be highly congruent. Hence, we have chosen to discuss the results in light of the combined analysis instead of the individual datasets (for additional information, see supplementary files). “Keraunea brasiliensis” (Passos et al. 5263) was recovered with high support as nested within the genus Mascagnia in the Malpighioid clade by both the individual and combined analyses, using BI and ML inference criteria (Fig. 2). The low support for most relationships within the Tetrapteroid clade is interpreted as the result of missing data across the three molecular datasets. Based on our secondary analysis, “K. brasiliensis” was recovered with high support as sister to Mascagnia cordifolia (A.Juss.) Griseb. (Fig. 3). Thus, we compared the DNA sequences of K. brasiliensis to M. cordifolia in the individual matK and rbcL alignments and observed that the sequences of the two species for these two genetic markers were identical (see supplementary files). The rbcL sequence of “K. brasiliensis” was also shorter (600 bp) than that of M. cordifolia and the remaining Malpighiaceae (1,400 bp). Hence, our molecular phylogenetic results support the view that the sequence of the specimen Passos et al. 5263 (K) from Muñoz-Rodríguez et al. (2022) belongs to Mascagnia, very probably representing the species M. cordifolia, given the identical genetic sequences for the barcoding markers matK and rbcL.
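A pairwise comparison of this kind, between a short sequence and a longer reference in the same alignment, can be sketched as follows. This is a minimal illustration and not the procedure used in this study: the function name `compare_aligned` is hypothetical and the sequences are invented toy strings, not real matK or rbcL data. Positions where either sequence has a gap or missing data are skipped, so a shorter sequence is compared only over the region it covers.

```python
GAP_OR_MISSING = set("-?N")


def compare_aligned(seq_a, seq_b):
    """Compare two rows of the same alignment.

    Returns (compared, mismatches): the number of positions where both
    sequences have a called base, and how many of those positions differ.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must come from the same alignment (equal length)")
    compared = mismatches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in GAP_OR_MISSING or b in GAP_OR_MISSING:
            continue  # skip gaps and missing data
        compared += 1
        if a != b:
            mismatches += 1
    return compared, mismatches


# Invented toy data: a full-length reference, a shorter query padded with
# gaps, and a second full-length sequence that differs at two sites.
reference = "ATGTCACCACAAACAGAG"
short_query = "------CCACAAACAG--"
other_taxon = "ATGTCTCCACAAACTGAG"

print(compare_aligned(reference, short_query))  # (10, 0)
print(compare_aligned(reference, other_taxon))  # (18, 2)
```

Zero mismatches over the shared positions shows only that the two sequences cannot be told apart for that marker, so such a result is best read together with the phylogenetic placement and the morphology of the sampled material.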

Figure 2. 

Consensus tree of the combined analysis based on the markers matK, rbcL, and ITS showing the phylogenetic placement of “Keraunea brasiliensis” (Passos et al. 5263) (highlighted in red) within Malpighiaceae, making the family non-monophyletic. Elatinaceae (highlighted in light grey) represents the outgroup and the root of this analysis. Bootstrap values from the ML are shown above the branches, and posterior probabilities from the BI are shown below the branches. The tree on the left is presented for branch length visualisation. Photographs of Elatine gratioloides A.Cunn. by Melissa Hutchison, Stigmaphyllon angustilobum A.Juss. by Rafael F. de Almeida, and Keraunea spp. by Geovane S. Siqueira.

Figure 3. 

Consensus tree of the combined analysis based on the markers matK and rbcL showing the phylogenetic placement of “Keraunea brasiliensis” (Passos et al. 5263) (highlighted in red) within Mascagnia. Amorimia W.R.Anderson (highlighted in light grey) represents one of the outgroups, and Ectopopterys W.R.Anderson is the root of this analysis. Bootstrap values from the ML are shown above the branches, and posterior probabilities from the BI are shown below the branches. The tree on the left is shown for branch length visualisation. Photographs of Amorimia spp. by Fabián Michelangeli, Mascagnia cordifolia by Marco O.O. Pellegrini, and Mascagnia australis C.E.Anderson by Climbie F. Hall.

Herbarium studies

Having confirmed the molecular results of Muñoz-Rodríguez et al. (2022), we then asked whether there could have been an issue with the source of the sequence, either laboratory contamination or problems with the source of the samples. The first hypothesis, laboratory contamination, was discarded a priori since Muñoz-Rodríguez et al. (2022) themselves stated that their sample of K. brasiliensis was re-sequenced by different laboratories, which resulted in identical sequences. This led us to investigate whether there could have been an issue with the source of the sample. Hence, the solution to this taxonomic conundrum would rely on the analysis of the herbarium sheets sampled by Muñoz-Rodríguez et al. (2022).

Nonetheless, these authors do not explicitly state which duplicate of Passos et al. 5263 was sequenced (from the six available specimens: ALCB, CEPEC, HRCB, HUEFS, K, or SPF), although it is mentioned by the authors that only the K herbarium was visited in person. However, the K isotype (K000979156) was not annotated to indicate that it had been sampled for DNA studies by the authors. In fact, this isotype only had a DNA sample slip dated from 2019 for unpublished molecular phylogenetic studies led by Kew’s in-house researcher Dr Tim Utteridge. Thus, we also looked at the possibility of the sequences of K. brasiliensis generated by Muñoz-Rodríguez et al. (2022) being from one of the Brazilian isotypes (i.e. ALCB037775, CEPEC00077827, HRCB38156, and HUEFS0028681) or the holotype (i.e. Passos et al. 5263, SPF). The curators of the abovementioned herbaria were contacted, and it was confirmed that leaf material for DNA sequencing had not been sent from these herbaria to Muñoz-Rodríguez or collaborators. Additionally, the database of the Brazilian Federal Government authority that regulates and authorises the use of any genetic material from Brazilian biological diversity in scientific studies (SisGen - Sistema Nacional de Gestão do Patrimônio Genético e do Conhecimento Tradicional Associado, https://sisgen.gov.br) was checked for records of authorisation having been granted to sequence the DNA of K. brasiliensis from any of the Brazilian herbaria in which all of the type specimens of K. brasiliensis were deposited. This search retrieved no results, further supporting that none of these materials had been sampled by foreign researchers or sequenced outside Brazil. Hence, we conclude that the sampled specimen would have, indeed, been the K sheet.

During the examination of the K isotype of K. brasiliensis (Passos et al. 5263), it was found that the plant was entirely glued to the sheet, and some detached leaves and fruits were stored in a paper capsule on the left lower side of the sheet (Fig. 4). These fragments were likely the source of the sample collected by Muñoz-Rodríguez and collaborators, considering the difficulties in collecting leaf material from the glued specimen. Yet, we observed that one of the leaves inside the fragment capsule did not match the general morphology of the remaining leaves of K. brasiliensis (Fig. 4). Inspection under a hand lens and stereomicroscope revealed that this “distinct” leaf presented V-shaped, 1-celled malpighiaceous hairs instead of the unbranched, multicellular hairs with bulbous bases, of the remaining leaves of K. brasiliensis (Fig. 4). On this discovery, Rosangela Simão-Bianchini and several Brazilian herbaria collaborators also checked the holotype and isotype sheets. Particular attention was given to the content of fragment capsules to check for possible foreign leaf material mixed up with true leaves of Keraunea. This investigation confirmed that none of the duplicates of Passos et al. 5263 and their fragment capsules in Brazilian herbaria contained leaf fragments with V-shaped, 1-celled malpighiaceous hairs. Thus, we could conclude that this sample mixture only pertained to the K isotype, further supporting our assumption that Muñoz-Rodríguez et al. (2022) did, indeed, sample the K specimen and not one of the Brazilian specimens.

Figure 4. 

Photograph of the isotype of Keraunea brasiliensis (Passos et al. 5263) deposited at RBG Kew’s herbarium showing the open fragment capsule storing leaves of K. brasiliensis (lower photographic detail showing unbranched hairs with bulbous base) and a single leaf of Mascagnia cordifolia (upper photographic detail showing 2-branched hairs). Photograph by Rafael Felipe de Almeida.

Thus, our critical re-analysis of the molecular and morphological data used by Muñoz-Rodríguez et al. (2022) allowed us to conclude that the phylogenetic placement of K. brasiliensis in Malpighiaceae can be explained by the contamination of the fragment capsule in the K isotype (Passos et al. 5263, K000979156) by Malpighiaceae leaves and their fragments. It is possible that this contamination was accidentally introduced during fieldwork or during the processing of the specimens for herbarium incorporation when leaf material, most likely of Mascagnia cordifolia, was added to the capsule by mistake.

Discussion

Molecular phylogenetics of misidentifications and contaminants

It is not uncommon for molecular phylogenetic studies to demonstrate that genera, and every so often species, are not monophyletic. In fact, the main contribution of molecular phylogenetic studies to plant systematics has been to challenge traditional classification systems based on morphological dogmas. One of the most famous cases is the dicotyledon/monocotyledon traditional division of flowering plants, which was challenged and reorganised into an entirely different system of subdivisions of flowering plants once the first molecular phylogenetic studies demonstrated the non-monophyletic nature of the dicotyledons (Chase et al. 1993). Nonetheless, to the best of our knowledge, Keraunea represents an unprecedented case in which the type species of a genus was presumed non-monophyletic. Different samples of an isotype and a paratype were placed in distinct clades of Pentapetalae (i.e. superasterids and superrosids) without any morphological support.

When Muñoz-Rodríguez et al. (2022) first observed the non-monophyly of K. brasiliensis, these authors re-sequenced the tissue samples in a different laboratory to further investigate their unusual results. In fact, this is only the first step towards troubleshooting phylogenetic incongruencies not corroborated by morphological evidence. The second and most important step is going back to the original sampled specimen to check for potential taxonomic misidentifications and/or contamination by foreign plant material. Unfortunately, skipping this second step seems to be common practice in plant phylogenetic studies, with several Sanger DNA sequences published on GenBank for different groups of seed plants showing misidentification problems, as discussed by Smith and Brown (2018). These authors had to exclude several shorter gene regions (i.e. DNA barcode regions such as matK and rbcL) from their seed plant phylogenetic study due to misidentification or contamination of samples, which had led to the incorrect placements of several of the analysed groups. Thus, even if only a single sequence constitutes a misidentification or contamination in a multiple loci analysis, this is enough to drive the incorrect placement of the taxon. Consequently, when conducting large-sized phylogenetic analyses, small percentages of bad data can dramatically inhibit accurate phylogenetic estimates (Smith and Brown 2018) and lead to incorrect and often disruptive taxonomic placements, such as in the case of Keraunea.

The case of the Keraunea brasiliensis isotype being phylogenetically misplaced in Malpighiaceae could have been avoided if Muñoz-Rodríguez et al. (2022) had noticed some macro-morphological differences in the leaves (Table 1) or the presence of V-shaped, 1-celled malpighiaceous hairs (Table 1, Fig. 4). Alternatively, in the face of the odd molecular results, these authors should have critically re-analysed the sampled herbarium specimens while looking for the potential source of the mistake. It is almost impossible for most families of flowering plants to be reliably identified on vegetative characters alone, especially using only leaf fragments. Malpighiaceae represents one of the few exceptions and maybe one of the most well-known. One of the most significant synapomorphies for this family, the indumentum of their vegetative structures, is always made of 1-celled hairs with two branches that can be T-, Y-, or V-shaped, united by a well-developed or inconspicuous base (i.e. foot) (Almeida and Morais 2022).

Table 1.

Morphological differences between the leaves of Keraunea brasiliensis (Ehretiaceae, Boraginales, superasterids) and Mascagnia cordifolia (Malpighiaceae, Malpighiales, superrosids).

Character | Keraunea brasiliensis | Mascagnia cordifolia
Petiole: shape | Canaliculate | D-shaped in cross-section
Petiole: glands | Absent | 1-glandular at the base near the insertion with the stem, gland discoid
Leaf blade: overall shape | Elliptic to lanceolate | Orbicular to broadly elliptic to ovate
Leaf blade: base shape | Cuneate | Cordate to slightly cordate to round
Leaf blade: margin | Flat | Slightly revolute
Leaf blade: apex shape | Obtuse | Mucronate
Leaf blade: texture | Membranous | Coriaceous
Leaf blade: colouration (in sicco) | Greenish-grey to greenish-tan | Dark green to olive-green
Leaf blade: venation | Camptodromous, secondary veins acute | Brochidodromous, secondary veins obtuse to round
Leaf blade: glands | Absent | 2–7-glandular near the margin, glands punctate, slightly impressed
Indumentum: pubescence type | Hirtellous | Velutinous
Indumentum: colouration | White | Tan to yellowish brown
Indumentum: hair morphology | Multicelled, acicular (unbranched), base bulbous | 1-celled, V-shaped (2-branched), delicate, foot absent

This unique hair morphology was first described in Malpighiaceae and has since been referred to as malpighiaceous hairs in taxonomic literature (e.g. Almeida and Morais 2022). However, malpighiaceous hairs are not exclusive to Malpighiaceae, being also found in 23 unrelated families of flowering plants (i.e. Acanthaceae, Aizoaceae, Asteraceae, Boraginaceae, Brassicaceae, Burseraceae, Cannabaceae, Capparaceae, Combretaceae, Convolvulaceae, Cornaceae, Connaraceae, Ebenaceae, Escalloniaceae, Lythraceae, Myrtaceae, Sapindaceae, Sapotaceae, Thymelaeaceae, Verbenaceae, Vitaceae, Vochysiaceae, and Zygophyllaceae; Rao and Sarma 1992). Within Convolvulaceae, 2-branched hairs (i.e. malpighiaceous hairs) are found in Ipomoea L. (Wood et al. 2016), Evolvulus L. (Silva and Simão-Bianchini 2014), Cordisepalum Verdc. (Staples 2006), Dinetus Buch.-Ham. ex D.Don (Staples 2006), Duperreya Gaudich. (Staples 2006), Poranopsis Roberty (Staples 2006), Tridynamia Gagnep. (Staples 2006), Stylisma Raf. (Myint 1966), Jacquemontia Choisy (Patel 2021), Erycibe Roxb. (Kochaiphat et al. 2021), and Neuropeltis Wall. (Breteler 2010).

Systematics of Keraunea

Keraunea was proposed by Cheek and Simão-Bianchini (2013) to be close to Neuropeltis and Neuropeltopsis, two unusual Palaeotropical genera of Convolvulaceae with very enlarged bracts supporting the fruit and partly fused to the pedicel. Indeed, the enlarged membranous bracts of Keraunea greatly resemble the membranous to subcoriaceous bracts present in Neuropeltis and Neuropeltopsis, and were one of the first morphological characters that suggested its original placement in Convolvulaceae. Nonetheless, molecular phylogenetic evidence did not corroborate the close relationship between Keraunea and Neuropeltis (Muñoz-Rodríguez et al. 2022; Simões et al. 2022). However, sampling of Neuropeltis was limited to only one African species, N. acuminata (P.Beauv.) Benth., and the genus Neuropeltopsis was not sampled in either study.

Since Neuropeltis shows malpighiaceous hairs in several species (Breteler 2010), one could argue that this type of hair is common in several genera of Convolvulaceae, being the main reason justifying the results of Muñoz-Rodríguez et al. (2022). Nonetheless, some differences distinguish the hairs of Convolvulaceae from those of Malpighiaceae. In Convolvulaceae, the 2-branched hairs are always 2-celled and T-shaped, while in Malpighiaceae, the 2-branched hairs are always 1-celled and T-, Y-, or V-shaped, such as in the mixed material from Passos et al. 5263 (K000979156; Table 1; Fig. 4). Therefore, based solely on morphology, it can be confidently established that the mixed material from the isotype at Kew is of a Malpighiaceae species, most probably a specimen of Mascagnia cordifolia, which always shows V-shaped hairs such as those presented in Fig. 4. Additionally, the hairs present in all leaves of the known specimens of Keraunea brasiliensis are always unbranched, multicelled, and bulbous at the base, as described by Cheek and Simão-Bianchini (2013) and confirmed by the present authors in the re-examination of the specimens (Table 1; Fig. 4). Thus, we here refute the hypothesis of Keraunea brasiliensis – or Keraunea pro parte – belonging to Malpighiaceae.

Meanwhile, the family placement of Keraunea has been the subject of two subsequent morphological and phylogenetic studies. A few days after the online publication of the preprint of this study (Almeida et al. 2023), another preprint was published solving the phylogenetic placement of Keraunea in Ehretiaceae (Boraginales; Cheek et al. 2023). These authors sampled four DNA markers (ITS, matK, rbcL, and trnL-F) for K. brasiliensis, K. capixaba, and a third newly proposed species (= K. confusa Moonlight & D.B.O.S.Cardoso), for all currently accepted genera of Ehretiaceae, and for a few genera of Cordiaceae and Heliotropiaceae as outgroups (Cheek et al. 2023). A few days after the publication of the preprint of Cheek et al. (2023), Moonlight and Cardoso (2023) effectively published an independent study also solving the phylogenetic placement of Keraunea within Ehretiaceae (Boraginales) alongside a taxonomic revision for this genus including three new species (i.e. Keraunea bullata Moonlight & D.B.O.S.Cardoso, K. confusa Moonlight & D.B.O.S.Cardoso, and K. velutina Moonlight & D.B.O.S.Cardoso). Both studies excluded the contaminated sequences of Passos et al. 5263 from their molecular sampling and recovered similar trees, corroborating the correct phylogenetic placement of Keraunea in Ehretiaceae by independent analyses of comparable molecular datasets. The main difference between these studies is that Cheek et al. (2023) sampled three species of Keraunea (including a non-contaminated sequence of K. brasiliensis from Passos et al. 5263), while Moonlight and Cardoso (2023) sampled a single species of Keraunea (i.e. K. confusa, described from Lombardi 1819). Hence, at present, Keraunea comprises five accepted species endemic to the Atlantic Rainforest and Caatinga biomes in the States of Bahia, Espírito Santo, Minas Gerais, and Rio de Janeiro, Brazil (Cheek et al. 2023; Moonlight and Cardoso 2023). Since several nomenclatural types of Keraunea (i.e. isotypes) were stored as undetermined specimens within Nyctaginaceae or Solanaceae in several herbaria, future taxonomic novelties may arise after a careful re-analysis of the herbaria from eastern Brazil or after additional field collections are conducted.

Future directions

The misidentification of herbarium sheets sampled in phylogenetic studies highlights the need to adequately address how molecular phylogenetic studies should handle contaminated and/or misidentified sequences in large phylogenetic analyses. Taxonomic expertise is fundamental to ensuring the correct taxonomic identification of DNA samples in phylogenetic studies. In a superficial analysis of the first 60 open-access papers on plant phylogenomics retrieved for 2022 from the Google Scholar database (https://scholar.google.com, accessed 10 Jan. 2023), we found that only 22% of these studies included a taxonomic expert on the group (i.e. someone who has already published floras or taxonomic revisions in the analysed groups) among the authors. Additionally, 78% of these studies do not specify the taxonomic criteria used or mention the taxonomic specialist who confirmed the identification of the sampled specimens (see supplementary files). This scenario is worrisome since it may reflect a view, held by some plant molecular systematists, of biological collections as immutable DNA archives. This misconception overlooks the dynamic nature of plant systematics and the need to constantly revise the determinations of the consulted specimens in the light of new evidence, and it can be easily tackled through association and collaboration with taxonomic experts in the study group.

While large phylogenomic projects, such as RBG Kew Tree of Life (Baker et al. 2022), have developed thorough quality control pipelines to test the correct phylogenomic placement of the sequenced samples at the family rank, determinations below the family rank need to be treated with the same care and scrutiny. Furthermore, it is essential to maintain good taxonomic practice when collecting DNA samples, as bioinformatic pipelines alone cannot always detect human errors introduced into the analyses. Good knowledge of plant taxonomy and morphology is a fundamental basis for minimising mistakes in the early stages of the molecular phylogenetic process. To mitigate the dangers of cross-contamination in herbarium specimens, or of the misidentification/contamination that can underlie molecular systematic studies, we here suggest two sets of golden rules for herbarium sampling:

Avoiding cross-contamination on herbarium specimens

It is not unusual for plant material of several species to be mounted on the same herbarium sheet, which is labelled as a single species. Here is how it can happen:

  1. Mixed collections (fieldwork) – In the wild, where species are sampled, two or more species can grow side-by-side or entwined with each other, especially climbers. If the specimen collector is under pressure or is unobservant, specimens that are a mixture can go into the press.
  2. Look-alike species – Similar-looking species can grow together in close proximity (e.g. species of Gramineae/Poaceae). A collector who needs to fill their press, is not a specialist in the group, and is working under pressure or in adverse conditions (e.g. poor light or weather) can collect multiple individuals to complete a sheet, overlooking that they are not all the same species, genus, or sometimes even the same family. This is fairly common for certain plant groups, like monocotyledons, aquatic plants, or small-sized species in general.
  3. Incomplete specimens – In certain species, inflorescences arise on a plant physically distant from the leaves (e.g. in cauliflorous species, subshrubs, geophytic herbs, aquatic plants, parasitic and mycoheterotrophic species, etc.). A collector can mistakenly associate two species as a single specimen, thinking they belong together. The error may not be detected before the specimen is incorporated into a herbarium and checked by an expert in that group.

Avoiding misidentification and contamination in molecular plant systematics

In an era of expansion of molecular plant systematic techniques, we here draw attention to the importance of taxonomic skills in guaranteeing the correct sampling for molecular phylogenetic studies and the ability to critically interpret the hypotheses in the light of additional biological evidence before accepting inaccurate results. For future phylogenetic and/or phylogenomic studies that rely on sampling herbarium specimens, we suggest simple recommendations to ensure rigorous taxonomic standards to minimise human error and/or taxonomic biases of any kind:

  1. Identification – Avoid sampling specimens with doubtful identification, i.e. specimens not confidently identified by a taxonomic expert in the plant group (for example, someone who has already published a number of flora accounts or taxonomic revisions of said group). If you are not sure who identified it, consider it doubtful, and keep open the possibility of revisiting the specimen identification, e.g. if the molecular phylogenetic results are to some extent incongruent or inexplicable in the light of the available knowledge about the plant group.
  2. Contamination/Misidentification/Mixed specimens – Carefully check the sampled specimens for any mixed collections or contaminations on the mounted sheets, especially inside the capsule where loose fragments are stored. Loose plant fragments are the most prone to herbaria contamination and/or misidentification, and fragment capsules are always a potential host of mixed leaf material. When collecting samples from the fragment capsule, ensure that the leaves are identical to the ones on the mounted specimen, including checking under the stereomicroscope if in doubt.
  3. Nomenclature – Always verify on robust online taxonomic databases (e.g. Plants of the World Online, Tropicos.org, etc.) if the name on the identification slip or label is currently accepted by the taxonomic community. This is a fundamental step, very commonly skipped in phylogenetic/phylogenomic studies, that might impact how taxonomic information is interpreted in an evolutionary context. Thus, in addition to the genus and epithet, also check for the correct authority of the species name (e.g. Ipomoea diversifolia R.Br. is an accepted name, while Ipomoea diversifolia (Schumach. & Thonn.) Didr. is a synonym of Ipomoea sagittifolia Burm.f., which is a very different species from I. diversifolia R.Br.). Overlooking these details, or annotating the species name wrongly, can lead to further complications in interpreting a molecular phylogeny, with consequences to the systematics of the group in question.
  4. Taxonomist – The sampler for a phylogenetic/phylogenomic study should ideally be someone with adequate taxonomic experience, or someone willing to receive taxonomic training that will prepare them to spot incongruencies in the sampled specimens more easily, e.g. misidentifications, contamination, material mixture, nomenclatural inconsistencies, and unreliable taxonomic authorship of the determinations on the herbarium samples. Most herbaria rely upon the work of countless past, present, and future plant taxonomists to ensure the best taxonomic rigour in curating their collections. For several reasons, not all plant collections can keep all their specimens up to date with recent advances in taxonomy and systematics, especially in an age of rapid advances in molecular systematics, aggravated by reductions in curatorial and taxonomic staff in most herbaria. For that reason, working in collaboration with taxonomic specialists is highly encouraged.
  5. Critical thinking – Challenging all preconceptions is vital to the advance of science and must always be the most important part of your study design. Keep an open mind to different explanations as to why results are not as expected. Try sampling at least two terminals for critical taxa, and explore a range of hypotheses that may explain your results, especially if they are completely unexpected or incongruent with current knowledge of the plant groups in question at that given time. Play detective and retrace your steps, as well as the botanical history of the specimens, to ensure that enough evidence, in addition to the molecular data, will support your results.
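
The nomenclature check in recommendation 3 can be illustrated with a minimal Python sketch. The lookup table and the function name resolve_name are hypothetical and do not correspond to the interface of any real database; the two records are simply the Ipomoea example given above, and a real workflow would draw its records from a curated source such as Plants of the World Online or Tropicos. The sketch only shows why the authority must be checked together with the genus and epithet.

```python
# Hypothetical local lookup table keyed by (binomial, authority).
# The two records reproduce the Ipomoea example from recommendation 3;
# they are illustrative only and are not pulled from a live database.
NAME_STATUS = {
    ("Ipomoea diversifolia", "R.Br."): ("accepted", None),
    ("Ipomoea diversifolia", "(Schumach. & Thonn.) Didr."): (
        "synonym",
        "Ipomoea sagittifolia Burm.f.",
    ),
}


def resolve_name(binomial, authority=None):
    """Return (status, accepted_name) for a name found on a specimen label.

    status is one of "accepted", "synonym", "ambiguous", or "unknown".
    """
    # Collect every record that shares the genus + epithet.
    matches = {
        auth: record
        for (name, auth), record in NAME_STATUS.items()
        if name == binomial
    }
    if not matches:
        return ("unknown", None)

    if authority is None:
        # Without an authority, homonyms cannot be told apart.
        if len(matches) > 1:
            return ("ambiguous", None)
        authority = next(iter(matches))

    if authority not in matches:
        return ("unknown", None)

    status, accepted = matches[authority]
    if status == "accepted":
        return ("accepted", f"{binomial} {authority}")
    return ("synonym", accepted)


if __name__ == "__main__":
    print(resolve_name("Ipomoea diversifolia", "R.Br."))
    print(resolve_name("Ipomoea diversifolia", "(Schumach. & Thonn.) Didr."))
    print(resolve_name("Ipomoea diversifolia"))
```

In this sketch, a label bearing only "Ipomoea diversifolia" is reported as ambiguous rather than silently matched to one of the two homonyms, which is the situation the recommendation warns against.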

Conclusions

DNA sequences can be very helpful in classifying plant taxa when morphology is conflicting or of doubtful interpretation, with molecular phylogenetic placement being a popular tool to potentially accelerate the discovery of systematic relationships. Nonetheless, such placement needs to be accompanied by a critical assessment of the obtained results in the context of a range of biological information (i.e. macromorphology, micromorphology, ecology, reproductive biology, phytochemistry, etc.), particularly when the new hypotheses are disruptive to the current classification system, or incongruent with the current knowledge of the plant groups in question. Genetic and genomic techniques are, much like any others, prone to lapses, which further stresses the need for caution in adopting molecular phylogenetic results into a currently accepted classification system. In an era of expansion of molecular plant systematic techniques, we here draw attention to the vital role of morphology and experienced taxonomic skills in guaranteeing adequate and reliable sampling for molecular phylogenetic studies and the ability to critically interpret the obtained hypotheses in the light of a range of biological evidence, particularly the most easily accessible, morphological characters.

Acknowledgements

We thank the researchers and curators of all consulted herbaria (Cassio van den Berg, Maria Candida H. Mamede, Maria Lenise Guedes and Viviane Jono) for their assistance; and Climbiê F. Hall, Domingos Cardoso, Fabián Michelangeli, Geovane S. Siqueira, Melissa Hutchison, and Olivier Lachenaud for permission to reproduce their photographs. RFA was supported by a postdoctoral fellowship from CNPq (#317720/2021-0) and FAPEG (#202110267000867), Brazil.

References

  • Almeida RF, Morais IL (2022) Morphology of Malpighiaceae from Brazil - Part 1 - Vegetative. Universidade Estadual de Goiás, Quirinópolis, 1–44. https://doi.org/10.29327/5176599
  • Almeida RF, van den Berg C (2021) Molecular phylogeny and character mapping support generic adjustments in the Tetrapteroid clade (Malpighiaceae). Nordic Journal of Botany 39(1): e02876. https://doi.org/10.1111/njb.02876
  • Almeida RF, Cheek M, Pellegrini MOO, Morais IL, Simão-Bianchini R, Rattanakrajang P, Simões ARG (2023) Barking up the wrong tree: the importance of morphology in plant molecular phylogenetic studies. Arpha preprint. https://doi.org/10.3897/arphapreprints.e101292
  • Baker WJ, Bailey P, Barber V, Barker A, Bellot S, Bishop D, Botigue LR, Brewer G, Carruthers T, Clarkson JJ, Cook J, Cowan RS, Dodsworth S, Epitawalage N, Françoso E, Gallego B, Johnson M, Kim JT, Leempoel K, Maurin O, McGinnie C, Pokorny L, Roy S, Stone M, Toledo E, Wickett NJ, Zuntini AR, Eiserhardt WL, Kersey PJ, Leitch IJ, Forest F (2022) A comprehensive phylogenomic platform for exploring the angiosperm Tree of Life. Systematic Biology 71: 301–319. https://doi.org/10.1093/sysbio/syab035
  • Breteler FJ (2010) Description of a new species of Neuropeltis (Convolvulaceae) with a synopsis and a key to all African species. Plant Ecology and Evolution 143(2): 176–180. https://doi.org/10.5091/plecevo.2010.387
  • Cai L, Xi Z, Peterson K, Rushworth C, Beaulieu J, Davis CC (2016) Phylogeny of Elatinaceae and the tropical Gondwanan origin of the Centroplacaceae (Malpighiaceae, Elatinaceae) clade. PLoS ONE 11(9): e0161881. https://doi.org/10.1371/journal.pone.0161881
  • Chase MW, Soltis DE, Olmstead RG, Morgan D, Les DH, Mishler BD, Duvall MR, Price RA, Hills HG, Qiu YL, Kron KA, Rettig JH, Conti E, Palmer JD, Manhart JR, Sytsma KJ, Michaels HJ, Kress WJ, Karol KG, Clark WD, Hedren M, Gaut BS, Jansen RK, Kim KJ, Wimpee CF, Smith JF, Furnier GR, Strauss SH, Xiang QY, Plunkett GM, Soltis PS, Swensen SM, Williams SE, Gadek PA, Quinn CJ, Eguiarte LE, Golenberg E, Learn Jr GH, Graham SW, Barrett SCH, Dayanandan S, Albert VA (1993) Phylogenetics of seed plants: an analysis of nucleotide sequences from the plastid gene rbcL. Annals of the Missouri Botanical Garden 80(3): 528–580. https://doi.org/10.2307/2399846
  • Chen LY, Chen JM, Gituru RW, Wang QF (2013) Eurasian origin of Alismatidae inferred from statistical dispersal–vicariance analysis. Molecular Phylogenetics and Evolution 67: 38–42. https://doi.org/10.1016/j.ympev.2013.01.001
  • Darriba D, Taboada GL, Doallo R, Posada D (2012) jModelTest 2: more models, new heuristics and parallel computing. Nature Methods 9: 772. https://doi.org/10.1038/nmeth.2109
  • Davis CC, Chase MW (2004) Elatinaceae are sister to Malpighiaceae; Peridiscaceae belong to Saxifragales. American Journal of Botany 91: 262–273. https://doi.org/10.3732/ajb.91.2.262
  • Davis CC, Anderson WR (2010) A complete generic phylogeny of Malpighiaceae inferred from nucleotide sequence data and morphology. American Journal of Botany 97(12): 2031–2048. https://doi.org/10.3732/ajb.1000146
  • Downie SR, Spalik K, Katz-Downie DS, Reduron JP (2010) Major clades within Apiaceae subfamily Apioideae as inferred by phylogenetic analysis of nrDNA ITS sequences. Plant Diversity and Evolution 128(1–2): 111–136. https://doi.org/10.1127/1869-6155/2010/0128-0005
  • Edgar RC (2004) MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Research 32(5): 1792–1797. https://doi.org/10.1093/nar/gkh340
  • Edler D, Klein J, Antonelli A, Silvestro D (2021) raxmlGUI 2.0: A graphical interface and toolkit for phylogenetic analyses using RAxML. Methods in Ecology and Evolution 12(2): 373–377. https://doi.org/10.1111/2041-210X.13512
  • Elliott TL, Larridon I, Barrett RL, Bruhl JJ, Costa SM, Escudero M, Hipp AL, Jimenez-Mejias P, Kirschner J, Luceno M, Marquez-Corro JI, Martin-Bravo S, Roalson EH, Semmouri I, Spalink D, Thomas WW, Villaverde T, Wilson KL, Muasya AM (2023) Addressing inconsistencies in Cyperaceae and Juncaceae taxonomy: comment on Brožová et al. (2022). Molecular Phylogenetics and Evolution 179: 107665. https://doi.org/10.1016/j.ympev.2022.107665
  • Kearse M, Moir R, Wilson A, Stones-Havas S, Cheung M, Sturrock S, Buxton S, Cooper A, Markowitz S, Duran C, Thierer T, Ashton B, Meintjes P, Drummond A (2012) Geneious Basic: an integrated and extendable desktop software platform for the organisation and analysis of sequence data. Bioinformatics 28(12): 1647–1649. https://doi.org/10.1093/bioinformatics/bts199
  • McCartha GL, Taylor CM, van der Ent A, Echevarria G, Gutiérrez GMN, Pollard AJ (2019) Phylogenetic and geographic distribution of nickel hyperaccumulation in neotropical Psychotria. American Journal of Botany 106(10): 1377–1385. https://doi.org/10.1002/ajb2.1362
  • Moonlight PW, Cardoso DBOS (2023) A taxonomic revision of Keraunea, including three new species and its phylogenetic realignment with Ehretiaceae (Boraginales). PhytoKeys 219: 145–170. https://doi.org/10.3897/phytokeys.219.101779
  • Muñoz-Rodríguez P, Wood JRI, González LV, Davis CC, Goodwin ZA, Scotland RW (2022) Molecular analyses place the genus Keraunea outside Convolvulaceae. Harvard Papers in Botany 27(2): 221–227. https://doi.org/10.3100/hpib.v27iss2.2022.n11
  • Ortiz RDC, Kellogg EA, van der Werff H (2007) Molecular phylogeny of the moonseed family (Menispermaceae): implications for morphological diversification. American Journal of Botany 94: 1425–1438. https://doi.org/10.3732/ajb.94.8.1425
  • Owen CL, Marshall DC, Wade EJ, Meister R, Goemans G, Kunte K, Moulds M, Hill K, Villet M, Pham TH, Kortyna M, Lemmon EM, Lemmon AR, Simon C (2022) Detecting and removing sample contamination in phylogenomic data: an example and its implications for Cicadidae phylogeny (Insecta: Hemiptera). Systematic Biology 71(6): 1504–1523. https://doi.org/10.1093/sysbio/syac043
  • Pightling AW, Pettengill JB, Wang Y, Rand R, Strain E (2019) Within-species contamination of bacterial whole-genome sequence data has a greater influence on clustering analyses than between-species contamination. Genome Biology 20: 286. https://doi.org/10.1186/s13059-019-1914-x
  • Sepulveda AJ, Hoegh A, Gage JA, Caldwell ESL, Birch JM, Stratton C, Hutchins PR, Barnhart EP (2021) Integrating environmental DNA results with diverse datasets to improve biosurveillance of river health. Frontiers in Ecology and Evolution 9: 620715. https://doi.org/10.3389/fevo.2021.620715
  • Simões ARG, Eserman LA, Zuntini AR, Chatrou LW, Utteridge TMA, Maurin O, Rokni S, Roy S, Forest F, Baker WJ, Stefanović S (2022) A bird’s eye view of the systematics of Convolvulaceae: novel insights from nuclear genomic data. Frontiers in Plant Science 13: 889988. https://doi.org/10.3389/fpls.2022.889988
  • Staples GW (2006) Revision of Asiatic Poraneae (Convolvulaceae) – Cordisepalum, Dinetus, Duperreya, Porana, Poranopsis, and Tridynamia. Blumea 51(3): 403–491. https://doi.org/10.3767/000651906X622067
  • Staples GW, Brummitt RK (2007) Convolvulaceae. In: Heywood VH, Brummitt RK, Culham A, Seberg O (Eds) Flowering plant families of the World. Kew Publishing, London, 108–110.
  • Thiers B (2023) Index Herbariorum: a global directory of public herbaria and associated staff. New York Botanical Garden’s Virtual Herbarium. https://sweetgum.nybg.org/science/ih/ [accessed 26.01.2023]
  • Wang N, McAllister HA, Bartlett PR, Buggs RJA (2016) Molecular phylogeny and genome size evolution of the genus Betula (Betulaceae). Annals of Botany 117(6): 1023–1035. https://doi.org/10.1093/aob/mcw048
  • Wang W, Li HL, Xiang XG, Chen ZD (2014) Revisiting the phylogeny of Ranunculeae: implications for divergence time estimation and historical biogeography. Journal of Systematics and Evolution 52(5): 551–565. https://doi.org/10.1111/jse.12101
  • Weiß CL, Schuenemann VJ, Devos J, Shirsekar G, Reiter E, Gould BS, Stinchcombe JR, Krause J, Burbano HA (2016) Temporal patterns of damage and decay kinetics of DNA retrieved from plant herbarium specimens. Royal Society Open Science 3: 160239. https://doi.org/10.1098/rsos.160239
  • Weissensteiner H, Forer L, Fendt L, Kheirkhah A, Salas A, Kronenberg F, Schoenherr S (2021) Contamination detection in sequencing studies using the mitochondrial phylogeny. Genome Research 31: 309–316. https://doi.org/10.1101/gr.256545.119