Genetic tools weed out misconceptions of strain reliability in Cannabis sativa: implications for a budding industry
Unlike other plants, Cannabis sativa is excluded from regulation by the United States Department of Agriculture (USDA). Distinctive Cannabis varieties are barred from registration and are therefore nearly impossible to verify. As Cannabis has become legal for medical and recreational consumption in many states, consumers have been exposed to a wave of novel Cannabis products with many distinctive names. Despite more than 2000 named strains being available to consumers, the consistency of commercially available strains has not been investigated using scientific methods. As Cannabis legalization and consumption increase, the need to provide consumers with consistent products becomes more pressing. In this research, we examined commercially available, drug-type Cannabis strains using genetic methods to determine whether the commonly referenced distinctions are supported and whether samples with the same strain name are consistent when obtained from different facilities.
We developed ten de novo microsatellite markers using the “Purple Kush” genome to investigate potential genetic variation within 30 strains obtained from dispensaries in three states. Samples were examined to determine whether any genetic distinction separates the commonly referenced Sativa, Indica and Hybrid types and whether strain accessions obtained from different facilities share a consistent genetic identity.
Although there was strong statistical support dividing the samples into two genetic groups, the groups did not correspond to commonly reported Sativa/Hybrid/Indica types. The analyses revealed genetic inconsistencies within strains, with most strains containing at least one genetic outlier. However, after the removal of obvious outliers, many strains showed considerable genetic stability.
We failed to find clear genetic support for the commonly referenced Sativa, Indica and Hybrid types as described in online databases. Significant genetic differences were observed among samples of the same strain, indicating that consumers could be provided inconsistent products. These differences have the potential to lead to phenotypic differences and unexpected effects, which could be surprising for the recreational user but have more serious implications for patients relying on strains that alleviate specific medical symptoms.
Cultivation of Cannabis sativa L. dates back thousands of years (Abel 2013) but has been largely illegal worldwide for the best part of the last century. The U.S. Drug Enforcement Administration considers Cannabis a Schedule I drug with no “accepted medical use in treatment in the United States” (United States Congress n.d.), but laws allowing Cannabis for use as hemp, medicine, and some adult recreational use are emerging (ProCon 2018). Global restrictions have limited Cannabis-related research, and there are relatively few genetic studies focused on strains (Lynch et al. 2016; Soler et al. 2017), but studies with multiple accessions of a particular strain show variation (Lynch et al. 2016; Soler et al. 2017; Sawler et al. 2015).
Currently, the Cannabis industry has no way to verify strains. Consequently, suppliers are unable to provide confirmation of strains, and consumers have to trust that the printed name on a label matches the product inside the package. Reports of inconsistencies, along with the history of underground trading and growing in the absence of a verification system, reinforce the likelihood that strain names may be unreliable identifiers for Cannabis products at the present time. Without verification systems in place, there is the potential for misidentification and mislabeling of plants, creating names for plants of unknown origin, and even re-naming or re-labeling plants with prominent names for better sale. Cannabis taxonomy is complex (Emboden 1974; Schultes et al. 1974; Hillig 2005; Russo 2007; Clarke and Merlin 2013; Clarke et al. 2015; Clarke and Merlin 2016; Small et al. 1976; Small 2015a), but given the success of using genetic markers, such as microsatellites, to determine varieties in other crops, we suggest that similar genetics-based approaches should be used to identify Cannabis strains in medical and recreational marketplaces.
There are an estimated 3.5 million medical marijuana patients in the United States (U.S.) (Leafly 2018b), and varying levels of recent legalization in many states have led to a surge of new strains (Leafly 2018a; Wikileaf 2018). Breeders are producing new Cannabis strains with novel chemical profiles resulting in various psychotropic effects and relief for an array of symptoms associated with medical conditions including (but not limited to): glaucoma (Tomida et al. 2004), Crohn’s disease (Naftali et al. 2013), epilepsy (U.S. Food and Drug Administration 2018; Maa and Figi 2014), chronic pain, depression, anxiety, PTSD, autism, and fibromyalgia (Naftali et al. 2013; Cousijn et al. 2018; Ogborne et al. 2000; Borgelt et al. 2013; ProCon 2016).
There are primarily two Cannabis usage groups, which are well supported by genetic analyses (Lynch et al. 2016; Soler et al. 2017; Sawler et al. 2015; Dufresnes et al. 2017): hemp, defined in the U.S. by a limit of 0.3% Δ9-tetrahydrocannabinol (THC), and marijuana or drug types with moderate to high THC concentrations (always > 0.3% THC). Within the two major groups, Cannabis has been further divided into strains (varietals) in the commercial marketplace, and particularly for the drug types, strains are assigned to one of three categories: Sativa, which reportedly has uplifting and more psychotropic effects; Indica, which reportedly has more relaxing and sedative effects; and Hybrid, which results from breeding Sativa and Indica types and reportedly has intermediate effects. The colloquial terms Sativa, Hybrid, and Indica are used throughout this document even though these terms do not align with the current formal botanical taxonomy for Cannabis sativa and proposed Cannabis indica (McPartland 2017; Piomelli and Russo 2016). We feel the colloquial terminology is necessary here because this study approached the question from a consumer's point of view, and these are the terms offered as common descriptors for the general public (Leafly 2018a; Wikileaf 2018; cannabis.info 2018; NCSM 2018; PotGuide.com 2018; Seedfinder 2018). Genetic analyses have not provided a clear consensus for higher taxonomic distinction among these commonly described Cannabis types (Lynch et al. 2016; Sawler et al. 2015), and whether there is a verifiable difference between Sativa and Indica type strains is debated (McPartland 2017; Piomelli and Russo 2016; Erkelens and Hazekamp 2014). However, both the recreational and medical Cannabis communities claim there are distinct differences in effects between Sativa and Indica type strains (Leafly 2018a; Wikileaf 2018; cannabis.info 2018; NCSM 2018; PotGuide.com 2018; Seedfinder 2018; Leaf Science 2016; Smith 2012).
Female Cannabis plants with desirable characters (mother plants) are selected and propagated through cloning and, in some cases, self-fertilization to produce seeds (Green 2005). Cloning allows Cannabis growers to replicate plants, ideally producing consistent products. There are an overwhelming number of Cannabis strains that vary widely in appearance, taste, smell and psychotropic effects (Leafly 2018a; Wikileaf 2018; cannabis.info 2018; NCSM 2018; PotGuide.com 2018; Seedfinder 2018). Online databases such as Leafly (2018a) and Wikileaf (2018), for example, provide consumers with information about strains but lack the scientific basis the Cannabis industry would need to regulate strain consistency. Other databases exist (cannabis.info 2018; NCSM 2018; PotGuide.com 2018; Seedfinder 2018), but their method of assignment to the three groups is often undisclosed, confounded, or unclear. Wikileaf reports a numeric percentage of assignment to Sativa and/or Indica (Wikileaf 2018), which is why we chose it as our reference scale of ancestry, although there is some disagreement among online sources (Additional file 1: Table S1). To our knowledge, there have not been any published scientific studies specifically investigating the genetic consistency of strains at multiple points of sale for Cannabis consumers.
Breeders and growers choose Cannabis plants with desirable characters (phenotype) related to flowers, cannabinoid profile, and terpene production. Phenotype is a product of genotype and environment, and Cannabis is considerably variable and extraordinarily plastic in response to varying environmental conditions (Onofri and Mandolino 2017). Therefore, determining sources of variation requires, at the most basic level, examining genetic differences. Strains propagated through cloning should have minimal genetic variation. Eight of the strains examined in this study are reportedly clone-only strains, indicating there should be little to no genetic variation within them. That said, mutations can accumulate over multiple generations of cloning (Gabriel et al. 1993; Hojsgaard and Horandl 2015), but these should not be widespread. Self-fertilization and subsequent seed production may also be used to grow a particular strain. For most commercial plant products, growers go through multiple generations of self-fertilization and backcrossing to remove genetic variability within a strain and provide a consistent product (Riggs 1988). However, for many Cannabis strains, the extent to which genetic variability has been stabilized is uncertain. Novel Cannabis strains developed through crossing are often phenotypically variable (Green 2005), which could result from seed producers growing seeds that are not stabilized enough to produce a consistent phenotype. Soler et al. (2017) examined the genetic diversity and structure of Cannabis cultivars grown from seed and found considerable variation, suggesting that seed lots are not consistent. Given the uncertainties surrounding named Cannabis strains, genetic data provide an ideal path to examine how widespread genetic inconsistencies might be.
In the U.S., protection against commercial exploitation, trademarking, and recognition of intellectual property for developers of new plant cultivars is provided through the United States Department of Agriculture (USDA) and the Plant Variety Protection Act of 1970 (United States Department of Agriculture 1970). Traditionally, morphological characters were used to define new varieties in crops such as grapes (Vitis vinifera L.), olives (Olea europaea L.) and apples (Malus domestica Borkh.). With the rapid development of new varieties in these crops, varieties have become increasingly difficult to distinguish by morphological characters alone. Currently, quantitative and/or molecular characters are often used to demonstrate uniqueness among varieties. Microsatellite genotyping enables growers and breeders of new cultivars to demonstrate uniqueness through variable genetic profiles (Rongwen et al. 1995). Microsatellite genotyping has been used to distinguish cultivars and hybrid varieties of multiple crop varietals within species (Rongwen et al. 1995; Guilford et al. 1997; Hokanson et al. 1998; Cipriani et al. 2002; Belaj et al. 2004; Sarri et al. 2006; Baldoni et al. 2009; Stajner et al. 2011; Costantini et al. 2005; Pellerone et al. 2001; Poljuha et al. 2008; Muzzalupo et al. 2009). Generally, 3–12 microsatellite loci are sufficient to accurately identify varietals and detect misidentified individuals (Cipriani et al. 2002; Belaj et al. 2004; Sarri et al. 2006; Baldoni et al. 2009; Poljuha et al. 2008; Muzzalupo et al. 2009). Cannabis varieties, however, are not afforded any legal protection, as the USDA considers Cannabis an “ineligible commodity” (United States Department of Agriculture 2014), but genetic variety identification systems provide a model by which Cannabis strains could be developed, identified, registered, and protected.
We used a well-established genetic technique to compare commercially available C. sativa strains to determine whether products with the same name purchased from different sources are genetically congruent. This study is unique in that we approached sample acquisition as a common retail consumer would, purchasing flower samples from dispensaries based on what was available at the time of purchase. All strains were purchased as-is, with no additional information provided by the facility other than the identifying label. This study aimed to determine whether: (1) any genetic distinction separates the commonly perceived Sativa, Indica and Hybrid types; (2) consistent genetic identity is found within a variety of different strain accessions obtained from different facilities; (3) there is evidence of misidentification or mislabeling.
Cannabis samples for 30 strains were acquired from 20 dispensaries or donors in three states (Table 1). All samples used in this study were obtained legally from either retail (Colorado and Washington), medical (California) dispensaries, or as a donation from legally obtained samples (Greeley 1). DNA was extracted using a modified CTAB extraction protocol (Doyle 1987) with 0.035–0.100 g of dried flower tissue per extraction. Several databases exist with various descriptive Sativa and Indica assignments for thousands of strains (Additional file 1: Table S1). For this study proportions of Sativa and Indica phenotypes from Wikileaf (2018) were used. Analyses were performed on the full 122-sample dataset (Table 1). The 30 strains were assigned a proportion of Sativa according to online information (Table 2). Twelve of the 30 strains were designated as ‘popular’ due to higher availability among the dispensaries as well as online information reporting the most popular strains (Table 2) (Rahn 2015; Rahn 2016; Rahn et al. 2016; Escondido 2014). Results from popular strains are highlighted to show levels of variation in strains that are more widely available or that are in higher demand.
The Cannabis draft genome from “Purple Kush” (GenBank accession AGQN00000000.1) was scanned for microsatellite repeat regions using MSATCOMMANDER-1.0.8-beta (Faircloth 2008). Primers were developed de novo, flanking microsatellites with 3–6 nucleotide repeat units (Additional file 1: Table S2). Seven of the microsatellites had trinucleotide motifs, two had hexanucleotide motifs, and one had a tetranucleotide motif (Additional file 1: Table S2). One primer in each pair was tagged with a 5′ universal sequence (M13 or T7) so that a matching sequence with a fluorochrome tag could be incorporated via PCR (Schwabe et al. 2015). Ten primer pairs produced consistent peaks within the predicted size range and were used for the genetic analyses herein (Additional file 1: Table S2).
PCR and data scoring
Microsatellite loci (Additional file 1: Table S2) were amplified in 12 μL reactions using 1.0 μL DNA (10–20 ng/μL), 0.6 μL fluorescent tag (5 μM; FAM, VIC, or PET), 0.6 μL non-tagged primer (5 μM), 0.6 μL tagged primer (0.5 μM), 0.7 μL dNTP mix (2.5 mM), 2.4 μL GoTaq Flexi Buffer (Promega, Madison, WI, USA), 0.06 μL GoFlexi taq polymerase (Promega), 0.06 μL BSA (Bovine Serum Albumin 100X), 0.5–6.0 μL MgCl2 or MgSO4, and 0.48–4.98 μL dH2O. An initial 5 min denaturing step was followed by thirty-five amplification cycles of 1 min denaturing at 95 °C, 1 min annealing at primer-specific temperatures and 1 min extension at 72 °C. Two multiplexes (Additional file 1: Table S2) were assembled based on fragment size and fluorescent tag, and 2 μL of each PCR product was combined into multiplexes up to a total volume of 10 μL. From the multiplexed product, 2 μL was added to Hi-Di formamide and LIZ 500 size standard (Applied Biosystems, Foster City, CA, USA) for electrophoresis on a 3730 Genetic Analyzer (Applied Biosystems) at the Arizona State University DNA Lab. Fragments were sized using GENEIOUS 8.1.8 (Biomatters Ltd).
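Because most reagent volumes in the reaction above are fixed per reaction, they can be combined into a master mix before template DNA, magnesium salt, and water are added to each well. The sketch below scales the fixed per-reaction volumes listed above to any number of reactions. The 10% overage, the dictionary keys, and the function name are hypothetical choices for illustration, not part of the original protocol.

```python
# Per-reaction volumes (uL) of the components with fixed volumes in the protocol.
# Template DNA (one per well) and the variable MgCl2/MgSO4 and water volumes are
# added separately and are therefore not part of this master mix.
PER_REACTION_UL = {
    "fluorescent tag (5 uM)": 0.6,
    "non-tagged primer (5 uM)": 0.6,
    "tagged primer (0.5 uM)": 0.6,
    "dNTP mix (2.5 mM)": 0.7,
    "GoTaq Flexi Buffer": 2.4,
    "polymerase (Promega)": 0.06,
    "BSA (100X)": 0.06,
}


def master_mix(n_reactions, overage=0.10):
    """Volume (uL) of each fixed component for n reactions plus an overage fraction."""
    factor = n_reactions * (1 + overage)
    return {name: round(vol * factor, 3) for name, vol in PER_REACTION_UL.items()}


if __name__ == "__main__":
    for name, vol in master_mix(96).items():
        print(f"{name:<26}{vol:>8.2f} uL")
    print("fixed volume per reaction:", round(sum(PER_REACTION_UL.values()), 2), "uL")
```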
Genetic statistical analyses
GENALEX ver. 6.4.1 (Peakall and Smouse 2006; Peakall and Smouse 2012) was used to calculate deviation from Hardy–Weinberg equilibrium (HWE) and the number of alleles for each locus (Additional file 1: Table S2). Linkage disequilibrium was tested using GENEPOP ver. 4.0.10 (Raymond and Rousset 1995; Rousset 2008). The presence of null alleles was assessed using MICRO-CHECKER (Van Oosterhout et al. 2004). Genotypes were analyzed using the Bayesian clustering program STRUCTURE ver. 2.4.2 (Pritchard et al. 2000), with a burn-in and run length of 50,000 generations each and ten independent replicates for each STRUCTURE analysis. STRUCTURE HARVESTER (Earl and vonHoldt 2012) was used to determine the value of K that best describes the likely number of genetic groups in the data set. GENALEX was used to produce a Principal Coordinates Analysis (PCoA) to examine variation in the dataset. Lynch & Ritland (1999) pairwise relatedness (r) statistics were calculated between all 122 samples, resulting in 7381 pairwise r-values showing degrees of relatedness. For each strain, the r-mean and standard deviation (SD) were calculated by averaging among all samples. Obvious outliers were identified as the samples with the lowest r-mean; these samples were removed iteratively, and relatedness among the remaining samples in the subset was recalculated. A graph was generated for 12 popular strains (Table 2) to show how r-mean values within a strain change when outliers are removed.
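STRUCTURE HARVESTER selects K using the ΔK statistic of Evanno et al. (2005), which measures how sharply the mean log probability of the data, ln P(X|K), changes between neighbouring K values relative to its spread across replicate runs. A minimal sketch, assuming the ln P(X|K) values have already been collected from the replicate runs for each K and computing the second difference from replicate means; the function name and the example values are hypothetical, and this is not STRUCTURE HARVESTER's own code.

```python
from statistics import mean, stdev


def evanno_delta_k(lnp_by_k):
    """Delta K (Evanno et al. 2005) from replicate STRUCTURE runs.

    lnp_by_k maps K -> list of ln P(X|K) values, one per replicate run.
    Delta K is only defined for K values with neighbours at K-1 and K+1.
    """
    means = {k: mean(v) for k, v in lnp_by_k.items()}
    delta = {}
    for k in sorted(lnp_by_k):
        if k - 1 not in means or k + 1 not in means:
            continue
        # |L''(K)| = |L(K+1) - 2 L(K) + L(K-1)|, using replicate means
        second_diff = abs(means[k + 1] - 2 * means[k] + means[k - 1])
        sd = stdev(lnp_by_k[k])  # spread of ln P(X|K) across replicates at K
        if sd > 0:
            delta[k] = second_diff / sd
    return delta


if __name__ == "__main__":
    runs = {  # hypothetical ln P(X|K) values, three replicates per K
        1: [-3000.0, -3001.0, -2999.0],
        2: [-2700.0, -2702.0, -2698.0],
        3: [-2690.0, -2692.0, -2688.0],
        4: [-2685.0, -2687.0, -2683.0],
    }
    dk = evanno_delta_k(runs)
    print(dk, "best K =", max(dk, key=dk.get))
```

Note that ΔK cannot be evaluated for the smallest or largest K tested, so K = 1 is never selected by this statistic alone.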
The microsatellite analyses show genetic inconsistencies in Cannabis strains acquired from different facilities. While popular strains were widely available, some strains were found only at two dispensaries (Table 1). Since the aim of the research was not to identify specific locations where strain inconsistencies were found, dispensaries are coded to protect the identity of businesses.
There was no evidence of linkage disequilibrium when all samples were treated as a single population. All loci deviated significantly from HWE, and all but one locus were monomorphic in at least two strains. All but one locus showed excess homozygosity, which may indicate null alleles. Given the inbred nature and extensive hybridization of Cannabis, deviations from neutral expectations are not surprising, and the lack of linkage disequilibrium indicates that the markers span multiple regions of the genome. The number of alleles ranged from 5 to 10 across the ten loci (Additional file 1: Table S2). There was no evidence of null alleles due to scoring errors.
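Excess homozygosity appears as observed heterozygosity (Ho) falling below the heterozygosity expected from allele frequencies (He = 1 − Σp²). The sketch below computes both for a single locus, together with the fixation index F = 1 − Ho/He, which is positive when homozygotes are in excess. It is not the HWE chi-square test performed in GENALEX, and the function name and genotypes (allele sizes in bp) are hypothetical.

```python
from collections import Counter


def heterozygosity(genotypes):
    """Observed (Ho) and expected (He) heterozygosity and F for one locus.

    genotypes: list of (allele, allele) tuples; None marks a missing call.
    """
    called = [g for g in genotypes if g is not None]
    ho = sum(a != b for a, b in called) / len(called)
    counts = Counter(allele for g in called for allele in g)
    total = sum(counts.values())
    he = 1.0 - sum((c / total) ** 2 for c in counts.values())
    f = 1.0 - ho / he if he > 0 else float("nan")  # F > 0: homozygote excess
    return ho, he, f


if __name__ == "__main__":
    locus = [(180, 180), (180, 180), (183, 183), (180, 183), None]  # hypothetical
    print(heterozygosity(locus))
```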
STRUCTURE HARVESTER calculated high support (∆K = 146.56) for two genetic groups, K = 2 (Additional file 2: Figure S1). STRUCTURE assignment is shown in Fig. 1, with the strains ordered by the purported proportions of Sativa phenotype (Wikileaf 2018). A clear genetic distinction between Sativa and Indica types would assign 100% Sativa strains (“Durban Poison”) to one genotype and 100% Indica strains (“Purple Kush”) to the other genotype (Table 2, Fig. 1, Additional file 3: Figure S2). Division into two genetic groups does not support the commonly described Sativa and Indica phenotypes. “Durban Poison” and “Purple Kush” follow what we would expect if there was support for the Sativa/Indica division: seven of nine “Durban Poison” (100% Sativa) samples had 96% assignment to genotype 1, and three of four “Purple Kush” (100% Indica) samples had 89% assignment to genotype 2 (Fig. 1, Additional file 3: Figure S2). However, samples of “Hawaiian” (90% Sativa) and “Grape Ape” (100% Indica) do not show consistent patterns of predominant assignment to genotype 1 or 2. Interestingly, two predominantly Sativa strains, “Durban Poison” (100% Sativa) and “Sour Diesel” (90% Sativa), have 86% and 14% average assignment to genotype 1, respectively. Hybrid strains such as “Blue Dream” and “Tahoe OG” (50% Sativa) should show some proportion of shared ancestry, with assignment to both genotype 1 and 2. Eight of nine samples of “Blue Dream” show > 80% assignment to genotype 1, and three of four samples of “Tahoe OG” show
Fig. 1 Bar plot graphs generated from STRUCTURE analysis for 122 individuals from 30 strains dividing genotypes into two genetic groups, K = 2. Samples were arranged by purported proportions from 100% Sativa to 100% Indica (Wikileaf 2018) and then alphabetically within each strain by city. Each strain includes the reported proportion of Sativa in parentheses (Wikileaf 2018), and each sample includes the coded location and city from where it was acquired. Each bar indicates proportion of assignment to genotype 1 (blue) and genotype 2 (yellow).
A Principal Coordinates Analysis (PCoA) was conducted using GENALEX (Fig. 2). Samples in the PCoA are color-coded from 100% Sativa types (red), through all levels of Hybrid types (green, 50:50), to 100% Indica types (purple; Fig. 2). Strains with the same reported proportions share a color but have different symbols. The PCoA of all strains represents 14.90% of the variation in the data on coordinate axis 1, 9.56% on axis 2, and 7.07% on axis 3 (not shown).
Fig. 2 Principal Coordinates Analysis (PCoA) generated in GENALEX using Nei’s genetic distance matrix. Samples are color-coded along a continuum by proportion of Sativa (Table 1), with the strain name given for each sample: Sativa type (red: 100% Sativa proportion), Hybrid type (dark green: 50% Sativa proportion), and Indica type (purple: 0% Sativa proportion). Different symbols indicate different strains within a reported phenotype. Coordinate axis 1 explains 14.29% of the variation, coordinate axis 2 explains 9.56%, and coordinate axis 3 (not shown) explains 7.07%.
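Principal coordinates analysis places samples along axes that preserve their pairwise distances as closely as possible, and the share of variation on each axis comes from the eigenvalues. A minimal sketch of classical (metric) PCoA using NumPy, assuming a symmetric distance matrix is already available; it does not compute Nei’s distance, and GENALEX offers its own PCoA options, so this generic version should not be expected to reproduce the percentages reported here. The function name is hypothetical.

```python
import numpy as np


def pcoa(dist):
    """Classical (metric) principal coordinates analysis.

    dist: square, symmetric matrix of pairwise distances.
    Returns (coordinates, percent of variation per axis) for the axes with
    positive eigenvalues, ordered from largest to smallest.
    """
    d = np.asarray(dist, dtype=float)
    n = d.shape[0]
    a = -0.5 * d ** 2
    centering = np.eye(n) - np.ones((n, n)) / n
    b = centering @ a @ centering          # double-centred matrix
    eigvals, eigvecs = np.linalg.eigh(b)   # eigh because b is symmetric
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    keep = eigvals > 1e-10 * max(eigvals.max(), 1.0)  # drop ~0 and negative axes
    coords = eigvecs[:, keep] * np.sqrt(eigvals[keep])
    percent = 100.0 * eigvals[keep] / eigvals[keep].sum()
    return coords, percent


if __name__ == "__main__":
    # Hypothetical distances between three samples lying on a line at 0, 1 and 3.
    coords, pct = pcoa([[0, 1, 3], [1, 0, 2], [3, 2, 0]])
    print(coords.round(3), pct.round(2))
```

Percentages here are relative to the positive eigenvalues only; with non-Euclidean genetic distances some eigenvalues can be negative, and software packages differ in how they handle them.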
Lynch & Ritland (1999) pairwise genetic relatedness (r) between all 122 samples was calculated in GENALEX. The resulting 7381 pairwise r-values were converted to a heat map using purple to indicate the lowest pairwise relatedness value (−1.09) and green to indicate the highest (1.00; Additional file 4: Figure S3). Comparisons are detailed for six popular strains (Fig. 3) to illustrate the relationships among samples from different sources and the impact of outliers. Values close to 1.00 indicate a high degree of relatedness (Lynch and Ritland 1999), which could be indicative of clones or seeds from the same mother (Green 2005; SeedFinder 2018a). First-order relatives (full siblings or mother-daughter) share 50% genetic identity (r-value = 0.50), second-order relatives (half siblings or cousins) share 25% genetic identity (r-value = 0.25), and unrelated individuals are expected to have an r-value of 0.00 or lower. Negative values arise when individuals are less related than expected under random mating (panmixia) (Moura et al. 2013; Norman et al. 2017).
Fig. 3 Heat maps of six prominent strains (a–f) using Lynch & Ritland (1999) pairwise genetic relatedness (r) values: purple indicates no genetic relatedness (minimum value −1.09) and green indicates a high degree of relatedness (maximum value 1.0). Sample strain names and locations of origin are indicated along the top and down the left side of the chart. Pairwise genetic relatedness (r) values are given in each cell, and cell color reflects the degree to which two individuals are related.
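For readers unfamiliar with the estimator, the sketch below follows our reading of the Lynch & Ritland (1999) formulas. For a reference individual x with alleles a and b at a locus, relatedness to y (alleles c and d) is estimated from the sample allele frequencies p_a and p_b; loci are combined using the weights proposed in that paper, and the two one-way estimates (x as reference, then y) are averaged. GENALEX may differ in implementation details, and the function names, toy genotypes, and allele frequencies are hypothetical.

```python
def _locus_terms(x, y, freqs):
    """Return (w * r, w) for one locus with x as the reference individual.

    x = (a, b) and y = (c, d) are genotypes; freqs maps allele -> frequency.
    r = num / den and w = den / (2 pa pb), so w * r = num / (2 pa pb).
    """
    a, b = x
    c, d = y
    pa, pb = freqs[a], freqs[b]

    def s(i, j):  # 1 if the two alleles are identical, else 0
        return 1.0 if i == j else 0.0

    num = pa * (s(b, c) + s(b, d)) + pb * (s(a, c) + s(a, d)) - 4 * pa * pb
    den = (1 + s(a, b)) * (pa + pb) - 4 * pa * pb
    scale = 2 * pa * pb
    return num / scale, den / scale


def lynch_ritland_r(gx, gy, freqs):
    """Symmetric multilocus relatedness between two multilocus genotypes.

    gx, gy: lists of (allele, allele) per locus; freqs: one dict per locus.
    """
    def one_way(ref, other):
        wr_sum = w_sum = 0.0
        for xg, yg, f in zip(ref, other, freqs):
            wr, w = _locus_terms(xg, yg, f)
            wr_sum += wr
            w_sum += w
        return wr_sum / w_sum  # weighted mean of single-locus estimates

    return 0.5 * (one_way(gx, gy) + one_way(gy, gx))


if __name__ == "__main__":
    freqs = [{180: 0.5, 183: 0.3, 186: 0.2},          # hypothetical frequencies
             {200: 0.6, 206: 0.4},
             {300: 0.25, 303: 0.25, 306: 0.5}]
    x = [(180, 183), (200, 200), (303, 306)]
    z = [(186, 186), (206, 206), (300, 300)]
    print(lynch_ritland_r(x, x, freqs), lynch_ritland_r(x, z, freqs))
```

Identical genotypes give r = 1, and samples sharing no alleles give negative values, which can fall below −1 (as in the heat map minimum of −1.09). With 122 samples there are 122 × 121 / 2 = 7381 unordered pairs, matching the number of pairwise r-values reported above.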
Individual pairwise r-values were averaged within strains to calculate the overall r-mean, a measure of genetic similarity within strains, which ranged from −0.22 (“Tangerine”) to 0.68 (“Island Sweet Skunk”) (Table 3). Standard deviations ranged from 0.04 (“Jack Herer”) to 0.51 (“Bruce Banner”). Higher standard deviations indicate a wide range of genetic relatedness within a strain, while low values indicate that samples within a strain share similar levels of genetic relatedness. To determine how outliers affect overall relatedness in a strain, the farthest outlier (lowest pairwise r-mean value) was removed and the overall r-mean and SD within strains were recalculated (Table 3). In all strains, the overall r-mean increased when outliers were removed. In strains with more than three samples, a second outlier was removed and the overall r-mean and SD were recalculated. Overall r-means were used to classify the degree of relatedness as clonal (or from stable seed; overall r-mean > 0.9), first or higher order relatives (overall r-mean 0.46–0.89), second-order relatives (overall r-mean 0.26–0.45), low levels of relatedness (overall r-mean 0.00–0.25), and not related (overall r-mean < 0.00).
Table 3 Lynch & Ritland (1999) pairwise relatedness comparisons of overall r-means (Mean) and standard deviations (SD) for samples of 30 strains, including r-mean and SD after the first and second (where possible) outliers were removed. Outliers were samples with the lowest r-mean.
Fig. 4 Mean pairwise genetic relatedness (r) initially (light purple) and after the removal of one (medium purple) or two (dark purple) outlying samples in 12 popular strains.
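The outlier procedure and relatedness categories described above can be sketched as follows. Pairwise r-values are stored in a dictionary keyed by unordered sample pairs. The sample whose mean r to the rest of its strain is lowest is removed, and the strain's r-mean and SD are recalculated, stopping when two samples remain, so strains with more than three samples lose at most two outliers. Treating the category boundaries as half-open intervals is our assumption, and all names and values are hypothetical.

```python
from itertools import combinations
from statistics import mean, stdev


def strain_summary(r, samples):
    """r-mean and SD of all pairwise r-values among `samples`."""
    vals = [r[frozenset(pair)] for pair in combinations(samples, 2)]
    return mean(vals), (stdev(vals) if len(vals) > 1 else 0.0)


def remove_outliers(r, samples, max_removed=2):
    """Iteratively drop the sample with the lowest mean r to the others.

    Stops after `max_removed` removals or when only two samples remain.
    Returns the kept samples and the (r-mean, SD) after each step.
    """
    kept = list(samples)
    history = [strain_summary(r, kept)]
    for _ in range(max_removed):
        if len(kept) <= 2:
            break
        mean_r = {s: mean([r[frozenset((s, t))] for t in kept if t != s]) for s in kept}
        kept.remove(min(kept, key=mean_r.get))
        history.append(strain_summary(r, kept))
    return kept, history


def relatedness_class(r_mean):
    """Relatedness categories from the text, as half-open intervals."""
    if r_mean > 0.9:
        return "clonal or stable seed"
    if r_mean >= 0.46:
        return "first-order relatives"
    if r_mean >= 0.26:
        return "second-order relatives"
    if r_mean >= 0.0:
        return "low relatedness"
    return "not related"


if __name__ == "__main__":
    pairs = {("A", "B"): 0.95, ("A", "C"): 0.90, ("B", "C"): 0.92,  # hypothetical
             ("A", "D"): -0.20, ("B", "D"): -0.10, ("C", "D"): 0.00}
    r = {frozenset(p): v for p, v in pairs.items()}
    kept, history = remove_outliers(r, ["A", "B", "C", "D"])
    for m, sd in history:
        print(f"r-mean {m:.3f}  SD {sd:.3f}  {relatedness_class(m)}")
```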
Cannabis is an increasingly prominent topic of discussion, so it is important that scientists and the public can discuss Cannabis using shared terms. Currently, not only are Sativa and Indica types disputed (Emboden 1974; Hillig 2005; Russo 2007; Clarke and Merlin 2013; Clarke et al. 2015; Clarke and Merlin 2016; McPartland 2017; Piomelli and Russo 2016; Small 2015b; De Meijer and Keizer 1996), but experts are also at odds about nomenclature for Cannabis (Emboden 1974; Hillig 2005; Russo 2007; Clarke and Merlin 2013; Clarke et al. 2015; Clarke and Merlin 2016; McPartland 2017; Piomelli and Russo 2016; Small 2015b; De Meijer and Keizer 1996). We postulated that samples with the same strain name should have identical, or at least highly similar, genotypes regardless of their source. The multiple genetic analyses used here address paramount questions for the medical Cannabis community and bring empirical evidence to support claims that inconsistent products are being distributed. An important element of this study is that samples were acquired from multiple locations to maximize the potential for variation among samples. Maintaining genetic integrity through genotyping is possible only after genetic consistency has been evaluated, and continuing to overlook this aspect will promote genetic variability and phenotypic variation within Cannabis. Addressing strain variability at the molecular level is of the utmost importance while the industry is still relatively new.
Genetic analyses have consistently found genetic distinction between hemp and marijuana, but no clear distinction has been shown between the common descriptions of Sativa and Indica types (Lynch et al. 2016; Soler et al. 2017; Sawler et al. 2015; Dufresnes et al. 2017; De Meijer and Keizer 1996). We found high support for two genetic groups in the data (Fig. 1) but no discernible distinction or pattern between the described Sativa and Indica strains. The color-coding of strains in the PCoA for all 122 samples allows visualization of clustering among similar phenotypes by color: Sativa (red/orange), Indica (blue/purple) and Hybrid (green) type strains (Fig. 2). If genetic differentiation of the commonly perceived Sativa and Indica types previously existed, it is no longer detectable in the neutral genetic markers used here. Extensive hybridization and selection have presumably created a homogenizing effect and erased evidence of potentially divergent historical genotypes.
Wikileaf maintains that the proportions of Sativa and Indica reported for strains are largely based on genetics and lineage (Nelson 2016), although online databases do not give scientific evidence for their categorization other than parentage information from breeders and expert opinions. This lineage information has seemingly become convoluted over time (Russo 2007; Clarke and Merlin 2013; Small 2015a; Small 2016). Our results show that commonly reported levels of Sativa, Indica and Hybrid type strains are often not reflected in the average genotype. For example, two described Sativa type strains, “Durban Poison” and “Sour Diesel”, have contradictory genetic assignments (Fig. 1, Table 2). This analysis indicates that strains with similar reported proportions of Sativa or Indica may have differing genetic assignments. Further illustrating this point, “Bruce Banner”, “Flo”, “Jillybean”, “Pineapple Express”, “Purple Haze”, and “Tangerine” are all reported to be 60/40 Hybrid type strains, but they clearly have differing levels of admixture both within and among these reportedly similar strains (Table 2, Fig. 1). From these results, we conclude that reported ratios or differences between Sativa and Indica phenotypes are not discernible using these genetic markers. Given the lack of genetic distinction between Indica and Sativa types, it is not surprising that reported ancestry proportions are also not supported.
To accurately address reported variation within strains, samples were purchased from various locations as a customer would, with no information about strains other than publicly available online information. Evidence for genetic inconsistencies is apparent within many strains and is supported by multiple genetic analyses. Soler et al. (2017) found genetic variability among seeds from the same strain supplied from a single source, indicating that genotypes within strains are variable. The STRUCTURE genotype assignments show that many strains contained one or more divergent samples differing by > 0.10 in genotype assignment (e.g. “Durban Poison” – Denver 1; Figs. 1, 3a). Of the 30 strains examined, only four had consistent STRUCTURE genotype assignment and admixture among all samples. The number of strains with consistent STRUCTURE assignments increased to 11 and 15 when one or two samples were ignored, respectively. These results indicate that half of the included strains showed relatively stable genetic identity among most samples. Six strains had only two samples, which differed from each other (e.g., “Trainwreck” and “Headband”). The remaining nine strains in the analysis had more than one divergent sample (e.g., “Sour Diesel”) or had no consistent genetic pattern among the samples within the strain (e.g., “Girl Scout Cookies”; Table 3, Figs. 1, 2, Additional file 3: Figure S2). It is noteworthy that many of the strains used here fell into a range of genetic relatedness indicative of first-order relatives (see Lynch & Ritland analysis below) when samples with high genetic divergence were removed from the data set (Table 3; Figs. 3, 4). Eight of the 30 strains examined are identified as clone-only (Table 2), yet all eight show differentiation of at least one sample within the strain (Fig. 1).
For example, one sample of “Blue Dream” is clearly differentiated from the remaining eight, and “Girl Scout Cookies” has little genetic cohesiveness among the eight samples (Figs. 1, 2). Other genetic studies have similarly found genetic inconsistencies across samples within the same strain (Lynch et al. 2016; Soler et al. 2017; Sawler et al. 2015). These results lend support to the idea that unstable genetic lines are being used to produce seed.
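The criterion of a > 0.10 difference in STRUCTURE assignment can be made explicit. Because K = 2, each sample's assignment to genotype 1 (q) fully describes its assignment. The sketch below flags a sample as divergent when its q differs by more than 0.10 from the median q of the other samples in the same strain, and counts a strain as consistent after ignoring m samples when no more than m samples are flagged and at least two unflagged samples remain. Comparing against the median of the remaining samples, and the two-sample minimum, are our assumptions rather than rules stated in the text; names and values are hypothetical.

```python
from statistics import median


def flag_divergent(q_by_sample, threshold=0.10):
    """Samples whose genotype-1 assignment differs by more than `threshold`
    from the median assignment of the other samples in the strain."""
    flagged = []
    for name, q in q_by_sample.items():
        others = [v for other, v in q_by_sample.items() if other != name]
        if others and abs(q - median(others)) > threshold:
            flagged.append(name)
    return flagged


def is_consistent(q_by_sample, max_ignored=0, threshold=0.10):
    """Consistent if at most `max_ignored` samples are flagged and at least
    two unflagged samples remain (hypothetical rule)."""
    flagged = flag_divergent(q_by_sample, threshold)
    return len(flagged) <= max_ignored and len(q_by_sample) - len(flagged) >= 2


if __name__ == "__main__":
    strain = {"city1": 0.97, "city2": 0.95, "city3": 0.96, "city4": 0.40}  # hypothetical
    print(flag_divergent(strain), is_consistent(strain, 0), is_consistent(strain, 1))
```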
While collecting samples from various dispensaries, we noted that “Chemdawg” strains carried various spellings of the strain name, as well as numbers and/or letters attached to the name. Without knowledge of the history of “Chemdawg”, we assumed these were local variations, and they were acquired for the study to determine if and how these variants were related. Investigating the possible origins of “Chemdawg” uncovered an interesting history, especially in light of the results. Legend has it that someone named “Chemdog” (a person) grew the variations (“Chem Dog”, “Chem Dog D”, “Chem Dog 4”) from seeds he found in a single bag of Cannabis purchased at a Grateful Dead concert (Danko 2016). However, our sampling suggests that dispensaries use variations of the name, most often the “Chemdawg” form, albeit incorrectly (Danko 2016). The STRUCTURE analysis indicates that only one “Chemdawg” individual has > 0.10 genetic divergence compared to the other six samples (Fig. 1, Additional file 3: Figure S2). Five of seven “Chemdawg” samples cluster in the PCoA (Fig. 2), and six of seven “Chemdawg” samples are first-order relatives (r-value > 0.50; Table 3, Fig. 3). The history of “Chem Dog” is currently unverifiable, but the analysis supports the possibility that these variations came from seeds of the same plant. This illustrates how Cannabis strains may have come to market in a non-traditional manner. Genetic analyses can add scientific support to the stories behind vintage strains and possibly help clarify the history of specific strains.
Genetic inconsistencies may originate with both suppliers and growers of Cannabis clones and stable seed, because at present they can only assume that the strains they possess are true to name. The chain of events from seed to sale relies heavily on the supplier, grower, and dispensary to provide the correct product, but there is currently no reliable way to verify Cannabis strains. Errors in plant labeling, misplacement, misspelling (e.g. “Chem Dog” vs. “Chemdawg”), and/or relabeling are possible along the entire chain of production. Although the expectation is that plants are labeled carefully and not relabeled with a more desirable name for a quick sale, these possibilities must be considered. Identification by genetic markers has largely eliminated these types of mistakes in other widely cultivated crops such as grapes, olives and apples. Modern genetic applications can accurately identify varieties and can clarify ambiguity in closely related and hybrid species (Guilford et al. 1997; Hokanson et al. 1998; Sarri et al. 2006; Costantini et al. 2005; United States Department of Agriculture 2014).
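Variety identification with microsatellite markers ultimately rests on comparing multilocus genotypes. The sketch below assumes a hypothetical representation (locus name mapped to a pair of allele sizes in base pairs, or `None` for a failed amplification) and counts exact matches over loci scored in both samples. The locus names, functions, and data are illustrative assumptions, not the markers or scoring rules used in this study.

```python
from typing import Dict, Optional, Tuple

# Hypothetical representation: locus name -> (allele size, allele size) in bp,
# or None when the locus failed to amplify.
Genotype = Dict[str, Optional[Tuple[int, int]]]


def compare_genotypes(a: Genotype, b: Genotype) -> Tuple[int, int]:
    """Return (matching loci, loci scored in both samples)."""
    matches = compared = 0
    for locus in sorted(a.keys() & b.keys()):
        ga, gb = a[locus], b[locus]
        if ga is None or gb is None:
            continue  # missing data: skip rather than count as a mismatch
        compared += 1
        if sorted(ga) == sorted(gb):  # allele order within a genotype is irrelevant
            matches += 1
    return matches, compared


def same_genotype(a: Genotype, b: Genotype) -> bool:
    """True when every locus scored in both samples matches exactly."""
    matches, compared = compare_genotypes(a, b)
    return compared > 0 and matches == compared


a = {"SSR01": (210, 214), "SSR02": (180, 180), "SSR03": (95, 101)}
b = {"SSR01": (214, 210), "SSR02": (180, 184), "SSR03": None}
print(compare_genotypes(a, b))  # (1, 2)
print(same_genotype(a, b))      # False
```

Exact matching makes no allowance for allele-scoring error; practical identification systems typically tolerate a small number of mismatches or use a distance measure instead.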
Matching genotypes were expected within the same strain, but highly similar genotypes between samples of different strains could result from mislabeling or misidentification, especially when the samples were acquired from the same source. We therefore examined the pairwise genetic relatedness r-values for possible instances of mislabeling or relabeling. In some instances, samples of different strains had r-values = 1.0 (Additional file 4: Figure S3), indicating clonal genetic relationships. Two samples with matching genotypes were obtained from the same location (“Larry OG” and “Tahoe OG” from San Luis Obispo 3). Because these two samples also have similar names, this could be evidence of mislabeling or misidentification: it is unlikely that two genuinely different strains would have identical genotypes, and more likely that these samples were mislabeled at some point. Misspelling may also be a source of error, especially when facilities handwrite labels. A possible example of misspelling is the sample labeled “Chemdog 1” from Garden City 1. The described strain “Chemdawg 1” could easily have been misspelled, but it is unclear whether this instance reflects mislabeling or the renaming of a local variant. Inadvertent mistakes may carry through to scientific investigations in which strains are spelled or labeled incorrectly. For example, Vergara et al. (2016) report genome assemblies for “Chemdog” and “Chemdog 91” as they are listed in GenBank (GCA_001509995.1), but neither label is a recognized strain name. “Chemdawg” and “Chemdawg 91” are recognized strains (Leafly 2018a; Wikileaf 2018; cannabis.info 2018; NCSM 2018; PotGuide.com 2018; Seedfinder 2018). According to the original source, however, the name “Chemdawg” is itself incorrect and should be “Chem Dog” (Danko 2016), although the name has clearly evolved among growers since the strain emerged in 1991 (Danko 2016).
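The screen described above, looking for pairs of samples with different strain names but clonal relatedness, can be expressed as a simple filter over all sample pairs. This is a sketch under stated assumptions: the record fields (`id`, `strain`, `source`), the 0.99 tolerance standing in for r = 1.0, and the example data are all hypothetical.

```python
from itertools import combinations
from typing import Dict, List, Tuple


def flag_possible_mislabels(
    samples: List[Dict[str, str]],
    r: List[List[float]],
    clonal_threshold: float = 0.99,
) -> List[Tuple[str, str, bool]]:
    """Return (id, id, same_source) for differently named samples with clonal-level relatedness.

    r[i][j] is the pairwise relatedness of samples i and j; clonal_threshold is a
    hypothetical tolerance for treating an estimate as r = 1.0.
    """
    flagged = []
    for i, j in combinations(range(len(samples)), 2):
        different_names = samples[i]["strain"] != samples[j]["strain"]
        if different_names and r[i][j] >= clonal_threshold:
            # Pairs from the same facility are the strongest mislabeling candidates.
            same_source = samples[i]["source"] == samples[j]["source"]
            flagged.append((samples[i]["id"], samples[j]["id"], same_source))
    return flagged


samples = [
    {"id": "S1", "strain": "Strain A", "source": "Dispensary 1"},
    {"id": "S2", "strain": "Strain B", "source": "Dispensary 1"},
    {"id": "S3", "strain": "Strain A", "source": "Dispensary 2"},
]
r = [
    [1.0, 1.0, 0.6],
    [1.0, 1.0, 0.55],
    [0.6, 0.55, 1.0],
]
print(flag_possible_mislabels(samples, r))  # [('S1', 'S2', True)]
```

A flagged pair is only a candidate for follow-up: identical genotypes show that two samples are clonal, not which of the two labels (if either) is correct.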
Another potential source of confusion is how information is reported in public databases. For example, data are available in GenBank for the reported monoisolate of “Pineapple Banana Bubba Kush” (SAMN06546749); while “Pineapple Kush”, “Banana Kush” and “Bubba Kush” are known strains (Leafly 2018a; Wikileaf 2018; cannabis.info 2018; NCSM 2018; PotGuide.com 2018; Seedfinder 2018), the only record we found of “Pineapple Banana Bubba Kush” is in GenBank. This study has highlighted several possible sources of error and shown how genotyping can help uncover sources of variation. Although we were unable to confirm the sources of error, it is important that producers, growers and consumers are aware that such errors occur and that they should be documented and corrected whenever possible.
Over the last decade, the legal status of Cannabis has shifted: it is now legal for medical use, and in some cases recreational adult use, in the majority of the United States, and several other countries have legalized or decriminalized it. These legal changes have led to an unprecedented increase in the number of strains available to consumers. There are currently no baseline genotypes for any strains, and steps should be taken to ensure that products marketed as a particular strain are genetically congruent. Although the sampling in this study was not exhaustive, the results are clear: strain inconsistency is evident and is not limited to a single source but exists among dispensaries across cities in multiple states. Various proposals for naming the genetic variants do not seem to align with the current widespread definitions of Sativa, Indica, Hybrid, and Hemp (Hillig 2005; Clarke and Merlin 2013). As our knowledge of Cannabis grows, so does the communication gap between scientific researchers and the public. Currently, there is no way for Cannabis suppliers, growers or consumers to definitively verify strains. Exclusion from USDA protections, due to the federal status of Cannabis as a Schedule I drug, has created avenues for error and inconsistency. Presumably, these genetic inconsistencies will often manifest as differences in overall effects (Minkin 2014). Such differences within a named strain may surprise a recreational user, but they may be more serious for a medical patient who relies on a particular strain to alleviate specific symptoms.
This study shows that, at neutral genetic markers, there is no consistent genetic differentiation between the widely held perceptions of Sativa and Indica Cannabis types. Moreover, the genetic analyses do not support the reported proportions of Sativa and Indica within each strain, which is expected given the lack of genetic distinction between the two types. There may be landrace strains that separate phenotypically and genetically into Sativa and Indica types; however, our sampling does not include enough of these strains to define them as two potentially distinct genotypes. Recent and intense breeding efforts to create novel strains have likely merged the two types and blurred their previous separation. Nevertheless, categorizing strains this way helps consumers communicate their preference for a spectrum of effects (e.g., Sativa-dominant Hybrid), and the vernacular will likely persist despite the lack of evidence for genetic differentiation.
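One way to ask whether genetic clusters correspond to reported Sativa/Indica/Hybrid labels is to assign each sample to the cluster with its highest STRUCTURE membership coefficient (Q) and then measure how dominated each cluster is by a single reported type. This is a hypothetical sketch, not the test performed in this study; the function names and the example Q-values and labels are invented for illustration.

```python
from collections import Counter
from typing import Dict, List, Sequence


def assign_clusters(q: Sequence[Sequence[float]]) -> List[int]:
    """Assign each sample to the cluster with its highest membership coefficient (Q)."""
    return [max(range(len(row)), key=lambda k: row[k]) for row in q]


def cluster_purity(clusters: Sequence[int], reported: Sequence[str]) -> Dict[int, float]:
    """Fraction of each cluster made up of its single most common reported type."""
    table = Counter(zip(clusters, reported))  # counts of (cluster, reported type)
    sizes = Counter(clusters)                 # samples per cluster
    return {
        c: max(count for (cl, _), count in table.items() if cl == c) / size
        for c, size in sizes.items()
    }


# Hypothetical Q-values for a two-cluster (K = 2) run and the types reported for each sample.
q = [[0.9, 0.1], [0.8, 0.2], [0.7, 0.3], [0.95, 0.05],
     [0.2, 0.8], [0.1, 0.9], [0.3, 0.7], [0.4, 0.6]]
reported = ["Indica", "Sativa", "Hybrid", "Indica",
            "Sativa", "Hybrid", "Indica", "Sativa"]

clusters = assign_clusters(q)
print(clusters)                            # [0, 0, 0, 0, 1, 1, 1, 1]
print(cluster_purity(clusters, reported))  # {0: 0.5, 1: 0.5}
```

Purity close to 1 in every cluster would indicate that clusters track the reported types; values near what a random mix of three labels would give suggest that they do not. A formal test, such as a contingency-table test, would be needed before drawing conclusions.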
We found instances in which samples within a strain are not genetically similar, which is unexpected given the manner in which Cannabis plants are propagated. Although it is impossible to determine the source of these inconsistencies, because they can arise at multiple points in the chain of events from seed to sale, we suggest that misidentification, mislabeling, misplacement, misspelling, and/or relabeling are all possible. Mislabeling is especially likely where names are similar, as may have occurred here. In many cases, genetic inconsistencies within strains were limited to one or two samples. We believe there is a reasonable amount of genetic similarity within many strains, but there is currently no way to verify the “true” genotype of any strain. Although the sampling here covers only a small fraction of the available Cannabis strains, our results give scientific merit to previously anecdotal claims that strains can be unpredictable.