Mitch Cruzan

Date of Award

Spring 6-2-2017

Document Type


Degree Name

Master of Science (M.S.) in Biology



Physical Description

1 online resource (vi, 102 pages)


Phylogeography, Chloroplast DNA, Seeds -- Dispersal, Single nucleotide polymorphisms




Tracking seed dispersal using traditional, direct measurement approaches is difficult and generally underestimates dispersal distances. Variation in chloroplast DNA (cpDNA) haplotypes offers a way to trace past seed dispersal and to make inferences about the factors contributing to present patterns of dispersal. Although cpDNA generally has low levels of intraspecific variation, this limitation can be overcome by assaying the whole chloroplast genome. Whole-genome sequencing is more expensive, but resources can be conserved by pooling samples. Unfortunately, haplotype associations among SNPs are lost in pooled samples, and treating SNP frequencies as independent estimates of variation yields biased estimates of genetic distance. I have developed an application, CallHap, that uses a least-squares algorithm to evaluate the fit between observed and predicted SNP frequencies from pooled samples based on network topology, thus enabling pooled chloroplast sequencing for large-scale studies of chloroplast genomic variation. This method was tested using artificially constructed test networks and pools, and pooled samples of Lasthenia californica (California goldfields) from Whetstone Prairie in southern Oregon, USA. In test networks, CallHap reliably recovered network topologies and haplotype frequencies. Overall, the CallHap pipeline allows for the efficient use of resources for estimation of genetic distance in studies using non-recombining, whole-genome haplotypes, such as intraspecific variation in chloroplast, mitochondrial, bacterial, or viral DNA.
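
The least-squares idea described in the abstract can be illustrated with a minimal sketch. This is not CallHap's actual code: the function name `estimate_haplotype_frequencies`, the 0/1 haplotype matrix layout, the soft sum-to-one constraint, and the toy numbers are all assumptions made for illustration. Given a candidate set of haplotypes, the sketch predicts the SNP frequencies of a pool as a frequency-weighted mix of those haplotypes, solves for the frequencies by ordinary least squares, and reports the squared residual so that competing candidate sets can be compared.

```python
import numpy as np


def estimate_haplotype_frequencies(haplotypes, observed_snp_freqs):
    """Fit haplotype frequencies to pooled SNP frequencies (illustrative only).

    haplotypes: (n_snps, n_haplotypes) matrix, 1 where a haplotype carries
        the alternate allele at a SNP and 0 otherwise (assumed layout).
    observed_snp_freqs: (n_snps,) alternate-allele frequencies in one pool.
    Returns (frequencies, squared_residual).
    """
    haplotypes = np.asarray(haplotypes, dtype=float)
    observed = np.asarray(observed_snp_freqs, dtype=float)

    # Soft constraint: a heavily weighted extra row asks the frequencies
    # to sum to 1 (an assumption of this sketch).
    weight = 1e3
    design = np.vstack([haplotypes, weight * np.ones(haplotypes.shape[1])])
    target = np.append(observed, weight)

    freqs, *_ = np.linalg.lstsq(design, target, rcond=None)

    # Frequencies cannot be negative; clip and renormalize.
    freqs = np.clip(freqs, 0.0, None)
    freqs = freqs / freqs.sum()

    residual = float(np.sum((haplotypes @ freqs - observed) ** 2))
    return freqs, residual


# Toy pool: three haplotypes (columns) scored at two SNPs (rows).
full_set = [[0, 1, 1],
            [0, 0, 1]]
observed = [0.5, 0.2]  # consistent with true frequencies 0.5, 0.3, 0.2

freqs_full, resid_full = estimate_haplotype_frequencies(full_set, observed)

# A candidate set that omits the middle haplotype fits the pool worse.
reduced_set = [[0, 1],
               [0, 1]]
freqs_reduced, resid_reduced = estimate_haplotype_frequencies(reduced_set, observed)

print(freqs_full, resid_full)
print(freqs_reduced, resid_reduced)
```

In this toy case the full candidate set reproduces the observed SNP frequencies almost exactly, while the reduced set leaves a clearly larger residual. That comparison is the sense in which a least-squares fit can be used to choose among candidate haplotype sets; how CallHap searches network topologies is not specified in the abstract and is not modeled here.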

Persistent Identifier