Shaw, K. J. (2022). Population Genetics Model Extensions: Including Indels, Quality Scores, and a Spatial Domain. Retrieved from https://purl.lib.fsu.edu/diginole/Shaw_fsu_0071E_17371
Because of the large stochastic variation in evolution and in DNA, it can be difficult to be certain that inferences drawn from genetic data are correct. However, the more data we incorporate into an analysis, the more confident we can be in its inferences. I investigate the inclusion of three types of data that are typically ignored. I examine the use of insertions and deletions (indels) in maximum likelihood calculations of trees, as well as the use of quality scores in those same calculations, and show that including these data increases the probability that a maximum likelihood method will infer the true evolutionary tree. I also investigate the effects of a geographically continuous population on genetic patterns, on rates of fixation, and on the time to fixation, and compare these results with panmictic models, which ignore spatial structure.
Maximum likelihood techniques typically treat gaps in genetic data as unknown values. A few attempts have been made to build maximum likelihood techniques that include insertion and deletion processes, but these are usually very computationally expensive and infeasible in most cases. The Poisson Indel Process (PIP) was developed to reduce the theoretical cost of these likelihood calculations. I show how its computational cost can be decreased even further, and that it provides slightly more accurate inferences.
All data-gathering techniques introduce some error. DNA sequencers quantify their confidence in each nucleotide call with a quality score, which represents the probability that the sequencer misclassified that nucleotide. Most maximum likelihood techniques ignore these quality scores, and when the scores are very low, the affected data are sometimes discarded. I show how to incorporate quality scores into a maximum likelihood calculation.
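The following is a minimal sketch of one way a quality score can enter a likelihood calculation; it is not necessarily the conversion used in the dissertation. It assumes Phred-scaled scores (Q = -10 log10 e, where e is the probability of a miscall) and assumes that a miscall is equally likely to produce any of the other three bases. The traditional encoding gives the called base a tip likelihood of 1 and every other base 0; under these assumptions the called base instead gets 1 - e and each alternative gets e/3. The function names `phred_to_error`, `tip_vector`, and `site_likelihood` are hypothetical.

```python
from typing import List, Optional

# Order of nucleotides in the tip (partial-likelihood) vector; an assumption for this sketch.
BASES = "ACGT"


def phred_to_error(q: float) -> float:
    """Convert a Phred-scaled quality score Q to an error probability e = 10^(-Q/10)."""
    return 10.0 ** (-q / 10.0)


def tip_vector(base: str, q: Optional[float] = None) -> List[float]:
    """Return P(observed call | true nucleotide) for each true nucleotide in BASES.

    Without a quality score this is the traditional one-hot encoding.
    Gaps and ambiguous symbols (e.g. '-', 'N') are treated as unknown: all ones.
    With a quality score, the called base gets 1 - e and, assuming a miscall is
    equally likely to be any of the other three bases, each other base gets e / 3.
    """
    base = base.upper()
    if base not in BASES:
        return [1.0] * 4  # unknown or gap: every true state is equally compatible
    if q is None:
        return [1.0 if b == base else 0.0 for b in BASES]
    e = phred_to_error(q)
    return [1.0 - e if b == base else e / 3.0 for b in BASES]


def site_likelihood(vec: List[float], freqs: List[float]) -> float:
    """Toy one-taxon likelihood: tip vector weighted by stationary base frequencies."""
    return sum(v * f for v, f in zip(vec, freqs))


if __name__ == "__main__":
    print(tip_vector("G"))      # traditional one-hot vector
    print(tip_vector("G", 20))  # Q20 call: roughly a 1% chance of a miscall
```

In a full tree likelihood, vectors like these would replace the one-hot vectors at the leaves of a pruning-style calculation, so a low-quality call softens the likelihood instead of being discarded outright.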
Incorporating quality scores requires replacing the traditional conversion of nucleotide data into numerical data with a conversion that is more realistic, both biologically and mathematically.
Modelling the spatial structure to which populations are often constrained has proved difficult. A landscape can impose infinitely many types of structure, each with a different effect on the dynamics of the population. I investigate how a continuous landscape, in which individuals can disperse only a finite distance, affects how quickly new alleles spread through the population. The spread of neutral alleles is shown to be independent of population density. The probability of fixation and the expected time to fixation are examined and shown to be affected by both the size and shape of the geographical area and by the population density. The rate of spread of advantageous alleles depends on population density, with diminishing returns as density increases. The probability of fixation depends only slightly on population density and mostly on the selective advantage, but the expected time to fixation depends heavily on population density and geographical area.
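For comparison with spatial results like these, the panmictic baseline can be stated in closed form. The sketch below assumes a haploid Moran model (not necessarily the panmictic model used in the dissertation): in a well-mixed population of N individuals, a single mutant with relative fitness r fixes with probability (1 - 1/r)/(1 - 1/r^N), which reduces to 1/N for a neutral allele. A small Monte Carlo simulation of the same model is included as a cross-check. The function names `moran_fixation_probability` and `simulate_moran` are hypothetical, and no spatial structure or dispersal distance is modelled.

```python
import random


def moran_fixation_probability(n: int, r: float) -> float:
    """Exact fixation probability of one mutant with relative fitness r
    in a panmictic (well-mixed) haploid Moran population of size n."""
    if n < 1:
        raise ValueError("population size must be positive")
    if r == 1.0:
        return 1.0 / n  # neutral case
    return (1.0 - 1.0 / r) / (1.0 - r ** (-n))


def simulate_moran(n: int, r: float, trials: int, seed: int = 1) -> float:
    """Monte Carlo estimate of the same probability (no spatial structure)."""
    rng = random.Random(seed)
    fixed = 0
    for _ in range(trials):
        i = 1  # current number of mutant individuals
        while 0 < i < n:
            # Birth: choose a parent with probability proportional to fitness.
            mutant_birth = rng.random() < r * i / (r * i + (n - i))
            # Death: choose the individual to be replaced uniformly at random.
            mutant_death = rng.random() < i / n
            i += int(mutant_birth) - int(mutant_death)
        if i == n:
            fixed += 1
    return fixed / trials


if __name__ == "__main__":
    print(moran_fixation_probability(100, 1.0))  # neutral: 1/N
    print(moran_fixation_probability(100, 1.1))  # advantageous allele
    print(simulate_moran(10, 1.5, trials=5000))  # close to the exact value for n=10, r=1.5
```

In this well-mixed baseline any individual can replace any other; a spatial variant would instead restrict replacement to neighbours within a finite dispersal distance, which is the kind of structure the spatial analysis above examines.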
Keywords
Fixation, Indels, Maximum Likelihood, Population Genetics, Quality Scores, Wave of Advance
Date of Defense
July 6, 2022.
Submitted Note
A Dissertation submitted to the Department of Scientific Computing in partial fulfillment of the requirements for the degree of Doctor of Philosophy.
Bibliography Note
Includes bibliographical references.
Advisory Committee
Peter Beerli, Professor Directing Dissertation; Anuj Srivastava, University Representative; Bryan Quaife, Committee Member; Sachin Shanbhag, Committee Member; Alan Lemmon, Committee Member.
Publisher
Florida State University
Identifier
Shaw_fsu_0071E_17371