Scolnik, R. (2016). Predictive Accuracy Measures for Binary Outcomes: Impact of Incidence Rate and Optimization Techniques. Retrieved from http://purl.flvc.org/fsu/fd/FSU_2016SP_Scolnik_fsu_0071E_13146
Evaluating the performance of models predicting a binary outcome can be done using a variety of measures. While some measures intend to describe the model's overall fit, others more accurately describe the model's ability to discriminate between the two outcomes. If a model fits well but doesn't discriminate well, what does that tell us? Given two models, if one discriminates well but has poor fit while the other fits well but discriminates poorly, which of the two should we choose? The measures of interest for our research include the area under the ROC curve, Brier Score, discrimination slope, Log-Loss, R-squared and F-score. To examine the underlying relationships among all of the measures, real data and simulation studies are used. The real data comes from multiple cardiovascular research studies and the simulation studies are run under general conditions and also for incidence rates ranging from 2% to 50%. The results of these analyses provide insight into the relationships among the measures and raise concern for scenarios when the measures may yield different conclusions. The impact of incidence rate on the relationships provides a basis for exploring alternative maximization routines to logistic regression. While most of the measures are easily optimized using the Newton-Raphson algorithm, the maximization of the area under the ROC curve requires optimization of a non-linear, non-differentiable function. Usage of the Nelder-Mead simplex algorithm and close connections to economics research yield unique parameter estimates and general asymptotic conditions. Using real and simulated data to compare optimizing the area under the ROC curve to logistic regression further reveals the impact of incidence rate on the relationships, significant increases in achievable areas under the ROC curve, and differences in conclusions about including a variable in a model.
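As a rough illustration of how several of the measures named in the abstract can be computed from predicted probabilities, the following minimal sketch uses plain Python. It is not taken from the dissertation: the function names, the toy data, and the 0.5 classification threshold for the F-score are assumptions made here for illustration. R-squared is left out because the abstract does not say which of its binary-outcome variants is used.

```python
import math

def auc(y, p):
    """Area under the ROC curve as the share of (event, non-event) pairs
    in which the event has the higher predicted probability; ties count 1/2."""
    pos = [pi for yi, pi in zip(y, p) if yi == 1]
    neg = [pi for yi, pi in zip(y, p) if yi == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return wins / (len(pos) * len(neg))

def brier_score(y, p):
    """Mean squared difference between predicted probability and outcome."""
    return sum((pi - yi) ** 2 for yi, pi in zip(y, p)) / len(y)

def discrimination_slope(y, p):
    """Mean predicted probability among events minus that among non-events."""
    pos = [pi for yi, pi in zip(y, p) if yi == 1]
    neg = [pi for yi, pi in zip(y, p) if yi == 0]
    return sum(pos) / len(pos) - sum(neg) / len(neg)

def log_loss(y, p, eps=1e-12):
    """Negative mean Bernoulli log-likelihood; probabilities are clipped
    away from 0 and 1 to avoid log(0)."""
    total = 0.0
    for yi, pi in zip(y, p):
        pi = min(max(pi, eps), 1.0 - eps)
        total += yi * math.log(pi) + (1 - yi) * math.log(1.0 - pi)
    return -total / len(y)

def f_score(y, p, threshold=0.5):
    """F1 score after classifying at `threshold` (0.5 is an assumption here)."""
    tp = sum(1 for yi, pi in zip(y, p) if pi >= threshold and yi == 1)
    fp = sum(1 for yi, pi in zip(y, p) if pi >= threshold and yi == 0)
    fn = sum(1 for yi, pi in zip(y, p) if pi < threshold and yi == 1)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom > 0 else 0.0

# Toy data, invented for illustration only: outcomes and predicted probabilities.
y = [0, 0, 1, 1]
p = [0.1, 0.4, 0.35, 0.8]

for name, fn in [("AUC", auc), ("Brier score", brier_score),
                 ("Discrimination slope", discrimination_slope),
                 ("Log-loss", log_loss), ("F-score", f_score)]:
    print(f"{name}: {fn(y, p):.4f}")
```

The AUC and the discrimination slope depend only on how the predictions separate events from non-events, whereas the Brier score and log-loss also penalize poorly calibrated probabilities. This is one way the measures can lead to different conclusions about the same model.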
A Dissertation submitted to the Department of Statistics in partial fulfillment of the requirements for the degree of Doctor of Philosophy.
Bibliography Note
Includes bibliographical references.
Advisory Committee
Daniel McGee, Professor Co-Directing Thesis; Elizabeth Slate, Professor Co-Directing Thesis; Isaac Eberstein, University Representative; Fred Huffer, Committee Member.
Publisher
Florida State University
Identifier
FSU_2016SP_Scolnik_fsu_0071E_13146
Use and Reproduction
This Item is protected by copyright and/or related rights. You are free to use this Item in any way that is permitted by the copyright and related rights legislation that applies to your use. For other uses you need to obtain permission from the rights-holder(s). The copyright in theses and dissertations completed at Florida State University is held by the students who author them.