Risk estimation using an exact method of fitting log-link models

Zhu, C

doi:10.25959/100.00038453

Zhu_whole_thesis.pdf (2.31 MB)

thesis

posted on 2023-05-28, 12:14 authored by Zhu, C

The relative risk has been widely reported in medical research as a ratio measure of association between covariates for study factors and a binary outcome of interest. It is possible to estimate relative risk through the log binomial model, a member of the family of generalised linear models with binomial errors and logarithmic link. However, since it was first proposed, this model has encountered numerical difficulties which restrict its use in studies using real-world data. The standard fitting algorithm of the log binomial model may fail to converge when the maximum likelihood (ML) solution is on the boundary of the allowable parameter space. If the ML solution lies on the boundary, special methods are needed because at least one vector of covariate values (referred to as a boundary vector) has an estimated probability of unity when evaluated at the ML solution. For a model with a single covariate, Deddens et al. (2003) proposed an exact method based on re-parametrisation of the covariate. Petersen and Deddens (2010) proposed an extension of the exact method to general cases, but the method was incomplete, and the details needed to implement it were missing. In this thesis, we provide those details, including formulae (with proof) for estimating the covariances necessary to implement the method, an explanation (with proof) of an interdependency between coefficient estimates, and a proof that the method can be applied in general. The relevant R package for implementing the exact method is provided.

Another measure of the effect of a risk factor is the risk difference, which is recommended to be reported in clinical trials to assist clinicians in making evidence-based decisions about treatment allocation. It is possible to estimate the risk difference by fitting an identity-link binomial model. However, the standard fitting algorithm of the identity-link binomial model may fail to converge due to two sources of numerical difficulty. The first is an inadmissible starting value, which can cause the fitted probability of some observations to be less than zero or greater than unity. The standard fitting algorithm may not be able to iteratively correct the results if it starts from such a value. To solve this problem, we introduce a starting value calibration that yields an admissible starting value for the standard fitting algorithm in an identity-link binomial model.

The second source of numerical difficulty arises if the ML solution is on the boundary of the allowable parameter space, in which case the standard fitting algorithm in the identity-link binomial model will usually fail to converge. Given the similarity to the log binomial model, an extension of the exact method is introduced to overcome this difficulty. However, the parameter space of identity-link binomial models has two boundaries, lower and upper, whereas the log binomial model only has an upper bound. We provide a strategy to compare and locate the ML solution. Eight theorems and two corollaries, with proofs, are presented to obtain the estimates of coefficients and the relevant variance-covariance matrix. We demonstrate the application of the exact method in detail using example data. A real-world dataset and a designed simulation are provided to further discuss and compare the results of the exact method with other approaches. The relevant R package for implementing the exact method is provided.

The risk ratio (relative risk) is also used as a measure of effect in clustered or longitudinal data. Fitting a marginal log binomial model estimated by generalised estimating equations (marginal LBM by GEE) provides a possible way to estimate the relative risk in correlated data. However, the algorithm may fail to converge even from admissible starting values. Previously published studies of the marginal LBM by GEE have focused on convergence rates and the selection of a working correlation structure. To date, there is no published work accounting for the causes of non-convergence or proposing remedies for it. By investigating data with convergence issues, we found that the marginal LBM by GEE, formulated as a population-averaged model, may also fail to converge, or may converge to an inappropriate solution, when a fitted probability is extremely close or equal to unity. This is similar to the issue in the log binomial model for independent data. We extend the exact method to the marginal LBM by GEE and provide details for its implementation. The properties of the exact estimator are investigated by simulation, and the results are compared with those of a marginal modified Poisson model with log link estimated by generalised estimating equations (marginal Poisson by GEE). The relevant R package for implementing the exact method is provided.

In this thesis, we studied the numerical difficulties in the log binomial model, the identity-link binomial model and the marginal LBM by GEE. Two algorithms were introduced to address the difficulties due to inadmissible starting values for the log and identity-link binomial models. In the presence of boundary vectors, the exact method is effective in estimating the coefficients of covariates in all three models. It can eliminate the influence of boundary vectors and improve the model fitting.
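The notions of an admissible coefficient vector and a boundary vector used in the abstract can be illustrated with a minimal sketch. This is not the thesis's exact method or its R package; the function names (fitted_probability, is_admissible, boundary_vectors), the tolerance and the toy design matrix are all assumptions made for illustration. Under a log link the fitted probability is exp(x'beta), so only an upper bound of unity applies; under an identity link it is x'beta itself, so both zero and unity are bounds.

```python
import math
from typing import List, Sequence


def fitted_probability(x: Sequence[float], beta: Sequence[float], link: str) -> float:
    """Fitted probability for one covariate vector under a log or identity link."""
    eta = sum(xj * bj for xj, bj in zip(x, beta))  # linear predictor x'beta
    if link == "log":
        return math.exp(eta)
    if link == "identity":
        return eta
    raise ValueError("link must be 'log' or 'identity'")


def is_admissible(X: Sequence[Sequence[float]], beta: Sequence[float],
                  link: str, tol: float = 1e-9) -> bool:
    """True if every fitted probability lies in [0, 1] (up to a small tolerance)."""
    return all(-tol <= fitted_probability(x, beta, link) <= 1.0 + tol for x in X)


def boundary_vectors(X: Sequence[Sequence[float]], beta: Sequence[float],
                     link: str, tol: float = 1e-9) -> List[List[float]]:
    """Covariate vectors whose fitted probability sits on a bound.

    Log link: upper bound (unity) only. Identity link: lower (zero) or upper (unity).
    """
    found = []
    for x in X:
        p = fitted_probability(x, beta, link)
        at_upper = abs(p - 1.0) <= tol
        at_lower = link == "identity" and abs(p) <= tol
        if at_upper or at_lower:
            found.append(list(x))
    return found


# Toy design matrix (illustrative only): intercept plus one covariate taking values 0, 1, 2.
X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]]

# Log link: fitted probabilities are 0.25, 0.5 and 1, so [1, 2] is a boundary vector.
beta_log = [-2.0 * math.log(2.0), math.log(2.0)]

# Identity link: fitted probabilities are 0, 0.5 and 1, so both bounds are attained.
beta_identity = [0.0, 0.5]
```

A check of this kind could be run on a candidate starting value before iteration begins, or on the coefficients from a fit that failed to converge, to see whether any fitted probability has reached a bound.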

Publication status

Unpublished

Repository Status

Open

Keywords

relative risk/risk ratio; risk difference; convergence; log binomial model; GEE; boundary; maximum likelihood estimates

Licence

In Copyright
