Highly Stratified Model in Biostatistics (Mentor: Dr. Haimeng Zhang)

The Cox proportional hazards regression model is used in a number of areas in biostatistics and epidemiology to quantify the effects of exposure on survival for a cohort of individuals followed over time. In this model, a common but unspecified baseline
hazard function $\lambda_0(t)$ is assumed to apply to all cohort members. The relation between exposure and failure is of primary interest, and is modeled by a real parameter $\beta$ specifying the increased relative risk, which has the exponential form $e^{\beta z}$, say, for an individual with covariate $z$. More explicitly, we assume that the conditional hazard function satisfies $\lambda(t \mid z) = \lambda_0(t)\, e^{\beta z}$ given a time-independent covariate $z$. In the case where information is available on the entire cohort, the
maximum partial likelihood estimator (MPLE) is often used. In practice, however, it becomes difficult and expensive to collect complete data when dealing with large cohorts, and sampling schemes not only offer substantial savings but ultimately become the only practical alternative. One of the simplest and most popular sampling schemes, the nested case-control sampling (NCCS) design, chooses a fixed number of controls to compare with the failure at each failure time.
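
As a rough illustration of the NCCS design just described, the Python sketch below draws, at each failure time, a fixed number of controls from the subjects still at risk and evaluates the log partial likelihood over the resulting sampled risk sets. The function names, the toy data, the scalar covariate `z`, the scalar parameter `beta`, and the assumption of no tied failure times are all illustrative choices, not part of the project description.

```python
import math
import random


def sample_nested_case_control(times, events, m, rng):
    """Return a list of (case index, sampled risk set) pairs.

    times[i]  : observed time for subject i
    events[i] : 1 if subject i failed at times[i], 0 if censored
    m         : sampled risk-set size (the case plus at most m - 1 controls)
    Assumes no tied failure times (illustrative simplification).
    """
    sampled = []
    for i, (t_i, d_i) in enumerate(zip(times, events)):
        if not d_i:
            continue  # censored subjects never serve as cases
        # Subjects still under observation at t_i, excluding the case itself
        at_risk = [j for j, t_j in enumerate(times) if t_j >= t_i and j != i]
        k = min(m - 1, len(at_risk))
        controls = rng.sample(at_risk, k)
        sampled.append((i, [i] + controls))
    return sampled


def log_partial_likelihood(beta, z, sampled):
    """Log partial likelihood over sampled risk sets for a scalar beta."""
    total = 0.0
    for case, risk_set in sampled:
        denom = sum(math.exp(beta * z[j]) for j in risk_set)
        total += beta * z[case] - math.log(denom)
    return total


# Toy cohort (made-up numbers, for illustration only)
rng = random.Random(0)
times = [2.0, 5.0, 3.0, 7.0, 4.0, 6.0]
events = [1, 0, 1, 1, 0, 1]
z = [1.0, 0.0, 1.0, 0.0, 0.0, 1.0]

sampled = sample_nested_case_control(times, events, m=3, rng=rng)
print(log_partial_likelihood(0.0, z, sampled))
```

Maximizing `log_partial_likelihood` over `beta` (for example by a grid search or a one-dimensional optimizer) would give the MPLE under this sampled design; with `m` set to the cohort size, the sampled risk sets coincide with the full risk sets.
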
It has been shown that the MPLE is efficient, in the sense that it achieves the asymptotic variance lower bound, if information is available on every individual in the cohort. For sampling designs, however, the situation is quite different. It is not always clear whether the estimators typically employed use the sampled data in the most efficient manner. For NCCS, in particular, it has been shown that the MPLE is not efficient in its use of available information in the time-fixed covariates case. In counterpoint to such cases, we explore a highly stratified model, under which the MPLE based on the NCCS design approaches efficiency. More explicitly, under highly stratified situations, or in instances where the covariate values are increasingly less dependent upon the past and there is no censoring, the MPLE uses the available information efficiently in the limit as the number of cohort members tends to infinity. The study of this "efficient" model is valuable for two reasons: it limits the scope of the search for estimators which can
