Zhigang Li Dartmouth Medical School, Hanover, NH 03755, USA ude.htuomtrad@iL.gnagihZ Ian W. McKeague Department of Biostatistics, Columbia University, 722 West 168th Street, New York, NY 10032, USA ude.aibmuloc@1312mi
We consider the problem of calculating power and sample size for tests based on generalized estimating equations (GEE), that arise in studies involving clustered or correlated data (e.g., longitudinal studies and sibling studies). Previous approaches approximate the power of such tests using the asymptotic behavior of the test statistics under fixed alternatives. We develop a more accurate approach in which the asymptotic behavior is studied under a sequence of local alternatives that converge to the null hypothesis at root-m rate, where m is the number of clusters. Based on this approach, explicit sample size formulae are derived for Wald and quasi-score test statistics in a variety of GEE settings. Simulation results show that in the important special case of logistic regression with exchangeable correlation structure, previous approaches can inflate the projected sample size (to obtain nominal 90% power using the Wald statistic) by over 10%, whereas the proposed approach provides an accuracy of around 2%.
Keywords: Clustered and correlated data, GEE, Local alternatives, Longitudinal data analysis, Marginal models
Power and sample size formulae play an important role in the design of experimental and observational studies. There is an extensive literature on this topic, especially for hypothesis tests based on the method of generalized estimating equations (GEE), as introduced by Liang and Zeger (1986) for handling correlated longitudinal or clustered data. In this setting, however, all previous sample size formulae have been derived using the asymptotic behavior of test statistics under fixed alternatives. In particular, fixed alternatives were used by Liu and Liang (1997) (henceforth LL) to derive sample size formulae for quasi-score statistics, and by Shih (1997) (henceforth Shih) for Wald test statistics, see also Rochon (1998), Pan (2001), Liu, Shih, and Gehan (2002), Jung and Ahn (2003, 2005), and Kim, Williamson and Lyles (2005).
In this article, a more accurate approach to power and sample size calculations in the GEE setting is developed using local asymptotic theory (Le Cam, (1960)). That is, rather than directly calculating the power of a test of H0 : ψ = ψ0 vs. a fixed alternative H1 : ψ = ψ1, where ψ is a p-vector representing the parameters of interest, we first calculate the asymptotic power of the test under a sequence of local alternatives H 1 m : ψ = ψ 0 + h ∕ m where m is the sample size (the number of clusters) and h is a fixed p-vector (the local parameter). The local asymptotic approach is considered to be standard in settings that do not involve clusters of correlated data (van der Vaart, (1998); Lehmann and Romano, (2005)), but it has not previously been attempted in the GEE setting as far as we know. For use of the approach in a survival analysis setting, see Lin, Yao, and Ying (1999).
The justification for our approach is provided by a result showing that in general GEE settings the Wald and quasi-score test statistics are asymptotically noncentral chi-squared under a sequence of local alternatives. This result also provides a suitable small sample approximation to the asymptotic power function involving a weighted average of the gradient of the estimating equation. Under marginal models, this approximation can be expressed directly in terms of the distribution of the covariates. This leads to explicit sample size formulae by inverting the small sample approximation to the asymptotic power function at the local parameter value h = m ( ψ 1 − ψ 0 ) , and solving for m in the usual way.
Our approach has several advantages over previous approaches. For the quasi-score test, previous methods of power calculation depend on knowing the limiting value of the nuisance parameter estimator under H1; such estimators are generally inconsistent under fixed alternatives, cf. Self and Mauritsen (1988). This can involve the additional burden of having to numerically find the root of a nonlinear equation; our approach, on the other hand, does not require this extra step because the nuisance parameter estimator is consistent under the local alternatives. For the Wald test, our approach has better accuracy. Detailed comparisons of our results with those of LL and Shih are made in the setting of marginal models with exchangeable correlation structure under various sampling designs.
The paper is organized as follows. Preliminary material about GEE is provided in Section 2. Our main results are stated and discussed in Section 3. In Section 4, we compare the sample size formulae provided by our approach with those of LL and Shih. The results of a simulation study comparing the accuracy of the various formulae are reported in Section 5. An application involving exposure to arsenic in drinking water is given in Section 6. Concluding remarks appear in Section 7.
In this section we provide the background material we need on the GEE method and marginal models (see Diggle, Heagerty, Liang, and Zeger, (2002); Fitzmaurice, Davidian, Verbeke, and Molenberghs, (2009)). For convenience, we consider longitudinal data as a special type of clustered data in which “cluster” can refer to (repeated measures on) a single subject, or a group of subjects.
Let m be the number of clusters and ni the number of units in the ith cluster, i = 1, … , m. Let yij denote the outcome, xij the p-vector of covariates of interest, zij the q-vector of confounding covariates, and μij the conditional mean for the jth unit in the ith cluster. The GEE approach of Liang and Zeger (1986) produces consistent parameter estimates given that the model for the marginal means μij is correctly specified, regardless of misspecification of the intracluster correlation matrix. The marginal model assumes
g ( μ i j ) = z i j T κ + x i j T ψ , i = 1 , 2 , … , m , j = 1 , 2 , … , n i ,where g is a known link function, ψ contains the parameters of interest, and k contains nuisance parameters, including the intercept. Let ϕ denote the univariate dispersion parameter of the model, an s-dimensional vector α denote the correlation parameters. The full vector of parameters is denoted β = (ϕ, α T , k T , ψ T ) T .
Let θ = (k T , ψ T ) T . The GEE approach provides a consistent estimator of θ by solving the following estimating equation, for given values of α and ϕ, regardless of misspecification of the intracluster correlation matrix:
∑ i = 1 m U i ( θ , α , ϕ ) = 0 ,where U i ( θ , α , ϕ ) = D i T V i − 1 S i , D i = ∂ μ i ∕ ∂ θ , S i = y i − μ i , and V i = Δ i 1 2 R i ( α ) Δ i 1 2 is the working covariance matrix. Here Δi = diag[Var(yi1), … , Var(yini)] and Ri(α) is the working correlation matrix (all such quantities being conditional on the covariates and cluster sizes). The parameters α and ϕ are usually unknown. Further estimating equations can be introduced to estimate them; see Fitzmaurice, Davidian, Verbeke, and Molenberghs (2009, Ch. 3) for detailed discussion. This results in a combined estimating equation for the full set of parameters β. Numerical methods to solve for β are widely implemented in statistical packages.
The dimension ni of yi is assumed to be uniformly bounded. We are interested in estimating β ∈ 𝔹 ⊂ ℝ k , a k-vector of unknown parameters indexing the conditional marginal means and variance-covariance matrices of the yi, where 𝔹 is compact. Let Ψi be a Borel function from ℝ ni × 𝔹 to ℝ k , i = 1, …, m, and
s m ( b ) = ∑ i = 1 m Ψ i ( y i , b ) , b ∈ Bwhere the estimating function Ψi also depends on covariates (suppressed in the notation; all expectations and variances are taken to be conditional on the covariates and the dimension ni). Suppose that Eβi(yi, β)> = 0 for all i, where the subscript β indicates the true value of the parameter. The GEE estimator, β ~ ∈ B , satisfies β ^ ∈ B , satisfies s m ( β ^ ) = 0 . Under mild conditions, see Shao (2003, Section 5.4) (henceforth Shao),
m ( β ^ − β ) → d N k ( 0 , Σ β )as m → ∞, where Σ β = lim m → ∞ m < M m ( β ) >− 1 [ ∑ i = 1 m Var β < Ψ i ( y i , β ) >] < M m ( β ) >− 1 , M m ( β ) = − E β < ∇ β s m ( β ) >, and ∇βsm(β) = ∂ sm(β) ∕ ∂ β. The limiting covariance matrix Σβ can be consistently estimated by replacing β with β ^ and Varβ i(yi, β)> with Ψ i ( y i , β ~ ) < Ψ i ( y i , β ~ ) >T ; the resulting estimator is denoted by Σ ^ .
In this section we first develop the local asymptotic behavior of the GEE estimators and the Wald and quasi-score statistics under the general setting in Section 2.2. Based on that, we propose a power and sample size calculation procedure for the GEE under marginal models. Throughout we restrict attention to the testing of a simple null hypothesis of the form H0 : ψ = ψ0 vs. the alternative H1 : ψ ≠ ψ0, where ψ is the vector consisting of the last p components of β. Let λ be the vector of the first k – p components of β, so β = (λ T , ψ T ) T , and for marginal models λ = (ϕ, α T , k T ) T .
The Wald statistic is given by
W m = m ( ψ ^ − ψ 0 ) T ( B Σ ^ B T ) − 1 ( ψ ^ − ψ 0 ) ,where B = (0p×(k−p), Ip) is the p × (k – p) zero matrix and Ip is the p × p identity matrix. Here and elsewhere, expressions involving B are simply used to extract the relevant submatrix or subvector. The quasi-score statistic or generalized score statistic (Rotnitzky and Jewell, (1990, p. 488), or Boos, (1992, p. 329)) is given by
T m = < B s m ( β ~ ) >T V T − 1 < B s m ( β ~ ) >, V T = < B M m ( β ~ ) − 1 B T >− 1 × [ B M m ( β ~ ) − 1 < ∑ i = 1 m Ψ i ( y i β ~ ) Ψ i ( y i , β ~ ) T >M m ( β ~ ) − 1 B T ] × < B M m ( β ~ ) − 1 B T >− 1 ,β ^ = ( λ ~ T , ψ 0 T ) T and λ ^ , constructed in the Appendix, is a consistent estimator of λ under H0. Both Wm and Tm converge in distribution under H0 to χ p 2 , and H0 is rejected at significance level ζ if the test statistic is greater than χ p , 1 − ζ 2 , where χ p , 1 − ζ 2 is the 100(1 – ζ)th percentile of χ p 2 .
We illustrate our approach to obtaining power and sample size for the quasi-score test; the procedure is the same for the Wald test and the power and sample size formulae for both tests coincide. The power function of the quasi-score test is ψ ↦ π m ( ψ ) = P ( T m ≥ χ p , 1 − ζ 2 ∣ ψ ≠ ψ 0 ) , the probability of rejecting the null hypothesis given that the true value of the parameter ψ belongs to the alternative. To obtain power or sample size under a specific value of ψ , say ψ = ψA, we propose to study the power based on a sequence of alternatives H 1 m : ψ = ψ 1 m = ψ 0 + h ∕ m that converge at root-m rate to the null hypothesis; here h is an arbitrarily fixed p-vector. We show that Wm is asymptotically noncentral chi-squared under H1m and approximate the power πm(ψ1m) under ψ = ψ1m based on this result. The power under the fixed alternative ψ = ψA given a sample size m is then obtained by specifying h so that ψ1m = ψA, in which case πm(ψ1m) becomes identically πm(ψA). This approach calculates power and sample sizes under the fixed value ψA although it is based on local asymptotic theory since ψA is on the path along which ψ1m converges to ψ0 as m → ∞. The justification for this procedure depends on finding the asymptotic behavior of ψ ^ under H1m, and we do this by extending the approach in Shao to a sequence of contiguous alternatives.
The results in this section are developed under the general setting described in Section 2.2. Various regularity conditions are needed, as provided in the Appendix. Let Pm denote the joint distribution of i>i=1,2,…m under H1m conditional on the covariates. Let β 0 = ( λ 0 T , ψ 0 T ) T and β m = ( λ 0 T , ψ 1 T ) T , where λ0 is the true value of λ. Our first result establishes the asymptotic behavior of ψ ^ under Pm as m → ∞; a sketch of the proof is given in the Appendix.
Under regularity conditions (C1)–(C9) in the Appendix, m ( ψ ^ − ψ 0 ) converges to Np(h, BΣβ0B T ) in distribution under Pm.
In the special case that the yi are i.i.d. and β ^ is the MLE of β, the limiting covariance matrix Σβ0 is the inverse of the Fisher information matrix, and our result for the local asymptotic distribution of m ( β ^ − β 0 ) reduces to the classical result obtained by applying Le Cam’s third lemma; see, e.g., Chapter 7 of van der Vaart (1998).
Theorem 5.14 in Shao can be considered as a special case of the above theorem with h = 0, where the asymptotic behavior of the GEE estimator ψ ^ is provided under a fixed value of ψ , see (2.4). The limiting covariance matrix remains the same under the local alternatives, namely the submatrix of Σβ0 corresponding to the components in ψ0. The major extra challenge to show the above result, beyond the theorem in Shao, is that the distributions of the yi are allowed to vary.
We now state our result giving the asymptotic behavior of Tm and Wm under H1m; a sketch of the proof is given in the Appendix. The result is the basis of our approach to obtain power and sample size.
Under regularity conditions (C1)–(C9) in the Appendix, the test statistics Tm and Wm converge under Pm in distribution to a noncentral chi-squared random variable χ p 2 ( ν ) with non-centrality parameter ν = h T (BΣβ0BT) – h.
In this section, we propose a power and sample size calculation procedure for GEE under marginal models. To carry out the calculations at the design stage of a study, we first need a good approximation to finite sample distributions of Tm and Wm. Theorem 2 provides such an approximation, but we propose to replace the non-centrality parameter ν by what we consider to be a better approximation for a given sample size m:
ν m = m ξ m ψ T Σ m ψ − 1 ξ m ψ ξmψ = BMm(β0)> −1 Eβmsm(β0)>, Σmψ = mBMm(β0)> −1 Varβmsm(β0)>Mm(β0)> −1 B T .Note that expressions (3.1)–(3.3) are conditional on the covariates and cluster sizes. If the design is non-random and prespecified, then these expressions could be used directly. However, in general, these expressions would not be available and would need to be estimated from pilot data. For the rest of this section we restrict attention to the case that the covariates and cluster sizes (zi, xi, ni), i = 1, … , m, are i.i.d., where xi = (xi1, xi2, …, xini) and zi = (zi1, zi2, …, zini). Suppose it is of interest to achieve nominal power 1 – η at significance level ζ under ψ = ψA. As mentioned, we replace h in the expression of νm by m ( ψ A − ψ 0 ) . Ideally, for the purpose of sample size calculation, we would integrate Mm, Eβmμ(β0)> and Varβmsm(β0)> over the distribution of (z1, x1, n1). After those steps (details are provided in Section 2 of the supplementary material), under the marginal model described in Section 2.1, the non-centrality parameter νm can be approximated by ν ~ m = m ξ ~ ψ T Σ ~ ψ − 1 ξ ~ ψ , where
ξ ~ ψ = B ‒ < E ( D 1 T V 1 − 1 D 1 ) >− 1 E [ D 1 T V 1 − 1 < μ 1 ( θ A ) − μ 1 ( θ 0 ) >] , Σ ~ ψ = B ‒ < E ( D 1 T V 1 − 1 D 1 ) >− 1 × [ E < D 1 T V 1 − 1 Var β A ( y 1 ∣ z 1 , x 1 , n 1 ) V 1 − 1 D 1 >] < E ( D 1 T V 1 − 1 D 1 ) >− 1 B ‒ T .Here D1 and ν1 are evaluated under H0, the expectations are taken with respect to the pre-specified distribution of (z1, x1, n1), θ A = ( κ 0 T , ψ A T ) T , β A = ( λ 0 T , ψ A T ) T , B ‒ = ( 0 p × q , i p ) , and λ 0 = ( θ 0 , α 0 T , κ 0 T ) T is the true value of the nuisance parameter. The power at ψ = ψA is given by
π m ( ψ A ) = P < χ p 2 ( m ξ ~ ψ T Σ ~ ψ − 1 ξ ~ ψ ) ≥ χ p , 1 − ζ 2 >.The final step is to solve an equation for sample size m for achieving 1 – η nominal power at significance level ζ:
πm(ψA) = 1 − ηwhere ν ~ satisfies P < χ p 2 ( ν ~ ) ≥ χ p , 1 − ζ 2 >= 1 − η , and then the sample size is given by
m = ν ~ ξ ~ ψ T Σ ~ ψ − 1 ξ ~ ψ .The vector h is chosen to be m ( ψ A − ψ 0 ) , so the fixed value ψA is on the path along which H1m converges to H0. The method calculates power and sample sizes under a fixed value ψA of ψ although it is based on local asymptotic theory since ψA is on the path along which ψ1m converges to ψ0 as m → ∞. We will see that our proposed method works better than previous approaches in various commonly seen cases with fixed alternative values, as discussed in a simulation study later.
An alternative approach is to calculate the power πm(ψ1m) with νm replaced by ν (cf. van der Vaart, 1998; Lehmann and Romano, 2005). To calculate ν, this approach uses the limiting variance BΣβ0B T that does not depend on the alternative value ψA. However, the variance of m ( ψ ^ − ψ 0 ) may depend on the alternative value ψA for a given sample size m under some models, e.g., logistic regression models, where variance of the outcome is a function of its mean.
In summary, our sample size calculation proceeds as follows,
Choose type I error rate ζ and power 1 – η. Specify the joint distribution of covariates and cluster size.Give ψ0, ψA and λ0 and specify the working correlation matrix (based on pilot data if available). Calculate D1 and ν1 in (2.2) under β = β0.
Calculate VarβA(y1|z1, x1, n1) using the choice of the working correlation matrix (in Step 3) in place of the true correlation matrix.
Based on the results in Steps 3 and 4, find, using numerical integration methods if necessary, the expectations within (3.5) under the joint distribution of the covariates and cluster size given in Step 2.
Calculate ψ0, ψA, λ0 and the sample size m according to (3.4), (3.5), (3.6), and (3.7).The joint distribution in Step 2 and values of the nuisance parameters in Step 3 may need to be estimated from pilot data. One possibility in Step 2 would simply be to use the empirical distribution of the pilot data; alternatively, a parametric model could be fit to the pilot data (e.g., the normal distribution in the arsenic study discussed in Section 6), in which case numerical integration might be needed in Step 5. Of course, if the true correlation matrix is known, it could be used in Step 4, but typically such information is not available at the design stage (cf., Liu and Liang (1997); Shih (1997); Liu, Shih, and Gehan (2002); Jung and Ahn (2005)). As mentioned by LL, sensitivity analysis should also be carried out to assess how the sample size varies according to changes of the specified working correlation matrix. Sample size re-estimation (SSR) is often carried out when interim data are available for updating design parameters (e.g., Shih (2001); Friede and Kieser (2006)). SSR for studies with correlated data have been actively discussed (e.g., Shih and Gould (1995); Lake et al. (2002); Zucker and Denne (2002); Yin and Shen (2005)).
In this section, we compare our approach with LL’s and Shih’s for several important examples of marginal models. The working correlation matrix is assumed to be the true correlation matrix. Background and discussion of LL’s and Shih’s approaches are provided in Section 3 and Section 4 of the supplementary material.
LL restricted attention to the case of discrete covariates, but for continuous covariates their approach requires an initial (ad hoc) discretization. As pointed out by Shieh (2000), there is no consensus in how the discretization should be done. For LL’s approach, it is also necessary to derive the limiting value of the nuisance parameter estimator under the alternative hypothesis. Compared to Shih’s approach, our approach has much better accuracy under various circumstances.
Consider a study with common cluster size n, continuous outcomes and covariates: xij ≡ xi, zij ≡ 1 (intercept), j = 1, … , n. For simplicity, suppose that the working correlation matrix Ri ≡ R coincides with the true correlation matrix. The standard linear regression version of (2.1) is μij = k+xiψ, i = 1,2, …, m, j = 1,2, … , n, and there is a constant conditional marginal variance Var(yij|xi) = σ 2 . The null hypothesis of interest is H0 : ψ = ψ0. To obtain a desired power 1 – η at significance level ζ for detecting ψ = ψA, the sample size from our approach (3.7) is
m = ( z 1 − ζ ∕ 2 + z 1 − η ) 2 ( 1 n T R − 1 1 n ) − 1 σ 2 ( ψ A − ψ 0 ) 2 Var ( x 1 ) ,where 1n is the n-vector of 1’s. Derivation of this formula is provided in Section 8 of the supplementary material. When the correlation has an exchangeable correlation structure with ρij = ρ, the numerator in (4.1) is (z1–ζ/2 + z1–η) 2 n – 1)ρ>σ 2 . Note that the presence of the intracluster correlation has effectively increased the noise variance from σ 2 to n – 1)ρ>σ 2 . The inflation factor 1 + (n – 1)ρ is known as the design effect (Scott and Holt (1982)), and provides a direct measure of how the sample size (needed to achieve a fixed nominal power) increases as a function of the intracluster correlation. The formula (4.1) agrees with Shih’s. In this example, LL’s approach requires (a) x1 has a discrete distribution, or (b) an ad hoc discretization is made to its continuous distribution if x1 is a continuous variable; in the former case, LL’s result agrees with (4.1).
This example uses the same assumptions as 4.1 above except that we use the logistic regression version of (2.1):
logit(μij) = κ + xiψ, i = 1, 2, …, n.The null hypothesis of interest is H0 : ψ = ψ0. To obtain a desired power 1 – η for detecting ψ = ψA, the sample size from (3.7) is
m = ( z 1 − ζ ∕ 2 + z 1 − η ) 2 ( 1 n T R − 1 1 n ) − 1 × < E ( v 1 x ) ( E ( x v 0 x ) ) 2 + E ( x 2 v 1 x ) ( E ( v 0 x ) ) 2 − 2 E ( s v 1 x ) E ( x v 0 x ) E ( v 0 x ) >[ E ( v 0 x ) E ( x p 1 x ) − E ( v 0 x ) E ( x p 0 x ) − E ( x v 0 x ) < E ( p 1 x ) − E ( p 0 x ) >] 2
m = ( z 1 − ζ ∕ 2 + z 1 − η ) 2 E ( v 1 x ) ( 1 n T R − 1 1 n ) − 1 ( ψ A − ψ 0 ) 2 [ E ( v 1 x ) E ( x 2 v 1 x ) − < E ( x v 1 x ) >2 ] .
Again, LL’s approach requires (a) x1 has a discrete distribution, or (b) an ad hoc discretization is made to its continuous distribution; in both cases, LL’s approach requires a solution to a nonlinear equation for k* in terms of a discrete distribution P(x1 = ul) = ωl, l = 1, … L:
∑ l = 1 L ω l < expit ( κ 0 + u l ψ A ) − expit ( κ ∗ + u l ψ 0 ) >= 0 .However, if the null hypothesis of interest is ψ0 = 0, this equation has an explicit solution and their result agrees with ours in case (a). In this example, there is a one-dimensional nuisance parameter and the equation is readily solved, but in general it would be more challenging to reach a solution. Moreover, the solution of this nonlinear equation is sensitive to the choice of discretization.
Consider a cohort study of m pairs of siblings in which one sibling is exposed and the other unexposed, and the outcome is binary (disease or non-disease). The covariates are xi = (xi1, xi2) T = (1, 0) T , where 1 and 0 designate exposed and unexposed, respectively, and zi = (zi1, zi2) T with zi1 = zi2 = 1 (intercept) for i = 1, …, m. Let ρ denote the common correlation between siblings. Logistic regression is used, and the cluster size n = 2. Let p0 and p1 denote the risk of the disease in unexposed and exposed subjects, respectively. Suppose the null hypothesis of interest is H0 : ψ = ψ0. To obtain a desired power 1 – η for detecting ψ = ψA, the sample size m from (3.7) is
m = ( z 1 − ζ ∕ 2 + z 1 − η ) 2 ( v 0 2 v 1 + v 0 v ~ 0 2 − 2 ρ v 0 v ~ 0 v 0 v 1 ) v 0 2 ( p 1 − p ~ 0 ) 2 ,
m = ( z 1 − ζ ∕ 2 + z 1 − η ) 2 ( v 0 + v 1 − 2 ρ v 0 v 1 ) ( ψ A − ψ 0 ) 2 v 0 v 1LL’s approach requires solving an equation for k*:
( 1 − ρ p 1 ∗ c 0 p 0 ∗ ) ( p 0 − p 0 ∗ ) + ( 1 − ρ c 0 p 0 ∗ p 1 ∗ ) ( p 1 − p 1 ∗ ) = 0 ,where c0 = exp(ψ0/2), p 0 ∗ = expit ( κ ∗ ) , and p 1 ∗ = expit ( κ ∗ + ψ 0 ) . Again, if the null hypothesis of interest is ψ0 = 0, (4.3) has the explicit solution k* = logit[(p0 + p1)/2], and their result agrees with ours.
In this section we report the results of a simulation study for some of the examples in the previous section in which the sample size formulae from previous approaches disagree with that from our approach. We generated 10, 000 replicated samples according to the marginal model in each setting. For each sample, we estimated the parameters and implemented the tests. The empirical power is given by the proportion of samples with test statistic value exceeding χ p , 1 − ζ 2 . The software SAS v9.2 was used to generate the data sets and calculate the test statistics. Accuracy of the sample size formula is then determined by how close the empirical power is to the nominal power. We set the type I error rate to be ζ = 0.05 and the nominal power at 90%. The Monte Carlo standard errors of the empirical powers reported in the simulation study are about 0.3%. In each example, we also provide the sample size that generates empirical power closest to the nominal power 90% in the simulation study and call it the “actual” sample size.
Consider a special case of Example 4.2 involving a two-group comparison study with common cluster size 2, binary outcomes (disease or non-disease) and covariates xij ≡ xi = 1 (exposed) or 0 (unexposed), and zij ≡ 1 (intercept), j = 1,2. Suppose half of the clusters are unexposed and the other half exposed. We set the reference risk of disease in the unexposed group to be p0 = 0.1, and set ψ0 = 0. The simulation results are presented in Table 7.1 . The empirical powers of the sample sizes from Shih’s approach are substantially larger than the nominal power 90%, whereas from our approach they are mostly within one or two Monte Carlo standard errors of the nominal power. The sample sizes from our approach almost coincide with the “actual”. Shih’s approach overestimates the sample sizes by over 8% for the first three cases, and over 10% for the last six cases. In particular, for the last three cases the sample sizes are overestimated by about 18%. On the other hand, our approach is accurate to within about 1–3% for the first six cases and 1–4% for the last three cases. The reason for the overestimation of sample size based on Shih’s method is unclear, but perhaps it is due to the use of the limiting variance of m ( ψ ^ − ψ A ) in the calculation of the non-centrality parameter, as discussed in Remark 4 of Section 3.
Simulation results comparing our approach with Shih’s approach in the two-sample comparison problem with a binary outcome and a cluster level binary exposure. Sample sizes are for achieving a nominal power 90% to detect different relative risks (p1/p0) at 0.05 significance level for different correlations ρ. The reference risk p0 is fixed at 0.1 and ψ0 = 0. Empirical powers are based on the Wald test. The “actual” are the sample sizes with empirical powers closest to the nominal power 90%.
Our approach | Shih’s approach | “Actual” | |||||
---|---|---|---|---|---|---|---|
RR | ρ | m | Power (%) | m | Power (%) | m | Power (%) |
2.5 | 0.20 | 156 | 89.82 | 172 | 92.43 | 158 | 89.96 |
0.50 | 195 | 89.47 | 215 | 92.28 | 196 | 90.03 | |
0.80 | 234 | 89.40 | 258 | 92.68 | 238 | 90.06 | |
3.0 | 0.20 | 95 | 89.32 | 110 | 93.65 | 96 | 90.11 |
0.50 | 119 | 89.52 | 138 | 93.52 | 122 | 90.49 | |
0.80 | 142 | 89.45 | 165 | 93.25 | 144 | 89.86 | |
3.5 | 0.20 | 65 | 89.11 | 79 | 93.85 | 66 | 89.87 |
0.50 | 81 | 88.20 | 99 | 94.28 | 84 | 89.62 | |
0.80 | 97 | 88.76 | 118 | 94.45 | 100 | 89.98 |
First consider the sibling study (Example 4.3) with p0 = 0.1 and ψ0 = 0, see Table 7.2 . Shih’s approach still overestimates the sample size. The sample sizes from our approach are all accurate to within 1–4% compared to the actual, whereas Shih’s approach overestimates the sample sizes by 6–18%. Table 7.3 compares our approach with LL when ψ0 changes to 0.5 (LL’s approach agrees with ours when ψ0 = 0). The empirical powers of the sample sizes from our approach are mostly within one Monte Carlo standard error of the nominal power, whereas those from LL’s approach fall outside such limits.
Simulation results comparing our approach with Shih’s approach in the sibling study with a binary outcome and a binary exposure. Sample sizes are for achieving a nominal power 90% to detect different relative risks (p1/p0) at 0.05 significance level for different correlations ρ. The reference risk p0 is fixed at 0.1 and ψ0 = 0. Empirical powers are based on the Wald test.
Our approach | Shih’s approach | “Actual” | |||||
---|---|---|---|---|---|---|---|
RR | ρ | m | Power (%) | m | Power (%) | m | Power (%) |
2.0 | 0.10 | 238 | 90.34 | 251 | 91.72 | 236 | 90.02 |
0.15 | 225 | 90.31 | 238 | 91.85 | 220 | 90.03 | |
0.20 | 213 | 90.93 | 225 | 91.95 | 209 | 89.96 | |
2.5 | 0.10 | 118 | 90.16 | 130 | 92.91 | 118 | 90.16 |
0.15 | 112 | 90.69 | 124 | 93.18 | 109 | 90.08 | |
0.20 | 106 | 90.70 | 117 | 93.37 | 102 | 89.96 | |
3.0 | 0.10 | 72 | 90.05 | 84 | 93.67 | 72 | 90.05 |
0.15 | 68 | 89.75 | 79 | 93.90 | 68 | 89.75 | |
0.20 | 65 | 90.36 | 75 | 93.88 | 64 | 90.00 |
Simulation results comparing our approach with LL’s approach in the sibling study with a binary outcome and a binary exposure. Sample sizes are for achieving a nominal power 90% to detect different effect sizes relative risks (p1/p0) at 0.05 significance level for different correlations ρ. Risk p0 is fixed at 0.1 and ψ0 = 0.5. Empirical powers are based on the quasi-score test.
Our approach | LL’s approach | “Actual” | |||||
---|---|---|---|---|---|---|---|
RR | ρ | m | Power (%) | m | Power(%) | m | Power (%) |
2.5 | 0.10 | 395 | 90.04 | 388 | 89.47 | 395 | 90.04 |
0.15 | 373 | 89.83 | 366 | 89.59 | 374 | 90.10 | |
0.20 | 351 | 89.82 | 345 | 89.11 | 351 | 89.82 | |
3.0 | 0.10 | 180 | 90.16 | 176 | 89.53 | 179 | 89.99 |
0.15 | 170 | 90.22 | 166 | 89.48 | 169 | 90.06 | |
0.20 | 160 | 90.55 | 157 | 89.81 | 158 | 89.95 | |
3.5 | 0.10 | 104 | 89.58 | 102 | 89.57 | 105 | 89.87 |
0.15 | 99 | 90.11 | 96 | 88.80 | 99 | 90.11 | |
0.20 | 93 | 90.19 | 91 | 89.70 | 92 | 89.88 | |
4.0 | 0.10 | 68 | 89.12 | 66 | 88.11 | 70 | 89.89 |
0.15 | 65 | 90.12 | 63 | 89.23 | 65 | 90.12 | |
0.20 | 61 | 90.37 | 59 | 89.50 | 61 | 90.37 |
Exposure to arsenic through drinking water represents a major threat to human health (Karagas (2010)). Recently, a cohort study has been initiated by the Children’s Environmental Health and Disease Prevention Center and Superfund Basic Research Program at Dartmouth College to investigate the impact of arsenic exposure on the health of pregnant women and children in New England. A primary aim of the study is to assess the association between arsenic exposure during pregnancy and infant growth. Suppose the binary outcome ‘short stature’ is to be recorded at 0.5, 1, 1.5, and 2 years of age, giving the cluster size of n = 4 in model (4.2). The proportion of ‘short’ children (k0 = log = –2.717 corresponding to xi = 0 in (4.2). Data indicate that mean log-aresenic concentration (μg/L) in the water supply in New Hampshire is –0.942, so in (4.2) we set xi = log-exposure–(–0.942). Pilot data on the study population (Gilbert-Diamond et al. (2012)) show that log-arsenic exposure is (approximately) distributed as N(–0.04, 4), so we specify xi ~ N(0.902, 4).
First we consider an AR(1) structure, ρ = ρ |i–j| . Pilot data on children’s heights in this cohort were not provided, so it is not possible to estimate ρ and we consider a range of values: ρ = 0.2, 0.5 and 0.8. For each ρ, we calculate the sample size needed to detect an odds ratio of 1.5 (ψA = 0.406) due to a unit increase in log-arsenic exposure (to achieve 90% power at significance level 0.05). Our approach gives sample sizes of 70, 105, and 157, and Shih’s gives 69, 103, and 154 for the values of ρ, respectively. Using LL’s approach with a discretization of 150 equispaced values within 3 standard deviations of the mean log-arsenic exposure, the sample sizes are 73, 109, and 164, respectively. Under an exchangeable structure with ρij = ρ, for ρ = 0.2, 0.5 and 0.8, the sample sizes are 84, 131, and 178 from our approach, 82, 128, and 174 from Shih’s, and 88, 137, and 185 from LL’s, respectively. In this example, we used Monte Carlo integration in Step 5. SAS code is provided in the supplementary material.
We have developed a method to calculate sample size for studies with correlated data. Under the framework of marginal models, our approach gives the same sample size formulae for the Wald and quasi-score tests. We study power under a sequence of alternatives that converge at root-m rate to the null hypothesis, and show that the statistics converge in distribution to a noncentral chi-squared. Then we link the sequence of alternatives to a fixed alternative by choosing a specific value of the local parameter vector h so that ψ1 is on the path along which H1m converges to H0.
As shown in simulation study, our approach provides considerable improvements in terms of accuracy. Hanfelt and Liang (1995) developed an approximate likelihood ratio test for GEE, that could be used as the basis for power and sample size calculations. We conjecture that our approach could be extended to the approximate likelihood ratio test and would lead to the same conclusion since the likelihood ratio test is equivalent to the other two tests for independent data.
We used correlation to measure the association between binary outcomes. An alternative to the correlation as a measure of association between pairs of binary responses is the odds ratio, which has a more straightforward interpretation. To estimate the odds ratio as a measure of association, a second set of estimating equations (Lipsitz, Laird, and Harrington (1991); Carey, Zeger, and Diggle (1993)) can be used. The power and sample size calculations approach developed in the present paper can readily be adapted to these settings.
A referee raised the question of whether our approach could be extended to ordinal or nominal outcomes. Lipsitz, Kim, and Zhao (1994) developed a GEE for such outcomes; This could be handled under our local asymptotic approach, but further work would be needed to develop a specific procedure. Another issue raised by a referee concerns the handling of missing data. We refer the interested reader to Robins, Rotnitzky, and Zhao (1995) who developed an inverse-weighted GEE method for missing data. Again, further work would be needed to develop a specific procedure.
Monte Carlo simulations are sometimes used to obtain power and sample size, especially for complicated designs where explicit sample size formulae are hard to derive. However, simulations have some serious disadvantages from a practical perspective: they are time-consuming, computational expertise is needed and, in the case of small sample sizes, the results can be highly sensitive to distributional assumptions. On the other hand, our formulae only require plugging a few numbers into a calculator (or at most some routine numerical integrations), are more appealing to practitioners, and distributional assumptions play a limited role. With an explicit sample size formula, we are able to calculate the minimal sample size to find the most efficient design strategy in the planning stages of a study. Even with complicated designs, where explicit formulae are not available, explicit sample size formulae are still useful for providing initial sample size estimates that can later be refined using simulations.
The authors are grateful for many helpful comments from an associate editor and the referees. The research of Dr. Ian McKeague was supported in part by NIH Grant R01GM095722. We thank Dr. Margaret Karagas and the Children’s Environmental Health and Disease Prevention Center at Dartmouth College for providing preliminary data in the arsenic study example (supported by P20 ES018175 and P42 ES007373 from the NIEHS and NIH; and RD-83459901-0 from the USEPA).
(C1) The parameter space 𝔹 is compact and β0 belongs to its interior.
(C2) There exists δ > 1 such that
c 0 ≡ sup m ≥ 1 max i = 1 , … , m E β m ∣ h i ( y i ) ∣ 1 = δ < ∞ and c 1 ≡ sup m ≥ 1 max i = 1 , … , m E β m ∣ y i ∣ δ < ∞
(C4) sup i ≥ 1 ∣ E b 1 < Ψ i ( y i , b ) >− E b 1 ∗ < Ψ i ( y i , b ∗ ) >∣ ≲ ∣ b 1 − b 1 ∗ ∣ + ∣ b − b ∗ ∣ for all b1, b 1 ∗ , b, b* ∈ 𝔹, where “≲” means “smaller than up to a constant”.
Condition (C2) is slightly stronger than the corresponding condition in Shao, namely that supi E|hi(yi)| 1+δ < ∞ and supi E|yi| δ < ∞ for some δ >0; we need the stronger condition because in our case the distribution of yi is changing with sample size. The consistency result established in the first part of the proof of Theorem 1 still holds if we replace δ > 1 by δ > 0 in (C2), but we need δ > 1 for the asymptotic normality result. Condition (C3) is also a little stronger because we make the additional assumption that the functions are uniformly bounded on 𝔹. Similar to the existence of a well-separated point of maximum in Theorem 5.7 in van der Vaart (1998), (C5) requires that fm(b) has a unique zero β0, and only values close to β0 yield a value of fm(b) close to zero. (C4) and (C7) are Lipschitz conditions.
Under marginal models, 1 m M m ( β 0 ) and 1 m ∑ i = 1 m var β 0 ( Ψ i ( y i , β 0 ) ) in (C8) reduce to 1 m ∑ i = 1 m D i T V i − 1 D i and
1 m ∑ i = 1 m D i T V i − 1 Cov ( y i ∣ z i , x i ) V i − 1 D i ,respectively; These are important components of the covariance matrix of the GEE estimator developed in Liang and Zeger (1986). Also, the matrix M(β0) in (C9) is the inverse of the asymptotic covariance of m ( β ^ − β 0 ) if the working correlation is correctly specified.
The detailed proof is provided in Section 6 of the supplementary material. Under (C1)–(C5), it can be shown that β ^ → P m β 0 and ψ ^ → P m ψ 0 by adapting the proof of Theorem 5.7 of van der Vaart (1998).
By using the extra conditions (C6)–(C9), we can further show that m ( β ^ − β 0 ) converges in distribution under Pm to N(h*, Σβ0), where h ∗ = ( 0 k − p T , h T ) T and 0k–p is the (k–p)-dimensional zero vector. The proof is completed by noticing that ψ ^ = B β ^ .
The detailed proof is provided in Section 7 of the supplementary material.
The first step is to show that Σ ^ converges in probability under H1m to Σβ0. Thus B Σ ^ B T converges in probability under H1m to BΣβ0B T . From Theorem 1, m ( ψ ^ − ψ 0 ) converges in distribution under H1m to N(h, BΣβ0B T ). Therefore, by Slutsky’s Lemma and the Continuous Mapping Theorem, Wm converges in distribution under H1m to noncentral χ p 2 with non-centrality parameter ν = h T (BΣβ0B T ) –1 h.
An estimate λ ^ of the nuisance parameter vector λ under H0 is needed to calculate the quasi-score statistic. For this purpose it suffices to use the first k – p estimating equations, so λ ^ can be taken as a solution of Csm(λ, ψ0) = 0, where C = (I(k–p) 0(k–p)×p). Recall that λ ^ and ψ0 combine to form β ^ . Write the quasi-score statistic as
T m = < m − 1 ∕ 2 B s m ( β ~ ) >T ( m − 1 V T ) − 1 < m − 1 ∕ 2 B s m ( β ~ ) >.It can be shown that Tm is asymptotically equivalent to Wm, concluding the proof.
Contents of supplementary material Detailed proofs, derivations of the sample size formulae, outlines of LL’s and Shih’s approaches, and SAS code for calculating the sample sizes in the arsenic example.