Power and Sample Size Calculations for Generalized Estimating Equations via Local Asymptotics

Zhigang Li, Dartmouth Medical School, Hanover, NH 03755, USA, Zhigang.Li@dartmouth.edu; Ian W. McKeague, Department of Biostatistics, Columbia University, 722 West 168th Street, New York, NY 10032, USA, im2131@columbia.edu


Abstract

We consider the problem of calculating power and sample size for tests based on generalized estimating equations (GEE), which arise in studies involving clustered or correlated data (e.g., longitudinal studies and sibling studies). Previous approaches approximate the power of such tests using the asymptotic behavior of the test statistics under fixed alternatives. We develop a more accurate approach in which the asymptotic behavior is studied under a sequence of local alternatives that converge to the null hypothesis at root-m rate, where m is the number of clusters. Based on this approach, explicit sample size formulae are derived for Wald and quasi-score test statistics in a variety of GEE settings. Simulation results show that in the important special case of logistic regression with exchangeable correlation structure, previous approaches can inflate the projected sample size (to obtain nominal 90% power using the Wald statistic) by over 10%, whereas the proposed approach is accurate to within around 2%.

Keywords: Clustered and correlated data, GEE, Local alternatives, Longitudinal data analysis, Marginal models

1. Introduction

Power and sample size formulae play an important role in the design of experimental and observational studies. There is an extensive literature on this topic, especially for hypothesis tests based on the method of generalized estimating equations (GEE), as introduced by Liang and Zeger (1986) for handling correlated longitudinal or clustered data. In this setting, however, all previous sample size formulae have been derived using the asymptotic behavior of test statistics under fixed alternatives. In particular, fixed alternatives were used by Liu and Liang (1997) (henceforth LL) to derive sample size formulae for quasi-score statistics, and by Shih (1997) (henceforth Shih) for Wald test statistics; see also Rochon (1998), Pan (2001), Liu, Shih, and Gehan (2002), Jung and Ahn (2003, 2005), and Kim, Williamson and Lyles (2005).

In this article, a more accurate approach to power and sample size calculations in the GEE setting is developed using local asymptotic theory (Le Cam, 1960). That is, rather than directly calculating the power of a test of H₀ : ψ = ψ₀ vs. a fixed alternative H₁ : ψ = ψ₁, where ψ is a p-vector representing the parameters of interest, we first calculate the asymptotic power of the test under a sequence of local alternatives $H_{1m}: \psi = \psi_0 + h/\sqrt{m}$, where m is the sample size (the number of clusters) and h is a fixed p-vector (the local parameter). The local asymptotic approach is considered to be standard in settings that do not involve clusters of correlated data (van der Vaart, 1998; Lehmann and Romano, 2005), but as far as we know it has not previously been attempted in the GEE setting. For use of the approach in a survival analysis setting, see Lin, Yao, and Ying (1999).

The justification for our approach is provided by a result showing that in general GEE settings the Wald and quasi-score test statistics are asymptotically noncentral chi-squared under a sequence of local alternatives. This result also provides a suitable small sample approximation to the asymptotic power function involving a weighted average of the gradient of the estimating equation. Under marginal models, this approximation can be expressed directly in terms of the distribution of the covariates. This leads to explicit sample size formulae by inverting the small sample approximation to the asymptotic power function at the local parameter value $h = \sqrt{m}(\psi_1 - \psi_0)$, and solving for m in the usual way.

Our approach has several advantages over previous approaches. For the quasi-score test, previous methods of power calculation depend on knowing the limiting value of the nuisance parameter estimator under H₁; such estimators are generally inconsistent under fixed alternatives, cf. Self and Mauritsen (1988). This can involve the additional burden of having to numerically find the root of a nonlinear equation; our approach, on the other hand, does not require this extra step because the nuisance parameter estimator is consistent under the local alternatives. For the Wald test, our approach has better accuracy. Detailed comparisons of our results with those of LL and Shih are made in the setting of marginal models with exchangeable correlation structure under various sampling designs.

The paper is organized as follows. Preliminary material about GEE is provided in Section 2. Our main results are stated and discussed in Section 3. In Section 4, we compare the sample size formulae provided by our approach with those of LL and Shih. The results of a simulation study comparing the accuracy of the various formulae are reported in Section 5. An application involving exposure to arsenic in drinking water is given in Section 6. Concluding remarks appear in Section 7.

2. Preliminaries

In this section we provide the background material we need on the GEE method and marginal models (see Diggle, Heagerty, Liang, and Zeger, 2002; Fitzmaurice, Davidian, Verbeke, and Molenberghs, 2009). For convenience, we consider longitudinal data as a special type of clustered data in which “cluster” can refer to (repeated measures on) a single subject, or a group of subjects.

2.1. Generalized estimating equations and marginal models

Let m be the number of clusters and n_i the number of units in the ith cluster, i = 1, … , m. Let y_ij denote the outcome, x_ij the p-vector of covariates of interest, z_ij the q-vector of confounding covariates, and μ_ij the conditional mean for the jth unit in the ith cluster. The GEE approach of Liang and Zeger (1986) produces consistent parameter estimates given that the model for the marginal means μ_ij is correctly specified, regardless of misspecification of the intracluster correlation matrix. The marginal model assumes

$$g(\mu_{ij}) = z_{ij}^T \kappa + x_{ij}^T \psi, \quad i = 1, 2, \ldots, m, \quad j = 1, 2, \ldots, n_i, \tag{2.1}$$

where g is a known link function, ψ contains the parameters of interest, and κ contains nuisance parameters, including the intercept. Let ϕ denote the univariate dispersion parameter of the model, and let the s-dimensional vector α denote the correlation parameters. The full vector of parameters is denoted $\beta = (\phi, \alpha^T, \kappa^T, \psi^T)^T$.

Let $\theta = (\kappa^T, \psi^T)^T$. The GEE approach provides a consistent estimator of θ by solving the following estimating equation, for given values of α and ϕ, regardless of misspecification of the intracluster correlation matrix:

$$\sum_{i=1}^{m} U_i(\theta, \alpha, \phi) = 0, \tag{2.2}$$

where $U_i(\theta, \alpha, \phi) = D_i^T V_i^{-1} S_i$, $D_i = \partial \mu_i / \partial \theta$, $S_i = y_i - \mu_i$, and $V_i = \Delta_i^{1/2} R_i(\alpha) \Delta_i^{1/2}$ is the working covariance matrix. Here $\Delta_i = \mathrm{diag}[\mathrm{Var}(y_{i1}), \ldots, \mathrm{Var}(y_{in_i})]$ and R_i(α) is the working correlation matrix (all such quantities being conditional on the covariates and cluster sizes). The parameters α and ϕ are usually unknown. Further estimating equations can be introduced to estimate them; see Fitzmaurice, Davidian, Verbeke, and Molenberghs (2009, Ch. 3) for detailed discussion. This results in a combined estimating equation for the full set of parameters β. Numerical methods to solve for β are widely implemented in statistical packages.
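To make the construction of the working covariance concrete, the following Python sketch builds $V_i = \Delta_i^{1/2} R_i(\alpha) \Delta_i^{1/2}$ for a single cluster. It is illustrative only: the function names are hypothetical, the dispersion parameter is taken to be ϕ = 1, and the binary-outcome variance μ(1 − μ) is an assumed example rather than a requirement of the method.

```python
import math

def exchangeable(n, rho):
    """Exchangeable working correlation: 1 on the diagonal, rho elsewhere."""
    return [[1.0 if j == k else rho for k in range(n)] for j in range(n)]

def ar1(n, rho):
    """AR(1) working correlation: entry (j, k) equals rho ** |j - k|."""
    return [[rho ** abs(j - k) for k in range(n)] for j in range(n)]

def working_covariance(variances, corr):
    """V = Delta^{1/2} R Delta^{1/2}, where Delta = diag(variances)."""
    sd = [math.sqrt(v) for v in variances]
    n = len(sd)
    return [[sd[j] * corr[j][k] * sd[k] for k in range(n)] for j in range(n)]

# Hypothetical cluster of three binary outcomes with marginal means mu_ij.
mu = [0.1, 0.2, 0.3]
variances = [m * (1.0 - m) for m in mu]  # Var(y_ij) = mu_ij (1 - mu_ij), phi = 1
V = working_covariance(variances, exchangeable(3, 0.5))
```

Swapping `exchangeable` for `ar1` changes only the assumed within-cluster dependence; the marginal variances on the diagonal of `V` are unaffected.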

2.2. General setting

The dimension n_i of y_i is assumed to be uniformly bounded. We are interested in estimating $\beta \in \mathbb{B} \subset \mathbb{R}^k$, a k-vector of unknown parameters indexing the conditional marginal means and variance-covariance matrices of the y_i, where 𝔹 is compact. Let Ψ_i be a Borel function from $\mathbb{R}^{n_i} \times \mathbb{B}$ to $\mathbb{R}^k$, i = 1, …, m, and

$$s_m(b) = \sum_{i=1}^{m} \Psi_i(y_i, b), \quad b \in \mathbb{B}, \tag{2.3}$$

where the estimating function Ψ_i also depends on covariates (suppressed in the notation; all expectations and variances are taken to be conditional on the covariates and the dimension n_i). Suppose that $E_\beta\{\Psi_i(y_i, \beta)\} = 0$ for all i, where the subscript β indicates the true value of the parameter. The GEE estimator, $\hat\beta \in \mathbb{B}$, satisfies $s_m(\hat\beta) = 0$. Under mild conditions, see Shao (2003, Section 5.4) (henceforth Shao),

$$\sqrt{m}\,(\hat\beta - \beta) \to_d N_k(0, \Sigma_\beta) \tag{2.4}$$

as m → ∞, where $\Sigma_\beta = \lim_{m \to \infty} m \{M_m(\beta)\}^{-1} \left[ \sum_{i=1}^{m} \mathrm{Var}_\beta\{\Psi_i(y_i, \beta)\} \right] \{M_m(\beta)\}^{-1}$, $M_m(\beta) = -E_\beta\{\nabla_\beta s_m(\beta)\}$, and $\nabla_\beta s_m(\beta) = \partial s_m(\beta) / \partial \beta$. The limiting covariance matrix Σ_β can be consistently estimated by replacing β with $\hat\beta$ and $\mathrm{Var}_\beta\{\Psi_i(y_i, \beta)\}$ with $\Psi_i(y_i, \hat\beta)\{\Psi_i(y_i, \hat\beta)\}^T$; the resulting estimator is denoted by $\hat\Sigma$.

3. Power and sample size calculation method

In this section we first develop the local asymptotic behavior of the GEE estimators and the Wald and quasi-score statistics under the general setting in Section 2.2. Based on that, we propose a power and sample size calculation procedure for the GEE under marginal models. Throughout we restrict attention to the testing of a simple null hypothesis of the form H₀ : ψ = ψ₀ vs. the alternative H₁ : ψ ≠ ψ₀, where ψ is the vector consisting of the last p components of β. Let λ be the vector of the first k – p components of β, so $\beta = (\lambda^T, \psi^T)^T$, and for marginal models $\lambda = (\phi, \alpha^T, \kappa^T)^T$.

The Wald statistic is given by

$$W_m = m\,(\hat\psi - \psi_0)^T (B \hat\Sigma B^T)^{-1} (\hat\psi - \psi_0),$$

where $B = (0_{p \times (k-p)}, I_p)$, $0_{p \times (k-p)}$ is the p × (k – p) zero matrix and I_p is the p × p identity matrix. Here and elsewhere, expressions involving B are simply used to extract the relevant submatrix or subvector. The quasi-score statistic or generalized score statistic (Rotnitzky and Jewell, 1990, p. 488; Boos, 1992, p. 329) is given by

$$T_m = \{B s_m(\tilde\beta)\}^T V_T^{-1} \{B s_m(\tilde\beta)\},$$

$$V_T = \{B M_m(\tilde\beta)^{-1} B^T\}^{-1} \left[ B M_m(\tilde\beta)^{-1} \left\{ \sum_{i=1}^{m} \Psi_i(y_i, \tilde\beta) \Psi_i(y_i, \tilde\beta)^T \right\} M_m(\tilde\beta)^{-1} B^T \right] \{B M_m(\tilde\beta)^{-1} B^T\}^{-1},$$

where $\tilde\beta = (\tilde\lambda^T, \psi_0^T)^T$ and $\tilde\lambda$, constructed in the Appendix, is a consistent estimator of λ under H₀. Both W_m and T_m converge in distribution under H₀ to $\chi^2_p$, and H₀ is rejected at significance level ζ if the test statistic is greater than $\chi^2_{p,1-\zeta}$, where $\chi^2_{p,1-\zeta}$ is the 100(1 – ζ)th percentile of $\chi^2_p$.

We illustrate our approach to obtaining power and sample size for the quasi-score test; the procedure is the same for the Wald test and the power and sample size formulae for both tests coincide. The power function of the quasi-score test is $\psi \mapsto \pi_m(\psi) = P(T_m \ge \chi^2_{p,1-\zeta} \mid \psi \ne \psi_0)$, the probability of rejecting the null hypothesis given that the true value of the parameter ψ belongs to the alternative. To obtain power or sample size under a specific value of ψ, say ψ = ψ_A, we propose to study the power based on a sequence of alternatives $H_{1m}: \psi = \psi_{1m} = \psi_0 + h/\sqrt{m}$ that converge at root-m rate to the null hypothesis; here h is an arbitrary fixed p-vector. We show that W_m is asymptotically noncentral chi-squared under $H_{1m}$ and approximate the power $\pi_m(\psi_{1m})$ under $\psi = \psi_{1m}$ based on this result. The power under the fixed alternative ψ = ψ_A given a sample size m is then obtained by specifying h so that $\psi_{1m} = \psi_A$, in which case $\pi_m(\psi_{1m})$ becomes identically π_m(ψ_A). Although it is based on local asymptotic theory, this approach calculates power and sample sizes under the fixed value ψ_A, because ψ_A lies on the path along which $\psi_{1m}$ converges to ψ₀ as m → ∞. The justification for this procedure depends on finding the asymptotic behavior of $\hat\psi$ under $H_{1m}$, and we do this by extending the approach in Shao to a sequence of contiguous alternatives.

3.1. Local asymptotics

The results in this section are developed under the general setting described in Section 2.2. Various regularity conditions are needed, as provided in the Appendix. Let P_m denote the joint distribution of $\{y_i\}_{i=1,2,\ldots,m}$ under $H_{1m}$ conditional on the covariates. Let $\beta_0 = (\lambda_0^T, \psi_0^T)^T$ and $\beta_m = (\lambda_0^T, \psi_{1m}^T)^T$, where λ₀ is the true value of λ. Our first result establishes the asymptotic behavior of $\hat\psi$ under P_m as m → ∞; a sketch of the proof is given in the Appendix.

Theorem 1

Under regularity conditions (C1)–(C9) in the Appendix, $\sqrt{m}\,(\hat\psi - \psi_0)$ converges to $N_p(h, B \Sigma_{\beta_0} B^T)$ in distribution under P_m.

Remark 1

In the special case that the y_i are i.i.d. and $\hat\beta$ is the MLE of β, the limiting covariance matrix $\Sigma_{\beta_0}$ is the inverse of the Fisher information matrix, and our result for the local asymptotic distribution of $\sqrt{m}\,(\hat\beta - \beta_0)$ reduces to the classical result obtained by applying Le Cam’s third lemma; see, e.g., Chapter 7 of van der Vaart (1998).

Remark 2

Theorem 5.14 in Shao can be considered as a special case of the above theorem with h = 0, where the asymptotic behavior of the GEE estimator $\hat\psi$ is provided under a fixed value of ψ; see (2.4). The limiting covariance matrix remains the same under the local alternatives, namely the submatrix of $\Sigma_{\beta_0}$ corresponding to the components in ψ₀. The main additional challenge in proving the above result, beyond the theorem in Shao, is that the distributions of the y_i are allowed to vary with m.

We now state our result giving the asymptotic behavior of T_m and W_m under $H_{1m}$; a sketch of the proof is given in the Appendix. The result is the basis of our approach to obtaining power and sample size.

Theorem 2

Under regularity conditions (C1)–(C9) in the Appendix, the test statistics T_m and W_m converge under P_m in distribution to a noncentral chi-squared random variable $\chi^2_p(\nu)$ with non-centrality parameter $\nu = h^T (B \Sigma_{\beta_0} B^T)^{-1} h$.

3.2. Power and sample size calculation procedure

In this section, we propose a power and sample size calculation procedure for GEE under marginal models. To carry out the calculations at the design stage of a study, we first need a good approximation to finite sample distributions of T_m and W_m. Theorem 2 provides such an approximation, but we propose to replace the non-centrality parameter ν by what we consider to be a better approximation for a given sample size m:

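$$\nu_m = m\, \xi_{m\psi}^T \Sigma_{m\psi}^{-1} \xi_{m\psi}, \tag{3.1}$$

$$\xi_{m\psi} = B \{M_m(\beta_0)\}^{-1} E_{\beta_m}\{s_m(\beta_0)\}, \tag{3.2}$$

$$\Sigma_{m\psi} = m B \{M_m(\beta_0)\}^{-1} \mathrm{Var}_{\beta_m}\{s_m(\beta_0)\} \{M_m(\beta_0)\}^{-1} B^T. \tag{3.3}$$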

Note that expressions (3.1)–(3.3) are conditional on the covariates and cluster sizes. If the design is non-random and prespecified, then these expressions could be used directly. However, in general, these expressions would not be available and would need to be estimated from pilot data. For the rest of this section we restrict attention to the case that the covariates and cluster sizes (z_i, x_i, n_i), i = 1, … , m, are i.i.d., where $x_i = (x_{i1}, x_{i2}, \ldots, x_{in_i})$ and $z_i = (z_{i1}, z_{i2}, \ldots, z_{in_i})$. Suppose it is of interest to achieve nominal power 1 – η at significance level ζ under ψ = ψ_A. As mentioned, we replace h in the expression of ν_m by $\sqrt{m}\,(\psi_A - \psi_0)$. Ideally, for the purpose of sample size calculation, we would integrate M_m, $E_{\beta_m}\{s_m(\beta_0)\}$ and $\mathrm{Var}_{\beta_m}\{s_m(\beta_0)\}$ over the distribution of (z₁, x₁, n₁). After those steps (details are provided in Section 2 of the supplementary material), under the marginal model described in Section 2.1, the non-centrality parameter ν_m can be approximated by $\tilde\nu_m = m\, \tilde\xi_\psi^T \tilde\Sigma_\psi^{-1} \tilde\xi_\psi$, where

$$\tilde\xi_\psi = \bar B \{E(D_1^T V_1^{-1} D_1)\}^{-1} E\left[ D_1^T V_1^{-1} \{\mu_1(\theta_A) - \mu_1(\theta_0)\} \right],$$

$$\tilde\Sigma_\psi = \bar B \{E(D_1^T V_1^{-1} D_1)\}^{-1} \left[ E\{D_1^T V_1^{-1} \mathrm{Var}_{\beta_A}(y_1 \mid z_1, x_1, n_1) V_1^{-1} D_1\} \right] \{E(D_1^T V_1^{-1} D_1)\}^{-1} \bar B^T. \tag{3.5}$$

Here D₁ and V₁ are evaluated under H₀, the expectations are taken with respect to the pre-specified distribution of (z₁, x₁, n₁), $\theta_A = (\kappa_0^T, \psi_A^T)^T$, $\beta_A = (\lambda_0^T, \psi_A^T)^T$, $\bar B = (0_{p \times q}, I_p)$, and $\lambda_0 = (\phi_0, \alpha_0^T, \kappa_0^T)^T$ is the true value of the nuisance parameter. The power at ψ = ψ_A is given by

$$\pi_m(\psi_A) = P\left\{ \chi^2_p\left( m\, \tilde\xi_\psi^T \tilde\Sigma_\psi^{-1} \tilde\xi_\psi \right) \ge \chi^2_{p,1-\zeta} \right\}.$$

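The final step is to solve the following equation for the sample size m that achieves nominal power 1 – η at significance level ζ:

$$\pi_m(\psi_A) = 1 - \eta,$$

that is, $m\, \tilde\xi_\psi^T \tilde\Sigma_\psi^{-1} \tilde\xi_\psi = \tilde\nu$, where $\tilde\nu$ satisfies $P\{\chi^2_p(\tilde\nu) \ge \chi^2_{p,1-\zeta}\} = 1 - \eta$, and then the sample size is given by

$$m = \frac{\tilde\nu}{\tilde\xi_\psi^T \tilde\Sigma_\psi^{-1} \tilde\xi_\psi}. \tag{3.7}$$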
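The inversion step can be sketched in a few lines of Python for the scalar case p = 1, where a noncentral $\chi^2_1(\nu)$ variable has the distribution of $(Z + \sqrt{\nu})^2$ with Z standard normal, so the power has a closed form in terms of the normal distribution function. This is a minimal illustration rather than the implementation used in the paper: the function names are hypothetical, the root is found by simple bisection, the inputs `xi` and `sigma` stand for already-computed scalar values of $\tilde\xi_\psi$ and $\tilde\Sigma_\psi$, and rounding m up to the next integer is an assumed convention.

```python
from statistics import NormalDist
import math

_STD_NORMAL = NormalDist()

def power_p1(nu, zeta=0.05):
    """P{chi2_1(nu) >= chi2_{1,1-zeta}} = P{|Z + sqrt(nu)| >= z_{1-zeta/2}}."""
    z = _STD_NORMAL.inv_cdf(1.0 - zeta / 2.0)
    root = math.sqrt(nu)
    return (1.0 - _STD_NORMAL.cdf(z - root)) + _STD_NORMAL.cdf(-z - root)

def solve_noncentrality_p1(eta=0.10, zeta=0.05, tol=1e-10):
    """Bisection for the nu at which the power equals 1 - eta (power increases in nu)."""
    lo, hi = 0.0, 100.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if power_p1(mid, zeta) < 1.0 - eta:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def sample_size_p1(xi, sigma, eta=0.10, zeta=0.05):
    """m = nu / (xi^2 / sigma) for scalar xi and sigma, rounded up (assumed convention)."""
    nu = solve_noncentrality_p1(eta, zeta)
    return math.ceil(nu * sigma / xi ** 2)

nu_90 = solve_noncentrality_p1()          # close to (z_{0.975} + z_{0.90})^2
m_example = sample_size_p1(0.5, 4.0)      # hypothetical xi = 0.5, sigma = 4.0
```

For ζ = 0.05 and η = 0.10 the solved value is essentially $(z_{1-\zeta/2} + z_{1-\eta})^2$, because the contribution of the opposite tail is negligible. For p > 1 a noncentral chi-squared routine from a statistical library would be needed in place of `power_p1`.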

Remark 3

The vector h is chosen to be $\sqrt{m}\,(\psi_A - \psi_0)$, so the fixed value ψ_A is on the path along which $H_{1m}$ converges to H₀. Although the method is based on local asymptotic theory, it therefore calculates power and sample sizes under a fixed value ψ_A of ψ. We will see in the simulation study of Section 5 that the proposed method works better than previous approaches in various commonly seen cases with fixed alternative values.

Remark 4

An alternative approach is to calculate the power $\pi_m(\psi_{1m})$ with ν_m replaced by ν (cf. van der Vaart, 1998; Lehmann and Romano, 2005). To calculate ν, this approach uses the limiting variance $B \Sigma_{\beta_0} B^T$, which does not depend on the alternative value ψ_A. However, the variance of $\sqrt{m}\,(\hat\psi - \psi_0)$ may depend on the alternative value ψ_A for a given sample size m under some models, e.g., logistic regression models, where the variance of the outcome is a function of its mean.

In summary, our sample size calculation proceeds as follows:

Step 1. Choose the type I error rate ζ and power 1 – η.

Step 2. Specify the joint distribution of covariates and cluster size.

Step 3. Give ψ₀, ψ_A and λ₀ and specify the working correlation matrix (based on pilot data if available). Calculate D₁ and V₁ in (2.2) under β = β₀.

Step 4. Calculate $\mathrm{Var}_{\beta_A}(y_1 \mid z_1, x_1, n_1)$ using the choice of the working correlation matrix (in Step 3) in place of the true correlation matrix.

Step 5. Based on the results in Steps 3 and 4, find, using numerical integration methods if necessary, the expectations within (3.5) under the joint distribution of the covariates and cluster size given in Step 2.

Step 6. Calculate $\tilde\xi_\psi$, $\tilde\Sigma_\psi$, $\tilde\nu$ and the sample size m according to (3.4), (3.5), (3.6), and (3.7).

The joint distribution in Step 2 and values of the nuisance parameters in Step 3 may need to be estimated from pilot data. One possibility in Step 2 would simply be to use the empirical distribution of the pilot data; alternatively, a parametric model could be fit to the pilot data (e.g., the normal distribution in the arsenic study discussed in Section 6), in which case numerical integration might be needed in Step 5. Of course, if the true correlation matrix is known, it could be used in Step 4, but typically such information is not available at the design stage (cf. Liu and Liang, 1997; Shih, 1997; Liu, Shih, and Gehan, 2002; Jung and Ahn, 2005). As mentioned by LL, sensitivity analysis should also be carried out to assess how the sample size varies according to changes of the specified working correlation matrix. Sample size re-estimation (SSR) is often carried out when interim data are available for updating design parameters (e.g., Shih, 2001; Friede and Kieser, 2006). SSR for studies with correlated data has been actively discussed (e.g., Shih and Gould, 1995; Lake et al., 2002; Zucker and Denne, 2002; Yin and Shen, 2005).

4. Comparison with previous approaches

In this section, we compare our approach with LL’s and Shih’s for several important examples of marginal models. The working correlation matrix is assumed to be the true correlation matrix. Background and discussion of LL’s and Shih’s approaches are provided in Section 3 and Section 4 of the supplementary material.

LL restricted attention to the case of discrete covariates; for continuous covariates, their approach requires an initial (ad hoc) discretization. As pointed out by Shieh (2000), there is no consensus on how the discretization should be done. For LL’s approach, it is also necessary to derive the limiting value of the nuisance parameter estimator under the alternative hypothesis. Compared to Shih’s approach, our approach has much better accuracy under various circumstances.

4.1. Continuous outcomes and cluster level covariates

Consider a study with common cluster size n, continuous outcomes and covariates: x_ij ≡ x_i, z_ij ≡ 1 (intercept), j = 1, … , n. For simplicity, suppose that the working correlation matrix R_i ≡ R coincides with the true correlation matrix. The standard linear regression version of (2.1) is μ_ij = κ + x_iψ, i = 1, 2, …, m, j = 1, 2, … , n, and there is a constant conditional marginal variance Var(y_ij|x_i) = σ². The null hypothesis of interest is H₀ : ψ = ψ₀. To obtain a desired power 1 – η at significance level ζ for detecting ψ = ψ_A, the sample size from our approach (3.7) is

$$m = \frac{(z_{1-\zeta/2} + z_{1-\eta})^2 \,(1_n^T R^{-1} 1_n)^{-1} \sigma^2}{(\psi_A - \psi_0)^2\, \mathrm{Var}(x_1)}, \tag{4.1}$$

where 1_n is the n-vector of 1’s. Derivation of this formula is provided in Section 8 of the supplementary material. When the correlation has an exchangeable structure with ρ_ij = ρ, we have $(1_n^T R^{-1} 1_n)^{-1} = \{1 + (n-1)\rho\}/n$, so the numerator in (4.1) is $(z_{1-\zeta/2} + z_{1-\eta})^2 \{1 + (n-1)\rho\}\sigma^2 / n$. Note that the presence of the intracluster correlation has effectively increased the noise variance from σ² to {1 + (n – 1)ρ}σ². The inflation factor 1 + (n – 1)ρ is known as the design effect (Scott and Holt, 1982), and provides a direct measure of how the sample size (needed to achieve a fixed nominal power) increases as a function of the intracluster correlation. The formula (4.1) agrees with Shih’s. In this example, LL’s approach requires (a) x₁ has a discrete distribution, or (b) an ad hoc discretization is made to its continuous distribution if x₁ is a continuous variable; in the former case, LL’s result agrees with (4.1).
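As a hedged numerical illustration of (4.1) under an exchangeable structure, the Python sketch below uses the fact that 1_n is an eigenvector of an exchangeable R with eigenvalue 1 + (n − 1)ρ, which gives $(1_n^T R^{-1} 1_n)^{-1} = \{1 + (n-1)\rho\}/n$. The function names and the input values are hypothetical, and rounding up to the next integer is an assumed convention.

```python
from statistics import NormalDist
import math

def exchangeable_times_ones(n, rho):
    """Return R 1_n for the exchangeable correlation matrix R (row sums of R)."""
    R = [[1.0 if j == k else rho for k in range(n)] for j in range(n)]
    return [sum(row) for row in R]

def inv_quadratic_form(n, rho):
    """(1_n^T R^{-1} 1_n)^{-1} = {1 + (n - 1) rho} / n for exchangeable R."""
    return (1.0 + (n - 1) * rho) / n

def sample_size_linear(delta, var_x, sigma2, n, rho, zeta=0.05, eta=0.10):
    """Formula (4.1) with delta = psi_A - psi_0, rounded up to an integer."""
    nd = NormalDist()
    z2 = (nd.inv_cdf(1.0 - zeta / 2.0) + nd.inv_cdf(1.0 - eta)) ** 2
    m = z2 * inv_quadratic_form(n, rho) * sigma2 / (delta ** 2 * var_x)
    return math.ceil(m)

# Hypothetical design: clusters of size 4, unit noise variance, Var(x_1) = 1.
m_independent = sample_size_linear(0.5, 1.0, 1.0, n=4, rho=0.0)
m_correlated = sample_size_linear(0.5, 1.0, 1.0, n=4, rho=0.5)
```

With these inputs the unrounded sample sizes differ by exactly the design effect 1 + (n − 1)ρ = 2.5, which is the inflation described in the text.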

4.2. Binary outcomes and cluster level covariates

This example uses the same assumptions as 4.1 above except that we use the logistic regression version of (2.1):

$$\mathrm{logit}(\mu_{ij}) = \kappa + x_i \psi, \quad i = 1, 2, \ldots, m, \quad j = 1, 2, \ldots, n. \tag{4.2}$$

The null hypothesis of interest is H₀ : ψ = ψ₀. To obtain a desired power 1 – η for detecting ψ = ψ_A, the sample size from (3.7) is

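$$m = \frac{(z_{1-\zeta/2} + z_{1-\eta})^2 \,(1_n^T R^{-1} 1_n)^{-1} \left\{ E(v_{1x}) \{E(x v_{0x})\}^2 + E(x^2 v_{1x}) \{E(v_{0x})\}^2 - 2 E(x v_{1x}) E(x v_{0x}) E(v_{0x}) \right\}}{\left[ E(v_{0x}) E(x p_{1x}) - E(v_{0x}) E(x p_{0x}) - E(x v_{0x}) \{E(p_{1x}) - E(p_{0x})\} \right]^2},$$

whereas Shih’s approach gives

$$m = \frac{(z_{1-\zeta/2} + z_{1-\eta})^2 \, E(v_{1x}) \,(1_n^T R^{-1} 1_n)^{-1}}{(\psi_A - \psi_0)^2 \left[ E(v_{1x}) E(x^2 v_{1x}) - \{E(x v_{1x})\}^2 \right]}.$$

Here $p_{0x} = \mathrm{expit}(\kappa_0 + x\psi_0)$ and $p_{1x} = \mathrm{expit}(\kappa_0 + x\psi_A)$ are the outcome probabilities at covariate value x under the null and alternative values of ψ, $v_{0x} = p_{0x}(1 - p_{0x})$, $v_{1x} = p_{1x}(1 - p_{1x})$, and the expectations are taken over the distribution of x = x₁.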
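The two sample size formulae for the logistic model can be evaluated directly when the covariate has a discrete distribution. The Python sketch below is illustrative only: the function names are hypothetical, the correlation is assumed exchangeable, the quantities p_{0x}, p_{1x}, v_{0x}, v_{1x} are taken to be expit(κ₀ + xψ₀), expit(κ₀ + xψ_A) and the corresponding binomial variances, and results are rounded up to the next integer. The example inputs correspond to the two-group design of Table 7.1 (p₀ = 0.1, ψ₀ = 0, cluster size 2, half of the clusters exposed).

```python
from statistics import NormalDist
import math

def expit(t):
    return 1.0 / (1.0 + math.exp(-t))

def expect(support, weights, f):
    """E f(x) for a discrete covariate with P(x = u_l) = w_l."""
    return sum(w * f(u) for u, w in zip(support, weights))

def sample_sizes_logistic(kappa0, psi0, psi_a, support, weights, n, rho,
                          zeta=0.05, eta=0.10):
    """Return (local-alternative m, Shih-type m) under exchangeable correlation."""
    nd = NormalDist()
    z2 = (nd.inv_cdf(1.0 - zeta / 2.0) + nd.inv_cdf(1.0 - eta)) ** 2
    c_inv = (1.0 + (n - 1) * rho) / n            # (1_n^T R^{-1} 1_n)^{-1}
    p0 = lambda x: expit(kappa0 + x * psi0)       # probability under psi_0
    p1 = lambda x: expit(kappa0 + x * psi_a)      # probability under psi_A
    v0 = lambda x: p0(x) * (1.0 - p0(x))
    v1 = lambda x: p1(x) * (1.0 - p1(x))
    E = lambda f: expect(support, weights, f)
    a0, a1 = E(v0), E(lambda x: x * v0(x))
    b0, b1, b2 = E(v1), E(lambda x: x * v1(x)), E(lambda x: x * x * v1(x))
    num = b0 * a1 ** 2 + b2 * a0 ** 2 - 2.0 * b1 * a1 * a0
    den = (a0 * E(lambda x: x * p1(x)) - a0 * E(lambda x: x * p0(x))
           - a1 * (E(p1) - E(p0))) ** 2
    m_local = z2 * c_inv * num / den
    m_shih = z2 * b0 * c_inv / ((psi_a - psi0) ** 2 * (b0 * b2 - b1 ** 2))
    return math.ceil(m_local), math.ceil(m_shih)

kappa0 = math.log(0.1 / 0.9)                      # reference risk p0 = 0.1
psi_a = math.log(0.25 / 0.75) - kappa0            # relative risk 2.5
sizes = sample_sizes_logistic(kappa0, 0.0, psi_a, [0, 1], [0.5, 0.5], n=2, rho=0.2)
```

For a continuous covariate the helper `expect` would be replaced by numerical or Monte Carlo integration over the assumed covariate distribution.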

Again, LL’s approach requires (a) x₁ has a discrete distribution, or (b) an ad hoc discretization is made to its continuous distribution; in both cases, LL’s approach requires a solution to a nonlinear equation for κ* in terms of a discrete distribution P(x₁ = u_l) = ω_l, l = 1, …, L:

$$\sum_{l=1}^{L} \omega_l \left\{ \mathrm{expit}(\kappa_0 + u_l \psi_A) - \mathrm{expit}(\kappa^* + u_l \psi_0) \right\} = 0.$$

However, if the null hypothesis of interest is ψ₀ = 0, this equation has an explicit solution and their result agrees with ours in case (a). In this example, there is a one-dimensional nuisance parameter and the equation is readily solved, but in general it would be more challenging to reach a solution. Moreover, the solution of this nonlinear equation is sensitive to the choice of discretization.
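The nonlinear equation for κ* is one-dimensional and its left-hand side is strictly decreasing in κ*, so it can be solved by bisection. The following Python sketch is a minimal illustration under that observation; the function name, bracket and tolerance are hypothetical choices, and the example distribution is the two-point design with half of the clusters exposed.

```python
import math

def expit(t):
    return 1.0 / (1.0 + math.exp(-t))

def solve_kappa_star(kappa0, psi0, psi_a, support, weights,
                     lo=-30.0, hi=30.0, tol=1e-12):
    """Root of sum_l w_l {expit(kappa0 + u_l psi_A) - expit(k + u_l psi_0)} = 0."""
    def g(k):
        return sum(w * (expit(kappa0 + u * psi_a) - expit(k + u * psi0))
                   for u, w in zip(support, weights))
    # g is strictly decreasing in k, positive at lo and negative at hi.
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if g(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

kappa0 = math.log(0.1 / 0.9)          # reference risk 0.1
psi_a = math.log(3.0)                 # exposed risk 0.25
kappa_star = solve_kappa_star(kappa0, 0.0, psi_a, [0, 1], [0.5, 0.5])
```

When ψ₀ = 0 the root is available in closed form as the logit of the average outcome probability under the alternative, which gives a convenient check on the numerical solution.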

4.3. Sibling studies with binary outcomes and unit level binary covariates

Consider a cohort study of m pairs of siblings in which one sibling is exposed and the other unexposed, and the outcome is binary (disease or non-disease). The covariates are $x_i = (x_{i1}, x_{i2})^T = (1, 0)^T$, where 1 and 0 designate exposed and unexposed, respectively, and $z_i = (z_{i1}, z_{i2})^T$ with $z_{i1} = z_{i2} = 1$ (intercept) for i = 1, …, m. Let ρ denote the common correlation between siblings. Logistic regression is used, and the cluster size n = 2. Let p₀ and p₁ denote the risk of the disease in unexposed and exposed subjects, respectively. Suppose the null hypothesis of interest is H₀ : ψ = ψ₀. To obtain a desired power 1 – η for detecting ψ = ψ_A, the sample size m from (3.7) is

$$m = \frac{(z_{1-\zeta/2} + z_{1-\eta})^2 \left( v_0^2 v_1 + v_0 \tilde v_0^2 - 2\rho\, v_0 \tilde v_0 \sqrt{v_0 v_1} \right)}{v_0^2\, (p_1 - \tilde p_0)^2},$$

whereas Shih’s approach gives

$$m = \frac{(z_{1-\zeta/2} + z_{1-\eta})^2 \left( v_0 + v_1 - 2\rho \sqrt{v_0 v_1} \right)}{(\psi_A - \psi_0)^2\, v_0 v_1}.$$

Here $v_0 = p_0(1 - p_0)$, $v_1 = p_1(1 - p_1)$, $\tilde p_0 = \mathrm{expit}(\kappa_0 + \psi_0)$ is the risk in exposed subjects under H₀, and $\tilde v_0 = \tilde p_0(1 - \tilde p_0)$.

LL’s approach requires solving an equation for κ*:

$$\left( 1 - \rho\, \frac{p_1^*}{c_0\, p_0^*} \right) (p_0 - p_0^*) + \left( 1 - \rho\, \frac{c_0\, p_0^*}{p_1^*} \right) (p_1 - p_1^*) = 0, \tag{4.3}$$

where c₀ = exp(ψ₀/2), $p_0^* = \mathrm{expit}(\kappa^*)$, and $p_1^* = \mathrm{expit}(\kappa^* + \psi_0)$. Again, if the null hypothesis of interest is ψ₀ = 0, (4.3) has the explicit solution κ* = logit[(p₀ + p₁)/2], and their result agrees with ours.
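For the sibling design, both sample size formulae reduce to simple arithmetic in p₀, p₁, ψ₀ and ρ. The Python sketch below is a hedged illustration: the function names are hypothetical, the tilde quantities are taken to be the exposed-sibling risk and variance under H₀ (that is, expit(κ₀ + ψ₀) with κ₀ = logit(p₀)), and results are rounded up to the next integer. The example inputs are the first row of Table 7.2 and the first row of Table 7.3.

```python
from statistics import NormalDist
import math

def expit(t):
    return 1.0 / (1.0 + math.exp(-t))

def logit(p):
    return math.log(p / (1.0 - p))

def sibling_sample_sizes(p0, p1, psi0, rho, zeta=0.05, eta=0.10):
    """Return (local-alternative m, Shih-type m) for exposed/unexposed sibling pairs."""
    nd = NormalDist()
    z2 = (nd.inv_cdf(1.0 - zeta / 2.0) + nd.inv_cdf(1.0 - eta)) ** 2
    kappa0 = logit(p0)
    psi_a = logit(p1) - kappa0                 # log odds ratio under the alternative
    p0_t = expit(kappa0 + psi0)                # exposed risk under H0 (assumed meaning)
    v0, v1, v0_t = p0 * (1 - p0), p1 * (1 - p1), p0_t * (1 - p0_t)
    root = math.sqrt(v0 * v1)
    m_local = (z2 * (v0 ** 2 * v1 + v0 * v0_t ** 2 - 2 * rho * v0 * v0_t * root)
               / (v0 ** 2 * (p1 - p0_t) ** 2))
    m_shih = z2 * (v0 + v1 - 2 * rho * root) / ((psi_a - psi0) ** 2 * v0 * v1)
    return math.ceil(m_local), math.ceil(m_shih)

row_psi0_zero = sibling_sample_sizes(0.1, 0.2, psi0=0.0, rho=0.10)   # RR = 2.0
row_psi0_half = sibling_sample_sizes(0.1, 0.25, psi0=0.5, rho=0.10)  # RR = 2.5
```

When ψ₀ = 0 the tilde quantities coincide with p₀ and v₀, and the first formula simplifies to $(z_{1-\zeta/2} + z_{1-\eta})^2 (v_0 + v_1 - 2\rho\sqrt{v_0 v_1}) / (p_1 - p_0)^2$.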

5. Simulation study

In this section we report the results of a simulation study for some of the examples in the previous section in which the sample size formulae from previous approaches disagree with that from our approach. We generated 10,000 replicated samples according to the marginal model in each setting. For each sample, we estimated the parameters and implemented the tests. The empirical power is given by the proportion of samples with test statistic value exceeding $\chi^2_{p,1-\zeta}$. The software SAS v9.2 was used to generate the data sets and calculate the test statistics. Accuracy of the sample size formula is then determined by how close the empirical power is to the nominal power. We set the type I error rate to be ζ = 0.05 and the nominal power at 90%. The Monte Carlo standard errors of the empirical powers reported in the simulation study are about 0.3%. In each example, we also provide the sample size that generates empirical power closest to the nominal power 90% in the simulation study and call it the “actual” sample size.
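The quoted Monte Carlo standard error follows from the binomial variance of an empirical proportion. As a small check (a sketch with a hypothetical function name, assuming the true power is close to the nominal 90%):

```python
import math

def mc_standard_error(power, replicates):
    """Standard error of an empirical power based on independent replicates."""
    return math.sqrt(power * (1.0 - power) / replicates)

se = mc_standard_error(0.90, 10_000)   # 0.003, i.e. 0.3 percentage points

def within_k_se(empirical_pct, nominal_pct=90.0, k=1.0):
    """True if an empirical power (in %) lies within k standard errors of nominal."""
    return abs(empirical_pct - nominal_pct) <= k * 100.0 * se
```

An empirical power of 89.82% is therefore within one standard error of the nominal level, whereas 92.43% is several standard errors above it.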

5.1. Simulation results for binary outcomes and cluster level binary exposures

Consider a special case of Example 4.2 involving a two-group comparison study with common cluster size 2, binary outcomes (disease or non-disease) and covariates x_ij ≡ x_i = 1 (exposed) or 0 (unexposed), and z_ij ≡ 1 (intercept), j = 1, 2. Suppose half of the clusters are unexposed and the other half exposed. We set the reference risk of disease in the unexposed group to be p₀ = 0.1, and set ψ₀ = 0. The simulation results are presented in Table 7.1. The empirical powers of the sample sizes from Shih’s approach are substantially larger than the nominal power 90%, whereas from our approach they are mostly within one or two Monte Carlo standard errors of the nominal power. The sample sizes from our approach almost coincide with the “actual” sample sizes. Shih’s approach overestimates the sample sizes by over 8% for the first three cases, and over 10% for the last six cases. In particular, for the last three cases the sample sizes are overestimated by about 18%. On the other hand, our approach is accurate to within about 1–3% for the first six cases and 1–4% for the last three cases. The reason for the overestimation of sample size based on Shih’s method is unclear, but perhaps it is due to the use of the limiting variance of $\sqrt{m}\,(\hat\psi - \psi_A)$ in the calculation of the non-centrality parameter, as discussed in Remark 4 of Section 3.

Table 7.1

Simulation results comparing our approach with Shih’s approach in the two-sample comparison problem with a binary outcome and a cluster level binary exposure. Sample sizes are for achieving a nominal power 90% to detect different relative risks (p₁/p₀) at 0.05 significance level for different correlations ρ. The reference risk p₀ is fixed at 0.1 and ψ₀ = 0. Empirical powers are based on the Wald test. The “actual” are the sample sizes with empirical powers closest to the nominal power 90%.

RR	ρ	Our approach: m	Our approach: Power (%)	Shih’s approach: m	Shih’s approach: Power (%)	“Actual”: m	“Actual”: Power (%)
2.5	0.20	156	89.82	172	92.43	158	89.96
2.5	0.50	195	89.47	215	92.28	196	90.03
2.5	0.80	234	89.40	258	92.68	238	90.06
3.0	0.20	95	89.32	110	93.65	96	90.11
3.0	0.50	119	89.52	138	93.52	122	90.49
3.0	0.80	142	89.45	165	93.25	144	89.86
3.5	0.20	65	89.11	79	93.85	66	89.87
3.5	0.50	81	88.20	99	94.28	84	89.62
3.5	0.80	97	88.76	118	94.45	100	89.98
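The percentage discrepancies quoted in the text can be recomputed from the table. The short Python sketch below copies the sample sizes from Table 7.1 and reports each formula’s relative difference from the “actual” sample size; the variable and function names are hypothetical.

```python
# (RR, rho, m from our approach, m from Shih's approach, "actual" m), from Table 7.1
rows = [
    (2.5, 0.20, 156, 172, 158), (2.5, 0.50, 195, 215, 196), (2.5, 0.80, 234, 258, 238),
    (3.0, 0.20, 95, 110, 96),   (3.0, 0.50, 119, 138, 122), (3.0, 0.80, 142, 165, 144),
    (3.5, 0.20, 65, 79, 66),    (3.5, 0.50, 81, 99, 84),    (3.5, 0.80, 97, 118, 100),
]

def pct_diff(m, actual):
    """Relative difference from the 'actual' sample size, in percent."""
    return 100.0 * (m - actual) / actual

ours = [pct_diff(m_ours, actual) for _, _, m_ours, _, actual in rows]
shih = [pct_diff(m_shih, actual) for _, _, _, m_shih, actual in rows]

for (rr, rho, *_), d_ours, d_shih in zip(rows, ours, shih):
    print(f"RR={rr} rho={rho:.2f}  ours {d_ours:+.1f}%  Shih {d_shih:+.1f}%")
```

On these figures the proposed formula is slightly conservative in the other direction (its sample sizes fall a few percent below the “actual” values), while Shih’s sample sizes exceed them in every row.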

5.2. Simulation results for sibling studies with binary outcomes and binary exposures

First consider the sibling study (Example 4.3) with p₀ = 0.1 and ψ₀ = 0; see Table 7.2. Shih’s approach still overestimates the sample size. The sample sizes from our approach are all accurate to within 1–4% compared to the actual, whereas Shih’s approach overestimates the sample sizes by 6–18%. Table 7.3 compares our approach with LL’s when ψ₀ changes to 0.5 (LL’s approach agrees with ours when ψ₀ = 0). The empirical powers of the sample sizes from our approach are mostly within one Monte Carlo standard error of the nominal power, whereas those from LL’s approach fall outside such limits.

Table 7.2

Simulation results comparing our approach with Shih’s approach in the sibling study with a binary outcome and a binary exposure. Sample sizes are for achieving a nominal power 90% to detect different relative risks (p₁/p₀) at 0.05 significance level for different correlations ρ. The reference risk p₀ is fixed at 0.1 and ψ₀ = 0. Empirical powers are based on the Wald test.

RR	ρ	Our approach: m	Our approach: Power (%)	Shih’s approach: m	Shih’s approach: Power (%)	“Actual”: m	“Actual”: Power (%)
2.0	0.10	238	90.34	251	91.72	236	90.02
2.0	0.15	225	90.31	238	91.85	220	90.03
2.0	0.20	213	90.93	225	91.95	209	89.96
2.5	0.10	118	90.16	130	92.91	118	90.16
2.5	0.15	112	90.69	124	93.18	109	90.08
2.5	0.20	106	90.70	117	93.37	102	89.96
3.0	0.10	72	90.05	84	93.67	72	90.05
3.0	0.15	68	89.75	79	93.90	68	89.75
3.0	0.20	65	90.36	75	93.88	64	90.00

Table 7.3

Simulation results comparing our approach with LL’s approach in the sibling study with a binary outcome and a binary exposure. Sample sizes are for achieving a nominal power 90% to detect different relative risks (p₁/p₀) at 0.05 significance level for different correlations ρ. The reference risk p₀ is fixed at 0.1 and ψ₀ = 0.5. Empirical powers are based on the quasi-score test.

RR	ρ	Our approach: m	Our approach: Power (%)	LL’s approach: m	LL’s approach: Power (%)	“Actual”: m	“Actual”: Power (%)
2.5	0.10	395	90.04	388	89.47	395	90.04
2.5	0.15	373	89.83	366	89.59	374	90.10
2.5	0.20	351	89.82	345	89.11	351	89.82
3.0	0.10	180	90.16	176	89.53	179	89.99
3.0	0.15	170	90.22	166	89.48	169	90.06
3.0	0.20	160	90.55	157	89.81	158	89.95
3.5	0.10	104	89.58	102	89.57	105	89.87
3.5	0.15	99	90.11	96	88.80	99	90.11
3.5	0.20	93	90.19	91	89.70	92	89.88
4.0	0.10	68	89.12	66	88.11	70	89.89
4.0	0.15	65	90.12	63	89.23	65	90.12
4.0	0.20	61	90.37	59	89.50	61	90.37

6. Application to an arsenic study

Exposure to arsenic through drinking water represents a major threat to human health (Karagas, 2010). Recently, a cohort study has been initiated by the Children’s Environmental Health and Disease Prevention Center and Superfund Basic Research Program at Dartmouth College to investigate the impact of arsenic exposure on the health of pregnant women and children in New England. A primary aim of the study is to assess the association between arsenic exposure during pregnancy and infant growth. Suppose the binary outcome ‘short stature’ is to be recorded at 0.5, 1, 1.5, and 2 years of age, giving the cluster size of n = 4 in model (4.2). The baseline proportion of ‘short’ children fixes the intercept at κ₀ = –2.717 on the logit scale, corresponding to x_i = 0 in (4.2). Data indicate that the mean log-arsenic concentration (μg/L) in the water supply in New Hampshire is –0.942, so in (4.2) we set x_i = log-exposure – (–0.942). Pilot data on the study population (Gilbert-Diamond et al., 2012) show that log-arsenic exposure is (approximately) distributed as N(–0.04, 4), so we specify x_i ~ N(0.902, 4).

First we consider an AR(1) structure, $\rho_{ij} = \rho^{|i-j|}$. Pilot data on children’s heights in this cohort were not provided, so it is not possible to estimate ρ and we consider a range of values: ρ = 0.2, 0.5 and 0.8. For each ρ, we calculate the sample size needed to detect an odds ratio of 1.5 (ψ_A = 0.406) due to a unit increase in log-arsenic exposure (to achieve 90% power at significance level 0.05). Our approach gives sample sizes of 70, 105, and 157, and Shih’s gives 69, 103, and 154 for the three values of ρ, respectively. Using LL’s approach with a discretization of 150 equispaced values within 3 standard deviations of the mean log-arsenic exposure, the sample sizes are 73, 109, and 164, respectively. Under an exchangeable structure with ρ_ij = ρ, for ρ = 0.2, 0.5 and 0.8, the sample sizes are 84, 131, and 178 from our approach, 82, 128, and 174 from Shih’s, and 88, 137, and 185 from LL’s, respectively. In this example, we used Monte Carlo integration in Step 5. SAS code is provided in the supplementary material.
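The calculation for this example can be sketched in Python as follows. This is not the SAS code from the supplementary material: it replaces Monte Carlo integration by a deterministic midpoint rule, assumes the null value ψ₀ = 0, reads N(0.902, 4) as mean 0.902 and variance 4, and uses the sample size formula for the logistic model with cluster-level covariates from Section 4.2. All function names are hypothetical. The closed forms used for $(1_n^T R^{-1} 1_n)^{-1}$ are {1 + (n − 1)ρ}/n for the exchangeable structure and (1 + ρ)/{n − (n − 2)ρ} for AR(1).

```python
from statistics import NormalDist
import math

def expit(t):
    return 1.0 / (1.0 + math.exp(-t))

def inv_qf_exchangeable(n, rho):
    return (1.0 + (n - 1) * rho) / n

def inv_qf_ar1(n, rho):
    return (1.0 + rho) / (n - (n - 2) * rho)

def normal_expectation(f, mean, sd, half_width=8.0, steps=4000):
    """Midpoint-rule approximation to E f(X) for X ~ N(mean, sd^2)."""
    dist = NormalDist(mean, sd)
    a = mean - half_width * sd
    h = 2.0 * half_width * sd / steps
    grid = (a + (i + 0.5) * h for i in range(steps))
    return h * sum(f(x) * dist.pdf(x) for x in grid)

def local_sample_size(inv_qf, kappa0=-2.717, psi0=0.0, psi_a=math.log(1.5),
                      mean_x=0.902, sd_x=2.0, zeta=0.05, eta=0.10):
    """Unrounded sample size from the Section 4.2 formula (local-alternative approach)."""
    nd = NormalDist()
    z2 = (nd.inv_cdf(1.0 - zeta / 2.0) + nd.inv_cdf(1.0 - eta)) ** 2
    p0 = lambda x: expit(kappa0 + x * psi0)
    p1 = lambda x: expit(kappa0 + x * psi_a)
    v0 = lambda x: p0(x) * (1.0 - p0(x))
    v1 = lambda x: p1(x) * (1.0 - p1(x))
    E = lambda f: normal_expectation(f, mean_x, sd_x)
    a0, a1 = E(v0), E(lambda x: x * v0(x))
    b0, b1, b2 = E(v1), E(lambda x: x * v1(x)), E(lambda x: x * x * v1(x))
    num = b0 * a1 ** 2 + b2 * a0 ** 2 - 2.0 * b1 * a1 * a0
    den = (a0 * E(lambda x: x * p1(x)) - a0 * E(lambda x: x * p0(x))
           - a1 * (E(p1) - E(p0))) ** 2
    return z2 * inv_qf * num / den

m_ar1 = local_sample_size(inv_qf_ar1(4, 0.2))
m_exch = local_sample_size(inv_qf_exchangeable(4, 0.2))
```

Under these assumptions the results should land near the sample sizes reported above for ρ = 0.2, although small differences from values obtained by Monte Carlo integration can be expected. The ratio of the exchangeable to the AR(1) sample size depends only on the two correlation factors, which equals 1.2 for n = 4 and ρ = 0.2.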

7. Discussion

We have developed a method to calculate sample size for studies with correlated data. Under the framework of marginal models, our approach gives the same sample size formulae for the Wald and quasi-score tests. We study power under a sequence of local alternatives that converge to the null hypothesis at root-m rate, and show that the test statistics converge in distribution to a noncentral chi-squared distribution. We then link the sequence of alternatives to a fixed alternative by choosing a specific value of the local parameter vector h so that ψ₁ is on the path along which H₁_m converges to H₀.
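The linking step can be written out as a short worked equation. This is a sketch under two assumptions: that the local alternatives take the form ψ = ψ₀ + h/√m, and that the noncentrality parameter is the one given in the proof of Theorem 2. The symbol ν* is introduced here for illustration only; it denotes the noncentrality at which a noncentral chi-squared variable with p degrees of freedom exceeds the level-α critical value with the target power.

```latex
\begin{align*}
H_{1m}:\ \psi &= \psi_0 + \frac{h}{\sqrt{m}},
\qquad h = \sqrt{m}\,(\psi_1 - \psi_0), \\
\nu &= h^{T}\bigl(B\Sigma_{\beta_0}B^{T}\bigr)^{-1}h
     = m\,(\psi_1-\psi_0)^{T}\bigl(B\Sigma_{\beta_0}B^{T}\bigr)^{-1}(\psi_1-\psi_0), \\
m &= \frac{\nu^{*}}{(\psi_1-\psi_0)^{T}\bigl(B\Sigma_{\beta_0}B^{T}\bigr)^{-1}(\psi_1-\psi_0)} .
\end{align*}
```

The last line follows by setting ν = ν* and solving for m; in practice m would be rounded up to the next integer.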

As shown in the simulation study, our approach provides considerable improvements in terms of accuracy. Hanfelt and Liang (1995) developed an approximate likelihood ratio test for GEE that could be used as the basis for power and sample size calculations. We conjecture that our approach could be extended to the approximate likelihood ratio test and would lead to the same conclusion, since the likelihood ratio test is equivalent to the other two tests for independent data.

We used correlation to measure the association between binary outcomes. An alternative measure of association between pairs of binary responses is the odds ratio, which has a more straightforward interpretation. To estimate the odds ratio as a measure of association, a second set of estimating equations (Lipsitz, Laird, and Harrington (1991); Carey, Zeger, and Diggle (1993)) can be used. The power and sample size approach developed in the present paper can readily be adapted to these settings.

A referee raised the question of whether our approach could be extended to ordinal or nominal outcomes. Lipsitz, Kim, and Zhao (1994) developed a GEE for such outcomes; this could be handled under our local asymptotic approach, but further work would be needed to develop a specific procedure. Another issue raised by a referee concerns the handling of missing data. We refer the interested reader to Robins, Rotnitzky, and Zhao (1995), who developed an inverse-weighted GEE method for missing data. Again, further work would be needed to develop a specific procedure.

Monte Carlo simulations are sometimes used to obtain power and sample size, especially for complicated designs where explicit sample size formulae are hard to derive. However, simulations have some serious disadvantages from a practical perspective: they are time-consuming, computational expertise is needed and, in the case of small sample sizes, the results can be highly sensitive to distributional assumptions. On the other hand, our formulae only require plugging a few numbers into a calculator (or at most some routine numerical integrations), are more appealing to practitioners, and distributional assumptions play a limited role. With an explicit sample size formula, we are able to calculate the minimal sample size to find the most efficient design strategy in the planning stages of a study. Even with complicated designs, where explicit formulae are not available, explicit sample size formulae are still useful for providing initial sample size estimates that can later be refined using simulations.

Supplementary Material

Acknowledgments

The authors are grateful for many helpful comments from an associate editor and the referees. The research of Dr. Ian McKeague was supported in part by NIH Grant R01GM095722. We thank Dr. Margaret Karagas and the Children’s Environmental Health and Disease Prevention Center at Dartmouth College for providing preliminary data in the arsenic study example (supported by P20 ES018175 and P42 ES007373 from the NIEHS and NIH; and RD-83459901-0 from the USEPA).

Appendix. Regularity conditions

(C1) The parameter space 𝔹 is compact and β₀ belongs to its interior.

(C2) There exists δ > 1 such that

$$c_0 \equiv \sup_{m \ge 1} \max_{i=1,\dots,m} E_{\beta_m} |h_i(y_i)|^{1+\delta} < \infty \quad \text{and} \quad c_1 \equiv \sup_{m \ge 1} \max_{i=1,\dots,m} E_{\beta_m} |y_i|^{\delta} < \infty.$$

(C4) $\sup_{i \ge 1} \left| E_{b_1}[\Psi_i(y_i, b)] - E_{b_1^*}[\Psi_i(y_i, b^*)] \right| \lesssim |b_1 - b_1^*| + |b - b^*|$ for all $b_1, b_1^*, b, b^* \in \mathbb{B}$, where “≲” means “smaller than up to a constant”.

Remark 5

Condition (C2) is slightly stronger than the corresponding condition in Shao, namely that sup_i E|h_i(y_i)|^{1+δ} < ∞ and sup_i E|y_i|^δ < ∞ for some δ > 0; we need the stronger condition because in our case the distribution of y_i is changing with sample size. The consistency result established in the first part of the proof of Theorem 1 still holds if we replace δ > 1 by δ > 0 in (C2), but we need δ > 1 for the asymptotic normality result. Condition (C3) is also a little stronger because we make the additional assumption that the functions are uniformly bounded on 𝔹. Similar to the existence of a well-separated point of maximum in Theorem 5.7 in van der Vaart (1998), (C5) requires that f_m(b) has a unique zero β₀, and only values close to β₀ yield a value of f_m(b) close to zero. (C4) and (C7) are Lipschitz conditions.

Remark 6

Under marginal models, $\frac{1}{m} M_m(\beta_0)$ and $\frac{1}{m} \sum_{i=1}^{m} \mathrm{var}_{\beta_0}(\Psi_i(y_i, \beta_0))$ in (C8) reduce to $\frac{1}{m} \sum_{i=1}^{m} D_i^T V_i^{-1} D_i$ and

$$\frac{1}{m} \sum_{i=1}^{m} D_i^T V_i^{-1}\, \mathrm{Cov}(y_i \mid z_i, x_i)\, V_i^{-1} D_i,$$

respectively; these are important components of the covariance matrix of the GEE estimator developed in Liang and Zeger (1986). Also, the matrix M(β₀) in (C9) is the inverse of the asymptotic covariance of $\sqrt{m}(\hat\beta - \beta_0)$ if the working correlation is correctly specified.
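For orientation, the two matrices in Remark 6 combine into the familiar sandwich form of the GEE covariance in Liang and Zeger (1986). The sketch below uses A_m and C_m as shorthand introduced here (they are not notation from the paper) and shows why the covariance collapses to the inverse of the first matrix when the working covariance is correct.

```latex
\begin{align*}
A_m &= \frac{1}{m}\sum_{i=1}^{m} D_i^{T} V_i^{-1} D_i, \qquad
C_m = \frac{1}{m}\sum_{i=1}^{m} D_i^{T} V_i^{-1}\,\mathrm{Cov}(y_i \mid z_i, x_i)\,V_i^{-1} D_i, \\
\mathrm{Cov}\bigl\{\sqrt{m}\,(\hat\beta-\beta_0)\bigr\} &\approx A_m^{-1}\, C_m\, A_m^{-1}, \\
\mathrm{Cov}(y_i \mid z_i, x_i) = V_i \ \text{for all } i
&\;\Longrightarrow\; C_m = A_m
\;\Longrightarrow\; A_m^{-1} C_m A_m^{-1} = A_m^{-1}.
\end{align*}
```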

Sketch of proof of Theorem 1

The detailed proof is provided in Section 6 of the supplementary material. Under (C1)–(C5), it can be shown that $\hat\beta \xrightarrow{P_m} \beta_0$ and $\hat\psi \xrightarrow{P_m} \psi_0$ by adapting the proof of Theorem 5.7 of van der Vaart (1998).

By using the extra conditions (C6)–(C9), we can further show that $\sqrt{m}(\hat\beta - \beta_0)$ converges in distribution under $P_m$ to $N(h^*, \Sigma_{\beta_0})$, where $h^* = (0_{k-p}^T, h^T)^T$ and $0_{k-p}$ is the (k–p)-dimensional zero vector. The proof is completed by noticing that $\hat\psi = B\hat\beta$.

Sketch of proof of Theorem 2

The detailed proof is provided in Section 7 of the supplementary material.

Asymptotic distribution of W_m

The first step is to show that $\hat\Sigma$ converges in probability under $H_{1m}$ to $\Sigma_{\beta_0}$. Thus $B\hat\Sigma B^T$ converges in probability under $H_{1m}$ to $B\Sigma_{\beta_0}B^T$. From Theorem 1, $\sqrt{m}(\hat\psi - \psi_0)$ converges in distribution under $H_{1m}$ to $N(h, B\Sigma_{\beta_0}B^T)$. Therefore, by Slutsky’s Lemma and the Continuous Mapping Theorem, $W_m$ converges in distribution under $H_{1m}$ to a noncentral $\chi^2_p$ distribution with noncentrality parameter $\nu = h^T (B\Sigma_{\beta_0}B^T)^{-1} h$.
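The limiting noncentral chi-squared distribution is what turns a target power into a number of clusters. As a hedged illustration for the scalar case p = 1 only, the Python sketch below uses the fact that a noncentral chi-squared variable with one degree of freedom and noncentrality ν is distributed as (Z + √ν)² with Z standard normal, so only the standard library is needed. The function names, the bisection bracket, and the placeholder value of `avar` (standing in for BΣ_β₀Bᵀ) are assumptions made for this example and do not come from the paper.

```python
from math import ceil, sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def wald_power_p1(nu, alpha=0.05):
    """Limiting power of a level-alpha test when p = 1.

    A noncentral chi-squared variable with 1 degree of freedom and
    noncentrality nu is distributed as (Z + sqrt(nu))**2 with Z ~ N(0, 1).
    """
    z = _STD_NORMAL.inv_cdf(1 - alpha / 2)  # square root of the chi-squared(1) critical value
    shift = sqrt(nu)
    return (1 - _STD_NORMAL.cdf(z - shift)) + _STD_NORMAL.cdf(-z - shift)


def required_noncentrality(power=0.90, alpha=0.05, tol=1e-10):
    """Bisection for the nu at which wald_power_p1 equals the target power."""
    lo, hi = 0.0, 100.0  # assumed bracket; power increases with nu
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if wald_power_p1(mid, alpha) < power:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def clusters_needed(nu, psi_diff, avar):
    """Solve nu = m * psi_diff**2 / avar for m and round up (p = 1 only)."""
    return ceil(nu * avar / psi_diff ** 2)


if __name__ == "__main__":
    nu = required_noncentrality(power=0.90, alpha=0.05)
    print(f"noncentrality for 90% power at level 0.05: {nu:.3f}")
    # avar = 6.0 is a made-up placeholder for B Sigma B^T, not a value from the paper.
    print("clusters:", clusters_needed(nu, psi_diff=0.406, avar=6.0))
```

For p > 1 the same idea applies, but the noncentral chi-squared distribution function with p degrees of freedom would be needed in place of the normal-based shortcut used here.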

Asymptotic distribution of T_m

An estimate $\hat\lambda$ of the nuisance parameter vector λ under H₀ is needed to calculate the quasi-score statistic. For this purpose it suffices to use the first k – p estimating equations, so $\hat\lambda$ can be taken as a solution of $C s_m(\lambda, \psi_0) = 0$, where $C = (I_{k-p} \;\; 0_{(k-p)\times p})$. Recall that $\hat\lambda$ and ψ₀ combine to form $\tilde\beta$. Write the quasi-score statistic as

$$T_m = \{ m^{-1/2} B s_m(\tilde\beta) \}^T (m^{-1} V_T)^{-1} \{ m^{-1/2} B s_m(\tilde\beta) \}.$$

It can be shown that T_m is asymptotically equivalent to W_m, concluding the proof.

Footnotes

Contents of supplementary material: Detailed proofs, derivations of the sample size formulae, outlines of LL’s and Shih’s approaches, and SAS code for calculating the sample sizes in the arsenic example.

References

Boos DD. On generalized score tests. The American Statistician. 1992; 46:327–333.
Carey V, Zeger SL, Diggle P. Modelling multivariate binary data with alternating logistic regressions. Biometrika. 1993; 80:517–526.
Diggle PJ, Heagerty P, Liang K-Y, Zeger SL. Analysis of Longitudinal Data. 2nd Edition. Oxford University Press; 2002.
Fitzmaurice GM, Davidian M, Verbeke G, Molenberghs G. Longitudinal Data Analysis: Handbooks of Modern Statistical Methods. Chapman & Hall/CRC; Boca Raton: 2009.
Friede T, Kieser M. Sample size recalculation in internal pilot study designs: A review. Biometrical Journal. 2006; 48:537–555.
Gilbert-Diamond D, Cottingham KL, Gruber JF, Punshon T, Sayarath V, Gandolfi AJ, Baker ER, Jackson B, Folt CL, Karagas MR. Rice consumption contributes to arsenic exposure in U.S. women. The Proceedings of the National Academy of Sciences of the United States of America. 2012. In press.
Hanfelt JJ, Liang K-Y. Approximate likelihood ratios for general estimating functions. Biometrika. 1995; 82:461–477.
Jung SH, Ahn C. Sample size estimation for GEE method for comparing slopes in repeated measurements data. Statistics in Medicine. 2003; 22:1305–1315.
Jung SH, Ahn C. Sample size for a two-group comparison of repeated binary measurements using GEE. Statistics in Medicine. 2005; 24:2583–2596.
Karagas MR. Arsenic-related mortality in Bangladesh. The Lancet. 2010; 376:213–214.
Kim H-Y, Williamson JM, Lyles CM. Sample-size calculations for studies with correlated ordinal outcomes. Statistics in Medicine. 2005; 24:2977–2987.
Lake S, Kammann E, Klar N, Betensky R. Sample size re-estimation in cluster randomization trials. Statistics in Medicine. 2002; 21:1337–1350.
Le Cam L. Locally asymptotically normal families of distributions. University of California Publications in Statistics. 1960; 3:37–98.
Lehmann E, Romano J. Testing Statistical Hypotheses. 3rd Edition. Springer; New York: 2005.
Liang K-Y, Zeger SL. Longitudinal data analysis using generalized linear models. Biometrika. 1986; 73:13–22.
Lin DY, Yao Q, Ying Z. A general theory on stochastic curtailment for censored survival data. Journal of the American Statistical Association. 1999; 94:510–521.
Lipsitz SR, Kim K, Zhao L. Analysis of repeated categorical data using generalized estimating equations. Statistics in Medicine. 1994; 13:1149–1163.
Lipsitz SR, Laird NM, Harrington DP. Generalized estimating equations for correlated binary data: Using the odds ratio as a measure of association. Biometrika. 1991; 78:153–160.
Liu A, Shih WJ, Gehan E. Sample size and power determination for clustered repeated measurements. Statistics in Medicine. 2002; 21:1787–1801.
Liu G, Liang K-Y. Sample size calculations for studies with correlated observations. Biometrics. 1997; 53:937–947.
Pan W. Sample size and power calculations with correlated binary data. Controlled Clinical Trials. 2001; 22:211–227.
Robins JM, Rotnitzky A, Zhao L. Analysis of semiparametric regression models for repeated outcomes in the presence of missing data. Journal of the American Statistical Association. 1995; 90:106–121.
Rochon J. Application of GEE procedures for sample size calculations in repeated measures experiments. Statistics in Medicine. 1998; 17:1643–1658.
Rotnitzky A, Jewell NP. Hypothesis testing of regression parameters in semiparametric generalized linear models for cluster correlated data. Biometrika. 1990; 77:485–497.
Scott AJ, Holt D. The effect of two-stage sampling on ordinary least squares methods. Journal of the American Statistical Association. 1982; 77:848–854.
Self SG, Mauritsen RH. Power/sample size calculations for generalized linear models. Biometrics. 1988; 44:79–86.
Shao J. Mathematical Statistics. Springer-Verlag; New York: 2003.
Shieh G. On power and sample size calculations for likelihood ratio tests in generalized linear models. Biometrics. 2000; 56:1192–1196.
Shih WJ. Sample size and power calculations for periodontal and other studies with clustered samples using the method of generalized estimating equations. Biometrical Journal. 1997; 39:899–908.
Shih WJ. Sample size re-estimation: journey for a decade. Statistics in Medicine. 2001; 20:515–518.
Shih WJ, Gould AL. Re-evaluating design specifications of longitudinal clinical trials without unblinding when the key response is rate of change. Statistics in Medicine. 1995; 14:2239–2248.
van der Vaart AW. Asymptotic Statistics. Cambridge University Press; Cambridge, U.K.: 1998.
von Bahr B, Esseen C. Inequalities for the rth absolute moment of a sum of random variables, 1 ≤ r ≤ 2. Annals of Mathematical Statistics. 1965; 36:299–303.
Yin G, Shen Y. Adaptive design and estimation in randomized clinical trials with correlated observations. Biometrics. 2005; 61:362–369.
Zucker DM, Denne J. Sample-size redetermination for repeated measures studies. Biometrics. 2002; 58:548–559.