
Computing Standard Errors for MEPS Estimates

Steven Machlin, William Yu, and Marc Zodet

Introduction - Person-level files - Event-level files and supplements

Introduction

The Household Component of the Medical Expenditure Panel Survey (MEPS-HC) is designed to produce national and regional estimates of the health care use, expenditures, sources of payment, and insurance coverage of the U.S. civilian noninstitutionalized population. The sample design of the survey includes stratification, clustering, multiple stages of selection, and disproportionate sampling. Furthermore, the MEPS sampling weights reflect adjustments for survey nonresponse and adjustments to population control totals from the Current Population Survey. These survey design and estimation complexities require special consideration when analyzing MEPS data (i.e., it is not appropriate to assume simple random sampling).

To obtain accurate estimates from MEPS survey data, for either descriptive statistics or more sophisticated analyses based on multivariate models, the MEPS survey design complexities need to be taken into account by applying MEPS survey weights to produce estimates and using an appropriate technique to derive standard errors associated with the weighted estimates. Several methods for estimating standard errors for estimates from complex surveys have been developed, including the Taylor-series linearization method, balanced repeated replication, and the jack-knife method.

The MEPS public use files include variables to obtain weighted estimates and to implement a Taylor-series approach to estimate standard errors for weighted survey estimates. These variables, which jointly reflect the MEPS survey design, include the estimation weight, sampling strata, and primary sampling unit (PSU). The documentation and codebook for MEPS public use files contain these survey design variables. For example, the documentation for file HC-070 (2002 full-year consolidated data file) includes the person weight (PERWT02F), stratum (VARSTR), and PSU (VARPSU) variables.
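As an illustration of how the weight, stratum, and PSU variables enter a Taylor-series calculation, the following Python sketch computes a weighted mean and its linearized standard error, treating PSUs as sampled with replacement within strata (the assumption signaled by design=wr and type=wr in the examples in this document). The function name weighted_mean_se and the toy records are hypothetical. This is a simplified sketch, not the code used by any of the packages discussed here, and it omits refinements such as finite population corrections and missing-data handling.

```python
import math
from collections import defaultdict


def weighted_mean_se(records):
    """Weighted mean and linearized SE.

    records: iterable of (stratum, psu, weight, y) tuples (hypothetical layout).
    """
    records = list(records)
    total_w = sum(w for _, _, w, _ in records)
    mean = sum(w * y for _, _, w, y in records) / total_w

    # Linearized value of each record, accumulated to PSU totals within strata.
    psu_totals = defaultdict(lambda: defaultdict(float))
    for stratum, psu, w, y in records:
        psu_totals[stratum][psu] += w * (y - mean) / total_w

    # Between-PSU variance within each stratum, summed over strata.
    variance = 0.0
    for stratum, psus in psu_totals.items():
        n_h = len(psus)
        if n_h < 2:
            raise ValueError(f"stratum {stratum!r} has only one PSU")
        zbar = sum(psus.values()) / n_h
        variance += n_h / (n_h - 1) * sum((z - zbar) ** 2 for z in psus.values())
    return mean, math.sqrt(variance)


# Toy data (invented for illustration): (stratum, psu, weight, expenditure)
toy = [
    (1, 1, 10.0, 100.0),
    (1, 1, 10.0, 200.0),
    (1, 2, 20.0, 300.0),
    (2, 1, 15.0, 0.0),
    (2, 2, 15.0, 400.0),
]
mean, se = weighted_mean_se(toy)
print(f"mean={mean:.2f} se={se:.2f}")
```

With one stratum, two single-person PSUs, and equal weights, the sketch reduces to the familiar simple random sampling result (sample variance divided by n), which is a convenient sanity check.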

Statistical software packages that are commonly used to estimate standard errors from complex multistage designs using the Taylor-series linearization method include SAS® (version 8.2 or higher), SUDAAN®, Stata®, and SPSS® (version 12.0 or higher). Examples of basic programming code from these packages to produce selected estimates and the corresponding standard errors are provided in this document. The software packages vary with respect to the specific types of estimates and models that can be produced accounting for the complex survey design and the treatment of missing data. For complete information on the capabilities of each package, analysts need to refer to the appropriate software user documentation manuals. The Web sites for SAS, SUDAAN, Stata, and SPSS are http://www.sas.com, http://www.rti.org, http://www.stata.com, and http://www.spss.com, respectively. The R language also has a package for complex survey analysis. Information on this package can be found in the June 2003 R News newsletter available on the R website at http://www.r-project.org.

Standard errors for MEPS estimates are most accurate when the analytic file contains all of the MEPS sample persons (e.g., those with positive values for the person weight variable) and the appropriate syntax is used to analyze population subgroups. Section I below provides examples of basic programming code for SAS, SUDAAN, Stata, and SPSS to generate estimates from MEPS person-level files, both for the total population and for population subgroups. Section II provides options for estimation in situations where analytic files do not include all of the MEPS sample persons. These situations include analyses based solely on data from MEPS event files, which only contain sample persons that received a particular type of care, and analyses of data from MEPS supplements (e.g., the diabetes supplement data in PUF HC-070), which require the use of special analytic weights that exclude the sample persons who were not included in the supplement.


I. MEPS Person-Level Files

A. Analyses of the Total Population

Example: Using the 2002 MEPS full-year consolidated file (PUF HC-070) as the analytic file, the basic programming code provided below for each software package will produce correct estimates of the overall mean total expenditures in 2002 ($2,813.24) and the corresponding standard error ($58.99).

SAS

proc surveymeans;
stratum varstr;
cluster varpsu;
weight perwt02f;
var totexp02;

SUDAAN

Note: SUDAAN requires that the data be sorted by the survey design variables that appear on the NEST statement (i.e., varstr varpsu in example below).

proc descript filetype=sas design=wr;
nest varstr varpsu;
weight perwt02f;
var totexp02;

Stata (syntax below applies to releases 8.0 and higher)

svyset [pweight=perwt02f], strata(varstr) psu(varpsu)
svymean totexp02

SPSS

csplan analysis
/plan file='filename'
/planvars analysisweight=perwt02f
/design strata=varstr cluster=varpsu
/estimator type=wr.
csdescriptives
/plan file='filename'
/summary variables=totexp02
/mean
/statistics se.

B. Analyses Limited to a Population Subgroup

Analyses are often limited to a subgroup of the population. However, creating a special analysis file that contains only observations for the subgroup of interest may yield incorrect standard errors or an error message (e.g., "stratum with only one psu detected" in Stata) because all of the observations corresponding to a stage of the MEPS sample design may be deleted. Therefore, it is advisable to preserve the entire survey design structure for the program by reading in the entire person-level file. Each software package provides a capability to limit the analysis to a subgroup of the population without sub-setting the analysis file.

Example: Using the 2002 MEPS full-year consolidated file (PUF HC-070) as the analytic file, the following statements will produce accurate estimates of the average total expenditures in 2002 for children younger than 18 years of age ($1,085.82) and the corresponding standard error ($70.28).

SAS

proc surveymeans;
stratum varstr;
cluster varpsu;
weight perwt02f;
var totexp02;
domain agegroup;

Note: The domain statement in this example will generate estimates for all categories of the variable agegroup (a hypothetical constructed analytic variable where the youngest group is children under 18). There is no option within the surveymeans procedure to select only a specific population subgroup (e.g., agegroup=1).

SUDAAN

proc descript filetype=sas design=wr;
nest varstr varpsu;
weight perwt02f;
var totexp02;
subpopn agegroup=1;

Note: The subpopn statement in this example generates estimates for children under 18 (where agegroup is a constructed analytic variable that is equal to 1 for children under 18).

Stata (syntax below applies to releases 8.0 and higher)

svyset [pweight=perwt02f], strata(varstr) psu(varpsu)
svymean totexp02, subpop(children)

Note: The subpop statement in this example generates estimates for children under 18 only (where children is a constructed variable set equal to 1 for persons under 18 and set equal to 0 for all other persons).

SPSS

csplan analysis
/plan file='filename'
/planvars analysisweight=perwt02f
/design strata=varstr cluster=varpsu
/estimator type=wr.
csdescriptives
/plan file='filename'
/summary variables=totexp02
/mean
/statistics se
/subpop table=children.

Note: The subpop statement in this example will generate estimates for all categories of the variable children (a hypothetical constructed dichotomous analytic variable where 1=children under 18 and 0=adults 18 and over). There is no option within the csdescriptives procedure to select only a specific population subgroup (e.g., children=1).
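The reason for keeping the full file in subgroup analyses can be sketched in Python. In the hypothetical example below, persons outside the subgroup contribute zero to the linearized PSU totals but still register their PSU in the design, whereas subsetting the file first leaves one stratum with a single PSU. The function names domain_mean_se and psus_per_stratum and the toy records are assumptions made for illustration, and the calculation is a simplified with-replacement linearization rather than the exact algorithm of any package.

```python
import math
from collections import defaultdict


def domain_mean_se(records):
    """Subgroup mean and linearized SE computed from the FULL sample.

    records: (stratum, psu, weight, y, in_domain) tuples (hypothetical layout).
    """
    records = list(records)
    dom_w = sum(w for _, _, w, _, d in records if d)
    mean = sum(w * y for _, _, w, y, d in records if d) / dom_w

    psu_totals = defaultdict(lambda: defaultdict(float))
    for stratum, psu, w, y, d in records:
        # Records outside the subgroup add 0 but still register their PSU.
        psu_totals[stratum][psu] += (w * (y - mean) / dom_w) if d else 0.0

    variance = 0.0
    for stratum, psus in psu_totals.items():
        n_h = len(psus)
        if n_h < 2:
            raise ValueError(f"stratum {stratum!r} has only one PSU")
        zbar = sum(psus.values()) / n_h
        variance += n_h / (n_h - 1) * sum((z - zbar) ** 2 for z in psus.values())
    return mean, math.sqrt(variance)


def psus_per_stratum(records):
    """Count the distinct PSUs present in each stratum."""
    seen = defaultdict(set)
    for stratum, psu, *_ in records:
        seen[stratum].add(psu)
    return {s: len(p) for s, p in seen.items()}


# Toy data (invented): (stratum, psu, weight, expenditure, is_child)
full = [
    (1, 1, 10.0, 50.0, True),
    (1, 1, 10.0, 500.0, False),
    (1, 2, 20.0, 150.0, True),
    (2, 1, 15.0, 80.0, True),
    (2, 2, 15.0, 900.0, False),  # this PSU contains no children
]
children_only = [r for r in full if r[4]]

print(psus_per_stratum(full))           # both strata keep two PSUs
print(psus_per_stratum(children_only))  # stratum 2 is left with one PSU
print(domain_mean_se(full))
```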


II. Analysis of MEPS Event-Level Files and MEPS Supplements

There are some situations where it is not convenient to include all of the MEPS sample persons in the analytic file. In particular, MEPS event-level files only contain sample persons that received a particular type of care. Also, while data from MEPS supplements are typically contained on person-level files that include all sample persons, their analysis requires the use of special analytic weights that essentially exclude sample persons who were not included in the supplement.

While standard errors are technically most accurate when the analytic file contains all of the MEPS sample persons (e.g., those with positive values on the person weight variable), it is possible to produce standard errors that will usually be fairly accurate without creating an analytic file that includes the entire sample. The following two sections provide more detailed information as well as examples of programming code for SAS, SUDAAN, and SPSS when working with MEPS event-level files and data from MEPS supplements.

A. MEPS Event-Level Files

MEPS event-level files include only records for persons with health care use in the year. Therefore, analyses based solely on event-level files do not preserve the entire estimation structure because persons in the sample without health care use are not represented. If a substantial number of persons in the sample are not represented in the file and some strata contain only observations from one PSU, then estimating standard errors becomes problematic.

Error messages are generated in SUDAAN and Stata when only one PSU is encountered in a stratum because standard errors cannot be estimated. While the MISSUNIT option on the NEST statement in SUDAAN will generate standard error estimates (see note under SUDAAN example below), Stata does not provide an option for estimating standard errors when there are some strata with only one PSU. However, there is a discussion of options for dealing with this situation on the Stata Web site at http://www.stata.com/support/faqs/stat/stratum.html.

In contrast to SUDAAN and Stata, SAS and SPSS will automatically generate standard errors when there are some strata with only one PSU. The methodology used to compute standard errors when there are some strata with only one PSU used by SAS and SPSS differs from that used by SUDAAN when the MISSUNIT option is specified (see note under example below). Consequently, standard errors from these packages will not necessarily be identical (see example below).

Example: Using the 2002 MEPS hospital inpatient stays file (PUF HC-067D) as the analytic file, the following sample programming code for SAS, SUDAAN, and SPSS produces estimates of the average total expense per hospital stay in 2002 ($8,698.00) and the corresponding standard error ($286.55 from SAS and SPSS versus $298.30 from SUDAAN with the MISSUNIT option).

SAS

proc surveymeans;
stratum varstr;
cluster varpsu;
weight perwt02f;
var ipxp02x;

Note: The variances for the strata with only one PSU are considered to be 0 in the overall computation of standard errors.

SUDAAN

Note: SUDAAN requires that the data be sorted by the survey design variables that appear on the NEST statement (i.e., varstr varpsu in example below).

proc descript filetype=sas design=wr;
nest varstr varpsu/missunit;
weight perwt02f;
var ipxp02x;

Note: The missunit option on the nest statement specifies that if only one sample unit is encountered within a stage (i.e., one PSU in a stratum), then the contribution of that unit toward the overall standard error is estimated using the difference in that unit's value and the overall mean value of the population.

SPSS

csplan analysis
/plan file='filename'
/planvars analysisweight=perwt02f
/design strata=varstr cluster=varpsu
/estimator type=wr.
csdescriptives
/plan file='filename'
/summary variables=ipxp02x
/mean
/statistics se.
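The two treatments of single-PSU strata described in this section can be contrasted with a small Python sketch. The option "zero" follows the SAS note above (a stratum with one PSU contributes nothing to the variance). The option "centered" is only one simple reading of the MISSUNIT description (the squared difference between the lone PSU's linearized total and the mean of all PSU totals) and should not be taken as SUDAAN's documented formula. The function name, option names, and toy records are hypothetical.

```python
import math
from collections import defaultdict


def mean_se_single_psu(records, lonely="zero"):
    """Weighted mean and linearized SE with a choice for single-PSU strata.

    records: (stratum, psu, weight, y) tuples (hypothetical layout).
    lonely:  "zero" or "centered" (illustrative option names).
    """
    if lonely not in ("zero", "centered"):
        raise ValueError("lonely must be 'zero' or 'centered'")
    records = list(records)
    total_w = sum(w for _, _, w, _ in records)
    mean = sum(w * y for _, _, w, y in records) / total_w

    psu_totals = defaultdict(lambda: defaultdict(float))
    for stratum, psu, w, y in records:
        psu_totals[stratum][psu] += w * (y - mean) / total_w

    # Mean of the linearized PSU totals across the whole file.
    all_totals = [z for psus in psu_totals.values() for z in psus.values()]
    grand_mean = sum(all_totals) / len(all_totals)

    variance = 0.0
    for psus in psu_totals.values():
        n_h = len(psus)
        if n_h >= 2:
            zbar = sum(psus.values()) / n_h
            variance += n_h / (n_h - 1) * sum((z - zbar) ** 2 for z in psus.values())
        elif lonely == "centered":
            (z,) = psus.values()
            variance += (z - grand_mean) ** 2
        # lonely == "zero": the stratum adds nothing to the variance
    return mean, math.sqrt(variance)


# Toy event-level data (invented): (stratum, psu, weight, expense per stay)
events = [
    (1, 1, 10.0, 5000.0),
    (1, 2, 12.0, 9000.0),
    (2, 1, 8.0, 20000.0),  # stratum 2 has only one PSU on this file
]
print(mean_se_single_psu(events, lonely="zero"))
print(mean_se_single_psu(events, lonely="centered"))
```

Both calls return the same point estimate. In this toy file the "centered" standard error is larger than the "zero" one, the same direction as the SUDAAN versus SAS and SPSS difference reported in the example above.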

B. MEPS Supplements

The MEPS includes periodic supplements that collect data for only a subset of sample persons (e.g., persons with a specific health condition). Analyzing data from these supplements requires the use of special weights that are set to 0 for persons not included in the supplement. Analysis of MEPS supplements in which a substantial number of persons have a weight of 0 can produce similar problems and require similar approaches to those described above in the section on MEPS event-level files.

Example: Using the 2002 MEPS full-year consolidated file (PUF HC-070) as the analytic file (which contains data from the diabetes supplement), the following sample programming code for SAS, SUDAAN, and SPSS produces estimates of the proportion of the population with diabetes in 2002 that treated their diabetes with insulin injections (26.63 percent) and the corresponding standard error (1.17 percent from SAS and SPSS versus 1.19 percent from SUDAAN with the MISSUNIT option).

The analytic variable INSINJECT in this example was set equal to 1 if the respondent indicated they used insulin injections and set equal to 0 if they indicated they did not use insulin injections.

SAS

proc surveymeans;
stratum varstr;
cluster varpsu;
weight diabw02f;
var insinject;

Note: The variances for the strata with only one PSU are considered to be 0 in the overall standard error computation.

SUDAAN

Note: SUDAAN requires that the data be sorted by the survey design variables that appear on the NEST statement (i.e., varstr varpsu in example below).

proc descript filetype=sas design=wr;
nest varstr varpsu/missunit;
weight diabw02f;
var insinject;

Note: The missunit option on the nest statement specifies that if only one sample unit is encountered within a stage (i.e., one PSU in a stratum), then the contribution of that unit toward the overall standard error is estimated using the difference in that unit's value and the overall mean value of the population.

SPSS

csplan analysis
/plan file='filename'
/planvars analysisweight=diabw02f
/design strata=varstr cluster=varpsu
/estimator type=wr.
csdescriptives
/plan file='filename'
/summary variables=insinject
/mean
/statistics se.


Suggested Citation:
Machlin, S., Yu, W., and Zodet, M. Computing Standard Errors for MEPS Estimates. January 2005. Agency for Healthcare Research and Quality, Rockville, Md. http://www.meps.ahrq.gov/survey_comp/standard_errors.jsp
