                    Computing Standard Errors for MEPS Estimates
                    
                    Steven Machlin, William Yu, and Marc Zodet                    
                  
 
 
 
               Introduction
 The Household Component of the 
                Medical Expenditure Panel Survey (MEPS-HC) is designed to 
                produce national and regional estimates of the health care use,
              expenditures, sources of payment, and insurance coverage of the
              U.S. civilian noninstitutionalized population. The sample design
              of the survey includes stratification, clustering, multiple
                stages of selection, and disproportionate sampling. Furthermore,
              the MEPS sampling weights reflect adjustments for survey
                nonresponse and adjustments to population control totals from
              the Current Population Survey. These survey design and
                estimation complexities require special consideration when 
                analyzing MEPS data (i.e., it is not appropriate to assume 
                simple random sampling).
 
 To obtain accurate estimates from MEPS survey data, for either 
                descriptive statistics or more sophisticated analyses based on 
                multivariate models, the MEPS survey design complexities need to 
                be taken into account by applying MEPS survey weights to produce 
                estimates and using an appropriate technique to derive standard 
                errors associated with the weighted estimates. Several methods 
                for estimating standard errors for estimates from complex 
                surveys have been developed, including the Taylor-series 
                linearization method, balanced repeated replication, and the 
                jack-knife method.
 The MEPS public use files include 
              variables to obtain weighted estimates and to implement a 
              Taylor-series approach to estimate standard errors for weighted 
              survey estimates. These variables, which jointly reflect the MEPS 
              survey design, include the estimation weight, sampling strata, and 
              primary sampling unit (PSU). The documentation and codebook for 
              MEPS public use files contain these survey design variables. For 
              example, the documentation for file HC-070 (2002 full-year 
              consolidated data file) includes the person weight (PERWT02F), 
              stratum (VARSTR), and PSU (VARPSU) variables. Statistical software packages that 
              are commonly used to estimate standard errors from complex 
              multistage designs using the Taylor-series linearization method 
              include SAS® (version 8.2 or higher), SUDAAN®, Stata®, and SPSS® 
              (version 12.0 or higher). Examples of basic programming code from 
              these packages to produce selected estimates and the corresponding 
              standard errors are provided in this document. The software 
              packages vary with respect to the specific types of estimates and 
              models that can be produced accounting for the complex survey 
              design and the treatment of missing data. For complete information 
              on the capabilities of each package, analysts need to refer to the 
              appropriate software user documentation manuals. The Web sites for 
              SAS, SUDAAN, Stata, and SPSS are http://www.sas.com, http://www.rti.org,
              http://www.stata.com, and
              http://www.spss.com, respectively. The 
              R language also has a package for complex survey analysis. 
              Information on this package can be found in the June 2003 R News 
              newsletter available on the R website at
              http://www.r-project.org.              
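
To make the Taylor-series approach more concrete, the following Python sketch shows one common form of the linearized variance for a weighted mean under a stratified design with PSUs treated as sampled with replacement. It is illustrative only and is not the code used by any of the packages named above. The function name weighted_mean_se and the record layout (stratum, PSU, weight, value) are assumptions made for this example.

```python
from collections import defaultdict
from math import sqrt


def weighted_mean_se(records):
    """Weighted mean and linearized standard error.

    records: iterable of (stratum, psu, weight, value) tuples
    (a hypothetical layout chosen for this sketch).
    """
    records = list(records)
    total_w = sum(w for _, _, w, _ in records)
    mean = sum(w * y for _, _, w, y in records) / total_w

    # Sum each person's linearized value within PSU, grouped by stratum.
    psu_totals = defaultdict(lambda: defaultdict(float))
    for stratum, psu, w, y in records:
        psu_totals[stratum][psu] += w * (y - mean) / total_w

    # Between-PSU variability within each stratum, summed over strata.
    variance = 0.0
    for stratum, psus in psu_totals.items():
        n = len(psus)
        if n < 2:
            raise ValueError(f"stratum {stratum!r} has only one PSU")
        zbar = sum(psus.values()) / n
        variance += n / (n - 1) * sum((z - zbar) ** 2 for z in psus.values())
    return mean, sqrt(variance)
```

In this sketch the weight, stratum, and PSU play the roles that PERWT02F, VARSTR, and VARPSU play in the package examples in this document.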
               Standard errors for MEPS estimates 
              are most accurate when the analytic file contains all of the MEPS 
              sample persons (e.g., those with positive values for the person 
              weight variable) and the appropriate syntax is used to analyze 
              population subgroups. Section I below provides examples of basic 
              programming code for SAS, SUDAAN, Stata, and SPSS to generate 
              estimates from MEPS person-level files, both for the total 
              population and for population subgroups. Section II provides 
              options for estimation in situations where analytic files do not 
              include all of the MEPS sample persons. These situations include 
              analyses based solely on data from MEPS event files, which only 
              contain sample persons that received a particular type of care, 
              and analyses of data from MEPS supplements (e.g., the diabetes 
              supplement data in PUF HC-070), which require the use of special 
              analytic weights that exclude the sample persons who were not 
              included in the supplement. 
 I. MEPS Person-Level Files

 A. Analyses of the Total Population
 Example: Using the 2002 MEPS full-year consolidated file
                        (PUF HC-070) as the analytic file, the basic programming
                        code provided below for each software package will produce
                        correct estimates of the overall mean total expenditures
                        in 2002 ($2,813.24) and the corresponding standard error
                        ($58.99).
 
 SAS
 
 proc surveymeans;
 stratum varstr;
 cluster varpsu;
 weight perwt02f;
 var totexp02;
 run;

 SUDAAN
 Note: SUDAAN requires that the data be sorted by the
                            survey design variables that appear on the NEST statement
                            (i.e., varstr varpsu in example below).
 
 proc descript filetype=sas design=wr;
 nest varstr varpsu;
 weight perwt02f;
 var totexp02;
 
 Stata (syntax below applies to releases 8.0 and higher)
 svyset [pweight=perwt02f], strata(varstr) psu(varpsu)
 svymean totexp02
 SPSS
 csplan analysis
 /plan file='filename'
 /planvars analysisweight=perwt02f
 /design strata=varstr cluster=varpsu
 /estimator type=wr.
 csdescriptives
 /plan file='filename'
 /summary variables=totexp02
 /mean
 /statistics se.
 
 B. Analyses Limited to a Population Subgroup
 
 Analyses are often limited to a subgroup of the population.
                        However, creating a special analysis file that contains
                        only observations for the subgroup of interest may yield
                        incorrect standard errors or an error message (e.g., "stratum
                        with only one psu detected" in Stata) because all
                        of the observations corresponding to a stage of the MEPS
                        sample design may be deleted. Therefore, it is advisable
                        to preserve the entire survey design structure for the
                        program by reading in the entire person-level file. Each
                        software package provides a capability to limit the analysis
                        to a subgroup of the population without sub-setting the
                        analysis file.
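
The reason for keeping the full file can be sketched in Python. In the hypothetical function below (subpop_mean_se, with an assumed record layout that ends in a subgroup flag), persons outside the subgroup contribute zero to the estimate but still register their PSU in the design, so no stratum loses a PSU. This is a sketch of the general idea under a with-replacement assumption, not the internal method of any package discussed here.

```python
from collections import defaultdict
from math import sqrt


def subpop_mean_se(records):
    """Subgroup mean and linearized standard error from the FULL sample.

    records: iterable of (stratum, psu, weight, value, in_subpop) tuples
    (a hypothetical layout chosen for this sketch).
    """
    records = list(records)
    sub_w = sum(w for _, _, w, _, d in records if d)
    mean = sum(w * y for _, _, w, y, d in records if d) / sub_w

    psu_totals = defaultdict(lambda: defaultdict(float))
    for stratum, psu, w, y, d in records:
        # Non-members add 0.0, but their PSU is still counted in the stratum.
        psu_totals[stratum][psu] += w * (y - mean) / sub_w if d else 0.0

    variance = 0.0
    for stratum, psus in psu_totals.items():
        n = len(psus)
        if n < 2:
            raise ValueError(f"stratum {stratum!r} has only one PSU")
        zbar = sum(psus.values()) / n
        variance += n / (n - 1) * sum((z - zbar) ** 2 for z in psus.values())
    return mean, sqrt(variance)
```

Passing only the subgroup's records to the same function can leave a stratum with a single PSU, which is the situation that triggers the error described above.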
 
 Example: Using the 2002 MEPS full-year consolidated file
                        (PUF HC-070) as the analytic file, the following statements
                        will produce accurate estimates of the average total
                        expenditures in 2002 for children younger than 18 years
                        of age ($1,085.82) and the corresponding standard error
                        ($70.28).
 
 SAS
 
 proc surveymeans;
 stratum varstr;
 cluster varpsu;
 weight perwt02f;
 var totexp02;
 domain agegroup;
 run;
 
 Note: The domain statement in this example will generate
                        estimates for all categories of the variable agegroup
                        (a hypothetical constructed analytic variable where the
                        youngest group is children under 18). There is no option
                        within the surveymeans procedure to select only a specific
                        population subgroup (e.g., agegroup=1).
 
 SUDAAN
 
 proc descript filetype=sas design=wr;
 nest varstr varpsu;
 weight perwt02f;
 var totexp02;
 subpopn agegroup=1;
 
 Note: The subpopn statement in this example generates
                        estimates for children under 18 (where agegroup is a
                        constructed analytic variable that is equal to 1 for
                        children under 18).
 
 Stata (syntax below applies to releases 8.0 and higher)
 
 svyset [pweight=perwt02f], strata(varstr) psu(varpsu)
 svymean totexp02, subpop(children)
 
 Note: The subpop option in this example generates
                        estimates for children under 18 only (where children
                        is a constructed variable set equal to 1 for persons
                        under 18 and set equal to 0 for all other persons).
 
 SPSS
 
 csplan analysis
 /plan file='filename'
 /planvars analysisweight=perwt02f
 /design strata=varstr cluster=varpsu
 /estimator type=wr.
 csdescriptives
 /plan file='filename'
 /summary variables=totexp02
 /mean
 /statistics se
 /subpop table=children.
 
 Note: The subpop statement in this example will generate
                        estimates for all categories of the variable children
                        (a hypothetical constructed dichotomous analytic variable
                        where 1=children under 18 and 0=adults 18 and over).
                        There is no option within the csdescriptives procedure
                        to select only a specific population subgroup (e.g.,
                        children=1).
 II. Analysis of MEPS Event-Level Files and MEPS Supplements

 There are some situations where it is not convenient to include all of the
  MEPS sample persons in the analytic file. In particular, MEPS event-level files
  only contain sample persons that received a particular type of care. Also,
  while data from MEPS supplements are typically contained on person-level files
  that include all sample persons, their analysis requires the use of special
  analytic weights that essentially exclude sample persons who were not included
  in the supplement.
 
 While standard errors are technically most accurate when the analytic file
  contains all of the MEPS sample persons (e.g., those with positive values on
  the person weight variable), it is possible to produce standard errors that
  will usually be fairly accurate without creating an analytic file that includes
  the entire sample. The following two sections provide more detailed information
  as well as examples of programming code for SAS, SUDAAN, and SPSS when working
  with MEPS event-level files and data from MEPS supplements.
 
 A. MEPS Event-Level Files
 
 MEPS event-level files include only records for persons with health care use in the year. Therefore, analyses based solely on event-level files do not preserve the entire estimation structure because persons in the sample without health care use are not represented. If a substantial number of persons in the sample are not represented in the file and some strata contain only observations from one PSU, then estimating standard errors becomes problematic.
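
Before estimating standard errors from such a file, it can help to check whether any stratum is left with a single PSU. The Python sketch below is one simple way to do that. The function name single_psu_strata and the (stratum, PSU) input layout are assumptions for illustration; in practice the pairs would come from the VARSTR and VARPSU values on the records in the analytic file.

```python
from collections import defaultdict


def single_psu_strata(records):
    """Return the strata that contain exactly one distinct PSU.

    records: iterable of (stratum, psu) pairs, one per record in the
    analytic file (a hypothetical layout chosen for this sketch).
    """
    psus = defaultdict(set)
    for stratum, psu in records:
        psus[stratum].add(psu)
    return sorted(s for s, members in psus.items() if len(members) == 1)
```

An empty result means every stratum in the file still has at least two PSUs.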
 
 Error messages are generated in SUDAAN and Stata when only one PSU is encountered
  in a stratum because standard errors cannot be estimated. While the MISSUNIT
  option on the NEST statement in SUDAAN will generate standard error estimates
  (see note under SUDAAN example below), Stata does not provide an option for
  estimating standard errors when there are some strata with only one PSU. However,
  there is a discussion of options for dealing with this situation on the Stata
  Web site at http://www.stata.com/support/faqs/stat/stratum.html.
 
 In contrast to SUDAAN and Stata, SAS and SPSS will automatically generate standard errors when there are some strata with only one PSU. The methodology that SAS and SPSS use to compute standard errors in this situation differs from that used by SUDAAN when the MISSUNIT option is specified (see the notes under the examples below). Consequently, standard errors from these packages will not necessarily be identical (see example below).
 
 Example: Using the 2002 MEPS hospital inpatient stays file (PUF HC-067D) as the analytic file, the following sample programming code for SAS, SUDAAN, and SPSS produces estimates of the average total expense per hospital stay in 2002 ($8,698.00) and the corresponding standard error ($286.55 from SAS and SPSS versus $298.30 from SUDAAN with the MISSUNIT option).
 
 SAS
 
 proc surveymeans;
 stratum varstr;
 cluster varpsu;
 weight perwt02f;
 var ipxp02x;
 run;
 
 Note: The variances for the strata with only one PSU are considered to be 0
  in the overall computation of standard errors.
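
As a rough illustration of the rule in this note, the Python sketch below sums between-PSU variability over strata and simply skips any stratum with one PSU, so that stratum adds 0 to the total. The function name and the input layout (a mapping from stratum to a list of linearized PSU totals) are assumptions for this example; it is not taken from SAS documentation.

```python
def variance_zero_for_singletons(psu_totals_by_stratum):
    """Sum stratum variance contributions, treating single-PSU strata as 0.

    psu_totals_by_stratum: {stratum: [linearized PSU totals]}
    (a hypothetical layout chosen for this sketch).
    """
    variance = 0.0
    for totals in psu_totals_by_stratum.values():
        n = len(totals)
        if n < 2:
            continue  # single-PSU stratum: contribution taken to be 0
        zbar = sum(totals) / n
        variance += n / (n - 1) * sum((z - zbar) ** 2 for z in totals)
    return variance
```

Because the skipped strata add nothing, a standard error computed this way can only be smaller than or equal to one that assigns those strata a positive contribution.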
 
 SUDAAN
 
 Note: SUDAAN requires that the data be sorted by the survey design variables
  that appear on the NEST statement (i.e., varstr varpsu in example below).
 
 proc descript filetype=sas design=wr;
 nest varstr varpsu/missunit;
 weight perwt02f;
 var ipxp02x;
 
 Note: The missunit option on the nest statement specifies that if only one
  sample unit is encountered within a stage (i.e., one PSU in a stratum), then
  the contribution of that unit toward the overall standard error is estimated
  using the difference in that unit's value and the overall mean value of the
  population.
 
 SPSS
 
 csplan analysis
 /plan file='filename'
 /planvars analysisweight=perwt02f
 /design strata=varstr cluster=varpsu
 /estimator type=wr.
 csdescriptives
 /plan file='filename'
 /summary variables=ipxp02x
 /mean
 /statistics se.
  
    B. MEPS Supplements
 
 The MEPS includes periodic supplements that collect data for only a subset
  of sample persons (e.g., persons with a specific health condition). Analyzing
  data from these supplements requires the use of special weights that are set
  to 0 for persons not included in the supplement. Analysis of MEPS supplements
  in which a substantial number of persons have a weight of 0 can produce similar
  problems and require similar approaches to those described above in the section
  on MEPS event-level files.
 
 Example: Using the 2002 MEPS full-year consolidated file (PUF HC-070) as the
  analytic file (which contains data from the diabetes supplement), the following
  sample programming code for SAS, SUDAAN, and SPSS produces estimates of the
  proportion of the population in 2002 that treated their diabetes with insulin
  injections (26.63 percent) and the corresponding standard error (1.17 percent
  from SAS and SPSS versus 1.19 percent from SUDAAN with the MISSUNIT option).
 
 The analytic variable INSINJECT in this example was set equal to 1 if the respondent
  indicated they used insulin injections and set equal to 0 if they indicated
  they did not use insulin injections.
 
 SAS
 
 proc surveymeans;
 stratum varstr;
 cluster varpsu;
 weight diabw02f;
 var insinject;
 run;

 Note: The variances for the strata with only one PSU are considered to be 0
  in the overall standard error computation.
 
 SUDAAN
 
 Note: SUDAAN requires that the data be sorted by the survey design variables
  that appear on the NEST statement (i.e., varstr varpsu in example below).
 
 proc descript filetype=sas design=wr;
 nest varstr varpsu/missunit;
 weight diabw02f;
 var insinject;
 
 Note: The missunit option on the nest statement specifies that if only one
  sample unit is encountered within a stage (i.e., one PSU in a stratum), then
  the contribution of that unit toward the overall standard error is estimated
  using the difference in that unit's value and the overall mean value of the
  population.
 
 SPSS
 
 csplan analysis
 /plan file='filename'
 /planvars analysisweight=diabw02f
 /design strata=varstr cluster=varpsu
 /estimator type=wr.
 csdescriptives
 /plan file='filename'
 /summary variables=insinject
 /mean
 /statistics se.

 Suggested Citation: Machlin, S., Yu, W., and Zodet, M. Computing Standard
 Errors for MEPS Estimates. January 2005. Agency for Healthcare Research and
 Quality, Rockville, Md. http://www.meps.ahrq.gov/survey_comp/standard_errors.jsp