MEPS HC-036: 1996-2007 Pooled Estimation File
November 2009
(Documentation revised May 2010)
Agency for Healthcare Research and Quality
Center for Financing, Access, and Cost Trends
540 Gaither Road
Rockville, MD 20850
(301) 427-1406
TABLE OF CONTENTS
A. Data Use Agreement
B. Background
1.0 Household Component
2.0 Medical Provider Component
3.0 Survey Management
C. Technical and Programming Information
1.0 General Information
2.0 Data File Information
3.0 Linking Instructions
4.0 Adjustment of Analytic Weight Variable
5.0 Subpopulation Analysis Caveat
6.0 Further Information
A. Data Use Agreement
Direct individual identifiers have been removed from the micro-data contained in these files. Nevertheless, under
Section 308(d) of the Public Health Service Act (42 U.S.C. 242m(d)) and the Confidential Information Protection and Statistical
Efficiency Act (CIPSEA) (Title 5 of PL 107-347), National Center for Health Statistics (NCHS) data must be used for statistical purposes only
and no attempt must be made to identify individuals. The provisions of CIPSEA provide for a felony conviction and/or fine of up to $250,000
if this promise is violated. In addition, data collected by the Agency for Healthcare Research and Quality (AHRQ) and/or the NCHS may not
be used for any purpose other than the purpose for which they were supplied; any effort to determine the identity of any reported case is
prohibited by law.
Unauthorized disclosure of confidential information is also subject to penalty under Title IX of the Public Health
Service Act, 42 U.S.C. 299, Section 924(d), which reads as follows:
"Any person who violates subsection (c) shall be subject to a civil monetary penalty of not more than $10,000 for each such
violation involved. Such penalty shall be imposed and collected in the same manner as civil
money penalties under subsection (a) of section 1128A of the Social Security Act are imposed and collected."
Therefore, in accordance with the above-referenced Federal Statutes, it is understood that:
- No one is to use the data in this data set in any way except for statistical reporting and analysis; and
- If the identity of any person or establishment should be discovered inadvertently, then (a) no use will be made of this knowledge,
(b) the Director, Office of Management, AHRQ will be advised of this incident, (c) the information that would identify any individual or
establishment will be safeguarded or destroyed, as requested by AHRQ, and (d) no one else will be informed of the discovered identity;
and
- No one will attempt to link this data set with individually identifiable records from any data sets other than the Medical Expenditure
Panel Survey or the National Health Interview Survey.
By using these data you signify your agreement to comply with the above-stated statutorily based requirements, with
the knowledge that deliberately making a false statement in any matter within the jurisdiction of any department or agency of the Federal
Government violates Title 18, Part 1, Chapter 47, Section 1001 and is punishable by a fine of up to $10,000 or up to 5 years in prison.
The Agency for Healthcare Research and Quality requests that users cite AHRQ and the Medical Expenditure Panel Survey
as the data source in any publications or research based upon these data.
B. Background
1.0 Household Component
The Medical Expenditure Panel Survey (MEPS) provides nationally representative estimates of health care use,
expenditures, sources of payment, and health insurance coverage for the U.S. civilian non-institutionalized population. The MEPS
Household Component (HC) also provides estimates of respondents' health status, demographic and socio-economic characteristics, employment,
access to care, and satisfaction with health care. Estimates can be produced for individuals, families, and selected population subgroups.
The panel design of the survey, which includes five rounds of interviews covering 2 full calendar years, provides data for examining person-level
changes in selected variables such as expenditures, health insurance coverage, and health status. Using computer-assisted personal interviewing
(CAPI) technology, information about each household member is collected, and the survey builds on this information from interview to interview.
All data for a sampled household are reported by a single household respondent.
The MEPS-HC was initiated in 1996. Each year a new panel of sample households is selected. Because the data
collected are comparable to those from earlier medical expenditure surveys conducted in 1977 and 1987, it is possible to analyze long-term
trends. Each annual MEPS-HC sample size is about 15,000 households. Data can be analyzed at either the person or event level. Data must
be weighted to produce national estimates.
The set of households selected for each panel of the MEPS-HC is a subsample of households participating in the previous
year's National Health Interview Survey (NHIS) conducted by the National Center for Health Statistics. The NHIS sampling frame
provides a nationally representative sample of the U.S. civilian non-institutionalized population and reflects an oversample of
blacks, Hispanics, and Asians since 2006. MEPS oversamples additional policy relevant sub-groups. Details of the MEPS sample
design have been previously published:
http://www.meps.ahrq.gov/mepsweb/data_stats/Pub_ProdResults_Details.jsp?pt=Methodology Report&opt=2&id=852
The linkage of the MEPS to the previous year's NHIS provides additional
data for longitudinal analytic purposes.
2.0 Medical Provider Component
Upon completion of the household CAPI interview and obtaining permission from the household survey respondents, a sample
of medical providers is contacted by telephone to obtain information that household respondents cannot accurately provide.
This part of the MEPS is called the Medical Provider Component (MPC), and information is collected on dates of visit,
diagnosis and procedure codes, charges, and payments. The Pharmacy Component (PC), a subcomponent of the MPC, does not
collect charges or diagnosis and procedure codes but does collect drug detail information, including National Drug Code (NDC)
and medicine name, as well as date filled and sources and amounts of payment. The MPC is not designed to yield national
estimates. It is primarily used as an imputation source to supplement or replace household-reported expenditure information.
3.0 Survey Management
MEPS-HC and MPC data are collected under the authority
of the Public Health Service Act. Data are collected under contract
with Westat, Inc. Data sets and summary statistics are edited and published
in accordance with the confidentiality provisions of the Public
Health Service Act and the Privacy Act. The National Center for Health
Statistics (NCHS) provides consultation and technical assistance.
As soon as data collection and editing are completed, the MEPS survey data are released to the public in staged releases of
summary reports, micro data files, and tables via the MEPS web site: www.meps.ahrq.gov.
Selected data can be analyzed through MEPSnet, an on-line interactive tool designed to give data users the capability to statistically analyze
MEPS data in a menu-driven environment.
Additional information on MEPS is available from the MEPS project manager or the MEPS public use data manager at the Center
for Financing, Access, and Cost Trends, Agency for Healthcare Research and Quality, 540 Gaither Road, Rockville, MD 20850 (301-427-1406).
C. Technical and Programming Information
1.0 General Information
To facilitate analysis of subpopulations and/or
low prevalence events, it may be desirable to pool together (i.e., combine)
more than one year of MEPS-HC data to yield sample sizes large enough
to generate reliable estimates. MEPS-HC samples in most years are not
completely independent because households are drawn from the same sample
geographic areas and many persons are in the sample for two consecutive
years (see MEPS-HC Methodology Reports for more details at http://www.meps.ahrq.gov).
Despite this lack of independence, it is valid to pool multiple years
of MEPS-HC data and keep all observations in the analysis because each
year of the MEPS-HC is designed to be nationally representative. However,
to obtain appropriate standard errors when pooling years of MEPS-HC data,
it is necessary to specify a common variance structure that properly
reflects the complex sample design of the MEPS.
This HC-036 file contains the proper variance structure to use when
making estimates from MEPS data that have been pooled over multiple years
and where one or more years are from 1996-2001. Prior to 2002, each annual
MEPS public use file was released with a variance structure unique to
the particular MEPS sample in that year. The variance structure in this
HC-036 file reconciles the differences in the variance units between
the units on the released annual MEPS public use files.
Starting in 2002, the annual MEPS public use files were released with
a common variance structure that allows users to pool data from 2002
and forward. This common variance structure is neither compatible with
the structure on the annual PUFs released prior to 2002 nor is it compatible
with the structure on this HC-036 dataset. Therefore, it is only necessary
to use the variance structure on this HC-036 dataset when pooling data
from MEPS years prior to 2002. The following scenarios provide some
guidelines for when analysts should use the variance structure in
this HC-036 file.
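MEPS Years Pooled (table columns: < 2001 | 2001 | 2002 | 2003 | 2004+)
Scenario | Years included in the pool | Which variance structure to use
1 | Only years prior to 2002 | HC-036
2 | Years prior to 2002 together with 2002 and forward | HC-036
3 and 4 | No years prior to 2002 | Annual PUFs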
In the first scenario, only MEPS data from years prior to 2002 are pooled together. In this case, analysts must use the
variance structure in HC-036. In the second scenario, data from years prior to 2002 are pooled together with data from
2002 and forward. The variance structure from HC-036 must be used in this circumstance as well. In the last two scenarios,
no data from years prior to 2002 are pooled. In both of these cases, analysts should use the variance structure on
the released annual public use files. In no circumstance should the variance structure on the annual PUFs be combined
with the variance structure on the HC-036 dataset.
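The scenario rules above reduce to a check on the earliest year in the pool. The following is a minimal sketch in Python; the function name and the returned labels are illustrative assumptions and are not part of any MEPS software. The upper bound of 2007 reflects the coverage of this HC-036 file as documented here.

```python
def variance_structure_for(years):
    """Return which variance structure applies to a set of pooled MEPS years."""
    years = sorted(set(years))
    if not years:
        raise ValueError("at least one MEPS year is required")
    if years[0] < 1996:
        raise ValueError("MEPS-HC data begin in 1996")
    if years[0] < 2002:
        # Any year before 2002 in the pool requires the HC-036 structure,
        # even when years from 2002 forward are pooled with it.
        if years[-1] > 2007:
            raise ValueError("this HC-036 file covers persons from 1996-2007 only")
        return "HC-036"
    # Pools drawn entirely from 2002 forward use the annual PUF structure.
    return "Annual PUFs"


print(variance_structure_for([1999, 2000, 2001]))  # only years prior to 2002
print(variance_structure_for([2001, 2002, 2003]))  # mixed pool
print(variance_structure_for([2003, 2004]))        # no years prior to 2002
```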
The variables STRA9607 (stratum of the primary sampling unit) and PSU9607
(primary sampling unit) in this HC-036 dataset provide the appropriate
sample design information needed by survey procedures in software packages
that implement the with-replacement Taylor series linearization method
to obtain estimates of complex sample variances. The variables BRR1 – BRR128 in the HC-036BRR dataset (http://www.meps.ahrq.gov/mepsweb/data_stats/download_data_files_detail.jsp?cboPufNumber=HC-036BRR)
provide a comparable replicate sample design structure. These replicates
can be incorporated in software package survey procedures that implement
the balanced repeated replication (BRR) method to produce estimates
of complex sample variances.
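As a rough illustration of how survey procedures use STRA9607 and PSU9607, the sketch below computes a weighted total and its with-replacement variance estimate, the simplest case handled by these procedures: weighted values are summed within each PSU, and the between-PSU variability is accumulated stratum by stratum. The record layout, the field names `weight` and `y`, and the toy data are assumptions made for this example; production analyses should rely on an established survey package.

```python
from collections import defaultdict


def total_and_variance(records, value_key="y", weight_key="weight"):
    """Weighted total and its with-replacement variance estimate."""
    # Sum weighted values within each PSU, grouped by stratum.
    psu_totals = defaultdict(lambda: defaultdict(float))
    for rec in records:
        psu_totals[rec["STRA9607"]][rec["PSU9607"]] += rec[weight_key] * rec[value_key]

    total = 0.0
    variance = 0.0
    for psus in psu_totals.values():
        n_h = len(psus)  # number of PSUs in this stratum
        if n_h < 2:
            raise ValueError("each stratum needs at least two PSUs")
        stratum_total = sum(psus.values())
        mean_h = stratum_total / n_h
        total += stratum_total
        # Between-PSU sum of squares, scaled by n_h / (n_h - 1).
        variance += n_h / (n_h - 1) * sum((z - mean_h) ** 2 for z in psus.values())
    return total, variance


# Invented toy data: two strata with two PSUs each.
sample = [
    {"STRA9607": 1, "PSU9607": 1, "weight": 2.0, "y": 5.0},
    {"STRA9607": 1, "PSU9607": 2, "weight": 2.0, "y": 5.0},
    {"STRA9607": 1, "PSU9607": 2, "weight": 2.0, "y": 5.0},
    {"STRA9607": 2, "PSU9607": 1, "weight": 1.0, "y": 5.0},
    {"STRA9607": 2, "PSU9607": 2, "weight": 1.0, "y": 5.0},
]
estimate, var = total_and_variance(sample)
print(estimate, var ** 0.5)  # estimated total and its standard error
```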
2.0 Data File Information
Released as an ASCII data file (with SAS® and
SPSS® user statements) and in SAS Transport version, the HC-036 file
contains 202,468 records corresponding to the number of unique persons
in MEPS from 1996-2007. These records contain the standard MEPS-HC person
level ID variables (DUPERSID and PANEL), as well as the pooled variance
estimation structure (STRA9607 and PSU9607).
There is a record for each unique person appearing in any of the 1996-2007 MEPS-HC full year person level public
use files: HC-012, HC-020, HC-028, HC-038, HC-050, HC-060, HC-070, HC-079, HC-089, HC-097, HC-105, and HC-113.
These twelve data sets have a combined total of 542,851 records; however, as each person may appear in one or two
of these data sets, the number of unique persons (202,468) is fewer than the sum of the number of records across
the annual files (542,851).
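The difference between the annual record count and the number of unique persons can be checked by counting distinct (DUPERSID, PANEL) pairs across the annual files. The sketch below uses invented IDs and assumes each annual file has already been reduced to a list of records carrying those two variables.

```python
def count_records_and_unique_persons(annual_files):
    """Return (total records, unique persons) across annual person-level files."""
    total_records = sum(len(rows) for rows in annual_files.values())
    unique_persons = {
        (rec["DUPERSID"], rec["PANEL"])
        for rows in annual_files.values()
        for rec in rows
    }
    return total_records, len(unique_persons)


# Invented toy data: one person appears in both years, two appear once.
toy_files = {
    1996: [
        {"DUPERSID": "10001011", "PANEL": 1},
        {"DUPERSID": "10002012", "PANEL": 1},
    ],
    1997: [
        {"DUPERSID": "10001011", "PANEL": 1},
        {"DUPERSID": "20001011", "PANEL": 2},
    ],
}
n_records, n_persons = count_records_and_unique_persons(toy_files)
print(n_records, n_persons)
```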
3.0 Linking Instructions
The following steps should be taken to create a
pooled analysis dataset.
- Create a dataset for each year containing the records for all persons to be included in the analysis. Keep the
unique person identifier variables (DUPERSID and PANEL), the person-level sampling weight, any classification
variables (e.g., sex, race/ethnicity), and any response variables (e.g., total expenditure amount, number of
prescription drug purchases) to be used in the data analysis.
- Reconcile the discrepancies in variable names. For all years, most variable names on the annual public use files
contain a 2-digit year suffix. For instance, in the 1997 consolidated person-level file (HC-020) the panel variable
is called PANEL97, the total annual expenditure amount variable is called TOTEXP97, and the sampling weight variable
is called WTDPER97. However, in the 2003 dataset (HC-079) these same variables are named PANEL03, TOTEXP03, and
PERWT03F, respectively. In the 1996 dataset (HC-012) the total expenditure and sampling weight variables are named
TOTEXP96 and WTDPER96, respectively, and the panel variable is missing (users should assign a value of 1 for each
record in HC-012). Starting in 2005, the panel variable is simply named PANEL (no year suffix). The variable names
must be made consistent before pooling the data.
- Create a pooled analysis dataset by simply combining the individual-year datasets (e.g., the records from the 1996
and 1997 files). In other words, the number of records in the pooled file will equal the sum of the record counts
for the individual annual files being pooled.
- Attach the pooled variance structure to the pooled analysis dataset by merging the variables STRA9607 and PSU9607
from this HC-036 file to the pooled analysis dataset by DUPERSID and PANEL, keeping only the records that appear in
the pooled analysis dataset. Depending on the software being used to manage the datasets, the pooled analysis
dataset may need to be sorted by DUPERSID and PANEL prior to merging. This step will add two additional variables
to the pooled file (STRA9607 and PSU9607) but have no impact on the number of records.
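The steps above can be sketched in Python as follows. This is an illustration under stated assumptions, not AHRQ-supplied code: the helper names, the harmonized names TOTEXP, WEIGHT, and YEAR, and the toy records are invented for the example, and only the variable names quoted in the text for 1996, 1997, and 2003 are mapped. Other years would need entries checked against each file's codebook.

```python
# Year-specific source names taken from the text above.
NAME_MAP = {
    1996: {"panel": None, "totexp": "TOTEXP96", "weight": "WTDPER96"},
    1997: {"panel": "PANEL97", "totexp": "TOTEXP97", "weight": "WTDPER97"},
    2003: {"panel": "PANEL03", "totexp": "TOTEXP03", "weight": "PERWT03F"},
}


def harmonize(record, year):
    """Rename year-suffixed variables to common names."""
    names = NAME_MAP[year]
    # HC-012 (1996) has no panel variable; every record is assigned panel 1.
    panel = 1 if names["panel"] is None else record[names["panel"]]
    return {
        "DUPERSID": record["DUPERSID"],
        "PANEL": panel,
        "YEAR": year,
        "TOTEXP": record[names["totexp"]],
        "WEIGHT": record[names["weight"]],
    }


def pool(annual_files):
    """Stack the harmonized annual records; record counts simply add up."""
    return [harmonize(rec, year)
            for year, rows in sorted(annual_files.items())
            for rec in rows]


def attach_variance_structure(pooled, hc036):
    """Left-join STRA9607 and PSU9607 onto the pooled file by DUPERSID and PANEL."""
    lookup = {(rec["DUPERSID"], rec["PANEL"]): rec for rec in hc036}
    merged = []
    for rec in pooled:
        design = lookup.get((rec["DUPERSID"], rec["PANEL"]))
        merged.append({
            **rec,
            "STRA9607": design["STRA9607"] if design else None,
            "PSU9607": design["PSU9607"] if design else None,
        })
    return merged


# Invented toy records standing in for HC-012, HC-020, and HC-036.
annual_files = {
    1996: [{"DUPERSID": "10001011", "TOTEXP96": 500, "WTDPER96": 9000.0}],
    1997: [
        {"DUPERSID": "10001011", "PANEL97": 1, "TOTEXP97": 700, "WTDPER97": 9500.0},
        {"DUPERSID": "20002012", "PANEL97": 2, "TOTEXP97": 0, "WTDPER97": 8000.0},
    ],
}
hc036 = [
    {"DUPERSID": "10001011", "PANEL": 1, "STRA9607": 12, "PSU9607": 1},
    {"DUPERSID": "20002012", "PANEL": 2, "STRA9607": 30, "PSU9607": 2},
]
pooled = pool(annual_files)
analysis = attach_variance_structure(pooled, hc036)
```

A person who appears in two pooled years contributes two records, and both receive the same STRA9607 and PSU9607 values because HC-036 holds one record per unique person.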
4.0 Adjustment of Analytic Weight Variable
It is generally recommended that analysts adjust the analytic weight variable by dividing it by the number of years being
pooled. The sum of these adjusted weights represents the average annual population size for the pooled period (rather than
the sum of the population sizes across multiple years that would result from unadjusted weights). Although this adjustment
will have no effect on estimated means, proportions, or regression coefficients because the weight variable is being divided
by a constant (i.e., number of years), estimates of totals based on adjusted weights will reflect an “average annual” basis
rather than the entire pooled period. If the objective is to produce an estimated total for the entire pooled period (e.g.,
total medical expenditures across multiple years rather than average per year), then the analytic weight variable should
not be divided by the number of years in the pooled period.
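The effect of the adjustment can be seen in a small numeric sketch. The weights and expenditure values below are invented, and the field names `WEIGHT`, `ADJ_WEIGHT`, and `TOTEXP` are assumptions for the example.

```python
def weighted_total(records, weight_key):
    return sum(rec[weight_key] * rec["TOTEXP"] for rec in records)


def weighted_mean(records, weight_key):
    return weighted_total(records, weight_key) / sum(rec[weight_key] for rec in records)


# Invented toy pooled file covering two years.
pooled = [
    {"YEAR": 1996, "WEIGHT": 100.0, "TOTEXP": 10.0},
    {"YEAR": 1996, "WEIGHT": 300.0, "TOTEXP": 20.0},
    {"YEAR": 1997, "WEIGHT": 200.0, "TOTEXP": 30.0},
    {"YEAR": 1997, "WEIGHT": 200.0, "TOTEXP": 0.0},
]
n_years = len({rec["YEAR"] for rec in pooled})

# Divide each weight by the number of pooled years.
for rec in pooled:
    rec["ADJ_WEIGHT"] = rec["WEIGHT"] / n_years

period_total = weighted_total(pooled, "WEIGHT")          # total over the pooled period
annual_avg_total = weighted_total(pooled, "ADJ_WEIGHT")  # average annual total
mean_unadjusted = weighted_mean(pooled, "WEIGHT")
mean_adjusted = weighted_mean(pooled, "ADJ_WEIGHT")      # identical to the unadjusted mean
```

Dividing by a constant cancels in the ratio that defines a mean, so only totals change.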
5.0 Subpopulation Analysis Caveat
When pooling data over several years to increase sample sizes for small subdomains of the population (e.g., children
with asthma), users must be careful to maintain the integrity of the MEPS survey design. The MEPS design is accounted
for by the full set of survey stratum and PSU values on both the annual files and this HC-036 pooled linkage file
(see footnote 1). When users create analytic subfiles that contain only respondents in the subdomain of interest
(e.g., children with asthma), the subfile is very unlikely to retain every combination of stratum and PSU needed to
account properly for the MEPS survey design in a linearized estimate of the sampling variances. Therefore, the
following approach is recommended for analyzing subpopulations in MEPS:
- Construct a flag variable for all survey respondents that can be used to identify persons in the subdomain of interest.
- Using a with-replacement design option for a Taylor series procedure in a complex survey design statistical software
package, read in records from all respondents (i.e., not just those in the subdomain of interest) and specify the
analytic subdomain using the flag variable from the first step (see footnote 2).
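The reason for keeping all respondents can be illustrated with a small sketch of a domain total. Every record registers its stratum and PSU, but only flagged records contribute a value; subsetting first would drop PSUs that contain no domain members. The function, the field names `weight`, `y`, and `asthma`, and the toy data are assumptions for this example, and the syntax in actual survey packages differs.

```python
from collections import defaultdict


def domain_total_variance(records, in_domain):
    """With-replacement variance of a weighted domain total using the full design."""
    psu_totals = defaultdict(lambda: defaultdict(float))
    for rec in records:
        # Records outside the domain add zero but still register their PSU.
        value = rec["weight"] * rec["y"] if in_domain(rec) else 0.0
        psu_totals[rec["STRA9607"]][rec["PSU9607"]] += value

    variance = 0.0
    for psus in psu_totals.values():
        n_h = len(psus)
        if n_h < 2:
            raise ValueError("a stratum has fewer than two PSUs")
        mean_h = sum(psus.values()) / n_h
        variance += n_h / (n_h - 1) * sum((z - mean_h) ** 2 for z in psus.values())
    return variance


# Invented toy data: in stratum 1, only PSU 1 contains a domain member.
full_file = [
    {"STRA9607": 1, "PSU9607": 1, "weight": 10.0, "y": 3.0, "asthma": True},
    {"STRA9607": 1, "PSU9607": 1, "weight": 10.0, "y": 5.0, "asthma": False},
    {"STRA9607": 1, "PSU9607": 2, "weight": 20.0, "y": 4.0, "asthma": False},
    {"STRA9607": 2, "PSU9607": 1, "weight": 5.0, "y": 2.0, "asthma": True},
    {"STRA9607": 2, "PSU9607": 2, "weight": 5.0, "y": 6.0, "asthma": True},
]

# Recommended: pass all records and identify the domain with the flag.
variance_full = domain_total_variance(full_file, lambda rec: rec["asthma"])

# Not recommended: subsetting first leaves stratum 1 with a single PSU.
subset = [rec for rec in full_file if rec["asthma"]]
try:
    domain_total_variance(subset, lambda rec: True)
    subset_failed = False
except ValueError:
    subset_failed = True
```

In this toy case the subset file cannot support a variance estimate at all; in less extreme cases it would produce one silently, based on an incomplete design.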
6.0 Further Information
For any question regarding the HC-036 file or pooling
of data, please contact Sadeq Chowdhury by e-mail at: sadeq.chowdhury@ahrq.hhs.gov or
Fred Rohde by e-mail at: frederick.rohde@ahrq.hhs.gov.
1 The MEPS design is also accounted for by the full set of replicates in the HC-036BRR data set (http://www.meps.ahrq.gov/mepsweb/data_stats/download_data_files_detail.jsp?cboPufNumber=HC-036BRR).
2 The syntax for specifying survey designs and analytic subdomains varies across software packages (see section IB at http://www.meps.ahrq.gov/mepsweb/survey_comp/standard_errors.jsp for examples).