Researchers and users with approved research projects can access restricted data files that have not been publicly released for reasons of confidentiality at the AHRQ Data Center in Rockville, Maryland.
Qualified researchers can also access restricted data files through the U.S. Census Research Data Center (RDC) network (http://www.census.gov/ces/dataproducts/index.html; scroll down the page and click the Agency for Health Care Research and Quality (AHRQ) link). For information on the RDC research proposal process and the data sets available, read the AHRQ-Census Bureau agreement on access to restricted MEPS data.
Data Files Available at the AHRQ Data Center
Confidential data files
Confidential and non-public use variables
- Household Component–Insurance Component
Linked File (available for 1996–1999 and 2001). For
many households, health insurance available through a household
member’s employer is a key source of coverage. However, many
details about the health insurance plans offered by and/or obtained
through employers are not readily available from household respondents. To fill this data gap, the employers of Household Component jobholders
are contacted through the Insurance Component survey. Employers
are asked about health insurance offerings, premiums, employee
contributions to premiums and other plan details for their establishment
as a whole. This information is then linked, via a data file, to
data collected during the household survey for the jobholder to
provide a more complete picture of the jobholder’s insurance
options. Significant survey non-response prevents these linked
data from being used to make nationally representative estimates,
and results cannot be generalized beyond the sample of persons
included in the file. Thus, no weight has been constructed
for these files. Detailed documentation of these research
files for 1996–1999, including a crosswalk of variables to the MEPS-IC questionnaires
and the codebooks, is available for download. There are no plans to collect
these data in future years.
- Nursing Home Component (available
for 1996 only). The
Nursing Home Component was a survey of nursing homes and persons resident in
or admitted to nursing homes at any time during calendar year 1996. The survey
gathered information on the demographic characteristics, residence history,
health and functional status, use of services, use of prescription medications,
and health care expenditures of nursing home residents. Nursing home administrators
and designated staff also provided information on facility size, ownership,
certification status, services provided, revenues and expenses, and other facility
characteristics. A community questionnaire obtained data from next of kin or
other knowledgeable persons in the community on income, assets, family relationships,
and caregiving information for the sampled nursing home resident. Detailed
documentation of these research files, including links to the Nursing Home
survey questionnaires and codebooks, is available for download. There are no plans to collect
these data in future years.
- Medical Provider Component. The primary purpose
of the Medical Provider Component is to collect detailed charge and payment
data from hospitals, physicians, home health care providers, and pharmacies
to supplement information received from Household Component respondents on
their health care expenses. The billing data collected includes whether a visit
was capitated, charges for each procedure (CPT4) for office-based and outpatient
department visits, and amounts for each source of payment. Diagnosis codes
for medical visits and stays and NDC codes for prescription drugs are also
collected. While the Medical Provider Component was not designed to make
stand-alone national estimates, the detailed information on payments, procedure
codes, and diagnostic codes supports a variety of analyses not possible with
the Household Component data alone.
- Area Resource File. The Area Resource File
(ARF) is a county-specific health resources data file designed to
be used by planners, policy makers, researchers, and other professionals
interested in the nation's health care delivery system and
factors that may impact health status and health care in the U.S.
It is a database containing more than 7,000 variables for each of the
nation's counties. The ARF contains information on health
facilities, health professions, measures of resource scarcity, health
status, economic activity, health training programs, and socioeconomic
and environmental characteristics. Historically, when requested,
data from the ARF have been merged onto MEPS files that are used in
the AHRQ Data Center. This can still be done; however, you now need
to show us that you have obtained a copy of the ARF file from the
Health Resources and Services Administration (HRSA). Go to the website http://arf.hrsa.gov for more information on obtaining copies of the ARF.
- Two-Year, Two-Panel File. The two-year, two-panel (2YP) files are person-level research
files that pool data from the full-year consolidated files, supplemental variable files,
and longitudinal weight files. There is a file for each panel, and each file has one record
for each person and variables from the first and second years of the panels. The population
is all persons from panels 1 through 4 who had records in any of the relevant full year PUFs
(HC-012, HC-020, HC-028, HC-038, HC-039). The full-year variables were renamed so the variable
names are consistent across panels to facilitate research based on pooled panels. Pooling panels
may be useful for increasing the power of analyses of smaller populations, such as racial and
ethnic minorities, people with disabilities, rural residents, people treated for particular
health conditions, and less frequently used services.
- MEPS Public Use Data Files. All data files available for downloading on the MEPS Web site
are also available in the AHRQ Data Center.
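The renaming and pooling described for the two-year, two-panel files can be illustrated with a minimal sketch. The variable names (`TOTEXP96`, `TOTEXP_Y1`, and so on), the rename map, and the sample records are hypothetical and chosen only for illustration; they are not the names used in the actual research files. The sketch also ignores longitudinal weights, which any real pooled analysis would need to handle.

```python
# Hypothetical year-suffixed variable names for two panels; the real
# research files define their own names.
PANEL_RENAMES = {
    1: {"TOTEXP96": "TOTEXP_Y1", "TOTEXP97": "TOTEXP_Y2"},
    2: {"TOTEXP97": "TOTEXP_Y1", "TOTEXP98": "TOTEXP_Y2"},
}


def harmonize(record, panel):
    """Return a copy of one person-level record with panel-neutral names."""
    renames = PANEL_RENAMES[panel]
    out = {renames.get(name, name): value for name, value in record.items()}
    out["PANEL"] = panel  # keep track of which panel the person came from
    return out


def pool(panels):
    """Stack harmonized person records from several panels into one list."""
    pooled = []
    for panel, records in panels.items():
        pooled.extend(harmonize(record, panel) for record in records)
    return pooled


# Made-up records: one person per panel, one record per person.
panels = {
    1: [{"ID": "A1", "TOTEXP96": 1200.0, "TOTEXP97": 800.0}],
    2: [{"ID": "B1", "TOTEXP97": 300.0, "TOTEXP98": 450.0}],
}
pooled = pool(panels)
```

After harmonizing, both records carry the same first-year and second-year variable names, so they can be analyzed together as one pooled file.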
- Fully Specified ICD-9 Codes: These codes allow medical conditions to be identified with greater specificity.
- Fully Specified Industry and Occupation Codes: These codes allow a worker's industry and occupation to be identified with greater specificity.
- State and County FIPS Codes: These codes can be used to merge data from the Area Resource File, or any data at the State and/or County level, onto the MEPS data.
- Census Tract and Block-Group Codes: These codes can be used to merge data from the U.S. Census, or any data at the tract or block-group level, onto the MEPS data.
- Non-Public Use Data Elements: These are data elements from our questionnaires that are not directly identifiable data but have yet to be edited or released, for example, asset information and imputed NDC codes.
- Federal and State Marginal Tax Rates: Tax simulations using the National Bureau of Economic Research’s TAXSIM package have been run for the 1996–2002 HC full-year populations. Tax amounts and marginal tax rates have been computed for Federal, State and FICA taxes. Property taxes, sales taxes, and city/county income taxes have not been simulated.
Application Procedure and Form
Prospective researchers must submit an application, including a
research proposal, that will be reviewed by a committee. The application forms
are available in HTML
and PDF formats. AHRQ Data
Center applications are accepted continuously and are reviewed and
approved on a monthly basis.
It is recommended that researchers have an initial discussion of
the feasibility of the proposed project with AHRQ Data Center staff.
The complete application package should be mailed to the AHRQ Data
Center coordinator at the Division of Survey Operations/CFACT,
Agency for Healthcare Research and Quality, 540 Gaither Road, Rockville, MD 20850.
The AHRQ Data Center manager will review the material for completeness, obtain clarification
from the researcher (if needed), and make a recommendation for approval
to the director of the Center for Financing, Access and Cost Trends,
AHRQ. If needed, the researcher and the Data Center manager can revise
the proposal iteratively until an appropriate package is developed.
If necessary, a cost estimate for required services will be reviewed
with the researcher/applicant, once the project is approved.
The following forms are developed and specified for each project by
the AHRQ Data Center coordinator: 1) The AHRQ Data Center agreement,
which makes clear the scope of the project, data, and services to be
the researcher and the Data Center; the obligations of both parties;
and the cost; 2) Task order/billing agreement for programming support;
and 3) Confidentiality affidavits, as required.
The AHRQ Data Center charges a user fee of $300.00 for approved
Data Center projects to cover technical assistance, simple file
construction, and up to four hours of programming support. This
fee will be waived for full-time graduate students working on
dissertations or other degree requirements, and Federal Government
The Data Center fee is also waived for researchers using a Census
Bureau Research Data Center (RDC); however, the applicant
will be responsible for any additional fees required by the
Census Bureau while working at the RDC.
The AHRQ Data Center manager will coordinate the review of each proposal. The following will be considered in reviewing proposals:
- The feasibility of the project given existing data, that is, whether it is possible for the research to
be conducted with the available information. On occasion, it is
apparent from the outset that the sample will not support the intended
analysis. For instance, MEPS does not support estimates for smaller
states or for some conditions.
- The risk of disclosure of restricted information, that is, whether the analysis can be conducted without compromising the confidentiality promised to all respondents (persons, employers, insurers, hospitals and physicians).
- The availability of resources to support the project at the AHRQ Data Center. At
this time only limited technical support can be provided on site,
so researchers should make every effort to familiarize
themselves with the MEPS data before coming to the data center.
- Whether the proposed project is in accordance with the mission of AHRQ as specified in its authorizing legislation.
Users should note that approval of the application does not constitute endorsement
by AHRQ of the substantive, methodological, theoretical, or policy relevance or the
scientific merit of the proposed research. Approval only constitutes a judgment
that the research is feasible, broadly consistent with AHRQ’s mission, and an appropriate
use of the data in terms of what confidentiality and privacy protections were
promised to respondents or otherwise required by law.
Accessing the Data Files
Researchers must execute all computer runs using the facilities located within
the AHRQ Data Center in Rockville, Maryland. Remote access is not available at this time.
For researchers who have already visited the AHRQ Data Center to
perform initial programming, we offer a limited service that
allows them to submit revised SAS or Stata programs that will be run
on their behalf by the Data Center coordinator. This service is not
for the purpose of program development and does not include programming
support—the researcher is responsible for developing a debugged
program. The service is to support minor modifications to existing
programs, such as changes in variables in a statistical model. If there
are numerous such requests for a single project, a separate fee will
be negotiated for providing this additional service.
The AHRQ Data Center allows researchers to supply their own data to
be merged with AHRQ data. The user-supplied data may consist of proprietary
data collected and owned by the user. Users must provide the Data Center
staff with complete documentation of any data proposed to be merged.
The documentation should include descriptive variable and value labels.
Users are responsible for working with Data Center staff to ensure
that the data can be merged and that the formats are consistent. Merging
of characteristics of geographic areas must be done in such a way that
details of the sampling frame (i.e., which counties are included or
excluded) of the survey are not revealed to the user. Data merges will
be conducted by the Data Center or its contractors prior to the arrival
of the user. Files and information used in linking the data sets will
not be made available to the user. As a policy, AHRQ will catalog all
data (including researcher-supplied data) used at the Data Center and
will make those data available to other researchers. However, we can
make arrangements to restrict access to your data if
required under existing data use agreements.
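The geographic merge described above can be sketched as follows. This is an illustration only, not the Data Center's actual procedure: the field names (`county_fips`, `physicians_per_10k`), the county codes, the cut points, and the sample records are all hypothetical. The idea is that the merge key is used to attach an area characteristic, the characteristic is pre-coded into broad categories, and the key itself is dropped before the file reaches the user.

```python
# Hypothetical county table keyed by made-up FIPS-style codes.
COUNTY_TABLE = {
    "99001": {"physicians_per_10k": 31.0},
    "99003": {"physicians_per_10k": 8.5},
}


def categorize(value, low_cut, high_cut):
    """Collapse a continuous area measure into low/medium/high."""
    if value is None:
        return None
    if value < low_cut:
        return "low"
    if value < high_cut:
        return "medium"
    return "high"


def merge_area_category(persons, county_table, measure, low_cut, high_cut):
    """Attach a categorized area measure and drop the county code."""
    merged = []
    for person in persons:
        area = county_table.get(person["county_fips"], {})
        # Copy every field except the geographic key used for linking.
        row = {k: v for k, v in person.items() if k != "county_fips"}
        row[measure + "_cat"] = categorize(area.get(measure), low_cut, high_cut)
        merged.append(row)
    return merged


persons = [
    {"person_id": "P1", "county_fips": "99001"},
    {"person_id": "P2", "county_fips": "99003"},
    {"person_id": "P3", "county_fips": "99005"},  # no match in the county table
]
analysis_file = merge_area_category(
    persons, COUNTY_TABLE, "physicians_per_10k", 10.0, 25.0
)
```

The resulting records contain only the person identifier and the categorized measure, so the analyst never sees which counties are in the sample.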
What output can be taken from the Data Center?
- All materials to be removed from the Data Center are subject to disclosure review.
- In addition to approved tables, researchers may request that the Data Center manager review and, if approved, download the following to external media from the Data Center: programs, word processing documents, and electronic versions of approved printouts.
What output cannot be taken from the Data Center?
- Any output that could potentially identify respondents or small geographic areas, either directly or inferentially, cannot be removed from the Data Center.
- Models using geographic area as the dependent variable cannot be removed from the Data Center, unless the values are encrypted.
- The identity of sampling units, which could be used in efforts to identify the data subject, cannot be removed.
- In general, any direct or inferential identifiers not revealed on the public use data files cannot be removed from the Data Center.
- No sample case printouts are ever allowed to be removed from the Data Center.
- When using Proc Means, no cells with the same minimum and maximum or with cell sizes of less than 100 may be removed.
- When using Proc Univariate or producing cross-frequencies, no tables with the same minimum and maximum, with only one observation for the minimum and maximum, or tables with row and column sizes of less than 100 can be removed.
Data Files Not Available at the AHRQ Data Center
- Researchers will not have access to data files unless they are requested in their approved project.
- Researchers will not have access to files containing direct identifiers, such as names or addresses.
- Researchers will not have access to geographical data (state, county, census tract) unless it is encrypted. Depending upon the approved research project, researchers will be given access to characteristics of those geographic areas. Researchers may also be given access to files with dummy codes for places, but they will not be given the decodes that allow association of place name with the code. Upon request, an entire file can be pre-coded into categories (e.g., residing in a state with high/medium/low Medicaid generosity).
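Researchers may find it useful to pre-screen their own summary output against the cell-size rules listed earlier for output that cannot be removed. The sketch below is a rough illustration of the Proc Means rule only (no cells with identical minimum and maximum, and no cells with fewer than 100 observations). The function names, group labels, and data are hypothetical, and passing this check does not replace the disclosure review performed by Data Center staff.

```python
MIN_CELL_SIZE = 100  # threshold taken from the output rules described above


def summarize(values):
    """Compute the count, minimum, maximum, and mean for one cell."""
    return {
        "n": len(values),
        "min": min(values),
        "max": max(values),
        "mean": sum(values) / len(values),
    }


def is_releasable(cell):
    """Apply the two cell-level checks: size and distinct min/max."""
    return cell["n"] >= MIN_CELL_SIZE and cell["min"] != cell["max"]


def screen(groups):
    """Map each group label to (summary, passes_check)."""
    results = {}
    for label, values in groups.items():
        if not values:
            # An empty cell has no summary and cannot be released.
            results[label] = (None, False)
            continue
        cell = summarize(values)
        results[label] = (cell, is_releasable(cell))
    return results


# Made-up data: one adequate cell, one too small, one with no variation.
groups = {
    "group_a": [float(i % 7) for i in range(150)],
    "group_b": [float(i) for i in range(40)],
    "group_c": [5.0] * 120,
}
report = screen(groups)
```

Here only `group_a` passes: `group_b` has fewer than 100 observations, and `group_c` has the same minimum and maximum.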
AHRQ Data Center Coordinator
Division of Survey Operations
Center for Financing, Access, and Cost Trends
Agency for Healthcare Research and Quality
540 Gaither Road
Rockville, MD 20850
Phone: (301) 427-1406
Fax: (301) 427-1276