Methodology Report #17:
Additional Imputations of Employer Information for the Insurance
Component of the Medical Expenditure Panel Survey since 1996
John P. Sommers, PhD, Agency for Healthcare Research and Quality.
Table of Contents
Abstract
The Medical Expenditure Panel Survey (MEPS)
Background
General Technical Methods
Process for Each Group of Variables
Eighth group of variables
Ninth group of variables
Tenth group of variables
Eleventh group of variables
Twelfth group of variables
Thirteenth group of variables
Fourteenth group of variables
Fifteenth group of variables
Sixteenth group of variables
Seventeenth group of variables
Government Imputation Process
References
Appendix: Definitions of Selected Variables
Abstract
This report describes the process used to impute values
for missing establishment and plan characteristics for the Insurance Component of the Medical
Expenditure Panel Survey that are currently imputed but were not imputed for the 1996 Insurance
Component. (The list of items imputed for the 1996 Insurance Component, and which continue to
be imputed, can be found in MEPS Methodology Report No. 10.) The process involves four types of
cases: list sample, private sector; list sample, government; household sample, private sector;
and household sample, government. The description includes preparation of the data, selection of
the donors, and use of donor and other information to create the item for the recipient.
The estimates in this report are based on the most
recent data available at the time the report was written. However,
selected elements of MEPS data may be revised on the basis of additional
analyses, which could result in slightly different estimates from those
shown here. Please check the MEPS Web site for the most current file
releases.
Center for Financing, Access, and Cost Trends
Agency for Healthcare Research and Quality
540 Gaither Road
Rockville, MD 20850
www.meps.ahrq.gov
The Medical Expenditure Panel Survey (MEPS)
Background
The Medical Expenditure Panel Survey (MEPS) is conducted to provide
nationally representative estimates of health care use, expenditures,
sources of payment, and insurance coverage for the U.S. civilian
noninstitutionalized population. MEPS is cosponsored by the Agency for
Healthcare Research and Quality (AHRQ), formerly the Agency for Health
Care Policy and Research, and the National Center for Health Statistics
(NCHS).
MEPS comprises three component surveys: the Household Component (HC),
the Medical Provider Component (MPC), and the Insurance Component (IC).
The HC is the core survey, and it forms the basis for the MPC sample and
part of the IC sample. Together these surveys yield comprehensive data
that provide national estimates of the level and distribution of health
care use and expenditures, support health services research, and can be
used to assess health care policy implications.
MEPS is the third in a series of national probability surveys
conducted by AHRQ on the financing and use of medical care in the United
States. The National Medical Care Expenditure Survey (NMCES) was
conducted in 1977, the National Medical Expenditure Survey (NMES) in
1987. Beginning in 1996, MEPS continues this series with design
enhancements and efficiencies that provide a more current data resource
to capture the changing dynamics of the health care delivery and
insurance system.
The design efficiencies incorporated into MEPS are in accordance with
the Department of Health and Human Services (DHHS) Survey Integration
Plan of June 1995, which focused on consolidating DHHS surveys,
achieving cost efficiencies, reducing respondent burden, and enhancing
analytical capacities. To accommodate these goals, new MEPS design
features include linkage with the National Health Interview Survey
(NHIS), from which the sample for the MEPS-HC is drawn, and enhanced
longitudinal data collection for core survey components. The MEPS-HC
augments NHIS by selecting a sample of NHIS respondents, collecting
additional data on their health care expenditures, and linking these
data with additional information collected from the respondents’ medical
providers, employers, and insurance providers.
Household Component
The MEPS-HC, a nationally representative survey of the U.S. civilian
noninstitutionalized population, collects medical expenditure data at
both the person and household levels. The HC collects detailed data on
demographic characteristics, health conditions, health status, use of
medical care services, charges and payments, access to care,
satisfaction with care, health insurance coverage, income, and
employment.
The HC uses an overlapping panel design in which data are collected
through a preliminary contact followed by a series of five rounds of
interviews over a two and a half year period. Using computer-assisted
personal interviewing (CAPI) technology, data on medical expenditures
and use for two calendar years are collected from each household. This
series of data collection rounds is launched each subsequent year on a
new sample of households to provide overlapping panels of survey data
and, when combined with other ongoing panels, will provide continuous
and current estimates of health care expenditures.
The sampling frame for the MEPS-HC is drawn from respondents to NHIS,
conducted by NCHS. NHIS provides a nationally representative sample of
the U.S. civilian noninstitutionalized population, with oversampling of
Hispanics and blacks.
Medical Provider Component
The MEPS-MPC supplements and validates information on medical care
events reported in the MEPS-HC by contacting medical providers and
pharmacies identified by household respondents. The MPC sample includes
all hospitals, hospital physicians, home health agencies, and pharmacies
reported in the HC. Also included in the MPC are all office-based
physicians:
- Providing care for HC respondents receiving
Medicaid.
- Associated with a 75 percent sample of households
receiving care through an HMO (health maintenance organization) or managed
care plan.
- Associated with a 25 percent sample of the remaining
households.
Data are collected on medical and financial characteristics of
medical and pharmacy events reported by HC respondents, including:
- Diagnoses coded according to ICD-9 (9th Revision,
International Classification of Diseases) and DSM-IV (Fourth Edition,
Diagnostic and Statistical Manual of Mental Disorders).
- Physician procedure codes classified by CPT-4
(Current Procedural Terminology, Version 4).
- Inpatient stay codes classified by DRG (diagnosis
related group).
- Prescriptions coded by national drug code (NDC),
medication names, strength, and quantity dispensed.
- Charges, payments, and the reasons for any
difference between charges and payments.
The MPC is conducted through telephone interviews and mailed survey
materials.
Insurance Component
The MEPS-IC collects data on health insurance plans obtained through
private and public sector employers. Data obtained in the IC include the
number and types of private insurance plans offered, benefits associated
with these plans, premiums, contributions by employers and employees, and
employer characteristics.
Establishments participating in the MEPS-IC are selected through
three sampling frames:
- A list of employers or other insurance providers identified by MEPS-HC
respondents who report having private health insurance at the Round 1
interview.
- A Bureau of the Census list frame of private-sector business establishments.
- The Census of Governments from the Bureau of the Census.
To provide an integrated picture of health insurance, data collected
from the first sampling frame (employers and other insurance providers)
are linked back to data provided by the MEPS-HC respondents. Data from
the other two sampling frames are collected to provide annual national
and State estimates of the supply of private health insurance available to
American workers and to evaluate policy issues
pertaining to health insurance. Since 2000, the Bureau of Economic
Analysis has used national estimates of employer contributions to group
health insurance from the MEPS-IC in the computation of Gross Domestic
Product (GDP).
The MEPS-IC is an annual panel survey.
Data are collected from the selected organizations through a prescreening
telephone interview, a mailed questionnaire, and a telephone follow-up for
nonrespondents.
Survey Management
MEPS data are collected under the authority of the Public Health
Service Act. They are edited and published in accordance with the
confidentiality provisions of this act and the Privacy Act. NCHS
provides consultation and technical assistance.
As soon as data collection and editing are completed, the MEPS survey
data are released to the public in staged releases of summary reports
and microdata files. Summary reports are released as printed documents
and electronic files. Microdata files are released on CD-ROM and/or as
electronic files.
Printed documents and CD-ROMs are available through the AHRQ
Publications Clearinghouse. Write or call:
AHRQ Publications Clearinghouse
Attn: (publication number)
P.O. Box 8547
Silver Spring, MD 20907
800-358-9295
703-437-2078 (callers outside the United States only)
888-586-6340 (toll-free TDD service; hearing impaired only)
To order online, send an e-mail to: ahrqpubs@ahrq.gov.
Be sure to specify the AHRQ number of the document or CD-ROM you are
requesting. Selected electronic files are available through the Internet
on the MEPS Web site:
http://www.meps.ahrq.gov/
For more information, visit the MEPS Web site or e-mail mepspd@ahrq.gov.
Background
This report is the second report containing
information on imputation of variables for the Insurance Component of
the Medical Expenditure Panel Survey (MEPS-IC). The MEPS-IC is a survey
of employers, both private industry and public, that collects
information on employer-sponsored health insurance. The survey is
sponsored by the Agency for Healthcare Research and Quality and
conducted by the U.S. Census Bureau. It is designed to collect
information on employment-related health insurance, such as premiums and
types of plans offered. Information that describes characteristics of
the employer is also collected. These data are used to classify
employers for calculations of averages and totals and to serve as
independent variables for economic modeling.
The sample design of the MEPS-IC is described in
Sommers, 1999. Imputation of missing data for 1996 is described in
Sommers, 2000a. These documents reflect the survey as of the 1996 survey
year. Since that time, unions and insurers of respondents to the
Household Component of MEPS (MEPS-HC) have been dropped from the sample
due to the low response rates. The sample of self-employed individuals
with no employees (SENEs) has also been dropped due to a combination of
factors. The major reasons were low response rates and the fact that
many self-employed did not have insurance as self-employed individuals;
instead, they obtained insurance through another employer or through
their spouse’s employment. Because the employers providing the insurance
in these cases are covered through the main sample of employers, this
further limited the number of sample persons with usable data. Combined
with the low response rates, this caused the sample of SENEs to be of
marginal value and so it was dropped.
Since that first survey for 1996, the list of variables imputed with
the data has been expanded significantly. Some of these are variables
that were not originally collected; however, most are additions to the
list that were collected for the 1996 survey year but not imputed.
(Copies of the standard 1999 MEPS-IC data collection questionnaires for
establishments and plans are available on the MEPS Web site at
http://www.meps.ahrq.gov/mepsweb/survey_comp/survey_ic.jsp.)
Because the imputations being described are an expansion of the
previous list and the basic technical methods are very similar, this
paper will not give the level of detail provided in Sommers, 2000a.
This paper also assumes that the previous imputations described have
been completed before the imputations described here take place. For
these reasons, readers are advised to be familiar with the previous work
described in Sommers, 2000a, in order to have the clearest picture of
the process.
General Technical Methods
The original imputation methods report described the process for
seven groups of variables. Variables were grouped based on natural
relationships. For instance, questions relating to whether an employer
offered health insurance to retirees and whether they offered insurance
to retirees below age 65, above, or both were done together. Likewise,
the variables were ordered to maintain consistency. For instance, type
of plan providers was imputed before premiums because premiums are
influenced by type of plan providers (Sommers, 2000a).
The basic methods used to produce the additional new imputed results
were similar to those used in developing the processes for the first
seven sets of imputation. They are grouped and continue in order with
later groups built using previous imputations, if required. The new
groups are numbered from 8 to 17. As before, each group generally goes
through three phases to produce imputed results:
- Data preparation. Data editing is completed and data are normalized; for
example, all premium values are annualized.
- Donors are
selected for each recipient needing a value. Generally, this is done
using a hot-deck method that is similar for all groups. Specifics
behind the hot-deck process can be found in Kalton and Kasprzyk,
1986, and are based upon the premise that the expected values of two
items are the same if both items agree on a set of important
predictive characteristics. The method used to implement this
technique was developed by Stiller and Dazell, 1997. It depends on
sorting the donors and recipients. As part of the process, class and
sort variables are developed and ordered. A donor and recipient must
have the same class variables. If a donor and recipient disagree on
any class variable, the expected difference in their values has
been determined to be too large. They can differ on the sort
variables, which have less effect on expected values than class
variables. Efforts are made to match on sort variables also, but
these are ordered so that if not all variables are matched the least
important are dropped first in the matching process. (A more
extensive description of this process is given in Sommers, 2000a.)
- Final required values are produced. Many times there is a direct
substitution of one or more donor values into the recipient’s slots
for the same variables. However, some of the recipient values are
determined by using ratios or other values derived from the donor
and applying them to a current recipient value. This is done to
maintain consistency among the recipient results. For instance, to
obtain the number of employees eligible for health insurance, the
donor ratio of eligible to total employees is applied to the number
of employees of the recipient. This maintains data relationships. If
a direct substitution of the donor’s eligible employees were used,
the process would need to be limited to donors whose total
employment was very similar to that of the recipient. Otherwise,
the recipient values would likely have an expected value that was
too high or too low, depending upon the relationship of the
employment of the recipient to the average employment of the set of
donors.
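The donor selection and ratio steps above can be sketched in a few lines of Python. This is a simplified illustration only, not the sorted-file implementation of Stiller and Dazell, 1997: it requires exact agreement on the class variables and then keeps the longest leading run of sort variables on which some donor agrees, dropping the least important first. The function names, field names, and data values are all hypothetical.

```python
from typing import Optional, Sequence


def select_donor(recipient: dict, donors: Sequence[dict],
                 class_vars: Sequence[str],
                 sort_vars: Sequence[str]) -> Optional[dict]:
    """Pick a donor that agrees on every class variable and on as many
    leading sort variables as possible (sort_vars runs from most to
    least important)."""
    # Class variables must match exactly; otherwise the donor is unusable.
    pool = [d for d in donors
            if all(d[v] == recipient[v] for v in class_vars)]
    if not pool:
        return None
    # Drop the least important sort variables first until a donor matches.
    for n in range(len(sort_vars), -1, -1):
        keys = sort_vars[:n]
        matches = [d for d in pool
                   if all(d[v] == recipient[v] for v in keys)]
        if matches:
            return matches[0]
    return None


def impute_eligible(recipient: dict, donor: dict) -> float:
    """Apply the donor's eligible-to-total ratio to the recipient's
    own employment instead of copying the donor's count."""
    ratio = donor["eligible"] / donor["total_employees"]
    return ratio * recipient["total_employees"]


# Hypothetical data for illustration only.
donors = [
    {"size_class": "small", "industry": "retail", "state": "MD",
     "total_employees": 40, "eligible": 30},
    {"size_class": "small", "industry": "retail", "state": "VA",
     "total_employees": 20, "eligible": 10},
    {"size_class": "large", "industry": "retail", "state": "VA",
     "total_employees": 900, "eligible": 800},
]
recipient = {"size_class": "small", "industry": "retail", "state": "VA",
             "total_employees": 50}

donor = select_donor(recipient, donors,
                     class_vars=["size_class"],
                     sort_vars=["industry", "state"])
imputed_eligible = impute_eligible(recipient, donor)
```

Here the large establishment is excluded by the class variable even though it matches on both sort variables, and the imputed count scales with the recipient's own employment rather than the donor's.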
Process for Each Group of Variables
In the next sections of the report, we proceed
through each of the new groups of variables that were imputed for the
MEPS-IC. For each group, we give the list of variables to be imputed
with reference to their questionnaire name (establishment or plan) and
question number. (Copies of the standard 1999 MEPS-IC data collection
questionnaires for establishments and plans are available on the MEPS
Web site at
http://www.meps.ahrq.gov/mepsweb/survey_comp/survey_ic.jsp.)
We also describe sort variables used in the selection of all donors for
the individual variables within the group, describe class variables used
for imputation for all individual variables within the group, and
describe the step-by-step process used to create values for each type of
recipient from the donor information. Sort and class variables are given
for imputation of private sector data. Changes made for imputation of
government data are given in the last section of this report. See the
appendix for precise definitions of required class and sort variables.
We assume that all logical edits have been performed
before the imputation takes place. Thus, for instance, if a respondent
gave a total number of part-time employees in question D1b as zero and
did not fill in how many were eligible or enrolled, these values would
automatically be set to zero. Because of this assumption, we do not
discuss logical edits in process descriptions unless this information
adds to the discussion of the process.
Throughout the process, we assume a standard
definition of a responding establishment and responding plan. An
establishment was considered a respondent if it answered that it did or
did not provide insurance for its employees, and if the establishment
did provide insurance for some of its employees, the establishment also
responded at the plan level for at least one of its plans. Responding
plans are defined as those that had information provided for at least
one of the following items on the plan questionnaire for the specific
plan:
- Type of providers, question 2
- Gatekeeper required, question 3
- Purchased or self-insured, question 4
- Plan active enrollment, question 7a
- Premium levels and contributions, questions 8 and 9
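The respondent definitions above amount to two simple predicates. The sketch below is one way to express them, assuming each plan is a dictionary in which an unanswered item is missing or None; the item names are hypothetical stand-ins for the plan questionnaire items listed above.

```python
from typing import Optional, Sequence

# Hypothetical field names for plan questions 2, 3, 4, 7a, and 8/9.
KEY_ITEMS = ("provider_type", "gatekeeper", "self_insured",
             "active_enrollment", "premiums")


def is_responding_plan(plan: dict) -> bool:
    """A plan responds if at least one key item was provided."""
    return any(plan.get(item) is not None for item in KEY_ITEMS)


def is_responding_establishment(offers_insurance: Optional[bool],
                                plans: Sequence[dict]) -> bool:
    """An establishment responds if it said whether it offers insurance
    and, when it does offer, at least one of its plans responds."""
    if offers_insurance is None:   # offer question not answered
        return False
    if not offers_insurance:       # answered "no": nothing more needed
        return True
    return any(is_responding_plan(p) for p in plans)
```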
In the following sections, we describe the imputations of variables
in the new groups 8 through 17. As was done in the previous methods
report (Sommers, 2000a), we give (1) the variables imputed in the
groups, (2) the sort variables, in order of importance from most to
least important, and (3) the class variables. Along with these lists, we
give the processes used to convert data from the donor to create the
recipient values.
Eighth group of variables
Plan questionnaire (Pq) 6c | Annual plan cost for self-insured plans.
Establishment questionnaire (Eq) E1 | Total annual cost of coverage for all hospitalization/physician plans at the location.
Sort variables
None, hot-deck process not used.
Class variables
None
Process
Imputing these values does not require the selection of any donors. By
this point, all enrollments and premiums for each plan have been imputed
in earlier imputation groups. Under the assumption that the plan premiums
and enrollments are the same for the entire year at the establishment,
the total annual cost for a plan is the number of single enrollees
multiplied by the annual single premium plus the number of family
enrollees multiplied by the annual family premium. This method directly
gives an estimate of total annual plan cost for a self-insured plan. The
weighted sum of these estimates over the set of plans collected for the
establishment gives an estimate of the total annual cost at the location
for hospitalization/physician plans. The weight used is the conditional
plan weight within the establishment, given that the establishment is in
the survey.
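As a worked illustration of this arithmetic, the following sketch computes the plan-level cost and the weighted establishment total. The field names and figures are hypothetical, and the non-single coverage tier is labeled "family" here.

```python
def annual_plan_cost(plan: dict) -> float:
    """Enrollees times annual premium, summed over the two coverage tiers."""
    return (plan["single_enrollees"] * plan["annual_single_premium"]
            + plan["family_enrollees"] * plan["annual_family_premium"])


def establishment_cost(plans: list) -> float:
    """Weighted sum of plan costs, using each plan's conditional weight
    within the establishment."""
    return sum(p["plan_weight"] * annual_plan_cost(p) for p in plans)


# Hypothetical plans collected for one establishment.
plans = [
    {"single_enrollees": 10, "annual_single_premium": 2000.0,
     "family_enrollees": 5, "annual_family_premium": 6000.0,
     "plan_weight": 1.0},
    {"single_enrollees": 4, "annual_single_premium": 2500.0,
     "family_enrollees": 2, "annual_family_premium": 7000.0,
     "plan_weight": 2.0},
]

first_plan_cost = annual_plan_cost(plans[0])   # 20,000 + 30,000
total_cost = establishment_cost(plans)         # 50,000 + 2 * 24,000
```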
Ninth group of variables
Eq E8a | Retirees in the firm covered by insurance
Eq E8b | Retirees in the firm with single coverage
Eq E9a | Retiree single coverage premium
Eq E9b | Retiree single coverage employer contribution
Eq E10a | Retiree family coverage premium
Eq E10b | Retiree family coverage employer contribution
(Note these questions do not apply to the establishment in the
sample. They apply to the firm that controls the establishment. This is
done because retiree data are not available at the establishment level
and actually cannot always be related to a specific operating
establishment. For instance, retirees within a firm that worked at a
closed factory cannot be associated with a particular operating
establishment. Thus, retiree questions are for the firm and require
special estimation processes to be used in making estimates. For more
information, see Sommers, 2000b.)
Sort variables
A different donor is selected for each recipient.
A different set of sort variables is used for the hot deck for each
variable imputed. This reflects the differing sets of predictors for
each value. The variables listed have been placed in the same group
because they are all questions concerning retiree coverage, and
imputation of some of these variables requires use of one of the
previously imputed variables.
For retirees covered by insurance and retirees with single coverage,
the sort variables are (1) whether the establishment offers health insurance to
retirees under 65, (2) whether the firm offers health insurance
to retirees over 65, (3) the industry division group, (4) the
industry division, (5) Firm Age Group 2, and (6) Firm Size Class 2.
(See the appendix for variable definitions.)
For retiree single coverage premium the sort variables are
(1) industry division group, (2) industry division, (3) Firm Size Class 2,
(4) Firm Size Class 1, (5) Census division, (6) State, and (7) the size
of the retiree family coverage premium.
For retiree single coverage employer contribution the sort variables
are (1) industry division group, (2) industry division, (3) Firm Size
Class 2, (4) Firm Size Class 1, (5) Census division, and (6) State.
For retiree family coverage premium the sort variables are (1)
industry division group, (2) industry division, (3) Firm Size Class 2,
(4) Firm Size Class 1, (5) Census division, (6) State, and (7) the size
of the retiree single coverage premium.
For retiree family coverage employer contribution the sort variables
are (1) industry division group, (2) industry division, (3) Firm Size
Class 2, (4) Firm Size Class 1, (5) Census division, and (6) State.
Class variables
The class variables for all the hot deck
imputations in this group are Firm Size Class 3 and whether the
establishment offered health insurance to retirees.
Process
The values of the retiree single coverage and retiree
family coverage premiums are taken directly from the donor
establishment. The other four variables are obtained by multiplying a
ratio calculated from the donor by a value taken from the recipient.
The total number of retirees for the firm is the number of employees for
the firm of the recipient multiplied by the ratio of the number of
retirees from the donor establishment’s firm over the total employment
of the donor establishment’s firm.
The total number of single enrollees is the total enrollment for the recipient
multiplied by the ratio of the total single enrollment for the donor’s
firm over the total enrollment for the donor’s firm.
Each of the two contributions is calculated by multiplying the
corresponding (family or single) premium for the recipient by the ratio
of the donor plan’s corresponding employer contribution over the donor
plan’s corresponding premium.
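Each of the ratio-based retiree items follows the same pattern: a recipient base value multiplied by a donor numerator over a donor denominator. A minimal sketch under that reading is shown below; the helper name and all figures are hypothetical, and in practice each item may draw on a different donor.

```python
def scale_by_donor_ratio(recipient_base: float,
                         donor_numerator: float,
                         donor_denominator: float) -> float:
    """Return recipient_base * (donor_numerator / donor_denominator)."""
    if donor_denominator == 0:
        raise ValueError("donor denominator must be nonzero")
    return recipient_base * donor_numerator / donor_denominator


# Total retirees: recipient firm employment times donor retirees/employment.
retirees = scale_by_donor_ratio(1000, donor_numerator=50,
                                donor_denominator=500)

# Retirees with single coverage: recipient total times donor single/total.
single_retirees = scale_by_donor_ratio(retirees, donor_numerator=30,
                                       donor_denominator=50)

# Employer contribution: recipient premium times donor contribution/premium.
single_contribution = scale_by_donor_ratio(3000.0, donor_numerator=2400.0,
                                           donor_denominator=3200.0)
```

Because only the donor's ratio is borrowed, the imputed contribution can never exceed the recipient's own premium as long as the donor's contribution does not exceed the donor's premium.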
Tenth group of variables
Eq E2a | Optional coverages offered
Eq E2b | Total cost of optional coverage
Sort variables
The two variables are imputed in sequence, but the second value does
not use a hot-deck routine and no sort values are used. For the first
variable, "optional coverages offered," the file is sorted by industry
division and Firm Size Class 1.
Class variables
The variable Firm Size Class 2 is used as a class variable for imputation of the first variable.
Process
The first question, if not answered, is imputed directly
from a donor who provided a response. A donor is an establishment that
either checked that it did not offer any optional coverage or checked
one or more of the optional coverages listed. A recipient is an
establishment that either checked no box or gave inconsistent answers
by checking both that it did not offer coverage and a coverage that was
offered.
For establishments that offered optional coverage, whether reported or
imputed, but did not report its cost, the total cost of optional
coverage was imputed by applying a factor to the establishment’s total
number of employees enrolled in health insurance.
The factors are derived from the costs of those establishments that
reported both the coverages offered and their total costs. Each
establishment could offer from one to four coverages. The reporting
establishments are grouped by whether they offer one, two, three, or
four optional coverages. The weighted sum of the reported optional
coverage costs for each group is calculated and divided by the weighted
total of their enrolled to obtain a ratio of cost per enrollee for those
establishments offering that number of optional coverages.
For each recipient, total costs are determined by multiplying the
establishment enrollment by the appropriate factor for the establishment
based upon the number of coverages offered by the establishment; i.e.,
if the establishment offers two optional coverages its total cost is its
enrollment times the average reported cost per enrollee for
establishments that reported offering two optional coverages.
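The factor calculation can be sketched as follows, assuming each reporting establishment carries a survey weight, its number of optional coverages, its reported optional cost, and its enrollment. The field names and figures are hypothetical.

```python
from collections import defaultdict


def cost_factors(reporters: list) -> dict:
    """Weighted cost per enrollee, by number of optional coverages offered."""
    cost = defaultdict(float)
    enrolled = defaultdict(float)
    for r in reporters:
        k = r["n_coverages"]
        cost[k] += r["weight"] * r["optional_cost"]
        enrolled[k] += r["weight"] * r["enrolled"]
    # Skip groups with no weighted enrollment to avoid dividing by zero.
    return {k: cost[k] / enrolled[k] for k in cost if enrolled[k] > 0}


def impute_optional_cost(recipient: dict, factors: dict) -> float:
    """Recipient enrollment times the factor for its number of coverages."""
    return recipient["enrolled"] * factors[recipient["n_coverages"]]


# Hypothetical reporting establishments.
reporters = [
    {"n_coverages": 2, "weight": 1.0, "optional_cost": 1000.0, "enrolled": 10},
    {"n_coverages": 2, "weight": 3.0, "optional_cost": 2000.0, "enrolled": 30},
    {"n_coverages": 1, "weight": 2.0, "optional_cost": 300.0, "enrolled": 20},
]
factors = cost_factors(reporters)
imputed_cost = impute_optional_cost({"n_coverages": 2, "enrolled": 25},
                                    factors)
```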
Eleventh group of variables
Pq 8c | How many former employees are enrolled in plan?
Sort variables
None, hot-deck process not used.
Class variables
Firm Size Class 2 and industry division group
Process
For each cell determined by the two class variables, the weighted
sum for reported plans of the number of former employees enrolled was
divided by the weighted sum of active enrollees for the reported plans
within the same cell. To impute the number of former employees enrolled
for a recipient plan, that plan’s total enrollment was multiplied by the
ratio calculated from reporting donor plans in the same cell.
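A sketch of the cell-level ratio is given below. It assumes that the recipient plan's enrollment is the same active-enrollee count used in the denominator of the ratio; the field names and figures are hypothetical.

```python
from collections import defaultdict


def former_employee_ratios(reported_plans: list) -> dict:
    """Weighted former-to-active enrollee ratio for each cell defined by
    (Firm Size Class 2, industry division group)."""
    former = defaultdict(float)
    active = defaultdict(float)
    for p in reported_plans:
        cell = (p["size_class"], p["industry_group"])
        former[cell] += p["weight"] * p["former_enrolled"]
        active[cell] += p["weight"] * p["active_enrolled"]
    return {c: former[c] / active[c] for c in former if active[c] > 0}


def impute_former(plan: dict, ratios: dict) -> float:
    """Recipient enrollment times the ratio for the recipient's cell."""
    cell = (plan["size_class"], plan["industry_group"])
    return plan["active_enrolled"] * ratios[cell]


# Hypothetical reporting plans in a single cell.
reported = [
    {"size_class": "small", "industry_group": "services", "weight": 2.0,
     "former_enrolled": 5, "active_enrolled": 20},
    {"size_class": "small", "industry_group": "services", "weight": 1.0,
     "former_enrolled": 15, "active_enrolled": 60},
]
ratios = former_employee_ratios(reported)
imputed_former = impute_former(
    {"size_class": "small", "industry_group": "services",
     "active_enrolled": 48},
    ratios,
)
```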
Twelfth group of variables
Pq 13a | Did plan have a deductible?
Pq 13b | What was annual individual deductible?
Pq 14a | Did the plan require a specific number of individual deductibles be met before the family deductible is met?
Pq 14b | How many family members were required to meet the individual deductible?
Pq 14c | What was the total annual family deductible?
Pq 15a | Was hospital care covered?
Pq 15b | How much and/or what percentage was paid by enrollee for hospital care?
Pq 15c | Is physician care covered?
Pq 15d | How much or what percentage was paid by enrollee for physician care?
Sort variables
The imputation is done in sequence with four hot-deck steps. The
variables are imputed in the following order:
- Did the plan have a deductible?
- Was the family deductible a multiple of the single
deductible?
- Did the plan have hospital coverage and did the
plan have physician coverage?
- The remaining variables in the list.
For the first two hot-deck runs, the files are sorted by (1) Firm
Size Class 2, (2) State, and (3) size of the single premium. For the
remaining two runs the files are sorted by (1) Firm Size Class 2, (2)
type of provider, (3) State, and (4) size of plan single premium.
Class variables
The class variable for whether the plan had a
deductible is the type of provider. There are no class variables for the
second and third hot-deck runs. The class variables for the fourth
hot-deck run are whether the plan had a deductible and whether the family
deductible was a multiple of the single deductible.
Process
The variables are related because each set relies on
information from the previous imputation either to determine if there
needs to be an imputation at that point or to determine a class variable
for the next imputation. For instance, we must know if there is a
deductible from the first imputation in order to know if there needs to
be an imputation for the family deductible. The variables from the first
three imputations are needed to determine the structure of the
imputation results for the large number of variables imputed in the
fourth hot deck. The process approach is given below.
Due to the close interaction of the variables in this group, an
important first step is taken using a large number of logical edits. For
instance, if a plan has deductibles reported, then it is assumed that
the plan had deductibles. If co-pays are given for physician visits,
then it is assumed that physician care was covered. Once these edits
have been carried out, then the imputation steps are done in a sequence
that builds the information in a logical, correlated manner.
The first step is to determine if there was a deductible. A recipient
in this group would not have information about whether the plan had a
deductible and likely would also not have any information about most of
the other variables in this overall group. It would certainly not
have any information on the type of family deductible imputed in the
second hot-deck process nor the levels of the various deductibles in the
fourth hot-deck group above.
In this first step, a donor is a plan that had reported whether there
was a plan deductible. The value of the donor is directly imputed into
the recipient value to determine if the recipient had a deductible.
The donors for the second set are those plans that have a deductible
and information on the structure of the family deductible. Recipients
lack information on the nature of the family deductible but were known
to have had a deductible. As with the first hot-deck step, the donor
value is directly imputed into the recipient value.
The third hot deck determines if plans offered physician coverage
and/or hospitalization coverage. These two values are determined at this
stage of the imputation process because most of the remaining variables
in the imputation group are related to one of these types of care. For
instance, hospital co-pays must be determined, but before one can
determine if there is a hospital co-pay, one must know if there is
hospital coverage. If there is no coverage, there is no co-pay.
For this third hot-deck run, the donor plans are all those plans that
answered both of the questions about type of coverage offered. Recipient
plans are those plans that failed to report whether the plan covered
hospital care, physician care, or both. The donor value is
directly imputed into the recipient value if the recipient needs such a
value. For instance, if the recipient plan was reported to have
hospitalization coverage, but failed to report about physician coverage,
then only the donor’s value about physician coverage would be used.
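The rule that a donor supplies only the values the recipient lacks can be sketched as below. This assumes an unreported item is stored as None; the field names are hypothetical.

```python
COVERAGE_ITEMS = ("hospital_covered", "physician_covered")


def fill_missing_coverage(recipient: dict, donor: dict,
                          items=COVERAGE_ITEMS) -> dict:
    """Copy donor values only into items the recipient did not report;
    reported recipient values are never overwritten."""
    filled = dict(recipient)  # leave the recipient record untouched
    for item in items:
        if filled.get(item) is None:
            filled[item] = donor[item]
    return filled


# The recipient reported hospital coverage but not physician coverage.
recipient = {"hospital_covered": True, "physician_covered": None}
donor = {"hospital_covered": False, "physician_covered": True}
filled = fill_missing_coverage(recipient, donor)
```

In this example the donor's hospital value is ignored because the recipient reported its own, and only the physician coverage value is taken from the donor.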
For the fourth hot deck, donors are plans that have all the
information for all the items in the group that would be required (for
instance, if the plan had no deductible its deductibles could be blank)
and that offered family coverage, physician coverage, and
hospitalization coverage. Recipients are plans for which it is known
whether they have a deductible, which types of coverage are included,
and whether a family deductible is a multiple of the single deductible.
However, necessary details in these areas are not known. For instance,
if the plan had a deductible, not all required deductible values are
known. If the plan had hospitalization coverage, then it is not known
what the co-pays/percentage paid by the enrollee were.
There is only one donor per recipient. Donors have information for
all the possible fields to be imputed. This means a donor is sometimes
required to have more than the minimum information required to choose a
donor or to provide values for the recipient. For instance, all donors
have values for physician co-pays, but the recipient may not require a
physician co-pay because the plan does not have physician coverage or
the recipient plan has had this co-pay reported. Likewise, the recipient
plan may not offer family coverage and thus not require a family
deductible, but the donor plan, if it has a deductible, would have
family deductibles in case they were needed for the imputation. This
completeness and use of the two variables, whether the plan has a
deductible and the type of married deductible, assure that the donor
plan will have all the information required for any recipient plan in
the class. This donor specification was used because
(1) almost all reporting plans had the two types of coverage and married coverage,
and (2) if any information was given, complete information was
given. Thus, (1) the restriction removes very few donors from the
imputation, leaving a large supply of donors; (2) the imputation can be
carried out using a simpler process with fewer
steps by selecting a single donor for all these variables and then
using only the needed information; (3) using the same donor helps
maintain correlation and consistency of the data; and (4) matches of donor and
recipient are still made using the most important prediction variables.
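One way to express the donor requirement for the fourth hot deck is sketched below. It is only an illustration under stated assumptions: all field names are hypothetical, each co-pay/percent section is collapsed into a single placeholder field, and missing values are assumed to be stored as `None`.

```python
# Hypothetical field names; the survey's own variable names are not used here.
COVERAGE_FLAGS = ("family_coverage", "physician_coverage", "hospital_coverage")
COST_SHARING_FIELDS = ("hospital_copay_or_percent", "physician_copay_or_percent")
DEDUCTIBLE_FIELDS = ("single_deductible", "family_deductible")


def is_fourth_deck_donor(plan: dict) -> bool:
    """Return True if the plan could supply any field a recipient might need."""
    # Donors must offer all three kinds of coverage.
    if not all(plan.get(flag) is True for flag in COVERAGE_FLAGS):
        return False
    # Whether the plan has a deductible must itself be known.
    if plan.get("has_deductible") is None:
        return False
    required = list(COST_SHARING_FIELDS)
    # Deductible amounts may be blank only when the plan has no deductible.
    if plan["has_deductible"]:
        required.extend(DEDUCTIBLE_FIELDS)
    return all(plan.get(field) is not None for field in required)


complete = {
    "family_coverage": True, "physician_coverage": True, "hospital_coverage": True,
    "has_deductible": True, "single_deductible": 250, "family_deductible": 500,
    "hospital_copay_or_percent": 100, "physician_copay_or_percent": 10,
}
no_deductible = dict(complete, has_deductible=False,
                     single_deductible=None, family_deductible=None)
no_family = dict(complete, family_coverage=False)
```

Here `complete` and `no_deductible` qualify as donors, while `no_family` does not because it lacks family coverage.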
What information is used from the donor and how it is used to provide
information for the recipient depends upon the pattern of reported
information the recipient plan has. The process is such that a
determination is made as to which sections of the recipient plan are
missing information, then the process considers each section and the
pattern of missing information within that section.
The process handles the remaining items in three parts. All of the
deductibles are processed together as a group, while the hospital
co-pay/percent paid and the physician co-pay/percent paid are each
handled as a separate group, independent of each other and of the deductibles.
The process for the deductibles requires that donor relationships be
maintained when imputing values to the recipient using donor ratios of
family to single deductibles. How each item is calculated depends upon
what values have been reported for the donor and recipient. One must
also remember that, at this point in the process, one knows whether the
family deductible is a value or a multiple of the single deductible for
both the donor and the recipient. Since these variables are class
variables in this imputation, the donor and recipient share this
characteristic. One also knows whether both the donor and recipient have
a deductible. Again, this is because this is a class variable in the
process. One also has, if necessary, imputed what type of coverages the
recipient provides so one knows if one requires a married deductible or
hospitalization deductible or physician deductible. On the other hand,
the donor always has each of these three types of coverage and thus can
provide for all three types of coverage within its class even if the
recipient does not need all three.
Data from donors are used in a way that retains both the relationships
among the data within the donor and any recipient
information available. What is done depends upon the case and on what the
donor and recipient deductible information is. Some of the key cases
are as follows:
-
If the recipient has no deductible, then the deductibles are left empty.
-
If the recipient has a deductible and is
missing all deductible values, the donor values are simply imputed
to the recipient.
-
If the recipient requires a family deductible but has a single deductible,
one gets the family deductible by multiplying the reported recipient single
deductible by the ratio of family to total single deductible of the donor.
If the recipient has a family but no single deductible, the process is reversed
and the recipient family deductible is divided by the donor ratio to obtain the
single deductible.
-
To preserve whether there is a single individual deductible or separate deductibles
for hospital and physician care, if a single deductible is calculated for the recipient
using a donor ratio of total single deductible, and the donor has
separate deductibles, then the total recipient value is prorated into separate
deductibles using donor values. For instance, if the recipient reported no single deductible
but a family deductible of 200, and the donor had a family deductible of 300 and two
separate individual deductibles of 75 each, then the total single deductible for the
recipient would be 100 = 200*(75 + 75)/300. The 100 would then be
prorated to 50 and 50, using the portions 75 and 75 from the donor to
allocate the 100 between the two individual deductibles.
-
For cases where the donor and recipient
both have family deductibles that are multiples of their single
deductibles, if the recipient has neither a single deductible, a
family deductible, nor the number of single deductibles required for
the family deductible, then all donor values are imputed to the
recipient. However, if the recipient has no single deductible but
has reported the number of single deductibles required for family
coverage, then only the single deductible is taken from
the donor. As above, if the single deductible of the donor is broken
into two separate deductibles, then this pattern is imputed to
the recipient.
-
Only required values are imputed. For
instance, if a recipient plan has no family coverage, then no family
deductibles are taken from the donor. Thus, if a recipient was
missing all values of deductibles but from earlier work one knew
that no family coverage was offered, then the family deductible from
the donor would not be used for this recipient.
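The ratio and proration cases above can be sketched in a few lines. This is an illustration of the arithmetic only; the function and parameter names are hypothetical, and the sketch covers only the cases in which the donor ratio is applied to a reported recipient value.

```python
def family_from_single(recipient_single: float,
                       donor_family: float,
                       donor_total_single: float) -> float:
    """Recipient family deductible = reported single deductible * donor ratio."""
    if donor_total_single <= 0:
        raise ValueError("donor total single deductible must be positive")
    return recipient_single * donor_family / donor_total_single


def single_from_family(recipient_family: float,
                       donor_family: float,
                       donor_hospital: float,
                       donor_physician: float) -> tuple:
    """Reverse case: divide the reported family deductible by the donor ratio,
    then prorate the total using the donor's separate deductibles."""
    donor_total_single = donor_hospital + donor_physician
    if donor_family <= 0 or donor_total_single <= 0:
        raise ValueError("donor deductibles must be positive")
    total_single = recipient_family * donor_total_single / donor_family
    hospital = total_single * donor_hospital / donor_total_single
    physician = total_single * donor_physician / donor_total_single
    return total_single, hospital, physician


# Worked example from the text: recipient family deductible of 200,
# donor family deductible of 300, donor separate deductibles of 75 and 75.
total, hospital, physician = single_from_family(200, 300, 75, 75)
```

With the values from the text, the total single deductible is 100, which is prorated to 50 and 50.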
The imputation of co-pays/percents paid is basically a direct transfer
of values from the donor. Donors for the hospitalization
co-pays/percents had a reported value for either the co-pay or the
percent. It was assumed that if one value was reported and the other was
missing, the missing value was zero. It was also assumed that a
reported amount paid was per stay unless the donor
reported otherwise. Recipients for hospitalization co-pays/percents
offered hospital coverage but had both hospital values missing. The
recipient takes from the donor plan the values of all three variables in
the set.
For physician co-pays/percents, the same assumptions and edits were
made as for hospitalization values. Thus, for physician co-pays/percents,
a recipient plan was a plan with physician coverage and no co-pay or
percentage reported, and a donor plan had at least one of the two values
reported. As with hospitalization, the recipient values were taken
directly from the donor.
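The donor edits described for co-pays/percents might look like the sketch below. It assumes that the three variables in the hospitalization set are the co-pay amount, the percent paid, and the basis of the amount (for example, per stay); that reading of "three variables" and all names in the code are assumptions, not taken from the survey files.

```python
from typing import NamedTuple, Optional


class CostSharing(NamedTuple):
    copay: float           # amount paid by the enrollee
    percent: float         # percent paid by the enrollee
    basis: Optional[str]   # how the amount applies, e.g. "per_stay"


def edit_donor(copay: Optional[float],
               percent: Optional[float],
               basis: Optional[str] = None) -> Optional[CostSharing]:
    """Apply the assumed edits; return None if the plan cannot be a donor."""
    if copay is None and percent is None:
        return None  # neither value reported, so the plan is not a donor
    # A reported amount is treated as per stay unless stated otherwise.
    if copay is not None and basis is None:
        basis = "per_stay"
    # If only one value was reported, the other is taken to be zero.
    return CostSharing(
        copay=0.0 if copay is None else copay,
        percent=0.0 if percent is None else percent,
        basis=basis,
    )


amount_only = edit_donor(100.0, None)
percent_only = edit_donor(None, 20.0)
neither = edit_donor(None, None)
```

A recipient would then take all fields of the edited donor record together, which keeps the co-pay, percent, and basis consistent with one another.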
Thirteenth group of variables
Pq 17a |
Did the plan have a maximum out-of-pocket for an individual and, if so, how much? |
Sort variables
State
Class variables
Type of provider, Pq 2
Process
The two variables in Pq 17a are related, and only
one of the two should be answered. Donors are those plans with valid
responses to the question; that is, either they had no maximum or a
maximum was given. Recipients are those plans for which neither of the
two items was answered or both were answered.
Imputation is by direct substitution of the donor values for the
recipient values.
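The donor and recipient rule for this pair of items can be stated compactly. The sketch below assumes the pair is stored as a "no maximum" indicator plus an optional amount; that layout and the names used are hypothetical.

```python
from typing import Optional


def classify_max_oop(no_maximum: bool, maximum_amount: Optional[float]) -> str:
    """Donor if exactly one of the two items was answered, else recipient."""
    answered = int(bool(no_maximum)) + int(maximum_amount is not None)
    return "donor" if answered == 1 else "recipient"


examples = [
    classify_max_oop(True, None),     # no maximum reported: valid
    classify_max_oop(False, 1500.0),  # maximum amount reported: valid
    classify_max_oop(False, None),    # neither answered
    classify_max_oop(True, 1500.0),   # both answered: inconsistent
]
```

The same check applies to any pair of items where only one of the two should be answered.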
Fourteenth group of variables
Pq 17b |
Did the plan have a maximum out-of-pocket for a family
and, if so, how much? |
Sort variables
State
Class variables
Type of provider, Pq 2
Process
The two variables are related, and only one of the two should
be answered. Donors are those plans with valid responses to the
question; that is, either they had no maximum or a maximum was
given. Recipients are those plans for which neither of the two items
was answered or both were answered.
Imputation is by direct substitution of the donor values for the
recipient values.
Fifteenth group of variables
Pq 21 |
Does plan offer routine outpatient prescription coverage,
dental care, orthodontic care (only these three types of coverage are imputed)? |
Sort variables
State
Class variables
Type of provider, Pq 2
Process
Donors are those plans that have answered either yes or no to all
three of these items. Recipients are those plans missing a yes or no
answer to at least one of the three items. To impute, there is a direct
transfer of the donor value for any or all of the three items that are
not reported on the recipient plan.
Sixteenth group of variables
Eq E3a |
Did the establishment require a waiting period before
a new employee could be covered by health insurance? |
Eq E3b |
If a waiting period was required for 1999, how long
was the typical waiting period? |
Sort variables
Firm age group, industry division group, SIC2,
Firm Size Class 1, State, and establishment size class.
Class variables
None
Process
The process requires two hot-deck imputations. Only
establishments that offer health insurance are considered. The donor set
for the first hot deck is all establishments that reported whether or
not they had a waiting period for health insurance. The recipient set is
all establishments that failed to answer whether they required a waiting
period for health insurance. Imputation of the recipient value is by
direct substitution of the donor value.
The donor set for the second imputation is the set of all
establishments that require a waiting period for health insurance for
their employees and reported the length of that period. Recipients are
those establishments that either reported or had imputed that they had a
waiting period for their employees before health insurance coverage
began, but failed to report the length of that waiting period. The
recipient value of the waiting period length is set equal to the value
of the donor’s waiting period.
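The two hot decks above can be sketched as follows. This is a minimal illustration, assuming a simple sequential hot deck in which each recipient takes values from the closest preceding donor in the sorted file; the report's actual donor selection rule may differ. The field names (`has_wait`, `wait_length`) and the shortened sort key are hypothetical.

```python
from typing import Callable, Iterable, List


def sequential_hot_deck(records: Iterable[dict],
                        sort_key: Callable[[dict], tuple],
                        is_donor: Callable[[dict], bool],
                        is_recipient: Callable[[dict], bool],
                        fields: List[str]) -> List[dict]:
    """Fill `fields` on each recipient from the nearest preceding donor.

    Assumed rule, for illustration only. Recipients that come before the
    first donor in sort order are left unfilled by this sketch.
    """
    ordered = sorted((dict(r) for r in records), key=sort_key)  # work on copies
    last_donor = None
    for rec in ordered:
        if is_donor(rec):
            last_donor = rec
        elif is_recipient(rec) and last_donor is not None:
            for field in fields:
                rec[field] = last_donor[field]
    return ordered


establishments = [
    {"id": 1, "state": "MD", "has_wait": True, "wait_length": 30},
    {"id": 2, "state": "MD", "has_wait": None, "wait_length": None},
    {"id": 3, "state": "VA", "has_wait": False, "wait_length": None},
    {"id": 4, "state": "VA", "has_wait": True, "wait_length": None},
]
key = lambda r: (r["state"], r["id"])  # stand-in for the full list of sort variables

# First hot deck: whether a waiting period is required.
step1 = sequential_hot_deck(
    establishments, key,
    is_donor=lambda r: r["has_wait"] is not None,
    is_recipient=lambda r: r["has_wait"] is None,
    fields=["has_wait"],
)
# Second hot deck: length of the period, for reported or imputed waiting periods.
step2 = sequential_hot_deck(
    step1, key,
    is_donor=lambda r: r["has_wait"] is True and r["wait_length"] is not None,
    is_recipient=lambda r: r["has_wait"] is True and r["wait_length"] is None,
    fields=["wait_length"],
)
by_id = {r["id"]: r for r in step2}
```

Running the second hot deck on the output of the first is what lets establishments with an imputed waiting period become recipients for the length of that period.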
Seventeenth group of variables
Pq 18a |
Could the plan have refused to cover persons with certain pre-existing conditions? |
Pq 18b |
Did this happen in 1999? |
Pq 19 |
Did the plan have a policy requiring a waiting
period before covering a pre-existing condition? |
Sort variables
Census division, state, Firm Size Class 2, and establishment employment
Class variables
Type of provider, Pq 3
Process
The process is completed in three steps. First, all plans that were
missing an answer about whether a person could have been refused
coverage for a pre-existing condition are recipients. Donors have a
reported value for the question. Imputation of a value is direct
placement of the donor value into the recipient value.
Plans that could deny coverage due to pre-existing conditions, but
did not indicate if this had happened, were recipients for the second
value. Donor plans reported if someone had been denied coverage for a
condition. Again, imputation was to directly copy the donor value to the
recipient.
The third imputation, for the question about a waiting period before
covering a pre-existing condition, was similar to the first two. Donor
plans had a reported value for the question; recipient plans did not.
Imputation was by direct transfer of the donor value to the recipient.
Government Imputation Process
The imputation process for sampled governments
imputes the same data items as for private sector establishments. The
process is similar to that of the private sector; the only differences
are the sort and class variables used. For government case imputation, the
same sort variables are used for all data groups. These are, in sort
order, region, state, and government employment size. For government
cases, there are no class variables that describe the government. The class
variables used are only the specialized plan variables that apply
to that particular imputation group. For instance, for Group 11
in the private sector, Firm Size Class 2 and industry division class
are class variables. For government case imputation, both are
dropped because they would describe the government, and this type of
variable is not used. For the various imputations in Group 12, the
variables type of provider, whether the plan had a deductible, and
whether the family deductible was a multiple of the single deductible
are class variables. These three are kept for government imputation because
they are characteristics of the plan.
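The rule for government cases can be written as a small configuration sketch. The variable names are hypothetical, and the lists include only the class variables named in this passage for Groups 11 and 12; they are not a full specification of either group.

```python
# Sort variables used for every government imputation group, in sort order.
GOVERNMENT_SORT_VARS = ["region", "state", "government_employment_size"]

# Private sector class variables named in the text (hypothetical identifiers).
PRIVATE_CLASS_VARS = {
    11: ["firm_size_class_2", "industry_division_class"],
    12: ["type_of_provider", "has_deductible", "family_deductible_is_multiple"],
}

# Variables that would describe the employer rather than the plan.
EMPLOYER_LEVEL_VARS = {"firm_size_class_2", "industry_division_class"}


def government_class_vars(group: int) -> list:
    """Keep plan-level class variables; drop those describing the employer."""
    return [v for v in PRIVATE_CLASS_VARS[group] if v not in EMPLOYER_LEVEL_VARS]


group_11 = government_class_vars(11)
group_12 = government_class_vars(12)
```

Under these assumptions, Group 11 has no class variables left for government cases, while Group 12 keeps all three plan-level variables.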
References
Kalton, G and Kasprzyk, D. (1986). The treatment of missing survey
data. Survey Methodology,
Statistics Canada, Vol. 12, No 1, pp 1–16.
Sommers, JP. List sample design of the 1996 Medical Expenditure Panel Survey
Insurance Component. Rockville, MD: Agency for Health Care Policy and Research 1999:
MEPS Methodology Report No. 6. AHCPR Pub. No. 99-037.
Sommers, JP. Imputation of employer information for the 1996 Medical Expenditure
Panel Survey Insurance Component. Rockville, MD: Agency for Healthcare Research
and Quality: 2000. MEPS Methodology Report No. 10. AHRQ Pub. No. 00-0039.
Sommers, JP. "Methods to Produce Establishment and
Firm Level Estimates for an Economic Survey." Presented at the
International Conference on Establishment Surveys, June, 2000, Buffalo,
NY.
Stiller, J and Dalzell, D. (1997) Hot-deck imputation with SAS arrays
and macros for large surveys. Proceedings of the 10th Annual NESUG Conference,
North East SAS Users Group, pp 709–714.
Appendix: Definitions of Selected Variables
Firm Size Class 1
1 if enterprise employment = 0-5 |
2 if enterprise employment = 6-24 |
3 if enterprise employment = 25-99 |
4 if enterprise employment = 100-999 |
5 if enterprise employment = 1000 or more |
Firm Size Class 2
1 if enterprise employment = 0-249 |
2 if enterprise employment = 250 or more |
Firm Size Class 3
1 if enterprise employment = 0-4999 |
2 if enterprise employment = 5000 or more |
Establishment Size Class
1 if establishment employment = 0-10 |
2 if establishment employment = 11 or more |
Industry Division
agriculture if two-digit SIC = 01-09 |
construction if two-digit SIC = 15-17 |
retail trade if two-digit SIC = 52-59 |
mining if two-digit SIC = 10-14 |
finance, insurance, and real estate if two-digit SIC = 60-67 |
wholesale trade if two-digit SIC = 50-51 |
manufacturing if two-digit SIC = 20-39 |
transportation, communication, and utilities if two-digit SIC = 40-49 |
services if two-digit SIC = 70-89 |
Industry Division Group
1 if industry division = agriculture, construction
or retail trade |
2 if industry division = manufacturing, transportation,
communication, utilities or services |
3 if industry division = mining, finance, insurance,
real estate or wholesale trade |
SIC2
The first two digits of the establishment’s six-digit
Standard Industrial Classification (SIC) number. |
Firm Age Group
1 if age = 0-16 |
2 if age = 17 years or more |
Firm Age Group 2
1 if age = 0-4 |
2 if age = 5-9 |
3 if age = 10-14 |
4 if age = 15-19 |
5 if age = 20 years or more |
Census Division
New England if State = ME, NH, VT, MA, CT, RI |
Mid-Atlantic if State = NY, NJ, PA |
East North Central if State = OH, IN, IL, MI, WI |
West North Central if State = MN, IA, MO, ND, SD, NE, KS |
South Atlantic if State = DE, MD, DC, VA, WV, NC, SC, GA, FL |
East South Central if State = KY, TN, AL, MS |
West South Central if State = AR, LA, OK, TX |
Mountain if State = MT, ID, WY, CO, NM, AZ, UT, NV |
Pacific if State = WA, OR, CA, AK, HI |
SIC 2
The first two digits of the six-digit SIC code. |
Suggested Citation:
Sommers, J. P. Additional Imputations of Employer Information for the Insurance
Component of the Medical Expenditure Panel Survey since 1996.
Methodology Report No. 17. January 2007. Agency
for Healthcare Research and Quality, Rockville, Md. http://www.meps.ahrq.gov/mepsweb/data_files/publications/mr17/mr17.shtml