Methodology Report #20:
Class Variables for MEPS Expenditure Imputations
Marc W. Zodet, Agency for Healthcare Research and Quality; Diana Z. Wobus, Westat; Steven R. Machlin and
David Kashihara, Agency for Healthcare Research and Quality; and Deborah D. Dougherty, Westat.
Table of Contents
Abstract
The Medical Expenditure Panel Survey (MEPS)
Introduction
Background
Methodology
Examples
Table 1. Mean total expenditures for physician office visits and inpatient hospital stays, by year (standard error)
Table 2. P-values (Wald F Statistics) from weighted regression models, by year (SUDAAN)
Table 3. Order of entry into weighted regression models, by year (STEPWISE procedure)
Table 4. Coefficients for select variables from weighted regression models, by year (SUDAAN)
Table 5. Final class variable list for imputing physician office visit expenditures
Table 6. P-values (Wald F Statistics) from weighted regression models, by year (SUDAAN)
Table 7. Order of entry into weighted regression models, by year (STEPWISE procedure)
Table 8. Coefficients for select variables; weighted regression models, by year (SUDAAN)
Table 9. Final class variable list for imputing inpatient hospital expenditures
Table 10. Coefficients for reason in hospital; weighted regression models, by year (SUDAAN)
Summary
References
Abstract
The Medical Expenditure Panel Survey (MEPS) collects
data on health care utilization, expenditures, sources of payment,
insurance coverage, and health care quality measures. The survey was
designed to produce national and regional estimates for the U.S.
civilian noninstitutionalized population. The data on medical expenses
are collected from both household respondents in the Household Component
and from a sample of their health care providers in the Medical Provider
Component. In the absence of payment information from either component,
expenditure data are derived for sample persons through an imputation
process. Missing expense data are imputed at the event level for each
medical event type using a weighted hot-deck procedure. This process
utilizes individual- and event-level data collected in MEPS that are
correlated with medical expenditures. Bivariate analyses and linear
regression models were utilized to assess the current class variables
used for imputation. This paper details the methodology used to select,
prioritize, and categorize the class variables used to impute missing
expenditures for two event types: doctor visits and inpatient
hospitalizations.
The estimates in this report are based on the most
recent data available at the time the report was written. However,
selected elements of MEPS data may be revised on the basis of additional
analyses, which could result in slightly different estimates from those
shown here. Please check the MEPS Web site for the most current file
releases.
Center for Financing, Access, and Cost Trends
Agency for Healthcare Research and Quality
540 Gaither Road
Rockville, MD 20850
http://www.meps.ahrq.gov/
The Medical Expenditure Panel Survey (MEPS)
Background
The Medical Expenditure Panel Survey (MEPS) is conducted
to provide nationally representative estimates of health care use,
expenditures, sources of payment, and insurance coverage for the U.S.
civilian noninstitutionalized population. MEPS is cosponsored by the
Agency for Healthcare Research and Quality (AHRQ), formerly the Agency
for Health Care Policy and Research, and the National Center for Health
Statistics (NCHS).
MEPS comprises three component surveys: the Household
Component (HC), the Medical Provider Component (MPC), and the Insurance
Component (IC). The HC is the core survey, and it forms the basis for
the MPC sample and part of the IC sample. Together these surveys yield
comprehensive data that provide national estimates of the level and
distribution of health care use and expenditures, support health
services research, and can be used to assess health care policy
implications.
MEPS is the third in a series of national probability
surveys conducted by AHRQ on the financing and use of medical care in
the United States. The National Medical Care Expenditure Survey (NMCES)
was conducted in 1977, the National Medical Expenditure Survey (NMES) in
1987. Beginning in 1996, MEPS continues this series with design
enhancements and efficiencies that provide a more current data resource
to capture the changing dynamics of the health care delivery and
insurance system.
The design efficiencies incorporated into MEPS are in
accordance with the Department of Health and Human Services (DHHS)
Survey Integration Plan of June 1995, which focused on consolidating
DHHS surveys, achieving cost efficiencies, reducing respondent burden,
and enhancing analytical capacities. To accommodate these goals, new
MEPS design features include linkage with the National Health Interview
Survey (NHIS), from which the sample for the MEPS-HC is drawn, and
enhanced longitudinal data collection for core survey components. The
MEPS-HC augments NHIS by selecting a sample of NHIS respondents,
collecting additional data on their health care expenditures, and
linking these data with additional information collected from the
respondents’ medical providers, employers, and insurance providers.
Household Component
The MEPS-HC, a nationally representative survey of the
U.S. civilian noninstitutionalized population, collects medical
expenditure data at both the person and household levels. The HC
collects detailed data on demographic characteristics, health
conditions, health status, use of medical care services, charges and
payments, access to care, satisfaction with care, health insurance
coverage, income, and employment.
The HC uses an overlapping panel design in which data
are collected through a preliminary contact followed by a series of five
rounds of interviews over a two-and-a-half-year period. Using
computer-assisted personal interviewing (CAPI) technology, data on
medical expenditures and use for two calendar years are collected from
each household. This series of data collection rounds is launched each
subsequent year on a new sample of households to provide overlapping
panels of survey data and, when combined with other ongoing panels, will
provide continuous and current estimates of health care expenditures.
The sampling frame for the MEPS-HC is drawn from
respondents to NHIS, conducted by NCHS. NHIS provides a nationally
representative sample of the U.S. civilian noninstitutionalized
population, with oversampling of Hispanics and blacks.
Medical Provider Component
The MEPS-MPC supplements and validates information on
medical care events reported in the MEPS-HC by contacting medical
providers and pharmacies identified by household respondents. The MPC
sample includes all hospitals, hospital physicians, home health
agencies, and pharmacies reported in the HC. Also included in the MPC
are all office-based physicians:
- Providing care for HC respondents receiving Medicaid.
- Associated with a 75 percent sample of households receiving care through an HMO (health maintenance organization) or managed care plan.
- Associated with a 25 percent sample of the remaining households.
Data are collected on medical and financial characteristics of medical and pharmacy events reported by HC respondents, including:
- Diagnoses coded according to ICD-9 (International Classification of Diseases, 9th Revision) and DSM-IV (Diagnostic and Statistical Manual of Mental Disorders, Fourth Edition).
- Physician procedure codes classified by CPT-4 (Current Procedural
Terminology, Version 4).
- Inpatient stay codes classified by DRG (diagnosis related group).
- Prescriptions coded by national drug code (NDC), medication names,
strength, and quantity dispensed.
- Charges, payments, and the reasons for any difference between charges
and payments.
The MPC is conducted through telephone interviews and mailed survey materials.
Insurance Component
The MEPS-IC collects data on health insurance plans
obtained through private and public sector employers. Data obtained in
the IC include the number and types of private insurance plans offered,
benefits associated with these plans, premiums, contributions by
employers and employees, and employer characteristics.
Establishments participating in the MEPS-IC are selected through three sampling
frames:
- A list of employers or other insurance providers identified by MEPS-HC respondents who report having private health insurance at the Round 1 interview.
- A Bureau of the Census list frame of private-sector business establishments.
- The Census of Governments from the Bureau of the Census.
To provide an integrated picture of health insurance, data collected from the
first sampling frame (employers and other insurance providers) are linked back
to data provided by the MEPS-HC respondents. Data from the other two sampling
frames are collected to provide annual national and State estimates of the supply
of private health insurance available to American workers and to evaluate policy
issues pertaining to health insurance. Since 2000, the Bureau of Economic Analysis
has used national estimates of employer contributions to group health insurance
from the MEPS-IC in the computation of Gross Domestic Product (GDP).
The MEPS-IC is an annual panel survey. Data are collected from the selected
organizations through a prescreening telephone interview, a mailed questionnaire,
and a telephone follow-up for nonrespondents.
Survey Management
MEPS data are collected under the authority of the
Public Health Service Act. They are edited and published in accordance
with the confidentiality provisions of this act and the Privacy Act.
NCHS provides consultation and technical assistance.
As soon as data collection and editing are completed,
the MEPS survey data are released to the public in staged releases of
summary reports and microdata files. Summary reports are released as
printed documents and electronic files. Microdata files are released on
CD-ROM and/or as electronic files.
Printed documents and CD-ROMs are available through the AHRQ Publications
Clearinghouse. Write or call:
AHRQ Publications Clearinghouse
Attn: (publication number)
P.O. Box 8547 Silver Spring, MD 20907
800-358-9295
703-437-2078 (callers outside the United States only)
888-586-6340 (toll-free TDD service; hearing impaired only)
To order online, send an e-mail to: ahrqpubs@ahrq.gov.
Be sure to specify the AHRQ number of the document or CD-ROM you are requesting.
Selected electronic files are available through the Internet on the MEPS
Web site: http://www.meps.ahrq.gov/
For more information, visit the MEPS Web site or e-mail mepspd@ahrq.gov.
Introduction
The Medical Expenditure Panel Survey (MEPS) collects
data on health care utilization, expenditures, sources of payment,
insurance coverage, and health care quality measures. The survey,
conducted annually since 1996 by the Agency for Healthcare Research and
Quality (AHRQ), is designed to produce national and regional estimates
for the U.S. civilian noninstitutionalized population.
MEPS data on medical expenses are collected from both
household respondents in the Household Component and from a sample of
their health care providers in the Medical Provider Component. When
payment (i.e., expenditure) information is missing from either
component, these data are derived through an imputation process. Expense
data are collected at the event level for each medical event type, and a
weighted hot-deck procedure is used to impute missing values. This process
utilizes individual- and event-level data collected in MEPS that are
correlated with medical expenditures. AHRQ uses bivariate analyses and
linear regression models to assess potential variables to use in
imputation.
Using office-based visits and inpatient stays as
examples, this paper details the methodology used to select, prioritize,
and categorize the class variables used to impute missing expenditure
data. The paper does not address the specifics of how the imputations
are actually carried out. For a more detailed description of the
imputation procedure, see Machlin and Dougherty, 2004.
Background
Class variables
A key component of a hot-deck procedure is the matching
of sample observations with missing information (i.e., recipients) to
similar sample observations not missing the information (i.e., donors).
Categorical or "class" variables that characterize the sample
observations are used to classify both recipients and donors into
imputation cells (i.e., classes). Within each imputation cell, the
recipients’ missing values are imputed from the values of the donors.
Variables that are considered important predictors of the data to be
imputed are the primary candidates for use as class variables. The
underlying assumption is that the recipients have similar values with
regard to the measure of interest as the donors and that the data
associated with the donors within the same imputation cell are
appropriate for the imputation of the missing values (Cox, 1980).
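The following Python sketch shows how recipients are matched to donors within imputation cells. It is a simplified weighted random hot deck: within each cell, a donor is drawn with probability proportional to its sampling weight. It is not the weighted sequential hot-deck algorithm used in MEPS. The record layout, the field names (expenditure, weight), and the data values are hypothetical.

```python
import random
from collections import defaultdict


def weighted_hot_deck(records, class_vars, target="expenditure", weight="weight", seed=0):
    """Fill missing target values from donors in the same imputation cell.

    Simplified weighted random hot deck (an assumption for illustration):
    each donor in a cell is drawn with probability proportional to its
    sampling weight. This is NOT the weighted sequential hot-deck algorithm
    (Cox, 1980) used in MEPS. Records are dicts; None marks a missing value.
    Returns new dicts and leaves the input records unchanged.
    """
    rng = random.Random(seed)

    # Group donors (records with an observed target) by imputation cell.
    donors = defaultdict(list)
    for rec in records:
        if rec[target] is not None:
            donors[tuple(rec[v] for v in class_vars)].append(rec)

    imputed = []
    for rec in records:
        new = dict(rec)
        if rec[target] is None:
            pool = donors.get(tuple(rec[v] for v in class_vars), [])
            if pool:  # a cell without donors stays missing (it would need collapsing)
                donor = rng.choices(pool, weights=[d[weight] for d in pool], k=1)[0]
                new[target] = donor[target]
        imputed.append(new)
    return imputed


# Hypothetical office-visit events; None marks a recipient.
events = [
    {"surgery": "yes", "radiology": "no", "expenditure": 310.0, "weight": 1200.0},
    {"surgery": "yes", "radiology": "no", "expenditure": 280.0, "weight": 800.0},
    {"surgery": "no", "radiology": "no", "expenditure": 75.0, "weight": 1500.0},
    {"surgery": "yes", "radiology": "no", "expenditure": None, "weight": 950.0},
    {"surgery": "no", "radiology": "no", "expenditure": None, "weight": 1100.0},
]
completed = weighted_hot_deck(events, ["surgery", "radiology"])
```

Each recipient receives a value from a donor in its own cell. The fourth event draws from the two surgery donors. The fifth event can only receive the single non-surgery donor's value.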
Class variables are typically ordered in accordance with
predictive importance (i.e., more important predictors are ranked
higher). If there are fewer donors than recipients in a cell, then the
procedure will begin collapsing over the categories of the class
variables, starting at the bottom of the list and working up, until a
sufficient number of donors is available.
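The bottom-up collapsing rule can be sketched in Python as follows. This version makes two simplifying assumptions. First, it drops an entire class variable at each step instead of first merging categories within it. Second, it applies each drop to every cell at once. The helper names and the data are hypothetical.

```python
from collections import Counter


def donors_suffice(records, class_vars, target="expenditure"):
    """True if every cell that holds recipients has at least as many donors."""
    donors, recipients = Counter(), Counter()
    for rec in records:
        cell = tuple(rec[v] for v in class_vars)
        if rec[target] is None:
            recipients[cell] += 1
        else:
            donors[cell] += 1
    return all(donors[cell] >= n for cell, n in recipients.items())


def collapse_class_list(records, ranked_vars, target="expenditure"):
    """Drop class variables from the bottom of the priority list until donors suffice.

    Simplifications (assumptions, not the MEPS rule): a whole variable is
    dropped at each step, and the drop applies to all cells rather than only
    to the cells that are short of donors.
    """
    kept = list(ranked_vars)  # most important predictor first
    while kept and not donors_suffice(records, kept, target):
        kept.pop()  # remove the lowest-ranked remaining variable
    return kept


# Hypothetical events: two recipients but only one donor in the
# fully specified (surgery=yes, radiology=no, lab=no) cell.
events = [
    {"surgery": "yes", "radiology": "no", "lab": "yes", "expenditure": 300.0},
    {"surgery": "yes", "radiology": "no", "lab": "no", "expenditure": 250.0},
    {"surgery": "yes", "radiology": "no", "lab": "no", "expenditure": None},
    {"surgery": "yes", "radiology": "no", "lab": "no", "expenditure": None},
]
print(collapse_class_list(events, ["surgery", "radiology", "lab"]))  # ['surgery', 'radiology']
```

Dropping the lowest-ranked variable (lab) merges the two surgery/no-radiology cells. That gives two donors for the two recipients, while the higher-ranked predictors are preserved.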
MEPS event types
MEPS expenditure data are imputed separately for each of
10 event types: hospital inpatient stays, hospital outpatient department
visits, emergency room visits, office-based visits (physician and
non-physician), home health (agency and paid independent), dental, other
medical equipment/supplies, and prescription medications. Separate
imputations are conducted for each event type because the relevant
variables and statistically significant correlates are not consistent
across the event types. Therefore, for each event type, the class
variables are evaluated and chosen separately, but some of the same
class variables are used across different event types. For example, the
class variables for the imputations of both emergency room expenditures
and dental expenditures include patient age. While the same class
variable may be used across multiple event types, the categories
defined for that variable may differ across the individual
imputations. The remainder of this paper discusses the
process by which variables are evaluated and selected for use in the
creation of imputation cells.
Methodology
The lists of class variables used to impute
event-specific expenditures were initially established based on the
first year of MEPS data (1996). The process of identifying predictors of
total expenditures was based both on substantive decisions and
statistical associations that were identified primarily through multiple
linear regression models. In 2002, analysts from AHRQ and Westat, the
data collection contractor, jointly began to reevaluate and revise these
lists of class variables. The methods presented in this section and the
Examples section below are reflective of those efforts and focus
primarily on the quantitative methods used in the decision process.
Data
Event-level data are used for these analyses. Only
events that were potential donors (i.e., complete on the Household
Component and/or the Medical Provider Component) were used in the
analyses. Multiple years of data were examined: 1997, 1998, and 1999.
For the most part, each year of data was examined separately. However,
when the numbers of events were small (e.g., home health services),
years of data were pooled to stabilize the variance of the estimates.
Potential class variables
The class variables considered for the imputation were
those collected in MEPS that were thought a priori to potentially have a
significant impact on total expenditures. Two variables were considered
important enough to be included in all imputation procedures: type of
insurance coverage and total charges. The former was chosen because the
payment for health care services can vary widely by insurance status and
type of insurance coverage (e.g., private, Medicare, Medicaid, etc.);
the latter because total charges are highly correlated with total
expenditures. Unfortunately, when expenditures are missing, total charges
are also frequently missing.
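The footnotes to tables 2 and 6 list "Decile of Total Charges" among the variables forced into the regression models. The Python sketch below shows one way to turn total charges into decile groups. It assumes unweighted sample deciles computed with the standard-library statistics.quantiles function, which may not match how deciles were defined for MEPS (for example, with survey weights). The charge values are hypothetical.

```python
import bisect
import statistics


def charge_deciles(charges):
    """Assign each total-charge value to a decile group numbered 1 to 10.

    Assumption: cut points are unweighted sample deciles from
    statistics.quantiles (default "exclusive" method); MEPS may define
    deciles differently (for example, using survey weights).
    """
    cuts = statistics.quantiles(charges, n=10)  # nine cut points
    # bisect_right counts the cut points <= value (0..9); shift to 1..10.
    return [bisect.bisect_right(cuts, c) + 1 for c in charges]


deciles = charge_deciles([float(x) for x in range(1, 101)])  # hypothetical charges
```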
Other potential predictors of expenditures were selected
quantitatively. These included various indicators of health care
services (e.g., laboratory tests, radiology, surgeries/extractions,
etc.). Predictors can be specific to the type of event. For example, the
number of nights is associated with inpatient hospital stays, but is not
relevant to physician office visits.
Regression models
Multiple linear regression was used to evaluate the
statistical associations between potential class variables and total
expenditures. The dependent variable in each model was total
expenditures for the event. Total expenditures were defined as the sum
of direct payments for care provided during the year, including
out-of-pocket payments, third-party payments (e.g., private insurance,
Medicare, and Medicaid), and payments from other miscellaneous sources.
Two approaches were taken when fitting
the regression models to assess the association between potential
class variables and
total expenditures. First, to adjust for the complex design of MEPS,
linear regression models were fit using PROC REGRESS in the SUDAAN
statistical software package (http://www.rti.org/sudaan). With these
models, the two primary considerations were 1) whether or not the resulting
regression coefficients were significant and 2) the relative magnitude
and direction of the significant coefficients. Statistical significance
was determined at the α=0.05 level. To provide additional guidance
in the selection of variables, models were fit using SAS PROC STEPWISE
(http://www.sas.com/). The significance
level for entry and retention was 0.15 (the SAS default). Block entry
grouping of variables was used to ensure that all levels of a particular
variable were entered, retained, or eliminated as a group.
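Block entry is easiest to see in how a multi-level class variable becomes a group of regression columns. The Python sketch below dummy-codes one categorical variable into indicator columns, and a block-entry stepwise procedure would treat those columns as a single unit. It does not reproduce the SAS or SUDAAN procedures. The function name is hypothetical, and so is the choice of 25-64 as the reference age group.

```python
def indicator_block(name, values, levels, reference):
    """Expand one categorical variable into a block of 0/1 indicator columns.

    One column per non-reference level; the reference level is coded as all
    zeros. Under block entry these columns are entered, retained, or removed
    together, so a joint test of the block covers len(levels) - 1 coefficients.
    """
    if reference not in levels:
        raise ValueError(f"reference level {reference!r} is not in levels")
    unexpected = set(values) - set(levels)
    if unexpected:
        raise ValueError(f"unexpected levels: {sorted(unexpected)}")
    return {
        f"{name}={level}": [int(v == level) for v in values]
        for level in levels
        if level != reference
    }


# Age categories as listed in table 2; the 25-64 reference group is an assumption.
AGE_LEVELS = ["<18", "18-24", "25-64", "65+"]
age_block = indicator_block("age", ["<18", "65+", "25-64", "18-24"], AGE_LEVELS, "25-64")
for column, indicators in age_block.items():
    print(column, indicators)
```

Keeping all three age columns together means the procedure can never retain, say, only the 65+ indicator while discarding the others.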
Results from both sets of models (i.e., those fit using
SUDAAN and those fit using SAS) were considered when selecting the final
list of class variables to be used in the imputation procedures. Model
results were also used to prioritize the class variables, which were
ranked with the most important substantive and statistical predictors
placed higher on the list, and to determine the collapsing strategies
for variables with three or more levels. When it became necessary to
collapse over imputation cells due to insufficient availability of
donors, the variables higher on the list were preserved so that
recipients and donors remained matched on the most important predictors
of total expenditures.
Examples
As noted previously, the process for identifying class
variables was performed separately for each type of event. Examples of
how this process works for physician office visits and inpatient
hospital stays are presented below. To provide a point of reference for
the magnitude of total expenses attributed to each of these two types of
medical events, table 1 presents mean total expenditures per event for
1997 through 1999 for events with complete (i.e., not imputed) data. In
2001, approximately one-third of the expenditure values were fully
imputed for physician office visits and hospital inpatient stays.
Table 1. Mean total expenditures for physician office visits and inpatient hospital stays, by year (standard error)
| | 1997 | 1998 | 1999 |
| Physician office visits¹ | $92 ($3) | $98 ($3) | $107 ($3) |
| Hospital inpatient stays¹,² | $5,647 ($301) | $5,375 ($304) | $5,929 ($367) |
¹ Estimates are for patients with complete event data (i.e., donors).
² Only events of patients who did not die during the year.
During the late 1990s, total expenditures for a
physician office visit averaged roughly $100 per event, while facility
expenditures for an inpatient hospital stay during this same period
averaged between roughly $5,400 and $5,900 per event, depending on the year.
Physician office visits
Table 2 summarizes p-values for regression model
coefficients fit using SUDAAN (i.e., adjusted for the complex survey
design). Separate models were fit for the years 1997, 1998, and 1999
with physician office visit expenditures as the dependent variable in
each model. Independent variables in the models were those hypothesized
as potentially significant predictors of office visit expenditures and
were the candidate variables from which to select the class variables to
create the imputation cells.
The information provided in table 2 shows that surgery,
radiology, other services, and laboratory services were all
statistically significant predictors of physician office visit
expenditures across all three years (p-values < 0.01). Other variables
were statistically significant predictors in some years, but not others.
For example, patient age was highly significant (p-value < 0.01) in
1999, but not in the two preceding years.
Table 2. P-values (Wald F Statistics) from weighted regression models, by year (SUDAAN)
Dependent variable = physician office visit expenditures
| | 1997 | 1998 | 1999 |
| # Obs used in regression | 48,815 | 34,948 | 31,978 |
| R² | 0.043 | 0.048 | 0.032 |
| Class variable¹ | | | |
| Surgery (yes; no) | <0.01 | <0.01 | <0.01 |
| Radiology (yes; no) | <0.01 | <0.01 | <0.01 |
| Other services (yes; no) | <0.01 | <0.01 | <0.01 |
| Laboratory services (yes; no) | <0.01 | <0.01 | <0.01 |
| Saw non-MD (yes; no) | | <0.10 | <0.10 |
| Age (<18; 18-24; 25-64; 65+) | | | <0.01 |
| Perceived health (poor; other) | | <0.10 | <0.05 |
| Race/ethnicity (Hispanic; other) | | | |
| Census region (S; MW; NE; W) | | | |
| MSA (MSA; non-MSA) | <0.05 | <0.10 | |
¹ Variables forced into the models are not shown (e.g., Insurance Source of Payment [Private; Medicare; Medicaid; CHAMPUS/TRICARE], Decile of Total Charges, and HMO Indicator [Yes; No]).
Results from fitting the STEPWISE models for each year
are presented in table 3, which shows the order in which the independent
variables entered into the models. Surgery, radiology, and other
services were consistently the first, second, and third variables
entered into the model each year. Perceived health and laboratory
services alternated as the fourth and fifth variables, depending on the
year.
Table 3. Order of entry into weighted regression models, by year (STEPWISE procedure)
Dependent variable = physician office visit expenditures
| | 1997 | 1998 | 1999 |
| # Obs used in regression | 48,815 | 34,948 | 31,978 |
| R² | 0.042 | 0.048 | 0.032 |
| Variable entry order | | | |
| 1st | Surgery | Surgery | Surgery |
| 2nd | Radiology | Radiology | Radiology |
| 3rd | Other services | Other services | Other services |
| 4th | Perceived health | Lab services | Perceived health |
| 5th | Lab services | Perceived health | Lab services |
| 6th | Saw non-MD | Age | Age |
| 7th | Region | Saw non-MD | Region |
| 8th | Region | Region | Saw non-MD |
Table 4 presents the SUDAAN regression coefficients for
selected variables used in the model. This table illustrates that
surgery was consistently associated with higher physician office visit
expenditures. For the years observed (i.e., 1997–1999), the average
additional expenditure associated with having a surgical procedure
during a physician office visit was approximately $200, when controlling
for the other variables in the model. These additional expenditures were
substantially greater than what is observed for the other factors being
considered. For example, the difference in mean expenditures per event
associated with surgery compared to radiology (the second strongest
effect) ranged from approximately $115 in 1999 ($196–$81) to
approximately $136 in 1997 ($205–$69).
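The differences quoted above can be reproduced from the point estimates in table 4. This short Python check also gives the 1998 value ($119), which the text does not quote.

```python
# Surgery and radiology coefficients (dollars) for physician office visits, from table 4.
SURGERY = {1997: 205, 1998: 198, 1999: 196}
RADIOLOGY = {1997: 69, 1998: 79, 1999: 81}

# Additional expenditure associated with surgery relative to radiology, by year.
surgery_minus_radiology = {year: SURGERY[year] - RADIOLOGY[year] for year in SURGERY}
for year, diff in surgery_minus_radiology.items():
    print(f"{year}: ${diff}")
```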
Table 4. Coefficients for select variables from weighted regression models, by year (SUDAAN)
Dependent variable = physician office visit expenditures
| | β-Coefficients (SE β-Coefficients) | | |
| Class variable | 1997 | 1998 | 1999 |
| Surgery | $205 ($25) | $198 ($28) | $196 ($28) |
| Radiology | $69 ($5) | $79 ($7) | $81 ($9) |
| Other services | $53 ($9) | $44 ($10) | $58 ($8) |
| Lab services | $21 ($4) | $24 ($6) | $20 ($6) |
| Perceived health | $40 ($25) | $30 ($17) | $34 ($14) |
| Saw non-MD | -$8 ($6) | -$14 ($8) | -$11 ($7) |
Among the four most highly significant variables (i.e.,
surgery, radiology, other services, and laboratory services), the
magnitudes of the coefficients (i.e., the average additional
expenditures associated with a particular service) tended to diminish
in accordance with the entry order of the variables into the STEPWISE
models. However, while the expenses associated with surgery were
consistently higher than those of any of the other factors considered,
the magnitude of the differences among the other services (i.e.,
radiology, other, and lab) varied from year to year. For example, a
simple comparison of the mean office visit expenditures associated with
radiology and with other services showed no significant difference in
1997, but there was a significant difference in 1998, with payments for
office visits involving a radiology service running about $35 more per
visit than those involving other services ($79 versus $44). In
summary, of the factors considered, surgery clearly had the greatest
impact on increasing physician office visit expenditures.
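The report does not say how this comparison was made. The Python sketch below is one possible version: an approximate z test applied to the table 4 estimates, which assumes the two coefficients are uncorrelated. Under that assumption, the 1997 difference ($16) is not significant at the 0.05 level and the 1998 difference ($35) is, which matches the text. A design-based test would use the estimated covariance from the SUDAAN model instead.

```python
import math
from statistics import NormalDist

# (coefficient, standard error) in dollars, from table 4.
RADIOLOGY = {1997: (69, 5), 1998: (79, 7), 1999: (81, 9)}
OTHER_SERVICES = {1997: (53, 9), 1998: (44, 10), 1999: (58, 8)}


def approx_difference_test(a, b):
    """Approximate two-sided z test of H0: beta_a == beta_b.

    Assumption: the covariance between the two estimates is taken as zero,
    so SE(diff) = sqrt(se_a**2 + se_b**2). A design-based test would use the
    estimated covariance from the survey regression instead.
    """
    (beta_a, se_a), (beta_b, se_b) = a, b
    diff = beta_a - beta_b
    z = diff / math.sqrt(se_a**2 + se_b**2)
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return diff, z, p


results = {
    year: approx_difference_test(RADIOLOGY[year], OTHER_SERVICES[year])
    for year in RADIOLOGY
}
for year, (diff, z, p) in results.items():
    print(f"{year}: difference ${diff}, z = {z:.2f}, p = {p:.3f}")
```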
The final list of class variables used to impute
physician office visit expenditures is presented in table 5. The top
three variables were chosen based upon substantive reasoning: HMO (an
indicator of whether or not the patient was enrolled in an HMO), type of
insurance coverage, and total charges. The remainder were chosen based
upon the regression results. Surgery, radiology, and other services
followed in that order primarily because they were each highly
significant in each of the SUDAAN models across all three years and
because they were consistently the first three variables entered into
the STEPWISE models in all three years. The laboratory services variable
was placed above the perceived health variable because it was more
highly significant in each of the SUDAAN models and because it entered
into the STEPWISE models before the perceived health variable for two of
the three years. In turn, the perceived health variable was more
statistically significant in the SUDAAN models than the saw non-MD
variable. It also entered into each of the STEPWISE models before saw
non-MD and was therefore higher on the list. Despite being statistically
significant in at least one of the years examined, neither age nor MSA
(metropolitan statistical area) was included on the final list of class
variables. Age and MSA were dropped because age was significant in only
one year (p-value < 0.01) and MSA was never retained in any of the
STEPWISE models.
Table 5. Final class variable list for imputing physician office visit expenditures
1. HMO
2. Type of Insurance Coverage
3. Total charges
4. Surgery
5. Radiology
6. Other services
7. Laboratory services
8. Perceived health
9. Saw non-MD
Hospital inpatient stays
Table 6 shows that, of the variables considered, length of stay and
reason in hospital were the only statistically significant predictors of
inpatient hospital stay expenditures in the SUDAAN models (p-values < 0.01).
These results were consistent across each of the three years. Results
from the STEPWISE models confirmed the importance of both length of stay
(LOS) and reason in hospital, as these variables were consistently the
first and second variables, respectively, added to each of the models
(table 7).
Table 6. P-values (Wald F Statistics) from weighted regression models, by year (SUDAAN)
Dependent variable = inpatient hospital stay expenditures
| | 1997 | 1998 | 1999 |
| # Obs used in regression | 1,881 | 1,294 | 1,259 |
| R² | 0.40 | 0.36 | 0.44 |
| Class variable¹ | | | |
| ER before admission (yes; no) | | | |
| HMO (yes; no) | | | |
| Length of stay (0, 1, 2,…6, 7, 8-13, 14-30, 31-60, 61+) | <0.01 | <0.01 | <0.01 |
| Reason in hospital (surgery; treatment/therapy; diagnostic tests; give birth; to be born; other) | <0.01 | <0.01 | <0.01 |
| Census region (N; MW; S; W) | | | |
| MSA (MSA; non-MSA) | | | |
¹ Variables forced into the models are not shown (e.g., Insurance Source of Payment [Private; Medicare; Medicaid; CHAMPUS/TRICARE] and Decile of Total Charges).
Table 7. Order of entry into weighted regression models, by year (STEPWISE procedure)
Dependent variable = inpatient hospital stay expenditures
| | 1997 | 1998 | 1999 |
| # Obs used in regression | 1,881 | 1,294 | 1,259 |
| R² | 0.32 | 0.31 | 0.31 |
| Variable entry order | | | |
| 1st | LOS | LOS | LOS |
| 2nd | Reason | Reason | Reason |
| 3rd | ER before | Region | Region |
| 4th | Region | | HMO |
The coefficients for length of stay and reason in
hospital that resulted from SUDAAN are presented in table 8. For the
most part, mean expenditures per stay increased as the length of stay
increased. There was some erratic behavior of the coefficients for the
longest lengths of stay (e.g., sharp drops in average expenditures
for stays of more than 60 days in 1997 and 1998). While this may
have been due to the influence of outliers in the 31–60 day category
and/or may suggest that some other functional form of the variable was
more appropriate, it had no impact on our decision to include length of
stay as a high-priority variable. Of the reasons for hospitalization,
surgery was associated with the highest inpatient expenditures. The
coefficients indicated that surgery was associated with inpatient
expenditures at least $3,000 higher than stays for treatment/therapy,
diagnostic tests, giving birth, or other reasons.
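The irregular length-of-stay coefficients can be located with a simple check. For each year, the Python sketch below lists the length-of-stay categories whose table 8 coefficient is lower than the coefficient for the preceding, shorter category. Besides the drops for stays of more than 60 days in 1997 and 1998, it flags a small dip at 7 days in 1998. The helper name is hypothetical.

```python
# Length-of-stay coefficients (dollars; reference = 0 days) from table 8.
LOS_LEVELS = ["1", "2", "3", "4", "5", "6", "7", "8-13", "14-30", "31-60", "61+"]
LOS_COEFS = {
    1997: [2121, 3824, 4715, 5637, 6922, 7853, 8532, 10555, 18967, 44950, 5484],
    1998: [2020, 3073, 3792, 5239, 6624, 7307, 7180, 8722, 18123, 25739, 15107],
    1999: [771, 2146, 3126, 4193, 4436, 6165, 7340, 8769, 19409, 39188, 48210],
}


def decreasing_steps(levels, coefs):
    """Return the levels whose coefficient is lower than the preceding level's."""
    return [levels[i] for i in range(1, len(coefs)) if coefs[i] < coefs[i - 1]]


for year, coefs in LOS_COEFS.items():
    print(year, decreasing_steps(LOS_LEVELS, coefs))
```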
Table 8. Coefficients for select variables; weighted regression models, by year (SUDAAN)
Dependent variable = inpatient hospital stay expenditures
| | β-Coefficients (SE β-Coefficients) | | |
| Class variable | 1997 | 1998 | 1999 |
| Length of stay (days) | | | |
| 0 (reference) | $0 ($0) | $0 ($0) | $0 ($0) |
| 1 | $2,121 ($411) | $2,020 ($488) | $771 ($550) |
| 2 | $3,824 ($448) | $3,073 ($480) | $2,146 ($638) |
| 3 | $4,715 ($523) | $3,792 ($505) | $3,126 ($569) |
| 4 | $5,637 ($615) | $5,239 ($727) | $4,193 ($708) |
| 5 | $6,922 ($933) | $6,624 ($976) | $4,436 ($707) |
| 6 | $7,853 ($836) | $7,307 ($1,236) | $6,165 ($1,125) |
| 7 | $8,532 ($927) | $7,180 ($1,110) | $7,340 ($1,066) |
| 8-13 | $10,555 ($1,053) | $8,722 ($761) | $8,769 ($1,124) |
| 14-30 | $18,967 ($3,048) | $18,123 ($2,706) | $19,409 ($4,170) |
| 31-60 | $44,950 ($12,311) | $25,739 ($6,567) | $39,188 ($17,209) |
| 61+ | $5,484 ($827) | $15,107 ($9,416) | $48,210 ($11,756) |
| Reason in hospital | | | |
| Surgery (reference) | $0 ($0) | $0 ($0) | $0 ($0) |
| Treatment/therapy | -$4,342 ($590) | -$3,906 ($676) | -$4,937 ($882) |
| Diagnostic tests | -$4,315 ($570) | -$3,543 ($521) | -$4,998 ($734) |
| Give birth | -$3,380 ($461) | -$3,122 ($532) | -$3,780 ($622) |
| To be born | $2,456 ($4,525) | -$2,082 ($1,956) | -$6,554 ($1,701) |
| Other | -$3,792 ($924) | -$3,600 ($1,525) | -$4,567 ($796) |
The final list of class variables used to impute
inpatient hospital expenditures is presented in table 9. As for
physician office visits, type of insurance coverage and total charges
were included at the top of the list. In addition, an indicator of
whether or not there was an emergency room (ER) event before the
hospital admission was included because the billing information for the
ER and the hospital stay is often rolled up into one expenditure figure
for the stay. Based on the findings noted above, length of stay and
reason in hospital then followed in that order. MSA status and census
region were also included on the final list, based in part on their
being retained in the STEPWISE models (p-values < 0.15).
Table 9. Final class variable list for imputing inpatient hospital expenditures
1. Type of Insurance Coverage
2. Total Charges
3. ER before Admission
4. Length of Stay
5. Reason in Hospital
6. MSA/Non-MSA
7. Census Region
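Table 9 is an ordered list. One common way to use such a list in a hot deck, assumed here for illustration rather than taken from the MEPS production code, is to form imputation cells from all class variables and, when a recipient's cell holds too few donors, drop variables from the bottom of the list until enough donors remain. The short variable names, the toy records, and the `min_donors` threshold are all hypothetical.

```python
from collections import Counter

# Hypothetical short names for the Table 9 class variables, highest priority first.
CLASS_VARS = [
    "insurance", "total_charges", "er_before_admit",
    "length_of_stay", "reason", "msa", "region",
]


def cell_key(record, variables):
    """Values of the given class variables for one record, as a tuple."""
    return tuple(record[v] for v in variables)


def choose_cell_variables(recipient, donors, class_vars=CLASS_VARS, min_donors=2):
    """Drop class variables from the bottom of the priority list until the
    recipient's cell holds at least min_donors donors; return those kept.

    min_donors is an assumed threshold for illustration only.
    """
    for k in range(len(class_vars), 0, -1):
        kept = class_vars[:k]
        counts = Counter(cell_key(d, kept) for d in donors)
        if counts[cell_key(recipient, kept)] >= min_donors:
            return kept
    return []  # no class variable can be kept; the whole donor pool is one cell


if __name__ == "__main__":
    base = {"insurance": "private", "total_charges": "high", "er_before_admit": 1,
            "length_of_stay": "2", "reason": "surgery", "msa": 1, "region": "South"}
    donors = [dict(base, region="West"), dict(base, region="Midwest"), dict(base)]
    # Only one donor shares the full cell, so region (last on the list) is dropped.
    print(choose_cell_variables(dict(base), donors))
```

Dropping from the bottom of the list means the variables judged most predictive (insurance coverage and total charges) are the last to be relaxed.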
Class variable collapsing strategy
Results from the regression modeling presented above
were also used to establish the collapsing strategy applied during the
hot-deck procedure to class variables with three or more levels; the
coefficients from the SUDAAN regression models weighed heavily in these
decisions. For example, consider the reason in hospital variable
described above. There was little difference between the coefficients
for treatment/therapy and diagnostic tests only. Hence, prior to using
the variable in the imputation procedure, it seemed reasonable to recode
these two levels into one, reducing the variable from six levels to five
(table 10). During the imputation procedure, further collapsing of the
remaining levels was determined by the number of recipients and donors
in a given imputation cell. Because surgery was associated with the
highest mean expenditures, the hot deck was programmed to keep surgery
as a separate category whenever possible.
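The two-stage handling of the reason in hospital variable can be sketched as follows. Stage 1 is the recode described above (six levels to five). Stage 2 is an assumed fallback, because the report does not spell out the exact collapsing sequence: if the recipient's category has too few donors, all non-surgery levels are pooled while surgery stays separate. The level labels and the `min_donors` parameter are hypothetical.

```python
from collections import Counter

# Stage 1 recode from the report: treatment/therapy and diagnostic tests
# only have similar coefficients, so they share one level (six -> five).
STAGE1_RECODE = {
    "surgery": "surgery",
    "treatment_therapy": "treatment_or_diagnostic",
    "diagnostic_tests_only": "treatment_or_diagnostic",
    "give_birth": "give_birth",
    "to_be_born": "to_be_born",
    "other": "other",
}


def stage2_collapse(level):
    """Assumed fallback: keep surgery separate and pool every other level."""
    return "surgery" if level == "surgery" else "non_surgery"


def reason_for_imputation(recipient_reason, donor_reasons, min_donors=2):
    """Return (recipient level, donor levels) at the finest grouping that
    leaves at least min_donors donors in the recipient's category."""
    recipient = STAGE1_RECODE[recipient_reason]
    donors = [STAGE1_RECODE[r] for r in donor_reasons]
    if Counter(donors)[recipient] >= min_donors:
        return recipient, donors
    return stage2_collapse(recipient), [stage2_collapse(d) for d in donors]
```

For example, a "give_birth" recipient with donors ["other", "to_be_born", "surgery"] falls back to the pooled "non_surgery" level, while the surgery donor remains in its own category.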
Table 10. Coefficients for reason in hospital; weighted regression models, by year (SUDAAN)
Dependent variable = inpatient hospital stay expenditures
Entries are β-coefficients (SE of β-coefficients).
Reason in hospital | 1997 | 1998 | 1999
Surgery (reference) | $0 | $0 | $0
Treatment/therapy¹ | -$4,342 ($590) | -$3,906 ($676) | -$4,937 ($882)
Diagnostic tests only¹ | -$4,315 ($570) | -$3,543 ($521) | -$4,998 ($734)
Give birth | -$3,381 ($461) | -$3,122 ($532) | -$3,780 ($622)
To be born | $2,456 ($4,525) | -$2,082 ($1,956) | -$6,554 ($1,701)
Other | -$3,792 ($924) | -$3,600 ($1,525) | -$4,567 ($796)
¹ Recoded into a single category (i.e., reason in hospital changes from a six-level variable
to a five-level variable).
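To show why treatment/therapy and diagnostic tests only were the natural pair to recode, the sketch below ranks every pair of non-reference levels by the largest year-by-year gap between their coefficients. The coefficients are taken from Table 8; the max-gap criterion is an illustrative assumption, since the report describes the recode as a judgment informed by the coefficients rather than a formal rule.

```python
from itertools import combinations

# Reason-in-hospital coefficients (1997, 1998, 1999) from Table 8;
# surgery is the reference level and is left out.
COEFS = {
    "Treatment/therapy": (-4342, -3906, -4937),
    "Diagnostic tests only": (-4315, -3543, -4998),
    "Give birth": (-3380, -3122, -3780),
    "To be born": (2456, -2082, -6554),
    "Other": (-3792, -3600, -4567),
}


def max_gap(a, b):
    """Largest absolute year-by-year difference between two coefficient rows."""
    return max(abs(x - y) for x, y in zip(a, b))


def rank_pairs(coefs):
    """(gap, level, level) tuples sorted from most to least similar."""
    return sorted(
        ((max_gap(coefs[p], coefs[q]), p, q) for p, q in combinations(coefs, 2)),
        key=lambda t: t[0],
    )


if __name__ == "__main__":
    for gap, p, q in rank_pairs(COEFS)[:3]:
        print(f"{p} vs {q}: max gap ${gap:,}")
```

Treatment/therapy and diagnostic tests only come out as the closest pair, with a largest yearly gap of $363, which matches the recode shown in table 10.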
Summary
The process of selecting the most appropriate class
variables to use when imputing health care expenditures is a combination
of art and science that involves both substantive reasoning and
statistical analysis. As illustrated above, predictors of expenses can
vary by event type, and the selection of class variables includes the
examination of both person characteristics and event characteristics.
Careful selection of class variables should improve the quality of the
hot-deck imputation procedure and reduce bias in MEPS expenditure
estimates. The class variables used to impute health care expenditure
data in MEPS are periodically reviewed and refined. Class variables
being considered for future inclusion in the imputation procedures
include provider specialty for ambulatory events and person-level
condition information.
References
Cox, B. (1980). The Weighted Sequential Hot Deck
Imputation Procedure. American Statistical Association 1980 Proceedings
of the Section on Survey Research Methods, 721–726.
Machlin, S. and Dougherty, D. (2007). Overview of
Methodology for Imputing Missing Expenditure Data in the Medical Expenditure
Panel Survey. Methodology Report No. 19. March 2007. Agency for Healthcare
Research and Quality, Rockville, Md.
http://www.meps.ahrq.gov/mepsweb/data_files/publications/mr19/mr19.pdf
Suggested Citation:
Zodet, M. W., Wobus, D. Z., Machlin, S. R., Kashihara, D., and Dougherty,
D. D. Class Variables for MEPS Expenditure Imputations. Methodology
Report No. 20. March 2007. Agency for Healthcare Research and Quality,
Rockville, Md. http://www.meps.ahrq.gov/mepsweb/data_files/publications/mr20/mr20.shtml