Methodology Report #16:
Producing State Estimates with the Medical Expenditure Panel Survey, Household Component

John P. Sommers, PhD, Agency for Healthcare Research and Quality.


Table of Contents

Abstract

The Medical Expenditure Panel Survey (MEPS)

Background

Need for State Estimates from the MEPS-HC

State Estimates with MEPS-HC Data

The MEPS-HC

Number of PSUs

Design-based state estimates

Design-based estimates with weights post-stratified by state

Composite estimation

Conclusions and Recommendations

Acknowledgments

References


Abstract

In recent years, there has been a growing need for estimates of health care expenditures at the state level. The Household Component of the Medical Expenditure Panel Survey (MEPS-HC) is a survey designed to collect information on and to produce national and regional estimates of health care expenditures. However, while the sample design allows for some state estimates to be produced, there is no assurance that the quality of these estimates is adequate for use, since the original purpose of the survey did not include production of state estimates.

This paper describes the results of research using data from the MEPS-HC to produce a selected group of state-level estimates for the 30 states with the largest populations. Three methods of estimation are used. Each method is evaluated using standard measures, and conclusions about the quality of these estimates, along with recommendations, are given.

The estimates in this report are based on the most recent data available at the time the report was written. However, selected elements of MEPS data may be revised on the basis of additional analyses, which could result in slightly different estimates from those shown here. Please check the MEPS Web site for the most current file releases.

Center for Financing, Access, and Cost Trends
Agency for Healthcare Research and Quality
540 Gaither Road
Rockville, MD 20850
www.meps.ahrq.gov


The Medical Expenditure Panel Survey (MEPS)

Background

The Medical Expenditure Panel Survey (MEPS) is conducted to provide nationally representative estimates of health care use, expenditures, sources of payment, and insurance coverage for the U.S. civilian noninstitutionalized population. MEPS is cosponsored by the Agency for Healthcare Research and Quality (AHRQ), formerly the Agency for Health Care Policy and Research, and the National Center for Health Statistics (NCHS).

MEPS comprises three component surveys: the Household Component (HC), the Medical Provider Component (MPC), and the Insurance Component (IC). The HC is the core survey, and it forms the basis for the MPC sample and part of the IC sample. Together these surveys yield comprehensive data that provide national estimates of the level and distribution of health care use and expenditures, support health services research, and can be used to assess health care policy implications.

MEPS is the third in a series of national probability surveys conducted by AHRQ on the financing and use of medical care in the United States. The National Medical Care Expenditure Survey (NMCES) was conducted in 1977, the National Medical Expenditure Survey (NMES) in 1987. Beginning in 1996, MEPS continues this series with design enhancements and efficiencies that provide a more current data resource to capture the changing dynamics of the health care delivery and insurance system.

The design efficiencies incorporated into MEPS are in accordance with the Department of Health and Human Services (DHHS) Survey Integration Plan of June 1995, which focused on consolidating DHHS surveys, achieving cost efficiencies, reducing respondent burden, and enhancing analytical capacities. To accommodate these goals, new MEPS design features include linkage with the National Health Interview Survey (NHIS), from which the sample for the MEPS-HC is drawn, and enhanced longitudinal data collection for core survey components. The MEPS-HC augments NHIS by selecting a sample of NHIS respondents, collecting additional data on their health care expenditures, and linking these data with additional information collected from the respondents’ medical providers, employers, and insurance providers.

Household Component

The MEPS-HC, a nationally representative survey of the U.S. civilian noninstitutionalized population, collects medical expenditure data at both the person and household levels. The HC collects detailed data on demographic characteristics, health conditions, health status, use of medical care services, charges and payments, access to care, satisfaction with care, health insurance coverage, income, and employment.

The HC uses an overlapping panel design in which data are collected through a preliminary contact followed by a series of five rounds of interviews over a two-and-a-half-year period. Using computer-assisted personal interviewing (CAPI) technology, data on medical expenditures and use for two calendar years are collected from each household. This series of data collection rounds is launched each subsequent year on a new sample of households to provide overlapping panels of survey data and, when combined with other ongoing panels, will provide continuous and current estimates of health care expenditures.

The sampling frame for the MEPS-HC is drawn from respondents to NHIS, conducted by NCHS. NHIS provides a nationally representative sample of the U.S. civilian noninstitutionalized population, with oversampling of Hispanics and blacks.

Medical Provider Component

The MEPS-MPC supplements and validates information on medical care events reported in the MEPS-HC by contacting medical providers and pharmacies identified by household respondents. The MPC sample includes all hospitals, hospital physicians, home health agencies, and pharmacies reported in the HC. Also included in the MPC are all office-based physicians:

  • Providing care for HC respondents receiving Medicaid.
  • Associated with a 75 percent sample of households receiving care through an HMO (health maintenance organization) or managed care plan.
  • Associated with a 25 percent sample of the remaining households.

Data are collected on medical and financial characteristics of medical and pharmacy events reported by HC respondents, including:

  • Diagnoses coded according to ICD-9 (9th Revision, International Classification of Diseases) and DSM-IV (Fourth Edition, Diagnostic and Statistical Manual of Mental Disorders).
  • Physician procedure codes classified by CPT-4 (Current Procedural Terminology, Version 4).
  • Inpatient stay codes classified by DRG (diagnosis related group).
  • Prescriptions coded by national drug code (NDC), medication names, strength, and quantity dispensed.
  • Charges, payments, and the reasons for any difference between charges and payments.

The MPC is conducted through telephone interviews and mailed survey materials.

Insurance Component

The MEPS-IC collects data on health insurance plans obtained through private and public sector employers. Data obtained in the IC include the number and types of private insurance plans offered, benefits associated with these plans, premiums, contributions by employers and employees, and employer characteristics.

Establishments participating in the MEPS-IC are selected through three sampling frames:

  • A list of employers or other insurance providers identified by MEPS-HC respondents who report having private health insurance at the Round 1 interview.
  • A Bureau of the Census list frame of private-sector business establishments.
  • The Census of Governments from the Bureau of the Census.

To provide an integrated picture of health insurance, data collected from the first sampling frame (employers and other insurance providers) are linked back to data provided by the MEPS-HC respondents. Data from the other two sampling frames are collected to provide annual national and State estimates of the supply of private health insurance available to American workers and to evaluate policy issues pertaining to health insurance. Since 2000, the Bureau of Economic Analysis has used national estimates of employer contributions to group health insurance from the MEPS-IC in the computation of Gross Domestic Product (GDP).

The MEPS-IC is an annual panel survey. Data are collected from the selected organizations through a prescreening telephone interview, a mailed questionnaire, and a telephone follow-up for nonrespondents.

Survey Management

MEPS data are collected under the authority of the Public Health Service Act. They are edited and published in accordance with the confidentiality provisions of this act and the Privacy Act. NCHS provides consultation and technical assistance.

As soon as data collection and editing are completed, the MEPS survey data are released to the public in staged releases of summary reports and microdata files. Summary reports are released as printed documents and electronic files. Microdata files are released on CD-ROM and/or as electronic files.

Printed documents and CD-ROMs are available through the AHRQ Publications Clearinghouse. Write or call:

AHRQ Publications Clearinghouse
Attn: (publication number)
P.O. Box 8547 Silver Spring, MD 20907
800-358-9295
703-437-2078 (callers outside the United States only)
888-586-6340 (toll-free TDD service; hearing impaired only)

To order online, send an e-mail to: ahrqpubs@ahrq.gov.

Be sure to specify the AHRQ number of the document or CD-ROM you are requesting. Selected electronic files are available through the Internet on the MEPS Web site: http://www.meps.ahrq.gov/

For more information, visit the MEPS Web site or e-mail mepspd@ahrq.gov.



Background

An investigation of results produced from large-scale Federal household surveys reveals few state estimates produced from these surveys. The exception is the Current Population Survey (CPS), conducted by the Census Bureau and sponsored by the Bureau of Labor Statistics, which is the source of state income, poverty, and health care uninsurance rate estimates (Census Bureau Web site: c). However, the state personal income and uninsurance rate estimates are multiple-year averages produced by combining several years of data (DeNavas-Walt et al., 2004). County- and state-level estimates of income and poverty are produced by applying special small area estimation techniques to CPS and other data. State estimates are rare, and design-based one-year estimates do not seem to be routinely published. State-level estimates are not routinely produced from other large-scale Federal household surveys, including the following:

  • The National Health Interview Survey (NHIS), sponsored by the Centers for Disease Control and Prevention's National Center for Health Statistics (NCHS) and conducted by the Census Bureau (NCHS Web site and Botman, et al., 2000)
  • The Survey of Income and Program Participation (SIPP), conducted by the Census Bureau (Census Bureau Web site: a and Kostanich and Dippo, 2002)
  • The American Housing Survey (AHS), sponsored by the Department of Housing and Urban Development and conducted by the Census Bureau (Census Bureau Web site: b and Census Bureau Web site: d)
  • The Household Component of the Medical Expenditure Panel Survey (MEPS-HC), sponsored by the Agency for Healthcare Research and Quality (AHRQ) (Medical Expenditure Panel Survey Web site and Cohen, 2000).

A basic indication of why state-level estimates are not produced with these surveys can be found in the estimates that are produced with the CPS data by the Census Bureau. Multiyear averages and small area estimation techniques are used when the reliability of the design-based estimates is poor. Multiyear averages increase sample sizes and thus lower errors. Small area estimation techniques use modeling and other complex and time-intensive estimation methods to "borrow strength" from data outside the state or outside the survey to improve results with poor precision that are produced using design-based survey estimation techniques, such as weighted means and totals (Ghosh and Rao, 1994).

For the large national Federal surveys, unbiased design-based estimates for states are possible. An unbiased estimate of the state average for any variable is simply the weighted sum of the variable for all sample units within the state divided by the sum of the weights for the sample units within the state (Cochran, 1977). There are two reasons, however, why design-based estimates for individual states are generally of poor precision for the large national surveys. The first reason is that survey sample sizes are generally allocated on a national basis, to meet precision requirements for population subdomains and for key survey estimates. For general population-based national surveys, survey budgets do not allow an adequate sample size in each state to produce state estimates with acceptable precision. There is generally no focus on state estimates and thus no minimum state sample sizes. Even for surveys with state-level stratification as a design feature, such as the NHIS, given that the top 10 states have over half the population, the sample sizes for the remaining states can fall below an ideal sample size.
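The design-based state average described above can be sketched in a few lines. This is a minimal illustration only: the record layout, the state labels, and the numbers are hypothetical and are not taken from MEPS files.

```python
from collections import defaultdict

def state_weighted_means(records):
    """Return {state: weighted mean} from (state, weight, value) tuples.

    The mean for a state is the weighted sum of the variable over the
    state's sample units divided by the sum of their weights.
    """
    weighted_sum = defaultdict(float)
    weight_sum = defaultdict(float)
    for state, weight, value in records:
        weighted_sum[state] += weight * value
        weight_sum[state] += weight
    return {s: weighted_sum[s] / weight_sum[s] for s in weighted_sum}

# Hypothetical sample persons: (state, sampling weight, annual expenditure)
sample = [
    ("A", 1000.0, 200.0),
    ("A", 3000.0, 600.0),
    ("B", 2000.0, 100.0),
]
means = state_weighted_means(sample)
```

With these made-up records, state A's estimate is (1000 × 200 + 3000 × 600) / 4000 = 500.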

Aside from the possible lack of adequate sample sizes in states, the sample designs used in the surveys listed above also make it difficult to produce high-quality design-based estimates for states. All the surveys have stratified multistage cluster sample designs. These designs first sample clusters (sets of counties), which restricts the sample in each state, no matter how many persons are in the final sample, to a limited number of counties. This is done because of the costs of personal household visits to collect the information if the sample of persons were spread widely and evenly across the entire country. This clustering of the sample can have a large impact on the sampling error. If the average values for the variables being estimated vary considerably across the clusters of counties, then the sampling error is effectively limited by the number of county clusters. For further detail on the variances of cluster samples, see Cochran, 1977.


Need for State Estimates from the MEPS-HC

In 2004, the increase in costs of health insurance was above 10 percent (Kaiser Family Foundation, 2004). Further, large numbers of persons are without health coverage (DeNavas-Walt et al., 2004). At the same time, state budgets have been hard hit by loss of revenues and increasing health care costs from Medicaid spending. In 2003, Medicaid spending increased 8 percent over 2002 levels and was 21.4 percent of all state spending. As a result, most states reduced Medicaid benefits, reduced Medicaid eligibility, and implemented prescription drug cost control programs for Medicaid (National Association of State Budget Officers, 2003).

Given the variety of methods being implemented by the states to control Medicaid and other health costs and to increase the number of persons with health insurance, it is of great interest for all states to know which efforts have succeeded and which have not. To assess these results, one must have reliable data. Health care usage and costs at the state level must be analyzed to determine if costs are increasing over time and if more people are using health care. It would also be useful to know who is using care and what types of care they use: for instance, how much care particular parts of the population use and of what type (doctor visits, hospitals, etc.).

Because of these data needs, there is an emerging drive within the Department of Health and Human Services (DHHS) to examine the feasibility of producing state data from population-based surveys and other sources. This paper examines the possibility of producing selected state estimates from one core DHHS data source.


State Estimates with MEPS-HC Data

The MEPS-HC

National estimates of health care expenditures are an important resource for health policymakers and health services researchers. MEPS collects information regarding the use of and payment for health care services from a nationally representative sample of the U.S. civilian noninstitutionalized population. In addition to the annual nationally representative expenditure estimates from MEPS, there is a growing need for estimates at the subnational level. While MEPS was designed to ensure reliable estimates at the national and regional level for individuals, families, and selected population subgroups, recent research has focused on the capacity for subnational estimates. A 2004 MEPS Statistical Brief (Machlin, et al., 2004) provided estimates of health care expenses and uninsured rates for the U.S. community population under age 65 in 10 large metropolitan areas. This paper examines the capacity for producing expenditure estimates with acceptable precision at the state level using MEPS.

The sample of households for the MEPS-HC is a subsample of households that responded to the prior year's NHIS. The MEPS sample is drawn from approximately one-half of the primary sampling units (PSUs) in the NHIS. Oversampling of households with Hispanics and African Americans carries over from the NHIS to the MEPS sample design. In addition, in forming strata for selection of the first-stage sampling units or PSUs in the NHIS, state was used as a stratification variable. This design feature carries over from the NHIS to MEPS-HC since the MEPS-HC uses approximately half of the NHIS PSUs. In this paper, the MEPS-HC design is investigated with respect to its capacity to support reliable state-level estimates for a selected number of states.


Number of PSUs

The first issue one must address when trying to make state estimates from a survey with a cluster sample is the number of PSUs available to make the estimates for each state. If the between-PSU variance is large compared with the within-PSU variance, a very small number of PSUs means a large error, regardless of the number of final stage sampling units (Cochran, 1977). For this reason, we checked the PSU structure for the MEPS. We found that when states were ranked by total population, each of the 10 largest states had more than six PSUs and/or a number of certainty PSUs that covered a large portion (60 percent or more) of the state population. For the second largest set of states, those with population ranks from 11 through 20, the states contained at least four PSUs and/or had a certainty PSU that covered a large portion of the state population. For states with population ranks from 21 through 30, the minimum number of PSUs was three, but most had four or more non-certainty PSUs and population coverage of the selected PSUs was limited. However, since there are enough PSUs to calculate an error for each of these states, this third set of states is also used in the analysis. Most of the remaining small states had either one or no PSUs, and these were not included in the analysis.
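To see why the number of PSUs matters more than the number of persons when between-PSU variance is large, consider a rough sketch. It assumes a simplified, unweighted two-stage design with equal-sized PSUs and negligible sampling fractions, which is much simpler than the actual MEPS-HC design; the variance components below are hypothetical.

```python
def var_of_mean_two_stage(between_var, within_var, n_psus, persons_per_psu):
    """Approximate variance of a sample mean under a simplified two-stage design.

    Assumes equal-sized PSUs, no weights, and negligible sampling fractions:
    V = S_b^2 / n + S_w^2 / (n * m).
    """
    return between_var / n_psus + within_var / (n_psus * persons_per_psu)

# Same total sample of 1,200 persons, spread over 3 PSUs versus 12 PSUs.
# The variance components (4.0 and 100.0) are made up for illustration.
few_psus = var_of_mean_two_stage(4.0, 100.0, n_psus=3, persons_per_psu=400)
many_psus = var_of_mean_two_stage(4.0, 100.0, n_psus=12, persons_per_psu=100)
```

In this toy case the within-PSU term is identical for both designs, so the whole difference in variance comes from dividing the between-PSU component by 3 rather than 12.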


Design-based state estimates

The first step of the research was to develop simple design-based estimates using the MEPS-HC design structure and data for 2002 for six types of expenditures: total (all types combined), dental visits, inpatient facility stays, office-based visits, outpatient doctor visits, and prescription drugs. This was done using the weights that were created by post-stratification to national CPS values for cells defined by age, race, gender, and marital status. For each type of expenditure, an estimate for each state was made for the percentage of persons who had that expenditure, the mean for those who had an expenditure, and the total expenditures. These estimates were created for each of the 30 largest states defined by total population. Relative standard error results were averaged for each type of estimate for three state groups: the 10 largest states, the second 10 largest states, and the third 10 largest states. These are called groups 1, 2, and 3 in order from the largest to smallest states. Results are shown in table 1. Maximum relative standard errors in each group are also shown.

The standard errors for the estimates shown in the tables in this report were produced using a set of 64 partially balanced half samples and the balanced repeated replication method (Wolter, 1985). This was done to take into account the post-stratification done to the sampling weights. For variance estimation purposes, the weights were post-stratified for each replicate. We found that use of Taylor Series methods for this first set of design-based estimates gave similar results. However, this was not true for results produced and discussed later in this paper and shown in tables 2 and 3.
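A minimal sketch of the balanced repeated replication calculation follows. It assumes the replicate (half-sample) estimates have already been computed with replicate-specific post-stratified weights, and it uses the basic form of the estimator, the mean squared deviation of the replicate estimates from the full-sample estimate. The report does not spell out the exact variant used, and the four replicate values here are hypothetical (the report uses 64).

```python
import math

def brr_variance(full_estimate, replicate_estimates):
    """Basic BRR variance: mean squared deviation of the replicate
    estimates from the full-sample estimate."""
    r = len(replicate_estimates)
    return sum((rep - full_estimate) ** 2 for rep in replicate_estimates) / r

def relative_standard_error(full_estimate, replicate_estimates):
    """RSE = standard error divided by the estimate."""
    return math.sqrt(brr_variance(full_estimate, replicate_estimates)) / full_estimate

# Hypothetical full-sample estimate and half-sample estimates
full = 100.0
reps = [90.0, 110.0, 95.0, 105.0]
rse = relative_standard_error(full, reps)
```

Here the variance is (100 + 100 + 25 + 25) / 4 = 62.5, giving an RSE of about 0.079.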

Table 1. Average and maximum relative standard errors (RSEs) by state size group for typical estimates: National post-stratification

                                 Conditional mean    Percent of persons  Total
                                 expenditures        with expenditure    expenditures
Expenditure type     Size group  Mean RSE  Max RSE   Mean RSE  Max RSE   Mean RSE  Max RSE
All types combined   1           0.0989    0.1697    0.0226    0.0327    0.2159    0.3437
All types combined   2           0.1602    0.3223    0.0291    0.0424    0.3336    0.5232
All types combined   3           0.1677    0.3377    0.0389    0.1007    0.4120    0.5737
Dental visits        1           0.1423    0.2361    0.0713    0.1035    0.2387    0.3130
Dental visits        2           0.1453    0.2528    0.0947    0.1538    0.3484    0.5089
Dental visits        3           0.2499    0.5821    0.1272    0.2031    0.4438    0.6494
Inpatient facility   1           0.2110    0.4049    0.1299    0.1834    0.2928    0.4152
Inpatient facility   2           0.2920    0.5901    0.2056    0.3031    0.4618    0.7730
Inpatient facility   3           0.3495    0.8165    0.2404    0.4383    0.5629    0.7007
Office-based visits  1           0.1091    0.1621    0.0330    0.0514    0.2138    0.3180
Office-based visits  2           0.1417    0.2558    0.0468    0.0765    0.3212    0.4930
Office-based visits  3           0.1899    0.2571    0.0603    0.1372    0.4060    0.5495
Outpatient doctors   1           0.1746    0.2661    0.1231    0.1761    0.2782    0.4306
Outpatient doctors   2           0.338     0.6895    0.2111    0.3783    0.4673    0.7005
Outpatient doctors   3           0.3764    0.7446    0.2794    0.4949    0.5280    0.7346
Prescription drugs   1           0.0953    0.1923    0.0389    0.0532    0.2305    0.4494
Prescription drugs   2           0.1290    0.2104    0.0531    0.0912    0.3401    0.531
Prescription drugs   3           0.1696    0.2695    0.0667    0.1392    0.4025    0.5868

Source: AHRQ, Household Component of the Medical Expenditure Panel Survey, 2002

Table 1 shows the following:

  • State-level estimates with acceptable precision can be made for some states from the MEPS-HC, but there are still many estimates of poor precision. This can be seen from the maximum RSE values for many of the groups. (Note: An RSE of 0.30 or greater is considered poor.)
  • Estimates for the percentage of persons with each type of expenditure have good precision, while the total expenditure estimates are of uniformly poor quality.
  • The estimates are generally of best quality for expenditure types that affect the most people and thus have the greatest sample, such as all types combined or office-based visits, while the worst are inpatient facility and outpatient doctor estimates because the numbers of sample persons with these expenditure types are very small. None of the size groups of states could be published for the latter two types of expenditures.
  • The estimates decline in precision as the size of states decreases.
  • With a goal of a maximum RSE of 20 percent, for the majority of the cases only estimates for the 10 largest states could be produced.
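The two thresholds mentioned in this list (an RSE of 0.30 or more is poor; the goal is a maximum RSE of 20 percent) can be applied mechanically. The sketch below does so for the maximum RSEs of conditional mean expenditures, all types combined, taken from table 1. The label for the range between the two thresholds is an assumption made here for illustration, not a category defined in the report.

```python
POOR_RSE = 0.30  # threshold given in the note to table 1
GOAL_RSE = 0.20  # maximum-RSE goal mentioned in the last bullet

def classify_rse(rse):
    """Label an RSE against the two thresholds discussed in the text."""
    if rse >= POOR_RSE:
        return "poor"
    if rse <= GOAL_RSE:
        return "meets goal"
    return "between goal and poor"  # label is an assumption, not from the report

# Max RSE of conditional mean expenditures, all types combined (table 1)
max_rse_by_group = {1: 0.1697, 2: 0.3223, 3: 0.3377}
labels = {group: classify_rse(rse) for group, rse in max_rse_by_group.items()}
```

For this row of the table, only size group 1 meets the 20 percent goal; groups 2 and 3 each contain at least one state whose estimate is poor.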

Most of these results with respect to the reliability of the estimates could be expected, and most are directly related to expected sample size. For any type of expenditure, the percentage of persons with an expenditure is the best estimate because it is based upon the entire sample in each state and on a yes-or-no outcome rather than a skewed dollar amount. The conditional mean and total expenditure estimates are based upon subsets of the sample in each state. Further, these estimates are based upon the distribution of expenditures, which can be highly skewed. The extremely poor quality of the total expenditure estimates versus the conditional mean estimates can be attributed to the fact that the between-PSU variances of population totals are much higher than the between-PSU variances of average usage.


Design-based estimates with weights post-stratified by state

After review of the first set of results, it was decided that the estimates could be improved using weights that were post-stratified to CPS population totals at the state level. For each state, the nonresponse adjusted weights were post-stratified by state, age, race, and gender. The use of these weights produced a marked improvement in the precision of the results.
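Post-stratification of this kind amounts to rescaling the weights in each cell so that they sum to an outside control total. A minimal sketch follows, assuming cells defined by state and age group only; the cell labels, weights, and control totals are hypothetical, and the actual MEPS procedure also involves race and gender and possibly cell collapsing.

```python
from collections import defaultdict

def post_stratify(records, control_totals):
    """Rescale weights so each cell's weights sum to its control total.

    records: list of (cell, weight) pairs.
    control_totals: {cell: outside population total}.
    Returns the adjusted weights in the same order as records.
    """
    cell_sums = defaultdict(float)
    for cell, weight in records:
        cell_sums[cell] += weight
    return [weight * control_totals[cell] / cell_sums[cell]
            for cell, weight in records]

# Hypothetical nonresponse-adjusted weights for one state
records = [
    (("A", "under 65"), 1000.0),
    (("A", "under 65"), 3000.0),
    (("A", "65+"), 500.0),
]
controls = {("A", "under 65"): 5000.0, ("A", "65+"): 600.0}
adjusted = post_stratify(records, controls)
```

After the adjustment the weights in each cell sum exactly to the control total, so the estimated state population is no longer subject to sampling variability.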

Table 2. Average and maximum relative standard errors (RSEs) by state size group for typical estimates: State level post-stratification

                                 Conditional mean    Percent of persons  Total
                                 expenditures        with expenditure    expenditures
Expenditure type     Size group  Mean RSE  Max RSE   Mean RSE  Max RSE   Mean RSE  Max RSE
All types combined   1           0.0744    0.1126    0.0152    0.0206    0.0756    0.1181
All types combined   2           0.1072    0.2112    0.0198    0.0283    0.1085    0.2179
All types combined   3           0.1250    0.2635    0.0256    0.0558    0.1340    0.2582
Dental visits        1           0.1128    0.2032    0.0512    0.0794    0.1288    0.2214
Dental visits        2           0.1123    0.1947    0.0703    0.1050    0.1252    0.1974
Dental visits        3           0.1804    0.4529    0.0921    0.1366    0.1988    0.4655
Inpatient facility   1           0.1631    0.2852    0.0972    0.1482    0.1759    0.2896
Inpatient facility   2           0.2229    0.5188    0.1427    0.2009    0.2738    0.5763
Inpatient facility   3           0.2591    0.6084    0.1870    0.3432    0.3183    0.6300
Office-based visits  1           0.0869    0.1217    0.0227    0.0349    0.0887    0.1252
Office-based visits  2           0.1037    0.1751    0.0335    0.0589    0.1112    0.1827
Office-based visits  3           0.1376    0.2090    0.0429    0.0829    0.1564    0.2494
Outpatient doctors   1           0.1399    0.2267    0.0883    0.1278    0.1574    0.2294
Outpatient doctors   2           0.2406    0.3997    0.1561    0.2291    0.2717    0.3809
Outpatient doctors   3           0.2546    0.4987    0.2078    0.3453    0.3061    0.6082
Prescription drugs   1           0.0674    0.1496    0.0254    0.0357    0.0743    0.1534
Prescription drugs   2           0.0890    0.1529    0.0349    0.0582    0.0976    0.1592
Prescription drugs   3           0.1235    0.2644    0.0418    0.1015    0.1353    0.2463

Source: AHRQ, Household Component of the Medical Expenditure Panel Survey, 2002

The use of weights post-stratified with state population totals yielded the following results shown in table 2:

  • Results are uniformly better than those in table 1.
  • Of special interest are the results for total expenditures for each type. These estimates are now of approximately the same quality as those for the conditional mean expenditures for the same type of expenditure. This improvement is likely the result of the stabilization of totals from the use of state-specific post-stratification. This post-stratification essentially makes the estimate of a total a ratio estimate, which uses the average usage times the mean conditional expenditure times an outside population total for each state. As we saw earlier, the average usage and mean conditional expenditure estimates were more precise, and this change in the estimates of total expenditures results in estimates with RSEs closer to those of the first two types of estimates. Nevertheless, there are still estimates with very large RSEs.
  • Although improved, estimates for inpatient facility and outpatient doctors are still problematic.
  • For all other expenditure groups (all types combined, office-based visits, dental visits, and prescription drugs), the improvements are such that one could make estimates for the 20 largest states in size groups 1 and 2, with very few estimates with an RSE greater than 20 percent.
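The ratio-estimate interpretation in the second bullet can be checked numerically: a weighted total factors exactly into (sum of weights) × (weighted share with an expenditure) × (weighted conditional mean). The sketch below uses hypothetical weights and expenditures. When the weights have been post-stratified to a state control total, the first factor is a fixed outside number rather than a sample quantity.

```python
def total_components(persons):
    """Split a weighted total into population, usage share, and conditional mean.

    persons: list of (weight, expenditure) pairs for one state.
    Returns (population, share_with_expenditure, conditional_mean, total).
    """
    population = sum(w for w, _ in persons)
    weight_with_exp = sum(w for w, y in persons if y > 0)
    total = sum(w * y for w, y in persons)
    share = weight_with_exp / population
    conditional_mean = total / weight_with_exp
    return population, share, conditional_mean, total

# Hypothetical post-stratified weights and annual expenditures
persons = [(1250.0, 0.0), (3750.0, 400.0), (600.0, 1000.0)]
population, share, cond_mean, total = total_components(persons)
```

Because `population` is pinned to the control total, only `share` and `cond_mean` contribute sampling error to the total, which is in line with the convergence of the total-expenditure RSEs toward those of the conditional means in table 2.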

Estimates in an AHRQ report titled Estimates of Health Care Expenditures for the 10 Largest States, 2002, are based on this methodology (Machlin and Sommers, 2005).


Composite estimation

Given that we have produced estimates for every state, a small area estimation approach could potentially be used to adjust and improve these estimates by "borrowing strength across states." In this case, one can consider a random effects model or a Bayesian approach based upon such a model (Ghosh and Rao, 1994). However, it was decided to use a method that makes no assumptions about the relationships among estimates and is simple to apply across a large number of estimates.

It was decided to apply a composite estimation technique to estimates of the conditional mean expenditures and percentages with an expenditure. Strength can be borrowed from sample estimates at the Census Division level, the regional level, or the national level. This type of estimate uses a weighted average

\[
\hat{Y}_c = w\,\hat{Y}_1 + (1 - w)\,\hat{Y}_2
\]

where \hat{Y}_1 and \hat{Y}_2 are usually a synthetic and a direct estimate of the same item. The weight w is determined by minimizing the mean squared error (MSE) of the linear combination of estimators. In this case, the sample Census Division estimate for a larger geographic area was chosen as the synthetic estimate and the state estimate as the direct estimate. The Division estimates were chosen as the synthetic estimate because there seemed to be good correlation between estimates of states within the same Division. Thus, it seemed that the other states in the Division would be most helpful in providing information and strength for one another within the Division. Under these conditions, assuming that the state estimate is an unbiased estimate of the state value, the value of w that minimizes the MSE of the composite estimator is

\[
w = \frac{V(\hat{Y}_s) - \mathrm{Cov}(\hat{Y}_d, \hat{Y}_s)}{V(\hat{Y}_s) + V(\hat{Y}_d) - 2\,\mathrm{Cov}(\hat{Y}_d, \hat{Y}_s) + B^2}
\]

where s denotes the design-based state estimator and d the division estimator for the larger geographic area. Bias squared in this case is the squared difference in expected values of the state and synthetic estimates:

\[
B^2 = \left( E[\hat{Y}_s] - E[\hat{Y}_d] \right)^2
\]

This result can be obtained simply by taking the MSE of the composite estimate and using differentiation to calculate the value of w which minimizes the result.
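As a small numerical sketch of the weight formula, the function below computes w and the resulting composite estimate. The function names and all input values are hypothetical. In practice the variances, covariance, and squared bias would themselves be estimated from the half samples.

```python
def composite_weight(var_state, var_synth, cov, bias_sq):
    """MSE-minimizing weight placed on the synthetic (division) estimate."""
    return (var_state - cov) / (var_state + var_synth - 2.0 * cov + bias_sq)

def composite_estimate(w, synth, state):
    """Weighted average: w on the synthetic estimate, 1 - w on the state estimate."""
    return w * synth + (1.0 - w) * state

# Hypothetical inputs: a noisy state estimate and a more stable division estimate
w = composite_weight(var_state=9.0, var_synth=1.0, cov=0.5, bias_sq=4.0)
estimate = composite_estimate(w, synth=100.0, state=110.0)
```

With these inputs w = 8.5 / 13, or about 0.65, so the composite is pulled roughly two-thirds of the way from the state estimate toward the division estimate. A larger squared bias would shrink w and leave the state estimate closer to its original value.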

The terms in the numerator can be estimated using the half samples. The denominator can be estimated using the square of the difference of the two estimators. However, that estimate can be very unstable. Thus, a method using the sum of squared differences of the half-sample estimates was developed to estimate the denominator, which is the variance of the difference of the two estimators plus the squared difference in their expected values. Using these estimates for w, new estimates and their MSEs and RSEs were calculated using the following expressions:

\[
V(\hat{Y}_c) = w^2\,V(\hat{Y}_d) + (1 - w)^2\,V(\hat{Y}_s) + 2w(1 - w)\,\mathrm{Cov}(\hat{Y}_d, \hat{Y}_s)
\]

\[
\mathrm{MSE}(\hat{Y}_c) = w^2\,V(\hat{Y}_d) + (1 - w)^2\,V(\hat{Y}_s) + 2w(1 - w)\,\mathrm{Cov}(\hat{Y}_d, \hat{Y}_s) + w^2 B^2
\]
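The variance and MSE expressions can be evaluated directly once their components are available. The sketch below assumes the covariance term enters with a plus sign, as in the MSE expression, and treats w as fixed; all inputs are hypothetical.

```python
def composite_variance(w, var_state, var_synth, cov):
    """Variance of w * synthetic + (1 - w) * state for a fixed weight w."""
    return (w ** 2 * var_synth
            + (1.0 - w) ** 2 * var_state
            + 2.0 * w * (1.0 - w) * cov)

def composite_mse(w, var_state, var_synth, cov, bias_sq):
    """MSE = variance plus the squared bias carried in by the synthetic part."""
    return composite_variance(w, var_state, var_synth, cov) + w ** 2 * bias_sq

# Hypothetical components; 8.5 / 13 is the MSE-minimizing weight for them
args = dict(var_state=9.0, var_synth=1.0, cov=0.5, bias_sq=4.0)
w_opt = 8.5 / 13.0
mse_opt = composite_mse(w_opt, **args)
mse_state_only = composite_mse(0.0, **args)  # w = 0 reproduces the state estimate
```

For these inputs the MSE falls from 9.0 for the state estimate alone to about 3.44 for the composite, and moving w slightly in either direction from `w_opt` raises the MSE.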

This MSE was estimated using the individual parts developed in the calculation of w. Of special interest is the estimate of the bias. One can use the fact that

\[
E\left[ (\hat{Y}_d - \hat{Y}_s)^2 \right] = V(\hat{Y}_d - \hat{Y}_s) + B^2
\]

One can just take the squared difference of the two estimates and subtract an estimate of the variance of the difference of the two estimates to obtain an estimate of the squared bias. However, this is an unstable estimate. We instead take advantage of the following expected value for the squared difference of the two estimates for the ith half sample:

The expected squared value of the ith half sample of synthetic estimate minus the ith half sample of state estimate 
equals to two times the variance of synthetic estimate and state estimate then plus squared bias

Thus we average the values of the difference squared for each half sample and subtract the standard estimate of twice the variance to obtain an estimate of the bias.
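The half sample calculation described above can be sketched as follows. The function name, its arguments, and the example values are hypothetical. The variance of the difference is passed in rather than computed, since the report does not spell out the variance estimator here, and flooring a negative result at zero is an added assumption, not something the report states.

```python
def squared_bias_from_half_samples(syn_half, st_half, var_diff, floor_at_zero=True):
    """Average squared half sample difference minus twice the variance of the difference.

    syn_half, st_half: synthetic and state estimates from each half sample.
    var_diff: an estimate of Var(synthetic - state) for the full sample.
    floor_at_zero: assumption for this sketch; a squared bias cannot be negative.
    """
    if len(syn_half) != len(st_half) or len(syn_half) == 0:
        raise ValueError("need the same positive number of half sample estimates")
    mean_sq_diff = sum((s - y) ** 2 for s, y in zip(syn_half, st_half)) / len(syn_half)
    estimate = mean_sq_diff - 2.0 * var_diff
    return max(estimate, 0.0) if floor_at_zero else estimate


# Hypothetical half sample estimates, for illustration only.
syn_half = [101.0, 99.0, 102.0, 100.0]
st_half = [100.0, 100.0, 100.0, 100.0]
bias_sq = squared_bias_from_half_samples(syn_half, st_half, var_diff=0.5)
print(bias_sq)
```

Averaging over many half samples is what makes this estimate steadier than the single squared difference of the full sample estimates.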

Because the value of w is itself an estimate, the sample estimate of this variance is only an estimate of the variance of the estimator given the estimate of w; it does not account for the additional variance due to the estimation of w. To account for this variation, w was estimated using groups of half samples. Specifically, w was estimated using these sets of half samples, and half sample estimates of the composite were made by varying the estimates of w while using the full sample values of the model and division estimates. Addition of this term was prompted by the bias found by Prasad and Rao (1990) in formulas that did not consider the variation caused by estimation of w. This term did not add large amounts to the overall errors, which indicates that the estimates of variance and bias are generally stable. However, when the value of w was very close to 1 or 0, the estimates of w became less stable. It was therefore decided to limit w to values between 0.1 and 0.9 for the final composite estimates currently produced.
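The limit on w amounts to a simple clamp. A minimal sketch, with a hypothetical function name and the bounds taken from the text:

```python
def limit_weight(w, lower=0.1, upper=0.9):
    """Restrict an estimated composite weight to the interval [lower, upper]."""
    if lower > upper:
        raise ValueError("lower bound must not exceed upper bound")
    return min(max(w, lower), upper)


# Estimated weights near 0 or 1 are pulled back to the bounds.
print(limit_weight(0.97), limit_weight(0.03), limit_weight(0.42))
```

Under this rule neither the synthetic estimate nor the state estimate ever receives less than 10 percent of the weight.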


Table 3 below shows values obtained using the composite estimation technique, comparable to those in tables 1 and 2 for conditional mean expenditures and percentage with the expenditure type. This synthetic estimate was chosen because it gave better results than using the Census Division and about the same quality as combining state and national estimates while not changing the original value of the state estimates to the same degree as using the national estimates as the synthetic estimate.

We should note that, as with any set of error estimates made from sample data, the estimates of mean squared error are themselves subject to error. Thus, some of the composite estimates could be of lower quality than the numbers indicate. However, we base our final evaluation of the quality of the composite estimates, relative to the other unbiased estimators, not on the results for any one estimate but on the fact that, as a group, the estimated errors for the composite estimators are almost uniformly better than those for the more standard design-based estimates.

Table 3. Average and maximum relative mean squared errors (RMSEs) by state size group for typical estimates: Composite estimation using regional estimates
| Expenditure type | Size group | Conditional mean expenditures: Mean RMSE | Conditional mean expenditures: Max RMSE | Percent of persons with expenditure: Mean RMSE | Percent of persons with expenditure: Max RMSE |
|---|---|---|---|---|---|
| All types combined | 1 | 0.0587 | 0.0914 | 0.0107 | 0.0189 |
| All types combined | 2 | 0.0533 | 0.1116 | 0.0117 | 0.0176 |
| All types combined | 3 | 0.0758 | 0.1556 | 0.0143 | 0.0241 |
| Dental visits | 1 | 0.0699 | 0.1032 | 0.0418 | 0.0629 |
| Dental visits | 2 | 0.0688 | 0.1311 | 0.0536 | 0.0682 |
| Dental visits | 3 | 0.0912 | 0.1742 | 0.0719 | 0.1378 |
| Inpatient facility | 1 | 0.1086 | 0.1441 | 0.0694 | 0.0989 |
| Inpatient facility | 2 | 0.1217 | 0.1521 | 0.0587 | 0.1149 |
| Inpatient facility | 3 | 0.1750 | 0.3330 | 0.0872 | 0.2130 |
| Office based visits | 1 | 0.0598 | 0.1112 | 0.0163 | 0.0272 |
| Office based visits | 2 | 0.0609 | 0.1009 | 0.0182 | 0.0359 |
| Office based visits | 3 | 0.0657 | 0.1240 | 0.0261 | 0.0538 |
| Outpatient doctors | 1 | 0.0866 | 0.1223 | 0.0747 | 0.1212 |
| Outpatient doctors | 2 | 0.1119 | 0.2355 | 0.1243 | 0.1689 |
| Outpatient doctors | 3 | 0.1639 | 0.4506 | 0.1778 | 0.3118 |
| Prescription drugs | 1 | 0.0497 | 0.0841 | 0.0193 | 0.0252 |
| Prescription drugs | 2 | 0.0536 | 0.1053 | 0.0193 | 0.0459 |
| Prescription drugs | 3 | 0.0747 | 0.2189 | 0.0267 | 0.0513 |

Source: AHRQ, Household Component of the Medical Expenditure Panel Survey, 2002

One can see from table 3 that this technique provides several advantages.

  • In spite of their biased nature, estimates have uniformly better estimated relative mean squared errors than the previous two methods.
  • Improvement was greatest where it was needed most, i.e., in size groups 2 and 3 and inpatient facility and outpatient doctors.
  • Improvements were such that one could make estimates for all expenditure types for state size groups 1 and 2, the 20 largest states, and have very few estimates with greater than a 20 percent relative standard error and none with a relative standard error greater than 30 percent. With the exception of errors for inpatient facility and outpatient doctors, the vast majority of the estimates for the 20 largest states have relative errors of less than 10 percent.
  • For all but the least common types of expenditures, inpatient facility and outpatient doctors, most of the estimates for the set of the smallest states, size group 3, are acceptable using a 10 percent relative error as the standard of acceptance. Few of these estimates for the smallest states have relative error measures of over 20 percent.

One should note that although estimates of totals are not evaluated, if one has good estimates of means and proportions, one can make estimates of totals by multiplying means by total population estimates from other sources.
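A minimal sketch of that calculation follows. It reads the total as population times the proportion of persons with the expenditure times the conditional mean per person with the expenditure; that reading, the function name, and the figures are assumptions made for illustration and are not MEPS estimates.

```python
def estimated_total(population, proportion_with_expense, conditional_mean):
    """Total expenditure = population x share with an expense x mean among those with one."""
    if not 0.0 <= proportion_with_expense <= 1.0:
        raise ValueError("proportion must be between 0 and 1")
    return population * proportion_with_expense * conditional_mean


# Hypothetical state: 4 million residents, 75 percent with an expense,
# and a conditional mean of $2,000 per person with an expense.
total = estimated_total(4_000_000, 0.75, 2000.0)
print(total)
```

The error of such a total also depends on the error of the outside population figure, which this sketch ignores.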


Conclusions and Recommendations

We made two sets of direct state estimates with 2002 MEPS-HC data for the 30 largest states. The first set was produced using standard nationally post-stratified weights, and the second used weights post-stratified within each of the largest states. Each set covered six types of expenditures and three measures: total expenditures, conditional mean expenditure per person with an expenditure, and the percentage of persons with an expenditure. Estimates of the mean and of the percentage with an expenditure were then made for the same six types of expenditures using a small area technique in which strength was borrowed for state estimates from data for the Census Division. Several patterns surfaced:

  • Estimates decreased in quality as the population size of the state decreased and the number of PSUs decreased.
  • The best estimates in any group were for the percentage of persons who had expenditures. Estimates of error for the conditional mean expenditure and total expenditures were correlated with the percentage of persons who had the expenditure, i.e., sample size used.
  • Overall, the relative errors of the estimates were best for the small area technique and worst for the nationally post-stratified weights.
  • One can make estimates for the 20 largest states for the more common expenditures with the state post-stratified weights, and one can generally make improved estimates with the small area technique for each state for all the expenditure types tested.

The research was very successful and confirms the feasibility of making state estimates with the MEPS-HC to help inform health policy decisions at the state level. Given these results, it seems that estimates could be produced for additional survey variables at the state level for the 20 largest states. Whenever possible, the method used should be the small area technique. Efforts should be focused on items that affect a large enough portion of the population that enough sample is available to produce estimates of acceptable quality. Among the items that might be considered are expenditures for obese and overweight persons, persons with private health insurance, and persons without health insurance, all subsets that might have a large enough sample to produce reliable estimates.

Given the simplicity of the process, and given that the state post-stratified weights and the software to produce these estimates have already been developed, additional estimates can and should be produced and assessed to determine how many state-level estimates can be produced on a regular basis in the future.


Acknowledgments

The author would like to thank Trena Ezzati-Rice and Steve Machlin for their valuable ideas and comments, which were used freely in the production of this work.


References

Botman SL, Moore TF, Moriarity CL, and Parsons VL. Design and Estimation for the National Health Interview Survey, 1995–2004. National Center for Health Statistics, Vital Health Stat 2(130). 2000.

Census Bureau Web site:
a: http://www.census.gov/programs-surveys/sipp/methodology/sampling.html
b: http://www.census.gov/hhes/www/housing/ahs/statedata.html
c: https://www.census.gov/topics/income-poverty/poverty.html
d: http://www.census.gov/hhes/www/housing/ahs/ahs01/appendixb.pdf

Cohen SB. Sample Design of the 1997 Medical Expenditure Panel Survey Household Component. MEPS Methodology Report No 11. AHRQ Pub. No. 01-0001. Rockville, MD: Agency for Healthcare Research and Quality. 2000. http://www.meps.ahrq.gov/

Cochran WG. Sampling Techniques. New York: John Wiley and Sons. 1977.

DeNavas-Walt C, Proctor BD, and Mills RJ. Income, Poverty and Health Insurance Coverage in the United States: 2003. U.S. Census Bureau, Current Population Reports P60-226. U.S. Government Printing Office: Washington, DC. 2004.

Ghosh M and Rao JNK. Small area estimation: An appraisal. Statistical Science, Vol. 9, No. 1, 55–93. 1994.

Kaiser Family Foundation. Employer Health Benefit, Annual Survey. Menlo Park, CA. 2004.

Kostanich DL and Dippo CS. Current Population Survey: Design and Methodology. Technical Paper 63RV. U.S. Government Printing Office: Washington, DC. 2002.

Machlin SR, Nixon AJ, and Sommers JP. Health Care Expenditures and Percentage Uninsured in 10 Large Metropolitan Areas, 2000. Statistical Brief #38. Agency for Healthcare Research and Quality. Rockville, MD. February 2004. http://www.meps.ahrq.gov/mepsweb/data_files/publications/st38/stat38.pdf

Machlin SR and Sommers JP. Estimates of Health Care Expenditures for the 10 Largest States, 2002. Statistical Brief #69. Agency for Healthcare Research and Quality, Rockville, Md. February 2005. http://www.meps.ahrq.gov/mepsweb/data_files/publications/st69/stat69.pdf

Medical Expenditure Panel Survey Web site: http://www.meps.ahrq.gov/mepsweb/data_stats/quick_tables_search.jsp?component=1&subcomponent=0

National Association of State Budget Officers. State Expenditure Report. Washington, DC. 2003.

National Center for Health Statistics Web site: http://www.cdc.gov/nchs

Prasad NGN and Rao JNK. The estimation of mean squared errors of small area estimators. Journal of the American Statistical Association, 85: 163-171. 1990.

Wolter KM. Introduction to Variance Estimation. New York: Springer-Verlag. 1985.



Suggested Citation:
Sommers, J. P. Producing State Estimates with the Medical Expenditure Panel Survey, Household Component. Methodology Report No. 16. December 2005. Agency for Healthcare Research and Quality, Rockville, Md. http://www.meps.ahrq.gov/data_files/publications/mr16/mr16.shtml