Methodology Report #26:
Variance Estimation from MEPS Event Files

Chowdhury S., Machlin S.

Table of Contents

Abstract
Background
Variance Estimation from MEPS Event Files
Introduction
MEPS Event Files
Domain Variance Estimation
Comparison of Variance Estimates from Event Files
Table 1. Sizes of different MEPS Event files in 2008
Table 2. Comparison of variance estimates obtained from Home Health Visit Event file and from merging of Event and Full Year files
Table 3. Comparison of variance estimates obtained from Hospital Inpatient Stay Event file and from merging of Event and Full Year files
Table 4. Comparison of variance estimates obtained from Outpatient Department Event file and from merging of Event and Full Year files
Table 5. Comparison of variance estimates obtained for expense estimates from Event file and from merging Event and Full Year files for subpopulations
Conclusion
References

Abstract

A series of public use files (PUFs) is released from the Medical Expenditure Panel Survey (MEPS) Household Component every year. The most commonly used files are at the person level (with household and family identifiers), but some files are at lower levels, such as the medical conditions and medical events files. Eight event-level files are published from MEPS each year. For each person in the MEPS sample, these event files may include no record, a single record, or multiple records, depending on the number of events the person had during the year. Therefore, an event file only includes records related to a subset of persons in the full-year person file. Since the number of persons in an event file is not known before conducting the survey, any analysis of estimates from event files should be treated as a domain analysis, which requires the entire sample in order to take all variability into account when estimating the variance of domain estimates. That is, the analysis should ideally include all persons, with and without events, in the file. In practice, however, since it is convenient to deal only with the subset of cases with events, users generally compute event-level estimates without merging in persons with no event from the full person-level file. The impact of not doing a domain analysis is usually negligible if the subset (the domain) is large compared to the full file. This report looks into the issue of variance estimation from the MEPS event files and evaluates the impact of not doing a proper domain analysis on variance estimates.

Suggested Citation

Chowdhury, S., Machlin, S. Variance Estimation from MEPS Event Files, Methodology Report No. 26. September 2011. Agency for Healthcare Research and Quality, Rockville, MD. http://www.meps.ahrq.gov/mepsweb/data_files/publications/mr26/mr26.pdf

* * *

The estimates in this report are based on the most recent data available at the time the report was written. However, selected elements of MEPS data may be revised on the basis of additional analyses, which could result in slightly different estimates from those shown here. Please check the MEPS Web site for the most current file releases.

Center for Financing, Access, and Cost Trends
Agency for Healthcare Research and Quality
540 Gaither Road
Rockville, MD 20850

http://www.meps.ahrq.gov

Background

The Medical Expenditure Panel Survey (MEPS)

Household Component

The Medical Expenditure Panel Survey (MEPS) provides nationally representative estimates of health care use, expenditures, sources of payment, and health insurance coverage for the U.S. civilian noninstitutionalized population. The MEPS Household Component (HC) also provides estimates of respondents' health status, demographic and socio-economic characteristics, employment, access to care, and satisfaction with health care. Estimates can be produced for individuals, families, and selected population subgroups. The panel design of the survey, which includes five rounds of interviews covering two full calendar years, provides data for examining person level changes in selected variables such as expenditures, health insurance coverage, and health status. Using computer assisted personal interviewing (CAPI) technology, information about each household member is collected, and the survey builds on this information from interview to interview. All data for a sampled household are reported by a single household respondent.

The MEPS-HC was initiated in 1996. Each year a new panel of sample households is selected. Because the data collected are comparable to those from earlier medical expenditure surveys conducted in 1977 and 1987, it is possible to analyze long-term trends. Each annual MEPS-HC sample size is about 15,000 households. Data can be analyzed at either the person or event level. Data must be weighted to produce national estimates.

The set of households selected for each panel of the MEPS-HC is a subsample of households participating in the previous year's National Health Interview Survey (NHIS) conducted by the National Center for Health Statistics of the Centers for Disease Control and Prevention. The NHIS sampling frame provides a nationally representative sample of the U.S. civilian noninstitutionalized population and reflects an oversample of blacks, Hispanics and, starting in 2006, Asians. MEPS oversamples additional policy relevant subgroups such as low income households. The linkage of the MEPS to the previous year's NHIS provides additional data for longitudinal analytic purposes.

Medical Provider Component

After the household CAPI interview is completed and permission is obtained from the household survey respondents, a sample of medical providers is contacted by telephone to obtain information that household respondents cannot accurately provide. This part of the MEPS is called the Medical Provider Component (MPC), and information is collected on dates of visit, diagnosis and procedure codes, charges, and payments. The Pharmacy Component (PC), a subcomponent of the MPC, does not collect charges or diagnosis and procedure codes but does collect drug detail information, including National Drug Code (NDC) and medicine name, as well as date(s) prescriptions are filled and sources and amounts of payment. The MPC is not designed to yield national estimates. It is primarily used as an imputation source to supplement and/or replace household reported expenditure information.

Survey Management

MEPS-HC and MPC data are collected under the authority of the Public Health Service Act. Data are collected under contract with Westat. Data sets and summary statistics are edited and published in accordance with the confidentiality provisions of the Public Health Service Act and the Privacy Act. The National Center for Health Statistics (NCHS) of the Centers for Disease Control and Prevention provides consultation and technical assistance related to the selection of the MEPS household sample.

As soon as data collection and editing are completed, the MEPS survey data are released to the public in staged releases of summary reports, micro data files, and tables via the MEPS Web site: http://www.meps.ahrq.gov. Selected data can be analyzed through MEPSnet, an on-line interactive tool designed to give data users the capability to statistically analyze MEPS data in a menu-driven environment.

Additional information on MEPS is available from the MEPS project manager or the MEPS public use data manager at the Center for Financing, Access, and Cost Trends, Agency for Healthcare Research and Quality, 540 Gaither Road, Rockville, MD 20850; 301-427-1406, or e-mail mepspd@ahrq.gov.

Be sure to specify the AHRQ number of the document or CD-ROM you are requesting. Selected electronic files are available through the Internet on the MEPS Web site: http://www.meps.ahrq.gov/

Introduction

The Medical Expenditure Panel Survey (MEPS) is a set of large-scale surveys of families and individuals, their medical providers (doctors, hospitals, pharmacies, etc.), and employers across the United States. MEPS provides estimates of specific health services use by the U.S. civilian noninstitutionalized population, the payments for these services, sources of payment, and the cost and scope of health insurance of U.S. workers. MEPS has three components: the Household Component (HC), Medical Provider Component (MPC) and the Insurance Component (IC). The Household Component collects data from individual households and their members in selected communities across the United States, drawn from a nationally representative subsample of households that participated in the prior year's National Health Interview Survey (conducted by the National Center for Health Statistics). The data collected from households are supplemented by data from their medical providers collected in the MPC. The Insurance Component is a separate survey of employers that provides data on employer-based health insurance.

The MEPS Household Component (which will be generally referred to as MEPS hereafter) collects detailed information for each person in the household on demographic characteristics, health conditions, health status, use of medical services, charges and source of payments, access to care, satisfaction with health care, health insurance coverage, income, and employment. The panel design of the survey, which features five rounds of interviewing covering two full calendar years, makes it possible to determine how changes in individuals’ health status, income, employment, eligibility for public and private insurance coverage, use of services, and payment for care are related.

MEPS data are available on the MEPS Web site in data tables, downloadable data files (person, job, event, or condition level) and interactive data tools, as well as in publications using HC data. The main public use files (PUFs) released from MEPS are the full-year (FY) consolidated file and related medical conditions and medical event level files. The FY file includes records at the person level with family and dwelling unit (DU) identifiers and provides individual level information on health status, socio-demographic characteristics, employment, insurance, access to care, and various other related data items. The conditions file provides detailed information about each health condition reported by the household respondent. It includes records at the person/condition level. Each event file consists of a specific type of event for persons in the corresponding full-year consolidated file. In the conditions file or an event file, some persons may have multiple conditions/events within the year and thus have multiple records, while other individuals may have no medical conditions or events and thus have no records on the particular file. The conditions file or an event file includes identifiers to link each event to the individual in the person-level file who had the condition or the event.

In analyzing MEPS event (or conditions) files, estimates are often produced from the event file without merging all person records from the full file (i.e., only using the subset of persons with events). For point estimates, since the persons with no events make no contribution to event related estimates, there is no need to keep those persons in the file. For variance estimation, however, the persons with events (or conditions) are a subset of all persons, the subset total in the population is not known, and the sample size is random. The theoretically correct approach is therefore to include all persons (with or without an event) in the file and to produce variance estimates by treating the persons with and without events as separate domains. This approach to analysis is called domain analysis. Domain analysis takes the variability into account by using the entire sample in estimating the variance of subgroup estimates. For more information about domain analysis, see Lohr (1999), Cochran (1977), and Fuller et al. (1989). The ‘domain’ statement in SAS survey procedures, the ‘subgroup’ statement in SUDAAN, and similar features in other variance computation software are designed to correctly estimate variances in such situations. If the estimates are produced using the records in the event file only, i.e., using the subset of persons with events from the FY file, the variances may not be estimated correctly. The extent to which the estimated variance from this technically improper analysis deviates from the correct estimate depends on the size of the subset compared to the full file. If the size of the subset is large and the number of variance strata with singleton primary sampling units (PSUs) is small or zero, the impact on the variance estimates will generally be negligible.
This study examines this issue of variance estimation from the MEPS event files, with particular focus on the impact on variance estimates of analyzing the subset of cases with events only without doing a domain analysis that incorporates persons with no reported medical events.

MEPS Event Files

Eight public use event files from MEPS are published each year: prescription medicines, dental visits, other medical expenses, hospital inpatient stays, emergency room visits, outpatient department visits, office-based medical provider visits, and home care. Each of these files is an event-level file consisting of a specific type of event for persons in the corresponding full-year consolidated file. Table 1 summarizes the sizes of the 2008 MEPS event files; brief descriptions of the event files are provided below.

In a prescription medicines event file, each record represents a unique prescribed medicine event that is reported by the household respondent as being purchased or otherwise obtained for a household member. The file contains an identifier for each unique prescribed medicine and information on the detailed characteristics associated with the event; selected Multum Lexicon variables; conditions, if any, associated with the medicine; the date on which the person first used the medicine; total expenditures and sources of payments; and types of pharmacies that filled the household’s prescriptions.

The dental visits event file contains variables pertaining to household reported dental visits. The file includes the date of the dental event, type of provider seen, if the visit was due to an accident, reason for the dental event, whether or not medicines were prescribed, expenditures, and sources of payment.

The other medical expenses event file contains information on the purchase of and expenditures for medical equipment, supplies, glasses and other medical items purchased, and sources of payment. Each record in this file contains information for the whole calendar year for all items except glasses, for which each record contains information for a data collection round.

The hospital inpatient stays event file contains characteristics associated with the hospital inpatient stay event, such as the date of the hospital inpatient stay, reason for the stay, types of services received, condition(s) and procedure(s) associated with the hospital inpatient stay, whether or not medicines were prescribed, expenditures, and sources of payment.

The emergency room visits event file contains characteristics associated with the emergency room visit, such as the date of the visit, types of care and services received, types of medicine prescribed during the visit, condition codes, expenditures, and sources of payment.

The outpatient visits event file contains characteristics associated with the outpatient visit data, such as the date of the visit, type of provider seen, type of care received, type of services provided, expenditures, and sources of payment.

The office-based medical provider visits event file contains characteristics associated with the office-based visit, such as date of the visit, type of provider seen, time spent with the provider, types of treatment and services received, types of medicine prescribed, condition codes, expenditures, and sources of payment.

The home health event file can be used to make estimates of the utilization and expenditures associated with home health care. The file contains monthly information on home health visits: types of providers, types of services received, lengths of visits, reasons for the visits, expenditures, and sources of payment. Each record in this file represents a month of care.

Moreover, the above files include various ID variables that can be used to link events to individuals in the FY person-level file or to other events or conditions in those files.

Domain Variance Estimation

Estimates from sample surveys like MEPS are often produced for different subgroups or subpopulations into which the population can be divided. For example, estimates may be required for groups with different types of health insurance, for persons with and without a health condition or event, or for a particular ethnic group. These subgroups are called domains or subdomains of study. The interest may concentrate on a particular domain in which the persons have certain characteristics or events, e.g., in the analysis of a particular type of event in MEPS. In that case, point estimates can be produced from the domain alone, but the full sample is required for variance estimation. This is called domain analysis.

If the file is subset to the domain of interest only, there is no problem in producing point estimates such as a mean, percentage, or total. However, the variances or standard errors of these point estimates may be computed incorrectly from the subset file, because the subset may not contain the full sample design information or the domain's share of the full population, both of which are needed to compute the variance correctly. This is not a problem if the sample is selected separately from each domain, the domain size in the population is known, and the weighting adjustment is made independently within each domain. When the sample is not selected independently within each domain and the domain sample size is not fixed, the sample size becomes random in repeated draws. Also, if the population total of the domain is not known and not benchmarked at the domain level, the variance of the estimate of a total of a variable (say, total expense) depends not only on the variance of the mean expense per person but also on the variance of the estimate of the total number of persons in the domain. The full file with all domains is required to compute the variance of the total or the proportion that belong to the domain. The estimate of the mean in this case is a ratio estimate, because both the numerator and the denominator of the mean are estimates, and the variance of the mean needs to be estimated by treating it as a ratio. For means, this complication can be avoided by assuming that the sample size for the domain is fixed over repeated draws of samples of the same overall size. The problem is more complex for computing the variance of an estimate of a total.
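
One way to make the ratio argument concrete is the standard Taylor linearization of a domain mean. The notation here is an assumption introduced for illustration and is not taken from the report: $w_i$ is the survey weight, $y_i$ the variable of interest, and $\delta_i$ an indicator equal to 1 when sampled unit $i$ belongs to domain $d$ and 0 otherwise, with sums running over the full sample $s$.

$$\bar{y}_d = \frac{\hat{Y}_d}{\hat{N}_d} = \frac{\sum_{i \in s} w_i \delta_i y_i}{\sum_{i \in s} w_i \delta_i}, \qquad v(\bar{y}_d) \approx v\Big(\sum_{i \in s} w_i z_i\Big), \quad z_i = \frac{\delta_i\,(y_i - \bar{y}_d)}{\hat{N}_d}$$

Because $z_i = 0$ outside the domain, units without events add nothing to the point estimate, but they still define the strata and PSUs over which the variance of $\sum_{i \in s} w_i z_i$ is computed.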

Suppose a simple random sample of size $n$ is selected from a population of size $N$, and the sample randomly includes $n_d$ units from the total of $N_d$ units in domain $d$. Then $w_i = N/n$ is the sampling weight (in the absence of nonresponse) for the $i$th unit, with $i = 1, \ldots, n$. If the variable of interest is $y$, then the population mean, $\bar{Y}_d$, for domain $d$ can be estimated as

$$\bar{y}_d = \frac{\sum_{i \in s_d} w_i y_i}{\sum_{i \in s_d} w_i} = \frac{1}{n_d} \sum_{i \in s_d} y_i$$

where $s_d$ denotes the set of sampled units in domain $d$, and the population total, $Y_d$, for domain $d$ can be estimated as

$$\hat{Y}_d = N_d\,\bar{y}_d \quad \text{if } N_d \text{ is known}$$

$$\hat{Y}_d = \sum_{i \in s_d} w_i y_i = \frac{N}{n} \sum_{i \in s_d} y_i \quad \text{if } N_d \text{ is unknown}$$

This shows that for the point estimation of the mean and total for the domain, only the cases within the domain are required, irrespective of whether the domain total is known or unknown. Of course, when $N_d$ is unknown, it is implicitly estimated from the full sample. However, for variance estimation for domain estimates, since the sample size in the domain, $n_d$, is random, using only the cases within the domain is not sufficient to capture all components of the variance. Moreover, since $N_d$ is implicitly estimated from the sample for the estimate of the total, the full sample is required to capture the variance of this component and accurately compute the variance of the estimate of the total. In this case, the variance of the total is estimated from the full sample as follows:

$$v(\hat{Y}_d) = \frac{N^2 s_u^2}{n} \quad \text{ignoring the finite population correction (fpc)}$$

where $s_u^2 = \frac{1}{n-1} \sum_{i=1}^{n} (u_i - \bar{u})^2$, with $u_i = y_i$ if unit $i$ is in domain $d$ and $u_i = 0$ otherwise, and $\bar{u} = \frac{1}{n} \sum_{i=1}^{n} u_i$. If the variance is calculated from the cases in the domain only, it will not reflect the full variance of the estimate.
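
As a rough numeric illustration of this point, the sketch below compares two variance estimates for a domain total under simple random sampling. The full-sample version sets the variable to zero outside the domain and uses N² times the sample variance of that zero-filled variable divided by n. The domain-only shortcut is an assumption made here for contrast: it treats the domain sample size as fixed and the estimated domain size as known. All numbers are hypothetical.

```python
from statistics import variance  # sample variance, divisor n - 1

N = 1000                            # hypothetical population size
y_domain = [4.0, 6.0, 8.0, 10.0]    # sampled units that fall in the domain
n_outside = 6                       # sampled units outside the domain
n_d = len(y_domain)
n = n_d + n_outside

# Point estimate of the domain total: only domain cases are needed.
total_hat = N / n * sum(y_domain)

# Full-sample variance: zero-fill the variable outside the domain.
u = y_domain + [0.0] * n_outside
var_full = N**2 * variance(u) / n

# Domain-only shortcut (assumed for contrast): n_d treated as fixed and
# the estimated domain size N * n_d / n treated as if it were known.
N_d_hat = N * n_d / n
var_subset = N_d_hat**2 * variance(y_domain) / n_d

print("total:", total_hat)
print("SE, full sample:", var_full ** 0.5)
print("SE, domain only:", var_subset ** 0.5)
```

In this toy sample the domain-only standard error is much smaller than the full-sample one, because it ignores the randomness of the number of sampled units that land in the domain.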

In addition to the theoretical reasons, there are practical reasons for keeping the full file and using the domain option for estimating the variance of a domain estimate. When a complex cluster sample design is used, the variance of a survey estimate is often computed based on variance strata and clusters (PSUs) using the Taylor series approximation. This approach needs at least two PSUs within each variance stratum to compute the variance by accounting for the variance contributions from all strata. If the domain of interest is small or clustered in certain areas, so that some PSUs do not include any case from the domain, then some variance strata appear to have no PSU or only one PSU when the file is subset to the domain. The situation of one PSU within a stratum is known as the singleton PSU problem. In this case, it is not possible for variance computation software to correctly compute the variance from that stratum unless the full file is provided and a domain analysis is requested. In the absence of the full file, different software packages deal with a singleton PSU differently. SAS complex survey procedures exclude the singleton PSUs from variance calculation, SUDAAN imputes a value equal to the overall mean of all other PSUs for the missing PSU if the MISSUNIT option is used, and Stata offers different options, including the approaches SAS and SUDAAN use. Therefore, variance estimates from different software may not be identical. When there are more than two PSUs in a stratum and the domain has cases in at least two PSUs but not in all PSUs, none of the software packages can account for the missing PSUs when the file is subset to the domain. This can also lead to an underestimation of variances. Calculations of degrees of freedom (df), design effect, hypothesis testing, etc. are also affected by this. For example, to compute df, SUDAAN counts the number of PSUs and strata with at least one observation from the domain. In contrast, SAS 9.1 Survey procedures compute df as the number of clusters (PSUs) in the non-empty strata minus the number of non-empty strata after excluding the singleton PSUs. When the df is not correctly computed, it may affect the confidence interval or hypothesis testing and the resulting inference, particularly when the available df is small.
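
The effect of subsetting on singleton strata and df can be sketched with a few lines of Python. This is a simplified reading of the "PSUs minus strata, after excluding singletons" rule described above, not a reproduction of any package's internals; the function name `design_df` and the stratum and PSU codes are hypothetical.

```python
from collections import defaultdict

def design_df(rows):
    """Return (df, singleton_strata) for an iterable of (stratum, psu) pairs.

    df = number of PSUs in strata with at least two PSUs,
         minus the number of such strata.
    """
    psus = defaultdict(set)
    for stratum, psu in rows:
        psus[stratum].add(psu)
    singletons = sorted(h for h, p in psus.items() if len(p) == 1)
    usable = [p for p in psus.values() if len(p) >= 2]
    df = sum(len(p) for p in usable) - len(usable)
    return df, singletons

# Hypothetical person records: (stratum, psu, has_event)
persons = [
    (1, 1, True), (1, 2, True),
    (2, 1, True), (2, 2, False),   # PSU 2 of stratum 2 has no event cases
    (3, 1, False), (3, 1, True), (3, 2, True),
]

full_df, full_singletons = design_df((h, p) for h, p, _ in persons)
sub_df, sub_singletons = design_df((h, p) for h, p, e in persons if e)

print("full file:", full_df, full_singletons)
print("event subset:", sub_df, sub_singletons)
```

With the full file every stratum keeps both PSUs; after subsetting to persons with events, stratum 2 is left with a single PSU and the available df drops.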

Theoretically, the variance is generally underestimated if the full file is not used. But the impact of a singleton PSU may be positive or negative depending on the situation and the software. However, overall the impact of using the subset and not the full file depends on the size and clustering of the domain compared to the full population. As the domain size gets larger, the impact becomes smaller and smaller both from theoretical grounds and because of the smaller number of singleton PSUs.

For further information about variance estimation for domain estimates, see Lohr (1999), Cochran (1977), Fuller et al. (1989), and user manuals for SAS survey procedures (SAS, 2004) and SUDAAN (Shah et al., 1997).

Comparison of Variance Estimates from Event Files

Table 1 presents a comparison of MEPS event files for 2008 in terms of the two factors that may affect the variance estimation from an event file—the file size and the number of variance strata with singleton PSUs. Three files with one or more singleton PSUs are Home Health, Hospital Inpatient Stays, and Outpatient Visits. These files also have the smallest number of persons. To investigate the impact on variance estimates when using the file subset to persons with events only, we compared the variances for selected estimates from these three files. Since these files have the smallest number of persons and have one or more singleton PSUs, any impact on variance from not doing a proper domain analysis should be more pronounced on the estimates from these files.

**Table 1. Sizes of different MEPS Event files in 2008**

| Event file | Total number of event records | Number of persons | Percentage of the full person file | Number of variance strata with single PSU |
|---|---|---|---|---|
| A. Prescribed Medicine | 293,379 | 17,969 | 57.5 | 0 |
| B. Dental Visits | 26,253 | 11,639 | 37.2 | 0 |
| C. Other Medical Expense | 6,787 | 5,251 | 16.8 | 0 |
| D. Hospital Inpatient Stays | 2,821 | 2,113 | 6.8 | 4 |
| E. Emergency Room Visits | 6,115 | 4,165 | 13.3 | 0 |
| F. Outpatient Visits | 11,173 | 3,967 | 12.7 | 1 |
| G. Office-Based Medical Provider Visits | 136,460 | 21,208 | 67.8 | 0 |
| H. Home Health | 4,372 | 692 | 2.2 | 50 |

For the purpose of the analysis, each of these event files is merged with the 2008 FY person file by dwelling unit-person identifier (DUPERSID) and all records with necessary variables from both files are kept on the merged file. The merged file becomes an expanded event level file with one record for each person with no event (with missing values for event related variables) but one or more records for persons with events depending on the number of events. An indicator variable (say, event indicator) is created to indicate if the record came from the event file or not.
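
The merge step can be sketched in plain Python as follows. This is only an illustration of the expanded-file layout described above: the function name `merge_events`, the field names `expense` and `event_ind`, and the identifier values are hypothetical, and DUPERSID is used simply as the linking key.

```python
def merge_events(person_ids, events):
    """Build an expanded event-level file.

    Persons with events get one row per event (event_ind = 1); persons
    without events get a single row with missing event variables
    (event_ind = 0).
    """
    by_person = {}
    for ev in events:
        by_person.setdefault(ev["DUPERSID"], []).append(ev)

    merged = []
    for pid in person_ids:
        person_events = by_person.get(pid)
        if person_events:
            for ev in person_events:
                merged.append(
                    {"DUPERSID": pid, "expense": ev["expense"], "event_ind": 1}
                )
        else:
            merged.append({"DUPERSID": pid, "expense": None, "event_ind": 0})
    return merged

# Hypothetical full-year person IDs and event records
person_ids = ["P1", "P2", "P3"]
events = [
    {"DUPERSID": "P1", "expense": 120.0},
    {"DUPERSID": "P1", "expense": 80.0},
    {"DUPERSID": "P3", "expense": 45.0},
]

merged = merge_events(person_ids, events)
for row in merged:
    print(row)
```

In a real analysis the design variables (variance stratum, PSU, and weight) would be carried along from the person file so that every person, with or without an event, contributes to the variance computation.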

Estimates and standard errors (SEs) are then produced using SAS survey procedures in two different ways: 1) by subsetting to only the records with events, i.e., using a ‘by’ statement in SAS, and 2) by performing a domain analysis using ‘event indicator’ as the domain. With the ‘by’ statement, only the persons with events are included in the analysis, which is equivalent to using the event file. In contrast, with the domain statement, the expanded file with all person records (with and without an event) is used to produce the estimates.
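
The difference between the two approaches can be illustrated with a small sketch of the usual with-replacement (between-PSU) variance estimator for a weighted total. This is a generic textbook estimator written for illustration, not the code behind the report's tables; `total_variance` and the sample records are hypothetical. Skipping strata with a single PSU mimics the exclusion of singleton PSUs described earlier.

```python
from collections import defaultdict

def total_variance(rows):
    """Between-PSU variance of a weighted total.

    rows: iterable of (stratum, psu, weight, value).
    Strata with fewer than two PSUs contribute nothing.
    """
    psu_totals = defaultdict(lambda: defaultdict(float))
    for stratum, psu, weight, value in rows:
        psu_totals[stratum][psu] += weight * value

    variance = 0.0
    for totals in psu_totals.values():
        n_h = len(totals)
        if n_h < 2:
            continue  # singleton stratum is skipped
        mean_h = sum(totals.values()) / n_h
        variance += n_h / (n_h - 1) * sum((t - mean_h) ** 2 for t in totals.values())
    return variance

# Hypothetical records: (stratum, psu, weight, expense, has_event)
sample = [
    (1, "A", 10.0, 5.0, True), (1, "A", 10.0, 0.0, False),
    (1, "B", 10.0, 7.0, True),
    (2, "C", 10.0, 3.0, True),
    (2, "D", 10.0, 0.0, False),   # PSU D has no event cases
]

# Approach 1: subset to records with events (like a 'by' statement).
v_subset = total_variance((h, p, w, y) for h, p, w, y, e in sample if e)

# Approach 2: domain analysis on the full file, zero outside the domain.
v_domain = total_variance((h, p, w, y if e else 0.0) for h, p, w, y, e in sample)

print("subset variance:", v_subset)
print("domain variance:", v_domain)
```

In this toy design, subsetting leaves stratum 2 with one PSU, so its contribution is lost and the subset variance is smaller than the domain variance, while the point estimate of the total is the same either way.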

Tables 2 to 4 present comparative results under the two approaches for a selection of estimates of percentages, means, and totals from the three selected event files. As explained previously, the point estimates are the same under both approaches and the differences are only in standard errors (SEs). The differences in SEs are more pronounced for the estimates from the Home Health file, which has 692 person records and 50 singleton PSUs. They are negligible for the estimates from the Hospital Inpatient Stays file, which has 2,113 person records and only four singleton PSUs, and also negligible for the Outpatient Department visits file, which has 3,967 person records and only one singleton PSU. For example, the SE of the estimate of mean expense per month for home health is 82.05 when produced from the event file and 85.38 when produced from the full file with the domain statement. For hospital inpatient stays, the SE for the estimate of mean out-of-pocket expense per stay is 22.09 when produced from the event file and 22.15 when produced from the full file. For outpatient department visits, the SE for the estimate of mean expense per visit is 51.75 when produced from the event file and 51.66 when produced from the full file. A similar analysis was done using the Emergency Room Visits file, which has 4,165 person records and no singleton PSU, and, not surprisingly, found no difference in the SEs of almost all estimates. It appears that the difference in SEs decreases or disappears as the number of persons increases and the number of singleton PSUs decreases in the event file.

When there is a difference, SEs are generally higher when estimated from the full file with domain analysis than when estimated from the event file. The differences are slightly more pronounced for SEs of totals than for SEs of means and percentages. There is a big difference in degrees of freedom (df) for estimating SEs from the event file and the full file. This is because in the event file some PSUs have no records but in the full file all PSUs have some records with or without events. However, since df is large in both cases, this difference in df will not have any impact on the inference here. If the df were small (say, less than 30) in one or both cases, the inference in terms of statistical testing or forming confidence intervals would be more precise from the full file.

As mentioned earlier, the treatment of singleton PSUs is different in SAS and SUDAAN. In SAS, singleton PSUs are excluded from the estimation of variances, but in SUDAAN, when the MISSUNIT option is used, the overall mean of PSUs is used for the missing PSU to compute variances from a stratum with a singleton PSU. Since the number of strata with singleton PSUs is small for the Inpatient and Outpatient files, there was no noticeable difference between the SE estimates from SAS in tables 3–4 and those from SUDAAN (not shown in any table).

Table 5 presents a comparison when the estimates are produced for subgroups within the Inpatient and Outpatient event files, to see if there is any larger difference in SEs at that level. For this comparison, SEs are produced using three approaches: 1) subsetting the event file to the records in the subgroup of interest (i.e., using a ‘by’ statement in SAS), 2) using the event file and specifying subgroups as a domain, and 3) using the full file (with and without events) and specifying subgroups as a domain. The table shows that when a domain analysis is done, using either the event file or the full file, the differences in SEs are small and negligible. However, if the analysis is done by subsetting the file to the subgroup of interest or by using a ‘by’ statement in SAS, the differences in SEs are substantial. For example, the SE of the estimate of mean expense for hospital inpatient stays for Hispanics is $1,295.6 when the analysis is done using the ‘by’ statement, $1,363.8 when the domain statement is used in the event file, and $1,360.6 when the domain statement is used in the full file. There are also substantial differences in df. The df is substantially smaller when the estimates are produced by subsetting to the subgroup than when the estimates are produced using the domain statement, either from the full event file or from the full file. The df available for variance estimation is large enough in either the full event file or the full file that the difference can be ignored. That is, if the domain statement is used for subgroup analysis, either the event file or the full file can be used without worrying about a substantial impact on the estimates of SEs or df, for all event files except the Home Health file. A ‘by’ statement or further subsetting of the file to the subgroup of interest should never be used for analyzing any subgroup within an event file.
In this comparison, the Home Health file is not included as the SEs are showing differences even at the overall level. Therefore, a domain analysis with the full Home Health file should always be used for estimation either at the overall or at the subgroup level.

**Table 2. Comparison of variance estimates obtained from Home Health Visit Event file and from merging of Event and Full Year files**

| Variable | Estimate | SE of estimate: Event file | SE of estimate: Full file | Degrees of freedom: Event file | Degrees of freedom: Full file |
|---|---|---|---|---|---|
| Insurance status | | | | | |
| Private | 29.76% | 2.70 | 2.86 | 106 | 205 |
| Public | 69.49% | 2.72 | 2.88 | | |
| Uninsured | 0.75% | 0.18 | 0.20 | | |
| Race/ethnicity | | | | | |
| Hispanic | 9.93% | 1.70 | 1.78 | 106 | 205 |
| NH-black | 16.83% | 1.49 | 1.68 | 106 | 205 |
| Provider work for agency, hospital, nursing home? | | | | | |
| Yes | 74.64% | 2.42 | 2.52 | 106 | 205 |
| Any care due to hospitalization? | | | | | |
| Yes | 35.33% | 2.50 | 2.59 | 106 | 205 |
| Expense ($) | | | | | |
| Mean/visit | $1,366 | 82.05 | 85.38 | 106 | 205 |
| Total | $48.87B | 4.72B | 4.99B | 106 | 205 |
| OOP expense ($) | | | | | |
| Mean/visit | $153.7 | 37.17 | 37.62 | 106 | 205 |
| Total | $5.49B | 1.54B | 1.55B | 106 | 205 |

**Table 3. Comparison of variance estimates obtained from Hospital Inpatient Stay Event file and from merging of Event and Full Year files**
Variable	Estimate	SE (event file)	SE (full file)	df (event file)	df (full file)
Insurance status
Private	54.58%	1.65	1.66	179	205
Public	40.27%	1.60	1.61
Uninsured	5.15%	0.60	0.60
Had surgery	39.20%	1.34	1.35	179	205
Race/ethnicity
Hispanic	10.58%	1.03	1.03	179	205
NH-black	12.79%	1.01	1.02	179	205
Expense ($)
Mean/visit	$11,349	427.11	424.59	179	205
Total	$329.9B	17.43B	17.73B	179	205
OOP expense ($)
Mean/visit	$312.6	22.09	22.15	179	205
Total	$9.09B	0.669B	0.675B	179	205
Number of nights
Mean/visit	5.22	0.21	0.21	179	205
Total	151.8M	8.46M	8.58M	179	205

**Table 4. Comparison of variance estimates obtained from Outpatient Department Event file and from merging of Event and Full Year files**
Variable	Estimate	SE (event file)	SE (full file)	df (event file)	df (full file)
Insurance status
Private	66.07%	2.82	2.82	191	205
Public	30.48%	2.90	2.90
Uninsured	3.45%	0.51	0.51
Any surgery?
Yes	12.24%	0.78	0.78	191	205
Race/ethnicity
Hispanic	6.08%	0.77	0.77	191	205
NH-black	10.48%	1.45	1.45	191	205
Expense ($)
Mean/visit	$792.67	51.75	51.66	191	205
Total	$97.76B	7.23B	7.25B	191	205
OOP expense ($)
Mean/visit	$66.60	5.28	5.28	191	205
Total	$8.21B	593M	595M	191	205

**Table 5. Comparison of variance estimates obtained for expense estimates from Event File and from merging of Event and Full Year files for subpopulations**
Estimate	Value	SE: event file, subset	SE: event file, domain²	SE: full file¹, domain²	df: event file, subset	df: event file, domain²	df: full file¹, domain²
Expense: Hospital Inpatient Stay
Mean for Hispanics	$12,081	1,295.6	1,363.8	1,360.6	57	179	205
Mean for blacks	$10,557	887.0	1,011.5	1,003.4	49	179	205
Total for Hispanics	$37.2B	5.73B	6.14B	6.16B	57	179	205
Total for blacks	$39.3B	3.18B	4.37B	4.46B	49	179	205
Expense: Outpatient Department Visits
Mean for Hispanics	$694	76.73	79.83	79.87	79	191	205
Mean for blacks	$734	63.19	72.15	72.20	74	191	205
Total for Hispanics	$5.24B	627M	665M	666M	79	191	205
Total for blacks	$9.06B	1,025M	1,310M	1,311M	74	191	205

¹ Merging of the event file and full person file
² Domain analysis within the whole file

Conclusion

In theory, variances of estimates from MEPS event files should be estimated by merging the event file with the FY person file and then using a domain analysis. This is required to account for the extra variance that arises because the number of persons in an event file (i.e., the sample size) is random and the corresponding population size is unknown. However, the impact of not doing a domain analysis on variance estimates is usually small when the subgroup is large and the number of singleton PSUs is small. To assess the impact on variances of producing estimates from MEPS event files without merging with the FY person file, an analysis was performed using the four smallest event files, and the SEs of selected estimates were compared.

The analysis shows that SEs are somewhat distorted (by about 5 percent) for the Home Health file if the estimates are not produced from the full file with the domain option, but this is not a notable problem for the other event files. Generally, the differences in variances between the full and subsetted files are slightly larger for estimates of totals than for means and proportions. The df available for estimating variances also differ, but the difference is ignorable because the df available under both approaches is sufficient. For subgroups within an event file, the differences in variances are negligible whether the estimates are produced from the full event file or from the full file, as long as the subgroup is treated as a domain. However, if a domain analysis is not done, the variance estimates can be considerably biased and the available df can be insufficient.
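
As a rough arithmetic check of the "about 5 percent" figure, the sketch below computes how far each event-file SE in Table 2 falls short of the corresponding full-file SE. Taking the full-file SE as the base and summarizing with the median are assumptions made here for illustration; the report does not state how the figure was derived. The row labels are shortened versions of the Table 2 row names.

```python
from statistics import median

# (label, SE from event file, SE from full file), copied from Table 2.
table2_se = [
    ("Private", 2.70, 2.86),
    ("Public", 2.72, 2.88),
    ("Uninsured", 0.18, 0.20),
    ("Hispanic", 1.70, 1.78),
    ("NH-black", 1.49, 1.68),
    ("Agency provider: yes", 2.42, 2.52),
    ("Care due to hospitalization: yes", 2.50, 2.59),
    ("Mean expense per visit", 82.05, 85.38),
    ("Total expense ($B)", 4.72, 4.99),
    ("Mean OOP expense per visit", 37.17, 37.62),
    ("Total OOP expense ($B)", 1.54, 1.55),
]

def relative_shortfall(se_event, se_full):
    """Percent by which the event-file SE falls below the full-file SE."""
    return 100.0 * (se_full - se_event) / se_full

shortfalls = {label: relative_shortfall(e, f) for label, e, f in table2_se}
median_shortfall = median(shortfalls.values())
largest = max(shortfalls, key=shortfalls.get)
```

Under these assumptions the shortfalls range from under 1 percent (total OOP expense) to about 11 percent (NH-black), with a median of roughly 4.5 percent, which is broadly consistent with the figure quoted above.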

In summary, for the Home Health file, variances should always be estimated by merging the event file with the full file and using the domain option. For all other event files, the analysis can be done using the event file only (i.e., without merging with the full file) without any noticeable impact on the estimates of variances or df. For subgroups within an event file, the domain option should always be used; the file should never be further subset to the subgroup of interest, and the ‘by’ statement should never be used.
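
The merging step recommended for the Home Health file can be sketched as follows. This is a minimal illustration in Python, not AHRQ code: the field names (`person_id`, `stratum`, `psu`, `weight`, `expense`) and the function `merge_events_with_persons` are hypothetical stand-ins for the corresponding identifiers, variance-estimation variables, and weight on the actual files. Every event record inherits the design variables of its person, and each person with no events contributes one placeholder row flagged as outside the domain, so that all strata and PSUs remain available to the variance estimator.

```python
def merge_events_with_persons(persons, events):
    """Return event-level analysis rows covering the whole person sample."""
    by_id = {p["person_id"]: p for p in persons}
    rows, persons_with_events = [], set()

    # One row per event, carrying the person's design variables and weight.
    for e in events:
        p = by_id[e["person_id"]]
        rows.append({"person_id": p["person_id"], "stratum": p["stratum"],
                     "psu": p["psu"], "weight": p["weight"],
                     "expense": e["expense"], "in_domain": True})
        persons_with_events.add(p["person_id"])

    # One placeholder row per person without events, flagged out of domain.
    for p in persons:
        if p["person_id"] not in persons_with_events:
            rows.append({"person_id": p["person_id"], "stratum": p["stratum"],
                         "psu": p["psu"], "weight": p["weight"],
                         "expense": 0.0, "in_domain": False})
    return rows

# Hypothetical example: three sampled persons, one of whom has no events.
persons = [
    {"person_id": "A", "stratum": 1, "psu": 1, "weight": 1200.0},
    {"person_id": "B", "stratum": 1, "psu": 2, "weight": 950.0},
    {"person_id": "C", "stratum": 1, "psu": 3, "weight": 1100.0},
]
events = [
    {"person_id": "A", "expense": 400.0},
    {"person_id": "A", "expense": 150.0},
    {"person_id": "B", "expense": 900.0},
]
merged = merge_events_with_persons(persons, events)
psus_in_event_file = {by["psu"] for by in merged if by["in_domain"]}
psus_in_merged_file = {by["psu"] for by in merged}
```

Here the event file alone covers two PSUs, while the merged file covers all three. The merged rows would then be analyzed with the domain option set on `in_domain`, not by filtering on it.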

However, these conclusions are based on the sizes of the 2008 event files and will remain valid only as long as the MEPS sample size (and hence the sizes of the event files) remains stable from year to year. If the overall sample size decreases substantially, the conclusions may no longer apply, and the caveats described above may need to be extended to event files other than Home Health.

Finally, the reasons for domain analysis discussed in this report also apply to the analysis of any subset of a full person file. Generally, a domain analysis should be used for subgroup estimates from the full person file unless the impact of subsetting the file has been assessed. This is particularly important when the subgroup is not large and may be clustered geographically.

