Hospital Visits after Hospital Outpatient Surgery measures the facility-level risk-standardized rate of acute, unplanned hospital visits within 7 days of a procedure performed at a hospital outpatient department (HOPD) among Medicare Fee-For-Service (FFS) patients aged 65 years and older. An unplanned hospital visit is defined as an emergency department (ED) visit, observation stay, or unplanned inpatient admission.
-
-
1.5 Measure Type
1.6 Composite Measure
No
1.7 Electronic Clinical Quality Measure (eCQM)
1.8 Level Of Analysis
1.9 Care Setting
1.10 Measure Rationale
The goal of this measure is to reduce adverse patient outcomes associated with preparation for same-day surgery, the surgery itself, and follow-up care, by capturing unplanned hospital visits following outpatient surgery and making them more visible to providers and patients. The measure score provides an assessment of quality that is publicly reported and informs quality improvement.
This measure is not a paired measure and does not require reporting with other measures to appropriately interpret results.
1.11 Measure Webpage
1.20 Testing Data Sources
1.25 Data Sources
Medicare administrative claims and enrollment data
-
1.14 Numerator
The outcome is all-cause, unplanned hospital visits, defined as 1) an inpatient admission directly after the surgery or 2) an unplanned hospital visit (ED visit, observation stay, or unplanned inpatient admission) occurring after discharge and within 7 days of the surgical procedure.
1.14a Numerator Details
Because this measure is an outcome measure, it does not have a traditional numerator. We use this section to describe the outcome.
The outcome for this measure is all-cause, unplanned hospital visits, defined as
1. An inpatient admission directly after the surgery or
2. An unplanned hospital visit (ED visit, observation stay, or unplanned inpatient admission) occurring after discharge and within 7 days of the surgical procedure.
If more than one unplanned hospital visit occurs in the 7 days following the surgical procedure, only the first hospital visit within the outcome timeframe is counted in the outcome. If there are two surgical procedures within a 7-day period, we adjust the follow-up period of the first procedure to be the time between the first procedure and the second procedure. The second procedure’s follow-up period remains 7 days post-procedure. Thus, hospital visit outcomes are assigned to the first procedure if they occur during the time between procedures, while outcomes in the 7 days following the second procedure are assigned to the second procedure.
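The attribution rule above can be illustrated with a minimal Python sketch. It is not the measure's production logic. The function name `assign_outcomes`, the use of calendar dates, and the handling of a visit that falls on the same day as the second procedure (assigned here to the second procedure) are assumptions made for illustration. The sketch also assumes that each procedure falls on a distinct date.

```python
from datetime import date, timedelta

FOLLOW_UP_DAYS = 7  # outcome window stated in the measure description


def assign_outcomes(procedure_dates, visit_dates):
    """Map each procedure date to the first qualifying visit date, or None.

    Hypothetical helper: assumes distinct procedure dates and that a visit
    on the day of a later procedure belongs to that later procedure.
    """
    procs = sorted(procedure_dates)
    visits = sorted(visit_dates)
    outcomes = {}
    for i, proc in enumerate(procs):
        last_day = proc + timedelta(days=FOLLOW_UP_DAYS)
        if i + 1 < len(procs):
            # Shorten the follow-up so it ends before the next procedure.
            last_day = min(last_day, procs[i + 1] - timedelta(days=1))
        in_window = [v for v in visits if proc <= v <= last_day]
        # Only the first visit in the window counts toward the outcome.
        outcomes[proc] = in_window[0] if in_window else None
    return outcomes


example = assign_outcomes(
    [date(2023, 1, 1), date(2023, 1, 4)],
    [date(2023, 1, 3), date(2023, 1, 6), date(2023, 1, 9)],
)
```

In this example the January 3 visit is assigned to the first procedure, and the January 6 visit (the first of two visits in the second window) is assigned to the second procedure.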
Planned Admission Algorithm
For inpatient admissions occurring after Day 1 following surgery, we only include unplanned admissions in the measure outcome. We consider admissions occurring on the day of the surgery (Day 0) and Day 1 post-surgery “unplanned” as the vast majority of these admissions are inpatient admissions directly following surgery. “Planned” admissions are those planned by providers for anticipated medical treatment or procedures that must be provided in the inpatient setting. We do not count these in the outcome because variation in planned admissions does not reflect quality differences.
To identify admissions as planned or unplanned we use an algorithm we previously developed for CMS’s hospital readmission measures, CMS Planned Readmission Algorithm (PRA) Version 4.0. In brief, the algorithm uses the procedure codes and principal discharge diagnosis code on each hospital claim to identify admissions that are typically planned and may occur after a surgery. A few specific, limited types of care are always considered planned (for example, major organ transplant, rehabilitation, or maintenance chemotherapy). Otherwise, a planned admission is defined as a non-acute admission for a scheduled procedure (for example, total hip replacement or cholecystectomy). Post-discharge admissions for an acute illness or for complications of care are never considered planned.
Also, the measure never considers ED visits or observation stays as planned. The most recently published methodology report provides a detailed description of the planned admission algorithm adapted for the surgery measure: https://qualitynet.cms.gov/files/651b5fb570a30f001c388004?filename=2023_OQR_MeasureUpdatesRpt.pdf. The codes that define ED visits and observation stays are in the attached data dictionary, sheet “HOPD_Surgy__ED_Obs_Stay_Def”.
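As a rough illustration of how these rules combine, the sketch below decides whether a single visit counts toward the outcome. The function name, the visit-type labels, and the `flagged_planned` argument are hypothetical. In particular, `flagged_planned` stands in for the result of the Planned Readmission Algorithm, which is not implemented here.

```python
def counts_as_outcome(visit_type, days_after_surgery, flagged_planned=False):
    """Return True if a visit would count toward the measure outcome.

    visit_type: "ed", "observation", or "inpatient" (illustrative labels).
    days_after_surgery: 0 for the day of surgery, 1 for the next day, etc.
    flagged_planned: assumed output of the planned admission algorithm.
    """
    if not 0 <= days_after_surgery <= 7:
        return False  # outside the 7-day outcome window
    if visit_type in ("ed", "observation"):
        return True  # ED visits and observation stays are never planned
    if visit_type == "inpatient":
        if days_after_surgery <= 1:
            return True  # Day 0 and Day 1 admissions are treated as unplanned
        return not flagged_planned  # after Day 1, only unplanned admissions
    raise ValueError(f"unknown visit type: {visit_type}")
```

For example, an inpatient admission on Day 3 that the algorithm flags as planned would not count, while the same admission on Day 1 would.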
-
1.15 Denominator
The denominator is defined as eligible same-day surgeries or cystoscopy procedures with intervention performed at HOPDs for Medicare FFS patients aged 65 years and older with the exception of eye surgeries and same day surgeries performed concurrently with high-risk procedures. The measure includes:
*Surgeries and procedures that are substantial and are typically performed as same-day surgeries.
*Surgeries on patients aged 65 or over.
*When multiple procedures occur concurrently, only surgeries that are not performed concurrently with a high-risk procedure are included.
*Surgeries for patients with continuous enrollment in Medicare Fee-for-Service (FFS) Parts A and B in the 12 months prior to the surgery.
1.15a Denominator Details
The surgery measure was developed to improve the quality of care delivered to patients undergoing hospital outpatient surgeries. In brief, the surgery measure includes all hospital outpatient departments (HOPDs) that performed qualifying surgeries during the performance period. The target population for this measure is Medicare Fee-for-Service (FFS) patients aged 65 years and older who have had surgery performed in an HOPD during the performance period. We list the specific inclusion criteria below and note that this measure is procedure-based (not patient-based).
Further information on the measure development process is available in the Hospital Visits After Hospital Outpatient Surgeries: Measure Technical Report (2014) and 2016 Technical Report Addendum: https://www.cms.gov/Medicare/Quality-Initiatives-Patient-Assessment-Instruments/HospitalQualityInits/Downloads/Hospital-Visits-after-Hospital-Outpatient-Surgery-Measure.pdf
Inclusion Criteria
1. Surgeries and procedures that are substantial and are typically performed as same-day surgeries.
Rationale: The target cohort is low-to-moderate-risk surgeries that can be safely performed as same-day surgeries and do not typically require an overnight stay or an inpatient admission. In addition, they do not occur in conjunction with a same-day emergency department (ED) visit or observation stay. We define same-day surgeries using CMS’s list of covered ambulatory surgery center (ASC) procedures. The list consists of procedures for which patients are expected to return home the same day as their procedure. We further restrict Medicare’s list of covered ASC procedures using the Global Surgical Package (GSP) indicator and include two types of procedures from this list:
*Substantive surgeries performed at HOPDs (except eye surgeries)
Rationale: Ambulatory procedures include a heterogeneous mix of non-surgical procedures, minor surgeries, and more substantive surgeries. We want to include substantive surgeries but not very low-risk (minor) surgeries or non-surgical procedures, which typically have a high volume and a very low outcome rate. We define substantive procedures using the Medicare Physician Fee Schedule (MPFS) global surgery indicator (GSI) code 090.
*Cystoscopy procedures with intervention
Rationale: All endoscopy procedures are considered non-surgical procedures based on Medicare coding (GSI code 000). However, we include cystoscopy with intervention because it is a common procedure, often performed for therapeutic intervention by surgical teams, and the outcome rate and causes of hospital visits post-procedure are similar to those for surgeries in the measure cohort.
Please refer to the data dictionary “HOPD_Surg_Cohort” to review the list of qualifying same-day surgeries, including cystoscopy procedures with intervention. The data dictionary “HOPD_Surg_Eye_Exclusions” provides the list of eye surgeries that are excluded from the measure cohort.
2. Surgeries on patients aged 65 or over.
Rationale: Medicare beneficiaries under age 65 are typically a highly diverse group with a higher burden of disability, and it is therefore difficult to adequately risk adjust for the under-65 population.
3. When multiple procedures occur concurrently, only surgeries that are not performed concurrently with a high-risk procedure are included.
Rationale: Occasionally, more than one surgery may be performed and some of these surgeries may be higher-risk procedures. When multiple procedures occur, we only include surgeries that are not performed concurrently with high-risk procedures. Please refer to the data dictionary “HOPD_Surg_High_Risk_Exclusions” tab to review the list of high-risk procedures. High-risk procedures are identified using the Hospital Outpatient PPS Addendum B. A procedure is considered high-risk if it is flagged as “Inpatient Only” (not paid under OPPS) or “Outpatient Only” (paid under OPPS, but not on the list of ASC-approved procedures). Removing these procedures aligns the cohort with the measure’s restriction to ASC-covered procedures.
4. Surgeries for patients with continuous enrollment in Medicare Fee-for-Service (FFS) Parts A and B in the 12 months prior to the surgery.
Rationale: Patients with full enrollment have all claims available for identifying comorbidities for risk adjustment.
Citations
Centers for Medicare & Medicaid Services (CMS). Three Day Payment Window. 2013; http://www.cms.gov/Medicare/Medicare-Fee-for-Service-Payment/AcuteInpatientPPS/Three_Day_Payment_Window.html
-
1.15b Denominator Exclusions
The following surgeries are excluded from the denominator:
1. Surgeries for patients without continuous enrollment in Medicare FFS Parts A and B in the 7 days after the surgery.
2. Surgeries for patients who have an ED visit on the same day but are billed on a separate claim, unless the ED visit has a diagnosis indicative of a complication of care.
3. Surgeries that are billed on the same hospital claim as an emergency department (ED) visit and that occur on the same calendar day, unless the ED visit has a diagnosis indicative of a complication of care.
4. Surgeries that are billed on the same hospital outpatient claim and that occur after the ED visit.
5. Surgeries that are billed on the same outpatient claim as an observation stay.
1.15c Denominator Exclusions Details
Exclusion Criteria
1. Surgeries for patients without continuous enrollment in Medicare FFS Parts A and B in the 7 days after the surgery.
Rationale: We exclude these patients to ensure all patients have full data available for outcome assessment.
2. Surgeries for patients who have an ED visit on the same day but are billed on a separate claim, unless the ED visit has a diagnosis indicative of a complication of care.
Rationale: It is unclear whether a same-day ED visit occurred before or after an eligible same-day surgery. However, the measure will not exclude surgeries with same-day, separate-claim ED visits if the diagnoses are indicative of a complication of care because we want to continue to capture these outcomes. The ICD-10-CM codes that define complications of care are in the attached Data Dictionary, sheet “HOPD_Surg_ED_Excl_CoC”.
3. Surgeries that are billed on the same hospital claim as an emergency department (ED) visit and that occur on the same calendar day unless the ED visit has a diagnosis indicative of a complication of care.
Rationale: In these situations, it is not possible to use claims data to determine whether the surgery was the cause of, subsequent to, or during the ED visit. However, if the ED visit is coded with a diagnosis for a complication, the assumption is that it occurred after the surgery. The ICD-10-CM codes that define complications of care are in the attached Data Dictionary, sheet “HOPD_Surg_ED_Excl_CoC”.
4. Surgeries that are billed on the same hospital outpatient claim and that occur after the ED visit.
Rationale: In these situations, we assume that the surgery was subsequent to the ED visit and may not represent a routine surgery.
The timing of the ED visits is determined using revenue center dates from the outpatient claim.
5. Surgeries that are billed on the same outpatient claim as an observation stay.
Rationale: We do not include these cases in the calculation because the sequence of events is not clear.
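The five exclusion criteria can be read as an ordered set of checks on each surgery. The sketch below is one possible encoding, assuming that the claim-level facts have already been derived and stored as boolean flags. The `SurgeryClaim` class, its field names, and the use of a single complication-of-care flag for the ED visit are illustrative assumptions, not part of the measure specification.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SurgeryClaim:
    """Hypothetical pre-derived flags for one candidate surgery."""
    enrolled_7_days_after: bool = True        # FFS Parts A and B after surgery
    same_day_ed_separate_claim: bool = False  # ED visit billed on another claim
    same_day_ed_same_claim: bool = False      # ED visit billed on the same claim
    ed_complication_dx: bool = False          # ED diagnosis is a complication of care
    surgery_after_ed_same_claim: bool = False # per revenue center dates
    observation_same_claim: bool = False


def exclusion_rule(s: SurgeryClaim) -> Optional[int]:
    """Return the number of the first exclusion that applies, or None."""
    if not s.enrolled_7_days_after:
        return 1
    if s.same_day_ed_separate_claim and not s.ed_complication_dx:
        return 2
    if s.same_day_ed_same_claim and not s.ed_complication_dx:
        return 3
    if s.surgery_after_ed_same_claim:
        return 4
    if s.observation_same_claim:
        return 5
    return None  # surgery stays in the denominator
```

A surgery with a same-day, separate-claim ED visit coded as a complication of care returns `None` and is retained, consistent with the rationale for exclusion 2.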
-
OLD 1.12 MAT output not attached
Attached
1.13 Attach Data Dictionary
1.13a Data dictionary not attached
Yes
1.16 Type of Score
1.16a Other Scoring Method
Ratio
1.17 Measure Score Interpretation
Better quality = Lower score
1.18 Calculation of Measure Score
The measure score is a facility-level risk-standardized hospital visit ratio (RSHVR). The RSHVR is calculated as the ratio of the predicted to the expected number of post-surgical unplanned hospital visits among an HOPD’s patients. For each HOPD, the numerator of the ratio is the number of hospital visits predicted for the HOPD’s patients, accounting for its observed rate, the number and complexity of the procedures performed at the HOPD, and the patient mix. The denominator is the number of hospital visits expected nationally for the HOPD’s case/procedure mix. To calculate an HOPD’s predicted-to-expected (P/E) ratio, the measure uses a two-level hierarchical logistic regression model. The log-odds of the outcome for an index procedure is modeled as a function of patient demographics, comorbidities, procedure characteristics, and a random HOPD-specific intercept. A ratio greater than one indicates that the HOPD’s patients have more post-surgical visits than expected, compared to an average HOPD with similar patient and procedural complexity. A ratio less than one indicates that the HOPD’s patients have fewer post-surgical visits than expected, compared to an average HOPD with similar patient and procedural complexity. For details on the measure calculation, please see the 2023 Hospital Visits after Hospital Outpatient Surgery Measure Update Report: https://qualitynet.cms.gov/files/651b5fb570a30f001c388004?filename=2023_OQR_MeasureUpdatesRpt.pdf
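To make the predicted-to-expected ratio concrete, the sketch below computes it for one facility from a hierarchical logistic model that is assumed to be already fitted. The function names, the inputs, and the numeric values are all hypothetical. `risk_scores` stands for each procedure's linear predictor from the patient and procedure covariates (excluding any intercept), `facility_intercept` for the HOPD-specific intercept, and `average_intercept` for the intercept of an average HOPD. Model fitting itself is not shown.

```python
import math


def inv_logit(x):
    """Convert log-odds to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def rshvr(risk_scores, facility_intercept, average_intercept):
    """Predicted-to-expected ratio for one facility (illustrative only).

    Predicted uses the facility's own intercept; expected uses the
    intercept of an average facility with the same case/procedure mix.
    """
    predicted = sum(inv_logit(facility_intercept + s) for s in risk_scores)
    expected = sum(inv_logit(average_intercept + s) for s in risk_scores)
    return predicted / expected


# Made-up linear predictors for three index procedures at one HOPD.
scores = [0.1, -0.3, 0.5]
ratio_worse = rshvr(scores, facility_intercept=-2.0, average_intercept=-2.5)
ratio_better = rshvr(scores, facility_intercept=-3.0, average_intercept=-2.5)
```

Because both sums run over the same procedures, the ratio differs from one only through the facility intercept: an intercept above the average yields a ratio greater than one, and an intercept below the average yields a ratio less than one.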
Below we provide the individual steps to calculate the measure score:
1. Identify surgeries meeting the inclusion criteria described in the denominator section above, and in Tab 1, “HOPD Surg Cohort,” of the data dictionary.
2. Exclude procedures meeting any of the exclusion criteria described in the exclusion section above.
3. Identify a binary flag for an unplanned hospital visit within 7 days of index procedures as described above.
4. Use patients’ historical and index procedure claims data to create risk-adjustment variables.
5. Fit a hierarchical generalized linear model (HGLM) and use the results to calculate, for each facility, the ratio of the number of “predicted” hospital visits to the number of “expected” hospital visits given its case/procedure mix. This is the risk-standardized hospital visit ratio (RSHVR). The HGLM adjusts for age, clinical risk factors, and procedure RVU and body system, which vary across patient populations, are unrelated to quality, and influence the outcome. Details about the risk-adjustment model can be found in the original measure development methodology report: Hospital Visits after Hospital Outpatient Surgery Measure Technical Report at https://www.qualitynet.org/files/5d0d3a7e764be766b0104644?filename=2016HOPDSurgeryTechReport.pdf
6. Use statistical bootstrapping to construct a 95% confidence interval estimate for each facility’s RSHVR. For more information about the measure methodology, please see the most recent Hospital Visits after Hospital Outpatient Surgery Measure Update Report posted here: https://qualitynet.cms.gov/files/651b5fb570a30f001c388004?filename=2023_OQR_MeasureUpdatesRpt.pdf
1.19 Measure Stratification Details
This measure is currently stratified by dual eligibility (DE), reported confidentially to hospitals.
In the Calendar Year (CY) 2022 OPPS Proposed Rule, CMS described a plan to stratify reporting in the HOPD setting using the two disparity methods described below, and identified the HOPD Surgery measure as one of six priority measures in the Hospital Outpatient Quality Reporting (OQR) program for confidential disparity reporting stratified by patient DE.
The two stratification methods are:
1. The Within-Facility Disparity Method, which highlights differences in outcomes for patient groups based on social risk factors within an HOPD, and
2. The Across-Facility Disparity Method, which illuminates variation in healthcare quality for patients with social risk factors across facilities.
The two methods are described in more detail below and visually shown in Figure 1 (see attachment with tables and figures). Details of the methodology can be found here: https://qualitynet.cms.gov/files/652fd45a8be3e0001c0b5141?filename=CY23_OP_32_35_36_DsprtySpecs.pdf
The Within-Facility Disparity Method reports differences in health outcomes between patient populations in the same facility. The goal of this method is to assess the difference in outcomes for two patients with the same condition and medical history but with different social risks. This method can answer the question: “Does a patient with a social risk factor experience similar health outcomes as a patient without that social risk factor when cared for at the same facility?”
The Across-Facility Disparity Method reports facility outcome rates for one patient population with a particular social risk factor across facilities. This method can answer the question: “How does the outcome rate for patients with a social risk factor at a specific facility compare to the outcome rate for patients with that social risk factor at an average facility?”
Please see the “Equity” section for results for the HOPD Surgery measure when applying these two disparity methods.
1.26 Minimum Sample Size
There is no minimum sample size required to calculate the measure score. All facilities can benefit from receiving individual facility results. Programs using the measure should consider reliability when choosing sample sizes for measure use. For example, for the Hospital Outpatient Quality Reporting program, a facility must perform 30 qualifying procedures to receive a publicly reported measure score.
-
Most Recent Endorsement Activity
Cost and Efficiency Fall 2023
Initial Endorsement
Last Updated
-
Steward
Centers for Medicare & Medicaid Services
Steward Organization POC Email
Steward Organization URL
Steward Organization Copyright
Not Applicable
Measure Developer Secondary Point Of Contact
Oscar Gonzalez
Acumen LLC
500 Airport Blvd., Suite 100
Burlingame, CA 94010
United States
Measure Developer Secondary Point Of Contact Email
-
-
-
2.1 Attach Logic Model
2.2 Evidence of Measure Importance
The outcome of unplanned hospital visits following outpatient same-day surgery is a widely accepted measure of outpatient surgical care quality. Differences across hospitals in risk-standardized, post-procedure unplanned hospital visits are likely related to quality of care rather than to pre-existing medical conditions or chance. Studies have consistently shown that post-operative complications and poorly controlled symptoms are the primary contributors to unexpected hospital visits following outpatient surgery (Desai et al., 2022). This measure provides the opportunity to improve the quality of care and to lower rates of adverse events leading to hospital visits after outpatient surgery.
A recent study of Medicare FFS beneficiaries aged 65 years and older across more than 4,000 HOPDs showed that nearly 8% of all hospital outpatient surgeries were followed by an unplanned hospital visit within 7 days (Desai et al., 2022). Additionally, if a patient received the same procedure at a lower-quality versus higher-quality hospital, their risk of an unplanned hospital visit within 7 days would increase by 29% (Desai et al., 2022).
Estimates of hospital visit rates within the first 30 days following surgery vary from less than 1% to 28% depending on the type of surgery and body system, the outcome measured (inpatient admissions alone or with ED visits and observation stays), the outcome timeframe (e.g., 7, 14, or 30 days), and patient characteristics (e.g., age, sex) (Christian et al., 2019; De Oliveira et al., 2015; Liu et al., 2018a; Liu et al., 2018b). For example, unadjusted 7-day rates of unplanned hospital visits varied substantially by body system, ranging from a low of 4.4% for procedures of the ear to a high of 14.3% for male genital organ procedures, such as transurethral electrosurgical resection of prostate and laser vaporization of prostate (Desai et al., 2022).
Common causes of preventable return visits following outpatient surgery include surgical errors, post-operative pain, infection, nausea, urinary retention, and vomiting (Desai et al., 2022; Liu et al., 2018a; Liu et al., 2018b). In one 2017 study of patients undergoing outpatient laparoscopic cholecystectomy, 60% of hospital return visits were due to these preventable events (Rosero et al., 2017). Other less common, but more serious, reasons for return hospital visits include bleeding, respiratory complications, deep vein thrombosis, cardiac complications, and urinary complications (De Oliveira et al., 2015; Liu et al., 2018a; Liu et al., 2018b; Rosero et al., 2017). Patient characteristics, such as age, sex, and comorbidities such as diabetes, can increase the risk of an admission (De Oliveira et al., 2015; Christian et al., 2019). In addition, clinical procedural factors can increase the risk, such as the type of anesthesia used, and longer operation time (Liu et al., 2018a; Christian et al., 2019).
Potential quality improvement actions include appropriate patient selection, improving surgical techniques, implementing protocols to address common problems such as adequate control of nausea and vomiting and postoperative pain, patient education about potential adverse effects of the surgery, reconciling patient medications, and organizing appropriate follow-up care with providers such as primary care physicians. For example, guidelines recommend multi-modal approaches for treatment of post-operative pain (Chou et al., 2016) as well as routine multi-modal nausea and vomiting prophylaxis for all patients (Gan et al., 2020). Facilities can also provide support for identifying and managing patient-level risk factors; for example, identifying patients with diabetes can ensure optimal care during the perioperative period regarding prevention of hyperglycemia (Thompson et al., 2016).
Citations
A Proposed Rule by the Centers for Medicare & Medicaid Services on 07/31/2023. (2023, July 31). Federal Register; National Archives. https://www.govinfo.gov/content/pkg/FR-2023-07-31/pdf/2023-14768.pdf
Christian RA, Gibbs DB, Nicolay RW, Selley RS, Saltzman MD. Risk factors for admission after shoulder arthroscopy. J Shoulder Elbow Surg. 2019 May;28(5):882-887.
Chou R, Gordon DB, de Leon-Casasola OA, Rosenberg JM, Bickler S, Brennan T, Carter T, Cassidy CL, Chittenden EH, Degenhardt E, Griffith S, Manworren R, McCarberg B, Montgomery R, Murphy J, Perkal MF, Suresh S, Sluka K, Strassels S, Thirlby R, Viscusi E, Walco GA, Warner L, Weisman SJ, Wu CL. Management of Postoperative Pain: A Clinical Practice Guideline From the American Pain Society, the American Society of Regional Anesthesia and Pain Medicine, and the American Society of Anesthesiologists' Committee on Regional Anesthesia, Executive Committee, and Administrative Council. J Pain. 2016 Feb;17(2):131-57.
De Oliveira GS Jr, Holl JL, Lindquist LA, Hackett NJ, Kim JY, McCarthy RJ. Older Adults and Unanticipated Hospital Admission within 30 Days of Ambulatory Surgery: An Analysis of 53,667 Ambulatory Surgical Procedures. J Am Geriatr Soc. 2015 Aug;63(8):1679-85.
Desai MM, Zogg CK, Ranasinghe I, et al. Variation in Risk-standardized Rates and Causes of Unplanned Hospital Visits Within 7 Days of Hospital Outpatient Surgery. Ann Surg. 2022;276(6):e714-e720.
Gan TJ, Belani KG, Bergese S, Chung F, Diemunsch P, Habib AS, Jin Z, Kovac AL, Meyer TA, Urman RD, Apfel CC, Ayad S, Beagley L, Candiotti K, Englesakis M, Hedrick TL, Kranke P, Lee S, Lipman D, Minkowitz HS, Morton J, Philip BK. Fourth Consensus Guidelines for the Management of Postoperative Nausea and Vomiting. Anesth Analg. 2020 Aug;131(2):411-448.
Liu J, Flynn DN, Liu WM, Fleisher LA, Elkassabany NM. Hospital-Based Acute Care Within 7 Days of Discharge After Outpatient Arthroscopic Shoulder Surgery. Anesth Analg. 2018a Feb;126(2):600-605.
Liu J, Kim DH, Maalouf DB, Beathe JC, Allen AA, Memtsoudis SG. Thirty-Day Acute Health Care Resource Utilization Following Outpatient Anterior Cruciate Ligament Surgery. Reg Anesth Pain Med. 2018b Nov;43(8):849-853.
Mingus ML, Bodian CA, Bradford CN, Eisenkraft JB. Prolonged surgery increases the likelihood of admission of scheduled ambulatory surgery patients. Journal of clinical anesthesia. Sep 1997;9(6):446-450.
Rosero EB, Joshi GP. Hospital readmission after ambulatory laparoscopic cholecystectomy: incidence and predictors. J Surg Res. 2017 Nov;219:108-115.
Schloemann DT, Sajda T, Ricciardi BF, Thirukumaran CP. Association of Total Knee Replacement Removal From the Inpatient-Only List With Outpatient Surgery Utilization and Outcomes in Medicare Patients. JAMA Netw Open. 2023;6(6):e2316769.
Thompson BM, Stearns JD, Apsey HA, Schlinkert RT, Cook CB. Perioperative Management of Patients with Diabetes and Hyperglycemia Undergoing Elective Surgery. Curr Diab Rep. 2016 Jan;16(1):2.
-
2.6 Meaningfulness to Target Population
A hospital visit following same-day surgery is an unexpected and potentially preventable outcome for patients scheduled for elective same-day surgeries that have a low anticipated risk. Providers (HOPDs and surgeons) are often unaware of their patients’ hospital visits after surgery because patients often present to the ED or to different hospitals, leading to understated adverse event rates and suggesting the need for better measurement to drive quality improvement. Therefore, both patients and providers benefit from outcome measures of hospital visits – a broad, patient-centered outcome that reflects the full range of reasons leading to hospitalization among patients undergoing same-day surgery.
The HOPD Surgery measure is part of the Hospital Outpatient Quality Reporting (HOQR) Program, a pay-for-reporting program. HOPDs first saw their facility-specific measure scores in 2017, during a “dry run” that preceded public reporting. The measure was first publicly reported in January 2020, on Hospital Compare (now Care Compare). Currently, there are no other publicly available quality reports of HOPDs that perform same-day surgery, and the recent migration of additional procedures to the outpatient setting (e.g., THA/TKA) (Burnett et al., 2023) underscores the measurement gap that would exist without this measure. Thus, this measure addresses an important quality measurement area and enhances the information available to patients choosing among HOPDs that provide same-day outpatient surgery. Furthermore, providing outcome rates to HOPDs makes meaningful quality differences visible to clinicians and hospitals and incentivizes improvement.
During measure development, we asked our Technical Expert Panel (TEP), made up of 15 members including patient representatives, expert clinicians, methodologists, researchers, and providers, to formally assess the measure’s face validity. We provided the TEP background on the NQF measure evaluation criteria and presented the measure specifications and testing and performance results for their evaluation. TEP members indicated their agreement (on a six-point scale) with the following statement: “The risk-standardized hospital visit ratios obtained from the outpatient surgery measure as specified can be used to distinguish between better and worse quality facilities.” Of the 13 members who responded, 12 indicated that they moderately or strongly agreed. Two TEP members did not respond to the TEP survey.
Citation
Burnett, R. A., Barrack, T. N., Terhune, E. B., Della Valle, C. J., Shah, R. P., & Courtney, P. M. (2023). Over Half of All Medicare Total Knee Arthroplasty Patients Are Now Classified as an Outpatient-Three-Year Impact of the Removal From the Inpatient-Only List. The Journal of arthroplasty, 38(6), 992–997. https://doi.org/10.1016/j.arth.2022.12.029
-
Table 1. Performance Scores by Decile
| Performance Gap | Overall | Minimum | Decile_1 | Decile_2 | Decile_3 | Decile_4 | Decile_5 | Decile_6 | Decile_7 | Decile_8 | Decile_9 | Decile_10 | Maximum |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Mean Performance Score | 1.02 | 0.44 | 0.70 | 0.82 | 0.89 | 0.94 | 0.98 | 1.00 | 1.05 | 1.10 | 1.20 | 1.53 | 3.31 |
| N of Entities | 3817 | - | 381 | 382 | 382 | 382 | 381 | 382 | 382 | 382 | 382 | 381 | - |
| N of Persons / Encounters / Episodes | 1,204,167 | 251,836 | 251,836 | 173,925 | 142,221 | 91,332 | 52,575 | 50,424 | 117,445 | 91,330 | 112,165 | 120,914 | - |
-
-
-
3.1 Feasibility Assessment
Not applicable for Fall 2023 cycle.
3.3 Feasibility Informed Final Measure
Because this is a claims-based measure, there is no burden on the facility; rates are automatically calculated by CMS based on claims data submitted by facilities for payment.
-
3.4a Fees, Licensing, or Other Requirements
There are no fees, licensing, or other requirements to use this measure as specified.
3.4 Proprietary Information
Not a proprietary measure and no proprietary components
-
-
-
4.1.3 Characteristics of Measured Entities
The number of measured entities (HOPDs) varies by testing type. Please see Table 5 in attachment with compiled tables and figures.
4.1.1 Data Used for Testing
Previous 2020 Submission
For the original development of the Hospital Visits after Hospital Outpatient Surgery (HOPD Surgery) measure we used 2009-2011 Medicare data to develop a Medicare fee-for-service (FFS) cohort consisting of a 20% sample of same-day surgery claims from hospital outpatient departments (HOPDs) as outlined below. The measure cohort included patients with outpatient same-day surgery in 2010, and we used inpatient and outpatient data from 2009 to derive comorbidities for risk adjustment for these patients.
a. Datasets used to define the cohort:
-Carrier (Part B Physician) claims Standard Analytical File (SAF): This SAF contains a 20% sample of all base and line-item claims billed by physicians performing surgeries at HOPDs.
-Medicare 100% Hospital Outpatient SAF: This dataset contains 100% of all HOPD facility claims for surgeries performed at HOPDs. This dataset links physician claims for surgeries performed at HOPDs to the corresponding HOPD facility claim in order to obtain a facility identifier for HOPDs.
-Enrollment database and denominator files: This dataset contains Medicare FFS enrollment, demographic, and death information for Medicare beneficiaries.
b. Datasets used to identify the outcome (hospital visits):
-The Centers for Medicare & Medicaid Services (CMS) Medicare Provider Analysis and Review File (MedPAR) Part A Inpatient institutional claims (100% of all claims): This dataset is used to identify inpatient hospital claims.
-Medicare 100% Hospital Outpatient SAF: This dataset is used to identify emergency department (ED) and observation stay visits.
c. Datasets used to identify comorbidities for risk adjustment:
-Inpatient and outpatient claims (institutional and non-institutional carrier) data from the year prior to the outpatient surgery (2009) were used to identify comorbidities for risk adjustment for these patients.
For updated measure testing provided in the 2020 (prior) submission we used paid, final action Medicare claims from January 1, 2018 to December 31, 2018 to identify procedures performed in the outpatient setting at Hospital Outpatient Departments (HOPDs), and subsequent hospital visits. In addition, we used CMS enrollment and demographic data from the Health Account Joint Information (HAJI) database to determine inclusion and exclusion criteria. Patient history is assessed using claims data collected in the 12 months prior to the outpatient surgery.
For all derived cohorts:
a. Datasets used to define the cohort:
-For all cohorts, outpatient surgeries performed at HOPDs were identified using the full set of Medicare beneficiaries’ claims from the Carrier non-institutional claims, which included physician bills for hospital outpatient services. HOPD claims were linked to the outpatient institutional surgical claim, or to the inpatient institutional surgical claim when CMS’s 3-day payment window applied.
-Enrollment database and denominator files: These datasets contain Medicare Fee-For-Service (FFS) enrollment, demographic, and death information for Medicare beneficiaries, which is used to determine inclusion/exclusion criteria.
b. Datasets used to capture the outcome (hospital visits):
-The outcomes of emergency department (ED) visits and observation stays after outpatient surgery were identified from hospital outpatient institutional claims, and inpatient hospital admissions (at acute care and critical access hospitals) from inpatient institutional claims.
c. Datasets used to identify comorbidities for risk adjustment:
-Inpatient and outpatient claims (institutional and non-institutional carrier) data from the year prior to the outpatient surgery were used to identify comorbidities for risk adjustment for these patients.
To assess social risk factors, we used census as well as claims data (DE status obtained through the Master Beneficiary Summary File (MBSF) Database; Agency for Healthcare Research and Quality (AHRQ) socioeconomic status (SES) index score obtained through census data). The dataset used varies by testing type.
Current 2023 Submission
For this 2023 endorsement maintenance submission we used paid, final action Medicare claims from January 1, 2022 to December 31, 2022 to identify procedures performed in the outpatient setting at Hospital Outpatient Departments (HOPDs), and subsequent hospital visits.
We use Medicare FFS claims to identify surgeries performed in the outpatient setting and subsequent hospital visits, as well as CMS enrollment and demographic data. Patient history is also assessed using claims data collected in the 12 months prior to the eligible same-day surgery. We identify outpatient surgeries using Medicare’s list of covered ambulatory surgical center (ASC) procedures. CMS reviews and updates this list of surgeries annually. The update includes a transparent public comment submission and review process for the addition and/or removal of procedure codes. The lists are posted at: https://www.cms.gov/Medicare/Medicare-Fee-for-ServicePayment/ASCPayment/11_Addenda_Updates.html (refer to Addendum AA of the respective link). Procedures on Medicare’s list of covered ASC procedures are defined using HCPCS and CPT® codes.
The measure attributes surgeries to an HOPD if a Part B physician claim is present and the claim can be linked to Medicare outpatient or inpatient institutional data. We first identify physician claims as Outpatient Hospital Department or Physician Office by the Line Place of Service Code in the Part B Carrier claims file. Place of Service coding is used to specify the entity where service(s) were rendered. We then link the physician claims to a hospital outpatient claim with the surgery indicated to identify the HOPD where the surgery took place. Physician claims with no match to a hospital outpatient claim are then matched to hospital inpatient claims with an inpatient admission date within zero to three days after the date of surgery, to capture surgical procedures billed per the CMS 3-day payment window policy. The HOPD of the admitting hospital is where the case is attributed.
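The two-step linkage described above can be sketched in Python as follows. This is a minimal illustration, not the measure's production logic: the record layout, the field names (`bene_id`, `surgery_date`, `service_date`, `admission_date`, `facility_id`), and matching on beneficiary plus date are all assumptions made for the example.

```python
from datetime import date


def attribute_surgery(physician_claim, outpatient_claims, inpatient_claims):
    """Return the facility ID for one physician surgery claim, or None if unlinked.

    All field names are hypothetical; real claims files use different layouts.
    """
    bene_id = physician_claim["bene_id"]
    surgery_date = physician_claim["surgery_date"]

    # Step 1: prefer a hospital outpatient claim for the same patient and surgery date.
    for claim in outpatient_claims:
        if claim["bene_id"] == bene_id and claim["service_date"] == surgery_date:
            return claim["facility_id"]

    # Step 2: otherwise accept an inpatient admission 0 to 3 days after the surgery
    # (the 3-day payment window case) and attribute to the admitting hospital.
    for claim in inpatient_claims:
        days_to_admission = (claim["admission_date"] - surgery_date).days
        if claim["bene_id"] == bene_id and 0 <= days_to_admission <= 3:
            return claim["facility_id"]

    return None


# Toy records: no outpatient match for patient "A", but an admission two days later.
physician = {"bene_id": "A", "surgery_date": date(2022, 3, 1)}
outpatient = [{"bene_id": "B", "service_date": date(2022, 3, 1), "facility_id": "HOPD-1"}]
inpatient = [{"bene_id": "A", "admission_date": date(2022, 3, 3), "facility_id": "HOSP-2"}]
linked_facility = attribute_surgery(physician, outpatient, inpatient)
```

In this toy case the surgery is attributed through the inpatient fallback; a physician claim that matches neither file stays unattributed.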
4.1.4 Characteristics of Units of the Eligible Population
The number of patients varies by testing type. Please see Table 5 in the attachment with compiled tables and figures.
4.1.2 Differences in Data
Previous 2020 Submission
Table 4 in the attachment with compiled tables and figures outlines the datasets used in each analysis.
Current 2023 Submission
Table 5 (attachment with compiled tables and figures) outlines the dataset used in this updated endorsement maintenance submission. We used Medicare Fee-For-Service claims from January 1, 2022 through December 31, 2022. This dataset included 1,204,167 procedures from 3,817 facilities. There were 2,876 facilities with at least 30 procedures (the public reporting threshold). Please see Table 3 for additional details.
-
4.2.1 Level(s) of Reliability Testing Conducted
4.2.2 Method(s) of Reliability Testing
Previous 2020 Submission
Measure Score Reliability
We provide facility-level measure score reliability using the signal-to-noise method, using the formula presented by Adams and colleagues [Yu et al, 2013; Adams et al., 2010]. Specifically, for each facility we calculate the reliability as:
\[
\text{Reliability} = \frac{\sigma^2_{\text{facility-to-facility}}}{\sigma^2_{\text{facility-to-facility}} + \dfrac{\sigma^2_{\text{facility error variance}}}{n}}
\]
Where facility-to-facility variance is estimated from the hierarchical logistic regression model, n is each facility’s observed case size, and the facility error variance is estimated using the variance of the logistic distribution (π²/3). The facility-level reliability testing is limited to facilities with at least 30 procedures for public reporting.
Signal-to-noise reliability scores can range from 0 to 1. A reliability of zero implies that all the variability in a measure is attributable to measurement error. A reliability of one implies that all the variability is attributable to real difference in performance.
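As a minimal sketch of the calculation described above, the following Python function computes the signal-to-noise reliability for a single facility. The function name and the between-facility variance of 0.1 are illustrative assumptions; in the measure, that variance comes from the fitted hierarchical logistic regression model.

```python
import math

# Variance of the standard logistic distribution, used as the facility error variance.
LOGISTIC_ERROR_VARIANCE = math.pi ** 2 / 3


def signal_to_noise_reliability(between_facility_variance, n_procedures):
    """Signal variance divided by signal variance plus per-facility noise variance."""
    noise = LOGISTIC_ERROR_VARIANCE / n_procedures
    return between_facility_variance / (between_facility_variance + noise)


# Hypothetical between-facility variance; not an estimate from the measure data.
assumed_variance = 0.1
reliability_small = signal_to_noise_reliability(assumed_variance, 30)
reliability_large = signal_to_noise_reliability(assumed_variance, 300)
```

Holding the between-facility variance fixed, reliability rises with case volume because the noise term shrinks as n grows, which is why low-volume facilities fall in the lowest reliability deciles.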
We calculated the measure score reliability for all facilities, and for facilities with a volume cutoff of 30 procedures, using Dataset #2. Our rationale for this is described below.
Relationship of reliability testing to minimum volume per facility
In general, CMS sets the volume cutoff for publicly reporting facility measure scores based on two considerations. First, CMS considers the empiric results of reliability testing conducted on the dataset used for public reporting. Second, CMS considers the volume cutoff for score reporting used for related measures (for example, Facility 7-Day Risk-Standardized Hospital Visit Rate after Outpatient Colonoscopy) and seeks to align, where possible, the cutoffs for similar measures that are concurrently reported. CMS has empirically determined that measure scores (risk-standardized hospital visit ratios or RSHVRs) for HOPDs with 30 or more procedures are reliable. Regardless of the score reporting volume cutoff, all facilities and their cases are used in calculating the measure scores. In the dry run and in public reporting, CMS typically reports scores for facilities with fewer procedures than the volume cutoff as having “too few cases” to support a reliable estimate. In summary, the measure specifications do not prejudge the ideal volume cutoff. The minimum sample size for public reporting is a policy choice that balances considerations such as the facility-level reliability testing results on the reporting data and consistency across measures for consumers.
Citations
Adams J, Mehrotra A, Thomas J, McGlynn E. (2010). Physician cost profiling – reliability and risk of misclassification. NEJM, 362(11): 1014-1021.
Yu H, Mehrotra A, Adams J. (2013). Reliability of utilization measures for primary care physician profiling. Healthcare, 1, 22-29.
Current Submission
We calculated facility-level (signal-to-noise) reliability using the same method described above from the prior submission.
4.2.3 Reliability Testing Results
Previous 2020 Submission
The median facility-level reliability (signal-to-noise reliability) for all facilities (N=3,974) was 0.759 (IQR 0.372-0.892); the median facility-level reliability for facilities with at least 30 procedures (n=2,979) was 0.839 (IQR 0.696-0.915). The 2,979 facilities represent 1,161,312 procedures or 99% of the total 1,172,087 procedures.
Current Submission
Table 2 below (Table 6 in tables/figures attachment) shows the results from updated testing, using the 2023 EM dataset.
The median facility-level reliability (signal-to-noise reliability) for all facilities (N=3,817) was 0.853 (IQR 0.521-0.939); the median facility-level reliability for facilities with at least 30 procedures (n=2,876) was 0.908 (IQR 0.808-0.953). The 2,876 facilities represent 1,194,500 procedures or 99.2% of the total 1,204,167 procedures. In Table 2 (Table 6 in tables/figures attachment) we present signal-to-noise reliability in deciles, as requested by Battelle, for all facilities. In Table 6A (see tables/figures attachment) we also provide signal-to-noise reliability for facilities with at least 30 procedures, which is the public reporting threshold. We note that the table provided by Battelle is not editable; the last row for this measure is the number of procedures, not the number of patients.
Table 2. Accountable Entity-Level Reliability Testing Results by Denominator-Target Population Size
Group | Reliability | N of Entities | N of Procedures
Overall | 0.71 | 3,817 | 1,204,167
Minimum | 0.03 | - | -
Decile 1 | 0.09 | 371 | 984
Decile 2 | 0.28 | 395 | 4,486
Decile 3 | 0.52 | 375 | 11,707
Decile 4 | 0.71 | 385 | 26,341
Decile 5 | 0.82 | 384 | 48,553
Decile 6 | 0.88 | 380 | 77,212
Decile 7 | 0.92 | 383 | 116,505
Decile 8 | 0.94 | 381 | 164,913
Decile 9 | 0.96 | 382 | 247,841
Decile 10 | 0.98 | 381 | 505,625
Maximum | 0.99 | - | -
4.2.4 Interpretation of Reliability Results
Previous 2020 submission
The median signal-to-noise reliability score is sufficiently high both for all facilities and for facilities with at least 30 procedures (the public reporting cutoff).
Current submission
The median signal-to-noise reliability for the HOPD surgery measure is sufficiently high (above the CBE threshold of 0.6) for all facilities (0.85) and for facilities with at least 30 procedures (0.91), the public reporting threshold.
-
4.3.1 Level(s) of Validity Testing Conducted
4.3.2 Type of accountable entity-level validity testing conducted
4.3.3 Method(s) of Validity Testing
Previous 2020 Submission
Empirical Validity Testing of the Measure Score
We examined whether better performance on the HOPD Surgery measure was correlated with better performance on measures that are related, meaning that at least to some extent the comparator measures assess the same domain of quality (complications requiring acute care after same-day surgery).
Hospital Outpatient Quality Reporting Measures
To identify related measures, we reviewed all of the measures that are currently publicly reported (for CY2020 Payment Determination) in the Hospital Outpatient Quality Reporting program (HOQR) and the Inpatient Quality Reporting Program (IQR). Note that, because Hospital Outpatient Departments are not a distinct entity but rather a diverse group of care settings (such as the ED, outpatient clinics, and outpatient surgery settings), many of the HOQR measures are not relevant comparators because they are restricted to particular settings (such as the ED or clinic) that do not overlap with the HOPD Surgery measure.
Of the 14 measures in the HOQR program that are not planned for retirement, none of the measures assessed the same quality domain. One measure, OP-32: Facility 7-Day Risk-Standardized Hospital Visit Rate after Outpatient Colonoscopy, assessed the same outcome. However, colonoscopy is a narrow and relatively low-risk procedure performed in a different setting (not the surgical suite); we therefore would not expect the measure scores from the colonoscopy measure to correlate with measure scores from the HOPD Surgery measure.
Hospital Inpatient Quality Reporting Measures
Of the Hospital Inpatient Quality Reporting measures, we identified readmission measures, specifically the Hospital-wide Readmission (HWR) measure, as a potential candidate for comparison. The HWR measure calculates rates of 30-day unplanned hospital readmissions for five different specialty cohorts (medicine, neurology, cardiovascular, cardiorespiratory, and surgery/gynecology), each with a fully developed and statistically tested risk model. (Methodology report is available at: https://www.cms.gov/Medicare/Quality-Initiatives-Patient-Assessment-Instruments/HospitalQualityInits/Downloads/Hospital-Wide-All-Cause-Readmission.zip). The HOPD Surgery measure cohort and outcome overlap with the surgery/gynecology cohort (hereafter “surgery cohort”) of HWR.
We hypothesized that the HOPD Surgery measure score would show a weak, positive relationship with the measure score for the surgery cohort of the HWR measure given that the measures assess overlapping but distinct surgeries (outpatient vs. inpatient) and overlapping but distinct patient outcomes (hospital visits within 7 days vs. readmissions within 30 days):
We expect some correlation because:
- It is possible that the same surgeons and surgical teams are performing surgeries covered by both measures, and in some hospitals those procedures may be co-located.
- Both measures count admissions to the hospital post-surgery in the outcome, although the HOPD measure also counts ED visits, which make up the majority of the return visits, as well as observation stays.
- The same organizational culture and processes may be in place to prevent visits to the hospital following surgery across both inpatient and outpatient procedures, such as timely recognition of post-operative complications and ensuring effective discharge plans (Brooke et al., 2012).
However, we do not expect moderate or strong correlations because:
- The outcomes differ; not only does the HOPD Surgery measure include ED visits and observation stays in addition to admissions, but the period of observation for the outcome differs (7 days for the HOPD Surgery measure vs. 30 days for the HWR surgery cohort).
- The cohorts (procedures and patients) are distinct; inpatient procedures are generally more complex procedures done on higher-risk patients.
Instead, we hypothesize that the relationship, while positive, would be weak, because:
- Certain procedures, such as inguinal hernia repair, are more likely to be done on an outpatient vs. inpatient basis, whereas more complex procedures, such as those within the CCS category “Vascular stents and OR procedures, other than head or neck”, are predominantly inpatient (Steiner et al., 2014). Further, the HWR surgery cohort includes more acutely ill patients.
- The two measure scores are the result of separate statistical models, each of which assumes that latent quality is normally distributed. Each estimate is shrunk toward an overall mean, so it depends on a hospital’s own performance as well as that of the other hospitals in the measure. Each measure score therefore has its own uncertainty, which reduces the correlation between the measure scores.
For this analysis we used the measure scores from the HOPD Surgery measure calculated from Dataset #2 (January 1, 2018-December 31, 2018) and evaluated their association with measure scores from the same facilities using the HWR surgery cohort measure score (July 1, 2017-June 30, 2018). Specifically, we examined the relationship of performance on the HOPD Surgery measure score against performance within quartiles for the HWR surgery cohort measure score (see Figure 2a). We also calculated the Pearson correlation coefficient between the two measure scores to characterize the strength and direction of the relationship. Finally, we examined the association of outlier status on the HOPD Surgery measure score with the quartiles of the HWR surgery cohort score (see Figure 2b). To identify outliers, we calculated an interval estimate (similar to a confidence interval) around each hospital’s measure score and flagged those facilities whose 95% interval estimate was entirely above or entirely below 1.0, as described below. We then performed a chi-square test to determine whether the outlier relationship (between HOPD Surgery measure score outliers and quartiles of HWR performance) was significantly different from what would be expected by chance alone.
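The outlier rule and the chi-square comparison described above can be illustrated with a short Python sketch. The function names and the toy contingency table are hypothetical; the code computes only the Pearson chi-square statistic (not the p-value), and the counts are not the measure's results.

```python
def classify_outlier(interval_lower, interval_upper):
    """Classify a facility from the 95% interval estimate around its RSHVR.

    A ratio below 1.0 means fewer hospital visits than expected, so an
    interval entirely below 1.0 is "better than expected".
    """
    if interval_upper < 1.0:
        return "better than expected"
    if interval_lower > 1.0:
        return "worse than expected"
    return "no different than expected"


def chi_square_statistic(table):
    """Pearson chi-square statistic for a contingency table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected
    return statistic


# Toy table (rows: outlier status, columns: two comparator groups); values are made up.
toy_table = [[10, 20], [20, 10]]
toy_statistic = chi_square_statistic(toy_table)
```

The statistic is compared against a chi-square distribution with (rows - 1) x (columns - 1) degrees of freedom to obtain the p-value.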
Face Validity as Determined by the TEP
During measure development, we asked our TEP, made up of 15 members including patient representatives, expert clinicians, methodologists, researchers, and providers, to formally assess the measure’s face validity. We provided the TEP background on the NQF measure evaluation criteria and presented the measure specifications and testing and performance results for their evaluation.
List of TEP Members
- David Chang, PhD, MPH, MBA—Massachusetts General Hospital (Associate Professor of Surgery, Department of Surgery; Director of Healthcare Research and Policy Development, Codman Center for Clinical Effectiveness in Surgery); Boston, MA
- Gary Culbertson, MD—Iris Surgery Center (Plastic Surgeon; Medical Director); Sumter, SC
- Martha Deed, PhD—Member of the public; North Tonawanda, NY
- Richard Dutton, MD, MBA—Anesthesia Quality Institute (Executive Director); Park Ridge, IL
- Nestor Esnaola, MD, MPH, MBA—Temple University School of Medicine (Professor of Surgery; Chief, Surgical Oncology); Philadelphia, PA
- Charles Goldfarb, MD—Washington University School of Medicine (Associate Professor of Orthopaedic Surgery); St Louis, MO
- Lisa Ishii, MD, MHS—Johns Hopkins School of Medicine (Associate Professor, Department of Otolaryngology-Head & Neck Surgery); Baltimore, MD
- Sandra Koch, MD—Carson Medical Group (OB/GYN surgery); Carson City, NV
- Tricia Meyer, PharmD, MS—Scott & White Memorial Hospital (Associate Vice-President, Department of Pharmacy); Texas A&M University College of Medicine (Associate Professor, Department of Anesthesiology); Texas A&M Rangel College of Pharmacy (Adjunct Associate Professor, Department of Anesthesiology); Temple, TX
- Linda Radach, BA— Member of the public; Lake Forest Park, WA
- Danny Robinette, MD—Surgery Center of Fairbanks (General Surgeon; Medical Director); Fairbanks, AK
- Suketu Sanghvi, MD—The Permanente Medical Group, Kaiser Permanente (Ophthalmologist; Associate Executive Director); Oakland, CA
- Christopher Tessier, MD—Manchester Urology Associates (Urologist); Manchester, NH
- Thomas Tsai, MD, MPH—Brigham and Women’s Hospital (General Surgery Resident; Administrative Chief Resident for Research); Harvard School of Public Health (Postdoctoral Fellow, Department of Health Policy and Management); Boston, MA
- Katherine Wilson, RN, MHA—AmSurg Corp (Vice President, Quality); Nashville, TN
We systematically assessed the face validity of the measure score as an indicator of quality by soliciting the TEP members’ agreement with the following statement: “The risk-standardized hospital visit ratios obtained from the outpatient surgery measure as specified can be used to distinguish between better and worse quality facilities.”
TEP members indicated their agreement with the face validity of the measure on a six-point scale:
1=Strongly disagree
2=Moderately disagree
3=Somewhat disagree
4=Somewhat agree
5=Moderately agree
6=Strongly agree
Use of Established Measure Development Guidelines:
We developed this measure in consultation with national guidelines for publicly reported outcome measures, with outside experts, and with the public. The measure is consistent with the technical approach to outcome measurement set forth in NQF guidance for outcome measures, CMS MMS guidance, and the guidance articulated in the American Heart Association scientific statement, “Standards for Statistical Models Used for Public Reporting of Health Outcomes” (Krumholz et al., 2006; NQF, 2011).
Citations
Brooke BS, De Martino RR, Girotti M, Dimick JB, Goodney PP. Developing strategies for predicting and preventing readmissions in vascular surgery. J Vasc Surg. 2012;56(2):556–562.
Krumholz HM, Brindis RG, Brush JE, et al. Standards for statistical models used for public reporting of health outcomes: An American Heart Association scientific statement from the Quality of Care and Outcomes Research Interdisciplinary Writing Group: cosponsored by the Council on Epidemiology and Prevention and the Stroke Council endorsed by the American College of Cardiology Foundation. Circulation. 2006; 113(3):456-462.
National Quality Forum (NQF). National Voluntary Consensus Standards for Patient Outcomes 2009: A Consensus Report. 2011. https://www.qualityforum.org/Publications/2011/07/National_Voluntary_Consensus_Standards_for_Patient_Outcomes_2009.aspx
Steiner CA, Karaca Z, Moore BJ, Imshaug MC, Pickens G. Surgeries in Hospital-Based Ambulatory Surgery and Hospital Inpatient Settings, 2014: Statistical Brief #223. Healthcare Cost and Utilization Project (HCUP) Statistical Briefs [Internet]. Rockville (MD): Agency for Healthcare Research and Quality (US); 2006-2017 May. https://www.hcup-us.ahrq.gov/reports/statbriefs/sb223-Ambulatory-Inpatient-Surgeries-2014.jsp. Accessed November 6, 2019.
Current 2023 Submission
Because there is evidence that outcomes for some surgical procedures are associated with higher volume (Levaillant et al., 2021; Brodeur et al., 2022), for our updated 2023 submission we provide additional validity testing in the form of the association between HOPD Surgery measure scores and facility-level procedural volume. We examined this association by plotting HOPD measure scores (RSHVRs) within quintiles of facility-level procedural volume. We also calculated an overall (Pearson) correlation coefficient. Because the association between volume and outcomes exists for some procedures but not all, we hypothesized that there would be a weak, negative correlation between facility-level volume and HOPD Surgery measure scores.
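A minimal Python sketch of the two summaries described above, assuming each facility has one volume value and one RSHVR. The helper names and the rank-based quintile assignment are illustrative assumptions, and the sample inputs are toy values rather than measure data.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return covariance / (spread_x * spread_y)


def volume_quintiles(volumes):
    """Assign each facility a quintile by volume rank (1 = lowest, 5 = highest)."""
    order = sorted(range(len(volumes)), key=lambda i: volumes[i])
    quintiles = [0] * len(volumes)
    for rank, facility_index in enumerate(order):
        quintiles[facility_index] = rank * 5 // len(volumes) + 1
    return quintiles


# Toy inputs only: ten facilities with rising volume and slightly falling scores.
toy_volumes = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
toy_scores = [1.10, 1.08, 1.05, 1.06, 1.00, 0.99, 0.97, 0.98, 0.95, 0.94]
toy_quintiles = volume_quintiles(toy_volumes)
toy_correlation = pearson_r(toy_volumes, toy_scores)
```

A negative coefficient here corresponds to lower (better) RSHVRs at higher-volume facilities, the direction hypothesized for the measure.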
Citations
Brodeur PG, Kim KW, Modest JM, Cohen EM, Gil JA, Cruz AI. Surgeon and Facility Volume are Associated With Postoperative Complications After Total Knee Arthroplasty. Arthroplasty Today. 2022;14:223-230.e1. doi:https://doi.org/10.1016/j.artd.2021.11.017
Levaillant M, Marcilly R, Levaillant L, et al. Assessing the hospital volume-outcome relationship in surgery: a scoping review. BMC Medical Research Methodology. 2021;21:204. doi:https://doi.org/10.1186/s12874-021-01396-6
4.3.4 Validity Testing Results
Previous 2020 Submission
Empiric Validity Testing
To examine the external validity of the HOPD Surgery measure, we divided hospitals into quartiles based on their scores on the comparator measure, the surgery cohort of the HWR measure (range of scores 7.10%-14.79%). We then displayed the distribution of those hospitals’ HOPD Surgery measure scores (RSHVRs) within each of the HWR quartiles in a box plot or “whisker” plot (Figure 2a in tables/figures attachment). (Note: The horizontal line within a box represents the median HOPD RSHVR of all the hospitals in the quartile, the open circle represents the mean, the horizontal boundaries of a box represent the 1st and 3rd quartiles).
We also compared outliers on the HOPD Surgery measure within quartiles of HWR performance (Figure 2a and Figure 2b in tables/figures attachment). In Figure 2a (see tables/figures attachment) we display hospitals that are statistical outliers on the HOPD Surgery measure with a blue triangle (if they are better than expected) or a red diamond (if they are worse than expected) (outliers are identified as described below). In Figure 2b (see tables/figures attachment) we show the total number of “better than expected” and “worse than expected” facilities within each quartile of performance on the HWR measure (surgery cohort).
All analyses included facilities with at least 30 procedures.
The results show a trend toward better performance on the HOPD Surgery measure with better performance on the comparator measure (HWR, surgery cohort). As shown in Figure 2a (see tables/figures attachment), better performance on the HOPD Surgery measure shows a small positive trend with better performance across quartiles of performance on the HWR measure. The correlation coefficient indicates a very weak positive correlation (0.033, p=0.07), as expected.
The outlier (better, and worse, than expected) comparison is consistent with the trend toward better performance on the HOPD Surgery measure with better performance on the HWR measure (Figure 2b). There are more “better than expected” HOPD Surgery outliers in the first (best performing) quartile of HWR performance, and more “worse than expected” HOPD Surgery outliers in the fourth (worst performing) quartile of HWR performance. A chi-square test indicated that this relationship was significantly different from what would be expected by chance alone (p=0.0331).
More specifically,
• There are 64 HOPD Surgery “better than expected” outliers (blue bar in Figure 2b) in the first or best quartile (Q1) of HWR performance. There are also more “better than expected” HOPD Surgery outliers (blue bar) than “worse than expected” (red bar) (64 vs. 50).
• There are 56 HOPD Surgery “worse than expected” outliers (red bar) in the fourth (Q4) or worst performing quartile of HWR. There are also more “worse than expected” (red bar) HOPD Surgery outliers than “better than expected” (blue bar) outliers (56 vs. 31).
Validity as assessed by the TEP
The results of the TEP rating of agreement with the validity statement were as follows:
N=13
Mean rating=5.2
All TEP members who responded to the survey indicated they agreed with the statement that “The risk-standardized hospital visit ratios obtained from the outpatient surgery measure as specified can be used to distinguish between better and worse quality facilities.” 12 of the 13 indicated they moderately or strongly agreed. Two TEP members did not respond to the TEP survey.
Frequency of Ratings of Agreement
Rating # (%) of Responses
1 (Strongly disagree) 0 (0)
2 (Moderately disagree) 0 (0)
3 (Somewhat disagree) 0 (0)
4 (Somewhat agree) 1 (7.7)
5 (Moderately agree) 8 (61.5)
6 (Strongly agree) 4 (30.8)
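As a quick arithmetic check, the summary statistics reported for the TEP survey can be recomputed from the frequency table above. This Python sketch uses only the counts shown in the table; the variable names are illustrative.

```python
# Number of responses at each rating, copied from the frequency table above.
rating_counts = {1: 0, 2: 0, 3: 0, 4: 1, 5: 8, 6: 4}

respondents = sum(rating_counts.values())
mean_rating = sum(rating * count for rating, count in rating_counts.items()) / respondents

# Share of respondents who moderately (5) or strongly (6) agreed.
moderate_or_strong = rating_counts[5] + rating_counts[6]
share_moderate_or_strong = moderate_or_strong / respondents

# Percentage of responses at each rating, rounded to one decimal place as in the table.
percentages = {rating: round(100 * count / respondents, 1)
               for rating, count in rating_counts.items()}
```

The recomputed values agree with the reported N of 13, the mean rating of 5.2, and the 12 of 13 respondents who moderately or strongly agreed.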
Current Submission
Association of HOPD Surgery measure scores with volume
Because there is evidence that outcomes for some surgical procedures are associated with higher volume (Levaillant et al., 2021; Brodeur et al., 2022), we examined the relationship between the HOPD Surgery measure scores and facility-level procedural volume. Figure 3 (see tables/figures attachment) shows that there is an overall trend toward improved outcomes (lower RSHVRs) with increasing facility volume. Median RSHVRs decline across quintiles, in particular across the three highest-volume quintiles. For example, the mean, median, and IQR of measure scores (RSHVRs) decline with each quintile when comparing the fourth and fifth quintiles of procedural volume to the third quintile. The correlation coefficient between facility-level procedural volume and the HOPD measure score was -0.18 (p-value: <0.0001).
4.3.5 Interpretation of Validity Results
Previous 2020 Submission
The combination of empiric and face validity results supports the validity of this HOPD Surgery measure. First, the results of the external empiric validation analysis suggest that there is a positive, although very weak, relationship between the HOPD Surgery measure score and the measure score for the surgery cohort of the HWR measure. However, we did observe a significant relationship between outliers identified in the HOPD Surgery measure and the performance score quartiles of the HWR surgery cohort. This relationship showed more “worse” than “better” HOPD Surgery outliers in the worst performing HWR quartile and more “better” than “worse” outliers in the first (best performing) HWR quartile.
Current Submission
Our results from the current and past submissions provide additional evidence for the validity of the HOPD Surgery measure. With this current submission we provide additional evidence for measure score validity in the form of a volume-outcome relationship: as facility volume increases, there is a trend toward lower (better) measure scores. We also found that the direction and strength of the overall association were as we predicted (a weak, negative relationship). These results are supported by literature that shows a volume-outcome relationship for some but not all procedures. Finally, TEP survey results show high agreement (with 92 percent, or 12 of 13 respondents, “moderately” or “strongly” agreeing) regarding the ability of the HOPD Surgery measure to distinguish between higher- and lower-quality facilities. Taken together, these results support the validity of the HOPD Surgery measure.
-
4.4.1 Methods used to address risk factors
4.4.2 Conceptual Model Rationale
Previous 2020 submission
We developed and used the conceptual framework described below to identify potential social risk factors. However, limited social risk factor data are currently available for Medicare beneficiaries (Department of Health and Human Services, 2016). We analyzed two well-studied social risk factors that could best be operationalized in data, outlined below. We note that this measure already adjusts for age and gender.
1. Medicare-Medicaid dual-eligibility status
Dual eligibility for Medicare and Medicaid is available at the patient level in the Medicare Master Beneficiary Summary File. The eligibility threshold for Medicare patients over 65 years old considers both income and assets. A body of literature demonstrates differential health care and health outcomes among dual-eligible (DE) beneficiaries, indicating that, while not ideal, the DE indicator allows us to examine some of the pathways of interest (Department of Health and Human Services, 2016).
2. Agency for Healthcare Research and Quality Socioeconomic Status (AHRQ SES) Index
We selected the AHRQ-validated SES index score because it is a well-validated variable that describes the average SES of people living in defined geographic areas (Bonito, 2008). It is a widely used index that summarizes area-level measures of employment, income, education, and housing from the American Community Survey (ACS). Each of the index components is available at the census block level, which we then linked to each patient’s residence using the 9-digit ZIP code. The AHRQ SES index score summarizes the following variables:
• Percentage of people in the labor force who are unemployed,
• Percentage of people living below poverty level,
• Median household income,
• Median value of owner-occupied dwellings,
• Percentage of people ≥25 years of age with less than a 12th grade education,
• Percentage of people ≥25 years of age completing ≥4 years of college, and
• Percentage of households that average ≥1 people per room.
The AHRQ SES Index’s value as a proxy for patient-level information is dependent on having the most granular level data with respect to communities that patients live in. In this submission, we present analyses using the census block group level, the most granular level possible using ACS data. A census block group is a geographical unit used by the US Census Bureau which is between the census tract and the census block. It is the smallest geographical unit for which the bureau publishes sample data. The target size for block groups is 1,500 and they typically have a population of 600 to 3,000 people. We used 2013-2017 ACS data and mapped patients’ 9-digit ZIP codes via vendor software to the census block group level. Given the variation in cost of living across the country, we adjusted the median income and median property value components of the AHRQ SES Index by regional price parity values published by the Bureau of Economic Analysis (BEA). This provides a better marker of low SES neighborhoods in high expense geographic areas. We then calculated an AHRQ SES Index score for census block groups that can be linked to 9-digit ZIP codes.
We identify patients as at risk due to social factors if they are in the bottom 25 percent (lowest quartile) of the AHRQ SES distribution.
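As an illustration of the bottom-quartile rule, the sketch below flags scores that fall at or below the first-quartile cut point of a sample. The function name, the example scores, and the choice of the "inclusive" quantile method are assumptions made for illustration; the submission does not state how the cut point was computed, and in practice it would be derived from the full distribution rather than a small sample.

```python
import statistics

def flag_low_ses(scores):
    """Flag scores at or below the first-quartile cut point of the sample.

    Assumption: the cut point is the 25th percentile computed with the
    'inclusive' interpolation method; the source does not specify this.
    """
    cut = statistics.quantiles(scores, n=4, method="inclusive")[0]
    return [s <= cut for s in scores]

# Hypothetical AHRQ SES index scores for eight patients (illustration only).
ses_scores = [40, 45, 50, 55, 60, 65, 70, 75]
low_ses_flags = flag_low_ses(ses_scores)
```

With these example scores, the two lowest values are flagged, which is 25 percent of the sample.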
Citations
Bonito A, Bann C, Eicheldinger C, Carpenter L. Creation of new race-ethnicity codes and socioeconomic status (SES) indicators for Medicare beneficiaries. Final Report, Sub-Task. 2008;2.
Department of Health and Human Services, Office of the Assistant Secretary of Planning and Evaluation. Report to Congress: Social Risk factors and Performance Under Medicare’s Value-based Payment Programs. 2016; https://aspe.hhs.gov/pdf-report/report-congress-social-risk-factors-and-performance-under-medicares-value-based-purchasing-programs. Accessed December 8, 2019.
Social Risk Factors for Disparities Analyses
CMS submitted the HOPD Surgery measure for NQF endorsement in January of 2015, prior to the NQF Sociodemographic Status (SDS) trial. Therefore, according to NQF guidance, results of social risk factor testing were not considered in the risk adjustment for this measure. However, during NQF public comment at initial endorsement, two stakeholders noted their concern regarding the lack of social risk factor adjustment. Accordingly, in response to public comment, we provided NQF with the results of social risk factor testing that had been completed, which were consistent with the updated testing provided below. The Standing Committee voted to endorse the measure without adjustment for social risk factors, and NQF’s Consensus Standards Approval Committee (CSAC) voted to uphold the Standing Committee’s endorsement, following a discussion about social risk factor adjustment (NQF, 2015).
Since the measure was endorsed, we have updated the measure in response to feedback from stakeholders (as discussed in the Measure Submission/ITS form). CMS initiated a dry run in 2017 in preparation for 2020 public reporting but did not receive any feedback that resulted in re-examination of risk variables, including social risk factors. (Note that hospitals received their confidential facility-level scores in November 2019; CMS will report facility-level measure scores to the public on Hospital Compare in January 2020.)
For this re-endorsement application, we re-analyzed the effects of social risk factors on the models, incorporating the evolution in both policy and technical approaches from the past few years. CMS reviewed these results, and after careful consideration within the context of the conceptual model outlined below in this section, decided not to adjust the measure for social risk factors. The details regarding the methods, results, and interpretation of results are in section 2b3.4b, below.
We selected social risk factor variables based on a review of literature, conceptual pathways, and feasibility. In section 1.8, we describe the variables available in Medicare claims data that we considered and analyzed, based on this review. Below, we describe the pathways by which social risk factors may influence risk of the outcome.
Causal Pathways for Social Risk Variable Selection
Our conceptualization of the pathways by which patients’ social risk factors affect the outcome was informed by the literature (Mioton et al., 2014; Bhattacharyya, 2015; Al-Qurayshi et al., 2016; Dallas et al., 2017; Trivedi et al., 2014; Jha et al., 2011; Reames et al., 2014) and IMPACT Act–funded work by the National Academies of Sciences, Engineering and Medicine (NASEM) and the Department of Health and Human Services Assistant Secretary for Planning and Evaluation (ASPE) (HHS, 2016; NASEM, 2016).
Literature Review of Social Risk Variables and Ambulatory Surgery Post-Procedure Hospital Visits
To inform a conceptual model for the relationship of social risk factors to the outcome we performed a literature search during development of the original measure in 2016 that included articles that contained key words in the title or abstract related to outpatient surgeries or procedures, socioeconomic and sociodemographic disparities, and hospital visits (emergency department, observation, or hospital admission). We excluded any non-English language articles, articles published more than 10 years ago, articles without primary data, articles focused on pediatric patient population, and articles not explicitly focused on social risk factors and hospital visits after outpatient surgery. A total of 176 studies were reviewed by title and abstract, and all but two studies were excluded from full-text review based on the above criteria. The two studies indicated that African American and Hispanic patients and patients from lower-income households were at increased risk of post-procedure hospital visits in the outpatient surgery setting (Mioton et al., 2014; Bhattacharyya et al., 2015).
An updated literature search performed in November of 2019 identified two additional studies. In a 2016 study, authors found that patients in “high-risk” communities undergoing outpatient thyroidectomy were more likely to be operated on by low-volume surgeons, and that patients in these communities were more likely to have worse post-operative outcomes, including a higher risk for hospital admission (Al-Qurayshi et al., 2016). In one 2017 study, researchers found that Medicaid status was independently associated with an increase in the odds of an unplanned hospital admission following urethral sling placement, and that the increase remained after controlling for patient comorbidities, demographics, and facility characteristics (Dallas et al., 2017).
Conceptual Pathways for Social Risk Factor Variable Selection
Although there is limited literature linking social risk factors and adverse outcomes, we identified the following potential pathways through which social risk factors may influence the outcome of 7-day visits following outpatient surgery, based on the specific clinical consideration of the procedure and the broader social risk factor literature.
1. Differential care within a facility or unmet differential needs. One pathway by which social risk factors may contribute to post-surgical hospital visit risk is that patients may not receive equivalent care within a facility (Trivedi et al., 2014; HHS, 2016). However, as noted above, studies in the outpatient surgery setting are lacking. Moreover, patients with social risk factors, such as lower education, may require differentiated care (e.g., provision of information at a lower health literacy level) to achieve outcomes comparable to those of patients without social risk factors. Facilities that do not identify the need for and provide such care could have worse outcome rates for their patients with social risk factors.
2. Use of lower-quality facilities. Patients may differentially obtain care in lower quality facilities. With respect to inpatient hospital care, patients of lower income, lower education, or unstable housing have been shown not to have equitable access to high-quality facilities because such facilities are less likely to be found in geographic areas with large populations of poor patients. Thus, patients with low income are more likely to be seen in lower-quality hospitals, which can contribute to increased risk of adverse outcomes following hospitalization (Jha et al., 2011; Reames et al., 2014). In the outpatient setting, as described above, there is evidence that patients with social risk factors may receive services at facilities that have surgeons with less experience, putting patients at higher risk of a post-surgical visit (Al-Qurayshi et al., 2016).
3. Influence of social risk factors on hospital visit risk outside of facility quality. Some social risk factors, such as income or wealth, may affect the likelihood of post-procedure hospital visits without directly being associated with the quality of care received at the facility. For instance, while a surgeon and/or a facility may make appropriate care decisions and provide tailored care and education, we hypothesized that a lower-income patient may still have a worse outcome post-procedure due to factors such as a limited understanding of the discharge plan, or a lack of home support, transportation or other resources for following discharge instructions. These factors, however, can be anticipated and addressed for outpatient elective surgeries more readily than in more emergent care contexts.
4. Relationship of social risk factors with patients’ health at admission. Patients with lower income, education, or literacy, or with unstable housing, may have worse general health status and may present for their procedure with greater severity of underlying illness (HHS, 2016). This causal pathway should be largely accounted for by current clinical risk adjustment.
The social risk variables that we examined were:
• Dual-eligible status
• AHRQ-validated SES Index score
Citations
Al-Qurayshi Z, Randolph GW, Srivastav S, et al. Outcomes in thyroid surgery are affected by racial, economic, and healthcare system demographics. Laryngoscope. 2016 Sep;126(9):2194-9.
Bhattacharyya N. Healthcare disparities in revisits for complications after adult tonsillectomy. Am J Otolaryngol. 2015 Mar-Apr;36(2):249-253.
Dallas KB, Sohlberg EM, Elliott CS, Rogo-Gupta L, et al. Racial and Socioeconomic Disparities in Short-term Urethral Sling Surgical Outcomes. Urology. 2017 Dec; 110:70-75.
Department of Health and Human Services, Office of the Assistant Secretary of Planning and Evaluation. Report to Congress: Social Risk factors and Performance Under Medicare’s Value-based Payment Programs. 2016; https://aspe.hhs.gov/pdf-report/report-congress-social-risk-factors-and-performance-under-medicares-value-based-purchasing-programs. Accessed November 10, 2017.
Jha AK, Orav EJ, Epstein AM. Low-quality, high-cost hospitals, mainly in South, care for sharply higher shares of elderly black, Hispanic, and Medicaid patients. Health Aff. 2011; 30:1904-1911.
Mioton LM, Buck DW, 2nd, Rambachan A, Ver Halen J, Dumanian GA, Kim JY. Predictors of readmission after outpatient plastic surgery. Plast Reconstr Surg. 2014;133(1):173-180.
National Academies of Sciences, Engineering, and Medicine (NASEM); Accounting for Social Risk Factors in Medicare Payment: Identifying Social Risk Factors. Washington DC: National Academies Press; 2016.
NQF-Endorsed Measures for Surgical Procedures, 2015, Final Report. December 23, 2015. http://www.qualityforum.org/Projects/s-z/Surgery_Measures_2014/Final_Report.aspx. Accessed December 18, 2019.
Reames BN, Birkmeyer NJ, Dimick JB, et al. Socioeconomic disparities in mortality after cancer surgery: failure to rescue. JAMA Surg. 2014; 149:475-481.
Trivedi AN, Nsa W, Hausmann LR, et al. Quality and equity of care in U.S. hospitals. New Engl J Med. 2014; 371:2298-2308.
Current 2023 Submission
Conceptual model
For this 2023 submission we have reviewed our conceptual model related to social risk factors and have determined that the original conceptual model remains valid. With an updated, focused literature search we found additional evidence for the impact of social risk factors on outcomes for patients undergoing surgery. For example, a 2019 study found that while patients with low income undergoing colectomy had higher rates of surgical-site infections compared with higher-income patients, there was no difference in surgical-site infection rates based on income for patients undergoing hysterectomy (Qi et al., 2019). A 2019 study in cancer patients undergoing surgery found that patients with psychosocial risk factors were more likely to experience complications following surgery (Leeds et al., 2019). Finally, a 2021 study found that for some procedures, people living in counties with high social vulnerability index (SVI) scores were more likely to experience complications compared with patients who live in low-SVI counties (Diaz et al., 2021).
Social risk factors analyzed
With this updated submission we have replaced analyses that previously used the AHRQ SES variable, described above, with the validated Area Deprivation Index (ADI) (Forefront Group, 2023). We made this change to align with other CMS work on social risk factors that now uses the ADI. We describe the ADI variable below.
Area Deprivation Index (ADI): The ADI, initially developed by the Health Resources & Services Administration (HRSA), is based on 17 measures across four domains: income, education, employment, and housing quality (Kind et al., 2018; Singh, 2003).
The 17 components are listed below:
- Population aged ≥ 25 y with < 9 y of education, %
- Population aged ≥ 25 y with at least a high school diploma, %
- Employed persons aged ≥ 16 y in white-collar occupations, %
- Median family income, $
- Income disparity
- Median home value, $
- Median gross rent, $
- Median monthly mortgage, $
- Owner-occupied housing units, % (home ownership rate)
- Civilian labor force population aged ≥16 y unemployed, % (unemployment rate)
- Families below poverty level, %
- Population below 150% of the poverty threshold, %
- Single-parent households with children aged < 18 y, %
- Households without a motor vehicle, %
- Households without a telephone, %
- Occupied housing units without complete plumbing, % (log)
- Households with more than 1 person per room, % (crowding)
ADI scores were derived using each beneficiary’s 9-digit ZIP Code of residence, which is obtained from the Master Beneficiary Summary File and linked to 2017-2021 US Census/American Community Survey (ACS) data. In accordance with the ADI developers’ methodology, an ADI score is calculated for the census block group corresponding to the beneficiary’s 9-digit ZIP Code using 17 weighted Census indicators. Raw ADI scores were then transformed into a national percentile ranking ranging from 1 to 100, with lower scores indicating lower levels of disadvantage and higher scores indicating higher levels of disadvantage. Percentile thresholds established by the ADI developers were then applied to the ADI percentile to dichotomize neighborhoods into more disadvantaged areas (high ADI: ranking equal to or greater than 85) and less disadvantaged areas (low ADI: ranking of less than 85).
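The dichotomization step can be sketched as follows. The threshold of 85 comes from the text above; the function name and the example percentile ranks are hypothetical and included only for illustration.

```python
# Threshold from the text: a national percentile rank of 85 or higher is "high ADI".
HIGH_ADI_THRESHOLD = 85

def classify_adi(national_percentile):
    """Return 'high' or 'low' ADI for a national percentile rank (1 to 100)."""
    if not 1 <= national_percentile <= 100:
        raise ValueError("ADI national percentile must be between 1 and 100")
    return "high" if national_percentile >= HIGH_ADI_THRESHOLD else "low"

# Hypothetical block-group percentile ranks (illustration only).
example_ranks = [12, 84, 85, 100]
example_labels = [classify_adi(rank) for rank in example_ranks]
```

A rank of exactly 85 falls in the high ADI group because the threshold is inclusive.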
Citations
Diaz, A., Hyer, J. M., Barmash, E., Azap, R., Paredes, A. Z., & Pawlik, T. M. (2021). County-level Social Vulnerability is Associated With Worse Surgical Outcomes Especially Among Minority Patients. Annals of Surgery, 274(6), 881–891. https://doi.org/10.1097/SLA.0000000000004691
Kind AJH, Buckingham W. Making Neighborhood Disadvantage Metrics Accessible: The Neighborhood Atlas. New England Journal of Medicine, 2018. 378: 2456-2458. DOI: 10.1056/NEJMp1802313. PMCID: PMC6051533. AND University of Wisconsin School of Medicine Public Health. 2023 Area Deprivation Index v4.0. Downloaded from https://www.neighborhoodatlas.medicine.wisc.edu/.
Leeds, I. L., Meyers, P. M., Zachary Obinna Enumah, He, J., Burkhart, R. A., Haut, E. R., Efron, J. E., & Johnston, F. M. (2019). Psychosocial Risks are Independently Associated with Cancer Surgery Outcomes in Medically Comorbid Patients. Annals of Surgical Oncology, 26(4), 936–944. https://doi.org/10.1245/s10434-018-07136-3
Singh, G. K. (2003). Area Deprivation and Widening Inequalities in US Mortality, 1969–1998. American Journal of Public Health, 93(7), 1137–1143. https://doi.org/10.2105/ajph.93.7.1137
The Area Deprivation Index Is The Most Scientifically Validated Social Exposome Tool Available For Policies Advancing Health Equity. (2023). Forefront Group. https://doi.org/10.1377/forefront.20230714.676093
Qi AC, Peacock K, Luke AA, Barker A, Olsen MA, Joynt Maddox KE. Associations Between Social Risk Factors and Surgical Site Infections After Colectomy and Abdominal Hysterectomy. JAMA Netw Open. 2019;2(10):e1912339. doi:10.1001/jamanetworkopen.2019.12339
4.4.3 Risk Factor Characteristics Across Measured Entities
Table 7 (see tables/figures attachment) shows the distribution of social risk factors identified in the conceptual model for the HOPD Surgery measure. The facility median proportion of patients with the DE and ADI variables is 3.4% and 6.3%, respectively.
We note that this measure is also reported to hospitals confidentially, stratified by dual eligibility.
4.4.4 Risk Adjustment Modeling and/or Stratification Results
The final list of clinical, procedural, and demographic risk variables was selected during development and is shown here and defined in the data dictionary in tab “HOPD_Surg_Risk_Factor_CCs”.
Age minus 65 (years above 65)
Cancer (CC 8-14)
Diabetes and DM Complications (CC 17-19, 122, 123)
Disorders of Fluid/Electrolyte/Acid-Base (CC 24)
Intestinal Obstruction/Perforation (CC 33)
Inflammatory Bowel Disease (CC 35)
Bone/Joint/Muscle Infections/Necrosis (CC 39)
Hematological Disorders Including Coagulation Defects and Iron Deficiency (CC 46, 48, 49)
Dementia or Senility (CC 51-53)
Psychiatric Disorders (CC 57-63)
Hemiplegia, Paraplegia, Paralysis, Functional Disability (CC 70, 71, 73, 74, 103-105, 189, 190)
Other Significant CNS Disease (CC 77-80)
Cardiorespiratory Arrest, Failure and Respiratory Dependence (CC 82-84)
Congestive Heart Failure (CC 85)
Ischemic Heart Disease (CC 86-89)
Hypertension and Hypertensive Disorders (CC 94, 95)
Arrhythmias (CC 96, 97)
Vascular Disease (CC 106-109)
Chronic Lung Disease (CC 111-113)
UTI and Other Urinary Tract Disorders (CC 144, 145)
Pelvic Inflammatory Disease and Other Specified Female Genital Disorders (CC 147)
Chronic Ulcers (CC 157-161)
Cellulitis, Local Skin Infection (CC 164)
Prior Significant Fracture (CC 169-171)
Morbid Obesity (CC 22)
Work Relative Value Units
Surgical Body System:
Miscellaneous diagnostic and therapeutic procedures
Cardiovascular
Digestive
Ear
Endocrine
Female Genitalia
Hemic-Lymphatic
Skin & Breast
Male Genitalia
Musculoskeletal
Nervous
Nose-Throat-Pharynx
Respiratory
Urinary
4.4.4a Attach Risk Adjustment Modeling and/or Stratification Specifications
4.4.5 Calibration and Discrimination
CORE’s measures undergo an annual measure reevaluation process, which ensures that the risk-standardized models are continually assessed and remain valid, given possible changes in clinical practice and coding standards over time. Modifications made to measure cohorts, risk models, and outcomes are informed by a review of the most recent literature related to measure conditions or outcomes, feedback from various stakeholders, and empirical analyses, including assessment of coding trends that reveal shifts in clinical practice or billing patterns. Input is solicited from a workgroup composed of up to 20 clinical and measure experts, inclusive of internal and external consultants and subcontractors.
To assess model performance, we computed three summary statistics for the HOPD Surgery measure: two discrimination statistics (the c-statistic and predictive ability) and one calibration statistic (over-fitting indices) (Harrell and Shih, 2001). In addition, we provide risk-decile plots.
Discrimination Statistics
(1) Area under the receiver operating characteristic (ROC) curve (c-statistic)
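The c-statistic is the probability that a randomly selected patient who experienced the outcome has a higher predicted risk than a randomly selected patient who did not. It measures how accurately a statistical model distinguishes between patients with and without the outcome: a value of 0.5 indicates discrimination no better than chance, and a value of 1.0 indicates perfect discrimination.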
To calculate the c-statistic, observed hospital visit ratios were compared to predicted hospital visit probabilities across predicted rate deciles.
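For readers unfamiliar with the statistic, the sketch below computes a c-statistic using the standard pairwise (concordance) definition: the share of event and non-event patient pairs in which the event patient has the higher predicted risk, with ties counted as one half. This is an illustration of the general definition, not necessarily the computation used for this submission, and the function name and example data are hypothetical.

```python
def c_statistic(predicted, observed):
    """Share of (event, non-event) pairs in which the event case has the
    higher predicted risk; tied predictions count as one half."""
    events = [p for p, y in zip(predicted, observed) if y == 1]
    non_events = [p for p, y in zip(predicted, observed) if y == 0]
    if not events or not non_events:
        raise ValueError("need at least one event and one non-event")
    concordant = 0.0
    for p_event in events:
        for p_non_event in non_events:
            if p_event > p_non_event:
                concordant += 1.0
            elif p_event == p_non_event:
                concordant += 0.5
    return concordant / (len(events) * len(non_events))

# Hypothetical predicted probabilities and outcomes (1 = unplanned hospital visit).
pred = [0.02, 0.05, 0.10, 0.20, 0.15]
obs = [0, 1, 0, 1, 0]
auc = c_statistic(pred, obs)
```

In this toy example, four of the six event and non-event pairs are concordant, so the c-statistic is 4/6. The pairwise loop is quadratic in the number of patients; for large claims datasets a rank-based computation would be more practical.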
Previous 2020 Submission
C-statistic: 0.684
The c-statistic of 0.684 indicates good model discrimination.
Current Submission
C-statistic: 0.693
This c-statistic indicates continued good model discrimination. The model indicated a wide range between the lowest decile and highest decile, indicating the ability to distinguish high-risk subjects from low-risk subjects.
(2) Predictive ability
Discrimination in predictive ability measures the ability to distinguish high-risk subjects from low-risk subjects; therefore, for a model with good predictive ability we would expect to see a wide range in hospital visit ratios between the lowest decile and highest decile. To calculate the predictive ability, we calculated the range of observed hospital visit ratios between the lowest and highest predicted deciles.
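The decile calculation described above can be sketched as follows: patients are ranked by predicted risk, split into ten groups of near-equal size, and the observed outcome rate is computed in each group. The function name, the grouping rule for uneven group sizes, and the example data are assumptions for illustration only.

```python
def decile_observed_rates(predicted, observed):
    """Sort patients by predicted risk, split them into 10 near-equal groups,
    and return the observed outcome rate (%) in each group, lowest risk first."""
    paired = sorted(zip(predicted, observed), key=lambda pair: pair[0])
    n = len(paired)
    if n < 10:
        raise ValueError("need at least 10 patients")
    rates = []
    for d in range(10):
        group = paired[d * n // 10:(d + 1) * n // 10]
        rates.append(100.0 * sum(y for _, y in group) / len(group))
    return rates

# Hypothetical data: 20 patients with increasing predicted risk (illustration only).
predicted = [i / 100 for i in range(1, 21)]
observed = [0] * 16 + [1, 0, 1, 1]
rates = decile_observed_rates(predicted, observed)
predictive_ability = (rates[0], rates[-1])  # lowest and highest decile rates
```

The reported predictive ability corresponds to the first and last entries of the returned list.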
Previous 2020 Submission
Predictive Ability, % (lowest decile - highest decile): 2.26-18.02
The model indicated a wide range between the lowest decile and highest decile, indicating the ability to distinguish high-risk subjects from low-risk subjects.
Current Submission
Predictive Ability, % (lowest decile - highest decile): 1.74-16.01
The model continues to show a wide range between the lowest decile and highest decile, indicating the ability to distinguish high-risk subjects from low-risk subjects.
Calibration Statistics (from original measure development)
(3) Over-fitting indices
Over-fitting refers to the phenomenon in which a model accurately describes the relationship between predictive variables and outcome in the development dataset but fails to provide valid predictions in new patients. Estimated calibration values of γ0 far from 0 and estimated values of γ1 far from 1 provide evidence of over-fitting. We used Dataset #1 for this analysis. Our results, shown below, give estimates of γ0 close to 0 and γ1 close to 1, indicating good calibration of the model.
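One common formulation of these indices, assumed here for illustration because the submission does not write out the model, refits a two-parameter logistic regression in the validation sample using the linear predictor from the development model as the only covariate. The symbols Z_i, X_i, and β̂ are notation introduced for this sketch.

```latex
\operatorname{logit}\bigl(\Pr(Y_i = 1 \mid Z_i)\bigr) = \gamma_0 + \gamma_1 Z_i,
\qquad Z_i = X_i \hat{\beta}
```

Here Y_i is the outcome for patient i, X_i is the patient's risk-factor vector, and β̂ holds the coefficients estimated in the development sample. Under this formulation, γ0 = 0 and γ1 = 1 mean that the development-sample predictions carry over to new patients without shrinkage or shift.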
CORE notes that after initial measure development we do not re-test our risk models for overfitting using a dataset that is external to the testing sample. In our risk models, coefficients are updated each time the measure is calculated; we refit the model with new data each time the measure is calculated. Therefore, random statistical fluctuations in model coefficients across repeated reporting cycles are part of the overall random error in the facility performance estimates. CORE believes that this approach is not a validity issue for this type of model, unlike the case of a static risk model.
2010 Development Sample results:
Calibration (γ0, γ1): (0, 1)
2010 Validation Sample results:
Calibration (γ0, γ1): (-0.05, 0.96)
Risk Decile Plots
Higher deciles of the predicted outcomes are associated with higher observed outcomes, which show a good calibration of the model. The risk decile plot shown below indicates good discrimination of the model and good predictive ability.
Previous 2020 Submission
See Figure 4 in attachment with compiled figures and tables.
Current Submission
We provide updated risk decile plots for all patients, and for patients with DE and high ADI in Figures 5, 6, and 7 (see tables/figures attachment).
Higher deciles of the predicted outcomes are associated with higher observed outcomes, which continue to show good calibration of the model. The risk decile plot indicates continued good discrimination of the model and good predictive ability, for all patients, and for patients with DE and high ADI, separately.
Citation
Harrell FE and Shih YC. Using full probability models to compute probabilities of actual interest to decision makers, Int. J. Technol. Assess. Health Care 17 (2001), pp. 17–26.
4.4.6 Interpretation of Risk Factor Findings
See fields above and respective attachment.
4.4.7 Final Approach to Address Risk Factors
Risk adjustment approach: On
Specify number of risk factors: 27
Conceptual model for risk adjustment: On
-
-
-
5.1 Contributions Towards Advancing Health Equity
At the patient level, we know that patients with social risk factors (present in our conceptual model) may have higher unadjusted outcomes (hospital visit rates) following outpatient surgery, but differences vary depending on the social risk factor. For example, using CY2022 data (Jan 1, 2022-Dec 31, 2022) we found that patients with dual eligibility (DE) have an unadjusted hospital visit rate of 8.8%, compared with 6.2% for patients without DE. In contrast, patients with high ADI have visit rates that are only slightly higher than those of patients with low ADI (6.8% vs. 6.3%, respectively).
Measure Stratification
Due to these observed disparities in outcomes and the desire to shed light on and improve outcomes for all patients, CORE has developed for CMS (as described in the “stratification” section of this CBE submission) a disparities stratification methodology. Please see the “stratification” section for more information about the methodology; here we provide the most recent results for both the within-facility disparities method (which compares care within a hospital, comparing their DE and non-DE patients) and the across facility method (which compares facility-level outcomes for DE patients to the national average for all DE patients). The full methodology and results are available in more detail in this report:
https://qualitynet.cms.gov/files/652fd45a8be3e0001c0b5141?filename=CY23_OP_32_35_36_DsprtySpecs.pdf
For the within-hospital approach, we found that more hospitals have worse outcomes for their DE patients compared with their non-DE patients. Using data from January 1, 2022-December 31, 2022, we characterized performance using the within-facility approach at the facility level, using rate difference cutoffs of: “better” outcomes as rate differences of less than -1%; “worse” outcomes as rate differences of >1%, and “similar” outcomes for rate differences between -1% and +1%. Using these categories, we found that 160 hospitals were characterized as “better,” 326 hospitals were characterized as “worse” and the remaining 167 were characterized as “similar” for their outcomes for DE patients compared with non-DE patients.
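The categorization rule above can be sketched as a small function. It assumes the rate difference is the DE rate minus the non-DE rate, in percentage points, so that a negative difference means better outcomes for DE patients; this sign convention is inferred from the cutoffs in the text rather than stated explicitly. The function name and example rates are hypothetical.

```python
def categorize_rate_difference(de_rate_pct, non_de_rate_pct, cutoff=1.0):
    """Classify a facility by its DE minus non-DE outcome rate difference
    (percentage points): below -cutoff is 'better', above +cutoff is 'worse',
    and anything in between (inclusive) is 'similar'."""
    diff = de_rate_pct - non_de_rate_pct
    if diff < -cutoff:
        return "better"
    if diff > cutoff:
        return "worse"
    return "similar"

# Hypothetical facility-level rates in percent (illustration only).
examples = [
    categorize_rate_difference(5.0, 6.5),  # DE rate 1.5 points lower
    categorize_rate_difference(9.0, 6.0),  # DE rate 3.0 points higher
    categorize_rate_difference(6.5, 6.0),  # DE rate 0.5 points higher
]
```

A difference of exactly plus or minus 1 percentage point is treated as "similar" in this sketch, matching the "between -1% and +1%" wording.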
For the across-facility disparities approach, we found that slightly more hospitals perform better than the national rate, compared with worse than the national rate: 152 hospitals had outcomes for DE patients that were better than the national rate, compared with 106 that were worse, and 207 that were no different than the national rate. However, 2,676 of 3,141 eligible hospitals (85%) had insufficient data to be categorized.
Publicly reported measure
The version of the HOPD Surgery measure that is publicly reported is not adjusted for social risk factors. We performed two analyses to explore the impact on measure scores of adding either of two social risk factors (DE and ADI) to the model. We found that adding either social risk factor to the model did not result in major impacts on measure scores, suggesting that the variables in the risk model account for some of the differences we see in unadjusted patient-level outcome rates. Therefore, in this pay-for-reporting program, providers will not be unfairly profiled when assessed by the HOPD Surgery measure. We describe the analyses and results below.
To examine the impact of social risk factors on measure scores, we first examined Pearson correlations between measure scores with and without either social risk factor and found that correlations were near 1 (0.999 and 0.999, respectively) (Figure 8 and 9 in attachment of tables/figures). Second, we examined the association between the facility proportion of patients with each social risk factor and measure scores, focusing on the quartile of facilities with the highest proportion of patients with social risk factors (Figure 9 and 10 in attachment of tables/figures). We found a very weak but significant correlation (r=0.039, p=0.03) between the proportion of patients with DE and the measure score for the fourth quartile of facility proportion of patients with DE. However, there is no significant correlation for the high ADI variable (r=0.02, p=0.28). We therefore concluded that there is little to no impact of adding social risk factors on measure scores for this HOPD Surgery measure. As described above, however, CMS has implemented confidential reporting of the measure, stratified for DE.
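The first analysis compares two sets of facility-level scores with a Pearson correlation. A minimal sketch of that calculation is shown below; the function name and the facility scores are hypothetical and do not reproduce the submission's data.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical facility scores (%) from a model without and with a
# social risk factor added (illustration only).
scores_without = [5.8, 6.1, 6.4, 7.0]
scores_with = [5.8, 6.1, 6.5, 6.9]
r = pearson_r(scores_without, scores_with)
```

A correlation near 1 indicates that adding the variable changes facility scores very little relative to one another.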
-
-
-
6.1.3 Current Use(s)
6.1.4 Program Details
Sponsor: Hospital outpatient quality reporting program (HOQR), CMS, https://qualitynet.cms.gov/outpatient/oqr, Implemented by CMS for outpatient services, the Hospital OQR is a national pay-for-quality-data-reporting program mandated by the Tax Relief and Healt, Geographic area: national; For the final cohorts from January 1, 2022 – December 31, 2022, there were 1,204,400 procedures performed in 3,818 faciliti, The level of measurement is the facility; the setting is the Hospital Outpatient Department.
-
6.2.1 Actions of Measured Entities to Improve Performance
The outcome of unplanned hospital visits following outpatient same-day surgery is a widely accepted measure of outpatient surgical care quality. This measure provides the opportunity to improve quality of care and to lower rates of adverse events leading to hospital visits after outpatient surgery.
Estimates of hospital visit rates within the first 30 days following surgery vary from less than one percent to 28% depending on the type of surgery, the outcome measured (inpatient admissions alone or with ED visits, and observation stays), outcome timeframe (e.g., 7, 14, or 30 days), and patient characteristics (e.g. age, sex) (Christian, 2019; Mull, 2019, De Oliveira, 2015; Liu et al., 2018; Rosero et al., 2017, DeFroda, 2017, Gengler et al., 2017, Liu et al., 2018-2). For example, a 2018 retrospective study of patients undergoing outpatient shoulder arthroscopy found an inpatient admission rate within 7 days of 0.22% (Liu et al, 2018a). In contrast, a 2018 study of veterans aged 65 or older found a 28% rate of hospital admissions (in-patient, emergency department, and observation stays) within 7 days for patients who had urological surgery, and a 6% rate of hospital admissions for patients who had orthopedic surgery (Mull et al., 2018).
Common causes of return visits following outpatient surgery include surgical errors, post-operative pain, infection, nausea, and vomiting (Rosero et al., 2017, Gildaseo et al., 2015, Liu et al., 2018a, Liu et al., 2018b). In one 2017 study of patients undergoing outpatient laparoscopic cholecystectomy, 60% of hospital return visits were due to these preventable events (Rosero et al., 2017). Other less common, but more serious, reasons for return hospital visits include bleeding, respiratory complications, deep vein thrombosis, cardiac complications, and urinary complications (Rosero et al, 2017; Gildasio, et. Al., 2015; DeOliveria, 2015; Liu et al., 2018a; Liu et al., 2018b; Rosero et al., 2017). Patient characteristics, such as age, sex, and comorbidities such as diabetes, can increase the risk of an admission (De Oliveria et al., 2015; DeFroda et al., 2017; Gengler et al., 2017; Christian et al., 2019). In addition, clinical procedural factors can increase the risk, such as the type of anesthesia used, and longer operation time (Defroda et al., 2017; Liu et al., 2018a; Gengler et al., 2017; Mingus et al., 1997; Christian et al., 2019).
Interventions to improve same-day outpatient surgical procedural quality can reduce unplanned hospital visits following outpatient surgery. Potential quality improvement actions include appropriate patient selection, improving surgical techniques, implementing protocols to address common problems such as adequate control of nausea and vomiting and postoperative pain, patient education about potential adverse effects of the surgery, reconciling patient medications, and organizing appropriate follow-up care with providers such as primary care physicians. For example, guidelines recommend multi-modal approaches for treatment of post-operative pain (Chou et al., 2016; Joshi et al., 2014; Mariano, 2020) as well as routine multi-modal nausea and vomiting prophylaxis for all patients (Gan et al., 2014). Facilities can also provide support for identifying and managing patient-level risk factors; for example, identifying patients with diabetes can ensure optimal care during the perioperative period regarding prevention of hyperglycemia (Thompson et al., 2016).
A hospital visit following same-day surgery is an unexpected and potentially preventable outcome for patients scheduled for same-day surgeries that have a low anticipated risk. Providers (HOPDs and surgeons) are often unaware of their patients’ hospital visits after surgery because patients often present to the ED or to different hospitals, leading to understated adverse event rates and suggesting the need for better measurement to drive quality improvement (Mezei and Chung, 1999). Therefore, both patients and providers benefit from outcome measures of hospital visits, a broad, patient-centered outcome that reflects the full range of reasons leading to hospitalization among patients undergoing same-day surgery.
The HOPD Surgery measure is part of the Hospital Outpatient Quality Reporting (HOQR) Program, a pay-for-reporting program. HOPDs first saw their facility-specific measure scores in 2017, during a “dry run” that preceded public reporting. The measure was first publicly reported in January 2020, on Hospital Compare. Currently, there are no other publicly available quality reports of HOPDs that perform same-day surgery. Thus, this measure addresses an important quality measurement area and enhances the information available to patients choosing among HOPDs that provide same-day outpatient surgery. Furthermore, providing outcome rates to HOPDs makes meaningful quality differences visible to clinicians and hospitals and incentivizes improvement.
Citations
Christian RA, Gibbs DB, Nicolay RW, Selley RS, Saltzman MD. Risk factors for admission after shoulder arthroscopy. J Shoulder Elbow Surg. 2019 May;28(5):882-887.
Chou et al. Guidelines on the Management of Postoperative Pain. J Pain. 2016 Feb;17(2):131-157.
DeFroda SF, Bokshan SL, Owens BD. Risk Factors for Hospital Admission Following Arthroscopic Bankart Repair. Orthopedics. 2017 Sep 1;40(5):e855-e861.
De Oliveira GS Jr, Holl JL, Lindquist LA, Hackett NJ, Kim JY, McCarthy RJ. Older Adults and Unanticipated Hospital Admission within 30 Days of Ambulatory Surgery: An Analysis of 53,667 Ambulatory Surgical Procedures. J Am Geriatr Soc. 2015 Aug;63(8):1679-85.
Falck-Ytter Y, Francis CW, Johanson NA, Curley C, Dahl OE, Schulman S, Ortel TL, Pauker SG, Colwell CW Jr. Prevention of VTE in orthopedic surgery patients: Antithrombotic Therapy and Prevention of Thrombosis, 9th ed: American College of Chest Physicians Evidence-Based Clinical Practice Guidelines. Chest. 2012;141(2 Suppl):e278S.
Gan TJ, Diemunsch P, Habib AS, et al. Consensus guidelines for the management of postoperative nausea and vomiting. Anesth Analg. 2014 Jan;118(1):85-113.
Gengler I, Carpentier L, Pasquesoone X, Chevalier D, Mortuaire G. Predictors of unanticipated admission within 30 days of outpatient sinonasal surgery. Rhinology. 2017 Sep 1;55(3):274-280.
Joshi GP, Schug SA, Kehlet H. Procedure-specific pain management and outcome strategies. Best Pract Res Clin Anaesthesiol. 2014 Jun;28(2):191-201.
Liu J, Flynn DN, Liu WM, Fleisher LA, Elkassabany NM. Hospital-Based Acute Care Within 7 Days of Discharge After Outpatient Arthroscopic Shoulder Surgery. Anesth Analg. 2018a Feb;126(2):600-605.
Liu J, Kim DH, Maalouf DB, Beathe JC, Allen AA, Memtsoudis SG. Thirty-Day Acute Health Care Resource Utilization Following Outpatient Anterior Cruciate Ligament Surgery. Reg Anesth Pain Med. 2018b Nov;43(8):849-853.
Mariano E. Management of acute perioperative pain. UpToDate, January 2020. Accessed on March 6, 2020. https://www.uptodate.com/contents/management-of-acute-perioperative-pain
Mezei G, Chung F. Return hospital visits and hospital readmissions after ambulatory surgery. Annals of surgery. Nov 1999;230(5):721-727.
Mingus ML, Bodian CA, Bradford CN, Eisenkraft JB. Prolonged surgery increases the likelihood of admission of scheduled ambulatory surgery patients. Journal of clinical anesthesia. Sep 1997;9(6):446-450.
Rosero EB, Joshi GP. Hospital readmission after ambulatory laparoscopic cholecystectomy: incidence and predictors. J Surg Res. 2017 Nov;219:108-115.
Sanabria D, Rodriguez J, Pecci P, Ardila E, Pareja R. Same-Day Discharge in Minimally Invasive Surgery Performed by Gynecologic Oncologists: A Review of Patient Selection. J Minim Invasive Gynecol. 2019.
Thompson BM, Stearns JD, Apsey HA, Schlinkert RT, Cook CB. Perioperative Management of Patients with Diabetes and Hyperglycemia Undergoing Elective Surgery. Curr Diab Rep. 2016 Jan;16(1):2.
6.2.2 Feedback on Measure Performance
CMS receives feedback on all of its measures through the publicly available Q&A tool on Quality Net. Since the last submission, we have received through this tool only basic questions about the measure, including the cohort definition, the outcome definition, and specific questions about a facility’s data. We did not receive any suggestions for changes to the HOPD surgery measure.
6.2.3 Consideration of Measure Feedback
We have made no major changes to the HOPD Surgery measure since it was last endorsed in 2020.
Minor measure updates include:
- Annual updates (2020, 2021, 2022) to ICD-10 codes that are used to specify the measure.
Each year, as part of reevaluation of the measure, CMS reviews the measure’s existing code set as well as updates to ICD-10, CPT®, and HCPCS coding guidelines to ensure that the measure’s code set is up to date.
- Update to coding for ED visits by shifting from the previously used ‘claim from date’ on the claim to the ‘minimum ED revenue center date’ on the claim. This aligns with changes made in the prior year to exclude cases based on this date.
6.2.4 Progress on Improvement
The measure score is a risk-standardized hospital visit ratio (RSHVR) for each HOPD. It is calculated as the ratio of the number of predicted unplanned hospital visits (numerator) to the number of expected unplanned hospital visits (denominator). The numerator (predicted visits) is the number of unplanned hospital visits the HOPD is predicted to have, based on its observed unplanned hospital visit rate, the number of surgeries it performed, and its case mix. The denominator (expected visits) is the number of unplanned hospital visits the HOPD is expected to have based on the nation’s performance with that HOPD’s case mix and surgical procedure mix.
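As a rough illustration of the predicted-to-expected ratio described above, the following Python sketch computes an RSHVR for a single HOPD. It assumes a logistic form in which each procedure contributes a case-mix term on the log-odds scale and the facility and national baselines enter as intercepts; that functional form, the function names, and all numeric values are illustrative assumptions, not the measure's published specification.

```python
import math


def logistic(z):
    """Inverse logit: maps a log-odds value to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def rshvr(case_mix_terms, facility_intercept, national_intercept):
    """Ratio of predicted to expected unplanned visits for one HOPD.

    case_mix_terms: hypothetical per-procedure risk terms (log-odds scale)
    facility_intercept: hypothetical HOPD-specific baseline (log-odds)
    national_intercept: hypothetical national-average baseline (log-odds)
    """
    # Predicted visits: this HOPD's own baseline applied to its case mix.
    predicted = sum(logistic(facility_intercept + t) for t in case_mix_terms)
    # Expected visits: the national baseline applied to the same case mix.
    expected = sum(logistic(national_intercept + t) for t in case_mix_terms)
    return predicted / expected


# Illustrative values only, not real measure data.
terms = [-0.3, 0.0, 0.4, 0.9]
print(rshvr(terms, facility_intercept=-2.4, national_intercept=-2.6))
```

Under these assumptions, a ratio above 1 means the HOPD is predicted to have more unplanned visits than would be expected nationally for the same case mix, and a ratio below 1 means fewer (lower is better).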
This measure captures an ever-changing mix of procedures because its cohort is based on the ASC Covered Procedures List, to which procedures are added (and from which they are removed) over time. For example, Total Knee Arthroplasty (TKA) was removed from the CMS inpatient-only (IPO) list in January 2018, allowing TKAs to be performed in the inpatient or outpatient hospital setting. TKA was added to the Ambulatory Surgery Center (ASC) Covered Procedures List in January 2020, and Total Hip Arthroplasty (THA) was added in January 2021. HOPD volumes of THA/TKA procedures have been steadily increasing; after the onset of COVID, the proportion of THA/TKA procedures performed (for Medicare FFS patients) in the outpatient setting exceeded the proportion performed in the inpatient setting (data not shown). Other than this HOPD Surgery measure, there are currently no active/implemented outpatient performance measures that capture complications following THA/TKA procedures performed at HOPDs. (CMS has, however, proposed the adoption of the THA/TKA PRO-PM for the HOPD and ASC settings (Federal Register, 2023).)
Because this measure includes procedures defined by the ASC Covered Procedures List, we expect the inclusion of new procedures, such as THA/TKA, to affect measure scores across facilities, which in turn limits our ability to track improvements in outcomes. In fact, with this most recent data update, we found that there are more statistical outliers (both better and worse than national rates) compared with the prior 2020 submission, suggesting that performance variation has widened with the expansion of the cohort to include additional outpatient surgical procedures.
Facilities, however, receive facility-specific information from CMS to support quality improvement. For example, they receive detailed patient-level data that indicate which patients experienced an unplanned hospital visit, what type of visit it was (e.g., inpatient admission, ED visit, observation stay), and the principal diagnosis code associated with the visit. They also receive summary information that shows their unadjusted performance by body system (e.g., musculoskeletal, urinary tract) in comparison with state and national benchmarks.
6.2.5 Unexpected Findings
There have been no unexpected findings during implementation. However, disruptions to the healthcare system due to COVID likely accelerated the migration of procedures to the outpatient space, resulting in changes to the case mix of procedures captured by this measure (for example, more THA/TKA procedures).
Endorsement Should Be Removed From Measure 2687
Organization: Harold D. Miller, CEO, Center for Healthcare Quality and Payment Reform
CBE# 2687 Staff Assessment
Importance
Strengths:
- The developer cites evidence indicating that nearly 8% of hospital outpatient surgeries had an unplanned hospital visit within 7 days, which varies depending on whether the patient has his/her surgery at a lower-quality versus higher-quality hospital.
- The developer further cites evidence of common preventable causes of hospitalizations following outpatient surgery, including surgical errors, post-operative pain, infection, nausea, urinary retention, and vomiting, all of which, the developer posits within its logic model, can be acted upon for quality improvement. Data from the 2022 calendar year show that among the 3,817 hospitals included in the measure, the risk-standardized hospital visit ratios ranged from 0.44 to 3.31 (better quality = lower score), indicating that wide variation still exists and warrants a performance measure. The developer also showed variation in performance categories across facilities, finding that 220 facilities (5.8%) performed “Better than Expected,” 2,427 facilities (63.6%) performed “No Different than Expected,” and 229 facilities (6.0%) performed “Worse than Expected.”
Limitations:
- It is unclear whether patients value the measured outcome. The developer notes there are no other publicly available measures that report this information and emphasizes that this measure addresses an important quality measurement area and enhances the information available to patients choosing among HOPDs that provide same-day outpatient surgery. The developer further notes that during development, it convened a Technical Expert Panel (TEP), including patient representatives, to assess the measure’s face validity.
Rationale:
- There is a business case for the measure, and although it is unclear whether patients value the measured outcome, the developer provides supporting evidence for the importance of the measured outcome. This includes summarizing actions providers can take to improve the outcome. Additionally, a gap in care remains that warrants this measure.
Feasibility Acceptance
Strengths:
- This is a claims-based measure. The developer states there are no fees, licensing, or other requirements for using this measure.
Limitations:
None
Rationale:
- This is a claims-based measure without any fees, licensing, or other requirements for its use.
Scientific Acceptability
Reliability
Strengths:
- The measure is well-defined and precisely specified.
- Overall reliability among facilities with at least 30 procedures is 0.86 and ranges from 0.60 (decile 1) to 0.98 (decile 10); the minimum is not reported.
- The developer performed signal-to-noise reliability testing and reported results overall and limited to facilities with at least 30 procedures for public reporting based on requirements for the Hospital Outpatient Quality Reporting program (2,786 facilities and 1,194,500 procedures, which accounts for 99.2% of total procedures).
- Data come from CY 2022 (January 1, 2022 - December 31, 2022).
Limitations:
- A small proportion of facilities with at least 30 procedures (<10%) have a signal-to-noise reliability below the threshold of 0.6.
Rationale:
- Reliability testing of data elements and accountable entity-level reliability were performed. A large majority of facilities have a reliability which exceeds the accepted threshold of 0.6 and less than 10% of facilities are below the threshold.
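For readers less familiar with signal-to-noise reliability, the Python sketch below shows the general idea: the share of a facility's score variance attributable to true between-facility differences (signal) rather than sampling variability (noise), checked against the 0.6 threshold cited above. The variance inputs, function names, and example values are hypothetical; the developer's exact estimation method is not described in this assessment.

```python
def signal_to_noise(between_var, within_var):
    """Reliability as signal / (signal + noise) for one facility.

    between_var: hypothetical variance in true performance across facilities
    within_var: hypothetical sampling variance of this facility's score
    """
    return between_var / (between_var + within_var)


def share_below(reliabilities, threshold=0.6):
    """Fraction of facilities whose reliability falls below the threshold."""
    return sum(r < threshold for r in reliabilities) / len(reliabilities)


# Illustrative values only, not the developer's data.
example = [signal_to_noise(3.0, w) for w in (0.5, 1.0, 2.0, 4.0)]
print(example, share_below(example))
```

Because sampling variance generally shrinks as procedure volume grows, lower-volume facilities tend to sit at the lower end of the reliability range, which is consistent with reliability rising from decile 1 to decile 10.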
Validity
Strengths:
- The developer conducted validity testing of the measure score by examining the relationship between hospital outpatient surgery measure scores and facility-level procedural volume. The developer examined this association by plotting hospital outpatient surgery measure scores (RSHVRs) within quintiles of facility-level procedural volume. The developer hypothesized that there would be a weak, negative correlation between facility-level volume and hospital outpatient surgery scores. The developer found that median RSHVRs decline across quintiles, especially for the three highest-volume quintiles, indicating that measure scores trend lower (lower = better quality) as facility volume increases. The correlation coefficient between facility-level procedural volume and the hospital outpatient surgery measure score was -0.18 (p-value: <0.0001). The developer did not indicate any missingness of data. The measure is risk adjusted on 27 clinical, procedural, and demographic risk factors. The developer constructed this risk model using a well-defined conceptual model, which also examined social risk factors, including dual eligibility and the Area Deprivation Index (ADI) variable. The developer did not include these social risk factors in the final model and reported a c-statistic of 0.693, indicating good model discrimination.
- The measure is stratified by dual-eligibility status in confidential reporting; see Equity section for details on testing social risk factors.
Limitations:
None
Rationale:
- The developer conducted validity testing of the measure score by examining the relationship between hospital outpatient surgery measure scores and facility-level procedural volume. The developer examined this association by plotting hospital outpatient surgery measure scores (RSHVRs) within quintiles of facility-level procedural volume. The developer hypothesized that there would be a weak, negative correlation between facility-level volume and hospital outpatient surgery scores. The developer found that median RSHVRs decline across quintiles, especially for the three highest-volume quintiles, indicating that measure scores trend lower (lower = better quality) as facility volume increases. The correlation coefficient between facility-level procedural volume and the hospital outpatient surgery measure score was -0.18 (p-value: <0.0001). The developer did not indicate any missingness of data. The measure is risk adjusted on 27 clinical, procedural, and demographic risk factors. The developer constructed this risk model using a well-defined conceptual model, which also examined social risk factors, including dual eligibility and the Area Deprivation Index (ADI) variable. The developer did not include these social risk factors in the final model and reported a c-statistic of 0.693, indicating good model discrimination.
- The measure is stratified by dual-eligibility status in confidential reporting; see Equity section for details on testing social risk factors.
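The volume analysis described above (median RSHVR within volume quintiles, plus an overall correlation) can be sketched in Python as follows. This is a minimal illustration with made-up facility data; the function names are hypothetical, and the developer's actual software and quintile cut-points are not stated in the submission.

```python
from statistics import mean, median


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def median_score_by_volume_quintile(volumes, scores):
    """Median score within each volume quintile, lowest volume first.

    Assumes at least five facilities so that no quintile is empty.
    """
    ranked = sorted(zip(volumes, scores))  # order facilities by volume
    n = len(ranked)
    groups = [[] for _ in range(5)]
    for i, (_, score) in enumerate(ranked):
        groups[i * 5 // n].append(score)  # rank position -> quintile 0..4
    return [median(g) for g in groups]


# Made-up facility volumes and RSHVRs, for illustration only.
volumes = [40, 55, 70, 90, 120, 150, 200, 260, 340, 500]
scores = [1.30, 1.25, 1.10, 1.15, 1.05, 1.00, 0.95, 0.90, 0.85, 0.80]
print(pearson(volumes, scores))
print(median_score_by_volume_quintile(volumes, scores))
```

A negative coefficient together with medians that fall from the lowest to the highest quintile would match the pattern the developer reports, although the reported correlation of -0.18 is far weaker than this toy data would produce.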
Equity
Strengths:
- Developers evaluated disparities for dual eligibility (DE) and area deprivation index (ADI).
- DE patients had a significantly higher visit rate than non-DE patients, unadjusted; measure is stratified by DE in confidential reporting, using both within-facility (outcome for a patient with a risk factor vs. without at the same facility) and across-facility disparity methods (outcome for a patient with the risk factor at that facility relative to an average facility).
- Patients with ADI were found to have a slightly higher rate of hospital visits than those without, unadjusted; measure is not stratified by ADI.
Limitations:
None
Rationale:
- Developers evaluated disparities for dual eligibility and area deprivation index, and identified a substantially higher hospital visit rate for patients with DE vs. non-DE patients. The measure is stratified by DE for confidential reporting using two methods: within-facility (outcome for a patient with DE vs. without at the same facility); and across-facility (outcome for a patient with DE at that facility relative to a patient with DE at an average facility).
Use and Usability
Strengths:
- The measure is currently in use in the Hospital Outpatient Quality Reporting (HOQR) Program, a pay-for-reporting program, and facilities receive detailed patient-level reports on unplanned visits and their performance relative to state and national benchmarks.
- Developer claims providers and patients can benefit from this measure because hospital visits after surgery may not be apparent when they take place in the ED or a different hospital, which may understate the true rate.
- Given the breadth of this measure, the developer cites a wide range of clinical actions providers can take to reduce the hospital visit rate following surgery, reflecting common issues; as for actions entities can take, the developer suggests providing support for identifying and managing patient-level risk factors, e.g., for patients with diabetes, to help prevent hyperglycemia.
- Feedback on the measure can be submitted via Quality Net; since the last submission, the developer reports receiving only basic questions about the measure and no suggestions for respecification. The developer describes the annual process for reviewing and updating ICD-10, CPT, and HCPCS codes, which recently resulted in a change in coding the date of the ED claim.
Limitations:
- Developer explains that the list of covered surgeries changes from year to year as more procedures are approved for the ambulatory surgical setting, and these changes can impact measure scores and make it difficult to track improvement; specifically, COVID accelerated migration of surgical procedures to outpatient, such as TKA/THA (no unexpected findings other than this issue).
Rationale:
- The measure is currently in use in the HOQR and facilities receive detailed patient-level reports for unplanned visits and their performance relative to state and national benchmarks. While there are many actions clinicians can take to reduce unplanned visits, the developers identify just one QI strategy, tracking and managing patient-level risk factors that affect unplanned visits following surgery (details not provided). Feedback on the measure is gathered through Quality Net and no comments suggested respecification.
- A challenge for this measure is the changing landscape of procedures approved for outpatient settings in part as a result of COVID, which complicates an evaluation of changes in performance over time.
Summary
N/A
No additional comments
Importance
The developers provided evidence supporting the importance of the measure, including the variability in results across hospital outpatient departments and potential interventions. Although public comments raised the concern that an aggregate measure covering a variety of different surgical procedures may be challenging to interpret, that risk does not override the benefit of having such a summarized measure as a starting point for comparing hospital outpatient department performance.
Feasibility Acceptance
The measure uses publicly available claims and other demographic and socioeconomic data sources.
Scientific Acceptability
Reliability
Both reliability and validity are supported by the developer. Reliability testing was performed with reasonable results.
Validity
Both reliability and validity are supported by the developer. Validity was supported by comparison with procedure volume (expecting an inverse relationship, with high-volume centers having lower rates).
Equity
The developer reported results based on dual eligibility status and area deprivation index.
Use and Usability
The measure is currently in use in the HOQR. Although the results represent an aggregation of a variety of surgical procedures (cited as a limitation in the public comments), the balance of which may influence the final rate, the summarized measure still provides a high-level view of hospital outpatient department performance, allowing an initial comparison.
Summary
No additional comments
Overall Summary
Importance
The evidence provided supports measurement of the desired outcomes.
Feasibility Acceptance
Claims-based; no issues identified for reporting.
Scientific Acceptability
Reliability
Reliability testing of data elements and accountable entity-level reliability were performed. A large majority of facilities have a reliability which exceeds the accepted threshold of 0.6 and less than 10% of facilities are below the threshold.
Validity
The developer conducted validity testing of the measure score by plotting hospital outpatient surgery measure scores (RSHVRs) within quintiles of facility-level procedural volume. The developer hypothesized that there would be a weak, negative correlation between facility-level volume and hospital outpatient surgery scores. The developer found that median RSHVRs decline across quintiles, especially for the three highest-volume quintiles, indicating that measure scores trend lower (lower = better quality) as facility volume increases. The correlation coefficient between facility-level procedural volume and the hospital outpatient surgery measure score was -0.18 (p-value: <0.0001). The developer did not indicate any missingness of data. The measure is risk adjusted on 27 clinical, procedural, and demographic risk factors. The developer constructed this risk model using a well-defined conceptual model, which also examined social risk factors, including dual eligibility and the Area Deprivation Index (ADI) variable. The developer did not include these social risk factors in the final model and reported a c-statistic of 0.693, indicating good model discrimination.
Equity
Developers evaluated disparities for dual eligibility (DE) and area deprivation index (ADI). DE patients had a significantly higher visit rate than non-DE patients, unadjusted; measure is stratified by DE in confidential reporting, using both within-facility (outcome for a patient with a risk factor vs. without at the same facility) and across-facility disparity methods (outcome for a patient with the risk factor at that facility relative to an average facility). Patients with ADI were found to have a slightly higher rate of hospital visits than those without, unadjusted; measure is not stratified by ADI.
Use and Usability
The measure is currently in use in the HOQR and facilities receive detailed patient-level reports for unplanned visits and their performance relative to state and national benchmarks. While there are many actions clinicians can take to reduce unplanned visits, the developers identify just one QI strategy, tracking and managing patient-level risk factors that affect unplanned visits following surgery (details not provided). Feedback on the measure is gathered through Quality Net and no comments suggested respecification.
Summary
The developer has met all submission criteria, and the measure is ready for continued use in the field.
Overall Summary for Measure # 2687
Importance
The developer describes a 2022 study on the same population (Medicare fee-for-service beneficiaries) that shows ~8% of hospital outpatient surgeries were followed by an unplanned hospital visit. The study also reports that if a patient received surgery at a low-quality vs. high-quality hospital, the risk of an unplanned hospital visit within 7 days increases by 29%. The developer also describes common causes of preventable return visits and actions providers could take to reduce return visits.
Performance on the measure was included, demonstrating that variation in performance continues for this measure (prior submission in 2020). A TEP was convened including patient representatives. Twelve of the thirteen TEP members moderately or strongly agreed that the measure can be used to distinguish between better and worse quality.
These demonstrate a benefit to the measure and an adequate business case (reducing adverse events), and a gap in performance for the target population. Patient input was provided via the TEP.
Feasibility Acceptance
The measure is claims-based and feasible to collect: data collection and measure calculation are automated and use electronic, standardized data already routinely generated for billing purposes.
Scientific Acceptability
Reliability
The developer presented signal-to-noise reliability testing for facilities with at least 30 procedures for calendar year 2022, where the lowest decile had a reliability of 0.60, meeting the threshold of 0.6. A small portion of hospitals did not meet the threshold (<10%).
Validity
The developer presented the correlation between the measure and a related performance measure (surgical volume). There was an overall trend toward improved outcomes (measure in which lower is better) with increasing volume (higher volume is associated with better outcomes; higher is better) with a correlation coefficient of -0.18, p-value = <0.0001. The measure is risk adjusted on patient characteristics, comorbidities, work relative value units, and surgical body system. The model has a c-statistic of 0.693, showing good model discrimination. The developer shared a decile analysis of the predictive ability with a wide range between lowest and highest decile, indicating the ability to distinguish between low- and high-risk patients. Risk adjustment of social risk factors was tested using dual eligibility, AHRQ SES index, and the Area Deprivation Index. Although significant in the unadjusted model, adding these factors to the model had a small impact on the full model including risk adjustment for patient factors and clinical factors.
Equity
Risk adjustment of social risk factors was tested using dual eligibility, AHRQ SES index, and the Area Deprivation Index. Although significant in the unadjusted model, adding these factors to the model had a small impact on the full model including risk adjustment for patient factors and clinical factors. Confidential reports stratified on dual eligibility are available to hospitals to provide information on disparities. These include data to allow hospitals to determine whether within their facility there are health disparities based on dual eligibility and to assess their performance in dual eligible populations compared to other hospitals.
Use and Usability
The measure is currently in use in the CMS Outpatient Quality Reporting program. Hospitals are provided with confidential reports that provide insight into performance and include data stratified by dual eligible status to inform health disparities work.
Summary
The developer is seeking review of an already endorsed measure. The overall merits of the measure remain the same. However, the developer may consider how this measure aligns with measures in use in the CMS Ambulatory Surgery Center Quality Reporting program and whether there may be updates to the specifications or new measures that allow for across-setting comparisons.
Public Comment Objections
Importance
I would like to see some of the objections in the public comment addressed.
Feasibility Acceptance
The measure has been in use for a number of years.
Scientific Acceptability
Reliability
I would like to hear the methodological objections in the public comment addressed.
Validity
Same comment as above.
Equity
No comments here.
Use and Usability
I do think the measure meets these requirements.
Summary
I do think there should be some discussion of the objections raised in the public comment and, if the comments are deemed valid, of whether they can be incorporated to make a stronger measure.
While this measure is…
Importance
The lack of tracking by type of surgery and by reason for readmission detracts from the value to the patient. Also, it is not clear that readmission is the only appropriate measure of surgical success. This measure will not capture cases in which a patient has a negative outcome but goes to their primary care physician instead of the surgeon for treatment. Is there a potential for surgeons to favor outpatient treatment for patients with issues, rather than readmitting them, in order to maintain their performance average?
Feasibility Acceptance
All data required as specified in this proposal are in the EMR.
Scientific Acceptability
Reliability
With the types of surgical procedures performed changing so rapidly, it is unclear that these data will be valid for year-to-year or institution-to-institution comparisons.
Validity
It is not clear that this measure is viable, given the multiple variables not addressed.
Equity
Failure to publicly share significant DE information doesn’t help address this issue.
Use and Usability
It is not clear that all factors affecting patient satisfaction are addressed.
Summary
While this measure is currently in use, it is not clear that it is fair to surgeons and institutions or that it provides the information desired by the patient. Because the measure combines all types of surgeries and counts all admissions after outpatient procedures, even those not specifically tied to the procedure, the data seem distorted. The lack of diversity data and the lack of data on post-surgical issues addressed by physicians other than the surgeon, or addressed without readmission, are also significant issues.
Not met but addressable
Importance
Measuring unplanned visits within 7 days with a diagnosis indicative of a complication of care of the prior surgery is an important metric.
However, the current measure lacks specificity in certain areas. For example, as one public comment noted, deaths that did not occur in the hospital setting are completely excluded, as are visits to urgent care centers with a diagnosis indicative of a complication of care.
Additional specifications for ED visits within 7 days should be made to create a stronger measure. Either exclude ED and urgent care visits altogether or only include visits that can be tied to complications of the prior surgery. This also avoids penalizing overcautious patients and physicians who want to prevent bigger issues that would warrant an inpatient admission.
Also, the rationale for treating a second procedure within 7 days as a separate index event, rather than as an undesirable event, is puzzling.
Feasibility Acceptance
Feasibility AcceptanceAgree with staff assessment
Scientific Acceptability
Scientific Acceptability ReliabilityAgree with staff assessment
Scientific Acceptability ValidityAgree with staff assessment
Equity
EquityAgree with staff assessment
Use and Usability
Use and UsabilityThe measure still seems too broad to be widely usable. Creating procedure classification categories and separate models for each category could improve the model and gain broader support for this measure.
Summary
Measuring unplanned visits within 7 days with a diagnosis indicative of a complication of care from the prior surgery is an important metric.
However, the current measure lacks specificity in certain areas. For example, as one public comment noted, deaths that occur outside the hospital setting are completely excluded, as are visits to urgent care centers with a diagnosis indicative of a complication of care.
Additional specifications for ED visits within 7 days should be made to create a stronger measure: either exclude ED and urgent care visits altogether or include only visits that can be tied to complications of the prior surgery. This would also avoid penalizing overcautious patients and physicians who want to prevent bigger issues that would warrant an inpatient admission.
Also, the rationale for coding a second procedure within 7 days as a separate index event rather than as an undesirable event is puzzling.
Consider creating procedure categories for better model performance and support.
NA
Importance
ImportanceAre ASCs included in this metric?
How does this measure ensure that a visit is attributed to the procedure location rather than to the hospital visit location?
Feasibility Acceptance
Feasibility AcceptanceNA
Scientific Acceptability
Scientific Acceptability ReliabilityNA
Scientific Acceptability ValidityNA
Equity
EquityIf this measure does not include the ASC procedure location then there could be equity issues between insured vs uninsured.
Use and Usability
Use and UsabilityNA
Summary
NA
Mixed reliability and validity findings
Importance
ImportanceThis is without a doubt an important measure that can drive quality improvements by reducing adverse patient outcomes associated with preparation for same-day surgery.
Feasibility Acceptance
Feasibility AcceptanceThis measure is feasible by definition because it relies on claims data.
Scientific Acceptability
Scientific Acceptability ReliabilityTechnically, this measure has an acceptable signal-to-noise reliability. However, the signal-to-noise approach may not be appropriate for this measure. Since this measure utilizes a hierarchical linear regression model to compute the risk-standardized hospital visit ratio (RSHVR), the RSHVRs of the small hospitals are smoothed closer to the national mean. As the performance of the small hospitals will be close to the national mean, both the hospital-specific noise variances and the overall signal variance may be biased and won't reflect the true variation in the measure scores within each hospital (noise variance), and the differences between hospitals (signal variance). I think that for this measure, a test-retest or a split-sample approach would be methodologically more sound than a signal-to-noise method. I am also inclined to think that because of the smoothing, the reliability of the measure may be somewhat inflated.
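The smoothing effect described in this comment can be illustrated with a minimal sketch. It assumes a simple normal random-effects approximation rather than the measure's actual hierarchical model, and every number in it (the variances, the case counts, the observed ratio) is hypothetical. It shows only how a signal-to-noise reliability is formed and how the same observed ratio is pulled further toward the national mean when a hospital has few cases.

```python
# Illustrative only: a normal-approximation stand-in for a hierarchical model.
# All variances and case counts below are hypothetical, not taken from the measure.

def signal_to_noise_reliability(signal_var, noise_var):
    """Share of total score variance attributable to true between-hospital differences."""
    return signal_var / (signal_var + noise_var)

def shrunken_estimate(observed, national_mean, signal_var, noise_var):
    """Pull an observed hospital ratio toward the national mean in proportion to its noise."""
    r = signal_to_noise_reliability(signal_var, noise_var)
    return national_mean + r * (observed - national_mean)

SIGNAL_VAR = 0.04    # hypothetical between-hospital variance
PER_CASE_VAR = 4.0   # hypothetical per-case variance; noise shrinks as cases increase
NATIONAL_MEAN = 1.0
OBSERVED = 1.5       # the same raw ratio observed at a small and a large hospital

small = shrunken_estimate(OBSERVED, NATIONAL_MEAN, SIGNAL_VAR, PER_CASE_VAR / 30)
large = shrunken_estimate(OBSERVED, NATIONAL_MEAN, SIGNAL_VAR, PER_CASE_VAR / 3000)

print(f"small hospital (30 cases):   {small:.3f}")
print(f"large hospital (3000 cases): {large:.3f}")
```

Under these assumptions the small hospital's reported ratio sits much closer to 1.0 than the large hospital's, even though both started from the same observed value.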
Scientific Acceptability ValidityThe empirical validity for this measure does not support the convergent validity of the measure. The correlation between this measure and the criterion measure is very weak and not statistically significant (0.033; p=0.07). I believe that a hypothesis-driven validity method (aka known-group validity) should allow the developers to support the empirical validity of the measure.
Equity
EquityI appreciate that the developers used the within- and between-hospital comparison method to stratify the measure. This is consistent with Congress's IMPACT Act and the ASPE recommendations.
Use and Usability
Use and UsabilityI agree with the developers that this measure is both usable and useful for driving quality improvements. It has high potential for having an impact on the population health.
Summary
Overall, this is a solid measure, but there's somewhat mixed evidence about its reliability and validity. I have little doubt that the measure is reliable and valid, but I think that the developers could have used a more robust approach to establish the scientific acceptability of the measure.
Endorsement Should Be Removed
Importance
ImportanceThe poor reliability of the measure means that the vast majority of hospitals are either rated as “no different than expected” or “number of cases too small,” and the problems with the measure definition and risk adjustment make it impossible to know whether hospitals that are “worse than expected” actually deliver lower quality care or simply have a different mix of patients or procedures.
A patient who is choosing where to have surgery wants to know whether a hospital delivers high-quality care for that specific type of surgery. Yet even for the most frequent types of surgery, this measure does not calculate quality separately for specific types of surgery; instead, it averages the quality of care together with almost all other types of surgery and some non-surgical procedures. The Measure Information Form claims that the measure “makes visible to clinicians and hospitals meaningful quality differences and incentivizes improvement,” but since the measure provides no information on which procedures contribute to a low score, there is no way for clinicians or hospitals to know what is needed for improvement. Moreover, even a hospital that receives a score that is “better than expected” or “no different than expected” could have serious quality problems with subsets of procedures, but these problems would be hidden within the aggregate measure score.
Use of the measure could result in worse care for patients. Therefore, it is important that this measure not be used for either public reporting or payment.
Feasibility Acceptance
Feasibility AcceptanceIt is feasible to collect the data and calculate the measure. However, the results using the data that would be collected do not produce a valid measure of quality or a reliable way of comparing hospitals.
Scientific Acceptability
Scientific Acceptability ReliabilityThis is not a reliable measure of the quality or efficiency of hospital outpatient surgery.
Although more than 1,000 hospitals had Risk-Standardized Hospital Visit Ratios (RSHVRs) that were greater than 1.05 (i.e., more than 5% higher than expected) and over 400 had visit ratios over 1.20, the Measure Information Form states that only 229 hospitals could be classified as “worse than expected” because of the high degree of uncertainty associated with the ratio. Similarly, even though more than 1,000 hospitals had RSHVRs below 0.90 (i.e., 10% better than expected), only 220 could be classified as “better than expected.” Despite the expansive definition of the denominator, 941 hospitals had too few cases (i.e., fewer than 30 cases) to determine a category at all.
The proportion of hospitals that are classified as either better or worse than expected is even lower than this in the data that have been publicly reported on the CMS Hospital Compare site. In 2023, only 88 hospitals were classified as “worse than expected” and only 84 hospitals were classified as “better than expected,” out of a total of 3,672. 95% of hospitals are either “no different than expected” or have too few cases to report a classification.
The Measure Evaluation Form cites the median facility-level reliability of the measure as evidence that the measure meets a minimum standard of reliability. However, the reliability of any measure is higher if there are more cases being measured, so the median reliability will be higher if more large hospitals are being measured. If a measure is going to be used to classify individual hospitals, what matters is the reliability of the measure for individual hospitals, not the median reliability for all hospitals.
Table 6 in the Measure Information Form shows that the reliability of the measure is unacceptably low for almost half of hospitals, and even excluding hospitals with fewer than 30 cases, Table 6A shows that the measure is unreliable for at least 10-20% of hospitals. (The reliability threshold of 0.6 that is used by CMS does not represent an acceptable level of reliability for a measure that is used for determining hospital payments or guiding patient treatment choices.)
A minimum of 30 cases for assessing performance is inappropriately low for this measure because the denominator of the measure is so broad. Since there are hundreds of different types of surgeries that can potentially be included in the denominator, and a wide range of different patients who could be receiving those procedures, 30 cases at one hospital could represent a completely different set of procedures and patients than 30 cases at another hospital, and the 30 cases at a hospital this year could look completely different than the cases the previous year. As a result, a hospital’s score may change from year to year simply because of the types of cases that happen to be included each year, not because of changes in the quality of care delivery, and one hospital’s score could be much higher or lower than another’s simply because of the mix of cases each happened to have that year, not because one hospital delivers higher quality care than the other.
The measure developers do not appear to have made any effort to examine the individual hospitals that are classified as “better” or “worse” than expected to ensure those classifications are not based on artifacts of the measure definition or the risk adjustment methodology. Neither does it appear that they have analyzed changes in hospitals’ performance on the measure from one year to the next to assess the measure’s reliability over time. These types of analyses are essential for a measure that is publicly reported and used for payment.
Scientific Acceptability ValidityThis is not a valid measure of the quality or efficiency of hospital outpatient surgery due to problems with the numerator, the denominator, and the risk adjustment methodology.
Problems With the Numerator
The stated goal of the measure is to reduce adverse patient outcomes associated with outpatient surgery performed in a hospital. However, the numerator includes events that have nothing to do with the surgery, while excluding events that represent undesirable outcomes and avoidable costs.
- Inclusion of Unrelated Visits. The numerator includes any visit to an Emergency Department and any hospital admission that occurs within 7 days following the outpatient surgery, regardless of whether the ED visit or hospital admission had anything to do with the surgery. Research has shown that a significant subset of ED visits following outpatient surgery are for reasons unrelated to the surgery; these visits may be due to an accident, a new illness unrelated to the surgery, or an exacerbation of a chronic condition. Including such visits overestimates the true complication rate for outpatient surgeries.
- Exclusion of Related Visits. The numerator does not include visits to urgent care centers or physician offices that are made to evaluate or treat complications of the outpatient surgery. These visits can involve the same types of complications and services that are addressed in a subset of ED visits, so excluding them underestimates the true complication rate for outpatient surgeries.
- Failure to Measure Mortality. The numerator does not include deaths resulting from the outpatient surgery unless they occurred during an ED visit or hospital admission. Not only is death obviously a serious complication, a hospital with a higher proportion of deaths could appear to have better performance on the measure. Although the rate of death after outpatient surgery is very low overall, it could be a factor for hospitals that perform a small number of procedures on patients who are older and have more comorbidities. The Measure Information Form does not discuss this issue, nor does it explicitly indicate whether patients who die are included or excluded from the measure denominator. Patients who do not have continuous enrollment in Medicare in the 7 days after surgery are excluded, which may exclude most patients who die at home.
Other studies have found that a large fraction of post-surgical visits are not related to the index surgery:
- In a study of unplanned hospital admissions after surgery, more than half of the admissions for elderly patients were not related to the procedure. Older seniors had a higher proportion of admissions that were unrelated to the procedure. (De Oliveira Jr. GS, et al. “Older Adults and Unanticipated Hospital Admission Within 30 Days of Ambulatory Surgery: An Analysis of 53,667 Ambulatory Surgical Procedures,” Journal of the American Geriatrics Society 63:1679-1685.)
- In a study of emergent visits following outpatient orthopedic surgery, in which clinical data regarding visits were reviewed, nearly one-third (31.4%) of the visits were found to be unrelated to the surgery. (Williams BR, et al. “Unplanned Emergency and Urgent Care Visits After Outpatient Orthopaedic Surgery,” Journal of the AAOS Global Research & Reviews 5(9).)
- In a study of ED visits after hand surgery, only 36% of visits were found to be directly related to the procedure. (Menendez ME and Ring D. “Emergency Department Visits After Hand Surgery Are Common and Usually Related to Pain or Wound Issues,” Clinical Orthopaedics and Related Research 474:551-556.)
The inclusion of so many visits unrelated to the surgery creates two different problems.
- First, it dilutes the effect of differences between hospitals that are related to surgery. For example, if the rate of visits that are actually related to the surgery is 50% higher at hospital A than at hospital B, but only half of the visits at hospital B are related to surgery (with the unrelated visit rate the same at both hospitals), then the overall visit rate will be only 25% higher at hospital A than at hospital B: (0.5 + 0.75)/(0.5 + 0.5) = 1.25.
- Second, differences between hospitals in the rate of visits that are unrelated to surgery may cause hospitals to appear to have higher or lower quality care than they actually do. For example, a patient may come to the ED for an exacerbation of a chronic condition such as heart failure or COPD. Rates of ED visits for chronic disease exacerbations will be higher in communities where chronic disease management services are less available, and patients who have lower incomes or lack of family support may have greater difficulty accessing the services that do exist. In these types of communities and for these types of patients, there will likely be more post-surgical ED visits for chronic disease exacerbations that are unrelated to outpatient surgery. The risk adjustment methodology only controls for the presence of a chronic condition; it does not control for differences in the care patients receive for the chronic condition. Care for the chronic condition is not the responsibility of the physicians or hospital staff who delivered the outpatient surgery, but under this measure, a higher rate of ED visits for problems related to chronic conditions will be interpreted as poor quality care for the outpatient surgery. Similarly, a patient may come to the ED for an acute condition that is unrelated to the surgery (e.g., a viral infection). A hospital in a community experiencing high rates of such illnesses could have higher rates of post-surgical visits, and this measure would make it appear that the hospital delivers lower quality surgical care. The risk adjustment used in the measure controls for the presence of chronic conditions, but it does not control for the frequency of new acute conditions.
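The dilution arithmetic in the first point above can be checked with a short sketch. The function name and parameters are hypothetical, and the calculation assumes that the rate of unrelated visits is identical at both hospitals, which is the simplification the example in the comment relies on.

```python
# Illustrative only: reproduces the dilution arithmetic from the comment.
# Assumes unrelated visits occur at the same rate at both hospitals.

def overall_visit_ratio(related_ratio, related_share_at_b):
    """Overall visit rate at hospital A relative to hospital B.

    related_ratio:      A's surgery-related visit rate divided by B's (1.5 = 50% higher)
    related_share_at_b: fraction of B's visits that are surgery-related
    """
    unrelated_share = 1.0 - related_share_at_b
    visits_b = related_share_at_b + unrelated_share              # normalized to 1.0
    visits_a = related_share_at_b * related_ratio + unrelated_share
    return visits_a / visits_b

ratio = overall_visit_ratio(related_ratio=1.5, related_share_at_b=0.5)
print(f"overall ratio A/B: {ratio:.2f}")
```

Lowering `related_share_at_b` shrinks the visible difference further, which is the dilution the comment describes.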
Problems With the Denominator
The denominator for this measure can include over 2,000 different procedures performed in almost any part of the body, including such diverse procedures as knee replacements, hysterectomies, and repairs of ear drum punctures. These different procedures will be performed by completely different surgeons, and many of the hospital staff involved in the procedures will likely be different. Yet this measure combines all of those procedures together and calculates one single outcome measure for all of them.
If the hospital has a higher-than-average rate of post-surgery visits for some types of procedures and a lower-than-average rate for others, the overall rate of post-surgery visits will depend on both (1) the relative numbers of the two types of procedures and (2) the rates of post-surgery visits for each type of procedure. As a result, the hospital may be reported as having average performance overall even if it has high rates of post-surgery visits for specific types of procedures, and it could potentially be reported as having poor performance overall if it has a higher-than-average rate of visits on certain types of high-volume procedures, even though it performs well on many others.
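A minimal sketch of this masking effect follows. The procedure names, volumes, and visit rates are all hypothetical; the point is only that a volume-weighted aggregate can match the benchmark while one procedure type has double the benchmark rate.

```python
# Illustrative only: hypothetical procedures, volumes, and visit rates.

def overall_rate(volumes, rates):
    """Volume-weighted visit rate across procedure types."""
    total = sum(volumes.values())
    return sum(volumes[p] * rates[p] for p in volumes) / total

volumes = {"procedure_x": 500, "procedure_y": 500}
benchmark_rates = {"procedure_x": 0.04, "procedure_y": 0.10}
hospital_rates = {"procedure_x": 0.08, "procedure_y": 0.06}  # X is double the benchmark

benchmark_overall = overall_rate(volumes, benchmark_rates)
hospital_overall = overall_rate(volumes, hospital_rates)

print(f"benchmark overall rate: {benchmark_overall:.3f}")
print(f"hospital overall rate:  {hospital_overall:.3f}")
for p in volumes:
    print(f"{p}: hospital/benchmark = {hospital_rates[p] / benchmark_rates[p]:.2f}")
```

In this hypothetical case the aggregate rates are equal, so the hospital would look average overall even though its rate for one procedure type is twice the benchmark.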
It is not necessary to lump all of these disparate surgeries together in order to measure the frequency of hospital visits for complications; in fact, in the Ambulatory Surgical Center Quality Reporting (ASCQR) Program, there are separate measures of post-surgical visit rates for orthopedic procedures and urology procedures, and in 2024, there will be a third measure of post-surgical visit rates for general surgery procedures. Because the hospital measure combines visit rates for orthopedic and urology procedures with all other surgeries, it is impossible for a patient or anyone else to compare the post-surgical visit rates at hospital outpatient departments (HOPDs) and Ambulatory Surgery Centers (ASCs) for those procedures.
Problems With the Risk Adjustment Methodology
The measure does not report the actual rate of post-surgical visits at a hospital; it reports a ratio of the “predicted” number of visits to the “expected” number of visits. Both of these parameters are calculated using a hierarchical logistic regression model that is intended to adjust for differences in the characteristics of the patients receiving surgery that affect outcomes. However, there are serious flaws with the model that is used:
- Failure to adjust for outpatient surgeries performed in other facilities. Many, if not most, of the same types of outpatient surgeries are performed in Ambulatory Surgery Centers (ASCs) and some are performed in physician offices. Because a hospital has greater capabilities to address complications that occur during surgery than an ASC or physician office, patients whom physicians believe to be at higher risk of such complications will be more likely to have the surgery performed in a hospital than in alternative facilities. As a result, the rate of hospital visits following surgery will likely be higher for patients receiving surgery at a hospital. For example, one study found that the rate of ED visits following outpatient procedures was 2.5 times as high for procedures performed in a hospital than in either an ASC or physician office, and the rate of inpatient hospital admissions following surgery was more than 4 times as high. Another study found that the relative risk of unplanned hospital visits following ambulatory surgery was 1.37 for surgeries performed in a hospital vs. an ASC. The availability of alternative surgery facilities varies significantly across communities, which means the proportion of all ambulatory surgeries performed in hospitals will also vary significantly. For example, there are fewer ASCs in states with Certificate of Need laws, and there are fewer ASCs in small communities and rural areas simply because of the smaller number of patients. As a result, hospitals in states and parts of states where there are more ASCs will have a smaller and higher-risk group of outpatient surgery patients than other hospitals do, and so they will likely have higher rates of unplanned hospital visits after surgery. 
These differences will not be fully captured by the patient characteristics included in the risk model, and the measure does not adjust for the proportion of surgeries performed at a hospital, so hospitals that perform a smaller proportion of surgeries could inappropriately appear to be delivering lower-quality care.
- Failure to adjust for propensity of patients to visit the hospital for other health problems. Patients with severe or poorly managed chronic conditions are more likely to have ED visits and hospital admissions for exacerbations of those conditions than other patients with the same chronic conditions, and these exacerbations and visits can occur at any time, including during the week after an outpatient surgical procedure. Because the numerator of the measure includes almost any unplanned visit to the hospital after surgery, not just visits that are related to the surgery, these chronic disease-related visits will increase the rate of unplanned visits at a hospital. Although some types of chronic conditions are included as risk adjustment factors in the model used for the measure, the presence of a condition does not indicate how severe it is or how well managed it is. One of the earliest studies of outcomes following outpatient surgery adjusted for this by including a factor measuring a patient’s hospital utilization prior to surgery. However, no such adjustment is included in the risk adjustment model for this measure. As a result, a hospital that has more patients with poorly managed chronic conditions will inappropriately appear to be delivering lower-quality care. Other patient characteristics that could affect the likelihood of post-surgery hospital visits include whether the patient is in a Skilled Nursing Facility, is receiving home health care, or is on hospice, but no adjustment is made for these factors and there is no indication in the Measure Information Form that they were even examined.
- Failure to adjust for the type of outpatient surgery performed. The rate of post-surgery visits varies significantly across different types of surgery. This is one of the reasons why it would be preferable to calculate visit rates separately for major categories of surgery, as is done in the Ambulatory Surgical Center Quality Reporting (ASCQR) Program. However, if a single aggregate measure is going to be reported, it is essential to adjust for the types of surgeries that are actually performed at a hospital, so that a hospital is not classified as delivering poor-quality care simply because it is delivering more surgeries that have inherently higher risk. Despite the desirability and feasibility of adjusting for the type of surgery performed, this measure does not do so. Instead, it includes factors for “body system operated on” and the number of Work Relative Value Units (wRVU) assigned to the CPT code for the procedure performed. The Measure Information Form provides no justification for failing to adjust for the type of procedure; the Technical Report that is cited as a reference indicates that the use of body system and wRVU are “similar” to the risk adjustment system used in the American College of Surgeons’ National Surgical Quality Improvement Program (NSQIP). However, the NSQIP system uses a combination of groups of procedures (CPT codes) and RVUs, not broad “body systems” and wRVUs. No analysis is provided showing that the body system and wRVU are adequate substitutes for stratifying by procedure, or that these variables have a linear relationship to outcomes.
- Failure to adjust for important patient characteristics affecting surgery outcomes. The patient characteristics used for risk adjustment exclude a number of factors that have been found to be significant in other studies of post-surgical complications and exclude factors that are used in the measures of post-surgical complications in the Ambulatory Surgical Center Quality Reporting (ASCQR) Program. For example, tobacco use, opioid use, chronic anticoagulant use, and kidney disease are used as risk adjustment factors in some or all of the measures for orthopedic surgery, general surgery, and urology procedures in ASCs, but they are not included in the risk adjustment model for this measure; it appears that some were not even examined during development of the measure. This means that a patient with one or more of these characteristics would be assigned a lower risk score if they receive their surgery in a hospital rather than an Ambulatory Surgery Center.
- Assumption of common risk factor weightings for all procedures. Not only does the risk adjustment model exclude many important risk factors, it uses one set of variables and weights for all procedures, which implicitly assumes that the same patient characteristics affect every type of surgery in exactly the same way. In the ASCQR Program, there are separate measures for orthopedic surgery and general surgery, and they use different variables for risk adjustment; even for the variables that are used in both measures, the weights are different. The weights used in the hospital measure were estimated using data on all types of surgery combined, so even where the same risk adjustment factor is used in the hospital measure as in the ASC measure for the same procedure, the weighting will be different. This means that a patient with that risk factor will be assigned a different risk score depending on whether they receive a procedure in a hospital or in an Ambulatory Surgery Center.
Inadequate Evaluation of Risk Adjustment Model Performance
Because of the large number of different types of procedures included in the model, the variation in the proportion of total procedures performed at each hospital, and the variation in patient characteristics across communities, as well as the problematic assumptions described above, a thorough analysis is needed to determine how well the regression model in this measure adjusts for expected variations in outcomes and factors that are beyond the control of a hospital. The simplistic analysis provided in the Measure Information Form is not adequate to justify continued use of this measure.
The developers claim that the c-statistic for the model (.693) indicates “good model discrimination.” What the c-statistic means is that there is an approximately 30% chance that a patient who has a post-surgical hospital visit will be classified as lower risk than a patient who does not experience a visit. (The Measure Information Form says that “observed hospital visit ratios were compared to predicted hospital visit probabilities” in order to calculate the c-statistic, but presumably this means observed hospital visit rates, not the visit ratios calculated for the measure.) It is impossible to determine whether this value of the c-statistic is good or bad. The reason for creating the measure is a presumption that some procedures at some hospitals have outcomes that are better or worse than expected, so the model should not be expected to predict 100% of outcomes if it is using legitimate risk adjustment variables for the procedures and a single random effect variable for the hospital. However, it matters whether the 30% of cases that have a post-surgical visit and are classified as low risk are truly low risk (in which case, the post-surgical visits are more likely to represent poor quality care at the hospital where they received the procedure), or whether the 30% of cases are actually high risk (in which case, their post-surgical visits are more likely to represent an expected outcome of the procedure regardless of which hospital delivered it). There is no information in the Measure Information Form indicating whether the accuracy of prediction was assessed in detail or what the results were.
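The interpretation of the c-statistic used above can be made concrete with a small sketch. It computes the statistic as the share of (visit, no-visit) patient pairs in which the patient with the visit received the higher predicted risk, counting ties as half. The predicted risks and outcomes are made-up values, not data from the measure. One minus the result is the chance that a randomly chosen pair is ordered the wrong way, which is where the "approximately 30%" figure for a c-statistic of 0.693 comes from.

```python
# Illustrative only: pairwise-concordance definition of the c-statistic.
# The predicted risks and outcomes below are made up, not data from the measure.

def c_statistic(predicted_risk, had_visit):
    """Fraction of (visit, no-visit) pairs where the visit case has the higher risk."""
    with_visit = [p for p, y in zip(predicted_risk, had_visit) if y]
    without_visit = [p for p, y in zip(predicted_risk, had_visit) if not y]
    concordant = 0.0
    for pv in with_visit:
        for pn in without_visit:
            if pv > pn:
                concordant += 1.0
            elif pv == pn:
                concordant += 0.5   # ties count as half
    return concordant / (len(with_visit) * len(without_visit))

predicted_risk = [0.30, 0.10, 0.20, 0.05, 0.15]
had_visit = [True, True, False, False, False]

c = c_statistic(predicted_risk, had_visit)
print(f"c-statistic: {c:.3f}")
print(f"share of pairs ordered the wrong way: {1 - c:.3f}")
```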
The developers also state that the ability to distinguish high-risk subjects from low-risk subjects is important and claim that “for a model with good predictive ability we would expect to see a wide range in hospital visit ratios between the lowest decile and highest decile.” It is not clear why this is true; the ability to distinguish high-risk subjects from low-risk subjects should be assessed by determining whether the model accurately predicts the procedures that have higher and lower visit rates, not by the magnitude of the variation in the hospital visit ratios calculated for the measure. The numbers reported in the Measure Information Form under “Predictive Ability” appear to be the % of surgeries with post-surgery visits (i.e., the visit rates), not the hospital visit ratios. (The Form reports the range as 1.74 to 16.01, whereas Table 1 reports the range of the hospital visit ratios as 0.44 to 3.61.) The range of these visit rates depends on how much difference there is in the actual rates for different surgical procedures, not on the model’s predictive ability.
The Measure Information Form fails to report the Hosmer-Lemeshow statistic, which is a commonly used measure of the differences in predictive ability for different levels of risk. In addition, there is no analysis of whether the model systematically over- or under-predicts visit rates for different procedures or for specific subgroups of patients (e.g., those with specific types of health problems). These types of prediction errors could lead to erroneous classifications of hospitals that perform significantly more or less of these procedures or have an unusually high or low number of these patients.
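For readers unfamiliar with the statistic, the following is a minimal sketch of how a Hosmer-Lemeshow statistic is commonly computed: cases are sorted by predicted risk, split into groups, and observed event counts are compared with expected counts in each group. The grouping rule, function name, and toy data are assumptions made for illustration; this is not the developers' analysis, and a p-value would additionally require a chi-square distribution.

```python
# Illustrative only: a basic Hosmer-Lemeshow statistic on toy data.
# Grouping into equal-size bins by sorted predicted risk is an assumption of this sketch.

def hosmer_lemeshow(predicted_risk, outcomes, groups=10):
    """Sum over risk groups of (observed - expected)^2 / (n * mean_risk * (1 - mean_risk))."""
    pairs = sorted(zip(predicted_risk, outcomes), key=lambda t: t[0])
    n = len(pairs)
    statistic = 0.0
    for g in range(groups):
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        if not chunk:
            continue
        size = len(chunk)
        expected = sum(p for p, _ in chunk)
        observed = sum(y for _, y in chunk)
        mean_risk = expected / size
        denom = size * mean_risk * (1.0 - mean_risk)
        if denom > 0:
            statistic += (observed - expected) ** 2 / denom
    return statistic

# Toy data: ten low-risk cases (10%) and ten higher-risk cases (50%).
risks = [0.1] * 10 + [0.5] * 10
calibrated = [1] + [0] * 9 + [1] * 5 + [0] * 5      # 1 and 5 events, as predicted
miscalibrated = [1] * 3 + [0] * 7 + [1] * 5 + [0] * 5  # 3 events where 1 was expected

print(hosmer_lemeshow(risks, calibrated, groups=2))
print(hosmer_lemeshow(risks, miscalibrated, groups=2))
```

A larger value indicates a bigger gap between observed and predicted event counts in at least one risk group.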
Equity
EquityThe weaknesses in the measure methodology could cause some hospitals to be inappropriately labeled as “worse than expected” because of the kinds of patients they treat. As a result, use of the measure creates an undesirable incentive for hospitals to avoid treating patients with characteristics that are likely to result in higher numbers of visits, such as patients with multiple comorbidities and patients who do not have good access to primary care and chronic disease management services. This would exacerbate inequities in access and outcomes.
Use and Usability
Use and UsabilityA patient who is choosing where to have surgery wants to know whether a hospital delivers high-quality care for that specific type of surgery. Yet even for the most frequent types of surgery, this measure does not calculate quality separately for specific types of surgery; instead, it averages the quality of care together with almost all other types of surgery and some non-surgical procedures. The Measure Information Form claims that the measure “makes visible to clinicians and hospitals meaningful quality differences and incentivizes improvement,” but since the measure provides no information on which procedures contribute to a low score, there is no way for clinicians or hospitals to know what is needed for improvement. Moreover, even a hospital that receives a score that is “better than expected” or “no different than expected” could have serious quality problems with subsets of procedures, but these problems would be hidden within the aggregate measure score.
Since only a very small number of hospitals are being classified as anything other than “no different than expected,” this measure is unlikely to encourage improvements in the quality of care delivery. On the other hand, since the weaknesses in the measure methodology could cause some hospitals to be inappropriately labeled as “worse than expected” because of the kinds of patients they treat, use of the measure creates an undesirable incentive for hospitals to avoid treating patients with characteristics that are likely to result in higher numbers of visits, such as patients with multiple comorbidities and patients who do not have good access to primary care and chronic disease management services. It could also cause patients to inappropriately avoid obtaining needed surgery, or to try to obtain surgery in a facility that is not adequately equipped to address their needs.
Summary
Endorsement should be removed from this measure. There is no business case for using it. It is not a valid or reliable measure of the quality or efficiency of hospital outpatient surgery due to problems with the numerator, the denominator, and the risk adjustment methodology. Continued public reporting of the results could mislead patients about where they should receive surgery, and continued use of the measure to modify hospital payments could worsen disparities in access and outcomes for patients.
NA
Importance
ImportanceThere is a need for such a measure, which was demonstrated in the literature and via expert input.
Feasibility Acceptance
Feasibility AcceptanceIt is a measure based on claims data and has little additional burden to providers.
Scientific Acceptability
Scientific Acceptability ReliabilityReliability testing level is not sufficient for a measure related to payment.
Scientific Acceptability ValidityThe discriminatory c-statistic of 0.693 is not sufficient for a measure tied to payment. While it is similar to some other public measures, they too are insufficient in discriminating between low and high risk.
Equity
EquityI appreciate the efforts to evaluate various marginalized groups and provide reporting based on those groups; however, if it was shown that in one quartile social determinants of health are statistically associated with poorer outcomes then they should be included in the model so that those caring for more marginalized patients are not penalized for caring for those patients through the CMS payment program.
Use and Usability
Use and UsabilityThis measure is only useful for facility-level reporting compared to other facilities. It is not useful for quality improvement. There is no ability to drill down and find which populations are not performing well, or to use it for any quality improvement activities. Additionally, the use of a hierarchical model that produces a predicted-to-expected value results in a metric that will change very little over time and will not identify systematic variations in care. This is particularly true for facilities that tend to have smaller populations. I would suggest changing the usability to simply public reporting.
Summary
Not sufficient for a payment related measure
Good
Importance
ImportanceThis measure holds significance as it provides insights into the quality of care delivered during outpatient surgeries. It specifically examines potential issues that may arise post-surgery, such as unexpected complications or problems.
Feasibility Acceptance
Feasibility AcceptanceThe developer mentioned there are no fees, licensing, or other requirements to use this measure as specified.
Scientific Acceptability
Scientific Acceptability ReliabilityThe reliability testing results indicate that the measure scores are reliable, valid and have met the thresholds for both of them.
Scientific Acceptability ValidityThe testing results indicate that the measure scores are reliable, valid and have met the thresholds for both of them.
Equity
EquityThe measure developers are actively addressing health care disparities and inequities by implementing a disparities stratification methodology. This methodology assesses and reports outcomes for patients with dual eligibility (DE) compared to non-DE patients. The report discusses within-hospital and across-facility approaches to evaluate outcome disparities, revealing that more hospitals exhibit worse outcomes for DE patients compared to their non-DE counterparts.
Use and Usability
Use and UsabilityThe developer mentioned that this measure is implemented by CMS for outpatient services; the Hospital OQR is a national pay-for-quality-data-reporting program mandated by the Tax Relief and Health Care Act of 2006.
Summary
N/A
too many unanswered questions to endorse
Importance
Importancecomplication rates are important to track
-- 10 years ago (time goes by), when we designed the first alternative payment models for a CMMI implementation grant, we counted readmissions and acute events toward total cost of care but limited all-cause events to the first 72 hrs post discharge and counted only procedure-related events thereafter. Counting all-cause events within that window was acceptable to our practicing community, but not after it. This is a serious design flaw and injures the face validity of the measure.
Not sure a global measure for all surgeries makes sense - a facility with lots of cataracts will look different from a safety net hospital. Will this type of measure discourage clinicians from attempting outpt procedures because of "penalties" for same day admissions when a frailer patient cannot go home? The risk adjustments in the document are hard to track to assess fairness and face validity - quite a bit of a black box.
Feasibility Acceptance
Feasibility AcceptanceAll cause readmissions: -- 10 years ago (time goes by), when we designed the first alternative payment models for a CMMI implementation grant, we counted readmissions and acute events toward total cost of care but limited all-cause events to the first 72 hrs post discharge and counted only procedure-related events thereafter. Counting all-cause events in the first 72 hrs was acceptable to our practicing community, but not after that window. This is a serious design issue and injures the face validity of the measure.
Scientific Acceptability
Scientific Acceptability Reliabilitythe document comes across as a black box of statistical manipulation
Scientific Acceptability Validitysee comments about all cause measurement
Equity
Equityagree with staff
Use and Usability
Use and Usabilitysee comments about all cause events after 72 hrs of the procedure
Summary
- all cause admissions for 7 days is problematic
- would like to hear more about case mix and risk adjustment for this measure
- confused why ERCP (an endoscopy procedure) would be excluded
- potential for perverse incentives to avoid outpt procedures and increase system costs
Not met
Importance
Importance- No information about number of patients and caregivers in TEP or what their comments were.
- Provided references to describe improvements initiated in past years
- Wouldn't be able to report on importance to patient/caregiver constituency
Feasibility Acceptance
Feasibility Acceptance- Agree with staff
Scientific Acceptability
Scientific Acceptability Reliability- Rapid technological, workflow, staffing, and business culture changes concern me. Developers don't address this.
Scientific Acceptability Validity- The numerator doesn't include complications treated in physician practice or outpatient settings. The developers don't address this. Patients and caregivers often hear the message to call or reach out on the portal with issues and not go to ED.
- It is hard to imagine that the summarization of many procedures included in the denominator doesn't dilute the value for clinicians and patients.
Equity
Equity- If a quarter of social determinants affect the measure outcome, the developer should further analyze. Has an impact for patients as providers may limit access.
Use and Usability
Use and Usability- Once again, I couldn't explain the value of this measure to my constituency of patient/caregiver activists. Although I thank the developers for a much more readable report, it would help if they could include a summary in plain English.
- Again, including many and changing procedures in the denominator clouds their usability.
Summary
- I'm expected to share importance of these results with patient/caregiver experts. While the report is much more digestible than in previous years, the developers could do more.
- Key barriers to Met include that developers don't address rapid changes in technology, workflow, staffing, and business culture.
- the inclusion of many and changing procedures in the denominator clouds their usability
Would not support endorsement
Importance
ImportanceThe idea that procedural complication rates are important, to payers, health systems, regulators, and patients, is clearly true. That said, for the numbers to be meaningful they would have to have much more specificity to the underlying procedure.
Feasibility Acceptance
Feasibility AcceptanceAgree with staff
Scientific Acceptability
Scientific Acceptability ReliabilityHigh degree of uncertainty leads to not being able to definitively classify as under/overperformers.
Scientific Acceptability ValidityInclusion of all cause ED visit/hospitalization is inferior to limiting to visits potentially related to procedural complications, although the 7-day window does mitigate that somewhat. Including all types of surgery in the denominator is problematic because the overall metric will be determined by whichever procedures have the highest volume. For example, if surgery A has 5 complications in 5 cases (100%), and surgeries B+C have 5 complications in 1000 cases (0.5%), then the overall rate will be 10/1005 (<1%), which risks hiding poor procedural complication rates for lower volume surgeries. Surgery specific numbers would be more relevant to guiding health system QI and patient decisions.
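The arithmetic in this example can be checked with a short sketch. The counts are the reviewer's hypothetical figures, and the calculation is a raw pooled rate only; it is not the risk-standardized ratio the measure actually reports, so it illustrates the dilution effect rather than the measure's method.

```python
# Hypothetical counts from the reviewer's example (not measure data).
procedures = {
    "A": {"cases": 5, "visits": 5},
    "B+C": {"cases": 1000, "visits": 5},
}


def rate(visits, cases):
    """Raw (unadjusted) visit rate."""
    return visits / cases


# Rate for each procedure group on its own.
per_procedure = {
    name: rate(p["visits"], p["cases"]) for name, p in procedures.items()
}

# Pooled rate across all procedures, weighted by volume.
total_visits = sum(p["visits"] for p in procedures.values())
total_cases = sum(p["cases"] for p in procedures.values())
pooled = rate(total_visits, total_cases)

for name, r in per_procedure.items():
    print(f"surgery {name}: {r:.1%}")
print(f"pooled: {total_visits}/{total_cases} = {pooled:.2%}")
```

The pooled figure stays under 1% because the high-volume group dominates the denominator, even though every case of surgery A had a complication.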
Equity
EquityIncomplete risk adjustment will effectively include determinants as related to care. For example, if this patient population comes to the ED for their primary care, they would be much more likely to visit than a different population who has reliable access to a PCP.
Use and Usability
Use and UsabilityAs currently designed, I do not see how this provides actionable information to patients or health systems.
Summary
As designed the fatal flaw of this measure is combining all surgeries/procedures into one global metric. The other factors (like the use of all-cause visits rather than procedure complication-related, or the metric exclusions) are in principle addressable. But even if you assume everything else is fixed a global "one number" for all of a hospital's procedures and surgeries seems meaningless.
Hospital Visits after Hospital Outpatient Surgery
Importance
ImportanceThe developer has provided evidence that about 8% of hospital outpatient surgeries had an unplanned hospital visit within 7 days. There is also variability between high-quality and low-quality surgery centers, and the developer provided data showing differences in quality.
Feasibility Acceptance
Feasibility AcceptanceThis is a claims based measure
Scientific Acceptability
Scientific Acceptability ReliabilityFacilities with 30 or more procedures had a reliability of 0.86, and a range was reported. Data were from 2022.
There are a small number of facilities with 30 or more procedures for which the signal-to-noise ratio is below the threshold of 0.6.
Scientific Acceptability ValidityOverall, the developer did address validity of data. Missingness of data was not accounted for. Data were stratified by dual eligibility.
Equity
EquityThe developer did address equity and used dual eligible data
Use and Usability
Use and UsabilityThe measure is currently part of the HOQR program. Facilities receive detail reports of unplanned visits and their performance to state and national benchmarks.
Also, feedback on the measure can be submitted via QualityNet.
Summary
Overall, this is a measure that is currently in use and has valuable information to improve quality of care for hospital visit after outpatient surgery. Has also looked at data in relation to dual eligibility.
Measure summary
Importance
ImportanceThe developer has provided adequate evidence associated with the importance of the outcome. It would be a measure important to both hospitals and patients.
Feasibility Acceptance
Feasibility AcceptanceIt's a claims based measure. This data reduces provider burden. To help facilitate greater provider engagement it would be beneficial to provide the data back to providers on a monthly basis.
Scientific Acceptability
Scientific Acceptability ReliabilityData leveraged is current - CY 2022 (January 1, 2022 - December 31, 2022)
Signal to Noise ratios appear inadequate as there is more noise than signal generated from the model. There is mention of the OP-32 colonoscopy measure but no additional comparison or mention of whether there is overlap between this all-procedure measure and OP-32.
Would recommend comparing model performance for the aggregated all procedure vs. individual groups of same/similar procedures to ensure complete information is captured and variation explained by the model. Could help assess signal to noise gaps as well. I think these concerns are also reflected by the TEP responses (Moderately agree) 8 (61.5) in ability of this measure to distinguish between better and worse quality facilities.
Scientific Acceptability ValidityReally hoping to see a higher C-Statistic than 0.693. Making adjustment as highlighted in the reliability section may improve performance. Instead of having grouped procedures in 1 model, could there be better model performance by building separate models as some of the co-morbidities and other risk factors may cancel one another; thus not provide adequate model performance for individual procedures.
Equity
EquityDevelopers evaluated disparities for dual eligibility and area deprivation index, and identified a substantially higher hospital visit rate for patients with DE vs. non-DE patients. The measure is stratified by DE for confidential reporting using two methods: within-facility (outcome for a patient with DE vs. without at the same facility); and across-facility (outcome for a patient with DE at that facility relative to a patient with DE at an average facility).
Use and Usability
Use and UsabilityAs captured by the references provided, this measure offers useful insights but could be more useful with improved model performance. Recommend exploring more procedure specific models. Additionally, while the within and across-facility equity assessment is a step in providing more insights into health care disparities, it would be more helpful to provide detailed, encounter specific opportunities for providers to evaluate. This is a challenge with outcome measures and thus, why it is so important to couple process measures with addressing provider generated healthcare disparities.
Summary
Would be helpful to address the objections raised by public comment. Opportunities to refine model performance, more procedure specific models should be evaluated.
See comments on measure and need for further CMS research
Importance
Importancethe differences observed in the data in predicted/expected suggests the measure is important.
Feasibility Acceptance
Feasibility AcceptanceClaims based measure. Calculation is not a burden to facilities and has/can be coded by CMS.
Scientific Acceptability
Scientific Acceptability ReliabilityUsing the standards previously used in endorsement of CMS measures, the scientific acceptability of the reliability rating is met.
That said, the public comment received from CHQPR raises reasonable issues that should be subject to some conversation at the meeting and prior to final endorsement. These comments are principally about the risk adjustment model, which straddles the issues of reliability and validity, so I comment on them here. Among the items I would like to see addressed are:
- What is done to exclude admissions or ED visits that are unrelated to the initial surgery? The current methods rely upon risk adjustment, i.e., prior conditions, generally chronic, but do not include acute conditions or circumstances that might occur unrelated to the cause of the subsequent visit. Are these taken into account, or has analysis shown they are small and add little noise to the measure?
- The risk adjustment model uses surgical site but not an indicator of surgical intensity or risk. Have there been efforts to assess how much noise this introduces and whether there are alternative measures that would be more predictive of complications and hospital visits?
- The comment identifies research that finds that freestanding ASCs have lower case mix and may treat patients with lower risk profiles for complications. To what extent is there evidence that the risk adjustment model adequately controls for this?
- Any other responses to the public comment from CHQPR?
Scientific Acceptability ValidityThe validity beyond the risk adjustment seems fine.
The TEP face validity check is okay.
The low correlation with inpatient surgical readmissions, while statistically significant, is essentially at the null value, but that is not unexpected.
Of larger concern in the analysis is the demonstration of correlation showing better performance by high volume facilities. The use of a multi-level random effects model will, as a result of shrinkage, pull the scores for low volume facilities toward the mean. The correlation analysis may understate the correlation of volume and outcomes.
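The shrinkage effect described here can be sketched under a simplified normal approximation. This is an illustration only: the measure itself uses a hierarchical logistic regression, and the function name, the between-facility variance, and the observed rates below are hypothetical values chosen for the example (the 8% overall rate echoes the figure another reviewer cites).

```python
def shrunken_rate(observed_rate, n_cases, overall_rate, between_var, within_var):
    """Pull a facility's observed rate toward the overall mean.

    The weight on the facility's own data grows with its case count,
    so low-volume facilities are shrunk most. Simplified normal-normal
    approximation, not the measure's hierarchical logistic model.
    """
    weight = between_var / (between_var + within_var / n_cases)
    return overall_rate + weight * (observed_rate - overall_rate)


# Illustrative assumptions, not measure data.
overall = 0.08                        # overall visit rate
between = 0.0004                      # assumed between-facility variance
within = overall * (1 - overall)      # binomial variance of one case

# Two facilities with the same observed rate but different volumes.
small = shrunken_rate(0.12, 30, overall, between, within)
large = shrunken_rate(0.12, 1000, overall, between, within)

print(f"low volume (n=30):    {small:.4f}")
print(f"high volume (n=1000): {large:.4f}")
```

With identical observed rates, the low-volume facility's estimate lands much closer to the overall mean, which is why a volume-outcome correlation computed on shrunken scores may be understated.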
Equity
EquityStratifying by DE is probably adequate to address equity issues of mismeasurement of SES related variance, and the correlation analysis suggests explicit adjustment for area-based measures of SES would make little difference in rankings.
The substantial portion of the SES variance that seems associated with within hospital variance merits CMS research, as does the relative performance of hospitals with large vs small proportions of low SES patients.
Use and Usability
Use and UsabilityBeen used.
Summary
on equity issues.
Health systems w/o urgent care and specialty offices at disadvan
Importance
ImportanceThis is an important measure of hospital utilization after hospital outpatient surgery. While the focus on measuring this utilization is appropriate, the differentiation from procedures done in ambulatory surgical centers is not too clear.
Feasibility Acceptance
Feasibility AcceptanceThere is no additional reporting burden at the facility level.
Scientific Acceptability
Scientific Acceptability ReliabilityThe data presented are reliable.
Scientific Acceptability ValidityThe data presented are valid.
Equity
EquityThe measure addresses socioeconomic status.
Use and Usability
Use and UsabilityIntended use (currently in place):
Public Reporting
Quality Improvement (Internal to the specific organization)
Quality Improvement with Benchmarking (external benchmarking to multiple organizations)
Summary
This measure places health systems w/o urgent care and specialty offices at a significant disadvantage. For this reason, a revision of the measure is recommended.
n/a
Importance
Importancedeveloper cites evidence
Feasibility Acceptance
Feasibility Acceptanceno additional comments
Scientific Acceptability
Scientific Acceptability Reliabilitydata is recent
Scientific Acceptability Validitydeveloper conducted testing
Equity
Equitydeveloper addressed this area
Use and Usability
Use and Usabilitycurrently in use
Summary
n/a
None
Importance
ImportanceNone
Feasibility Acceptance
Feasibility AcceptanceNone
Scientific Acceptability
Scientific Acceptability ReliabilityNone
Scientific Acceptability ValidityNone
Equity
EquityNone
Use and Usability
Use and UsabilityNone
Summary
None
Agree with staff comments
Importance
ImportanceOpportunity to address potentially avoidable admissions following surgical procedures
Feasibility Acceptance
Feasibility Acceptanceclaims based
Scientific Acceptability
Scientific Acceptability ReliabilityPotential concern for change in surgical procedure clinical pathways/protocols and measurement year over year
Scientific Acceptability Validityagree with staff assessment
Equity
EquityMeasure developers addressed DE and ADI
Use and Usability
Use and UsabilityAgree with staff assessments
Summary
No additional comments; agree with staff comments
-
Endorsement should be removed from this measure. It is not a valid measure of the quality or efficiency of hospital outpatient surgery due to problems with the numerator, the denominator, and the risk adjustment methodology. Continued public reporting of the results could mislead patients about where they should receive surgery, and continued use of the measure to modify hospital payments could worsen disparities in access and outcomes for patients.
Problems with the Numerator
The stated goal of the measure is to reduce adverse patient outcomes associated with outpatient surgery performed in a hospital. However, the numerator includes events that have nothing to do with the surgery, while excluding events that represent undesirable outcomes and avoidable costs.
The Measure Information Form states that “studies have consistently shown that post-operative complications and poorly controlled symptoms are the primary contributors to unexpected hospital visits following outpatient surgery,” but the only study cited to support that statement is a paper authored by the measure developers that describes the development of this same measure (Desai MM, et al. “Variation in Risk-Standardized Rates and Causes of Unplanned Hospital Visits Within 7 Days of Hospital Outpatient Surgery,” Annals of Surgery 276(6)). That paper does not explicitly examine how many visits were due to complications of the surgery the patient received; it merely lists the top 25 diagnoses for all of the hospital visits that occurred after any of the long list of procedures included in the measure. The second and third most frequent diagnoses were “spinal stenosis, lumbar region” and “benign prostatic hyperplasia,” neither of which seems likely to be a complication of most outpatient surgeries.
Other studies have found that a large fraction of post-surgical visits are not related to the index surgery:
The inclusion of so many visits unrelated to the surgery creates two different problems.
Problems with the Denominator
The denominator for this measure can include over 2,000 different procedures performed in almost any part of the body, including such diverse procedures as knee replacements, hysterectomies, and repairs of ear drum punctures. These different procedures will be performed by completely different surgeons, and many of the hospital staff involved in the procedures will likely be different. Yet this measure combines all of those procedures together and calculates one single outcome measure for all of them.
If the hospital has a higher-than-average rate of post-surgery visits for some types of procedures and a lower-than-average rate for others, the overall rate of post-surgery visits will depend on both (1) the relative numbers of the two types of procedures and (2) the rates of post-surgery visits for each type of procedure. As a result, the hospital may be reported as having average performance overall even if it has high rates of post-surgery visits for specific types of procedures, and it could potentially be reported as having poor performance overall if it has a higher-than-average rate of visits on certain types of high-volume procedures, even though it performs well on many others.
The measure also includes one type of non-surgical procedure – cystoscopy with intervention – but not other types of endoscopy. The Measure Information Form states this is because “the outcome rate and causes of hospital visits post-procedure are similar to those for surgeries in the measure cohort,” but no data are provided to support this assertion, nor are any data provided to support excluding other forms of endoscopy. Moreover, no data are provided on the number of these procedures performed in hospitals relative to other types of procedures or how that proportion varies across hospitals, so it is impossible to determine what impact this inclusion could have on comparisons of post-surgical visit rates across hospitals.
A patient who is choosing where to have surgery wants to know whether a hospital delivers high-quality care for that specific type of surgery. Yet even for the most frequent types of surgery, this measure does not calculate quality separately for specific types of surgery; instead, it averages the quality of care together with almost all other types of surgery and some non-surgical procedures. The Measure Information Form claims that the measure “makes visible to clinicians and hospitals meaningful quality differences and incentivizes improvement,” but since the measure provides no information on which procedures contribute to a low score, there is no way for clinicians or hospitals to know what is needed for improvement. Moreover, even a hospital that receives a score that is “better than expected” or “no different than expected” could have serious quality problems with subsets of procedures, but these problems would be hidden within the aggregate measure score.
It is not necessary to lump all of these disparate surgeries together in order to measure the frequency of hospital visits for complications; in fact, in the Ambulatory Surgical Center Quality Reporting (ASCQR) Program, there are separate measures of post-surgical visit rates for orthopedic procedures and urology procedures, and in 2024, there will be a third measure of post-surgical visit rates for general surgery procedures. Because the hospital measure combines visit rates for orthopedic and urology procedures with all other surgeries, it is impossible for a patient or anyone else to compare the post-surgical visit rates at hospital outpatient departments (HOPDs) and Ambulatory Surgery Centers (ASCs) for those procedures.
Problems with the Risk Adjustment Methodology
The measure does not report the actual rate of post-surgical visits at a hospital; it reports a ratio of the “predicted” number of visits to the “expected” number of visits. Both of these parameters are calculated using a hierarchical logistic regression model that is intended to adjust for differences in the characteristics of the patients receiving surgery that affect outcomes. However, there are serious flaws with the model that is used:
The availability of alternative surgery facilities varies significantly across communities, which means the proportion of all ambulatory surgeries performed in hospitals will also vary significantly. For example, there are fewer ASCs in states with Certificate of Need laws, and there are fewer ASCs in small communities and rural areas simply because of the smaller number of patients. As a result, hospitals in states and parts of states where there are more ASCs will have a smaller and higher-risk group of outpatient surgery patients than other hospitals do, and so they will likely have higher rates of unplanned hospital visits after surgery. These differences will not be fully captured by the patient characteristics included in the risk model, and the measure does not adjust for the proportion of surgeries performed at a hospital, so hospitals that perform a smaller proportion of surgeries could inappropriately appear to be delivering lower-quality care.
Inadequate Evaluation of Risk Adjustment Model Performance
Because of the large number of different types of procedures included in the model, the variation in the proportion of total procedures performed at each hospital, and the variation in patient characteristics across communities, as well as the problematic assumptions described above, a thorough analysis is needed to determine how well the regression model in this measure adjusts for expected variations in outcomes and factors that are beyond the control of a hospital. The simplistic analysis provided in the Measure Information Form is not adequate to justify continued use of this measure.
The developers claim that the c-statistic for the model (.693) indicates “good model discrimination.” What the c-statistic means is that there is an approximately 30% chance that a patient who has a post-surgical hospital visit will be classified as lower risk than a patient who does not experience a visit. (The Measure Information Form says that “observed hospital visit ratios were compared to predicted hospital visit probabilities” in order to calculate the c-statistic, but presumably this means observed hospital visit rates, not the visit ratios calculated for the measure.) It is impossible to determine whether this value of the c-statistic is good or bad. The reason for creating the measure is a presumption that some procedures at some hospitals have outcomes that are better or worse than expected, so the model should not be expected to predict 100% of outcomes if it is using legitimate risk adjustment variables for the procedures and a single random effect variable for the hospital. However, it matters whether the 30% of cases that have a post-surgical visit and are classified as low risk are truly low risk (in which case, the post-surgical visits are more likely to represent poor quality care at the hospital where they received the procedure), or whether the 30% of cases are actually high risk (in which case, their post-surgical visits are more likely to represent an expected outcome of the procedure regardless of which hospital delivered it). There is no information in the Measure Information Form indicating whether the accuracy of prediction was assessed in detail or what the results were.
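The pairwise reading of the c-statistic used above can be made concrete with a minimal sketch. The predicted risks below are invented for illustration and are not drawn from the measure; the function simply counts, over every pairing of a patient with a visit and a patient without one, how often the model ranks the patient with the visit higher.

```python
from itertools import product


def c_statistic(risks_with_visit, risks_without_visit):
    """Share of (visit, no-visit) patient pairs ranked correctly.

    A pair is concordant when the patient who had a visit was given
    the higher predicted risk; ties count as half.
    """
    concordant = 0.0
    pairs = 0
    for r_visit, r_none in product(risks_with_visit, risks_without_visit):
        pairs += 1
        if r_visit > r_none:
            concordant += 1.0
        elif r_visit == r_none:
            concordant += 0.5
    return concordant / pairs


# Hypothetical predicted risks, for illustration only.
with_visit = [0.05, 0.10, 0.20]
without_visit = [0.04, 0.08, 0.12, 0.15]

c = c_statistic(with_visit, without_visit)
print(f"c-statistic: {c:.3f}")
print(f"share of pairs ranked the wrong way: {1 - c:.3f}")
```

Under this definition, a c-statistic of 0.693 leaves roughly 31% of such pairs ranked the wrong way, which is the "approximately 30%" figure discussed above. The statistic says nothing about which of those misranked patients were truly low or high risk.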
The developers also state that the ability to distinguish high-risk subjects from low-risk subjects is important and claim that “for a model with good predictive ability we would expect to see a wide range in hospital visit ratios between the lowest decile and highest decile.” It is not clear why this is true; the ability to distinguish high-risk subjects from low-risk subjects should be assessed by determining whether the model accurately predicts the procedures that have higher and lower visit rates, not by the magnitude of the variation in the hospital visit ratios calculated for the measure. The numbers reported in the Measure Information Form under “Predictive Ability” appear to be the % of surgeries with post-surgery visits (i.e., the visit rates), not the hospital visit ratios. (The Form reports the range as 1.74 to 16.01, whereas Table 1 reports the range of the hospital visit ratios as 0.44 to 3.61.) The range of these visit rates depends on how much difference there is in the actual rates for different surgical procedures, not on the model’s predictive ability.
The Measure Information Form fails to report the Hosmer-Lemeshow statistic, which is a commonly-used measure of the differences in predictive ability for different levels of risk. In addition, there is no analysis of whether the model systematically over- or under-predicts visit rates for different procedures or for specific subgroups of patients (e.g., those with specific types of health problems). These types of prediction errors could lead to erroneous classifications of hospitals that perform significantly more or less of these procedures or have an unusually high or low number of these patients.
Poor Measure Reliability and an Inappropriately Low Minimum Case Threshold
Although more than 1,000 hospitals had Risk-Standardized Hospital Visit Ratios (RSHVRs) that were greater than 1.05 (i.e., more than 5% higher than expected) and over 400 had visit ratios over 1.20, the Measure Information Form states that only 229 hospitals could be classified as “worse than expected” because of the high degree of uncertainty associated with the ratio. Similarly, even though more than 1,000 hospitals had RSHVRs below 0.90 (i.e., 10% better than expected), only 220 could be classified as “better than expected.” Despite the expansive definition of the denominator, 941 hospitals had too few cases (i.e., fewer than 30 cases) to determine a category at all.
The proportion of hospitals that are classified as either better or worse than expected is even lower than this in the data that have been publicly reported on the CMS Hospital Compare site. In 2023, only 88 hospitals were classified as “worse than expected” and only 84 hospitals were classified as “better than expected,” out of a total of 3,672. 95% of hospitals are either “no different than expected” or have too few cases to report a classification.
The Measure Evaluation Form cites the median facility-level reliability of the measure as evidence that the measure meets a minimum standard of reliability. However, the reliability of any measure is higher if there are more cases being measured, so the median reliability will be higher if more large hospitals are being measured. If a measure is going to be used to classify individual hospitals, what matters is the reliability of the measure for individual hospitals, not the median reliability for all hospitals.
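The dependence of reliability on case volume can be illustrated with a common signal-to-noise formulation, in which reliability is the between-facility variance divided by the sum of that variance and the sampling variance of a facility's own rate. This is a sketch under stated assumptions: the variance values and case counts are hypothetical, and the developers' exact reliability calculation may differ.

```python
from statistics import median


def reliability(n_cases, between_var, within_var):
    """Signal-to-noise reliability for a facility with n_cases.

    Signal is the between-facility variance; noise is the sampling
    variance of the facility's own rate, which shrinks as n grows.
    """
    return between_var / (between_var + within_var / n_cases)


# Illustrative assumptions, not measure data.
between = 0.0004          # assumed between-facility variance
within = 0.08 * 0.92      # binomial variance at an 8% visit rate

case_counts = [30, 100, 300, 1000, 3000]
values = [reliability(n, between, within) for n in case_counts]

for n, r in zip(case_counts, values):
    print(f"n={n:>4}: reliability {r:.2f}")
print(f"median across facilities: {median(values):.2f}")
```

In this toy example the median clears a 0.6 threshold while the two smallest facilities fall well below it, which is the gap between median reliability and individual-hospital reliability that the paragraph above describes.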
Table 6 in the Measure Information Form shows that the reliability of the measure is unacceptably low for almost half of hospitals, and even excluding hospitals with fewer than 30 cases, Table 6A shows that the measure is unreliable for at least 10-20% of hospitals. (The reliability threshold of 0.6 that is used by CMS does not represent an acceptable level of reliability for a measure that is used for determining hospital payments or guiding patient treatment choices.)
A minimum of 30 cases for assessing performance is inappropriately low for this measure because the denominator of the measure is so broad. Hundreds of different types of surgeries can potentially be included in the denominator, and a wide range of different patients could be receiving those procedures. As a result, 30 cases at one hospital could represent a completely different set of procedures and patients from 30 cases at another hospital, and the 30 cases at a hospital this year could look completely different from the cases the previous year. A hospital’s score may therefore change from year to year simply because of the types of cases that happen to be included each year, not because of changes in the quality of care delivery. Likewise, one hospital’s score could be much higher or lower than another’s simply because of the mix of cases each happened to have that year, not because one hospital delivers higher quality care than the other.
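The effect of a small, mixed caseload can be illustrated with a simple simulation. The sketch below is hypothetical in every respect: it assumes only two procedure types with made-up visit rates, holds the hospital's true quality constant, and reports raw (not risk-standardized) rates. Its only purpose is to show how much observed rates can move from year to year at 30 cases compared with a much larger volume.

```python
import random


def simulated_rates(n_cases, n_years, seed=0):
    """Simulate yearly observed visit rates for a hospital of constant quality."""
    rng = random.Random(seed)
    # Hypothetical 7-day visit rates for two illustrative procedure types.
    visit_rate = {"low_risk": 0.04, "high_risk": 0.16}
    rates = []
    for _ in range(n_years):
        visits = 0
        for _ in range(n_cases):
            # The case mix varies by chance from year to year.
            procedure = rng.choice(["low_risk", "high_risk"])
            if rng.random() < visit_rate[procedure]:
                visits += 1
        rates.append(visits / n_cases)
    return rates


small = simulated_rates(n_cases=30, n_years=50)
large = simulated_rates(n_cases=3000, n_years=50)

spread_small = max(small) - min(small)
spread_large = max(large) - min(large)
print(f"30 cases per year:   observed rates span {spread_small:.3f}")
print(f"3000 cases per year: observed rates span {spread_large:.3f}")
```

Nothing about the simulated hospital changes between years, so any spread in its observed rates is due to chance alone.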
The measure developers do not appear to have made any effort to examine the individual hospitals that are classified as “better” or “worse” than expected to ensure those classifications are not based on artifacts of the measure definition or the risk adjustment methodology. Nor does it appear that they have analyzed changes in hospitals’ performance on the measure from one year to the next to assess the measure’s reliability over time. These types of analyses are essential for a measure that is publicly reported and used for payment.
Lack of Business Case for Using the Measure and Undesirable Effects of Doing So
Since only a very small number of hospitals are being classified as anything other than “no different than expected,” this measure is unlikely to encourage improvements in the quality of care delivery. On the other hand, since the weaknesses in the measure methodology could cause some hospitals to be inappropriately labeled as “worse than expected” because of the kinds of patients they treat, use of the measure creates an undesirable incentive for hospitals to avoid treating patients with characteristics that are likely to result in higher numbers of visits, such as patients with multiple comorbidities and patients who do not have good access to primary care and chronic disease management services. It could also cause patients to inappropriately avoid obtaining needed surgery, or to try to obtain surgery in a facility that is not adequately equipped to address their needs.