
Rate of Timely Follow-up on Abnormal Screening Mammograms for Breast Cancer Detection

CBE ID
4700e
Endorsement Status
1.1 New or Maintenance
Is Under Review
Yes
Next Maintenance Cycle
Fall 2024
1.3 Measure Description

This electronic Clinical Quality Measure (eCQM) reports the percentage of female patients aged 40 to 75 years with at least one abnormal screening (BI-RADS 0) or screening-to-diagnostic (BI-RADS 4, 5) mammogram during the measurement period (i.e., calendar year) who received timely diagnostic resolution, defined as either follow-up imaging with negative/benign/probably benign results or a breast biopsy within 60 days after their index (i.e., first) abnormal screening mammogram.

Negative/benign/probably benign follow-up imaging was defined as diagnostic mammography, breast ultrasound or magnetic resonance imaging (MRI) with BI-RADS ratings of 1, 2, or 3. Relevant diagnostic breast biopsy procedures were defined as core needle biopsy, fine needle aspiration, and surgical excision.

Breast Imaging – Reporting and Data System (BI-RADS) ratings: 0-incomplete, 1-negative, 2-benign, 3-probably benign, 4-suspicious, 5-highly suggestive of malignancy.

        • 1.5 Measure Type
          1.6 Composite Measure
          No
          1.7 Electronic Clinical Quality Measure (eCQM)
          1.8 Level Of Analysis
          1.8b Specify Other Level of Analysis
          Integrated Delivery System
          1.9b Specify Other Care Setting
          Integrated Delivery System
          1.10 Measure Rationale

          Breast cancer is the second most common cause of cancer deaths among women in the United States [1]. In 2024, around 42,250 women will die from breast cancer and an estimated 310,720 new cases of invasive breast cancer will be diagnosed [1].

          Breast cancer survival is dependent upon cancer stage at diagnosis. Approximately 99% of women diagnosed with early-stage breast cancer live for five years or more [2]. However, five-year survival is only about 32% for those diagnosed at the most advanced stage.

          Noninvasive mammographic screening is the primary screening modality used to detect breast cancer. Delays in diagnostic follow-up after abnormal mammographic screening results increase the risk of diagnosing cancer at a more advanced stage [3]. 

          National screening guidelines recommend that women with abnormal screening mammogram results (BI-RADS 0, 4, or 5) undergo additional follow-up imaging via diagnostic mammography, magnetic resonance imaging (MRI), and/or ultrasound [4, 5, 6, 7]. While it is recommended that patients with a benign follow-up imaging result return to routine screening, those with abnormal results (BI-RADS 4 or 5) should have diagnostic samples extracted (e.g., via percutaneous biopsy, fine needle aspiration, or surgical excision) from a suspicious area to evaluate for cancer [4]. 

          Expert-based quality measure programs support the need to establish a reasonable timeframe that encompasses this multi-step process. According to the Centers for Disease Control and Prevention (CDC) National Breast and Cervical Cancer Early Detection Program (NBCCEDP), the interval from breast cancer screening to diagnostic resolution should be no more than 60 days [8]. It is also expected that over 90% of women complete diagnostic resolution after an abnormal screening mammogram [8, 9]. Published literature shows that long wait times to diagnostic evaluation are associated with increased tumor size and lymph node metastases in patients with delays exceeding 12 weeks [10, 11, 12].

          Disparities in diagnostic follow-up after abnormal screening mammograms are frequently reported in the literature. A 2021 systematic review reported rates of failure to follow up on abnormal screening mammograms ranging from 7.2% to 33% [13]. A 2024 study of the American College of Radiology’s National Mammography Database (NMD) observed that only 66.4% of 2.9 million abnormal screening mammograms (BI-RADS 0) documented from 2008-2021 had diagnostic follow-up [14]. In this cohort, women with no family history of breast cancer had lower follow-up rates, and Black and Native American women had lower overall follow-up rates and lower biopsy rates [14]. Rural and community hospital-affiliated facilities had longer median times to biopsy [14].

          The variability in follow-up rates in the NMD and the existing literature implies that barriers prevent mammography facilities from achieving complete and timely diagnostic resolution for all patients. This eCQM can be used to address quality assessment gaps by monitoring timeliness and completeness of care in medical facilities and health systems looking to improve the breast cancer screening diagnostic process.

          1. Key Statistics for Breast Cancer. American Cancer Society. Updated January 17, 2024. Accessed October 31, 2024. https://www.cancer.org/cancer/types/breast-cancer/about/how-common-is-breast-cancer.html.
          2. Cancer Statistics Working Group. U.S. Cancer Statistics Data Visualizations Tool, based on 2021 submission data (1999–2020): U.S. Department of Health and Human Services, Centers for Disease Control and Prevention and National Cancer Institute. Updated June 2024. Accessed July 2024. www.cdc.gov/cancer/dataviz. 
          3. McCarthy AM, Kim JJ, Beaber EF, et al. Follow-Up of Abnormal Breast and Colorectal Cancer Screening by Race/Ethnicity. Am J Prev Med. 2016;51(4):507-512. doi:10.1016/j.amepre.2016.03.017. PMID: 27132628.
          4. Sickles E, D’Orsi CJ. ACR BI-RADS follow-up and outcome monitoring. In: D’Orsi CJ, ed. ACR BI-RADS atlas, breast imaging reporting and data system. Reston, VA: American College of Radiology Reston; 2013:5-67. https://www.acr.org/-/media/ACR/Files/RADS/BI-RADS/BIRADSFAQ.pdf.
          5. Monticciolo DL, Malak SF, Friedewald SM, et al. Breast Cancer Screening Recommendations Inclusive of All Women at Average Risk: Update from the ACR and Society of Breast Imaging. J Am Coll Radiol. 2021;18(9):1280-1288. doi:10.1016/j.jacr.2021.04.021. PMID: 34154984.
          6. US Preventive Services Task Force, Nicholson WK, Silverstein M, et al. Screening for Breast Cancer: US Preventive Services Task Force Recommendation Statement [published correction appears in JAMA. 2024 Sep 30. doi: 10.1001/jama.2024.19851]. JAMA. 2024;331(22):1918-1930. doi:10.1001/jama.2024.5534. PMID: 38687503.
          7. Esserman LJ, Joe BN, et al. Diagnostic evaluation of suspected breast cancer. UpToDate. Updated October 31, 2023. Accessed October 31, 2024. https://www.uptodate.com/contents/diagnostic-evaluation-of-suspected-breast-cancer?search=birads&source=search_result&selectedTitle=2%7E13&usage_type=default&display_rank=2#H24.
          8. DeGroff A, Royalty JE, Howe W, et al. When performance management works: a study of the National Breast and Cervical Cancer Early Detection Program. Cancer. 2014;120 Suppl 16(Suppl 16):2566-2574. doi:10.1002/cncr.28817. PMID: 25099899.
          9. Miller JW, Hanson V, Johnson GD, Royalty JE, Richardson LC. From cancer screening to treatment: service delivery and referral in the National Breast and Cervical Cancer Early Detection Program. Cancer. 2014;120 Suppl 16(0 16):2549-2556. doi:10.1002/cncr.28823. PMID: 25099897.
          10. Olivotto IA, Gomi A, Bancej C, et al. Influence of delay to diagnosis on prognostic indicators of screen-detected breast carcinoma. Cancer. 2002;94(8):2143-2150. doi:10.1002/cncr.10453. PMID: 12001110.
          11. Ganry O, Peng J, Dubreuil A. Influence of abnormal screens on delays and prognostic indicators of screen-detected breast carcinoma. J Med Screen. 2004;11(1):28-31. doi:10.1177/096914130301100107. PMID: 15006111.
          12. Doubeni CA, Gabler NB, Wheeler CM, et al. Timely follow-up of positive cancer screening results: A systematic review and recommendations from the PROSPR Consortium. CA Cancer J Clin. 2018;68(3):199-216. doi:10.3322/caac.21452. PMID: 29603147.
          13. Reece JC, Neal EFG, Nguyen P, McIntosh JG, Emery JD. Delayed or failure to follow-up abnormal breast cancer screening mammograms in primary care: a systematic review. BMC Cancer. 2021;21(1):373. Published 2021 Apr 7. doi:10.1186/s12885-021-08100-3. PMID: 33827476.
          14. Oluyemi ET, Grimm LJ, Goldman L, et al. Rate and Timeliness of Diagnostic Evaluation and Biopsy After Recall From Screening Mammography in the National Mammography Database. J Am Coll Radiol. 2024;21(3):427-438. doi:10.1016/j.jacr.2023.09.002. PMID: 37722468.
          1.20 Testing Data Sources
          1.20a Specify Other Data Source
          MagView Mammography Software Systems & Solutions (Breast Center Analytics)
          1.25 Data Sources

          Health System 1 data were used to calculate the eCQM rates, assess feasibility, and conduct reliability and validity testing. All analyses were conducted using data routinely collected and documented in the Epic EHR and reported for six years (2018 to 2023). Six facility groups were included in the analyses.

          Health System 2 data were used to calculate eCQM rates and assess feasibility. All analyses were conducted using data routinely collected and documented in the Cerner (now Oracle Health) EHR and reported for six years (2018 to 2023). One facility group was included in the analyses.

          Health System 3 data were used to assess feasibility using the Allscripts EHR. eCQM rates are forthcoming. 

        • 1.14 Numerator

          Patients in the denominator population who received timely diagnostic resolution defined as negative/benign/probably benign follow-up imaging (BI-RADS 1, 2, 3) or breast biopsy within 60 days after the date of their index (i.e., first) abnormal screening (BI-RADS 0) or screening-to-diagnostic (BI-RADS 4, 5) mammogram.

          1.14a Numerator Details
          1. Extract the date of the first abnormal screening (BI-RADS 0) or screening-to-diagnostic (BI-RADS 4, 5) mammogram in the measurement period (i.e., calendar year) for each patient to define the index screening mammograms and index dates (i.e., start of the follow-up period) [value sets: “Screening Mammogram (Grouping)” OID 2.16.840.1.113762.1.4.1206.61; BIRADSCategories04And5 OID 2.16.840.1.113762.1.4.1206.67].
          2. If documented, extract the first follow-up imaging (i.e., diagnostic mammogram, ultrasound, or MRI) with negative/benign/probably benign (BI-RADS 1, 2, 3) ratings within 60 days after the date of the index abnormal screening mammogram for each patient [value sets: “Diagnostic Mammography” OID 2.16.840.1.113762.1.4.1206.65; “Ultrasound of the Breast” OID 2.16.840.1.113883.3.3157.1902; “MRI of the Breast” OID 2.16.840.1.113883.3.3157.1903; BIRADSCategories12And3 OID 2.16.840.1.113762.1.4.1206.68].
          3. If documented, extract the first breast biopsy procedure (i.e., core needle biopsy, fine needle aspiration, or surgical excision) within 60 days after the date of the index abnormal screening mammogram for each patient [value set: “Breast Cancer Biopsy and Surgical Excision” OID 2.16.840.1.113762.1.4.1206.66].
          4. Patients that received negative/benign/probably benign follow-up imaging or breast biopsy within 60 days are included in the numerator population.
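          As a minimal illustration of steps 2-4 (a hedged sketch, not the measure's specified logic; the DataFrames and column names such as patient_id, index_date, and event_date are hypothetical), the 60-day timeliness check can be expressed as:

          ```python
          import pandas as pd

          # Hypothetical extracts: one index abnormal mammogram per patient (step 1) and all
          # qualifying follow-up events (BI-RADS 1-3 imaging or breast biopsy; steps 2-3).
          index_df = pd.DataFrame({
              "patient_id": [1, 2, 3],
              "index_date": pd.to_datetime(["2023-01-10", "2023-03-05", "2023-06-20"]),
          })
          followup_df = pd.DataFrame({
              "patient_id": [1, 2, 2],
              "event_date": pd.to_datetime(["2023-02-15", "2023-03-20", "2023-07-01"]),
          })

          # Step 4: keep patients with a qualifying event within 60 days after the index date.
          merged = followup_df.merge(index_df, on="patient_id")
          days = (merged["event_date"] - merged["index_date"]).dt.days
          numerator_ids = set(merged.loc[(days >= 0) & (days <= 60), "patient_id"])  # {1, 2}
          ```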
        • 1.15 Denominator

          Female patients aged 40 to 75 years with an abnormal screening (BI-RADS 0) or screening-to-diagnostic (BI-RADS 4, 5) mammogram during the measurement period (i.e., calendar year). Only the first abnormal screening or screening-to-diagnostic mammogram (i.e., index screening test) is included in the measure calculation.

          1.15a Denominator Details
          1. Extract all abnormal screening mammograms (BI-RADS 0) and screening-to-diagnostic mammograms (BI-RADS 4, 5) during the measurement period (i.e., calendar year) [value sets: “Screening Mammogram (Grouping)” OID 2.16.840.1.113762.1.4.1206.61; BIRADSCategories04And5 OID 2.16.840.1.113762.1.4.1206.67].
          2. Retain abnormal screening and screening-to-diagnostic mammograms where the patient was aged between 40 and 75 years on the date of the mammogram [value set "BirthDate" OID 2.16.840.1.113883.3.560.100.4].
          3. Retain abnormal screening and screening-to-diagnostic mammograms where the patient was female [value set "ONCAdministrativeSex" OID 2.16.840.1.113762.1.4.1].
          4. Patients with at least one abnormal screening or screening-to-diagnostic mammogram are included in the denominator population.
          1.15d Age Group
          Other
          1.15e Age Group Other
          Universal Breast Cancer Screening Age for Females (40-75 years)
        • 1.15b Denominator Exclusions

          None.

          1.15c Denominator Exclusions Details

          None.

        • 1.12 Attach MAT Output
          1.13 Attach Data Dictionary
          1.13a Data dictionary not attached
          No
          1.16 Type of Score
          1.17 Measure Score Interpretation
          Better quality = Higher score
          1.18 Calculation of Measure Score
          1. Extract all abnormal screening mammograms (BI-RADS 0) and screening-to-diagnostic mammograms (BI-RADS 4, 5) during the measurement period (i.e., calendar year) [value sets: “Screening Mammogram (Grouping)” OID 2.16.840.1.113762.1.4.1206.61; BIRADSCategories04And5 OID 2.16.840.1.113762.1.4.1206.67].
          2. Retain abnormal screening and screening-to-diagnostic mammograms where the patient was aged between 40 and 75 years on the date of the mammogram [value set "BirthDate" OID 2.16.840.1.113883.3.560.100.4].
          3. Retain abnormal screening and screening-to-diagnostic mammograms where the patient was female [value set "ONCAdministrativeSex" OID 2.16.840.1.113762.1.4.1].
          4. Patients with at least one abnormal screening or screening-to-diagnostic mammogram are included in the target population.
          5. Extract the date of the first abnormal screening or screening-to-diagnostic mammogram in the measurement period (i.e., calendar year) for each patient to define the index screening mammograms and index dates (i.e., start of the follow-up period) [value sets: “Screening Mammogram (Grouping)” OID 2.16.840.1.113762.1.4.1206.61; BIRADSCategories04And5 OID 2.16.840.1.113762.1.4.1206.67].
          6. If documented, extract the first follow-up imaging (i.e., diagnostic mammogram, ultrasound, or MRI) with negative/benign/probably benign (BI-RADS 1, 2, 3) ratings within 60 days after the date of the index abnormal screening mammogram for each patient [value sets: “Diagnostic Mammography” OID 2.16.840.1.113762.1.4.1206.65; “Ultrasound of the Breast” OID 2.16.840.1.113883.3.3157.1902; “MRI of the Breast” OID 2.16.840.1.113883.3.3157.1903; BIRADSCategories12And3 OID 2.16.840.1.113762.1.4.1206.68].
          7. If documented, extract the first breast biopsy procedure (i.e., core needle biopsy, fine needle aspiration, or surgical excision) within 60 days after the date of the index abnormal screening mammogram for each patient [value set: “Breast Cancer Biopsy and Surgical Excision” OID 2.16.840.1.113762.1.4.1206.66].
          8. Patients that received negative/benign/probably benign follow-up imaging or breast biopsy within 60 days are included in the numerator population.
            Once numerator and denominator populations are defined:
          9. Calculate the rate: divide the numerator population by the denominator population and multiply by 100 to obtain the percentage.
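          The steps above can be illustrated end to end with a short sketch (hypothetical pandas DataFrames and column names; the authoritative logic is the eCQM specification and value sets referenced above):

          ```python
          import pandas as pd

          def measure_score(mammograms: pd.DataFrame, followups: pd.DataFrame) -> float:
              """Illustrative only; column names are hypothetical.
              mammograms: one row per screening/screening-to-diagnostic mammogram
                          (patient_id, exam_date, birads, sex, birth_date).
              followups:  qualifying diagnostic imaging (BI-RADS 1-3) or biopsy events
                          (patient_id, event_date)."""
              m = mammograms
              age = (m["exam_date"] - m["birth_date"]).dt.days // 365          # approximate age in years
              abnormal = m[m["birads"].isin([0, 4, 5]) & (m["sex"] == "F") & age.between(40, 75)]

              # Steps 4-5: denominator patients and their index (first) abnormal exam dates.
              index_df = (abnormal.groupby("patient_id", as_index=False)["exam_date"].min()
                                  .rename(columns={"exam_date": "index_date"}))

              # Steps 6-8: numerator patients with a qualifying follow-up within 60 days of the index date.
              f = followups.merge(index_df, on="patient_id")
              timely = (f["event_date"] - f["index_date"]).dt.days.between(0, 60)
              numerator = f.loc[timely, "patient_id"].nunique()

              # Step 9: percentage.
              return 100 * numerator / len(index_df)
          ```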
          1.18a Attach measure score calculation diagram
          1.19 Measure Stratification Details

          The measure is not stratified.

          1.26 Minimum Sample Size

          No minimum sample size specified.

        • Steward
          Brigham and Women's Hospital
          Steward Organization POC Email
          Steward Organization Copyright

          This electronic Clinical Quality Measure (eCQM) and related data specifications are owned and stewarded by the Brigham and Women's Hospital (BWH). BWH is not responsible for any use of the Measure. BWH makes no representations, warranties, or endorsement about the quality of any organization or physician that uses or reports performance measures and BWH has no liability to anyone who relies on such measures or specifications.

          Measure Developer Secondary Point Of Contact

          United States

          • 2.1 Attach Logic Model
            2.2 Evidence of Measure Importance

            Despite advancements in therapies for breast cancer, early detection via routine mammographic screening has contributed substantially to reducing breast cancer mortality since the 1990s [1]. Breast cancer has the highest treatment cost of any cancer, costing over $26.2 billion for medical services and $3.5 billion for prescription medications [2]. Early detection through screening can reduce treatment costs by 30-100% [3]. The National Breast and Cervical Cancer Early Detection Program (NBCCEDP), which primarily serves uninsured and medically underserved women, is estimated to have saved 369,000 life-years compared to no screening for the 1.8 million women screened in the program between 1991 and 2006 [4].

            However, delays in diagnostic follow-up after abnormal screening results worsen prognostic outcomes by delaying the initiation of early, lower-cost, and less invasive interventions [1]. Moreover, certain racial and ethnic minorities, including Black, Asian, and Hispanic women, as well as low-income patients and those living in underserved areas, are more likely to experience lower rates of follow-up and timely diagnostic resolution, increasing their risk of later-stage diagnoses and death [5, 6, 7]. Non-Hispanic Black women are less likely to be diagnosed with breast cancer but more likely to die from it [8, 9]. These disparities in breast cancer outcomes emphasize the need for robust systems and protocols that monitor delays in follow-up and ensure all populations have access to timely and complete care after an abnormal screening mammogram result.

            Currently, federal requirements instituted by the Mammography Quality Standards Act (MQSA) are limited in their capacity to improve quality performance in breast imaging facilities because they do not provide guidelines on how to assess diagnostic timeliness [10, 11]. There is also no requirement for facilities to track and report abnormal screening mammogram (recall) rates and early cancer detection rates, which could further inform the quality of mammography outcomes and practices in breast imaging facilities [9].

            A 2014 study found that breast imaging facilities can leverage routinely collected data to assess whether they meet certain breast cancer screening diagnostic quality benchmarks [10]. These benchmarks included timely follow-up imaging (more than 90% of patients should receive diagnostic imaging within 30 days of an abnormal screening mammogram) and timely biopsy (more than 90% of patients should receive a recommended biopsy within 60 days of an abnormal screening mammogram) [10]. Only 62% of participating facilities (N=52) were able to show that they met the benchmark for timely follow-up imaging, and 27% met the benchmark for timely diagnostic biopsy, highlighting the need to improve facility performance on the timeliness of diagnostic imaging and biopsy [10].

            A 2020 follow-up study by the same team found that facilities were significantly more likely to reach these mammography quality benchmarks for timeliness of follow-up the longer they participated in a quality measurement program [11]. In this study, facilities not designated as an American College of Radiology (ACR) Breast Imaging Center of Excellence (BICOE) showed the most improvement in recall rate, proportion not lost to follow-up at imaging, biopsy recommendation rate, and early-stage cancer detection [11, 12]. Therefore, comprehensive quality improvement initiatives with stated performance benchmarks positively affect mammography practices and outcomes, with the greatest impact on previously underperforming facilities.

            Results of regular quality assessments of breast cancer screening and timely diagnostic follow-up can be used to support the implementation of interventions, like patient navigation and case management, electronic health record (EHR) reminders, as well as patient education and outreach, that have been documented to improve follow-up, especially among racial and ethnic minorities and low-income patients [7, 13, 14].

            Given the minimal requirements to collect breast cancer screening follow-up data and non-standard methods to store it, facilities’ reported performance may be influenced by both care quality and data accuracy [10]. The type and amount of data that a facility reports may depend on the resources available to identify, extract, and measure these data, as well as initiatives to address any data quality gaps that prevent the accurate calculation of performance metrics [10]. Nevertheless, these studies show that facilities can perform better if tools are made available to them to do so. 

            This electronic clinical quality measure (eCQM) uses standard terminologies to calculate the rate of timely diagnostic resolution in facilities and health systems that perform mammographic screening and follow-up. The specifications of this measure are supported by findings outlined in peer-reviewed literature, current screening guidelines, and existing related clinical quality measures. Facilities can use this tool to conduct routine quality assessment checks to guide quality improvement initiatives aiming to promote health equity and timeliness of care in the breast cancer screening and diagnostic process. 

            1. Monticciolo DL, Malak SF, Friedewald SM, et al. Breast Cancer Screening Recommendations Inclusive of All Women at Average Risk: Update from the ACR and Society of Breast Imaging. J Am Coll Radiol. 2021;18(9):1280-1288. doi:10.1016/j.jacr.2021.04.021. PMID: 34154984.
            2. National Cancer Institute. Financial burden of cancer care. Cancer Trends Progress Report. Reviewed March 2024. Accessed October 2024. https://progressreport.cancer.gov/after/economic_burden.
            3. Feig S. Cost-effectiveness of mammography, MRI, and ultrasonography for breast cancer screening. Radiol Clin North Am. 2010;48(5):879-891. doi:10.1016/j.rcl.2010.06.002. PMID: 20868891.
            4. Hoerger TJ, Ekwueme DU, Miller JW et al. (2011) Estimated effects of the National Breast and Cervical Cancer Early Detection Program on breast cancer mortality. Am J Prev Med 40:397–404. PMID: 21406272.
            5. McCarthy AM, Kim JJ, Beaber EF, et al. Follow-Up of Abnormal Breast and Colorectal Cancer Screening by Race/Ethnicity. Am J Prev Med. 2016;51(4):507-512. doi:10.1016/j.amepre.2016.03.017. PMID: 27132628.
            6. Nguyen KH, Pasick RJ, Stewart SL, Kerlikowske K, Karliner LS. Disparities in abnormal mammogram follow-up time for Asian women compared with non-Hispanic white women and between Asian ethnic groups. Cancer. 2017;123(18):3468-3475. doi:10.1002/cncr.30756. PMID: 28603859.
            7. Reece JC, Neal EFG, Nguyen P, McIntosh JG, Emery JD. Delayed or failure to follow-up abnormal breast cancer screening mammograms in primary care: a systematic review. BMC Cancer. 2021;21(1):373. Published 2021 Apr 7. doi:10.1186/s12885-021-08100-3. PMID: 33827476.
            8. DeSantis CE, Fedewa SA, Goding Sauer A, Kramer JL, Smith RA, Jemal A.  Breast cancer statistics, 2015: convergence of incidence rates between black and white women. CA Cancer J Clin. 2016;66(1):31-42. doi:10.3322/caac.21320. PMID: 26513636.
            9. Richardson LC, Henley SJ, Miller JW, Massetti G, Thomas CC. Patterns and Trends in Age-Specific Black-White Differences in Breast Cancer Incidence and Mortality - United States, 1999-2014. MMWR Morb Mortal Wkly Rep. 2016;65(40):1093-1098. Published 2016 Oct 14. doi:10.15585/mmwr.mm6540a1. PMID: 27736827.
            10. Rauscher GH, Murphy AM, Orsi JM, Dupuy DM, Grabler PM, Weldon CB. Beyond the mammography quality standards act: measuring the quality of breast cancer screening programs. AJR Am J Roentgenol. 2014;202(1):145-151. doi:10.2214/AJR.13.10806. PMID: 24261339.
            11. Rauscher GH, Tossas-Milligan K, Macarol T, Grabler PM, Murphy AM. Trends in Attaining Mammography Quality Benchmarks With Repeated Participation in a Quality Measurement Program: Going Beyond the Mammography Quality Standards Act to Address Breast Cancer Disparities. J Am Coll Radiol. 2020;17(11):1420-1428. doi:10.1016/j.jacr.2020.07.019. PMID: 32771493. 
            12. Albus K. ACR Designated Comprehensive Breast Imaging Center (CBIC)(Revised 9-29-23). American College of Radiology. September 9, 2023. Accessed October 31, 2024. https://accreditationsupport.acr.org/support/solutions/articles/11000068075-acr-designated-comprehensive-breast-imaging-center-cbic-revised-9-29-23-.
            13. Haas JS, Atlas SJ, Wright A, et al. Multilevel Follow-up of Cancer Screening (mFOCUS): Protocol for a multilevel intervention to improve the follow-up of abnormal cancer screening test results. Contemp Clin Trials. 2021;109:106533. doi:10.1016/j.cct.2021.106533. PMID: 34375748.
            14. Atlas SJ, Tosteson ANA, Wright A, et al. A Multilevel Primary Care Intervention to Improve Follow-Up of Overdue Abnormal Cancer Screening Test Results: A Cluster Randomized Clinical Trial. JAMA. 2023;330(14):1348-1358. doi:10.1001/jama.2023.18755. PMID: 37815566.
          • 2.3 Anticipated Impact

            The anticipated impact has been described in the Evidence of Measure Importance, above. The benefits of adhering to universal breast cancer screening recommendations outweigh any potential unintended consequences related to screening.

            2.5 Health Care Quality Landscape

            There are currently three existing quality measures related to breast cancer screening:

            1. Breast Cancer Screening (Higher rate = better): “Percentage of women 50-74 years of age who had a mammogram to screen for breast cancer in the 27 months prior to the end of the measurement period.” (Quality ID #112, NQF #2372)
            2. Breast Cancer Screening Recall Rates (Target = 5 to 12%): “Percentage of beneficiaries with mammography or digital breast tomosynthesis (DBT) screening studies that are followed by a diagnostic mammography, DBT, ultrasound, or magnetic resonance imaging (MRI) of the breast in an outpatient or office setting within 45 days.” (Hospital Outpatient Quality Reporting ID #1648)
            3. Follow-Up after Abnormal Breast Cancer Assessment (Higher rate = better): "The percentage of inconclusive or high-risk BI-RADS assessments that received appropriate follow-up within 90 days of the assessment, for members 40–74 years of age." (BCF-E). This measure reports on the percentage of mammograms with a BI-RADS of 0 that received follow-up diagnostic imaging within 90 days or mammograms with a BI-RADS of 4 or 5 that received follow-up breast biopsy within 90 days; the measure does not quantify the percentage of patients that have timely diagnostic resolution from a screening mammogram to breast biopsy.

            Each of these clinical quality measures quantifies specific aspects of the breast cancer screening process; however, none of the measures assess the full process from abnormal screening mammogram to diagnostic resolution.

            2.6 Meaningfulness to Target Population

            Three provider interviews and one pilot patient interview have been conducted to date. More interviews are underway, with a target of 5-10 provider and 5-10 patient interviews. Feedback was also obtained at Technical Expert Panel (TEP) meetings and through a Public Comment period.

            Provider Interviews:

            Once a patient undergoes a screening mammogram, scheduling of subsequent diagnostic imaging and breast biopsy falls under the breast imaging or radiology center’s responsibility. When asked about the role of a primary care provider in breast cancer screening, one provider stated that “radiology owns most of that process in a closed loop, because of the nature of mammogram imaging and breast cancer”. While the primary care provider is sent the imaging results and can help patients understand them and encourage follow-up, scheduling timely follow-up is outside their scope. The providers suggested that the measure could reflect negatively at the primary care level but could be a meaningful reflection of the health system, as articulated by one provider: “I don't think it should be something that primary care doctors are judged on. To me, it's more of a function of how well our health system works and the workflows that the breast imaging center have.” One provider suggested that reporting on follow-up rates could be “a helpful guide of what needs to be done” and inform quality improvement interventions. Although scheduling follow-up may not be the responsibility of the providers interviewed, there was agreement with the measure specifications as a whole and with categorizing index abnormal mammograms as BI-RADS 0, 4, or 5.

            Patient Pilot Interview:

            Between the steps in the screening process, the patient reported a “three to four week wait time between appointments.” The time and travel associated with the follow-up appointments were described as “difficult to manage” by the patient. Once at the appointments for diagnostic imaging and, subsequently, breast biopsy, the patient explained that wait times lasted hours in clinic. In-clinic wait times, combined with the multiple stages of the screening process, contributed to the patient feeling anxious. When asked to reflect on the breast cancer screening eCQM, the patient stated it would “tell you about your experience vs. others.” The patient supports the reporting of this information via the breast cancer screening eCQM as it provides clarity on the care they receive in the context of the larger health system.

            Patient Perspective on the TEP:

            When asked about using a 60-day versus 90-day follow-up timeframe for diagnostic resolution, one patient representative stated: "I do believe that 90 days for many people is a long time. Now, that depends on the availability to get things done. But, I don't think that that should be the standard. I think the standard should be sooner."

            Public Comment Feedback:

            The eCQM specifications and preliminary rates were posted for Public Comment on the Centers for Medicare & Medicaid Services (CMS) Measures Management System (MMS) for 15 days. The posting was shared via email with CMS listserv members for wider distribution of the commenting opportunity.

            Comments were received from the following organizations expressing support for the eCQM and providing feedback on the specifications: American College of Radiology (ACR), American Medical Association (AMA), and Merck. Comments and feedback were used to revise measure specifications and select benchmarks as described in Note 1 (in the supplemental attachment "4700e_SupplementalInformation.docx"). Comments around meaningfulness included:

            "ACR supports the measure concept’s goal of ensuring the delivery of recommended follow-up care for female patients aged 40 to 75 who receive an abnormal mammography exam during the designated performance year." – American College of Radiology (ACR)

            “The measure’s preliminary performance scores come from hospital-affiliated outpatient sites within an integrated health system, so ACR advocates that the measure could be reasonably adopted in any facility-level program, i.e.:

            • Hospital outpatient, integrated health system
            • Single hospital/outpatient
            • Free standing center” 

            – American College of Radiology (ACR)

            "The American Medical Association appreciates the opportunity to comment on the two measures addressing timely follow-up after an abnormal screening result and supports their intent… We agree that measurement should be at the facility level, assuming that testing demonstrates that the results are reliable and valid, and subsequently selected for those programs for which that level is appropriate." – American Medical Association (AMA)

            "Merck strongly supports the proposed measure, “Timely Follow-up on Abnormal Screening Mammograms for Breast Cancer Detection,” recognizing its potential to enhance the quality of care in breast cancer. This measure aligns with clinical guideline practices which emphasize the importance of timely follow-up care and appropriate diagnostics that can lead to earlier, more effective treatment opportunities. Research has highlighted the importance of timely follow-up care for breast cancer after receiving abnormal screening mammograms, as delays can lead to further progression of the disease and worsening outcomes. We support the development of this measure given its opportunity to promote a standardized process for care coordination in breast cancer screening follow-up." – Merck

            “To support widespread adoption and applicability across quality reporting programs, we encourage the measure developers to consider potential use cases of this measure among healthcare reporting entities, such as at the hospital and health plan level. Potentially relevant programs for implementation could include the HOQR program for Outpatient Prospective Payment System (OPPS) hospitals and MIPS Value Pathways (MVPs) to extend reporting to facility, ACO, or health plan level.” – Merck

          • 2.4 Performance Gap

            The eCQM performance is reported at the integrated delivery system level and the facility group level.

            Table 1 (in attachment "4700e_PerformanceGap.docx") presents eCQM performance rates for Health System 1 at the integrated delivery system level by year. Table 2 (in attachment "4700e_PerformanceGap.docx") shows the eCQM rates at Health System 1 by hospital-affiliated facility group and by year. Most rates were not statistically significantly different from the 90% benchmarks outlined above. However, in 2023, the overall eCQM performance rate for the integrated delivery system decreased to 84.4% (95% CI: 83.7%, 85.0%), dropping significantly below the 90% benchmark. The facility group level eCQM rates show that the performance significantly decreased in three groups in 2023 to 62.0% (95% CI: 58.6%, 65.3%), 75.2% (95% CI: 73.8%, 76.6%), and 85.1% (95% CI: 83.1%, 87.1%), indicating where efforts to improve follow-up should be focused. These results highlight the importance of tracking performance over time even in health systems and facility groups that have been consistently performing well and meeting benchmarks.

            Table 3 (in attachment "4700e_PerformanceGap.docx") presents eCQM performance rates for 1 hospital-affiliated facility group at Health System 2 by year. The eCQM performance rates were not statistically significantly different from the 90% benchmark from the year 2020 and onwards.

            Table 4 (in attachment "4700e_PerformanceGap.docx") presents preliminary results for 2 additional hospital-affiliated facility groups at Health System 2 by year, demonstrating opportunities for improvement. The overall rates and rates by year are significantly different from the 90% benchmark.

            Please note that it was not possible to provide the performance scores by decile. This is a new eCQM that has been tested at two health systems to date, with a total of 1 integrated delivery system eCQM performance rate reported (Health System 1) and 7 facility-group eCQM performance rates reported (Health Systems 1 and 2).

            2.4a Attach Performance Gap Results
            • 3.1 Feasibility Assessment

              Table 9 (in supplemental attachment "4700e_SupplementalInformation.docx") presents the frequency of data elements by Health System. Table 10 (in supplemental attachment "4700e_SupplementalInformation.docx") shows the frequency of data elements for the 6 facility groups at Health System 1. All required data elements are routinely collected during patient care.

              One potential feasibility challenge is extraction of Breast Imaging – Reporting and Data System (BI-RADS) results from unstructured data fields in EHRs. Both Epic and Cerner (now Oracle Health) provide structured fields to document BI-RADS information for breast screening and diagnostic imaging, but not all health systems or facilities use this field. For example, only certain facilities at Health System 2 used the structured field, and Health System 3 did not use structured fields to store these data, although the analyses were conducted using data from a legacy system.

              All facilities within Health System 1 had the same feasibility scores for 2023 (Scorecard 1 in attachment "4700e_FeasibilityScorecard.xlsx"). Facilities at Health System 2 had different feasibility scores (Scorecards 2 and 3). All facilities within Health System 3 had the same feasibility scores (Scorecard 4).

              3.2 Attach Feasibility Scorecard
              3.3 Feasibility Informed Final Measure

              The feasibility assessment did not impact the final measure specifications. However, it highlighted a potential gap. Given that BI-RADS may not be captured in structured fields, a simple string search (to find words in free text) was developed to extract BI-RADS results from unstructured fields in EHRs (initial results and any addenda). The search terms account for variations in documentation practices (e.g., 'BI-RADS 4' vs. 'Category 4') and misspellings (e.g., 'BIRADS 4'). This approach demonstrated near perfect performance at Health System 1, where one facility group only began using the structured field in 2023, requiring extraction of BI-RADS data using the string search for earlier data years. This approach was also applied by Health System 3 to extract data from the legacy system.
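              As an illustration only (the exact search terms used at the health systems are not reproduced here), a string search of this kind might look like the following hypothetical pattern:

              ```python
              import re

              # Hypothetical pattern covering documentation variants such as "BI-RADS 4",
              # "BIRADS 4", "BI RADS: 4", and "Category 4" in free-text reports or addenda.
              BIRADS_PATTERN = re.compile(r"\b(?:BI[-\s]?RADS|Category)\s*[:#]?\s*([0-6])\b", re.IGNORECASE)

              def extract_birads(report_text: str) -> list:
                  """Return all BI-RADS category digits found in a free-text report."""
                  return [int(match) for match in BIRADS_PATTERN.findall(report_text)]

              extract_birads("IMPRESSION: BIRADS 0. Additional imaging recommended.")  # -> [0]
              ```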

              An alternate solution to string search is to leverage other health information technology systems that interface with the EHR. This approach was applied by Health System 2 for sites where BI-RADS data were not stored in structured fields. MagView Mammography Software & Workflow Solutions can be used to extract all data elements required to calculate the eCQM. MagView offers structured reporting by radiologists and can be used to generate standard and custom reports about patient follow-up imaging and biopsies. MagView interfaces with the EHR as follows: mammography orders are sent to MagView including patient demographics, date of birth, name, order, and ordering physician. Once the MagView workflow is completed, the results are sent back to the EHR in the form of a free-text report. Data are not extracted from the free-text report except for the name of the radiologist who reviewed the imaging, meaning that the BI-RADS results are available in the EHR but not documented in structured fields. 

              Health System 2 preliminary eCQM performance rates were calculated using data available from standard MagView reports, which were not able to define the target population according to the measure specifications. The preliminary rates are not restricted to female patients aged 40-75 years and therefore include all patients with an abnormal screening mammogram. Requests to generate custom reports to accurately reflect eCQM specifications are underway, and rates will be updated accordingly.

            • 3.4 Proprietary Information
              Not a proprietary measure and no proprietary components
              • 4.1.3 Characteristics of Measured Entities

                Table 11 (in supplemental attachment "4700e_SupplementalInformation.docx") presents health system characteristics and overall eCQM rates. In Health System 1, one hospital-affiliated facility group had 6 facilities, two groups had 5 facilities, one group had 3 facilities, and two groups had 1 facility each.

                4.1.1 Data Used for Testing

                Health System 1 data were used for reliability and validity testing. The analyses were conducted using data routinely collected and documented in the Epic EHR and reported for six years (2018 to 2023).

                4.1.4 Characteristics of Units of the Eligible Population

                Table 12 (in supplemental attachment "4700e_SupplementalInformation.docx") shows patient characteristics for the included sample by health system. Table 13 (in supplemental attachment "4700e_SupplementalInformation.docx") shows patient characteristics for the 6 facility groups at Health System 1.

                4.1.2 Differences in Data

                None.

              • 4.2.2 Method(s) of Reliability Testing

                Patient-level Data Element Reliability Percentage Agreement and Kappa: Chart reviews were conducted on a stratified random sample of 100 patients. Given the high overall eCQM rate at Health System 1, patients that did not meet the numerator criteria as assigned by the eCQM were oversampled; 80 patients that met the numerator criteria and 20 patients that did not meet the numerator criteria were selected for chart review to calculate inter-abstractor reliability for data elements of the numerator and denominator populations. Manual chart review was considered the gold standard. Chart reviewers were blinded to the eCQM data extractions. Percentage agreement and Kappa were calculated for the gold-standard manual chart review abstractions and the eCQM automated data extractions. For the denominator data elements, inter-abstractor agreement required agreement on three elements: whether a screening mammogram was performed, screening mammogram date, and whether the result was abnormal (BI-RADS 0, 4, or 5). For the numerator data elements, inter-abstractor agreement required agreement on several elements: whether a procedure placed the patient in the numerator (i.e., diagnostic imaging or biopsy), type of procedure, date of the procedure, and if applicable, whether the result was negative, benign, or probably benign (BI-RADS 1, 2, or 3) for diagnostic imaging. Health Systems 2 and 3 are in the process of conducting chart reviews.
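                For reference, the two statistics can be computed as in the brief sketch below (a hedged illustration with made-up 0/1 flags, not the actual chart review data):

                ```python
                from sklearn.metrics import cohen_kappa_score

                # Parallel 0/1 flags for the sampled patients (1 = criterion met); values are illustrative.
                chart_review = [1] * 80 + [0] * 20   # gold-standard manual chart review abstraction
                ecqm_extract = [1] * 80 + [0] * 20   # automated eCQM data extraction

                agreement = sum(a == b for a, b in zip(chart_review, ecqm_extract)) / len(ecqm_extract)
                kappa = cohen_kappa_score(chart_review, ecqm_extract)   # chance-corrected agreement
                print(f"Percentage agreement: {agreement:.0%}, Kappa: {kappa:.2f}")
                ```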

                Accountable Entity-level Reliability Signal-to-Noise Analysis: Signal-to-Noise Ratios (SNR) were calculated for the six hospital-affiliated facility groups at Health System 1. The signal-to-noise analysis estimated the proportion of overall variability explained by the differences between measured entities (i.e., hospital-affiliated facility groups). A minimum sample size of 10 patients was required for the signal-to-noise analysis. The results are reported overall and by year from 2018 to 2023, since the measure is intended to be reported annually. This analysis was only conducted at the facility group level given that this is a new eCQM and only one performance rate was available at the integrated delivery system level.
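                The submission does not state the exact estimator; the sketch below shows one common method-of-moments approximation of entity-level signal-to-noise reliability (a beta-binomial or hierarchical model could be used instead):

                ```python
                import numpy as np

                def snr_reliability(numerators, denominators):
                    """Per-entity reliability = between-entity variance /
                    (between-entity variance + within-entity sampling variance)."""
                    num = np.asarray(numerators, dtype=float)
                    den = np.asarray(denominators, dtype=float)
                    rates = num / den                                            # facility-group rates
                    within = rates * (1 - rates) / den                           # binomial sampling variance
                    between = max(np.var(rates, ddof=1) - within.mean(), 0.0)    # crude between-entity variance
                    return between / (between + within)                          # one SNR per entity

                # Six hypothetical facility groups (counts are illustrative only)
                snrs = snr_reliability([850, 910, 760, 930, 890, 840], [920, 1000, 900, 1000, 950, 930])
                ```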

                4.2.3 Reliability Testing Results

                Patient-level Data Element Reliability Percentage Agreement and Kappa: From the stratified random sample of 100 patients, 100 were included in the denominator and 80 were included in the numerator. The percentage agreements between the gold-standard manual chart review abstractions and the eCQM automated data extractions were 100% with Kappas of 1.0 for each level of analysis (i.e., denominator data elements and numerator data elements).

                Accountable Entity-level Reliability Signal-to-Noise Analysis: The SNRs are provided in Table 5 (in attachment "4700e_ReliabilityTestingResults.docx"). Overall, the median SNR was 0.996 (95% CI: 0.988, 0.998) for the six hospital-affiliated facility groups at Health System 1. The minimum SNR was 0.914 and the maximum SNR was 0.999. The SNRs were high across all years from 2018 to 2023. The median SNR for 2023, which is most reflective of current performance, was 0.997 (95% CI: 0.991, 0.999) for the six hospital-affiliated facility groups. The minimum SNR was 0.989 and the maximum SNR was 0.998 in 2023.

                Please note that it was not possible to provide the performance scores by decile. This is a new eCQM that has been tested at two health systems to date, with a total of 1 integrated delivery system eCQM performance rate reported (Health System 1) and 7 facility-group eCQM performance rates reported (Health Systems 1 and 2).

                4.2.3a Attach Additional Reliability Testing Results
                4.2.4 Interpretation of Reliability Results

                Patient-level Data Element Reliability Percentage Agreement and Kappa: The 100% agreements and Kappas of 1.0 demonstrated excellent reliability between the gold-standard manual chart review abstractions and the eCQM automated data extractions. These results indicate that the eCQM reliably abstracted data to define the numerator and denominator populations.

                Accountable Entity-level Reliability Signal-to-Noise Analysis: The >0.97 median SNRs with narrow 95% confidence intervals, overall and across all years, indicate that a very high proportion of overall variability is explained by the differences between measured entities (i.e., hospital-affiliated facility groups).

              • 4.3.3 Method(s) of Validity Testing

                Patient-level Data Element Validity Percentage Agreement and Positive Predictive Value: Chart reviews were conducted on a stratified random sample of 100 patients to assess whether the eCQM appropriately allocated patients into the numerator or denominator only to calculate the eCQM rates. Given the high overall eCQM rate at Health System 1, patients that did not meet the numerator criteria as assigned by the eCQM were oversampled; 80 patients that met the numerator criteria and 20 patients that did not meet the numerator criteria were selected for chart review. Manual chart review was considered the gold standard. Chart reviewers were blinded to the eCQM automated allocations and reviewed the full chart to assess whether each patient should be included in the numerator or denominator only. Percentage agreement was calculated between the gold-standard manual chart review allocations and the eCQM automated allocations. The Positive Predictive Value (PPV) of the denominator was also calculated to quantify the proportion of patients included in the denominator that required immediate follow-up with diagnostic breast imaging or breast biopsy. Health Systems 2 and 3 are in the process of conducting chart reviews.

                Accountable Entity-level Face Validity: The objective of face validity testing was to demonstrate that this measure would be meaningful and beneficial to providers, patients, and informatics professionals, from the perspective of experts in the field. As a part of the validity testing process, we provided the Technical Expert Panel (TEP) with several opportunities throughout the measure development process to suggest improvements and refinements to the measure. The TEP consisted of six members, representing the patient experience and expertise in medicine, measure development, quality and safety of care, cancer screening, health services research, and EHRs. During a July 2024 meeting, the TEP was presented with final measure specifications and initial rate calculations at the integrated health system level and the hospital (i.e., hospital-affiliated facility group level). The TEP also had an opportunity to discuss questions and provide feedback to the measure development team at this time. A formal face validity vote was conducted using the polling function in Zoom. The TEP was asked to agree (vote YES) or disagree (vote NO) on the following two statements:

                1. The Timely Follow-up on Abnormal Screening Mammography for Breast Cancer Detection – eCQM, as specified at the integrated health system level, can be used to distinguish good from poor quality care.
                2. The Timely Follow-up on Abnormal Screening Mammography for Breast Cancer Detection – eCQM, as specified at the hospital level, can be used to distinguish good from poor quality care.

                TEP members were blinded to individual member responses but were told the final face validity vote results after eligible members had voted. 

                Accountable Entity-level Spearman’s Rank Correlation Coefficients and Intraclass Correlation Coefficients: A random half split correlation was conducted at the hospital-affiliated facility group level at Health System 1, with 6 facility groups included in the analysis. To perform a random half split correlation analysis, we required a minimum of 20 patients for each facility group per year (10 patients in each split sample). Patients were randomly split, within facility group and year, into a test group and a validation group, with ~50% of patients in each group. The descriptive statistics and p-values for each group were calculated. Spearman’s rank correlation coefficients and Intraclass Correlation Coefficients (ICC) were calculated with 95% confidence intervals. The ICCs were calculated to describe how much variation in the facility group level scores was due to facility group level signal variation. The Spearman’s rank correlation coefficients were calculated to compare the relative rankings of facility groups in the test and validation samples. The Spearman’s rank correlation coefficients were reported overall and by year, since the measure is intended to be reported annually. These analyses were only conducted at the facility group level given that this is a new eCQM and only one performance rate was available at the integrated delivery system level.
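                A minimal sketch of the split-half Spearman portion of this analysis (assuming a patient-level DataFrame with hypothetical columns facility_group and numerator as a 0/1 flag; the ICC would additionally require a variance-components or mixed model and is omitted here):

                ```python
                import numpy as np
                import pandas as pd
                from scipy.stats import spearmanr

                def split_half_spearman(patients: pd.DataFrame, seed: int = 0):
                    """Randomly assign each patient to a test or validation half, compute each half's
                    timely-resolution rate per facility group, and correlate the group rankings."""
                    rng = np.random.default_rng(seed)
                    df = patients.assign(half=rng.integers(0, 2, size=len(patients)))
                    rates = (df.groupby(["facility_group", "half"])["numerator"].mean()
                               .unstack("half"))                  # columns 0 (test) and 1 (validation)
                    rho, p_value = spearmanr(rates[0], rates[1])
                    return rho, p_value
                ```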

                4.3.4 Validity Testing Results

                Patient-level Data Element Validity Percentage Agreement and Positive Predictive Value: From the stratified random sample of 100 patients, 20 were included in the denominator only and 80 were included in the numerator. The percentage agreement between the gold-standard manual chart review allocations and the eCQM automated allocations was 99%. The PPV of the denominator was also 99%. The discrepancy occurred due to the inclusion of one patient with a BI-RADS of 0T in the denominator, indicating that a technical repeat of the screening mammogram was required. The repeat was timely and resulted in a BI-RADS 1 rating that did not require diagnostic follow-up.

                Accountable Entity-level Face Validity: At the July 2024 TEP meeting, members were asked to agree (vote YES) or disagree (vote NO) on the following two statements:

                1. The Timely Follow-up on Abnormal Screening Mammography for Breast Cancer Detection – eCQM, as specified at the integrated health system level, can be used to distinguish good from poor quality care.
                2. The Timely Follow-up on Abnormal Screening Mammography for Breast Cancer Detection – eCQM, as specified at the hospital level, can be used to distinguish good from poor quality care.

                The final vote for #1 was 6/6 members (100%) in agreement with the statement at the integrated health system level. The final vote for #2 was 5/6 members (83.3%) in agreement with the statement at the hospital level. Additional data to address feedback and concerns were provided for review to the two TEP members who did not initially agree with the statement. The data showed that 89% to 99% of patients received both screening and follow-up within the same hospital-affiliated facility group for the 6 facility groups in 2023. One TEP member changed their vote to agree with the statement; no response has yet been received from the other TEP member.

                Accountable Entity-level Spearman’s Rank Correlation Coefficients and Intraclass Correlation Coefficients: The six facility groups from Health System 1 were included for years 2018 to 2023. In total, 38,290 patients were included in the test sample, and 38,308 patients were included in the validation sample (Tables 6a-g in attachment "4700e_ValidityTestingResults.docx"). The eCQM rate of timely diagnostic resolution was 91.9% and 91.7% in the test and validation samples, respectively. P-values were calculated for patient-level demographic characteristics; there was one significant difference (p-value: 0.046) between the test and validation samples on ethnic group overall, due to Facility Group 6. The overall Spearman’s rank correlation coefficient was 1.00 (95% CI: 0.99, 0.99) (Table 7 in attachment "4700e_ValidityTestingResults.docx"). There was no apparent trend over time. The Spearman’s rank correlation coefficient for 2023, which is most reflective of current performance, was 0.94 (95% CI: 0.49, 0.99). The overall ICC was 0.037 (95% CI: 0.013, 0.302) in the test sample and 0.066 (95% CI: 0.027, 0.306) in the validation sample (Table 8 in attachment "4700e_ValidityTestingResults.docx"). There were no apparent trends over time.

                4.3.4a Attach Additional Validity Testing Results
                4.3.5 Interpretation of Validity Results

                Patient-level Data Element Validity Percentage Agreement and Positive Predictive Value: The 99% agreement and PPV of 99% demonstrated strong validity of the eCQM automated allocations and ability to calculate accurate eCQM rates.

                Accountable Entity-level Face Validity: Face validity was established by a panel of experts who agreed that the measure can be used to distinguish good from poor quality care at the integrated health system level. The majority of TEP members agreed that the measure can be used to distinguish good from poor quality care at the hospital (i.e., facility group) level. Additional data has been shared with TEP members to address feedback and concerns. No responses have been received yet.

                Accountable Entity-level Spearman’s Rank Correlation Coefficients and Intraclass Correlation Coefficients: The overall Spearman’s rank correlation coefficient of 1.00 (95% CI: 0.99, 0.99) indicated a very strong positive correlation between the test and validation samples. However, the confidence intervals for the correlations by year were wide, given that only six facility groups were included in the analysis. Additional facility group data are required to generate narrower confidence intervals. The overall ICCs were low at 0.037 (95% CI: 0.013, 0.302) in the test sample and 0.066 (95% CI: 0.027, 0.306) in the validation sample, indicating that a low proportion of variation in the facility group level scores was due to facility group level signal variation. Notably, the 95% CIs were very wide given that only six facility groups were included in the analysis. Additional facility group data are required to generate narrower confidence intervals.

              • 4.4.1 Methods used to address risk factors
                Risk adjustment approach
                Off
                Conceptual model for risk adjustment
                Off
                • 5.1 Contributions Towards Advancing Health Equity

                  The eCQM performance rates were calculated stratified by age at index abnormal screening mammogram, race, ethnic group, primary insurance at index abnormal screening mammogram, and primary language (Table 14 in supplemental attachment "4700e_SupplementalInformation.docx"). A model was used to calculate the rates and 95% CIs while clustering by facility group. P-values were used to assess for significant differences in rates.
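                  The specific model is not named here; one plausible approach (a sketch only, assuming a GEE logistic model clustered on facility group with hypothetical column names and synthetic data) is shown below.

                  ```python
                  import numpy as np
                  import pandas as pd
                  import statsmodels.api as sm
                  import statsmodels.formula.api as smf

                  # Synthetic patient-level data: 0/1 'timely' outcome, a stratification variable
                  # ('race'), and 'facility_group' as the clustering unit (all names hypothetical).
                  rng = np.random.default_rng(0)
                  df = pd.DataFrame({
                      "timely": rng.integers(0, 2, size=600),
                      "race": rng.choice(["A", "B", "C"], size=600),
                      "facility_group": rng.integers(1, 7, size=600),
                  })

                  model = smf.gee("timely ~ C(race)", groups="facility_group", data=df,
                                  family=sm.families.Binomial(),
                                  cov_struct=sm.cov_struct.Exchangeable())
                  result = model.fit()
                  print(result.summary())   # coefficients with standard errors clustered on facility group
                  ```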

                  There were significant differences by race (p = 0.0004) and primary language (p = 0.005). Although all rates were above 85%, White and English-speaking patients were significantly more likely to reach diagnostic resolution within 60 days after an abnormal screening mammogram. These findings align with disparities reported in the published literature.

                  Similar analyses were conducted for each of the 6 facility groups (Table 15 in supplemental attachment "4700e_SupplementalInformation.docx"). There were no significant differences, likely due to the smaller sample sizes.

                  Additional analyses stratified by year are planned, particularly to investigate disparities in 2023, when facility group eCQM rates were considerably lower.

                  • 6.2.1 Actions of Measured Entities to Improve Performance

                    Rates of timely diagnostic resolution after abnormal mammographic screening tend to improve in health care facilities with long periods of enrollment in comprehensive quality improvement initiatives with clearly outlined performance benchmarks [1]. However, given the absence of standard policies defining the timeliness of diagnostic resolution and requirements for reporting mammography performance, not all facilities that perform mammographic screening keep track of data for follow-up after screening [1, 2]. Additionally, not all facilities are equally equipped to track this data since this would require staffing resources, access to interoperable data collection systems (e.g., interoperable electronic health records [EHRs]), and tools to measure the data (i.e., eCQMs) [1, 2, 3]. 

                    Measured entities responsible for implementing quality measures must consider these limitations when developing mandates for facilities to report measurement data. Entities must articulate clear performance standards around the timeliness of diagnostic resolution and conduct staff education on data collection systems to allow for measure reporting at regular intervals [1, 2]. Entities should also encourage better interoperability standards for EHRs to ensure that data can be shared between different health care systems, supporting better tracking of patient medical histories and reducing missing data [4, 5]. Interoperability improves data quality, which allows for more meaningful measurement of mammography data, especially when using eCQMs [5]. These data can then be used more reliably to inform targeted interventions, such as patient navigation and case management, EHR-based trigger algorithms to identify screening-eligible patients needing follow-up (sketched below), and patient education and reminders, in underperforming facilities and in sites looking to meet performance benchmarks for timely diagnostic resolution [7, 8, 9].
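                    To make the trigger-algorithm idea concrete, the sketch below flags patients whose index abnormal screening mammogram (BI-RADS 0, 4, or 5) has no qualifying follow-up within 60 days. It is a minimal illustration only; the tables, column names, and event classifications are hypothetical and would need to be mapped to a site’s actual EHR data model and the eCQM value sets.

```python
from datetime import timedelta
import pandas as pd

# Hypothetical EHR extracts; table and column names are illustrative only.
screenings = pd.DataFrame({
    "patient_id": [1, 2, 3],
    "exam_date": pd.to_datetime(["2023-01-10", "2023-02-05", "2023-03-01"]),
    "birads": ["0", "4", "5"],
})
followups = pd.DataFrame({
    "patient_id": [1, 3],
    "event_date": pd.to_datetime(["2023-02-20", "2023-03-20"]),
    "event_type": ["diagnostic_mammogram_birads_2", "core_needle_biopsy"],
})

# Index abnormal screening per patient: first BI-RADS 0/4/5 exam in the period
index_exams = (screenings[screenings["birads"].isin(["0", "4", "5"])]
               .sort_values("exam_date")
               .groupby("patient_id", as_index=False)
               .first())

# Keep follow-up events occurring 0-60 days after the index exam
merged = index_exams.merge(followups, on="patient_id", how="left")
within_60 = (merged["event_date"].notna()
             & (merged["event_date"] >= merged["exam_date"])
             & (merged["event_date"] - merged["exam_date"] <= timedelta(days=60)))

# Trigger list: patients with no qualifying follow-up within 60 days
resolved = merged.loc[within_60, "patient_id"].unique()
needs_outreach = index_exams[~index_exams["patient_id"].isin(resolved)]
print(needs_outreach[["patient_id", "exam_date", "birads"]])
```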

                    1. Rauscher GH, Murphy AM, Orsi JM, Dupuy DM, Grabler PM, Weldon CB. Beyond the mammography quality standards act: measuring the quality of breast cancer screening programs. AJR Am J Roentgenol. 2014;202(1):145-151. doi:10.2214/AJR.13.10806. PMID: 24261339.
                    2. Rauscher GH, Tossas-Milligan K, Macarol T, Grabler PM, Murphy AM. Trends in Attaining Mammography Quality Benchmarks With Repeated Participation in a Quality Measurement Program: Going Beyond the Mammography Quality Standards Act to Address Breast Cancer Disparities. J Am Coll Radiol. 2020;17(11):1420-1428. doi:10.1016/j.jacr.2020.07.019. PMID: 32771493.
                    3. Murphy DR, Meyer AND, Vaghani V, et al. Electronic Triggers to Identify Delays in Follow-Up of Mammography: Harnessing the Power of Big Data in Health Care. J Am Coll Radiol. 2018;15(2):287-295. doi:10.1016/j.jacr.2017.10.001. PMID: 29102539.
                    4. Oluyemi ET, Grimm LJ, Goldman L, et al. Rate and Timeliness of Diagnostic Evaluation and Biopsy After Recall From Screening Mammography in the National Mammography Database. J Am Coll Radiol. 2024;21(3):427-438. doi:10.1016/j.jacr.2023.09.002. PMID: 37722468.
                    5. Kaushal R. The Role of Health Information Technology in Improving Quality and Safety in Pediatric Health Care. Agency for Healthcare Research and Quality; 2012. Available from: https://digital.ahrq.gov/sites/default/files/docs/page/final-kaushal-story-7-6-12.pdf. Accessed October 31, 2024.
                    6. Reece JC, Neal EFG, Nguyen P, McIntosh JG, Emery JD. Delayed or failure to follow-up abnormal breast cancer screening mammograms in primary care: a systematic review. BMC Cancer. 2021;21(1):373. Published 2021 Apr 7. doi:10.1186/s12885-021-08100-3. PMID: 33827476.
                    7. Haas JS, Atlas SJ, Wright A, et al. Multilevel Follow-up of Cancer Screening (mFOCUS): Protocol for a multilevel intervention to improve the follow-up of abnormal cancer screening test results. Contemp Clin Trials. 2021;109:106533. doi:10.1016/j.cct.2021.106533. PMID: 34375748.
                    8. Atlas SJ, Tosteson ANA, Wright A, et al. A Multilevel Primary Care Intervention to Improve Follow-Up of Overdue Abnormal Cancer Screening Test Results: A Cluster Randomized Clinical Trial. JAMA. 2023;330(14):1348-1358. doi:10.1001/jama.2023.18755. PMID: 37815566.
                    • Submitted by Koryn Rubin (not verified) on Tue, 12/10/2024 - 14:54

                      The American Medical Association appreciates the opportunity to comment on this measure addressing timely follow-up after an abnormal screening result and supports its intent. However, we have several concerns and ask that the committee consider them during their review. 


                      Specifically, current evaluation criteria require developers to complete validity testing on at least two electronic health record (EHR) vendor systems, and it appears that only one system was used for the data element validity analyses. In addition, because the BI-RADS data are not always documented in a discrete field, the developers used a string search to extract these data; however, we were unable to determine whether they adequately tested this search term to ensure that its use is reliable and valid.


                      In addition, there are potential factors outside the control of the facility, such as current workforce shortages and the possibility that patients receive their follow-up at another facility. These apparent measure failures reflect factors outside a given facility's control, and we believe that this measure does not currently adequately address those scenarios.

                      Organization
                      American Medical Association

                      Thank you very much for taking the time to review the measure submission and provide comments.

                      For validity testing using data from other EHRs, we are in the process of conducting formal chart reviews at Health System 2 (Cerner, now Oracle Health) and Health System 3 (Allscripts), which are located in other regions of the U.S. and serve different patient populations. We used an iterative process to validate the data extractions for calculation of the eCQM performance rates at these two health systems; however, we have not yet received the quantitative validity testing results.

                      Regarding the performance of the BI-RADS string search algorithm, we provided a qualitative assessment in the measure submission based on the iterative development and testing of the algorithm at Health System 1, stating that "this approach demonstrated near perfect performance." The algorithm was required to extract BI-RADS results for one facility group at Health System 1 that began using structured fields for the last overall assessments of breast imaging results only in 2023. We have now also conducted a quantitative assessment of the algorithm's performance: based on manual chart reviews of a random sample of 100 breast imaging reports, the algorithm showed 100% accuracy in extracting the BI-RADS category for the last overall assessment. This string search approach was also applied by Health System 3 to extract data from a legacy system, and we are in the process of conducting formal chart reviews to assess algorithm performance there.
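                      The exact string search is not published in the submission. Purely as a sketch of the general technique, assuming free-text report fields, a regular expression along the following lines could capture the last overall BI-RADS assessment mentioned in a report; the pattern, function name, and example text are illustrative, not the validated algorithm.

```python
import re
from typing import Optional

# Illustrative BI-RADS string search; pattern and example are assumptions,
# not the developer's validated extraction algorithm.
BIRADS_PATTERN = re.compile(
    r"BI-?RADS(?:\s*(?:category|assessment))?\s*[:#]?\s*([0-6])",
    flags=re.IGNORECASE,
)

def last_overall_birads(report_text: str) -> Optional[str]:
    """Return the last BI-RADS category mentioned in a free-text report,
    treating it as the overall assessment; None if no category is found."""
    matches = BIRADS_PATTERN.findall(report_text)
    return matches[-1] if matches else None

example = ("FINDINGS: Focal asymmetry in the left breast. "
           "IMPRESSION: Additional imaging recommended. BI-RADS Category 0.")
print(last_overall_birads(example))  # -> '0'
```

                      In practice, any such pattern would need to be validated against manual chart review, as described above, before being used for measure calculation.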

                      We do not have evidence from the literature to quantify the frequency of out-of-system diagnostic resolution for breast cancer screening. However, these rates are likely low given the high performance of all three health systems on this eCQM. Data from Health Systems 1 and 2 included in the measure submission show that it is feasible to complete this process within 60 days, with performance rates around the 90% benchmark. We also recently received data from Health System 3, which showed an overall rate of 90.5% (95% CI: 89.8%, 91.1%) for one integrated delivery system over eight years.

                      Further analyses of Health System 1 data showed that, in 2023, 89% to 99% of patients reaching diagnostic resolution received both screening and diagnostic resolution within the same facility group. Given that this eCQM assesses the capacity of integrated delivery systems and hospital-affiliated facility groups to complete timely diagnostic resolution, we applied a benchmark of 90%, which accommodates out-of-system/out-of-facility diagnostic resolution in the performance gap assessment.

                      Organization
                      Brigham and Women's Hospital
                    • Importance

                      Importance Rating
                      Importance

                      Strengths:

                      • The developer cites evidence that mammographic screening has significantly reduced breast cancer mortality since the 1990s, with early detection also potentially reducing treatment costs by 30-100%. Delays in diagnostic follow-ups after abnormal screenings can worsen outcomes, particularly for racial and ethnic minorities and low-income groups, who are more likely to experience delayed follow-ups and higher mortality rates.
                      • The Centers for Disease Control and Prevention (CDC) National Breast and Cervical Cancer Early Detection Program (NBCCEDP) recommends that breast cancer screening to diagnostic resolution should occur within 60 days.
                      • Studies indicate that only a minority of breast imaging facilities meet benchmarks for timely follow-up imaging and biopsy after abnormal screenings. However, cited evidence indicates that participation in quality measurement programs improves these metrics, especially in facilities not designated as Breast Imaging Centers of Excellence.
                      • This new eCQM process measure is intended to address quality assessment gaps by helping facilities monitor timeliness and completeness of care in breast cancer screening diagnostic processes. The developer states that this measure will help facilities to conduct routine quality assessments and guide improvement initiatives to promote health equity and timeliness in breast cancer screening and diagnostics.
                      • Studies have shown that regular quality assessments and interventions such as patient navigation, EHR reminders, and patient education can enhance follow-up rates.
                      • The developer provides an assessment of similar, current quality measures, which include the Breast Cancer Screening rate for women aged 50-74, the Recall Rates for follow-up imaging or tests within 45 days after a screening, and the Follow-Up after Abnormal Breast Cancer Assessment, which tracks follow-up actions within 90 days for inconclusive or high-risk assessments.
                      • The developer posits that while these measures address important aspects of the breast cancer screening process, none comprehensively assess the entire pathway from an abnormal screening mammogram to the final diagnostic resolution, highlighting a gap in the current quality measurement framework.
                      • The developer provided performance gap data at the integrated delivery system level and the facility level. The developer notes that, as a new eCQM, it has been tested at two health systems to date.
                      • For Health System 1, the overall eCQM performance rate at the integrated delivery system level dropped to 84.4% in 2023, falling below the 90% benchmark. At the facility level within the same system, significant decreases were noted in three groups, with rates dropping to 62.0%, 75.2%, and 85.1%, respectively.
                      • Health System 2 showed consistent eCQM performance rates that met the 90% benchmark from 2020 onwards at one hospital-affiliated facility. However, preliminary results for two additional facilities in the same system revealed significant deviations from the 90% benchmark, highlighting opportunities for improvement.
                      • The developer conducted three provider interviews and one pilot patient interview to date; more interviews are underway, with a target of 5-10 provider and 5-10 patient interviews. Feedback was also obtained from a Technical Expert Panel (TEP) and through a public comment period. The pilot patient interview highlighted significant wait times and travel difficulties between screening steps, contributing to patient anxiety and underscoring the importance of timely follow-up, which the patient believes should be reflected in the eCQM to provide transparency about their care experience.

                      Limitations:

                      • The developer identifies that performance reporting by facilities may be affected by the quality of data collection and the resources available for data management. The developer cites evidence suggesting that longer participation in quality measurement programs helps facilities meet mammography quality benchmarks for timely follow-up. The developer further posits that facilities can perform better if tools, such as this eCQM, are made available to them. However, the developer’s logic model does not clearly depict what inputs and activities are needed to report and improve on this metric. This information is provided in more detail in section 6.2.1 under Use and Usability, but it is not captured in the logic model.

                      Rationale: 

                      • This eCQM process measure for breast cancer screening and diagnostics is designed to fill existing gaps in quality assessment by enabling facilities to monitor the timeliness and completeness of care. Evidence supports that mammographic screening has significantly reduced breast cancer mortality since the 1990s and that early detection can substantially lower treatment costs. However, challenges remain, particularly for racial and ethnic minorities and low-income groups who are more likely to experience delays in follow-up after abnormal screenings, potentially leading to worse outcomes. This measure, alongside regular quality assessments and interventions such as patient navigation and EHR reminders, aims to enhance follow-up rates and promote health equity. Patient input has underscored the importance of this measure. The main limitation is that the developer’s logic model does not clearly depict what inputs and activities are needed to report and improve on this metric.

                      Feasibility Acceptance

                      Feasibility Rating
                      Feasibility Acceptance

                      Strengths:

                      • As an eCQM, all data elements are in electronic format.
                      • The developer reports that all required data elements for the eCQM are routinely collected across various health systems and facility groups.
                      • The measure is not proprietary and has no proprietary components.
                      • Feasibility scores varied slightly across facilities within the same health system, reflecting differences in data management practices and capabilities. Specifically, the feasibility assessment and scorecard revealed that BI-RADS data might not always be captured in structured fields, prompting the development of a string search method to extract these results from unstructured EHR fields, which showed near-perfect performance in initial applications.
                      • The developer notes that an alternative to string search is using health information technology systems like MagView, which interfaces with EHRs to extract necessary data elements for the eCQM, although this method also faced challenges in defining the target population accurately.
                      • These findings did not alter the final measure specifications.

                      Limitations:

                      • None. 

                      Rationale: 

                      • This eCQM leverages data that are routinely collected across various health systems and facility groups, all in electronic format, which facilitates the implementation of the measure. The developer identified challenges with inconsistent use of structured fields for BI-RADS. This prompted the development of a string search method to extract these results from unstructured EHR fields, which showed near-perfect performance in initial applications. These findings did not change the final measure specifications.

                      Scientific Acceptability

                      Scientific Acceptability Reliability Rating
                      Scientific Acceptability Reliability

                      Strengths:

                      • Data Sources and Dates:  Data used for testing were sourced from EHRs during the years 2018 to 2023.
                      • Patient/Encounter Level Reliability: The developer conducted inter-abstractor reliability testing at the person- or encounter-level for all critical data elements. The developer reported 100% agreement between the gold-standard manual chart review abstractions and the eCQM automated data extractions with Kappas of 1.0 for each level of analysis, which meets the expected threshold of 0.4.
                      • Accountable Entity Level Reliability: The developer conducted signal-to-noise reliability testing at the accountable entity level. 100% of accountable entities met the expected threshold of 0.6, with a minimum reliability of 0.914 across the six measurement years (2018-2023) and a minimum reliability of 0.989 in 2023. In the validity testing section, the developer conducted a random split-half correlation analysis at the hospital-affiliated facility group level in Health System 1, involving six facility groups and splitting patients into test and validation groups; this analysis should have been reported in the reliability section. The eCQM rate for timely diagnostic resolution was similar between the test (91.9%) and validation (91.7%) samples, with no significant demographic differences, and the Spearman’s rank correlation coefficient was 0.94 in 2023, indicating a strong positive correlation. However, the wide confidence intervals and low ICCs suggest that more facility group data is needed to improve the precision of these estimates (see the illustrative sketch after this list).
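                      For context on the signal-to-noise approach referenced above, the sketch below shows one simple method-of-moments formulation of reliability for a proportion-based measure; the counts are hypothetical, and the developer's exact estimation method (for example, a beta-binomial model) may differ.

```python
import numpy as np

# Hypothetical facility-group counts; not Health System 1 data.
numerators = np.array([455, 372, 598, 301, 488, 350])     # timely resolutions
denominators = np.array([500, 410, 650, 330, 540, 390])   # abnormal screenings

rates = numerators / denominators
var_within = rates * (1 - rates) / denominators            # sampling (noise) variance
var_between = max(np.var(rates, ddof=1) - var_within.mean(), 0.0)  # signal variance

# Signal-to-noise reliability per facility group: signal / (signal + noise)
reliability = var_between / (var_between + var_within)
print(np.round(reliability, 3))
```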

                      Limitations:

                      • Data Sources and Dates: The developer conducted signal-to-noise reliability testing at the accountable entity level on only six facility groups.

                      Rationale: 

                      • The results demonstrate sufficient reliability at the accountable entity level.


                      Scientific Acceptability Validity Rating
                      Scientific Acceptability Validity

                      Strengths:

                      • For patient-/episode-level (data element) validity of the numerator and denominator, the developer reports 99% agreement between the gold-standard manual chart review allocations and the eCQM automated allocations. The positive predictive value (PPV) of the denominator was also 99%. The developer also reports complete agreement (100%) with the statement that the measure might be used at the integrated health system level, and less agreement that the measure might be used at the hospital level. The developer did not conduct empirical validity testing (the ICC is considered a test of reliability).
                      • The developer conducted statistical risk adjustment, based on a conceptual model, selecting risk factors that have a significant correlation with the outcome. The developer also explored social risk factors, such as dual eligibility, and included these in the final model. The developer reported c-statistics of 0.736 and 0.753, indicating good model discrimination.

                      Limitations:

                      • As a new measure submission, the patient-/episode-level (data element) validity testing is sufficient. However, only one EHR was used for data element testing. The accountable entity-level testing, while not required, was not sufficient. For face validity, a larger TEP of at least 12 members, including patient representatives and broad representation from potential measure users, is preferred. In addition, a Likert scale with at least five response options is preferred to demonstrate consensus. The ICC should be reported under reliability, not validity. Further testing with additional sites is necessary in the future. The measure may only be valid within the same hospital-affiliated facility group.

                      Rationale: 

                      • The validity rating is based on patient-/episode-level (data element) testing only, which is acceptable for a new eCQM. The accountable entity-level testing, while not required, was not sufficient. For face validity, a larger TEP of at least 12 members, including patient representatives and broad representation from potential measure users, is preferred. In addition, a Likert scale with at least five response options is preferred to demonstrate consensus. The ICC should be reported under reliability, not validity. Further testing with additional sites and with at least two EHR vendors is necessary in the future.

                      Equity

                      Equity Rating
                      Equity

                      Strengths:

                      • The eCQM performance rates were calculated and stratified by various demographic factors such as age, race, ethnic group, primary insurance, and primary language, using a model that accounted for clustering by facility group and included significance testing.
                      • Significant disparities were observed in the rates by race and primary language, with white and English-speaking patients more likely to achieve diagnostic resolution within 60 days following an abnormal screening mammogram.
                      • While similar analyses across the six facility groups showed no significant differences due to smaller sample sizes, additional data analyses are planned for future years to further explore and address these disparities, especially considering the lower eCQM rates observed in 2023 across some facility groups.

                      Limitations:

                      • The analytic approach was not specified in the submission.
                      • The developer notes they will continue to monitor disparities for the most affected subgroups, but they do not specifically address unintended consequences or offer an interpretation of the results.

                      Rationale: 

                      • The eCQM performance rates, stratified by demographics such as age, race, ethnic group, primary insurance, and primary language, revealed significant disparities; white and English-speaking patients were more likely to achieve diagnostic resolution within 60 days after an abnormal screening mammogram. Despite no significant differences found across the six facility groups due to smaller sample sizes, further analyses are planned to address these disparities, particularly following lower performance rates observed in 2023. The developer could also provide additional information in the submission itself describing methods and exploring the interpretation of the disparities findings and how they might be used to improve health care.

                      Use and Usability

                      Use and Usability Rating
                      Use and Usability

                      Strengths:

                      • This new measure is not currently in use, but it has a planned use for public reporting and quality improvement.
                      • The developer draws attention to the limitations of tracking follow-up data after screening, due to potential resource challenges.
                      • Entities should establish and enforce standardized policies defining the timeliness of diagnostic resolution and requirements for reporting mammography performance to ensure all facilities consistently track and report follow-up data.
                      • Entities should also conduct comprehensive staff education on data collection systems and promote better interoperability standards for EHRs, facilitating seamless data sharing across different health care systems.
                      • By improving data quality and management, the developer posits that entities can more effectively use mammography data to inform and implement targeted interventions such as EHR-based trigger algorithms, patient navigation, and education programs to improve timely diagnostic resolution in underperforming facilities.

                      Limitations:

                      • None. 

                      Rationale: 

                      • The new measure, intended for public reporting and quality improvement, addresses challenges in tracking follow-up data after mammographic screenings due to resource limitations. Entities are encouraged to standardize policies for timeliness and reporting, enhance staff training on data systems, and improve EHR interoperability to ensure consistent data tracking. These steps aim to facilitate targeted interventions to improve timely diagnostic resolutions.
                    • First Name
                      Matt
                      Last Name
                      Austin

                      Submitted by Matt Austin on Mon, 01/20/2025 - 18:12

                      Importance

                      Importance Rating
                      Importance

                      Only had input from one patient on the meaningfulness of the measure. Would like to see more feedback.

                      Feasibility Acceptance

                      Feasibility Rating
                      Feasibility Acceptance

                      Agree with staff assessment.

                      Scientific Acceptability

                      Scientific Acceptability Reliability Rating
                      Scientific Acceptability Reliability

                      Agree with staff assessment.

                      Scientific Acceptability Validity Rating
                      Scientific Acceptability Validity

                      The staff assessment noted that data element validity testing should occur in 2+ EHRs, but I don't see that in the guidebook.


                      The staff assessment also noted the TEP should include 12+ members, but that is not listed in the guidebook.

                      Equity

                      Equity Rating
                      Equity

                      Optional.

                      Use and Usability

                      Use and Usability Rating
                      Use and Usability

                      Agree with staff assessment.

                      Summary

                      Comments noted.