
Breast Cancer Screening Recall Rates

CBE ID
4220
Endorsement Status
E&M Committee Rationale/Justification

Due to No Consensus.

1.1 New or Maintenance
Previous Endorsement Cycle
Is Under Review
No
1.3 Measure Description

The Breast Cancer Screening Recall Rates measure calculates the percentage of beneficiaries with mammography or digital breast tomosynthesis (DBT) screening studies that are followed by a diagnostic mammography, DBT, ultrasound, or magnetic resonance imaging (MRI) of the breast in an outpatient or office setting within 45 days.

        • 1.5 Measure Type
          1.6 Composite Measure
          No
          1.7 Electronic Clinical Quality Measure (eCQM)
          1.8 Level Of Analysis
          1.9 Care Setting
          1.10 Measure Rationale

          The Breast Cancer Screening Recall Rates measure calculates the percentage of beneficiaries with mammography or digital breast tomosynthesis (DBT) screening studies that are followed by a diagnostic mammography, DBT, ultrasound, or magnetic resonance imaging (MRI) of the breast in an outpatient or office setting on the same day or within 45 days.

          1.20 Testing Data Sources
          1.25 Data Sources

          CBE #4220e is calculated using data from final claims that facilities submit for Medicare beneficiaries enrolled in fee-for-service (FFS) Medicare. The data are calculated only for facilities paid through the Outpatient Prospective Payment System (OPPS) for mammography and DBT screening in the hospital outpatient setting. Data are pulled from the hospital outpatient and carrier files to identify eligible cases for inclusion in the initial patient population and numerator (e.g., a mammography follow-up study can occur in any location and be included in the measure’s numerator). Due to claims adjudication, there is a lag between when an imaging study is performed and when it is reported on the public reporting website.

        • 1.14 Numerator

          Medicare beneficiaries who had a diagnostic mammography study, DBT, ultrasound, or MRI of the breast following a screening mammography or DBT study on the same day or within 45 days of the screening study in any location.

          1.14a Numerator Details

          CBE #4220e calculates the percentage of mammography and digital breast tomosynthesis (DBT) screening studies that are followed by a diagnostic mammography, DBT, ultrasound, or magnetic resonance imaging (MRI) of the breast in an outpatient or office setting within 45 days. The measure’s denominator contains any Medicare beneficiary who underwent a screening mammography or DBT study at a facility subject to OPPS regulation during the measurement period. From these beneficiaries, the numerator contains beneficiaries who had a diagnostic mammography study, DBT, ultrasound, or MRI of the breast following a screening mammography or DBT study within 45 days. 

          The Current Procedural Terminology (CPT) and Healthcare Common Procedure Coding System (HCPCS) codes used to identify beneficiaries with a diagnostic mammography study, DBT, ultrasound or MRI can be found in the submitted Excel file.

        • 1.15 Denominator

          Medicare beneficiaries who underwent a screening mammography or DBT study at a facility reimbursed through the Outpatient Prospective Payment System (OPPS).

          1.15a Denominator Details

          The CBE #4220e denominator contains any Medicare beneficiary who underwent a screening mammography or screening DBT study performed at a facility subject to Outpatient Prospective Payment System (OPPS) regulation during the measurement period. The CPT and HCPCS codes used to identify beneficiaries who underwent a screening mammography or screening DBT can be found in the submitted Excel file.

        • 1.15b Denominator Exclusions

          None.

          1.15c Denominator Exclusions Details

          None.

        • OLD 1.12 MAT output not attached
          Not attached
          1.13 Attach Data Dictionary
          1.13a Data dictionary not attached
          Yes
          1.16 Type of Score
          1.17 Measure Score Interpretation
          Better quality = Score within a defined interval
          1.18 Calculation of Measure Score

          Please see attached measure score calculation diagram within the attachment under the 'Measure Score Calculation Diagram' question.
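
          As an illustration of the calculation referenced above, the following is a minimal sketch of the recall-rate logic in Python. The records, field names, and same-day-or-45-day window check are simplified assumptions for illustration only; the production measure uses the CPT/HCPCS code lists in the submitted Excel file and OPPS claims data, per the attached calculation diagram.

```python
from datetime import date, timedelta

# Hypothetical, simplified records for illustration only (not CMS claims data).
screenings = [  # (beneficiary_id, screening_date) at an OPPS facility
    ("A", date(2021, 7, 10)),
    ("B", date(2021, 8, 2)),
    ("C", date(2021, 9, 15)),
]
diagnostics = [  # (beneficiary_id, study_date) for diagnostic mammography,
    ("A", date(2021, 7, 30)),   # DBT, ultrasound, or breast MRI in any setting
    ("C", date(2021, 12, 1)),   # more than 45 days later, so not counted
]

def recall_rate(screenings, diagnostics, window_days=45):
    """Share of screening studies followed by a diagnostic breast study on the
    same day or within window_days (a sketch of the measure logic)."""
    followed = 0
    for beneficiary, screen_date in screenings:
        if any(
            b == beneficiary
            and timedelta(0) <= d - screen_date <= timedelta(days=window_days)
            for b, d in diagnostics
        ):
            followed += 1
    return followed / len(screenings)

print(f"{recall_rate(screenings, diagnostics):.1%}")  # 33.3% for this toy data
```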

          1.18a Attach measure score calculation diagram, if applicable
          1.19 Measure Stratification Details

          Not applicable—CBE #4220e is not stratified.

          1.26 Minimum Sample Size

          CBE #4220e uses a relative precision model to determine the minimum necessary number of cases; similar approaches are used for three other Outpatient Imaging Efficiency measures. Under this model, the minimum case count is determined by the acceptable level of precision and the level of confidence required for the measure. The required precision depends on the facility’s observed performance rate: stricter precision is required for scores near the tail ends of the possible range of the measure score (e.g., 0.05 or 0.95), whereas scores toward the middle of the possible range (e.g., 0.50) do not require as strict a level of precision. The level of significance is 0.10. Thus, the minimum case counts (see Table 1 within the attachment under the 'Logic Model' question) ensure 90 percent confidence that the observed score reflects the true score. Facilities would need at least 31 cases to qualify for public reporting; this number can vary from 31 to 67, depending on a facility’s performance rate.
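
          The following is a minimal sketch of how a precision requirement and a confidence level translate into a minimum case count under a normal approximation to the binomial. The precision value shown is hypothetical; the actual thresholds used for CBE #4220e (31 to 67 cases) come from Table 1 in the attachment.

```python
from math import ceil

def min_case_count(expected_rate, precision, z=1.645):
    """Smallest n such that a 90% confidence interval around an observed rate
    has half-width no greater than `precision` (normal approximation).
    z = 1.645 is the two-sided critical value for 90% confidence."""
    p = expected_rate
    return ceil(z ** 2 * p * (1 - p) / precision ** 2)

# Hypothetical example: a recall rate near the national mean (8.5 percent)
# measured to within +/- 7 percentage points at 90 percent confidence.
print(min_case_count(0.085, 0.07))  # -> 43 cases
```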

        • Most Recent Endorsement Activity
          Initial Endorsement and Maintenance Fall 2023
          • 2.1 Attach Logic Model
            2.2 Evidence of Measure Importance

            From the perspective of both clinical quality and efficiency, there are potentially negative consequences if the mammography and DBT recall rate is either too high or too low. A high cumulative dose of low-energy radiation can be a consequence of too many false-positive mammography and DBT follow-up studies. Radiation received from mammography or DBT may induce more cancers in younger people or those carrying deleterious gene mutations, such as BRCA-1 and BRCA-2 (Berrington de Gonzalez et al., 2009). 

            Societies and guidelines offer inconsistent recommendations on appropriate recall rates for breast cancer screening. The American College of Radiology (ACR) recommends a target recall rate for mammography screening between 5 percent and 12 percent (American College of Radiology, 2013); in Europe, the International Agency for Research on Cancer sets a target recall rate of 5 percent.

            References:

            Berrington de Gonzalez, A., Berg, C., Visvanathan, K., & Robson, M. (2009). Estimated risk of radiation-induced breast cancer from mammographic screening for young BRCA mutation carriers. JNCI: Journal of the National Cancer Institute, 101(3), 205–209. https://doi.org/10.1093/jnci/djn440

            D’Orsi, C. J., Sickles, E. A., Mendelson, E. B., Morris, E. A., et al. (2013). ACR BI-RADS® atlas, breast imaging reporting and data system. Reston, VA: American College of Radiology.

          • 2.3 Anticipated Impact

            This measure will guide breast cancer screening decision-making in hospital outpatient departments, as there are potentially negative consequences if the mammography and DBT recall rate is either too high or too low.

            The measure may reduce radiation received from mammography or DBT, which can induce more cancers in younger people or those carrying deleterious gene mutations, as well as decrease unnecessary imaging and biopsies. Conversely, underuse of follow-up after screening mammography or DBT may result in missed cases of cancer.

            CMS calculates performance for its Outpatient Imaging Efficiency measures using data from final claims that facilities submit for Medicare beneficiaries enrolled in FFS Medicare. The data are calculated only for facilities paid through the OPPS for mammography and DBT screening studies in the hospital outpatient setting. Data from the hospital outpatient and carrier files are used to determine beneficiary inclusion (e.g., a mammography follow-up study can occur in any location and be included in the measure’s numerator).

            Results reported are for the public reporting period based on data collected from July 1, 2021, through June 30, 2022 (referred to as 2023 public reporting or PR 2023). In PR 2023, 3,652 facilities had at least 1 eligible case in the measure denominator. A total of 3,391 facilities met the minimum case count requirement, making them eligible for public reporting.

            The analysis of the performance gap is presented in Table 2 and Table 3, within the attachment under the 'Logic Model' question. Table 2 presents the distribution of performance scores and denominator counts for facilities meeting MCC and for all facilities with at least one case in the denominator. Table 3 presents measure performance scores by patient biological sex, racial or ethnic identity, age group, and dual eligibility status, including chi-square values and probabilities used to assess whether differences in performance are statistically significant. For these analyses, only cases from facilities meeting minimum case count requirements for public reporting were used.

            Table 2 shows the mean measure performance for facilities meeting MCC (8.5 percent, standard deviation [S.D.] 6.7 percent) falls within the targeted recall rate range of 5 percent to 12 percent; however, analysis of performance across deciles demonstrates variability across facilities during the measurement period, with more than 30 percent (33.4 percent) of facilities having scores outside the targeted recall rate range. Scores for all eligible facilities (i.e., those with at least one case in the denominator) include an additional 261 facilities and 7,475 patients; these facilities display a similar distribution with a slightly higher mean performance (8.9 percent, S.D. 8.7 percent).

            Performance by patient characteristics, displayed in Table 3, shows statistically significant differences in performance by biological sex, racial or ethnic identity, age band, and dual eligibility status. Care should be taken in interpreting these results because some categories make up a small percentage of the total for each characteristic. For example, only 0.01 percent of patients in the measure sample are male (as would be expected, given the clinical scope of the measure), although the chi-square probability (<0.0001) indicates the difference in performance (24.3 percent for males, 9.2 percent for females) is significant. Racial identity yields a similar chi-square probability (<0.0001), with white patients making up the majority of cases (86.4 percent of the total initial patient population, with a performance rate of 9.2 percent), followed by Black patients (7.6 percent of the initial patient population, with a performance rate of 8.5 percent). The next largest category is unknown race, comprising 2.1 percent of the initial patient population, with performance at 10.7 percent. While comprising a small percentage of the initial patient population, performance scores for patients of other race (9.8 percent), Asian or Pacific Islander patients (10.0 percent), and American Indian or Alaska Native patients (6.7 percent) show significant variation between race categories. Similarly, patients of Hispanic or Latino ethnicity (9.4 percent) also differ substantially from non-Hispanic or non-Latino populations.

            Age band categories show a consistent trend of lower scores as age increases, ranging from 17.6 percent for patients aged 18 to 34 to 8.2 percent for those aged 85 or older. Younger patients make up a small percentage of the overall testing population, with the categories covering ages 18 to 54 comprising about 2.5 percent of the initial patient population. Those aged 55 to 64 make up 4.9 percent of the initial patient population, with a performance score of 9.3 percent. Patients aged 65 or older make up 92.6 percent of the initial patient population, with scores ranging from 9.3 percent (ages 65 to 74) to 8.2 percent (ages 85 or older).

            Finally, performance by dual eligibility was examined, with 92.6 percent of the initial patient population having only Medicare FFS coverage and the remaining 7.4 percent enrolled in both Medicare FFS and Medicaid (dually eligible). The difference in performance was slight (9.2 percent for Medicare only versus 9.3 percent for dually eligible patients) but significant at the 0.05 level (p=0.0161).
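
            As a sketch of the kind of chi-square comparison summarized in Table 3, the snippet below tests whether recall rates differ by dual-eligibility status. The counts are hypothetical values chosen only to be roughly consistent with the rates reported above; the real counts are in the attachment.

```python
import numpy as np
from scipy.stats import chi2_contingency

# Hypothetical 2x2 table: rows = coverage group, columns = recalled / not recalled.
table = np.array([
    [282_000, 2_788_000],   # Medicare FFS only
    [ 22_600,   220_400],   # dually eligible (Medicare FFS and Medicaid)
])

chi2, p_value, dof, expected = chi2_contingency(table)
recall_rates = table[:, 0] / table.sum(axis=1)
print(f"recall rates by group: {recall_rates.round(3)}, chi-square p = {p_value:.4g}")
```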

            2.5 Health Care Quality Landscape

            While other measures evaluate rates of breast cancer imaging, CBE #4220e monitors the rate of recall following screening imaging. This measure provides valuable information for facilities, clinicians, administrators, policy makers, patients, researchers, and others to identify facilities recalling the appropriate number of patients for follow-up screening each year.

            2.6 Meaningfulness to Target Population

            Among the target population, additional imaging and biopsies after a screening mammography or DBT can result in overdiagnosis among patients who do not have breast cancer, increasing their anxiety and distress. Alternatively, inappropriately low recall rates may lead to delayed diagnoses or undetected cases of breast cancer (Nelson et al., 2019). Inclusion of DBT when evaluating follow-up care may improve recall rates and positive predictive values compared with metrics that focus on mammography alone (Aujero et al., 2017; Bian et al., 2016; Chong et al., 2019; Conant et al., 2016; Pozzi et al., 2016; Skaane, 2016).

            References:

            Aujero, M., Gavenonis, S., Benjamin, R., Zhang, Z., & Holt, J. (2017). Clinical performance of synthesized two-dimensional mammography combined with tomosynthesis in a large screening population. Radiology, 283(1), 70–76. https://doi.org/10.1148/radiol.2017162674

            Berrington de Gonzalez, A., Berg, C., Visvanathan, K., & Robson, M. (2009). Estimated risk of radiation-induced breast cancer from mammographic screening for young BRCA mutation carriers. JNCI: Journal of the National Cancer Institute, 101(3), 205–209. https://doi.org/10.1093/jnci/djn440

            Bian, T., Lin, Q., Cui, C., Li, L., Qi, C., Fei, J., & Su, X. (2016). Digital breast tomosynthesis: A new diagnostic method for mass-like lesions in dense breasts. The Breast Journal, 22(5), 535–540. https://doi.org/10.1111/tbj.12622 

            Chong, A., Weinstein, S., McDonald, E., & Conant, E. (2019). Digital breast tomosynthesis: Concepts and clinical practice. Radiology, 292(1), 1–14. https://doi.org/10.1148/radiol.2019180760

            Conant, E., Beaber, E., Sprague, B., Herschorn, S., Weaver, D., Onega, T., Tosteson, A., McCarthy, A., Poplack, S., Haas, J., Armstrong, K., Schnall, M., & Barlow, W. (2016). Breast cancer screening using tomosynthesis in combination with digital mammography compared to digital mammography alone: A cohort study within the PROSPR Consortium. Breast Cancer Research and Treatment, 156(1), 109–116. https://doi.org/10.1007/s10549-016-3695-1 

            Pozzi, A., Corte, A., Lakis, M., & Jeong, H. (2016). Digital breast tomosynthesis in addition to conventional 2D-mammography reduces recall rates and is cost effective. Asian Pacific Journal of Cancer Prevention, 17(7), 3521–3526. Retrieved January 11, 2023, from https://pubmed.ncbi.nlm.nih.gov/27510003

            Skaane, P. (2016). Breast cancer screening with digital breast tomosynthesis. Digital Breast Tomosynthesis, 11–28. https://doi.org/10.1007/978-3-319-28631-0_2

            • 3.1 Feasibility Assessment

              CBE #4220e was assessed via a qualitative survey of a multi-stakeholder group of 32 individuals. The measure developer previously seated a technical expert panel of 12 individuals with extensive experience in clinical care (7 physicians), healthcare administration (3 payers, purchasers, or hospital administration staff), and patient advocacy (2 patients who act in an advocacy role). To supplement the information gathered from the technical expert panel, the measure developer also reached out to the American College of Radiology, which provided contact information for clinicians, healthcare administration staff, patients, and caregivers. In total, 25 physicians of various specialties, 5 healthcare administration or management staff, 1 patient, and 2 caregivers responded to the survey (which contained questions about measure face validity, feasibility, and usability). Results from this survey are presented throughout the full measure submission form.

              For the feasibility question for CBE #4220e, the results indicate that 75 percent of respondents agree that the measure does not place an undue burden on hospitals to collect the data. Among the individuals who responded Disagree or Strongly Disagree, one stated that burden would depend on who reports the measure and how it is reported (additional information on measure use appears in the Use section below); another felt the measure would be difficult to track without specific Current Procedural Terminology (CPT) codes to identify diagnostic studies that count as follow-up care in the measure’s numerator (which has since been resolved); a third felt that burden would be high if an exclusion for high-risk individuals were added to the technical specifications (which did not happen); and a fourth felt that the measure would impose a significant burden but did not explain why.

              Results from the qualitative survey related to measure feasibility for CBE #4220e appear in Table 4, within the attachment under the 'Logic Model' question.

              3.3 Feasibility Informed Final Measure

              No changes were made to the final measure specifications in response to the feasibility assessment. There was high agreement that the measure does not place an undue burden on hospitals to collect the data. 

            • 3.4a Fees, Licensing, or Other Requirements

              There are no fees, licensing, or other requirements to use any aspect of this measure as specified.

              3.4 Proprietary Information
              Not a proprietary measure and no proprietary components
              • 4.1.3 Characteristics of Measured Entities

                A total of 3,652 facilities were included in the testing population, with 3,315,335 imaging studies included in the measure’s denominator. Table 2 above shows the distribution of performance scores and denominator counts for all facilities as well as for the subset (3,391) of facilities meeting MCC requirements. These include all facilities for which relevant Medicare claims data were available; no sampling strategy was employed.

                Distributions of location (i.e., urban versus rural), bed size, teaching status, and ownership status for facilities meeting MCC requirements are shown in Table 5, within the attachment under the 'Logic Model' question. The majority of facilities were urban (59.6 percent), non-teaching (83.5 percent), and non-profit (65.8 percent). Distribution by bed size shows a plurality of facilities to be small (0–50 beds, 32.5 percent), with substantive proportions in each subsequent bed size category.

                4.1.1 Data Used for Testing

                As described above, CMS calculates its Outpatient Imaging Efficiency measures using data from final claims that facilities submit for Medicare beneficiaries enrolled in FFS Medicare. The data are calculated only for facilities paid through the OPPS for mammography and DBT screening in the hospital outpatient setting. Data from the hospital outpatient and carrier files are used to determine beneficiary inclusion (e.g., a mammography follow-up study can occur in any location and be included in the measure’s numerator).

                All reported testing results are for the public reporting period based on data collected from July 1, 2021, through June 30, 2022 (PR 2023). In PR 2023, 3,652 facilities had at least 1 eligible case in the measure denominator. A total of 3,391 facilities met the minimum case count requirement, making them eligible for public reporting.

                4.1.4 Characteristics of Units of the Eligible Population

                Table 3, within the attachment under the 'Logic Model' question, displays the distribution of cases included in the denominator from facilities meeting MCC requirements by patient characteristic, including biological sex, racial or ethnic identification, age band, and dual eligibility status. 

                4.1.2 Differences in Data

                The same data were used for all aspects of testing.

              • 4.2.1 Level(s) of Reliability Testing Conducted
                4.2.2 Method(s) of Reliability Testing

                Reliability was calculated in accordance with the methods described in The Reliability of Provider Profiling: A Tutorial (2009). This approach calculates the ability of the measure to distinguish between the performances of different facilities. Specifically, the testing calculated the signal-to-noise ratio for each facility meeting minimum case count, with higher scores indicating greater reliability. The reliability score is estimated using a beta-binomial model and is a function of the facility’s sample size and score on the measure, as well as the variance across facilities. 

                References: 

                Adams, J. (2009). The Reliability of Provider Profiling: A Tutorial. https://doi.org/10.7249/tr653
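
                The following is a minimal sketch of the signal-to-noise calculation described above, using a method-of-moments approximation to the beta-binomial fit. The facility counts here are hypothetical; the production calculation follows Adams (2009) on the full set of facilities meeting the minimum case count.

```python
import numpy as np

def beta_binomial_reliability(numerators, denominators):
    """Per-facility signal-to-noise reliability (sketch of the Adams 2009 approach).

    numerators: follow-up study counts per facility
    denominators: screening study counts per facility
    """
    x = np.asarray(numerators, dtype=float)
    n = np.asarray(denominators, dtype=float)
    rates = x / n

    # Method-of-moments fit of a beta distribution to the observed facility rates
    # (an approximation to the full beta-binomial maximum-likelihood fit).
    mean, var = rates.mean(), rates.var(ddof=1)
    s = mean * (1 - mean) / var - 1
    alpha, beta = mean * s, (1 - mean) * s

    # Between-facility ("signal") variance implied by the fitted beta distribution.
    signal = alpha * beta / ((alpha + beta) ** 2 * (alpha + beta + 1))

    # Within-facility ("noise") variance from binomial sampling error.
    noise = rates * (1 - rates) / n

    return signal / (signal + noise)

# Toy usage with made-up facility counts (not CMS data):
print(beta_binomial_reliability([5, 30, 20], [150, 320, 95]).round(2))
```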

                4.2.3 Reliability Testing Results

                See next section.

                Table 2. Accountable Entity–Level Reliability Testing Results by Denominator-Target Population Size

                |                          | Overall | Minimum | Decile 1 | Decile 2 | Decile 3 | Decile 4 | Decile 5 | Decile 6 | Decile 7 | Decile 8 | Decile 9 | Decile 10 | Maximum |
                |--------------------------|---------|---------|----------|----------|----------|----------|----------|----------|----------|----------|----------|-----------|---------|
                | Reliability              | 0.92    | 0.41    | 0.81     | 0.88     | 0.91     | 0.94     | 0.95     | 0.97     | 0.98     | 0.98     | 0.99     | 1.00      | 1.00    |
                | N of Entities            | N/A     | N/A     | 339      | 339      | 339      | 341      | 337      | 340      | 339      | 339      | 338      | 340       | 340     |
                | N of Cases (Denominator) | N/A     | N/A     | 41,335   | 65,887   | 96,318   | 124,514  | 179,137  | 243,097  | 319,001  | 434,845  | 620,705  | 1,183,001 | 1,183,001 |
                4.2.4 Interpretation of Reliability Results

                As shown above, reliability scores for CBE #4220e ranged from 0.41 to 1.00, with a median reliability score of 0.95. This median score is indicative of very strong measure reliability and suggests that this measure is able to identify true differences in performance between individual facilities.

              • 4.3.1 Level(s) of Validity Testing Conducted
                4.3.3 Method(s) of Validity Testing

                Feedback received from external stakeholders during a listening session about CBE #4220e indicates that a diverse group of stakeholders supports its validity. Stakeholders agreed that screening mammography and DBT are appropriate imaging modalities for capturing the measure’s initial patient population; some stakeholders recommended that the measure consider adding other screening modalities, such as MRI and ultrasound, to the specifications. Stakeholders also reached consensus that guidance for the measure should include a target screening recall rate of 5 percent to 12 percent, in alignment with American College of Radiology guidelines, noting that the addition of DBT may shift that range downward (because DBT provides more precise imaging). Lastly, stakeholders suggested that facility characteristics could affect recall rates (e.g., underserved areas may have higher recall rates because patients in those areas have limited engagement with the healthcare system and tend to experience fragmented care). Differences in facility characteristics could be corrected for using an administrative cancer detection rate.

                In addition to the public listening session, the face validity of the measure was systematically assessed via qualitative survey of a multi-stakeholder group of 32 individuals, including one patient/patient advocate (the composition of which is described in the Feasibility section, above).

                4.3.4 Validity Testing Results

                The results shown in Table 7, within the attachment under the 'Logic Model' question, indicate that 75 percent of respondents support the measure’s intent: to assess recall rates to determine appropriate diagnostic imaging for breast cancer detection. Furthermore, 71 percent of respondents strongly agreed or agreed that the measure addresses quality of care (Table 8, within the attachment under the 'Logic Model' question).

                For individuals who responded Disagree or Strongly Disagree to the question summarized in Table 7, suggestions included updating the measure name to reference recall in lieu of follow-up (the change was made) and using BI-RADS assessment values instead of documentation from CPT codes (which is not feasible for measures calculated using administrative claims data).

                For responses to the question summarized in Table 8, one person noted that “there are many additional factors to consider” (without sharing additional context); four individuals suggested removal of magnetic resonance as a follow-up imaging modality (the suggestion was not accepted); and three people encouraged removal of information about breast density (which was addressed).

                4.3.5 Interpretation of Validity Results

                Please see Table 7 and Table 8, within the attachment under the 'Logic Model' question.

              • 4.4.1 Methods used to address risk factors
                4.4.1b If an outcome or resource use measure is not risk adjusted or stratified

                CBE #4220e is a process measure for which the measure steward provides no risk adjustment or risk stratification. Risk adjustment and risk stratification were determined to be inappropriate for the measure based on its evidence base and construct. Stakeholder feedback received during the listening session for the measure suggests that facility characteristics could potentially affect measure scores because patients in underserved areas who have limited engagement with the healthcare system may have higher rates of recall. Additionally, the prevalence of cancer in underserved areas is typically higher because the population tends to have limited or no preventive primary care. As a process-of-care measure, the decision to image a patient should not be influenced by sociodemographic factors; rather, adjustment would risk masking important inequities in care delivery. Variation across populations reflects differences in the quality of care provided to the disparate populations included in the measure’s denominator. The measure steward will continue to assess the need for risk adjustment throughout the measure’s lifespan.

                Risk adjustment approach
                Off
                Conceptual model for risk adjustment
                Off
                • 5.1 Contributions Towards Advancing Health Equity

                  As shown above in Table 3, some potential social risk factors were examined to identify performance gaps. These factors include biological sex, racial or ethnic identity, age band, and dual eligibility status. Statistically significant differences in performance were identified, which demonstrates an opportunity to improve health equity with respect to these risk factors.

                  • 6.2.1 Actions of Measured Entities to Improve Performance

                    Usability of the measure was assessed via a qualitative survey of a multi-stakeholder group of 32 individuals, including one patient/patient advocate. The results indicate that 77.4 percent of the respondents agree the measure can be used by hospital outpatient departments to guide decision-making and improve healthcare quality and health outcomes. One respondent suggested that the measure be used in conjunction with a breast cancer detection rate. Furthermore, 80.6 percent of the respondents agreed that the measurement of mammography and DBT follow-up rates for breast cancer detection is highly important because reporting the measure results can supply meaningful information to consumers and healthcare providers.

                    • First Name
                      Amanda
                      Last Name
                      Overholt

                      Submitted by Amanda on Mon, 01/08/2024 - 16:00

                      Importance

                      Importance Rating
                      Importance

                      Strengths:

                      • The developer cites evidence and guidelines from the American College of Radiology, which highlight the potentially negative consequences if the mammography and digital breast tomosynthesis (DBT) recall rate is either too high or too low. The developer posits that this measure “will guide breast cancer screening decision making in hospital outpatient departments.” In addition, this measure will potentially “reduce radiation received from mammography or DBT that may induce more cancers in younger people or those carrying deleterious gene mutations, as well as decreasing unnecessary imaging and biopsies. Alternatively, underuse of follow-up for screening mammography or DBT may result in missed cases of cancer.” The developer provides a logic model for this measure, but it does not capture the statements shared under “Anticipated Impact.” Across 3,391 facilities from July 1, 2021 – June 30, 2022, the mean measure performance for facilities meeting MCC (8.5%, standard deviation [S.D.] 6.7%) falls within the targeted recall rate range of 5% to 12%. However, variability exists across the distribution of facility scores, with more than 30% (33.4%) of facilities having scores outside of the targeted recall rate range. The developer also reported statistically significant differences in performance by biological sex, racial or ethnic identity, age band, and dual eligibility status.
                      • The developer states that “while other measures evaluate rates of breast cancer imaging, this measure provides valuable information to facilities, consumers, researchers, clinicians, and policy-makers, with respect to recalling the appropriate number of patients for follow-up screening each year.”

                       

                      Limitations:

                      • The developer states that there are inconsistencies in the target recall rate for breast cancer screening. However, guidelines from the American College of Radiology recommend a target recall rate for mammography screening between 5% and 12%. In Europe, the International Agency for Research on Cancer sets a target recall rate at 5%. The developer does not describe the actions providers can take to ensure appropriate recall rates. The developer does not provide direct patient input on this measure. However, it does cite evidence of the negative consequences associated with additional screening, which include anxiety and stress to patients, as well as the risk of a delayed cancer diagnosis.

                       

                      Rationale:

                      • The developer cites evidence and guidelines from the American College of Radiology, which highlight the potentially negative consequences if the mammography and digital breast tomosynthesis (DBT) recall rate is either too high or too low. The developer posits that this measure will "potentially reduce radiation received from mammography or DBT that may induce more cancers in younger people or those carrying deleterious gene mutations, as well as decreasing unnecessary imaging and biopsies.”
                      • However, the developer states that there are inconsistencies in the target recall rate for breast cancer screening, even though guidelines from the American College of Radiology recommend a target recall rate for mammography screening between 5% and 12%. The committee should consider whether the range in recall rates is appropriate and perhaps have the developer further justify the range and its associated benefits versus harms.
                        A gap exists across the distribution of facility scores, with more than 30% (33.4%) of facilities having scores outside of the targeted recall rate range. The developer also reported statistically significant differences in performance by biological sex, racial or ethnic identity, age band, and dual eligibility status.
                      • Lastly, the developer does not provide direct patient input on this measure. However, it does cite evidence of the negative consequences associated with additional screening, which include anxiety and stress to patients, as well as the risk of a delayed cancer diagnosis.

                      Feasibility Acceptance

                      Feasibility Rating
                      Feasibility Acceptance

                      Strengths:

                      • The developer conducted a feasibility assessment by engaging a multistakeholder panel of experts and the American College of Radiology. It did not disclose any data availability or feasibility issues. Rather, members of the panel provided feedback on burden, indicating a 75% agreement that the measure does not place an undue burden on hospitals to collect the data. For the 25% who found the measure burdensome, the developer resolved most concerns.
                      • The developer notes that it did not make any changes to the measure as a result of the feedback. This measure uses claims data, and the Centers for Medicare & Medicaid Services calculates its Outpatient Imaging Efficiency measures using data from final claims that facilities submit for Medicare beneficiaries enrolled in fee-for-service Medicare. The developer states that there are no fees, licensing, or other requirements to use any aspect of this measure as specified.

                       

                      Limitations:

                      None

                       

                      Rationale:

                      • This measure uses claims, an electronic data source. The developer conducted a feasibility assessment by engaging a multistakeholder panel of experts and the American College of Radiology. It did not disclose any data availability issues. Rather, members of the panel provided feedback on burden, indicating a 75% agreement that the measure does not place an undue burden on hospitals to collect the data. For the 25% who found the measure burdensome, the developer resolved most concerns.
                        The developer notes that it did not make any changes to the measure as a result of the feedback.

                      Scientific Acceptability

                      Scientific Acceptability Reliability Rating
                      Scientific Acceptability Reliability

                      Strengths:

                      • The measure specifications are well defined and precise. The measure could be implemented consistently across organizations and allows for comparisons.
                      • The vast majority of facilities have a reliability well above the threshold of 0.6 with the first decile having mean reliability of 0.81 and an overall median of 0.95.
                      • A total of 3,391 facilities met the minimum case count requirement, making them eligible for public reporting, with millions of imaging studies between them included in the denominator.
                      • This measure uses a relative precision model to determine the minimum necessary number of cases. Facilities need at least 31 cases to qualify for public reporting; this number can vary from 31 to 67, depending on a facility’s performance rate.
                      • All reported testing results are for the public reporting period based on data collected from July 1, 2021, through June 30, 2022 (PR 2023).

                       

                      Limitations:

                      • There appears to be at least one outlier (reliability of 0.41). It would be useful to know more about this outlier and see why the reliability was so much lower than the median (0.95) and even the mean reliability of the first decile (0.81).

                       

                      Rationale:

                      • Measure score reliability testing (accountable entity-level reliability) was performed. The vast majority of facilities have a reliability that exceeds the accepted threshold of 0.6, with only one facility below the threshold (minimum of 0.41). The sample size for each year and accountable entity level analyzed is sufficient.
                      Scientific Acceptability Validity Rating
                      Scientific Acceptability Validity

                      Strengths:

                      • The developer conducted face validity testing of the measure score via qualitative survey of a multi-stakeholder group of 32 individuals. In total, 25 physicians of various specialties, 5 healthcare administration or management staff, 1 patient, and 2 caregivers responded to the survey (which contained questions about measure face validity, feasibility, and usability).
                      • The developer states that this group reached consensus that guidance for the measure should include a target screening recall rate of 5% to 12%, in alignment with American College of Radiology guidelines.
                        75% of the respondents support the measure’s intent and 71% strongly agreed or agreed that the measure addresses quality of care.
                      • For individuals who did not agree, suggestions were made to update the measure name to reference recall in lieu of follow up (the change was made by the developer) and encouragement to use BIRADS assessment values instead of documentation from CPT codes (which is not feasible for measures calculated using administrative claims data).
                      • One person noted that “there are many additional factors to consider” (without sharing additional context); four individuals suggested removal of magnetic resonance as a follow-up imaging modality (the suggestion was not accepted by the developer); and three people encouraged removal of information about breast density (which was addressed by the developer).
                      • Lastly, the measure is not risk-adjusted, as it is a process measure.

                       

                      Limitations:

                      None

                       

                      Rationale:

                      • The developer conducted face validity testing of the measure score via qualitative survey of a multi-stakeholder group of 32 individuals. In total, 25 physicians of various specialties, 5 healthcare administration or management staff, 1 patient, and 2 caregivers responded to the survey (which contained questions about measure face validity, feasibility, and usability).
                      • The developer states that this group reached consensus that guidance for the measure should include a target screening recall rate of 5% to 12%, in alignment with American College of Radiology guidelines.
                        75% of the respondents support the measure’s intent and 71% strongly agreed or agreed that the measure addresses quality of care.
                      • For individuals who did not agree, suggestions were made to update the measure name to reference recall in lieu of follow up (the change was made by the developer) and encouragement to use BIRADS assessment values instead of documentation from CPT codes (which is not feasible for measures calculated using administrative claims data).
                      • One person noted that “there are many additional factors to consider” (without sharing additional context); four individuals suggested removal of magnetic resonance as a follow-up imaging modality (the suggestion was not accepted by the developer); and three people encouraged removal of information about breast density (which was addressed by the developer).
                      • Lastly, the measure is not risk-adjusted, as it is a process measure.

                      Equity

                      Equity Rating
                      Equity

                      Strengths:

                      • Disparities are evaluated by analyzing differences in performance scores by sex, race/ethnicity, age group, and dual eligibility, and the developer found significant overall differences (chi-square tests) for each factor.

                       

                      Limitations:

                      • Regarding the observed disparity by sex (i.e., men have a recall rate of 24.3% compared with 9.2% among women): regardless of the sample size, one may expect a higher rate of positive results requiring follow-up, since men probably would only get a mammogram if they are symptomatic, and its purpose would then be to rule out breast cancer; as such, a higher recall rate than the target range for this measure should be expected. This may also apply to women under age 40/50 (depending on the guideline followed).
                      • Significance tests are limited to chi-square tests; i.e., there are no t-tests to evaluate differences by race/ethnicity category (e.g., relative to a reference category or the target recall range). In addition, all groups except men and women under 45 have values within the target range, so the clinical importance of these findings is unclear; it might be most appropriate to conclude that no disparities were identified.

                       

                      Rationale:

                      • Developers use performance data to calculate chi-square statistics for the rate of recall by sex, race/ethnicity, age group, and dual eligibility, and find overall significance for each of these factors, concluding there is opportunity for improving health equity based on these factors.
                      • The method chosen does not report differences between specific groups, and because most rates are within the target range the clinical importance of the findings is unclear. In addition, the groups for which rates outside the range were found are groups that are not generally recommended for screening mammography (men, women under 45), so the target range may not be appropriate for them.

                      Use and Usability

                      Use and Usability Rating
                      Use and Usability

                      Strengths:

                      • Developer indicates the measure is planned for use in public reporting and internal quality improvement (QI).
                      • After the measure was submitted to Battelle, the developer added more information based on its review of the staff assessment: The measure is intended for use in CMS's Hospital Outpatient Quality Reporting (HOQR) Program, which provides financial incentives for performance. The developer suggests facilities wanting help with improving on this measure can consult with their QIN-QIO or consult reports available on the CMS QualityNet site discussing how they can align rates of recall with the guidance from the American College of Radiology.
                      • Developer asked a multi-stakeholder group (n=32) to assess usability of the measure; 77.4% agreed that the measure could be used by entities to guide decision making and for QI, and 80.6% agreed that the information about follow-up rates for breast CA detection can provide consumers and providers with actionable information.

                       

                      Limitations:

                      • No explicit articulation of the actions measured entities could take to improve performance, but in their comments the developer listed sources facilities can consult to develop a QI initiative.

                       

                      Rationale:

                      • Developers plan for the measure to be used in CMS's Hospital Outpatient Quality Reporting (HOQR) Program, a pay-for-quality program. The majority of a multi-stakeholder group agreed the measure could be used by entities for QI and decision-making (77.4%) and that it would provide consumers and providers with actionable information (80.6%) and the developer recommends facilities work with their QIN-QIO or consult resources available through QualityNet to develop a QI program.

                      Summary

                      N/A

                    • First Name
                      Janice
                      Last Name
                      Young

                      Submitted by Janice Young on Wed, 01/10/2024 - 12:14

                      Importance

                      Importance Rating
                      Importance

                      The developer does not address the significance of the population at risk of poor outcomes related to overuse and/or underuse of recall after mammography or DBT.

                      Feasibility Acceptance

                      Feasibility Rating
                      Feasibility Acceptance

                      The focus group found that data collection was not overly burdensome on facilities. 

                      Scientific Acceptability

                      Scientific Acceptability Reliability Rating
                      Scientific Acceptability Reliability

                      This measure uses the same data process as Outpatient Imaging Efficiency measures using claims data.  This has provided reliable results for this measure.

                      Scientific Acceptability Validity Rating
                      Scientific Acceptability Validity

                      The validity of the data shows a wide range from a minimum of 0.41 to 1st decile of 0.81.

                      Equity

                      Equity Rating
                      Equity

                      This measure is equitable across all care and patient types.

                      Use and Usability

                      Use and Usability Rating
                      Use and Usability

                      This measure provides valid information and is not burdensome to facilities.

                      Summary

                      I would recommend this measure with minor modifications. 

                      First Name
                      Kory
                      Last Name
                      Anderson

                      Submitted by Kory on Fri, 01/12/2024 - 17:56

                      Importance

                      Importance Rating
                      Importance

                      Agree with the input from the PQM folks and their rationale on this one.

                      Feasibility Acceptance

                      Feasibility Rating
                      Feasibility Acceptance

                      This should be easy enough to track and doesn't appear overly burdensome to input or submit data.

                      Scientific Acceptability

                      Scientific Acceptability Reliability Rating
                      Scientific Acceptability Reliability

                      No additional comments.

                      Scientific Acceptability Validity Rating
                      Scientific Acceptability Validity

                      No additional comments.

                      Equity

                      Equity Rating
                      Equity

                      No additions, I agree with the PQM assessment.

                      Use and Usability

                      Use and Usability Rating
                      Use and Usability

                      Concern around patients lost to follow-up, disparities in ability to notify of results/follow-up. What are the steps to change behavior, improve radiology reading, etc... if a facility falls outside the 5-12% range?

                      Summary

                      Ultimately, I think this measure probably meets the acceptability rating.

                      First Name
                      Janet
                      Last Name
                      Hurley

                      Submitted by Janet Hurley MD on Sun, 01/14/2024 - 18:15

                      Importance

                      Importance Rating
                      Importance

                      The authors provided clear rationale about how this could be helpful moving forward.

                      Feasibility Acceptance

                      Feasibility Rating
                      Feasibility Acceptance

                      Since this is coming from claims data, collection should not be burdensome.

                      Scientific Acceptability

                      Scientific Acceptability Reliability Rating
                      Scientific Acceptability Reliability

                      While there was some variation among sites, with some reliability scores as low as the 0.40s, the overall summary is supportive.

                      Scientific Acceptability Validity Rating
                      Scientific Acceptability Validity

                      I have some concerns about including men in the denominator, since they are typically screened only if they have significant risk factors or symptoms. Both situations would increase the likelihood of needing subsequent testing. Also note that some cancer facilities may have a higher subsequent testing rate because their population is a higher-risk group in general.

                      Equity

                      Equity Rating
                      Equity

                      The authors did look at ethnic groups when reviewing the data.

                      Use and Usability

                      Use and Usability Rating
                      Use and Usability

                      I considered this met, yet I am unclear what follow-up action medical facilities should take when their scores are higher than expected.  The author did not include this in the background materials, but I still consider it met because the focus groups surveyed were in agreement.

                      Summary

                      I recommend removal of men from the denominator, but otherwise I am in agreement with the metric. 

                      First Name
                      Ray
                      Last Name
                      Dantes

                      Submitted by Dr. Ray Dantes on Mon, 01/15/2024 - 10:46

                      Importance

                      Importance Rating
                      Importance

                      Agree with staff assessment.

                      Feasibility Acceptance

                      Feasibility Rating
                      Feasibility Acceptance

                      Agree with staff assessment.

                      Scientific Acceptability

                      Scientific Acceptability Reliability Rating
                      Scientific Acceptability Reliability

                      Agree with staff assessment.

                      Scientific Acceptability Validity Rating
                      Scientific Acceptability Validity

                      Agree with staff assessment.

                      Equity

                      Equity Rating
                      Equity

                      Agree with staff assessment.

                      Use and Usability

                      Use and Usability Rating
                      Use and Usability

                      I think this is a great measure for internal quality measurement but fundamentally inappropriate for pay for performance programs.

                       

                      This is an "indicator" measure that tracks the frequency with which follow-up breast imaging studies are performed after a screening mammogram or DBT. Guidelines and large studies suggest the follow-up study range should be 5-12%, but adverse patient events would occur on both ends of that spectrum due to over- or under-diagnosis. The main issue I see is that this measure does not track individual patient outcomes. Due to patient population variation, or variation in the quality of imaging or radiologist interpretation, you could imagine a lot of appropriate variation in line with the goal of optimal patient care. For example, for a lower-risk population with optimal image quality and interpretation, a follow-up imaging rate of 3-4% may be appropriate. Conversely, in a near-future example, a clinic serving a high-risk population using new AI tools (shown to increase sensitivity) may have a follow-up rate of 15%.

                       

                      The use of a "range" in pay for performance also has the potential to encourage unintended consequences, which were not addressed by the developer. Follow-up imaging is likely to be performed within the same radiology center. Thus, if a clinic that sees a low-risk population has a follow-up study rate of 5%, it may be incentivized to inappropriately bring patients back for diagnostic studies (which would increase reimbursement) as long as the rate stays "within range".

                       

                      This is akin to "did the police officer meet their monthly quota for speeding tickets" rather than a measure of how much drivers are actually speeding. 

                      Summary

                      See my comments on Use and Usability, and concerns for use of this measure in Pay for Performance.

                      First Name
                      Karen
                      Last Name
                      Fernandes

                      Submitted by Karen Fernandes on Mon, 01/15/2024 - 14:39

                      Importance

                      Importance Rating
                      Importance

                      Potential negative consequences are mentioned, but from a patient standpoint I would rather be called than not. If the recall rate is high, is there a problem with the equipment, technician competencies, or the radiologists reading the mammograms?

                      Feasibility Acceptance

                      Feasibility Rating
                      Feasibility Acceptance

                      No additional burden

                      Scientific Acceptability

                      Scientific Acceptability Reliability Rating
                      Scientific Acceptability Reliability

                      The measure as outlined was well designed.

                      Scientific Acceptability Validity Rating
                      Scientific Acceptability Validity

                      Validity testing was completed and consensus reached. It is unclear what the discussion really was around removing the MRI testing. MRI testing is recommended for high-risk patients, but what is concerning is the contrast medium utilized: gadolinium. This is a known toxic metal. In 2018, the FDA issued a drug safety warning relative to gadolinium retention issues. The FDA suggests first-time patients should receive a medication guide.

                      Equity

                      Equity Rating
                      Equity

                      Planned for public reporting

                      Use and Usability

                      Use and Usability Rating
                      Use and Usability

                      Public reporting planned and to be part of HOQR Program

                      Summary

                      This is an important measure. It is very important and reassuring for patients even though it may increase their anxiety. Risk adjustment was noted to be off. I do question this because patients at high risk for breast cancer are really never addressed. For example, those patients exposed in utero to the drug diethylstilbestrol (DES) are never acknowledged. Radiologists should know about these high-risk patients. I recommend that mammogram intake forms include two questions: Are you DES exposed, or do you have a family history of DES exposure?

                      First Name
                      Tamaire
                      Last Name
                      Ojeda

                      Submitted by Tamaire Ojeda on Mon, 01/15/2024 - 15:42

                      Permalink

                      Importance

                      Importance Rating
                      Importance

There is strong evidence presented about the benefits of screening for breast cancer and follow-up evaluations when needed. It is further presented that there are negative consequences when the mammography and digital breast tomosynthesis (DBT) recall rate is either too high or too low. The measure developer establishes that this measure will support a reduction in radiation received. Nonetheless, there is no clearly established connection between this measure and a reduction of radiation. It is unclear how this measure will support improved outcomes for patients. The benchmark to compare against from the ACR is the recall rate, but there is no substantive research to support that this is still a current benchmark. Also, has this target rate been adjusted for different ethnic and racial parameters? In addition, it is unclear whether this measure is unique and whether there is meaningfulness perceived by the patient population.

                      Feasibility Acceptance

                      Feasibility Rating
                      Feasibility Acceptance

The burden of reporting this measure was directly addressed by the developer. No proprietary data are required, and all information is generated from the EHR utilizing value sets, supporting appropriate data collection.

                      Scientific Acceptability

                      Scientific Acceptability Reliability Rating
                      Scientific Acceptability Reliability

Reliability of the measure was established with the data provided. The measure does not seem to impose a high burden on implementers, supported by data being readily available in EHRs with the use of value sets.

                       

                      Agree with staff preliminary assessment comment related to limitations of the reliability results: 

                      • "There appears to be at least one outlier (reliability of 0.41). It would be useful to know more about this outlier and see why the reliability was so much lower than the median (0.95) and even the mean reliability of the first decile (0.81)."
                      Scientific Acceptability Validity Rating
                      Scientific Acceptability Validity

It is unclear how this measure will address the patient outcomes described of decreasing unnecessary radiation while improving further screening when needed. The validity study does not address how the addition of DBT "may shift the range downward" when referring to the ACR target screening recall rate.

                      Equity

                      Equity Rating
                      Equity

The measure developer is able to establish a difference in the measure across different patient groups. It does not, however, clearly establish how the measure supports addressing these differences. I would like to know more about this.

                      Use and Usability

                      Use and Usability Rating
                      Use and Usability

The measure, although important in nature, is hard to see as a way to improve patient outcomes. The developer does not establish any pathways to address rates outside of the 5-12% range. It is also concerning what facilities could do in communities with high health inequities, where resources are not available to correctly address results outside of the desired range.

                      Summary

Although I see the benefit of a measure that supports appropriate follow-up for breast cancer evaluation after mammograms, I am unable to see how this measure addresses differences in populations and their health equity markers as part of a pay-for-reporting program. I am also unable to see the need for knowing the recall rate when there is no process to address any outlier results. I look forward to further discussion of this measure.

                      First Name
                      Barbara
                      Last Name
                      Kivowitz

                      Submitted by Barbara Kivowitz on Tue, 01/16/2024 - 16:43

                      Permalink

                      Importance

                      Importance Rating
                      Importance

Setting a range to guide additional screening so that overuse of radiation and underdiagnosis don't occur is very important. I don't understand the rationale for the 5%-12% range other than its consistency with measures already adopted. How do we know that this is the appropriate range going forward? Might current conditions warrant a shift in the range, or having different ranges for special circumstances (e.g., rural areas, demographic groups that have traditionally been over- or under-examined in follow-up care)?

                      Also - while additional screening can elicit anxiety, there is also anxiety around uncertainty of diagnosis.

                      Feasibility Acceptance

                      Feasibility Rating
                      Feasibility Acceptance

                      Agree with staff assessment.

Question: what about Medicare beneficiaries enrolled in Medicare Advantage plans (who account for about half of all Medicare beneficiaries)? Will this measure apply to them? Might there be differences in diagnostic follow-up for those in MA plans that would be important to recognize?

                      Scientific Acceptability

                      Scientific Acceptability Reliability Rating
                      Scientific Acceptability Reliability

Agree with staff assessments.

                      Scientific Acceptability Validity Rating
                      Scientific Acceptability Validity

Agree with staff assessments.

Appreciate that patients and caregivers were included.

                      Equity

                      Equity Rating
                      Equity

                      Developers use performance data to calculate the rate of recall by sex, race/ethnicity, age group, and dual eligibility, and find overall significance for each of these factors, concluding there is opportunity for improving health equity based on these factors.

                      Use and Usability

                      Use and Usability Rating
                      Use and Usability

Agree with staff assessment.

There is some concern about whether advising facilities that want help improving on this measure to consult with their QIN-QIO will be sufficient for improving and sustaining performance. There may be rural/urban, size, and resource differences that could require other approaches.

                      Summary

                      My main question is about maintaining the 5%-12% range. Might there be reasons to reexamine this range at this time? How do we know if this range is still the best one for the purposes of this measure for all populations?

                      First Name
                      Kobi
                      Last Name
                      Ajayi

                      Submitted by Kobi on Thu, 01/18/2024 - 02:00

                      Permalink

                      Importance

                      Importance Rating
                      Importance

The developer cites a target recall rate of 5% to 12%, depending on the facility. This range appears arbitrary and inconsistent, and does not account for US healthcare dynamics (rural vs. urban) or technology. The developers do not describe actionable quality improvement strategies. Even though CBE #4220e is important, particularly as it focuses on monitoring "rates of recall following screening imaging instead of rates of breast cancer imaging," from a patient perspective the developers did not describe how patients will be informed about this measure.

                       

                       

                      Feasibility Acceptance

                      Feasibility Rating
                      Feasibility Acceptance

The developers conducted a focus group of professionals and patients with lived experience, which was commendable. However, the developers did not report the geographic characteristics of the individuals who most agreed that the measure will not add undue burden on hospitals to collect data; professionals in rural areas may report an undue burden with collecting these data. The patient representation relative to healthcare staff/professionals could undermine the feasibility assessment result. It would be appropriate for the developers to consider a patient-only group to ensure accurate representation. One question that comes to mind is whether there were differences in the agreement scores between the patients and healthcare professionals.

                      Scientific Acceptability

                      Scientific Acceptability Reliability Rating
                      Scientific Acceptability Reliability

The reliability testing is sound; it used large, standardized data that are available to different facilities.

                      Scientific Acceptability Validity Rating
                      Scientific Acceptability Validity

Although the scientific methods are sound, the developers did not describe the reliability outlier of 0.41, compared to the mean of 0.9 and the next-lowest value of 0.81. It would be ideal to explain this steep difference in value.

                      Equity

                      Equity Rating
                      Equity

The developers considered key variables in their measure. However, as noted by the developers, care must be taken in interpreting the significant difference between males and females and across race/ethnicity. I would caution against framing the findings as though disparities by sex exist. Additionally, considering that there are male-female differences in breast cancer, which can lead to biased estimates when data are not disaggregated, it would be worthwhile for the developers to include additional language acknowledging this.

                      Use and Usability

                      Use and Usability Rating
                      Use and Usability

As I mentioned earlier, patient representation on this measure is a limitation to be addressed. Based on the sample, the positive sentiments are reflective of providers.

                      Summary

                      Overall, I agree with the staff assessment and my fellow independent reviewers. The measure is important, but some nuances still need clarity. There's room for continuous improvement of the measure. 

                      First Name
                      Tammy
                      Last Name
                      Love

                      Submitted by Tammy Jean Love on Thu, 01/18/2024 - 16:29

                      Permalink

                      Importance

                      Importance Rating
                      Importance

                      There is strong evidence presented about the benefits of breast cancer screening and follow-up. 

The developer suggests the measure will support a reduction in radiation received, but I was unable to identify a clear connection or understand how they correlate. It is also unclear how tracking this measure would actually improve patient outcomes. Does the benchmark mentioned below take into account different ethnic/racial/rural parameters?

Limitations: The developer does not provide guidance to sites on the actions providers can take to ensure appropriate recall rates. It is also unclear what sites should do if their recall rates fall outside the 5%-12% range because of diversity in their population.

                      Feasibility Acceptance

                      Feasibility Rating
                      Feasibility Acceptance

Using claims data, this should not be overly burdensome.

                      Scientific Acceptability

                      Scientific Acceptability Reliability Rating
                      Scientific Acceptability Reliability

Would like to see the information related to the outlier (reliability of 0.41) for review.

                      Agree with staff preliminary assessment comment related to limitations of the reliability results: 

                      • "There appears to be at least one outlier (reliability of 0.41). It would be useful to know more about this outlier and see why the reliability was so much lower than the median (0.95) and even the mean reliability of the first decile (0.81)."
                      Scientific Acceptability Validity Rating
                      Scientific Acceptability Validity

I would like to hear discussion on whether men should be in the denominator, as they are typically only screened if at risk or having symptoms. Are there any provisions for facilities serving populations in areas at high risk for cancer?

                      Equity

                      Equity Rating
                      Equity

                      Agree with staff assessment. 

                      Use and Usability

                      Use and Usability Rating
                      Use and Usability

The developer did not address how an organization could take action to improve performance if rates fell outside the 5%-12% range.

                      Summary

Although I see the benefit of a measure supporting follow-up evaluation after initial breast cancer screenings, I would like to hear discussion on different health equity aspects and population drivers (rural vs. urban) and how these screenings could improve patient outcomes overall. I look forward to discussion on modifying this measure so that it helps organizations perform follow-up exams when needed.

                      First Name
                      Kyle
                      Last Name
                      Campbell

                      Submitted by Kyle Campbell on Fri, 01/19/2024 - 16:47

                      Permalink

                      Importance

                      Importance Rating
                      Importance

The citations provided for the clinical guidelines do not appear appropriate; no grading is provided for the recommendations on recall rates. The recommendations on recall rates are inconsistent, with a wide range.

                      Feasibility Acceptance

                      Feasibility Rating
                      Feasibility Acceptance

No changes were made in response to the feasibility assessment, but the survey indicated high agreement that the measure would not impose undue burden. The measure uses electronic claims data, and no proprietary information is needed.

                      Scientific Acceptability

                      Scientific Acceptability Reliability Rating
                      Scientific Acceptability Reliability

Data suggest the measure is reliable across all deciles, well above the 0.6 threshold. Specifications appear precise.

                      Scientific Acceptability Validity Rating
                      Scientific Acceptability Validity

Face validity of 75% from the panel. Additional validity testing was not conducted.

                      Equity

                      Equity Rating
                      Equity

Agree with staff assessment. Interested to learn more about the developer's perspective on the clinical significance of the differences.

                      Use and Usability

                      Use and Usability Rating
                      Use and Usability

Planned for use in the OQR program. Usability results indicated that 77.4 percent of respondents agree the measure can be used by hospital outpatient departments to guide decision-making and improve healthcare quality and health outcomes. As indicated by staff, it is unclear what interventions can be undertaken to establish appropriate recall rates.

                      Summary

                      Important topic but some concerns about strength of recommendations related to the recall rates and interventions that would result in clinically significant improvements.

                      First Name
                      Selena
                      Last Name
                      McCord

                      Submitted by Selena McCord on Fri, 01/19/2024 - 19:35

                      Permalink

                      Importance

                      Importance Rating
                      Importance

                      There is not enough information to determine if the patient population finds the process measure meaningful. The developer mentioned hosting a listening session with a multi-stakeholder group of 32 individuals that included one patient. I particularly would have appreciated the feedback of those most impacted especially high-risk patients in underserved areas. 

                      Feasibility Acceptance

                      Feasibility Rating
                      Feasibility Acceptance

                      Data would be readily available without burden.

                      Scientific Acceptability

                      Scientific Acceptability Reliability Rating
                      Scientific Acceptability Reliability

                      A reliability median score of 0.95 suggests the measure can identify true differences in performance between individual facilities.  

                      Scientific Acceptability Validity Rating
                      Scientific Acceptability Validity

The developer facilitated qualitative and quantitative assessments of the measure's validity.

                      Equity

                      Equity Rating
                      Equity

The developer draws attention to the process measure's support for equity, as shown in Table 3, which presents measure performance scores by patient biological sex, racial or ethnic identity, age group, and dual eligibility status.

                      Use and Usability

                      Use and Usability Rating
                      Use and Usability

Concerned about facilities in underserved areas. The stakeholder group's composition does not indicate whether members' perspectives are based on rural or urban experiences, or on living in and providing care to resource-restricted populations.

                      Summary

                      Agree overall with the process measure. 

                      First Name
                      Patricia
                      Last Name
                      Merryweather-Arges

                      Submitted by Pat Merryweath… on Sat, 01/20/2024 - 12:44

                      Permalink

                      Importance

                      Importance Rating
                      Importance

This is an important measure, as Chicago followed New York's lead in identifying the lack of call-backs, and of successful call-backs. The variation among Black communities was low, but the mortality rate was higher than in the white population. There were issues in the testing centers associated with a lack of call-backs and with call-backs not occurring in a timely manner. While hospitals performed quite well, some of the outpatient centers and city-operated centers performed poorly in terms of call-backs.

                       

If there are variations in mortality among populations, it is helpful for performance improvement to assess the call-back outcomes, which are really processes of care.

                       

Equal Hope (formerly the Metro Chicago Breast Cancer Task Force) is the organization that has led the initiative and actually closed the gap by conducting detailed analyses, resulting in some centers closing and others being provided with more support.

                       

                      If there are disparities in communities and among populations, this is a very important measure.

                      Feasibility Acceptance

                      Feasibility Rating
                      Feasibility Acceptance

It is being done, and most centers track their call-backs.

                      Scientific Acceptability

                      Scientific Acceptability Reliability Rating
                      Scientific Acceptability Reliability

                      The systems are established to measure across centers with a high degree of reliability.

                      Scientific Acceptability Validity Rating
                      Scientific Acceptability Validity

                      It has met the validity testing.

                      Equity

                      Equity Rating
                      Equity

Almost all data are provided by gender, age, race, ethnicity, and payer type.

                      Use and Usability

                      Use and Usability Rating
                      Use and Usability

It is a proven approach to increasing performance, and within geographical areas it would be ideal to have public data. Also, as more stand-alone organizations evolve to provide diagnostic services, it would be ideal to be able to track performance and identify any disparities.

                      Summary

                      From my patient perspective and knowing the improvement in breast cancer mortality results from the NY and Chicago experiences, I fully support this measure.

                      First Name
                      Carole
                      Last Name
                      Hemmelgarn

                      Submitted by Carole Hemmelgarn on Sat, 01/20/2024 - 18:20

                      Permalink

                      Importance

                      Importance Rating
                      Importance

There are two sides to the coin regarding the importance of this measure. One is unnecessary radiation exposure; currently, there is no mechanism for capturing cumulative radiation exposure for each individual. The other is the potential for misdiagnosis and patient harm.

                       

Why is Europe's rate 5%, and is there something we can learn from them?

                      Feasibility Acceptance

                      Feasibility Rating
                      Feasibility Acceptance

                      There is no burden with collecting the data.

                       

They did have a TEP, and it had two patients. Patient representation should be higher. They did capture multi-stakeholder opinions; however, was there diversity within this group?

                      Scientific Acceptability

                      Scientific Acceptability Reliability Rating
                      Scientific Acceptability Reliability

Looked at different settings (urban and rural) and sizes of facilities.

                       

Provided sound data, and the literature corresponded.

                      Scientific Acceptability Validity Rating
                      Scientific Acceptability Validity

Validity was met.

                      Equity

                      Equity Rating
                      Equity

                      It would be good to see the ethnic/race data based on rural and urban settings and size of facilities.

                      Use and Usability

                      Use and Usability Rating
                      Use and Usability

                      I am not sure patients will use this data. 

                       

The end goal is to reduce the 5-12% number, but what is the target percentage or number?

                      Summary

I see value in the measure for patients. Reducing radiation exposure is important, as is not missing a potential cancer diagnosis.

                      First Name
                      Talia
                      Last Name
                      Sasson

                      Submitted by Talia Sasson on Sun, 01/21/2024 - 13:35

                      Permalink

                      Importance

                      Importance Rating
                      Importance

While it is important to identify facilities that over- or under-recall patients, the measure does not address how the screening recall rate would be used to improve performance.

                      Feasibility Acceptance

                      Feasibility Rating
                      Feasibility Acceptance

                      No comment

                      Scientific Acceptability

                      Scientific Acceptability Reliability Rating
                      Scientific Acceptability Reliability

                      No comment

                      Scientific Acceptability Validity Rating
                      Scientific Acceptability Validity

                      No comment

                      Equity

                      Equity Rating
                      Equity

                      No comment

                      Use and Usability

                      Use and Usability Rating
                      Use and Usability

The developers aim to use the screening recall rate to improve quality of care. This measure can work when the reasons for rates outside the acceptable zone are facility issues. If the reasons for unacceptable recall rates are population access to health care or the location of the facility (rural vs. urban), it will be very hard to improve results. How do the developers aim to address these issues?

                      Summary

I think that measuring breast screening quality is important. Breast cancer screening recall rates could be a good tool to measure quality, but the measure can place unnecessary burden on facilities where poor rates are due to location, population, etc.

                      First Name
                      Kent
                      Last Name
                      Bream

                      Submitted by Kent Bream M.D. on Sun, 01/21/2024 - 18:13

                      Permalink

                      Importance

                      Importance Rating
                      Importance

This is a new measure under PQM. However, it is a respecification of the "Mammography Follow Up Rates" measure under OQR. It is evaluated under the initial/new rather than the maintenance standard. The submission, however, relies significantly on the previous existence of the OQR measure.

                       

A logic model is provided that describes the pathway to biopsy and surveillance. The clinical, patient-oriented goal would be accurate identification of mammographic abnormalities. As an initial/new submission, it would be important to have clarity on whether this is simply a consensus statement/guideline process measure or a patient-oriented clinical measure.

                       

The information provided assumes that there is a correct rate of recall in a population (5-12%, based on professional guidelines) regardless of the prevalence of breast abnormalities within the population. This promotes precision and homogeneity in high-performing entities but does not, prima facie, support clinical accuracy or affect under- or over-utilization. In low-risk populations, recall would be high. In high-risk populations, one could conclude that recall is too low. This is problematic because maximizing diagnosis while minimizing injury in breast cancer is critically important.

                       

The developer presents performance gaps, which are only required for maintenance submissions. The developer provides a table with the current distribution of performance across 3,391 facilities. While the developer uses the appropriate recall level of 5-12%, the gaps are reported in deciles, with 4.9% and 13.0% as the closest deciles. This does not allow precise assessment of the gap relative to the suggested 5-12%. In addition, the high-recall-rate category (over 13%) has a mean performance level of 13.7%, which demonstrates a measurable but not clearly meaningful performance gap in overutilization based on consensus guidelines. Similarly, the "denominator count" for less than 4.9% seems to be 171/3,307,860 patients, or 0.005%. The narrative provides facility counts that do not seem to appear in the data presented. Clarity in performance measures, and maintaining consistent categories and variable definitions, would help in interpreting importance.

                       

The developer also provides a gap analysis by sex, race, age, and dual eligibility as a proxy for SES. They describe these data, however, as unreliable. In addition, while there are differences by race, all performance is within the recommended range (5-12%), so there would be no gap in performance by race, because this is a binary yes/no measure of quality. The 5-12% range is not a scaled measure of quality.

                       

This was also seen with SES and those eligible for both Medicare and Medicaid. Younger patients (18-34, 35-44), as a broad category, have a higher rate of recall, potentially creating risk based on the importance of the measure, but these patients do not have an indication for screening mammography and likely are receiving diagnostic mammography based on a complaint or an issue (though patients 40-44, for whom no subgroup analysis was provided, have the option of screening). Clearly delineating the difference between screening and diagnostic mammography and follow-up is important. Males, and women under 40, could be excluded because they are receiving diagnostic mammography based on a complaint in almost all cases.

                       

                      The developer does not comment on the adequacy of existing measures.

                       

                       

                      Feasibility Acceptance

                      Feasibility Rating
                      Feasibility Acceptance

The developer evaluated the perception of feasibility with 32 people, only one of whom was a patient. The description of how these 32 people were sampled is not evident, though it may be available in some supporting material. It is not clear whether these 32 were representative of clinicians, system administrators, payors, and patients. The conclusion was that the measure is feasible, though 25% were undecided or disagreed that it was feasible. No adjustments to feasibility were made, and no costs or requirements were described. In another section, one adjustment is described as changing "follow-up" to "recall".

                      Scientific Acceptability

                      Scientific Acceptability Reliability Rating
                      Scientific Acceptability Reliability

The reliability method is not fully described. This is currently proposed as a process measure at the facility level, but much of the justification treats it as a clinical measure at the patient level. That may require chart abstraction rather than the use of claims data (though claims data could be used with the ICD-10 code for abnormal mammography findings).

                       

The data presented in Table 6 do not describe misclassification or the possibility of such. Misclassification would not simply be about performance between entities but about the prevalence of abnormal mammography findings in the population presenting at each facility.

                       

The beta-binomial model may be a good model, but it is not clear how it is being used here.
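Neither this excerpt nor the staff assessment spells out how the beta-binomial model was applied. For context, the sketch below is only an illustration of the signal-to-noise formulation that facility-level reliability testing of this kind typically uses (per-facility reliability = between-facility variance divided by between-facility plus within-facility sampling variance). The data, function names, and fitting routine are hypothetical and are not the developer's code.

# Illustrative sketch only; assumes the common beta-binomial signal-to-noise
# approach, not the developer's actual implementation.
import numpy as np
from scipy.optimize import minimize
from scipy.stats import betabinom

def fit_beta_binomial(recalls, screens):
    # Maximum-likelihood fit of beta-binomial parameters (alpha, beta) to facility counts.
    def neg_log_lik(log_params):
        a, b = np.exp(log_params)  # optimize on the log scale to keep parameters positive
        return -betabinom.logpmf(recalls, screens, a, b).sum()
    result = minimize(neg_log_lik, x0=np.log([2.0, 20.0]), method="Nelder-Mead")
    return np.exp(result.x)

def snr_reliability(recalls, screens):
    # Per-facility reliability = between-facility variance / (between + within variance).
    a, b = fit_beta_binomial(recalls, screens)
    var_between = (a * b) / ((a + b) ** 2 * (a + b + 1))  # variance of the fitted beta prior
    p = recalls / screens                                  # observed facility recall rates
    var_within = p * (1 - p) / screens                     # binomial sampling variance
    return var_between / (var_between + var_within)

# Toy example: three hypothetical facilities with different volumes and recall rates.
recalls = np.array([40, 9, 300])
screens = np.array([400, 150, 2500])
print(snr_reliability(recalls, screens))

Under this formulation, larger facilities have smaller sampling variance and therefore higher reliability, which is consistent with the decile pattern the staff assessment describes; whether the developer used this exact approach is not stated in the materials.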

                      Scientific Acceptability Validity Rating
                      Scientific Acceptability Validity

The narrative description of the validity methods focuses on the consensus of 32 individuals. It is not clear in the materials that these 32 individuals represent the broad stakeholders in breast cancer care. While a focus group can be used for face and likely construct validity, it is not clear that the current claims-based measure is addressing the need for appropriate clinical follow-up without overuse. The developer suggests that using the administrative cancer detection rate may have value but does not follow up on this suggestion.

                       

The developer argues that no risk adjustment is necessary. This may be true if the outcome is simply "recall". However, recall is being used as a proxy measurement for "appropriate" recall. In that case, adjusting for risk in the population served by a facility would be appropriate. If there is a high risk of cancer in the population, then the appropriate recall rate may be higher and even outside the suggested 5-12% range. Similarly, in a low-risk population, not adjusting the pass threshold for the measure would unnecessarily increase exposure to radiation or follow-up testing (biopsy).

                      Equity

                      Equity Rating
                      Equity

                      This is an optional measure. Feedback is provided for future improvement.

                       

The developer suggests that equity was considered within the performance gap data. While statistical testing indicates a statistically significant difference in recall rates, there is no indication whether this is clinically significant, appropriate based on risk, or equitable/inequitable. It also tests recall on a scaled basis rather than on the binary (yes/no within 5-12%) basis toward which the measure is geared. As described in the performance gaps, the subpopulations most at risk quantitatively are males and 18-34-year-old females. This quantitative inequity does not follow a clinical or equity logic model.

                      Use and Usability

                      Use and Usability Rating
                      Use and Usability

                      Use and usability is minimally described. 

                       

                      There are no identified suggestions for improving performance. 

                       

                      This is a previously existing measure under a different name and program.

                      Summary

This is an important issue, with significant advocacy around improving care and minimizing harm. The previously existing measure, and the measure proposed in this submission, continue a proxy measure that uses a range without regard to clinical indication or the population prevalence of breast cancer. While a measure should be implemented, it is not clear that perpetuating the past measure will impact quality of care for women with abnormal findings on breast cancer screening.

                      First Name
                      Gregary
                      Last Name
                      Bocsi

                      Submitted by Greg Bocsi on Mon, 01/22/2024 - 03:44

                      Permalink

                      Importance

                      Importance Rating
                      Importance

                      The information provided sufficiently supports the importance of a measure.

                      Feasibility Acceptance

                      Feasibility Rating
                      Feasibility Acceptance

                      Information concerning feasibility points to it being satisfactory.

                      Scientific Acceptability

                      Scientific Acceptability Reliability Rating
                      Scientific Acceptability Reliability

Reliability looks good. Of course, understanding the outlier would be helpful, but it wouldn't discount the overall reliability observed.

                      Scientific Acceptability Validity Rating
                      Scientific Acceptability Validity

                      Face validity results are acceptable.  

                      Equity

                      Equity Rating
                      Equity

                      Appreciate the equity analysis; however, it would be a stretch to conclude that this measure helps address inequities in health care.

                      Use and Usability

                      Use and Usability Rating
                      Use and Usability

Comparing oneself to a target range can be helpful for identifying an opportunity for improvement. However, upon closer inspection one might find that one's recall rate is entirely appropriate even though it happens to be outside the target range. What types of actions are suggested to improve performance in that scenario? Will consulting reports available on CMS QualityNet or working with their QIN-QIO help to "fix" a recall rate outside the target range when the rate is appropriate given the patients served by a hospital during a measurement period?

                      Summary

It's fine to have a target range and to investigate when one falls outside that range, but you wouldn't want to be unfairly judged anytime you were outside the expected range. This measure would be more useful if it allowed for explainable performance outside of the goal, or had further criteria specifying the characteristics of a hospital for which the goal is appropriate.

                      First Name
                      Karen
                      Last Name
                      Johnson

                      Submitted by Karen Johnson on Mon, 01/22/2024 - 09:44

                      Permalink

                      Importance

                      Importance Rating
                      Importance

The developer should summarize the evidence for potentially negative consequences if the recall rate is either too high or too low (i.e., overuse of imaging/biopsy or less accurate/timely cancer diagnosis), not just state them. This information may have been included in the supplementary materials, but if so, it was not obvious.

                       

The developer should solicit patient input about the meaningfulness of the facility recall rate for patients.

                      Feasibility Acceptance

                      Feasibility Rating
                      Feasibility Acceptance

                      The measure is calculated using Medicare FFS procedure code data (outpatient and carrier).  

                      Scientific Acceptability

                      Scientific Acceptability Reliability Rating
                      Scientific Acceptability Reliability

Testing data were adequate, with good representation of facilities/patients. The developer conducted a signal-to-noise analysis, with reliability at .92 overall and high (>.8) across all deciles.

                      Scientific Acceptability Validity Rating
                      Scientific Acceptability Validity

Face validity assessment was done systematically, with transparent methods and results. The quality question asked of the TEP was not as explicit as it could have been. Results were fair (71% agreed the measure reflects care quality, or 79% if you drop the undecided/don't know responses from the calculation). The decision not to risk-adjust seems reasonable, although I wondered about the much higher rates for younger women (although these made up a relatively small percentage of women in the testing dataset).

                      Equity

                      Equity Rating
                      Equity

It is good to see stratification by at least some variables, although I'm not sure how to interpret the results given that the differences (for the most part) still fall within the guideline range. It would have been nice to see a rural/urban breakdown.

                      Use and Usability

                      Use and Usability Rating
                      Use and Usability

I believe it has been adopted into the Hospital OQR Program (although that isn't completely clear from the online information presented). A fairly large proportion of the TEP thinks it will be useful for decision-making and improving quality. It is a new measure, so developers were not required to respond to questions about feedback, improvement, or unintended consequences of the measure.

                      Summary

For the most part, this measure appears to meet the requirements for initial endorsement. However, the developer should provide more evidence to support the measure and also solicit patient input about the meaningfulness of the facility recall rate for patients.

                      First Name
                      AC
                      Last Name
                      Comiskey

                      Submitted by Ashley Comiskey on Mon, 01/22/2024 - 11:51

                      Permalink

                      Importance

                      Importance Rating
                      Importance

The developer does not provide direct patient input on this measure. However, it does cite evidence of the negative consequences associated with additional screening, which include anxiety and stress for patients, as well as the risk of a delayed cancer diagnosis.

                      Feasibility Acceptance

                      Feasibility Rating
                      Feasibility Acceptance

Utilizes claims data, which are required and routinely generated for patient encounters. There was 75% agreement that the measure will not place an undue burden on hospitals to collect the data.

                      Scientific Acceptability

                      Scientific Acceptability Reliability Rating
                      Scientific Acceptability Reliability

Measure score reliability testing (accountable-entity-level reliability) was performed. The vast majority of facilities have a reliability that exceeds the accepted threshold of 0.6, with only one facility below the threshold (minimum of 0.41). The sample size for each year and accountable entity analyzed is sufficient.

                      Scientific Acceptability Validity Rating
                      Scientific Acceptability Validity

Not risk-adjusted, as it is a process measure.

                      Equity

                      Equity Rating
                      Equity

                      Developer conducted sufficient assessment of equity for sex, race/ethnicity, age, and dual eligibility status.

                      Use and Usability

                      Use and Usability Rating
                      Use and Usability

There is no explicit articulation of the actions measured entities could take to improve performance, but in their comments the developer listed sources facilities can consult to develop a QI initiative.

                      Summary

                      N/A

                      First Name
                      Hannah
                      Last Name
                      Ingber

                      Submitted by Hannah Ingber on Mon, 01/22/2024 - 11:59

                      Permalink

                      Importance

                      Importance Rating
                      Importance

This is 99% met, but I agree with the staff assessment concern about limited patient engagement on importance. Although the developer doesn't provide input from patients in this section, I'd be curious to hear whether the developer got any information on the measure's importance from the face validity testing and use survey they did.

                      Feasibility Acceptance

                      Feasibility Rating
                      Feasibility Acceptance

From Table 4 it looked like six people responded either Disagree or Strongly Disagree. I assume the two others did not provide any feedback and so were not mentioned? Assuming this is the case, and because 75% agreed, there are no major concerns, so I would consider this met.

                      Scientific Acceptability

                      Scientific Acceptability Reliability Rating
                      Scientific Acceptability Reliability

Met based on the same rationale as the staff assessment, but one question on reliability: why are the facility and denominator counts n/a for the 0.41 reliability minimum? Perhaps this would answer the questions about the outlier noted in the staff assessment.

                      Scientific Acceptability Validity Rating
                      Scientific Acceptability Validity

                      no concerns

                      Equity

                      Equity Rating
                      Equity

Although met for the purposes of this measure as specified, I have concerns that this Medicare FFS measure will not capture many of the patients in populations at risk for receiving inequitable recall rates or the care associated with this process. What proportion of patients isn't captured by this measure, for example? That would be a limitation on this measure's ability to assess inequities. However, that is outside the scope of this measure evaluation. I would encourage the developer to explore options for assessing other populations in the future.

                      Use and Usability

                      Use and Usability Rating
                      Use and Usability

I think the developer makes points about how facilities with rates that are too low or too high might interpret their results. However, this is most clearly written in the Importance section. I think for this new measure the standards are met, though.

                      Summary

                      No major concerns but some areas where clarification or more information would be helpful. 

                      First Name
                      Anne
                      Last Name
                      Llewellyn

                      Submitted by Anne M Llewellyn on Mon, 01/22/2024 - 12:34

                      Permalink

                      Importance

                      Importance Rating
                      Importance

This seems like an important measure for all who get preventive and diagnostic mammograms, and further testing when a lump is discovered.

                      Feasibility Acceptance

                      Feasibility Rating
                      Feasibility Acceptance

From the data presented, it is feasible to do this, and it will provide important data for clinical staff to know.

                      Scientific Acceptability

                      Scientific Acceptability Reliability Rating
                      Scientific Acceptability Reliability

It seems like the scientific reliability rating does not totally agree with the data submitted. As more data come in, a standard of care can be written to discuss variations in care.

                      Scientific Acceptability Validity Rating
                      Scientific Acceptability Validity

It seems like more data are needed to convince the scientific community that the measurement is needed.

                      Equity

                      Equity Rating
                      Equity

                      Equity seems to be covered well in the data presented.

                      Use and Usability

                      Use and Usability Rating
                      Use and Usability

Hopefully, the measure will increase use and usability. This is important because people who have abnormal mammograms often wait a long time for further testing and can then have advanced disease.

                      Summary

                      This is an important standard for improving follow-up care for breast cancer screening. Getting this information out to physicians, radiologists, and other stakeholders is important for the benefit of the patient and care coordination. 

                      First Name
                      Jean-Luc
                      Last Name
                      Tilly

                      Submitted by Jean-Luc Tilly on Mon, 01/22/2024 - 13:37

                      Permalink

                      Importance

                      Importance Rating
                      Importance

                      The scores describe a considerable performance gap, but crucially a remedy for underperforming facilities is not apparent.

                      Feasibility Acceptance

                      Feasibility Rating
                      Feasibility Acceptance

                      No meaningful issues.

                      Scientific Acceptability

                      Scientific Acceptability Reliability Rating
                      Scientific Acceptability Reliability

                      Outlier noted does not concern me, median reliability excellent, and even decile 1 statistics at .81 are great.

                      Scientific Acceptability Validity Rating
                      Scientific Acceptability Validity

                      Face validity concerns addressed, although I don’t understand the rationale for removing MRI as a follow-up imaging modality and why that wasn’t accepted. I would be happy to hear more discussion on this.

                      Equity

                      Equity Rating
                      Equity

                      Assessment of equity was conducted, and measure would more likely than not address inequities in healthcare (by race and insurance status), although this could have been demonstrated with stronger statistical evidence.

                      Use and Usability

                      Use and Usability Rating
                      Use and Usability

                      It would be really helpful to understand what interventions would lead to an improvement in performance, and understand from research or a case study what implementing one of those interventions would look like.

                      Summary

                      This is a well-designed “balancing” measure focused on improving rates of diagnosis of breast cancer while reducing burdensome overuse. The performance gap established by the data to date is considerable. 

                      First Name
                      Jill
                      Last Name
                      Blazier

                      Submitted by Jill Blazier on Mon, 01/22/2024 - 15:28

                      Permalink

                      Importance

                      Importance Rating
                      Importance

                      The developer cites evidence and guidelines from the American College of Radiology, which highlight the potentially negative consequences if the mammography and digital breast tomosynthesis (DBT) recall rate is either too high or too low. The developer posits that this measure will "potentially reduce radiation received from mammography or DBT that may induce more cancers in younger people or those carrying deleterious gene mutations, as well as decreasing unnecessary imaging and biopsies.”

                       

The developer notes, however, that there are inconsistencies in the target recall rate for breast cancer screening, although guidelines from the American College of Radiology recommend a target recall rate for mammography screening between 5% and 12%. The committee should consider whether the range in recall rates is appropriate and perhaps have the developer further justify the benefits versus harms associated with that range.

                      Feasibility Acceptance

                      Feasibility Rating
                      Feasibility Acceptance

The measure is currently being used in the CMS Outpatient Imaging Efficiency program. The source data are entirely electronic and submitted on claims.

                      Scientific Acceptability

                      Scientific Acceptability Reliability Rating
                      Scientific Acceptability Reliability

                      The measure specifications are well-defined and precise. The measure could be implemented consistently across organizations and allows for comparisons.

                      Scientific Acceptability Validity Rating
                      Scientific Acceptability Validity

                      The developer conducted face validity testing of the measure score via a qualitative survey of a multi-stakeholder group of 32 individuals. In total, 25 physicians of various specialties, 5 healthcare administration or management staff, 1 patient, and 2 caregivers responded to the survey (which contained questions about measure face validity, feasibility, and usability).

                       

The developer states that this group reached a consensus that guidance for the measure should include a target screening recall rate of 5% to 12%, in alignment with American College of Radiology guidelines. Seventy-five percent of respondents supported the measure's intent, and 71% strongly agreed or agreed that the measure addresses quality of care.

                      Equity

                      Equity Rating
                      Equity

Disparities are evaluated by analyzing differences in performance scores by sex, race/ethnicity, age group, and dual eligibility, and the developer found significant overall differences (chi-square tests) for each factor.

                      Use and Usability

                      Use and Usability Rating
                      Use and Usability

Developers plan for the measure to be used in CMS's Hospital Outpatient Quality Reporting (HOQR) Program, a pay-for-quality program. The majority of a multi-stakeholder group agreed the measure could be used by entities for QI and decision-making (77.4%) and that it would provide consumers and providers with actionable information (80.6%), and the developer recommends facilities work with their QIN-QIO or consult resources available through QualityNet to develop a QI program.

                      Summary

I believe this is a good measure that has an opportunity to improve patient outcomes. Determining the appropriate level of follow-up exams while balancing that level against the potential downsides of over-follow-up is an important quality improvement objective for diagnostics. It goes without saying that patients do not want a possible breast tumor ignored and not further evaluated, but patients also do not respond positively when asked for follow-up that may not be necessary. Finding an appropriate balance, reimaging patients who need it, and limiting overexposure to radiation are excellent goals for which hospitals should strive. Extending this measure to the HOQR program with the target of 5%-12% performance is a good way for organizations to begin to find this balance and home in on the skills needed to optimize care for this sensitive population.

                      First Name
                      Matt
                      Last Name
                      Austin

                      Submitted by Matt Austin on Mon, 01/22/2024 - 15:36

                      Importance

                      Importance Rating
                      Importance

                      The developer did not provide direct patient input on this measure. However, the developer does cite evidence of the negative consequences associated with additional screening, which include anxiety and stress for patients as well as the risk of a delayed cancer diagnosis.

                       

                      There are inconsistencies in the target recall rate for breast cancer screening. Guidelines from the American College of Radiology recommend a target recall rate for screening mammography between 5% and 12%, whereas in Europe the International Agency for Research on Cancer sets a target recall rate of 5%. The developer does not describe the actions providers can take to ensure appropriate recall rates.

                      Feasibility Acceptance

                      Feasibility Rating
                      Feasibility Acceptance

                      The measure is calculated using claims data, so it is very feasible.

                      Scientific Acceptability

                      Scientific Acceptability Reliability Rating
                      Scientific Acceptability Reliability

                      Not all reliability scores met the 0.6 threshold.
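
                      For readers unfamiliar with that threshold: facility-level reliability for a proportion measure is often summarized as a signal-to-noise ratio, with 0.6 a commonly cited acceptability cutoff. The sketch below shows the arithmetic for a single hypothetical facility, assuming the between-facility variance has already been estimated; the numbers are invented, and this is not necessarily the developer's exact method.

                          # Illustrative signal-to-noise reliability for a proportion measure at one facility.
                          # The variance estimate and counts are hypothetical; this is not necessarily the
                          # developer's exact method.

                          def signal_to_noise_reliability(p, n, between_facility_var):
                              """Reliability = between-facility variance / (between + within)."""
                              within_facility_var = p * (1 - p) / n   # binomial sampling variance
                              return between_facility_var / (between_facility_var + within_facility_var)

                          # Hypothetical facility: 9% recall rate on 200 screenings, with an assumed
                          # between-facility variance of 0.0004 (a standard deviation of 2 percentage points).
                          print(signal_to_noise_reliability(p=0.09, n=200, between_facility_var=0.0004))  # ~0.49, below 0.6

                          # Larger denominators shrink the within-facility (noise) variance and raise the score.
                          print(signal_to_noise_reliability(p=0.09, n=600, between_facility_var=0.0004))  # ~0.75, above 0.6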

                      Scientific Acceptability Validity Rating
                      Scientific Acceptability Validity

                      Face validity survey using a multi-disciplinary panel.

                      Equity

                      Equity Rating
                      Equity

                      Disparities are evaluated by analyzing differences in performance scores by sex, race/ethnicity, age group, and dual-eligibility status, and the developer found statistically significant overall differences (chi-square tests) for each factor.

                      Use and Usability

                      Use and Usability Rating
                      Use and Usability

                      The plans for how the measure will be used are not clear.

                      It is not clear how HOPDs would know that they need to improve (e.g., is 10% a good rate?).

                      Summary

                      Comments on 4220

                      First Name
                      Helen
                      Last Name
                      Haskell

                      Submitted by Helen W Haskell on Mon, 01/22/2024 - 22:32

                      Importance

                      Importance Rating
                      Importance

                      This is an important measure to have as a balancing measure to tracking screening rates alone. I do have concerns about the population being measured. The USPSTF recommends screening for women aged 40-74. Medicare beneficiaries overlap only slightly with this population, meaning that a sizeable proportion of patients included in this measure are likely to be high-risk or being imaged for diagnostic purposes. To me, this raises questions about the import of the results. Men, for example, and women younger than the recommended screening ages are likely to have been referred for suspicion of anomalies. Not surprisingly, these groups, though small, show a high rate of recall. It would also be helpful to know the outcomes of the recalls: how many patients continue on to treatment, and how many turn out to be benign?

                      Feasibility Acceptance

                      Feasibility Rating
                      Feasibility Acceptance

                      This simple measure does not present a burden.

                      Scientific Acceptability

                      Scientific Acceptability Reliability Rating
                      Scientific Acceptability Reliability

                      Agree with staff assessment.

                      Scientific Acceptability Validity Rating
                      Scientific Acceptability Validity

                      Agree with staff assessment.

                      Equity

                      Equity Rating
                      Equity

                      This measure presents an opportunity to track variability by patient group; differences across patient groups may explain some of the wide variability noted.

                      Use and Usability

                      Use and Usability Rating
                      Use and Usability

                      More discussion is needed about how findings will be addressed. It is a concern that detection of variability or clusters due to geographic, cultural, or socioeconomic factors could be discouraged or misinterpreted.

                      Summary

                      Overuse and underuse of breast cancer screening are important subjects to address, but this measure may be taking a bit of a broad brush to the problem.