Endorsement and Maintenance (E&M) Guidebook

Comment Status

Closed

Comment Period

Sat, Jul 1 - Mon, Jul 31 2023

File

Del-3-6-Endorsement-and-Maintenance-Guidebook-Final.pdf

Comments

Acumen, LLC

Which aspect(s) of the guidebook are you commenting on?

General E&M processes and policies

Committee structure and voting outcomes

Accountable-entity level testing guidance - reliability and validity

Measure use – accountability application characteristics

PQM Measure Evaluation Rubric

Appeals

Other

Acumen appreciates the opportunity to comment on the proposed Endorsement and Maintenance (E&M) Guidebook. We applaud Battelle’s effort in making the consensus-based evaluation of measures more rigorous and streamlined. With that spirit, Acumen offers 25 comments from the measure developer perspective, which is rooted in our experience as a developer of many quality and cost measures that are currently in use across Centers for Medicare & Medicaid Services’ (CMS) programs.

1. Overlapping in Scopes of E&M and PRMR

From the measure developer perspective, the existence of two separate but overlapping processes to evaluate measures is extremely burdensome. In an ideal world, there should be only one process that holistically evaluates measures. We understand that there are regulatory and contractual requirements for having two processes, we recommend the following to harmonize the two processes and to reduce the burden on committee members:

Measures that already received a recommendation from PRMR or MSR should be automatically endorsed because the meaningfulness (PRMR) or impact (MSR) criterion completely overlaps with all E&M criteria and there will likely be significant overlap in committee membership.
Measures that already received endorsement should only be evaluated on criteria other than meaningfulness (PRMR) or impact (MSR) because both of these completely overlap with all E&M criteria and there will likely be significant overlap in committee membership.

2. Objectives of Endorsement

We disagree that the goal of endorsement should be to ensure that measures are “safe and effective” (page 3). The actions by measured entities in response to the measurement are what determine safety and efficacy of the act of measurement, not the measure itself. For example, a clinically valid and scientifically acceptable measure that quantifies disease detection rate may be unsafe and ineffective in a program that does not have other measures that also measure appropriate use or adverse effect of diagnostic testing. However, when the same measure is implemented alongside an appropriate use measure or a cost measure, it can be safe and effective. Therefore, safety and effectiveness cannot be evaluated in a vacuum without the policy context of how a measure will be implemented. We strongly recommend establishing that the objective of endorsement is scientific acceptability. In other words, safety and efficacy should be evaluated by the PRMR process instead and should not be included in the E&M process.

3. Appeal - Lack of Actionable Feedback Provided to Measure Developers Post-Review is an Acceptable Ground for Appeal

While the guidebook states that Battelle will provide a meeting summary, a final technical report, the endorsement meeting discussions, final voting results, public comments received, and any dissenting views after each cycle (page 20), the information must be granular enough for measure developers to take actions to improve the measures or address the concerns during subsequent appeals. For example, the feedback should not be as vague as the committee found the evidence to be inadequate without outlining the deficiencies. We recommend that the rationale for each final decision must point out deficiencies in the submission or the measure specifications, and the failure to identify deficiencies in the submission or the measure specifications is an acceptable ground for appeal.

4. Importance Criterion - Requirement of Systematic Review Requirement

The Importance criterion requires the measure developers to submit a systematic review or cite a systematic review (pages 31-32). While systematic review is ranked highly on strength of the evidence, it is neither the strongest form of evidence nor the only form of strong evidence. Other strong forms of evidence include clinical practice guidelines, randomized controlled trials, scoping reviews, and cohort studies. We recommend rewriting to ask for the most up-to-date evidence without a hard requirement on study type instead.

5. Importance Criterion – Failure if Empirical Evidence Includes Only Selected Studies

We disagree that the Importance criterion is not met if “Empirical evidence includes only selected studies” (page 31). All reviews include only selected studies and no review includes all studies. We recommend rewriting that “the Importance criterion is not met if the reviewer identifies important studies that have been omitted in the submission that may reasonably change the conclusion about the importance of the measure”. In other words, the omission must also be accompanied by a potential change in the narrative of the importance of the measure in order to fail a measure.

6. Importance Criterion – Requirement of a Net Benefit

We disagree with the requirement of evidence of a net benefit to measurement, which is a causality dilemma where it is not possible to ascertain a net benefit without measurement. This is contrary to the desire to better understand the healthcare system and identify areas for improvement. We recommend deleting this bullet.

Additionally, it is unclear what “net benefit” is referring to in relation to the conclusions of the systematic review, a requirement that we also disagree with (pages 31-32). There are at least two ways of interpreting “net benefit”: (1) the net benefit of the measure itself or (2) the net benefit of a particular component of the logic model. The former interpretation is problematic because it makes an implicit assumption that the measure itself is an intervention that must be shown to have a net benefit. This implicit assumption introduces a causality dilemma, where it is not possible to ascertain a net benefit without first measuring something. If the latter interpretation is what Battelle intended, we recommend rewriting the related bullets in Not Met and Met headings.

7. Importance Criterion – Requirement of a Performance Gap

We disagree that the Importance criterion is not met if “There is low confidence/certainty that there is evidence of a performance gap” (page 31). There may be circumstances where a measure is topped-out currently but it is still important to measure to ensure that the current trend does not deteriorate, or that there is a performance gap in a small subset of measured entities. Without such measurement, it would be impossible to detect worsening trends early or health inequity. We recommend removing the requirement of a performance gap as this is more appropriate for PRMR to examine.

8. Importance Criterion – Requirement of Existing Measures or Programs

We disagree that the Importance criterion is not met if “there is no description of other existing measures or programs or no search conducted to identify other existing measures or programs” (page 31). The existence of this bullet means that a measure cannot be endorsed without the program context. Is it the intention to provide endorsement for use in a specific program? An unintended consequence of requiring justification for a specific program is that the endorsement is a de-facto endorsement for use in that specific program, but the developer may implement the measure in a different program with a vastly different policy context. Similar to comment 1, there must be a clearer distinction between the purpose of endorsement versus PRMR. We recommend removing all requirements related to fit for a program from all endorsement criteria.

9. Scientific Acceptability Criterion – Reliability Threshold

We disagree that the Scientific Acceptability criterion is not met if the reliability metric is less than the stated thresholds (page 36). This criterion makes an implicit assumption that reliability alone can differentiate a good measure from a bad measure. Previous research has highlighted two drawbacks of using reliability metric as a sole benchmark for differentiation:

The variation between providers can be due to incomplete risk adjustment of patient risk, in addition to variation in the underlying performance of providers.¹A high-reliability result can either signal incomplete risk adjustment or a measure’s ability to differentiate providers’ performance. Measures with low-reliability result can still be useful in identifying extreme providers.
Reliability is not just a property of the measure, but also of the population that it is measuring, especially if there are systematic differences in patient characteristics among providers^2-3.

Therefore, difference in performance is not the only driver of reliability and reliability alone is not enough to determine scientific acceptability or inacceptability.

More problematically, the reliability thresholds do not specify whether they are applied for the whole testing sample, any point within the distribution, or at a certain minimum number of qualifying events. This creates the possibility for the reviewers to be subjective and inconsistent across measures.

[1] Kalbfleisch, John D., Kevin He, Lu Xia, and Yanming Li. "Does the inter-unit reliability (IUR) measure reliability?." Health Services and Outcomes Research Methodology 18, no. 3 (2018): 215-225.

[2] Ohlssen, David I., Linda D. Sharples, and David J. Spiegelhalter. "A hierarchical modelling framework for identifying unusual performance in health care providers." Journal of the Royal Statistical Society: Series A (Statistics in Society) 170, no. 4 (2007): 865-890.

[3] Spiegelhalter, David, Christopher Sherlaw‐Johnson, Martin Bardsley, Ian Blunt, Christopher Wood, and Olivia Grigg. "Statistical methods for healthcare regulation: rating, screening and surveillance." Journal of the Royal Statistical Society: Series A (Statistics in Society) 175, no. 1 (2012): 1-47.

10. Scientific Acceptability Criterion – Establishing a Reliability Threshold is a Task for the Scientific Method Panel

We notice that Battelle has increased the reliability thresholds compared to the thresholds that were under consideration by your predecessor. We believe that the Scientific Method Panel (SMP) gathered by your predecessor did not reach consensus on appropriate reliability thresholds. Therefore, until the new SMP gets a chance to provide inputs, the default threshold should be 0.4 as established through rulemaking.⁴ Increasing the reliability thresholds without inputs from the SMP and the public constitutes a by-pass of the consensus building process, which we believe is against the spirit of the consensus-based entity.

[4] CMS, “Medicare Program; CY 2022 Payment Policies Under the Physician Fee Schedule and Other Changes to Part B Payment Policies; Medicare Shared Savings Program Requirements; Provider Enrollment Regulation Updates; and Provider and Supplier Prepayment and Post-Payment Medical Review Requirements,” 86 FR 64996-66031.

11. Scientific Acceptability Criterion – Performance of Risk Adjustment Model

We disagree that the Scientific Acceptability criterion is not met if results of the risk adjustment model “do not demonstrate acceptable model performance” (page 36). This creates the possibility for the reviewers to be subjective and inconsistent across measures without a definition of acceptable model performance. We recommend removing requirements for the risk adjustment model to “demonstrate acceptable model performance” (page 36), unless a definition of acceptable model performance is established. Should such a definition be established, we further recommend an additional public comment period so measure developers/stewards can provide input on the definition. Similar to our comments on the reliability thresholds, the explained variance metric alone is also not appropriate to differentiate a good measure from a bad measure.

12. Scientific Acceptability Criterion – Face Validity

There is insufficient information on the criteria to determine whether face validity is “adequate” (pages 35-36). This creates the possibility for the reviewers to be subjective and inconsistent in their review. We recommend:

Convening the SMP and holding a public comment period to seek input on evaluation criteria of face validity
In absence of evaluation criteria of face validity, all face validity submissions should be considered as adequate

13. Equity Criterion

We disagree that a measure must “contributes to efforts to address inequities in health care” in the Equity criterion (page 37). Measurement alone may contribute to such efforts by shining a light on any inequity. The same measure may advance or worsen health equity depending on the health policy context at any given time, and the response of the measured entities to the measurement. Therefore, it is unreasonable to require the developers to show that a measure must contribute to such efforts when the developers are not responsible for health policies or the policies are not known at the time of endorsement. We recommend removing the equity criterion—a measure must “contributes to efforts to address inequities in health care”—because the PRMR process is more suited for evaluating the policy context.

14. Use and Usability Criterion

As currently written (page 38-39), this criterion checks for (1) a plan to use in a specific program in the near future and (2) behavioral change by the measured entities due to the measurement. While we agree with (1), we disagree with (2). Asking for evidence of behavioral change by the measured entities is making an implicit assumption that the measure itself is an intervention. We recommend removing evidence of behavioral change from the use and usability criterion because the PRMR process is more suited for evaluating the policy context.

Additionally, we recommend focusing on the burden of reporting, if any. For example, a clinically valid and scientifically acceptable measure may not be useful if it requires a special diagnostic test or data type that is out of reach for the majority of healthcare organizations due to cost, use of third-party software, or effort required for data collection. In other words, feasibility should be considered instead of evidence of behavioral changes by the measured entities.

15. Endorsement Decision Categories

Table 2 states that each decision category requires >= 75% of votes by E&M committee. It is unclear which is the default decision category should the committee fails to reach 75% in any decision category. We recommend rewriting to state the default decision category in the event of non-consensus. Additionally, a default outcome is also needed in the scenario where the sum of Endorsed and Endorsed with Conditions equals 75% and both categories receive exactly 37.5%.

16. Threshold for Consensus of 75%

We appreciate Battelle’s effort in raising the standard for consensus. However, this proposed voting threshold is a significant deviation from the predecessor’s, which was set at 60%. We see three issues:

We understand that the main justification is rooted in Appendix F that references a measure of consensus, which is set at 95%. However, there is no justification provided for using the threshold of 95% for the measure of consensus and, by extension, the rationale for a voting threshold of 75% is also lacking.
Appendix F uses different categories (“evidence complete and adequate”, “evidence complete nor adequate but will path forward”, and “evidence not complete nor adequate, and no path forward”) than the categories listed in Table 2 (“endorsed”, “endorsed with conditions”, “not endorsed/endorsement removed”). This is important because some decision categories are overlapping in concept that may not constitute disagreement. For example, “endorsed” and “endorsed with conditions” are closer in agreement than between “endorse with conditions” and “not endorsed”. In other words, the calculation in Appendix F used simulation results for a different set of decision categories and failed to consider the relative distance between categories, which can lead to an inaccurate measure of consensus and, by extension, incorrect choice of the voting threshold.
This proposal constitutes a major scientific methodological choice that we believe is best suited for the Scientific Method Panel (SMP) to consider. The consideration of relative distance or agreement between decision categories outlined above is an example of the need for the SMP’s input.

Until the SMP has a chance to review the proposed scientific rationale for the choices proposed and public comments on the SMP’s decision are received, we recommend keeping the existing voting threshold of 60% that was painstakingly discussed and approved by the predecessor.

17. Lack of Dialogue with Measure Developers during the Review Process

We would appreciate additional details regarding the measure steward and developer’s role throughout the 6-month E&M cycle. The guidebook mentions that measure stewards/developers will have a chance to answer questions/provide measure insight during listening sessions. For this process to be successful and transparent, we recommend setting aside time for the measure steward and developers to have a chance to answer questions, provide context, and defend measures during the evaluation meetings.

18. The Risk of Outsized Influence of the Subject Matter Expert

While we support the need for expertise, we are concerned with the unintended consequences of using the subject matter experts (SMEs) as currently described on page 17 and during the webinar. Since the use of an SME signals that the E&M committee or Battelle believes that they do not have sufficient expertise to review a particular measure, the SME becomes the only person with sufficient expertise. This situation essentially gives the SME outsized influence on the E&M committee and undermines the consensus-building process, even if the SME does not vote. We recommend convening a different review panel in the event that the E&M committee or Battelle believes that they do not have sufficient expertise to avoid undermining the consensus-building process.

19. STAR Tool and E&M Forms

With many changes in the process and the Fall 2023 E&M cycle coming soon, we would appreciate having immediate access to the STAR database and the E&M forms in order to prepare our submissions.

20. Confusing Terminology of the E&M Committees

The Foundational and Reconciliation groups were referenced on pages 7 and 19, but it is unclear how they relate or interact with the Advisory (Delphi) and Recommendations (Nominal) groups. We recommend simplifying the grouping terminology to avoid confusion.

21. Timeline for Remediating Issues and Factual Reviews

On page 17, the guidebook mentions that the E&M team will share CBE preliminary assessments, based on the PQM measure evaluation rubric, with developers for a 2-business day, factual review prior to sharing with the E&M committee. The E&M team also requests a similar 2-day timeline for measure developers to address any issues identified during the Completeness Check on page 15. We recommend that the Battelle E&M team provides measure developers more than 2-business days for conducting factual reviews and remediating issues identified from the completeness check. We are requesting for the 2-business day timeline to be extended to a minimum of 1 week. If the timeline cannot be extended to a minimum of 1 week, then we request for the Battelle E&M team to reach out to measure stewards and developers approximately one week before the preliminary assessments and completeness check responses are due, to alert measure developers to upcoming action items.

22. Preliminary Assessments

Page 17 of the guidebook mentions that the E&M team drafts a preliminary assessment for each measure. This preliminary assessment is delivered to measure developers for factual review, and is then sent to E&M committee members to rate measures based on the PQM Measure Evaluation Rubric. The E&M team aggregates and summarizes results and distributes them back to members for review prior to the endorsement meeting. Does the E&M team distribute these ratings to the measure developer team in advance of the measure evaluation meeting as well (this would help measure developers prepare for the measure evaluation meeting)? If yes, how many days/weeks in advance of the measure evaluation meeting does the measure developer receive these materials?

We recommend that Battelle open another public comment period after more details are released.

23. Public Comments

We have two questions related to public comments:

How does the Battelle plan to share public comment feedback with measure stewards and developers?
Will all public comments be available to view, per measure, on the PQM website, or does Battelle plan to send a public comment summary workbook to measure developers?

24. Quorum

We appreciate that the Battelle is ensuring that committee members are present and ensuring quorum at the beginning of the committee meetings and if there is less than 60% attendance, a back-up meeting will be held. Will the E&M team be checking quorum throughout the meetings and what would happen in the event that quorum is lost during the measure discussion? We recommend that Battelle open another public comment period after more details are released.

25. Annual Updates

Would Battelle provide clarification regarding the types of changes that would be required for an annual update? We recommend that Battelle open another public comment period after more details are released.

Thank you for your consideration of these comments. We look forward to reviewing additional details released by Battelle, and welcome an opportunity to provide further comments on the E&M process. Please feel free to reach out to macra-cost-measures-info@acumenllc.com for any questions.

Name or Organization

Acumen, LLC

Committee Functions

Which aspect(s) of the guidebook are you commenting on?

Committee structure and voting outcomes

Ambulatory Care (may need ambulatory care complex and chronic illness committee (or subcommittee) to cover areas roughly correlated with MIPS specialty reporting
Acute Care – Hospitals, ASC, HOPD, ESRD, IPF, Cancer, Hospital at Home, REH
Post Acute Care and End of Life – SNF, HH, Hospice, IRF, LTCH, PACE programs
Home and Community Health and Behavioral Health – I think this is different enough to warrant its own committee
Cost and Efficiency Measures
Pediatric/Adolescent – birth, pediatrics, adolescents and to include measures that affect children that move into adulthood but are often covered by pediatrics (things like cystic fibrosis, sickle cell, etc.)
Adult Ambulatory Care – including primary care, women’s health, prevention and wellness

Name or Organization

CMS

Based on the proposed…

Based on the proposed updates to the Endorsement and Maintenance (E&M) Guidebook the Centers for Disease Control and Prevention's National Healthcare Safety Network have the following comments, concerns, and questions:

1). The developer’s role in the updated measure evaluation process is unclear.

Under the section “Endorsement Committee Review, Novel Hybrid Delphi and Nominal Groups Technique” on page 19 there is no mention of the developer’s role during the endorsement meeting. We suggest adding language to the guide describing a developer’s role during the meeting that explicitly states that developers have an opportunity to provide additional information and clarifications when the committee has questions or misunderstands a piece of information in the measure submission.

The evaluation process is no longer transparent. When consensus is not reached on a criterion during independent review, developers will not know why this decision was made. Developers need to have a clear understanding of why the committee did not reach consensus on a criterion so that they can prepare to speak to the issues during the measure decision meeting. Having Battelle staff aggregate independent review results to determine which criteria are consensus not reached does not allow developers to understand the committee responses that led to this result.

2). While the measure evaluation process has decreased the amount of time needed for a measure to receive an endorsement decision, this increases the workload on both committee members and measure developers while potentially decreasing the contribution of subject matter experts to endorsement decisions.

The new process increases the workload on committee members, specifically the expectation that they will perform independent reviews and ratings of all measures. Since Battelle staff create a preliminary analysis and provide their own rating on the measure criteria, committee members are more likely to heavily rely on staff analysis and ratings, rather than reviewing the measure against the endorsement criteria and working to fully understand the measure submission on their own. We recommend staff do not provide ratings of measures in their analyses as this could lead to groupthink. Additionally, as part of the independent review, committee members are now also evaluating any public comments received on the measures along with the staff analysis and the submission itself. There can often be competing opinions between clinical experts, advocacy organizations, and professional associations. Will committee members be given guidance on how to appropriately consider competing interests?

We are concerned with the new committee categorizations, which will result in a decrease in the number of committees and broader topic areas being covered by each committee. We worry that this does not support committees having the appropriate expertise on the Advisory or Recommendations groups. Bringing in subject matter experts to provide insight on specific measure topic areas, such as renal and cancer measures, without allowing them to vote on the measure’s overall endorsement is not satisfactory. Those with specific expertise in the topic area that the measure addresses should be full voting committee members and should not just be brought in as subject matter experts who are not allowed to cast an endorsement vote. In addition, the role of the subject matter experts is not currently outlined in the guidebook. If kept, the guidelines of their participation should also be outlined in the guidebook.

Additionally, the minimum number of responses needed to determine consensus is 20, but there is no mention of what happens if 20 independent committee member reviews are not received.

The new process increases the workload on developers. We agree that the equity criterion is important; however, the addition of this criterion creates an added area of analysis that developers must complete without any reduction in other criteria requirements, which are already significantly burdensome. Additionally, a measure can still be important, reliable, and valid even if it does not directly address inequity. This should not necessarily prevent the measure’s endorsement. For instance, measures that address healthcare-associated infections (HAI) may not address inequities but are crucial to improve the quality of care all patients receive. We recommend making equity an optional criterion.

3). We are concerned that the committees will not have sufficient overall understanding of the measure evaluation criteria. Past and current committee meetings have shown that committee members do not fully understand how to evaluate the criteria against a measure submission. This lack of understanding regularly occurs with clinicians and measure developers, so it is a real concern that patient and family advocates and representatives on the committees will also not understand the measures, the review criteria, and especially the scientific testing (reliability and validity). Committee members, especially lay members, often benefitted from full committee discussions that were held as part of the former process, and from asking questions of developers and subject matter experts. We would like to better understand the vetting process for committee members, including patient advocates, and to know how they will be trained in the measure evaluation criteria. Additionally, please clarify whether there will be a separate measure evaluation criteria guidebook released later with more detail and specific algorithms to help the committee evaluate the measure against the criteria.

4). Please clarify the differences between the Advisory group and the Recommendations group. It appears that both groups are expected to review and to vote on each measure, and both groups are expected to attend the measure endorsement decision meeting; but the Advisory group is not allowed to discuss the measure or ask questions during this meeting. It seems that having one combined group to review and vote on the measures would better lead to shared comprehension and a more authentic consensus.

5). Please clarify the maintenance schedule: throughout the guidebook there is mention that maintenance review is every three years; however, on page 24 under the “Annual Updates” section it says maintenance review is every five years. We support a five-year maintenance cycle. The work of putting together a submission for endorsement review often begins at least one year prior to the submission deadline. Combined with a six-month process for a measure endorsement decision, a three-year maintenance cycle would mean that half of the maintenance time is spent by developers to prep the new submission. A measure maintenance review every five years would reduce the burden on developers while still ensuring that measures are current.

6). Please confirm when the measure submission questions will be released ahead of a new cycle so that developers can begin preparing their submissions.

Name or Organization

The Centers for Disease Control and Prevention's National Healthcare Safety Network

Comments on PQM Endorsement & Maintenance Process

Which aspect(s) of the guidebook are you commenting on?

General E&M processes and policies

Committee structure and voting outcomes

Measure use – accountability application characteristics

PQM Measure Evaluation Rubric

Committee Structure
There are concerns with the consolidation of committees from the previous consensus-based entity (CBE) process. There may be an increased workload for committee members since there are fewer project areas to distribute measures. Additionally, with broader committees, there may not be enough subject matter experts (SMEs) that adequately understand the various topics on each committee. While Battelle will solicit rotating SMEs based on the type of measures submitted, this process should be more formal than currently stated in the guidebook. Aside from reviewing PQM member expertise, Battelle should reach out to professional organizations that represent the relevant clinical topic area(s). These organizations have insight into important issues, such as unintended consequences, and experience with implementing measures in those clinical areas. Battelle should also consider increasing the target number of clinicians, statisticians, and health researchers on these committees to ensure the clinical methodology expertise is reliable. While the patient experience is necessary and valuable, clinicians are directly impacted by these performance measures.

Furthermore, it is unclear how the project topic areas (primary prevention; initial recognition and management; acute, chronic, surgery, behavioral health; end-of-life; and cost/efficiency) account for collaborative care. For example, depression screening could be considered for both primary prevention and behavioral health. If depression screening measures are evaluated in the behavioral health committee, will primary care physicians have the opportunity to review the measure as well? Primary care physicians would most likely be responsible for screening.

Novel Hybrid Delphi and Nominal Groups (NHDNG) Technique
To increase efficiency, similar measures will be discussed in a group following evaluation and survey results from the Advisory Group. Recommendation Group members then vote on the discussed measures individually. Battelle is using the Novel Hybrid Delphi and Nominal Group (NHDNG) to find areas of disagreement, which will be addressed in the Recommendations Group meetings. This will help to expedite meetings. However, it is unclear whether Recommendations Group members have the opportunity to discuss any areas that have agreement during the meeting. While 80% of the Recommendations Group members may agree on a measure, the other 20% might want to bring up why they disagree with a particular domain. Additionally, public comments may be submitted that should be addressed within areas of agreement.

Voting Procedures
The NHDNG technique will likely reduce the subjectivity of the process. There is support for the higher thresholds for agreement and quorum among the Recommendations Group as opposed to the previous process.

Public Comment
While Battelle plans to provide a 30-day comment window, which is lengthier than the previous CBE process, Battelle should avoid setting public comment deadlines on a weekend. Weekend deadlines provide the illusion of a full 30-day comment period. However, the transparency of the public comments on the PQM website is supported and appreciated. There is support for allowing members to review public comments prior to the E&M meeting as well. This provides members with a more comprehensive review of the measures prior to voting.

Submission Tool and Repository (STAR) Website
Battelle plans to keep the same CBE identification numbers for measures that were endorsed through the previous CBE. There is support for this structure; however, the STAR website does not currently have links to the materials submitted to the previous CBE. Battelle should consider adding links to the previous repository on the STAR website if Battelle does not plan to import those materials to STAR.

Name or Organization

Karen Campos

E&M Guidebook Comment

Which aspect(s) of the guidebook are you commenting on?

General E&M processes and policies

Committee structure and voting outcomes

Measure use – accountability application characteristics

PQM Measure Evaluation Rubric

Other

Based on the proposed updates to the Endorsement and Maintenance (E&M) Guidebook The Centers for Disease Control and Prevention's National Healthcare Safety Network have the following comments, concerns, and questions:

1). The developer’s role in the updated measure evaluation process is unclear.

Additionally, the minimum number of responses needed to determine consensus is 20, but there is no mention of what happens if 20 independent committee member reviews are not received.

3) We are concerned that the committees will not have sufficient overall understanding of the measure evaluation criteria. Past and current committee meetings have shown that committee members do not fully understand how to evaluate the criteria against a measure submission. This lack of understanding regularly occurs with clinicians and measure developers, so it is a real concern that patient and family advocates and representatives on the committees will also not understand the measures, the review criteria, and especially the scientific testing (reliability and validity). Committee members, especially lay members, often benefitted from full committee discussions that were held as part of the former process, and from asking questions of developers and subject matter experts. We would like to better understand the vetting process for committee members, including patient advocates, and to know how they will be trained in the measure evaluation criteria. Additionally, please clarify whether there will be a separate measure evaluation criteria guidebook released later with more detail and specific algorithms to help the committee evaluate the measure against the criteria.

4) Please clarify the differences between the Advisory group and the Recommendations group. It appears that both groups are expected to review and to vote on each measure, and both groups are expected to attend the measure endorsement decision meeting; but the Advisory group is not allowed to discuss the measure or ask questions during this meeting. It seems that having one combined group to review and vote on the measures would better lead to shared comprehension and a more authentic consensus.

5) Please clarify the maintenance schedule: throughout the guidebook there is mention that maintenance review is every three years; however, on page 24 under the “Annual Updates” section it says maintenance review is every five years. We support a five-year maintenance cycle. The work of putting together a submission for endorsement review often begins at least one year prior to the submission deadline. Combined with a six-month process for a measure endorsement decision, a three-year maintenance cycle would mean that half of the maintenance time is spent by developers to prep the new submission. A measure maintenance review every five years would reduce the burden on developers while still ensuring that measures are current.

6) Please confirm when the measure submission questions will be released ahead of a new cycle so that developers can begin preparing their submissions.

CDC NHSN Comment on Battelle Proposed Guidebook.pdf

Name or Organization

The Centers for Disease Control and Prevention's National Healthcare Safety Network

Comment on E&M Guidebook

Which aspect(s) of the guidebook are you commenting on?

General E&M processes and policies

Committee structure and voting outcomes

Accountable-entity level testing guidance - reliability and validity

Measure use – accountability application characteristics

PQM Measure Evaluation Rubric

Other

Dear Battelle Team,

On behalf of Health Services Advisory Group, Inc. (HSAG), we appreciate the opportunity to review and comment on the Endorsement and Maintenance (E&M) Guidebook. We are supportive of Battelle’s efforts to refine and streamline the process by minimizing the amount of time between measure submission and endorsement decision. We respectfully submit the following overarching comments for consideration:

Where feasible, we recommend aligning the fields and definitions of the Consensus-Based Entity (CBE) Measure Submission Form with corresponding fields within the Measures Under Consideration Entry/Review Information Tool (MERIT). This will reduce measure developer burden by consolidating or streamlining the required measure information documentation across the Measures Management System (MMS) Blueprint templates, CBE endorsement submission form, and the Centers for Medicare & Medicaid Services (CMS) MERIT Data Template.
To make it easier for developers to verify whether a measure meets the minimum requirements for endorsement and to help committee members with endorsement decisions, we encourage Battelle to separate the criteria for initial endorsement from the requirements for maintenance of endorsement. Additionally, it would be most helpful to developers if the initial endorsement and maintenance criteria could be organized by measure type, as some criteria do not apply to all measure types (e.g., risk adjustment). Depicting the criteria in an algorithm format by measure type may also ensure that all scenarios are covered.
Likewise, please consider creating separate submission pathways for measures undergoing initial endorsement versus maintenance of endorsement submissions to ensure that the required measure information is provided at the appropriate stage.
We encourage Battelle to consider incorporating and making publicly available in the Submission Tool and Repository (STAR) database the rationale for endorsement removal, since there are several reasons that can potentially contribute to the endorsement removal of a measure.
Regarding the endorsement decision outcomes, please consider using the categorization of “Failed Endorsement” in lieu of “Not Endorsed” to differentiate from measures that are not submitted for endorsement review but are currently used in CMS programs. This would align with the terminology used within MERIT.
Please clarify whether the Partnership for Quality Measurement (PQM) team is changing the maintenance of endorsement review cycle to every 5 years rather than every 3 years.
We support the use of the Novel Hybrid Delphi and Nominal Group (NHDNG) technique and believe that this technique will increase engagement of members and structure facilitation by using standard criteria and practices. However, we are concerned that it may be challenging to achieve an 80% voting quorum with 60 members.

For your consideration, we have also provided as a separate attachment in-line comments with more detailed feedback in a Word version of the guidebook.

Thank you for the opportunity to comment.

HSAG Feedback_Endorsement-and-Maintenance-Guidebook.pdf

Name or Organization

Health Services Advisory Group (HSAG)

Feedback on E&M Structure

Which aspect(s) of the guidebook are you commenting on?

Committee structure and voting outcomes

Measure use – accountability application characteristics

A premier national health care and human services consulting firm, the Lewin Group finds answers and solves problems for leading organizations in the public, non-profit and private sectors. We understand the industry and provide our clients with high-quality products and insightful support to help them maximize the delivery of programs and services that make a difference in the lives of their constituents. Lewin brings more than 50 years of experience and expertise in quality measurement, data analytics, policy research and decision support, training and technical assistance, and stakeholder engagement.

Lewin looks forward to engaging with the Battelle Partnership for Quality Measurement in working through the endorsement and maintenance (E&M) cycle for new and existing quality measures. We appreciate the opportunity to provide feedback on the E&M Guidebook for current and future endorsement reviews.

Lewin believes the E&M Novel Hybrid Delphi and Nominal Groups will underrepresent experts in post-acute and long-term care, as expertise in skilled nursing facilities, long-term-care facilities, custodial care facilities, intermediate-care facilities, and psychiatric facilities should be able to weigh in on measures affecting care provided outside of the acute care space. Additionally, non-facility care—including home health, assisted living, personal care, and home and community-based services—would not be captured in the seats reserved for post-acute-care facilities and experts.

In addition, measures evaluating person (or patient) experience do not have a logical home within the proposed E&M Advisory and Recommendation Groups structure. Inclusion of expertise in evaluating the experience of care across the care continuum, including for services provided outside of the facility setting, may be underrepresented in the E&M Novel Hybrid Delphi and Nominal Groups. Perspectives with expertise in survey methodology and analysis should either be grouped onto a single Committee on which all survey-based measures will fall or adding methodological expertise on all Committees will be necessary to ensure person (or patient) experience is evaluated accurately and fairly. To that end, we encourage Battelle to add an additional Committee (or experts on a proposed Committee) similar to those from the National Quality Forum Patient Experience and Function Project to ensure that survey-based measures and those evaluating the experience of care have the same rigor of assessment as do those measures that use acute-care, clinical data.

Name or Organization

The Lewin Group

Comments on the E&M Guidebook

Which aspect(s) of the guidebook are you commenting on?

General E&M processes and policies

Committee structure and voting outcomes

Accountable-entity level testing guidance - reliability and validity

Measure use – accountability application characteristics

Thank you to PQM and the Battelle team for allowing measure developers the opportunity to review and comment on the E&M Guidebook. Our team thoroughly reviewed the materials and had the following comments:

Lack of practical info on how to submit a measure application within the STAR system
While there is a great deal of information on how measures will move through the endorsement process, there is no information given on what application materials measure developers will need to prepare in order to submit. This information is critical for measure developers to know beforehand so we may scope project work to meet the submission deadlines.
Continued use of Trial Use endorsement status is unclear
In previous endorsement cycles, measures were able to be approved and endorsed for trial use, meaning measures could be implemented to collect needed data to support validity and reliability testing. This type of endorsement is not included in the E&M guidebook. Clarity is needed on what steps measures developers need to take if their measures were previously endorsed for trial use, as well as overall information on how this type of endorsement will or will not exist moving forward.
Committee categories do not feel encompassing or realistic in regards to measurement
There is a lack of clarity in the guide about what types of measures fall within these five categories. For example: Patient reported outcome performance measures (PRO-PMs) of any kind do not seem to fit within this structure. Measures of reproductive healthcare, including contraceptive care and pregnancy, do not seem to fit anywhere within these categorie
More clarity is needed on content expert selection
Are different content experts brought in for each measure review process? How are these experts selected? Our concern is that because our work is very specialized, a subject matter expert in a broader area of the field may not understand the measures.
Clarity needed on maintenance review timelines
Maintenance review timelines are listed as every 5 years under “Annual updates” section (p.27), but then also listed as 3 years in “Endorsement Decision Outcomes” table (p.4). Which is the correct timeline? Additionally, more information on requesting an extension of one year is needed. On Table 2 (p.4) extension is mentioned but not expanded on under the “Maintenance of Endorsement” section (p.23).

We thank you again for this opportunity and hope to engage in further conversations around the future of these processes.

Name or Organization

UCSF

Partnership for Quality Measurement

Which aspect(s) of the guidebook are you commenting on?

General E&M processes and policies

Committee structure and voting outcomes

Measure use – accountability application characteristics

PQM Measure Evaluation Rubric

Appeals

July 28, 2023

Partnership for Quality Measurement

Battelle

505 King Avenue

Columbus, OH 43201

Re: Battelle Clinical Quality Measure Endorsement and Maintenance Process Public Comments

The Rogosin Institute was established in 1983 as an independent, not-for-profit institution for the research, treatment and prevention of kidney disease. Our founder, Dr. Albert Rubin and his colleagues performed the first hemodialysis treatment in New York City in 1962 and the first kidney transplant in New York City in 1963.

Today, the Rogosin Institute remains a not-for-profit corporation whose mission includes medical research, education and health care concentrating on kidney disease through our affiliations with Weill Cornell Medicine and the New York-Presbyterian Hospital. Rogosin administers a nephrology practice which provides more than 20,000 patient visits for all manner of kidney disease at three locations in New York City. We also administer ten dialysis centers in Brooklyn, the Bronx, Manhattan and Queens, New York where we care for approximately 1,400 patients with end-stage kidney disease (ESKD). More than 15% of our patients treat by peritoneal dialysis and home hemodialysis.

The Rogosin Institute was a pioneer participant in the End-Stage Renal Disease Seamless Care Organization (ESCO) demonstration project and continues to participate in the ETC model. Through our Program to Educate Patients with Advanced Kidney Disease (PEAK), we have achieved outcomes far better than local and national benchmarks. Approximately 12.5% of patients participating in the program receive pre-emptive kidney transplants compared to 2.5% of patients entering the Medicare ESRD program. Of those who receive dialysis, 20% of PEAK participants initiate dialysis with home modalities compared to 2% across New York City. Of those starting with in-center hemodialysis, more than 60% begin as outpatients with permanent vascular access compared to less than 20% of their counterparts nationally. Currently, we are expanding our PEAK program to northern Manhattan and the southern Bronx and to Brooklyn. These data reflect our belief in providing all patients with the treatment that is right for them at the location that best suits them.

Achieving the highest quality of care is the driving force behind all of our work. The 2024 update to CMS’ Care Compare website is currently under preview. All of our eight rated facilities achieved ratings of four or five stars with an average rating of 4.625 stars for Quality of Patient Care. For Patient Experience, our average score increased from 3.50 to 3.625; three of our facilities achieved five star ratings for Patient Experience.

We appreciate the opportunity to comment on Battelle’s Clinical Quality Measure Endorsement and Maintenance Process. We applaud Battelle and the Partnership for Quality Measurement (P4QM) for its commitment to serve as the new Consensus Based Entity for the Centers for Medicare and Medicaid Services (CMS). We offer the following comments based on the publicly available information about P4QM’s new processes.

Expedited Timelines

While we appreciate P4QM’s focus on accelerating and streamlining the E & M consensus process, we are concerned that the accelerated timeline will make it difficult for stakeholders to review and provide comments on proposed measures. In the present rapidly evolving clinical and regulatory environment, a streamlined process is vital; however, sufficient time is required for measure developers to respond to feedback and concerns submitted during the public comment period which P4QM emphasizes will be critical to development of consensus by the new committees which are currently in formation. It will be critical to monitor the quality of the process as it is implemented to be sure that the emphasis on expediency does not compromise a careful consideration of the measures.

Scientific Methods Panel

The Scientific Methods Panel has provided invaluable support for the E & M consensus process. Employing their expertise to help the consensus committees understand issues like reliability and validity of measures has been critical to the committees’ endorsement of standardized measures capable of measuring providers’ performance. We support the more collaborative approach in the proposed process which will assist developers in ensuring that methodological challenges are addressed before the measure is presented to the committee. However, we urge P4QM to monitor the deliberations of its committees to ensure there is adequate understanding of all of the important aspects of the measures under consideration.

Appeals Process

Battelle intends to enhance the appeals process after measures undergo consideration by its E & M committees. We have concerns that the appeals panel consisting of the internal Battelle E & M team and the chairs of the E & M committee that initially considered the measure will revisit the measure with a preformed opinion of it. P4QM notes that others will be requested to join the appeals panel as needed but we are concerned that failure to include such experts from the outset of the appeals process will not improve the transparency of the process and will not make it more robust.

Committee Structure

P4QM proposes a novel process to increase engagement of all committee members. We support the proposal to facilitate expert involvement and ensure more equitable sharing of ideas among committee members, we are concerned that the new process may not achieve these goals.

The prior process had disease-specific E & M committees made up of expert stakeholders. The Renal Standing Committee included health care professionals, patient representatives and experts in the science of quality measurement all with experience and expertise in kidney disease and the operations of dialysis facilities. I served as a member of the Renal Standing Committee from 2020 until Battelle assumed the CMS contract for E & M in 2023. The process proposed by the P4QM replaces disease-specific committees with committees that evaluate measures by patient experience or life journey. As a result, measures related to kidney disease will be divided into two larger projects: “Management of Acute Events, Chronic Disease, Surgery and Behavioral Health” and “End-of-Life Care, Rescue and Specialized Interventions”. P4QM proposes a target of as many as 45 members for each of these committees. Since each committee will be responsible for many areas of medicine, it is likely that each will have a small number of individuals (perhaps one or two) with expertise in kidney disease on each of the advisory and recommendation groups.

We are very concerned that this proposal will dilute the expertise of the E & M committee evaluating measures related to kidney disease. The dialysis facility is a unique care setting guided by a unique Federal program with a Quality Incentive Program (QIP) that penalizes underperforming facilities. We support the program’s process of evaluating and comparing dialysis facilities based on clearly structured, objective quality measures but we note that the QIP often disproportionately impacts financially vulnerable facilities treating the most socially and medically disadvantaged patients.

We are also concerned that the division of the committees into a Foundational Advisory Group members of which will be responsible for reviewing measures and submitting their recommendations individually and a Recommendations Reconciliation Group members of which will only discuss those measures for which consensus was not reached by the Foundational Advisory Group. We are concerned that this structure will impact the committee members’ opportunity for discussion of the measures. Without this discussion, we are concerned that measures might receive endorsement without due consideration of the ability of dialysis facilities to collect the data and without due consideration of unintended consequences the new measures may have on patients.

The Renal Standing Committee developed to ensure that measures under evaluation for inclusion in the QIP are technically appropriate for use in the unique patient population who receive care in these specialized settings. We are concerned that the proposed committees will lack the clinical knowledge and specialized experience to evaluate potential unintended consequences of new measures. We are concerned that this will result in adoption of new measures into the QIP without input from subject matter experts with knowledge of the processes of dialysis facilities or the patients who would be impacted by them.

CMS has a goal of limiting quality measurement to “Measures that Matter”. We are concerned that P4QM’s new process will negatively impact the quality of the QIP by adding measures that are of lower importance to patients and those who care for them. We urge P4QM to reinstate clinically focused committees including the Renal Standing Committee.

Thank you again for the opportunity to comment on the Battelle Clinical Quality Measure Endorsement and Maintenance Process.

Sincerely,

Jeffrey Silberzweig, MD

Chief Medical Officer, The Rogosin Institute

Professor of Clinical Medicine, Weill Cornell Medical College

Battelle Comment Letter 072023.pdf

Name or Organization

Jeffrey Silberzweig, MD

Comments on the E&M Guidebook

Which aspect(s) of the guidebook are you commenting on?

General E&M processes and policies

Committee structure and voting outcomes

Accountable-entity level testing guidance - reliability and validity

Measure use – accountability application characteristics

Lack of practical info on how to submit a measure application within the STAR system

While there is a great deal of information on how measures will move through the endorsement process, there is no information given on what application materials measure developers will need to prepare in order to submit. This information is critical for measure developers to know beforehand so we may scope project work to meet the submission deadlines.

Continued use of Trial Use endorsement status is unclear

In previous endorsement cycles, measures were able to be approved and endorsed for trial use, meaning measures could be implemented to collect needed data to support validity and reliability testing. This type of endorsement is not included in the E&M guidebook. Clarity is needed on what steps measures developers need to take if their measures were previously endorsed for trial use, as well as overall information on how this type of endorsement will or will not exist moving forward.

Committee categories do not feel encompassing or realistic

There is a lack of clarity in the guide about what types of measures fall within these five categories. For example: Patient reported outcome performance measures (PRO-PMs) of any kind do not seem to fit within this structure. Measures of reproductive healthcare, including contraceptive care and pregnancy, do not seem to fit anywhere within these categories

More clarity is needed on content expert selection

Are different content experts brought in for each measure review process? How are these experts selected? Our concern is that because our work is very specialized, a subject matter expert in a broader area of the field may not understand the measures.

Clarity needed on maintenance review timelines

Maintenance review timelines are listed as every 5 years under “Annual updates” section (p.27), but then also listed as 3 years in “Endorsement Decision Outcomes” table (p.4). Which is the correct timeline?

More information on requesting an extension of one year is needed. On Table 2 (p.4) extension is mentioned but not expanded on under the “Maintenance of Endorsement” section (p.23).

Name or Organization

UCSF

NCQA Public Comment Submission

Which aspect(s) of the guidebook are you commenting on?

Committee structure and voting outcomes

Measure use – accountability application characteristics

NCQA would like to comment on the proposed E&M and PRMR committees. We’re concerned that the E&M committees and setting specific PRMR committees (i.e., Hospital Committee, Clinician Committee, and PAC/LTC Committee) are not comprehensive enough to capture the expansive health care landscape. Additionally, we’re concerned that the proposed number of aggregate members (∼60) slated for each committee is too large. With groups this considerable in size, it will be difficult to have confidence in the appropriate expertise and experience of committee members.

Name or Organization

National Committee for Quality Assurance (NCQA)

Comment and request for clarification on new CBE process

Which aspect(s) of the guidebook are you commenting on?

Measure use – accountability application characteristics

Other

Thank you for the opportunity to comment on the Battelle CBE process. From the developer perspective, we have the following comments and clarifications regarding the new process.

The new category “Endorsed with Conditions” states that measures could fall into this category when committee reviewers have recommendations they would like to see changed when the measure comes back for maintenance. We would like further clarification on how those changes are evaluated/adjudicated, as it is very common for Committee members to want to see changes to measures. This could result in measures coming back through the CBE process more frequently than every three years. Uncertainty in the maintenance process makes it more difficult for us to forecast our time and resources.

We would like to confirm if there is still a “trial use” category for eCQMs. If not, we would like to request that the trial use category be re-instated due to the difficulty in obtaining EHR data for the ideal rigor of testing prior to full measure implementation.

Could Battle please clarify the voting categories for each of the criteria? We are concerned that use of “pass” or “fail” for criteria previously rated with levels (for example, strong or moderate) would not sufficiently allow for nuances in how Committee members interpret the criteria and/or the empiric data.

The prior NQF process included detailed supporting algorithms to assist Committee members with determining a rating. Without the algorithm there may be more variation in Committee member's evaluations in this area. We encourage Battelle to provide more information about the guidance it will be providing to Committee members.

Equity category: Battelle’s new equity criteria requires that measures work toward reducing disparities in healthcare and that measures must show variation by disparity variable. Could Battelle clarify how measures that are downstream of an access disparity would be managed? For example, if there is a quality outcome measure that does not capture a diverse population because of upstream healthcare access inequities, the measure may neither work toward reducing disparities nor show variation in disparities (unless of course there were a companion utilization measure).

For related/competing measures, Battelle’s guidebook states that the new measure must be “superior” to the existing related/competing measure, but there is no definition of the term “superior.” Please clarify how this determination will be made.

Finally, there there is a typo on page 29: The guidebook states: “Reviewer determined that based on feedback provided regarding feedback on measure performance that the measure is not usable” but we think you meant to write “….that the measure is usable.”

Thank you for considering these comments and clarifications.

Kind regards,

The CBE Team at Yale/CORE

Name or Organization

Yale/CORE

APA Comments on E&M Guidebook

Which aspect(s) of the guidebook are you commenting on?

General E&M processes and policies

Committee structure and voting outcomes

Measure use – accountability application characteristics

PQM Measure Evaluation Rubric

Committee Structure

APA has concerns about consolidating the number of committees from the previous consensus-based entity (CBE) process. Fewer project areas to distribute measures may result in an increased workload for committee members. Although PQM has stated they will limit the number of measures in each cycle, concerns still remain knowing the full volume of measures in any given year.

More importantly, APA has concerns around the significant limitations of subject matter expertise. With only 2 clinicians proposed as SMEs for the recommendations committee, the committee cannot possibly have the breadth of expertise necessary to adequately review submissions for the range of topics proposed. This is particularly concerning for the committee that will cover: Management of Acute Events, Chronic Illness, Surgery, and Behavioral Health. Battelle has stated they will seek additional SMEs if needed based on the intents to submit measures for endorsement submitted at that stage of the review cycle. However, this process will be ad hoc and may not allow enough time to gather such expertise. It is also unclear whether those additional SMEs would be able to vote. Battelle should reach out to professional organizations that represent the relevant clinical topic area(s) in addition to reviewing existing members of PQM. These organizations have experience implementing certain measures and may be aware of unintended consequences or other barriers to reporting. The breakdown of stakeholders seems lopsided. Clinicians should be the majority, given that they are the ones most impacted by these measures.

The proposed structure is somewhat artificial and does not support integrated or collaborative behavioral health care. Participants in the Certified Community Behavioral Health Clinics model (CCBHCs) report on measures related to depression screening, treatment, suicidality, and substance abuse. CMS is about to pilot the Making Care Primary program which will include reporting on behavioral health measures by primary care clinicians. Primary care clinicians perform the majority of initial depression screening. Separating behavioral health from primary care sets medicine back and negates the years of work that have occurred to integrate physical medicine and mental health and will make it harder to ensure that there is input by those impacted by the measures during their review.

Consensus Process

Battelle proposes that the Advisory Group and Recommendation Group members will evaluate the measures and complete a standardized form. The recommendations group will then vote on the discussed measures individually. Battelle is using standardized consensus processes to find areas of disagreement, which will be addressed in the Recommendations Group meetings, to help expedite the process. It is not clear whether Recommendations Group members will have the opportunity to discuss all aspects of the measures, especially areas where there is agreement (and not only disagreement). Although the majority of the Recommendations Group members may agree on a measure, the minority should be able to express why they don’t agree. There also should be a mechanism to ensure all public comments are addressed, whether in support of or against.

Voting Procedures

APA does not have a strong opinion on the change to the voting processes. However, it will remain to be seen whether a 75% majority overall vs separate votes on the various aspects of the measures is realistic.

Public Comment

APA is in favor of the 30-day public comment period, but respectfully requests that deadlines not be set on weekends. This effectively shortens the review period for member organizations. APA also requests that comments be shared with the members of the review committees prior to the meeting to ensure proper time for thoughtful consideration prior to the discussion.

Submission Tool and Repository (STAR) Website

APA recently needed to access the STAR website to review a measure specification. While the retention of the previous CBE numbering system is helpful, the materials from the previous process were not accessible and many links were non-functional (e.g., when clicking on Endorsement Removed, the user is taken to the full inventory of removed measures. There is no information available as to why endorsement was removed for a particular measure. Under Current Use, if one clicks Payment Program or Regulatory and Accreditation Programs, no information about which programs include the measure is available.). APA respectfully requests that materials be imported from the previous process or at the very least, are accessible somewhere. We realize this may not be completely live yet, but it has been several months and access to those documents is needed.

Name or Organization

American Psychiatric Association

American Society of Anesthesiologists Comments on E&M Guidebook

Which aspect(s) of the guidebook are you commenting on?

General E&M processes and policies

Committee structure and voting outcomes

Accountable-entity level testing guidance - reliability and validity

Measure use – accountability application characteristics

The American Society of Anesthesiologists (ASA) appreciates the opportunity to comment on the Partnership for Quality Measurement’s (PQM) Endorsement and Maintenance (E&M) Guidebook. Although the guidebook maintains several processes from previous years, we nonetheless support several proposed improvements to expedite the measure endorsement cycle and eliminate redundant committee reviews. We hope these revised processes will ensure a fair, transparent, and objective means for multi-disciplinary committee reviews and endorsements.

Endorsement and Review Process

Over the course of the last decade, we grew increasingly concerned that the evaluation standards established by the National Quality Forum limited our ability to submit quality measures and have those measures assessed in a fair and meaningful way. For example, the Scientific Methods Panel (SMP) prevented one of our measures from progressing based upon one SMP member emphasizing his personal and anecdotal evidence over recent scientific literature. During the same review, our process measure was criticized for not being risk-adjusted even though we clearly stated reasons for the measure not to include such a feature. We unfortunately experienced similar situations when our measures were reviewed by consensus-based committees. ASA measures were scrutinized and voted upon by members who had limited knowledge of quality anesthesia care and anesthesiologist workflows. In those discussions, we felt some committee members used their personal experiences with anesthesia care to inform their endorsement decisions.

Battelle must ensure measure endorsement is within reach for specialty societies and measure stewards. The previous measure endorsement criteria, including its emphasis on identifying performance gaps, limited our ability to submit measure endorsement applications, let alone to have the measure endorsed by a standing committee. A short-sited policy that emphasizes performance gaps over clinical importance, including patient safety, of a measure does not serve patients or our health care system well. Anesthesiology patient safety measures, by nature, approach topped out status quickly. Previous consensus-based organizations have dismissed our measures on a performance gap technicality instead of understanding how upstream measures prevent or eliminate gaps in care downstream. For instance, our members perform prophylactic antibiotic administration and perioperative temperature management measure at nearly 100%. Those measures contribute to fewer surgical site infections, resulting in lower costs and lengths of stay for patients. Yet both measures had their endorsements removed because they were “topped out.” We request Battelle encourage Advisory and Recommendation Groups to consider measures based upon patient and health system needs rather than on a statistical qualifier.

As a specialty society, our registry has limited access to testing data, especially related to testing validity and assessing data from electronic health records (EHR). The vast majority of anesthesiologists do not have access to patient data once the patient is discharged from the postanesthesia care unit. Our registry, likewise, does not have data that can validate a process measure in the perioperative period with postoperative patient outcomes. This remains a significant challenge for us and limits our ability to demonstrate measure validity. We are also concerned about an overemphasis on electronic data sources and data generated by EHRs. Our members have limited access to EHRs. Many of our measures are reported by anesthesiologists using paper-based anesthesia records as well as quality applications on a phone or personal electronic device. We hope Battelle will develop and implement measure endorsement policies that are accessible yet meaningful to measure stewards, patients, physicians, and organizations regardless of how the data could be collected.

Battelle has proposed several changes that will expedite the measure review process and encourage its Advisory and Recommendation Groups to objectively assess a measure. We have confidence that Battelle and its staff have subject matter expertise in consensus building and facilitating an objective endorsement process. Battelle must ensure its staff promotes a partnership with Advisory and Recommendation Groups where expertise in measure review complements clinical review. ASA welcomes the opportunity to work with Battelle on developing policies that embrace patient safety measures, including those that may appear “topped out,” and ensuring physicians who use paper records can report measures that are nationally endorsed.

E&M Committee Composition, Roles, Responsibilities

ASA supports the proposed project structure that uses the Novel Hybrid Delphi and Nominal Groups Technique. Battelle has suggested a nimble approach that uses up to 45 committee members to form the Advisory (Delphi) Group and up to 15 committee members to sit on the Recommendation (Nominal) Group for each of the five project areas. Although we expect anesthesiologists to primarily participate in the management of acute events, chronic disease, surgery, and behavioral health project, anesthesiologists nonetheless have much to contribute on other project committees. As perioperative medicine physicians, anesthesiologists will provide valuable insight into a range of measures that have previously been considered by a consensus-based organization. Anesthesiologists will be an asset to Battelle on each of these projects, especially as we seek to align measure use national interests in understanding a patient’s care journey and enhancing care coordination.

Battelle has smartly proposed several other changes to the project committee structure. We support the inclusion of subject matter experts to augment the Recommendations Workgroup when needed. In previous review cycles, we were often at a loss when an anesthesiology or anesthesiology-related measure would be voted on by health care professionals who had limited knowledge of the role or responsibilities of an anesthesiologist. This proposal to augment the committee, when needed, addresses our concern. Likewise, we support the proposal to annually rotate one-third of project members off the project rosters.

Additional Developer Resources

ASA looks forward to participating in the Battelle virtual Measure Developer Workshop to learn about PQM processes for endorsement and how best to engage Battelle staff on cutting-edge topics relevant to measurement. These meetings, including virtual webinars, will help us understand how to align our future measure development with endorsement criteria. These meetings and resources, including the availability of Battelle staff to assist us through the submission and review process, will only add to the transparent and welcoming process that Battelle is establishing.

Appendix D: PQM Measure Evaluation Criteria

For each of these evaluation criteria, we ask Battelle to develop supporting materials on how a measure steward can effectively complete the application forms and meet evaluation and endorsement criteria. Over the course of the last 10 years, we have become more concerned with the level of expertise and data analysis required for a measure to receive endorsement. In general, we feel that previous consensus-based committee members would have supported our measures had it not been for a mechanical assessment of a measure. For instance, the feasibility criteria requires that measure stewards use electronic data capture and/or the measure data “are available in EHRs and other electronic sources.” Although our measure may be clinically sound and important, these particular criteria may set our measures up for failure. Some have estimated that more than a third of anesthesiologists, especially those working in ambulatory settings, use paper charts and records. For many of our measures, we fear an Advisory or Recommendations Group might be handcuffed in providing an objective assessment of our measure based upon preestablished criteria. In short, we caution Battelle from using an inflexible structure of measure endorsement that unfairly rewards those using EHRs while discouraging measures used in less resourced settings.

_____

MATTHEW T. POPOVICH
Chief Quality Officer

American Society of Anesthesiologists

Name or Organization

American Society of Anesthesiologists

LTSS/HCBS quality measures and E & M committee structure

Which aspect(s) of the guidebook are you commenting on?

Committee structure and voting outcomes

We applaud PQM’s commitment to an equitable distribution of effort as operationalized through fewer, more generalized E&M committees composed of members with a diversity of experience and expertise. However, we are unsure where measures related to the quality of long-term services and supports (LTSS) would fall. As a result, we are concerned that when measures related to LTSS are submitted for endorsement or maintenance, the applicable committee will not include experts with knowledge of the distinct characteristics and exigencies of the LTSS field, or people with lived experience.

With the previous CBE, measures of LTSS and Home and Community Based Services (HCBS) quality fell under the purview of the Patient Experience and Function Standing Committee, which oversaw measures examining functional status change and assessment, shared decision making, care coordination, patient experience, and long-term services and supports. However, with the new E&M committee structure, while measures examining patient experience can be submitted to many of the committees depending on the clinical experience being assessed, those measures used to understand the experience of LTSS no longer seem to fit in the framework.

Quality of LTSS is critical to managing overall healthcare quality throughout the lifecycle and is a priority for CMS. The absence of a space for such measures from the topic areas of the CBE’s portfolio represents an oversight that is not in line with the direction in which the health services quality measurement field is developing.

Though it varies by state, nationally, in 2019, 34% of total Medicaid spending went to LTSS, and 59% of total LTSS spending went to Home and community-based services (HCBS). State Medicaid programs provide the majority of support for LTSS, including HCBS. To measure quality of LTSS, states use a patchwork of measures that do not allow for standardized calculations or comparisons. State managers need reliable and valid measures to assess the quality of Medicaid HCBS and other LTSS. To measure quality of LTSS, person-reported outcome measures can “help states determine how well HCBS structures and processes meet individuals’ goals and improve their quality of life” (Lipson, 2019).

There has been much emphasis on the need for standardized quality measures for HCBS specifically, as demonstrated by the CMS Proposed Access Rule Ensuring Access to Medicaid Services (Access Rule). This proposed rule is aimed at ensuring access to services and ensuring quality of Medicaid services. One of the proposals articulated in this proposed rule is that “States must report on a set of nationally standardized quality measures specifically for HCBS established by CMS.” Based on this priority, it would make sense to have a similar area of focus within the CBE portfolio.

Finally, Jacobs, et. Al, in the New England Journal of Medicine described the conceptualization of a Universal Foundation for aligning quality measures across CMS. In their commentary, they noted that measures will be necessary to measure the quality of long-term services. They noted: “Our intention is that the Universal Foundation will eventually include selected measures for assessing quality along a person’s care journey — from infancy to adulthood — and for important care events, such as pregnancy and end-of-life care. We started by identifying preliminary measures for the Universal Foundation’s adult and pediatric components.….. Additional measures will be necessary for assessing care provided to specific populations or in certain settings, such as … long-term and community services.” To aid in the assessment and endorsement of such measures so as to be included in the Universal Foundation in the future, it seems critical that the CBE include a space for such measures in the portfolio.

We urge the CBE to clarify, of the 5 project topical areas, which is most applicable to measures of LTSS quality, specifically person-reported outcome measures.

Name or Organization

Human Services Research Institute

Joint Commission comments on the E&M Guidebook

Which aspect(s) of the guidebook are you commenting on?

General E&M processes and policies

Committee structure and voting outcomes

Accountable-entity level testing guidance - reliability and validity

Measure use – accountability application characteristics

PQM Measure Evaluation Rubric

Appeals

Other

The Joint Commission appreciates the opportunity to comment on the Partnership for Quality Measurement (PQM) Endorsement & Maintenance Process Guidebook (E&M Guidebook) June 2023.

Founded in 1951, The Joint Commission seeks to continuously improve health care for the public in collaboration with other stakeholders, by evaluating health care organizations (HCOs) and inspiring them to excel in providing safe and effective care of the highest quality and value. An independent, not-for-profit organization with a global presence, The Joint Commission has programs that accredit or certify more than 22,000 HCOs and programs in the United States. The Joint Commission evaluates across the continuum of care, including most of the nation’s hospitals. Although accreditation is voluntary, a variety of federal and state government regulatory bodies, including CMS, recognize The Joint Commission’s decisions and findings for Medicare or licensure purposes.

The Joint Commission appreciates the efforts to streamline the Evaluation and Maintenance process and provide a comprehensive explanation in a concise document. Overall, The Joint Commission supports the revised process, which will permit E&M decision-making in as few as six months and two cycles per year. We also support the Novel Hybrid Delphi and Nominal Group (NHDNG) structure described in the Guidebook. We agree with the proposal to increase the number of members reviewing measures and collecting pre-evaluation independent ratings. The use of independent preliminary committee ratings to inform committee discussions can equalize the input received on measures and minimize the likelihood that the voices of a “vocal few impart too much bias on the results” (p23). This change will facilitate consensus and permit an unbiased and stronger evaluation. When committee members are held responsible to review and provide feedback in advance of meetings, all voices can more easily be heard. Joint Commission appreciates the desire to focus meeting time on areas of disagreement and agrees that this can support consensus building.

We request clarification on the composition and role of the Scientific Methods Panel. While the panel is mentioned, there is no explanation within the rest of the document about who can serve or how that group will conduct its activities. If the composition, processes, and role of this group will change from the previous consensus-based entity’s (CBE’s) methodology, it would be important to define how.

Related to membership on the panels, Battelle mentions partner IHI and Rainmakers throughout the document, however, the role of these organizations is not clearly defined or explained. Considering these partners will have significant impact on appointments to panels, decisions related to the measure’s endorsement and maintenance processes, and, measures used, clarification of how the partnership is structured and each organization’s purview would be helpful.

The document states, “Each year, committee members are randomly assigned, within roster categories, to either the Advisory Group (36-45 individuals) or the Recommendations Group (13-15 individuals), except for co-chairs. This means that yearly Advisory and Recommendations Group assignments are mutually exclusive and will not change for up to two cycles (Fall and Spring).” While this approach is fair, we recommend implementing a process that allows participants to opt out of the Recommendations Group if they feel they will be unable to commit the additional time to preparation and presentation. Likewise, we recommend adding a process to replace Recommendation Group members if for any reason PQM staff or the member determine a member will no longer participate.

We appreciate the establishment of term limits because that can increase the voices and opinions expressed and increase participation over time. For panel leadership, it can also permit different leadership styles and reduce ability for long-term appointees to unduly influence the processes and decisions.

Pertaining to the timeline, The Joint Commission appreciates the inclusion of at least 30 days for public comments. When comments are submitted on behalf of a large organization, coordination is required to capture and respond with the organization’s full breadth of expertise. Patients and providers may also have difficulty with short public comment periods. We recommend that deadlines that occur on a weekend or holiday be moved to either the Friday or Monday before or after to keep deadlines on business days.

We are concerned with PQM allowing only two business days to respond to the completeness check and the factual review regarding the E&M team’s preliminary assessment. This may not always be feasible, for example, if the lead staff is out of the office, it may be more difficult for staff covering to respond. We respectfully suggest allowing at least five business days for these responses.

Regarding eCQM submission guidelines presented on page 16, we suggest PQM revise the statement regarding value sets. It currently reads, “ If such a published value set does not exist, then the measure developer must demonstrate that the value set is in draft form and is awaiting publication to VSAC.” If the intent is that E&M committee members can review value set contents, we recommend this statement be revised to require developers publish their value sets and provide OIDs for the published value sets. This is consistent with our current practice as a measure steward. We agree measure developers must provide value set information at the time of submission, including value sets new to the measure and those that are existing and reused. Value set publication is performed by the developer; there is not a dependency such that the developer would “await publication to VSAC.” Additionally, value sets in “draft” status are not accessible to external reviewers.

Per the E&M processes outlined, the Recommendations Group does not discuss measures that have reached consensus (75% or greater) based on the aggregated independent reviews. Please clarify if that means that an Advisory group member cannot join the Recommendation meeting to raise a concern if it relates to measure criteria that have already reached consensus? Specifically, is there a mechanism for a Review Group member to request discussion at the Recommendation meeting regarding a measure that has achieved 75% consensus.

The Joint Commission has concerns with the voting procedure for the Recommendations Group meeting, specifically that when the 80% voting quorum is not achieved, the members not in attendance for the group discussion can submit their votes offline after the meeting. Those not present are not able to bring their issues forward and participate in the discussion and they will vote later without having their issues heard by the entire group or the ability to address questions with the measure steward. Dissenting views that are not discussed by the group will not benefit from the group discussion and will impact their final asynchronous voting offline. We additionally have concerns that 48 hours or two business days would not be sufficient time to gather votes.

Regarding the appeals process described on page 24, the indication of “material interest” as a reason for appeal suggests monetary impacts to stakeholders drive appeals.

We request clarification on the terminology “material changes” (page 27), because it leaves room for interpretation. Codes are often added and deleted without impact to the clinical intent of the measure due to terminology version releases. Stewards usually identify when a measure rate would be impacted by the change. Adding criteria or a list of the kinds of measure changes that need to be reported would assist stewards to do their due diligence.

Further clarification of definition of criteria for off-cycle review is needed. Battelle lists several reasons an off-cycle or emergency review can happen which are new to the process and have the potential to increase the burden of measure maintenance for stewards. It appears any stakeholder can request off-cycle review for any reason, including perceptions of burden, or for political and economic reasons, such as development of a competing measure.

The way to request a measure’s off-cycle review, and the method of evaluation of requests does not appear to be described. Battelle proposes submitting requests to an email address; this approach lacks the transparency of other comment activities described. The document also does not describe how Battelle and the co-chairs will evaluate a request. We recommend an objective scale or rubric replace the subjective concept of “significant and emergent” described in the guidebook.

In Appendix A, there is a section that reads, "Stewards are also responsible for maintaining measure details and specifications on any publicly available website.” We support this recommendation.

On page 36, there is a section explaining scientific acceptability, but it is unclear whether those statements apply to initial endorsement, maintenance, or both. It is also unclear whether the Equity section pertains to initial endorsement, maintenance, or both.

Thank you for this opportunity to review and provide comments. The Joint Commission is pleased to answer any questions you may have regarding our comments. If you have any questions, please do not hesitate to contact me or my staff: Michelle Dardis, Director of Quality Measurement at (630) 792-5915 or mdardis@jointcommission.org.

Name or Organization

The Joint Commission

General Comments

Which aspect(s) of the guidebook are you commenting on?

General E&M processes and policies

The Federation of American Hospitals (FAH) appreciates the opportunity to comment on the Partnership for Quality Measurement (PQM) Endorsement and Maintenance Guidebook. We are supportive of PQM’s efforts to ensure that the process emphasizes consensus. However, we are very concerned that the project topic areas and associated committees will not ensure that the right clinical expertise reviews each measure. In addition, we question some of the proposed changes to the appeals process and the apparent changes to the measure evaluation criteria. We ask that the PQM carefully consider the following comments in an effort to further improve the process.
Regarding the proposed topic areas, we are extremely concerned that this structure will not ensure that measures are reviewed by the relevant clinical experts. For example, the Management of Acute Events, Chronic Disease, Surgery, Behavioral Health project is much too broad and would require representation by many specialties including but not limited to surgery, primary care, emergency medicine, endocrinology, gastroenterology, ophthalmology, and psychiatry. Based on the target number of individuals for each roster category outlined in Table 3 on page 9, we do not believe that having eight clinicians serve on this committee will be adequate. While this proposed process intends to leverage subject matter experts (SMEs) in instances where the committee does not have the necessary expertise, the guidebook does not adequately address how these SMEs will be identified or how they will be vetted (both to ensure that they have the relevant background and meet the conflict of interest policy). Based on the potential breadth of measures that would be reviewed in these projects, we believe that SMEs will be used frequently and as a result, question this lumping of so many clinical areas into so few projects. We also do not believe that having one SME provide input on a measure is sufficient, particularly when they are not able to vote on that measure. We urge the PQM to reconsider limiting measure reviews across only five topic areas and be more explicit on how SMEs will be vetted and their participation in the process.
The FAH also believes that saying that the process includes two public comment periods is misleading since the second public comment will really function as the appeals process and those commenting during that timeframe must justify their concerns against two criteria. While the appeals process is important, we do not consider it to be the traditional public comment where stakeholders can share their opinions and perspectives without any limitation. Furthermore and perhaps more importantly, we strongly disagree with including Battelle staff or the co-chairs of the very committee that voted for or against a measure in the review of any appeal. There is a very real risk of introducing bias into the process and it should be avoided at all costs.
Regarding the clarifications provided on electronic clinical quality measures (eCQM) testing on page 16, we believe that PQM is raising the bar and deviating from the previous criteria – something that it was our understanding that PQM sought to avoid at least initially. Language in the third bullet on page 16 states:
“Documentation of testing on more than one electronic health record (EHR) system from more than one EHR vendor is required to establish Scientific Acceptability (i.e., reliability and validity), indicating that the measure data elements are valid and that the measure score can be accurately calculated.”

The previous criteria guidance used by National Quality Forum (NQF) stated that:
“Data element validation is required for all eCQMs (demonstration of accountable-entity level validation is also encouraged). For eCQMs based solely on structured data fields, reliability testing will not be required if data element validation is demonstrated. If data element testing is not possible, justification is required and must be accepted by the Standing Committee.”
It also stated that:
“The minimum requirement is testing in EHR systems from more than one EHR vendor. Developers should test on the number of EHR systems they feel appropriate. It is highly desirable that measures are tested in systems from multiple vendors.”
On comparison of the PQM guidebook against the previous NQF criteria, it appears that both measure score reliability and data element validity testing will now be required and the number of vendor systems was possibly increased. Many developers who planned on submitting an eCQM for endorsement in the next year or two will likely have either completed or at least budgeted and planned for testing based on the previous criteria. These changes will have a significant impact on whether developers will be able to meet the endorsement requirements and may prohibit groups from submitting measures that can greatly contribute to advancing quality. We do not view these changes as just clarifications and the unintended consequence of discouraging measure submissions must be avoided.
Given the assurances that Battelle staff gave when asked if the measure evaluation criteria would be changing, we are shocked to see several modifications in Appendix D that were not noted elsewhere in the guidebook. Similar to our reaction to the eCQM clarifications, we believe that some of the language in this criteria are more than just clarifications or simplifying the submission process. While we appreciate the inclusion of guidance on how a measure may meet or not meet a criterion, we have several questions or concerns:
•    The previous set of criteria used by the NQF required measures to pass importance and scientific acceptability, which ensured that measure met the highest bar for these two criteria. Has that “must pass” requirement been removed?
•    Under scientific acceptability, clarification on the following items is required:
o    While we very much support the inclusion of thresholds for both levels of reliability testing, it is not clear if the threshold will be set using the minimum or average result. The FAH strongly urges PQM to set this threshold at the higher bar for the minimum result rather than for the average.
o    Will developers be allowed to provide their explanation or interpretation of the findings for reliability testing? It is included for validity testing and we believe that developers should also provide this information for validity testing.
o    Is data element validity testing considered to be empiric validity testing? Based on the selections, it appears that it is but clarification on that is needed.
o    How will eCQMs be reviewed against the reliability and validity sub-criteria given our comments and concerns outlined above.
o    For risk adjustment, it is not clear whether developers will be required to provide a sufficient level of detail on whether social risk factors were considered and tested. Developers often discount the importance of this question and we believe that it will be critical to understand the impact of their inclusion or exclusion in the model. We do not believe that the current criteria adequately address this concern.
o    The previous criteria used by NQF were more comprehensive on the threats to validity including exclusions and missing data. Knowing how a measure as specified meets these threats is very important and we do not agree with omitting them from the evaluations.
•    Equity
o    We assume that this criterion is intended to replace the previous sub-criterion on disparities in care and support its continued inclusion.
o    However, the description of this criterion appears to require empirical testing of the difference in scores, which differs from what was previously required where developers, particularly during the initial review, could provide supporting literature or distributions of performance scores by subgroups with no statistical analysis.
o    In addition, based on the current language, it is not clear whether a measure that either demonstrates overall poor performance or variation across the entire patient population but does not demonstrate differences in subgroups could be considered as eligible for endorsement.
o    While measuring and understanding the potential disparities in care is very important, some measures may not have disparities across subpopulations and that lack of variation should not prevent a measure from being endorsed.
o    We ask that PQM clarify the intent of this criterion.
We request that the PQM reconsider the proposed timeline for the two endorsement maintenance cycles. Based on Figure 1 on page 5, public comment is scheduled to occur at the same time as much of the work for Pre-Rulemaking Measure Review (PRMR)and Measure Set Removal (MSR) and will likely overlap with other activities such as public comments on proposed rules. These overlapping timelines lead to a significant burden for external stakeholders. We are extremely concerned that it will lead to reduced public input on one or more of these activities and urge the PQM to change the timing of the MSR and endorsement comment periods to avoid the months when proposed rules are also released.
Lastly, we appreciate that the PQM recognizes that refinements will need to be made to the process and that any proposed changes will include a formal public comment period in addition to a timeline for transition. We also note that the PQM commits to not applying any changes to the process or criteria for measures that are currently in the review process. However, measure development and testing is a multi-year process and developers need sufficient time to incorporate any changes into their development and testing plans. We urge PQM to commit to delaying implementation of any significant changes, including some of the measure evaluation criteria revisions outlined in this guidebook, for at least two years in order to allow them to be responsive. We also request that the PQM commit to an initial evaluation after the first or second year of implementation and ongoing re-evaluations of the process. These evaluations should be comprehensive including whether the Novel Hybrid Delphi and Nominal Group (NHDNG) technique and structure of the Advisory Group and/or Recommendation Group successfully achieve the desired goal of consensus-driven recommendations and whether the project topic areas and expertise on the committees are appropriate.
While the FAH is supportive of many of the proposed processes outlined in this guidebook, we urge PQM to ensure that the endorsement maintenance process is stable and consistent with sufficient advance notice of changes. Otherwise, there is great risk of compromising its integrity and discouraging participation in the process.
Thank you for the opportunity to comment.

Name or Organization

Federation of American Hospitals

Committee structure and appeals process

Which aspect(s) of the guidebook are you commenting on?

PQM Measure Evaluation Rubric

Appeals

Thank you for the opportunity to comment.

Committee structure and voting process concerns and request for clarifications :It is unclear how the committee voting and outcomes will be determined under the split committee structure and review period.first, it is concerning that the committee offline review and rubric will be completed concurrently with the public comment period as described on page 19. This means that some measure endorsement decisions may be determined without regard to public comments received as only areas of disagreement between the advisory and recommendations groups will be included in the verbal discussions of the public meetings. Second, will the rubric counts and percentage be included in the reports? And related, does the 80% voting quorum apply to the offline rubric since measures do not appear to be included in the public meeting if consensus is reached during the offline review. Third, if 80% voting quorum of the combined committee why can only a third of the committee contribute to the public discussion? Lastly on voting, please describe if the endorsement decision is based on 75% of all voting members voting in favor or no for endorsement or if the 75% is that 75% agreement between the two groups. The appendix table shows percentage for groups of 20 and 40 voters. But it is unclear how the two groups (advisory and recommendations) groups and compared for the final determination if consensus for or against or approve with conditions is determined. Adding an appendix with an example showing a sample decision process for a hypothetical measure with full numbers and related calculations of an sample measure with rubric totals, percentages for each group on each rubric item as well as the level of agreement between the two committee groups (advisory and recommendations) then if that leads to final decision or live meeting committee action. Including a sample vote during or following committee meeting discussion.

With regards to appeals, I agree with other commenters that the appeals period is different from a public comment period and should labeled appropriate in the guidebook and documents and announcements during the process. Second i am concerned that the measure developer participation is not included in the process description including if the the developer will have the opportunity to submit a written response to the appeal . I was the NQF appeals board team manager in 2022 and am aware how important it was to the developer of sepsis bundle measure #0500, to have appropriate time to write their response. I recommend if and when the developer will have the ability to provide a written response, Third, why not standing independent appeals board similar to the one NQF created in 2022 ready each cycle to review any appeals, rather then a adjective committee of the standing committee co-chairs. If you have an independent appeals board already seated that group can be scheduled and not delayed the process. That group can bring a fresh perspective to the situation as well as greater public and industry confidence in the final decision. I think part of the language is that the a decision that the appeal does not have merit or meet the minimum criteria for an appeal. The US courts have a process called summary judgment where a judge can determine a case does not meet minimum requirements or evidence and dismiss the case and appeals courts don’t always hear cases ( the Supreme Court has procedures to decide which cases submitted they will hear). A process for summary judgment dismissal of appeals may be appropriate however an independent appeals board can bring less bias to the final decision especially when a vote is held or an Appeal is dismissed without full review. These measures are high stakes for healthcare professionals and organizations as well as the public seeking care since the health systems react and change behavior based in hopes of increasing their scores, it is important that endorsement decision be made carefully and appeals reviewed carefully.

Thank you again for the opportunity to comment,

Beth Flashner,MHA

Name or Organization

Beth Flashner

Thank you for the…

Thank you for the opportunity to review and comment upon the Partnership for Quality Measurement (PQM) first Endorsement and Maintenance (E&M) Guidebook. I submit these comments as an individual member of PQM with extensive consensus-based experience, as follows:
• I currently serve on the Scien􀆟fic Methods panel (since 2019)
• Member, Na􀆟onal Quality Forum (NQF) Varia􀆟on in Measure Specifica􀆟ons Advisory Group (2016-2017)
• Co-Chair, Technical Expert Panel on Composite Performance Measure Evalua􀆟on, Na􀆟onal Quality Forum (2012-2013)
• Member, Na􀆟onal Quality Forum Task Force on Measure Tes􀆟ng (2010-2011)
• Member, Na􀆟onal Quality Forum Task Force on Usability (2011-2012)
• Chair, Expert Advisory Panel, Na􀆟onal Quality Forum Measure Specifica􀆟on Coding Maintenance Project (2009-2010)
• Member, Surgery and Anesthesia Technical Advisory Panel, Na􀆟onal Voluntary Consensus Standards for Hospital Care: Addi􀆟onal Priori􀆟es, 2007, Na􀆟onal Quality Forum (2007-2008)
• Member, Safe Prac􀆟ces Maintenance Commitee, Na􀆟onal Quality Forum (2008-2014, intermitent)
• Member, Na􀆟onal Quality Forum Ad Hoc Advisory Commitee on Evidence and Performance Measure Grading (2005-2006)
• Member, Na􀆟onal Quality Forum Workshop on Child Healthcare Quality Measurement and Repor􀆟ng (2004)
• Developer or co-developer of many consensus-based en􀆟ty (CBE) endorsed measures of hospital quality and pa􀆟ent safety, including several electronic clinical quality measures (eCQMs) and claims-based measures, such as the Pa􀆟ent Safety Indicators (PSIs) originally developed by AHRQ and now stewarded by CMS in the form of PSI 90, the Pa􀆟ent Safety and Adverse Events Composite
Most importantly, I appreciate Batelle’s efforts to streamline the E&M process so the decision-making process can be expedited while maintaining the transparency and mul􀆟-stakeholder par􀆟cipa􀆟on that characterize the current process. The most notable and advantageous enhancements include:

• Re􀆟rement of the Consensus Standards Approval Commitee, which currently adds litle value to the E&M process;
• Tightening the calendar between the Intent to Submit and the full submission, limi􀆟ng the amount of informa􀆟on required with the Intent to Submit to those elements necessary for proper assignment of the measure to a project, and recruitment of needed reviewers; and
• Reducing the number of E&M commitees to ensure more equitable distribu􀆟on of effort and to increase the number and diversity of voters on each commitee (while increasing Batelle’s efficiency in managing the process).

However, I do have several concerns regarding the proposed process, and whether it will actually achieve the goals of CBE review.

First, the Scien􀆟fic Methods Panel (SMP) would be limited to enhancing “all measures by focusing on novel and the most difficult methodological challenges faced by measure developers.” The Guidebook is otherwise silent on the composi􀆟on, ac􀆟vi􀆟es, and procedures of the SMP, demonstra􀆟ng how it would be marginalized under the new Guidebook. Specifically, the SMP’s role would be en􀆟rely advisory, and would have no direct role in the E&M process. Although there is no need for the SMP to review every submited measure, or even every “complex” measure, it should be engaged to address par􀆟cularly important or novel methodologic ques􀆟ons on individual measures, or to help resolve ques􀆟ons regarding consistent treatment of similarly situated measures. For example, the project E&M commitee chairs, in consulta􀆟on with CBE staff, could refer measures to the SMP when addi􀆟onal methodologic input is needed for their scien􀆟fic acceptability review. Alterna􀆟vely, project commitees could have an op􀆟on of referring a measure to the SMP before making a final decision. Although such processes could poten􀆟ally pull a measure off the 6-month endorsement track, deferring a final decision to the next cycle, this process would apply to a small minority of submited measures, and the total dura􀆟on of the process would not exceed the dura􀆟on of the current process (for ALL measures). The SMP’s involvement in individual measure review would be expected to decrease over 􀆟me, as the measure evalua􀆟on criteria become more precise and beter understood, and as project commitees develop greater methodologic exper􀆟se, but elimina􀆟ng any op􀆟on for involvement in individual measure review seems imprudent at this 􀆟me of transi􀆟on and uncertainty.

Second, the proposed project structure would benefit from some clarifica􀆟on of the areas covered to improve balance across committees and to ensure that each committee is competent to review the measures assigned to it. To elucidate this problem, it would be helpful to enumerate the currently endorsed measures that would fall within the domain of each project. For example:
• Primary preven􀆟on, which is typically defined as efforts to prevent the development of disease, intervening before health effects occur, is primarily within the domain of public health and therefore mo􀆟vates rela􀆟vely few measures requiring or referred for CBE review. Rela􀆟vely few CBE-endorsed measures focus on primary preven􀆟on, and one of the examples provided (i.e., cervical cancer screening) is clearly NOT primary preven􀆟on. I suggest broadening the concept to include elements of secondary preven􀆟on, such as screening for diseases (e.g., cervical, breast, and colorectal cancer; alcoholism as in CBE#2152).
• Ini􀆟al recogni􀆟on and management should cover more than signs and symptoms; it should cover the en􀆟re diagnos􀆟c process including diagnos􀆟c safety and diagnos􀆟c error, which have recently been recognized as cri􀆟cally important gaps in the current quality measurement enterprise. For example, laboratory tes􀆟ng and imaging are cri􀆟cal components of the diagnos􀆟c process; a commitee in this domain should have strong representa􀆟on from disciplines such as radiology, pathology, and laboratory medicine.
• The 3rd and 4th projects are not clearly delineated and would require hugely divergent exper􀆟se; for example, consumer assessment of hospice care is extremely different from management of pediatric hemodialysis. I suggest instead dividing projects according to the objec􀆟ve of the measure, which may also align with the well-accepted Ins􀆟tute of Medicine/Na􀆟onal Academy of Medicine domains for high-performing care: for example, “pa􀆟ent or caregiver experience” versus “􀆟mely and effec􀆟ve care for acute and chronic condi􀆟ons” versus “pa􀆟ent safety.” My suggested approach would ensure that the first commitee has experts in survey research and pa􀆟ent experience, while the second has experts in chronic disease management and process measurement, and the third has experts in pa􀆟ent safety.

Third, measures would undergo maintenance reviews every 3 years, although developers/stewards may request extension for up to 1 year, based on unclear criteria. I suggest instead adop􀆟ng a consistent 5-year 􀆟meline for maintenance review (subject to the provision for emergency/off-cycle review if needed), which would significantly reduce the burden on measure developers and the en􀆟re PQM. There is no benefit to triennial review, and the proposed approach seems problema􀆟c in the absence of clear criteria for proposing or accep􀆟ng an extension.

Fourth, the removal of endorsement would require “75% or greater agreement for endorsement removal by the E&M committee,” if the steward resubmits the measure with evidence of a meaningful gap. This standard is very problema􀆟c because it drama􀆟cally lowers the bar from the ini􀆟al endorsement decision. In other words, a measure would require 75% support for endorsement, but only 25% (+1) support for maintenance. In other words, is the “default assump􀆟on” or “base case” that measures should be sunset a􀅌er 5 years, and con􀆟nued if they demonstrably con􀆟nue to meet E&M criteria, or is it that measures should be con􀆟nued and used forever? I would argue that the standard for maintenance should be the same as the standard for endorsement, and that raising the bar so drama􀆟cally will make it very difficult to sunset measures that the vast majority (up to 75%) of experts no longer support.

Fi􀅌h, it is important to allow flexibility for criteria such as “no significant change in measure results for accountable entities over time.” Just because performance on a measure has not improved (yet) does not mean that it cannot improve, or that it will never improve. In some cases, there may be obstacles to improving performance that are difficult to overcome with currently available human, organiza􀆟onal, and financial resources, but con􀆟nued aten􀆟on to the measure will help to address these obstacles. Some problems in health care require con􀆟nued aten􀆟on and focus, even if progress has been difficult.

Sixth, the proposed “75% or greater agreement for endorsement” threshold appears to be a significant change from the current 60% threshold, but it is hard to evaluate how it will actually work in prac􀆟ce, based on Appendix F. More transparency and clarity regarding this proposed change is necessary. For example, the column headings in Appendix F are undefined and uninterpretable. The footnote to the table (“threshold for consensus is 0.95”) appears to contradict the 75% threshold described elsewhere and is unrealis􀆟cally high. It is not clear what the propor􀆟ons within the table cells represent. The concept of “total available range of variance” is not defined or illustrated by example. It is unclear what es􀆟mator of variance will be used based on the Measure Evalua􀆟on Rubric in Appendix D; for example, will commitee members be asked to rate measures on the 1-9 ordinal scale used in Davies et al. (2011), or will they simply be asked to classify measures as “not met,” “not met but addressable,” or “met”? The later classifica􀆟on is categorical, not ordinal, because addressability is a complex judgment that is condi􀆟oned on available 􀆟me and resources (and requires developer input).

Seventh, the role of measure developers and stewards in the E&M process must be clarified and strengthened. The current Guidebook appears to remove developers and stewards from the process, except insofar as they may be asked to address ques􀆟ons from E&M staff, as described on page 13. It is important that measure developers and/or stewards con􀆟nue to be available to address ques􀆟ons raised by commitee co-chairs and members (not just E&M staff), to respond to public comments, and to address criteria flagged as “not met but addressable” by commitee members.

Eighth, the roster targets in Table 3 appear to grossly underes􀆟mate the importance of individuals with clinical exper􀆟se in the E&M process, especially given how this roster category includes all types of licensed health professionals. For example, review of process measures requires deep knowledge of how specific diseases should be treated, based on current professional guidelines and published evidence. For example, cancer-related measures require input from medical oncologists, radia􀆟on oncologists, surgical oncologists, primary care providers involved in cancer care, oncology nurses, therapists involved in cancer treatment and recovery, radiologists or pathologists involved in diagnosis and follow-up surveillance, etc. Although strong representa􀆟on of other stakeholders, as described, is essen􀆟al, they cannot provide the cri􀆟cal review and interpreta􀆟on of clinical evidence that is required for process-of-care measures, in par􀆟cular.

Ninth, the Guidebook proposes a new criterion for appeal based on “evidence that the appellant’s interests are directly and materially affected by the measure, and that the CBE’s endorsement of the measure has had, or will have, an adverse effect on those interests.” This criterion appears to be borrowed from the judicial sphere, is conceptually problema􀆟c, and will prove to be impossible to implement fairly. Specifically, this criterion will disqualify any appellant that is not a provider, purchaser, or payer of healthcare. The CBE is not an appellate court. Any stakeholder, including pa􀆟ents, pa􀆟ent/caregiver advocacy organiza􀆟ons, and researchers, should be able to appeal a CBE decision based on the existence of a procedural error, overlooked evidence, misapplica􀆟on of the measure evalua􀆟on criteria, or other failure of the review process. The proposed criteria would preclude any researcher or former patient (for example) from appealing a decision. In addition, I would strongly recommend that the Appeals Commitee include at least some independent voices and recuse the involved committee co-chairs from decision where they have a conflict of interest. Fair consideration of appeals requires limiting the participation of those who have a vested interest in rejecting the appeal, and including impar􀆟al reviewers.
Finally, some of the specific Measure Evaluation Criteria proposed in Appendix D do not appear to be well justified based on either measurement theory or published literature. For example, it is unclear why the proposed threshold is so low for inter-rater agreement (0.4) and somewhat higher for test-retest reliability (0.5). I am not aware of evidence that test-retest reliability is more important, or easier to achieve, than inter-rater agreement. Specifically, test-retest reliability es􀆟mates are very sensi􀆟ve to the interval between test and retest, as the underlying phenomenon may change between test and retest. On the other hand, inter-rater agreement is usually tested at a single point in 􀆟me, or based on the same source material (e.g., images, text, video of a pa􀆟ent encounter), so it isolates the impact of the assessor. The threshold for inter-rater agreement should be equal to or higher than the threshold for test-retest agreement, consistent with SMP discussion in 2022. It should also be clarified that the threshold for accountable en􀆟ty-level reliability (0.6) refers to a measure of central tendency (median or mean), as these reliability es􀆟mates vary widely according to the volume/size of the en􀆟ty. With respect to risk-adjustment, it is some􀆟mes appropriate for risk-adjustment models to include features that do not significantly influence the measured outcome, if they are reasonably EXPECTED to influence that outcome, based on the conceptual framework and published literature. Similarly, it is o􀅌en appropriate to include features that do not vary significantly in prevalence across measured en􀆟􀆟es, because they COULD vary in prevalence, given a larger and more diverse tes􀆟ng sample, and because they are clearly associated with the outcome of interest. In other words, the definition of confounding provides a reasonable foundation for identifying potential risk-adjustment features, but it is o􀅌en appropriate to include features that do not meet the strict definition of a confounder.

Thank you for this opportunity to comment on the E&M Guidebook, and I look forward to following this process to its conclusion.
Patrick S. Romano, MD MPH
Professor of Internal Medicine and Pediatrics
University of California, Davis
Member, Scien􀆟fic Methods Panel

Partnership-for-Quality-Measurement-EM-Guidelines-Comments.pdf

Name or Organization

Patrick S Romano

E&M Guidebook

Please see attached

AUA comment - E&M process - July 2023.pdf

Name or Organization

Karen Johnson

Questions on PQM E&M Guidebook

Which aspect(s) of the guidebook are you commenting on?

Measure use – accountability application characteristics

Other

Thank you for the opportunity to provide public comment on the latest PQM Endorsement and Maintenance (E&M) Guidebook. Below please find these questions on this document:

- In the previous CBE process, eCQMs could be submitted for Approval for Trial Use. eCQMs that were Approved for Trial Use now have the endorsement status of “Endorsed” on the PQM website. Will eCQMs that were Approved for Trial Use by NQF be able to apply for Initial Endorsement with PQM?

- For the Importance domain, will PQM provide a flow chart (like the one NQF provided in their 2021 Measure Evaluation and Criteria, p. 15) that shows how the evidence from literature will be rated for the different levels of “Not Met”, “Not Met but Addressable”, “Met”? (Appendix D, pp. 31-33)

For the Scientific Acceptability domain:

- Will PQM provide flow charts (like the ones NQF provided in their 2021 Measure Evaluation and Criteria, p.24-25) that show how reliability and validity testing will each be rated for the different levels of “Not Met”, “Not Met but Addressable”, “Met”? (Appendix D, pp. 33-36)

- Does PQM require risk adjustment for measure endorsement? If no risk adjustment or stratification is used for the measure, then does the measure developer need to provide justification for that to meet the criteria for Scientific Acceptability? (Appendix D, p. 34, p. 36)

- For eCQMs, would PQM confirm that score level validity is not required for initial review (and only data element validity of structured fields is required)? (p. 16)

- Could PQM provide more detailed information on what they consider adequate for face validity, is it the methodology or results? (Appendix D, p. 36)

- For the Equity domain, since NQF collected this information differently, will PQM provide an example measure submission that obtained (or would obtain) the rating “Met” for Equity? (Appendix D, pp. 37)

Name or Organization

Ella Puga

Breadcrumb

Comments

Acumen, LLC

Committee Functions

Based on the proposed…

Comments on PQM Endorsement & Maintenance Process

E&M Guidebook Comment

Comment on E&M Guidebook

Feedback on E&M Structure

Comments on the E&M Guidebook

Partnership for Quality Measurement

Comments on the E&M Guidebook

NCQA Public Comment Submission

Comment and request for clarification on new CBE process

APA Comments on E&M Guidebook

American Society of Anesthesiologists Comments on E&M Guidebook

LTSS/HCBS quality measures and E & M committee structure

Joint Commission comments on the E&M Guidebook

General Comments

Committee structure and appeals process

Thank you for the…

E&M Guidebook

Questions on PQM E&M Guidebook