I am concerned that staff reviews will contain an explicit rating recommendation. In my experience of many years as a developer and NQF committee member, those recommended ratings rarely change. To some degree that is for good reasons, because some measures are clearly suitable or unsuitable for endorsement. For the ones that are less clear, however, an explicit recommendation may introduce bias. The psychology of these groups is that many members are not familiar with either the substantive content behind the measure or measurement science. Given that uncertainty, they tend to defer to a qualified authority. As you increase lay representation, the problem is likely to grow.
This is an important consideration because we have harvested the low-hanging fruit for measures, i.e., the ones for which there is unambiguous RCT evidence, so we are moving into fields in which measures are based on evidence of lower quality. That is particularly true for areas of medicine that do not conduct large-scale clinical trials, such as palliative care or rare diseases.
Thus, my suggestion would be that staff follow the practice of peer review in journals, i.e., provide a reasoned evaluation but without an explicit verdict.
Thank you for your consideration.
Changes in the committee review process from NQF processes to Battelle processes present several questions and concerns:
1. It appears that each committee member will independently review and make judgments about each measure without the benefit of a group discussion among the diverse experts. Given the number of measures each reviewer will be asked to review (up to 11 per cycle for each committee), it is unrealistic to expect each member to carefully examine the underlying evidence offered by each developer and the technical characteristics of each measure (feasibility, usability, etc.). Discussion by all reviewers is helpful to everyone before meaningful opinions are formed. In the proposed process, only those measures with apparent disagreements in judgment will come up for discussion and review by a small subset of committee members. I strongly suggest that you revise the process to facilitate discussion by the whole committee before any judgments are made, and certainly before a vote to endorse is entertained.
2. How do you see the role of content experts in this review process? With wide stakeholder representation on each committee, clinicians with experience and expertise in particular scientific and clinical issues will be a small minority of the whole committee. If the whole committee needs scientific and clinical information bearing on a measure before members submit an opinion, what mechanism exists to accomplish this? I suggest that a scientific review group composed of subject experts pre-review each measure and offer its opinions to all committee members before they are asked to make judgments about each measure.
3. In the NQF process, the first step the committee was charged to examine was the evidence base for the measure. An evidence algorithm was provided to assist in this process. After presentation by a discussion lead, and discussion by the committee, a vote was held on whether the evidence supported the measure. Moving further in the review process required agreement that the evidence supported the measure. Where in the Battelle process is the scientific evidence reviewed critically? It appears that scientific acceptability is one of many questions considered before a single overall yea or nay vote on the measure is held. I am concerned that the fundamental validity of the measure could be lost in such a process, where characteristics of a measure are considered even if there is no good science underpinning it. I strongly suggest you develop a process to review the evidence with an algorithmic guide as a necessary prelude to any further consideration of each measure.
4. I did not read or hear about a logic model for each measure. I may have missed it, but it is important to ask the developer to explain the logic model used to develop each measure: Why is it important? How will it improve care? Is there evidence that use of such a measure will improve care? And most importantly, is there evidence that there will be no unintended consequences of using the metric? I suggest that each of these issues be addressed by the developer for all proposed measures.
ISO accreditation relates to the quality measurement requirements referenced in the Del-3-6 Endorsement and Maintenance Guidebook; the laboratory must complete accreditation in accordance with the applicable safety standard.
On p. 27 of the E&M guidebook, under Annual Updates, it says 5 years, but elsewhere it says 3 years until maintenance review of endorsed measures.
Why can't we simplify to a reasonable fee for service? The practice of reducing payment based on measures that may or may not relate to the medical problem being addressed is becoming more convoluted. It is getting in the way of good patient care, not improving it.
Based on our experience with the measure submission process prior to the new contract, we believe it is essential that the PQM committee(s) evaluating patient experience measures include individuals with expertise in survey methodology and analysis. The NQF Patient Experience and Function Committee served to ensure continuity in evaluation across surveys. For example, the CAHPS surveys use the same development and testing methodology but evaluate patients’ experiences in many different care settings.
To maintain consistency in the review of each CAHPS survey, we have found it essential for a consistent group of committee members to conduct the review. We recommend that Battelle consider retaining the Patient Experience and Function Committee. If this is not possible, we suggest you include at least two survey experts in the evaluation of all patient surveys, including patient experience surveys such as the CAHPS surveys and other patient-facing measures, such as PROMs. This could be accomplished by making these experts available to each of the five new patient journey committees when survey measures arise.
The Johns Hopkins Armstrong Institute for Patient Safety and Quality aims to reduce preventable harm, improve clinical outcomes and experiences, and reduce waste in health care delivery at Johns Hopkins and around the world.
We support the rigorous review of quality measures to ensure that meaningful measures are used to evaluate and improve performance without causing unintended harms.
We have reviewed the new endorsement and maintenance process proposed by Battelle and have the following comments:
- We support the continued use of rigorous criteria and processes to evaluate measures.
- We support the plan to consolidate the number of committees to six and remove the redundant CSAC committee.
- We support greater individual panel member review of measures prior to the larger group discussion.
- We have some concerns about having enough content experts on each standing committee. Given there will only be six standing committees, each committee will be covering a very broad range of clinical topics. We are concerned that a committee may not have the expertise needed to adequately understand or speak to every proposed measure. We appreciate the idea of soliciting outside content expertise when needed, however, we believe that process needs to be formal in nature and fully transparent.
Thank you for the opportunity to comment.
Kidney Care Partners (KCP) is a non-profit coalition of more than thirty organizations comprising the full spectrum of stakeholders related to dialysis care—patients and advocates, dialysis professionals, physicians, nurses, researchers, therapeutic innovators, transplant coordinators, and manufacturers. KCP is committed to advancing policies that improve the quality of care and life for individuals at every stage along the chronic kidney and end stage renal disease care continuum, from prevention to dialysis, transplant, and post-transplant care. KCP applauds Battelle and the Partnership for Quality Measurement (P4QM) for its commitment to serve as the new Consensus Based Entity (CBE) for the Centers for Medicare and Medicaid Services (CMS), and we appreciate the opportunity to comment on the new Endorsement and Maintenance (E&M) Process. We offer the following comments.
KCP appreciates Battelle’s focus on accelerating and streamlining the prior E&M consensus processes. Given today’s rapidly evolving clinical, legislative, and regulatory environments, a commitment to meeting stakeholders’ needs in a timelier manner is paramount. However, sufficient time must be provided for stakeholders to review and develop thoughtful comments and positions on endorsement recommendations—and for measure developers to appropriately respond to feedback and questions during the submission and review process. We urge Battelle to acknowledge this reality and to ensure that the quality of the process is not compromised in the interest of expediency.
SCIENTIFIC METHODS PANEL ROLE
KCP welcomes Battelle’s proposal to provide opportunities for Measure Developer support throughout the submission and endorsement review processes. In particular, we applaud the revised role of the Scientific Methods Panel (SMP), wherein its members’ considerable expertise can be better leveraged on a measure-by-measure basis to assist developers struggling with methodological challenges. We have long advocated for this more fluid use of the SMP. In contrast to the prior resource-intensive, siloed development and endorsement processes, this revised approach is a more iterative, collaborative process between developers, the SMP, and the E&M Committees that we believe will better meet the needs of all parties, will conserve limited measure development resources, and will result in stronger, more meaningful measures.
Battelle notes in its E&M Guidebook that it will enact a more robust and transparent appeals process. However, the proposed Appeals Panel will consist of the internal Battelle E&M team, the Chairs of the E&M Committee that initially voted to endorse the measure in question, and “others, requested as needed.” As described, we are concerned that this process may neither be robust nor transparent; we request additional details for clarity. For instance, is there an opportunity for new nominations to Appeals Panels to better engage stakeholders in the process and to provide new opinions and diverse points of view? At current, we fear the described revised appeals process may prove to be little more than a rehash of prior positions from individuals involved in the original decision in favor of the measure.
NOVEL HYBRID DELPHI AND NOMINAL GROUPS TECHNIQUE
Battelle proposes a novel hybrid Delphi and nominal groups multistep process to increase engagement of all committee members and structure facilitation by using standard measure evaluation criteria and practices. While we are pleased by the proposal for expert facilitation to better ensure an equitable sharing of ideas and opinions among committee members, we have serious reservations with the process as currently specified.
We note that it appears each E&M project will now have two committees—a Foundational Advisory Group to review and make recommendations on all candidate measures, and a Recommendations Reconciliation Group to make final recommendations on measures for which there was not consensus within the Advisory Group. Additionally, there now appear to be only five E&M projects, which have absorbed the many project committees created under NQF. As such, although it is not entirely clear in the Guidebook, we believe that the Renal Standing Committee will no longer exist.
Instead, renal measures will now be reviewed under two larger projects, “Management of Acute Events, Chronic Disease, Surgery, and Behavioral Health” and “End-of-Life Care, Rescue, Specialized Interventions”. Given the large number of topics being covered and the target rosters of 45 and 15 members, respectively, we gather that there will be only one or two individuals with renal-related expertise on each of the Advisory and Recommendations Groups. We have grave concerns with this proposal and feel the end result will be a tremendous loss of experiential and clinical expertise within all clinical areas. We note that the dialysis facility, in particular, is a unique care setting guided by unique Federal regulations and a unique punitive payment system. ESRD Quality Incentive Program (QIP) penalties often disproportionally and paradoxically impact the most financially vulnerable facilities treating the most socially and medically disadvantaged patients.
The Renal Standing Committee was constructed to ensure that measures being considered for the QIP are technically appropriate for use in this singular patient population and specialized care setting and will not inadvertently perpetuate the very disparities CMS and Battelle are working to address. As proposed, these committees will not have the requisite clinical or experiential knowledge to provide meaningful input in the wide array of topic areas. Likewise, clinicians, patients, and advocates with expertise in a particular area will now be asked to review measures outside that realm of expertise, squandering their insights, talents, and time. The end result will likely be the adoption of a significant number of new measures into the QIP for which there was no input from providers with knowledge of the inner workings of dialysis facilities—or from the patients and families that will ultimately be impacted by those metrics.
We believe this loss of expert input contradicts CMS’s “Measures that Matter” priorities and anticipate it will have a profoundly deleterious impact on the quality of the QIP. We urge Battelle to reconsider this approach and to reinstate individual Advisory Groups addressing specific clinical foci. Barring that, we urge Battelle to invite appropriate subject matter experts or sitting Standing Committee members to participate in measures’ reviews.
Finally, and importantly, we have profound concerns with renal measures falling under the “End-of-Life Care” Project. We note that patients, care-partners, and advocates working with KCP are in fact offended by this proposal. As Battelle knows, many dialysis patients lead long, active, and fulfilling lives for 5, 10, or even 20 or more years. Individuals with chronic and end-stage renal disease—including those on dialysis—do not consider themselves to be dying, but rather living, with kidney failure. Moreover, splitting the renal topic area into two projects will further dilute the clinical expertise required to review these complex measures. We strongly suggest Battelle revise the proposal to allow for review of all renal-related measures within the Chronic Care Project.
KCP again thanks you for the opportunity to comment on the Battelle Clinical Quality Measure Endorsement and Maintenance Process. If you have any questions, please do not hesitate to contact Lisa McGonigal, MD, MPH ([email protected]).
Sheetz KH, Gerhardinger L, Ryan AM, Waits SA. Changes in dialysis center quality associated with the End-Stage Renal Disease Quality Incentive Program: An observational study with a regression discontinuity design. Ann Intern Med. 2021 Aug;174(8):1058-1064. doi: 10.7326/M20-6662. Epub 2021 Jun 1. PMID: 34058101.
PQM has obviously put a lot of work into the E&M project, and I’m happy to see some significant improvements. CMS chose not to renew its contract with NQF in part due to concerns about how they handled reports of unintended adverse consequences. As one of the leaders advocating for the exclusion of the Spinal Cord Injury (SCI) population from the CAUTI measure, I am writing to share some observations and suggestions.
Since 2014, several SCI professional organizations have persistently raised concerns about the CAUTI measure's validity (does it measure a meaningful outcome?) and safety (examples of unintended consequences) in the SCI population. Due to the unique bladder physiology of SCI, any quality measure that focuses exclusively on CAUTIs related to indwelling urethral catheters is inherently unable to provide a meaningful estimate of CAUTI incidence in this population. Furthermore, the measure promotes interventions that can improve UTI rates in the general population but have consistently failed to improve UTI rates in SCI. It inadvertently penalizes hospitals for practicing guideline-concordant, patient-centered care while rewarding those that do not. Finally, it incentivizes Foley removal by those who lack sufficient expertise to manage SCI neurogenic bladder. SCI experts continue to report the predictable adverse consequences of this practice.
Essentially, the CAUTI measure serves as an ANTI-Quality Measure for SCI – it achieves the exact opposite of the desired outcomes. Some of the nation’s most prominent SCI centers score well below national average on this measure. We have collected data from a nationwide ED database that shows that UTIs in SCI are not declining as they are in the non-SCI population, and that the predictable adverse events one would expect from inexpert SCI bladder management have been increasing (particularly during years of aggressive catheter removal). Please refer to the attached, “Risks vs Benefits for SCI” document for an explanation of the relevant underlying concepts in SCI medicine and for an extensive list of supporting references.
Despite our efforts to educate, representatives from the CDC have continued to claim the CAUTI measure provides accurate, useful information for this population, and that the measure encourages practices known to improve infection rates in SCI. The NQF’s E&M process never forced the CDC to thoughtfully evaluate the evidence behind their assertions, or to defend these claims with a thoughtful review of the scientific literature. It is my opinion that if they had been forced to do this in 2014, it would have been obvious to everyone that the CAUTI measure presents all risk and no benefit to this small, marginalized population. Instead, a years-long struggle has ensued. I believe that this struggle has been characterized by Cognitive Bias and groupthink that circumvented logical reasoning and prevented what should have been a very simple decision.
Moving forward, I present the following thoughts and suggestions:
1) Avoid Premature Closure: Voting committee members should have sufficient opportunity to learn relevant underlying concepts. This may require more than a 15-minute public comment period – particularly when discussing esoteric concepts.
2) In-Group Bias: A measure developer might avoid in-group bias by recruiting its own expert in the field of SCI (or other relevant field) to help it understand the material and represent its interests. This can also mitigate the Dunning-Kruger effect. I eventually discovered that the CDC had a PM&R physician on its staff but failed to utilize his expertise to understand our concerns.
3) Watch for evidence of Authority Bias: The CDC is a respected federal agency. However, several reports from the nonpartisan Government Accountability Office (most notably related to the Washington, DC lead water crisis) show that the agency has not always acted with scientific (or ethical) integrity. My experience over the last nine years suggests the CDC still has lessons to learn from the 2010 GAO report.
Claims made by the CDC and other measure developers should face the same level of scrutiny/skepticism as claims made by everyone else. Do their claims square with the literature? Do they make sense?
4) Anticipate and develop a plan to manage uncomfortable discussions. Most of us hate to acknowledge past mistakes, but failure to do so precludes learning from them. After nine years of resistance, it would be extremely uncomfortable and embarrassing for the CDC to acknowledge that it made inaccurate or misleading statements on this topic. In past deliberations, moderators have cut discussion short just as we were beginning to expose the absurdity of some of CDC’s claims. PQM should enable us to press forward during these times of discomfort.
5) Consider which side should bear the burden of proof.
In 2019, I received an email stating, “CDC would like to see a scientifically rigorous assessment and substantiation of the concern that the NHSN CAUTI measure has had the unintended consequence of frequent and devastating instances of bladder mismanagement of spinal cord injury patients in acute care hospitals, due largely or solely to the measure itself.” [emphasis added]
Conventional medical reasoning involves a balancing of benefits and risks. If there is reason to believe that a measure provides little benefit to a population, and that it carries some risk, then off-cycle review should not be postponed until we have proof/quantification of harm. Usually, we prefer to prevent adverse events before they happen. This unjustified risk represents an egregious deviation from conventional medical ethics.
Furthermore, their request to see evidence of “frequent and devastating [consequences] … due largely or solely to the measure itself” presents an entirely unreasonable standard. We do not have this strength of evidence to link smoking to lung cancer. CDC has saddled the tiny SCI community with the burden of proof, while it continues providing CMS with incomplete and misleading data that is used for public reporting and reimbursement purposes.
6) Clearly define consequences for non-endorsement. In past NQF deliberations, committee members worried that temporarily withholding endorsement would amount to “scrapping the [CAUTI] measure,” that it would somehow undermine improvements in CAUTI prevention nationwide. Several committee members expressed serious concern about the SCI population, but they couldn’t bring themselves to enforce such an extreme consequence.
It would be good to have an alternative to this “all-or-nothing” approach. Perhaps a middle-ground category could be useful – a category that makes endorsement conditional on certain adjustments or additional justification. We shouldn’t have to wait for the next 3-year cycle to determine whether a measure developer adequately addressed concerns raised by the committee.
If you’ve read this far, I thank you for your time. I believe that all of us are committed to improving Patient Safety, and that with enough work we can achieve consensus on this matter.
-- Matt Davis, MD
The American Medical Association (AMA) appreciates the opportunity to comment on the Partnership for Quality Measurement (PQM) Endorsement and Maintenance Guidebook. We are supportive of PQM’s efforts to ensure that the process emphasizes consensus and agree with the increase of the consensus threshold to 75%. We are very concerned that the project topic areas and associated committees will not ensure that the right clinical expertise reviews each measure. In addition, we question some of the proposed changes to the appeals process and the apparent changes to the measure evaluation criteria. We ask that the PQM carefully consider the following comments in an effort to further improve the process.
Regarding the proposed topic areas, we are extremely concerned that this structure will not ensure that measures are reviewed by the relevant clinical experts. For example, the Management of Acute Events, Chronic Disease, Surgery, Behavioral Health project is much too broad and would require representation by many specialties including but not limited to surgery, primary care, emergency medicine, endocrinology, gastroenterology, ophthalmology, and psychiatry. Based on the target number of individuals for each roster category outlined in Table 3 on page 9, we do not believe that having eight clinicians serve on this committee will be adequate. While this proposed process intends to leverage subject matter experts (SMEs) in instances where the committee does not have the necessary expertise, the guidebook does not adequately address how these SMEs will be identified or how they will be vetted (both to ensure that they have the relevant background and meet the conflict of interest policy). Based on the potential breadth of measures that would be reviewed in these projects, we believe that SMEs will be used frequently and as a result, question this lumping of so many clinical areas into so few projects. We also do not believe that having one SME provide input on a measure is sufficient, particularly when they are not able to vote on that measure. We urge the PQM to reconsider limiting measure reviews across only five topic areas and be more explicit on how SMEs will be vetted and their participation in the process.
The AMA also believes that describing the process as including two public comment periods is misleading, since the second public comment period will really function as the appeals process, and those commenting during that timeframe must justify their concerns against two criteria. While the appeals process is important, we do not consider it a traditional public comment period in which stakeholders can share their opinions and perspectives without limitation. Furthermore, and perhaps more importantly, we strongly disagree with including Battelle staff or the co-chairs of the very committee that voted for or against a measure in the review of any appeal. There is a very real risk of introducing bias into the process, and it should be avoided at all costs.
Regarding the clarifications provided on electronic clinical quality measures (eCQM) testing on page 16, we believe that PQM is raising the bar and deviating from the previous criteria – something that it was our understanding that PQM sought to avoid at least initially. Language in the third bullet on page 16 states:
“Documentation of testing on more than one electronic health record (EHR) system from more than one EHR vendor is required to establish Scientific Acceptability (i.e., reliability and validity), indicating that the measure data elements are valid and that the measure score can be accurately calculated.”
The previous criteria guidance used by National Quality Forum (NQF) stated that:
“Data element validation is required for all eCQMs (demonstration of accountable-entity level validation is also encouraged). For eCQMs based solely on structured data fields, reliability testing will not be required if data element validation is demonstrated. If data element testing is not possible, justification is required and must be accepted by the Standing Committee.”
It also stated that:
“The minimum requirement is testing in EHR systems from more than one EHR vendor. Developers should test on the number of EHR systems they feel appropriate. It is highly desirable that measures are tested in systems from multiple vendors.”
Comparing the PQM guidebook with the previous NQF criteria, it appears that both measure score reliability and data element validity testing will now be required, and the number of vendor systems has possibly increased. Many developers who planned to submit an eCQM for endorsement in the next year or two will likely have either completed or at least budgeted and planned for testing based on the previous criteria. These changes will have a significant impact on whether developers can meet the endorsement requirements and may prohibit groups from submitting measures that could greatly contribute to advancing quality. We do not view these changes as mere clarifications, and the unintended consequence of discouraging measure submissions must be avoided.
Given the assurances that Battelle staff gave when asked whether the measure evaluation criteria would be changing, we are shocked to see several modifications in Appendix D that were not noted elsewhere in the guidebook. Similar to our reaction to the eCQM clarifications, we believe that some of the language in these criteria goes beyond mere clarification or simplification of the submission process. While we appreciate the inclusion of guidance on how a measure may meet or not meet a criterion, we have several questions and concerns:
- The previous set of criteria used by the NQF required measures to pass importance and scientific acceptability, which ensured that measures met the highest bar for these two criteria. Has that “must pass” requirement been removed?
- Under scientific acceptability, clarification on the following items is required:
- While we very much support the inclusion of thresholds for both levels of reliability testing, it is not clear if the threshold will be set using the minimum or average result. The AMA strongly urges PQM to set this threshold at the higher bar for the minimum result rather than for the average.
- Will developers be allowed to provide their explanation or interpretation of the findings for reliability testing? This is included for validity testing, and we believe that developers should also provide this information for reliability testing.
- Is data element validity testing considered to be empiric validity testing? Based on the selections, it appears that it is but clarification on that is needed.
- How will eCQMs be reviewed against the reliability and validity subcriteria, given our comments and concerns outlined above?
- For risk adjustment, it is not clear whether developers will be required to provide a sufficient level of detail on whether social risk factors were considered and tested. Developers often discount the importance of this question and we believe that it will be critical to understand the impact of their inclusion or exclusion in the model. We do not believe that the current criteria adequately address this concern.
- The previous criteria used by NQF were more comprehensive on the threats to validity including exclusions and missing data. Knowing how a measure as specified meets these threats is very important and we do not agree with omitting them from the evaluations.
- We assume that this criterion is intended to replace the previous subcriterion on disparities in care and support its continued inclusion.
- However, the description of this criterion appears to require empirical testing of the difference in scores, which differs from what was previously required where developers, particularly during the initial review, could provide supporting literature or distributions of performance scores by subgroups with no statistical analysis.
- In addition, based on the current language, it is not clear whether a measure that either demonstrates overall poor performance or variation across the entire patient population but does not demonstrate differences in subgroups could be considered as eligible for endorsement.
- While measuring and understanding the potential disparities in care is very important, some measures may not have disparities across subpopulations and that lack of variation should not prevent a measure from being endorsed.
- We ask that PQM clarify the intent of this criterion.
We request that the PQM reconsider the proposed timeline for the two endorsement maintenance cycles. Based on Figure 1 on page 5, public comment is scheduled to occur at the same time as much of the work for Pre-Rulemaking Measure Review (PRMR) and Measure Set Removal (MSR) and will likely overlap with other activities such as public comments on proposed rules. These overlapping timelines create a significant burden for external stakeholders. We are extremely concerned that this will lead to reduced public input on one or more of these activities and urge the PQM to change the timing of the MSR and endorsement comment periods to avoid the months when proposed rules are also released.
Lastly, we appreciate that the PQM recognizes that refinements will need to be made to the process and that any proposed changes will include a formal public comment period in addition to a timeline for transition. We also note that the PQM commits to not applying any changes to the process or criteria for measures that are currently in the review process. However, measure development and testing is a multi-year process and developers need sufficient time to incorporate any changes into their development and testing plans. We urge PQM to commit to delaying implementation of any significant changes, including some of the measure evaluation criteria revisions outlined in this guidebook, for at least two years in order to allow them to be responsive. We also request that the PQM commit to an initial evaluation after the first or second year of implementation and ongoing re-evaluations of the process. These evaluations should be comprehensive including whether the Novel Hybrid Delphi and Nominal Group (NHDNG) technique and structure of the Advisory Group and/or Recommendation Group successfully achieve the desired goal of consensus-driven recommendations and whether the project topic areas and expertise on the committees are appropriate.
While we are supportive of many of the proposed processes outlined in this guidebook, we urge PQM to ensure that the endorsement maintenance process is stable and consistent with sufficient advance notice of changes. Otherwise, there is great risk of compromising its integrity and discouraging participation in the process.
Thank you for the opportunity to comment.
The Society for Healthcare Epidemiology of America (SHEA) is a professional society representing more than 2,000 physicians and other healthcare professionals around the world who possess expertise and passion for healthcare epidemiology, infection prevention, and antimicrobial stewardship. The society’s work improves public health by establishing infection-prevention measures and supporting antibiotic stewardship among healthcare providers, hospitals, and health systems. This is accomplished by leading research studies, translating research into clinical practice, developing evidence-based policies, optimizing antibiotic stewardship, and advancing the field of healthcare epidemiology. SHEA and its members strive to improve patient outcomes and create a safer, healthier future for all.
We are submitting the following overarching comments on the Endorsement and Maintenance (E&M) Guidebook. We are happy to set up a time to talk through these comments in greater detail.
- We support the continuation of the rigorous criteria used to develop and evaluate measures
- We support consolidating the committees and removing the redundant CSAC committee
- We support greater individual review of the measure based on specific criteria to help mitigate the risks of “group think” at the recommendation meeting
- We have some concerns about having enough content experts on the Advisory and Recommendations Groups. With only 6 and 2 clinicians, respectively, covering a broad range of topics, we do not believe there will be enough clinical bedside expertise to adequately understand or speak to the clinical aspects of the varied measures in the large portfolio. We appreciate Battelle’s plan to solicit content expertise; however, we believe that process needs to be more formal and codified than asking other Battelle members, and it should include professional organizations that have likely considered the various clinical sequelae and unintended consequences of the topic of interest.
- We believe there should be a greater number of statisticians and health services researchers on the committees, as there will no longer be a Scientific Methods Panel review before the recommendation discussion.
Thank you for your consideration.
Lynne Batshon, Director of Policy & Practice, on behalf of the Society for Healthcare Epidemiology of America
- Having the Advisory Group neither participate in nor discuss measures in the meeting, yet still re-vote, could be viewed unfavorably, particularly by the Advisory Group members themselves.
- Not discussing in the meeting any measures that appear to have consensus (any status of consensus prior to the meeting) could be viewed unfavorably. Stakeholders expect a process that provides the opportunity to be heard. Written feedback, even publicly posted, isn’t enough if the E&M process specifies a meeting, particularly when a measure has pre-meeting Do Not Endorse consensus.
- The 60%-80% quorum as designed could lead to mismatches between process and purpose. The 60% meeting quorum pertains to the Recommendations Group, such that the full complement of the Advisory Group could be present but there would still be no meeting quorum (i.e., if the Recommendations Group lacks sufficient attendance).
- Voting offline (for most or all voting) because the voting quorum was not reached could be viewed as unpalatable given the purpose of the CBE process. PQM may want to revisit the Guidebook language saying Advisory Group members are encouraged to attend the meeting, since the voting quorum includes them (at the meeting for voting/re-voting; consistency with these terms is needed as well).
When will the system to enter upcoming new or maintenance measure content be available? We aren’t waiting for the system’s availability to compile required inputs, as Battelle has outlined that the categories won’t change, but knowing the timing will help in our planning.
- P.2 Project Topical Areas Table 1: The first two Project Topical Areas may be a bit blurred, or the inclusion of measures in one or the other may appear somewhat arbitrary. For example, the measure cited in the table, CBE #2152 Preventive Care and Screening: Unhealthy Alcohol Use: Screening & Brief Counseling, uses “Preventive” in its title but is listed as an example not for the first Project Topical Area (Primary Prevention) but for the second (Initial Recognition and Management). “Initial Recognition” also sounds non-medical (and I’m not sure what it means in this context) versus assessment or diagnosis.
- P.3 Project Topical Areas Table 1: The term “structural changes” in the Areas Covered entries for Management of Acute Events, Chronic Disease, Surgery, and Behavioral Health reads oddly in context. What structural changes are referenced (i.e., system, person, or measure)?
- P.8 After Figure 2: The text indicates that each E&M project committee (Recommendations Group plus the Advisory Group) has two co-chairs whose responsibilities include co-facilitating meetings; however, on the previous page, the Advisory Group members are encouraged to attend the Recommendations Group endorsement meeting to listen to the Recommendations Group discussions and to re-vote on measures during the meeting. It is confusing as to whether the Advisory Group co-chair is facilitating or listening.
This section is difficult to understand due to the sequencing of the information.
- “At the conclusion of the pre-evaluation commenting period, the Recommendations Group meets for the endorsement meeting” - then the next paragraph
- “During the meeting” - so far so good, but then
- “The E&M team shares these preliminary results with both Groups for review prior to the endorsement meeting” - then back to
- “At the end of meeting.” (May also want to add in “the”: at the end of the meeting.)
P.7 Foundational and Reconciliation Groups: The language is: “Within each E&M project committee, there is a Foundational Group and a Reconciliation Group (Figure 2)”; but these groups aren’t listed in the text or addressed in Figure 2 as such. It isn’t clear that they refer to the Advisory and Recommendations Groups. “Foundational” is also used on p.19, whereas “Advisory Group” may be more easily understood.
While we appreciate the intent to maintain fewer, more generalized project committees, we have concerns about the ability of committee members to maintain expertise regarding the measures being reviewed. We recommend committees be structured in line with the PRMR and MSR process, organized by the areas of healthcare affected by the measures: hospital, clinician, and post-acute care/long-term care (PAC/LTC). The endorsement and maintenance (E&M) process should be segregated into setting-specific committees in order for members to provide more meaningful feedback. Committee members with subject matter expertise in one setting, for example, pediatric hospitals, would likely not be able to substantively contribute to discussions of measures in home health, and vice versa. This was a weakness of the NQF measure endorsement process as well: even with committees full of subject matter experts, meaningfully discussing or considering a measure was difficult across such a broad range of settings and discipline areas.
We greatly support E&M committee members receiving the information packets three weeks prior to Endorsement Meetings. Having the information several weeks in advance provides more time for committee members to substantively review and consider measures prior to the meeting. However, we do not support E&M committee members individually rating measures prior to the Endorsement Meeting, where they have the ability to discuss the submitted measure information with the full committee, especially given the current plan to not have the Recommendations Group discuss measures that reach a 75% or greater consensus based on the aggregate pre-evaluation independent reviews. This format could allow measures to be endorsed by committee members with little to no experience in a particular sector, given the proposed structure of committees.
We would appreciate further clarification regarding the endorsement decisions. Will there only be “Endorsed” and “Not Endorsed” decisions? Additional clarification should be provided in the “Endorsement Committee Review” section on how consensus may factor into the decision to endorse a measure, endorse with conditions, or not endorse. Furthermore, additional clarification is needed regarding the appeals process and what threshold would need to be met to allow or consider appeal requests.
- We believe that the prior CBE (NQF) endorsement process, although well-defined, was limited by its focus on evaluating only fully developed quality measures for accountable entities (e.g., professional or facility providers). The criterion that quality measures must be tested at the provider level prevented the consensus-based evaluation entity from advising stakeholders about the potential importance of some types of quality measures. This is particularly true for patient-reported outcomes that require initial testing at the patient level before undergoing evaluation at the provider level. Given that clinical areas focused on rare or chronic conditions are served by a limited group of providers, it is very difficult to assemble the amount and types of data needed to carry out provider-level reliability and validity testing (for scientific acceptability) to support the measure submission. As health insurers or stakeholders consider potentially burdensome decisions regarding collection of additional data to support provider-level testing, it would be informative if the CBE prospectively provided an opinion regarding the potential importance of a putative measure. We strongly encourage Battelle to consider a broader view of the endorsement process, allowing your group of experts in quality measures to weigh in on these decisions. When the measure steward/developer does not or cannot collect primary data and conduct beta testing at the provider level, the result is likely to undermine future use of an impactful measure before development is completed. This is particularly true for patient-reported outcome metrics, as well as certain intermediate outcome measures. Additionally, it is imperative that a set of criteria be established for building block measures (i.e., measures that are necessary to inform the development of an outcome measure).
These measures could also be temporary or intermediate measures with limited testing data due to the diagnosis/condition, the setting, or limited patient- versus facility-level analysis. In some instances, developing a patient-reported outcome measure may not be possible from the outset, and an intermediary phase of data collection and feedback, using an intermediary measure, may be necessary. Receiving endorsement for such building block measures would allow implementation in the respective settings and enable collection of the large-scale data needed to develop fully developed patient-reported outcome measures, and would allow developers to demonstrate the need for and importance of such measures.
- Our quality measure development team has extensive experience submitting quality measures to the (prior) consensus-based endorsement organization over the last two decades. We attended the committee review and debate at in-person meetings, and more recently at virtual meetings. Although virtual meetings provide some efficiencies and flexibility, we believe that some vital interactions between developers and committee members have been lost. Our most recent reviews by the committee were performed under a process in which all comments from committee members were first debated and the developer was asked to hold any responses until the committee debate was completed. At that point the developer was expected to respond to each comment or question raised over the course of the prior 30-60 minute discussion. First, when committee discussion is unfocused and steers off course with many comments raised, it is very difficult for the developer to respond to each of these comments after the fact, often up to 30-60 minutes later. Second, when a committee member makes a pejorative comment that may not be fact-based, there is no opportunity for the developer to offer a correction in real time. Such comments can bias other committee members against a measure based on an incorrect statement or misstatement. In our experience, it is very difficult to clarify and set the record straight after another 30-60 minutes of additional committee discussion has passed. During the previous in-person meetings, the developer was able to indicate the need to respond to a comment immediately after it was made. In addition, non-verbal communication was much more evident to committee members.
While we are not suggesting a return to in-person meetings, we strongly request that Battelle recognize the limitations of the most recent process under the prior CBE (NQF), which substantively limited the ability of developers to respond to questions and clarify misstatements during the course of discussion. Addressing these limitations will allow more balanced, equitable deliberation and communication between committee members and developers, and will limit the introduction of bias into the quality measure endorsement process.
The American Medical Rehabilitation Providers Association (AMRPA) appreciates the opportunity to submit comments on the PQM Endorsement and Maintenance (E&M) Guidebook. AMRPA is the national trade association representing more than 700 freestanding inpatient rehabilitation facilities and rehabilitation units of acute-care general hospitals (IRFs). The vast majority of our members are Medicare participating providers. In 2021, IRFs served 335,000 Medicare Fee-for-service (FFS) beneficiaries with more than 379,000 IRF stays among 1,181 IRFs. Meaningful and effective quality reporting in the IRF program has always been a top AMRPA policy priority, and we look forward to close engagement with Battelle and the PQM moving forward.
AMRPA recognizes the importance of a consensus-based entity (CBE) and the process related to the endorsement and maintenance of quality measures that distinguish high-quality care in and among IRFs and other post-acute care providers. We also recognize how critical the E&M process is, and the amount of scientific rigor, public input, and committee consideration that is dedicated to every measure. We agree that the E&M process should result in the endorsement of measures that are safe, effective, and promote the likelihood of desired outcomes. AMRPA is hopeful that the new PQM E&M process will build upon the previous PQM process and provide a more collaborative and transparent mechanism for measure development and endorsement.
While AMRPA supports the PQM E&M concept, our review of the E&M Guidebook has identified a few concerns related to the committee assignments and structure, as well as the process for considering measures for endorsement. We note that many of these recommendations complement the separate comments we provided on the PQM Guidebook of Policies and Procedures for Pre-Rulemaking Measure Review (PRMR) and Measure Set Review (MSR) and urge Battelle to incorporate these refinements across both documents. We offer our recommendations in the attached document.
 Inpatient rehabilitation facilities (IRFs) – both freestanding and units located within acute-care hospitals – are fully licensed hospitals that must meet Medicare Hospital Conditions of Participation (COPs) and provide hospital-level care to high acuity patients. IRFs’ physician-led care, competencies, equipment and infection control protocols are just some of the features that distinguish the hospital-level care provided by IRFs from most other PAC providers.
 Medicare Payment Advisory Committee (MedPAC) March 2023 Report to the Congress – Medicare Payment Policy, Chapter 9. Pages 263 and 266.
Please see attached letter from the American Board of Family Medicine.
Thank you for the opportunity to provide comments on the proposed Endorsement and Maintenance (E&M) Guidebook for consensus endorsement of performance measures. The suggested process enhancements represent a substantial departure from the previous approach and yield quantifiable benefits in optimizing the consensus process for performance measures. I offer the following comments:
The December timeframe allocated for public comment on submitted measures may pose limitations for some stakeholders due to competing priorities. For instance, specialty societies will concurrently review the annual Measures Under Consideration (MUC) list. While we recognize Battelle's efforts to adopt a more efficient and nimble timeline, we understand that flexibility may be necessary during this period of the year, as there may be limitations in staff available to review a multitude of materials at the same time. Resolving this issue might not have a straightforward solution due to uncontrollable factors and conflicting priorities.
From a measure developer perspective, the reduced timeline of a 6-month cycle (from intent to submit to the endorsement decision) will be appealing. One caveat is that the removal of the post-comment calls could potentially lead to the perception that there is less “consensus development” in the process. There may be more appeals from the public as a result, especially since the criteria for appeals do not appear as stringent.
It would be valuable to have an explanation of the hybrid Delphi and nominal group techniques publicly available at some point. We encourage and support the emphasis placed on skilled/trained facilitators to run the measure evaluation meetings. The current plan to ask committee members to vote on all criteria and an overall recommendation for endorsement for all measures at the end of the meeting may be a challenge for participants after a full day of discussion.
We support the addition of the “Equity” criterion. We have noted that Battelle is eliminating the algorithms that provided guidance for evaluating evidence, reliability, and validity. There are pros and cons to this. It can be good for measures that do not fit well into an algorithm pathway, but conversely, the algorithms provide a consistent and standardized way for committee members to apply the criteria. Finally, there are no designated “must-pass” criteria for endorsement. It isn’t clear how the “met,” “not met but addressable,” or “not met” ratings factor into the overall recommendation for endorsement. Does “not met but addressable” equal endorsed with conditions? More guidance/clarification is needed.
E&M Committee Reduction:
Reducing and streamlining the number of E&M committees is commendable. However, there is concern that the prior structure's rigor and focus might be compromised. For instance, the Cardiology workgroup specialized in cardiovascular-specific measures, and there is apprehension about losing expertise and experience in this specific clinical category, even with the involvement of subject matter experts throughout the process. Retaining certain committee members with historical knowledge of measures could help preserve institutional knowledge from past discussions. Additionally, ensuring sufficient subject matter experts are available to evaluate specialty care-relevant measures should be a priority in the committee structure.
Measure Development & Submission:
Overall, the process changes and enhancements are well-received, and measure developers are likely to appreciate these improvements. One suggestion is to consider extending the endorsement period from three years to four or more years, with the option for annual maintenance when the developer modifies measure specifications. This would help reduce the measure steward/developer burden of maintaining endorsement. It is also welcome that the updated decision outcomes now include an “endorsed with conditions” designation. Moreover, for existing measures, clarity is needed regarding the process when a measure falls below the 75% threshold, as it does not receive endorsement but also does not receive enough votes to be definitively not endorsed. Additional clarification on this matter would be highly appreciated.
URAC is an independent, nationally recognized nonprofit organization and a leader in promoting health care quality through our accreditation, certification and designation products. We offer over forty accreditation and certification programs and have more than thirty years of experience accrediting and certifying organizations in multiple health care sectors including Specialty Pharmacy, PBMs, Clinically Integrated Networks, and Care Management programs including Telehealth and Remote Patient Monitoring Accreditation.
URAC appreciates the opportunity to comment on the Endorsement and Maintenance Guidebook. Our comment loosely pertains to Appendix A. The issue of publicly available measures is foundational to Battelle’s work in the health quality space. Measures should be broadly available for use on reasonable terms and on a non-discriminatory basis such that the permitted use is reasonably anticipated. Measures should be available for use by all, not limited to certain types of entities or users, and not limited to particular vendors. In the current accreditation space, it is the practice that not all measures can be licensed. This has an adverse effect on the accreditation industry because measures are limited to use by a single accreditor. Ultimately, this stifles innovation and growth in the accreditation industry while diminishing quality for patients and increasing costs. We believe that Battelle should take this into consideration as it further develops and implements its own policies and processes alongside the Endorsement and Maintenance Guidebook to support measure alignment and harmonization to improve health care quality.
The American Geriatrics Society (AGS) applauds the Partnership for Quality Measurement’s (PQM) commitment and efforts to engage with stakeholders on the new processes for the endorsement and maintenance (E&M) of quality measures. While the new processes generally seem reasonable, the AGS urges the inclusion of geriatrics expertise as these policies and procedures are further refined. It is also critically important to ensure that the various committees for E&M include geriatrics expertise, particularly as the current Geriatrics and Palliative Care committee will be dissolved. Geriatrics health professionals provide care for older adults, usually over the age of 65, and see the oldest and sickest patients. Their expertise in caring for older people with medical complexity or serious illness, leading interprofessional collaboration, implementing knowledge of long-term care across settings and sites, and treating older people as whole persons would be an essential skill set for the quality measurement process. Given the heterogeneity of older people, geriatrics health professionals are crucial to ensure that quality measures and the related processes meaningfully consider the unique healthcare needs of this growing population as well as the geriatrics specialty.