Child HP CAHPS Survey – Rating of Specialist

Importance

Strengths

Data from 2024 show a performance gap, with top-box score decile ranges from 62.1% to 82.6%, indicating variability in performance between health plans and less than optimal performance across all health plans (ideal performance is 100%).
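For readers unfamiliar with top-box scoring, the following is a minimal sketch of how such a score could be computed for one health plan. It assumes the top box is a rating of 9 or 10 on a 0-10 scale, which is the usual CAHPS convention for global ratings but is not stated in this assessment; the function name and the response data are hypothetical.

```python
from typing import Iterable, Optional

# Assumption: "top box" means a rating of 9 or 10 on the 0-10 scale.
TOP_BOX = {9, 10}


def top_box_score(ratings: Iterable[Optional[int]]) -> float:
    """Return the percentage of valid ratings that fall in the top box."""
    # Drop missing or out-of-range responses before scoring.
    valid = [r for r in ratings if r is not None and 0 <= r <= 10]
    if not valid:
        raise ValueError("no valid ratings to score")
    return 100.0 * sum(r in TOP_BOX for r in valid) / len(valid)


# Hypothetical responses for two plans (not real CAHPS data).
plan_a = [10, 9, 8, 10, 7, 9, 10, 6, 9, 10]
plan_b = [8, 9, 7, 10, 6, 8, 9, 5, 7, 10]

print(top_box_score(plan_a))  # 70.0
print(top_box_score(plan_b))  # 40.0
```

The performance gap described above is the spread of this percentage across health plans.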

Limitations

The logic model provided does not clearly articulate the relationships between inputs, activities, and outcomes. For example, the logic model does not clearly depict how the HP-CAHPS Rating of Specialist measure leads to improvements in clinical quality. Further, the logic model does not include assumptions, external factors, or feedback mechanisms. The submission could be strengthened by more clearly depicting the relationship between the HP-CAHPS Rating of Specialist and inputs, activities, outputs, and outcomes. It could also be strengthened by stating assumptions, external factors, and feedback mechanisms.

The evidence review includes three literature reviews from 2013, 2024, and 2021 which show patient experience measures are related to clinical outcomes. However, it does not include empirical evidence linking Rating of Specialist to clinical outcomes. The studies cited are not specific to parents’/guardians’ ratings of specialists providing care to their children.

Patient input is either not sufficiently sought or does not clearly support the conclusion that the measure is meaningful. The measure developer cites three empirical studies from 2007 and 2009 which demonstrate patients use patient experience measures to make decisions. However, these data are not specific to measures reporting patients’ Rating of Specialist. They are also not specific to parents’/guardians’ ratings of specialists providing care to their children. The degree of certainty from patient input is low.

The submission could be strengthened by incorporating more findings from the CAHPS Consortium’s literature review, focus groups, Technical Expert Panel, and other development activities which show the importance of this measure construct to patients and/or a link between parents’/guardians’ ratings of specialists providing care for their children and health outcomes. It could also be enhanced by including more recent literature related to the measure focus.

Rationale

The maintenance measure is rated as ‘Not Met But Addressable’ for importance due to a non-specific logic model and insufficient patient input/meaningfulness. Enhancements, including a greater focus on the construct addressed by this measure (rating of specialist), a more specific focus on care provided to children (the focus of this measure), and a more robust description of efforts to ensure patient meaningfulness, could elevate its importance.

Closing Care Gaps

Closing Care Gaps Rating

Closing Care Gaps

The developer did not address this optional domain.

Feasibility Assessment

Feasibility Assessment Rating

Feasibility Assessment

Strengths

This is a patient-reported experience performance measure (PRE-PM). These data are collected from parents/guardians of pediatric patients and are not available in a structured source outside the health plan or sponsoring organization.

The survey can be administered electronically; however, the developer states that non-electronic response options should be available for enrollees with limited internet access. Mail is the most frequent mode for CAHPS surveys.

The developer indicates that the only change to the instrument was a wording change that allows respondents to consider care provided virtually (i.e., by phone or video). They assert this change did not affect data structure or availability.

The developer addresses burden associated with data entry, validation, and analysis. They discuss electronic feasibility, missing data, susceptibility to inaccuracies, and the ability to audit data. They note the survey takes about 15 minutes to complete, depending on the respondent. Sampling uses administrative and enrollment data that are maintained by the health plans. Because the measure is collected outside of healthcare encounters, there is no impact on patient-physician interactions.

The developer described how all required data elements can be collected without risk to patient confidentiality, including administering the survey so the data are de-identified upon collection, only reporting responses in aggregate form, and not reporting results if there are fewer than 10 respondents.
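The aggregate-only reporting and small-cell suppression rule described above can be sketched as follows. The threshold of 10 respondents comes from the submission; the function name, data layout, and counts are hypothetical and are not the developer's implementation.

```python
# Threshold stated in the submission: do not report results
# when there are fewer than 10 respondents.
MIN_RESPONDENTS = 10


def reportable_results(counts_by_plan):
    """Map each plan to its aggregate top-box percentage, or None if suppressed.

    counts_by_plan: dict of plan id -> (top-box count, respondent count).
    Only aggregate counts are used; no respondent-level data are exposed.
    """
    results = {}
    for plan, (n_top_box, n_respondents) in counts_by_plan.items():
        if n_respondents < MIN_RESPONDENTS:
            results[plan] = None  # suppressed: too few respondents
        else:
            results[plan] = round(100.0 * n_top_box / n_respondents, 1)
    return results


# Hypothetical counts (not real CAHPS data).
counts = {"plan_x": (150, 200), "plan_y": (6, 8)}
print(reportable_results(counts))  # {'plan_x': 75.0, 'plan_y': None}
```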

There are no fees, licensing, or other requirements to use any aspect of the measure (e.g., value/code set, risk model, programming code, algorithm).

Limitations

The feasibility domain can be strengthened by providing the median cost of vendor engagement to administer CAHPS (or a similar metric).

Rationale

This maintenance measure meets all criteria for “Met” for feasibility due to its well-documented feasibility assessment, clear and implementable data collection strategy, and transparent handling of patient confidentiality, burden, licensing, and fees. These factors collectively ensure that the measure can be implemented effectively and sustainably in a real-world healthcare setting.

Scientific Acceptability

Scientific Acceptability Reliability Rating

Scientific Acceptability Reliability

Strengths

Data sources used for reliability analysis are adequately described and include a database with survey results collected from July 2023 through June 2024.
The developer conducted reliability testing using the ICC and the Spearman-Brown prophecy formula at the accountable entity-level.

Limitations

The developer performed reliability testing for this maintenance measure; namely, they conducted accountable entity-level reliability testing at the site level using unadjusted measure scores rather than case-mix adjusted measure scores, on the rationale that case-mix adjustment is not needed when entities are not compared to each other.
The entities included in the testing were characterized by practice site and a minimum sample size of 20 completed surveys. The developer states that approximately 300 completed surveys per practice are needed for statistically reliable results but does not give a rationale for the minimum sample size of 20.
The percentage of sites meeting the expected threshold of 0.6 for split-half reliability was unclear from the measure submission.
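To illustrate why the minimum sample size matters, the sketch below applies the general Spearman-Brown prophecy formula to project entity-level reliability from an ICC and a number of completed surveys. The ICC value of 0.01 is purely hypothetical (the submission's ICC is not reproduced here), the function names are illustrative, and the way the developer actually applied the formula is not described in this assessment.

```python
import math


def spearman_brown(icc: float, n: int) -> float:
    """Projected reliability of an entity mean based on n surveys.

    General Spearman-Brown form: n * icc / (1 + (n - 1) * icc).
    """
    return n * icc / (1 + (n - 1) * icc)


def surveys_needed(icc: float, target: float) -> int:
    """Smallest n whose projected reliability reaches the target.

    Obtained by solving the Spearman-Brown formula for n.
    """
    return math.ceil(target * (1 - icc) / (icc * (1 - target)))


ICC = 0.01  # hypothetical value for illustration only

for n in (20, 300):
    print(n, round(spearman_brown(ICC, n), 3))

print(surveys_needed(ICC, 0.6))
```

Under this assumed ICC, an entity with 20 completed surveys would fall well below a 0.6 threshold while one with 300 would exceed it, which is why a stated rationale for the 20-survey minimum would help reviewers interpret the results.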

Rationale

This maintenance measure is rated as ‘Not Met But Addressable’ for reliability because the developer performed the required reliability testing for this measure, but it is unclear whether the results demonstrate sufficient reliability at the accountable entity-level. However, the identified limitations are deemed addressable, as the developer may consider using case-mix adjusted data for reliability testing and providing their rationale for a minimum sample size of 20 completed surveys per practice site. By addressing these issues, there is potential to demonstrate sufficient reliability at the accountable entity-level.

Scientific Acceptability Validity Rating

Scientific Acceptability Validity

Strengths

The developer performed the required validity testing for this maintenance measure; namely, they conducted person (“data element”) validity testing for all critical data elements and accountable entity-level (“measure score”) validity testing at the health plan level. The data source used for validity analysis was the AHRQ CAHPS Health Plan Survey Database, with surveys administered to parents or guardians of Medicaid beneficiaries aged 0-17 from July 2023 through June 2024. Data included 111,833 respondents from 234 health plans in 49 states, the District of Columbia, and Puerto Rico.
The developer conducted empirical validity testing at the accountable entity level using Spearman’s rank-order correlation on unadjusted top box scores. The developer hypothesized that the Rating of Specialist would be positively and significantly correlated at weak to moderate magnitude with the four composites, having the strongest correlation with Getting Needed Care, reasoning that both measures assess experience with specialists. The developer posited that measures of patient experience should be correlated, but not so highly correlated as to suggest the measures are not distinct, citing rho > 0.80 as an example of a very large correlation. Rating of Specialist was significantly, positively correlated at the health plan level with all four composites (rhos ranged from 0.14 to 0.27), with slightly stronger correlation with Getting Needed Care (rho = 0.27), in line with the hypothesis.
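As an illustration of the entity-level test described above, the following sketch computes Spearman's rank-order correlation between plan-level top-box scores on two measures. The scores are hypothetical and the implementation is a plain standard-library version (Pearson correlation of average ranks), not the developer's code; significance testing is omitted.

```python
def average_ranks(values):
    """Return 1-based ranks, assigning tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical plan-level top-box scores (not real CAHPS data).
rating_of_specialist = [70.1, 72.4, 68.9, 75.0, 71.3, 73.8]
getting_needed_care = [80.2, 84.0, 83.1, 82.5, 79.9, 85.3]

print(round(spearman_rho(rating_of_specialist, getting_needed_care), 3))  # 0.314
```

A rho in the range the developer reports (0.14 to 0.27) would indicate a weak positive association between the plan rankings on the two measures.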

The developer conducted statistical case-mix adjustment, selecting case-mix indicators that are present at the start of care and have a significant correlation with the outcome.

Limitations

With respect to entity-level validity testing, while the developer has indicated that quality improvement activities can impact more than one measure, their hypotheses regarding the mechanisms involved and the degree to which these mechanisms are shared between IDMs are not clearly articulated. In the absence of an external gold standard against which to validate at least one of the IDMs, this submission would be strengthened by additional support in the logic model and evidence review guiding development of hypotheses about expected magnitudes of each correlation.
The developer used unadjusted scores for validity testing when use of adjusted scores might help rule out known sources of confounding.

The developer states that case-mix adjustment of the measure is optional for the user and does not provide guidance or supporting rationale stating when adjustment is or is not appropriate. The developer did not provide evidence demonstrating variation in the prevalence of case-mix factors across accountable entities. The statistical testing results provided by the developer do not reflect the impact of adjustment on providers at the high or low extremes of the case mix.

Rationale

This maintenance measure is rated as ‘Not Met But Addressable’ for validity because the accountable-entity validity testing results partially support an inference of validity for the measure, suggesting that the measure somewhat accurately reflects performance on patient experience of care and can distinguish good from poor performance to a limited extent. This submission would be strengthened by explicitly ruling in mechanisms and ruling out confounders for the effect of health plan quality on survey respondents' Ratings of Specialist.

The developer employed a statistical case-mix adjustment approach, utilizing a conceptual model designed to account for demographic case-mix factors. Variation in the prevalence of case-mix indicators across different entities was not shown and the model testing results provided do not reflect whether case-mix differences are being appropriately accounted for.

Use and Usability

Use and Usability Rating

Use and Usability

Strengths

The measure is currently used in the Office of Personnel Management Federal Employees Health Benefits (FEHB) Health Plan Performance Assessment project, NCQA Health Insurance Plan Ranking and Health Plan Accreditation, CMS Medicare Advantage (MA) and Prescription Drug Plan (PDP) Program, Patient Protection and Affordable Care Act – CMS Exchange and Insurance Market Standards/Quality Rating System, Agency for Healthcare Research and Quality CAHPS Database, and CMS Core Measure Reporting.

The developer provides a summary of how accountable entities can use the measure results to improve performance, drawn from the CAHPS Ambulatory Care Improvement Guide. Specifically, health plans can invest in hiring service-minded staff and train them on enrollee services so they can provide accurate information. Health plans should also listen to and act on enrollee complaints.

The developer seeks input from users including accreditors, health plans, and the public. The developer reports they have made minor changes to the wording of the instrument to reflect feedback from stakeholders, such as adding language to include care delivered by video or phone in addition to in-person care.

The developer reported no unexpected findings.

Limitations

The developer reported changes in performance from 70% in 2014 to 74% in 2020. However, they note performance declined to 71% in 2023 and increased to 72% in 2024. The developer does not provide an explanation for this decrease. They assert this highlights the need for strengthening relationships between patients/parents/guardians and specialists.

The developer summarized how accountable entities can use the measure results to improve performance, but the submission could be enhanced by including these in the measure's logic model.

Rationale

This maintenance measure is rated ‘Not Met But Addressable’. The measure shows variability in performance from 2014 to 2024. However, the application could be strengthened by offering an explanation of the mean-level decrease in performance and the subsequent rebound. The developer reported no unexpected findings.

Committee Independent Review

Endorse

Importance

Importance Rating

Importance

As with most of these, this is an important measure for understanding, at a directional level, satisfaction with specialists.

Closing Care Gaps

Closing Care Gaps Rating

Closing Care Gaps

I don't think this is addressed.

Feasibility Assessment

Feasibility Assessment Rating

Feasibility Assessment

It is feasible, but for patients with multiple specialists how is that distinction made to the patient or family? If they see both an APP and physician, or only one or the other, how would that be made clear in the assessment?

Scientific Acceptability

Scientific Acceptability Reliability Rating

Scientific Acceptability Reliability

Agree with staff assessment

Scientific Acceptability Validity Rating

Scientific Acceptability Validity

Agree with staff assessment

Use and Usability

Use and Usability Rating

Use and Usability

As with a lot of these measures, I'm not sure they get to the level of evaluation to really drive specific tactics and change.

Summary

It is an important measure and brings value, but could benefit from some optimization.

Endorse

Importance

Importance Rating

Importance

Endorse

Closing Care Gaps

Closing Care Gaps Rating

Closing Care Gaps

Endorse

Feasibility Assessment

Feasibility Assessment Rating

Feasibility Assessment

Endorse

Scientific Acceptability

Scientific Acceptability Reliability Rating

Scientific Acceptability Reliability

Endorse

Scientific Acceptability Validity Rating

Scientific Acceptability Validity

Endorse

Use and Usability

Use and Usability Rating

Use and Usability

Endorse

Summary

No concerns

Specialist Measure

Importance

Importance Rating

Importance

Rating of Specialist reflects a critical aspect of children’s care, as specialists often manage complex or ongoing conditions requiring trust, coordination, and clear communication with parents or guardians. The presence of a 2024 performance gap (top-box range 62.1%–82.6%) demonstrates meaningful variation across health plans and indicates room for improvement. However, the current submission does not sufficiently connect this measure to downstream clinical outcomes for children, nor does the logic model clearly explain how improvements in specialist ratings translate into improved health or care coordination. In addition, patient input specific to parents’ or guardians’ experiences with pediatric specialists is limited. These gaps are addressable through a more targeted logic model and stronger, child-specific evidence of meaningfulness.

Closing Care Gaps

Closing Care Gaps Rating

Closing Care Gaps

The developer did not address the optional Closing Care Gaps domain. While Rating of Specialist has potential to highlight disparities in access to and quality of specialty care for children, the submission does not describe how results are used to identify or close gaps across populations, geographies, or subgroups. Explicit articulation of how this measure could inform targeted interventions would strengthen its contribution to care gap reduction.

Feasibility Assessment

Feasibility Assessment Rating

Feasibility Assessment

This measure demonstrates strong feasibility. Data are collected through an established, standardized CAHPS survey administered outside of clinical encounters, minimizing burden on providers and families. Multiple administration modes, including mail and electronic options, support equitable participation. The developer clearly addresses confidentiality protections, data validation, auditing, and respondent burden, and there are no licensing or usage fees. These features support sustainable and consistent implementation across health plans.

Scientific Acceptability

Scientific Acceptability Reliability Rating

Scientific Acceptability Reliability

The developer conducted required entity-level reliability testing using accepted statistical methods. However, testing relied on unadjusted scores and included entities with a minimum sample size of 20 completed surveys, without sufficient justification. In addition, it is unclear how many sites met the expected reliability threshold. These limitations do not negate the value of the measure but should be addressed through clearer rationale for sample size thresholds and use of adjusted scores where appropriate to strengthen confidence in reliability.

Scientific Acceptability Validity Rating

Scientific Acceptability Validity

Validity testing shows that Rating of Specialist is moderately correlated with related access and experience measures, consistent with stated hypotheses. However, the submission does not clearly articulate the mechanisms linking specialist ratings to broader care quality or outcomes for children, nor does it adequately address potential confounding through case-mix adjustment. Additional clarity in the logic model, stronger justification of hypotheses, and clearer guidance on when adjustment is appropriate would strengthen the validity argument.

Use and Usability

Use and Usability Rating

Use and Usability

This measure is widely used across federal and accreditation programs and provides actionable insight into families’ experiences with specialty care. However, the submission does not explain recent performance declines or clearly link results to sustained improvement strategies within the logic model. Greater transparency regarding performance trends and clearer integration of improvement actions would enhance usability and trust in the measure.

Summary

I support CAHPS and believe the Rating of Specialist measure captures an important dimension of pediatric care. While the measure has a strong foundation and broad use, its importance and scientific acceptability would benefit from clearer child-specific logic modeling, stronger parent/guardian-centered evidence, and more explicit articulation of how results drive improvement. With these enhancements, the measure has clear potential to fully meet endorsement criteria.

0006-15

Importance

Importance Rating

Importance

No Comments

Closing Care Gaps

Closing Care Gaps Rating

Closing Care Gaps

No Comments

Feasibility Assessment

Feasibility Assessment Rating

Feasibility Assessment

No Comments

Scientific Acceptability

Scientific Acceptability Reliability Rating

Scientific Acceptability Reliability

No Comment

Scientific Acceptability Validity Rating

Scientific Acceptability Validity

No Comment

Use and Usability

Use and Usability Rating

Use and Usability

No Comment

Summary

No Comment

High Importance, Case-Mix Adjustment Approach Acceptable

Importance

Importance Rating

Importance

Disagree with staff assessment. The wide distribution of scores overall suggests that patients have identified clear issues with the health care system, reflected in their responses here, and several studies are provided supporting the link between patient experience with health care providers, including specialists, and clinical quality outcomes.

Closing Care Gaps

Closing Care Gaps Rating

Closing Care Gaps

N/A

Feasibility Assessment

Feasibility Assessment Rating

Feasibility Assessment

Agree with staff assessment. Measure is clearly feasible and widely used.

Scientific Acceptability

Scientific Acceptability Reliability Rating

Scientific Acceptability Reliability

Disagree with staff assessment. I do not anticipate issues with reliability of the measure score if it were re-run with case-mix adjusted figures, as the developer provided evidence that case-mix adjustment has only a modest effect on the measure score, and in any case is an optional “add-on” to the existing measure score calculation. A mitigation strategy is presented for low-reliability entities.

Scientific Acceptability Validity Rating

Scientific Acceptability Validity

Disagree with staff assessment. I do not find that the issues presented with the case mix adjustment specific to validity are sufficient to affect the rating of the criterion, as these are optional adjustments and largely under the discretion of the individual adjuster.

Use and Usability

Use and Usability Rating

Use and Usability

Disagree with staff assessment. Although there is little explanation provided for changes in performance, the changes in performance are slight and could be consistent with a positive upward drift.

Summary

Although the submission could be strengthened in some areas, the specific weaknesses are not sufficient to threaten the continued endorsement of this measure. This measure is a rare source of patient-reported data about the health care system, and reflects the performance of entities that are increasingly critical in guiding the course of health care in the United States, as health plans assume ever greater levels of control over provisioning care for their members.

Not met

Importance

Importance Rating

Importance

There ought to be evidence that the measure not only rates the health plans, but that the rating influences the plans' activities. Many will see that proposal as unrealistic; however, if that is the case, why are we measuring? The developer ought to be drawing a line between the measurement and plan outcomes. If they cannot do that, the measure is unimportant.

Further, as indicated by the staff assessment, the developers cite old studies, present no new data, and do not provide any real evidence of importance. The developers need to do the work to establish importance. The measures have been around long enough to have a history of importance--if such importance exists--and be able to show a relationship between measurement and outcomes--if such a relationship exists.

Closing Care Gaps

Closing Care Gaps Rating

Closing Care Gaps

This measure ought to be closing care gaps. The fact that the developer is not addressing this criterion reinforces the position that the measure is not important.

Feasibility Assessment

Feasibility Assessment Rating

Feasibility Assessment

The measure's feasibility ought to reflect not only its data collection possibilities, but also whether it has a benefit relative to its cost--including its personnel time and respondents' time. It provides no such data. For as long as it has been in use, such data ought to be available. If the measure is truly feasible, the developer ought to present the data with pride and vigor. That is not the case.

Scientific Acceptability

Scientific Acceptability Reliability Rating

Scientific Acceptability Reliability

Consistent with the staff assessment, I agree that the developer submitted reliability data; however, examination of the measure's reliability ought to address all instances in which users employ the measure. As reported, this measure lacks such reliability measurement.

Scientific Acceptability Validity Rating

Scientific Acceptability Validity

I am in complete concurrence with the staff assessment; however, given the period over which users have employed the measure the developer ought to have evidence on its usability that goes beyond outlining where organizations use it. If a measure is usable, it stands to reason that people have no difficulty in its use and that there is evidence to support that lack of difficulty. The developers present none of that.

Use and Usability

Use and Usability Rating

Use and Usability

Summary

Considering how long this measure has been in use, the developers ought to be able to produce an adequate supply of current measure data to support importance, feasibility, acceptability, and use and usability. They do not. This measure is not acceptable in its current form.

Support

Importance

Importance Rating

Importance

No additional comments.

Closing Care Gaps

Closing Care Gaps Rating

Closing Care Gaps

Optional item not submitted by measure developer.

Feasibility Assessment

Feasibility Assessment Rating

Feasibility Assessment

No additional comments.

Scientific Acceptability

Scientific Acceptability Reliability Rating

Scientific Acceptability Reliability

Measure developer can address the items noted in staff preliminary assessment.

Scientific Acceptability Validity Rating

Scientific Acceptability Validity

Measure developer can address the items noted in staff preliminary assessment.

Use and Usability

Use and Usability Rating

Use and Usability

No additional comments.

Summary

No additional comments.

summary

Importance

Importance Rating

Importance

Closing Care Gaps

Closing Care Gaps Rating

Closing Care Gaps

Feasibility Assessment

Feasibility Assessment Rating

Feasibility Assessment

Scientific Acceptability

Scientific Acceptability Reliability Rating

Scientific Acceptability Reliability

Scientific Acceptability Validity Rating

Scientific Acceptability Validity

Use and Usability

Use and Usability Rating