Patient-reported outcome data are needed to determine the efficacy of cosmetic procedures.
To describe the development and psychometric evaluation of eight appearance scales and two adverse effects checklists for use in minimally invasive cosmetic procedures.
Recruitment was between June 2010 and July 2014. Psychometric study to select the most clinically sensitive items for inclusion in item-reduced scales, and to examine reliability and validity with patients.
Plastic surgery and dermatology outpatient clinics in the USA and Canada and a clinical trial of a minimally invasive lip treatment in the United Kingdom and France.
Pre- and post-treatment facial aesthetic patients aged 18 years and older consulting for any type of facial aesthetic treatment.
FACE-Q scales measuring appearance of the skin, lips and rhytides (i.e., overall, forehead, glabella, lateral periorbital area, lips and marionette lines). Scale scores range from 0 (lowest) to 100 (highest). FACE-Q adverse effects checklists for problems following skin and lip treatment.
The sample (n=783) included 503 patients (response rate 90%) and 280 clinical trial participants. Rasch Measurement Theory analyses led to the refinement of 8 appearance scales with 66 total items. All FACE-Q scale items had ordered thresholds and acceptable item fit. Reliability, measured with the Person Separation Index (0.88 to 0.95) and Cronbach alpha (0.93 to 0.98), was high. Lower scores for appearance scales measuring the skin, lips and lip rhytides correlated with the reporting of more skin- and lip-related adverse effects. Higher scores for the 8 appearance scales correlated (range 0.70 to 0.28) with higher scores on the core 10-item FACE-Q Satisfaction with Facial Appearance scale. In the pre-treatment group, older age was significantly correlated with lower scores on 5/6 rhytide scales (exception forehead rhytides). Pre-treatment patients reported significantly lower scores on 7/8 appearance scales compared with post-treatment patients (exception skin).
FACE-Q appearance scales and adverse effects checklists can be used in clinical practice, research and quality improvement to incorporate cosmetic patients’ perspective in outcome assessments.
In 2014, 13.9 million minimally invasive cosmetic procedures were performed in the USA, representing an increase of 3 percent over the year before. In order to include the patient voice in the assessment of treatment outcomes in the cosmetics industry, patient-reported outcome (PRO) instruments are needed. A review of PRO instruments in 96,736 registered clinical trials between 2007 and 2013 showed that 27 percent used one or more, with 17 percent as a primary or secondary endpoint. The choice of which PRO instrument to use in a study is a crucial decision. If the wrong instrument is used, it may appear that a new aesthetic product or intervention has little to no benefit.
Engaging patients in the identification of issues that matter to them, and using their stories to develop PRO instruments, can help to ensure content validity [4–6]. Unfortunately, few such instruments are available for cosmetic treatments. A literature review to identify PRO instruments for cosmetic procedures found nine, of which three met international recommendations for how such tools should be developed and validated, i.e., BREAST-Q [8–9], FACE-Q and Skindex. The review concluded that research dedicated to the evaluation of PRO instruments in cosmetic surgery is urgently required.
The FACE-Q [10, 12–16] is a PRO instrument that includes more than 40 scales and checklists designed to measure appearance, adverse effects, health-related quality of life (HRQL) and experience of health care. These domains form the basis of the FACE-Q conceptual framework. Each domain contains multiple scales and checklists. Due to the large number of scales, validation results are being published as a series of papers, each of which describes clinically relevant groupings. The aim of this paper is to describe the set of FACE-Q scales/checklists that can be used to evaluate minimally invasive cosmetic procedures. Specifically, here we describe our psychometric findings for eight appearance scales designed to evaluate skin, lips and rhytides (overall, forehead, glabella, lateral periorbital area, lips and marionette lines). We also describe two checklists designed to measure adverse effects for skin and lip treatment.
Prior to study commencement, research ethics approval was obtained at The New School in New York City, University College London Hospitals NHS Foundation Trust in London, and University of British Columbia in Vancouver.
The FACE-Q was developed by following the USA FDA guidance to industry [2, 17] and other guidance documents [5–6]. We describe our methodology elsewhere [13, 18–20]. Briefly, a systematic review, qualitative interviews with 50 facial aesthetic patients, and input from 26 experts were used to develop the FACE-Q conceptual framework and scales/checklists. The content of each scale was then refined through cognitive interviews with 35 patients. We developed four response options in keeping with best practice. Instructions ask respondents to answer in relation to the past week.
The scales for skin and lips measure satisfaction with appearance. The six scales that measure appearance of rhytides (overall, forehead, glabella, lateral periorbital area, lips and marionette lines), and the adverse effect checklists (skin and lips), evaluate how bothered someone is by these concepts. eTable 1 in the Supplement shows the content and response options for the scales and checklists.
For validation purposes, we included three additional FACE-Q scales: the 10-item Satisfaction with Facial Appearance scale, the 10-item Psychological Function scale and the 8-item Social Function scale. These scales previously demonstrated reliability, validity and the ability to detect change [8, 15]. Participants were also asked questions so the sample could be characterized by age, gender and ethnicity.
The study included any patient aged 18 years or older who was pre- or post-treatment for one or more of any type of surgical or nonsurgical facial aesthetic treatment. For minimally invasive treatments, returning patients asked to participate who were more than four months post-treatment for botulinum toxin, or more than nine months post-treatment for soft tissue fillers, were considered to be pre-treatment subjects in our study sample. Participants were recruited from four dermatology and eleven plastic surgery offices in the USA and Canada between June 2010 and July 2014. For eleven clinics, staff provided a questionnaire booklet to complete in the waiting room at check-in. The remaining clinics invited patients to participate via a postal survey that included a personalized letter from the relevant health care provider alongside a questionnaire booklet, with up to 3 mailed reminders. Potential participants were provided a five-dollar coffee card in appreciation of their participation. Completion of the FACE-Q questionnaire implied consent.
An international randomized, 2-arm, active-controlled study recruited patients aged 18 years and older for a volume enhancement lip treatment. Participants were recruited from 12 sites in the United Kingdom and France. The treatment injection volume was based on clinical experience and subject lip treatment goals. Vermilion body and vermilion border were the primary treatment sites; additional perioral sites could also be treated. This study was approved by independent ethics committees. All subjects provided written informed consent. More details about the study sample and methods are published elsewhere.
The scales measuring lips and Satisfaction with Facial Appearance were administered on days 0, 30 and 90. The scales measuring lip rhytides and Psychological and Social Function were administered on days 0, 14, 30 and 90. The adverse effects checklist for lips was administered on days 14 and 30. These scales were translated into French by MAPI Research Trust, following their linguistic validation methodology, which includes two separate forward translations by two qualified translators, a reconciliation process, and one backward translation by a qualified translator.
For the adverse effect checklists, the proportion of responses for each response option was computed. For the appearance scales, Rasch Measurement Theory (RMT) [25–26] analysis was conducted within RUMM2030 software. RMT examines the difference between observed and predicted item responses to determine if data from a sample fit the Rasch model. The results from a range of statistical and graphical tests were examined, with the evidence considered together to make a decision about each scale’s overall quality [28–30]. We performed the following:
We also computed a Cronbach alpha for each scale, which provides a measure of how closely related a set of items are as a group. Rasch logit scores for each participant were transformed into scores from 0 (worst) to 100 (best). The scoring algorithm is available from the authors. Pearson correlations (to examine associations between scores) and independent samples t-tests (to test for differences between means) were used to test the following hypotheses:
A total of 503 out of 558 patients invited to participate completed a FACE-Q booklet containing one or more of the scales described in this study (response rate 90 percent). Additionally, 280 individuals participated in the lip enhancement clinical trial, for a total of 783 participants. Table 1 shows sample characteristics. When we compared the field-test sample with the clinical trial sample, mean age did not differ (p=0.77 on independent samples t-test), but gender did (p<0.001 on Chi-square test). Specifically, the clinical trial sample had fewer men than expected (9 versus 2 percent).
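The sample arithmetic reported above can be checked in a few lines. This is a minimal sketch that uses only the counts given in the text; the variable names are illustrative and are not part of the original analysis.

```python
# Counts taken from the text; variable names are illustrative only.
invited = 558             # field-test patients invited to participate
completed = 503           # field-test patients who completed a booklet
trial_participants = 280  # participants in the lip enhancement clinical trial

# Response rate applies to the field-test sample only.
response_rate = completed / invited

# The total sample combines field-test and clinical trial participants.
total_sample = completed + trial_participants

print(f"Response rate: {response_rate:.1%}")
print(f"Total sample: {total_sample}")
```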
The checklist measuring adverse effects of the skin was completed by 74 participants on average 2.4 (SD=3.6) months after a skin treatment (range immediate to 12 months). The top three items endorsed included redness, uneven skin tone and skin sensitivity (see Table 2).
The RMT analysis supported the reliability and validity of the appearance scales. All 66 items had ordered thresholds, providing evidence that each scale’s response options worked as a continuum that increased for the construct measured. Fit residuals were within the recommended range of −2.5 to +2.5 for 50/66 items (see eTable 2 in the Supplement), and 66/66 were not significant in terms of the Chi-square p-values, providing evidence that the items fit the expectations of the Rasch model for each scale. The 16 items with fit outside the recommended range were retained due to their clinical importance. Residual correlations between items were above 0.30 (range 0.35 to 0.59) for 6 pairs of items within 5 scales. Subtests performed on the pairs of items revealed marginal impact on scale reliability (0 to 0.01 difference in PSI value). For the scale measuring satisfaction with lips, differential item functioning (DIF) was detected for age and/or country on 5 items. When these items were split on the variable with DIF, and the new person locations for the scale were correlated with the original person locations, the DIF had a negligible impact (Pearson correlations were 0.99).
Figure 1 shows the person-item threshold distribution for the scale measuring rhytides overall as an example of targeting. The x-axis represents the construct (rhytide appearance), with higher scores (less bothered) increasing to the right. The y-axis shows the frequency of person measure locations (top histogram) and item locations (bottom histogram). In Figure 1a, the sample was divided into four groups based on their answer (“Not at all”, “A little”, “Moderately” or “Extremely”) to a stand-alone item that asked how bothered they were by: “How the lines on your face look overall?” In Figure 1b, the sample was divided into pre-or post-treatment groups. These examples provide evidence that the sample lay inside the range in which the scale provided measurement.
The p-values for fit to the Rasch model were not significant for 7/8 scales, which indicates that the data satisfied the requirements of the Rasch model. The p-value for the scale measuring lip rhytides was marginally significant (p=0.02). The eight scales showed high reliability. PSI and Cronbach alpha values, respectively, were as follows: skin (0.93, 0.93), lips (0.95, 0.97), rhytides overall (0.93, 0.95), and rhytides on the forehead (0.88, 0.95), glabella (0.91, 0.96), lateral periorbital area (0.92, 0.96), lips (0.93, 0.97) and marionette lines (0.92, 0.98).
Pearson correlations between the eight scales and Satisfaction with Facial Appearance scores were significant (p<0.001) and ranged from 0.70 (skin) to 0.28 (glabella rhytides). Correlations between the eight scales and Psychological Function were significant (p=0.03 to <0.001) for 7/8 scales (exception glabella rhytides), and ranged from 0.51 (lateral periorbital area rhytides) to 0.32 (rhytides overall). Correlations between the eight scales and Social Function were significant for three scales, including lateral periorbital area rhytides (r=0.40, p<0.002), lips (r=0.35, p<0.001) and lip rhytides (r=0.28, p<0.001).
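For readers who want to see how a Cronbach alpha of the kind reported above is obtained, the following is a minimal sketch of the standard formula. It is not the authors' code; the function name `cronbach_alpha` and the response data are hypothetical and chosen only for illustration.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach alpha for a list of per-person rows, one score per item."""
    k = len(responses[0])
    if k < 2:
        raise ValueError("Cronbach alpha needs at least two items")
    # Transpose rows (people) into columns (items).
    item_columns = list(zip(*responses))
    item_variances = [variance(column) for column in item_columns]
    # Variance of each person's total score across all items.
    total_variance = variance([sum(row) for row in responses])
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)


# Hypothetical answers from five people to a four-item scale with
# response options coded 1 to 4. These are NOT study data.
example = [
    [1, 2, 1, 2],
    [2, 2, 3, 2],
    [3, 3, 3, 4],
    [4, 4, 3, 4],
    [2, 3, 2, 2],
]
alpha = cronbach_alpha(example)
print(f"Cronbach alpha: {alpha:.2f}")
```

Values closer to 1 indicate that the items vary together. The PSI is a separate, Rasch-based reliability statistic and is not computed by this sketch.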
More skin-related adverse effects correlated with lower scores on the skin scale (r=−0.48, p<0.001). More lip-related adverse effects correlated with lower scores on the lip (r=−0.21, p=0.001) and lip rhytides (r=−0.32, p<0.001) scales.
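The associations reported above are Pearson correlations. The following is a minimal sketch of that coefficient, assuming paired observations per participant; the function name `pearson_r` and the data are hypothetical and are not drawn from the study.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sum of cross-products and sums of squares of the deviations.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical data: number of adverse effects reported and the
# 0-100 scale score for five people. These are NOT study data.
adverse_effect_counts = [0, 1, 2, 3, 5]
skin_scale_scores = [85, 80, 62, 70, 40]

r = pearson_r(adverse_effect_counts, skin_scale_scores)
print(f"r = {r:.2f}")
```

A negative r here corresponds to the pattern described in the text: more reported adverse effects alongside lower appearance scores.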
In the pre-treatment group, correlations between older age and lower scores for the rhytides scales were significant for 5/6 (exception forehead rhytides): rhytides overall (r=−0.41, p<0.001), and on glabella (r=−0.28, p=0.03), lateral periorbital area (r=−0.35, p=0.001), lips (r=−0.52, p<0.001) and marionette lines (r=−0.65, p<0.001). In the post-treatment group, age was not significantly correlated with scores from 5/6 rhytide scales (exception lip rhytides: r=−0.32, p<0.001).
Figure 2 shows the mean scores for the eight appearance scales for pre- and post-treatment data. Pre-treatment patients reported significantly lower scores on 7/8 scales (exception skin scale) compared with post-treatment patients (p-values ranged <0.001 to 0.005 on Independent Samples t-tests).
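The pre- versus post-treatment comparison above relies on independent samples t-tests. The sketch below computes the test statistic and degrees of freedom, assuming the pooled-variance (Student) form; the text does not state which variant was used, and the function name `independent_t` and the scores are hypothetical. A p-value would additionally require the t distribution, which the Python standard library does not provide.

```python
from math import sqrt
from statistics import mean, variance


def independent_t(group_a, group_b):
    """Pooled-variance t statistic and degrees of freedom for two groups."""
    n_a, n_b = len(group_a), len(group_b)
    df = n_a + n_b - 2
    # Pooled estimate of the common variance (assumes equal variances).
    pooled = ((n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)) / df
    standard_error = sqrt(pooled * (1 / n_a + 1 / n_b))
    return (mean(group_a) - mean(group_b)) / standard_error, df


# Hypothetical 0-100 scale scores. These are NOT study data.
pre_treatment = [35, 42, 50, 38, 45]
post_treatment = [60, 72, 65, 58, 70]

t_stat, df = independent_t(pre_treatment, post_treatment)
print(f"t = {t_stat:.2f}, df = {df}")
```

A negative t statistic in this setup means the pre-treatment group has the lower mean, which is the direction reported for 7/8 scales.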
Growing acceptance of facial cosmetic treatments has led to an industry that continues to expand. Research is urgently needed to ensure that new treatments are both safe and effective. The FACE-Q is a rigorously developed PRO instrument that can be used by academics and clinicians to collect evidence-based outcome data from facial aesthetic patients.
The FACE-Q is currently the only PRO instrument that includes scales measuring facial appearance. Some FACE-Q appearance scales ask about satisfaction with appearance, and other scales, for negative concepts such as rhytides, ask about being bothered by appearance. Other PRO instruments used in facial aesthetics research measure appearance-related psychosocial distress rather than appearance per se. For example, the rigorously developed 61-item Skindex measures Negative Affect, Self-Esteem, Anxiety, Physical Discomfort, Physical Limitations, Self-consciousness and Intimacy. A PRO instrument that measures psychosocial issues would not be the best choice for measuring change in appearance.
The psychometric analyses in this paper provided evidence of reliability and validity of FACE-Q scales. Also, and fundamentally, our use of RMT methods to develop the FACE-Q has certain advantages. RMT methods differ from traditional psychometric methods (based on Classical Test Theory) as their focus is on the relationship between a person’s measurement and their probability of responding to an item, rather than the relationship between a person’s measurement and their observed scale total score. Advantages of using RMT to develop PRO instruments include: 1) RMT provides measurements of people that are independent of the sampling distribution of the items used, and locates items in a scale independent of the sampling distribution of the people in whom they are developed; 2) RMT improves the potential to diagnose item-level psychometric issues; and 3) RMT allows for a more accurate picture of individual person measurements. These assets, together with the extensive qualitative work done to create the FACE-Q, are what set the FACE-Q apart from other PRO instruments in the same clinical area.
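For readers unfamiliar with RMT, the relationship between a person's measurement and the probability of a given item response is usually written as follows. This is the standard textbook form of the Rasch model, not an equation taken from this paper; the symbols $\theta_n$ (location of person $n$), $\delta_i$ (location of item $i$) and $\tau_{ik}$ (threshold $k$ of item $i$) are notation introduced here for illustration.

```latex
% Dichotomous Rasch model: probability that person n endorses item i
P(X_{ni} = 1) = \frac{\exp(\theta_n - \delta_i)}{1 + \exp(\theta_n - \delta_i)}

% Polytomous extension for an item with m + 1 ordered response options
% (x = 0, 1, \dots, m) and thresholds \tau_{i1}, \dots, \tau_{im};
% the empty sum for x = 0 (or j = 0) is taken to be zero
P(X_{ni} = x) =
  \frac{\exp\left( \sum_{k=1}^{x} (\theta_n - \tau_{ik}) \right)}
       {\sum_{j=0}^{m} \exp\left( \sum_{k=1}^{j} (\theta_n - \tau_{ik}) \right)}
```

With four response options, $m = 3$, so each item has three thresholds, and "ordered thresholds" means $\tau_{i1} < \tau_{i2} < \tau_{i3}$.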
This study has limitations that have been described previously [10, 13–16]. First, the sample was heterogeneous (e.g., varied by age, gender, timing of assessment), which limits the outcome findings we can report. Second, our sample and that of the clinical trial had many more women than men, which reflects the make-up of cosmetic patients. Third, there could have been bias introduced at the clinic level by office staff who recruited their patients for us. Finally, few field-test participants completed the FACE-Q before and after treatment. Responsiveness research is needed to document the benefits of treatment for specific facial treatments.
To conclude, evidence-based information about patient outcomes for facial aesthetic treatments is needed. The FACE-Q provides the research community and clinicians with a PRO instrument they can use to include the patient voice in the assessment of outcomes.
We are indebted to the following clinicians who recruited their patients into the FACE-Q field-test: Drs D Berson, JC Grotting, JM Kenkel, F Nahai, RJ Rohrich, A Rossi, JM Sykes, N Van Laeken, L Young and J Rivers. We would also like to thank Diane Murphy at Allergan Medical for providing FACE-Q data from their clinical trial.
Funding/Support: This study was supported by a grant from the Plastic Surgery Foundation.
Role of the Funder/Sponsor: The funder was not involved in the design and conduct of the study; the collection, management, analysis, and interpretation of data; the preparation, review, or approval of the manuscript; or the decision to submit the manuscript for publication.
Author Contributions: Drs AK and AP had full access to all the data in the study and take responsibility for the integrity of the data and the accuracy of the data analysis. Study concept and design: AK, SC and AP. Acquisition, analysis, and interpretation of data: AK, SC, JS, SB, AC, JC, AC, AP. Drafting of manuscript: AK. Critical revision of the manuscript for important intellectual content: AK, SC, JS, SB, AC, JC, AC, AP. Statistical analysis: AK, SC, AP. Obtained funding: AP, AK, SC. Administrative, technical or material support: AK, SC, JS, AP. Study supervision: AP.
Financial Disclosure (relationships relevant to this manuscript): The FACE-Q is owned by Memorial Sloan-Kettering Cancer Center. Drs Cano, Klassen and Pusic are co-developers of the FACE-Q and, as such, receive a share of any license revenues as royalties based on Memorial Sloan-Kettering Cancer Center’s inventor sharing policy.
Financial Disclosure (all other relationships): Dr Cano is co-founder of MODUS OUTCOMES, an outcomes research and consulting firm that provides services to pharmaceutical, medical device, and biotechnology companies. Drs A Carruthers and J Carruthers are consultants and investigators for Allergan, Merz, Kythera and Alphaeon.
Meeting where preliminary results were presented: Schwitzer J, Klassen AF, Cano SJ, Pusic AL. Measuring Outcomes That Matter to Cosmetic Patients: Development and Validation of the FACE-Q Satisfaction with Lips and Lip Lines Scales. Plast Reconstr Surg. 2014 Oct;134 (4S-1 Suppl):97 American Society of Plastic Surgeons (ASPS), Plastic Surgery Meeting; Chicago, IL. (October 2014). Oral
Anne F Klassen, McMaster University, Hamilton Canada.
Stefan J Cano, Plymouth University Peninsula Schools of Medicine and Dentistry, Plymouth, UK.
Jonathan A Schwitzer, Memorial Sloan Kettering Cancer Center, New York USA; MedStar Georgetown University Hospital, Washington DC USA.
Stephen B Baker, MedStar Georgetown University Hospital, Washington DC USA.
Alastair Carruthers, University of British Columbia, Vancouver, Canada.
Jean Carruthers, University of British Columbia, Vancouver, Canada.
Anne Chapas, New York University School of Medicine, New York, NY, USA.
Andrea L Pusic, Memorial Sloan Kettering Cancer Center, New York USA.