Children with a cleft of the upper lip exhibit obvious facial disfigurement. Many require multiple lip surgeries for an optimal esthetic result. However, because the decision for lip revision is based on subjective clinical criteria, clinicians may disagree on whether these surgeries should be performed. To establish more reliable, functionally relevant outcome criteria for evaluation and treatment planning, a clinical trial is currently in progress. In this article, we describe the design of the clinical trial and present the results of a study in which surgeons subjectively evaluated facial form and recommended for or against lip revision surgery.
Parallel, three-group, nonrandomized clinical trial and subjective evaluations/ratings of facial views by surgeons.
For the clinical trial, children with repaired cleft lip and palate scheduled for a secondary lip revision, children with repaired cleft lip and palate who did not have lip revision, and noncleft children. For the subjective evaluations, surgeons’ facial ratings of 21 children with repaired cleft lip.
Descriptive and Kappa statistics assessing the concordance of surgeons’ ratings of (a) repeated facial views and (b) a recommendation of revision on viewing the prerevision and postrevision views.
The surgeons’ consistency in rating repeated views was moderate to excellent; however, agreement among the surgeons when rating individual participants was low to moderate.
The findings suggest that the agreement among surgeons was poor and support the need for more objective measures to assess the need for revision surgery.
For a child with a cleft of the lip with or without a cleft palate, the decision to surgically revise the lip is based on a subjective evaluation of lip form and function that is made by the surgeon either independently or in conjunction with the patient and parents (Marsh, 1990). Subjective evaluation defines the current standard of care for patients who are candidates for lip revision; however, recent research has demonstrated many limitations with the use of subjective assessments. Of particular concern are the lack of agreement among clinicians (Asher-McDade et al., 1991; Tobiason et al., 1991; Ritter et al., 2002; Morrant and Shaw, 1996) and a tendency for the assessment of lip form to confound the assessment of lip function (Ritter et al., 2002). For example, when the severity of the deformity of static faces (i.e., faces at rest) was rated subjectively by clinicians, interexaminer agreement ranged from low (Asher-McDade et al., 1991) to good (Tobiason et al., 1991; Ritter et al., 2002), while for the subjective evaluations of cleft faces during movement, agreement among clinicians was consistently reported to be poor (Morrant and Shaw, 1996; Ritter et al., 2002). Moreover, the extent of lip scarring influenced observers’ perceptions of impaired movements of the lips: the more severe the perception of scarring, the more severe was the perception of impaired movement (Ritter et al., 2002). These findings call into question the reliance on subjective evaluations as the only determinant for subsequent soft tissue surgeries (Trotman et al., 2003) and highlight the need for objective measures of lip disability to supplement subjective evaluations.
Abnormalities in facial (lip) form and function may be attributed to three separate mechanisms: (1) mechanical limitations in movement of the perioral tissues due to the effects of scarring, (2) impaired muscle force dynamics, and (3) impaired sensorimotor integration. Regarding the first mechanism, the effects of scarring, objective measurements confirmed that patients with a cleft of the lip have impairments in the maximum extent to which they can move their facial tissues (Trotman et al., 1998; Trotman and Faraway, 1998; Trotman et al., 2000). Relevant to the second mechanism, muscle force dynamics, a number of studies have demonstrated that measures of lip force can be used to characterize both the strength and fine motor control of facial muscles in normal individuals and in patients with somatomotor disorders of the central nervous system (Barlow and Abbs, 1983, 1984, 1986; Barlow and Rath, 1985; Barlow and Netsell, 1986). The results of pilot studies have demonstrated impairments in maximum force capacity and motor control of the lips of children with cleft lip (D’Antonio et al., 1994, 1995). Equally important is the concern related to the third mechanism, sensory integrative (sensorimotor) function, which is defined as the “use of all sensory input as meaningful information which the individual reacts with a well-organized adaptive response” (Chapparo et al., 1981). Dysfunction is characterized by a failure of different bilateral structures in the face and structures in the vertical thirds of the face (i.e., middle to lower facial thirds) to function in a coordinated manner. Normal sensory function and sensation are requisite for normal perioral motor function (Stranc et al., 1987; Essick, 1998). Importantly, the above studies found impairments in motor control not only in the upper lip but also in the lower lip, and our initial studies indicated that many patients with cleft of the upper lip do not have normal sensation (Essick et al., 2005).
To date, however, the gold standard for the assessment of lip function in patients with a repaired cleft lip continues to be subjective clinical assessment. Objective measures that quantify facial soft tissue function in terms of movement, muscle force dynamics, or sensory integrative function have not been employed in formal studies or to monitor surgical treatment outcomes. The objective of this article is twofold: (1) to describe the overall design of a clinical trial to assess functional outcomes of cleft lip surgery and to discuss issues related to the conduct of the trial and (2) to compare, in participants with a repaired cleft lip, the results of subjective evaluations by surgeons for or against the need for lip revision surgery.
This study is designed to determine the differences in soft tissue circumoral function between children with a cleft of the lip (with or without a cleft palate) and children who do not have a cleft and to determine whether lip revision is effective in normalizing soft tissue function. Also, given the subjective nature of clinical assessments of facial/perioral form and function by surgeons, an exploratory aim was included to obtain estimates of concordance in subjective assessments among experienced surgeons for planning of future studies aimed at improving these types of clinical assessments. These assessments of the face were made with the face at rest and during different movements and are described in greater detail later.
The investigative team is composed of individuals from the Facial Animation Laboratory, Sensory Laboratory, Craniofacial Center, and Clinical Research Data Center at the University of North Carolina School of Dentistry (UNCSOD), the Statistics Laboratory at the University of Bath, and the Communication Neuroscience Laboratories at the University of Kansas. In addition, an appointed Data Safety and Monitoring Board (DSMB) met annually to monitor patient safety and ethical considerations and to review patient recruitment, retention, and data analysis. The study was a nonrandomized, three-group, parallel clinical trial that was confined to a single center and a single surgeon. The quantitative measures and subjective clinical measures of facial form and function were collected over a 15-month period, and for the participants who had lip revision surgery, the measures were collected before and after the surgery (Fig. 1).
Children and adolescents with cleft lip and palate were recruited from the Craniofacial Center at the UNCSOD. There were three general groups of participants: (1) those with cleft lip and palate judged by the surgeon to be candidates for secondary lip revision surgery based on the current standards of care and thus recommended for, and elected to undergo, lip revision surgery; (2) those with cleft lip and palate who had not received a lip revision surgery at the time of enrollment (this nonrevision group was composed of participants who may have been recommended for lip revision surgery by the surgeon but had chosen not to proceed with the surgery as well as those participants who were not recommended to receive surgery); and (3) a group of noncleft participants recruited as controls from clinics at the UNCSOD. To ensure a similar age and sex distribution among the three groups, the groups were matched on age and sex using group frequency matching.
The inclusion and exclusion criteria that applied to participants in all three groups were as follows.
Additional criteria that applied only to the participants with a cleft lip and palate were as follows.
Participants who met the selection criteria were recruited and screened in the Craniofacial Center, the Graduate Orthodontics Clinic, the Pediatric Dentistry Clinic, and the Orthodontics Faculty Practice at the University of North Carolina. No subject was excluded from participation on the basis of sex, race, or ethnic background. The purpose and protocol of the study were explained to the participants and their parents, and informed consent and assent were obtained. Consent and HIPAA documents were approved by the School of Dentistry Human Subjects Institutional Review Board. Group frequency matching was performed periodically (every 3 months) to track the adequacy of balance among the three groups.
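The periodic balance check for group frequency matching can be sketched as follows. This is a minimal illustration under stated assumptions: participants are binned into sex-by-age strata, and balance is summarized as the largest between-group difference in any stratum's proportion. The age bands (`<10`, `10-13`, `14+`) and this imbalance summary are hypothetical; the article does not report the strata or the criterion the team used.

```python
from collections import Counter


def stratum(age_years: float, sex: str) -> tuple[str, str]:
    """Assign a participant to a sex-by-age stratum (hypothetical age bands)."""
    if age_years < 10:
        band = "<10"
    elif age_years < 14:
        band = "10-13"
    else:
        band = "14+"
    return band, sex


def stratum_proportions(participants: list[tuple[float, str]]) -> dict[tuple[str, str], float]:
    """Proportion of one group's participants that falls in each stratum."""
    counts = Counter(stratum(age, sex) for age, sex in participants)
    total = sum(counts.values())
    return {s: c / total for s, c in counts.items()}


def max_imbalance(groups: dict[str, list[tuple[float, str]]]) -> float:
    """Largest difference in any stratum's proportion between two groups."""
    props = {name: stratum_proportions(members) for name, members in groups.items()}
    strata = set().union(*(p.keys() for p in props.values()))
    worst = 0.0
    for s in strata:
        # A group with nobody in stratum s contributes a proportion of 0.
        shares = [p.get(s, 0.0) for p in props.values()]
        worst = max(worst, max(shares) - min(shares))
    return worst


groups = {
    "revision": [(9.0, "F"), (12.0, "M")],
    "nonrevision": [(8.0, "F"), (13.0, "M")],
    "noncleft": [(9.5, "F"), (15.0, "M")],
}
print(max_imbalance(groups))  # 0.5: the noncleft boy falls in a different age band
```

In practice, a large value would flag the stratum in which recruitment of one group should be prioritized before the next 3-month check.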
The participants in the three groups were followed longitudinally for a 15-month period and tested at four times (Fig. 1). Specifically, the participants who had revision surgery were tested at the start of a run-in period approximately 3 months before surgery (time −3) and just before surgery (baseline, time 0). These revision participants received surgery soon after time 0. Testing was repeated 3 months (time 3) and 12 months after surgery. The 3-month postsurgical test was chosen to coincide with a time point used in evaluating patients after trigeminal nerve injuries (Van Boven and Johnson, 1994; Karas et al., 1990; Yoshida et al., 1989) and with the early resolution of edema and swelling; the 12-month test was chosen because scar maturation is completed in approximately 12 months. The nonrevision and noncleft participants, who did not have surgery, were tested at corresponding times (−3, 0, 3, and 12 months; Fig. 1), so that the intervals between their testing sessions were comparable to those of the revision participants. Participants were asked to allow 1 full day of testing for each of the four sessions.
The change in lip function from prerevision to postrevision in the participants who had lip revision included both the treatment effect and the change due to maturation across the sessions at 0, 3, and 12 months. Because we did not know whether participants in the revision and nonrevision groups would demonstrate normal maturational change, the nonrevision participants allowed an assessment of maturational change in children who were expected to be impaired. The noncleft group provided information on normal maturational change over the 15-month follow-up period. It was expected, however, that any systematic difference between the revision and nonrevision participants in maturation or in initial impairment of lip function would be minimal.
The Craniofacial Team at UNCSOD met weekly to plan the coordinated treatment for all patients enrolled with the team. Prior to the final recommendations being made for the treatment of a patient, team members discussed the recommendations made by the different disciplines on the team. Generally, recommendations for lip revision were made by the plastic surgeon, and discussions occurred with the orthodontist and oral surgeon regarding the desired esthetic outcomes of the revision surgery as well as the need for, and timing of, bone-grafting procedures.
The revision participants had surgery within a few days of the testing at time 0 (Fig. 1). The surgery was performed by the study surgeon, who was experienced in cleft care. All surgeries were secondary corrections of a cleft lip and fell into two categories: (1) a full-thickness lip revision or (2) a partial-thickness lip revision. In a full-thickness lip revision, the lip was revised by re-creation of the defect and reclosure. The scar from the previous surgery was excised completely. If the orbicularis oris muscle had not been repaired originally, the muscle was realigned and repaired. In all patients, the mucosa, muscle, dermis, and skin were repaired. All unilateral repairs were by the rotation-advancement technique. If the original repair was by a different technique, the original scars were excised in such a way as to allow rotation-advancement closure. For bilateral clefts, the secondary repair was in the configuration of a modified Manchester repair, with a narrower philtrum. For the partial-thickness lip revision, the full thickness of the skin was divided with a partial division of the orbicularis oris muscle. This approach was used in patients who had discrepancies in vertical lip length and symmetry. For either category of lip revision, concomitant nasal correction, when judged by the surgeon to be indicated, was performed by a standard open rhinoplasty with a V-shaped columellar incision. If a cartilage graft was needed, it was harvested from either or both ears via a postauricular incision.
Table 1 gives the quantitative outcome measures that are the best surrogates of facial functional disability. The primary quantitative outcome measure is facial (lip) movement; thus, sample size calculations were based on this measure. Secondary outcomes were measures of lip force, sensation, and sensorimotor integration.
Estimates of variability and mean differences for facial movement during the smile and cheek-puff animations between children with and without cleft lip were obtained from nine children with complete unilateral and bilateral cleft lip and 50 noncleft subjects. These children were studied prior to the initiation of this clinical trial (Trotman et al., 2000). The pilot data indicated considerable variability in asymmetry and magnitude of displacement among the participants with cleft lip and palate; however, the primary focus was the difference in the change in movement measures between participants who had a lip revision and the noncleft healthy children. It was hypothesized that there would be large mean facial movement changes on the order of 50% following lip revision and a much smaller mean change (approximately 10%) resulting from maturation in both the noncleft and the nonrevision cleft lip groups.
Given this background, a first approximation of the sample size per group needed to detect a large effect size (≥.80; Cohen, 1988) between the primary comparison groups of interest, the revision and noncleft control groups, at a significance level of .05 with 90% power was calculated using an unpaired t test approach (nQuery Advisor version 5; Elashoff, 2002) to compare the average 12-month change in the facial movement measures. The estimated sample size of 34 per group was then used to generate power curves for a one-way analysis of variance with three groups and a single between-means contrast (comparing two of the three groups, assuming the overall test is significant) as the effect size or sample size was varied. Inspection of the power curves (Fig. 2), together with the likelihood that including covariates (e.g., age, sex) that explain part of the variation in the facial movement summary measures would improve power, suggested that an effective sample size goal of 34 children in each of the three groups would be appropriate.
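The first-approximation arithmetic can be reproduced approximately with the standard normal-theory formula for a two-sided, two-sample comparison, n = 2(z₁₋α/₂ + z₁₋β)²/d² per group, plus Guenther's small-sample correction z₁₋α/₂²/4 to approximate the t test. This is a sketch, not nQuery's exact algorithm, and the power function below covers only a two-group comparison; it does not reproduce the three-group ANOVA contrast curves in Fig. 2.

```python
import math
from statistics import NormalDist


def n_per_group(effect_size: float, alpha: float = 0.05, power: float = 0.90) -> int:
    """Approximate n per group for a two-sided, two-sample t test.

    Normal approximation plus Guenther's correction (z_{1-alpha/2}^2 / 4);
    an approximation, not nQuery's exact noncentral-t calculation.
    """
    z = NormalDist()
    z_a = z.inv_cdf(1 - alpha / 2)
    z_b = z.inv_cdf(power)
    n = 2 * (z_a + z_b) ** 2 / effect_size ** 2 + z_a ** 2 / 4
    return math.ceil(n)


def approx_power(n: int, effect_size: float, alpha: float = 0.05) -> float:
    """Normal-approximation power of a two-group comparison with n per group.

    Ignores the t distribution, so it slightly overstates power for small n.
    """
    z = NormalDist()
    z_a = z.inv_cdf(1 - alpha / 2)
    return z.cdf(effect_size * math.sqrt(n / 2) - z_a)


print(n_per_group(0.80))                   # 34
print(round(approx_power(34, 0.80), 2))    # 0.91
# A crude power curve: vary the per-group sample size at d = 0.80.
for n in (20, 25, 30, 34, 40):
    print(n, round(approx_power(n, 0.80), 3))
```

With d = .80, α = .05, and 90% power, the approximation lands on the same 34 per group reported in the text.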
The study used a three-group, parallel design, and participants were followed for 15 months. All enrolled participants with data from at least the −3 visit were included in the full analysis set. The overall alpha level for each analysis was set at .05. No pairwise group comparisons were performed unless the overall F value for fixed effects was statistically significant (p < .05). Based on closed-testing principles, if the overall hypothesis of equality among the three groups was rejected, each of the three pairwise equality hypotheses could then be tested at the .05 significance level (Westfall et al., 1999).
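Why closed testing reduces to "global test first, then unadjusted pairwise tests" with exactly three groups: any intersection of two pairwise equality hypotheses (e.g., μ₁ = μ₂ and μ₁ = μ₃) already implies μ₁ = μ₂ = μ₃. Every intersection in the closed family is therefore the global hypothesis. The gating rule below is a minimal sketch of that logic. The group labels and p-values are illustrative only, not study results.

```python
def closed_pairwise_tests(
    p_overall: float,
    p_pairs: dict[tuple[str, str], float],
    alpha: float = 0.05,
) -> dict[tuple[str, str], bool]:
    """Closed-testing decisions for the three pairwise hypotheses of three groups.

    A pairwise hypothesis is rejected only if the global (overall F) hypothesis
    is rejected AND the pairwise test itself is significant at alpha. This
    shortcut is valid for exactly three groups, not in general.
    """
    if len(p_pairs) != 3:
        raise ValueError("this shortcut applies to the three pairs of three groups")
    if p_overall >= alpha:
        # Global hypothesis not rejected: no pairwise comparison is performed.
        return {pair: False for pair in p_pairs}
    return {pair: p < alpha for pair, p in p_pairs.items()}


# Illustrative p-values only.
pairs = {
    ("revision", "noncleft"): 0.003,
    ("revision", "nonrevision"): 0.04,
    ("nonrevision", "noncleft"): 0.30,
}
print(closed_pairwise_tests(0.01, pairs))
print(closed_pairwise_tests(0.08, pairs))  # every pair False: the gate is closed
```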
The general analytical approach for all the outcome measures was linear mixed-effect modeling. Data from different participants were considered independent, whereas data within a participant (multiple sites tested on the face per visit or multiple visits) were expected to be correlated. Thus, a subject was considered to have a cluster of correlated response data for each outcome. In the linear mixed model, modeling of the variances and covariances was achieved through specification of random effects and/or specification of the variance matrix of the error vector for a subject.
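What "a cluster of correlated response data" means can be made concrete with the simplest linear mixed model, a random intercept per subject: yᵢⱼ = μ + bᵢ + eᵢⱼ, with b ~ N(0, σ²_b) and e ~ N(0, σ²_e). The shared bᵢ makes a subject's responses correlated, with compound-symmetric covariance σ²_b J + σ²_e I. The sketch below builds that matrix and, for balanced data only, estimates the two variance components by the one-way ANOVA method of moments. This is an illustration under those assumptions. The study's models have more structure (multiple facial sites per visit and multiple visits, plus fixed effects) and would normally be fitted by ML or REML, not by this estimator.

```python
def random_intercept_cov(n_obs: int, var_subject: float, var_resid: float) -> list[list[float]]:
    """Marginal covariance of one subject's n_obs responses under y = mu + b + e.

    Off-diagonal entries are var_subject (the shared intercept); the diagonal
    adds var_resid. This is the compound-symmetry structure.
    """
    return [
        [var_subject + (var_resid if i == j else 0.0) for j in range(n_obs)]
        for i in range(n_obs)
    ]


def anova_variance_components(data: list[list[float]]) -> tuple[float, float]:
    """Method-of-moments estimates (var_subject, var_resid) for balanced data.

    data[i] holds the m repeated responses of subject i (same m for all).
    """
    k, m = len(data), len(data[0])
    if any(len(row) != m for row in data):
        raise ValueError("balanced data required")
    means = [sum(row) / m for row in data]
    grand = sum(means) / k
    ms_between = m * sum((mu - grand) ** 2 for mu in means) / (k - 1)
    ms_within = sum((y - mu) ** 2 for row, mu in zip(data, means) for y in row) / (k * (m - 1))
    var_subject = max(0.0, (ms_between - ms_within) / m)  # truncate negative estimates at 0
    return var_subject, ms_within


vs, ve = anova_variance_components([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(vs, ve)          # between-subject variance 26/3, residual variance 1.0
print(vs / (vs + ve))  # within-subject correlation (intraclass correlation)
```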
Eight plastic surgeons from different craniofacial centers across the United States who were experienced in cleft care viewed and rated photographs and videotapes of 11 revision and 10 nonrevision study participants. The revision and nonrevision participants were selected from all the participants in the clinical trial to represent a wide range of lip scars. For each revision participant, photographic and videotaped views recorded at baseline and at 12 months postsurgery were selected. For the nonrevision participants, similar views recorded at corresponding times were selected. The two views for each of the 21 participants were compiled in random order on a DVD for viewing. In addition, to determine the consistency of ratings by individual surgeons, either the baseline or the 12-month view of eight participants was repeated on the DVD. Thus, each surgeon viewed a total of 50 sets of photographs and videotapes: eleven baseline and eleven 12-month views for the revision participants; ten baseline and ten 12-month views for the nonrevision participants; and eight repeated views.
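The composition of the 50-item viewing sequence (21 participants × 2 time points, plus 8 repeats) can be sketched as below. The article does not say how the eight repeated participants or their time points were chosen, or how the random order was generated. Drawing both at random with Python's `random` module is an assumption, as are the participant labels.

```python
import random
from typing import Optional

TIMES = ("baseline", "12-month")


def build_viewing_list(
    revision_ids: list[str],
    nonrevision_ids: list[str],
    n_repeats: int = 8,
    seed: Optional[int] = None,
) -> list[tuple[str, str]]:
    """Randomly ordered (participant, time point) views, including repeats.

    Hypothetical procedure: repeated participants and their time points are
    drawn at random, which the article does not specify.
    """
    rng = random.Random(seed)
    participants = list(revision_ids) + list(nonrevision_ids)
    # Every participant contributes a baseline view and a 12-month view.
    views = [(pid, t) for pid in participants for t in TIMES]
    # Repeat one view (baseline or 12-month) for n_repeats distinct participants.
    repeats = [(pid, rng.choice(TIMES)) for pid in rng.sample(participants, n_repeats)]
    playlist = views + repeats
    rng.shuffle(playlist)
    return playlist


revision = [f"R{i}" for i in range(1, 12)]     # 11 hypothetical revision IDs
nonrevision = [f"N{i}" for i in range(1, 11)]  # 10 hypothetical nonrevision IDs
playlist = build_viewing_list(revision, nonrevision, seed=1)
print(len(playlist), len(set(playlist)))  # 50 views, 42 of them distinct
```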
For optimal and consistent viewing quality, all the views (Fig. 3) were displayed to each surgeon independently on a computer monitor with a 17-inch screen. Each surgeon was blinded to the participants’ identity, group membership (revision versus nonrevision), and surgical history (baseline versus 12 months). Surgeons were shown the photographic still images first, followed by the video images. Selecting a participant’s still image (Fig. 3) magnified that photograph for further evaluation. By selecting either video A or video B (Fig. 3), the surgeon could view either a frontal and right profile image (video A) or a frontal and left profile image (video B) of the participant performing a smile, lip purse, cheek puff, mouth opening, and natural smile. In addition, surgeons could view the participants speaking the phrase “put the baby in the buggy.”
Prior to the start of the viewing, the research assistant read the same set of instructions to each surgeon explaining the process of viewing the DVD. The surgeons were not told which participants had received surgery, but they were told that participants would be viewed more than once. On viewing each participant, the surgeon was asked the following questions: (1) Do you think this person would benefit from a lip revision surgery? (2) If yes, would the surgery be a minor revision (involving skin only), full-thickness revision (involving the entire depth of skin and muscle), or partial-thickness revision (skin with partial muscle involvement)? and (3) Which view (still or dynamic video images) gave you the most information to make your decision?
Each surgeon was allowed to proceed with the viewing at his or her own pace. The responses to the questions were recorded by the research assistant directly by hand on a form and tape recorded with a handheld recorder for later verification. After verification, the responses were entered into a spreadsheet for data processing. For each surgeon, the entire session of viewing and rating the participants lasted approximately 2 1/2 hours.
Standard descriptive statistics were generated from the ratings of the surgeons and included the following. (1) The percentage agreement for the repeated views was calculated for each surgeon to assess the consistency of the replicate recommendations. (2) The number of surgeons who agreed on a recommendation for a participant, and the Kappa statistics for agreement between pairs of surgeons, were calculated to assess the consistency among the surgeons in the recommendations made. (3) For those participants who had a revision surgery, the number of surgeons who placed participants into each of the four possible combinations of recommendation (revision at presurgery and no revision at postsurgery, revision at presurgery and revision at postsurgery, no revision at presurgery and no revision at postsurgery, and no revision at presurgery and revision at postsurgery) was calculated. This count assessed the consistency among the surgeons in providing a revision recommendation and whether, in each surgeon’s perception, the revision improved, did not change, or worsened the participant’s condition.
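Statistics (1) and (2) can be sketched for binary decisions (True = revision recommended) as follows. Percentage agreement is the share of repeated views given the same decision twice. Cohen's kappa corrects the observed agreement p_o between two raters for the agreement expected by chance, p_e, as κ = (p_o − p_e)/(1 − p_e). This is the textbook two-rater kappa; the article does not state which kappa variant or software was used. The example vectors are invented.

```python
import math


def percent_agreement(first: list[bool], second: list[bool]) -> float:
    """Percentage of repeated views given the same recommendation both times."""
    if len(first) != len(second):
        raise ValueError("ratings must be paired")
    same = sum(a == b for a, b in zip(first, second))
    return 100.0 * same / len(first)


def cohen_kappa(rater_a: list[bool], rater_b: list[bool]) -> float:
    """Cohen's kappa for two raters' binary (revision / no revision) decisions."""
    if len(rater_a) != len(rater_b):
        raise ValueError("ratings must be paired")
    n = len(rater_a)
    p_obs = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    pa, pb = sum(rater_a) / n, sum(rater_b) / n
    # Chance agreement: both say "revision" or both say "no revision".
    p_exp = pa * pb + (1 - pa) * (1 - pb)
    if p_exp == 1.0:
        return math.nan  # both raters constant in the same category: kappa undefined
    return (p_obs - p_exp) / (1 - p_exp)


# One surgeon changing one of eight repeated decisions gives 87.5% agreement.
print(percent_agreement([True] * 8, [True] * 7 + [False]))  # 87.5
a = [True, True, True, False, False, False, True, False]
b = [True, True, False, False, False, True, True, False]
print(cohen_kappa(a, b))  # 0.5: 75% observed vs 50% chance agreement
```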
A surgeon’s classification of a patient’s need for lip revision was categorized as nonrevision (either no or only minor revision recommended) or as revision (a partial- or full-thickness lip revision recommended). On viewing the participants, the surgeons commented that they welcomed the opportunity to systematically view the facial photographs and animated video images, and they all felt that the video images provided useful additional diagnostic information.
Five of the eight surgeons demonstrated 100% agreement on their decisions for repeated ratings. Two of the eight surgeons had 87% agreement (7 of the 8 repeated views), rating only one participant differently on the second viewing. One surgeon had 75% agreement (6 of 8), rating two participants differently on the second viewing.
All eight surgeons made the same recommendation for only two of the 42 viewings (~5%): baseline views of one participant (equivalent to ~5% of the baseline views) and 12-month views of one participant (~5%). Seven of the surgeons, although not consistently the same seven, agreed on the recommendation for 3 (14%) of the 21 baseline viewings and 1 (5%) of the 12-month viewings. Six of the surgeons, although not consistently the same six, agreed on the recommendation for 4 (19%) of the 21 baseline viewings and 6 (29%) of the 12-month viewings. Five of the surgeons, although not consistently the same five, agreed on the recommendation for 6 (29%) of the 21 baseline viewings and 6 (29%) of the 12-month viewings (Table 2). As an additional assessment of agreement, Kappa statistics were calculated for all possible pairs of surgeons (28 pairs): Kappa values ranged from .0 to .57 for the baseline views and from .0 to .44 for the 12-month views. All but one of the Kappa values were less than .45, implying poor agreement among the surgeons.
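The "k surgeons agreed" counts in Table 2 can be sketched as the size of the majority for each viewing, which by construction ignores which surgeons formed that majority. That is why the text notes "not consistently the same seven". With eight surgeons the count ranges from 4 to 8, and the number of surgeon pairs for the kappa comparisons is C(8, 2) = 28. The decision vectors below are invented.

```python
from collections import Counter


def agreement_count(decisions: list[bool]) -> int:
    """Number of surgeons who gave the more common recommendation for one viewing."""
    yes = sum(decisions)
    return max(yes, len(decisions) - yes)


def tally_agreement(viewings: list[list[bool]]) -> Counter:
    """Map 'k surgeons agreed' to the number of viewings with that level of agreement."""
    return Counter(agreement_count(d) for d in viewings)


# Four invented viewings, each rated by eight surgeons (True = revision).
viewings = [
    [True] * 8,                 # unanimous
    [True] * 7 + [False],       # 7 of 8 agree
    [True] * 4 + [False] * 4,   # even split
    [False] * 6 + [True] * 2,   # 6 of 8 agree
]
print(tally_agreement(viewings))
```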
Table 3 provides the presurgery and 12-month follow-up recommendations made by each surgeon for each participant who had a revision during the study. These recommendations can be categorized into the following combinations.
The percentage of participants who, in the independent, masked ratings by the surgeons, were judged to need a revision at baseline but no revision at 12 months, implying that a perceptible improvement was apparent, was low (9% to 44%). For all surgeons, the recommendation of either no revision at both baseline and 12 months or a revision at both baseline and 12 months was given for most participants (from 45% to 73%). Four of the surgeons did not give any participant the combination of no revision at baseline but a revision at 12 months, which implies a perceptible worsening. The other four surgeons chose this latter combination of recommendations for 9% to 36% of the participants.
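For one surgeon, the categorization above can be sketched as follows. First collapse each response into revision (partial- or full-thickness) or nonrevision (none or minor), as described for the surgeons' classifications. Then classify each revision participant's baseline/12-month pair as improved, unchanged, or worsened. The response labels are hypothetical strings standing in for the rating-form answers, and the example pairs are invented.

```python
from collections import Counter

# Hypothetical labels for the rating-form answers, collapsed as in the text.
REVISION = {"partial-thickness", "full-thickness"}
NO_REVISION = {"none", "minor"}
LABELS = ("improved", "no change", "worsened")


def needs_revision(response: str) -> bool:
    """Collapse a rating-form response into revision (True) or nonrevision (False)."""
    if response in REVISION:
        return True
    if response in NO_REVISION:
        return False
    raise ValueError(f"unknown response: {response!r}")


def classify(pre: str, post: str) -> str:
    """Perceived effect of the revision from one surgeon's pre/post recommendations."""
    before, after = needs_revision(pre), needs_revision(post)
    if before and not after:
        return "improved"
    if before == after:
        return "no change"  # revision at both times, or no revision at both times
    return "worsened"       # no revision at baseline, revision at 12 months


def summarize(pairs: list[tuple[str, str]]) -> dict[str, float]:
    """Percentage of participants in each category for one surgeon."""
    counts = Counter(classify(pre, post) for pre, post in pairs)
    return {label: 100.0 * counts[label] / len(pairs) for label in LABELS}


pairs = [
    ("full-thickness", "none"),
    ("minor", "minor"),
    ("partial-thickness", "full-thickness"),
    ("none", "partial-thickness"),
]
print(summarize(pairs))  # 25% improved, 50% no change, 25% worsened
```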
The purpose of this trial was to assess the efficacy of lip revision surgery on lip and perioral function in a controlled clinical environment. The participants were recruited from the UNCSOD clinics and Craniofacial Center. As a result, the findings may not be generalizable to a broad population. However, if the quantitative approaches used in this study prove sensitive enough to detect effects related to lip revision surgery, these methodologies could be used to compare the effects of different surgical techniques on function in future randomized clinical trials and/or in clinical settings. Our approach to the assessment of perioral function is supported by the results of the Eurocleft Intercenter studies, in which subjective ratings of nasolabial appearance in patients with cleft lip and palate, based on static photographic images, were found to be invalid as surgical outcome measures (Brattström et al., 2005). These researchers noted that “the rating of nasolabial outcome is a key area for further research.” This clinical trial is devoted entirely to the outcomes of perioral surgery in patients with cleft lip and palate.
Recruitment goals for noncleft control participants were met (n = 37). As of July 2006, more than 90% of the recruitment goal had been achieved for the nonrevision (n = 32 of 34) and revision groups (n = 31 of 34). The refusal and dropout rates were low in all groups. In the revision group, one patient was dropped at the first visit because the child would not perform the tests. Four potential revision candidates declined participation in the study: one child refused, one child did not show for the first testing appointment, and the parents of two children declined to give parental permission. Recruitment in the revision and nonrevision groups continues. The mean ages of the participants by group (noncleft, nonrevision, and revision) and cleft lip status (unilateral and bilateral) are shown in Table 4. The DSMB assigned to this study met with the study investigators at UNCSOD prior to the start of recruitment and held yearly meetings with the investigators during the grant tenure. Also, teleconferences with the DSMB were held every 6 months to monitor all aspects of the trial. As stated earlier, the nonrevision participants may have been judged to not need revision surgery or they may have been judged by the surgeon to be candidates for lip revision but may have elected not to proceed with the surgery. Of the 32 nonrevision participants, 8 were judged to need a revision and elected not to proceed with the surgery.
The subjective assessments for or against lip revision surgery in the participants with a cleft lip indirectly addressed the rating of esthetic outcome of lip revision surgery in these individuals. Our findings demonstrated that the surgeons’ consistency in rating repeated views of the same participant was moderate to excellent; however, the overall agreement among the surgeons when rating individual participants was low to moderate. Any recommendation for revision surgery by a surgeon depends on the surgeon’s assessment of the participant’s lip form and function at the time of the clinical evaluation and the surgeon’s own clinical expertise related to whether she or he can improve on the lip form and/or function. Therefore, it was not surprising that different surgeons had different thresholds for revision and that there was poor interobserver agreement. These findings are similar to those of other studies in which there was poor agreement among evaluators (Asher-McDade et al., 1991; Morrant and Shaw, 1996; Ritter et al., 2002) when subjectively rating individuals’ facial features. Thus, substantially more work is needed in developing consensus among surgeons, and this work is one aspect of our current research.
During the rating sessions in this study, the baseline (before surgery) and 12-month (after surgery) views of the revision participants were presented to the surgeons in random order, and the surgeons did not know which views were from before or after surgery. The expectation was that, for most of the revision participants, surgeons would recommend a revision at the baseline viewing and would not recommend a revision at the 12-month viewing. The findings show that the percentage of participants whom the surgeons perceived as needing a lip revision at baseline and then at 12 months varied considerably among the surgeons, suggesting that the surgeons did not process or rank the information presented to them in the same way. In fact, after viewing the 12-month sets of images for the revision participants, every surgeon recommended further revision surgery for at least one participant. One possible conclusion is that the revision surgery was ineffective in some participants; however, this conclusion would be premature, because other factors could lead to this outcome. For example, in certain cases the surgeon may determine that more than one revision will be necessary to achieve a satisfactory surgical outcome. Such a situation may occur with revisions for a bilateral cleft lip. The surgeon may stage the surgery, revising one side of the lip at a time and the other side at a later date, or the surgeon may revise both sides of the lower part of the lip and defer the upper part, which is related to the columella, to a subsequent surgery. In this study, however, of the 11 revision participants who were rated by the surgeons, only 3 had a repaired bilateral cleft lip, and all of their revisions were planned and fully completed by the surgeon in one surgery.
Another factor affecting the surgeons’ decisions for additional surgery on viewing the 12-month images may be the time point at which this judgment was made and the extent of visible scarring at that time. Some patients develop hypertrophic scarring after surgery that takes a considerable period of time to resolve. The 12-month follow-up of this study may not have been long enough for surgeons to make a final decision regarding resolution of scarring. In addition, the presence of scarring confounds ratings or judgments of lip movement (Ritter et al., 2002), which may have affected the surgeons’ ratings when they used the video images. The focus of this study, however, was on the agreement among surgeons for or against the need for revision surgery. These additional factors, although interesting, would have had a minimal impact on these results. One other factor, described earlier, may have affected surgeon ratings and somewhat confounded the results: surgeons may have perceived a need for revision surgery but may not have believed that they could improve the lip form and/or function, a factor that relates to surgeons’ willingness to perform the surgery and their competence.
This study was supported by grant R01 DE13814-01A1 from the National Institute of Dental and Craniofacial Research (NIDCR).
Dr. Carroll-Ann Trotman, Department of Orthodontics, University of North Carolina at Chapel Hill, North Carolina.
Dr. Ceib Phillips, Department of Orthodontics, University of North Carolina at Chapel Hill, North Carolina.
Dr. Greg K. Essick, Department of Prosthodontics and Curriculum in Neurobiology, University of North Carolina at Chapel Hill, North Carolina.
Dr. Julian J. Faraway, Department of Mathematical Sciences, the University of Bath, Bath, England.
Dr. Steven M. Barlow, Communication Neurosciences Laboratory, University of Kansas, Lawrence, Kansas.
Dr. H. Wolfgang Losken, Department of Plastic and Reconstructive Surgery, University of North Carolina School of Medicine, Chapel Hill, North Carolina.
Dr. John van Aalst, Department of Plastic and Reconstructive Surgery, University of North Carolina School of Medicine, Chapel Hill, North Carolina.
Ms. Lyna Rogers, Department of Orthodontics, University of North Carolina at Chapel Hill, North Carolina.