The individual thickness of the stratum corneum is required to normalize drug permeation profiles in dermatopharmacokinetic studies. The thickness is often estimated using tape-striping combined with transepidermal water loss measurements. A linear transformation of Fick’s first law is used to relate the progressively thinner barrier with the corresponding increase in transepidermal water loss and to estimate the thickness by linear regression. However, the data from an important subset of subjects are poorly fitted to this linear model. This is typically due to the removal of loose outer layers of stratum corneum, which do not contribute significantly to barrier function. This work proposes two alternative non-linear models. All three models were used to fit data from 31 in vivo tape-striping experiments and their outcomes and goodness-of-fit compared. The results suggest that the linear model may overestimate the stratum corneum thickness and is open to subjectivity regarding the selection of data points to be fitted. The non-linear models satisfactorily fitted all the data, including all data points. No significant differences were found between the thicknesses derived from the two non-linear models. However, the analysis of the goodness-of-fit of the models to the data suggests a preference for a baseline-corrected approach.
The individual thickness of the stratum corneum (SC) is required in order to normalise drug penetration profiles from different volunteers during dermatopharmacokinetic studies. Bioavailability describes the rate and extent to which a drug, in an active form, reaches its target site. In the case of topical formulations seeking a local effect, the target site is the skin. By measuring the ‘rate’ and ‘extent’ of drug penetration into the skin, so-called dermato-pharmacokinetic parameters may be derived. These parameters and the drug penetration profiles provide information with which to assess bioavailability and demonstrate bioequivalence between different formulations. It has been suggested that, since the SC is the principal barrier to drug absorption, kinetic data on drug passage through this layer can be related to bioavailability in the target tissue [1–3].
The tape stripping technique has attracted considerable interest from regulatory bodies, such as the US Food and Drug Administration (FDA), as a means to determine dermato-pharmacokinetic parameters, and ultimately test bioequivalence between formulations [2,3]. Tape stripping involves the sequential removal of layers of the stratum corneum using adhesive tapes. It can be performed in vivo with minimal discomfort and relative ease. Typically, formulations are applied to the accessible volar forearm, which has an SC thickness shown to be consistently between 10 and 20 μm for most volunteers. Restoration of 80–90% of the SC’s barrier function is complete within 3 days, and transepidermal water loss (TEWL) values return to normal after eight to eleven days.
In 1998, the FDA issued guidance on the tape stripping technique for evaluating drug penetration through the SC. It was later withdrawn when the comparison of three products by two laboratories resulted in contradictory outcomes. Since then, some problems with the original guidance have been identified and addressed: (i) discarding two tape strips has lost favour, as drug in these outer layers would probably become available for absorption eventually [1,8]; and (ii) in an attempt to relate the drug concentration to a particular depth within the SC, and in order to normalise the data derived from different volunteers [1,8–12], the total thickness of the SC is now measured [13,14].
An exact measurement of the SC thickness (H) is impossible in a completely non-invasive way. However, a version of Fick’s first law relates the easily measurable TEWL value of an intact SC to several SC parameters [15,16]:

$$\mathrm{TEWL}_0 = \frac{D \cdot K \cdot \Delta C}{H} \qquad (1)$$
where TEWL0 is the baseline TEWL; D is the diffusion coefficient of water in the SC; K is the SC-viable tissue partition coefficient of water; ΔC is the water concentration gradient; and H is the thickness of the SC. This equation assumes that the SC is the main barrier to water loss and that it provides a homogeneous barrier to water diffusion, as justified by Kalia et al.
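Therefore, once a certain cumulative thickness of the SC, x, is removed by tape stripping, the TEWL will increase as follows:

$$\mathrm{TEWL}_x = \frac{D \cdot K \cdot \Delta C}{H - x} \qquad (2)$$

Taking the reciprocal of Eq.(2) gives a form that is linear in x:

$$\frac{1}{\mathrm{TEWL}_x} = \frac{H}{D \cdot K \cdot \Delta C} - \frac{x}{D \cdot K \cdot \Delta C} \qquad (3)$$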
Experimental data expressed according to Eq.(3) should yield a single straight line that can be analyzed by simple linear regression. Extrapolation of this linear regression to 1/TEWL = 0 yields x = H, allowing H to be found.
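The extrapolation described above can be sketched in a few lines of Python. This is only an illustration, assuming the linear form 1/TEWL = H/(DKΔC) − x/(DKΔC); the function name `fit_linear_model` and the data are hypothetical (noise-free values generated from H = 12 μm and DKΔC = 120), not measurements from this study.

```python
def fit_linear_model(x, tewl):
    """Ordinary least-squares line through (x, 1/TEWL); returns (H, DKdC)."""
    y = [1.0 / t for t in tewl]
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Assumed linear form: slope = -1/(DKdC), and the x-intercept
    # (where 1/TEWL = 0) is the SC thickness H.
    dkdc = -1.0 / slope
    h = -intercept / slope
    return h, dkdc


# Hypothetical, noise-free data: cumulative thickness removed (um) and TEWL.
x = [0.0, 2.0, 4.0, 6.0, 8.0]
tewl = [120.0 / (12.0 - xi) for xi in x]

h, dkdc = fit_linear_model(x, tewl)
print(f"H = {h:.2f} um, DKdC = {dkdc:.1f}")
```

With real data the points scatter around the line, and the x-intercept inherits the uncertainty of both slope and intercept.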
However, data from a significant subset of subjects are not satisfactorily fitted by this Linear Model (LM). In these cases, representation of the data according to Eq.(3) yields an initial plateau followed by a declining straight line. It could be hypothesized that this initial plateau corresponds to the removal of the stratum corneum disjunctum, the looser outer stratum corneum layer [18–21], which does not contribute significantly to the barrier to water loss. It follows that its removal would produce a significant mass change without a corresponding increase in TEWL. Thus, the Linear Model (Eq.(3)), which assumes that the SC barrier properties are homogeneous, cannot reflect experimental findings accurately.
Another disadvantage of the LM results from the mathematical procedure: upon inversion of the TEWL values into 1/TEWL, the errors associated with each data point are also transformed, which can skew the goodness-of-fit of the linear regression.
The aim of this work was to develop an improved model which avoids the problems mentioned above. We propose first the Baseline-corrected Non-Linear (BC-NL) model (Eq.(4)), which (a) fits TEWL versus cumulative thickness data directly, thus avoiding the errors associated with the reciprocal transformation, and (b) incorporates a baseline parameter, B, to reflect the initial plateau during which TEWL remains constant despite removal of some layers of SC:

$$\mathrm{TEWL}_x = B + \frac{D \cdot K \cdot \Delta C}{H - x} \qquad (4)$$

The second alternative, the Simple Non-Linear (S-NL) model, fits Eq.(2) directly to the same untransformed data, without the baseline term.
To decide on the best approach for estimating the stratum corneum thickness, we examined data sets collected from different volunteers on different occasions. We evaluated the performance of the three models and compared their outcomes. Each set of data (TEWL versus cumulative thickness removed) was analyzed separately using each of the three models: LM, S-NL and BC-NL. The SC thickness (H) estimated for each volunteer by each model is compared, along with a discussion of the goodness-of-fit and the relative merits and weaknesses of each method.
Eighteen healthy volunteers (3 male, 15 female; age range 22–43 years) with no history of dermatological disease participated in the study. Ethical approval was granted by Salisbury Local Research Ethics Committee, the Declaration of Helsinki protocols were followed, and written informed consent was obtained from all volunteers. A total of 31 sites were examined over a period of 2 years. Participants refrained from using any topical products on the test area on the day of the experiments. As TEWL measurements may be affected by sweating or by changes in the relative humidity or temperature of the laboratory, the volunteers rested in the room for at least 15 minutes before the first TEWL measurement was taken, the experiment was completed in less than 1 hour, and repeated initial TEWL measurements were taken until they stabilised. The mean (±SD) temperature and relative humidity in the study room were 21.9±1.5°C and 37.8±9.7%, respectively. Subjects were given code numbers (1 to 18). Nine subjects participated only once in the study, six subjects participated twice, subjects 9 and 14 participated three times, and subject 8 participated four times. Repeated participation was coded with a letter, for example subject 1a and 1b for the first and second participation of subject 1. Repeat participation using the same arm was delayed by at least 1 month, which is sufficient for barrier regeneration [5–7].
Tape stripping procedure: Two tapes, with pressure applied using a roller, were taken and discarded. These pre-tapes are also taken before all dermato-pharmacokinetic experiments in our laboratories, to remove any exogenous substances and prepare the skin surface in a systematic way. A plastic template was applied to delimit a constant area to be stripped. An initial TEWL measurement was taken with a closed-chamber evaporimeter (Biox Aquaflux AF102, Biox Systems Ltd, London, UK; measurement range 0–100 g.m−2h−1; resolution ±0.05 g.m−2h−1; the probe was applied for a minimum of 60 s, and the TEWL was obtained as the mean of 10 successive measurements having a CV<1%).
A preweighed (Sartorius Microbalance SE-2F, precision 0.1 μg; Sartorius AG, Goettingen, Germany) piece of Scotch Book tape 845 (3M, St Paul, MN) was placed over the template, and adhesion to the skin was assured systematically with a set number of rolls with a roller. The tape was removed swiftly and another TEWL measurement taken. The sequence was repeated until the TEWL value was 3–4 times its initial value (usually 60–80 g.m−2h−1). All tapes were reweighed following completion of the stripping procedure, with prior removal of any hairs as necessary. Static electricity was discharged from the tapes prior to weighing, using an Eltex R50 discharging bar with an Eltex ES50 power supply (Eltex Elektrostatik GmbH, Weil am Rhein, Germany). Three to five blank tapes were weighed at the same time as the tapes used for the tape-stripping experiments. Any change in the mass of these blank tapes was used to correct the calculated mass of SC for any variations in weight due to environmental or other conditions. For a detailed review of the tape-stripping technique, the reader is referred to Herkenne et al.
Linear regressions were performed on each data set of 1/TEWL versus cumulative SC thickness removed, using GraphPad Prism® (version 4.00 for Windows, GraphPad Software, San Diego, CA). All the data points were included in the regression unless stated otherwise. All the slopes were significantly non-zero (p<0.0001), which confirmed a statistically significant relationship between 1/TEWL and cumulative SC thickness removed.
All data sets were evaluated separately using WinNonlin® software (Version 5.1, Pharsight Corporation, Mountain View, CA) with an ASCII user-defined model written for the S-NL and BC-NL models. In all cases, no data points were excluded; uniform weighting was applied; no bounds were used; and the initial parameter estimates were: H = last x value + 1; (D·K·ΔC) = 30; B = initial TEWL value before stripping (for the BC-NL model only). Iterations continued until the relative change in the weighted sum of squares was <0.0001. The model-derived parameter estimates were then used as the new ‘initial’ values, and the model was rerun until there was no change in the parameter output. All parameter estimates are those from the final iteration of the model, with the smallest resultant residual sum-of-squares; however, usually, no change was seen after the first run, suggesting the modelling was stable. All statistical tests for comparisons between parameter outputs for different data sets were done using GraphPad Prism®.
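For readers without access to WinNonlin®, a non-linear fit of this kind can be sketched with the Python standard library. The sketch assumes the S-NL model is TEWL = DKΔC/(H − x) and that the BC-NL model adds a constant baseline B. It is not the WinNonlin® algorithm: it swaps in a profile (separable) least-squares search, exploiting the fact that for a fixed H both models are linear in their remaining parameters. The function names, the search settings (`span`, `step`) and the data are hypothetical.

```python
def _solve_given_h(x, tewl, h, with_baseline):
    """For a fixed H, solve the remaining (linear) parameters in closed form."""
    g = [1.0 / (h - xi) for xi in x]
    n = len(x)
    if with_baseline:
        mean_g, mean_t = sum(g) / n, sum(tewl) / n
        sgg = sum((gi - mean_g) ** 2 for gi in g)
        sgt = sum((gi - mean_g) * (ti - mean_t) for gi, ti in zip(g, tewl))
        a = sgt / sgg
        b = mean_t - a * mean_g
    else:
        a = sum(gi * ti for gi, ti in zip(g, tewl)) / sum(gi * gi for gi in g)
        b = 0.0
    rss = sum((ti - (b + a * gi)) ** 2 for gi, ti in zip(g, tewl))
    return rss, a, b


def fit_nonlinear(x, tewl, with_baseline=False, span=30.0, step=0.05):
    """Fit TEWL = B + DKdC/(H - x), with B fixed at 0 unless with_baseline."""
    def rss_at(h):
        return _solve_given_h(x, tewl, h, with_baseline)[0]

    # H must exceed the largest cumulative thickness removed.
    start = max(x) + step
    grid = [start + i * step for i in range(int(span / step))]
    best = min(grid, key=rss_at)

    # Refine around the best grid point with a golden-section search.
    lo, hi = max(best - step, max(x) + 1e-6), best + step
    phi = (5 ** 0.5 - 1) / 2
    for _ in range(100):
        c, d = hi - phi * (hi - lo), lo + phi * (hi - lo)
        if rss_at(c) < rss_at(d):
            hi = d
        else:
            lo = c
    h = (lo + hi) / 2
    rss, a, b = _solve_given_h(x, tewl, h, with_baseline)
    return {"H": h, "DKdC": a, "B": b, "RSS": rss}


# Hypothetical, noise-free data generated with H = 12, DKdC = 100, B = 2.
x = [float(i) for i in range(11)]
tewl = [2.0 + 100.0 / (12.0 - xi) for xi in x]

snl = fit_nonlinear(x, tewl, with_baseline=False)
bcnl = fit_nonlinear(x, tewl, with_baseline=True)
print(snl)
print(bcnl)
```

Because the one-dimensional search over H is bracketed by a grid, the sketch does not depend on initial estimates. A gradient-based fitter such as the one described above does depend on them, which is why the fit was rerun from its own output.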
The statistical evaluation of ‘goodness-of-fit’ of a model is not a trivial matter, especially in the case of non-linear models [22,24]. Obviously, a first step involves a visual graphical assessment of how well the model fits the experimental values. Models are usually evaluated for their accuracy; but this was not possible as there are no independent methods to obtain SC thickness that can be considered the “gold standard”. Thus, in an effort to evaluate objectively all three models, a series of statistical tools were considered concurrently.
First of all, the precision of the parameter estimates is evaluated using the coefficient of variation (CV, %), which relates the parameter’s standard error of the regression (SER) to its estimate (p):

$$\mathrm{CV}(\%) = \frac{\mathrm{SER}}{p} \times 100 \qquad (5)$$
The SER is an absolute measure whereas CV is a relative error that can be used to compare the three models. Normally, the model resulting in the smaller CV should be preferred.
Next, the two non-linear models were compared. It would be expected that the BC-NL model would fit the data better than the S-NL model, simply because it has one extra parameter. Thus, two statistical tools that take into account the difference in the number of parameters, the Akaike Information Criterion and an F-test, were used to compare the two non-linear models objectively.
The Akaike Information Criterion (AIC) for the S-NL and BC-NL models was calculated for each of the 31 data sets separately as follows:

$$\mathrm{AIC} = N_{obs} \cdot \ln(\mathrm{WRSS}) + 2 \cdot N_{par} \qquad (6)$$
where Nobs is the number of data points of each data set; WRSS is the weighted residual sum of squares (provided by WinNonlin®); and Npar is the number of parameters, which was 2 (“DKΔC” and “H”) for the S-NL model and 3 (“DKΔC”, “B” and “H”) for the BC-NL model. The absolute value of the AIC for a single model on its own is meaningless; hence the AIC is always used to compare several models. Briefly, the model with the lowest AIC is more likely to be correct. Furthermore, the probability that the BC-NL model is correct, rather than the S-NL model, for a given data set can be calculated via the difference in the AIC scores, ΔAIC = AICBC-NL – AICS-NL, as follows:

$$P = \frac{e^{-0.5\,\Delta\mathrm{AIC}}}{1 + e^{-0.5\,\Delta\mathrm{AIC}}} \qquad (7)$$
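The AIC comparison is simple enough to reproduce by hand. The sketch below assumes the usual least-squares form AIC = Nobs·ln(WRSS) + 2·Npar and the standard conversion of ΔAIC into a relative probability, e^(−0.5ΔAIC)/(1 + e^(−0.5ΔAIC)); the function names are illustrative only.

```python
import math


def aic(n_obs, wrss, n_par):
    """AIC for a least-squares fit (assumed form: N*ln(WRSS) + 2*Npar)."""
    return n_obs * math.log(wrss) + 2 * n_par


def prob_bc_nl_correct(delta_aic):
    """Probability that BC-NL, rather than S-NL, is the correct model.

    delta_aic = AIC(BC-NL) - AIC(S-NL). The expression below is algebraically
    equal to exp(-0.5*d) / (1 + exp(-0.5*d)).
    """
    return 1.0 / (1.0 + math.exp(0.5 * delta_aic))


# Positive delta favours S-NL, negative delta favours BC-NL.
p_pos = prob_bc_nl_correct(1.84)
p_neg = prob_bc_nl_correct(-37.74)
print(f"{p_pos:.3f} {p_neg:.6f}")
```

For ΔAIC = 1.84 this gives a probability of about 0.285 for the BC-NL model, and for ΔAIC = −37.74 a value indistinguishable from 1, in line with the 29% and 100% quoted for subjects 2a and 1b in the Results.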
The S-NL and BC-NL models can be considered as nested models; that is, the BC-NL model can be considered an extension of the S-NL model. Crucially, both models would be identical if a single parameter, B, were set to zero. Under these conditions, an F test [22,25] can examine the effect of the additional parameter on the WRSS. The F* value was calculated separately for each data set:

$$F^{*} = \frac{(\mathrm{WRSS}_1 - \mathrm{WRSS}_2)/(df_1 - df_2)}{\mathrm{WRSS}_2/df_2} \qquad (8)$$
where df1 and df2 are the degrees of freedom for the S-NL and BC-NL models respectively, and WRSS1 and WRSS2 are the corresponding weighted residual sums of squares (both provided by WinNonlin®). The F* was then compared to critical values (Ftable) taken from F tables for a two-tailed test, with a p value of 0.05, column value = |df1−df2|, and row value = df2. If F* is greater than Ftable, it can be concluded that the full model is better than the reduced model.
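As a minimal sketch, the extra-sum-of-squares statistic for two nested models can be computed as below, assuming the standard form F* = [(WRSS1 − WRSS2)/(df1 − df2)] / (WRSS2/df2). The input values are hypothetical, and the critical value is passed in by the caller because it must still be read from an F table for the relevant degrees of freedom.

```python
def f_statistic(wrss1, df1, wrss2, df2):
    """Extra-sum-of-squares F* for a reduced model (1) nested in a full model (2)."""
    return ((wrss1 - wrss2) / (df1 - df2)) / (wrss2 / df2)


def full_model_preferred(wrss1, df1, wrss2, df2, f_table):
    """True when F* exceeds the tabulated critical value supplied by the caller."""
    return f_statistic(wrss1, df1, wrss2, df2) > f_table


# Hypothetical example: 20 data points, so df = 18 for the 2-parameter S-NL
# model and df = 17 for the 3-parameter BC-NL model.
f_star = f_statistic(wrss1=50.0, df1=18, wrss2=25.0, df2=17)
# 4.45 is the tabulated upper 5% point of F(1, 17).
preferred = full_model_preferred(50.0, 18, 25.0, 17, f_table=4.45)
print(f_star, preferred)
```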
The AIC and the F test consider which model, S-NL or BC-NL, may be more appropriate for each data set individually. However, an overall preferred model may be suggested based on the preferred model selected for all 31 cases.
TEWL was measured before and after each tape strip in 31 experiments. The average baseline (before stripping) and final (after stripping) TEWL values, along with their standard deviations, were 10.12 ± 2.65 g.m−2h−1 and 61.48 ± 18.78 g.m−2h−1 respectively. This increase in TEWL is due to the barrier disruption and is in agreement with previous work [13,14,16,27].
The first method considered was the linear model, LM, which has been widely used [1,8,10–13,28] since its introduction [14,16]. The linear transformation of Fick’s first law (Eq.(3)) predicts a straight line when the 1/TEWL values are plotted versus the cumulative thickness of the stratum corneum removed (x). Figure 1.a shows example plots for two volunteers: a single straight line is observed for both subjects; the linear regression fits the experimental data well (R²=0.99) and the thickness of the stratum corneum is easily extrapolated from the regression line.
However, there were several sets of data which were not satisfactorily described by the linear model. Figure 2.a illustrates five of these examples which show an initial plateau before a clear linear descent is observed. Clearly, the LM fails to fit all the experimental data to the regression line. Worryingly, there is a clear potential for overestimation of H as the slope of the regression line is shifted upwards to include the initial plateau as figure 2 illustrates. The dilemma is how to deal with this type of data which was apparent in approximately half of the 31 experiments performed.
Obviously, the key question is the reason for this initial plateau. We could not find any trend (age, gender…) that would assign this plateau to any “skin type”. In any case, we hypothesised that the initial plateau is due to loose outer portions of SC, which are soon to be lost through natural shedding. These loose portions seem to constitute a minor barrier to water loss, but do represent a considerable mass when removed. This agrees with the heterogeneous nature of the SC structure [18–21], in which the SC compactum evolves into the SC disjunctum as the corneocytes migrate towards the surface and progressively lose their corneodesmosome links. The cells of the SC disjunctum do not contribute significantly to the water barrier function of the SC. Therefore, when the SC disjunctum layers are removed, a large mass change, and hence cumulative SC thickness removed, x, is registered, without a corresponding increase in TEWL, resulting in the plateau. Since these layers represent a poor barrier to water loss, it can be argued that they would also constitute a reduced barrier with regard to drug ingress, and hence should be considered separately from the main SC barrier during the estimation of H.
A potential solution is to subjectively exclude these initial plateau values and fit the rest with the LM. An example of such a procedure is illustrated in Figure 2.a. The exclusion of the initial plateau values provides higher R² values, and an 8.2–17% change (0.87–1.72 μm) in H. Because the regression line fits only the declining straight line, its gradient shifts downward, resulting in lower values of H, as shown in Figure 2.a. It should be noted that these fits are produced through the exclusion of a considerable proportion of data points: 9 out of 27 total data points; 12 of 24; 8 of 18; 4 of 17; and 4 of 15 for the subjects sequentially shown in Figure 2.a. Unfortunately, it is impossible to define an algorithm to exclude these points objectively; thus, a researcher would be forced to exclude points by eye in order to improve the linear fit if this approach were adopted. This potential subjectivity in deciding upon the portion to be fitted by the regression constitutes an important disadvantage of the LM. Clearly, any step of a future dermato-pharmacokinetic “modus operandi” aiming to compare topical formulations in an objective way should allow minimal room for inter-laboratory or inter-researcher variability. Therefore, the crucial step of H determination, in order to normalise data from different volunteers, should clearly be standardised.
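The effect of the plateau on the linear fit can be illustrated with a small synthetic example. The data below are hypothetical: TEWL is held at 10 while the first 2 μm are removed, and thereafter follows 80/(10 − x), so the declining portion extrapolates to exactly 10 μm. The cut-off `n_plateau` is chosen by hand, which is precisely the subjective step discussed above.

```python
def x_intercept_of_reciprocal_fit(x, tewl):
    """Least-squares line through (x, 1/TEWL); returns its x-intercept, H."""
    y = [1.0 / t for t in tewl]
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return -intercept / slope


# Hypothetical data with an initial plateau (x in um, TEWL in g.m-2.h-1).
x = [float(i) for i in range(9)]
tewl = [10.0 if xi < 2.0 else 80.0 / (10.0 - xi) for xi in x]

n_plateau = 2  # points judged "by eye" to belong to the plateau
h_all = x_intercept_of_reciprocal_fit(x, tewl)
h_excl = x_intercept_of_reciprocal_fit(x[n_plateau:], tewl[n_plateau:])
print(f"all points: H = {h_all:.2f} um; plateau excluded: H = {h_excl:.2f} um")
```

In this constructed case, including the plateau raises the estimate by roughly 0.9 μm, the same direction of bias as reported for the experimental data.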
To attempt to remedy the problems associated with the LM, namely the poor fit and the potential subjectivity of removing data points, two alternative non-linear models have been proposed: the Simple Non-Linear (S-NL) model (Equation 2) and the Baseline-Corrected Non-Linear (BC-NL) model (Equation 4), which were applied to all data sets with no data points excluded. The first step in evaluating their performance is a visual assessment of each model’s predicted fit against the raw data points. Figures 1 and 2, section b, show 7 examples of data fitted with the S-NL and BC-NL models. The predicted lines fit the experimental data closely in all cases, independent of the existence of an initial plateau (Figure 2). A comparison of the two non-linear models suggests that the BC-NL model fits the earlier TEWL values slightly better, which was the rationale behind the introduction of the baseline parameter, B, into this model. The baseline parameter allows some vertical translocation of the resultant fit, and thus improves the fitting in some cases (subjects 3a, 5b and 6).
Figure 1 shows two sets of data, 1a and 2b, which were well fitted by all three models. The estimated total SC thickness for subject 1a was 12.9±0.19 μm (LM), 13.1±0.07 μm (S-NL) and 13.2±0.16 μm (BC-NL). The estimated H for subject 2b was 8.8±0.12 μm, 8.5±0.10 μm and 8.0±0.17 μm according to the LM, the S-NL and the BC-NL models respectively. Thus, when there is no initial plateau, namely for subjects having little SC disjunctum, the three models fit the data well and the values of H obtained are very similar. Other sets of data that showed this behaviour are highlighted in Table 1. Next, we should consider the data shown in Figure 2, the cases showing an apparent initial plateau. In contrast to the LM, both non-linear models fit the data very well. Both non-linear models resulted in very close values of H, which were approximately 1 μm smaller than those estimated by the linear model. Interestingly, when values are excluded from the linear fitting, the differences become smaller. For example, the estimated values of H for subject 5a were 8.5±0.04 μm (S-NL) and 8.4±0.05 μm (BC-NL), clearly lower than that estimated by linear regression, 9.9±0.45 μm. However, when 12 points were excluded from the linear regression, the value of H decreased to 8.26±0.07 μm, very similar to the non-linear estimates. A similar trend is observed for all the cases shown in Figure 2 and for a total of 16 of the 31 cases analyzed. This suggests that the three models would agree on a common value for H if the linear regression were applied only to the latter linear portion of the data. However, as discussed before, excluding the initial plateau values and deciding on linearity is a process open to the researcher’s subjectivity. In contrast, the two non-linear models fit all the experimental data, letting the model correct for the baseline in an objective way.
The values of H estimated by the S-NL and BC-NL models were very similar in cases 3a, 5a, 5b and 7 (Figure 2). Some differences between the S-NL and the BC-NL models are illustrated by Subject 6: the S-NL prediction misses six experimental points, while the BC-NL misses only two; the latter model is probably aided by the vertical translocation permitted by the baseline parameter. This results in a slightly larger (1.16μm) difference in the values of H estimated by the two non linear models in this case.
Before discussing the goodness of fit of the three models by statistical tools, we should discuss whether H, the parameter of interest for dermato-pharmacokinetic studies, is significantly different when estimated by different models. A compilation of the H estimates together with the standard error of the regression (SER) and the coefficient of variation (CV(%)) for the 31 data sets is presented in Table 1. The H and corresponding SER may be compared graphically in Figure 3. A matched-observations Friedman test (equivalent to a non-parametric 1-way matched ANOVA) followed by the corresponding Dunn’s post-test was used to compare the values of H estimated by the LM (with no data excluded), the S-NL and the BC-NL models. The test concluded that the value of H estimated by the linear model was statistically significantly greater than that derived from the S-NL (p<0.01) and the BC-NL (p<0.001) models. In other words, the linear model tends to overestimate the thickness of the stratum corneum. On the other hand, although the S-NL model tends to estimate a higher value of H than the BC-NL model, the differences between the non-linear models did not reach the level of statistical significance (p>0.05).
Table 1 also shows the coefficient of variation (CV%) associated with each value of H and model. It is interesting to note that in 28 of the 31 cases considered, the highest relative error was associated with the linear method. The performance of the linear model can be improved but only via the exclusion of some data points. Overall, these results suggest that the two non-linear models proposed here offer a better-quality fit than the linear model; allowing a better estimation of the parameter H (lower CV%), in an objective way.
Finally, we should discuss the relative performance of the two non-linear models. The S-NL model fits the experimental values (TEWL and cumulative thickness removed) directly to Fick’s first law. This basic difference already provides a certain advantage over linearly fitting the transformed values of water loss (1/TEWL). The baseline parameter was built into the BC-NL model to describe the removal of the outer SC layers that do not significantly contribute to the barrier to water loss. However, the BC-NL model would be expected to fit the data better, and have a smaller residual sum-of-squares than the S-NL model, simply because it has one extra parameter. Therefore, to decide which model was superior, statistical tools that take into consideration the number of parameters used by each model were required. Two such tools, the AIC and an F test (Table 1), both described in Materials and Methods, were used to compare the non-linear models. The F test performed here is only applicable to nested models, i.e., where one model (S-NL) can be considered a simplified version of a more complex one (BC-NL). The value of F* calculated is compared to a critical F value obtained from published F tables [26]. If the value of F* is bigger than the critical value, the null hypothesis can be rejected and we can accept that the more complex model fits the data better. Table 1 shows that in 19 of the 31 cases, the BC-NL model is preferred (p<0.05).
The AIC offers complementary information, as it tells us the probability of the preferred model being the correct one. The value ΔAIC = AICBC-NL – AICS-NL was calculated as described in Materials and Methods, and the values are shown in Table 1. The model with the lowest AIC is considered superior; therefore, negative values of ΔAIC indicate that the BC-NL model is preferred. The magnitude of this difference can be used to determine the probability of this assumption being correct via Equation 7. For simplicity, Table 1 always shows the probability of the BC-NL model being the correct one. For example, for subject 1b, the negative value of ΔAIC (−37.74) indicates that the BC-NL model fits the data better than the S-NL model, and that there is a 100% chance that the BC-NL model is the correct one. In contrast, in the case of subject 2a, the positive value of ΔAIC (1.84) indicates that the S-NL model is superior, and that there is a 71% chance of the S-NL model being correct (the column shows the 29% chance for the BC-NL model). On the whole, the ΔAIC values of 25 of the 31 data sets were negative, indicating a preference for the BC-NL model (probability of BC-NL being correct >61%). In fact, in 20 of the 31 cases there was a probability ≥90% of the BC-NL model being correct.
The use of these two parameters can be illustrated by considering subject 6. The SC thickness for subject 6 is 8.6±0.2 μm (CV=2.3%) according to the S-NL model and 7.4±0.07 μm (CV=1%) according to the BC-NL model. The ΔAIC for this subject was −32.39, resulting in a probability of 100% of the BC-NL model being correct; the F test also shows that the BC-NL model fits the data better than the S-NL model. This is in good agreement with the graphic representation: the S-NL prediction misses six experimental points, while the BC-NL misses only two. As discussed before, 25 of the 31 cases were better modelled by the BC-NL model according to the AIC, indicating the usefulness of the baseline parameter in fitting the full breadth of the experimental data. In summary, the results in Table 1 suggest that the BC-NL model has a better overall performance than the S-NL model.
In addition, Table 2 and Figure 3 show the data for some volunteers who participated on different occasions. It is worth noting that the H estimated on different occasions may differ markedly for the same volunteer, independent of the model used to fit the data. See for example, subjects 1–3, 5, 8, 9, 10, 12 and 14 (figure 3). These differences imply that the SC thickness changes with time, potentially due to environmental conditions and the use of drugs, cosmetics or exfoliating agents. It follows that a determination of the SC thickness is necessary every time a dermato-pharmacokinetic study is performed in a subject, so the correct thickness is used to normalise the drug-penetration profiles.
In summary, we believe that non-linear models perform better than the standard linear model typically used to estimate the stratum corneum thickness via tape-stripping experiments combined with TEWL measurements. When the LM performs well, that is, in the absence of an initial plateau, the three models estimate comparable values of H. However, when an initial plateau appears, probably corresponding to the removal of the SC disjunctum, the LM tends to overestimate the thickness of the SC unless some subjective exclusion of data is made. In contrast, the non-linear models offer a robust procedure that incorporates all data points, better reflects experimental observations, and estimates the thickness with a smaller coefficient of variation.
The incorporation, rather than the exclusion, of the initial plateau by the non-linear approach should be preferred as it follows more closely the SC physiology and its division into the SC compactum and disjunctum. The statistical comparison of the two non-linear models showed a higher probability for the baseline corrected model being the preferred one. However, there were no statistical differences between the values of H estimated from either of these models.
We thank Pharsight Corporation Inc. for a PAL WinNonlin® licence and Dr. Dan Weiner for scientific input during the development and assessment of the non-linear models. The financial support of the U.S. National Institutes of Health (EB-001420) is gratefully acknowledged. We thank Prof. R.H. Guy and other members of our group at the University of Bath for encouraging discussions.