This study collected data from a large cohort of young non-pregnant women. The sample included women from every social class, with a wide range of educational attainment and living conditions. Strengths of the study were that the data were collected by interviewers and that the response rate was good: 75% of the women contacted agreed to take part. The high completion rate of the FFQs was achieved by using laptop computers for data collection wherever possible. There is concern that FFQs may be subject to bias (Byers, 2004). However, in the context of dietary pattern analysis, Hu et al (1999) showed that an FFQ revealed dietary patterns similar to those from weighed diet records, and that individuals’ scores on the two were strongly positively correlated.
Principal component analysis produced two components with a clear interpretation. The first component was termed the ‘prudent diet score’, in line with published studies (Slattery et al, 1998; Hu et al, 1999; Fung et al, 2001; Osler et al, 2001); women with high scores had diets in accordance with recommendations from the Department of Health (Department of Health, 1994; Department of Health, 1998) and other agencies. All coefficients for the second principal component were positive, indicating that a woman with a high score would have a generally high food intake. Indeed, Pearson’s correlation coefficient between the second component and energy intake was 0.81, and the second component was therefore termed the ‘high-energy diet score’.
The prudent and high-energy diet scores together explained 14.6% of the variation in the 49 foods and food groups. The proportion of variation explained by a set of components cannot be compared directly across the literature, because it depends heavily on the number of variables entered into a PCA and the number of components retained. However, when the SWS results were compared with analyses that entered a similar number of variables and retained a similar number of components, the proportion of variation explained in the SWS was highly comparable (data not shown).
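To make the dependence on the number of variables concrete, the Python sketch below runs a PCA on the correlation matrix of a participants × foods matrix. This is an assumption: the section does not state whether the SWS analysis used the correlation or the covariance matrix. With p standardised variables the total variance is p, so the proportion explained by the first k components is the sum of their eigenvalues divided by p. The data are hypothetical random intakes, not SWS data, and `pca_correlation` is an illustrative helper.

```python
import numpy as np


def pca_correlation(X, n_components):
    """PCA on the correlation matrix of X (rows = participants, columns = foods).

    Returns (coefficients, component scores, proportion of total variance explained).
    """
    X = np.asarray(X, dtype=float)
    # Standardise each food variable (mean 0, SD 1) so PCA works on the correlation matrix
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    R = np.corrcoef(Z, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(R)       # eigh returns eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1]          # re-sort so the largest component comes first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    coefs = eigvecs[:, :n_components]
    scores = Z @ coefs                         # one score per participant per component
    # Total variance of p standardised variables equals p (the trace of R)
    explained = eigvals[:n_components].sum() / X.shape[1]
    return coefs, scores, explained


# Hypothetical intakes: 500 participants, 49 food groups (random, uncorrelated)
rng = np.random.default_rng(0)
X = rng.gamma(shape=2.0, scale=1.0, size=(500, 49))
coefs, scores, explained = pca_correlation(X, n_components=2)
# Same participants, but only 20 of the food variables entered
_, _, explained_20 = pca_correlation(X[:, :20], n_components=2)
print(f"2 components from 49 variables: {explained:.1%}")
print(f"2 components from 20 variables: {explained_20:.1%}")
```

Because these hypothetical foods are uncorrelated, any variation ‘explained’ is chance structure; even so, two components retained from 20 variables account for a larger share than two retained from 49, which is why comparisons are only meaningful between analyses with a similar number of variables and components. Eigenvector signs are arbitrary, so a component may need to be multiplied by −1 before it is labelled.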
Cluster analysis resulted in two distinct groups of participants in the SWS. The diets of women in Cluster 1 appeared to be less healthy than those of women in Cluster 2. Similar ‘more healthy’ and ‘less healthy’ dietary clusters were found in a large survey of English adults (Margetts et al, 1998) and in a smaller study of older US lung cancer cases and controls (Tsai et al, 2003). Healthy clusters of British subjects were also identified in the National Diet and Nutrition Survey (Pryer et al, 2001a) and the Dietary and Nutritional Survey of British Adults (Pryer et al, 2001b).
PCA and cluster analysis are both useful approaches to the assessment of dietary patterns, and the most information may be obtained by using more than one method (Newby and Tucker, 2004b). The strengths and limitations of PCA and cluster analysis are discussed in detail by Michels and Schulze (2005). A commonly cited criticism of the two techniques is that they involve several subjective but important decisions, such as the grouping of foods and possible transformations of variables. Principal component analysis involves decisions about the number of components to retain and how to label them; cluster analysis requires choices about the clustering method and the labelling of the clusters. Another disadvantage of PCA and cluster analysis is that they generate patterns based on variation in diet, with no guarantee that these patterns will be predictive of a particular health outcome. However, the techniques have the advantage that they are empirically derived, and are therefore not limited by current knowledge. Furthermore, PCA and cluster analysis can combine information about all aspects of diet and are based on food intakes rather than nutrients, so they may be more relevant to dietary choices than nutrient-based summaries.
Comparison of the results of the two dietary pattern analyses revealed that the healthier cluster had a much higher average prudent diet score than the less healthy cluster; in fact, the two clusters were almost a dichotomy of the prudent diet score, but were less strongly related to the high-energy diet score. These results are similar to those of Costacou et al (2003) and Bamia et al (2005), who compared principal component analysis with cluster analysis. Costacou’s first principal component, which resembled a Mediterranean diet, was considerably higher in one cluster than in the other two. Bamia’s first component was labelled ‘vegetable-based’, and was much higher in cluster A than in clusters B and C.
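The near-dichotomy between the two clusters and the prudent diet score can be illustrated with a deliberately simplified simulation. The Python sketch below assumes a single dominant dietary dimension driving 20 hypothetical food variables, and uses plain k-means as a stand-in clustering method (the clustering algorithm used in the SWS is not specified in this section). When one dimension dominates, a two-cluster solution tends to split participants along the first principal component, so cluster membership closely tracks a cut of the PC1 score.

```python
import numpy as np


def kmeans(Z, k, n_iter=100, seed=0):
    """Plain Lloyd's k-means; returns one cluster label per row of Z."""
    rng = np.random.default_rng(seed)
    centres = Z[rng.choice(len(Z), size=k, replace=False)]
    for _ in range(n_iter):
        # Assign each participant to the nearest centre (squared Euclidean distance)
        dist = ((Z[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        # Move each centre to the mean of its members (keep it if the cluster is empty)
        new_centres = np.array([
            Z[labels == j].mean(axis=0) if np.any(labels == j) else centres[j]
            for j in range(k)
        ])
        if np.allclose(new_centres, centres):
            break
        centres = new_centres
    return labels


# Hypothetical data: one latent "prudent" dimension drives 20 food variables
rng = np.random.default_rng(42)
n, p = 1000, 20
prudent = rng.normal(size=n)
loadings = np.r_[np.full(10, 0.7), np.full(10, -0.7)]  # 10 foods rise, 10 fall with the dimension
X = prudent[:, None] * loadings + rng.normal(scale=0.7, size=(n, p))
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

# First principal component score (the sign of the eigenvector is arbitrary)
eigvals, eigvecs = np.linalg.eigh(np.corrcoef(Z, rowvar=False))
pc1 = Z @ eigvecs[:, np.argmax(eigvals)]

labels = kmeans(Z, k=2)
cluster_means = [pc1[labels == j].mean() for j in range(2)]
# Share of participants whose cluster matches their side of the PC1 median
side = pc1 > np.median(pc1)
agreement = max(np.mean(side == (labels == 1)), np.mean(side != (labels == 1)))
print("mean PC1 score by cluster:", np.round(cluster_means, 2))
print(f"agreement with a median split of PC1: {agreement:.1%}")
```

Real dietary data are far less dominated by one dimension (the two SWS components together explain 14.6% of the variation), so agreement in practice would be expected to be weaker; the sketch only shows the mechanism by which two clusters can approximate a cut of a component score.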
The robustness analyses in the SWS indicate that differences between the results of two similar cluster analyses may arise because each dichotomises the prudent diet score at a different point, which highlights the relationship between the results of the two techniques. Newby and Tucker (2004b) and Newby et al (2004a) also note that there is evidence that underlying eating patterns are revealed by both principal component analysis and cluster analysis.
In the SWS, the two-cluster solution was clearly indicated. Since the two techniques reveal similar patterns, the question of whether PCA or cluster analysis is preferable resembles the choice faced by a researcher who can either analyse body mass index as a continuous variable or use a cut-point to dichotomise it into overweight and not-overweight groups. In many circumstances, the dichotomous variable may be less informative than the continuous one. A solution with more than two clusters could have been more informative, but the tree diagram did not indicate it as the most appropriate way of clustering the participants. A continuous score resulting from PCA can be particularly useful, and characterising an individual’s diet with a continuous score might be considered a more pragmatic choice than assigning it to one of a number of discrete categories. PCA also offers the opportunity to explore more than one dimension of variation in diet, such as the high-energy diet score, which cluster analysis did not reveal in this study. PCA appeared to be somewhat less sensitive than cluster analysis to outliers and to different groupings of the foods, and its coefficients altered little when only half of the dataset was analysed.
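The loss of information from dichotomising a continuous score can be quantified. In the Python sketch below, a hypothetical diet score and outcome are simulated as bivariate normal, and the correlation with the outcome is compared for the continuous score and for a median split. Under these assumptions, theory predicts that a median split attenuates the correlation by a factor of about √(2/π) ≈ 0.80. None of the values are SWS data.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 20_000
score = rng.normal(size=n)                        # hypothetical continuous diet score
outcome = 0.5 * score + rng.normal(size=n)        # hypothetical outcome, linear in the score
high = (score > np.median(score)).astype(float)   # median split: "high" vs "low" group

r_continuous = np.corrcoef(score, outcome)[0, 1]
r_dichotomised = np.corrcoef(high, outcome)[0, 1]
ratio = r_dichotomised / r_continuous
print(f"continuous r = {r_continuous:.3f}, dichotomised r = {r_dichotomised:.3f}")
print(f"ratio = {ratio:.3f} (theory: sqrt(2/pi) = {np.sqrt(2 / np.pi):.3f})")
```

Because 0.80² ≈ 0.64, the dichotomised score accounts for only about two-thirds of the outcome variance that the continuous score accounts for; in the body mass index analogy, this is the cost of analysing ‘overweight’ versus ‘not overweight’ instead of the measurement itself.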
All dietary pattern analysis techniques have a contribution to make and may reveal similar patterns. In the context of the SWS, PCA was seen to be particularly valuable as a general discriminatory tool. Further studies are needed to replicate these results, particularly the close association between the two-cluster solution and the prudent diet score. The robustness of PCA and cluster analysis, and their associations with other measures of diet, are important considerations when assessing the usefulness of each technique. As robust, meaningful dietary pattern analysis techniques are developed, it should become possible to better understand the role of dietary patterns in health and disease.
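One way to check a statement such as ‘the PCA coefficients altered little when only half of the dataset was analysed’ is to compare coefficient vectors from the full sample and from a random half. The Python sketch below uses Tucker’s congruence coefficient for this comparison; that choice, the helper names `first_components` and `congruence`, and the simulated two-dimension data are all assumptions for illustration, not the procedure reported for the SWS. Values close to 1 indicate that a component is reproduced almost exactly.

```python
import numpy as np


def first_components(X, n_components):
    """Leading eigenvectors of the correlation matrix of X (columns = foods)."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    eigvals, eigvecs = np.linalg.eigh(np.corrcoef(Z, rowvar=False))
    return eigvecs[:, np.argsort(eigvals)[::-1][:n_components]]


def congruence(a, b):
    """Tucker's congruence coefficient; absolute value because eigenvector signs are arbitrary."""
    return abs(a @ b) / np.sqrt((a @ a) * (b @ b))


# Hypothetical data: two latent dietary dimensions of different strength, 20 foods
rng = np.random.default_rng(2)
n, p = 1000, 20
F = rng.normal(size=(n, 2))
L = np.zeros((p, 2))
L[:10, 0] = 0.8                                   # foods 0-9 load on dimension 1
L[10:, 1] = 0.5                                   # foods 10-19 load on dimension 2
X = F @ L.T + rng.normal(scale=0.6, size=(n, p))

half = rng.choice(n, size=n // 2, replace=False)  # random half of the participants
full = first_components(X, 2)
part = first_components(X[half], 2)
phis = [congruence(full[:, j], part[:, j]) for j in range(2)]
print("congruence, full vs half sample:", [round(float(phi), 3) for phi in phis])
```

The same comparison could be repeated over many random halves to see how much the coefficients vary; for cluster analysis, the analogous check would compare cluster membership rather than coefficient vectors.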