Principal component analysis is a popular method of dietary patterns analysis, but our understanding of its use to describe changes in dietary patterns over time is limited. We assessed the diets of 12,572 non-pregnant women aged 20-34 from Southampton, UK using a food frequency questionnaire, of whom 2,270 and 2,649 became pregnant and provided complete dietary data in early and late pregnancy respectively. Intakes of white bread, breakfast cereals, cakes and biscuits, processed meat, crisps, fruit and fruit juices, sweet spreads, confectionery, hot chocolate drinks, puddings, cream, milk, cheese, full-fat spread, cooking fats and salad oils, red meat and soft drinks increased in pregnancy. Intakes of rice and pasta, liver and kidney, vegetables, nuts, diet cola, tea and coffee, boiled potatoes and crackers decreased in pregnancy. Principal component analysis at each time point produced two consistent dietary patterns, labeled ‘prudent’ and ‘high-energy’. At each time point in pregnancy, and for both the prudent and high-energy patterns, we derived two dietary pattern scores for each woman: a ‘natural’ score, based on the pattern defined at that time point, and an ‘applied’ score, based on the pattern defined before pregnancy. Applied scores are preferred to natural scores to characterize changes in dietary patterns over time because the scale of measurement remains constant. Using applied scores there was a very small mean decrease in prudent diet score in pregnancy, and a very small mean increase in high-energy diet score in late pregnancy, indicating little overall change in dietary patterns in pregnancy.
Many studies collect dietary data longitudinally. Such data provide an opportunity to assess whether diets at a population level are stable over time, and also whether there is stability in diet at an individual level, known as tracking (1).
Multivariate statistical methods such as principal component analysis (PCA) have become increasingly popular as a means of deriving dietary patterns (2, 3). A range of studies have assessed the stability of dietary patterns over time at a population level by performing separate PCA or factor analysis at each time point (1, 4-15). With very few exceptions these analyses show that patterns found at baseline are replicated with only slight variation at follow-up time points.
Principal component analysis generates dietary patterns by computing coefficients for each food or food group in the analysis; individual dietary pattern scores are calculated by multiplying these coefficients by the individual’s reported frequencies of consumption, to provide a score for every participant. When characterizing change in individual pattern scores over time a particular issue is whether to calculate scores at a follow-up time point using the coefficients defined at that follow-up time point, or the coefficients at an earlier baseline time point. Northstone and Emmett (13) use the term ‘applied’ scores to refer to those scores calculated at a follow-up time point using coefficients obtained from the data at a baseline time point. Here we will adopt the same terminology, and in addition we will label scores at a follow-up time point that are calculated using the coefficients obtained from the data at that follow-up time point as ‘natural’ scores.
Whilst several researchers have used natural scores to describe changes in dietary patterns (1, 5-7, 10, 12), Prevost et al. (4), Mishra et al. (9) and Borland et al. (14) all chose to use applied scores, basing dietary scores at a follow-up time point on patterns determined by principal component analysis or factor analysis at a baseline time point. Mishra et al. extended the ideas of Schulze et al. (16) in order to produce a simplified dietary score that is applicable at different time points. An advantage of applied scores is that the scale of measurement remains constant. However, only Northstone and Emmett (13) have compared natural and applied scores, concluding that natural scores are more appropriate in their study where there are changes in the food frequency questionnaire over time.
This paper reports the results of dietary assessment before pregnancy and in early and late pregnancy in a large, contemporary cohort of UK women. We wanted to see whether dietary patterns change in pregnancy, and to examine the advantages and disadvantages of using natural and applied scores to describe these changes. Dietary patterns at each time point are presented and the data are used to address the question of whether natural or applied scores are preferable to assess tracking of individual diets over time.
The Southampton Women’s Survey (SWS) has assessed the diet, body composition, physical activity and social circumstances of a large group of non-pregnant women aged 20 to 34 years living in the city of Southampton, UK. Full details of the study have been published previously (17). Women were recruited between April 1998 and December 2002 through general practices across the city. Each woman was sent a letter inviting her to take part in the survey, followed by a telephone call when an interview date was arranged. In total 12,583 women agreed to take part in the survey, 75% of all women contacted. Trained research nurses visited the women at home and collected detailed information about their health, diet and lifestyles.
Food intake over the preceding three months was assessed using a validated interviewer-administered food frequency questionnaire (FFQ). Prompt cards were used to ensure standardized responses to the FFQ; further details are given by Robinson et al. (18). Standard portion sizes were assigned, derived primarily from a published list of UK values (19). The women who subsequently became pregnant visited the SWS ultrasound unit at 11, 19 and 34 weeks’ gestation. At 11 and 34 weeks’ gestation trained research nurses collected information similar to that collected at the interview before pregnancy, and administered the same FFQ.
Complete dietary data are available for 12,572 non-pregnant women, 2,270 women in early pregnancy and 2,649 women in late pregnancy. The Southampton Women’s Survey was approved by the Southampton and South West Hampshire Local Research Ethics Committee.
There were 98 foods and non-alcoholic beverages listed on the FFQ. These were combined into 48 food groups on the basis of similarity of nutrient composition and comparable usage. For example, carrots, parsnips, swedes and turnips were combined in the ‘root vegetables’ group; bacon, ham, corned beef, meat pies and sausages were combined in the ‘processed meats’ group.
Principal component analysis is a statistical technique that produces new variables that are uncorrelated linear combinations of the dietary variables with maximum variance (20). PCA was performed on the reported frequencies of consumption of the 48 foods and food groups at the before, early and late pregnancy time points. The principal component analyses were based on the correlation matrix in order to adjust for unequal variances of the original variables. Natural dietary pattern scores were calculated by multiplying the coefficients for the 48 food groups at one time point by each individual’s standardized reported frequencies of consumption at the same time point. In order to calculate applied dietary pattern scores, frequencies of consumption in early and late pregnancy were standardized to the mean and standard deviation (SD) observed before pregnancy (since standardizing to the frequencies at the early or late pregnancy time point would remove information about increases or decreases in consumption between time points). Applied dietary pattern scores were then calculated by multiplying the coefficients from the PCA at the before-pregnancy time point by each individual’s standardized reported frequencies of consumption at the early and late pregnancy time points.
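The distinction between natural and applied scores can be sketched in code. The Python example below is a minimal illustration on simulated data, not the Stata analysis used in this study. It assumes NumPy is available, treats the PCA coefficients as unit-length eigenvectors of the correlation matrix, and uses hypothetical function names (`pca_coefficients`, `pattern_scores`) and toy dimensions rather than the 48 SWS food groups.

```python
import numpy as np

def pca_coefficients(freq, n_components=2):
    """PCA on the correlation matrix of a (women x food groups) array.

    Returns the column means, column SDs and the coefficients
    (eigenvectors) of the leading components.
    """
    mean = freq.mean(axis=0)
    sd = freq.std(axis=0, ddof=1)
    corr = np.corrcoef(freq, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(corr)            # eigenvalues ascending
    order = np.argsort(eigvals)[::-1][:n_components]   # largest first
    return mean, sd, eigvecs[:, order]

def pattern_scores(freq, ref_mean, ref_sd, coefs):
    """Standardize frequencies to a reference mean/SD, then apply coefficients."""
    return ((freq - ref_mean) / ref_sd) @ coefs

# Simulated toy data (assumption): 500 women, 6 food groups.
rng = np.random.default_rng(0)
before = rng.gamma(2.0, 2.0, size=(500, 6))
early = before + rng.normal(0.5, 1.0, size=(500, 6))   # intakes drift upward

# Pattern defined before pregnancy.
mean_b, sd_b, coef_b = pca_coefficients(before)
score_before = pattern_scores(before, mean_b, sd_b, coef_b)
sd_score_before = score_before.std(axis=0, ddof=1)
score_before = score_before / sd_score_before          # SD units

# Natural scores: pattern, mean and SD all taken from early pregnancy.
mean_e, sd_e, coef_e = pca_coefficients(early)
natural_early = pattern_scores(early, mean_e, sd_e, coef_e)
natural_early = natural_early / natural_early.std(axis=0, ddof=1)

# Applied scores: before-pregnancy pattern, mean and SD applied to
# early-pregnancy frequencies, scaled by the before-pregnancy score SD.
applied_early = pattern_scores(early, mean_b, sd_b, coef_b) / sd_score_before

print("mean natural score:", natural_early.mean(axis=0))   # zero by construction
print("mean applied score:", applied_early.mean(axis=0))   # retains the shift
```

In this sketch the natural scores average zero whatever happens to intakes, whereas the applied scores keep the before-pregnancy scale and so can show a mean shift.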
Natural PCA dietary scores are generated by definition with a mean of zero. The natural scores were divided by their standard deviation in order that the units of the scores were meaningful (standard deviation units). The applied scores were divided by the standard deviation of the before-pregnancy score, so that comparisons could be made in terms of change in standard deviation units.
Differences in food intakes between the time points were tested using Wilcoxon signed rank tests. Due to the large numbers in the sample, differences with P-values of <0.0001 were considered important. The first principal component scores were normally distributed, whilst the second were not, thus for consistency the associations between individual dietary scores at the different time points were assessed using Spearman’s correlation coefficients. The agreement between scores at the different time points was described using Bland-Altman limits of agreement (21). Differences in scores were normally distributed and were assessed using paired t-tests. Two-sided statistical tests are presented, and analyses were performed using Stata 10.1 (22).
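For readers unfamiliar with limits of agreement, the calculation can be sketched as follows. This is a minimal Python illustration using only the standard library; the function name `bland_altman_limits`, the toy scores, and the conventional 1.96 multiplier for 95% limits are assumptions for the example, not output from the SWS analysis.

```python
from statistics import mean, stdev

def bland_altman_limits(follow_up, baseline):
    """Return (mean difference, lower limit, upper limit) for paired scores.

    The 95% limits of agreement are taken as the mean difference
    plus or minus 1.96 sample standard deviations of the differences.
    """
    diffs = [f - b for f, b in zip(follow_up, baseline)]
    d_mean = mean(diffs)
    d_sd = stdev(diffs)
    return d_mean, d_mean - 1.96 * d_sd, d_mean + 1.96 * d_sd

# Hypothetical diet scores (SD units) for six women.
before = [0.2, -1.1, 0.8, 1.5, -0.4, 0.0]
late = [0.1, -0.9, 0.3, 1.9, -0.6, 0.4]
bias, lower, upper = bland_altman_limits(late, before)
print(f"mean difference {bias:.2f}, limits {lower:.2f} to {upper:.2f}")

# A constant shift gives perfect correlation but non-zero mean difference,
# which is why agreement is examined in addition to correlation.
shift_bias, shift_lower, shift_upper = bland_altman_limits(
    [1.5, 2.5, 3.5], [1.0, 2.0, 3.0]
)
```

The second call shows why a correlation coefficient alone is not enough: scores that all rise by the same amount are perfectly correlated, yet the mean difference is 0.5.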
The characteristics of the SWS women studied are given in Table 1. Of the women who became pregnant in the SWS, the median time to conception was 1.8 years from the initial interview. Women who provided data at all of the three time points (before, early and late pregnancy) (n = 2,057) were slightly better educated and less likely to smoke than those who became pregnant but did not provide data at these time points, and were less likely to be from a non-white ethnic group (data not shown).
Consumption of the 48 foods and food groups is described in Table 2, for women who provided data at all three time points. Intakes of 21 foods or food groups increased in early pregnancy. These included white bread, breakfast cereals, cakes and biscuits, processed meat, crisps, fruit and fruit juices, dried fruit, sweet spreads, confectionery and hot chocolate drinks (all P < 0.0001). The increases in some foods are not immediately apparent from the summary statistics in Table 2, due to the limited number of categorical responses in our FFQ. Thus if a relatively large proportion of individuals consume at the median level, an underlying change as indicated by the Wilcoxon signed rank test may not be evident from the median values in Table 2. For example, although the median frequency of citrus fruit and fruit juices remains unchanged in early pregnancy, there is an underlying increase in intake of citrus fruit and fruit juices such that the proportion of women with high consumption of citrus fruit and fruit juices increased from 52% before pregnancy, to 64% in early pregnancy.
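The point about unchanged medians can be illustrated with a small made-up example. The Python sketch below uses hypothetical weekly frequencies for ten women, with 7 times per week taken as an arbitrary 'high consumption' cut-off; it does not run a signed rank test, but shows the kind of paired shift such a test responds to.

```python
from statistics import median

# Hypothetical weekly frequencies; positions pair the same woman at both times.
before = [0.5, 1, 3, 3, 3, 3, 3, 7, 7, 7]
early = [1, 3, 3, 3, 3, 3, 7, 7, 7, 7]

# Paired changes: how many women moved up or down a category.
increased = sum(e > b for b, e in zip(before, early))
decreased = sum(e < b for b, e in zip(before, early))

# Proportion at or above the assumed 'high consumption' cut-off.
high_before = sum(b >= 7 for b in before) / len(before)
high_early = sum(e >= 7 for e in early) / len(early)

print(median(before), median(early))   # both medians sit in the modal category
print(increased, decreased)            # every paired change is upward
print(high_before, high_early)         # proportion of high consumers rises
```

Because half the women report the modal category at both time points, the median does not move even though every change is an increase.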
Consumption of breakfast cereals, cakes and biscuits, processed meat, non-citrus fruit, sweet spreads and hot chocolate drinks increased further in late pregnancy (all P < 0.0001). Whilst consumption of puddings, cream, milk, cheese, full-fat spread, cooking fats and salad oils, red meat and soft drinks did not change in early pregnancy, it increased in late pregnancy (all P < 0.001). The most marked increase was for breakfast cereals, from a median frequency of 4.5 times per wk in early pregnancy, to 7 times per wk in late pregnancy.
Intakes of 10 foods or food groups decreased in pregnancy. Consumption of rice and pasta, liver and kidney, salad vegetables, other vegetables, vegetable dishes, nuts, diet cola, tea and coffee was lower in pregnancy than before pregnancy (all P < 0.0001). Consumption of rice and pasta, and of liver and kidney, was notably lower again in late pregnancy than in early pregnancy (P < 0.0001); the proportion of women consuming any liver and kidney was 48% before pregnancy, 22% in early pregnancy and 16% in late pregnancy. Consumption of green vegetables, boiled potatoes and crackers did not change in early pregnancy, but decreased in late pregnancy. The most marked decreases in early pregnancy were in liver and kidney, and in tea and coffee; the most marked decrease in late pregnancy was in vegetable dishes. The reductions in consumption of liver and kidney, and caffeinated drinks are in line with public health messages to women in pregnancy (23).
For the majority of foods, increases or decreases in consumption in pregnancy were due to changes in intakes amongst consumers. However, for six of the foods (breakfast cereals, liver and kidney, vegetable dishes, sweet spreads and jam, diet cola and hot chocolate drinks), the changes reflected both changes in intakes amongst consumers and changes in the number of consumers. Decreases in nuts and cracker intake largely reflect the decrease in proportion of participants consuming these foods across the three time points.
The coefficients from principal component analysis at each of the three time points are shown in Table 3. Each coefficient for component 1 is within 0.07 of the coefficient for the same food group at at least one other time point. Similarly, each coefficient for component 2 is within 0.06 of at least one other. Overall, the first and second patterns are strikingly similar at all three time points.
The first principal components explained between 7.6% and 8.2% of the variation in the dietary data at the three time points. At all time points this component was characterized by high intakes of fruit and vegetables, wholemeal bread, rice and pasta and yogurt, and low intakes of chips and roast potatoes, sugar, white bread, processed meat, full-fat dairy products, crisps, Yorkshire puddings and savory pancakes, confectionery, and tea and coffee. Component 1 was termed the ‘prudent’ diet pattern (24).
The second principal components explained between 6.3% and 7.1% of the variation in the dietary data. At all time points this component was characterized by high intakes of fruit and vegetables, puddings, meat and fish, eggs and egg dishes, cakes and biscuits, full-fat spread, potatoes, crisps and confectionery. It is notable that virtually all coefficients for component 2 are positive, so a high score on component 2 indicates high overall consumption. In a subset of the SWS cohort, component 2 was shown to have a very strong association with energy intake (r = 0.81, P < 0.0001) (24), and was therefore termed the ‘high-energy’ diet pattern.
At each time point the third and subsequent principal components explained substantially less variation than the first two and were also seen to be less interpretable; therefore they were not considered further.
All subsequent analyses are conducted on the 2,057 women who provided dietary data at all three time points.
The correlations between women’s natural prudent and natural high-energy diet scores are given in Table 4. There is clearly a strong association between women’s scores at the three time points. Since these are natural scores, the high correlation coefficients reflect both the similarities in patterns across the three time points (Table 3) and individual tracking of diet. The correlations for women’s applied prudent and high-energy diet scores are given in Table 5. The high correlations, very similar to those in Table 4, demonstrate that when the scale of measurement is held constant there is again a strong association between women’s scores at the three time points.
Table 6 provides summary statistics for both natural and applied prudent and high-energy diet scores. Natural PCA dietary scores are generated by definition with a mean of zero. Thus the mean natural prudent diet score of 0.07 before pregnancy is due to the fact that the 2,057 women who went on to become pregnant and provided data at both the early and late pregnancy time points tended to have slightly higher prudent diet scores than all 12,572 women with non-pregnant data on whom the scores were generated. Similarly, the mean natural prudent diet score in early pregnancy is zero because it is generated with a mean of zero by definition on 2,270 women, and the majority of these are represented in the 2,057 women under consideration. Applied scores are not generated by definition with a mean of zero, so the summary statistics for applied scores in Table 6 are values that are not affected by the sub-sample on which they are calculated.
We next consider quantifying the mean change in dietary scores between the early and before-pregnancy time points (Table 6). The differences between the early pregnancy natural score and the before-pregnancy natural score have a mean of −0.07. This is the same as the difference between 0.00 (the mean of the early pregnancy natural scores) and 0.07 (the mean of the before-pregnancy natural scores). If the scores had been generated on the 2,057 women themselves the mean difference would have been zero by definition (because the scores have a mean of zero). Therefore the mean difference in natural scores of −0.07 merely reflects the characteristics of the 2,057 women as compared to the full datasets on which the dietary scores were generated. The mean difference (early pregnancy-before pregnancy) in applied scores of −0.01 is the same (subject to rounding) as the difference between 0.05 (the mean of the early pregnancy applied scores) and 0.07 (the mean of the before-pregnancy natural scores), and is a truer reflection of the change in prudent diet scores since it is not based on an early pregnancy score generated by definition with a mean of zero.
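The arithmetic behind this argument can be sketched with a deliberately simplified example. The Python code below uses a single made-up raw score in place of a full dietary pattern, a hypothetical cohort of eight women of whom the six with the highest scores are followed up, and no change at all in their diets; all names and values are illustrative assumptions.

```python
from statistics import mean, stdev

def standardize(values, ref_mean, ref_sd):
    """Express values in SD units relative to a reference mean and SD."""
    return [(v - ref_mean) / ref_sd for v in values]

cohort_before = [1, 2, 3, 4, 5, 6, 7, 8]    # full cohort, raw score
followed_before = cohort_before[2:]          # sub-sample with follow-up data
followed_early = list(followed_before)       # diets unchanged at follow-up

m0, s0 = mean(cohort_before), stdev(cohort_before)

# Before-pregnancy scores are defined on the full cohort.
before_scores = standardize(followed_before, m0, s0)

# Natural follow-up scores are re-centred on the follow-up sample.
natural_early = standardize(
    followed_early, mean(followed_early), stdev(followed_early)
)

# Applied follow-up scores keep the before-pregnancy mean and SD.
applied_early = standardize(followed_early, m0, s0)

natural_change = mean([n - b for n, b in zip(natural_early, before_scores)])
applied_change = mean([a - b for a, b in zip(applied_early, before_scores)])

print(round(natural_change, 2))   # negative, purely from sub-sample selection
print(applied_change)             # zero, matching the unchanged diets
```

Here the natural scores suggest a mean decrease although no diet changed, because the followed-up women started above the cohort mean; the applied scores correctly show no change.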
Table 6 and Supplemental Figure 1 describe the mean and Bland-Altman 95% limits of agreement for the differences between applied pregnancy scores and before-pregnancy scores. There was minimal change in women’s prudent diet score in early (−0.01 standard deviations, P = 0.35) and late (−0.03 standard deviations, P = 0.11) pregnancy compared to before pregnancy. There was no overall change in high-energy diet score in early pregnancy compared to before pregnancy (0.01 standard deviations, P = 0.49), but a small statistically significant increase in high-energy diet score in late pregnancy compared to before pregnancy (0.07 standard deviations, P = 0.0002). The limits of agreement are somewhat narrower for the prudent diet score than the high-energy diet score indicating that there is closer tracking of prudent diet score into pregnancy than high-energy diet score.
In this study we interviewed a large sample of young women both before and during pregnancy. A particular strength of the SWS is that the data were collected prospectively, thus providing a valuable opportunity to assess dietary change when women become pregnant. Data are available from a large cohort of women with a good response rate: 75% of the women contacted agreed to take part in the study. The complete cohort of 12,583 non-pregnant women has been shown to be broadly representative of women of this age group in the UK in terms of smoking and educational profile, although the proportion of white women is higher than the national figure of 88% (17). Diet was assessed using an FFQ administered by trained research nurses (18). Although there is concern that FFQs may be subject to bias (25), they have been shown to identify similar patterns of diet to weighed food records (5, 7). Since data were interviewer-collected, there were few missing food items, a particular advantage for PCA, where complete dietary data are required. Characterizations of individual tracking in dietary scores have often used only correlation methods (1, 5-7, 10, 12); these measure linear association but not agreement. Here we have used Bland-Altman plots (21), which are able to highlight any consistent shifts in pattern scores between time points.
We have used dietary data collected before, in early and late pregnancy to derive prudent and high-energy dietary patterns at these time points using principal component analysis. The continuous nature of PCA has been seen to be more advantageous than a two-cluster solution resulting from a cluster analysis of SWS dietary data (24). The first component was termed the ‘prudent’ diet score, in line with published data (5, 26-28); women with high scores had diets in line with recommendations from the UK Department of Health (29, 30) and other agencies. The second component was termed the ‘high-energy’ diet score; similar patterns in the literature have been labeled ‘high-fat’(31) and ‘high energy-density’(32). In common with other studies (1, 4-15) we found that the prudent and high-energy patterns were replicated with only slight variation across the three time points.
The prudent and high-energy diet score together explain over 14% of the variation in the 48 food and food groups at each time point. Direct comparisons of the proportion of variation explained by a set of components cannot be made across the literature since it is highly dependent on the number of variables entered into a PCA and the number of components retained. However, when the SWS results were compared to dietary analyses with a similar number of variables entered and components retained, the proportion of variation explained by the SWS was highly comparable (5, 31).
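The dependence on the number of variables can be made explicit. Assuming the standard result that a PCA of the correlation matrix of p standardized variables has total variance p, the proportion of variation explained by component k, with eigenvalue λ_k, is as follows (the eigenvalues themselves are not reported here; the figures are back-calculated from the percentages above):

```latex
\[
\text{proportion explained by component } k \;=\; \frac{\lambda_k}{p},
\qquad
p = 48 \;\Rightarrow\; \frac{1}{48} \approx 2.1\%\ \text{per food group}.
\]
\[
\lambda_1 \approx 0.076 \times 48 \approx 3.6
\quad\text{to}\quad
0.082 \times 48 \approx 3.9 .
\]
```

On this reading, the prudent component carries as much variation as between three and four of the 48 standardized food groups, and the same eigenvalue would represent a larger percentage in an analysis with fewer variables.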
We have used dietary patterns in the SWS to address the question of whether natural or applied scores are preferable to assess tracking of individual diet over time. Three problems are apparent with natural scores. Firstly, since they are generated with a mean of zero, an average change in diet would not be apparent; for example, if diets on average became less prudent in early pregnancy, the mean natural score would still be zero. Secondly, it is common for dietary patterns to be calculated on differing numbers of subjects in longitudinal studies, for example where attrition has occurred over time. In this case any apparent change in natural dietary scores could simply be due to the characteristics of subjects with data at both time points, as demonstrated by the changes in natural scores (Table 6), and thus be an artifact of the sub-sample on which differences can be calculated, rather than illustrating true change. Thirdly, although dietary patterns tended to be replicated across time points within the SWS, there is inevitably some variation; changes in natural scores therefore reflect both changes in diet and subtle variations in the patterns. With applied scores, by contrast, any change in score is due solely to changes in the participants’ diets, because the scale of measurement (the dietary pattern) is kept constant.
For these reasons, applied scores are preferred to natural scores. Another study (13) inferred that natural scores are more appropriate, but the FFQ in this study differed somewhat at the second time point, causing difficulties with implementing applied scores, whereas in the SWS the FFQs were identical. If FFQs did change substantially over time within a study then it may not be possible to calculate applied scores, and natural scores would have to be used. A pertinent example might be when different FFQs are used through infancy and childhood because it is impossible for one tool to be appropriate at all ages.
A further advantage of natural scores cited by Northstone and Emmett (13) is that their use enables researchers to identify new patterns at follow-up. We therefore suggest that an important step in exploratory work is to calculate natural as well as applied scores to ensure that dietary patterns used for applied scores are relevant to the follow-up time point.
There was moderate tracking in dietary scores from before pregnancy into pregnancy. Most women’s prudent diet scores in pregnancy were within −1.44 and 1.39 standard deviations of their score before pregnancy. There was very slightly lower tracking of the high-energy diet score, which in pregnancy was mainly between −1.60 and 1.69 standard deviations of their score before pregnancy. We found that women’s applied prudent diet scores did not increase in pregnancy compared to before pregnancy. In early pregnancy women’s prudent diet scores were on average 0.01 standard deviations lower than before pregnancy, and in late pregnancy they were 0.03 standard deviations lower. These changes reflect the differences seen in food consumption in pregnancy: there was decreased consumption of rice and pasta, vegetables and vegetable dishes, all of which were positively associated with the prudent diet score, alongside increases in consumption of foods that were negatively associated with the prudent diet scores including white bread, cakes and biscuits, red and processed meat, crisps, confectionery, full-fat spread and soft drinks. These influences were offset to a large extent by increases in consumption of breakfast cereals, fruit and fruit juices, dried fruit, and cooking fat and salad oils, that were positively associated with the prudent diet score, and decreases in intake of tea and coffee, which was negatively associated with the prudent diet score.
Women’s applied high-energy diet scores did not change between before pregnancy and early pregnancy, but were on average 0.07 standard deviations higher in late pregnancy than before pregnancy. This change reflects increases in consumption of foods in late pregnancy that were positively associated with the high-energy diet score, such as cakes and biscuits, processed meat, crisps, fruit, sweet spreads, puddings, cream, full-fat milk, cheese, full-fat spread and soft drinks.
Of the 48 foods and food groups studied, the intake of 21 increased in pregnancy, and 10 decreased. Few studies have been able to collect dietary data prospectively before and during pregnancy. However, Brown and Kahn (33) describe data from 550 US women whose dietary intake reported before pregnancy was compared with that at four time points during pregnancy. Whilst food consumption data were not provided, there was a noticeable increase in energy intake in pregnancy, a pattern that is consistent with the broad picture of increases in consumption in pregnancy in the SWS. Rifas-Shiman et al. (34) describe changes in the diets of American women from the 1st to 2nd trimester of pregnancy. Although their time points do not match directly with those in the SWS, their observations of increases in dairy foods, and red and processed meat through pregnancy are consistent with the changes seen in the SWS.
Adequate nutrition during pregnancy is important for the health of both the mother and her child (35). Prudent diet scores fell very slightly from before pregnancy into pregnancy in the SWS, and it is concerning that women were not able to improve their diet in pregnancy. This small change is likely to be an effect of pregnancy itself rather than of repeating the questionnaire; we have previously reported (14) that dietary patterns are reasonably stable in a subset of 94 SWS women who did not become pregnant but who were re-interviewed two years after their initial interview, and if anything women’s applied prudent diet scores increased (by 0.13 standard deviations).
Women are able to respond to dietary public health messages in pregnancy (23) as demonstrated by the reductions in liver and kidney, and caffeinated drink intake in pregnancy. However, the overall quality of the diet, as measured by the prudent diet score, has not improved in the SWS. Appropriate nutrition during pregnancy is an important public health issue, and therefore interventions to improve dietary quality may need to take into account reasons for changes in diet such as nausea and changes in appetite.
Principal component analysis of data from food frequency questionnaires before, in early and late pregnancy revealed prudent and high-energy dietary patterns. Applied dietary scores were preferred to natural dietary scores as a means of analyzing changes over time. There were very small decreases in applied prudent diet scores in pregnancy compared to before pregnancy, and a small increase in applied high-energy diet scores in late pregnancy, indicating little overall change. It is of concern that women were not able to improve their overall diets in pregnancy.
Contributors: SRC performed the statistical analysis. SMR, HMI, KMG and CC designed the research. All authors wrote the paper. SRC had primary responsibility for final content. All authors read and approved the final manuscript.
Sources of funding: This work was supported by the Medical Research Council; the University of Southampton; and the Dunhill Medical Trust.
Supplemental Figure 1 is available as Online Supporting Material with the online posting of this paper at http://jn.nutrition.org
Conflict of interest or funding disclosure: None