Table provides the average and standard deviation of African, European and Native American ancestry for the wives and husbands, stratified by ethnicity and recruitment site. While both Mexicans and Puerto Ricans have ancestry from all three populations, the Mexicans have predominantly European and Native American ancestry and only modest African ancestry, while the Puerto Ricans, who also have substantial European ancestry, have greater African ancestry and far less Native American ancestry. Indeed, these studies (and prior ones) indicate that there is only modest overlap in the ancestry distributions for Mexicans and Puerto Ricans (Figure ). The overlap exists where Native American ancestry ranges from 0.1 to 0.3 and African ancestry from 0 to 0.2. This area of overlap is of particular interest, because it describes individuals who are matched in terms of ancestry but discordant in terms of nationality/ethnicity and culture.
Mean (standard deviation) ancestries for Latino spouses by recruitment site
African versus Native American ancestry in Mexicans and Puerto Ricans.
In Mexicans, the predominance of Native American and European ancestry is also reflected in the variances of the three ancestries: the standard deviation for Native American and European ancestry is large at approximately 0.16, while for African ancestry the standard deviation is much smaller at approximately 0.04. By contrast, in Puerto Ricans, where European and African ancestry are dominant, the variances of African and European ancestry are large (standard deviations approximately 0.14) and the variance of Native American ancestry is smaller (standard deviation 0.065). These variances also have implications for correlations in ancestry within individuals. As expected (Table S1 in Additional data file 1), the correlation between Native American and European ancestry in Mexicans is extremely strong (-0.97). There is also a moderately negative correlation observed between African and Native American ancestry (-0.28). In Puerto Ricans, the correlation between African and European ancestry is strong (-0.89). Because European is the predominant ancestry in the Puerto Ricans, there is also a moderate negative correlation between European and Native American ancestry (-0.35).
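The link between the variances and the within-individual correlations can be checked with a short calculation. The sketch below assumes only that the three ancestry proportions of each individual sum to 1, so that the variance of one component equals the variance of the sum of the other two. The function name is illustrative and the standard deviations are the rounded values quoted in the text, so the results are approximate.

```python
def implied_correlation(sd_x, sd_y, sd_z):
    """Correlation between components x and y when x + y + z = 1 for everyone.

    Uses Var(z) = Var(x + y) = Var(x) + Var(y) + 2 * Cov(x, y).
    """
    cov_xy = (sd_z ** 2 - sd_x ** 2 - sd_y ** 2) / 2
    return cov_xy / (sd_x * sd_y)


# Mexicans: Native American vs European (SD ~0.16 each), African SD ~0.04
mex = implied_correlation(0.16, 0.16, 0.04)
# Puerto Ricans: African vs European (SD ~0.14 each), Native American SD 0.065
pr = implied_correlation(0.14, 0.14, 0.065)

print(round(mex, 2), round(pr, 2))
```

With these rounded inputs the implied correlations come out close to the reported values of -0.97 and -0.89, which suggests that the strong negative correlations largely follow from the small variance of the third component.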
Results of t-tests comparing average ancestries between spouses, and recruitment site within ethnic group, are given in Table S2 in Additional data file 1. As is apparent in Table , there are no significant differences in ancestry between the wives and husbands within any category. There are also no significant differences between the Puerto Ricans recruited from Puerto Rico and those recruited from New York. However, there are substantial ancestry differences between the Mexicans from Mexico City and those from the Bay Area, reflecting a migrant effect. The Bay Area Mexicans have significantly more European and African ancestry and less Native American ancestry compared to the Mexicans from Mexico City (Table S2 in Additional data file 1). This difference may reflect specific geographical or socioeconomic origins of the Mexican migrants to the Bay Area.
To examine a possible role of socioeconomic status (SES) in further analyses of these subjects, we examined average ancestries within SES categories for the subset of subjects on whom we had such information (Table S3 in Additional data file 1). Linear regression analysis of ancestry on SES (coded as 1 for low, 2 for moderate, 3 for middle and 4 for upper) was also performed separately for the sexes and ethnicities. There was a non-significant trend towards increased European and decreased Native American ancestry with SES among the Mexican wives but not the husbands. However, there was a significant positive relationship of African ancestry with SES and a significant negative relationship of European ancestry with SES among the Puerto Rican wives. SES trends were less clear among the Puerto Rican husbands. We note that because SES was measured from census-based location information rather than personal information, there may be a loss of sensitivity in these results.
We next examined the between-spouse correlations in ancestry (Table ). Among the Mexicans, the spouse correlation in European ancestry is extremely high and statistically significant; Native American ancestry shows a similar pattern. By contrast, there is no significant spouse correlation for the African component of ancestry. The correlations for the Mexicans combining the two recruitment sites are confounded by the difference in average ancestries we noted above. However, within site, the spouse correlations for European and Native American ancestry are still high (0.56 to 0.57 for European or Native American ancestry in Mexicans from Mexico City and 0.39 to 0.42 in Mexicans from the Bay Area). Figure depicts the spouse similarity for the three different ancestry components for the two Mexican recruitment sites. Of note, the higher spouse correlation among pairs from Mexico City is due entirely to four couples with particularly high European and low Native American ancestry. Nonetheless, the data show that the spouse ancestry correlation is robust and replicated across the two recruitment sites.
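For readers who want to attach an interval to a spouse correlation, one common approach is the Fisher z-transformation. The sketch below is an assumption about how such an interval could be obtained; this section does not state which method produced the published intervals, and the number of couples used in the example is hypothetical.

```python
import math


def fisher_ci(r, n_pairs, z_crit=1.96):
    """Approximate 95% confidence interval for a Pearson correlation.

    r        : observed correlation between wife and husband ancestry
    n_pairs  : number of spouse pairs (must exceed 3)
    z_crit   : normal critical value (1.96 for a two-sided 95% interval)
    """
    z = math.atanh(r)                  # Fisher transformation
    se = 1 / math.sqrt(n_pairs - 3)    # approximate standard error on the z scale
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)


# Hypothetical sample size; the correlation is one of the within-site values above.
low, high = fisher_ci(0.56, n_pairs=100)
print(round(low, 2), round(high, 2))
```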
Between spouse correlations (95% confidence interval) in ancestry, by ethnicity, recruitment site and socioeconomic status
Correlation in individual ancestry for Mexican spouses. Correlation in individual ancestry (IA) for Mexican spouses from (a) San Francisco Bay Area and (b) Mexico City. AF, African; Eu, European; NA, Native American.
Within the Puerto Rican spouse pairs, the correlations are high and significant for both European and African ancestry, but not for Native American ancestry. In this case, there are no significant differences in ancestry correlations between the couples from Puerto Rico versus those from New York City. We also note that the spouse correlation in African ancestry (0.33) is somewhat higher than the correlation in European ancestry (0.24), although the difference is not statistically significant. Figure depicts the spouse similarity for Puerto Ricans; the ancestry correlations for Puerto Rican pairs from the two recruitment sites appear quite similar.
Correlation in individual ancestry for Puerto Rican spouses. Correlation in individual ancestry (IA) for Puerto Rican spouses from (a) New York City and (b) Puerto Rico. AF, African; Eu, European; NA, Native American.
An important question is the source of the ancestry correlation between spouses. One possible factor is SES. Therefore, for the Mexicans from the Bay Area and the Puerto Ricans from Puerto Rico, for whom we had such information, we also examined spouse correlations within SES categories (Table ). The spouse correlations in ancestry persisted within SES categories in both Mexicans and Puerto Ricans, and there was no apparent pattern of increase or decline with SES. As an additional evaluation of the impact of SES, we performed a linear regression analysis, with the wife's individual ancestry (IA) as the dependent variable and the husband's IA and SES as the independent variables. These analyses were performed separately for each of the three ancestry components (Table S4 in Additional data file 1). Here again, we find no attenuation of the significant spouse relationship in European or Native American ancestry in the Mexicans when allowing for SES in the regression model. Similarly, we find no attenuation of the African or European ancestry spouse correlation in the Puerto Ricans when including SES in the regression model. SES was not a significant predictor of the wife's ancestry in any of the analyses of Mexicans; however, as noted previously, there was a significant positive regression of African ancestry on SES and a significant negative regression of European ancestry on SES among the Puerto Rican wives.
We next evaluated the impact of assortative mating on genotype distributions at individual loci. First, we noted no significant differences in allele frequencies between spouses within recruitment sites, either for the Mexicans or Puerto Ricans (Table S5 in Additional data file 1). However, we did find a large excess of significant allele frequency differences between the Mexican and US recruitment sites for the Mexicans (69% of loci significant at P < 0.05). This pattern is consistent with what we previously observed for site-specific ancestry differences for the Mexicans. To determine whether the Mexico City versus Bay Area allele frequency differences were entirely attributable to the ancestry difference between the two sites, we performed a regression analysis of the allele frequency difference chi-square on $\delta_{ij}^2/(p^*q^*)$, where $\delta_{ij}$ represents the allele frequency difference between ancestral populations i and j, $p^*$ is the allele frequency in the admixed population, and $q^* = 1 - p^*$ (see Materials and methods). The results are given in Table S6 in Additional data file 1. We observed a highly significant regression coefficient for the European-Native American δ (0.0339 ± 0.0037), while neither of the other coefficients was statistically significant, nor was the intercept significantly different from 1. Similarly, in an analysis where the intercept term was fixed at 1, the regression coefficients were very close to the unconstrained analysis. Thus, the entire excess of significant allele frequency differences between Mexico City and Bay Area can be attributed to the European-Native American δ values at the markers, consistent with the European/Native American ancestry difference between the two sites being the source of site allele frequency differences. As described in Materials and methods, the pairwise sums of regression coefficients provide estimates of the squared difference in ancestry between the two sites.
From the regression coefficients in Table S6 in Additional data file 1, we estimate the following ancestry differences between Mexico City and the Bay Area: Native American, √(0.0315 + 0.0025) = 0.184; European, √(0.0315 - 0.0018) = -0.172; African, √(0.0025 - 0.0018) = -0.026. From Table , the corresponding numbers are 0.184, -0.160 and -0.024, respectively. Thus, the regression results agree remarkably well with the observed site ancestry differences.
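The arithmetic in this paragraph can be reproduced directly. The sketch below uses the three coefficient values exactly as they appear in the sums above, with the sign of each coefficient read off from those sums (an assumption about the layout of Table S6). Note that a square root yields only the magnitude of each difference; the negative signs reported for European and African ancestry reflect the direction of the site difference and are not produced by this calculation.

```python
import math

# One regression coefficient per pair of ancestral populations,
# as used in the sums quoted in the text.
coef = {"EU-NA": 0.0315, "AF-NA": 0.0025, "AF-EU": -0.0018}

# Each ancestry's squared site difference is the sum of the two
# coefficients whose pair involves that ancestry.
squared_diff = {
    "Native American": coef["EU-NA"] + coef["AF-NA"],
    "European": coef["EU-NA"] + coef["AF-EU"],
    "African": coef["AF-NA"] + coef["AF-EU"],
}

magnitude = {name: math.sqrt(value) for name, value in squared_diff.items()}

for name, value in magnitude.items():
    print(f"{name}: {value:.3f}")
```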
To explore the effect of assortative mating on individual loci, we calculated F values, both for the spouses themselves (within individual correlation) and between spouses (between spouse correlation), as described in Materials and methods. The value F1 represents the within spouse allelic correlation, which is derived from the excess of homozygosity among the spouses. The value F2 represents the between spouse allelic correlation obtained by sampling one allele from each parent at random, which is also an estimate of the expected value of F1 for the children of these spouse pairs (see Materials and methods). Thus, the two values of F allow us to compare the effect of assortative mating across two generations.
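To make the two quantities concrete, here is a minimal sketch for a single biallelic marker with genotypes coded as 0, 1 or 2 copies of a reference allele. The estimators shown (one minus the ratio of observed to expected heterozygosity) are a common textbook form and are an assumption here; the exact definitions are those in Materials and methods, which are not reproduced in this section. The function names are illustrative, and the marker is assumed to be polymorphic so that the denominator is non-zero.

```python
def allele_freq(genotypes):
    """Frequency of the reference allele from genotype counts (0, 1, 2)."""
    return sum(genotypes) / (2 * len(genotypes))


def f_within(genotypes):
    """F1-style value: deficit of heterozygotes among the individuals themselves."""
    p = allele_freq(genotypes)
    het_obs = sum(1 for g in genotypes if g == 1) / len(genotypes)
    return 1 - het_obs / (2 * p * (1 - p))


def f_between(wives, husbands):
    """F2-style value: expected heterozygote deficit in offspring.

    One allele is drawn at random from each parent, so a child of a
    (w, h) couple is heterozygous with probability
    (w/2)(1 - h/2) + (1 - w/2)(h/2).
    """
    p = allele_freq(list(wives) + list(husbands))
    het_child = sum(
        (w / 2) * (1 - h / 2) + (1 - w / 2) * (h / 2)
        for w, h in zip(wives, husbands)
    ) / len(wives)
    return 1 - het_child / (2 * p * (1 - p))


# Toy genotypes (hypothetical): couples paired at random vs perfectly matched.
print(f_between([0, 2, 0, 2], [0, 0, 2, 2]))  # spouses unrelated at this marker
print(f_between([0, 0, 2, 2], [0, 0, 2, 2]))  # spouses always carry the same genotype
```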
The mean values of F1 and F2 are given in Table , stratified by ethnicity and recruitment site. The means of all F values are significantly greater than 0, although the largest values are observed for F2 in Mexicans and F1 in Puerto Ricans. For Mexicans, the overall F1 and F2 values appear reasonably consistent between generations (0.0161 for F1 and 0.0172 for F2). However, for Puerto Ricans, the overall F values appear higher within spouses (F1 of 0.0256) than between spouses (F2 of 0.0085). This may indicate a decrease in spouse correlation between the generations, but requires additional investigation.
Mean (standard error) values of allelic correlation within spouses (F1) and between spouses (F2)
We next undertook an analysis to determine the degree to which the significant F values could be attributed to ancestry assortative mating. We did so by linear regression, allowing the F value to be the dependent variable and three independent variables denoted as $\delta_{ij}^2/(p^*q^*)$, where the i, j subscripts refer to the three possible combinations of the ancestral African, European and Native American populations and $p^*$ is the allele frequency in the admixed population, with $q^* = 1 - p^*$ (see Materials and methods).
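As an illustration of the covariate and of the single-covariate version of this regression, the sketch below computes the standardized squared delta for a marker and fits an ordinary least-squares line of F on that covariate. It is a simplified stand-in rather than the analysis pipeline used here: it handles one covariate instead of three, the function names are illustrative, and the allele frequencies and F values are toy numbers chosen to lie exactly on a line.

```python
def standardized_delta_sq(p_i, p_j, p_admixed):
    """delta_ij^2 / (p* q*) for one marker.

    p_i, p_j   : allele frequencies in ancestral populations i and j
    p_admixed  : allele frequency p* in the admixed population
    """
    delta = p_i - p_j
    return delta ** 2 / (p_admixed * (1 - p_admixed))


def ols_simple(x, y):
    """Least-squares intercept and slope for y = intercept + slope * x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Toy markers (hypothetical frequencies): (p_i, p_j, p*) per marker.
markers = [(0.9, 0.1, 0.5), (0.6, 0.4, 0.5), (0.8, 0.3, 0.6), (0.5, 0.5, 0.5)]
x = [standardized_delta_sq(*m) for m in markers]

# Toy F values constructed as 0.0 + 0.01 * x, so the fit should return
# an intercept near 0 and a slope near 0.01.
f_values = [0.01 * xi for xi in x]
intercept, slope = ols_simple(x, f_values)
print(intercept, slope)
```

An intercept near 0 after fitting corresponds to the situation described in the text as the F distribution being fully explained by the covariate.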
Results are provided in Table (for F1) and Table (for F2). Among the Mexicans, it appears that the F1 values are fully explained by the standardized Native American-European squared delta values of the markers, which were significant for the Bay Area Mexicans and for both groups combined. In these analyses, the intercept term was not different from 0, indicating that the F1 distribution was fully explained by the covariate. In the analysis of F2, the results were not as clear cut, although again it appears that the Native American-European delta values explain much of the excess. In the analysis including all three delta terms, none were significant in any of the analyses, although the coefficients for the Native American-European delta tended to be largest. However, in analyses including only the Native American-European delta term, this covariate was significant in the analysis of the Bay Area Mexicans and both sites combined. In the final analysis of both groups combined, the intercept term is largely diminished, although still marginally significantly greater than 0.
Regressions of F1 on $\delta^2/(p^*q^*)$
Regressions of F2 on $\delta^2/(p^*q^*)$
Regression analyses on Puerto Rican F1 values yielded less clear-cut results. As expected, the largest regression coefficients were for African-European delta terms, although none were formally significant, in the analyses of single sites or for the two sites combined. Also, it appears that the ancestral deltas do not fully explain the excess of homozygosity at these markers. As seen in Tables and , the F2 values were not as extreme as the F1 values, and none of the regression coefficients were significant, although again the largest regression coefficient tended to be for African-European delta terms. After regression, there was no significant intercept term remaining.
As described in Materials and methods, the pairwise sums of regression coefficients provide estimates of the three spouse covariances in ancestry. For the Mexicans we analyzed the two recruitment sites separately, to avoid inflation of spouse covariance due to average ancestry differences between sites. From Table , for the regression analysis on F1 we estimate the following ancestry covariances for Mexico City: Native American, 0.0125 + 0.0054 = 0.0179; European, 0.0125 - 0.0047 = 0.0078; African, 0.0054 - 0.0047 = 0.0007. For the regression analysis on F2, the corresponding covariance estimates are: Native American, 0.0141 + 0.0034 = 0.0175; European, 0.0141 - 0.0028 = 0.0113; African, 0.0034 - 0.0028 = 0.0006. The corresponding observed spouse covariances in ancestry derived from Tables and for Mexico City are: Native American, 0.0190; European, 0.0168; African, -0.0001. Thus, the regression-based estimates for Native American ancestry spouse covariance are quite close to the observed, but the regression-based estimate for European ancestry covariance is somewhat below the observed. For the Bay Area Mexicans, the regression-based covariance estimates for F1 are: Native American, 0.0168 + 0.0033 = 0.0201; European, 0.0168 - 0.0038 = 0.0130; African, 0.0033 - 0.0038 = -0.0005. For the corresponding regression analysis on F2, we estimate: Native American, 0.0135 - 0.0011 = 0.0124; European, 0.0135 + 0.0004 = 0.0139; African, 0.0004 - 0.0011 = -0.0007. The corresponding observed spouse covariances for Bay Area Mexicans are: Native American, 0.0083; European, 0.0093; African, 0. Here the regression-based estimates appear to somewhat overestimate the actual covariances for Native American and European ancestry. All analyses regarding covariances for African ancestry are consistent in showing no evidence of correlation.
We repeated the same analysis in the Puerto Ricans, but for the two recruitment sites combined. From Table , for the regression analysis on F1 we estimated the following ancestry covariances: African, 0.0131 - 0.0006 = 0.0125; European, 0.0131 + 0.0064 = 0.0195; Native American, 0.0064 - 0.0006 = 0.0058. For the regression analysis on F2, the corresponding covariance estimates are: African, 0.0028 + 0.0024 = 0.0052; European, 0.0028 - 0.0002 = 0.0026; Native American, 0.0024 - 0.0002 = 0.0022. The corresponding observed spouse covariances in ancestry from Tables and for Puerto Ricans are: African, 0.0059; European, 0.0048; Native American, 0. The F2 regression-based estimates of spouse covariance for African and European ancestry are comparable to the observed values (with a somewhat underestimated European ancestry covariance), while the F1 regression-based estimates are higher. This suggests (as does the overall higher mean value for F1 than F2) that the assortative mating in Puerto Ricans was stronger in the prior generation than in the current one.
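The pairwise-sum rule used for both the Mexican and Puerto Rican estimates can be written as a small helper. In the sketch below the coefficient values are those appearing in the sums quoted in the text for the F1 regressions, with signs read off from those sums; assigning each value to a particular ancestral pair is an inference from which two coefficients are added for each ancestry.

```python
def ancestry_covariances(b_af_eu, b_af_na, b_eu_na):
    """Spouse covariance per ancestry as the sum of the two regression
    coefficients whose ancestral pair involves that ancestry."""
    return {
        "African": b_af_eu + b_af_na,
        "European": b_af_eu + b_eu_na,
        "Native American": b_af_na + b_eu_na,
    }


# F1 regressions, coefficients as quoted in the text.
mexico_city_f1 = ancestry_covariances(b_af_eu=-0.0047, b_af_na=0.0054, b_eu_na=0.0125)
puerto_rican_f1 = ancestry_covariances(b_af_eu=0.0131, b_af_na=-0.0006, b_eu_na=0.0064)

for label, result in [("Mexico City", mexico_city_f1), ("Puerto Ricans", puerto_rican_f1)]:
    print(label, {name: round(value, 4) for name, value in result.items()})
```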
To determine whether the excess average F1 and F2 values might be attributable to specific genomic locations, we created a Q-Q (quantile-quantile) plot of regression residuals against a normal distribution (Figure S1a for Mexicans and S1b for Puerto Ricans in Additional data file 2). In both figures the observed distributions match closely to the expected. Hence, the homozygote excess appears to be a global phenomenon.
Results of the inter-locus (LD) analysis were strikingly different from the single locus analyses. A clear excess of significant chi-square tests was observed in each ethnic group and recruitment site (Table ). Approximately 15% of tests were found to be significant at the 5% level of significance. Regression analyses of the standardized squared-delta products (for each of the two marker loci involved) were quite revealing (Table S7 in Additional data file 1). For the Mexicans, the European-Native American standardized delta products were extremely predictive of the chi-square, in contrast to the two other delta product covariates. After regression, the intercept terms were greatly attenuated from the corresponding mean chi-squares in Table , although still significantly greater than 1. The Puerto Ricans showed a similar pattern, except that the highly significant covariate term in this case was for the African-European squared delta product term (Table S7 in Additional data file 1). As for the Mexicans, the intercept terms were greatly diminished from the corresponding mean values in Table , although still somewhat greater than 1. These results show that the primary driver of LD between unlinked loci in these populations is ancestral delta values - between Europeans and Native Americans for the Mexicans, and between Africans and Europeans for the Puerto Ricans.
Chi-square tests of linkage disequilibrium between pairs of markers for spouses combined
To search for possible regions with excess LD, we performed another regression analysis, this time on the LD parameter D as a function of the unstandardized delta products (Table ). As seen previously for the regression analysis of chi-square, the European-Native American deltas were highly significant for the Mexicans, while the African-European deltas were highly predictive for the Puerto Ricans. We then examined the distribution of residuals from the regression by creating a Q-Q plot against a normal distribution (Figure S2 in Additional data file 2). While the overall fit to a normal distribution appears good for both the Mexicans and Puerto Ricans, there do appear to be a few possible outlier points on both ends. The marker pairs involved in the most extreme points (with Z scores greater than +4 or less than -4) are given in Table S8 in Additional data file 1. The most extreme point occurred in Mexicans (Z = +5.09) for markers on chromosomes 2p and 3p. We note that the same pair of markers gave a Z score of +1.10 in the Puerto Ricans. The marker pair on chromosomes 1p and 2q, which gave a Z score of -4.08 in Mexicans, also had a nominally significant Z score in Puerto Ricans (-2.40), while the pair on chromosomes 1p and 17p (Z score of -4.09 in Mexicans) also had a nominally significant Z score in Puerto Ricans, but in the opposite direction (Z = +2.42).
Regression of linkage disequilibrium parameter D on $\delta_1\delta_2$
We next projected the reduction in ancestry variance over time (see Materials and methods). The results are shown in Figure , where we have plotted the proportion of original variance, $V_t/V_0$, against generation. For a constant spouse correlation over time, the variance decreases most rapidly, and is around 10% of its original value after just five generations (for c = 0.3, corresponding to Puerto Ricans) or seven generations (for c = 0.4, corresponding to Mexicans). By contrast, for the linear model ($c = 1 - at$) and the exponential model ($c = e^{-bt}$), the rate of decline of V is slower; a reduction to 10% of the original value occurs between 10 and 13 generations, depending on the model parameters.
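The qualitative behaviour of the three models can be sketched with a simple recursion. The code below assumes that a child's ancestry is the average of the two parents' ancestries, which gives a per-generation update of V multiplied by (1 + c)/2, where c is the spouse correlation in that generation. This recursion is an assumption made for illustration; the formula actually used is the one in Materials and methods. The parameter values a = 0.05 and b = 0.1 are hypothetical.

```python
import math


def variance_ratio(corr_at, generations):
    """Return [V_0/V_0, V_1/V_0, ..., V_T/V_0] under midparent inheritance.

    corr_at      : function mapping generation t to the spouse correlation c_t
    generations  : number of generations T to project
    """
    ratio = 1.0
    path = [ratio]
    for t in range(generations):
        ratio *= (1 + corr_at(t)) / 2   # Var((x + y) / 2) = V * (1 + c) / 2
        path.append(ratio)
    return path


constant_03 = variance_ratio(lambda t: 0.3, 7)
constant_04 = variance_ratio(lambda t: 0.4, 7)
linear = variance_ratio(lambda t: max(0.0, 1 - 0.05 * t), 15)   # c = 1 - at
exponential = variance_ratio(lambda t: math.exp(-0.1 * t), 15)  # c = e^(-bt)

print(round(constant_03[5], 3), round(constant_04[7], 3))
```

Under this assumed recursion the constant-correlation curves fall to roughly a tenth of the starting variance after five generations for c = 0.3 and seven for c = 0.4, in line with the figures quoted in the text, while the linear and exponential models decay more slowly because they start from a spouse correlation of 1.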
Decay in ancestry variance over time for three spouse correlation models.
To determine the compatibility of the curves in Figure with our own data, we calculated $V_t/V_0$ and $r_t$ for the current generation of spouses. From the means ($\alpha$) and standard deviations ($\sqrt{V}$) in Table , we derived values of $V_t/V_0$ of approximately 0.11 for European and Native American ancestry in Mexicans and 0.08 for African and European ancestry in Puerto Ricans. By contrast, the proportion of original variance for African ancestry in Mexicans is only 0.02, and for Native American ancestry in Puerto Ricans the value is 0.03. These lower values are consistent with the more modest spouse correlations observed for these ancestry components. All these variance ratios may be slightly inflated due to statistical noise in ancestry estimation. Because there was no correlation of African ancestry in the Mexican spouses, we assumed that the variance observed for African ancestry (0.0016) was primarily due to estimation error, since the actual variance would have decreased rapidly by this point in time. Adjusting the values of $V_t/V_0$ given above for this amount of error variance (an upper bound) reduced the ratios to 0.10 for European and Native American ancestry in Mexicans, and 0.07 for African and European ancestry in Puerto Ricans.
To estimate $r_t$, we need to project the value of the LD parameter D to marker loci that are completely informative for ancestry (that is, allele frequency of 1 in one ancestral population and 0 in the other), which corresponds to $\delta$ values of 1 for both markers. From the regression results presented in Table , we can estimate D for $\delta = 1$ by simply using the regression coefficient of $\delta_1\delta_2$. For Mexicans combined, D = 0.0402. To obtain the value of $r_t$, we then need to divide D by $\alpha(1 - \alpha)$, because $\alpha$ and $1 - \alpha$ correspond to the allele frequencies for a marker that is completely informative for ancestry ($\delta = 1$). Using the mean ancestry values of Table as $\alpha$, we derive an approximate $r_t$ value of 0.16. For Puerto Ricans, the value of D is 0.0283; dividing by $\alpha(1 - \alpha)$, we obtain a value of 0.12. We can rearrange the formula for $V_t$ given in Materials and methods to $V_t/V_0 = r_t/(2 - c_t)$ and $c_t = 2 - r_t/(V_t/V_0)$. Using the values above for $V_t/V_0$, for Mexicans we obtain $c_t$ = 2 - 0.16/0.10 = 0.40; for Puerto Ricans we obtain $c_t$ = 2 - 0.12/0.07 = 0.29. These values are close to the observed spouse correlations in ancestry in Table . Referring back to Figure , we see that our results are consistent with a model of decreasing spouse ancestry correlation over a period of about 9 to 13 generations for Mexicans and 10 to 14 generations for Puerto Ricans. The same formulas given above can also be adapted for linked markers [26]. The assortative mating we observed is expected to enhance the LD between linked markers to an even greater extent than for unlinked markers.
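The two steps of this calculation, scaling D to obtain the LD correlation and then solving for the spouse correlation, can be checked numerically. In the sketch below the function names are illustrative. The value α = 0.5 in the first call is a hypothetical round number used only to show the scaling, since the mean ancestries from the table are not reproduced in this section; the second step uses the rounded values quoted in the text.

```python
def ld_correlation(d, alpha):
    """r_t for fully informative markers: D divided by alpha * (1 - alpha)."""
    return d / (alpha * (1 - alpha))


def spouse_correlation(r_t, variance_ratio):
    """c_t = 2 - r_t / (V_t / V_0), the rearranged relation used in the text."""
    return 2 - r_t / variance_ratio


# Step 1 (illustrative alpha): D = 0.0402 for Mexicans combined.
print(round(ld_correlation(0.0402, alpha=0.5), 2))

# Step 2: rounded r_t and adjusted V_t/V_0 values from the text.
c_mexican = spouse_correlation(r_t=0.16, variance_ratio=0.10)
c_puerto_rican = spouse_correlation(r_t=0.12, variance_ratio=0.07)
print(round(c_mexican, 2), round(c_puerto_rican, 2))
```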