Statistical methods

Raw continuous data were used for height and body mass index. Smoking status, church attendance and political affiliation were analyzed as raw ordinal measures with two, six and five categories, respectively.

Structural modeling of the data was undertaken using methods described in Keller et al. (2009) and based on Eaves et al. (1999) and Truett et al. (1994). These methods assess the contributions of additive and dominant genetic effects in the presence of effects such as vertical cultural inheritance, phenotypic assortative mating or social homogamy, shared twin and sibling environments, and within-family environment. Phenotypic assortment occurs when mate selection is based at least partly on the trait being studied, and is evidenced by a correlation between the observed phenotypes of spouses. Such a correlation may also result from a shared social background (social homogamy), which can be modeled as an alternative mechanism. Vertical cultural inheritance is the transmission of non-genetic information from parent to child, and refers to the environmental effects that parents create for their children on the basis of their own phenotype. The models of assortment and cultural transmission tested here represent some of the possible mechanisms for family resemblance (Cloninger et al., 1979; Fulker, 1988; Heath & Eaves, 1985). Between-family environmental effects make family members relatively more similar, whereas sibling environments are those environmental factors shared by all types of offspring. A special twin environment is an additional correlation between the environments of twins (over and above the sibling environment) that makes both MZ and DZ twins more alike than ordinary siblings, even in the absence of genetic effects (Neale & Cardon, 1992). Although all these sources of common environment contribute to variation among individuals regardless of relationship, they differ in their effects on the covariation between different types of relatives. The contributions of genetic and environmental factors may depend on an individual’s sex, both in magnitude and in nature.
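To make the distinction between sibling and special twin environments concrete, the sketch below computes expected MZ, DZ and sibling correlations for a deliberately simplified univariate model: a standardized phenotype, random mating, no cultural transmission and no sex differences. The variance proportions `a2`, `d2`, `s2` and `t2` (additive, dominance, sibling environment, special twin environment) are illustrative names, not parameters of the Mx script.

```python
def expected_correlations(a2, d2, s2, t2):
    """Expected MZ, DZ and ordinary-sibling correlations in a simplified model.

    a2: proportion of variance due to additive genetic effects
    d2: proportion due to genetic dominance
    s2: proportion due to the sibling environment (shared by all offspring)
    t2: proportion due to the special twin environment (shared by twins only)
    """
    r_mz = a2 + d2 + s2 + t2               # MZ twins share all genes and both environments
    r_dz = 0.5 * a2 + 0.25 * d2 + s2 + t2  # DZ twins: half of A, a quarter of D
    r_sib = 0.5 * a2 + 0.25 * d2 + s2      # siblings: as DZ twins, but without t2
    return {"MZ": r_mz, "DZ": r_dz, "sib": r_sib}


# With no genetic effects, a special twin environment still makes MZ and DZ
# twins equally similar, and more similar than ordinary siblings.
print(expected_correlations(a2=0.0, d2=0.0, s2=0.1, t2=0.2))
```

In this simplified setting an MZ = DZ > sibling pattern points to a twin-specific environment rather than to genetic effects, which is why sibling data help separate the two.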

A FORTRAN program, ‘Famfit’, was originally written by Lindon Eaves to fit an extended twin kinship model to correlations of twins and their first-degree and collateral relatives, including parents, siblings, spouses and children. A mathematically equivalent version of the model was implemented in Mx (Maes et al., 1999) to (i) fit models directly to the raw data, obtaining maximum likelihood estimates of the model parameters with appropriate confidence intervals (Neale & Miller, 1997) and handling missing data (Little & Rubin, 1987), (ii) analyze multiple variables simultaneously using the rules of multivariate path analysis (Vogler, 1985), and (iii) make the model easier to develop and modify for other pedigree structures and other models of familial resemblance. Accommodating alternative specifications of assortment required major changes to the Mx specifications, which led to a more concise script. In the new version, we also added various data-handling options that greatly increase the flexibility of the code, which can now be used to analyze data on any combination of relatives, including twins, parents, siblings, spouses and children of twins. Thus essentially any type of twin design, from the classical twin design and the nuclear twin family design to the extended twin (ET) and cascade designs, can be fitted with the same script. We hope that the script will thus become a starting point for further developments and improvements. To help with this goal, we describe here how the program is constructed.
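Fitting to raw data with missing observations is commonly done by full-information maximum likelihood: each family contributes the multivariate normal log-likelihood of only those relatives who were measured, using the matching rows and columns of the expected mean vector and covariance matrix. The sketch below assumes continuous data and uses NumPy; ordinal data would additionally require thresholds and integration over the multivariate normal, which is not shown. The function names are illustrative and not part of Mx.

```python
import math

import numpy as np


def record_loglik(x, mu, sigma):
    """Log-likelihood of one family record using only its observed entries."""
    x = np.asarray(x, dtype=float)
    mu = np.asarray(mu, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    obs = ~np.isnan(x)                     # which relatives were measured
    k = int(obs.sum())
    if k == 0:
        return 0.0                         # an empty record contributes nothing
    dev = x[obs] - mu[obs]
    sub = sigma[np.ix_(obs, obs)]          # expected covariance of the observed members
    sign, logdet = np.linalg.slogdet(sub)
    if sign <= 0:
        raise ValueError("expected covariance submatrix is not positive definite")
    quad = dev @ np.linalg.solve(sub, dev)
    return -0.5 * (k * math.log(2 * math.pi) + logdet + quad)


def total_loglik(records, mu, sigma):
    """Sum of record log-likelihoods over all families of one zygosity group."""
    return sum(record_loglik(r, mu, sigma) for r in records)


sigma = np.array([[1.0, 0.5], [0.5, 1.0]])
mu = np.zeros(2)
records = [[0.3, -0.1], [1.2, float("nan")]]   # second twin missing in family 2
print(-2 * total_loglik(records, mu, sigma))
```

An optimizer would minimize a quantity such as minus twice the total log-likelihood with respect to the model parameters that generate `mu` and `sigma`.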

The principles behind the Mx version, which is available at http://www.vcu.edu/mx, are simple. The full model is broken up into a number of building blocks that are precalculated in the top part of the program. These also include a set of constraints necessary to identify all the model parameters uniquely. The expectations for each of the existing relationships, including twins and their first-degree and collateral relatives, can then be formed by combining the building blocks in the appropriate way, which is done in a series of calculation groups. Further calculation groups combine the various relationships to construct the expected matrices for relatives for all five types of twin pair (MZM, DZM, MZF, DZF, DZO). The data groups then provide the observed data, together with the expected covariance matrices expressed in terms of the precalculated expectations. Finally, calculation groups are added to print the various parameter estimates and to derive components of variance. The full model allows for a complete treatment of sex differences, both in the magnitude and in the nature of the effects. Thus both the building blocks and the expectations for the relationships have to be specified for the four sex combinations (male-male, female-female, male-female and female-male).
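The step of combining precalculated relationship expectations into one expected covariance matrix per zygosity group can be pictured with the small sketch below. It assumes a single variable, so each pairwise covariance is a scalar and the matrix is symmetric by construction; with several variables the male-female block is the transpose of the female-male block rather than identical to it, which is one reason the expectations are specified for all four sex combinations. The member labels and numbers are hypothetical.

```python
import numpy as np


def assemble_covariance(members, variances, pair_cov):
    """Build a symmetric expected covariance matrix for one family type.

    members:   ordered labels, e.g. ["twin1", "twin2", "father", "mother"]
    variances: dict label -> expected phenotypic variance
    pair_cov:  dict (label_i, label_j) -> expected covariance of that pair
    """
    n = len(members)
    sigma = np.zeros((n, n))
    for i, mi in enumerate(members):
        sigma[i, i] = variances[mi]
        for j in range(i + 1, n):
            mj = members[j]
            # Look the pair up in either order; the univariate matrix is symmetric.
            c = pair_cov.get((mi, mj), pair_cov.get((mj, mi)))
            if c is None:
                raise KeyError(f"no expected covariance for {mi}-{mj}")
            sigma[i, j] = sigma[j, i] = c
    return sigma


members = ["twin1", "twin2", "father", "mother"]
variances = {m: 1.0 for m in members}
pair_cov = {  # hypothetical precalculated relationship expectations
    ("twin1", "twin2"): 0.5,
    ("twin1", "father"): 0.3, ("twin2", "father"): 0.3,
    ("twin1", "mother"): 0.3, ("twin2", "mother"): 0.3,
    ("father", "mother"): 0.2,
}
sigma = assemble_covariance(members, variances, pair_cov)
print(sigma)
```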

The Mx script starts with a number of ‘#define’ statements that control various parts of the job to be run. They are set up so that, to apply the model to a different data set, only a few parameters at the top of the script need to be changed, while the main part of the code remains unchanged. The choices to be made up front include (i) ordinal or continuous data, (ii) whether to compute confidence intervals, (iii) extensive or essential output, (iv) whether to report individual likelihood statistics, (v) whether to save the matrix of expected correlations, (vi) whether to model sex differences, (vii) the full ET design or a subdesign with a limited set of relatives, (viii) dominance versus shared environment in submodels that do not allow both to be estimated simultaneously, and (ix) phenotypic assortment or social homogamy. Variables declared with ‘#define’ are also used to provide filenames for the observed data and for saving various outputs, details of the variable(s) being analyzed and, for ordinal data, the thresholds, specifications for the variable means, start values and boundaries for the parameters, and the number of variables to be analyzed. Additional variables control which design is fitted to the data. Finally, each of the 64 groups is referred to by a name that is also declared with a ‘#define’ statement, making it easier to insert or delete groups without extensive renumbering.
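The idea of collecting all job-level switches at the top, so that only one place changes between analyses, can be mirrored outside Mx with a single validated options object. The sketch below is a hypothetical Python analogue of the nine choices listed above; the field names and design labels are assumptions, not Mx syntax or the actual `#define` names.

```python
from dataclasses import dataclass

# Hypothetical design labels for the designs named in the text.
DESIGNS = {"classical", "nuclear_twin_family", "ET", "cascade"}


@dataclass(frozen=True)
class JobOptions:
    """Hypothetical analogue of the job-control '#define' switches."""
    ordinal: bool = False                     # (i) ordinal rather than continuous data
    confidence_intervals: bool = False        # (ii)
    extensive_output: bool = False            # (iii) extensive vs essential output
    individual_likelihoods: bool = False      # (iv)
    save_expected_correlations: bool = False  # (v)
    sex_differences: bool = True              # (vi)
    design: str = "ET"                        # (vii) full ET design or a subdesign
    dominance: bool = False                   # (viii) D instead of C where both cannot be estimated
    assortment: str = "phenotypic"            # (ix) "phenotypic" or "social_homogamy"

    def __post_init__(self):
        # Fail early on a misspelled choice instead of deep inside the model code.
        if self.design not in DESIGNS:
            raise ValueError(f"unknown design: {self.design!r}")
        if self.assortment not in ("phenotypic", "social_homogamy"):
            raise ValueError(f"unknown assortment model: {self.assortment!r}")


# Only this line changes from one analysis to the next.
options = JobOptions(ordinal=True, design="nuclear_twin_family",
                     assortment="social_homogamy")
print(options)
```

Freezing the object keeps the switches from being altered part-way through a run, which is the same guarantee that fixed `#define` values give a script.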

Calculation groups are used to declare matrices for the additive genetic latent factors (both those common to both sexes and male-specific ones) and for the cultural transmission latent factors. These groups also calculate the covariance between an individual’s genotype and his/her phenotype, including paths through a correlated set of genes and through the genotype-environment (GE) covariance that results from the combined presence of genetic and cultural transmission. This GE covariance is one of the building blocks generated for each sex combination. An assortment path between spouses is specified, together with additional parameters for the additive genetic factors that allow assortment through the phenotype to be distinguished from social homogamy: the two sets of genetic paths are set equal to test for phenotypic assortment, or the second set of paths is set to zero for social homogamy. With these parameters declared, the covariance between the genotypes of siblings (MZ or DZ twins, or ordinary siblings) can be computed, which may include effects due to assortment. These covariances are then combined with the GE covariance paths and the covariance between the cultural transmission latent factors of siblings to form building blocks (ABC) for the sibling, avuncular and cousin relationships in each of the zygosities.
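The switch between phenotypic assortment and social homogamy can be illustrated with path tracing in a minimal univariate setting. The sketch assumes a phenotype P = hA + eE with Var(A) = 1, a copath mu between the spouses' mating phenotypes, and a second path from A to the mating phenotype that is either set equal to h (phenotypic assortment) or to zero (social homogamy, where mates are matched on a social background assumed uncorrelated with genotype). Cultural transmission and GE covariance, which the full model includes, are ignored, and the function names are hypothetical.

```python
def mating_path(h, mechanism):
    """Path from the additive genetic factor A to the mating phenotype."""
    if mechanism == "phenotypic":
        return h      # mates are chosen on P itself, so A reaches it through h
    if mechanism == "social_homogamy":
        return 0.0    # mates are chosen on social background, unrelated to A
    raise ValueError(f"unknown mechanism: {mechanism!r}")


def spousal_genotype_covariance(h, mu, mechanism):
    """cov(A_husband, A_wife) = h_mating * mu * h_mating by path tracing."""
    h_mating = mating_path(h, mechanism)
    return h_mating * mu * h_mating


for mechanism in ("phenotypic", "social_homogamy"):
    print(mechanism, spousal_genotype_covariance(h=0.7, mu=0.3, mechanism=mechanism))
```

In this simplified setting both mechanisms can produce the same spousal phenotypic correlation, but only phenotypic assortment induces a genetic correlation between spouses and therefore raises the genetic resemblance of their offspring, which is what allows data on the wider family to discriminate between the two.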

Matrices are also declared for the non-additive genetic latent factors; for the shared sibling, twin and unique environmental factors; and for the correlations between these factors across the sexes. These factors, together with the additive genetic ones (and the associated GE covariance), make up the phenotypic variances, which are set up as constrained parameters. Corresponding paths are set up to control which sources of variance contribute to assortment. Combining all sources of variance with their counterparts that control assortment then allows calculation of the covariance between a person’s actual phenotype P and the phenotype P̃ on which assortment is based. Finally, the parameters for cultural transmission and their covariances are declared in matrices.

Constraints to ensure equilibrium of genetic, environmental and GE covariances over consecutive generations are then set up. Three constraints are needed for the genetic latent factors, one for the common set of genes, one for the male-specific genes and one for the covariance between the common and male-specific genetic factors. There are also three constraints for the residual environmental covariance between male, female and opposite sex pairs. The covariances between genetic and environmental factors are also sex-specific and require four constraints.
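The logic of an equilibrium constraint can be seen in a single-trait sketch under assumptions much simpler than the full model: P = hA + eE, phenotypic assortment with copath mu, no cultural transmission, and an infinitesimal additive model in which the within-family segregation variance stays at half of the random-mating additive variance V0. Each generation the offspring variance is V' = 0.5V + 0.5cov(A_f, A_m) + 0.5V0, with cov(A_f, A_m) = (hV)mu(hV), so at equilibrium V = V0 + h²muV². The function iterates this recursion to its fixed point, whereas a script such as the one described would impose the equilibrium condition as a constraint; the names are illustrative.

```python
import math


def equilibrium_additive_variance(h, mu, v0, tol=1e-12, max_iter=10_000):
    """Iterate the additive genetic variance across generations to equilibrium.

    h:  path from the additive genetic factor A to the phenotype
    mu: assortment copath between the spouses' phenotypes
    v0: additive genetic variance under random mating
    """
    v = v0
    for _ in range(max_iter):
        spouse_cov = (h * v) * mu * (h * v)   # cov(A_father, A_mother) via the copath
        v_next = 0.5 * v + 0.5 * spouse_cov + 0.5 * v0
        if not math.isfinite(v_next):
            raise RuntimeError("variance diverges: assortment too strong for equilibrium")
        if abs(v_next - v) < tol:
            return v_next
        v = v_next
    raise RuntimeError("no convergence within max_iter generations")


v_eq = equilibrium_additive_variance(h=0.7, mu=0.2, v0=1.0)
# At equilibrium the constraint V = V0 + h^2 * mu * V^2 holds.
print(v_eq, 1.0 + 0.7 ** 2 * 0.2 * v_eq ** 2)
```

Positive assortment inflates the additive variance above V0; when 4h²muV0 exceeds 1 no equilibrium exists and the sketch reports divergence.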

Additional groups are used to create larger building blocks for across-generation relationships. The covariances between the parental phenotype and the additive genetic and cultural transmission latent factors of the children are precalculated, as are the covariances of these factors across generations. These blocks involve both direct genetic and cultural transmission paths from parent to offspring. Similarly, blocks for covariances due to genetic and cultural transmission that involve assortment are constructed and combined to generate (grand)parent-offspring, avuncular and cousin relationships.

The expectations for each of the 88 sex-specific relationships in the extended twin kinship design are then specified. In addition to the expected covariances between the actual phenotypes of the relatives involved, we also calculate expected covariances between the actual phenotype (P) of one relative and the ‘mating phenotype’ (P̃) of the other relative, referred to as PP̃ covariances, and between the mating phenotypes of both relatives (P̃P̃ covariances); these are used as parts of the covariances between more distant relatives. First, the twin covariances for the five zygosities are generated, followed by the corresponding PP̃ and P̃P̃ covariances. Second come the sibling covariances and their PP̃ covariances. The expected twin correlations use the blocks for ABC covariances across siblings; the latent factors representing genetic dominance, non-parental shared environment and special twin environment; and the correlations between these factors in males and females of opposite-sex twin pairs. The sibling expectations are similar to those for twins, except that they lack the special twin environment contribution.

The third group of first-degree relationships consists of parent-offspring pairs. The parent-offspring correlations are built from blocks representing the direct and indirect (through assortment) paths from the parental phenotype to the latent ABC factors of the children, and from the matrices defining the links between these latent factors and the phenotypes. The same building blocks, multiplied by additional blocks connecting the ABC factors across generations, are used to compute the expected grandparent-grandchild correlations. The Famfit program did not include expectations for these relationships because the number of observed pairs was relatively small in the VA 30,000 sample. However, when fitting to raw data, all possible relationships have to be specified explicitly. Given the assumption that the correlation between the twins and their parents is identical to the correlation between the twins and their children, the grandparent-grandchild correlations can be computed by combining the expected parent-offspring correlations in the appropriate way.
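What "combining in the appropriate way" means can be seen in the simplest case: a standardized phenotype with additive genetic path a, random mating, and no dominance, cultural transmission or assortment. Each generation step contributes one A-to-A transmission coefficient of 0.5, while the paths between phenotype and genotype appear only at the two ends of the chain, so the grandparent-grandchild covariance is not the product of two parent-offspring covariances. The function name is hypothetical.

```python
def lineal_covariance(a, generations):
    """Covariance between an ancestor's and a descendant's phenotypes.

    a:           path from the additive genetic factor to the standardized phenotype
    generations: 1 for parent-offspring, 2 for grandparent-grandchild, ...
    """
    # P_ancestor <- A (a), then one A -> A transmission step (0.5) per generation,
    # then A -> P_descendant (a).
    return a * 0.5 ** generations * a


a = 0.8
r_po = lineal_covariance(a, 1)   # 0.5 * a^2
r_gp = lineal_covariance(a, 2)   # 0.25 * a^2, i.e. half of r_po
print(r_po, r_gp, r_po ** 2)     # r_po ** 2 is not the grandparent-grandchild value
```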

Next, the expected covariances for avuncular relationships through MZ twins, DZ twins and siblings are computed. The matrix algebra for each of these correlations involves seven matrices: (i) the paths from the phenotype of an uncle/aunt to his/her latent ABC factors, multiplied by (ii) the paths from those latent ABC factors to the genetic latent AB factors of a niece/nephew; (iii) a twin or sibling correlation from an uncle/aunt to his/her cotwin or sibling, multiplied by (iv) the cultural transmission path; and (v) a twin or sibling PP̃ correlation from an uncle/aunt to the mating phenotype of his/her cotwin or sibling, multiplied by (vi) the genetic and cultural transmission paths through assortment; all of these are finally multiplied by (vii) a matrix of paths from the latent factors of the child to his/her phenotype. In addition to the regular avuncular covariances, we also specify PP̃ covariances for these relationships through twins, which are used in the cousin covariances. The cousin relationships, which may exist through MZ or DZ twins, are specified next. These are also built up by combining the various building blocks in the appropriate fashion, in a similar way to the avuncular relationships but with a few extra matrices.
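The role of the twin or sibling correlation in the avuncular and cousin expectations is easiest to see in a stripped-down additive genetic version of the chain: path a from A to a standardized phenotype, random mating, and no dominance, cultural transmission or assortment. The uncle/aunt's genotype is correlated 1 with an MZ cotwin's and 0.5 with a DZ cotwin's or sibling's, and each parent-to-child step contributes 0.5. The names are illustrative only.

```python
# Additive genetic correlation between the two members of the middle generation.
GENETIC_CORR = {"MZ": 1.0, "DZ": 0.5, "sib": 0.5}


def avuncular_covariance(a, via):
    """Uncle/aunt with niece/nephew, through an MZ twin, DZ twin or sibling."""
    # P_uncle <- A_uncle (a) <-> A_cotwin (r) -> A_child (0.5) -> P_child (a)
    return a * GENETIC_CORR[via] * 0.5 * a


def cousin_covariance(a, via):
    """Cousins whose parents are MZ twins, DZ twins or siblings."""
    # P_c1 <- A_c1 (a) <- A_parent1 (0.5) <-> A_parent2 (r) -> A_c2 (0.5) -> P_c2 (a)
    return a * 0.5 * GENETIC_CORR[via] * 0.5 * a


for via in ("MZ", "DZ", "sib"):
    print(via, avuncular_covariance(0.8, via), cousin_covariance(0.8, via))
```

Through MZ twins the uncle/aunt is genetically equivalent to a parent (0.5a²) and the cousins to half-siblings (0.25a²); through DZ twins or siblings both values halve.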

Next, a number of calculation groups combine the individual expected correlations into larger units, which can then be assembled into a table of all the expected correlations. More importantly, they are organized so that they can be put together to generate the expected covariance matrices for the extended kinships by zygosity. Separate groups organize the twin, sibling, parent-offspring, grandparent-grandchild, avuncular and cousin correlations. These are followed by groups that calculate the relationships through marriage, including first-degree relatives and their spouses, spouses through twins, and nieces/nephews and the spouses of their uncles/aunts.

Finally, all the building blocks that do not vary with the zygosity of the twin pair, for example the covariance between brothers and sisters, are organized in one group. These are then combined with matrices specific to each zygosity to generate the expected covariance matrices for each of the five types. An extra group sets up matrices used across the data groups to handle regression on covariates. It also generates matrices that extract the relevant subsets of the extended kinship expected covariance matrices when one of the subdesigns is fitted. The data groups read the observed raw data for all the relatives. In addition to specifying the model for the covariances between relatives, the data groups also contain models for the means, which can be constrained across birth order, zygosity, generation and sex, or estimated freely. The order of the relatives in the expected mean and covariance statements needs to match that in the observed data files. An additional calculation group summarizes the expected means and covariance matrices for all five zygosity groups. Note that various groups include start values and boundary statements that limit the range of the parameter estimates, as well as option statements for the output.
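Two bookkeeping steps described above, building an expected mean vector under an equality constraint and extracting the rows and columns of the full kinship expectations needed for a subdesign, can be sketched with NumPy as follows. The member order, labels, placeholder values and the choice to equate means by sex only are hypothetical; whatever order is used must match the order of relatives in the data file.

```python
import numpy as np

# Hypothetical ordering of relatives in the full extended-kinship record.
FULL_ORDER = ["twin1", "twin2", "brother", "sister", "father", "mother"]
SEX = {"twin1": "f", "twin2": "f", "brother": "m", "sister": "f",
       "father": "m", "mother": "f"}


def expected_means(order, means_by_sex):
    """Means equated across birth order, zygosity and generation; free by sex."""
    return np.array([means_by_sex[SEX[member]] for member in order])


def subset_expectations(mu, sigma, full_order, keep):
    """Rows/columns of the full expectations for the relatives in `keep`."""
    idx = [full_order.index(member) for member in keep]  # keeps the order of `keep`
    return mu[idx], sigma[np.ix_(idx, idx)]


mu_full = expected_means(FULL_ORDER, {"m": 0.2, "f": -0.2})  # placeholder values
sigma_full = np.eye(len(FULL_ORDER))                          # placeholder matrix
# Nuclear twin family subdesign: twins and their parents only.
mu_ntf, sigma_ntf = subset_expectations(
    mu_full, sigma_full, FULL_ORDER, ["twin1", "twin2", "father", "mother"])
print(mu_ntf, sigma_ntf.shape)
```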

To obtain relevant information from the output in an organized fashion, several calculation groups create tables. The first of these generates a table of expected covariances for the 88 relationships by sex combination. Subsequent groups summarize the parameter estimates, calculate derived parameters, and compute unstandardized and standardized variance components separately for males and females. Other groups report the function values for each of the data groups and list the results of the constraint groups, making it easy to check that all the constraints are satisfied. Statements are included to calculate confidence intervals around parameters of interest. Given the number of parameters in the full model and the size of the observed dataset, it is advisable to limit the number of requested confidence intervals until the model has been evaluated. The final group calls up all the computed tables for printing; a number of optimization options and options for saving output files are also specified in this group. If no sex differences are requested, a set of parameters is equated across the sexes or fixed at specific values. If a subdesign is fitted instead of the full extended kinship model, several parameters may have to be dropped from the model to ensure identification of the remaining parameters. Finally, if several subdesigns are analyzed with the same dataset, a loop function can be used to generate the appropriate output.
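The derivation of unstandardized and standardized variance components separately by sex can be sketched as below. The sketch assumes that the total phenotypic variance is the sum of the listed components, so a covariance contribution (for example twice a GE covariance) would have to be added to the component dictionary as its own entry before standardizing. The component names and path values are hypothetical, not estimates from this study.

```python
def variance_components(paths):
    """Unstandardized components: squared path coefficients."""
    return {name: p * p for name, p in paths.items()}


def standardize(components):
    """Proportions of the total phenotypic variance."""
    total = sum(components.values())
    if total <= 0:
        raise ValueError("total variance must be positive")
    return {name: v / total for name, v in components.items()}


# Hypothetical path estimates, reported separately for each sex.
paths_by_sex = {
    "males": {"A": 0.8, "C": 0.3, "E": 0.5},
    "females": {"A": 0.7, "C": 0.4, "E": 0.6},
}
for sex, paths in paths_by_sex.items():
    raw = variance_components(paths)
    print(sex, raw, standardize(raw))
```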