LOT supports input files in a format similar to the standard GENEHUNTER format. Two input files are required: a locus data file and a pedigree file. The locus file contains information on the genetic distances between markers, the number of alleles at each locus and their frequencies. The pedigree file provides information about the structure of each pedigree, the values of the ordinal trait, the genotype at each marker for each individual and the values of the covariates, if any. For file formats and detailed instructions, please refer to the Supplementary information website.
LOT produces two types of output: a table and a diagram. The first two columns of the table contain the names of the markers and the map positions of the markers and intermarker locations, respectively. The next three columns contain the complete (natural) log-likelihood without considering the latent variables (‘Without Us’), the log-likelihood considering only U1 (‘With U1’) and the log-likelihood considering both U1 and U2 (‘With U1 & U2’), computed for each marker and intermarker location. This tabulated output is automatically saved as a tab-delimited plain-text file. The graphical output displays the significance level of linkage at each location based on the result of the likelihood estimation. Users have the option to save the diagram as a PNG image. Currently, the graphical output is only available for versions of LOT with a GUI. In addition to the final output, LOT interactively prints the progress of the computation to the main window.
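As a minimal sketch of how the tabulated output might be post-processed, the following Python snippet reads a tab-delimited table with the five columns described above and computes, at each location, the gain in log-likelihood obtained by adding U2 to a model that already includes U1. The column order follows the description above; the presence of a single header row, the function name `read_lot_table`, the marker labels and all numerical values are assumptions made for illustration and should be checked against an actual LOT output file.

```python
import csv
import io


def read_lot_table(handle, has_header=True):
    """Parse a five-column, tab-delimited table into a list of dicts.

    Assumed column order: marker name, map position, log-likelihood
    without latent variables, with U1 only, and with U1 and U2.
    """
    reader = csv.reader(handle, delimiter="\t")
    if has_header:
        next(reader, None)  # skip the column titles (assumed to exist)
    rows = []
    for fields in reader:
        if len(fields) < 5:
            continue  # ignore blank or incomplete lines
        marker, position, ll_none, ll_u1, ll_u1u2 = fields[:5]
        rows.append({
            "marker": marker,
            "position": float(position),
            "without_us": float(ll_none),
            "with_u1": float(ll_u1),
            "with_u1_u2": float(ll_u1u2),
            # Gain from adding U2 to the model that already contains U1.
            "gain": float(ll_u1u2) - float(ll_u1),
        })
    return rows


# Made-up values for illustration only; these are not LOT results.
example = (
    "Marker\tPosition\tWithout Us\tWith U1\tWith U1 & U2\n"
    "M1\t0.0\t-250.0\t-240.5\t-239.0\n"
    "-\t5.0\t-250.0\t-240.5\t-237.5\n"
)
rows = read_lot_table(io.StringIO(example))
for row in rows:
    print(row["marker"], row["position"], row["gain"])
```

To read a saved file instead of the in-memory example, the same function can be given a handle from `open(path, newline="")`.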
Figure 1 displays the graphical output produced by LOT for a hoarding study dataset (Feng et al.). The response in this study is an ordinal trait that takes the value 0, 1 or 2 based on the hoarding symptoms of a patient. Zero was recorded if both of the hoarding items on the Yale–Brown Obsessive-Compulsive Scale symptom checklist were rated as present for the patient, one if only one item was present and two if both items were absent. Shown in the figure is the result from the markers on chromosome 5. The horizontal axis indicates map locations on the chromosome and the vertical axis represents the difference in log-likelihood between the model considering only U1 and the model considering both U1 and U2. The green curve denotes the gain in log-likelihood when both latent variables are included in the computation compared to when only the familial and genetic factors (U1) are considered. The blue and red lines indicate the thresholds for suggestive linkage and significant linkage, respectively. The thresholds are calculated following the definitions of suggestive linkage and significant linkage proposed by Lander and Kruglyak (1995), based on the assumption that the total number of markers in a genome-wide linkage scan is about 400, which is usually the case for microsatellite markers. These thresholds provide a reference for users, who are encouraged to recalculate them according to their own study settings. As shown in Figure 1, at any position where the green curve exceeds the threshold for suggestive linkage, the name of the marker is printed on the graph in black; if the green curve exceeds the threshold for significant linkage, the marker name is printed in bold letters.
Fig. 1. Graphical output from LOT for a hoarding study dataset. The blue and red lines indicate, respectively, the thresholds of significant and suggestive evidence for linkage between a marker and the trait locus. The thresholds were computed empirically by ...
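The labelling rule described above for marker names can be sketched as a simple two-threshold classification of the log-likelihood gain. This is an illustration of the rule as stated in the text, not LOT's own code: the function name `linkage_label` is hypothetical, and the threshold values 2.0 and 4.0 are placeholders rather than the thresholds LOT computes, so users would substitute values derived for their own study.

```python
def linkage_label(gain, suggestive, significant):
    """Classify a log-likelihood gain against two user-supplied thresholds.

    Returns "significant" if the gain exceeds the significant threshold,
    "suggestive" if it exceeds only the suggestive threshold, and "none"
    otherwise.
    """
    if suggestive > significant:
        raise ValueError("suggestive threshold must not exceed significant threshold")
    if gain > significant:
        return "significant"
    if gain > suggestive:
        return "suggestive"
    return "none"


# Placeholder thresholds and gains, chosen only to exercise each branch.
SUGGESTIVE, SIGNIFICANT = 2.0, 4.0
gains = {"M1": 1.5, "M2": 3.0, "M3": 4.5}
labels = {name: linkage_label(g, SUGGESTIVE, SIGNIFICANT) for name, g in gains.items()}
print(labels)
```

Keeping the thresholds as explicit arguments mirrors the recommendation that they be recalculated for each study rather than treated as fixed constants.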
The computational time of LOT grows linearly in the number of markers. The time for computing the inheritance vectors grows exponentially in the number of non-founders within a pedigree and linearly in the number of pedigrees when all pedigrees have the same structure. The time for the remaining part of the program grows quadratically in the number of samples. In practice, this remaining part is the bottleneck, so the estimated running time of LOT grows quadratically with the number of samples. In the above example, 223 samples and 24 markers were analyzed on a desktop workstation with an Intel Pentium D 3.20 GHz processor and 3.50 GB of RAM. The computation was completed in 211 s. In another analysis with 3074 samples and 32 markers, the computation took 49 357 s on the same machine.
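The stated scaling can be checked against the two reported timings with a short calculation. The sketch below assumes that running time is proportional to the number of markers times the square of the number of samples, and ignores the inheritance-vector step and any differences in pedigree structure between the two datasets; the function name `predicted_runtime` is hypothetical.

```python
def predicted_runtime(ref_seconds, ref_samples, ref_markers, samples, markers):
    """Extrapolate running time assuming time ~ markers * samples**2."""
    sample_factor = (samples / ref_samples) ** 2
    marker_factor = markers / ref_markers
    return ref_seconds * sample_factor * marker_factor


# Reference run from the text: 223 samples, 24 markers, 211 s.
# Target run from the text: 3074 samples, 32 markers, 49 357 s observed.
observed = 49357
predicted = predicted_runtime(211, 223, 24, 3074, 32)
ratio = predicted / observed
print(f"predicted {predicted:.0f} s, observed {observed} s, ratio {ratio:.2f}")
```

Under these assumptions the extrapolation gives roughly 53 000 s, within about 10% of the observed time, which is consistent with the quadratic dependence on sample size described above.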