This paper proposed a methodology for estimating the regression relationship between exposure and outcome when the exposure variable is assessed by multiple assays that are potentially subject to detection limits. We developed models and evaluated designs for both continuous and binary outcome variables and considered cases in which the relationship between outcome and exposure followed a polynomial or smoothing spline form. The PCB/endometriosis study measured a serum assay in all study participants and the more definitive “gold standard” assay on only a subset of the patients. In this study, the second, gold standard assay was measured from adipose tissue sampled during laparoscopic surgery and can be considered to be sampled completely at random. We explored the alternative design in which the second assay is always performed when the first assay is below LOD and is performed with probability P when the initial assay is above LOD. We found that the efficiency of a design in which the second assay is measured only when the initial assay is below LOD is substantially improved when the second assay is also measured in even a small percentage of participants whose initial assay is above LOD. Additionally, designs in which we only observe the second assay when the first assay is below LOD are highly inefficient relative to sampling the second assay completely at random in a comparable proportion of patients.
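As an illustration of the alternative design described above, the following minimal sketch selects participants for the second assay: always when the initial assay is below LOD, and with probability P otherwise. The function name `select_for_second_assay`, its parameter names, and the use of a fixed seed are hypothetical choices made for this example; they are not taken from the study.

```python
import random


def select_for_second_assay(first_assay, lod, p_above, seed=0):
    """Return a list of booleans: True where the second assay is measured.

    Assumed rule (following the design described in the text): the second
    assay is always measured when the first assay is below the LOD, and is
    measured with probability p_above when the first assay is at or above it.
    """
    if not 0.0 <= p_above <= 1.0:
        raise ValueError("p_above must be in [0, 1]")
    rng = random.Random(seed)  # fixed seed only so the sketch is reproducible
    selected = []
    for value in first_assay:
        if value < lod:
            selected.append(True)
        else:
            # rng.random() lies in [0, 1), so p_above = 0 never selects
            # and p_above = 1 always selects.
            selected.append(rng.random() < p_above)
    return selected


# Hypothetical first-assay values with an LOD of 1.0.
values = [0.2, 1.5, 0.7, 3.1]
only_below = select_for_second_assay(values, lod=1.0, p_above=0.0)
everyone = select_for_second_assay(values, lod=1.0, p_above=1.0)
```

Setting `p_above=0.0` reproduces the design in which the second assay is observed only below LOD, and small positive values correspond to the modified design that was found to be substantially more efficient.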
In the PCB/endometriosis study, the second assay was an invasive, tissue-based assay, whereas the initial assay was a serum-based assay. In other settings, the second assay may be more expensive than the initial assay, and designs may be compared on the basis of study cost. In these cases, rather than comparing relative efficiencies as was done in this paper, we can compare designs by minimizing cost functions. Such an approach would require specifying the relative cost of the 2 assays, and could easily be implemented within the framework discussed in this paper.
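A cost-based comparison of this kind could be sketched as follows. The linear cost form (a per-participant cost for each assay) and the names `expected_cost`, `cost_first`, `cost_second`, `prop_below_lod`, and `p_above` are assumptions made for illustration; the paper does not specify a particular cost function.

```python
def expected_cost(n, cost_first, cost_second, prop_below_lod, p_above):
    """Expected assay cost of a study with n participants.

    Assumptions for this sketch: every participant receives the first assay;
    the second assay is measured whenever the first assay is below the LOD
    (a proportion prop_below_lod of participants) and with probability
    p_above otherwise; costs are linear in the number of assays performed.
    """
    for name, prob in (("prop_below_lod", prop_below_lod), ("p_above", p_above)):
        if not 0.0 <= prob <= 1.0:
            raise ValueError(f"{name} must be in [0, 1]")
    # Expected fraction of participants who receive the second assay.
    frac_second = prop_below_lod + (1.0 - prop_below_lod) * p_above
    return n * cost_first + n * frac_second * cost_second


# Hypothetical inputs: 100 participants, second assay 10 times as costly,
# 30% of initial assays below the LOD.
cost_only_below = expected_cost(100, 1.0, 10.0, 0.3, 0.0)
cost_with_extra = expected_cost(100, 1.0, 10.0, 0.3, 0.1)
```

In practice such costs would be compared across designs that achieve a common precision for the regression parameter of interest, so that the cheapest adequate design can be identified.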
For the PCB/endometriosis study, the serum assays were subject to LODs, whereas the gold standard adipose tissue assay was not subject to LOD. Thus, the model development and design considerations were for the situation in which the inexpensive/noninvasive assay measured in all individuals was subject to a lower LOD, whereas the gold standard expensive/invasive assay measured on a subset of individuals was not subject to LOD. The model and simulations could easily be altered to allow for LOD on the second assay. Furthermore, the model could also easily be altered to allow for an upper LOD for the initial assay. Our investigation focused on the situation in which there are 2 assays. The methodology can easily be extended to incorporate more than 2 assays measuring a single exposure. We expect that the design results presented in this paper will be similar for these alternative formulations.
Most of the models presented in this article assume that values below LOD are not available and treat these values as being left censored. In many instances, values outside LODs are not reported by the laboratory or, for a particular assay, values below a certain limit cannot be quantified. However, as a general principle, these values should be collected when possible, and depending on the type of assay, should be used in the analysis.⁶
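To illustrate what treating values below LOD as left censored means for the likelihood, the sketch below uses a deliberately simplified setting: a single assay whose (possibly log-transformed) values are assumed to be normally distributed. This is not the multiple-assay measurement error model of this paper; the function names and the use of `None` for unreported values are hypothetical.

```python
import math


def normal_logpdf(x, mu, sigma):
    """Log density of a normal distribution with mean mu and SD sigma."""
    z = (x - mu) / sigma
    return -0.5 * z * z - math.log(sigma) - 0.5 * math.log(2.0 * math.pi)


def normal_cdf(x, mu, sigma):
    """Cumulative distribution function of a normal distribution."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def left_censored_loglik(values, lod, mu, sigma):
    """Log-likelihood when values below the LOD are treated as left censored.

    A value that is unreported (None) or below the LOD contributes
    log P(X < lod); an observed value at or above the LOD contributes its
    log density. Normality is an assumption of this sketch.
    """
    total = 0.0
    for v in values:
        if v is None or v < lod:
            total += math.log(normal_cdf(lod, mu, sigma))
        else:
            total += normal_logpdf(v, mu, sigma)
    return total


# One unreported value with the LOD at the mean: contribution is log(0.5).
ll_censored = left_censored_loglik([None], lod=0.0, mu=0.0, sigma=1.0)
# One observed value at the mean: contribution is the log density at 0.
ll_observed = left_censored_loglik([0.0], lod=-1.0, mu=0.0, sigma=1.0)
```

Note that a recorded value below the LOD is still treated as censored here. Using the recorded value instead would replace the censored contribution with a density term. For LODs far in the lower tail, the probability can underflow to zero, so a numerically stable log-CDF would be preferable in real use.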
For the PCB/endometriosis study, where the values of serum assay measurements below LOD were recorded, we were able to examine the advantages of using the actual measurements below the LOD rather than treating these values as left censored. We found small efficiency gains from incorporating these values in the analysis. Further, simulation studies confirmed these efficiency gains in situations in which the proportion of initial assays below LOD was sizable. In the analyses and simulations, we assumed that the measurement error was constant and did not depend on the true value of the exposure variable. More complex models that allow for a measurement error process that depends on the true value may be more appropriate when we are analyzing actual measurements below LODs.⁶ The development of such models is an area of future research.
We proposed models for cross-sectional data such as the PCB/endometriosis study in which single exposure and outcome measurements were obtained. These models could be extended to the longitudinal setting in which both exposure and outcome are measured longitudinally in time. Albert⁷ proposed such a model when the outcome in a longitudinal study is measured with multiple assays. A similar model could be developed in which the exposure variable is measured with multiple assays. This is an area of future research.
This article focuses on the situation in which the second assay is missing by design. Namely, an investigator could measure the second assay in all study participants but, because of cost or feasibility issues, performs the second assay in only a subset of study participants. In other applications, investigators may attempt to measure the second assay in all participants, but may have missing data. For example, if the second assay requires a tissue specimen, then it may be missing when the tissue sample cannot be obtained because of the participant's refusal to undergo the procedure or the surgeon's failure to obtain the sample. If the missing data mechanism is missing at random or missing completely at random,⁸ then the proposed methodology is appropriate. However, if the probability of missing the second assay depends on the value of the second assay had it been observed (this type of missing data mechanism has been referred to as nonignorable missingness), then the methodology presented in this article may result in biased inference. Extensions of the models to nonignorable missingness are an area of future research.