We obtained samples of journal articles indexed in MEDLINE by searching on the basis of race and ethnicity terms, population terms, and genetics terms (details below). We based the search on three sets of journals. The first two samples were drawn from journals with the highest impact factors (based on ISI journal citation reports [Thomson ISI, 2005b]) in the fields of (1) clinical research (which included a subset of cardiology journals) and (2) genetics. Many high-impact-factor genetics journals do not address the human population research of interest here. As a result, and in order to keep the impact factor ratings roughly equivalent in the clinical and genetics journals, the high-impact genetics sample is based on only two journals, while the high-impact clinical sample draws on five (as well as the cardiology journals). The journals selected by impact factor are listed below. The third sample, referred to here as the general journal sample, was drawn from MEDLINE-indexed journals, excluding all of those included in the high-impact sample.
Sampled Journals by Impact Factor
We relied on journal impact factors to choose our sample for three reasons. First, the impact factor, as the “measure of the frequency with which the ‘average article’ in a journal has been cited in a particular year or period” [Thomson ISI, 2005a], suggests that articles from high-impact-factor journals are more likely to be read by more people. Thus, analyzing articles from these journals provides an account of the most common models available to researchers who use genetic research findings but who may not themselves have conducted, or be expert in, this kind of research. Second, as impact factor rankings also approximate a journal’s prestige [Thomson ISI, 2005a], and as prestige often translates into greater resources, journals with high impact factors are more likely to have paid editorial staff and the capacity to develop and apply strict editorial and peer review policies. This capacity suggests that the practices identified in these journals might be more likely to reflect preferred practices. Third, articles in high-impact journals are the most likely to be reported in the lay press, and thus clarity in the use and definition of socially charged terminology is especially important to assess in these journals.
We added a specific sub-sample from high impact factor cardiology journals to the clinical sample to assure inclusion of a sufficient number of in-depth, condition-specific studies in the clinical sample. For analysis purposes, the cardiology sample was combined with the clinical sample.
To collect study articles we conducted a four-step sampling process, applied first to high-impact journal citations. We began step one by searching MEDLINE in the selected high-impact-factor journals with the following limits: abstract, humans, English, and publication dates between 2001 and 2004. To restrict articles to those that reported research, we also set search criteria to exclude articles indexed to the following article types: clinical conference; comment; consensus development conference, NIH; duplicate publication; editorial; letter; news; newspaper article; review, academic; review, multicase; and review literature. Full citations for articles that met these criteria were then downloaded.
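The step-one limits can be sketched as a filter over citation records. The dictionary structure below is an assumption for illustration; the study applied these limits as MEDLINE search options rather than filtering downloaded records post hoc.

```python
# Publication types excluded in step one (as listed in the text).
EXCLUDED_TYPES = {
    "clinical conference", "comment",
    "consensus development conference, nih", "duplicate publication",
    "editorial", "letter", "news", "newspaper article",
    "review, academic", "review, multicase", "review literature",
}

def keep_citation(record):
    """Apply the study's step-one limits to one citation record.

    `record` is an assumed dict with 'abstract', 'species', 'language',
    'year', and 'publication_types' keys.
    """
    if not record.get("abstract"):                  # limit: has an abstract
        return False
    if record.get("species") != "Humans":           # limit: humans
        return False
    if record.get("language") != "English":         # limit: English
        return False
    if not 2001 <= record.get("year", 0) <= 2004:   # limit: 2001-2004
        return False
    # Exclude non-research publication types.
    types = {t.lower() for t in record.get("publication_types", [])}
    return types.isdisjoint(EXCLUDED_TYPES)
```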
Step two started with searching the title, abstract, and keywords of the downloaded citations for terms from the following three lists: (1) Race and ethnicity terms, including race, racial, ethnic, ethnicity, and a set of race and ethnicity terms based on pre-2003 MEDLINE MeSH terms. These terms included black, white, Caucasian, European-American, Asian, Hispanic American, Mexican American, Native American, American Indian, Alaskan American, African American, Inuit, Gypsy (or Gypsies), Arab, and Jew; (2) Population terms, including population, family, and kindred; and (3) Genetic terms, including genetic (the MeSH term “genetic” returns the following list of terms: cytogenetics, genetic research, genetics behavioral, genetics medical, genetics microbial, genetics population, immunogenetics, molecular biology, pharmacogenetics, and radiation genetics) and pharmacogenomics. It is noteworthy that MEDLINE has since revised its MeSH terminology for the list 1 terms concerning race and ethnicity [Sankar, 2003]. However, the list 1 terms were accepted MeSH terms during most of the period covered by our sample, and their use by MEDLINE suggests a sufficient overlap with popular usage to warrant their adoption here.
To be eligible for random selection for the study, an article had to have in its title, abstract, or keywords one term from list 1 (race or ethnicity terms) or one term from list 2 (population terms), and one term from list 3 (genetics terms). Review of article titles, abstracts, and keywords based on these criteria returned 2,151 articles from high-impact clinical journals, 178 from cardiology journals, and 1,295 from genetics journals.
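The inclusion rule above, (a list 1 term OR a list 2 term) AND a list 3 term, can be sketched as a simple screening function. The term lists here are abbreviated from those given in the text, and the function is an illustration rather than the actual screening tool used.

```python
import re

# Abbreviated versions of the three term lists described in the text.
RACE_ETHNICITY_TERMS = ["race", "racial", "ethnic", "ethnicity",
                        "black", "white", "Arab"]
POPULATION_TERMS = ["population", "family", "kindred"]
GENETIC_TERMS = ["genetic", "pharmacogenomics"]

def contains_term(text, terms):
    """True if any term occurs as a whole word or phrase in the text."""
    return any(re.search(r"\b" + re.escape(t) + r"\b", text, re.IGNORECASE)
               for t in terms)

def is_eligible(title, abstract, keywords):
    """Inclusion rule: (a list 1 term OR a list 2 term) AND a list 3 term."""
    text = " ".join([title, abstract] + list(keywords))
    return ((contains_term(text, RACE_ETHNICITY_TERMS)
             or contains_term(text, POPULATION_TERMS))
            and contains_term(text, GENETIC_TERMS))
```

Note that whole-word matching still flags phrases like “white wine” (though the word-boundary check keeps “Arabidopsis” from matching “Arab”), which is one reason the manual review in step four was needed.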
For step three, we randomly selected 120 articles from the clinical set, 120 from the genetic set, and 45 from the cardiology set.
Step four subjected each of these articles to detailed review to ensure that the article: (1) reported on original research; (2) used human subjects or human tissue samples; and (3) concerned human genetics (and not bacterial or viral DNA). If an article did not meet one of the three criteria, it was excluded from the sample. Articles were also evaluated to confirm that common words, such as “white” or “race,” were used to describe the race or ethnicity of subjects. Articles that used these terms for different purposes or as part of another word (e.g., “white wine” [Mukamal et al., 2003] or “Arabidopsis” [Housworth and Stahl, 2003]) were eliminated. After this review, 100 articles remained in the clinical set, 31 in the cardiology set, and 102 in the genetics set. The cardiology articles were merged into the clinical set at this point, resulting in a clinical sample of 131 articles.
To create the general article sample, we followed the same four steps with one difference. We ran the same MeSH and MeSH-based term searches and selected articles based on the same combinations: a race or ethnicity term and a genetic term, or a population term and a genetic term. However, from this set we then eliminated all articles from journals included in the high-impact sample. Searches on the remaining articles returned 1,575 selections, from which we randomly selected 120 articles. Step four reduced the general article sample to 97. Added to the high-impact clinical and high-impact genetics articles, these 97 articles yielded a total sample of 330 articles for analysis.
Coding and Inter-Rater Reliability
Once each sample was finalized, full-text PDFs were downloaded for each article, converted to Rich Text Format documents, and imported into Atlas.ti version 5 [Muhr, 2004], a qualitative analysis software package. Content codes were developed to capture how the research population was discussed, as well as the structure and main components of each article. The initial set of codes was defined and then tested by a team of four researchers, including PS and MKC. These codes were subjected to multiple evaluations, including several rounds of consensus coding [Jenkins et al., 2005] and discussion, and review by two different advisory groups that included science journal editors and clinical and research geneticists. When the codes were judged to adequately capture the relevant article features, a codebook was created containing coding rules, definitions, and examples.
Three coders were trained using articles that had been coded as part of codebook development. Coder reliability was assessed on each article, and training continued on additional previously coded articles until trainee coding largely matched approved coding. To evaluate inter-rater reliability, these coders then coded a sample of 72 new articles. Each article was coded by a pair of coders, and assignments were made such that each pair coded approximately one-half of the articles. Coding was then compared between coders and an agreement/disagreement ratio was calculated. The inter-rater reliability score on these 72 articles was 90.6% agreement. Subsequently, each article was coded by one coder. Ten percent of the remaining sample was subjected to the same inter-rater reliability test, and agreement remained above 90%.
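The agreement/disagreement ratio described above corresponds to simple percent agreement between paired coders. The sketch below represents each article's coding as a set of code labels; this data structure, and counting agreement code by code, are one plausible reading of the ratio, not the study's documented formula.

```python
def percent_agreement(coder_a, coder_b):
    """Simple percent agreement between two coders.

    coder_a and coder_b are lists of per-article code sets, aligned by
    article. A code counts as an agreement when both coders applied it
    and as a disagreement when only one did.
    """
    agree = disagree = 0
    for codes_a, codes_b in zip(coder_a, coder_b):
        agree += len(codes_a & codes_b)      # codes both coders applied
        disagree += len(codes_a ^ codes_b)   # codes only one coder applied
    return 100.0 * agree / (agree + disagree)
```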