Limited psychometric research has examined the reliability of self-reported measures of neighborhood conditions, the effect of measurement error on associations between neighborhood conditions and health, and potential differences in the reliabilities between neighborhood strata (urban vs. rural and low vs. high poverty).
We assessed overall and stratified reliability of self-reported perceived neighborhood conditions using 5 scales (Social and Physical Disorder, Social Control, Social Cohesion, Fear) and 4 single items (Multidimensional Neighboring). We also assessed measurement error-corrected associations of these conditions with self-rated health.
Using random-digit dialing, 367 women without breast cancer (matched controls from a larger study) were interviewed twice, 2–3 weeks apart. We assessed test-retest (intraclass correlation coefficients [ICC]/weighted kappa [k]) and internal consistency reliability (Cronbach’s α). Differences in reliability across neighborhood strata were tested using bootstrap methods. Regression calibration corrected estimates for measurement error.
All measures demonstrated satisfactory internal consistency (α≥.70) and either moderate (ICC/k=.41–.60) or substantial (ICC/k=.61–.80) test-retest reliability in the full sample. Internal consistency did not differ by neighborhood strata. Test-retest reliability was significantly lower among rural (vs. urban) residents for 2 scales (Social Control, Physical Disorder) and 2 Multidimensional Neighboring items; test-retest reliability was higher for Physical Disorder and lower for 1 Multidimensional Neighboring item among the high (vs. low) poverty strata. After measurement error correction, the magnitude of associations between neighborhood conditions and self-rated health was larger, particularly in the rural population.
Research is needed to develop and test reliable measures of perceived neighborhood conditions relevant to the health of rural populations.
Significant evidence spanning the multiple and diverse disciplines of public health, criminology, and urban sociology has demonstrated consistent associations between the social and physical environment of local residential neighborhoods and a range of individual outcomes.[1–7] In the public health and medical literature, research findings suggest relationships between neighborhood disadvantage and poorer physical and mental health.[4, 8] Several methods are available for the assessment of neighborhood conditions. Existing sources of administrative data such as U.S. Census data yield objective assessments. Systematic assessments of neighborhoods by independent, trained observers and self-reported perceived neighborhood conditions by neighborhood residents may yield more subjective assessments of neighborhood conditions. A growing body of research has focused on the unique role of residents’ perceived neighborhood conditions such as social and physical disorder. Self-reported measures can capture conditions such as satisfaction and relations with neighbors and neighborhoods, perceived safety, and neighborhood problems, [6, 9, 10] which are not captured by more objective measures or by outsider observers. Several studies even have shown that, compared to other assessment methods, perceived neighborhood conditions demonstrate equal or even greater associations with self-rated health [11–13] and other self-reported health outcomes [14, 15]. Given the use of both self-reported exposure and outcome data, however, the extent to which these stronger associations represent same-source bias is unclear. Regardless, to date, research indicates that both perceived and more objectively measured neighborhood conditions play important, if somewhat differing, roles in health. 
The extent to which perceptions or reality are stronger predictors of health likely depends on the measured health outcomes or neighborhood conditions; for example, perceived fear or safety may be particularly important for outdoor exercise behaviors or for mental health.
Although these measures are increasingly used, only a limited body of research has addressed the psychometric properties of perceived neighborhood conditions.[16–22] Accordingly, the reliability of several validated and widely used measures of perceived neighborhood conditions remains unexplored. These measures, including Perceived Disorder and Collective Efficacy, have been consistently linked with a variety of individual outcomes (e.g., self-rated health, violent crime, depression, mistrust, chronic conditions) across a multidisciplinary literature.[6, 14, 23–26]
Reliability is a necessary precursor for validity. Low reliability may affect the performance and interpretation of a measure, contribute to participant misclassification, and attenuate associations with other variables in epidemiological research. Specifically, low reliability may result in biased measures of association, reflecting the influence of measurement error rather than the underlying true association. However, little is known about the extent to which measurement error as a result of low reliability of perceived neighborhood conditions affects observed associations.
Many commonly measured perceived neighborhood conditions were initially developed in the urban sociology and/or criminology literature and while the scales are more widely used today, they are still predominantly used in higher poverty, urban populations.[4, 6, 27] While there is some evidence that residents from suburban or less densely populated areas may not view neighborhood conditions the same as respondents from inner-city or more densely populated metro areas,[28, 29] the reliability and relevance of commonly used perceived neighborhood condition measures in rural populations remains largely unknown. In the event that measured conditions are less relevant in rural residents’ lives (e.g., the problem of graffiti), rural residents may be more likely to guess on these items. Guessing may result in greater measurement error in rural areas, leading to lower internal consistency and test-retest reliability, and possibly attenuating associations between neighborhood conditions and outcome variables.
The limited data on the reliability of perceived neighborhood conditions, especially in rural areas, and the lack of assessment of the potential role of measurement error in these measures represent a major limitation in the neighborhood effects literature. As existing measures continue to be integrated into a wider literature and new measures of self-reported conditions are developed, it will be important to ensure they are reliable in both urban and rural populations as well as in low vs. high poverty areas. Therefore, we focused on the reliability assessment of more general social and physical neighborhood characteristics using some of the most widely used and broadly relevant measures available. We assessed overall and stratified (urban vs. rural and low vs. high poverty) test-retest and internal consistency reliability of perceived neighborhood conditions, and we assessed the role of measurement error in associations of these conditions with self-rated health among urban/rural and low/high poverty residents.
Data were collected via telephone interview as part of a study of Missouri residents examining the effect of adverse neighborhood conditions on quality of life among women with breast cancer and an age-race-geography-matched, random-digit-dialed sample of women without breast cancer. The latter group of women without breast cancer was interviewed for this test-retest study. The Washington University School of Medicine institutional review board approved the study.
All participants in the last 9-month period of the larger study who, during their first telephone interview, stated that they were interested in participating in a follow-up study (98%) were eligible to be re-interviewed for the retest study. The second interview took place 2–3 weeks after the first interview and was conducted by a different interviewer. This time frame, often used in test-retest studies, is long enough to ensure that respondents are unlikely to remember their original answers, yet short enough that changes in perceptions or underlying neighborhood conditions are unlikely. Up to 15 call attempts, distributed across weekdays, weeknights, and weekends, were made.
The Time 1 interviews obtained information about sociodemographic characteristics; both interviews collected information about street address, length of residence, and perceived neighborhood conditions. Street addresses were geocoded and participants were linked to census tracts. Urban/rural status and poverty rate were included as indicators of objective neighborhood conditions and were measured at the census-tract level using 2000 U.S. Census data. Poverty rate was selected because it is a robust indicator of socioeconomic status across levels of geography and time, has been associated with various health outcomes, and has relevance for policymakers.[30, 31] Area poverty rate was defined as the percent living in poverty in the resident’s census tract and dichotomized at the median: <9.14% or ≥9.14%. Rural/urban status was defined using the Rural-Urban Commuting Area (RUCA) codes of the resident’s census tract. The RUCA codes distinguish metropolitan (codes 0–3) from non-metropolitan (codes 4–10) census tracts using measures of daily commuting connectivity, urbanization, and population size and density.
We measured multiple perceived neighborhood conditions using 5 multi-item scales and 4 single-item measures. All measures referred to participants’ residential neighborhood; the exact boundaries of those neighborhoods were left up to the interpretation of the individual. Perceived Neighborhood Disorder was measured with the Ross-Mirowsky 15-item 4-point Likert-type scale consisting of two subscales, Social Disorder (n=9 items) and Physical Disorder/Decay (n=6 items). Neighborhood disorder captures the extent to which residents perceive cues in their neighborhoods signaling the breakdown of social control. Collective Efficacy was measured with two subscales: informal Social Control, measured with five 5-point Likert-type items, and Social Cohesion and Trust, measured with five 4-point Likert-type items. Social Control captures the resident’s perceptions of the community’s ability to control the behavior of inhabitants, and Social Cohesion measures the resident’s perceptions of the community’s ability to organize effectively to promote positive and repel negative influences. Neighborhood Fear was measured with the sum of 3 items capturing the number of days in the previous week (0–7) during which respondents feared violent or criminal activities or were afraid to leave their home. Lastly, we used 4 of the 14 items of the Multidimensional Measure of Neighboring, which captures dimensions of social life within neighborhoods. We selected 3 items from the “Neighborhood attachment” subscale and one of the two items from the “weak social ties” subscale because they capture unique aspects of neighboring not captured in the other, more widely studied scales used in this study. Items not selected demonstrated considerable conceptual overlap with other measured scales, particularly the Social Cohesion and Social Disorder scales. 
Three of the four items were measured with a 5-point Likert-type scale and one item used a 6-category response option to measure the number of neighbors with whom one socializes. The four items were not summed and used as a scale because no data exist to support the validity or internal consistency reliability of a scale using only these 4 items.
We measured one health outcome, self-rated health, a valid and reliable measure of health status used widely in the neighborhood effects literature. Self-rated health was measured at Time 1 with the question, “How would you rate your health—would you say it is excellent, very good, good, fair, or poor?” Similar to other studies,[33–35] we contrasted persons who answered “fair” or “poor” with persons who answered “excellent,” “very good,” or “good.”
Demographic and socioeconomic covariates were measured at Time 1 and included the following: age (25–44, 45–54, 55–64, 65–74, ≥75), race (non-Hispanic white, non-Hispanic black, or Hispanic/other/multiple), education (≤high school, some college, ≥college graduate), household income (≤$24,999, 25–34,999, 35–49,999, 50–74,999, ≥75,000, missing), marital status (married/living together vs. unmarried), employment status (employed, retired, unable to work/unemployed, or student/homemaker), insurance (yes/no), having a usual source of medical care (yes/no), smoking status (current, former, never), leisure time physical activity in the last 30 days (yes/no), home owner (yes/no), years living in current residence (<5, 5–9.9, 10–19.9, 20–29.9, ≥30.0), and census tract-level percent living in poverty (<median of 9.14% vs. ≥median).
Descriptive statistics and reliability analyses are provided for the total sample and by urban/rural and low/high poverty strata. For consistency, all perceived neighborhood conditions were coded such that higher scores indicate greater neighborhood disadvantage. All scales are scored as the mean of the items, with the exception of the Fear scale, which, due to its skewed distribution, uses the sum of the items.
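The scoring rules above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names and the example responses are hypothetical, the reverse-coding helper assumes a simple Likert range, and the Fear scoring follows the dichotomization (0 vs. ≥1 days) that the paper reports applying before summing.

```python
from statistics import mean


def reverse_code(value, low, high):
    """Flip a Likert response so that higher always means more disadvantage."""
    return low + high - value


def score_mean_scale(item_responses):
    """Scale score as the mean of the (already consistently coded) items."""
    return mean(item_responses)


def score_fear(days_per_item):
    """Fear score: each item dichotomized (0 vs. >=1 days), then summed."""
    return sum(1 for days in days_per_item if days >= 1)


# Hypothetical respondent: four 4-point items, the third positively worded.
responses = [2, 3, reverse_code(4, 1, 4), 2]
print(score_mean_scale(responses))
print(score_fear([0, 3, 7]))
```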
For multi-item scales, internal consistency reliability was assessed using Cronbach’s α coefficient and test-retest reliability was assessed with intraclass correlation coefficients (ICC) and associated 95% confidence intervals from a 1-way analysis of variance. The ICC is defined as the between-subject variation divided by the sum of between- and within-subject (test-retest) variation and represents the proportion of the total variation in a measure due to true differences between subjects rather than differences due to random error and variability between measurement times. The ICC ranges from 0 to 1, where 0 indicates that subject variability is negligible compared to total measurement error (retest variability + random error) and 1 indicates that total measurement error is negligible compared to the variability between subjects. To test differences between ICCs from the rural vs. urban and the low vs. high poverty strata, we used 1000 bootstrap samples to obtain the distribution of the difference between the two ICCs under the null hypothesis that the two ICCs are equal. We then obtained the achieved significance level (ASL) as the p-value by comparing the observed difference between the two ICCs with the null distribution.
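As a rough illustration of these two steps, the sketch below computes a one-way ANOVA ICC from test-retest scores and an achieved significance level for the difference between two strata. The text does not say how the null distribution was generated; pooling both strata and resampling subjects with replacement is an assumption made here, as are the function names.

```python
import random


def icc_oneway(subjects):
    """One-way ANOVA ICC; `subjects` is a list of per-subject score lists
    (e.g. [time1, time2]), all of the same length k."""
    n, k = len(subjects), len(subjects[0])
    grand = sum(sum(s) for s in subjects) / (n * k)
    means = [sum(s) / k for s in subjects]
    # Between-subject and within-subject mean squares.
    msb = k * sum((m - grand) ** 2 for m in means) / (n - 1)
    msw = sum((x - m) ** 2 for s, m in zip(subjects, means) for x in s) / (n * (k - 1))
    denom = msb + (k - 1) * msw
    # Undefined when every score is identical; returned as NaN.
    return (msb - msw) / denom if denom > 0 else float("nan")


def bootstrap_icc_difference(group_a, group_b, n_boot=1000, seed=0):
    """Observed ICC difference and its achieved significance level (ASL).

    Assumed null scheme: pool both strata, then resample subjects with
    replacement into groups of the original sizes.
    """
    rng = random.Random(seed)
    observed = icc_oneway(group_a) - icc_oneway(group_b)
    pooled = group_a + group_b
    extreme = 0
    for _ in range(n_boot):
        a = [rng.choice(pooled) for _ in group_a]
        b = [rng.choice(pooled) for _ in group_b]
        diff = icc_oneway(a) - icc_oneway(b)
        # NaN differences compare as False and so never count as extreme.
        if abs(diff) >= abs(observed):
            extreme += 1
    return observed, extreme / n_boot
```

With two measurement occasions per subject, `subjects` would hold pairs such as `[time1_score, time2_score]`; an ASL below .05 would then be read as a significant difference in reliability between the strata.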
For the ordinal single-item variables, test-retest reliability was assessed using percent agreement and weighted kappa and 95% confidence intervals. The kappa statistic adjusts for the amount of agreement expected to occur by chance alone. We report the linearly weighted kappa because it corrects for the magnitude of the difference between adjacent and nonadjacent ordinal categories and it is less sensitive to the range of responses than its alternative, the quadratic-weighted kappa. We test the difference of the linearly weighted kappas between the urban/rural and low/high poverty strata with a chi-square heterogeneity test; p values <.05 indicate significant heterogeneity between the strata.[41, 42] To correct for the tendency of kappa to be highly dependent on the prevalence of the condition in the population, we also report the prevalence- and bias-adjusted kappa (PABAK). PABAK assumes fifty percent prevalence of the condition and absence of any bias, thereby reflecting an ideal situation and ignoring the prevalence and bias present in the “real world”.
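The agreement statistics for a single ordinal item can be sketched as follows. Categories are assumed to be coded 0 to k−1, and the function names are hypothetical. The PABAK shown is the common multi-category form (k·p_o − 1)/(k − 1), which reduces to 2·p_o − 1 for two categories; whether the authors used this exact form for their ordinal items is an assumption.

```python
def percent_agreement(time1, time2):
    """Proportion of respondents giving the same answer at both interviews."""
    return sum(a == b for a, b in zip(time1, time2)) / len(time1)


def linear_weighted_kappa(time1, time2, n_categories):
    """Linearly weighted kappa for categories coded 0..n_categories-1."""
    k, n = n_categories, len(time1)
    counts = [[0] * k for _ in range(k)]
    for a, b in zip(time1, time2):
        counts[a][b] += 1
    row = [sum(counts[i]) / n for i in range(k)]
    col = [sum(counts[i][j] for i in range(k)) / n for j in range(k)]

    def weight(i, j):
        # Full credit on the diagonal, partial credit shrinking with distance.
        return 1 - abs(i - j) / (k - 1)

    cells = [(i, j) for i in range(k) for j in range(k)]
    observed = sum(weight(i, j) * counts[i][j] for i, j in cells) / n
    expected = sum(weight(i, j) * row[i] * col[j] for i, j in cells)
    return (observed - expected) / (1 - expected)


def pabak(time1, time2, n_categories):
    """Prevalence- and bias-adjusted kappa (assumed multi-category form)."""
    po = percent_agreement(time1, time2)
    return (n_categories * po - 1) / (n_categories - 1)
```

Because PABAK depends only on the observed agreement, it can differ noticeably from the weighted kappa when responses cluster in one or two categories.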
For both ICC and kappa, we judge the adequacy of test-retest reliability following the guidelines suggested by Landis and Koch: poor: 0–0.20; fair: 0.21–0.40; moderate: 0.41–0.60; substantial: 0.61–0.80; and nearly perfect agreement: 0.81–1.0. We adopt the convention of claiming satisfactory internal consistency when coefficient α≥.70.
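To make these conventions concrete, the following sketch computes Cronbach's α from a respondents-by-items matrix and maps a test-retest coefficient onto the Landis and Koch bands quoted above. The function names are hypothetical, and treating negative coefficients as "poor" is an assumption, since the quoted bands start at 0.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha; `rows` holds one list of item scores per respondent."""
    k = len(rows[0])
    item_variances = [variance([r[j] for r in rows]) for j in range(k)]
    total_variance = variance([sum(r) for r in rows])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


def landis_koch_label(coefficient):
    """Label an ICC or kappa using the bands quoted in the text."""
    if coefficient <= 0.20:
        return "poor"  # negative values are also treated as poor here
    if coefficient <= 0.40:
        return "fair"
    if coefficient <= 0.60:
        return "moderate"
    if coefficient <= 0.80:
        return "substantial"
    return "nearly perfect"


def satisfactory_internal_consistency(alpha, threshold=0.70):
    """Apply the alpha >= .70 convention."""
    return alpha >= threshold
```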
To estimate the effects of measurement error on observed associations, we used regression calibration (RC) to calculate the calibrated mean neighborhood conditions. We calculate both the naïve (uncorrected) association of mean (Time 1 and 2) neighborhood conditions and the calibrated mean (Time 1 and 2) neighborhood conditions with self-rated health. RC predicts and uses the “true” neighborhood characteristics for each subject to correct effect estimates. Neighborhood conditions are assumed to be measured with random additive error, estimated from test-retest replicated measures, effectively adjusting for test-retest reliability. Using a linear calibration function for replicated data, the calibrated mean for each participant can be calculated as $\hat{X}_i = \bar{X}_{tot} + \lambda(\bar{X}_i - \bar{X}_{tot})$, where $\bar{X}_{tot}$ is the grand mean of all observations, $\bar{X}_i$ is the mean of the replicate measurements for each participant, and λ is the reliability coefficient (ICC). We compare the mean calibrated (regression calibration) odds ratio that corrects for measurement error to the naïve analysis (uncorrected for measurement error). Due to our small sample size, we include only those potential confounders that were associated (p<.05) with both the outcome (self-rated health) and the stratifying variable (urban/rural), including education (high school or less vs. some college or more) and income (<$25K, ≥25K, or missing) in measurement error analyses.
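A minimal numerical sketch of the linear calibration step is given below. It shrinks each participant's replicate mean toward the grand mean by the reliability coefficient. Because the calibrated value is a linear rescaling of the observed mean, a logistic model fitted to it returns the naïve log odds ratio divided by λ, which the second function illustrates. Function names and example numbers are hypothetical and are not taken from the study; the sketch covers point estimates only, not the standard errors.

```python
import math


def calibrated_means(replicates, lam):
    """Calibrated mean per participant: grand + lam * (subject mean - grand).

    `replicates` holds one list of repeated measurements per participant;
    `lam` is the reliability coefficient (ICC).
    """
    all_values = [x for subject in replicates for x in subject]
    grand_mean = sum(all_values) / len(all_values)
    calibrated = []
    for subject in replicates:
        subject_mean = sum(subject) / len(subject)
        calibrated.append(grand_mean + lam * (subject_mean - grand_mean))
    return calibrated


def corrected_odds_ratio(naive_or, lam):
    """Per-unit odds ratio after rescaling the predictor by lam."""
    return math.exp(math.log(naive_or) / lam)


# Hypothetical values: lower reliability implies a larger correction.
print(calibrated_means([[1, 3], [3, 5]], lam=0.5))
print(corrected_odds_ratio(1.5, lam=0.8))
print(corrected_odds_ratio(1.5, lam=0.5))
```

The last two lines show why the same naïve odds ratio is corrected more strongly where reliability is lower, which matches the pattern the paper describes for the rural stratum.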
The first interview was completed by 1166 women without breast cancer between August 2007 and June 2009. The CASRO response rate, reflecting respondent cooperation and telephone sampling efficiency, was 69.1%. Of all eligible and interested participants in the test-retest study (n=475), 17.9% were unable to be reached within the 14–21 day window. Of the 390 women who were reached, 367 (94.1%) completed the second interview for the test-retest study within the 2–3 week window (mean=19.1 days; median=20 days) after the initial interview. The average interview administration time was 34.1 minutes (SD=7.7) for the first interview and 18.4 minutes (SD=4.1) for the second interview (reflecting the abbreviated Time 2 survey). Descriptive statistics of the 367 women in the retest group (mean age 60.6 years, range: 25–95) are shown in Table 1. Urban residents (n=268) were more likely to be non-white, to have higher education and income and better self-rated health, to have lived in their homes for a shorter period of time, and to live in census tracts with lower rates of poverty. Because poverty status of urban and rural residents varied so dramatically, we also report reliability analyses by neighborhood poverty status (above/below median of 9.14% poverty).
All items are listed in Table 2. Due to highly skewed responses, several items were collapsed: items measuring Neighborhood Fear were dichotomized (0 vs. ≥1 days) prior to summing to create the Neighborhood Fear scale, and one item in the Multidimensional Neighboring scale (MN4) was collapsed into 3 categories (≤3, 4–6, >6). Spearman correlations between all measured scales and items are shown in Table 3. Most scales and items were inter-correlated, including, as expected, the subscales of Collective Efficacy (Social Cohesion and Social Control) and Neighborhood Disorder (Physical and Social Disorder).
All measured scales, with the exception of the measurement of Social Control among the rural residents at both time points and among the low poverty residents at Time 1, and Neighborhood Fear at Time 1 in the total and rural samples, demonstrated satisfactory internal consistency reliability (α≥.70) (Table 4).
Internal consistency reliability was not statistically different across either of the compared strata (urban/rural or low/high poverty) at Time 1 (Table 4) or at Time 2 (data not shown). However, while point estimates for internal consistency reliability were very similar across urban/rural strata for two of the scales (Social Cohesion and Fear), the remaining scales (Social Control, Physical Disorder, Social Disorder) demonstrated higher reliability among the urban residents. Point estimates for internal consistency reliability were quite similar in the high and low poverty samples for most scales, with slightly greater reliability for Physical Disorder and Fear (Time 1) and Social Control (Time 1) among the high poverty sample.
For all scales, ICCs among the total sample indicated substantial (>0.70) test-retest reliability. For all items, weighted kappas and PABAKs indicate moderate test-retest reliability in the total sample (Table 4).
While point estimates of test-retest reliability were higher in the urban (vs. rural) sample for every scale and for 3 of the 4 items, significant differences were evident only for the following scales and items: Social Control, Physical Disorder, MN1, and MN4. For each of these, the ICCs demonstrated substantial reliability among urban residents and only moderate test-retest reliability among rural residents. Among the high poverty sample, test-retest reliability was significantly higher for Physical Disorder and was significantly lower for MN4.
Overall, 16.9% of women reported being in fair or poor health. Both the naïve (uncorrected) and regression calibration results showed that women were more likely to report being in fair or poor health when living in neighborhoods with worse perceived conditions, although the confidence intervals for several of the odds ratios included the null (Table 5). In general, compared to regression calibration-corrected estimates, naïve (uncorrected) estimates of the association of mean neighborhood characteristics with self-rated health were biased toward the null (Table 5). For nearly all items and scales, the attenuation in naïve analyses, resulting from measurement error, was more substantial among rural residents than among urban residents. The naïve estimates were even more attenuated when only Time 1 measures were used (data not shown), reflecting greater within-person error when using a single vs. replicate (mean) measures. In sensitivity analyses, we also ran both unadjusted models and models adjusting for the demographic variables of race and age (data not shown); however, the results were not substantially different from the data presented in Table 5.
Overall, all measured perceived neighborhood conditions demonstrated satisfactory internal consistency and at a minimum, moderate or substantial test-retest reliability in the full sample. These results confirm the test-retest and internal consistency reliability of questions designed to capture perceived neighborhood conditions using a mixed urban and rural sample.
However, the reliability and utility of these measures among rural residents are less clear. While internal consistency reliability did not differ statistically by urban/rural status, point estimates were generally slightly lower in rural populations. For several scales and items, test-retest reliability was significantly lower in rural populations; for others, the differences in reliabilities between rural and urban populations were not statistically significant.
Associations of perceived neighborhood conditions with self-rated health were generally attenuated in naïve analyses and after correction for measurement error, the odds ratios demonstrated stronger associations. Notably, naïve associations appeared to be slightly more attenuated among rural residents, reflecting greater measurement error in this group. This measurement error is caused in part by the lower reliability among rural residents that we demonstrated in several of the measured neighborhood conditions.
While we cannot draw definitive conclusions about the reasons for lower reliability, greater measurement error, and attenuated associations across different neighborhood strata, it is possible that some aspects of neighborhood conditions do not apply equally to the residents in urban/rural or low/high poverty neighborhoods. For example, in the event that some referenced scenarios (e.g. graffiti) are uncommon occurrences or not as noticeable in rural neighborhoods, residents may guess more frequently when answering these items, making the scale less reliable and introducing greater measurement error at both time points among rural residents.
With regard to urban/rural differences, one might expect to find lower reliability for Physical Disorder, Social Control, and MN4 among rural residents. The Physical Disorder scale includes items referencing graffiti, noise, vandalism, and abandoned buildings, all of which are likely to be less relevant for rural residents. Similarly, Social Control refers to the likelihood that neighbors “do something” about deviant behaviors unlikely in rural settings, including kids hanging out on a street corner or spray-painting graffiti on a local building. For MN4, (How many of your closest neighbors do you typically stop and chat with when you run into them?), the number of close neighbors likely differs by urban/rural status, as does the frequency of running into them. Accordingly, we observed significantly lower test-retest reliabilities for each of these variables, but we found no differences in internal consistency reliability. In contrast, equivalent reliability might be expected for scales and items such as Social Disorder, Social Cohesion, Fear, and MN1-3 given their measurement of more universal phenomena (e.g. feelings and relationships with neighbors, etc.). Indeed, we did not find any significant differences in either type of reliability for these items with one exception – MN1 (I feel strongly attached to this residence) had significantly lower test-retest reliability in the rural sample.
For the high/low poverty comparisons, one might expect higher reliability for Social Disorder, Physical Disorder, Fear, and Social Control in high-poverty samples, given that each of these scales refers to disorder and/or behaviors more common in high poverty areas. In contrast, one would not expect reliability differences for Social Cohesion or MN1-4. However, in general, while we found a trend toward greater reliability in the high poverty sample on these as well as other scales, only one comparison was significantly different in the expected direction: test-retest reliability was significantly higher for Physical Disorder in the high poverty sample. Surprisingly, the test-retest reliability for MN4 (chatting with neighbors) was significantly lower among the high poverty sample.
While some comparisons demonstrated significantly different test-retest reliability, internal consistency reliability was not significantly different across any strata comparisons. This could reflect true equivalence of internal consistency across strata, with items in each scale “hanging together” in the same way in both groups. However, point estimates indicated somewhat lower internal consistency reliability in the rural sample, and the lack of significance may simply reflect low power to detect true differences.
At this time, it is not clear what type of perceived neighborhood conditions may be most relevant or reliable for rural populations. While our results suggest that the measures of Social Disorder and Social Cohesion may be equally reliable and therefore capture more universal phenomena, future research should explore alternative aspects of the physical environment and social control that may be more relevant among rural residents. Other perceived neighborhood conditions, including geographic and/or social isolation, access to social and/or health care services, the role of shared values and/or culture, and sociodemographic homogeneity, should also be explored.
Importantly, we assess only the individual-level measurement properties of perceived neighborhood conditions. To assess the group-level measurement properties of these conditions, ecometric analyses are needed. Ecometric analyses integrate traditional psychometric analysis into multilevel frameworks, allowing for reliability assessment of a measure at both the individual and neighborhood levels. By calculating the random variation on a measure between residents living in a neighborhood, this approach can provide evidence for or against the use of aggregated individual-level responses as reliable indicators of mean neighborhood-level constructs.[9, 51] Our data were too sparse for the grouping and multilevel analysis by census tract or even by county that this approach requires: the participants in this study are distributed across 298 census tracts (mean: 1.2 individuals per tract, range: 1–5) and 88 counties (mean: 4.2 individuals per county, range: 1–107).
This study has several limitations. While matched to a population-based sample, our sample represents an older, relatively affluent, and predominantly white group of women, limiting the generalizability of our results. The primary study was not designed to test urban-rural differences which could have limited our power to detect differences. In particular, our small sample sizes, particularly in the rural sample, may have contributed to the wide confidence intervals seen in the measurement error analyses. The larger analytic standard errors calculated using regression calibration also contributed to the wide intervals. Additionally, our reliability analyses do not account for the differential characteristics of respondents in urban and rural areas; future research should strive to account for any such differences. Finally, the importance of reliability in perceived fear in the previous week (Neighborhood Fear scale), is less clear than for our other measures given the greater potential for change in this scale over a short timeframe.
Current understanding about the psychometrics of perceived neighborhood conditions is rather limited.[16–22] Little research has examined the measurement properties of perceived global neighborhood conditions across urban and rural neighborhoods and no studies, to our knowledge, have assessed the role of measurement error in these conditions. We addressed these gaps in the literature by examining multiple global measures of perceived neighborhood conditions and conclude that while all measures demonstrated satisfactory internal consistency and test-retest reliability in the full sample, the lower reliability and greater measurement error among rural residents disproportionately attenuated measures of effect in this population and may have other, unknown consequences.
A deeper understanding of both the psychometric and ecometric properties of perceived neighborhood conditions is needed to advance the growing body of research linking neighborhoods and health. More research will be needed to understand the implications of conceptualizing and analyzing perceived neighborhood conditions as either individual-level or group-level constructs, or whether they should be incorporated at both levels using a multilevel framework as Kawachi et al. have advocated for the measurement of Social Capital. Given recent research demonstrating differing patterns of associations between neighborhood context and mental health status across urban and rural neighborhoods, more attention to context in rural areas is warranted. Particular work is needed to develop and test valid and psychometrically and ecometrically reliable measures relevant across diverse populations and neighborhoods as well as measures specifically designed to capture neighborhood conditions relevant to the health of rural populations.
We thank the Alvin J. Siteman Cancer Center at Barnes-Jewish Hospital and Washington University School of Medicine in St. Louis, Missouri, for the use of the Health Behavior, Communication and Outreach Core. This research was supported in part by grants from the National Cancer Institute (CA112159, CA91842), and preparation of the manuscript was supported in part by the first author’s Alvin J. Siteman Cancer Center Prevention and Control Program postdoctoral fellowship and her career-development award from the Institute of Clinical and Translational Sciences at Washington University in St. Louis, funded by the National Institutes of Health/National Center for Research Resources (KL2 RR024994). Its contents are solely the responsibility of the authors and do not necessarily represent the views of the funding agencies.
Publisher's Disclaimer: LICENCE STATEMENT
The Corresponding Author has the right to grant on behalf of all authors and does grant on behalf of all authors, an exclusive licence (or non exclusive for government employees) on a worldwide basis to the BMJ Publishing Group Ltd and its Licensees to permit this article (if accepted) to be published in JECH editions and any other BMJPGL products to exploit all subsidiary rights, as set out in our licence (http://jech.bmj.com/ifora/licence.pdf).
Competing Interest: None declared.