Overall, all measured perceived neighborhood conditions demonstrated satisfactory internal consistency and, at a minimum, moderate or substantial test-retest reliability in the full sample. These results confirm the test-retest and internal consistency reliability of questions designed to capture perceived neighborhood conditions using a mixed urban and rural sample.
However, the reliability and utility of these measures among rural residents are less clear. While internal consistency reliability did not differ statistically by urban/rural status, point estimates were generally slightly lower in rural populations. For several scales and items, test-retest reliability was significantly lower in rural populations; for others, the differences in reliability between rural and urban populations were not statistically significant.
Associations of perceived neighborhood conditions with self-rated health were generally attenuated in naïve analyses; after correction for measurement error, the odds ratios demonstrated stronger associations. Notably, naïve associations appeared to be slightly more attenuated among rural residents, reflecting greater measurement error in this group. This measurement error is caused in part by the lower reliability among rural residents that we demonstrated for several of the measured neighborhood conditions.
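The link between lower reliability and greater attenuation can be illustrated with a minimal sketch, assuming the classical measurement error model, under which the naïve log odds ratio is approximately the true log odds ratio multiplied by the reliability coefficient. The function names and all numeric values below are hypothetical and are not estimates from this study; the sketch is a simplified stand-in for the idea behind regression calibration, not the procedure used in the analyses.

```python
import math


def naive_odds_ratio(true_or: float, reliability: float) -> float:
    """Approximate the naive (attenuated) odds ratio.

    Assumes classical measurement error, so that
    log(OR_naive) ~= reliability * log(OR_true).
    """
    if not 0.0 < reliability <= 1.0:
        raise ValueError("reliability must be in (0, 1]")
    return math.exp(reliability * math.log(true_or))


def corrected_odds_ratio(naive_or: float, reliability: float) -> float:
    """Reverse the attenuation by dividing the log odds ratio by the reliability."""
    if not 0.0 < reliability <= 1.0:
        raise ValueError("reliability must be in (0, 1]")
    return math.exp(math.log(naive_or) / reliability)


if __name__ == "__main__":
    true_or = 2.0  # hypothetical true odds ratio
    # Hypothetical reliabilities for two groups; not values from this study.
    for label, reliability in (("higher reliability", 0.8), ("lower reliability", 0.6)):
        observed = naive_odds_ratio(true_or, reliability)
        recovered = corrected_odds_ratio(observed, reliability)
        print(f"{label}: naive OR = {observed:.2f}, corrected OR = {recovered:.2f}")
```

Under these assumptions, the group with the lower reliability shows a naïve odds ratio closer to the null for the same true association, which is the pattern described above for rural residents.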
While we cannot draw definitive conclusions about the reasons for lower reliability, greater measurement error, and attenuated associations across different neighborhood strata, it is possible that some aspects of neighborhood conditions do not apply equally to the residents in urban/rural or low/high poverty neighborhoods. For example, in the event that some referenced scenarios (e.g. graffiti) are uncommon occurrences or not as noticeable in rural neighborhoods, residents may guess more frequently when answering these items, making the scale less reliable and introducing greater measurement error at both time points among rural residents.
With regard to urban/rural differences, one might expect to find lower reliability for Physical Disorder, Social Control, and MN4 among rural residents. The Physical Disorder scale includes items referencing graffiti, noise, vandalism, and abandoned buildings, all of which are likely to be less relevant for rural residents. Similarly, Social Control refers to the likelihood that neighbors “do something” about deviant behaviors that are unlikely in rural settings, including kids hanging out on a street corner or spray-painting graffiti on a local building. For MN4 (How many of your closest neighbors do you typically stop and chat with when you run into them?), the number of close neighbors likely differs by urban/rural status, as does the frequency of running into them. Accordingly, we observed significantly lower test-retest reliabilities for each of these variables, but we found no differences in internal consistency reliability. In contrast, equivalent reliability might be expected for scales and items such as Social Disorder, Social Cohesion, Fear, and MN1-3, given their measurement of more universal phenomena (e.g. feelings and relationships with neighbors). Indeed, we did not find any significant differences in either type of reliability for these items, with one exception: MN1 (I feel strongly attached to this residence) had significantly lower test-retest reliability in the rural sample.
For the high/low poverty comparisons, one might expect higher reliability for Social Disorder, Physical Disorder, Fear, and Social Control in high-poverty samples, given that each of these scales refers to disorder and/or behaviors more common in high-poverty areas. In contrast, one would not expect reliability differences for Social Cohesion or MN1-4. In general, while we found a trend toward greater reliability in the high-poverty sample on these as well as other scales, only one comparison was significantly different in the expected direction: test-retest reliability was significantly higher for Physical Disorder in the high-poverty sample. Surprisingly, the test-retest reliability for MN4 (chatting with neighbors) was significantly lower in the high-poverty sample.
While some comparisons demonstrated significantly different test-retest reliability, internal consistency reliability was not significantly different across any strata comparisons. This could reflect true equivalence of internal consistency across strata, with items in each scale “hanging together” in the same way in both groups. However, point estimates indicated somewhat lower internal consistency reliability in the rural sample, and the lack of significance may simply reflect low power to detect true differences.
At this time, it is not clear what types of perceived neighborhood conditions may be most relevant or reliable for rural populations. While our results suggest that the measures of Social Disorder and Social Cohesion may be equally reliable and therefore capture more universal phenomena, future research should explore alternative aspects of the physical environment and social control that may be more relevant among rural residents. Other perceived neighborhood conditions, including geographic and/or social isolation, access to social and/or health care services, the role of shared values and/or culture, and sociodemographic homogeneity, should also be explored.
Importantly, we assess only the individual-level measurement properties of perceived neighborhood conditions. To assess the group-level measurement properties of these conditions, ecometric analyses are needed. Ecometric analyses integrate traditional psychometric analysis into multilevel frameworks, allowing for reliability assessment of a measure at both the individual and neighborhood levels. By calculating the random variation on a measure between residents living in a neighborhood, this approach can provide evidence for or against the use of aggregated individual-level responses as reliable indicators of mean neighborhood-level constructs.[9, 51] Our data were too sparse for the grouping and multilevel analysis by census tract or even by county that this approach requires: the participants in this study are distributed across 298 census tracts (mean: 1.2 individuals per tract, range: 1–5) and 88 counties (mean: 4.2 individuals per county, range: 1–107).
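The sparseness problem can be illustrated with a minimal sketch, assuming a simplified two-level model (residents nested within neighborhoods) in which the reliability of an aggregated neighborhood mean follows the Spearman-Brown form, ICC × n / (1 + (n − 1) × ICC). Full ecometric models are typically more elaborate than this. The function name and the intraclass correlation (ICC) of 0.2 are hypothetical; only the mean cluster sizes of 1.2 and 4.2 come from the text above, and a cluster size of 20 is added purely for contrast.

```python
def neighborhood_mean_reliability(icc: float, n_per_neighborhood: float) -> float:
    """Reliability of a neighborhood mean built from n residents' responses.

    Simplified two-level approximation (Spearman-Brown form):
        reliability = icc * n / (1 + (n - 1) * icc)
    where icc is the share of variance lying between neighborhoods.
    """
    if not 0.0 <= icc <= 1.0:
        raise ValueError("icc must be in [0, 1]")
    if n_per_neighborhood <= 0:
        raise ValueError("n_per_neighborhood must be positive")
    n = n_per_neighborhood
    return icc * n / (1.0 + (n - 1.0) * icc)


if __name__ == "__main__":
    hypothetical_icc = 0.2  # assumed for illustration; not estimated in this study
    # 1.2 and 4.2 are the mean cluster sizes reported in the text;
    # 20 is a hypothetical denser design shown for comparison.
    for n in (1.2, 4.2, 20):
        rel = neighborhood_mean_reliability(hypothetical_icc, n)
        print(f"n = {n}: reliability of neighborhood mean = {rel:.2f}")
```

With roughly one respondent per tract, the aggregated mean is barely more reliable than a single individual's response under these assumptions, which is consistent with the decision not to attempt tract-level or county-level ecometric analyses.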
This study has several limitations. While matched to a population-based sample, our sample represents an older, relatively affluent, and predominantly white group of women, limiting the generalizability of our results. The primary study was not designed to test urban-rural differences, which could have limited our power to detect them. In particular, our small sample sizes, especially in the rural sample, may have contributed to the wide confidence intervals seen in the measurement error analyses. The larger analytic standard errors calculated using regression calibration also contributed to the wide intervals. Additionally, our reliability analyses do not account for the differential characteristics of respondents in urban and rural areas; future research should strive to account for any such differences. Finally, the importance of reliability in perceived fear in the previous week (Neighborhood Fear scale) is less clear than for our other measures, given the greater potential for change in this scale over a short timeframe.