J Urban Health. 2009 March; 86(2): 161–182.
Published online 2008 December 23. doi:  10.1007/s11524-008-9322-7
PMCID: PMC2648887

Use of City-Archival Data to Inform Dimensional Structure of Neighborhoods


A growing body of research has explored the impact of neighborhood residence on child and adolescent health and well-being. Most previous research has used US Census variables as the measures of neighborhood ecology; although informative, census data are not designed to represent the sociological and structural features that characterize neighborhoods. Alternatively, this study explored the use of large-city administrative data and geographical information systems to develop more uniquely informative empirical dimensions of neighborhood context. Exploratory and confirmatory structural analyses of geographically referenced administrative data aggregated to the census-block group identified three latent dimensions: social stress, structural decline, and neighborhood crime. Resultant dimensions were compared through canonical regression to those derived from US Census data. The relative explanatory capacity of the city-archival and census dimensions was assessed through multilevel linear modeling to predict standardized reading and mathematics achievement of 31,742 fifth- and 28,922 eighth-grade children. Results indicated that the city-archival dimensions uniquely augmented predictions, and the combination of city and census dimensions explained significantly more neighborhood effects on achievement than did either source of neighborhood information independently.

Keywords: Neighborhood, Indicators, Census, Archival data, Dimensional structure, GIS, Neighborhood effects, Multilevel modeling, Factor analysis

In a 1998 American Journal of Public Health commentary, Ana Diez-Roux suggested that a large portion of epidemiologic research is based on methodological individualism: the notion that the distribution of health and disease in populations can be explained exclusively in terms of the characteristics of individuals. She argued that macro-level factors should be incorporated into epidemiological research in order to understand the role of non-individual factors in health outcomes. Since this commentary there has been an increase in the research exploring macro-level neighborhood influence on various outcomes, including low birth weight, child mental health, school performance, and childhood obesity.1,2

Whereas research exploring neighborhood influences has increased, so too has the debate on how to adequately quantify neighborhoods in a way that would allow for reliable and valid explanation of their ecological impact on various outcomes. Methods used for measuring neighborhood conditions have included direct observation,3,4 resident survey,5 and secondary data sources.6,7 US Census data are the most frequently used secondary neighborhood data, since they are free and easily accessible at the tract and block-group levels.

Assessments of the neighborhood effects literature conducted by Jencks and Mayer8 and Sampson et al.2 noted that few studies identify and measure the social process or mechanism for neighborhood effects. The major reason for this is that the US Census data are relied upon by most neighborhood researchers. Census data provide demographic, economic, educational, and housing information pertaining uniquely to areas located within predetermined geographic boundaries. The census block group is the smallest geographic area9 (encompassing approximately 1,000 individuals) for systematic reporting, and, consequently, researchers frequently equate the statistical phenomena exclusive to a given block-group unit with the phenomena that essentially define a neighborhood.

Census data provide useful socioeconomic information for geographic areas, but they do not include substantial information on the processes hypothesized to shape child and adolescent well-being. For example, Duncan and Raudenbush10 detailed two limitations affecting studies that use census data as indicators of neighborhood conditions. First, the data are collected on a decennial basis, and as one gets further from the year of collection, the data may no longer reflect neighborhood composition. A second problem arises when using census-based measures as evidence for various neighborhood effects as, for instance, when researchers use only the percent of the neighborhood living below poverty, failing to consider the risk factors associated with high-poverty neighborhoods. Such practice can lead to erroneous interpretation of the influence of census-based neighborhood phenomena on target outcomes when, in fact, the neighborhood phenomena are themselves caused by more precise neighborhood risk factors that correlate with the census measures being used.

In their review of studies assessing neighborhood effects on maternal and child health, Rajaratnam et al.7 noted that, whereas authors provided theoretical explanations for their broad neighborhood constructs, few were explicit about the rationale for selecting the particular indicators that would represent those constructs. Of the 31 studies reviewed, the most frequently used neighborhood variables came from the US Census. The most frequently used census variables were percent living below the poverty level (used in 62% of the 31 studies reviewed), percentage unemployment rates (62%), percentage single-headed households (58%), percentage on public assistance (42%), percentage African American (32%), and percentage of families that had moved in the last 5 years (32%).

It is common for studies using census data to select indicators without a theoretical or statistical justification for their inclusion. Statistical justification at a minimum includes a study of indicator correlations to identify possible multicollinearity. Rajaratnam et al.7 recommend that neighborhood indicators be grounded in theoretical relevance or represent larger constructs that actually correspond to practical public policy interventions.

Another common problem with neighborhood measures is that they are typically constructed as unweighted sums of census variables.11 Equal weighting for all variables fails to appreciate the differential role variables play in distinguishing neighborhoods. Additionally, the unweighted sums are commonly based on percentage calculations for the variable, an approach that fails to incorporate population size in the index creation. Hogan and Tchernis11 recommend using factor analysis as the most effective way to address such concerns. Factor analytic methods, they argue, would provide more robust neighborhood indicators because (a) weights are functions of the model parameters as informed by the data themselves, (b) different population sizes are incorporated naturally and thus reflect posterior variability of the indices, and (c) indices are summarized not as a single number but rather in terms of their posterior distributions that reflect uncertainties about the indicator being developed.

Along these lines, Sampson et al.12 conducted a factor analysis of Chicago census data and obtained a three-dimensional solution. The first dimension, concentrated disadvantage, was comprised of percent of population below poverty, percent receiving public assistance, percent unemployed, percent of all households that are female-headed, percent of population under age 18, and percent Black population. The second dimension, called immigrant concentration, consisted of the percent of Latinos and foreign-born population. A third dimension, termed residential stability, featured percent of population living in same residence as in 1995 and the percent of all housing units that are owner-occupied. Although an important demonstration of factor analysis for defining neighborhood context, the Sampson et al.12 effort was nonetheless limited to census variables, already noted as being problematic. Alternatively, Spencer et al.13 turned to detailed direct observation and coding of non-census local neighborhood features in the Atlanta area, but the procedures required substantial time and personnel and did not produce sufficient data to permit factor analysis. Neighborhood dimensions were resolved through cluster analyses that were unable to significantly augment the information found in census-based factors.

With methodological limitations to the use of census or observational data, researchers14 have sought other available sources of information on neighborhoods. Of the options considered, municipal archival data show promise because they are collected as a normal requirement in running city governments. This spares the expense of new resources as required in the Spencer et al.13 investigation, and, in addition, they are collected on a more continuous basis than the decadal census, thus availing more timely assessments. More importantly, perhaps, city-administrative data tend to be much more locally probative, and, because they are used for in vivo decision making, their scope is more reflective of the sociodynamic and economic processes that differentiate one neighborhood from another. Such databases routinely encompass information at both individual and household levels and span topics such as maternal, neonatal and child health, crime, child abuse and neglect, education, and housing. With the aid of available geographical information systems (GIS), city-administrative data can be associated with their specific address locations and then geocoded and aggregated to user-defined geographic areas (including the specific census tracts or block groups typically used to define neighborhoods).

Within this framework, our study recounts the development and application of empirical dimensions of neighborhoods as found in the nation’s sixth largest city. Archival data from all public agencies and authorities are coupled with GIS technology to identify multiple measures of neighborhood phenomena. Both exploratory and confirmatory structural analyses are applied, whereupon the structural similarities and dissimilarities of city-archival-based and census-based measures are analyzed. Multilevel modeling is used to explore the relative and combined efficacy of city-archival-based and census-based neighborhood dimensions for explaining between-neighborhood differences in children’s school performance.


Data Acquisition

Neighborhood Data

The neighborhood unit was defined as the US Census block group, there being 1,816 such block groups identified by the most recent census for the City of Philadelphia, PA, USA.9 Census-block groups were chosen because they are considered large enough to summarize neighborhood experiences that are common to small groups of residents, yet small enough to distinguish differences in these experiences across a metropolitan area.15,16 Data were obtained through formal memoranda-of-understanding, which allowed for all pertinent city-administrative data to be merged at an individual person or household level and distributed by the Philadelphia Kids Integrated Database System (KIDS), the University of Pennsylvania Cartographic Modeling Laboratory’s (CML) Neighborhood Information System (NIS), and the CML’s Crime Base.

Specifically, KIDS provided data on the health and well-being of Philadelphia children, which included vital statistics birth and death certificate data (low birth weight births, infant deaths, births to teen mothers), and the Department of Human Services substantiated child abuse cases, substantiated child neglect cases, delinquency out-of-home placements, and dependent out-of-home placements data. CML NIS records provided data on Philadelphia’s built environment and included Water Department and Philadelphia Gas Works service shut off due to delinquent accounts, Department of License and Inspections demolitions, Department of Revenue property tax lien sales, and Philadelphia Fire Department property fire data. CML Crime Base data originated with the Philadelphia Police Department’s Incident Transmittal System (INCT), where incidents are classified according to the FBI’s uniform crime reporting system. The INCT includes data on the crime classification for each incident to which Philadelphia police have responded. After police department investigation of the crime, the original classification can be changed within 5 days of the initial police response. Homicides are initially coded as aggravated assaults pending an investigation. Thus, homicides are not included in this data system.

Datasets for each year 2004 to 2006 from KIDS were geocoded based on the address listed on the birth certificate, death certificate, address of the abuse/neglect case, or original address of the child prior to out-of-home placement. The point data for each year were then aggregated to the census-block group, and a 3-year average annual count for each indicator was then calculated for each block group. Since the block group is a relatively small geographic unit, leading to small aggregated counts in many areas, the 3-year average was calculated and used for analysis to control for extreme year-to-year fluctuations. The NIS and Crime Base variables were already aggregated to the block group for the years 2004 to 2006. For each variable aggregated, 3-year average annual counts were calculated.
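The aggregation step described above can be sketched in a few lines of Python. This is a minimal illustration rather than the study's actual procedure: the record layout (a block-group identifier paired with a year), the function name, and the example identifiers are all hypothetical, and geocoding is assumed to have been completed beforehand.

```python
from collections import defaultdict

def three_year_average_counts(events, years=(2004, 2005, 2006)):
    """Average annual event count per block group over the given years.

    `events` is an iterable of (block_group_id, year) pairs, one pair per
    geocoded record (a hypothetical input layout).
    """
    totals = defaultdict(int)
    for block_group, year in events:
        if year in years:          # ignore records outside the study window
            totals[block_group] += 1
    return {bg: count / len(years) for bg, count in totals.items()}

# Made-up records for illustration only.
records = [
    ("BG-001", 2004), ("BG-001", 2005), ("BG-001", 2005),
    ("BG-002", 2006),
    ("BG-001", 2007),              # outside 2004 to 2006, so not counted
]
averages = three_year_average_counts(records)
```

Block groups with no records in the window do not appear in the result; in a full analysis they would need to be added with a count of zero.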

Outcome Data

In addition to block-group data, secondary data analyses were conducted with Pennsylvania System of School Assessment17 standardized reading and mathematics test data. The Pennsylvania Department of Education implemented PSSA testing statewide in the 2001/2002 academic year as a tool for monitoring federal No Child Left Behind18 adequate yearly progress (AYP). NCLB legislation requires that all students must attain measured reading and mathematics proficiency by 2014, and AYP, as determined through standardized tests, is the mechanism by which the federal government certifies compliance. The PSSA is a group-administered, criterion- and norm-referenced test that follows an adaptive testing format and applies Rasch and generalized partial-credit item response theory models for scaling and scoring. A large body of evidence supports PSSA’s reliability and content and criterion-related validity.19,20 From 2001 to 2005, the PSSA was administered to School District of Philadelphia students in grades 5 and 8, and, thus, reading and mathematics scores for students over the 2002/2003 and 2003/2004 academic years were applied as outcome measures for the present study.

Individual student home addresses, as indicated by school district records, were geocoded under their appropriate census-block groups, there being a total of 31,742 fifth graders and 28,922 eighth graders, with participant fifth graders residing in 1,717 (95%) of Philadelphia’s 1,816 block groups and eighth graders residing in 1,712 (94%) of block groups. (Note: Unrepresented block groups featured commercial, industrial, park, and undeveloped areas.) Participant fifth graders attended 205 different elementary schools and eighth graders 131 middle schools. One percent or fewer PSSA reading and mathematics scores were missing for either grade.

Data Analyses

Latent Neighborhood Dimensions

Aggregated variables for the 1,816 census-block groups were subjected to exploratory common factor analysis. While administrative variables appeared in deliberately redundant forms (totals, subtotals, percentages, and other transformations of identical data), only raw numerical counts were retained. Since administrative data tend to present high levels of collinearity and multicollinearity,10 preliminary screening analyses were undertaken. Thus, whereas KIDS initially offered 11 variables, NIS 14, and Crime Base 22 that were ostensibly distinct, inspection of bivariate relationships revealed that nine of the 11 KIDS, five of the 14 NIS, and all 22 crime variables were essentially noncollinear. The consequent 36 × 36 correlation matrix was submitted to a series of common factoring, with squared multiple correlations as initial communality estimates. This strategy is preferred to components analysis with fewer than 40 variables21 inasmuch as the process focuses exclusively on reliable shared variance and inflated standard errors are averted. When all variables were assessed in one matrix, the crime variables, which comprised most of the variables, essentially swamped the other data and forced theoretically implausible and uninterpretable solutions. To address this problem, the correlation matrix was bifurcated with KIDS and NIS variables subjected to one series of factor analyses and crime variables to another series.
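Squared multiple correlations, used above as initial communality estimates, can be obtained from the inverse of the correlation matrix: the estimate for variable i is 1 − 1/r(ii), where r(ii) is the i-th diagonal element of the inverse. The following sketch assumes a small, nonsingular correlation matrix; the function names and the example matrix are illustrative and do not come from the study.

```python
def invert(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination with pivoting."""
    n = len(matrix)
    # Augment each row with the matching row of the identity matrix.
    aug = [list(map(float, row)) + [float(i == j) for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular (perfectly collinear variables)")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def squared_multiple_correlations(corr):
    """SMC of each variable with all others: 1 - 1 / diag(inverse(corr))."""
    inv = invert(corr)
    return [1.0 - 1.0 / inv[i][i] for i in range(len(corr))]

# Invented 3 x 3 correlation matrix, for illustration only.
example = [[1.0, 0.6, 0.5],
           [0.6, 1.0, 0.4],
           [0.5, 0.4, 1.0]]
initial_communalities = squared_multiple_correlations(example)
```

The singularity check also shows why the collinearity screening matters: perfectly redundant variables leave the correlation matrix without an inverse.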

For each series, analyses applied both orthogonal (varimax and equamax) and oblique (promax) rotations to simple structure, where promaxian structures were estimated from initial varimax or equamax structures. Each resultant model was assessed according to multiple criteria. The ideal factor solution would (a) satisfy the minimum constraints for Cattell’s scree test,22 Velicer’s23 minimum-average partialing test, and parallel factoring of random normal variables24 based on 100 replications; (b) reject the likelihood of an identity matrix for the variables as per Bartlett’s test25; (c) retain factors with at least three salient variables per factor,26 where salience is defined by loadings >.40; (d) yield an alpha coefficient of internal consistency >.70 for each retained factor; (e) maintain the most parsimonious structure as measured by maximum hyperplane count27; and (f) resolve factors that are theoretically meaningful.28
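Two of these retention criteria, the count of salient loadings and coefficient alpha, are simple enough to illustrate directly. The sketch below is hedged: the function names and example values are hypothetical, and treating negative loadings by their absolute value is an assumption of this sketch rather than something stated in the text.

```python
from statistics import pvariance

def salient_count(loadings, cutoff=0.40):
    """Number of loadings whose absolute value exceeds the cutoff."""
    return sum(1 for loading in loadings if abs(loading) > cutoff)

def cronbach_alpha(items):
    """Coefficient alpha for a list of variables.

    `items` holds one list of scores per variable, all measured on the
    same units (for example, the same block groups) in the same order.
    """
    k = len(items)
    totals = [sum(values) for values in zip(*items)]
    item_variance = sum(pvariance(values) for values in items)
    return k / (k - 1) * (1 - item_variance / pvariance(totals))

def meets_criteria(loadings, items):
    """Check criteria (c) and (d): at least 3 salient loadings, alpha > .70."""
    return salient_count(loadings) >= 3 and cronbach_alpha(items) > 0.70
```

For example, `salient_count([0.72, 0.41, 0.40, -0.55, 0.10])` counts three salient loadings, because .40 itself does not exceed the cutoff.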

Confirmatory analyses were conducted using confirmatory, oblique, and item clustering,29,30 where hypothesized factor membership for items was based on the common factoring, and items were permitted to migrate iteratively to alternative factors that better explained item variance.

Additionally, the factor structures were assessed using confirmatory structural equation modeling. In this analysis, simultaneous equations are created with each salient variable functioning as a dependent variable and the latent constructs as independent variables. Results assess the likelihood that the resultant factor solutions were inconsistent with the data.31 The Comparative Fit Index (CFI) and the Root Mean Square Error of Approximation (RMSEA) were used to assess the extent to which the model fit the data.32
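For readers unfamiliar with the two fit indices, the sketch below computes them from model and baseline chi-square statistics using their commonly cited definitions. It is illustrative only: the chi-square values are invented, and some software divides by N rather than N − 1 in the RMSEA, so results may differ slightly from any particular package.

```python
import math

def rmsea(chi2, df, n):
    """Root Mean Square Error of Approximation for a model chi-square."""
    return math.sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))

def cfi(chi2_model, df_model, chi2_base, df_base):
    """Comparative Fit Index relative to a baseline (independence) model."""
    d_model = max(chi2_model - df_model, 0.0)
    d_base = max(chi2_base - df_base, d_model, 0.0)
    return 1.0 if d_base == 0 else 1.0 - d_model / d_base

# Invented statistics, not values from the study.
example_rmsea = rmsea(chi2=100.0, df=50, n=501)
example_cfi = cfi(chi2_model=100.0, df_model=50, chi2_base=1050.0, df_base=50)
```

Higher CFI and lower RMSEA values indicate closer fit.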

Generality of the factor solutions was assessed through comparisons of the structure derived for the full sample of 1,816 neighborhoods to the structures independently found for neighborhood subsamples as determined by differential population density, poverty concentration, and geographic racial/ethnic isolation. In these analyses, Cohen’s coefficient κ33 was used to assess matching factor patterns across groups.

Census Factors

Variables comprising the Sampson et al.12 census-derived factors (concentrated disadvantage, immigrant concentration, and residential stability) were calculated for the 1,816 Philadelphia block groups. The published variable loadings were used to calculate factor block-group scores based on Philadelphia census data. The scores were used as comparisons for overlap, with factors derived on the basis of city-administrative data. To investigate the bimultivariate relationships between the census and administrative neighborhood factors, canonical regression and redundancy analyses34,35 were applied where the census factors served as the X dataset and city-administrative factors as the Y dataset.

Neighborhood Effects

Two-level hierarchical linear modeling was applied to determine and explain the relative proportions of students’ PSSA reading and mathematics achievement associated with between-students and between-neighborhoods variation. For reading and mathematics held respectively as dependent variables, unconditional, one-way random effects ANOVA models were used to decompose the between-students and between-neighborhoods variability. First, census factors alone were used to predict achievement outcomes. Then city-archival factors were used, and finally both sets of covariates were combined to explain neighborhood achievement effects.


Latent Neighborhood Dimensions

Common factoring of the nine KIDS and five NIS variables showed that the two-factor promax (k = 2) model met all stated criteria. Table 1 lists the component variables for each dimension, as well as rotated factor loadings and item-total r’s. The precision-weighted factor scores (i.e., scores that were weighted relative to the magnitude of the contribution of each variable to the factor) were transformed by area conversion to normalized T scores with M = 50 and SD = 10. The first dimension (coefficient α = .86) was named social stress in that higher T scores earmarked neighborhoods whose residents experienced noticeably more underweight infants at birth, births to younger teenagers, infant deaths, substantiated child abuse, and out-of-home placements for delinquent and other dependent children. The second dimension (α = .84), termed structural decline, described neighborhood physical states featuring building vacancies and demolitions, frequent lien sales for unpaid taxes, and water shut offs due to unpaid bills.
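An area conversion to normalized T scores can be sketched as follows: each raw score is replaced by its percentile rank, the percentile is mapped to a standard normal deviate, and the deviate is rescaled to M = 50 and SD = 10. The mid-rank percentile convention used here is an assumption of this sketch; the text does not specify which convention was applied, and the function name is hypothetical.

```python
from statistics import NormalDist

def normalized_t_scores(raw_scores):
    """Convert raw scores to normalized T scores (mean 50, SD 10)."""
    n = len(raw_scores)
    standard_normal = NormalDist()
    t_scores = []
    for x in raw_scores:
        below = sum(1 for v in raw_scores if v < x)
        equal = sum(1 for v in raw_scores if v == x)
        # Mid-rank percentile; always strictly between 0 and 1.
        percentile = (below + 0.5 * equal) / n
        z = standard_normal.inv_cdf(percentile)
        t_scores.append(50 + 10 * z)
    return t_scores

# Made-up factor scores for three block groups.
t = normalized_t_scores([3.2, 1.1, 2.4])
```

In this example the middle score falls at the 50th percentile and therefore receives a T score of 50, with the other two scores placed symmetrically around it.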

Table 1
Dimensional structure of neighborhood city-archival data (N = 1,816)

Table 1 also summarizes results of confirmatory cluster analysis. The proportion of each variable’s variance explained (R²) by the factor it was associated with through exploratory common factoring (R² with own factor) was found to be many times higher than the variance explained by the alternative factor (R² with next factor). These results provide additional support for the reported two-factor solution. Moreover, when the two-factor solution was fitted to the data via structural equations modeling, permitting the factors to correlate, the most pertinent fit indices also supported the plausibility of the structure, where the CFI = .96 and the RMSEA = .05.31

The factor analyses for the 22 neighborhood crime variables identified a single dimension that satisfied all of the stated criteria. This factor was derived from overextraction to a two-factor model whose first factor met all criteria but whose second factor produced too few salient loadings and unacceptable internal consistency and was basically uninterpretable. Model overextraction28 serves to disentangle primary factors from underidentified secondary factors by adding random normal variables to the variable set and rotating more factors than intended to retain.

Table 2 posts loadings and item-total r’s for the retained neighborhood crime dimension (α = .89). As indicated by the factor loadings, neighborhood crime is defined by higher levels of drug use and possession, followed by a range of aggravated and weapons-related offenses and arson. All member variables (except disorderly conduct) are regarded as felony crimes under the UCR. Confirmatory clustering supported the placement of variables on the hypothesized factor, and structural equations modeling provided additional support with CFI = .97 and RMSEA = .06.31

Table 2
Dimensional structure of neighborhood crime dimensions (N = 1,816)

Further support for the interpretability of the three neighborhood dimensions (social stress, structural decline, and neighborhood crime) is produced through variance components analysis. Here, the correlation matrix of T scores from the three dimensions is submitted to second-order common factoring. Table 3 shows the correlations among the dimensions and that the strongest overlap (that of social stress and structural decline) still allows substantial separation of the dimensions (i.e., 1 − .68² = .54, or 54% nonoverlap). With second-order communalities functioning as indicators of common variance, it was found that at least 20% to 31% of the variance conveyed by the neighborhood dimensions remained both unique and reliable for interpretation.

Table 3
Intercorrelation and variance components of city-archival neighborhood dimensions (N = 1,816)

Generality of Neighborhood Dimensions

An important step in any scale development is to examine whether or not the structure identified using the full data set does in fact hold when examining relevant subsets of the data. In this study, it was deemed important to test the structural generality of the model by population density (four sets of block groups corresponding to quartiles of increasing density), concentrated poverty (two sets of block groups corresponding to those with ≥20% of the population below the poverty level and those with <20% of the population below poverty), and isolation index scores for non-Whites (four sets formed by quartiles).

For each subset of block groups, common factoring was repeated independently, and the factor pattern for the full sample was compared to that for each subset by deeming variables with loadings >.40 salient and others nonsalient, and the agreement of factor patterns was tested with Cohen’s κ coefficient.33 Entries in Table 4 indicate that all factors derived from independent subsets of the neighborhoods achieved high congruence with the factors derived for the full sample of neighborhoods (κ’s ranging .81–1.0), with the exception of social stress within neighborhoods with lower racial isolation, in which case moderate congruence (.71) was evident. (Note: Distinctions between high and moderate congruence are based on Landis and Koch.36)
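The pattern-matching step can be illustrated with a short computation of Cohen's κ on salient versus nonsalient classifications. This is a simplified sketch for a single factor with invented loadings; the function names are hypothetical, and the study's exact handling of multiple factors is not reproduced here.

```python
def salience_pattern(loadings, cutoff=0.40):
    """Classify each loading as salient (True) or nonsalient (False)."""
    return [loading > cutoff for loading in loadings]

def cohen_kappa(pattern_a, pattern_b):
    """Chance-corrected agreement between two classifications."""
    n = len(pattern_a)
    observed = sum(a == b for a, b in zip(pattern_a, pattern_b)) / n
    categories = set(pattern_a) | set(pattern_b)
    expected = sum((pattern_a.count(c) / n) * (pattern_b.count(c) / n)
                   for c in categories)
    if expected == 1.0:            # both patterns use one identical category
        return 1.0
    return (observed - expected) / (1 - expected)

# Invented loadings for the full sample and for one subset of block groups.
full_sample = salience_pattern([0.75, 0.62, 0.20, 0.10])
subset = salience_pattern([0.70, 0.35, 0.15, 0.05])
kappa = cohen_kappa(full_sample, subset)
```

Here three of the four variables are classified identically, giving observed agreement of .75 against chance agreement of .50 and hence κ = .50.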

Table 4
Generality of dimensions of city-archival data for subsets of neighborhoods

Comparison of City-Administrative and Census Dimensions

Relationships between the city-administrative dimensions and dimensions based on census information were explored through canonical regression analysis. This procedure is particularly useful because it simultaneously controls for Type I errors driven by the interdependencies of dimensions comprising either set and highlights the multiple ways (if any) whereby the two types of neighborhood measures correspond. A statistically significant overall relationship was discovered, where Wilks’ Λ = .36, multivariate F(9, 4405) = 251.99, p < .0001. Two statistically significant and interpretable bimultivariate relationships were detected. The first significant canonical correlation linked the city-derived social stress, structural decline, and neighborhood crime with the census-derived concentrated disadvantage (Rc2 = .76, p < .0001). The second significant canonical correlation linked neighborhood crime with census-derived residential stability (Rc = .32, p < .001).

Table 5 presents the canonical loadings defining each relationship. The first relationship simultaneously and positively links all three city-archival dimensions (social stress, structural decline, neighborhood crime) with the concentrated disadvantage dimension from the census. The second relationship illustrates that, as census-based residential stability decreases, city-based neighborhood crime increases (and vice versa). Additionally, canonical redundancy indicates that, whereas the city-archival dimensions are able to explain 23.9% of the variability in census dimensions, the census dimensions explain about twice as much (44.2%) of the variability in city dimensions. This suggests that the city dimensions cover a more circumscribed range of neighborhood phenomena, but that most of the variability in either source is independent of the other. This raises the prospect that the combination of information from city and census dimensions may substantially augment the knowledge gained from either in isolation.

Table 5
Bimultivariate structure and canonical redundancy of city-archival and the US Census neighborhood dimensions (N = 1,816)

Neighborhood Effects

For each dependent variable (student PSSA achievement scores for fifth-grade reading, eighth-grade reading, fifth-grade mathematics, and eighth-grade mathematics), an unconditional, one-way random effects, hierarchical linear model was fitted to the data. Each model held student variation at level 1 and neighborhood variation at level 2 and was designed to identify the proportion of explainable variance in student achievement that was associated exclusively with neighborhood differences (rather than between-students differences). It was found that 14.9% of the explainable variance in fifth-grade reading, 11.7% of the variance in eighth-grade reading, 20.2% in fifth-grade mathematics, and 15.3% of eighth-grade mathematics were attributable to neighborhood differences. The covariance parameter estimate underlying each value was statistically significant, indicating that such values are consequential and permit explanation. The values also exceed substantially the 5% criterion suggested by Raudenbush and Bryk37 for considering differences as less explicable. Thus, for each dependent variable, three regressions with means-as-outcomes models were built. Model 1 entered the three census-based dimensions as covariates, model 2 the three city-based dimensions as covariates, and model 3 all six dimensions simultaneously, with all covariates being level 2 (or neighborhood) predictors.
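The neighborhood share of variance in an unconditional two-level model is the intraclass correlation, that is, the between-neighborhood variance divided by the sum of the between-neighborhood and within-neighborhood (between-students) variances. A minimal sketch follows; the function name and the variance components are hypothetical and are not estimates from the study.

```python
def neighborhood_variance_share(tau00, sigma2):
    """Intraclass correlation from an unconditional two-level model.

    tau00  : between-neighborhood (level 2) variance
    sigma2 : between-students, within-neighborhood (level 1) variance
    """
    return tau00 / (tau00 + sigma2)

# Hypothetical variance components for a standardized outcome.
icc = neighborhood_variance_share(tau00=0.15, sigma2=0.85)
exceeds_five_percent = icc > 0.05
```

With these illustrative components, 15% of the variance lies between neighborhoods, which would exceed the 5% criterion mentioned above.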

Table 6 summarizes modeling results for fifth- and eighth-grade reading and Table 7 for corresponding mathematics, including coefficients for fixed effects and the percentage of explained neighborhood differences. In order to enhance interpretation, scores for the dependent variables were standardized to unit deviate form (M = 0, SD = 1), and covariates were grand-mean centered and standardized. Compared to city-archival dimensions, the census dimensions were able to explain more of the neighborhood differences in reading achievement (55.4% vs. 51.9% at grade 5 and 64.0% vs. 58.0% at grade 8).

Table 6
Multilevel models predicting fifth- and eighth-grade PSSA reading scores
Table 7
Multilevel models predicting fifth- and eighth-grade PSSA mathematics scores

Although nearly all of the dimensions contributed significantly to explaining neighborhood effects in reading (the exception being census-based residential stability at grade 8), the contributions of census-based concentrated disadvantage and city-based structural decline and neighborhood crime were more pronounced. Perhaps more revealing were the contributions when census and city dimensions were combined for predicting reading (model 3). Inasmuch as reading models 1 (census dimensions alone) and 2 (city dimensions alone) were nested under model 3 (all dimensions), significance of the increment in model fit was tested by contrasting the −2 log likelihood estimates for the two simpler models against the more complex model, with deviances assumed to be distributed approximately as χ² (3 df). Both contrasts showed that model 3 was more efficient (p < .001) than the other models, as also reflected in the reduction for Akaike’s Information Criterion38 with model 3 vs. 1 and 2 (refer to Table 6). Here, the contributions of concentrated disadvantage remained dominant, whereas structural decline appeared to diminish somewhat in its prominence, albeit statistically significant. Nevertheless, the combined data source accounted for 63.0% of neighborhood differences at grade 5 and 71.5% at grade 8, effectively increasing explanatory power by 7.5% over the best independent predictions of census data and by 11.1–13.5% over the city dimensions.
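The nested-model contrast and the percentage of explained neighborhood differences can both be sketched briefly. The deviance difference between a simpler and a more complex model is referred to a χ² distribution with 3 df, whose upper-tail probability has a closed form, and explained neighborhood variance is taken here as the proportional reduction in level 2 variance relative to the unconditional model. The function names and all numeric inputs are hypothetical.

```python
import math

def chi2_upper_tail_3df(x):
    """Upper-tail probability of a chi-square variate with 3 df."""
    if x <= 0:
        return 1.0
    return (math.erfc(math.sqrt(x / 2))
            + math.sqrt(2 * x / math.pi) * math.exp(-x / 2))

def deviance_contrast(deviance_simple, deviance_complex):
    """Deviance difference and its p value, assuming 3 added parameters."""
    difference = deviance_simple - deviance_complex
    return difference, chi2_upper_tail_3df(difference)

def percent_explained(tau00_unconditional, tau00_conditional):
    """Percentage reduction in between-neighborhood variance."""
    return 100 * (tau00_unconditional - tau00_conditional) / tau00_unconditional

# Invented deviances (-2 log likelihood) and variance components.
difference, p_value = deviance_contrast(80250.0, 80200.0)
explained = percent_explained(tau00_unconditional=0.15, tau00_conditional=0.06)
```

A deviance drop of 50 on 3 df yields a p value far below .001, and a fall in level 2 variance from .15 to .06 corresponds to 60% of neighborhood differences explained.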

These phenomena are, for the most part, echoed for explanation of neighborhood differences in mathematics achievement. Census-based concentrated disadvantage and city-based structural decline and neighborhood crime dominate the accounting of neighborhood effects when the data sources remain independent, and the census information explains more variation than city information (51.7% vs. 47.6% at grade 5 and 61.6% vs. 52.0% at grade 8). Statistical contrasts of the census-alone and city-alone dimensions (as displayed for models 1 and 2 in Table 7) vs. the combined model 3 indicated that the combined model was more effective than either model 1 or 2 (p’s <.001, df = 3). Upon combination of the data sources, the increment to the contribution of census information in isolation is 5.5–6.2%, while the increment to city information is 10.3–15.1%.


Our research explored the use of city-administrative data to inform the context of neighborhoods. The first step resolved a meaningful structure drawn from contemporary Philadelphia archival data originating with many diverse municipal departments, including public health, welfare, police, fire, education, housing, and licensing. We were able to effectively distinguish multiple-marker dimensions focusing on the degree of social stress experienced by people within neighborhoods, the relative state of structural decline for the neighborhood environs, and the measure of felony crime that affected each neighborhood. The resultant dimensions avoided the limitations of measures that convert frequency data to percentages or rates, and the dimensions were weighted such that the differential contributions of the various markers were reflected in summative scores. Each of the three neighborhood dimensions was demonstrated to convey a substantial amount of reliable variability that was uniquely independent of the other dimensions.

Because the neighborhood dimensions might find application in many types of studies that concentrate on certain geographic or demographic subgroups, it was deemed imperative to assess the structural integrity of the dimensions when they are applied exclusively to subsets of neighborhoods. Thus, Philadelphia’s 1,816 census-block neighborhoods were partitioned into sets that manifested incremental population density, sets of neighborhoods with highly concentrated poverty vs. those with lower poverty concentration, and sets of neighborhoods whose residents are distinguished by incremental levels of racial isolation. Massey and Denton39 noted these sociological distinctions as being crucial elements of the economic, political, and behavioral dynamics of urban populations. It was demonstrated that the dimensional structures unique to the various levels of the population density, poverty, and racial isolation constructs were closely or reasonably represented by the overall dimensional structure resolved for Philadelphia. This evidence lends support for the extension of the city dimensions to investigations focused more narrowly on specific types of neighborhoods.

Because the literature has presented over recent years a variety of studies that apply available data from the US Census to describe neighborhoods, we thought it important to test empirically the uniqueness and redundancy of census- and city-based dimensional structures. To this end, we scored Philadelphia’s neighborhoods according to the factor-analytic scheme derived by Sampson et al.12 for the Chicago metropolitan area. A comparison of the city and census dimensions is further facilitated in that the city dimensional structure deliberately avoided use of alternatively available census information. Bimultivariate comparisons of Philadelphia’s neighborhoods, as depicted by the city and census dimensions, revealed some overlap. Both data sources appear to share a common link in that concentrated disadvantage, as extracted from the census, was positively related to all of the city dimensions. Moreover, census-based phenomena describing the stability of residency in neighborhoods were inversely related to neighborhood crime, a finding that makes great sense whether one tends to view instability as a cause of crime or the reverse. Also interesting was the evidence that, notwithstanding some significant overlap in common variance, more than half of the information conveyed by city vs. census dimensions was nonredundant. This invited the prospect that neighborhood characteristics as informed jointly by the data sources could substantially enrich the overall understanding of census-block neighborhoods.

Meaningful tests of the independent and joint utility of city and census dimensions would have to rest on their capacity to inform phenomena that were not by definition tautologically identified with the markers that comprise the neighborhood dimensions. We decided to explore the ability of the neighborhood dimensions to explain the unique neighborhood variability in resident students’ academic achievement. We emphasize neighborhood variability because, whereas abundant evidence demonstrates that individual students’ school performance is related to personal or familial status, poverty, and such, the pertinent interest here is with that aspect of student achievement that is clearly peculiar to the neighborhood rather than the person. Hence, multilevel modeling was used to isolate the achievement phenomena that distinctly differentiated neighborhoods. Once this was accomplished, the investigation put the census and then the city dimensions to work, attempting explanation of neighborhood differences in achievement (viz., the neighborhood effects).
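
To make the notion of neighborhood variability concrete, the sketch below partitions achievement scores into between-neighborhood and within-neighborhood variance and then computes the share of neighborhood variance removed once predictors are added. It substitutes the simple one-way ANOVA estimator for the multilevel linear models actually used in the study, and all scores, variance values, and function names are hypothetical.

```python
from statistics import mean


def variance_components(groups):
    """Estimate (between, within) variance from {neighborhood: [scores]}.

    Uses one-way ANOVA mean squares; n0 is the effective group size,
    which equals the common group size when the design is balanced.
    """
    k = len(groups)
    n_total = sum(len(v) for v in groups.values())
    grand = sum(sum(v) for v in groups.values()) / n_total
    ss_between = sum(len(v) * (mean(v) - grand) ** 2 for v in groups.values())
    ss_within = sum((x - mean(v)) ** 2 for v in groups.values() for x in v)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)
    n0 = (n_total - sum(len(v) ** 2 for v in groups.values()) / n_total) / (k - 1)
    tau = max((ms_between - ms_within) / n0, 0.0)  # between-neighborhood variance
    return tau, ms_within


def icc(groups):
    """Intraclass correlation: share of total variance lying between neighborhoods."""
    tau, sigma2 = variance_components(groups)
    return tau / (tau + sigma2)


def proportion_explained(tau_null, tau_model):
    """Proportional reduction in between-neighborhood variance after adding predictors."""
    return (tau_null - tau_model) / tau_null


# Hypothetical scores for three neighborhoods (not data from the study).
scores = {"A": [10, 12, 14], "B": [20, 22, 24], "C": [30, 32, 34]}
tau, sigma2 = variance_components(scores)
print(f"between = {tau:.2f}, within = {sigma2:.2f}, ICC = {icc(scores):.3f}")

# Hypothetical variance components before and after adding neighborhood dimensions.
print(f"explained = {proportion_explained(100.0, 37.0):.1%}")
```

The second quantity is the kind of figure quoted when a model is said to account for a given percentage of neighborhood differences: it compares the neighborhood-level variance left after adding the dimensions with the variance in a model containing no neighborhood predictors.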

The measures used were the same as those applied by Pennsylvania for all of its high-stakes assessments under federal law. It was learned that most of the neighborhood reading achievement effects were explained by either census or city dimensions and that most or almost all of the neighborhood effects for mathematics were also explained. Although it was evident that some dimensions (census concentrated disadvantage, city crime, and structural decline) carried most of the weight, all of the dimensions played a significant role in some respect. In all of these tests, the census-based dimensions showed some noticeable advantage in accounting for neighborhood differences. This, we suggest, follows from the fact that the census dimensions cover a somewhat broader scope of sociologic information, whereas the city-based dimensions focus more particularly on local variation related to municipal services.

When the city and census dimensions were permitted to reciprocally augment one another through combined models, their explanatory power was increased, and the models more precisely fitted the data. It is not inconsequential to explain 60% to 70% of the distinctions between neighborhoods, as accomplished here. We noted also that the explanatory power of the combined neighborhood dimensions tended to increase from fifth to eighth grade (~10% from the cross-sectional perspective). This might strengthen the argument that student school performance is incrementally affected as students reside and interact over longer periods in urban neighborhoods. There are indeed plentiful examples of cascading effects of macro environment on school performance.40

Two important findings concern the social stress dimension. First, the dimension points to the bundling of social problems (low birth weights, infant mortality, teen pregnancy, child abuse, and neglect) at the neighborhood level. Second, there is high correlation between the social stress and concentrated disadvantage dimensions. When operating in the same equation, the concentrated disadvantage dimension mediates (partially reduces the unique impact of) social stress. Investigations of other cities have shown similar clustering of social problems and correlations with concentrated poverty.2,41 While useful as a summary predictor, concentrated disadvantage does not lend insight into the processes that lead to and result from economic disadvantage, the processes that distinguish neighborhoods, including neighborhoods that may suffer relative economic impoverishment, but for different reasons and with different consequences. Taken together, the social stress dimension sheds light on the social problems that accompany concentrated disadvantage. Thus, unlike census data, city-administrative data can inform the specific levels of social stress experienced by residents.

The story imparted by a neighborhood’s level of structural decline affords another set of perspectives on the processes that distinguish neighborhoods. It speaks to the relative state of the physical surrounds and to the stability of that state. Neighborhoods with elevated scores on this dimension are typically riddled with abandoned and already razed buildings, and those not uninhabitable are often in arrears for tax or utility payments. Structural decline signals many processes that affect neighborhoods. First, there is the psychological impact upon residents surrounded by the constant reminders of deterioration and blight. Second, the abundance of property liens and termination of even basic utilities are symptoms of broader disinvestment in the community. At still another level, we observed in our efforts to produce sound and nonredundant dimensions of structural decline that indicators such as water shut offs and demolitions were often just the more visible tips of icebergs. As many researchers have learned, we saw substantial collinearity among the variables that depict neighborhoods. But the extraordinarily high intercorrelations actually depict many closely linked chains of events as recorded in city data. To illustrate, property disinvestment may manifest itself in delinquent taxes, leading to discontinued water, electric, and gas service, abandonment, fire, condemnation, and demolition. The numerous variables that constitute these chains are thus sequential or coincidental and understandably highly correlated. Whereas sound measurement precludes the use of multiple indicators that are so linked, certain indicators will tend to rise to the top as more stable or viable proxies for many other indicators. Thus, variables such as liens, water shut offs, and demolitions serve to represent much broader stories of neighborhood decline.

While it is difficult to study a causal relationship between structural decline and poor health and well-being outcomes, research has shown associations. Studies have found associations between features of the built environment and self-esteem42 and conduct and oppositional defiant disorders.43 Cohen et al.44 found that boarded-up housing remained a predictor of gonorrhea rates, all-cause premature mortality, and premature mortality due to malignant neoplasms, diabetes, homicide, and suicide after control for sociodemographic factors.

City-archival data also highlight the unique public safety profiles that differentiate neighborhoods. Felony crime, like good schools, is one of those considerations that translate into neighborhood popularity and growth, or lack thereof. Crime is both a cause and an effect of neighborhood differences, and its presence is perhaps the characteristic attracting the media and repelling investment—but not reflected in census data. High values on all three neighborhood dimensions derived from city data comport with the processes hypothesized to negatively influence health and well-being.

Hill et al.45 emphasize that the very presence of neighborhood crime and other distressing signs of disorder and decay (abandoned buildings, vandalism, drug use) results in chronic psychophysiological distress, which in turn leads to poor health outcomes, and Furstenberg46 notes that elevated crime levels in already high-risk neighborhoods function to undermine parents’ ability to navigate successful child-rearing strategies.

Research has shown that exposure to neighborhood crime and violence increases the likelihood of depression, anxiety,47 aggression48 and posttraumatic stress disorder,49 and poor school performance.50

Reliance on city-archival data does present limitations, both known and unknown. Municipal administration data ordinarily are gathered by numerous organizations, and, depending on the government, conventions may be substantially disparate for any given city. Notwithstanding the more socioeconomic foci of national census data and the sampling errors that census data entail,51 there is at least a general strategy and order to the way that data are collected and prepared for analysis. Municipal data, partly because they originate from so many different arms of government and partly because the rules on quality assurance are less universal and, perhaps, less consistently enforced, are more vulnerable to error. Also, the integrity of the collection and preparation of municipal data is probably more dependent on inconstant abilities to fund the process and to avoid deliberate falsification in reporting. It is often difficult to reconcile missing data and, to the extent that many departments collect data, it can be very difficult and expensive to penetrate the bureaucratic boundaries that resist sharing information and to afford the time and expertise to accurately merge data that were never intended to occupy a single dataset. Despite these limitations, we believe that the analysis is informative, especially as it contrasts city-archival and census data under the same limitations.

Our study benefited from the foresight and investment of many Philadelphia agencies determined to pool their data in the hope that the whole would exceed the sum of its parts. Nonetheless, our work was impeded at times by inconstant data linking information, the tendency of solitary data sources to change data definitions over time without clear documentation, and the politics of asking much of many. We are indebted to the City of Philadelphia and the University of Pennsylvania for their close cooperation in creating the KIDS, NIS, and other CML data networks. Our study is perhaps also limited by the use of explanatory data (as represented by the city neighborhood dimensional structure) and outcome data (school achievement) having been generated by the same city-administrative system. This is offset somewhat by the use of the US Census data, but it does encourage a search for multiple independent sources of information. Finally, we are limited by the fact that most city-archival data were never intended for research purposes. Researchers will always be looking for data that might better answer the questions really cared about, rather than the questions evoked by the available data.

Our research has allowed census-block groups to define the geographic boundaries of neighborhoods. This is defensible from the standpoint that block groups are large enough to provide reasonable within-neighborhood variability and small enough to avert the risk of insensitivity to important variations in socioeconomic, ethnic, and structural distinctions that vary widely across larger boundary limits (e.g., census tracts). Block groups also tend to comport frequently with local service precinct and ward concepts. There exists, nonetheless, a wide variety of conventions that might alternatively be adopted to define neighborhoods (parishes, schools, recreation facilities, shopping), although their boundaries may tend to be more indefinite and pertinent data are less available. Moreover, block-group boundaries may not adequately represent the locations of the phenomena that most influence human lives. To illustrate, consider those block-group residents living near the boundaries. Are not the adjacent block groups as or more relevant than the assigned block group? And what of the many city block groups that would partition neighborhoods by the midline of streets? To some degree, this problem may be mitigated through spatial weighting techniques, which would account for spatial autocorrelation that might exist for specific variables. Thus, to the extent that much of the data pertaining to block groups originate at the individual person level and are thereafter aggregated over many people, it is possible to weight environmental influences by their proximity to the individual people presumably influenced. Statistical point pattern analysis52 returns focus to the individual’s personal location and builds the neighborhood construct around that location. Such methods may prove more sensitive than aggregated data within fixed boundaries for neighborhood definition. Spatial techniques might also be integrated with temporal measures that assess the duration of exposure associated with particular locations.
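
One simple way to weight environmental influences by proximity is inverse-distance weighting, sketched below. This is offered only as an illustration of the general idea and is not a method applied in this study. The coordinates, indicator values, distance-decay exponent, and function name are all assumptions.

```python
import math


def inverse_distance_weighted(person_xy, centroids, values, power=2.0):
    """Proximity-weighted average of block-group values around one person.

    person_xy: (x, y) location of the individual
    centroids: list of (x, y) block-group centroids
    values:    indicator value for each block group (e.g., a crime score)
    power:     distance-decay exponent (2.0 is a common but arbitrary choice)
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for (cx, cy), value in zip(centroids, values):
        distance = math.hypot(person_xy[0] - cx, person_xy[1] - cy)
        if distance == 0.0:
            return float(value)  # person sits exactly on a centroid
        weight = 1.0 / distance ** power
        weighted_sum += weight * value
        weight_total += weight
    return weighted_sum / weight_total


# Hypothetical example: a resident one unit from one centroid and two from another.
exposure = inverse_distance_weighted(
    person_xy=(0.0, 0.0),
    centroids=[(1.0, 0.0), (2.0, 0.0)],
    values=[10.0, 20.0],
)
print(exposure)  # nearer block group dominates: (1*10 + 0.25*20) / 1.25 = 12.0
```

In this toy case the resident's exposure is pulled toward the value of the nearer block group instead of being fixed at the value of the block group to which the resident is administratively assigned.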


There is now a general appreciation for the fact that individual and collective behavior can be studied separately. The individual and the neighborhood (with its many individuals collectively and its structural conditions) each play an important part in child growth and well-being. Policy makers and researchers both know that context makes a big difference in forming human lives. It is exceedingly challenging to study important aspects of many contexts (child rearing, family life, etc.) without considerable expense and intrusion—prices that are usually reserved for desperate and emergency situations (child protection, police, etc.). It makes reasonable sense, though, that researchers take full advantage of the many general archival information sources that focus on child development in context. Neighborhood information is available in many cases, and the brunt of its expense is already absorbed by the taxpayers. We suggest that many social science research inquiries could be enhanced substantially by the normal archival information reflecting neighborhoods’ unique and relative contributions. The augmentation of periodic census data and the multilevel statistical techniques to disentangle corporate neighborhood influences from the individual differences that distinguish people would appear to make the prospects more promising.


This research was conducted with cooperation from the Kids Integrated Data System of the City of Philadelphia, PA, USA, in conjunction with the Cartographic Modeling Laboratory, University of Pennsylvania, Philadelphia, PA, USA.


1Wilks’s lambda is used to test the significance of the first canonical correlation. If p < .05, the two sets of variables are significantly associated by canonical correlation.

2Rc represents the canonical correlation, which is a form of correlation relating two sets of variables. Canonical correlations are interpreted the same as Pearson’s r; their square is the percent of variance in the canonical variate of one set of variables explained by the canonical variate for the other set along the dimension represented by the given canonical correlation.


1. Ellen IG, Mijanovich T, Dillman KN. Neighborhood effects on health: exploring the links and assessing the evidence. J Urban Aff. 2001;23(3):391–408.
2. Sampson RJ, Morenoff J, Gannon-Rowley T. Assessing “neighborhood effects”: social processes and new directions in research. Annu Rev Sociol. 2002;28:443–478.
3. Caughy MO, O’Campo PJ, Patterson J. A brief observational measure for urban neighborhoods. Health Place. 2001;7(3):225–236.
4. Raudenbush SW, Sampson RJ. Ecometrics: toward a science of assessing ecological settings, with application to the systematic social observation of neighborhoods. Sociol Methodol. 1999;29(1):1–41.
5. Mujahid MS, Diez Roux AV, Morenoff JD, Raghunathan T. Assessing the measurement properties of neighborhood scales: from psychometrics to ecometrics. Am J Epidemiol. 2007;165(8):858–867.
6. Chow J, Coulton C. Was there a social transformation of urban neighborhoods in the 1980s? A decade of worsening social conditions in Cleveland, Ohio, USA. Urban Stud. 1998;35(8):1359–1375.
7. Rajaratnam JK, Burke JG, O’Campo P. Maternal and child health and neighborhood context: the selection and construction of area level variables. Health Place. 2005;12(4):547–556.
8. Jencks C, Mayer S. The social consequences of growing up in a poor neighborhood. In: Lynn LE, McGeary MFH, eds. Inner-city Poverty in the United States. Washington, D.C.: National Academy; 1990:111–186.
9. U.S. Census Bureau. Census 2000 Data Engine CD-ROM [CD-ROM]. Orange: SRC, LLC; 2000.
10. Duncan GJ, Raudenbush SW. Getting context right in quantitative studies of child development. In: Thornton A, ed. The Well-Being of Children and Families: Research and Data Needs. Ann Arbor: University of Michigan Press; 2001:356–383.
11. Hogan JW, Tchernis R. Bayesian factor analysis for spatially correlated data with application to summarizing area-level material deprivation from census data. J Am Stat Assoc. 2004;99(466):314–324.
12. Sampson RJ, Morenoff JD, Earls F. Beyond social capital: spatial dynamics of collective efficacy for children. Am Sociol Rev. 1999;64:633–660.
13. Spencer MB, McDermott PA, Burton LM, Kochman TJ. An alternative approach to assessing neighborhood effects on early adolescent achievement and problem behavior. In: Brooks-Gunn J, Duncan GJ, Aber JL, eds. Neighborhood Poverty: Vol. 2. Policy Implications in Studying Neighborhoods. New York: Russell Sage Foundation; 1997:145–163.
14. Virnig BA, McBean M. Administrative data for public health surveillance and planning. Annu Rev Public Health. 2001;22:213–230.
15. Burton LM, Price-Spratlen T, Spencer MB. On ways of thinking about measuring neighborhoods: implications for studying context and developmental outcomes for children. In: Brooks-Gunn J, Duncan GJ, Aber JL, eds. Neighborhood Poverty: Vol. 2. Policy Implications in Studying Neighborhoods. New York: Russell Sage Foundation; 1997:23–47.
16. Goodman AC. A comparison of block group and census tract data in a hedonic housing price model. Land Econ. 1977;53(4):483–487.
17. Pennsylvania Department of Education. Assessment. Available at: Accessed on: November 10, 2006.
18. U.S. Department of Education. No Child Left Behind, Accountability and Adequate Yearly Progress (AYP). Presented at the National Title I Directors’ Conference. Available at: Accessed on: November 17, 2006.
19. Thacker AA, Dickinson ER. Item Content and Difficulty Mapping by Form and Item Type for the 2001–03 Pennsylvania System of School Assessment (PSSA). Alexandria: HumRRO; 2004.
20. Thacker AA, Dickinson ER, Koger ME. Relationships among the Pennsylvania System of School Assessment (PSSA) and Other Commonly Administered Assessments. Alexandria: HumRRO; 2004.
21. Snook SC, Gorsuch RL. Component analysis versus common factor analysis: a Monte Carlo study. Psychol Bull. 1989;106(1):148–154.
22. Cattell RB. The scree test for the number of factors. Multivariate Behav Res. 1966;1(2):245–276.
23. Velicer WF. Determining the number of components from the matrix of partial correlations. Psychometrika. 1976;41(3):321–327.
24. Buja A, Eyuboglu N. Remarks on parallel analysis. Multivariate Behav Res. 1992;27(4):509–540.
25. Geweke JF, Singleton KJ. Interpreting the likelihood ratio statistic in factor models when sample size is small. J Am Stat Assoc. 1980;75(369):133–137.
26. Fabrigar LR, Wegener DT, MacCallum RC, Strahan EJ. Evaluating the use of exploratory factor analysis in psychological research. Psychol Methods. 1999;4(3):272–299.
27. Yates A. Multivariate Exploratory Data Analysis: a Perspective on Exploratory Factor Analysis. Albany: State University of New York Press; 1987.
28. Wood JM, Tataryn DJ, Gorsuch RL. Effects of under- and overextraction on principal axis factor analysis with varimax rotation. Psychol Methods. 1996;1:354–365.
29. Anderberg MR. Cluster Analysis for Applications. New York: Academic; 1973.
30. Harman HH. Modern Factor Analysis. 3rd edn. Chicago: University of Chicago Press; 1976.
31. Brown TA. Confirmatory Factor Analysis for Applied Research. New York: Guilford; 2006.
32. Hu L, Bentler PM. Cutoff criteria for fit indexes in covariance structure analysis: conventional criteria versus new alternatives. Struct Equ Modeling. 1999;6(1):1–55.
33. Guadagnoli E, Velicer W. A comparison of pattern matching indices. Multivariate Behav Res. 1991;26(2):323–343.
34. Miller JK, Farr SD. Bimultivariate redundancy: a comprehensive measure of interbattery relationship. Multivariate Behav Res. 1971;6(3):313–324.
35. Van den Wollenberg AL. Redundancy analysis—an alternative to canonical correlation analysis. Psychometrika. 1977;42(2):207–219.
36. Landis JR, Koch GG. An application of hierarchical kappa-type statistics in the assessment of majority agreement among multiple observers. Biometrics. 1977;33(2):363–374.
37. Raudenbush SW, Bryk AS. Hierarchical Linear Models: applications and Data Analysis Methods. 2nd edn. Thousand Oaks: Sage; 2002.
38. Burnham KP, Anderson DR. Multimodel inference: understanding AIC and BIC in model selection. Sociol Methods Res. 2004;33(2):261–304.
39. Massey D, Denton N. Suburbanization and segregation in U.S. metropolitan areas. Am J Sociol. 1988;94(3):592–626.
40. Crano WD, Kenny DA, Campbell DT. Does intelligence cause achievement? J Educ Psychol. 1972;63(3):671–684.
41. Coulton CJ, Korbin JE, Su M. Measuring neighborhood context for young children in an urban area. Am J Community Psychol. 1996;24(1):5–32.
42. Haney TJ. “Broken windows” and self-esteem: subjective understanding of neighborhood poverty and disorder. Soc Sci Res. 2007;36(3):968–994.
43. Aneshensel CS, Sucoff CA. The neighborhood context of adolescent mental health. J Health Soc Behav. 1996;37:293–310.
44. Cohen DA, Mason K, Bedimo A, Scribner R, Basolo V, Farley TA. Neighborhood physical conditions and health. Am J Public Health. 2003;93(3):467–471.
45. Hill TD, Ross CE, Angel RJ. Neighborhood disorder, psychophysiological distress, and health. J Health Soc Behav. 2005;46(2):170–186.
46. Furstenberg FF. How families manage risk and opportunity in dangerous neighborhoods. In: Wilson WJ, ed. Sociology and the Public Agenda. Newbury Park: Sage; 1993:231–258.
47. Leventhal T, Brooks-Gunn J. Moving to opportunity: an experimental study of neighborhood effects on mental health. Am J Public Health. 2003;93(9):1576–1582.
48. Linares LO, Heeren T, Bronfman E, Zuckerman B, Augustyn M, Tronick E. A mediational model for the impact of exposure to community violence on early child behavior problems. Child Dev. 2001;72(2):639–652.
49. Garbarino J, Bradshaw CP, Vorrasi JA. Mitigating the effects of gun violence on children and youth. Future Child. 2002;12:73–85.
50. Bowen NK, Bowen GL. Effects of crime and violence in neighborhoods and schools on the school behavior and performance of adolescents. J Adolesc Res. 1999;14(3):319–342.
51. Wolter KM. Some coverage error models for census data. J Am Stat Assoc. 1986;81(394):338–346.
52. Lloyd CD. Local Models for Spatial Analysis. Boca Raton: CRC/Taylor & Francis; 2006.

Articles from Journal of Urban Health : Bulletin of the New York Academy of Medicine are provided here courtesy of New York Academy of Medicine