Study subjects
Information on all cases of CNS tumours in 0-14 year olds diagnosed in the former Yorkshire Regional Health Authority during the period 1st January 1974 to 31st December 2006 was extracted from the Yorkshire Specialist Register of Cancer in Children and Young People (YSRCCYP) [
17]. The YSRCCYP is a specialist population-based cancer registry covering an area of 12,000 km², which ranges from highly urbanised conurbations such as Leeds and Bradford in West Yorkshire to isolated rural areas such as the North York Moors in North Yorkshire. The socio-demographic profile of the Yorkshire region has been shown to be representative of the UK as a whole [
18]. The YSRCCYP is exempted (originally under Section 60 of the UK Health and Social Care Act 2001, which has now been superseded by Section 251 of the National Health Service Act 2006) from the need to obtain patient consent for recording and analysis of data. The original ethical approval for the YSRCCYP was granted by the Northern and Yorkshire Research Ethics Committee in April 2000 (reference MREC 0/3/1) which allows epidemiological research, including space-time clustering, to be conducted using Register data.
Cases were ascertained from hospital clinics and neuropathology departments across the Region, and further validation checks for completeness were carried out with the National Registry of Childhood Tumours (http://www.ccrg.ox.ac.uk) and the Northern and Yorkshire Cancer Registry and Information Service (http://www.nycris.org.uk). Of all diagnoses recorded on the YSRCCYP, 85% have been histologically verified, and a case review of all CNS tumours on the Register (undertaken by a single experienced neuropathologist) was carried out in 2004 to validate tumour classification [
19].
Malignant and certain benign CNS tumours falling within Group III of the International Classification of Childhood Cancer (ICCC), based on ICD-O-2 morphology and site codes, were included in the analysis [
20]. The following diagnostic groups were specified a priori for analysis: (i) ependymoma (ICCC code III(a)); (ii) astrocytoma (ICCC code III(b)); (iii) ependymoma and astrocytoma (ICCC codes III(a) and III(b)); (iv) PNET (ICCC code III(c)); (v) other gliomas, e.g. oligodendroglioma, mixed glioma, other glioma situated outside the optic nerve (ICCC code III(d)); (vi) other specified and unspecified CNS tumours (ICCC codes III(e&f)); and (vii) all CNS tumours (ICCC codes III(a-f)) [
21]. All CNS tumours, except intracranial germ cell tumours, are captured by the ICCC IIIa-f codes. Benign tumours included cases of ependymoma, other gliomas, other specified intracranial and intraspinal neoplasms and unspecified intracranial and intraspinal neoplasms.
In the UK there are around 1.7 million postcodes, which are primarily used for postal delivery. A typical postcode may include around fifteen to twenty houses, a smaller number of multiple occupancy residences, or a single commercial address [
22]. For each case, the Ordnance Survey (OS) four-digit Easting and Northing grid references of the postcode centroid were allocated to the residential address at birth and at diagnosis. This geo-referenced each residential address to within 0.1 km.
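As an illustration of how distances between cases follow from these grid references, the sketch below computes the straight-line distance between two postcode centroids. It assumes that one unit of a four-digit Easting or Northing corresponds to 0.1 km; the function name and the example coordinates are hypothetical and are not taken from the Register.

```python
import math

# Assumption: a four-digit grid reference counts units of 100 m,
# so 10 grid units make up 1 km.
GRID_UNITS_PER_KM = 10


def distance_km(ref_a, ref_b):
    """Euclidean distance in km between two (easting, northing) pairs."""
    d_east = (ref_a[0] - ref_b[0]) / GRID_UNITS_PER_KM
    d_north = (ref_a[1] - ref_b[1]) / GRID_UNITS_PER_KM
    return math.hypot(d_east, d_north)


# Hypothetical centroids 30 units apart in Easting and 40 in Northing.
print(distance_km((4300, 4340), (4330, 4380)))
```

Because the grid references are rounded to the nearest 0.1 km, distances computed this way carry the same rounding error as the geo-referencing itself.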
Statistical methods
Overall space-time clustering was studied using an approach based on K-functions, which may be considered to be a generalised version of the Knox test [23, 24]. These methods have been used in previous work related to space-time clustering of childhood cancer, type 1 diabetes and congenital anomalies [12, 13, 25, 26]. The Knox test regards a pair of cases as being in "close proximity" if both their times of diagnosis and their residential addresses at diagnosis are close. The number of pairs of cases observed to be in close proximity is counted and denoted O. The number of pairs of cases expected to be in close proximity, assuming independence of spatial and temporal proximity, is calculated and denoted E. If O is greater than E, then a significance test is used to determine if there is evidence of space-time clustering. An estimate of the "strength of clustering" is obtained by calculating S = [(O − E)/E] × 100. A related quantity is defined as R = O/√E.
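The Knox calculation for a single pair of critical values can be sketched as follows. This is a minimal illustration, not the FORTRAN program used in the study. It assumes that cases are given as (x, y, time) tuples in km and years, that "close" means less than or equal to the critical value, and that E is the product of the numbers of space-close and time-close pairs divided by the total number of pairs. The function name and the example data are hypothetical.

```python
import math
from itertools import combinations


def knox_counts(cases, s_km, t_years):
    """Return (O, E) for cases given as (x_km, y_km, time_years) tuples."""
    n_pairs = n_space = n_time = n_both = 0
    for (x1, y1, t1), (x2, y2, t2) in combinations(cases, 2):
        close_space = math.hypot(x1 - x2, y1 - y2) <= s_km
        close_time = abs(t1 - t2) <= t_years
        n_pairs += 1
        n_space += int(close_space)
        n_time += int(close_time)
        n_both += int(close_space and close_time)
    # Expected count if spatial and temporal proximity are independent.
    expected = n_space * n_time / n_pairs
    return n_both, expected


# Hypothetical data: four cases, s = 1 km, t = 1 year.
cases = [(0.0, 0.0, 0.0), (0.5, 0.0, 0.1), (10.0, 10.0, 0.2), (10.3, 10.0, 5.0)]
observed, expected = knox_counts(cases, s_km=1.0, t_years=1.0)
r_value = observed / math.sqrt(expected)  # R = O/sqrt(E), as defined in the text
```

In this toy data set two pairs are close in space and three are close in time, out of six pairs in total, so E = 2 × 3 / 6 = 1 and one pair is close in both.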
The Knox test has a particular limitation: the choice of critical values is entirely arbitrary. The test uses a single set of critical values for defining close proximity in space and time (e.g. "close in space", denoted s = 1 km, and "close in time", denoted t = 12 months). Selection of a number of different critical values and subsequent repetition of the Knox analysis would result in multiple testing. A simplification of the K-function method has been used to partially avoid the arbitrary choice of critical values and therefore avoid multiple testing [23]. This approach involved a simultaneous set of 225 calculations (15 temporal × 15 spatial critical values), each similar to a single Knox calculation, to obtain values of R. Critical values changed over a pre-specified set of close values in time (t = 0.1, 0.2,...,1.5 years) and close values in space (s = 0.5, 1.0, 1.5,..., 7.5 km). The observed value of the K-function, K_O, was obtained by summing the 225 calculated values of R(s,t), i.e. K_O = ∑_{s,t} R(s,t), and the distribution of the K-function was simulated using 999 random permutations of time. At each simulation, dates of birth (or dates of diagnosis) were randomly reallocated to each of the cases in the data set, creating a simulated value of the K-function. Note that the Knox test corresponds to a single dimension K-function where there is only one set of critical values. Statistical significance was assessed by comparing the observed value of the K-function with the simulated distribution.
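The summation over the 225 critical-value pairs and the permutation of times can be sketched as below. This is an illustrative outline under stated assumptions rather than the program used for the analyses: R is taken as O/√E as defined above, combinations with E = 0 are skipped, and the one-sided Monte Carlo P value is computed as (number of simulated values ≥ observed + 1)/(number of simulations + 1), which is a common convention and not one stated in the text. All function names are hypothetical.

```python
import math
import random
from itertools import combinations

S_GRID = [k / 2 for k in range(1, 16)]   # 0.5, 1.0, ..., 7.5 km
T_GRID = [k / 10 for k in range(1, 16)]  # 0.1, 0.2, ..., 1.5 years


def k_statistic(xy, times):
    """Sum of R(s, t) = O/sqrt(E) over all 225 critical-value pairs."""
    pairs = list(combinations(range(len(xy)), 2))
    dist = [math.dist(xy[i], xy[j]) for i, j in pairs]
    lag = [abs(times[i] - times[j]) for i, j in pairs]
    total = 0.0
    for s in S_GRID:
        n_space = sum(d <= s for d in dist)
        for t in T_GRID:
            n_time = sum(g <= t for g in lag)
            observed = sum(d <= s and g <= t for d, g in zip(dist, lag))
            expected = n_space * n_time / len(pairs)
            if expected > 0:  # assumption: skip undefined terms
                total += observed / math.sqrt(expected)
    return total


def permutation_p_value(xy, times, n_sim=999, seed=0):
    """One-sided Monte Carlo P value from random reallocation of times."""
    rng = random.Random(seed)
    observed = k_statistic(xy, times)
    shuffled = list(times)
    n_exceed = 0
    for _ in range(n_sim):
        rng.shuffle(shuffled)  # reallocate times among the fixed locations
        if k_statistic(xy, shuffled) >= observed:
            n_exceed += 1
    return observed, (n_exceed + 1) / (n_sim + 1)
```

Locations are held fixed and only the times are permuted, so the marginal spatial and temporal distributions of the cases are preserved in every simulated data set.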
Unlike the Knox test, the K-function does not give a readily available measure of the size of the clustering effect. Hence S (obtained from the Knox test, with critical spatial values s = 0.5,...,7.5 km and critical temporal values t = 0.1,...,1.5 years) was used to describe the magnitude of the clustering effects for a given pair of critical values. Additionally, the nominal statistical significance of each value of S was assessed using the Poisson distribution. To enable comparisons to be made between the geographical distance and nearest neighbour (NN) metrics (see below), an overall indicator of the strength of clustering was obtained using

(where i refers to the ith combination of s and t).
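The nominal Poisson significance for a single combination of critical values can be illustrated as follows. The sketch assumes a one-sided upper-tail probability, i.e. the probability of observing at least O close pairs when the count follows a Poisson distribution with mean E; the function name is hypothetical.

```python
import math


def poisson_upper_tail(observed, expected):
    """P(X >= observed) for X ~ Poisson(expected), with expected > 0."""
    # Sum P(X = k) for k < observed in log space to avoid overflow.
    below = sum(
        math.exp(k * math.log(expected) - expected - math.lgamma(k + 1))
        for k in range(observed)
    )
    return 1.0 - below


# Hypothetical values: 12 close pairs observed where 6.5 were expected.
p_nominal = poisson_upper_tail(12, 6.5)
```

Such P values are nominal only, because the 225 combinations of critical values are not independent of one another.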
If clustering has arisen due to a geostationary exposure, then this could lead to detection only by the fixed geographical distance threshold. Alternatively, if clustering has arisen due to an infective process, then this could lead to detection only by the variable NN threshold. If clustering is due to an infective process, then it must be noted that analysis based on a NN metric is likely to be more appropriate when both urban and rural areas are included. Any specified distance between two cases will have different meanings in urban and rural locations. For example, the size of school catchment areas will differ greatly. Using the NN metric the specification of critical values for "close in space" is not fixed, but determined empirically by the local density of the spatially heterogeneous underlying population. Using the
nth NN, two cases were close in space if the location of one (or both) of the cases was nearer than the other's nth NN in the total data set (of all birth and diagnosis addresses). Thus the number of these pairs of cases observed to be in close proximity was counted. To adjust for variations in population densities, we repeated the K-function analyses by replacing fixed geographical distances with variable distances to the (N − 7)th,...,(N + 7)th NNs if N ≥ 8 and with variable distances to the 1st,...,15th NNs if N ≤ 7. N was chosen so that the mean distance was around 5 km, thus N = 3 for birth addresses (the fixed geographical distances were replaced by variable distances to the 1st,...,15th NNs) and N = 12 for diagnosis addresses (the fixed geographical distances were replaced by variable distances to the 5th,...,19th NNs). The use of a single threshold NN approach was originally proposed by Jacquez [
27].
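The NN definition of spatial closeness can be sketched as below. This is one reading of the definition, offered as an assumption: two cases are treated as close if the distance between them does not exceed the distance from either case to its own nth NN. For simplicity the sketch draws neighbours from a single list of locations, whereas the study used the total data set of birth and diagnosis addresses. Function names are hypothetical.

```python
import math


def nth_nn_distance(points, i, n):
    """Distance from point i to its nth nearest neighbour (n >= 1)."""
    dists = sorted(math.dist(points[i], p) for k, p in enumerate(points) if k != i)
    return dists[n - 1]


def close_in_space_nn(points, i, j, n):
    """True if j lies within i's nth NN distance, or i within j's."""
    d_ij = math.dist(points[i], points[j])
    return d_ij <= nth_nn_distance(points, i, n) or d_ij <= nth_nn_distance(points, j, n)


def nn_orders(n_centre):
    """The 15 NN orders that replace the 15 fixed distances."""
    if n_centre >= 8:
        return list(range(n_centre - 7, n_centre + 8))
    return list(range(1, 16))
```

With this rule the critical distance shrinks in densely populated areas and grows in sparsely populated ones, which is the stated purpose of the NN metric.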
The distributions of distances between the 3rd NNs for births and the 12th NNs for diagnoses were highly skewed, with median distances of 1.2 km and 2.9 km respectively. An exact geographically based match to the underlying population distribution was not available. Thus we used the case distribution as a proxy for the underlying population distribution to test whether population density was associated with space-time clustering. Cases were divided into two groups: 50% in a "more densely populated" group and 50% in a "less densely populated" group, according to whether the 3rd NN (for births) or 12th NN (for diagnoses) was closer or further away than the median distance. There are then three possible ways in which pairs of cases may be in close proximity: (i) a case from a "more densely populated" area may be in close proximity to another case from a "more densely populated" area; (ii) a case from a "less densely populated" area may be in close proximity to another case from a "less densely populated" area; or (iii) a case from a "more densely populated" area may be in close proximity to a case from a "less densely populated" area. Therefore, if we are interested in whether cases from a "more densely populated" area show a tendency to cluster, it does not matter whether partner cases are from either a more or less densely populated area. Thus, population density analyses proceeded by analysing pairs of cases that included at least one case from a "more densely populated" area (i.e. "more densely populated: any" case pairs) and pairs of cases that included at least one case from a "less densely populated" area (i.e. "less densely populated: any" case pairs).
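The division of cases by population density and the selection of case pairs can be illustrated as follows. The sketch assumes that a case whose nth NN distance is strictly below the median is labelled "more" densely populated and all others "less"; the handling of ties is not specified in the text. Function names and labels are hypothetical.

```python
import math
from statistics import median


def density_groups(points, n):
    """Label each case 'more' or 'less' by its nth NN distance vs the median."""
    nn_dist = [
        sorted(math.dist(p, q) for k, q in enumerate(points) if k != i)[n - 1]
        for i, p in enumerate(points)
    ]
    cut = median(nn_dist)
    return ["more" if d < cut else "less" for d in nn_dist]


def pair_in_analysis(groups, i, j, target):
    """True if the pair (i, j) includes at least one case from `target`."""
    return groups[i] == target or groups[j] == target


# Hypothetical locations on a line, using the 1st NN for brevity.
points = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (10.0, 0.0), (20.0, 0.0), (40.0, 0.0)]
groups = density_groups(points, n=1)
```

A pair containing one case from each group satisfies `pair_in_analysis` for both targets, so it contributes to both the "more densely populated: any" and the "less densely populated: any" analyses.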
It has been argued that population shifts may cause artificial space-time clustering [28, 29]. We were not able to analyse population shifts, because this would require data on small area population estimates for short time intervals, which are not available. If population shifts led to space-time clustering we would predict that this would only occur within a specific sub-period. Thus, we also analysed space-time clustering within two shorter time periods (1974-1990 and 1991-2006).
As a supplementary analysis, Kulldorff's scan statistic based on a space-time permutation model was used to identify individual clusters [
30] and examine geographical and spatial patterning between covariates (and thus this method is distinct from the Knox and K-function methods, which analyse overall space-time clustering patterns). The complete study region and time span were scanned by construction of a three-dimensional cylindrical moving window. The base of the cylinder represents two-dimensional geographical space and the height represents time. The base and height of this cylinder vary so that they include at most 10% of the entire time span and at most 10% of the entire geographical area. The variable base is centred on the postcode centroid of each case [
31]. This method has been used previously in an analysis of childhood leukaemia data [
32]. The scan statistic was applied to test for differences in the propensity to cluster between gender and levels of population density, using a Bernoulli-based model [
33]. This method is a case-control approach where one stratum (e.g. males) is treated as the case group and the other stratum (e.g. females) is treated as the control group. Thus the test assesses differences between the spatio-temporal distributions of the two groups. These scan statistics were calculated using the geographical locations of the addresses (OS grid references of residence at birth or diagnosis) and temporal reference (date of birth or date of diagnosis).
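The observed and expected counts for a single cylinder under a space-time permutation model can be sketched as below. This is not SaTScan code. It assumes that the expected count is the number of cases inside the circular base (over all time) multiplied by the number of cases inside the time window (over all space), divided by the total number of cases. The function name and example data are hypothetical, and the likelihood ratio and the search over all cylinders are omitted.

```python
import math


def cylinder_counts(cases, centre, radius_km, t_start, t_end):
    """Return (observed, expected) for cases given as (x_km, y_km, time)."""
    in_base = [math.dist((x, y), centre) <= radius_km for x, y, _ in cases]
    in_window = [t_start <= t <= t_end for _, _, t in cases]
    observed = sum(b and w for b, w in zip(in_base, in_window))
    # Expected count if a case's location and time are independent.
    expected = sum(in_base) * sum(in_window) / len(cases)
    return observed, expected


# Hypothetical data: a 1 km base centred on the first case, 2000 to 2001.
cases = [(0.0, 0.0, 2000.0), (0.5, 0.0, 2000.5), (0.2, 0.1, 1990.0), (30.0, 30.0, 2000.2)]
counts = cylinder_counts(cases, centre=(0.0, 0.0), radius_km=1.0, t_start=2000.0, t_end=2001.0)
```

Here three cases fall inside the base and three inside the time window, giving an expected count of 3 × 3 / 4 = 2.25 against an observed count of 2.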
Four possible space-time interactions were analysed: those between (i) times and places of birth; (ii) time of diagnosis and place of birth; (iii) time of birth and place of diagnosis; and (iv) times and places of diagnosis. The interpretation of these interactions depends on the extent of residential movement between birth and diagnosis among the cases. If there was no residential movement then there would only be two interactions (time of birth or diagnosis with place of domicile). An interaction based on birth would indicate that cases who resided close to one another were also born at close points in time, indicating that they shared a similar environment at birth. An interaction based on diagnosis would indicate that cases who resided close to one another were also diagnosed at similar times, suggesting that they shared a similar environment at diagnosis. However, more than approximately 60% of children moved between birth and diagnosis, indicating that residential movements need to be taken into account. Thus a time of birth/place of birth interaction would suggest a transient environmental exposure affecting children in-utero or shortly after birth and that there is a variable latent period between exposure and diagnosis. A time of diagnosis/place of diagnosis interaction would suggest an exposure around diagnosis place and close to diagnosis time with a short latent period. A time of diagnosis/place of birth interaction would indicate an exposure at a heterogeneous time after birth, with a constant latent period. A time of birth/place of diagnosis interaction would suggest an exposure around residence at diagnosis, affecting those born at similar times with a short latent period (for a more detailed description see Birch and colleagues [
34]).
K-function and Knox analyses were done using programs written in FORTRAN 90 [35] and Kulldorff's scan statistic was performed using SaTScan v7.0 [36].
Statistical significance (P < 0.05) was evaluated using one-sided tests and 999 simulations for both the K-function analyses and the scan statistic.