In total, 202 units were recruited across the ten countries. No centre had a data entry error rate over 5%, so no complete double data entry was required. Of the 202 units, 93 (46%) were in the inner city, 73 (36%) in the suburbs and 37 (18%) in the country. The majority (120, 59%) were community based, 47 (23%) were hospital wards and 35 (17%) were units within the hospital grounds. Their size ranged from five to 320 beds (mean 30, median 19); 162 (80%) had no maximum length of stay and, of those that did, the mean was 1.8 years (range 0.5 to 5, median 2). Thirty-three (16%) units were for men only and 18 (9%) for women only. Table shows the characteristics of units recruited in each country. Inter-rater reliability testing of the toolkit through fully independent data collection (a second rater repeating the interview) was carried out in only one case.
Characteristics of included units and inter-rater reliability testing method
Sixteen items had a narrow range of response (Figure ).
Reasons for dropping toolkit items.
The results of the inter-rater reliability testing are shown in Additional file 1. Only one item had poor inter-rater reliability (How many CBT appointments are usually offered?) but it was retained with an amended response structure.
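As an illustration only, the agreement between two raters on a single categorical toolkit item could be summarised with Cohen's kappa. The ratings below and the choice of kappa are assumptions for this sketch; the statistic actually reported in Additional file 1 is not specified in this excerpt.

```python
# Minimal sketch of an inter-rater reliability check for one toolkit item.
# Paired ratings are hypothetical; Cohen's kappa is used for illustration.
from sklearn.metrics import cohen_kappa_score

rater_a = ["weekly", "none", "monthly", "weekly", "none"]
rater_b = ["weekly", "none", "weekly",  "weekly", "none"]

kappa = cohen_kappa_score(rater_a, rater_b)
print(f"Cohen's kappa for this item: {kappa:.2f}")
```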
Of the 202 managers interviewed, 189 (94%) thought the toolkit questions were relevant/very relevant to their unit and 178 (88%) thought the results would be useful/very useful in auditing the quality of their unit. Of the 202 interviews carried out, the researchers reported that 143 (71%) took between one and two hours, 43 (21%) took less than an hour and 15 (7%) took over two hours. There were problems in accessing information in 37 (18%) interviews.
The toolkit was refined through discussion with the PSC and international expert panel in light of the results. The 16 items with a narrow range of response were dropped, along with nine others for the reasons shown in Figure . Eight items were merged with another item, three were amended from a single answer to categorical response options, and one item was added (total number of staff employed by or visiting the unit). The final toolkit comprised 145 questions.
In the initial allocation of scored items to domains, 25 were allocated to Living Environment, 42 to Therapeutic Environment, 34 to Treatments and Interventions, 32 to Self-management and Autonomy, eight to Social Policy, Citizenship and Advocacy, eight to Clinical Governance, 19 to Social Interface, 30 to Human Rights and 25 to Recovery Based Practice. The following pairs of domains shared 50% or more of their items: all Social Policy, Citizenship and Advocacy questions were also in Human Rights; 72% of Recovery Based Practice questions were in Therapeutic Environment; 64% of Recovery Based Practice questions were in Self-management and Autonomy; 60% of Human Rights questions were in Self-management and Autonomy; 53% of Social Interface questions were in Treatments and Interventions; 50% of Clinical Governance questions were in Human Rights and 50% were in Therapeutic Environment.
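The overlap figures above follow directly from the item-to-domain allocation. A minimal sketch of the arithmetic is shown below; the item identifiers and the allocation dictionary are hypothetical, not the actual toolkit content.

```python
# Illustrative computation of pairwise domain overlap from an item allocation.
# domain -> set of item identifiers allocated to it (toy example only).
from itertools import permutations

domains = {
    "Human Rights": {"q1", "q2", "q3", "q4"},
    "Self-management and Autonomy": {"q2", "q3", "q5"},
    "Recovery Based Practice": {"q3", "q4", "q5", "q6"},
}

for a, b in permutations(domains, 2):
    shared = domains[a] & domains[b]
    pct = 100 * len(shared) / len(domains[a])
    print(f"{pct:.0f}% of {a} items are also in {b}")
```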
After the first iteration of the EFA, 16 items that did not load onto one of their allocated domains but did load onto another were removed from the domain onto which they did not load. After the second iteration, one item (is there a private room for patients/residents to meet with their visitors?), which had not loaded onto any domain in the first iteration, now loaded onto Living Environment and was retained. One question (unit has a policy for dealing with a report from a patient/resident of abuse, aggression or bullying from a member of staff?), which had loaded onto both Clinical Governance and Human Rights after the first iteration, no longer loaded onto Clinical Governance and was retained only in Human Rights. One item (unit provides the same activities for all residents?), which had loaded onto Therapeutic Environment after the first iteration, no longer loaded after the second iteration. Eight items that did not load onto any domain after the first and second iterations were dropped (Figure ) and a third iteration of the EFA was run. This indicated that all remaining items loaded onto at least one domain with a factor loading greater than 0.3.
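The iterative item-reduction step can be sketched as below. This is an illustration only: the factor_analyzer package, the varimax rotation and the data frame of item responses are assumptions, since the paper does not specify the software or rotation used; only the 0.3 loading threshold comes from the text.

```python
# Sketch of one EFA iteration with a 0.3 loading threshold (assumed tooling).
import pandas as pd
from factor_analyzer import FactorAnalyzer

def run_efa(items: pd.DataFrame, n_factors: int, threshold: float = 0.3):
    """Fit an EFA and return the loadings plus items failing the threshold."""
    fa = FactorAnalyzer(n_factors=n_factors, rotation="varimax")
    fa.fit(items)
    loadings = pd.DataFrame(fa.loadings_, index=items.columns)
    # Items whose maximum absolute loading is below the threshold on every factor
    weak_items = loadings[loadings.abs().max(axis=1) < threshold].index.tolist()
    return loadings, weak_items

# items_df: units x item responses (numeric); drop weak items and re-run:
# loadings, weak = run_efa(items_df, n_factors=9)
# items_df = items_df.drop(columns=weak)
```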
Items dropped after Exploratory Factor Analysis.
The KMO measures of sampling adequacy of the nine domains were low for Clinical Governance and for Social Policy, Citizenship and Advocacy (0.52 and 0.61 respectively). Clinical Governance comprised only three items and Social Policy, Citizenship and Advocacy comprised six, and all of these items also contributed to other domains. The PSC therefore agreed that these two domains could be dropped without the loss of any toolkit content. The KMO statistics for the remaining seven domains ranged from 0.67 to 0.80, with only one (Social Interface) falling just below 0.7. The number of items per domain, KMO and Cronbach's Alpha statistics are shown in Table . These demonstrate that all seven domains had good internal consistency, with only Social Interface again falling just below the threshold of 0.7. The final allocation comprised 88 questions, each allocated to one or more of the seven domains (38 were allocated to one domain, 24 to two, 20 to three, five to four and one to five). The EFA process generally reduced the overlap of items between domains (57% of Recovery Based Practice items in Self-management and Autonomy compared with 64% originally; 52% of Human Rights items in Self-management and Autonomy compared with 60% originally; 71% of Recovery Based Practice items in Therapeutic Environment compared with 72% originally; 60% of Social Interface items in Treatments and Interventions compared with 53% originally).
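For reference, the two domain-level statistics reported here can be obtained as sketched below. The KMO call relies on the factor_analyzer package and Cronbach's alpha is computed from its standard formula; the data frame of domain item responses is hypothetical, so this is a sketch of the calculation rather than the authors' actual analysis code.

```python
# Sketch of KMO sampling adequacy and Cronbach's alpha for one domain.
import pandas as pd
from factor_analyzer.factor_analyzer import calculate_kmo

def cronbach_alpha(items: pd.DataFrame) -> float:
    """alpha = k/(k-1) * (1 - sum of item variances / variance of total score)."""
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)
    total_var = items.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

# domain_df: units x items belonging to one domain (numeric responses)
# kmo_per_item, kmo_total = calculate_kmo(domain_df)
# print(kmo_total, cronbach_alpha(domain_df))
```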
Sampling adequacy and internal consistency of domains after 3rd iteration of exploratory factor analysis