We address the problem of controlling false positive rates in mass-multivariate tests for electromagnetic responses in compact regions of source space. We show that mass-univariate thresholds based on sensor-level multivariate thresholds (approximated using Roy's union–intersection principle) are unduly conservative. We then consider a Bonferroni correction for source-level tests based on the number of unique lead-field extrema. For a given source space, the sensor indices corresponding to the maxima and minima (for each dipolar lead field) are listed, and the number of unique extrema is given by the number of unique pairs in this list. Using a multivariate beamformer formulation, we validate this heuristic against empirical permutation thresholds for mass-univariate and mass-multivariate tests (of induced and evoked responses) for a variety of source spaces, using simulated and real data. We also show that the same approximations hold when dealing with a cortical manifold (rather than a volume) and for mass-multivariate minimum norm solutions. We demonstrate that the mass-multivariate framework is not restricted to tests on a single contrast of effects (cf. Roy's maximum root) but also accommodates multivariate effects (cf. Wilks' lambda).
► We aim to estimate the number of independent tests made in MEG source space. ► We trial a heuristic based on the number of unique lead field extrema. ► We compare Bonferroni corrected tests using this heuristic to permutation methods. ► The heuristic performs well for both mass-univariate and mass-multivariate tests.
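As a rough illustration of the extrema-counting heuristic described above, the sketch below counts unique (maximum-sensor, minimum-sensor) index pairs across dipolar lead fields and uses the count as a Bonferroni denominator. The array shapes, the random lead fields and the function name are illustrative assumptions, not the authors' pipeline.

```python
import numpy as np

def count_unique_extrema(L):
    """Count unique (max-sensor, min-sensor) index pairs across the
    dipolar lead fields; the count serves as the heuristic number of
    independent tests.  L: (n_sensors, n_sources), one lead field per
    column."""
    pairs = {(int(col.argmax()), int(col.argmin())) for col in L.T}
    return len(pairs)

# Illustrative use with random "lead fields" (64 sensors, 500 sources):
rng = np.random.default_rng(0)
L = rng.standard_normal((64, 500))
n_tests = count_unique_extrema(L)
alpha_source = 0.05 / n_tests   # Bonferroni-corrected source-level alpha
```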
Pairwise comparison of time series data for both local and time-lagged relationships is a computationally challenging problem relevant to many fields of inquiry. The Local Similarity Analysis (LSA) statistic identifies the existence of local and lagged relationships, but determining significance through a p-value has been algorithmically cumbersome, requiring an intensive permutation test that shuffles rows and columns and repeatedly recalculates the statistic. Furthermore, this p-value is calculated under an assumption of normality, a statistical luxury that most real-world datasets cannot afford.
To improve the performance of LSA on big datasets, an asymptotic upper bound on the p-value calculation was derived without the assumption of normality. This change in the bound calculation markedly improved computational speed from O(pm²n) to O(m²n), where p is the number of permutations in a permutation test, m is the number of time series, and n is the length of each time series. The bounding process is implemented as a computationally efficient software package, FASTLSA, written in C and optimized for threading on multi-core computers, improving its practical computation time. We computationally compare our approach to previous implementations of LSA, demonstrate broad applicability by analyzing time series data from public health, microbial ecology, and social media, and visualize resulting networks using the Cytoscape software.
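For orientation, here is a minimal sketch of the LSA statistic (the strongest contiguous run of pointwise association over all lags, found with a Kadane-style scan) together with the O(pm²n) permutation baseline that the asymptotic bound replaces. Series are assumed pre-transformed to normal scores; the function names are ours, not FASTLSA's API.

```python
import numpy as np

def _max_run(p):
    # Kadane-style scan: largest sum of a contiguous run, floored at zero.
    best = cur = 0.0
    for v in p:
        cur = max(0.0, cur + v)
        best = max(best, cur)
    return best

def lsa_score(x, y, max_delay=3):
    """Best local association between x and y over all lags up to
    max_delay, in either direction (positive or negative)."""
    n = len(x)
    best = 0.0
    for d in range(-max_delay, max_delay + 1):
        p = x[:n - d] * y[d:] if d >= 0 else x[-d:] * y[:n + d]
        best = max(best, _max_run(p), _max_run(-p))
    return best / n

def lsa_permutation_p(x, y, n_perm=1000, seed=0):
    # The O(p*m^2*n) baseline: permute one series, recompute the statistic.
    rng = np.random.default_rng(seed)
    s0 = lsa_score(x, y)
    null = [lsa_score(x, rng.permutation(y)) for _ in range(n_perm)]
    return (1 + sum(s >= s0 for s in null)) / (n_perm + 1)
```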
The FASTLSA software package expands the boundaries of LSA, allowing analysis of datasets with millions of co-varying time series. Mapping metadata onto force-directed graphs derived from FASTLSA allows investigators to view correlated cliques and explore previously unrecognized network relationships. The software is freely available for download at: http://www.cmde.science.ubc.ca/hallam/fastLSA/.
The Cartesian-sampled three-dimensional HNCO experiment is inherently limited in time resolution and sensitivity for the real-time measurement of protein hydrogen exchange. This is largely overcome by the radial HNCO experiment, which employs optimized sampling angles. The large data storage and processing requirements of three-dimensional data present a significant practical limitation, which is largely overcome by exploiting the inherent capability of the 2D-FT to process selected regions of frequency space without artifact or limitation. Decomposition of angle spectra into positive and negative ridge components provides increased resolution and allows statistical averaging of intensity and therefore increased precision. Strategies for averaging ridge cross sections within and between angle spectra are developed to allow further statistical approaches for increasing the precision of measured hydrogen occupancy. Intensity artifacts potentially introduced by over-pulsing are effectively eliminated by use of the BEST approach.
hydrogen exchange; radial sampling; angle selection; two-dimensional FT
Motivation: ChIPseq is rapidly becoming a common technique for investigating protein–DNA interactions. However, results from individual experiments provide a limited understanding of chromatin structure, as various chromatin factors cooperate in complex ways to orchestrate transcription. In order to quantify chromatin interactions, it is thus necessary to devise a robust similarity metric applicable to ChIPseq data. Unfortunately, moving past simple overlap calculations to give statistically rigorous comparisons of ChIPseq datasets often involves arbitrary choices of distance metrics, with significance being estimated by computationally intensive permutation tests whose statistical power may be sensitive to non-biological experimental and post-processing variation.
Results: We show that it is in fact possible to compare ChIPseq datasets through the efficient computation of exact P-values for proximity. Our method is insensitive to non-biological variation in datasets such as peak width, and can rigorously model peak location biases by evaluating similarity conditioned on a restricted set of genomic regions (such as mappable genome or promoter regions).
Applying our method to the well-studied dataset of Chen et al. (2008), we elucidate novel interactions which conform well with our biological understanding. By comparing ChIPseq data in an asymmetric way, we are able to observe clear interaction differences between cofactors such as p300 and factors that bind DNA directly.
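To make the proximity idea concrete, the sketch below computes, for a single query peak and a fixed reference peak set, the exact probability under a uniform-placement null that the nearest reference peak lies within an observed distance d. It simplifies the published method: peaks are treated as points and the whole genome as mappable, whereas the paper conditions on restricted genomic regions.

```python
import numpy as np

def proximity_pvalue(d, ref_midpoints, genome_length):
    """P(nearest reference peak within distance d) for one query peak
    placed uniformly at random on [0, genome_length].  Equals the
    fraction of the genome covered by the union of +/- d neighbourhoods
    around the (non-empty) reference set."""
    lo = np.clip(np.sort(np.asarray(ref_midpoints, float)) - d,
                 0, genome_length)
    hi = np.clip(np.sort(np.asarray(ref_midpoints, float)) + d,
                 0, genome_length)
    covered, cur_lo, cur_hi = 0.0, lo[0], hi[0]
    for l, h in zip(lo[1:], hi[1:]):
        if l > cur_hi:                  # disjoint: close current interval
            covered += cur_hi - cur_lo
            cur_lo, cur_hi = l, h
        else:                           # overlapping: extend
            cur_hi = max(cur_hi, h)
    covered += cur_hi - cur_lo
    return covered / genome_length

# Illustrative: nearest of three reference peaks within 5 kb on a 10 Mb
# region.
p = proximity_pvalue(5_000, [1e6, 2e6, 7.5e6], 1e7)
```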
Availability: Source code is available for download at http://sonorus.princeton.edu/IntervalStats/IntervalStats.tar.gz
Supplementary data are available at Bioinformatics online.
In this paper, we develop a moments-based permutation test approach that improves computational efficiency by approximating the permutation distribution of the test statistic with the Pearson distribution series. This approach involves the calculation of the first four moments of the permutation distribution. We propose a novel recursive method to derive these moments theoretically and analytically, without any permutation. Experimental results for different test statistics are demonstrated on simulated and real data. The proposed strategy combines the advantages of nonparametric permutation tests and parametric Pearson distribution approximation to achieve both accuracy and efficiency.
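The following sketch illustrates the moment-matching idea in a deliberately simplified form: rather than the paper's analytic recursion, it estimates moments from a small pilot set of permutations, and it matches a Pearson type III curve (three moments) instead of the full four-moment Pearson family. The mean-difference statistic and function names are illustrative assumptions.

```python
import numpy as np
from scipy import stats

def moment_approx_pvalue(t_obs, group_a, group_b, n_pilot=2000, seed=0):
    """Approximate a permutation p-value by moment-matching a Pearson
    type III curve to the permutation distribution of the mean
    difference."""
    rng = np.random.default_rng(seed)
    pooled = np.concatenate([group_a, group_b])
    na = len(group_a)
    pilot = np.empty(n_pilot)
    for i in range(n_pilot):       # pilot permutations stand in for the
        perm = rng.permutation(pooled)  # paper's analytic moments
        pilot[i] = perm[:na].mean() - perm[na:].mean()
    mean, sd = pilot.mean(), pilot.std(ddof=1)
    skew = stats.skew(pilot)
    # Pearson type III matches the first three moments exactly.
    return stats.pearson3.sf(t_obs, skew, loc=mean, scale=sd)
```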
There are currently a number of competing techniques for low-level processing of oligonucleotide array data. The choice of technique has a profound effect on subsequent statistical analyses, but there is no method to assess whether a particular technique is appropriate for a specific data set, without reference to external data.
We analyzed coregulation between genes in order to detect insufficient normalization between arrays, where coregulation is measured in terms of statistical correlation. In a large collection of genes, a random pair of genes should on average have zero correlation, which allows a correlation test. For every data set we evaluated, and for all three of the most commonly used low-level processing procedures (MAS5, RMA and MBEI), housekeeping-gene normalization failed the test. For a real clinical data set, RMA and MBEI showed significant correlation for absent genes. We also found that a second round of normalization on the probe set level improved normalization significantly throughout.
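A minimal sketch of this random-pair correlation check is given below. It conveys only the flavor of the criterion: random pairs are not mutually independent, so the one-sample t-test on their mean correlation is a heuristic stand-in for the paper's exact test construction, and expression rows are assumed to have non-zero variance.

```python
import numpy as np
from scipy import stats

def correlation_test(expr, n_pairs=10000, seed=0):
    """Normalization check: correlations of random gene pairs across
    arrays should average zero.  expr: (genes x arrays) matrix of
    normalized log-expression values."""
    rng = np.random.default_rng(seed)
    g = expr.shape[0]
    i = rng.integers(0, g, n_pairs)
    j = rng.integers(0, g, n_pairs)
    i, j = i[i != j], j[i != j]               # drop self-pairs
    x = expr[i] - expr[i].mean(axis=1, keepdims=True)
    y = expr[j] - expr[j].mean(axis=1, keepdims=True)
    r = (x * y).sum(1) / np.sqrt((x**2).sum(1) * (y**2).sum(1))
    # Heuristic test of mean pairwise correlation against zero.
    return stats.ttest_1samp(r, 0.0)
```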
Previous evaluation of low-level processing in the literature has been limited to artificial spike-in and mixture data sets. In the absence of a known gold-standard, the correlation criterion allows us to assess the appropriateness of low-level processing of a specific data set and the success of normalization for subsets of genes.
This study seeks to increase clinical operational efficiency and accelerator beam consistency by retrospectively investigating the application of statistical process control (SPC) to linear accelerator beam-steering parameters, in order to determine the utility of such a methodology in detecting changes prior to equipment failure (actuation of interlocks).
Steering coil currents (SCC) for the transverse and radial planes are set such that a reproducibly useful photon or electron beam is available. SCC are sampled and stored in the control console computer each day during the morning warm-up. The transverse and radial positioning and angle SCC for photon beam energies were evaluated using average and range (Xbar-R) process control charts (PCC). The weekly average and range values (subgroup n = 5) for each steering coil were used to develop the PCC. SCC from September 2009 (annual calibration) until two weeks following a beam-steering failure in June 2010 were evaluated. PCC limits were calculated using the first twenty subgroups. Appropriate action limits were developed using conventional SPC guidelines.
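As a sketch of the chart construction (assuming daily readings grouped into weekly subgroups of five, as described), the following computes Xbar-R limits with the standard n = 5 constants; variable names and data layout are our assumptions, not the study's software.

```python
import numpy as np

# Standard control-chart constants for subgroup size n = 5.
A2, D3, D4 = 0.577, 0.0, 2.114

def xbar_r_limits(subgroups):
    """Xbar-R control limits from (n_subgroups x 5) steering-coil
    current readings; limits are set from the first 20 subgroups, per
    the study design."""
    sub = np.asarray(subgroups, float)[:20]
    xbar = sub.mean(axis=1)                 # weekly averages
    rng_ = sub.max(axis=1) - sub.min(axis=1)  # weekly ranges
    xbb, rbar = xbar.mean(), rng_.mean()
    return {
        "xbar": (xbb - A2 * rbar, xbb + A2 * rbar),
        "range": (D3 * rbar, D4 * rbar),
    }
```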
PCC high-alarm action limit was set at 6 standard deviations from the mean. A value exceeding this limit would require beam scanning and evaluation by the physicist and engineer. Two low alarms were used to indicate negative trends. Alarms received following establishment of limits (week 20) are indicative of a non-random cause for deviation (Xbar chart) and/or an uncontrolled process (R chart). Transverse angle SCC for 6 MV and 15 MV indicated a high alarm 90 and 108 days prior to equipment failure, respectively. A downward trend in this parameter continued, with high alarms, until failure. Transverse position and radial angle SCC for 6 and 15 MV indicated low alarms starting as early as 124 and 116 days prior to failure, respectively.
Radiotherapy clinical efficiency and accelerator beam consistency may be improved by instituting SPC methods to monitor the beam steering process and detect abnormal changes prior to equipment failure.
PACS numbers: 87.55n, 87.55qr, 87.56bd
Quality control; quality assurance; statistical process control; radiation therapy
Innovative extensions of (M)ANOVA are gaining ground for the analysis of designed metabolomics experiments. ASCA is one such multivariate analysis method; it has successfully estimated effects in megavariate metabolomics data from biological experiments. However, rigorous statistical validation of megavariate effects is still problematic because megavariate extensions of the classical F-test do not exist.
A permutation approach is used to validate megavariate effects observed with ASCA. By permuting the class labels of the underlying experimental design, a distribution of no effect is calculated. If the observed effect is clearly different from this distribution, the effect is deemed significant.
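A minimal one-factor sketch of this label-permutation scheme follows, using the sum of squares of the effect matrix as the megavariate effect size. ASCA itself additionally applies simultaneous component analysis to the effect matrices, which is omitted here; names and shapes are illustrative.

```python
import numpy as np

def effect_ssq(X, labels):
    """Sum of squares of the effect matrix for one design factor: each
    row is replaced by its class mean after overall centering.
    X: (samples x metabolites), labels: one class label per sample."""
    labels = np.asarray(labels)
    Xc = X - X.mean(axis=0)
    ssq = 0.0
    for g in np.unique(labels):
        rows = labels == g
        ssq += rows.sum() * (Xc[rows].mean(axis=0) ** 2).sum()
    return ssq

def asca_permutation_p(X, labels, n_perm=2000, seed=0):
    # Distribution of "no effect": permute the class labels of the
    # design and recompute the effect size.
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    s0 = effect_ssq(X, labels)
    null = np.array([effect_ssq(X, rng.permutation(labels))
                     for _ in range(n_perm)])
    return (1 + (null >= s0).sum()) / (n_perm + 1)
```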
The permutation approach is studied using simulated data, which gave successful results. It was then used on a real-life metabolomics data set dealing with bromobenzene-dosed rats. In this metabolomics experiment the dosage effect and the time-interaction effect were validated; both effects are significant. Histological screening of the treated rats' livers agrees with this finding.
The suggested procedure gives approximate p-values for testing effects underlying metabolomics data sets. Therefore, performing model validation is possible using the proposed procedure.
Motivation: Resampling methods, such as permutation and bootstrap, have been widely used to generate an empirical distribution for assessing the statistical significance of a measurement. However, to obtain a very low P-value, a large number of resamples is required, so that computing speed, memory and storage consumption become bottlenecks, and the computation sometimes becomes impossible even on a computer cluster.
Results: We have developed a multiple-stage P-value calculating program called FastPval that can efficiently calculate very low (down to 10⁻⁹) P-values from a large number of resampled measurements. With only two input files and a few parameter settings from the users, the program can compute P-values from an empirical distribution very efficiently, even on a personal computer. When tested on the order of 10⁹ resampled data points, our method used only 52.94% of the time used by the conventional method, implemented with standard quicksort and binary search algorithms, and consumed only 0.11% of the memory and storage. Furthermore, our method can be applied to extra-large datasets that the conventional method fails to handle. The accuracy of the method was tested on data generated from Normal, Poisson and Gumbel distributions and was found to be no different from the exact ranking approach.
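This is not FastPval's multi-stage algorithm, but the core memory-saving idea can be sketched as follows: stream the resampled statistics, retain only the extreme tail, and rank the observed value within that tail so that memory scales with the tail size rather than with the total number of resamples. The chunked-input interface is our assumption.

```python
import numpy as np

def tail_pvalue(observed, chunks, n_total, keep=100_000):
    """Empirical upper-tail p-value from a stream of resampled
    statistics.  chunks: iterable of 1-D numpy arrays whose lengths sum
    to n_total; only the largest `keep` values are retained in memory."""
    tail = np.full(keep, -np.inf)
    for chunk in chunks:
        merged = np.concatenate([tail, chunk])
        tail = np.partition(merged, -keep)[-keep:]   # keep largest values
    n_ge = (tail >= observed).sum()
    if n_ge == keep:
        raise ValueError("observed value falls below the retained tail; "
                         "increase `keep`")
    return (n_ge + 1) / (n_total + 1)
```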
Availability: The FastPval executable file, the java GUI and source code, and the java web start server with example data and introduction, are available at http://wanglab.hku.hk/pvalue
Supplementary information: Supplementary data are available at Bioinformatics online and http://wanglab.hku.hk/pvalue/.
To systematically review the literature regarding how statistical process control—with control charts as a core tool—has been applied to healthcare quality improvement, and to examine the benefits, limitations, barriers and facilitating factors related to such application.
Original articles found in relevant databases, including Web of Science and Medline, covering the period 1966 to June 2004.
From 311 articles, 57 empirical studies, published between 1990 and 2004, met the inclusion criteria.
A standardised data abstraction form was used for extracting data relevant to the review questions, and the data were analysed thematically.
Statistical process control was applied in a wide range of settings and specialties, at diverse levels of organisation and directly by patients, using 97 different variables. The review revealed 12 categories of benefits, 6 categories of limitations, 10 categories of barriers, and 23 factors that facilitate its application, all fully referenced in this report. Statistical process control helped different actors manage change and improve healthcare processes. It also enabled patients with, for example, asthma or diabetes mellitus to manage their own health, and thus has therapeutic qualities. Its power hinges on correct and smart application, which is not necessarily a trivial task. This review catalogues 11 approaches to such smart application, including risk adjustment and data stratification.
Statistical process control is a versatile tool which can help diverse stakeholders to manage change in healthcare and improve patients' health.
The detection of unusual patterns in the occurrence of diseases is an important challenge to health workers interested in the early identification of epidemics. The objective of this study was to provide an early signal of infectious disease epidemics by analyzing the disease dynamics. A two-stage monitoring system was applied, consisting of a univariate Box-Jenkins (autoregressive integrated moving average) model and subsequent tracking signals from several statistical process-control charts. The analyses were illustrated on January 2000–August 2009 national measles data reported monthly to the Expanded Programme on Immunization (EPI) in Bangladesh. The results of this empirical study revealed that the most adequate model for the occurrences of measles in Bangladesh was the seasonal autoregressive integrated moving average (3,1,0)(0,1,1)₁₂ model, and the statistical process-control charts detected no measles epidemics during September 2007–August 2009. The two-stage monitoring system performed well in capturing the measles dynamics in Bangladesh; no epidemic was detected, consistent with high measles-vaccination coverage.
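A compact sketch of the two-stage idea is given below: stage one fits the reported seasonal ARIMA model with statsmodels, and stage two screens the residuals with a simple 3-sigma Shewhart rule (the study uses several chart types; the synthetic series and threshold here are illustrative assumptions).

```python
import numpy as np
from statsmodels.tsa.statespace.sarimax import SARIMAX

def monitor_counts(y, order=(3, 1, 0), seasonal_order=(0, 1, 1, 12)):
    """Stage 1: fit the seasonal ARIMA model; stage 2: flag residuals
    beyond +/- 3 sigma as tracking signals."""
    fit = SARIMAX(y, order=order,
                  seasonal_order=seasonal_order).fit(disp=False)
    resid = fit.resid[order[1] + seasonal_order[1] * 12:]  # skip warm-up
    sigma = resid.std()
    alarms = np.flatnonzero(np.abs(resid) > 3 * sigma)
    return fit, alarms

# Hypothetical usage with synthetic monthly counts (10 years):
rng = np.random.default_rng(1)
months = np.arange(120)
y = 50 + 20 * np.sin(2 * np.pi * months / 12) + rng.normal(0, 5, 120)
fit, alarms = monitor_counts(y)
```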
Communicable diseases; Disease models; Disease outbreaks; Seasonal autoregressive integrated moving average model; Statistical process-control charts; Bangladesh
Applied behavior analysis is based on an investigation of variability due to interrelationships among antecedents, behavior, and consequences. This permits testable hypotheses about the causes of behavior as well as for the course of treatment to be evaluated empirically. Such information provides corrective feedback for making data-based clinical decisions. This paper considers how a different approach to the analysis of variability based on the writings of Walter Shewhart and W. Edwards Deming in the area of industrial quality control helps to achieve similar objectives. Statistical process control (SPC) was developed to implement a process of continual product improvement while achieving compliance with production standards and other requirements for promoting customer satisfaction. SPC involves the use of simple statistical tools, such as histograms and control charts, as well as problem-solving techniques, such as flow charts, cause-and-effect diagrams, and Pareto charts, to implement Deming's management philosophy. These data-analytic procedures can be incorporated into a human service organization to help to achieve its stated objectives in a manner that leads to continuous improvement in the functioning of the clients who are its customers. Examples are provided to illustrate how SPC procedures can be used to analyze behavioral data. Issues related to the application of these tools for making data-based clinical decisions and for creating an organizational climate that promotes their routine use in applied settings are also considered.
Four applications of permutation tests to the single-mediator model are described and evaluated in this study. Permutation tests work by rearranging data in many possible ways in order to estimate the sampling distribution for the test statistic. The four applications to mediation evaluated here are the permutation test of ab, the permutation joint significance test, and the noniterative and iterative permutation confidence intervals for ab. A Monte Carlo simulation study was used to compare these four tests with the four best available tests for mediation found in previous research: the joint significance test, the distribution of the product test, and the percentile and bias-corrected bootstrap tests. We compared the different methods on Type I error, power, and confidence interval coverage. The noniterative permutation confidence interval for ab was the best performer among the new methods. It successfully controlled Type I error, had power nearly as good as the most powerful existing methods, and had better coverage than any existing method. The iterative permutation confidence interval for ab had lower power than some existing methods, but it performed better than any other method in terms of coverage. The permutation confidence interval methods are recommended when estimating a confidence interval is a primary concern. SPSS and SAS macros that estimate these confidence intervals are provided.
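For readers unfamiliar with the permutation test of ab, the sketch below shows one simple permutation scheme: estimate a (from M ~ X) and b (from Y ~ M + X) by least squares, then permute X to break its links with M and Y and build a null distribution for the indirect effect. This is a generic illustration in numpy, not the SPSS/SAS macros provided with the study, and the study evaluates several scheme variants.

```python
import numpy as np

def ab_estimate(x, m, y):
    """Indirect effect ab: a from M ~ X, b from Y ~ M + X (OLS)."""
    Xa = np.column_stack([np.ones_like(x), x])
    a = np.linalg.lstsq(Xa, m, rcond=None)[0][1]
    Xb = np.column_stack([np.ones_like(x), m, x])
    b = np.linalg.lstsq(Xb, y, rcond=None)[0][1]
    return a * b

def permutation_test_ab(x, m, y, n_perm=5000, seed=0):
    # Permuting X severs the X->M and X->Y paths, giving a null for ab.
    rng = np.random.default_rng(seed)
    ab0 = ab_estimate(x, m, y)
    null = np.array([ab_estimate(rng.permutation(x), m, y)
                     for _ in range(n_perm)])
    return (1 + (np.abs(null) >= abs(ab0)).sum()) / (n_perm + 1)
```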
Mediation; Permutation test
In the absence of randomization, the comparison of an experimental treatment with the standard may be based on a matched design. When there is a limited set of cases receiving the experimental treatment, matching a proper set of controls in a non-fixed proportion is convenient.
In order to deal with the highly stratified survival data generated by multiple matching, we extend the multivariate permutation testing approach, since standard nonparametric methods for the comparison of survival curves cannot be applied in this setting.
We demonstrate the validity of the proposed method with simulations, and we illustrate its application to data from an observational study for the comparison of bone marrow transplantation and chemotherapy in the treatment of paediatric patients.
The use of the multivariate permutation testing approach is recommended in the highly stratified context of survival matched data, especially when the proportional hazards assumption does not hold.
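The authors' multivariate statistic is not reproduced here; as a toy stand-in, the sketch below permutes treatment labels only within matched sets, which is the key structural feature of the approach, and scores each set with a within-stratum rank statistic. Censoring is ignored in this simplification, and all names are illustrative.

```python
import numpy as np

def stratified_stat(time, treated, stratum):
    """Sum over matched sets of (mean survival-time rank of treated
    minus mean rank of controls); censoring ignored in this sketch."""
    s = 0.0
    for g in np.unique(stratum):
        idx = stratum == g
        r = time[idx].argsort().argsort() + 1.0   # ranks within the set
        t = treated[idx].astype(bool)
        s += r[t].mean() - r[~t].mean()
    return s

def stratified_permutation_p(time, treated, stratum, n_perm=5000, seed=0):
    # Permute treatment labels within each matched set, preserving the
    # highly stratified structure of the design.
    rng = np.random.default_rng(seed)
    s0 = stratified_stat(time, treated, stratum)
    perm_treated = treated.copy()
    null = np.empty(n_perm)
    for i in range(n_perm):
        for g in np.unique(stratum):
            idx = np.flatnonzero(stratum == g)
            perm_treated[idx] = treated[idx][rng.permutation(len(idx))]
        null[i] = stratified_stat(time, perm_treated, stratum)
    return (1 + (np.abs(null) >= abs(s0)).sum()) / (n_perm + 1)
```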
Highly stratified data; Matched survival data; Multiple matching; Multivariate permutation tests
Improvement of health care requires making changes in processes of care and service delivery. Although process performance is measured to determine if these changes are having the desired beneficial effects, this analysis is complicated by the existence of natural variation—that is, repeated measurements naturally yield different values and, even if nothing was done, a subsequent measurement might seem to indicate a better or worse performance. Traditional statistical analysis methods account for natural variation but require aggregation of measurements over time, which can delay decision making. Statistical process control (SPC) is a branch of statistics that combines rigorous time series analysis methods with graphical presentation of data, often yielding insights into the data more quickly and in a way more understandable to lay decision makers. SPC and its primary tool—the control chart—provide researchers and practitioners with a method of better understanding and communicating data from healthcare improvement efforts. This paper provides an overview of SPC and several practical examples of the healthcare applications of control charts.
In order to obtain an adequate signal-to-noise ratio (SNR), stimulus-evoked brain signals are averaged over a large number of trials. However, in certain applications, e.g. fetal magnetoencephalography (MEG), this approach fails due to underlying conditions (inherently small signals, non-stationary or poorly characterized signals, or a limited number of trials). The resulting low SNR makes it difficult to reliably identify a response by visual examination of the averaged time course, even after pre-processing to attenuate interference. The purpose of this work was to devise an intuitive statistical significance test for low-SNR situations, based on non-parametric bootstrap resampling. We compared a 2-parameter measure of p-value and statistical power with a bootstrap equal-means test and a traditional rank test, using fetal MEG data collected with a light-flash stimulus. We found that the 2-parameter measure generally agreed with established measures, while the p-value alone was overly optimistic. In an extension of our approach, we compared methods to estimate the background noise. A method based on surrogate averages resulted in the most robust estimate. In summary, we have developed a flexible and intuitively satisfying bootstrap-based significance measure incorporating appropriate noise estimation.
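A bare-bones version of the bootstrap equal-means idea can be sketched as follows: resample trials with replacement, average, and compare response-window amplitude against a pre-stimulus baseline. The paper's 2-parameter measure additionally incorporates statistical power and surrogate-based noise estimation, which this sketch omits; the window indices and shapes are assumptions.

```python
import numpy as np

def bootstrap_response_p(trials, baseline_idx, response_idx,
                         n_boot=2000, seed=0):
    """Bootstrap test for an evoked response.
    trials: (n_trials, n_samples) single-trial epochs;
    baseline_idx / response_idx: sample indices of the two windows."""
    rng = np.random.default_rng(seed)
    n = trials.shape[0]
    diffs = np.empty(n_boot)
    for i in range(n_boot):
        avg = trials[rng.integers(0, n, n)].mean(axis=0)  # bootstrap average
        diffs[i] = (np.abs(avg[response_idx]).mean()
                    - np.abs(avg[baseline_idx]).mean())
    # Fraction of bootstrap averages showing no excess response amplitude.
    return (diffs <= 0).mean(), diffs
```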
bootstrap; statistical significance; evoked response; fetal magnetoencephalography; MEG
Bayesian network models are commonly used to model gene expression data. Some applications require a comparison of the network structure of a set of genes between varying phenotypes. In principle, separately fit models can be directly compared, but it is difficult to assign statistical significance to any observed differences. There would therefore be an advantage to the development of a rigorous hypothesis test for homogeneity of network structure. In this paper, a generalized likelihood ratio test based on Bayesian network models is developed, with significance level estimated using permutation replications. In order to be computationally feasible, a number of algorithms are introduced. First, a method for approximating multivariate distributions due to Chow and Liu (1968) is adapted, permitting the polynomial-time calculation of a maximum likelihood Bayesian network with maximum indegree of one. Second, sequential testing principles are applied to the permutation test, allowing significant reduction of computation time while preserving reported error rates used in multiple testing. The method is applied to gene-set analysis, using two sets of experimental data, and some advantage to a pathway modelling approach to this problem is reported.
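The Chow-Liu step admits a very short implementation: the maximum-likelihood tree-shaped network (maximum indegree one) is the maximum spanning tree of the pairwise mutual-information graph. The sketch below uses networkx and scikit-learn on discretized expression data; it covers only this building block, not the paper's likelihood ratio test or sequential permutation scheme.

```python
import numpy as np
import networkx as nx
from sklearn.metrics import mutual_info_score

def chow_liu_edges(data):
    """Chow and Liu (1968) approximation: return the edges of the
    maximum spanning tree of the pairwise mutual-information graph.
    data: (samples x genes) array of discretized expression values."""
    g = data.shape[1]
    G = nx.Graph()
    for i in range(g):
        for j in range(i + 1, g):
            w = mutual_info_score(data[:, i], data[:, j])
            G.add_edge(i, j, weight=w)
    tree = nx.maximum_spanning_tree(G)   # polynomial-time ML tree
    return sorted(tree.edges)
```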
In a new primary care setting with three medical disciplines participating, a vaccine history and order entry system was implemented along with other online documentation systems as the primary documentation tools for the clinic. Reminders were generated based upon a set of algorithms consistent with 1998 nationally accepted vaccine guidelines. Vaccine compliance data were analyzed for the entire population cared for in this setting for a 6 month period. Rates of compliance with national recommendations for eight key vaccine groups were calculated based on the online data. Trends in the rates of compliance, interpreted within limitations, showed statistically and clinically significant improvements. The immunization application accomplished several goals: accurate history and patient-specific recommendations, online ordering of vaccines or serum products, online charting of administration that, in turn, automatically maintained the vaccine history.
Using data from over 450,000 pediatric encounters, three data sources were evaluated for their ability to support early detection of a yearly outbreak of rotavirus disease: 1) Laboratory studies ordered, 2) Diagnosis codes, and 3) Free text "reason for visit" strings categorized as Gastrointestinal syndrome by a support vector machine software classifier. We found that in this setting the categorized free text, analyzed through simple control charts, detected each outbreak within 10 days of its beginning as determined by laboratory detection of rotavirus antigen (the gold standard). Outbreak detection was delayed by an average of 14 days for laboratory orders and by an average of 20 days for diagnosis codes. We conclude that categorized text may provide a valuable basis for real-time detection of disease outbreaks.
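The "simple control charts" step can be illustrated with a basic count chart on the daily totals of the GI syndrome category: alarm when a count exceeds the baseline mean plus three Poisson standard deviations. The baseline length and threshold are illustrative assumptions, not the study's calibration.

```python
import numpy as np

def c_chart_alarms(daily_counts, baseline_days=28):
    """c-chart on daily syndrome-category counts: flag days whose count
    exceeds cbar + 3*sqrt(cbar), with cbar estimated from an initial
    baseline period (Poisson sd ~ sqrt(mean))."""
    counts = np.asarray(daily_counts, dtype=float)
    cbar = counts[:baseline_days].mean()
    ucl = cbar + 3 * np.sqrt(cbar)
    return np.flatnonzero(counts > ucl)
```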
Since more than a million single-nucleotide polymorphisms (SNPs) are analyzed in any given genome-wide association study (GWAS), performing multiple comparisons can be problematic. To cope with multiple-comparison problems in GWAS, haplotype-based algorithms were developed to correct for multiple comparisons at multiple SNP loci in linkage disequilibrium. A permutation test can also control problems inherent in multiple testing; however, both the calculation of exact probability and the execution of permutation tests are time-consuming. Faster methods for calculating exact probabilities and executing permutation tests are required.
We developed a set of computer programs for the parallel computation of accurate P-values in haplotype-based GWAS. Our program, ParaHaplo, is intended for workstation clusters using the Intel Message Passing Interface (MPI). We compared the performance of our algorithm to that of the regular permutation test on JPT and CHB of HapMap.
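ParaHaplo itself uses MPI on workstation clusters; to convey only the parallelization pattern, the sketch below splits a permutation budget across processes with Python's multiprocessing, using a generic two-group mean-difference statistic as a stand-in for the haplotype-based test. Each worker draws from an independent random stream and the hit counts are pooled.

```python
import numpy as np
from multiprocessing import Pool

def _worker(args):
    case, control, n_perm, seed = args
    rng = np.random.default_rng(seed)
    pooled = np.concatenate([case, control])
    n = len(case)
    t0 = case.mean() - control.mean()
    hits = 0
    for _ in range(n_perm):
        perm = rng.permutation(pooled)
        hits += abs(perm[:n].mean() - perm[n:].mean()) >= abs(t0)
    return hits

def parallel_permutation_p(case, control, n_perm=100_000, n_workers=8):
    """Split the permutation budget across workers and pool the counts.
    Call from inside an `if __name__ == "__main__":` guard."""
    per = n_perm // n_workers
    args = [(case, control, per, seed) for seed in range(n_workers)]
    with Pool(n_workers) as pool:
        hits = sum(pool.map(_worker, args))
    return (hits + 1) / (per * n_workers + 1)
```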
ParaHaplo can detect smaller differences between 2 populations than SNP-based GWAS. We also found that parallel-computing techniques made ParaHaplo 100-fold faster than a non-parallel version of the program.
ParaHaplo is a useful tool in conducting haplotype-based GWAS. Since the data sizes of such projects continue to increase, the use of fast computations with parallel computing--such as that used in ParaHaplo--will become increasingly important. The executable binaries and program sources of ParaHaplo are available at the following address:
Bicistronic reporter assay systems have become a mainstay of molecular biology. While the assays themselves encompass a broad range of diverse and unrelated experimental protocols, the numerical data garnered from these experiments often have similar statistical properties. In general, a primary dataset measures the paired expression of two internally controlled reporter genes. The expression ratio of these two genes is then normalized to an external control reporter. The end result is a ‘ratio of ratios’ that is inherently sensitive to propagation of the error contributed by each of the respective numerical components. The statistical analysis of this data therefore requires careful handling in order to control for the propagation of error and its potentially misleading effects. A careful survey of the literature found no consistent method for the statistical analysis of data generated from these important and informative assay systems. In this report, we present a detailed statistical framework for the systematic analysis of data obtained from bicistronic reporter assay systems. Specifically, a dual luciferase reporter assay was employed to measure the efficiency of four programmed −1 frameshift signals. These frameshift signals originate from the L-A virus, the SARS-associated Coronavirus and computationally identified frameshift signals from two Saccharomyces cerevisiae genes. Furthermore, these statistical methods were applied to prove that the effects of anisomycin on programmed −1 frameshifting are statistically significant. A set of Microsoft Excel spreadsheets, which can be used as templates for data generated by dual reporter assay systems, and an online tutorial are available at our website (http://dinmanlab.umd.edu/statistics). These spreadsheets could be easily adapted to any bicistronic reporter assay system.
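The central numerical point, that a "ratio of ratios" propagates the error of each component, can be made concrete with a first-order (delta-method) calculation in which relative variances add. The sketch and its made-up readings are illustrative; the spreadsheets at the authors' website implement their full framework.

```python
import numpy as np

def ratio_of_ratios(a, b, c, d):
    """R = (a/b)/(c/d) with first-order error propagation: the relative
    variances of the four components add.  Each argument is a
    (mean, standard error) pair."""
    vals = np.array([a[0], b[0], c[0], d[0]], float)
    ses = np.array([a[1], b[1], c[1], d[1]], float)
    r = (a[0] / b[0]) / (c[0] / d[0])
    rel_var = ((ses / vals) ** 2).sum()
    return r, abs(r) * np.sqrt(rel_var)

# Made-up luciferase readings (mean, SE): test firefly/renilla ratio
# divided by the in-frame control ratio.
eff, se = ratio_of_ratios((120, 6), (1000, 30), (900, 40), (1100, 35))
print(f"frameshift efficiency = {eff:.4f} +/- {se:.4f}")
```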
The Shipman Inquiry recommended mortality rate monitoring if it could be ‘shown to be workable’ in detecting a future mass murderer in general practice.
To examine the effectiveness of cumulative sum (CUSUM) charts, cross-sectional Shewhart charts, and exponentially-weighted, moving-average control charts in mortality monitoring at practice level.
Analysis of Scottish routine general practice data combined with estimation of control chart effectiveness in detecting a ‘murderer’ in a simulated dataset.
Practice stability was calculated from routine data to determine feasible lengths of monitoring. A simulated dataset of 405 000 'patients' was created, registered with 75 'practices' whose underlying mortality rates varied with the same distribution as case-mix-adjusted mortality in all Scottish practices. The sensitivity of each chart to five and 10 excess deaths per year was examined in repeated simulations, and the number of alarm signals generated when the control charts were applied to routine data was estimated.
Practice instability limited the length of monitoring, and modelling was consequently restricted to a 3-year period. Monitoring mortality over 3 years, CUSUM charts were the most sensitive, but they reliably achieved >50% successful detection only for 10 excess deaths per year, and they generated multiple false alarms (>15%).
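For readers unfamiliar with the CUSUM chart evaluated here, a minimal one-sided version on standardized excess mortality is sketched below; the reference value k and decision limit h are illustrative tuning constants, not the study's calibrated values.

```python
import numpy as np

def cusum_alarms(deaths, expected, k=0.5, h=5.0):
    """One-sided CUSUM on standardized excess mortality: accumulate
    (z - k), floored at zero, and alarm when the sum crosses h."""
    z = (np.asarray(deaths, float) - expected) / np.sqrt(expected)
    c, alarms = 0.0, []
    for t, zt in enumerate(z):
        c = max(0.0, c + zt - k)
        if c > h:
            alarms.append(t)   # signal at period t
            c = 0.0            # reset after signalling
    return alarms
```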
At best, mortality monitoring can act as a backstop to detect a particularly prolific serial killer when other means of detection have failed. Policy should focus on changes likely to improve detection of individual murders, such as reform of death certification and the coroner system.
family practice; homicide; outcome and process assessment (health care); quality assurance, health care; regulation
In most studies aimed at localizing footprints of past selection, outliers at the tails of the empirical distribution of a given test statistic are assumed to reflect locus-specific selective forces. Significance cutoffs are subjectively determined, rather than being related to a clear set of hypotheses. Here, we define an empirical p-value for the summary statistic by means of a permutation method that uses the observed SNP structure in the real data. To illustrate the methodology, we applied our approach to a panel of 2.9 million autosomal SNPs identified from re-sequencing a pool of 15 individuals from a brown egg layer line. We scanned the genome for local reductions in heterozygosity, suggestive of selective sweeps. We also employed a modified sliding window approach that accounts for gaps in the sequence and increases scanning resolution by moving the overlapping windows by steps of one SNP only, and suggest to call this a "creeping window" strategy. The approach confirmed selective sweeps in the regions of previously described candidate genes, i.e. TSHR, PRL, PRLHR, INSR, LEPR, IGF1, and NRAMP1, when used as positive controls. The genome scan revealed 82 distinct regions with strong evidence of selection (genome-wide p-value < 0.001), including genes known to be associated with eggshell structure and the immune system, such as CALB1 and the GAL cluster, respectively. A substantial proportion of signals was found in gene-poor regions, including the most extreme signal on chromosome 1. The observation of multiple signals in a highly selected layer line of chicken is consistent with the hypothesis that egg production is a complex trait controlled by many genes.
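A skeletal version of the creeping-window scan is sketched below: overlapping windows advance one SNP at a time, and a permutation of the per-SNP values yields an empirical genome-wide p-value for low windowed heterozygosity. Unlike the paper's method, this simplification does not preserve gaps or the observed SNP structure; sizes and names are assumptions.

```python
import numpy as np

def creeping_window_scan(het, window=40):
    """'Creeping window': windowed mean heterozygosity with the window
    advanced one SNP at a time; low values suggest a sweep.
    het: per-SNP heterozygosity along a chromosome."""
    kernel = np.ones(window) / window
    return np.convolve(het, kernel, mode="valid")

def empirical_pvalues(scores, het, window=40, n_perm=1000, seed=0):
    # Genome-wide null for the minimum windowed heterozygosity, built by
    # permuting the SNP order (marginal distribution preserved; gap and
    # LD structure are not, unlike the published method).
    rng = np.random.default_rng(seed)
    null_min = np.array([creeping_window_scan(rng.permutation(het),
                                              window).min()
                         for _ in range(n_perm)])
    return np.array([(1 + (null_min <= s).sum()) / (n_perm + 1)
                     for s in scores])
```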
Motivation: Permutation testing is very popular for analyzing microarray data to identify differentially expressed (DE) genes; estimating false discovery rates (FDRs) is a very popular way to address the inherent multiple testing problem. However, combining these approaches may be problematic when sample sizes are unequal.
Results: With unbalanced data, permutation tests may not be suitable because they do not test the hypothesis of interest. In addition, permutation tests can be biased. Using biased P-values to estimate the FDR can produce unacceptable bias in those estimates. Results also show that the approach of pooling permutation null distributions across genes can produce invalid P-values, since even non-DE genes can have different permutation null distributions. We encourage researchers to use statistics that have been shown to reliably discriminate DE genes, but caution that associated P-values may be either invalid, or a less-effective metric for discriminating DE genes.
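The bias described here can be reproduced with a small simulation: with equal means but a small, high-variance group, the pooled-permutation test of the mean difference rejects well above the nominal level because exchangeability fails. The sketch below is a generic illustration of that phenomenon, not the authors' analysis, and its group sizes and variances are arbitrary choices.

```python
import numpy as np

def rejection_rate(n_a=5, n_b=45, sd_a=3.0, sd_b=1.0,
                   n_sim=200, n_perm=500, alpha=0.05, seed=0):
    """Monte Carlo Type I error of the pooled-permutation test of the
    mean difference under equal means but unequal sizes/variances."""
    rng = np.random.default_rng(seed)
    rejections = 0
    for _ in range(n_sim):
        a = rng.normal(0, sd_a, n_a)
        b = rng.normal(0, sd_b, n_b)
        t0 = a.mean() - b.mean()
        pooled = np.concatenate([a, b])
        null = np.empty(n_perm)
        for i in range(n_perm):
            p = rng.permutation(pooled)
            null[i] = p[:n_a].mean() - p[n_a:].mean()
        pval = (1 + (np.abs(null) >= abs(t0)).sum()) / (n_perm + 1)
        rejections += pval < alpha
    return rejections / n_sim

print(rejection_rate())  # typically well above 0.05 in this setting
```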
Supplementary information: Supplementary data are available at Bioinformatics online.
Summarising the complex data generated by multiple cross-sectional quality indicators in a way that patients, clinicians, managers and policymakers find useful is challenging. A common approach is aggregation into summary measures such as star ratings and balanced scorecards, but these may conceal the detail needed to focus quality improvement. We propose an alternative way of summarising and presenting multiple quality indicators, suitable for use in quality improvement and governance. This paper discusses (1) control charts for repeated measurements of single processes, as used in industrial statistical process control (SPC); (2) control charts for cross-sectional comparison of many institutions on a single quality indicator (rarely used in industry but commonly proposed for health care); and (3) small-multiple graphics which combine control-chart signal extraction with efficient graphical presentation of multiple indicators.