A novel medical image quality index based on grey relational coefficient calculation is proposed in this study. Three medical modalities, DR, CT and MRI, were tested using 30 or 60 images each, for a total of 120 images. These images were first compressed at ten different compression ratios (10~100) using a medical image compression algorithm named JJ2000. The quality of the reconstructed images was then evaluated using the grey relational coefficient calculation. The results were consistent with popular objective quality metrics. The impact of different image aspects on four grey relational coefficient methods was further tested. The results showed that these grey relational coefficients have different slopes but very high consistency across image areas. Nagai’s grey relational coefficient was chosen in this study because of its higher calculation speed and sensitivity. A comparison was also made between this method and other window-based objective metrics for various window sizes. The grey relational coefficient results were found to be less sensitive to window size changes. The index performs better than some window-based objective metrics and can be used as an image quality index.
With the progress in medical diagnostic equipment, imaging systems have faster speeds and higher resolution as development goals. The picture archiving and communication system (PACS) is used to manage and store large numbers of images. The amount of medical imaging data is increasing with the latest high-resolution imaging equipment. Large amounts of data may affect the performance of PACS systems. The most cost-effective way to improve PACS performance is to use image compression technology to reduce the data. Appropriate image compression can save space and speed transmission without diagnostic impact. Therefore, image compression for PACS is the most efficient way to save cost and improve efficiency.
The lossless (reversible) image compression ratio (CR = volume of original image / volume of compressed image) has a maximum of 2.5. Lossy (irreversible) compression can achieve a higher CR, but the images become different from the original. Because of these differences, the image quality needs to be evaluated for diagnostic applications.
Lossy image assessment methods can be divided into subjective and objective methods. In general, the subjective methods evaluate image quality using the human eye as the basis. They include ROC (receiver operating characteristic) analysis and the MOS (mean opinion score). The objective methods are based on mathematical calculations: the changes between the original and the compressed images are computed with mathematical formulas. The MSE (mean square error) and the PSNR (peak signal to noise ratio) are frequently used objective metrics. Both calculate the pixel differences between the original and the compressed image to indicate the change in image quality. Larger differences represent greater image quality degradation.
PSNR is the most commonly used index of image quality. The differences between pixels are calculated directly. Recently, Wang et al. found that the pixel-to-pixel calculation had difficulty reflecting the human visual experience. They suggested that two images with the same MSE value do not necessarily have the same image quality [1, 2].
İsmail et al. reported that both MSE and PSNR are sensitive only for noise detection and have no connection to the human visual response.
Recently, metrics that combine perceptual quality measurement with human visual system (HVS) features [4, 5] have been developed. Since a human observer is the end user of image quality measurement, the metrics used for assessing image quality should take into account the impact of the HVS. The Sarnoff just noticeable differences (JND) vision model is being used successfully to predict digital-video quality. The JND metric is a computational model that simulates known physiological mechanisms in the human visual system, including the contrast sensitivity of the eye, luminance, spatial frequency and orientation responses of the visual cortex. The success of these metrics is in some sense heuristic, and they have been developed into an ISO standard (ISO 20462) [7, 8]. However, Ponomarenko et al. suggested that “currently there are no reliable mathematical models for the HVS resulting in the impossibility of defining an optimum metric that perfectly matches the HVS”.
A number of studies have used windows as the basis for quality indices. These indices evaluate variation in quality by calculating the correlation between the pixel grey values in windows, as proposed by Wang et al. [1, 2] and Chen et al. [5, 9–11]. The Universal Quality Index (Q) and the mean structural similarity (MSSIM) are obtained by using the changes in variance, average and covariance values between two image windows as indicators. Chen et al. proposed the statistical Moran’s I test to measure the spatial correlation between windows from the original and compressed images to assess image quality [5, 9–11]. Human vision perceives an image as a block (window) rather than as a single point. Q and MSSIM were proven to have a high correlation with the human eye [1, 2]. The Moran peak ratio (MPR) suggested by Chen et al. also showed a high correlation with the smoothing and sharpening of images [5, 9–11]. Therefore, window-based computing is a feasible basis for an image quality index.
For these reasons, the grey relational coefficient (GRC) [12, 13] calculation for windows between images may be developed as a new image quality index. The GRC calculates the correlation coefficients between two sequences. It is equivalent to calculating the correlation of two image windows if the pixel values in the windows are rearranged as sequences. The resulting metric is therefore of the same kind as the window-based metrics above.
In this study, the most common image types, which also require the most storage space, were chosen. Three modalities, digital radiology (DR), computerized tomography (CT) and magnetic resonance imaging (MRI), were used as experimental images. These images were first compressed at ten different CRs (10~100) using a medical image compression algorithm named JJ2000 (available at http://jj2000.epfl.ch). The quality of the reconstructed images was then evaluated using the GRC and some objective metrics. The GRC results are consistent with objective metrics such as MSE, PSNR, Q and MPR. This method was also compared with other window-based objective metrics at varied window sizes. It was found that GRC is less susceptible to window size variations and is a more stable image quality index than Q and MPR.
The grey system was proposed by Deng in 1982. The GRC calculation is a major element of the grey system. GRC is used to calculate how discrete sequences correlate [13, 14]. This is equivalent to calculating the correlation coefficients between the pixel grey scales of two image windows if the pixel values are rearranged as sequences.
Deng’s GRC can be written as:
Wong’s GRC is:
The Γ can be used to represent the closeness of the original sequence to another sequence. A higher GRC value corresponds to a higher correlation between sequences.
A sequence Si exists with k elements: Si = (Si(1), Si(2), Si(3), ..., Si(k)) ∈ U. The (U (i); Γ) is a grey relational space. Here
which means that for any window in the corresponding position on the images there are m+1 sequences with respect to m+1 images:
If the sequence S0 is taken as the reference sequence (original image), and all other sequences are taken for comparison (reconstructed images), this type of GRC calculation is called the localization grey relational grade (LGRG). In this study, the original image was used as the reference sequence. There is no need for any comparison between the various compressed images, so the LGRG was used for the GRC calculation.
The distinguishing coefficient (DC) ζ lies in the range [0, 1]. Adjusting the DC changes the GRC value but does not affect the ordering of the compared sequences relative to the original sequence. However, a DC adjustment always results in a change in the Γ values.
Four researchers, Hsia, Wen, Wu and Nagai, each proposed a quantitative GRC based on Deng’s and Wong’s formulas, as shown in Eqs. 3–6. In their GRCs, all DCs are set to 1, as noted in these equations.
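The role of the DC can be illustrated with a minimal sketch. It assumes the widely cited form of Deng’s coefficient, γ(k) = (Δmin + ζΔmax) / (Δ(k) + ζΔmax), where Δ(k) is the absolute difference between the reference and a compared sequence at element k, and Δmin and Δmax are taken over all compared sequences. The function name and the sample sequences are hypothetical and are not data from this study.

```python
def deng_grey_relational_grades(reference, compared, zeta=0.5):
    """Return one grey relational grade per compared sequence.

    Assumption: the commonly cited form of Deng's coefficient,
        gamma(k) = (d_min + zeta * d_max) / (d(k) + zeta * d_max),
    with the grade taken as the mean of gamma(k) over the sequence.
    """
    if not 0 < zeta <= 1:
        raise ValueError("zeta is assumed to lie in (0, 1]")
    # Absolute differences between the reference and every compared sequence.
    deltas = [[abs(r - c) for r, c in zip(reference, seq)] for seq in compared]
    d_min = min(min(row) for row in deltas)
    d_max = max(max(row) for row in deltas)
    if d_max == 0:
        # Every compared sequence is identical to the reference.
        return [1.0 for _ in compared]
    grades = []
    for row in deltas:
        coeffs = [(d_min + zeta * d_max) / (d + zeta * d_max) for d in row]
        grades.append(sum(coeffs) / len(coeffs))
    return grades


# Hypothetical pixel sequences: one reference and two compared sequences.
reference = [10, 20, 30, 40]
mild = [11, 21, 29, 41]      # small differences from the reference
strong = [14, 25, 33, 46]    # larger differences from the reference
for zeta in (0.5, 1.0):
    print(zeta, deng_grey_relational_grades(reference, [mild, strong], zeta))
```

With these sample values the mildly distorted sequence obtains the higher grade at both ζ settings, while the grade of the strongly distorted sequence changes with ζ.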
Hsia’s GRC is
Wen’s GRC is
Wu’s GRC is
Nagai’s GRC is
where The Euclidean mode ρ=2 was used in this study.
The above four kinds of GRCs were applied to find the relations between the original images and compressed images. The original image was set as the reference sequence (S0) and the compressed images were the compared sequences (Si). Equal-size windows (e.g. 9×9) were chosen at corresponding positions in the original and reconstructed images. The GRC can be calculated after arranging the pixel values in the windows as sequences. The window slides over the image pixel-by-pixel from the top-left corner to the bottom-right corner to calculate all GRCs. The average of all GRCs can be used to represent the closeness between two images (or between the pixel grey scales of the images). This closeness coefficient may be used as an index of quality. If two images are exactly the same, the GRC will be equal to 1. The average GRC should decrease as the CR increases and can therefore be used to represent image quality degradation.
To demonstrate the usefulness of the proposed method, the GRC results in this study were compared with some general objective metrics. These included both pixel-based metrics, such as MSE and PSNR, and window-based metrics, such as Q and MPR.
MSE and PSNR are both pixel-based metrics. They evaluate image quality by calculating the average of the squared differences between the pixel grey scale values of the two images. They can be written as
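The sliding-window averaging described above can be sketched as follows. This is only an outline of the scanning and averaging step: the per-window score `placeholder_closeness` is a hypothetical stand-in (one minus the mean absolute difference divided by an assumed 12-bit full scale), not any of the four GRCs, and all function names are illustrative.

```python
def window_sequence(image, top, left, size):
    """Flatten the size x size window at (top, left) into a sequence, row by row."""
    return [image[r][c] for r in range(top, top + size)
            for c in range(left, left + size)]


def placeholder_closeness(ref_seq, cmp_seq, full_scale=4095):
    # Hypothetical stand-in for a GRC: equals 1.0 for identical windows and
    # decreases as the mean absolute difference grows. Not the paper's formula.
    mean_abs_diff = sum(abs(a - b) for a, b in zip(ref_seq, cmp_seq)) / len(ref_seq)
    return 1.0 - mean_abs_diff / full_scale


def mean_window_score(original, reconstructed, size=9, score=placeholder_closeness):
    """Slide a window pixel-by-pixel over both images and average the scores."""
    rows, cols = len(original), len(original[0])
    if rows < size or cols < size:
        raise ValueError("window is larger than the image")
    total, count = 0.0, 0
    for top in range(rows - size + 1):
        for left in range(cols - size + 1):
            ref_seq = window_sequence(original, top, left, size)
            cmp_seq = window_sequence(reconstructed, top, left, size)
            total += score(ref_seq, cmp_seq)
            count += 1
    return total / count
```

Any per-window function taking two sequences, such as one of the four GRCs, could be passed as `score` in place of the stand-in.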
where IO and IM are the pixel grey values of the original and reconstructed images, respectively, n is the bit depth of a pixel, and N is the total number of pixels in the image. A lower MSE and a higher PSNR correspond to better image quality.
Q was proposed by Wang et al. This index evaluates image quality by calculating the correlation, brightness, and contrast distortion between windows. A sliding window is used to scan the entire images, and the window results are averaged to derive the Q value. The Q values range between 0 and 1. For two identical images, the Q value is 1. The higher the Q, the better the quality. Q can be calculated as follows
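As a small illustration of these two metrics, the sketch below assumes the standard definitions, which match the symbols defined above: MSE is the mean of the squared differences over the N pixels, and PSNR = 10·log10((2^n − 1)² / MSE). The function names are illustrative, and the images are passed as flat sequences of pixel values.

```python
import math


def mse(original, reconstructed):
    """Mean square error between two equally sized pixel sequences."""
    if len(original) != len(reconstructed) or not original:
        raise ValueError("sequences must be non-empty and of equal length")
    return sum((o - m) ** 2 for o, m in zip(original, reconstructed)) / len(original)


def psnr(original, reconstructed, bit_depth):
    """Peak signal to noise ratio in dB, assuming a peak value of 2**bit_depth - 1."""
    error = mse(original, reconstructed)
    if error == 0:
        return math.inf  # identical images: no noise at all
    peak = 2 ** bit_depth - 1
    return 10 * math.log10(peak ** 2 / error)


# Hypothetical 12-bit pixel values, not data from this study.
original = [100, 200, 300, 400]
reconstructed = [103, 196, 300, 400]
print(mse(original, reconstructed), psnr(original, reconstructed, bit_depth=12))
```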
where and σ2 represent the average window pixel greyscale value and variance, is a covariance, N=number of pixels within the window, WO (i) and WM (i) represent the original and compressed image pixel greyscale values.
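For a single window pair, the index can be sketched as below. The sketch assumes the form published by Wang and Bovik, Q = 4·cov·mean_O·mean_M / ((var_O + var_M)·(mean_O² + mean_M²)); the function name is illustrative, and returning NaN when the denominator vanishes is a choice made here, not part of the original definition.

```python
def universal_quality_index(ref_window, cmp_window):
    """Q for one window pair, given as flat sequences of pixel values."""
    n = len(ref_window)
    if n != len(cmp_window) or n < 2:
        raise ValueError("windows must have equal length of at least 2")
    mean_o = sum(ref_window) / n
    mean_m = sum(cmp_window) / n
    # Sample variances and covariance (N - 1 in the denominator).
    var_o = sum((x - mean_o) ** 2 for x in ref_window) / (n - 1)
    var_m = sum((y - mean_m) ** 2 for y in cmp_window) / (n - 1)
    cov = sum((x - mean_o) * (y - mean_m)
              for x, y in zip(ref_window, cmp_window)) / (n - 1)
    denominator = (var_o + var_m) * (mean_o ** 2 + mean_m ** 2)
    if denominator == 0:
        return float("nan")  # flat or all-zero windows: Q is undefined
    return 4 * cov * mean_o * mean_m / denominator


# Hypothetical window values, not data from this study.
print(universal_quality_index([10, 20, 30, 40], [12, 18, 33, 37]))
```

The image-level Q is then the average of this value over all sliding-window positions.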
The Moran test is mainly used to estimate the spatial autocorrelation of data. The Moran coefficient A and the standard score Z for the pixels in an m×n window are calculated as
where xi is the greyscale value of pixel i; x̄ represents the average greyscale of the window; δij=1 if pixels i and j are adjacent and δij=0 otherwise; S0 =2 (2 mn−m−n); m and n are the number of rows and columns in the window; and N is the total number of pixels in the window. The numerator is a measure of the covariance and the denominator is a measure of the variance among the pixels. A larger A indicates a higher correlation between pixels and a smoother image. When N is large enough (i.e. >25), the variable approximately follows a normal distribution with the mean and variance given by
We can use the standard normal statistic to determine the structural information of an image. This statistic can be written as
A Z histogram can be produced by collecting all Z values in an image and then sorting them into bins. The spatial correlation increases with the amount of image blurring, and the Z value increases with it. The Z values will concentrate in certain bins to form a peak. The MPR is the ratio of the Z-value peaks of the manipulated and original images. It has been proven to correspond well to variations in the spatial properties of images [5, 9–11]. A higher Q and a lower MPR represent better image quality.
The most common radiology images were chosen for this study. Three modalities were used: CT (image size 512×512, 12 bits deep), MR (image size 512×512 for head and 256×160 for abdomen, all 12 bits deep), and DR (image size 2000×2000, 15 bits deep). Thirty images each were chosen for MR and DR and 60 images for CT, 120 images in total. For CT, 30 head images and 30 abdominal images were used. For DR, 30 chest images and for MR, 30 head images were used. All of these images were randomly chosen from the PACS system of a general hospital in Central Taiwan. The software “JJ2000” (JJ2000 version 4.1, available on the Internet at http://jj2000.epfl.ch) was used as the compression algorithm. The images were first compressed at ten different CRs (10~100). The quality of the reconstructed images was then evaluated using the above metrics, and the GRC results were compared with the different objective metrics.
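The Moran coefficient for one window can be sketched as follows. The sketch assumes the standard Moran form A = N·ΣΣδij(xi − x̄)(xj − x̄) / (S0·Σ(xi − x̄)²) with horizontally and vertically adjacent pixels counted as neighbours, which is the adjacency implied by S0 = 2(2mn − m − n). It covers A only, not the Z score or the MPR; the function name is illustrative.

```python
def moran_coefficient(window):
    """Moran coefficient A of an m x n window given as a list of rows."""
    m, n = len(window), len(window[0])
    total = m * n
    mean = sum(sum(row) for row in window) / total
    dev = [[value - mean for value in row] for row in window]
    variance_sum = sum(d * d for row in dev for d in row)
    if variance_sum == 0:
        return float("nan")  # a constant window has no defined autocorrelation
    cross = 0.0
    for r in range(m):
        for c in range(n):
            if c + 1 < n:   # right-hand neighbour
                cross += dev[r][c] * dev[r][c + 1]
            if r + 1 < m:   # neighbour below
                cross += dev[r][c] * dev[r + 1][c]
    cross *= 2  # each adjacent pair appears twice in the double sum over i and j
    s0 = 2 * (2 * m * n - m - n)
    return total * cross / (s0 * variance_sum)


# Hypothetical windows: a smooth ramp and a checkerboard.
ramp = [[r + c for c in range(4)] for r in range(4)]
checker = [[(r + c) % 2 for c in range(4)] for r in range(4)]
print(moran_coefficient(ramp), moran_coefficient(checker))
```

The smooth ramp gives a positive A and the checkerboard gives a negative A, in line with the statement that a larger A indicates a smoother window.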
The above four GRCs were applied to the original and compressed images to verify that the GRC can be used as an indicator of compressed image quality. A CT abdominal image, shown in Fig. 1, was chosen for this application. Four different areas were adopted to represent different aspects of the image: (a) contrast, (b) edge, (c) noise and (d) smooth areas, as noted in Fig. 1. This image was first compressed at ten different CRs by JJ2000 and then reconstructed. The four GRC calculations were then applied to each of these areas with 5×5, 7×7, 9×9 and 11×11 windows. The results from the four GRCs on the contrast area (i.e. Fig. 1a) using the four window sizes are shown in Fig. 2. As the figure shows, the four GRCs follow the same trend across window sizes for this image area. The trends in the other areas are the same (not shown here). Results from the four GRCs in the four different areas using the same 9×9 window are shown in Fig. 3. The trends for Hsia, Wen and Wu are similar, while Nagai shows a steeper slope. The four GRCs were found to follow the same trend in each ROI as the CR increases. The Nagai slopes are clearly greater than the other three in the low CR regions for all four areas. The GRC lines apparently follow different trends in Fig. 3d. This area is a smooth region, and the results correspond well to those of a previous study. A report evaluated image quality using two observers after image compression and transmission. It mentioned that with wavelet compression, the high-contrast region did not degrade under lossy compression for CRs lower than 20. There was a loss in the low-contrast region at 1% and 5% modulation, corresponding to 10:1 and 20:1 compression, respectively. This means that the quality of the smooth areas degraded faster than that of the sharp areas at the same CR.
The results from four GRCs are the same for the other window sizes (5×5, 7×7 and 11×11). The results show that Nagai’s GRC is more sensitive in response to changes in CR and has less computation time than others. Therefore, in the following calculations Nagai’s GRC was applied.
The MSE at various CRs for CT abdominal images is shown in Fig. 4a. The MSE increases with the CR. The GRC results are shown in Fig. 4b. The average GRC line descends as the CR increases, indicating that a higher CR corresponds to a lower correlation. The directions of the two lines are different but the trend is the same. The PSNR and GRC results for CT head images are shown in Fig. 5a, b. These two metrics have the same directions and trends. The GRC trend is linear, as noted in this figure.
The Q and GRC for DR chest images are shown in Fig. 6a, b. Q decreases as the CR increases. The GRC results show a similar trend. Figure 7a, b shows the MPR and GRC results for sagittal head MR images. These two lines move in different directions, but show similar trends as the CR increases.
From the above results and comparison, GRC and CR are indeed related. The GRC is consistent with objective metrics such as MSE, PSNR, Q and MPR. This calculation can be used to track the variation in image quality. The GRC features are discussed in the following section.
In the GRC calculation, the sequence should meet the comparability requirement. That is, the ratio between any two values in a sequence must be lower than 100 [13, 14]. However, the ratio between the maximum and minimum values of an MR image may be higher than 100. In the bone, soft tissue and edge junction regions the ratios may go up to 200.
The high ratio between values in sequences may impact the GRC results. An MR head image was chosen randomly for testing. A value of 1,000 was added to each image pixel to reduce the ratios. The original and the offset images were then compressed as before and the GRC was calculated for each. The results showed that there is no impact on the GRC for CRs lower than 100.
An objective image quality index should have a well-defined dynamic range. The range helps people understand changes in image quality. The dynamic range of the Q index is [0, 1]. The range of GRC is [0, 1] too. The best image quality is GRC=1, when the two images are identical; GRC=0 means that they are not correlated at all.
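The effect of the added offset on the comparability ratio is simple arithmetic, as the following sketch shows. The pixel values are hypothetical and chosen only so that the original ratio is 200; the function name is illustrative.

```python
def max_min_ratio(values):
    """Ratio between the largest and smallest value of a sequence."""
    smallest = min(values)
    if smallest <= 0:
        raise ValueError("the ratio is only meaningful for positive values")
    return max(values) / smallest


# Hypothetical MR window values with a max/min ratio of 200.
window = [5, 40, 300, 1000]
offset_window = [value + 1000 for value in window]  # add 1,000 to every pixel

print(max_min_ratio(window))         # ratio before the offset
print(max_min_ratio(offset_window))  # ratio after the offset
```

Here the ratio falls from 200 to just under 2, well below the limit of 100.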
The window sizes may change estimation results for a window-based metric. A smaller window will cover only a small area with less variation in general. Larger areas may have varied pixel grey levels. Does the window size impact the quality estimation?
A CT abdominal image was chosen randomly to evaluate this issue. The image was first compressed as above, and then the GRC, Q and MPR were applied to these images using various window sizes. The window sizes were varied from 3×3 to 12×12. Because the 8×8 [1, 2, 11] or 9×9 [5, 9, 10] window sizes were used successfully in some previous studies, the calculation results for 9×9 were selected as a reference. The correlation coefficients of the results for each window size with respect to the 9×9 results were calculated. The window size does not affect the GRC calculations, as noted in Fig. 8. This means that the GRC showed almost identical trend lines across window sizes, whereas the trends for Q and MPR clearly differed across window sizes. The GRC is therefore a better indicator of image quality.
The GRC calculations were conducted in a Matlab environment in this study. The computation times of the four GRCs were estimated. Nagai’s GRC needed only 0.44 s to process eight images. Hsia’s and Wen’s GRCs needed up to 8.55 s for the same calculation, and Wu’s needed more. Q needed 1.2 s and MPR took the longest. Nagai’s GRC is therefore faster than the other window-based metrics.
This study applied GRC to medical image compression analysis. We are developing a compression image quality indicator. The results from this study showed that the GRC can be used as an image quality index. The GRC presented results consistent with some pixel-based and window-based objective metrics. GRC is independent of the window size and has a faster calculation speed than other window-based metrics. This method can also be used to discern the image quality between different image compression algorithms.
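The comparison against the 9×9 reference can be sketched with a plain Pearson correlation between two metric-versus-CR curves. The curve values below are hypothetical placeholders, not results from this study, and the function name is illustrative.

```python
import math


def pearson_correlation(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical metric values at ten CRs (10, 20, ..., 100).
reference_9x9 = [0.98, 0.96, 0.94, 0.92, 0.90, 0.88, 0.86, 0.84, 0.82, 0.80]
other_window = [0.97, 0.95, 0.94, 0.91, 0.90, 0.87, 0.86, 0.83, 0.82, 0.79]
print(pearson_correlation(reference_9x9, other_window))
```

A coefficient close to 1 for every window size indicates that the metric follows the same trend as the 9×9 reference.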
This work was supported by a research grant: NSC 97-2314-B-471-001-MY2 from the National Science Council of Taiwan.