The purpose of this study was to identify risk factors associated with cerebral infarction as a complication in patients with atrial fibrillation (AF) and to discover useful association rules among these factors.
Risk factors for cerebral infarction were selected by logistic regression analysis with forward stepwise selection based on the Wald statistic. Rules for identifying cerebral infarction complications were obtained with the association rule mining (ARM) approach.
We found that 4 independent factors, namely age, hypertension, initial electrocardiographic (ECG) rhythm, and initial echocardiographic (Echo) left atrial dimension (LAD), were strong predictors of cerebral infarction in patients with AF. Applying ARM yielded 4 useful rules for identifying cerebral infarction complications: (1) age >63 years, hypertension, initial ECG rhythm of AF, and initial Echo LAD >4.06 cm; (2) age >63 years, hypertension, and initial Echo LAD >4.06 cm; (3) hypertension, initial ECG rhythm of AF, and initial Echo LAD >4.06 cm; and (4) age >63 years, hypertension, and initial ECG rhythm of AF.
Among the induced rules, 3 factors (the initial ECG rhythm [i.e., AF], initial Echo LAD, and age) were strongly associated with each other.
Atrial fibrillation (AF), the most common type of arrhythmia, reduces cardiac function by abolishing coordinated atrial contraction and producing irregular ventricular activity, and it is the main cause of 85% of systemic thromboembolic events and of 33% of strokes resulting from blood clots in the atria. Among the complications of AF, cerebral infarction produces a large number of lesions and has a high recurrence rate, and it is therefore associated with high mortality and morbidity. Preventing stroke is thus of utmost importance, and doing so requires precise identification of the risk factors involved. In patients with AF, the annual incidence of thromboembolism is approximately 4%-6%, and its risk factors include congestive heart failure, coronary artery disease, hypertension, age >65 years, diabetes mellitus, mitral valve disease, and a history of thromboembolism. Extensive research is being conducted to identify the risk factors for stroke as a complication of AF.
With the recent introduction of Electronic Medical Records (EMRs) and Electronic Health Records (EHRs), collecting various types of clinical data has become easier. However, clinical data within a hospital often include ambiguous and incomplete records that are difficult to interpret, which makes it difficult to identify important factors related to the diagnosis or prognosis of a disease or to extract meaningful knowledge from such data [4,5]. In clinical medicine, the independent risk factors associated with the diagnosis or prognosis of a disease are generally identified by multivariate logistic regression analysis, and diagnostic and predictive models are then built on these factors. This approach may be the most effective way to identify the independent risk factors of a disease, but it has a limitation: the relationships among the risk factors are difficult to extract. To address this limitation, many previous studies have proposed methods based on various data mining techniques [5-10]. In particular, some studies [5,10] proposed hybrid decision models (e.g., "multivariate statistical analysis and decision tree" and "rough set and decision tree") that combine the advantages of statistical analysis and machine learning, and they verified the effectiveness of these models by applying them to uncover useful information on acute appendicitis and heart failure. Following the approach of these studies [5,10], we combined multivariate statistical analysis with association rule mining to analyze the risk factors for cerebral infarction, a complication of AF, and we discuss how useful information for decision making can be obtained.
The study included 1,134 patients with AF who visited the outpatient clinic of the Dongsan Medical Center in Daegu, Korea, between September 1983 and September 2010. Data on demographic characteristics, medical history, initial electrocardiographic (ECG) findings, and initial echocardiographic (Echo) findings were collected from the medical records, and samples with missing values were excluded. To obtain information on the risk factors associated with cerebral infarction in patients with AF, the 227 patients with cerebral infarction complications formed the study group, and the remaining 907 patients without cerebral infarction complications formed the control group.
To compare the study group and the control group, the chi-square test or Fisher's exact test was performed for categorical variables. For continuous variables, Student's t-test was used when normality was confirmed by the Kolmogorov-Smirnov test, and the Mann-Whitney U test was used otherwise. Categorical variables are presented as percentages (%) and continuous variables as mean ± standard deviation, and statistical significance was set at p < 0.05. To identify the independent risk factors for cerebral infarction, multivariate logistic regression with forward stepwise selection was performed on the significant factors, using p-values of 0.05 for entry and 0.10 for removal. All statistical analyses were performed with SPSS ver. 12.0 (SPSS Inc., Chicago, IL, USA).
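As an illustration of the categorical comparison described above, the following minimal Python sketch computes an uncorrected Pearson chi-square test for a 2x2 table using only the standard library, applied to the hypertension counts reported in the results (116 of 227 study patients and 342 of 907 controls). The function name `chi_square_2x2` is hypothetical, and the rule of switching to Fisher's exact test when any expected count is below 5 is a common convention assumed here, not a criterion stated in the study; the study's own analyses were performed in SPSS.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square test (no continuity correction) for the 2x2 table
        [[a, b],   # study group:   factor present, factor absent
         [c, d]]   # control group: factor present, factor absent
    Returns (chi2, p_value, smallest expected cell count)."""
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    observed = ((a, b), (c, d))
    chi2 = 0.0
    min_expected = float("inf")
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            min_expected = min(min_expected, expected)
            chi2 += (observed[i][j] - expected) ** 2 / expected
    # With 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value, min_expected


# Hypertension: 116 of 227 study patients vs. 342 of 907 control patients.
chi2, p, min_exp = chi_square_2x2(116, 227 - 116, 342, 907 - 342)
# Assumed convention: use Fisher's exact test if any expected count is below 5.
use_fisher = min_exp < 5
print(f"chi2 = {chi2:.2f}, p = {p:.5f}, Fisher's exact preferred: {use_fisher}")
```

With these counts the sketch gives a chi-square statistic of roughly 13.5 with p < 0.001, consistent with the reported comparison, and all expected counts are large enough for the chi-square approximation.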
The present study used the Apriori algorithm to obtain information related to cerebral infarction. The Apriori algorithm is a typical association rule mining method: it generates rules of the form R: IF A THEN B (or A => B) and extracts association rules based on 2 user-defined parameters, support and confidence. The generated rules are then evaluated by their lift (improvement).
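For a rule A => B, the three measures are defined as

$$\mathrm{support}(A \Rightarrow B) = \frac{|A \cap B|}{N}, \qquad \mathrm{confidence}(A \Rightarrow B) = \frac{|A \cap B|}{|A|}, \qquad \mathrm{lift}(A \Rightarrow B) = \frac{\mathrm{confidence}(A \Rightarrow B)}{|B| / N},$$

where A and B denote the sets of samples satisfying the corresponding conditions, |·| denotes the cardinality of a set (the number of its elements), and N is the number of samples in the entire data set.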
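The support, confidence, and lift measures used in this study can be computed directly from sets of sample identifiers. The sketch below is a minimal Python illustration on a hypothetical 10-patient toy cohort (not study data); the function names are illustrative, and lift is taken as the confidence divided by the proportion of samples that satisfy B.

```python
def support(a_ids: set, b_ids: set, n: int) -> float:
    """Proportion of all n samples that satisfy both A and B: |A ∩ B| / N."""
    return len(a_ids & b_ids) / n


def confidence(a_ids: set, b_ids: set) -> float:
    """Proportion of samples satisfying A that also satisfy B: |A ∩ B| / |A|."""
    return len(a_ids & b_ids) / len(a_ids)


def lift(a_ids: set, b_ids: set, n: int) -> float:
    """Confidence divided by the prior proportion of B; values above 1 mean
    A and B co-occur more often than expected under independence."""
    return confidence(a_ids, b_ids) / (len(b_ids) / n)


# Hypothetical toy cohort of 10 patients (IDs 0-9), not study data.
n = 10
a = {0, 1, 2, 3}        # patients meeting antecedent A (e.g., age > 63 and hypertension)
b = {1, 2, 3, 7, 8}     # patients with consequent B (cerebral infarction)
print(support(a, b, n), confidence(a, b), lift(a, b, n))  # 0.3 0.75 1.5
```

Here 3 of the 10 patients satisfy both A and B, 3 of the 4 patients with A also have B, and because B alone occurs in half of the cohort, the lift of 1.5 indicates a positive association.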
In general, support is the proportion of all samples that satisfy both condition A in the antecedent and condition B in the consequent, whereas confidence is the proportion of samples satisfying condition A that also satisfy condition B. A confidence of 1 means that every sample satisfying A also satisfies B, so that A → B holds 100% of the time (a certain rule); otherwise, the confidence of A → B lies between 0 and 1 (a possible rule). A lift (improvement) greater than 1 indicates a positive correlation between A and B, a value of 1 indicates independence, and a value less than 1 indicates a negative correlation. Useful, generalizable rules therefore combine high confidence with a lift greater than 1. However, it is difficult to choose appropriate values for the two free parameters of association rule mining, because the rules obtained depend on the minimum thresholds for support and confidence. In this study, the minimum support was therefore fixed at 10% and the minimum confidence was varied from 10% to 50%; cardiology specialists reviewed the resulting association rules, and the final minimum confidence was determined on the basis of a clinical specialist's opinion. The Apriori component of the commercial data mining program Clementine ver. 12.0 (SPSS Inc., Chicago, IL, USA) was used for these experiments, with default values for the remaining parameters.
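The threshold sweep described above can be sketched with a small level-wise Apriori search in Python. This is not the Clementine implementation: the records are hypothetical, the consequent is fixed to a single target item (`"CI"` for cerebral infarction), and the minimum support is applied to the antecedent. That last choice is an assumption, motivated by the fact that the reported rule supports (6.88%-8.29%) are below the 10% minimum support, which is easiest to reconcile with a threshold on antecedent coverage.

```python
from itertools import combinations


def frequent_itemsets(records, min_support):
    """Level-wise Apriori search. Returns {itemset: count} for every itemset
    whose support (count / number of records) is at least min_support."""
    n = len(records)
    level = {frozenset([item]) for r in records for item in r}
    frequent = {}
    k = 1
    while level:
        counts = {c: sum(1 for r in records if c <= r) for c in level}
        current = {c: cnt for c, cnt in counts.items() if cnt / n >= min_support}
        frequent.update(current)
        k += 1
        # Join frequent (k-1)-itemsets into k-item candidates, then prune any
        # candidate with an infrequent (k-1)-subset (the Apriori property).
        prev = list(current)
        joined = {a | b for a in prev for b in prev if len(a | b) == k}
        level = {c for c in joined
                 if all(frozenset(s) in current for s in combinations(c, k - 1))}
    return frequent


def rules_for_target(records, target, min_support, min_confidence):
    """Rules X => target. The support threshold is applied to the antecedent X
    (an assumption, see the lead-in); the confidence threshold to the rule.
    Returns (antecedent, rule support, confidence, lift), highest confidence first."""
    n = len(records)
    antecedents = frequent_itemsets([r - {target} for r in records], min_support)
    prior = sum(1 for r in records if target in r) / n
    rules = []
    for x, x_count in antecedents.items():
        both = sum(1 for r in records if x <= r and target in r)
        conf = both / x_count
        if conf >= min_confidence:
            rules.append((sorted(x), both / n, conf, conf / prior))
    return sorted(rules, key=lambda rule: rule[2], reverse=True)


# Hypothetical toy records (not study data); "CI" marks cerebral infarction.
records = [
    {"age>63", "HTN", "LAD>4.06", "CI"},
    {"age>63", "HTN", "CI"},
    {"age>63", "LAD>4.06"},
    {"HTN", "LAD>4.06", "CI"},
    {"age>63", "HTN", "LAD>4.06"},
    {"HTN"},
    {"age>63"},
    {"LAD>4.06", "CI"},
]

# Sweep the minimum confidence at a fixed 10% minimum support.
for min_conf in (0.1, 0.3, 0.5, 0.7):
    found = rules_for_target(records, "CI", min_support=0.1, min_confidence=min_conf)
    print(f"min confidence {min_conf:.1f}: {len(found)} rules")
```

On these toy records the sweep returns 7, 7, 5, and 0 rules at minimum confidences of 0.1, 0.3, 0.5, and 0.7, illustrating how rules disappear once the threshold exceeds the highest confidence available in the data.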
Comparing the general characteristics of the two groups, the mean age was significantly higher in the study group (69.31 ± 9.04 years) than in the control group (66.17 ± 11.22 years; p < 0.001). Regarding medical history, 116 patients (51.1%) in the study group had hypertension, compared with 342 patients (37.7%) in the control group (p < 0.001), and 43 patients (18.9%) in the study group had coronary artery disease, compared with 117 patients (12.9%) in the control group (p < 0.05). The initial ECG rhythm also differed significantly between the groups (p < 0.01): AF was observed in 205 patients (90.3%) in the study group and 738 patients (81.4%) in the control group; AF, in 4 patients (1.8%) and 15 patients (1.7%); and normal sinus rhythm, in 18 patients (7.9%) and 154 patients (17.0%), respectively. Regarding AF type, paroxysmal AF was observed in 58 patients (25.6%) in the study group and 325 patients (35.8%) in the control group; persistent AF, in 17 patients (7.5%) and 73 patients (8.0%); and permanent AF, in 152 patients (67.0%) and 509 patients (56.1%), respectively. Permanent AF was the most frequent type in both groups, and the distribution of AF types differed significantly between the groups (p < 0.01) (Table 1).
In the present study, receiver operating characteristic (ROC) curve analysis, summarized by the area under the curve (AUC), was used to determine clinical cutoff values for the independent variables with continuous values. For each variable, the point on the ROC curve that gave the best discrimination between the study and control groups was selected as the cutoff value. Table 2 lists these continuous variables (age, initial ECG heart rate, and initial Echo ejection fraction [EF, %], diastolic left ventricular dimension [LVDd, cm], systolic left ventricular dimension [LVDs, cm], and left atrial dimension [LAD, cm]) together with their cutoff values, standard errors, and AUCs. As shown in Table 2, the cutoff values were age >63 years, heart rate >78 beats/min, EF ≤63%, LVDd ≤4.87 cm, LVDs >3.04 cm, and LAD >4.06 cm. Table 3 compares the study and control groups after these 6 variables were dichotomized at their cutoff values; the difference was statistically significant (p < 0.01) for LAD.
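The study does not state which criterion defined the best point on the ROC curve. A common choice is Youden's index (sensitivity + specificity - 1); the Python sketch below assumes that criterion, uses the rule "positive if value > cutoff" to match cutoffs such as age >63 years, and computes the AUC as the probability that a randomly chosen case has a higher value than a randomly chosen control. The function names and the ages and labels are hypothetical toy data, not study data.

```python
def roc_points(values, labels):
    """(cutoff, sensitivity, specificity) for the rule 'positive if value > cutoff'."""
    pos = [v for v, y in zip(values, labels) if y == 1]
    neg = [v for v, y in zip(values, labels) if y == 0]
    points = []
    for c in sorted(set(values)):
        sensitivity = sum(v > c for v in pos) / len(pos)
        specificity = sum(v <= c for v in neg) / len(neg)
        points.append((c, sensitivity, specificity))
    return points


def auc(values, labels):
    """AUC as P(case value > control value), counting ties as 1/2."""
    pos = [v for v, y in zip(values, labels) if y == 1]
    neg = [v for v, y in zip(values, labels) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def youden_cutoff(values, labels):
    """Cutoff maximizing Youden's J = sensitivity + specificity - 1 (assumed criterion)."""
    best = max(roc_points(values, labels), key=lambda pt: pt[1] + pt[2] - 1)
    return best[0]


# Hypothetical toy data: ages, with 1 = cerebral infarction and 0 = none.
ages = [58, 61, 64, 66, 70, 72, 55, 60, 62, 63, 65, 68]
labels = [0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 1, 0]
print(youden_cutoff(ages, labels), round(auc(ages, labels), 3))  # 63 0.914
```

For the toy data the selected cutoff is 63 (sensitivity 1.0, specificity 6/7) and the AUC is 32/35. Other criteria, such as the point closest to the top-left corner of the ROC plot, can select a different cutoff.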
Binary logistic regression analysis was used to extract the risk factors associated with cerebral infarction complications from the 6 factors selected by univariate statistical analysis (age, hypertension, coronary artery disease, initial ECG rhythm, AF type, and initial Echo LAD). Age, hypertension, initial ECG rhythm, and initial Echo LAD were independent risk factors for cerebral infarction: the odds of cerebral infarction were 1.949 times higher for patients older than 63 years, 1.587 times higher for patients with a history of hypertension, 2.026 times higher when the initial ECG rhythm was AF, and 1.482 times higher when the initial Echo LAD exceeded 4.06 cm. The Hosmer-Lemeshow goodness-of-fit test indicated adequate model fit (significance level of the chi-square statistic, 0.538) (Table 4).
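Because each ratio reported from binary logistic regression is an odds ratio, exp(β), the corresponding coefficients are β = ln(OR). Under a main-effects model without interaction terms (assumed here; the text reports none), log-odds add across factors, so the model-implied odds ratio for a patient with all four factors versus none is the product of the four odds ratios. The Python sketch below computes both from the reported values; the combined figure is an arithmetic consequence of that assumed model form, not a result reported by the study.

```python
import math

# Adjusted odds ratios reported for the four independent risk factors.
odds_ratios = {
    "age > 63 years": 1.949,
    "hypertension": 1.587,
    "initial ECG rhythm AF": 2.026,
    "initial Echo LAD > 4.06 cm": 1.482,
}

# Each odds ratio is exp(beta), so the logistic coefficient is beta = ln(OR).
for name, ratio in odds_ratios.items():
    print(f"{name}: beta = {math.log(ratio):.3f}")

# In a main-effects model (no interaction terms assumed), log-odds add,
# so the odds ratios of co-occurring factors multiply.
combined = math.prod(odds_ratios.values())
print(f"all four factors vs. none: OR = {combined:.2f}")
```

The product is about 9.29. Converting it to an absolute risk would additionally require the model intercept, which is not reported in the text.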
To obtain clinically reliable rules for identifying cerebral infarction complications, association rules were generated while the minimum confidence was varied from 10% to 50%. The most reliable rule set was obtained with a minimum confidence of 30%, which yielded 44 rules (40 related to AF and 4 related to cerebral infarction) (Table 5). As shown in Table 5, the first rule had the highest confidence and improvement (lift), and it can be interpreted as follows.
Patients who were older than 63 years, had a history of hypertension, showed AF as the initial ECG rhythm, had an initial Echo LAD over 4.06 cm, and had cerebral infarction accounted for approximately 7% of all 1,134 patients (support); approximately 33% of the patients meeting these four antecedent conditions had cerebral infarction (confidence); and the improvement (lift) of 1.7 indicates a positive correlation.
Table 6 shows how the number of rules in each group changed as the minimum confidence was varied from 10% to 50%; no rules associated with cerebral infarction were obtained when the minimum confidence was set to 35%-50%. In addition, Web node analysis of the relationships among the fields showed that the factors most closely associated with cerebral infarction in patients with AF were, in decreasing order of association, AF as the initial ECG rhythm, initial Echo LAD >4.06 cm, age >63 years (demographic characteristics), and hypertension (medical history) (Figure 1).
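Clementine's Web node displays the strength of the links between field values. The Python sketch below assumes that link strength is the number of records in which two values co-occur, and ranks the values linked to a target in that way. The function name, records, and labels are hypothetical and chosen only to make the output easy to read; they are not the study data behind Figure 1.

```python
from collections import Counter


def link_strengths(records, target):
    """Rank field values by how many records contain them together with target
    (co-occurrence count, assumed here as the link-strength measure)."""
    counts = Counter()
    for r in records:
        if target in r:
            counts.update(item for item in r if item != target)
    return counts.most_common()


# Hypothetical records (not study data); "CI" marks cerebral infarction.
records = [
    {"AF rhythm", "LAD>4.06", "age>63", "HTN", "CI"},
    {"AF rhythm", "LAD>4.06", "age>63", "CI"},
    {"AF rhythm", "LAD>4.06", "CI"},
    {"AF rhythm", "CI"},
    {"AF rhythm", "HTN"},
    {"age>63", "HTN"},
]
for value, strength in link_strengths(records, "CI"):
    print(f"{value}: {strength}")
```

Raw co-occurrence counts favor common values; normalizing by each value's overall frequency would give a measure closer to confidence.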
Atrial fibrillation, the most common supraventricular arrhythmia, in which irregular atrial muscle contractions produce an irregular pulse, is a well-known major risk factor for thromboembolism. It can also lead to stroke and cause hemodynamic instability, deterioration of renal function, and systemic embolic events [13,14].
In the present study, multivariate logistic regression analysis was used to extract the risk factors for cerebral infarction complications in patients with AF, and the relationships among these factors were analyzed by association rule mining. The independent risk factors for cerebral infarction complications were age, hypertension, initial ECG rhythm, and initial Echo LAD, and the following rules associated with cerebral infarction were obtained: 1) age >63 years, hypertension is present, initial ECG rhythm is AF, initial Echo LAD >4.06 cm => cerebral infarction (support, 6.88%; confidence, 33.48%); 2) age >63 years, hypertension is present, initial Echo LAD >4.06 cm => cerebral infarction (support, 7.50%; confidence, 31.84%); 3) hypertension is present, initial ECG rhythm is AF, initial Echo LAD >4.06 cm => cerebral infarction (support, 8.29%; confidence, 30.52%); and 4) age >63 years, hypertension is present, initial ECG rhythm is AF => cerebral infarction (support, 7.50%; confidence, 30.47%). In addition, Web node analysis showed that AF as the initial ECG rhythm was the factor most closely associated with cerebral infarction in patients with AF, followed by initial Echo LAD >4.06 cm, age >63 years, and hypertension, in decreasing order of association.
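The support and confidence values of the four rules listed above can be cross-checked with a few lines of Python. In this sketch, lift is computed as confidence divided by the prevalence of cerebral infarction in the cohort (227/1,134, about 0.20); the patient counts are back-calculated approximations from the rounded percentages and are not reported in the study.

```python
N = 1134          # total number of patients
N_CI = 227        # patients with cerebral infarction (study group)
prior = N_CI / N  # prevalence of cerebral infarction, about 0.200

# Reported (support, confidence) of the four rules.
rules = {
    1: (0.0688, 0.3348),
    2: (0.0750, 0.3184),
    3: (0.0829, 0.3052),
    4: (0.0750, 0.3047),
}

for rule_id, (sup, conf) in rules.items():
    lift = conf / prior
    both = round(sup * N)            # approx. patients satisfying A and B
    antecedent = round(both / conf)  # approx. patients satisfying A
    print(f"rule {rule_id}: lift {lift:.2f}, ~{both} with A and B, ~{antecedent} with A")
```

For rule 1 this gives a lift of about 1.67, in line with the improvement of 1.7 reported for the first rule, and all four rules have a lift above 1.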
An existing numeric tool for estimating the risk of stroke in patients with atrial fibrillation is the CHADS2 score (congestive heart failure, hypertension, age, diabetes mellitus, prior stroke, transient ischemic attack [TIA], or thromboembolism [doubled]). In this score, 1 point each is given for congestive heart failure, hypertension, age of 75 years or older, and diabetes mellitus, and 2 points are given for a history of stroke or TIA; patients are then classified into a low-risk group (0 points), a moderate-risk group (1 point), and a high-risk group (2 or more points). The CHA2DS2-VASc score (congestive heart failure/left ventricular dysfunction, hypertension, age ≥75 years [doubled], diabetes, stroke [doubled], vascular disease, age 65-74 years, and sex category [female]; Birmingham 2009 scheme), adopted in the 2010 European Society of Cardiology guidelines, is a more detailed stroke risk assessment tool than the CHADS2 score because it adds risk factors (female sex, age 65-74 years, left ventricular dysfunction, and vascular disease) that affect thromboembolism in patients whose CHADS2 score is 0 or 1. In these two tools, the factors used to assess the risk of stroke in patients with atrial fibrillation are age, hypertension, diabetes mellitus, history of stroke or TIA, sex, left ventricular dysfunction, and vascular disease. These factors are consistent with the 4 risk factors suggested in this study (age, hypertension, AF as the initial ECG rhythm, and initial Echo LAD) in that age and hypertension are components of both scores.
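The point assignments described above can be written as two small scoring functions. In this Python sketch the `Patient` class and its fields are hypothetical, and a single flag stands for both congestive heart failure (CHADS2) and congestive heart failure/left ventricular dysfunction (CHA2DS2-VASc), which is a simplification; the risk groups follow the CHADS2 categories given in the text.

```python
from dataclasses import dataclass


@dataclass
class Patient:
    age: int
    female: bool = False
    chf: bool = False               # congestive heart failure / LV dysfunction
    hypertension: bool = False
    diabetes: bool = False
    prior_stroke_tia: bool = False  # stroke, TIA, or thromboembolism
    vascular_disease: bool = False


def chads2(p: Patient) -> int:
    """1 point each for CHF, hypertension, age >= 75, diabetes; 2 for stroke/TIA."""
    return (p.chf + p.hypertension + (p.age >= 75) + p.diabetes
            + 2 * p.prior_stroke_tia)


def chads2_risk_group(score: int) -> str:
    """Low (0), moderate (1), or high (2 or more) risk."""
    return "low" if score == 0 else "moderate" if score == 1 else "high"


def cha2ds2_vasc(p: Patient) -> int:
    """Adds vascular disease, female sex, and 1 point for age 65-74 (2 for >= 75)."""
    age_points = 2 if p.age >= 75 else 1 if p.age >= 65 else 0
    return (p.chf + p.hypertension + age_points + p.diabetes
            + 2 * p.prior_stroke_tia + p.vascular_disease + p.female)


# A hypothetical 70-year-old woman with hypertension.
p = Patient(age=70, female=True, hypertension=True)
print(chads2(p), chads2_risk_group(chads2(p)), cha2ds2_vasc(p))  # 1 moderate 3
```

This example shows how CHA2DS2-VASc separates patients whom CHADS2 places in the moderate-risk group: age 65-74 and female sex add 2 points here.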
The results of the present study suggest risk factors for cerebral infarction complications in patients with atrial fibrillation, and association rules among these factors, based on retrospectively collected medical record data. However, the effectiveness and reliability of these risk factors and association rules have yet to be verified for clinical application; further research is required for such verification, as are comparative studies with existing stroke assessment tools.
This research was supported by the Basic Research Program through the National Research Foundation of Korea (NRF), funded by the Ministry of Education, Science and Technology (Grant No. 2012-0004520); by the Regional Technology Innovation Program of the Ministry of Knowledge Economy (Grant No. RTI04-01-01); and by the Industrial Strategic Technology Development Program of the Ministry of Knowledge Economy through the project on an implantable biosensor and automatic physiological function monitoring system for chronic disease management (Grant No. 10041876).
No potential conflict of interest relevant to this article was reported.