To construct and optimize a neural network that is capable of predicting the occurrence of recurrent aphthous ulceration (RAU) based on a set of appropriate input data.
Artificial neural network (ANN) software that employs genetic algorithms to optimize the architecture of neural networks was used. Input and output data of 86 participants (predisposing factors and the participants' status with regard to recurrent aphthous ulceration) were used to construct and train the neural networks. The optimized neural networks were then tested using data from a further 10 participants that had not been used in training.
The optimized neural network that produced the most accurate predictions of the presence or absence of recurrent aphthous ulceration was found to employ gender, hematological (with or without ferritin) and mycological data of the participants, frequency of tooth brushing, and consumption of vegetables and fruits.
Factors appearing to be related to recurrent aphthous ulceration and appropriate for use as input data to construct ANNs that predict recurrent aphthous ulceration were found to include the following: gender, hemoglobin, serum vitamin B12, serum ferritin, red cell folate, salivary candidal colony count, frequency of tooth brushing, and the number of fruits or vegetables consumed daily.
Recurrent aphthous ulceration (RAU) affects healthy as well as medically compromised people. Aphthous ulcers are painful, shallow, and usually covered with a grayish-white pseudomembrane surrounded by an erythematous margin.1
Although the clinical characteristics of RAU are well defined, the precise etiology remains unclear, and therefore the term “idiopathic” is widely used.2 Nevertheless, a number of predisposing factors have been identified in a minority of patients. A genetic background has been found in some RAU patients; those with a positive family history of oral ulceration have shown an increased frequency of human leukocyte antigen (HLA) types A2, A11, B12, and DR2.2
Dietary patterns may play a role in the pathogenesis, either by causing hypersensitivity or through deficiency of certain vitamins, proteins, or minerals.3 Results of recent studies implicate cow's milk in the etiology of RAU.4–6 Recurrent aphthous-like ulcers are seen as oral manifestations of deficiencies of vitamins B1, B2, B6, and B12, folic acid, or iron.2 While some researchers have found a significant relationship between vitamin B12 deficiency and RAU, hemoglobin level and serum levels of folic acid and ferritin were found not to have a statistically significant effect on RAU.1 Some researchers noticed that RAU patients ate acidic foods such as oranges and lemons more frequently than participants in a control group.3 Allergies to foods and other substances, including chocolate, cheese, gluten, cinnamaldehyde, methyl methacrylate, mercury, wheat flour, tomatoes, peanuts, and strawberries, might be responsible for the onset of oral ulcers.2,7–9
A minority of patients may be predisposed to aphthous-like ulcers by systemic conditions or diseases.10 Gender seems to be unrelated to the occurrence of RAU;1 however, patients affected by RAU are usually nonsmokers.2
Based on knowledge of the aforementioned predisposing factors, the diagnosis of RAU can be established by obtaining a proper history that confirms recurrence and excludes trauma as a predisposing factor. The clinical features of RAU are also important tools in establishing the diagnosis. RAU can appear in one of three forms: minor, major, and herpetiform.11,12
Artificial neural networks (ANNs) are an example of an intelligent data analysis tool and are claimed to be superior to classic regression.13,14 ANNs are loosely modeled on neurons in the brain, which have the capability of acquiring, storing, and utilizing experiential knowledge.15,16 An ANN consists of an interconnected group of artificial neurons that process information using a connectionist approach to computation. It is an adaptive system that adjusts the values of internal parameters (weights) associated with the input data according to their effect on the output data.15,16
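The idea of interconnected artificial neurons whose weights determine the output can be illustrated with a minimal feedforward pass. This is a sketch under assumptions: the sigmoid activation, the layer layout, and the function names `neuron` and `forward` are illustrative choices, since the paper does not describe the internals of the software it used.

```python
import math

def sigmoid(z: float) -> float:
    """Logistic activation squashing any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def neuron(inputs, weights, bias):
    """One artificial neuron: weighted sum of inputs plus bias, then activation."""
    return sigmoid(sum(w * x for w, x in zip(weights, inputs)) + bias)

def forward(inputs, layers):
    """Propagate inputs through layers; each layer is a list of (weights, bias) per neuron."""
    activations = list(inputs)
    for layer in layers:
        activations = [neuron(activations, weights, bias) for weights, bias in layer]
    return activations

# Hypothetical 2-input network: one hidden layer of two neurons, one output neuron.
# The weights are arbitrary; training (or a GA) would adjust them.
layers = [
    [([0.5, -0.4], 0.1), ([0.3, 0.8], -0.2)],  # hidden layer
    [([1.2, -0.7], 0.05)],                     # output layer
]
print(forward([1.0, 2.0], layers))
```

The weights and biases here play the role of the adjustable "constants" mentioned above: changing them changes how strongly each input influences the output.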
Genetic algorithms (GAs) are based on the triangle of genetic reproduction, evaluation, and selection.17 Genetic reproduction is performed by means of two basic genetic operators: crossover and mutation. Evaluation is performed by means of the fitness function, which depends on the specific problem. Selection is the mechanism that chooses parent individuals with a probability proportional to their relative fitness. Some genetic algorithms (like the one used in this work) consist of the following steps:
Initialization: An initial population comprising a number of individuals is randomly generated.
Evaluation: The fitness, a positive measure of how good an individual is, is calculated for each individual in the population.
Selection: Individuals are chosen from the current population to enter a mating pool devoted to creating the individuals of the next generation, such that the chance of a given individual being selected to mate is proportional to its relative fitness. The best individuals therefore produce more copies in subsequent generations, so their desirable traits can be passed on to their offspring, and the overall quality of the population tends to increase from one generation to the next.
Crossover: Crossover, which plays a central role in GAs, provides the means by which valuable information is shared among the population. It combines the features of two parent individuals to form two offspring that may have new patterns compared with those of their parents.
Mutation: Mutation is often introduced to guard against premature convergence. Over several generations the gene pool tends to become increasingly homogeneous; mutation introduces occasional perturbations to the parameters to maintain genetic diversity within the population.
Replacement: After the offspring population has been generated by applying the genetic operators to the parent population, the parent population is totally replaced by the offspring population. This is known as non-overlapping, generational replacement, and it completes the “life cycle” of the population.
Termination: The GA is terminated when some convergence criterion is met, for example when the fitness of the best individual found so far exceeds a threshold value or when the maximum number of generations is reached. After termination, the best individual found so far is taken as the optimal solution of the problem.
The block diagram of the genetic algorithm is given in Figure 1.
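The GA steps described above can be sketched compactly. This is a minimal illustration, not the software used in the study: it assumes bit-string individuals, roulette-wheel (fitness-proportional) selection, one-point crossover, and bit-flip mutation, and the name `run_ga` and all default parameter values are hypothetical.

```python
import random

def run_ga(fitness, genome_length, pop_size=50, crossover_rate=0.8,
           mutation_rate=0.01, max_generations=200, target=None, seed=0):
    """Generational GA on bit strings; fitness must return a positive number."""
    rng = random.Random(seed)
    # Initialization: a random population of bit-string individuals.
    population = [[rng.randint(0, 1) for _ in range(genome_length)]
                  for _ in range(pop_size)]
    best = max(population, key=fitness)
    for _ in range(max_generations):
        # Evaluation: score every individual.
        scores = [fitness(ind) for ind in population]
        # Selection: roulette wheel, probability proportional to relative fitness.
        pool = rng.choices(population, weights=scores, k=pop_size)
        offspring = []
        for i in range(0, pop_size, 2):
            # Copy parents so later mutation cannot alter shared references.
            a, b = pool[i][:], pool[(i + 1) % pop_size][:]
            # Crossover: one-point exchange of genes between two parents.
            if rng.random() < crossover_rate:
                cut = rng.randint(1, genome_length - 1)
                a, b = a[:cut] + b[cut:], b[:cut] + a[cut:]
            offspring.extend([a, b])
        # Mutation: occasional bit flips maintain genetic diversity.
        for child in offspring:
            for j in range(genome_length):
                if rng.random() < mutation_rate:
                    child[j] = 1 - child[j]
        # Replacement: non-overlapping, generational (parents fully replaced).
        population = offspring[:pop_size]
        generation_best = max(population, key=fitness)
        if fitness(generation_best) > fitness(best):
            best = generation_best
        # Termination: stop at the fitness threshold, else after max_generations.
        if target is not None and fitness(best) >= target:
            break
    return best

# Toy problem ("OneMax"): maximize the number of 1 bits; +1 keeps fitness positive.
best = run_ga(lambda g: sum(g) + 1, genome_length=20, target=21)
print(sum(best), best)
```

Fitness-proportional selection requires strictly positive fitness values, which is why the toy fitness adds 1; the best-so-far individual is tracked separately because generational replacement can discard it.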
The parameters optimized by the genetic algorithm during the training phase are the number of layers, the number of neurons, and the corresponding weights. For each individual (candidate network), the network output is compared with the desired output, and the overall error is minimized over the course of the genetic algorithm's evolution.18
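One way to let each GA individual stand for a whole network, with a fitness that rises as the network's error on the training data falls, is sketched below. The encoding (first gene = number of hidden neurons, then weights and biases), the tanh hidden layer with a linear output, and the fitness 1/(1 + sum of squared errors) are all assumptions for illustration; the study does not report how its software encoded or scored networks.

```python
import math

def decode(genome, n_inputs):
    """Decode a flat genome into (hidden_layer, output_neuron).

    Assumed encoding: genome[0] = number of hidden neurons; the remaining genes
    are each hidden neuron's weights followed by its bias, then the output
    neuron's weights followed by its bias.
    """
    n_hidden = int(genome[0])
    genes = list(genome[1:])
    step = n_inputs + 1
    needed = n_hidden * step + n_hidden + 1
    if len(genes) < needed:
        raise ValueError(f"genome needs {needed} weight genes, got {len(genes)}")
    hidden = [(genes[i * step:i * step + n_inputs], genes[i * step + n_inputs])
              for i in range(n_hidden)]
    start = n_hidden * step
    output = (genes[start:start + n_hidden], genes[start + n_hidden])
    return hidden, output

def predict(genome, x):
    """Network output for one case (input vector x)."""
    hidden, (w_out, b_out) = decode(genome, len(x))
    h = [math.tanh(sum(w * xi for w, xi in zip(ws, x)) + b) for ws, b in hidden]
    # Linear output so predictions can fall near the output codes 1 and 2.
    return sum(w * hi for w, hi in zip(w_out, h)) + b_out

def fitness(genome, inputs, targets):
    """Higher is better: 1 / (1 + sum of squared errors over training cases)."""
    sse = sum((predict(genome, x) - y) ** 2 for x, y in zip(inputs, targets))
    return 1.0 / (1.0 + sse)

# Hypothetical genome: 2 hidden neurons for 2 inputs (9 weight genes after the size gene).
genome = [2, 0.4, -0.3, 0.1, 0.2, 0.5, -0.1, 1.0, -0.6, 1.5]
print(fitness(genome, inputs=[[0.3, 0.7], [0.1, 0.2]], targets=[1, 2]))
```

Because the number of hidden neurons is itself a gene, a GA operating on such genomes can change both the architecture and the weights, in the spirit of the optimization described above.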
ANNs have been used in medicine to investigate the causality of a number of diseases and have been found to have relatively high accuracy.19–23 Some researchers used ANNs to diagnose celiac disease based on the occurrence of oral lesions including RAU.24 Others used them to predict survival rates of cancer patients undergoing esophageal and esophagogastric junction resections,25 to predict relapse in breast cancer patients,26 to predict lymph node metastasis in gastric cancer,27 to diagnose and predict survival of patients with colon cancer,28 to predict radiation-induced liver disease,29 and to study pancreatic cancer.21 Despite the promising medical applications of ANNs, their use in oral medicine is still limited and is mainly focused on oral cancer and precancer.30–35
The aim of this study was to identify the predisposing factors suitable for constructing artificial neural networks capable of predicting the occurrence of RAU.
All 96 participants included in this study were patients attending the Orthodontics Clinic in the Dental Department at The University of Jordan Hospital. They were first-time attendees seeking orthodontic treatment for mild to moderate malocclusion. All participants reported a medical history free of any disease except common infections such as influenza or the common cold, and had no oral or dental pathologies. The 96 participants were divided into two groups: group 1 consisted of 86 patients used in the construction phase of the ANNs, and group 2 consisted of 10 patients used in the reproduction (prediction) phase of the study.
Patients were asked to fill out a questionnaire containing items concerning: oral hygiene habits (tooth brushing, use of mouth wash, and use of dental floss), nutritional habits (daily consumption of fresh fruits and vegetables), and history of recurrent nontraumatic oral ulceration.
All patients were investigated for complete blood count, serum vitamin B12, serum ferritin, and red cell folate. Blood samples for complete blood count and red cell folate were collected in EDTA tubes. Blood samples for ferritin and B12 were collected in plain tubes. All tests were analyzed in batch samples at the University of Jordan Hospital Clinical Laboratories.
A sample of saliva was collected from each patient. Patients were asked not to eat or drink for at least 1 hour before the procedure and were then instructed to expectorate all saliva into a sterile container for 5 minutes. Salivary samples were cultured within 2 hours of collection on Sabouraud glucose agar plates using the streaking method and incubated at 35°C for 24–48 hours. All yeast-like colonies were recorded, and whether they were Candida or budding yeast cells was determined using wet preparations, ChromCandida agar, and RapID Yeast Plus Systems for Yeast Species (Remel, KS, USA).
The software used to construct the ANNs employed genetic algorithms for network optimization. The population size was set to 50; that is, each generation consisted of a batch of 50 ANNs. If no individual (ANN configuration) in a generation reached a fitness of 100%, the process proceeded to the next generation, in which another batch of 50 ANNs was produced, and so on until at least one ANN with a fitness of 100% was produced.18
Nine ANNs were constructed. For each network, a different group of predisposing factors was used as input data, as detailed below. The output data were the same for all networks and described the presence (coded as 1) or absence (coded as 2) of oral ulceration for each participant.
Input data (predisposing factors) for each network were as follows:
Network 1: Gender, hemoglobin, serum vitamin B12, serum ferritin, red cell folate, salivary candidal colony count, daily frequency of tooth brushing and flossing, weekly frequency of mouthwash use, and the number of fruits and vegetables consumed daily.
Network 2: Gender, hematological, and mycological results.
Network 3: Gender, hematological, and mycological results, tooth brushing, and consumption of fresh fruits and vegetables.
Network 4: Gender and hematological results.
Network 5: As network 3 but without gender.
Network 6: As network 3 but without the data for hemoglobin.
Network 7: As network 3 but without the data for vitamin B12.
Network 8: As network 3 but without the data for ferritin.
Network 9: As network 3 but without the data for red cell folate.
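The nine input configurations above can be written as a small configuration table, which makes the "as network 3 but without X" relationships explicit. The feature names (for example `candidal_colony_count`) are hypothetical labels; the study does not publish its variable names. "Hematological" is taken to mean the four blood measures listed for network 1, which is consistent with networks 6 to 9 each removing one of them.

```python
# Hypothetical feature labels for the study's input variables.
HEMATOLOGICAL = ["hemoglobin", "vitamin_b12", "ferritin", "red_cell_folate"]
MYCOLOGICAL = ["candidal_colony_count"]

def without(features, name):
    """Return a copy of a feature list with one input removed."""
    return [f for f in features if f != name]

NETWORK_3 = ["gender", *HEMATOLOGICAL, *MYCOLOGICAL,
             "tooth_brushing", "fruit_veg_daily"]

NETWORK_INPUTS = {
    1: ["gender", *HEMATOLOGICAL, *MYCOLOGICAL, "tooth_brushing", "flossing",
        "mouthwash_weekly", "fruit_veg_daily"],
    2: ["gender", *HEMATOLOGICAL, *MYCOLOGICAL],
    3: NETWORK_3,
    4: ["gender", *HEMATOLOGICAL],
    5: without(NETWORK_3, "gender"),
    6: without(NETWORK_3, "hemoglobin"),
    7: without(NETWORK_3, "vitamin_b12"),
    8: without(NETWORK_3, "ferritin"),
    9: without(NETWORK_3, "red_cell_folate"),
}

# Output coding shared by every network.
OUTPUT_CODES = {1: "RAU present", 2: "RAU absent"}

for net, inputs in NETWORK_INPUTS.items():
    print(f"Network {net}: {len(inputs)} inputs")
```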
All networks were designed and constructed based on GA optimization.
After the construction phase, each network reproduced the output (network output), and training on the input and output data continued until the network output deviated only slightly from the actual output.
In the final phase, each network was used to obtain predictions of the RAU status of patients in group 2 using the input data of the patients in that group.
Network predictions that were less than 1.5 were considered equivalent to 1, indicating presence of RAU. If the network prediction was 1.5 or more, it was considered equivalent to 2, indicating absence of RAU.
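The thresholding rule above can be expressed as a one-line function. The name `to_class` and the example outputs are hypothetical, used only to illustrate the 1.5 cut-off.

```python
def to_class(prediction: float) -> int:
    """Threshold a continuous network output into the study's output codes.

    Outputs below 1.5 map to 1 (RAU present); 1.5 or more map to 2 (RAU absent).
    """
    return 1 if prediction < 1.5 else 2

# Hypothetical network outputs, for illustration only.
outputs = [1.12, 1.50, 1.87, 0.96]
print([to_class(p) for p in outputs])  # [1, 2, 2, 1]
```

Note that an output of exactly 1.5 is assigned to code 2 (absence of RAU), as the rule specifies.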
Statistical tests of significance were used to explore differences between the group with RAU and the group without RAU in gender, hemoglobin, vitamin B12, ferritin, red cell folate, candidal colony count, oral hygiene habits, and dietary habits (Table 2).
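The paper does not name the specific tests used. As one possible illustration of comparing a continuous factor between the two groups at the 95% confidence level, the sketch below uses a two-sided permutation test on the difference in means; this is a swapped-in technique chosen because it needs no distributional assumptions or external libraries, not necessarily what the authors ran. The function name and example values are hypothetical.

```python
import random
from statistics import mean

ALPHA = 0.05  # significance level matching a 95% confidence level

def permutation_test(group_a, group_b, n_permutations=10_000, seed=0):
    """Two-sided permutation test for a difference in means; returns a p-value."""
    rng = random.Random(seed)
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_permutations):
        # Randomly reassign group labels and recompute the mean difference.
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            extreme += 1
    # Add-one correction keeps the p-value strictly positive.
    return (extreme + 1) / (n_permutations + 1)

# Hypothetical values for one factor in each group (not study data).
p = permutation_test([13.1, 14.2, 12.8, 13.9], [13.5, 12.9, 14.0, 13.3])
print(p, "significant" if p < ALPHA else "not significant")
```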
Approval to carry out this study was obtained from the ethics committee of the University of Jordan Hospital. All participants (or their guardians, when required) signed a consent form to participate in this study.
Table 1 displays the predictions of the nine networks for patients in group 2. Networks 3 and 8 produced the most accurate predictions, and networks 2 and 5 produced the least accurate predictions.
The accuracy of predictions was 90% for networks 3 and 8, 80% for networks 4, 6, and 9, 70% for networks 1 and 7, and 60% for networks 2 and 5.
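Because group 2 contained only 10 patients, each accuracy figure corresponds directly to a number of correctly classified patients, and every misclassification lowers accuracy by 10 percentage points. The relation below simply restates the reported percentages in those terms:

```latex
\text{Accuracy} = \frac{n_{\text{correct}}}{N} \times 100\%, \qquad N = 10
\quad\Rightarrow\quad
90\% \leftrightarrow 9,\;\; 80\% \leftrightarrow 8,\;\; 70\% \leftrightarrow 7,\;\; 60\% \leftrightarrow 6 \text{ correct predictions}
```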
Table 2 displays the results of the statistical tests of significance. At the 95% confidence level, participants with RAU did not differ significantly from those without RAU in any of the possible predisposing factors.
Little work has been done on the use of artificial intelligence to predict oral diseases. To our knowledge, this study is the first to use ANNs to predict RAU, a poorly understood condition. Although a number of factors have been linked to RAU,2,3 it can be considered an idiopathic disease with an unknown etiology in most cases.2
Whereas ordinary statistical tests of significance could not detect significant differences in the possible predisposing factors between participants with and without RAU, some of the ANNs constructed in this study and trained on the same data detected a pattern implicating some of the above-mentioned factors as predisposing factors for RAU.
It should be noted that the statistical tests were performed at a 95% confidence level (significance level of 0.05). Some researchers advocate the use of different confidence levels when testing statistical significance in certain situations.36
The better performance of ANNs compared with ordinary statistics in this study agrees with the findings of Kattan.14 The ANN software used in this study employed a specialized genetic algorithm to build and optimize all tested networks; however, performance varied between networks, because accuracy depended on the choice of input data (predisposing factors) for each network.
When a neural network estimates values, its output commonly differs from the actual output. Hence, because the actual output in this study was either 1 or 2, the network output was rounded to the nearer of these two values (outputs below 1.5 to 1, and outputs of 1.5 or more to 2).
We observed that the more a network was trained on the supplied data set, the more accurate it became on that data set.
As far as the prediction of unknown data is concerned, and depending on the factors mentioned above, the accuracy of the ANNs can reach 90%. This level of accuracy could in itself be of considerable clinical value.
Gender, hematological and mycological data, tooth brushing, and consumption of fruits and vegetables were the most important factors in producing networks with high accuracy. Eliminating ferritin as a predisposing factor, however, did not reduce the accuracy of the network. In fact, the predictions made with network 8 had a sum of deviations equal to zero, which makes network 8 the most accurate network in this study (Figure 2).
If trained on more input and output data from new patients in the future, this network may be able to reach 100% accuracy.
Gender, hematological (without ferritin) and mycological data, tooth brushing, and fruits and vegetables consumed were found to be related to the occurrence of RAU.
We would like to thank the Deanship of Scientific Research/ University of Jordan, Amman, Jordan for providing the necessary funds to carry out this study.
We would also like to thank Mrs Dareen Yaseen and Mrs Manal Saleh for performing the hematological and mycological laboratory tests.
The authors report no conflicts of interest in this work.