PDD: Predictive Diabetes Diagnosis using Data mining Algorithms




Data analytics is used to obtain useful insights from small or large data sets, to draw conclusions, and to support future recommendations and decision making. Predictive analytics uses data mining and machine learning techniques to make predictions about the future from the analysis of available data. In health care, predictive analytics is primarily used to identify patients in the early stages of diabetes, asthma, heart disease and other critical lifelong diseases. The proposed method, PDD, uses two data mining algorithms, K-Means Clustering and Random Forest, to predict type 2 diabetes. In terms of accuracy, the PDD predictive model outperforms hierarchical clustering and Bayesian network clustering combined with random forest prediction.
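The two stages named above can be illustrated with a small pure-Python sketch. This summary does not say how PDD passes the K-Means output to the random forest, so the sketch assumes (hypothetically) that each record's cluster id is appended as an extra feature; the toy records, the glucose/BMI features and all function names are illustrative, not the authors'. The forest is deliberately tiny: bootstrap-sampled decision stumps, each restricted to a random subset of about √d features and combined by majority vote. Practical random forests grow deeper trees and sample features at every split.

```python
import random
from collections import Counter


def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(centroids, row):
    """Index of the centroid closest to `row`."""
    return min(range(len(centroids)), key=lambda c: sq_dist(row, centroids[c]))


def kmeans(points, k, iters=100, seed=0):
    """Lloyd's K-Means: returns (centroids, cluster id of each point)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(iters):
        # Assignment step: every point joins its nearest centroid.
        new_labels = [nearest(centroids, p) for p in points]
        if new_labels == labels:
            break  # no assignment changed, so the clustering has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, labels


def majority(labels):
    """Most common label."""
    return Counter(labels).most_common(1)[0][0]


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((cnt / n) ** 2 for cnt in Counter(labels).values())


def fit_stump(X, y, features):
    """Depth-1 tree: the threshold split with the lowest weighted Gini over `features`."""
    best = None
    for f in features:
        values = sorted({row[f] for row in X})
        for a, b in zip(values, values[1:]):
            t = (a + b) / 2  # candidate threshold halfway between observed values
            left = [lab for row, lab in zip(X, y) if row[f] <= t]
            right = [lab for row, lab in zip(X, y) if row[f] > t]
            score = len(left) * gini(left) + len(right) * gini(right)
            if best is None or score < best[0]:
                best = (score, f, t, majority(left), majority(right))
    if best is None:  # feature is constant in this sample: predict the majority class
        m = majority(y)
        return (0, float("inf"), m, m)
    return best[1:]


def fit_forest(X, y, n_trees=25, seed=0):
    """Bagging plus random feature subsets: each stump sees a bootstrap sample and ~sqrt(d) features."""
    rng = random.Random(seed)
    d = len(X[0])
    n_feat = max(1, int(d ** 0.5))
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in range(len(X))]  # sample rows with replacement
        feats = rng.sample(range(d), n_feat)
        forest.append(fit_stump([X[i] for i in idx], [y[i] for i in idx], feats))
    return forest


def predict_forest(forest, row):
    """Majority vote over the stumps."""
    return majority([left if row[f] <= t else right for f, t, left, right in forest])


# Hypothetical toy records: (fasting glucose in mg/dL, BMI); label 1 = diabetic.
X = [(90 + 2 * i, 20 + 0.5 * i) for i in range(10)] + [(160 + 3 * i, 31 + 0.7 * i) for i in range(10)]
y = [0] * 10 + [1] * 10

# Stage 1: group the records with K-Means.
centroids, clusters = kmeans(X, k=2)
# Stage 2 (assumption): append each record's cluster id, then train the forest.
X_aug = [list(row) + [c] for row, c in zip(X, clusters)]
forest = fit_forest(X_aug, y)

new_patient = (175, 35)
print(predict_forest(forest, list(new_patient) + [nearest(centroids, new_patient)]))  # prints 1
```

A new record is scored by first assigning it to its nearest centroid and then passing the augmented vector to the forest. On real data the features would normally be scaled first, since unscaled glucose values dominate the K-Means distance here.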

  • Author Keywords

    • Type 2 diabetes
    • Prediction
    • K-Means Clustering
    • Random Forest


Data mining has long been used to discover knowledge in data [1, 2, 3, 4]. Unlike many other diseases, diabetes stays with the patient for life and also causes other diseases. Diabetes is associated with problems in the secretion of insulin by the pancreas or in the body's use of it, and it exists in two forms, type 1 and type 2 [5]. Type 1 diabetes, also called insulin-dependent or childhood-onset diabetes, is caused by insufficient insulin secretion in the body. Type 2 diabetes, also known as adult-onset or non-insulin-dependent diabetes, is triggered by the body's ineffective use of insulin and is also associated with low physical activity and increased body weight.

Diabetes is one of the biggest and fastest-growing health challenges in India, where the diabetic population has increased by 8.7% in the 20 to 70 age group. Diabetes is a non-communicable disease that causes other diseases, and its growing incidence is driven by a combination of factors such as rapid urbanization, sedentary lifestyles, unhealthy diets, tobacco use and increased life expectancy. According to the World Health Organization (WHO), the number of people with diabetes will double in the next decade. The number of diabetics in India is currently estimated at 31,705,000 and is expected to reach 79,441,000 by 2030. A WHO assessment in 2013 reported 63 million diabetics, and according to the International Diabetes Federation (IDF) Atlas, 69.2 million Indians were diabetic in 2015. Diabetes is rapidly gaining the status of a potential epidemic in India, and its prevalence is expected to double globally, from 171 million in 2000 to 366 million in 2030, with the largest increase in India [6]. Worldwide, 143 million people currently suffer from diabetes mellitus, and this number is growing rapidly. Five percent of India's population suffers from diabetes mellitus.
Diabetes can be controlled and managed through timely detection and regular periodic health check-ups. It is highly prevalent and affects both men and women, with people in the upper middle class affected more than those in the lower middle class. Figure 1 depicts diabetes prevalence in India [7], and Figure 2 shows the prevalence of diabetes and prediabetes in 15 Indian states [8]. More research is needed on predicting diabetes in developing countries.

Sadri Sa’di et al. implemented several algorithms for the diagnosis of type 2 diabetes, including Naive Bayes, J48 and RBF Network, using the Weka tool [9]. In their performance comparison, Naive Bayes gave an accuracy rate of 6.95% and performed better than RBF Network and J48. Daniah Almadn and Abdolreza Abhari also compared classification models for type 2 diabetes diagnosis.

Almadn and Abhari implemented classification algorithms for the diagnosis of type 2 diabetes and compared the performance of classifiers such as Logistic Regression and Support Vector Machine. They also proposed a Fuzzy Expert System (FES) together with a Fuzzy Inference System (FIS) to predict type 2 diabetes, and compared all the algorithms using Weka and MATLAB [10]. Mahmoud Heydari et al. compared support vector machine, decision tree, 5-nearest neighbor and Bayesian network classification algorithms for the diagnosis of type 2 diabetes in Iran. The 5-nearest neighbor algorithm achieved 95.03% accuracy [11], indicating that data mining algorithms can perform well in type 2 diabetes diagnosis. Angus G. Jones et al. [12] studied whether markers of β-cell failure predict the glycemic response to therapy in type 2 diabetes. The study included 620 patients with type 2 diabetes and HbA1c ≥58 mmol/mol (7.5%), assessed their therapy over 6 months, and evaluated how β-cell failure affected the corresponding glycemic response. Linear regression was used for prediction, and analysis of variance (ANOVA) was used for the statistical analysis. The study concluded that markers of β-cell failure are associated with a poorer glycemic response to GLP-1 receptor agonist (GLP-1RA) therapy.
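Two of the classifiers compared in the studies above, Naive Bayes (best in [9]) and 5-nearest neighbor (best in [11]), can be sketched in a few lines of pure Python, with accuracy as the comparison metric. This is a minimal illustration, not Weka's implementation: Naive Bayes is the Gaussian variant for numeric features, and the toy records and function names are hypothetical.

```python
import math
from collections import Counter, defaultdict


def fit_gaussian_nb(X, y):
    """Store each class prior plus per-feature mean and variance (features treated as independent)."""
    groups = defaultdict(list)
    for row, label in zip(X, y):
        groups[label].append(row)
    model = {}
    for label, rows in groups.items():
        stats = []
        for col in zip(*rows):
            mean = sum(col) / len(col)
            var = sum((v - mean) ** 2 for v in col) / len(col) + 1e-9  # guard against zero variance
            stats.append((mean, var))
        model[label] = (len(rows) / len(X), stats)
    return model


def predict_gaussian_nb(model, row):
    """Pick the class maximising log prior + sum of per-feature Gaussian log-likelihoods."""
    def log_posterior(label):
        prior, stats = model[label]
        return math.log(prior) + sum(
            -0.5 * math.log(2 * math.pi * var) - (v - mean) ** 2 / (2 * var)
            for v, (mean, var) in zip(row, stats)
        )
    return max(model, key=log_posterior)


def predict_knn(X_train, y_train, row, k=5):
    """Majority label among the k nearest training records (Euclidean distance)."""
    ranked = sorted(
        zip(X_train, y_train),
        key=lambda pair: sum((a - b) ** 2 for a, b in zip(pair[0], row)),
    )
    return Counter(label for _, label in ranked[:k]).most_common(1)[0][0]


def accuracy(y_true, y_pred):
    """Fraction of correct predictions."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical toy records: (fasting glucose in mg/dL, BMI); label 1 = diabetic.
X = [(90 + 2 * i, 20 + 0.5 * i) for i in range(10)] + [(160 + 3 * i, 31 + 0.7 * i) for i in range(10)]
y = [0] * 10 + [1] * 10

nb = fit_gaussian_nb(X, y)
nb_acc = accuracy(y, [predict_gaussian_nb(nb, row) for row in X])
# Leave-one-out evaluation for 5-NN, so a record is never its own neighbour.
knn_acc = accuracy(y, [predict_knn(X[:i] + X[i + 1:], y[:i] + y[i + 1:], X[i]) for i in range(len(X))])
print(nb_acc, knn_acc)  # prints 1.0 1.0
```

The Naive Bayes figure here is training accuracy and therefore optimistic; a fair comparison of the kind reported in [9] and [11] would use held-out data or cross-validation for every classifier.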


Diabetes is a common and dangerous disease that can lead to further problems such as heart attack, stroke, blindness, nerve damage, kidney failure, blood vessel disease and sexual dysfunction. The proposed method, which combines K-Means and Random Forest, predicts type 2 diabetes with better accuracy than hierarchical clustering or Bayesian network clustering combined with random forest. Fuzzy systems and deep learning methods could be explored to enhance the proposed method further.


The authors gratefully acknowledge the use of facilities at Sri Ramakrishna Engineering College, Coimbatore, India. They also convey their sincere thanks to the Principal and the Management for their support and guidance.


Authors: M. S. Geetha Devasena, R. Kingsy Grace and G. Gopu

Published in: 2020 International Conference on Computer Communication and Informatics (ICCCI), Coimbatore, India, 2020, pp. 1-4.


Somayeh Nosrati was born in 1982 in Tehran. She holds a Master's degree in artificial intelligence from Khatam University of Tehran.


Professor Siavosh Kaviani was born in 1961 in Tehran. He had a professorship. He holds a Ph.D. in Software Engineering from the QL University of Software Development Methodology and an honorary Ph.D. from the University of Chelsea.


Nasim Gazerani was born in 1983 in Arak. She holds a Master's degree in Software Engineering from UM University of Malaysia.