De-identification and Privacy Issues on Bigdata Transformation


As the volume of data in industry and government grows exponentially, big data, characterized by the ‘7V’ concept, aims to create new value by indiscriminately collecting and analyzing information from many fields. With the arrival of the ICT industry ecosystem, however, big data utilization is threatened by privacy attacks, such as infringements made possible by the sheer amount of data collected. To keep privacy at a manageable and controllable level, recommended de-identification techniques are needed. This paper explores these de-identification processes and three commonly used privacy models. It also presents use cases to which these technologies can be applied, along with future development directions.


  • Author Keywords
    • Big Data
    • Privacy
    • personal information
    • k-anonymity
    • l-diversity
    • t-closeness
  • IEEE Keywords
    • Data privacy,
    • Guidelines,
    • Industries,
    • Diseases,
    • Data models


Over the last 20 years, data has been generated across many industries and sectors, and its volume has increased exponentially. According to the “Data Age 2025” white paper published by the International Data Corporation (IDC) in 2018, the amount of data worldwide will grow about fivefold, from 33 ZB (zettabytes; 1 ZB is 1 trillion GB) in 2018 to 175 ZB by 2025 [1]. Big data refers not just to large data but to enormous data that, compared with the data generated in analog environments in the past, also includes text and phase data. In recent years, various industries have been drawn to the potential of big data and analyze it to create value, and the individuals, companies, and countries that use it effectively have opened a new paradigm. Since big data is intended to serve the interests of society, its beneficial use needs to be promoted at a national level. However, big data has problems that follow from its premise of containing a great deal of data. Data that is valuable to use often contains attributes that identify an individual directly, attributes that are not identifiers by themselves but can be used to infer a particular person indirectly when combined with other data, and attributes that can reveal a person's private information (e.g., card information, salary, and other sensitive details). In other words, because big data is produced, collected, and analyzed indiscriminately, personal privacy may be infringed depending on how the information is collected or on the results of the analysis.
In addition to the explosion of data brought by the advent of the ICT industry ecosystem, the scope of what counts as personal information continues to expand, and the potential for various kinds of crime, including cybercrime and invasion of privacy, can increase when too much information is leaked from data sets. It is therefore expected that building and using appropriately de-identified data will provide the stability and reliability needed to make proper use of big data.
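
The three kinds of attributes described above (direct identifiers, attributes that identify a person only in combination, and sensitive attributes) can be illustrated with a minimal Python sketch. The column names, the sample record, and the generalization widths are hypothetical and chosen only for illustration; the paper does not prescribe this layout or these rules.

```python
# Hypothetical column roles for an illustrative record layout (assumption,
# not taken from the paper).
IDENTIFIERS = {"name", "card_number"}      # identify a person directly
QUASI_IDENTIFIERS = {"age", "zip_code"}    # identify only when combined
SENSITIVE = {"salary"}                     # reveal private information


def generalize_age(age: int, width: int = 10) -> str:
    """Replace an exact age with a range such as '30-39'."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def generalize_zip(zip_code: str, keep: int = 3) -> str:
    """Keep the first `keep` characters and mask the rest with '*'."""
    return zip_code[:keep] + "*" * (len(zip_code) - keep)


def de_identify(record: dict) -> dict:
    """Suppress direct identifiers and generalize quasi-identifiers."""
    out = {}
    for key, value in record.items():
        if key in IDENTIFIERS:
            continue  # suppression: drop the attribute entirely
        if key == "age":
            out[key] = generalize_age(value)
        elif key == "zip_code":
            out[key] = generalize_zip(value)
        else:
            out[key] = value  # sensitive values are kept for analysis
    return out


record = {
    "name": "Alice",
    "card_number": "1234-5678",
    "age": 34,
    "zip_code": "48513",
    "salary": 5200,
}
result = de_identify(record)
print(result)  # {'age': '30-39', 'zip_code': '485**', 'salary': 5200}
```

Suppression and generalization are only two of the possible de-identification techniques; they are used here because they are the simplest to show on a single record.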

The rest of this paper is organized as follows. Chapter II briefly supplements the discussion of big data in the introduction and introduces current guidelines for de-identification processing, along with the technologies and models they apply. Chapter III introduces three cases of big data analysis based on de-identified data and examines what value they can create. Finally, Chapter IV suggests how this study and the de-identification guidelines can be used in future research on privacy and security.


To realize the benefits that big data offers to individuals and society, it is essential that individuals can trust that their privacy is protected when big data is used. Finding meaningful patterns in unstructured big data is becoming increasingly important for creating value. The personal information (privacy) that we commonly use can also create value of major importance as big data. As discussed above, such data must go through a basic de-identification process before it can be used. Rather than introducing new privacy models or de-identification methods, this paper looked at what the privacy models are based on and at how value is produced from data to which de-identification measures have been applied. This is expected to help define new assumptions or conduct analyses based on de-identified data in the future. Meanwhile, the parameters of the three privacy models presented here, k-anonymity, l-diversity, and t-closeness, are currently determined by experts. In future research, we will adopt deep-learning models to find the optimal parameters for k-anonymity, l-diversity, and t-closeness.
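
To make the three parameters concrete, the following Python sketch measures k, l, and t on a small de-identified table. It is an illustration under stated assumptions, not the paper's method: the rows and column names are hypothetical, l is computed as distinct l-diversity (the number of distinct sensitive values per group), and t is computed with the equal-distance variant for categorical attributes, which is half the sum of absolute differences between a group's distribution and the overall distribution.

```python
from collections import Counter, defaultdict


def group_by_quasi_identifiers(rows, qi_columns):
    """Group rows into equivalence classes that share quasi-identifier values."""
    groups = defaultdict(list)
    for row in rows:
        groups[tuple(row[c] for c in qi_columns)].append(row)
    return groups


def k_anonymity(rows, qi_columns):
    """k is the size of the smallest equivalence class."""
    groups = group_by_quasi_identifiers(rows, qi_columns)
    return min(len(g) for g in groups.values())


def l_diversity(rows, qi_columns, sensitive):
    """Distinct l-diversity: fewest distinct sensitive values in any class."""
    groups = group_by_quasi_identifiers(rows, qi_columns)
    return min(len({r[sensitive] for r in g}) for g in groups.values())


def t_closeness(rows, qi_columns, sensitive):
    """Largest distance between a class distribution and the overall one."""
    groups = group_by_quasi_identifiers(rows, qi_columns)
    overall = Counter(r[sensitive] for r in rows)
    n = len(rows)
    worst = 0.0
    for g in groups.values():
        local = Counter(r[sensitive] for r in g)
        # Equal-distance variant for categorical values (assumption).
        dist = 0.5 * sum(
            abs(local[v] / len(g) - overall[v] / n) for v in overall
        )
        worst = max(worst, dist)
    return worst


# Hypothetical de-identified table; "disease" is the sensitive attribute.
rows = [
    {"age": "30-39", "zip_code": "485**", "disease": "flu"},
    {"age": "30-39", "zip_code": "485**", "disease": "cancer"},
    {"age": "40-49", "zip_code": "486**", "disease": "flu"},
    {"age": "40-49", "zip_code": "486**", "disease": "flu"},
]
qi = ["age", "zip_code"]

k = k_anonymity(rows, qi)
l = l_diversity(rows, qi, "disease")
t = t_closeness(rows, qi, "disease")
print(k, l, t)  # 2 1 0.25
```

In this toy table every record shares its quasi-identifiers with one other record (k = 2), but the second group contains only one disease (l = 1), so an attacker who knows a person is in that group learns the sensitive value. This is the kind of trade-off an expert currently weighs when choosing the parameters.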

Full paper:

Hyo-jun Lee; Si-heon Cho; Ji-won Seong; Suan Lee; Wookey Lee

De-identification and Privacy Issues on Bigdata Transformation

Published in

2020 IEEE International Conference on Big Data and Smart Computing (BigComp), Busan, Korea (South), 2020, pp. 514-519

Somayeh Nosrati was born in 1982 in Tehran. She holds a Master's degree in artificial intelligence from Khatam University of Tehran.

Professor Siavosh Kaviani was born in 1961 in Tehran. He had a professorship. He holds a Ph.D. in Software Engineering from the QL University of Software Development Methodology and an honorary Ph.D. from the University of Chelsea.

Nasim Gazerani was born in 1983 in Arak. She holds a Master's degree in Software Engineering from UM University of Malaysia.