Big Data in Education. A Bibliometric Review

The handling of large amounts of data to analyze behavior gained great popularity in the decade 2010–2020, a phenomenon that has been called Big Data. In education, the analysis of these data, generated largely by students, has begun to be introduced in order to improve the teaching-learning process. The objective of this paper was to analyze the scientific production on Big Data in education in the Web of Science (WOS), Scopus, ERIC, and PsycINFO databases. A bibliometric study was carried out on a sample of 1491 scientific documents. Notable results include the increase in publications in 2017 and the emergence of certain journals, countries, and authors as references in the subject. Finally, potential explanations for the findings and suggestions for future research are discussed.


Big Data, education, bibliometric study, Internet


Big Data is a concept that is currently in vogue and has appeared in the specialized literature for more than a decade. It alludes to the large amount of data generated at every moment as a result of technological evolution and the interactions of people in digital spaces (Waller and Fawcett 2013). However, it is only recently that it has reached its peak as an object of research, as a result of technological advances and the development of platforms for interaction among users and between users and content, which generate an enormous amount of data (Ghani et al.).

Specifically, Big Data refers to the large volume of data generated by the development of technology and the continuous actions and interactions of users in digital environments (Hussain and Cambria 2018). Related concepts are data learning mining and learning analytics. Data learning mining comprises the techniques and procedures used to extract useful and relevant information from the large amount of data reported by educational platforms (Menon et al. 2017). Learning analytics, in turn, is a construct derived from data mining that refers to the management, processing, and analysis of students’ educational data, studied with the purpose of improving and optimizing the learning process (Liang et al. 2016).

That is why society today is in what experts call the Big Data era, which brings new challenges and benefits through the analysis of all the data generated in highly quantified environments (Pugna et al. 2019). Since the turn of the millennium, services such as the Internet and the Web have recorded data about users, their movements, and their interactions, creating a large bank of useful and relevant information whose analysis offers great potential for studying people’s needs and demands (Chen et al. 2012; Khan et al. 2018). Technological development and the emergence of popular social networks have turned people into active agents in digital media, exponentially multiplying the amount of data generated (Ni et al. 2016).

All this has led to great interest among researchers in studying the enormous presence of data in every aspect of people’s lives (Williamson 2015; Williams et al. 2017). Thus, the European Commission stated that the Horizon 2020 report would be a major step towards the study of Big Data, with the aim of developing strategies for research and innovation in this field of knowledge (Jin et al. 2015). The purpose of Big Data analysis is to collect data from various electronic sources and transform them into relevant information in order to improve the services that users habitually access (Jagadish 2016).

Big Data is nourished by an era marked by the connectivity of people (Veltri 2017), in which creating content, sharing it, and interacting with other users in the community are the order of the day (Hussain and Cambria 2018). This provides a great opportunity to know, in addition to people’s needs, their psychological state and their behavior in virtual spaces (Eichstaedt et al. 2015). Given the peculiarities of the society in which we live, data are growing at great speed (Al Nuaimi et al. 2015), so much so that volume, velocity, variety, veracity, and value are now regarded as fundamental characteristics inherent to Big Data.

These data lack an organized structure and come in various formats, such as text, image, voice, and video (Injadat et al. 2016). To analyze all the data in the digital environment, the concept of data science has arisen, with the intention of managing and interpreting the data by means of specialized programs with high processing capacity (Hicks and Irizarry 2018). These developments have led to the evolution of predictive analytics (Waller and Fawcett 2013), which adapts services to the current trends demanded by users (Saiki et al. 2018). The data are therefore used to predict and make decisions about the future (Ghani et al.), based on a strategic design that analyzes the requirements of the audience (Perlado-Lamo-de-Espinosa et al. 2019).

According to Moreno-Carriles (2018), the literature reveals that the treatment of Big Data has expanded into different fields, such as security, customer service, public services, environmental preservation, the economy, finance, and education, which is the field of interest in this study. Big Data, which has mainly been exploited in the business world, is now also being widely used in education (Aretio 2017), placing us in a new phase of teaching and learning based on the study of data generated by students (Gibson 2017). The data derived from the different educational agents (teachers and learners) are currently being processed in order to improve the quality and experience of learning processes in digital environments (Liang et al. 2016).

Likewise, the data produced by educational content management platforms are being used to develop tools and services adapted to the particularities of contemporary education, which is highly conditioned by the development of educational technology (Merceron et al. 2015). The immersion of students in distance and ubiquitous education has generated a great flow of data about their activity (Seufert et al. 2019).

However, experts such as Menon et al. (2017) consider that data mining techniques in education are, to this day, not completely successful, so not all meaningful and valuable information is extracted. This is because handling and processing Big Data requires teachers to collaborate with specialists in order to obtain the relevant information from the data produced by the use of educational digital tools and resources (Huda et al. 2017). These tools allow learners to perform all kinds of actions in virtual spaces, and the data generated are used to obtain knowledge about their activity, performance, and satisfaction (Elia et al. 2019).

An effective analysis of Big Data contributes to the promotion of new and better educational experiences (Reidenberg and Schaub 2018); to an improvement of teachers’ didactic programming tasks, with the help of scientists specializing in data analysis; and to an efficient selection of strategies and decision making for approaching the formative process. That process can thus be adapted to the demands of learners who are increasingly familiar with technology and who seek innovative learning as a result of the study of data (Huda et al. 2018). All of this is based on a predictive analysis of the data collected (Daniel 2015; Daniel 2017).

Therefore, Big Data and the analytics of the interactions of educational agents in virtual environments are positioned as new ways to address the shortcomings of the educational system (Picciano 2012), improving productivity, innovation (Sanchez and Ball 2015), and the personalization of learning (Dishon 2017). Accordingly, the objective of this study was to analyze the scientific output, understood as the articles published on Big Data in education in the Web of Science (WOS), Scopus, ERIC, and PsycINFO databases. To that end, the following research questions were posed:

RQ1. What is the state of scientific production over time?

RQ2. Which journals and countries concentrate the greatest scientific production on Big Data in education?

RQ3. Which articles have the greatest impact in the area of Big Data in education?

RQ4. What are the main lines of research in this field that are derived from the keywords of scientific articles?
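
The counts behind RQ1, RQ2, and RQ4 can be illustrated with a minimal Python sketch. It is not the procedure or tooling used in the study, which this excerpt does not describe. The records, field names, and the helper functions `count_by` and `keyword_cooccurrence` are hypothetical and exist only to show how publications per year, publications per country, and keyword co-occurrence pairs might be tallied from exported database records.

```python
from collections import Counter
from itertools import combinations

# Hypothetical records for illustration only; these are NOT data from the study.
records = [
    {"year": 2016, "country": "United States",
     "keywords": ["big data", "learning analytics"]},
    {"year": 2017, "country": "China",
     "keywords": ["big data", "education"]},
    {"year": 2017, "country": "United States",
     "keywords": ["big data", "education", "learning analytics"]},
]


def count_by(records, field):
    """Count how many records share each value of `field` (e.g. year, country)."""
    return Counter(record[field] for record in records)


def keyword_cooccurrence(records):
    """Count how often each unordered pair of keywords appears in the same record."""
    pairs = Counter()
    for record in records:
        # Normalize case and sort so that each pair has one canonical form.
        unique = sorted({keyword.lower() for keyword in record["keywords"]})
        pairs.update(combinations(unique, 2))
    return pairs


per_year = count_by(records, "year")        # relates to RQ1
per_country = count_by(records, "country")  # relates to RQ2
pairs = keyword_cooccurrence(records)       # relates to RQ4

print(per_year.most_common())
print(per_country.most_common())
print(pairs.most_common())
```

Pairs that co-occur frequently are the kind of signal that keyword clusters are typically built from, although a real bibliometric analysis would apply this to the full set of exported records rather than a toy list.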

Discussion and Conclusions

Coinciding with the technological revolution of recent decades and, in particular, with the rise of information and communication technologies, a scenario of constant change has emerged in which the generation of data and the tools for processing and managing it are increasingly important. Education cannot remain detached from this reality. After some years of reflection and analysis, education professionals and scholars are beginning to realize that these data will yield substantial, valuable, and detailed information about how the agents involved (students, teachers, and families) carry out teaching-learning processes. This makes it possible to determine how these processes are being implemented in each of their phases and levels, and to articulate the corrective measures and mechanisms needed to achieve high levels of quality and efficiency. It also opens the possibility of individualizing teaching and adapting it to the characteristics, needs, and interests of each student (Asur and Huberman 2010; M. Chen et al. 2014; Provost and Fawcett 2013).

In spite of the great potential of Big Data, it seems clear that, at present, the field of education is not obtaining all the benefit that would be desirable in terms of data collection, individualization, and improvement of the quality and efficiency of teaching-learning processes. Because this is such a young and technology-driven field, it requires mastery of a wide repertoire of computing and technological skills and competencies. Unfortunately, most teachers do not have these skills, which often leads to poor and inappropriate use of Big Data, to the detriment of the efficiency and significance of student learning (Genevieve et al. 2015; Shum and Ferguson 2012). At this point, it seems appropriate to insist on the need for specific training and qualification plans oriented towards the main technological skills, abilities, and competencies (Gorospe et al. 2015; Correa 2015; Dussel 2012).

In answer to the first research question (What is the state of scientific production over time?), it should be noted that this is a young phenomenon. This is demonstrated by the fact that the first scientific publications on the subject did not appear until 2010. Since 2012, however, they have increased exponentially, as a result of the boom that this phenomenon has experienced in business, social networks, and education (Bennett 2015).

Most of the research related to the study of Big Data (the second research question) is published in a small number of specific journals located in countries with a clear English-speaking or Anglo-Saxon tradition. This circumstance is related to the fact that these are some of the main environments in which these new currents of thought are most widely developed, rooted, and consolidated. In recent years, they have become an outstanding channel for disseminating these new approaches to the treatment and management of data to the rest of the developed countries, as elements of individualization, improvement, and efficiency of teaching-learning processes (Caballero 2013).

By country, and consistent with the ideas outlined in the preceding paragraph, the United States has the highest production of scientific publications related to Big Data, followed by China, the United Kingdom, and Canada. Once again, there is evidence of the progressive development that the current of technological thought related to Big Data has been experiencing in Anglo-Saxon countries, to the point that they have become the main cultural window for the development of these tools, especially in the field of education and, more specifically, in teaching-learning processes (DatAnalysis 15M 2013).

The only exception among the leading countries in the use and dissemination of Big Data is China, which appears in second place. Although this departs somewhat from the basic pattern, because China is neither an Anglo-Saxon nor an English-speaking country, it is not surprising. China is a leading, technologically advanced nation that has pioneered the development and implementation of many of the most important technological advances that end up reaching the main developed countries, including the United States itself (Medici 2009).

Although Germany does not occupy a predominant role in the use, handling, and expansion of the technological tools associated with Big Data, it acts as the main gateway into Europe for the principal technological advances in the treatment of complex data and information chains. As in the case of China, this result is not surprising, because Germany is the flagship of economic and technological prosperity on the European continent. It stands well above most of the countries that make up the European Union and therefore ends up becoming the main introducer and engine of all kinds of advances, as well as a clear model to imitate (Hernández 2012).

With regard to the articles with the greatest impact linked to Big Data (the third research question), those dealing with topics clearly related to business and production processes stand out, followed by those linked to the health field, in particular to improving the health and quality of life of elderly individuals. However, recent studies that analyze the benefits of Big Data as a tool for collecting data on the development, design, and implementation of teaching-learning processes in their different phases and levels are gaining much prominence. These studies aim to give such processes greater quality, efficiency, and significance, and to articulate the means and strategies that make it possible to individualize them and adapt them to the characteristics, needs, and interests of students. In this way, each student receives what they need during their development and can carry out the whole teaching-learning process with a high degree of efficiency and quality, which contributes to the significance of the learning (Área 2011; Salazar 2016).

Finally, the main lines of research linked to Big Data (the fourth research question) coincide almost completely with the most important topics addressed by the scientific articles of greatest impact. This is evidenced by the fact that some of the main lines of research on Big Data are those that appear as central topics in the clusters. However, a promising and very current line of research also appears, which focuses on the figure of the student and places special emphasis on didactic methodologies and strategies. This line insists on the need to evaluate how teaching-learning processes are being developed in order to articulate, design, and implement individualized intervention procedures adapted to the needs of the student (Dussel 2014; Martín-Barbero 2012).


Authors: José-Antonio Marín-Marín, Jesús López-Belmonte, Juan-Miguel Fernández-Campoy, José-María Romero-Rodríguez

Published in: Soc. Sci. 2019

