Knowledge Discovery from Social Media using Big Data provided Sentiment Analysis (SoMABiT)




In today’s competitive business world, awareness of customer needs and market-oriented production are key success factors for industry. Efficient analytic algorithms support a better understanding of customer feedback and help improve the next generation of products. The dramatic increase in the use of social media in daily life provides valuable sources for market analytics. The main challenge is how traditional analytic algorithms and methods can scale up to such disparate and multi-structured data sources. This paper presents and discusses the technological and scientific focus of SoMABiT, a social media analysis platform using big data technology. Sentiment analysis is employed to discover knowledge from social media. The main novelty of the proposed concept compared with state-of-the-art technologies is the use of MapReduce and the development of a distributed algorithm towards an integrated platform that can scale to any data volume and provide social media-driven knowledge.


Keywords: NoSQL Databases; Big Data Analytics; Product Use Information; Social Media Analysis; Sentiment Analysis; Knowledge Discovery; Semantic Technologies


Scalable Decision Making (SDM) in data-intensive applications is a key challenge for enterprises dealing with big data. The complexity and expansion of resources and streams, as well as the velocity and variety of the data, have led to difficulties in enterprise asset management [1]. In this context, scalable algorithms and semantic technologies for dynamic, economic and efficient management of information could facilitate the transition from the current challenging situation to an era of scalable and adaptive decision support systems in enterprise practice. As an example, in air traffic management, with data-intensive activities such as detecting a conflict between two arriving aircraft, a key challenge is to semantically integrate, analyze and interpret data such as trajectories and flight plans so that they can be used directly for negotiation and resolution of conflicts [2]. The most important factors in such a scenario are time, cost, and scalability. Applications should not only be nearly on time and cost-effective but also able to scale up with growing data and processing demands and be accessible from different types of platforms at any time.

On the other hand, social media provides a massive amount of data, especially user-generated content, that can be used for opinion mining for a wide variety of purposes. According to Nielsen’s report, consumer-generated product reviews and ratings are the most preferred source of information among social media users [3]. Accordingly, social media analysis provides product improvement recommendations as well as smart and novel ideas for the next generation of products and for new products. It also enables more efficient marketing methods for enterprises in today’s competitive business world. Its use is not limited to industry and marketing-oriented aspects, but extends to social, medical, and political goals. However, the main challenge here is to find out how to collect the data from social media, which data sources are useful for specific goals, how to analyze the collected data and discover useful knowledge from the sources, and how to provide scalable algorithms that can cope with the large volume and variety of data sources provided in social media. For instance, according to reports released in February 2015, every day Twitter produces about 500 million tweets, Facebook produces 2.5 billion pieces of content (100 terabytes), and 144,000 hours of video are uploaded to YouTube.
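
To make the idea of scalable opinion mining more concrete, the following is a minimal sketch of lexicon-based sentiment scoring written in a map/shuffle/reduce style. It is an illustration only, not the SoMABiT algorithm: the tiny word list `LEXICON`, the sample posts, and all function names are assumptions introduced here, and the phases run in a single process rather than on a cluster.

```python
from collections import defaultdict

# Hypothetical mini-lexicon: +1 for positive words, -1 for negative words.
LEXICON = {"good": 1, "great": 1, "love": 1, "bad": -1, "poor": -1, "hate": -1}


def map_phase(posts):
    """Map: emit one (product, score) pair per post."""
    for product, text in posts:
        tokens = (tok.strip(".,!?").lower() for tok in text.split())
        score = sum(LEXICON.get(tok, 0) for tok in tokens)
        yield product, score


def shuffle_phase(pairs):
    """Shuffle: group all scores that share the same product key."""
    groups = defaultdict(list)
    for product, score in pairs:
        groups[product].append(score)
    return groups


def reduce_phase(groups):
    """Reduce: sum the scores of each product independently."""
    return {product: sum(scores) for product, scores in groups.items()}


# Made-up sample posts as (product, text) pairs.
posts = [
    ("phone", "Great camera, love it!"),
    ("phone", "Battery is bad."),
    ("laptop", "Poor keyboard, bad screen."),
]

totals = reduce_phase(shuffle_phase(map_phase(posts)))
print(totals)
```

Because each post is scored on its own in the map phase and each product is summed on its own in the reduce phase, both phases could in principle be spread over many machines, which is the property that makes this style attractive for large volumes of social media data.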

The work of managers and key decision-makers in various industrial and business sectors involves continuous supervision of huge amounts of information, which has to be collected from social media and analyzed for predictions, judgments, evaluations, strategic plans, and actions. In most cases, the decisions that have to be made are subject to strict restrictions regarding available resources and requested response times. Moreover, decision-makers usually encounter sudden, unexpected, and urgent events that can easily lead to dangerous situations that may threaten the safety and the reliability of the whole system. The issue of how to scale up or down for acceleration or deceleration imposes additional complexity on today’s traditional decision-making methods [4]. This issue becomes more challenging when there are no particular estimates of future growth for data volumes and service demands.

Managing huge amounts of heterogeneous data has recently emerged as a key challenge in many computing applications. In addition to traditional Relational Database Management Systems (RDBMS), so-called NoSQL databases have appeared as high-performance alternatives, providing document-oriented storage for semi-structured or unstructured data [5]. These databases can also be deployed to many nodes and allow adjustable redundancy levels as required by the application. However, the right choice of database management system and its correct parameterization according to the data and the data processing requirements of a specific application are not yet fully understood in the big data era [4]. Many aspects have to be taken into account, such as the type of data assets, the access patterns in the data, the desired level of redundancy and availability, the isolation level of distributed nodes, and many more. The integration of heterogeneous data sources presents another key challenge. Current approaches to joining diverse data sources and creating an abstraction layer for unified data access are often the result of an ad-hoc approach [6]. A comprehensive methodology for creating data federations over diverse data sources that is applicable to different domains is still missing. This is particularly critical when static and dynamic data sources have to be combined to create new insights from the data, especially from social media sources.

NoSQL database management systems are characterized by not using SQL as a query language – or, at least, not using fully functional structured queries. Most of them do not offer relational operators like JOIN and generally do not provide full ACID (Atomicity, Consistency, Isolation, and Durability) guarantees [7] in terms of atomic transactions, consistent database states, transactional isolation, and durability of persistent data. On the other hand, NoSQL databases offer good performance and horizontal scaling across the nodes of a cluster. As such, they are well suited for web-scale applications and other big data domains, where efficient storage of and access to huge data volumes are more important than transactional consistency.

It is imperative to clearly define the strengths and weaknesses of NoSQL technology, where databases are not relational and have no fixed data schema, complex relations, or joins. The common denominator of the majority of NoSQL databases is that they are optimized for large or massive data-store scaling, i.e. they are supposed to scale more efficiently and smoothly than RDBMSs by spreading the processing and storage load over a multitude of affordable server systems. Relational database management systems – SQL databases – in contrast scale up by using ever faster high-end server hardware with more memory and disk capacity [4]. HBase [8], a NoSQL database that is part of the Hadoop ecosystem, has been used in the data layer. One of the major advantages of HBase is its distributed and scalable database support with random, real-time reads and writes.
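
The idea of spreading storage load over many inexpensive servers can be illustrated with a small sketch. It assigns row keys to nodes by hashing, which is a deliberately simplified stand-in: HBase itself splits tables into regions by row-key range rather than by hash. The key format `tweet-<n>`, the node counts, and the function names are assumptions made for this example.

```python
import hashlib
from collections import Counter


def node_for_key(row_key: str, num_nodes: int) -> int:
    """Pick a node for a row key from a stable hash of the key."""
    digest = hashlib.md5(row_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_nodes


def distribute(row_keys, num_nodes: int) -> Counter:
    """Count how many row keys land on each node."""
    return Counter(node_for_key(key, num_nodes) for key in row_keys)


# Hypothetical row keys for 10,000 stored posts.
keys = [f"tweet-{i}" for i in range(10_000)]

# Adding nodes lowers the number of rows each node has to hold.
for num_nodes in (2, 4, 8):
    load = distribute(keys, num_nodes)
    print(num_nodes, "nodes, largest share:", max(load.values()))
```

Plain modulo hashing moves most keys to a different node whenever the node count changes, so production systems typically rely on range splitting or consistent hashing instead. The sketch is only meant to show why adding machines reduces the load per machine.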

The article is structured in five sections. The first section provides an introduction and motivation for the research. The next section (Sec. 2) reviews state-of-the-art technologies, tools, and findings from different perspectives that relate to this research. It focuses on big data, social media analysis, and semantic technologies and summarizes how the integrated use of these technologies could discover knowledge from social media and improve the quality of products. Section 3 discusses the SoMABiT concept and provides the requirement analysis as well as the list of the data sources that have been used in the system. The realization, scientific background, evaluation, and verification of results are then discussed in section 4. The conclusion and future work are addressed in section 5, which summarizes the findings and provides an outlook on the future work of the project.




1. About Twitter Inc. (accessed June 2015).
2. Lea, Wendy, ‘Infographic: The Potential of Big Data’ (2013, accessed June 2015).
3. Youtube Statistics (accessed June 2015).
4. Semantic Web Standards (accessed June 2015).
5. The World Wide Web Consortium (W3C) (accessed June 2015).
6. Linked Data – Connect Distributed Data across the Web (accessed June 2015).
7. Research Data: Meta, Wikimedia Foundation (accessed June 2015).
8. Data Dumps – Freebase API (accessed June 2015).
9. Linked Open Data Cloud (accessed June 2015).
10. European Research Project: Competence Profiling framework for IT sector in Spain (ComProFITS) (2013, accessed June 2015).


This research received no specific grant from any funding agency in the public, commercial or not-for-profit sectors.

Corresponding author:

Mahdi Bohlouli, Institute of Knowledge-Based Systems, Department of Electrical Engineering and Computer Science, University of Siegen.
Hoelderlinstr. 3, D-57076, Siegen, Germany. Email:


Mahdi Bohlouli
Institute of Knowledge-Based Systems, Department of Electrical Engineering and Computer Science, University of Siegen, Germany
Jens Dalter
Department of Business Information Systems, University of Siegen, Germany
Mareike Dornhöfer
Institute of Knowledge-Based Systems, Department of Electrical Engineering and Computer Science, University of Siegen, Germany
Johannes Zenkert
Institute of Knowledge-Based Systems, Department of Electrical Engineering and Computer Science, University of Siegen, Germany
Madjid Fathi
Institute of Knowledge-Based Systems, Department of Electrical Engineering and Computer Science, University of Siegen, Germany

How to cite this article

Bohlouli, M., Dalter, J., Dornhöfer, M., Zenkert, J., & Fathi, M. (2015). Knowledge discovery from social media using big data-provided sentiment analysis (SoMABiT). Journal of Information Science, 41(6), 779–798.