Testing MapReduce Program Using Induction Method




MapReduce is a “divide and conquer” paradigm for processing large volumes of data and extracting the information needed to solve complex day-to-day problems. It is at the core of big data applications. What makes these applications hard to test, and what also characterizes them, is the variation in data that comes from different formats and sources. In other words, poor-quality input data can drive the system to failure if the program does not handle the variety of inputs properly. A MapReduce program is itself built from transformations at different levels, determined by the program logic. This paper proposes a testing technique based on the principle of mathematical induction. It is intended as an extension of, or a complement to, testing techniques already in use, such as those based on analyzing transformations from input to output, as in MRFlow. The proposed functional testing can be used in business acceptance testing to demonstrate the correctness of the program, and it can detect many defects before a big data application goes live.

  • Author Keywords

    • MapReduce
    • Data Defects
    • Induction
    • MapReduce Testing
    • MapReduce business acceptance testing


Software testing is the process of finding errors or defects in a program, or any deviation from its expected behavior or end result. Its purpose is to improve software quality and to reduce the cost of fixing defects that would otherwise surface in a live environment. Testing a big data application requires separate testing at each stage: extracting the data, loading it into HDFS, transforming it, using it according to the business requirements, and finally presenting it in a report or dashboard. To meet the intended purpose of the business application, both functional and non-functional testing are needed. MapReduce should be regarded as the layer of a big data application where the key business rules are implemented, which makes testing MapReduce a key factor in the success of a big data implementation. The lecture “Big Data Essentials: HDFS, MapReduce, and Spark RDD”, available on the Coursera website, suggests performing unit, integration, system, and acceptance testing [3]. This paper proposes another approach to functional testing, based on the principle of mathematical induction, that helps demonstrate the correctness of a MapReduce program. The approach should be seen as complementing other methods of functional testing for MapReduce applications. Mathematical induction is well accepted scientifically and has been discussed in many articles. The book Concrete Mathematics explains it with an example: we can climb as high as we like on a ladder by showing that we can climb onto the bottom rung (the basis) and that from each rung we can climb up to the next one (the step) [4].
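For reference, the principle of mathematical induction can be written as below. This is the standard textbook statement, not a formula taken from the paper. Reading P(n) as "the program produces the correct output for every input of n records" is an assumption made here for illustration only.

```latex
\[
  \Bigl( P(0) \;\land\; \forall n \ge 0 \,\bigl( P(n) \Rightarrow P(n+1) \bigr) \Bigr)
  \;\Longrightarrow\; \forall n \ge 0 \; P(n)
\]
```

In the ladder picture, P(0) is the bottom rung (the basis) and the implication from P(n) to P(n+1) is the move from one rung to the next (the step).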

This metaphor helps in applying mathematical induction to formal verification. The remainder of the paper is organized as follows. Section 2 describes the MapReduce paradigm, the techniques and tools used for MapReduce, and related work in this area. Section 3 presents the proposed technique along with the mathematical model of the induction method. Section 4 is a case study that illustrates the proposed MapReduce testing technique with an example. The final section concludes the paper.


The proposed testing technique is simple but effective at finding bugs in a MapReduce program without having to deal with the architectural complexity of the underlying framework. It gives confidence in program correctness and provides validation results for acceptance testing, ensuring that business functional requirements are met in a live-like environment. MapReduce programs are prone to defects caused by incorrect validation, data type mismatches, wrong processing of key-value pairs, or faulty exception handling. Some defects stem from incorrect business calculations. These defects may cause the program to fail or may have business impact. The proposed technique provides test cases for exceptions such as primitive cases and validates them against the business requirements for a given data set, demonstrating program correctness. As future work, we plan to apply sampling to varied or voluminous data, to find an acceptance index for iterations over the data set, and to automate the technique using machine learning for test coverage and execution.
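The idea of checking a primitive case and then an inductive step can be sketched as follows. This is only an illustrative sketch, not the authors' procedure. The word-count job and the names `map_phase`, `reduce_phase`, `run_job`, `check_base_case`, and `check_inductive_step` are all hypothetical, and the job runs in memory rather than on a Hadoop cluster. The sketch assumes a job whose output on n+1 records should equal its output on the first n records merged with its output on the new record alone.

```python
from collections import Counter


def map_phase(record):
    """Map (hypothetical job): emit a (word, 1) pair for each word in one line."""
    return [(word.lower(), 1) for word in record.split()]


def reduce_phase(pairs):
    """Reduce: sum the values emitted for each key."""
    totals = Counter()
    for key, value in pairs:
        totals[key] += value
    return totals


def run_job(records):
    """Run map over every record, then reduce all emitted pairs in memory."""
    pairs = [pair for record in records for pair in map_phase(record)]
    return reduce_phase(pairs)


def check_base_case():
    """Basis: an empty input must produce an empty result."""
    return run_job([]) == Counter()


def check_inductive_step(records):
    """Step: for each n, the output on the first n+1 records must equal the
    output on the first n records merged with the output on record n+1 alone."""
    for n in range(len(records)):
        expected = run_job(records[:n]) + run_job([records[n]])
        if run_job(records[:n + 1]) != expected:
            return False
    return True


if __name__ == "__main__":
    sample = ["a b a", "", "B c"]  # includes an empty line as a primitive case
    print(check_base_case(), check_inductive_step(sample))
```

The additive merge holds for word count because its reduce function is a sum. A job with a different reduce function, such as an average, would need a merge rule suited to it.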


Full paper:

A. K. Rai and A. K. Malviya, "Testing MapReduce program using Induction Method," 2020 IEEE International Students’ Conference on Electrical, Electronics and Computer Science (SCEECS), Bhopal, India, 2020, pp. 1-5.


Somayeh Nosrati was born in 1982 in Tehran. She holds a Master's degree in artificial intelligence from Khatam University of Tehran.


Professor Siavosh Kaviani was born in 1961 in Tehran. He had a professorship. He holds a Ph.D. in Software Engineering from the QL University of Software Development Methodology and an honorary Ph.D. from the University of Chelsea.


Nasim Gazerani was born in 1983 in Arak. She holds a Master's degree in Software Engineering from UM University of Malaysia.