Extending MLP ANN hyper-parameters Optimization by using Genetic Algorithm

Optimizing the hyper-parameters of a multi-layer perceptron (MLP) artificial neural network (ANN) is not a trivial task, and even today the trial-and-error approach is widely used. Many works have already used the genetic algorithm (GA) to assist this search, including the optimization of MLP topology, weights, and biases. This work proposes adding hyper-parameters for weights initialization and regularization, to be optimized simultaneously with the usual MLP topology and learning hyper-parameters. It also analyzes which hyper-parameters are most correlated with classification performance, allowing the search space to be reduced, which decreases the time and computation needed to reach a good set of hyper-parameters. Results on public datasets show improved performance compared with similar works. Moreover, the hyper-parameters related to weights initialization and regularization are among the top 5 most relevant hyper-parameters for explaining accuracy in all datasets, showing the importance of including them in the optimization process.

Author Keywords

  • artificial neural network,
  • multi-layer perceptron,
  • MLP,
  • genetic algorithm,
  • GA,
  • hyper-parameters

IEEE Keywords

  • Genetic algorithms,
  • Neurons,
  • Topology,
  • Optimization,
  • Training,
  • Artificial neural networks,
  • Network topology



Because each problem has its own data-specific characteristics, choosing the optimal hyper-parameters of an MLP usually involves a trial-and-error approach, which consumes time and computational resources and requires the researcher to have considerable experience to tune the MLP properly. It is therefore highly desirable to have a method that searches for the optimal hyper-parameters automatically and efficiently. By hyper-parameters we mean those responsible for defining the topology, learning, weights initialization and regularization options of an MLP. GA has been widely used as an alternative to the classical back-propagation (BP) algorithm [1] to tune the weight values of an MLP with a fixed topology, as in [2] [3]. Some works studied the use of GA to find only the MLP topology, i.e., the number of hidden layers and the number of neurons in each layer, as in [4]. Others use GA to search for the optimal values of MLP weights together with its topology, instead of using the classical BP, as in [5]. GA has also been used to tune MLP weights and topology, as in [6] [7], and to compose a hybrid training strategy with BP [8]. These hyper-parameter optimizations certainly increase the performance of the MLP. However, other essential hyper-parameters, such as those for weights initialization and regularization, also need to be tuned because they can further improve MLP performance. The weights initialization hyper-parameters used in this work control the statistical distribution and the scale of the initial weights. Poorly initialized weights may prevent the network from achieving good performance, either by slowing training so that more epochs are required, or by speeding it up at an increased risk of being trapped in a local minimum [9]. On the other hand, an optimized weight initialization allows back-propagation to decrease the error efficiently over the epochs, reaching better performance.
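The two weights-initialization hyper-parameters described above (distribution and scale) can be illustrated with the minimal Python sketch below. The function name `init_weights`, the distribution labels, and the `uniform_adaptive` variant (a Glorot/Xavier-style bound) are assumptions for illustration, not the paper's exact encoding.

```python
import math
import random
from functools import partial


def init_weights(n_in, n_out, distribution="uniform", scale=1.0, seed=None):
    """Return an n_out x n_in weight matrix (list of rows).

    `distribution` and `scale` play the role of the two weights-initialization
    hyper-parameters a GA would tune; the labels below are illustrative.
    """
    rng = random.Random(seed)
    if distribution == "uniform":
        # Uniform on [-scale, scale]
        sample = partial(rng.uniform, -scale, scale)
    elif distribution == "normal":
        # Zero-mean Gaussian whose standard deviation is `scale`
        sample = partial(rng.gauss, 0.0, scale)
    elif distribution == "uniform_adaptive":
        # Glorot/Xavier-style bound that shrinks as fan-in + fan-out grows,
        # multiplied by `scale`
        limit = scale * math.sqrt(6.0 / (n_in + n_out))
        sample = partial(rng.uniform, -limit, limit)
    else:
        raise ValueError(f"unknown distribution: {distribution!r}")
    return [[sample() for _ in range(n_in)] for _ in range(n_out)]
```

For example, `init_weights(4, 3, "uniform", 0.5, seed=0)` returns a 3-by-4 matrix whose entries all lie in [-0.5, 0.5]; a GA would treat `distribution` and `scale` as two genes of each candidate solution.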
The regularization hyper-parameters are especially important for improving the generalization of a network with a limited sample size and a large number of parameters [10]. With many parameters, the MLP can memorize the training instances exactly and achieve an apparently error-free, perfect fit (Fig. 1), compromising the network's ability to generalize what it has learned to examples not used in training.
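As a minimal sketch of how regularization hyper-parameters act during training, the Python below applies input dropout with a tunable ratio and adds L1/L2 weight penalties to a loss. Only the input dropout ratio is named explicitly in this text; the L1/L2 terms and all function names are assumptions, included because they are common regularization hyper-parameters that a GA could also tune.

```python
import random


def apply_input_dropout(x, ratio, rng, training=True):
    """Inverted dropout on one input vector.

    Each feature is zeroed with probability `ratio`; survivors are scaled by
    1 / (1 - ratio) so the expected input is unchanged. At inference time the
    input passes through untouched.
    """
    if not 0.0 <= ratio < 1.0:
        raise ValueError("ratio must be in [0, 1)")
    if not training or ratio == 0.0:
        return list(x)
    keep = 1.0 - ratio
    return [xi / keep if rng.random() >= ratio else 0.0 for xi in x]


def regularized_loss(data_loss, weights, l1=0.0, l2=0.0):
    """Add L1 (sum of |w|) and L2 (sum of w^2) penalties to a data loss.

    `weights` is a list of weight matrices (lists of rows). The l1/l2
    coefficients are assumed hyper-parameters, not ones named in the source.
    """
    flat = [w for matrix in weights for row in matrix for w in row]
    return data_loss + l1 * sum(abs(w) for w in flat) + l2 * sum(w * w for w in flat)
```

Larger dropout ratios and penalty coefficients make it harder for the network to memorize individual training instances, which is the overfitting behaviour the paragraph above describes.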

To improve classification performance, this work proposes adding the weights initialization and regularization hyper-parameters, to be optimized by a GA simultaneously with the MLP topology and learning hyper-parameters. The proposed method is named MLPGA+4 because four categories of hyper-parameters are optimized simultaneously. Moreover, the relationship between these added hyper-parameters and classification performance is analyzed to understand their effects. This allows identifying regions of the hyper-parameter space where the best classification performance is achieved. With that, it also becomes possible to restrict the search space and develop a more efficient GA that requires less time and fewer computational resources to find a good set of hyper-parameters. The remainder of this paper is organized as follows: Section II briefly presents general concepts about MLP networks and GA. Section III details the methodology, followed by the experimental results in Section IV. Section V presents the conclusions and future work.
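The MLPGA+4 idea of optimizing four categories of hyper-parameters with one GA can be sketched in Python as below. The search space, gene names, value grids, operators (tournament selection, uniform crossover, per-gene mutation, elitism), and the toy fitness function are all assumptions for illustration; in the described method the fitness would be the classification accuracy of an MLP trained with each candidate set, which is not reproduced here.

```python
import random

# Hypothetical search space grouped into the four categories named in the
# text. Gene names and value grids are illustrative assumptions.
SEARCH_SPACE = {
    # topology
    "hidden_layers": [1, 2, 3],
    "neurons_per_layer": [8, 16, 32, 64, 128],
    # learning
    "activation": ["tanh", "relu"],
    "learning_rate": [0.001, 0.005, 0.01, 0.05],
    # weights initialization
    "init_distribution": ["uniform", "normal", "uniform_adaptive"],
    "init_scale": [0.01, 0.1, 0.5, 1.0],
    # regularization
    "input_dropout": [0.0, 0.1, 0.2, 0.3],
    "l2": [0.0, 1e-5, 1e-4, 1e-3],
}


def random_individual(rng):
    """Sample one value per gene uniformly from its grid."""
    return {k: rng.choice(v) for k, v in SEARCH_SPACE.items()}


def crossover(a, b, rng):
    """Uniform crossover: each gene is copied from either parent with p = 0.5."""
    return {k: a[k] if rng.random() < 0.5 else b[k] for k in SEARCH_SPACE}


def mutate(ind, rate, rng):
    """Resample each gene from its grid with probability `rate`."""
    return {k: rng.choice(SEARCH_SPACE[k]) if rng.random() < rate else v
            for k, v in ind.items()}


def tournament(pop, scores, k, rng):
    """Return the fittest of k randomly chosen individuals."""
    contenders = rng.sample(range(len(pop)), k)
    return pop[max(contenders, key=lambda i: scores[i])]


def run_ga(fitness, pop_size=20, generations=20, mutation_rate=0.1,
           elite=2, seed=0):
    """Maximize `fitness` over SEARCH_SPACE; return (best_individual, score)."""
    rng = random.Random(seed)
    pop = [random_individual(rng) for _ in range(pop_size)]
    best, best_score = None, float("-inf")
    for gen in range(generations):
        scores = [fitness(ind) for ind in pop]
        ranked = sorted(range(pop_size), key=lambda i: scores[i], reverse=True)
        if scores[ranked[0]] > best_score:
            best, best_score = pop[ranked[0]], scores[ranked[0]]
        if gen == generations - 1:
            break  # no need to breed after the last evaluation
        # Elitism: the top `elite` individuals survive unchanged
        next_pop = [pop[i] for i in ranked[:elite]]
        while len(next_pop) < pop_size:
            p1 = tournament(pop, scores, 3, rng)
            p2 = tournament(pop, scores, 3, rng)
            next_pop.append(mutate(crossover(p1, p2, rng), mutation_rate, rng))
        pop = next_pop
    return best, best_score


# Toy stand-in for "train an MLP and return its accuracy" (hypothetical):
# the fraction of genes that match an arbitrary target configuration.
TARGET = {k: v[-1] for k, v in SEARCH_SPACE.items()}


def toy_fitness(ind):
    return sum(ind[k] == TARGET[k] for k in SEARCH_SPACE) / len(SEARCH_SPACE)


if __name__ == "__main__":
    best, score = run_ga(toy_fitness)
    print(f"best score {score:.3f}: {best}")
```

In a real run, `fitness` would be expensive (one MLP training per call), so caching scores of surviving elite individuals would be a sensible addition; the default of 20 generations simply mirrors the minimum the conclusion recommends.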


This work presented the addition of hyper-parameters for weights initialization and regularization, optimized simultaneously with the topology and learning parameters of an MLP. It also proposed analyzing how these hyper-parameters affect classification performance. The results on the five datasets show that the proposed method trains MLPs with better classification performance than similar works. Moreover, the standard deviation of the mean accuracy rate obtained by the proposed method is the smallest, demonstrating the stability of the approach. In all five datasets, the added weights initialization and regularization hyper-parameters are among the top 5 most relevant hyper-parameters for explaining the MLP's accuracy rate on classification tasks. The greatest difference in mean accuracy rate occurred in the Iris dataset, with an increase of more than 15% from the worst to the best interval of the input dropout ratio. Even though it used a higher number of neurons in all datasets, the MLP obtained with the proposed method achieved the highest accuracy rate in 3-fold cross-validation, showing the importance of the regularization hyper-parameters in controlling overfitting. The initial weight distribution and initial weight scale are among the top 5 most relevant hyper-parameters in 3 of the 5 datasets. In the Sonar dataset, optimizing the initial weight distribution increased the mean accuracy rate by 3.6%. This result shows the importance of including this hyper-parameter in the optimization process. Due to the peculiarities of each problem, each dataset benefited from a different set of hyper-parameters and reached its best set in a different generation. Therefore, no pattern was found that could be used to make the GA search more efficient. On the other hand, this shows how important it is to optimize these hyper-parameters for each dataset, running a minimum of 20 generations to achieve high performance.
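The interval analysis mentioned above (comparing mean accuracy between the worst and best intervals of a hyper-parameter such as the input dropout ratio) can be approximated with the Python sketch below. It groups evaluated individuals into equal-width intervals of one hyper-parameter and uses the gap between the best and worst interval means as a crude relevance score. The binning scheme, the `spread` score, and the record format are assumptions; this excerpt does not state how the paper ranks hyper-parameter relevance.

```python
from collections import defaultdict
from statistics import mean


def interval_means(records, param, n_bins=4):
    """Mean accuracy inside each equal-width interval of one hyper-parameter.

    `records` is a list of dicts holding hyper-parameter values plus an
    "accuracy" key (e.g., one dict per individual evaluated by the GA).
    """
    values = [r[param] for r in records]
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0  # avoid division by zero if constant
    groups = defaultdict(list)
    for r in records:
        idx = min(int((r[param] - lo) / width), n_bins - 1)
        groups[idx].append(r["accuracy"])
    return {i: mean(accs) for i, accs in sorted(groups.items())}


def spread(records, param, n_bins=4):
    """Gap between the best and worst interval means: a crude relevance score."""
    means = interval_means(records, param, n_bins).values()
    return max(means) - min(means)


def rank_params(records, params, n_bins=4):
    """Order hyper-parameters from most to least relevant by `spread`."""
    return sorted(params, key=lambda p: spread(records, p, n_bins), reverse=True)
```

Applied to all individuals evaluated across the GA generations, `rank_params(records, params)[:5]` yields a "top 5" list in the spirit of the one reported, although the paper's own ranking method may differ.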
Future extensions to this work include adding the hidden-layer dropout hyper-parameter to those optimized here and analyzing the performance of MLPGA+4 in regression tasks. The correlation between the hyper-parameters themselves will also be analyzed to search for a pattern that could reduce the hyper-parameter search space, thereby reducing the time needed to find the optimal set of hyper-parameters.
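For the future-work item on correlations between the hyper-parameters themselves, one minimal sketch is to compute Pearson's r for every pair of numeric hyper-parameters across the evaluated individuals and flag strongly correlated pairs as candidates for search-space reduction. The threshold, function names, and record format are assumptions for illustration.

```python
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return 0.0  # constant column: r is undefined, report 0
    return cov / (sx * sy)


def correlated_pairs(records, params, threshold=0.7):
    """Return (a, b, r) for hyper-parameter pairs with |r| >= threshold."""
    out = []
    for a, b in combinations(params, 2):
        r = pearson([rec[a] for rec in records], [rec[b] for rec in records])
        if abs(r) >= threshold:
            out.append((a, b, r))
    return out
```

If two hyper-parameters were consistently correlated among the best individuals, one could be expressed in terms of the other, shrinking the space the GA has to explore.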

About KSRA

The Kavian Scientific Research Association (KSRA) is a non-profit research organization founded in December 2013 to provide research and educational services. Its members had first formed a virtual group on the Viber social network, and the core of the Kavian Scientific Association was formed with these members as founders. Led by Professor Siavosh Kaviani, they decided to launch a scientific research association with an emphasis on education.

As a non-profit research firm, KSRA is committed to providing research services in the field of knowledge. The main beneficiaries of the association are public and private knowledge-based companies, students, researchers, professors, universities, and industrial and semi-industrial centers around the world.

Our main services are based on education for people across the whole spectrum of society worldwide. We aim to integrate research and education. We believe education is a fundamental human right, so our services are focused on inclusive education.

The KSRA team partners with local under-served communities around the world to improve the access to and quality of knowledge based on education, amplify and augment learning programs where they exist, and create new opportunities for e-learning where traditional education systems are lacking or non-existent.

Full paper:

F. Itano, M. A. de Abreu de Sousa and E. Del-Moral-Hernandez, "Extending MLP ANN hyper-parameters Optimization by using Genetic Algorithm," in 2018 International Joint Conference on Neural Networks (IJCNN), Rio de Janeiro, 2018, pp. 1-8.

Somayeh Nosrati was born in 1982 in Tehran. She holds a Master's degree in artificial intelligence from Khatam University of Tehran.


Professor Siavosh Kaviani was born in 1961 in Tehran. He has held a professorship. He holds a Ph.D. in Software Engineering from the QL University of Software Development Methodology and an honorary Ph.D. from the University of Chelsea.


Nasim Gazerani was born in 1983 in Arak. She holds a Master's degree in Software Engineering from UM University of Malaysia.