Normalization of Numeronyms using NLP Techniques

Abstract

This paper presents a method that applies Natural Language Processing to normalize numeronyms so that they are understandable by humans. We address the problem with two approaches: a semi-supervised approach and a supervised approach. For the semi-supervised approach, we use the state-of-the-art Damerau-Levenshtein distance between words and then apply cosine similarity to select the normalized text, which improves accuracy. For the supervised approach, we use a deep learning architecture. The semi-supervised approach achieves accuracies of 71% for Bengali and 72% for English, and the supervised approach achieves 89%.

  • Author Keywords

    • Cosine Similarity
    • Damerau-Levenshtein Distance
    • LSTM
    • Numeronym

Introduction

A numeronym is a number-based word. Most commonly, a numeronym is a word in which a number is used to form an abbreviation [1], [2], [7]. Pronouncing the letters and numbers may sound similar to the full word: "K9" for "canine" (phonetically: "kay" + "nine"). Nowadays, the use of numeronyms is widespread due to the concept of language localization. Language localization is the process of adapting a product to the language suited to a particular culture and geographical location or market. The need to communicate and connect with a younger audience is the main reason to adopt language localization services. There is a thin line between localization and translation. Translation covers grammar and spelling issues, which vary by geographical location. Localization deals more with significant, non-textual components of products or services. It addresses other aspects such as adapting graphics, using appropriate date and time formats, adopting the local currency, choosing colors, and handling cultural references, among many other details.

When short segments of letters are replaced by numbers, the resulting word is often still readable. More complex forms of numeronyms are also common, such as L10N (Localization) and I18N (Internationalization). Deciphering these forms can be far from trivial and requires familiarity with such language. To the best of our knowledge, there is no previous state-of-the-art work in this domain.
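To make the L10N and I18N examples concrete, the following is a minimal sketch of how such count-style numeronyms could be matched against a word list. It is an illustration only, not the method of the paper: the function name `expand_count_numeronym` and the small `vocabulary` list are hypothetical, and the sketch assumes the numeronym has the form first letter, count of omitted letters, last letter.

```python
import re


def expand_count_numeronym(token, vocabulary):
    """Return vocabulary words that fit a first-letter + count + last-letter numeronym.

    Hypothetical helper for illustration; tokens of any other shape yield [].
    """
    match = re.fullmatch(r"([A-Za-z])(\d+)([A-Za-z])", token)
    if match is None:
        return []
    first = match.group(1).lower()
    omitted = int(match.group(2))
    last = match.group(3).lower()
    return [
        word
        for word in vocabulary
        # The full word has the omitted letters plus the two kept letters.
        if len(word) == omitted + 2
        and word[0].lower() == first
        and word[-1].lower() == last
    ]


# Illustrative word list (an assumption, not data from the paper).
vocabulary = ["localization", "internationalization", "location", "kubernetes"]
print(expand_count_numeronym("L10N", vocabulary))
print(expand_count_numeronym("I18N", vocabulary))
```

Phonetic numeronyms such as "K9" do not follow this pattern, so the sketch returns an empty list for them; they need a different strategy, such as the similarity-based and learned approaches described in this paper.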

We have used two approaches to solve the problem. The first is a semi-supervised method in which we find the normalized version of a word using the Damerau-Levenshtein distance [3] and cosine similarity. This approach addresses the problem to a large extent and, when checked manually, gives an accuracy of 71% for Bengali and 72% for English.
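The two measures named above can be sketched as follows. This is a minimal illustration under stated assumptions, not the pipeline of the paper: it uses the optimal string alignment variant of the Damerau-Levenshtein distance, computes cosine similarity over character-count vectors, and combines them in a hypothetical `normalize` function with an arbitrary `max_distance` threshold. How the paper builds its candidate set and its vectors is not specified in this section.

```python
import math
from collections import Counter


def osa_distance(a, b):
    """Optimal string alignment distance (restricted Damerau-Levenshtein)."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i
    for j in range(cols):
        d[0][j] = j
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution or match
            )
            # Transposition of two adjacent characters.
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[rows - 1][cols - 1]


def cosine_similarity(a, b):
    """Cosine similarity between the character-count vectors of two strings."""
    ca, cb = Counter(a), Counter(b)
    dot = sum(ca[ch] * cb[ch] for ch in ca)
    norm_a = math.sqrt(sum(v * v for v in ca.values()))
    norm_b = math.sqrt(sum(v * v for v in cb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def normalize(numeronym, vocabulary, max_distance=3):
    """Hypothetical selection step: filter by edit distance, rank by cosine similarity."""
    candidates = [w for w in vocabulary if osa_distance(numeronym, w) <= max_distance]
    return max(candidates, key=lambda w: cosine_similarity(numeronym, w), default=None)


# Illustrative vocabulary (an assumption, not data from the paper).
print(normalize("gr8", ["great", "later", "banana"]))
```

Filtering by edit distance first keeps the candidate set small, and the cosine score then breaks ties among words that are equally close in edit distance. Both the threshold and the choice of character counts as features are design choices made for this sketch.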

The second approach is based on a deep learning architecture, in which numeronyms and their corresponding normalized words are fed to a neural network as character embeddings. The network then learns to generate the normalized word for a given numeronym. This approach, when checked manually, gives an accuracy of 89% for both languages.
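As a rough illustration of the character-level input such a network consumes, the sketch below maps numeronym and word pairs to padded sequences of character indices, which an embedding layer could then turn into vectors. The pairs, the padding symbol, and the helper names `build_char_index` and `encode` are assumptions made for this example; the network itself and the data format used in the paper are not shown here.

```python
PAD = "<pad>"  # hypothetical padding symbol, mapped to index 0


def build_char_index(strings):
    """Assign an integer index to every character seen in the given strings."""
    index = {PAD: 0}
    for ch in sorted({ch for s in strings for ch in s}):
        index[ch] = len(index)
    return index


def encode(text, char_index, length):
    """Convert text to a fixed-length list of character indices, padded with 0."""
    ids = [char_index[ch] for ch in text][:length]
    return ids + [char_index[PAD]] * (length - len(ids))


# Illustrative training pairs (assumptions, not data from the paper).
pairs = [("gr8", "great"), ("l8r", "later")]
char_index = build_char_index([s for pair in pairs for s in pair])
max_len = max(len(s) for pair in pairs for s in pair)

inputs = [encode(src, char_index, max_len) for src, _ in pairs]
targets = [encode(tgt, char_index, max_len) for _, tgt in pairs]
print(inputs)
print(targets)
```

Working at the character level lets the model see digits and letters in the same vocabulary, so a token such as "8" can be learned as standing in for a letter sequence.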

The rest of the paper is organized as follows. Section II describes the data used for the experiment. The working algorithm is described in detail in Section III. This is followed by results and concluding remarks in Sections IV and V, respectively.

Conclusion

Through this work, we enable machines to understand numeronyms and decode them as a person would on reading them. Having understood them, such systems can respond more accurately to various language-processing requirements. Our results are quite satisfactory in demonstrating the success of the algorithm, and we hope it will find use in everyday systems in the near future. The lack of datasets has led to lower accuracy, which leaves room for further work building on this one. A better similarity metric and selection model might give more accurate results.



Bibliography

A. Garain, S. K. Mahata, and S. Dutta, "Normalization of Numeronyms using NLP techniques," 2020 IEEE Calcutta Conference (CALCON), Kolkata, India, 2020, pp. 7-9, doi: 10.1109/CALCON49167.2020.9106524.


 


Somayeh Nosrati was born in 1982 in Tehran. She holds a Master's degree in artificial intelligence from Khatam University of Tehran.


Professor Siavosh Kaviani was born in 1961 in Tehran. He had a professorship. He holds a Ph.D. in Software Engineering from the QL University of Software Development Methodology and an honorary Ph.D. from the University of Chelsea.


Nasim Gazerani was born in 1983 in Arak. She holds a Master's degree in Software Engineering from UM University of Malaysia.