Sentence-Level Classification of Web-Extracted Data in Urdu Language Text (ULT)

Somia Ali, Uzma Jamil, Mamoona Jabbar & Muhammad Assad Jabbar


Abstract

The most prominent and dominant means of communication in the current digital era is text, rather than sound, emotions, pictures, or animation. Millions of people use the internet because of its real-time availability. Social media is one of the most promising sources of information, and the use of local languages on social media is increasing day by day. People use social media to share their points of view on topics of interest. Natural Language Processing (NLP) is an emerging field concerned with processing different languages for a range of purposes. People with different cultural backgrounds, interests, and areas of knowledge use their local language on social media to share ideas and opinions and to report occasions such as food festivals, sports, deaths and murders, politics, law and order, terrorist attacks, and others. In this research, sentence-level classification is performed to extract these occasions from social media text, and the extracted occasions are then classified into different classes. Machine learning (ML) classifiers are used for occasion classification. The proposed work is evaluated using four performance measures: precision, recall, F1-score, and accuracy. In our experiment, the Linear SVC and Ridge classifiers show the best accuracy, 83%. In the future, deep learning classifiers could be used to improve the accuracy of text classification further.
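The abstract evaluates the classifiers with precision, recall, F1-score, and accuracy. The following is a minimal sketch of how these measures can be computed for multi-class sentence labels. It assumes macro-averaging (an unweighted mean over classes), which the abstract does not specify. The function name `classification_metrics` and the example class labels are hypothetical and are not data or code from the study.

```python
from collections import Counter


def classification_metrics(y_true, y_pred):
    """Return accuracy and macro-averaged precision, recall and F1.

    Assumption: macro-averaging over every label that appears in
    y_true or y_pred; the authors' averaging scheme is not stated.
    """
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and of equal length")

    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1          # correct prediction for class t
        else:
            fp[p] += 1          # class p was predicted wrongly
            fn[t] += 1          # true class t was missed

    precisions, recalls, f1s = [], [], []
    for c in labels:
        prec = tp[c] / (tp[c] + fp[c]) if tp[c] + fp[c] else 0.0
        rec = tp[c] / (tp[c] + fn[c]) if tp[c] + fn[c] else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        precisions.append(prec)
        recalls.append(rec)
        f1s.append(f1)

    n = len(labels)
    return {
        "accuracy": sum(tp.values()) / len(y_true),
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }


# Hypothetical occasion labels for six sentences (illustration only).
y_true = ["sports", "politics", "sports", "terrorism", "politics", "sports"]
y_pred = ["sports", "politics", "politics", "terrorism", "politics", "sports"]
print(classification_metrics(y_true, y_pred))
```

In this toy example, five of the six sentences are labelled correctly, so accuracy is 5/6. Macro-averaging gives each occasion class equal weight regardless of its size. In practice, scikit-learn's `classification_report` reports the same measures per class together with their averages.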

Keywords: Machine Learning; Classification; Urdu Language Text; Sentence-level classification; Natural Language Processing