
Enhancing Social Media Text Analysis: Investigating Advanced Preprocessing, Model Performance, and Multilingual Contexts

Agha Muhammad Yar Khan, Abdul Samad Danish, Irfan Haider, Sibgha Batool, Muhammad Adnan Javed & Waseem Tariq


Abstract

This study assesses the effectiveness of advanced text preprocessing and normalization methods in handling the distinctive linguistic features of social media, such as slang and emoticons. It examines how these techniques affect model performance, with particular attention to the multilingual settings encountered by T5 and DistilBERT. The research further explores the incorporation of additional contextual information and improvements to model interpretability for text classification. Adversarial training is also considered as a means of strengthening model robustness against deceptive text patterns. Findings from a comprehensive evaluation of five machine learning models (LSTM, T5, DistilBERT, SVM, and Naive Bayes) on a 90,000-tweet dataset, supplemented by additional datasets obtained from Kaggle, show that transformer-based models achieve superior performance in text classification. This work deepens the understanding of how NLP models capture the nuances of human language, contributing to the development of more accurate and efficient text analysis tools.
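The abstract's preprocessing step (normalizing slang, emoticons, and other social-media artifacts) can be illustrated with a minimal sketch. The emoticon lexicon and the exact normalization rules below are illustrative assumptions, not the paper's actual pipeline:

```python
import re

# Hypothetical emoticon lexicon; the study's actual mapping is not specified.
EMOTICON_MAP = {":)": "smile", ":(": "sad", ":D": "laugh", ";)": "wink"}

def preprocess_tweet(text: str) -> str:
    """Normalize a tweet: strip URLs and mentions, unpack hashtags,
    translate emoticons to word tokens, lowercase, collapse whitespace."""
    text = re.sub(r"https?://\S+", " ", text)   # remove URLs
    text = re.sub(r"@\w+", " ", text)           # remove user mentions
    text = re.sub(r"#(\w+)", r"\1", text)       # keep hashtag word, drop '#'
    for emo, word in EMOTICON_MAP.items():
        text = text.replace(emo, f" {word} ")   # emoticon -> word token
    text = text.lower()
    return re.sub(r"\s+", " ", text).strip()

print(preprocess_tweet("Gr8 day :) @user check https://t.co/x #NLProc"))
# -> "gr8 day smile check nlproc"
```

A transformer such as DistilBERT would then tokenize this normalized string; keeping the hashtag word rather than deleting the whole tag preserves topical signal.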


Index Terms: Natural Language Processing, Machine Learning, Multilingual Analysis.
