Document Type: Research Paper
Author
Associate Professor, Department of Electrical and Computer Engineering, Faculty of Engineering, Eyvanekey University, Eyvanekey, Semnan, Iran. Corresponding author: Mohammad.Rabiei@eyc.ac.ir
Abstract
Semantic similarity is used in applications such as information retrieval, text summarization, and sentiment analysis. This article presents a new deep-learning method for assessing how well the name proposed by a company registration applicant matches the company's field of activity. The key innovation is the use of a combined Aria BERT model for word embedding, which converts registered company names into vectors. The company's field of activity is converted into numerical vectors with the FastText model, and these vectors are then processed by bidirectional long short-term memory (Bi-LSTM) networks with an additional attention layer. The results were evaluated using cosine similarity and ROUGE metrics. Once the company name and field of activity have been approved, the DBSCAN clustering method is used to group company names by activity.
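The matching step described above reduces a sequence of recurrent hidden states to a single vector with an attention layer and then compares it with an activity vector by cosine similarity. The sketch below illustrates only that arithmetic, under loudly stated assumptions: toy 2-D vectors stand in for the Aria BERT, FastText, and Bi-LSTM outputs; the attention is plain dot-product attention with a fixed query (a trained model would learn it); and the acceptance threshold of 0.7 and the helper `name_matches_activity` are hypothetical, not taken from the paper.

```python
import math
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: List[float]) -> List[float]:
    # Subtract the maximum score before exponentiating for numerical stability.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(hidden_states: List[Vector], query: Vector) -> Vector:
    """Collapse a sequence of hidden states (e.g. Bi-LSTM outputs) into one vector.

    Each time step is scored by its dot product with a query vector; the softmax
    of those scores gives the attention weights, and the result is the weighted
    sum of the hidden states. Here the query is passed in explicitly (assumption).
    """
    weights = softmax([dot(h, query) for h in hidden_states])
    dim = len(hidden_states[0])
    return [sum(w * h[i] for w, h in zip(weights, hidden_states)) for i in range(dim)]


def cosine_similarity(a: Vector, b: Vector) -> float:
    norm_a, norm_b = math.sqrt(dot(a, a)), math.sqrt(dot(b, b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat an all-zero vector as unrelated to everything
    return dot(a, b) / (norm_a * norm_b)


def name_matches_activity(name_vec: Vector, activity_vec: Vector,
                          threshold: float = 0.7) -> bool:
    # Hypothetical acceptance rule: approve the proposed name when its
    # similarity to the activity vector reaches a fixed threshold.
    return cosine_similarity(name_vec, activity_vec) >= threshold


# Toy hidden states standing in for Bi-LSTM outputs over a three-token name.
hidden = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
name_vec = attention_pool(hidden, query=[1.0, 0.0])
activity_vec = [0.9, 0.5]  # stand-in for a FastText activity vector
print(round(cosine_similarity(name_vec, activity_vec), 3))
print(name_matches_activity(name_vec, activity_vec))
```

Cosine similarity ignores vector length, so a name and an activity description can be compared even when their embeddings come from different models with different scales, provided both have been projected into the same space.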
The results show that the ROUGE-1, ROUGE-2, and ROUGE-L scores for company activity vectorization are 0.7623, 0.7413, and 0.7982, respectively. The model's overall accuracy and recall were 0.8512 and 0.8317, respectively. In addition, the correlation coefficient between the cosine similarity of the proposed names and the company's field of activity, as calculated by the model, was 93%, confirming the model's effectiveness.
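To make the reported metrics concrete, here is a minimal sketch of ROUGE-N, ROUGE-L, and a correlation coefficient. It rests on assumptions the abstract does not settle: tokens are whitespace-separated, each ROUGE score is reported as an F1 value (the paper may report precision or recall instead), and the correlation is taken to be Pearson's. The example strings are invented for illustration.

```python
import math
from collections import Counter
from typing import List, Sequence


def ngrams(tokens: Sequence[str], n: int) -> Counter:
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def f1(matches: int, cand_total: int, ref_total: int) -> float:
    if matches == 0 or cand_total == 0 or ref_total == 0:
        return 0.0
    precision, recall = matches / cand_total, matches / ref_total
    return 2 * precision * recall / (precision + recall)


def rouge_n(candidate: str, reference: str, n: int) -> float:
    cand = ngrams(candidate.split(), n)
    ref = ngrams(reference.split(), n)
    matches = sum((cand & ref).values())  # n-gram overlap, clipped by count
    return f1(matches, sum(cand.values()), sum(ref.values()))


def lcs_length(a: Sequence[str], b: Sequence[str]) -> int:
    # Dynamic-programming longest common subsequence, keeping one row at a time.
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l(candidate: str, reference: str) -> float:
    cand, ref = candidate.split(), reference.split()
    return f1(lcs_length(cand, ref), len(cand), len(ref))


def pearson(xs: List[float], ys: List[float]) -> float:
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if sd_x == 0.0 or sd_y == 0.0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / (sd_x * sd_y)


candidate = "software development services"
reference = "software development and consulting services"
print(rouge_n(candidate, reference, 1))
print(rouge_n(candidate, reference, 2))
print(rouge_l(candidate, reference))
```

For this pair, all three candidate words appear in the reference in order, so ROUGE-1 and ROUGE-L both come out at about 0.75, while only one of the two candidate bigrams matches, giving a ROUGE-2 of about 0.33. This is why ROUGE-2 is usually the strictest of the three scores.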
This method effectively prevents the registration of names that are not meaningfully related to the company's operations. By clustering company names, it also makes it possible to suggest related names based on the company's field of activity.
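The clustering-and-suggestion idea can be sketched with a small DBSCAN implementation over cosine distance. The abstract does not give the distance measure, `eps`, or `min_samples`, so those are assumptions here, as are the hypothetical company names and the toy 2-D vectors that stand in for the model's embeddings. The helper `suggest_names` is also hypothetical: it simply returns the other names in the same cluster.

```python
import math
from typing import List

Vector = List[float]
UNVISITED, NOISE = -2, -1


def cosine_distance(a: Vector, b: Vector) -> float:
    # 1 - cosine similarity; assumes neither vector is all zeros.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def dbscan(points: List[Vector], eps: float, min_samples: int) -> List[int]:
    """Label each point with a cluster id (0, 1, ...) or NOISE."""
    labels = [UNVISITED] * len(points)

    def region(i: int) -> List[int]:
        # The neighbourhood includes the point itself, as in standard DBSCAN.
        return [j for j in range(len(points))
                if cosine_distance(points[i], points[j]) <= eps]

    cluster = 0
    for i in range(len(points)):
        if labels[i] != UNVISITED:
            continue
        seeds = region(i)
        if len(seeds) < min_samples:
            labels[i] = NOISE  # may later be claimed as a border point
            continue
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: join the cluster, do not expand
            if labels[j] != UNVISITED:
                continue
            labels[j] = cluster
            neighbours = region(j)
            if len(neighbours) >= min_samples:
                queue.extend(neighbours)  # j is a core point: keep expanding
        cluster += 1
    return labels


def suggest_names(index: int, names: List[str], labels: List[int]) -> List[str]:
    """Suggest the other names that share a cluster with names[index]."""
    if labels[index] == NOISE:
        return []
    return [n for k, n in enumerate(names) if labels[k] == labels[index] and k != index]


# Hypothetical names with toy 2-D vectors standing in for model embeddings.
names = ["Pars Software", "Tehran Code Lab", "Golden Bakery", "Sweet Bread Co", "Blue Shipping"]
vectors = [[1.0, 0.1], [0.9, 0.2], [0.1, 1.0], [0.2, 0.9], [-1.0, 0.0]]
labels = dbscan(vectors, eps=0.05, min_samples=2)
print(labels)
print(suggest_names(0, names, labels))
```

DBSCAN suits this task because the number of activity fields need not be fixed in advance, and a name that fits no group is labelled as noise rather than forced into a cluster. In practice, scikit-learn's `sklearn.cluster.DBSCAN(eps=..., min_samples=..., metric="cosine")` provides the same algorithm without the quadratic neighbourhood search written out here.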