Comprehensive Analysis of Russian-Language Texts Based on Transformer-Type Neural Network Models

Shiyan V.I., Markov V.N.

Received: 18.02.2025

This article presents a comprehensive analysis of Russian-language texts using neural network models based on the Bidirectional Encoder Representations from Transformers (BERT) architecture. The study uses three models specialized for Russian: RuBERT-tiny, RuBERT-tiny2, and RuBERT-base-cased. The proposed methodology covers the morphological, syntactic, and semantic levels of analysis and integrates lemmatization, part-of-speech tagging, morphological feature identification, syntactic dependency parsing, semantic role labeling, and relation extraction. With BERT-family models, the reported accuracy exceeds 98% for lemmatization, 97% for part-of-speech tagging and morphological feature identification, 96% for syntactic parsing, and 94% for semantic analysis. The method is suited to tasks that require deep text comprehension and can be optimized for processing large corpora.

Keywords: BERT, Russian-language texts, morphological analysis, syntactic analysis, semantic analysis, lemmatization, RuBERT, natural language processing, NLP
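
The three analysis levels named in the abstract can be pictured as layers of annotation attached to each token, with one accuracy figure computed per layer. The following is a minimal sketch under stated assumptions, not the authors' implementation: the `TokenAnnotation` class and the `token_accuracy` helper are hypothetical names, the tag values follow Universal Dependencies conventions (the abstract does not name a tagset), and accuracy is assumed to mean the share of tokens whose predicted value matches the gold value, which the abstract does not define.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Hashable, List, Optional


@dataclass
class TokenAnnotation:
    """One token with morphological, syntactic, and semantic layers (hypothetical schema)."""
    form: str                                   # surface form as it appears in the text
    lemma: str                                  # morphological level: dictionary form
    upos: str                                   # morphological level: part-of-speech tag
    feats: Dict[str, str] = field(default_factory=dict)  # morphological features
    head: int = 0                               # syntactic level: 1-based head index, 0 = root
    deprel: str = "root"                        # syntactic level: dependency relation
    role: Optional[str] = None                  # semantic level: role label, if any


def token_accuracy(
    gold: List[TokenAnnotation],
    pred: List[TokenAnnotation],
    key: Callable[[TokenAnnotation], Hashable],
) -> float:
    """Share of tokens for which key(pred) equals key(gold) (assumed metric)."""
    if len(gold) != len(pred):
        raise ValueError("gold and predicted sequences must be aligned token by token")
    if not gold:
        raise ValueError("cannot compute accuracy on an empty sequence")
    correct = sum(1 for g, p in zip(gold, pred) if key(g) == key(p))
    return correct / len(gold)


# Illustrative gold annotation for the sentence "Мама мыла раму".
gold = [
    TokenAnnotation("Мама", "мама", "NOUN", {"Case": "Nom"}, head=2, deprel="nsubj", role="Agent"),
    TokenAnnotation("мыла", "мыть", "VERB", {"Tense": "Past"}, head=0, deprel="root"),
    TokenAnnotation("раму", "рама", "NOUN", {"Case": "Acc"}, head=2, deprel="obj", role="Patient"),
]

# Hypothetical model output with a single lemmatization error:
# "мыла" is lemmatized as the noun "мыло" instead of the verb "мыть".
pred = [
    TokenAnnotation("Мама", "мама", "NOUN", {"Case": "Nom"}, head=2, deprel="nsubj", role="Agent"),
    TokenAnnotation("мыла", "мыло", "VERB", {"Tense": "Past"}, head=0, deprel="root"),
    TokenAnnotation("раму", "рама", "NOUN", {"Case": "Acc"}, head=2, deprel="obj", role="Patient"),
]

lemma_acc = token_accuracy(gold, pred, key=lambda t: t.lemma)
upos_acc = token_accuracy(gold, pred, key=lambda t: t.upos)
# Labeled attachment: both the head index and the relation must match.
las = token_accuracy(gold, pred, key=lambda t: (t.head, t.deprel))
```

Passing a different `key` function selects the layer being scored, so the same helper covers lemmas, part-of-speech tags, dependency arcs, and semantic roles without separate scoring code for each level.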