The article examines the transition of universities from data warehouses to data lakes, revealing their potential for processing big data. The introduction highlights the main differences between warehouses and lakes, focusing on their distinct data-management philosophies. Data warehouses are typically used for structured data with a relational architecture, while data lakes store data in raw form, supporting flexibility and scalability. The section "Data Sources Used by the University" describes how universities manage data collected from various departments, including ERP systems and cloud databases. The discussion of data lakes and data warehouses highlights their key differences in data processing and management methods, along with their advantages and disadvantages. The article examines in detail the problems and challenges of the transition to data lakes, including security, scale, and implementation costs. Architectural models of data lakes such as the "Raw Data Lake" and the "Data Lakehouse" are presented, describing various approaches to managing the data lifecycle and business goals. Big data processing methods in lakes cover the use of the Apache Hadoop platform and current storage formats. Processing technologies are described, including the use of Apache Spark and machine learning tools. Practical examples of data processing and of applying machine learning orchestrated through Spark are provided. In conclusion, the relevance of the transition to data lakes for universities is emphasized, security and management challenges are highlighted, and the use of cloud technologies is recommended to reduce costs and increase productivity in data management.
Keywords: data warehouse, data lake, big data, cloud storage, unstructured data, semi-structured data
The development and application of methods for the preliminary processing of tabular data to solve problems of the multivalued (multi-label) classification of computer attacks is considered. The object of the study is a dataset containing multi-label records collected using a hardware and software complex developed by the authors. An analysis of the dataset's attributes was carried out, during which 28 attributes were identified as having the greatest informational importance for classification by machine learning algorithms. The expediency of using autoencoders in the field of information security, in tasks involving datasets whose target attributes are multivalued, is substantiated. Practical significance: the data preprocessing can be used to improve the accuracy of detecting and classifying multi-label computer attacks.
Keywords: information security, computer attacks, multi-label classification, multivalued classification, dataset analysis, experimental data collection, multivalued data, network attacks
Based on an analysis of behavioral characteristics, the main indicators that provide the greatest accuracy in identifying users of mobile devices are determined. As part of the research, software was written to collect touchscreen data during typical user actions. Identification algorithms based on machine learning are implemented, and their accuracy is reported. The results obtained in the study can be used to build continuous identification systems.
Keywords: user behavior, touch screen, continuous identification, biometrics, dataset, classification, deep learning, recurrent neural network, mobile device
A class of mathematical methods for code channel division has been developed based on the use of pairs of orthogonal encoding and decoding matrices, the components of which are polynomials and integers. The principles of constructing schemes for implementing code channel combining on the transmitting side and arithmetic code channel division on the receiving side of the communication system and examples of such schemes are presented. The proposed approach will significantly simplify the design of encoding and decoding devices used in space and satellite communication systems.
Keywords: telecommunications systems, telecommunications devices, multiplexing, code division of channels, matrix analysis, encoding matrices, synthesis method, orthogonal matrices, integers
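The orthogonal-matrix principle behind this family of methods can be illustrated with the classic Walsh-Hadamard construction. This is only an analogous sketch, not the authors' polynomial/integer matrix pairs: each channel gets one orthogonal code row, the encoded channels are summed on the transmitting side, and each channel is recovered on the receiving side by correlating against its row.

```python
# Illustrative code-division sketch using an orthogonal Walsh-Hadamard matrix
# (a stand-in for the article's polynomial/integer encoding matrix pairs).

def hadamard(n):
    """Build an n x n Walsh-Hadamard matrix (n must be a power of two)."""
    H = [[1]]
    while len(H) < n:
        H = ([row + row for row in H] +
             [row + [-x for x in row] for row in H])
    return H

def encode(bits, H):
    """Combine one +/-1 symbol per channel into a single composite signal."""
    n = len(H)
    return [sum(bits[i] * H[i][j] for i in range(len(bits))) for j in range(n)]

def decode(signal, H, channel):
    """Recover one channel's symbol by correlating with its code row."""
    corr = sum(signal[j] * H[channel][j] for j in range(len(H)))
    return 1 if corr > 0 else -1

H = hadamard(4)
tx = [1, -1, -1, 1]            # symbols for four logical channels
composite = encode(tx, H)       # what is sent over the shared medium
rx = [decode(composite, H, i) for i in range(4)]
```

Because the rows are mutually orthogonal, the correlation at the receiver isolates each channel exactly, which is the property the encoding/decoding matrix pairs in the article exploit.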
Currently, key aspects of software development include the security and efficiency of the applications being created. Special attention is given to data security and operations involving databases. This article discusses methods and techniques for developing secure applications through the integration of the Rust programming language and the PostgreSQL database management system (DBMS). Rust is a general-purpose programming language that prioritizes safety as its primary objective. The article examines key concepts of Rust, such as strict typing, the RAII (Resource Acquisition Is Initialization) programming idiom, macro definitions, and immutability, and how these features contribute to the development of reliable and high-performance applications when interfacing with databases. The integration with PostgreSQL, which has been demonstrated to be both straightforward and robust, is analyzed, highlighting its capacity for efficient data management while maintaining a high level of security, thereby mitigating common errors and vulnerabilities. Rust is currently used less than popular languages such as JavaScript, Python, and Java, partly because of its steep learning curve. However, major companies see its potential: Rust modules are being integrated into operating system kernels (Linux, Windows, Android), Mozilla is developing features for Firefox's Gecko engine in Rust, and Stack Overflow surveys show rising usage of the language. A practical example involving the dispatch of information related to class schedules and video content illustrates the advantages of using Rust in conjunction with PostgreSQL to create a scheduling management system that ensures data integrity and security.
Keywords: Rust programming language, memory safety, RAII, metaprogramming, DBMS, PostgreSQL
A method is proposed for the cascaded connection of encoding and decoding devices to implement code division of channels. It is shown that increasing the number of cascading levels significantly simplifies their implementation and reduces the number of operations performed. In this case, the number of subscriber pairs that can simultaneously exchange information equals the minimum order of the encoding and decoding devices in the system. The proposed approach will significantly simplify the design of encoding and decoding devices used in space and satellite communication systems.
Keywords: telecommunications systems, telecommunications devices, multiplexing, code division of channels, orthogonal matrices, integers, cascaded connection
The article presents the method of multiple initial connections aimed at enhancing the information security of peer-to-peer virtual private networks. This method ensures the simultaneous establishment of several initial connections through intermediate nodes, which complicates data interception and minimizes the risks of connection compromise. The paper describes the algorithmic foundation of the method and demonstrates its application using a network of four nodes. An analysis of packet routing is conducted, including the stages of packet formation, modification, and transmission. To calculate the number of unique routes and assess data interception risks, a software package registered with the Federal Service for Intellectual Property was developed. The software utilizes matrix and combinatorial methods, providing high calculation accuracy and analysis efficiency. The proposed method has broad application prospects in peer-to-peer networks, Internet of Things systems, and distributed control systems.
Keywords: multiple initial connections, peer-to-peer network, virtual private network, information security, data transmission routes, intermediate nodes, unique routes
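The matrix method for counting routes mentioned above can be sketched with the standard result that the k-th power of a network's adjacency matrix counts walks of length k between node pairs. The four-node fully connected topology below is an invented stand-in for the article's example network, and the registered software's actual internals are not reproduced here.

```python
# Counting routes of a given length between nodes of a small peer-to-peer
# network via powers of the adjacency matrix. The topology is illustrative.

def mat_mul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def walks_of_length(adj, length):
    """(A^length)[i][j] counts walks of that length from node i to node j."""
    result = adj
    for _ in range(length - 1):
        result = mat_mul(result, adj)
    return result

# Four fully interconnected nodes (edges = direct connections).
A = [[0, 1, 1, 1],
     [1, 0, 1, 1],
     [1, 1, 0, 1],
     [1, 1, 1, 0]]

W2 = walks_of_length(A, 2)   # two-hop walks, i.e. via one intermediate node
```

For this topology there are exactly two two-hop routes between any pair of distinct nodes (one through each of the remaining nodes), which is the kind of count an interception-risk analysis would aggregate over route lengths.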
The article presents an algorithm for establishing a secure connection for peer-to-peer virtual private networks aimed at enhancing information security. The algorithm employs modern cryptographic protocols such as IKEv2, RSA, and DH, providing multi-level data protection. The developed algorithm structure includes the dynamic generation and destruction of temporary keys, reducing the risk of compromise. The proposed solution is designed for use in corporate network security systems, Internet of Things systems, and distributed systems.
Keywords: virtual private network, peer-to-peer network, cryptographic protocols, RSA, Diffie-Hellman, IKEv2, secure connection, multi-layer protection, information security, distributed systems
This article examines the vulnerability associated with storing image files in the cache on a device's hard disk in unencrypted form. The nature of this problem and the possible consequences of its exploitation are investigated, including leakage of confidential data, misuse of the information obtained, and risks to corporate information systems. The main attention is paid to a method of protection against this vulnerability based on masking techniques that use orthogonal matrices. A developed messenger prototype implementing this method is presented: images are transmitted and stored in the file system in masked form, and unmasking is carried out directly in the messenger application itself.
Keywords: information security, messenger, messaging, communications, instant messaging systems, encryption, orthogonal matrices
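The masking idea can be sketched in a few lines: multiplying a block of pixel values by an orthogonal matrix Q scrambles it, and multiplying by the transpose restores it exactly, since Q^T Q = I. The 2x2 matrix and the sample pixel values below are assumptions for the demo, not the prototype's actual parameters.

```python
import math

# Sketch of masking data with an orthogonal matrix Q:
# masked = Q @ block, restored = Q^T @ masked, because Q^T Q = I.

s = 1 / math.sqrt(2)
Q = [[s, s],
     [s, -s]]            # a 2x2 orthogonal (and here also symmetric) matrix

def apply(M, v):
    return [sum(M[i][j] * v[j] for j in range(len(v))) for i in range(len(M))]

def transpose(M):
    return [list(row) for row in zip(*M)]

block = [200, 35]                        # two pixel values from an image
masked = apply(Q, block)                 # form stored/transmitted on disk
restored = [round(x) for x in apply(transpose(Q), masked)]
```

The masked values stored in the cache bear no direct resemblance to the pixels, yet the application can unmask losslessly, which is the property the prototype relies on.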
The purpose of the article is to review the various ways of deceiving attackers in a network and to analyze the applicability and variability of modern deception technologies. The method of investigation is the analysis of existing articles in peer-reviewed Russian and foreign sources, the aggregation of research, and the formation of conclusions based on the analyzed sources. The review considers technologies for deceiving an attacker (Honeypot traps, Honeytoken decoys, moving target defense (MTD), and Deception platforms). The effectiveness of deception in terms of its impact on a person's mental state is discussed. The article describes different types of Honeypots and their component parts, and discusses their classification by target, place of introduction, level of interaction, location, type of introduction, homogeneity, and type of activity. Different strategies for using traps in the network are discussed: sacrificial lamb, hacker zoo, minefield, proximity traps, redirection screens, and deception ports. A classification of decoys is given, methods of their application in an organization's network are described, and additional conditions that increase the probability of detecting an attacker through decoys are specified. The basic techniques of the MTD strategy for obfuscating the infrastructure are given, and the interaction of these methods with Honeypot and Honeytoken technologies is described. Research confirming the effectiveness of using MTD in conjunction with traps and decoys is cited, and the difficulties of applying this strategy are pointed out. A description of the Deception platform is given, its distinctive features compared with conventional traps and decoys are described, and the possibility of its interaction with MTD is considered.
As a result, the main technologies and strategies for deceiving an attacker have been identified and described, their development trends are outlined, and their interaction with attackers and counteraction to them are described.
Keywords: Deception Platform, Honeypot, Honeytoken, Honeynet, MTD
The relationship between "old" and "new" concepts/metrics for assessing the quality of statistical detection criteria and binary event classification is considered. Assessments of the independence and consistency of the analyzed metrics with respect to the volume and composition of the initial input data are provided. Recommendations for the use of the "new" metrics for assessing the quality of detection and binary classification are clarified.
Keywords: Type I and Type II errors, accuracy, recall, specificity, F-score, ROC curve, AUC integral metric
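The correspondence between the "old" error notions and the "new" metrics can be made concrete from a confusion matrix: recall complements the Type II error rate and specificity complements the Type I error rate. The counts below are invented for illustration.

```python
# Binary-classification metrics from confusion-matrix counts, relating
# the "new" metrics to Type I / Type II error rates.

def metrics(tp, fp, fn, tn):
    precision = tp / (tp + fp)        # share of positive calls that are right
    recall = tp / (tp + fn)           # 1 - Type II error rate (missed detections)
    specificity = tn / (tn + fp)      # 1 - Type I error rate (false alarms)
    f_score = 2 * precision * recall / (precision + recall)
    return precision, recall, specificity, f_score

# Illustrative detector results: 80 hits, 20 false alarms, 20 misses, 80 correct rejections.
p, r, s, f = metrics(tp=80, fp=20, fn=20, tn=80)
```

With these balanced counts all four metrics coincide at 0.8; on imbalanced data (the common case in detection problems) they diverge, which motivates the article's consistency analysis.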
This study examines the structure and characteristics of multilayer autoencoders (MAEs) used in detecting computer attacks. The potential of MAEs for improving detection capabilities in cybersecurity is analyzed, with a focus on their role in reducing the dimensionality of large datasets involved in identifying computer attacks. The study explores the use of different neuron activation functions within the network and the most commonly applied loss functions that define reconstruction quality of the original data. Additionally, an optimization algorithm for autoencoder parameters is considered, designed to accelerate model training, reduce the likelihood of overfitting, and minimize the loss function.
Keywords: neural networks, layers, neurons, loss function, activation function, mobile applications, attacks, hyperparameters, optimization, machine learning
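The compression role of an autoencoder described above can be shown with a deliberately tiny toy: two inputs, one hidden unit, two outputs, trained by plain gradient descent on the squared reconstruction loss. Real MAEs for attack detection are deeper and use nonlinear activations and tuned optimizers; the linear sketch below, with invented weights and synthetic data, only illustrates that training reduces the reconstruction loss.

```python
# Toy linear autoencoder (2 -> 1 -> 2) trained with stochastic gradient
# descent on the squared reconstruction loss. Data lies on a line, so a
# one-dimensional bottleneck can represent it almost perfectly.

data = [(t, 2 * t) for t in [i / 10 for i in range(-10, 11)]]

w_enc = [0.1, 0.2]          # encoder weights: h = w_enc . x
w_dec = [0.3, 0.1]          # decoder weights: x_hat = h * w_dec
lr = 0.01

def loss(dataset):
    total = 0.0
    for x in dataset:
        h = w_enc[0] * x[0] + w_enc[1] * x[1]
        total += sum((w_dec[i] * h - x[i]) ** 2 for i in range(2))
    return total / len(dataset)

before = loss(data)
for _ in range(200):                      # epochs
    for x in data:
        h = w_enc[0] * x[0] + w_enc[1] * x[1]
        err = [w_dec[i] * h - x[i] for i in range(2)]
        # gradient of the squared loss w.r.t. the hidden activation
        grad_h = 2 * (err[0] * w_dec[0] + err[1] * w_dec[1])
        for i in range(2):
            w_dec[i] -= lr * 2 * err[i] * h   # decoder weight update
        for i in range(2):
            w_enc[i] -= lr * grad_h * x[i]    # encoder weight update
after = loss(data)
```

The bottleneck forces the network to learn the one-dimensional structure of the data, which is the same dimensionality-reduction effect the article exploits on high-dimensional attack datasets.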
The article is devoted to the complexity of developing organizational and administrative documentation, taking into account the industry in which an organization operates as well as the departments that make up its main work. Under the impact of the country's changing economy, organizations are constantly subject to changes initiated by the relevant regulators in a particular area and by regulatory documents in the form of standards and laws. The main industries in which organizations operate are highlighted, along with the number of regulatory documents governing their activities. The organization is analyzed as a system on the basis of system analysis. The Sagatovsky method was chosen as the approach to the problem. Following this methodology, the system was analyzed in seven stages; at each stage, the main components are highlighted and justifications for each of them are given. Life-cycle diagrams of the specified "types of end products" have been compiled, taking into account the directions of work of the departments. A scheme of the process of creating organizational and administrative documentation by the employees and departments of the organization has been developed. Analyzing the organization from the standpoint of system analysis will make it possible to further develop criteria for creating a set of organizational and administrative documentation. Criteria for the creation of organizational and administrative documentation and methods of assessing them will help organizations significantly ease their work with the main regulators in any area and meet the established standards of work, which in the future will help not only to improve work but also to avoid negative consequences for the enterprise itself.
Keywords: the Sagatovsky method, system analysis, goal setting, information security
PHP Data Objects (PDO) represents a significant advancement in PHP application development, providing a universal approach to interacting with database management systems (DBMSs). The article opens with an introduction describing the motivation for PDO, available as of PHP 5.1, which allows PHP developers to interact with different databases through a single interface, minimising the effort involved in portability and code maintenance. It discusses how PDO improves security by supporting prepared queries, a defence against SQL injection. The main part of the paper analyses the key advantages of PDO, such as its versatility in connecting to multiple databases (e.g. MySQL, PostgreSQL, SQLite), the ability to use prepared queries to enhance security, improved error handling through exceptions, transactional support for data integrity, and the ease of learning the PDO API even for beginners. Practical examples are provided, including preparing and executing SQL queries, setting attributes via the setAttribute method, and performing operations in transactions, emphasising the flexibility and robustness of PDO. In addition, the paper discusses best practices for using PDO in complex and high-volume projects, such as using prepared queries for bulk data insertion, query optimisation, and stream processing for efficient handling of large amounts of data. The conclusion characterises PDO as the preferred tool for modern web applications, offering a combination of security, performance, and code quality. The authors also suggest directions for future research on security test automation and the impact of different data models on application performance.
Keywords: PHP, PDO, databases, DBMS, security, prepared queries, transactions, programming
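The article's examples are in PHP, but the prepared-query defence it describes is the same across database APIs. The sketch below uses Python's standard sqlite3 module as an analogue: a placeholder keeps user input out of the SQL text, so an injection payload is compared as a literal string rather than executed.

```python
import sqlite3

# Parameterized ("prepared") queries as an SQL-injection defence,
# illustrated with sqlite3 as a stand-in for PHP's PDO. Table and
# sample rows are invented for the demo.

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, role TEXT)")
conn.execute("INSERT INTO users VALUES (?, ?)", ("alice", "admin"))

user_input = "alice' OR '1'='1"      # a classic injection attempt
rows = conn.execute(
    "SELECT role FROM users WHERE name = ?", (user_input,)
).fetchall()                          # input is bound as data, not SQL

ok = conn.execute(
    "SELECT role FROM users WHERE name = ?", ("alice",)
).fetchall()                          # a legitimate lookup still works
```

Had the input been concatenated into the query string, the `OR '1'='1'` clause would have matched every row; with binding, the malicious string simply matches no user.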
Relevance of the research topic. Modern cyber attacks are becoming more complex and diverse, which makes classical anomaly detection methods, such as signature-based and heuristic ones, insufficiently effective. It is therefore necessary to develop more advanced network threat detection systems based on machine learning and artificial intelligence technologies. Problem statement. Existing methods of detecting malicious traffic often suffer from high false-positive rates and insufficient accuracy in the face of real network threats. This reduces the effectiveness of cybersecurity systems and makes it difficult to identify new attacks. Purpose of the study. The purpose of this work is to develop a malicious traffic detection system that increases the number of anomalies detected in network traffic through the introduction of machine learning and AI technologies. Research methods. To achieve this goal, a thorough analysis and preprocessing of data obtained from publicly available datasets such as CICIDS2017 and KDD Cup 1999 was carried out.
Keywords: anomaly detection, malicious traffic, cybersecurity, machine learning, artificial intelligence, signature methods
The purpose of the article is to study the information security of critical parameters of an organization's IT infrastructure processes and its digital infrastructure using security monitoring centers. Risk factors such as adaptability, stability over the medium and long term, and the influence of uncertainties ("white noise") are emphasized. In addition to system analysis and synthesis, the work uses methods of mathematical (simulation, operator) modeling, computational mathematics, and statistics. Based on the analysis and synthesis, the following main results were obtained: 1) the effects of various attacks on the distributed infrastructure were classified; 2) a scheme, a multiplicative model of the integral interactions of protective measures, and an integral measure of security are proposed; 3) an algorithm has been developed to identify the constructed multiplicative model based on the least squares criterion, both by the set of factors and by risk classes; 4) an example of an operator equation taking into account random noise in the system is shown. Scientific and practical value of the work: the results can be used to assess the security of the system and to reduce the risks of targeted attacks and the damage from them. In addition, the proposed schemes will facilitate situational modeling for detecting risk situations and assessing the damage from their realization.
Keywords: assessment, sustainability, maturity, information security center, monitoring, risk, management
The article proposes a set of anthropomorphic models for assessing the risks of infrastructural destructivism effects. These models are based on one of the approaches to assessing risks of infrastructural genesis, which consists in assessing the effect of infrastructural destructivism: the uncontrolled self-destruction of the information infrastructure. In contrast to existing approaches to assessing indicators of infrastructural destructivism, the article proposes models that take into account multiple inter-object behavioral interactions of processes based on the anthropomorphic approach. The anthropomorphic approach involves implementing algorithms for assessing inter-object interactions according to the principles by which wildlife develops. The phenomenon of infrastructural destructivism has a practical explanation: under certain conditions, the simultaneous implementation of destructive impacts on infrastructure objects from various sources can lead either to catastrophic changes (that is, to the complete self-destruction of the information infrastructure) or to the minimization of the risks of infrastructural genesis. The article introduces the concept of a "health" metric in the infrastructure information security monitoring system, which reflects the presence of "negative" behavioral activities of processes and thereby predicts an increased probability of infrastructural destructivism effects. Thus, applying the proposed models makes it possible to increase the accuracy of assessing the risks of infrastructural genesis and thereby ensure a sufficient level of information security.
Keywords: infrastructure destructiveness, destructive impacts of infrastructure genesis, anthropomorphic approach, intelligent analysis of event logs, behavioral analysis
The study of the statistical characteristics of network traffic allows its fractal features to be detected and the change in fractal dimension under cyber attacks (CA) to be estimated. These studies highlight the relationship between attacks and dynamic changes in the fractal dimension, which allows a better understanding of how attacks affect the structure and behavior of network traffic. Such understanding is critical for developing effective methods of monitoring and protecting networks from potential threats. These observations justify the use of fractal analysis methods, including discrete wavelet analysis, for detecting CA. In particular, it is possible to monitor the fractal dimension of telecommunication traffic in real time and track its changes. However, the choice of the most appropriate mother wavelet for multiresolution analysis remains insufficiently studied. The article evaluates the influence of the choice of mother wavelet type on the estimate of the Hurst exponent and the reliability of CA detection. The following types of mother wavelets are considered: Haar, Daubechies, Symlet, Meyer, and Coiflet. The study included an experimental evaluation of the Hurst exponent on a dataset that includes a SYN flood attack and normal network traffic. It was shown that the minimum spread of the Hurst exponent estimate for traffic with SYN flood attacks is achieved when using the Meyer mother wavelet with an analysis window of more than 10,000 samples and the Haar wavelet with an analysis window of fewer than 10,000 samples.
Keywords: mother wavelet, computer attack, network traffic, Hurst exponent, wavelet analysis, fractal dimension
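The wavelet-based Hurst estimate can be sketched with the simplest of the wavelets compared above, the Haar wavelet: the detail-coefficient energy per decomposition level scales as 2^(j(2H-1)) for fractional noise, so a least-squares slope of log2(energy) versus level yields H. The synthetic white-noise input below (expected H near 0.5) is an assumption for the demo, not the article's traffic dataset.

```python
import math, random

# Haar-wavelet estimate of the Hurst exponent from the scaling of detail
# energies across decomposition levels.

def haar_detail_energies(x, levels):
    """Mean squared Haar detail coefficients at each decomposition level."""
    s = 1 / math.sqrt(2)
    energies, approx = [], list(x)
    for _ in range(levels):
        detail = [(approx[2*i] - approx[2*i+1]) * s for i in range(len(approx) // 2)]
        approx = [(approx[2*i] + approx[2*i+1]) * s for i in range(len(approx) // 2)]
        energies.append(sum(d * d for d in detail) / len(detail))
    return energies

def hurst_haar(x, levels=6):
    """Least-squares slope of log2(energy) vs level gives 2H - 1."""
    ys = [math.log2(e) for e in haar_detail_energies(x, levels)]
    xs = list(range(1, levels + 1))
    mx, my = sum(xs) / levels, sum(ys) / levels
    slope = (sum((a - mx) * (b - my) for a, b in zip(xs, ys))
             / sum((a - mx) ** 2 for a in xs))
    return (slope + 1) / 2

random.seed(0)
noise = [random.gauss(0, 1) for _ in range(4096)]   # uncorrelated "traffic"
H = hurst_haar(noise)        # expected near 0.5 for white noise
```

Replacing the Haar pair of filters with Daubechies, Symlet, Meyer, or Coiflet filters is what the article's comparison varies; tracking H over sliding windows of traffic is the real-time monitoring scheme it motivates.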
The purpose of this work is to analyze the concept of the ransomware threat and methods of detecting it, and to consider methods of intelligent analysis for solving the detection problem, which are a popular tool among researchers of ransomware and malicious software (malware) in general. Data mining helps to improve the accuracy and speed of the malware detection process by handling large amounts of information, allowing specialists to identify new, previously unknown malware; with the help of generative adversarial networks, even zero-day malware can be detected. Although a direct and objective comparison of all the studies presented in the work is impossible due to the differing datasets, it can be assumed that the generative adversarial network architecture is the most promising approach to the detection problem.
Keywords: malware, ransomware, intelligent analysis, machine learning, neural network, generative adversarial network
The article considers the development trends of the high-tech industry of quantum communications. The most popular topologies of quantum communication networks are described, including those with trusted intermediate nodes. The methods of interaction between nodes of the backbone quantum-cryptographic network are given, and the main methods of ensuring secure transmission in such networks are presented. A simplified scheme for distributing a quantum secret key between the end segments of the backbone telecommunication network using trusted intermediate nodes is considered. Possible data leakage channels in the general structure of quantum-cryptographic networks are described.
Keywords: quantum communications, quantum key, network topologies, trusted nodes
The use of electronic signatures has recently become widespread and an integral part of most business processes. The electronic signature management tools offered by a cryptography vendor are not always able to satisfy all of an organization's requirements. This paper considers an approach aimed at solving most electronic signature management problems. The essence of the method is the combined use of the cryptography tool developer's libraries and the capabilities of highly specialized libraries for working with cryptography and documents.
Keywords: software, electronic signature management, stamp, electronic signature visualization, information protection
An integrated information security system combining dynamism and efficiency is proposed, and a quantitative assessment of this system is presented. The study is aimed at identifying all potential switching routes of maximum length between unique states, taking into account potential difficulties that may arise when implementing a recomposition information security system. The main tool for analyzing and modeling various transition configurations in the system under study is the apparatus of graph theory. Within the framework of the proposed approach, each subsystem includes several independent options or components, and at any given time only one of these options functions. An important aspect is both the interaction between the subsystems and the ability to switch components within one subsystem. For a visual understanding of the proposed approach, an example is given that illustrates the basic principles and mechanisms of the developed system.
Keywords: information security system, state graph, DLP system, IPS/IDS system
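The graph-theoretic step described above, identifying all switching routes of maximum length between unique states, can be sketched with a depth-first search over a small state graph. The four states below (one active option per each of two subsystems) and their allowed switches are invented for illustration, not taken from the article's system.

```python
# Enumerating maximum-length simple switching routes between unique states
# of a recomposable security system, modeled as a state graph.

def longest_simple_paths(adj, start):
    """All simple paths from `start` that have the maximum number of states."""
    found = []
    def dfs(node, path):
        extended = False
        for nxt in adj[node]:
            if nxt not in path:
                extended = True
                dfs(nxt, path + [nxt])
        if not extended:                 # dead end: record the route
            found.append(path)
    dfs(start, [start])
    top = max(len(p) for p in found)
    return [p for p in found if len(p) == top]

# States "AiBj": option i of subsystem A active, option j of subsystem B;
# edges = switching one component at a time.
adj = {
    "A1B1": ["A2B1", "A1B2"],
    "A2B1": ["A1B1", "A2B2"],
    "A1B2": ["A1B1", "A2B2"],
    "A2B2": ["A2B1", "A1B2"],
}
routes = longest_simple_paths(adj, "A1B1")
```

For this 4-cycle of states there are exactly two maximal routes from any starting state, each passing through all four unique configurations; on larger graphs the same search exposes the switching difficulties the article analyzes quantitatively.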
Currently, one of the most pressing issues in the field of information security is the organization of user access control to information infrastructure objects. Given the volume of corporate information resources and the number of users requesting access, there is a need to automate the access approval process while taking possible risks into account. A suitable solution to this problem is the use of fuzzy logic. The article analyzes the process of granting access to the information infrastructure using a fuzzy classifier and develops a conceptual model of a fuzzy-classifier algorithm for incoming access requests, in order to automate the process and minimize the information security risks associated with possible destructive actions against the confidentiality, integrity, and availability of the information infrastructure.
Keywords: neural network, machine learning, information security, cybersecurity, properties and structure of a neural network, mathematical model, threats and information vulnerabilities
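The fuzzy-classifier idea for access requests can be sketched in a few lines: triangular membership functions turn crisp risk factors into fuzzy grades, and a small rule base maps them to a decision. The two input factors, all membership bounds, and the decision thresholds below are illustrative assumptions, not the article's conceptual model.

```python
# Minimal fuzzy classifier for access requests. Inputs are normalized to
# [0, 1]; membership functions and rules are invented for the demo.

def tri(x, a, b, c):
    """Triangular membership function peaking at b, zero outside (a, c)."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

def classify_request(resource_criticality, user_trust):
    """Return 'grant', 'review' or 'deny' for a single access request."""
    # rule 1: critical resource AND untrusted user -> high risk
    high_risk = min(tri(resource_criticality, 0.5, 1.0, 1.5),
                    tri(user_trust, -0.5, 0.0, 0.5))
    # rule 2: non-critical resource AND trusted user -> low risk
    low_risk = min(tri(resource_criticality, -0.5, 0.0, 0.5),
                   tri(user_trust, 0.5, 1.0, 1.5))
    if high_risk > 0.6:
        return "deny"
    if low_risk > 0.6:
        return "grant"
    return "review"          # ambiguous cases go to a human approver

decision = classify_request(resource_criticality=0.1, user_trust=0.9)
```

Clear-cut requests are resolved automatically in either direction, while borderline ones fall through to manual review, which is exactly the automation-with-risk-control trade-off the article targets.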
Nowadays, the Internet has become an integral part of our lives, providing access to a huge amount of information and services. Along with this, however, the number of destructive Internet resources that can harm users, especially children and adolescents, is growing. Hence there is a need for an effective system for regulating access to such resources. The article presents an expert system for regulating access to destructive Internet resources, developed on the basis of modern technologies and methods of artificial intelligence. The system automatically detects and blocks access to resources containing malicious content and also provides manual configuration and access control. The article describes the main components of the system and presents images demonstrating how it blocks access to destructive resources. The article will be useful for specialists in information security, artificial intelligence, and the protection of children from malicious content on the Internet.
Keywords: destructive content, expert system, information security, Internet resources, SpaCy, Keras, RNN, LSTM, PyQt5, vectorization
The article solves the problem of automated generation of user roles using machine learning methods. To solve the problem, cluster data analysis methods implemented in Python in the Google Colab development environment are used. Based on the results obtained, a method for generating user roles was developed and tested, which allows reducing the time for generating a role-based access control model.
Keywords: machine learning, role-based access control model, clustering, k-means method, hierarchical clustering, DBSCAN method
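The clustering step behind automated role generation can be sketched as follows: each user is a binary permission vector, k-means groups users with similar permission sets, and a candidate role is read off each cluster centroid. The tiny hand-made permission matrix and the deterministic starting centroids below are assumptions standing in for the real dataset and the Google Colab pipeline.

```python
# Role mining via k-means clustering of user-permission vectors.

def kmeans(points, centers, iters=10):
    """Plain k-means; `centers` gives the (deterministic) starting centroids."""
    for _ in range(iters):
        clusters = [[] for _ in centers]
        for p in points:
            d = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centers]
            clusters[d.index(min(d))].append(p)   # assign to nearest centroid
        centers = [[sum(col) / len(cl) for col in zip(*cl)] if cl else c
                   for cl, c in zip(clusters, centers)]
    return centers, clusters

# Rows: users, columns: permissions (1 = granted). Invented example data.
users = [
    [1, 1, 0, 0], [1, 1, 0, 0], [1, 1, 1, 0],   # "accounting-like" users
    [0, 0, 1, 1], [0, 0, 1, 1], [0, 1, 1, 1],   # "admin-like" users
]
centers, clusters = kmeans(users, centers=[users[0], users[-1]])

# Candidate role = permissions held by at least half of a cluster's members.
roles = [[1 if c >= 0.5 else 0 for c in center] for center in centers]
```

Each derived role covers the common permission core of its user group, which is how clustering shortens the manual construction of a role-based access control model.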