Wavelet and fractal analysis based news spreading model

The aggression of the Russian Federation against Ukraine has exposed a strategic problem that stems from an unwillingness to fully engage in the information confrontation. Media can do even more damage to a country’s defense capabilities than military action. Spreading fake news can be used to misrepresent, distort, and create a negative image of an informational event in society, as well as to instill fear and panic or to downplay achievements. Little attention is paid to such information operations and to approaches for their detection. The paper deals with the detection of such information effects using exploratory data analysis, fractal and wavelet analysis, and factor structure analysis using neural networks, because an effective defense requires an understanding of the methods and techniques used to attack in the information space.


Problem formulation
Literature analysis. At present, «fake news spreading» is actively discussed by the professional research community as a deliberate and purposeful action. Fake news is a set of fictitious and specially fabricated news items whose main purpose is to undermine the reputation of a targeted institution, organization or person. Such news is destructive: it causes financial damage and is potentially harmful to the state, citizens, companies and organizations.
In the context of the ongoing «hybrid war» it is concluded that systemic misinformation and manipulation of the audience pose a threat to Ukrainian national interests and security and also discredit media outlets. Because fake news violates information protection requirements, it is indirectly malicious. Its influence peaked when the mainstream embraced social media and various organizations began to distribute it automatically. One example was a botnet that generated and posted misinformation claiming that the US was responsible for the creation and spread of COVID-19.
Fake news detection methods can be divided into two categories according to the source of distribution: research on media news content and research on social network content. Methods in the first category focus on the news content: text, headline and additional metadata (when available). Methods in the second category focus on social network features (user interaction analysis and user activity monitoring). Within these categories, several approaches to news research and fake news management can be distinguished: fake news monitoring, linguistic cue methods and text analysis, network analysis, and analytic approaches including machine learning methods.
The news content approach describes features connected with the meta information of a piece of news. Representative news content attributes are the source, headline, body text, and photo/video sources. The main tools for news content analysis are fact-checking and text mining techniques.
The social network content approach. Generally, three major aspects of the social media context are represented: users, generated posts, and networks. Social engagements represent the news proliferation process over time, which provides useful auxiliary information for inferring the veracity of news articles based on a graph representation of user connections, the news spreading model and its characteristics, user activity on posted news, etc. [1,2].

Goals and methods of studying
The subject of the study is information attacks and information influences. The aim is to develop an approach for identifying information effects based on a model built using wavelet and fractal analysis, for further implementation in specialized systems for the intellectual analysis of unstructured text flows on the Internet. This would enable defense experts or law enforcement to detect spikes in the information field and the dissemination of manipulative messages, and to react in a timely and effective manner. To study informational influences, a set of informational messages on news related to the granting of the Tomos of autocephaly to Ukraine was collected. The collected data was analyzed step by step with the following methods: Exploratory Data Analysis (using statistical processing of time series composed of the number of published news items sorted by emotional affiliation) [3, p.20

Step 1: Exploratory Data Analysis
Exploratory data analysis (EDA) reveals patterns and summarizes a dataset's main characteristics beyond modeling and hypothesis testing. The results of EDA are detected outliers and anomalies, uncovered underlying structures, and statistically analyzed informational time series. Thus, exploratory data analysis is used as a preliminary step before the hypothesis testing stage. In this case the main purpose of EDA is to identify the characteristics, patterns, and relationships in the data that are the subject of the analysis [1,2].
The subject of the time series analysis was the number of news items about the Tomos. The set of analyzed sources for data collection consisted of 20 television channels, 400 traditional media outlets, 14000 online media outlets, 15 radio stations, and the social networks Facebook, VKontakte, YouTube, Odnoklassniki and LiveJournal. The resulting time series characterize the total amount of news, classified by emotional color, from different sources in the Ukrainian and Russian media, as well as in the media of the separate regions of Donetsk and Lugansk regions (SRDLR). Emotional color classification was performed using sentiment analysis (also known as opinion mining or emotion AI), which refers to the use of natural language processing and text analysis to systematically identify and quantify affective states and subjective information.
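The construction of such time series can be illustrated with a short sketch. It assumes that each collected message has already been labeled with a publication date and a sentiment; the record layout, the label names and the toy records are hypothetical and are not the study's data.

```python
from collections import Counter
from datetime import date, timedelta

# Assumed label set; the paper speaks of positive, neutral and negative news.
LABELS = ("positive", "neutral", "negative")


def daily_counts(messages, start, end):
    """Count messages per day and per sentiment label, from start to end inclusive."""
    days = [start + timedelta(days=i) for i in range((end - start).days + 1)]
    counts = Counter((m["date"], m["sentiment"]) for m in messages)
    # A Counter returns 0 for missing keys, so days without news stay in the series.
    return {label: [counts[(day, label)] for day in days] for label in LABELS}


# Toy records for illustration only (hypothetical, not the collected corpus).
toy_messages = [
    {"date": date(2018, 10, 10), "sentiment": "neutral"},
    {"date": date(2018, 10, 11), "sentiment": "positive"},
    {"date": date(2018, 10, 11), "sentiment": "positive"},
    {"date": date(2018, 10, 11), "sentiment": "negative"},
    {"date": date(2018, 10, 13), "sentiment": "negative"},
]
toy_series = daily_counts(toy_messages, date(2018, 10, 10), date(2018, 10, 13))
```

Keeping zero-count days in the series matters for the later steps, because the statistical characteristics and the decompositions are computed over a regular daily grid.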
After exploratory data analysis was performed on the classified data, the following time series characteristics were identified:
• Mean
• Median
• Mode
• Range
• The first quartile or 25th percentile, which corresponds to the value that lies halfway between the median and the lowest value in the distribution (when it is sorted in ascending order)
• The third quartile or 75th percentile, which corresponds to the value that lies halfway between the median and the highest value in the distribution (when it is sorted in ascending order)
• Inter-Quartile Range
Here $\bar{x}$ denotes the mean and $x_i$ a time series item value. Outliers are values in the data that are abnormal, that is, values that differ significantly from the others. Hypothetically, outliers in the news time series characterize the interference of fake news in the social processes of dissemination and exchange of information.
The Inter-Quartile Range is the difference between the 75th and 25th percentiles, that is, the range of the sample after it is truncated to the interval between the first and third quartiles. Such a measure is less sensitive to outliers and potentially characterizes the real amplitude of fluctuations in the social process of dissemination and exchange of information.
Dispersion (variance) and standard deviation characterize the scattering of values around the center of the data; the standard deviation is the square root of the dispersion and is expressed in the same units as the data.
The coefficient of variation, the ratio of the standard deviation to the mean, gives an idea of the size of the deviation of the sample values from each other and makes it possible to compare variability between different time series.
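The characteristics listed above can be computed with the Python standard library, as in the following minimal sketch. The 1.5 × IQR fences used to flag outliers are an assumption (the usual box plot rule); the paper does not state which threshold was applied, and the sample series is invented for illustration.

```python
import statistics


def describe(series):
    """Descriptive statistics of a daily news-count series plus IQR-based outliers."""
    q1, _, q3 = statistics.quantiles(series, n=4, method="inclusive")
    iqr = q3 - q1
    mean = statistics.fmean(series)
    std = statistics.pstdev(series)  # population standard deviation
    # Assumption: box plot fences at 1.5 * IQR beyond the quartiles.
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return {
        "mean": mean,
        "median": statistics.median(series),
        "mode": statistics.mode(series),
        "range": max(series) - min(series),
        "q1": q1,
        "q3": q3,
        "iqr": iqr,
        "variance": statistics.pvariance(series),
        "std": std,
        "cv": std / mean if mean else float("nan"),
        "outliers": [(i, x) for i, x in enumerate(series) if x < low or x > high],
    }


# Invented daily counts: mostly quiet days and one burst of 40 messages.
sample = [0, 0, 0, 1, 2, 0, 3, 1, 0, 40, 2, 1]
summary = describe(sample)
```

In this toy series the burst day is the only value outside the fences, while the mode stays at 0, which mirrors the kind of picture described for the news counts.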
The results of the time series analysis of the news data set are presented in Tables 1, 2 and 3. As can be seen from Tables 1-3, the most frequent value of the number of news items per day is 0, as evidenced by the Mode of these time series. It can also be seen that the Inter-Quartile Range is approximately equal to the average number of messages per day. This makes it possible to make more accurate predictions and hypotheses about the source of outliers. These results show that SRDLR news is similar in structure to Russian news, both in its time series characteristics and in the ratio of news with a negative color, which is much higher than in Ukrainian news. In addition, in both the Russian and SRDLR time series, even with a greater concentration of positive news, the maximum number of negative news items per day is higher than that of positive ones.
A graphic representation of the emotional distribution analysis is given below.
This day is also characterized by an abnormally high amount of positive news, but a number of negative news items was recorded as well. The SRDLR outliers correspond to the dates October 11, 2018 (the Ecumenical Patriarch Bartholomew in Constantinople confirmed that Ukraine would be granted the Tomos, meaning that Ukraine's independent autocephalous church is recognized) and January 5, 2019. Both of these days are also characterized by an increase in positive news, whose number reached its maximum. In addition, the largest amount of negative news was recorded on October 11.

Step 2: Fractal Analysis
Since the information space is considered stochastic, its self-similarity is expressed in a stable structure in which such sections as sources, authors and subjects practically do not change their form. Applying fractal theory to the analysis of the information space makes it possible to investigate the patterns that form the basis of computer science. Because thematic information arrays are represented by information clusters, it is possible to perform a cluster analysis, independently identify new features of objects, and, as the self-similar structures of the information space develop as stochastic fractals, distribute objects into new groups [3, p.158].
Within the applied fractal analysis, the following indicators were calculated:
• the Hurst exponent;
• fractal dimension;
• correlation entropy;
• correlation dimension.
The Hurst exponent $H$ is a persistence measure that characterizes the tendency of a process to trend. A value $H > 1/2$ means that the direction of the process dynamics in the past is likely to persist in the future. If $H < 1/2$, the process is expected to change direction. $H = 1/2$ means process uncertainty [3, p.164]. The calculation of the Hurst exponent is also known as R/S analysis. To study the fractal characteristics of time series, the values of the Hurst exponent are determined from the ratio
$$\frac{R}{S} = (A N)^{H},$$
where
• $R$ is the range between the minimum and maximum value;
• $S$ is the standard deviation;
• $A$ is the scale constant;
• $N$ is the sample size;
• $H$ is the Hurst exponent.
Thus, the Hurst exponent is calculated as
$$H = \frac{\log(R/S)}{\log(A N)}.$$
The fractal dimension determines the degree of complexity of the fractal figure and is described by the ratio
$$D = \frac{\log M}{\log k},$$
where $M$ is the number of self-similar parts obtained when the figure is enlarged $k$ times.
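The R/S calculation can be sketched as follows. This is a minimal single-window estimate under stated assumptions: the range R is taken over the cumulative deviations from the mean (the usual rescaled-range convention), and the scale constant is set to 0.5, a commonly used value; neither choice is confirmed by the paper. More robust estimates fit a slope over many window sizes.

```python
import math


def hurst_rs(series, scale=0.5):
    """Single-window Hurst estimate H = log(R/S) / log(scale * N)."""
    n = len(series)
    if scale * n <= 1:
        raise ValueError("series too short for the chosen scale constant")
    mean = sum(series) / n

    # Cumulative deviations from the mean (assumed definition of the range R).
    cumulative, total = [], 0.0
    for x in series:
        total += x - mean
        cumulative.append(total)
    r = max(cumulative) - min(cumulative)

    # Population standard deviation S of the series.
    s = math.sqrt(sum((x - mean) ** 2 for x in series) / n)
    if r <= 0 or s <= 0:
        raise ValueError("R/S is undefined for a constant series")
    return math.log(r / s) / math.log(scale * n)


# A steadily growing series is persistent; a strictly alternating one is not.
trend_h = hurst_rs(list(range(1, 101)))
flip_h = hurst_rs([1, -1] * 50)
```

The two toy series fall on opposite sides of 1/2, which matches the interpretation of persistent and direction-changing processes given above.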
The correlation entropy index is needed to analyze how predictable the dynamics of the process are. The smaller the value of the correlation entropy, the more predictable the behavior of the process. In other words, correlation entropy is a quantitative indicator of the chaos of the social process. The correlation dimension shows the complexity of the process behavior.
Thus, the correlation dimension indicates the presence of similar values in the cluster system and the probability that a random value of the time series falls in a given region, and it also shows the number of parameters required to describe the process [6].
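Both indicators are commonly estimated from the correlation sum of a delay-embedded series (the Grassberger-Procaccia approach). The sketch below assumes that approach, a delay of one sample and the maximum norm; the paper does not say which estimator or parameters were used, and the radii in the example are arbitrary.

```python
import math


def embed(series, dim, delay=1):
    """Delay embedding: turn a scalar series into points of dimension dim."""
    count = len(series) - (dim - 1) * delay
    return [tuple(series[i + k * delay] for k in range(dim)) for i in range(count)]


def correlation_sum(points, radius):
    """Fraction of point pairs closer than radius (maximum norm)."""
    n, close = len(points), 0
    for i in range(n):
        for j in range(i + 1, n):
            if max(abs(a - b) for a, b in zip(points[i], points[j])) < radius:
                close += 1
    return 2.0 * close / (n * (n - 1))


def correlation_dimension(series, dim, r_small, r_large):
    """Slope of log C(r) against log r between two radii."""
    points = embed(series, dim)
    c_small = correlation_sum(points, r_small)
    c_large = correlation_sum(points, r_large)
    if c_small <= 0 or c_large <= 0:
        raise ValueError("radii too small: no point pairs found")
    return (math.log(c_large) - math.log(c_small)) / (math.log(r_large) - math.log(r_small))


def correlation_entropy(series, dim, radius):
    """Order-2 entropy estimate: log of C_dim(r) / C_dim+1(r) per time step."""
    c_m = correlation_sum(embed(series, dim), radius)
    c_next = correlation_sum(embed(series, dim + 1), radius)
    if c_m <= 0 or c_next <= 0:
        raise ValueError("radius too small: no point pairs found")
    return math.log(c_m / c_next)


# A perfectly regular ramp: one-dimensional and fully predictable.
ramp = list(range(200))
ramp_dimension = correlation_dimension(ramp, dim=2, r_small=5, r_large=20)
ramp_entropy = correlation_entropy(ramp, dim=2, radius=5)
```

For the regular ramp the dimension estimate is close to 1 and the entropy estimate is close to 0, in line with the reading that a near-zero correlation entropy marks a predictable process.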
The result of the fractal analysis of the news data set is presented below.
The presented results show that all processes are persistent, except for neutral SRDLR news, whose trend is difficult to predict. As for Ukrainian news, the most uncertain is the process of news with a negative emotional color. The value of correlation entropy for positive and negative news is approximately equal. For Russian news the result is the same, but the values of correlation entropy are much smaller. For SRDLR news, the correlation entropy for neutral news is 0. This means that the process that characterizes neutral news is completely determined and predictable.
A graphic interpretation of the obtained results is given below (Ukrainian news in Fig. 13 and Fig. 14, Russian news in Fig. 15 and Fig. 16, SRDLR news in Fig. 17 and Fig. 18).

Step 3: Wavelet Analysis
The main idea of the wavelet transform is to divide a non-stationary time series into separate intervals (so-called observation windows) and to compute on each of them a scalar (inner) product that characterizes the degree of closeness between the data and the wavelet pattern, for different wavelet shifts at different scales. The wavelet transform generates a set of coefficients as functions of two variables, time and frequency, which therefore form a surface in three-dimensional space. These coefficients show to what extent the behavior of the process at a given point is similar to the wavelet at this scale. These operations make it possible to analyze data at different scales and to accurately locate their characteristic features in time [4, p.113]. Wavelet technology makes it possible to detect single and irregular «bursts», that is, sharp changes in the values of quantitative indicators, in this particular case in the number of thematic publications on the Internet.
The news posting process can be described as oscillations that are in some sense regular. The fluctuation amplitude indicates the amount of news published at a given point in time, taking into account a characteristic feature of news: first a rapid increase and then a slower decline. Since at rest the fluctuations of the social process are regular, the interference of fake news in this process creates noticeable jumps and disrupts the structure of the fluctuations. A study is also possible in which news that is not fake is treated as noise. In this case, the fake signal can be extracted from the general news stream by separating the signal from the noise. Whereas fractal analysis investigates the signal for the property of self-similarity, wavelet analysis shows the inheritance of cycles in the time series and makes it possible to investigate these cycles from their beginning to their end [3, p.179], [4, p.113].
The study used first- and fifth-order Daubechies wavelets, which have filter lengths of 2 and 10, respectively. Daubechies wavelets are a family of orthogonal wavelets with compact support that are calculated iteratively. The main purpose of using Daubechies wavelets is to highlight media data patterns in Russian, Ukrainian and SRDLR news. The task is to select similar cycles in all time series. Such a result is very useful for the further construction of a model of fake news and makes it possible to refute or confirm some hypotheses [7].
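A multi-level decomposition with the first-order Daubechies wavelet, which coincides with the Haar wavelet, can be sketched without external libraries. The code below is an illustration under stated assumptions, not the study's implementation: odd-length signals are padded by repeating the last value, and the toy series is invented. The fifth-order wavelet needs the longer 10-tap filter; libraries such as PyWavelets provide it (for example `pywt.wavedec` with the wavelet name "db5").

```python
import math

SQRT2 = math.sqrt(2.0)


def haar_step(signal):
    """One level of the orthonormal Haar (Daubechies-1) transform."""
    pairs = range(0, len(signal) - 1, 2)
    approx = [(signal[i] + signal[i + 1]) / SQRT2 for i in pairs]
    detail = [(signal[i] - signal[i + 1]) / SQRT2 for i in pairs]
    return approx, detail


def haar_decompose(signal, levels):
    """Return the final approximation and the detail coefficients of each level."""
    approx, details = list(signal), []
    for _ in range(levels):
        if len(approx) < 2:
            break  # nothing left to split
        if len(approx) % 2:
            approx.append(approx[-1])  # assumption: pad odd lengths with the last value
        approx, detail = haar_step(approx)
        details.append(detail)
    return approx, details


# Invented daily counts with a single burst on the seventh day.
toy = [4, 4, 4, 4, 4, 4, 20, 4]
final_approx, detail_levels = haar_decompose(toy, levels=3)

# The largest level-1 detail coefficient marks the pair of days with the burst.
burst_pair = max(range(len(detail_levels[0])), key=lambda i: abs(detail_levels[0][i]))
```

In the toy example all level-1 details are zero except the one covering the burst, which is the behavior that makes detail coefficients useful for locating sharp changes in time.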
The following result is a 10-level decomposition of the time series data by first- and fifth-order Daubechies wavelets. The 10th decomposition level is the maximum possible level of representation of the series; the 11th level completely repeats the 1st. At every level, the decomposed signal is presented as the details of the time series and its approximation. A graphic interpretation of the obtained results is given below (Ukrainian news in Fig. 19 and Fig. 20, Russian news in Fig. 21 and Fig. 22, SRDLR news in Fig. 23 and Fig. 24).
The obtained decompositions show that the overall dynamics of Ukrainian news almost completely repeat the dynamics of positive news at each decomposition level (as positive news makes up about 70 percent). This means that sharp changes in the amount of negative news are atypical for these time series and can be considered a result of information influence.
The general dynamics of Russian news are a composition of positive and negative news. The situation directly cor-
The dynamics of all news in the SRDLR region almost completely coincide with the dynamics of negative news. This may be partly due to the fact that the share of negative news is more than 57 percent. At the same time, at each decomposition level there are certain patterns inherent in the news of the Russian region. Therefore, the final conclusion is that much of the news in the region, both negative and positive, is characterized by outliers and can be considered information influences.

Step 4: Factor structure analysis
A factor structure is a set of factors that affect the formation of a system and are components of the system or process. Conceptually, the structure of social systems is very similar to the structure of neural networks: both consist of a set of interconnected elements divided into certain hierarchical levels that interact with each other through direct and feedback connections. Another similarity is the nonlinear relationship between the system elements, as well as the simultaneous parallel processing of information. Thus, the latent structure, e.g. the number of neurons in the hidden layer, can be interpreted as the number of factors acting in the social process or system.
Item factor analysis (IFA) is a popular method for summarizing a number of categorical item responses using a smaller number of continuous latent variables. It is an indispensable tool for item analysis as well as test construction and scoring in psychological and educational measurement research [5,8]. To discover new patterns from a huge database by knowing what factors affect the system and what information should be extracted is an essential step in data mining. The more precisely a factor analysis is done, the better the performance of the clustering and the classification in pattern recognition becomes. In factor analysis, the selected factors should be significant and complete. The necessary condition of significance requires the selected factors to be independent and important while the sufficient condition of completeness would provide complete information [5].

Experiment details
The experiment used a feedforward neural network (FNN) in its MLPRegressor implementation from the sklearn library for the Python programming language.
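One way such an experiment could be set up is sketched below: networks with different numbers of hidden neurons are fitted to the same series and compared on held-out data, and the best-scoring size is read as the number of acting factors. The lagged-window inputs, the 80/20 split, the `tanh` activation, the `lbfgs` solver and the synthetic series are all assumptions made for illustration; the paper does not describe its inputs or hyperparameters.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor


def lagged_windows(series, lags):
    """Build (X, y): each row of X holds `lags` past values, y is the next value."""
    x = np.array([series[i:i + lags] for i in range(len(series) - lags)])
    y = np.array(series[lags:])
    return x, y


def score_hidden_sizes(series, lags=7, sizes=range(1, 6), seed=0):
    """Held-out R^2 for single-hidden-layer networks of different sizes."""
    x, y = lagged_windows(series, lags)
    split = int(0.8 * len(x))  # assumption: chronological 80/20 split
    scores = {}
    for k in sizes:
        model = MLPRegressor(
            hidden_layer_sizes=(k,),  # k hidden neurons = k candidate factors
            activation="tanh",
            solver="lbfgs",
            max_iter=2000,
            random_state=seed,
        )
        model.fit(x[:split], y[:split])
        scores[k] = model.score(x[split:], y[split:])
    return scores


# Synthetic stand-in for a news-count series (not the study's data).
demo_series = np.sin(0.3 * np.arange(120)).tolist()
demo_scores = score_hidden_sizes(demo_series)
best_size = max(demo_scores, key=demo_scores.get)
```

Selecting the size by held-out score rather than by training error is a design choice that guards against simply preferring the largest network.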
A feedforward neural network is a class of networks without feedback. Signals in such networks propagate in one direction only, from the input neurons to the output through the hidden layer. FNNs are mainly multilevel logistic regression classifiers and are also called Multilayer Perceptrons (MLP). In this model a hidden layer is required. The use of several such layers makes sense only when using nonlinear activation functions.
two information influence spreading models with the same character and properties but different behavior. The first one is aggressive and aimed at spreading influences in order to achieve such interests; the second is passive and responds to attacks in order not to suffer damage.
5) It can be concluded that 3 news bursts in the Russian region were symmetrically reflected in the Ukrainian news sources. Hypothetically, it can be assumed that bursts are caused by information influences or fake news spreading.

Conclusions
An iterative approach to building the information influence model using EDA, fractal and wavelet analysis was used to obtain the following results:
1) Exploratory data analysis identified the statistical characteristics of the data and described its outliers. The collected results are interpreted with box plots.
2) Fractal analysis provided the values of the Hurst exponent and correlation entropy. The social processes were studied for both persistence and uncertainty.
3) Wavelet analysis was performed in order to highlight patterns and decompose the news process at different levels.
4) Under the conditions of the constructed model, the optimum number of factors was found by identifying the structure of the influencing factors in the data.
The paper presents a method based on wavelet and fractal analysis that can be implemented within specialized media monitoring systems for the intellectual analysis of unstructured text flows. This topic and developments in this area are extremely relevant for Ukraine today because of actively developing sources that retransmit Russia's agenda and work to destabilize the situation in Ukraine. Their messages are disseminated by provocateurs who publish posts and comments created to spread Russian indoctrination in social networks, as well as comments under news from online news outlets, etc.
Proactive use of such methods in the sphere of security and national security would help to automate the detection of fake news distribution in its initial phases, allowing experts of security and defense structures to take timely measures and prevent the spread of manipulative messages.