
The Graphic-Analytical Method
The principles for selecting the lexical material for research are considered separately.
The method used in these studies, which the author calls graphic-analytical, is both a means and a technique for studying the relationships of closely related languages at an early stage of their development, as conditioned by the peculiarities of the natural environment. It was first described in 1987 in the article "The Determination of Habitats of Ancient Slavs by a Graphic-Analytical Method" in the journal "Proceedings of the Academy of Sciences of the USSR. The Series of Literature and Language", Volume LXIV:1, Moscow (in Russian). The method is a practical implementation of the theoretical reasoning of the Ukrainian philologist Illarion Sventsitskiy, who, defending the need to apply mathematics in the humanities, wrote:
"It is important only that all sorts of human and world relationships are easiest to denote by numbers, volume, and position in space and time, so they can easily fit into the framework of mathematical symbols" (SWIENCYCKYI I., 1927, 53).
The method does indeed determine the relative position of related languages in space at a certain time. It is based on a type of graph that may still await description in mathematics (the author has not found it in graph theory).
The investigation of prehistoric ethnogenic processes required the analysis of a large volume of lexical data selected from various dictionaries. When working with dictionaries of different language families it is not necessary to have a perfect command of all the analysed languages, but it is indispensable to know their phonetic peculiarities and the rules of their sound changes, as required by comparative-historical linguistics (MEILLET A, 1938; MEILLET A., 1954; FORTUNATOV F.F.,1956; GAMKRELIDZE T.V., IVANOV V.V., 1984). The work of H. Krahe (KRAHE HANS, 1966) was used in selecting and systematizing the words of the Indo-European languages. The phonetic rules of the Finno-Ugric languages were drawn from the book of the Russian linguists V.I. Lytkin and E.S. Gulayev (LYTKIN V. I., GULAYEV E.S., 1970), and the phonetic rules of the Turkic languages from Baskakov's classification (BASKAKOV N.A., 1960).
The analysis was performed at the lexical level by comparing lexical units in two respects, phonetic and semantic, i.e. by their form and their meaning. Phonetic correspondences without semantic agreement were excluded from the study. Semantic agreement was evaluated on a scale from synonymy, through greater or lesser semantic similarity, to antonymy, which can sometimes follow from the nature of the concept itself (a classic example: the initial meaning "side" can develop into both "beginning" and "end").
The Nostratic, Indo-European, Finno-Ugric, Turkic, Iranian, Germanic, Slavic, Mongolic and Manchu-Tungus languages were studied with this new graphic-analytical method. Two types of table-dictionaries were used for each group of languages. First, a table-dictionary of the first type was compiled for the language group: the list of meanings was placed in the far left column, and all available synonyms for each meaning were placed in the following columns, one column per analysed language. The resulting nests of synonyms were then analysed for phonetic similarity, which allowed us to select the phonosemantic terms; further words were added to the list after analysing synonyms with a similar sense. The selected phonosemantic terms made up the table-dictionary of the second type, in which the identifiers of the phonosemantic terms were placed in the far left column and the available matches from the particular languages in the remaining columns. Each such row forms a phonosemantic set. The etymological table-dictionaries compiled in this way for different language families and groups, on which the calculations were based, can be found on the author's site "Alternative historical linguistics".
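A table-dictionary of the second type lends itself to counting how many phonosemantic sets any two languages share. The following is only a minimal sketch of such a count: the set identifiers, the language labels A to D and the function name `count_common` are invented for illustration and do not come from the author's tables.

```python
from collections import Counter
from itertools import combinations

# Toy table-dictionary of the second type: each phonosemantic set identifier
# maps to the languages in which a match was found. All entries are invented.
table = {
    "set_01": {"A", "B", "C"},
    "set_02": {"A", "B"},
    "set_03": {"B", "C"},
    "set_04": {"A", "C", "D"},
}


def count_common(table):
    """Count, for every language pair, the sets in which both languages occur."""
    counts = Counter()
    for languages in table.values():
        # Sorting makes each pair appear in one fixed order, e.g. ("A", "B").
        for pair in combinations(sorted(languages), 2):
            counts[pair] += 1
    return counts


common = count_common(table)
for pair, n in sorted(common.items()):
    print(pair, n)
```

A pair that shares no set is simply absent from the result, and a `Counter` reports it as zero.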
The data in these tables made it possible to calculate the number of mutual words in each language pair, which is needed to construct graphic models of the language relationships within a language family. These graphic models are graphs of a specific sort, possibly not yet described in mathematics (the author has not found it anywhere). Such a graph can be characterized as a "weighted graph" in which every node, without exception, is connected to every other node, and in which not only the connection itself matters but also the distance between all the nodes. Each node of the graph is not a single point but an aggregate of points, and every aggregate corresponds to a particular language in the relationship model. Each point of an aggregate is the end of a segment whose length is inversely proportional to the number of mutual words in the pair of languages corresponding to the two aggregates that the segment connects. When the number of mutual words in the language pairs is known, the set of segments needed to build the graphic model can be determined. The very possibility of constructing the graph indicates that the data form a certain system, but one may still doubt whether such a construction could arise by chance. Let us estimate this probability.
Take a graph A with n mutually connected nodes, so that each node has (n − 1) edges. As is known from mathematics, two coordinates suffice to place a point on a plane in any coordinate system. For our graph, many more pairs of coordinates can be determined by combining the edges at a node two at a time (provided the lengths of the edges are known). The number of pairs C is given by the known formula for combinations of (n − 1) elements taken two at a time:
C = (n − 1)(n − 2) / 2
For example, with n = 6 nodes the number of coordinate pairs is C = 10; with n = 10, C rises to 36; and with n = 12, C = 55. Thus for n = 6 the place of each node can be determined in ten different ways. In graph A, where every node is a single point, all these determinations made with edges of known length bring the node to the same point every time. A real situation, e.g. a system of cognate languages, is better described by a graph B in which each node is not a single point but an aggregate of points filling a small area, and these areas do not overlap. If n = 6 analysed objects fill an area S = 1, each object fills an area s = 1/6. The probability that a point falls by chance within its own area is then 1/6. With 6 objects each node can be placed in ten different ways (see above), so the probability that the point falls within the same area in all ten cases is 1/6^10 = 1 : 60 466 176. Since there are six objects, this probability has to be raised to the sixth power, which gives 1/6^60, a fraction whose denominator has 47 digits. With ten objects the corresponding probability is 1/10^360, i.e. a denominator with 360 zeros. This shows that accidental construction of the graphic model is practically impossible.
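The arithmetic of this estimate can be checked with a short sketch. It assumes one particular reading of the argument: each of the C placements of a node hits its own cell of area 1/n with probability 1/n, and this must hold for all n nodes independently, giving (1/n)^(C·n). The function names are hypothetical, and the result is reported as a base-10 logarithm because the numbers are too small to print directly.

```python
from math import comb, log10


def coordinate_pairs(n):
    """Number of ways to choose two of the n - 1 edges meeting at a node."""
    return comb(n - 1, 2)


def log10_chance(n):
    """log10 of (1/n) ** (C * n), the chance probability under the
    assumption that all n nodes land in their own cell in all C placements."""
    c = coordinate_pairs(n)
    return -c * n * log10(n)


for n in (6, 10, 12):
    print(n, coordinate_pairs(n), round(log10_chance(n), 1))
```

Under a different reading of the argument the exponent would change, but the order of magnitude stays far below any practically relevant probability.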
The construction of the graphic models can be demonstrated on the example of the Nostratic languages. This term is used for the phylum of six large language families of the Old World: Altaic, Uralic, Dravidian, Indo-European, Kartvelian and Semitic-Hamitic (Hamito-Semitic, or Afro-Asiatic), which seem to have a common parent language. The data for the analysis were taken from the work of the Ukrainian linguist Illich-Svitych (ILLICH-SVITYCH V.M., 1971). He analysed and systematized similarities in the word structure, grammar and vocabulary of the Nostratic languages and gave a large number of such matches in his book. He assumed that these similarities can be explained only by a theory postulating a genetic relationship among these languages, i.e. that they are monophyletic and belong to one superfamily (phylum) of Nostratic languages.
Some of the results of Illich-Svitych's study were taken from the tables in his book (morphological features and a vocabulary of 147 units), and a further 286 matches were found in the running text. After comparison with the research materials of the Russian scholar Andreyev (ANDREYEV N.D., 1986), which are consistent with the results of Illich-Svitych, the data were supplemented with 27 words from the Uralic languages and 8 words from the Altaic languages. In total, 433 features were determined. Thirty-four of them were common to the whole phylum; the rest comprised 255 units from the Altaic, 255 from the Uralic, 253 from the Indo-European, 240 from the Semitic-Hamitic, 189 from the Dravidian and 139 from the Kartvelian languages. Then the number of mutual features in each language pair was calculated. The results are given in table 1.
Table 1. Number of mutual features between language families.

| Language pair | Mutual features |
|---|---|
| Altaic – Uralic | 167 |
| Altaic – Indo-European | 153 |
| Altaic – Semitic-Hamitic | 149 |
| Altaic – Dravidian | 109 |
| Altaic – Kartvelian | 84 |
| Uralic – Indo-European | 151 |
| Uralic – Semitic-Hamitic | 136 |
| Uralic – Dravidian | 134 |
| Uralic – Kartvelian | 66 |
| Indo-European – Semitic-Hamitic | 147 |
| Indo-European – Dravidian | 108 |
| Indo-European – Kartvelian | 70 |
| Semitic-Hamitic – Dravidian | 110 |
| Semitic-Hamitic – Kartvelian | 86 |
| Dravidian – Kartvelian | 54 |


A detailed description, set in fine print, now follows of the principle of constructing a graphic model of the kinship of closely related languages from selected lexical-statistical data, using the Nostratic languages as an example. This description can be skipped without detriment to the question under consideration.
We cannot yet speak of a definite regularity in the analysed data, but one can see that the Altaic, Uralic, Semitic-Hamitic and Indo-European languages generally share the largest numbers of mutual words. Using the graphic-analytical method, let us try to build a graphic model of the Nostratic relationship in order to show that a regularity exists in these data. First, the distances between the centres of the habitats of the speakers of the individual Nostratic languages at the time these languages arose are calculated with the formula L = K/N, where L is the distance, N is the number of mutual words (features) in a given pair, and K is a scale factor (K > 0). The choice of the scale factor is determined by the size of the plane on which the model is built; K = 1000 suits our data. The resulting distances in cm between the areas of the individual languages are presented in table 2.
Table 2. Distances between the centres of the language family areas in the diagram, cm.

| Language pair | Distance, cm |
|---|---|
| Altaic – Uralic | 6.0 |
| Altaic – Indo-European | 6.5 |
| Altaic – Semitic-Hamitic | 6.7 |
| Altaic – Dravidian | 9.2 |
| Altaic – Kartvelian | 11.3 |
| Uralic – Indo-European | 6.6 |
| Uralic – Semitic-Hamitic | 7.4 |
| Uralic – Dravidian | 7.5 |
| Uralic – Kartvelian | 15.2 |
| Indo-European – Semitic-Hamitic | 6.8 |
| Indo-European – Dravidian | 9.3 |
| Indo-European – Kartvelian | 14.3 |
| Semitic-Hamitic – Dravidian | 9.1 |
| Semitic-Hamitic – Kartvelian | 11.6 |
| Dravidian – Kartvelian | 18.5 |
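The conversion from numbers of mutual features to distances with L = K/N can be sketched in a few lines. The counts below are copied from table 1 for the pairs used in the first construction steps; the dictionary layout and the function name `distance_cm` are assumptions made for this illustration.

```python
K = 1000  # scale factor chosen in the text

# Numbers of mutual features for selected pairs, copied from table 1.
mutual = {
    ("Altaic", "Uralic"): 167,
    ("Altaic", "Semitic-Hamitic"): 149,
    ("Uralic", "Semitic-Hamitic"): 136,
    ("Altaic", "Indo-European"): 153,
    ("Uralic", "Indo-European"): 151,
    ("Uralic", "Dravidian"): 134,
    ("Semitic-Hamitic", "Dravidian"): 110,
    ("Dravidian", "Kartvelian"): 54,
}


def distance_cm(n_mutual, k=K):
    """Distance in cm, inversely proportional to the number of mutual features."""
    return round(k / n_mutual, 1)


distances = {pair: distance_cm(n) for pair, n in mutual.items()}
for pair, d in distances.items():
    print(" – ".join(pair), d)
```

Because the relation is an inverse proportion, the pairs with the most mutual features end up closest together, and a different K only rescales the whole diagram.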


The construction of the model requires several iterations. First, one point for each language is determined from two coordinates. These six points mark the estimated places of the languages; their exact places are found in the subsequent iterations. In principle, one can start building the model from any language, but since it is not known in advance in which direction the model will extend, it may run beyond the limits of the plane. It is therefore better to start with the language pair that has the most mutual features. Here this pair is Altaic and Uralic. So, first, the segment AB with a length of 6 cm, corresponding to the number of mutual words in these two languages, is placed close to the centre of the plane. The ends of this segment determine the points for the Altaic and Uralic languages (see figure 2).
The points for the Indo-European and Semitic-Hamitic languages are built on this segment as a base. We start with the point for Semitic-Hamitic because it has more mutual features with Kartvelian and Dravidian. According to the number of mutual features, the Semitic-Hamitic point must lie 6.7 cm from the Altaic point and 7.4 cm from the Uralic point. Two arcs with these radii are drawn with a pair of compasses, and the Semitic-Hamitic point lies at their intersection. There are two such points, to the left and to the right of the base AB. The choice between them determines the final appearance of the graphic model, which can have two mirror-image variants. We select the point closer to the centre and obtain three points, A, B and C, and then look for point D for the Indo-European languages. It is also built on the base AB and has to lie 6.5 cm from point A and 6.6 cm from point B. Two corresponding arcs are drawn with the compasses on the side opposite to point C, and point D is found. (Point D cannot lie close to point C, because the Indo-European and Semitic-Hamitic languages would then have to share considerably more mutual features than they actually do.) The point E for the Dravidian languages is built on the base BC because the Semitic-Hamitic and Uralic languages have the largest numbers of mutual features with Dravidian. This point is placed 7.5 cm from the Uralic point and 9.1 cm from the Semitic-Hamitic point, on the side away from the centre of the model; otherwise it would lie next to the Altaic point, which is inconsistent with the number of mutual features between them. The point F for the Kartvelian languages is placed in the same way, but on the base AC. With the first iteration finished, the scheme of the graphic model for the Nostratic languages can be determined.
The areas of these languages must lie somewhere close to the points A, B, C, D, E and F. The positions of the language areas are then corrected by building further points on other bases. Naturally, the new points do not coincide with one another.
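The compass step described above amounts to intersecting two circles. The following is a minimal sketch of that step only, not of the whole procedure: the coordinates chosen for A and B, the rule of taking the candidate above or below the base, and the function name `circle_intersections` are assumptions made for illustration.

```python
from math import hypot, sqrt


def circle_intersections(p0, r0, p1, r1):
    """Return the points lying r0 from p0 and r1 from p1 (0, 1 or 2 points)."""
    (x0, y0), (x1, y1) = p0, p1
    d = hypot(x1 - x0, y1 - y0)
    if d == 0 or d > r0 + r1 or d < abs(r0 - r1):
        return []  # the arcs do not meet, or the centres coincide
    a = (r0 ** 2 - r1 ** 2 + d ** 2) / (2 * d)  # from p0 to the foot of the chord
    h = sqrt(max(r0 ** 2 - a ** 2, 0.0))         # half-length of the chord
    xm = x0 + a * (x1 - x0) / d
    ym = y0 + a * (y1 - y0) / d
    if h == 0:
        return [(xm, ym)]  # the arcs touch in a single point
    dx = h * (y1 - y0) / d
    dy = h * (x1 - x0) / d
    return [(xm + dx, ym - dy), (xm - dx, ym + dy)]


A = (0.0, 0.0)  # Altaic, placed at the origin (arbitrary choice)
B = (6.0, 0.0)  # Uralic, 6 cm from A

# Semitic-Hamitic: 6.7 cm from A and 7.4 cm from B, two mirror-image candidates.
c_candidates = circle_intersections(A, 6.7, B, 7.4)
C = max(c_candidates, key=lambda p: p[1])  # keep the candidate above AB

# Indo-European: 6.5 cm from A and 6.6 cm from B, on the side opposite to C.
D = min(circle_intersections(A, 6.5, B, 6.6), key=lambda p: p[1])

print(C, D)
```

The two candidates returned for each point are the mirror-image variants mentioned in the text; which one is kept fixes the orientation of the whole model.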
Fig. 2. The first and the second iterations during the construction of the graphic model of the Nostratic relationships.
The overall configuration of the aggregate of points for each language suggests the direction in which the areas have to be moved so that the points form the most compact graph. Two or three such iterations yield the definitive graphic model of the language relationship. In our case the final model of the Nostratic relationship is presented in figure 3. The figure has fractal characteristics and resembles the Sierpinski triangle.
Fig. 3. The model of relationship of Nostratic languages.
The construction of a graphic model of kinship from lexical-statistical data could well be automated. This requires a computer program and is a task for applied mathematics, where building mathematical models of systems is common practice. Automatically built models would inspire greater confidence, but so far no applied mathematician has undertaken such work. Evidently it is not an easy task.
The use of the graphic-analytical method eventually led to the discovery of the phenomenon of ethno-producing areas. Their existence is a kind of empirical generalization, which, according to Vernadsky, "does not differ from the scientifically established fact" (VERNADSKY V.I. 2004, § 15).
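As a rough idea of what such a program might look like, the sketch below places one point per language family by ordinary least-squares fitting of the distances L = 1000/N. This is a swapped-in technique (a simple form of multidimensional scaling by gradient descent), not the compass construction described above, and it uses a single point per language instead of an aggregate of points. The counts are those of table 1; `fit_layout`, `stress`, the step size and the number of steps are hypothetical choices.

```python
import random
from math import hypot

names = ["Altaic", "Uralic", "Indo-European",
         "Semitic-Hamitic", "Dravidian", "Kartvelian"]

# Numbers of mutual features from table 1, in the pair order of itertools-style
# combinations of `names`.
counts = [167, 153, 149, 109, 84, 151, 136, 134, 66, 147, 108, 70, 110, 86, 54]
pairs = [(a, b) for i, a in enumerate(names) for b in names[i + 1:]]
target = {pair: 1000 / n for pair, n in zip(pairs, counts)}


def stress(pos, target):
    """Sum of squared differences between layout distances and target distances."""
    return sum((hypot(pos[a][0] - pos[b][0], pos[a][1] - pos[b][1]) - want) ** 2
               for (a, b), want in target.items())


def fit_layout(names, target, steps=2000, lr=0.01, seed=0):
    """Move randomly placed points downhill on the stress function."""
    rng = random.Random(seed)
    pos = {n: [rng.uniform(-5, 5), rng.uniform(-5, 5)] for n in names}
    for _ in range(steps):
        grad = {n: [0.0, 0.0] for n in names}
        for (a, b), want in target.items():
            dx = pos[a][0] - pos[b][0]
            dy = pos[a][1] - pos[b][1]
            dist = hypot(dx, dy) or 1e-9  # avoid division by zero
            g = 2 * (dist - want) / dist
            grad[a][0] += g * dx
            grad[a][1] += g * dy
            grad[b][0] -= g * dx
            grad[b][1] -= g * dy
        for n in names:
            pos[n][0] -= lr * grad[n][0]
            pos[n][1] -= lr * grad[n][1]
    return pos


layout = fit_layout(names, target)
print(round(stress(layout, target), 2))
for name in names:
    print(name, [round(v, 1) for v in layout[name]])
```

The remaining stress indicates how well the fifteen distances fit into a plane at all; a layout found this way is determined only up to rotation and mirror reflection.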
Quantitative data on the vocabulary studied with the graphic-analytical method over forty years.

| Language families and groups | Number of languages | Number of isoglosses | Total number of words |
|---|---|---|---|
| Sino-Tibetan | 7 | 2,775 | 9,700 |
| Tungus-Manchu | 11 | 2,234 | 10,200 |
| Mongolic | 8 | 2,250 | 8,500 |
| Nostratic | 6 | 433 | 2,600 |
| Abkhaz-Adyghe | 5 | 1,800 | 4,500 |
| Nakh-Dagestanian | 27 | 1,900 | 24,000 |
| Indo-European | 14 | 2,554 | 12,381 |
| Finno-Ugric | 12 | 1,913 | 9,584 |
| Turkic | 13 | 2,558 | 19,670 |
| Germanic | 6 | 2,630 | 11,065 |
| Iranian | 11 | 1,773 | 8,249 |
| Slavic | 10 | 3,200 | 12,000 |
| Total | 130 | ≈ 26,000 | ≈ 135,000 |
