The Graphic-Analytical Method.

The principles for selecting the lexical material for research are discussed separately.

The method used in these studies, which the author calls graphoanalytical, is both a means and a technique for studying the relationships of closely related languages at an early stage of their development, as conditioned by the peculiarities of the natural environment. It was first described in 1987 in the article "The Determination of Habitats of Ancient Slavs by a Graphic-Analytical Method" in the journal "Proceedings of the Academy of Sciences of the USSR. The Series of Literature and Language. Volume LXIV:1, Moscow" (in Russian). The method is a practical implementation of the theoretical reasoning of the Ukrainian philologist Illarion Sventsitskiy, who, defending the need to apply mathematics in the humanities, wrote:

It is important only that all sorts of human and world relationships are easiest to denote by numbers, volume, and position in space and time, so they can easily fit into the framework of mathematical symbols (SWIENCYCKYI I., 1927, 53)

The method does indeed determine the relative position of related languages in space at a certain time. It is based on a type of graph that may still await its description in mathematics (the author has not found it in graph theory).

The investigation of prehistoric ethnogenetic processes required the analysis of a large volume of lexical data selected from various dictionaries. When working with dictionaries of different language families it is not necessary to have a perfect command of all the languages analysed, but it is indispensable to know their phonetic peculiarities and the rules of their changes, as required by comparative-historical linguistics (MEILLET A., 1938; MEILLET A., 1954; FORTUNATOV F.F., 1956; GAMKRELIDZE T.V., IVANOV V.V., 1984). The work of H. Krahe (KRAHE HANS, 1966) was used in selecting and systematizing the words of the Indo-European languages. The phonetic rules of the Finno-Ugric languages were drawn from the book by the Russian linguists V.I. Lytkin and E.S. Gulayev (LYTKIN V.I., GULAYEV E.S., 1970), and the phonetic rules of the Turkic languages from Baskakov's classification (BASKAKOV N.A., 1960).

The analysis was performed at the lexical level, comparing lexical units in two respects, phonetic and semantic, i.e. by their form and their meaning. Phonetic correspondences without semantic agreement were excluded from the study. Semantic agreement was evaluated on a scale running from synonymy, through greater or lesser semantic similarity, to antonymy, which can sometimes follow from the nature of the concept itself (a classic example: the initial meaning of the word "side" can shift to both "beginning" and "end").

The Nostratic, Indo-European, Finno-Ugric, Turkic, Iranian, Germanic, Slavic, Mongolic and Manchu-Tungus languages were studied with this new graphic-analytical method. Two types of table-dictionaries were used for each group of languages. First, a table-dictionary of the first type was compiled for the language group: the list of semantic concepts was placed in the far left column, and all available synonyms for each concept were placed in the following columns, one column per analysed language. The resulting synonymic nests were then analysed for phonetic similarity, which made it possible to select the phono-semantic terms; further words were added to the list after analysing synonyms with similar meanings. The selected phono-semantic terms made up the table-dictionary of the second type, in which the identifiers of the phono-semantic terms were placed in the far left column and the available matches from the particular languages in the remaining columns. Together these matches form a phono-semantic set. The etymological table-dictionaries compiled in this way for different language families and groups, on which the calculations were based, can be found on my site "Alternative historical linguistics".
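A table-dictionary of the second type lends itself to a simple data representation from which the number of sets shared by each pair of languages can be counted. The following is only a minimal sketch: the set identifiers, the language labels and the function name `mutual_counts` are hypothetical placeholders, not data or terminology from the study.

```python
from itertools import combinations

# Hypothetical toy table-dictionary of the second type: each phono-semantic
# set is mapped to the languages in which a match was found.
# Identifiers and language labels are placeholders, not data from the study.
TABLE = {
    "set_001": {"lang_A", "lang_B", "lang_C"},
    "set_002": {"lang_A", "lang_C"},
    "set_003": {"lang_B", "lang_C", "lang_D"},
}

def mutual_counts(table):
    """Count, for every pair of languages, the sets in which both appear."""
    counts = {}
    for languages in table.values():
        # sorted() gives each pair a stable (alphabetical) key order
        for pair in combinations(sorted(languages), 2):
            counts[pair] = counts.get(pair, 0) + 1
    return counts

counts = mutual_counts(TABLE)
for pair, n in sorted(counts.items()):
    print(pair, n)
```

Pairs that never occur together in a set are simply absent from the result, which corresponds to zero mutual words.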

The data in these tables made it possible to calculate the number of mutual words in each pair of languages, which is needed to construct graphic models of the relationships between languages within the same family. These graphic models are graphs of a specific sort, possibly not yet described in mathematics (the author has not found a description anywhere). Such a graph can be characterized as a "weighted graph" in which every node, without exception, is connected to every other node, and in which not only the connections themselves but also the distances between all the nodes have to be taken into account. Each node of the graph is not just a point but an aggregate of points, and every aggregate corresponds to a particular language in the relationship model. Each point of an aggregate is the end of a segment whose length is inversely proportional to the number of mutual words in the pair of languages corresponding to the two aggregates that the segment connects. Once the number of mutual words in each language pair is known, the set of segments needed to build the graphic model can be determined. The very possibility of constructing the graph indicates that the data contain a certain system, although one may suspect that such a graph could arise by chance. Let us estimate the probability of that.

Take a graph A with n mutually connected nodes, so that each node has (n-1) edges (ribs). As is known from mathematics, two coordinates in any coordinate system suffice to place a point on a plane. For our graph, many more pairs of coordinates can be determined for each node by combining its edges two at a time (provided the lengths of the edges are known). The number of such pairs C is given by the well-known formula for combinations of (n-1) elements taken two at a time:

$$C = \binom{n-1}{2} = \frac{(n-1)(n-2)}{2}$$

For example, with n = 6 nodes the number of coordinate pairs is C = 10; with n = 10 it rises to C = 36, and with n = 12 to C = 55. Thus, even for n = 6, the place of each node can be determined in ten different ways. In graph A, where all edge lengths are known and mutually consistent, every one of these variants brings a given node to the same point. A real situation, however, such as a system of cognate languages, is better described by a graph B in which each node is not a single point but an aggregate of points filling a small area, and these areas do not overlap. If n = 6 objects fill a total area S = 1, each object occupies an area s = 1/6. The probability that a single randomly placed point falls within its own area is then 1/6. With six objects each node can be placed in ten different ways (see above), so the probability that the point falls in the same area in all ten cases is 1/6^10 = 1 : 60 466 176. Since there are six objects, this probability must be raised to the sixth power, which gives 1/6^60, a fraction whose denominator has 47 digits. For ten objects the corresponding probability is 1/10^360, i.e. a denominator with 360 zeros. This shows that an accidental construction of the graphical model is practically impossible.
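The arithmetic of this estimate can be checked with a few lines of Python. This is a sketch under the assumptions stated in the text, namely that each placement independently has a 1/n chance of landing in its own area and that the chances for all n nodes multiply; the function names are illustrative only.

```python
from math import comb

def coordinate_pairs(n: int) -> int:
    """Number of ways to choose two of the (n - 1) edges meeting at one node."""
    return comb(n - 1, 2)

def chance_denominator(n: int) -> int:
    """Denominator of the chance estimate, (n ** C) ** n.

    Assumes each of the C placements of a node independently hits its own
    1/n-sized area, and that this must hold for all n nodes at once.
    """
    return (n ** coordinate_pairs(n)) ** n

for n in (6, 10, 12):
    print(f"n = {n}: C = {coordinate_pairs(n)}")

single_node = 6 ** coordinate_pairs(6)
print("denominator for one node, n = 6:", single_node)
print("digits in the full denominator, n = 6:", len(str(chance_denominator(6))))
```

Python integers have arbitrary precision, so the very large denominators are computed exactly rather than approximated.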

The construction of the graphic models can be demonstrated using the example of the Nostratic languages. This term denotes a phylum of six large language families of the Old World: Altaic, Uralic, Dravidian, Indo-European, Kartvelian and Semitic-Hamitic (Hamito-Semitic, or Afro-Asiatic), which appear to have a common parent language. The data for the analysis were taken from the work of the Ukrainian linguist Illich-Svitych (ILLICH-SVITYCH V.M., 1971). He analysed and systematized similarities in the word structure, grammar and vocabulary of the Nostratic languages and presented a large number of such matches in his book. The scholar assumed that these similarities can be explained only by a theory postulating a genetic relationship between these languages, i.e. that they are monophyletic and belong to one super-family (phylum) of Nostratic languages.

Some of the results of Illich-Svitych's study were taken from the tables in his book (morphological features and a vocabulary of 147 units), and a further 286 matches were found in the body of the text. After comparison with the research materials of the Russian scholar N.D. Andreyev (ANDREYEV N.D., 1986), which are consistent with the results of Illich-Svitych, the data were supplemented with 27 words from the Uralic languages and 8 words from the Altaic languages. As a result, 433 features were determined in total. Thirty-four of them were common to the whole phylum, and the rest comprised 255 units from the Altaic, 255 units from the Uralic, 253 units from the Indo-European, 240 units from the Semitic-Hamitic, 189 units from the Dravidian and 139 units from the Kartvelian languages. Then the number of mutual features in each language pair was calculated. The results of the calculation are given in Table 1.

Table 1. Number of mutual features between pairs of language families.

Language pair                        Mutual features
Altaic – Uralic                      167
Altaic – Indo-European               153
Altaic – Semitic-Hamitic             149
Altaic – Dravidian                   109
Altaic – Kartvelian                   84
Uralic – Indo-European               151
Uralic – Semitic-Hamitic             136
Uralic – Dravidian                   134
Uralic – Kartvelian                   66
Indo-European – Semitic-Hamitic      147
Indo-European – Dravidian            108
Indo-European – Kartvelian            70
Semitic-Hamitic – Dravidian          110
Semitic-Hamitic – Kartvelian          86
Dravidian – Kartvelian                54

What follows, in fine print, is a detailed description of the principle of constructing a graphical model of the kinship of closely related languages from selected lexical-statistical data, using the example of the Nostratic languages. This description can be skipped without loss for the question under consideration.

We cannot yet speak of a definite regularity in these data, but one can see that, as a rule, the largest numbers of mutual words are found among the Altaic, Uralic, Semitic-Hamitic and Indo-European languages. Using the graphic-analytical method, let us try to build a graphic model of the Nostratic relationship in order to show that a regularity does exist in the data. First, the distances between the centres of the habitats of the speakers of the individual Nostratic languages at the time these languages arose have to be calculated by the formula L = K/N, where L is the distance, N is the number of mutual words (features) in a given pair, and K is a scale factor to be chosen (K > 0). The choice of the scale factor depends on the size of the plane on which the model is built; K = 1000 suits our data. The resulting distances in cm between the areas of the individual languages are presented in Table 2.

Table 2. Distances between the centres of the language family areas on the diagram, cm.

Language pair                        Distance, cm
Altaic – Uralic                       6.0
Altaic – Indo-European                6.5
Altaic – Semitic-Hamitic              6.7
Altaic – Dravidian                    9.2
Altaic – Kartvelian                  11.3
Uralic – Indo-European                6.6
Uralic – Semitic-Hamitic              7.3
Uralic – Dravidian                    7.5
Uralic – Kartvelian                  15.2
Indo-European – Semitic-Hamitic       6.8
Indo-European – Dravidian             9.3
Indo-European – Kartvelian           14.3
Semitic-Hamitic – Dravidian           9.1
Semitic-Hamitic – Kartvelian         11.6
Dravidian – Kartvelian               18.5
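The conversion from numbers of mutual features to distances by L = K/N can be sketched as follows. The counts are those of Table 1 and K = 1000 is the scale factor given in the text; the names `MUTUAL` and `distance_cm` are illustrative. The output is simply K/N rounded to one decimal place and is not guaranteed to agree with the printed table in every digit.

```python
K = 1000  # scale factor stated in the text

# Numbers of mutual features per pair of language families (Table 1).
MUTUAL = {
    ("Altaic", "Uralic"): 167,
    ("Altaic", "Indo-European"): 153,
    ("Altaic", "Semitic-Hamitic"): 149,
    ("Altaic", "Dravidian"): 109,
    ("Altaic", "Kartvelian"): 84,
    ("Uralic", "Indo-European"): 151,
    ("Uralic", "Semitic-Hamitic"): 136,
    ("Uralic", "Dravidian"): 134,
    ("Uralic", "Kartvelian"): 66,
    ("Indo-European", "Semitic-Hamitic"): 147,
    ("Indo-European", "Dravidian"): 108,
    ("Indo-European", "Kartvelian"): 70,
    ("Semitic-Hamitic", "Dravidian"): 110,
    ("Semitic-Hamitic", "Kartvelian"): 86,
    ("Dravidian", "Kartvelian"): 54,
}

def distance_cm(n_mutual: int, k: float = K) -> float:
    """Distance on the diagram, inversely proportional to the mutual count."""
    return k / n_mutual

for (a, b), n in MUTUAL.items():
    print(f"{a} - {b}: {distance_cm(n):.1f} cm")
```

A larger K only rescales the whole diagram; the relative arrangement of the points does not depend on it.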

The construction of the model is iterative. First, one point for each language is located using two coordinates. These six points give the approximate places of the languages; their exact places are found in subsequent iterations. In principle, one can start building the model from any language, but since it is not known in advance in which direction the model will extend, it may run beyond the limits of the plane. It is therefore better to start with the language pair that has the most mutual features, in this case Altaic and Uralic. So, first, a segment AB 6 cm long, corresponding to the number of mutual words in these two languages, is placed close to the centre of the plane. The ends of this segment fix the points of the Altaic and Uralic languages (see figure 2).

The points for the Indo-European and Semitic-Hamitic languages are built on the base of this segment. We start with the point for Semitic-Hamitic, because it has more mutual features with Kartvelian and Dravidian. According to the number of mutual features, the Semitic-Hamitic point is to be placed at a distance of 6.7 cm from the Altaic point and 7.4 cm from the Uralic point. Two arcs with these radii are drawn with a pair of compasses, and the Semitic-Hamitic point lies at their intersection. There are two such points, one on either side of the base AB. The choice between them determines which of two mirror-image variants the final graphic model takes. We select the point closer to the centre and so obtain three points, A, B and C; next we look for point D for the Indo-European languages. It is also built on the base AB and has to lie 6.5 cm from point A and 6.6 cm from point B. The two corresponding arcs are drawn with the compasses on the side opposite to point C, and so point D is found. (Point D cannot lie close to point C, because the Indo-European and Semitic-Hamitic languages would then have to share considerably more mutual features than they actually do.) Point E for the Dravidian languages is built on the base BC, because the Semitic-Hamitic and Uralic languages have the largest numbers of mutual features with Dravidian. This point is therefore placed 7.5 cm from the Uralic point and 9.1 cm from the Semitic-Hamitic point, on the side facing away from the centre of the model; otherwise it would lie next to the Altaic point, which is not consistent with the number of mutual features between them. Point F for the Kartvelian languages is placed in the same way, but on the base AC. With the first iteration finished, the scheme of the graphic model for the Nostratic languages can be determined.
The areas of these languages should lie somewhere close to the points A, B, C, D, E and F. The positions of the language areas are then corrected by building points on other bases. Naturally, the new points do not coincide with one another.
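The compass step described above, finding a point at given distances from two already placed points, amounts to intersecting two circles. The sketch below places the Semitic-Hamitic point C from A (Altaic) and B (Uralic) using the distances quoted in the text; the coordinates chosen for A and B and the function name `place_point` are assumptions made for illustration.

```python
from math import hypot, sqrt

def place_point(p, q, r_p, r_q):
    """Return both intersections of the circle around p (radius r_p)
    and the circle around q (radius r_q)."""
    (px, py), (qx, qy) = p, q
    d = hypot(qx - px, qy - py)
    if d == 0 or d > r_p + r_q or d < abs(r_p - r_q):
        raise ValueError("the two arcs do not intersect")
    # Distance from p, along the line pq, to the foot of the common chord.
    a = (r_p ** 2 - r_q ** 2 + d ** 2) / (2 * d)
    # Half-length of the common chord.
    h = sqrt(max(r_p ** 2 - a ** 2, 0.0))
    fx = px + a * (qx - px) / d
    fy = py + a * (qy - py) / d
    # Offset perpendicular to pq, once to each side of the base.
    ox = -h * (qy - py) / d
    oy = h * (qx - px) / d
    return (fx + ox, fy + oy), (fx - ox, fy - oy)

# Base AB of 6 cm, laid along the x-axis (an arbitrary choice of frame).
A = (0.0, 0.0)   # Altaic
B = (6.0, 0.0)   # Uralic

# Semitic-Hamitic: 6.7 cm from A and 7.4 cm from B, as in the text.
c_upper, c_lower = place_point(A, B, 6.7, 7.4)
print(c_upper, c_lower)
```

The two returned points are the mirror-image candidates on either side of the base; which one to keep is the judgment described in the text, not something this function decides.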

Fig. 2. The first and the second iterations during the construction of the graphic model of the Nostratic relationships.

The overall configuration of the aggregate of points for each language suggests the direction in which the areas should be moved so that the points form the most compact graph. Two or three such iterations yield the definitive graphical model of the language relationship. In our case the model of the Nostratic relationship has the final appearance shown in figure 3. The figure has fractal characteristics and resembles the Sierpinski triangle.

Fig. 3. The model of relationship of Nostratic languages.

The construction of a graphical model of kinship from lexical-statistical data could well be automated by writing a computer program. This is a task for applied mathematics, where building mathematical models of systems is common practice. Automatically built models would inspire greater confidence, but so far no applied mathematician has undertaken such work. Evidently it is not an easy task.
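As a rough illustration of what such automation might look like, the sketch below places the six Nostratic families on a plane by minimizing the squared mismatch between drawn and target distances (a stress-minimization approach in the spirit of multidimensional scaling). This is a swapped-in standard technique, not the compass procedure described above and not a program endorsed by the author; it yields one point per family rather than an aggregate of points. The counts come from Table 1 and K = 1000 from the text; the function names, step size, iteration count and random seed are arbitrary assumptions.

```python
import random
from math import hypot

FAMILIES = ["Altaic", "Uralic", "Indo-European",
            "Semitic-Hamitic", "Dravidian", "Kartvelian"]

# Mutual-feature counts from Table 1, keyed by index pairs into FAMILIES.
MUTUAL = {(0, 1): 167, (0, 2): 153, (0, 3): 149, (0, 4): 109, (0, 5): 84,
          (1, 2): 151, (1, 3): 136, (1, 4): 134, (1, 5): 66,
          (2, 3): 147, (2, 4): 108, (2, 5): 70,
          (3, 4): 110, (3, 5): 86, (4, 5): 54}

K = 1000.0
TARGET = {pair: K / n for pair, n in MUTUAL.items()}  # L = K/N

def stress(points):
    """Sum of squared differences between drawn and target distances."""
    total = 0.0
    for (i, j), target in TARGET.items():
        d = hypot(points[i][0] - points[j][0], points[i][1] - points[j][1])
        total += (d - target) ** 2
    return total

def random_start(seed=0, spread=5.0):
    """Reproducible random initial positions (assumed starting layout)."""
    rng = random.Random(seed)
    return [[rng.uniform(-spread, spread), rng.uniform(-spread, spread)]
            for _ in FAMILIES]

def refine(points, steps=5000, rate=0.01):
    """Plain gradient descent on the stress; returns a new list of points."""
    pts = [p[:] for p in points]
    for _ in range(steps):
        grad = [[0.0, 0.0] for _ in pts]
        for (i, j), target in TARGET.items():
            dx = pts[i][0] - pts[j][0]
            dy = pts[i][1] - pts[j][1]
            d = hypot(dx, dy) or 1e-9  # guard against coincident points
            g = 2.0 * (d - target) / d
            grad[i][0] += g * dx
            grad[i][1] += g * dy
            grad[j][0] -= g * dx
            grad[j][1] -= g * dy
        for p, g in zip(pts, grad):
            p[0] -= rate * g[0]
            p[1] -= rate * g[1]
    return pts

start = random_start()
final = refine(start)
print(f"stress before: {stress(start):.1f}, after: {stress(final):.1f}")
for name, (x, y) in zip(FAMILIES, final):
    print(f"{name:16s} {x:7.2f} {y:7.2f}")
```

The fifteen target distances cannot in general all be satisfied exactly by six points on a plane, so some residual stress is to be expected, and the result is determined only up to rotation, shift and mirror reflection.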

The use of the graphoanalytical method eventually led to the discovery of the phenomenon of ethno-producing areas. Their existence is a kind of empirical generalization, which, according to Vernadsky, "does not differ from the scientifically established fact" (VERNADSKY V.I. 2004, § 15).

Quantitative data on the vocabulary studied with the graphoanalytical method over forty years.

Language families and groups   Number of languages   Number of isoglosses   Total number of words
Sino-Tibetan                     7                     2,775                   9,700
Tungus-Manchu                   11                     2,234                  10,200
Mongolic                         8                     2,250                   8,500
Nostratic                        6                       433                   2,600
Abkhaz-Adyghe                    5                     1,800                   4,500
Nakh-Dagestanian                27                     1,900                  24,000
Indo-European                   14                     2,554                  12,381
Finno-Ugric                     12                     1,913                   9,584
Turkic                          13                     2,558                  19,670
Germanic                         6                     2,630                  11,065
Iranian                         11                     1,773                   8,249
Slavic                          10                     3,200                  12,000
Total                          130                  ≈ 26,000               ≈ 135,000

© 1978 – 2019 V.Stetsyuk

Reprinting of articles from this site is encouraged,
provided that a reference (hyperlink) to the site is given.

Modified : 19.12.2018
