The Selection of the Lexical Material.

The lexical material for this research was initially selected by the author on their own because etymological dictionaries were absent or unavailable. However, when etymological dictionaries are used to research ethnogenetic processes in prehistoric times, the selection should follow certain rules, which the author has learned in the course of their work. The main ones are set out below.

Methods of mathematical statistics rely on random sampling from a "general population", which serves as a model of the data source. A random sample statistically represents the general population, provided that all subjective factors that make the selection non-random are eliminated. In our research, lacking full confidence that the sample was random, we deliberately planned to take a larger number of elements in order to prevent distortion of the final results. However, a feature of the graphic-analytical method used here is that the randomness and size of the sample are judged sufficient when they reflect the internal structure of the population, which cannot be achieved with a small amount of random data. Therefore, data were collected in a volume that gives a "level of confidence", that is, one that allows the graphic scheme of relationships between related languages to be constructed. The very fact that such a scheme can be constructed indicates that the data have an internal structure. If a scheme of relationships cannot be constructed from the data collected for closely related languages, this indicates either incorrect data or a lack of relationship between some of the languages admitted to the study.

The common opinion that vocabulary is highly unstable can be explained by the fact that many languages have a lot of loanwords. However, observations show that loanwords belong primarily to the more "cultural" layer of vocabulary, while ancient words, which correspond to a lower cultural level, remain in the language. These ancient words are also the most commonly used. According to A.V. Desnitskaya, native vocabulary includes a significant part of the most common words, which reflect basic concepts and form the largest number of word-formation nests (DESNITSKAJA A.V. 1966: 9). M.V. Arapov and V.V. Herz describe the dependence between the frequency of a word's use and its age as follows:

There is a relationship between the frequency of a word and the time of its appearance in the language… Most of the words with a high frequency of use are ancient words, and vice versa: the lower the frequency of a word, the more likely it is that the word is newly created (ARAPOV M.V., Herz V.V., 1974: 3).

The authors note that this connection was first noticed by George Kingsley Zipf in 1947, who appreciated its significance for the quantitative analysis of facts relating to the history of a language (Zipf's law). However, it should be borne in mind that some low-frequency words may be ancient, and that many newly created words are used with high frequency. The latter, however, can easily be removed in lexical-statistical studies on the basis of their meaning.
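The idea of preferring high-frequency words as candidates for the comparison lists can be illustrated with a minimal sketch. It assumes that a plain-text corpus of the language is available and that raw frequency of use is an acceptable first approximation; the function names `rank_by_frequency` and `high_frequency_candidates` and the sample sentence are hypothetical and do not come from the author's procedure. The subsequent removal of frequent new words by meaning remains a manual step.

```python
import re
from collections import Counter


def rank_by_frequency(text):
    """Return (word, count) pairs sorted from most to least frequent."""
    # Split into lowercase runs of letters; digits and punctuation are dropped.
    words = re.findall(r"[^\W\d_]+", text.lower())
    counts = Counter(words)
    # Sort by descending count, then alphabetically so that ties are stable.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


def high_frequency_candidates(text, top_n):
    """Return the top_n most frequent words as candidates for further review."""
    return [word for word, _ in rank_by_frequency(text)[:top_n]]


sample = "the water and the fire and the stone"
print(high_frequency_candidates(sample, 2))
```

A list produced this way is only a starting point: as noted above, some frequent words are recent and some rare words are ancient, so each candidate still has to be checked by its meaning and etymology.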

It is known that there are languages whose vocabulary contains more words of foreign origin than native ones, but in which native words predominate in everyday use. Such languages therefore do not give the impression of belonging to a different linguistic group, even when judged by their vocabulary. Romanian, for example, is in this situation: words of Slavic origin are the most numerous, followed by Latin, Turkish, and Modern Greek ones, yet the Romanian language and texts written in it give the impression of a Romance, not a Slavic, language. Ignoring or misunderstanding the dependence of a word's age on its frequency of use confuses linguists about the primary kinship of languages, complicates the distinction between ancient words and loanwords, and eventually leads scholars into a deadlock.

The problem of separating inherited common words from later loanwords in related languages is one of the most difficult in historical linguistics. It is well understood by all comparativists because it arises immediately in the comparative analysis of any languages. Even when we choose only the most commonly used words for lexical-statistical analysis on the basis of their meanings, we always run a certain risk of including in the lists some ancient words of foreign origin. However, for the majority of languages such words are relatively few, and if the selected lexical material is specially analyzed in order to eliminate borrowed words, this risk is substantially reduced and the errors do not significantly affect the results of the research. Eliminating later borrowings is easier when the donor language is known. We speak here of more recent borrowings, from the time after the speakers of the languages had left their ancestral areas. Borrowings made before that time from one language group into another are difficult to separate from words of native origin. But for determining the primary areas of settlement, as we shall see, this is not a serious problem.

In principle, the selection of data would require only a minimum of professional knowledge and would be purely technical work, given the availability, accessibility, and completeness of etymological dictionaries. Unfortunately, none of these three conditions is met. Etymological dictionaries have still not been compiled for some languages; for others they are only in preparation or have not yet been published in full. The systematization of the material was also hindered to some extent by incomplete data in the etymological dictionaries. They rarely give a full set of matches in related languages; the authors often limit themselves to the best-known examples, and sometimes erroneous etymologies wander from one dictionary to another.

All these circumstances meant that most of the work of finding and selecting data had to be done by careful review of bilingual dictionaries, in which very rich material can be found in most cases. However, there are not enough of these dictionaries either. Given the subject of the work, dictionaries of the Samoyedic languages should have been processed, but because they were lacking, this work was not performed at all. The most negative impact on the results of the study, however, came from missing or incomplete dictionaries of some Iranian languages.

The studies were conducted at the lexical level, without regard to grammatical forms, comparing lexical units on two planes: sound and meaning. Coincidences of sound form without correspondence in meaning were unconditionally ignored. In assessing the semantic side, matches were graded from the maximum, synonymy, through greater or lesser semantic similarity, down to antonymy, which is sometimes a consequence of the specific nature of the concept (the classic example is the original meaning "edge", which in different languages can develop into both "beginning" and "end"). Synonymy is understood here as a match of at least one meaning of the word in different languages (usually the dominant one), not as a complete coincidence of semantic fields. Most often, however, the material consisted not of synonyms but of words of similar meaning and common origin, not necessarily even of the same grammatical category.
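The acceptance rule described above can be sketched in code. This is only an illustration under stated assumptions: the judgments about sound correspondence and semantic relation are made by the researcher and merely recorded here, a pair is assumed to need both a sound match and some semantic relation to be accepted, and the names `Semantics`, `Comparison`, and `is_accepted`, as well as the placeholder word forms, are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class Semantics(Enum):
    """Semantic relation between two words, as judged by the researcher."""
    SYNONYM = 3    # at least one (usually the dominant) meaning coincides
    SIMILAR = 2    # greater or lesser similarity of meaning
    ANTONYM = 1    # opposite meanings developed from one concept ("edge")
    UNRELATED = 0  # no correspondence in meaning


@dataclass(frozen=True)
class Comparison:
    word_a: str          # form in the first language (placeholder)
    word_b: str          # form in the second language (placeholder)
    sound_match: bool    # researcher's judgment on the sound correspondence
    semantics: Semantics


def is_accepted(c):
    """Accept a pair only if it matches in sound and is related in meaning."""
    return c.sound_match and c.semantics is not Semantics.UNRELATED


pairs = [
    Comparison("form_a1", "form_b1", True, Semantics.SYNONYM),
    Comparison("form_a2", "form_b2", True, Semantics.ANTONYM),
    Comparison("form_a3", "form_b3", True, Semantics.UNRELATED),
]
accepted = [c for c in pairs if is_accepted(c)]
print(len(accepted))
```

The third pair is rejected because a coincidence of sound without correspondence in meaning is ignored, while the antonymous pair is kept, as in the "edge" example.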

The etymological table-dictionaries compiled in this way for various language families, which were used for the calculations, are available on my site "Alternative historical linguistics".