LABORATORY 15
Head of Laboratory – Dr.Sc. (Linguistics) Igor Boguslavsky
Tel.: (095) 299-49-27; E-mail: bogus@iitp.ru
The leading researchers of the laboratory include:
Full member of the Russian Academy of Sciences, Dr.Sc. (Linguistics) Jury D. Apresjan
Dr.Sc. (Linguistics) Vladimir Z. Sannikov
Nikolay V. Grigoriev
Dr. Leonid L. Iomdin
Alexander V. Lazursky
Dr. Leonid G. Mitjushin
Irina E. Kayali
Dr. Leonid L. Tsinman
Leonid G. Kreidlin
Dr. Svetlana A. Grigorieva
Nadezhda E. Frid
RESEARCH ACTIVITIES
The Laboratory’s main problem area is the study of how natural language
functions as a means of transmitting information. Its basic research is aimed
at developing a fully operational formal model of language of the
Meaning ⇔ Text type. This model simulates human linguistic behavior, that is,
the human ability to produce and comprehend natural language texts.
All scientific results obtained in 2001 extend the functional capabilities of
the multipurpose NLP system ETAP-3. A demo version of the system is available
over the Internet at http://proling.iitp.ru.
1. New versions of the combinatorial dictionaries of Russian and English have
been developed. Each of the two dictionaries now contains up to 53,000 lexical
entries, which is comparable in size with large traditional general-purpose
bilingual dictionaries. Both dictionaries have undergone qualitative as well as
quantitative changes. Until quite recently, the general strategy of the
Laboratory’s lexicographic work was to minimize the polysemy of lexical items
by aggregating several meanings of a word within one entry. Such aggregated
entries were supplied with correspondingly aggregated patterns of government,
which increased the probability of errors in machine translation and other NLP
systems. The greater text-processing speed of modern computers made it possible
to give up that strategy, while the introduction of more sophisticated
lexicographic information into the dictionary entries made abandoning it
necessary. The new lexicographic strategy takes the polysemy of a word into
account to a greater extent and handles each of its meanings as a separate
lexical item, with its own pattern of government, its own set of lexical
functions, and special translations for various cases of its use. As a result,
the Russian and English combinatorial dictionaries now reflect the real
structure of the lexical systems of both languages much more accurately, which
has improved machine translation quality.
2. The Russian morphological dictionary, which now contains up to 120,000
lexical units, has been further supplemented with proper and geographical names
and has been stripped of doublets of the "бомбовый – бомбовой" type, which had
appeared in it as a result of integration with Zaliznyak’s Grammatical
Dictionary of Russian.
3. The new algorithm of morphological analysis, which was developed in 2000 and
is used in every ETAP-based NLP application, has been implemented using
finite-state technology (FST). The basic characteristics of the new system of
morphological analysis are:
· high speed of operation (several thousand words per second),
· bidirectionality (the same set of data can be used both for analysis
and generation),
· compactness (it requires very little RAM and disk space).
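The bidirectionality property can be illustrated with a minimal sketch. It is not the ETAP-3 implementation: the toy network, the tag names (`+Sg+Nom` and so on) and the functions `generate` and `analyze` are all assumptions made for this example. The point is only that a single set of arcs, each pairing a lemma-side symbol with a surface-side symbol, can be traversed in either direction.

```python
EPS = ""  # empty symbol: the arc reads or writes nothing on that side

# Hypothetical toy network for the noun "стол" ('table').
# Each arc is (source state, upper symbol, lower symbol, target state);
# the upper side spells lemma + tag, the lower side spells the surface form.
ARCS = [
    (0, "с", "с", 1),
    (1, "т", "т", 2),
    (2, "о", "о", 3),
    (3, "л", "л", 4),
    (4, "+Sg+Nom", EPS, 5),
    (4, "+Sg+Gen", "а", 5),
    (4, "+Pl+Nom", "ы", 5),
]
START, FINALS = 0, {5}


def _apply(symbols, read_side, write_side):
    """Traverse the network, reading one side of each arc and writing the other."""
    results = []
    stack = [(START, 0, ())]  # (state, input position, output so far)
    while stack:
        state, pos, out = stack.pop()
        if pos == len(symbols) and state in FINALS:
            results.append(out)
        for arc in ARCS:
            if arc[0] != state:
                continue
            read, write, target = arc[read_side], arc[write_side], arc[3]
            new_out = out + (write,) if write != EPS else out
            if read == EPS:
                # Arc consumes no input symbol on the side being read.
                stack.append((target, pos, new_out))
            elif pos < len(symbols) and symbols[pos] == read:
                stack.append((target, pos + 1, new_out))
    return sorted(results)


def generate(lemma, tag):
    """Lemma + tag -> surface forms (reads the upper side, writes the lower)."""
    return ["".join(out) for out in _apply(list(lemma) + [tag], 1, 2)]


def analyze(form):
    """Surface form -> (lemma, tag) pairs (reads the lower side, writes the upper)."""
    return [("".join(out[:-1]), out[-1]) for out in _apply(list(form), 2, 1)]


print(generate("стол", "+Sg+Gen"))
print(analyze("столы"))
```

Because analysis and generation share the same arcs, a correction to the data affects both directions at once. This toy network is acyclic; a real system would also need minimization and a guard against loops of empty arcs.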
4. The capacities of ETAP-3’s parsing module have been expanded with a weighting
mechanism aimed at generating the most probable syntactic structure for each
processed sentence. A prototype syntactic parser has been created which, in
producing the syntactic structure for a particular sentence, takes into account
the results of statistical analysis of large corpora.
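One way to picture such a weighting mechanism is sketched below. The report does not describe how ETAP-3 actually computes weights, so everything here is an assumption: the relation labels, the frequency figures, the floor value for unseen links and the function names are invented for illustration. The sketch scores each candidate structure by summing the log relative frequencies of its dependency links and keeps the highest-scoring candidate.

```python
import math

# Hypothetical relative frequencies of (head POS, relation, dependent POS)
# links, standing in for statistics collected from a large corpus.
LINK_FREQ = {
    ("VERB", "subj", "NOUN"): 0.40,
    ("VERB", "obj", "NOUN"): 0.30,
    ("VERB", "obl", "NOUN"): 0.20,
    ("NOUN", "nmod", "NOUN"): 0.05,
}
UNSEEN = 1e-4  # assumed floor for links never observed in the corpus


def structure_weight(links):
    """Sum of log frequencies of all links in one candidate structure."""
    return sum(math.log(LINK_FREQ.get(link, UNSEEN)) for link in links)


def most_probable(candidates):
    """Return the name of the candidate structure with the highest weight."""
    return max(candidates, key=lambda name: structure_weight(candidates[name]))


# Two candidate structures for a sentence with an ambiguous prepositional
# phrase: it attaches either to the verb or to the preceding noun.
CANDIDATES = {
    "verb_attachment": [
        ("VERB", "subj", "NOUN"),
        ("VERB", "obj", "NOUN"),
        ("VERB", "obl", "NOUN"),
    ],
    "noun_attachment": [
        ("VERB", "subj", "NOUN"),
        ("VERB", "obj", "NOUN"),
        ("NOUN", "nmod", "NOUN"),
    ],
}

print(most_probable(CANDIDATES))
```

Summing logarithms rather than multiplying raw frequencies avoids numerical underflow on long sentences and leaves the ranking of candidates unchanged.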
5. The algorithm for resolving grammatical and functional ambiguity of Russian
words on the basis of morphological data and linear context, which was designed
in 2000, has been implemented and is being run to collect material intended to
improve the performance of the morphological and syntactic analyzers.
6. A series of experiments has been carried out to integrate the ETAP-3 system
with an online question answering system, IAW (I Ask Web), which conducts a
dialogue with the user in natural language. ETAP was supplemented with a
semantic dictionary covering three domains: the Internet shop, real estate and
taxes. In addition, a module was written which transforms the syntactic
structure generated by ETAP into the corresponding semantic representation. The
use of the ETAP facilities made it possible to choose the answer to a given
question more accurately. Essential progress has also been made in syntactic
ambiguity resolution. The integration of ETAP and IAW has been completed; the
integrated system is in operation and is available at www.iaskweb.com.
7. The deconversion module for UNL (Universal Networking Language), which
generates target Russian texts from input UNL semantic representations, has been
further developed. In cooperation with partners from Spain, Italy, France and
India, a series of experiments on the simultaneous generation of texts in the
five languages on computers located in the five respective countries has been
prepared and carried out. Within this framework, concrete recommendations aimed
at further improving UNL and the methods of recording information in that
language have been worked out. The system is available at http://www.unl.ru.
8. Work on the project "Computer-Aided Learning of Lexica" has been completed.
Learner’s dictionaries of Russian and English containing up to 2,500 entries
each have been created. They store the following types of information about the
lexemes: a) part of speech, b) translation or translations into the other
working language, c) the analytical definition of the lexeme, d) its semantic
features, or descriptors, e) its pattern of government, f) the values of the
lexical functions it has. The total number of lexical functions is 107. On this
basis several computer lexical games have been designed, for example: guess the
word from its analytical definition, supply the values of the lexical functions
offered by the computer for a given word, supply the values of a specific
lexical function for the words offered by the computer, and so on. The product
includes a system that scores the user’s answers numerically, depending on the
number of correct answers and the degree of linguistic complexity of the
questions.
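The entry structure and the scoring idea described in item 8 can be sketched as follows. This is an illustration only: the class and field names, the sample entries, the integer complexity levels and the scoring formula (sum of the complexity levels of correctly answered questions) are assumptions, since the report does not give the actual data format or formula.

```python
from dataclasses import dataclass, field


@dataclass
class LearnerEntry:
    """One learner's-dictionary entry with the six information types a) to f)."""
    lexeme: str
    part_of_speech: str                 # a)
    translations: list                  # b)
    definition: str                     # c)
    descriptors: list                   # d)
    government_pattern: str             # e)
    lexical_functions: dict = field(default_factory=dict)  # f)


@dataclass
class Question:
    """Game question: name a value of a lexical function for the entry's lexeme."""
    entry: LearnerEntry
    function: str
    complexity: int  # hypothetical difficulty level, 1 = easiest


def is_correct(question, answer):
    """True if the answer is among the recorded values of the lexical function."""
    values = question.entry.lexical_functions.get(question.function, [])
    return answer in values


def score(questions, answers):
    """Assumed formula: each correct answer earns its question's complexity."""
    return sum(q.complexity for q, a in zip(questions, answers) if is_correct(q, a))


# Illustrative entries; the field contents are invented for this example.
decision = LearnerEntry(
    lexeme="decision", part_of_speech="noun", translations=["решение"],
    definition="a choice made after consideration", descriptors=["act"],
    government_pattern="decision of X to do Y",
    lexical_functions={"Oper1": ["make", "take"]},
)
rain = LearnerEntry(
    lexeme="rain", part_of_speech="noun", translations=["дождь"],
    definition="water falling in drops from clouds", descriptors=["weather"],
    government_pattern="",
    lexical_functions={"Magn": ["heavy"]},
)

questions = [Question(decision, "Oper1", 2), Question(rain, "Magn", 1)]
print(score(questions, ["make", "strong"]))  # first answer right, second wrong
```

With this formula a learner who answers only the harder question correctly still outscores one who answers only the easier one, which reflects the dependence on both correctness and complexity mentioned above.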
9. Work on the project "A Formal Model of Paraphrasing for the NLP systems" has
been completed. Apart from the paraphrasing rules sketched out in the classical
version of the "Meaning – Text" theory, a great number of new rules have been
introduced that bear on synonymic relations in the derivational and syntactic
subsystems of the language, in particular:
- rules working with the so-called Aktionsarten of the Russian verb
(transformations of inceptive, finitive, causative and liquidative verbs into
the respective verbal phrases, e.g., «Зал зашумел – В зале поднялся шум», 'The
hall became noisy – Noise arose in the hall');
- rules working with the so-called indefinite personal constructions of Russian
(transformations of the type «Его обманули – Он был обманут», 'They deceived
him – He was deceived').
In this connection, about two dozen new lexical functions missing from the
classical version of the "Meaning – Text" model were introduced into the
paraphrasing system.
10. The first round of work on the second part of the tagged corpus of Russian
texts (collection and primary processing of a corpus of sentences) has been
completed. The textual material used for the second part of the corpus is taken
from Internet news feeds, that is, sets of brief news items issued by news
agencies. The texts were taken from the sites www.yandex.ru, www.lenta.ru,
www.rbc.ru, www.polit.ru and some others. On the one hand, this material is
stylistically and syntactically well suited to automatic processing because it
requires a very modest amount of post-editing. On the other hand, the results of
tagging texts of this kind are extremely useful for improving the performance of
information retrieval systems. Alongside the collection of texts, work on
tagging the sentences has been started.
GRANTS From:
· Russian Foundation of Basic Research (No. 99-06-80277): "Development of an
Operational Meaning ⇔ Text Linguistic Model (third release)".
· Russian Foundation of Basic Research (No. 99-06-80292): "A Formal Model of
Paraphrasing of Sentences for Natural Text Processing".
· Russian Foundation of Basic Research (No. 01-06-80453): "Development of a
Compound Parsing Algorithm for the Linguistic Processor ETAP-3".
· Russian Foundation of Basic Research (No. 01-07-90405): "Creation of an
Annotated Corpus of Russian Texts (second release)".
· Russian State Scientific Foundation (No. 99-04-00318): "Computer Assisted
Learning of Language Vocabulary".
Publications in 2001
1. Jurij D. Apresjan. Semantyka leksykalna. Synonimiczne środki języka.
Przeł. Zofia Kozłowska i Andrzej Markowski. Drugie wydanie polskie
przygotowały Zofia Kozłowska i Elżbieta Janus. Wrocław – Warszawa – Kraków:
Ossolineum, 2000 (actually published in 2001).
2. Boguslavsky I. On the scales and implicatures of EVEN // Pragmatics and
Flexibility of Word Meaning. Ed. by E. Németh T., K. Bibok. Current Research in
the Semantics/Pragmatics Interface, 8, Elsevier Science, 2001.
3. Апресян Ю.Д. Смыслы ‘знать’ и ‘считать’ в системе русского языка //
Међународни научни скуп о лексикографији и лексикологији «Дескриптивна
лексикографија стандартног језика и њене теоријске основе». Резимеи.
Београд – Нови Сад, 2001, 1-2.
4. Апресян Ю.Д. Глагол заставлять: семантический класс, синонимия,
многозначность // Жизнь языка. Сборник статей к 80-летию Михаила Викторовича
Панова. М.: 2001, 13-27.
5. Апресян Ю.Д. Системообразующие смыслы ‘знать’ и ‘считать’ в русском языке //
Русский язык в научном освещении. 2001, № 1, 5-26.
6. Апресян Ю.Д. Значение и употребление // ВЯ. 2001, № 4, 3-22.
7. Апресян Ю.Д. Синонимия предикатов группы ждать // Слово. Юбилеен сборник,
посветен на 70-годишнината на проф. Ирина Червенкова. София, 2001, 16-32.
8. Апресян Ю.Д. «Русский синтаксис в научном освещении» в контексте современной
лингвистики // А. М. Пешковский. Русский синтаксис в научном освещении.
Издание 8-е. Языки славянской культуры, М.: 2001, III-XXXIII.
9. Апресян Ю.Д. Восхищение и восторг: сходства и различия // Традиционное и
новое в русской грамматике. Сборник статей памяти В. А. Белошапковой. М.:
«Индрик», 2001, 94-106.
10. Апресян Ю.Д. От значений к несемантическим свойствам лексем:
знание и мнение // Русский язык: пересекая границы. Дубна, 2001, 7-18.
11. Апресян Ю.Д., Ботякова В.В., Латышева Т.Э. и др.
Англо-русский синонимический словарь. М.: Русский язык, 2001, изд. 6-е,
стереотипное, 543 с.
12. Апресян Ю.Д., Иомдин Л.Л., Медникова Э.М., Петрова А.В. и
др. Новый большой англо-русский словарь. М.: Русский Язык, 2001. Изд. 6-е,
стереотипное. T. I, 832 c., T. II, 828 c., T. III, 824 c.
13. Богуславский И.М. Об одной загадке языка Пушкина // A. S.
Puškin und die kulturelle Identität Russlands / Gerhard Ressel
(Hrsg.). – Frankfurt am Main; Berlin; Bern; Bruxelles; New York; Oxford; Wien:
Lang, 2001, S. 133-144.
14. Богуславский И.М. Модальность, сравнительность и отрицание.
// Русский язык в научном освещении. 2001, № 1, 27-51.
15. Григорьева С.А. Степень и количество // Труды Международного семинара
Диалог'2001 по компьютерной лингвистике, Аксаково, 2001, с. 68-75.
16. Григорьева С.А., Григорьев Н.В., Крейдлин Г.Е. Словарь языка русских жестов.
Языки русской культуры // Wiener Slawistischer Almanach Sonderband 49,
Москва-Вена, 2001, 230 с.
17. Крейдлин Г.Е., Фрид Н.Е. Вслух про себе (семантика и синтаксис одной русской частицы) // Лингвистика
на рубеже эпох: идеи и топосы. Сборник статей. М.: РГГУ, 2001. с. 46-67.
1. Jurij D. Apresjan. Principles of Systematic Lexicography // In Honour of
B. T. S. Atkins (in print).
2. Boguslavsky I. UNL from the linguistic point of view (in print).
3. Boguslavsky I. Even in discourse: Interaction of lexical meanings and
interpretation strategies (in print).
4. Iomdin L., Carl M., Pease C., Streiter O. Towards a Dynamic Linkage of
Example-Based and Rule-Based Machine Translation // MT (in print).
5. Апресян Ю.Д. О лексических функциях семейства REAL – FACT // Сборник в честь
Z. Saloni (в печати).
6. Апресян Ю.Д. Наказание в языковой картине мира // Сборник в честь Анджея
Богуславского (в печати).
7. Апресян Ю.Д. Системность лексики: семантические парадигмы и семантические
альтернации // Сборник в честь С. Кароляка (в печати).
8. Григорьева С.А. Словарная статья синонимического ряда ПОЛНОСТЬЮ (в печати).
9. Григорьева С.А. Словарная статья синонимического ряда ПОЧТИ (в печати).
10. Григорьева С.А. Словарная статья синонимического ряда ЧАСТИЧНО (в печати).
11. Григорьева С.А. Словарная статья синонимического ряда ВРЯД ЛИ (в печати).
12. Иомдин Л.Л. Синтаксические особенности фразеологических
единиц: новые подробности // Сборник статей в честь 70-летия проф. А.
Богуславского (в печати).