Sector 1.1

Sector of Computer Logic in Information Processes

Head of the Sector – Dr.Sc. (Mathematics), Prof. Vassily Lyubetsky

Tel.: (095) 299-83-54, (095) 413-46-43; E-mail: lyubetsk@iitp.ru

 

The leading researchers of the laboratory include:

 

Dr.Sc. (Math.)

A. Chagrov

Dr.Sc. (Math.)

N. Vereshchagin

Dr.Sc. (Biol.)

M. Gelfand

Dr.

V. V'yugin

Dr.Sc. (Math.)

P. Golubtsov

Dr.

K. Gorbunov

 

 

DIRECTIONS OF ACTIVITY:

 

·        construction of algorithms of acceptable complexity (quadratic or close) for analysis of regulation on the DNA and RNA levels: search for regulatory signals and given patterns in genomic sequences, construction of conservative and alternative secondary structures. On this basis application of these algorithms for analysis of regulatory sites in complex interacting regulatory systems.

·        construction of algorithms of acceptable complexity (quadratic or close) for construction of reconciled phylogenetic trees and application these algorithms for construction of a species tree and analysis of evolutionary events on the level of genes.

·        investigation the concept of information interaction, effective algebraic description of objects on the basis of the theories: model completeness, modal logic, information transformers categories, algorithmic complexity and randomness.

 

MAIN RESULTS

 

The previously suggested algorithm for identification of regulatory signals (binding sites of transcription factors) in bacterial genomes was optimized and parallelized. The algorithm accepts as input a sample of genome fragments (strings in the four-letter alphabet) believed to contain such signals (that is, a collection of candidate regulatory regions) and constructs the distribution of scores measuring the likelihood that a word representsed the desired signal. This is a classical and yet unresolved problems of the computational molecular biology. It gained additional importance during last years after commencing of global experiments on the analysis of genome expression and publication of complete genomes of many bacteria and some eukaryotes, which made possible the comparative analysis of regulation processes. The algorithm was implemented on a multiprocessor (cluster) system. This allowed for processing samples containing millions of nuclotides, which is not possible with a sequential algorithm.

A previously suggested algorithm (of approximately linear complexity) for finding alternative RNA secondary structures (pairs of mutually exclusive hairpins) was improved. Computer programs for prediction of alternative and conserved RNA structures were developed.

An earlier developed algorithm (of approximately linear complexity) for construction of a reconciled tree («consensus» or averaged tree) given a set of protein family trees. This consensus tree represents a phylogenetic tree for the genomes encoding the corresponding proteins. This is a traditional problem in the theory of molecular evolution. The algorithm was implemented and applied to analysis of several species, and also groups of evolution trees of mitochondrial proteins and clusters of orthologous genes.

In the continued collaboration with Integrated Genomics, Inc., several specific regulatory systems in bacterial genomes were analyzed. Specifically, the following regulatory systems have been analyzed: arginine regulon in many taxa, multiple drug resistance systems in enteric bacteria, biosynthesis of aromatic amino acids and utilization of pentoses, iron utilization (FUR) and FNR-regulons in gamma-proteobacteria, SOS-response and utilization of pentoses in Gram-positive bacteria. This resulted in identification of a large number of new regulated genes and in some cases in identification of new regulatory signals.

In the same collaboration, the work on creation of algorithms for gene recognition in eukaryotes was continued. Programs for gene recognition in the presence of sequencing errors and for gene recognition by comparison of genomic sequences were developed. New cases of brain-specific splicing were analyzed and corresponding regulatory signals were predicted.

A program searching for RNA secondary structures given a pattern was developed. It was used to identify regulatory sites (RFN-elements) upstream of riboflavin synthesis and transport genes in many bacterial genomes. A new mechanism for riboflavin regulation and riboflavin transport was suggested.

Regulatory RNA secondary structures (T-boxes) were predicted upstream of genes encoding aa-tRNA-synthetases and amino acid biosynthesis enzymes in Gram-positive bacteria. The family of T-box-regulated genes was extended.

Metabolism of fatty acids was studied in Escherichia coli, Haemophilus influenza, Vibrio cholerae, Yersinia pestis. FadR was shown to regulate all stages of degradation of long-chain fatty acids and partially biosynthesis of fatty acids. The acyl-CoA-dehydrogenase gene yafH was shown to be identical to the gene described earlier as fadE. New members of the FadR-regulon were found.

The algorithms for signal identification was successfully applied for identification of palindromic 20 bp signal of purine regulation in Р. horikoshii, and the trehalose regulator signal, as well as non-palindromic signals for MarX in Е. coli and CggR signals in a group pf related bacteria В. anthracis, В. subtilis, В. halodurans. Orthologous sets of genes in the E. coli and B. subtilis groups were analyzed; the results were compared with the existing collections DPInteract and DBTBS. New signals were given to biologists for expaerimental verification.

The previously suggested algorithm for analysis of overlapping RNA stem-loops was improved. Such alternative structures serve as regulatory signals of transcription and translation. Such parameters as energy of specific base pairs and the lengths of overlaps were taken into account. The new version is based on creation of a list of stem-loops (filtered by the free energy) and consequent formation of overlaps. In a number of biologically relevant test cases the obtained co-ordinates of loops were shown to match experimental data with high reliability. These results are used for identification of biologically relevant overlaps.

An algorithm for identification of conserved RNA structures was developed and implemented in software. It is based on alignment of hairpins for each pair of fragments in the sample. It allows for filtering of spurious hairpins. A specific version for the cloverleaf conformation was successfully tested on the tRNAs of E. coli.

The algorithms of phylogenetic analysis of genomes based on individual gene trees were modified for the analysis of horizontal transfer of genes. This version is based on the analysis of differences between the reconciled tree and the individual gene trees. The program implementing this algorithm was applied to the analysis of 42 bacterial genomes represented in 128 clusters of orthologous genes from the COG database (National Center of Biotechnology Information, USA)

The properties of the notion of «knowledge» were described by using formal logical means. To this end the first order language was enriched by the operator k(i,s) interpreted as «anthe agent i knows a fact s». It was proven that various theories obtained in this way are finitely axiomatizable and their complexity is the same as for that of the classical propositional calculus.

A way to describe informational interactions based on the formalism of a logic of knowledge (a version of the polymodal logic) was found. Minimal models (in the predicative and individual senses) of the obtained logic were found. An example of a minimal model is a model where an agent of an informational interaction is represented by a structured family of pairs «a model – a bunch of statements». One can describe in this terms several aspects of some specific kinds of informational interaction in the course of a child mental development, oral communications, etc.

The problem of construction of optimal strategies for games in stochastic dynamical systems with discrete time was investigated. This problem lies in the junction of nonantagonistic game theory and optimal control theory. Special attention was given to the cases of incomplete, or even, asymmetric information about the system. It was shown that additional information can have a negative impact on the game results. An algorithm of solving the problem was suggested and implemented in computer program. On the basis of this program a series of numerical simulations were performed (in particular, for the problem of marine fishery).

There where developed methods for construction of an optimum measurement-computer system, based on an imprecisely specified invariant measurement system, for the cases continuous-discrete, or parametrically specified measurement system. Methods of optimal calibration of these kind of invariant measurement systems were studied.

Study of general properties of categories of information transformers (ITs) was continued. It was shown that every category of ITs is a monoidal category of a special kind. Many basic categories of ITs, e.g., categories of stochastic, fuzzy, multivalued ITs have the structure of Kleisly category. The key concept in construction of such categories is a functor T that takes an object A to an object TA of «all distributions on A». In this connection the problem of endowing of a Kleisly category with a structure of monoidal category was studied. It was shown that sufficient conditions for this are existence of a natural transformation which takes a pair of «distributions» to an «independent joint distribution».

The enrichment of the first order language of the field of real numbers by a unary functional symbol f was studied. The goal was to solve a known problem of separation by a formula of this language of two given families of functions (the formula should be true when a function from the first family is substituted for f, and false when a function from the second family is substituted). It was proven that for any natural i, the family of functions having the derivative of the i-th order is separable from its complement. A conjecture was made that the family of all polynomials is inseparable from the family of all functions «glued» using infinitely many different polynomials, and from its complement in the class of functions glued from finitely many polynomials. Several particular cases of this conjecture were solved.

It was found that many NP-complete modal logics, even some non-local-tabular ones, become P-complete after bounding the number of propositional variables being used, i.e., «behave» like the classical propositional logic: for them the condition of satisfiability can be verified by a non-enumeration method. Among these logics there is an important logic S4.3.

A new inequality for Shannon entropy of discrete random variables was found (all the result listed below are joint with some mathematicians from Moscow Lomonosov University and from abroad).

A new proof of the Ahlswede-Gacs-Koerner theorem on the absence of common information it the outcomes of two correlated information sources was found.

It was proven that any propositional formula such that for any arithmetical formulas substituted for its variables there is a realization (in Kleene's sense) of the resulting formula whose complexity is a little «o» of the sum of the minimal complexities of substituted formulas, is derivable in the Yankov's logic (the converse is true also).

It was proven that given a program enumerating a small set of binary strings of the same length one can found a small family of betting strategies such that if the sequence of coin tossing outcomes belongs to the given set then at least one strategy wins a significant amount.

A description of all possible shapes of A. N. Kolmogorov structure functions was found, a key result in the algorithmic statistics.

It was shown that given the set of random strings as an oracle one can find in time polynomial in n a random string of length n.

Some results of instability of several ergodic theorems with respect to small growth of the randomness deficiency were proven.

 

The department members participated in the following symposiums:

-        Howard Hughes Medical Institute, 2001 Meeting of International Research Scholars, Vancouver, June 2001 (invited talk);

-        NATO advanced studies institute, Artificial intelligence and heuristic methods for bioinformatics, San Miniato, Italy, October 2001 (invited talk);

-        The III international conference «Control and complex systems modeling problems», Samara-2001 (invited talk);

-        The III international Smirnov memorial conference, Moscow, 2001 (invited talk);

-        The VI all-Russia scientific meeting «Modern logic: problems of theory, history and application in science'', Saint-Petersburg, June 22--24, 2000 (invited talk);

-        XVI Annual IEEE conference on Computational Complexity, Chicago, June 2001;

-        XVIII Annual Symposium on Theoretical Aspects of Computer Sciences, Dresden, February 2001 (a Program Committee member);

-        An international meeting on occasion of 100th anniversary of academician P. S. Novikov, Moscow, August 27-31, 2001 (invited talks).

 

The department members visited several universities and scientific centers to give courses and to collaborate (Lawrence Berkeley Laboratory, University of California-San Diego, University of Montana, Marseille, Amsterdam, Wien, Athens).

 

 

PUBLICATIONS IN 2001

 

Articles

 

1.      Данилова Л.В., Горбунов К.Ю., Гельфанд М.С., Любецкий В.А. Алгоритм выделения регуляторных сигналов в последовательностях ДНК (1) // Электронный научный журнал «Информационные процессы» (http://www.jip.ru/). 2001, т. 1, № 1, с. 56-63.

2.      Гельфанд М.С., Вьюгин В.В., Любецкий В.А. Об одном способе построения деревьев эволюции видов по множественным генетическим данным // Электронный научный журнал «Информационные процессы» (http://www.jip.ru/). 2001, т. 1, № 1, с. 64-77.

3.      Данилова Л.В., Горбунов К.Ю., Гельфанд М.С., Любецкий В.А. Алгоритм выделения регуляторных сигналов в последовательностях ДНК (2) // Молекулярная биология. 2001, т. 35, № 6, с. 987-995.

4.      Вьюгин В.В., Любецкий В.А. Об одном алгоритме поиска горизонтального переноса гена на основе филогенетических деревьев белков // Электронный научный журнал «Информационные процессы» (http://www.jip.ru/). 2001, т. 1, № 2, с. 167-177.

5.      Любецкий В.А., Селиверстов А.В. Об одном вероятностном алгоритме решения NP-полной проблемы // Труды III международной конференции «Смирновские чтения», РАН, Москва, 2001, с. 47-49.

6.      Данилова Л.В., Любецкий В.А. Алгоритм выделения регуляторных сигналов: тестирование и биологические применения // Труды III международной конференции «Проблемы управления и моделирования в сложных системах», Самара, РАН, 2001, с. 632-634.

7.      Кузнецов Н.А., Любецкий В.А., Чернавский А.В. К вопросу о понятии информационного взаимодействия, 2: доречевой интеллект // Труды III международной конференции «Проблемы управления и моделирования в сложных системах», Самара, РАН, 2001, с. 25-42.

8.       Горбунов И.А., Рыбаков М.Н. Выразительность операторов знания и возможность эффективного описания логик знания // Труды III международной конференции «Проблемы управления и моделирования в сложных системах», Самара, РАН, 2001, с. 617-622.

9.      Голубцов П.В., Старикова О.В. Инвариантность и калибровка измерительно-вычислительных систем в непрерывно-дискретном случае // Труды III международной конференции «Проблемы управления и моделирования в сложных системах», Самара, РАН, 2001, с. 635-641.

10. Витрещак А.Г., Гельфанд М.С. Предсказание вторичной структуры РНК. Программа RNApattern – поиск вторичной структуры по паттерну // Труды III международной конференции «Проблемы управления и моделирования в сложных системах», Самара, РАН, 2001, с. 623-625.

11. Чагров А.В. К логическим проблемам семантических аспектов информационного взаимодействия // Труды III международной конференции «Проблемы управления и моделирования в сложных системах», Самара, РАН, 2001, с. 611-616.

12. Горбунов К.Ю, Любецкая Е.В, Любецкий В.А. О двух алгоритмах поиска альтернативной вторичной структуры РНК // Электронный научный журнал «Информационные процессы» (http://www.jip.ru/). 2001, т. 1, № 2, с. 178-187.

13. Mironov A.A, Novichkov P.S., Gelfand M.S. Pro-Frame: similarity-based gene recognition in eukaryotic DNA sequences with errors // Bioinformatics. 2001, vol. 17, no. 1, pp. 13-15.

14. Rodionov D.A., Gelfand M.S., Mironov A.A., Rakhmaninova A.B. Comparative approach to analysis of regulation in complete genomes: multidrug resistance systems in gamma-proteobacteria // Journal of Molecular Microbiology and Biotechnology. 2001, vol. 3, no. 2, pp. 319-324.

15. Makarova K.S., Mironov A.A., Gelfand M.S. Conservation of the arginine repressor DNA binding signal in all bacterial lineages // Genome Biology. 2001, vol. 2, no. 4, pp. research 0013.1-0013.8.

16. Panina E.M., Vitreschak A.G., Mironov A.A., Gelfand M.S. Regulation of aromatic amino acid biosynthesis in gamma-proteobacteria // Journal of Molecular Microbiology and Biotechnology. 2001, vol. 3, no. 4, pp. 529-543.

17. Brudno M., Gelfand M.S., Spengler S., Zorn M., Dubchak I., Conboy J.G. Computational analysis of candidate intron regulatory elements for tissue-specific alternative pre-mRNA splicing // Nucleic Acids Research, 2001, vol. 29, no. 11, pp. 2338-2348.

18. Боринская С.А., Гельфанд М.С., Миронов А.А. Компьютерная геномика: в поисках генов // Химия и жизнь. 2001, № 2, с. 36-40.

19. Гельфанд М.С. Биология in silico // Компьютерра. 2001, № 36(413), с. 20-22.

20. Гельфанд М.С. Аннотация геномов – от последовательности к функции // Компьютерра. 2001, № 36(413), с. 22-25.

21. Горбунов К.Ю. Об одной модели информационной зависимости в алгебраической геометрии // Электронный научный журнал «Информационные процессы» (http://www.jip.ru/). 2001, т. 1, № 2, с. 147-149.

22. Голубцов П.В., Старикова О.В. Калибровка инвариантных преобразователей информации // Электронный научный журнал «Информационные процессы» (http://www.jip.ru/). 2001, т. 1, № 1, с. 78-88.

23. Голубцов П.В., Старикова О.В. Редукция параметрически заданных инвариантных измерительно-вычислительных систем // Вестник Московского Университета, Сер. 3 "Физика и астрономия". 2001, № 6, с. 3-6.

24. Вьюгин В.В. О неустойчивости индивидуальной эргодической теоремы // Проблемы передачи информации. 2001, т. 37, № 2, с. 27-39.

25. V'yugin V.V. Most sequences are stochastic // Information and Computation. 2001, vol. 169, pp. 252-263.

26. Рыбаков М.Н. Операторы всеобщего и распределенного знания: дополнительные выразительные средства в логиках знания // Электронный научный журнал «Информационные процессы» (http://www.jip.ru/). 2001, т. 1, № 1, с. 89-98.

27. Chagrov A.V. All tabular generally post-complete extensions of k4 have interpolation property // Information Processes (http://www.jip.ru/). 2001, т. 1, № 1, с. 50-55.

28. Zakharyaschev M., Wolter F., Chagrov A. Advanced modal logic. – In Gabbay D.M., Guenthner F. (eds.). Handbook of philosophical logic, 2nd edition, vol. 3, Kluver academic publishers, 2001, pp. 83-266.

29. Shen A., Vereshchagin N. Logical operations and kolmogorov complexity // Electronical colloquium on computational complexity. 2001, 5 pp., Ftp://ftp.eccc.uni-trier.de/pub/eccc/reports/1002/tr01-088.

30. Vereshchagin N. Kolmogorov complexity conditional to large integers // Electronical colloquium on computational complexity. 2001, 11 pp., Ftp://ftp.eccc.uni-trier.de/pub/eccc/reports/1002/tr01-086.

31. Vereshchagin N. An enumerable undecidable set with low prefix complexity: a simplified proof /. Electronical colloquium on computational complexity. 2001, 4 pp., Ftp://ftp.eccc.uni-trier.de/pub/eccc/reports/1002/tr01-083

 

In print

 

1.      Вьюгин В.В., Гельфанд М.С., Любецкий В.А. Согласование деревьев: реконструкция эволюции видов по филогенетическим деревьям генов // Молекулярная биология, 25 с.

2.      Novichkov P.S., Gelfand M.S., Mironov A.A. Gene recognition in eukaryotic DNA by comparison of genomic sequences // Bioinformatics.

3.      Rodionov D.A., Mironov A.A, Gelfand M.S. Transcriptional regulation of pentose utilization systems in the Bacillus/Clostridium group of bacteria // FEMS Microbiol. Lett.

4.      Laikova O.N., Mironov A.A., Gelfand M.S. Transcriptional regulation of pentose utilization systems in gamma-proteobacteria // FEMS Microbiol. Lett.

5.      Panina E.M., Mironov A.A., Gelfand M.S. Comparative analysis of FUR regulons in gamma-proteobacteria // Nucleic Acids Res. 2001, vol. 29, no. 23.

6.      Ponomarenko J.V., Orlova G.V., Frolov A.S, Ponomarenko M.P., Gelfand M.S. SELEX_DB, a database on in vitro selected oligomers adapted for recognizing natural sites and for analyzing both SNPs and site-directed mutagenesis data // Nucleic Acids Res. 2002, vol. 30, no. 1.

7.      Permina E.A., Mironov A.A., Gelfand M.S. Damage-repair error-prone polymerase of bacteria: association with mobile genome elements // Gene. 2001 (accepted).

8.      Герасимова А.В., Родионов Д.А., Миронов А.А., Гельфанд М.С. Компьютерный анализ регуляторных сигналов в бактериальных геномах. Участки связывания Fnr. Молекулярная биология, 2001, в печати.

9.      Голубцов П.В, Старикова О.В. Учет инвариантности в задаче калибровки инвариантных измерительно-вычислительных систем. Математическое моделирование (в печати, выход февраль-март 2002 г.).

10. Садовская Н.С, Лайкова О.Н., Миронов А.А., Гельфанд М.С. Изучение регуляции метаболизма длинноцепочечных жирных кислот с использованием компьютерного анализа полных бактериальных геномов. Молекулярная биология, 2001 (в печати).

 

Conferences report

 

1.      Vitreschak A.A. Computer analysis of regulation of genes, encoding aminoacyltRNA synthetases and amino acid biosynthetic proteins in Gram positive bacteria: T-box RNA regulatory element. Prediction of regulation of new genes, including amino acid transporters. – In the abstracts of International School «Artifical Intelligence and Heuristic Methods for Bioinformatics», Italy, San-Miniato, October 1-11, 2001, p. 63.

2.      Gelfand M.S., Laikova O., Mironov A.A., Novichkov P.S., Panina E.M., Rodionov D.A., Vitreschak A.G. Comparative analysis of bacterial patterns. – In the abstracts of Meeting of HHMI International Research Scholars, Vancouver, Canada, June 20-23, 2001, p. 75.

3.      Голубцов П.В., Старикова О.В. Оптимальная калибровка инвариантных измерительно-вычислительных систем в случае параметрической информации о схеме измерения // VII конференция «Обратные и некорректно поставленные задачи», Москва, 2001, с. 19.

4.      Golubtsov P.V, Starikova O.V. Invariance Considerations in Problems of Synthesis and Optimal Calibration of Measurement Computer System // First SIAM-EMS Conference Applied Mathematics in our Changing World, Berlin 2001, Collection of Abstracts, p. 66.

5.      Чагров А.В. Логика, не являющаяся ни конечно-значной, ни бесконечно- значной // Труды научно-исследовательского семинара логического центра института философии РАН, вып. XIV, Москва, 2000, с. 59-67.

6.      Рыбаков М.В., Чагров А.В. Стандартные переводы неклассических формул и относительная разрешимость логик // Труды научно-исследовательского семинара логического центра института философии РАН, вып. XIV, Москва, 2000, c. 81-98.

7.      Чагров А.В. Алгоритмическая проблематика в неклассических пропозициональных логиках // Смирновские чтения, III международная конференция, Москва, ИФ РАН, 2001, с. 69-71.

8.      Чагров А.В., Чагрова Л.А. Первопорядковая определимость интуиционистских формул на конечных шкалах крипке: алгоритмический аспект // Смирновские чтения, III международная конференция, Москва, ИФ РАН, 2001, с. 71-75.

9.      Muchnik An., Vereshchagin N. Logical operations and Kolmogorov complexity // Proc. of 16th annual IEEE conference on computational complexity, Chicago, June 2001, pp. 256-265.