Sector 1.1
Sector of Computer Logic in Information
Processes
Head of the Sector – Dr.Sc.
(Mathematics), Prof. Vassily Lyubetsky
Tel.: (095) 299-83-54, (095)
413-46-43; E-mail: lyubetsk@iitp.ru
The leading researchers of the
laboratory include:
Dr.Sc. (Math.) |
A. Chagrov |
Dr.Sc. (Math.) |
N. Vereshchagin |
Dr.Sc. (Biol.) |
M. Gelfand |
Dr. |
V. V'yugin |
Dr.Sc. (Math.) |
P. Golubtsov |
Dr. |
K. Gorbunov |
DIRECTIONS OF ACTIVITY:
·
construction of algorithms of
acceptable complexity (quadratic or close) for analysis of regulation on the
DNA and RNA levels: search for regulatory signals and given patterns in genomic
sequences, construction of conservative and alternative secondary structures.
On this basis application of these algorithms for analysis of regulatory sites
in complex interacting regulatory systems.
·
construction of algorithms of
acceptable complexity (quadratic or close) for construction of reconciled
phylogenetic trees and application these algorithms for construction of a
species tree and analysis of evolutionary events on the level of genes.
· investigation the concept of information interaction, effective algebraic description of objects on the basis of the theories: model completeness, modal logic, information transformers categories, algorithmic complexity and randomness.
The previously suggested algorithm for identification
of regulatory signals (binding sites of transcription factors) in bacterial
genomes was optimized and parallelized. The algorithm accepts as input a sample
of genome fragments (strings in the four-letter alphabet) believed to contain
such signals (that is, a collection of candidate regulatory regions) and
constructs the distribution of scores measuring the likelihood that a word
representsed
the desired signal. This is a classical and yet unresolved problems of the computational molecular biology. It
gained additional importance during last years after commencing of global
experiments on the analysis of genome expression and publication of complete
genomes of many bacteria and some eukaryotes, which made possible the
comparative analysis of regulation processes. The algorithm was implemented on
a multiprocessor (cluster) system. This allowed for processing samples
containing millions of nuclotides, which is not possible with a sequential
algorithm.
A previously suggested algorithm (of approximately
linear complexity) for finding alternative RNA secondary structures (pairs of
mutually exclusive hairpins) was improved. Computer programs for prediction of
alternative and conserved RNA structures were developed.
An earlier developed algorithm (of approximately
linear complexity) for construction of a reconciled tree («consensus» or
averaged tree) given a set of protein family trees. This consensus tree
represents a phylogenetic tree for the genomes encoding the corresponding
proteins. This is a traditional problem in the theory of molecular evolution.
The algorithm was implemented and applied to analysis of several species, and
also groups of evolution trees of mitochondrial proteins and clusters of orthologous
genes.
In the continued collaboration with
Integrated Genomics, Inc., several specific regulatory systems in bacterial
genomes were analyzed. Specifically, the following regulatory systems have been
analyzed: arginine regulon in many taxa, multiple drug resistance systems in
enteric bacteria, biosynthesis of aromatic amino acids and utilization of
pentoses, iron utilization (FUR) and FNR-regulons in gamma-proteobacteria,
SOS-response and utilization of pentoses in Gram-positive bacteria. This
resulted in identification of a large number of new regulated genes and in some
cases in identification of new regulatory signals.
In the same collaboration, the work
on creation of algorithms for gene recognition in eukaryotes was continued.
Programs for gene recognition in the presence of sequencing errors and for gene
recognition by comparison of genomic sequences were developed. New cases of
brain-specific splicing were analyzed and corresponding regulatory signals were
predicted.
A program searching for RNA
secondary structures given a pattern was developed. It was used to identify
regulatory sites (RFN-elements) upstream of riboflavin synthesis and transport
genes in many bacterial genomes. A new mechanism for riboflavin regulation and
riboflavin transport was suggested.
Regulatory RNA
secondary structures (T-boxes) were predicted upstream of genes encoding
aa-tRNA-synthetases and amino acid biosynthesis enzymes in Gram-positive
bacteria. The family of T-box-regulated genes was extended.
Metabolism of
fatty acids was studied in Escherichia
coli, Haemophilus influenza, Vibrio cholerae, Yersinia pestis. FadR was
shown to regulate all stages of degradation of long-chain fatty acids and
partially biosynthesis of fatty acids. The acyl-CoA-dehydrogenase gene yafH was shown to be identical to the
gene described earlier as fadE. New
members of the FadR-regulon were found.
The algorithms for signal
identification was successfully applied for identification of palindromic 20 bp
signal of purine regulation in Р.
horikoshii, and the trehalose regulator signal, as well as non-palindromic
signals for MarX in Е. coli and CggR
signals in a group pf related bacteria В.
anthracis, В. subtilis, В. halodurans. Orthologous sets of genes
in the E. coli and B. subtilis groups were analyzed; the
results were compared with the existing collections DPInteract and DBTBS. New
signals were given to biologists for expaerimental
verification.
The previously suggested algorithm
for analysis of overlapping RNA stem-loops was improved. Such alternative
structures serve as regulatory signals of transcription and translation. Such
parameters as energy of specific base pairs and the lengths of overlaps were
taken into account. The new version is based on creation of a list of
stem-loops (filtered by the free energy) and consequent formation of overlaps.
In a number of biologically relevant test cases the obtained co-ordinates of
loops were shown to match experimental data with high reliability. These
results are used for identification of biologically relevant overlaps.
An algorithm for identification of
conserved RNA structures was developed and implemented in software. It is based
on alignment of hairpins for each pair of fragments in the sample. It allows
for filtering of spurious hairpins. A specific version for the cloverleaf
conformation was successfully tested on the tRNAs of E. coli.
The algorithms
of phylogenetic analysis of genomes based on individual gene trees were
modified for the analysis of horizontal transfer of genes. This version is
based on the analysis of differences between the reconciled tree and the
individual gene trees. The program implementing this algorithm was applied to
the analysis of 42 bacterial genomes represented in 128 clusters of orthologous
genes from the COG database (National Center of Biotechnology Information, USA)
The properties of the notion of
«knowledge» were described by using formal logical means. To this end the first
order language was enriched by the operator k(i,s)
interpreted as «anthe agent i knows a fact s». It was proven that various theories obtained in this way are
finitely axiomatizable and their complexity is the same as for that of the classical propositional
calculus.
A way to describe informational interactions based on the formalism of a logic of knowledge (a version of the polymodal logic) was found. Minimal models (in the predicative and individual senses) of the obtained logic were found. An example of a minimal model is a model where an agent of an informational interaction is represented by a structured family of pairs «a model – a bunch of statements». One can describe in this terms several aspects of some specific kinds of informational interaction in the course of a child mental development, oral communications, etc.
The problem of construction of optimal strategies for games in stochastic dynamical systems with discrete time was investigated. This problem lies in the junction of nonantagonistic game theory and optimal control theory. Special attention was given to the cases of incomplete, or even, asymmetric information about the system. It was shown that additional information can have a negative impact on the game results. An algorithm of solving the problem was suggested and implemented in computer program. On the basis of this program a series of numerical simulations were performed (in particular, for the problem of marine fishery).
There where developed methods for construction of an optimum measurement-computer system, based on an imprecisely specified invariant measurement system, for the cases continuous-discrete, or parametrically specified measurement system. Methods of optimal calibration of these kind of invariant measurement systems were studied.
Study of general properties of categories of information transformers (ITs) was continued. It was shown that every category of ITs is a monoidal category of a special kind. Many basic categories of ITs, e.g., categories of stochastic, fuzzy, multivalued ITs have the structure of Kleisly category. The key concept in construction of such categories is a functor T that takes an object A to an object TA of «all distributions on A». In this connection the problem of endowing of a Kleisly category with a structure of monoidal category was studied. It was shown that sufficient conditions for this are existence of a natural transformation which takes a pair of «distributions» to an «independent joint distribution».
The enrichment of the first order language of the field of real numbers by a unary functional symbol f was studied. The goal was to solve a known problem of separation by a formula of this language of two given families of functions (the formula should be true when a function from the first family is substituted for f, and false when a function from the second family is substituted). It was proven that for any natural i, the family of functions having the derivative of the i-th order is separable from its complement. A conjecture was made that the family of all polynomials is inseparable from the family of all functions «glued» using infinitely many different polynomials, and from its complement in the class of functions glued from finitely many polynomials. Several particular cases of this conjecture were solved.
It was found that many NP-complete modal logics, even some non-local-tabular ones, become P-complete after bounding the number of propositional variables being used, i.e., «behave» like the classical propositional logic: for them the condition of satisfiability can be verified by a non-enumeration method. Among these logics there is an important logic S4.3.
A new inequality for Shannon entropy of discrete random variables was found (all the result listed below are joint with some mathematicians from Moscow Lomonosov University and from abroad).
A new proof of the Ahlswede-Gacs-Koerner theorem on the absence of common information it the outcomes of two correlated information sources was found.
It was proven that any propositional formula such that for any arithmetical formulas substituted for its variables there is a realization (in Kleene's sense) of the resulting formula whose complexity is a little «o» of the sum of the minimal complexities of substituted formulas, is derivable in the Yankov's logic (the converse is true also).
It was proven that given a program enumerating a small set of binary strings of the same length one can found a small family of betting strategies such that if the sequence of coin tossing outcomes belongs to the given set then at least one strategy wins a significant amount.
A description of all possible shapes of A. N. Kolmogorov structure functions was found, a key result in the algorithmic statistics.
It was shown that given the set of random strings as an oracle one can find in time polynomial in n a random string of length n.
Some results of instability of several ergodic theorems with respect to small growth of the randomness deficiency were proven.
The department members participated in the following symposiums:
- Howard Hughes Medical Institute, 2001 Meeting of International Research Scholars, Vancouver, June 2001 (invited talk);
- NATO advanced studies institute, Artificial intelligence and heuristic methods for bioinformatics, San Miniato, Italy, October 2001 (invited talk);
- The III international conference «Control and complex systems modeling problems», Samara-2001 (invited talk);
- The III international Smirnov memorial conference, Moscow, 2001 (invited talk);
- The VI all-Russia scientific meeting «Modern logic: problems of theory, history and application in science'', Saint-Petersburg, June 22--24, 2000 (invited talk);
- XVI Annual IEEE conference on Computational Complexity, Chicago, June 2001;
- XVIII Annual Symposium on Theoretical Aspects of Computer Sciences, Dresden, February 2001 (a Program Committee member);
- An international meeting on occasion of 100th anniversary of academician P. S. Novikov, Moscow, August 27-31, 2001 (invited talks).
The department members visited several universities and scientific centers to give courses and to collaborate (Lawrence Berkeley Laboratory, University of California-San Diego, University of Montana, Marseille, Amsterdam, Wien, Athens).
1.
Данилова Л.В.,
Горбунов К.Ю., Гельфанд М.С., Любецкий В.А. Алгоритм выделения регуляторных
сигналов в последовательностях ДНК (1) // Электронный научный журнал
«Информационные процессы» (http://www.jip.ru/).
2001, т. 1, № 1, с. 56-63.
2.
Гельфанд М.С., Вьюгин
В.В., Любецкий В.А. Об одном способе построения деревьев эволюции видов по
множественным генетическим данным // Электронный научный журнал «Информационные
процессы» (http://www.jip.ru/). 2001, т. 1, № 1, с. 64-77.
3.
Данилова Л.В.,
Горбунов К.Ю., Гельфанд М.С., Любецкий В.А. Алгоритм выделения регуляторных
сигналов в последовательностях ДНК (2) // Молекулярная биология. 2001, т. 35, №
6, с. 987-995.
4.
Вьюгин В.В., Любецкий
В.А. Об одном алгоритме поиска горизонтального переноса гена на основе
филогенетических деревьев белков // Электронный научный журнал «Информационные
процессы» (http://www.jip.ru/). 2001, т. 1, № 2, с. 167-177.
5.
Любецкий В.А.,
Селиверстов А.В. Об одном вероятностном алгоритме решения NP-полной проблемы //
Труды III международной конференции «Смирновские чтения», РАН, Москва, 2001, с.
47-49.
6.
Данилова Л.В.,
Любецкий В.А. Алгоритм выделения регуляторных сигналов: тестирование и
биологические применения // Труды III международной конференции «Проблемы
управления и моделирования в сложных системах», Самара, РАН, 2001, с. 632-634.
7.
Кузнецов Н.А.,
Любецкий В.А., Чернавский А.В. К вопросу о понятии информационного
взаимодействия, 2: доречевой интеллект // Труды III международной конференции
«Проблемы управления и моделирования в сложных системах», Самара, РАН, 2001, с.
25-42.
8.
Горбунов И.А., Рыбаков М.Н. Выразительность
операторов знания и возможность эффективного описания логик знания // Труды III
международной конференции «Проблемы управления и моделирования в сложных
системах», Самара, РАН, 2001, с. 617-622.
9.
Голубцов П.В.,
Старикова О.В. Инвариантность и калибровка измерительно-вычислительных систем в
непрерывно-дискретном случае // Труды III международной конференции «Проблемы
управления и моделирования в сложных системах», Самара, РАН, 2001, с. 635-641.
10. Витрещак А.Г., Гельфанд М.С. Предсказание вторичной
структуры РНК. Программа RNApattern – поиск вторичной структуры по паттерну //
Труды III международной конференции «Проблемы управления и моделирования в сложных
системах», Самара, РАН, 2001, с. 623-625.
11. Чагров А.В. К логическим проблемам семантических аспектов
информационного взаимодействия // Труды III международной конференции «Проблемы
управления и моделирования в сложных системах», Самара, РАН, 2001, с. 611-616.
12. Горбунов К.Ю, Любецкая Е.В, Любецкий В.А. О двух алгоритмах
поиска альтернативной вторичной структуры РНК // Электронный научный журнал
«Информационные процессы» (http://www.jip.ru/). 2001, т. 1, № 2, с. 178-187.
13. Mironov A.A, Novichkov P.S., Gelfand
M.S. Pro-Frame: similarity-based gene recognition in eukaryotic DNA sequences
with errors // Bioinformatics. 2001, vol. 17, no. 1, pp. 13-15.
14. Rodionov D.A., Gelfand M.S., Mironov
A.A., Rakhmaninova A.B. Comparative approach to analysis of regulation in complete
genomes: multidrug resistance systems in gamma-proteobacteria // Journal of
Molecular Microbiology and Biotechnology. 2001, vol. 3, no. 2, pp. 319-324.
15. Makarova K.S., Mironov A.A., Gelfand
M.S. Conservation of the arginine repressor DNA binding signal in all bacterial
lineages // Genome Biology. 2001, vol. 2, no. 4, pp. research 0013.1-0013.8.
16. Panina E.M., Vitreschak A.G.,
Mironov A.A., Gelfand M.S. Regulation of aromatic amino acid biosynthesis in
gamma-proteobacteria // Journal of Molecular Microbiology and Biotechnology.
2001, vol. 3, no. 4, pp. 529-543.
17. Brudno M., Gelfand M.S., Spengler
S., Zorn M., Dubchak I., Conboy J.G. Computational analysis of candidate intron
regulatory elements for tissue-specific alternative pre-mRNA splicing //
Nucleic Acids Research, 2001, vol. 29, no. 11, pp. 2338-2348.
18. Боринская С.А., Гельфанд М.С., Миронов А.А. Компьютерная
геномика: в поисках генов // Химия и жизнь. 2001, № 2, с. 36-40.
19. Гельфанд М.С. Биология in silico // Компьютерра. 2001, №
36(413), с. 20-22.
20. Гельфанд М.С. Аннотация геномов – от последовательности к
функции // Компьютерра. 2001, № 36(413), с. 22-25.
21. Горбунов К.Ю. Об одной модели информационной зависимости в
алгебраической геометрии // Электронный научный журнал «Информационные процессы»
(http://www.jip.ru/). 2001, т. 1, № 2, с.
147-149.
22. Голубцов П.В., Старикова О.В. Калибровка инвариантных
преобразователей информации // Электронный научный журнал «Информационные процессы»
(http://www.jip.ru/). 2001, т. 1, № 1, с.
78-88.
23.
Голубцов П.В., Старикова О.В. Редукция параметрически заданных инвариантных
измерительно-вычислительных систем // Вестник Московского Университета, Сер. 3
"Физика и астрономия". 2001, № 6, с. 3-6.
24. Вьюгин В.В. О неустойчивости индивидуальной эргодической
теоремы // Проблемы передачи информации. 2001, т. 37, № 2, с. 27-39.
25. V'yugin V.V.
Most sequences are stochastic // Information and Computation. 2001, vol. 169, pp. 252-263.
26. Рыбаков М.Н. Операторы всеобщего и распределенного знания:
дополнительные выразительные средства в логиках знания // Электронный научный
журнал «Информационные процессы» (http://www.jip.ru/).
2001, т. 1, № 1, с. 89-98.
27. Chagrov A.V. All tabular generally
post-complete extensions of k4 have interpolation property // Information
Processes (http://www.jip.ru/). 2001, т. 1, № 1, с. 50-55.
28. Zakharyaschev M., Wolter F., Chagrov
A. Advanced modal logic. – In Gabbay D.M., Guenthner F. (eds.). Handbook of
philosophical logic, 2nd edition, vol. 3, Kluver academic publishers, 2001, pp.
83-266.
29. Shen A., Vereshchagin N. Logical
operations and kolmogorov complexity // Electronical colloquium on
computational complexity. 2001, 5 pp., Ftp://ftp.eccc.uni-trier.de/pub/eccc/reports/1002/tr01-088.
30. Vereshchagin N. Kolmogorov
complexity conditional to large integers // Electronical colloquium on
computational complexity. 2001, 11 pp., Ftp://ftp.eccc.uni-trier.de/pub/eccc/reports/1002/tr01-086.
31. Vereshchagin N. An enumerable
undecidable set with low prefix complexity: a simplified proof /. Electronical
colloquium on computational complexity. 2001, 4 pp., Ftp://ftp.eccc.uni-trier.de/pub/eccc/reports/1002/tr01-083
1.
Вьюгин В.В., Гельфанд
М.С., Любецкий В.А. Согласование деревьев: реконструкция эволюции видов по
филогенетическим деревьям генов // Молекулярная биология, 25 с.
2. Novichkov P.S.,
Gelfand M.S., Mironov A.A. Gene recognition in eukaryotic DNA by comparison of genomic
sequences // Bioinformatics.
3. Rodionov
D.A., Mironov A.A, Gelfand M.S. Transcriptional regulation of pentose
utilization systems in the Bacillus/Clostridium group of bacteria // FEMS Microbiol.
Lett.
4. Laikova
O.N., Mironov A.A., Gelfand M.S. Transcriptional regulation of pentose utilization
systems in gamma-proteobacteria // FEMS Microbiol. Lett.
5.
Panina E.M., Mironov A.A., Gelfand M.S. Comparative analysis
of FUR regulons in gamma-proteobacteria // Nucleic Acids Res. 2001, vol. 29,
no. 23.
6. Ponomarenko
J.V., Orlova G.V., Frolov A.S, Ponomarenko M.P., Gelfand M.S. SELEX_DB, a
database on in vitro selected oligomers adapted for recognizing natural sites
and for analyzing both SNPs and site-directed mutagenesis data // Nucleic Acids
Res. 2002, vol. 30, no. 1.
7. Permina
E.A., Mironov A.A., Gelfand M.S. Damage-repair error-prone polymerase of
bacteria: association with mobile genome elements // Gene. 2001 (accepted).
8.
Герасимова А.В.,
Родионов Д.А., Миронов А.А., Гельфанд М.С. Компьютерный анализ регуляторных
сигналов в бактериальных геномах. Участки связывания Fnr. Молекулярная
биология, 2001, в печати.
9.
Голубцов П.В, Старикова О.В. Учет инвариантности в задаче калибровки
инвариантных измерительно-вычислительных систем. Математическое моделирование
(в печати, выход февраль-март 2002 г.).
10. Садовская Н.С, Лайкова О.Н., Миронов А.А., Гельфанд М.С.
Изучение регуляции метаболизма длинноцепочечных жирных кислот с использованием
компьютерного анализа полных бактериальных геномов. Молекулярная биология, 2001
(в печати).
1. Vitreschak A.A.
Computer analysis of regulation of genes, encoding aminoacyltRNA synthetases
and amino acid biosynthetic proteins in Gram positive bacteria: T-box RNA
regulatory element. Prediction of regulation of new genes, including amino acid
transporters. – In the abstracts of International School «Artifical Intelligence
and Heuristic Methods for Bioinformatics», Italy, San-Miniato, October 1-11,
2001, p. 63.
2. Gelfand M.S.,
Laikova O., Mironov A.A., Novichkov P.S., Panina E.M., Rodionov D.A.,
Vitreschak A.G. Comparative analysis of bacterial patterns. – In the abstracts
of Meeting of HHMI International Research Scholars, Vancouver, Canada, June
20-23, 2001, p. 75.
3.
Голубцов П.В., Старикова О.В. Оптимальная калибровка инвариантных
измерительно-вычислительных систем в случае параметрической информации о схеме
измерения // VII конференция «Обратные и некорректно поставленные задачи», Москва,
2001, с. 19.
4.
Golubtsov P.V,
Starikova O.V. Invariance Considerations in Problems of Synthesis and Optimal
Calibration of Measurement Computer System // First SIAM-EMS Conference Applied
Mathematics in our Changing World, Berlin 2001, Collection of Abstracts, p. 66.
5.
Чагров А.В. Логика,
не являющаяся ни конечно-значной, ни бесконечно- значной // Труды
научно-исследовательского семинара логического центра института философии РАН,
вып. XIV, Москва, 2000, с. 59-67.
6.
Рыбаков М.В., Чагров
А.В. Стандартные переводы неклассических формул и относительная разрешимость
логик // Труды научно-исследовательского семинара логического центра института
философии РАН, вып. XIV, Москва, 2000, c. 81-98.
7.
Чагров А.В.
Алгоритмическая проблематика в неклассических пропозициональных логиках //
Смирновские чтения, III международная конференция, Москва, ИФ РАН, 2001, с.
69-71.
8.
Чагров А.В., Чагрова
Л.А. Первопорядковая определимость интуиционистских формул на конечных шкалах
крипке: алгоритмический аспект // Смирновские чтения, III международная конференция,
Москва, ИФ РАН, 2001, с. 71-75.
9. Muchnik An.,
Vereshchagin N. Logical operations and Kolmogorov complexity // Proc. of 16th
annual IEEE conference on computational complexity, Chicago, June 2001, pp.
256-265.