The method used to obtain the pronunciation representation is based on visualization, on a PC display, of information about the similarity between the polytonal wavelet modulation spectra of the current speech signal and the nearest etalon (reference pattern) from a stored set of pronunciation etalons.
The purpose of the research is to find optimal ways for a subject to interact with a training computer during computer-assisted pronunciation training in foreign languages. The interaction takes place through a multimedia virtual environment that allows simultaneous visualization of synthetic images of templates pronounced by announcers with a perfect command of the language under study, synthetic images of the sounds pronounced by the subject, and the instructions and recommendations produced by the training program.
Compared with conventional trial-and-error approaches, in which the learner tries to evoke current spectra (similar to sonograms) matching those of successful pronunciations of selected foreign words and phrases, the problems arising here are more complex: the most successful attempt must be chosen from the person-machine dialogue, both when training with templates and when imitating foreign speech from memory.
The first circle of technological problems is the choice of the senses to engage. The interface of a virtual environment need not limit man-machine interaction to sounds and displays of faces (as for lip reading); it can also appeal to vibratory or electrosomatic senses, or even to senses of the person that have not yet been investigated.
The second circle of technological problems is the choice of signal dimensions and of the codes for their successive discrete displays by transcoding. The range here is extremely wide: from arrays of binary codes of sample sequences, through phoneme and syllable units, down to gradations of significance.
In the proposed approach, the system uses polytonally modulated formant characteristics of a person's utterances to calculate parameters of that person's individual physiological articulation abilities. Templates synthesized in the multimedia environment will take these individual wavelet parameters into account, which should increase training performance and reduce computational expenditure.
The hardware for the first stage of experiments may require the manufacture of several unique mask designs with built-in converters between forms of information representation. The converters will be connected to a computer, which at this stage may also need to be fairly powerful.
At the second stage, after the results of the first stage have been processed, work will be directed toward preparing hardware and software for mass-market personal computers.

Walsh wavelet polytonal modulation spectra are filtered through a two-layer perceptron; some of its receptive fields are shown below. In these pictures the numbers are the serial numbers of the elements in a receptive field, and the encircled elements are those whose modulation products are collected. At the second perceptron layer, only ganglia that have collected more than their neighbors are marked with the number of the dominating receptive field; otherwise they are not marked with any integer.
[13]  14   15   16          13   14  [15] [16]
[ 9]  10   11   12           9  [10]  11   12
  5  [ 6]   7    8         [ 5]   6    7    8
  1    2  [ 3] [ 4]        [ 1]   2    3    4
1st receptive field        2nd receptive field

 13   14   15  [16]        [13] [14]  15   16
  9   10   11  [12]          9   10  [11]  12
  5    6  [ 7]   8           5    6    7  [ 8]
[ 1] [ 2]   3    4           1    2    3  [ 4]
3rd receptive field        4th receptive field

(Encircled elements are shown in square brackets.)
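The pooling and marking rule described above can be sketched as follows, under stated assumptions. Each second-layer unit is assumed to receive a 4×4 patch of non-negative modulation products numbered 1-16 as in the figures (1-4 bottom row, 13-16 top row); the four receptive fields are the encircled element sets as read from the figures; a field's response is assumed to be the plain sum of its elements; and a unit is marked with its dominating field number only if that sum exceeds a threshold and the sums of its left and right neighbors, otherwise it prints as "#". The names `FIELDS`, `field_sums`, `mark_row` and `patch_with`, the summation and the threshold comparison are hypothetical illustrations, not the package's actual code.

```python
# Hypothetical sketch of receptive-field pooling and second-layer marking.

# Elements 1..16: 1-4 bottom row, 13-16 top row (numbering as in the figures).
# Encircled (collected) elements of each field, as read from the figures.
FIELDS = {
    1: (13, 9, 6, 3, 4),
    2: (15, 16, 10, 5, 1),
    3: (16, 12, 7, 1, 2),
    4: (13, 14, 11, 8, 4),
}


def field_sums(patch):
    """Sum the modulation products collected by each receptive field.

    `patch` maps element number (1..16) to a non-negative value (assumed).
    """
    return {f: sum(patch[e] for e in elems) for f, elems in FIELDS.items()}


def mark_row(patches, threshold):
    """Return one display row: the dominating field number or '#' per unit.

    Assumed rule: a unit is marked only if its strongest field sum exceeds
    `threshold` and is strictly larger than that of each neighboring unit.
    """
    best = []
    for patch in patches:
        sums = field_sums(patch)
        field = max(sums, key=sums.get)  # dominating receptive field
        best.append((field, sums[field]))

    row = []
    for i, (field, value) in enumerate(best):
        neighbors = [best[j][1] for j in (i - 1, i + 1) if 0 <= j < len(best)]
        if value > threshold and all(value > n for n in neighbors):
            row.append(str(field))
        else:
            row.append("#")
    return "".join(row)


def patch_with(elements, value=1.0):
    """Hypothetical helper: a patch with `value` on the given elements, 0 elsewhere."""
    return {e: (value if e in elements else 0.0) for e in range(1, 17)}


if __name__ == "__main__":
    patches = [patch_with(()), patch_with(FIELDS[2]), patch_with(())]
    print(mark_row(patches, threshold=0.5))  # #2#
```

A unit whose activity is concentrated on the second field's elements wins over its silent neighbors and is printed as "2", while the silent units print as "#", matching the display convention described in the text.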
Two examples of modulation spectral representations processed in this way are shown below. The first represents the short Russian phrase [lena mila malinu] pronounced by a male speaker; the second, the short French phrase "maurice berce son enfant" pronounced by a female speaker. Cells whose collected modulation products are smaller than the threshold are marked "#". The upper line consists of intensities accumulated over different Walsh functions and different tones taken from the speaker's register. The lines in the middle show the numbers of the selected fields that exceed the corresponding thresholds, with "#" for those below threshold. Analogously, the lower part of the table represents modulation products collected from different tones.
Inten.  3321121133224223422321431##

        ###########################
Walsh   ###########################
        ###########################
        ##1######55##5#####1#######
        7#4#5577#127427##5#77#75###
        6###47#####2#######77##5###
        ###########################
Spect.  ###########################
        ###########################
        ###########################
        ###########################
        ###########################
        ###########################
        ###########################
High    ###########################
        ###########################
        ###########################
  T     ###########################
  O     #######3#2#################
  N     #########5#################
  E     ######3##1#################
  S     #####4#####################
        #####7###########5#########
        ###########################
Low     ###########################
        5##########################
        beg.        TIME        end

The choice of Walsh spectral representations is motivated not only by their ability to accelerate calculations but also by the ability of some of them to represent discontinuous acoustic events in speech, first described in ULB R.A.12/1 [1]. These methods of representing discontinuous acoustic events in speech are described in detail in the software package and in the patent [3,2].
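The remark that Walsh spectra accelerate calculations can be made concrete with the fast Walsh-Hadamard transform, which needs only additions and subtractions and O(N log N) operations for a frame of N = 2^k samples. This is a minimal sketch assuming natural (Hadamard) ordering and no normalization; the text does not say which ordering (sequency, Paley or Hadamard) or scaling the package uses, and `fwht` is an illustrative name.

```python
def fwht(frame):
    """Fast Walsh-Hadamard transform, natural (Hadamard) order, unnormalized.

    Uses only additions and subtractions; len(frame) must be a power of two.
    """
    a = list(frame)
    n = len(a)
    if n == 0 or n & (n - 1):
        raise ValueError("frame length must be a power of two")
    h = 1
    while h < n:
        # One butterfly stage: combine pairs of elements that are h apart.
        for start in range(0, n, 2 * h):
            for i in range(start, start + h):
                x, y = a[i], a[i + h]
                a[i], a[i + h] = x + y, x - y
        h *= 2
    return a


if __name__ == "__main__":
    print(fwht([1, 0, 1, 0, 0, 1, 1, 0]))  # [4, 2, 0, -2, 0, 2, 0, 2]
```

Applying the unnormalized transform twice returns the input multiplied by N, so the same routine also serves as its own inverse up to a scale factor.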
Images obtained after acoustic signal processing are compared with images formed earlier and saved in hard-disk memory. The nearest one, selected by features, is displayed together with the new one. This lets the package user evaluate his pronunciation samples by comparing them with the etalon ones.
In the second example the data representation is the same; only the tone range is shifted upward to match the speaker's higher voice.

Inten.  1#########1###############

        ##########################
Walsh   ##########################
        ##########################
        #####5##1#####1###########
        74###5#84#8527############
        ##264##377562#############
        ########2646##############
        ##########################
        ##########################
Spect.  ##5#######################
        4#5#######################
        742#######################

        ##########################
High    ##########################
        ##########################
  T     ##########################
  O     ##########################
  N     ##########################
  E     ##########################
  S     ##########################
        ##########################
        ##########################
        ##########################
        ##5#######################
        ##########################
Low     ##########3###############
        ######5##3#7##############
        2##3######26##############
        beg.       TIME        end

The Speaker Verification Package was used by the philologist H. A. Khatemlyanskaya in her research on the French consonant system. We attempted experimental phonetic research to determine the importance of the prosodic factor, because this factor influences the assimilation process. The experimental data indicate that our Speaker Verification Package could be used in complex experimental phonetic research, e.g. to establish what place the prosodic factor occupies in the assimilation process, or how variable the consonants are.
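The upward shift of the tone range for the higher voice in the second example can be illustrated by deriving the analysis tones from the speaker's register. This sketch assumes logarithmically spaced tones between a lower and an upper register limit; the limits used below and the names `tone_grid`, `f_low`, `f_high` and `n_tones` are hypothetical, since the text gives neither the number of tones nor their frequencies.

```python
from typing import List


def tone_grid(f_low: float, f_high: float, n_tones: int) -> List[float]:
    """Return n_tones frequencies in Hz, log-spaced from f_low up to f_high.

    Log spacing keeps a constant frequency ratio between adjacent tones,
    so shifting the register shifts every tone by the same ratio.
    """
    if not (0 < f_low < f_high) or n_tones < 2:
        raise ValueError("need 0 < f_low < f_high and n_tones >= 2")
    ratio = (f_high / f_low) ** (1.0 / (n_tones - 1))
    return [f_low * ratio ** k for k in range(n_tones)]


if __name__ == "__main__":
    male = tone_grid(80.0, 320.0, 5)     # hypothetical lower register
    female = tone_grid(160.0, 640.0, 5)  # same span, one octave higher
    print([round(f) for f in male])      # [80, 113, 160, 226, 320]
    print([round(f) for f in female])    # [160, 226, 320, 453, 640]
```

With log spacing, moving the register up by an octave doubles every analysis tone while the display keeps the same number of tone rows, which matches the statement that only the tone range changes between the two examples.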
In the detailed planning of these stages, the experience of French researchers on cognitive attitudes in problem-solving situations will also be used. Their research was carried out last year at the Institut de la communication parlée in Grenoble (Les Cahiers de l'ICP, Rapport de Recherche N 3, "Attitudes cognitives et actes de langage en situation de communication homme / machine", Jean Caelen, Anne-Lise Frechet, pp. 137-152). The results of Swedish researchers (Björn Granström et al., Royal Institute of Technology (KTH), STL-QPSR, Oct. 15, 1994, pp. 93-111) will also be used.
It should be noted, however, that the stability of perception and pre-recognition of these intermediate messages, generated as the threshold levels of the transmitted signal modulations are continuously decreased, should serve as the basis for choosing the signal dimensions of the intermediate displays. Such displays are suitable for constructing messages to be recognized, as observed in large-scale listening experiments with the speech of a single announcer presented against an intense noise background. This choice corresponds to N. N. Lange's law of perception, already used successfully in processing optical information with fixation of pre-recognition images.
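One way to make the stability of intermediate messages under decreasing thresholds measurable is sketched below: a single row of collected modulation products is rendered at a descending series of thresholds, and for each level the fraction of cells that keep their mark since the previous level is reported. Treating each cell as simply above or below threshold, the threshold values and this agreement measure are illustrative assumptions, not the authors' procedure; `render` and `stability_profile` are hypothetical names.

```python
from typing import List, Sequence, Tuple


def render(values: Sequence[float], labels: Sequence[int], threshold: float) -> str:
    """Display one row: the field label where value >= threshold, otherwise '#'."""
    return "".join(str(lab) if v >= threshold else "#" for v, lab in zip(values, labels))


def stability_profile(
    values: Sequence[float], labels: Sequence[int], thresholds: Sequence[float]
) -> List[Tuple[float, str, float]]:
    """Render the row at each threshold (highest first) and report, per level,
    the fraction of cells unchanged since the previous level."""
    profile = []
    previous = None
    for t in sorted(thresholds, reverse=True):
        row = render(values, labels, t)
        if previous is None or not row:
            agreement = 1.0
        else:
            agreement = sum(a == b for a, b in zip(row, previous)) / len(row)
        profile.append((t, row, agreement))
        previous = row
    return profile


if __name__ == "__main__":
    values = [0.9, 0.1, 0.6, 0.3, 0.8]  # hypothetical collected products
    labels = [5, 7, 2, 4, 1]            # hypothetical dominating field numbers
    for t, row, agreement in stability_profile(values, labels, [0.7, 0.5, 0.2]):
        print(f"{t:.1f} {row} {agreement:.2f}")
    # 0.7 5###1 1.00
    # 0.5 5#2#1 0.80
    # 0.2 5#241 0.80
```

Strong products appear first and stay marked as the threshold falls, while weaker ones appear later; a level at which agreement drops sharply would be a candidate boundary for choosing display dimensions, under this sketch's assumptions.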
The learning of a speaking skill can be modelled in multimedia as a cognitive-psychology event by specifying how neural tissue carries out computations while students attempt to imitate the pronunciation of a short phrase acoustically reproduced by the computer as its template. Visual cues of a student's pronunciation, presented on the display screen together with the template features, are an important factor in judging the perceived similarity of pronounced sentences to their templates, using graphics of polytonally modulated Walsh wavelets. The problem to be investigated is how to find the best cue for the students' optic, acoustic and articulatory activity in memorizing during training and in recalling during examination.
The research on a cognitive multimedia model for computer-assisted learning of a speaking skill is planned to be carried out in the Laboratory of Data Analysis, Error-Correcting Codes and Cryptology of IPPI RAN together with researchers from Moscow Pedagogical University, and this research is open to scientific cooperation with Western researchers.
References