Jakub Gałka, PhD (Eng)
Department of Electronics, AGH, pav. D8, r. 626
e-mail: jgalka agh.edu.pl, phone: +48 12 6173639
Topics of interest
Projects
PhD Thesis
Optimization of signal parameterization for Polish speech recognition, Kraków 2008
Abstract This work presents the use of the discrete wavelet transform for non-uniform segmentation and parameterization of the Polish speech signal in recognition systems. Non-uniform segmentation is used to extract acoustically uniform sub-word units from speech. Both proposed segmentation methods are based on discrete wavelet spectral analysis. The first extracts a spectral rate-of-change function, which is then used to detect segment borders. The second extracts an event-detection function from the discrete wavelet spectrum of the speech signal; the detected events indicate segment borders. Signal parameterization is performed with discrete wavelet decomposition. The best decomposition basis (tree) is selected with a new Mean Best Basis algorithm and a newly designed cost function that measures the concentration of the wavelet-cosine spectrum. The efficacy of the proposed methods was evaluated with a phone-recognition system and the Polish speech corpus Corpora’97. Segment insertion and deletion rates and the phone recognition rate (PRR) were the main efficacy indicators. Results were obtained with k-NN and HMM classifiers.
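The first segmentation idea in the abstract (a spectral rate-of-change function computed from a discrete wavelet spectrum, with its peaks taken as segment borders) can be illustrated roughly as follows. This is only a minimal sketch and not the method from the thesis: the Haar wavelet, the fixed frame length, the log-energy band spectrum, the Euclidean distance between neighbouring frames, the threshold value and all function names are assumptions made for illustration.

```python
import math


def haar_band_energies(frame, levels):
    """Energy of each Haar detail band (finest first), then of the final approximation."""
    approx = list(frame)
    energies = []
    for _ in range(levels):
        half = len(approx) // 2
        # One Haar analysis step: pairwise differences (detail) and sums (approximation).
        detail = [(approx[2 * i] - approx[2 * i + 1]) / math.sqrt(2) for i in range(half)]
        approx = [(approx[2 * i] + approx[2 * i + 1]) / math.sqrt(2) for i in range(half)]
        energies.append(sum(d * d for d in detail))
    energies.append(sum(a * a for a in approx))
    return energies


def spectral_rate_of_change(signal, frame_len=64, levels=4):
    """Distance between log band-energy spectra of consecutive non-overlapping frames."""
    frames = [signal[i:i + frame_len]
              for i in range(0, len(signal) - frame_len + 1, frame_len)]
    # Small floor (assumed) keeps the logarithm finite for empty bands.
    spectra = [[math.log(e + 1e-12) for e in haar_band_energies(f, levels)]
               for f in frames]
    return [math.sqrt(sum((a - b) ** 2 for a, b in zip(cur, prev)))
            for prev, cur in zip(spectra, spectra[1:])]


def detect_borders(roc, threshold):
    """Frame indices where the rate-of-change has a local peak above the threshold."""
    borders = []
    for k, value in enumerate(roc):
        left = roc[k - 1] if k > 0 else -math.inf
        right = roc[k + 1] if k + 1 < len(roc) else -math.inf
        if value > threshold and value >= left and value > right:
            borders.append(k + 1)  # roc[k] compares frame k with frame k + 1
    return borders


# Synthetic test signal (assumed): four frames of a slow sine, then four of a fast one.
FRAME = 64
low = [math.sin(2 * math.pi * (n % 64) / 64) for n in range(4 * FRAME)]
high = [math.sin(2 * math.pi * (n % 4) / 4) for n in range(4 * FRAME)]
signal = low + high

roc = spectral_rate_of_change(signal, frame_len=FRAME, levels=4)
borders = detect_borders(roc, threshold=1.0)
print(borders)
```

On this synthetic signal the only border is reported at frame 4, i.e. at sample 256, where the slow sine is replaced by the fast one. Real speech would need overlapping frames, smoothing of the rate-of-change function and a tuned threshold, none of which are shown here.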
Publications | Papers (2001-2010)
Publications listed by AGH Library
1. J. Gałka, Analiza niektórych parametrów głosu ludzkiego oraz ich przydatności do automatycznego rozpoznawania osób, In XXXVIII Sesja studenckich kół naukowych pionu hutniczego Akademii Górniczo-Hutniczej, Kraków, 2001
2. M. Ziółko, M. Kępiński, J. Gałka, Wavelet-Fourier analysis of speech signal, In Proceedings of the „Workshop on multimedia communications and services”, Kielce, 2003
3. M. Kępiński, J. Gałka, M. Ziółko, Speech Signals in Wavelet-Fourier Domain, In Proceedings of SASRTL Workshop, Szczyrk, 2003
Abstract The wavelet-Fourier transform is introduced as a new approach to the representation and analysis of dynamically changing signals, especially speech. It delivers a global characteristic of local frequency changes. This representation is used to find similarities between phonemes. To explore the behavior of the wavelet-Fourier transform, four strongly different Polish phonemes were chosen, namely E, K, Ś and Ż. Each phoneme was represented several times and by three different persons. This makes it possible to analyze almost all cases in which a particular phoneme may or may not resemble another phoneme in the wavelet-Fourier domain. All phonemes were extracted using a segmentation algorithm based on a temporary wavelet power function, applied to real speech signals. full text PDF
4. J. Gałka, M. Kępiński, Wavelet-Fourier Spectrum Parametrisation For Speech Signal Recognition, In Proceedings of the Tenth National Conference on „Application of Mathematics in biology and medicine”, Święty Krzyż, 2004
Abstract Automated speech recognition is still an open problem. Few working, efficient solutions have been created so far, especially for Polish and other non-English languages. This article presents a way to obtain distinctive parameters of the wavelet-Fourier spectrum of the speech signal. These parameters may help to create an efficient phoneme-recognising algorithm for Polish and other languages. full text PDF
5. J. Gałka, M. Kępiński, WFT - Context-Sensitive Speech Signal representation, In Proceedings of the Intelligent Information Systems Conference, Ustroń, 2006; Advances in Soft Computing, Springer Verlag, 2006
Abstract Progress in the development of automatic speech recognition (ASR) systems is made, inter alia, by using signal representations sensitive to more and more sophisticated features. This paper is an overview of our investigation of a new context-sensitive speech signal representation based on the wavelet-Fourier transform (WFT), and a proposal of its quality measures. The paper is divided into 5 sections, introducing in turn: phonetic-acoustic contextuality in speech, basics of WFT, the WFT speech signal feature space, feature space quality measures and, finally, the conclusion of our achievements. full text PDF(Springer) | full text PDF(Google) | send me
6. J. Gałka, Distance Measures for Wavelet representation of Speech, In Proceedings of the Twelfth National Conference on „Application of Mathematics in biology and medicine”, KKZMBM XII, Koninki, 2006
Abstract The dyadic scheme of wavelet signal decomposition leads to a specific division of frequency bands. It is comparable to the mel-frequency division and may be used for effective parameterization of the speech signal in recognition systems, speech coding or other speech-signal-based applications. This paper discusses the efficiency of different spectral distance measures applied to wavelet-parameterized speech. The presented methods are designed for use in the isolated phoneme recognition task. full text PDF
7. B. Ziółko, J. Gałka, S. Manandhar, R. C. Wilson, M. Ziółko, The use of statistics of Polish phonemes in speech recognition, Speech Signal Annotation, Processing and Synthesis SSAPS2006, Poznań, 2006; Speech and Language Technology ed PTFON, IX, 2007
Abstract Statistical data on phonemes, useful in continuous speech recognition system, are presented. This paper explains basics of a simple system for phonemes, diphones and triphones statistics estimation from a text corpus of Polish language. Obtained results are presented for exemplar text database. Possible application of the statistics is suggested. full text PDF(1) | full text PDF(2)
8. B. Ziółko, J. Gałka, S. Manandhar, R. C. Wilson, M. Ziółko, Triphone Statistics for Polish Language, In Speech and Language Technology ed PTFON, IX, 2007
Abstract The Polish text corpus was analysed to find information about phoneme statistics. We were especially interested in triphones as they are commonly used in many speech processing applications like the HTK speech recogniser. An attempt to create the full list of triphones for the Polish language is presented. A vast amount of phonetically transcribed text was analysed to obtain the frequency of triphone occurrences. A distribution of the frequency of triphone occurrences and other phenomena are presented. The standard phonetic alphabet for Polish and methods of providing phonetic transcriptions are described. full text PDF(1) | full text PDF(2) | BibTeX
9. J. Gałka, M. Dyrek, B. Ziółko, Measures on Wavelet Segmentation of Speech, In Proceedings of The 8th WSEAS International Conference On Multimedia Systems And Signal Processing, International Journal Of Circuits, Systems And Signal Processing, NAUN 2008
Abstract Speech segmentation is widely used in many speech applications. We propose a new wavelet-based extension of the typical spectrum-based non-uniform speech segmentation methods. The use of wavelets improves computation performance and provides easy and flexible adjusting of algorithm parameters. Segmentation accuracy measures are introduced and applied for evaluation as well. full text PDF(NAUN) | full text PDF, session chairman
10. J. Gałka, B. Ziółko, Study of Performance Evaluation Methods for Non-Uniform Speech Segmentation, In Proceedings of The 8th WSEAS International Conference On Multimedia Systems And Signal Processing, International Journal Of Circuits, Systems And Signal Processing, NAUN 2008
Abstract Speech segmentation is a very difficult problem, because of continuous nature of speech. Segmenting speech into various units (phonemes, syllables, and acoustic atoms) is essential in many applications. Choosing the best method of segmentation must be preceded by evaluation of its performance. This paper is a study of various numerical measures for automatic segmentation performance. full text PDF(NAUN) | full text PDF, session chairman
11. J. Gałka, M. Ziółko, Wavelets in Speech Segmentation, In Proceedings of The 14th IEEE Mediterranean Electrotechnical Conference MELECON 2008, Ajaccio, 2008
Abstract A new event-driven method of speech signals segmentation is presented. The wavelet discrete transform was used for spectral analysis and to create a segmentation procedure. Innovative event detector is the core of the process. Efficiency of the algorithm is tested against the hand annotated speech corpus. full text PDF(IEEEXplore) | IEEE Explore | send me
12. B. Ziolko, S. Manandhar, R. C. Wilson, M. Ziolko, J. Galka, Application of HTK to the Polish Language, In Proceedings of IEEE International Conference on Audio, Language and Image Processing ICALIP2008, Shanghai, 2008, pp.1759-1764
Abstract A speech recognition system based on HTK for Polish is presented. It was trained on 365 utterances, all spoken by 26 males. The features of Polish with respect to speech recognition are described. Some aspects of speech recognition differ in comparison to English. Recognition errors were analysed in detail in an attempt to find the reasons for and scenarios of wrong recognitions. full text PDF(IEEEXplore) | IEEE Explore | BibTeX | send me
13. M. Ziółko, J. Gałka and T. Drwięga, Wavelet Transform in Speech Segmentation,
In Progress in industrial mathematics at ECMI 2008, Proceedings of European Consortium for Mathematics in Industry, Springer-Verlag, Berlin, Heidelberg, pp. 1073-1078
Abstract A non-uniform speech segmentation method based on discrete wavelet transform is used for the localization of phoneme boundaries. A vector of real values representing the digital speech signal is decomposed into phone-like units by placing segment borders according to the result of the multiresolution analysis. The final decision on localization of boundaries is taken by analysis of the energy flow among the decomposition levels. Distribution-like event functions indicate events, regarded as the segment boundaries. PDF (Springer Link) |Book (Springer Link) | poster PDF | send me
14. B. Ziółko, J. Gałka, M. Ziółko, Polish Phoneme Statistics Obtained Using CYFRONET High Performance Computers, 2009-Mar-26, in press...
Abstract The phonetical statistics were collected from several Polish corpora. The paper presents summarisation of the data which are phoneme n-grams and some phenomena in the statistics. Triphone statistics apply context-dependent speech units which have an important role in speech recognition systems and were never calculated for a large set of Polish written texts. The standard phonetic alphabet for Polish, SAMPA, and methods of providing phonetic transcriptions are described. full text PDF | presentation PDF
15. B. Ziółko, J. Gałka, M. Ziółko, Phonetic Statistics from an Internet Articles Corpus of Polish Language, Recent Advances in Intelligent Information Systems, pp. 159-172, Academic Publishing EXIT, Warsaw, 2009
Abstract The statistics of Polish phonemes, biphones and triphones were collected from a large Internet articles corpus. The paper presents summarisation of the data and some phenomena in the statistics including a distribution of frequency of triphones occurring. Triphone statistics play an important role in automatic speech recognition systems. They are used to apply context-dependent speech units. The phonetic alphabet for Polish, SAMPA, and methods of providing phonetic transcriptions are described. full text PDF, ISBN 978-83-60434-59-8 | send me
16. B. Ziółko, J. Gałka, M. Ziółko, Polish Speech Recognition, In Jubileusz 90-lecia Akademii Górniczo-Hutniczej im. Stanisława Staszica w Krakowie, AGH, Kraków, 2009
Abstract The paper presents research on Polish speech recognition conducted at AGH on segmentation, parameterization applying the discrete wavelet transform, and acoustic, grammar and semantic modeling. full text PDF (pl) | poster PDF (pl)
17. B. Ziółko, J. Gałka, M. Ziółko, Phone, diphone and triphone statistics for Polish language, In Proceedings of the 13th International ISCA Conference on Speech and Computer, SPECOM-2009, St Petersburg, 2009
Abstract The statistics of Polish phonemes, diphones and triphones were collected from a large literature corpus. The paper presents summarisation of the data and focuses on interesting phenomena in the statistics. Triphone statistics play an important role in speech recognition systems. They are used to improve the proper transcription of the analysed speech segments. A distribution of frequency of triphones occurring and other phenomena are discussed. SAMPA - the standard phonetic alphabet for Polish and methods of providing phonetic transcriptions are described. full text PDF
18. J. Gałka, M. Ziółko, Wavelet Parameterization for Speech Recognition, In Proceedings of an ISCA Tutorial and Research Workshop on Non-Linear Speech Processing NOLISP 2009, Vic, 2009
Abstract Typical parameterization schemes utilize linear prediction or mel-scaled filter banks, which are classic windowed-DFT-based methods. In this paper a new optimized adaptive wavelet parameterization scheme is presented. A novel extension of the Best Basis algorithm is used on the wavelet-packet cosine transform (WPCT) instead of a typical filter bank. The obtained features are tested using a Polish language HMM phone classifier.
19. B. Ziółko, J. Gałka, M. Ziółko, Phoneme Ngrams Based on a Polish Newspaper Corpus, In Proceedings of The 2009 World Congress in Computer Science, Computer Engineering, and Applied Computing - WORLDCOMP 2009, The 2009 International Conference on Artificial Intelligence - ICAI 2009, Las Vegas, 2009
Abstract The phonetical statistics of Polish were collected from a newspaper corpus of around 110 000 000 words. The paper presents summarisation of the data which are phoneme ngrams and some phenomena in the statistics including a distribution of frequency of triphones occurring. Triphone statistics apply context-dependent speech units which have an important role in automatic speech recognition systems. The standard phonetic alphabet for Polish, SAMPA, and methods of providing phonetic transcriptions are described. full text PDF
20. J. Gałka, M. Ziółko, Mean Best Basis Algorithm for Wavelet Speech Parameterization, In Proceedings of The Fifth IEEE International Conference on Intelligent Information Hiding and Multimedia Signal Processing - IIHMSP 2009, Kyoto, 2009
Abstract In this paper a new optimized adaptive wavelet parameterization scheme for speech recognition is presented. A novel extension of the Best Basis algorithm is used on wavelet-packet cosine transform (WPCT) instead of typical Mel-scaled filter bank. Obtained features are tested using Polish language HMM phone-classifier. IEEE Explore | send me
21. J. Gałka, M. Ziółko, Best Basis Selection of the Wavelet Packet Cosine Transform in Speech Analysis, In Proceedings of AFRICON 2009, Nairobi, 2009
Abstract In this paper a new application of the Wavelet Packet Cosine Transform (WPCT), used in the adaptive wavelet parameterization scheme, is presented. This is an extension of the Best Basis algorithm. Obtained optimized wavelet decomposition schemes are used for speech feature extraction and are tested using Polish language hidden Markov model (HMM) phone-classifier. IEEE Explore | send me
22. Bartosz Ziolko, Jakub Galka, Suresh Manandhar, Richard C. Wilson and Mariusz Ziolko, Triphone Statistics for Polish Language, In LNAI 5603 Human Language Technology, Challenges of the Information Society
Abstract The Polish text corpus was analysed to find information about phoneme statistics. We were especially interested in triphones as they are commonly used in many speech processing applications like the HTK speech recogniser. An attempt to create the full list of triphones for the Polish language is presented. A vast amount of phonetically transcribed text was analysed to obtain the frequency of triphone occurrences. A distribution of the frequency of triphone occurrences and other phenomena are presented. The standard phonetic alphabet for Polish and methods of providing phonetic transcriptions are described. Book PDF (Springer) | Paper PDF (Springer) | Paper PDF
23. Rafał Samborski, Mariusz Ziółko, Bartosz Ziółko and Jakub Gałka, Speech Extraction from Jammed Signals in Dual-Microphone Systems, In Proceedings of The Seventh IASTED International Conference on Signal Processing, Pattern Recognition and Applications - SPPRA 2010, Innsbruck 2010
Abstract This paper presents two different methods of speech extraction: cross-correlation analysis and adaptive filtering. Algorithms are designed to extract conversations in noisy environment. Such situations can appear in police investigations’ materials or multi-speaker environment. Noise can be added intentionally by suspects or not intentionally (e.g. in a car interior). Both of the algorithms are based on recordings from a dual-microphone system. The presented methods use the small differences between recordings. Algorithms were compared taking SNR improvement and better speech understanding into consideration.
24. J. Gałka, M. Ziółko, Wavelet Speech Feature Extraction Using Mean Best Basis Algorithm,
In Lecture Notes in Computer Science - Advances in Nonlinear Speech Processing, vol. 5933/2010, pp. 128-135, Springer Berlin / Heidelberg, 2010
Abstract This paper presents the Mean Best Basis algorithm, an extension of Wickerhauser’s well-known Best Basis method, for an adaptive wavelet decomposition of variable-length signals. A novel approach is used to obtain a decomposition tree of the wavelet-packet cosine hybrid transform for speech signal feature extraction. The obtained features are tested using the Polish language hidden Markov model phone classifier. Book PDF (Springer) | Paper PDF (Springer) | send me
25. M. Ziółko, J. Gałka, B. Ziółko, T. Jadczyk, D. Skurzok, J. Wicijowski, Automatic speech recognition system based on wavelet analysis,
In 2010 IEEE Fourth International Conference on Semantic Computing, Carnegie Mellon University, Pittsburgh, Pennsylvania, USA, 22–24 September 2010, pp. 450-451
Abstract We demonstrate an automatic speech recognition system for Polish continuous speech. As most of the progress in the field is done for English, a few layers of our system are different from popular approaches in this field. These elements of our system could be successfully ported to other languages which share some features with Polish: the speech contains a lot of high-frequency phones (fricatives and plosives) and is highly inflective and non-positional. PDF (IEEE Explore) | send me
26. B. Ziółko, J. Gałka, T. Jadczyk, D. Skurzok, Modified weighted Levenshtein distance in automatic speech recognition,
In Proceedings of the sixteenth national conference on Applications of mathematics in biology and medicine, Krynica, 14–18 September 2010, pp. 116-120.
Abstract The paper presents modifications of the well-known Levenshtein metric. The suggested improvements result in better automatic speech recognition when the Levenshtein metric is applied to compare words from a dictionary with speech recognition hypotheses. It allows hypotheses to be evaluated and the word which was actually spoken to be chosen. PDF | send me
27. M. Ziółko, J. Gałka, B. Ziółko, T. Drwięga, Perceptual wavelet decomposition for speech segmentation,
In Proceedings of INTERSPEECH 2010 - spoken language processing for all, 26–30 September, Makuhari Japan, pp. 2234–2237
Abstract A non-uniform speech segmentation method based on wavelet packet transform is used for the localisation of phoneme boundaries. Eleven subbands are chosen by applying the mean best basis algorithm. Perceptual scale is used for decomposition of speech via Meyer wavelet in the wavelet packet structure. A real valued vector representing the digital speech signal is decomposed into phone-like units by placing segment borders according to the result of the multiresolution analysis. The final decision on localisation of the boundaries is made by analysis of the energy flows among the decomposition levels. PDF@ISCA | send me
28. B. Ziółko, J. Gałka, Polish phones statistics,
In Proceedings of Computational linguistics – applications, 2010 - international multiconference on Computer science and information technology, Wisła, October 18–20, 2010. pp. 71–75
Abstract The paper analyzes multiple noun expressions, as part of the implementation of the Ontological Semantic Technology, which uses the lexicon, ontology and semantic text analyzer to access the meaning of text. Because the analysis and results depend on the lexical senses of words, general principles of lexical acquisition are discussed. The success in interpretation and classification of such expressions is demonstrated on 100 randomly selected sequences. PDF | send me
29. B. Ziółko, D. Skurzok, J. Gałka, M. Ziółko, Speech modelling based on phone statistics,
In Proceedings of the fifth international multi-conference on Computing in the global information technology, 20–25 September 2010, Valencia, Spain, pp. 189-194
Abstract The statistics of Polish phones, biphones and triphones were collected from several corpora. The paper presents summarisation of the data and some statistics phenomena including a distribution of frequency of biphones and triphones occurring. The model applying these statistics in speech recognition is presented as well. PDF (IEEE Explore) | send me
30. B. Ziółko, J. Gałka, D. Skurzok, Speech modelling using phoneme segmentation and modified weighted Levenshtein distance,
In Proceedings of ICALIP 2010 - International Conference on Audio, Language and Image Processing, November 23–25, 2010, Shanghai, China, Vol. 1, pp. 743-746
Abstract A method of choosing a word hypothesis from a dictionary of a speech recognition system is presented. The method applies a modified weighted Levenshtein distance for better accuracy. The distance is counted between phonetic transcriptions of a string of phonemes received from a classifier and of a dictionary. It allows efficient conducting of speech classifying task. PDF | send me
31. R. Samborski, M. Ziółko, B. Ziółko, J. Gałka, Wiener filtration for speech extraction from the intentionally corrupted signals,
In Proceedings of 2010 IEEE International Symposium on Industrial Electronics, Bari, Italy, 2010, pp. 1698-1701
Abstract This paper suggests a speech enhancement approach to an eavesdropping audio system. The speech signal is disturbed by non-stochastic noise. The algorithm is based on recordings from a dual-microphone system. The Wiener filter was applied for speech extraction. The algorithm is designed to capture dialogues in a noisy environment as well. It uses the small differences between recordings. The differences in the localisation of the speaker and the noise source, together with differences in spectra, enable us to split both signals. PDF (IEEE Explore) | send me
32. J. Gałka, T. Jadczyk, M. Ziółko, Zastosowanie psychoakustycznej falkowej ekstrakcji cech oraz ilorazowej miary odległości Itakura-Saito w rozpoznawaniu mowy polskiej / Wavelet perceptual feature extraction and quotient Itakura-Saito distance measure for Polish speech recognition,
In Bio-Algorithms and Med-Systems, Jagiellonian University, Medical College, ISSN 1895-9091, 2010, vol. 6, no. 12, pp. 73-74
Abstract Polish speech recognition is a developing area of scientific and business interest. No consumer-ready computer application for large-vocabulary Polish speech recognition has been published yet. However, many efforts have recently been made to change this situation. This paper presents some details of a speech processing front-end for a large-vocabulary Polish speech recognition system. A new way of extracting and comparing acoustically relevant speech features is presented. The speech signal is divided into acoustically uniform frames, which are then parameterized with a perceptual multi-resolution wavelet analysis. The obtained feature vectors can be classified and used for speech and speaker recognition, vocal tract diagnostics, speech coding or compression. In the presented work a k-NN classifier utilizes a newly introduced quotient symmetrized Itakura-Saito spectral distance measure to illustrate the ability of the proposed method to classify phone-like speech units. The presented methods were implemented in a multi-threaded real-time recognition system with a dictionary of about 10000 of the most commonly occurring Polish words. The well-known Dijkstra's algorithm was used as a word-level phonetic decoder. The use of the modified quotient Itakura-Saito distance measure increased the word recognition ratio by about 30% in comparison to the classical symmetrized Itakura-Saito or Euclidean distance measures. PDF | send me
If you are unable to download a file, let me know (jgalka agh.edu.pl) and I will send you a copy.