Booz Allen Colloquium: "Automatic Speech Recognition: Trials, Tribulations and Triumphs," S. Furui
Booz Allen Hamilton Distinguished Colloquium in Electrical and Computer Engineering
"Automatic Speech Recognition: Trials, Tribulations and Triumphs"
Prof. Sadaoki Furui
Tokyo Institute of Technology
Automatic speech recognition (ASR) technology has made remarkable progress over the last 20-30 years. ASR represents the state of the art in simulating some aspects of human cognition and, although ASR systems are still imperfect, it is quite impressive that they work as well as they do. There remain, however, many difficult issues and challenges at every level. In most ASR applications, computers still make 5-10 times more errors than human listeners. One of the most significant differences is that humans are far more flexible and adaptive than machines in coping with the many kinds of variation in speech, including speaker individuality, speaking style, additive noise, and channel distortion. For this reason, there are as yet only a handful of good applications, and they are generally limited in their domain and in the conditions under which they can be used. How to train and adapt statistical models for ASR using limited amounts of data is one of the most important research issues. Future systems will need an efficient way of representing, storing, and retrieving the various knowledge resources required for natural spoken conversation.
Sadaoki Furui received B.S., M.S., and Ph.D. degrees in mathematical engineering and instrumentation physics from Tokyo University, Tokyo, Japan in 1968, 1970, and 1978, respectively.
He joined the Electrical Communications Laboratories of Nippon Telegraph and Telephone (NTT) Corporation in 1970, and from 1991 to 1997 served as a Research Fellow and Director of the Furui Research Laboratory at NTT Human Interface Laboratories. He is currently a Professor in the Department of Computer Science, Graduate School of Information Science and Engineering, Tokyo Institute of Technology. He served as Dean of the Graduate School of Information Science and Engineering from 2007 to 2009, and is now serving as Director of the Institute Library.
His research interests include analysis of speaker characterization information in speech waves and its application to speaker recognition as well as interspeaker normalization and adaptation in speech recognition. He is also interested in vector-quantization-based speech recognition algorithms, spectral dynamic features for speech recognition, speech recognition algorithms that are robust against noise and distortion, algorithms for Japanese large-vocabulary continuous-speech recognition, automatic speech summarization algorithms, multimodal human-computer interaction systems, automatic question-answering systems, and analysis of the speech perception mechanism. He has authored or coauthored over 900 published articles.
From December 1978 to December 1979, he served on the staff of the Acoustics Research Department of Bell Laboratories, Murray Hill, New Jersey, as a visiting researcher working on speaker verification. Dr. Furui is a Fellow of the IEEE, the Acoustical Society of America (ASA), the Institute of Electronics, Information and Communication Engineers of Japan (IEICE), and the International Speech Communication Association (ISCA). He served as President of the Permanent Council of International Conferences on Spoken Language Processing (PC-ICSLP) from 2000 to 2004, of the ISCA from 2001 to 2005, and of the Acoustical Society of Japan (ASJ) from 2001 to 2003. He served on the IEEE Technical Committees on Speech and on Multimedia Signal Processing, and on the Technical Program Committees of ICASSP86 in Tokyo and ICSLP90 in Kobe. He served as Vice Chairman of the Conference Committee of ICSLP94 in Yokohama. He has organized various international conferences and workshops, including the 1997 IEEE Workshop on Automatic Speech Recognition and Understanding. He has also served on several international advisory boards in the US and Europe. He served as a Board member of the IEEE Signal Processing Society from 2001 to 2003. He served as Editor-in-Chief of the Journal of Speech Communication from 1997 to 2001, Chief Editor of the Journal of the ASJ from 1997 to 1999, and Chief Editor of the English Journal of IEICE from 2001 to 2003. He also served as an IEEE Press Editorial Board member from 1995 to 1999. He is now serving as an Editorial Board member of the Journal of Computer Speech and Language and the Journal of Speech Communication. He has also served as a Board member of the IEICE and the ASJ.
He supervised the five-year Japanese Science and Technology Agency Priority Program entitled Spontaneous Speech: Corpus and Processing Technology from 1999 to 2004. He has supervised the 21st Century Center of Excellence (COE) Program entitled Framework for Systematization and Application of Large-scale Knowledge Resources since its inception in 2003.
He received the Yonezawa Prize in 1975 and the Best Paper Award in 1988, 1993, and 2003 from the IEICE. He received the Sato Paper Award from the ASJ in 1985 and 1987. He received the Senior Award from the IEEE ASSP Society and the Achievement Award from the Minister of Science and Technology, both in 1989. He received the Book Award from the IEICE in 1990 and the Achievement Award from the IEICE in 2003. In 2006 he received the IEEE Signal Processing Society Award, the Achievement Award from the Minister of Education, Culture, Sports, Science and Technology, and the Purple Ribbon Medal from the Japanese Emperor. He received the Distinguished Achievement and Contributions Award from the IEICE in 2008, the ISCA Medal for Scientific Achievement in 2009, and the IEEE James L. Flanagan Speech and Audio Processing Award in 2010. He also received the Mira Paul Memorial Award from the AFECT, India, in 2001. He was a Distinguished Lecturer of the IEEE Signal Processing Society from 1993 to 1994.
He is the author of Digital Speech Processing, Synthesis, and Recognition (Marcel Dekker, 1989, revised in 2000) in English, and of Digital Speech Processing (Tokai University Press, 1985), Acoustics and Speech Processing (Kindai-Kagaku-Sha, 1992, revised in 2006), and Speech Information Processing (Morikita, 1998) in Japanese. He has also authored Building Computers Talking with People: Forefront of Automatic Speech Recognition (Kadokawa, 2009) in Japanese, and co-authored Image and Speech Processing Technology (Denpa-Shinbun-Sha, 2004) in Japanese. He edited Advances in Speech Signal Processing (Marcel Dekker, 1992) jointly with Dr. M.M. Sondhi. He has translated into Japanese Fundamentals of Speech Recognition, by Drs. L.R. Rabiner and B.-H. Juang (NTT Advanced Technology, 1995), and Vector Quantization and Signal Compression, by Drs. A. Gersho and R.M. Gray (Corona-sha, 1998).