ASR Gooyesh Pardaz Co. (AGP), the first and only Iranian company specializing in speech technology, was established in 2002 to develop speech processing systems. It was founded by a team of experts from Sharif University of Technology with years of experience in the field of language processing. As a pioneer in speech processing technology for the Persian language, AGP has successfully developed various applications and solutions for speech processing and recognition.
Home & Industrial Automation, and Voice Command Control
The goal of this system is to connect people and machines through speech. In other words, people can use voice commands instead of keys and buttons to interact with computers and devices. The following are only a few of the many applications of this system:
- Execution and control of computer applications through speech
This capability lets users perform computer tasks or control software by speech. For example, one could open a web browser by simply saying the command “connect to internet”, or zoom in on text by saying “increase text size”. Similarly, users can define new voice commands in different applications and control their programs with ease. Voice commands can also be used to extend the capabilities of programs such as games or educational applications.
- Home and industrial automation through speech recognition
The purpose of this system is to provide remote voice control of equipment. It can be used in a wide variety of fields such as vehicle and automotive control, factory equipment management, and smart homes. Voice commands can be conveyed to the system over the internet or by phone.
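The mapping from a recognized phrase to an action, including user-defined commands, can be sketched as follows. This is a minimal illustration only: the `VoiceCommandRegistry` class and its methods are hypothetical names, not AGP's API, and the transcript is assumed to arrive as plain text from a speech recognizer.

```python
from typing import Callable, Dict, Optional


class VoiceCommandRegistry:
    """Maps recognized phrases to actions (hypothetical, not AGP's API)."""

    def __init__(self) -> None:
        self._actions: Dict[str, Callable[[], str]] = {}

    @staticmethod
    def _normalize(phrase: str) -> str:
        # Lowercase and collapse whitespace so small transcript variations match.
        return " ".join(phrase.lower().split())

    def register(self, phrase: str, action: Callable[[], str]) -> None:
        # Users can add their own commands at any time.
        self._actions[self._normalize(phrase)] = action

    def dispatch(self, transcript: str) -> Optional[str]:
        # Run the action for a recognized phrase; return None if it is unknown.
        action = self._actions.get(self._normalize(transcript))
        return action() if action else None


registry = VoiceCommandRegistry()
registry.register("connect to internet", lambda: "open web browser")
registry.register("increase text size", lambda: "zoom in")

print(registry.dispatch("Connect to  Internet"))
print(registry.dispatch("turn on the lights"))
```

Exact matching after normalization keeps the sketch short; a real system would more likely constrain the recognizer to a command grammar or tolerate near matches.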
Speech recognition on embedded systems (cell phones, DSP, etc.)
While the speed and efficiency of embedded processors increase with each generation, developing software in this context still faces many challenges, such as computational demand and ease of use. AGP has also developed a version of its speech recognition system for processors with limited resources, such as DSPs (for use in embedded applications as part of other systems) and mobile phones, with high efficiency and optimized processing speed. Some applications of these systems are as follows:
- Providing voice to text capability on mobile handsets
- Speech dialing and voice-based SMS on mobile handsets
- Speech-to-speech voice translation
- Login system through speaker verification
- Voice dictation ability in systems such as robots
Natural Language Processing (NLP)
The incorporation of language models and language information is one of the crucial prerequisites of AI-based systems such as speech recognition, text to speech conversion, machine translation, optical character recognition, and correction of typing errors. AGP uses the latest methods in Natural Language Processing to extract and apply language information to various systems. As a result, a large volume of language information has been put to use for Persian for the first time. Our speech recognition engine draws on extensive language information, such as Persian statistical language models, a Persian grammar model, and a set of computational vocabularies for the Persian language. This information can be used in different applications and research activities.
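To illustrate what a statistical language model contributes, here is a minimal sketch of a bigram model with add-one smoothing. It is a textbook technique shown on a toy English corpus, not AGP's Persian models; the class name `BigramModel` and the sample sentences are assumptions made for the example.

```python
import math
from collections import Counter
from typing import List


class BigramModel:
    """Toy bigram language model with add-one (Laplace) smoothing."""

    def __init__(self, sentences: List[List[str]]) -> None:
        self.context_counts: Counter = Counter()
        self.bigram_counts: Counter = Counter()
        self.vocab = {"</s>"}  # words that can follow a context
        for words in sentences:
            tokens = ["<s>"] + words + ["</s>"]
            self.vocab.update(words)
            self.context_counts.update(tokens[:-1])
            self.bigram_counts.update(zip(tokens, tokens[1:]))

    def prob(self, prev: str, word: str) -> float:
        # Smoothed estimate of P(word | prev).
        numerator = self.bigram_counts[(prev, word)] + 1
        return numerator / (self.context_counts[prev] + len(self.vocab))

    def log_prob(self, words: List[str]) -> float:
        # Log-probability of a whole sentence under the model.
        tokens = ["<s>"] + words + ["</s>"]
        return sum(math.log(self.prob(p, w)) for p, w in zip(tokens, tokens[1:]))


corpus = [
    ["open", "the", "browser"],
    ["open", "the", "file"],
    ["close", "the", "browser"],
]
model = BigramModel(corpus)

# A recognizer can use such scores to prefer the more plausible hypothesis.
likely = model.log_prob(["open", "the", "browser"])
unlikely = model.log_prob(["browser", "the", "open"])
print(likely > unlikely)
```

In a recognizer, scores like these are typically combined with acoustic scores so that word sequences that are plausible in the language are preferred over acoustically similar but implausible ones.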
Educational Multimedia Systems
Many educational applications such as language instruction and Quran recitation also require sophisticated levels of language analysis in order to help users quantify their progress. AGP's pronunciation inspection feature can be used as a module or SDK in a variety of applications. By leveraging pattern recognition and statistical modeling, it converts the similarity between the word or phrase pronounced by the user and the reference word or phrase into a score. The module can be configured either to depend on the speaker's language or to work independently of it.
The speech dictation abilities incorporated in our Ariana product are also used in applications such as audio books or any program that requires conveying information to users.
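One simple way to turn the similarity between two utterances into a score is dynamic time warping (DTW) over per-frame feature vectors. The sketch below uses DTW as a stand-in technique, since the source does not describe AGP's actual method; the function names, the toy two-dimensional features, and the `scale` parameter are all assumptions.

```python
import math
from typing import List, Sequence

Frame = Sequence[float]


def dtw_distance(a: List[Frame], b: List[Frame]) -> float:
    """Length-normalized DTW distance between two non-empty feature sequences."""
    n, m = len(a), len(b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(a[i - 1], b[j - 1])  # Euclidean frame distance
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m] / (n + m)


def pronunciation_score(user: List[Frame], reference: List[Frame],
                        scale: float = 1.0) -> float:
    """Map the DTW distance to a 0-100 score; `scale` is an assumed tuning knob."""
    return 100.0 * math.exp(-dtw_distance(user, reference) / scale)


reference = [(0.0, 1.0), (1.0, 1.0), (2.0, 0.0)]
slow_copy = [(0.0, 1.0), (0.0, 1.0), (1.0, 1.0), (2.0, 0.0)]  # same sounds, slower
different = [(2.0, 0.0), (0.0, 3.0), (1.0, 2.0)]

print(pronunciation_score(slow_copy, reference))
print(pronunciation_score(different, reference))
```

Because DTW aligns the two sequences in time, a slower but otherwise identical pronunciation is not penalized, while a different sequence of sounds receives a lower score.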
Speech Quality Enhancement
Improving the quality of speech and making it intelligible has been a longstanding need in many areas of speech processing. The process includes removing additive or convolutional noise from a signal recorded at a lecture, music performance, conference talk, etc. Using the latest techniques in this field, AGP has embarked on extensive research and on developing a product for this purpose. This service can be used as a stand-alone application or be incorporated into other programs. For instance, using this unit in speech recognition systems in noisy environments, such as a moving car or an exhibition, improves the efficiency and accuracy of the ASR system. This engine can be customized and optimized based on the requirements of any specific application.
Some of Our Projects
Continuous Speech Recognition Project
The input in this project is continuous speech, a series of words and sentences spoken by a person, which is fed into our system and converted into text. Leveraging the latest technology in this area, AGP has been able to develop a speaker-independent speech recognition engine, which ultimately led to the development of various versions of our Nevisa Persian speech to text application, each with a particular specialty. This engine also enables the expansion of our system to many other languages such as English, Arabic, French, etc.
Telephony Speech to Text Project
The telephony speech recognition system was developed by AGP in parallel with the microphone-based speech to text application. Speech recognition over the telephone is a far harder endeavor, since the speech often has lower quality and narrowband telephone audio is typically sampled at only 8 kHz, which limits the usable bandwidth to under 4 kHz. In addition, speakers tend to use informal language on the telephone, and their speech often includes a wider range of expressions and grammar. Our telephony speech to text system can be used to recognize numbers and voice commands in Interactive Voice Response (IVR) systems, using analog, digital, and Voice over IP (VoIP) lines.
Text to Speech (TTS) Project
The purpose of this project is to enable computer systems to read electronic texts aloud. The project is essentially divided into two components: (1) translating the text into a sequence of phoneme units, and (2) turning that sequence into voice (speech synthesis). For the first component, AGP has developed a Text to Phoneme (TTP) engine for Persian. For the second, AGP has successfully developed a Persian speech synthesis engine, leveraging state-of-the-art text to voice techniques, which can be expanded to other languages as well. One of the crucial aspects of developing such an engine is to ensure the quality of the output speech and to make it as human-like as possible. AGP continues to improve this aspect of its engine.
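The two-stage structure described above can be sketched as a pair of functions. Everything here is a toy assumption rather than AGP's engine: the `LEXICON` uses a few English words with ARPAbet-style symbols, and the "synthesis" stage simply concatenates placeholder sample lists, one per phoneme.

```python
from typing import Dict, List

# Hypothetical pronunciation lexicon (a real TTP engine would also handle
# unknown words, numbers, abbreviations, and context-dependent pronunciation).
LEXICON: Dict[str, List[str]] = {
    "hello": ["HH", "AH", "L", "OW"],
    "world": ["W", "ER", "L", "D"],
}


def text_to_phonemes(text: str, lexicon: Dict[str, List[str]]) -> List[str]:
    """Stage 1: convert text into a flat sequence of phoneme symbols."""
    phonemes: List[str] = []
    for raw in text.lower().split():
        word = raw.strip(".,!?;:")
        if not word:
            continue
        if word not in lexicon:
            raise KeyError(f"no pronunciation for {word!r}")
        phonemes.extend(lexicon[word])
    return phonemes


def synthesize(phonemes: List[str], units: Dict[str, List[float]]) -> List[float]:
    """Stage 2: build a waveform by concatenating one stored unit per phoneme."""
    samples: List[float] = []
    for phoneme in phonemes:
        samples.extend(units[phoneme])
    return samples


# Placeholder units: four silent samples per phoneme, standing in for audio.
all_phonemes = {p for pron in LEXICON.values() for p in pron}
UNITS = {p: [0.0] * 4 for p in all_phonemes}

phoneme_sequence = text_to_phonemes("Hello, world!", LEXICON)
waveform = synthesize(phoneme_sequence, UNITS)
print(phoneme_sequence)
print(len(waveform))
```

Keeping the two stages behind separate functions mirrors the split in the text: the phoneme stage is language-specific, while the synthesis stage only needs a unit inventory for the target language.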
Speaker Identification Project
A person's voice is one of his or her unique attributes and can be used for biometric purposes. The purpose of this project is to extract information from the incoming signal that leads to the identification of the speaker. Speaker recognition consists of two areas: speaker identification, which determines which of a set of known speakers is talking, and speaker verification (authentication), which checks a claimed identity against the user's voice so that access can be granted accordingly. Speaker verification systems are used to secure systems and to support access control. AGP has developed a speaker identification system with a broad domain that can be used online or offline, and that can also process voice carried over telephone or satellite lines.
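The difference between identification and verification can be shown with a small sketch that compares fixed-length voice feature vectors by cosine similarity. The vectors, speaker names, and the 0.8 threshold are illustrative assumptions; the source does not say how AGP's system represents or compares voices.

```python
import math
from typing import Dict, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length, non-zero vectors."""
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("feature vectors must be non-zero")
    return sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)


def verify_speaker(sample: Sequence[float], claimed: Sequence[float],
                   threshold: float = 0.8) -> bool:
    """Verification: does the sample match the one claimed identity?"""
    return cosine_similarity(sample, claimed) >= threshold


def identify_speaker(sample: Sequence[float],
                     enrolled: Dict[str, Sequence[float]]) -> str:
    """Identification: which enrolled speaker is closest to the sample?"""
    return max(enrolled, key=lambda name: cosine_similarity(sample, enrolled[name]))


# Hypothetical enrolled voiceprints and a new recording's feature vector.
enrolled = {"alice": [0.9, 0.1, 0.2], "bob": [0.1, 0.8, 0.3]}
sample = [0.85, 0.15, 0.25]

print(identify_speaker(sample, enrolled))
print(verify_speaker(sample, enrolled["alice"]))
print(verify_speaker(sample, enrolled["bob"]))
```

Verification is a yes/no decision against one claimed identity, so it needs a threshold; identification picks the best match among all enrolled speakers and would need an additional rejection rule to handle unknown voices.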