<?xml version="1.0" encoding="UTF-8"?>
<!-- generator="wordpress/0.32-RC1" -->
<rss version="2.0" 
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	>

<channel>
	<title>voicerecog</title>
	<link>http://www.xncroft.com/blog/lyceum/voicerecog</link>
	<description>speak and be turned into bits</description>
	<pubDate>Fri, 14 Jul 2006 14:55:05 +0000</pubDate>
	<generator>http://wordpress.org/?v=0.32-RC1</generator>
	<language>en</language>
			<item>
		<title>Add recognized vocabulary to HelloWorld demo with different &#8220;grammars&#8221;</title>
		<link>http://www.xncroft.com/blog/lyceum/voicerecog/2006/07/14/add-recognized-vocabulary-to-helloworld-demo-with-different-grammars/</link>
		<comments>http://www.xncroft.com/blog/lyceum/voicerecog/2006/07/14/add-recognized-vocabulary-to-helloworld-demo-with-different-grammars/#comments</comments>
		<pubDate>Fri, 14 Jul 2006 14:41:20 +0000</pubDate>
		<dc:creator>admin</dc:creator>
		
	<category>Notes</category>
		<guid isPermaLink="false">http://www.xncroft.com/blog/lyceum/voicerecog/2006/07/14/add-recognized-vocabulary-to-helloworld-demo-with-different-grammars/</guid>
		<description><![CDATA[As it comes with the Sphinx-4 package, the HelloWorld.jar example only recognizes the following words:
(Good morning &#124; Hello)
( Bhiksha &#124; Evandro &#124; Paul &#124; Philip &#124; Rita &#124; Will )
To expand this vocabulary, we need to modify a grammar file that the JSGFGrammar class imports, and then we need to rebuild the HelloWorld.jar using ant.
In [...]]]></description>
			<content:encoded><![CDATA[<p>As it comes with the Sphinx-4 package, the HelloWorld.jar example only recognizes the following words:</p>
<div id="code">(Good morning | Hello)<br />
( Bhiksha | Evandro | Paul | Philip | Rita | Will )</div>
<p>To expand this vocabulary, we need to modify a grammar file that the <a href="http://cmusphinx.sourceforge.net/sphinx4/javadoc/edu/cmu/sphinx/jsapi/JSGFGrammar.html" target="_blank">JSGFGrammar</a> class imports, and then we need to rebuild the HelloWorld.jar using <a href="http://ant.apache.org/" target="_blank">ant.</a></p>
<p>In the sphinx4-1.0beta/demo/sphinx/helloworld/ directory that came with the Sphinx-4 package, you should see a file called hello.gram.  If you open this grammar file, you see the following code:</p>
<div id="code">#JSGF V1.0;<br />
/**<br />
 * JSGF Grammar for Hello World example<br />
 */<br />
grammar hello;<br />
<br />
public &lt;greet&gt; = (Good morning | Hello)<br />
( Bhiksha | Evandro | Paul | Philip | Rita | Will );</div>
<p>So, this grammar file tells the JSGFGrammar in this application to look for two-part phrases: a greeting followed by a name.  To change which words our application can recognize, we simply add to these word groups.  For instance, if we want to be able to say congratulations to all these people, we change the public &lt;greet&gt; line to:</p>
<div id="code">public &lt;greet&gt; = (Good morning | Hello | Congratulations)<br />
( Bhiksha | Evandro | Paul | Philip | Rita | Will );</div>
<p>Once this change is made, you may also want to mirror it in the HelloWorld.java program so that its println statement prints the new word possibilities to the terminal. Then, to rebuild the application, change directory to demo/sphinx/helloworld/ in Terminal and type:</p>
<div id="code">ant</div>
<p>Your system must have ant installed for this to work. The command finds the &#8220;build.xml&#8221; file in that directory, which tells ant where all the files needed to build HelloWorld.jar are.  Once the program has been built, you should be able to go to sphinx4-1.0beta/bin/, run java -mx312m -jar HelloWorld.jar, and say Congratulations Paul <em>(or whoever)</em>.</p>
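<p>To get a feel for how small this vocabulary still is, here is a minimal sketch in plain Java (no Sphinx classes involved) that enumerates every phrase the two-slot rule accepts. The class name GreetPhrases and its expand() helper are made up for illustration; the word lists are copied from the modified grammar above.</p>

```java
public class GreetPhrases {
    // Expand "(greeting) (name)" into every phrase the two-slot rule accepts.
    static String[] expand(String[] greetings, String[] names) {
        String[] phrases = new String[greetings.length * names.length];
        int i = 0;
        for (String greeting : greetings) {
            for (String name : names) {
                phrases[i++] = greeting + " " + name;
            }
        }
        return phrases;
    }

    public static void main(String[] args) {
        String[] greetings = {"Good morning", "Hello", "Congratulations"};
        String[] names = {"Bhiksha", "Evandro", "Paul", "Philip", "Rita", "Will"};
        for (String phrase : expand(greetings, names)) {
            System.out.println(phrase);
        }
    }
}
```

<p>With three greetings and six names this prints 18 phrases, up from the 12 that the original grammar allows.</p>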
<p>Using grammars like this obviously provides little flexibility in what the program understands.  There are instances where this could be advantageous, however, such as when we want menu navigation, a survey, etc.  JSGF grammar files can be quite extensive though. Take a look at the <a href="http://java.sun.com/products/java-media/speech/forDevelopers/JSGF/JSGF.html#17514" target="_blank">developer&#8217;s guide</a> to see how they can import rules from other grammar files, reference rules within rules, weight certain words above others, and other functionalities.</p>
<p><strong style="font-size: 13px; font-weight:bold;">But, I don&#8217;t see the grammar file or the JSGFGrammar class in the HelloWorld.java file. Where do these get called?</strong><br />
All of this is specified in the &#8220;helloworld.config.xml&#8221; file that the application loads and the ConfigurationManager acts on. In that XML file, in the section commented as &#8220;The Grammar Configuration&#8221;, you will see:</p>
<div id="code">&lt;component name=&quot;jsgfGrammar&quot; type=&quot;edu.cmu.sphinx.jsapi.JSGFGrammar&quot;&gt;<br />
        &lt;property name=&quot;dictionary&quot; value=&quot;dictionary&quot;/&gt;<br />
        &lt;property name=&quot;grammarLocation&quot;<br />
             value=&quot;resource:/demo.sphinx.helloworld.HelloWorld!/demo/sphinx/helloworld/&quot;/&gt;<br />
        &lt;property name=&quot;grammarName&quot; value=&quot;hello&quot;/&gt;<br />
        &lt;property name=&quot;logMath&quot; value=&quot;logMath&quot;/&gt;<br />
    &lt;/component&gt;</div>
<p>So, this XML tells the application where to find the hello.gram file in the .jar&#8217;s resource path, as well as the grammarName, which matches the name declared by the line &#8220;grammar hello;&#8221; in the grammar file.
</p>
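<p>As a rough sketch of how those two properties could combine into the location of hello.gram, the plain-Java snippet below splits the grammarLocation value at the exclamation mark and appends the grammarName plus ".gram". This reflects my assumption about the value's shape (a class name used as an anchor, then a path inside the jar), not a copy of what Sphinx-4 does internally; GrammarLocation and its methods are hypothetical names.</p>

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GrammarLocation {
    // Assumed shape: "resource:/" + anchor class name + "!" + directory inside the jar.
    private static final Pattern RESOURCE = Pattern.compile("resource:/([.\\w]+)!(.*)");

    // Class named before the "!", or null if the value has a different shape.
    static String anchorClass(String grammarLocation) {
        Matcher m = RESOURCE.matcher(grammarLocation);
        return m.matches() ? m.group(1) : null;
    }

    // Path of the .gram file inside the jar, or null if the value has a different shape.
    static String grammarPath(String grammarLocation, String grammarName) {
        Matcher m = RESOURCE.matcher(grammarLocation);
        if (!m.matches()) {
            return null;
        }
        String dir = m.group(2);
        if (!dir.endsWith("/")) {
            dir = dir + "/";
        }
        return dir + grammarName + ".gram";
    }

    public static void main(String[] args) {
        String location = "resource:/demo.sphinx.helloworld.HelloWorld!/demo/sphinx/helloworld/";
        System.out.println(anchorClass(location));
        System.out.println(grammarPath(location, "hello"));
    }
}
```

<p>For the values in helloworld.config.xml this yields the class demo.sphinx.helloworld.HelloWorld and the path /demo/sphinx/helloworld/hello.gram, which is where the grammar file sits in the demo directory.</p>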
]]></content:encoded>
			<wfw:commentRSS>http://www.xncroft.com/blog/lyceum/voicerecog/2006/07/14/add-recognized-vocabulary-to-helloworld-demo-with-different-grammars/feed/</wfw:commentRSS>
		</item>
		<item>
		<title>Sphinx 4 Architecture</title>
		<link>http://www.xncroft.com/blog/lyceum/voicerecog/2006/07/05/sphinx-4-architecture/</link>
		<comments>http://www.xncroft.com/blog/lyceum/voicerecog/2006/07/05/sphinx-4-architecture/#comments</comments>
		<pubDate>Wed, 05 Jul 2006 22:04:18 +0000</pubDate>
		<dc:creator>admin</dc:creator>
		
	<category>Notes</category>
		<guid isPermaLink="false">http://www.xncroft.com/blog/lyceum/voicerecog/2006/07/05/sphinx-4-architecture/</guid>
		<description><![CDATA[
( diagram taken from Sphinx-4: A Flexible Open Source Framework
for Speech Recognition )
NOTE: This post is not complete.
The beauty of the Sphinx 4 architecture is its modularity and pluggability.  Previously, speech recognition programs were built to fulfill specific roles: continuous speech vs. non-continuous, large vocabulary vs. smaller vocabulary, etc.  Now with Sphinx 4, [...]]]></description>
			<content:encoded><![CDATA[<p><img src="http://www.xncroft.com/blog/lyceum/wp-content/blogs/2/uploads/sphinx4_framework.jpg"><br />
<strong style="font-size:8px; font-weight:normal;font-style:italic">( diagram taken from <a href="http://research.sun.com/techrep/2004/smli_tr-2004-139.pdf">Sphinx-4: A Flexible Open Source Framework<br />
for Speech Recognition</a> )</strong></p>
<p><em>NOTE: This post is not complete.</em></p>
<p>The beauty of the Sphinx 4 architecture is its modularity and pluggability.  Previously, speech recognition programs were built to fulfill specific roles: continuous speech vs. non-continuous, large vocabulary vs. smaller vocabulary, etc.  Now with Sphinx 4, an XML configuration file allows varied and dynamic behavior from the speech engine without modifying the source code or recompiling.</p>
<p>The diagram above <a href="http://research.sun.com/techrep/2004/smli_tr-2004-139.pdf"  target="_blank">( from this .pdf )</a> shows how an application plugs into the Sphinx framework without having to delve deeply into the code of Sphinx itself. The speech engine of Sphinx, called the Recognizer, consists of 3 main modules: the Front End, the Decoder, &#038; the Linguist.  The behavior of each of these modules can be individually configured in the xml configuration file.  I will try and delve into these three parts in separate posts once I understand them more fully, but for now, let&#8217;s focus on the application basics and configuration file setup.  </p>
<p>The <a href="http://cmusphinx.sourceforge.net/sphinx4/doc/ProgrammersGuide.html" target="_blank">Sphinx-4 Application Programmer&#8217;s Guide</a> and <a href="http://cmusphinx.sourceforge.net/sphinx4/javadoc/edu/cmu/sphinx/util/props/doc-files/ConfigurationManagement.html" target="_blank">Configuration Management for Sphinx-4</a> are must reads on this topic, but I&#8217;ll try to convey what I can here. </p>
<p>Any Java application incorporating Sphinx appears to go through 5 basic steps, as in the <a href="http://cmusphinx.sourceforge.net/sphinx4/doc/ProgrammersGuide.html#helloCodeWalk" target="_blank" onMouseover="ddrivetip('HelloDigits Source Code');" onMouseout="hideddrivetip()">HelloDigits app demo</a> that comes with Sphinx-4. (Code snippets below were taken from the Application Programmer&#8217;s Guide, except for the portion from the Transcriber demo.) Very concisely stated, here are those 5 steps in order:</p>
<p>1) Load in xml configuration file.<br />
2) Create ConfigurationManager which interprets xml.<br />
3) Use lookup() on ConfigurationManager to create Recognizer object (the speech engine) and audio input data stream.<br />
4) Go into loop that is based on audio events where the Recognizer analyzes speech to return Results.<br />
5) Call Result methods to convert speech analysis into text strings ( or do more advanced operations&#8230; )</p>
<p>1) First, the file path of the xml config file is specified and loaded into the application via the URL object.<br />
2) This URL is handed to an object called the <a href="http://cmusphinx.sourceforge.net/sphinx4/javadoc/edu/cmu/sphinx/util/props/ConfigurationManager.html" target="_blank">ConfigurationManager</a> which interprets the xml and gets the Sphinx speech engine ready to behave as the config file specifies that it should.  </p>
<div id = "code">
public static void main(String[] args) {<br />
        try {<br />
            URL url;<br />
            if (args.length > 0) {<br />
&nbsp;&nbsp;&nbsp;url = new File(args[0]).toURI().toURL();<br />
            } else {<br />
&nbsp;url = HelloDigits.class.getResource(&quot;hellodigits.config.xml&quot;);<br />
            }<br />
            ConfigurationManager cm = new ConfigurationManager(url);
</div>
<p>3) Once the Configuration Manager is created, we call <a href="http://cmusphinx.sourceforge.net/sphinx4/javadoc/edu/cmu/sphinx/util/props/ConfigurationManager.html#lookup(java.lang.String)" target="_blank">lookup()</a> on it in order to make a Recognizer ( the speech engine containing the Front End, Decoder, and Linguist previously mentioned ) and an audio input ( a microphone in the case of HelloDigits.)</p>
<div id="code">Recognizer recognizer = (Recognizer) cm.lookup(&quot;recognizer&quot;);<br />
	    Microphone microphone = (Microphone) cm.lookup(&quot;microphone&quot;);</div>
<p>Alternatively, a pre-recorded audio file can be opened as an audio stream and used as the input, as in the Transcriber demo:</p>
<div id="code">
try {<br />
            URL audioURL;<br />
            if (args.length > 0) {<br />
                &nbsp;&nbsp;audioURL=new File(args[0]).toURI().toURL();<br />
            } else {<br />
&nbsp;&nbsp;audioURL=Transcriber.class.getResource(&quot;10001-90210-01803.wav&quot;);<br />
            }<br />
&nbsp;&nbsp;AudioInputStream ais = AudioSystem.getAudioInputStream(audioURL);<br />
&nbsp;&nbsp;StreamDataSource reader = (StreamDataSource)<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;cm.lookup(&quot;streamDataSource&quot;);<br />
&nbsp;&nbsp;reader.setInputStream(ais, audioURL.getFile());</div>
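<p>Before handing a recording to the recognizer, it can help to check what format the file is actually in. The sketch below uses only the standard javax.sound.sampled classes (the same AudioSystem call as in the Transcriber snippet) to open a file by URL and report its sample rate, sample size and channel count, so you can compare them against whatever your configuration expects. WavCheck, formatOf() and writeSilence() are names I made up, and the 16 kHz, 16-bit, mono values are just arbitrary choices for the generated test file.</p>

```java
import java.io.ByteArrayInputStream;
import java.io.File;
import java.io.IOException;
import java.net.URL;
import javax.sound.sampled.AudioFileFormat;
import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.AudioInputStream;
import javax.sound.sampled.AudioSystem;
import javax.sound.sampled.UnsupportedAudioFileException;

public class WavCheck {
    // Open a recording by URL and return its audio format.
    static AudioFormat formatOf(URL audioURL) throws IOException, UnsupportedAudioFileException {
        AudioInputStream ais = AudioSystem.getAudioInputStream(audioURL);
        try {
            return ais.getFormat();
        } finally {
            ais.close();
        }
    }

    // Write one second of silence as a WAV file so there is something to inspect.
    static File writeSilence(File target) throws IOException {
        AudioFormat format = new AudioFormat(16000f, 16, 1, true, false);
        byte[] pcm = new byte[16000 * 2]; // 16000 frames, 2 bytes per frame
        AudioInputStream silence = new AudioInputStream(
                new ByteArrayInputStream(pcm), format, pcm.length / format.getFrameSize());
        AudioSystem.write(silence, AudioFileFormat.Type.WAVE, target);
        return target;
    }

    public static void main(String[] args) throws Exception {
        File wav;
        if (args.length > 0) {
            wav = new File(args[0]);
        } else {
            wav = writeSilence(File.createTempFile("silence", ".wav"));
        }
        AudioFormat f = formatOf(wav.toURI().toURL());
        System.out.println(f.getSampleRate() + " Hz, " + f.getSampleSizeInBits()
                + " bit, " + f.getChannels() + " channel(s)");
    }
}
```

<p>Run it with the path to your own .wav file as the first argument, or with no arguments to inspect the generated silent file.</p>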
<p>4) Once the Recognizer and Front End audio input (either Microphone or StreamDataSource) are created, we allocate necessary memory resources to the Recognizer</p>
<div id="code">/* allocate the resource necessary for the recognizer */<br />
            recognizer.allocate();</div>
<p>and then the application can go into a loop where calling recognizer.recognize() will try to return Results while the audio input is available.  </p>
<div id="code">
/* the microphone will keep recording until the program exits */<br />
	    if (microphone.startRecording()) {<br />
<br />
		System.out.println<br />
		    (&quot;Say any digit(s): e.g. \&quot;two oh oh four\&quot;, &quot; +<br />
		         &quot;\&quot;three six five\&quot;.&quot;);<br />
<br />
		while (true) {<br />
		    System.out.println<br />
			(&quot;Start speaking. Press Ctrl-C to quit.\n&quot;);<br />
<br />
                    /*<br />
		     * This method will return when the end of speech<br />
		     * is reached. Note that the endpointer will determine<br />
		     * the end of speech.<br />
		     */<br />
		    Result result = recognizer.recognize();<br />
<br />
	            if (result != null) {<br />
			String resultText = result.getBestResultNoFiller();<br />
			System.out.println(&quot;You said: &quot; + resultText + &quot;\n&quot;);<br />
	            } else {<br />
		        System.out.println(&quot;I can't hear what you said.\n&quot;);<br />
	            }<br />
		}<br />
            } else {<br />
	        System.out.println(&quot;Cannot start microphone.&quot;);<br />
		recognizer.deallocate();<br />
		System.exit(1);<br />
            }</div>
<p>5) <a href="http://cmusphinx.sourceforge.net/sphinx4/doc/ProgrammersGuide.html#interpretResult" target="_blank">Results</a> are the objects that the Recognizer returns when speech is detected.  The Recognizer analyzes speech by hypothesizing on probable matches for what a user has said. The Result object actually contains all of the &#8220;search paths&#8221; that the Recognizer has traversed for a given block of speech. It contains paths that have reached their &#8220;final state&#8221; ( meaning that it&#8217;s probably at the end of a sentence or long pause ) as well as &#8220;active paths&#8221; that haven&#8217;t yet reached final state.  Basically, the Result object is a collection of scored guesses that the computer makes about what has been said, and the object has methods for your application to mine through this collection in different ways.<br />
<a href="http://cmusphinx.sourceforge.net/sphinx4/javadoc/edu/cmu/sphinx/result/Result.html#getBestFinalResultNoFiller()" target="_blank">getBestFinalResultNoFiller()</a> is the method used most in the demos; it avoids any partial sentences in the text output. Basically, the program waits until it is certain of a finished phrase before it hands off a textual guess. Another possible method, <a href="http://cmusphinx.sourceforge.net/sphinx4/javadoc/edu/cmu/sphinx/result/Result.html#getBestResultNoFiller()" target="_blank">getBestResultNoFiller()</a> (the one called in the HelloDigits loop above), seems a bit more forgiving (but perhaps less accurate): it tries to return the highest-scored result that has reached a final state, but if it doesn&#8217;t find one, it is happy to return the active result with the highest score.  There are many other methods for manipulating the Result returned by the Recognizer, including ways to dig into the search paths, words, and scores of <a href="http://cmusphinx.sourceforge.net/sphinx4/javadoc/edu/cmu/sphinx/decoder/search/Token.html" target="_blank">Tokens</a> to find the N-best results.</p>
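<p>The difference between the two methods can be sketched in plain Java. Nothing below is a Sphinx-4 class: Path is a hypothetical stand-in for one scored search path, and bestFinalText() and bestText() only imitate the behavior described above as I understand it (final paths only, versus final paths with a fallback to active ones). Returning an empty string when nothing qualifies is a choice made for this sketch.</p>

```java
public class BestResult {
    // Hypothetical stand-in for one scored search path; not a Sphinx-4 class.
    static final class Path {
        final String text;
        final double score;    // higher is better
        final boolean isFinal; // true once the path has reached a final state

        Path(String text, double score, boolean isFinal) {
            this.text = text;
            this.score = score;
            this.isFinal = isFinal;
        }
    }

    // Highest-scoring path among those in the wanted state, or null if there is none.
    static Path best(Path[] paths, boolean wantFinal) {
        Path top = null;
        for (Path p : paths) {
            if (p.isFinal != wantFinal) {
                continue;
            }
            if (top == null || p.score > top.score) {
                top = p;
            }
        }
        return top;
    }

    // Strict variant: only paths that reached a final state count.
    static String bestFinalText(Path[] paths) {
        Path p = best(paths, true);
        return p == null ? "" : p.text;
    }

    // Forgiving variant: prefer a final path, else fall back to the best active one.
    static String bestText(Path[] paths) {
        Path p = best(paths, true);
        if (p == null) {
            p = best(paths, false);
        }
        return p == null ? "" : p.text;
    }

    public static void main(String[] args) {
        Path[] stillTalking = {
            new Path("two oh", -120.0, false),
            new Path("two", -150.0, false)
        };
        System.out.println("strict:    [" + bestFinalText(stillTalking) + "]");
        System.out.println("forgiving: [" + bestText(stillTalking) + "]");
    }
}
```

<p>With only active paths available, the strict variant has nothing to return while the forgiving one hands back its best partial guess, which is the trade-off between the two methods.</p>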
<p>Resources:</p>
<p><em>For info into the architecture of Sphinx 4</em><br />
<a href="http://www.speech.cs.cmu.edu/cmusphinx/twiki/Sphinx4/WebHome/Architecture.pdf">Sphinx 4 for the Java platform<br />
Architecture Notes </a><br />
<a href="http://research.sun.com/techrep/2004/smli_tr-2004-139.pdf">Sphinx-4: A Flexible Open Source Framework<br />
for Speech Recognition</a></p>
<p><em>For info into configuration of Sphinx 4 applications</em><br />
<a href="http://cmusphinx.sourceforge.net/sphinx4/doc/ProgrammersGuide.html">Sphinx-4 Application Programmer&#8217;s Guide</a><br />
<a href="http://cmusphinx.sourceforge.net/sphinx4/javadoc/edu/cmu/sphinx/util/props/doc-files/ConfigurationManagement.html">Configuration Management for Sphinx-4</a></p>
]]></content:encoded>
			<wfw:commentRSS>http://www.xncroft.com/blog/lyceum/voicerecog/2006/07/05/sphinx-4-architecture/feed/</wfw:commentRSS>
		</item>
		<item>
		<title>Bluetooth Headset results</title>
		<link>http://www.xncroft.com/blog/lyceum/voicerecog/2006/06/30/bluetooth-headset-results/</link>
		<comments>http://www.xncroft.com/blog/lyceum/voicerecog/2006/06/30/bluetooth-headset-results/#comments</comments>
		<pubDate>Fri, 30 Jun 2006 19:11:53 +0000</pubDate>
		<dc:creator>admin</dc:creator>
		
	<category>Notes</category>
		<guid isPermaLink="false">http://www.xncroft.com/blog/lyceum/voicerecog/2006/06/30/bluetooth-headset-results/</guid>
		<description><![CDATA[Bluetooth headsets could turn out to be the most widely available interface for speaking to the computer, so we wanted to do some tests to see if the demos that came with Sphinx-4 would work with a headset.  
One useful thing to note is that I had to jump through some hoops to get [...]]]></description>
			<content:encoded><![CDATA[<p>Bluetooth headsets could turn out to be the most widely available interface for speaking to the computer, so we wanted to do some tests to see if the demos that came with Sphinx-4 would work with a headset.  </p>
<p>One useful thing to note is that I had to jump through some hoops to get the Bluetooth headset to pair with my Powerbook G4 that I purchased in 2003.  Basically, I had to update my bluetooth software AND FIRMWARE for it to be able to pair with the NOKIA HDW-3 headset that we wanted to use.  Dig into the Apple forums <a href="http://discussions.apple.com/message.jspa?messageID=2250836">here</a> and <a href="http://discussions.apple.com/thread.jspa?messageID=1864354#1864354">here</a> for useful tips on this update process.</p>
<p>Once you have the headset paired with the computer, change your line-in settings in System Preferences / Sound to your bluetooth headset so that the computer is listening through the headset instead of its internal mic.</p>
<p>I tested several of the demo applications from Sphinx-4 with the Bluetooth headset and with the mic built into the laptop, with very similar results.  Neither input achieves total accuracy with the demos, but the computer frequently recognizes speech through both.  I&#8217;m not sure yet how to run the diagnostic applications to compare exactly what the difference is, but both seem to work to some degree. There&#8217;s something I don&#8217;t understand yet about the timing of when the computer is actually listening in the demos, so that is something to look into.
</p>
]]></content:encoded>
			<wfw:commentRSS>http://www.xncroft.com/blog/lyceum/voicerecog/2006/06/30/bluetooth-headset-results/feed/</wfw:commentRSS>
		</item>
		<item>
		<title>Getting started with Sphinx 4 :: HelloWorld!!</title>
		<link>http://www.xncroft.com/blog/lyceum/voicerecog/2006/06/30/on-sphinx-getting-started-with-spinx-4/</link>
		<comments>http://www.xncroft.com/blog/lyceum/voicerecog/2006/06/30/on-sphinx-getting-started-with-spinx-4/#comments</comments>
		<pubDate>Fri, 30 Jun 2006 17:55:48 +0000</pubDate>
		<dc:creator>admin</dc:creator>
		
	<category>Tutorials</category>
		<guid isPermaLink="false">http://www.xncroft.com/blog/lyceum/voicerecog/2006/06/30/on-sphinx-getting-started-with-spinx-4/</guid>
		<description><![CDATA[Sphinx is an HMM-based speech recognition system developed at CMU.  There are 4 versions of Sphinx. For the moment, I&#8217;m exploring Sphinx-4 because it is written in Java, and it would probably combine most easily with the Video Comments work being done at ITP.
Getting everything you need to run Sphinx 4
First, download the bin [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://cmusphinx.sourceforge.net/html/cmusphinx.php">Sphinx</a> is an <a href="http://en.wikipedia.org/wiki/Hidden_Markov_model" onMouseover="ddrivetip('Hidden Markov Model');" onMouseout="hideddrivetip()">HMM</a>-based speech recognition system developed at CMU.  There are 4 versions of Sphinx. For the moment, I&#8217;m exploring Sphinx-4 because it is written in Java, and it would probably combine most easily with the Video Comments work being done at ITP.</p>
<p><strong>Getting everything you need to run Sphinx 4</strong><br />
First, download the bin (or the source if you want to compile everything yourself) of <a href="http://sourceforge.net/project/showfiles.php?group_id=1904&#038;package_id=117949">Sphinx 4 at sourceforge.</a></p>
<p>Then, to run any speech applications in Java, whether they be speech recognition or text to speech, you must get the Java Speech API setup.  </p>
<p>After downloading the Sphinx-4 bin, you need to &#8220;unpack&#8221; the jsapi.jar by accepting a BCL license. Instructions for setting up the JSAPI 1.0 for UNIX and Windows systems are <a href="http://freetts.sourceforge.net/docs/jsapi_setup.html">here,</a> but I&#8217;ll repeat how I set it up on my Mac right here for good measure:</p>
<p>In Terminal, change directory to the <strong style="font-family: Courier; font-weight: normal;">lib</strong> folder in the Sphinx-4 package that you downloaded where the jsapi.sh file sits.<br />
<strong style="font-family: Courier; font-weight: normal;">cd sphinx4-1/sphinx4-1.0beta/lib</strong></p>
<p>If you type <strong style="font-family: Courier; font-weight: normal;">ls</strong>, you should see a file called jsapi.sh in this directory.</p>
<p>Then type <strong style="font-family: Courier; font-weight: normal;">chmod +x ./jsapi.sh</strong><br />
Then type <strong style="font-family: Courier; font-weight: normal;">sh ./jsapi.sh</strong><br /> A long document that is the BCL license should show up.  Scroll down to the end of it and agree to it by typing &#8216;y&#8217; when prompted to do so.  Then, when you press Enter, the jsapi.jar should be unpacked and ready in this same lib directory.  </p>
<p>Move this jsapi.jar file to your Java Extensions folder ( yourComputer: System/Library/Java/Extensions ). Now your computer should know how to talk with Java!</p>
<p><strong>Now run some demos!</strong><br />
To make sure everything is set up correctly, we can now run the demos that came with the Sphinx-4 download.  In the bin directory, let&#8217;s run the HelloWorld.jar that came with the package. In this application, there is a fixed vocabulary of words that you can speak for your computer to hear. </p>
<p>Run it in Terminal by changing directory to the bin folder ( sphinx4-1/sphinx4-1.0beta/bin ) and then launching the HelloWorld app by typing <strong style="font-family: Courier; font-weight: normal;">java -mx312m -jar HelloWorld.jar</strong>.  A more in-depth <a href="http://cmusphinx.sourceforge.net/sphinx4/demo/sphinx/helloworld/README.html" onMouseover="ddrivetip('HelloWorld Tutorial from Sphinx');" onMouseout="hideddrivetip()">tutorial</a> about running the HelloWorld.jar and what the app does is on the Sphinx website.</p>
]]></content:encoded>
			<wfw:commentRSS>http://www.xncroft.com/blog/lyceum/voicerecog/2006/06/30/on-sphinx-getting-started-with-spinx-4/feed/</wfw:commentRSS>
		</item>
		<item>
		<title>General Speech Recognition - How It Works, What Types Exist</title>
		<link>http://www.xncroft.com/blog/lyceum/voicerecog/2006/06/30/general-speech-recognition-how-it-works-what-types-exist/</link>
		<comments>http://www.xncroft.com/blog/lyceum/voicerecog/2006/06/30/general-speech-recognition-how-it-works-what-types-exist/#comments</comments>
		<pubDate>Fri, 30 Jun 2006 16:13:24 +0000</pubDate>
		<dc:creator>admin</dc:creator>
		
	<category>Notes</category>
		<guid isPermaLink="false">http://www.xncroft.com/blog/lyceum/voicerecog/2006/06/30/general-speech-recognition-how-it-works-what-types-exist/</guid>
		<description><![CDATA[I found one useful summary of what speech recognition is here.  It details the types of speech recognizers, including speaker-independent, speaker-dependent, continuous speech recognition, isolated speech recognition, and vocabulary constrained system.  As the technology exists now, it seems that one has to figure out a compromise between vocabulary size that the computer can [...]]]></description>
			<content:encoded><![CDATA[<p>I found one useful summary of what speech recognition is <a href="http://javaboutique.internet.com/tutorial/speechapi/index-3.html">here.</a>  It details the types of speech recognizers, including speaker-independent, speaker-dependent, continuous speech recognition, isolated speech recognition, and vocabulary-constrained systems.  As the technology exists now, it seems that one has to find a compromise between the size of the vocabulary that the computer can recognize and the flexibility of the system to recognize different speakers and natural ways of speaking. From what Shawn has told me about what we&#8217;re looking for, we definitely need a continuous speech program with a rather large vocabulary. Allowing natural ways of speaking seems to be the most important element, since we want this system to maintain a conversational feel.  So, the accuracy of the recognition doesn&#8217;t have to be perfect.
</p>
]]></content:encoded>
			<wfw:commentRSS>http://www.xncroft.com/blog/lyceum/voicerecog/2006/06/30/general-speech-recognition-how-it-works-what-types-exist/feed/</wfw:commentRSS>
		</item>
	</channel>
</rss>

