Wednesday, May 17, 2006 8:49 AM
Introduction
Back in the 80’s, speech was a pretty big deal. Everyone was trying to get their Apple, Commodore, or other 8-bit computers to speak to them. Some of it was to report information, but given the constraints of the time, it was generally either for games or pure fun. There’s nothing like having your computer call your brother a name! Some games would speak as part of a typing lesson, to act as in-game narration, or to present other information. Sound sampling was starting to be used, but memory was too small to store snippets of every possible spoken phrase. Synthesized speech was the way to go!
These days, games have much less need for dynamic speech. In fact, some games can even string together speech snippets into new sentences pretty convincingly. Most of us don’t have the resources to do this though. If we want arbitrary spoken messages, we need to go retro! Welcome back synthesized speech, we missed you!
This article will step you through adding speech to your .NET application. So what can you use the speech for? Any kind of monitoring application can speak the status of something. You could read the subject of incoming emails, the current capacity of a hard drive, the weather, or any other real-time information. This could be very useful for adding accessibility to an application or providing information in a non-obtrusive manner. You’ll be surprised how easy it is to use. What you do with it is up to you!
Adding Speech Support
As it turns out, Windows has provided built-in speech support for quite a while. The Microsoft Speech API provides powerful spoken word synthesis with little effort for the developer. The first step is to link to the appropriate library. Unfortunately, speech is not directly accessible as managed code; however, the interop work required is negligible. After a few clicks, you’ll be speaking like the best!
Start by creating a new solution in C# or VB. I haven’t tried it, but I see no reason why J# wouldn’t work as well if you are working in that realm. Linking to the COM object is as simple as adding a reference to the library.
With your new solution and project open, in the Solution Explorer, right-click on References, then click Add Reference. From the Add Reference dialog, click the COM tab, then select Microsoft Speech Object Library and click OK:
Figure 1: Referencing the Speech library
Theoretically, you could start to speak now with just a line or two of extra code, but for convenience (and to stretch this article…) let’s encapsulate the speech behavior into a helper class.
In Solution Explorer, right-click the project, then click Add | Class. In the Add New Item dialog, for Name, enter “SpeechUtility.”
Figure 2: Creating the SpeechUtility class
The primary feature you are probably waiting for is the ability to speak a sentence. This is as short as one line of code. Before adding any code, it will be more convenient to import the SpeechLib namespace with a using statement. At the top of the file, after “using System.Text;” add:
using SpeechLib;
This will save time when using classes from the namespace. In order to speak a command though, you’ll need to instantiate the SpVoice class. Instead of creating a fresh one every time, and taking the associated memory and performance hit, create a static object to reuse each time. At the class level, after the “class SpeechUtility” declaration, add the following:
private static SpVoice objSpeech = new SpVoice();
The “static” keyword indicates that even if you create 100 objects from this class, they all share this one object. The static keyword can also be used at the method level (as you will see). In a similar fashion, this means that the method is not associated with any particular instance of the class. A static method can access any static member variables, but it cannot access instance variables, because it has no instance to read them from. This class uses only static data and methods, so that is no trouble. Making it instance-based works fine as well, but then you need to pass the instance around throughout the code. Since we gain no benefit from a distinct instance, the static approach is more convenient.
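To make the static versus instance distinction concrete, here is a small sketch that is separate from the speech code. The Counter class and its member names are hypothetical and exist only for illustration.

```csharp
using System;

class Counter
{
    // Static field: one copy, shared by every Counter object.
    private static int totalCreated = 0;

    // Instance field: each Counter object has its own copy.
    private int id;

    public Counter()
    {
        totalCreated++;
        id = totalCreated;
    }

    // Static method: called on the class itself; it can only use static members.
    public static int TotalCreated()
    {
        return totalCreated;
    }

    // Instance method: called on an object; it can use instance and static members.
    public int Id()
    {
        return id;
    }
}

class Program
{
    static void Main()
    {
        Counter first = new Counter();
        Counter second = new Counter();

        Console.WriteLine(Counter.TotalCreated()); // prints 2 (shared count)
        Console.WriteLine(first.Id());             // prints 1
        Console.WriteLine(second.Id());            // prints 2
    }
}
```

Note that TotalCreated is called on the class name, while Id needs an object. The SpeechUtility class uses the first style throughout.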
Let’s Talk!
You are almost ready to start talking! The Speak method of the objSpeech object does the work of invoking the speech synthesis. The first argument is a string, in this case the message to say (how it is interpreted depends on the second argument). The second argument instructs the speech engine what to do with the first argument. This can be used to speak text from a file, speak directly to a WAV file, or to pass other instructions to the engine. Create our Speak method as follows:
public static void Speak(string tts)
{
    objSpeech.Speak(tts, SpeechVoiceSpeakFlags.SVSFDefault);
}
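Put together, the helper class built so far might look like the sketch below. It assumes the project already references the Microsoft Speech Object Library (which supplies the SpeechLib interop namespace) and that it runs on Windows with a voice installed. The Program class and its sample sentence are only there to give the sketch an entry point.

```csharp
using System;
using SpeechLib; // interop namespace generated by the COM reference

class SpeechUtility
{
    // One shared voice object, reused for every call.
    private static SpVoice objSpeech = new SpVoice();

    // Speaks the given text and returns once the speech has finished.
    public static void Speak(string tts)
    {
        objSpeech.Speak(tts, SpeechVoiceSpeakFlags.SVSFDefault);
    }
}

class Program
{
    static void Main()
    {
        // Sample sentence chosen for illustration only.
        SpeechUtility.Speak("Hello from the speech utility.");
    }
}
```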
Simply calling the Speak method with a string will cause the speech synthesizer to do its thing. The variable name, tts, was chosen from the phrase Text-To-Speech. Now let’s create a simple user interface. You could alternatively just call Speak from the main method in Program.cs, but that’s not as much fun! Create any UI you’d like. Mine looks like this:
Figure 3: The user interface
Double-clicking the button creates an event handler. In the body of the event handler, invoke the Speak method with the content of the TextBox control:
private void speakButton_Click(object sender, EventArgs e)
{
    SpeechUtility.Speak(messageTextBox.Text);
}
If you run the application (F5) you can now speak anything your heart desires. You may have noticed that you can’t do anything to the window while it’s speaking. That is because, with the default flag, Speak does not return until the speech has finished, so the UI thread is blocked in the meantime. This may be fine in some applications, but you will generally want the ability to process user input or perform other actions while speech is occurring. This is pretty easy to add by specifying a different flag. Replace the SpeechVoiceSpeakFlags.SVSFDefault with SpeechVoiceSpeakFlags.SVSFlagsAsync as follows:
objSpeech.Speak(tts, SpeechVoiceSpeakFlags.SVSFlagsAsync);
The method will now return immediately, while the speech continues. For more control, you can use the WaitUntilDone method to wait a specified number of milliseconds for the speech to complete. If the speech takes longer than the number of milliseconds, it won’t get cut off, but your code will continue executing. Just call it after the Speak method if desired:
objSpeech.Speak(tts, SpeechVoiceSpeakFlags.SVSFlagsAsync);
objSpeech.WaitUntilDone(2000); // milliseconds
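WaitUntilDone also returns a Boolean that reports whether the speech finished within the timeout, which can be handy for deciding what to do next. The following is a minimal sketch under the same assumptions as the rest of the article (a reference to the Microsoft Speech Object Library on Windows). The SpeechTimeoutDemo class, the SpeakWithTimeout method, and the sample text are hypothetical names used for illustration.

```csharp
using System;
using SpeechLib; // interop namespace generated by the COM reference

class SpeechTimeoutDemo
{
    private static SpVoice objSpeech = new SpVoice();

    // Starts speaking asynchronously, then waits up to timeoutMs milliseconds.
    // Returns true if the speech completed within that time.
    public static bool SpeakWithTimeout(string tts, int timeoutMs)
    {
        objSpeech.Speak(tts, SpeechVoiceSpeakFlags.SVSFlagsAsync);
        return objSpeech.WaitUntilDone(timeoutMs);
    }

    static void Main()
    {
        bool finished = SpeakWithTimeout("This sentence may take a while to say.", 2000);
        if (!finished)
        {
            // The speech is still playing; it was not cut off.
            Console.WriteLine("Still talking after two seconds.");
            // Wait a little longer so the program does not exit mid-sentence.
            objSpeech.WaitUntilDone(10000);
        }
    }
}
```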
With asynchronous speech, the engine waits for one sentence to finish before it starts the next one. You can avoid this by adding a second flag. You combine flags using the bitwise OR operator (|). The result has every bit set that is set in either flag, and a value built this way is called a bit mask. Add the SVSFPurgeBeforeSpeak flag to abort any in-progress speech before starting something new (one line of code):
objSpeech.Speak(tts, SpeechVoiceSpeakFlags.SVSFlagsAsync | SpeechVoiceSpeakFlags.SVSFPurgeBeforeSpeak);
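If bit masks are new to you, the sketch below shows the mechanics without needing the speech library. DemoSpeakFlags is a hypothetical stand-in for SpeechVoiceSpeakFlags, and its numeric values are chosen for illustration only.

```csharp
using System;

// Hypothetical stand-in enum; each flag occupies its own bit.
[Flags]
enum DemoSpeakFlags
{
    Default = 0,          // binary 00
    Async = 1,            // binary 01
    PurgeBeforeSpeak = 2  // binary 10
}

class FlagsDemo
{
    static void Main()
    {
        // Bitwise OR keeps every bit that is set in either operand: 01 | 10 = 11.
        DemoSpeakFlags flags = DemoSpeakFlags.Async | DemoSpeakFlags.PurgeBeforeSpeak;
        Console.WriteLine((int)flags); // prints 3

        // Bitwise AND tests whether a particular flag is part of the mask.
        bool isAsync = (flags & DemoSpeakFlags.Async) != 0;
        bool purges = (flags & DemoSpeakFlags.PurgeBeforeSpeak) != 0;
        Console.WriteLine(isAsync && purges); // prints True
    }
}
```

Because each flag uses a different bit, the receiver can recover every individual flag from the single combined value.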
Next Steps
You’ve got the basics of using speech now. Some advanced features enable you to create WAV files of speech, read from files, play WAV files, change voices (if installed), alter pitch/volume, and more. Enhance a game, or almost any monitoring application, with dynamic speech synthesis. It’s not exactly human, but it’s definitely understandable and can add a new dimension to your applications.
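As a starting point for a few of those features, here is a rough sketch that lists the installed voices, adjusts the rate and volume, and writes speech to a WAV file. It again assumes a reference to the Microsoft Speech Object Library on Windows. The AdvancedSpeechDemo class, the sample sentences, and the output file name speech.wav are assumptions made for illustration, so check the members against the SAPI documentation for your installed version.

```csharp
using System;
using SpeechLib; // interop namespace generated by the COM reference

class AdvancedSpeechDemo
{
    static void Main()
    {
        SpVoice voice = new SpVoice();

        // List every installed voice by its description.
        ISpeechObjectTokens voices = voice.GetVoices("", "");
        for (int i = 0; i < voices.Count; i++)
        {
            Console.WriteLine(voices.Item(i).GetDescription(0));
        }

        // Rate ranges from -10 (slowest) to 10 (fastest); Volume from 0 to 100.
        voice.Rate = -2;
        voice.Volume = 80;
        voice.Speak("Speaking a little slower and quieter.",
            SpeechVoiceSpeakFlags.SVSFDefault);

        // Redirect the output to a WAV file instead of the speakers.
        // "speech.wav" is a hypothetical file name.
        SpFileStream wavStream = new SpFileStream();
        wavStream.Open("speech.wav", SpeechStreamFileMode.SSFMCreateForWrite, false);
        voice.AudioOutputStream = wavStream;
        voice.Speak("This sentence goes to a file.",
            SpeechVoiceSpeakFlags.SVSFDefault);
        wavStream.Close();
    }
}
```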
Conclusion
The Microsoft Speech API is pretty powerful and can let you do fun things. Go to the Microsoft Speech API download page to get more features such as additional voices and languages. Speech quality will continue to improve, and as it gets better I expect to see it make a comeback on the desktop. Be ready!