Amazon offers the option of playing audio files via Echo using SSML. To quote:
"...in some cases you may want additional control over how Alexa generates the speech from the text in your response. For example, you may want a longer pause within the speech, or you may want a string of digits read back as a standard telephone number. The Alexa Skills Kit provides this type of control with Speech Synthesis Markup Language (SSML) support."
I looked around for examples of how this is done and ran into a brick wall. Here's what I learnt about playing an audio file on Echo, using Python as my language.
- The <speak> tag: all SSML text needs to be enclosed within the <speak> tag.
- The <audio> tag: lets you provide the URL of an audio file. There are some guidelines around the hosting and characteristics of the file you provide:
  - The MP3 must be hosted at an Internet-accessible HTTPS endpoint (best bet: use S3).
  - No sensitive or customer-specific information.
  - Sample rate of 16000 Hz, bit rate of 48 kbps.
  - No longer than 90 seconds.
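As a quick sanity check before wiring a file into a skill, the guidelines above can be sketched as a small Python function. This is only an illustration: the name check_audio_source is my own, the duration has to be supplied by the caller, and the sample rate and bit rate are not inspected at all.

```python
from urllib.parse import urlparse

MAX_DURATION_SECONDS = 90  # limit taken from the guideline list above


def check_audio_source(url, duration_seconds):
    """Return a list of problems; an empty list means the basic checks passed.

    Hypothetical helper: it only looks at the URL and a caller-supplied
    duration, not at the audio encoding itself.
    """
    problems = []
    parsed = urlparse(url)
    if parsed.scheme != "https":
        problems.append("URL must use HTTPS")
    if not parsed.path.lower().endswith(".mp3"):
        problems.append("file should be an MP3")
    if duration_seconds > MAX_DURATION_SECONDS:
        problems.append("audio is longer than 90 seconds")
    return problems
```

An empty list only means the URL and length look plausible; the file still needs the right sample rate and bit rate.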
How do we meet the requirements around file characteristics? Thankfully, Amazon even identifies tools and commands for this. Two options (among the many available):
- Command line: FFmpeg.
  - A single FFmpeg command converts the provided <input-file> to an MP3 file that works with the <audio> tag.
- GUI: Audacity (this needs the LAME library, available at http://lame.buanzo.org/#lamewindl).
  - Open the file to convert.
  - Set the Project Rate in the lower-left corner to 16000.
  - Click File > Export Audio and change the Save as type to MP3 Files.
  - Set the Bit Rate Mode to Constant and Quality to 48 kbps.
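For the command-line route, here is a minimal sketch of driving FFmpeg from Python so that the output matches the 16000 Hz / 48 kbps figures above. The function names are my own, and the flag set is an assumption built from standard FFmpeg audio options rather than a copy of the command in Amazon's documentation. It also assumes ffmpeg is on your PATH and was built with libmp3lame.

```python
import subprocess


def build_ffmpeg_command(input_path, output_path):
    """Build an FFmpeg argument list targeting 16000 Hz, 48 kbps MP3.

    Hypothetical helper; the flags are ordinary FFmpeg options, chosen to
    match the characteristics listed earlier.
    """
    return [
        "ffmpeg",
        "-i", input_path,          # input file
        "-codec:a", "libmp3lame",  # encode audio as MP3 via LAME
        "-b:a", "48k",             # audio bit rate of 48 kbps
        "-ar", "16000",            # audio sample rate of 16000 Hz
        output_path,
    ]


def convert_for_alexa(input_path, output_path):
    """Run the conversion; raises CalledProcessError if FFmpeg fails."""
    subprocess.run(build_ffmpeg_command(input_path, output_path), check=True)
```

Calling convert_for_alexa("input.wav", "output.mp3") would then write the converted file next to the original, ready to upload.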
What code changes are needed?
- In the outputSpeech attribute:
  - set the type to SSML
  - use the 'ssml' key for the marked-up text (instead of 'text')
So, in effect, if you're used to seeing:
def build_speechlet_response(title, output, reprompt_text, should_end_session):
    return {
        'outputSpeech': {
            'type': 'PlainText',
            'text': output
        },
        'card': {
            'type': 'Simple',
            'title': title,
            'content': output
        },
        'reprompt': {
            'outputSpeech': {
                'type': 'PlainText',
                'text': reprompt_text
            }
        },
        'shouldEndSession': should_end_session
    }
your function will now look something like:
def build_speechlet_response(title, output, reprompt_text, should_end_session):
    return {
        'outputSpeech': {
            # The type is now SSML and the markup goes under 'ssml', not 'text'
            'type': 'SSML',
            'ssml': output
        },
        'card': {
            'type': 'Simple',
            'title': title,
            # Note: the card still receives the same string, SSML tags included
            'content': output
        },
        'reprompt': {
            'outputSpeech': {
                'type': 'PlainText',
                'text': reprompt_text
            }
        },
        'shouldEndSession': should_end_session
    }
Here is an example of valid output (note that it is enclosed within <speak> </speak> tags; replace the bucket name and file name appropriately):
'<speak>This output speech uses SSML.<audio src="https://s3-us-west-2.amazonaws.com/<bucket name>/<file name.mp3>" />.</speak>'
When this is returned in outputSpeech, Echo will:
- read out, in Alexa's normal voice: "This output speech uses SSML."
- and then play the audio file the URL points to.
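To avoid hand-assembling the SSML string, here is a rough sketch of a helper that builds it, plus a variant of the response function that takes separate text for the card, since the function above passes the same string (tags and all) to both. Both names, build_audio_ssml and build_ssml_response, and the extra card_text parameter are my own additions for illustration, not part of the Alexa Skills Kit.

```python
from xml.sax.saxutils import escape, quoteattr


def build_audio_ssml(spoken_text, audio_url):
    """Wrap text and an <audio> tag in <speak>, escaping XML special characters."""
    return "<speak>{}<audio src={} /></speak>".format(
        escape(spoken_text),   # escapes &, < and > in the spoken text
        quoteattr(audio_url),  # escapes the URL and adds the surrounding quotes
    )


def build_ssml_response(title, ssml, card_text, reprompt_text, should_end_session):
    """Hypothetical variant: SSML for speech, plain text for the card."""
    return {
        'outputSpeech': {'type': 'SSML', 'ssml': ssml},
        'card': {'type': 'Simple', 'title': title, 'content': card_text},
        'reprompt': {
            'outputSpeech': {'type': 'PlainText', 'text': reprompt_text}
        },
        'shouldEndSession': should_end_session
    }
```

With this split, the card in the Alexa app shows readable text while Echo still gets the full markup.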