# Virtual Device API Documentation
# Overview
We provide APIs to interact with our virtual devices programmatically. These APIs can be accessed via Node.js or HTTP.
They are easy to work with, and require simply sending a payload of what you want to "say" to Alexa or Google, such as:
virtualDevice.message("ask my skill what is the weather").then((result) => {
    console.log(result.transcript); // Prints out the reply from Alexa - e.g., "the weather is nice"
});
It is as easy as that! For more information on how our functional testing works, read here.
# Node.js SDK
# Installation
- Add the Virtual Device SDK to your project:
npm install virtual-device-sdk --save
- Get your token: Follow the instructions here.
# Constructor parameters
export interface IVirtualDeviceConfiguration {
token: string;
locale?: string;
voiceID?: string;
skipSTT?: boolean;
asyncMode?: boolean;
stt?: string;
locationLat?: string;
locationLong?: string;
conversationId?: string;
screenMode?: string;
projectId?: string;
}
- token: Your virtual device token; check here for how to obtain it
- locale: The locale you are using, defaults to en-US
- voiceID: The voice from Polly to use with the current locale, defaults to the default voice for the locale
- skipSTT: Skip speech to text for Google (Google can return text directly), defaults to false
- asyncMode: Retrieve the conversation sent in batch asynchronously, defaults to false
- stt: What speech to text to use (google or witai), defaults to google
- locationLat: Location Latitude used in Google Virtual Devices.
- locationLong: Location Longitude used in Google Virtual Devices.
- conversationId: Set a conversation id in advance for the batch process in async mode.
- screenMode: State of the screen used in Google Virtual Devices (playing or off), defaults to playing
- projectId: Dialogflow project id
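The defaults listed above can be made explicit in a small helper. This is only a sketch: `withDefaults` and `DOCUMENTED_DEFAULTS` are hypothetical names, not part of the SDK, and the helper assumes the defaults are exactly the ones stated in the list.

```javascript
// Hypothetical helper (not part of virtual-device-sdk): fills in the defaults
// documented above so the effective configuration is visible in one place.
const DOCUMENTED_DEFAULTS = {
    locale: "en-US",
    skipSTT: false,
    asyncMode: false,
    stt: "google",
    screenMode: "playing",
};

function withDefaults(configuration) {
    if (!configuration || typeof configuration.token !== "string" || configuration.token === "") {
        throw new Error("token is required");
    }
    if (configuration.stt !== undefined && !["google", "witai"].includes(configuration.stt)) {
        throw new Error("stt must be google or witai");
    }
    // Values supplied by the caller win over the documented defaults.
    return Object.assign({}, DOCUMENTED_DEFAULTS, configuration);
}

console.log(withDefaults({ token: "<PUT_YOUR_TOKEN_HERE>", locale: "de-DE" }));
```

The returned object could then be passed to the VirtualDevice constructor in place of a hand-written configuration.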
# Sending a Message
Here is a simple example in JavaScript:
const vdSDK = require("virtual-device-sdk");
const configuration = {
    token: "<PUT_YOUR_TOKEN_HERE>",
    locale: "en-US",
    voiceID: "Joey",
};
const virtualDevice = new vdSDK.VirtualDevice(configuration);
virtualDevice.message("open my skill").then((result) => {
    console.log("Reply Transcript: " + result.transcript);
    // card can be null, so guard before reading its title
    console.log("Reply Card Title: " + (result.card ? result.card.mainTitle : "none"));
});
# Result Payload
Here is the full result payload:
export interface IVirtualDeviceResult {
card: ICard | null;
debug?: {
rawJSON: any;
};
sessionTimeout: number;
streamURL: string | null;
transcript: string;
transcriptAudioURL: string;
utteranceURL: string;
caption: string;
display: object;
}
export interface ICard {
imageURL: string | null;
mainTitle: string | null;
subTitle: string | null;
textField: string;
type: string;
}
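Because `card` and `streamURL` can be null, code that consumes a result should guard against missing values. The sketch below shows one way to do that; `summarizeResult` is a hypothetical helper, and the sample object is written by hand for illustration rather than taken from the service.

```javascript
// Hypothetical helper: turns an object shaped like IVirtualDeviceResult into a
// one-line summary, guarding against a null card and a null streamURL.
function summarizeResult(result) {
    const title = result.card && result.card.mainTitle ? result.card.mainTitle : "(no card title)";
    const stream = result.streamURL ? result.streamURL : "(no stream)";
    return `transcript="${result.transcript}" card="${title}" stream="${stream}"`;
}

// Hand-written sample, not real service output.
const sampleResult = {
    card: null,
    sessionTimeout: 0,
    streamURL: null,
    transcript: "the weather is nice",
};

console.log(summarizeResult(sampleResult));
```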
# Adding homophones
Our functional tests use speech recognition for turning the output speech coming back from the virtual device into text.
This process is imperfect - to compensate for this, homophones can be specified for errors that occur when a reply from the virtual device is misunderstood.
Before sending your message, call the "addHomophones" method to list the misrecognized words you expect for a given word:
const virtualDevice = new vdSDK.VirtualDevice(configuration);
virtualDevice.addHomophones("white", ["wife", "while"]);
virtualDevice.message("open my skill").then((result) => {
console.log("Reply Transcript: " + result.transcript);
console.log("Reply Card Title: " + result.card.mainTitle);
});
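To make the idea concrete, the sketch below shows what homophone correction achieves conceptually: words the speech recognition tends to produce by mistake are mapped back to the intended word. It is an illustration only. `applyHomophones` and `escapeRegExp` are hypothetical functions, and the SDK's actual implementation may differ.

```javascript
// Illustration only: a conceptual stand-in for homophone correction.
// Neither function is part of virtual-device-sdk.
function escapeRegExp(text) {
    return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

function applyHomophones(transcript, homophones) {
    let corrected = transcript;
    for (const [word, soundAlikes] of Object.entries(homophones)) {
        for (const soundAlike of soundAlikes) {
            // Replace whole words only, ignoring case.
            const pattern = new RegExp(`\\b${escapeRegExp(soundAlike)}\\b`, "gi");
            corrected = corrected.replace(pattern, word);
        }
    }
    return corrected;
}

console.log(applyHomophones("the wall is wife", { white: ["wife", "while"] }));
```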
# Sending messages in batch
You can send multiple messages and expected phrases in an object array. The goal of this method is to handle a complete interaction with the virtual device. By sending all messages to the endpoint, it is able to sequence them faster, avoiding session timeouts.
virtualDevice.batchMessage([
    { text: "what is the weather" },
    { text: "what time is it" },
    { text: "tell test player to play" },
]).then((results) => {
    console.log("Results: ", results);
});
The result will be an array of objects as defined in the Result Payload section.
# Processing the results asynchronously
By default, batchMessage returns a promise that resolves once the complete interaction has finished. If you set asyncMode in the constructor of the SDK instance, the promise instead resolves with a conversation id that you can use to retrieve the results progressively.
const configuration = {
    token: "<PUT_YOUR_TOKEN_HERE>",
    locale: "en-US",
    voiceID: "Joey",
    skipSTT: false,
    asyncMode: true,
};
const virtualDevice = new vdSDK.VirtualDevice(configuration);
virtualDevice.batchMessage([
    { text: "what is the weather" },
    { text: "what time is it" },
    { text: "tell test player to play" },
]).then((result) => {
    console.log("Conversation Id: ", result.conversation_id);
    // logic to wait some time - you can use setTimeout to do this
    return virtualDevice.getConversationResults(result.conversation_id);
}).then((results) => {
    console.log("Results: ", results);
});
As the results are processed, subsequent calls to the "getConversationResults" method return them.
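The waiting step can be handled with a small polling loop. The sketch below is one possible approach, not part of the SDK: `pollUntilComplete`, `delay`, and `createFakeConversation` are hypothetical names, and the code assumes that the function it polls resolves to an array holding the results processed so far. The fake conversation stands in for the service so the example runs without a token. In real use, `fetchResults` could wrap a call to `getConversationResults`.

```javascript
// Hypothetical polling helper. `fetchResults` is any function returning a
// promise for the results available so far; it is injected so that the
// sketch runs without network access.
function delay(ms) {
    return new Promise((resolve) => setTimeout(resolve, ms));
}

async function pollUntilComplete(fetchResults, expectedCount, intervalMs = 1000, maxAttempts = 30) {
    for (let attempt = 1; attempt <= maxAttempts; attempt++) {
        const results = await fetchResults();
        if (Array.isArray(results) && results.length >= expectedCount) {
            return results;
        }
        await delay(intervalMs);
    }
    throw new Error(`Gave up after ${maxAttempts} attempts`);
}

// Stand-in for the service: one more result becomes available on every call.
function createFakeConversation(transcripts) {
    let available = 0;
    return async () => {
        available = Math.min(available + 1, transcripts.length);
        return transcripts.slice(0, available).map((transcript) => ({ transcript }));
    };
}

pollUntilComplete(createFakeConversation(["sunny", "noon", "playing"]), 3, 10)
    .then((results) => console.log("Results: ", results));
```

Capping the number of attempts keeps a stalled conversation from blocking a test run indefinitely.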
# HTTP API
The VirtualDevice service can also be called directly via HTTP.
To use it, first get your token:
Follow the instructions here.
The Base URL is:
https://virtual-device.bespoken.io
# Process Endpoint
Takes a single message and returns the AVS response in text form.
URL
/process
Method:
GET
URL Params
Required:
message=string
: the message that we want to send to Alexa
user_id=string
: "validation token" obtained from Bespoken Dashboard (http://apps.bespoken.io/dashboard/)
Optional:
language_code=string
: one of Alexa's supported locales (e.g. en-US, de-DE, etc.). Default value: "en-US". Taken from https://developer.amazon.com/docs/custom-skills/develop-skills-in-multiple-languages.html#h2-code-changes
voice_id=string
: one of Amazon Polly's supported voices (e.g. Joey, Vicki, Hans, etc.). Default value: "Joey". MUST correspond with the language_code. Taken from: https://docs.aws.amazon.com/polly/latest/dg/voicelist.html
phrases=string
: a word or phrase used as a hint so that the speech recognition is more likely to recognize it as part of the Alexa response. You can use this multiple times in the query string to send more than one.
debug=boolean
: return additional information like process duration, transcript duration and raw response. Default value: "false"
new_conversation=boolean
: open a new session, only for Google virtual devices. Default value: "false"
stt=string
: speech to text service to use, supported services are google and witai. Default value: "google"
project_id=string
: project id of the Dialogflow agent
audio_url: string
: URL of the audio file.
format: string
: When audio data is provided, the format of the audio. If an audio_url that includes an extension is provided, we'll use that instead. Valid values are 'raw' (for PCM), 'wav', 'mp3', and 'ogg'. Defaults to 'raw'.
frame_rate: int
: When audio data is provided, the sample rate of the audio - defaults to 16000. We recommend using audio recorded at 16000 as this is what is typically used by the assistants. Using other sample rates will require re-sampling the audio. This field is only needed for audio with 'raw' format - for other formats, the frame_rate is contained in the audio data.
channels: int
: When audio data is provided, the number of channels in the audio. Defaults to 1. This field is only needed for audio with 'raw' format - for other formats, the number of channels is contained in the audio data.
sample_width: int
: When audio data is provided, the number of 8-bit bytes per sample - defaults to 2. This field is only needed for audio with 'raw' format - for other formats, the sample width is contained in the audio data.
Success Response:
- Code: 200
Content:
{ "streamURL": "string", "sessionTimeout": 0, "transcriptAudioURL": "string", "message": "string", "transcript": "string", "card": { "subTitle": "string", "mainTitle": "string", "textField": "string", "type": "string", "imageURL": "string" }, "display": "object", "caption": "string" }
Error Response:
Code: 400 BAD REQUEST
Content:"message is required"
Code: 400 BAD REQUEST
Content:"user_id is required"
Code: 400 BAD REQUEST
Content:"Invalid user_id"
Code: 400 BAD REQUEST
Content:{error: 'stt is invalid, it could be google or witai'}
Code: 500 INTERNAL SERVER ERROR
Content:{error: 'error message in case of an exception'}
Sample Call:
curl "https://virtual-device.bespoken.io/process?message=what%20time%20is%20it&user_id=<your user id>&voice_id=Joey&language_code=en-US"
Notes:
- If language_code or voice_id is omitted, they default to en-US and Joey respectively.
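Query strings like the one in the sample call are easy to get wrong by hand, since the message usually contains spaces. The sketch below builds the URL from the parameters documented above using the `URL` class that ships with Node.js. `buildProcessUrl` and `BASE_URL` are hypothetical names introduced for illustration, and nothing is sent over the network.

```javascript
// Sketch: builds a /process URL from the documented parameters.
// URL is a Node.js global; buildProcessUrl is a hypothetical helper.
const BASE_URL = "https://virtual-device.bespoken.io";

function buildProcessUrl(message, userId, options = {}) {
    const url = new URL("/process", BASE_URL);
    url.searchParams.set("message", message);
    url.searchParams.set("user_id", userId);
    for (const [name, value] of Object.entries(options)) {
        // phrases may be repeated, so arrays are appended one value at a time.
        const values = Array.isArray(value) ? value : [value];
        for (const item of values) {
            url.searchParams.append(name, String(item));
        }
    }
    return url.toString();
}

console.log(buildProcessUrl("what time is it", "<your user id>", {
    language_code: "en-US",
    voice_id: "Joey",
    phrases: ["it is", "o'clock"],
}));
```

The resulting string can be passed to curl or to any HTTP client that issues a GET request.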
# Batch process Endpoint
Receives multiple messages and expected phrases in an object array. The goal of this endpoint is to handle a complete interaction with Alexa. By sending all messages to the endpoint, it is able to sequence them faster, avoiding session timeouts.
URL
/batch_process
Method:
POST
Headers
Content-Type: application/json
- The content type must be set to application/json.
URL Params
Required:
user_id=string
: "validation token" obtained from Bespoken Dashboard (http://apps.bespoken.io/dashboard/)
Optional:
language_code=string
: one of Alexa's supported locales (e.g. en-US, de-DE, etc.). Default value: "en-US". Taken from https://developer.amazon.com/docs/custom-skills/develop-skills-in-multiple-languages.html#h2-code-changes
voice_id=string
: one of Amazon Polly's supported voices (e.g. Joey, Vicki, Hans, etc.). Default value: "Joey". MUST correspond with the language_code. Taken from: https://docs.aws.amazon.com/polly/latest/dg/voicelist.html
async_mode=boolean
: process the messages in the background, results can be obtained in the conversation endpoint. Default value: "false".
debug=boolean
: return additional information like process duration, transcript duration and raw response. Default value: "false"
stt=string
: speech to text service to use, supported services are google and witai. Default value: "google"
location_lat=float
: only for Google, latitude coordinate of the location of the request
location_long=float
: only for Google, longitude coordinate of the location of the request
conversation_id=string
: only for async_mode, set the conversation id
project_id=string
: project id of the Dialogflow agent
Request Body
Required:
messages=message[]
: object array where each object represents a message sent to the device. Each message can contain text or audio.
The JSON fields for the message object are as follows - either the text or audio field must be provided for each object:
text: string
: For messages to be converted to audio via text-to-speech, the text to convert.
phrases: string[]
: Hints for converting the audio response from the assistant back to text. This will guide the speech-to-text that is performed on the response from Alexa and/or Google.
audio_data: string
: Base64-encoded audio data - for sending pre-recorded audio to the assistant.
audio_url: string
: URL of the audio file.
format: string
: When audio data is provided, the format of the audio. If an audio_url that includes an extension is provided, we'll use that instead. Valid values are 'raw' (for PCM), 'wav', 'mp3', and 'ogg'. Defaults to 'raw'.
frame_rate: int
: The sample rate of the audio - defaults to 16000. We recommend using audio recorded at 16000 as this is what is typically used by the assistants. Using other sample rates will require re-sampling the audio. This field is only needed for audio with 'raw' format - for other formats, the frame_rate is contained in the audio data.
channels: int
: The number of channels in the audio. Defaults to 1. This field is only needed for audio with 'raw' format - for other formats, the number of channels is contained in the audio data.
sample_width: int
: The number of 8-bit bytes per sample - defaults to 2. This field is only needed for audio with 'raw' format - for other formats, the sample width is contained in the audio data.
{
    "messages": [
        {
            "text": string,
            "phrases": string[],
            "audio_data": string (Base64-Encoded Byte Array),
            "audio_url": string,
            "format": string,
            "frame_rate": int,
            "channels": int,
            "sample_width": int
        }
    ]
}
Success Response:
- Code: 200
Content:{ "results": [ { "streamURL": "string", "sessionTimeout": 0, "transcriptAudioURL": "string", "message": "string", "transcript": "string", "card": { "subTitle": "string", "mainTitle": "string", "textField": "string", "type": "string", "imageURL": "string" }, "display": "object", "caption": "string" } ] }
Success Response (async mode):
- Code: 200
Content:{ "conversation_id": "string" }
Error Response:
Code: 400 BAD REQUEST
Content:"Invalid message"
Code: 400 BAD REQUEST
Content:"Invalid user_id"
Code: 400 BAD REQUEST
Content:{error: 'stt is invalid, it could be google or witai'}
Code: 400 BAD REQUEST
Content:{error: 'Invalid format in message'}
Code: 400 BAD REQUEST
Content:{error: 'Invalid encoded audio in message'}
Code: 400 BAD REQUEST
Content:{error: 'Error processing audio'}
Code: 500 INTERNAL SERVER ERROR
Content:{error: 'error message in case of an exception'}
Sample Call:
const userId = "<your user id>";
const voiceId = "Joey";
const languageCode = "en-US";
// $.ajax is used instead of $.post so that the body is sent as application/json
$.ajax({
    url: `https://virtual-device.bespoken.io/batch_process?user_id=${userId}&voice_id=${voiceId}&language_code=${languageCode}`,
    method: "POST",
    contentType: "application/json",
    data: JSON.stringify({
        messages: [
            { text: "open guess the price", phrases: ["how many persons"] },
            { text: "one" },
        ],
    }),
    success: function (data) {
        console.log("Got: " + data.results.length + " responses!");
    },
});
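For callers using Node.js rather than jQuery, the request can be prepared as shown below. This is a sketch under stated assumptions: `buildBatchRequest` is a hypothetical helper, the audio bytes are placeholders rather than real audio, and the code only builds the URL and request options without sending anything.

```javascript
// Sketch: prepares the URL and fetch options for POST /batch_process.
// buildBatchRequest is a hypothetical helper; no request is made here.
function buildBatchRequest(userId, messages, params = {}) {
    if (!Array.isArray(messages) || messages.length === 0) {
        throw new Error("messages must be a non-empty array");
    }
    for (const message of messages) {
        // Each message needs text or audio, as described in the request body section.
        if (!message.text && !message.audio_data && !message.audio_url) {
            throw new Error("Each message needs text, audio_data or audio_url");
        }
    }
    const url = new URL("/batch_process", "https://virtual-device.bespoken.io");
    url.searchParams.set("user_id", userId);
    for (const [name, value] of Object.entries(params)) {
        url.searchParams.set(name, String(value));
    }
    return {
        url: url.toString(),
        init: {
            method: "POST",
            headers: { "Content-Type": "application/json" },
            body: JSON.stringify({ messages }),
        },
    };
}

// Pre-recorded audio is sent Base64-encoded; these bytes are placeholders.
const audioData = Buffer.from([0, 1, 2, 3]).toString("base64");

const request = buildBatchRequest("<your user id>", [
    { text: "open guess the price", phrases: ["how many persons"] },
    { audio_data: audioData, format: "raw", frame_rate: 16000, channels: 1, sample_width: 2 },
], { language_code: "en-US", voice_id: "Joey" });

console.log(request.url);
console.log(request.init.body);
```

With Node.js 18 or later, the prepared request could then be sent with the built-in `fetch(request.url, request.init)`.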
# Conversation Endpoint
Obtains the processed results from a batch process sent in async mode.
URL
/conversation
Method:
GET
URL Params
Required:
uuid=string
: the id of the conversation, returned by the batch process endpoint when the async_mode param is set to true
Success Response:
- Code: 200
Content:{ "results": [ { "streamURL": "string", "sessionTimeout": 0, "transcriptAudioURL": "string", "message": "string", "transcript": "string", "card": { "subTitle": "string", "mainTitle": "string", "textField": "string", "type": "string", "imageURL": "string" }, "display": "object", "caption": "string" } ] }
Error Response:
Code: 400 BAD REQUEST
Content:{error: 'Required parameter uuid is missing'}
Code: 404 NOT FOUND
Content:{error: 'The uuid provided doesn\'t match a process'}
Code: 500 INTERNAL SERVER ERROR
Content:{error: 'error message in case of an exception'}
Sample Call:
const userId = "<your user id>";
const voiceId = "Joey";
const languageCode = "en-US";
$.ajax({
    url: `https://virtual-device.bespoken.io/batch_process?user_id=${userId}&voice_id=${voiceId}&language_code=${languageCode}&async_mode=true`,
    method: "POST",
    contentType: "application/json",
    data: JSON.stringify({
        messages: [
            { text: "open guess the price", phrases: ["how many persons"] },
            { text: "one" },
        ],
    }),
    success: function (data) {
        console.log("Got conversation id: " + data.conversation_id);
        // Logic implementation to wait for a number of seconds
        $.get(`https://virtual-device.bespoken.io/conversation?uuid=${data.conversation_id}`, function (convData) {
            console.log("Got: " + convData.results.length + " responses!");
        });
    },
});
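The documented status codes for this endpoint can be handled in one place. The sketch below maps a status code and parsed body to either the results array or an error. `interpretConversationResponse` is a hypothetical helper, and it assumes error bodies carry an `error` field as in the error responses listed above. The status and body are passed in as arguments, so no request is made.

```javascript
// Sketch: maps the documented /conversation responses to results or an Error.
// interpretConversationResponse is a hypothetical helper, not part of any SDK.
function interpretConversationResponse(status, body) {
    if (status === 200) {
        return body.results;
    }
    const detail = body && body.error ? body.error : "unknown error";
    if (status === 400) {
        throw new Error(`Bad request: ${detail}`);
    }
    if (status === 404) {
        throw new Error(`Unknown conversation: ${detail}`);
    }
    throw new Error(`Server error (${status}): ${detail}`);
}

// Hand-written sample bodies, not real service output.
console.log(interpretConversationResponse(200, { results: [{ transcript: "welcome" }] }));
try {
    interpretConversationResponse(404, { error: "The uuid provided doesn't match a process" });
} catch (err) {
    console.log(err.message);
}
```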