The Diatheke API is defined using gRPC and protocol buffers. This section of the documentation is auto-generated from the protobuf file. It describes the data types and functions defined in the spec. The “messages” below correspond to the data structures to be used, and the “service” contains the methods that can be called.
Service that implements the Cobalt Diatheke Dialog Management API.
Method Name | Request Type | Response Type | Description |
---|---|---|---|
Version | Empty | VersionResponse | Queries the version of the server. |
Models | Empty | ModelsResponse | Returns a list of available models. Model values from this list may be used in NewSession calls. |
NewSession | NewSessionRequest | SessionID | Requests a new session with the given config and returns the session ID, which is required for other rpc methods. After the session is created, StartSession() must be called to begin executing the Diatheke model. |
StartSession | SessionID | Empty | Begin execution of the model for the given session ID. The session’s event stream should be set up prior to calling this function so that the client application can respond to any initialization events that are defined in the session’s model. |
EndSession | SessionID | Empty | Terminates an existing session and closes any open session streams. It is an error if the given SessionID is invalid. |
SessionEventStream | SessionID | DiathekeEvent | Requests a new event stream for the given session. Only one stream per session is allowed. |
CommandFinished | CommandStatus | Empty | Notify Diatheke when a command has completed so that it may update the dialog state. The initial command request will come as part of a DiathekeEvent. After sending a CommandEvent, Diatheke will wait until it receives the CommandFinished notification before continuing to the next action in the model. Client applications should therefore always call this after receiving a CommandEvent, or else the session will hang. |
StreamAudioInput | AudioInput | Empty | Begin an audio input stream for a session. The first message to the server should specify the sessionID, with binary audio data pushed for every subsequent message. As the audio is recognized, Diatheke will respond with appropriate events on the session’s event stream. Only one stream at a time is allowed for a session. A previously created audio input stream must be closed before starting a new one. |
StreamAudioReplies | SessionID | AudioReply | Create an audio reply stream for a session. The returned stream will receive replies (as defined in the Diatheke model) from the server as they occur in the conversation. For each reply, the stream will first receive the text to synthesize (defined by the model), followed by one or more messages containing the synthesized audio bytes. The reply will end with a message indicating that TTS for that entry is complete. Only one reply stream at a time is allowed for a session. NOTE: The text in the first message of an audio reply is the same text that will be received in the session’s event stream. |
PushText | PushTextRequest | Empty | Push text to Diatheke as part of the conversation for a session. Diatheke will respond with an appropriate event on the session’s event stream based on whether the given text was recognized as a valid intent or not. |
SetStory | StoryRequest | Empty | Set the current story for a running session. This function can be used to implement system initiated alerts or to change the current session state. Events for the new story will come over the session’s event stream. |
StreamASR | ASRRequest | ASRResponse | Manually run streaming ASR unrelated to any session by pushing audio data to the server on the audio stream. As transcriptions become available, the server will return them on the ASRResponse stream. The transcriptions may then be used for, e.g., the PushText method. This function is provided as a convenience. |
StreamTTS | TTSRequest | TTSResponse | Manually run streaming TTS. The Audio stream will receive binary audio data as it is synthesized and will close automatically when synthesis is complete. This function is provided as a convenience. |
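The call order described in the table above (create the session, set up the event stream, start the session, end it when done) can be sketched as follows. This is an illustration only: `FakeDiatheke` is a hypothetical in-memory stand-in that records calls, plain dicts stand in for the protobuf messages, and `run_session` is an assumed helper name. A real client would call stubs generated from the protobuf file and would likely read the event stream concurrently.

```python
class FakeDiatheke:
    """Hypothetical in-memory stand-in for the service; it only records calls."""

    def __init__(self):
        self.calls = []

    def NewSession(self, request):
        self.calls.append("NewSession")
        return {"session_id": "session-1"}

    def SessionEventStream(self, session):
        self.calls.append("SessionEventStream")
        return iter([])  # a real stream would yield DiathekeEvent messages

    def StartSession(self, session):
        self.calls.append("StartSession")

    def EndSession(self, session):
        self.calls.append("EndSession")


def run_session(service, model):
    """Call the session methods in the order the table describes."""
    session = service.NewSession({"model": model})
    # Set up the event stream before starting, so initialization events are seen.
    events = service.SessionEventStream(session)
    service.StartSession(session)
    try:
        for event in events:
            pass  # a real client would react to each event here
    finally:
        # Always end the session so its open streams are closed.
        service.EndSession(session)
    return session["session_id"]
```

Wrapping the event loop in `try`/`finally` is a design choice of this sketch, so that EndSession is still called if event handling raises.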
Request for streaming ASR unrelated to a session.
Field | Type | Label | Description |
---|---|---|---|
model | string | | The Cubic model to use for ASR. This message should always be sent before any audio data is sent. |
audio | bytes | | Audio data to process. The encoding of the data should match what was specified in the Diatheke server configuration. NOTE: If the audio data is empty, the server may interpret it as the end of the stream and stop accepting further messages. |
ASRResponse contains speech recognition results.
Field | Type | Label | Description |
---|---|---|---|
text | string | | Text is the Cubic engine’s formatted transcript of pushed audio. This field will be the 1-best alternative. |
confidence_score | double | | The confidence score is a floating point number between 0.0 and 1.0. A score of 1.0 indicates that the ASR engine is 100% confident in the transcription. |
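One possible use of the confidence score is to filter transcripts before passing them on, for example to PushText. The sketch below is an assumption-laden illustration: plain dicts stand in for ASRResponse messages, and both the function name `accepted_transcripts` and the default threshold of 0.6 are hypothetical choices, not values from the Diatheke documentation.

```python
def accepted_transcripts(responses, min_confidence=0.6):
    """Yield transcript text from stand-in ASRResponse dicts that pass a threshold.

    min_confidence is a hypothetical application setting, not a Diatheke value.
    """
    for response in responses:
        score = response["confidence_score"]
        if not 0.0 <= score <= 1.0:
            raise ValueError("confidence_score must be between 0.0 and 1.0")
        # Skip empty transcripts and low-confidence results.
        if response["text"] and score >= min_confidence:
            yield response["text"]
```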
The AtStartEvent is sent when a Diatheke session returns to the start state of the model.
This message is empty and has no fields.
Provides input audio data for StreamAudioInput. The first message sent must contain the session ID only. All subsequent messages must contain audio data only.
Field | Type | Label | Description |
---|---|---|---|
session_id | string | | Session ID returned from the NewSession call. |
data | bytes | | Audio data to process. The encoding of the data should match what was specified in the Diatheke server configuration. NOTE: If the audio data is empty, the server may interpret it as the end of the stream and stop accepting further messages. |
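The message ordering for StreamAudioInput (session ID first, audio data afterwards) can be sketched with a generator. Plain dicts stand in for AudioInput messages here; the function name `audio_input_messages` and the default chunk size of 4096 bytes are assumptions made for illustration.

```python
def audio_input_messages(session_id, audio, chunk_size=4096):
    """Yield stand-in AudioInput messages: the session ID, then audio chunks.

    chunk_size is a hypothetical value; choose one that suits the audio encoding.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    # The first message carries the session ID only.
    yield {"session_id": session_id}
    # Every later message carries audio only. Slicing never produces an empty
    # chunk, which the server might interpret as the end of the stream.
    for start in range(0, len(audio), chunk_size):
        yield {"data": audio[start:start + chunk_size]}
```

With a generated gRPC stub, a generator like this would typically be passed as the request iterator of the client-streaming call.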
An AudioReply is the verbal and textual reply that Diatheke returns as part of a conversation (not to be confused with the server concepts of request and response).
Field | Type | Label | Description |
---|---|---|---|
label | string | | The label defined in the Diatheke model. Identifies which reply in the model this message corresponds to. |
text | string | | The reply text as defined in the Diatheke model. This is the first message that will be received for an AudioReply. It contains the same text as the corresponding ReplyEvent in the session’s event stream. |
data | bytes | | The audio data from TTS. There can be any number of these messages for an AudioReply after the first text message and before the final end message. The encoding of the data will match what was specified in the server configuration. |
end | Empty | | Indicates that TTS has finished streaming audio for the reply. This is the last message that will be received for an AudioReply. |
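The text, data, end sequence described above can be reassembled into complete replies on the client side. The following is a minimal sketch using plain dicts as stand-ins for AudioReply messages. The helper name `collect_replies` is hypothetical, and reading the label from the text message is an assumption about how the label accompanies each message.

```python
def collect_replies(messages):
    """Group stand-in AudioReply messages into (label, text, audio) tuples."""
    replies = []
    current = None
    for msg in messages:
        if "text" in msg:
            # The first message of a reply carries the text to synthesize.
            current = {"label": msg.get("label", ""), "text": msg["text"],
                       "audio": bytearray()}
        elif "data" in msg:
            if current is None:
                raise ValueError("audio data received before reply text")
            current["audio"].extend(msg["data"])
        elif "end" in msg:
            if current is None:
                raise ValueError("end received before reply text")
            # The end message closes the reply.
            replies.append((current["label"], current["text"],
                            bytes(current["audio"])))
            current = None
        else:
            raise ValueError("message has no text, data, or end field")
    return replies
```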
A CommandEvent occurs when Diatheke wants the client to execute the given command.
Field | Type | Label | Description |
---|---|---|---|
command_id | string | | ID of the command that should be run, e.g. “COM01” for Command #01. |
parameters | CommandEvent.ParametersEntry | repeated | A generic map of parameters (name, value). The parameters are defined in the Diatheke model. Depending on the command, these parameters should be sent back with the CommandStatus update. |
command_state_id | string | | ID to keep track of the dialog state when the command is requested. This field is required in the CommandStatus message so that Diatheke can correctly update the dialog state when CommandFinished is called. |
Field | Type | Label | Description |
---|---|---|---|
key | string | | |
value | string | | |
The final status of an executed command.
Field | Type | Label | Description |
---|---|---|---|
session_id | string | | session_id should be the same as the session ID returned from the NewSession call. |
command_id | string | | ID of the command as given in the CommandEvent. |
return_status | CommandStatus.StatusCode | | |
output_parameters | CommandStatus.OutputParametersEntry | repeated | Parameters to return to Diatheke. For example, the map might contain the entry “temperature”, which was populated with a value of “30” after the command finished. Expected parameters are defined by the Diatheke model. |
error_message_text | string | | Set this field with an error message if a fatal error occurred while executing the command (return_status == FAILURE). |
command_state_id | string | | State ID from the original CommandEvent. This field is required for Diatheke to correctly update the dialog state when CommandFinished is called. |
Field | Type | Label | Description |
---|---|---|---|
key | string | | |
value | string | | |
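Building the CommandStatus that answers a CommandEvent can be sketched as below. Plain dicts stand in for the protobuf messages, and `finish_command` and the `handlers` mapping are hypothetical names. The numeric status values are taken from the status code table in this document. Treating an unknown command ID as FAILURE is a choice made for this sketch, not documented Diatheke behavior.

```python
SUCCESS, FAILURE = 0, 1  # numeric values from the status code table


def finish_command(session_id, event, handlers):
    """Run the handler for a stand-in CommandEvent and build its CommandStatus."""
    status = {
        "session_id": session_id,
        "command_id": event["command_id"],
        # Echoed back so that Diatheke can update the dialog state.
        "command_state_id": event["command_state_id"],
        "return_status": SUCCESS,
        "output_parameters": {},
        "error_message_text": "",
    }
    handler = handlers.get(event["command_id"])
    try:
        if handler is None:
            raise LookupError("no handler for command " + event["command_id"])
        result = handler(dict(event.get("parameters", {})))
        # Parameter maps hold string keys and string values.
        status["output_parameters"] = {str(k): str(v) for k, v in result.items()}
    except Exception as err:
        status["return_status"] = FAILURE
        status["error_message_text"] = str(err)
    return status
```

The returned dict mirrors the fields a client would send with CommandFinished. Sending a status on both paths matters, because the session waits for that notification before continuing.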
An event from Diatheke in response to either recognized audio, submitted text, or some other transition in the model.
Field | Type | Label | Description |
---|---|---|---|
command | CommandEvent | | Indicates Diatheke found an actionable state in the dialog, and requests the client to perform the given command. Users should always call CommandFinished after receiving this event so that Diatheke can update the dialog state when the command is complete. |
recognize | RecognizeEvent | | An event indicating whether pushed text or audio was recognized by ASR and/or Diatheke. |
reply | ReplyEvent | | The textual reply from Diatheke in the conversation (not to be confused with the server concepts of request and response). For example, this could be a question to solicit more information from the user, a status report, or any other reply defined by the Diatheke model. The text of this message is also provided in the AudioReply stream (if one is open). |
input_required | InputRequiredEvent | | Indicates that Diatheke is expecting user input (text or audio), which is defined by input actions in the Diatheke model. |
at_start | AtStartEvent | | Indicates that Diatheke has returned to the start state of the model. |
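A client reading the event stream usually needs to route each event by the field it carries. The sketch below assumes that each DiathekeEvent carries exactly one of the fields in the table above, which this document does not state explicitly. Plain dicts stand in for the protobuf messages, and `event_kind` and `dispatch` are hypothetical helper names.

```python
EVENT_FIELDS = ("command", "recognize", "reply", "input_required", "at_start")


def event_kind(event):
    """Return the name of the single field set on a stand-in DiathekeEvent."""
    present = [name for name in EVENT_FIELDS if name in event]
    if len(present) != 1:
        raise ValueError("expected exactly one event field, got %r" % present)
    return present[0]


def dispatch(event, handlers):
    """Call the handler registered for the event's kind, if there is one."""
    kind = event_kind(event)
    handler = handlers.get(kind)
    if handler is None:
        return None  # the application ignores this kind of event
    return handler(event[kind])
```

A handler registered for `command` should end by sending the CommandFinished notification, since Diatheke waits for it before moving on.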
This message is empty and has no fields.
An InputRequiredEvent occurs when Diatheke is expecting input from the user (text or audio).
This message is empty and has no fields.
The message sent by the server in response to a Models request. Returns an array of model names.
Field | Type | Label | Description |
---|---|---|---|
models | string | repeated | Array of models available for use. |
Request for the NewSession call.
Field | Type | Label | Description |
---|---|---|---|
model | string | | Name of the model to use, for applications that have more than one model available for ASR/NLU. ASR grammars and sets of commands can vary between models. Some applications will have only one model. |
Request to push text to Diatheke as part of a conversation.
Field | Type | Label | Description |
---|---|---|---|
session_id | string | | Session ID returned from the NewSession call. |
text | string | | User input. This could be a transcription from manually run ASR, text selected from a dropdown list, entered in a prompt, etc. |
A RecognizeEvent occurs if a session’s audio input has a transcription available, or if the PushText method was called. In both cases, the event will indicate whether the text was recognized as a valid intent by the Diatheke model.
Field | Type | Label | Description |
---|---|---|---|
text | string | | The pushed text or transcription of audio sent to Diatheke. |
valid_input | bool | | True if the submitted text or audio transcription was recognized by the Diatheke model as a valid intent or entity. |
A ReplyEvent occurs when Diatheke has a reply in the conversation (not to be confused with the server concepts of request and response). These correspond to replies defined in the Diatheke model. For example, it might be a prompt for additional information from the user, a status update, or a confirmation. ReplyEvents are not generated in response to StreamTTS calls.
Field | Type | Label | Description |
---|---|---|---|
text | string | | Text of the reply event (defined by the Diatheke model). |
label | string | | Label of the reply event (defined by the Diatheke model). |
Simple message that only contains the session ID.
Field | Type | Label | Description |
---|---|---|---|
session_id | string | | Session ID returned from the NewSession call. |
Request to change the current story of a session.
Field | Type | Label | Description |
---|---|---|---|
session_id | string | | ID of the session that will have its story changed. |
story_id | string | | ID of the story to switch to. This ID is defined by the model used to create the session. |
parameters | StoryRequest.ParametersEntry | repeated | Parameters to set in session memory before executing the specified story. Some stories in the model may make assumptions about which parameters have already been defined, so it is important to be familiar with the model requirements for any given story. |
wait_for_start | bool | | If true, the given story will not be executed until the session completes the current stories and returns to the main story. If false, the current story in the session will be immediately interrupted to execute the specified story. |
temporary | bool | | If true, once the given story has finished, Diatheke will return the session to the place in the model where it was when this request was received, and restore the parameters that were defined at that time. This is useful when the change in story represents a temporary interruption. If false, Diatheke will simply continue from the given story without trying to go back to its prior state, which is useful to make a permanent state change. |
Field | Type | Label | Description |
---|---|---|---|
key | string | | |
value | string | | |
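The `wait_for_start` and `temporary` flags combine to cover different use cases. As one illustration, a system-initiated alert that interrupts the conversation and then hands it back could set `wait_for_start` to false and `temporary` to true. The sketch below builds such a request as a plain dict standing in for StoryRequest. The helper name `alert_story_request` is hypothetical, and the flag combination is only one reasonable reading of the field descriptions.

```python
def alert_story_request(session_id, story_id, parameters=None):
    """Build a stand-in StoryRequest for an alert that interrupts, then resumes."""
    return {
        "session_id": session_id,
        "story_id": story_id,
        # Parameter maps hold string keys and string values.
        "parameters": {str(k): str(v) for k, v in (parameters or {}).items()},
        # False: interrupt the current story immediately.
        "wait_for_start": False,
        # True: return to the prior place in the model once the story finishes.
        "temporary": True,
    }
```

Setting both flags to false would instead make an immediate, permanent switch to the given story.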
Request to synthesize speech unrelated to a session.
Field | Type | Label | Description |
---|---|---|---|
model | string | | The Luna model to use for TTS (defined in the server config file). |
text | string | | Text to synthesize. |
Response for text-to-speech unrelated to a session.
Field | Type | Label | Description |
---|---|---|---|
data | bytes | | The synthesized audio data. The data encoding will match what was specified in the server configuration. |
The message sent by the server for the Version method.
Field | Type | Label | Description |
---|---|---|---|
server | string | | Server that manages all of the other components. |
CommandStatus.StatusCode lists the possible resulting states of a command.
Name | Number | Description |
---|---|---|
SUCCESS | 0 | SUCCESS indicates that the command was successfully completed, and the dialog state may now move on to the next state. |
FAILURE | 1 | FAILURE indicates that there was a fatal error running the command. The session will log an error and return to the start state of the model when this status is encountered. |
See the protocol buffer documentation for these
.proto Type | Notes |
---|---|
Duration | Represents a signed, fixed-length span of time as a count of seconds and fractions of seconds at nanosecond resolution. |
Empty | Used to indicate that a method takes or returns nothing. |