Diatheke API Reference

The Diatheke API is defined using gRPC and protocol buffers. This section of the documentation is auto-generated from the protobuf file. It describes the data types and functions defined in the spec. The “messages” below correspond to the data structures to be used, and the “service” contains the methods that can be called.

diatheke.proto

Service: Diatheke

Service that implements the Cobalt Diatheke Dialog Management API.

Method Name Request Type Response Type Description
Version Empty VersionResponse Queries the version of the server.
Models Empty ModelsResponse Returns a list of available models. Model names from this list may be used in NewSession calls.
NewSession NewSessionRequest SessionID Requests a new session with the given config and returns the session ID, which is required for other rpc methods. After the session is created, StartSession() must be called to begin executing the Diatheke model.
StartSession SessionID Empty Begin execution of the model for the given session ID. The session’s event stream should be set up prior to calling this function so that the client application can respond to any initialization events that are defined in the session’s model.
EndSession SessionID Empty Terminates an existing session and closes any open session streams. It is an error if the given SessionID is invalid.
SessionEventStream SessionID DiathekeEvent Requests a new event stream for the given session. Only one stream per session is allowed.
CommandFinished CommandStatus Empty Notify Diatheke when a command has completed so that it may update the dialog state. The initial command request will come as part of a DiathekeEvent. After sending a CommandEvent, Diatheke will wait until it receives the CommandFinished notification before continuing to the next action in the model. Client applications should therefore always call this after receiving a CommandEvent, or else the session will hang.
StreamAudioInput AudioInput Empty Begin an audio input stream for a session. The first message to the server should specify the session ID, with binary audio data pushed in every subsequent message. As the audio is recognized, Diatheke will respond with appropriate events on the session’s event stream. Only one audio input stream at a time is allowed for a session; a previously created audio input stream must be closed before starting a new one.

StreamAudioReplies SessionID AudioReply Create an audio reply stream for a session. The returned stream will receive replies (as defined in the Diatheke model) from the server as they occur in the conversation. For each reply, the stream will first receive the text to synthesize (defined by the model), followed by one or more messages containing the synthesized audio bytes. The reply will end with a message indicating that TTS for that entry is complete. Only one reply stream at a time is allowed for a session. NOTE: The text in the first message of an audio reply is the same text that will be received in the session’s event stream.
PushText PushTextRequest Empty Push text to Diatheke as part of the conversation for a session. Diatheke will respond with an appropriate event on the session’s event stream based on whether the given text was recognized as a valid intent or not.
SetStory StoryRequest Empty Set the current story for a running session. This function can be used to implement system initiated alerts or to change the current session state. Events for the new story will come over the session’s event stream.
StreamASR ASRRequest ASRResponse Manually run streaming ASR unrelated to any session by pushing audio data to the server on the audio stream. As transcriptions become available, the server will return them on the ASRResponse stream. The transcriptions may then be used for, e.g., the PushText method. This function is provided as a convenience.
StreamTTS TTSRequest TTSResponse Manually run streaming TTS. The response stream will receive binary audio data as it is synthesized and will close automatically when synthesis is complete. This function is provided as a convenience.
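The ordering rules spread across the method descriptions above (create the session, open its event stream, start it, and keep at most one audio input stream and one reply stream open) can be tracked with a small client-side guard. This is a minimal sketch, not part of any Diatheke SDK: `SessionCallGuard` and its method names are hypothetical, and it makes no RPCs. It only records the state a real client might keep alongside its gRPC calls.

```python
class SessionCallGuard:
    """Hypothetical client-side bookkeeping for one Diatheke session.

    It makes no RPCs; it only records which calls have been made and
    raises if one of the documented ordering rules would be broken.
    """

    def __init__(self) -> None:
        self.session_id = None
        self.event_stream_open = False
        self.audio_input_open = False
        self.reply_stream_open = False
        self.started = False

    def _require_session(self) -> None:
        if self.session_id is None:
            raise RuntimeError("call NewSession first")

    def new_session(self, session_id: str) -> None:
        if self.session_id is not None:
            raise RuntimeError("this guard already tracks a session")
        self.session_id = session_id

    def open_event_stream(self) -> None:
        self._require_session()
        if self.event_stream_open:
            raise RuntimeError("only one event stream per session is allowed")
        self.event_stream_open = True

    def start_session(self) -> None:
        self._require_session()
        # The reference says the event stream *should* exist first;
        # this sketch treats that recommendation as a hard rule.
        if not self.event_stream_open:
            raise RuntimeError("open the event stream before StartSession")
        self.started = True

    def open_audio_input(self) -> None:
        self._require_session()
        if self.audio_input_open:
            raise RuntimeError("close the current audio input stream first")
        self.audio_input_open = True

    def close_audio_input(self) -> None:
        self.audio_input_open = False

    def open_reply_stream(self) -> None:
        self._require_session()
        if self.reply_stream_open:
            raise RuntimeError("only one reply stream at a time is allowed")
        self.reply_stream_open = True

    def end_session(self) -> None:
        self._require_session()
        # EndSession also closes any open session streams on the server.
        self.event_stream_open = False
        self.audio_input_open = False
        self.reply_stream_open = False
        self.started = False
        self.session_id = None
```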

Message: ASRRequest

Request for streaming ASR unrelated to a session.

Field Type Label Description
model string

The Cubic model to use for ASR. A message containing this field should always be sent before any audio data is sent.

audio bytes

Audio data to process. The encoding of the data should match what was specified in the Diatheke server configuration. NOTE: If the audio data is empty, the server may interpret it as the end of the stream and stop accepting further messages.
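A sketch of the ASRRequest message order for StreamASR: the model first, then the audio. Plain dicts stand in for the generated ASRRequest class, and `asr_requests` is a hypothetical helper name. Empty chunks are dropped because the server may read an empty audio message as the end of the stream.

```python
from typing import Iterable, Iterator


def asr_requests(model: str, chunks: Iterable[bytes]) -> Iterator[dict]:
    """Yield ASRRequest-shaped dicts: the model first, then audio data."""
    # The model must be sent before any audio.
    yield {"model": model}
    for chunk in chunks:
        # An empty audio message may be treated as end of stream,
        # so skip empty chunks instead of sending them.
        if chunk:
            yield {"audio": chunk}
```

With generated gRPC stubs, a generator like this (yielding real ASRRequest messages rather than dicts) is a common way for Python clients to feed a streaming call.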

Message: ASRResponse

ASRResponse contains speech recognition results.

Field Type Label Description
text string

Text is the Cubic engine’s formatted transcript of pushed audio. This field will be the 1-best alternative.

confidence_score double

The confidence score is a floating point number between 0.0 and 1.0. A score of 1.0 indicates that the ASR engine is 100% confident in the transcription.
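Because StreamASR transcripts may be forwarded to PushText, a client might drop empty or low-confidence results first. A minimal sketch, assuming dict stand-ins for ASRResponse and PushTextRequest; the helper name and the 0.5 default threshold are hypothetical choices, not values from this reference.

```python
from typing import Iterable, Iterator


def to_push_text(session_id: str, responses: Iterable[dict],
                 min_confidence: float = 0.5) -> Iterator[dict]:
    """Turn ASRResponse-shaped dicts into PushTextRequest-shaped dicts.

    min_confidence is an illustrative threshold, not a documented value.
    """
    for resp in responses:
        # Skip empty transcripts and results the engine is unsure about.
        if resp["text"] and resp["confidence_score"] >= min_confidence:
            yield {"session_id": session_id, "text": resp["text"]}
```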

Message: AtStartEvent

The AtStartEvent is sent when a Diatheke session returns to the start state of the model.

This message is empty and has no fields.

Message: AudioInput

Provides input audio data for StreamAudioInput. The first message sent must contain the session ID only. All subsequent messages must contain audio data only.

Field Type Label Description
session_id string

Session ID returned from the NewSession call.

data bytes

Audio data to process. The encoding of the data should match what was specified in the Diatheke server configuration. NOTE: If the audio data is empty, the server may interpret it as the end of the stream and stop accepting further messages.
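The AudioInput ordering (session ID only in the first message, audio data only afterwards) can be sketched as a generator that splits a buffer into chunks. Dicts stand in for AudioInput messages; `audio_input_messages` and the 4096-byte default chunk size are illustrative assumptions. Because the slices never run past the end of the buffer, no empty `data` message is produced.

```python
from typing import Iterator


def audio_input_messages(session_id: str, audio: bytes,
                         chunk_size: int = 4096) -> Iterator[dict]:
    """Yield AudioInput-shaped dicts for StreamAudioInput."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    # First message: session ID only.
    yield {"session_id": session_id}
    # Subsequent messages: audio data only, never empty.
    for start in range(0, len(audio), chunk_size):
        yield {"data": audio[start:start + chunk_size]}
```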

Message: AudioReply

An AudioReply is the verbal and textual reply that Diatheke returns as part of a conversation (not to be confused with the server concepts of request and response).

Field Type Label Description
label string

The label defined in the Diatheke model. Identifies which reply in the model this message corresponds to.

text string

The reply text as defined in the Diatheke model. This is the first message that will be received for an AudioReply. It contains the same text as the corresponding ReplyEvent in the session’s event stream.

data bytes

The audio data from TTS. There can be any number of these messages for an AudioReply after the first text message and before the final end message. The encoding of the data will match what was specified in the server configuration.

end Empty

Indicates that TTS has finished streaming audio for the reply. This is the last message that will be received for an AudioReply.
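On the client side, an AudioReply stream can be folded into complete replies by buffering audio between each text message and its end message. This sketch assumes dict stand-ins in which `text`, `data`, and `end` are mutually exclusive and `label` accompanies the text message; `CompletedReply` and `assemble_replies` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass
class CompletedReply:
    label: str
    text: str
    audio: bytes


def assemble_replies(messages: Iterable[dict]) -> Iterator[CompletedReply]:
    """Group AudioReply-shaped dicts into one CompletedReply per reply."""
    current = None  # (label, text, audio buffer) for the reply in progress
    for msg in messages:
        if "text" in msg:
            if current is not None:
                raise ValueError("new reply started before the previous one ended")
            current = (msg.get("label", ""), msg["text"], bytearray())
        elif "data" in msg:
            if current is None:
                raise ValueError("audio data received before reply text")
            current[2].extend(msg["data"])
        elif "end" in msg:
            if current is None:
                raise ValueError("end received with no reply in progress")
            yield CompletedReply(current[0], current[1], bytes(current[2]))
            current = None
        else:
            raise ValueError(f"unrecognized message: {msg!r}")
```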

Message: CommandEvent

A CommandEvent occurs when Diatheke wants the client to execute the given command.

Field Type Label Description
command_id string

ID of the command that should be run, e.g., “COM01” for Command #01.

parameters CommandEvent.ParametersEntry repeated

A generic map of parameters (name, value). The parameters are defined in the Diatheke model. Depending on the command, these parameters should be sent back with the CommandStatus update.

command_state_id string

ID to keep track of the dialog state when the command is requested. This field is required in the CommandStatus message so that Diatheke can correctly update the dialog state when CommandFinished is called.

Message: CommandEvent.ParametersEntry

Field Type Label Description
key string

value string

Message: CommandStatus

The final status of an executed command.

Field Type Label Description
session_id string

The session ID returned from the NewSession call.

command_id string

ID of the command as given in the corresponding CommandEvent.

return_status CommandStatus.StatusCode

Final status of the command. See the CommandStatus.StatusCode enum below.

output_parameters CommandStatus.OutputParametersEntry repeated

Parameters to return to Diatheke. For example, the map might contain the entry “temperature”, which was populated with a value of “30” after the command finished. Expected parameters are defined by the Diatheke model.

error_message_text string

Set this field to an error message if a fatal error occurred while executing the command (return_status == FAILURE).

command_state_id string

State ID from the original CommandEvent. This field is required for Diatheke to correctly update the dialog state when CommandFinished is called.
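Most CommandStatus fields are copied from the CommandEvent that triggered the command. A sketch of building the status a client would pass to CommandFinished, assuming dict stand-ins for both messages and a hypothetical `handler` callback that runs the command and returns its output parameters; any exception is reported as FAILURE. Output values are converted to strings because the map is string-to-string.

```python
from typing import Callable, Dict

# CommandStatus.StatusCode values, as listed in the enum.
SUCCESS, FAILURE = 0, 1


def run_command(session_id: str, event: dict,
                handler: Callable[[str, Dict[str, str]], Dict[str, object]]) -> dict:
    """Run a command via a hypothetical handler; return a CommandStatus dict."""
    status = {
        "session_id": session_id,
        "command_id": event["command_id"],
        # Required so Diatheke can update the dialog state correctly.
        "command_state_id": event["command_state_id"],
        "return_status": SUCCESS,
        "output_parameters": {},
        "error_message_text": "",
    }
    try:
        outputs = handler(event["command_id"], dict(event.get("parameters", {})))
        status["output_parameters"] = {k: str(v) for k, v in outputs.items()}
    except Exception as exc:
        status["return_status"] = FAILURE
        status["error_message_text"] = str(exc)
    return status
```

Whatever the outcome, the client should send this status with CommandFinished; otherwise the session will wait for it indefinitely.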

Message: CommandStatus.OutputParametersEntry

Field Type Label Description
key string

value string

Message: DiathekeEvent

An event from Diatheke in response to either recognized audio, submitted text, or some other transition in the model.

Field Type Label Description
command CommandEvent

Indicates Diatheke found an actionable state in the dialog, and requests the client to perform the given command.

Users should always call CommandFinished after receiving this event so that Diatheke can update the dialog state when the command is complete.

recognize RecognizeEvent

An event indicating whether pushed text or audio was recognized by ASR and/or Diatheke.

reply ReplyEvent

The textual reply from Diatheke in the conversation (not to be confused with the server concepts of request and response). For example, this could be a question to solicit more information from the user, a status report, or any other reply defined by the Diatheke model. The text of this message is also provided in the AudioReply stream (if one is open).

input_required InputRequiredEvent

Indicates that Diatheke is expecting user input (text or audio), which is defined by input actions in the Diatheke model.

at_start AtStartEvent

Indicates that Diatheke has returned to the start state of the model.
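Assuming exactly one of the DiathekeEvent fields is set per event (as with a protobuf oneof), a client can route each event by checking which field is present. A sketch with dict stand-ins; `dispatch` and the handler mapping are hypothetical.

```python
from typing import Callable, Dict

EVENT_KINDS = ("command", "recognize", "reply", "input_required", "at_start")


def dispatch(event: dict, handlers: Dict[str, Callable[[dict], None]]) -> str:
    """Call the handler for whichever DiathekeEvent field is set."""
    kinds = [k for k in EVENT_KINDS if k in event]
    if len(kinds) != 1:
        raise ValueError(f"expected exactly one event field, got {kinds}")
    kind = kinds[0]
    handler = handlers.get(kind)
    if handler is not None:
        handler(event[kind])
    return kind
```

A `command` handler registered here should always end by calling CommandFinished, as noted above.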

Message: Empty

This message is empty and has no fields.

Message: InputRequiredEvent

An InputRequiredEvent occurs when Diatheke is expecting input from the user (text or audio).

This message is empty and has no fields.

Message: ModelsResponse

The message sent by the server in response to a Models request. Returns an array of model names.

Field Type Label Description
models string repeated

Array of models available for use.

Message: NewSessionRequest

Request for the NewSession call.

Field Type Label Description
model string

The Diatheke model to use for the session, such as one of the names returned by the Models call. Applications may have more than one model to use for ASR/NLU; the ASR grammar and the set of commands can vary between models. Some applications will only have one model.

Message: PushTextRequest

Request to push text to Diatheke as part of a conversation.

Field Type Label Description
session_id string

Session ID returned from the NewSession call.

text string

User input. This could be a transcription from manually run ASR, text selected from a dropdown list, entered in a prompt, etc.

Message: RecognizeEvent

A RecognizeEvent occurs if a session’s audio input has a transcription available, or if the PushText method was called. In both cases, the event will indicate whether the text was recognized as a valid intent by the Diatheke model.

Field Type Label Description
text string

The pushed text or transcription of audio sent to Diatheke.

valid_input bool

True if the submitted text or audio transcription was recognized by the Diatheke model as a valid intent or entity.

Message: ReplyEvent

A ReplyEvent occurs when Diatheke has a reply in the conversation (not to be confused with the server concepts of request and response). These correspond to replies defined in the Diatheke model. For example, it might be a prompt for additional information from the user, a status update, or a confirmation. ReplyEvents are not generated in response to StreamTTS calls.

Field Type Label Description
text string

Text of the reply event (defined by the Diatheke model).

label string

Label of the reply event (defined by the Diatheke model).

Message: SessionID

Simple message that only contains the session ID.

Field Type Label Description
session_id string

Session ID returned from the NewSession call.

Message: StoryRequest

Request to change the current story of a session.

Field Type Label Description
session_id string

ID of the session that will have its story changed.

story_id string

ID of the story to switch to. This ID is defined by the model used to create the session.

parameters StoryRequest.ParametersEntry repeated

Parameters to set in session memory before executing the specified story. Some stories in the model may make assumptions about which parameters have already been defined, so it is important to be familiar with the model requirements for any given story.

wait_for_start bool

If true, the given story will not be executed until the session completes the current stories and returns to the main story. If false, the current story in the session will be immediately interrupted to execute the specified story.

temporary bool

If true, once the given story has finished, Diatheke will return the session to the place in the model where it was when this request was received, and restore the parameters that were defined at that time. This is useful when the change in story represents a temporary interruption. If false, Diatheke will simply continue from the given story without trying to go back to its prior state, which is useful to make a permanent state change.
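The two StoryRequest flags combine into distinct behaviors. For a system-initiated alert, interrupting the current story and resuming afterwards (`wait_for_start=False`, `temporary=True`) fits the descriptions above. A sketch with dict stand-ins; `story_request`, `alert_request`, and the story IDs in the example are hypothetical. Parameter values are converted to strings to match the string map, and the defaults mirror the protobuf default of false for both flags.

```python
from typing import Dict, Optional


def story_request(session_id: str, story_id: str,
                  parameters: Optional[Dict[str, object]] = None,
                  wait_for_start: bool = False,
                  temporary: bool = False) -> dict:
    """Build a StoryRequest-shaped dict for SetStory."""
    return {
        "session_id": session_id,
        "story_id": story_id,
        # The proto map is string -> string.
        "parameters": {k: str(v) for k, v in (parameters or {}).items()},
        # False: interrupt the current story immediately.
        "wait_for_start": wait_for_start,
        # True: afterwards, return to the prior place and restore parameters.
        "temporary": temporary,
    }


def alert_request(session_id: str, story_id: str,
                  parameters: Optional[Dict[str, object]] = None) -> dict:
    """Interrupt now, then resume where the session left off."""
    return story_request(session_id, story_id, parameters,
                         wait_for_start=False, temporary=True)
```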

Message: StoryRequest.ParametersEntry

Field Type Label Description
key string

value string

Message: TTSRequest

Request to synthesize speech unrelated to a session.

Field Type Label Description
model string

The Luna model to use for TTS (defined in the server config file).

text string

Text to synthesize.

Message: TTSResponse

Response for text-to-speech unrelated to a session.

Field Type Label Description
data bytes

The synthesized audio data. The data encoding will match what was specified in the server configuration.
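StreamTTS returns audio in pieces, so a client typically concatenates the `data` fields. The encoding is whatever the server is configured for; the WAV wrapper below is valid only if that encoding is 16-bit mono linear PCM at the given sample rate, which is an assumption, not something this reference states. The function names are hypothetical; `wave` is Python's standard-library module.

```python
import io
import wave
from typing import Iterable


def collect_tts_audio(responses: Iterable[dict]) -> bytes:
    """Concatenate the data fields of TTSResponse-shaped dicts."""
    return b"".join(r["data"] for r in responses)


def pcm16_mono_to_wav(pcm: bytes, sample_rate: int) -> bytes:
    """Wrap raw 16-bit mono PCM in a WAV container.

    Only meaningful if the server really emits that encoding (an assumption).
    """
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 16-bit samples
        wav.setframerate(sample_rate)
        wav.writeframes(pcm)
    return buf.getvalue()
```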

Message: VersionResponse

The message sent by the server for the Version method.

Field Type Label Description
server string

Version of the Diatheke server, which manages all of the other components.

Enum: CommandStatus.StatusCode

StatusCode values are the possible final states of a command.

Name Number Description
SUCCESS 0 SUCCESS indicates that the command was successfully completed, and the dialog state may now move on to the next state.
FAILURE 1 FAILURE indicates that there was a fatal error running the command. The session will log an error and return to the start state of the model when this status is encountered.

Well-Known Types

See the protocol buffer documentation for details on these types.

.proto Type Notes
Duration Represents a signed, fixed-length span of time represented as a count of seconds and fractions of seconds at nanosecond resolution
Empty Used to indicate a method takes or returns nothing

Scalar Value Types

.proto Type Notes Go Type Python Type C++ Type
double float64 float double
float float32 float float
int32 Uses variable-length encoding. Inefficient for encoding negative numbers – if your field is likely to have negative values, use sint32 instead. int32 int int32
int64 Uses variable-length encoding. Inefficient for encoding negative numbers – if your field is likely to have negative values, use sint64 instead. int64 int/long int64
uint32 Uses variable-length encoding. uint32 int/long uint32
uint64 Uses variable-length encoding. uint64 int/long uint64
sint32 Uses variable-length encoding. Signed int value. These more efficiently encode negative numbers than regular int32s. int32 int int32
sint64 Uses variable-length encoding. Signed int value. These more efficiently encode negative numbers than regular int64s. int64 int/long int64
fixed32 Always four bytes. More efficient than uint32 if values are often greater than 2^28. uint32 int uint32
fixed64 Always eight bytes. More efficient than uint64 if values are often greater than 2^56. uint64 int/long uint64
sfixed32 Always four bytes. int32 int int32
sfixed64 Always eight bytes. int64 int/long int64
bool bool bool bool
string A string must always contain UTF-8 encoded or 7-bit ASCII text. string str/unicode string
bytes May contain any arbitrary sequence of bytes. []byte str/bytes string
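The efficiency notes in the table follow from protobuf's wire encoding: integers are written as base-128 varints (7 payload bits per byte), negative int32 values are sign-extended to 64 bits first, and sint32 applies ZigZag encoding before the varint. The sketch below computes encoded sizes to show why; the function names are illustrative.

```python
def zigzag32(n: int) -> int:
    """ZigZag-encode a signed 32-bit value, as sint32 does (0,-1,1,-2 -> 0,1,2,3)."""
    return ((n << 1) ^ (n >> 31)) & 0xFFFFFFFF


def varint_len(value: int) -> int:
    """Number of bytes needed to encode a non-negative value as a varint."""
    length = 1
    while value >= 0x80:
        value >>= 7
        length += 1
    return length


def int32_wire_len(n: int) -> int:
    """Encoded size of an int32: negatives are sign-extended to 64 bits."""
    return varint_len(n & 0xFFFFFFFFFFFFFFFF)
```

For example, -1 takes 10 bytes as an int32 but 1 byte as an sint32, and any uint32 value of 2^28 or more takes 5 varint bytes, one more than a fixed32.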