3 weeks


Jia Liu, Xueting Zhang


conversation modeling, multimodal interface design


DialogFlow, Adobe Premiere, Adobe After Effects


It can be intimidating to converse with fast-talking locals in a foreign country. In a future with driverless cabs, might the absence of humans present an opportunity to improve the user experience for tourists?


Rover is a conversational user interface (CUI) for driverless cabs. It creates seamless ride experiences in a foreign land by assisting users with a suite of tasks related to reservation, navigation, and exploration.

1. The Problem Space

We decided to design for a future in which most people prefer using a ride-sharing service to owning a car.

Our competitive analysis showed a fragmented landscape of products and services from both established car manufacturers and startups. As of October 2017, potential players in the space included Google Assistant, Teslabot, Bot Studio, and Uber.

Here is a table summarizing our competitive analysis.

2. Our Concept

Rover is a multimodal user interface that combines voice, text, and visuals.

Visuals are used inside the car as a cue to indicate pre-attentive and attentive states.

Speech is used for conversations in the car.

Text is used for out-of-car contact.

Text is used inside the car for media-rich conversations, such as restaurant recommendations.
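The modality rules above can be read as a small decision procedure. Here is a minimal sketch in Python, assuming three hypothetical inputs (whether the rider is in the car, whether the message is a state cue, and whether it is media-rich). The names `Modality` and `choose_modality` are illustrative only and are not part of our prototype.

```python
from enum import Enum


class Modality(Enum):
    VISUAL = "visual"
    SPEECH = "speech"
    TEXT = "text"


def choose_modality(in_car: bool, state_cue: bool = False,
                    media_rich: bool = False) -> Modality:
    """Pick an output modality following the four rules listed above.

    All parameter names are assumptions made for this sketch.
    """
    if not in_car:
        # Out-of-car contact always happens over text.
        return Modality.TEXT
    if state_cue:
        # Pre-attentive / attentive state is signaled visually.
        return Modality.VISUAL
    if media_rich:
        # Media-rich content, e.g. restaurant recommendations, uses text.
        return Modality.TEXT
    # Default for in-car conversation.
    return Modality.SPEECH
```

For example, `choose_modality(in_car=True, media_rich=True)` selects text, while a plain in-car exchange falls through to speech.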

Designing for Speech

Ubiquitous. Unforgiving. Short attention spans.

CUIs need to speak a simple, succinct language with lots of repetition. CUIs must be highly flexible in their journey from intention to fulfillment. Finally, CUIs must be exceptionally adept at error recovery. To learn more, read my article on the politics of voice design.

3. Adding Essentials

Here are seven of the most fundamental interactions for a conversational interface. We designed Rover with these interactions in mind.

4. Creating a Personality

Rover is male, with a full, deep voice. He is a friendly and proactive assistant.

5. Conversation Modeling

We focused on a few key intentions and modeled each of those conversations from beginning to end.

Our conversational models identified the user’s intentions and mapped a range of possible utterances to responses from the system. We also came up with a happy path for each intention. A happy path is a direct conversation that leads from utterance to fulfillment of an intention. Here's an example:

A model of a happy path from utterance to fulfillment.

A model of a complex conversation showing the pre-attentive and attentive states of the CUI.
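To make the idea of mapping utterances to intentions and responses concrete, here is a minimal sketch in Python. It is not how DialogFlow matches intents and not our actual model: the intent names, sample phrases, responses, and the simple substring matching are all assumptions chosen for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Intent:
    name: str              # hypothetical intent label
    utterances: List[str]  # sample phrases that signal this intent
    response: str          # system reply on the happy path


# Hypothetical intents, loosely based on the reservation and
# exploration tasks described earlier.
INTENTS = [
    Intent("reserve_ride",
           ["book a cab", "i need a ride", "pick me up"],
           "Sure. Where would you like to go?"),
    Intent("recommend_restaurant",
           ["i'm hungry", "find a restaurant", "where can i eat"],
           "Here are a few places nearby. I'll send them to your screen."),
]

# Error recovery: a simple reprompt when no intent matches.
FALLBACK = "Sorry, I didn't catch that. Could you say it again?"


def match_intent(utterance: str) -> Optional[Intent]:
    """Return the first intent whose sample phrase appears in the utterance."""
    text = utterance.lower().strip()
    for intent in INTENTS:
        if any(phrase in text for phrase in intent.utterances):
            return intent
    return None


def respond(utterance: str) -> str:
    """Happy path: utterance -> intent -> response. Otherwise, reprompt."""
    intent = match_intent(utterance)
    return intent.response if intent else FALLBACK
```

In this sketch, `respond("Can you book a cab for me?")` follows the happy path for the hypothetical `reserve_ride` intent, while an unrecognized utterance triggers the fallback reprompt.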

If we had more time...

We would build out multiple conversations on DialogFlow, from beginning to end, and try them out with real users. In parallel, we would prototype a graphical interface that might accompany the CUI in autonomous cabs of the future.