Alexa skill development: understanding intents, utterances, and slots

Since Amazon opened up its voice platform Alexa to developers in 2015, the platform has seen several thousand new ‘skills’ launched. Those trying to get their head around Alexa skill development quickly come across some unfamiliar terms which need decoding before the work of skill-building can begin. Having recently built skills on the platform for our clients as part of our flagship UX work, we have put together a quick guide to the key terms Amazon uses when developing for Alexa.

Note: the following applies to Amazon’s Alexa. Other companies have their own glossaries; some share these terms and others use their own, but Alexa is currently the most widely used platform.

Alexa vs Echo

In discussions and in the media, the terms ‘Alexa’ and ‘Echo’ are often conflated, when in reality they refer to separate things. Echo refers to the Amazon Echo, a physical product with a speaker and a ring of microphones. Echo can also refer to the Echo Dot, the main Echo’s smaller cousin.

Alexa is the cloud-based ‘intelligent personal assistant’ which processes your requests and supplies answers back to you. If it’s easier, think of Alexa as the ‘mind’ of the ecosystem, while Echo is the ‘body’. When you speak, your conversation is with Alexa; Echo is just the mouth and ears. Alexa isn’t confined to the Echo, though: you can also speak to Alexa through Amazon’s ‘Fire’ branded products, as well as a growing number of third-party consumer devices.

Wakeword

The Echo devices have a ring of always-on microphones, meaning the device is always listening to what is around it but in a dormant state. It will only ‘wake up’ and actively pay attention to you when it hears a specific word or phrase, called a wakeword. Amazon offers a choice of ‘Alexa’, ‘Amazon’, ‘Echo’, or ‘Computer’ for these wakewords, with the default being ‘Alexa’. This wakeword cannot be changed beyond these four options by users or by developers. The only time the wakeword options have changed is when Amazon added ‘Computer’ in late 2016.

Note: a ‘wakeword’ wakes the assistant but does not trigger your specific skill. That is the job of an invocation name (we’ll get to this later).

Skills

The simplest way to describe a skill is ‘an app for Alexa’. Try to avoid this description, though, as it can cause confusion with the Alexa App (which we’ll come to later on). Amazon maintains a list of available Alexa skills on its site.

Skills aren’t downloaded in the same way as a phone app and are instead ‘enabled’. This is because the code and content for a skill are not stored on the Echo device, but in Amazon’s cloud servers along with Alexa itself. If a user wants to use your skill, they ‘enable’ it, either within the Alexa App or by asking Alexa to enable it. This then gives the user the ability to invoke and use your specific skill.

There are three types of skill:

  1. Custom Skills
    This is the most common type of skill, and gives you the most control over the user experience. This type of skill lets you develop just about anything you can imagine.
  2. Smart Home Skills
    This is a type of skill specifically for controlling smart home appliances. It gives you less control over the user experience, but is simpler to develop.
  3. Flash Briefing Skills
    This type of skill is specifically for compatibility with Alexa’s native ‘Flash Briefing’ ability. It also gives you less control over the user experience but, again, is simpler to develop.

Invocation Name

An ‘invocation name’ is the word or phrase used to trigger your skill. In a sense, it is the voice equivalent of an app icon. The invocation name usually matches your skill’s name but, given the stringent rules around choosing one, it might end up being slightly different.

Take The Guardian’s skill as an example. Its invocation name is ‘the guardian’, so to start using the skill a user would say something along the lines of:

  • “Alexa, open The Guardian”
  • “Alexa, ask The Guardian to give me the headlines”
  • “Alexa, ask The Guardian to give me the latest podcasts”

Note: invocation names are only needed for Custom Skills.
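
To make the distinction between wakeword and invocation name concrete, here is a minimal Python sketch that splits the Guardian phrases above into their parts. Alexa does this work for you in the cloud, so this is purely illustrative: the function name split_request, the assumption that a comma follows the wakeword, and the ‘open’/‘ask ... to’ patterns are ours, not Amazon’s.

```python
import re

# The four wakeword options listed in the Wakeword section.
WAKEWORDS = ("alexa", "amazon", "echo", "computer")


def split_request(spoken, invocation_name):
    """Toy parser (hypothetical): split a spoken phrase into its parts.

    Returns None when the phrase does not start with a known wakeword
    or does not address the given invocation name.
    """
    text = spoken.lower().strip().rstrip(".?!")
    # Assumption: the wakeword is followed by a comma, as in the examples.
    wakeword, _, rest = text.partition(",")
    wakeword = wakeword.strip()
    if wakeword not in WAKEWORDS:
        return None
    name = invocation_name.lower()
    # "open <name>" or "ask <name> to <request>"
    pattern = rf"^(open|ask)\s+{re.escape(name)}(?:\s+to\s+(.+))?$"
    match = re.match(pattern, rest.strip())
    if match is None:
        return None
    return {
        "wakeword": wakeword,
        "launch_word": match.group(1),
        "invocation_name": name,
        "request": match.group(2),  # None when the skill is simply opened
    }


print(split_request("Alexa, ask The Guardian to give me the headlines", "the guardian"))
```

The wakeword gets Alexa’s attention, the invocation name picks the skill, and whatever is left over is the part your skill has to make sense of.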

Intent

An intent is what a user is trying to accomplish. In code, each intent typically maps to the function that fulfils it. An intent doesn’t relate to the specific words a user says, but to the high-level goal they are aiming for.
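
As a rough illustration of intents as functions, the Python sketch below routes an intent name to the function that handles it. The intent names (GetHeadlines, GetLatestPodcasts) and the replies are invented for this example and are not taken from any real skill.

```python
def get_headlines():
    # Hypothetical handler: a real skill would fetch content here.
    return "Here are today's headlines."


def get_latest_podcasts():
    # Hypothetical handler for a second goal the user might have.
    return "Here are the latest podcasts."


# One entry per intent: the high-level goal, not the words used to ask for it.
INTENT_HANDLERS = {
    "GetHeadlines": get_headlines,
    "GetLatestPodcasts": get_latest_podcasts,
}


def handle_intent(intent_name):
    """Look up the function for an intent and run it."""
    handler = INTENT_HANDLERS.get(intent_name)
    if handler is None:
        return "Sorry, I can't help with that yet."
    return handler()


print(handle_intent("GetHeadlines"))
```

However the user phrases the request, the skill only ever sees the intent name, which keeps the handling code independent of wording.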

Utterance

Utterances are the specific phrases that people will use when making a request to Alexa. These can be hugely varied. Just think of the number of ways that people can ask for the time:

  • “What time is it?”
  • “What’s the time?”
  • “How late is it?”
  • “What’s the time now?”
  • “Do you have the time?”
  • “Can you tell me the time?”

This is where a flair for communication comes in. When developing a skill, you have to list the utterances that tell Alexa what to expect. This can mean typing out dozens of very slight variations of questions and statements: basically anything you think a user would actually say to get the result they want.
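
The many-to-one relationship between utterances and an intent can be sketched in a few lines of Python. The intent name GetTime is an assumption for this example, and the exact-match lookup is a deliberate simplification: Alexa’s own language understanding is more forgiving than a literal string comparison.

```python
import re

# Every phrase from the list above points at the same (hypothetical) intent.
SAMPLE_UTTERANCES = {
    "GetTime": [
        "what time is it",
        "what's the time",
        "how late is it",
        "what's the time now",
        "do you have the time",
        "can you tell me the time",
    ],
}


def normalise(spoken):
    """Lowercase, straighten apostrophes, drop punctuation, tidy spaces."""
    text = spoken.lower().replace("\u2019", "'")
    text = re.sub(r"[^a-z0-9' ]", "", text)
    return " ".join(text.split())


def match_intent(spoken):
    """Return the intent whose sample utterances include this phrase."""
    phrase = normalise(spoken)
    for intent, utterances in SAMPLE_UTTERANCES.items():
        if phrase in utterances:
            return intent
    return None


print(match_intent("Do you have the time?"))
```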

Slot

A slot is a variable attached to an intent that lets Alexa capture a specific piece of information from the request. For example, in a skill which delivers people their daily horoscope, the user’s request may take the form of the utterance “Give me the horoscope for Leo”. In this example, the star sign is a custom slot and ‘Leo’ is the value that fills it.

Amazon provides a number of built-in slot types, such as dates, numbers, durations and times, but developers can create custom slots for variables which are specific to their skill.
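
When Alexa passes a request to your skill, the slot arrives as part of a JSON document. The Python sketch below reads a slot value out of a heavily trimmed request for the horoscope example; a real request carries more fields (session details and so on), and the helper name get_slot_value is our own.

```python
# A trimmed IntentRequest for the horoscope example. Real requests contain
# additional fields that are left out here for brevity.
SAMPLE_REQUEST = {
    "version": "1.0",
    "request": {
        "type": "IntentRequest",
        "intent": {
            "name": "GetHoroscope",
            "slots": {"Sign": {"name": "Sign", "value": "leo"}},
        },
    },
}


def get_slot_value(envelope, slot_name):
    """Return the value the user supplied for a slot, or None if missing."""
    intent = envelope.get("request", {}).get("intent", {})
    slot = intent.get("slots", {}).get(slot_name, {})
    return slot.get("value")


print(get_slot_value(SAMPLE_REQUEST, "Sign"))
```

Returning None for a missing slot matters in practice, because users often leave details out (“Give me my horoscope”) and the skill then has to ask a follow-up question.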

Using Intents, Utterances, and Slots

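Intents, utterances, and slots all work together to tell Alexa what you want to happen when someone is using your Alexa skill. You’ll provide the Amazon Developer Platform with a list of your intents and utterances, one sample utterance per line, each beginning with the name of the intent it belongs to. For the horoscope example, a line would look something like this:

  GetHoroscope give me the horoscope for {Sign}

In this instance GetHoroscope denotes the intent, the rest of the line is the utterance, and {Sign} denotes the custom slot.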
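
To show how the three pieces fit together, here is a small Python sketch that reads lines in the ‘intent name, then utterance with a {Slot} placeholder’ style and uses them to interpret a spoken phrase. Both sample lines are illustrative assumptions built around the GetHoroscope intent and {Sign} slot, and the regular-expression matching is a stand-in for what Alexa does on Amazon’s servers.

```python
import re

# Illustrative sample utterances: intent name first, then the phrase.
SAMPLE_UTTERANCES = """
GetHoroscope give me the horoscope for {Sign}
GetHoroscope what is the horoscope for {Sign}
"""


def compile_utterance(line):
    """Turn one sample line into (intent_name, compiled_pattern)."""
    intent, template = line.split(maxsplit=1)
    # re.split with a capture group alternates literal text and slot names.
    parts = re.split(r"\{(\w+)\}", template)
    pattern = "".join(
        f"(?P<{part}>.+)" if index % 2 else re.escape(part)
        for index, part in enumerate(parts)
    )
    return intent, re.compile(f"^{pattern}$", re.IGNORECASE)


RULES = [compile_utterance(line) for line in SAMPLE_UTTERANCES.strip().splitlines()]


def resolve(spoken):
    """Return (intent_name, slots) for the first matching utterance."""
    for intent, pattern in RULES:
        match = pattern.match(spoken.strip())
        if match:
            return intent, match.groupdict()
    return None


print(resolve("Give me the horoscope for Leo"))
```

The utterance is only the route in: what your skill’s code receives is the intent name and the slot values.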

Alexa App

This is a companion app made by Amazon and available on iOS, Android, and Amazon Fire. It lets the user set up an Echo (or other Alexa-enabled device), change settings, enable/disable skills, and see information associated with user requests on cards.

Card

If you’ve used Alexa before you’ve probably seen cards in the app. Cards are used to display information relating to the user’s request, whether that’s simply displaying what the user asked and Alexa’s response, or information which is difficult to convey through voice (e.g. a picture, or long numbers or lists, which can be difficult to process and remember when delivered through voice only).
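
For developers, a card is simply an extra block in the JSON response your skill sends back alongside the speech. The Python sketch below builds a response with a plain-text ‘Simple’ card; the helper name build_response and the horoscope wording are made up for illustration.

```python
import json


def build_response(speech, card_title, card_content):
    """Build a skill response that speaks and also shows a Simple card."""
    return {
        "version": "1.0",
        "response": {
            # What Alexa says out loud.
            "outputSpeech": {"type": "PlainText", "text": speech},
            # What appears in the Alexa App.
            "card": {
                "type": "Simple",
                "title": card_title,
                "content": card_content,
            },
            "shouldEndSession": True,
        },
    }


response = build_response(
    speech="Here is today's horoscope for Leo.",
    card_title="Horoscope: Leo",
    card_content="Lucky numbers: 3, 14, 27",
)
print(json.dumps(response, indent=2))
```

Here the spoken reply stays short while the card carries the lucky numbers, which are easier to read than to remember.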

Choosing the right platform

Amazon are trailblazers in this space. Among the new category of voice-commanded consumer products (besides phones), Amazon have the most traction, with dozens of devices using Alexa. Amazon were also the first to open their ecosystem to developers. This means that they have momentum and are, to a certain extent, able to set the language used in designing for voice.

We’re in the early stages of mainstream voice technology and it will take a while before the market matures and long-term leaders emerge. While Amazon are currently ahead, any market can be disrupted by challengers and new entrants.

This vocabulary list may look quite different in two years’ time. The launch of Google Home has already brought new words, like ‘action’ instead of ‘skill’, but Google are still in the process of opening up to developers. The longer they wait (and the more people become accustomed to developing for Alexa) the more people are going to say “Google Action? Is that like an Alexa Skill?”.

Screenmedia are an innovation and voice interaction practice. If you're looking for more than just words, get in touch to see how we can help you out with Alexa skill development.

Andrew

Innovation Lead
