Bixby

Voice Design / UX Writing / Content Design
About
Developing a virtual assistant for Microsoft/Samsung, localised for the Brazilian market.

Bixby is Samsung’s rebranding of Cortana.

It is a virtual assistant that you can interact with by voice, text or taps to carry out many of the tasks you do on a Galaxy S8 and S8+.
Problem statement
How do you create a natural-sounding, efficient voice assistant for the modern user?
01
PERSONA DESIGN
Defining the voice characteristics of an assistant
PERSONA CHARACTERISTICS

Microsoft Cognitive Services’ standard voices will be used by both first-party and third-party apps across different applications and scenarios. Voice is one of the ways products and services interact with end users, so it should meet the following requirements:

THE PERFECT ASSISTANT

- The voice should sound youthful, energising and positive
- The voice should sound reliable and pleasant, like a close friend, a guide or a favourite teacher you love talking to
- The voice should convey information in a natural, conversational style, but with enough professional confidence that users can fully trust the information
- The voice should be a mix of traditional and modern: simple and close to the roots of the language, yet in step with a changing world, with the right balance of seriousness and kindness
- The voice should definitely not sound robotic, but very human: easy-going, warm and approachable. It should respect boundaries and keep conversations crisp and to the point, read contexts and situations and skilfully adjust its demeanour and behaviour accordingly, and be humorous at times while knowing when to be serious
- The voice should sound proactive and intuitive, understand the user’s needs and offer multiple solutions and alternatives to choose from

VOICE QUALITY REQUIREMENTS

Pitch: medium, neither too high nor too low. The pitch should never make the voice sound dull or boring.
Accent: a common, standard nationwide accent.
Pace: medium, neither too fast nor too slow. Too fast, and words get swallowed and the speech becomes incoherent; too slow, and it sounds dragged out and boring and cannot hold the listener’s attention.
Articulation: conversational and easy to understand.
Speaking style: a blend of casualness, professionalism and empathy. It should not sound mechanical or as if it were reading from a rehearsed script.
Timbre: clear, smooth and melodious; a smiling voice that conveys cheerfulness and positivity, with empathy and warmth for content that needs to show understanding.
02
Choosing the one
Finding the right reader
Because of the amount of text each reader would have to get through, we chose to pull content from Microsoft service sources, and I proofread all of the scripts the voice candidates would read.

I then listened to every candidate and judged their reading ability against the persona characteristics we had already defined.
03
It has to feel natural
Tweaking frequencies
When the AI (a machine learning NLP algorithm) couldn’t convey meaning properly because of problems with tone, pitch or pace, I modified and labelled those parts of the speech and sent them back to the engineering team to incorporate.

The recurring problems involved intonation and emotion. We wanted an assistant that could be humorous at some times and serious at others, so getting the pitch, pace and timbre right meant I had to train the AI extensively.
04
Content is king
All possible interactions
Because a virtual assistant’s content must be comprehensive, our approach was to take excerpts from open-source text to feed the AI. I researched and proofread the content for the conversations Bixby would have with users.