Chatbot

How to create an animated talking virtual assistant chatbot

There are many text- and voice-based conversational virtual assistants, such as Siri, Alexa and Cortana. Providing a physical appearance with expressive behaviors that reflect the brand is a key differentiating factor when building conversational chatbots. Giving the bot a physical appearance also helps preserve the feel of a human-to-human dialogue.

In this tutorial, you will develop an animated talking virtual assistant chatbot using the following technologies:

  1. Amazon Polly – Text-to-Speech API
  2. Amazon Lex – service for building conversational chatbots
  3. Node.js – JavaScript runtime environment
  4. Express – Node.js framework for building web applications
  5. Redis – in-memory cache
  6. PM2 – process manager for running Node.js applications

This video shows the demo in Chrome, and this is what we are going to build in this tutorial. The virtual assistant in the video helps users find the nearest car dealers when buying a new car. The avatar gives a short intro, asks you to log in with Facebook so it can greet you by name, and then takes you through a series of conversations to find the nearest dealer or showroom.

A simple AI chatbot demo with an animated talking character

The entire source code used for this tutorial is on GitHub.

Prerequisites

For this tutorial, you’ll need:

  1. Ubuntu 16.04 or higher.
  2. AWS Account
  3. Facebook App Account

Creating the Character

For the demo application I purchased the character from VideoHive and used Adobe After Effects to create two videos containing a series of character animations. The character in the first video keeps its lips closed, while the second video contains the lip-sync movements. To lip-sync the animated character with audio, the two videos must have the same number of frames and the same duration.

Building the Conversational Bot

The conversational bot is built using the Amazon Lex service from AWS. In this tutorial I will not go deep into how to create the bot with Amazon Lex. You can download the Lex bot file from GitHub and import it into the service. Once imported, you will have intents with the following properties:

  • Sample utterances – Spoken or typed phrases that invoke your intent.
  • Lambda initialization and validation – The business logic required to validate the user’s input.
  • Slots – Data the user must provide to fulfill the intent.
  • Confirmation prompt – Questions that ask the user to confirm the intent before fulfilling it.
  • Fulfillment – The business logic required to fulfill the user’s intent.
  • Response – Message(s) and/or cards to close the intent or invoke another intent.

The imported bot has only one intent, “BuyNewCar”, with the above properties. Next, we have to set up the Lambda function used for user-input validation and fulfillment. Use the Lambda folder from GitHub and follow the tutorial AWS Lambda deployment package in Node.js to set up the Lambda function for the bot.
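The repository’s Lambda folder contains the actual validation and fulfillment logic. As a rough orientation only, a Lex (V1) code hook generally follows the shape below; the handler, slot handling and messages here are an illustrative sketch, not the code from the repository.

// Minimal illustrative Lex V1 code hook (Node.js). The real logic lives in the
// repository's Lambda folder; the messages below are made up.
exports.handler = async (event) => {
  const slots = event.currentIntent.slots;

  if (event.invocationSource === 'DialogCodeHook') {
    // Validation hook: accept the input and let Lex keep eliciting missing slots.
    return {
      sessionAttributes: event.sessionAttributes,
      dialogAction: { type: 'Delegate', slots: slots }
    };
  }

  // Fulfillment hook: close the intent with a message for the user.
  return {
    sessionAttributes: event.sessionAttributes,
    dialogAction: {
      type: 'Close',
      fulfillmentState: 'Fulfilled',
      message: {
        contentType: 'PlainText',
        content: 'Here are the dealers closest to you.'
      }
    }
  };
};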

Running the Application

If you want to try out the application in your development environment, follow the steps below to install it.

Run the following commands to install NodeJS.

curl -sL https://deb.nodesource.com/setup_10.x | sudo -E bash -
sudo apt -y install nodejs

Check the Node and NPM versions.

node -v && npm -v
# v10.x.x
# 6.x.x

Run the following command to install PM2.

sudo npm install -g pm2

Clone the repository

git clone https://github.com/bytize/talking-virtual-avatar-chatbot mychatbot

Now, install all of the dependencies needed for this app.

cd mychatbot
npm install

Start the application using pm2.

pm2 start development.json
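The development.json file in the repository is a standard PM2 process file. If you need to adapt it, it looks roughly like this (the app name, script path and port below are assumptions based on the project structure):

{
  "apps": [{
    "name": "mychatbot",
    "script": "./app/bootstrap/server",
    "watch": true,
    "env": {
      "NODE_ENV": "development",
      "PORT": 3001
    }
  }]
}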

Open your browser and browse to the URL http://localhost:3001.

Setting Up Your NodeJs Application

First, let’s set up a web app framework with NodeJs. Create your app directory, and set up your app’s structure like this.

├── app
│     ├── app.js
│     ├── bootstrap
│     │     └── server
│     ├── components
│     │     ├── lex.js
│     │     └── polly.js
│     ├── config
│     │     └── index.js
│     ├── controllers
│     │     ├── api.js
│     │     └── main.js
│     ├── helpers
│     │     └── pug_functions.js
│     └── routes
│           └── index.js
├── public
│     ├── css
│     │     └── style.css
│     └── js
│         └── script.js
└── views
      └── index.pug

Then, run this command to initialize your NodeJs app.

npm init -f

The -f flag accepts the default settings; if you omit it, you can configure the app manually. This will also generate a package.json file that contains the basic info for your app.

Now, install all of the dependencies needed to build this app.

npm install express aws-sdk body-parser connect-redis cookie-parser express-session morgan pug --save

npm install gulp gulp-clean-css gulp-concat gulp-minify gulp-rev del --save-dev

With the --save and --save-dev flags added, your package.json file will be automatically updated with the dependencies.

Now, let’s create a configuration file for the application. For security reasons it’s a best practice to place your configuration file outside your application (in this tutorial it is loaded from /etc/config/). We will create a JSON file with the following configuration parameters and credentials:

  • Redis connection data
  • AWS credentials for LEX and Polly Service
  • Facebook App Id

{
  "redis": {
    "development": {
      "host": "localhost",
      "port": 6379,
      "db": 1
    },
    "testing": {
      "host": "127.0.0.1",
      "port": 6379,
      "db": 1
    },
    "production": {
      "host": "localhost",
      "port": 6379,
      "db": 1
    }
  },
  "awsCredentials": {
    "development": {
      "accessKeyId": "xxxxxx",
      "secretAccessKey": "xxxxxxx"
    },
    "testing": {
      "accessKeyId": "xxxxx",
      "secretAccessKey": "xxxxx"
    },
    "production": {
      "accessKeyId": "xxxx",
      "secretAccessKey": "xxxxx"
    }
  },
  "facebook": "xxxxxx"
}

Then, let’s create a config/index.js file that loads this JSON file with require.

const config = require("/etc/config/talking_virtual_avatar_chatbot.json");

module.exports = () => {
  const env = process.env.NODE_ENV;
  const redis = config.redis;

  const awsCredentials = config.awsCredentials;

  return {
    appName: "Talking Virtual Assistant Chatbot",
    redis: redis[env],
    awsCredentials: awsCredentials[env],
    facebook: config.facebook
  }
};

Now, let’s create a helper pug_functions.js file where we can define JavaScript functions for use in the Pug templates. The file includes a single function that returns the asset path from the manifest file generated by the Gulp tasks.

exports.assetPath = path => {
  const manifest = require("../../public/dist/manifest.json");
  if (manifest[path] === undefined) {
    return path;
  } else {
    return manifest[path];
  }
};
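For reference, the manifest produced by gulp-rev is simply a mapping from the original file name to its revisioned name, so assetPath("bundle.js") resolves to the hashed file in production. The hashes below are only illustrative:

{
  "bundle.js": "bundle-d41d8cd98f.js",
  "stylesheet.css": "stylesheet-9e107d9d37.css"
}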

We are going to use Express, a NodeJs web application server framework, to run the server locally.

Now, let’s create an app.js file and instantiate Express and other components:

const express = require('express'),
    app = express(),
    bodyParser = require('body-parser'),
    logger = require('morgan'),
    session = require('express-session'),
    redisStore = require('connect-redis')(session),
    env = process.env.NODE_ENV,
    config = require('./config/index')(),
    pugFunctions = require('./helpers/pug_functions'),
    path = require('path');

Set the Pug templating engine for your app. You don’t need to ‘require’ it. Add the following code to your app.js file.

app.set('views', path.join(__dirname, '../views'));
app.set('view engine', 'pug');

Now, let’s add the following middleware to the app:

  • morgan – HTTP request logger
  • body-parser – parses incoming request bodies

app.use(logger('dev'));
app.use(bodyParser.json());
app.use(bodyParser.urlencoded({ extended: true  }));

Add the following code to serve static files such as images, CSS files, and JavaScript files.

app.use(express.static(path.join(__dirname, '../public')));

You are going to add the session middleware with Redis as the store.

app.use(session({
    store: new redisStore({
        host: config.redis.host,
        port: config.redis.port,
        db: config.redis.db,
        ttl: 3600
    }),
    secret: '1234567890abcdefghijk',
    resave: true,
    saveUninitialized: true
}));

Now, let’s set the config, helper and env variables in the response object to be used by the pug files for rendering.

app.use((req, res, next) => {
    
    if(req.session.user){
        res.locals.user = req.session.user;
    }
    res.locals.config = config;
    res.locals.pugFunctions = pugFunctions;
    res.locals.env = env;
    next();
});

Setting up the Lex and Polly Components

We have to create the following JavaScript modules, which require the AWS SDK:

  • lex.js – Component to consume the AWS Lex Service
  • polly.js – Component to consume the AWS Polly Service

Now, let’s create the components/polly.js module, which provides two functions:

  • Getting the audio stream in MP3 format.
  • Getting the speech marks – metadata that describes the speech, such as where a sentence or word starts and ends in the audio stream. Polly takes your text and returns this metadata instead of audio. By using the speech marks in conjunction with the audio stream generated from the same text, you can synchronize speech with facial animation (lip-syncing) and give your application an enhanced visual experience.

const config = require("../config/index")();
const AWS = require("aws-sdk");

module.exports = () => {
  var creds = new AWS.Credentials(
    config.awsCredentials.accessKeyId, 
    config.awsCredentials.secretAccessKey
  );
  const getPollyObject = () => {
    return new AWS.Polly({
      credentials: creds,
      signatureVersion: 'v4',
      region: 'us-east-1'
    });
  }

  const getMp3 = (text, callback) => {
    var Polly = getPollyObject();
    let params = {
      'Text': text,
      'OutputFormat': 'mp3',
      'VoiceId': 'Joanna'
    };
    Polly.synthesizeSpeech(params, callback);
  }

  const getSpeechMarks = (text, callback) => {
    var Polly = getPollyObject();
    let params = {
      'Text': text,
      'OutputFormat': 'json',
      'VoiceId': 'Joanna',
      'SpeechMarkTypes': [
        'viseme', 'word'
      ]
    };
    Polly.synthesizeSpeech(params, callback);
  }

  return {
    getMp3: getMp3,
    getSpeechMarks: getSpeechMarks
  }
};
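When OutputFormat is "json", Polly returns the speech marks in the AudioStream as newline-delimited JSON, one object per line; this is why the API controller later wraps the lines in brackets and joins them into a single JSON array. For a short phrase the raw output looks roughly like this (the times and viseme values are illustrative):

{"time":6,"type":"word","start":0,"end":5,"value":"Hello"}
{"time":6,"type":"viseme","value":"k"}
{"time":116,"type":"viseme","value":"E"}
{"time":264,"type":"viseme","value":"t"}
{"time":365,"type":"viseme","value":"o"}
{"time":488,"type":"viseme","value":"sil"}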

Then, let’s create the components/lex.js module, which has the following functionality:

  • Sending user input to Amazon Lex. Amazon Lex interprets the input using the machine learning model it built for the bot, returns application-specific session attributes, and returns the next message to convey to the user. The response text is then passed to the Polly module to get the audio and speech marks for lip-syncing.

const config = require("../config/index")();
const AWS = require("aws-sdk");

module.exports = () => {
  var creds = new AWS.Credentials(
    config.awsCredentials.accessKeyId, 
    config.awsCredentials.secretAccessKey
  );

  const botAlias = "challenge";
  const botName = "BuyNewCar";

  const getLexObject = () => {
    return new AWS.LexRuntime({
      credentials: creds,
      apiVersion: '2016-11-28',
      region: 'us-east-1'
    });
  }

  const postText = (text, userId, sessionAttributes, callback) => {
    var params = {
      botAlias: botAlias,
      botName: botName,
      inputText: text,
      userId: userId,
      sessionAttributes: sessionAttributes
    };
    const lexruntime = getLexObject();

    lexruntime.postText(params, callback);
  }

  return {
    postText: postText
  }
};
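For context, the data object that postText hands to its callback follows Amazon Lex’s PostText response format. A typical mid-conversation response looks something like this (the slot names and texts are only an example for the BuyNewCar intent):

{
  "intentName": "BuyNewCar",
  "dialogState": "ElicitSlot",
  "slotToElicit": "Location",
  "slots": { "CarType": "SUV", "Location": null },
  "sessionAttributes": {},
  "message": "Which city are you in?",
  "messageFormat": "PlainText",
  "responseCard": null
}

The front end mainly needs message (spoken and shown as a chat bubble) and responseCard (rendered as clickable list items).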

Creating the Controllers and Routes

Let’s create the controllers/api.js controller file which will be used by the front-end.

const lex = require("../components/lex")();
const pollyClient = require('../components/polly')();

module.exports = () => {
  
  const chatWithText = (req, res, next) => {
    lex.postText(req.body.text,req.session.user.id,{},function(err, data) {
      if (err) {
        res.send(err);
      } else {
        console.log(data);
        res.send(data);
      }
    });
  }

  const getAudio = (req, res) => {
    pollyClient.getMp3(req.query.text,(err,data)=>{
      if(err){
        res.send(err);
      } else if (data.AudioStream instanceof Buffer){
        res.writeHead(200, {'Content-Type': 'audio/mpeg'});
        res.end(data.AudioStream);
      } else {
        res.send({error: "Something went wrong"});
      }
    });
  }

  const getSpeechMarks = (req, res) => {
    pollyClient.getSpeechMarks(req.body.text,(err,data)=>{
      if(err){
        res.send(err);
      } else {
        var marks = data.AudioStream.toString('utf8');
        marks = "["+marks+"]";
        marks = marks.replace(new RegExp("}\n{","g"), "},{");
        var marksJson = JSON.parse(marks);
        var frames = [];
        var words = [];
        var counter = 0;
        var wordCounter = 0;
        for (var i = 0; i < marksJson.length; i++) {
          var tmp = {};
          if(marksJson[i].type == "word" && wordCounter < 6){
            words.push(marksJson[i].value.toLowerCase());
            wordCounter++;
          }
          if(marksJson[i].type == "word" && !frames[counter]){
            tmp.time = marksJson[i].time;
            tmp.start = marksJson[i].time;
            tmp.end = 0;
            frames.push(tmp);
          } else if(marksJson[i].type == "viseme" && marksJson[i].value == "sil" && frames.length){
            frames[counter].end = marksJson[i].time;
            counter++;
          }
        }
        res.send({frames: frames, words: words});
      }
    });
  }

  const login = (req, res) => {
    if(req.body.id)
      req.session.user = req.body;
    res.send({status: "success"});
  };

  const logout = (req, res) => {
    req.session.user = null;
    res.send({status: "success"});
  };

  return {
    chatWithText: chatWithText,
    getSpeechMarks: getSpeechMarks,
    getAudio: getAudio,
    login: login,
    logout: logout
  }
};

Then, let’s create the controllers/main.js file to render the front-end from the server.

module.exports = () => {
  const index = (req, res, next) => {
    res.render("index");
  }

  return {
    index: index
  }
};

Now, let’s create the routes/index.js file with the controller actions we created above.

const express = require('express');
const router  = express.Router();
const main = require('../controllers/main')();
const api = require('../controllers/api')();

router.get('/', main.index);

router.post('/api/chat-text', api.chatWithText);

router.get('/api/audio', api.getAudio);
router.post('/api/speech-marks', api.getSpeechMarks);

router.post('/api/login', api.login);
router.post('/api/logout', api.logout);

module.exports = router;

Add the following code to app.js to initialize the router.

const routes = require('./routes/index');
app.use('/', routes);

Add the error handler code to app.js.

// catch 404 and forward to error handler
app.use(function(req, res, next) {
  var err = new Error('Not Found');
  err.status = 404;
  next(err);
});

// error handler
// no stacktraces leaked to user unless in development environment
app.use(function(err, req, res, next) {
  res.status(err.status || 500);
  res.send({
    message: err.message,
    error: (app.get('env') === 'development') ? err : {}
  });
});

Finally, add the following code at the end of app.js to export the app object used to create the server.

module.exports = app;

Now, let’s create a startup script (the bootstrap/server file) that creates the HTTP server and starts listening.

var app = require('../app');
var debug = require('debug')('talking_virtual_avatar_chatbot');
var http = require('http');

var port = normalizePort(process.env.PORT || '3000');
app.set('port', port);

var server = http.createServer(app);

server.listen(port, function() {
  debug('Express server listening on port ' + server.address().port);
});
server.on('error', onError);
server.on('listening', onListening);
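The normalizePort, onError and onListening helpers referenced above are the usual Express generator boilerplate; a minimal version of them looks like this:

// Minimal versions of the standard Express generator helpers used above.
function normalizePort(val) {
  var port = parseInt(val, 10);
  if (isNaN(port)) return val; // named pipe
  if (port >= 0) return port;  // port number
  return false;
}

function onError(error) {
  if (error.syscall !== 'listen') throw error;
  console.error('Could not listen on port ' + port + ': ' + error.code);
  process.exit(1);
}

function onListening() {
  debug('Listening on port ' + server.address().port);
}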

Creating the User Interface

The UI of this app is simple, with just two sections: the virtual assistant on the left and the chat interface on the right. Let’s set up our views/index.pug file and include our front-end JavaScript files and CSS.

doctype html
html
  head
    meta(charset='UTF-8')
    meta(name='viewport', content='width=device-width, initial-scale=1')
    title Talking Virtual Avatar Chatbot - Alize
    if env == "development"
      link(rel="stylesheet",href="/css/bootstrap.css")
      link(rel="stylesheet",href="/css/prelodr.min.css")
      link(rel="stylesheet",href="/css/style.css")
    else
      link(rel="stylesheet",href="/dist/css/"+pugFunctions.assetPath("stylesheet.css"))
  body
    .fb-root
    script.
      window.fbAsyncInit = function() {
          FB.init({
            appId      : #{config.facebook}, // App ID
            status     : true, // check login status
            cookie     : true, // enable cookies to allow the server to access the session
            xfbml      : true,  // parse XFBML
            version: 'v2.9'
          });
          $('.fb-root').trigger('facebook:init');

          // Additional initialization code here
      };
      (function(d, s, id){
        var js, fjs = d.getElementsByTagName(s)[0];
        if (d.getElementById(id)) return;
        js = d.createElement(s); js.id = id;
        js.src = "//connect.facebook.net/es_US/sdk.js";
        fjs.parentNode.insertBefore(js, fjs);
      }(document, 'script', 'facebook-jssdk'));
    .wrapper
      .bot-container
        include _avatar
        include _chat
    if env == "development"
      script(src="/js/jquery.min.js")
      script(src="/js/bootstrap.min.js")
      script(src="/js/VideoFrame.js")
      script(src="/js/soundmanager.js")
      script(src="/js/prelodr.min.js")
      script(src="/js/annyang.min.js")
      script(src="/js/siriwave.js")
      script(src="/js/script.js")
    else
      script(src="/dist/js/"+pugFunctions.assetPath("bundle.js"))
  .user-action-overlay
    button(type="button")#js_start.btn.btn-success.btn-lg Start

Now, let’s create the virtual assistant views/_avatar.pug file. This file is a partial and is included in views/index.pug. The view has two overlapping video tags whose visibility is toggled for lip-syncing.

.avatar
    header
        img.img-circle.pull-left(src="/images/avatar.png",width=40)
        h3.pull-left Alize
        .pull-right.dropdown
            a(href="#",data-toggle="dropdown")
                span.glyphicon.glyphicon-option-vertical
            ul.dropdown-menu
                li.active
                    .col-md-12
                        .checkbox
                            label
                                input.js_avatar_options(type="checkbox",checked=true,value="speech") 
                                small Enable Speech Recognition
                li
                    .col-md-12
                        .checkbox
                            label
                                input.js_avatar_options(type="checkbox",checked=true,value="avatar") 
                                small  Enable Character Animation

    .canvasAvatar.canvasAvatarLoading
        video#js_open.hide(autoplay=false)
            source(src="/images/open.mp4",type="video/mp4")
        video#js_close(autoplay=false)
            source(src="/images/close.mp4",type="video/mp4")

Then, let’s create the chat interface views/_chat.pug file. This file is a partial and is included in views/index.pug. The UI is similar to the WhatsApp interface, with chat bubbles.

.chat
    header
        if user
            img.img-circle.pull-left(src="https://graph.facebook.com/"+user.id+"/picture?type=small",width=40)
        else
            img.img-circle.pull-left(src="/images/no-user.jpg",width=40)
            
        h3.pull-left 
            if user
                | #{user.name}
            else
                | Hi, Guest
        .pull-right.dropdown(class=user?"":"hide")
            a(href="#",data-toggle="dropdown")
                span.glyphicon.glyphicon-option-vertical
            ul.dropdown-menu
                li
                    a#js_logout(href="javascript:void(0);") Logout
    .chat-container
        .chat-body
            .chat-messages
    .chat-input
        #js_siriwave
        a#js_facebook.btn.btn-primary.btn-lg(href="#",class=user?"hide":"") Login with facebook
        .input-group(class=user?"":"hide")
            input#js_message.form-control.input-lg(type="text",data-user=user?user.name:"")
            span.input-group-btn
                button#js_send.btn.btn-default.btn-lg Send

To style the UI as seen in the demo, refer to the style.css file in the source code.

Communication with Bot

You will use a simple AJAX request to send data to the server via POST when the user interacts. The user input and bot response are appended as chat bubbles with a timestamp. The bot response comes in two variants: plain text and clickable list items. When an item is clicked, it is sent as a reply to the bot to continue the conversation.

$("#js_send").click(function(e){
    e.preventDefault();
    messageSubmit();
});

$("#js_message").keypress(function (e) {
  if (e.which == 13) {
    messageSubmit();
  }
});

function messageSubmit(){
    var text = $("#js_message").val();
    if(text!="" && text.length < 300){
        $("#js_message").val('');
        insertUserMessage(text);
        getBotResponse(text);
    }
}

function getBotResponse(text){
    $.ajax({
        url: '/api/chat-text',
        data: {text:text},
        dataType: "JSON",
        method: "POST",
        success: function(data){
            if(data.message){
                insertBotMessage(data.message);
                insertResponseCard(data);
                talk(data.message);
            }
        }
    })
}
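The helpers insertUserMessage, insertBotMessage and insertResponseCard build the chat bubbles. The real markup lives in the repository’s script.js and style.css; a simplified sketch (the CSS class names are assumptions) could look like this:

// Simplified sketch of the chat-bubble helpers; class names are assumptions,
// see script.js and style.css in the repository for the real markup.
function insertUserMessage(text){
    appendBubble(text, 'user');
}

function insertBotMessage(text){
    appendBubble(text, 'bot');
}

function appendBubble(text, who){
    var time = new Date().toLocaleTimeString();
    $(".chat-messages").append(
        '<div class="bubble ' + who + '"><p>' + text + '</p><small>' + time + '</small></div>'
    );
}

// Lex response cards carry buttons; render each button as a clickable reply
// that feeds its value back into the conversation.
function insertResponseCard(data){
    if(!data.responseCard || !data.responseCard.genericAttachments) return;
    data.responseCard.genericAttachments.forEach(function(attachment){
        (attachment.buttons || []).forEach(function(button){
            $('<a href="#" class="bubble bot card-item"></a>')
                .text(button.text)
                .on('click', function(e){
                    e.preventDefault();
                    insertUserMessage(button.text);
                    getBotResponse(button.value);
                })
                .appendTo(".chat-messages");
        });
    });
}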

Once the bot responds to the user input, you can request the speech marks for the bot’s response text after the audio in MP3 format is returned from the server. For this tutorial we will use SoundManager for handling audio in the browser. SoundManager is a JavaScript sound API supporting MP3, MPEG4 and HTML5 audio plus RTMP, providing reliable cross-browser/platform audio control.

function talk(text){
    if(enableCharAnimation) {
        if(!isVideosDownloaded) return;
        initSoundManager("/api/audio?text="+text+"&t="+new Date().getTime(),text);
    }
}

function initSoundManager(url, text){
    console.log("soundManager");
    pre.show("Loading...");
    audio = soundManager.createSound({
      id: 'audio',
      url: url,
      autoLoad: true,
      autoPlay: false,
      stream: false,
      onload: function() {
        getSpeechMarks(text,function(data){
            avatarDialog = data.words;
            speechMarks = data.frames;
            playAudio = true;
            speechMarkIndex = 0;
            audioFrames = 0;
        });
      },
      onfinish: function() {
        if(!startConversation){
            startConversation = true;
            getBotResponse("buy a new car");
        }
        resetAudio();
      }
    });
}

function getSpeechMarks(text,callback){
	$.ajax({
		url: "/api/speech-marks",
		method: "POST",
		data: {
			text: text
		},
		dataType: "JSON",
		success: function(data){
			callback(data);
		}
	})
}

Character Animation with JavaScript

Let’s create an object with the start and end timings of the animations in the videos we created above. Each entry has a type that describes the animation. For this tutorial I have included only seven actions; you could include more based on your requirements.

var actions = {
    "a": {
      type: "idle",
      start: 0,
      end: 90
    },
    "a1": {
      type: "left_normal",
      start: 90,
      end: 180
    },
    "a4": {
      type: "explain_left",
      start: 370,
      end: 470
    },
    "a7": {
      type: "me_left_round",
      start: 645,
      end: 755
    },
    "a9": {
      type: "explain_you",
      start: 855,
      end: 945
    },
    "a13": {
      type: "explain_you_alt",
      start: 1200,
      end: 1285
    },
    "a14": {
      type: "hands_down",
      start: 1285,
      end: 1400
    }
  };

Let’s initialize the videos using VideoFrame. VideoFrame is an HTML5 video SMPTE time code and frame seeking API. You will use this JavaScript library to seek the videos by frame. The VideoFrame callback is called on every frame while the video is playing, and you can handle the lip-syncing logic in that callback.

function initVideo(id,callback){
    return VideoFrame({
        id : id,
        frameRate: 25,
        callback : callback
    });
}

openVideo = initVideo('js_open',renderOpen); 
closeVideo = initVideo('js_close',renderClose);
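The repository’s script.js decides which gesture to play while the bot is speaking. Purely as an illustration of how the actions table and VideoFrame can be combined (the playAction helper and the 25 fps assumption are mine, not taken from the repository), a single gesture could be played like this:

// Illustrative only: play one gesture from the "closed lips" video,
// assuming the clip was rendered at 25 frames per second.
var currentAction = null;

function playAction(key){
    currentAction = actions[key];
    var el = closeVideo.video;                 // underlying HTMLVideoElement
    el.currentTime = currentAction.start / 25; // jump to the gesture's first frame
    el.play();
    closeVideo.listen('frame');                // fire the renderClose callback every frame
}

// Inside the renderClose callback you could then stop at the gesture's last frame:
// if (currentAction && closeVideo.get() >= currentAction.end) {
//     closeVideo.video.pause();
//     closeVideo.stopListen();
// }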

Lip-syncing with JavaScript

Once you get the speech marks and audio response from the server, you can do the lip-syncing by toggling the open and close videos. Each speech mark has a start and end time, which is converted to start and end frames and compared against the audio frame counter to decide when to toggle.

audioFrames++;
var time = speechMarks[speechMarkIndex].time;
var endTime = speechMarks[speechMarkIndex].end;
var startTime = speechMarks[speechMarkIndex].start;
var startTimeFrame = Math.round((startTime/1000)*50);
var endTimeFrame = Math.round((endTime/1000)*50);
if(audioFrames >= startTimeFrame && audioFrames <= endTimeFrame){
    $("#js_close").addClass("hide");
    $("#js_open").removeClass("hide");
    if(audioFrames == endTimeFrame){
        speechMarkIndex++;
    }
} else {
    $("#js_open").addClass("hide");
    $("#js_close").removeClass("hide");
}

Minification with Gulp

You can use the Gulp task runner to combine and minify the CSS and JavaScript files. Let’s create a gulpfile.js file with the minification tasks.

const { src, dest, series, parallel } = require('gulp');
var concat = require('gulp-concat');
var minify = require('gulp-minify');
var cleanCss = require('gulp-clean-css');
var rev = require('gulp-rev');
var del = require('del');

const clearJs = function () {
  return del([
    'public/dist/js/*.js'
  ]);
};

const clearCss = function () {
  return del([
    'public/dist/css/*.css'
  ]);
};

const packJs = function () {  
  return src(['public/js/jquery.min.js', 
  	'public/js/annyang.min.js', 
  	'public/js/bootstrap.min.js',
  	'public/js/prelodr.min.js',
  	'public/js/VideoFrame.js',
  	'public/js/soundmanager.js',
  	'public/js/siriwave.js',
  	'public/js/script.js'])
    .pipe(concat('bundle.js'))
    .pipe(minify({
        ext:{
            min:'.js'
        },
        noSource: true
    }))
    .pipe(rev())
    .pipe(dest('public/dist/js'))
    .pipe(rev.manifest('public/dist/manifest.json', {
      merge: true
    }))
    .pipe(dest('.')); // writes the manifest to public/dist/manifest.json
};

const packCss = function () {  
  return src(['public/css/bootstrap.css', 
  	'public/css/prelodr.min.css',
  	'public/css/style.css'])
    .pipe(concat('stylesheet.css'))
    .pipe(cleanCss())
    .pipe(rev())
    .pipe(dest('public/dist/css'))
    .pipe(rev.manifest('public/dist/manifest.json', {
      merge: true
    }))
    .pipe(dest('.')); // merges the CSS entries into the same manifest
};

exports.build = series(clearJs, clearCss, parallel(packCss, packJs));

Now, let’s run the following command to execute the build process. After the build completes, you can find the minified assets under the public/dist folder.

./node_modules/.bin/gulp build
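Alternatively, you can expose the build as an npm script by adding the following to package.json and running npm run build, which resolves the local gulp binary automatically.

"scripts": {
  "build": "gulp build"
}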

I hope you’ve enjoyed the tutorial and created an animated talking virtual assistant chatbot!

Rajesh Saravanan

CEO and Co-Founder at Bytize