SIGNAL Behind the Scenes: Twilio's Kris Gutta

August 23, 2019

In this interview, Kris Gutta, a Product Manager on Twilio's Voice team, sat down with Corey Weathers, a Developer Evangelist at Twilio, to discuss the launch of Twilio Media Streams. We hope you enjoy this interview transcript.

The following interview took place August 6th, 2019 at Twilio's SIGNAL conference at Moscone West in San Francisco, California. It previously aired on our SIGNAL TV broadcast.

You can also find this interview on YouTube on Twilio's channel.

Meet Kris Gutta

Kris: How you doing, Corey?

Corey: I'm hanging in there, how 'bout yourself?

Kris: Great.

Corey: We've been talking a lot about Twilio, we've been excited about the keynote, we're really excited to see some things happening here on the stream and so, before we jump into that stuff, I have to ask: how's your SIGNAL been?

Kris: It's been fantastic. I mean, I just love being at SIGNAL. I get to see customers that I interact with day-to-day in person.

Corey: Yeah.

Kris: Like I get to see my colleagues that I talk to frequently in person. So it's just incredible, I just love SIGNAL just because of how we can interact with everyone. It's amazing.

Corey: Well, this is our first time kinda doing this experience where we're taking pieces of SIGNAL, and taking it home to our friends from around the world and we've got folks joining in from the UK and other parts of Europe. We've got folks joining in from Florida and other parts of the United States. So it's super good to have all of these folks here which is exciting.

Kris: Yeah, I gotta say when you first mentioned that we were doing this, I thought it was an incredible idea-

Corey: Thank you.

Kris: To be able to stream content from Twilio at the SIGNAL to the world.

Corey: Yeah

Kris: And I was first to sign up because I thought it was exciting.

Corey: That's great! And so we're excited to have you. It's now time for us to talk about one of the things that got announced earlier today during the keynote: Media Streams.

Kris: Yup

Corey: Can you describe what it is for the person who has no context, who missed the keynote?

Kris: Absolutely. So first, I'm a Product Manager on the Voice team, so I work with a lot of our voice customers and as you can imagine, if you're familiar with Twilio-

Jonathan De Jong, VP of Engineering joins Kris and Corey

Corey: Oh we've got friends joining.

Kris: One of those customers.

Jonathan: Hi.

Kris: Hey Jonathan, how are you doing?

Corey: What's up, Jonathan?

Jonathan: Oh you know.

Corey: Friends, this is Jonathan De Jong, he's VP of Engineering over at GLOBO. Kris was just talking about how we legitimately have, we've got to get him in the shot here.

Kris: Let's get in there.

Corey: We were just talking about how he uses SIGNAL to connect up with customers.

Jonathan: Oh yeah.

Corey: Who he doesn't always get to see, and so it's super good to have you popping into the booth here.

Jonathan: Yeah, it's great to see you guys. I mean we're really excited for this new product. It's going to help us a lot with our call center and yeah, I couldn't be happier to be here talking with Kris. We just finished talking, actually, at the session but being part of this has been a great experience.

Corey: We love how excited you are about this. We really can't wait to get you on the stream. So, next time we got to make sure we schedule your own time.

Jonathan: Yeah, of course.

Kris: Thanks for popping in.

Jonathan: Thanks guys!

Kris: Thanks for stopping by.

Jonathan: Yeah, good to see you guys.

Corey: I love how excited he is. We didn't even really get to jump into the product yet. This is going to be amazing. You all should have the same excitement.

Kris: Jonathan, he's amazing. He's a great customer, as you can imagine. This is the kind of relationship that we like to have with our customers, where they feel like they can come in and talk with us anytime and have that interaction.

Corey: Absolutely.

Kris: That's why SIGNAL's great.

Corey: That's what SIGNAL's for.

Kris: It really builds these type of relationships with the customers and different developers around the world. Where they can feel like they can contact us anytime.

Corey: No I absolutely love it. I love that he felt so comfortable to just pop in here. I love that we got him a little bit on the mic, even though we got lapel mics. This is great.

Kris: I know, yeah.

Kris and Corey discuss Media Streams

Corey: Anyway, let's get back into it. What is Media Streams? Let's talk about it.

Kris: Sure, so in simple terms, imagine that me and Corey are talking to each other, and that's a call that's going through the Twilio platform, let's say, connected to Corey. With Media Streams, what you can do is, as we're talking to each other, you can actually fork the media, the actual RTP of the phone call.

Corey: Whoa.

Kris: And receive that over WebSockets.

Corey: Really?

Kris: Right, so you can imagine, you can do a lot of different things.

Corey: Yes.

Kris: Like you can transcribe that call.

Corey: Right.

Kris: Imagine if the same conversation that's taking place between an Uber driver and a passenger or a contact center agent and a customer, you can transcribe these conversations

Corey: In real time?

Kris: In real time.

Corey: Oh my God!

Kris: Right, so once you have the transcription of what's actually happening in real time, you can do a lot of different things.

Corey: Yeah.

Kris: One, you can analyze the content of the transcription to analyze sentiment. Is there a negative sentiment in the conversation?

Corey: That's right.

Kris: Right? And so, you can actually then, you can do this across thousands of calls

Corey: Yeah.

Kris: Because it's all automated.

Corey: Yep.

Kris: And so you surface these conversations where if there's a stressed customer or there's an agent that needs help, pop that, you know, pop that up to your supervisors

Corey: Yeah.

Kris: so they can jump in and help.

Kris: Cause today, you do that very manually.

Corey: Yeah.

Kris: And another thing is you can extract the intent of what the customer is saying. Push information to the agent. Here's a classic example, imagine that you're in an Uber or a Lyft or driving down the street and you saw a rent sign or a For Sale sign, you call the number and say, "Hey, I was driving down Bush Street at Bush and Powell and I saw there's a house for sale, I'd like to know more about it."

Corey: Oh my God!

Kris: Right? So, as you're speaking, imagine the agent that is answering the phone gets the results of all the open houses on Bush Street

Corey: On Bush Street

Kris: and Powell

Corey: That is amazing!

Kris: So, it basically reduces the amount of time it takes to help customers, and it just creates a better experience.

Corey: Oh my goodness, this is amazing.

Kris: It is, I'm happy to tell you, it's been exciting just working with this product from the very beginning and launching this product.

Corey: Yeah.

Kris: Just because everything we've done so far has been APIs that give you functionality.

Corey: That's right.

Kris: But with Media Streams we are giving you raw audio.

Corey: Absolutely.

Kris: So you can just build whatever the use case may be. You can build voice authentication, so you can actually authenticate over the sound of your voice

Corey: That's right.

Kris: ends up capturing critical information

Corey: That's exactly right.

Kris: There's a lot more possibilities. For example, Jonathan is building something to analyze background noise.

Corey: Wow!

Kris: So that he can basically detect if there's an agent that has a very noisy background environment.

Corey: Wow!

Kris: He can remove that automatically from routing so that agent doesn't get calls

Corey: Wow!

Kris: and that situation is addressed, which basically streamlines your customer experience.

Corey: That is amazing! Oh my God, can we see a demo?

Kris: Yes!

Corey: Okay, so we're going to bring up Kris's computer. I want to give a quick shout out to Layla Codes It, who came in, and it was the hand of God who fixed our panel here. You all were calling it out, thank you Chat Room. Also, really quickly before we hop to the demo, I want to say a quick thank you to a number of followers who just followed the channel: we've got a Straw-hat Raider, we've got An-bruta, we've got Lord of 96, as well as Money Montano. Thank you so much for hitting the follow button. If you like what you're seeing, if you like what you're hearing, if you want to follow the SIGNAL conference experience for the next day and a half, hit that follow button. You'll know when we go live, you'll see our schedule of content and all the amazing live demos and interviews to come. Okay, let's get back to it.

Kris: All right,

Corey: So, here's Kris's computer.

Kris: Yeah, so, I have a couple of demos that, as Corey was talking, I was like, "Oh, what demos should I show?"

Corey: Yeah.

Kris: So, I'm going to show two demos. One, I'm going to just dive right into something I've already built, so it just saves us a little bit of time, so I can do a second demo. What I have here is a simple Studio Flow. So, if you're not familiar with Twilio Studio, it's our visual builder that our customers can use to drag and drop and build a voice application. In case you're wondering, I can write code, but I decided to use Studio today just because it's simple and easy to visualize.

Corey: Well it's super simple

Kris: I was going to

Corey: to follow this.

Kris: Right. Exactly, so here I have a simple Flow. It says, when you call, say "please hold," and what I'm doing is, I'm basically taking the phone call and forking it three times.

Corey: Whoa.

Kris: So, three independent forks, right?

Corey: Really?

Kris: Yes, so one fork is going to go over ngrok, our favorite tool.

Corey: Okay, we love ngrok.

Kris: And come to my laptop, so you can actually see the stream going through. Another one is going to go directly to Google Cloud so that, from Twilio, it's gonna go to Google Cloud, it's going to transcribe.

Corey: That's right.

Kris: It's going to show the transcription.

Corey: That's exactly right.

Kris: And the third one is going to AWS where I'm basically putting the stream into SQS so that later on I can create multiple consumers to analyze audio in real time

Corey: This is amazing.

Kris: Yeah.

Corey: And we're doing all of this in Studio?

Kris: We're doing all this in Studio, and you can see there's a state that just protects your speech. Here's stream one if you can see, I'm zooming in a little bit here.

Corey: Well zoom just a little bit.

Kris: Yeah, there you go. And that's basically sending it to Google. This one's going to ngrok and this one's going to AWS, and I'm parking the call in a queue because if the call ends the stream disconnects

Corey: Oh, cool.

Kris: So for now, I'm parking the call in a queue.
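For readers who prefer code to the visual builder, the flow Kris describes (say "please hold," fork the call three times, then park it in a queue) can be sketched in TwiML. This is only an approximation of his Studio Flow, not a copy of it: the three WebSocket URLs and the queue name below are hypothetical placeholders, and the sketch assumes the `<Start><Stream>` and `<Enqueue>` TwiML verbs. It uses only the Python standard library to build the XML.

```python
import xml.etree.ElementTree as ET

# Hypothetical WebSocket endpoints, one per fork described in the demo.
FORK_URLS = [
    "wss://example.ngrok.io/stream",        # laptop, tunnelled through ngrok
    "wss://transcribe.example.com/stream",  # transcription service on Google Cloud
    "wss://queue.example.com/stream",       # service on AWS that writes to SQS
]


def build_twiml(urls, queue_name="parked-calls"):
    """Return TwiML that greets the caller, forks the audio to each URL, and parks the call."""
    response = ET.Element("Response")
    ET.SubElement(response, "Say").text = "Please hold"
    for url in urls:
        # Each <Start><Stream> pair is one independent fork of the call audio.
        start = ET.SubElement(response, "Start")
        ET.SubElement(start, "Stream", url=url)
    # Keep the call alive: the streams disconnect when the call ends.
    ET.SubElement(response, "Enqueue").text = queue_name
    return ET.tostring(response, encoding="unicode")


twiml = build_twiml(FORK_URLS)
print(twiml)
```

Serving this document from a webhook would be one way to get the same behaviour without Studio; the queue at the end plays the role of the "parking" step.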

Corey: Got it.

Kris: And I'm just going to publish this

Corey: And we're publishing this. What happens when we publish this flow?

Kris: So, what happens now is, so far everything I've been doing is a draft, so when I publish it, whatever I've been working on becomes live. So that when I make a phone call

Corey: Got it.

Kris: Then the new Flow takes effect.

Corey: Got it.

Kris: It just allows you to make changes and save it and publish it later

Corey: Right.

Kris: Without having to break your application.

Corey: That's cool.

Kris: That's really the key here and I already had wired a phone number here so, I'm going to quickly make a phone call.

Corey: We're going to call that number?

Kris: We're going to call that number.

Corey: Okay.

Kris: I'm going to, would you like to call?

Corey: Oh, I would love to call.

Kris: Do it. So I have ngrok running here and I have a simple, let's see where I am right now, okay.

Corey: I love that we're in the terminal. Do you mind bumping up the zoom a little bit?

Kris: Not at all, so by the way, the code I'm showing you right here is something that's available for you all to use. We have a public GitHub repo where it records samples.

Corey: Nice.

Kris: So to get started with media streams, once you have the repo, it takes you like 10 minutes.

Corey: Oh my goodness!

Kris: Or less to start transcribing your call.

Corey: Ten minutes or less, now you heard it here. Listen, if it's not 10 or less, don't blame me, blame Kris.

Kris: That's right. That's right. So, I have a simple FH here, no bells and whistles and sorry, let me just point you the number right here.

Corey: Yeah, so we'll call this number, 505-539-[redacted for text]. Now we've just broadcast this to the internet.

Kris: I know

Corey: Let's see how this goes.

Kris: I'm a product manager, not an engineer, so I don't write scalable applications. Thankfully, we have engineers for that but as you can see this is an audio stream that's coming through

Corey: Oh my God, do you see this scroll?

Kris: my laptop.

Corey: This looks like the Matrix.

Kris: Go ahead and say something Corey, well there you go, your transcription

Corey: Oh, look at that, it's trying to figure out what I'm saying and it's doing it live and I am not even doing anything except calling into the number.

Kris: That's right.

Corey: This is crazy. Oh my God, I'm shocked, we've just seen the Matrix. We're now doing a live call. Let me hang up before we continue.

Kris: And here's the SQS, like I said. You have also the messages that are actually in SQS. So you can process them at a later time with multiple consumers.

Corey: Oh my goodness.

Kris: I've got to say, I get excited every time I do a demo of this API because I am super excited. My weekends are now actually consumed on what can I do, what kind of demos I can build-

Corey: Yes, with the media streams.

Kris: Yeah, actually, I should've brought it today. I built a Raspberry Pi car.

Corey: Wait, wait.

Kris: It's called Pi Car.

Corey: You built a Raspberry Pi car?

Kris: Yeah, so we can actually stream audio to it and then have it move forward, backward, and all that stuff. I just really, I didn't bring it today, but I just couldn't. So I have to do something.

Corey: Okay, well, I got to ask two questions. Before I do, I gotta say, you said you got excited, the chat room said, "Corey just got shook." I sure did because I saw the Matrix, and I never thought I would see the Matrix in real life in like a way that actually worked. But I gotta ask the question because folks are asking, what is it that we see here in the console? Is it the same thing we see in SQS?

Kris: Yes, yes, absolutely right, so what I'm showing you here is, just to give you an example, that when we fork the media and stream, we're kind of sending it to you in a developer-friendly fashion, so it's a JSON payload that comes to you every 20 milliseconds with the basic data, so it has enough for you to know what the call is. There's a unique SID as well as the audio packet. By the way-

Corey: Everything has its own unique SID.

Kris: Yes, the stream

Corey: Oh, the stream has its own unique SID.

Kris: And you have a chunk number that you can use if you need the messages in a specific order.

Corey: For a specific chunk, oh my goodness.
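As a rough illustration of the payload Kris is describing, here is a minimal Python sketch that parses media messages and puts them back in chunk order. The field names (`event`, `streamSid`, `media.chunk`, `media.payload`) are modeled on Twilio's published Media Streams message format but should be treated as assumptions here, and the sample SID and payloads are made up for the example.

```python
import json


def parse_media(raw):
    """Pull out the fields needed to order and decode one media message."""
    msg = json.loads(raw)
    if msg.get("event") != "media":
        return None  # skip non-audio messages such as "start" or "stop"
    media = msg["media"]
    return {
        "stream_sid": msg["streamSid"],
        "chunk": int(media["chunk"]),  # assumed to arrive as a numeric string
        "payload": media["payload"],   # base64-encoded audio
    }


def in_order(raw_messages):
    """Return the media messages sorted by chunk number."""
    parsed = [m for m in map(parse_media, raw_messages) if m is not None]
    return sorted(parsed, key=lambda m: m["chunk"])


def make_sample(chunk):
    # Hypothetical message for illustration only.
    return json.dumps({
        "event": "media",
        "streamSid": "MZ-hypothetical",
        "media": {"chunk": str(chunk), "payload": "//8="},
    })


# Messages arriving out of order, plus one non-media event.
received = [make_sample(3), json.dumps({"event": "start"}), make_sample(1), make_sample(2)]
ordered = in_order(received)
print([m["chunk"] for m in ordered])
```

A consumer reading from a queue that does not guarantee ordering could buffer a short window of messages and sort them this way before decoding the audio.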

Kris: Right, so if you have a distributed system, you can put it in. For example, SQS messages don't always arrive in order, so you can always use that to order items. One thing really cool is when you are silent you just basically see something like this, so you can actually see analysis of this. So this is when you're speaking, right? So, this is actually audio data when you're speaking

Corey: And then there goes silence.

Kris: And then silence, and this is all silence.
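The difference between speech and silence that Kris points out on screen can also be checked in code. The sketch below assumes the audio arrives as base64-encoded 8-bit mu-law (G.711) samples at 8 kHz, so a 20 millisecond message carries 160 bytes; that encoding is an assumption not stated in the interview, and the amplitude threshold is an arbitrary illustrative value. The decoder is the standard G.711 mu-law expansion, written out by hand so that no audio library is needed.

```python
import base64


def ulaw_to_linear(byte):
    """Expand one 8-bit mu-law sample to a signed 16-bit linear sample."""
    byte = ~byte & 0xFF
    sign = byte & 0x80
    exponent = (byte >> 4) & 0x07
    mantissa = byte & 0x0F
    sample = (((mantissa << 3) + 0x84) << exponent) - 0x84
    return -sample if sign else sample


def is_silent(payload_b64, threshold=200):
    """Treat a chunk as silent if no sample exceeds the (arbitrary) threshold."""
    samples = [ulaw_to_linear(b) for b in base64.b64decode(payload_b64)]
    return max((abs(s) for s in samples), default=0) < threshold


# 160 bytes is 20 ms of audio at 8000 samples per second.
silence = base64.b64encode(b"\xff" * 160).decode()          # 0xFF decodes to zero amplitude
loud = base64.b64encode(bytes([0x00, 0x80] * 80)).decode()  # full-scale negative/positive samples

print(is_silent(silence), is_silent(loud))
```

A peak-amplitude check like this is the simplest possible signal analysis; the background-noise detection mentioned earlier would need something more robust, such as averaging energy over many chunks.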

Corey: Oh this is amazing.

Kris: Right, so you can do signal analysis, you can do lots of different things. We have customers who have said

Corey: Yeah.

Kris: they're looking for background analysis. So, one thing I was going to show was, this morning I actually disabled the sentiment analysis aspect of this.

Corey: Okay.

Kris: I don't know if I get to show you that, but what I was going to show you was

Corey: I love that we're doing it live friends. This is how we do the things, the most dangerous demos, we do them live.

Kris: So I just basically disabled it, I was going to show for example, if I find the right thing. Yeah, it's right here, I basically said Do that and then I save and then all I have to do is...

Kris: Now we're deploying this to G-Cloud.

Corey: Yeah.

Kris: I should've done this at the very beginning 'cause it does take a little while to spin up the new stuff but essentially you get the idea. So, now the whole thing is running, that is auto-scalable and this is a page that basically comes up from that. So maybe, you know, for fun, if you have a few minutes here, I can just show you on my laptop, Corey.

Corey: We sure do. Do you want me to call back in?

Kris: In a second, actually.

Corey: Well, while you're doing that, there was a good question that came into the chat room, and I thought it's a good question for us to answer, which is, so we did this off of a phone call, will this also work through like audio on say a WhatsApp call?

Kris: So, it will work on any call that is active within Twilio,

Corey: Oh!

Kris: So if you're building with Twilio, as long as you have a call SID, and the call is active, you can fork the media.

Corey: Okay.

Kris: So, that includes WebRTC, SIP calls, PSTN calls.

Corey: That is amazing!

Kris: And very soon you're going to also be able to do that with video calls. So, you can actually fork the video stream as well.

Corey: That is amazing, thank you for that question C Sharp Fritz and welcome to the stream, super glad to see you here. I love that we've made this accessible, easy, kind of to consume and use.

Kris: Absolutely, so let me see if this works. We're doing it live so...

Corey: Do it live!

Kris: Let's see, all right, so go ahead and call back in and perhaps Corey, you can show some emotions about how you're feeling about this, and so we can actually, let's make sure that actually is working.

Corey: Absolutely, absolutely, I am so excited to be here on the screen with Kris. I'm really excited. I'm hoping that we get a happy face. We haven't gotten a happy face yet.

Kris: You've got to pause for a second here.

Corey: But we've paused for a second,

Kris: There you go.

Corey: And there goes our happy face, now Ellie Face has asked us to get mad. So, I'm going to say that this is disappointing. This is hurting my feelings and now we've done this in real time. Oh my God!

Kris: Are you also very angry?

Corey: I am very angry! Oh, look at that face! Look at the shock here. So when Corey tries to feign anger, this is what this looks like. Thank you, Kris, I love how this goes. This is amazing.

Kris: Yeah, so what's happening is, I was only doing sentiment analysis on a completed stream, so, if you don't pause then it's going to continue to transcribe

Corey: As one stream.

Kris: As one stream, and then once it's done, we send it over, but I think you get the idea of it. The key here is that you can take the text, and do a lot of different things with it.

Corey: That's exactly right.

Kris: As you can see there's a profanity filter on.

Corey: That's good, yes, so it catches the things.

Kris: Absolutely.

Corey: Okay, so I gotta ask, folks are asking, you know, as we start to think about the next steps here, folks are asking, how can they get their hands on this today?

Kris: Yeah, it's easy, so two things. One is, we have API docs publicly available and second, our wonderful DevAngels team has put together a quick tutorial.

Corey: Shout out to the DevAngelists on the team.

Kris: That's right, Craig Dennis, big shout out, he's amazing. He helped me put together a tutorial. So, once you start, from the beginning to the end of the tutorial, five minutes in, you're out. You're off and running.

Ricky Robinett, Leader of Twilio's Developer Network enters the booth

Corey: Wait we have another guest.

Kris: Oh no, Ricky. How are you doing? Hey!

Corey: Oh my goodness.

Ricky: Hey!

Corey: It is Ricky Robinett, friends. Now if you caught the keynote earlier, Ricky was our fun friend who decided to kick off his shoes and throw down his luggage

Kris: I loved that!

Corey: At the end of the keynote, it was a ton of fun. Ricky, we're glad to have you on SIGNAL TV.

Ricky: Yeah, I don't want to interrupt, but I did. Are y'all talking media streams?

Kris: We were.

Ricky: I mean

Corey: We were talking media streams.

Ricky: How amazing are media streams?

Corey: We just had a ton of fun with sentiment analysis on media streams. Talk about the amazing things that you can do

Kris: That's how Corey looks when he's angry.

Ricky: Yeah, yeah, I've seen that. The Hurricane gets going and

Corey: The Hurricane has feigned anger and there, that's what you get.

Kris: I was also showing how we can easily deploy into Google Cloud with one step, but it's taking a while because it needs to spin up an instance, but I think, yeah. I'm excited, as you can imagine, with media streams. It's an amazing product.

Corey: Yes, I am excited too. I'm super glad that you had the chance to stop by. I've got to ask Ricky, since he's here, Ricky, the chat room wants to know, did you get your shoes back?

Ricky: I did, I did, I was really nervous, but thank you, thank you for caring.

Corey: They were concerned, I see the question there. Revertibles, Ricky did get his shoes back, I just wanted to make sure we all knew.

Ricky: Yes, thank y'all.

Corey: Hey, thanks for stopping by.

Kris: Thanks for coming, Ricky.

Ricky: Congrats on the launch.

Kris: Thank you.

Closing Time

Corey: And we're actually about to move over to our next guest but before we do that, we're going to say a big thank you to Kris.

Kris: Thank you Corey. It's amazing.

Corey: Hey listen, you are super popular. Ricky stopped by, Jonathan stopped by. I wish our next guest would be so popular, let's see friends.

Kris: Thanks a lot, thanks for everyone joining.

Corey: Listen, we've got some more content coming up. We have, we called a hidden session, our next guest is going to be another phenomenal Twilio employee, a person by the name of Ashley Roach.

Kris: Oh, yeah.

Corey: Who's going to spend some time going over one of the products that we saw earlier today, that we all got really excited about, that is the Twilio CLI. So join us back here in about 12 minutes, we're going to get started, 2:30 pm, Pacific Time. And until then, it's your boy, Hurricane Weathers. I'll see you soon.

Corey's talk with Kris

Hope you enjoyed the transcript of Kris and Corey's stream session! You can find more information about Media Streams here. Be sure to check out our Twitch and YouTube channel for more great content.
