Fill out the form below to claim your gift. Scroll down to catch the Virtru-al session below!
In the latest episode of "Hash It Out," Trevor Foskett, Sr. Director of Solutions Engineering at Virtru, and Jason Trip, Director of Solutions Engineering at Nightfall, delve into data security trends and the increasing importance of Data Loss Prevention (DLP). Together, they cover unintentional and intentional insider threats, the rise of compliance regulations, and challenges posed by new applications, specifically gen AI tools like ChatGPT. Despite their usefulness, these tools may present data security risks. Tune in to explore how data security and privacy are maintained amidst evolving threats. Of course, coffee's on Virtru! Fill out the form above to get your gift certificate for a free cup of coffee!
[Trevor Foskett] Hey everyone. This is Trevor Foskett. I'm the Senior Director of Solutions Engineering here at Virtru and this is the latest episode of Hash It Out, which is a series of informal conversations where we just talk about trends that we're seeing in the industry. Today I'm joined by Jason Trip from Nightfall. Jason, you want to say hello?
[Jason Trip] Yeah, hi. Nice to see you Trevor.
[Trevor Foskett] You too. So, Jason is my counterpart. He leads the Engineering team over at Nightfall so we work with a lot of our joint customers who are using one or both of our solutions to identify and protect data within their tech stack. So Jason, these are obviously meant to be informal, so I guess we'll just kick it off with, you know, what are you hearing, any new kind of trends that are coming out from your customer discussions? What's the latest?
[Jason Trip] Yeah, sure. A few different things. Of course, gen AI; I think we can talk about that a little later too. But on the Nightfall side, we have DLP in our name: data loss prevention, or data leak prevention. I've really seen a lot of people coming to us for two different forms of that. I think of data loss prevention as more like preventing intellectual property from being lost or taken somewhere else, that exfiltration use case, and to be honest, we have some tools for detecting that kind of information, but that's not the primary strength of Nightfall. We're more about data leak prevention: employees taking sensitive data and putting it in all these other applications. That's one trend I've seen quite a bit lately, that intellectual property protection.
[Trevor Foskett] And are you hearing about that more in the capacity of protecting the business from mistakes? Or, you mentioned exfiltration, are companies looking for ways to prevent their own employees from intentionally removing items? Or is it more the casual, unknowing insider threat making mistakes? Maybe a little bit of both.
[Jason Trip] Yeah, a little bit of both. Honestly, when people come to us to prevent that sort of insider threat, the intentional misuse of data, hey, I'm about to leave this company and go to a competitor, so I'm going to download a whole bunch of documents, those are all things we partially work with, but that's not our forte. That's the intentional sharing, and it can happen once in a while, right? Say an employee is going to leave in a few weeks; they might download a bunch of files that one week. On the other side, we hear a lot about insider risk, that accidental, unintended, or unknowing sharing and putting of sensitive data in cloud applications, and that's our sweet spot: Nightfall helping employers educate users and stop that.
[Trevor Foskett] Yeah, we hear about that one a lot too, especially these days with all the compliance regulations that a lot of our joint customers have to deal with. That mistake, that unknowing user putting something into the wrong place, can be a HIPAA violation, and CMMC is breathing down a lot of our customers' necks these days. But you mentioned another interesting one around that concept of intellectual property. A lot of times we think about the compliance-driven use case, but whether you are regulated or not, there's a case for protecting sensitive internal information that's critical to your business, even if it's not regulated. Being able to make money off of your intellectual property requires your ability to keep it confidential. We see this all the time with TV scripts that get leaked, and all of a sudden that has an impact on your business. I think it probably deserves as much attention as the compliance use case, because both can be pretty risky for the business if you're not taking the right precautions. So it's an interesting use case that I think we can solve for a bit at Virtru, with the ability to protect that data and track it through its lifecycle. But as you mentioned, where people are putting this data is always going to be an evolving problem, with the number of applications coming up every day, and how do you continue to keep a hand on it? I don't think I have the full answer, but it's interesting to me. And speaking of new applications, when we were chatting earlier you mentioned one that's probably on everyone's radar, where people may be putting data they shouldn't be. You want to talk a little bit about ChatGPT while we're here?
[Jason Trip] Yep, ChatGPT. All the gen AIs that you can just get to in your browser are probably the most effective, important, powerful cloud applications that users have at their disposal, and it's super easy to just take some PHI, throw it into ChatGPT, and then tell it to write a patient summary or something like that.
[Jason Trip] We know it's happening in other applications, in Slack or other chat tools used to send this information, or when you store it in Google Drive or hold it in your projects in Jira. But now every security team I've talked to is saying, "Hey, people are just going to paste this stuff into ChatGPT all the time." It's just the next most powerful SaaS application out there.
[Trevor Foskett] Yeah, and it's funny. At least for me, and I've seen other people do this as well, you kind of forget with ChatGPT specifically that it is basically no different from any other SaaS application. You have the feeling that you're just having a conversation; I'm one of the people who is guilty of saying thanks when I get something from it. You forget that you're sending data to someone else's server, where they now have your history, so they're keeping that stuff. At first, when it came out, it would all kind of go away when you closed the tab, and now it's all in there. Yeah, definitely. It feels like you're just having a chat with your friend, but you are not.
[Jason Trip] And it's a third-party hosted service. The data is transmitted across the public internet. It's arguably a violation just when you put it out there, as soon as it gets across to the cloud somewhere else. And who knows what it can lead to if it's sitting out on that server; they've got that history, like you said, and can access it. OpenAI has had a couple of security incidents recently, and like any application, they're going to be targeted. Anything sitting out there is just getting sprayed more and more.
[Trevor Foskett] So has anybody taken steps to mitigate this yet? Is this an emerging issue that people don't really know what to do about yet? What are you seeing from the folks that you talk to? What's the answer? Or should I just ask ChatGPT what the answer is and it'll tell me?
[Jason Trip] It will probably know. But yeah, applications are building generative AI into their own products as well, so there's even more cause for that. Nightfall is working on scanning the sensitive data before it goes into those applications. Traditionally we've plugged into cloud applications and scanned the data after the fact, once it's already in a sanctioned application like Slack; we'll scan it just afterwards and delete it out. This is a case for preemptively scanning it right before it goes in, then informing the end user, trying to stop it, preventing it. So we're working on a Chrome browser extension as kind of the first iteration of this, but there are so many other places this data could be sprayed around, of course. So yeah, blocking a little bit, and then educating users along the way so they understand: hey, you're sending something sensitive into this new application.
[Trevor Foskett] There are a couple of things you mentioned there that I think are pretty cool. One is doing this more on the client side for you guys, which I know historically has kind of operated on the server side to pick up those things. I think that's an interesting progression, and I'm excited to see where it goes, and whether there are any more integration opportunities between the two of us because of it. We tend to do things at the same level, on the client side: we encrypt data before it goes into your Google Workspace or before it goes into your Office 365, whatever it is. Depending on what you're solving for, as you're kind of alluding to, catching it before it enters into that risky sort of operation can be really important, so that's very cool. I look forward to trying that out and seeing all the bad behavior in my ChatGPT usage that it's going to be able to flag for me.
[Jason Trip] It's similar to the email integration we have: when somebody sends an email, you can use a little bit of time to scan it using machine learning, right? We get to use a lot of computing resources on our end to scan through, to look at this unstructured data and open up these files. Well, if you're sending something into gen AI tools, you've got a little bit of time that you can spend scanning it. But as you know, doing too much on the client side can be really limiting, because you can only do simple regular expressions and still be performant enough.
[Jason Trip] Otherwise you interrupt end users, cause cascading effects on user productivity, and have to scan with really simplistic tools. So in this case we're going to be client-side, but it's still a hosted model of scanning, so we can use our ML to detect those.
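The hybrid approach Jason describes can be sketched roughly like this. This is a hypothetical illustration, not Nightfall's actual implementation: a cheap client-side regex pass flags candidate matches, and only those candidates would then be handed off to a hosted ML detector for higher-precision classification.

```python
import re

# Illustrative client-side patterns only; a real DLP product uses far
# richer detectors than these two.
CANDIDATE_PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "credit_card": re.compile(r"\b(?:\d[ -]?){13,16}\b"),
}

def client_side_prefilter(text: str) -> list[tuple[str, str]]:
    """Return (detector_name, matched_text) candidates that are worth
    a round trip to the hosted ML model; everything else stays local."""
    candidates = []
    for name, pattern in CANDIDATE_PATTERNS.items():
        for match in pattern.finditer(text):
            candidates.append((name, match.group()))
    return candidates

prompt = "Patient John Doe, SSN 219-09-9999, presented with chest pain."
print(client_side_prefilter(prompt))  # → [('ssn', '219-09-9999')]
```

Keeping the regex pass this simple is what makes it performant enough to run inline without interrupting the user; the heavier contextual judgment happens server-side.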
[Trevor Foskett] Very cool. Yeah, and you guys are using ML to defend against the threats that come with ML or LLMs, really cool. You mentioned productivity, so let's do the flip side of GPT. What are the positives? I know I found it to be helpful for things like productivity of, give me 10 ideas to start with. Have you guys kind of explored using it for anything like that or are you still in the research phase? How does that look?
[Jason Trip] Yeah, yeah. I try to find all the different ways to use it. I had it write a limerick about PHI for Nightfall, about our platform; it somehow knows of us. So yeah, we use it on our side. We've found that ML is the best way to do named entity recognition, the actual finding of sensitive data. We've tested out LLMs to do some identifying; LLMs are particularly good with context, being able to know the intent of the words. It's extremely powerful there. We've also used it for generating sample data; our product team uses it for some of the training of the engine there, too.
[Trevor Foskett] Sample data, that's a really good one. When we demo all of our secure email and file solutions, I always struggle with what makes a fake yet potentially sensitive email, and usually it's some patient history, or account information, or technical specifications, but I always spend what seems like way too long trying to come up with them, so I'm going to steal that from you. I think that's what it's best for, for me: give me something that's not necessarily production-ready, but I need five potential titles for this blog post, or give me a subject line for my out-of-office email. That kind of thing, where you iterate through non-critical tasks and move through them quickly, has been helpful, for me at least. Don't tell the people that I email where some of their subject lines may be coming from. We've also found it really helpful in another way: you mentioned that you asked it about yourselves and it had some information about Nightfall to use in the limerick. One of the guys on my team, Juan, came up with a really good idea where he uses it for competitive analysis. He'll say, give me six bullet points about some other competitive solution, and having been trained on likely a lot of their public-facing docs, it can do that and save you a lot of time in that research phase as well. We've found that you want to go and double-check the results you get; sometimes you read something and think, that's not entirely true. But again, it's a starting point, and I find it really helpful in that regard.
[Jason Trip] A really good starting point, and as you say, it can hallucinate about sample data; it doesn't know that a Social Security number can't start with nine. You can get some fake data in there, and we sometimes see that when prospects test our detection engine with made-up data. But hey, it's actually a feature that we don't pick those up, you know, these items that have to pass a check, or can't start with certain digits, things like that.
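The structural checks Jason alludes to can be sketched as follows. This is a minimal illustration (not Nightfall's detection engine) of the publicly documented SSN rules: the SSA never issues area 000, 666, or 900-999 (the 9xx range is reserved for ITINs), group 00, or serial 0000, so hallucinated sample data often fails these checks.

```python
def is_plausible_ssn(ssn: str) -> bool:
    """Return True only if the string passes the structural rules
    for an issuable SSN in AAA-GG-SSSS form."""
    parts = ssn.split("-")
    if [len(p) for p in parts] != [3, 2, 4]:
        return False
    if not all(p.isdigit() for p in parts):
        return False
    area, group, serial = (int(p) for p in parts)
    if area == 0 or area == 666 or area >= 900:  # 9xx is ITIN territory
        return False
    if group == 0 or serial == 0:  # group 00 and serial 0000 never issued
        return False
    return True

print(is_plausible_ssn("219-09-9999"))  # True: structurally valid
print(is_plausible_ssn("987-65-4321"))  # False: starts with 9
print(is_plausible_ssn("219-00-9999"))  # False: group 00 never issued
```

A detection engine layering checks like these on top of pattern matching is one way to discard made-up test data instead of flagging it, which is the behavior Jason calls a feature.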
[Trevor Foskett] Yeah, that is interesting. So you guys use ML to return a confidence score, right? So detecting an SSN would look at not just the format, but also these confidences and tolerances for what that format can actually be.
[Jason Trip] Yeah. Yeah, we return a confidence level, a likelihood of occurrence; we think of it as the level of recall versus the level of precision, so you can tune how sensitive you want to be. A very-likely confidence threshold means that your recall will be a little bit lower, but your precision will be higher, meaning you're going to weed out a lot of noise that would otherwise show up. Basically, DLP is really hard; it can inundate you and make the solution basically impractical to implement. So you've got those levels, you can see what the detection is like in your production environment, and you can tune those levels according to the likelihood.
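The threshold tuning Jason describes can be sketched with a few lines of code. The findings and confidence values below are hypothetical, not Nightfall's API: raising the threshold trades recall for precision by weeding out low-confidence noise.

```python
from dataclasses import dataclass

@dataclass
class Finding:
    detector: str
    snippet: str
    confidence: float  # model's likelihood that this is truly sensitive

# Hypothetical scan results from a production environment.
FINDINGS = [
    Finding("ssn", "219-09-9999", 0.97),
    Finding("phone", "555-867-5309", 0.55),    # could be a signature line
    Finding("credit_card", "4111 1111", 0.20),  # partial match, likely noise
]

def filter_findings(findings, threshold):
    """Keep only findings at or above the confidence threshold."""
    return [f for f in findings if f.confidence >= threshold]

# A "very likely" threshold: high precision, lower recall, less noise.
print(len(filter_findings(FINDINGS, 0.9)))  # → 1
# A permissive threshold: higher recall, more alerts to triage.
print(len(filter_findings(FINDINGS, 0.5)))  # → 2
```

Starting permissive, observing what fires in production, and then raising the threshold is one practical way to keep DLP alerts from inundating a security team.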
[Trevor Foskett] Yeah, super powerful stuff. That contextual awareness is always really impressive, because a phone number can look a lot like something else, and obviously everyone has a phone number in their email signature. We don't want to catch all of those, but we do want to catch something specific. Really cool stuff. Definitely excited to see how you guys continue to evolve with some of the new breakthroughs that are coming out and how you adapt to that.
[Trevor Foskett] One final thing, since we've talked so much about technical capabilities: you mentioned earlier the user education piece as well, people understanding that as they engage with this, they're not engaging with a friendly robot who lives on their desktop; they're engaging with someone else's server. One thing I've seen is that organizations are now starting to put formal policies in place about how it can be used, whether from a data protection standpoint or, I think, a brand reputation standpoint as well. You don't want all of your marketing materials to suddenly be clearly written by an AI. Have you seen similar? Or are we sort of unique in that?
[Jason Trip] So, okay, they're writing their marketing material with Gen AI, you're saying?
[Trevor Foskett] Well, writing policies that govern how much you can rely on gen AI to do your work, because it may affect your brand reputation if all of a sudden all of your marketing materials are written by gen AI. Which is an interesting thought, but I have read a few of these AI-generated pieces, and you can kind of tell after you've seen a few; you get a vibe for how they read. An interesting development, I thought.
[Jason Trip] Yeah. That is interesting. In that intellectual property realm, that's not so much where a lot of our conversations go; we're about finding the sensitive data that gets sprayed within there. But that's interesting too, making yourself look like a person. We'd need to be certifying it, or, you know, we'd need to know it's not just coming from the machine.
[Trevor Foskett] Yeah. That's a problem for someone else to solve; I think we're not going to crack it here today. Jason, really appreciate the discussion. I think we're getting up on time, but I really appreciate you sharing your thoughts with me and responding to my wild scenarios and AI questions. Of course, you're better versed in it than I am, given what you all do at Nightfall. To wrap up, this has been the latest episode of the Virtru Hash It Out series. If you feel like we've been rambling this whole time, we kind of have been; it's meant to be an informal conversation where we just talk about what we're seeing in our day-to-day and trends in the industry. So if you've made it this far, thanks for sticking with us, and we'll see you next time. Jason, any parting thoughts?
[Jason Trip] Trevor, thanks again. I always learn a lot from you. It's been a pleasure.
Get expert insights on how to address your data protection challenges
Contact us to learn more about our partnership opportunities.