Hash it Out

Ep4 | Data-Centric Security: What is it?

March 09, 2023

Experts in Data-Centric Security, Virtru and Fluree talk all things Zero Trust, data interoperability and trends for data sharing in 2023.

Data-Centricity is an approach to your cybersecurity posture that prioritizes control and secure access to the data at a granular level first, before prioritizing the systems and networks that store and transmit it. This makes sense in theory, but what about practice?

We brought together experts in private and federal cybersecurity to illustrate varying perspectives on DCS using real-world examples. Together we’ll dive into its challenges and give insight into its benefits. Virtru’s Federal GM Shannon Vaughn and Fluree’s CEO Brian Platz discuss what their respective companies are doing to make data interoperable and accessible to the right people, in the right place, at the right time, and how it’s truly a game-changer.


[Shannon Vaughn] All right. Hey everybody. Welcome to Virtru's Hash It Out webinar. I'm Shannon Vaughn, the GM of Virtru's Federal practice. Today's guest on Hash It Out is Brian Platz, the co-founder and CEO of Fluree, a linked data platform based in Winston-Salem, North Carolina. Brian, thanks for hopping on here today.

[Brian Platz] Yeah, thanks for having me Shannon. Pleasure to be here.

[Shannon Vaughn] Yeah. Now, it's pretty cool actually. So a little background on how we got here. We were down in San Antonio, Texas, this past December at the DoDIIS Worldwide Conference. That's the Defense Intelligence Agency's big annual conference. I'm walking around checking out the floor, and there are all these vendors doing all the cool defense and intelligence work. You've got your Humvees, you've got your big demo screens, you've got your satellites, and then you've got a couple of companies just talking about nerdy data stuff. I saw your guys' booth and I just had to walk over. It was probably the only other company pitching what we were pitching, which is data-centric security, and it had a cool vibe and everything else. The only other time I could remember hearing that was a couple of months earlier, when we were in Mons, Belgium, at a NATO conference, I think last October. Probably 60% of the companies in Europe are talking about data-centric security, and in the United States we're talking about Zero Trust, right? That's kind of our equivalent buzzword. But here somebody else was talking about data-centric security, so I just loved it, because I think this is one area where the Europeans might actually be out ahead of the Americans, thanks to things like GDPR. So it's really great to have you on here. Thanks to your team for putting up that great booth and making the introductions there. So, first, I want to ask you to give us a little background on Fluree and the impetus to create the company.

[Brian Platz] Yeah. So I've been in enterprise software my whole career, which is getting closer to 30 years than 20 at this point. So quite some time. One of the things we see a major shift around is what we sometimes refer to as the flipping of the world. For probably as long as I've been alive, everyone has gone out and bought apps, and those apps have data that's stuck behind them. Of course, that's part of our data silo problem: we just go and buy apps. The apps are being more commoditized now. The value they hold is decreasing, and where the value is actually coming from is the data that's locked up behind them. In this world that hopefully we'll see in my lifetime, you don't buy apps and manage apps. You manage strategic data, and the apps come to the data, as opposed to you sending your data to the app. So the data itself, I think, is ultimately what wins and where the value comes from. As you know, databases were designed to sit behind apps, and there aren't data platforms designed to be the strategic repository. That's what we built Fluree around, toward that vision.

[Shannon Vaughn] So, yeah, dive into that a little bit for me. I described you in the opening as a linked data platform, and I think that's a really cool way to think about it. Why would you say you guys describe yourselves as a linked data platform in that context?

[Brian Platz] Yeah. I mean, think about what we need to do. If you are trying to get value out of data today, at worst you're probably writing a bunch of custom scripts and doing this on some data analyst's laptop in a Jupyter notebook or something. At best, maybe you're putting it into a data warehouse or data lake, which really is copying and pasting data silos and putting them there. That's better, potentially, but it still doesn't get around this problem: we cannot scale data by copying and pasting it. We lose security when we do that. We lose recency; the data's stale as soon as you do it. We need to buy a lot of hard drives. Luckily that's getting cheaper.

[Shannon Vaughn] Yep.

[Brian Platz] But if we're going to store everything and duplicate it and duplicate it and duplicate it. So, with linked data, if we can leave data in place and have it sit where it sits, but have machines be able to decipher what that data means without humans, and have ways of connecting the data together virtually as though it were all together, ultimately that is the long-term solution. So we lean really heavily into the linked data standards that do exactly this. We have the web for humans: web pages that can link to other web pages. This is the same thing for data, but it's for machines. It's data that can link to other data that's sitting in other places. Ultimately we create a huge virtual web of data by sitting on these W3C standards that we embrace. So when we say linked data, we're really very specifically meaning the linked data standards that sit in the W3C that enable this sort of thing.
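
[Editor's sketch] The idea of data that links to other data sitting in other places can be illustrated with a few lines of Python. This is a minimal sketch under stated assumptions, not Fluree's API: the two dictionaries stand in for separate datasets that stay where they are, the URLs are made up, and the `{"@id": ...}` link shape is borrowed from the W3C JSON-LD convention for pointing at another node.

```python
# Hypothetical stores: each dict stands in for a dataset that is left in place.
HR_STORE = {
    "https://hr.example/people/alice": {
        "name": "Alice",
        "worksFor": {"@id": "https://org.example/units/logistics"},
    },
}
ORG_STORE = {
    "https://org.example/units/logistics": {
        "label": "Logistics",
        "partOf": {"@id": "https://org.example/units/operations"},
    },
    "https://org.example/units/operations": {"label": "Operations"},
}
STORES = [HR_STORE, ORG_STORE]


def dereference(iri):
    """Look an identifier up in whichever store holds it (nothing is copied)."""
    for store in STORES:
        if iri in store:
            return store[iri]
    raise KeyError(iri)


def follow(start_iri, *properties):
    """Walk a chain of properties, hopping across stores whenever a value is a link."""
    node = dereference(start_iri)
    for prop in properties:
        value = node[prop]
        # A value shaped like {"@id": ...} is a link to another node.
        if isinstance(value, dict) and "@id" in value:
            node = dereference(value["@id"])
        else:
            return value  # a plain literal ends the walk
    return node


# The query crosses from the HR dataset into the org dataset without merging them.
print(follow("https://hr.example/people/alice", "worksFor", "partOf", "label"))
# prints: Operations
```

The point of the sketch is that the machine, not a person, resolves each link, so the two datasets behave as one web even though neither was copied into the other.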

[Shannon Vaughn] So very much decentralized, blockchain, you know, those core tenets of Web 3.

[Brian Platz] Yeah, core tenets of Web 3. Blockchain, I think, has had bad press lately for sure, but blockchain comes in and adds a characteristic to that data only if it's important. Sometimes it's not important. What it can add is a verifiable provenance, or verifiable history, so that how things changed can be proven. Sometimes you might have data where you don't care about that, and there's no reason to go through that expense. These are all different features and capabilities that can sit around data. Sometimes you want them, sometimes you don't. It's always been our focus to grab the things that are important for your use case. If verifiability, traceability and cryptographic proofs are really critical for what you happen to be working with, blockchain has provided a pretty good pattern for how to do that. There are other ways too. If you don't need that, turn it off.

[Shannon Vaughn] Yeah, that's it. You remind me of one of our projects, with one of the large US combatant commands. This goes back a few years, and we're still working on the project. In the beginning, they called it network collapse, because talk about redundancy of data. When you work with coalition partners, what they were doing is they would stand up a new network. So the US and Country X is one network, the US and Country X and Country Y is another network, and they might have to make three copies of that data. And then you get into the things you're talking about, right? Whether you have old, stale data, whether you have wrong data, whether you have data that nobody should be accessing anymore. So you run into this problem set. We basically collapsed nearly two dozen networks down to a single network that everybody was federated into, this decentralized concept, and said, let's have a single gold copy that can sit in Country X and Country Y and the US, and we can all call to those things and know that that is the right copy of the data.

[Brian Platz] Yeah. You know, it sounds crazy, and it is crazy, but those are kind of the tools we have. That's why people do it. Part of it is we need better tools so we don't have to do that. I think the really impactful part of what you just said, too, is that each time you're making those copies, the security that surrounds that data typically sits in the application. Well, you didn't copy the application, you just copied the data, which means you lost the security. So the idea of data carrying policy and security with it in the data tier, I think, is a really important component to really enable this capability.

[Shannon Vaughn] I think that is exactly right. One of the other things our companies both really believe in is open source and open standards, right? One of those open standards for us is the Trusted Data Format. It's the basis of a lot of our tech. Our co-founder and CTO was lead author on it when he was at NSA, but it's open source, and it's maintained by ODNI today. Others, whether they're in DoD, the IC, or international companies, are using Trusted Data Format as a way to do that. What we've done is taken that and built in transformers, right? Translation tools, I say glue, to work with other standards. So you mentioned crypto binding, right? Metadata tagging and crypto binding. Those are two NATO standards, 4774 and 4778. We build that into our version of TDF to be able to say, as long as you subscribe to an open standard, we can understand that thing. So say the Brits tag data a certain way, but they subscribe to that NATO standard. The US can actually accept that, and nobody has to go back and re-tag data or make another copy or use somebody else's tagging tools, right? Little things like that can go a long way. But I'd love to hear why you guys are big proponents of open source and standards-based tech.
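
[Editor's sketch] Metadata tagging plus crypto binding can be pictured as a label and a payload covered by one keyed hash, so that neither can be swapped out without detection. The sketch below is only an illustration of that idea using Python's standard `hmac` module. It is not the TDF wire format and not the STANAG 4774/4778 encoding; the function names, the label fields and the shared key are all assumptions made for the example.

```python
import hashlib
import hmac
import json


def bind(payload: bytes, label: dict, key: bytes) -> dict:
    """Package a payload with its label and a keyed hash that covers both."""
    label_bytes = json.dumps(label, sort_keys=True).encode()
    payload_digest = hashlib.sha256(payload).digest()  # fixed 32 bytes
    tag = hmac.new(key, label_bytes + payload_digest, hashlib.sha256).hexdigest()
    return {"label": label, "payload": payload, "binding": tag}


def verify(obj: dict, key: bytes) -> bool:
    """Recompute the binding and compare it in constant time."""
    expected = bind(obj["payload"], obj["label"], key)["binding"]
    return hmac.compare_digest(expected, obj["binding"])


# Hypothetical key and label values, for illustration only.
KEY = b"demo-key-shared-by-the-coalition"
obj = bind(
    b"convoy route",
    {"classification": "SECRET", "releasableTo": ["GBR", "USA"]},
    KEY,
)

# Someone widens the releasability after the fact: the binding no longer matches.
tampered = dict(obj, label={"classification": "SECRET",
                            "releasableTo": ["GBR", "USA", "XXX"]})

print(verify(obj, KEY))       # prints: True
print(verify(tampered, KEY))  # prints: False
```

Because the label is bound to the data rather than enforced by an application, a copy of the object still carries its markings, and any party that follows the same convention can check them without re-tagging.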

[Brian Platz] Yeah. The ultimate way this works is if machines can automatically interoperate, or crawl the data web, if you will, to get the information they need. That whole premise relies on concepts of decentralization. I don't control all of the data web; I control just nodes in it. And if we don't have standards, then we're back in the situation we have now, which again is solvable, it's just not scalable. It's solvable by hiring a lot of data analysts to write a lot of custom scripts to try to translate format A to format B to format C. They could also use some better tools, because the other thing that is happening is that a lot of them are duplicating each other's work, just because they don't have great ways of managing that workflow. I think we're going to see more in that area as well that helps with it. But yeah, it's not scalable without the standards. The idea is that you can leverage the best data in the most accurate way, in a Zero Trust way. Zero Trust is always a little strange to me as a term, because it actually means it has the most trust, not zero trust, but I understand the term, and we use it a lot. In a Zero Trust way, that is where the scale happens and the insight happens, and you'd never get there without the standards.

[Shannon Vaughn] Yeah, I think that's exactly right. It's funny you mention Zero Trust, because I was crawling your website. I read your blogs and stuff, and one of the things you had on there was top trends for data sharing in '23, which I thought was kind of cool. I want to pick apart a couple of them, because one of them is around verifiable credentials, right? If we look at the Zero Trust framework, there is pretty good consensus on the pillars. We all agree there are at least five, at least in the US government, but some places say there are seven and some places say there are eight. Identity is that first pillar, the user pillar. Then there are endpoints, users, apps, and then data, which is what we believe in: data-centric security. But then you guys talk about verifiable credentials. Why do you call out that identity pillar so much as a trend in '23?

[Brian Platz] Yeah, so a credential really has three things. It's got a payload, the data. Sometimes I like to think about this in old-school terms: you typed out a letter on your typewriter. Then it's got the envelope, which is what you put the letter in and seal it with, and that's where your cryptographic proof sits. And then the cryptographic proof and the letter itself might be tied to a digital identity. So that identity is really the third part of that verifiable credential. It embraces, it doesn't require but it certainly embraces, the DID standard, the Decentralized Identifier standard. It's a pretty interesting concept, because it again lays out some baseline standards but allows flexibility for identity to exist anywhere. We do some work in higher ed creating verifiable credentials. A credential, by the way, is as simple as a passport or a driver's license. Those are credentials. They're usually in a physical form, of course. This is a digital form, and the digital form allows us to have a lot more credentials. So we do a lot in higher ed, university degrees and transcripts, things like that, wrapping seals around them. You can think of health records, you can think of financial records, all being wrapped up in this sort of format. But the identity component, the decentralized identity standard, allows you to say, I want to manage my identity on a blockchain, or I just want to manage it on my laptop. In that space, a lot of people are using GitHub to manage their identity. So it offers this really flexible way where people can store and manage their identity where they want, and then other entities can wrap provable information around your identity. So that's one of the reasons we really like that. It's a standard. It embraces that inherent flexibility, but still gives us a way of creating linked data that's provable.
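
[Editor's sketch] The three parts described here, the payload (the letter), the proof (the seal on the envelope) and the identity the proof is tied to, can be laid out in code. This is a sketch under loud assumptions and not the W3C Verifiable Credentials data model. Real credentials use standard signature suites; to stay inside Python's standard library, a Lamport one-time signature is swapped in here, which is a genuine public-key signature but may sign only one message per key. `DID_REGISTRY` and the `did:example:` identifiers are hypothetical stand-ins for wherever an issuer chooses to publish its key.

```python
import hashlib
import json
import secrets


def _h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def _bits(message: bytes):
    """The 256 bits of the message's SHA-256 digest, most significant first."""
    digest = _h(message)
    return [(digest[i // 8] >> (7 - i % 8)) & 1 for i in range(256)]


def generate_keypair():
    """Lamport one-time key: 256 pairs of secrets; the public key is their hashes."""
    private = [(secrets.token_bytes(32), secrets.token_bytes(32)) for _ in range(256)]
    public = [(_h(zero), _h(one)) for zero, one in private]
    return private, public


def sign(message: bytes, private):
    """Reveal one secret per digest bit. A key must never sign a second message."""
    return [private[i][bit] for i, bit in enumerate(_bits(message))]


def verify(message: bytes, signature, public) -> bool:
    if len(signature) != 256:
        return False
    return all(_h(part) == public[i][bit]
               for i, (part, bit) in enumerate(zip(signature, _bits(message))))


# Hypothetical registry: identity lives wherever the issuer decided to publish it.
DID_REGISTRY = {}


def issue(issuer_did, private, subject_did, claims):
    """Payload (the letter) plus proof (the seal), tied to the issuer's identity."""
    payload = {"issuer": issuer_did, "subject": subject_did, "claims": claims}
    message = json.dumps(payload, sort_keys=True).encode()
    return {"payload": payload, "proof": [p.hex() for p in sign(message, private)]}


def check(credential) -> bool:
    """Resolve the issuer's public key by identifier, then verify the seal."""
    payload = credential["payload"]
    public = DID_REGISTRY[payload["issuer"]]
    message = json.dumps(payload, sort_keys=True).encode()
    signature = [bytes.fromhex(p) for p in credential["proof"]]
    return verify(message, signature, public)


private_key, public_key = generate_keypair()
DID_REGISTRY["did:example:university"] = public_key
credential = issue("did:example:university", private_key,
                   "did:example:alice", {"degree": "BSc"})
forged = {"payload": {**credential["payload"], "claims": {"degree": "PhD"}},
          "proof": credential["proof"]}

print(check(credential))  # prints: True
print(check(forged))      # prints: False
```

Swapping the registry for a blockchain, a laptop file or a hosted profile would change only how the public key is looked up, which is the flexibility the identifier standard is described as allowing.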

[Shannon Vaughn] That's great. It's funny you mention the healthcare space. While I run federal, we've got some large healthcare clients, CDC and others. If you look across Virtru's 8,000 customers or so, I think our three largest sectors are people with sensitive data, and guess who those are: healthcare, finance and governments, right? People that have data they believe to be sensitive. In healthcare, that's PII or something tied to personally identifiable information, right? So it's funny how much we probably have very similar customer bases. I was looking on your website, actually, and two of the logos you guys put up there are DoD and Air Force. Do you have a good, I'll say, case study on why they came to you, given that they're a government customer that probably has sensitive data?

[Brian Platz] I mean, this gets back to my up-front dialogue, which is that there's an increased emphasis on the strategic importance of data and data science within the DoD. I think they see competitive pressure, if you can call it that, from other large countries, and of course they want to be on the leading edge of that. As anyone who's been on the inside of those organizations knows, sometimes it's really hard to get your hands on the data you need to do your job. So there's this congressional mandate that came out several years ago where they were going to swap their data policy from "need to know", which kind of makes sense in those environments, to "need to share". So you actually have an obligation to share your data, not to protect your data. Well, in a protected way, obviously, a highly protected way, but you're obligated to make sure that data can get into the hands of the people who are authorized to have it, which is a complete 180-degree shift. And to accomplish that, it needs all of these things we're talking about. How can I trust the data? How can I wrap policy and protection around the data at a granular level? All these challenges exist in that exact environment where they're trying to shift their data strategy in the DoD. So we think there's an immense amount of applicability there and just a tremendous number of projects that could use this sort of philosophy.

[Shannon Vaughn] Yeah. Actually, one of the ones that also came up in the '23 trends: you guys were talking about multi-party compute, right? Secure multi-party compute. We've got a project there around secure analytics, clean-room-style things. Amazon's put out a clean room product. The funny thing there is, everybody has to be an Amazon customer, their data has to sit in an S3 bucket, and you can only have up to five parties, so a couple of downsides there. But I'd love to hear why you guys are looking at the MPC space, and how you see that as a trend going forward.

[Brian Platz] Yeah, I think it's just an extension, a pretty far extension, of what we've talked about, which is that you want data to be exchanged. But really, why do you want that data to be exchanged with those who have permission and whom the policy allows? It's because you want to get insights, right? You're probably doing analytics or something on top of it. So what if you take that to an extreme and say, well, I want to be able to do analytics, but I can't even see a single piece of the data, it's that protected? That's where this technology comes in. It's exciting. I mean, it's been around for a while. I first learned about it at a government-related conference many years ago, from a professor who had been working on it at Stanford for, you know, like 12 years. So it wasn't anything new, but it is sort of new and emerging right now. It's a field in this class of zero-knowledge cryptography, and it's really cool. You can do analytics across data you can't see, have accurate results, and never disclose anything. How does that tie into us? When you're doing analytics, you have to do it in a non-identifiable way. Even if you shield the record, if you allow the analytics to get granular enough, you can still figure out who they're talking about, right? So everything needs to get aggregated up into higher-level buckets to allow those analytics to appear. We are a knowledge graph database, which means we create hierarchies or taxonomies out of data. So for us it's a very natural approach, because you just need to pick the level in the hierarchy of your data classification where it's generic enough that you won't accidentally leak any identifiable information. So it's a real natural extension of how we manage data.
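
[Editor's sketch] Picking the level in the hierarchy that is generic enough can be sketched as a roll-up that climbs a taxonomy until no bucket is small enough to single anyone out. This illustrates only the aggregation step, not multi-party computation itself and not Fluree's API. The taxonomy, the function names and the minimum group size are all assumptions, and the threshold rule is in the spirit of k-anonymity rather than anything stated in the conversation.

```python
from collections import Counter

# Hypothetical taxonomy: each term points at its broader parent.
BROADER = {
    "type 1 diabetes": "diabetes",
    "type 2 diabetes": "diabetes",
    "hypothyroidism": "thyroid disorder",
    "diabetes": "endocrine disorder",
    "thyroid disorder": "endocrine disorder",
}


def generalize(term, levels):
    """Move a term up the hierarchy by the given number of levels (stops at the top)."""
    for _ in range(levels):
        term = BROADER.get(term, term)
    return term


def safe_counts(terms, min_group_size):
    """Climb the hierarchy until every bucket holds at least min_group_size records."""
    if not terms:
        return {}
    for levels in range(len(BROADER) + 1):
        counts = Counter(generalize(term, levels) for term in terms)
        if min(counts.values()) >= min_group_size:
            return dict(counts)
    return {}  # even the most generic level is too small to release


records = ["type 1 diabetes"] + ["type 2 diabetes"] * 4 + ["hypothyroidism"] * 2

print(safe_counts(records, 2))  # prints: {'diabetes': 5, 'thyroid disorder': 2}
print(safe_counts(records, 3))  # prints: {'endocrine disorder': 7}
```

With a threshold of 2, the single type 1 record forces one step up the hierarchy; with a threshold of 3, the two thyroid records force a second step, so the released counts get coarser as the privacy requirement gets stricter.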

[Shannon Vaughn] Yeah, you've got to be able to build the trust with that data owner, right? They're not going to just share into the clean room or some ephemeral store to allow somebody to run whatever anonymous analytic they're trying to run. They want to be able to have control over that. So one of the things we talked about is, you can have a 100-party share of something. Say we're all trying to cure COVID. Well, if there's a derivative data set that comes from that, all of those original data owners should maintain at least some portion of ownership of their data. If there is a derivative data set that everybody buys into, and then, in the unfortunate case, maybe a company goes off and makes a ton of money through acquisition, well, their new owners might not want that data in there, and they should be able to revoke access, right? That's how you build that verifiable trust to allow people to do multi-party compute and incentivize them to do it, right? That's one thing I just really like. The next one I want to chat on here, because you guys had a good quote that I'll read. It said that in 2023 more than ever, the focus will be on access control, advanced encryption and transparency, and where your data is going, when, and to whom. I love that line, right? Because that's everything we believe. Advanced, smart access control decisions. Advanced encryption; we do optional encryption, but a lot of our sensitive customers want that encryption piece. But also transparency, right? We like to say, make the data owner stay the data owner. Even after it's left their boundary, when their data goes externally, they should still be the data owner, right? It shouldn't be this fail-open concept of, oh, I hope that person doesn't do anything with my data, I sent them a very stern email. Can you talk to why you guys think access control, advanced encryption and transparency really are one of those trends in '23?

[Brian Platz] So I think the good example you brought up, Europe being ahead of us in many ways, is related to PII data and how they have approached regulation around it. I'm proposing that we have this global interconnected web of data, but it's a pretty complicated web, especially when you're dealing with things like that. Even think about the US now: you're giving a bank consent to market to you, and they might auto-check the box for you to be marketed to by other banks that they're going to sell your data to. Well, now that data is transferring through many different channels, which might be different banking organizations or different advertising panels. Then, as a consumer, you come in and you uncheck that box. How do they crawl that data back through this entire web of usage and how it's been replicated? We get to this idea that the consent and the policy have to travel along with the data itself, so it can be recognized and actually pulled back. I think we're just going to see more and more instances of this need.
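
[Editor's sketch] Crawling consent back through a web of copies becomes a plain graph walk once every hand-off is recorded alongside the data. The sketch below assumes exactly that: a hypothetical `DataWeb` class that logs each share and copies the consent state with it. The class, its method names and the holder names are illustrative and do not describe any real system.

```python
from collections import deque


class DataWeb:
    """Tracks who passed a record to whom, and whether each holder may still use it."""

    def __init__(self):
        self.shared_to = {}  # holder -> holders it passed the record to
        self.may_use = {}    # holder -> current consent state

    def share(self, source, recipient):
        self.shared_to.setdefault(source, []).append(recipient)
        self.may_use.setdefault(source, True)
        # The consent state travels with the copy instead of staying behind.
        self.may_use[recipient] = self.may_use[source]

    def revoke_downstream(self, origin):
        """Breadth-first walk of every copy made from origin, withdrawing consent."""
        reached = []
        seen = {origin}
        queue = deque([origin])
        while queue:
            holder = queue.popleft()
            if holder != origin:
                self.may_use[holder] = False
                reached.append(holder)
            for recipient in self.shared_to.get(holder, []):
                if recipient not in seen:
                    seen.add(recipient)
                    queue.append(recipient)
        return reached


web = DataWeb()
web.share("bank_a", "bank_b")
web.share("bank_a", "ad_panel")
web.share("bank_b", "bank_c")

# The consumer unchecks the box at bank_a: every downstream copy is reached.
print(web.revoke_downstream("bank_a"))  # prints: ['bank_b', 'ad_panel', 'bank_c']
print(web.may_use["bank_a"])            # prints: True
```

Without the recorded links there is nothing to walk, which is why the policy has to travel with the data rather than live in the application that first collected it.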

[Shannon Vaughn] Yeah, that's a good point. I think the final one I want to end on here, one of the ones you call out, is interoperability. I think you guys speak on your website often about semantic interoperability, right? Which I really like. On our side, our interoperability really is what I mentioned earlier: how do you let different countries do the things that they already do, and you ensure interoperability as the vendor, right? So whether that's engaging with open standards, ensuring open platforms, using open source where possible, doing all the things you'd expect, we do that. Like I said, the TDF specs, whether that's the IC TDF or base TDF, or Nano TDF, which we do for really lightweight sensors, like edge flows, or international mission partner interoperability, to ensure that the standards they subscribe to, we can help translate. But I'd really like to hear what you guys are talking about with data interoperability and that semantic interoperability piece.

[Brian Platz] Yeah, so we really fully embrace being a knowledge graph database, and the knowledge graph database sits on top of the W3C standards around data interoperability. So yeah, we're very focused on the standards. In particular, the standards, which have had a lot of smart people thinking about this for a long time, have added a lot of richness into the data format itself. So it's not just about that interoperability layer. It's how do we do things like create class hierarchies and inferencing automatically in the data tier. The thing that we're really layering even on top of that, outside of this core spec, is how do we wrap that data in policy and enforce it. But yeah, this all comes down to our core standards support, and we lean pretty heavily into the W3C standards around this.
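
[Editor's sketch] Class hierarchies with inferencing in the data tier mean that a query for a broad class also returns things that were only ever labeled with a narrower one. The sketch below shows that inference in its simplest form. It is in the spirit of `rdfs:subClassOf` from the W3C RDF Schema vocabulary, but the class names, the documents and the single-parent simplification are assumptions, and no real graph database API is being shown.

```python
# Hypothetical mini-ontology: subclass -> superclass (one parent each, for brevity;
# RDF Schema itself allows a class to have several superclasses).
SUBCLASS_OF = {
    "Transcript": "AcademicRecord",
    "Degree": "AcademicRecord",
    "AcademicRecord": "Credential",
    "Passport": "Credential",
}

# Asserted facts: each document is labeled only with its most specific class.
INSTANCE_TYPES = {
    "doc:1": "Transcript",
    "doc:2": "Passport",
    "doc:3": "Degree",
}


def superclasses(cls):
    """The class itself plus everything reachable by following subclass links upward."""
    result = [cls]
    while cls in SUBCLASS_OF:
        cls = SUBCLASS_OF[cls]
        result.append(cls)
    return result


def instances_of(cls):
    """Documents whose asserted type is cls or, by inference, any subclass of it."""
    return sorted(doc for doc, asserted in INSTANCE_TYPES.items()
                  if cls in superclasses(asserted))


print(instances_of("AcademicRecord"))  # prints: ['doc:1', 'doc:3']
print(instances_of("Credential"))      # prints: ['doc:1', 'doc:2', 'doc:3']
```

Nobody ever labeled `doc:1` as a `Credential`; the answer follows from the hierarchy, which is the kind of meaning a shared standard lets machines work out without a custom translation script.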

[Shannon Vaughn] And that's great. All right, well, I think that's about all I've got today. I really do want to say thanks, Brian, to you and the Fluree team for hopping on here, but also for speaking a common language. So thanks for helping to push those open standards and that data-centricity message inside the United States and elsewhere. It's great to see that there are other small companies out here really trying to make a difference. So, Brian, thanks for hopping on Hash It Out here at Virtru today.

[Brian Platz] Thanks so much Shannon. It was a pleasure being on. Great conversation.

[Shannon Vaughn] All right, buddy. Bye now.