Decrypted | Insights from Virtru to Unlock New Ideas

Cypherpunks, SEO, and LLMs Oh My!

Written by Matt Howard | Oct 20, 2023 6:47:11 PM

As Eric Hughes wrote 30 years ago in his famous cypherpunk manifesto, "Privacy is the power to selectively reveal oneself to the world."

Over the course of my career as a software marketer, I've invested an enormous amount of time and money in search engine optimization (SEO) with the explicit goal of revealing my organization's content to search engines like Google. The motivation was simple: selectively reveal more content, rank higher in search results, connect contextually with more customers, and drive more business. A huge part of the effort centered on using meta tags, title tags, heading tags, and similar markup to apply structure to information that was otherwise unstructured. By properly structuring content, it was possible to steer the search algorithms, in effect telling them, "pay attention to this content, but ignore that."
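To make the "pay attention to this, but ignore that" idea concrete, here is a minimal sketch of how a crawler might read those signals. It uses Python's standard `html.parser` module and the widely used `robots` meta tag with its `noindex` directive. The class name `HeadSignals`, the function `page_signals`, and the two sample pages are illustrative assumptions, not a description of how any particular search engine works.

```python
from html.parser import HTMLParser


class HeadSignals(HTMLParser):
    """Collect the <title> text and any robots meta directives from a page."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.robots = set()
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and (attrs.get("name") or "").lower() == "robots":
            # Directives are comma separated, e.g. "noindex, nofollow".
            content = attrs.get("content") or ""
            self.robots = {d.strip().lower() for d in content.split(",") if d.strip()}

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def page_signals(html):
    """Return the page title and whether the page asks to be indexed."""
    parser = HeadSignals()
    parser.feed(html)
    parser.close()
    return {"title": parser.title.strip(), "indexable": "noindex" not in parser.robots}


# Hypothetical sample pages: one meant to be found, one meant to be ignored.
PUBLIC_PAGE = (
    "<html><head><title>Product Overview</title></head>"
    "<body><h1>Overview</h1></body></html>"
)
INTERNAL_PAGE = (
    "<html><head><title>Draft Pricing</title>"
    '<meta name="robots" content="noindex, nofollow"></head><body></body></html>'
)

for page in (PUBLIC_PAGE, INTERNAL_PAGE):
    print(page_signals(page))
```

The point of the sketch is only that a small amount of structure in the markup is enough for a well-behaved crawler to tell the two pages apart.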

But now, with the advent of artificial intelligence and large language models (LLMs), we are seeing a major irony unfold. On the one hand, businesses still want to selectively "reveal certain information" to search engines for purposes of discovery; on the other hand, these same businesses want to selectively "conceal certain information" from LLMs in the name of data privacy and IP protection.

The concern, of course, is that if sensitive business data is scraped into the copious training data used by LLMs, it could be exposed in unpredictable ways. So companies today are investing in solutions that help them discover, classify, and tag their entire data estate so they can then apply and enforce granular policies to selectively control how data is revealed, or not.
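As a rough illustration of what "classify, tag, then enforce policy" can look like, here is a minimal Python sketch. Everything in it is an assumption made for the example: the regex rules in `RULES`, the audiences and blocked tags in `BLOCKED_TAGS`, and the sample documents are hypothetical, and real classification tools are far more sophisticated than a few pattern matches.

```python
import re
from dataclasses import dataclass, field

# Hypothetical detection rules: tag name -> pattern that triggers the tag.
RULES = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "confidential": re.compile(r"\bconfidential\b", re.IGNORECASE),
}

# Hypothetical policy: tags that must never be revealed to each audience.
BLOCKED_TAGS = {
    "search_engine": {"ssn", "confidential"},
    "llm_training": {"email", "ssn", "confidential"},
}


@dataclass
class Document:
    name: str
    text: str
    tags: set = field(default_factory=set)


def classify(doc):
    """Attach every tag whose pattern appears in the document text."""
    doc.tags = {tag for tag, pattern in RULES.items() if pattern.search(doc.text)}
    return doc


def may_reveal(doc, audience):
    """Allow exposure only if the document carries no tag blocked for the audience."""
    return not (doc.tags & BLOCKED_TAGS[audience])


docs = [
    classify(Document("blog.txt", "Contact sales@example.com to learn more.")),
    classify(Document("roadmap.txt", "CONFIDENTIAL: 2024 roadmap")),
    classify(Document("faq.txt", "How do I reset my password?")),
]

for doc in docs:
    decisions = {audience: may_reveal(doc, audience) for audience in BLOCKED_TAGS}
    print(doc.name, sorted(doc.tags), decisions)
```

In this toy policy, the blog post with a sales address stays visible to search engines but is withheld from LLM training data, which is the same reveal-here, conceal-there tension described above.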

In the past, SEO was about structuring and optimizing for findability.  Today, data privacy efforts are about selectively preventing sensitive information from accidentally leaking into AI systems and LLM training data.

This dynamic reveals a deep irony: the business mindset of the past was built around being found, while the mindset of the future must contend with all things AI. Further, it underscores the critical need to adopt data-centric security and granular controls so you can selectively balance openness and transparency with privacy and security.

The SEO era was binary: put everything out there to be found.

The privacy era will be far more nuanced: discover, classify, and tag EVERYTHING so you can control at a granular level the data you expose, and the data you don't.