26/10/2023
TL;DR: Go to ChatGPT and ask it to explain Varnish and Varnish tags in a concise way.
Before talking about Varnish, you might need a bit of context. I am currently working on a 7-year-old project: a CMS website based on Sonata (and Symfony), with Varnish configured in front of it. We are using PHP 7.4 and Symfony 4.4. Within our scope, we manage around 20 websites, 6 languages, an admin, and a frontend developed with Twig. We also manage articles (and their translations) that can be made available on several websites. The CMS part is pretty common, and as it’s mostly based on Sonata, it’s relatively simple to make improvements when needed. Currently, we are working on a big overhaul of the app to decouple the backend and the frontend, moving to a headless CMS architecture with a frontend built on top of Next.js.
The need
We wanted to use this big overhaul as an opportunity to dig deeper into Varnish and take a closer look at how we can improve the caching system for our users without falling into the “cache invalidation nightmare”.
Varnish
Varnish is a reverse proxy cache that works by interposing between the client (such as a web browser) and the web server (such as nginx), intercepting requests and responses to improve performance and reduce server load.
Here is an overview of how Varnish works:
- A user requests a page from our website.
- Varnish, configured to capture the request first, checks its cache to see if it has a cached response for the requested resource. If it does, Varnish returns the cached response to the client without forwarding the request to the server.
- If Varnish does not have a cached response, it forwards the request to the server and waits for a response.
- When the server responds, Varnish caches the response (if the page is cacheable) and returns it to the client.
Varnish can improve performance by caching responses for frequently requested resources, reducing the amount of work that the server (Nginx, PHP, database, external API, …) has to do for subsequent requests. It can also improve scalability by reducing server load, allowing the server to handle more traffic.
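To make the lookup flow above concrete, here is a minimal sketch in PHP. It is only a toy model and not how Varnish is implemented: ToyCache and the backend closure are names invented for this example, and it assumes that every response is cacheable and keyed by URL alone.

```php
<?php
declare(strict_types=1);

/**
 * Toy in-memory model of the lookup flow:
 * hit  -> serve the stored response,
 * miss -> ask the backend, store the response, then serve it.
 * ToyCache is an illustrative name, not a Varnish or Symfony class.
 */
final class ToyCache
{
    /** @var array<string, string> cached bodies, keyed by URL */
    private array $store = [];

    private int $backendCalls = 0;

    /** @var callable(string): string */
    private $backend;

    public function __construct(callable $backend)
    {
        $this->backend = $backend;
    }

    public function fetch(string $url): string
    {
        if (array_key_exists($url, $this->store)) {
            return $this->store[$url]; // hit: the backend is not contacted
        }

        $this->backendCalls++;
        $body = ($this->backend)($url); // miss: forward to the backend
        $this->store[$url] = $body;     // keep a copy for the next request

        return $body;
    }

    public function backendCalls(): int
    {
        return $this->backendCalls;
    }
}

$cache = new ToyCache(fn (string $url): string => "<html>page for $url</html>");
$first = $cache->fetch('/articles/42');  // miss, goes to the backend
$second = $cache->fetch('/articles/42'); // hit, served from the store
```

Fetching the same URL twice reaches the backend only once. Real Varnish also takes the host into account, honours TTLs and can decide not to cache a response at all, none of which is modelled here.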
Varnish Tags
Here’s a glimpse of what Varnish tags are:
Varnish tags are custom markers added to HTTP responses (typically in a dedicated header) to facilitate cache management. Tags can be used to identify and group related objects, such as web pages, images, or CSS files.
The idea is that once each cached response is identified with one or more tags, it should be easy to invalidate them from the admin. Let’s imagine that one of the authors of an article changes their photo. Once it has been changed in the admin, we want to invalidate all pages where this author is displayed, without any further action from the back office. We only want to clear the cache for the related pages, not for the whole website or for pages where the author does not appear.
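As a rough illustration of that idea, the sketch below keeps a list of tags next to each cached page and drops only the pages carrying a given tag. TaggedCache, the URLs and the tag names are all made up for the example; it says nothing about how Varnish stores or evicts objects internally.

```php
<?php
declare(strict_types=1);

/**
 * Toy model of tag-based invalidation: each cached page carries the tags of
 * the entities used to build it, and invalidating one tag removes only the
 * pages that carry it. TaggedCache is a hypothetical class for illustration.
 */
final class TaggedCache
{
    /** @var array<string, string[]> tags per cached URL */
    private array $tagsByUrl = [];

    /** @param string[] $tags */
    public function store(string $url, array $tags): void
    {
        $this->tagsByUrl[$url] = $tags;
    }

    /** @return string[] URLs that were removed from the cache */
    public function invalidateTag(string $tag): array
    {
        $removed = [];
        foreach ($this->tagsByUrl as $url => $tags) {
            if (in_array($tag, $tags, true)) {
                unset($this->tagsByUrl[$url]);
                $removed[] = $url;
            }
        }

        return $removed;
    }

    /** @return string[] URLs still in the cache */
    public function cachedUrls(): array
    {
        return array_keys($this->tagsByUrl);
    }
}

$cache = new TaggedCache();
$cache->store('/articles/1', ['site-1', 'author-2']);
$cache->store('/articles/2', ['site-1', 'author-3']);
$cache->store('/authors/2', ['site-1', 'author-2']);

// The author with id 2 changes their photo in the admin.
$removed = $cache->invalidateTag('author-2');
```

Only the two pages that display author 2 are dropped; the third page stays cached.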
Other solutions exist, such as ESI tags, but they are not the focus of this article.
Our Solution
Perhaps there are other solutions that we’ve missed out on, but this solution was developed in the light of our current expertise on the subject here at ekino, both on Varnish and Sonata.
First, we want a specific identifier for each resource (each entity), one that lets us identify and understand it easily and that is short enough to avoid an HTTP header that is too long.
We thus created an interface:
<?php
interface CacheTagEntityInterface
{
    public function getCacheTag(): ?string;
}
Each entity that implements it has to specify this new method:
// Example on the Site.php entity
class Site implements CacheTagEntityInterface
{
    // ...
    public function getCacheTag(): ?string
    {
        // No tag until the entity has been persisted and has an id
        return null === $this->id ? null : sprintf('SI-%d', $this->id);
    }
}
Here, each entity type NEEDS its own prefix. In our case, we chose “SI-[id]” for sites, “PA-[id]” for pages, “AU-[id]” for authors, and so on.
Next, we had to find a way to add all the tags used to build the current page to a pool, so that they can be added at the end of the HTTP response. Since we build our pages only through the Symfony Serializer, all we need to do is create a “Recorder” and call it in each normalizer we have.
<?php
class Recorder
{
    /** @var string[] */
    private array $stack = [];

    public function add(?string $cacheTagItem): void
    {
        if (null === $cacheTagItem) {
            return;
        }
        if (!in_array($cacheTagItem, $this->stack, true)) {
            $this->stack[] = $cacheTagItem;
        }
    }

    // Return all tags as a single comma-separated string, sorted so that
    // the header value is deterministic. No spaces are inserted: whitespace
    // is used as a separator in many HTTP headers, so we keep it out of
    // the X-Cache-Tags value.
    public function getCacheControlTags(): string
    {
        $stack = $this->stack;
        sort($stack);

        return implode(',', $stack);
    }
}
<?php
class AuthorNormalizer
{
    // ...
    public function normalize($object, $format = null, array $context = []): array
    {
        // ...
        $this->recorder->add($object->getCacheTag());
        // ...
    }
}
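To show how the tags pile up when normalizers call each other, here is a small framework-free sketch. It is an illustration under assumptions, not our production code: SketchAuthorNormalizer and SketchPageNormalizer are hypothetical stand-ins that take plain arrays instead of entities, the recorder is injected through the constructor, and a condensed copy of the Recorder is repeated only so that the snippet runs on its own.

```php
<?php
declare(strict_types=1);

// Condensed copy of the Recorder, repeated only to keep the snippet standalone.
final class Recorder
{
    /** @var string[] */
    private array $stack = [];

    public function add(?string $tag): void
    {
        if (null !== $tag && !in_array($tag, $this->stack, true)) {
            $this->stack[] = $tag;
        }
    }

    public function getCacheControlTags(): string
    {
        $stack = $this->stack;
        sort($stack);

        return implode(',', $stack);
    }
}

// Hypothetical stand-in: records the author's tag while normalizing.
final class SketchAuthorNormalizer
{
    private Recorder $recorder;

    public function __construct(Recorder $recorder)
    {
        $this->recorder = $recorder;
    }

    public function normalize(array $author): array
    {
        $this->recorder->add(sprintf('AU-%d', $author['id']));

        return ['name' => $author['name']];
    }
}

// Hypothetical stand-in: records the page's tag, then delegates to the authors.
final class SketchPageNormalizer
{
    private Recorder $recorder;
    private SketchAuthorNormalizer $authors;

    public function __construct(Recorder $recorder, SketchAuthorNormalizer $authors)
    {
        $this->recorder = $recorder;
        $this->authors = $authors;
    }

    public function normalize(array $page): array
    {
        $this->recorder->add(sprintf('PA-%d', $page['id']));

        return [
            'title' => $page['title'],
            'authors' => array_map([$this->authors, 'normalize'], $page['authors']),
        ];
    }
}

$recorder = new Recorder();
$pages = new SketchPageNormalizer($recorder, new SketchAuthorNormalizer($recorder));
$data = $pages->normalize([
    'id' => 7,
    'title' => 'Hello',
    // Author 2 appears twice on purpose: the recorder keeps a single tag.
    'authors' => [
        ['id' => 2, 'name' => 'Ada'],
        ['id' => 3, 'name' => 'Linus'],
        ['id' => 2, 'name' => 'Ada'],
    ],
]);
$header = $recorder->getCacheControlTags(); // "AU-2,AU-3,PA-7"
```

Because every normalizer shares the same recorder instance, the final string contains each tag used anywhere on the page exactly once.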
Now that we have a pool of tags, the last step is to create a listener hooked into “kernel.response” that fetches them and sets the correct header on the response:
<?php
use Symfony\Component\HttpKernel\Event\ResponseEvent;

class VarnishCacheTagListener
{
    private Recorder $recorder;

    public function __construct(Recorder $recorder)
    {
        $this->recorder = $recorder;
    }

    public function onKernelResponse(ResponseEvent $event): void
    {
        // Only tag the main response, not sub-requests
        if (!$event->isMasterRequest()) {
            return;
        }
        $response = $event->getResponse();
        $response->headers->set('X-Cache-Tags', $this->recorder->getCacheControlTags());
    }
}
And Tada 🎉
OK, well, that’s great: we’ve now added a new header to our response. Who cares? What’s the point? And where does Varnish fit into all this? Get to the point…
Patience, patience… There’s a full story here, so let’s keep going.
With this new header in the response, Varnish now stores it alongside the cached object when it puts the response in its cache. The header does not change the cache key (the “hash”, which is computed from the request), but it can be matched later when we ask Varnish to invalidate objects.
In our local development environment, the cache tags are left visible in the response headers for debugging purposes. In a production environment, however, we of course strip this header in Varnish’s “vcl_deliver”, as there’s no point in showing it to the end user.
# Called when the HTTP response is delivered to the user.
# Last chance to modify headers that are sent to the client.
sub vcl_deliver {
    # ...
    # Remove the cache tags header from the response sent to the user
    unset resp.http.X-Cache-Tags;
    return (deliver);
}
We now need the invalidation process to happen because, as our page is now cached, whatever change we make in the admin has no impact on the frontend. We would have to wait for the cache to expire, which is not efficient at all.
To invalidate the cache, we implemented a Doctrine subscriber that listens to the preRemove, preUpdate and prePersist events and asks Varnish to remove all cache entries carrying the entity’s tag:
<?php
use Doctrine\Common\EventSubscriber;
use Doctrine\ORM\Event\LifecycleEventArgs;
use Doctrine\ORM\Events;

class DoctrineORMSubscriber implements EventSubscriber
{
    private VarnishAdapter $varnishAdapter;

    public function __construct(VarnishAdapter $varnishAdapter)
    {
        $this->varnishAdapter = $varnishAdapter;
    }

    public function getSubscribedEvents(): array
    {
        return [
            Events::preRemove,
            Events::preUpdate,
            Events::prePersist,
        ];
    }

    public function preRemove(LifecycleEventArgs $args): void
    {
        $this->flush($args);
    }

    public function preUpdate(LifecycleEventArgs $args): void
    {
        $this->flush($args);
    }

    public function prePersist(LifecycleEventArgs $args): void
    {
        $this->flush($args);
    }

    protected function flush(LifecycleEventArgs $args): void
    {
        $entity = $args->getObject();
        // Ignore entities that do not expose a cache tag
        if (!$entity instanceof CacheTagEntityInterface) {
            return;
        }
        $cacheTag = $entity->getCacheTag();
        // A new entity without an id has no tag yet, so nothing to invalidate
        if (null === $cacheTag) {
            return;
        }
        $this->varnishAdapter->deleteItem($cacheTag);
    }
}
Our VarnishAdapter then calls Varnish and bans the cached objects based on a regex. For example, if we update a field on author 2, we generate a ban request on: “obj.http.X-Cache-Tags ~ AU-2”
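One detail worth keeping in mind with a regex match: a bare “AU-2” pattern also matches “AU-20” or “AU-21”. The sketch below shows one possible way to build a pattern that only matches a whole tag inside the comma-separated header value. banPatternForTag() and headerMatches() are hypothetical helpers written for this example, not methods of our VarnishAdapter, and the match is emulated locally with PHP’s PCRE functions rather than sent to Varnish.

```php
<?php
declare(strict_types=1);

/**
 * Builds a regex that matches one whole tag inside a comma-separated
 * X-Cache-Tags value. Hypothetical helper, for illustration only.
 */
function banPatternForTag(string $tag): string
{
    // Only accept the "XX-123" shape used for the tags in this article, so
    // that the tag can be embedded in a regex without further escaping.
    if (1 !== preg_match('/^[A-Z]{2}-\d+$/', $tag)) {
        throw new InvalidArgumentException(sprintf('Unexpected cache tag "%s".', $tag));
    }

    // The tag must be delimited by the start, the end, or a comma.
    return sprintf('(^|,)%s(,|$)', $tag);
}

/** Emulates the "~" regex match locally, for illustration only. */
function headerMatches(string $pattern, string $headerValue): bool
{
    return 1 === preg_match('/' . $pattern . '/', $headerValue);
}

$pageWithAuthor2 = 'AU-2,PA-7,SI-1';
$pageWithAuthor20 = 'AU-20,PA-8,SI-1';

$naive = 'AU-2';
$anchored = banPatternForTag('AU-2');

$naiveHitsWrongPage = headerMatches($naive, $pageWithAuthor20);       // over-matches
$anchoredHitsWrongPage = headerMatches($anchored, $pageWithAuthor20); // does not
$anchoredHitsRightPage = headerMatches($anchored, $pageWithAuthor2);
```

With such a pattern, the ban expression would look like obj.http.X-Cache-Tags ~ "(^|,)AU-2(,|$)". An over-broad match is not dangerous, since it only evicts more pages than necessary, but it does reduce the benefit of tagging.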
And again Tada 🎉
By doing this, we can now invalidate only the specific pages where the resource is used. This is much more efficient because, in the meantime, we preserve all the other cache entries we have. Our site stays fast and is always up to date!
Of course, if you update even a small bit of information at site level, this system will delete the cache entries for every page of that site! But this is something that is very rarely done in our scope.
Conclusion
Over the past few years, I had always stayed as far away as possible from Varnish because it felt like a big mystery box that was too complex to deal with without any “real-life cases”. But now that I’ve experienced it for myself, it has made me want to go further in discovering this tool and how I can configure it properly to help me!
I’d also like to add that Varnish isn’t the only caching solution out there. There are others to consider, such as Symfony Cache, Redis, Memcached, a CDN, or the database.
I hope all this information was clear enough and useful to some of you 🙂
Varnish tags, my journey to fall in love with the cache was originally published in ekino-france on Medium.