If you’ve spent any length of time managing a high-traffic website, you’ve inevitably heard of caching. For the uninitiated, caching is a method of storing the results of a complex operation for quick recall. It’s like having a cheat sheet of answers to complex math problems with a defined set of inputs. Every web page is assembled from hundreds of discrete bits of information. A few database rows here, a template there, and some processor time to piece it all together. A cache stores the final collage and passes out copies to multiple users, saving your hardware considerable strain.
This is useful in several different scenarios.
Let’s say your online store has a high ratio of visits to conversions. If, say, 90% of your visitors are just window-shopping, it might make sense to serve the exact same (cached) copy of your homepage to all of them. When a visitor signs in or adds a product to their cart, they can bypass the cache and start seeing live results from your server. Otherwise, they’ll be viewing the cached (empty) shopping cart.
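The bypass rule for the store can be sketched in a few lines of Python. This is only an illustration: the cookie names `session_id` and `cart_items` are made up for the example, and a real store would check whatever its own platform sets.

```python
def can_serve_from_cache(cookies: dict[str, str]) -> bool:
    """Return True when the visitor can safely get the shared, cached copy."""
    # Hypothetical cookie names, assumed for this example only.
    signed_in = "session_id" in cookies
    has_cart = cookies.get("cart_items", "0") != "0"
    # Window-shoppers get the cached page; everyone else gets live results.
    return not (signed_in or has_cart)
```

An anonymous visitor with no cookies gets the cached page, while a visitor with a session or a non-empty cart goes straight to the server.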
Alternatively, if every user on your site gets a personalized, interactive experience, you can still use a cache to speed up the individual components of a page that are shared between users. You still need to piece the parts together at the end, but you reduce the performance cost of the parts themselves.
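Here is a rough sketch of that component-level approach in Python. The function names and the HTML are invented for the example: the product list is shared between users and gets cached, while the greeting is per-user and is built fresh every time.

```python
from typing import Callable

fragment_cache: dict[str, str] = {}
render_calls: list[str] = []  # records each expensive render, for illustration


def render_product_list() -> str:
    """Stand-in for an expensive render that is identical for every user."""
    render_calls.append("product_list")
    return "<ul><li>Widget</li><li>Gadget</li></ul>"


def cached_fragment(key: str, render: Callable[[], str]) -> str:
    """Render the fragment once, then reuse the stored copy."""
    if key not in fragment_cache:
        fragment_cache[key] = render()
    return fragment_cache[key]


def render_page(username: str) -> str:
    greeting = f"<p>Hello, {username}!</p>"  # per-user, never cached
    products = cached_fragment("product_list", render_product_list)
    return greeting + products  # the parts are still assembled per request
```

Rendering the page for two different users runs the expensive product list render only once.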
Now that we’ve covered what caching is, let’s talk about the underlying technology. A cache, at its core, is just a key-value store, which acts like a list of files in a folder. Each file has a different name and contains different data. General purpose caching tools, like Memcached and Redis, can be used for any type of data. They can be used to store whole pages, or just discrete chunks, like the results of a database query. Some frameworks like Rails or Django include their own caching mechanisms, allowing developers to build caching into their apps throughout the development process.
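To make the key-value idea concrete, here is a minimal Python sketch of caching the results of a database query. The `KeyValueCache` class is an in-memory stand-in, not the Memcached or Redis client API, and `run_slow_query` is a hypothetical placeholder for a real database call.

```python
import hashlib


class KeyValueCache:
    """In-memory stand-in for a key-value store (not a real client library)."""

    def __init__(self) -> None:
        self._store: dict[str, object] = {}

    def get(self, key: str):
        return self._store.get(key)  # None means "not cached"

    def set(self, key: str, value: object) -> None:
        self._store[key] = value


query_log: list[str] = []  # records each real query, for illustration


def run_slow_query(sql: str) -> list[tuple]:
    """Hypothetical expensive database call; it only records that it ran."""
    query_log.append(sql)
    return [(1, "Widget"), (2, "Gadget")]


def cached_query(cache: KeyValueCache, sql: str) -> list[tuple]:
    # The key is the "file name"; the stored rows are the "file contents".
    key = "query:" + hashlib.sha256(sql.encode("utf-8")).hexdigest()
    rows = cache.get(key)
    if rows is None:
        rows = run_slow_query(sql)
        cache.set(key, rows)
    return rows
```

Calling `cached_query` twice with the same SQL hits the database once; the second call is answered from the cache.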
One frequent case for caching involves a certain popular open-source Content Management System. I’m looking at you, WordPress. Your ubiquity is matched only by your horrific performance. Don’t get me wrong, WordPress is a fantastic tool for creating websites. It’s easy to set up, it runs everywhere, and it has a huge base of developers contributing great features at a breakneck pace. The downside is that it has no built-in page cache (its object cache only lasts for a single request by default), and plugin authors often take shortcuts that hurt performance. So, how do you build a WordPress site that will scale to thousands of users each day? Enter Varnish.
Varnish is a beast. It speaks HTTP like a web server, it comes with its own configuration language (VCL), and though it can be complex to set up, it is a powerful scaling tool. Technically, it’s what’s called a “Reverse Proxy Cache Server,” which means that it sits between your web server and the internet and acts as a mediator for all requests. Configured properly, it can mask even the slowest-performing websites by storing responses and serving them to visitors lightning-fast.
This all comes at a cost, however. Remember how I said that caches work by storing responses? That means that if your content changes, the cache will continue to serve the old content until it is forced to fetch a fresh copy. Most caching tools use a time-to-live (TTL) setting, which puts an expiration date on cached data. Whenever data is requested, the expiration date is checked. If the data has expired, the user experiences a slow(er) load time while it is fetched once again. This is one reason smaller-scale caching can work better: individual components can have different expiration dates, so no single user has to wait for everything to be rebuilt at once.
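The expiration check can be sketched in Python like this. It is a simplified illustration rather than how any particular caching tool works internally; the class name and the injectable `clock` parameter are choices made for this example, and each entry carries its own TTL so different components can expire at different times.

```python
import time
from typing import Any, Callable


class TTLCache:
    """Tiny cache where every entry has its own expiration time."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock  # injectable so expiry can be simulated
        self._entries: dict[str, tuple[Any, float]] = {}

    def get_or_fetch(self, key: str, fetch: Callable[[], Any],
                     ttl_seconds: float) -> Any:
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and now < entry[1]:
            return entry[0]  # still fresh: fast path
        # Missing or expired: this request pays for the slow fetch.
        value = fetch()
        self._entries[key] = (value, now + ttl_seconds)
        return value
```

A navigation menu might use a TTL of an hour while a list of latest posts uses a minute, so the two are rarely rebuilt in the same request.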
With Varnish, which caches whole responses, this effect becomes more pronounced, especially if the origin site is slow to begin with. A page that normally comes out of the cache in less than 500 milliseconds could take 10 seconds or more while the origin rebuilds it after expiry. This encourages site owners to set long TTLs on content so that these rebuilds occur less frequently, but a longer TTL also means that new content takes longer to show up on the site.
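A quick back-of-the-envelope calculation in Python shows the shape of this tradeoff. It assumes a single page that is requested constantly, so it is rebuilt as soon as it expires, and it borrows the 10-second rebuild figure from the example above. The function name and the numbers are illustrative, not measurements.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def ttl_tradeoff(ttl_seconds: float, rebuild_seconds: float = 10.0) -> dict[str, float]:
    """Rough upper bounds for one constantly requested page."""
    rebuilds = SECONDS_PER_DAY / ttl_seconds  # at most one rebuild per TTL window
    return {
        "max_rebuilds_per_day": rebuilds,
        "max_slow_seconds_per_day": rebuilds * rebuild_seconds,
        "max_staleness_seconds": ttl_seconds,  # how old served content can get
    }
```

Under these assumptions, a one-minute TTL allows up to 1,440 slow rebuilds a day with content at most a minute old, while a one-hour TTL cuts that to 24 rebuilds but lets content go an hour out of date.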
No large website can run without caching. It should be an integral part of your application structure, not just an afterthought. However, it’s important to know the tradeoffs between speed of information, accuracy of information, and user experience so you can choose the caching tool most appropriate for your needs.