
How does caching work in qpixel? Where are the entry points?

+1
−0

Caching is really important for scaling. I know we cache some data -- something about tags (tags for a post?) has been discussed on meta, and I know the "new posts" silver dot (badge) is cached so we don't check it on every page-load.

How and where is caching implemented? If I want to add a new indicator that I think will probably need to be cached, like an indicator for pending suggested edits, how should I approach that in a performant way?


1 answer

+2
−0

How is caching done in QPixel?

Note: this answer is based on my understanding of the code after setting up a self-hosted instance.

QPixel caches data in several places. The primary one is a Redis cache, which is also configured as the Rails cache (Rails.cache in the code, though you may also see RequestContext.redis in some places).

Terminology

Request: when you go to any page on the site, your browser sends a request to the server (well, actually, many requests). Each stylesheet, JavaScript file and image also results in a separate request to the server. Essentially, a single user actively using the site generates on the order of hundreds of requests per minute.

Rails

Rails caches a number of things on its own. For example, within a single request it will not run the same database query twice (unless it knows that the results may have changed). This is not cached across requests, though. It mainly makes it easier to write efficient code: if you request all the posts which the user can see in two places on one page, the second lookup is served from the cache rather than hitting the database again.
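To make the per-request idea concrete, here is a minimal plain-Ruby sketch. It is not how Rails implements its query cache internally; RequestQueryCache and its methods are hypothetical names, and the "database" is just a block that the caller supplies. It only shows the behaviour described above: identical reads are answered from memory for the lifetime of one request object, and any write throws the remembered results away.

```ruby
# RequestQueryCache is a hypothetical illustration, not a Rails class.
# One instance is meant to live for exactly one request.
class RequestQueryCache
  attr_reader :db_hits

  # run_query is whatever actually talks to the database.
  def initialize(&run_query)
    @run_query = run_query
    @results = {}
    @db_hits = 0
  end

  # Identical SQL strings are only sent to the database once.
  def select(sql)
    return @results[sql] if @results.key?(sql)

    @db_hits += 1
    @results[sql] = @run_query.call(sql)
  end

  # A write may change any result, so forget everything remembered so far.
  def write(sql)
    @results.clear
    @run_query.call(sql)
  end
end

request_cache = RequestQueryCache.new { |sql| "rows for: #{sql}" }
request_cache.select('SELECT * FROM posts')
request_cache.select('SELECT * FROM posts') # served from memory
puts request_cache.db_hits                  # prints 1
```

A new request would start with a fresh, empty instance, which is why nothing here is shared across requests.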

Redis

Redis can be seen as a large key-value store. You put things in under a key (e.g. in QPixel: production://<community_id>/<user_id>/<category>/last_visit) that uniquely identifies your item, and with which you can retrieve it again later. Depending on the setup, you may also set an expiration time. For QPixel, expiration seems to be handled by Rails or manually. For example, the post types that a category accepts are cached, and when the post types of a category are updated, the old cache entry is deleted and a new one is written.

In most cases, cache retrieval has a fallback in case the value is not in the cache. This looks like this in the code:

Rails.cache.fetch 'something', expires_in: ... do
  # It was not in the cache, actually retrieve it from the database here. The value you return from this block will be put into the cache
  # ...
end
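If you want to play with these semantics without a Rails app, the following is a small stand-in written in plain Ruby. TinyCache is a hypothetical class, not part of Rails or QPixel, and the explicit now: argument exists only so that expiry can be demonstrated without sleeping. It mimics the two behaviours that matter here: the block only runs on a miss, and an entry stops counting once its expires_in has passed.

```ruby
# TinyCache is a hypothetical in-memory stand-in for Rails.cache.
class TinyCache
  Entry = Struct.new(:value, :expires_at)

  def initialize
    @entries = {}
  end

  # Returns the stored value if it is present and not expired.
  # Otherwise runs the block, stores its result and returns it.
  def fetch(key, expires_in: nil, now: Time.now)
    entry = @entries[key]
    return entry.value if entry && (entry.expires_at.nil? || now < entry.expires_at)

    value = yield
    @entries[key] = Entry.new(value, expires_in ? now + expires_in : nil)
    value
  end

  # Manual invalidation, for entries that are cached "until changed".
  def delete(key)
    @entries.delete(key)
  end
end

cache = TinyCache.new
db_calls = 0
load_pinned_links = lambda do
  db_calls += 1 # pretend this is the expensive database query
  ['Help center', 'Meta']
end

start = Time.now
cache.fetch('pinned_links', expires_in: 7200, now: start) { load_pinned_links.call }        # miss
cache.fetch('pinned_links', expires_in: 7200, now: start + 60) { load_pinned_links.call }   # hit
cache.fetch('pinned_links', expires_in: 7200, now: start + 7201) { load_pinned_links.call } # expired
puts db_calls # prints 2
```

The 7200 seconds match the two hours listed for pinned links below; the link names are made up.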

All items stored in the redis cache

A lot of places mention Rails.cache, but only about 30 actually fetch from the cache. All of these are separated by community (unless indicated otherwise) but shared among all users and requests (unless indicated otherwise). For things like short-cached advertisements, the code block selects a new random item once the old one expires. The items retrieved from the Redis cache are (at the time of writing):

(This would be a great place for a spoiler if that was supported :P)

  • AWS Bucket name for images (until cleared) (NOTE: it's probably faster to retrieve in Rails than from Redis)
  • Advertisement of a specific post (5 minutes)
  • The currently used community (1 hour) (NOTE: to prevent the otherwise very frequent database lookup for determining the current community)
  • Pinned links (2 hours)
  • Hot questions (4 hours)
  • Categories to show in the header (until categories are added/changed)
  • Lottery questions in a category (24 hours)
  • Article advertisements (1 hour)
  • Network advertisement (1 hour, network wide)
  • Community advertisement (1 hour)
  • Question advertisement (1 hour)
  • Available post types (until cleared, network wide)
  • Autogenerated user avatars for users with no avatar set (24 hours, network wide) (NOTE: image with letter on coloured background)
  • Whether there are "new" posts for a particular user (5 minutes) (NOTE: based on the last visit, is responsible for adding the circle next to a category in case there are new things there)
  • All available categories by name in lowercase (until categories are added/changed)
  • All available categories by id (until categories are added/changed)
  • The reputation changes that various actions have (until rep is changed, network wide) (NOTE: configurable in the admin panel, cached because used often).
  • Score calculations which are relevant for unlocking abilities (3 hours) (NOTE: when you gain rep, you can unlock abilities. The calculations are cached since they are expensive to do)
  • Reactions that are available for each post type (until cleared)
  • Post type ids (until cleared, network wide)
  • Post types marked as top level (until cleared, network wide)
  • Post types marked as second level (until cleared, network wide)
  • All site settings (until changed)
  • User metrics (24 hours) (NOTE: Amount of: questions, top level posts, second level posts, received votes, received sum of score, votes cast, edits)
  • Post description of a specific post (until cleared (by edit?))
  • Whether the network is read only (?, network wide)
  • Promoted posts (?, network wide)
  • Global banner text (?, network wide) (NOTE: If there is a network-wide information banner set)
  • Last save date/time of each post (?, network wide, specific to a user?)

Most of these values enter the cache the first time they are requested. Redis may also automatically remove entries from the cache when it runs out of memory (it uses as much memory as is available), based on heuristics (less requested = gone quicker).

Tags cache

A post has a tags cache, which is just a string with the names of all the tags linked to the post, separated by spaces. This is in contrast to the actual tags associated with a post, which are stored not as a string but as separate relations in the database (post_id X, tag_id Y).

I'm not 100% sure why this exists, but my best guess is that it may be expensive to traverse all the tag associations every time they are displayed in posts (posts can have many tags, which could result in many additional database queries). The post display page (or a result in the posts list of a search or category) generally does not need to traverse them, since the single tags_cache string already holds all the info that the page needs. The actual associations are only necessary when asking things like "what are all the posts linked to a specific tag".
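As a rough illustration of that trade-off, here is a plain-Ruby sketch. The Tag and PostTag structs and both helper methods are hypothetical stand-ins for the database tables, not QPixel's actual models, and the space-separated format is taken from the description above. Building the string walks the associations once (at save time, say), while displaying the post only needs to split the string.

```ruby
# Hypothetical stand-ins for the tags table and the post/tag join table.
Tag = Struct.new(:id, :name)
PostTag = Struct.new(:post_id, :tag_id)

# Walks the associations for one post and flattens them into a single string.
# This is the expensive part, which only has to happen when the tags change.
def build_tags_cache(post_id, post_tags, tags_by_id)
  post_tags.select { |pt| pt.post_id == post_id }
           .map { |pt| tags_by_id.fetch(pt.tag_id).name }
           .join(' ')
end

# Displaying a post only needs the string, with no extra lookups.
def tag_names(tags_cache)
  tags_cache.split(' ')
end

tags_by_id = { 1 => Tag.new(1, 'rails'), 2 => Tag.new(2, 'caching') }
post_tags = [PostTag.new(10, 1), PostTag.new(10, 2), PostTag.new(11, 2)]

tags_cache = build_tags_cache(10, post_tags, tags_by_id)
puts tags_cache                    # prints "rails caching"
puts tag_names(tags_cache).inspect # prints ["rails", "caching"]
```

The reverse question ("which posts have the tag caching?") still needs the PostTag rows, which matches the guess above about why both representations exist.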

How to approach adding something which should be cached?

Rails and Redis make this relatively easy. First you need to answer two questions:

  1. How long should this thing stay in the cache? (Is it time based, or will you clear it yourself once you know its value has changed?)
  2. Under what key should it be stored in the cache?

The key should contain all the information which uniquely identifies the thing you are storing. If it is specific to a category, the category id should be in there. If it is specific to a user, the user id should be in there. Finally, it needs a name to identify the thing you put in (e.g. ad, lottery_questions, flag_score, post_description, ...).
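One way to keep keys consistent is a small helper that assembles them from those parts. The sketch below is only an illustration: cache_key is a hypothetical method, and the segment order and the '/' separator are my own choices, not QPixel's actual key format.

```ruby
# Hypothetical helper: builds a key from whatever scopes apply, followed by the name.
# Scopes that are not given are simply left out of the key.
def cache_key(name, community_id: nil, category_id: nil, user_id: nil)
  parts = []
  parts << "community/#{community_id}" if community_id
  parts << "category/#{category_id}" if category_id
  parts << "user/#{user_id}" if user_id
  parts << name
  parts.join('/')
end

puts cache_key('lottery_questions', community_id: 1, category_id: 7)
# prints community/1/category/7/lottery_questions
puts cache_key('post_type_ids')
# prints post_type_ids
```

The point is that two values which can differ must never end up under the same key: if the helper is called with a user_id for one user, another user's value cannot overwrite it.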

At the location where you would normally determine the information you want to cache, you wrap your code in a Rails.cache.fetch block. For example:

def some_method
  a = <some really expensive calculation>
  
  ...
end

would become

def some_method
  a = Rails.cache.fetch '<my key>', expires_in: <time> do
    <some really expensive calculation>
  end

  ...
end

The same holds if this code is somewhere in a view (though then that code should probably be in the controller) or anywhere else in the code base.
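Applied to the pending suggested edits indicator from the question, the pattern could look roughly like the sketch below. Everything in it is hypothetical: SketchCache stands in for Rails.cache (without expiry, for brevity), an array of hashes stands in for the suggested edits table, and the key name is made up. It follows the "clear it when you know the value changed" strategy rather than a time-based one.

```ruby
# SketchCache is a hypothetical stand-in for Rails.cache, without expiry handling.
class SketchCache
  def initialize
    @data = {}
  end

  # Runs the block only when the key is absent (a cached false still counts as a hit).
  def fetch(key)
    return @data[key] if @data.key?(key)

    @data[key] = yield
  end

  def delete(key)
    @data.delete(key)
  end
end

# Hypothetical indicator: "are there pending suggested edits in this community?"
class PendingEditsIndicator
  attr_reader :db_queries

  def initialize(cache, edits)
    @cache = cache
    @edits = edits # array of hashes standing in for the database table
    @db_queries = 0
  end

  def pending?(community_id)
    @cache.fetch(key(community_id)) do
      @db_queries += 1 # the expensive lookup only happens on a cache miss
      @edits.any? { |e| e[:community_id] == community_id && e[:status] == :pending }
    end
  end

  # Call this wherever a suggested edit is created, approved or rejected.
  def edit_changed(community_id)
    @cache.delete(key(community_id))
  end

  private

  def key(community_id)
    "community/#{community_id}/pending_suggested_edits"
  end
end

edits = [{ community_id: 1, status: :pending }]
indicator = PendingEditsIndicator.new(SketchCache.new, edits)
indicator.pending?(1)
indicator.pending?(1)
puts indicator.db_queries # prints 1
```

The risk with this strategy is forgetting a call to edit_changed somewhere, which leaves a stale indicator until the entry is cleared. Adding a short expiry on top is a common safety net.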

If the cached value should be network wide, then add include_community: false to the call to fetch.
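To show what that option changes, here is a plain-Ruby sketch of a cache wrapper that prefixes keys with the current community unless told otherwise. NamespacedSketchCache is a hypothetical class and its key layout is an assumption; only the include_community option name comes from the text above, and this is not QPixel's actual implementation.

```ruby
# Hypothetical wrapper: one shared store, one wrapper instance per community.
class NamespacedSketchCache
  def initialize(store, community_id)
    @store = store
    @community_id = community_id
  end

  # By default the key is scoped to the community; with include_community: false
  # every community reads and writes the same network-wide entry.
  def fetch(name, include_community: true)
    key = include_community ? "#{@community_id}/#{name}" : "network/#{name}"
    return @store[key] if @store.key?(key)

    @store[key] = yield
  end
end

store = {}
writing = NamespacedSketchCache.new(store, 1)
cooking = NamespacedSketchCache.new(store, 2)

writing.fetch('hot_questions') { ['W1'] }
cooking.fetch('hot_questions') { ['C1'] } # separate entry per community

writing.fetch('post_type_ids', include_community: false) { [1, 2] }
shared = cooking.fetch('post_type_ids', include_community: false) { [9] }
puts shared.inspect     # prints [1, 2]
puts store.keys.inspect # prints ["1/hot_questions", "2/hot_questions", "network/post_type_ids"]
```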
