Post History

Q&A How does caching work in qpixel? Where are the entry points?

posted 3y ago by Taeir‭ · edited 3y ago by trichoplax‭

Answer

#2: Post edited by

trichoplax‭ · 2023-02-01T03:37:43Z (about 3 years ago)
Add details HTML tag as a substitute for a spoiler

# How is caching done in QPixel?
Note: this answer is based on my understanding of the code after setting up a self-hosted instance.
Caching happens in QPixel in multiple locations. The primary one is a so-called Redis cache, which is also set as the Rails cache (`Rails.cache` in the code, but you may also see `RequestContext.redis` in some places).
### Terminology
**Request**: when you go to any page on the site, your browser sends a request to the server (actually, many requests). Each stylesheet, JavaScript file and image also results in a separate request to the server. Essentially, a single user actively using the site generates on the order of hundreds of requests per minute.
## Rails
Rails caches a bunch of things as it sees fit. For example, within a single request it will not run the same database query twice (unless it knows that the results may have changed). This is not cached across requests though. This mainly makes it easier to write efficient code. If you request all the posts which the user can see in two places on one page, they will be loaded from cache the second time rather than hitting the database again.
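
To make the per-request behaviour concrete, here is a minimal sketch in plain Ruby. It is not how Rails implements its query cache; the `RequestQueryCache` class and its method names are hypothetical, made up purely for illustration. The only idea it shows is that identical reads are answered from memory and that any write throws the remembered results away.

```ruby
# Hypothetical stand-in for a per-request query cache (not Rails' real code).
class RequestQueryCache
  attr_reader :database_hits

  def initialize
    @results = {}
    @database_hits = 0
  end

  # Return the remembered result for identical SQL, or run the block once.
  def select(sql)
    @results.fetch(sql) do
      @database_hits += 1
      @results[sql] = yield
    end
  end

  # A write may change any result, so forget everything that was remembered.
  def write
    @results.clear
    yield
  end
end

cache = RequestQueryCache.new
2.times { cache.select('SELECT * FROM posts') { %w[post1 post2] } }
puts cache.database_hits # prints 1: the second identical read never hit the database
```

A new `RequestQueryCache` would be created for every request, which is why nothing carries over between requests.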
## Redis
Redis can be seen as a large key-value store. You put things in with some key (e.g. in QPixel: `production://<community_id>/<user_id>/<category>/last_visit`) that uniquely identifies your item, and with which you can retrieve it again later. Depending on the setup, you may also indicate an expiration time. For QPixel, the expiration seems to be handled by Rails or manually. For example, the post types that a category accepts are cached, and when the post types of a category are updated the old cache entry is deleted and a new one is written.
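
The "delete the old entry and write a new one" approach can be sketched as follows. This is an illustration only: plain Hashes stand in for both Redis and the database, and the `CategoryPostTypes` class and its key format are hypothetical rather than taken from the QPixel code.

```ruby
# Hypothetical sketch: one Hash plays the cache, another plays the database.
class CategoryPostTypes
  def initialize
    @cache = {}
    @database = {} # category id => list of post type names
  end

  # Read through the cache, falling back to the "database" on a miss.
  def read(category_id)
    key = key_for(category_id)
    @cache.fetch(key) { @cache[key] = @database.fetch(category_id, []) }
  end

  # On update, drop the stale entry and write the fresh value.
  def update(category_id, post_types)
    @database[category_id] = post_types
    key = key_for(category_id)
    @cache.delete(key)
    @cache[key] = post_types
  end

  private

  def key_for(category_id)
    "category_post_types/#{category_id}"
  end
end

store = CategoryPostTypes.new
store.read(1)                        # caches the empty list
store.update(1, %w[Question Answer]) # replaces the cached entry
puts store.read(1).inspect           # prints ["Question", "Answer"]
```
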
In most cases, cache retrieval has a fallback in case the value is not in the cache. In the code it looks like this:
```ruby
Rails.cache.fetch 'something', expires_in: ... do
  # It was not in the cache, actually retrieve it from the database here. The value you return from this block will be put into the cache
  # ...
end
```
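
If you want to play with this pattern outside of Rails, the sketch below mimics it in plain Ruby. `TinyCache` is a hypothetical class written for this explanation, not part of Rails or QPixel, and it only imitates the behaviour described above: return the stored value on a hit, otherwise run the block and store what it returns, optionally with an expiry in seconds.

```ruby
# Hypothetical in-memory store that imitates the fetch-with-fallback pattern.
class TinyCache
  Entry = Struct.new(:value, :expires_at)

  # The clock is injectable so that expiry can be demonstrated without sleeping.
  def initialize(clock: -> { Time.now })
    @entries = {}
    @clock = clock
  end

  def fetch(key, expires_in: nil)
    entry = @entries[key]
    return entry.value if entry && !expired?(entry)

    # Cache miss (or expired entry): run the block and remember its result.
    value = yield
    expires_at = expires_in ? @clock.call + expires_in : nil
    @entries[key] = Entry.new(value, expires_at)
    value
  end

  private

  def expired?(entry)
    entry.expires_at && @clock.call >= entry.expires_at
  end
end

cache = TinyCache.new
calls = 0
2.times { cache.fetch('something', expires_in: 300) { calls += 1; 'from the database' } }
puts calls # prints 1: the block only ran on the first call
```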
### All items stored in the Redis cache
A lot of places mention `Rails.cache`, but only about 30 are fetching from the cache. All of these are separated by community (unless indicated otherwise) but shared among all users and requests (unless indicated otherwise). For short-lived entries such as advertisements, once the entry expires the code block selects a new random item.
<details>
<summary>All the items retrieved from the Redis cache (at the time of writing)</summary>

* AWS Bucket name for images (until cleared) (NOTE: it's probably faster to retrieve in Rails than from Redis)
* Advertisement of a specific post (5 minutes)
* The currently used community (1 hour) (NOTE: to prevent the otherwise very frequent database lookup for determining the current community)
* Pinned links (2 hours)
* Hot questions (4 hours)
* Categories to show in the header (until categories are added/changed)
* Lottery questions in a category (24 hours)
* Article advertisements (1 hour)
* Network advertisement (1 hour, network wide)
* Community advertisement (1 hour)
* Question advertisement (1 hour)
* Available post types (until cleared, network wide)
* Autogenerated user avatars for users with no avatar set (24 hours, network wide) (NOTE: image with letter on coloured background)
* Whether there are "new" posts for a particular user (5 minutes) (NOTE: based on the last visit, is responsible for adding the circle next to a category in case there are new things there)
* All available categories by name in lowercase (until categories are added/changed)
* All available categories by id (until categories are added/changed)
* The reputation changes that various actions have (until rep is changed, network wide) (NOTE: configurable in the admin panel, cached because used often)
* Score calculations which are relevant for unlocking abilities (3 hours) (NOTE: when you gain rep, you can unlock abilities. The calculations are cached since they are expensive to do)
* Reactions that are available for each post type (until cleared)
* Post type ids (until cleared, network wide)
* Post types marked as top level (until cleared, network wide)
* Post types marked as second level (until cleared, network wide)
* All site settings (until changed)
* User metrics (24 hours) (NOTE: Amount of: questions, top level posts, second level posts, received votes, received sum of score, votes cast, edits)
* Post description of a specific post (until cleared (by edit?))
* Whether the network is read only (?, network wide)
* Promoted posts (?, network wide)
* Global banner text (?, network wide) (NOTE: If there is a network-wide information banner set)
* Last save date/time of each post (?, network wide, specific to a user?)
</details>
Most of these values enter the cache the first time they are requested. Redis may also remove entries automatically when it runs out of memory (it uses as much memory as it is allowed to), using heuristics that depend on its configuration (roughly: less requested = gone quicker).
## Tags cache
A post has a tags cache, which is just a string with all the tags linked to the post listed by name, separated by spaces. This is in contrast to the actual tags associated with a post, which are stored not as a string but as separate relations in the database (post_id X, tag_id Y).
I'm not 100% sure why this exists, but my best guess is that it may be expensive to traverse all the tag associations every time to display them in posts (posts can have many tags, which could result in many additional database queries). The post display page (or a result in the posts list of a search or category) generally does not need to traverse them, since the single tags_cache string already holds all the info that the page needs. The actual associations are only necessary when looking at "what are all the posts linked to a specific tag" and stuff like that.
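
As a rough illustration of the difference, the sketch below uses simplified, hypothetical structs (the real QPixel models look different) to compare reading tag names via the association rows with reading them from the space-separated string described above.

```ruby
# Hypothetical, simplified records; the real QPixel models differ.
Tag = Struct.new(:id, :name)
PostTag = Struct.new(:post_id, :tag_id)
Post = Struct.new(:id, :title, :tags_cache)

TAGS = [Tag.new(1, 'rails'), Tag.new(2, 'caching')].freeze
POST_TAGS = [PostTag.new(10, 1), PostTag.new(10, 2)].freeze

# Slow path: walk the association rows, then look up each tag.
def tag_names_via_associations(post)
  tag_ids = POST_TAGS.select { |pt| pt.post_id == post.id }.map(&:tag_id)
  TAGS.select { |tag| tag_ids.include?(tag.id) }.map(&:name)
end

# Fast path: split the denormalised string stored on the post itself.
def tag_names_via_cache(post)
  post.tags_cache.split
end

post = Post.new(10, 'How does caching work?', 'rails caching')
puts tag_names_via_cache(post).inspect        # prints ["rails", "caching"]
puts tag_names_via_associations(post).inspect # prints ["rails", "caching"]
```

Both paths give the same names, but the second one needs nothing beyond the post row that was already loaded.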
# How to approach adding something which should be cached?
Rails and Redis make this relatively easy. First you need to answer two questions:
1. How long should this thing be in the cache? (Time based, or will you clear it yourself once you know its value has changed?)
2. Under what key should it be stored in the cache?
The key should contain all information which uniquely identifies the thing you are storing. If it is specific to a category, the category id should be in there. If it is specific to a user, the user id should be in there. Finally, it needs a name to identify the thing you put in (e.g. ad, lottery_questions, flag_score, post_description, ...).
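
A key built along those lines could look like the sketch below. The `cache_key` helper and the exact key layout are hypothetical, chosen only to show the "ids first, name last" idea; QPixel does not necessarily build its keys this way.

```ruby
# Hypothetical helper; not taken from the QPixel code base.
def cache_key(name, category_id: nil, user_id: nil)
  parts = []
  parts << "category/#{category_id}" if category_id
  parts << "user/#{user_id}" if user_id
  parts << name
  parts.join('/')
end

puts cache_key('lottery_questions', category_id: 3)       # prints category/3/lottery_questions
puts cache_key('last_visit', category_id: 3, user_id: 42) # prints category/3/user/42/last_visit
```
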
At the location where you would normally determine the information you want to cache, you wrap your code in a `Rails.cache.fetch` block. For example:
```ruby
def some_method
  a = <some really expensive calculation>

  ...
end
```
would become
```ruby
def some_method
  a = Rails.cache.fetch '<my key>', expires_in: <time> do
    <some really expensive calculation>
  end

  ...
end
```
The same holds if this code is somewhere in a view (though then that code should probably be in the controller) or anywhere else in the code base.
If the cached value should be network wide, then add `include_community: false` to the call to fetch.
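
To show what such an option could do under the hood, here is a plain Ruby sketch. The `NamespacedCache` class is hypothetical and only imitates the idea; the key layout follows the `production://<community_id>/...` example given earlier, and the real QPixel implementation may differ.

```ruby
# Hypothetical wrapper showing how a community prefix could be added to keys.
class NamespacedCache
  def initialize(environment:, community_id:, store: {})
    @environment = environment
    @community_id = community_id
    @store = store # shared Hash standing in for Redis
  end

  # Prefix the key with the community id unless the value is network wide.
  def namespaced_key(key, include_community: true)
    prefix = include_community ? @community_id : nil
    "#{@environment}://#{[prefix, key].compact.join('/')}"
  end

  def fetch(key, include_community: true)
    full_key = namespaced_key(key, include_community: include_community)
    @store.fetch(full_key) { @store[full_key] = yield }
  end
end

shared = {}
first = NamespacedCache.new(environment: 'production', community_id: 1, store: shared)
second = NamespacedCache.new(environment: 'production', community_id: 2, store: shared)

first.fetch('banner', include_community: false) { 'maintenance tonight' }
puts second.fetch('banner', include_community: false) { 'never computed' } # prints maintenance tonight
puts first.namespaced_key('ad')                                            # prints production://1/ad
```

Without `include_community: false`, the two communities would each get their own entry under their own prefix.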

#1: Initial revision by

Taeir‭ · 2022-09-03T16:19:23Z (over 3 years ago)

# How is caching done in QPixel?
Note: this answer is based on my understanding of the code after setting up a self-hosted instance.

Caching happens in QPixel in multiple locations. The primary one is a so-called Redis cache, which is also set as the Rails cache (`Rails.cache` in the code, but you may also see `RequestContext.redis` in some places).

### Terminology
**Request**: when you go to any page on the site, your browser sends a request to the server (actually, many requests). Each stylesheet, JavaScript file and image also results in a separate request to the server. Essentially, a single user actively using the site generates on the order of hundreds of requests per minute.

## Rails
Rails caches a bunch of things as it sees fit. For example, within a single request it will not run the same database query twice (unless it knows that the results may have changed). This is not cached across requests though. This mainly makes it easier to write efficient code. If you request all the posts which the user can see in two places on one page, they will be loaded from cache the second time rather than hitting the database again.

## Redis

Redis can be seen as a large key-value store. You put things in with some key (e.g. in QPixel: `production://<community_id>/<user_id>/<category>/last_visit`) that uniquely identifies your item, and with which you can retrieve it again later. Depending on the setup, you may also indicate an expiration time. For QPixel, the expiration seems to be handled by Rails or manually. For example, the post types that a category accepts are cached, and when the post types of a category are updated the old cache entry is deleted and a new one is written.

In most cases, cache retrieval has a fallback in case the value is not in the cache. In the code it looks like this:

```ruby
Rails.cache.fetch 'something', expires_in: ... do
  # It was not in the cache, actually retrieve it from the database here. The value you return from this block will be put into the cache
  # ...
end
```

### All items stored in the Redis cache

A lot of places mention `Rails.cache`, but only about 30 are fetching from the cache. All of these are separated by community (unless indicated otherwise) but shared among all users and requests (unless indicated otherwise). For short-lived entries such as advertisements, once the entry expires the code block selects a new random item. All the items retrieved from the Redis cache are (at the time of writing):

(This would be a great place for a spoiler if that was supported :P)

* AWS Bucket name for images (until cleared) (NOTE: it's probably faster to retrieve in Rails than from Redis)
* Advertisement of a specific post (5 minutes)
* The currently used community (1 hour) (NOTE: to prevent the otherwise very frequent database lookup for determining the current community)
* Pinned links (2 hours)
* Hot questions (4 hours)
* Categories to show in the header (until categories are added/changed)
* Lottery questions in a category (24 hours)
* Article advertisements (1 hour)
* Network advertisement (1 hour, network wide)
* Community advertisement (1 hour)
* Question advertisement (1 hour)
* Available post types (until cleared, network wide)
* Autogenerated user avatars for users with no avatar set (24 hours, network wide) (NOTE: image with letter on coloured background)
* Whether there are "new" posts for a particular user (5 minutes) (NOTE: based on the last visit, is responsible for adding the circle next to a category in case there are new things there)
* All available categories by name in lowercase (until categories are added/changed)
* All available categories by id (until categories are added/changed)
* The reputation changes that various actions have (until rep is changed, network wide) (NOTE: configurable in the admin panel, cached because used often)
* Score calculations which are relevant for unlocking abilities (3 hours) (NOTE: when you gain rep, you can unlock abilities. The calculations are cached since they are expensive to do)
* Reactions that are available for each post type (until cleared)
* Post type ids (until cleared, network wide)
* Post types marked as top level (until cleared, network wide)
* Post types marked as second level (until cleared, network wide)
* All site settings (until changed)
* User metrics (24 hours) (NOTE: Amount of: questions, top level posts, second level posts, received votes, received sum of score, votes cast, edits)
* Post description of a specific post (until cleared (by edit?))
* Whether the network is read only (?, network wide)
* Promoted posts (?, network wide)
* Global banner text (?, network wide) (NOTE: If there is a network-wide information banner set)
* Last save date/time of each post (?, network wide, specific to a user?)

Most of these values enter the cache the first time they are requested. Redis may also remove entries automatically when it runs out of memory (it uses as much memory as it is allowed to), using heuristics that depend on its configuration (roughly: less requested = gone quicker).

## Tags cache
A post has a tags cache, which is just a string with all the tags linked to the post listed by name, separated by spaces. This is in contrast to the actual tags associated with a post, which are stored not as a string but as separate relations in the database (post_id X, tag_id Y).

I'm not 100% sure why this exists, but my best guess is that it may be expensive to traverse all the tag associations every time to display them in posts (posts can have many tags, which could result in many additional database queries). The post display page (or a result in the posts list of a search or category) generally does not need to traverse them, since the single tags_cache string already holds all the info that the page needs. The actual associations are only necessary when looking at "what are all the posts linked to a specific tag" and stuff like that.

# How to approach adding something which should be cached?
Rails and Redis make this relatively easy. First you need to answer two questions:

1. How long should this thing be in the cache? (Time based, or will you clear it yourself once you know its value has changed?)
2. Under what key should it be stored in the cache?

The key should contain all information which uniquely identifies the thing you are storing. If it is specific to a category, the category id should be in there. If it is specific to a user, the user id should be in there. Finally, it needs a name to identify the thing you put in (e.g. ad, lottery_questions, flag_score, post_description, ...).

At the location where you would normally determine the information you want to cache, you wrap your code in a `Rails.cache.fetch` block. For example:

```ruby
def some_method
  a = <some really expensive calculation>
  
  ...
end
```

would become

```ruby
def some_method
  a = Rails.cache.fetch '<my key>', expires_in: <time> do
    <some really expensive calculation>
  end

  ...
end
```

The same holds if this code is somewhere in a view (though then that code should probably be in the controller) or anywhere else in the code base.

If the cached value should be network wide, then add `include_community: false` to the call to fetch.
