Mutable Data Caching

(EDIT: I see now that you mean something different; let’s add that case at the end of this post.)

So this is the case I was comparing against in the post above, where I illustrated the request paths.
It is also how caching normally works: after some time, or some event (e.g. the cache filling up), the cache is cleared.

I was exploring how caching would work in combination with pubsub, compared with regular caching plus a version check (which someone suggested above as a way to maximize consistency). I wanted to compare the two because someone else suggested pubsub, so there were two suggestions, and I thought it’d be interesting to analyse the differences.

I went through how the load would differ depending on the request pattern. I’ll write it out here again; hopefully the additional formatting makes it clearer what I mean:

Data is requested by A; it is stored at C and passes through B. In both of the illustrated examples the data is cached, so B now has a copy.

TTL = cache time to live (also subscription time to live)
r = Number of requests within TTL
u = Number of updates within TTL
s_v1 = data size of the request for version
s_v2 = data size of the response for version
s_d = data size of the requested data
distance = A => B => C = C => B => A

Example 1. (Regular Caching with a version check):
If A’s request rate for the data is r, and every request triggers a version request from B to C, then that consumes more than the pubsub variant if r > u.
Every request is a round trip: s_v1 goes r / TTL times A => B => C, and s_v2 goes r - 1 times C => B => A, then once more in the same direction with s_d (if the version request returns the data when it has changed).

Example 2. (Caching in combination with pubsub):
The data would go u / TTL times C => B => A, so s_d would be transferred over the distance u / TTL times.

We simplify and assume s_v1 ~= s_v2 and call it s_v.

Load Example 1: (2 * r / TTL - 1) * distance * s_v + r / TTL * distance * s_d
Load Example 2: u / TTL * distance * s_d

We get LE1 > LE2 for r >= u. (i.e. pubsub gives less load)

If s_d >> s_v, then the version traffic is negligible and we are really only comparing r and u, where pubsub can give less load if r > u.
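To make the comparison easier to play with, here is a minimal sketch that just transcribes the two load formulas above as posted. The function names and the sample numbers are made up for illustration, and TTL is set to 1 so that r / TTL and u / TTL are simply the counts within one TTL window.

```python
def load_version_check(r, ttl, distance, s_v, s_d):
    """Load for Example 1 (regular caching with a version check),
    transcribed from the formula in this post."""
    return (2 * r / ttl - 1) * distance * s_v + r / ttl * distance * s_d


def load_pubsub(u, ttl, distance, s_d):
    """Load for Example 2 (caching combined with pubsub),
    transcribed from the formula in this post."""
    return u / ttl * distance * s_d


# Hypothetical sample values, chosen only for illustration.
ttl = 1         # normalised so r and u are counts per TTL window
distance = 2    # A => B => C, counted as two hops
s_v = 1         # size of a version request/response (s_v1 ~= s_v2)
s_d = 100       # size of the requested data, with s_d >> s_v
r, u = 10, 2    # more requests than updates within the TTL

le1 = load_version_check(r, ttl, distance, s_v, s_d)
le2 = load_pubsub(u, ttl, distance, s_d)
print("LE1 (version check):", le1)
print("LE2 (pubsub):", le2)
print("pubsub gives less load:", le2 < le1)
```

Changing r and u in the sketch is a quick way to see how the two formulas move relative to each other for a given data size.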

If cached data is, more often than not, requested only once and not again within the cache TTL (which has been proposed as a reasonable assumption), then pubsub is more demanding than regular caching with a version check.
This is a bit simplified, because it depends on the data transferred in a version check and the actual requested data size.

Edit: I see what you mean now, with invalidating instead of updating.

The costs depend on what the TTL is and the values for r and u.
If we assume r >= u, then we also have LE3 > LE2.

But let’s add the example of the push from data source, requesting cache to be cleared instead of pushing out the data:

Example 3. (Push on update invalidates cache):
// todo
