use case conversion
dropped the configuringclients page that needs to be rewritten. updated
faq a bit (holy crap it was out of date)
dormando committed Sep 5, 2024
1 parent d88603d commit 17f1acd
Showing 3 changed files with 571 additions and 4 deletions.
249 changes: 249 additions & 0 deletions content/userguide/_index.md
@@ -3,3 +3,252 @@ title = 'User Guide'
date = 2024-09-02T11:18:33-07:00
weight = 3
+++

This basic tutorial shows via pseudocode how you can get started with integrating memcached into your application. Memcached is not an automatic application accelerator: it must be integrated into an application via code.

[The Caching Story Tutorial](/tutorialcachingstory/) is a good place to start
if you are unfamiliar with memcached.

## Basic Data Caching

The "hello world" of memcached is to fetch "something" from somewhere, maybe process it a little, then put it into the cache, to expire in N seconds.

### Initializing a Memcached Client

Read the documentation carefully for your client.

```perl
use Cache::Memcached;
my $memclient = Cache::Memcached->new({ servers => [ '10.0.0.10:11211', '10.0.0.11:11211' ]});
```

```
# pseudocode
memcli = new Memcache
memcli:add_server('10.0.0.10:11211')
```

Some rare clients will allow you to add the same servers over and over again without harm. Most will require that you carefully construct your memcached client object *once* at the start of your request, and perhaps persist it between requests. Initializing multiple times may cause memory leaks in your application or stack up connections against memcached until you cause a failure.
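A minimal Python sketch of the "construct once, reuse everywhere" rule. `MemcacheClient` is a hypothetical placeholder for your real client class, and the server list is the example one from above; `functools.lru_cache` on a zero-argument function simply memoizes the first client that gets built.

```python
import functools

SERVERS = ("10.0.0.10:11211", "10.0.0.11:11211")


class MemcacheClient:
    """Hypothetical stand-in for your real client library's class."""

    def __init__(self, servers):
        self.servers = list(servers)
        # A real client would open (or lazily open) connections here,
        # which is why building it over and over is harmful.


@functools.lru_cache(maxsize=None)
def get_client():
    # The body runs on the first call only; every later call returns
    # the exact same client object.
    return MemcacheClient(SERVERS)
```

In a threaded server, two threads racing on the very first call could each build a client; if that matters for your client, guard construction with a lock.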

### Wrapping an SQL Query

Memcached is primarily used for reducing load on SQL databases.

```
# Don't load little bobby tables
sql = "SELECT * FROM user WHERE user_id = ?"
key = 'SQL:' . user_id . ':' . md5sum(sql)
# We check if the value is 'defined', since '0' or 'FALSE' can be
# legitimate values!
if (defined result = memcli:get(key)) {
    return result
} else {
    handler = run_sql(sql, user_id)
    # Often what you get back when executing SQL is a special handler
    # object. You can't directly cache this. Stick to strings, arrays,
    # and hashes/dictionaries/tables.
    rows_array = handler:turn_into_an_array
    # Cache it for five minutes
    memcli:set(key, rows_array, 5 * 60)
    return rows_array
}
```

Wow, zippy! Once you cache these user rows, users will see that same data for up to five minutes. Unless you actively invalidate the cache when a user makes a change, it can take up to five minutes for them to see a difference.

Often this is enough to help. If you have some complex queries, such as a count of users or the number of posts in a thread, it might be acceptable to limit how often those queries can be issued by caching their results with a flat expiration time.
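Here is the same read-through pattern as a runnable Python sketch. It assumes a hypothetical dict-backed `FakeMemcache` in place of a real client (so it runs without a server) and an in-memory SQLite table named `user`; a real client would also serialize the row list for you.

```python
import hashlib
import sqlite3
import time


class FakeMemcache:
    """Hypothetical dict-backed stand-in for a memcached client."""

    def __init__(self):
        self._data = {}

    def get(self, key):
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if expires_at and time.time() >= expires_at:
            del self._data[key]  # lazily drop expired items
            return None
        return value

    def set(self, key, value, ttl=0):
        self._data[key] = (value, time.time() + ttl if ttl else 0)
        return True


def fetch_user_rows(memcli, db, user_id):
    sql = "SELECT * FROM user WHERE user_id = ?"
    key = f"SQL:{user_id}:{hashlib.md5(sql.encode('utf-8')).hexdigest()}"
    result = memcli.get(key)
    if result is not None:  # an empty list or 0 still counts as a hit
        return result
    # Cursors can't be cached; turn the rows into plain lists first.
    rows = [list(row) for row in db.execute(sql, (user_id,))]
    memcli.set(key, rows, 5 * 60)  # cache it for five minutes
    return rows
```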

### Wrapping Several Queries

The more processing that you can turn into a single memcached request, the better. Often you can replace several SQL queries with a single memcached lookup.

```
sql1 = "SELECT * FROM user WHERE user_id = ?"
sql2 = "SELECT * FROM user_preferences WHERE user_id = ?"
key = 'SQL:' . user_id . ':' . md5sum(sql1 . sql2)
if (defined result = memcli:get(key)) {
    return result
} else {
    # Remember to add error handling, kids ;)
    t = {}
    handler = run_sql(sql1, user_id)
    t['info'] = handler:turn_into_an_array
    handler = run_sql(sql2, user_id)
    t['pref'] = handler:turn_into_an_array
    # Client will magically take this hash/table/dict/etc
    # and serialize it for us.
    memcli:set(key, t, 5 * 60)
    return t
}
```

When you load a user, you fetch the user itself *and* their site preferences (whether they want to be seen by other users, what theme to show, etc). What was once two queries and possibly many rows of data is now a single cache item, cached for five minutes.
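A sketch of rolling two lookups into one cached item, assuming a hypothetical `FakeMemcache` that, like the server, only holds bytes. The `load_info` and `load_prefs` callables are placeholders for the two SQL queries, and the key scheme is made up for the example; serialization is done explicitly with JSON here rather than left to a client.

```python
import json


class FakeMemcache:
    """Hypothetical stand-in that, like the real server, stores bytes."""

    def __init__(self):
        self._data = {}

    def get(self, key):
        return self._data.get(key)

    def set(self, key, value: bytes, ttl=0):
        self._data[key] = bytes(value)  # TTL ignored in this stand-in
        return True


def load_user_bundle(memcli, user_id, load_info, load_prefs):
    """Fetch a user's info and preferences as a single cache item."""
    key = f"USER+PREFS:{user_id}"  # hypothetical key scheme
    cached = memcli.get(key)
    if cached is not None:
        return json.loads(cached)
    # Two queries' worth of rows, rolled into one dict...
    bundle = {"info": load_info(user_id), "pref": load_prefs(user_id)}
    # ...serialized once and cached for five minutes.
    memcli.set(key, json.dumps(bundle).encode("utf-8"), 5 * 60)
    return bundle
```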

### Wrapping Objects

Some languages allow you to configure objects to be serialized. Exactly how to do this in your language is beyond the scope of this document; however, some tips remain.

* Consider if you actually need to serialize a whole object. Odds are your constructor could pull from cache.
* Serialize it as efficiently and simply as possible. Spending a lot of time in object setup/teardown can burn CPU.

Also consider: if you're deserializing a huge object for a request and then using only one small part of it, you might want to cache those parts separately.
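One way to apply these tips in Python, as a sketch: cache a small dataclass holding only the fields a page needs, serialized as compact JSON, rather than pickling a whole object graph. `UserSummary` and its fields are hypothetical.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class UserSummary:
    """Hypothetical slice of a bigger user object: just what a page uses."""

    user_id: int
    name: str
    theme: str

    def to_bytes(self) -> bytes:
        # Compact, explicit serialization of a few plain fields.
        return json.dumps(asdict(self), separators=(",", ":")).encode("utf-8")

    @classmethod
    def from_bytes(cls, raw: bytes) -> "UserSummary":
        return cls(**json.loads(raw))
```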

### Fragment Caching

Once upon a time ESI (Edge Side Includes) were all the rage. Sadly, they require special proxies/caching/etc. You can do the same thing within your app, for dynamic, authenticated pages, just fine.

Memcached isn't just all about preventing database queries. You can cache
rendered HTML as well.

```
# Let's generate a bio page!
user = fetch_user_info(user_id)
bio_template = fetch_biotheme_for(user_id)
page_template = fetch_page_theme
pagedata = fetch_page_data
bio_fragment = apply_template(bio_template, user)
page = apply_template(page_template, pagedata, bio_fragment)
print "Content-Type: text/html", page
```

In this oversimplified example, we load user data (which could be using a cache!) and the raw template for the "bio" part of a webpage (which could also be using a cache!). Then we load the main template, which includes the header and footer.

Finally, we process all of that together into the main page and return it. Applying templates can be costly. If you're rendering a custom header for the viewing user, you can cache just the assembled bio fragment. If that doesn't matter, cache the whole `page` output.

```
key = 'FRAG-BIO:' . user_id
if (defined result = memcli:get(key)) {
    return result
} else {
    user = fetch_user_info(user_id)
    bio_template = fetch_biotheme_for(user_id)
    bio_fragment = apply_template(bio_template, user)
    # Cache it for five minutes
    memcli:set(key, bio_fragment, 5 * 60)
    return bio_fragment
}
```

See? Why do more work than you have to? The more you can roll up, the faster your pages will render and the happier your users will be.
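The fragment cache, sketched in Python with `string.Template` standing in for the (pretend-expensive) rendering step. `FakeMemcache`, the template markup, and the `fetch_user_info` callable are hypothetical stand-ins.

```python
from string import Template


class FakeMemcache:
    """Hypothetical dict-backed stand-in for a client (TTL ignored)."""

    def __init__(self):
        self._data = {}

    def get(self, key):
        return self._data.get(key)

    def set(self, key, value, ttl=0):
        self._data[key] = value
        return True


# Hypothetical bio template. A real page must HTML-escape user fields;
# string.Template does no escaping at all.
BIO_TEMPLATE = Template("<div class='bio'><h2>$name</h2><p>$about</p></div>")


def bio_fragment(memcli, user_id, fetch_user_info):
    """Return the rendered bio HTML, rendering only on a cache miss."""
    key = f"FRAG-BIO:{user_id}"
    fragment = memcli.get(key)
    if fragment is not None:
        return fragment
    user = fetch_user_info(user_id)  # could itself be cached
    fragment = BIO_TEMPLATE.substitute(user)  # the costly part, in real life
    memcli.set(key, fragment, 5 * 60)
    return fragment
```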

## Extended Functions

Beyond `set`, there are `add`, `incr`, `decr`, etc. They are simple commands, but they require a little finesse.

### Proper Use of `add`

`add` stores a value only if the key doesn't already exist. You use this when initializing counters, setting locks, or otherwise setting data you don't want overwritten as easily. There are, however, some odd little gotchas and race conditions in handling `add`.

```
# There can be only one
key = "the_highlander"
real_highlander = memcli:get(key)
if (! defined real_highlander) {
    # Hmm, nobody there.
    var = fetch_highlander
    if (! memcli:add(key, var, 3600)) {
        # Uh oh! Somebody beat us!
        # We can either use the variable we fetched,
        # or issue `get` again in case it might be newer.
        real_highlander = memcli:get(key)
    } else {
        # We win!
        real_highlander = var
        gloat
    }
}
return real_highlander
```
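The same `add` flow as a Python sketch, using a hypothetical `FakeMemcache` whose `add` refuses to overwrite an existing key, mirroring memcached's `NOT_STORED` reply. When `add` loses the race, this version prefers the stored value so every caller converges on one item.

```python
class FakeMemcache:
    """Hypothetical stand-in with memcached-like add: store only if absent."""

    def __init__(self):
        self._data = {}

    def get(self, key):
        return self._data.get(key)

    def add(self, key, value, ttl=0):
        if key in self._data:
            return False  # like NOT_STORED
        self._data[key] = value
        return True


def get_or_add(memcli, key, fetch, ttl=3600):
    value = memcli.get(key)
    if value is not None:
        return value
    candidate = fetch()
    if memcli.add(key, candidate, ttl):
        return candidate  # we won; our value is the stored one
    # Somebody beat us. Prefer their value; fall back to ours if it
    # has already vanished again (evicted or deleted).
    stored = memcli.get(key)
    return stored if stored is not None else candidate
```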

### Proper Use of `incr` or `decr`

`incr` and `decr` can be used to maintain counters, such as how many hits a page has received, or how many requests a user has made when you rate limit them. Both take an unsigned integer delta: `incr` adds it to the stored number and `decr` subtracts it. Counters never go negative; a `decr` that would go below zero leaves the value at `0`.

They do not, however, initialize a missing value.

```
# Got a hit!
key = 'hits:' . user_id
if (! memcli:incr(key, 1)) {
    # Whoops, key doesn't already exist!
    # There's a chance someone else just noticed this too,
    # so we use `add` instead of `set`
    if (! memcli:add(key, 1, 60 * 60 * 24)) {
        # Failed! Someone else already put it back.
        # So let's try one more time to incr.
        return memcli:incr(key, 1)
    } else {
        return success
    }
} else {
    return success
}
```

If you're not careful, you could miss counting that hit :) You can doll this up and retry a few times, or no times, depending on how important you think it is. Just don't run a `set` when you mean to do an `add` in this case.
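A Python sketch of the incr-then-add dance with a configurable number of retries. `FakeMemcache` is a hypothetical stand-in whose `incr` returns `None` on a miss, a common way for clients to report `NOT_FOUND`; check what yours actually returns.

```python
class FakeMemcache:
    """Hypothetical stand-in: incr never creates a missing key."""

    def __init__(self):
        self._data = {}

    def add(self, key, value, ttl=0):
        if key in self._data:
            return False
        self._data[key] = value
        return True

    def incr(self, key, delta=1):
        if key not in self._data:
            return None  # NOT_FOUND
        self._data[key] += delta
        return self._data[key]


def count_hit(memcli, key, retries=1):
    """Increment a counter, creating it with add() if it is missing."""
    for _ in range(retries + 1):
        new_value = memcli.incr(key, 1)
        if new_value is not None:
            return new_value
        if memcli.add(key, 1, 60 * 60 * 24):
            return 1
        # add() lost a race: someone else created the key. Loop to incr.
    return None  # gave up; this hit goes uncounted
```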

## Cache Invalidation

Levelling up in memcached requires that you learn about actively invalidating (or revalidating) your cache.

When a user comes along and edits their user data, you should attempt to keep the cache in sync in some way, so the user has no idea they're being fed cached data.

### Expiration

A good place to start is to tune your expiration times. Even if you're actively deleting or overwriting cached data, you'll still want the cache to expire occasionally, in case your app has a bug, a crash, a network blip, or some other issue where the cache could fall out of sync.

There isn't a "rule of thumb" when picking an expiration time. Sit back and think about your users, and what your data is. How long can you go without making your users angry? Be honest with yourself, as "THEY _ALWAYS_ NEED FRESH DATA" isn't necessarily true.

Expiration times are specified in unsigned integer seconds. They can be set from `0`, meaning "never expire", up to 30 days (`60*60*24*30`, or 2592000 seconds). Any time higher than 30 days is interpreted as an absolute unix timestamp. If you want to expire an object on January 1st of next year, this is how you do that.

With the binary protocol an expiration must be unsigned. If a negative expiration
is given to the ASCII protocol, it is treated as "expire immediately".
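A small Python helper, as a sketch, that turns a relative TTL into the expiration value memcached expects: anything up to 30 days is passed through as relative seconds, and anything longer is converted to an absolute unix timestamp. Rejecting negative TTLs is this sketch's own choice, not a protocol rule.

```python
import time
from typing import Optional

THIRTY_DAYS = 60 * 60 * 24 * 30  # 2592000 seconds


def to_exptime(ttl_seconds: int, now: Optional[float] = None) -> int:
    """Map a relative TTL onto memcached's expiration semantics."""
    if ttl_seconds < 0:
        raise ValueError("negative TTLs are rejected by this helper")
    if ttl_seconds <= THIRTY_DAYS:
        return ttl_seconds  # relative seconds; 0 still means "never expire"
    if now is None:
        now = time.time()
    return int(now) + ttl_seconds  # absolute unix timestamp
```

To expire at a fixed date instead, send the timestamp itself, e.g. `int(datetime(2030, 1, 1, tzinfo=timezone.utc).timestamp())`.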

### `delete`

The simplest method of invalidation is to delete the item and have your website re-cache the data the next time it's fetched.

So user Bob updates his bio. You want Bob to see his latest info when he so vainly reloads the page. So you:

```
memcli:delete('FRAG-BIO:' . user_id)
```

... and next time he loads the page, it will fetch from the database and repopulate the cache.

### `set`

The most efficient approach is to actively update your cache as your data changes. When Bob updates his bio, take Bob's bio object and shove it into the cache via `set`. You can pass the new data into the same routine that normally checks for data, or structure it however you want.

Play your cards right, and your database only ever handles writes, and data it hasn't seen in a long time.
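Both write-path strategies, sketched in Python. The `db` dict stands in for your database, `FakeMemcache` for your client, and the `BIO:` key for whatever you actually cache (all hypothetical). Either way, write the source of truth first, so the cache never holds data the database rejected.

```python
class FakeMemcache:
    """Hypothetical dict-backed stand-in for a client (TTL ignored)."""

    def __init__(self):
        self._data = {}

    def get(self, key):
        return self._data.get(key)

    def set(self, key, value, ttl=0):
        self._data[key] = value
        return True

    def delete(self, key):
        return self._data.pop(key, None) is not None


def update_bio_delete(memcli, db, user_id, bio):
    db[user_id] = bio  # write the source of truth first
    memcli.delete(f"BIO:{user_id}")  # the next read re-caches from db


def update_bio_set(memcli, db, user_id, bio):
    db[user_id] = bio
    memcli.set(f"BIO:{user_id}", bio, 5 * 60)  # cache is warm immediately
```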

### Invalidating by Tag

TODO: link to namespacing document + say how this isn't possible.

## Key Usage

Thinking about your keys can save you a lot of time and memory. Memcached is a hash, but it also remembers the full key internally. The longer your keys are, the more bytes memcached has to hash to look up your value, and the more memory it wastes storing a full copy of your key.

On the other hand, it should be easy to figure out exactly where in your code a key came from. Otherwise, many laborious hours of debugging await you.

### Avoid User Input

It's very easy to compromise memcached if you use arbitrary user input for keys. The ASCII protocol uses spaces and newlines as delimiters, so a key containing them can break the request or smuggle in extra commands. Ensure that neither shows up in your keys, and live long and prosper. The binary protocol does not have this issue.
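A conservative Python sketch for keeping user input out of raw keys: hash it into a fixed-length hex string, and check keys against the ASCII protocol's rules (no whitespace or control characters, at most 250 bytes). Restricting keys to printable ASCII is stricter than the protocol requires; the `safe_key` and `is_valid_ascii_key` names are hypothetical.

```python
import hashlib

MAX_KEY_BYTES = 250  # ASCII protocol key length limit


def safe_key(prefix: str, user_input: str) -> str:
    """Hash user input so it can never inject whitespace or newlines."""
    digest = hashlib.sha1(user_input.encode("utf-8")).hexdigest()
    key = f"{prefix}:{digest}"
    if len(key.encode("utf-8")) > MAX_KEY_BYTES:
        raise ValueError("key prefix too long")
    return key


def is_valid_ascii_key(key: str) -> bool:
    """Conservative check: 1-250 bytes of printable, non-space ASCII."""
    raw = key.encode("utf-8")
    return 0 < len(raw) <= MAX_KEY_BYTES and all(33 <= b <= 126 for b in raw)
```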

### Short Keys

64-bit UIDs are clever ways to identify a user, but suck when printed out. 18446744073709551615: 20 characters! Using base64 encoding, or even just hexadecimal, you can cut that down by quite a bit.

With the binary protocol, it's possible to store anything, so you can pack the ID's raw bytes directly into the key. This makes the key impossible to read back via the ASCII protocol, so make sure you have tools available to easily determine what a key is.
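The arithmetic, checked in Python with the standard library: the largest 64-bit ID is 20 decimal digits, 16 hex digits, and 11 characters of unpadded URL-safe base64, while the raw packed form is 8 bytes. The function names are hypothetical.

```python
import base64
import struct

uid = 18446744073709551615  # largest unsigned 64-bit ID


def hex_key(uid: int) -> str:
    return format(uid, "x")  # at most 16 characters


def b64_key(uid: int) -> str:
    raw = struct.pack(">Q", uid)  # 8 big-endian bytes
    # URL-safe alphabet avoids '+' and '/'; the '=' padding is dropped.
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


def packed_key(uid: int) -> bytes:
    # Raw bytes: only usable where the protocol accepts binary keys.
    return struct.pack(">Q", uid)
```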

### Informative Keys

```
key = 'SQL' . md5sum("SELECT blah blah blah")
```

... might be clever, but if you're looking at this key via tcpdump, strace, etc., you won't have any clue where it's coming from.

In this particular example, you could keep your SQL queries in an outside file with the md5sum next to each one. Or, more simply, append a unique query ID to the key.

```
key = 'SQL:' . query_id . ':' . md5sum("SELECT blah blah blah")
```
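A Python sketch of a key builder following the `SQL:<query_id>:<md5>` shape. The `sql_key` name and the query ID naming scheme are assumptions for illustration.

```python
import hashlib


def sql_key(query_id: str, sql: str) -> str:
    """Build 'SQL:<query_id>:<md5 of the SQL text>'.

    The readable query ID shows up in tcpdump or strace output; the md5
    keeps keys distinct if the query text changes.
    """
    digest = hashlib.md5(sql.encode("utf-8")).hexdigest()
    return f"SQL:{query_id}:{digest}"
```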
119 changes: 117 additions & 2 deletions content/userguide/faq.md
@@ -1,5 +1,120 @@
+++
title = 'FAQ'
date = 2024-09-04T14:41:27-07:00
draft = true
weight = 2
+++

## Basics

### How can you list all keys?

You can list all keys using an interface that is deliberately limited.
Applications _must not_ rely on reading all keys back from a memcached server.
A server may have millions or billions of unrelated keys. An application that
relies on looking at all keys to then render a page will eventually fail.

You can use the `lru_crawler` commands to examine all keys in an instance (for
example, `lru_crawler metadump all`). This interface provides useful data for
analyzing what is stored in a cache. See [the protocol
documentation](http://github.com/memcached/memcached/blob/master/doc/protocol.txt)
for full info.

### Why only RAM?

Everything memcached does is an attempt to guarantee latency and speed. That
said, it can make sense for some larger values to be fetched from high speed
flash drives. [A feature called extstore](/features/flashstorage/) allows
splitting items between RAM and disk storage.

### Why no complex operations?

All operations should run in O(1) time. They must be atomic. This doesn't necessarily mean complex operations can never happen, but it means we have to think very carefully about them first. Many complex operations can be emulated on top of more basic functionality.

### Why is memcached not recommended for sessions? Everyone does it!

If a session disappears, often the user is logged out. If a portion of a cache disappears, either due to a hardware crash or a simple software upgrade, it should not cause your users noticeable pain. [This overly wordy post](http://dormando.livejournal.com/495593.html) explains alternatives. Memcached can often be used to reduce IO requirements to very very little, which means you may continue to use your existing relational database for the things it's good at.

Like keeping your users from being knocked off your site.

### What about the MySQL query cache?

The MySQL query cache can be a useful start for small sites. Unfortunately it relies on global locks inside MySQL, so enabling it can throttle you down. It also caches queries per table, and has to expire everything cached for a table whenever that table changes at all. If your site is fairly static this can work out fine, but once your tables change with any frequency it immediately falls over. (It was also deprecated in MySQL 5.7 and removed in MySQL 8.0.)

Memory is also limited, since the query cache has to share RAM with the database server itself.

### Is memcached atomic?

Aside from any bugs you may come across, all commands are internally atomic. Issuing multiple sets at the same time has no ill effect, other than that the last one to arrive is the one that sticks.

### How do I troubleshoot client timeouts?

See [Timeouts](/troubleshooting/timeouts) for help.

## Setup Questions

### How do I authenticate?

Limited password-based authentication is available in [the basic protocol](http://github.com/memcached/memcached/blob/master/doc/protocol.txt). You can also enable TLS and authenticate via certificate verification.

### How do you handle failover?

You usually don't. Some clients have a "failover" option that will try the next server in the case of a failure.

- TODO: renovate this section.

### How do you handle replication?

Memcached doesn't replicate. Adding replication to the system halves your effective cache size. If you can't handle even a few percent extra cache misses, you have serious problems. Even with replication, things can break. More moving parts. More software to crash.

- TODO: renovate this section

### Can you persist cache between restarts?

Yes, in some situations. See [the documentation on warm restart](/features/restart/).

### Do clients and servers all need to talk to each other?

Nope. The less chatter, the more scalable the system.

## Monitoring

### Why Isn't curr_items Decreasing When Items Expire?

Expiration in memcached is lazy. In general, an item cannot be known to be expired until something looks at it. This helps the server keep consistent performance.

Since 1.5.0 a background thread analyzes the cache over time and
asynchronously removes expired items from memory. [See this blog post for more detail](https://memcached.org/blog/modern-lru/)

## Use Cases

### When would you not want to use memcached?

It doesn't always make sense to add memcached to your application.

TODO: link to that whynot page here or just inline new stuff?

### Why can't I use it as a database?

Memcached is an ephemeral data store. This means that if the server goes down
(crash, reboot, "cloud burps"), your data is gone. The ephemeral nature of the
software allows us to make extreme tradeoffs in design, which let us be 10x,
100x, or even 1000x faster than a traditional database. Combining caching with
traditional datastores reduces cost and improves user experience.

### Can using memcached make my application slower?

Yes, absolutely. If your DB queries are all fast and your website is already fast, adding memcached might not make it any faster.

Also, this:

```
my @post_ids = fetch_all_posts($thread_id);
my @post_entries = ();
for my $post_id (@post_ids) {
    push(@post_entries, $memc->get($post_id));
}
# Yay I have all my post entries!
```

Instead of this anti-pattern, use pipelined (multi-key) gets, such as `get_multi` in Cache::Memcached. Fetching a single item from memcached still requires a network round trip and a little processing. The more you can fetch at once, the better.
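To see why batching matters, here's a Python sketch with a hypothetical stand-in client that counts simulated round trips. Its `get_many` method belongs to the stand-in; real Python clients expose something similar (pymemcache, for one, calls it `get_many`).

```python
class CountingFakeMemcache:
    """Hypothetical stand-in that counts simulated network round trips."""

    def __init__(self, data):
        self._data = dict(data)
        self.round_trips = 0

    def get(self, key):
        self.round_trips += 1
        return self._data.get(key)

    def get_many(self, keys):
        self.round_trips += 1  # one request carries every key
        return {k: self._data[k] for k in keys if k in self._data}


def load_posts_one_by_one(memcli, post_ids):
    # The anti-pattern: one round trip per post.
    return [memcli.get(f"post:{pid}") for pid in post_ids]


def load_posts_batched(memcli, post_ids):
    found = memcli.get_many([f"post:{pid}" for pid in post_ids])
    # Keep the caller's order; misses come back as None to be filled in.
    return [found.get(f"post:{pid}") for pid in post_ids]
```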