use case conversion

dropped the configuringclients page that needs to be rewritten. updated faq a bit (holy crap it was out of date)
memcached · Sep 5, 2024 · 17f1acd · 17f1acd
1 parent d88603d
commit 17f1acd
Showing 3 changed files with 571 additions and 4 deletions.
diff --git a/content/userguide/_index.md b/content/userguide/_index.md
@@ -3,3 +3,252 @@ title = 'User Guide'
 date = 2024-09-02T11:18:33-07:00
 weight = 3
 +++
+
+This basic tutorial shows via pseudocode how you can get started with integrating memcached into your application. Memcached is not an automatic application accelerator: it must be integrated into an application via code.
+
+[The Caching Story Tutorial](/tutorialcachingstory/) is a good place to start
+if you are unfamiliar with memcached.
+
+## Basic Data Caching
+
+The "hello world" of memcached is to fetch "something" from somewhere, maybe process it a little, then put it into the cache, to expire in N seconds.
+
+### Initializing a Memcached Client
+
+Read the documentation carefully for your client.
+
+```perl
+my $memclient = Cache::Memcached->new({ servers => [ '10.0.0.10:11211', '10.0.0.11:11211' ]});
+```
+
+```
+ # pseudocode
+memcli = new Memcache
+memcli:add_server('10.0.0.10:11211')
+```
+
+Some rare clients will allow you to add the same servers over and over again, without harm. Most will require that you carefully construct your memcached client object *once* at the start of your request, and perhaps persist it between requests. Initializing multiple times may cause memory leaks in your application or stack up connections against memcached until you cause a failure.
+
+### Wrapping an SQL Query
+
+Memcached is primarily used for reducing load on SQL databases.
+
+```
+ # Don't load little bobby tables
+sql = "SELECT * FROM user WHERE user_id = ?"
+key = 'SQL:' . user_id . ':' . md5sum(sql)
+ # We check if the value is 'defined', since '0' or 'FALSE' can be
+ # legitimate values!
+if (defined result = memcli:get(key)) {
+	return result
+} else {
+	handler = run_sql(sql, user_id)
+	# Often what you get back when executing SQL is a special handler
+	# object. You can't directly cache this. Stick to strings, arrays,
+	# and hashes/dictionaries/tables
+	rows_array = handler:turn_into_an_array
+	# Cache it for five minutes
+	memcli:set(key, rows_array, 5 * 60)
+	return rows_array
+}
+```
+
+Wow, zippy! When you cache these user rows, users will see that same data for up to five minutes. Unless you actively invalidate the cache when a user makes a change, it can take up to five minutes for them to see a difference.
+
+Often this is enough to help. If you have some complex queries, such as a count of users or the number of posts in a thread, it might be acceptable to limit how often those queries can be issued by having a flat cache.
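+
+The pseudocode above can be sketched in Python. This is a minimal sketch, not a real client: `FakeCache`, `run_sql`, and `fetch_user` are hypothetical names, and `FakeCache` is an in-process stand-in for a memcached client. The part worth noticing is the miss check: a sentinel separates "not in the cache" from legitimately falsy cached values such as `0` or an empty list.
+
+```python
+import hashlib
+import time
+
+_MISS = object()  # sentinel: distinguishes a cache miss from a falsy value
+
+
+class FakeCache:
+    """Hypothetical in-process stand-in for a memcached client."""
+
+    def __init__(self):
+        self._data = {}
+
+    def get(self, key, default=None):
+        entry = self._data.get(key)
+        if entry is None:
+            return default
+        value, expires_at = entry
+        if expires_at <= time.time():
+            del self._data[key]  # lazily drop the expired entry
+            return default
+        return value
+
+    def set(self, key, value, ttl):
+        self._data[key] = (value, time.time() + ttl)
+
+
+def run_sql(sql, user_id):
+    """Hypothetical database call; returns rows as plain lists."""
+    run_sql.calls += 1
+    return [[user_id, "bobby"]]
+
+
+run_sql.calls = 0
+
+
+def fetch_user(cache, user_id):
+    sql = "SELECT * FROM user WHERE user_id = ?"
+    key = "SQL:%d:%s" % (user_id, hashlib.md5(sql.encode()).hexdigest())
+    result = cache.get(key, _MISS)
+    if result is not _MISS:
+        return result  # cache hit, even if the value is falsy
+    rows = run_sql(sql, user_id)
+    cache.set(key, rows, 5 * 60)  # cache for five minutes
+    return rows
+
+
+cache = FakeCache()
+first = fetch_user(cache, 42)
+second = fetch_user(cache, 42)  # served from the cache, no second query
+```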
+
+### Wrapping Several Queries
+
+The more processing that you can turn into a single memcached request, the better. Often you can replace several SQL queries with a single memcached lookup.
+
+```
+sql1 = "SELECT * FROM user WHERE user_id = ?"
+sql2 = "SELECT * FROM user_preferences WHERE user_id = ?"
+key  = 'SQL:' . user_id . ':' . md5sum(sql1 . sql2)
+if (defined result = memcli:get(key)) {
+	return result
+} else {
+	# Remember to add error handling, kids ;)
+	handler = run_sql(sql1, user_id)
+	t[info] = handler:turn_into_an_array
+	handler = run_sql(sql2, user_id)
+	t[pref] = handler:turn_into_an_array
+	# Client will magically take this hash/table/dict/etc
+	# and serialize it for us.
+	memcli:set(key, t, 5 * 60)
+	return t
+}
+```
+
+When you load a user, you fetch the user itself *and* their site preferences (whether they want to be seen by other users, what theme to show, etc). What was once two queries and possibly many rows of data, is now a single cache item, cached for five minutes.
+
+### Wrapping Objects
+
+Some languages allow you to configure objects to be serialized. Exactly how to do this in your language is beyond the scope of this document; however, a few tips apply everywhere.
+
+ * Consider if you actually need to serialize a whole object. Odds are your constructor could pull from cache.
+ * Serialize it as efficiently and simply as possible. Spending a lot of time in object setup/teardown can drag CPU.
+
+Further consider, if you're deserializing a huge object for a request, and then using one small part of it, you might want to cache those parts separately.
+
+### Fragment Caching
+
+Once upon a time ESI (Edge Side Includes) were all the rage. Sadly they require special proxies/caching/etc. You can do this within your app for dynamic, authenticated pages just fine.
+
+Memcached isn't just all about preventing database queries. You can cache
+rendered HTML as well.
+
+```
+ # Let's generate a bio page!
+user          = fetch_user_info(user_id)
+bio_template  = fetch_biotheme_for(user_id)
+page_template = fetch_page_theme
+pagedata      = fetch_page_data
+
+bio_fragment = apply_template(bio_template, user)
+page         = apply_template(page_template, bio_fragment)
+print "Content-Type: text/html", page
+```
+
+In this oversimplified example, we're loading user data (which could be using a cache!) and the raw template for the "bio" part of a webpage (which could be using a cache!). Then we load the main template, which includes the header and footer.
+
+Finally, we process all that together into the main page and return it. Applying templates can be costly. You can cache the assembled bio fragment, in case you're rendering a custom header for the viewing user. Or if it doesn't matter, cache the whole 'page' output.
+
+```
+key = 'FRAG-BIO:' . user_id
+if (result = memcli:get(key)) {
+	return result
+} else {
+	user         = fetch_user_info(user_id)
+	bio_template = fetch_biotheme_for(user_id)
+	bio_fragment = apply_template(bio_template, user)
+	memcli:set(key, bio_fragment, 5 * 15)
+	return bio_fragment
+}
+```
+
+See? Why do more work than you have to? The more you can roll up, the faster pages will render and the happier your users will be.
+
+## Extended Functions
+
+Beyond 'set', there are add, incr, decr, etc. They are simple commands but require a little finesse.
+
+### Proper Use of `add`
+
+`add` allows you to set a value if it doesn't already exist. You use this when initializing counters, setting locks, or otherwise setting data you don't want overwritten as easily. There can be some odd little gotchas and race conditions in the handling of `add`, however.
+
+```
+ # There can be only one
+key = "the_highlander"
+real_highlander = memcli:get(key)
+if (! real_highlander) {
+	# Hmm, nobody there.
+	var = fetch_highlander
+	if (! memcli:add(key, var, 3600)) {
+		# Uh oh! Somebody beat us!
+		# We can either use the variable we fetched,
+		# or issue `get` again in case it might be newer.
+		real_highlander = memcli:get(key)
+	} else {
+		# We win! Use the value we just stored.
+		real_highlander = var
+		gloat
+	}
+}
+return real_highlander
+```
+
+### Proper Use of `incr` or `decr`
+
+`incr` and `decr` commands can be used to maintain counters, such as how many hits a page has received or how many requests a rate-limited user has made. These commands let you add (`incr`) or subtract (`decr`) an amount of 1 or higher; the amount itself is an unsigned integer, so use `decr` rather than a negative amount.
+
+They do not, however, initialize a missing value.
+
+```
+# Got a hit!
+key = 'hits:' . user_id
+if (! memcli:incr(key, 1)) {
+	# Whoops, key doesn't already exist!
+	# There's a chance someone else just noticed this too,
+	# so we use `add` instead of `set`
+	if (! memcli:add(key, 1, 60 * 60 * 24)) {
+		# Failed! Someone else already put it back.
+		# So let's try one more time to incr.
+		memcli:incr(key, 1)
+	} else {
+		return success
+	}
+} else {
+	return success
+}
+```
+
+If you're not careful, you could miss counting that hit :) You can doll this up and retry a few times, or no times, depending on how important you think it is. Just don't run a `set` when you mean to do an `add` in this case.
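+
+The incr-then-add dance can be wrapped in a small retry loop. The following is a minimal Python sketch under stated assumptions: `FakeCounterCache` and `count_hit` are hypothetical names, and the stand-in assumes that `incr` returns `None` for a missing key and that `add` returns `False` when the key already exists. Check how your own client reports those two cases.
+
+```python
+class FakeCounterCache:
+    """Hypothetical in-process stand-in for a memcached client."""
+
+    def __init__(self):
+        self._data = {}
+
+    def incr(self, key, delta):
+        if key not in self._data:
+            return None  # incr never creates a missing key
+        self._data[key] += delta
+        return self._data[key]
+
+    def add(self, key, value, ttl):
+        # ttl is accepted but ignored by this stand-in
+        if key in self._data:
+            return False  # somebody else stored it first
+        self._data[key] = value
+        return True
+
+
+def count_hit(cache, user_id, retries=1):
+    """Count one hit; returns the new total, or None if we gave up."""
+    key = "hits:%d" % user_id
+    for _ in range(retries + 1):
+        value = cache.incr(key, 1)
+        if value is not None:
+            return value
+        # Key is missing: use add, not set, so a racing writer is not clobbered.
+        if cache.add(key, 1, 60 * 60 * 24):
+            return 1
+    return None
+
+
+counter_cache = FakeCounterCache()
+totals = [count_hit(counter_cache, 7) for _ in range(3)]
+```
+
+Raising `retries` trades a little extra work for a smaller chance of dropping a hit under contention.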
+
+## Cache Invalidation
+
+Levelling up in memcached requires that you learn about actively invalidating (or revalidating) your cache.
+
+When a user comes along and edits their user data, you should be attempting to keep the cache in sync some way, so the user has no idea they're being fed cached data.
+
+### Expiration
+
+A good place to start is to tune your expiration times. Even if you're actively deleting or overwriting cached data, you'll still want to have the cache expire occasionally, in case your app has a bug, a crash, a network blip, or some other issue where the cache could become out of sync.
+
+There isn't a "rule of thumb" when picking an expiration time. Sit back and think about your users, and what your data is. How long can you go without making your users angry? Be honest with yourself, as "THEY _ALWAYS_ NEED FRESH DATA" isn't necessarily true.
+
+Expiration times are specified in unsigned integer seconds. They can be set from `0`, meaning "never expire", to 30 days `(60*60*24*30)`. Any time higher than 30 days is interpreted as a Unix timestamp. If you want to expire an object on January 1st of next year, pass the Unix timestamp for that date.
+
+For the binary protocol an expiration must be unsigned. If a negative expiration
+is given to the ASCII protocol, it is treated as "expire immediately".
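+
+The relative/absolute split can be illustrated with a little arithmetic. This is a minimal Python sketch: `is_absolute` and `next_new_year_exptime` are hypothetical helper names, and the sample date is chosen only for illustration.
+
+```python
+import datetime
+
+THIRTY_DAYS = 60 * 60 * 24 * 30  # 2592000 seconds
+
+
+def is_absolute(exptime):
+    """True if the value would be read as a Unix timestamp, not an offset."""
+    return exptime > THIRTY_DAYS
+
+
+def next_new_year_exptime(now):
+    """Unix timestamp for January 1st (UTC) of the year after `now`."""
+    target = datetime.datetime(now.year + 1, 1, 1, tzinfo=datetime.timezone.utc)
+    return int(target.timestamp())
+
+
+now = datetime.datetime(2024, 9, 5, tzinfo=datetime.timezone.utc)
+exptime = next_new_year_exptime(now)  # 1735689600
+```
+
+Passing `exptime` as the expiration expires the item at that moment, while a value such as `5 * 60` stays a relative offset.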
+
+### `delete`
+
+The simplest method of invalidation is to delete the cached item, and have your website re-cache the data the next time it's fetched.
+
+So user Bob updates his bio. You want Bob to see his latest info when he so vainly reloads the page. So you:
+
+```
+memcli:delete('FRAG-BIO:' . user_id)
+```
+
+... and next time he loads the page, it will fetch from the database and repopulate the cache.
+
+### `set`
+
+The most efficient idea is to actively update your cache as your data changes. When Bob updates his bio, take Bob's bio object and shove it into the cache via `set`. You can pass the new data into the same routine that normally checks for data, or however you want to structure it.
+
+Play your cards right, and your database only ever handles writes, and data it hasn't seen in a long time.
+
+### Invalidating by Tag
+
+TODO: link to namespacing document + say how this isn't possible.
+
+## Key Usage
+
+Thinking about your keys can save you a lot of time and memory. Memcached is a hash, but it also remembers the full key internally. The longer your keys are, the more bytes memcached has to hash to look up your value, and the more memory it wastes storing a full copy of your key.
+
+On the other hand, it should be easy to figure out exactly where in your code a key came from. Otherwise many laborious hours of debugging await you.
+
+### Avoid User Input
+
+It's very easy to compromise memcached if you use arbitrary user input for keys. The ASCII protocol uses spaces and newlines as delimiters. Ensure that neither shows up in your keys, and live long and prosper. The binary protocol does not have this issue.
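+
+One way to enforce this is to check keys before they reach the client. The following is a minimal Python sketch: `is_safe_key` and `key_for_user_input` are hypothetical helpers, and hashing the user-supplied part is just one option, not something memcached requires. The 250-byte limit and the ban on whitespace and control characters come from the ASCII protocol.
+
+```python
+import hashlib
+
+MAX_KEY_BYTES = 250  # key length limit in the ASCII protocol
+
+
+def is_safe_key(key):
+    """Reject empty, overlong, or whitespace/control-character keys."""
+    raw = key.encode("utf-8")
+    if not raw or len(raw) > MAX_KEY_BYTES:
+        return False
+    # Bytes up to 0x20 cover space, tab, CR, and LF; 0x7f is DEL.
+    return all(b > 0x20 and b != 0x7F for b in raw)
+
+
+def key_for_user_input(prefix, user_input):
+    """Build a key from untrusted text by hashing it first."""
+    digest = hashlib.sha256(user_input.encode("utf-8")).hexdigest()
+    return prefix + ":" + digest
+
+
+search_key = key_for_user_input("search", "hello world\r\nset x 0 0 1")
+```
+
+The hashed key keeps a readable prefix, so it is still possible to tell where in the code it came from.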
+
+### Short Keys
+
+64-bit UIDs are clever ways to identify a user, but suck when printed out: 18446744073709551615 is 20 characters! Using base64 encoding, or even just hexadecimal, you can cut that down by quite a bit.
+
+With the binary protocol, it's possible to store anything, so you can directly pack 4 bytes into the key. This makes it impossible to read back via the ASCII protocol, so you should have tools available to easily determine what a key is.
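+
+The savings are easy to measure. This minimal Python sketch compares the decimal, hexadecimal, and base64 forms of the largest 64-bit ID; the variable names are illustrative, and the URL-safe base64 alphabet with padding stripped is an assumption made here for compactness.
+
+```python
+import base64
+import struct
+
+uid = 18446744073709551615  # largest 64-bit unsigned value
+
+decimal_key = str(uid)           # 20 characters
+hex_key = format(uid, "x")       # 16 characters
+packed = struct.pack(">Q", uid)  # 8 raw bytes, big-endian
+# URL-safe base64 without '=' padding: 11 characters
+b64_key = base64.urlsafe_b64encode(packed).rstrip(b"=").decode("ascii")
+
+lengths = (len(decimal_key), len(hex_key), len(b64_key), len(packed))
+```
+
+Only the raw `packed` form needs the binary protocol; the hex and base64 forms are safe ASCII keys.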
+
+### Informative Keys
+
+```
+key = 'SQL' . md5sum("SELECT blah blah blah")
+```
+
+... might be clever, but if you're looking at this key via tcpdump, strace, etc., you won't have any clue where it's coming from.
+
+In this particular example, you may put your SQL queries into an outside file with the md5sum next to them. Or, more simply, append a unique query ID to the key.
+
+```
+key = 'SQL' . query_id . ':' . md5sum("SELECT blah blah blah")
+```
diff --git a/content/userguide/faq.md b/content/userguide/faq.md
@@ -1,5 +1,120 @@
 +++
-title = 'Faq'
+title = 'FAQ'
 date = 2024-09-04T14:41:27-07:00
-draft = true
+weight = 2
 +++
+
+## Basics
+
+### How can you list all keys?
+
+You can list all keys using an interface that is deliberately limited.
+Applications _must not_ rely on reading all keys back from a memcached server.
+A server may have millions or billions of unrelated keys. An application that
+relies on looking at all keys to then render a page will eventually fail.
+
+You can use the `lru crawler` to examine all keys in an instance. This
+interface provides useful data for doing an analysis on data that is stored in
+a cache. See [the protocol
+documentation](http://github.com/memcached/memcached/blob/master/doc/protocol.txt)
+for full info.
+
+### Why only RAM?
+
+Everything memcached does is an attempt to guarantee latency and speed. That
+said, it can make sense for some larger values to be fetched from high speed
+flash drives. [A feature called extstore](/features/flashstorage/) allows
+splitting items between RAM and disk storage.
+
+### Why no complex operations?
+
+All operations should run in O(1) time. They must be atomic. This doesn't necessarily mean complex operations can never happen, but it means we have to think very carefully about them first. Many complex operations can be emulated on top of more basic functionality.
+
+### Why is memcached not recommended for sessions? Everyone does it!
+
+If a session disappears, often the user is logged out. If a portion of a cache disappears, either due to a hardware crash or a simple software upgrade, it should not cause your users noticeable pain. [This overly wordy post](http://dormando.livejournal.com/495593.html) explains alternatives. Memcached can often be used to reduce IO requirements to very little, which means you may continue to use your existing relational database for the things it's good at.
+
+Like keeping your users from being knocked off your site.
+
+### What about the MySQL query cache?
+
+The MySQL query cache can be a useful start for small sites. Unfortunately it uses many global locks on the MySQL database, so enabling it can throttle you down. It also caches queries per table, and has to expire the entire cache related to a table whenever that table changes at all. If your site is fairly static this can work out fine, but when your tables start changing with any frequency it immediately falls over.
+
+Memory is also limited, as the query cache uses a chunk of the RAM on your database server.
+
+### Is memcached atomic?
+
+Aside from any bugs you may come across, all commands are internally atomic. Issuing multiple sets at the same time has no ill effect, aside from the last one in being the one that sticks.
+
+### How do I troubleshoot client timeouts?
+
+See [Timeouts](/troubleshooting/timeouts) for help.
+
+## Setup Questions
+
+### How do I authenticate?
+
+Limited password-based authentication is available in [the basic protocol](http://github.com/memcached/memcached/blob/master/doc/protocol.txt). You can also enable TLS and authenticate by certificate verification.
+
+### How do you handle failover?
+
+You usually don't. Some clients have a "failover" option that will try the next server in the case of a failure.
+
+- TODO: renovate this section.
+
+### How do you handle replication?
+
+Memcached doesn't. Adding replication to the system halves your effective cache size. If you can't handle even a few percent extra cache misses, you have serious problems. Even with replication, things can break. More moving parts. Software to crash.
+
+- TODO: renovate this section
+
+### Can you persist cache between restarts?
+
+Yes, in some situations. See [the documentation on warm restart](/features/restart/).
+
+### Do clients and servers all need to talk to each other?
+
+Nope. The less chatter, the more scalable the system.
+
+## Monitoring
+
+### Why Isn't curr_items Decreasing When Items Expire?
+
+Expiration in memcached is lazy. In general, an item cannot be known to be expired until something looks at it. This helps the server keep consistent performance.
+
+Since 1.5.0 a background thread analyzes the cache over time and
+asynchronously removes expired items from memory. [See this blog post for more detail](https://memcached.org/blog/modern-lru/).
+
+## Use Cases
+
+### When would you not want to use memcached?
+
+It doesn't always make sense to add memcached to your application.
+
+TODO: link to that whynot page here or just inline new stuff?
+
+### Why can't I use it as a database?
+
+Memcached is an ephemeral data store, meaning that if the server goes down
+(crash, reboot, "cloud burps") then your data is gone. The ephemeral nature of
+the software allows us to make extreme tradeoffs in design which allow us to be
+10x, 100x, or even 1000x faster than a traditional database. Combining caching
+with traditional datastores allows reducing cost and improving user
+experience.
+
+### Can using memcached make my application slower?
+
+Yes, absolutely. If your DB queries are all fast and your website is fast, adding memcached might not make it faster.
+
+Also, this:
+
+```
+my @post_ids = fetch_all_posts($thread_id);
+my @post_entries = ();
+for my $post_id (@post_ids) {
+	push(@post_entries, $memc->get($post_id));
+}
+# Yay I have all my post entries!
+```
+
+Avoid this anti-pattern and use pipelined gets instead. Fetching a single item from memcached still requires a network roundtrip and a little processing. The more you can fetch at once the better.
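+
+The difference in round trips can be sketched in Python. This is a minimal sketch, not a real client: `CountingCache`, `get_many`, and `fetch_posts` are hypothetical names, and the stand-in only counts how many calls would have crossed the network. Real clients expose batched fetches under names that vary by client, so check yours.
+
+```python
+class CountingCache:
+    """Hypothetical stand-in that counts simulated network round trips."""
+
+    def __init__(self, data):
+        self._data = dict(data)
+        self.round_trips = 0
+
+    def get(self, key):
+        self.round_trips += 1  # one round trip per key
+        return self._data.get(key)
+
+    def get_many(self, keys):
+        self.round_trips += 1  # one round trip for the whole batch
+        return {k: self._data[k] for k in keys if k in self._data}
+
+
+def fetch_posts(cache, post_ids):
+    """Fetch all posts in one batched request; misses come back as None."""
+    keys = ["post:%d" % post_id for post_id in post_ids]
+    found = cache.get_many(keys)
+    return [found.get(key) for key in keys]
+
+
+post_cache = CountingCache({"post:1": "first!", "post:2": "second"})
+posts = fetch_posts(post_cache, [1, 2, 3])
+```
+
+Here three posts cost one round trip instead of three, and the caller can fall back to the database for any entry that came back as `None`.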