From 73c1143de7701ad8501e729d406dd4c7ba14c964 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Ronny=20Lo=CC=81pez?= Date: Tue, 15 Jan 2013 22:57:53 +0100
Subject: [PATCH 0001/2573] Fixed command BITCOUNT arguments.

---
 commands.json | 11 +++--------
 1 file changed, 3 insertions(+), 8 deletions(-)

diff --git a/commands.json b/commands.json
index 1fc881e944..bff4fdbb2e 100644
--- a/commands.json
+++ b/commands.json
@@ -45,14 +45,9 @@
         "type": "key"
       },
       {
-        "name": "start",
-        "type": "integer",
-        "optional": true
-      },
-      {
-        "name": "end",
-        "type": "integer",
-        "optional": true
+        "name": ["start", "end"],
+        "type": ["integer", "integer"],
+        "multiple": true
       }
     ],
     "since": "2.6.0",

From 49da29860dae9730861ab1e1dd8774490fe53546 Mon Sep 17 00:00:00 2001
From: John Weir Date: Thu, 17 Jan 2013 14:49:32 -0500
Subject: [PATCH 0002/2573] Document Pub/Sub and db number scope

---
 topics/pubsub.md | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/topics/pubsub.md b/topics/pubsub.md
index 9765f47efb..d8772db905 100644
--- a/topics/pubsub.md
+++ b/topics/pubsub.md
@@ -49,6 +49,16 @@ issued by another client.
 The second element is the name of the originating channel, and the third
 argument is the actual message payload.
 
+## Database & Scoping
+
+Pub/Sub has no relation to the key space. It was made to not interfere with
+it on any level, including database numbers.
+
+Publishing on db 10 will be heard by a subscriber on db 1.
+
+If you need scoping of some kind, prefix the channels with the name of the
+environment (test, staging, production, ...).
+
 ## Wire protocol example
 
     SUBSCRIBE first second

From 3e00d2f5c1cda5a712e3d5d331990c108da3fad9 Mon Sep 17 00:00:00 2001
From: george Date: Fri, 8 Feb 2013 23:43:02 +0900
Subject: [PATCH 0003/2573] add examples for set-related store commands

---
 commands/sdiffstore.md  | 13 +++++++++++++
 commands/sinterstore.md | 13 +++++++++++++
 commands/sunionstore.md | 13 +++++++++++++
 3 files changed, 39 insertions(+)

diff --git a/commands/sdiffstore.md b/commands/sdiffstore.md
index db95908556..e941016742 100644
--- a/commands/sdiffstore.md
+++ b/commands/sdiffstore.md
@@ -6,3 +6,16 @@ If `destination` already exists, it is overwritten.
 @return
 
 @integer-reply: the number of elements in the resulting set.
+
+@examples
+
+```cli
+SADD key1 "a"
+SADD key1 "b"
+SADD key1 "c"
+SADD key2 "c"
+SADD key2 "d"
+SADD key2 "e"
+SDIFFSTORE key key1 key2
+SMEMBERS key
+```
diff --git a/commands/sinterstore.md b/commands/sinterstore.md
index 26d6e3f381..17dd0bf0b4 100644
--- a/commands/sinterstore.md
+++ b/commands/sinterstore.md
@@ -6,3 +6,16 @@ If `destination` already exists, it is overwritten.
 @return
 
 @integer-reply: the number of elements in the resulting set.
+
+@examples
+
+```cli
+SADD key1 "a"
+SADD key1 "b"
+SADD key1 "c"
+SADD key2 "c"
+SADD key2 "d"
+SADD key2 "e"
+SINTERSTORE key key1 key2
+SMEMBERS key
+```
diff --git a/commands/sunionstore.md b/commands/sunionstore.md
index f3bf959c5d..74df06071f 100644
--- a/commands/sunionstore.md
+++ b/commands/sunionstore.md
@@ -6,3 +6,16 @@ If `destination` already exists, it is overwritten.
 @return
 
 @integer-reply: the number of elements in the resulting set.
+
+@examples
+
+```cli
+SADD key1 "a"
+SADD key1 "b"
+SADD key1 "c"
+SADD key2 "c"
+SADD key2 "d"
+SADD key2 "e"
+SUNIONSTORE key key1 key2
+SMEMBERS key
+```

From ef0e67c09af63dc8b808c620fccff24c0b11e661 Mon Sep 17 00:00:00 2001
From: george Date: Sat, 16 Feb 2013 12:04:44 +0900
Subject: [PATCH 0004/2573] fix typo - s/an hash/a hash

---
 topics/memory-optimization.md | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/topics/memory-optimization.md b/topics/memory-optimization.md
index 5b5bea9621..e2efeee280 100644
--- a/topics/memory-optimization.md
+++ b/topics/memory-optimization.md
@@ -45,10 +45,10 @@ where values can just be just strings, that is not just more memory
 efficient than Redis plain keys but also much more memory efficient than
 memcached.
 
 Let's start with some fact: a few keys use a lot more memory than a single key
-containing an hash with a few fields. How is this possible? We use a trick.
+containing a hash with a few fields. How is this possible? We use a trick.
 In theory in order to guarantee that we perform lookups in constant time
 (also known as O(1) in big O notation) there is the need to use a data structure
-with a constant time complexity in the average case, like an hash table.
+with a constant time complexity in the average case, like a hash table.
 
 But many times hashes contain just a few fields. When hashes are small we can
 instead just encode them in an O(N) data structure, like a linear
@@ -60,7 +60,7 @@ it contains will grow too much (you can configure the limit in redis.conf).
 This does not work well just from the point of view of time complexity, but
 also from the point of view of constant times, since a linear array of key
 value pairs happens to play very well with the CPU cache (it has a better
-cache locality than an hash table).
+cache locality than a hash table).
 
 However since hash fields and values are not (always) represented as full
 featured Redis objects, hash fields can't have an associated time to live
@@ -168,7 +168,7 @@ of your keys and values:
 
     hash-max-zipmap-value 1024
 
-Every time an hash will exceed the number of elements or element size specified
+Every time a hash will exceed the number of elements or element size specified
 it will be converted into a real hash table, and the memory saving will be
 lost.
You may ask, why don't you do this implicitly in the normal key space so that From e86c70f25482e188a97d22a6c21afe7e3fd0d76d Mon Sep 17 00:00:00 2001 From: Antonio Ognio Date: Sun, 24 Feb 2013 11:13:17 -0500 Subject: [PATCH 0005/2573] =?UTF-8?q?Adding=20br=C3=BCkva=20to=20the=20lis?= =?UTF-8?q?t=20of=20Python=20Redis=20clients?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit --- clients.json | 9 +++++++++ 1 file changed, 9 insertions(+) diff --git a/clients.json b/clients.json index 6d7a7623de..75dfc54f21 100644 --- a/clients.json +++ b/clients.json @@ -381,6 +381,15 @@ "active": true }, + { + "name": "brukva", + "language": "Python", + "repository": "https://github.com/evilkost/brukva", + "description": "Asynchronous Redis client that works within Tornado IO loop", + "authors": ["evilkost"], + "active": true + }, + { "name": "scala-redis", "language": "Scala", From 9087dd36d3a0b355a0b658dabcc5d484be98ec35 Mon Sep 17 00:00:00 2001 From: Markus Rothe Date: Sun, 3 Mar 2013 09:19:50 +0000 Subject: [PATCH 0006/2573] URL of Radix changed --- clients.json | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/clients.json b/clients.json index 6d7a7623de..639c65f0da 100644 --- a/clients.json +++ b/clients.json @@ -93,7 +93,7 @@ { "name": "Radix", "language": "Go", - "repository": "https://github.com/fzzbt/radix", + "repository": "https://github.com/fzzy/radix", "description": "MIT licensed Redis client.", "authors": ["fzzbt"], "recommended": true, From d4bbbdc5e5ec92bf0ac400ac681866b3671a98fa Mon Sep 17 00:00:00 2001 From: antirez Date: Tue, 5 Mar 2013 20:07:16 +0100 Subject: [PATCH 0007/2573] Redis cluster spec updated: from 4096 to 16384 hash slots. --- topics/cluster-spec.md | 34 +++++++++++++++++----------------- 1 file changed, 17 insertions(+), 17 deletions(-) diff --git a/topics/cluster-spec.md b/topics/cluster-spec.md index dc525e6d35..5831cd0c52 100644 --- a/topics/cluster-spec.md +++ b/topics/cluster-spec.md @@ -26,8 +26,8 @@ subset of the features available in the Redis stand alone server. In Redis cluster there are no central or proxy nodes, and one of the major design goals is linear scalability. -Redis cluster sacrifices fault tolerance for consistence, so the system -try to be as consistent as possible while guaranteeing limited resistance +Redis cluster sacrifices fault tolerance for consistency, so the system +tries to be as consistent as possible while guaranteeing limited resistance to net splits and node failures (we consider node failures as special cases of net splits). @@ -46,10 +46,10 @@ operations like Set type unions or intersections are not implemented, and in general all the operations where in theory keys are not available in the same node are not implemented. -In the future there is the possibility to add a new kind of node called a -Computation Node to perform multi-key read only operations in the cluster, -but it is not likely that the Redis cluster itself will be able -to perform complex multi key operations implementing some kind of +In the future it is possible that using the MIGRATE COPY command users will +be able to use *Computation Nodes* to perform multi-key read only operations +in the cluster, but it is not likely that the Redis Cluster itself will be +able to perform complex multi key operations implementing some kind of transparent way to move keys around. 
Redis Cluster does not support multiple databases like the stand alone version @@ -82,11 +82,11 @@ keys and nodes can improve the performance in a sensible way. Keys distribution model --- -The key space is split into 4096 slots, effectively setting an upper limit -for the cluster size of 4096 nodes (however the suggested max size of -nodes is in the order of a few hundreds). +The key space is split into 16384 slots, effectively setting an upper limit +for the cluster size of 16384 nodes (however the suggested max size of +nodes is in the order of ~ 1000 nodes). -All the master nodes will handle a percentage of the 4096 hash slots. +All the master nodes will handle a percentage of the 16384 hash slots. When the cluster is **stable**, that means that there is no a cluster reconfiguration in progress (where hash slots are moved from one node to another) a single hash slot will be served exactly by a single node @@ -95,7 +95,7 @@ it in the case of net splits or failures). The algorithm used to map keys to hash slots is the following: - HASH_SLOT = CRC16(key) mod 4096 + HASH_SLOT = CRC16(key) mod 16384 * Name: XMODEM (also known as ZMODEM or CRC-16/ACORN) * Width: 16 bit @@ -109,9 +109,9 @@ The algorithm used to map keys to hash slots is the following: A reference implementation of the CRC16 algorithm used is available in the Appendix A of this document. -12 out of 16 bit of the output of CRC16 are used. +14 out of 16 bit of the output of CRC16 are used. In our tests CRC16 behaved remarkably well in distributing different kind of -keys evenly across the 4096 slots. +keys evenly across the 16384 slots. Cluster nodes attributes --- @@ -320,7 +320,7 @@ only ask the next query to the specified node. This is needed because the next query about hash slot 8 can be about the key that is still in A, so we always want that the client will try A and -then B if needed. Since this happens only for one hash slot out of 4096 +then B if needed. Since this happens only for one hash slot out of 16384 available the performance hit on the cluster is acceptable. However we need to force that client behavior, so in order to make sure @@ -380,11 +380,11 @@ a node that is now in a failure state). Once the configuration is processed the node enters one of the following states: -* FAIL: the cluster can't work. When the node is in this state it will not serve queries at all and will return an error for every query. This state is entered when the node detects that the current nodes are not able to serve all the 4096 slots. -* OK: the cluster can work as all the 4096 slots are served by nodes that are not flagged as FAIL. +* FAIL: the cluster can't work. When the node is in this state it will not serve queries at all and will return an error for every query. This state is entered when the node detects that the current nodes are not able to serve all the 16384 slots. +* OK: the cluster can work as all the 16384 slots are served by nodes that are not flagged as FAIL. This means that the Redis Cluster is designed to stop accepting queries once even a subset of the hash slots are not available. However there is a portion of time in which an hash slot can't be accessed correctly since the associated node is experiencing problems, but the node is still not marked as failing. -In this range of time the cluster will only accept queries about a subset of the 4096 hash slots. +In this range of time the cluster will only accept queries about a subset of the 16384 hash slots. 
 Since Redis cluster does not support MULTI/EXEC transactions the application
 developer should make sure the application can recover from only a subset of queries being accepted by the cluster.

From 0318afe6b4aa94d99ef4b0b17674beebad777b1b Mon Sep 17 00:00:00 2001
From: antirez Date: Tue, 5 Mar 2013 20:15:30 +0100
Subject: [PATCH 0008/2573] Cluster specification failure detection updated to
 reflect the code.

---
 topics/cluster-spec.md | 22 +++++++++-------------
 1 file changed, 9 insertions(+), 13 deletions(-)

diff --git a/topics/cluster-spec.md b/topics/cluster-spec.md
index 5831cd0c52..ddf7a8fbfe 100644
--- a/topics/cluster-spec.md
+++ b/topics/cluster-spec.md
@@ -137,10 +137,11 @@ know:
 * A set of hash slots served by the node.
 * Last time we sent a PING packet using the cluster bus.
 * Last time we received a PONG packet in reply.
+* The time at which we flagged the node as failing.
 * The number of slaves of this node.
 * The master node ID, if this node is a slave (or 0000000... if it is a
   master).
 
-All this information is available using the `CLUSTER NODES` command that
+Some of this information is available using the `CLUSTER NODES` command that
 can be sent to all the nodes in the cluster, both master and slave nodes.
 
 The following is an example of output of CLUSTER NODES sent to a master
@@ -360,18 +361,16 @@ Node failure detection
 
 Failure detection is implemented in the following way:
 
 * A node marks another node setting the PFAIL flag (possible failure) if the node is not responding to our PING requests for a given time.
-* Nodes broadcast information about other nodes (three random nodes taken at random) when pinging other nodes. The gossip section contains information about other nodes flags.
-* If we have a node marked as PFAIL, and we receive a gossip message where another nodes also think the same node is PFAIL, we mark it as FAIL (failure).
-* Once a node marks another node as FAIL as result of a PFAIL confirmed by another node, a message is send to all the other nodes to force all the reachable nodes in the cluster to set the specified not as FAIL.
+* Nodes broadcast information about other nodes (three random nodes per packet) when pinging other nodes. The gossip section contains information about other nodes flags.
+* Nodes remember if other nodes advertised some node as failing. This is called a failure report.
+* Once a node receives a new failure report, such as that the majority of master nodes agree about the failure of a given node, the node is marked as FAIL.
+* When a node is marked as FAIL, a message is broadcasted to the cluster in order to force all the reachable nodes to set the specified node as FAIL.
 
-So basically a node is not able to mark another node as failing without external acknowledge.
+So basically a node is not able to mark another node as failing without external acknowledge, and the majority of the master nodes are required to agree.
 
-(still to implement:)
-Once a node is marked as failing, any other node receiving a PING or
-connection attempt from this node will send back a "MARK AS FAIL" message
-in reply that will force the receiving node to set itself as failing.
+Old failure reports are removed, so the majority of master nodes need to have a recent entry in the failure report table of a given node for it to mark another node as FAIL.
-Cluster state detection (only partially implemented)
+Cluster state detection
 ---
 
 Every cluster node scan the list of nodes every time a configuration change
@@ -386,9 +385,6 @@ Once the configuration is processed the node enters one of the following
 states:
 
 This means that the Redis Cluster is designed to stop accepting queries once even a subset of the hash slots are not available. However there is a portion of time in which an hash slot can't be accessed correctly since the associated node is experiencing problems, but the node is still not marked as failing.
 In this range of time the cluster will only accept queries about a subset of the 16384 hash slots.
 
-Since Redis cluster does not support MULTI/EXEC transactions the application
-developer should make sure the application can recover from only a subset of queries being accepted by the cluster.
-
 Slave election (not implemented)
 ---
 

From 8b74e49b1dc61347ce7dd3fdc094fd6e532dfdd5 Mon Sep 17 00:00:00 2001
From: antirez Date: Wed, 6 Mar 2013 17:25:10 +0100
Subject: [PATCH 0009/2573] Cluster specification updated.

---
 topics/cluster-spec.md | 56 ++++++++++++------------------------------
 1 file changed, 16 insertions(+), 40 deletions(-)

diff --git a/topics/cluster-spec.md b/topics/cluster-spec.md
index ddf7a8fbfe..6488e13f09 100644
--- a/topics/cluster-spec.md
+++ b/topics/cluster-spec.md
@@ -370,7 +370,7 @@ So basically a node is not able to mark another node as failing without external
 
 Old failure reports are removed, so the majority of master nodes need to have a recent entry in the failure report table of a given node for it to mark another node as FAIL.
 
-Cluster state detection
+Cluster state detection (partially implemented)
 ---
 
 Every cluster node scan the list of nodes every time a configuration change
@@ -379,56 +379,32 @@ a node that is now in a failure state).
 
 Once the configuration is processed the node enters one of the following
 states:
 
-* FAIL: the cluster can't work. When the node is in this state it will not serve queries at all and will return an error for every query. This state is entered when the node detects that the current nodes are not able to serve all the 16384 slots.
+* FAIL: the cluster can't work. When the node is in this state it will not serve queries at all and will return an error for every query.
 * OK: the cluster can work as all the 16384 slots are served by nodes that are not flagged as FAIL.
 
-This means that the Redis Cluster is designed to stop accepting queries once even a subset of the hash slots are not available.
+This means that the Redis Cluster is designed to stop accepting queries once even a subset of the hash slots are not available for some time.
 
-Slave election (not implemented)
----
+However there is a portion of time in which an hash slot can't be accessed correctly since the associated node is experiencing problems, but the node is still not marked as failing. In this range of time the cluster will only accept queries about a subset of the 16384 hash slots.
 
-Every master can have any number of slaves (including zero).
-Slaves are responsible of electing themselves to masters when a given
-master fails. For instance we may have node A1, A2, A3, where A1 is the
-master an A2 and A3 are two slaves.
+The FAIL state for the cluster happens in two cases.
 
-If A1 is failing in some way and no longer replies to pings, other nodes
-will end marking it as failing using the gossip protocol. When this happens
-its **first slave** will try to perform the election.
+* 1) If at least one hash slot is not served as the node serving it currently is in FAIL state.
+* 2) If we are not able to reach the majority of masters (that is, if the majority of masters are simply in PFAIL state, it is enough for the node to enter FAIL mode).
 
-The concept of first slave is very simple. Of all the slaves of a master
-the first slave is the one that has the smallest node ID, sorting node IDs
-lexicographically. If the first slave is also marked as failing, the next
-slave is in charge of performing the election and so forth.
+The second check is required because in order to mark a node from PFAIL to FAIL state, the majority of masters are required. However when we are not connected with the majority of masters it is impossible from our side of the net split to mark nodes as FAIL. However since we detect this condition we set the Cluster state in FAIL mode to stop serving queries.
 
-So after a configuration update every slave checks if it is the first slave
-of the failing master. In the case it is it changes its state to master
-and broadcasts a message to all the other nodes to update the configuration.
-
-Protection mode (not implemented)
+Slave election (not implemented)
 ---
 
-After a net split resulting into a few isolated nodes, this nodes will
-end thinking all the other nodes are failing. In the process they may try
-to start a slave election or some other action to modify the cluster
-configuration. In order to avoid this problem, nodes seeing a majority of
-other nodes in PFAIL or FAIL state for a long enough time should enter
-a protection mode that will prevent them from taking actions.
-
-The protection mode is cleared once the cluster state is OK again.
-
-Majority of masters rule (not implemented)
----
+The design of slave election is a work in progress right now.
 
-As a result of a net split it is possible that two or more partitions are
-independently able to serve all the hash slots.
-Since Redis Cluster try to be consistent this is not what we want, and
-a net split should always produce zero or one single partition able to
-operate.
+The idea is to use the concept of first slave, that is, out of all the
+slaves for a given node, the first slave is the one with the lower
+Node ID (comparing node IDs lexicographically).
 
-In order to enforce this rule nodes into a partition should only try to
-serve queries if they have the **majority of the original master nodes**.
+However it is likely that the same system used for failure reports will be
+used in order to require the majority of masters to authorize the slave
+election.
Publish/Subscribe (implemented, but to refine) === From 1e77210feb2153c58d716c736028c3ce317c9b7c Mon Sep 17 00:00:00 2001 From: Michael Jackson Date: Wed, 6 Mar 2013 11:25:39 -0800 Subject: [PATCH 0010/2573] Add then-redis to client list --- clients.json | 13 +++++++++++-- 1 file changed, 11 insertions(+), 2 deletions(-) diff --git a/clients.json b/clients.json index 6d7a7623de..10db12b8bb 100644 --- a/clients.json +++ b/clients.json @@ -34,7 +34,7 @@ "description": "Redis client build on top of lamina", "authors":["Zach Tellman"], "active": true - }, + }, { "name": "CL-Redis", "language": "Common Lisp", @@ -127,7 +127,7 @@ "authors": ["simonz05"], "active": true }, - + { "name": "gosexy/redis", "language": "Go", @@ -515,6 +515,15 @@ "active": true }, + { + "name": "then-redis", + "language": "Node.js", + "repository": "https://github.com/mjijackson/then-redis", + "description": "A small, promise-based Redis client for node", + "authors": ["mjackson"], + "active": true + }, + { "name": "redis-node-client", "language": "Node.js", From d208cf6bc29e8a1be2d44af040878d4f06ebaf41 Mon Sep 17 00:00:00 2001 From: antirez Date: Fri, 8 Mar 2013 19:23:50 +0100 Subject: [PATCH 0011/2573] Updates in the cluster spec about node failure detection. --- topics/cluster-spec.md | 13 ++++++++++--- 1 file changed, 10 insertions(+), 3 deletions(-) diff --git a/topics/cluster-spec.md b/topics/cluster-spec.md index 6488e13f09..e9efe98668 100644 --- a/topics/cluster-spec.md +++ b/topics/cluster-spec.md @@ -360,16 +360,23 @@ Node failure detection Failure detection is implemented in the following way: -* A node marks another node setting the PFAIL flag (possible failure) if the node is not responding to our PING requests for a given time. -* Nodes broadcast information about other nodes (three random nodes per packet) when pinging other nodes. The gossip section contains information about other nodes flags. +* A node marks another node setting the PFAIL flag (possible failure) if the node is not responding to our PING requests for a given time. This time is called the node timeout, and is a node-wise setting. +* Nodes broadcast information about other nodes (three random nodes per packet) when pinging other nodes. The gossip section contains information about other nodes flags, including the PFAIL and FAIL flags. * Nodes remember if other nodes advertised some node as failing. This is called a failure report. -* Once a node receives a new failure report, such as that the majority of master nodes agree about the failure of a given node, the node is marked as FAIL. +* Once a node (already considering a given other node in PFAIL state) receives enough failure reports, so that the majority of master nodes agree about the failure of a given node, the node is marked as FAIL. * When a node is marked as FAIL, a message is broadcasted to the cluster in order to force all the reachable nodes to set the specified node as FAIL. So basically a node is not able to mark another node as failing without external acknowledge, and the majority of the master nodes are required to agree. Old failure reports are removed, so the majority of master nodes need to have a recent entry in the failure report table of a given node for it to mark another node as FAIL. +The FAIL state is reversible in two cases: + +* If the FAIL state is set for a slave node, the FAIL state can be reversed if the slave is already reachable. 
There is no point in retaining the FAIL state for a slave node as it does not serve slots, and we want to make sure we have the chance to promote it to master if needed.
+* If the FAIL state is set for a master node, and after four times the node timeout, plus 10 seconds, the slots were still not failed over, and the node is reachable again, the FAIL state is reverted.
+
+The rationale for the second case is that if the failover did not work we want the cluster to continue to work if the master is back online, without any kind of user intervention.
+
 Cluster state detection (partially implemented)
 ---

From 5a6f0f49eb5ad20ac664355fc6c77820b3b81a83 Mon Sep 17 00:00:00 2001
From: 0x20h Date: Fri, 15 Mar 2013 00:46:32 +0100
Subject: [PATCH 0012/2573] fixed typo

---
 commands/object.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/commands/object.md b/commands/object.md
index c0b9a1f709..87aee77de6 100644
--- a/commands/object.md
+++ b/commands/object.md
@@ -37,14 +37,14 @@ Objects can be encoded in different ways:
   sets of any size.
 
 All the specially encoded types are automatically converted to the general type
-once you perform an operation that makes it no possible for Redis to retain the
+once you perform an operation that makes it impossible for Redis to retain the
 space saving encoding.
 
 @return
 
 Different return values are used for different subcommands.
 
-* Subcommands `refcount` and `idletime` returns integers.
+* Subcommands `refcount` and `idletime` return integers.
 * Subcommand `encoding` returns a bulk reply.
 
 If the object you try to inspect is missing, a null bulk reply is returned.

From 4c5f161a8c6fabdbc388b61c56ffd2e2cc1c3df3 Mon Sep 17 00:00:00 2001
From: ctnstone Date: Tue, 26 Mar 2013 08:25:16 -0400
Subject: [PATCH 0013/2573] Added csredis to C# clients

---
 clients.json | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/clients.json b/clients.json
index a9f6d1819c..6d0946164f 100644
--- a/clients.json
+++ b/clients.json
@@ -644,5 +644,13 @@
     "repository": "https://github.com/wg/lettuce",
     "description": "Thread-safe client supporting async usage and key/value codecs",
     "authors": ["ar3te"]
+  },
+
+  {
+    "name": "csredis",
+    "language": "C#",
+    "repository": "https://github.com/ctstone/csredis",
+    "description": "Async (and sync) client for Redis and Sentinel",
+    "authors": ["ctnstone"]
   }
 ]

From 0a63d5e0e90f0b3cbe01a45b1802ac944c922de6 Mon Sep 17 00:00:00 2001
From: bradvoth Date: Wed, 27 Mar 2013 21:18:12 -0300
Subject: [PATCH 0014/2573] Update tools.json

Added redis-tcl

---
 tools.json | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/tools.json b/tools.json
index aa8f473b2c..f1cdee7057 100644
--- a/tools.json
+++ b/tools.json
@@ -281,5 +281,12 @@
     "repository": "https://github.com/sasanrose/phpredmin",
     "description": "Yet another web interface for Redis",
     "authors": ["sasanrose"]
+  },
+  {
+    "name": "redis-tcl",
+    "language": "Tcl",
+    "repository" : "http://github.com/bradvoth/redis-tcl",
+    "description" : "Tcl library largely copied from the redis test tree, modified for minor bug fixes and expanded pub/sub capabilities",
+    "authors" : ["bradvoth","antirez"]
   }
-]
\ No newline at end of file
+]

From bb319c667fae43b78f368ab3402df7a8bbe06bc3 Mon Sep 17 00:00:00 2001
From: antirez Date: Fri, 29 Mar 2013 14:23:02 +0100
Subject: [PATCH 0015/2573] SET command definition updated.
---
 commands.json | 18 ++++++++++++++++++
 1 file changed, 18 insertions(+)

diff --git a/commands.json b/commands.json
index 28e2de99f5..16ed1194c4 100644
--- a/commands.json
+++ b/commands.json
@@ -1410,6 +1410,24 @@
       {
         "name": "value",
         "type": "string"
+      },
+      {
+        "command": "EX",
+        "name": "seconds",
+        "type": "integer",
+        "optional": true
+      },
+      {
+        "command": "PX",
+        "name": "milliseconds",
+        "type": "integer",
+        "optional": true
+      },
+      {
+        "name": "condition",
+        "type": "enum",
+        "enum": ["NX", "XX"],
+        "optional": true
       }
     ],
     "since": "1.0.0",

From 58bbdafd78d058c593a850178a18edec0167464a Mon Sep 17 00:00:00 2001
From: antirez Date: Fri, 29 Mar 2013 14:38:44 +0100
Subject: [PATCH 0016/2573] SET documentation updated for SET options.

---
 commands/set.md | 31 ++++++++++++++++++++++++++++++-
 1 file changed, 30 insertions(+), 1 deletion(-)

diff --git a/commands/set.md b/commands/set.md
index b93d618525..b89f1e602d 100644
--- a/commands/set.md
+++ b/commands/set.md
@@ -1,9 +1,23 @@
 Set `key` to hold the string `value`.
 If `key` already holds a value, it is overwritten, regardless of its type.
+Any previous time to live associated with the key is discarded on successful `SET` operation.
+
+## Options
+
+Starting with Redis 2.6.12 `SET` supports a set of options that modify its
+behavior:
+
+* `EX` *seconds* -- Set the specified expire time, in seconds.
+* `PX` *milliseconds* -- Set the specified expire time, in milliseconds.
+* `NX` -- Only set the key if it does not already exist.
+* `XX` -- Only set the key if it already exists.
+
+Note: Since the `SET` command options can replace `SETNX`, `SETEX`, `PSETEX`, it is possible that in future versions of Redis these three commands will be deprecated and finally removed.
 
 @return
 
-@status-reply: always `OK` since `SET` can't fail.
+@status-reply: `OK` if `SET` was executed correctly.
+@nil-reply: a Null Bulk Reply is returned if the `SET` operation was not performed because the user specified the `NX` or `XX` option but the condition was not met.
 
 @examples
 
@@ -11,3 +25,18 @@ If `key` already holds a value, it is overwritten, regardless of its type.
 SET mykey "Hello"
 GET mykey
 ```
+
+## Patterns
+
+The command `SET resource-name anystring NX EX max-lock-time` is a simple way to implement a locking system with Redis.
+
+A client can acquire the lock if the above command returns `OK` (or retry after some time if the command returns Nil), and remove the lock just using `DEL`.
+
+The lock will be auto-released after the expire time is reached.
+
+It is possible to make this system more robust modifying the unlock schema as follows:
+
+* Instead of setting a random string, set a non-guessable large random string.
+* Instead of releasing the lock with `DEL`, send a script that only removes the key if the value matches.
+
+This avoids that a client will try to release the lock after the expire time deleting the key created by another client that acquired the lock later.

From cc97848c97a682359598ae146bff1dc4fc60ad79 Mon Sep 17 00:00:00 2001
From: adilbaig Date: Sat, 30 Mar 2013 17:46:19 +0530
Subject: [PATCH 0017/2573] Corrected the twitter account for Tiny-Redis. Added
 a small description too.
--- clients.json | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/clients.json b/clients.json index a9f6d1819c..0c19dc75f9 100644 --- a/clients.json +++ b/clients.json @@ -625,8 +625,8 @@ "language": "D", "url": "http://adilbaig.github.com/Tiny-Redis/", "repository": "https://github.com/adilbaig/Tiny-Redis", - "description": "", - "authors": ["adilbaig"] + "description": "A Redis client for D2. Supports pipelining, transactions and Lua scripting", + "authors": ["aidezigns"] }, { From 214cf0208c2e93d5df4f4e9e5537113fa312cd80 Mon Sep 17 00:00:00 2001 From: antirez Date: Wed, 3 Apr 2013 11:05:10 +0200 Subject: [PATCH 0018/2573] Provide an example script for unlocking in SET pattern. --- commands/set.md | 13 ++++++++++++- 1 file changed, 12 insertions(+), 1 deletion(-) diff --git a/commands/set.md b/commands/set.md index b89f1e602d..209080cdad 100644 --- a/commands/set.md +++ b/commands/set.md @@ -36,7 +36,18 @@ The lock will be auto-released after the expire time is reached. It is possible to make this system more robust modifying the unlock schema as follows: -* Instead of setting a random string, set a non-guessable large random string. +* Instead of setting a fixed string, set a non-guessable large random string, called token. * Instead of releasing the lock with `DEL`, send a script that only removes the key if the value matches. This avoids that a client will try to release the lock after the expire time deleting the key created by another client that acquired the lock later. + +An example of unlock script would be similar to the following: + + if redis.call("get",KEYS[1]) == ARGV[1] + then + return redis.call("del",KEYS[1]) + else + return 0 + end + +The script should be called with `EVAL ...script... 1 resource-name token-value` From 7249438696466646a0d47f0c9339a05d479c5a36 Mon Sep 17 00:00:00 2001 From: antirez Date: Wed, 3 Apr 2013 11:08:38 +0200 Subject: [PATCH 0019/2573] Point to the SET based lock from the SETNX page. --- commands/setnx.md | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/commands/setnx.md b/commands/setnx.md index 72c33798d0..3c70cab673 100644 --- a/commands/setnx.md +++ b/commands/setnx.md @@ -20,6 +20,10 @@ GET mykey ## Design pattern: Locking with `!SETNX` +**NOTE:** Starting with Redis 2.6.12 it is possible to create a much simpler locking primitive using the `SET` command to acquire the lock, and a simple Lua script to release the lock. The pattern is documented in the `SET` command page. + +The old `SETNX` based pattern is documented below for historical reasons. + `SETNX` can be used as a locking primitive. For example, to acquire the lock of the key `foo`, the client could try the following: From b5f66ef273a595551f0568d28ee2d576578cd06b Mon Sep 17 00:00:00 2001 From: Thomas Tourlourat Date: Mon, 8 Apr 2013 13:52:53 +0200 Subject: [PATCH 0020/2573] fix word --- topics/clients.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/topics/clients.md b/topics/clients.md index 1cb603ce3c..eae7c189ec 100644 --- a/topics/clients.md +++ b/topics/clients.md @@ -34,7 +34,7 @@ of the error. In what order clients are served --- -The order is determined by a combination of the client scoket file descriptor +The order is determined by a combination of the client socket file descriptor number and order in which the kernel reports events, so the order is to be considered as unspecified. 
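[Editor's note: the locking pattern documented in the SET and SETNX patches above is easy to exercise end to end from a client. The sketch below is hypothetical and not part of any patch; it assumes the redis-py client, and the key name, token scheme and TTL are illustrative. It combines the `SET resource-name token NX EX max-lock-time` acquisition form with the token-checking Lua unlock script from patch 0018:]

```python
import uuid
import redis  # assumes the redis-py client is installed

# Unlock script from the SET documentation patch above: delete the key
# only if the stored token matches, so a client whose lock already
# expired cannot remove a lock re-acquired by another client.
UNLOCK_SCRIPT = """
if redis.call("get", KEYS[1]) == ARGV[1]
then
    return redis.call("del", KEYS[1])
else
    return 0
end
"""

def acquire_lock(r, resource, ttl_seconds=10):
    """Try to acquire the lock; return the token on success, None otherwise."""
    token = str(uuid.uuid4())  # non-guessable random string, as the docs suggest
    # Equivalent to: SET resource token NX EX ttl_seconds
    if r.set(resource, token, nx=True, ex=ttl_seconds):
        return token
    return None

def release_lock(r, resource, token):
    """Release the lock only if we still own it."""
    return r.eval(UNLOCK_SCRIPT, 1, resource, token) == 1

if __name__ == "__main__":
    r = redis.StrictRedis()
    token = acquire_lock(r, "resource-name")
    if token:
        try:
            pass  # ... critical section ...
        finally:
            release_lock(r, "resource-name", token)
```

[The token comparison inside Lua is the key design point: plain `DEL` on release would happily delete a lock that another client acquired after ours expired.]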
From d2fa2beb9331ee75c0f5bd9266d8d7b86e6302c2 Mon Sep 17 00:00:00 2001
From: Frank Mueller Date: Mon, 8 Apr 2013 16:06:57 +0300
Subject: [PATCH 0021/2573] Update clients.json

Added the second Tideland client, after Go now Erlang/OTP.

---
 clients.json | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/clients.json b/clients.json
index a9f6d1819c..b7f1c276f5 100644
--- a/clients.json
+++ b/clients.json
@@ -72,6 +72,15 @@
     "active": true
   },
 
+  {
+    "name": "Tideland Erlang/OTP Redis Client",
+    "language": "Erlang",
+    "repository": "git://git.tideland.biz/errc",
+    "description": "A comfortable Redis client for Erlang/OTP supporting pooling, pub/sub and transactions.",
+    "authors": ["themue"],
+    "active": true
+  },
+
   {
     "name": "redis.fy",
     "language": "Fancy",

From 4415628a596f1fd7ad69c701213db26df2600dc7 Mon Sep 17 00:00:00 2001
From: Martyn Loughran Date: Tue, 9 Apr 2013 12:24:31 +0100
Subject: [PATCH 0022/2573] Add ruby em-hiredis client

---
 clients.json | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/clients.json b/clients.json
index a9f6d1819c..9deefafb6f 100644
--- a/clients.json
+++ b/clients.json
@@ -487,6 +487,14 @@
     "authors": []
   },
 
+  {
+    "name": "em-hiredis",
+    "language": "Ruby",
+    "repository": "https://github.com/mloughran/em-hiredis",
+    "description": "An EventMachine Redis client (uses hiredis).",
+    "authors": ["mloughran"]
+  },
+
   {
     "name": "em-redis",
     "language": "Ruby",

From 1acf5d336669907b63cfb445959c774f9c6d4d4a Mon Sep 17 00:00:00 2001
From: antirez Date: Wed, 10 Apr 2013 23:02:50 +0200
Subject: [PATCH 0023/2573] The first two Redis Design Drafts.

---
 topics/rdd-1.md | 28 +++++++++++++++
 topics/rdd-2.md | 90 +++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 118 insertions(+)
 create mode 100644 topics/rdd-1.md
 create mode 100644 topics/rdd-2.md

diff --git a/topics/rdd-1.md b/topics/rdd-1.md
new file mode 100644
index 0000000000..00ca3d2e72
--- /dev/null
+++ b/topics/rdd-1.md
@@ -0,0 +1,28 @@
+# Redis Design Draft 1 -- Redis Design Drafts
+
+Author: Salvatore Sanfilippo `antirez@gmail.com`
+Github issue: none
+
+## History of revisions
+
+1.0, 10 April 2012 - Initial draft.
+
+## Overview
+
+Redis Design Drafts are a way to make the community aware of designs planned
+in order to modify or evolve Redis. Every new Redis Design Draft is published
+in the Redis mailing list and announced on Twitter, in the hope of receiving
+feedback before implementing a given feature.
+
+The way the community can provide feedback about an RDD is simply by writing
+a message to the Redis mailing list, or commenting in the associated
+Github issue if any.
+
+Drafts are published only for features already approved as potentially very
+interesting for the project by the current Redis project maintainer.
+
+The official Redis web site includes a list of published RDDs.
+
+## Format
+
+The format of RDDs should reflect the format of this RDD.
diff --git a/topics/rdd-2.md b/topics/rdd-2.md
new file mode 100644
index 0000000000..404615e25c
--- /dev/null
+++ b/topics/rdd-2.md
@@ -0,0 +1,90 @@
+# Redis Design Draft 2 -- RDB version 7 info fields
+
+Author: Salvatore Sanfilippo `antirez@gmail.com`
+Github issue: https://github.com/antirez/redis/issues/1048
+
+## History of revisions
+
+1.0, 10 April 2012 - Initial draft.
+
+## Overview
+
+The Redis RDB format lacks a simple way to add info fields to an RDB file
+without causing a backward compatibility issue even if the added meta data
+is not required in order to load data from the RDB file.
+
+For example thanks to the info fields specified in this document it will
+be possible to add to RDB information like file creation time, Redis version
+generating the file, and any other useful information, in a way that not
+every field is required for an RDB version 7 file to be correctly processed.
+
+Also with minimal changes it will be possible to add RDB version 7 support to
+Redis 2.6 without actually supporting the additional fields but just skipping
+them when loading an RDB file.
+
+RDB info fields may have semantic meaning if needed, so that the presence
+of the field may add information about the data set specified in the RDB
+file format, however when an info field is required to be correctly decoded
+in order to understand and load the data set content of the RDB file, the
+RDB file format must be increased so that previous versions of Redis will not
+attempt to load it.
+
+However currently the info fields are designed to only hold additional
+information that is not useful to load the dataset, but can better specify
+how the RDB file was created.
+
+## Info fields representation
+
+The RDB format 6 has the following layout:
+
+* A 9 bytes magic "REDIS0006"
+* key-value pairs
+* An EOF opcode
+* CRC64 checksum
+
+The proposal for RDB format 7 is to add the optional fields immediately
+after the first 9 bytes magic, so that the new format will be:
+
+* A 9 bytes magic "REDIS0007"
+* Info field 1
+* Info field 2
+* ...
+* Info field N
+* Info field end-of-fields
+* key-value pairs
+* An EOF opcode
+* CRC64 checksum
+
+Every single info field has the following structure:
+
+* A 16 bit identifier
+* A 64 bit data length
+* A data section of the exact length as specified
+
+Both the identifier and the data length are stored in little endian byte
+ordering.
+
+The special identifier 0 means that there are no other info fields, and that
+the remainder of the RDB file contains the key-value pairs.
+
+## Handling of info fields
+
+A program can simply skip every info field it does not understand, as long
+as the RDB version matches the one that it is capable of loading.
+
+## Specification of info fields IDs and content.
+
+### Info field 0 -- End of info fields
+
+This just means there are no longer info fields to process.
+
+### Info field 1 -- Creation date
+
+This field represents the unix time at which the RDB file was created.
+The format of the unix time is a 64 bit little endian integer representing
+seconds since 1st January 1970.
+
+### Info field 2 -- Redis version
+
+This field represents a null-terminated string containing the Redis version
+that generated the file, as displayed in the Redis version INFO field.

From 7a51e2b78635a03780c66bbdc61720e2f296d3c8 Mon Sep 17 00:00:00 2001
From: antirez Date: Wed, 10 Apr 2013 23:04:20 +0200
Subject: [PATCH 0024/2573] RDD markdown changes.
---
 topics/rdd-1.md | 4 ++--
 topics/rdd-2.md | 4 ++--
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/topics/rdd-1.md b/topics/rdd-1.md
index 00ca3d2e72..c9f54d602c 100644
--- a/topics/rdd-1.md
+++ b/topics/rdd-1.md
@@ -1,7 +1,7 @@
 # Redis Design Draft 1 -- Redis Design Drafts
 
-Author: Salvatore Sanfilippo `antirez@gmail.com`
-Github issue: none
+* Author: Salvatore Sanfilippo `antirez@gmail.com`
+* Github issue: none
 
 ## History of revisions
 
diff --git a/topics/rdd-2.md b/topics/rdd-2.md
index 404615e25c..e85dd158f5 100644
--- a/topics/rdd-2.md
+++ b/topics/rdd-2.md
@@ -1,7 +1,7 @@
 # Redis Design Draft 2 -- RDB version 7 info fields
 
-Author: Salvatore Sanfilippo `antirez@gmail.com`
-Github issue: https://github.com/antirez/redis/issues/1048
+* Author: Salvatore Sanfilippo `antirez@gmail.com`
+* Github issue [#1048](https://github.com/antirez/redis/issues/1048)
 
 ## History of revisions
 

From 910b16983f0a070653c46f83ada52c8df6c49e62 Mon Sep 17 00:00:00 2001
From: antirez Date: Wed, 10 Apr 2013 23:44:49 +0200
Subject: [PATCH 0025/2573] RDD time machine.

---
 topics/rdd-1.md | 2 +-
 topics/rdd-2.md | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/topics/rdd-1.md b/topics/rdd-1.md
index c9f54d602c..7daa5ec92b 100644
--- a/topics/rdd-1.md
+++ b/topics/rdd-1.md
@@ -5,7 +5,7 @@
 
 ## History of revisions
 
-1.0, 10 April 2012 - Initial draft.
+1.0, 10 April 2013 - Initial draft.
 
diff --git a/topics/rdd-2.md b/topics/rdd-2.md
index e85dd158f5..15a464abf9 100644
--- a/topics/rdd-2.md
+++ b/topics/rdd-2.md
@@ -5,7 +5,7 @@
 
 ## History of revisions
 
-1.0, 10 April 2012 - Initial draft.
+1.0, 10 April 2013 - Initial draft.
 

From 81657250e8165eed1a62b4b82ed8a6f2b2beeceb Mon Sep 17 00:00:00 2001
From: antirez Date: Thu, 11 Apr 2013 10:16:29 +0200
Subject: [PATCH 0026/2573] Redis Design Draft main page.

---
 topics/rdd.md | 18 ++++++++++++++++++
 1 file changed, 18 insertions(+)
 create mode 100644 topics/rdd.md

diff --git a/topics/rdd.md b/topics/rdd.md
new file mode 100644
index 0000000000..15f1238bfd
--- /dev/null
+++ b/topics/rdd.md
@@ -0,0 +1,18 @@
+Redis Design Drafts
+===
+
+Redis Design Drafts are a way to make the community aware of the design of
+new features before they are actually implemented. This is done in the
+hope to get good feedback from the user base, which may result in a change
+of the design if a flaw or possible improvement is discovered.
+
+The following is the list of published RDDs so far:
+
+* [RDD1 -- Redis Design Drafts](/topics/rdd-1)
+* [RDD2 -- RDB version 7 info fields](/topics/rdd-2)
+
+To get an RDD accepted for publication you need to talk about your idea in
+the [Redis Google Group](http://groups.google.com/group/redis-db). Once the
+general feature is accepted and/or considered for further exploration you
+can write an RDD or ask the current Redis maintainer to write one about the
+topic.
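[Editor's note: RDD 2 above fully specifies the wire layout of the info fields, so a reader for them is short enough to sketch. The following Python fragment is hypothetical and not part of the drafts; it assumes a binary file object positioned right after the 9 byte "REDIS0007" magic, and it assumes the end-of-fields identifier 0 carries no length word, which the draft leaves ambiguous:]

```python
import struct

def read_info_fields(f):
    """Read RDD-2 style info fields and return {identifier: payload}.

    Layout per the draft: a 16 bit little endian identifier, a 64 bit
    little endian data length, then the payload. Identifier 0 ends the
    list (assumed here to terminate immediately, with no length word).
    """
    fields = {}
    while True:
        raw_id = f.read(2)
        if len(raw_id) != 2:
            raise ValueError("truncated info field identifier")
        (identifier,) = struct.unpack("<H", raw_id)
        if identifier == 0:
            break  # end-of-fields marker: key-value pairs follow
        (length,) = struct.unpack("<Q", f.read(8))
        payload = f.read(length)
        # Per the "Handling of info fields" section, unknown identifiers
        # can simply be skipped; here we keep them all for inspection.
        fields[identifier] = payload
    return fields

def decode_known(fields):
    """Decode the two fields defined by the draft (IDs 1 and 2)."""
    out = {}
    if 1 in fields:
        # Creation date: 64 bit little endian unix time (assumed unsigned).
        out["ctime"] = struct.unpack("<Q", fields[1])[0]
    if 2 in fields:
        # Redis version: null-terminated string.
        out["redis-version"] = fields[2].rstrip(b"\x00").decode("ascii")
    return out
```

[A loader that only wants the key-value pairs can call `read_info_fields` purely for its side effect of advancing past the header, which is exactly the "skip what you do not understand" behavior the draft prescribes.]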
From 98438f66a3b71bb208b5455fb2bb3105bd236695 Mon Sep 17 00:00:00 2001
From: Sandeep Shetty Date: Tue, 16 Apr 2013 17:37:09 +0530
Subject: [PATCH 0027/2573] Added phpish/redis

---
 clients.json | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/clients.json b/clients.json
index 327a74c93c..33b7e4da66 100644
--- a/clients.json
+++ b/clients.json
@@ -362,7 +362,15 @@
     "description": "Lightweight, standalone, unit-tested fork of Redisent which wraps phpredis for best performance if available.",
     "authors": ["colinmollenhour"]
   },
-  
+
+  {
+    "name": "phpish/redis",
+    "language": "PHP",
+    "repository": "https://github.com/phpish/redis",
+    "description": "Simple Redis client in PHP",
+    "authors": ["sandeepshetty"]
+  },
+
   {
     "name": "redis-py",
     "language": "Python",

From e86c70f25482e188a97d22a6c21afe7e3fd0d76d Mon Sep 17 00:00:00 2001
From: antirez Date: Tue, 23 Apr 2013 10:45:14 +0200
Subject: [PATCH 0028/2573] Sentinel doc updated with handling of resurrecting
 master.

---
 topics/sentinel.md | 27 ++++++++++++++++++++++++---
 1 file changed, 24 insertions(+), 3 deletions(-)

diff --git a/topics/sentinel.md b/topics/sentinel.md
index d2bea6849e..48a318e84e 100644
--- a/topics/sentinel.md
+++ b/topics/sentinel.md
@@ -288,9 +288,9 @@ it the **Subjective Leader**, and is selected using the following rule:
 For a Sentinel to sense to be the **Objective Leader**, that is, the Sentinel that should start the failover process, the following conditions are needed.
 
 * It thinks it is the subjective leader itself.
-* It receives acknowledges from other Sentinels about the fact it is the leader: at least 50% plus one of all the Sentinels that were able to reply to the `SENTINEL is-master-down-by-addr` request shoudl agree it is the leader, and additionally we need a total level of agreement at least equal to the configured quorum of the master instance that we are going to failover.
+* It receives acknowledges from other Sentinels about the fact it is the leader: at least 50% plus one of all the Sentinels that were able to reply to the `SENTINEL is-master-down-by-addr` request should agree it is the leader, and additionally we need a total level of agreement at least equal to the configured quorum of the master instance that we are going to failover.
 
-Once a Sentinel things it is the Leader, the failover starts, but there is always a delay of five seconds plus an additional random delay. This is an additional layer of protection because if during this period we see another instance turning a slave into a master, we detect it as another instance staring the failover and turn ourselves into an observer instead.
+Once a Sentinel thinks it is the Leader, the failover starts, but there is always a delay of five seconds plus an additional random delay. This is an additional layer of protection because if during this period we see another instance turning a slave into a master, we detect it as another instance starting the failover and turn ourselves into an observer instead. This is just a redundancy layer and should in theory never happen.
 
 **Sentinel Rule #11**: A **Good Slave** is a slave with the following requirements:
 * It is not in SDOWN nor in ODOWN condition.
@@ -298,6 +298,7 @@
 * Latest PING reply we received from it is not older than five seconds.
 * Latest INFO reply we received from it is not older than five seconds.
 * The latest INFO reply reported that the link with the master is down for no more than the time elapsed since we saw the master entering SDOWN state, plus ten times the configured `down_after_milliseconds` parameter. So for instance if a Sentinel is configured to sense the SDOWN condition after 10 seconds, and the master is down since 50 seconds, we accept a slave as a Good Slave only if the replication link was disconnected less than `50+(10*10)` seconds (two minutes and half more or less).
+* It is not flagged as DEMOTE (see the section about resurrecting masters).
 
 **Sentinel Rule #12**: A **Subjective Leader** from the point of view of a Sentinel, is the Sentinel (including itself) with the lower runid monitoring a given master, that also replied to PING less than 5 seconds ago, reported to be able to do the failover via Pub/Sub hello channel, and is not in DISCONNECTED state.
@@ -400,6 +401,26 @@ the configuration back to the original master.
 * A failover is in progress and a slave to promote was already selected (or in the case of the observer was already detected as master).
 * The promoted slave is in **Extended SDOWN** condition (continually in SDOWN condition for at least ten times the configured `down-after-milliseconds`).
 
+Resurrecting master
+---
+
+After the failover, at some point the old master may come back online. Starting with Redis 2.6.13 Sentinel is able to handle this condition by automatically reconfiguring the old master as a slave of the new master.
+
+This happens in the following way:
+
+* After the failover has started from the point of view of a Sentinel, either as a leader, or as an observer that detected the promotion of a slave, the old master is put in the list of slaves of the new master, but with a special `DEMOTE` flag (the flag can be seen in the `SENTINEL SLAVES` command output).
+* Once the master is back online and it is possible to contact it again, if it still claims to be a master (from INFO output) Sentinels will send a `SLAVEOF` command trying to reconfigure it. Once the instance claims to be a slave, the `DEMOTE` flag is cleared.
+
+There is no single Sentinel in charge of turning the old master into a slave, so the process is resistant against failing sentinels. At the same time instances with the `DEMOTE` flag set are never selected as promotable slaves.
+
+In this specific case the `+slave` event is only generated when the old master reports to be actually a slave again in its `INFO` output.
+
+**Sentinel Rule #19**: Once the failover starts (either as observer or leader), the old master is added as a slave of the new master, flagged as `DEMOTE`.
+
+**Sentinel Rule #20**: A slave instance claiming to be a master, and flagged as `DEMOTE`, is reconfigured via `SLAVEOF` every time a Sentinel receives an `INFO` output where the wrong role is detected.
+
+**Sentinel Rule #21**: The `DEMOTE` flag is cleared as soon as an `INFO` output shows the instance to report itself as a slave.
+
 Manual interactions
 ---
 
@@ -506,7 +527,7 @@ Note: because currently slave priority is not implemented, the selection
 is performed only discarding unreachable slaves and picking the one with
 the lower Run ID.
 
-**Sentinel Rule #19**: A Sentinel performing the failover as leader will select the slave to promote, among the existing **Good Slaves** (See rule #11), taking the one with the lower slave priority. When priority is the same the slave with lexicographically lower runid is preferred.
+**Sentinel Rule #22**: A Sentinel performing the failover as leader will select the slave to promote, among the existing **Good Slaves** (See rule #11), taking the one with the lower slave priority. When priority is the same the slave with lexicographically lower runid is preferred. APPENDIX B - Get started with Sentinel in five minutes === From b18ff2956f299d42966b5f59d6b77f46ac145cce Mon Sep 17 00:00:00 2001 From: antirez Date: Sat, 4 May 2013 00:58:09 +0200 Subject: [PATCH 0029/2573] 'an user' -> 'a user' everywhere in the docs. --- commands/bitcount.md | 2 +- topics/sentinel-spec.md | 8 ++++---- topics/twitter-clone.md | 6 +++--- 3 files changed, 8 insertions(+), 8 deletions(-) diff --git a/commands/bitcount.md b/commands/bitcount.md index ad0ff50560..35f65b6bc1 100644 --- a/commands/bitcount.md +++ b/commands/bitcount.md @@ -38,7 +38,7 @@ with a small progressive integer. For instance day 0 is the first day the application was put online, day 1 the next day, and so forth. -Every time an user performs a page view, the application can register that in +Every time a user performs a page view, the application can register that in the current day the user visited the web site using the `SETBIT` command setting the bit corresponding to the current day. diff --git a/topics/sentinel-spec.md b/topics/sentinel-spec.md index 0b70e52303..a1d33efb7f 100644 --- a/topics/sentinel-spec.md +++ b/topics/sentinel-spec.md @@ -55,7 +55,7 @@ configured quorum, select the desired behavior among many possibilities. Redis Sentinel does not use any proxy: clients reconfiguration is performed running user-provided executables (for instance a shell script or a -Python program) in an user setup specific way. +Python program) in a user setup specific way. In what form it will be shipped === @@ -271,7 +271,7 @@ Guarantees of the Leader election process === As you can see for a Sentinel to become a leader the majority is not strictly -required. An user can force the majority to be needed just setting the master +required. A user can force the majority to be needed just setting the master quorum to, for instance, the value of 5 if there are a total of 9 sentinels. However it is also possible to set the quorum to the value of 2 with 9 @@ -350,7 +350,7 @@ The fail over process consists of the following steps: * 1) Turn the selected slave into a master using the SLAVEOF NO ONE command. * 2) Turn all the remaining slaves, if any, to slaves of the new master. This is done incrementally, one slave after the other, waiting for the previous slave to complete the synchronization process before starting with the next one. -* 3) Call an user script to inform the clients that the configuration changed. +* 3) Call a user script to inform the clients that the configuration changed. * 4) Completely remove the old failing master from the table, and add the new master with the same name. If Steps "1" fails, the fail over is aborted. @@ -471,7 +471,7 @@ TODO === * More detailed specification of user script error handling, including what return codes may mean, like 0: try again. 1: fatal error. 2: try again, and so forth. -* More detailed specification of what happens when an user script does not return in a given amount of time. +* More detailed specification of what happens when a user script does not return in a given amount of time. * Add a "push" notification system for configuration changes. * Document that for every master monitored the configuration specifies a name for the master that is reported by all the SENTINEL commands. 
* Make clear that we handle a single Sentinel monitoring multiple masters. diff --git a/topics/twitter-clone.md b/topics/twitter-clone.md index 0453d23ce0..e21e189f11 100644 --- a/topics/twitter-clone.md +++ b/topics/twitter-clone.md @@ -117,14 +117,14 @@ Data layout Working with a relational database this is the stage were the database layout should be produced in form of tables, indexes, and so on. We don't have tables, so what should be designed? We need to identify what keys are needed to represent our objects and what kind of values this keys need to hold. -Let's start from Users. We need to represent this users of course, with the username, userid, password, followers and following users, and so on. The first question is, what should identify an user inside our system? The username can be a good idea since it is unique, but it is also too big, and we want to stay low on memory. So like if our DB was a relational one we can associate an unique ID to every user. Every other reference to this user will be done by id. That's very simple to do, because we have our atomic INCR operation! When we create a new user we can do something like this, assuming the user is called "antirez": +Let's start from Users. We need to represent this users of course, with the username, userid, password, followers and following users, and so on. The first question is, what should identify a user inside our system? The username can be a good idea since it is unique, but it is also too big, and we want to stay low on memory. So like if our DB was a relational one we can associate an unique ID to every user. Every other reference to this user will be done by id. That's very simple to do, because we have our atomic INCR operation! When we create a new user we can do something like this, assuming the user is called "antirez": INCR global:nextUserId => 1000 SET uid:1000:username antirez SET uid:1000:password p1pp0 We use the _global:nextUserId_ key in order to always get an unique ID for every new user. Then we use this unique ID to populate all the other keys holding our user data. *This is a Design Pattern* with key-values stores! Keep it in mind. -Besides the fields already defined, we need some more stuff in order to fully define an User. For example sometimes it can be useful to be able to get the user ID from the username, so we set this key too: +Besides the fields already defined, we need some more stuff in order to fully define a User. For example sometimes it can be useful to be able to get the user ID from the username, so we set this key too: SET username:antirez:uid 1000 @@ -150,7 +150,7 @@ OK, we have more or less everything about the user, but authentication. We'll ha SET uid:1000:auth fea5e81ac8ca77622bed1c2132a021f9 SET auth:fea5e81ac8ca77622bed1c2132a021f9 1000 -In order to authenticate an user we'll do this simple work (`login.php`): +In order to authenticate a user we'll do this simple work (`login.php`): * Get the username and password via the login form * Check if the username:``:uid key actually exists * If it exists we have the user id, (i.e. 
1000) From 9e8163f89085aadf6f6b9be1b351c8c4092e68b8 Mon Sep 17 00:00:00 2001 From: Victor Deryagin Date: Tue, 14 May 2013 11:23:46 +0300 Subject: [PATCH 0030/2573] Fixed typo in partitioning.md --- topics/partitioning.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/topics/partitioning.md b/topics/partitioning.md index c21f342036..7d663407f0 100644 --- a/topics/partitioning.md +++ b/topics/partitioning.md @@ -50,7 +50,7 @@ Some features of Redis don't play very well with partitioning: Data store or cache? --- -Partitioning when using Redis ad a data store or cache is conceptually the same, however there is a huge difference. While when Redis is used as a data store you need to be sure that a given key always maps to the same instance, when Redis is used as a cache if a given node is unavailable it is not a big problem if we start using a different node, altering the key-instance map as we wish to improve the *availability* of the system (that is, the ability of the system to reply to our queries). +Partitioning when using Redis as a data store or cache is conceptually the same, however there is a huge difference. While when Redis is used as a data store you need to be sure that a given key always maps to the same instance, when Redis is used as a cache if a given node is unavailable it is not a big problem if we start using a different node, altering the key-instance map as we wish to improve the *availability* of the system (that is, the ability of the system to reply to our queries). Consistent hashing implementations are often able to switch to other nodes if the preferred node for a given key is not available. Similarly if you add a new node, part of the new keys will start to be stored on the new node. From 6c0d29710f94e32c2715618148f3e142a1f055ab Mon Sep 17 00:00:00 2001 From: antirez Date: Wed, 15 May 2013 12:54:09 +0200 Subject: [PATCH 0031/2573] Changes to the sponsors page. --- topics/sponsors.md | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/topics/sponsors.md b/topics/sponsors.md index 3fb8ae516f..a2e94d3aa1 100644 --- a/topics/sponsors.md +++ b/topics/sponsors.md @@ -1,7 +1,9 @@ Redis Sponsors === -All the work [Salvatore Sanfilippo](http://twitter.com/antirez) and [Pieter Noordhuis](http://twitter.com/pnoordhuis) are doing in order to develop Redis is sponsored by [VMware](http://vmware.com). The Redis project no longer accepts money donations. +Starting from May 2013, all the work [Salvatore Sanfilippo](http://twitter.com/antirez) is doing in order to develop Redis is sponsored by [Pivotal](http://gopivotal.com). The Redis project no longer accepts money donations. + +Before May 2013 the project was sponsored by VMware with the work of [Salvatore Sanfilippo](http://twitter.com/antirez) and [Pieter Noordhuis](http://twitter.com/pnoordhuis). In the past Redis accepted donations from the following companies: @@ -17,6 +19,6 @@ Also thanks to the following people or organizations that donated to the Project * [Brad Jasper](http://bradjasper.com/) * [Mrkris](http://www.mrkris.com/) -We are grateful to [VMware](http://vmware.com) and to the companies and people that donated to the Redis project. Thank you. +We are grateful to [Pivotal](http://gopivotal.com), [VMware](http://vmware.com) and to the other companies and people that donated to the Redis project. Thank you. The Redis.io domain is kindly donated to the project by [I Want My Name](http://iwantmyname.com). 
From 14cc15a04a74c63864b1c96fc83c8936cb4dc04d Mon Sep 17 00:00:00 2001 From: BB Date: Sat, 18 May 2013 08:43:46 +0200 Subject: [PATCH 0032/2573] Added Redis client for Rebol. --- clients.json | 8 ++++++++ 1 file changed, 8 insertions(+) diff --git a/clients.json b/clients.json index 327a74c93c..c0caf92da5 100644 --- a/clients.json +++ b/clients.json @@ -399,6 +399,14 @@ "active": true }, + { + "name": "prot-redis", + "language": "Rebol", + "repository": "https://github.com/rebolek/prot-redis", + "description": "Redis network scheme for Rebol 3", + "authors": ["rebolek"] + }, + { "name": "scala-redis", "language": "Scala", From 68f2caa343f6921e30d78844d2f8f4d4d04cf0a2 Mon Sep 17 00:00:00 2001 From: Matt MacAulay Date: Fri, 14 Jun 2013 15:00:51 -0400 Subject: [PATCH 0033/2573] Added Brando to the list of clients --- clients.json | 8 ++++++++ 1 file changed, 8 insertions(+) diff --git a/clients.json b/clients.json index 327a74c93c..5bdbb735d6 100644 --- a/clients.json +++ b/clients.json @@ -669,5 +669,13 @@ "repository": "https://github.com/ctstone/csredis", "description": "Async (and sync) client for Redis and Sentinel", "authors": ["ctnstone"] + }, + + { + "name": "Brando", + "language": "Scala", + "repository": "https://github.com/chrisdinn/brando", + "description": "A Redis client written with the Akka IO package introduced in Akka 2.2.", + "authors": ["chrisdinn"] } ] From c49821ce776a8e1c4e9ba41e1d8898b8c1739cc9 Mon Sep 17 00:00:00 2001 From: Matt Perpick Date: Tue, 18 Jun 2013 12:16:30 -0300 Subject: [PATCH 0034/2573] Fixing latency typo s/log time/long time --- topics/latency.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/topics/latency.md b/topics/latency.md index 261a5ea2dd..29bbaaff1a 100644 --- a/topics/latency.md +++ b/topics/latency.md @@ -436,7 +436,7 @@ The active expiring is designed to be adaptive. An expire cycle is started every + Sample `REDIS_EXPIRELOOKUPS_PER_CRON` keys, evicting all the keys already expired. + If the more than 25% of the keys were found expired, repeat. -Given that `REDIS_EXPIRELOOKUPS_PER_CRON` is set to 10 by default, and the process is performed ten times per second, usually just 100 keys per second are actively expired. This is enough to clean the DB fast enough even when already expired keys are not accessed for a log time, so that the *lazy* algorithm does not help. At the same time expiring just 100 keys per second has no effects in the latency a Redis instance. +Given that `REDIS_EXPIRELOOKUPS_PER_CRON` is set to 10 by default, and the process is performed ten times per second, usually just 100 keys per second are actively expired. This is enough to clean the DB fast enough even when already expired keys are not accessed for a long time, so that the *lazy* algorithm does not help. At the same time expiring just 100 keys per second has no effects in the latency a Redis instance. However the algorithm is adaptive and will loop if it founds more than 25% of keys already expired in the set of sampled keys. But given that we run the algorithm ten times per second, this means that the unlucky event of more than 25% of the keys in our random sample are expiring at least *in the same second*. From 9f61549ef4923c47708f82d3f380be73d0555940 Mon Sep 17 00:00:00 2001 From: antirez Date: Wed, 26 Jun 2013 11:38:56 +0200 Subject: [PATCH 0035/2573] PUBSUB command documented. 
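As a quick illustration of the new command (a sketch with hypothetical reply
values, assuming one client subscribed to the channel "news.tech" and another
client subscribed to the pattern "news.*"), a redis-cli session could look
like this:

    redis 127.0.0.1:6379> PUBSUB CHANNELS
    1) "news.tech"
    redis 127.0.0.1:6379> PUBSUB NUMSUB news.tech news.sport
    1) "news.tech"
    2) (integer) 1
    3) "news.sport"
    4) (integer) 0
    redis 127.0.0.1:6379> PUBSUB NUMPAT
    (integer) 1

Note how the pattern subscriber is not counted by CHANNELS or NUMSUB, as
documented below: those subcommands ignore clients subscribed to patterns.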
---
 commands.json      | 18 ++++++++++++++++++
 commands/pubsub.md | 42 ++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 60 insertions(+)
 create mode 100644 commands/pubsub.md

diff --git a/commands.json b/commands.json
index 16ed1194c4..2f2d70f05a 100644
--- a/commands.json
+++ b/commands.json
@@ -1121,6 +1121,24 @@
         "since": "2.0.0",
         "group": "pubsub"
     },
+    "PUBSUB": {
+        "summary": "Inspect the state of the Pub/Sub subsystem",
+        "complexity": "O(N) for the CHANNELS subcommand, where N is the number of active channels, and assuming constant time pattern matching (relatively short channels and patterns). O(N) for the NUMSUB subcommand, where N is the number of requested channels. O(1) for the NUMPAT subcommand.",
+        "arguments": [
+            {
+                "name": "subcommand",
+                "type": "string"
+            },
+            {
+                "name": "argument",
+                "type": "string",
+                "optional": true,
+                "multiple": true
+            }
+        ],
+        "since": "2.8.0",
+        "group": "pubsub"
+    },
     "PTTL": {
         "summary": "Get the time to live for a key in milliseconds",
         "complexity": "O(1)",
diff --git a/commands/pubsub.md b/commands/pubsub.md
new file mode 100644
index 0000000000..319653f738
--- /dev/null
+++ b/commands/pubsub.md
@@ -0,0 +1,42 @@
+The PUBSUB command is an introspection command that allows inspecting the
+state of the Pub/Sub subsystem. It is composed of subcommands that are
+documented separately. The general form is:
+
+    PUBSUB <subcommand> ... args ...
+
+# PUBSUB CHANNELS [pattern]
+
+Lists the currently *active channels*. An active channel is a Pub/Sub channel
+with one or more subscribers (not including clients subscribed to patterns).
+
+If no `pattern` is specified, all the channels are listed, otherwise, if a
+pattern is specified, only channels matching the specified glob-style pattern
+are listed.
+
+@return
+
+@multi-bulk-reply: a list of active channels, optionally matching the specified pattern.
+
+# PUBSUB NUMSUB [channel-1 ... channel-N]
+
+Returns the number of subscribers (not counting clients subscribed to patterns)
+for the specified channels.
+
+@return
+
+@multi-bulk-reply: a list of channels and number of subscribers for every channel. The format is channel, count, channel, count, ..., so the list is flat.
+The order in which the channels are listed is the same as the order of the
+channels specified in the command call.
+
+Note that it is valid to call this command without channels. In this case it
+will just return an empty list.
+
+# PUBSUB NUMPAT
+
+Returns the number of subscriptions to patterns (that are performed using the
+`PSUBSCRIBE` command). Note that this is not just the count of clients subscribed
+to patterns but the total number of patterns all the clients are subscribed to.
+
+@return
+
+@integer-reply: the number of patterns all the clients are subscribed to.
From 9db568250965da21e47258e1292757414c7f556b Mon Sep 17 00:00:00 2001 From: Tianon Gravi Date: Thu, 4 Jul 2013 14:56:29 -0600 Subject: [PATCH 0036/2573] Swapped MojoX::Redis for Mojo::Redis MojoX::Redis is deprecated in favor of Mojo::Redis --- clients.json | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/clients.json b/clients.json index 327a74c93c..53b6642e31 100644 --- a/clients.json +++ b/clients.json @@ -291,12 +291,12 @@ }, { - "name": "MojoX::Redis", + "name": "Mojo::Redis", "language": "Perl", - "url": "http://search.cpan.org/dist/MojoX-Redis", - "repository": "https://github.com/und3f/mojox-redis", + "url": "http://search.cpan.org/dist/Mojo-Redis", + "repository": "https://github.com/marcusramberg/mojo-redis", "description": "asynchronous Redis client for Mojolicious", - "authors": ["und3f"], + "authors": ["und3f", "marcusramberg", "jhthorsen"], "active": true }, From 510229a5efe3e8035fdfd103a4365162785ad323 Mon Sep 17 00:00:00 2001 From: Jan-Erik Rediger Date: Thu, 18 Jul 2013 22:12:07 +0200 Subject: [PATCH 0037/2573] Document COPY and REPLACE option for migrate. --- commands.json | 12 ++++++++++++ commands/migrate.md | 5 +++++ 2 files changed, 17 insertions(+) diff --git a/commands.json b/commands.json index 2f2d70f05a..a50abdcf82 100644 --- a/commands.json +++ b/commands.json @@ -964,6 +964,18 @@ { "name": "timeout", "type": "integer" + }, + { + "name": "copy", + "type": "enum", + "enum": ["COPY"], + "optional": true + }, + { + "name": "replace", + "type": "enum", + "enum": ["REPLACE"], + "optional": true } ], "since": "2.6.0", diff --git a/commands/migrate.md b/commands/migrate.md index 69736e1f21..775d1ea6fc 100644 --- a/commands/migrate.md +++ b/commands/migrate.md @@ -37,6 +37,11 @@ same name was also _already_ present on the target instance). On success OK is returned. +## Options + +* `COPY` -- Do not remove the key from the local instance. +* `REPLACE` -- Replace existing key on the remote instance. + @return @status-reply: The command returns OK on success. From 6dbf47427835c6120b5f9f35a3c9c1689318c54b Mon Sep 17 00:00:00 2001 From: antirez Date: Fri, 19 Jul 2013 10:44:50 +0200 Subject: [PATCH 0038/2573] Replication page updated with Redis 2.8 features. --- topics/replication.md | 65 ++++++++++++++++++++++++++++++++++++++++++- 1 file changed, 64 insertions(+), 1 deletion(-) diff --git a/topics/replication.md b/topics/replication.md index 43c5c27014..3191fc7c5e 100644 --- a/topics/replication.md +++ b/topics/replication.md @@ -6,6 +6,8 @@ replication that allows slave Redis servers to be exact copies of master servers. The following are some very important facts about Redis replication: +* Redis uses asynchronous replication. Starting with Redis 2.8 there is however a periodic (one time every second) acknowledge of the replication stream processed by slaves. + * A master can have multiple slaves. * Slaves are able to accept other slaves connections. Aside from @@ -56,7 +58,33 @@ slave link goes down for some reason. If the master receives multiple concurrent slave synchronization requests, it performs a single background save in order to serve all of them. -When a master and a slave reconnects after the link went down, a full resync is performed. +When a master and a slave reconnects after the link went down, a full resync +is always performed. However starting with Redis 2.8, a partial resynchronization +is also possible. 
+
+Partial resynchronization
+---
+
+Starting with Redis 2.8, master and slave are usually able to continue the
+replication process without requiring a full resynchronization after the
+replication link went down.
+
+This works using an in-memory backlog of the replication stream in the
+master side. Also the master and all the slaves agree on a *replication
+offset* and a *master run id*, so when the link goes down, the slave will
+reconnect and ask the master to continue the replication, assuming the
+master run id is still the same, and that the offset specified is available
+in the replication backlog.
+
+If the conditions are met, the master just sends the part of the replication
+stream the slave missed, and the replication continues.
+Otherwise a full resynchronization is performed as in the past versions of
+Redis.
+
+The new partial resynchronization feature uses the `PSYNC` command internally,
+while the old implementation used the `SYNC` command, however a Redis 2.8
+slave is able to detect if the server it is talking with does not support
+`PSYNC`, and will use `SYNC` instead.
 
 Configuration
 ---
@@ -70,6 +98,10 @@ Of course you need to replace 192.168.1.1 6379 with your master IP address (or
 hostname) and port. Alternatively, you can call the `SLAVEOF` command and the
 master host will start a sync with the slave.
 
+There are also a few parameters in order to tune the replication backlog taken
+in memory by the master to perform the partial resynchronization. See the example
+`redis.conf` shipped with the Redis distribution for more information.
+
 Read only slave
 ---
 
@@ -93,3 +125,34 @@ To do it on a running instance, use `redis-cli` and type:
 
 To set it permanently, add this to your config file:
 
     masterauth <password>
+
+Allow writes only with N attached replicas
+---
+
+Starting with Redis 2.8 it is possible to configure a Redis master in order to
+accept write queries only if at least N slaves are currently connected to the
+master, in order to improve data safety.
+
+However, because Redis uses asynchronous replication, it is not possible to ensure
+the slave actually received a given write, so there is always a window for data
+loss.
+
+This is how the feature works:
+
+* Redis slaves ping the master every second, acknowledging the amount of replication stream processed.
+* Redis masters will remember the last time they received a ping from every slave.
+* The user can configure a minimum number of slaves that have a lag not greater than a maximum number of seconds.
+
+If there are at least N slaves, with a lag less than M seconds, then the write will be accepted.
+
+You may think of it as a relaxed version of the "C" in the CAP theorem, where consistency is not ensured for a given write, but at least the time window for data loss is restricted to a given number of seconds.
+
+If the conditions are not met, the master will instead reply with an error and the write will not be accepted.
+
+There are two configuration parameters for this feature:
+
+* min-slaves-to-write `<number of slaves>`
+* min-slaves-max-lag `<number of seconds>`
+
+For more information please check the example `redis.conf` file shipped with the
+Redis source distribution.

From f3323a2761b6244452c923504d1c036cacdfccec Mon Sep 17 00:00:00 2001
From: antirez
Date: Fri, 19 Jul 2013 11:20:48 +0200
Subject: [PATCH 0039/2573] CONFIG REWRITE documented.
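As a minimal illustration of the intended workflow (the values below are made
up for the example):

    redis 127.0.0.1:6379> CONFIG SET maxmemory 100mb
    OK
    redis 127.0.0.1:6379> CONFIG REWRITE
    OK

After the rewrite, the maxmemory directive in redis.conf reflects the value
set at runtime with CONFIG SET, so the change survives a restart.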
---
 commands.json              |  5 +++++
 commands/config rewrite.md | 20 ++++++++++++++++++++
 2 files changed, 25 insertions(+)
 create mode 100644 commands/config rewrite.md

diff --git a/commands.json b/commands.json
index 2f2d70f05a..d0d4b14a46 100644
--- a/commands.json
+++ b/commands.json
@@ -180,6 +180,11 @@
         "since": "2.0.0",
         "group": "server"
     },
+    "CONFIG REWRITE": {
+        "summary": "Rewrite the configuration file with the in memory configuration",
+        "since": "2.8.0",
+        "group": "server"
+    },
     "CONFIG SET": {
         "summary": "Set a configuration parameter to the given value",
         "arguments": [
diff --git a/commands/config rewrite.md b/commands/config rewrite.md
new file mode 100644
index 0000000000..dcf5a973ad
--- /dev/null
+++ b/commands/config rewrite.md
@@ -0,0 +1,20 @@
+The `CONFIG REWRITE` command rewrites the `redis.conf` file the server was started with, applying the minimal changes needed to make it reflect the configuration currently used by the server, which may be different compared to the original one because of the use of the `CONFIG SET` command.
+
+The rewrite is performed in a very conservative way:
+
+* Comments and the overall structure of the original redis.conf are preserved as much as possible.
+* If an option already exists in the old redis.conf file, it will be rewritten at the same position (line number).
+* If an option was not already present, but it is set to its default value, it is not added by the rewrite process.
+* If an option was not already present, but it is set to a non-default value, it is appended at the end of the file.
+* Unused lines are blanked. For instance if you used to have multiple `save` directives, but the current configuration has fewer or none as you disabled RDB persistence, all the lines will be blanked.
+
+CONFIG REWRITE is also able to rewrite the configuration file from scratch if the original one no longer exists for some reason. However if the server was started without a configuration file at all, CONFIG REWRITE will just return an error.
+
+## Atomic rewrite process
+
+In order to make sure the redis.conf file is always consistent, that is, on errors or crashes you always end with the old file, or the new one, the rewrite is performed with a single `write(2)` call that has enough content to be at least as big as the old file. Sometimes additional padding in the form of comments is added in order to make sure the resulting file is big enough, and later the file gets truncated to remove the padding at the end.
+
+@return
+
+@status-reply: `OK` when the configuration was rewritten properly.
+Otherwise an error is returned.

From b478a67868f3636ac27df0464c8a1de97c7d4d95 Mon Sep 17 00:00:00 2001
From: antirez
Date: Fri, 19 Jul 2013 15:41:14 +0200
Subject: [PATCH 0040/2573] Benchmark page updated.

---
 topics/benchmarks.md | 152 ++++++++++++++++++++++++++++++------
 1 file changed, 128 insertions(+), 24 deletions(-)

diff --git a/topics/benchmarks.md b/topics/benchmarks.md
index 1a25bc6c58..b645f4ea28 100644
--- a/topics/benchmarks.md
+++ b/topics/benchmarks.md
@@ -1,35 +1,35 @@
 # How fast is Redis?
 
-Redis includes the `redis-benchmark` utility that simulates SETs/GETs done by N
-clients at the same time sending M total queries (it is similar to the Apache's
-`ab` utility). Below you'll find the full output of a benchmark executed
+Redis includes the `redis-benchmark` utility that simulates running commands
+by N clients at the same time sending M total queries (it is similar to
+Apache's `ab` utility). Below you'll find the full output of a benchmark executed
 against a Linux box.
 
 The following options are supported:
 
     Usage: redis-benchmark [-h <host>] [-p <port>] [-c <clients>] [-n <requests>] [-k <boolean>]
 
-    -h <hostname>      Server hostname (default 127.0.0.1)
-    -p <port>          Server port (default 6379)
-    -s <socket>        Server socket (overrides host and port)
-    -c <clients>       Number of parallel connections (default 50)
-    -n <requests>      Total number of requests (default 10000)
-    -d <size>          Data size of SET/GET value in bytes (default 2)
-    -k <boolean>       1=keep alive 0=reconnect (default 1)
-    -r <keyspacelen>   Use random keys for SET/GET/INCR, random values for SADD
+    -h <hostname>      Server hostname (default 127.0.0.1)
+    -p <port>          Server port (default 6379)
+    -s <socket>        Server socket (overrides host and port)
+    -c <clients>       Number of parallel connections (default 50)
+    -n <requests>      Total number of requests (default 10000)
+    -d <size>          Data size of SET/GET value in bytes (default 2)
+    -k <boolean>       1=keep alive 0=reconnect (default 1)
+    -r <keyspacelen>   Use random keys for SET/GET/INCR, random values for SADD
      Using this option the benchmark will get/set keys
      in the form mykey_rand:000000012456 instead of constant
      keys, the argument determines the max
      number of values for the random number. For instance
      if set to 10 only rand:000000000000 - rand:000000000009
      range will be allowed.
-    -P <numreq>        Pipeline <numreq> requests. Default 1 (no pipeline).
-    -q                 Quiet. Just show query/sec values
-    --csv              Output in CSV format
-    -l                 Loop. Run the tests forever
-    -t <tests>         Only run the comma separated list of tests. The test
-                       names are the same as the ones produced as output.
-    -I                 Idle mode. Just open N idle connections and wait.
+    -P <numreq>        Pipeline <numreq> requests. Default 1 (no pipeline).
+    -q                 Quiet. Just show query/sec values
+    --csv              Output in CSV format
+    -l                 Loop. Run the tests forever
+    -t <tests>         Only run the comma separated list of tests. The test
+                       names are the same as the ones produced as output.
+    -I                 Idle mode. Just open N idle connections and wait.
 
 You need to have a running Redis instance before launching the benchmark.
 A typical example would be:
 
     redis-benchmark -q -n 100000
 
 Using this tool is quite easy, and you can also write your own benchmark,
 but as with any benchmarking activity, there are some pitfalls to avoid.
 
+Running only a subset of the tests
+---
+
+You don't need to run all the default tests every time you execute redis-benchmark.
+The simplest way to select only a subset of tests is to use the `-t` option
+like in the following example:
+
+    $ redis-benchmark -t set,lpush -n 100000 -q
+    SET: 74239.05 requests per second
+    LPUSH: 79239.30 requests per second
+
+In the above example we asked to just run test the SET and LPUSH commands,
+in quite mode (see the `-q` switch).
+
+It is also possible to specify the command to benchmark directly like in the
+following example:
+
+    $ redis-benchmark -n 100000 -q script load "redis.call('set','foo','bar')"
+    script load redis.call('set','foo','bar'): 69881.20 requests per second
+
+Selecting the size of the key space
+---
+
+By default the benchmark runs against a single key. In Redis the difference
+between such a synthetic benchmark and a real one is not huge since it is an
+in memory system, however it is possible to stress cache misses and in general
+to simulate a more real-world work load by using a large key space.
+
+This is obtained by using the `-r` switch. For instance if I want to run
+one million of SET operations, using a random key for every operation out of
+100k possible keys, I'll use the following command line:
+
+    $ redis-cli flushall
+    OK
+
+    $ redis-benchmark -t set -r 100000 -n 1000000
+    ====== SET ======
+    1000000 requests completed in 13.86 seconds
+    50 parallel clients
+    3 bytes payload
+    keep alive: 1
+
+    99.76% `<=` 1 milliseconds
+    99.98% `<=` 2 milliseconds
+    100.00% `<=` 3 milliseconds
+    100.00% `<=` 3 milliseconds
+    72144.87 requests per second
+
+    $ redis-cli dbsize
+    (integer) 99993
+
+Using pipelining
+---
+
+By default every client (the benchmark simulates 50 clients if not otherwise
+specified with `-c`) sends the next command only when the reply of the previous
+command is received; this means that the server will likely need a read call
+in order to read each command from every client. Also RTT is paid as well.
+
+Redis supports [pipelining](/topics/pipelining), so it is possible to send
+multiple commands at once, a feature often exploited by real world applications.
+Redis pipelining is able to dramatically improve the number of operations per
+second a server is able to deliver.
+
+This is an example of running the benchmark in a MacBook Air 11" using a
+pipelining of 16 commands:
+
+    $ redis-benchmark -n 1000000 -t set,get -P 16 -q
+    SET: 403063.28 requests per second
+    GET: 508388.41 requests per second
+
+Using pipelining resulted into a sensible amount of more commands processed.
+
 Pitfalls and misconceptions
 ---------------------------
@@ -239,16 +312,47 @@ the generated log file on a remote filesystem.
 instance using INFO at regular interval to gather statistics is probably fine,
 but MONITOR will impact the measured performance significantly.
 
-# Example of benchmark result
+# Benchmark results on different virtualized and bare metal servers.
+
+* The test was done with 50 simultaneous clients performing 2 million requests.
+* Redis 2.6.14 is used for all the tests.
+* Test executed using the loopback interface.
+* Test executed using a key space of 1 million keys.
+* Test executed with and without pipelining (16 commands pipeline).
+
+**Intel(R) Xeon(R) CPU E5520 @ 2.27GHz (with pipelining)**
+
+    $ ./redis-benchmark -r 1000000 -n 2000000 -t get,set,lpush,lpop -P 16 -q
+    SET: 552028.75 requests per second
+    GET: 707463.75 requests per second
+    LPUSH: 767459.75 requests per second
+    LPOP: 770119.38 requests per second
+
+**Intel(R) Xeon(R) CPU E5520 @ 2.27GHz (without pipelining)**
+
+    $ ./redis-benchmark -r 1000000 -n 2000000 -t get,set,lpush,lpop -q
+    SET: 122556.53 requests per second
+    GET: 123601.76 requests per second
+    LPUSH: 136752.14 requests per second
+    LPOP: 132424.03 requests per second
+
+**Linode 2048 instance (with pipelining)**
+
+    $ ./redis-benchmark -r 1000000 -n 2000000 -t get,set,lpush,lpop -q -P 16
+    SET: 195503.42 requests per second
+    GET: 250187.64 requests per second
+    LPUSH: 230547.55 requests per second
+    LPOP: 250815.16 requests per second
 
-* The test was done with 50 simultaneous clients performing 100000 requests.
-* The value SET and GET is a 256 bytes string.
-* The Linux box is running *Linux 2.6*, it's *Xeon X3320 2.5 GHz*.
-* Text executed using the loopback interface (127.0.0.1).
+**Linode 2048 instance (without pipelining)** -Results: *about 110000 SETs per second, about 81000 GETs per second.* + $ ./redis-benchmark -r 1000000 -n 2000000 -t get,set,lpush,lpop -q + SET: 35001.75 requests per second + GET: 37481.26 requests per second + LPUSH: 36968.58 requests per second + LPOP: 35186.49 requests per second -## Latency percentiles +## More detailed tests without pipelining $ redis-benchmark -n 100000 From 7a87240ed0e105906d7005874df0e9142f2aafb2 Mon Sep 17 00:00:00 2001 From: Philipp Klose Date: Fri, 26 Jul 2013 02:16:15 +0200 Subject: [PATCH 0041/2573] Haxe was renamed Haxe was renamed. From "haXe" to "Haxe". --- clients.json | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/clients.json b/clients.json index 327a74c93c..1a7d9e6d33 100644 --- a/clients.json +++ b/clients.json @@ -489,7 +489,7 @@ { "name": "hxneko-redis", - "language": "haXe", + "language": "Haxe", "url": "http://code.google.com/p/hxneko-redis", "repository": "http://code.google.com/p/hxneko-redis/source/browse", "description": "", From ae80f66def21b68498f5c592d974cb3bdc1196fb Mon Sep 17 00:00:00 2001 From: sugelav Date: Sat, 27 Jul 2013 23:25:51 +0530 Subject: [PATCH 0042/2573] Added entry for aredis java client in clients.json. --- clients.json | 8 ++++++++ 1 file changed, 8 insertions(+) diff --git a/clients.json b/clients.json index 327a74c93c..df24918f70 100644 --- a/clients.json +++ b/clients.json @@ -212,6 +212,14 @@ "active": true }, + { + "name": "aredis", + "language": "Java", + "repository": "http://aredis.sourceforge.net/", + "description": "Asynchronous, pipelined client based on Java 7 NIO Channel API", + "authors": ["msuresh"] + }, + { "name": "redis-lua", "language": "Lua", From 209a76a7270d85a84452b5cfd2580cdc9fe314b1 Mon Sep 17 00:00:00 2001 From: sugelav Date: Sun, 28 Jul 2013 12:57:02 +0530 Subject: [PATCH 0043/2573] Minor change to the description of aredis. --- clients.json | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/clients.json b/clients.json index df24918f70..00fd367596 100644 --- a/clients.json +++ b/clients.json @@ -216,7 +216,7 @@ "name": "aredis", "language": "Java", "repository": "http://aredis.sourceforge.net/", - "description": "Asynchronous, pipelined client based on Java 7 NIO Channel API", + "description": "Asynchronous, pipelined client based on the Java 7 NIO Channel API", "authors": ["msuresh"] }, From 380858ed6c4ab72c86959626c94887ac462da40f Mon Sep 17 00:00:00 2001 From: sugelav Date: Mon, 29 Jul 2013 22:27:12 +0530 Subject: [PATCH 0044/2573] Blanked out author tag for aredis in clients.json since author msuresh does not have a twitter account. --- clients.json | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/clients.json b/clients.json index 00fd367596..63dc9d9ee5 100644 --- a/clients.json +++ b/clients.json @@ -217,7 +217,7 @@ "language": "Java", "repository": "http://aredis.sourceforge.net/", "description": "Asynchronous, pipelined client based on the Java 7 NIO Channel API", - "authors": ["msuresh"] + "authors": [] }, { From 8e6266bb33edba5e81afd765ccdf2ca6bf7cbf63 Mon Sep 17 00:00:00 2001 From: Shawn Milochik Date: Fri, 9 Aug 2013 17:04:35 -0400 Subject: [PATCH 0045/2573] typo & grammar fixes, other minor edits Typos: (quite vs quiet, text vs test) A couple of capitalization fixes. A few small English grammar improvements. 
--- topics/benchmarks.md | 58 ++++++++++++++++++++++---------------------- 1 file changed, 29 insertions(+), 29 deletions(-) diff --git a/topics/benchmarks.md b/topics/benchmarks.md index b645f4ea28..e95a3f35a2 100644 --- a/topics/benchmarks.md +++ b/topics/benchmarks.md @@ -27,7 +27,7 @@ The following options are supported: -q Quiet. Just show query/sec values --csv Output in CSV format -l Loop. Run the tests forever - -t Only run the comma separated list of tests. The test + -t Only run the comma-separated list of tests. The test names are the same as the ones produced as output. -I Idle mode. Just open N idle connections and wait. @@ -51,7 +51,7 @@ like in the following example: LPUSH: 79239.30 requests per second In the above example we asked to just run test the SET and LPUSH commands, -in quite mode (see the `-q` switch). +in quiet mode (see the `-q` switch). It is also possible to specify the command to benchmark directly like in the following example: @@ -64,11 +64,11 @@ Selecting the size of the key space By default the benchmark runs against a single key. In Redis the difference between such a synthetic benchmark and a real one is not huge since it is an -in memory system, however it is possible to stress cache misses and in general +in-memory system, however it is possible to stress cache misses and in general to simulate a more real-world work load by using a large key space. This is obtained by using the `-r` switch. For instance if I want to run -one million of SET operations, using a random key for every operation out of +one million SET operations, using a random key for every operation out of 100k possible keys, I'll use the following command line: $ redis-cli flushall @@ -110,7 +110,7 @@ pipeling of 16 commands: SET: 403063.28 requests per second GET: 508388.41 requests per second -Using pipelining resulted into a sensible amount of more commands processed. +Using pipelining results in a significant increase in performance. Pitfalls and misconceptions --------------------------- @@ -124,8 +124,8 @@ in account. + Redis is a server: all commands involve network or IPC roundtrips. It is meaningless to compare it to embedded data stores such as SQLite, Berkeley DB, -Tokyo/Kyoto Cabinet, etc ... because the cost of most operations is precisely -dominated by network/protocol management. +Tokyo/Kyoto Cabinet, etc ... because the cost of most operations is +primarily in network/protocol management. + Redis commands return an acknowledgment for all usual commands. Some other data stores do not (for instance MongoDB does not implicitly acknowledge write operations). Comparing Redis to stores involving one-way queries is only @@ -136,7 +136,7 @@ you need multiple connections (like redis-benchmark) and/or to use pipelining to aggregate several commands and/or multiple threads or processes. + Redis is an in-memory data store with some optional persistency options. If you plan to compare it to transactional servers (MySQL, PostgreSQL, etc ...), -then you should consider activating AOF and decide of a suitable fsync policy. +then you should consider activating AOF and decide on a suitable fsync policy. + Redis is a single-threaded server. It is not designed to benefit from multiple CPU cores. People are supposed to launch several Redis instances to scale out on several cores if needed. It is not really fair to compare one @@ -184,7 +184,7 @@ memcached (dormando) developers. 
You can see that in the end, the difference between the two solutions is not so staggering, once all technical aspects are considered. Please note both -Redis and memcached have been optimized further after these benchmarks ... +Redis and memcached have been optimized further after these benchmarks. Finally, when very efficient servers are benchmarked (and stores like Redis or memcached definitely fall in this category), it may be difficult to saturate @@ -198,7 +198,7 @@ Factors impacting Redis performance There are multiple factors having direct consequences on Redis performance. We mention them here, since they can alter the result of any benchmarks. Please note however, that a typical Redis instance running on a low end, -non tuned, box usually provides good enough performance for most applications. +untuned box usually provides good enough performance for most applications. + Network bandwidth and latency usually have a direct impact on the performance. It is a good practice to use the ping program to quickly check the latency @@ -207,7 +207,7 @@ Regarding the bandwidth, it is generally useful to estimate the throughput in Gbits/s and compare it to the theoretical bandwidth of the network. For instance a benchmark setting 4 KB strings in Redis at 100000 q/s, would actually consume 3.2 Gbits/s of bandwidth -and probably fit with a 10 GBits/s link, but not a 1 Gbits/s one. In many real +and probably fit within a 10 GBits/s link, but not a 1 Gbits/s one. In many real world scenarios, Redis throughput is limited by the network well before being limited by the CPU. To consolidate several high-throughput Redis instances on a single server, it worth considering putting a 10 Gbits/s NIC @@ -215,24 +215,24 @@ or multiple 1 Gbits/s NICs with TCP/IP bonding. + CPU is another very important factor. Being single-threaded, Redis favors fast CPUs with large caches and not many cores. At this game, Intel CPUs are currently the winners. It is not uncommon to get only half the performance on -an AMD Opteron CPU compared to similar Nehalem EP/Westmere EP/Sandy bridge +an AMD Opteron CPU compared to similar Nehalem EP/Westmere EP/Sandy Bridge Intel CPUs with Redis. When client and server run on the same box, the CPU is the limiting factor with redis-benchmark. + Speed of RAM and memory bandwidth seem less critical for global performance especially for small objects. For large objects (>10 KB), it may become -noticeable though. Usually, it is not really cost effective to buy expensive +noticeable though. Usually, it is not really cost-effective to buy expensive fast memory modules to optimize Redis. -+ Redis runs slower on a VM. Virtualization toll is quite high because ++ Redis runs slower on a VM. The virtualization toll is quite high because for many common operations, Redis does not add much overhead on top of the required system calls and network interruptions. Prefer to run Redis on a physical box, especially if you favor deterministic latencies. On a state-of-the-art hypervisor (VMWare), result of redis-benchmark on a VM -through the physical network is almost divided by 2 compared to the +through the physical network is almost cut in half compared to the physical machine, with some significant CPU time spent in system and interruptions. + When the server and client benchmark programs run on the same box, both -the TCP/IP loopback and unix domain sockets can be used. 
It depends on the -platform, but unix domain sockets can achieve around 50% more throughput than +the TCP/IP loopback and unix domain sockets can be used. Depending on the +platform, unix domain sockets can achieve around 50% more throughput than the TCP/IP loopback (on Linux for instance). The default behavior of redis-benchmark is to use the TCP/IP loopback. + The performance benefit of unix domain sockets compared to TCP/IP loopback @@ -247,7 +247,7 @@ See the graph below. + On multi CPU sockets servers, Redis performance becomes dependant on the NUMA configuration and process location. The most visible effect is that -redis-benchmark results seem non deterministic because client and server +redis-benchmark results seem non-deterministic because client and server processes are distributed randomly on the cores. To get deterministic results, it is required to use process placement tools (on Linux: taskset or numactl). The most efficient combination is always to put the client and server on two @@ -260,7 +260,7 @@ Please note this benchmark is not meant to compare CPU models between themselves ![NUMA chart](https://github.com/dspezia/redis-doc/raw/6374a07f93e867353e5e946c1e39a573dfc83f6c/topics/NUMA_chart.gif) + With high-end configurations, the number of client connections is also an -important factor. Being based on epoll/kqueue, Redis event loop is quite +important factor. Being based on epoll/kqueue, the Redis event loop is quite scalable. Redis has already been benchmarked at more than 60000 connections, and was still able to sustain 50000 q/s in these conditions. As a rule of thumb, an instance with 30000 connections can only process half the throughput @@ -278,7 +278,7 @@ Jumbo frames may also provide a performance boost when large objects are used. + Depending on the platform, Redis can be compiled against different memory allocators (libc malloc, jemalloc, tcmalloc), which may have different behaviors in term of raw speed, internal and external fragmentation. -If you did not compile Redis by yourself, you can use the INFO command to check +If you did not compile Redis yourself, you can use the INFO command to check the mem_allocator field. Please note most benchmarks do not run long enough to generate significant external fragmentation (contrary to production Redis instances). @@ -289,7 +289,7 @@ Other things to consider One important goal of any benchmark is to get reproducible results, so they can be compared to the results of other tests. -+ A good practice is to try to run tests on isolated hardware as far as possible. ++ A good practice is to try to run tests on isolated hardware as much as possible. If it is not possible, then the system must be monitored to check the benchmark is not impacted by some external activity. + Some configurations (desktops and laptops for sure, some servers as well) @@ -300,8 +300,8 @@ reproducible results, it is better to set the highest possible fixed frequency for all the CPU cores involved in the benchmark. + An important point is to size the system accordingly to the benchmark. The system must have enough RAM and must not swap. On Linux, do not forget -to set the overcommit_memory parameter correctly. Please note 32 and 64 bits -Redis instances have not the same memory footprint. +to set the overcommit_memory parameter correctly. Please note 32 and 64 bit +Redis instances do not have the same memory footprint. + If you plan to use RDB or AOF for your benchmark, please check there is no other I/O activity in the system. 
Avoid putting RDB or AOF files on NAS or NFS shares, or on any other devices impacting your network bandwidth and/or latency @@ -312,13 +312,13 @@ the generated log file on a remote filesystem. instance using INFO at regular interval to gather statistics is probably fine, but MONITOR will impact the measured performance significantly. -# Benchmark results on different virtualized and bare metal servers. +# Benchmark results on different virtualized and bare-metal servers. * The test was done with 50 simultaneous clients performing 2 million requests. * Redis 2.6.14 is used for all the tests. -* Test executed using the loopback interface. -* Test executed using a key space of 1 million keys. -* Test executed with and without pipelining (16 commands pipeline). +* Test was executed using the loopback interface. +* Test was executed using a key space of 1 million keys. +* Test was executed with and without pipelining (16 commands pipeline). **Intel(R) Xeon(R) CPU E5520 @ 2.27GHz (with pipelining)** @@ -447,7 +447,7 @@ will output the following: LPUSH: 34803.41 requests per second LPOP: 37367.20 requests per second -Another one using a 64 bit box, a Xeon L5420 clocked at 2.5 GHz: +Another one using a 64-bit box, a Xeon L5420 clocked at 2.5 GHz: $ ./redis-benchmark -q -n 100000 PING: 111731.84 requests per second @@ -463,7 +463,7 @@ Another one using a 64 bit box, a Xeon L5420 clocked at 2.5 GHz: * Redis version **2.4.2** * Default number of connections, payload size = 256 * The Linux box is running *SLES10 SP3 2.6.16.60-0.54.5-smp*, CPU is 2 x *Intel X5670 @ 2.93 GHz*. -* Text executed while running redis server and benchmark client on the same CPU, but different cores. +* Test executed while running Redis server and benchmark client on the same CPU, but different cores. Using a unix domain socket: From 464e917b5c104d62d30aee601e6f3f91adc10805 Mon Sep 17 00:00:00 2001 From: Amber Jain Date: Thu, 15 Aug 2013 19:16:13 +0530 Subject: [PATCH 0046/2573] fixed typos in http://redis.io/topics/quickstart --- topics/quickstart.md | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/topics/quickstart.md b/topics/quickstart.md index c0dc6f80b3..5de6b92cf5 100644 --- a/topics/quickstart.md +++ b/topics/quickstart.md @@ -28,19 +28,19 @@ In order to compile Redis follow this simple steps: cd redis-stable make -At this point you can try if your build works correctly typing **make test**, but this is an optional step. After the compilation the **src** directory inside the Redis distribution is populated with the different executables that are part of Redis: +At this point you can try if your build works correctly by typing **make test**, but this is an optional step. After the compilation the **src** directory inside the Redis distribution is populated with the different executables that are part of Redis: * **redis-server** is the Redis Server itself. * **redis-cli** is the command line interface utility to talk with Redis. * **redis-benchmark** is used to check Redis performances. * **redis-check-aof** and **redis-check-dump** are useful in the rare event of corrupted data files. 
-It is a good idea to copy both the Redis server than the command line interface in proper places using the following commands: +It is a good idea to copy both the Redis server and the command line interface in proper places using the following commands: * sudo cp redis-server /usr/local/bin/ * sudo cp redis-cli /usr/local/bin/ -In the following documentation I assume that /usr/local/bin is in your PATH environment variable so you can execute both the binaries without specifying the full path. +In the following documentation I assume that /usr/local/bin is in your PATH environment variable so that you can execute both the binaries without specifying the full path. Starting Redis === @@ -114,7 +114,7 @@ commands calling methods. A short interactive example using Ruby: Redis persistence ================= -You can learn [how Redis persisence works in this page](http://redis.io/topics/persistence), however what is important to understand for a quick start is that by default, if you start Redis with the default configuration, Redis will spontaneously save the dataset only from time to time (for instance after at least five minutes if you have at least 100 changes in your data), so if you want your database to persist and be reloaded after a restart make sure to call the **SAVE** command manually every time you want to force a data set snapshot. Otherwise make sure to shutdown the database using the **SHUTDOWN** command: +You can learn [how Redis persisence works on this page](http://redis.io/topics/persistence), however what is important to understand for a quick start is that by default, if you start Redis with the default configuration, Redis will spontaneously save the dataset only from time to time (for instance after at least five minutes if you have at least 100 changes in your data), so if you want your database to persist and be reloaded after a restart make sure to call the **SAVE** command manually every time you want to force a data set snapshot. Otherwise make sure to shutdown the database using the **SHUTDOWN** command: $ redis-cli shutdown @@ -182,5 +182,5 @@ Make sure that everything is working as expected: * Check that your Redis instance is correctly logging in the log file. * If it's a new machine where you can try it without problems make sure that after a reboot everything is still working. -Note: in the above instructions we skipped many Redis configurations parameters that you would like to change, for instance in order to use AOF persistence instead of RDB persistence, or to setup replication, and so forth. +Note: In the above instructions we skipped many Redis configuration parameters that you would like to change, for instance in order to use AOF persistence instead of RDB persistence, or to setup replication, and so forth. Make sure to read the redis.conf file (that is heavily commented) and the other documentation you can find in this web site for more information. From a33ab77627e1871fc8b01fa2061533127ac266aa Mon Sep 17 00:00:00 2001 From: antirez Date: Wed, 21 Aug 2013 16:48:29 +0200 Subject: [PATCH 0047/2573] Slave election documented in Redis Cluster spec. --- topics/cluster-spec.md | 38 ++++++++++++++++++++++++++++++-------- 1 file changed, 30 insertions(+), 8 deletions(-) diff --git a/topics/cluster-spec.md b/topics/cluster-spec.md index e9efe98668..455c274d0a 100644 --- a/topics/cluster-spec.md +++ b/topics/cluster-spec.md @@ -400,18 +400,40 @@ The FAIL state for the cluster happens in two cases. 
The second check is required because in order to mark a node from PFAIL to FAIL state, the majority of masters are required. However when we are not connected with the majority of masters it is impossible from our side of the net split to mark nodes as FAIL. However since we detect this condition we set the Cluster state in FAIL mode to stop serving queries.
 
-Slave election (not implemented)
+Slave election
 ---
 
-The design of slave election is a work in progress right now.
+Once a master node is in FAIL state, if one or more slaves exist for this master one should be elected as a master and all the other slaves reconfigured to replicate with the new master.
 
-The idea is to use the concept of first slave, that is, out of all the
-slaves for a given node, the first slave is the one with the lower
-Node ID (comparing node IDs lexicographically).
+The election of a slave is a task that is handled directly by the slaves of the failing master. The trigger is the following set of conditions:
 
-However it is likely that the same system used for failure reports will be
-used in order to require the majority of masters to authorize the slave
-election.
+* A node is a slave of a master in FAIL state.
+* The master was serving a non-zero number of slots.
+* The slave's data is considered reliable, that is, from the point of view of the replication layer, the replication link has not been down for more than the configured node timeout multiplied by a given multiplication factor (see the `REDIS_CLUSTER_SLAVE_VALIDITY_MULT` define).
+
+If all the above conditions are true, the slave starts requesting the
+authorization to be promoted to master from all the reachable masters.
+
+A master will reply with a positive message `FAILOVER_AUTH_GRANTED` if the sender of the message has the following properties:
+
+* Is a slave, and the master is indeed in FAIL state.
+* Ordering all the slaves for this master, it has the lowest Node ID.
+* It appears to be up and running (no FAIL or PFAIL state).
+
+Once the slave receives the authorization from the majority of the masters within a certain amount of time, it starts the failover process performing the following tasks:
+
+* It starts advertising itself as a master (via PONG packets).
+* It also advertises it is a promoted slave (via PONG packets).
+* It also starts claiming all the hash slots that were served by the old master.
+* A PONG packet is broadcast to all the nodes to speed up the process.
+
+All the other nodes will update the configuration accordingly. Specifically:
+
+* All the slots claimed by the new master will be updated, since they are currently claimed by a master in FAIL state.
+* All the other slaves of the old master will detect the PROMOTED flag and will switch the replication to the new master.
+* If the old master comes back again, it will detect the PROMOTED flag and will configure itself as a slave of the new master.
+
+The PROMOTED flag will be lost by a node when it is turned again into a slave for some reason during the life of the cluster.
 
 Publish/Subscribe (implemented, but to refine)
 ===

From 5149995e8fc682be9e21afe7ee82e851193dff85 Mon Sep 17 00:00:00 2001
From: antirez
Date: Wed, 21 Aug 2013 16:49:22 +0200
Subject: [PATCH 0048/2573] typo in cluster spec. elected -> promoted.
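To make the promotion flow described in the spec concrete, consider a
hypothetical cluster where master A serves hash slots 0-5460 and has two
slaves, A1 and A2, and A enters the FAIL state. Ordering the slaves by Node
ID, A1 comes first, so A1 requests the failover authorization from the
reachable masters B and C. Once both reply with FAILOVER_AUTH_GRANTED, A1
starts advertising itself as a master and as a promoted slave, claiming
slots 0-5460; A2 detects the PROMOTED flag and starts replicating from A1,
and if A ever returns it reconfigures itself as a slave of A1.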
--- topics/cluster-spec.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/topics/cluster-spec.md b/topics/cluster-spec.md index 455c274d0a..0bbab07625 100644 --- a/topics/cluster-spec.md +++ b/topics/cluster-spec.md @@ -403,7 +403,7 @@ The second check is required because in order to mark a node from PFAIL to FAIL Slave election --- -Once a master node is in FAIL state, if one or more slaves exist for this master one should be elected as a master and all the other slaves reconfigured to replicate with the new master. +Once a master node is in FAIL state, if one or more slaves exist for this master one should be promoted as a master and all the other slaves reconfigured to replicate with the new master. The election of a slave is a task that is handled directly by the slaves of the failing master. The trigger is the following set of conditions: From 4424e5354cedd12057d33a809556396ac7bc643b Mon Sep 17 00:00:00 2001 From: Matteo Centenaro Date: Fri, 23 Aug 2013 18:17:40 +0200 Subject: [PATCH 0049/2573] The redhatvm cited article have a known bug The "Understanding Virtual Memory" article cited when motivating the setting for overcommit_memory had the meaning of the values 1 and 2 reversed. I found it while reading this comment http://superuser.com/a/200504. With this commit, I'm trying to make this known to the Redis FAQ reader. The proc(5) man page has it pretty clear: /proc/sys/vm/overcommit_memory This file contains the kernel virtual memory accounting mode. Values are: 0: heuristic overcommit (this is the default) 1: always overcommit, never check 2: always check, never overcommit In mode 0, calls of mmap(2) with MAP_NORESERVE are not checked, and the default check is very weak, leading to the risk of getting a process "OOM-killed". Under Linux 2.4 any nonzero value implies mode 1. In mode 2 (available since Linux 2.6), the total virtual address space on the system is limited to (SS + RAM*(r/100)), where SS is the size of the swap space, and RAM is the size of the physical memory, and r is the contents of the file /proc/sys/vm/overcommit_ratio. --- topics/faq.md | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/topics/faq.md b/topics/faq.md index dfb42a0665..154c1cbf16 100644 --- a/topics/faq.md +++ b/topics/faq.md @@ -252,8 +252,12 @@ more optimistic allocation fashion, and this is indeed what you want for Redis. A good source to understand how Linux Virtual Memory work and other alternatives for `overcommit_memory` and `overcommit_ratio` is this classic from Red Hat Magazine, ["Understanding Virtual Memory"][redhatvm]. +Beware, this article had 1 and 2 configurtation value for `overcommit_memory` +reversed: reffer to the ["proc(5)"][proc5] man page for the right meaning of the +available values. [redhatvm]: http://www.redhat.com/magazine/001nov04/features/vm/ +[proc5]: http://man7.org/linux/man-pages/man5/proc.5.html ## Are Redis on disk snapshots atomic? From a7a6c8751c6a798713a2d4e35c07fc1141c2c642 Mon Sep 17 00:00:00 2001 From: Matteo Centenaro Date: Fri, 23 Aug 2013 18:23:26 +0200 Subject: [PATCH 0050/2573] Remove " araund proc(5) --- topics/faq.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/topics/faq.md b/topics/faq.md index 154c1cbf16..5ea626080a 100644 --- a/topics/faq.md +++ b/topics/faq.md @@ -253,7 +253,7 @@ A good source to understand how Linux Virtual Memory work and other alternatives for `overcommit_memory` and `overcommit_ratio` is this classic from Red Hat Magazine, ["Understanding Virtual Memory"][redhatvm]. 
Beware, this article had 1 and 2 configurtation value for `overcommit_memory` -reversed: reffer to the ["proc(5)"][proc5] man page for the right meaning of the +reversed: reffer to the [proc(5)][proc5] man page for the right meaning of the available values. [redhatvm]: http://www.redhat.com/magazine/001nov04/features/vm/ From 0c9c73adbf7d10d3d6768df122dfdb5fe3f86bb3 Mon Sep 17 00:00:00 2001 From: Matteo Centenaro Date: Fri, 23 Aug 2013 18:25:19 +0200 Subject: [PATCH 0051/2573] FIX: typos here and there - configurtation -> configuration - reffer -> refer --- topics/faq.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/topics/faq.md b/topics/faq.md index 5ea626080a..c7294947ec 100644 --- a/topics/faq.md +++ b/topics/faq.md @@ -252,8 +252,8 @@ more optimistic allocation fashion, and this is indeed what you want for Redis. A good source to understand how Linux Virtual Memory work and other alternatives for `overcommit_memory` and `overcommit_ratio` is this classic from Red Hat Magazine, ["Understanding Virtual Memory"][redhatvm]. -Beware, this article had 1 and 2 configurtation value for `overcommit_memory` -reversed: reffer to the [proc(5)][proc5] man page for the right meaning of the +Beware, this article had 1 and 2 configuration value for `overcommit_memory` +reversed: refer to the [proc(5)][proc5] man page for the right meaning of the available values. [redhatvm]: http://www.redhat.com/magazine/001nov04/features/vm/ From 9ebac39d49f576a4e14ff9a4ffe642fb48aa68ba Mon Sep 17 00:00:00 2001 From: Matteo Centenaro Date: Fri, 23 Aug 2013 18:28:43 +0200 Subject: [PATCH 0052/2573] Format option values as code --- topics/faq.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/topics/faq.md b/topics/faq.md index c7294947ec..9e53d4bde0 100644 --- a/topics/faq.md +++ b/topics/faq.md @@ -252,7 +252,7 @@ more optimistic allocation fashion, and this is indeed what you want for Redis. A good source to understand how Linux Virtual Memory work and other alternatives for `overcommit_memory` and `overcommit_ratio` is this classic from Red Hat Magazine, ["Understanding Virtual Memory"][redhatvm]. -Beware, this article had 1 and 2 configuration value for `overcommit_memory` +Beware, this article had `1` and `2` configuration values for `overcommit_memory` reversed: refer to the [proc(5)][proc5] man page for the right meaning of the available values. From f66cf7cffc7f1980cc9f50a1b1a8bfa9d5c0ebee Mon Sep 17 00:00:00 2001 From: antirez Date: Mon, 26 Aug 2013 09:48:13 +0200 Subject: [PATCH 0053/2573] Redis release cycle. --- topics/releases.md | 74 ++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 74 insertions(+) create mode 100644 topics/releases.md diff --git a/topics/releases.md b/topics/releases.md new file mode 100644 index 0000000000..bdaf616fe9 --- /dev/null +++ b/topics/releases.md @@ -0,0 +1,74 @@ +Redis release cycle +=== + +Redis is system software, and a type of system software that holds user +data, so it is among the most critical pieces of a software stack. + +For this reason our release cycle tries hard to make sure that a stable +release is only released when it reaches a sufficiently high level of +stability, even at the cost of a slower release cycle. 
+
+A given version of Redis can be at five different levels of stability:
+
+* unstable
+* development
+* frozen
+* release candidate
+* stable
+
+Unstable tree
+===
+
+The unstable version of Redis is always located in the `unstable` branch in
+the [Redis Github Repository](http://github.com/antirez/redis).
+
+This is the source tree where most of the new features are developed and
+is not considered to be production ready: it may contain critical bugs,
+not entirely ready features, and may be unstable.
+
+However we try hard to make sure that even the unstable branch of most of the
+times usable in a development environment without major issues.
+
+Forked, Frozen, Release candidate tree
+===
+
+When a new version of Redis starts to be planned, the unstable branch
+(or sometimes the currently stable branch) is forked into a new
+branch that has the name of the target release.
+
+For instance when Redis 2.6 was released a stable, the `unstable` branch
+was forked into a `2.8` branch.
+
+This new branch can be at three different levels of stability: development, frozen, release canddiate.
+
+* Development: new features and bug fixes are commited into the branch, but not everything goign into `unstable` is merged here. Only the features that can become stable in a reasonable timeframe are merged.
+* Frozen: no new feature is added, unless it is almost guaranteed to have zero stability impacts on the source code, and at the same time for some reason it is a very important feature that must be shipped ASAP. Big code changes are only allowed when they are needed in order to fix bugs.
+* Release Candidate: only fixes are committed against this release.
+
+Stable tree
+===
+
+At some point, when a given Redis release is in the Release Candidate state
+for enough time, we observe that the frequency at which critical bugs are
+signaled start to decrease, at the point that for a few weeks we don't have
+any report of serious bugs.
+
+When this happens the release is marked as stable.
+
+Version numbers
+---
+
+Stable releases follow the usual `major.minor.patch` versioning schema, with the following special rules:
+
+* The minor is even in stable versions of Redis.
+* The minor is odd in unstable, development, frozen, release candidates. For instance the unstable version of 2.8.x will have a version number in the form 2.7.x. In general the unstable version of x.y.z will have a version x.(y-1).z.
+* As an unstable version of Redis progresses, the patchlevel is incremented from time to time, so at a given time you may have 2.7.2, and later 2.7.3 and so forth. However when the release candidate state is reached, the patchlevel starts tfrom 101. So for instance 2.7.101 is the first release candidate for 2.8, 2.7.105 is Release Candidate 5, and so forth.
+
+Support
+---
+
+Old versions are not supported as we try hard to take the Redis mostly API compatible with the past, so upgrading to newer versions is usually trivial.
+
+So for instance if currently stable release is 2.6.x we accept bug reports and provide support for the previous stable release (2.4.x), but not for older releases such as 2.2.x.
+
+When 2.8 will be released as a stable release 2.6.x will be the oldest supported release, and so forth.

From 4e1f3f4700417ad6ffe1bae8f25abf5737156ecc Mon Sep 17 00:00:00 2001
From: bmatte
Date: Mon, 26 Aug 2013 10:53:37 +0200
Subject: [PATCH 0054/2573] Typos.
--- topics/releases.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/topics/releases.md b/topics/releases.md index bdaf616fe9..33ddcbf2b8 100644 --- a/topics/releases.md +++ b/topics/releases.md @@ -62,7 +62,7 @@ Stable releases follow the usual `major.minor.patch` versioning schema, with the * The minor is even in stable versions of Redis. * The minor is odd in unstable, development, frozen, release candidates. For instance the unstable version of 2.8.x will have a version number in the form 2.7.x. In general the unstable version of x.y.z will have a version x.(y-1).z. -* As an unstable version of Redis progresses, the patchlevel is incremented from time to time, so at a given time you may have 2.7.2, and later 2.7.3 and so forth. However when the release candidate state is reached, the patchlevel starts tfrom 101. So for instance 2.7.101 is the first release candidate for 2.8, 2.7.105 is Release Candidate 5, and so forth. +* As an unstable version of Redis progresses, the patchlevel is incremented from time to time, so at a given time you may have 2.7.2, and later 2.7.3 and so forth. However when the release candidate state is reached, the patchlevel starts from 101. So for instance 2.7.101 is the first release candidate for 2.8, 2.7.105 is Release Candidate 5, and so forth. Support --- From 53a0b25a35a7628bbd6866050a5101cb440522a5 Mon Sep 17 00:00:00 2001 From: Michel Martens Date: Thu, 29 Aug 2013 09:37:49 +0200 Subject: [PATCH 0055/2573] Minor editing for releases. --- topics/releases.md | 31 +++++++++++++++++++------------ 1 file changed, 19 insertions(+), 12 deletions(-) diff --git a/topics/releases.md b/topics/releases.md index bdaf616fe9..25c1d9535b 100644 --- a/topics/releases.md +++ b/topics/releases.md @@ -26,8 +26,9 @@ This is the source tree where most of the new features are developed and is not considered to be production ready: it may contain critical bugs, not entirely ready features, and may be unstable. -However we try hard to make sure that even the unstable branch of most of the -times usable in a development environment without major issues. +However, we try hard to make sure that even the unstable branch is +usable most of the time in a development environment without major +issues. Forked, Frozen, Release candidate tree === @@ -36,12 +37,13 @@ When a new version of Redis starts to be planned, the unstable branch (or sometimes the currently stable branch) is forked into a new branch that has the name of the target release. -For instance when Redis 2.6 was released a stable, the `unstable` branch -was forked into a `2.8` branch. +For instance, when Redis 2.6 was released as stable, the `unstable` branch +was forked into the `2.8` branch. -This new branch can be at three different levels of stability: development, frozen, release canddiate. +This new branch can be at three different levels of stability: +development, frozen, and release candidate. -* Development: new features and bug fixes are commited into the branch, but not everything goign into `unstable` is merged here. Only the features that can become stable in a reasonable timeframe are merged. +* Development: new features and bug fixes are commited into the branch, but not everything going into `unstable` is merged here. Only the features that can become stable in a reasonable timeframe are merged. 
* Frozen: no new feature is added, unless it is almost guaranteed to have zero stability impacts on the source code, and at the same time for some reason it is a very important feature that must be shipped ASAP. Big code changes are only allowed when they are needed in order to fix bugs. * Release Candidate: only fixes are committed against this release. @@ -50,10 +52,10 @@ Stable tree At some point, when a given Redis release is in the Release Candidate state for enough time, we observe that the frequency at which critical bugs are -signaled start to decrease, at the point that for a few weeks we don't have -any report of serious bugs. +signaled starts to decrease, to the point that for a few weeks we don't have +any serious bugs reported. -When this happens the release is marked as stable. +When this happens, the release is marked as stable. Version numbers --- @@ -67,8 +69,13 @@ Stable releases follow the usual `major.minor.patch` versioning schema, with the Support --- -Old versions are not supported as we try hard to take the Redis mostly API compatible with the past, so upgrading to newer versions is usually trivial. +Older versions are not supported as we try very hard to make the +Redis API mostly backward compatible. Upgrading to newer versions +is usually trivial. -So for instance if currently stable release is 2.6.x we accept bug reports and provide support for the previous stable release (2.4.x), but not for older releases such as 2.2.x. +For example, if the current stable release is 2.6.x, we accept bug +reports and provide support for the previous stable release +(2.4.x), but not for older ones such as 2.2.x. -When 2.8 will be released as a stable release 2.6.x will be the oldest supported release, and so forth. +When 2.8 becomes the current stable release, the 2.6.x will be the +oldest supported release. From e5d1a63f926d8cfcb7bf04200defd1e7cfb8d5f8 Mon Sep 17 00:00:00 2001 From: antirez Date: Tue, 3 Sep 2013 14:42:06 +0200 Subject: [PATCH 0056/2573] TTL command doc improved. --- commands/ttl.md | 12 ++++++++++-- 1 file changed, 10 insertions(+), 2 deletions(-) diff --git a/commands/ttl.md b/commands/ttl.md index 8be2d28b05..17055f4884 100644 --- a/commands/ttl.md +++ b/commands/ttl.md @@ -2,10 +2,18 @@ Returns the remaining time to live of a key that has a timeout. This introspection capability allows a Redis client to check how many seconds a given key will continue to be part of the dataset. +In Redis 2.6 or older the command returns `-1` if the key does not exist or if the key exist but has no associated expire. + +Starting with Redis 2.8 the return value in case of error changed: + +* The command returns `-2` if the key does not exist. +* The command returns `-1` if the key exists but has no associated expire. + +See also the `PTTL` command that returns the same information with milliseconds resolution (Only available in Redis 2.8 or greater). + @return -@integer-reply: TTL in seconds, `-2` when `key` does not exist or `-1` when `key` does not -have a timeout. +@integer-reply: TTL in seconds, or a negative value in order to signal an error (see the description above). @examples From fe073ea10296cb3f091dee2e8e6c90840ac901c6 Mon Sep 17 00:00:00 2001 From: antirez Date: Wed, 18 Sep 2013 18:47:41 +0200 Subject: [PATCH 0057/2573] Fixed typo and grammar in cluster spec. 
--- topics/cluster-spec.md | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/topics/cluster-spec.md b/topics/cluster-spec.md index 0bbab07625..a218fbfa6d 100644 --- a/topics/cluster-spec.md +++ b/topics/cluster-spec.md @@ -422,10 +422,10 @@ A master will reply with a positive message `FAILOVER_AUTH_GRANTED` if the sende Once the slave receives the authorization from the majority of the masters within a certain amount of time, it starts the failover process performing the following tasks: -* It starts advertising itself as a master (via PONG packets). -* It also advertises it is a promoted slave (via PONG packets). -* It also starts claiming all the nodes that were served by the old master. -* A PONG packet is broadcasted to all the nodes to speedup the proccess. +* Starts advertising itself as a master (via PONG packets). +* Starts advertising it is a promoted slave (via PONG packets). +* Starts claiming all the slots that were served by the old master. +* A PONG packet is broadcasted to all the nodes to speedup the proccess, without waiting for the usual PING/PONG period. All the other nodes will update the configuration accordingly. Specifically: From f39169afff494f6e87850e84e8f592a403ef3e56 Mon Sep 17 00:00:00 2001 From: reterVision Date: Wed, 25 Sep 2013 15:23:55 +0800 Subject: [PATCH 0058/2573] Fix typos in BLPOP doc. --- commands/blpop.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/commands/blpop.md b/commands/blpop.md index 6213702bc2..7e9cbdb5a8 100644 --- a/commands/blpop.md +++ b/commands/blpop.md @@ -107,7 +107,7 @@ redis> BLPOP list1 list2 0 When `BLPOP` returns an element to the client, it also removes the element from the list. This means that the element only exists in the context of the client: if the client crashes while processing the returned element, it is lost forever. -This can be a problem with some application where we want a more reliable messaging system. When this is the case, please check the `BRPOPLPUSH` command, that is a variant of `BLPOP` that adds the returned element to a traget list before returing it to the client. +This can be a problem with some application where we want a more reliable messaging system. When this is the case, please check the `BRPOPLPUSH` command, that is a variant of `BLPOP` that adds the returned element to a target list before returning it to the client. ## Pattern: Event notification From 7b327da4837ffb5d913c20d90375e1a53425bd5e Mon Sep 17 00:00:00 2001 From: Lucas Chi Date: Tue, 1 Oct 2013 22:51:01 -0400 Subject: [PATCH 0059/2573] grammar fixes and rewording in persistence docs --- topics/persistence.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/topics/persistence.md b/topics/persistence.md index 731236e61d..d9710e11fc 100644 --- a/topics/persistence.md +++ b/topics/persistence.md @@ -40,7 +40,7 @@ AOF disadvantages * AOF files are usually bigger than the equivalent RDB files for the same dataset. * AOF can be slower then RDB depending on the exact fsync policy. In general with fsync set to *every second* performances are still very high, and with fsync disabled it should be exactly as fast as RDB even under high load. Still RDB is able to provide more guarantees about the maximum latency even in the case of an huge write load. -* In the past we experienced rare bugs in specific commands (for instance there was one involving blocking commands like BRPOPLPUSH) causing the AOF produced to don't reproduce exactly the same dataset on reloading. 
This bugs are rare and we have tests in the test suite creating random complex datasets automatically and reloading them to check everything is ok, but this kind of bugs are almost impossible with RDB persistence. To make this point more clear: the Redis AOF works incrementally updating an existing state, like MySQL or MongoDB does, while the RDB snapshotting creates everything from scratch again and again, that is conceptually more robust. However 1) It should be noted that every time the AOF is rewritten by Redis it is recreated from scratch starting from the actual data contained in the data set, making resistance to bugs stronger compared to an always appending AOF file (or one rewritten reading the old AOF instead of reading the data in memory). 2) We never had a single report from users about an AOF corruption that was detected in the real world. +* In the past we've experienced rare bugs in specific commands (for instance there was one involving blocking commands like BRPOPLPUSH) causing the AOF to inaccurately reproduce the dataset on recovery. These bugs are rare and are almost impossible with RDB persistence. To make this point more clear: the Redis AOF works by incrementally updating an existing state, like MySQL or MongoDB, while RDB snapshotting is conceptually more robust because it recreates the snapshot from scratch each time. It should be noted that every time the AOF is rewritten it is recreated from scratch using the actual data contained in the dataset and therefore is more robust when compared to a perpetually appending AOF file (or one that is rewritten by reading the old AOF instead of reading the data in memory). To date, there has never been a single report from users about an AOF corruption that was detected in the real world. Ok, so what should I use? --- From bb259e1b7469722ed19741bf11bc59c33b0e801e Mon Sep 17 00:00:00 2001 From: Lucas Chi Date: Tue, 1 Oct 2013 23:05:13 -0400 Subject: [PATCH 0060/2573] update readme with parse task dependencies --- README.md | 9 +++++++++ 1 file changed, 9 insertions(+) diff --git a/README.md b/README.md index 2ee41c8696..d51fb29f99 100644 --- a/README.md +++ b/README.md @@ -111,6 +111,15 @@ You can do this by running Rake inside your working directory. $ rake parse ``` +The parse task has the following dependencies: + +* batch +* rdiscount + +``` +gem install batch rdiscount +``` + Additionally, if you have [Aspell][han] installed, you can spell check the documentation: From 27e81e8c851f65fd11b7d6ba8bed082b14056801 Mon Sep 17 00:00:00 2001 From: "Seth W. Klein" Date: Wed, 2 Oct 2013 18:23:51 -0400 Subject: [PATCH 0061/2573] hoisie/redis.go moved to hoisie/redis --- clients.json | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/clients.json b/clients.json index 327a74c93c..7496c0aa4b 100644 --- a/clients.json +++ b/clients.json @@ -569,7 +569,7 @@ { "name": "redis.go", "language": "Go", - "repository": "https://github.com/hoisie/redis.go", + "repository": "https://github.com/hoisie/redis", "description": "", "authors": ["hoisie"] }, From 9d68bfcaaee3da9711a67a52d633168e1650b52d Mon Sep 17 00:00:00 2001 From: Damian Janowski Date: Fri, 4 Oct 2013 17:42:15 -0300 Subject: [PATCH 0062/2573] Typo. 
---
 topics/config.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/topics/config.md b/topics/config.md
index a8b2558b6d..cf18a9190e 100644
--- a/topics/config.md
+++ b/topics/config.md
@@ -2,7 +2,7 @@ Redis configuration
===

Redis is able to start without a configuration file using a built-in default
-configuration, however this setup is only recommanded for testing and
+configuration, however this setup is only recommended for testing and
development purposes.

The proper way to configure Redis is by providing a Redis configuration file,
@@ -29,7 +29,7 @@ Redis distribution.
* The self documented [redis.conf for Redis 2.6](https://raw.github.com/antirez/redis/2.6/redis.conf).
* The self documented [redis.conf for Redis 2.4](https://raw.github.com/antirez/redis/2.4/redis.conf).

-Passing arguments via command line
+Passing arguments via the command line
---

Since Redis 2.6 it is possible to also pass Redis configuration parameters

From bfd275d38a1a2cb24755c378a5ded76ce5e7bd37 Mon Sep 17 00:00:00 2001
From: antirez
Date: Tue, 8 Oct 2013 17:40:13 +0200
Subject: [PATCH 0063/2573] Cluster design document updated.
---
 topics/cluster-spec.md | 399 +++++++++++++++++++++++++++++------------
 1 file changed, 288 insertions(+), 111 deletions(-)

diff --git a/topics/cluster-spec.md b/topics/cluster-spec.md
index a218fbfa6d..d3389a5c4e 100644
--- a/topics/cluster-spec.md
+++ b/topics/cluster-spec.md
@@ -1,41 +1,17 @@
Redis cluster Specification (work in progress)
===

-Introduction
+Redis Cluster goals
---

-This document is a work in progress specification of Redis cluster.
-The document is split into two parts. The first part documents what is already
-implemented in the unstable branch of the Redis code base, the second part
-documents what is still to be implemented.
+Redis Cluster is a distributed implementation of Redis with the following goals, in order of importance in the design:

-All the parts of this document may be modified in the future as result of a
-design change in Redis cluster, but the part not yet implemented is more likely
-to change than the part of the specification that is already implemented.
+* High performance and linear scalability up to 1000 nodes.
+* No merge operations in order to play well with the large values typical of the Redis data model.
+* Write safety: the system tries to retain all the writes originating from clients connected with the majority of the nodes. However there are small windows where acknowledged writes can be lost.
+* Availability: Redis Cluster is able to survive partitions where the majority of the master nodes are reachable and there is at least a reachable slave for every master node that is no longer reachable.

-The specification includes everything needed to write a client library,
-however client libraries authors should be aware that it is possible for the
-specification to change in the future in some detail.
+What is described in this document is implemented in the `unstable` branch of the Github Redis repository.

-What is Redis cluster
----

-Redis cluster is a distributed and fault tolerant implementation of a
-subset of the features available in the Redis stand alone server.

-In Redis cluster there are no central or proxy nodes, and one of the
-major design goals is linear scalability.

-Redis cluster sacrifices fault tolerance for consistency, so the system
-tries to be as consistent as possible while guaranteeing limited resistance
-to net splits and node failures (we consider node failures as
-special cases of net splits).
- -Fault tolerance is achieved using two different roles for nodes, that -can be either masters or slaves. Even if nodes are functionally the same -and run the same server implementation, slave nodes are not used if -not to replace lost master nodes. It is actually possible to use slave nodes -for read-only queries when read-after-write consistency is not required. +What is described in this document is implemented in the `unstable` branch of the Github Redis repository. Implemented subset --- @@ -43,17 +19,17 @@ Implemented subset Redis Cluster implements all the single keys commands available in the non distributed version of Redis. Commands performing complex multi key operations like Set type unions or intersections are not implemented, and in -general all the operations where in theory keys are not available in the -same node are not implemented. +general all the operations where keys are not available in the node processing +the command are not implemented. -In the future it is possible that using the MIGRATE COPY command users will +In the future it is possible that using the `MIGRATE COPY` command users will be able to use *Computation Nodes* to perform multi-key read only operations in the cluster, but it is not likely that the Redis Cluster itself will be able to perform complex multi key operations implementing some kind of transparent way to move keys around. Redis Cluster does not support multiple databases like the stand alone version -of Redis, there is just database 0, and SELECT is not allowed. +of Redis, there is just database 0, and `SELECT` is not allowed. Clients and Servers roles in the Redis cluster protocol --- @@ -79,6 +55,56 @@ getting redirected if needed, so the client is not required to take the state of the cluster. However clients that are able to cache the map between keys and nodes can improve the performance in a sensible way. +Write safety +--- + +Redis Cluster uses asynchronous replication between nodes, so there are always windows when it is possible to lose writes during partitions. However these windows are very different in the case of a client that is connected to the majority of masters, and a client that is connected to the minority of masters. + +Redis Cluster tries hard to retain all the writes that are performed by clients connected to the majority of masters, with two exceptions: + +1) A write may reach a master, but while the master may be able to reply to the client, the write may not be propagated to slaves via the asynchronous replication used between master and slave nodes. If the master dies without the write reaching the slaves, the write is lost forever in case the master is unreachable for a long enough period that one of its slaves is promoted. + +2) Another theoretically possible failure mode where writes are lost is the following: + +* A master is unreachable because of a partition. +* It gets failed over by one of its slaves. +* After some time it may be reachable again. +* A client with a not updated routing table may write to it before the master is converted to a slave (of the new master) by the cluster. + +Practically this is very unlikely to happen because nodes not able to reach the majority of other masters for enough time to be failed over, no longer accept writes, and when the partition is fixed writes are still refused for a small amount of time to allow other nodes to inform about configuration changes. 
All the nodes in general try to reach a node that joins the cluster again as fast as possible, using a non-blocking connection attempt and sending a ping packet (that is enough to upgrade the node configuration) as soon as there is a new link with the node. This makes it unlikely that a node is not informed about configuration changes before it becomes writable again.

Redis Cluster loses a non trivial amount of writes on partitions where there is a minority of masters and at least one or more clients, since all the writes sent to the masters may potentially get lost if the masters are failed over in the majority side.

For a master to be failed over, it must not be reachable by the majority of masters for at least `NODE_TIMEOUT`, so if the partition is fixed before that time, no write is lost. When the partition lasts for more than `NODE_TIMEOUT`, the minority side of the cluster will start refusing writes as soon as `NODE_TIMEOUT` time has elapsed, so there is a maximum window after which the minority becomes no longer available, hence no write is accepted and then lost after that time.

Availability
---

Redis Cluster is not available in the minority side of the partition. In the majority side of the partition, assuming that there are at least the majority of masters and a slave for every unreachable master, the cluster becomes available again after `NODE_TIMEOUT` plus a few more seconds required for a slave to get elected and fail over its master.

This means that Redis Cluster is designed to survive the failure of a few nodes in the cluster, but is not a suitable solution for applications that require availability in the event of large net splits.

In the example of a cluster composed of N master nodes where every node has a single slave, the majority side of the cluster will remain available when a single node is partitioned away, and will remain available with a probability of `1-(1/(N*2-1))` when two nodes are partitioned away (after the first node fails we are left with `N*2-1` nodes in total, and the probability that the only master without a replica is the one to fail is `1/(N*2-1)`).

For example in a cluster with 5 nodes and a single slave per node, there is a `1/(5*2-1) = 0.1111` probability that after two nodes are partitioned away from the majority, the cluster will no longer be available: that is, about an 11% chance.

Performance
---

In Redis Cluster nodes don't proxy commands to the right node in charge of a given key, but instead redirect clients to the right nodes serving a given range of the key space.
Eventually clients obtain an up to date representation of the cluster, and of which node serves which subset of keys, so during normal operations clients directly contact the right nodes in order to send a given command.

Because of the use of asynchronous replication, nodes do not wait for other nodes' acknowledgement of writes (optional synchronous replication is a work in progress and will likely be added in future releases).

Also, because of the restriction to the subset of commands that don't perform operations on multiple keys, data is never moved between nodes except in case of resharding.

So normal operations are handled exactly as in the case of a single Redis instance. This means that in a Redis Cluster with N master nodes you can expect the same performance as a single Redis instance multiplied by N, as the design allows it to scale linearly.
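As a quick numeric sanity check of the two figures above (the `1-(1/(N*2-1))` availability estimate and the N-times scaling claim), here is a small self-contained Python sketch; the per-instance throughput number is a made-up assumption used only for illustration:

```python
# Back-of-envelope check of the availability and scaling figures above.
def two_failure_unavailability(n_masters):
    # After the first failure, N*2-1 nodes remain; the cluster becomes
    # unavailable only if the second failure hits the one master that
    # was left without a replica.
    return 1.0 / (n_masters * 2 - 1)

SINGLE_INSTANCE_OPS = 100_000  # assumed ops/sec of one Redis instance

for n in (3, 5, 10):
    p = two_failure_unavailability(n)
    print(f"N={n:2d}  ~{n * SINGLE_INSTANCE_OPS:>9,} ops/sec  "
          f"P(unavailable after 2 failures)={p:.4f}")

# For N=5 the probability is 1/9 = 0.1111, the ~11% figure quoted above.
```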
At the same time the query is usually performed in a single round trip, since clients usually retain persistent connections with the nodes, so latency figures are also the same as the single stand alone Redis node case. + +Why merge operations are avoided +--- + +Redis Cluster design avoids conflicting versions of the same key-value pair in multiple nodes since in the case of the Redis data model this is not always desirable: values in Redis are often very large, it is common to see lists or sorted sets with millions of elements. Also data types are semantically complex. Transferring and merging these kind of values can be a major bottleneck. + Keys distribution model --- @@ -135,16 +161,16 @@ know: * The IP address and TCP port where the node is located. * A set of flags. * A set of hash slots served by the node. -* Last time we sent a PING packet using the cluster bus. -* Last time we received a PONG packet in reply. +* Last time we sent a ping packet using the cluster bus. +* Last time we received a pong packet in reply. * The time at which we flagged the node as failing. * The number of slaves of this node. * The master node ID, if this node is a slave (or 0000000... if it is a master). -Soem of this information is available using the `CLUSTER NODES` command that +Some of this information is available using the `CLUSTER NODES` command that can be sent to all the nodes in the cluster, both master and slave nodes. -The following is an example of output of CLUSTER NODES sent to a master +The following is an example of output of `CLUSTER NODES` sent to a master node in a small cluster of three nodes. $ redis-cli cluster nodes @@ -154,7 +180,16 @@ node in a small cluster of three nodes. In the above listing the different fields are in order: node id, address:port, flags, last ping sent, last pong received, link state, slots. -Nodes handshake (implemented) +Cluster topology +--- + +Redis cluster is a full mesh where every node is connected with every other node using a TCP connection. + +In a cluster of N nodes, every node has N-1 outgoing TCP connections, and N-1 incoming connections. + +These TCP connections are kept alive all the time and are not created on demand. + +Nodes handshake --- Nodes always accept connection in the cluster bus port, and even reply to @@ -164,15 +199,12 @@ is not considered part of the cluster. A node will accept another node as part of the cluster only in two ways: -* If a node will present itself with a MEET message. A meet message is exactly -like a PING message, but forces the receiver to accept the node as part of -the cluster. Nodes will send MEET messages to other nodes ONLY IF the system -administrator requests this via the following command: - +* If a node will present itself with a `MEET` message. A meet message is exactly +like a `PING` message, but forces the receiver to accept the node as part of +the cluster. Nodes will send `MEET` messages to other nodes **only if** the system administrator requests this via the following command: CLUSTER MEET ip port - * A node will also register another node as part of the cluster if a node that is already trusted will gossip about this other node. So if A knows B, and B knows C, eventually B will send gossip messages to A about C. When this happens, A will register C as part of the network, and will try to connect with C. This means that as long as we join nodes in any connected graph, they'll eventually form a fully connected graph automatically. 
This means that basically the cluster is able to auto-discover other nodes, but only if there is a trusted relationship that was forced by the system administrator. @@ -248,25 +280,25 @@ The following subcommands are available: * CLUSTER SETSLOT slot MIGRATING node * CLUSTER SETSLOT slot IMPORTING node -The first two commands, ADDSLOTS and DELSLOTS, are simply used to assign +The first two commands, `ADDSLOTS` and `DELSLOTS`, are simply used to assign (or remove) slots to a Redis node. After the hash slots are assigned they will propagate across all the cluster using the gossip protocol. -The ADDSLOTS command is usually used when a new cluster is configured +The `ADDSLOTS` command is usually used when a new cluster is configured from scratch to assign slots to all the nodes in a fast way. -The SETSLOT subcommand is used to assign a slot to a specific node ID if -the NODE form is used. Otherwise the slot can be set in the two special -states MIGRATING and IMPORTING: +The `SETSLOT` subcommand is used to assign a slot to a specific node ID if +the `NODE` form is used. Otherwise the slot can be set in the two special +states `MIGRATING` and `IMPORTING`: * When a slot is set as MIGRATING, the node will accept all the requests for queries that are about this hash slot, but only if the key in question -exists, otherwise the query is forwarded using a -ASK redirection to the +exists, otherwise the query is forwarded using a `-ASK` redirection to the node that is target of the migration. * When a slot is set as IMPORTING, the node will accept all the requests for queries that are about this hash slot, but only if the request is preceded by an ASKING command. Otherwise if not ASKING command was given by the client, the query is redirected to the real hash slot owner via -a -MOVED redirection error. +a `-MOVED` redirection error. At first this may appear strange, but now we'll make it more clear. Assume that we have two Redis nodes, called A and B. @@ -293,19 +325,19 @@ The above command will return `count` keys in the specified hash slot. For every key returned, redis-trib sends node A a `MIGRATE` command, that will migrate the specified key from A to B in an atomic way (both instances are locked for the time needed to migrate a key so there are no race -conditions). This is how MIGRATE works: +conditions). This is how `MIGRATE` works: MIGRATE target_host target_port key target_database id timeout -MIGRATE will connect to the target instance, send a serialized version of +`MIGRATE` will connect to the target instance, send a serialized version of the key, and once an OK code is received will delete the old key from its own dataset. So from the point of view of an external client a key either exists in A or B in a given time. In Redis cluster there is no need to specify a database other than 0, but -MIGRATE can be used for other tasks as well not involving Redis cluster so +`MIGRATE` can be used for other tasks as well not involving Redis cluster so it is a general enough command. -MIGRATE is optimized to be as fast as possible even when moving complex +`MIGRATE` is optimized to be as fast as possible even when moving complex keys such as long lists, but of course in Redis cluster reconfiguring the cluster where big keys are present is not considered a wise procedure if there are latency constraints in the application using the database. 
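To make the MIGRATING/IMPORTING flow above more tangible, here is a rough sketch of what a resharding tool like redis-trib does for a single hash slot. It uses the redis-py client purely for illustration; the addresses, node IDs, slot number and batch size are made-up placeholders, and error handling is omitted:

```python
import redis  # redis-py, used here only to send raw commands

SLOT = 1000                                        # hash slot being moved (placeholder)
a = redis.StrictRedis(host="10.0.0.1", port=6379)  # node A, current owner (placeholder)
b = redis.StrictRedis(host="10.0.0.2", port=6379)  # node B, new owner (placeholder)
A_ID, B_ID = "<node-A-id>", "<node-B-id>"          # 160 bit node IDs in a real cluster

# 1) Put the slot in the special states on both sides.
b.execute_command("CLUSTER", "SETSLOT", SLOT, "IMPORTING", A_ID)
a.execute_command("CLUSTER", "SETSLOT", SLOT, "MIGRATING", B_ID)

# 2) Move the keys a batch at a time: while this loop runs, keys that
#    still exist are served by A, missing ones are -ASK redirected to B.
while True:
    keys = a.execute_command("CLUSTER", "GETKEYSINSLOT", SLOT, 10)
    if not keys:
        break
    for key in keys:
        # Atomic per-key handoff: both instances are locked while it runs.
        a.execute_command("MIGRATE", "10.0.0.2", 6379, key, 0, 60000)

# 3) Finally assign the slot to B for good.
for node in (a, b):
    node.execute_command("CLUSTER", "SETSLOT", SLOT, "NODE", B_ID)
```

A real tool would also have to cope with failures halfway through, which is why the two special slot states are kept on both nodes until the final `SETSLOT ... NODE` step.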
@@ -345,97 +377,242 @@ Note that however if a buggy client will perform the map earlier this is not a problem since it will not send the ASKING command before the query and B will redirect the client to A using a MOVED redirection error. -Clients implementations hints +Fault Tolerance +=== + +Nodes heartbeat and gossip messages --- -* TODO Pipelining: use MULTI/EXEC for pipelining. -* TODO Persistent connections to nodes. -* TODO hash slot guessing algorithm. +Nodes in the cluster exchange ping / pong packets. -Fault Tolerance -=== +Usually a node will ping a few random nodes every second so that the total number of ping packets send (and pong packets received) is a constant amount regardless of the number of nodes in the cluster. + +However every node makes sure to ping every other node that we don't either sent a ping or received a pong for longer than half the `NODE_TIMEOUT` time. Before `NODE_TIMEOUT` has elapsed, nodes also try to reconnect the TCP link with another node to make sure nodes are not believed to be unreachable only because there is a problem in the current TCP connection. + +The amount of messages exchanged can be bigger than O(N) if `NODE_TIMEOUT` is set to a small figure and the number of nodes (N) is very large, since every node will try to ping every other node for which we don't have fresh information for half the `NODE_TIMEOUT` time. + +For example in a 100 nodes cluster with a node timeout set to 60 seconds, every node will try to send 99 pings every 30 seconds, with a total amount of pings of 3.3 per second, that multiplied for 100 nodes is 330 pings per second in the total cluster. + +There are ways to use the gossip information already exchanged by Redis Cluster to reduce the amount of messages exchanged in a significant way. For example we may ping within half `NODE_TIMEOUT` only nodes that are already reported to be in "possible failure" state (see later) by other nodes, and ping the other nodes that are reported as working only in a best-effort way within the limit of the few packets per second. However in real-world tests large clusters with very small `NODE_TIMEOUT` settings used to work reliably so this change will be considered in the future as actual deployments of large clusters will be tested. + +Ping and Pong packets content +--- + +Ping and Pong packets contain an header that is common to all the kind of packets (for instance packets to request a vote), and a special Gossip Section that is specific of Ping and Pong packets. + +The common header has the following information: + +* Node ID, that is a 160 bit pseudorandom string that is assigned the first time a node is created and remains the same for all the life of a Redis Cluster node. +* The `currentEpoch` and `configEpoch` field, that are used in order to mount the distributed algorithms used by Redis Cluster (this is explained in details in the next sections). If the node is a slave the `configEpoch` is the last known `configEpoch` of the master. +* The node flags, indicating if the node is a slave, a master, and other single-bit node information. +* A bitmap of the hash slots served by a given node, or if the node is a slave, a bitmap of the slots served by its master. +* Port: the sender TCP base port (that is, the port used by Redis to accept client commands, add 10000 to this to obtain the cluster port). +* State: the state of the cluster from the point of view of the sender (down or ok). +* The master node ID, if this is a slave. + +Ping and pong packets contain a gossip section. 
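Purely as a mental model, the header and gossip information listed above can be sketched as Python data structures; the field names below are invented for readability and do not mirror the actual C structs (the gossip section carried by ping and pong packets is described right after this sketch):

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class GossipEntry:                    # one entry per gossiped node
    node_id: str                      # 160 bit node ID, hex encoded
    ip: str
    port: int
    flags: int                        # master/slave plus PFAIL/FAIL bits

@dataclass
class PacketHeader:                   # common to pings, pongs, vote requests, ...
    node_id: str
    current_epoch: int
    config_epoch: int                 # a slave reports its master's configEpoch
    flags: int
    slots: bytes = bytes(16384 // 8)  # bitmap, one bit per hash slot
    port: int = 6379                  # base port; cluster bus is port + 10000
    state: str = "ok"                 # sender's view of the cluster: "ok"/"down"
    master_id: str = ""               # set when the sender is a slave
    gossip: List[GossipEntry] = field(default_factory=list)  # ping/pong only
```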
This section offers to the receiver a view about what the sender node thinks about other nodes in the cluster. The gossip section only contains informations about a few random nodes among the known nodes set of the sender. + +For every node added in the gossip section the following fields are reported: + +* Node ID. +* IP and port of the node. +* Node flags. + +Gossip sections allow receiving nodes to get information about the state of other nodes from the point of view of the sender. This is useful both for failure detection and to discover other nodes in the cluster. -Node failure detection +Failure detection --- -Failure detection is implemented in the following way: +Redis Cluster failure detection is used to recognize when a master or slave node is no longer reachable by the majority of nodes, and as a result of this event, either promote a slave to the role of master, of when this is not possible, put the cluster in an error state to stop receiving queries from clients. -* A node marks another node setting the PFAIL flag (possible failure) if the node is not responding to our PING requests for a given time. This time is called the node timeout, and is a node-wise setting. -* Nodes broadcast information about other nodes (three random nodes per packet) when pinging other nodes. The gossip section contains information about other nodes flags, including the PFAIL and FAIL flags. -* Nodes remember if other nodes advertised some node as failing. This is called a failure report. -* Once a node (already considering a given other node in PFAIL state) receives enough failure reports, so that the majority of master nodes agree about the failure of a given node, the node is marked as FAIL. -* When a node is marked as FAIL, a message is broadcasted to the cluster in order to force all the reachable nodes to set the specified node as FAIL. +Every node takes a list of flags associated with other known nodes. There are two flags that are used for failure detection that are called `PFAIL` and `FAIL`. `PFAIL` means _Possible failure_, and is a non acknowledged failure type. `FAIL` means that a node is failing and that this condition was confirmed by a majority of masters in a fixed amount of time. -So basically a node is not able to mark another node as failing without external acknowledge, and the majority of the master nodes are required to agree. +**PFAIL flag:** -Old failure reports are removed, so the majority of master nodes need to have a recent entry in the failure report table of a given node for it to mark another node as FAIL. +A node flags another node with the `PFAIL` flag when the node is not reachable for more than `NODE_TIMEOUT` time. Both master and slave nodes can flag another node as `PFAIL`, regardless of its type. -The FAIL state is reversible in two cases: +The concept of non reachability for a Redis Cluster node is that we have an **active ping** (a ping that we sent for which we still have to get a reply) pending for more than `NODE_TIMEOUT`, so for this mechanism to work the `NODE_TIMEOUT` must be large compared to the network round trip time. In order to add reliability during normal operations, nodes will try to reconnect with other nodes in the cluster as soon as half of the `NODE_TIMEOUT` has elapsed without a reply to a ping. This mechanism ensures that connections are kept alive so broken connections should usually not result into false failure reports between nodes. 
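The reachability rule just described can be condensed into a few lines of Python. This is a sketch of the behavior described above, not the real implementation: `node` is a hypothetical bookkeeping object, and `now` is a timestamp in seconds:

```python
def refresh_pfail(node, now, node_timeout):
    # An "active ping" is a ping we sent for which no pong arrived yet.
    active_ping = node.ping_sent_at is not None

    # Halfway to the timeout, recreate the TCP link, so that a broken
    # connection alone does not turn into a false failure report.
    if active_ping and now - node.ping_sent_at > node_timeout / 2:
        node.reconnect()

    if active_ping and now - node.ping_sent_at > node_timeout:
        node.flags.add("PFAIL")       # possibly failing, not acknowledged yet
    elif not active_ping:
        node.flags.discard("PFAIL")   # a pong arrived, the node is reachable
```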
-* If the FAIL state is set for a slave node, the FAIL state can be reversed if the slave is already reachable. There is no point in retaning the FAIL state for a slave node as it does not serve slots, and we want to make sure we have the chance to promote it to master if needed. -* If the FAIL state is set for a master node, and after four times the node timeout, plus 10 seconds, the slots are were still not failed over, and the node is reachable again, the FAIL state is reverted. +**FAIL flag:** -The rationale for the second case is that if the failover did not worked we want the cluster to continue to work if the master is back online, without any kind of user intervetion. +The `PFAIL` flag alone is just some local information every node has about other nodes, but it is not used in order to act and is not sufficient to trigger a slave promotion. For a node to be really considered down the `PFAIL` condition needs to be promoted to a `FAIL` condition. -Cluster state detection (partilly implemented) +As outlined in the node heartbeats section of this document, every node sends gossip messages to every other node including the state of a few random known nodes. So every node eventually receives the set of node flags for every other node. This way every node has a mechanism to signal other nodes about failure conditions they detected. + +This mechanism is used in order to escalate a `PFAIL` condition to a `FAIL` condition, when the following set of conditions are met: + +* Some node, that we'll call A, has another node B flagged as `PFAIL`. +* Node A collected, via gossip sections, informations about the state of B from the point of view of the majority of masters in the cluster. +* The majority of masters signaled the `PFAIL` or `PFAIL` condition within `NODE_TIMEOUT * FAIL_REPORT_VALIDITY_MULT` time. + +If all the above conditions are true, Node A will: + +* Mark the node as `FAIL`. +* Send a `FAIL` message to all the reachable nodes. + +The `FAIL` message will force every receiving node to mark the node in `FAIL` state. + +Note that *the FAIL flag is mostly one way*, that is, a node can go from `PFAIL` to `FAIL`, but for the `FAIL` flag to be cleared there are only two possibilities: + +* The node is already reachable, and it is a slave. In this case the `FAIL` flag can be cleared as slaves are not failed over. +* The node is already reachable, is a master, but a long time (N times the `NODE_TIMEOUT`) has elapsed without any detectable slave promotion. + +**While the `PFAIL` -> `FAIL` transition uses a form of agreement, the agreement used is weak:** + +1) Nodes collect views of other nodes during some time, so even if the majority of master nodes need to "agree", actually this is just state that we collected from different nodes at different times and we are not sure this state is stable. + +2) While every node detecting the `FAIL` condition will force that condition on other nodes in the cluster using the `FAIL` message, there is no way to ensure the message will reach all the nodes. For instance a node may detect the `FAIL` condition and because of a partition will not be able to reach any other node. + +However the Redis Cluster failure detection has a requirement: eventually all the nodes should agree about the state of a given node even in case of partitions. There are two cases that can originate from split brain conditions, either some minority of nodes believe the node is in `FAIL` state, or a minority of nodes believe the node is not in `FAIL` state. 
In both the cases eventually the cluster will have a single view of the state of a given node: + +**Case 1**: If an actual majority of masters flagged a node as `FAIL`, for the chain effect every other node will flag the master as `FAIL` eventually. + +**Case 2**: When only a minority of masters flagged a node as `FAIL`, the slave promotion will not happen (as it uses a more formal algorithm that makes sure everybody will know about the promotion eventually) and every node will clear the `FAIL` state for the `FAIL` state clearing rules above (no promotion after some time > of N times the `NODE_TIMEOUT`). + +**Basically the `FAIL` flag is only used as a trigger to run the safe part of the algorithm** for the slave promotion. In theory a slave may act independently and start a slave promotion when its master is not reachable, and wait for the masters to refuse the provide acknowledgement if the master is actually reachable by the majority. However the added complexity of the `PFAIL -> FAIL` state, the weak agreement, and the `FAIL` message to force the propagation of the state in the shortest amount of time in the reachable part of the cluster, have practical advantages. Because of this mechanisms usually all the nodes will stop accepting writes about at the same time if the cluster is in an error condition, that is a desirable feature from the point of view of applications using Redis Cluster. Also not necessary elections, initiated by slaves that can't reach its master that is otherwise reachable by the majority of the other master nodes, are avoided. + +Cluster epoch --- -Every cluster node scan the list of nodes every time a configuration change -happens in the cluster (this can be an update to an hash slot, or simply -a node that is now in a failure state). +Redis Cluster uses a concept similar to the Raft algorithm "term". In Redis Cluster the term is called epoch instead, and it is used in order to give an incremental version to events, so that when multiple nodes provide conflicting informaiton, it is possible for another node to understand which state is the most up to date. + +The `currentEpoch` is a 64 bit unsigned number. + +At node creation every Redis Cluster node, both slaves and master nodes, set the `currentEpoch` to 0. + +Every time a ping or pong is received from another node, if the epoch of the sender (part of the cluster bus messages header) is greater than the local node epoch, then `currentEpoch` is updated to the sender epoch. + +Because of this semantics eventually all the nodes will agree to the greater epoch in the cluster. + +The way this information is used is when the state is changed and a node seeks agreement in order to perform some action. + +Currently this happens only during slave promotion, as described in the next section. Basically the epoch is a logical clock for the cluster and dictates whatever a given information wins over one with a smaller epoch. -Once the configuration is processed the node enters one of the following states: +Config epoch +--- + +Every master always advertises its `configEpoch` in ping and pong packets along with a bitmap advertising the set of slots it serves. -* FAIL: the cluster can't work. When the node is in this state it will not serve queries at all and will return an error for every query. -* OK: the cluster can work as all the 16384 slots are served by nodes that are not flagged as FAIL. +The `configEpoch` is set to zero in masters when a new node is created. 
-This means that the Redis Cluster is designed to stop accepting queries once even a subset of the hash slots are not available for some time. +Slaves that are promoted to master because of a failover event instead have a `configEpoch` that is set to the value of the `currentEpoch` at the time the slave won the election in order to replace its failing master. -However there is a portion of time in which an hash slot can't be accessed correctly since the associated node is experiencing problems, but the node is still not marked as failing. In this range of time the cluster will only accept queries about a subset of the 16384 hash slots. +As explained in the next sections the `configEpoch` helps to resolve conflicts due to different nodes claiming diverging configurations (a condition that may happen after partitions). -The FAIL state for the cluster happens in two cases. +Slave nodes also advertise the `configEpoch` field in ping and pong packets, but in case of slaves the field represents the `configEpoch` of its master the last time they exchanged packets. This allows other instances to detect when a slave has an old configuration that needs to be updated (Master nodes will not grant votes to slaves with an old configuration). -* 1) If at least one hash slot is not served as the node serving it currently is in FAIL state. -* 2) If we are not able to reach the majority of masters (that is, if the majorify of masters are simply in PFAIL state, it is enough for the node to enter FAIL mode). +Every time the `configEpoch` changes for some known node, it is permanently stored in the nodes.conf file. -The second check is required because in order to mark a node from PFAIL to FAIL state, the majority of masters are required. However when we are not connected with the majority of masters it is impossible from our side of the net split to mark nodes as FAIL. However since we detect this condition we set the Cluster state in FAIL mode to stop serving queries. +When a node is restarted its `currentEpoch` is set to the greatest `configEpoch` of the known nodes. -Slave election +Slave election and promotion --- -Once a master node is in FAIL state, if one or more slaves exist for this master one should be promoted as a master and all the other slaves reconfigured to replicate with the new master. +Slave election and promotion is handled by slave nodes, with the help of master nodes that vote for the slave to promote. +A slave election happens when a master is in `FAIL` state from the point of view of at least one of its slaves that has the prerequisites in order to become a master. -The election of a slave is a task that is handled directly by the slaves of the failing master. The trigger is the following set of conditions: +In order for a slave to promote itself to master, it requires to start an election and win it. All the slaves for a given master can start an election if the master is in `FAIL` state, however only one slave will win the election and promote itself to master. -* A node is a slave of a master in FAIL state. +A slave starts an election when the following conditions are met: + +* The slave's master is in `FAIL` state. * The master was serving a non-zero number of slots. -* The slave's data is considered reliable, that is, from the point of view of the replication layer, the replication link has not been down for more than the configured node timeout multiplied for a given multiplication factor (see the `REDIS_CLUSTER_SLAVE_VALIDITY_MULT` define). 
+* The slave replication link was disconnected from the master for no longer than a given amount of time, in order to ensure that the promoted slave has reasonably fresh data.

In order to be elected, the first step for a slave is to increment its `currentEpoch` counter, and request votes from master instances.

Votes are requested by the slave by broadcasting a `FAILOVER_AUTH_REQUEST` packet to every master node of the cluster.
Then it waits for replies to arrive for a maximum time of `NODE_TIMEOUT`.
Once a master has voted for a given slave, replying positively with a `FAILOVER_AUTH_ACK`, it can no longer vote for another slave of the same master for a period of `NODE_TIMEOUT * 2`. In this period it will not be able to reply to other authorization requests at all.

The slave will discard all the ACKs received with an epoch that is less than the `currentEpoch`, in order to never count votes that belong to a previous election as valid.

Once the slave receives ACKs from the majority of masters, it wins the election.
Otherwise if the majority is not reached within the period of the `NODE_TIMEOUT`, the election is aborted and a new one will be tried again after `NODE_TIMEOUT * 4`.

A slave does not try to get elected as soon as the master is in `FAIL` state, but there is a small delay, computed as:

    DELAY = fixed_delay + (data_age - NODE_TIMEOUT) / 10 +
            random delay between 0 and 2000 milliseconds.

The fixed delay ensures that we wait for the `FAIL` state to propagate across the cluster, otherwise the slave may try to get elected when the masters are still not aware of the `FAIL` state, refusing to grant their vote.

The `data_age / 10` figure is used in order to give an advantage to slaves with fresher data (disconnected from the master for a smaller period of time).
The random delay is used in order to add some non-determinism that makes it less likely that multiple slaves start the election at the same time, a situation that may result in no slave winning the election, requiring another election that makes the cluster unavailable in the meantime.

Once a slave wins the election, it starts advertising itself as master in ping and pong packets, providing the set of served slots with a `configEpoch` set to the `currentEpoch` at which the election was started.

In order to speed up the reconfiguration of other nodes, a pong packet is broadcast to all the nodes of the cluster (however nodes not currently reachable will eventually receive a ping or pong packet and will be reconfigured).

The other nodes will detect that there is a new master serving the same slots served by the old master but with a greater `configEpoch`, and will upgrade the configuration. Slaves of the old master, or the failed over master that rejoins the cluster, will not just upgrade the configuration but will also reconfigure to replicate from the new master.

Masters reply to slave vote request
---

In the previous section it was discussed how slaves try to get elected; this section explains what happens from the point of view of a master that is requested to vote for a given slave.

Masters receive requests for votes in the form of `FAILOVER_AUTH_REQUEST` requests from slaves.
For a vote to be granted the following conditions need to be met:

* 1) A master only votes a single time for a given epoch, and refuses to vote for older epochs: every master has a lastVoteEpoch field and will refuse to vote again as long as the `currentEpoch` in the auth request packet is not greater than the lastVoteEpoch. When a master replies positively to a vote request, the lastVoteEpoch is updated accordingly.
* 2) A master votes for a slave only if the slave's master is flagged as `FAIL`.
* 3) Auth requests with a `currentEpoch` that is less than the master `currentEpoch` are ignored. Because of this the master reply will always have the same `currentEpoch` as the auth request. If the same slave asks again to be voted, incrementing the `currentEpoch`, it is guaranteed that an old delayed reply from the master can not be accepted for the new vote.

Example of the issue caused by not using this rule:

Master `currentEpoch` is 5, lastVoteEpoch is 1 (this may happen after a few failed elections)

* Slave `currentEpoch` is 3
* Slave tries to be elected with epoch 4 (3+1), master replies with an ok with `currentEpoch` 5, however the reply is delayed.
* Slave tries to be elected again, with epoch 5 (4+1), the delayed reply reaches the slave with `currentEpoch` 5, and is accepted as valid.

* 4) Masters don't vote for a slave of the same master before `NODE_TIMEOUT * 2` has elapsed since a slave of that master was already voted for. This is not strictly required, as it is not possible for two slaves to win the election in the same epoch, but in practical terms it ensures that normally when a slave is elected it has plenty of time to inform the other slaves, avoiding that another slave tries a new election.
* 5) Masters don't try to select the best slave in any way: simply, if the slave's master is in `FAIL` state and the master did not vote in the current term, the positive vote is granted.
* 6) When a master refuses to vote for a given slave there is no negative response, the request is simply ignored.
* 7) Masters don't grant the vote to slaves sending a `configEpoch` that is less than any `configEpoch` in the master table for the slots claimed by the slave. Remember that the slave sends the `configEpoch` of its master, and the bitmap of the slots served by its master. What this means is basically that the slave requesting the vote must have a configuration, for the slots it wants to fail over, that is newer than or equal to the one of the master granting the vote.

Race conditions during slaves election
---

This section illustrates how the concept of epoch is used to make the slave promotion process more resistant to partitions.

* A master is no longer reachable indefinitely. The master has three slaves A, B, C.
* Slave A wins the election and is promoted as master.
* A partition makes A not available for the majority of the cluster.
* Slave B wins the election and is promoted as master.
* A partition makes B not available for the majority of the cluster.
* The previous partition is fixed, and A is available again.

At this point B is down, and A is available again and will compete with C, which will try to get elected in order to fail over B.

Both will eventually claim to be promoted slaves for the same set of hash slots, however the `configEpoch` they publish will be different, and the C epoch will be greater, so all the other nodes will upgrade their configuration to C.
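In other words, a slot binding always moves toward the claim carrying the greatest `configEpoch`. A toy Python sketch of that comparison, with invented structure names, followed by the A versus C race from above:

```python
from dataclasses import dataclass

@dataclass
class SlotClaim:
    node_id: str
    config_epoch: int

def apply_claim(slot_table, slot, claim):
    """slot_table: dict mapping hash slot -> SlotClaim of the current owner."""
    owner = slot_table.get(slot)
    if owner is None or claim.config_epoch > owner.config_epoch:
        slot_table[slot] = claim      # the newer configuration wins

table = {}
apply_claim(table, 100, SlotClaim("A", config_epoch=5))  # A's stale claim
apply_claim(table, 100, SlotClaim("C", config_epoch=6))  # C won a later election
assert table[100].node_id == "C"
```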
+
+A itself will detect pings from C serving the same slots with a greater epoch and will reconfigure as a slave of C.
+
+Rules for server slots information propagation
+---
+
+An important part of Redis Cluster is the mechanism used to propagate the information about which cluster node is serving a given set of hash slots. This is vital to both the startup of a fresh cluster and the ability to upgrade the configuration after a slave was promoted to serve the slots of its failing master.
+
+Ping and Pong packets that instances continuously exchange contain a header that is used by the sender in order to advertise the hash slots it claims to be responsible for. This is the main mechanism used in order to propagate changes, with the exception of a manual reconfiguration operated by the cluster administrator (for example a manual resharding via redis-trib in order to move hash slots among masters).
-If all the above conditions are true, the slave starts requesting the
-authorization to be promoted to master to all the reachable masters.
+When a new Redis Cluster node is created, its local slot table, which maps a given hash slot to a given node ID, is initialized so that every hash slot is assigned to nil, that is, the hash slot is unassigned.
-A master will reply with a positive message `FAILOVER_AUTH_GRANTED` if the sender of the message has the following properties:
+The first rule followed by a node in order to update its hash slot table is the following:
-* Is a slave, and the master is indeed in FAIL state.
-* Ordering all the slaves for this master, it has the lowest Node ID.
-* It appears to be up and running (no FAIL or PFAIL state).
+**Rule 1: If a hash slot is unassigned, and a known node claims it, I'll modify my hash slot table to associate the hash slot to this node.**
-Once the slave receives the authorization from the majority of the masters within a certain amount of time, it starts the failover process performing the following tasks:
+Because of this rule, when a new cluster is created, it is only needed to manually assign (using the `CLUSTER` command, usually via the redis-trib command line tool) the slots served by each master node to the node itself, and the information will rapidly propagate across the cluster.
-* Starts advertising itself as a master (via PONG packets).
-* Starts advertising it is a promoted slave (via PONG packets).
-* Starts claiming all the slots that were served by the old master.
-* A PONG packet is broadcasted to all the nodes to speedup the proccess, without waiting for the usual PING/PONG period.
+However this rule is not enough when a configuration update happens because a slave gets promoted to master after a master failure. The new master instance will advertise the slots previously served by the old master, but those slots are not unassigned from the point of view of the other nodes, which will not upgrade the configuration if they just follow the first rule.
-All the other nodes will update the configuration accordingly. Specifically:
+For this reason there is a second rule that is used in order to rebind a hash slot already assigned to a previous node to a new node claiming it. The rule is the following:
-* All the slots claimed by the new master will be updated, since they are currently claimed by a master in FAIL state.
-* All the other slaves of the old master will detect the PROMOTED flag and will switch the replication to the new master.
-* If the old master will return back again, will detect the PROMOTED flag and will configure itself as a slave of the new master.
+**Rule 2: If a hash slot is already assigned, and a known node is advertising it using a `configEpoch` that is greater than the `configEpoch` advertised by the current owner of the slot, I'll rebind the hash slot to the new node.**

-The PROMOTED flag will be lost by a node when it is turned again into a slave for some reason during the life of the cluster.
+Because of the second rule, eventually all the nodes in the cluster will agree that the owner of a slot is the one with the greatest `configEpoch` among the nodes advertising it.

-Publish/Subscribe (implemented, but to refine)
+Publish/Subscribe
 ===

 In a Redis Cluster clients can subscribe to every node, and can also

From f10af03633762961af88cecafd8e99b77e502270 Mon Sep 17 00:00:00 2001
From: Alexandre Curreli
Date: Thu, 10 Oct 2013 11:41:30 -0400
Subject: [PATCH 0064/2573] Added scredis to Scala clients

---
 clients.json | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/clients.json b/clients.json
index 327a74c93c..1c6ae47be7 100644
--- a/clients.json
+++ b/clients.json
@@ -669,5 +669,13 @@
     "repository": "https://github.com/ctstone/csredis",
     "description": "Async (and sync) client for Redis and Sentinel",
     "authors": ["ctnstone"]
+  },
+
+  {
+    "name": "scredis",
+    "language": "Scala",
+    "repository": "https://github.com/Livestream/scredis",
+    "description": "Advanced async (and sync) client entirely written in Scala. Extensively used in production at http://www.livestream.com",
+    "authors": ["Livestream"]
   }
 ]

From 97b55db50b340b7f3c975cd9e59f615aae795961 Mon Sep 17 00:00:00 2001
From: Igor Malinovskiy
Date: Tue, 22 Oct 2013 19:52:35 +0300
Subject: [PATCH 0065/2573] Added tool - Redis Desktop Manager

---
 tools.json | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/tools.json b/tools.json
index f1cdee7057..c59676a4cf 100644
--- a/tools.json
+++ b/tools.json
@@ -288,5 +288,13 @@
     "repository" : "http://github.com/bradvoth/redis-tcl",
     "description" : "Tcl library largely copied from the redis test tree, modified for minor bug fixes and expanded pub/sub capabilities",
     "authors" : ["bradvoth","antirez"]
+  },
+  {
+    "name": "Redis Desktop Manager",
+    "language": "C++",
+    "url": "http://www.springsource.org/spring-data/redis",
+    "repository": "https://github.com/uglide/RedisDesktopManager",
+    "description": "Cross-platform desktop GUI management tool for Redis",
+    "authors": ["u_glide"]
   }
 ]

From bc4cf8dd95fc32119d4f4cfb0c72e4301a3a8f3f Mon Sep 17 00:00:00 2001
From: Hugo Lopes Tavares
Date: Wed, 23 Oct 2013 11:51:29 -0400
Subject: [PATCH 0066/2573] Add note to restart redis after
 `vm.overcommit_memory` changes

Redis needs to be restarted after a sysctl `vm.overcommit_memory` change
to take effect.
---
 topics/admin.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/topics/admin.md b/topics/admin.md
index 1baa94adf9..98a2da2fe0 100644
--- a/topics/admin.md
+++ b/topics/admin.md
@@ -8,7 +8,7 @@ Redis setup hints
 -----------------

 + We suggest deploying Redis using the **Linux operating system**. Redis is also tested heavily on osx, and tested from time to time on FreeBSD and OpenBSD systems. However Linux is where we do all the major stress testing, and where most production deployments are working.
-+ Make sure to set the Linux kernel **overcommit memory setting to 1**.
Add `vm.overcommit_memory = 1` to `/etc/sysctl.conf` and then reboot or run the command `sysctl vm.overcommit_memory=1` for this to take effect immediately.
++ Make sure to set the Linux kernel **overcommit memory setting to 1**. Add `vm.overcommit_memory = 1` to `/etc/sysctl.conf` and then reboot the machine, or run the command `sysctl vm.overcommit_memory=1` and restart Redis for this to take effect immediately.
+ Make sure to **setup some swap** in your system (we suggest as much swap as memory). If Linux does not have swap and your Redis instance accidentally consumes too much memory, either Redis will crash for out of memory or the Linux kernel OOM killer will kill the Redis process.
+ If you are using Redis in a very write-heavy application, while saving an RDB file on disk or rewriting the AOF log **Redis may use up to 2 times the memory normally used**. The additional memory used is proportional to the number of memory pages modified by writes during the saving process, so it is often proportional to the number of keys (or aggregate types items) touched during this time. Make sure to size your memory accordingly.
+ Even if you have persistence disabled, Redis will need to perform RDB saves if you use replication.

From 3ad190078e02a13ea6f7fd2a9b8cdd735be19496 Mon Sep 17 00:00:00 2001
From: Igor Malinovskiy
Date: Fri, 25 Oct 2013 16:52:23 +0300
Subject: [PATCH 0067/2573] Fixed url of Redis Desktop Manager

---
 tools.json | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools.json b/tools.json
index c59676a4cf..038ef20f32 100644
--- a/tools.json
+++ b/tools.json
@@ -292,7 +292,7 @@
   {
     "name": "Redis Desktop Manager",
     "language": "C++",
-    "url": "http://www.springsource.org/spring-data/redis",
+    "url": "http://redisdesktop.com",
     "repository": "https://github.com/uglide/RedisDesktopManager",
     "description": "Cross-platform desktop GUI management tool for Redis",
     "authors": ["u_glide"]

From 67041891c4081292b3da2ff6287c8b7f687d2453 Mon Sep 17 00:00:00 2001
From: antirez
Date: Thu, 31 Oct 2013 12:24:27 +0100
Subject: [PATCH 0068/2573] SCAN, SSCAN, HSCAN, ZSCAN added to commands.json.

---
 commands.json | 108 ++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 108 insertions(+)

diff --git a/commands.json b/commands.json
index 32aecc31a6..0c78e79e9d 100644
--- a/commands.json
+++ b/commands.json
@@ -2249,5 +2249,113 @@
     ],
     "since": "2.0.0",
     "group": "sorted_set"
+  },
+  "SCAN": {
+    "summary": "Incrementally iterate the keys space",
+    "complexity": "O(1) for every call. O(N) for a complete iteration, including enough command calls for the cursor to return back to 0. N is the number of elements inside the collection.",
+    "arguments": [
+      {
+        "name": "cursor",
+        "type": "integer"
+      },
+      {
+        "command": "MATCH",
+        "name": "pattern",
+        "type": "pattern",
+        "optional": true
+      },
+      {
+        "command": "COUNT",
+        "name": "count",
+        "type": "integer",
+        "optional": true
+      }
+    ],
+    "since": "2.8.0",
+    "group": "generic"
+  },
+  "SSCAN": {
+    "summary": "Incrementally iterate Set elements",
+    "complexity": "O(1) for every call. O(N) for a complete iteration, including enough command calls for the cursor to return back to 0. 
N is the number of elements inside the collection.",
+    "arguments": [
+      {
+        "name": "key",
+        "type": "key"
+      },
+      {
+        "name": "cursor",
+        "type": "integer"
+      },
+      {
+        "command": "MATCH",
+        "name": "pattern",
+        "type": "pattern",
+        "optional": true
+      },
+      {
+        "command": "COUNT",
+        "name": "count",
+        "type": "integer",
+        "optional": true
+      }
+    ],
+    "since": "2.8.0",
+    "group": "set"
+  },
+  "HSCAN": {
+    "summary": "Incrementally iterate hash fields and associated values",
+    "complexity": "O(1) for every call. O(N) for a complete iteration, including enough command calls for the cursor to return back to 0. N is the number of elements inside the collection.",
+    "arguments": [
+      {
+        "name": "key",
+        "type": "key"
+      },
+      {
+        "name": "cursor",
+        "type": "integer"
+      },
+      {
+        "command": "MATCH",
+        "name": "pattern",
+        "type": "pattern",
+        "optional": true
+      },
+      {
+        "command": "COUNT",
+        "name": "count",
+        "type": "integer",
+        "optional": true
+      }
+    ],
+    "since": "2.8.0",
+    "group": "hash"
+  },
+  "ZSCAN": {
+    "summary": "Incrementally iterate sorted set elements and associated scores",
+    "complexity": "O(1) for every call. O(N) for a complete iteration, including enough command calls for the cursor to return back to 0. N is the number of elements inside the collection.",
+    "arguments": [
+      {
+        "name": "key",
+        "type": "key"
+      },
+      {
+        "name": "cursor",
+        "type": "integer"
+      },
+      {
+        "command": "MATCH",
+        "name": "pattern",
+        "type": "pattern",
+        "optional": true
+      },
+      {
+        "command": "COUNT",
+        "name": "count",
+        "type": "integer",
+        "optional": true
+      }
+    ],
+    "since": "2.8.0",
+    "group": "sorted_set"
+  }
 }

From 9bd2ef8c24e693c2b1d36726e380be66432e2851 Mon Sep 17 00:00:00 2001
From: antirez
Date: Thu, 31 Oct 2013 17:08:28 +0100
Subject: [PATCH 0069/2573] SCAN documentation.

---
 commands/scan.md | 186 +++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 186 insertions(+)
 create mode 100644 commands/scan.md

diff --git a/commands/scan.md b/commands/scan.md
new file mode 100644
index 0000000000..e23ca1fe63
--- /dev/null
+++ b/commands/scan.md
@@ -0,0 +1,186 @@
+The [SCAN] command and the closely related commands [SSCAN], [HSCAN] and [ZSCAN] are used in order to incrementally iterate over a collection of elements.
+
+* [SCAN] iterates the set of keys in the currently selected Redis database.
+* [SSCAN] iterates elements of Sets types.
+* [HSCAN] iterates fields of Hash types and their associated values.
+* [ZSCAN] iterates elements of Sorted Set types and their associated scores.
+
+Since these commands allow for incremental iteration, that means that only a small number of elements are returned at every call, they can be used in production and are very fast commands, without the downside of commands like [KEYS] or [SMEMBERS] that may block the server for a long time (even several seconds) when called against big collections of keys or elements.
+
+However while blocking commands like [SMEMBERS] are able to provide all the elements that are part of a Set in a given moment, The SCAN family of commands only offer limited guarantees about the returned elements since the collection that we incrementally iterate can change during the iteration process.
+
+Note that [SCAN], [SSCAN], [HSCAN] and [ZSCAN] all work very similarly, so this documentation covers all the four commands. However an obvious difference is that in the case of [SSCAN], [HSCAN] and [ZSCAN] the first argument is the name of the key holding the Set, Hash or Sorted Set value. 
The [SCAN] command does not need any key name argument as it iterates keys in the current database, so the iterated object is the database itself. + +## SCAN basic usage + +SCAN is a cursor based iteration. This means that at every call of the command, the server returns an updated cursor that the user needs to use as the cursor argument in the next call. + +An iteration starts when the cursor is set to 0, and terminates when the cursor returned by the server is 0. The following is an example of SCAN iteration: + +``` +redis 127.0.0.1:6379> scan 0 +1) "17" +2) 1) "key:12" + 2) "key:8" + 3) "key:4" + 4) "key:14" + 5) "key:16" + 6) "key:17" + 7) "key:15" + 8) "key:10" + 9) "key:3" + 10) "key:7" + 11) "key:1" +redis 127.0.0.1:6379> scan 17 +1) "0" +2) 1) "key:5" + 2) "key:18" + 3) "key:0" + 4) "key:2" + 5) "key:19" + 6) "key:13" + 7) "key:6" + 8) "key:9" + 9) "key:11" +``` + +In the example above, the first call uses zero as a cursor, to start the iteration. The second call uses the cursor returned by the previous call as the first element of the reply, that is, 17. + +As you can see the **SCAN return value** is an array of two values: the first value is the new cursor to use in the next call, the second value is an array of elements. + +Since in the second call the returned cursor is 0, the server signaled to the caller that the iteration finished, and the collection was completely explored. Starting an iteration with a cursor value of 0, and calling [SCAN] until the returned cursor is 0 again is called a **full iteration**. + +## Scan guarantees + +The [SCAN] command, and the other commands in the [SCAN] family, are able to provide to the user a set of guarantees associated to full iterations. + +* A full iteration always retrieves all the elements that were present in the collection from the start to the end of a full iteration. This means that if a given element is inside the collection when an iteration is started, and is still there when an iteration terminates, then at some point [SCAN] returned it to the user. +* A full iteration never returns any element that was NOT present in the collection from the start to the end of a full iteration. So if an element was removed before the start of an iteration, and is never added back to the collection for all the time an iteration lasts, [SCAN] ensures that this element will never be returned. + +However because [SCAN] has very little state associated (just the cursor) it has the following drawbacks: + +* A given element may be returned multiple times. It is up to the application to handle the case of duplicated elements, for example only using the returned elements in order to perform operations that are safe when re-applied multiple times. +* Elements that were not constantly present in the collection during a full iteration, may be returned or not: it is undefined. + +## Number of elements returned at every SCAN call + +[SCAN] family functions do not guarantee that the number of elements returned per call are in a given range. The commands are also allowed to return zero elements, and the client should not consider the iteration complete as long as the returned cursor is not zero. 
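+In client code, a full iteration therefore boils down to a simple loop that keeps feeding the returned cursor back into the next call. The following minimal sketch assumes the third-party redis-py client (any client exposing the command works the same way); note that a call may legitimately return an empty batch of elements:
+
+```python
+import redis
+
+r = redis.Redis(host="127.0.0.1", port=6379)
+
+cursor = 0
+keys_seen = set()  # SCAN may return duplicates, so deduplicate client-side
+while True:
+    # Each call returns the next cursor and a (possibly empty) batch of keys.
+    cursor, keys = r.scan(cursor=cursor)
+    keys_seen.update(keys)
+    if cursor == 0:  # a returned cursor of 0 marks the end of the full iteration
+        break
+```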
+ +However the number of returned elements is reasonable, that is, in practical terms SCAN may return a maximum number of elements in the order of a few tens of elements when iterating a large collection, or may return all the elements of the collection in a single call when the iterated collection is small enough to be internally represented as an encoded data structure (this happens for small sets, hashes and sorted sets). + +However there is a way for the user to tune the order of magnitude of the number of returned elements per call using the **COUNT** option. + +## The COUNT option + +While [SCAN] does not provide guarantees about the number of elements returned at every iteration, it is possible to empirically adjust the behavior of [SCAN] using the **COUNT** option. Basically with COUNT the user specified the *amount of work that should be done at every call in order to retrieve elements from the collection*. This is **just an hint** for the implementation, however generally speaking this is what you could expect most of the times from the implementation. + +* The default COUNT value is 10. +* When iterating the key space, or a Set, Hash or Sorted Set that is big enough to be represented by an hash table, assuming no **MATCH** option is used, the server will usually return *count* or a bit more than *count* elements per call. +* When iterating Sets encoded as intsets (small sets composed of just integers), or Hashes and Sorted Sets encoded as ziplists (small hashes and sets composed of small individual values), usually all the elements are returned in the first [SCAN] call regardless of the COUNT value. + +Important: **there is no need to use the same COUNT value** for every iteration. The caller is free to change the count from one iteration to the other as required, as long as the cursor passed in the next call is the one obtained in the previous call to the command. + +## The MATCH option + +It is possible to only iterate elements matching a given glob-style pattern, similarly to the behavior of the [KEYS] command that takes a pattern as only argument. + +To do so, just append the `MATCH ` arguments at the end of the [SCAN] command (it works with all the SCAN family commands). + +This is an example of iteration using **MATCH**: + +``` +redis 127.0.0.1:6379> sadd myset 1 2 3 foo foobar feelsgood +(integer) 6 +redis 127.0.0.1:6379> sscan myset 0 match f* +1) "0" +2) 1) "foo" + 2) "feelsgood" + 3) "foobar" +redis 127.0.0.1:6379> +``` + +It is important to note that the **MATCH** filter is applied after elements are retrieved from the collection, just before returning data to the client. This means that if the pattern matches very little elements inside the collection, [SCAN] will likely return no elements in most iterations. 
An example is shown below: + +``` +redis 127.0.0.1:6379> scan 0 MATCH *11* +1) "288" +2) 1) "key:911" +redis 127.0.0.1:6379> scan 288 MATCH *11* +1) "224" +2) (empty list or set) +redis 127.0.0.1:6379> scan 224 MATCH *11* +1) "80" +2) (empty list or set) +redis 127.0.0.1:6379> scan 80 MATCH *11* +1) "176" +2) (empty list or set) +redis 127.0.0.1:6379> scan 176 MATCH *11* COUNT 1000 +1) "0" +2) 1) "key:611" + 2) "key:711" + 3) "key:118" + 4) "key:117" + 5) "key:311" + 6) "key:112" + 7) "key:111" + 8) "key:110" + 9) "key:113" + 10) "key:211" + 11) "key:411" + 12) "key:115" + 13) "key:116" + 14) "key:114" + 15) "key:119" + 16) "key:811" + 17) "key:511" + 18) "key:11" +redis 127.0.0.1:6379> +``` + +As you can see most of the calls returned zero elements, but the last call where a COUNT of 1000 was used in order to force the command to do more scanning for that iteration. + +## Multiple parallel iterations + +It is possible to an infinite number of clients to iterate the same collection at the same time, as the full state of the iterator is in the cursor, that is obtained and returned to the client at every call. So server side no state is taken. + +## Terminating iterations in the middle + +Since there is no state server side, but the full state is captured by the cursor, the caller is free to terminate an iteration half-way without signaling this to the server in any way. An infinite number of iterations can be started and never terminated without any issue. + +## Calling SCAN with a corrupted cursor + +Calling [SCAN] with a broken, negative, out of range, or otherwise invalid cursor, will result into undefined behavior but never into a crash. What will be undefined is that the guarantees about the returned elements can no longer be ensured by the [SCAN] implementation. + +The only valid cursors to use are: +* The cursor value of 0 when starting an iteration. +* The cursor returned by the previous call to SCAN in order to continue the iteration. + +## Guarantee of termination + +The [SCAN] algorithm is guaranteed to terminate only if the size of the iterated collection remains bounded to a given maximum size, otherwise iterating a collection that always grows may result into [SCAN] to never terminate a full iteration. + +This is easy to see intuitively: if the collection grows there is more and more work to do in order to visit all the possible elements, and the ability to terminate the iteration depends on the number of calls to [SCAN] and its COUNT option value compared with the rate at which the collection grows. + +## Return value + +[SCAN], [SSCAN], [HSCAN] and [ZSCAN] return a two elements multi-bulk reply, where the first element is a string representing an unsigned 64 bit number (the cursor), and the second element is a multi-bulk with an array of elements. + +* [SCAN] array of elements is a list of keys. +* [SSCAN] array of elements is a list of Set members. +* [HSCAN] array of elements contain two elements, a field and a value, for every returned element of the Hash. +* [ZSCAN] array of elements contain two elements, a member and its associated score, for every returned element of the sorted set. + +## Additional examples + +Iteration of an Hash value. + +``` +redis 127.0.0.1:6379> hmset hash name Jack age 33 +OK +redis 127.0.0.1:6379> hscan hash 0 +1) "0" +2) 1) "name" + 2) "Jack" + 3) "age" + 4) "33" +``` From 8b1068b5e972756b9120519cff24225ba9b73167 Mon Sep 17 00:00:00 2001 From: antirez Date: Thu, 31 Oct 2013 17:12:34 +0100 Subject: [PATCH 0070/2573] SCAN doc markup fix. 
--- commands/scan.md | 48 ++++++++++++++++++++++++------------------------ 1 file changed, 24 insertions(+), 24 deletions(-) diff --git a/commands/scan.md b/commands/scan.md index e23ca1fe63..ef7ab60c91 100644 --- a/commands/scan.md +++ b/commands/scan.md @@ -1,15 +1,15 @@ -The [SCAN] command and the closely related commands [SSCAN], [HSCAN] and [ZSCAN] are used in order to incrementally iterate over a collection of elements. +The `SCAN` command and the closely related commands `SSCAN`, `HSCAN` and `ZSCAN` are used in order to incrementally iterate over a collection of elements. -* [SCAN] iterates the set of keys in the currently selected Redis database. -* [SSCAN] iterates elements of Sets types. -* [HSCAN] iterates fields of Hash types and their associated values. -* [ZSCAN] iterates elements of Sorted Set types and their associated scores. +* `SCAN` iterates the set of keys in the currently selected Redis database. +* `SSCAN` iterates elements of Sets types. +* `HSCAN` iterates fields of Hash types and their associated values. +* `ZSCAN` iterates elements of Sorted Set types and their associated scores. Since these commands allow for incremental iteration, that means that only a small number of elements are returned at every call, they can be used in production and are very fast commands, without the downside of commands like [KEYS] or [SMEMBERS] that may block the server for a long time (even several seconds) when called against big collections of keys or elements. However while blocking commands like [SMEMBERS] are able to provide all the elements that are part of a Set in a given moment, The SCAN family of commands only offer limited guarantees about the returned elements since the collection that we incrementally iterate can change during the iteration process. -Note that [SCAN], [SSCAN], [HSCAN] and [ZSCAN] all work very similarly, so this documentation covers all the four commands. However an obvious difference is that in the case of [SSCAN], [HSCAN] and [ZSCAN] the first argument is the name of the key holding the Set, Hash or Sorted Set value. The [SCAN] command does not need any key name argument as it iterates keys in the current database, so the iterated object is the database itself. +Note that `SCAN`, `SSCAN`, `HSCAN` and `ZSCAN` all work very similarly, so this documentation covers all the four commands. However an obvious difference is that in the case of `SSCAN`, `HSCAN` and `ZSCAN` the first argument is the name of the key holding the Set, Hash or Sorted Set value. The `SCAN` command does not need any key name argument as it iterates keys in the current database, so the iterated object is the database itself. ## SCAN basic usage @@ -48,23 +48,23 @@ In the example above, the first call uses zero as a cursor, to start the iterati As you can see the **SCAN return value** is an array of two values: the first value is the new cursor to use in the next call, the second value is an array of elements. -Since in the second call the returned cursor is 0, the server signaled to the caller that the iteration finished, and the collection was completely explored. Starting an iteration with a cursor value of 0, and calling [SCAN] until the returned cursor is 0 again is called a **full iteration**. +Since in the second call the returned cursor is 0, the server signaled to the caller that the iteration finished, and the collection was completely explored. 
Starting an iteration with a cursor value of 0, and calling `SCAN` until the returned cursor is 0 again is called a **full iteration**. ## Scan guarantees -The [SCAN] command, and the other commands in the [SCAN] family, are able to provide to the user a set of guarantees associated to full iterations. +The `SCAN` command, and the other commands in the `SCAN` family, are able to provide to the user a set of guarantees associated to full iterations. -* A full iteration always retrieves all the elements that were present in the collection from the start to the end of a full iteration. This means that if a given element is inside the collection when an iteration is started, and is still there when an iteration terminates, then at some point [SCAN] returned it to the user. -* A full iteration never returns any element that was NOT present in the collection from the start to the end of a full iteration. So if an element was removed before the start of an iteration, and is never added back to the collection for all the time an iteration lasts, [SCAN] ensures that this element will never be returned. +* A full iteration always retrieves all the elements that were present in the collection from the start to the end of a full iteration. This means that if a given element is inside the collection when an iteration is started, and is still there when an iteration terminates, then at some point `SCAN` returned it to the user. +* A full iteration never returns any element that was NOT present in the collection from the start to the end of a full iteration. So if an element was removed before the start of an iteration, and is never added back to the collection for all the time an iteration lasts, `SCAN` ensures that this element will never be returned. -However because [SCAN] has very little state associated (just the cursor) it has the following drawbacks: +However because `SCAN` has very little state associated (just the cursor) it has the following drawbacks: * A given element may be returned multiple times. It is up to the application to handle the case of duplicated elements, for example only using the returned elements in order to perform operations that are safe when re-applied multiple times. * Elements that were not constantly present in the collection during a full iteration, may be returned or not: it is undefined. ## Number of elements returned at every SCAN call -[SCAN] family functions do not guarantee that the number of elements returned per call are in a given range. The commands are also allowed to return zero elements, and the client should not consider the iteration complete as long as the returned cursor is not zero. +`SCAN` family functions do not guarantee that the number of elements returned per call are in a given range. The commands are also allowed to return zero elements, and the client should not consider the iteration complete as long as the returned cursor is not zero. However the number of returned elements is reasonable, that is, in practical terms SCAN may return a maximum number of elements in the order of a few tens of elements when iterating a large collection, or may return all the elements of the collection in a single call when the iterated collection is small enough to be internally represented as an encoded data structure (this happens for small sets, hashes and sorted sets). 
@@ -72,11 +72,11 @@ However there is a way for the user to tune the order of magnitude of the number ## The COUNT option -While [SCAN] does not provide guarantees about the number of elements returned at every iteration, it is possible to empirically adjust the behavior of [SCAN] using the **COUNT** option. Basically with COUNT the user specified the *amount of work that should be done at every call in order to retrieve elements from the collection*. This is **just an hint** for the implementation, however generally speaking this is what you could expect most of the times from the implementation. +While `SCAN` does not provide guarantees about the number of elements returned at every iteration, it is possible to empirically adjust the behavior of `SCAN` using the **COUNT** option. Basically with COUNT the user specified the *amount of work that should be done at every call in order to retrieve elements from the collection*. This is **just an hint** for the implementation, however generally speaking this is what you could expect most of the times from the implementation. * The default COUNT value is 10. * When iterating the key space, or a Set, Hash or Sorted Set that is big enough to be represented by an hash table, assuming no **MATCH** option is used, the server will usually return *count* or a bit more than *count* elements per call. -* When iterating Sets encoded as intsets (small sets composed of just integers), or Hashes and Sorted Sets encoded as ziplists (small hashes and sets composed of small individual values), usually all the elements are returned in the first [SCAN] call regardless of the COUNT value. +* When iterating Sets encoded as intsets (small sets composed of just integers), or Hashes and Sorted Sets encoded as ziplists (small hashes and sets composed of small individual values), usually all the elements are returned in the first `SCAN` call regardless of the COUNT value. Important: **there is no need to use the same COUNT value** for every iteration. The caller is free to change the count from one iteration to the other as required, as long as the cursor passed in the next call is the one obtained in the previous call to the command. @@ -84,7 +84,7 @@ Important: **there is no need to use the same COUNT value** for every iteration. It is possible to only iterate elements matching a given glob-style pattern, similarly to the behavior of the [KEYS] command that takes a pattern as only argument. -To do so, just append the `MATCH ` arguments at the end of the [SCAN] command (it works with all the SCAN family commands). +To do so, just append the `MATCH ` arguments at the end of the `SCAN` command (it works with all the SCAN family commands). This is an example of iteration using **MATCH**: @@ -99,7 +99,7 @@ redis 127.0.0.1:6379> sscan myset 0 match f* redis 127.0.0.1:6379> ``` -It is important to note that the **MATCH** filter is applied after elements are retrieved from the collection, just before returning data to the client. This means that if the pattern matches very little elements inside the collection, [SCAN] will likely return no elements in most iterations. An example is shown below: +It is important to note that the **MATCH** filter is applied after elements are retrieved from the collection, just before returning data to the client. This means that if the pattern matches very little elements inside the collection, `SCAN` will likely return no elements in most iterations. 
An example is shown below: ``` redis 127.0.0.1:6379> scan 0 MATCH *11* @@ -149,7 +149,7 @@ Since there is no state server side, but the full state is captured by the curso ## Calling SCAN with a corrupted cursor -Calling [SCAN] with a broken, negative, out of range, or otherwise invalid cursor, will result into undefined behavior but never into a crash. What will be undefined is that the guarantees about the returned elements can no longer be ensured by the [SCAN] implementation. +Calling `SCAN` with a broken, negative, out of range, or otherwise invalid cursor, will result into undefined behavior but never into a crash. What will be undefined is that the guarantees about the returned elements can no longer be ensured by the `SCAN` implementation. The only valid cursors to use are: * The cursor value of 0 when starting an iteration. @@ -157,18 +157,18 @@ The only valid cursors to use are: ## Guarantee of termination -The [SCAN] algorithm is guaranteed to terminate only if the size of the iterated collection remains bounded to a given maximum size, otherwise iterating a collection that always grows may result into [SCAN] to never terminate a full iteration. +The `SCAN` algorithm is guaranteed to terminate only if the size of the iterated collection remains bounded to a given maximum size, otherwise iterating a collection that always grows may result into `SCAN` to never terminate a full iteration. -This is easy to see intuitively: if the collection grows there is more and more work to do in order to visit all the possible elements, and the ability to terminate the iteration depends on the number of calls to [SCAN] and its COUNT option value compared with the rate at which the collection grows. +This is easy to see intuitively: if the collection grows there is more and more work to do in order to visit all the possible elements, and the ability to terminate the iteration depends on the number of calls to `SCAN` and its COUNT option value compared with the rate at which the collection grows. ## Return value -[SCAN], [SSCAN], [HSCAN] and [ZSCAN] return a two elements multi-bulk reply, where the first element is a string representing an unsigned 64 bit number (the cursor), and the second element is a multi-bulk with an array of elements. +`SCAN`, `SSCAN`, `HSCAN` and `ZSCAN` return a two elements multi-bulk reply, where the first element is a string representing an unsigned 64 bit number (the cursor), and the second element is a multi-bulk with an array of elements. -* [SCAN] array of elements is a list of keys. -* [SSCAN] array of elements is a list of Set members. -* [HSCAN] array of elements contain two elements, a field and a value, for every returned element of the Hash. -* [ZSCAN] array of elements contain two elements, a member and its associated score, for every returned element of the sorted set. +* `SCAN` array of elements is a list of keys. +* `SSCAN` array of elements is a list of Set members. +* `HSCAN` array of elements contain two elements, a field and a value, for every returned element of the Hash. +* `ZSCAN` array of elements contain two elements, a member and its associated score, for every returned element of the sorted set. ## Additional examples From fd28c07cca927f4faccd0a7b6cb82839322deb53 Mon Sep 17 00:00:00 2001 From: antirez Date: Thu, 31 Oct 2013 17:15:44 +0100 Subject: [PATCH 0071/2573] SCAN doc, more fixes. 
--- commands/scan.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/commands/scan.md b/commands/scan.md index ef7ab60c91..1ebfea18e7 100644 --- a/commands/scan.md +++ b/commands/scan.md @@ -5,7 +5,7 @@ The `SCAN` command and the closely related commands `SSCAN`, `HSCAN` and `ZSCAN` * `HSCAN` iterates fields of Hash types and their associated values. * `ZSCAN` iterates elements of Sorted Set types and their associated scores. -Since these commands allow for incremental iteration, that means that only a small number of elements are returned at every call, they can be used in production and are very fast commands, without the downside of commands like [KEYS] or [SMEMBERS] that may block the server for a long time (even several seconds) when called against big collections of keys or elements. +Since these commands allow for incremental iteration, returning only a small number of elements t every call, they can be used in production without the downside of commands like `KEYS` or `SMEMBERS` that may block the server for a long time (even several seconds) when called against big collections of keys or elements. However while blocking commands like [SMEMBERS] are able to provide all the elements that are part of a Set in a given moment, The SCAN family of commands only offer limited guarantees about the returned elements since the collection that we incrementally iterate can change during the iteration process. From 11e667102d9f58b521bb55087cc3312e61632089 Mon Sep 17 00:00:00 2001 From: antirez Date: Thu, 31 Oct 2013 17:16:21 +0100 Subject: [PATCH 0072/2573] More markup fixes for SCAN. --- commands/scan.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/commands/scan.md b/commands/scan.md index 1ebfea18e7..d6c331b634 100644 --- a/commands/scan.md +++ b/commands/scan.md @@ -7,7 +7,7 @@ The `SCAN` command and the closely related commands `SSCAN`, `HSCAN` and `ZSCAN` Since these commands allow for incremental iteration, returning only a small number of elements t every call, they can be used in production without the downside of commands like `KEYS` or `SMEMBERS` that may block the server for a long time (even several seconds) when called against big collections of keys or elements. -However while blocking commands like [SMEMBERS] are able to provide all the elements that are part of a Set in a given moment, The SCAN family of commands only offer limited guarantees about the returned elements since the collection that we incrementally iterate can change during the iteration process. +However while blocking commands like `SMEMBERS` are able to provide all the elements that are part of a Set in a given moment, The SCAN family of commands only offer limited guarantees about the returned elements since the collection that we incrementally iterate can change during the iteration process. Note that `SCAN`, `SSCAN`, `HSCAN` and `ZSCAN` all work very similarly, so this documentation covers all the four commands. However an obvious difference is that in the case of `SSCAN`, `HSCAN` and `ZSCAN` the first argument is the name of the key holding the Set, Hash or Sorted Set value. The `SCAN` command does not need any key name argument as it iterates keys in the current database, so the iterated object is the database itself. @@ -82,7 +82,7 @@ Important: **there is no need to use the same COUNT value** for every iteration. 
## The MATCH option -It is possible to only iterate elements matching a given glob-style pattern, similarly to the behavior of the [KEYS] command that takes a pattern as only argument. +It is possible to only iterate elements matching a given glob-style pattern, similarly to the behavior of the `KEYS` command that takes a pattern as only argument. To do so, just append the `MATCH ` arguments at the end of the `SCAN` command (it works with all the SCAN family commands). From 75e96c6beae1630cea5638978cca752bc115d116 Mon Sep 17 00:00:00 2001 From: antirez Date: Thu, 31 Oct 2013 17:22:42 +0100 Subject: [PATCH 0073/2573] Fixed typo. --- commands/scan.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/commands/scan.md b/commands/scan.md index d6c331b634..9eef2c926e 100644 --- a/commands/scan.md +++ b/commands/scan.md @@ -5,7 +5,7 @@ The `SCAN` command and the closely related commands `SSCAN`, `HSCAN` and `ZSCAN` * `HSCAN` iterates fields of Hash types and their associated values. * `ZSCAN` iterates elements of Sorted Set types and their associated scores. -Since these commands allow for incremental iteration, returning only a small number of elements t every call, they can be used in production without the downside of commands like `KEYS` or `SMEMBERS` that may block the server for a long time (even several seconds) when called against big collections of keys or elements. +Since these commands allow for incremental iteration, returning only a small number of elements per call, they can be used in production without the downside of commands like `KEYS` or `SMEMBERS` that may block the server for a long time (even several seconds) when called against big collections of keys or elements. However while blocking commands like `SMEMBERS` are able to provide all the elements that are part of a Set in a given moment, The SCAN family of commands only offer limited guarantees about the returned elements since the collection that we incrementally iterate can change during the iteration process. From 08bf3f4a4b59cae1c60f51402d797d33ed09a2b4 Mon Sep 17 00:00:00 2001 From: antirez Date: Thu, 31 Oct 2013 17:24:34 +0100 Subject: [PATCH 0074/2573] SCAN typo fixed. --- commands/scan.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/commands/scan.md b/commands/scan.md index 9eef2c926e..167ae82565 100644 --- a/commands/scan.md +++ b/commands/scan.md @@ -13,7 +13,7 @@ Note that `SCAN`, `SSCAN`, `HSCAN` and `ZSCAN` all work very similarly, so this ## SCAN basic usage -SCAN is a cursor based iteration. This means that at every call of the command, the server returns an updated cursor that the user needs to use as the cursor argument in the next call. +SCAN is a cursor based iterator. This means that at every call of the command, the server returns an updated cursor that the user needs to use as the cursor argument in the next call. An iteration starts when the cursor is set to 0, and terminates when the cursor returned by the server is 0. The following is an example of SCAN iteration: From 8d010460d28979529b03232312fce1f36d3eb799 Mon Sep 17 00:00:00 2001 From: antirez Date: Thu, 31 Oct 2013 17:26:28 +0100 Subject: [PATCH 0075/2573] SCAN: more typos fixed. 
--- commands/scan.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/commands/scan.md b/commands/scan.md index 167ae82565..3e0c0ffdc2 100644 --- a/commands/scan.md +++ b/commands/scan.md @@ -141,7 +141,7 @@ As you can see most of the calls returned zero elements, but the last call where ## Multiple parallel iterations -It is possible to an infinite number of clients to iterate the same collection at the same time, as the full state of the iterator is in the cursor, that is obtained and returned to the client at every call. So server side no state is taken. +It is possible for an infinite number of clients to iterate the same collection at the same time, as the full state of the iterator is in the cursor, that is obtained and returned to the client at every call. Server side no state is taken at all. ## Terminating iterations in the middle From 85e8c6f25193ff11a63e2f9fc49b026485ea1903 Mon Sep 17 00:00:00 2001 From: antirez Date: Thu, 31 Oct 2013 17:27:14 +0100 Subject: [PATCH 0076/2573] SCAN: markdown fix for list. --- commands/scan.md | 1 + 1 file changed, 1 insertion(+) diff --git a/commands/scan.md b/commands/scan.md index 3e0c0ffdc2..ccbd0944d5 100644 --- a/commands/scan.md +++ b/commands/scan.md @@ -152,6 +152,7 @@ Since there is no state server side, but the full state is captured by the curso Calling `SCAN` with a broken, negative, out of range, or otherwise invalid cursor, will result into undefined behavior but never into a crash. What will be undefined is that the guarantees about the returned elements can no longer be ensured by the `SCAN` implementation. The only valid cursors to use are: + * The cursor value of 0 when starting an iteration. * The cursor returned by the previous call to SCAN in order to continue the iteration. From 04a7a2aad4010eea1c33319e5e5959de80c32973 Mon Sep 17 00:00:00 2001 From: Xavier Shay Date: Sun, 3 Nov 2013 06:51:39 -0800 Subject: [PATCH 0077/2573] Minor typo fix in protocol documentation. --- topics/protocol.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/topics/protocol.md b/topics/protocol.md index 032e3b921b..44cc02f75f 100644 --- a/topics/protocol.md +++ b/topics/protocol.md @@ -70,7 +70,7 @@ possible to detect the kind of reply from the first byte sent by the server: * In an Error Reply the first byte of the reply is "-" * In an Integer Reply the first byte of the reply is ":" * In a Bulk Reply the first byte of the reply is "$" -* In a Multi Bulk Reply the first byte of the reply s "`*`" +* In a Multi Bulk Reply the first byte of the reply is "`*`" From 6eab5ab73396b89726a8d85582e9e0601c155f8d Mon Sep 17 00:00:00 2001 From: antirez Date: Thu, 7 Nov 2013 15:31:29 +0100 Subject: [PATCH 0078/2573] GPG key added in the security page. --- topics/security.md | 69 ++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 69 insertions(+) diff --git a/topics/security.md b/topics/security.md index b44493dc12..d0a81cb433 100644 --- a/topics/security.md +++ b/topics/security.md @@ -6,6 +6,10 @@ view of Redis: the access control provided by Redis, code security concerns, attacks that can be triggered from the outside by selecting malicious inputs and other similar topics are covered. +For security related contacts please open an issue on Github, or when you feel it +is really important that the security of the communication is preserved, use the +GPG key at the end of this document. 
+ Redis general security model ---- @@ -162,3 +166,68 @@ The Redis authors are currently investigating the possibility of adding a new configuration parameter to prevent **CONFIG SET/GET dir** and other similar run-time configuration directives. This would prevent clients from forcing the server to write Redis dump files at arbitrary locations. + +GPG key +--- + +``` +-----BEGIN PGP PUBLIC KEY BLOCK----- +Version: GnuPG v1.4.13 (Darwin) + +mQINBFJ7ouABEAC5HwiDmE+tRCsWyTaPLBFEGDHcWOLWzph5HdrRtB//UUlSVt9P +tTWZpDvZQvq/ujnS2i2c54V+9NcgVqsCEpA0uJ/U1sUZ3RVBGfGO/l+BIMBnM+B+ +TzK825TxER57ILeT/2ZNSebZ+xHJf2Bgbun45pq3KaXUrRnuS8HWSysC+XyMoXET +nksApwMmFWEPZy62gbeayf1U/4yxP/YbHfwSaldpEILOKmsZaGp8PAtVYMVYHsie +gOUdS/jO0P3silagq39cPQLiTMSsyYouxaagbmtdbwINUX0cjtoeKddd4AK7PIww +7su/lhqHZ58ZJdlApCORhXPaDCVrXp/uxAQfT2HhEGCJDTpctGyKMFXQbLUhSuzf +IilRKJ4jqjcwy+h5lCfDJUvCNYfwyYApsMCs6OWGmHRd7QSFNSs335wAEbVPpO1n +oBJHtOLywZFPF+qAm3LPV4a0OeLyA260c05QZYO59itakjDCBdHwrwv3EU8Z8hPd +6pMNLZ/H1MNK/wWDVeSL8ZzVJabSPTfADXpc1NSwPPWSETS7JYWssdoK+lXMw5vK +q2mSxabL/y91sQ5uscEDzDyJxEPlToApyc5qOUiqQj/thlA6FYBlo1uuuKrpKU1I +e6AA3Gt3fJHXH9TlIcO6DoHvd5fS/o7/RxyFVxqbRqjUoSKQeBzXos3u+QARAQAB +tChTYWx2YXRvcmUgU2FuZmlsaXBwbyA8YW50aXJlekBnbWFpbC5jb20+iQI+BBMB +AgAoBQJSe6LgAhsDBQld/A8ABgsJCAcDAgYVCAIJCgsEFgIDAQIeAQIXgAAKCRAx +gTcoDlyI1riPD/oDDvyIVHtgHvdHqB8/GnF2EsaZgbNuwbiNZ+ilmqnjXzZpu5Su +kGPXAAo+v+rJVLSU2rjCUoL5PaoSlhznw5PL1xpBosN9QzfynWLvJE42T4i0uNU/ +a7a1PQCluShnBchm4Xnb3ohNVthFF2MGFRT4OZ5VvK7UcRLYTZoGRlKRGKi9HWea +2xFvyUd9jSuGZG/MMuoslgEPxei09rhDrKxnDNQzQZQpamm/42MITh/1dzEC5ZRx +8hgh1J70/c+zEU7s6kVSGvmYtqbV49/YkqAbhENIeZQ+bCxcTpojEhfk6HoQkXoJ +oK5m21BkMlUEvf1oTX22c0tuOrAX8k0y1M5oismT2e3bqs2OfezNsSfK2gKbeASk +CyYivnbTjmOSPbkvtb27nDqXjb051q6m2A5d59KHfey8BZVuV9j35Ettx4nrS1Ni +S7QrHWRvqceRrIrqXJKopyetzJ6kYDlbP+EVN9NJ2kz/WG6ermltMJQoC0oMhwAG +dfrttG+QJ8PCOlaYiZLD2bjzkDfdfanE74EKYWt+cseenZUf0tsncltRbNdeGTQb +1/GHfwJ+nbA1uKhcHCQ2WrEeGiYpvwKv2/nxBWZ3gwaiAwsz/kI6DQlPZqJoMea9 +8gDK2rQigMgbE88vIli4sNhc0yAtm3AbNgAO28NUhzIitB+av/xYxN/W/LkCDQRS +e6LgARAAtdfwe05ZQ0TZYAoeAQXxx2mil4XLzj6ycNjj2JCnFgpYxA8m6nf1gudr +C5V7HDlctp0i9i0wXbf07ubt4Szq4v3ihQCnPQKrZZWfRXxqg0/TOXFfkOdeIoXl +Fl+yC5lUaSTJSg21nxIr8pEq/oPbwpdnWdEGSL9wFanfDUNJExJdzxgyPzD6xubc +OIn2KviV9gbFzQfOIkgkl75V7gn/OA5g2SOLOIPzETLCvQYAGY9ppZrkUz+ji+aT +Tg7HBL6zySt1sCCjyBjFFgNF1RZY4ErtFj5bdBGKCuglyZou4o2ETfA8A5NNpu7x +zkls45UmqRTbmsTD2FU8Id77EaXxDz8nrmjz8f646J0rqn9pGnIg6Lc2PV8j7ACm +/xaTH03taIloOBkTs/Cl01XYeloM0KQwrML43TIm3xSE/AyGF9IGTQo3zmv8SnMO +F+Rv7+55QGlSkfIkXUNCUSm1+dJSBnUhVj/RAjxkekG2di+Jh/y8pkSUxPMDrYEa +OtDoiq2G/roXjVQcbOyOrWA2oB58IVuXO6RzMYi6k6BMpcbmQm0y+TcJqo64tREV +tjogZeIeYDu31eylwijwP67dtbWgiorrFLm2F7+povfXjsDBCQTYhjH4mZgV94ri +hYjP7X2YfLV3tvGyjsMhw3/qLlEyx/f/97gdAaosbpGlVjnhqicAEQEAAYkCJQQY +AQIADwUCUnui4AIbDAUJXfwPAAAKCRAxgTcoDlyI1kAND/sGnXTbMvfHd9AOzv7i +hDX15SSeMDBMWC+8jH/XZASQF/zuHk0jZNTJ01VAdpIxHIVb9dxRrZ3bl56BByyI +8m5DKJiIQWVai+pfjKj6C7p44My3KLodjEeR1oOODXXripGzqJTJNqpW5eCrCxTM +yz1rzO1H1wziJrRNc+ACjVBE3eqcxsZkDZhWN1m8StlX40YgmQmID1CC+kRlV+hg +LUlZLWQIFCGo2UJYoIL/xvUT3Sx4uKD4lpOjyApWzU40mGDaM5+SOsYYrT8rdwvk +nd/efspff64meT9PddX1hi7Cdqbq9woQRu6YhGoCtrHyi/kklGF3EZiw0zWehGAR +2pUeCTD28vsMfJ3ZL1mUGiwlFREUZAcjIlwWDG1RjZDJeZ0NV07KH1N1U8L8aFcu ++CObnlwiavZxOR2yKvwkqmu9c7iXi/R7SVcGQlNao5CWINdzCLHj6/6drPQfGoBS +K/w4JPe7fqmIonMR6O1Gmgkq3Bwl3rz6MWIBN6z+LuUF/b3ODY9rODsJGp21dl2q +xCedf//PAyFnxBNf5NSjyEoPQajKfplfVS3mG8USkS2pafyq6RK9M5wpBR9I1Smm +gon60uMJRIZbxUjQMPLOViGNXbPIilny3FdqbUgMieTBDxrJkE7mtkHfuYw8bERy +vI1sAEeV6ZM/uc4CDI3E2TxEbQ== +``` + +**Key fingerprint** + +``` +pub 
4096R/0E5C88D6 2013-11-07 [expires: 2063-10-26]
+      Key fingerprint = E5F3 DA80 35F0 2EC1 47F9  020F 3181 3728 0E5C 88D6
+uid                  Salvatore Sanfilippo
+sub   4096R/3B34D15F 2013-11-07 [expires: 2063-10-26]
+```

From 8d45b0d13bc1ed18ef704224ada3c7a2d0a4156e Mon Sep 17 00:00:00 2001
From: Jan-Erik Rediger
Date: Sat, 9 Nov 2013 13:54:55 +0100
Subject: [PATCH 0079/2573] Documented included libraries and added examples.

---
 commands/eval.md | 84 ++++++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 82 insertions(+), 2 deletions(-)

diff --git a/commands/eval.md b/commands/eval.md
index 606ea4a9c6..d00771c127 100644
--- a/commands/eval.md
+++ b/commands/eval.md
@@ -481,14 +481,94 @@ The Redis Lua interpreter loads the following Lua libraries:
 * string lib.
 * math lib.
 * debug lib.
+* struct lib.
 * cjson lib.
 * cmsgpack lib.
+* redis.sha1hex function.

 Every Redis instance is _guaranteed_ to have all the above libraries so you
 can be sure that the environment for your Redis scripts is always the same.

-The CJSON library provides extremely fast JSON maniplation within Lua.
-All the other libraries are standard Lua libraries.
+struct, CJSON and cmsgpack are external libraries; all the other libraries are
+standard Lua libraries.
+
+### struct
+
+struct is a library for packing/unpacking structures within Lua.
+
+```
+Valid formats:
+> - big endian
+< - little endian
+![num] - alignment
+x - padding
+b/B - signed/unsigned byte
+h/H - signed/unsigned short
+l/L - signed/unsigned long
+T - size_t
+i/In - signed/unsigned integer with size `n' (default is size of int)
+cn - sequence of `n' chars (from/to a string); when packing, n==0 means
+     the whole string; when unpacking, n==0 means use the previous
+     read number as the string length
+s - zero-terminated string
+f - float
+d - double
+' ' - ignored
+```
+
+
+Example:
+
+```
+127.0.0.1:6379> eval 'return struct.pack("HH", 1, 2)' 0
+"\x01\x00\x02\x00"
+127.0.0.1:6379> eval 'return {struct.unpack("HH", ARGV[1])}' 0 "\x01\x00\x02\x00"
+1) (integer) 1
+2) (integer) 2
+3) (integer) 5
+127.0.0.1:6379> eval 'return struct.size("HH")' 0
+(integer) 4
+```
+
+### CJSON
+
+The CJSON library provides extremely fast JSON manipulation within Lua.
+
+Example:
+
+```
+redis 127.0.0.1:6379> eval 'return cjson.encode({["foo"]= "bar"})' 0
+"{\"foo\":\"bar\"}"
+redis 127.0.0.1:6379> eval 'return cjson.decode(ARGV[1])["foo"]' 0 "{\"foo\":\"bar\"}"
+"bar"
+```
+
+### cmsgpack
+
+The cmsgpack library provides simple and fast MessagePack manipulation within Lua.
+
+Example:
+
+```
+127.0.0.1:6379> eval 'return cmsgpack.pack({"foo", "bar", "baz"})' 0
+"\x93\xa3foo\xa3bar\xa3baz"
+127.0.0.1:6379> eval 'return cmsgpack.unpack(ARGV[1])' 0 "\x93\xa3foo\xa3bar\xa3baz"
+1) "foo"
+2) "bar"
+3) "baz"
+```
+
+### redis.sha1hex
+
+Performs the SHA1 of the input string.
+
+Example:
+
+```
+127.0.0.1:6379> eval 'return redis.sha1hex(ARGV[1])' 0 "foo"
+"0beec7b5ea3f0fdbc95d0dd47f3c5bc275da8a33"
+```

 ## Emitting Redis logs from scripts

From 33a0b714a196aa18b193e37aa184083c4d1ba235 Mon Sep 17 00:00:00 2001
From: Frank Mueller
Date: Tue, 12 Nov 2013 10:56:56 +0100
Subject: [PATCH 0080/2573] Changed location of the Tideland Go Redis Client

Repository moved from Google into own Git.
---
 clients.json | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/clients.json b/clients.json
index 327a74c93c..1a052e18ea 100644
--- a/clients.json
+++ b/clients.json
@@ -120,9 +120,9 @@
   },

   {
-    "name": "Tideland CGL Redis",
+    "name": "Tideland Go Redis Client",
     "language": "Go",
-    "repository": "http://code.google.com/p/tcgl/",
+    "repository": "http://git.tideland.biz/godm/redis",
     "description": "A flexible Go Redis client able to handle all commands",
     "authors": ["themue"],
     "active": true

From a838e74d857ca49a9549cb171c47d11f54fad517 Mon Sep 17 00:00:00 2001
From: Jan-Erik Rediger
Date: Tue, 12 Nov 2013 16:56:24 +0100
Subject: [PATCH 0081/2573] Documented that migrate options are only available
 in 2.8

Fixes antirez/redis#506
---
 commands/migrate.md | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/commands/migrate.md b/commands/migrate.md
index 775d1ea6fc..cee852393b 100644
--- a/commands/migrate.md
+++ b/commands/migrate.md
@@ -42,6 +42,8 @@ On success OK is returned.
 * `COPY` -- Do not remove the key from the local instance.
 * `REPLACE` -- Replace existing key on the remote instance.

+`COPY` and `REPLACE` were added in 2.8 and are not available in 2.6.
+
 @return

 @status-reply: The command returns OK on success.

From 25cdf50fbceea0904b278a67d76d586fc65d8065 Mon Sep 17 00:00:00 2001
From: antirez
Date: Wed, 13 Nov 2013 13:19:55 +0100
Subject: [PATCH 0082/2573] RENAME doc updated.

---
 commands/rename.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/commands/rename.md b/commands/rename.md
index 6317706540..b8c524a4f2 100644
--- a/commands/rename.md
+++ b/commands/rename.md
@@ -1,7 +1,7 @@
 Renames `key` to `newkey`.
 It returns an error when the source and destination names are the same, or
 when `key` does not exist.
-If `newkey` already exists it is overwritten.
+If `newkey` already exists it is overwritten; when this happens `RENAME` executes an implicit `DEL` operation, so if the deleted key contains a very big value it may cause high latency, even if `RENAME` itself is usually a constant-time operation.

 @return

From 4e99f5dbbccf65a277bf543e323bb2498bccdf32 Mon Sep 17 00:00:00 2001
From: antirez
Date: Mon, 18 Nov 2013 09:57:35 +0100
Subject: [PATCH 0083/2573] Added SSCAN, ZSCAN, HSCAN pages redirecting to
 SCAN.

---
 commands/hscan.md | 1 +
 commands/sscan.md | 1 +
 commands/zscan.md | 1 +
 3 files changed, 3 insertions(+)
 create mode 100644 commands/hscan.md
 create mode 100644 commands/sscan.md
 create mode 100644 commands/zscan.md

diff --git a/commands/hscan.md b/commands/hscan.md
new file mode 100644
index 0000000000..9ab261616a
--- /dev/null
+++ b/commands/hscan.md
@@ -0,0 +1 @@
+See `SCAN` for `HSCAN` documentation.
diff --git a/commands/sscan.md b/commands/sscan.md
new file mode 100644
index 0000000000..c19f3b1bf3
--- /dev/null
+++ b/commands/sscan.md
@@ -0,0 +1 @@
+See `SCAN` for `SSCAN` documentation.
diff --git a/commands/zscan.md b/commands/zscan.md
new file mode 100644
index 0000000000..3926307fbe
--- /dev/null
+++ b/commands/zscan.md
@@ -0,0 +1 @@
+See `SCAN` for `ZSCAN` documentation.
From b3d53fd3e7014a79895874cf93c19c2f24db1335 Mon Sep 17 00:00:00 2001
From: Fayiz Musthafa
Date: Wed, 20 Nov 2013 20:22:36 +0530
Subject: [PATCH 0084/2573] Added scredis a scala client

---
 clients.json | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/clients.json b/clients.json
index 327a74c93c..af53096001 100644
--- a/clients.json
+++ b/clients.json
@@ -441,6 +441,14 @@
     "active": true
   },

+  {
+    "name": "scredis",
+    "language": "Scala",
+    "repository": "https://github.com/Livestream/scredis",
+    "description": "Scredis is an advanced Redis client entirely written in Scala. Used in production at http://Livestream.com.",
+    "active": true
+  },
+
   {
     "name": "Tcl Client",
     "language": "Tcl",

From 47115ec5f7cdc2fff7b7a9f278db8fe316b25de3 Mon Sep 17 00:00:00 2001
From: antirez
Date: Thu, 21 Nov 2013 18:01:24 +0100
Subject: [PATCH 0085/2573] New Sentinel doc.

---
 topics/{sentinel.md => sentinel-old.md} | 0
 1 file changed, 0 insertions(+), 0 deletions(-)
 rename topics/{sentinel.md => sentinel-old.md} (100%)

diff --git a/topics/sentinel.md b/topics/sentinel-old.md
similarity index 100%
rename from topics/sentinel.md
rename to topics/sentinel-old.md

From 173155f6d3b185348fb62c8dceb5616a33d51ffa Mon Sep 17 00:00:00 2001
From: antirez
Date: Thu, 21 Nov 2013 18:02:15 +0100
Subject: [PATCH 0086/2573] New doc actually added...

---
 topics/sentinel.md | 359 +++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 359 insertions(+)
 create mode 100644 topics/sentinel.md

diff --git a/topics/sentinel.md b/topics/sentinel.md
new file mode 100644
index 0000000000..5a600697c3
--- /dev/null
+++ b/topics/sentinel.md
@@ -0,0 +1,359 @@
+Redis Sentinel Documentation
+===
+
+**Note:** this page documents the *new* Sentinel implementation that entered the Github repository on the 21st of November. The old Sentinel implementation is [documented here](http://redis.io/topics/sentinel-old), however using the old implementation is discouraged.
+
+Redis Sentinel is a system designed to help manage Redis instances.
+It performs the following three tasks:
+
+* **Monitoring**. Sentinel constantly checks if your master and slave instances are working as expected.
+* **Notification**. Sentinel can notify the system administrator, or another computer program, via an API, that something is wrong with one of the monitored Redis instances.
+* **Automatic failover**. If a master is not working as expected, Sentinel can start a failover process where a slave is promoted to master, the other additional slaves are reconfigured to use the new master, and the applications using the Redis server are informed about the new address to use when connecting.
+
+Redis Sentinel is a distributed system: this means that usually you want to run
+multiple Sentinel processes across your infrastructure, and these processes
+will use gossip protocols in order to understand if a master is down and
+agreement protocols in order to perform the failover and assign a version to
+the new configuration.
+
+Redis Sentinel is shipped as a stand-alone executable called `redis-sentinel`
+but actually it is a special execution mode of the Redis server itself, and
+can also be invoked using the `--sentinel` option of the normal `redis-server`
+executable.
+
+**WARNING:** Redis Sentinel is currently a work in progress. This document
+describes how to use what is already implemented, and may change as the
+Sentinel implementation evolves.
Redis Sentinel is compatible with Redis 2.4.16 or greater, and Redis 2.6.0 or greater, however it works better if used against Redis instances version 2.8.0 or greater.

Obtaining Sentinel
---

Currently Sentinel is part of the Redis *unstable* branch at Github.
To compile it you need to clone the *unstable* branch and compile Redis.
You'll see a `redis-sentinel` executable in your `src` directory.

Alternatively you can use the `redis-server` executable itself directly,
starting it in Sentinel mode as specified in the next paragraph.

An updated version of Sentinel is also available as part of the Redis 2.8.0 release.

Running Sentinel
---

If you are using the `redis-sentinel` executable (or if you have a symbolic
link with that name to the `redis-server` executable) you can run Sentinel
with the following command line:

    redis-sentinel /path/to/sentinel.conf

Otherwise you can use the `redis-server` executable directly, starting it in
Sentinel mode:

    redis-server /path/to/sentinel.conf --sentinel

Both ways work the same.

However **it is mandatory** to use a configuration file when running Sentinel, as this file will be used by the system in order to save the current state, which will be reloaded in case of restarts. Sentinel will simply refuse to start if no configuration file is given or if the configuration file path is not writable.

Configuring Sentinel
---

The Redis source distribution contains a file called `sentinel.conf`
that is a self-documented example configuration file you can use to
configure Sentinel. However a typical minimal configuration file looks like the
following:

    sentinel monitor mymaster 127.0.0.1 6379 2
    sentinel down-after-milliseconds mymaster 60000
    sentinel failover-timeout mymaster 180000
    sentinel parallel-syncs mymaster 1

    sentinel monitor resque 192.168.1.3 6380 4
    sentinel down-after-milliseconds resque 10000
    sentinel failover-timeout resque 180000
    sentinel parallel-syncs resque 5

The first line is used to tell Redis to monitor a master called *mymaster*,
that is at address 127.0.0.1 and port 6379, with a level of agreement of
2 Sentinels needed to detect this master as failing (if the agreement is not
reached the automatic failover does not start).

However note that, whatever agreement you specify to detect an instance as not working, a Sentinel requires **the vote from the majority** of the known Sentinels in the system in order to start a failover and reserve a given *configuration Epoch* (that is, a version to attach to a new master configuration).

In other words **Sentinel is not able to perform the failover if only a minority of the Sentinel processes are working**.

The other options are almost always in the form:

    sentinel <option_name> <master_name> <option_value>

And are used for the following purposes:

* `down-after-milliseconds` is the time in milliseconds an instance should not
be reachable (either because it does not reply to our PINGs or because it is
replying with an error) before the Sentinel starts to think it is down. After
this time has elapsed the Sentinel will mark the instance as
**subjectively down** (also known as `SDOWN`), which is not enough to start
the automatic failover. However if enough Sentinels think that there is a
subjectively down condition, the instance is then marked as
**objectively down**. The number of Sentinels that need to agree depends on
the agreement configured for this master.
* `parallel-syncs` sets the number of slaves that can be reconfigured at the
same time to use the new master after a failover. The lower the number, the
more time it will take for the failover process to complete; however if the
slaves are configured to serve old data, you may not want all the slaves to
resync with the new master at the same time, because while the replication
process is mostly non blocking for a slave, there is a moment when it stops
in order to load the bulk data from the master during a resync. You can make
sure only one slave at a time is not reachable by setting this option to the
value of 1.

The other options are described in the rest of this document and
documented in the example sentinel.conf file shipped with the Redis
distribution.

SDOWN and ODOWN
---

As already briefly mentioned in this document, Redis Sentinel has two different
concepts of *being down*: one is called a *Subjectively Down* condition
(SDOWN) and is a down condition that is local to a given Sentinel instance.
The other is called an *Objectively Down* condition (ODOWN) and is reached when
enough Sentinels (at least the number configured as the `quorum` parameter
of the monitored master) have an SDOWN condition, as reported by the other
Sentinels via the `SENTINEL is-master-down-by-addr` command.

From the point of view of a Sentinel, an SDOWN condition is reached if we
don't receive a valid reply to PING requests for the number of milliseconds
specified in the configuration by the `down-after-milliseconds` parameter.

An acceptable reply to PING is one of the following:

* PING replied with +PONG.
* PING replied with -LOADING error.
* PING replied with -MASTERDOWN error.

Any other reply (or no reply) is considered not valid.

Note that SDOWN requires that no acceptable reply is received for the whole
interval configured, so for instance if the interval is 30000 milliseconds
(30 seconds) and we receive an acceptable ping reply every 29 seconds, the
instance is considered to be working.

To switch from SDOWN to ODOWN no strong quorum algorithm is used, but
just a form of gossip: if a given Sentinel gets acknowledgements from enough
Sentinels, within a given time range, that the master is not working, the
SDOWN is promoted to ODOWN. If the acknowledgements are later missing, the
flag is cleared.

The ODOWN condition **only applies to masters**. For other kinds of instances
Sentinel doesn't require any agreement, so the ODOWN state is never reached
for slaves and other Sentinels.

However once a Sentinel sees a master in ODOWN state, it can try to be
elected by the other Sentinels to perform the failover.

Tasks every Sentinel accomplishes periodically
---

* Every Sentinel sends a **PING** request to every known master, slave, and Sentinel instance, every second.
* An instance is Subjectively Down (**SDOWN**) if the latest valid reply to **PING** was received more than `down-after-milliseconds` milliseconds ago. Acceptable PING replies are: +PONG, -LOADING, -MASTERDOWN.
* If a master is in **SDOWN** condition, every other Sentinel also monitoring this master is queried for confirmation of this state, every second.
* If a master is in **SDOWN** condition, and enough other Sentinels (to reach the configured quorum) agree about the condition in a given time range, the master is marked as Objectively Down (**ODOWN**).
* Every Sentinel sends an **INFO** request to every known master and slave instance, one time every 10 seconds.
If a master is in **ODOWN** condition, its slaves are asked for **INFO** every second instead of every 10 seconds.
* The **ODOWN** condition is cleared when enough of the other Sentinels no longer acknowledge that the master is unreachable. The **SDOWN** condition is cleared as soon as the master starts to reply to pings again.

Sentinels and Slaves auto discovery
---

While Sentinels stay connected with other Sentinels in order to reciprocally
check the availability of each other, and to exchange messages, you don't
need to configure the other Sentinel addresses in every Sentinel instance you
run: Sentinel uses the Redis master Pub/Sub capabilities in order to
discover the other Sentinels that are monitoring the same master.

This is obtained by sending *Hello Messages* into the channel named
`__sentinel__:hello`.

Similarly you don't need to configure the list of slaves attached
to a master, as Sentinel will auto-discover it by querying Redis.

* Every Sentinel publishes a message to every monitored master and slave Pub/Sub channel `__sentinel__:hello`, every two seconds, announcing its presence with ip, port, runid.
* Every Sentinel is subscribed to the Pub/Sub channel `__sentinel__:hello` of every master and slave, looking for unknown Sentinels. When new Sentinels are detected, they are added as Sentinels of this master.
* Hello messages also include the full current configuration of the master. If another Sentinel has a configuration for a given master that is older than the one received, it updates to the new configuration immediately.
* Before adding a new Sentinel to a master, a Sentinel always checks if there is already a Sentinel with the same runid or the same address (ip and port pair). In that case all the matching Sentinels are removed, and the new one added.

Sentinel API
===

By default Sentinel runs using TCP port 26379 (note that 6379 is the normal
Redis port). Sentinels accept commands using the Redis protocol, so you can
use `redis-cli` or any other unmodified Redis client in order to talk with
Sentinel.

There are two ways to talk with Sentinel: it is possible to directly query
it to check the state of the monitored Redis instances from its point
of view, to see what other Sentinels it knows about, and so forth.

An alternative is to use Pub/Sub to receive *push style* notifications from
Sentinels every time some event happens, like a failover, or an instance
entering an error condition, and so forth.

Sentinel commands
---

The following is a list of accepted commands:

* **PING** this command simply returns PONG.
* **SENTINEL masters** show a list of monitored masters and their state.
* **SENTINEL slaves `<master name>`** show a list of slaves for this master, and their state.
* **SENTINEL get-master-addr-by-name `<master name>`** return the ip and port number of the master with that name. If a failover is in progress or terminated successfully for this master it returns the address and port of the promoted slave.
* **SENTINEL reset `<pattern>`** this command will reset all the masters with matching name. The pattern argument is a glob-style pattern. The reset process clears any previous state in a master (including a failover in progress), and removes every slave and sentinel already discovered and associated with the master.
* **SENTINEL failover `<master name>`** force a failover as if the master was not reachable, and without asking for agreement from the other Sentinels (however a new version of the configuration will be published so that the other Sentinels will update their configurations).
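For example, a minimal `redis-cli` session against a Sentinel (assuming the `mymaster` example configuration shown earlier and the default Sentinel port) looks like this:

    $ redis-cli -p 26379
    redis 127.0.0.1:26379> PING
    PONG
    redis 127.0.0.1:26379> SENTINEL get-master-addr-by-name mymaster
    1) "127.0.0.1"
    2) "6379"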
Pub/Sub Messages
---

A client can use a Sentinel as if it were a Redis compatible Pub/Sub server
(but you can't use `PUBLISH`) in order to `SUBSCRIBE` or `PSUBSCRIBE` to
channels and get notified about specific events.

The channel name is the same as the name of the event. For instance the
channel named `+sdown` will receive all the notifications related to instances
entering an `SDOWN` condition.

To get all the messages simply subscribe using `PSUBSCRIBE *`.
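As a quick sketch of what this looks like (assuming a Sentinel on the default port; the single event shown is illustrative, and follows the *instance details* format described below):

    $ redis-cli -p 26379
    redis 127.0.0.1:26379> PSUBSCRIBE *
    1) "psubscribe"
    2) "*"
    3) (integer) 1
    1) "pmessage"
    2) "*"
    3) "+sdown"
    4) "master mymaster 127.0.0.1 6379"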
The following is a list of channels and message formats you can receive using
this API. The first word is the channel / event name, the rest is the format of the data.

Note: where *instance details* is specified it means that the following arguments are provided to identify the target instance:

    <instance-type> <name> <ip> <port> @ <master-name> <master-ip> <master-port>

The part identifying the master (from the @ argument to the end) is optional
and is only specified if the instance is not a master itself.

* **+reset-master** `<instance details>` -- The master was reset.
* **+slave** `<instance details>` -- A new slave was detected and attached.
* **+failover-state-reconf-slaves** `<instance details>` -- Failover state changed to `reconf-slaves` state.
* **+failover-detected** `<instance details>` -- A failover started by another Sentinel or any other external entity was detected (an attached slave turned into a master).
* **+slave-reconf-sent** `<instance details>` -- The leader Sentinel sent the `SLAVEOF` command to this instance in order to reconfigure it for the new master.
* **+slave-reconf-inprog** `<instance details>` -- The slave being reconfigured showed to be a slave of the new master ip:port pair, but the synchronization process is not yet complete.
* **+slave-reconf-done** `<instance details>` -- The slave is now synchronized with the new master.
* **-dup-sentinel** `<instance details>` -- One or more Sentinels for the specified master were removed as duplicated (this happens for instance when a Sentinel instance is restarted).
* **+sentinel** `<instance details>` -- A new Sentinel for this master was detected and attached.
* **+sdown** `<instance details>` -- The specified instance is now in Subjectively Down state.
* **-sdown** `<instance details>` -- The specified instance is no longer in Subjectively Down state.
* **+odown** `<instance details>` -- The specified instance is now in Objectively Down state.
* **-odown** `<instance details>` -- The specified instance is no longer in Objectively Down state.
* **+new-epoch** `<instance details>` -- The current epoch was updated.
* **+try-failover** `<instance details>` -- New failover in progress, waiting to be elected by the majority.
* **+elected-leader** `<instance details>` -- Won the election for the specified epoch, can do the failover.
* **+failover-state-select-slave** `<instance details>` -- New failover state is `select-slave`: we are trying to find a suitable slave for promotion.
* **no-good-slave** `<instance details>` -- There is no good slave to promote. Currently we'll retry after some time, but probably this will change and the state machine will simply abort the failover in this case.
* **selected-slave** `<instance details>` -- We found the specified good slave to promote.
* **failover-state-send-slaveof-noone** `<instance details>` -- We are trying to reconfigure the promoted slave as master, waiting for it to switch.
* **failover-end-for-timeout** `<instance details>` -- The failover terminated for timeout, slaves will eventually be configured to replicate with the new master anyway.
* **failover-end** `<instance details>` -- The failover terminated with success. All the slaves appear to be reconfigured to replicate with the new master.
* **switch-master** `<master name> <oldip> <oldport> <newip> <newport>` -- The new IP address and port of the master are the specified ones after a configuration change. This is **the message most external users are interested in**.
* **+tilt** -- Tilt mode entered.
* **-tilt** -- Tilt mode exited.

Sentinel failover
===

The failover process consists of the following steps:

* Recognize that the master is in ODOWN state.
* Increment our current epoch (see Raft leader election), and try to be elected for the current epoch.
* If the election failed, retry to be elected again after two times the configured failover timeout, and stop for now. Otherwise continue with the following steps.
* Select a slave to promote as master.
* The promoted slave is turned into a master with the command **SLAVEOF NO ONE**.
* The Hello messages broadcast via Pub/Sub contain the updated configuration from now on, so that all the other Sentinels will update their config.
* All the other slaves attached to the original master are configured with the **SLAVEOF** command in order to start the replication process with the new master.
* The leader terminates the failover process when all the slaves are reconfigured.

The slave to promote is chosen in the following way:

* We remove all the slaves in SDOWN, disconnected, or with the last ping reply received older than 5 seconds (PING is sent every second).
* We remove all the slaves disconnected from the master for more than 10 times the configured `down-after` time.
* Of all the remaining instances, we get the one with the greatest replication offset if available, or the one with the lowest `runid`, lexicographically, if the replication offset is not available or the same.

Consistency qualities of Sentinel failover
---

The Sentinel failover uses the leader election from the Raft algorithm in order
to guarantee that only a single leader is elected in a given epoch.

This means that two different Sentinels cannot win the election for the
same epoch. Also, a Sentinel will never vote for more than one leader in a
given epoch.

Higher configuration epochs always win over older epochs, so every Sentinel will
actively replace its configuration with the newer one.

Basically it is possible to think of Sentinel configurations as a state with an associated version. The state is **eventually propagated** to all the other Sentinels in a last-write-wins fashion (that is, the last configuration wins).

For example during network partitions, a given Sentinel can claim an older configuration, which will be updated as soon as the Sentinel is again able to receive updates.

In environments where consistency is important during network partitions, it is suggested to use the Redis option that stops accepting writes if not connected to at least a given number of slave instances, and at the same time to run a Redis Sentinel process in every physical or virtual machine where a Redis master or slave is running.

Sentinel persistent state
---

Sentinel state is persisted in the Sentinel configuration file. For example
every time a new configuration is received, or created (by leader Sentinels), for
a master, the configuration is persisted on disk together with the configuration
epoch. This means that it is safe to stop and restart Sentinel processes.

Sentinel reconfiguration of instances outside the failover procedure
+--- + +Even when no failover is in progress, Sentinels will always try to set the +current configuration on monitored instances. Specifically: + +* Slaves (according to the current configuration) that claim to be masters, will be configured as slaves to replicate with the current master. +* Slaves connected to a wrong master, will be reconfigured to replicate with the right master. + +However when this conditions are encountered Sentinel waits enough time to be sure to catch a configuration update in via Pub/Sub Hello messages before to reconfigure the instances, in order to avoid that Sentinels with a stale configuration will try to change the slaves configuration without a good reason. + +TILT mode +--- + +Redis Sentinel is heavily dependent on the computer time: for instance in +order to understand if an instance is available it remembers the time of the +latest successful reply to the PING command, and compares it with the current +time to understand how old it is. + +However if the computer time changes in an unexpected way, or if the computer +is very busy, or the process blocked for some reason, Sentinel may start to +behave in an unexpected way. + +The TILT mode is a special "protection" mode that a Sentinel can enter when +something odd is detected that can lower the reliability of the system. +The Sentinel timer interrupt is normally called 10 times per second, so we +expect that more or less 100 milliseconds will elapse between two calls +to the timer interrupt. + +What a Sentinel does is to register the previous time the timer interrupt +was called, and compare it with the current call: if the time difference +is negative or unexpectedly big (2 seconds or more) the TILT mode is entered +(or if it was already entered the exit from the TILT mode postponed). + +When in TILT mode the Sentinel will continue to monitor everything, but: + +* It stops acting at all. +* It starts to reply negatively to `SENTINEL is-master-down-by-addr` requests as the ability to detect a failure is no longer trusted. + +If everything appears to be normal for 30 second, the TILT mode is exited. + +Handling of -BUSY state +--- + +(Warning: Yet not implemented) + +The -BUSY error is returned when a script is running for more time than the +configured script time limit. When this happens before triggering a fail over +Redis Sentinel will try to send a "SCRIPT KILL" command, that will only +succeed if the script was read-only. From 99ebb3ec136f446cc7a318f272d7007c42757bf3 Mon Sep 17 00:00:00 2001 From: antirez Date: Fri, 22 Nov 2013 10:20:18 +0100 Subject: [PATCH 0087/2573] Sentinel doc improved a bit. --- topics/sentinel.md | 7 +++++++ 1 file changed, 7 insertions(+) diff --git a/topics/sentinel.md b/topics/sentinel.md index 5a600697c3..0f0a3f6845 100644 --- a/topics/sentinel.md +++ b/topics/sentinel.md @@ -274,6 +274,8 @@ The failover process consists on the following steps: * All the other slaves attached to the original master are configured with the **SLAVEOF** command in order to start the replication process with the new master. * The leader terminates the failover process when all the slaves are reconfigured. +**Note:** every time a Redis instance is reconfigured, either by turning it into a master, a slave, or reconfiguring it as a slave of a different instance, the `CONFIG REWRITE` command is sent to the instance in order to make sure the configuration is persisted on disk. 
+
 The slave to promote is chosen in the following way:

 * We remove all the slaves in SDOWN, disconnected, or with the last ping reply received older than 5 seconds (PING is sent every second).
@@ -357,3 +359,8 @@ The -BUSY error is returned when a script is running for more time than the
 configured script time limit. When this happens, before triggering a failover
 Redis Sentinel will try to send a "SCRIPT KILL" command, that will only
 succeed if the script was read-only.
+
+Sentinel clients implementation
+---
+
+Sentinel requires explicit client support, unless the system is configured to execute a script that performs a transparent redirection of all the requests to the new master instance (virtual IP or other similar systems). The topic of client libraries implementation is covered in the document [Sentinel clients guidelines](/topics/sentinel-clients).

From 62a3bffe65aecce387067d84a704a7a76d21b81e Mon Sep 17 00:00:00 2001
From: Sean Charles
Date: Sun, 24 Nov 2013 19:26:47 +0000
Subject: [PATCH 0088/2573] Added GNU Prolog client information
---
 clients.json | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/clients.json b/clients.json
index 327a74c93c..6bdef60f2f 100644
--- a/clients.json
+++ b/clients.json
@@ -669,5 +669,13 @@
     "repository": "https://github.com/ctstone/csredis",
     "description": "Async (and sync) client for Redis and Sentinel",
     "authors": ["ctnstone"]
+  },
+
+  {
+    "name": "gnuprolog-redisclient",
+    "language": "GNU Prolog",
+    "repository": "https://github.com/emacstheviking/gnuprolog-redisclient",
+    "description": "Simple Redis client for GNU Prolog in native Prolog, no FFI, libraries etc.",
+    "authors": ["seancharles"]
   }
 ]

From e0342264cd24c49bddfe3a1f5dbab7e34856a406 Mon Sep 17 00:00:00 2001
From: Curtis Maloney
Date: Tue, 26 Nov 2013 10:20:25 +1100
Subject: [PATCH 0089/2573] Minor grammar corrections
---
 topics/persistence.md | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/topics/persistence.md b/topics/persistence.md
index 731236e61d..42cd83ee55 100644
--- a/topics/persistence.md
+++ b/topics/persistence.md
@@ -39,8 +39,10 @@ AOF disadvantages
 ---

 * AOF files are usually bigger than the equivalent RDB files for the same dataset.
-* AOF can be slower then RDB depending on the exact fsync policy. In general with fsync set to *every second* performances are still very high, and with fsync disabled it should be exactly as fast as RDB even under high load. Still RDB is able to provide more guarantees about the maximum latency even in the case of an huge write load.
-* In the past we experienced rare bugs in specific commands (for instance there was one involving blocking commands like BRPOPLPUSH) causing the AOF produced to don't reproduce exactly the same dataset on reloading. This bugs are rare and we have tests in the test suite creating random complex datasets automatically and reloading them to check everything is ok, but this kind of bugs are almost impossible with RDB persistence. To make this point more clear: the Redis AOF works incrementally updating an existing state, like MySQL or MongoDB does, while the RDB snapshotting creates everything from scratch again and again, that is conceptually more robust.
However 1) It should be noted that every time the AOF is rewritten by Redis it is recreated from scratch starting from the actual data contained in the data set, making resistance to bugs stronger compared to an always appending AOF file (or one rewritten reading the old AOF instead of reading the data in memory). 2) We never had a single report from users about an AOF corruption that was detected in the real world.
+* AOF can be slower than RDB depending on the exact fsync policy. In general with fsync set to *every second* performances are still very high, and with fsync disabled it should be exactly as fast as RDB even under high load. Still RDB is able to provide more guarantees about the maximum latency even in the case of an huge write load.
+* In the past we experienced rare bugs in specific commands (for instance there was one involving blocking commands like BRPOPLPUSH) causing the AOF produced to not reproduce exactly the same dataset on reloading. This bugs are rare and we have tests in the test suite creating random complex datasets automatically and reloading them to check everything is ok, but this kind of bugs are almost impossible with RDB persistence. To make this point more clear: the Redis AOF works incrementally updating an existing state, like MySQL or MongoDB does, while the RDB snapshotting creates everything from scratch again and again, that is conceptually more robust. However -
+  1) It should be noted that every time the AOF is rewritten by Redis it is recreated from scratch starting from the actual data contained in the data set, making resistance to bugs stronger compared to an always appending AOF file (or one rewritten reading the old AOF instead of reading the data in memory).
+  2) We never had a single report from users about an AOF corruption that was detected in the real world.

 Ok, so what should I use?
 ---

From cd1cadaf8429f856e0b7cf6a06a084fe3476a531 Mon Sep 17 00:00:00 2001
From: antirez
Date: Wed, 27 Nov 2013 10:16:37 +0100
Subject: [PATCH 0090/2573] Cluster tutorial.
---
 topics/cluster-tutorial.md | 707 +++++++++++++++++++++++++++++++++++++
 1 file changed, 707 insertions(+)
 create mode 100644 topics/cluster-tutorial.md

diff --git a/topics/cluster-tutorial.md b/topics/cluster-tutorial.md
new file mode 100644
index 0000000000..b6b7da958f
--- /dev/null
+++ b/topics/cluster-tutorial.md
@@ -0,0 +1,707 @@

Redis cluster tutorial
===

This document is a gentle introduction to Redis Cluster, that does not use
complex to understand distributed systems concepts. It provides instructions
about how to setup a cluster, test, and operate it, without
going into the details that are covered in
the [Redis Cluster specification](/topics/cluster-spec) but just describing
how the system behaves from the point of view of the cluster.

Note that if you plan to run a serious Redis Cluster deployment, the
more formal specification is an highly suggested reading.

**Redis cluster is currently alpha quality code**, please get in touch in the
Redis mailing list or open an issue in the Redis Github repository if you
find any issue.

Redis Cluster 101
---

Redis Cluster provides a way to run a Redis installation where data is
**automatically sharded across multiple Redis nodes**.

Commands dealing with multiple keys are not supported by the cluster, because
this would require moving data between Redis nodes, making Redis Cluster
unable to provide Redis-like performance and predictable behavior under load.
Redis Cluster also provides **some degree of availability during partitions**,
that is, in practical terms, the ability to continue operations when
some nodes fail or are not able to communicate.

So in practical terms, what do you get with Redis Cluster?

* The ability to automatically split your dataset among multiple nodes.
* The ability to continue operations when a subset of the nodes are experiencing failures or are unable to communicate with the rest of the cluster.

Redis Cluster data sharding
---

Redis Cluster does not use consistency hashing, but a different form of sharding
where every key is conceptually part of what we call an **hash slot**.

There are 16384 hash slots in Redis Cluster, and to compute what is the hash
slot of a given key, we simply take the CRC16 of the hash slot modulo
16384.

Every node in a Redis Cluster is responsible of a subset of the hash slots,
so for example you may have a cluster wit 3 nodes, where:

* Node A contains hash slots from 0 to 5500.
* Node B contains hash slots from 5501 to 11000.
* Node C contains hash slots from 11001 to 16383.

This makes it easy to add and remove nodes in the cluster. For example if
I want to add a new node D, I need to move some hash slots from nodes A, B, C
to D. Similarly if I want to remove node A from the cluster I can just
move the hash slots served by A to B and C. When node A is empty
I can remove it from the cluster completely.

Because moving hash slots from one node to another does not require stopping
operations, adding and removing nodes, or changing the percentage of hash
slots held by nodes, does not require any downtime.
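To make the slot computation concrete, the following is a small Ruby sketch of it (an illustration rather than the real implementation, assuming the XMODEM variant of CRC16, polynomial 0x1021, that Redis Cluster uses):

```ruby
# Sketch of the Redis Cluster hash slot computation, assuming the
# XMODEM variant of CRC16 (polynomial 0x1021).
def crc16(key)
  crc = 0
  key.each_byte do |byte|
    crc ^= byte << 8
    8.times do
      crc = (((crc & 0x8000) != 0 ? (crc << 1) ^ 0x1021 : crc << 1) & 0xFFFF)
    end
  end
  crc
end

def hash_slot(key)
  crc16(key) % 16384
end

# These match the slots shown in the redis-cli session later in this
# tutorial: "foo" hashes to slot 12182 and "hello" to slot 866.
puts hash_slot("foo")    # => 12182
puts hash_slot("hello")  # => 866
```

Keys hashing to slots served by different nodes are also the reason why multi-key commands are not supported: in the example layout above, `foo` (slot 12182) and `hello` (slot 866) would live on different nodes.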
Redis Cluster master-slave model
---

In order to remain available when a subset of nodes are failing or are not able
to communicate with the majority of nodes, Redis Cluster uses a master-slave
model where every node has from 1 (the master itself) to N replicas (N-1
additional slaves).

In our example cluster with nodes A, B, C, if node B fails the cluster is not
able to continue, since we no longer have a way to serve hash slots in the
range 5501-11000.

However if when the cluster is created (or at a later time) we add a slave
node to every master, so that the final cluster is composed of A, B, C
that are masters, and A1, B1, C1 that are slaves, the system is able to
continue if node B fails.

Node B1 replicates B, so the cluster will elect node B1 as the new master
and will continue to operate correctly.

However note that if nodes B and B1 fail at the same time Redis Cluster is not
able to continue to operate.

Redis Cluster consistency guarantees
---

Redis Cluster is not able to guarantee **strong consistency**. In practical
terms this means that under certain conditions it is possible that Redis
Cluster will forget a write that was acknowledged by the system.

The first reason why Redis Cluster can lose writes is because it uses
asynchronous replication. This means that during writes the following
happens:

* Your client writes to the master B.
* The master B replies OK to your client.
* The master B propagates the write to its slaves B1, B2 and B3.

As you can see B does not write for an acknowledge from B1, B2, B3 before
replying to the client, since this would be a prohibitive latency penalty
for Redis, so if your client writes something, B acknowledges the write,
but crashes before being able to send the write to its slaves, one of the
slaves can be promoted to master, losing the write forever.

This is **very similar to what happens** with most databases that are
configured to flush data to disk every second, so it is a scenario you
are already able to reason about because of past experiences with traditional
database systems not involving distributed systems. Similarly you can
improve consistency by forcing the database to flush data on disk before
replying to the client, but this usually results in prohibitively low
performance.

Basically there is a trade-off between performance and consistency.

Note: Redis Cluster in the future will allow users to perform synchronous
writes when absolutely needed.

There is another scenario where Redis Cluster will lose writes, and it happens
during a network partition where a client is isolated with a minority of
instances including at least a master.

Take as an example our 6 nodes cluster composed of A, B, C, A1, B1, C1,
with 3 masters and 3 slaves. There is also a client, that we will call Z1.

After a partition occurs, it is possible that in one side of the
partition we have A, C, A1, B1, C1, and in the other side we have B and Z1.

Z1 is still able to write to B, which will accept its writes. If the
partition heals in a very short time, the cluster will continue normally.
However if the partition lasts enough time for B1 to be promoted to master
in the majority side of the partition, the writes that Z1 is sending to B
will be lost.

Note that there is a maximum window to the amount of writes Z1 will be able
to send to B: if enough time has elapsed for the majority side of the
partition to elect a slave as master, every master node in the minority
side stops accepting writes.

This amount of time is a very important configuration directive of Redis
Cluster, and is called the **node timeout**.

After the node timeout has elapsed, a master node is considered to be failing,
and can be replaced by one of its replicas.
Similarly, after the node timeout has elapsed, if a master node is not able
to sense the majority of the other master nodes, it enters an error state
and stops accepting writes.

Creating and using a Redis Cluster
===

To create a cluster, the first thing we need is to have a few empty
Redis instances running in **cluster mode**. This basically means that
clusters are not created using normal Redis instances, but a special mode
needs to be configured so that the Redis instance will enable the Cluster
specific features and commands.

The following is a minimal Redis cluster configuration file:

```
port 7000
cluster-enabled yes
cluster-config-file nodes.conf
cluster-node-timeout 5000
appendonly yes
```

As you can see what enables the cluster mode is simply the `cluster-enabled`
directive. Every instance also contains the path of a file where the
configuration for this node is stored, which by default is `nodes.conf`.
This file is never touched by humans; it is simply generated at startup
by the Redis Cluster instances, and updated every time it is needed.

Note that the **minimal cluster** that works as expected is required to contain
at least three master nodes.
For your first tests it is strongly suggested
to start a six-node cluster with three masters and three slaves.

To do so, enter a new directory and create the following directories, named
after the port number of the instance we'll run inside each of them.

Something like:

```
mkdir cluster-test
cd cluster-test
mkdir 7000 7001 7002 7003 7004 7005
```

Create a `redis.conf` file inside each of the directories, from 7000 to 7005.
As a template for your configuration file just use the small example above,
but make sure to replace the port number `7000` with the right port number
according to the directory name.

Now copy your redis-server executable, **compiled from the latest sources in the unstable branch at Github**, into the `cluster-test` directory, and finally open 6 terminal tabs in your favorite terminal application.

Start every instance like this, one per tab:

```
cd 7000
../redis-server ./redis.conf
```

As you can see from the logs of every instance, since no `nodes.conf` file was
existing, every node assigns itself a new ID.

    [82462] 26 Nov 11:56:55.329 * No cluster configuration found, I'm 97a3a64667477371c4479320d683e4c8db5858b1

This ID will be used forever by this specific instance in order for the instance
to have an unique name in the context of the cluster. All the other nodes
remember the other nodes by this specific ID, and not by IP or port, these can
change, but the unique node identifier will never change for all the life
of the node. We call this identifier simply **Node ID**.

Creating the cluster
---

Now that we have a number of instances running, we need to create our
cluster writing some meaningful configuration to the nodes.

This is very easy to accomplish as we are helped by the Redis Cluster
command line interface utility called `redis-trib`, that is a Ruby program
calling CLUSTER commands in the instances in order to create new clusters,
check or reshard an existing cluster.

The `redis-trib` utility is in the `src` directory of the Redis source code
distribution. To create your cluster simply type:

    ./redis-trib.rb create --replicas 1 127.0.0.1:7000 127.0.0.1:7001 127.0.0.1:7002 127.0.0.1:7003 127.0.0.1:7004 127.0.0.1:7005

The command used here is **create**, since we want to create a new cluster.
The option `--replicas 1` means that we want a slave for every master created.
The other arguments are the list of addresses of the instances I want to use
to create the new cluster.

Obviously, given our requirements, the only possible setup is a cluster with
3 masters and 3 slaves.

Redis-trib will propose a configuration. Accept it by typing **yes**.
The cluster will be configured and *joined*, that is, instances will be
bootstrapped into talking with each other. Finally, if everything went ok
you'll see a message like this:

    [OK] All 16384 slots covered

This means that there is at least one master instance serving each of the
16384 slots available.

Playing with the cluster
---

At this stage one of the problems with Redis Cluster is the lack of
client library implementations.

I'm aware of the following implementations:

* [redis-rb-cluster](http://github.com/antirez/redis-rb-cluster) is a Ruby implementation written by me (@antirez) as a reference for other languages. It is a simple wrapper around the original redis-rb, implementing the minimal semantics to talk with the cluster efficiently.
* [redis-py-cluster](https://github.com/Grokzen/redis-py-cluster) appears to be a port of redis-rb-cluster to Python. Not recently updated (last commit 6 months ago), however it may be a starting point.
* The popular [Predis](https://github.com/nrk/predis) used to have some Redis Cluster support at the very early stages of Redis Cluster, however I'm currently not sure about the completeness of the support, nor if the support is designed to work with recent versions of Redis Cluster (at some point we changed the number of hash slots from 4k to 16k).
* The `redis-cli` utility in the unstable branch of the Redis repository at Github implements a very basic cluster support when started with the `-c` switch.

Long story short, an easy way to test Redis Cluster is either to try the
[redis-rb-cluster](http://github.com/antirez/redis-rb-cluster) Ruby client or
simply the `redis-cli` command line utility. The following is an example
of interaction using the latter:

```
$ redis-cli -c -p 7000
redis 127.0.0.1:7000> set foo bar
-> Redirected to slot [12182] located at 127.0.0.1:7002
OK
redis 127.0.0.1:7002> set hello world
-> Redirected to slot [866] located at 127.0.0.1:7000
OK
redis 127.0.0.1:7000> get foo
-> Redirected to slot [12182] located at 127.0.0.1:7002
"bar"
redis 127.0.0.1:7000> get hello
-> Redirected to slot [866] located at 127.0.0.1:7000
"world"
```

The redis-cli cluster support is very basic, so it always uses the fact that
Redis Cluster nodes are able to redirect a client to the right node.
A serious client is able to do better than that, and cache the map between
hash slots and node addresses, to directly use the right connection to the
right node. The map is refreshed only when something changes in the cluster
configuration, for example after a failover or after the system administrator
changed the cluster layout by adding or removing nodes.

Writing an example app with redis-rb-cluster
---

Before going forward and showing how to operate the Redis Cluster, doing things
like a failover or a resharding, we need to create some example application
or at least to be able to understand the semantics of a simple Redis Cluster
client interaction.

In this way we can run an example and at the same time try to make nodes
fail, or start a resharding, to see how Redis Cluster behaves under real
world conditions. It is not very helpful to see what happens while nobody
is writing to the cluster.

This section explains some basic usage of redis-rb-cluster showing two
examples. The first is the following, and is the `example.rb` file inside
the redis-rb-cluster distribution:

```
 1  require './cluster'
 2
 3  startup_nodes = [
 4      {:host => "127.0.0.1", :port => 7000},
 5      {:host => "127.0.0.1", :port => 7001}
 6  ]
 7  rc = RedisCluster.new(startup_nodes,32,:timeout => 0.1)
 8
 9  last = false
10
11  while not last
12      begin
13          last = rc.get("__last__")
14          last = 0 if !last
15      rescue => e
16          puts "error #{e.to_s}"
17          sleep 1
18      end
19  end
20
21  ((last.to_i+1)..1000000000).each{|x|
22      begin
23          rc.set("foo#{x}",x)
24          puts rc.get("foo#{x}")
25          rc.set("__last__",x)
26      rescue => e
27          puts "error #{e.to_s}"
28      end
29      sleep 0.1
30  }
```

The application does a very simple thing: it sets keys in the form `foo<number>` to `number`, one after the other.
So if you run the program the result is the
following stream of commands:

* SET foo0 0
* SET foo1 1
* SET foo2 2
* And so forth...

The program looks more complex than it usually should, as it is designed to
show errors on the screen instead of exiting with an exception, so every
operation performed with the cluster is wrapped by `begin` `rescue` blocks.

**Line 7** is the first interesting line in the program. It creates the
Redis Cluster object, using as arguments a list of *startup nodes*, the maximum
number of connections this object is allowed to take against different nodes,
and finally the timeout after which a given operation is considered failed.

The startup nodes don't need to be all the nodes of the cluster. The important
thing is that at least one node is reachable. Also note that redis-rb-cluster
updates this list of startup nodes as soon as it is able to connect with the
first node. You should expect such a behavior with any other serious client.

Now that we have the Redis Cluster object instance stored in the **rc** variable
we are ready to use the object as if it were a normal Redis object instance.

This is exactly what happens in **lines 11 to 19**: when we restart the example
we don't want to start again with `foo0`, so we store the counter inside
Redis itself. The code above is designed to read this counter, or if the
counter does not exist, to assign it the value of zero.

However note how it is a while loop, as we want to try again and again even
if the cluster is down and is returning errors. Normal applications don't need
to be so careful.

**Lines between 21 and 30** start the main loop where the keys are set or
an error is displayed.

Note the `sleep` call at the end of the loop. In your tests you can remove
the sleep if you want to write to the cluster as fast as possible (relative
to the fact that this is a busy loop without real parallelism, of course, so
you'll get the usual 10k ops/second in the best of conditions).

Normally writes are slowed down in order for the example application to be
easier to follow by humans.

Starting the application produces the following output:

```
ruby ./example.rb
1
2
3
4
5
6
7
8
9
^C (I stopped the program here)
```

This is not a very interesting program and we'll use a better one in a moment,
but we can already see what happens during a resharding while the program
is running.

Resharding the cluster
---

Now we are ready to try a cluster resharding. To do this please
keep the example.rb program running, so that you can see if there is some
impact on the running program. Also you may want to comment out the `sleep`
call in order to have some more serious write load during resharding.

Resharding basically means to move hash slots from a set of nodes to another
set of nodes, and like cluster creation it is accomplished using the
redis-trib utility.

To start a resharding just type:

    ./redis-trib.rb reshard 127.0.0.1:7000

You only need to specify a single node, redis-trib will find the other nodes
automatically.

For now redis-trib is only able to reshard with the administrator support,
you can't just say move 5% of slots from this node to the other one (but
this is pretty trivial to implement). So it starts with questions. The first
is how much a big resharding do you want to do:

    How many slots do you want to move (from 1 to 16384)?
We can try to reshard 1000 hash slots, which should already contain a
non-trivial amount of keys if the example is still running without the sleep
call.

Then redis-trib needs to know the target of the resharding.
I'll use the first master node, that is, 127.0.0.1:7000, but I need
to specify the Node ID of the instance. This was already printed in a
list by redis-trib, but I can always find the ID of a node with the following
command if I need to:

```
$ redis-cli -p 7000 cluster nodes | grep myself
97a3a64667477371c4479320d683e4c8db5858b1 :0 myself,master - 0 0 0 connected 0-5460
```

Ok so my target node is 97a3a64667477371c4479320d683e4c8db5858b1.

Now you'll get asked from what nodes you want to take those keys.
I'll just type `all` in order to take a bit of hash slots from all the
other master nodes.

After the final confirmation you'll see a message for every slot that
redis-trib is going to move from a node to another, and a dot will be printed
for every actual key moved from one side to the other.

While the resharding is in progress you should be able to see your
example program running unaffected. You can stop and restart it multiple times
during the resharding if you want.

At the end of the resharding, you can test the health of the cluster with
the following command:

    ./redis-trib.rb check 127.0.0.1:7000

All the slots will be covered as usual, but this time the master at
127.0.0.1:7000 will have more hash slots, something around 6461.

A more interesting example application
---

So far so good, but the example application we used is not very good.
It writes to the cluster blindly, without ever checking if what was
written is the right thing.

From our point of view the cluster receiving the writes could just always
set the key `foo` to `42` on every operation, and we would not notice at
all.

So in the redis-rb-cluster repository, there is a more interesting application
that is called `consistency-test.rb`. It is a much more interesting application
as it uses a set of counters, by default 1000, and sends `INCR` commands
in order to increment the counters.

However instead of just writing, the application does two additional things:

* When a counter is updated using `INCR`, the application remembers the write.
* It also reads a random counter before every write, and checks if the value is what it expects it to be, comparing it with the value it has in memory.

What this means is that this application is a simple **consistency checker**,
and is able to tell you if the cluster lost some write, or if it accepted
a write that we did not receive acknowledgement for. In the first case we'll
see a counter having a value that is smaller than the one we remember, while
in the second case the value will be greater.

Running the consistency-test application produces a line of output every
second:

```
$ ruby consistency-test.rb
925 R (0 err) | 925 W (0 err) |
5030 R (0 err) | 5030 W (0 err) |
9261 R (0 err) | 9261 W (0 err) |
13517 R (0 err) | 13517 W (0 err) |
17780 R (0 err) | 17780 W (0 err) |
22025 R (0 err) | 22025 W (0 err) |
25818 R (0 err) | 25818 W (0 err) |
```

The line shows the number of **R**eads and **W**rites performed, and the
number of errors (queries not accepted because of errors, since the system was
not available).

If some inconsistency is found, new lines are added to the output.
This is what happens, for example, if I reset a counter manually while
the program is running:

```
redis 127.0.0.1:7000> set key_217 0
OK

(in the other tab I see...)

94774 R (0 err) | 94774 W (0 err) |
98821 R (0 err) | 98821 W (0 err) |
102886 R (0 err) | 102886 W (0 err) | 114 lost |
107046 R (0 err) | 107046 W (0 err) | 114 lost |
```

When I set the counter to 0 the real value was 114, so the program reports
114 lost writes (`INCR` commands that are not remembered by the cluster).

This program is much more interesting as a test case, so we'll use it
to test the Redis Cluster failover.

Testing the failover
---

Note: during this test, you should keep a tab open with the consistency test
application running.

In order to trigger the failover, the simplest thing we can do (that is also
the semantically simplest failure that can occur in a distributed system)
is to crash a single process, in our case a single master.

We can identify a master and crash it with the following command:

```
$ redis-cli -p 7000 cluster nodes | grep master
3e3a6cb0d9a9a87168e266b0a0b24026c0aae3f0 127.0.0.1:7001 master - 0 1385482984082 0 connected 5960-10921
2938205e12de373867bf38f1ca29d31d0ddb3e46 127.0.0.1:7002 master - 0 1385482983582 0 connected 11423-16383
97a3a64667477371c4479320d683e4c8db5858b1 :0 myself,master - 0 0 0 connected 0-5959 10922-11422
```

Ok, so 7000, 7001, and 7002 are masters. Let's crash node 7002 with the
**DEBUG SEGFAULT** command:

```
$ redis-cli -p 7002 debug segfault
Error: Server closed the connection
```

Now we can look at the output of the consistency test to see what it reported.

```
18849 R (0 err) | 18849 W (0 err) |
23151 R (0 err) | 23151 W (0 err) |
27302 R (0 err) | 27302 W (0 err) |

... many error warnings here ...

29659 R (578 err) | 29660 W (577 err) |
33749 R (578 err) | 33750 W (577 err) |
37918 R (578 err) | 37919 W (577 err) |
42077 R (578 err) | 42078 W (577 err) |
```

As you can see during the failover the system was not able to accept 578 reads and 577 writes, however no inconsistency was created in the database. This may
sound unexpected as in the first part of this tutorial we stated that Redis
Cluster can lose writes during the failover because it uses asynchronous
replication. What we did not say is that this is not very likely to happen
because Redis sends the reply to the client, and the commands to replicate
to the slaves, about at the same time, so there is a very small window to
lose data. However the fact that it is hard to trigger does not mean that it
is impossible, so this does not change the consistency guarantees provided
by Redis cluster.
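The core idea of such a checker can be sketched in a few lines of Ruby. This is a simplified illustration, not the actual consistency-test.rb code, and it assumes that the `RedisCluster` wrapper used in the earlier example also forwards the `incr` command:

```ruby
require './cluster'   # redis-rb-cluster, as in the earlier example

startup_nodes = [{:host => "127.0.0.1", :port => 7000}]
rc = RedisCluster.new(startup_nodes, 32, :timeout => 0.1)

expected = Hash.new(0)   # the value we believe each counter holds
lost = 0

loop {
  key = "counter_#{rand(1000)}"
  begin
    actual = rc.get(key).to_i
    # A value smaller than the one we wrote means the cluster lost writes;
    # a greater value would be a write that succeeded even though we never
    # received an acknowledgement for it.
    lost += expected[key] - actual if actual < expected[key]
    expected[key] = rc.incr(key).to_i
  rescue => e
    # Queries refused while the cluster is failing over are errors,
    # not inconsistencies.
  end
}
```

A real checker also has to deal with the ambiguity of errors on writes: a failed `INCR` may or may not have been applied, which is exactly the case where a value greater than the expected one can later be observed.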
We can now check what the cluster setup is after the failover (note that
in the meantime I restarted the crashed instance so that it rejoins the
cluster as a slave):

```
$ redis-cli -p 7000 cluster nodes
3fc783611028b1707fd65345e763befb36454d73 127.0.0.1:7004 slave 3e3a6cb0d9a9a87168e266b0a0b24026c0aae3f0 0 1385503418521 0 connected
a211e242fc6b22a9427fed61285e85892fa04e08 127.0.0.1:7003 slave 97a3a64667477371c4479320d683e4c8db5858b1 0 1385503419023 0 connected
97a3a64667477371c4479320d683e4c8db5858b1 :0 myself,master - 0 0 0 connected 0-5959 10922-11422
3c3a0c74aae0b56170ccb03a76b60cfe7dc1912e 127.0.0.1:7005 master - 0 1385503419023 3 connected 11423-16383
3e3a6cb0d9a9a87168e266b0a0b24026c0aae3f0 127.0.0.1:7001 master - 0 1385503417005 0 connected 5960-10921
2938205e12de373867bf38f1ca29d31d0ddb3e46 127.0.0.1:7002 slave 3c3a0c74aae0b56170ccb03a76b60cfe7dc1912e 0 1385503418016 3 connected
```

Now the masters are running on ports 7000, 7001 and 7005. What was previously
a master, that is the Redis instance running on port 7002, is now a slave of
7005.

The output of the `CLUSTER NODES` command may look intimidating, but it is actually pretty simple, and is composed of the following tokens:

* Node ID
* ip:port
* flags: master, slave, myself, fail, ...
* if it is a slave, the Node ID of the master
* Time of the last pending PING still waiting for a reply.
* Time of the last PONG received.
* Configuration epoch for this node (see the Cluster specification).
* Status of the link to this node.
* Slots served...

Adding a new node
---

Adding a new node is basically the process of adding an empty node and then
moving some data into it, in case it is a new master, or telling it to
set up as a replica of a known node, in case it is a slave.

We'll show both, starting with the addition of a new master instance.

In both cases the first step to perform is **adding an empty node**.

This is as simple as starting a new node on port 7006 (we already used
7000 to 7005 for our existing 6 nodes) with the same configuration
used for the other nodes, except for the port number. Do the following
in order to conform with the setup we used for the previous nodes:

* Create a new tab in your terminal application.
* Enter the `cluster-test` directory.
* Create a directory named `7006`.
* Create a redis.conf file inside, similar to the one used for the other nodes but using 7006 as port number.
* Finally start the server with `../redis-server ./redis.conf`

At this point the server should be running.

Now we can use **redis-trib** as usual in order to add the node to
the existing cluster.

    ./redis-trib.rb addnode 127.0.0.1:7006 127.0.0.1:7000

As you can see I used the **addnode** command specifying the address of the
new node as first argument, and the address of a random existing node in the
cluster as second argument.

In practical terms redis-trib here did very little to help us, it just
sent a `CLUSTER MEET` message to the node, something that is also possible
to accomplish manually, as shown below. However redis-trib also checks the
state of the cluster before operating, which is an advantage, and it will be
improved more and more in the future in order to also be able to rollback
changes when needed or to help the user to fix a messed up cluster when there
are issues.
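For reference, the manual equivalent is a single command sent to the new node (a sketch assuming the ports used in this tutorial):

```
$ redis-cli -p 7006 cluster meet 127.0.0.1 7000
OK
```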
Now we can connect to the new node to see if it really joined the cluster:

```
redis 127.0.0.1:7006> cluster nodes
3e3a6cb0d9a9a87168e266b0a0b24026c0aae3f0 127.0.0.1:7001 master - 0 1385543178575 0 connected 5960-10921
3fc783611028b1707fd65345e763befb36454d73 127.0.0.1:7004 slave 3e3a6cb0d9a9a87168e266b0a0b24026c0aae3f0 0 1385543179583 0 connected
f093c80dde814da99c5cf72a7dd01590792b783b :0 myself,master - 0 0 0 connected
2938205e12de373867bf38f1ca29d31d0ddb3e46 127.0.0.1:7002 slave 3c3a0c74aae0b56170ccb03a76b60cfe7dc1912e 0 1385543178072 3 connected
a211e242fc6b22a9427fed61285e85892fa04e08 127.0.0.1:7003 slave 97a3a64667477371c4479320d683e4c8db5858b1 0 1385543178575 0 connected
97a3a64667477371c4479320d683e4c8db5858b1 127.0.0.1:7000 master - 0 1385543179080 0 connected 0-5959 10922-11422
3c3a0c74aae0b56170ccb03a76b60cfe7dc1912e 127.0.0.1:7005 master - 0 1385543177568 3 connected 11423-16383
```

Note that since this node is already connected to the cluster it is already
able to redirect client queries correctly and is generally speaking part of
the cluster. However it has two peculiarities compared to the other masters:

* It holds no data as it has no assigned hash slots.
* Because it is a master without assigned slots, it does not participate in the election process when a slave wants to become a master.

Now it is possible to assign hash slots to this node using the resharding
feature of `redis-trib`. There is no point in showing this again, as we
already did it in a previous section, so instead I want to cover the case
where we want to turn this instance into a replica (a slave) of some other
master.

For example in order to add a replica for the node 127.0.0.1:7005, which is
currently serving hash slots in the range 11423-16383 and has the Node ID
3c3a0c74aae0b56170ccb03a76b60cfe7dc1912e, all I need to do is to connect
to the new node that we just added and send a simple command:

    redis 127.0.0.1:7006> cluster replicate 3c3a0c74aae0b56170ccb03a76b60cfe7dc1912e

That's it. Now we have a new replica for this set of hash slots, and all
the other nodes in the cluster already know (after a few seconds needed to
update their config). We can verify with the following command:

```
$ redis-cli -p 7000 cluster nodes | grep slave | grep 3c3a0c74aae0b56170ccb03a76b60cfe7dc1912e
f093c80dde814da99c5cf72a7dd01590792b783b 127.0.0.1:7006 slave 3c3a0c74aae0b56170ccb03a76b60cfe7dc1912e 0 1385543617702 3 connected
2938205e12de373867bf38f1ca29d31d0ddb3e46 127.0.0.1:7002 slave 3c3a0c74aae0b56170ccb03a76b60cfe7dc1912e 0 1385543617198 3 connected
```

The node 3c3a0c... now has two slaves, running on ports 7002 (the existing one) and 7006 (the new one).

Removing a node
---

Work in progress.

From 4103f8cae754f085201208fd3862b696db023ab8 Mon Sep 17 00:00:00 2001
From: antirez
Date: Wed, 27 Nov 2013 10:18:44 +0100
Subject: [PATCH 0091/2573] Long line split in cluster tutorial.
---
 topics/cluster-tutorial.md | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/topics/cluster-tutorial.md b/topics/cluster-tutorial.md
index b6b7da958f..68aea3cb75 100644
--- a/topics/cluster-tutorial.md
+++ b/topics/cluster-tutorial.md
@@ -228,7 +228,8 @@ check or reshard an existing cluster.

 The `redis-trib` utility is in the `src` directory of the Redis source code
 distribution.
To create your cluster simply type: - ./redis-trib.rb create --replicas 1 127.0.0.1:7000 127.0.0.1:7001 127.0.0.1:7002 127.0.0.1:7003 127.0.0.1:7004 127.0.0.1:7005 + ./redis-trib.rb create --replicas 1 127.0.0.1:7000 127.0.0.1:7001 \ + 127.0.0.1:7002 127.0.0.1:7003 127.0.0.1:7004 127.0.0.1:7005 The command used here is **create**, since we want to create a new cluster. The option `--replicas 1` means that we want a slave for every master created. From 2fc3e9bed576ffe34108ece85c9bcf9b5b69c045 Mon Sep 17 00:00:00 2001 From: antirez Date: Wed, 27 Nov 2013 10:27:33 +0100 Subject: [PATCH 0092/2573] typo in cluster-tutorial. --- topics/cluster-tutorial.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/topics/cluster-tutorial.md b/topics/cluster-tutorial.md index 68aea3cb75..428d96de37 100644 --- a/topics/cluster-tutorial.md +++ b/topics/cluster-tutorial.md @@ -6,7 +6,7 @@ complex to understand distributed systems concepts. It provides instructions about how to setup a cluster, test, and operate it, without going into the details that are covered in the [Redis Cluster specification](/topics/cluster-spec) but just describing -how the system behaves from the point of view of the cluster. +how the system behaves from the point of view of the user. Note that if you plan to run a serious Redis Cluster deployment, the more formal specification is an highly suggested reading. From 09d7756a443b51145a3de8f54b822353824b4ae3 Mon Sep 17 00:00:00 2001 From: antirez Date: Wed, 27 Nov 2013 10:29:15 +0100 Subject: [PATCH 0093/2573] typo #2 in cluster-tutorial. --- topics/cluster-tutorial.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/topics/cluster-tutorial.md b/topics/cluster-tutorial.md index 428d96de37..bfc73b438c 100644 --- a/topics/cluster-tutorial.md +++ b/topics/cluster-tutorial.md @@ -42,7 +42,7 @@ Redis Cluster does not use consistency hashing, but a different form of sharding where every key is conceptually part of what we call an **hash slot**. There are 16384 hash slots in Redis Cluster, and to compute what is the hash -slot of a given key, we simply take the CRC16 of the hash slot modulo +slot of a given key, we simply take the CRC16 of the key modulo 16384. Every node in a Redis Cluster is responsible of a subset of the hash slots, From 5d3346cf09fd0bcceb7c81bf138661c065ba399c Mon Sep 17 00:00:00 2001 From: antirez Date: Wed, 27 Nov 2013 10:31:52 +0100 Subject: [PATCH 0094/2573] typo #3 in cluster-tutorial. --- topics/cluster-tutorial.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/topics/cluster-tutorial.md b/topics/cluster-tutorial.md index bfc73b438c..92a9c52a5e 100644 --- a/topics/cluster-tutorial.md +++ b/topics/cluster-tutorial.md @@ -100,7 +100,7 @@ happens: * The master B replies OK to your client. * The master B propagates the write to its slaves B1, B2 and B3. -As you can see B does not write for an acknowledge from B1, B2, B3 before +As you can see B does not wait for an acknowledge from B1, B2, B3 before replying to the client, since this would be a prohibitive latency penalty for Redis, so if your client writes something, B acknowledges the write, but crashes before being able to send the write to its slaves, one of the From f0d2619a289bc76c4142361777ba41e7fc88291a Mon Sep 17 00:00:00 2001 From: antirez Date: Wed, 27 Nov 2013 10:41:40 +0100 Subject: [PATCH 0095/2573] A few sentences improved in cluster-tutorial. 
--- topics/cluster-tutorial.md | 18 +++++++++--------- 1 file changed, 9 insertions(+), 9 deletions(-) diff --git a/topics/cluster-tutorial.md b/topics/cluster-tutorial.md index 92a9c52a5e..679f176da0 100644 --- a/topics/cluster-tutorial.md +++ b/topics/cluster-tutorial.md @@ -203,16 +203,16 @@ cd 7000 ../redis-server ./redis.conf ``` -As you can see from the logs of every instance, since no `nodes.conf` file was -existing, every node assigns itself a new ID. +As you can see from the logs of every instance, since no `nodes.conf` file +existed, every node assigns itself a new ID. [82462] 26 Nov 11:56:55.329 * No cluster configuration found, I'm 97a3a64667477371c4479320d683e4c8db5858b1 This ID will be used forever by this specific instance in order for the instance -to have an unique name in the context of the cluster. All the other nodes -remember the other nodes by this specific ID, and not by IP or port, these can -change, but the unique node identifier will never change for all the life -of the node. We call this identifier simply **Node ID**. +to have an unique name in the context of the cluster. Every node +remembers every other node using this IDs, and not by IP or port. +IP addresses and ports may change, but the unique node identifier will never +change for all the life of the node. We call this identifier simply **Node ID**. Creating the cluster --- @@ -221,9 +221,9 @@ Now that we have a number of instances running, we need to create our cluster writing some meaningful configuration to the nodes. This is very easy to accomplish as we are helped by the Redis Cluster -command line interface utility called `redis-trib`, that is a Ruby program -calling CLUSTER commands in the instances in order to create new clusters, -check or reshard an existing cluster. +command line utility called `redis-trib`, that is a Ruby program +executing special commands in the instances in order to create new clusters, +check or reshard an existing cluster, and so forth. The `redis-trib` utility is in the `src` directory of the Redis source code distribution. To create your cluster simply type: From f9a124e70fad9a4571592f24f86659ec8af5b1bd Mon Sep 17 00:00:00 2001 From: antirez Date: Wed, 27 Nov 2013 10:51:35 +0100 Subject: [PATCH 0096/2573] Grammar fix in cluster-tutorial. --- topics/cluster-tutorial.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/topics/cluster-tutorial.md b/topics/cluster-tutorial.md index 679f176da0..63c3fefdc7 100644 --- a/topics/cluster-tutorial.md +++ b/topics/cluster-tutorial.md @@ -425,7 +425,7 @@ To start a resharding just type: You only need to specify a single node, redis-trib will find the other nodes automatically. -For now redis-trib is only able to reshard with the administrator support, +Currently redis-trib is only able to reshard with the administrator support, you can't just say move 5% of slots from this node to the other one (but this is pretty trivial to implement). So it starts with questions. The first is how much a big resharding do you want to do: From 9f9920e40f7243def0ea39ad2dd4436b293f5175 Mon Sep 17 00:00:00 2001 From: antirez Date: Wed, 27 Nov 2013 10:55:40 +0100 Subject: [PATCH 0097/2573] Sentence more clear in cluster-tutorial. 
--- topics/cluster-tutorial.md | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/topics/cluster-tutorial.md b/topics/cluster-tutorial.md index 63c3fefdc7..3151195b75 100644 --- a/topics/cluster-tutorial.md +++ b/topics/cluster-tutorial.md @@ -436,7 +436,8 @@ We can try to reshard 1000 hash slots, that should already contain a non trivial amount of keys if the example is still running without the sleep call. -Then redis-trib needs to know what is the target of the resharding. +Then redis-trib needs to know what is the target of the resharding, that is, +the node that will receive the hash slots. I'll use the first master node, that is, 127.0.0.1:7000, but I need to specify the Node ID of the instance. This was already printed in a list by redis-trib, but I can always find the ID of a node with the following From 0b52b85b004a6240582e31ffd640fd5a4097412a Mon Sep 17 00:00:00 2001 From: antirez Date: Wed, 27 Nov 2013 11:03:37 +0100 Subject: [PATCH 0098/2573] syncrhonous -> asynchronous. --- topics/cluster-tutorial.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/topics/cluster-tutorial.md b/topics/cluster-tutorial.md index 3151195b75..19043e83a5 100644 --- a/topics/cluster-tutorial.md +++ b/topics/cluster-tutorial.md @@ -581,7 +581,7 @@ Now we can look at the output of the consistency test to see what it reported. As you can see during the failover the system was not able to accept 578 reads and 577 writes, however no inconsistency was created in the database. This may sound unexpected as in the first part of this tutorial we stated that Redis -Cluster can lost writes during the failover because it uses synchronous +Cluster can lost writes during the failover because it uses asynchronous replication. What we did not said is that this is not very likely to happen because Redis sends the reply to the client, and the commands to replicate to the slaves, about at the same time, so there is a very small window to From 6f0a9b62b0536304a198b29e07d7275103b999ae Mon Sep 17 00:00:00 2001 From: Jan-Erik Rediger Date: Sat, 30 Nov 2013 14:50:14 +0100 Subject: [PATCH 0099/2573] Update notice about keyspace notifications --- topics/notifications.md | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-) diff --git a/topics/notifications.md b/topics/notifications.md index a10932fcbd..7cee600f62 100644 --- a/topics/notifications.md +++ b/topics/notifications.md @@ -1,9 +1,7 @@ Redis Keyspace Notifications === -**IMPORTANT** Keyspace notifications is a feature only available in development -versions of Redis. This documentation and the implementation of the feature are -likely to change in the next weeks. +**IMPORTANT** Keyspace notifications is a feature available since 2.8.0 Feature overview --- From e9bca400841626f9a8ff0620425e37ac5da7c775 Mon Sep 17 00:00:00 2001 From: Jonathan Lassoff Date: Sat, 30 Nov 2013 10:23:35 -0800 Subject: [PATCH 0100/2573] Fix small mis-spelling. --- topics/cluster-tutorial.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/topics/cluster-tutorial.md b/topics/cluster-tutorial.md index 19043e83a5..cd3f62cab7 100644 --- a/topics/cluster-tutorial.md +++ b/topics/cluster-tutorial.md @@ -46,7 +46,7 @@ slot of a given key, we simply take the CRC16 of the key modulo 16384. Every node in a Redis Cluster is responsible of a subset of the hash slots, -so for example you may have a cluster wit 3 nodes, where: +so for example you may have a cluster with 3 nodes, where: * Node A contains hash slots from 0 to 5500. 
* Node B contains hash slots from 5501 to 11000. From 15ba3deb536a4482a828e64aabe54fe7d84c3b52 Mon Sep 17 00:00:00 2001 From: antirez Date: Mon, 2 Dec 2013 16:16:46 +0100 Subject: [PATCH 0101/2573] Added authors field for Scredis in clients.json. --- clients.json | 1 + 1 file changed, 1 insertion(+) diff --git a/clients.json b/clients.json index 71e5dad987..a37f58a377 100644 --- a/clients.json +++ b/clients.json @@ -446,6 +446,7 @@ "language": "Scala", "repository": "https://github.com/Livestream/scredis", "description": "Scredis is an advanced Redis client entirely written in Scala. Used in production at http://Livestream.com.", + "authors": ["livestream"], "active": true }, From 719fc979c102ca5f7930e8839d1a1aea1686dcda Mon Sep 17 00:00:00 2001 From: antirez Date: Fri, 6 Dec 2013 23:43:10 +0100 Subject: [PATCH 0102/2573] Nydus added to tools section. --- tools.json | 8 ++++++++ 1 file changed, 8 insertions(+) diff --git a/tools.json b/tools.json index 038ef20f32..d09920c570 100644 --- a/tools.json +++ b/tools.json @@ -296,5 +296,13 @@ "repository": "https://github.com/uglide/RedisDesktopManager", "description": "Cross-platform desktop GUI management tool for Redis", "authors": ["u_glide"] + }, + { + "name": Nydus", + "language": "Python", + "url": "https://pypi.python.org/pypi/nydus", + "repository": "https://pypi.python.org/pypi/nydus", + "description": "Connection clustering and routing for Redis and Python.", + "authors": ["@zeeg"] } ] From a4d5623d3e60643340fe8c6d1d258a87413d3f11 Mon Sep 17 00:00:00 2001 From: antirez Date: Fri, 6 Dec 2013 23:44:22 +0100 Subject: [PATCH 0103/2573] json fix --- tools.json | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/tools.json b/tools.json index d09920c570..4518441b57 100644 --- a/tools.json +++ b/tools.json @@ -298,7 +298,7 @@ "authors": ["u_glide"] }, { - "name": Nydus", + "name": "Nydus", "language": "Python", "url": "https://pypi.python.org/pypi/nydus", "repository": "https://pypi.python.org/pypi/nydus", From 465aa903da13869ea52323c0f7d5a6e6077bcfd6 Mon Sep 17 00:00:00 2001 From: antirez Date: Fri, 6 Dec 2013 23:45:44 +0100 Subject: [PATCH 0104/2573] Twitter ref fixed in Nydus tool. --- tools.json | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/tools.json b/tools.json index 4518441b57..c85a0e6d4e 100644 --- a/tools.json +++ b/tools.json @@ -303,6 +303,6 @@ "url": "https://pypi.python.org/pypi/nydus", "repository": "https://pypi.python.org/pypi/nydus", "description": "Connection clustering and routing for Redis and Python.", - "authors": ["@zeeg"] + "authors": ["zeeg"] } ] From 83fb3acba896c25987a3561ee5cdaaaaf287ba61 Mon Sep 17 00:00:00 2001 From: antirez Date: Mon, 9 Dec 2013 11:30:42 +0100 Subject: [PATCH 0105/2573] UPDATE messages added to the Cluster spec. This is already part of the implementatio but was not covered in the spec. --- topics/cluster-spec.md | 25 +++++++++++++++++++++++++ 1 file changed, 25 insertions(+) diff --git a/topics/cluster-spec.md b/topics/cluster-spec.md index d3389a5c4e..c8e4cc2628 100644 --- a/topics/cluster-spec.md +++ b/topics/cluster-spec.md @@ -612,6 +612,31 @@ For this reason there is a second rule that is used in order to rebind an hash s Because of the second rule eventually all the nodes in the cluster will agree that the owner of a slot is the one with the greatest `configEpoch` among the nodes advertising it. 
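As a practical aside (a hedged sketch; the port and the node IDs are reused,
as an assumption, from the tutorial output earlier in this document), the
`configEpoch` a node advertises is the seventh field of its `CLUSTER NODES`
line, so the winning claim for a slot can be inspected by hand:

    $ redis-cli -p 7000 cluster nodes | awk '{print $1, $7}'
    3c3a0c74aae0b56170ccb03a76b60cfe7dc1912e 3
    97a3a64667477371c4479320d683e4c8db5858b1 0
    ...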
+
UPDATE messages
===

The described system for the propagation of hash slot configurations
only uses the normal ping and pong messages exchanged by nodes.

It also requires that there is a node that is either a slave or a master
for a given hash slot and has the updated configuration, because nodes
send their own configuration in ping and pong packet headers.

However sometimes a node may recover after a partition in a setup where
it is the only node serving a given hash slot, but with an old configuration.

Example: a given hash slot is served by nodes A and B. A is the master, and at some point fails, so B is promoted as master. Later B fails as well, and the cluster has no way to recover since there are no more replicas for this hash slot.

However A may recover some time later, and rejoin the cluster with an old configuration in which it was writable as a master. There is no replica that can update its configuration. This is the goal of UPDATE messages: when a node detects that another node is advertising its hash slots with an old configuration, it sends the node an UPDATE message with the ID of the new node serving the slots and the set of hash slots (sent as a bitmap) that it is serving.

NOTE: while currently configuration updates via ping / pong and UPDATE share the
same code path, there is a functional overlap between the two in the way they
update the configuration of a node with stale information. However the two
mechanisms are both useful because ping / pong messages after some time are
able to populate the hash slots routing table of a new node, while UPDATE
messages are only sent when an old configuration is detected, and only
cover the information needed to fix the wrong configuration.

 Publish/Subscribe
 ===

From 50a9f98a8822b4267786d147ff5adadb225179d0 Mon Sep 17 00:00:00 2001
From: Lennie 
Date: Sun, 15 Dec 2013 10:06:53 +0100
Subject: [PATCH 0106/2573] Update link to example 2.8 redis.conf in config set
 command documentation

Maybe it is time to update the documentation to point to the example
redis.conf of the stable (2.8) version.
---
 commands/config set.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/commands/config set.md b/commands/config set.md
index fb75449e82..f556b9a3d2 100644
--- a/commands/config set.md
+++ b/commands/config set.md
@@ -14,7 +14,7 @@ All the supported parameters have the same meaning of the equivalent
 configuration parameter used in the [redis.conf][hgcarr22rc] file, with the
 following important differences:
 
-[hgcarr22rc]: http://github.com/antirez/redis/raw/2.2/redis.conf
+[hgcarr22rc]: http://github.com/antirez/redis/raw/2.8/redis.conf
 
 * Where bytes or other quantities are specified, it is not possible to use
 the `redis.conf` abbreviated form (10k 2gb ...
and so forth), everything From 4a39bc329a6a518a1c9351a55dde112b2b41d037 Mon Sep 17 00:00:00 2001 From: Sasan Rose Date: Mon, 23 Dec 2013 23:09:11 +0330 Subject: [PATCH 0107/2573] Multi-server functionality added to PHPRedMin --- tools.json | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/tools.json b/tools.json index c85a0e6d4e..f8b5e88d00 100644 --- a/tools.json +++ b/tools.json @@ -279,7 +279,7 @@ "name": "PHPRedMin", "language": "PHP", "repository": "https://github.com/sasanrose/phpredmin", - "description": "Yet another web interface for Redis", + "description": "Yet another web interface for Redis with multi-server support", "authors": ["sasanrose"] }, { From 3e2df64b262e75c8d29cd4bd8df3f26dadf33e9a Mon Sep 17 00:00:00 2001 From: Michael Neumann Date: Sat, 4 Jan 2014 11:39:54 +0100 Subject: [PATCH 0108/2573] Add rust-redis client for Rust language --- clients.json | 9 +++++++++ 1 file changed, 9 insertions(+) diff --git a/clients.json b/clients.json index e18661b340..2c6e44cca2 100644 --- a/clients.json +++ b/clients.json @@ -686,5 +686,14 @@ "repository": "https://github.com/chrisdinn/brando", "description": "A Redis client written with the Akka IO package introduced in Akka 2.2.", "authors": ["chrisdinn"] + }, + + { + "name": "rust-redis", + "language": "Rust", + "repository": "https://github.com/mneumann/rust-redis", + "description": "A Rust client library for Redis.", + "authors": ["mneumann"], + "active": true } ] From 4953dcd08081e320eeaf9f8b71402c2b9f108c64 Mon Sep 17 00:00:00 2001 From: Carlos Nieto Date: Sun, 5 Jan 2014 13:02:19 -0600 Subject: [PATCH 0109/2573] Updating description and url. --- clients.json | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/clients.json b/clients.json index e18661b340..f2f9f29ad1 100644 --- a/clients.json +++ b/clients.json @@ -141,7 +141,8 @@ "name": "gosexy/redis", "language": "Go", "repository": "https://github.com/gosexy/redis", - "description": "Go bindings for the official C redis client (hiredis), supports the whole command set of redis 2.6.10 and subscriptions with go channels.", + "url": "https://menteslibres.net/gosexy/redis", + "description": "A Go client for redis built on top of the hiredis C client. Supports non-blocking connections and channel-based subscriptions.", "authors": ["xiam"], "active": true }, @@ -671,7 +672,7 @@ "description": "Thread-safe client supporting async usage and key/value codecs", "authors": ["ar3te"] }, - + { "name": "csredis", "language": "C#", From 54997964be3219b396b787ad11b07ecee00bb28d Mon Sep 17 00:00:00 2001 From: Austin McKinley Date: Tue, 7 Jan 2014 12:26:31 -0800 Subject: [PATCH 0110/2573] fixing doc typos --- topics/replication.md | 82 +++++++++++++++++++++---------------------- 1 file changed, 40 insertions(+), 42 deletions(-) diff --git a/topics/replication.md b/topics/replication.md index 3191fc7c5e..5b70a2e1d8 100644 --- a/topics/replication.md +++ b/topics/replication.md @@ -6,46 +6,47 @@ replication that allows slave Redis servers to be exact copies of master servers. The following are some very important facts about Redis replication: -* Redis uses asynchronous replication. Starting with Redis 2.8 there is however a periodic (one time every second) acknowledge of the replication stream processed by slaves. +* Redis uses asynchronous replication. Starting with Redis 2.8, however, slaves +will periodically acknowledge the replication stream. * A master can have multiple slaves. 
-* Slaves are able to accept other slaves connections. Aside from
+* Slaves are able to accept connections from other slaves. Aside from
 connecting a number of slaves to the same master, slaves can also be
 connected to other slaves in a graph-like structure.
 
-* Redis replication is non-blocking on the master side, this means that
-the master will continue to serve queries when one or more slaves perform
-the first synchronization.
+* Redis replication is non-blocking on the master side. This means that
+the master will continue to handle queries when one or more slaves perform
+the initial synchronization.
 
-* Replication is non blocking on the slave side: while the slave is performing
-the first synchronization it can reply to queries using the old version of
-the data set, assuming you configured Redis to do so in redis.conf.
-Otherwise you can configure Redis slaves to send clients an error if the
-link with the master is down. However there is a moment where the old dataset must be deleted and the new one must be loaded by the slave where it will block incoming connections.
+* Replication is also non-blocking on the slave side. While the slave is performing
+the initial synchronization, it can handle queries using the old version of
+the dataset, assuming you configured Redis to do so in redis.conf.
+Otherwise, you can configure Redis slaves to return an error to clients if the
+replication stream is down. However, after the initial sync, the old dataset
+must be deleted and the new one must be loaded. The slave will block incoming
+connections during this brief window.
 
-* Replications can be used both for scalability, in order to have
+* Replication can be used for scalability, in order to have
 multiple slaves for read-only queries (for example, heavy `SORT`
-operations can be offloaded to slaves, or simply for data redundancy.
+operations can be offloaded to slaves), or simply for data redundancy.
 
-* It is possible to use replication to avoid the saving process on the
-master side: just configure your master redis.conf to avoid saving
-(just comment all the "save" directives), then connect a slave
+* It is possible to use replication to avoid the cost of having the master
+write the full dataset to disk: just configure your master redis.conf to avoid
+saving (just comment all the "save" directives), then connect a slave
 configured to save from time to time.
 
 How Redis replication works
 ---
 
-If you set up a slave, upon connection it sends a SYNC command. And
-it doesn't matter if it's the first time it has connected or if it's
-a reconnection.
+If you set up a slave, upon connection it sends a SYNC command. It doesn't
+matter if it's the first time it has connected or if it's a reconnection.
 
-The master then starts background saving, and collects all new
+The master then starts background saving, and starts to buffer all new
 commands received that will modify the dataset. When the background
 saving is complete, the master transfers the database file to the slave,
 which saves it on disk, and then loads it into memory. The master will
-then send to the slave all accumulated commands, and all new commands
-received from clients that will modify the dataset. This is done as a
+then send to the slave all buffered commands. This is done as a
 stream of commands and is in the same format of the Redis protocol itself.
 
 You can try it yourself via telnet.
Connect to the Redis port while the @@ -59,7 +60,7 @@ concurrent slave synchronization requests, it performs a single background save in order to serve all of them. When a master and a slave reconnects after the link went down, a full resync -is always performed. However starting with Redis 2.8, a partial resynchronization +is always performed. However, starting with Redis 2.8, a partial resynchronization is also possible. Partial resynchronization @@ -69,20 +70,17 @@ Starting with Redis 2.8, master and slave are usually able to continue the replication process without requiring a full resynchronization after the replication link went down. -This works using an in-memory backlog of the replication stream in the -master side. Also the master and all the slaves agree on a *replication +This works by creating an in-memory backlog of the replication stream on the +master side. The master and all the slaves agree on a *replication offset* and a *master run id*, so when the link goes down, the slave will -reconnect and ask the master to continue the replication, assuming the +reconnect and ask the master to continue the replication. Assuming the master run id is still the same, and that the offset specified is available -in the replication backlog. - -If the conditions are met, the master just sends the part of the replication -stream the master missed, and the replication continues. -Otherwise a full resynchronization is performed as in the past versions of -Redis. +in the replication backlog, replication will resume from the point where it left off. +If either of these conditions are unmet, a full resynchronization is performed +(which is the normal pre-2.8 behavior). The new partial resynchronization feature uses the `PSYNC` command internally, -while the old implementation used the `SYNC` command, however a Redis 2.8 +while the old implementation uses the `SYNC` command. Note that a Redis 2.8 slave is able to detect if the server it is talking with does not support `PSYNC`, and will use `SYNC` instead. @@ -98,19 +96,19 @@ Of course you need to replace 192.168.1.1 6379 with your master IP address (or hostname) and port. Alternatively, you can call the `SLAVEOF` command and the master host will start a sync with the slave. -There are also a few parameters in order to tune the replication backlog taken +There are also a few parameters for tuning the replication backlog taken in memory by the master to perform the partial resynchronization. See the example `redis.conf` shipped with the Redis distribution for more information. -Read only slave +Read-only slave --- -Since Redis 2.6 slaves support a read-only mode that is enabled by default. +Since Redis 2.6, slaves support a read-only mode that is enabled by default. This behavior is controlled by the `slave-read-only` option in the redis.conf file, and can be enabled and disabled at runtime using `CONFIG SET`. -Read only slaves will reject all the write commands, so that it is not possible to write to a slave because of a mistake. This does not mean that the feature is conceived to expose a slave instance to the internet or more generally to a network where untrusted clients exist, because administrative commands like `DEBUG` or `CONFIG` are still enabled. However security of read-only instances can be improved disabling commands in redis.conf using the `rename-command` directive. +Read-only slaves will reject all write commands, so that it is not possible to write to a slave because of a mistake. 
This does not mean that the feature is intended to expose a slave instance to the internet or more generally to a network where untrusted clients exist, because administrative commands like `DEBUG` or `CONFIG` are still enabled. However, security of read-only instances can be improved by disabling commands in redis.conf using the `rename-command` directive. -You may wonder why it is possible to revert the default and have slave instances that can be target of write operations. The reason is that while this writes will be discarded if the slave and the master will resynchronize, or if the slave is restarted, often there is ephemeral data that is unimportant that can be stored into slaves. For instance clients may take information about reachability of master in the slave instance to coordinate a fail over strategy. +You may wonder why it is possible to revert the read-only setting and have slave instances that can be target of write operations. The reason is that these writes will be discarded if the slave and the master resynchronize, or if the slave is restarted. Often there is ephemeral data that is unimportant that can be stored on read-only slaves. For instance, clients may take information about master reachability to coordinate a failover strategy. Setting a slave to authenticate to a master --- @@ -129,12 +127,12 @@ To set it permanently, add this to your config file: Allow writes only with N attached replicas --- -Starting with Redis 2.8 it is possible to configure a Redis master in order to +Starting with Redis 2.8, it is possible to configure a Redis master to accept write queries only if at least N slaves are currently connected to the -master, in order to improve data safety. +master. -However because Redis uses asynchronous replication it is not possible to ensure -the write actually received a given write, so there is always a window for data +However, because Redis uses asynchronous replication it is not possible to ensure +the slave actually received a given write, so there is always a window for data loss. This is how the feature works: @@ -154,5 +152,5 @@ There are two configuration parameters for this feature: * min-slaves-to-write `` * min-slaves-max-lag `` -For more information please check the example `redis.conf` file shipped with the +For more information, please check the example `redis.conf` file shipped with the Redis source distribution. From 17924196017b266d98332d6477b0e3b70c521a4c Mon Sep 17 00:00:00 2001 From: xuyu Date: Wed, 8 Jan 2014 16:22:34 +0800 Subject: [PATCH 0111/2573] Update clients.json add a new redis client for golang --- clients.json | 9 +++++++++ 1 file changed, 9 insertions(+) diff --git a/clients.json b/clients.json index e18661b340..22bc868b68 100644 --- a/clients.json +++ b/clients.json @@ -145,6 +145,15 @@ "authors": ["xiam"], "active": true }, + + { + "name": "goredis", + "language": "Go", + "repository": "https://github.com/xuyu/goredis", + "description": "A redis client for golang with full features", + "authors": ["xuyu"], + "active": true + }, { "name": "hedis", From c4b896b57a7feb4ecd7c387bf99f7650f741ccb3 Mon Sep 17 00:00:00 2001 From: Jon Forrest Date: Thu, 9 Jan 2014 16:55:50 -0800 Subject: [PATCH 0112/2573] Initial changes. Nothing major. 
--- topics/twitter-clone.md | 78 ++++++++++++++++++++--------------------- 1 file changed, 39 insertions(+), 39 deletions(-) diff --git a/topics/twitter-clone.md b/topics/twitter-clone.md index e21e189f11..194b1084f0 100644 --- a/topics/twitter-clone.md +++ b/topics/twitter-clone.md @@ -1,36 +1,36 @@ A case study: Design and implementation of a simple Twitter clone using only the Redis key-value store as database and PHP === -In this article I'll explain the design and the implementation of a [simple clone of Twitter](http://retwis.antirez.com) written using PHP and Redis as only database. The programming community uses to look at key-value stores like special databases that can't be used as drop in replacement for a relational database for the development of web applications. This article will try to prove the contrary. +In this article I'll describe the design and the implementation of a [simple clone of Twitter](http://retwis.antirez.com) written using PHP with Redis as the only database. The programming community traditionally considered key-value stores as special databases that couldn't be used as drop in replacements for a relational database for the development of web applications. This article will try to correct this impression. -Our Twitter clone, [called Retwis](http://retwis.antirez.com), is structurally simple, has very good performance, and can be distributed among N web servers and M Redis servers with very little effort. You can find the source code [here](http://code.google.com/p/redis/downloads/list). +Our Twitter clone, called [Retwis](http://retwis.antirez.com), is structurally simple, has very good performance, and can be distributed among any number of web and Redis servers with very little effort. You can find the source code [here](http://code.google.com/p/redis/downloads/list). -We use PHP for the example since it can be read by everybody. The same (or... much better) results can be obtained using Ruby, Python, Erlang, and so on. +I use PHP for the example since it can be read by everybody. The same (or... much better) results can be obtained using Ruby, Python, Erlang, and so on. **Note:** [Retwis-RB](http://retwisrb.danlucraft.com/) is a port of Retwis to Ruby and Sinatra written by Daniel Lucraft! With full source code included of -course, the Git repository is linked in the footer of the web page. The rest -of this article targets PHP, but Ruby programmers can also check the other -source code, it conceptually very similar. +course, a link to its Git repository appears in the footer of this article. The rest +of this article targets PHP, but Ruby programmers can also check the Retwis-RB +source code since it's conceptually very similar. **Note:** [Retwis-J](http://retwisj.cloudfoundry.com/) is a port of Retwis to -Java, using the Spring Data Framework, written by [Costin Leau](http://twitter.com/costinl). The source code +Java, using the Spring Data Framework, written by [Costin Leau](http://twitter.com/costinl). Its source code can be found on -[GitHub](https://github.com/SpringSource/spring-data-keyvalue-examples) and +[GitHub](https://github.com/SpringSource/spring-data-keyvalue-examples), and there is comprehensive documentation available at [springsource.org](http://j.mp/eo6z6I). -Key-value stores basics +Key-value store basics --- -The essence of a key-value store is the ability to store some data, called _value_, inside a key. This data can later be retrieved only if we know the exact key used to store it. 
There is no way to search something by value. In a sense, it is like a very large hash/dictionary, but it is persistent, i.e. when your application ends, the data doesn't go away. So for example I can use the command SET to store the value *bar* at key *foo*: +The essence of a key-value store is the ability to store some data, called a _value_, inside a key. The value can be retrieved later only if we know the exact key it was stored in. There is no way to search for something by value. In a sense, it is like a very large hash/dictionary, but it is persistent, i.e. when your application ends, the data doesn't go away. So, for example, I can use the command SET to store the value *bar* in the key *foo*: SET foo bar -Redis will store our data permanently, so we can later ask for "_What is the value stored at key foo?_" and Redis will reply with *bar*: +Redis stores data permanently, so if I later ask "_What is the value stored in key foo?_" Redis will reply with *bar*: GET foo => bar -Other common operations provided by key-value stores are DEL used to delete a given key, and the associated value, SET-if-not-exists (called SETNX on Redis) that sets a key only if it does not already exist, and INCR that is able to atomically increment a number stored at a given key: +Other common operations provided by key-value stores are DEL, to delete a given key and its associated value, SET-if-not-exists (called SETNX on Redis), to assign a value to a key only if the key does not already exist, and INCR, to atomically increment a number stored in a given key: SET foo 10 INCR foo => 11 @@ -40,13 +40,13 @@ Other common operations provided by key-value stores are DEL used to delete a gi Atomic operations --- -So far it should be pretty simple, but there is something special about INCR. Think about this, why to provide such an operation if we can do it ourselves with a bit of code? After all it is as simple as: +There is something special about INCR. Think about why Redis provides such an operation if we can do it ourselves with a bit of code? After all, it is as simple as: x = GET foo x = x + 1 SET foo x -The problem is that doing the increment this way will work as long as there is only a client working with the value _x_ at a time. See what happens if two computers are accessing this data at the same time: +The problem is that incrementing this way will work as long as there is only one client working with the key _foo_ at one time. See what happens if two clients are accessing this key at the same time: x = GET foo (yields 10) y = GET foo (yields 10) @@ -55,34 +55,34 @@ The problem is that doing the increment this way will work as long as there is o SET foo x (foo is now 11) SET foo y (foo is now 11) -Something is wrong with that! We incremented the value two times, but instead to go from 10 to 12 our key holds 11. This is because the INCR operation done with `GET / increment / SET` *is not an atomic operation*. Instead the INCR provided by Redis, Memcached, ..., are atomic implementations, the server will take care to protect the get-increment-set for all the time needed to complete in order to prevent simultaneous accesses. +Something is wrong! We incremented the value two times, but instead of going from 10 to 12, our key holds 11. This is because the increment done with `GET / increment / SET` *is not an atomic operation*. 
Instead the INCR provided by Redis, Memcached, ..., are atomic implementations, and the server will take care of protecting the key for all the time needed to complete the increment in order to prevent simultaneous accesses. -What makes Redis different from other key-value stores is that it provides more operations similar to INCR that can be used together to model complex problems. This is why you can use Redis to write whole web applications without using an SQL database and without going crazy. +What makes Redis different from other key-value stores is that it provides other operations similar to INCR that can be used to model complex problems. This is why you can use Redis to write whole web applications without using an SQL database and without going crazy. Beyond key-value stores --- -In this section we will see what Redis features we need to build our Twitter clone. The first thing to know is that Redis values can be more than strings. Redis supports Lists and Sets as values, and there are atomic operations to operate against this more advanced values so we are safe even with multiple accesses against the same key. Let's start from Lists: +In this section we will see which Redis features we need to build our Twitter clone. The first thing to know is that Redis values can be more than strings. Redis supports Lists and Sets as values, and there are atomic operations to operate on them so we are safe even with multiple accesses of the same key. Let's start with Lists: LPUSH mylist a (now mylist holds one element list 'a') LPUSH mylist b (now mylist holds 'b,a') LPUSH mylist c (now mylist holds 'c,b,a') -LPUSH means _Left Push_, that is, add an element to the left (or to the head) of the list stored at _mylist_. If the key _mylist_ does not exist it is automatically created by Redis as an empty list before the PUSH operation. As you can imagine, there is also the RPUSH operation that adds the element on the right of the list (on the tail). +LPUSH means _Left Push_, that is, add an element to the left (or to the head) of the list stored in _mylist_. If the key _mylist_ does not exist it is automatically created by Redis as an empty list before the PUSH operation. As you can imagine, there is also an RPUSH operation that adds the element to the right of the list (on the tail). -This is very useful for our Twitter clone. Updates of users can be stored into a list stored at `username:updates` for instance. There are operations to get data or information from Lists of course. For instance LRANGE returns a range of the list, or the whole list. +This is very useful for our Twitter clone. User updates can be added to a list stored in `username:updates`, for instance. There are operations to get data from Lists, of course. For instance, LRANGE returns a range of the list, or the whole list. LRANGE mylist 0 1 => c,b -LRANGE uses zero-based indexes, that is the first element is 0, the second 1, and so on. The command arguments are `LRANGE key first-index last-index`. The _last index_ argument can be negative, with a special meaning: -1 is the last element of the list, -2 the penultimate, and so on. So in order to get the whole list we can use: +LRANGE uses zero-based indexes, that is the first element is 0, the second 1, and so on. The command arguments are `LRANGE key first-index last-index`. The _last-index_ argument can be negative, with a special meaning: -1 is the last element of the list, -2 the penultimate, and so on. 
So in order to get the whole list we can use: LRANGE mylist 0 -1 => c,b,a -Other important operations are LLEN that returns the length of the list, and LTRIM that is like LRANGE but instead of returning the specified range *trims* the list, so it is like _Get range from mylist, Set this range as new value_ but atomic. We will use only this List operations, but make sure to check the [Redis documentation](http://code.google.com/p/redis/wiki/README) to discover all the List operations supported by Redis. +Other important operations are LLEN that returns the length of the list, and LTRIM that is like LRANGE but instead of returning the specified range *trims* the list, so it is like _Get range from mylist, Set this range as new value_ but atomically. We will use only these List operations, but make sure to check the [Redis documentation](http://code.google.com/p/redis/wiki/README) to discover all the List operations supported by Redis. The set data type --- -There is more than Lists, Redis also supports Sets, that are unsorted collection of elements. It is possible to add, remove, and test for existence of members, and perform intersection between different Sets. Of course it is possible to ask for the list or the number of elements of a Set. Some example will make it more clear. Keep in mind that SADD is the _add to set_ operation, SREM is the _remove from set_ operation, _sismember_ is the _test if it is a member_ operation, and SINTER is _perform intersection_ operation. Other operations are SCARD that is used to get the cardinality (the number of elements) of a Set, and SMEMBERS that will return all the members of a Set. +There is more than Lists. Redis also supports Sets, which are unsorted collection of elements. It is possible to add, remove, and test for existence of members, and perform intersection between different Sets. Of course it is possible to ask for the list or the number of elements of a Set. Some example will make it more clear. Keep in mind that SADD is the _add to set_ operation, SREM is the _remove from set_ operation, _sismember_ is the _test if it is a member_ operation, and SINTER is _perform intersection_ operation. Other operations are SCARD that is used to get the cardinality (the number of elements) of a Set, and SMEMBERS that will return all the members of a Set. SADD myset a SADD myset b @@ -103,21 +103,21 @@ SINTER can return the intersection between Sets but it is not limited to two set SISMEMBER myset foo => 1 SISMEMBER myset notamember => 0 -Okay, I think we are ready to start coding! +Okay, we are ready to start coding! Prerequisites --- -If you didn't download it already please grab the [source code of Retwis](http://code.google.com/p/redis/downloads/list). It's a simple tar.gz file with a few of PHP files inside. The implementation is very simple. You will find the PHP library client inside (redis.php) that is used to talk with the Redis server from PHP. This library was written by [Ludovico Magnocavallo](http://qix.it) and you are free to reuse this in your own projects, but for updated version of the library please download the Redis distribution. (Note: there are now better PHP libraries available, check our [clients page](/clients). +If you haven't downloaded the [Retwis source code](http://code.google.com/p/redis/downloads/list) already please grab it now. It's a simple tar.gz file containing a few PHP files. The implementation is very simple. 
You will find the PHP library client inside (redis.php) that is used to talk with the Redis server from PHP. This library was written by [Ludovico Magnocavallo](http://qix.it) and you are free to reuse this in your own projects, but for an updated version of the library please download the Redis distribution. (Note: there are now better PHP libraries available, check our [clients page](/clients). -Another thing you probably want is a working Redis server. Just get the source, compile with make, and run with ./redis-server and you are done. No configuration is required at all in order to play with it or to run Retwis in your computer. +Another thing you probably want is a working Redis server. Just get the source, build with make, run with ./redis-server and you're done. No configuration is required at all in order to play with or run Retwis in your computer. Data layout --- -Working with a relational database this is the stage were the database layout should be produced in form of tables, indexes, and so on. We don't have tables, so what should be designed? We need to identify what keys are needed to represent our objects and what kind of values this keys need to hold. +When working with a relational database, this is when the database schema should be designed so that we'd know the tables, indexes, and so on that the database will contain. We don't have tables, so what should be designed? We need to identify what keys are needed to represent our objects and what kind of values this keys need to hold. -Let's start from Users. We need to represent this users of course, with the username, userid, password, followers and following users, and so on. The first question is, what should identify a user inside our system? The username can be a good idea since it is unique, but it is also too big, and we want to stay low on memory. So like if our DB was a relational one we can associate an unique ID to every user. Every other reference to this user will be done by id. That's very simple to do, because we have our atomic INCR operation! When we create a new user we can do something like this, assuming the user is called "antirez": +Let's start with Users. We need to represent the users, of course, with their username, userid, password, followers, following users, and so on. The first question is, how should we identify a user? The username can be a good idea since it is unique, but it is also too big, and we want to stay low on memory. So like if our DB was a relational one we can associate an unique ID to every user. Every other reference to this user will be done by id. That's very simple to do, because we have our atomic INCR operation! When we create a new user we can do something like this, assuming the user is called "antirez": INCR global:nextUserId => 1000 SET uid:1000:username antirez @@ -130,10 +130,10 @@ Besides the fields already defined, we need some more stuff in order to fully de This may appear strange at first, but remember that we are only able to access data by key! It's not possible to tell Redis to return the key that holds a specific value. This is also *our strength*, this new paradigm is forcing us to organize the data so that everything is accessible by _primary key_, speaking with relational DBs language. -Following, followers and updates +Following, followers, and updates --- -There is another central need in our system. Every user has followers users and following users. We have a perfect data structure for this work! That is... Sets. 
So let's add this two new fields to our schema: +There is another central need in our system. Every user has users that they follow and users who follow them. We have a perfect data structure for this! That is... Sets. So let's add these two new fields to our schema: uid:1000:followers => Set of uids of all the followers users uid:1000:following => Set of uids of all the following users @@ -215,9 +215,9 @@ The code is simpler than the description, possibly: return true; } -`loadUserInfo` as separated function is an overkill for our application, but it's a good template for a complex application. The only thing it's missing from all the authentication is the logout. What we do on logout? That's simple, we'll just change the random string in uid:1000:auth, remove the old auth:`` and add a new auth:``. +`loadUserInfo` as a separate function is overkill for our application, but it's a good approach in a complex application. The only thing that's missing from all the authentication is the logout. What do we do on logout? That's simple, we'll just change the random string in uid:1000:auth, remove the old auth:`` and add a new auth:``. -*Important:* the logout procedure explains why we don't just authenticate the user after the lookup of auth:``, but double check it against uid:1000:auth. The true authentication string is the latter, the auth:`` is just an authentication key that may even be volatile, or if there are bugs in the program or a script gets interrupted we may even end with multiple auth:`` keys pointing to the same user id. The logout code is the following (logout.php): +*Important:* the logout procedure explains why we don't just authenticate the user after looking up auth:``, but double check it against uid:1000:auth. The true authentication string is the latter, the auth:`` is just an authentication key that may even be volatile, or if there are bugs in the program or a script gets interrupted we may even end with multiple auth:`` keys pointing to the same user id. The logout code is the following (logout.php): include("retwis.php"); @@ -242,12 +242,12 @@ That is just what we described and should be simple to understand. Updates --- -Updates, also known as posts, are even simpler. In order to create a new post on the database we do something like this: +Updates, also known as posts, are even simpler. In order to create a new post in the database we do something like this: INCR global:nextPostId => 10343 SET post:10343 "$owner_id|$time|I'm having fun with Retwis" -As you can see the user id and time of the post are stored directly inside the string, we don't need to lookup by time or user id in the example application so it is better to compact everything inside the post string. +As you can see, the user id and time of the post are stored directly inside the string, so we don't need to lookup by time or user id in the example application so it is better to compact everything inside the post string. After we create a post we obtain the post id. We need to LPUSH this post id in every user that's following the author of the post, and of course in the list of posts of the author. This is the file update.php that shows how this is performed: @@ -277,14 +277,14 @@ After we create a post we obtain the post id. We need to LPUSH this post id in e header("Location: index.php"); -The core of the function is the `foreach`. We get using SMEMBERS all the followers of the current user, then the loop will LPUSH the post against the uid:``:posts of every follower. 
+The core of the function is the `foreach` loop. We get using SMEMBERS all the followers of the current user, then the loop will LPUSH the post against the uid:``:posts of every follower. -Note that we also maintain a timeline with all the posts. In order to do so what is needed is just to LPUSH the post against global:timeline. Let's face it, do you start thinking it was a bit strange to have to sort things added in chronological order using ORDER BY with SQL? I think so indeed. +Note that we also maintain a timeline for all the posts. This requires just LPUSHing the post against global:timeline. Let's face it, do you start thinking it was a bit strange to have to sort things added in chronological order using ORDER BY with SQL? I think so indeed. Paginating updates --- -Now it should be pretty clear how we can user LRANGE in order to get ranges of posts, and render this posts on the screen. The code is simple: +Now it should be pretty clear how we can use LRANGE in order to get ranges of posts, and render these posts on the screen. The code is simple: function showPost($id) { $r = redisLink(); @@ -333,7 +333,7 @@ You can find the code that sets or removes a following/follower relation at foll Making it horizontally scalable --- -Gentle reader, if you reached this point you are already an hero, thank you. Before to talk about scaling horizontally it is worth to check the performances on a single server. Retwis is *amazingly fast*, without any kind of cache. On a very slow and loaded server, apache benchmark with 100 parallel clients issuing 100000 requests measured the average pageview to take 5 milliseconds. This means you can serve millions of users every day with just a single Linux box, and this one was monkey asses slow! Go figure with more recent hardware. +Gentle reader, if you reached this point you are already a hero. Thank you. Before talking about scaling horizontally it is worth checking the performances on a single server. Retwis is *amazingly fast*, without any kind of cache. On a very slow and loaded server, an apache benchmark with 100 parallel clients issuing 100000 requests measured the average pageview to take 5 milliseconds. This means you can serve millions of users every day with just a single Linux box, and this one was monkey ass slow! Go figure with more recent hardware. So, first of all, probably you will not need more than one server for a lot of applications, even when you have a lot of users. But let's assume we *are* Twitter and need to handle a huge amount of traffic. What to do? @@ -344,7 +344,7 @@ The first thing to do is to hash the key and issue the request on different serv server_id = crc32(key) % number_of_servers -This has a lot of problems since if you add one server you need to move too much keys and so on, but this is the general idea even if you use a better hashing scheme like consistent hashing. +This has a lot of problems since if you add one server you need to move too many keys and so on, but this is the general idea even if you use a better hashing scheme like consistent hashing. Ok, are key accesses distributed among the key space? Well, all the user data will be partitioned among different servers. There are no inter-keys operations used (like SINTER, otherwise you need to care that things you want to intersect will end in the same server. *This is why Redis unlike memcached does not force a specific hashing scheme, it's application specific*). Btw there are keys that are accessed more frequently. 
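Before looking at those special keys, here is a minimal shell sketch of the
hashing scheme above (a hypothetical illustration: `cksum` is used only as a
stand-in CRC function, and four servers on ports 6379-6382 are assumed):

    # Route a read for a key to one of 4 Redis servers by CRC of the key.
    key="uid:1000:followers"
    server_id=$(( $(printf '%s' "$key" | cksum | cut -d' ' -f1) % 4 ))
    redis-cli -p $(( 6379 + server_id )) smembers "$key"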
Special keys
---

For example every time we post a new message, we *need* to increment the `global:nextPostId` key. How to fix this problem? A single server will get a lot of increments. The simplest way to handle this is to have a dedicated server just for increments. This is probably overkill, btw, unless you have really a lot of traffic. There is another trick. The ID does not really need to be an incremental number, it just *needs to be unique*. So you can get a random string long enough to be unlikely (almost impossible, if it's md5-size) to collide, and you are done. We successfully eliminated our main problem to make it really horizontally scalable!

There is another one: global:timeline. There is no fix for this: if you need to take something in order you can split among different servers and *then merge* when you need to get the data back, or take it ordered and use a single key. Again, if you really have so many posts per second, you can use a single server just for this. Remember that with commodity hardware Redis is able to handle 100000 writes per second. That's enough even for Twitter, I guess.

Please feel free to use the comments below for questions and feedback.
From 335c90deb8bdf2eeccfcbd24c044d437c4510cb4 Mon Sep 17 00:00:00 2001
From: Nikita Koksharov 
Date: Sat, 11 Jan 2014 07:15:23 -0800
Subject: [PATCH 0113/2573] Redisson entry added

---
 clients.json | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/clients.json b/clients.json
index e18661b340..108bb3cdd5 100644
--- a/clients.json
+++ b/clients.json
@@ -176,6 +176,16 @@
     "active": true
   },
 
+  {
+    "name": "Redisson",
+    "language": "Java",
+    "repository": "https://github.com/mrniko/redisson",
+    "description": "distributed and scalable Java data structures on top of Redis server",
+    "authors": ["mrniko"],
+    "recommended": true,
+    "active": true
+  },
+
   {
     "name": "JRedis",
     "language": "Java",
From 6c192be1997962d27189fe5db3e71239f89fd565 Mon Sep 17 00:00:00 2001
From: "Stuart P. Bentley" 
Date: Sat, 11 Jan 2014 17:02:05 -0800
Subject: [PATCH 0114/2573] Fix PTTL first version in ttl.md

PTTL is first available in 2.6, not 2.8.
---
 commands/ttl.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/commands/ttl.md b/commands/ttl.md
index 17055f4884..15821e1140 100644
--- a/commands/ttl.md
+++ b/commands/ttl.md
@@ -9,7 +9,7 @@ Starting with Redis 2.8 the return value in case of error changed:
 * The command returns `-2` if the key does not exist.
 * The command returns `-1` if the key exists but has no associated expire.
 
-See also the `PTTL` command that returns the same information with milliseconds resolution (Only available in Redis 2.8 or greater).
+See also the `PTTL` command that returns the same information with milliseconds resolution (Only available in Redis 2.6 or greater).
 
 @return
 
From cf613d1359f87e450c66f8c72a1a00a2ff0a2f99 Mon Sep 17 00:00:00 2001
From: "Stuart P.
Date: Sat, 11 Jan 2014 17:03:42 -0800
Subject: [PATCH 0115/2573] Document 2.8+ negative value behavior in pttl.md

I just got bitten by this behavior in my own code.
---
 commands/pttl.md | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/commands/pttl.md b/commands/pttl.md
index a3d66431a1..4e0807971b 100644
--- a/commands/pttl.md
+++ b/commands/pttl.md
@@ -2,10 +2,16 @@ Like `TTL` this command returns the remaining time to live of a key that has an
 expire set, with the sole difference that `TTL` returns the amount of remaining time in seconds while `PTTL` returns it in milliseconds.
 
+In Redis 2.6 or older the command returns `-1` if the key does not exist or if the key exists but has no associated expire.
+
+Starting with Redis 2.8 the return value in case of error changed:
+
+* The command returns `-2` if the key does not exist.
+* The command returns `-1` if the key exists but has no associated expire.
+
 @return
 
-@integer-reply: Time to live in milliseconds or `-1` when `key` does not exist
-or does not have a timeout.
+@integer-reply: TTL in milliseconds, or a negative value in order to signal an error (see the description above).
 
 @examples

From 584dd95882738ee2762040bb6dc2bd3bab20ad88 Mon Sep 17 00:00:00 2001
From: antirez
Date: Mon, 13 Jan 2014 16:35:55 +0100
Subject: [PATCH 0116/2573] SENTINEL runtime config API documented.

---
 topics/sentinel.md | 47 +++++++++++++++++++++++++++++-----------------
 1 file changed, 30 insertions(+), 17 deletions(-)

diff --git a/topics/sentinel.md b/topics/sentinel.md
index 0f0a3f6845..21fb99b2c4 100644
--- a/topics/sentinel.md
+++ b/topics/sentinel.md
@@ -1,8 +1,6 @@
 Redis Sentinel Documentation
 ===
 
-**Note:** this page documents the *new* Sentinel implementation that entered the Github repository 21th of November. The old Sentinel implementation is [documented here](http://redis.io/topics/sentinel-old), however using the old implementation is discouraged.
-
 Redis Sentinel is a system designed to help manage Redis instances. It performs the following three tasks:
@@ -25,19 +23,14 @@ executable.
 describes how to use what is already implemented, and may change as the Sentinel implementation evolves.
 
-Redis Sentinel is compatible with Redis 2.4.16 or greater, and Redis 2.6.0 or greater, however it works better if used against Redis instances version 2.8.0 or greater.
+Redis Sentinel is compatible with Redis 2.4.16 or greater, and with Redis 2.6.0 or greater; however, it works better if used with Redis instances version 2.8.0 or greater.
 
 Obtaining Sentinel
 ---
 
-Currently Sentinel is part of the Redis *unstable* branch at github.
-To compile it you need to clone the *unstable* branch and compile Redis.
-You'll see a `redis-sentinel` executable in your `src` directory.
-Alternatively you can use directly the `redis-server` executable itself,
-starting it in Sentinel mode as specified in the next paragraph.
-An updated version of Sentinel is also available as part of the Redis 2.8.0 release.
+Sentinel is currently developed in the *unstable* branch of the Redis source code at Github. However, an updated copy of Sentinel is provided with every patch release of Redis 2.8.
+
+The simplest way to use Sentinel is to download the latest version of Redis 2.8, or to compile the latest commit of Redis in the *unstable* branch at Github.
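To make the build path concrete: assuming a Unix system with git and make installed, compiling Sentinel from the *unstable* branch might look like the following sketch (the clone URL is the standard Redis repository; this is not official build documentation):

    # Clone the Redis source and switch to the unstable branch.
    git clone https://github.com/antirez/redis.git
    cd redis
    git checkout unstable
    make

    # The build places the Sentinel executable in the src directory.
    ls src/redis-sentinel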
 Running Sentinel
 ---
@@ -80,7 +73,7 @@ that is at address 127.0.0.1 and port 6379, with a level of agreement needed
 to detect this master as failing set to 2 Sentinels (if the agreement is not reached the automatic failover does not start).
 
-However note that whatever the agreement you specify to detect an instance as not working, a Sentinel requires **the vote from the majority** of the known Sentinels in the system in order to start a failover and reserve a given *configuration Epoch* (that is a version to attach to a new master configuration).
+However, note that whatever agreement you specify to detect an instance as not working, a Sentinel requires **the vote from the majority** of the known Sentinels in the system in order to start a failover and obtain a new *configuration Epoch* to assign to the new configuration after the failover.
 
 In other words **Sentinel is not able to perform the failover if only a minority of the Sentinel processes are working**.
 
@@ -112,6 +105,8 @@ The other options are described in the rest of this document and documented in
 the example sentinel.conf file shipped with the Redis distribution.
 
+All the configuration parameters can be modified at runtime using the `SENTINEL` command. See the **Reconfiguring Sentinel at Runtime** section for more information.
+
 SDOWN and ODOWN
 ---
@@ -204,12 +199,30 @@ Sentinel commands
 
 The following is a list of accepted commands (an example session follows the list):
 
-* **PING** this command simply returns PONG.
-* **SENTINEL masters** show a list of monitored masters and their state.
-* **SENTINEL slaves ``** show a list of slaves for this master, and their state.
-* **SENTINEL get-master-addr-by-name ``** return the ip and port number of the master with that name. If a failover is in progress or terminated successfully for this master it returns the address and port of the promoted slave.
-* **SENTINEL reset ``** this command will reset all the masters with matching name. The pattern argument is a glob-style pattern. The reset process clears any previous state in a master (including a failover in progress), and removes every slave and sentinel already discovered and associated with the master.
-* **SENTINEL failover ``** force a failover as if the master was not reachable, and without asking for agreement to other Sentinels (however a new version of the configuration will be published so that the other Sentinels will update their configurations).
+* **PING** This command simply returns PONG.
+* **SENTINEL masters** Show a list of monitored masters and their state.
+* **SENTINEL master ``** Show the state and info of the specified master.
+* **SENTINEL slaves ``** Show a list of slaves for this master, and their state.
+* **SENTINEL get-master-addr-by-name ``** Return the IP address and port number of the master with that name. If a failover is in progress or has terminated successfully for this master, it returns the address and port of the promoted slave.
+* **SENTINEL reset ``** This command will reset all the masters with a matching name. The pattern argument is a glob-style pattern. The reset process clears any previous state of a master (including a failover in progress), and removes every slave and sentinel already discovered and associated with the master.
+* **SENTINEL failover ``** Force a failover as if the master was not reachable, and without asking for agreement from other Sentinels (however a new version of the configuration will be published so that the other Sentinels will update their configurations).
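To give a concrete feel for the commands listed above, here is a hypothetical `redis-cli` session against a Sentinel instance; the master name `mymaster` and the 127.0.0.1:6379 address match the configuration example earlier in this document, and 26379 is the conventional Sentinel port:

    $ redis-cli -p 26379
    127.0.0.1:26379> PING
    PONG
    127.0.0.1:26379> SENTINEL get-master-addr-by-name mymaster
    1) "127.0.0.1"
    2) "6379"

If a failover has promoted a slave in the meantime, the same call returns the address and port of the promoted slave instead, as described in the list above.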
+
+Reconfiguring Sentinel at Runtime
+---
+
+Starting with Redis version 2.8.4, Sentinel provides an API in order to add, remove, or change the configuration of a given master. Note that if you have multiple Sentinels, you should apply the changes to all of your instances for Redis Sentinel to work properly. This means that changing the configuration of a single Sentinel does not automatically propagate the changes to the other Sentinels in the network.
+
+The following is a list of `SENTINEL` subcommands used in order to update the configuration of a Sentinel instance.
+
+* **SENTINEL MONITOR `` `` `` ``** This command tells the Sentinel to start monitoring a new master with the specified name, ip, port, and quorum. It is identical to the `sentinel monitor` configuration directive in the `sentinel.conf` configuration file, with the difference that you can't use a hostname as `ip`: you need to provide an IPv4 or IPv6 address.
+* **SENTINEL REMOVE ``** is used in order to remove the specified master: the master will no longer be monitored, and will be completely removed from the internal state of the Sentinel, so it will no longer be listed by `SENTINEL masters` and so forth.
+* **SENTINEL SET `` `