压缩列表
《Redis设计与实现》
在Redis中,压缩列表(ziplist)是由一段有特定编码的内存块构成的列表,目的是为了节约内存。
压缩列表被用作列表键和哈希键的底层实现之一。
当一个列表键只包含少量列表项,并且每个列表项要么是小整数值,要么是长度比较短的字符串,Redis会使用压缩列表作为列表键的底层实现。
当一个哈希键只包含少量键值对,且每个键值对的键和值要么是小整数值,要么是长度比较短的字符串,Redis会使用压缩列表作为哈希键的底层实现。
特点
- 压缩列表是一种为节约内存而开发的顺序型数据结构。
- 压缩列表被用作列表键和哈希键的底层实现之一。
- 压缩列表可以包含多个节点,每个节点可以保存一个字节数组或者整数值。
- 添加新节点到压缩列表,或者从压缩列表中删除节点,可能会引发连锁更新操作,但这种操作出现的几率并不高。
结构
压缩列表的结构
属性 | 类型 | 长度 | |
---|---|---|---|
zlbytes | uint32_t | 4 bytes | 记录整个压缩列表的内存字节数。 |
zltail | uint32_t | 4 bytes | 记录压缩列表的起始地址距离压缩列表尾结点多少字节数。使得无须遍历即可访问列表尾。 |
zllen | uint16_t | 2 bytes | 记录压缩列表中节点的数量。当zllen 的值小于UINT_16_MAX(65536) 时,该属性的值就是列表节点的数量。当zllen 的值等于UINT_16_MAX(65536) 时,只有遍历才能统计出节点的数量。 |
entryX | 列表节点 | 不定 | 压缩裂变的节点,节点长度由内存决定。 |
zlend | uint8_t | 1 byte | 标记压缩列表结束的特殊值(0xFF)。 |
压缩列表节点的结构
压缩列表可以保存:
- 一个字节数组(可以为以下3种长度其中1种):
- 长度小于等于63(2^6 - 1)字节的字节数组;
- 长度小于等于16383(2^14 - 1)字节的字节数组;
- 长度小于等于4294967295(2^32 - 1)字节的字节数组;
- 一个整数值(可以为以下6种长度其中1种):
- 4位长,介于0~12之间的无符号整数;
- 1字节长的有符号整数;
- 3字节长的有符号整数;
- int16_t类型整数;
- int32_t类型整数;
- int64_t类型整数;
previous_entry_length
节点的previous_entry_length
属性是以字节为单位,记录前一个节点的长度。
previous_entry_length
属性长度可以是1字节或者5字节:
- 如果前一节点的长度小于254字节,那么
previous_entry_length
属性长度为1字节;前一字节的长度就保存在这一个字节中; - 如果前一节点的长度大于等于254字节,那么
previous_entry_length
属性的长度为5字节。其中属性的第1字节会被设置为0xFE(十进制254)
,而之后的4个字节则用于保存前一节点的长度。
encoding
encoding
属性记录本节点的content
所保存数据的类型以及长度:
- 字节数组编码:
编码 | 编码长度 | content属性保存的值 |
---|---|---|
00 bbbbbb | 1 byte | 保存长度小于等于63字节的字节数组(以00 开头) |
01 bbbbbb xxxxxxxx | 2 bytes | 保存长度小于等于16383字节的字节数组(以01 开头) |
10 bbbbbb xxxxxxxx aaaaaaaa cccccccc dddddddd | 5 bytes | 保存长度小于等于4294 967 295字节的字节数组(以10 开头) |
- 整数编码(整数编码以
11
开头,占 1 byte):
编码 | 编码长度 | content属性保存的值 |
---|---|---|
1100 0000 | 1 byte | int16_t 类型的整数 |
1101 0000 | 1 byte | int32_t 类型的整数 |
1110 0000 | 1 byte | int64_t 类型的整数 |
1111 0000 | 1 byte | 24位有符号整数 |
1111 1110 | 1 byte | 8位有符号整数 |
1111 xxxx | 1 byte | xxxx 将是介于0001 到1101 之间,表示 0 - 12 之间的值,所以使用这一编码无须content 属性 |
content
content
用于保存节点的数据,可以使一个字节数组或者整数,类型由encoding
决定。
连锁更新
当对压缩列表添加/删除节点时,为了让每个节点的previous_entry_length
属性都符合压缩列表对节点的要求,需要不断对压缩列表执行空间的重新分配操作,知道末尾节点位置。这种特殊情况下产生的连续多次空间扩展操作称为连续更新。
连锁更新 在最坏的情况下需要对压缩列表执行N次空间(N为节点数)重新分配操作,而每次空间重新分配的最坏复杂度为O(N),所以连续更新的最坏复杂度为O(N^2)。
但是连锁更新造成性能问题的几率很低:
- 压缩列表要恰好有多个连续的、长度介于250字节至253字节之间的节点,连锁更新才有可能被引发。
- 即使出现连锁更新,只要被更新的节点数量不多,就不会对性能造成任何影响(压缩列表也是用于少量节点的情况下)。
部分代码解析
-
unsigned char *ziplistNew(void)
创建一个压缩列表:/* Create a new empty ziplist. */ unsigned char *ziplistNew(void) { // ZIPLIST_HEADER_SIZE 是压缩列表固定的头部大小 = 4 bytes + 4 bytes + 2 bytes // 1 是 1 byte 的列表结束位 // 所以这个是空压缩列表所需要的内存大小 unsigned int bytes = ZIPLIST_HEADER_SIZE+1; unsigned char *zl = zmalloc(bytes); // 将压缩列表的 `bytes` 属性值 设为 11 ZIPLIST_BYTES(zl) = intrev32ifbe(bytes); // 空压缩列表的 `zltail` 的值 为 10 ZIPLIST_TAIL_OFFSET(zl) = intrev32ifbe(ZIPLIST_HEADER_SIZE); // 节点数量 `zllen` 为 0 ZIPLIST_LENGTH(zl) = 0; // 将 `zlend` 值设为 255 zl[bytes-1] = ZIP_END; return zl; }
-
unsigned char *ziplistInsert(unsigned char *zl, unsigned char *p, unsigned char *s, unsigned int slen)
将长度为slen
的字节数组或者整数s
插入到压缩列表zl
的给定节点p
之后:/* Insert an entry at "p". */ unsigned char *ziplistInsert(unsigned char *zl, unsigned char *p, unsigned char *s, unsigned int slen) { return __ziplistInsert(zl,p,s,slen); } /* Insert item at "p". */ // 将长度为`slen`的字节数组或者整数`s`插入到压缩列表`zl`的位置`p`上 unsigned char *__ziplistInsert(unsigned char *zl, unsigned char *p, unsigned char *s, unsigned int slen) { // 获取当前压缩列表zl的元素个数 curlen size_t curlen = intrev32ifbe(ZIPLIST_BYTES(zl)), reqlen; unsigned int prevlensize, prevlen = 0; size_t offset; int nextdiff = 0; unsigned char encoding = 0; long long value = 123456789; /* initialized to avoid warning. Using a value that is easy to see if for some reason we use it uninitialized. */ zlentry tail; // 整段代码主要为了找到待插入位置的前一个节点长度 prevlen /* Find out prevlen for the entry that is inserted. */ if (p[0] != ZIP_END) { // 如果节点p不是结束位置 // 获取节点p的 `previous_entry_length`属性 所占用字节数 prevlensize // 获取节点p的 `previous_entry_length`属性 的值 prevlen ZIP_DECODE_PREVLEN(p, prevlensize, prevlen); } else { // 如果节点p是结束位置 // 获取最后一个节点的位置ptail unsigned char *ptail = ZIPLIST_ENTRY_TAIL(zl); if (ptail[0] != ZIP_END) { // 如果ptail不是结束位置,则说明压缩列表中有其他节点,插入位置为末尾 // 获取节点ptail的 `previous_entry_length`属性 的值 prevlen prevlen = zipRawEntryLength(ptail); } } // 检查检点是否可以被编码,并选择合适的编码方式 /* See if the entry can be encoded */ if (zipTryEncoding(s,slen,&value,&encoding)) { /* 'encoding' is set to the appropriate integer encoding */ reqlen = zipIntSize(encoding); } else { /* 'encoding' is untouched, however zipStoreEntryEncoding will use the * string length to figure out how to encode it. */ // 不能编码,则以字符串形式保存 reqlen = slen; } /* We need space for both the length of the previous entry and * the length of the payload. */ // 还需要加上 前一个节点的长度 zipStorePrevEntryLength(NULL,prevlen) // 和当前节点的长度 zipStoreEntryEncoding(NULL,encoding,slen) reqlen += zipStorePrevEntryLength(NULL,prevlen); reqlen += zipStoreEntryEncoding(NULL,encoding,slen); /* When the insert position is not equal to the tail, we need to * make sure that the next entry can hold this entry's length in * its prevlen field. */ // 当插入位置不是末尾的时候,需要保证下一个节点的`previous_entry_length`有足够的长度来保存 int forcelarge = 0; nextdiff = (p[0] != ZIP_END) ? zipPrevLenByteDiff(p,reqlen) : 0; if (nextdiff == -4 && reqlen < 4) { nextdiff = 0; forcelarge = 1; } /* Store offset because a realloc may change the address of zl. */ // 保存偏移量,以防realloc()改变地址 offset = p-zl; zl = ziplistResize(zl,curlen+reqlen+nextdiff); p = zl+offset; /* Apply memory move when necessary and update tail offset. */ if (p[0] != ZIP_END) { // 当插入位置不为末尾的时候 // 将p之后的所有内容向后移动到p+reqlen /* Subtract one because of the ZIP_END bytes */ memmove(p+reqlen,p-nextdiff,curlen-offset-1+nextdiff); /* Encode this entry's raw length in the next entry. */ // 将节点的长度写入下一个节点的`previous_entry_length`中 if (forcelarge) // 需要扩展的保存写入 zipStorePrevEntryLengthLarge(p+reqlen,reqlen); else zipStorePrevEntryLength(p+reqlen,reqlen); /* Update offset for tail */ // 更新zltail的值 ZIPLIST_TAIL_OFFSET(zl) = intrev32ifbe(intrev32ifbe(ZIPLIST_TAIL_OFFSET(zl))+reqlen); /* When the tail contains more than one entry, we need to take * "nextdiff" in account as well. Otherwise, a change in the * size of prevlen doesn't have an effect on the *tail* offset. */ zipEntry(p+reqlen, &tail); if (p[reqlen+tail.headersize+tail.len] != ZIP_END) { ZIPLIST_TAIL_OFFSET(zl) = intrev32ifbe(intrev32ifbe(ZIPLIST_TAIL_OFFSET(zl))+nextdiff); } } else { // 插入到尾部 /* This element will be the new tail. */ ZIPLIST_TAIL_OFFSET(zl) = intrev32ifbe(p-zl); } /* When nextdiff != 0, the raw length of the next entry has changed, so * we need to cascade the update throughout the ziplist */ if (nextdiff != 0) { offset = p-zl; zl = __ziplistCascadeUpdate(zl,p+reqlen); p = zl+offset; } /* Write the entry */ // 将节点数据写入位置p p += zipStorePrevEntryLength(p,prevlen); p += zipStoreEntryEncoding(p,encoding,slen); if (ZIP_IS_STR(encoding)) { memcpy(p,s,slen); } else { zipSaveInteger(p,value,encoding); } ZIPLIST_INCR_LENGTH(zl,1); return zl; } /* Return the total number of bytes used by the entry pointed to by 'p'. */ unsigned int zipRawEntryLength(unsigned char *p) { unsigned int prevlensize, encoding, lensize, len; ZIP_DECODE_PREVLENSIZE(p, prevlensize); ZIP_DECODE_LENGTH(p + prevlensize, encoding, lensize, len); return prevlensize + lensize + len; } /* Check if string pointed to by 'entry' can be encoded as an integer. * Stores the integer value in 'v' and its encoding in 'encoding'. */ int zipTryEncoding(unsigned char *entry, unsigned int entrylen, long long *v, unsigned char *encoding) { long long value; if (entrylen >= 32 || entrylen == 0) return 0; if (string2ll((char*)entry,entrylen,&value)) { /* Great, the string can be encoded. Check what's the smallest * of our encoding types that can hold this value. */ if (value >= 0 && value <= 12) { *encoding = ZIP_INT_IMM_MIN+value; } else if (value >= INT8_MIN && value <= INT8_MAX) { *encoding = ZIP_INT_8B; } else if (value >= INT16_MIN && value <= INT16_MAX) { *encoding = ZIP_INT_16B; } else if (value >= INT24_MIN && value <= INT24_MAX) { *encoding = ZIP_INT_24B; } else if (value >= INT32_MIN && value <= INT32_MAX) { *encoding = ZIP_INT_32B; } else { *encoding = ZIP_INT_64B; } *v = value; return 1; } return 0; } /* Return bytes needed to store integer encoded by 'encoding'. */ unsigned int zipIntSize(unsigned char encoding) { switch(encoding) { case ZIP_INT_8B: return 1; case ZIP_INT_16B: return 2; case ZIP_INT_24B: return 3; case ZIP_INT_32B: return 4; case ZIP_INT_64B: return 8; } if (encoding >= ZIP_INT_IMM_MIN && encoding <= ZIP_INT_IMM_MAX) return 0; /* 4 bit immediate */ panic("Invalid integer encoding 0x%02X", encoding); return 0; } /* Encode the length of the previous entry and write it to "p". Return the * number of bytes needed to encode this length if "p" is NULL. */ unsigned int zipStorePrevEntryLength(unsigned char *p, unsigned int len) { if (p == NULL) { return (len < ZIP_BIG_PREVLEN) ? 1 : sizeof(len)+1; } else { if (len < ZIP_BIG_PREVLEN) { p[0] = len; return 1; } else { return zipStorePrevEntryLengthLarge(p,len); } } } /* Write the encoidng header of the entry in 'p'. If p is NULL it just returns * the amount of bytes required to encode such a length. Arguments: * * 'encoding' is the encoding we are using for the entry. It could be * ZIP_INT_* or ZIP_STR_* or between ZIP_INT_IMM_MIN and ZIP_INT_IMM_MAX * for single-byte small immediate integers. * * 'rawlen' is only used for ZIP_STR_* encodings and is the length of the * srting that this entry represents. * * The function returns the number of bytes used by the encoding/length * header stored in 'p'. */ unsigned int zipStoreEntryEncoding(unsigned char *p, unsigned char encoding, unsigned int rawlen) { unsigned char len = 1, buf[5]; if (ZIP_IS_STR(encoding)) { /* Although encoding is given it may not be set for strings, * so we determine it here using the raw length. */ if (rawlen <= 0x3f) { if (!p) return len; buf[0] = ZIP_STR_06B | rawlen; } else if (rawlen <= 0x3fff) { len += 1; if (!p) return len; buf[0] = ZIP_STR_14B | ((rawlen >> 8) & 0x3f); buf[1] = rawlen & 0xff; } else { len += 4; if (!p) return len; buf[0] = ZIP_STR_32B; buf[1] = (rawlen >> 24) & 0xff; buf[2] = (rawlen >> 16) & 0xff; buf[3] = (rawlen >> 8) & 0xff; buf[4] = rawlen & 0xff; } } else { /* Implies integer encoding, so length is always 1. */ if (!p) return len; buf[0] = encoding; } /* Store this length at p. */ memcpy(p,buf,len); return len; } /* Given a pointer 'p' to the prevlen info that prefixes an entry, this * function returns the difference in number of bytes needed to encode * the prevlen if the previous entry changes of size. * * So if A is the number of bytes used right now to encode the 'prevlen' * field. * * And B is the number of bytes that are needed in order to encode the * 'prevlen' if the previous element will be updated to one of size 'len'. * * Then the function returns B - A * * So the function returns a positive number if more space is needed, * a negative number if less space is needed, or zero if the same space * is needed. */ int zipPrevLenByteDiff(unsigned char *p, unsigned int len) { unsigned int prevlensize; ZIP_DECODE_PREVLENSIZE(p, prevlensize); return zipStorePrevEntryLength(NULL, len) - prevlensize; } /* Resize the ziplist. */ unsigned char *ziplistResize(unsigned char *zl, unsigned int len) { zl = zrealloc(zl,len); ZIPLIST_BYTES(zl) = intrev32ifbe(len); zl[len-1] = ZIP_END; return zl; } /* Encode the length of the previous entry and write it to "p". This only * uses the larger encoding (required in __ziplistCascadeUpdate). */ int zipStorePrevEntryLengthLarge(unsigned char *p, unsigned int len) { if (p != NULL) { p[0] = ZIP_BIG_PREVLEN; memcpy(p+1,&len,sizeof(len)); memrev32ifbe(p+1); } return 1+sizeof(len); } /* When an entry is inserted, we need to set the prevlen field of the next * entry to equal the length of the inserted entry. It can occur that this * length cannot be encoded in 1 byte and the next entry needs to be grow * a bit larger to hold the 5-byte encoded prevlen. This can be done for free, * because this only happens when an entry is already being inserted (which * causes a realloc and memmove). However, encoding the prevlen may require * that this entry is grown as well. This effect may cascade throughout * the ziplist when there are consecutive entries with a size close to * ZIP_BIG_PREVLEN, so we need to check that the prevlen can be encoded in * every consecutive entry. * * Note that this effect can also happen in reverse, where the bytes required * to encode the prevlen field can shrink. This effect is deliberately ignored, * because it can cause a "flapping" effect where a chain prevlen fields is * first grown and then shrunk again after consecutive inserts. Rather, the * field is allowed to stay larger than necessary, because a large prevlen * field implies the ziplist is holding large entries anyway. * * The pointer "p" points to the first entry that does NOT need to be * updated, i.e. consecutive fields MAY need an update. */ unsigned char *__ziplistCascadeUpdate(unsigned char *zl, unsigned char *p) { size_t curlen = intrev32ifbe(ZIPLIST_BYTES(zl)), rawlen, rawlensize; size_t offset, noffset, extra; unsigned char *np; zlentry cur, next; while (p[0] != ZIP_END) { zipEntry(p, &cur); rawlen = cur.headersize + cur.len; rawlensize = zipStorePrevEntryLength(NULL,rawlen); /* Abort if there is no next entry. */ if (p[rawlen] == ZIP_END) break; zipEntry(p+rawlen, &next); /* Abort when "prevlen" has not changed. */ if (next.prevrawlen == rawlen) break; if (next.prevrawlensize < rawlensize) { /* The "prevlen" field of "next" needs more bytes to hold * the raw length of "cur". */ offset = p-zl; extra = rawlensize-next.prevrawlensize; zl = ziplistResize(zl,curlen+extra); p = zl+offset; /* Current pointer and offset for next element. */ np = p+rawlen; noffset = np-zl; /* Update tail offset when next element is not the tail element. */ if ((zl+intrev32ifbe(ZIPLIST_TAIL_OFFSET(zl))) != np) { ZIPLIST_TAIL_OFFSET(zl) = intrev32ifbe(intrev32ifbe(ZIPLIST_TAIL_OFFSET(zl))+extra); } /* Move the tail to the back. */ memmove(np+rawlensize, np+next.prevrawlensize, curlen-noffset-next.prevrawlensize-1); zipStorePrevEntryLength(np,rawlen); /* Advance the cursor */ p += rawlen; curlen += extra; } else { if (next.prevrawlensize > rawlensize) { /* This would result in shrinking, which we want to avoid. * So, set "rawlen" in the available bytes. */ zipStorePrevEntryLengthLarge(p+rawlen,rawlen); } else { zipStorePrevEntryLength(p+rawlen,rawlen); } /* Stop here, as the raw length of "next" has not changed. */ break; } } return zl; } /* Store integer 'value' at 'p', encoded as 'encoding' */ void zipSaveInteger(unsigned char *p, int64_t value, unsigned char encoding) { int16_t i16; int32_t i32; int64_t i64; if (encoding == ZIP_INT_8B) { ((int8_t*)p)[0] = (int8_t)value; } else if (encoding == ZIP_INT_16B) { i16 = value; memcpy(p,&i16,sizeof(i16)); memrev16ifbe(p); } else if (encoding == ZIP_INT_24B) { i32 = value<<8; memrev32ifbe(&i32); memcpy(p,((uint8_t*)&i32)+1,sizeof(i32)-sizeof(uint8_t)); } else if (encoding == ZIP_INT_32B) { i32 = value; memcpy(p,&i32,sizeof(i32)); memrev32ifbe(p); } else if (encoding == ZIP_INT_64B) { i64 = value; memcpy(p,&i64,sizeof(i64)); memrev64ifbe(p); } else if (encoding >= ZIP_INT_IMM_MIN && encoding <= ZIP_INT_IMM_MAX) { /* Nothing to do, the value is stored in the encoding itself. */ } else { assert(NULL); } }
压缩列表API
参考之《Redis设计与实现》
函数 | 作用 |
---|---|
unsigned char *ziplistNew(void) | 创建一个新的压缩列表 |
unsigned char *ziplistMerge(unsigned char **first, unsigned char **second) | 将两个压缩列表拼接一起,first 的后面接着second |
unsigned char *ziplistPush(unsigned char *zl, unsigned char *s, unsigned int slen, int where) | 创建一个长度为slen 给定值为s 的新节点,并将这个新节点添加到压缩列表的表头或者表尾 |
unsigned char *ziplistIndex(unsigned char *zl, int index) | 返回压缩列表zl 给定索引index 上的节点 |
unsigned char *ziplistNext(unsigned char *zl, unsigned char *p) | 返回给定节点的下一个节点 |
unsigned char *ziplistPrev(unsigned char *zl, unsigned char *p) | 返回给定节点的前一个节点 |
unsigned int ziplistGet(unsigned char *p, unsigned char **sval, unsigned int *slen, long long *lval) | 获取给定节点所保存的值 |
unsigned char *ziplistInsert(unsigned char *zl, unsigned char *p, unsigned char *s, unsigned int slen) | 将长度为slen 的字节数组或者整数s 插入到压缩列表zl 的给定节点p 之后 |
unsigned char *ziplistDelete(unsigned char *zl, unsigned char **p) | 从压缩列表zl 中删除给定节点p |
unsigned char *ziplistDeleteRange(unsigned char *zl, int index, unsigned int num) | 删除研所列表zl 在给定索引index 上连续num 个结点 |
unsigned int ziplistCompare(unsigned char *p, unsigned char *s, unsigned int slen) | 比较节点p 和s 是否相同 |
unsigned char *ziplistFind(unsigned char *p, unsigned char *vstr, unsigned int vlen, unsigned int skip) | 在压缩裂变中查找并返回包含给定值的节点 |
unsigned int ziplistLen(unsigned char *zl) | 返回压缩列表中包含的节点个数 |
size_t ziplistBlobLen(unsigned char *zl) | 返回压缩列表占用的内存字节数 |