Three Ways to Compress Short Strings in C, Explained

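Contents
Preface
I. Short-string compression with general-purpose algorithms
II. Short-string compression
(1)Smaz
(2)Shoco
(3)Unishox2
III. Summary

Preface

The previous article explored the compression and decompression performance of LZ4, and compared LZ4 against ZSTD on both compression and decompression speed...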

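How did it do?

Compression runs at roughly 400,000 strings per second and decompression in the millions per second, which looks pretty good!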

(2)Shoco

#include <stdio.h>
#include <string.h>
#include <iostream>
#include "shoco.h"

using namespace std;

int main()
{
    int buf_len;
    int com_size;
    int decom_size;

    char com_buf[4096] = {0};
    char decom_buf[4096] = {0};

    char str_buf[1024] = "Narrator: It is raining today. So, Peppa and George cannot play outside.Peppa: Daddy, it's stopped raining.";

    buf_len = strlen(str_buf);
    com_size = shoco_compress(str_buf, buf_len, com_buf, 4096);

    cout << "text size:" << buf_len << endl;
    cout << "compress text size:" << com_size << endl;
    cout << "compress ratio:" << (float)buf_len / (float)com_size << endl << endl;

    decom_size = shoco_decompress(com_buf, com_size, decom_buf, 4096);
    cout << "decompress text size:" << decom_size << endl;

    if(strncmp(str_buf, decom_buf, buf_len)) {
        cout << "decompress text is not equal to source text" << endl;
    }

    return 0;
}

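The output is as follows:

After compression with shoco, the 107-byte source string shrinks to 86 bytes, a saving of 21 bytes. That is a lower compression ratio than smaz achieved.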
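One detail worth noting in the example above: `strncmp(str_buf, decom_buf, buf_len)` only compares the first `buf_len` bytes, so it would not notice if the decompressor produced a different number of bytes. A stricter round-trip check could look like the sketch below. `roundtrip_ok` is a hypothetical helper written for this article, not part of shoco.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not part of shoco): returns 1 when the decompressed
 * bytes match the source exactly, 0 otherwise. */
static int roundtrip_ok(const char *src, size_t src_len,
                        const char *out, size_t out_len)
{
    assert(src != NULL && out != NULL);
    if (src_len != out_len) {
        return 0;   /* output is shorter or longer than the source */
    }
    return memcmp(src, out, src_len) == 0;
}
```

In the example above it would be called as `roundtrip_ok(str_buf, buf_len, decom_buf, decom_size)` right after the decompression step.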

 

#include <stdio.h>
#include <string.h>
#include <iostream>
#include <stdlib.h>
#include <sys/time.h>
#include "shoco.h"

using namespace std;

int main()
{
    int cnt = 0;
    int buf_len;
    int com_size;
    int decom_size;

    timeval st, et;

    char *com_ptr = NULL;
    char* decom_ptr = NULL;

    char str_buf[1024] = "Narrator: It is raining today. So, Peppa and George cannot play outside.Peppa: Daddy, it's stopped raining.";

    buf_len = strlen(str_buf);
    gettimeofday(&st, NULL);
    while(1) {

        com_ptr = (char *)malloc(buf_len);
        com_size = shoco_compress(str_buf, buf_len, com_ptr, buf_len);

        free(com_ptr);
        cnt++;

        gettimeofday(&et, NULL);
        if(et.tv_sec - st.tv_sec >= 10) {
            break;
        }
    }

    cout << endl <<"compress per second:" << cnt/10 << " times" << endl;

    cnt = 0;
    com_ptr = (char *)malloc(buf_len);
    com_size = shoco_compress(str_buf, buf_len, com_ptr, buf_len);

    gettimeofday(&st, NULL);
    while(1) {

        // decompressed output is at most buf_len bytes, so buf_len + 1 is enough
        decom_ptr = (char *)malloc(buf_len + 1);
        decom_size = shoco_decompress(com_ptr, com_size, decom_ptr, buf_len + 1);

        // check decompress length
        if(buf_len != decom_size) {
            cout << "decom error" << endl;
        }

        free(decom_ptr);
        cnt++;

        gettimeofday(&et, NULL);
        if(et.tv_sec - st.tv_sec >= 10) {
            break;
        }
    }

    cout << "decompress per second:" << cnt/10 << " times" << endl << endl;

    free(com_ptr);
    return 0;
}

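So how does it perform?

Wow! Compression and decompression both reach an astonishing rate of millions of operations per second. As the algorithm's authors themselves put it: "For long strings, shoco does not want to compete with general-purpose compression algorithms; our strength is fast compression of short strings, even though the compression ratio is poor!" Put that way, it is hard to argue.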
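The benchmark above calls `malloc`, `free` and `gettimeofday` on every iteration, so part of the measured time belongs to the allocator and the clock rather than to the compressor. Below is a minimal sketch of an alternative harness that reuses one buffer and reads the clock only twice. It assumes a POSIX system with `clock_gettime`. The names `codec_fn`, `copy_codec` and `calls_per_second` are made up for illustration, and `copy_codec` is just a stand-in that copies bytes so the sketch builds without any compression library.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <time.h>

/* Shape of the routine being timed; returns the number of bytes written. */
typedef size_t (*codec_fn)(const char *in, size_t len, char *out, size_t cap);

/* Stand-in codec so the sketch builds without shoco; it only copies bytes. */
static size_t copy_codec(const char *in, size_t len, char *out, size_t cap)
{
    size_t n = len < cap ? len : cap;
    memcpy(out, in, n);
    return n;
}

/* Calls fn `iterations` times on a preallocated buffer and returns the
 * measured number of calls per second. */
static double calls_per_second(codec_fn fn, const char *in, size_t len,
                               char *out, size_t cap, long iterations)
{
    struct timespec st, et;
    volatile size_t sink = 0;   /* keeps the results from being discarded */

    assert(fn != NULL && iterations > 0);
    clock_gettime(CLOCK_MONOTONIC, &st);
    for (long i = 0; i < iterations; i++) {
        sink += fn(in, len, out, cap);
    }
    clock_gettime(CLOCK_MONOTONIC, &et);

    double secs = (double)(et.tv_sec - st.tv_sec)
                + (double)(et.tv_nsec - st.tv_nsec) / 1e9;
    (void)sink;
    return secs > 0.0 ? (double)iterations / secs : 0.0;
}
```

To time a real compressor, a thin wrapper with the `codec_fn` shape around the library call would be passed in place of `copy_codec`. The absolute numbers will differ from the ones reported here, but the ranking between libraries should be easier to trust.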

(3)Unishox2

Now let's take a look at Unishox2.

#include <stdio.h>
#include <string.h>
#include "unishox2.h"

int main()
{
    int buf_len;
    int com_size;
    int decom_size;

    char com_buf[4096] = {0};
    char decom_buf[4096] = {0};

    char str_buf[1024] = "Narrator: It is raining today. So, Peppa and George cannot play outside.Peppa: Daddy, it's stopped raining.";

    buf_len = strlen(str_buf);
    com_size = unishox2_compress_simple(str_buf, buf_len, com_buf);

    printf("text size:%d\n", buf_len);
    printf("compress text size:%d\n", com_size);
    printf("compress ratio:%f\n\n", (float)buf_len / (float)com_size);

    decom_size = unishox2_decompress_simple(com_buf, com_size, decom_buf);

    printf("decompress text size:%d\n", decom_size);

    if(strncmp(str_buf, decom_buf, buf_len)) {
        printf("decompress text is not equal to source text\n");
    }

    return 0;
}

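The result is as follows:

After compression with Unishox2, the 107-byte source string shrinks to 67 bytes, a saving of 40 bytes. That is roughly 60% of the original size. Not bad at all.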
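To put the two results side by side, the arithmetic can be sketched as below, using the sizes reported in this article (107 bytes before compression, 86 after shoco, 67 after Unishox2). The function names are made up for illustration. "Ratio" here means original size divided by compressed size, the same quantity the example programs print.

```c
#include <assert.h>
#include <stdio.h>

/* Original size divided by compressed size (higher is better). */
static double compression_ratio(int original, int compressed)
{
    assert(original > 0 && compressed > 0);
    return (double)original / (double)compressed;
}

/* Percentage of the original size that compression removed. */
static double space_saving_pct(int original, int compressed)
{
    assert(original > 0);
    return 100.0 * (double)(original - compressed) / (double)original;
}

/* Prints one summary line for a named compressor. */
static void report(const char *name, int original, int compressed)
{
    printf("%s: ratio %.2f, saving %.1f%%\n", name,
           compression_ratio(original, compressed),
           space_saving_pct(original, compressed));
}
```

Calling `report("shoco", 107, 86)` and `report("Unishox2", 107, 67)` prints a ratio of 1.24 with a 19.6% saving for shoco, and a ratio of 1.60 with a 37.4% saving for Unishox2.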

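Unishox2 compression and decompression performance

So far Unishox2 compresses best of the three. If its compression and decompression speed is also good, it would be close to perfect. Let's look at how fast Unishox2 compresses and decompresses.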

#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <sys/time.h>
#include "unishox2.h"

int main()
{
    int cnt = 0;
    int buf_len;
    int com_size;
    int decom_size;

    struct timeval st, et;

    char *com_ptr = NULL;
    char* decom_ptr = NULL;

    char str_buf[1024] = "Narrator: It is raining today. So, Peppa and George cannot play outside.Peppa: Daddy, it's stopped raining.";

    buf_len = strlen(str_buf);
    gettimeofday(&st, NULL);
    while(1) {

        com_ptr = (char *)malloc(buf_len);
        com_size = unishox2_compress_simple(str_buf, buf_len, com_ptr);

        free(com_ptr);
        cnt++;

        gettimeofday(&et, NULL);
        if(et.tv_sec - st.tv_sec >= 10) {
            break;
        }
    }

    printf("\ncompress per second:%d times\n", cnt/10);

    cnt = 0;
    com_ptr = (char *)malloc(buf_len);
    com_size = unishox2_compress_simple(str_buf, buf_len, com_ptr);

    gettimeofday(&st, NULL);
    while(1) {

        // decompressed output is at most buf_len bytes, so buf_len + 1 is enough
        decom_ptr = (char *)malloc(buf_len + 1);
        decom_size = unishox2_decompress_simple(com_ptr, com_size, decom_ptr);

        // check decompress length
        if(buf_len != decom_size) {
            printf("decom error\n");
        }

        free(decom_ptr);
        cnt++;

        gettimeofday(&et, NULL);
        if(et.tv_sec - st.tv_sec >= 10) {
            break;
        }
    }

    printf("decompress per second:%d times\n\n", cnt/10);

    free(com_ptr);
    return 0;
}

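The output is as follows:

No such luck. Unishox2 has the best compression ratio of the three algorithms, but it also has the slowest compression and decompression. This matches the analysis in the previous two installments: a higher compression ratio costs compression speed, and you cannot have both.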

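III. Summary

This article covered three short-string compression algorithms, smaz, shoco and Unishox2, and looked at the compression ratio and the compression and decompression speed of each. The results are shown in the table below.

Table 1

shoco has the lowest compression ratio but the highest compression and decompression speed; smaz sits in the middle; Unishox2 has the highest compression ratio but the lowest compression and decompression speed.

This conclusion matches the analysis of long-string compression in the previous two installments: a higher compression ratio costs compression speed, and you cannot have both.

In practice, the choice depends on your own requirements and environment. If moderate compression is enough, shoco is a good pick because it is fast; if you want to save more space, choose smaz or Unishox2.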
