Dynamic_Huffman_Vitter

所属分类:物联网
开发工具:C/C++
文件大小:2417KB
下载次数:3
上传日期:2020-11-15 13:01:12
上 传 者岚岚岚1
说明:  哈夫曼编码,c语言,压缩,Vitter, 数据压缩
(Huffman coding, C language, compression, vitte)

文件列表:
Dynamic_Huffman_Vitter (0, 2020-11-14)
Dynamic_Huffman_Vitter\Makefile (625, 2020-10-10)
Dynamic_Huffman_Vitter\compress.c (3634, 2020-10-10)
Dynamic_Huffman_Vitter\compress.h (582, 2020-10-10)
Dynamic_Huffman_Vitter\decompress.c (6366, 2020-10-10)
Dynamic_Huffman_Vitter\decompress.h (513, 2020-10-10)
Dynamic_Huffman_Vitter\dictionary.c (1251, 2020-10-10)
Dynamic_Huffman_Vitter\dictionary.h (862, 2020-10-10)
Dynamic_Huffman_Vitter\file.c (2927, 2020-10-10)
Dynamic_Huffman_Vitter\file.h (920, 2020-10-10)
Dynamic_Huffman_Vitter\list.c (3228, 2020-10-10)
Dynamic_Huffman_Vitter\list.h (1645, 2020-10-10)
Dynamic_Huffman_Vitter\main.c (1956, 2020-10-10)
Dynamic_Huffman_Vitter\other (0, 2020-11-14)
Dynamic_Huffman_Vitter\other\1.png (378778, 2020-10-10)
Dynamic_Huffman_Vitter\other\2.png (192043, 2020-10-10)
Dynamic_Huffman_Vitter\other\Vit87.jacmACMversion.pdf (1626108, 2020-10-10)
Dynamic_Huffman_Vitter\other\test1_step_by_step_output (90, 2020-10-10)
Dynamic_Huffman_Vitter\other\test2_step_by_step_output (90, 2020-10-10)
Dynamic_Huffman_Vitter\testfile (0, 2020-11-14)
Dynamic_Huffman_Vitter\testfile\harry_potter_1.txt (492161, 2020-10-10)
Dynamic_Huffman_Vitter\testfile\harry_potter_4.txt (1138205, 2020-10-10)
Dynamic_Huffman_Vitter\testfile\test1 (7, 2020-10-10)
Dynamic_Huffman_Vitter\testfile\test2 (8, 2020-10-10)
Dynamic_Huffman_Vitter\tree.c (4646, 2020-10-10)
Dynamic_Huffman_Vitter\tree.h (2830, 2020-10-10)
Dynamic_Huffman_Vitter\update.c (12934, 2020-10-10)
Dynamic_Huffman_Vitter\update.h (1207, 2020-10-10)

# Dynamic Huffman Vitter Algorithm Implementation ## Introduction Dynamic Huffman encoding, compared to the static Huffman encoding, is a one-pass algorithm which reads the input file once only. The static Huffman reads twice, first time to generate the occurrence table and construct the binary tree, the second time to encode the file. In contrast, dynamic huffman encoding updates the tree at the same time as reading the file. The algorithm is thus a bit more complex than the statuc Huffman. Vitter invented the algorithm that is based on FGK algorithm. The main idea is that > For each weight w, all leaves of weight w precede (in the implicit numbering) all internal nodes of weight w. [Vitter 1***7] And he also provides a detailed implementation that uses array structure with a right pointer [Vitter 1***9] which is a bit more complex and time efficient than his proposed functions in his earlier paper. ## Details of the Update Function The key to maintain the dynamic tree similar to the Huffman tree is in the update procedure. FGK algorithm produces a Huffman tree that may be skewed, and Vitter improves the procedure by generating a more balanced Huffman tree. ``` Update_TreeVITTER( accept symbol ) { leafToIncrement = NULL; p = pointer to symbol’s node; if ( p == NULL ){/* a new symbol! */ Create two leaf children of the 0-node,such that the right child is the new symbol node and the left child is the new 0-node; p = parent of the symbol node; leafToIncrement = the right child of p; } else { Swap p in the tree with the leader of its block; if ( p is the sibling of the 0-node ) { leafToIncrement = p; p = parent of p; } } increase p weight by 1; while ( p != root of the tree ) { /* if possible, advance p to the next block. */ SlideAndIncrement( p ); } if ( leafToIncrement != NULL ) { SlideAndIncrement(leafToIncrement); } } ``` We can see that, `SlideAndIncrement` is the major feature of Vitter's method. FGK only "swaps" the symbol node with the highest node of the block, while Vitter slides the whole block into targeted position. The following picture from Vitter's paper explains the procedure. ![slide&increment vitter paper 1***7](other/1.png) To update a leaf node, slides all leaf nodes with same weight before it one position backwards, and make this leaf node the leader, and increase its weight. Then procede with its new parent node. To update an internal node, slides all leaf nodes with weight 1 higher before its position one position backwards, and move this internal node into the gap, increase its weight. Thenprocede with its original parent node. The pseudo code for `SlideAndIncrement` is at below. ``` SlideAndIncrement ( accept node p ){ fp = parent of p; wt = p’s weight; if ( p is an internal node ) Slide p in the tree higher than the leaf nodes of weight wt+1; else Slide p in the tree higher than the nodes of weight wt; p’s weight = p’s weight + 1; if ( p is an internal node ) p = fp; else p = new parent of p; } ``` ## Implementation for the Update Function So the issue is, how to construct a data structure that not only can have the properties of a binary tree, but also can depict the increasing order of occurrence in the way like an array. My solution is to use the combination of a tree and a list. The tree structure is almost the same as all binary trees ``` struct _TreeNode{ int c; int occ; struct _TreeNode *left; struct _TreeNode *right; struct _TreeNode *parent; }; typedef struct _TreeNode *TreeNode; struct _Tree{ TreeNode root; TreeNode NYT; }; typedef struct _Tree *Tree; ``` And the list has each node that encapsulates a treenode. So that each listnode relates to a particular treenode, and this relation never break. A doubly linked list is used to make implementation easier. ``` struct _ListNode{ TreeNode trn; struct _ListNode *prev; struct _ListNode *next; }; typedef struct _ListNode *ListNode; typedef struct _ListNode *List; ``` So during the `Update` procedure, especially the `SlideAndIncrement` step, the update not only happens on the tree level, but also on the list level. For each sliding, the block of treenodes first reconnects to its new parent, then the block of corresponding listnodes move and reconnect to its new position. For details please read `Update.h` and `Update.c` files. ## Pesudo EOF Design For compression, since the output for each symbol is not exactly 8 bits, there is always a problem for the EOF sign. Usually the last char has to be padded with several bits of 0 before printing. In this implementation, I uses a simple way. The first byte of the file records the number of bits of 0 padded at the last char of the file. So during compression, the actual output starts at the second byte. And when the compression finish, the padding function returns the number of bits padded, then the compression function goes back to the first byte of the file and reprint that char. And for decompression, the function reads the number of bits pad from the first char at first. Then it starts decompression from the second char. And when the file reads to the final char, it stops before the number of padded bits. ## File Structure ``` main.c --- compress.c decompress.c --- update.c --- dictionary.c file.c --- list.c --- tree.c ``` ## Usage Codes for download the repository: ``` git@github.com:luke-mao/Data-Compression.git cd Data-Compression/Dynamic_Huffman_Vitter make ``` ``` Usage: ./vitter <-c|-d> // -c for compression, -d for decompression ``` The compressed file will have .v suffix. The decompressed file will remove .v suffix, and add deVitter_ prefix. Sample usage shows in the picture. ![sample usage](other/2.png) ## Reference >Vitter, J., 1***7. Design and analysis of dynamic Huffman codes. Journal of the ACM, 34(4), pp.825-845. >Sites.google.com. 2020. Algorithm Vitter - The Data Compression Guide. [online] Available at: [Accessed 8 October 2020]. >UNSW 2020T2 COMP9319 Web Data Compression and Search by Raymond Wong

近期下载者

相关文件


收藏者