DS – Huffman Coding

Huffman Coding

  • It is the concept of compressing the data without losing the details.
  • It was developed by David Huffman.
  • Representing the input string in encrypted format to reduce the size of the string.
  • For example, consider one string
  • The length of above string is 15
  • Each character occupies 1 byte memory(8 bits)
  • The total size of string is 15*8=120 bits.

Using this technique we can reduce the size of string:

  • First we need to find the frequencies(count) of each character in the string.
  • Huffman coding creates a tree from the frequencies table.
  • Generate code(binary) for each character using the tree.
  • Using character code, we can store information into memory.
  • This character code is working like encrypted code also.
  • To get the actual string, we have to decode.

Calculate the frequency of each character:

Sort the characters in increasing order of frequency. These are stored into priority Queue.

  • We need to construct a tree with these characters.
  • Make each unique character as leaf node.
  • Create an empty node. Assign minimum frequency to the left child of node and assign second minimum frequency to the right child of node. Set the value of the node as sum of the above two minimum frequencies(child frequencies)

Add the next minimum frequency to node as child.

Repeat the above technique until all frequencies completed.

We need to set binary values to each node in the tree.

If it is non-leaf node, assign value 0 to left edge and 1 to right edge.

The tree with binary values as follows

We need to generate the code for each character from the above tree binary values.

Now we can find the reduced size of given string from above Huffman data storage.

  • Without encoding the total size of the string was 120 bits.
  • After encoding the size is reduced to 32 + 15 + 28 = 75 bits.

Scroll to Top