Information theory

                CODE EFFICIENCY

Code efficiency is a broad term used to describe the reliability, speed and programming methodology used in developing code for an application. Code efficiency is directly linked to algorithmic efficiency and the speed of runtime execution of software, and it is a key element in ensuring high performance. The goal of code efficiency is to reduce resource consumption and completion time as much as possible with minimal risk to the business or operating environment. The efficiency of the code used serves as one measure by which software product quality can be assessed and evaluated.

 

                  Shannon–Fano Algorithm 

A Shannon–Fano tree is built according to a specification designed to define an effective code table. The actual algorithm is simple (a code sketch follows the steps below):

  1. For a given list of symbols, develop a corresponding list of probabilities or frequency counts so that each symbol’s relative frequency of occurrence is known.
  2. Sort the lists of symbols according to frequency, with the most frequently occurring symbols at the left and the least common at the right.
  3. Divide the list into two parts, with the total frequency counts of the left part being as close to the total of the right as possible.
  4. The left part of the list is assigned the binary digit 0, and the right part is assigned the digit 1. This means that the codes for the symbols in the first part will all start with 0, and the codes in the second part will all start with 1.
  5. Recursively apply steps 3 and 4 to each of the two halves, subdividing groups and adding bits to the codes until each symbol has become a corresponding code leaf on the tree.
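
As a concrete illustration of steps 1–5, here is a minimal Python sketch; the function name shannon_fano and the example frequencies are illustrative assumptions, not part of the text above:

  # Minimal sketch of the Shannon-Fano procedure described above.
  def shannon_fano(symbols):
      """symbols: list of (symbol, frequency) pairs. Returns {symbol: code}."""
      # Step 2: sort by frequency, most frequent first.
      symbols = sorted(symbols, key=lambda sf: sf[1], reverse=True)
      codes = {sym: "" for sym, _ in symbols}

      def split(group):
          # Base case: a single symbol is a leaf; its code is complete.
          if len(group) <= 1:
              return
          # Step 3: find the split point where the left total is as close
          # as possible to the right total.
          total = sum(f for _, f in group)
          acc, best_i, best_diff = 0, 1, float("inf")
          for i, (_, f) in enumerate(group[:-1], start=1):
              acc += f
              diff = abs(acc - (total - acc))
              if diff < best_diff:
                  best_i, best_diff = i, diff
          left, right = group[:best_i], group[best_i:]
          # Step 4: append 0 to codes in the left part, 1 to the right part.
          for sym, _ in left:
              codes[sym] += "0"
          for sym, _ in right:
              codes[sym] += "1"
          # Step 5: recurse on each half.
          split(left)
          split(right)

      split(symbols)
      return codes

  # Example usage (frequencies chosen arbitrarily):
  print(shannon_fano([("A", 15), ("B", 7), ("C", 6), ("D", 6), ("E", 5)]))
  # -> {'A': '00', 'B': '01', 'C': '10', 'D': '110', 'E': '111'}

Each recursive split appends one more bit, so frequent symbols end up with short codes near the top of the tree.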

 

                  HUFFMAN CODING

Huffman coding is a lossless data compression algorithm. The idea is to assign variable-length codes to input characters; the lengths of the assigned codes are based on the frequencies of the corresponding characters. The most frequent character gets the smallest code and the least frequent character gets the largest code.

Huffman coding has two major parts:
1) Build a Huffman Tree from input characters.
2) Traverse the Huffman Tree and assign codes to characters.

Steps to build Huffman Tree
The input is an array of unique characters along with their frequencies of occurrence, and the output is a Huffman tree; a code sketch covering both parts follows the steps below.

1. Create a leaf node for each unique character and build a min heap of all leaf nodes. (A min heap is used as a priority queue; the value of the frequency field is used to compare two nodes. Initially, the least frequent character is at the root.)

2. Extract two nodes with the minimum frequency from the min heap.

3. Create a new internal node with frequency equal to the sum of the two nodes' frequencies. Make the first extracted node its left child and the other extracted node its right child. Add this node to the min heap.

4. Repeat steps 2 and 3 until the heap contains only one node. The remaining node is the root node, and the tree is complete.
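
The following is a minimal Python sketch of both parts, using the standard-library heapq module as the min heap; the function name huffman_codes, the tuple-based node representation, and the sample frequencies are illustrative assumptions:

  import heapq
  from itertools import count

  def huffman_codes(freqs):
      """freqs: dict {symbol: frequency}. Returns {symbol: code string}."""
      # Step 1: one leaf node (sym, left, right) per unique character,
      # pushed onto a min heap keyed by frequency.
      tiebreak = count()
      heap = [(f, next(tiebreak), (sym, None, None)) for sym, f in freqs.items()]
      heapq.heapify(heap)
      # Steps 2-4: repeatedly merge the two least frequent nodes.
      while len(heap) > 1:
          f1, _, left = heapq.heappop(heap)    # step 2: two minimum nodes
          f2, _, right = heapq.heappop(heap)
          # Step 3: internal node whose frequency is the sum of the two.
          heapq.heappush(heap, (f1 + f2, next(tiebreak), (None, left, right)))
      _, _, root = heap[0]

      # Part 2: traverse the tree, appending 0 for left and 1 for right.
      codes = {}
      def walk(node, prefix):
          sym, left, right = node
          if sym is not None:              # leaf: record the finished code
              codes[sym] = prefix or "0"   # single-symbol edge case
              return
          walk(left, prefix + "0")
          walk(right, prefix + "1")
      walk(root, "")
      return codes

  # Example usage (frequencies chosen arbitrarily):
  print(huffman_codes({"a": 5, "b": 9, "c": 12, "d": 13, "e": 16, "f": 45}))

The counter in each heap entry breaks frequency ties so heapq never has to compare the node tuples themselves; any consistent tie-breaking rule yields a valid (if different) set of Huffman codes.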

                       Entropy rate

In the mathematical theory of probability, the entropy rate or source information rate of a stochastic process is, informally, the time density of the average information in a stochastic process. For stochastic processes with a countable index, the entropy rate H(X) is the limit of the joint entropy of n members of the process X_k divided by n, as n tends to infinity:

     H(X) = \lim_{n \to \infty} \frac{1}{n} H(X_1, X_2, \dots, X_n)

when the limit exists. An alternative, related quantity is:

     H'(X) = \lim_{n \to \infty} H(X_n \mid X_{n-1}, X_{n-2}, \dots, X_1)

For strongly stationary stochastic processes, both limits exist and H(X) = H'(X).
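
As a brief worked case (a standard consequence of the definition, not stated in the text above): for an i.i.d. process the joint entropy splits into n identical marginal entropies, so the entropy rate reduces to the entropy of a single symbol:

     H(X) = \lim_{n \to \infty} \frac{1}{n} H(X_1, X_2, \dots, X_n)
          = \lim_{n \to \infty} \frac{1}{n} \sum_{k=1}^{n} H(X_k)
          = \lim_{n \to \infty} \frac{n H(X_1)}{n} = H(X_1)

For example, a sequence of independent fair coin flips has entropy rate H(X) = H(X_1) = 1 bit per symbol.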