Online Learning Platform

Information Theory and Coding > Data Compression > What are the Basic properties of codes?

Block Code: A block code is one in which a particular message of the source is always encoded into the same “fixed sequence” of the code symbol. Here block is used as fixed sequence only.

Example: Source alphabet is S = {s1, s2, s3, s4}, Code alphabet is D = {0, 1} and The Code words are: C = {0, 11, 10, 11}

Non-singular Code: A code is said to be nonsingular if every element of the range of X maps into a different string in D* i.e if x ≠ y → C(x) ≠ C(y)

A block code is said to be non-singular if all the words of the code set are distinct. We cannot distinguish the code words. If the codes are not distinguishable on a simple inspection we say the code set is “singular in the small”.

The codes given in the example of Block code do not satisfy this property as the codes for s2 and s4 are not different.

If the source alphabet is S = {s1, s2, s3, s4}, D = {0, 1}; Codes, C = {0, 11, 10, 01} appear to be non-singular, upon transmission would pose problems in decoding. For, if the transmitted sequence is 0011, it might be interpreted as s1 s1 s4 or s2 s4. Thus there is an ambiguity about the code. No doubt, the code is non-singular in the small, but becomes “Singular in the large”.

Extension of a code: Extension of a code C is the mapping from finite length strings of X to finite-length strings of D, e.i.

C(x₁ x₂ ··· x_n) = C(x₁)C(x₂)···C(x_n)

where C(x₁)C(x₂)···C(x_n) indicates concatenation of the corresponding codewords.

The extension code 0110111100110 is decoded as 134213 using concatenation

Uniquely decodable code: A code is called uniquely decodable if its extension is nonsingular. In other words, any encoded string in a uniquely decodable code has only one possible source string producing it.

Let S = {s1, s2, s3, s4}, D = {0, 1}; Codes, C = {0, 11, 10, 01}. The Second extension of the code set given by

S² ={s1s1, s1s2, s1s3, s1s4, s2s1, s2s2, s2s3, s2s4, s3s1, s3s2, s3s3, s3s4, s4s1, s4s2, s4s3, s4s4}

The codes for the source sequences, s1s3 and s4s1 are not distinct and hence the code is “ Singular in the Large”. We need to have unique decidability of our codes that “The n-th extension of the code be non-singular for every finite n.”

Prefix or instantaneous Code: A code is called a prefix code or an instantaneous code if no codeword is a prefix of any other codeword.

Let X_k = x_k1x_k2….x_km, be a code word of some source symbol s_k. Then the sequences of code symbols, (x_k1x_k2….x_km), " j ≤ m, are called prefixes of the codeword. For example, the code word 0111 has four prefixes, viz; 0, 01, 011 and 0111.The complete code word is also regarded as a prefix.

An instantaneous code can be decoded without reference to future codewords since the end of a codeword is immediately recognizable. Hence, for an instantaneous code, the symbol x_i can be decoded as soon as we come to the end of the codeword corresponding to it. We need not wait to see the codewords that come later.

Example:

Code A: undoubtedly it is the simplest possible uniquely decipherable code. It is non- singular and all the code words have same length. The decoding can be done as soon as we receive two code symbols without any need to receive succeeding code symbols.

Code B: it is also uniquely decodable with a special feature that the 0`s indicate the termination of a code word. It is called the “comma code”. When scanning a sequence of code symbols, we may use the comma to determine the end of a code word and the beginning of the other.

Code C: It is a non- singular and uniquely decodable code but it cannot be decoded word by word as it is received. For example, if we receive ‘01’, we cannot decode it as ‘ s2’ until we receive the next code symbol. If the next code symbol is ‘0’, indeed the previous word corresponds to s2, while if it is a ‘ 1’ it may be the symbol s3; which can be concluded so if only if we receive a ‘0’ in the fourth place. Thus, there is a definite ‘time lag’ before a word can be decoded. Such a ‘time waste’ is not there if we use either Code A or Code B. So Code C is not a Prefix or instantaneous Code.

Optimal Codes

An instantaneous code is said to be optimal if it has “minimum average word length”, for a source with a given probability assignment for the source symbols. In such codes, source symbols with higher probabilities of occurrence are made to correspond to shorter code words.

Suppose that a source symbol s_i has a probability of occurrence P_i and has a code word of length l_i assigned to it, while a source symbol sj with probability Pj has a code word of length l_j . If P_i >P_j and then let l_i < l_j then average length will be

L₁ = P_i l_i + P_j l_j

Now, suppose we interchange the code words so that the code word of length l_j corresponds to s_i and that of length l_i corresponds to s_j . Then, the average length become

L₂ = P_i l_j + P_j l_i

L₂- L₁= P_i l_j + P_j l_i -( P_i l_i + P_j l_j) = (P_i – P_j ) (l_j – l_i )

Since P_i >P_j and l_i < l_j : L₂- L₁is greater than zero i.e. positive i.e. minimum average word length.

If a code that satisfies all the five properties is called an irreducible code.

Codes-sub grouping

What do you mean by "Codes" in data compression?

Feedback

ABOUT

Statlearner

Statlearner STUDY

Statlearner