Difference Between UNICODE and ASCII
Overview
Encoding schemes convert the characters we use every day, such as letters, Greek symbols, and emojis, into numbers that a computer can store as binary.
ASCII and Unicode are two popular encoding schemes. ASCII encodes English letters, digits, and common symbols, whereas Unicode encodes letters, symbols, and text from almost all of the world's writing systems.
ASCII can be regarded as a subset of the Unicode encoding scheme. Below, we study the differences between Unicode and ASCII.
Scope
This article covers the following topics:
- Encoding schemes: we focus on two widely used standard encoding schemes, Unicode and ASCII.
- The difference between Unicode and ASCII.
- A table of the ASCII characters.
- What Unicode characters are and how they are encoded in memory.
- Other encoding schemes and encoding-related questions are out of scope.
The ASCII Characters
ASCII stands for American Standard Code for Information Interchange and is a character encoding standard used for electronic communication.
It uses integers to encode digits (0-9), uppercase letters (A-Z), lowercase letters (a-z), and symbols such as the semicolon (;) and the exclamation mark (!). Computers store integers directly as binary, so mapping each character to an integer makes text easy to store. For example, 97 represents "a" and 33 represents "!".
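As a quick illustration, Python's built-in `ord()` and `chr()` convert between a character and its integer code. Python actually works with Unicode code points, but for the 128 ASCII characters these are the same numbers as the ASCII codes, so this sketch doubles as an ASCII lookup.

```python
# ord() gives a character's integer code; chr() converts a code back.
code_a = ord("a")
code_bang = ord("!")
print(code_a, code_bang)   # 97 33
print(chr(97), chr(33))    # a !

# Encoding text with the ASCII codec shows the integers actually stored.
stored = list("Hi!".encode("ascii"))
print(stored)              # [72, 105, 33]
```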
Because the letters occupy consecutive codes, the ASCII value of one letter lets you work out the value of another. For example, the ASCII value of a is 97, so the ASCII value of z, 25 letters later, is 97 + 25 = 122.
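The estimate works because a to z occupy 26 consecutive codes. A minimal sketch, where `code_of_nth_letter` is a hypothetical helper written only for this example:

```python
ASCII_LOWER_A = 97  # ASCII code of "a"

def code_of_nth_letter(n: int) -> int:
    """Hypothetical helper: ASCII code of the n-th lowercase letter (1 = "a", 26 = "z").

    Relies on "a".."z" having consecutive ASCII codes 97..122.
    """
    if not 1 <= n <= 26:
        raise ValueError("n must be between 1 and 26")
    return ASCII_LOWER_A + (n - 1)

z_code = code_of_nth_letter(26)
print(z_code, chr(z_code))  # 122 z

# Uppercase letters are also consecutive (65..90), exactly 32 below lowercase.
print(ord("a") - ord("A"))  # 32
```

The same offset trick is why converting between upper and lower case in ASCII amounts to adding or subtracting 32.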
ASCII uses 7 bits to encode each character, and each character is usually stored in an 8-bit byte. Most of its characters come from the English language and are used in modern-day programming. ASCII is also used in graphic arts to draw clip art or images out of characters.
The major disadvantage of ASCII is that its 7 bits can represent only 128 different characters (8-bit "extended ASCII" variants raise this to 256). That is far too few for the many characters used around the world. Unicode addresses this with encoding forms such as UTF-8, UTF-16, and UTF-32, which can represent over a million code points. Therefore, the significant difference between ASCII and Unicode is the number of characters they can represent and the number of bits needed to encode them.
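To make the 128-character limit concrete: the highest ASCII code, 127, needs exactly 7 bits, and any character with a code of 128 or more falls outside ASCII. The sketch below uses a hypothetical `is_ascii_text` helper (Python 3.7+ also offers the built-in `str.isascii()`) and shows Python's ASCII codec rejecting a non-ASCII character.

```python
def is_ascii_text(text: str) -> bool:
    """Hypothetical helper: True if every character's code is below 128 (fits in 7 bits)."""
    return all(ord(ch) < 128 for ch in text)

print((127).bit_length())              # 7: the largest ASCII code fits in 7 bits
print(is_ascii_text("Hello, world!"))  # True
print(is_ascii_text("na\u00efve"))     # False: "ï" is U+00EF (239)

# The ASCII codec raises an error for characters it cannot represent.
try:
    "na\u00efve".encode("ascii")
except UnicodeEncodeError as err:
    print("cannot encode:", err.object[err.start])  # cannot encode: ï
```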
Decimal-Binary-ASCII conversion chart
Decimal | Binary | ASCII | Decimal | Binary | ASCII |
---|---|---|---|---|---|
0 | 00000000 | NUL | 64 | 01000000 | @ |
1 | 00000001 | SOH | 65 | 01000001 | A |
2 | 00000010 | STX | 66 | 01000010 | B |
3 | 00000011 | ETX | 67 | 01000011 | C |
4 | 00000100 | EOT | 68 | 01000100 | D |
5 | 00000101 | ENQ | 69 | 01000101 | E |
6 | 00000110 | ACK | 70 | 01000110 | F |
7 | 00000111 | BEL | 71 | 01000111 | G |
8 | 00001000 | BS | 72 | 01001000 | H |
9 | 00001001 | HT | 73 | 01001001 | I |
10 | 00001010 | LF | 74 | 01001010 | J |
11 | 00001011 | VT | 75 | 01001011 | K |
12 | 00001100 | FF | 76 | 01001100 | L |
13 | 00001101 | CR | 77 | 01001101 | M |
14 | 00001110 | SO | 78 | 01001110 | N |
15 | 00001111 | SI | 79 | 01001111 | O |
16 | 00010000 | DLE | 80 | 01010000 | P |
17 | 00010001 | DC1 | 81 | 01010001 | Q |
18 | 00010010 | DC2 | 82 | 01010010 | R |
19 | 00010011 | DC3 | 83 | 01010011 | S |
20 | 00010100 | DC4 | 84 | 01010100 | T |
21 | 00010101 | NAK | 85 | 01010101 | U |
22 | 00010110 | SYN | 86 | 01010110 | V |
23 | 00010111 | ETB | 87 | 01010111 | W |
24 | 00011000 | CAN | 88 | 01011000 | X |
25 | 00011001 | EM | 89 | 01011001 | Y |
26 | 00011010 | SUB | 90 | 01011010 | Z |
27 | 00011011 | ESC | 91 | 01011011 | [ |
28 | 00011100 | FS | 92 | 01011100 | \ |
29 | 00011101 | GS | 93 | 01011101 | ] |
30 | 00011110 | RS | 94 | 01011110 | ^ |
31 | 00011111 | US | 95 | 01011111 | _ |
32 | 00100000 | SP | 96 | 01100000 | ` |
33 | 00100001 | ! | 97 | 01100001 | a |
34 | 00100010 | " | 98 | 01100010 | b |
35 | 00100011 | # | 99 | 01100011 | c |
36 | 00100100 | $ | 100 | 01100100 | d |
37 | 00100101 | % | 101 | 01100101 | e |
38 | 00100110 | & | 102 | 01100110 | f |
39 | 00100111 | ' | 103 | 01100111 | g |
40 | 00101000 | ( | 104 | 01101000 | h |
41 | 00101001 | ) | 105 | 01101001 | i |
42 | 00101010 | * | 106 | 01101010 | j |
43 | 00101011 | + | 107 | 01101011 | k |
44 | 00101100 | , | 108 | 01101100 | l |
45 | 00101101 | - | 109 | 01101101 | m |
46 | 00101110 | . | 110 | 01101110 | n |
47 | 00101111 | / | 111 | 01101111 | o |
48 | 00110000 | 0 | 112 | 01110000 | p |
49 | 00110001 | 1 | 113 | 01110001 | q |
50 | 00110010 | 2 | 114 | 01110010 | r |
51 | 00110011 | 3 | 115 | 01110011 | s |
52 | 00110100 | 4 | 116 | 01110100 | t |
53 | 00110101 | 5 | 117 | 01110101 | u |
54 | 00110110 | 6 | 118 | 01110110 | v |
55 | 00110111 | 7 | 119 | 01110111 | w |
56 | 00111000 | 8 | 120 | 01111000 | x |
57 | 00111001 | 9 | 121 | 01111001 | y |
58 | 00111010 | : | 122 | 01111010 | z |
59 | 00111011 | ; | 123 | 01111011 | { |
60 | 00111100 | < | 124 | 01111100 | \| |
61 | 00111101 | = | 125 | 01111101 | } |
62 | 00111110 | > | 126 | 01111110 | ~ |
63 | 00111111 | ? | 127 | 01111111 | DEL |
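The rows of the chart can be regenerated programmatically, which is a handy way to double-check them. In this sketch the control-code abbreviations are typed in by hand, `ascii_name` is a hypothetical helper, and the binary column comes from Python's `08b` format specifier, which pads to 8 digits.

```python
CONTROL_NAMES = [
    "NUL", "SOH", "STX", "ETX", "EOT", "ENQ", "ACK", "BEL",
    "BS", "HT", "LF", "VT", "FF", "CR", "SO", "SI",
    "DLE", "DC1", "DC2", "DC3", "DC4", "NAK", "SYN", "ETB",
    "CAN", "EM", "SUB", "ESC", "FS", "GS", "RS", "US",
]

def ascii_name(code: int) -> str:
    """Hypothetical helper: chart label for an ASCII code (0-127)."""
    if code < 32:
        return CONTROL_NAMES[code]  # non-printing control characters
    if code == 32:
        return "SP"                 # space
    if code == 127:
        return "DEL"                # delete, also a control character
    return chr(code)                # printable character

# Print a few rows in the same "Decimal | Binary | ASCII" layout as the chart.
for code in (0, 65, 87, 88, 92, 96, 124, 127):
    print(f"{code} | {code:08b} | {ascii_name(code)}")
```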
The Unicode Characters
Unicode is a universal character encoding standard maintained by the Unicode Consortium, a non-profit organization that develops the standard so that software can handle text in any language. (The name is not an acronym; the closely aligned ISO/IEC 10646 standard is called the Universal Coded Character Set.) The IT industry has adopted Unicode as the standard way to encode and represent characters in computers and other electronic and communication devices.
Unicode covers a vast range of characters, including formulas, mathematical symbols, and text in scripts such as Devanagari, Latin, Greek, Cyrillic, and Armenian. It also supports scripts written from right to left, such as Hebrew and Arabic. This breadth makes Unicode one of the few encoding schemes that can represent most of the characters used around the world.
Unicode Transformation Formats (UTFs) are the encoding forms that turn Unicode code points into bits. They are classified by the size of the code unit they use: UTF-7, UTF-8, UTF-16, and UTF-32 use 7-bit, 8-bit, 16-bit, and 32-bit code units, respectively. All except UTF-32 are variable-length, so a single character may take more than one code unit (UTF-8, for example, uses 1 to 4 bytes per character), whereas UTF-32 always uses one 32-bit unit. Unicode is required for the internationalization and localization of software and is used in operating systems, XML, Java, and more.
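The difference between code units and characters is easy to see by encoding a few characters and counting the units. This is a sketch with a hypothetical `code_units` helper; UTF-7 is left out because it is rarely used today, and the little-endian (`-le`) codec names are used so that Python does not prepend a byte-order mark.

```python
# Size in bytes of one code unit for each encoding form.
CODE_UNIT_BYTES = {"utf-8": 1, "utf-16-le": 2, "utf-32-le": 4}

def code_units(ch: str, encoding: str) -> int:
    """Hypothetical helper: how many code units `encoding` uses for character `ch`."""
    return len(ch.encode(encoding)) // CODE_UNIT_BYTES[encoding]

# "A" (U+0041), "é" (U+00E9), "€" (U+20AC), and an emoji (U+1F600).
for ch in ["A", "\u00e9", "\u20ac", "\U0001F600"]:
    counts = {enc: code_units(ch, enc) for enc in CODE_UNIT_BYTES}
    print(f"U+{ord(ch):04X}", counts)
# U+0041 {'utf-8': 1, 'utf-16-le': 1, 'utf-32-le': 1}
# U+00E9 {'utf-8': 2, 'utf-16-le': 1, 'utf-32-le': 1}
# U+20AC {'utf-8': 3, 'utf-16-le': 1, 'utf-32-le': 1}
# U+1F600 {'utf-8': 4, 'utf-16-le': 2, 'utf-32-le': 1}
```

The emoji needs two UTF-16 code units (a surrogate pair) but only one UTF-32 unit, which is the variable-length versus fixed-length distinction in action.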
Relationship Between ASCII And Unicode
Unicode's first 128 code points, U+0000 to U+007F, are exactly the 128 ASCII characters with the same numeric values. UTF-8 encodes each of these as a single byte identical to its ASCII code, so any ASCII text is also valid UTF-8. Characters beyond ASCII, which 7 bits cannot hold, are encoded with multiple bytes in UTF-8 or with the wider code units of UTF-16 and UTF-32. Thus, ASCII is a subset of the Unicode encoding scheme.
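This relationship can be checked directly: for text made only of ASCII characters, the ASCII and UTF-8 encodings produce identical bytes, and each of the 128 single-byte values decodes under UTF-8 to the character with the same code. A short sketch:

```python
text = "Hello, ASCII!"

# Pure-ASCII text gives byte-for-byte identical ASCII and UTF-8 encodings.
ascii_bytes = text.encode("ascii")
utf8_bytes = text.encode("utf-8")
print(ascii_bytes == utf8_bytes)  # True

# Every ASCII code 0-127, read as a single UTF-8 byte, is the same character.
same_for_all = all(bytes([code]).decode("utf-8") == chr(code) for code in range(128))
print(same_for_all)               # True

# A non-ASCII character such as "é" (U+00E9) needs more than one byte in UTF-8.
print(len("\u00e9".encode("utf-8")))  # 2
```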
Difference Between ASCII And Unicode
Parameter | Unicode | ASCII |
---|---|---|
Name | Unicode is not an abbreviation; it is the name of the universal character encoding standard, whose ISO counterpart is the Universal Coded Character Set (UCS). | ASCII stands for American Standard Code for Information Interchange. |
Usage | Unicode is the IT industry standard for encoding and representing characters in computers. | ASCII is used for electronic communication, and its characters form the core syntax of many programming and markup languages, such as HTML. |
Characters represented | Unicode represents a large number of characters, formulas, mathematical symbols, and texts from different scripts such as Devanagari, Latin, Greek, Cyrillic, Armenian, etc. | ASCII represents English letters, digits, some mathematical symbols (+, -, / etc.), and punctuation marks such as the period and exclamation mark. |
Bits used for encoding | Unicode has several encoding forms, such as UTF-7, UTF-8, UTF-16, and UTF-32, with 7-, 8-, 16-, and 32-bit code units, respectively. | ASCII uses 7 bits per character, usually stored in an 8-bit byte. |
Memory occupied | UTF-8 uses 1 to 4 bytes per character (1 byte for ASCII characters), UTF-16 uses 2 or 4 bytes, and UTF-32 always uses 4 bytes, so Unicode text generally needs more memory, especially outside the ASCII range. | ASCII uses a single byte per character, so it occupies less space in memory. |
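As a rough check of the memory comparison, the sketch below counts the bytes needed to store the same string in each encoding. `encoded_sizes` is a hypothetical helper, and `None` marks a string the encoding cannot represent. For English-only text UTF-8 needs the same number of bytes as ASCII; the extra cost shows up with UTF-16, UTF-32, and non-ASCII characters.

```python
def encoded_sizes(text: str) -> dict:
    """Hypothetical helper: bytes needed to store `text` in each encoding (None if impossible)."""
    sizes = {}
    for enc in ("ascii", "utf-8", "utf-16-le", "utf-32-le"):
        try:
            sizes[enc] = len(text.encode(enc))
        except UnicodeEncodeError:
            sizes[enc] = None  # a character lies outside this encoding's repertoire
    return sizes

print(encoded_sizes("Hello, world"))
# {'ascii': 12, 'utf-8': 12, 'utf-16-le': 24, 'utf-32-le': 48}
print(encoded_sizes("caf\u00e9"))  # "café"
# {'ascii': None, 'utf-8': 5, 'utf-16-le': 8, 'utf-32-le': 16}
```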
Conclusion
- Encoding schemes such as ASCII and Unicode map characters (letters, digits, symbols, and, in Unicode's case, emojis and complex mathematical symbols) to bits so they can be stored in memory.
- ASCII encodes basic everyday characters such as English letters and digits with 7 bits each, and therefore consumes little space.
- Unicode has several encoding forms, which makes it a flexible scheme; it is the standard in operating systems, the .NET Framework, Java, etc.
- ASCII occupies less space, making it well suited to simple, English-only electronic communication.
- The difference between Unicode and ASCII comes down to the number of characters they can encode and the number of bits they use per character.
- ASCII is a subset of Unicode as Unicode represents many other characters along with characters represented by ASCII.