Question 1

How many Unicode characters are there?

Accepted Answer

Unicode 16.0 (2024) defines 154,998 characters across 168 scripts. This covers every major living language, many historic scripts, thousands of symbols, all emoji, mathematical notation, and more. The Unicode code space spans 1,114,112 code points (U+0000 through U+10FFFF), so only a small fraction is assigned and the standard has plenty of room for future additions.
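The arithmetic behind those figures can be checked with a short sketch. It assumes the standard layout of 17 planes of 65,536 code points each, and it takes the 154,998 figure from the answer above rather than computing it; the variable names are illustrative only.

```python
# Code space: 17 planes of 65,536 code points each (U+0000 through U+10FFFF).
PLANES = 17
CODE_POINTS_PER_PLANE = 0x10000  # 65,536

total_code_points = PLANES * CODE_POINTS_PER_PLANE
print(total_code_points)  # 1114112

# Character count quoted in the answer for Unicode 16.0 (not computed here).
ASSIGNED_CHARACTERS_16_0 = 154_998

# Rough share of the code space that holds assigned characters.
share_assigned = ASSIGNED_CHARACTERS_16_0 / total_code_points
print(f"{share_assigned:.1%}")  # 13.9%
```

The share is approximate in spirit: some code points (surrogates and noncharacters) are reserved and will never hold characters.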

Question 2

What is a Unicode code point?

Accepted Answer

A Unicode code point is the unique number assigned to each character in the Unicode standard. It is written as U+ followed by four to six hexadecimal digits: for example, U+0041 for the letter A, U+00A9 for the copyright sign ©, and U+1F600 for the grinning face emoji 😀. Code points are the definitive identifiers for Unicode characters, so the same code point represents the same character on every device and platform.
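As a minimal sketch of the U+ notation, Python's built-in ord and chr convert between a character and its code point. The helper name code_point_label is made up for this example.

```python
def code_point_label(ch: str) -> str:
    """Return the U+XXXX label for a one-character string."""
    # ord() gives the code point as an integer; pad to at least four hex digits.
    return f"U+{ord(ch):04X}"

print(code_point_label("A"))           # U+0041
print(code_point_label("\u00A9"))      # U+00A9 (copyright sign)
print(code_point_label("\U0001F600"))  # U+1F600 (grinning face)

# chr() goes the other way, from code point to character.
print(chr(0x41))  # A
```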

Question 3

What is the difference between a Unicode character and a glyph?

Accepted Answer

A Unicode character is an abstract unit — a code point with a defined identity and properties. A glyph is the specific visual shape used to represent that character in a particular font. The character A (U+0041) is the same abstract unit regardless of whether it is displayed in Arial, Times New Roman, or any other font — but each font provides a different glyph (visual design) for that character.

Question 4

What is the Basic Multilingual Plane (BMP)?

Accepted Answer

The Basic Multilingual Plane (BMP) covers Unicode code points U+0000 through U+FFFF, the first 65,536 code points. It contains most of the characters in everyday use: the Latin letters in common use, digits, common symbols, the most frequently used CJK ideographs, Arabic, Cyrillic, and many other scripts. Code points above U+FFFF (called supplementary characters) include most emoji, rare historic scripts, less common CJK ideographs, and mathematical alphanumeric symbols.
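One way to see the split is to compute the plane number of a character, which is its code point divided by 65,536. The sketch below uses two hypothetical helpers, plane_of and is_bmp; the sample characters were picked only for illustration.

```python
def plane_of(ch: str) -> int:
    """Return the plane number (0 to 16) of a one-character string."""
    return ord(ch) >> 16  # each plane holds 0x10000 code points

def is_bmp(ch: str) -> bool:
    """True when the character lies in U+0000 through U+FFFF."""
    return ord(ch) <= 0xFFFF

samples = {
    "A": "Latin capital letter A (U+0041)",
    "\u042F": "Cyrillic capital letter Ya (U+042F)",
    "\u4E2D": "CJK ideograph (U+4E2D)",
    "\u2764": "heavy black heart (U+2764)",
    "\U0001F600": "grinning face (U+1F600)",
    "\U0001D49C": "mathematical script capital A (U+1D49C)",
}

for ch, description in samples.items():
    print(f"{description}: plane {plane_of(ch)}, BMP={is_bmp(ch)}")
```

The first four samples report plane 0 and the last two report plane 1. The heart (U+2764) shows that a few emoji-style symbols do sit inside the BMP.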

Question 5

Why do some Unicode characters cause issues in programming?

Accepted Answer

Code points above U+FFFF require 2 code units in UTF-16 encoding (called a surrogate pair) and 4 bytes in UTF-8. Languages that measure string length in code units (like JavaScript, which uses UTF-16) report a length of 2 for a single emoji such as 😀 (U+1F600). This can cause bugs when slicing strings, reversing text, or counting characters, because an operation may split a surrogate pair in half. Unicode-aware string methods handle these correctly by working with code points rather than code units.
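The counts above can be reproduced in Python, whose strings are indexed by code point, by encoding the text and measuring the result. This is a sketch rather than a model of any particular language runtime; the helper names utf16_code_units and utf8_bytes are invented for the example.

```python
def utf16_code_units(text: str) -> int:
    """Number of 16-bit code units needed to encode text as UTF-16."""
    return len(text.encode("utf-16-le")) // 2

def utf8_bytes(text: str) -> int:
    """Number of bytes needed to encode text as UTF-8."""
    return len(text.encode("utf-8"))

emoji = "\U0001F600"  # grinning face, above U+FFFF

print(len(emoji))               # 1 (code points)
print(utf16_code_units(emoji))  # 2 (a surrogate pair)
print(utf8_bytes(emoji))        # 4

# Slicing by code units can cut a surrogate pair in half.
text = "a" + emoji + "b"
first_two_units = text.encode("utf-16-le")[:4]  # 2 code units = 4 bytes
try:
    first_two_units.decode("utf-16-le")
    split_pair = False
except UnicodeDecodeError:
    split_pair = True  # the slice ended on a lone high surrogate
print(split_pair)  # True

# Slicing by code points keeps the emoji intact.
print(text[:2] == "a" + emoji)  # True
```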
