Blog - What is Punycode, Why Should I Care?
What Is Punycode?
The original standards for domain names specified that names should only use a subset of ASCII characters called LDH, or Letter, Digit, Hyphen. This meant that domain names were limited to the characters a-z, 0-9 and hyphens. For example:
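As a rough illustration of the LDH rule, the Python sketch below checks whether a domain only uses letters, digits and hyphens. The function name `is_ldh_domain` is made up for this post. The 63-character label limit and the "no leading or trailing hyphen" rule come from the general DNS hostname rules rather than from anything stated above, so treat them as assumptions.

```python
import re

# One LDH label: letters, digits and hyphens only.
# Assumption: 1-63 characters, and no hyphen at the start or end.
LDH_LABEL = re.compile(r"(?!-)[A-Za-z0-9-]{1,63}(?<!-)")


def is_ldh_domain(domain: str) -> bool:
    """Return True if every dot-separated label is a valid LDH label."""
    labels = domain.rstrip(".").split(".")
    return all(LDH_LABEL.fullmatch(label) is not None for label in labels)


print(is_ldh_domain("example.com"))   # True
print(is_ldh_domain("-example.com"))  # False
```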
When the domain name system was opened up (see below) to allow the use of Unicode characters in domain names, the ASCII LDH rule still had to apply, i.e. although the domain itself may display to the user as Unicode characters, the storage and resolving of those domains would still be in ASCII.
Example of a Unicode domain: alliancefrançaise.nu
This meant that an algorithm to convert between Unicode and ASCII LDH was needed and this is where Punycode comes in. Punycode is an algorithm that allows the conversion of Unicode domain names to ASCII LDH.
Punycode allows Unicode characters to be represented using only ASCII, producing what is known as an ASCII Compatible Encoding (ACE). This allows Unicode domain names to be stored and transmitted across systems that only understand ASCII.
After being converted, our Unicode example becomes: xn--alliancefranaise-npb.nu. The xn-- prefix marks the label as Punycode-encoded.
If you click on either URL, they both will link to the same location.
Punycode provides the method of converting between these two formats.
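If you want to try the conversion yourself, here is a minimal Python sketch using the `idna` codec that ships with the standard library. The helper names `to_ascii` and `to_unicode` are my own. Note that this codec follows the older IDNA 2003 rules, so it may not match what a modern browser does for every possible domain; it is fine for the example above.

```python
def to_ascii(domain: str) -> str:
    """Convert a Unicode domain name to its ASCII (Punycode) form."""
    return domain.encode("idna").decode("ascii")


def to_unicode(domain: str) -> str:
    """Convert an ASCII (Punycode) domain name back to Unicode."""
    return domain.encode("ascii").decode("idna")


print(to_ascii("alliancefrançaise.nu"))           # xn--alliancefranaise-npb.nu
print(to_unicode("xn--alliancefranaise-npb.nu"))  # alliancefrançaise.nu
```

Plain ASCII domains pass through both functions unchanged, so it is safe to run every domain through them.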
Internationalized Domain Names
Internationalized Domain Names (IDNs) are domain names that contain characters from a language-specific script or alphabet, for example Arabic, Chinese or the accented letters used in French. These characters cannot be represented in ASCII, so they use Unicode instead.
IDNs allow internet companies to use domain names in the native language of the user instead of having to use ASCII characters. This makes for a much better user experience.
If you are used to using and seeing ASCII characters, you might wonder why this is needed. Let’s imagine for a moment that the roles were reversed and all domain names were in Chinese. You would have to know and understand domain names that look like this:
If you are not native to the language, this is very difficult, especially when characters in a language look very similar (think of I and l, or m and n). On top of this, fonts also affect how a character looks, which makes it even harder to tell characters apart.
So imagine how difficult it is for people who are not used to ASCII characters to understand the current domain name system. There are massive benefits to users in being able to use their native language.
There are also many top-level domains that are now IDNs themselves, for example:
- 中国 - China
- 台湾 - Taiwan
- Рф - Russia
The entire domain name does not need to be an IDN. It is possible for just one part of your domain, such as a sub-domain, to use Unicode characters while the rest stays in ASCII.
An example of this might be a large organisation that wants to use localised sub-domains. We can imagine a fictional online shop might use these domains:
For a business this gives a lot of scope to create sub-domains for a specific country, making it easier for users in that country to understand the intention of a domain, find the site and remember it. It may also make it easier to find the website when using a search engine, because the sub-domain will match the search term entered by the user.
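To see which parts of such a domain actually change, here is a small Python sketch that converts a domain one label at a time. It uses 体育.awesomebrand.com, the example domain that appears later in this post, and the standard library `idna` codec (IDNA 2003 rules). The function name `encode_labels` is just something I picked for illustration.

```python
def encode_labels(domain: str) -> list:
    """Return (original label, ASCII label) pairs for each part of a domain."""
    pairs = []
    for label in domain.split("."):
        # ASCII labels come back unchanged; Unicode labels gain an xn-- prefix.
        ascii_label = label.encode("idna").decode("ascii")
        pairs.append((label, ascii_label))
    return pairs


for original, converted in encode_labels("体育.awesomebrand.com"):
    status = "converted" if original != converted else "unchanged"
    print(f"{original} -> {converted} ({status})")
```

Only the sub-domain label is rewritten; awesomebrand and com are left exactly as they were.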
Another advantage of allowing Unicode characters in the domain name is the ability to use emojis. This means you can start embedding 👍 and 😃 in your domains. For example:
This might not be useful as a domain that someone physically types in, but it becomes a very interesting tool to use in online marketing campaigns because the emojis capture the eye and stand out from the rest of the content.
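Emoji labels go through the same conversion as any other Unicode label. The sketch below uses Python's raw `punycode` codec and adds the xn-- prefix by hand; the helper names are hypothetical. It deliberately skips the normalisation step that full IDN processing applies, and it says nothing about whether a given registry will actually accept an emoji domain.

```python
ACE_PREFIX = "xn--"


def emoji_label_to_ascii(label: str) -> str:
    """Encode a single label with raw Punycode and add the ACE prefix."""
    return ACE_PREFIX + label.encode("punycode").decode("ascii")


def ascii_label_to_unicode(label: str) -> str:
    """Strip the ACE prefix and decode the Punycode; other labels pass through."""
    if not label.lower().startswith(ACE_PREFIX):
        return label
    return label[len(ACE_PREFIX):].encode("ascii").decode("punycode")


for emoji in ("👍", "😃"):
    encoded = emoji_label_to_ascii(emoji)
    print(emoji, "->", encoded, "->", ascii_label_to_unicode(encoded))
```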
So How Can Refract DNS Help?
Refract DNS can help because it makes working with Punycode very easy. The Windows hosts file does not support Unicode characters, so to use it with an IDN you have to enter the Punycode value. This means converting domains to Punycode and then making lots of notes:
This is fine until a colleague sends you an email like this:
“Staging is ready for test, update your hosts file to point 体育.awesomebrand.com at 126.96.36.199”
Now you have to work out which Punycode string matches that domain name, which means a trip back to the Punycode converter and then a search in the Windows Hosts file.
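To give a feel for that manual workflow, here is a rough Python sketch of the two steps: building a hosts file line for a Unicode domain, and finding the line that belongs to a Unicode domain later on. The function names are hypothetical, the IP address 192.0.2.10 is a placeholder from the documentation range rather than a real staging server, and the conversion uses the standard library `idna` codec (IDNA 2003 rules).

```python
def hosts_entry(ip: str, domain: str) -> str:
    """Build a hosts file line, storing the domain in its Punycode form."""
    ascii_domain = domain.encode("idna").decode("ascii")
    return f"{ip} {ascii_domain}"


def find_entries(hosts_text: str, unicode_domain: str) -> list:
    """Return the hosts file lines that map the given Unicode domain."""
    wanted = unicode_domain.encode("idna").decode("ascii").lower()
    matches = []
    for line in hosts_text.splitlines():
        # Drop any trailing comment, then split into IP and host names.
        fields = line.split("#", 1)[0].split()
        if len(fields) >= 2 and wanted in (name.lower() for name in fields[1:]):
            matches.append(line)
    return matches


# Placeholder hosts file content, held in a string for the example.
hosts_text = "\n".join([
    "127.0.0.1 localhost",
    hosts_entry("192.0.2.10", "体育.awesomebrand.com"),
])
print(find_entries(hosts_text, "体育.awesomebrand.com"))
```

This works on a string rather than the real hosts file, which normally needs administrator rights to edit.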
With Refract DNS you don’t have this problem. Refract DNS allows you to add domains with Unicode characters and it will handle the conversion to Punycode. You can also add a friendly name to the domain:
When you need to find the domain, you can use the Unicode value in the search box:
Each domain also has a handy context menu which allows you to copy the Unicode value, copy the Punycode value or browse directly to the website.
Refract DNS makes managing IDN domains locally very easy!
Mike - August 2019