IDN Homograph Attacks: When ‘аpple.com’ Isn't apple.com
TL;DR. Domain names have been able to contain non-ASCII characters since 2003. Many of those characters (Cyrillic а, Greek ο, Armenian օ) are visually indistinguishable from their ASCII twins. An attacker registers аpple.com, and unless the browser's anti-spoofing rules step in, the address bar shows it as apple.com. Below we look at which characters are confusable, what hostname the browser actually requests, and what browsers, registries and site owners can do about it.
How Unicode got into domain names
DNS hostnames are, in practice, ASCII-only: the protocol itself can carry arbitrary bytes, but the hostname rules that registries, resolvers and applications follow limit labels to [A-Za-z0-9-]. To let people register names in their own script, RFC 3490 (2003) introduced Internationalised Domain Names in Applications (IDNA). The conversion happens in the client: the Unicode name is first normalised (case folding and NFKC normalisation under the original rules; IDNA2008 later dropped that mapping step from the protocol and left it to applications, most of which follow Unicode's UTS #46), then converted to ASCII with an encoding called punycode (RFC 3492). Punycode-encoded labels are prefixed with xn-- so registries, resolvers and nameservers can keep working in pure ASCII.
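To make the normalisation step concrete, here is a small Python sketch. Python's built-in `idna` codec implements the 2003 rules (RFC 3490), not IDNA2008, so it shows the older pipeline. `nameprep_like` is a hypothetical helper that only approximates RFC 3491's nameprep with case folding followed by NFKC.

```python
import unicodedata


def nameprep_like(label: str) -> str:
    """Rough stand-in for the IDNA2003 'nameprep' step (hypothetical helper).

    Real nameprep (RFC 3491) uses specific stringprep tables; case folding
    followed by NFKC normalisation is close enough to show the idea.
    """
    return unicodedata.normalize("NFKC", label.casefold())


# Fullwidth capitals fold down to plain ASCII letters.
print(nameprep_like("\uff21\uff30\uff30\uff2c\uff25"))  # apple
# Under the 2003 rules, German sharp s is mapped to "ss".
print(nameprep_like("Stra\u00dfe"))                     # strasse

# Python's built-in "idna" codec applies the real IDNA2003 rules.
print("\uff21\uff30\uff30\uff2c\uff25.com".encode("idna"))  # b'apple.com'
print("stra\u00dfe.de".encode("idna"))                      # b'strasse.de'
```

The ß example is one place the two generations disagree: IDNA2008 treats ß as a valid letter in its own right rather than mapping it to ss.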
The result: bücher.de is registered at the .de registry as xn--bcher-kva.de. Only that ASCII form ever reaches DNS; the browser converts the Unicode name before resolving it, and displays the Unicode form to the user only if it decides it's safe to.
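Python's standard library can reproduce the bücher.de example directly. The `punycode` codec is the bare RFC 3492 encoding without a prefix, while the `idna` codec (which follows the 2003 rules) handles whole hostnames and adds the xn-- prefix.

```python
# Bare punycode: just the encoded label, no xn-- prefix.
raw = "b\u00fccher".encode("punycode")
print(raw)                          # b'bcher-kva'

# Whole-hostname conversion, label by label, with the xn-- prefix added.
ascii_host = "b\u00fccher.de".encode("idna")
print(ascii_host)                   # b'xn--bcher-kva.de'

# Decoding goes the other way, which is what a browser does before
# deciding whether to show the Unicode form in the address bar.
unicode_host = ascii_host.decode("idna")
print(unicode_host)                 # bücher.de
```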
The homograph problem
Unicode has thousands of characters that look like ASCII letters. The Cyrillic small letter а (U+0430) renders identically to the Latin small letter a (U+0061) in almost every common font, but they are different characters with different code points. The same is true for о/o, р/p, с/c, е/e, х/x, and a few dozen more. Greek and Armenian add their own: Greek omicron ο (U+03BF) and Armenian օ (U+0585) both pass for o.
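A quick way to see that these are different characters is to ask Python for each one's code point and official Unicode name. The sample list is ours; the names come from the standard `unicodedata` module.

```python
import unicodedata

# Latin a, Cyrillic a, Latin o, Greek omicron, Armenian oh.
samples = ["a", "\u0430", "o", "\u03bf", "\u0585"]

for ch in samples:
    print(f"U+{ord(ch):04X}  {unicodedata.name(ch)}")
# U+0061  LATIN SMALL LETTER A
# U+0430  CYRILLIC SMALL LETTER A
# U+006F  LATIN SMALL LETTER O
# U+03BF  GREEK SMALL LETTER OMICRON
# U+0585  ARMENIAN SMALL LETTER OH

# Same pixels, different strings: the comparison is on code points.
print("apple" == "\u0430pple")  # False
```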
An attacker who registers аpple.com (with the Cyrillic а) gets a real DNS record, a real TLS certificate (from any CA that doesn't run confusable-domain checks), and a domain that looks pixel-for-pixel identical to apple.com in most fonts. Xudong Zheng's 2017 demo, аррӏе.com (every letter Cyrillic), was displayed as apple.com by both Chrome and Firefox at the time. Browsers tightened their rules afterwards, but the underlying problem is permanent: display rules can reduce the risk, not remove it.
Try it yourself
A homograph checker splits the input into individual code points, flags any that aren't ASCII, identifies likely confusables by their Unicode script, and computes the punycode form the browser would actually request.
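A minimal Python sketch of such a checker, under two stated assumptions: the script is guessed from the first word of each character's Unicode name (the standard library doesn't expose the proper Script property), and the ASCII form comes from Python's IDNA2003 codec, which for simple lowercase labels like these should match what browsers compute with the newer UTS #46 rules. `check_host` and `script_of` are hypothetical names.

```python
import unicodedata


def script_of(ch: str) -> str:
    """Crude script guess from the character name (an assumption: real tools
    use Unicode's Scripts.txt data, which the stdlib doesn't expose)."""
    if ch.isascii():
        return "ASCII"
    try:
        return unicodedata.name(ch).split()[0]  # e.g. "CYRILLIC", "GREEK"
    except ValueError:
        return "UNKNOWN"  # code point without a name


def check_host(host: str) -> dict:
    """Hypothetical helper: list non-ASCII code points and the ASCII form."""
    flagged = [
        (i, ch, f"U+{ord(ch):04X}", script_of(ch))
        for i, ch in enumerate(host)
        if not ch.isascii()
    ]
    # Python's "idna" codec follows IDNA2003 (RFC 3490).
    ascii_form = host.encode("idna").decode("ascii")
    return {"flagged": flagged, "ascii": ascii_form}


report = check_host("\u0430pple.com")
for pos, _ch, cp, script in report["flagged"]:
    print(f"position {pos}: {cp} ({script})")
print("requests:", report["ascii"])
# position 0: U+0430 (CYRILLIC)
# requests: xn--pple-43d.com
```

A legitimate IDN such as bücher.de is flagged too (its ü is non-ASCII, script LATIN); the flag marks what deserves a closer look rather than proving a spoof.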
A gallery of common spoofs
The pairs below are all live patterns we've seen used in phishing campaigns. Each spoof registers as a completely distinct hostname at the registry; it just happens to render identically to the original in most fonts. The punycode form shown with each spoof is what the browser actually sends to the resolver.
- apple.com spoofed as аpple.com = xn--pple-43d.com
- paypal.com spoofed as рaypal.com = xn--aypal-uye.com
- google.com spoofed as gооgle.com = xn--ggle-55da.com
- github.com spoofed as gitһub.com = xn--gitub-0kf.com
- amazon.com spoofed as аmаzon.com = xn--mzon-43db.com
- microsoft.com spoofed as microsοft.com = xn--microsft-4dg.com

What browsers do to mitigate
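Chrome and Firefox both ship heuristics along these lines: if a single label mixes scripts (Latin + Cyrillic, for example), the address bar displays the punycode form instead of the Unicode one, and depending on the browser, a label written entirely in a script the user's configured languages don't use can also fall back to punycode. A mixed-script check alone would not catch the Cyrillic-only аррӏе.com spoof, since every letter in that label comes from one script; whole-script lookalikes like it are the textbook reason browsers added further checks after 2017.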
The rule is necessarily approximate. Mixed-script labels are legitimate in many regional contexts (Japanese names that include Latin letters, for instance), so a strict rule would break real IDN use. Browsers err on the side of showing punycode when in doubt, but a long tail of confusable combinations still slips through. Single-character substitutions inside an otherwise Latin label (the аpple.com case) remain the most common pattern in practice: even where the address bar falls back to punycode, the Unicode form still reaches users wherever a link is rendered as text.
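The mixed-script and whole-script heuristics can be sketched as follows. This is not any browser's actual implementation: the lookalike set is a hypothetical, deliberately tiny subset, and scripts are again guessed from character names rather than Unicode's Script property.

```python
import unicodedata

# Hypothetical, deliberately tiny set of Cyrillic letters with Latin twins
# (a, e, o, p, c, x, y, i, h, l). Real browsers use Unicode confusables data.
CYRILLIC_LATIN_LOOKALIKES = set(
    "\u0430\u0435\u043e\u0440\u0441\u0445\u0443\u0456\u04bb\u04cf"
)


def scripts_in(label: str) -> set:
    """Scripts used by the letters of a label, guessed from Unicode names.

    Digits and hyphens are skipped, since every script shares them.
    """
    found = set()
    for ch in label:
        if not ch.isalpha():
            continue
        if ch.isascii():
            found.add("LATIN")
        else:
            found.add(unicodedata.name(ch, "UNKNOWN").split()[0])
    return found


def should_show_punycode(label: str) -> bool:
    """Sketch of two address-bar heuristics (not any browser's real rules)."""
    scripts = scripts_in(label)
    if len(scripts) > 1:
        return True  # mixed-script label, e.g. Cyrillic a + Latin "pple"
    letters = [ch for ch in label if ch.isalpha()]
    if scripts == {"CYRILLIC"} and all(
        ch in CYRILLIC_LATIN_LOOKALIKES for ch in letters
    ):
        return True  # whole-script confusable: every letter has a Latin twin
    return False
```

This naive version also flags a label mixing Japanese kanji with Latin letters, which is exactly the false positive described above; Unicode's UTS #39 restriction levels permit specific combinations such as Latin + Han + Hiragana + Katakana for that reason.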
What registries and CAs do
- Registry-level script restrictions. Most ccTLDs only allow characters from a defined “language table”: .de allows German, .gr allows Greek, and Cyrillic names go under .рф rather than .ru. .com historically allowed almost anything; ICANN's IDN guidelines and per-language tables tightened this over a period of years.
- Bundle policies at the registrar. When a registrar accepts an IDN registration, it may automatically reserve the “ASCII twin” for the same owner, or block the registration of a confusable variant by anyone else. This is uneven across the industry.
- CA review. Some CAs refuse to issue DV certificates for confusable IDNs to unrelated parties, but there is no global policy and many issue freely. Certificate Transparency (CT) logs let you watch newly issued certs for your own brand to catch this; the Certstream feed and the crt.sh search service are commonly used for exactly this purpose.
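Reserving or blocking the “ASCII twin” of a name, or checking a certificate request for confusables, needs some way to map a label back to what it looks like. A simplified version of the “skeleton” idea from Unicode's UTS #39 is sketched below; the `TO_LATIN` table and both function names are hypothetical, and the real confusables data is far larger.

```python
# Hypothetical lookalike map (a tiny subset of Unicode's confusables data).
TO_LATIN = {
    "\u0430": "a", "\u0435": "e", "\u043e": "o", "\u0440": "p",  # Cyrillic
    "\u0441": "c", "\u0445": "x", "\u0443": "y", "\u0456": "i",  # Cyrillic
    "\u04bb": "h", "\u04cf": "l",                                # Cyrillic
    "\u03bf": "o",                                               # Greek
    "\u0585": "o",                                               # Armenian
}


def ascii_twin(label: str) -> str:
    """Map each known lookalike to its Latin twin; keep everything else."""
    return "".join(TO_LATIN.get(ch, ch) for ch in label)


def is_confusable_with(candidate: str, protected: str) -> bool:
    """True if candidate is a different string with the same skeleton."""
    return candidate != protected and ascii_twin(candidate) == ascii_twin(protected)


print(ascii_twin("\u0430\u0440\u0440\u04cf\u0435"))  # apple
print(is_confusable_with("\u0430pple", "apple"))     # True
```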
What you can do
- For end users: don't follow links to sensitive sites, and don't rely on typing them from memory either (that invites ordinary typosquats); use bookmarks, the URLs saved in your password manager, or navigate from inside the legit site. Homograph attacks rely on you clicking a link that looks correct.
- For brand owners: register the obvious confusable variants of your own domain so attackers can't. Monitor Certificate Transparency logs for new certs that mention your brand. Our Typo Generator produces the typo permutations to monitor as well.
- For developers: when accepting URL input from users, normalise the hostname to punycode before any further checks (`new URL(input).hostname` does it for you in modern browsers). Compare the normalised form against your allowlist, not the raw display string.
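For server-side code in Python, a rough equivalent of normalising with `new URL(input).hostname` and then checking an allowlist might look like this. `ALLOWED_HOSTS` and the helper names are hypothetical; note that Python's `idna` codec follows IDNA2003 while browsers follow UTS #46, so the two can disagree on edge cases such as ß.

```python
from typing import Optional
from urllib.parse import urlsplit

# Hypothetical allowlist, stored in normalised ASCII (punycode) form.
ALLOWED_HOSTS = {"apple.com", "xn--bcher-kva.de"}


def normalised_host(url: str) -> Optional[str]:
    """Return the lower-cased ASCII hostname of url, or None if unusable."""
    try:
        host = urlsplit(url).hostname  # lower-cased; port and userinfo dropped
        if not host:
            return None
        return host.encode("idna").decode("ascii")
    except ValueError:  # also covers UnicodeError from invalid IDNA labels
        return None


def is_allowed(url: str) -> bool:
    """Compare the normalised form, never the display string."""
    return normalised_host(url) in ALLOWED_HOSTS


print(normalised_host("https://\u0430pple.com/login"))  # xn--pple-43d.com
print(is_allowed("https://\u0430pple.com/login"))       # False
```

Rejecting anything that fails to normalise, rather than falling back to the raw string, keeps the allowlist comparison on one canonical form.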
Hunt the lookalikes that target your brand
The typo / homograph problem doesn't stop at script substitution — it includes character swaps, letter additions, keyboard slips, and homoglyph combinations across scripts. Our Typo Generator enumerates the full neighbourhood of a domain so you can monitor it. Pair it with our Typo Density analysis to see how saturated your namespace is.
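The homoglyph slice of that neighbourhood is simple to enumerate for single substitutions. This is not the Typo Generator's code; it is a sketch with a hypothetical `LOOKALIKES` table covering only a handful of the letters mentioned earlier.

```python
# Hypothetical Latin -> lookalike table (a tiny subset of real confusables).
LOOKALIKES = {
    "a": ["\u0430"],                      # Cyrillic a
    "c": ["\u0441"],                      # Cyrillic es
    "e": ["\u0435"],                      # Cyrillic ie
    "o": ["\u043e", "\u03bf", "\u0585"],  # Cyrillic o, Greek omicron, Armenian oh
    "p": ["\u0440"],                      # Cyrillic er
    "x": ["\u0445"],                      # Cyrillic ha
}


def single_swaps(label: str):
    """Yield (unicode_label, punycode_label) for every one-letter homoglyph swap."""
    for i, ch in enumerate(label):
        for alt in LOOKALIKES.get(ch, []):
            spoof = label[:i] + alt + label[i + 1:]
            yield spoof, spoof.encode("idna").decode("ascii")


for spoof, ascii_label in single_swaps("apple"):
    print(ascii_label)
```

Real monitoring would also cover multiple substitutions and whole-script spoofs like the all-Cyrillic apple demo, which this single-swap sketch deliberately leaves out.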