IDN Homograph Attacks: When ‘аpple.com’ Isn't apple.com

Unicode looks the same, the registry doesn't

ip8 Team

Phishing, Domain, Punycode, IDN, Security

TL;DR. Domain names have been able to contain non-ASCII characters since 2003. Many of those characters (Cyrillic а, Greek ο, Armenian օ) are visually indistinguishable from their ASCII twins. An attacker registers аpple.com, and anywhere the Unicode form is displayed it reads as apple.com. The widget below lets you paste a URL and see which characters are confusable and what hostname your browser would actually request.

How Unicode got into domain names

DNS hostnames are, in practice, ASCII-only: labels are limited to [A-Za-z0-9-]. To let people register names in their own script, RFC 3490 (2003) introduced Internationalised Domain Names. The Unicode string is normalised by a series of rules (case folding and NFKC normalisation in the original IDNA2003; a stricter set of permitted characters and mappings in the later IDNA2008), and the result is converted to ASCII via an encoding called Punycode (RFC 3492). Punycode-encoded labels are prefixed with xn-- so the registry, the resolver and every server in between can keep working in pure ASCII.

The result: bücher.de is registered at the .de registry as xn--bcher-kva.de. The Unicode form is converted to that ASCII form before any lookup, so both refer to the same record. Browsers display the Unicode form to the user only when they decide it is safe to.
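The conversion can be observed directly with the WHATWG URL parser, which is available as the global URL in browsers and in Node.js. The following is a minimal sketch; toAsciiHostname is a hypothetical helper name, and the http:// scheme is only there so the parser accepts a bare domain.

```javascript
// Sketch: ask the URL parser for the ASCII (punycode) form of a hostname.
// toAsciiHostname is a hypothetical helper, not part of any library.
function toAsciiHostname(domain) {
  // The scheme is a placeholder so that a bare domain parses as a URL.
  return new URL(`http://${domain}`).hostname;
}

console.log(toAsciiHostname("bücher.de"));        // xn--bcher-kva.de
console.log(toAsciiHostname("BÜCHER.de"));        // xn--bcher-kva.de (mapped to lowercase first)
console.log(toAsciiHostname("xn--bcher-kva.de")); // xn--bcher-kva.de (already ASCII)
```

All three inputs end up as the same ASCII hostname, which is the only form the resolver ever sees.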

The homograph problem

Unicode has thousands of characters that look like ASCII letters. The Cyrillic small letter а (U+0430) is rendered identically to the Latin small letter a (U+0061) in nearly every font, but they are different characters with different code points. The same is true for the Cyrillic/Latin pairs о/o, р/p, с/c, е/e, х/x, and a few dozen more. Greek and Armenian add their own: Greek omicron ο (U+03BF) and Armenian օ (U+0585) both pass for Latin o.
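A short sketch makes the difference visible. The two strings below look the same on screen but compare as unequal, and listing their code points shows why. The helper name codePoints is an assumption made for this example.

```javascript
const latin = "apple.com";
const spoof = "\u0430pple.com"; // first letter is Cyrillic а (U+0430)

console.log(latin === spoof); // false

// Format every character of a string as a U+XXXX code point.
function codePoints(text) {
  return [...text].map(
    (ch) => "U+" + ch.codePointAt(0).toString(16).toUpperCase().padStart(4, "0")
  );
}

console.log(codePoints(latin).slice(0, 2)); // [ 'U+0061', 'U+0070' ]
console.log(codePoints(spoof).slice(0, 2)); // [ 'U+0430', 'U+0070' ]
```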

An attacker who registers аpple.com (with the Cyrillic а) gets a real DNS record, a real TLS certificate (from any CA that doesn't run confusable-domain checks), and a domain that looks pixel-for-pixel identical to apple.com in most fonts. The 2017 demo by Xudong Zheng, using аррӏе.com (all five letters Cyrillic), was displayed as apple.com by Chrome, Firefox and Opera at the time. Browsers tightened the rules afterwards, but the underlying problem is permanent.

Try it yourself

Paste a domain into the checker below. It splits the input into individual characters, flags any that aren't ASCII, identifies common confusables by Unicode script, and renders the punycode form your browser would actually request.

Paste a domain — see if it's really what it looks like

What you typed: аpple.com

Punycode (the actual hostname): xn--pple-43d.com

Every IDN is registered in its punycode form (the xn-- prefix). Browsers show the Unicode form in the address bar only if the name passes their anti-spoof rules; otherwise they show the punycode.

This domain contains confusable characters from Cyrillic. That doesn't make it malicious by itself — but if it resembles a well-known brand, treat it as suspicious until you have verified ownership another way.
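The steps the checker performs can be sketched in a few lines. This is not the widget's actual source. LOOKALIKES is a deliberately tiny, hypothetical table (real tools use the much larger Unicode confusables data), SCRIPTS covers only the scripts mentioned in this article, and inspectDomain is an illustrative name.

```javascript
// Hypothetical lookalike table: confusable character -> intended ASCII letter.
const LOOKALIKES = new Map([
  ["\u0430", "a"], ["\u0435", "e"], ["\u043E", "o"], ["\u0440", "p"],
  ["\u0441", "c"], ["\u0445", "x"], ["\u04BB", "h"], ["\u04CF", "l"],
  ["\u03BF", "o"], ["\u0585", "o"],
]);

// Only the scripts discussed here; anything else is reported as "Other".
const SCRIPTS = ["Latin", "Cyrillic", "Greek", "Armenian"].map((name) => ({
  name,
  pattern: new RegExp(`\\p{Script=${name}}`, "u"),
}));

function scriptOf(ch) {
  const hit = SCRIPTS.find((s) => s.pattern.test(ch));
  return hit ? hit.name : "Other";
}

function inspectDomain(domain) {
  // Split into code points, then keep only the ones outside ASCII.
  const flagged = [...domain]
    .map((ch, index) => ({ ch, index }))
    .filter(({ ch }) => ch.codePointAt(0) > 0x7f)
    .map(({ ch, index }) => ({
      index,
      char: ch,
      codePoint: "U+" + ch.codePointAt(0).toString(16).toUpperCase().padStart(4, "0"),
      script: scriptOf(ch),
      looksLike: LOOKALIKES.get(ch) ?? null,
    }));
  // The URL parser yields the ASCII hostname that would actually be requested.
  return { flagged, punycode: new URL(`http://${domain}`).hostname };
}

const report = inspectDomain("\u0430pple.com");
console.log(report.punycode);   // xn--pple-43d.com
console.log(report.flagged[0]); // index 0, U+0430, script Cyrillic, looksLike 'a'
```

A plain ASCII domain produces an empty flagged list and comes back unchanged as its own hostname.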

A gallery of common spoofs

The pairs below are all patterns we've seen used in phishing campaigns. Each pair is two completely distinct domains at the registry that just happen to render identically in most fonts. The punycode shown next to each spoof is the hostname the browser actually sends to the resolver.


Real: apple.com
Spoof: аpple.com = xn--pple-43d.com
Cyrillic а (U+0430) at position 1. Identical glyph in nearly every font.

Real: paypal.com
Spoof: рaypal.com = xn--aypal-uye.com
Cyrillic р (U+0440) instead of Latin p. A frequent choice for payment-phishing spoofs.

Real: google.com
Spoof: gооgle.com = xn--ggle-55da.com
Two Cyrillic о's (U+043E).

Real: github.com
Spoof: gitһub.com = xn--gitub-0kf.com
Cyrillic һ (U+04BB) instead of Latin h. Common in software-developer phishing.

Real: amazon.com
Spoof: аmаzon.com = xn--mzon-43db.com
Two Cyrillic а's (U+0430). Indistinguishable in monospace fonts.

Real: microsoft.com
Spoof: microsοft.com = xn--microsft-4dg.com
Greek omicron ο (U+03BF) for the second Latin o. Greek-script spoofs are less common but harder to spot in some fonts.

What browsers do to mitigate

Chrome and Firefox both ship heuristics, and the details vary by browser and version. Broadly: if a domain mixes scripts in a single label (Latin + Cyrillic, for example), or if a label is written entirely in lookalike characters from a single non-Latin script, the address bar displays the punycode form instead of the Unicode one. The Cyrillic-only аррӏе.com spoof is the textbook reason the second rule exists.

The rule is necessarily approximate. Mixed-script labels are legitimate in many regional contexts — Japanese names with Latin letters in them, for instance — so a strict rule would break real IDN use. Browsers err on the side of showing punycode when in doubt, and there is still a long tail of confusable combinations that fly under the radar. Single-character substitutions inside an otherwise pure-Latin label (the аpple.com case) are the most common practical attack today.
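A rough sketch of such a heuristic follows. It is far simpler than anything a browser ships and is not either browser's actual algorithm. The script list, the partial table of Latin-lookalike Cyrillic letters, and the names isSuspiciousLabel and addressBarText are all assumptions made for illustration.

```javascript
const SCRIPT_PATTERNS = ["Latin", "Cyrillic", "Greek", "Armenian", "Han", "Hiragana", "Katakana"]
  .map((name) => [name, new RegExp(`\\p{Script=${name}}`, "u")]);

// Scripts whose letters are easily mistaken for Latin ones (assumed list).
const CONFUSABLE_WITH_LATIN = ["Cyrillic", "Greek", "Armenian"];

// Partial, illustrative set of Cyrillic letters that pass for Latin letters.
const ALL_LOOKALIKE_CYRILLIC =
  /^[\u0430\u0435\u043E\u0440\u0441\u0443\u0445\u0455\u0456\u0458\u04BB\u04CF]+$/u;

function scriptsIn(label) {
  const found = new Set();
  for (const ch of label) {
    for (const [name, pattern] of SCRIPT_PATTERNS) {
      if (pattern.test(ch)) found.add(name);
    }
  }
  return found; // digits and hyphens match none of the patterns and are ignored
}

function isSuspiciousLabel(label) {
  const scripts = scriptsIn(label);
  // Rule 1: Latin mixed with a script that has Latin lookalikes.
  const mixed = scripts.has("Latin") && CONFUSABLE_WITH_LATIN.some((s) => scripts.has(s));
  // Rule 2: the whole label consists of Latin-lookalike Cyrillic letters.
  const wholeScript = ALL_LOOKALIKE_CYRILLIC.test(label);
  return mixed || wholeScript;
}

// Decide which form to display: punycode when any label looks suspicious.
function addressBarText(unicodeHost) {
  const suspicious = unicodeHost.split(".").some(isSuspiciousLabel);
  return suspicious ? new URL(`http://${unicodeHost}`).hostname : unicodeHost;
}

console.log(addressBarText("\u0430pple.com")); // xn--pple-43d.com (Latin + Cyrillic)
console.log(addressBarText("bücher.de"));      // bücher.de (ü is a Latin-script letter)
```

Latin mixed with Han or kana passes through unchanged, which mirrors the Japanese example above. It also shows why the rule stays approximate: every exception is a judgment about which combinations count as legitimate.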

What registries and CAs do

Registry-level script restrictions. Most ccTLDs only allow characters from a defined “language table”: .de allows German, .gr allows Greek, .ru allows Cyrillic. .com historically allowed anything; ICANN's IDN tables tightened this over a period of years.
Bundle policies at the registrar. When a registrar accepts an IDN registration, it may automatically reserve the “ASCII twin” for the same owner, or block the registration of a confusable variant by anyone else. This is uneven across the industry.
CA review. Some CAs refuse to issue DV certificates for confusable IDNs to unrelated parties, but there is no global policy and many issue freely. CT logs let you monitor newly issued certs for your own brand to catch this; services like Certstream, which stream new CT log entries, are built for exactly this use case.
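For monitoring, it helps to have the punycode forms of your own confusable variants ready, since that is how they appear in certificates. The sketch below builds such a watchlist for single-character substitutions. HOMOGLYPHS is a small hypothetical table, the function assumes a simple name of the form label.tld, and how you obtain hostnames from CT logs is left out.

```javascript
// Hypothetical table: ASCII letter -> visually similar non-ASCII characters.
const HOMOGLYPHS = {
  a: ["\u0430"], c: ["\u0441"], e: ["\u0435"], h: ["\u04BB"],
  o: ["\u043E", "\u03BF", "\u0585"], p: ["\u0440"], x: ["\u0445"],
};

// Returns a Map of punycode hostname -> Unicode form, one entry per
// single-character substitution in the first label of the domain.
function singleSubstitutionWatchlist(domain) {
  const [label, ...rest] = domain.split(".");
  const suffix = rest.join(".");
  const watch = new Map();
  [...label].forEach((ch, i) => {
    for (const glyph of HOMOGLYPHS[ch] ?? []) {
      const variant = label.slice(0, i) + glyph + label.slice(i + 1);
      const unicodeHost = `${variant}.${suffix}`;
      watch.set(new URL(`http://${unicodeHost}`).hostname, unicodeHost);
    }
  });
  return watch;
}

const watch = singleSubstitutionWatchlist("apple.com");
console.log(watch.size);                    // 4 (a, p, p and e each have one lookalike here)
console.log(watch.has("xn--pple-43d.com")); // true
```

Any hostname in a newly logged certificate that matches a key in the map is a candidate for closer inspection.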

What you can do

For end users: don't follow links to sensitive sites. Use bookmarks, password manager URLs, or click through from inside the legit site. Homograph attacks rely on you clicking a link that looks correct.
For brand owners: register the obvious confusable variants of your own domain so attackers can't. Monitor Certificate Transparency logs for new certs that mention your brand. Our Typo Generator produces the typo permutations to monitor as well.
For developers: when accepting URL input from users, normalise to punycode before any further checks (new URL(input).hostname does it for you in modern browsers). Compare the normalised form against your allowlist, not the raw display string.
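The developer advice can be sketched as follows. ALLOWED_HOSTS and isAllowedUrl are hypothetical names, and a real allowlist would come from your own configuration.

```javascript
// Hypothetical allowlist, stored in ASCII (punycode) form.
const ALLOWED_HOSTS = new Set(["apple.com", "www.apple.com"]);

function isAllowedUrl(input) {
  let url;
  try {
    url = new URL(input); // throws on input that is not a valid absolute URL
  } catch {
    return false;
  }
  // url.hostname is already lowercased and punycode-encoded by the parser,
  // so the comparison never sees the display string.
  return ALLOWED_HOSTS.has(url.hostname);
}

console.log(isAllowedUrl("https://apple.com/login"));       // true
console.log(isAllowedUrl("https://APPLE.com/login"));       // true
console.log(isAllowedUrl("https://\u0430pple.com/login"));  // false (hostname is xn--pple-43d.com)
console.log(isAllowedUrl("not a url"));                     // false
```

Comparing the raw strings would also reject the spoof in this case, but only the normalised hostname makes the check independent of case, encoding and how the name is rendered.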

Hunt the lookalikes that target your brand

The typo / homograph problem doesn't stop at script substitution — it includes character swaps, letter additions, keyboard slips, and homoglyph combinations across scripts. Our Typo Generator enumerates the full neighbourhood of a domain so you can monitor it. Pair it with our Typo Density analysis to see how saturated your namespace is.
