This is a memo on RFC 5646, which together with RFC 4647 makes up BCP 47.

1 The Language Tag

Language tags are used to help identify languages, whether spoken,
written, signed, or otherwise signaled, for the purpose of
communication. This includes constructed and artificial languages
but excludes languages not intended primarily for human
communication, such as programming languages.

1.1 Syntax

  • A tag is composed of a sequence of one or more subtags.
  • Each subtag is a sequence of alphanumeric characters that narrows the range of language identified by the tag.
  • Subtags are joined using the hyphen ("-").
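
A minimal Python sketch of splitting a tag into subtags and joining them back. The helper names are hypothetical, and `looks_like_subtag` only checks the generic limit of one to eight ASCII letters or digits taken from Figure 1, not the rules for each subtag type.

```python
from __future__ import annotations


def split_subtags(tag: str) -> list[str]:
    """Split a language tag into its subtags at each hyphen."""
    return tag.split("-")


def join_subtags(subtags: list[str]) -> str:
    """Concatenate subtags into a tag, using "-" as the separator."""
    return "-".join(subtags)


def looks_like_subtag(s: str) -> bool:
    """True if s is 1 to 8 ASCII letters or digits (the size limit in Figure 1)."""
    return 1 <= len(s) <= 8 and s.isascii() and s.isalnum()


if __name__ == "__main__":
    parts = split_subtags("zh-Hant-TW")
    print(parts)                                      # ['zh', 'Hant', 'TW']
    print(join_subtags(parts))                        # zh-Hant-TW
    print(all(looks_like_subtag(p) for p in parts))   # True
```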

The syntax of the language tag in ABNF [RFC5234] is:

  Language-Tag  = langtag             ; normal language tags
                / privateuse          ; private use tag
                / grandfathered       ; grandfathered tags

  langtag       = language
                  ["-" script]
                  ["-" region]
                  *("-" variant)
                  *("-" extension)
                  ["-" privateuse]

  language      = 2*3ALPHA            ; shortest ISO 639 code
                  ["-" extlang]       ; sometimes followed by
                                      ; extended language subtags
                / 4ALPHA              ; or reserved for future use
                / 5*8ALPHA            ; or registered language subtag

  extlang       = 3ALPHA              ; selected ISO 639 codes
                  *2("-" 3ALPHA)      ; permanently reserved

  script        = 4ALPHA              ; ISO 15924 code

  region        = 2ALPHA              ; ISO 3166-1 code
                / 3DIGIT              ; UN M.49 code

  variant       = 5*8alphanum         ; registered variants
                / (DIGIT 3alphanum)

  extension     = singleton 1*("-" (2*8alphanum))

                                      ; Single alphanumerics
                                      ; "x" reserved for private use
  singleton     = DIGIT               ; 0 - 9
                / %x41-57             ; A - W
                / %x59-5A             ; Y - Z
                / %x61-77             ; a - w
                / %x79-7A             ; y - z

  privateuse    = "x" 1*("-" (1*8alphanum))

  grandfathered = irregular           ; non-redundant tags registered
                / regular             ; during the RFC 3066 era

  irregular     = "en-GB-oed"         ; irregular tags do not match
                / "i-ami"             ; the 'langtag' production and
                / "i-bnn"             ; would not otherwise be
                / "i-default"         ; considered 'well-formed'
                / "i-enochian"        ; These tags are all valid,
                / "i-hak"             ; but most are deprecated
                / "i-klingon"         ; in favor of more modern
                / "i-lux"             ; subtags or subtag
                / "i-mingo"           ; combination
                / "i-navajo"
                / "i-pwn"
                / "i-tao"
                / "i-tay"
                / "i-tsu"
                / "sgn-BE-FR"
                / "sgn-BE-NL"
                / "sgn-CH-DE"

  regular       = "art-lojban"        ; these tags match the 'langtag'
                / "cel-gaulish"       ; production, but their subtags
                / "no-bok"            ; are not extended language
                / "no-nyn"            ; or variant subtags: their meaning
                / "zh-guoyu"          ; is defined by their registration
                / "zh-hakka"          ; and all of these are deprecated
                / "zh-min"            ; in favor of a more modern
                / "zh-min-nan"        ; subtag or sequence of subtags
                / "zh-xiang"

  alphanum      = (ALPHA / DIGIT)     ; letters and numbers

Figure 1: Language Tag ABNF
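
The grammar in Figure 1 can be transcribed into a regular expression to test whether a tag is well-formed, that is, whether it matches the ABNF. The following is a minimal Python sketch under that assumption; the name `is_well_formed` is hypothetical, and the check is purely syntactic, so it does not consult the registry or reject duplicate variants or extensions. Grandfathered tags are compared as exact strings, case-insensitively.

```python
import re

# 'langtag' production from Figure 1, one line per element.
_LANGTAG = (
    r"(?:[a-z]{2,3}(?:-[a-z]{3}){0,3}|[a-z]{4}|[a-z]{5,8})"  # language ["-" extlang]
    r"(?:-[a-z]{4})?"                                       # ["-" script]
    r"(?:-(?:[a-z]{2}|[0-9]{3}))?"                          # ["-" region]
    r"(?:-(?:[a-z0-9]{5,8}|[0-9][a-z0-9]{3}))*"             # *("-" variant)
    r"(?:-[0-9a-wy-z](?:-[a-z0-9]{2,8})+)*"                 # *("-" extension)
    r"(?:-x(?:-[a-z0-9]{1,8})+)?"                           # ["-" privateuse]
)
_PRIVATEUSE = r"x(?:-[a-z0-9]{1,8})+"

# re.ASCII keeps IGNORECASE from matching non-ASCII look-alikes (e.g. the Kelvin sign).
_FLAGS = re.IGNORECASE | re.ASCII
_LANGTAG_RE = re.compile(_LANGTAG, _FLAGS)
_PRIVATEUSE_RE = re.compile(_PRIVATEUSE, _FLAGS)

# 'irregular' and 'regular' grandfathered tags, stored lowercase.
_GRANDFATHERED = frozenset(t.lower() for t in (
    "en-GB-oed", "i-ami", "i-bnn", "i-default", "i-enochian", "i-hak",
    "i-klingon", "i-lux", "i-mingo", "i-navajo", "i-pwn", "i-tao", "i-tay",
    "i-tsu", "sgn-BE-FR", "sgn-BE-NL", "sgn-CH-DE",
    "art-lojban", "cel-gaulish", "no-bok", "no-nyn", "zh-guoyu",
    "zh-hakka", "zh-min", "zh-min-nan", "zh-xiang",
))


def is_well_formed(tag: str) -> bool:
    """True if tag matches Language-Tag = langtag / privateuse / grandfathered."""
    if tag.lower() in _GRANDFATHERED:
        return True
    return bool(_LANGTAG_RE.fullmatch(tag) or _PRIVATEUSE_RE.fullmatch(tag))


if __name__ == "__main__":
    for t in ("zh-Hant-TW", "de-CH-1901", "i-klingon", "de-419-DE", "a-DE"):
        print(t, is_well_formed(t))
```

`de-419-DE` fails because it carries two region subtags, and `a-DE` fails because a single letter other than "x" or the grandfathered "i-" forms cannot be the primary subtag.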


1.1.1 Formatting of Language Tags

Language tags are case-insensitive, but consistent formatting aids readers. The recommended forms follow the conventions of the ISO standards the subtags come from:

  • ISO 639-1 recommends that language codes be written in lowercase ('mn', Mongolian).
  • ISO 15924 recommends that script codes be written in lowercase with the initial letter capitalized ('Cyrl', Cyrillic).
  • ISO 3166-1 recommends that country codes be written in uppercase ('MN', Mongolia).
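
A minimal Python sketch of these conventions, assuming the rule stated in RFC 5646 Section 2.1.1: the first subtag and everything after a singleton stay lowercase, other two-letter subtags become uppercase, other four-letter subtags become titlecase, and the rest is lowercase. The function names are hypothetical; because tags are case-insensitive, comparison simply lowercases both sides.

```python
def format_tag(tag: str) -> str:
    """Apply the recommended letter case to an ASCII language tag."""
    out = []
    after_singleton = False
    for i, sub in enumerate(tag.split("-")):
        if len(sub) == 1:                # singleton, e.g. "x", "u", "i"
            after_singleton = True
        if i == 0 or after_singleton:    # language, or extension/private-use content
            out.append(sub.lower())
        elif len(sub) == 2:              # region such as "MN"
            out.append(sub.upper())
        elif len(sub) == 4 and sub.isalpha():  # script such as "Cyrl"
            out.append(sub[0].upper() + sub[1:].lower())
        else:                            # extlang, numeric region, variants
            out.append(sub.lower())
    return "-".join(out)


def same_tag(a: str, b: str) -> bool:
    """Tags compare equal regardless of case."""
    return a.lower() == b.lower()


if __name__ == "__main__":
    print(format_tag("mn-cyrl-mn"))      # mn-Cyrl-MN
    print(format_tag("AZ-LATN-X-LATN"))  # az-Latn-x-latn
```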

1.2 Language Subtag Sources and Interpretation

The namespace of language tags and their subtags is administered by
the Internet Assigned Numbers Authority (IANA) according to the rules
in Section 5 of RFC 5646. The Language Subtag Registry
maintained by IANA is the source for valid subtags: other standards
referenced in RFC 5646 provide the source material for that registry.
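
The registry file is a sequence of records separated by "%%" lines, each record holding "Field-Name: value" lines, with continuation lines starting with whitespace. The Python sketch below parses that layout and indexes records by type and subtag. The excerpt is hand-written and abbreviated: the real file begins with a File-Date record, its records carry further fields such as Added, grandfathered and redundant records use a Tag field instead of Subtag, and some Subtag values are ranges like "qaa..qtz"; this sketch does not handle ranges. The function names are hypothetical.

```python
from __future__ import annotations

# Abbreviated, hand-written excerpt in the registry's record layout.
SAMPLE = """\
Type: language
Subtag: mn
Description: Mongolian
%%
Type: script
Subtag: Cyrl
Description: Cyrillic
%%
Type: region
Subtag: MN
Description: Mongolia
"""


def parse_registry(text: str) -> list[dict[str, list[str]]]:
    """Split the file into records; each field maps to a list of values."""
    records, fields, last = [], {}, None
    for line in text.splitlines():
        if line.strip() == "%%":                  # record separator
            if fields:
                records.append(fields)
            fields, last = {}, None
        elif line[:1] in (" ", "\t") and last is not None:
            fields[last][-1] += " " + line.strip()  # fold continuation line
        elif line.strip():
            name, _, value = line.partition(":")
            last = name.strip()
            fields.setdefault(last, []).append(value.strip())
    if fields:
        records.append(fields)
    return records


def index_subtags(records: list[dict[str, list[str]]]) -> dict:
    """Map (type, lowercased subtag) to its record; skips records without Subtag."""
    return {
        (r["Type"][0], r["Subtag"][0].lower()): r
        for r in records
        if "Type" in r and "Subtag" in r
    }


if __name__ == "__main__":
    idx = index_subtags(parse_registry(SAMPLE))
    print(idx[("script", "cyrl")]["Description"])  # ['Cyrillic']
```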

1.2.1 Primary Language Subtag

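The primary language subtag is the first subtag of a tag and cannot be omitted, except in private-use tags (starting with 'x') and some grandfathered tags (starting with 'i', such as 'i-klingon'). It is usually a two- or three-letter ISO 639 code; four letters are reserved for future use, and five to eight letters denote a registered language subtag.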
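
A minimal Python sketch that extracts the primary language subtag and classifies it by the length rules of the 'language' production in Figure 1. The function names and category labels are hypothetical; the "i" case only flags the grandfathered prefix and does not check that the whole tag is actually registered.

```python
def primary_language(tag: str) -> str:
    """Return the first subtag, lowercased."""
    return tag.split("-", 1)[0].lower()


def classify_primary(tag: str) -> str:
    first = primary_language(tag)
    if first == "x":
        return "private use"
    if first == "i":
        return "grandfathered"
    if not (first.isascii() and first.isalpha()):
        return "ill-formed"
    n = len(first)
    if n in (2, 3):
        return "ISO 639 code"
    if n == 4:
        return "reserved"
    if 5 <= n <= 8:
        return "registered"
    return "ill-formed"


if __name__ == "__main__":
    for t in ("mn-Cyrl-MN", "x-foo", "i-klingon"):
        print(t, classify_primary(t))
```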


