Supported Encodings

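This section lists the legacy encodings supported by Rosette, followed by the supported Unicode encodings. Encodings are grouped by language or script. Each entry includes the alternative names that Rosette treats as equivalent to the encoding, that is, names that map to the same code points.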

Supported platforms for Rosette Core Library for Unicode include Windows, Linux, Solaris, AIX, HP-UX, and MacOS.
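
As a rough illustration of how alternative names can resolve to a single encoding, the sketch below builds a case-insensitive alias index from a few rows of the tables in this section. The function name `canonical_encoding` and the normalization rule (ignore case, spaces, hyphens, and underscores) are assumptions made for this example; they are not part of the Rosette API.

```python
import re

# A few rows copied from the tables in this section:
# encoding name -> names listed as equivalent to it.
ALIASES = {
    "ISO 8859-1": ["ISO-8859-1", "Latin1", "IBM819", "iso-ir-100"],
    "ISO 8859-6": ["ISO_8859-6", "Arabic", "iso-ir-127", "ECMA-114", "ASMO-708"],
    "CP936": ["CP936", "GBK"],
    "Big5+": ["Big5+", "Big5Plus"],
}


def _key(name: str) -> str:
    """Normalize a name: lowercase, drop spaces, hyphens, and underscores."""
    return re.sub(r"[\s_-]+", "", name).lower()


# Reverse index: normalized alias -> encoding name (hypothetical helper data).
_INDEX = {}
for canonical, names in ALIASES.items():
    for name in [canonical, *names]:
        _INDEX[_key(name)] = canonical


def canonical_encoding(name: str) -> str:
    """Return the table's encoding name for an alias, or raise LookupError."""
    try:
        return _INDEX[_key(name)]
    except KeyError:
        raise LookupError(f"unknown encoding name: {name!r}") from None
```

With this index, `canonical_encoding("latin1")` and `canonical_encoding("IBM819")` both resolve to `ISO 8859-1`, while an unlisted name raises `LookupError`.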

Arabic Language Encoding

| Encoding | Other Names | Vendor/Standard Body | Other Rosette Names |
|---|---|---|---|
| CP10004 | Macintosh Arabic | Microsoft & IBM | CP10004 |
| CP1256 | | Microsoft & IBM | CP1256 |
| CP20420 | (with fullwidth Latin & punctuation) | Microsoft & IBM | CP20420 |
| CP28596 | Arabic Alphabet (ISO) | Microsoft & IBM | CP28596 |
| CP708 | ASMO708 | Microsoft & IBM | CP708 |
| CP720 | Transparent ASMO | Microsoft & IBM | CP720 |
| CP864 | | Microsoft & IBM | CP864 |
| ISO 8859-6 | ISOLatinArabic | International or National Standard | ISO_8859-6, Arabic, iso-ir-127, ECMA-114, ASMO-708 |

Baltic Language Encoding

| Encoding | Other Names | Vendor/Standard Body | Other Rosette Names |
|---|---|---|---|
| CP28594 | Baltic Alphabet (ISO) | Microsoft & IBM | CP28594 |
| CP775 | | Microsoft & IBM | CP775 |
| ISO 8859-4 | Latin4 | International or National Standard | ISO-8859-4, Latin4, iso-ir-110 |
| ISO 8859-13 | Latin7 | International or National Standard | ISO-8859-13, Latin7 |

Celtic Language Encoding

| Encoding | Other Names | Vendor/Standard Body | Other Rosette Names |
|---|---|---|---|
| ISO 8859-14 | Latin8 | International or National Standard | ISO-8859-14, Latin8, iso-ir-199 |

Chinese Language Encoding

| Encoding | Other Names | Vendor/Standard Body | Other Rosette Names |
|---|---|---|---|
| ChineseAutoDetect | For encodings, see ChineseAutodetect | Rosette Autodetect | ChineseAutoDetect |
| HKSCS | | International or National Standard | HKSCS |
| ISO 2022-CN | | International or National Standard | ISO-2022-CN |
| GB 18030 | | International or National Standard | GB18030 |

Chinese, Simplified

| Encoding | Other Names | Vendor/Standard Body | Other Rosette Names |
|---|---|---|---|
| CCSID 935 | | IBM | CCSID-935, CCSID935 |
| EUC-CN | GB2312, EUC-SC | Unix | GB2312 |
| GB2312 | EUC-CN, EUC-SC | International or National Standard | GB2312 |
| HZ-GB-2312 | HZ-GB-2312 | International or National Standard | HZ, HZ-GB-2312 |
| CP936 | GBK | Microsoft & IBM | CP936, GBK |
| MacChineseSimplified | | Macintosh | MacChineseSimplified |

Chinese, Traditional

| Encoding | Other Names | Vendor/Standard Body | Other Rosette Names |
|---|---|---|---|
| CCSID 937 | | IBM | CCSID-937, CCSID937 |
| CNS-11643-1986 | EUC-TW | International or National Standard | CNS-11643-1986 |
| CNS-11643-1992 | EUC-TW | International or National Standard | CNS-11643, CNS-11643-1992 |
| EUC-TW | CNS-11643-1986, CNS-11643-1992 | Unix | CNS-11643, CNS-11643-1992 |
| GB12345 | | International or National Standard | GB12345 |
| Big5 | | International or National Standard | Big5 |
| Big5+ | | International or National Standard | Big5+, Big5Plus |
| CP10002 | Macintosh Traditional Chinese | Microsoft & IBM | CP10002 |
| CP950 | | Microsoft & IBM | CP950 |
| MacChineseTraditional | | Macintosh | MacChineseTraditional |

Croatian Language Encoding

| Encoding | Other Names | Vendor/Standard Body | Other Rosette Names |
|---|---|---|---|
| MacCroatian | | Macintosh | MacCroatian |

Cyrillic Language Encoding

| Encoding | Other Names | Vendor/Standard Body | Other Rosette Names |
|---|---|---|---|
| CP10007 | Macintosh Cyrillic | Microsoft & IBM | CP10007 |
| CP1251 | MS Windows Cyrillic (Slavic) | Microsoft & IBM | CP1251 |
| CP20866 | Cyrillic Alphabet, KOI8-R | Microsoft & IBM | CP20866 |
| CP20880 | (with fullwidth Latin & punctuation) | Microsoft & IBM | CP20880 |
| CP21025 | (with fullwidth Latin & punctuation) | Microsoft & IBM | CP21025 |
| CP21866 | Ukrainian KOI8-RU | Microsoft & IBM | CP21866 |
| CP28595 | Cyrillic Alphabet (ISO) | Microsoft & IBM | CP28595 |
| CP855 | IBM Cyrillic | Microsoft & IBM | CP855 |
| CP866 | MS DOS Russian | Microsoft & IBM | CP866 |
| ISO 8859-5 | ISOLatinCyrillic | International or National Standard | ISOLatinCyrillic |
| MacCyrillic | | Macintosh | MacCyrillic |
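
The Cyrillic entries above are not interchangeable: the same bytes decode to different text under different code pages, which is why the exact encoding name matters. The sketch below demonstrates this with Python's standard codecs rather than Rosette. It assumes that Python's `cp1251`, `koi8_r`, and `cp866` codecs correspond to CP1251, CP20866 (KOI8-R), and CP866 in the table.

```python
# Encode a Russian word with one code page, then decode the same
# bytes with each of three Cyrillic code pages.
word = "Привет"
raw = word.encode("cp1251")

decoded = {codec: raw.decode(codec) for codec in ("cp1251", "koi8_r", "cp866")}

for codec, text in decoded.items():
    # Only the cp1251 line reproduces the original word; the other two
    # decode without error but produce different characters.
    print(f"{codec:7} -> {text}")
```

Because all three code pages assign a character to every byte used here, the wrong choice fails silently instead of raising an error.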

Devanagari Language Encoding

| Encoding | Other Names | Vendor/Standard Body | Other Rosette Names |
|---|---|---|---|
| MacDevanagari | | Macintosh | MacDevanagari |
| ISCII-Devanagari | | Indian Standards | x-iscii-de, windows-57002 |

Greek Language Encoding

| Encoding | Other Names | Vendor/Standard Body | Other Rosette Names |
|---|---|---|---|
| CP10006 | Macintosh Greek 1 | Microsoft & IBM | CP10006 |
| CP1253 | | Microsoft & IBM | CP1253 |
| CP20423 | (with fullwidth Latin & punctuation) | Microsoft & IBM | CP20423 |
| CP28597 | Greek Alphabet (ISO) | Microsoft & IBM | CP28597 |
| CP737 | | Microsoft & IBM | CP737 |
| CP869 | IBM Modern Greek | Microsoft & IBM | CP869 |
| ISO 8859-7 | ISOLatinGreek | International or National Standard | ISO-8859-7, Greek |
| MacGreek | | Macintosh | MacGreek |

Gujarati Language Encoding

| Encoding | Other Names | Vendor/Standard Body | Other Rosette Names |
|---|---|---|---|
| MacGujarati | | Macintosh | MacGujarati |
| ISCII-Gujarati | | Indian Standards | x-iscii-gu, windows-57010 |

Gurmukhi Language Encoding

| Encoding | Other Names | Vendor/Standard Body | Other Rosette Names |
|---|---|---|---|
| CP10010 | Macintosh Gurmukhi | Microsoft & IBM | CP10010 |
| MacGurmukhi | | Macintosh | MacGurmukhi |

Hebrew Language Encoding

| Encoding | Other Names | Vendor/Standard Body | Other Rosette Names |
|---|---|---|---|
| CP10005 | Macintosh Hebrew | Microsoft & IBM | CP10005 |
| CP1255 | | Microsoft & IBM | CP1255 |
| CP28598 | Hebrew Alphabet (ISO) | Microsoft & IBM | CP28598 |
| CP38598 | ASCII + Hebrew and private use characters | Microsoft & IBM | CP38598 |
| CP862 | | Microsoft & IBM | CP862 |
| ISO 8859-8 | ISOLatinHebrew | International or National Standard | Hebrew |

Icelandic Language Encoding

| Encoding | Other Names | Vendor/Standard Body | Other Rosette Names |
|---|---|---|---|
| CP10079 | Macintosh Icelandic | Microsoft & IBM | CP10079 |
| CP861 | MS DOS Icelandic | Microsoft & IBM | CP861 |
| MacIcelandic | | Macintosh | MacIcelandic |

Japanese Language Encoding

| Encoding | Other Names | Vendor/Standard Body | Other Rosette Names |
|---|---|---|---|
| CCSID 1027 | EBCDIK | Microsoft & IBM | CCSID-1027, CCSID1027 |
| CCSID 290 | EBCDIK | Microsoft & IBM | CCSID-290, CCSID290 |
| CCSID 930 | | IBM | CCSID-930, CCSID930 |
| CCSID 939 | | IBM | CCSID-939, CCSID939 |
| CCSID 942 | | Microsoft & IBM | CCSID-942, CCSID942 |
| CP10001 | Macintosh Japanese | Microsoft & IBM | CP10001 |
| CP20290 | (full/half width Latin & halfwidth katakana) | Microsoft & IBM | CP20290 |
| CP21027 | (halfwidth Latin, halfwidth katakana & private use) | Microsoft & IBM | CP21027 |
| EUC-JP | | Unix | EUC-JP, EUC-J |
| EUC-JP-JISROMAN | | Unix | EUC-JP-JISROMAN |
| ISO 2022-JP | | International or National Standard | ISO-2022-JP |
| JapaneseAutoDetect | For encodings, see JapaneseAutodetect | Rosette Autodetect | JapaneseAutoDetect |
| JIS_X_0201 | HalfWidthKatakana | International or National Standard | JIS_X_0201, IBM897 |
| JIS_X_0208 | | International or National Standard | JIS_X_0208 |
| MacJapanese | | Macintosh | MacJapanese |
| Shift-JISMS | MS_Kanji, CP932 | Microsoft & IBM | Shift-JIS, SJIS |
| Shift_JIS-2004 | ShiftJISX0213 | Microsoft & IBM | Shift_JISX0213, Shift-X |
| Shift-JIS78 | Shift-JIS without MS/IBM extensions | Unix/Macintosh | Shift-JIS78, SJIS78 |
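
This page does not describe how JapaneseAutoDetect chooses among the encodings above. Purely to illustrate the general idea of encoding detection, the sketch below tries a fixed list of Python standard codecs in order and returns the first one that decodes the input without error. The function `guess_japanese_encoding` and the candidate order are assumptions for this example. This is not Rosette's algorithm, and trial decoding alone can misidentify short or ambiguous input.

```python
from typing import Optional

# Candidates, strictest first: ISO-2022-JP is 7-bit with escape sequences;
# EUC-JP and Shift-JIS use different multibyte layouts that often, but not
# always, reject each other's byte sequences.
CANDIDATES = ("iso2022_jp", "euc_jp", "shift_jis")


def guess_japanese_encoding(data: bytes) -> Optional[str]:
    """Return the first candidate codec that decodes data without error."""
    for codec in CANDIDATES:
        try:
            data.decode(codec)
        except UnicodeDecodeError:
            continue
        return codec
    return None


sample = "日本語"
for codec in CANDIDATES:
    # Each encoded form of the sample is attributed to the codec that made it.
    print(codec, guess_japanese_encoding(sample.encode(codec)))
```

Plain ASCII input is reported as the first candidate, since it decodes cleanly under all three; a practical detector would need additional evidence to rank such cases.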

Korean Language Encoding

| Encoding | Other Names | Vendor/Standard Body | Other Rosette Names |
|---|---|---|---|
| CP10003 | Macintosh Korean | Microsoft & IBM | CP10003 |
| CP1361 | Korean Johab (based on KSC 5861-1992) | Microsoft & IBM | CP1361 |
| CP949 | | Microsoft & IBM | CP949 |
| EUC-KR | KS_C_5861-1992 | Unix | EUC-KR, EUC-K |
| ISO 2022-KR | KS_C_5601-1987 | International or National Standard | ISO-2022-KR |
| Johab | | International or National Standard | Johab |
| KoreanAutoDetect | See KoreanAutodetect | Rosette Autodetect | KoreanAutoDetect |
| KS_C_5601-1987 | ISO-2022-KR | International or National Standard | ISO-2022-KR |
| KS_C_5861-1992 | EUC-KR | International or National Standard | KS_C_5861-1992 |
| MacKorean | | Macintosh | MacKorean |

Latin Language Encoding

| Encoding | Other Names | Vendor/Standard Body | Other Rosette Names |
|---|---|---|---|
| CP10000 | Macintosh Roman | Microsoft & IBM | CP10000 |
| CP10029 | Macintosh Latin2 | Microsoft & IBM | CP10029 |
| CP10082 | (with mathematical symbols) | Microsoft & IBM | CP10082 |
| CCSID 1047 | EBCDIC (for IBM Open Systems platform) | Microsoft & IBM | CCSID1047 |
| CP20261 | (with private use characters) | Microsoft & IBM | CP20261 |
| CP20269 | | Microsoft & IBM | CP20269 |
| CP20273 | (with fullwidth Latin & punctuation) | Microsoft & IBM | CP20273 |
| CP20277 | (with fullwidth Latin & punctuation) | Microsoft & IBM | CP20277 |
| CP20278 | (with fullwidth Latin & punctuation) | Microsoft & IBM | CP20278 |
| CP20280 | (with fullwidth Latin & punctuation) | Microsoft & IBM | CP20280 |
| CP20284 | (with fullwidth Latin & punctuation) | Microsoft & IBM | CP20284 |
| CP20285 | (with fullwidth Latin & punctuation) | Microsoft & IBM | CP20285 |
| CP20297 | (with fullwidth Latin & punctuation) | Microsoft & IBM | CP20297 |
| CP20833 | (with fullwidth Latin & punctuation) | Microsoft & IBM | CP20833 |
| CP20871 | (with fullwidth Latin & punctuation) | Microsoft & IBM | CP20871 |
| CP28591 | ASCII + Latin accented vowels | Microsoft & IBM | CP28591 |
| CP28593 | Latin 3 Alphabet (ISO) | Microsoft & IBM | CP28593 |
| CP850 | MS DOS Multilingual, MS-DOS Latin1 | Microsoft & IBM | CP850 |
| CP870 | (with fullwidth punctuation) | Microsoft & IBM | CP870 |
| ISO 8859-1 | Latin1 | International or National Standard | ISO-8859-1, Latin1, IBM819, iso-ir-100 |
| ISO 8859-15 | Latin1 + Euro symbol & accented characters | International or National Standard | ISO-8859-15, Latin9 |
| ISO 8859-2 | ISO_8859-2, Latin2, iso-ir-101 | International or National Standard | Latin2, ISO-8859-2 |
| MacRoman | | Macintosh | MacRoman |
| NextStep | | Apple/Next | NextStep |
| Adobe-Standard-Encoding | (used in PS printers) | Other Corporate | Adobe-Standard-Encoding |

Latin, Canadian French

| Encoding | Other Names | Vendor/Standard Body | Other Rosette Names |
|---|---|---|---|
| CP863 | MS DOS Canadian French | Microsoft & IBM | CP863 |

Latin, Central European

| Encoding | Other Names | Vendor/Standard Body | Other Rosette Names |
|---|---|---|---|
| CP28592 | Central European Alphabet (ISO) | Microsoft & IBM | CP28592 |
| MacCentralEuropean | | Macintosh | MacCentralEuropean |

Latin, Eastern European

| Encoding | Other Names | Vendor/Standard Body | Other Rosette Names |
|---|---|---|---|
| CP1250 | | Microsoft & IBM | CP1250 |

Latin, Esperanto

| Encoding | Other Names | Vendor/Standard Body | Other Rosette Names |
|---|---|---|---|
| CP20905 | (with fullwidth Latin & punctuation) | Microsoft & IBM | CP20905 |

Latin, Portuguese

| Encoding | Other Names | Vendor/Standard Body | Other Rosette Names |
|---|---|---|---|
| CP860 | MS DOS Portuguese | Microsoft & IBM | CP860 |

Latin, Southeast European

| Encoding | Other Names | Vendor/Standard Body | Other Rosette Names |
|---|---|---|---|
| ISO 8859-3 | Latin3 | International or National Standard | Latin3, ISO-8859-3 |

Latin, US English

| Encoding | Other Names | Vendor/Standard Body | Other Rosette Names |
|---|---|---|---|
| ASCII | US-ASCII, CP367 | International or National Standard | ASCII |
| CP037 | EBCDIC | Microsoft & IBM | CP037 |
| CP1026 | EBCDIC | Microsoft & IBM | CP1026 |
| CP1252 | MS Windows Latin1 (ANSI) | Microsoft & IBM | CP1252 |
| CP20105 | US ASCII | Microsoft & IBM | CP20105 |
| CP437 | MS-DOS Latin US | Microsoft & IBM | CP437 |
| CP500 | EBCDIC | Microsoft & IBM | CP500 |
| CP875 | EBCDIC | Microsoft & IBM | CP875 |

Malayalam Language Encoding

| Encoding | Other Names | Vendor/Standard Body | Other Rosette Names |
|---|---|---|---|
| CP10017 | Macintosh Malayalam | Microsoft & IBM | CP10017 |
| ISCII-Malayalam | | Indian Standards | x-iscii-ma, windows-57009 |

Nordic Language Encoding

| Encoding | Other Names | Vendor/Standard Body | Other Rosette Names |
|---|---|---|---|
| CP865 | MS DOS Nordic | Microsoft & IBM | CP865 |
| ISO 8859-10 | Latin6 | International or National Standard | Latin6, ISO-8859-10, iso-ir-157 |

Romanian Language Encoding

| Encoding | Other Names | Vendor/Standard Body | Other Rosette Names |
|---|---|---|---|
| MacRomanian | | Macintosh | MacRomanian |

Slavic Language Encoding

| Encoding | Other Names | Vendor/Standard Body | Other Rosette Names |
|---|---|---|---|
| CP852 | MS DOS Slavic | Microsoft & IBM | CP852 |

Symbol Encoding

| Encoding | Other Names | Vendor/Standard Body | Other Rosette Names |
|---|---|---|---|
| Adobe-Symbol-Encoding | (used in PS printers) | Adobe | Adobe-Symbol-Encoding |
| Adobe-Zapf-Dingbats-Encoding | (used in PS printers) | Adobe | Adobe-Zapf-Dingbats-Encoding |
| CP10008 | Macintosh RSymbol (Right-left symbol) | Microsoft & IBM | CP10008 |
| MacDingbats | | Macintosh | MacDingbats |
| MacSymbol | | Macintosh | MacSymbol |

Thai Language Encoding

| Encoding | Other Names | Vendor/Standard Body | Other Rosette Names |
|---|---|---|---|
| CP20838 | (with fullwidth Latin & punctuation) | Microsoft & IBM | CP20838 |
| CP874 | IBMThai | Microsoft & IBM | CP874 |
| ISO 8859-11 (draft) | ISOLatinThai | International or National Standard | Thai |
| MacThai | | Macintosh | MacThai |

Turkish Language Encoding

| Encoding | Other Names | Vendor/Standard Body | Other Rosette Names |
|---|---|---|---|
| CP10081 | Macintosh Turkish | Microsoft & IBM | CP10081 |
| CP1254 | | Microsoft & IBM | CP1254 |
| CP28599 | Turkish (ISO) | Microsoft & IBM | CP28599 |
| CP857 | IBM Turkish | Microsoft & IBM | CP857 |
| ISO 8859-9 | Latin5 | International or National Standard | ISO-8859-9, Latin5, iso-ir-148 |
| MacTurkish | | Macintosh | MacTurkish |

Ukrainian Language Encoding

| Encoding | Other Names | Vendor/Standard Body | Other Rosette Names |
|---|---|---|---|
| MacUkrainian | | Macintosh | MacUkrainian |

Vietnamese Language Encoding

| Encoding | Other Names | Vendor/Standard Body | Other Rosette Names |
|---|---|---|---|
| CP1258 | | Microsoft & IBM | CP1258 |

Unicode Encodings Supported

| Encoding | Other Names | Vendor/Standard Body | Other Rosette Names |
|---|---|---|---|
| BMP | | Unicode | BMP, Unicode20:big-endian |
| Java | (way of representing Unicode chars in ASCII) | Sun | Java, Unicode20:BOM:Java, Unicode11:Java, Unicode11:BOM:Java |
| UCS2 | ISO-10646-UCS2, UTF16 | Unicode | Unicode |
| Unicode Big-endian | | Unicode | big-endian, Unicode20:big-endian, Unicode11:big-endian, Unicode11:BOM:big-endian |
| Unicode Little-endian | | Unicode | little-endian, Unicode20:little-endian, Unicode11:little-endian, Unicode11:BOM:little-endian |
| Unicode11-UCS2 | | Unicode | Unicode11-UCS2, Unicode11:UCS2, Unicode11:BOM:UCS2 |
| Unicode11-UTF7 | | Unicode | Unicode11-UTF7, Unicode11:UTF7, Unicode11:BOM:UTF7 |
| Unicode11-UTF8 | | Unicode | Unicode11-UTF8, Unicode11:UTF8, Unicode11:BOM:UTF8 |
| UTF7 | | Unicode | UTF7, Unicode20:BOM:UTF7 |
| UTF8 | | Unicode | UTF8, Unicode20:BOM:UTF8 |
| UTF32 | | Unicode | UTF32 |
| UTF-EBCDIC | | Unicode | UTF8-EBCDIC, UTF-8-EBCDIC |
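
Several names above refer to byte order (big-endian, little-endian) and to a byte order mark (BOM). In general Unicode usage, these variants carry the same code points and differ in how the 16-bit units are laid out in bytes and in whether a marker precedes the text. The sketch below shows the difference using Python's standard codecs. How each Python codec corresponds to a specific Rosette name is an assumption and is not stated on this page.

```python
import codecs

text = "A\u00e9"  # "A" followed by e-acute (U+00E9)

big = text.encode("utf-16-be")     # most significant byte of each unit first
little = text.encode("utf-16-le")  # least significant byte of each unit first
utf8 = text.encode("utf-8")        # variable-length, no byte-order issue

# A byte order mark (U+FEFF) can be prepended so a reader can tell the order.
with_bom = codecs.BOM_UTF16_BE + big

print(big.hex(" "))      # 00 41 00 e9
print(little.hex(" "))   # 41 00 e9 00
print(utf8.hex(" "))     # 41 c3 a9
print(with_bom.decode("utf-16") == text)  # True
```

The generic `utf-16` decoder reads the mark and selects the byte order itself, which is why the last line recovers the original text.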