Class CodePageUtil
Provides constants for understanding numeric codepages, along with utilities to translate these into Java Character Sets.
-
Field Summary
Fields
Modifier and Type  Field                       Description
static final int   CP_037                      Codepage 037, a special case
static final int   CP_EUC_JP                   Codepage for EUC-JP
static final int   CP_EUC_KR                   Codepage for EUC-KR
static final int   CP_GB18030                  Codepage for GB18030
static final int   CP_GB2312                   Codepage for GB2312
static final int   CP_GBK                      Codepage for GBK, aka MS936
static final int   CP_ISO_2022_JP1             Codepage for ISO-2022-JP
static final int   CP_ISO_2022_JP2             Another codepage for ISO-2022-JP
static final int   CP_ISO_2022_JP3             Yet another codepage for ISO-2022-JP
static final int   CP_ISO_2022_KR              Codepage for ISO-2022-KR
static final int   CP_ISO_8859_1               Codepage for ISO-8859-1
static final int   CP_ISO_8859_2               Codepage for ISO-8859-2
static final int   CP_ISO_8859_3               Codepage for ISO-8859-3
static final int   CP_ISO_8859_4               Codepage for ISO-8859-4
static final int   CP_ISO_8859_5               Codepage for ISO-8859-5
static final int   CP_ISO_8859_6               Codepage for ISO-8859-6
static final int   CP_ISO_8859_7               Codepage for ISO-8859-7
static final int   CP_ISO_8859_8               Codepage for ISO-8859-8
static final int   CP_ISO_8859_9               Codepage for ISO-8859-9
static final int   CP_JOHAB                    Codepage for Johab
static final int   CP_KOI8_R                   Codepage for KOI8-R
static final int   CP_MAC_ARABIC               Codepage for Macintosh Arabic (Java: MacArabic)
static final int   CP_MAC_CENTRAL_EUROPE       Codepage for Macintosh Central Europe (Latin-2) (Java: MacCentralEurope)
static final int   CP_MAC_CHINESE_SIMPLE       Codepage for Macintosh Chinese Simplified (Java: unknown - use EUC_CN, ISO2022_CN_GB, MS936 or cp935)
static final int   CP_MAC_CHINESE_TRADITIONAL  Codepage for Macintosh Chinese Traditional (Java: unknown - use Big5, MS950, or cp937)
static final int   CP_MAC_CROATIAN             Codepage for Macintosh Croatian (Java: MacCroatian)
static final int   CP_MAC_CYRILLIC             Codepage for Macintosh Cyrillic (Java: MacCyrillic)
static final int   CP_MAC_GREEK                Codepage for Macintosh Greek (Java: MacGreek)
static final int   CP_MAC_HEBREW               Codepage for Macintosh Hebrew (Java: MacHebrew)
static final int   CP_MAC_ICELAND              Codepage for Macintosh Iceland (Java: MacIceland)
static final int   CP_MAC_JAPAN                Codepage for Macintosh Japan (Java: unknown - use SJIS, cp942 or cp943)
static final int   CP_MAC_KOREAN               Codepage for Macintosh Korean (Java: unknown - use EUC_KR or cp949)
static final int   CP_MAC_ROMAN                Codepage for Macintosh Roman (Java: MacRoman)
static final int   CP_MAC_ROMAN_BIFF23
static final int   CP_MAC_ROMANIA              Codepage for Macintosh Romanian (Java: MacRomania)
static final int   CP_MAC_THAI                 Codepage for Macintosh Thai (Java: MacThai)
static final int   CP_MAC_TURKISH              Codepage for Macintosh Turkish (Java: MacTurkish)
static final int   CP_MAC_UKRAINE              Codepage for Macintosh Ukrainian (Java: MacUkraine)
static final int   CP_MS949                    Codepage for MS949
static final int   CP_SJIS                     Codepage for SJIS
static final int   CP_UNICODE                  Codepage for Unicode
static final int   CP_US_ACSII                 Codepage for US-ASCII
static final int   CP_US_ASCII2                Another codepage for US-ASCII
static final int   CP_UTF16                    Codepage for UTF-16
static final int   CP_UTF16_BE                 Codepage for UTF-16 big-endian
static final int   CP_UTF8                     Codepage for UTF-8
static final int   CP_WINDOWS_1250             Codepage for Windows 1250
static final int   CP_WINDOWS_1251             Codepage for Windows 1251
static final int   CP_WINDOWS_1252             Codepage for Windows 1252
static final int   CP_WINDOWS_1252_BIFF23
static final int   CP_WINDOWS_1253             Codepage for Windows 1253
static final int   CP_WINDOWS_1254             Codepage for Windows 1254
static final int   CP_WINDOWS_1255             Codepage for Windows 1255
static final int   CP_WINDOWS_1256             Codepage for Windows 1256
static final int   CP_WINDOWS_1257             Codepage for Windows 1257
static final int   CP_WINDOWS_1258             Codepage for Windows 1258
-
Constructor Summary
Constructors
CodePageUtil()
-
Method Summary
Modifier and Type  Method and Description
static String      codepageToEncoding(int codepage)
                   Turns a codepage number into the equivalent character encoding's name (in Java NIO canonical naming format).
static String      codepageToEncoding(int codepage, boolean javaLangFormat)
                   Turns a codepage number into the equivalent character encoding's name, in either Java NIO or Java Lang canonical naming.
static String      cp950ToString(byte[] data, int offset, int lengthInBytes)
                   This tries to convert a LE byte array in cp950 (Microsoft's dialect of Big5) to a String.
static byte[]      getBytesInCodePage(String string, int codepage)
                   Converts a string into bytes, in the equivalent character encoding to the supplied codepage number.
static String      getStringFromCodePage(byte[] string, int codepage)
                   Converts the bytes into a String, based on the equivalent character encoding to the supplied codepage number.
static String      getStringFromCodePage(byte[] string, int offset, int length, int codepage)
                   Converts the bytes into a String, based on the equivalent character encoding to the supplied codepage number.
-
Field Details
-
DOUBLE_BYTE_CHARSETS
-
CP_037
public static final int CP_037
Codepage 037, a special case
-
CP_SJIS
public static final int CP_SJIS
Codepage for SJIS
-
CP_GBK
public static final int CP_GBK
Codepage for GBK, aka MS936
-
CP_MS949
public static final int CP_MS949
Codepage for MS949
-
CP_UTF16
public static final int CP_UTF16
Codepage for UTF-16
-
CP_UTF16_BE
public static final int CP_UTF16_BE
Codepage for UTF-16 big-endian
-
CP_WINDOWS_1250
public static final int CP_WINDOWS_1250
Codepage for Windows 1250
-
CP_WINDOWS_1251
public static final int CP_WINDOWS_1251
Codepage for Windows 1251
-
CP_WINDOWS_1252
public static final int CP_WINDOWS_1252
Codepage for Windows 1252
-
CP_WINDOWS_1252_BIFF23
public static final int CP_WINDOWS_1252_BIFF23
-
CP_WINDOWS_1253
public static final int CP_WINDOWS_1253
Codepage for Windows 1253
-
CP_WINDOWS_1254
public static final int CP_WINDOWS_1254
Codepage for Windows 1254
-
CP_WINDOWS_1255
public static final int CP_WINDOWS_1255
Codepage for Windows 1255
-
CP_WINDOWS_1256
public static final int CP_WINDOWS_1256
Codepage for Windows 1256
-
CP_WINDOWS_1257
public static final int CP_WINDOWS_1257
Codepage for Windows 1257
-
CP_WINDOWS_1258
public static final int CP_WINDOWS_1258
Codepage for Windows 1258
-
CP_JOHAB
public static final int CP_JOHAB
Codepage for Johab
-
CP_MAC_ROMAN
public static final int CP_MAC_ROMAN
Codepage for Macintosh Roman (Java: MacRoman)
-
CP_MAC_ROMAN_BIFF23
public static final int CP_MAC_ROMAN_BIFF23
-
CP_MAC_JAPAN
public static final int CP_MAC_JAPAN
Codepage for Macintosh Japan (Java: unknown - use SJIS, cp942 or cp943)
-
CP_MAC_CHINESE_TRADITIONAL
public static final int CP_MAC_CHINESE_TRADITIONAL
Codepage for Macintosh Chinese Traditional (Java: unknown - use Big5, MS950, or cp937)
-
CP_MAC_KOREAN
public static final int CP_MAC_KOREAN
Codepage for Macintosh Korean (Java: unknown - use EUC_KR or cp949)
-
CP_MAC_ARABIC
public static final int CP_MAC_ARABIC
Codepage for Macintosh Arabic (Java: MacArabic)
-
CP_MAC_HEBREW
public static final int CP_MAC_HEBREW
Codepage for Macintosh Hebrew (Java: MacHebrew)
-
CP_MAC_GREEK
public static final int CP_MAC_GREEK
Codepage for Macintosh Greek (Java: MacGreek)
-
CP_MAC_CYRILLIC
public static final int CP_MAC_CYRILLIC
Codepage for Macintosh Cyrillic (Java: MacCyrillic)
-
CP_MAC_CHINESE_SIMPLE
public static final int CP_MAC_CHINESE_SIMPLE
Codepage for Macintosh Chinese Simplified (Java: unknown - use EUC_CN, ISO2022_CN_GB, MS936 or cp935)
-
CP_MAC_ROMANIA
public static final int CP_MAC_ROMANIA
Codepage for Macintosh Romanian (Java: MacRomania)
-
CP_MAC_UKRAINE
public static final int CP_MAC_UKRAINE
Codepage for Macintosh Ukrainian (Java: MacUkraine)
-
CP_MAC_THAI
public static final int CP_MAC_THAI
Codepage for Macintosh Thai (Java: MacThai)
-
CP_MAC_CENTRAL_EUROPE
public static final int CP_MAC_CENTRAL_EUROPE
Codepage for Macintosh Central Europe (Latin-2) (Java: MacCentralEurope)
-
CP_MAC_ICELAND
public static final int CP_MAC_ICELAND
Codepage for Macintosh Iceland (Java: MacIceland)
-
CP_MAC_TURKISH
public static final int CP_MAC_TURKISH
Codepage for Macintosh Turkish (Java: MacTurkish)
-
CP_MAC_CROATIAN
public static final int CP_MAC_CROATIAN
Codepage for Macintosh Croatian (Java: MacCroatian)
-
CP_US_ACSII
public static final int CP_US_ACSII
Codepage for US-ASCII
-
CP_KOI8_R
public static final int CP_KOI8_R
Codepage for KOI8-R
-
CP_ISO_8859_1
public static final int CP_ISO_8859_1
Codepage for ISO-8859-1
-
CP_ISO_8859_2
public static final int CP_ISO_8859_2
Codepage for ISO-8859-2
-
CP_ISO_8859_3
public static final int CP_ISO_8859_3
Codepage for ISO-8859-3
-
CP_ISO_8859_4
public static final int CP_ISO_8859_4
Codepage for ISO-8859-4
-
CP_ISO_8859_5
public static final int CP_ISO_8859_5
Codepage for ISO-8859-5
-
CP_ISO_8859_6
public static final int CP_ISO_8859_6
Codepage for ISO-8859-6
-
CP_ISO_8859_7
public static final int CP_ISO_8859_7
Codepage for ISO-8859-7
-
CP_ISO_8859_8
public static final int CP_ISO_8859_8
Codepage for ISO-8859-8
-
CP_ISO_8859_9
public static final int CP_ISO_8859_9
Codepage for ISO-8859-9
-
CP_ISO_2022_JP1
public static final int CP_ISO_2022_JP1
Codepage for ISO-2022-JP
-
CP_ISO_2022_JP2
public static final int CP_ISO_2022_JP2
Another codepage for ISO-2022-JP
-
CP_ISO_2022_JP3
public static final int CP_ISO_2022_JP3
Yet another codepage for ISO-2022-JP
-
CP_ISO_2022_KR
public static final int CP_ISO_2022_KR
Codepage for ISO-2022-KR
-
CP_EUC_JP
public static final int CP_EUC_JP
Codepage for EUC-JP
-
CP_EUC_KR
public static final int CP_EUC_KR
Codepage for EUC-KR
-
CP_GB2312
public static final int CP_GB2312
Codepage for GB2312
-
CP_GB18030
public static final int CP_GB18030
Codepage for GB18030
-
CP_US_ASCII2
public static final int CP_US_ASCII2
Another codepage for US-ASCII
-
CP_UTF8
public static final int CP_UTF8
Codepage for UTF-8
-
CP_UNICODE
public static final int CP_UNICODE
Codepage for Unicode
-
-
Constructor Details
-
CodePageUtil
public CodePageUtil()
-
-
Method Details
-
getBytesInCodePage
public static byte[] getBytesInCodePage(String string, int codepage) throws UnsupportedEncodingException
Converts a string into bytes, in the equivalent character encoding to the supplied codepage number.
- Parameters:
string - The string to convert
codepage - The codepage number
- Throws:
UnsupportedEncodingException
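The effect of choosing one codepage over another can be illustrated with the JDK alone. The following is a minimal sketch and does not call CodePageUtil: the class EncodeSketch and its two-entry lookup are hypothetical. Codepage 65001 is UTF-8 as stated in the codepageToEncoding description; 28591 is the Windows codepage identifier for ISO-8859-1, and treating it as the value of CP_ISO_8859_1 is an assumption.

```java
import java.io.UnsupportedEncodingException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class EncodeSketch {
    // Hypothetical lookup covering two codepages only; not the library's table.
    static Charset charsetFor(int codepage) throws UnsupportedEncodingException {
        switch (codepage) {
            case 65001: return StandardCharsets.UTF_8;      // UTF-8
            case 28591: return StandardCharsets.ISO_8859_1; // assumed value for ISO-8859-1
            default: throw new UnsupportedEncodingException("codepage " + codepage);
        }
    }

    // Same parameter shape as getBytesInCodePage(String, int).
    static byte[] encode(String string, int codepage) throws UnsupportedEncodingException {
        return string.getBytes(charsetFor(codepage));
    }

    public static void main(String[] args) throws Exception {
        String text = "caf\u00e9"; // "cafe" with an acute accent on the e
        System.out.println(encode(text, 28591).length); // 4: one byte per character
        System.out.println(encode(text, 65001).length); // 5: the accented e takes two bytes
    }
}
```

The same string yields different byte sequences depending on the codepage, which is why the codepage number has to travel with the bytes.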
-
getStringFromCodePage
public static String getStringFromCodePage(byte[] string, int codepage) throws UnsupportedEncodingException
Converts the bytes into a String, based on the equivalent character encoding to the supplied codepage number.
- Parameters:
string - The bytes of the string to convert
codepage - The codepage number
- Throws:
UnsupportedEncodingException
-
getStringFromCodePage
public static String getStringFromCodePage(byte[] string, int offset, int length, int codepage) throws UnsupportedEncodingException
Converts the bytes into a String, based on the equivalent character encoding to the supplied codepage number.
- Parameters:
string - The bytes of the string to convert
codepage - The codepage number
- Throws:
UnsupportedEncodingException
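Decoding a sub-range of a larger buffer can be sketched with the JDK alone. This example does not call CodePageUtil: the class DecodeRangeSketch is hypothetical, it recognises only codepage 65001 (UTF-8), and the interpretation of offset and length as a start index and a byte count is an assumption based on the parameter names.

```java
import java.io.UnsupportedEncodingException;
import java.nio.charset.StandardCharsets;

final class DecodeRangeSketch {
    // Hypothetical stand-in: only codepage 65001 (UTF-8) is recognised here.
    static String decode(byte[] data, int offset, int length, int codepage)
            throws UnsupportedEncodingException {
        if (codepage != 65001) {
            throw new UnsupportedEncodingException("codepage " + codepage);
        }
        // Only the bytes in [offset, offset + length) are decoded.
        return new String(data, offset, length, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws Exception {
        byte[] payload = "h\u00e9llo".getBytes(StandardCharsets.UTF_8); // 6 bytes
        byte[] record = new byte[2 + payload.length + 1]; // 2-byte header, 1 trailing byte
        System.arraycopy(payload, 0, record, 2, payload.length);

        String decoded = decode(record, 2, payload.length, 65001);
        System.out.println(decoded.equals("h\u00e9llo")); // true
    }
}
```

Note that the length counts bytes, not characters: the five-character string above occupies six bytes in UTF-8.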
-
codepageToEncoding
Turns a codepage number into the equivalent character encoding's name (in Java NIO canonical naming format).
- Parameters:
codepage - The codepage number
- Returns:
The character encoding's name. If the codepage number is 65001, the encoding name is "UTF-8". All other positive numbers are mapped to their Java NIO names, normally either "windows-" followed by the number, e.g. "windows-1251", or "cp" followed by the number, e.g. "cp1252" for codepage 1252.
- Throws:
UnsupportedEncodingException - if the specified codepage is less than zero.
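The rules in this description can be sketched as a small lookup. This is a reduced illustration, not the library's implementation: the class CodepageNameSketch is hypothetical and contains only the three cases the description names explicitly (65001, a "windows-" name, and the "cp" fallback); the real table is assumed to have many more entries.

```java
import java.io.UnsupportedEncodingException;

final class CodepageNameSketch {
    // Hypothetical, reduced mapping that follows the description's examples only.
    static String encodingName(int codepage) throws UnsupportedEncodingException {
        if (codepage < 0) {
            // Negative codepage numbers are rejected, as the Throws clause states.
            throw new UnsupportedEncodingException("Codepage number may not be negative: " + codepage);
        }
        switch (codepage) {
            case 65001: return "UTF-8";
            case 1251:  return "windows-1251";
            default:    return "cp" + codepage; // fallback: "cp" followed by the number
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(encodingName(65001)); // UTF-8
        System.out.println(encodingName(1251));  // windows-1251
        System.out.println(encodingName(1252));  // cp1252
        try {
            encodingName(-1);
        } catch (UnsupportedEncodingException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```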
-
codepageToEncoding
public static String codepageToEncoding(int codepage, boolean javaLangFormat) throws UnsupportedEncodingException
Turns a codepage number into the equivalent character encoding's name, in either Java NIO or Java Lang canonical naming.
- Parameters:
codepage - The codepage number
javaLangFormat - Should Java Lang or Java NIO naming be used?
- Returns:
The character encoding's name, in either Java Lang format (e.g. Cp1251, ISO8859_5) or Java NIO format (e.g. windows-1252, ISO-8859-9)
- Throws:
UnsupportedEncodingException - if the specified codepage is less than zero.
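The difference between the two naming schemes can be observed with the JDK alone. The sketch below does not call codepageToEncoding; the class NamingSchemeSketch is hypothetical. It relies on two documented JDK behaviours: Charset.name() returns the java.nio canonical name, and InputStreamReader.getEncoding() returns the historical (java.io and java.lang) name when the charset has one.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class NamingSchemeSketch {
    // Canonical name in the java.nio scheme, e.g. "ISO-8859-1".
    static String nioName(Charset cs) {
        return cs.name();
    }

    // Historical name in the java.io / java.lang scheme, e.g. "ISO8859_1".
    static String langName(Charset cs) {
        try (InputStreamReader reader =
                 new InputStreamReader(new ByteArrayInputStream(new byte[0]), cs)) {
            return reader.getEncoding();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Charset latin1 = StandardCharsets.ISO_8859_1;
        System.out.println(nioName(latin1));  // ISO-8859-1
        System.out.println(langName(latin1)); // ISO8859_1
        System.out.println(nioName(StandardCharsets.UTF_8));  // UTF-8
        System.out.println(langName(StandardCharsets.UTF_8)); // UTF8
    }
}
```

Both names refer to the same charset; which one is appropriate depends on the API that will consume the name.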
-
cp950ToString
Tries to convert a little-endian byte array in cp950 (Microsoft's dialect of Big5) to a String. Microsoft zero-pads ASCII characters in this format, and those padding bytes are dropped. There may be room for improvement in this conversion.
- Parameters:
data
offset
lengthInBytes
- Returns:
Decoded String
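One way to picture the zero-padding rule is the sketch below. It is not the library's implementation, and its byte layout is an assumption: every character is taken to occupy two bytes with the low byte first, so ASCII appears as the character followed by a zero byte, and a double-byte character appears with its trail byte before its lead byte. The class LittleEndianBig5Sketch is hypothetical, and the JDK's plain Big5 charset stands in for cp950.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class LittleEndianBig5Sketch {
    // Assumed layout: two bytes per character, low byte first.
    static String decode(byte[] data, int offset, int lengthInBytes, Charset doubleByte) {
        StringBuilder sb = new StringBuilder();
        int end = offset + lengthInBytes;
        for (int i = offset; i + 1 < end; i += 2) {
            byte low = data[i];
            byte high = data[i + 1];
            if (high == 0) {
                // Zero-padded single byte: keep the character, drop the padding.
                sb.append((char) (low & 0xFF));
            } else {
                // Double-byte character: restore lead-then-trail order before decoding.
                sb.append(new String(new byte[] {high, low}, doubleByte));
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] ascii = {0x50, 0x00, 0x4F, 0x00, 0x49, 0x00}; // "POI", zero-padded
        // The charset argument is not consulted for pure ASCII input.
        System.out.println(decode(ascii, 0, ascii.length, StandardCharsets.ISO_8859_1)); // POI

        if (Charset.isSupported("Big5")) { // Big5 is not guaranteed on every JVM
            Charset big5 = Charset.forName("Big5");
            byte[] pair = "\u4e2d".getBytes(big5);         // lead byte, trail byte
            byte[] mixed = {0x41, 0x00, pair[1], pair[0]}; // "A", then the swapped pair
            System.out.println(decode(mixed, 0, mixed.length, big5).equals("A\u4e2d"));
        }
    }
}
```

A trailing odd byte is ignored in this sketch; how the real method treats malformed input is not stated in the description.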
-