Class CodePageUtil

java.lang.Object
org.docx4j.org.apache.poi.util.CodePageUtil

public class CodePageUtil
extends java.lang.Object
Utilities for working with Microsoft CodePages.

Provides constants for the numeric codepage identifiers, along with utilities to translate them into Java character sets.

  • Field Summary

    Fields 
    Modifier and Type Field Description
    static int CP_037
    Codepage 037, a special case
    static int CP_EUC_JP
    Codepage for EUC-JP
    static int CP_EUC_KR
    Codepage for EUC-KR
    static int CP_GB18030
    Codepage for GB18030
    static int CP_GB2312
    Codepage for GB2312
    static int CP_GBK
    Codepage for GBK, aka MS936
    static int CP_ISO_2022_JP1
    Codepage for ISO-2022-JP
    static int CP_ISO_2022_JP2
    Another codepage for ISO-2022-JP
    static int CP_ISO_2022_JP3
    Yet another codepage for ISO-2022-JP
    static int CP_ISO_2022_KR
    Codepage for ISO-2022-KR
    static int CP_ISO_8859_1
    Codepage for ISO-8859-1
    static int CP_ISO_8859_2
    Codepage for ISO-8859-2
    static int CP_ISO_8859_3
    Codepage for ISO-8859-3
    static int CP_ISO_8859_4
    Codepage for ISO-8859-4
    static int CP_ISO_8859_5
    Codepage for ISO-8859-5
    static int CP_ISO_8859_6
    Codepage for ISO-8859-6
    static int CP_ISO_8859_7
    Codepage for ISO-8859-7
    static int CP_ISO_8859_8
    Codepage for ISO-8859-8
    static int CP_ISO_8859_9
    Codepage for ISO-8859-9
    static int CP_JOHAB
    Codepage for Johab
    static int CP_KOI8_R
    Codepage for KOI8-R
    static int CP_MAC_ARABIC
    Codepage for Macintosh Arabic (Java: MacArabic)
    static int CP_MAC_CENTRAL_EUROPE
    Codepage for Macintosh Central Europe (Latin-2) (Java: MacCentralEurope)
    static int CP_MAC_CHINESE_SIMPLE
    Codepage for Macintosh Chinese Simplified (Java: unknown - use EUC_CN, ISO2022_CN_GB, MS936 or cp935)
    static int CP_MAC_CHINESE_TRADITIONAL
    Codepage for Macintosh Chinese Traditional (Java: unknown - use Big5, MS950, or cp937)
    static int CP_MAC_CROATIAN
    Codepage for Macintosh Croatian (Java: MacCroatian)
    static int CP_MAC_CYRILLIC
    Codepage for Macintosh Cyrillic (Java: MacCyrillic)
    static int CP_MAC_GREEK
    Codepage for Macintosh Greek (Java: MacGreek)
    static int CP_MAC_HEBREW
    Codepage for Macintosh Hebrew (Java: MacHebrew)
    static int CP_MAC_ICELAND
    Codepage for Macintosh Iceland (Java: MacIceland)
    static int CP_MAC_JAPAN
    Codepage for Macintosh Japan (Java: unknown - use SJIS, cp942 or cp943)
    static int CP_MAC_KOREAN
    Codepage for Macintosh Korean (Java: unknown - use EUC_KR or cp949)
    static int CP_MAC_ROMAN
    Codepage for Macintosh Roman (Java: MacRoman)
    static int CP_MAC_ROMAN_BIFF23
    Codepage for Macintosh Roman as used in BIFF2 and BIFF3 files
    static int CP_MAC_ROMANIA
    Codepage for Macintosh Romanian (Java: MacRomania)
    static int CP_MAC_THAI
    Codepage for Macintosh Thai (Java: MacThai)
    static int CP_MAC_TURKISH
    Codepage for Macintosh Turkish (Java: MacTurkish)
    static int CP_MAC_UKRAINE
    Codepage for Macintosh Ukrainian (Java: MacUkraine)
    static int CP_MS949
    Codepage for MS949
    static int CP_SJIS
    Codepage for SJIS
    static int CP_UNICODE
    Codepage for Unicode
    static int CP_US_ACSII
    Codepage for US-ASCII
    static int CP_US_ASCII2
    Another codepage for US-ASCII
    static int CP_UTF16
    Codepage for UTF-16
    static int CP_UTF16_BE
    Codepage for UTF-16 big-endian
    static int CP_UTF8
    Codepage for UTF-8
    static int CP_WINDOWS_1250
    Codepage for Windows 1250
    static int CP_WINDOWS_1251
    Codepage for Windows 1251
    static int CP_WINDOWS_1252
    Codepage for Windows 1252
    static int CP_WINDOWS_1252_BIFF23
    Codepage for Windows 1252 as used in BIFF2 and BIFF3 files
    static int CP_WINDOWS_1253
    Codepage for Windows 1253
    static int CP_WINDOWS_1254
    Codepage for Windows 1254
    static int CP_WINDOWS_1255
    Codepage for Windows 1255
    static int CP_WINDOWS_1256
    Codepage for Windows 1256
    static int CP_WINDOWS_1257
    Codepage for Windows 1257
    static int CP_WINDOWS_1258
    Codepage for Windows 1258
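
The constants above are numeric codepage identifiers, and codepageToEncoding turns such a number into a Java encoding name in one of two naming styles. The following is a minimal sketch of that idea, not the library's implementation: CodepageNameSketch and encodingName are hypothetical names, and the four numbers are Microsoft's published code page identifiers, assumed here to correspond to constants such as CP_WINDOWS_1252, CP_ISO_8859_1, CP_UTF8 and CP_US_ACSII (this page does not list the constants' values). The legacy names are the JDK's historical java.lang/java.io aliases.

```java
import java.nio.charset.Charset;
import java.util.Map;

// Hypothetical stand-in for illustration only; not the docx4j/POI class.
final class CodepageNameSketch {

    // Microsoft code page identifier -> { Java NIO canonical name, legacy java.lang/java.io name }.
    // Only a handful of entries are shown; the real class covers many more codepages.
    private static final Map<Integer, String[]> NAMES = Map.of(
            1252,  new String[] {"windows-1252", "Cp1252"},
            20127, new String[] {"US-ASCII", "ASCII"},
            28591, new String[] {"ISO-8859-1", "ISO8859_1"},
            65001, new String[] {"UTF-8", "UTF8"});

    // Returns the encoding name for a codepage in the requested naming style.
    static String encodingName(int codepage, boolean javaLangFormat) {
        String[] names = NAMES.get(codepage);
        if (names == null) {
            throw new IllegalArgumentException("Unmapped codepage: " + codepage);
        }
        return javaLangFormat ? names[1] : names[0];
    }

    public static void main(String[] args) {
        for (int cp : new int[] {1252, 20127, 28591, 65001}) {
            String nio = encodingName(cp, false);
            String legacy = encodingName(cp, true);
            // Charset.forName accepts either spelling; name() reports the NIO canonical form.
            System.out.println(cp + ": " + nio + " / " + legacy
                    + " -> " + Charset.forName(legacy).name());
        }
    }
}
```

Either spelling can be passed to Charset.forName, so the choice of naming style mostly matters when the name is handed to an older API that expects the legacy form.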
  • Constructor Summary

    Constructors 
    Constructor Description
    CodePageUtil()  
  • Method Summary

    Modifier and Type Method Description
    static java.lang.String codepageToEncoding(int codepage)
    Turns a codepage number into the equivalent character encoding's name (in Java NIO canonical naming format).
    static java.lang.String codepageToEncoding(int codepage, boolean javaLangFormat)
    Turns a codepage number into the equivalent character encoding's name, in either Java NIO or Java Lang canonical naming.
    static byte[] getBytesInCodePage(java.lang.String string, int codepage)
    Converts a string into bytes, in the equivalent character encoding to the supplied codepage number.
    static java.lang.String getStringFromCodePage(byte[] string, int codepage)
    Converts the bytes into a String, based on the equivalent character encoding to the supplied codepage number.
    static java.lang.String getStringFromCodePage(byte[] string, int offset, int length, int codepage)
    Converts length bytes, starting at offset, into a String, based on the equivalent character encoding to the supplied codepage number.
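
The conversion methods above pair a codepage number with a byte/String conversion. The sketch below shows roughly what such a round trip involves using only the JDK. It is an illustration under stated assumptions, not the library's code: CodepageConversionSketch, charsetFor, toBytes and fromBytes are hypothetical names, and only three codepage numbers (Microsoft's identifiers for US-ASCII, ISO-8859-1 and UTF-8) are mapped.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for illustration only; not the docx4j/POI class.
final class CodepageConversionSketch {

    // Assumed mapping from a few Microsoft code page identifiers to JDK charsets.
    static Charset charsetFor(int codepage) {
        switch (codepage) {
            case 20127: return StandardCharsets.US_ASCII;
            case 28591: return StandardCharsets.ISO_8859_1;
            case 65001: return StandardCharsets.UTF_8;
            default:
                throw new IllegalArgumentException("Unmapped codepage: " + codepage);
        }
    }

    // Analogous in spirit to getBytesInCodePage(String, int).
    static byte[] toBytes(String text, int codepage) {
        return text.getBytes(charsetFor(codepage));
    }

    // Analogous in spirit to getStringFromCodePage(byte[], int).
    static String fromBytes(byte[] bytes, int codepage) {
        return fromBytes(bytes, 0, bytes.length, codepage);
    }

    // Analogous in spirit to getStringFromCodePage(byte[], int, int, int):
    // decodes only the slice [offset, offset + length).
    static String fromBytes(byte[] bytes, int offset, int length, int codepage) {
        return new String(bytes, offset, length, charsetFor(codepage));
    }

    public static void main(String[] args) {
        String text = "caf\u00e9"; // "café", written with an escape to avoid source-encoding issues

        byte[] utf8 = toBytes(text, 65001);   // é takes two bytes in UTF-8
        byte[] latin1 = toBytes(text, 28591); // é takes one byte in ISO-8859-1
        System.out.println(utf8.length + " bytes vs " + latin1.length + " bytes"); // 5 bytes vs 4 bytes

        // Decoding with the matching codepage restores the original text.
        System.out.println(fromBytes(utf8, 65001).equals(text)); // true

        // The offset/length variant decodes a slice of the array.
        System.out.println(fromBytes(latin1, 0, 3, 28591)); // caf
    }
}
```

Decoding with a codepage other than the one used for encoding generally produces garbled text, which is why the codepage number has to travel with the bytes.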

    Methods inherited from class java.lang.Object

    clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait