Package org.docx4j.org.apache.poi.util
Class StringUtil
java.lang.Object
org.docx4j.org.apache.poi.util.StringUtil
public class StringUtil
extends java.lang.Object
Title: String Utility
Description: Collection of string handling utilities
Note: none of the methods in this class deals with org.docx4j.org.apache.poi.hssf.record.ContinueRecords. For such functionality, consider using RecordInputStream.
Author:
- Andrew C. Oliver, Sergei Kozello (sergeikozello at mail.ru), Toshiaki Kamoshida (kamoshida.toshiaki at future dot co dot jp)
-
Nested Class Summary
Nested Classes
| Modifier and Type | Class | Description |
| --- | --- | --- |
| static class | StringUtil.StringsIterator | An Iterator over an array of Strings. |
-
Method Summary
| Modifier and Type | Method | Description |
| --- | --- | --- |
| static java.lang.String | format(java.lang.String message, java.lang.Object[] params) | Apply printf() like formatting to a string. |
| static int | getEncodedSize(java.lang.String value) | |
| static java.lang.String | getFromCompressedUnicode(byte[] string, int offset, int len) | Read 8 bit data (in ISO-8859-1 codepage) into a (unicode) Java String and return. |
| static java.lang.String | getFromUnicodeLE(byte[] string) | Given a byte array of 16-bit unicode characters in little endian format (most important byte last), return a Java String representation of it. |
| static java.lang.String | getFromUnicodeLE(byte[] string, int offset, int len) | Given a byte array of 16-bit unicode characters in Little Endian format (most important byte last), return a Java String representation of it. |
| static java.lang.String | getPreferredEncoding() | |
| static byte[] | getToUnicodeLE(java.lang.String string) | Convert String to 16-bit unicode characters in little endian format |
| static boolean | hasMultibyte(java.lang.String value) | check the parameter has multibyte character |
| static boolean | isUnicodeString(java.lang.String value) | Checks to see if a given String needs to be represented as Unicode |
| static void | mapMsCodepoint(int msCodepoint, int unicodeCodepoint) | |
| static java.lang.String | mapMsCodepointString(java.lang.String string) | Some strings may contain encoded characters of the unicode private use area. |
| static void | putCompressedUnicode(java.lang.String input, byte[] output, int offset) | Takes a unicode (java) string, and returns it as 8 bit data (in ISO-8859-1 codepage). |
| static void | putCompressedUnicode(java.lang.String input, LittleEndianOutput out) | |
| static void | putUnicodeLE(java.lang.String input, byte[] output, int offset) | Takes a unicode string, and returns it as little endian (most important byte last) bytes in the supplied byte array. |
| static void | putUnicodeLE(java.lang.String input, LittleEndianOutput out) | |
| static java.lang.String | readCompressedUnicode(LittleEndianInput in, int nChars) | |
| static java.lang.String | readUnicodeLE(LittleEndianInput in, int nChars) | |
| static java.lang.String | readUnicodeString(LittleEndianInput in) | InputStream in is expected to contain: ushort nChars, byte is16BitFlag, byte[]/char[] characterData. For this encoding, the is16BitFlag is always present even if nChars==0. |
| static java.lang.String | readUnicodeString(LittleEndianInput in, int nChars) | InputStream in is expected to contain: byte is16BitFlag, byte[]/char[] characterData. For this encoding, the is16BitFlag is always present even if nChars==0. |
| static void | writeUnicodeString(LittleEndianOutput out, java.lang.String value) | OutputStream out will get: ushort nChars, byte is16BitFlag, byte[]/char[] characterData. For this encoding, the is16BitFlag is always present even if nChars==0. |
| static void | writeUnicodeStringFlagAndData(LittleEndianOutput out, java.lang.String value) | OutputStream out will get: byte is16BitFlag, byte[]/char[] characterData. For this encoding, the is16BitFlag is always present even if nChars==0. |
-
Method Details
-
getFromUnicodeLE
public static java.lang.String getFromUnicodeLE(byte[] string, int offset, int len) throws java.lang.ArrayIndexOutOfBoundsException, java.lang.IllegalArgumentException
Given a byte array of 16-bit unicode characters in Little Endian format (most important byte last), return a Java String representation of it. For example, the bytes { 0x16, 0x00 } yield the character 0x16.
Parameters:
- string - the byte array to be converted
- offset - the initial offset into the byte array. It is assumed that string[ offset ] and string[ offset + 1 ] contain the first 16-bit unicode character
- len - the length of the final string
Returns:
- the converted string, never null.
Throws:
- java.lang.ArrayIndexOutOfBoundsException - if offset is out of bounds for the byte array (i.e., is negative or is greater than or equal to string.length)
- java.lang.IllegalArgumentException - if len is too large (i.e., there is not enough data in string to create a String of that length)
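The decoding described above can be approximated with the standard library alone. The following is a minimal sketch, not the StringUtil implementation: the class name Utf16LeDecodeSketch and the method decode are hypothetical, and the bounds checks are an assumption modelled on the documented exceptions.

```java
import java.nio.charset.StandardCharsets;

final class Utf16LeDecodeSketch {

    /** Decodes len 16-bit little endian characters starting at offset (hypothetical stand-in). */
    static String decode(byte[] data, int offset, int len) {
        // Mirrors the documented contract: offset must point inside the array.
        if (offset < 0 || offset >= data.length) {
            throw new ArrayIndexOutOfBoundsException("Illegal offset " + offset);
        }
        // Each character needs two bytes, so len is limited by the remaining data.
        if (len < 0 || (data.length - offset) / 2 < len) {
            throw new IllegalArgumentException("Illegal length " + len);
        }
        return new String(data, offset, len * 2, StandardCharsets.UTF_16LE);
    }

    public static void main(String[] args) {
        byte[] data = { 0x16, 0x00, 0x41, 0x00, 0x42, 0x00 };
        System.out.println(decode(data, 2, 2)); // prints "AB"
    }
}
```

Note that len counts characters, not bytes, which is why the sketch reads len * 2 bytes.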
-
getFromUnicodeLE
public static java.lang.String getFromUnicodeLE(byte[] string)
Given a byte array of 16-bit unicode characters in little endian format (most important byte last), return a Java String representation of it. For example, the bytes { 0x16, 0x00 } yield the character 0x16.
Parameters:
- string - the byte array to be converted
Returns:
- the converted string, never null
-
getToUnicodeLE
public static byte[] getToUnicodeLE(java.lang.String string)
Convert String to 16-bit unicode characters in little endian format.
Parameters:
- string - the string
Returns:
- the byte array of 16-bit unicode characters
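As an illustration of the byte layout this conversion produces, here is a small sketch using only the standard library. It does not call StringUtil; the class name Utf16LeEncodeSketch and the method encode are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class Utf16LeEncodeSketch {

    /** Encodes each char as two bytes, least significant byte first, without a byte order mark. */
    static byte[] encode(String s) {
        return s.getBytes(StandardCharsets.UTF_16LE);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(encode("AB"))); // prints "[65, 0, 66, 0]"
    }
}
```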
-
getFromCompressedUnicode
public static java.lang.String getFromCompressedUnicode(byte[] string, int offset, int len)
Read 8 bit data (in ISO-8859-1 codepage) into a (unicode) Java String and return it. (In Excel terms, read compressed 8 bit unicode as a string.)
Parameters:
- string - the byte array to read
- offset - the offset in the byte array at which to start reading
- len - the number of bytes to read from the byte array
Returns:
- the String instance generated by reading the byte array
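"Compressed unicode" here means one byte per character in ISO-8859-1. A minimal standard-library sketch of that decoding follows; the names Latin1DecodeSketch and decode are hypothetical and the code is not taken from StringUtil.

```java
import java.nio.charset.StandardCharsets;

final class Latin1DecodeSketch {

    /** Decodes len bytes starting at offset, mapping each byte 0x00..0xFF to the same code point. */
    static String decode(byte[] data, int offset, int len) {
        return new String(data, offset, len, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        byte[] data = { 0x63, 0x61, 0x66, (byte) 0xE9 }; // "caf" followed by e-acute
        System.out.println((int) decode(data, 0, 4).charAt(3)); // prints 233
    }
}
```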
-
readCompressedUnicode
-
readUnicodeString
InputStream in is expected to contain:
- ushort nChars
- byte is16BitFlag
- byte[]/char[] characterData
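The layout listed above can be parsed with a little endian ByteBuffer. This is a sketch under stated assumptions rather than the library code: it assumes the low bit of is16BitFlag selects 16-bit little endian data and that a cleared bit means ISO-8859-1 bytes. The names UnicodeStringReadSketch and read are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

final class UnicodeStringReadSketch {

    /** Reads ushort nChars, byte is16BitFlag, then the character data (hypothetical stand-in). */
    static String read(ByteBuffer in) {
        in.order(ByteOrder.LITTLE_ENDIAN);
        int nChars = in.getShort() & 0xFFFF;       // ushort nChars
        boolean is16Bit = (in.get() & 0x01) != 0;  // assumption: bit 0 set means 16-bit data
        byte[] raw = new byte[is16Bit ? nChars * 2 : nChars];
        in.get(raw);                               // byte[]/char[] characterData
        return new String(raw, is16Bit ? StandardCharsets.UTF_16LE : StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        byte[] record = { 2, 0, 0, 'H', 'i' };
        System.out.println(read(ByteBuffer.wrap(record))); // prints "Hi"
    }
}
```

The flag byte is consumed even when nChars is 0, so an empty string still occupies three bytes in this layout.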
-
readUnicodeString
InputStream in is expected to contain:
- byte is16BitFlag
- byte[]/char[] characterData
This method should be used when the nChars field is not stored as a ushort immediately before the is16BitFlag. Otherwise, readUnicodeString(LittleEndianInput) can be used.
-
writeUnicodeString
OutputStream out will get:
- ushort nChars
- byte is16BitFlag
- byte[]/char[] characterData
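A sketch of producing this layout with the standard library is shown here. It is not the StringUtil code: it assumes the 16-bit form is chosen whenever any character is above 0xFF and that the flag value 1 marks 16-bit data. The names UnicodeStringWriteSketch and write are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class UnicodeStringWriteSketch {

    /** Returns ushort nChars, byte is16BitFlag and the character data as one byte array. */
    static byte[] write(String value) {
        if (value.length() > 0xFFFF) {
            throw new IllegalArgumentException("nChars does not fit in a ushort");
        }
        // Assumption: any char above 0xFF forces the 16-bit representation.
        boolean is16Bit = value.chars().anyMatch(c -> c > 0xFF);
        byte[] data = value.getBytes(is16Bit ? StandardCharsets.UTF_16LE : StandardCharsets.ISO_8859_1);
        ByteBuffer out = ByteBuffer.allocate(2 + 1 + data.length).order(ByteOrder.LITTLE_ENDIAN);
        out.putShort((short) value.length());  // ushort nChars
        out.put((byte) (is16Bit ? 1 : 0));     // byte is16BitFlag, always written
        out.put(data);                         // byte[]/char[] characterData
        return out.array();
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(write("Hi"))); // prints "[2, 0, 0, 72, 105]"
    }
}
```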
-
writeUnicodeStringFlagAndData
OutputStream out will get:
- byte is16BitFlag
- byte[]/char[] characterData
This method should be used when the nChars field is not stored as a ushort immediately before the is16BitFlag. Otherwise, writeUnicodeString(LittleEndianOutput, String) can be used.
-
getEncodedSize
public static int getEncodedSize(java.lang.String value)
Returns:
- the number of bytes that would be written by writeUnicodeString(LittleEndianOutput, String)
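Given the writeUnicodeString layout (ushort nChars, byte is16BitFlag, character data), the size can be worked out arithmetically. The sketch below assumes the 16-bit form is used whenever a character exceeds 0xFF; the names EncodedSizeSketch and encodedSize are hypothetical and the code is not the library implementation.

```java
final class EncodedSizeSketch {

    /** 2 bytes for nChars, 1 byte for the flag, then 1 or 2 bytes per character. */
    static int encodedSize(String value) {
        boolean is16Bit = value.chars().anyMatch(c -> c > 0xFF); // assumed selection rule
        return 2 + 1 + value.length() * (is16Bit ? 2 : 1);
    }

    public static void main(String[] args) {
        System.out.println(encodedSize("Hi")); // prints 5
    }
}
```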
-
putCompressedUnicode
public static void putCompressedUnicode(java.lang.String input, byte[] output, int offset)
Takes a unicode (Java) string and writes it as 8 bit data (in ISO-8859-1 codepage) into the supplied byte array. (In Excel terms, write compressed 8 bit unicode.)
Parameters:
- input - the String containing the data to be written
- output - the byte array to which the data is to be written
- offset - the offset into the byte array at which the written data starts
-
putCompressedUnicode
-
putUnicodeLE
public static void putUnicodeLE(java.lang.String input, byte[] output, int offset)
Takes a unicode string and writes it as little endian (most important byte last) bytes into the supplied byte array. (In Excel terms, write uncompressed unicode.)
Parameters:
- input - the String containing the unicode data to be written
- output - the byte array to hold the uncompressed unicode; it should be twice the length of the String
- offset - the offset at which to start writing into the byte array
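The sizing rule for output (two bytes per character, starting at offset) can be seen in a short standard-library sketch. It does not call StringUtil; the names PutUnicodeLeSketch and put are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class PutUnicodeLeSketch {

    /** Copies the little endian 16-bit form of input into output, starting at offset. */
    static void put(String input, byte[] output, int offset) {
        byte[] bytes = input.getBytes(StandardCharsets.UTF_16LE);
        // Throws ArrayIndexOutOfBoundsException if output is too small for the data.
        System.arraycopy(bytes, 0, output, offset, bytes.length);
    }

    public static void main(String[] args) {
        byte[] output = new byte[6];
        put("AB", output, 2);
        System.out.println(Arrays.toString(output)); // prints "[0, 0, 65, 0, 66, 0]"
    }
}
```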
-
putUnicodeLE
-
readUnicodeLE
-
format
public static java.lang.String format(java.lang.String message, java.lang.Object[] params)
Apply printf() like formatting to a string. Primarily used for logging.
Parameters:
- message - the string with embedded formatting info, e.g. "This is a test %2.2"
- params - array of values to format into the string
Returns:
- The formatted string
-
getPreferredEncoding
public static java.lang.String getPreferredEncoding()
Returns:
- the encoding we want to use, currently hardcoded to ISO-8859-1
-
hasMultibyte
public static boolean hasMultibyte(java.lang.String value)
Checks whether the parameter contains a multibyte character.
Parameters:
- value - the string to check
Returns:
- true if the string has at least one multibyte character
-
isUnicodeString
public static boolean isUnicodeString(java.lang.String value)
Checks to see if a given String needs to be represented as Unicode.
Parameters:
- value - the string to check
Returns:
- true if the string needs Unicode to be represented.
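One way to express such a check with the standard library is to ask whether the string survives ISO-8859-1 encoding, the preferred encoding named on this page. This is an assumption about the intent, not the StringUtil implementation; the names NeedsUnicodeSketch and needsUnicode are hypothetical.

```java
import java.nio.charset.StandardCharsets;

final class NeedsUnicodeSketch {

    /** True when at least one character cannot be encoded in ISO-8859-1. */
    static boolean needsUnicode(String value) {
        return !StandardCharsets.ISO_8859_1.newEncoder().canEncode(value);
    }

    public static void main(String[] args) {
        System.out.println(needsUnicode("\u03a9")); // prints "true"
    }
}
```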
-
mapMsCodepointString
public static java.lang.String mapMsCodepointString(java.lang.String string)
Some strings may contain encoded characters of the unicode private use area. Currently the characters of the symbol fonts are mapped to the corresponding characters in the normal unicode range.
Parameters:
- string - the original string
Returns:
- the string with mapped characters
- See Also:
- Private Use Area (symbol), Symbol font - Unicode alternatives for Greek and special characters in HTML
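The idea of remapping private use area code points can be sketched with a lookup table. The two entries below are illustrative assumptions (Symbol font 'a' and 'b' at U+F061 and U+F062 mapped to Greek alpha and beta), not the table used by StringUtil, and the names PrivateUseMappingSketch and map are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

final class PrivateUseMappingSketch {

    // Illustrative subset only; the real mapping table is not reproduced here.
    private static final Map<Integer, Integer> MAPPING = new HashMap<>();
    static {
        MAPPING.put(0xF061, 0x03B1); // assumed: Symbol 'a' -> Greek small alpha
        MAPPING.put(0xF062, 0x03B2); // assumed: Symbol 'b' -> Greek small beta
    }

    /** Replaces every mapped code point and leaves all other characters untouched. */
    static String map(String s) {
        if (s == null || s.isEmpty()) {
            return s;
        }
        StringBuilder sb = new StringBuilder(s.length());
        s.codePoints().forEach(cp -> sb.appendCodePoint(MAPPING.getOrDefault(cp, cp)));
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(map("\uF061").equals("\u03b1")); // prints "true"
    }
}
```

A registration method in the spirit of mapMsCodepoint(int, int) would simply add further entries to such a table.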
-
mapMsCodepoint
public static void mapMsCodepoint(int msCodepoint, int unicodeCodepoint)
-