Package com.ibm.icu.impl
Class IntTrieBuilder
java.lang.Object
com.ibm.icu.impl.TrieBuilder
com.ibm.icu.impl.IntTrieBuilder
Builder class to manipulate and generate a trie.
This is useful for ICU data in primitive types.
Provides a compact way to store information that is indexed by Unicode
values, such as character properties, types, keyboard values, etc. This is
very useful when you have a block of Unicode data that contains significant
values while the rest of the Unicode data is unused in the application or
when you have a lot of redundancy, such as where all 21,000 Han ideographs
have the same value. Lookup is also much faster than with a hash table.
A trie of any primitive data type serves two purposes:
- Fast access of the indexed values.
- Smaller memory footprint.
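The two purposes above can be illustrated with a minimal two-stage table. This is only a sketch under stated assumptions, not the ICU4J implementation: the class name TwoStageTableSketch, the 32-entry block size, and the on-demand block allocation are hypothetical choices made for illustration.

```java
import java.util.Arrays;

// Hypothetical two-stage lookup table; NOT the ICU4J implementation.
final class TwoStageTableSketch {
    static final int SHIFT = 5;                    // assumed: 32 code points per block
    static final int BLOCK_LENGTH = 1 << SHIFT;
    static final int MASK = BLOCK_LENGTH - 1;
    static final int MAX_CODE_POINT = 0x10FFFF;

    private final int initialValue;
    // index[c >> SHIFT] is the offset of the data block that holds c.
    private final int[] index = new int[(MAX_CODE_POINT >> SHIFT) + 1];
    // Block 0 is shared by every range that still has only the initial value.
    private int[] data = new int[BLOCK_LENGTH];
    private int dataLength = BLOCK_LENGTH;

    TwoStageTableSketch(int initialValue) {
        this.initialValue = initialValue;
        Arrays.fill(data, initialValue);
    }

    // Lookup is two array reads, with no hashing or probing.
    int get(int c) {
        return data[index[c >> SHIFT] + (c & MASK)];
    }

    void set(int c, int value) {
        int i = c >> SHIFT;
        if (index[i] == 0) {                       // still pointing at the shared block
            if (dataLength + BLOCK_LENGTH > data.length) {
                data = Arrays.copyOf(data, data.length * 2);
            }
            Arrays.fill(data, dataLength, dataLength + BLOCK_LENGTH, initialValue);
            index[i] = dataLength;                 // give this range its own block
            dataLength += BLOCK_LENGTH;
        }
        data[index[i] + (c & MASK)] = value;
    }

    int dataLength() {
        return dataLength;
    }

    public static void main(String[] args) {
        TwoStageTableSketch table = new TwoStageTableSketch(0);
        table.set(0x41, 1);      // LATIN CAPITAL LETTER A
        table.set(0x1F600, 2);   // a supplementary code point
        System.out.println(table.get(0x41) + " " + table.get(0x4E00) + " " + table.dataLength());
    }
}
```

In this toy setup only the two touched ranges get their own data block; every other range, including all Han ideographs, keeps sharing block 0.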
-
Nested Class Summary
Nested classes/interfaces inherited from class com.ibm.icu.impl.TrieBuilder
TrieBuilder.DataManipulate
-
Field Summary
Modifier and Type, Field:
protected int[] m_data_
protected int m_initialValue_
private int m_leadUnitValue_
Fields inherited from class com.ibm.icu.impl.TrieBuilder
BMP_INDEX_LENGTH_, DATA_BLOCK_LENGTH, DATA_GRANULARITY_, INDEX_SHIFT_, m_dataCapacity_, m_dataLength_, m_index_, m_indexLength_, m_isCompacted_, m_isLatin1Linear_, m_map_, MASK_, MAX_DATA_LENGTH_, MAX_INDEX_LENGTH_, OPTIONS_DATA_IS_32_BIT_, OPTIONS_INDEX_SHIFT_, OPTIONS_LATIN1_IS_LINEAR_, SHIFT_, SURROGATE_BLOCK_COUNT_
-
Constructor Summary
Constructor, Description:
IntTrieBuilder(int[] aliasdata, int maxdatalength, int initialvalue, int leadunitvalue, boolean latin1linear): Constructs a build table.
IntTrieBuilder(IntTrieBuilder table): Copy constructor.
-
Method Summary
Modifier and Type, Method, Description:
private int allocDataBlock()
private void compact(boolean overlap): Compact a folded build-time trie.
private void fillBlock(int block, int start, int limit, int value, boolean overwrite)
private static final int findSameDataBlock(int[] data, int dataLength, int otherBlock, int step): Find the same data block.
private final void fold(TrieBuilder.DataManipulate manipulate): Fold the normalization data for supplementary code points into a compact area on top of the BMP part of the trie index, with the lead surrogates indexing this compact area.
private int getDataBlock(int ch): No error checking for illegal arguments.
int getValue(int ch): Gets 32-bit data from the table data.
int getValue(int ch, boolean[] inBlockZero): Gets 32-bit data from the table data.
IntTrie serialize(TrieBuilder.DataManipulate datamanipulate, Trie.DataManipulate triedatamanipulate): Serializes the build table with 32-bit data.
int serialize(OutputStream os, boolean reduceTo16Bits, TrieBuilder.DataManipulate datamanipulate): Serializes the build table to an output stream.
boolean setRange(int start, int limit, int value, boolean overwrite): Sets a value in a range of code points [start..limit).
boolean setValue(int ch, int value): Sets 32-bit data in the table data.
Methods inherited from class com.ibm.icu.impl.TrieBuilder
equal_int, findSameIndexBlock, findUnusedBlocks, isInZeroBlock
-
Field Details
-
m_data_
protected int[] m_data_
-
m_initialValue_
protected int m_initialValue_
-
m_leadUnitValue_
private int m_leadUnitValue_
-
-
Constructor Details
-
IntTrieBuilder
public IntTrieBuilder(IntTrieBuilder table)
Copy constructor.
-
IntTrieBuilder
public IntTrieBuilder(int[] aliasdata, int maxdatalength, int initialvalue, int leadunitvalue, boolean latin1linear)
Constructs a build table.
Parameters:
aliasdata - data to be filled into the table
maxdatalength - maximum data length allowed in the table
initialvalue - initial data value
latin1linear - whether Latin-1 is to be linear
-
-
Method Details
-
getValue
public int getValue(int ch)
Gets 32-bit data from the table data.
Parameters:
ch - code point for which data is to be retrieved
Returns:
the 32-bit data
-
getValue
public int getValue(int ch, boolean[] inBlockZero)
Gets 32-bit data from the table data.
Parameters:
ch - code point for which data is to be retrieved
inBlockZero - output parameter; inBlockZero[0] is set to true if the code point maps into block zero, otherwise false
Returns:
the 32-bit data value
-
setValue
public boolean setValue(int ch, int value)
Sets 32-bit data in the table data.
Parameters:
ch - code point for which data is to be set
value - value to set
Returns:
true if the value was set; false if the table has already been compacted
-
serialize
public IntTrie serialize(TrieBuilder.DataManipulate datamanipulate, Trie.DataManipulate triedatamanipulate)
Serializes the build table with 32-bit data.
Parameters:
datamanipulate - builder raw fold method implementation
triedatamanipulate - result trie fold method
Returns:
a new trie
-
serialize
public int serialize(OutputStream os, boolean reduceTo16Bits, TrieBuilder.DataManipulate datamanipulate) throws IOException
Serializes the build table to an output stream. Compacts the build-time trie after all values are set, and then writes the serialized form onto an output stream. After this, this build-time Trie can only be serialized again and/or closed; no further values can be added. This function is the rough equivalent of utrie_serialize() in ICU4C.
Parameters:
os - the output stream to which the serialized trie will be written. If null, the function still returns the size of the serialized Trie.
reduceTo16Bits - if true, reduce the data size to 16 bits. The resulting serialized form can then be used to create a CharTrie.
datamanipulate - builder raw fold method implementation
Returns:
the number of bytes written to the output stream
Throws:
IOException
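The "null stream still returns the size" contract and the reduceTo16Bits flag can be sketched as follows. This is a hedged illustration only: SerializeSizeSketch and its flat int[] payload are hypothetical, and the byte layout here is not the actual serialized trie format.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;

// Hypothetical size-or-write pattern; NOT the ICU4J serialized format.
final class SerializeSizeSketch {
    // Writes each value as a 16-bit or 32-bit unit and returns the byte count.
    // When os is null, nothing is written and only the size is computed.
    static int serialize(OutputStream os, int[] values, boolean reduceTo16Bits)
            throws IOException {
        int size = values.length * (reduceTo16Bits ? 2 : 4);
        if (os == null) {
            return size;
        }
        DataOutputStream out = new DataOutputStream(os);
        for (int v : values) {
            if (reduceTo16Bits) {
                out.writeShort(v);   // keeps only the low 16 bits
            } else {
                out.writeInt(v);
            }
        }
        out.flush();
        return size;
    }

    // Returns {size with null stream, returned size, bytes actually written}.
    static int[] demo() {
        int[] values = {1, 2, 3};
        try {
            int sizeOnly = serialize(null, values, false);
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            int written = serialize(bytes, values, true);
            return new int[] {sizeOnly, written, bytes.size()};
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        int[] r = demo();
        System.out.println(r[0] + " " + r[1] + " " + r[2]);
    }
}
```

Calling with a null stream first is one way a caller could size a buffer before doing the real write.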
-
setRange
public boolean setRange(int start, int limit, int value, boolean overwrite)
Sets a value in a range of code points [start..limit). All code points c with start <= c < limit will get the value if overwrite is true or if the old value is 0.
Parameters:
start - the first code point to get the value
limit - one past the last code point to get the value
value - the value
overwrite - flag for whether old non-initial values are to be overwritten
Returns:
false if a failure occurred (illegal argument or data array overrun)
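The half-open range and the overwrite rule can be modeled on a flat array. This is a minimal sketch, assuming a plain int[] in place of the trie's blocks; RangeFillSketch is a hypothetical name and the code is not taken from ICU4J.

```java
import java.util.Arrays;

// Hypothetical flat-array model of the range semantics; NOT the ICU4J code.
final class RangeFillSketch {
    // Sets data[c] for start <= c < limit. Returns false on illegal arguments.
    static boolean setRange(int[] data, int start, int limit, int value, boolean overwrite) {
        if (start < 0 || start > limit || limit > data.length) {
            return false;
        }
        for (int c = start; c < limit; c++) {
            if (overwrite || data[c] == 0) {   // keep old non-zero values unless overwriting
                data[c] = value;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        int[] data = new int[8];
        setRange(data, 2, 5, 7, false);   // fills indices 2, 3, 4
        setRange(data, 4, 7, 9, false);   // index 4 keeps its old value
        System.out.println(Arrays.toString(data));
    }
}
```

Note that limit itself is never written, which matches "one past the last code point to get the value".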
-
allocDataBlock
private int allocDataBlock()
-
getDataBlock
private int getDataBlock(int ch)
No error checking for illegal arguments.
Parameters:
ch - code point to look for
Returns:
-1 if no new data block is available (out of memory in data array)
-
compact
private void compact(boolean overlap)
Compact a folded build-time trie.
The compaction
- removes blocks that are identical with earlier ones
- overlaps adjacent blocks as much as possible (if overlap == true)
- moves blocks in steps of the data granularity
- moves and overlaps blocks that overlap with multiple values in the overlap region
It does not
- try to move and overlap blocks that are not already adjacent
Parameters:
overlap - flag
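The first compaction step, removing blocks that are identical with earlier ones, can be sketched in isolation. The code below is an assumption-laden illustration, not the ICU4J algorithm: BlockDedupSketch is a hypothetical class, blocks are searched in whole-block steps, and the overlapping of adjacent blocks is deliberately left out.

```java
// Hypothetical duplicate-block removal; NOT the ICU4J compaction code.
final class BlockDedupSketch {
    // Returns the offset of a block in data[0..dataLength) that equals the
    // block starting at otherBlock, or -1 if there is none.
    static int findSameBlock(int[] data, int dataLength, int otherBlock, int blockLength) {
        for (int block = 0; block + blockLength <= dataLength; block += blockLength) {
            boolean same = true;
            for (int i = 0; i < blockLength && same; i++) {
                same = data[block + i] == data[otherBlock + i];
            }
            if (same) {
                return block;
            }
        }
        return -1;
    }

    // Compacts data in place, rewrites index entries to the new block
    // offsets, and returns the new data length.
    static int compact(int[] index, int[] data, int dataLength, int blockLength) {
        int[] map = new int[dataLength / blockLength];   // old block number -> new offset
        int newLength = 0;
        for (int old = 0; old < dataLength; old += blockLength) {
            // Only the already compacted prefix is searched.
            int same = findSameBlock(data, newLength, old, blockLength);
            if (same >= 0) {
                map[old / blockLength] = same;           // reuse the earlier block
            } else {
                System.arraycopy(data, old, data, newLength, blockLength);
                map[old / blockLength] = newLength;
                newLength += blockLength;
            }
        }
        for (int i = 0; i < index.length; i++) {
            index[i] = map[index[i] / blockLength];
        }
        return newLength;
    }

    public static void main(String[] args) {
        int[] data = {0, 0, 5, 5, 0, 0, 5, 6};
        int[] index = {0, 2, 4, 6, 2};
        int newLength = compact(index, data, data.length, 2);
        System.out.println(newLength + " " + java.util.Arrays.toString(index));
    }
}
```

Here the third block duplicates the first, so its index entry is redirected and the data shrinks by one block.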
-
findSameDataBlock
private static final int findSameDataBlock(int[] data, int dataLength, int otherBlock, int step)
Find the same data block.
Parameters:
data - array
dataLength -
otherBlock -
step -
-
fold
private final void fold(TrieBuilder.DataManipulate manipulate)
Fold the normalization data for supplementary code points into a compact area on top of the BMP part of the trie index, with the lead surrogates indexing this compact area.
Duplicate the index values for lead surrogates: from inside the BMP area, where some may be overridden with folded values, to just after the BMP area, where they can be retrieved for code point lookups.
Parameters:
manipulate - fold implementation
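Lead surrogates can index the folded area because, in UTF-16, each lead surrogate covers a contiguous run of 1024 supplementary code points. The sketch below shows only that mapping, using standard java.lang.Character methods; LeadSurrogateSketch is a hypothetical helper and does not reproduce the fold implementation.

```java
// Hypothetical helper showing the lead-surrogate mapping; NOT the fold code.
final class LeadSurrogateSketch {
    // Lead (high) surrogate of a supplementary code point.
    static char leadFor(int supplementary) {
        return Character.highSurrogate(supplementary);
    }

    // First supplementary code point covered by the given lead surrogate.
    static int firstCodePointFor(char lead) {
        return Character.toCodePoint(lead, Character.MIN_LOW_SURROGATE);
    }

    public static void main(String[] args) {
        int c = 0x1F600;
        char lead = leadFor(c);
        int first = firstCodePointFor(lead);
        // All code points in [first, first + 0x400) share this lead surrogate.
        System.out.println(Integer.toHexString(lead) + " covers "
                + Integer.toHexString(first) + ".." + Integer.toHexString(first + 0x3FF));
    }
}
```

A per-lead-surrogate entry is therefore enough to locate the data for a whole 1024 code point range.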
-
fillBlock
private void fillBlock(int block, int start, int limit, int value, boolean overwrite)
-