java.lang.Object
  extended by morfologik.fsa.FSA
    extended by morfologik.fsa.CFSA
public final class CFSA
extends FSA
CFSA (Compact Finite State Automaton) binary format implementation. This is a
slightly reorganized version of FSA5 that produces smaller automata
at the cost of a minor performance penalty.
Note: new code should serialize to CFSA2 instead.
The encoding of the automaton body is as follows.
```
---- FSA header (standard)

Byte                            Description
      +-+-+-+-+-+-+-+-+\
    0 | | | | | | | | | +------ '\'
      +-+-+-+-+-+-+-+-+/
      +-+-+-+-+-+-+-+-+\
    1 | | | | | | | | | +------ 'f'
      +-+-+-+-+-+-+-+-+/
      +-+-+-+-+-+-+-+-+\
    2 | | | | | | | | | +------ 's'
      +-+-+-+-+-+-+-+-+/
      +-+-+-+-+-+-+-+-+\
    3 | | | | | | | | | +------ 'a'
      +-+-+-+-+-+-+-+-+/
      +-+-+-+-+-+-+-+-+\
    4 | | | | | | | | | +------ version (fixed 0xc5)
      +-+-+-+-+-+-+-+-+/
      +-+-+-+-+-+-+-+-+\
    5 | | | | | | | | | +------ filler character
      +-+-+-+-+-+-+-+-+/
      +-+-+-+-+-+-+-+-+\
    6 | | | | | | | | | +------ annot character
      +-+-+-+-+-+-+-+-+/
      +-+-+-+-+-+-+-+-+\
    7 |C|C|C|C|G|G|G|G| +------ C - node data size (ctl), G - address size (gotoLength)
      +-+-+-+-+-+-+-+-+/
      +-+-+-+-+-+-+-+-+\
 8-32 | | | | | | | | | +------ labels mapped for type (1) of arc encoding.
      : : : : : : : : : |
      +-+-+-+-+-+-+-+-+/
```
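A minimal sketch of reading this header in Java, assuming the byte layout exactly as drawn. The class and field names are hypothetical (this is not the library's constructor), and the number of label-mapping bytes is passed in by the caller because the "8-32" range in the diagram does not pin down the table size unambiguously.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;

/** Hypothetical reader for the header bytes drawn above; names are illustrative only. */
final class CfsaHeaderSketch {
    final byte filler;         // byte 5: filler character
    final byte annotation;     // byte 6: annot character
    final int nodeDataLength;  // byte 7, high nibble (ctl); 0 without NUMBERS
    final int gotoLength;      // byte 7, low nibble (gtl)
    final byte[] labelMapping; // labels for arcs of type (1)

    CfsaHeaderSketch(InputStream stream, int labelMappingSize) throws IOException {
        DataInputStream in = new DataInputStream(stream);

        // Bytes 0-3: the magic sequence '\', 'f', 's', 'a'.
        byte[] magic = new byte[4];
        in.readFully(magic);
        if (magic[0] != '\\' || magic[1] != 'f' || magic[2] != 's' || magic[3] != 'a') {
            throw new IOException("Missing \\fsa magic bytes");
        }

        // Byte 4: version, fixed at 0xc5 for this format.
        int version = in.readUnsignedByte();
        if (version != 0xc5) {
            throw new IOException("Unexpected version: 0x" + Integer.toHexString(version));
        }

        // Bytes 5 and 6: filler and annotation characters.
        filler = in.readByte();
        annotation = in.readByte();

        // Byte 7: CCCCGGGG, split into node data size and address size.
        int sizes = in.readUnsignedByte();
        nodeDataLength = sizes >>> 4;
        gotoLength = sizes & 0x0f;

        // Mapped labels follow; the table size is an assumption supplied by the caller.
        labelMapping = new byte[labelMappingSize];
        in.readFully(labelMapping);
    }
}
```

Splitting byte 7 with a shift and a mask keeps both nibbles independent, so an automaton without NUMBERS simply reports a node data size of zero.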
```
---- Start of a node; only if automaton was compiled with NUMBERS option.

Byte
       +-+-+-+-+-+-+-+-+\
     0 | | | | | | | | | \  LSB
       +-+-+-+-+-+-+-+-+  +
     1 | | | | | | | | |  |      number of strings recognized
       +-+-+-+-+-+-+-+-+  +----- by the automaton starting
       : : : : : : : : :  |      from this node.
       +-+-+-+-+-+-+-+-+  +
 ctl-1 | | | | | | | | | /  MSB
       +-+-+-+-+-+-+-+-+/
```
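Read as a little-endian integer (LSB at byte 0, MSB at byte ctl-1), the node header can be decoded as below. This is a hypothetical helper, not library code: it assumes a node identifier is the offset of the node's first data byte in the arcs array, and that nodeDataLength is at most 4 so the count fits in an int.

```java
/** Hypothetical helper for the per-node count stored when NUMBERS is set. */
final class NodeCountSketch {
    /**
     * Decodes nodeDataLength bytes starting at offset node, least significant
     * byte first. Assumes nodeDataLength <= 4 so the value fits in an int.
     */
    static int readRightLanguageCount(byte[] arcs, int node, int nodeDataLength) {
        int count = 0;
        // Start from the MSB at node + ctl - 1 and shift in bytes down to the LSB.
        for (int i = nodeDataLength - 1; i >= 0; i--) {
            count = (count << 8) | (arcs[node + i] & 0xff);
        }
        return count;
    }
}
```

For example, the three bytes 0x2c, 0x01, 0x00 decode to 0x00012c, that is, 300 sequences.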
```
---- A vector of node's arcs. Conditional format, depending on flags.

1) NEXT bit set, mapped arc label.

               +--------------- arc's label mapped in M bits if M's field value > 0
               | +------------- node pointed to is next
               | | +----------- the last arc of the node
        _______| | | +--------- the arc is final
       /       | | | |
      +-+-+-+-+-+-+-+-+\
    0 |M|M|M|M|M|1|L|F| +------ flags + (M) index of the mapped label.
      +-+-+-+-+-+-+-+-+/

2) NEXT bit set, label separate.

               +--------------- arc's label stored separately (M's field is zero).
               | +------------- node pointed to is next
               | | +----------- the last arc of the node
               | | | +--------- the arc is final
               | | | |
      +-+-+-+-+-+-+-+-+\
    0 |0|0|0|0|0|1|L|F| +------ flags
      +-+-+-+-+-+-+-+-+/
      +-+-+-+-+-+-+-+-+\
    1 | | | | | | | | | +------ label
      +-+-+-+-+-+-+-+-+/

3) NEXT bit not set. Full arc.

                 +------------- node pointed to is next
                 | +----------- the last arc of the node
                 | | +--------- the arc is final
                 | | |
      +-+-+-+-+-+-+-+-+\
    0 |A|A|A|A|A|0|L|F| +------ flags + (A) address field, lower bits
      +-+-+-+-+-+-+-+-+/
      +-+-+-+-+-+-+-+-+\
    1 | | | | | | | | | +------ label
      +-+-+-+-+-+-+-+-+/
      : : : : : : : : :
      +-+-+-+-+-+-+-+-+\
gtl-1 |A|A|A|A|A|A|A|A| +------ address, continuation (MSB)
      +-+-+-+-+-+-+-+-+/
```
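The three layouts can be told apart from the flags byte alone. The sketch below is a hypothetical decoder, not the CFSA source, and rests on several assumptions read off the diagram rather than stated in the text: the bit positions as drawn (F = 0x01, L = 0x02, NEXT = 0x04, M or A in the top five bits); a full arc occupying 1 + gtl bytes (gtl address bytes, the first shared with the flags, plus the label byte); address continuation bytes following the label, least significant first; and a NEXT target that begins right after the node's last arc, since a node's arcs form one contiguous vector.

```java
/**
 * Hypothetical decoder for the three arc layouts (illustrative, not the library code).
 * Bit positions are read off the diagram: F = bit 0, L = bit 1, NEXT = bit 2,
 * M (or A) = bits 3-7.
 */
final class CfsaArcSketch {
    static final int FINAL = 0x01; // F: the arc is final
    static final int LAST = 0x02;  // L: the last arc of the node
    static final int NEXT = 0x04;  // NEXT: target node follows

    private final byte[] arcs;
    private final byte[] labelMapping;
    private final int gtl;

    CfsaArcSketch(byte[] arcs, byte[] labelMapping, int gtl) {
        this.arcs = arcs;
        this.labelMapping = labelMapping;
        this.gtl = gtl;
    }

    private int flags(int arc) { return arcs[arc] & 0xff; }

    boolean isFinal(int arc) { return (flags(arc) & FINAL) != 0; }
    boolean isLast(int arc) { return (flags(arc) & LAST) != 0; }
    boolean isNextSet(int arc) { return (flags(arc) & NEXT) != 0; }

    /** Type (1): NEXT set and a non-zero M field indexing labelMapping. */
    boolean isLabelCompressed(int arc) {
        return isNextSet(arc) && (flags(arc) >>> 3) != 0;
    }

    byte label(int arc) {
        return isLabelCompressed(arc)
                ? labelMapping[flags(arc) >>> 3] // type (1)
                : arcs[arc + 1];                 // types (2) and (3)
    }

    /** Arc size in bytes; 1 + gtl for type (3) is an assumption. */
    int size(int arc) {
        if (isNextSet(arc)) {
            return isLabelCompressed(arc) ? 1 : 2;
        }
        return 1 + gtl;
    }

    /** Next arc of the same node, or 0 after the node's last arc. */
    int nextArc(int arc) {
        return isLast(arc) ? 0 : arc + size(arc);
    }

    /** Offset of the node this arc points to. */
    int target(int arc) {
        if (isNextSet(arc)) {
            // Assumed: the target starts right after this node's last arc.
            while (!isLast(arc)) {
                arc += size(arc);
            }
            return arc + size(arc);
        }
        // Assumed: continuation bytes follow the label, least significant first;
        // byte 0 contributes its five A bits as the lowest address bits.
        int address = 0;
        for (int i = gtl; i >= 2; i--) {
            address = (address << 8) | (arcs[arc + i] & 0xff);
        }
        return (address << 5) | (flags(arc) >>> 3);
    }
}
```

With gtl = 2, for instance, a type (3) arc whose first byte carries A = 12 and whose continuation byte is 9 points to offset (9 << 5) | 12 = 300.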
| Field Summary | | |
|---|---|---|
| byte[] | arcs | An array of bytes with the internal representation of the automaton. |
| static int | BIT_FINAL_ARC | Bitmask indicating that an arc corresponds to the last character of a sequence available when building the automaton. |
| static int | BIT_LAST_ARC | Bitmask indicating that an arc is the last one of the node's list and the following one belongs to another node. |
| static int | BIT_TARGET_NEXT | Bitmask indicating that the target node of this arc follows it in the compressed automaton structure (no goto field). |
| int | gtl | Number of bytes each address takes in full, expanded form (goto length). |
| byte[] | labelMapping | Label mapping for arcs of type (1) (see class documentation). |
| int | nodeDataLength | The length of the node header structure (if the automaton was compiled with NUMBERS option). |
| static byte | VERSION | Automaton header version value. |
| Constructor Summary | |
|---|---|
| CFSA(java.io.InputStream fsaStream) | Creates a new automaton, reading it from a stream in the CFSA format (header version 0xc5). |
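| Method Summary | | |
|---|---|---|
| int | getArc(int node, byte label) | Returns the identifier of an arc leaving node and labeled with label, or 0 if there is no such arc. |
| byte | getArcLabel(int arc) | Returns the label associated with a given arc. |
| int | getEndNode(int arc) | Returns the end node pointed to by a given arc. |
| int | getFirstArc(int node) | Returns the identifier of the first arc leaving node, or 0 if the node has no outgoing arcs. |
| java.util.Set<FSAFlags> | getFlags() | Returns a set of flags for this FSA instance. |
| int | getNextArc(int arc) | Returns the identifier of the next arc leaving the same node, or 0 if no more arcs are available. |
| int | getRightLanguageCount(int node) | Returns the number of sequences reachable from the given node if the automaton was compiled with FSAFlags.NUMBERS. |
| int | getRootNode() | Returns the start node of this automaton. |
| boolean | isArcFinal(int arc) | Returns true if the destination node at the end of this arc corresponds to an input sequence created when building this automaton. |
| boolean | isArcLast(int arc) | Returns true if this arc has the LAST bit set, i.e. it is the last arc of its node. |
| boolean | isArcTerminal(int arc) | Returns true if this arc does not have a terminating node (FSA.getEndNode(int) will throw an exception). |
| boolean | isLabelCompressed(int arc) | Returns true if the label is compressed inside the flags byte. |
| boolean | isNextSet(int arc) | Returns true if this arc has the NEXT bit (BIT_TARGET_NEXT) set. |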
| Methods inherited from class morfologik.fsa.FSA |
|---|
getArcCount, getSequences, getSequences, iterator, read, visitAllStates, visitInPostOrder, visitInPostOrder, visitInPreOrder, visitInPreOrder |
| Methods inherited from class java.lang.Object |
|---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
| Field Detail |
|---|
public static final byte VERSION
public static final int BIT_FINAL_ARC
public static final int BIT_LAST_ARC
public static final int BIT_TARGET_NEXT
public byte[] arcs
public final int nodeDataLength
The length of the node header structure if the automaton was compiled with the NUMBERS option; otherwise zero.
public final int gtl
public final byte[] labelMapping
| Constructor Detail |
|---|
public CFSA(java.io.InputStream fsaStream)
throws java.io.IOException
Throws: java.io.IOException
| Method Detail |
|---|
public int getRootNode()
Returns the start node of this automaton, or 0 if the start node is also an end node.
getRootNode in class FSA
public final int getFirstArc(int node)
getFirstArc in class FSA
Returns: the identifier of the first arc leaving node, or 0 if the node has no outgoing arcs.
public final int getNextArc(int arc)
getNextArc in class FSA
Returns: the identifier of the next arc after arc that leaves the same node. Zero is returned if no more arcs are available for the node.
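The getFirstArc/getNextArc contract (0 means "no arc") supports a simple loop over a node's outgoing arcs, and a getArc-style lookup can be written as a linear scan over that loop. The sketch uses a hypothetical ArcNavigator interface that mirrors only those method names, plus a toy table-backed implementation; whether CFSA.getArc itself scans linearly is not stated here.

```java
/** Hypothetical minimal view of the arc-navigation contract described above. */
interface ArcNavigator {
    int getFirstArc(int node); // 0 if the node has no outgoing arcs
    int getNextArc(int arc);   // 0 if arc was the node's last arc
    byte getArcLabel(int arc);

    /** getArc-style lookup as a linear scan over the node's arcs. */
    default int findArc(int node, byte label) {
        for (int arc = getFirstArc(node); arc != 0; arc = getNextArc(arc)) {
            if (getArcLabel(arc) == label) {
                return arc;
            }
        }
        return 0; // no outgoing arc with this label
    }

    /** Counts outgoing arcs by following the same first/next chain. */
    default int countArcs(int node) {
        int count = 0;
        for (int arc = getFirstArc(node); arc != 0; arc = getNextArc(arc)) {
            count++;
        }
        return count;
    }
}

/** Toy table-backed implementation used only to exercise the loop. */
final class TableNavigator implements ArcNavigator {
    private final int[] firstArc; // node -> first arc id (0 = none)
    private final int[] nextArc;  // arc -> next arc id (0 = none)
    private final byte[] labels;  // arc -> label

    TableNavigator(int[] firstArc, int[] nextArc, byte[] labels) {
        this.firstArc = firstArc;
        this.nextArc = nextArc;
        this.labels = labels;
    }

    public int getFirstArc(int node) { return firstArc[node]; }
    public int getNextArc(int arc) { return nextArc[arc]; }
    public byte getArcLabel(int arc) { return labels[arc]; }
}
```

Because 0 doubles as the "no arc" sentinel, an arc identifier of 0 can never denote a real arc under this contract.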
public int getArc(int node,
                  byte label)
getArc in class FSA
Returns: the identifier of an arc leaving node and labeled with label. An identifier equal to 0 means the node has no outgoing arc labeled label.
public int getEndNode(int arc)
Returns the end node pointed to by arc. Terminal arcs (those that point to a terminal state) have no end node representation and throw a runtime exception.
getEndNode in class FSA
public byte getArcLabel(int arc)
Returns the label associated with arc.
getArcLabel in class FSA
public int getRightLanguageCount(int node)
getRightLanguageCount in class FSA
Returns: the number of sequences reachable from the given state if the automaton was compiled with FSAFlags.NUMBERS; in other words, the size of the state's right language.
public boolean isArcFinal(int arc)
Returns true if the destination node at the end of this arc corresponds to an input sequence created when building this automaton.
isArcFinal in class FSA
public boolean isArcTerminal(int arc)
Returns true if this arc does not have a terminating node (FSA.getEndNode(int) will throw an exception). Implies FSA.isArcFinal(int).
isArcTerminal in class FSA
public boolean isArcLast(int arc)
Returns true if this arc has the LAST bit set, i.e. it is the last arc of its node.
See Also: BIT_LAST_ARC
public boolean isNextSet(int arc)
Returns true if this arc has the NEXT bit set, i.e. its target node follows it in the compressed structure (no goto field).
See Also: BIT_TARGET_NEXT
public boolean isLabelCompressed(int arc)
Returns true if the label is compressed inside the flags byte.
public java.util.Set<FSAFlags> getFlags()
Returns a set of flags for this FSA instance. For this automaton version, an additional FSAFlags.NUMBERS flag
may be set to indicate that the automaton contains extra fields for each node.
getFlags in class FSA
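The extra NUMBERS fields hold each node's right-language size, the value getRightLanguageCount returns. For an acyclic automaton such counts can be computed with a memoized depth-first walk: each final arc contributes the one sequence ending on it, and each non-terminal arc contributes the count of its target node. The toy graph representation below is hypothetical and independent of the CFSA byte layout.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical toy automaton used to show what the NUMBERS node fields count. */
final class RightLanguageSketch {
    static final class Arc {
        final boolean isFinal; // a sequence ends on this arc
        final int target;      // destination node, or -1 for a terminal arc

        Arc(boolean isFinal, int target) {
            this.isFinal = isFinal;
            this.target = target;
        }
    }

    private final List<List<Arc>> nodes; // node -> outgoing arcs (must be acyclic)
    private final int[] memo;            // cached counts, -1 = not computed yet

    RightLanguageSketch(List<List<Arc>> nodes) {
        this.nodes = nodes;
        this.memo = new int[nodes.size()];
        Arrays.fill(memo, -1);
    }

    /** Number of sequences accepted starting at node: its right-language size. */
    int count(int node) {
        if (memo[node] >= 0) {
            return memo[node];
        }
        int total = 0;
        for (Arc arc : nodes.get(node)) {
            if (arc.isFinal) {
                total++;                    // the sequence that ends on this arc
            }
            if (arc.target >= 0) {
                total += count(arc.target); // longer sequences continuing past it
            }
        }
        memo[node] = total;
        return total;
    }
}
```

For the sequences "ab", "abc" and "b" (root arcs a and b, then b, then c), the counts are 1, 2 and 3 from the deepest node up to the root. Memoization matters because a minimized automaton shares suffix nodes, so the same node is reached along many paths.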