kr.ac.kaist.swrc.jhannanum.plugin.MajorPlugin.MorphAnalyzer.ChartMorphAnalyzer
Class MorphemeChart

java.lang.Object
  extended by kr.ac.kaist.swrc.jhannanum.plugin.MajorPlugin.MorphAnalyzer.ChartMorphAnalyzer.MorphemeChart

public class MorphemeChart
extends java.lang.Object

This class implements the lattice-style morpheme chart, an internal data structure that supports morphological analysis without backtracking.
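A lattice avoids backtracking because candidate morphemes are stored as edges between character positions, and the result for each position is computed only once instead of re-exploring shared suffixes. The generic sketch below shows this idea. LatticeSketch and its methods are hypothetical and do not reproduce JHanNanum's own code. The sketch assumes that every edge satisfies start < end.

```java
import java.util.ArrayList;
import java.util.List;

// Generic word lattice: edges.get(i) lists the end positions of candidate
// morphemes starting at character position i (hypothetical layout).
class LatticeSketch {
    private final List<List<Integer>> edges = new ArrayList<>();
    private final int length;

    LatticeSketch(int length) {
        this.length = length;
        for (int i = 0; i <= length; i++) {
            edges.add(new ArrayList<>());
        }
    }

    // Records a candidate morpheme covering positions [start, end).
    void addEdge(int start, int end) {
        edges.get(start).add(end);
    }

    // Counts complete segmentations with dynamic programming: each position
    // is solved once, right to left, so shared suffixes are never revisited.
    long countPaths() {
        long[] paths = new long[length + 1];
        paths[length] = 1; // reaching the end of the word completes a path
        for (int i = length - 1; i >= 0; i--) {
            for (int end : edges.get(i)) {
                paths[i] += paths[end];
            }
        }
        return paths[0];
    }
}
```

For a three-character word with edges (0,1), (1,2), (1,3), (0,2), (2,3) and (0,3), countPaths() returns 4.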

Author:
Sangwon Park (hudoni@world.kaist.ac.kr), CILab, SWRC, KAIST

Nested Class Summary
 class MorphemeChart.Morpheme
          A morpheme node in the lattice style chart.
 
Field Summary
private  java.lang.String bufString
          string buffer
 MorphemeChart.Morpheme[] chart
          the morpheme chart
 int chartEnd
          the last index of the chart
private static java.lang.String CHI_REPLACE
          the reserved word for replacement of Chinese characters
private  int chiReplaceIndex
          the index for replacement of Chinese characters
private  java.util.LinkedList<java.lang.String> chiReplacementList
          the list for the replacement of Chinese characters
private  Connection connection
          the connection rules
private static java.lang.String ENG_REPLACE
          the reserved word for replacement of English alphabets
private  int engReplaceIndex
          the index for replacement of English alphabets
private  java.util.LinkedList<java.lang.String> engReplacementList
          the list for the replacement of English alphabets
private  Exp exp
          chart expansion
private static int MAX_CANDIDATE_NUM
          the maximum number of analysis results
private static int MAX_MORPHEME_CHART
          the maximum number of morpheme nodes in the chart
private static int MAX_MORPHEME_CONNECTION
          the maximum number of connections between one morpheme and others
private static int MORPHEME_STATE_FAIL
          the processing state - fail
private static int MORPHEME_STATE_INCOMPLETE
          the processing state - incomplete
private static int MORPHEME_STATE_SUCCESS
          the processing state - success
private  NumberDic numDic
          number dictionary - automata
private  int printResultCnt
          the number of analysis results printed
private  java.util.LinkedList<Eojeol> resEojeols
          the list of eojeols analyzed
private  java.util.ArrayList<java.lang.String> resMorphemes
          the list of morphemes analyzed
private  java.util.ArrayList<java.lang.String> resTags
          the list of morpheme tags analyzed
private  int[] segmentPath
          path of segmentation
private  Simti simti
          SIMple Trie Index
private  SegmentPosition sp
          segment position
private  Trie systemDic
          system morpheme dictionary
private  TagSet tagSet
          the morpheme tag set
private  Trie userDic
          user morpheme dictionary
 
Constructor Summary
MorphemeChart(TagSet tagSet, Connection connection, Trie systemDic, Trie userDic, NumberDic numDic, Simti simti, java.util.LinkedList<Eojeol> resEojeolList)
          Constructor.
 
Method Summary
 int addMorpheme(int tag, int phoneme, int nextPosition, int nextTagType)
          Adds a new morpheme to the chart.
 int altSegment(java.lang.String str)
          Inserts the reverse of the given string into the SIMTI data structure.
 int analyze()
          Performs morphological analysis on the constructed morpheme chart.
private  int analyze(int chartIndex, int tagType)
          Performs morphological analysis on the morpheme chart, starting from the specified index in the chart.
private  int analyzeUnknown()
          Segments all phonemes, tags each segment as 'unknown', and then performs chart analysis, so that eojeols consisting of morphemes not found in the dictionaries can still be processed.
 boolean checkChart(int[] morpheme, int morphemeLen, int tag, int phoneme, int nextPosition, int nextTagType, java.lang.String str)
          Checks whether the specified morpheme exists in the morpheme chart.
 void getResult()
          Generates the morphological analysis result based on the morpheme chart where the analysis is performed.
 void init(java.lang.String word)
          Initializes the morpheme chart with the specified word.
 void phonemeChange(int from, java.lang.String front, java.lang.String back, int ftag, int btag, int phoneme)
          Expands the morpheme chart to handle phoneme changes.
private  java.lang.String preReplace(java.lang.String str)
          Replaces the English alphabets and Chinese characters in the specified string with the reserved words.
private  void printChart(int chartIndex)
          Generates the final morphological analysis result from the morpheme chart.
 void printMorphemeAll()
          Prints all the data in the chart to the console.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

CHI_REPLACE

private static final java.lang.String CHI_REPLACE
the reserved word for replacement of Chinese characters

See Also:
Constant Field Values

ENG_REPLACE

private static final java.lang.String ENG_REPLACE
the reserved word for replacement of English alphabets

See Also:
Constant Field Values

chiReplacementList

private java.util.LinkedList<java.lang.String> chiReplacementList
the list for the replacement of Chinese characters


engReplacementList

private java.util.LinkedList<java.lang.String> engReplacementList
the list for the replacement of English alphabets


engReplaceIndex

private int engReplaceIndex
the index for replacement of English alphabets


chiReplaceIndex

private int chiReplaceIndex
the index for replacement of Chinese characters


MAX_MORPHEME_CONNECTION

private static final int MAX_MORPHEME_CONNECTION
the maximum number of connections between one morpheme and others

See Also:
Constant Field Values

MAX_MORPHEME_CHART

private static final int MAX_MORPHEME_CHART
the maximum number of morpheme nodes in the chart

See Also:
Constant Field Values

MORPHEME_STATE_INCOMPLETE

private static final int MORPHEME_STATE_INCOMPLETE
the processing state - incomplete

See Also:
Constant Field Values

MORPHEME_STATE_SUCCESS

private static final int MORPHEME_STATE_SUCCESS
the processing state - success

See Also:
Constant Field Values

MAX_CANDIDATE_NUM

private static final int MAX_CANDIDATE_NUM
the maximum number of analysis results

See Also:
Constant Field Values

MORPHEME_STATE_FAIL

private static final int MORPHEME_STATE_FAIL
the processing state - fail

See Also:
Constant Field Values

chart

public MorphemeChart.Morpheme[] chart
the morpheme chart


chartEnd

public int chartEnd
the last index of the chart


tagSet

private TagSet tagSet
the morpheme tag set


connection

private Connection connection
the connection rules


sp

private SegmentPosition sp
segment position


bufString

private java.lang.String bufString
string buffer


segmentPath

private int[] segmentPath
path of segmentation


exp

private Exp exp
chart expansion


systemDic

private Trie systemDic
system morpheme dictionary


userDic

private Trie userDic
user morpheme dictionary


numDic

private NumberDic numDic
number dictionary - automata


simti

private Simti simti
SIMple Trie Index


printResultCnt

private int printResultCnt
the number of analysis results printed


resEojeols

private java.util.LinkedList<Eojeol> resEojeols
the list of eojeols analyzed


resMorphemes

private java.util.ArrayList<java.lang.String> resMorphemes
the list of morphemes analyzed


resTags

private java.util.ArrayList<java.lang.String> resTags
the list of morpheme tags analyzed

Constructor Detail

MorphemeChart

public MorphemeChart(TagSet tagSet,
                     Connection connection,
                     Trie systemDic,
                     Trie userDic,
                     NumberDic numDic,
                     Simti simti,
                     java.util.LinkedList<Eojeol> resEojeolList)
Constructor.

Parameters:
tagSet - the morpheme tag set
connection - the morpheme connection rules
systemDic - the system morpheme dictionary
userDic - the user morpheme dictionary
numDic - the number dictionary
simti - the SIMple Trie Index
resEojeolList - the list of eojeols to store the analysis result
Method Detail

addMorpheme

public int addMorpheme(int tag,
                       int phoneme,
                       int nextPosition,
                       int nextTagType)
Adds a new morpheme to the chart.

Parameters:
tag - the morpheme tag ID
phoneme - the phoneme
nextPosition - the index of the next morpheme
nextTagType - the tag type of the next morpheme
Returns:
the last index of the chart
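The following minimal sketch shows how addMorpheme might append nodes to a fixed-size array. It assumes that chartEnd counts the stored nodes and that the method returns the index of the node just added. MorphemeNode, ChartSketch and the capacity of 8 are hypothetical. The real Morpheme class and the real MAX_MORPHEME_CHART value may differ.

```java
// Hypothetical chart node; its fields mirror the addMorpheme parameters.
class MorphemeNode {
    final int tag;
    final int phoneme;
    final int nextPosition;
    final int nextTagType;

    MorphemeNode(int tag, int phoneme, int nextPosition, int nextTagType) {
        this.tag = tag;
        this.phoneme = phoneme;
        this.nextPosition = nextPosition;
        this.nextTagType = nextTagType;
    }
}

// Hypothetical fixed-capacity chart in the spirit of MorphemeChart.chart.
class ChartSketch {
    // Assumed capacity; stands in for MAX_MORPHEME_CHART.
    static final int MAX_NODES = 8;

    final MorphemeNode[] chart = new MorphemeNode[MAX_NODES];
    int chartEnd = 0; // number of nodes stored so far

    // Appends a node and returns its index, i.e. the last occupied slot.
    int addMorpheme(int tag, int phoneme, int nextPosition, int nextTagType) {
        if (chartEnd >= MAX_NODES) {
            throw new IllegalStateException("morpheme chart is full");
        }
        chart[chartEnd] = new MorphemeNode(tag, phoneme, nextPosition, nextTagType);
        chartEnd++;
        return chartEnd - 1;
    }
}
```

A fixed array keeps node indices stable, which matters when other nodes refer to them by index.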

altSegment

public int altSegment(java.lang.String str)
Inserts the reverse of the given string into the SIMTI data structure.

Parameters:
str - the string to insert into the SIMTI structure
Returns:
the index of the next morpheme
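Reversed insertion can be illustrated with a toy trie. It stores each string from its last character to its first, so strings that share a suffix also share trie nodes. ReverseTrie is a hypothetical stand-in and is not the Simti API. The node id it returns only loosely corresponds to the "index of the next morpheme" that altSegment returns.

```java
import java.util.HashMap;
import java.util.Map;

// Toy trie standing in for SIMTI; not the real Simti class.
class ReverseTrie {
    private static final class Node {
        final Map<Character, Node> children = new HashMap<>();
        final int id;

        Node(int id) {
            this.id = id;
        }
    }

    private final Node root = new Node(0);
    private int nextId = 1; // ids are assigned in creation order

    // Inserts str from its last character to its first and returns the
    // id of the node where the insertion ends.
    int insertReversed(String str) {
        Node node = root;
        for (int i = str.length() - 1; i >= 0; i--) {
            char c = str.charAt(i);
            Node child = node.children.get(c);
            if (child == null) {
                child = new Node(nextId++);
                node.children.put(c, child);
            }
            node = child;
        }
        return node.id;
    }
}
```

Inserting "abc" and then "xbc" creates only one new node for the second string, because both strings end in "bc".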

analyze

public int analyze()
Performs morphological analysis on the constructed morpheme chart.

Returns:
the number of analysis results

analyze

private int analyze(int chartIndex,
                    int tagType)
Performs morphological analysis on the morpheme chart, starting from the specified index in the chart.

Parameters:
chartIndex - the index of the chart to analyze
tagType - the tag type of the next morpheme
Returns:
the number of analysis results
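The three MORPHEME_STATE_* constants suggest that each chart node caches whether analysis from that node has succeeded, has failed, or has not been attempted yet. The sketch below shows this memoized recursion. It assumes the chart is acyclic and that the state values are 0, 1 and 2, because the actual constant values are not listed on this page. ChartAnalysisSketch and its methods are hypothetical. The public/private pair of analyze methods only mirrors the shape of the documented API.

```java
import java.util.ArrayList;
import java.util.List;

class ChartAnalysisSketch {
    // Assumed values; MorphemeChart keeps its own constants private.
    static final int STATE_INCOMPLETE = 0;
    static final int STATE_SUCCESS = 1;
    static final int STATE_FAIL = 2;

    private final List<List<Integer>> successors = new ArrayList<>();
    private final List<Boolean> closesWord = new ArrayList<>();
    private int[] state = new int[0];
    private long[] results = new long[0];

    // Adds a node; isLast marks a morpheme that ends the eojeol.
    int addNode(boolean isLast) {
        successors.add(new ArrayList<>());
        closesWord.add(isLast);
        return successors.size() - 1;
    }

    // Links two nodes; the lattice is assumed to be acyclic.
    void connect(int from, int to) {
        successors.get(from).add(to);
    }

    // Counts complete analyses reachable from the given start nodes.
    long analyze(List<Integer> startNodes) {
        state = new int[successors.size()]; // all STATE_INCOMPLETE (0)
        results = new long[successors.size()];
        long total = 0;
        for (int start : startNodes) {
            total += analyze(start);
        }
        return total;
    }

    // Number of complete paths from node, cached through its state.
    private long analyze(int node) {
        if (state[node] != STATE_INCOMPLETE) {
            return results[node]; // SUCCESS or FAIL was already decided
        }
        long count = closesWord.get(node) ? 1 : 0;
        for (int succ : successors.get(node)) {
            count += analyze(succ);
        }
        results[node] = count;
        state[node] = count > 0 ? STATE_SUCCESS : STATE_FAIL;
        return count;
    }

    int stateOf(int node) {
        return state[node];
    }
}
```

A node marked STATE_FAIL is never expanded again, so dead ends cost nothing after their first visit.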

analyzeUnknown

private int analyzeUnknown()
Segments all phonemes, tags each segment as 'unknown', and then performs chart analysis, so that eojeols consisting of morphemes not found in the dictionaries can still be processed.

Returns:
the number of analysis results
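A simple way to make sure that an unknown eojeol still yields at least one path is to propose every short substring as an 'unknown' candidate. The sketch below works on characters rather than phonemes. UNKNOWN_TAG and maxLen are hypothetical, and the sketch is not the actual analyzeUnknown logic.

```java
import java.util.ArrayList;
import java.util.List;

class UnknownSegmentsSketch {
    // Hypothetical tag label; a real tag ID would come from the TagSet.
    static final String UNKNOWN_TAG = "UNK";

    // Proposes every substring of up to maxLen characters as an
    // unknown-tagged candidate. With maxLen >= 1, the single-character
    // spans alone already cover the whole word.
    static List<String> candidates(String word, int maxLen) {
        List<String> out = new ArrayList<>();
        for (int start = 0; start < word.length(); start++) {
            int limit = Math.min(word.length(), start + maxLen);
            for (int end = start + 1; end <= limit; end++) {
                out.add(word.substring(start, end) + "/" + UNKNOWN_TAG);
            }
        }
        return out;
    }
}
```

For example, candidates("abc", 2) yields a/UNK, ab/UNK, b/UNK, bc/UNK and c/UNK.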

checkChart

public boolean checkChart(int[] morpheme,
                          int morphemeLen,
                          int tag,
                          int phoneme,
                          int nextPosition,
                          int nextTagType,
                          java.lang.String str)
Checks whether the specified morpheme exists in the morpheme chart.

Parameters:
morpheme - the list of indices of the morphemes to check
morphemeLen - the length of the list
tag - the morpheme tag ID
phoneme - the phoneme
nextPosition - the index of the next morpheme
nextTagType - the type of the next morpheme tag
str - the plain string
Returns:
true if the morpheme exists in the chart; false otherwise
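A check like this presumably keeps duplicate nodes out of the chart. The sketch below assumes that a node is identified by all of the listed fields. CheckedNode and CheckChartSketch are hypothetical names. Unlike the real method, the sketch takes the chart as an explicit argument.

```java
// Hypothetical chart node carrying the fields that the check compares.
class CheckedNode {
    final int tag;
    final int phoneme;
    final int nextPosition;
    final int nextTagType;
    final String str;

    CheckedNode(int tag, int phoneme, int nextPosition, int nextTagType, String str) {
        this.tag = tag;
        this.phoneme = phoneme;
        this.nextPosition = nextPosition;
        this.nextTagType = nextTagType;
        this.str = str;
    }
}

class CheckChartSketch {
    // Returns true if one of the first morphemeLen indices in morpheme
    // points to a node equal to the described one in every field.
    static boolean checkChart(CheckedNode[] chart, int[] morpheme, int morphemeLen,
                              int tag, int phoneme, int nextPosition,
                              int nextTagType, String str) {
        for (int i = 0; i < morphemeLen; i++) {
            CheckedNode n = chart[morpheme[i]];
            if (n.tag == tag && n.phoneme == phoneme
                    && n.nextPosition == nextPosition
                    && n.nextTagType == nextTagType
                    && n.str.equals(str)) {
                return true;
            }
        }
        return false;
    }
}
```

Only the first morphemeLen entries of the index list are scanned, so a caller can reuse a larger buffer.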

getResult

public void getResult()
Generates the morphological analysis result based on the morpheme chart where the analysis is performed.


init

public void init(java.lang.String word)
Initializes the morpheme chart with the specified word.

Parameters:
word - the plain string of an eojeol to analyze

phonemeChange

public void phonemeChange(int from,
                          java.lang.String front,
                          java.lang.String back,
                          int ftag,
                          int btag,
                          int phoneme)
Expands the morpheme chart to handle phoneme changes.

Parameters:
from - the index of the start segment position
front - the front part of the string
back - the back part of the string
ftag - the morpheme tag of the front part
btag - the morpheme tag of the back part
phoneme - the phoneme
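As an example of phoneme change, the Korean contraction 했 can be analysed as 하 followed by 었. The sketch below records such an analysis as two linked nodes. The class, its fields and the tag labels VERB and PAST are hypothetical. The sketch also omits the from and phoneme arguments that the real method takes.

```java
import java.util.ArrayList;
import java.util.List;

class PhonemeChangeSketch {
    // Hypothetical node: surface form, tag label and index of the next node.
    static final class Node {
        final String surface;
        final String tag;
        int next = -1; // -1 marks the end of the analysis

        Node(String surface, String tag) {
            this.surface = surface;
            this.tag = tag;
        }
    }

    final List<Node> chart = new ArrayList<>();

    // Adds front and back as two linked nodes, so a single contracted
    // surface form is analysed as two morphemes; returns the front index.
    int phonemeChange(String front, String back, String ftag, String btag) {
        Node f = new Node(front, ftag);
        Node b = new Node(back, btag);
        chart.add(f);
        chart.add(b);
        f.next = chart.size() - 1; // front is followed by back
        return chart.size() - 2;
    }

    // Renders the chain starting at index start as "morpheme/tag+...".
    String render(int start) {
        StringBuilder sb = new StringBuilder();
        for (int i = start; i != -1; i = chart.get(i).next) {
            if (sb.length() > 0) {
                sb.append('+');
            }
            sb.append(chart.get(i).surface).append('/').append(chart.get(i).tag);
        }
        return sb.toString();
    }
}
```

Calling phonemeChange("하", "었", "VERB", "PAST") on an empty chart returns 0, and render(0) then gives 하/VERB+었/PAST.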

printChart

private void printChart(int chartIndex)
Generates the final morphological analysis result from the morpheme chart.

Parameters:
chartIndex - the index in the chart from which to start generating the final result
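Generating final results amounts to walking every complete path from a start node and stopping once a candidate limit is reached. The depth-first sketch below shows this. ResultEnumerator and MAX_RESULTS are hypothetical. MAX_RESULTS stands in for MAX_CANDIDATE_NUM, whose real value is not shown here.

```java
import java.util.ArrayList;
import java.util.List;

class ResultEnumerator {
    // Assumed limit standing in for MAX_CANDIDATE_NUM.
    static final int MAX_RESULTS = 3;

    private final List<String> labels = new ArrayList<>();      // "morpheme/tag"
    private final List<List<Integer>> next = new ArrayList<>(); // successor indices
    private final List<Boolean> closesWord = new ArrayList<>(); // ends the eojeol?

    int addNode(String label, boolean isLast) {
        labels.add(label);
        next.add(new ArrayList<>());
        closesWord.add(isLast);
        return labels.size() - 1;
    }

    void connect(int from, int to) {
        next.get(from).add(to);
    }

    // Collects at most MAX_RESULTS complete analyses starting at start.
    List<String> results(int start) {
        List<String> out = new ArrayList<>();
        collect(start, new ArrayList<>(), out);
        return out;
    }

    private void collect(int node, List<String> path, List<String> out) {
        if (out.size() >= MAX_RESULTS) {
            return; // candidate limit reached
        }
        path.add(labels.get(node));
        if (closesWord.get(node)) {
            out.add(String.join("+", path));
        }
        for (int succ : next.get(node)) {
            collect(succ, path, out);
        }
        path.remove(path.size() - 1); // undo the label added for this node
    }
}
```

With a start node A/x linked to the final nodes B/y and C/z, results returns A/x+B/y and A/x+C/z.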

printMorphemeAll

public void printMorphemeAll()
Prints all the data in the chart to the console.


preReplace

private java.lang.String preReplace(java.lang.String str)
Replaces the English alphabets and Chinese characters in the specified string with the reserved words.

Parameters:
str - the string in which to replace English and Chinese characters
Returns:
the string in which English and Chinese characters have been replaced with the reserved words
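The sketch below shows the replacement step. It assumes that "English alphabets" means ASCII letters and that "Chinese characters" means code points in the Unicode Han script. The reserved words @ENG@ and @CHI@ are placeholders, since the actual values of ENG_REPLACE and CHI_REPLACE are not listed on this page. Each replaced run is appended to the matching list, presumably so that it can be restored in order after analysis.

```java
import java.util.LinkedList;

class PreReplaceSketch {
    // Placeholder reserved words; the real constant values are not shown.
    static final String ENG_REPLACE = "@ENG@";
    static final String CHI_REPLACE = "@CHI@";

    final LinkedList<String> engReplacementList = new LinkedList<>();
    final LinkedList<String> chiReplacementList = new LinkedList<>();

    // 1 = ASCII letter, 2 = Han character, 0 = anything else.
    private static int kind(int cp) {
        if ((cp >= 'a' && cp <= 'z') || (cp >= 'A' && cp <= 'Z')) {
            return 1;
        }
        if (Character.UnicodeScript.of(cp) == Character.UnicodeScript.HAN) {
            return 2;
        }
        return 0;
    }

    // Replaces each maximal run of English letters or Chinese characters
    // with a reserved word and saves the original run, in order.
    String preReplace(String str) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < str.length()) {
            int k = kind(str.codePointAt(i));
            int j = i;
            // Advance j over the run of code points of the same kind.
            while (j < str.length() && kind(str.codePointAt(j)) == k) {
                j += Character.charCount(str.codePointAt(j));
            }
            String run = str.substring(i, j);
            if (k == 1) {
                engReplacementList.add(run);
                out.append(ENG_REPLACE);
            } else if (k == 2) {
                chiReplacementList.add(run);
                out.append(CHI_REPLACE);
            } else {
                out.append(run); // other text is copied unchanged
            }
            i = j;
        }
        return out.toString();
    }
}
```

For example, preReplace("abc-韓國 def") returns "@ENG@-@CHI@ @ENG@". Afterwards the English list holds [abc, def] and the Chinese list holds [韓國].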