java.lang.Object
  extended by kr.ac.kaist.swrc.jhannanum.plugin.MajorPlugin.PosTagger.HmmPosTagger.HMMTagger
public class HMMTagger
Hidden Markov Model based part-of-speech (POS) tagger. It is a POS tagger plug-in, one of the major plug-ins of phase 3 in the HanNanum workflow. It uses a Hidden Markov Model that takes the features of Korean eojeols (space-delimited word units) into account to choose the most promising morphological analysis result for each eojeol across the entire sentence.
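As background, the textbook bigram HMM objective for choosing one analysis per eojeol is shown below, written in log space as is usual to avoid numeric underflow. This is a hedged summary, not the documented model of HMMTagger: here w_i is the i-th eojeol, t_i the tag of the analysis chosen for it, and t_0 an assumed sentence-start symbol. The class's fields suggest it additionally uses morpheme-level statistics inside each eojeol.

```latex
\hat{t}_{1:n}
  = \operatorname*{arg\,max}_{t_{1:n}} \prod_{i=1}^{n} P(t_i \mid t_{i-1})\, P(w_i \mid t_i)
  = \operatorname*{arg\,max}_{t_{1:n}} \sum_{i=1}^{n} \bigl[\log P(t_i \mid t_{i-1}) + \log P(w_i \mid t_i)\bigr]
```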
Nested Class Summary | |
---|---|
private class | HMMTagger.MNode: Node for the Markov model. |
private class | HMMTagger.WPhead: Header of an eojeol. |
Field Summary | |
---|---|
private static double | LAMBDA: lambda value |
private static double | Lambda1: lambda 1 |
private static double | Lambda2: lambda 2 |
private HMMTagger.MNode[] | mn: the nodes of the Markov model |
private int | mn_end: the last index of the Markov model nodes |
private static double | PCONSTANT: the default probability |
private java.lang.String | PTT_POS_TDBM_FILE: the statistics file for the probability P(T\|T) for morphemes |
private ProbabilityDBM | ptt_pos_tf: the probability database for P(T\|T) for morphemes |
private java.lang.String | PTT_WP_TDBM_FILE: the statistics file for the probability P(T\|T) for eojeols |
private ProbabilityDBM | ptt_wp_tf: the probability database for P(T\|T) for eojeols |
private java.lang.String | PWT_POS_TDBM_FILE: the statistics file for the probability P(W\|T) for morphemes |
private ProbabilityDBM | pwt_pos_tf: the probability database for P(W\|T) for morphemes |
private static double | SF: smoothing factor (log 0.01) |
private HMMTagger.WPhead[] | wp: the list of nodes for each eojeol |
private int | wp_end: the last index of the eojeol list |
Constructor Summary |
---|
HMMTagger() |
Method Summary | |
---|---|
private double | compute_wt(Eojeol eojeol): Computes P(T_i, W_i) for the specified eojeol. |
private Sentence | end_sentence(SetOfSentences sos): Runs the Viterbi algorithm to obtain the final morphological analysis result with the highest probability. |
void | initialize(java.lang.String baseDir, java.lang.String configFile): Called before the workflow starts, to initialize the plug-in. |
private int | new_mnode(Eojeol eojeol, java.lang.String wp_tag, double prob): Adds a new node to the Markov model. |
private int | new_wp(java.lang.String str): Adds a new header for an eojeol. |
private void | reset(): Resets the model. |
void | shutdown(): Called before the workflow is closed. |
Sentence | tagPOS(SetOfSentences sos): Performs POS tagging, selecting the most promising morphological analysis result for each eojeol so that the final result is the morphologically analyzed sentence with the highest probability. |
private void | update_prob_score(int from, int to): Updates the probability score for the transition between two eojeols. |
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Field Detail |
---|
private static double SF
private HMMTagger.WPhead[] wp
private int wp_end
private HMMTagger.MNode[] mn
private int mn_end
private ProbabilityDBM pwt_pos_tf
private ProbabilityDBM ptt_pos_tf
private ProbabilityDBM ptt_wp_tf
private java.lang.String PWT_POS_TDBM_FILE
private java.lang.String PTT_POS_TDBM_FILE
private java.lang.String PTT_WP_TDBM_FILE
private static final double PCONSTANT
private static final double LAMBDA
private static final double Lambda1
private static final double Lambda2
Constructor Detail |
---|
public HMMTagger()
Method Detail |
---|
public Sentence tagPOS(SetOfSentences sos)
Specified by: tagPOS in interface PosTagger
Parameters:
sos - the morphological analysis result, in which each eojeol may have more than one candidate analysis
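tagPOS picks one analysis per eojeol so that the whole sentence scores highest, which is not the same as taking each eojeol's individually best candidate. The hypothetical sketch below makes that difference concrete by brute-force enumeration of every combination (exponential, for illustration only). The tags, the log-scores, the "prev>cur" map keys and the UNSEEN floor are all invented for the example and are not HanNanum's API or data.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical illustration: scoring whole tag sequences versus per-eojeol greedy picks. */
public class JointVsGreedy {
    static final double UNSEEN = -20.0; // assumed log-score for unseen transitions

    /** log P(cur | prev) from a map keyed "prev>cur", falling back to UNSEEN. */
    static double trans(Map<String, Double> t, String prev, String cur) {
        return t.getOrDefault(prev + ">" + cur, UNSEEN);
    }

    /** Greedy: each eojeol takes its highest emission score, ignoring transitions. */
    static int[] greedy(double[][] emit) {
        int[] pick = new int[emit.length];
        for (int i = 0; i < emit.length; i++)
            for (int k = 1; k < emit[i].length; k++)
                if (emit[i][k] > emit[i][pick[i]]) pick[i] = k;
        return pick;
    }

    /** Joint: enumerates every combination of candidates and keeps the best total log-score. */
    static int[] joint(String[][] tags, double[][] emit, Map<String, Double> t) {
        int n = tags.length;
        int[] cur = new int[n];
        int[] best = new int[n];
        double bestScore = Double.NEGATIVE_INFINITY;
        while (true) {
            double s = 0.0;
            String prev = "<s>"; // sentence-start symbol
            for (int i = 0; i < n; i++) {
                s += trans(t, prev, tags[i][cur[i]]) + emit[i][cur[i]];
                prev = tags[i][cur[i]];
            }
            if (s > bestScore) { bestScore = s; best = cur.clone(); }
            // advance the "odometer" of candidate indices; stop after the last combination
            int i = n - 1;
            while (i >= 0 && ++cur[i] == tags[i].length) { cur[i] = 0; i--; }
            if (i < 0) return best;
        }
    }

    public static void main(String[] args) {
        String[][] tags = { {"N", "V"}, {"J", "E"} };
        double[][] emit = { {-1.0, -0.9}, {-1.0, -1.2} };
        Map<String, Double> t = new HashMap<>();
        t.put("<s>>N", -0.5); t.put("<s>>V", -0.5);
        t.put("N>J", -0.2);   t.put("V>E", -0.3);
        System.out.println(Arrays.toString(greedy(emit)));        // [1, 0]  (V, J: V>J is unseen)
        System.out.println(Arrays.toString(joint(tags, emit, t))); // [0, 0]  (N, J)
    }
}
```

Greedy selection pays the UNSEEN penalty for the V-to-J transition, while the joint search accepts a slightly worse first emission to get a likely transition. A real tagger replaces the enumeration with the Viterbi algorithm, which finds the same optimum in time linear in the sentence length.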
public void initialize(java.lang.String baseDir, java.lang.String configFile) throws java.lang.Exception
Specified by: initialize in interface Plugin
Parameters:
baseDir - the base directory of the HanNanum files
configFile - the path to the configuration file
Throws:
java.lang.Exception
public void shutdown()
Specified by: shutdown in interface Plugin
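initialize receives a base directory and a configuration file, and the class keeps three statistics file paths (PWT_POS_TDBM_FILE, PTT_POS_TDBM_FILE, PTT_WP_TDBM_FILE). The sketch below shows one way such a plug-in might resolve those paths and mark itself ready, with shutdown undoing that state. It is hypothetical throughout: the java.util.Properties format, the key names, the file names and the Reader parameter (used instead of a file path so the example runs without files) are assumptions, not HanNanum's actual configuration format.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Properties;

/** Hypothetical plug-in lifecycle: resolve statistics files in initialize, release in shutdown. */
public class ConfigSketch {
    // hypothetical counterparts of PWT_POS_TDBM_FILE, PTT_POS_TDBM_FILE, PTT_WP_TDBM_FILE
    String pwtPosFile, pttPosFile, pttWpFile;
    boolean ready;

    /** Reads relative file names from a properties config and resolves them against baseDir. */
    void initialize(String baseDir, Reader config) {
        Properties p = new Properties();
        try {
            p.load(config);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        Path base = Paths.get(baseDir);
        pwtPosFile = resolve(base, p, "pwt.pos");
        pttPosFile = resolve(base, p, "ptt.pos");
        pttWpFile = resolve(base, p, "ptt.wp");
        ready = true; // a real tagger would load its probability tables at this point
    }

    private static String resolve(Path base, Properties p, String key) {
        String rel = p.getProperty(key);
        if (rel == null) throw new IllegalArgumentException("missing config key: " + key);
        return base.resolve(rel).toString();
    }

    /** Marks the plug-in closed; a real tagger would release its tables here. */
    void shutdown() { ready = false; }

    public static void main(String[] args) {
        ConfigSketch c = new ConfigSketch();
        c.initialize("/opt/hannanum",
                new StringReader("pwt.pos=data/pwt.pos\nptt.pos=data/ptt.pos\nptt.wp=data/ptt.wp\n"));
        System.out.println(c.pwtPosFile); // /opt/hannanum/data/pwt.pos on Unix-like systems
        c.shutdown();
    }
}
```

Failing fast on a missing key in initialize keeps configuration errors out of the tagging phase, which matches the documented contract that initialize runs before the workflow starts.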
private double compute_wt(Eojeol eojeol)
Parameters:
eojeol - the eojeol whose probability is computed
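compute_wt scores one candidate analysis of an eojeol. A minimal sketch, assuming the score is log P(T, W) = log P(T) + log P(W | T), decomposed over the eojeol's morphemes with a morpheme-tag bigram for P(T) and a per-morpheme P(m | t) for P(W | T), the kind of statistics held by ptt_pos_tf and pwt_pos_tf. The map-based tables, the key formats, the eojeol-start tag and the UNSEEN floor (standing in for PCONSTANT, whose value is not given here) are hypothetical; "ncn" and "jco" are example tags.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch: log P(T, W) of one eojeol analysis from morpheme-level tables. */
public class EojeolScore {
    static final double UNSEEN = -20.0; // assumed log-prob for unseen events (cf. PCONSTANT)

    /**
     * morphs[j], tags[j]: the j-th morpheme of the analysis and its tag.
     * pTT: log P(tag | previous tag), keyed "prev>tag".
     * pWT: log P(morpheme | tag), keyed "morpheme/tag".
     */
    static double logJoint(String[] morphs, String[] tags,
                           Map<String, Double> pTT, Map<String, Double> pWT) {
        double score = 0.0;
        String prev = "<e>"; // hypothetical eojeol-start tag
        for (int j = 0; j < morphs.length; j++) {
            score += pTT.getOrDefault(prev + ">" + tags[j], UNSEEN);        // tag transition
            score += pWT.getOrDefault(morphs[j] + "/" + tags[j], UNSEEN);   // morpheme emission
            prev = tags[j];
        }
        return score;
    }

    public static void main(String[] args) {
        Map<String, Double> pTT = new HashMap<>();
        Map<String, Double> pWT = new HashMap<>();
        pTT.put("<e>>ncn", Math.log(0.4));
        pTT.put("ncn>jco", Math.log(0.5));
        pWT.put("hakgyo/ncn", Math.log(0.01));
        pWT.put("reul/jco", Math.log(0.3));
        // romanized eojeol "hakgyoreul" ("school" + object particle) as noun + particle
        double s = logJoint(new String[] {"hakgyo", "reul"}, new String[] {"ncn", "jco"}, pTT, pWT);
        System.out.println(Math.exp(s)); // approximately 0.4 * 0.5 * 0.01 * 0.3 = 6.0E-4
    }
}
```

Working in log space turns the product of many small probabilities into a sum, so long eojeols and sentences do not underflow to zero.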
private Sentence end_sentence(SetOfSentences sos)
Parameters:
sos - all candidate morphological analyses
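end_sentence runs Viterbi to obtain the highest-probability analysis. The sketch below shows only the final stage of the standard Viterbi algorithm: pick the best-scoring candidate of the last eojeol, then follow back-pointers right to left. It assumes the forward pass has already filled a score table and a back-pointer table; the array layout is hypothetical and not the layout of HMMTagger's mn/wp arrays.

```java
import java.util.Arrays;

/** Hypothetical final stage of Viterbi: choose the best last node, then follow back-pointers. */
public class ViterbiBacktrace {
    /**
     * score[i][k]: best log-score of any path ending in candidate k of eojeol i.
     * back[i][k]: candidate index at eojeol i-1 on that best path (unused for i == 0).
     * Returns the chosen candidate index for every eojeol.
     */
    static int[] backtrace(double[][] score, int[][] back) {
        int n = score.length;
        int[] path = new int[n];
        if (n == 0) return path;
        // best candidate of the last eojeol
        int best = 0;
        for (int k = 1; k < score[n - 1].length; k++)
            if (score[n - 1][k] > score[n - 1][best]) best = k;
        path[n - 1] = best;
        // walk the back-pointers from right to left
        for (int i = n - 1; i > 0; i--) path[i - 1] = back[i][path[i]];
        return path;
    }

    public static void main(String[] args) {
        double[][] score = { {-1.5, -1.4}, {-2.7, -2.9} };
        int[][] back = { {0, 0}, {0, 1} };
        System.out.println(Arrays.toString(backtrace(score, back))); // [0, 0]
    }
}
```

Note that the first eojeol's individually better candidate (index 1, score -1.4) is not on the returned path: only the back-pointer from the best final node decides it.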
private int new_mnode(Eojeol eojeol, java.lang.String wp_tag, double prob)
Parameters:
eojeol - the eojeol to add
wp_tag - the eojeol tag
prob - the probability P(w|t)
private int new_wp(java.lang.String str)
Parameters:
str - the plain string of the eojeol
private void reset()
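new_wp, new_mnode and reset together manage the lattice stored in the wp/mn arrays and their wp_end/mn_end counters; both add methods return an int. A minimal sketch of such an index-based lattice follows, assuming the returned int is the new entry's array index and that reset only rewinds the counters so the arrays are reused across sentences. The parallel arrays, the linked "next" indices and the header-index parameter of newMnode (the real method takes an Eojeol) are hypothetical.

```java
import java.util.Arrays;

/** Hypothetical index-based lattice mirroring wp/mn arrays and their *_end counters. */
public class Lattice {
    // eojeol headers
    String[] wpText = new String[4];
    int[] wpFirst = new int[4];   // index of the header's first node, -1 if none
    int wpEnd = 0;
    // nodes (candidate analyses)
    String[] mnTag = new String[4];
    double[] mnProb = new double[4];
    int[] mnNext = new int[4];    // next node of the same eojeol, -1 terminates
    int mnEnd = 0;

    /** Adds an eojeol header and returns its index (cf. new_wp). */
    int newWp(String text) {
        if (wpEnd == wpText.length) {            // grow by doubling
            wpText = Arrays.copyOf(wpText, wpEnd * 2);
            wpFirst = Arrays.copyOf(wpFirst, wpEnd * 2);
        }
        wpText[wpEnd] = text;
        wpFirst[wpEnd] = -1;
        return wpEnd++;
    }

    /** Adds a candidate node under header wp and returns its index (cf. new_mnode). */
    int newMnode(int wp, String tag, double prob) {
        if (mnEnd == mnTag.length) {             // grow by doubling
            mnTag = Arrays.copyOf(mnTag, mnEnd * 2);
            mnProb = Arrays.copyOf(mnProb, mnEnd * 2);
            mnNext = Arrays.copyOf(mnNext, mnEnd * 2);
        }
        mnTag[mnEnd] = tag;
        mnProb[mnEnd] = prob;
        mnNext[mnEnd] = wpFirst[wp];             // prepend to the header's node list
        wpFirst[wp] = mnEnd;
        return mnEnd++;
    }

    /** Clears the lattice for the next sentence without reallocating (cf. reset). */
    void reset() { wpEnd = 0; mnEnd = 0; }

    /** Counts the nodes linked under header wp. */
    int countNodes(int wp) {
        int c = 0;
        for (int m = wpFirst[wp]; m != -1; m = mnNext[m]) c++;
        return c;
    }

    public static void main(String[] args) {
        Lattice lat = new Lattice();
        int w = lat.newWp("hakgyoreul");
        lat.newMnode(w, "tag1", -7.4);
        lat.newMnode(w, "tag2", -12.0);
        System.out.println(lat.countNodes(w)); // 2
        lat.reset();
        System.out.println(lat.wpEnd + " " + lat.mnEnd); // 0 0
    }
}
```

Using int indices instead of object references lets reset discard a whole sentence's lattice in constant time, at the cost of the arrays keeping their peak size.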
private void update_prob_score(int from, int to)
Parameters:
from - the previous eojeol
to - the current eojeol
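update_prob_score updates the scores for the transition from one eojeol to the next. In the standard Viterbi algorithm this is the relaxation step: each candidate of the current eojeol keeps the best previous score plus transition and emission scores, and remembers which previous candidate produced it. The sketch below shows that step in log space; the array-based signature and the assumption that transition scores arrive already smoothed are hypothetical.

```java
import java.util.Arrays;

/** Hypothetical Viterbi relaxation step between two consecutive eojeols (log space). */
public class Relax {
    /**
     * prevScore[a]: best log-score of any path ending at candidate a of the previous eojeol.
     * trans[a][b]: log P(tag of b | tag of a), already smoothed by the caller.
     * emit[b]: log-score of candidate b of the current eojeol on its own.
     * Writes the best score and best predecessor of each current candidate into curScore and back.
     */
    static void relax(double[] prevScore, double[][] trans, double[] emit,
                      double[] curScore, int[] back) {
        for (int b = 0; b < emit.length; b++) {
            curScore[b] = Double.NEGATIVE_INFINITY;
            back[b] = -1;
            for (int a = 0; a < prevScore.length; a++) {
                double s = prevScore[a] + trans[a][b] + emit[b];
                if (s > curScore[b]) { curScore[b] = s; back[b] = a; }
            }
        }
    }

    public static void main(String[] args) {
        double[] prev = {-1.5, -1.4};
        double[][] trans = { {-0.2, -20.0}, {-20.0, -0.3} };
        double[] emit = {-1.0, -1.2};
        double[] cur = new double[2];
        int[] back = new int[2];
        relax(prev, trans, emit, cur, back);
        System.out.println(Arrays.toString(back)); // [0, 1]
    }
}
```

Applying this step once per adjacent pair of eojeols costs O(k^2) per pair for k candidates, which is what makes Viterbi linear in sentence length.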