kr.ac.kaist.swrc.jhannanum.hannanum
Class Workflow

java.lang.Object
  extended by kr.ac.kaist.swrc.jhannanum.hannanum.Workflow

public class Workflow
extends java.lang.Object

This class is for the HanNanum work flow, which can be set up with more than one HanNanum plug-in. The work flow can be used with the following steps:

1. Create the work flow using one of the constructors with suitable configurations.
2. Set up the plug-ins on the work flow according to the purpose of the analysis and the characteristics of the input.
3. Activate the work flow in the multi-thread mode or the single thread mode.
4. Analyze the target text with the work flow.
5. Get the result with some relevant data type or string representation.
6. Repeat steps 4-5 as needed.
7. Close the work flow when it is no longer needed.

Take a look at the demo program, kr.ac.kaist.swrc.jhannanum.demo.WorkflowWithHMMTagger, for an example.
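
The seven steps above can be sketched as follows. This is only a minimal illustration, not the HanNanum implementation: ToyWorkflow is a hypothetical stand-in that borrows a few of the documented method names, its "analysis" merely trims each line, and pending() is a helper invented for this sketch. Step 2 is skipped because the toy has no plug-ins.

```java
import java.util.concurrent.LinkedBlockingQueue;

class WorkflowLifecycleSketch {
    // Walks through steps 1-7 against the toy stand-in defined below.
    static String run() {
        ToyWorkflow workflow = new ToyWorkflow();          // step 1: create
        // step 2 (setting up plug-ins) is skipped: the toy has none
        workflow.activateWorkflow(false);                  // step 3: single-thread mode
        StringBuilder out = new StringBuilder();
        String[] documents = {"first sentence\nsecond sentence", "third sentence"};
        for (String document : documents) {                // step 6: repeat steps 4-5
            workflow.analyze(document);                    // step 4: analyze
            while (workflow.pending() > 0) {               // step 5: get the results
                out.append(workflow.getResultOfSentence()).append('|');
            }
        }
        workflow.close();                                  // step 7: close
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}

// Hypothetical stand-in that only borrows a few documented method names.
// It is NOT the HanNanum implementation: its "analysis" merely trims each line.
class ToyWorkflow {
    private final LinkedBlockingQueue<String> results = new LinkedBlockingQueue<>();
    private boolean activated = false;

    void activateWorkflow(boolean threadMode) {
        activated = true; // the toy ignores threadMode and works in the calling thread
    }

    void analyze(String document) {
        if (!activated) {
            throw new IllegalStateException("activate the work flow first");
        }
        for (String line : document.split("\n")) {
            results.add(line.trim());
        }
    }

    String getResultOfSentence() {
        try {
            return results.take(); // blocks until a result is available
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    int pending() {
        return results.size(); // helper of this sketch, not a documented method
    }

    void close() {
        activated = false;
        results.clear();
    }
}
```

With the real class, step 2 would call setMorphAnalyzer, setPosTagger and the append methods documented below before the work flow is activated.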

Author:
Sangwon Park (hudoni@world.kaist.ac.kr), CILab, SWRC, Kaist

Field Summary
private  java.lang.String baseDir
          The path of the base directory that holds the data and configuration files.
private  boolean isInitialized
          It is true when the work flow is ready for analysis.
private  boolean isThreadMode
          The flag for the thread mode. true: multi-thread mode, false: single-thread mode.
static int MAX_SUPPLEMENT_PLUGIN_NUM
          The default value for the maximum number of the supplement plug-ins on each phase.
private  int maxSupplementPluginNum
          The maximum number of the supplement plug-ins of each phase.
private  MorphAnalyzer morphAnalyzer
          The second phase, major plug-in - morphological analyzer.
private  java.lang.String morphAnalyzerConfFile
          The configuration file for the morphological analyzer.
private  int morphemePluginCnt
          The number of the morpheme processors.
private  MorphemeProcessor[] morphemeProcessors
          The second phase, supplement plug-ins, morpheme processors.
private  java.lang.String[] morphemeProcessorsConfFiles
          The configuration files for the morpheme processors.
private  int outputPhaseNum
          The analysis phase of the work flow.
private  int outputQueueNum
          The number of the plug-ins for the last phase of the work flow.
private  int plainTextPluginCnt
          The number of the plain text processors.
private  PlainTextProcessor[] plainTextProcessors
          The first phase, supplement plug-ins, plain text processors.
private  java.lang.String[] plainTextProcessorsConfFiles
          The configuration files for the plain text processors.
private  int posPluginCnt
          The number of POS processors.
private  java.lang.String[] posProcessorConfFiles
          The configuration files for the POS processors.
private  PosProcessor[] posProcessors
          The third phase, supplement plug-ins, POS processors.
private  PosTagger posTagger
          The third phase, major plug-in - POS tagger.
private  java.lang.String posTaggerConfFile
          The configuration file for the POS tagger.
(package private)  java.util.ArrayList<java.util.concurrent.LinkedBlockingQueue<PlainSentence>> queuePhase1
          The communication queues for the first phase plug-ins.
(package private)  java.util.ArrayList<java.util.concurrent.LinkedBlockingQueue<SetOfSentences>> queuePhase2
          The communication queues for the second phase plug-ins.
(package private)  java.util.ArrayList<java.util.concurrent.LinkedBlockingQueue<Sentence>> queuePhase3
          The communication queues for the third phase plug-ins.
private  java.util.LinkedList<java.lang.Thread> threadList
          Plug-in thread list.
 
Constructor Summary
Workflow()
          Constructor.
Workflow(java.lang.String baseDir)
          Constructor.
Workflow(java.lang.String baseDir, int maxSupplementPluginNum)
          Constructor.
 
Method Summary
 void activateWorkflow(boolean threadMode)
          It activates the work flow with the plug-ins that were set up.
 void analyze(java.io.File document)
          It adds the specified input text to the input queue of the work flow.
 void analyze(java.lang.String document)
          It adds the specified input text to the input queue of the work flow.
private  void analyzeInSingleThread()
          Analyze the text in the single thread.
 void appendMorphemeProcessor(MorphemeProcessor plugin, java.lang.String configFile)
          Appends the morpheme processor plug-in, which is the supplement plug-in on the second phase, on the work flow.
 void appendPlainTextProcessor(PlainTextProcessor plugin, java.lang.String configFile)
          Appends the plain text processor plug-in, which is the supplement plug-in on the first phase, on the work flow.
 void appendPosProcessor(PosProcessor plugin, java.lang.String configFile)
          Appends the POS processor plug-in, which is the supplement plug-in on the third phase, on the work flow.
 void clear()
          It removes the plug-ins on the work flow.
 void close()
          It ends the threads for each plug-in on the work flow.
 java.lang.String getResultOfDocument()
          Returns the analysis results for all sentences in the result.
<T> java.util.LinkedList<T>
getResultOfDocument(T a)
          Returns the analysis result list for all sentences in the result.
 java.lang.String getResultOfSentence()
          Returns the analysis result for one sentence at the top of the result queue.
<T> T
getResultOfSentence(T a)
          Returns the analysis result for one sentence at the top of the result queue.
private  void runThreads()
          It starts the threads for each plug-in on the work flow when the work flow is activated in the multi-thread mode.
 void setMorphAnalyzer(MorphAnalyzer ma, java.lang.String configFile)
          Sets the morphological analyzer plug-in, which is the major plug-in on second phase, on the work flow.
 void setPosTagger(PosTagger tagger, java.lang.String configFile)
          Sets the POS tagger plug-in, which is the major plug-in on the third phase, on the work flow.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

MAX_SUPPLEMENT_PLUGIN_NUM

public static int MAX_SUPPLEMENT_PLUGIN_NUM
The default value for the maximum number of the supplement plug-ins on each phase.


maxSupplementPluginNum

private int maxSupplementPluginNum
The maximum number of the supplement plug-ins of each phase.


isThreadMode

private boolean isThreadMode
The flag for the thread mode. true: multi-thread mode, false: single-thread mode.


outputPhaseNum

private int outputPhaseNum
The analysis phase of the work flow.


outputQueueNum

private int outputQueueNum
The number of the plug-ins for the last phase of the work flow.


threadList

private java.util.LinkedList<java.lang.Thread> threadList
Plug-in thread list.


morphAnalyzer

private MorphAnalyzer morphAnalyzer
The second phase, major plug-in - morphological analyzer.


morphAnalyzerConfFile

private java.lang.String morphAnalyzerConfFile
The configuration file for the morphological analyzer.


posTagger

private PosTagger posTagger
The third phase, major plug-in - POS tagger.


posTaggerConfFile

private java.lang.String posTaggerConfFile
The configuration file for the POS tagger.


plainTextProcessors

private PlainTextProcessor[] plainTextProcessors
The first phase, supplement plug-ins, plain text processors.


plainTextProcessorsConfFiles

private java.lang.String[] plainTextProcessorsConfFiles
The configuration files for the plain text processors.


plainTextPluginCnt

private int plainTextPluginCnt
The number of the plain text processors.


morphemeProcessors

private MorphemeProcessor[] morphemeProcessors
The second phase, supplement plug-ins, morpheme processors.


morphemeProcessorsConfFiles

private java.lang.String[] morphemeProcessorsConfFiles
The configuration files for the morpheme processors.


morphemePluginCnt

private int morphemePluginCnt
The number of the morpheme processors.


posProcessors

private PosProcessor[] posProcessors
The third phase, supplement plug-ins, POS processors.


posProcessorConfFiles

private java.lang.String[] posProcessorConfFiles
The configuration files for the POS processors.


posPluginCnt

private int posPluginCnt
The number of POS processors.


isInitialized

private boolean isInitialized
It is true when the work flow is ready for analysis.


baseDir

private java.lang.String baseDir
The path of the base directory that holds the data and configuration files.


queuePhase1

java.util.ArrayList<java.util.concurrent.LinkedBlockingQueue<PlainSentence>> queuePhase1
The communication queues for the first phase plug-ins.


queuePhase2

java.util.ArrayList<java.util.concurrent.LinkedBlockingQueue<SetOfSentences>> queuePhase2
The communication queues for the second phase plug-ins.


queuePhase3

java.util.ArrayList<java.util.concurrent.LinkedBlockingQueue<Sentence>> queuePhase3
The communication queues for the third phase plug-ins.
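
The queue fields above are lists of LinkedBlockingQueue instances. A minimal sketch of how such a list can pass items from one plug-in to the next is given below. It assumes that plug-in i reads from queue i and writes to queue i + 1, which this page does not state, and it uses plain strings and toy string operations in place of the sentence types and real plug-ins.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.function.UnaryOperator;

class PhaseQueueSketch {
    // Moves every item through the stages: stage i reads queue i and
    // writes queue i + 1 (an assumed layout, not taken from the library).
    static List<String> runPhase(List<UnaryOperator<String>> stages, List<String> input) {
        ArrayList<LinkedBlockingQueue<String>> queues = new ArrayList<>();
        for (int i = 0; i <= stages.size(); i++) {
            queues.add(new LinkedBlockingQueue<>());
        }
        queues.get(0).addAll(input);
        for (int i = 0; i < stages.size(); i++) {
            String item;
            while ((item = queues.get(i).poll()) != null) {
                queues.get(i + 1).add(stages.get(i).apply(item));
            }
        }
        // The last queue holds the output of the phase in FIFO order.
        return new ArrayList<>(queues.get(stages.size()));
    }

    // Runs two toy "plug-ins": trim, then upper-case.
    static List<String> demo() {
        List<UnaryOperator<String>> stages = new ArrayList<>();
        stages.add(String::trim);
        stages.add(String::toUpperCase);
        return runPhase(stages, Arrays.asList(" a ", "b"));
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Blocking queues matter mostly in the multi-thread mode, where each plug-in can wait on its input queue without busy polling.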

Constructor Detail

Workflow

public Workflow()
Constructor. The maximum number of supplement plug-ins for each phase is set to Workflow.MAX_SUPPLEMENT_PLUGIN_NUM.


Workflow

public Workflow(java.lang.String baseDir)
Constructor. The maximum number of supplement plug-ins for each phase is set to Workflow.MAX_SUPPLEMENT_PLUGIN_NUM.

Parameters:
baseDir - the path of the base directory, which should contain the 'conf' and 'data' directories

Workflow

public Workflow(java.lang.String baseDir,
                int maxSupplementPluginNum)
Constructor.

Parameters:
baseDir - the path of the base directory, which should contain the 'conf' and 'data' directories
maxSupplementPluginNum - the maximum number of supplement plug-ins for each phase
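
The relationship between the three constructors can be pictured as a delegation chain, sketched below with a hypothetical stand-in class. The value 8 for MAX_SUPPLEMENT_PLUGIN_NUM and the default base directory "." are assumptions made for the example. This page does not give either value, and it does not say whether the real constructors delegate to each other.

```java
class ConstructorChainSketch {
    // Assumed value for illustration; the real default is not given on this page.
    static int MAX_SUPPLEMENT_PLUGIN_NUM = 8;

    private final String baseDir;
    private final int maxSupplementPluginNum;

    ConstructorChainSketch() {
        this(".");                                  // assumed default base directory
    }

    ConstructorChainSketch(String baseDir) {
        this(baseDir, MAX_SUPPLEMENT_PLUGIN_NUM);   // falls back to the default limit
    }

    ConstructorChainSketch(String baseDir, int maxSupplementPluginNum) {
        this.baseDir = baseDir;
        this.maxSupplementPluginNum = maxSupplementPluginNum;
    }

    String getBaseDir() {
        return baseDir;
    }

    int getMaxSupplementPluginNum() {
        return maxSupplementPluginNum;
    }

    public static void main(String[] args) {
        ConstructorChainSketch sketch = new ConstructorChainSketch("/path/to/base");
        System.out.println(sketch.getBaseDir() + " " + sketch.getMaxSupplementPluginNum());
    }
}
```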
Method Detail

setMorphAnalyzer

public void setMorphAnalyzer(MorphAnalyzer ma,
                             java.lang.String configFile)
Sets the morphological analyzer plug-in, which is the major plug-in on second phase, on the work flow.

Parameters:
ma - the morphological analyzer plug-in
configFile - the path for the configuration file (relative path to the base directory)

setPosTagger

public void setPosTagger(PosTagger tagger,
                         java.lang.String configFile)
Sets the POS tagger plug-in, which is the major plug-in on the third phase, on the work flow.

Parameters:
tagger - the POS tagger plug-in
configFile - the path for the configuration file (relative path to the base directory)

appendPlainTextProcessor

public void appendPlainTextProcessor(PlainTextProcessor plugin,
                                     java.lang.String configFile)
Appends the plain text processor plug-in, which is the supplement plug-in on the first phase, on the work flow.

Parameters:
plugin - the plain text processor plug-in
configFile - the path for the configuration file (relative path to the base directory)

appendMorphemeProcessor

public void appendMorphemeProcessor(MorphemeProcessor plugin,
                                    java.lang.String configFile)
Appends the morpheme processor plug-in, which is the supplement plug-in on the second phase, on the work flow.

Parameters:
plugin - the morpheme processor plug-in
configFile - the path for the configuration file (relative path to the base directory)

appendPosProcessor

public void appendPosProcessor(PosProcessor plugin,
                               java.lang.String configFile)
Appends the POS processor plug-in, which is the supplement plug-in on the third phase, on the work flow.

Parameters:
plugin - the POS processor plug-in
configFile - the path for the configuration file (relative path to the base directory)
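
The three append methods fill per-phase arrays whose size is bounded by the maximum number of supplement plug-ins, with a counter field per phase. A minimal sketch of that bookkeeping is given below. Strings stand in for the plug-in objects, the configuration file names are made up, and throwing IllegalStateException when the array is full is an assumption. This page does not say what the library does in that case.

```java
class SupplementSlotsSketch {
    private final String[] plugins;    // stands in for e.g. PlainTextProcessor[]
    private final String[] confFiles;  // one configuration file per plug-in
    private int pluginCnt = 0;

    SupplementSlotsSketch(int maxSupplementPluginNum) {
        plugins = new String[maxSupplementPluginNum];
        confFiles = new String[maxSupplementPluginNum];
    }

    // Stores the plug-in and its configuration file in the next free slot.
    void append(String plugin, String configFile) {
        if (pluginCnt >= plugins.length) {
            // Assumed behaviour: the source does not say what happens when full.
            throw new IllegalStateException("no free supplement plug-in slot");
        }
        plugins[pluginCnt] = plugin;
        confFiles[pluginCnt] = configFile;
        pluginCnt++;
    }

    int count() {
        return pluginCnt;
    }

    String pluginAt(int index) {
        return plugins[index];
    }

    String confFileAt(int index) {
        return confFiles[index];
    }

    public static void main(String[] args) {
        SupplementSlotsSketch slots = new SupplementSlotsSketch(2);
        slots.append("segmenter", "conf/a.json");   // hypothetical names
        slots.append("filter", "conf/b.json");
        System.out.println(slots.count() + " plug-ins, first: " + slots.pluginAt(0));
    }
}
```

The plug-ins keep the order in which they were appended.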

activateWorkflow

public void activateWorkflow(boolean threadMode)
                      throws java.lang.Exception
It activates the work flow with the plug-ins that were set up. The work flow can be activated in the multi-thread mode, where each plug-in works on its own thread. This mode may perform better on multi-processor machines.

Parameters:
threadMode - true: multi-thread mode, false: single-thread mode
Throws:
java.lang.Exception
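
The difference between the two modes can be illustrated with the following sketch. It is not the library's code. Two toy string operations stand in for plug-ins, and the END marker used to stop the threads is an assumption of the sketch. In the single-thread mode the caller's thread runs every plug-in in turn. In the multi-thread mode each plug-in runs on its own thread and the threads are connected by blocking queues.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.function.UnaryOperator;

class ThreadModeSketch {
    // End-of-input marker compared by identity; an assumption of this sketch.
    private static final String END = new String("<end>");

    // Two toy "plug-ins"; real plug-ins would do linguistic analysis instead.
    private static final UnaryOperator<String> FIRST = String::trim;
    private static final UnaryOperator<String> SECOND = String::toUpperCase;

    static List<String> run(boolean threadMode, List<String> input) {
        List<String> results = new ArrayList<>();
        if (!threadMode) {
            // Single-thread mode: the caller's thread runs every plug-in in turn.
            for (String item : input) {
                results.add(SECOND.apply(FIRST.apply(item)));
            }
            return results;
        }
        // Multi-thread mode: one thread per plug-in, connected by blocking queues.
        LinkedBlockingQueue<String> q0 = new LinkedBlockingQueue<>(input);
        LinkedBlockingQueue<String> q1 = new LinkedBlockingQueue<>();
        LinkedBlockingQueue<String> q2 = new LinkedBlockingQueue<>();
        q0.add(END);
        Thread t1 = new Thread(() -> pump(q0, q1, FIRST));
        Thread t2 = new Thread(() -> pump(q1, q2, SECOND));
        t1.start();
        t2.start();
        try {
            for (String item = q2.take(); item != END; item = q2.take()) {
                results.add(item);
            }
            t1.join();
            t2.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return results;
    }

    // Moves items from one queue to the next, applying the plug-in on the way.
    private static void pump(LinkedBlockingQueue<String> in,
                             LinkedBlockingQueue<String> out,
                             UnaryOperator<String> plugin) {
        try {
            for (String item = in.take(); item != END; item = in.take()) {
                out.put(plugin.apply(item));
            }
            out.put(END); // pass the marker on so the next stage can stop too
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList(" a ", "b ");
        System.out.println(run(false, input) + " " + run(true, input));
    }
}
```

Both modes produce the same results in the same order. Only the way the work is scheduled differs.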

runThreads

private void runThreads()
It starts the threads for each plug-in on the work flow when the work flow is activated in the multi-thread mode.


close

public void close()
It ends the threads for each plug-in on the work flow. The shutdown() methods of each plug-in are called before they end.
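
One common way to end worker threads that wait on a blocking queue is to interrupt them and then join them, as sketched below. Whether the library uses interruption is an assumption, since this page only says that the threads end and that shutdown() is called first. The counter in the sketch stands in for the call to each plug-in's shutdown() method.

```java
import java.util.LinkedList;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicInteger;

class CloseSketch {
    // Counts how many workers ran their shutdown step.
    static final AtomicInteger shutdownCalls = new AtomicInteger();

    // Hypothetical plug-in worker: waits for input until it is interrupted.
    static class Worker extends Thread {
        private final LinkedBlockingQueue<String> in = new LinkedBlockingQueue<>();

        @Override
        public void run() {
            try {
                while (true) {
                    in.take(); // blocks; interrupt() makes it throw
                }
            } catch (InterruptedException e) {
                // interrupted by close: fall through to the shutdown step
            } finally {
                shutdownCalls.incrementAndGet(); // stands in for plugin.shutdown()
            }
        }
    }

    // Starts the given number of workers, then closes them all again.
    static int openAndClose(int pluginCount) {
        shutdownCalls.set(0);
        LinkedList<Thread> threadList = new LinkedList<>();
        for (int i = 0; i < pluginCount; i++) {
            Thread worker = new Worker();
            worker.start();
            threadList.add(worker);
        }
        for (Thread worker : threadList) {
            worker.interrupt();
        }
        for (Thread worker : threadList) {
            try {
                worker.join(); // wait until the shutdown step has finished
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return shutdownCalls.get();
    }

    public static void main(String[] args) {
        System.out.println(openAndClose(3));
    }
}
```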


clear

public void clear()
It removes the plug-ins on the work flow.


analyze

public void analyze(java.lang.String document)
It adds the specified input text to the input queue of the work flow. After calling this method, you can get the analysis result with one of the following methods:
- getResultOfSentence(): returns the result for the one sentence at the front of the result queue
- getResultOfDocument(): returns the entire result for all sentences
If the input document is large, getResultOfDocument() may perform worse, and it may be better to call getResultOfSentence() repeatedly.

Parameters:
document - a sequence of sentences separated by newlines
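
The two ways of reading results after analyze() can be compared with a small queue sketch. It is only an illustration. Plain strings stand in for analysis results, no analysis is done, and the non-blocking poll() is used for brevity although the documented getResultOfSentence() blocks when no result is available.

```java
import java.util.LinkedList;
import java.util.concurrent.LinkedBlockingQueue;

class AnalyzeInputSketch {
    // Splits a newline-separated document and queues each sentence in order.
    static LinkedBlockingQueue<String> enqueue(String document) {
        LinkedBlockingQueue<String> queue = new LinkedBlockingQueue<>();
        for (String sentence : document.split("\n")) {
            queue.add(sentence);
        }
        return queue;
    }

    // Per-sentence access: removes and returns only the item at the front.
    static String nextSentence(LinkedBlockingQueue<String> queue) {
        return queue.poll(); // non-blocking here; returns null when empty
    }

    // Whole-document access: removes everything that is currently queued.
    static LinkedList<String> restOfDocument(LinkedBlockingQueue<String> queue) {
        LinkedList<String> all = new LinkedList<>();
        queue.drainTo(all);
        return all;
    }

    public static void main(String[] args) {
        LinkedBlockingQueue<String> queue = enqueue("one\ntwo\nthree");
        System.out.println(nextSentence(queue));
        System.out.println(restOfDocument(queue));
    }
}
```

Reading sentence by sentence lets the caller start working on early results while later ones are still queued, whereas the whole-document form has to hold every result in one list.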

analyze

public void analyze(java.io.File document)
             throws java.io.IOException
It adds the specified input text to the input queue of the work flow. After calling this method, you can get the analysis result with one of the following methods:
- getResultOfSentence(): returns the result for the one sentence at the front of the result queue
- getResultOfDocument(): returns the entire result for all sentences
If the input document is large, getResultOfDocument() may perform worse, and it may be better to call getResultOfSentence() repeatedly.

Parameters:
document - the path for the text file to be analyzed
Throws:
java.io.IOException

getResultOfSentence

public <T> T getResultOfSentence(T a)
                      throws ResultTypeException
Returns the analysis result for the one sentence at the front of the result queue. You can call this method repeatedly to get the results for the remaining sentences in the input document. If there is no result, this method blocks until a new result arrives. It stores the analysis result in the specified object. The type of the returned object depends on the analysis phase of the work flow, so you must pass a parameter of the matching type. In this way you get the analysis result as an object of that type, and you don't need to parse the result string again. If you just want to see the result, consider using "String getResultOfSentence()" instead.

Type Parameters:
T - One of PlainSentence (for the first phase), SetOfSentences (for the second phase), and Sentence (for the third phase), matching the element types of queuePhase1, queuePhase2 and queuePhase3.
Parameters:
a - the object that receives the result
Returns:
the analysis result for the one sentence at the front of the result queue
Throws:
ResultTypeException
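
Passing an object to select the return type can be sketched as below. The nested PlainSentence, Sentence and ResultTypeException classes are empty hypothetical stand-ins that only share their names with the real classes, and the exact check the library performs is an assumption. The sketch also returns the stored result directly instead of filling the passed object.

```java
class ResultTypeSketch {
    // Hypothetical stand-ins; the real classes carry analysis data.
    static class PlainSentence { }
    static class Sentence { }
    static class ResultTypeException extends Exception {
        ResultTypeException(String message) {
            super(message);
        }
    }

    private final Object front; // the result waiting at the front of the queue

    ResultTypeSketch(Object front) {
        this.front = front;
    }

    // Returns the front result typed as T, or fails if 'a' has another type.
    @SuppressWarnings("unchecked")
    <T> T getResultOfSentence(T a) throws ResultTypeException {
        if (a.getClass() != front.getClass()) {
            throw new ResultTypeException("expected " + front.getClass().getSimpleName());
        }
        return (T) front;
    }

    // Convenience for the demo: true when the requested type matches.
    static boolean accepts(Object front, Object a) {
        try {
            new ResultTypeSketch(front).getResultOfSentence(a);
            return true;
        } catch (ResultTypeException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(accepts(new PlainSentence(), new PlainSentence()));
        System.out.println(accepts(new PlainSentence(), new Sentence()));
    }
}
```

The benefit of this style is that the caller gets a typed object back without a cast, at the cost of a run-time check when the type does not match the output phase.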

getResultOfSentence

public java.lang.String getResultOfSentence()
Returns the analysis result for the one sentence at the front of the result queue. You can call this method repeatedly to get the results for the remaining sentences in the input document. If there is no result, this method blocks until a new result arrives. It returns the string representation of the result. If you want to reuse the result, the string must be parsed, which requires extra code and causes overhead. To avoid this, consider using "<T> T getResultOfSentence(T a)" instead.

Returns:
the string representation of the analysis result for the one sentence at the front of the result queue
Throws:
ResultTypeException

getResultOfDocument

public <T> java.util.LinkedList<T> getResultOfDocument(T a)
                                            throws ResultTypeException
Returns the analysis result list for all sentences in the result. When you use this method, pay attention to the size of the data. If the data is large, this method may perform worse than calling getResultOfSentence() repeatedly. The type of the returned objects depends on the analysis phase of the work flow, so you must pass a parameter of the matching type. In this way you get the analysis result as objects of that type, and you don't need to parse the result string again. If you just want to see the result, consider using "String getResultOfDocument()" instead.

Type Parameters:
T - One of PlainSentence (for the first phase), SetOfSentences (for the second phase), and Sentence (for the third phase), matching the element types of queuePhase1, queuePhase2 and queuePhase3.
Parameters:
a - the object that specifies the return type
Returns:
the list of analysis results for all sentences in the document
Throws:
ResultTypeException

getResultOfDocument

public java.lang.String getResultOfDocument()
Returns the analysis results for all sentences in the result. When you use this method, pay attention to the size of the data. If the data is large, this method may perform worse than calling getResultOfSentence() repeatedly. It returns the string representation of the result. If you want to reuse the result, the string must be parsed, which requires extra code and causes overhead. To avoid this, consider using "<T> LinkedList<T> getResultOfDocument(T a)" instead.

Returns:
the string representation of the analysis results for all sentences in the document
Throws:
ResultTypeException

analyzeInSingleThread

private void analyzeInSingleThread()
Analyze the text in the single thread.