org.apache.uima.java true com.ibm.langware.annotator.jFrostLexAnnotator
LanguageWare Lexical Annotator
This annotator provides access to LanguageWare Lexical Analysis.
8.0.4.0
IBM Corporation
SofaNames The Sofa names the annotator should work on. If no names are specified, the annotator works on the default sofa. String true false
LWDataSubdir The name of the directory under the UIMA data directory in which the LanguageWare resources are located. String false false
UseExplicitDicts Dictionaries to be used are specified explicitly in this config file. Boolean false false
PreloadLanguages A list of all languages which should be pre-loaded at init-time in the form xx-YY (xx=lang, YY=sublang/country). String true false
DefaultLanguage The language to use in processing when the document language is not set before the annotator processing. String false false
DictionaryCacheSize !Deprecated! Maximum number of dictionaries held in cache. Integer false false
ProcessLanguagesWithNoDictionaries Control the annotator behaviour if no dictionaries are configured for the processed document language. If "tokenize", only basic tokenization will be possible. If "skip", processing will be terminated with no errors. If "error", an exception will be thrown. The default value is "skip". String false false
UseFirstMatchPolicy If true, lookup stops after the first match in any dictionary (DLTCM_POLICY_FIRST); otherwise all matches from all dictionaries are found (DLTCM_POLICY_ALL). Boolean false false
UseStrictCaseMode If true, strict-case mode is turned on: case information is respected when doing lookup in lowercase dictionaries. Otherwise, strict-case mode is turned off and a match is returned even if the case doesn't match.
Boolean false false
UseRelativeTokenAndSentenceNumbers If true, token and sentence numbers are reset to 1 for each new sentence/paragraph. Boolean false false
AnnotateMWConstituentTokens If true, MWU annotations will be created for multi-word entries and token annotations will be created for their constituent words; otherwise, only MWU annotations will be created. Boolean false false
MWBoundary Defines the lookup boundaries for MWUs. Possible values for this parameter are: "Sentence", "Paragraph", or "Document". String false false
IgnorePunctuationTokens If true, punctuation tokens are ignored. Boolean false false
AggressiveSentenceBreaks !Deprecated! If true, an end-of-line will be considered end-of-sentence. Boolean false false
CrossDictionaryDecomposition If true, decomposition is performed across dictionaries, i.e. words from several dictionaries may be combined into one compound. Boolean false false
BOFAOnlyDecomposition If true, decomposition is performed based on BOFA values only. Boolean false false
FilterDecomposedGlosses If true, the paradigms reported by decomposition for each component are filtered according to the decomposition rules, removing paradigms that are not valid in combination. Setting this to false may lead to better performance and recall at the expense of precision. Boolean false false
StandaloneDecomposition If true, the lexical analyzer tries to decompose dictionary-matched entries which have a compound flag. Boolean false false
JapaneseDecomposition If true, decomposition is done for Japanese documents without regard to the result specification. Boolean false false
JapaneseDeepWordBreak If true, Japanese word suffixes are returned separated from their stems. Boolean false false
CreateCompoundPartsInsteadOfToken If true, compound parts are created not as type uima.tt.CompPartAnnotation but as uima.tt.TokenAnnotation.
The annotations for the compound parts of a complex word are created instead of (not in addition to) the token for the whole complex word. Boolean false true
ReturnOnlyFirstLevelOfCompoundBreakdown If true, then for compounds which have several decompositions only the first (longest match) decomposition is returned. E.g. for the German "Segelschullehrer" only "Segelschul" + "lehrer" is returned and not also "Segel" + "schul" + "lehrer". Boolean false false
CreateDecompStructure If true, the full decomposition analysis structure is created. This option is intended to be used mutually exclusively with the previous two. Boolean false false
BreakOnHyphens !Deprecated! If true, the annotator tries to break an unknown word if it contains a hyphen. Boolean false false
DoLookupVariant If true, unknown words are looked up in the variant dictionary. Boolean false false
DoRuleBasedNormalization4All If true, a variant is looked up with rule-based normalization for all unknown words. Boolean false false
DoRuleBasedNormalization4Katakana If true, a variant is looked up with rule-based normalization only for katakana words. Boolean false false
CreateGenericAnnotations Create generic annotations if annotate glosses are available. Boolean false false
CheckGenericTypes Check the types when writing the feature values for generic annotations. Boolean false false
GlossComparatorClassname The full name of the class implementing the Comparator interface to be used for sorting gloss collections. String false false
LemmaPoolingThreshold A threshold that is used to control lemma pooling. Pooling improves the memory usage of the annotator, which helps when processing large documents. Setting the value to 0 means pooling is always enabled, while setting it to -1 disables pooling.
Integer false false
LexicalDicts File name of dictionaries for the lexical analysis String true false
MultiWordDicts File name of dictionaries for the specific multi-word unit String true false
OOVDicts File name of dictionaries for the morphological guesser (out-of-vocabulary) String true false
SynonymDicts File name of dictionaries for synonyms String true false
VariantDicts File name of dictionaries for word variants String true false
SpellCorrectionDicts File name of dictionaries for the spelling correction String true false
PartOfSpeechDict File name of dictionary for the Part-of-Speech Tagging String false false
PostTagHandling Post tag handling policy String false false
PostLemmaEntryHandling Post LemmaEntries handling policy String false false
MaxCharNumPerSentence The maximum number of characters in a sentence. Integer false false
BreakRulesSpec Break rules to be used. String false false
DecompositionRulesSpec Decomposition rules to be used. String false false
LWDataSubdir
PreloadLanguages en
UseExplicitDicts true
ProcessLanguagesWithNoDictionaries skip
UseFirstMatchPolicy false
UseStrictCaseMode false
UseRelativeTokenAndSentenceNumbers false
AnnotateMWConstituentTokens true
MWBoundary Sentence
IgnorePunctuationTokens false
CrossDictionaryDecomposition true
BOFAOnlyDecomposition false
FilterDecomposedGlosses true
JapaneseDecomposition true
JapaneseDeepWordBreak false
CreateCompoundPartsInsteadOfToken true
ReturnOnlyFirstLevelOfCompoundBreakdown false
CreateDecompStructure false
DoLookupVariant false
DoRuleBasedNormalization4All false
DoRuleBasedNormalization4Katakana false
CreateGenericAnnotations true
CheckGenericTypes false
GlossComparatorClassname com.ibm.langware.annotator.GlossComparator
PartOfSpeechDict de-XX-TSimplified-7220.dic LexicalDicts ../resources/dictionary/8/de-XX-LLex-7017.dic ../resources/dictionary/9/de-XX-OOV-7002.dic OOVDicts BreakRulesSpec
PartOfSpeechDict ru-RU-TSimplified-7200.dic LexicalDicts
../resources/dictionary/24/ru-RU-LLex-7003.dic ../resources/dictionary/25/ru-RU-OOV-7003.dic OOVDicts BreakRulesSpec
PartOfSpeechDict pt-XX-TSimplified-7001.dic LexicalDicts ../resources/dictionary/22/pt-XX-LLex-7008.dic ../resources/dictionary/23/pt-XX-OOV-7003.dic OOVDicts BreakRulesSpec
PartOfSpeechDict ko-KR-TKpos-8041.dic LexicalDicts OOVDicts BreakRulesSpec
PartOfSpeechDict en-XX-TPenn-7212.dic LexicalDicts ../resources/dictionary/0/en-XX-LLex-7030.dic ../resources/dictionary/1/en-XX-OOV-7004.dic OOVDicts BreakRulesSpec
PartOfSpeechDict it-IT-TSimplified-7001.dic LexicalDicts ../resources/dictionary/15/it-IT-LLex-7007.dic ../resources/dictionary/16/it-IT-OOV-7002.dic OOVDicts BreakRulesSpec
PartOfSpeechDict fr-XX-TSimplified-7001.dic LexicalDicts ../resources/dictionary/12/fr-XX-LLex-7009.dic ../resources/dictionary/13/fr-XX-OOV-7002.dic OOVDicts BreakRulesSpec
PartOfSpeechDict zh-XX-TCpos-7000.dic LexicalDicts ../resources/dictionary/26/zh-XX-Lex-8003.dic OOVDicts BreakRulesSpec
PartOfSpeechDict es-ES-TSimplified-7002.dic LexicalDicts ../resources/dictionary/10/es-ES-LLex-7006.dic ../resources/dictionary/11/es-ES-OOV-7003.dic OOVDicts BreakRulesSpec
PartOfSpeechDict cs-CZ-TSimplified-7200.dic LexicalDicts ../resources/dictionary/4/cs-CZ-LLex-7003.dic ../resources/dictionary/5/cs-CZ-OOV-7004.dic OOVDicts BreakRulesSpec
PartOfSpeechDict ar-XX-TSimplified-7003.dic LexicalDicts ../resources/dictionary/2/ar-XX-Lex-7007.dic ../resources/dictionary/3/ar-XX-OOV-7003.dic OOVDicts BreakRulesSpec
PartOfSpeechDict ja-JP-TJpos-7000.dic LexicalDicts ../resources/dictionary/17/ja-JP-Lex-7006.dic OOVDicts BreakRulesSpec
PartOfSpeechDict pl-PL-TSimplified-7200.dic LexicalDicts ../resources/dictionary/20/pl-PL-LLex-7003.dic ../resources/dictionary/21/pl-PL-OOV-7004.dic OOVDicts BreakRulesSpec
PartOfSpeechDict da-DK-TSimplified-7000.dic LexicalDicts ../resources/dictionary/6/da-DK-LLex-7005.dic ../resources/dictionary/7/da-DK-OOV-7002.dic OOVDicts BreakRulesSpec
PartOfSpeechDict he-IL-TSimplified-7201.dic LexicalDicts ../resources/dictionary/14/he-IL-Lex-7205.dic OOVDicts BreakRulesSpec
PartOfSpeechDict tr-TR-TTpos-8502.dic LexicalDicts OOVDicts BreakRulesSpec
PartOfSpeechDict nl-NL-TSimplified-7000.dic LexicalDicts ../resources/dictionary/18/nl-NL-Reform-LLex-7004.dic ../resources/dictionary/19/nl-NL-OOV-7002.dic OOVDicts BreakRulesSpec
uima.tcas.DocumentAnnotation Annotation covering the entire document, containing document meta information, for example the document language uima.tcas.Annotation
  language The document language uima.cas.String
  languageCandidates A list of language candidates for the document produced during language identification. These are sorted by confidence value uima.cas.FSList
uima.tt.TTAnnotation Base type for lexical and document structure annotation types uima.tcas.Annotation
uima.tt.DocStructureAnnotation Base type for document structure annotation types uima.tt.TTAnnotation
uima.tt.ParagraphAnnotation A paragraph uima.tt.DocStructureAnnotation
  paragraphNumber The sequence number of the paragraph in the document uima.cas.Integer
uima.tt.SentenceAnnotation A sentence uima.tt.DocStructureAnnotation
  sentenceNumber The sequence number of the sentence in the paragraph (or the document) uima.cas.Integer
uima.tt.LexicalAnnotation Base type for lexical annotation types uima.tt.TTAnnotation
uima.tt.DictionaryEntryAnnotation Base type for dictionary-based user-defined annotation types uima.tt.LexicalAnnotation
  lemma Morphological information for the dictionary entry uima.tt.Lemma
uima.tt.TokenLikeAnnotation Base type for token annotation types uima.tt.LexicalAnnotation
  lemma The best probable entry containing all morphological information for the token uima.tt.Lemma
  lemmaEntries List of lemma entries containing all morphological information for the token uima.cas.FSArray
  dictionaryMatch A flag indicating whether or not the token matches a dictionary entry uima.cas.Boolean
uima.tt.TokenAnnotation General
token annotation type. It is also the base type for the special token types uima.tt.TokenLikeAnnotation
  posTag Part-of-Speech tag uima.cas.String
uima.tt.CompPartAnnotation A part of a compound word uima.tt.TokenLikeAnnotation
uima.tt.KeyStringEntry Base type for types defining key/value feature (e.g. uima.tt.Lemma type) uima.cas.TOP
  key A key/value feature (e.g. lemma string in uima.tt.Lemma type) uima.cas.String
uima.tt.Lemma Morphological information retrieved from a lexical dictionary entry uima.tt.KeyStringEntry
  partOfSpeech An integral encoding representing the part-of-speech for the lemma uima.cas.Integer
  frost_ExtendedPOS An integer representing additional information related to the part-of-speech uima.cas.Integer
  isStopword uima.cas.Boolean
uima.tt.LanguageConfidencePair Language-Confidence pair of a language candidate for the document text uima.cas.TOP
  languageConfidence An indication (a float value between 0 and 1) of how well the candidate language actually fits the language of the document uima.cas.Float
  language Language name (ISO Locale code) uima.cas.String
com.ibm.langware.uimatypes.WordLikeToken Base type for possible words (not punctuations nor symbols).
Also represents alphanumeric tokens uima.tt.TokenAnnotation
com.ibm.langware.uimatypes.Alphabetic Alphabetic word com.ibm.langware.uimatypes.WordLikeToken
com.ibm.langware.uimatypes.UppercaseAlphabetic Uppercase alphabetic word com.ibm.langware.uimatypes.Alphabetic
com.ibm.langware.uimatypes.TitlecaseAlphabetic Titlecase alphabetic word com.ibm.langware.uimatypes.Alphabetic
com.ibm.langware.uimatypes.LowercaseAlphabetic Lowercase alphabetic word com.ibm.langware.uimatypes.Alphabetic
com.ibm.langware.uimatypes.Arabic Arabic word com.ibm.langware.uimatypes.Alphabetic
com.ibm.langware.uimatypes.Hebrew Hebrew word com.ibm.langware.uimatypes.Alphabetic
com.ibm.langware.uimatypes.Syllabic Syllabic word com.ibm.langware.uimatypes.WordLikeToken
com.ibm.langware.uimatypes.Hiragana Hiragana (Syllabic) word com.ibm.langware.uimatypes.Syllabic
com.ibm.langware.uimatypes.Katakana Katakana (Syllabic) word com.ibm.langware.uimatypes.Syllabic
com.ibm.langware.uimatypes.Hangul Hangul (Syllabic) word com.ibm.langware.uimatypes.Syllabic
com.ibm.langware.uimatypes.Ideographic Ideographic word com.ibm.langware.uimatypes.WordLikeToken
com.ibm.langware.uimatypes.Han Han (Ideographic) word com.ibm.langware.uimatypes.Ideographic
com.ibm.langware.uimatypes.Numeric A numeric sequence com.ibm.langware.uimatypes.WordLikeToken
com.ibm.langware.uimatypes.ChineseNumeral A Chinese numeral com.ibm.langware.uimatypes.Numeric
com.ibm.langware.uimatypes.Punctuation A punctuation or symbol uima.tt.TokenAnnotation
com.ibm.langware.uimatypes.ClauseEndingPunctuation A clause terminating punctuation com.ibm.langware.uimatypes.Punctuation
uima.tt.ParagraphAnnotation uima.tt.SentenceAnnotation uima.tt.TokenAnnotation uima.tt.TokenAnnotation:lemma uima.tt.TokenAnnotation:lemmaEntries x-unspecified
uima.tt.ParagraphAnnotation uima.tt.SentenceAnnotation uima.tt.TokenAnnotation uima.tt.CompPartAnnotation uima.tt.Lemma uima.tt.ParagraphAnnotation:paragraphNumber uima.tt.SentenceAnnotation:sentenceNumber
uima.tt.TokenAnnotation:posTag uima.tt.TokenAnnotation:lemmaEntries uima.tt.TokenAnnotation:dictionaryMatch uima.tt.Lemma:key uima.tt.Lemma:partOfSpeech uima.tt.Lemma:isStopword uima.tt.Lemma:frost_ExtendedPOS
en af ar ca cs da de el es fr he it ja ko nb nl nn pl pt ru sv tr zh
true true false
ResourcesFile Location of Resources ../resources/Tagger/ /
Resources ResourcesFile