Num. | Name | Last modification date |
---|---|---|
1. | cache | 2005.08.24 |
2. | calc_perplexity | 2006.04.26 |
3. | cluster | 2003.09.10 |
4. | decay | 2005.08.26 |
5. | dl | 2006.04.26 |
6. | evallm_1 | 2002.12.02 |
7. | gram2gramcl | 2003.09.03 |
8. | idg2ids_idg | 2006.03.02 |
9. | idg2ids_idgs | 2005.11.15 |
10. | idg2rev_idg | 2004.11.18 |
11. | idngram2lm | 2005.11.24 |
12. | initClasses | 2004.01.27 |
13. | intCache | 2004.07.07 |
14. | interpolateEM | 2005.08.23 |
15. | kn1 | 2006.04.26 |
16. | lmManager | 2006.06.06 |
17. | mergeGrams | 2004.02.05 |
18. | rescorer | 2006.06.06 |
19. | skaldymas | 2006.03.07 |
20. | skaldymas_class | 2003.12.02 |
21. | text2idngram | 2005.11.24 |
22. | text2idngram_s | 2006.11.24 |
23. | text2wfreq | 2004.05.13 |
24. | topicCluster | 2004.08.23 |
25. | vocabFromTrigram | 2004.08.23 |
26. | wfreq2id_wc | 2004.03.30 |
27. | wfreq2idsg | 2006.10.09 |
28. | wfreq2vocab | 2004.03.30 |
29. | zodynasClass | 2003.04.28 |

Parameters of cache:

N. | Name | In/Out | Type | Info |
---|---|---|---|---|
1. | cacheSize | In | int | 1..integer - ; |
2. | decay | In | file | Decay file for cache models |
3. | fileout | Out | file | Probability file((Word Num.); probability;) |
4. | fileout1 | Out | file | File for output of bigram cache information: 1-if w(i-1) in cache, 0-otherwise |
5. | start | In | string | start - starts normal cache; start_bigram - starts bigram cache; start_bigram_decay - starts bigram cache with decay function; start_decay - starts cache with decay function; |
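
As a rough illustration of what a cache of size cacheSize does, the sketch below scores each word by its relative frequency among the preceding words held in the cache. This is an assumption about the general technique, not the program's actual code; the function name `cache_probabilities` and the in-memory list input are hypothetical.

```python
from collections import Counter, deque

def cache_probabilities(word_ids, cache_size):
    """For each word, return its relative frequency among the previous
    `cache_size` words (0.0 while the cache is still empty)."""
    window = deque()      # the last `cache_size` words, oldest first
    counts = Counter()    # word counts inside the window
    probs = []
    for w in word_ids:
        probs.append(counts[w] / len(window) if window else 0.0)
        window.append(w)
        counts[w] += 1
        if len(window) > cache_size:
            old = window.popleft()
            counts[old] -= 1
    return probs
```

A decay function, as selected by start_decay, would replace the uniform weighting inside the window with weights that fall off with distance.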

Parameters of calc_perplexity:

N. | Name | In/Out | Type | Info |
---|---|---|---|---|
1. | close | In | int | 1 - Program exits after calculation has been completed; |
2. | count | In | int | 0 - word count will be detected automatically according to "datafile"; k - word count in test corpus; |
3. | datafile | In | file | Probability file((Word Num.); probability;) |
4. | info | Out | file | Out text file for the information about the calculation process |
5. | start | In | int | 1 - Starts calculating; |
6. | startin | In | string | - Starting directory; |
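
For orientation, perplexity over a list of per-word probabilities can be computed as in the sketch below. It assumes the probabilities have already been parsed from the probability file into floats; the function name `perplexity` is hypothetical, and the handling of `count` only mirrors the description in the table above.

```python
import math

def perplexity(probs, count=0):
    """Perplexity of a test corpus from per-word probabilities.

    count=0 takes the word count from the data itself;
    count=k uses k as the word count of the test corpus.
    """
    n = count if count > 0 else len(probs)
    if n == 0:
        raise ValueError("no words to evaluate")
    log_sum = sum(math.log(p) for p in probs)
    return math.exp(-log_sum / n)
```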

Parameters of cluster:

N. | Name | In/Out | Type | Info |
---|---|---|---|---|
1. | cCount | In | int | k - Class count k; |
2. | classf | In | file | Class map file (Word class) |
3. | classfOut | Out | file | Class map file (Word class) |
4. | info | Out | file | Out text file for the information about the calculation process |
5. | iteration | In | int | 1.. - Iterations of clustering algorithm; |
6. | vocSize | In | int | v - Vocabulary size v; |

Parameters of decay:

N. | Name | In/Out | Type | Info |
---|---|---|---|---|
1. | cacheSize | In | int | 1.. - cache size for decay function; |
2. | d | In | int | 1..3 - specifies the position of the calculated reoccurrences; used with "start" and "startbigram"; |
3. | fileout | Out | file | Decay file for cache models |
4. | start | In | string | start - Calculates histogram of word reoccurrences; startbigram - Calculates histogram of bigram reoccurrences; startdecayem - decay function is being evaluated using EM; startdecayembi - bigram decay function is being evaluated using EM; startword - Calculates histogram of word reoccurrences for specified "word"; |
5. | word | In | string | - specifies the word to estimate decay for; |

Parameters of dl:

N. | Name | In/Out | Type | Info |
---|---|---|---|---|
1. | beamSize | In | int | p - beam size for Viterbi search; |
2. | contextWords | In | int | 0 - no cross-word triphones; k>0 - context phonemes are used by expanding k last words; |
3. | featuresIn | In | file | text file of feature vectors (in HTK format); |
4. | fileOut | Out | file | N-best list file in mlf format (HTK), recognized sequences; |
5. | hmm | In | file | acoustic models in HTK format; |
6. | info | Out | file | Out text file for the information about the calculation process |
7. | lmContext | In | int | (n-1)>0 - order of ngram model; 0 - unigram; 1 - bigram; |
8. | lmScale | In | int | 0 - language model is not used; >0 - language model weight; |
9. | nbest | In | int | 1 - returns only the best word sequence; N - returns N-best list; |
10. | pron | In | file | pronunciation vocabulary (in HTK format); |
11. | skipFrames | In | int | s - skips s frames in stack search; |
12. | start | In | string | decode - decodes; load - load acoustic models for online decoding; |
13. | tiedList | In | file | tied list of phonemes in HTK format; |
14. | tree | In | file | questions tree (in HTK format) for synthesis of new phonemes; |
15. | useSynthesis | In | int | 0 - none; 1 - synthesizes new phonemes according to question tree; |
16. | wordEndBeamSize | In | int | p - beam size at the word level; |
17. | wordInsertionPenalty | In | int | p - word insertion penalty; |

Parameters of evallm_1:

N. | Name | In/Out | Type | Info |
---|---|---|---|---|
1. | -annotate | Out | file | Annotation |
2. | -binary | In | file | Binary language model file |
3. | -fl | In | file | File list of text corpus |
4. | -oovs | Out | file | OOV file (word) |

Parameters of gram2gramcl:

N. | Name | In/Out | Type | Info |
---|---|---|---|---|
1. | classf | In | file | Class map file (Word class) |
2. | gram | In | file | Idngram file (id0 id1 ... idN count), Word trigram; |
3. | gramOut | Out | file | Idngram file (id0 id1 ... idN count), Class trigram; |
4. | info | Out | file | Out text file for the information about the calculation process |
5. | start | In | int | 1 - starts working; |
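
Turning a word n-gram file into a class n-gram file amounts to replacing every word id with its class id and summing the counts of n-grams that collapse onto the same class sequence. The sketch below shows that idea on in-memory records; the function name `to_class_ngrams` and the dict-based class map are assumptions for illustration, not the program's interface.

```python
from collections import Counter

def to_class_ngrams(word_ngrams, class_of):
    """Map each word id in every n-gram to its class id and merge counts.

    word_ngrams: iterable of (ids, count), ids being a tuple of word ids.
    class_of:    mapping word id -> class id (the class map).
    """
    merged = Counter()
    for ids, count in word_ngrams:
        merged[tuple(class_of[i] for i in ids)] += count
    return sorted(merged.items())
```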

Parameters of idg2ids_idg:

N. | Name | In/Out | Type | Info |
---|---|---|---|---|
1. | galuniuFile | In | file | Ordered list of endings |
2. | idngram | In | file | Idngram file (id0 id1 ... idN count) |
3. | idngramG | Out | file | Idngram file (id0 id1 ... idN count), idtrigram of word endings; |
4. | idngramS | Out | file | Idngram file (id0 id1 ... idN count), idtrigram of word beginning; |
5. | neskaidytiFile | In | file | list of words that will not be split; |
6. | vocab | In | file | Vocabulary |
7. | vocabG | Out | file | Vocabulary, Vocabulary of word endings; |
8. | vocabGFreq | Out | file | Word frequency file, frequency file of word endings; |
9. | vocabS | Out | file | Vocabulary, Vocabulary of word beginning; |
10. | vocabSkaldymas | Out | file | Word splitting information; |

Parameters of idg2ids_idgs:

N. | Name | In/Out | Type | Info |
---|---|---|---|---|
1. | galuniuFile | In | file | Ordered list of endings |
2. | idngram | In | file | Idngram file (id0 id1 ... idN count) |
3. | idngramGS | Out | file | Idngram file (id0 id1 ... idN count), idngram ggsg (ending ending beginning ending); |
4. | idngramS | Out | file | Idngram file (id0 id1 ... idN count), idtrigram of word beginning; |
5. | neskaidytiFile | In | file | list of words that will not be split; |
6. | vocab | In | file | Vocabulary |
7. | vocabG | Out | file | Vocabulary, Vocabulary of word endings; |
8. | vocabGFreq | Out | file | Word frequency file, frequency file of word endings; |
9. | vocabS | Out | file | Vocabulary, Vocabulary of word beginning; |
10. | vocabSkaldymas | Out | file | Word splitting information; |

Parameters of idg2rev_idg:

N. | Name | In/Out | Type | Info |
---|---|---|---|---|
1. | -ascii_input | In | | - Idngram is loaded from text file; |
2. | -fin | In | file | Idngram file (id0 id1 ... idN count) |
3. | -fout | Out | file | Reversed idngram file (idN id(N-1) ... id0 count) |
4. | -n N | In | int | 1..integer - Order of the ngram; |
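
The two file formats above differ only in the order of the ids within each record. A minimal sketch of the conversion on in-memory records follows; the function name `reverse_idngrams` is hypothetical, and re-sorting the output is an assumption (sorted order is commonly expected by downstream n-gram tools, but the source does not state it).

```python
def reverse_idngrams(ngrams):
    """Reverse the id order of each (ids, count) record, then re-sort.

    (id0, id1, ..., idN) count  ->  (idN, ..., id1, id0) count
    """
    return sorted((tuple(reversed(ids)), count) for ids, count in ngrams)
```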

Parameters of idngram2lm:

N. | Name | In/Out | Type | Info |
---|---|---|---|---|
1. | -ascii_input | In | | - Idngram is loaded from text file; |
2. | -binary | Out | file | Binary language model file |
3. | -good_turing | In | | - Smoothing type; |
4. | -idngram | In | file | Idngram file (id0 id1 ... idN count) |
5. | -n N | In | int | 1..integer - Order of the ngram; |
6. | -vocab | In | file | Vocabulary |

Parameters of initClasses:

N. | Name | In/Out | Type | Info |
---|---|---|---|---|
1. | cCount | In | int | k - Class count k; |
2. | fOut | Out | file | Class map file (Word class) |
3. | info | Out | file | Out text file for the information about the calculation process |
4. | wCount | In | int | v - Vocabulary size v; |

Parameters of intCache:

N. | Name | In/Out | Type | Info |
---|---|---|---|---|
1. | ann | Out | file | File for dynamic interpolated lambdas |
2. | cache | In | int | 1..integer - cache for dynamic interpolate; |
3. | datafile0..2 | In | file | Probability file((Word Num.); probability;) |
4. | datafile2i | In | file | File for output of bigram cache information: 1-if w(i-1) in cache, 0-otherwise |
5. | lamdastart | In | file | Lambda file for models interpolation arrayd: double double ..., array of lambdas for ngram, cache unigram and bigram; |
6. | lamdastart1 | In | file | Lambda file for models interpolation arrayd: double double ..., array of lambdas for ngram and cache unigram; |
7. | probCount | In | int | 3 - count of interpolation models; |
8. | probOut | Out | file | Probability file((Word Num.); probability;) |
9. | start | In | string | calculate - Calculates perplexity and OOV; interpolate_dynamic - Dynamic interpolation. Model weights are set using "cache" history before calculating word probability estimate; |

Parameters of interpolateEM:

N. | Name | In/Out | Type | Info |
---|---|---|---|---|
1. | ann | Out | file | File for dynamic interpolated lambdas |
2. | cache | In | int | 1..integer - cache for dynamic interpolate; |
3. | datafile0..(probCount-1) | In | file | Probability file((Word Num.); probability;) |
4. | lamdaout | Out | file | Lambda file for models interpolation arrayd: double double ... |
5. | lamdastart | In | file | Lambda file for models interpolation arrayd: double double ... |
6. | probCount | In | int | 1..integer - count of interpolation models; |
7. | probOut | Out | file | Probability file((Word Num.); probability;) |
8. | start | In | string | calculate - Calculates perplexity and OOV; interpolate - Interpolates; interpolate_dynamic - Dynamic interpolation. Model weights are set using "cache" history before calculating word probability estimate; |
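
Interpolation weights (lambdas) for several probability files are typically estimated with the EM algorithm: each pass computes how much of every word's mixture probability each model is responsible for, then renormalizes. The sketch below is a generic version of that procedure, assuming the probability files have been loaded as equal-length lists; the name `em_lambdas` and the fixed iteration count are illustrative choices, not taken from the program.

```python
def em_lambdas(model_probs, lambdas, iterations=20):
    """Estimate interpolation weights with EM.

    model_probs[m][i] is the probability model m assigns to word i.
    lambdas is the starting weight vector (it should sum to 1).
    """
    n_models = len(model_probs)
    n_words = len(model_probs[0])
    lambdas = list(lambdas)
    for _ in range(iterations):
        expected = [0.0] * n_models
        for i in range(n_words):
            mix = sum(lambdas[m] * model_probs[m][i] for m in range(n_models))
            if mix == 0.0:
                continue  # no model explains this word; skip it
            for m in range(n_models):
                # E-step: share of word i attributed to model m
                expected[m] += lambdas[m] * model_probs[m][i] / mix
        total = sum(expected)
        if total == 0.0:
            break
        # M-step: renormalize the accumulated shares
        lambdas = [e / total for e in expected]
    return lambdas
```

Dynamic interpolation, as described for interpolate_dynamic, would rerun this kind of update over only the most recent "cache" words before scoring each new word.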

Parameters of kn1:

N. | Name | In/Out | Type | Info |
---|---|---|---|---|
1. | ann | Out | file | Annotation |
2. | close | In | int | 1 - Program exits after calculation has been completed; |
3. | filein | In | file | Text file for word ngram probabilities evaluation (Word num ; X; word0 ;...; wordX; ) |
4. | fileout | Out | file | Probability file((Word Num.); probability;) |
5. | gramKN | In | file | Reversed idngram file (idN id(N-1) ... id0 count) |
6. | gramVoc0..N | In | int | X - gramVocA=X means that the Ath id of the ngram is taken from vocabulary vocabKNX; |
7. | info | Out | file | Out text file for the information about the calculation process |
8. | n=N | In | int | 1..integer - Order of the ngram; |
9. | oov | Out | file | OOV file (word) |
10. | start | In | string | calc - Calculates ngram probabilities; |
11. | startin | In | string | - Starting directory; |
12. | vocabKN0..N | In | file | Vocabulary |
13. | wordOrder | In | int | 1 - "filein" represents a data file for evaluation of model P(g|ggs); 2 - "filein" represents a word endings ngram file; 3 - "filein" represents a word beginning ending ngram file for evaluation of model P(g|s); 4 - "filein" represents a word ngram file; |

Parameters of lmManager:

N. | Name | In/Out | Type | Info |
---|---|---|---|---|
1. | EMHist | In | int | 1.. - word count for setting dynamic model weights (lamdaDinamic=1); |
2. | info | Out | file | Out text file for the information about the calculation process |
3. | lamdaDinamic | In | int | 0 - static model weights; 1 - dynamic model weights; |
4. | lm | In | file | settings file for model information; |
5. | start | In | string | load - load language model; |
6. | type | In | int | 1 - word ngram with Good-Turing smoothing. See settings file; 10 - skip bigram model. See settings file; 2 - cache models. See settings file; 3 - topic mixture model. See settings file; 5 - class-based model. See settings file; 8 - word ngram with Kneser-Ney smoothing. See settings file; |

Parameters of mergeGrams:

N. | Name | In/Out | Type | Info |
---|---|---|---|---|
1. | fileList | In | file | List of links to text idunigram files of particular topic |
2. | fileOut | Out | file | Idngram file (id0 id1 ... idN count), topic idtrigram file; |
3. | info | Out | file | Out text file for the information about the calculation process |
4. | start | In | int | 1 - starts working; |
5. | vocab | In | file | Vocabulary |
6. | vocabOut | In | file | Vocabulary, topic vocabulary; |

Parameters of rescorer:

N. | Name | In/Out | Type | Info |
---|---|---|---|---|
1. | amFactor | In | float | x - acoustic model weight; |
2. | format | In | int | v - Vocabulary size v; |
3. | info | Out | file | Out text file for the information about the calculation process |
4. | insertPenalty | In | float | x - word insertion penalty; |
5. | listIn | In | file | Text file for word ngram probabilities evaluation (Word num ; X; word0 ;...; wordX; ) |
6. | lmFactor | In | float | x - language model weight; |
7. | nbest | In | int | N - rescores the list of N sequences; |
8. | nbestIn | In | file | N-best list file in mlf format (HTK), N-best list; |
9. | out | Out | file | N-best list file in mlf format (HTK), rescored best list out (start="nbest"); Probability file((Word Num.); probability;), start="calcperplexitylist"; |
10. | saveall | In | int | 0 - only the best sequence is saved to out file; 1 - all sequences are saved to out file; |
11. | start | In | string | calcperplexitylist - calculates perplexity; nbest - rescores N-best list; |
12. | usecache | In | int | 0 - the last best word sequence is not added to word history that is used for evaluation of further sequences; 1 - the last best word sequence is added to word history that is used for evaluation of further sequences; |
13. | usereverse | In | int | 0 - none; 1 - reversed model evaluation; |
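
The amFactor, lmFactor and insertPenalty parameters suggest the usual log-linear score combination used when rescoring N-best lists. The sketch below assumes that form (weighted acoustic log score plus weighted language-model log score plus a per-word penalty); the exact formula and sign conventions used by the program are not given in the source, and the name `rescore` and the tuple layout are hypothetical.

```python
def rescore(hypotheses, am_factor, lm_factor, insert_penalty):
    """Sort N-best hypotheses by combined score, best first.

    Each hypothesis is (words, am_logprob, lm_logprob).
    Assumed score: am_factor*am + lm_factor*lm + insert_penalty*len(words).
    """
    def total(hyp):
        words, am, lm = hyp
        return am_factor * am + lm_factor * lm + insert_penalty * len(words)
    return sorted(hypotheses, key=total, reverse=True)
```

With saveall=0 only the first element of the returned list would be written out; with saveall=1, the whole list.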

Parameters of skaldymas:

N. | Name | In/Out | Type | Info |
---|---|---|---|---|
1. | filein | In | file | File list of text corpus |
2. | fileout | Out | file | Text file for word ngram probabilities evaluation (Word num ; X; word0 ;...; wordX; ) |
3. | galuniuFile | In | file | Ordered list of endings |
4. | neskaidytiFile | In | file | list of words that will not be split; |
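
An ordered list of endings together with a list of words that must not be split points to a simple suffix-matching split into word beginning and ending. The sketch below shows one plausible reading: the first ending in the list that matches wins. The matching rule, the name `split_word`, and the requirement that the beginning be non-empty are assumptions, not documented behavior.

```python
def split_word(word, endings, no_split=frozenset()):
    """Return (beginning, ending); ending is "" when the word is not split.

    endings:  ordered list of endings; the first match wins.
    no_split: words that are never split.
    """
    if word in no_split:
        return word, ""
    for ending in endings:
        # require a non-empty beginning to remain after the split
        if ending and word.endswith(ending) and len(word) > len(ending):
            return word[: -len(ending)], ending
    return word, ""
```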

Parameters of skaldymas_class:

N. | Name | In/Out | Type | Info |
---|---|---|---|---|
1. | close | In | int | 1 - Program exits after calculation has been completed; |
2. | filein | In | file | File list of text corpus |
3. | fileout | Out | file | Text file for word ngram probabilities evaluation (Word num ; X; word0 ;...; wordX; ) |
4. | info | Out | file | Out text file for the information about the calculation process |
5. | n=N | In | int | 1..integer - Order of the ngram; |
6. | skip=s | In | int | 0..integer - skip of s words for skip bigram model (start="start_2g_skip"); |
7. | start | In | string | start - prepares file for class evaluation; start_2g_skip - prepares file for skip bigram evaluation; start_3g - prepares file for trigram evaluation; start_3g_reverse - prepares file for reverse trigram evaluation; start_reverse - prepares file for reverse class evaluation; |
8. | startin | In | string | - Starting directory; |

Parameters of text2idngram:

N. | Name | In/Out | Type | Info |
---|---|---|---|---|
1. | -fl | In | file | File list of text corpus |
2. | -fout | Out | file | Idngram file (id0 id1 ... idN count) |
3. | -n N | In | int | 1..integer - Order of the ngram; |
4. | -vocab | In | file | Vocabulary |
5. | -write_ascii | In | | - Result in text format; |
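
The idngram records (id0 id1 ... idN count) can be pictured as counts of every run of n consecutive word ids. The sketch below works on a list of ids that has already been mapped through the vocabulary; the name `count_idngrams` is hypothetical, and handling of out-of-vocabulary words and document boundaries is left out.

```python
from collections import Counter

def count_idngrams(word_ids, n):
    """Count every run of n consecutive ids; return sorted (ids, count) records."""
    counts = Counter(
        tuple(word_ids[i:i + n]) for i in range(len(word_ids) - n + 1)
    )
    return sorted(counts.items())
```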

Parameters of text2idngram_s:

N. | Name | In/Out | Type | Info |
---|---|---|---|---|
1. | -fl | In | file | File list of text corpus |
2. | -fout | Out | file | Idngram file (id0 id1 ... idN count) |
3. | -skip s | In | int | 0..integer - skip words; |
4. | -vocab | In | file | Vocabulary |
5. | -write_ascii | In | | - Result in text format; |
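
A skip bigram pairs each word with the word that follows it after s skipped words. The sketch below assumes that reading of "-skip s", so that s = 0 yields ordinary bigrams; the name `count_skip_bigrams` is hypothetical and the program's exact definition of the skip distance is not stated in the source.

```python
from collections import Counter

def count_skip_bigrams(word_ids, skip):
    """Count pairs (w[i], w[i + skip + 1]); skip=0 gives ordinary bigrams."""
    step = skip + 1
    counts = Counter(zip(word_ids, word_ids[step:]))
    return sorted(counts.items())
```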

Parameters of text2wfreq:

N. | Name | In/Out | Type | Info |
---|---|---|---|---|
1. | -fl | In | file | File list of text corpus |
2. | -fout | Out | file | Word frequency file |

Parameters of topicCluster:

N. | Name | In/Out | Type | Info |
---|---|---|---|---|
1. | articleCount | In | int | a - text count a; |
2. | articleList | In | file | List of links to text idunigram files (number; idunigram) |
3. | classCount | In | int | k - Class count k; |
4. | classf | In | file | Class map file (Word class) |
5. | classfOut | Out | file | Class map file (Word class) |
6. | fileList | Out | file | List of links to text idunigram files of particular topic, file name for topic idngram list (start="start_filelist"); |
7. | info | Out | file | Out text file for the information about the calculation process |
8. | iteration | In | int | 1.. - Iterations of clustering algorithm; |
9. | newExt | Out | string | - extension of idngrams (start="start_filelist"); |
10. | newPath | Out | string | - path of idngrams (start="start_filelist"); |
11. | start | In | string | start_filelist - prepares idngram file list for every topic; start_tfidf - clustering using TFIDF criterion; start_unigram - clustering using unigram perplexity criterion; |
12. | wordCount | In | int | v - Vocabulary size v; |

Parameters of vocabFromTrigram:

N. | Name | In/Out | Type | Info |
---|---|---|---|---|
1. | fileList | In | file | List of links to text idunigram files of particular topic |
2. | info | Out | file | Out text file for the information about the calculation process |
3. | newVocab | Out | file | Vocabulary, topic vocabulary; |
4. | start | In | int | 1 - starts working; |
5. | vocab | In | file | Vocabulary |

Parameters of wfreq2id_wc:

N. | Name | In/Out | Type | Info |
---|---|---|---|---|
1. | 2gram | Out | file | Idngram file (id0 id1 ... idN count), Class-word idbigram; |
2. | classf | In | file | Class map file (Word class) |
3. | vocab | In | file | Vocabulary |
4. | wfreq | In | file | Word frequency file |

Parameters of wfreq2idsg:

N. | Name | In/Out | Type | Info |
---|---|---|---|---|
1. | 2gram | Out | file | Idngram file (id0 id1 ... idN count), Idbigram of word beginning and ending; |
2. | galuniuFile | In | file | Ordered list of endings |
3. | neskaidytiFile | In | file | list of words that will not be split; |
4. | vocabG | In | file | Vocabulary, Vocabulary of word endings; |
5. | vocabS | In | file | Vocabulary, Vocabulary of word beginning; |
6. | wfreq | In | file | Word frequency file |

Parameters of wfreq2vocab:

N. | Name | In/Out | Type | Info |
---|---|---|---|---|
1. | -gt N | In | int | 1..integer - The words that appeared in text corpus more than N - 1 times; |
2. | -top N | In | int | 1..integer - The N most frequent words; |
3. | < | In | file | Word frequency file |
4. | > | Out | file | Vocabulary |
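
The two selection rules above can be sketched as follows, working on a word-to-count mapping rather than on the files themselves. The "-gt N" rule is implemented as stated in the table (count greater than N - 1); the name `select_vocab`, the tie-breaking by word, and the alphabetical output order are assumptions.

```python
def select_vocab(wfreq, gt=None, top=None):
    """Pick vocabulary words from a {word: count} mapping.

    gt=N  keeps words that occur more than N - 1 times.
    top=N keeps the N most frequent words (ties broken alphabetically).
    """
    items = sorted(wfreq.items(), key=lambda kv: (-kv[1], kv[0]))
    if gt is not None:
        items = [(w, c) for w, c in items if c > gt - 1]
    if top is not None:
        items = items[:top]
    return sorted(w for w, _ in items)
```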

Parameters of zodynasClass:

N. | Name | In/Out | Type | Info |
---|---|---|---|---|
1. | classf | In | file | Class map file (Word class) |
2. | info | Out | file | Out text file for the information about the calculation process |
3. | start | In | int | 1 - starts working; |
4. | vocabclass | Out | file | Vocabulary, Class vocabulary; |

File types:

N. | Name | Generated by | Sample |
---|---|---|---|
1. | Annotation | evallm_1 | 01.anot |
2. | Binary language model file | idngram2lm | |
3. | Class map file (Word class) | topicCluster ; initClasses ; cluster | 01.cla |
4. | Decay file for cache models | decay | 01.decay |
5. | File for dynamic interpolated lambdas | interpolateEM | |
6. | File for output of bigram cache information: 1-if w(i-1) in cache, 0-otherwise | cache | 03.ca |
7. | File list of text corpus | | 01.fsr |
8. | File sample of text corpus | | 10004.txt |
9. | Idngram file (id0 id1 ... idN count) | gram2gramcl ; mergeGrams ; text2idngram | 01.2gram |
10. | Lambda file for models interpolation arrayd: double double ... | interpolateEM | 01_lambdas.txt |
11. | List of links to text idunigram files (number; idunigram) | | 02.list |
12. | List of links to text idunigram files of particular topic | topicCluster | topic.list_0 |
13. | N-best list file in mlf format (HTK) | dl ; rescorer | |
14. | OOV file (word) | evallm_1 | 01.oov |
15. | Ordered list of endings | | galunes.txt |
16. | Out text file for the information about the calculation process | | |
17. | Probability file((Word Num.); probability;) | interpolateEM ; kn1 | 01.prob |
18. | Reversed idngram file (idN id(N-1) ... id0 count) | idg2rev_idg | 01_r.2gram |
19. | Text file for word ngram probabilities evaluation (Word num ; X; word0 ;...; wordX; ) | skaldymas_class | 01.2txt |
20. | Vocabulary | zodynasClass ; vocabFromTrigram ; wfreq2vocab | 01.vocab |
21. | Word frequency file | text2wfreq | 01.wfreq |