nlg-a1

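Category: Natural Language Processing
Development tool: Python
File size: 12717KB
Downloads: 0
Upload date: 2018-03-07 16:24:52
Uploader: sh-1993
Description: nlg-a1, a first attempt at natural language generation; uses a simple LSTM-RNN to generate character-level and...
(A first attempt at natural language generation; uses a simple LSTM RNN to generate text at both the character level and the word level)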

File list:
DQN.py (1176, 2018-03-08)
batch_test.py (5125, 2018-03-08)
create_vocab.py (3032, 2018-03-08)
data (0, 2018-03-08)
data\Aesop's Fables.txt (142590, 2018-03-08)
data\Aesop's Fables_pro.txt (133118, 2018-03-08)
data\Aesop's Fables_pro_end.txt (133826, 2018-03-08)
data\Grimm's Fairy Tales.txt (522841, 2018-03-08)
data\fanren.txt (22758046, 2018-03-08)
data\test1.txt (978, 2018-03-08)
data\thrones1.txt (1648398, 2018-03-08)
data\thrones2.txt (1835701, 2018-03-08)
data\thrones3.txt (2380188, 2018-03-08)
data\thrones4.txt (1687919, 2018-03-08)
data\thrones5.txt (2390284, 2018-03-08)
data\wonderland.txt (169858, 2018-03-08)
demo_char.py (1912, 2018-03-08)
demo_char_addsigns.py (1992, 2018-03-08)
demo_word.py (2584, 2018-03-08)
demo_word2.py (3332, 2018-03-08)
n-gram_LM.py (2679, 2018-03-08)
nlg-eval-for1.py (297, 2018-03-08)
readCN.py (857, 2018-03-08)
train_char.py (4836, 2018-03-08)
train_charCN.py (5997, 2018-03-08)
train_char_lower.py (4694, 2018-03-08)
train_char_vec_CN.py (5427, 2018-03-08)
train_words.py (7182, 2018-03-08)
txtpreprocessing.py (1662, 2018-03-08)

# nlg-a1

A first attempt at natural language generation: a simple LSTM RNN that generates text at both the character level and the word level.

- Files starting with 'train' are used to train the constructed neural network.
- Files starting with 'demo' are used to test the trained neural network.
- Files with 'char' in the name are designed for character-based NLG.
- Files with 'word' in the name are designed for word-based NLG.
- 'create_vocab' creates a universal vocabulary list that can be used directly in the training process.
- 'n-gram_LM' is the n-gram language model, which is used to calculate perplexity.
- 'txtpreprocessing' is used to preprocess the texts.

There are also some special neural network designs that were tried as experiments:

- 'demo_char_addsigns' uses two different character-based RNNs. One contains only the letters of the alphabet; the other contains only some common signs. The program first generates letters and then adds signs to divide these characters. At present, it does not seem to work well.
- 'train_charCN' and 'train_char_vec_CN' are attempts to process Chinese characters. Just for a try :)
- 'char_lower' means all the characters are lowercase.

Some of the training data I used is in the 'data' directory.
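
The README says that 'n-gram_LM' calculates perplexity but does not show how. The sketch below illustrates the general idea with a bigram model and add-one smoothing. It is not taken from 'n-gram_LM.py': the function names, the choice of n = 2, the smoothing scheme, and the toy sentences are all assumptions made for illustration, and out-of-vocabulary handling is ignored.

```python
import math
from collections import Counter


def train_bigram_counts(tokens):
    """Count bigram contexts and bigrams in a token sequence (hypothetical helper)."""
    # Only tokens that have a successor are counted as contexts.
    contexts = Counter(tokens[:-1])
    bigrams = Counter(zip(tokens, tokens[1:]))
    return contexts, bigrams


def bigram_perplexity(tokens, contexts, bigrams, vocab_size):
    """Perplexity of `tokens` under an add-one smoothed bigram model."""
    log_prob = 0.0
    n = 0
    for prev, cur in zip(tokens, tokens[1:]):
        # Add-one (Laplace) smoothing keeps unseen bigrams from getting zero probability.
        p = (bigrams[(prev, cur)] + 1) / (contexts[prev] + vocab_size)
        log_prob += math.log(p)
        n += 1
    if n == 0:
        raise ValueError("need at least two tokens")
    # Perplexity is the exponential of the average negative log-probability.
    return math.exp(-log_prob / n)


# Toy training text (an assumption, not data from the repository).
train = "the fox saw the crow and the crow saw the fox".split()
contexts, bigrams = train_bigram_counts(train)
vocab_size = len(set(train))

seen = "the fox saw the crow".split()
shuffled = "crow the saw fox the".split()
ppl_seen = bigram_perplexity(seen, contexts, bigrams, vocab_size)
ppl_shuffled = bigram_perplexity(shuffled, contexts, bigrams, vocab_size)
print(ppl_seen, ppl_shuffled)
```

A lower perplexity means the model finds the text less surprising, so a sequence whose word order matches the training text should score lower than the same words shuffled. Applied to generated text, this gives a rough fluency measure, although the actual script may compute it differently.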
