movie-review-sentiment-classification
Category: Natural language processing
Development tool: Scheme
File size: 0KB
Downloads: 0
Upload date: 2019-04-28 06:36:19
Uploader: sh-1993
Description: Builds a sentiment classifier that takes movie review text and outputs a rating from 1 to 10, using Word2Vec, bidirectional LSTM, AWD LSTM and more.
File list:
.vscode/ (0, 2019-04-27)
.vscode/settings.json (84, 2019-04-27)
LICENSE (1060, 2019-04-27)
data/ (0, 2019-04-27)
data/test_set.ss (4736563, 2019-04-27)
data/training_set.ss (35837256, 2019-04-27)
data/validation_set.ss (4468503, 2019-04-27)
images/ (0, 2019-04-27)
images/drop_connect.jpg (65696, 2019-04-27)
language.w2v.model (27533625, 2019-04-27)
language/ (0, 2019-04-27)
language/movie_corpus.txt (10245491, 2019-04-27)
language/selected/ (0, 2019-04-27)
language/selected/0_0.txt (791, 2019-04-27)
language/selected/1000_0.txt (1558, 2019-04-27)
language/selected/1001_0.txt (4778, 2019-04-27)
language/selected/1002_0.txt (1527, 2019-04-27)
language/selected/1003_0.txt (498, 2019-04-27)
language/selected/1004_0.txt (1042, 2019-04-27)
language/selected/1005_0.txt (2151, 2019-04-27)
language/selected/1006_0.txt (2222, 2019-04-27)
language/selected/1007_0.txt (483, 2019-04-27)
language/selected/1008_0.txt (1475, 2019-04-27)
language/selected/1009_0.txt (2062, 2019-04-27)
language/selected/100_0.txt (315, 2019-04-27)
language/selected/1010_0.txt (909, 2019-04-27)
language/selected/1011_0.txt (687, 2019-04-27)
language/selected/1012_0.txt (634, 2019-04-27)
language/selected/1013_0.txt (2301, 2019-04-27)
language/selected/1014_0.txt (819, 2019-04-27)
language/selected/1015_0.txt (4582, 2019-04-27)
language/selected/1016_0.txt (2361, 2019-04-27)
language/selected/1017_0.txt (968, 2019-04-27)
language/selected/1018_0.txt (705, 2019-04-27)
language/selected/1019_0.txt (930, 2019-04-27)
language/selected/101_0.txt (423, 2019-04-27)
language/selected/1020_0.txt (2399, 2019-04-27)
language/selected/1021_0.txt (3447, 2019-04-27)
... ...
# Movie Review Sentiment Classification
> This project was built in Jupyter Notebook with Python 3.7, PyTorch 1.0.1, fastai 1.0.52, gensim, ...
A sentiment classifier that takes movie review text and outputs a rating from 1 to 10, built using `Word2Vec`, `Bidirectional LSTM`, `AWD LSTM` and more.
## Directory Structure
```
project
├─data
│ ├─test_set.ss Test dataset
│ ├─training_set.ss Training dataset
│ └─validation_set.ss Validation dataset
├─images Notebook images
├─language
│ └─movie_corpus.txt Corpus for training Word2Vec model
├─rating_model_fastai.ipynb Plan B notebook
├─rating_model.ipynb Plan A notebook
├─word2vec.ipynb Word2Vec model training notebook
│
...
```
## Report
The report (implementation overview, code explanation and result analysis) is embedded in the notebooks themselves, so each part sits next to the code it describes.
## Plan A: Bidirectional LSTM with word2vec as embedding
**Training Word2Vec model**
The corpus I used is a self-made collection of `movie review + Harry Potter` sentences, about 10 MB in size. The file is at [language/movie_corpus.txt](https://github.com/dizys/movie-review-sentiment-classification/blob/master/./language/movie_corpus.txt). The Word2Vec model has 100 dimensions.
Please see [word2vec.ipynb](https://github.com/dizys/movie-review-sentiment-classification/blob/master/./word2vec.ipynb)
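As a rough illustration of the preparation step before Word2Vec training, the sketch below turns raw text lines into lists of tokens. It is not taken from the notebook: the helper names `tokenize` and `sentences_from_lines`, the lowercase regex tokenizer, and the one-sentence-per-line layout of the corpus are all assumptions made for this example.

```python
import re

# Assumed tokenizer: lowercase words, digits and apostrophes only.
TOKEN_PATTERN = re.compile(r"[a-z0-9']+")


def tokenize(line):
    """Lowercase a line and split it into simple word tokens."""
    return TOKEN_PATTERN.findall(line.lower())


def sentences_from_lines(lines):
    """Turn an iterable of text lines into a list of token lists.

    Assumes one sentence per line; lines that yield no tokens are skipped.
    """
    return [tokens for tokens in map(tokenize, lines) if tokens]


# Tiny in-memory stand-in for the real corpus file.
sample_lines = ["A great film.\n", "\n", "Worst movie ever!\n"]
sentences = sentences_from_lines(sample_lines)
print(sentences)
```

An open file handle can be passed in place of `sample_lines`, since it also iterates line by line. A list of token lists like `sentences` is the shape that gensim's `Word2Vec` accepts as its `sentences` argument.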
**Training Classifier**
Mainly uses PyTorch.
Please see [rating_model.ipynb](https://github.com/dizys/movie-review-sentiment-classification/blob/master/./rating_model.ipynb)
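For readers who want the general shape of such a model before opening the notebook, here is a minimal sketch of a bidirectional LSTM classifier in PyTorch. It is not the notebook's code: the class name `BiLSTMRater`, the hidden size, the vocabulary size, and the choice to treat the 1 to 10 ratings as ten classes are assumptions for illustration. Only the embedding dimension of 100 comes from the text above.

```python
import torch
import torch.nn as nn


class BiLSTMRater(nn.Module):
    """Hypothetical bidirectional LSTM classifier (illustrative only)."""

    def __init__(self, vocab_size, embed_dim=100, hidden_dim=64, num_classes=10):
        super().__init__()
        # embed_dim=100 matches the Word2Vec dimension; the weights here are
        # randomly initialised, whereas a real run could copy in Word2Vec vectors.
        self.embedding = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
        self.lstm = nn.LSTM(embed_dim, hidden_dim,
                            batch_first=True, bidirectional=True)
        self.fc = nn.Linear(2 * hidden_dim, num_classes)

    def forward(self, token_ids):
        embedded = self.embedding(token_ids)       # (batch, seq, embed_dim)
        _, (h_n, _) = self.lstm(embedded)          # h_n: (2, batch, hidden_dim)
        # Concatenate the final forward and backward hidden states.
        features = torch.cat([h_n[0], h_n[1]], dim=1)
        return self.fc(features)                   # (batch, num_classes)


model = BiLSTMRater(vocab_size=1000)               # assumed vocabulary size
batch = torch.randint(1, 1000, (4, 20))            # 4 fake reviews, 20 token ids each
logits = model(batch)
predicted_ratings = logits.argmax(dim=1) + 1       # class index 0..9 -> rating 1..10
```

This sketch ignores padding: with variable-length reviews, the final forward state would include padded steps unless the sequences are packed first, for example with `torch.nn.utils.rnn.pack_padded_sequence`.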
## Plan B: Transfer Learning LSTM using FastAI
Mainly uses FastAI, a high-level library that makes working with PyTorch easier.
Please see [rating_model_fastai.ipynb](https://github.com/dizys/movie-review-sentiment-classification/blob/master/./rating_model_fastai.ipynb)
## Predictions for the Test Set
The 'Plan B' results were chosen because that model performed better.
Please see [senti_output.ss](https://github.com/dizys/movie-review-sentiment-classification/blob/master/./senti_output.ss)
## License
MIT, see the [LICENSE](https://github.com/dizys/movie-review-sentiment-classification/blob/master//LICENSE) file for details.