DataCampus-KoreaUniv-Team7
Category: Feature extraction
Development tool: Jupyter Notebook
File size: 67954KB
Downloads: 0
Upload date: 2020-09-01 05:13:35
Uploader: sh-1993
Description: YouTube News Summarizer
File list:
1. Crawling (0, 2020-09-01)
1. Crawling\.ipynb_checkpoints (0, 2020-09-01)
1. Crawling\.ipynb_checkpoints\daum_crawling-checkpoint.ipynb (13112, 2020-09-01)
1. Crawling\.ipynb_checkpoints\naver_crawl-checkpoint.ipynb (8490, 2020-09-01)
1. Crawling\01_naver_crawling.ipynb (132140, 2020-09-01)
1. Crawling\02_daum_crawling.ipynb (13112, 2020-09-01)
1. Crawling\03_youtube_crawling.ipynb (43257, 2020-09-01)
1. Crawling\04_youtube_crawling_using_api.ipynb (21531, 2020-09-01)
1. Crawling\05_youtube_crawling_api_pytube.ipynb (44468, 2020-09-01)
1. Crawling\06_youtube_comment_crawling.ipynb (9766, 2020-09-01)
2. Preprocessing (0, 2020-09-01)
2. Preprocessing\.ipynb_checkpoints (0, 2020-09-01)
2. Preprocessing\.ipynb_checkpoints\01_stopwords_list-checkpoint.ipynb (9457, 2020-09-01)
2. Preprocessing\.ipynb_checkpoints\preprocessing-checkpoint.ipynb (260954, 2020-09-01)
2. Preprocessing\.ipynb_checkpoints\stopwords_list-checkpoint.ipynb (9457, 2020-09-01)
2. Preprocessing\01_naverdaum (0, 2020-09-01)
2. Preprocessing\01_naverdaum\.ipynb_checkpoints (0, 2020-09-01)
2. Preprocessing\01_naverdaum\.ipynb_checkpoints\01_train_cleaning-checkpoint.ipynb (56319, 2020-09-01)
2. Preprocessing\01_naverdaum\.ipynb_checkpoints\02_train_tokenizing_normalizing-checkpoint.ipynb (148692, 2020-09-01)
2. Preprocessing\01_naverdaum\01_train_cleaning.ipynb (56319, 2020-09-01)
2. Preprocessing\01_naverdaum\02_train_tokenizing_normalizing.ipynb (148692, 2020-09-01)
2. Preprocessing\01_naverdaum\try (0, 2020-09-01)
2. Preprocessing\01_naverdaum\try\.ipynb_checkpoints (0, 2020-09-01)
2. Preprocessing\01_naverdaum\try\.ipynb_checkpoints\01_preprocessing_1-checkpoint.ipynb (142811, 2020-09-01)
2. Preprocessing\01_naverdaum\try\.ipynb_checkpoints\02_preprocessing_2_Stemming-checkpoint.ipynb (2892425, 2020-09-01)
2. Preprocessing\01_naverdaum\try\01_preprocessing_1.ipynb (142811, 2020-09-01)
2. Preprocessing\01_naverdaum\try\02_preprocessing_2_Stemming.ipynb (2892425, 2020-09-01)
2. Preprocessing\01_naverdaum\try\03_body_kobert_embedding.ipynb (19261, 2020-09-01)
2. Preprocessing\01_naverdaum\try\04_convert_token_to_id_and_padding.ipynb (17313, 2020-09-01)
2. Preprocessing\02_youtube_pipeline (0, 2020-09-01)
2. Preprocessing\02_youtube_pipeline\.ipynb_checkpoints (0, 2020-09-01)
2. Preprocessing\02_youtube_pipeline\.ipynb_checkpoints\02_main-checkpoint.ipynb (32230, 2020-09-01)
2. Preprocessing\02_youtube_pipeline\01_main.ipynb (41570, 2020-09-01)
2. Preprocessing\02_youtube_pipeline\02_main.ipynb (31918, 2020-09-01)
2. Preprocessing\02_youtube_pipeline\03_wordcloud.ipynb (2751869, 2020-09-01)
2. Preprocessing\02_youtube_pipeline\__pycache__ (0, 2020-09-01)
2. Preprocessing\02_youtube_pipeline\__pycache__\cleaning.cpython-37.pyc (18256, 2020-09-01)
2. Preprocessing\02_youtube_pipeline\__pycache__\crawling.cpython-37.pyc (6167, 2020-09-01)
... ...
# YouTube Summary : YouTube
by Team 7
- (ek-koh)
- (popo97kr)
- (BaeHyungjun)
##
`Naver , Daum ` , , , `YouTube ` .
##
1. **Crawling**: Naver, Daum, and YouTube.
2. **Preprocessing**: Data Cleaning, Tokenizing, Normalizing, Stemming, and Integer Encoding of the Naver, Daum, and YouTube data.
3. **CommentSentimentAnalysis**: YouTube comments.
4. **Modeling** : .
5. **Flask** : PC/ .
> ** ** : 1 ~ 5 . , 1 ~ 5 .
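The Integer Encoding step named in the Preprocessing item can be sketched as follows. This is a minimal illustration, not the code from the project's notebooks: the function names `build_vocab` and `encode`, the reserved ids for padding and unknown tokens, and the toy documents are all assumptions made for the example.

```python
from collections import Counter

# Reserved ids (an assumption of this sketch, not taken from the project).
PAD, OOV = 0, 1


def build_vocab(tokenized_docs, min_count=1):
    """Map each token to an integer id, most frequent tokens first."""
    counts = Counter(tok for doc in tokenized_docs for tok in doc)
    vocab = {}
    for tok, n in counts.most_common():
        if n >= min_count:
            vocab[tok] = len(vocab) + 2  # ids 0 and 1 are reserved
    return vocab


def encode(doc, vocab, max_len):
    """Convert tokens to ids, then truncate or right-pad to max_len."""
    ids = [vocab.get(tok, OOV) for tok in doc][:max_len]
    return ids + [PAD] * (max_len - len(ids))


# Toy, already-tokenized documents (hypothetical data).
docs = [["news", "summary", "news"], ["youtube", "news"]]
vocab = build_vocab(docs)
encoded = [encode(d, vocab, max_len=4) for d in docs]
```

Fixed-length id sequences like `encoded` are the usual input shape for an encoder-decoder model. Tokens never seen during vocabulary building fall back to the `OOV` id.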
## &
link: https://drive.google.com/drive/u/0/folders/1EXBU7Cwbb7DhtwldgVqwRTT6IYf9k_Mu
- [](https://drive.google.com/drive/u/0/folders/10IFZEeVduhyJR4LZlH9iHobNBvRHfyAg)
- [](https://drive.google.com/drive/u/0/folders/1YC38uwdjvp77PNPvw12OkeSMBITJPW__)
##
TV , SNS . , 2019 1 22 SNS 1600 . , , , , , .
. 2018 2 , 2016 2018 3 (, , , ) . 2020 .
YouTube , YouTube . [ 2020](http://www.digitalnewsreport.org/) , SNS 44%( ) 2019 26% SNS YouTube 45%.
, , . . . , , AI (summary) , (Time Poor) .
##
| |Naver|Daum|YouTube|
|:---:|:---:|:---:|:---:|
|| / / / | / / / | / / |
||BeautifulSoup|BeautifulSoup|pytube & YouTube API|
|| 112,466| 109,419||
||Training / Validation|Training / Validation|Test|
|| 2 , | 3 , |YTN, SBS, KBS, MBC, JTBC, MBN, A, |
## /
![image](https://user-images.githubusercontent.com/58713684/91732327-0e67ab00-ebe3-11ea-8160-fbfe9a782811.png)
- `Requests`, `BeautifulSoup`, `pytube`, `YoutubeAPI` `KoNLPy` `NLTK` .
- `pandas`, `sqlite` , `matplotlib`, `wordcloud` `tensorflow` `Gensim` .
- `pycharm` `flask` `azure mysql` , `Swing2App` .
##
![image](https://user-images.githubusercontent.com/58713684/91732396-263f2f00-ebe3-11ea-915f-dbb6cbd91140.png)
- train data x, y tokenizing , koNLPy .
- , textrank gensim summarizer .
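The TextRank idea mentioned above can be illustrated with the standard library alone. This is a hedged sketch, not the Gensim summarizer the project refers to (the `gensim.summarization` module was removed in Gensim 4.0). The helper names, the word-overlap similarity, the damping factor, and the sample text are assumptions for the example, and the whitespace-level word matching stands in for the KoNLPy tokenization a Korean pipeline would need.

```python
import math
import re


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def similarity(a, b):
    """Word overlap normalized by log sentence lengths (TextRank-style)."""
    wa = set(re.findall(r"\w+", a.lower()))
    wb = set(re.findall(r"\w+", b.lower()))
    if len(wa) < 2 or len(wb) < 2:
        return 0.0
    return len(wa & wb) / (math.log(len(wa)) + math.log(len(wb)))


def textrank(sentences, damping=0.85, iterations=50):
    """Score sentences by weighted PageRank over the similarity graph."""
    n = len(sentences)
    w = [[similarity(sentences[i], sentences[j]) if i != j else 0.0
          for j in range(n)] for i in range(n)]
    out_sum = [sum(row) for row in w]
    scores = [1.0] * n
    for _ in range(iterations):
        scores = [
            (1 - damping) + damping * sum(
                w[j][i] / out_sum[j] * scores[j]
                for j in range(n) if out_sum[j] > 0
            )
            for i in range(n)
        ]
    return scores


def summarize(text, top_k=2):
    """Return the top_k highest-scoring sentences in their original order."""
    sentences = split_sentences(text)
    scores = textrank(sentences)
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:top_k]
    return [sentences[i] for i in sorted(top)]


# Hypothetical sample text for the example.
sample = ("The city opened a new subway line. "
          "The new subway line connects the city center. "
          "Commuters praised the new subway line. "
          "Bananas are yellow.")
summary = summarize(sample, top_k=2)
```

A sentence that shares no words with the rest of the text receives only the baseline score `1 - damping`, so it is ranked last and left out of the summary.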
## EDA
The EDA is documented in [EDA.md](https://github.com/BaeHyungjun/DataCampus-KoreaUniv-Team7/blob/master/preprocessing/EDA.md).
- groupby
- Summary
- Zipf's law: barplot
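For the Zipf's law item, a rank-frequency table is the data behind such a bar plot. The sketch below only computes that table. The function name `rank_frequency` and the toy token list are assumptions for illustration, and the plotting step is left out.

```python
from collections import Counter


def rank_frequency(tokens):
    """Return (rank, token, count, rank * count) rows, most frequent first."""
    counts = Counter(tokens).most_common()
    return [(rank, tok, n, rank * n)
            for rank, (tok, n) in enumerate(counts, start=1)]


# Toy corpus (hypothetical); real input would be the tokenized articles.
tokens = "a a a a a a b b b c c d".split()
table = rank_frequency(tokens)
for row in table:
    print(row)
```

Under Zipf's law, frequency is roughly proportional to 1/rank, so the last column (rank times count) stays roughly constant across the most frequent tokens. The counts in `table` could be passed to a `matplotlib` bar plot to reproduce the kind of figure the EDA describes.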
##
The model is an `Attention sequence to sequence` model: an encoder-decoder architecture applied to text summarization.
**Seq2Seq with Attention**
![image](https://user-images.githubusercontent.com/58713684/89136169-4***7a780-d56d-11ea-9f4c-7dd2687327fe.png)
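To make the attention step concrete, here is a minimal sketch of one decoder step in plain Python. The README does not say which scoring function the model uses, so the dot-product score here is an assumption, as are the function names and the toy vectors. The project itself lists `tensorflow` for modeling.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attend(decoder_state, encoder_states):
    """Dot-product attention: weights over encoder states and the context vector."""
    # Score each encoder state against the current decoder state.
    scores = [sum(d * e for d, e in zip(decoder_state, h)) for h in encoder_states]
    weights = softmax(scores)
    # The context vector is the attention-weighted sum of encoder states.
    dim = len(decoder_state)
    context = [sum(w * h[k] for w, h in zip(weights, encoder_states))
               for k in range(dim)]
    return weights, context


# Toy hidden states (hypothetical values): three source positions, size 2.
encoder_states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
decoder_state = [2.0, 0.0]
weights, context = attend(decoder_state, encoder_states)
```

The decoder combines `context` with its own state to predict the next summary token, which lets each output word focus on different parts of the source text instead of relying on a single fixed-length encoding.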
## /
, . .
.
![image](https://user-images.githubusercontent.com/58713684/91732204-e37d5700-ebe2-11ea-8824-5b7e0ddef4be.png)
. .
.
![image](https://user-images.githubusercontent.com/58713684/91732493-440c9400-ebe3-11ea-91af-5c2562905adf.png)
/ , PC .
![image](https://user-images.githubusercontent.com/58713684/91733077-11af6680-ebe4-11ea-9d52-d4cbb2eb4bd9.png)
##
- `, ` : , , .
- ` YouTube ` : , ML/DL .
##
| |||
|:---|:---|:---|
| | | |
| | / |(Time Poor) |