DataCampus-KoreaUniv-Team7

Category: Feature extraction
Development tool: Jupyter Notebook
File size: 67954 KB
Downloads: 0
Upload date: 2020-09-01 05:13:35
Uploader: sh-1993
Description: YouTube News Summarizer

File list:
1. Crawling (0, 2020-09-01)
1. Crawling\.ipynb_checkpoints (0, 2020-09-01)
1. Crawling\.ipynb_checkpoints\daum_crawling-checkpoint.ipynb (13112, 2020-09-01)
1. Crawling\.ipynb_checkpoints\naver_crawl-checkpoint.ipynb (8490, 2020-09-01)
1. Crawling\01_naver_crawling.ipynb (132140, 2020-09-01)
1. Crawling\02_daum_crawling.ipynb (13112, 2020-09-01)
1. Crawling\03_youtube_crawling.ipynb (43257, 2020-09-01)
1. Crawling\04_youtube_crawling_using_api.ipynb (21531, 2020-09-01)
1. Crawling\05_youtube_crawling_api_pytube.ipynb (44468, 2020-09-01)
1. Crawling\06_youtube_comment_crawling.ipynb (9766, 2020-09-01)
2. Preprocessing (0, 2020-09-01)
2. Preprocessing\.ipynb_checkpoints (0, 2020-09-01)
2. Preprocessing\.ipynb_checkpoints\01_stopwords_list-checkpoint.ipynb (9457, 2020-09-01)
2. Preprocessing\.ipynb_checkpoints\preprocessing-checkpoint.ipynb (260954, 2020-09-01)
2. Preprocessing\.ipynb_checkpoints\stopwords_list-checkpoint.ipynb (9457, 2020-09-01)
2. Preprocessing\01_naverdaum (0, 2020-09-01)
2. Preprocessing\01_naverdaum\.ipynb_checkpoints (0, 2020-09-01)
2. Preprocessing\01_naverdaum\.ipynb_checkpoints\01_train_cleaning-checkpoint.ipynb (56319, 2020-09-01)
2. Preprocessing\01_naverdaum\.ipynb_checkpoints\02_train_tokenizing_normalizing-checkpoint.ipynb (148692, 2020-09-01)
2. Preprocessing\01_naverdaum\01_train_cleaning.ipynb (56319, 2020-09-01)
2. Preprocessing\01_naverdaum\02_train_tokenizing_normalizing.ipynb (148692, 2020-09-01)
2. Preprocessing\01_naverdaum\try (0, 2020-09-01)
2. Preprocessing\01_naverdaum\try\.ipynb_checkpoints (0, 2020-09-01)
2. Preprocessing\01_naverdaum\try\.ipynb_checkpoints\01_preprocessing_1-checkpoint.ipynb (142811, 2020-09-01)
2. Preprocessing\01_naverdaum\try\.ipynb_checkpoints\02_preprocessing_2_Stemming-checkpoint.ipynb (2892425, 2020-09-01)
2. Preprocessing\01_naverdaum\try\01_preprocessing_1.ipynb (142811, 2020-09-01)
2. Preprocessing\01_naverdaum\try\02_preprocessing_2_Stemming.ipynb (2892425, 2020-09-01)
2. Preprocessing\01_naverdaum\try\03_body_kobert_embedding.ipynb (19261, 2020-09-01)
2. Preprocessing\01_naverdaum\try\04_convert_token_to_id_and_padding.ipynb (17313, 2020-09-01)
2. Preprocessing\02_youtube_pipeline (0, 2020-09-01)
2. Preprocessing\02_youtube_pipeline\.ipynb_checkpoints (0, 2020-09-01)
2. Preprocessing\02_youtube_pipeline\.ipynb_checkpoints\02_main-checkpoint.ipynb (32230, 2020-09-01)
2. Preprocessing\02_youtube_pipeline\01_main.ipynb (41570, 2020-09-01)
2. Preprocessing\02_youtube_pipeline\02_main.ipynb (31918, 2020-09-01)
2. Preprocessing\02_youtube_pipeline\03_wordcloud.ipynb (2751869, 2020-09-01)
2. Preprocessing\02_youtube_pipeline\__pycache__ (0, 2020-09-01)
2. Preprocessing\02_youtube_pipeline\__pycache__\cleaning.cpython-37.pyc (18256, 2020-09-01)
2. Preprocessing\02_youtube_pipeline\__pycache__\crawling.cpython-37.pyc (6167, 2020-09-01)
... ...

# YouTube Summary

YouTube news summarizer by Team 7

- (ek-koh)
- (popo97kr)
- (BaeHyungjun)

## Overview

A summarization model trained on `Naver` and `Daum` news is applied to `YouTube` news.

## Repository structure

1. **Crawling**: crawling code for Naver and Daum news and for YouTube.
2. **Preprocessing**: Data Cleaning, Tokenizing, Normalizing, Stemming, and Integer Encoding for the Naver, Daum, and YouTube data.
3. **CommentSentimentAnalysis**: sentiment analysis of YouTube comments.
4. **Modeling**: the summarization model.
5. **Flask**: the Flask-based service.

> **Note**: the folders are numbered 1 ~ 5, and the notebooks inside each folder are numbered as well.

## Links

link: https://drive.google.com/drive/u/0/folders/1EXBU7Cwbb7DhtwldgVqwRTT6IYf9k_Mu

- https://drive.google.com/drive/u/0/folders/10IFZEeVduhyJR4LZlH9iHobNBvRHfyAg
- https://drive.google.com/drive/u/0/folders/1YC38uwdjvp77PNPvw12OkeSMBITJPW__

## Background

The project is motivated by news consumption through SNS and YouTube. [Digital News Report 2020](http://www.digitalnewsreport.org/) is cited with the figures 44% (SNS), 26% (2019), and 45% (YouTube). The goal is to provide AI summaries (summary) for "Time Poor" users.

## Data

| |Naver|Daum|YouTube|
|:---:|:---:|:---:|:---:|
|Crawling tool|Beautifulsoup|Beautifulsoup|Pytube & YouTube API|
|Items collected|112,466|109,419| |
|Used for|Training / Validation|Training / Validation|Test|
|Sources| | |YTN, SBS, KBS, MBC, JTBC, MBN, A|

## Tools / libraries

![image](https://user-images.githubusercontent.com/58713684/91732327-0e67ab00-ebe3-11ea-8160-fbfe9a782811.png)

- `Requests`, `BeautifulSoup`, `pytube`, and `YoutubeAPI` for crawling; `KoNLPy` and `NLTK` for text processing.
- `pandas` and `sqlite` for data handling; `matplotlib` and `wordcloud` for visualization; `tensorflow` and `Gensim` for modeling.
- `pycharm` and `flask` for the service, `azure mysql` for the database, and `Swing2App` for the app.

## Pipeline

![image](https://user-images.githubusercontent.com/58713684/91732396-263f2f00-ebe3-11ea-915f-dbb6cbd91140.png)

- The train data x and y are tokenized with koNLPy.
- TextRank, via the gensim summarizer, is used as well.

## EDA

The EDA is documented in [EDA.md](https://github.com/BaeHyungjun/DataCampus-KoreaUniv-Team7/blob/master/preprocessing/EDA.md).

- groupby
- Summary
- Zipf's law: barplot

## Model

The model is an `Attention sequence to sequence` model: an encoder and a decoder are used for text summarization.

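The attention step in such an encoder-decoder model can be illustrated roughly as follows. This is a minimal pure-Python sketch that assumes dot-product scoring; the README does not say which attention variant the project uses, and `softmax`, `dot`, `attend`, and the toy vectors are hypothetical names and values introduced here for illustration only.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def attend(decoder_state, encoder_states):
    """Return (attention weights, context vector) for one decoder step."""
    # Score every encoder state against the current decoder state
    # (dot-product scoring is an assumption of this sketch).
    scores = [dot(decoder_state, h) for h in encoder_states]
    weights = softmax(scores)
    # The context vector is the attention-weighted sum of encoder states.
    dim = len(decoder_state)
    context = [
        sum(w * h[i] for w, h in zip(weights, encoder_states))
        for i in range(dim)
    ]
    return weights, context


# Toy input: three encoder states of dimension 2 (made-up numbers).
encoder_states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
decoder_state = [1.0, 1.0]
weights, context = attend(decoder_state, encoder_states)
```

In a trained model the decoder would combine `context` with its own state to predict the next summary token; here the third encoder state receives the largest weight because it has the largest dot product with the decoder state.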
**Seq2Seq with Attention**

![image](https://user-images.githubusercontent.com/58713684/89136169-4***7a780-d56d-11ea-9f4c-7dd2687327fe.png)

## Results

![image](https://user-images.githubusercontent.com/58713684/91732204-e37d5700-ebe2-11ea-8824-5b7e0ddef4be.png)

![image](https://user-images.githubusercontent.com/58713684/91732493-440c9400-ebe3-11ea-91af-5c2562905adf.png)

PC service:

![image](https://user-images.githubusercontent.com/58713684/91733077-11af6680-ebe4-11ea-9d52-d4cbb2eb4bd9.png)

##

- `YouTube`: ML/DL

##

| | | |
|:---|:---|:---|
| | | |
| | |(Time Poor)|
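The Preprocessing step names Integer Encoding, and the file list includes `04_convert_token_to_id_and_padding.ipynb`. The idea can be sketched as below. This is not the notebook's code: the reserved ids `PAD` and `UNK`, the helper names `build_vocab` and `encode`, and the toy documents are all assumptions made for illustration.

```python
from collections import Counter

# Reserved ids (an assumption of this sketch, not taken from the project).
PAD, UNK = 0, 1


def build_vocab(tokenized_docs, min_count=1):
    """Map each token to an integer id, most frequent tokens first."""
    counts = Counter(tok for doc in tokenized_docs for tok in doc)
    vocab = {}
    # Sort by descending frequency, breaking ties alphabetically.
    for tok, n in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        if n >= min_count:
            vocab[tok] = len(vocab) + 2  # ids 0 and 1 are reserved
    return vocab


def encode(doc, vocab, max_len):
    """Convert tokens to ids, truncate to max_len, and right-pad with PAD."""
    ids = [vocab.get(tok, UNK) for tok in doc][:max_len]
    return ids + [PAD] * (max_len - len(ids))


# Toy tokenized documents (made-up data).
docs = [["news", "summary", "news"], ["youtube", "news"]]
vocab = build_vocab(docs)
encoded = [encode(d, vocab, max_len=4) for d in docs]
print(encoded)  # [[2, 3, 2, 0], [4, 2, 0, 0]]
```

Fixed-length id sequences like these are what an encoder-decoder model typically consumes; tokens not seen when the vocabulary was built map to `UNK`.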
