TrashGPT

Category: GPT/ChatGPT
Development tool: Jupyter Notebook
File size: 0KB
Downloads: 0
Upload date: 2023-12-05 22:46:54
Uploader: sh-1993
Description: Automatic podcast generation.

File list:
data_prep/ (0, 2023-12-06)
data_prep/_dataset.txt (18206108, 2023-12-06)
data_prep/_dataset_old.txt (19296295, 2023-12-06)
data_prep/_dataset_timings.txt (22291451, 2023-12-06)
data_prep/annotated/ (0, 2023-12-06)
data_prep/annotated/Animals_We_Could_Beat_in_a_Fight__Trash_Taste_91_ZWS2nFo7eEo.en-ehkg1hFWq8A.txt (230373, 2023-12-06)
data_prep/annotated/Animals_We_Could_Beat_in_a_Fight__Trash_Taste_91_ZWS2nFo7eEo.en-en-ehkg1hFWq8A.txt (230373, 2023-12-06)
data_prep/annotated/Animals_We_Could_Beat_in_a_Fight__Trash_Taste_91_ZWS2nFo7eEo.en-orig.txt (236654, 2023-12-06)
data_prep/annotated/Animals_We_Could_Beat_in_a_Fight__Trash_Taste_91_ZWS2nFo7eEo.en.txt (236654, 2023-12-06)
data_prep/annotated/Anime_Convention_Horror_Stories__Trash_Taste_22_1t1lme5nGZE.en-ehkg1hFWq8A.txt (272712, 2023-12-06)
data_prep/annotated/Anime_Convention_Horror_Stories__Trash_Taste_22_1t1lme5nGZE.en-en-ehkg1hFWq8A.txt (272712, 2023-12-06)
data_prep/annotated/Are_Online_Friends_REAL_Friends__Trash_Taste_28_Zk8QMtjnRj0.en-ehkg1hFWq8A.txt (289055, 2023-12-06)
data_prep/annotated/Are_Online_Friends_REAL_Friends__Trash_Taste_28_Zk8QMtjnRj0.en-en-ehkg1hFWq8A.txt (289055, 2023-12-06)
data_prep/annotated/Are_Online_Friends_REAL_Friends__Trash_Taste_28_Zk8QMtjnRj0.en-orig.txt (282950, 2023-12-06)
data_prep/annotated/Are_Online_Friends_REAL_Friends__Trash_Taste_28_Zk8QMtjnRj0.en.txt (282950, 2023-12-06)
data_prep/annotated/CHRISTMAS_IS_CANCELED__Trash_Taste_131_0G4gYj446dA.en-orig.txt (221358, 2023-12-06)
data_prep/annotated/CHRISTMAS_IS_CANCELED__Trash_Taste_131_0G4gYj446dA.en.txt (221358, 2023-12-06)
data_prep/annotated/Christmas_Horror_Stories__Trash_Taste_29_TBGk1TKhVeA.en-ehkg1hFWq8A.txt (238106, 2023-12-06)
data_prep/annotated/Christmas_Horror_Stories__Trash_Taste_29_TBGk1TKhVeA.en-en-ehkg1hFWq8A.txt (238106, 2023-12-06)
data_prep/annotated/Christmas_Horror_Stories__Trash_Taste_29_TBGk1TKhVeA.en-orig.txt (230535, 2023-12-06)
data_prep/annotated/Christmas_Horror_Stories__Trash_Taste_29_TBGk1TKhVeA.en.txt (230535, 2023-12-06)
data_prep/annotated/College_Horror_Stories__Trash_Taste_12_GMCxI2KJp3Y.en-ehkg1hFWq8A.txt (240086, 2023-12-06)
data_prep/annotated/College_Horror_Stories__Trash_Taste_12_GMCxI2KJp3Y.en-en-ehkg1hFWq8A.txt (240086, 2023-12-06)
data_prep/annotated/College_Horror_Stories__Trash_Taste_12_GMCxI2KJp3Y.en-orig.txt (231933, 2023-12-06)
data_prep/annotated/College_Horror_Stories__Trash_Taste_12_GMCxI2KJp3Y.en.txt (231933, 2023-12-06)
data_prep/annotated/Cycling_is_HARD__Trash_Taste_117_bYIb-YaaCMY.en-orig.txt (240708, 2023-12-06)
data_prep/annotated/Cycling_is_HARD__Trash_Taste_117_bYIb-YaaCMY.en.txt (240708, 2023-12-06)
data_prep/annotated/Defending_Our_WORST_Takes_Weve_Had__Trash_Taste_130_ZkVuuKR8fg0.en-orig.txt (207541, 2023-12-06)
data_prep/annotated/Defending_Our_WORST_Takes_Weve_Had__Trash_Taste_130_ZkVuuKR8fg0.en.txt (207541, 2023-12-06)
data_prep/annotated/Do_We_Drink_Too_Much__Trash_Taste_48_p6JEsEmtsds.en-ehkg1hFWq8A.txt (275636, 2023-12-06)
data_prep/annotated/Do_We_Drink_Too_Much__Trash_Taste_48_p6JEsEmtsds.en-en-ehkg1hFWq8A.txt (275636, 2023-12-06)
data_prep/annotated/Do_We_Drink_Too_Much__Trash_Taste_48_p6JEsEmtsds.en-orig.txt (268222, 2023-12-06)
data_prep/annotated/Do_We_Drink_Too_Much__Trash_Taste_48_p6JEsEmtsds.en.txt (268222, 2023-12-06)
data_prep/annotated/Dont_Watch_Anime_to_Learn_Japanese__Trash_Taste_6_dgZDICFDY5o.en-ehkg1hFWq8A.txt (245386, 2023-12-06)
data_prep/annotated/Dont_Watch_Anime_to_Learn_Japanese__Trash_Taste_6_dgZDICFDY5o.en-en-ehkg1hFWq8A.txt (245386, 2023-12-06)
data_prep/annotated/Dont_Watch_Anime_to_Learn_Japanese__Trash_Taste_6_dgZDICFDY5o.en-orig.txt (224930, 2023-12-06)
data_prep/annotated/Dont_Watch_Anime_to_Learn_Japanese__Trash_Taste_6_dgZDICFDY5o.en.txt (224930, 2023-12-06)
... ...

# TrashGPT

Welcome to our repository. This project was completed for CS 685 (Spring 2023). The goal of this project is to generate clips of the Trash Taste Podcast. Each folder covers one stage of the data pipeline.

## Setup

This repository contains many different subroutines. To avoid package conflicts, I recommend installing only the packages needed for the subroutine you are running.

1. Install Python 3.10.X.
2. Set up a [virtual environment](https://docs.python.org/3/tutorial/venv.html): `python -m venv .venv`
3. Activate the virtual environment. The command is platform-dependent.
4. Install [PyTorch](https://pytorch.org/) using the install command from their site. The command is platform-dependent.
5. Install the other requirements listed in the README of the pipeline component you are running.

## Pipeline

1. Data download
   - Create the directory `raw_data/` and move into it.
   - Run `yt-dlp.exe --yes-playlist --write-sub --write-auto-sub --sub-lang "en.*" --sub-format json3 -f m4a https://www.youtube.com/playlist?list=PLUHmmIt9sU6i4JlDABqLeWybD_ZrJf9LB`
2. Data preprocessing (look in `data_prep/`)
   - Speech diarization
   - Speaker naming tool
   - Dataset formulation
3. Training the models and generating transcripts (look in `modeling/`)
   - Training and generation code for GPT2-Small, GPT2-Med, Bloom 560M, and LLaMA 7B
4. Voice cloning and audio performance generation (look in `voice_cloning/`)
   - Transcript parser
   - TTS with voice cloning using [tortoise-tts](https://github.com/neonbjb/tortoise-tts)

## Results

- Sample text generations are in `test/`
- [Evaluation](https://docs.google.com/spreadsheets/d/1jtsw4g0nGK2sbywgND0AkrXzY3MM4u4L0yGolVhIMPU/edit?usp=sharing)
- [LLaMA 7B Clip](https://youtu.be/rR67-ePpWF4)
- [Bloom 560M Clip](https://youtu.be/DJM6BLNaWhI)
- [GPT2-Medium Clip](https://youtu.be/DnMJ3biSkKQ)

## Other Documents

- [Proposal](https://www.overleaf.com/project/63ebb492e3b98236eca9357b)
- [Final Report](https://www.overleaf.com/6268427319fsgdzvcrfsyq)
- [Dataset Playlist](https://youtube.com/playlist?list=PLUHmmIt9sU6i4JlDABqLeWybD_ZrJf9LB)
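The Data download step requests subtitles with `--sub-format json3`. Below is a minimal sketch of one way to flatten such a file into timed text lines before they are matched against diarization output. It assumes YouTube's json3 layout: a top-level `events` list whose entries carry `tStartMs`, `dDurationMs`, and a `segs` list of `{"utf8": ...}` pieces. The function names `parse_json3` and `load_json3_lines` are hypothetical and not taken from the repository.

```python
import json
from typing import Any, Dict, List, Tuple

# (start_seconds, end_seconds, text)
Line = Tuple[float, float, str]


def parse_json3(data: Dict[str, Any]) -> List[Line]:
    """Flatten a parsed json3 subtitle document into timed text lines.

    Assumes the YouTube json3 layout:
    {"events": [{"tStartMs": int, "dDurationMs": int, "segs": [{"utf8": str}, ...]}]}
    """
    lines: List[Line] = []
    for event in data.get("events", []):
        segs = event.get("segs")
        if not segs:
            continue  # window/style events carry no text
        text = "".join(seg.get("utf8", "") for seg in segs).strip()
        if not text:
            continue  # skip newline-only events used for line breaks
        start = event.get("tStartMs", 0) / 1000.0
        end = start + event.get("dDurationMs", 0) / 1000.0
        lines.append((start, end, text))
    return lines


def load_json3_lines(path: str) -> List[Line]:
    """Read a json3 file from disk (hypothetical helper) and flatten it."""
    with open(path, encoding="utf-8") as f:
        return parse_json3(json.load(f))
```

Keeping the timings, rather than only the text, leaves room to align each caption line with the speaker segments produced by diarization.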
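The voice cloning stage starts with a transcript parser that splits a generated transcript into per-speaker turns before each turn is sent to tortoise-tts. The README does not describe the transcript format, so this sketch assumes a hypothetical `Speaker: utterance` line format. It merges consecutive lines from the same speaker and treats untagged lines as continuations. `parse_transcript` is an illustrative name, not the repository's parser.

```python
import re
from typing import List, Tuple

# Hypothetical line format: "<Speaker>: <utterance>".
# The lazy speaker group stops at the first colon, so colons inside the
# utterance (e.g. "3:00") stay in the utterance text.
LINE_RE = re.compile(r"^\s*([^:]+?)\s*:\s*(.+?)\s*$")


def parse_transcript(text: str) -> List[Tuple[str, str]]:
    """Split a generated transcript into (speaker, utterance) turns."""
    turns: List[Tuple[str, str]] = []
    for raw in text.splitlines():
        if not raw.strip():
            continue  # ignore blank lines
        match = LINE_RE.match(raw)
        if match:
            speaker, utterance = match.group(1), match.group(2)
        elif turns:
            # Untagged line: treat it as a continuation of the previous turn.
            speaker, utterance = turns[-1][0], raw.strip()
        else:
            continue  # text before the first speaker tag has no owner
        if turns and turns[-1][0] == speaker:
            # Merge consecutive lines from the same speaker into one turn.
            turns[-1] = (speaker, turns[-1][1] + " " + utterance)
        else:
            turns.append((speaker, utterance))
    return turns
```

Merging same-speaker lines means each TTS call receives one complete turn, which keeps the cloned voice consistent across a sentence. Whether the repository's parser does the same is an assumption.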
