tts-reader
所属分类:语音合成
开发工具:TypeScript
文件大小:0KB
下载次数:0
上传日期:2023-12-03 22:18:27
上 传 者:
sh-1993
说明: 将新闻文章转换为您可以随时收听的`.flac`文件
(Convert news articles into `.flac` files that you can listen to at your own convenience)
文件列表:
.eslintrc.js (435, 2023-08-22)
.pre-commit-config.yaml (3992, 2023-08-22)
LICENSE (1067, 2023-08-22)
package-lock.json (326616, 2023-08-22)
package.json (1052, 2023-08-22)
src/ (0, 2023-08-22)
src/config.ts (493, 2023-08-22)
src/countLines.ts (450, 2023-08-22)
src/exclude-list.txt (40913, 2023-08-22)
src/findVoice.ts (3386, 2023-08-22)
src/functools.ts (600, 2023-08-22)
src/helpers.ts (595, 2023-08-22)
src/reshaper.ts (1868, 2023-08-22)
src/run.ts (2573, 2023-08-22)
src/splitFile.ts (2406, 2023-08-22)
src/utils.ts (878, 2023-08-22)
tsconfig.json (1000, 2023-08-22)
# TTS Reader
A preprocessing script for converting text to speech.
Helps convert news articles into `.flac` files that you can listen to at your
own convenience.
The idea behind this tool is to have a TXT file where the user would put web
pages in read mode and other text-based information separated by `...` or other
separation string you specify.
The script will read the data from the file, remove redundancy, adapt it for TTS
and covert it to `.mp3`
Previous implementations of the TTS Reader:
- [python_tts](https://github.com/maxpatiiuk/python_tts/)
- [TTS King](https://github.com/maxpatiiuk/tts_king/)
Example output:
```
Fetching the source file
Reshaping the text
Reduced the number of lines from 126,527 (72,756 non-blank) down to 61,707
Done!
Total time: 3s
```
## Installation
Install dependencies:
```sh
npm install
```
## Workflow
1. Create a big text file (articles from the web, a book, a document, etc)
2. Run this script to preprocess the file (see details on the processing done
below)
3. Run a text-to-speech software on the final file (i.e. macOS's `say`)
Example usage with macOS's `say`:
```sh
# Generate an out.txt, out_2.txt, ... files based on processed input
node --loader ts-node/esm/transpile-only src/run.ts --input ./in.txt --output ./out.txt
# Go though each text file that hasn't yet been converted
for f in out*.txt; do
echo "Generating $f.flac"
say -r 250 -o "$f.flac" --progress "$(cat $f)"
rm "$f"
done
```
To see all available options, run the main script with `--help` argument:
```
node --loader ts-node/esm/transpile-only src/run.ts --help
```
To see available `say` options, run `man say`.
To see supported `say` file formats, run `say --file-format=\? --data-format=\?`
## Preprocessing steps
1. Strip HTML
2. Strip Emoji
3. Strip block listed lines (defined in
[./src/exclude-list.txt](./src/exclude-list.txt)). Those currently consist of
the spam lines of text I found commonly repeated in the websites I often
visit. (i.e `Advertisement`, `RECOMMENDED VIDEOS FOR YOU`, and other trash)
4. Stip non-text lines (defined as lines that have more than 30% of
non-English-letter characters). This strips out code, spam, math equations
and other things that are not friendly with text-to-speech software.
5. Remove repeated lines. This is perhaps the most important one. It can
dramatically reduce the size of the input file (by about 30%-50%)
Why this is super cool:
- If you accidentally pasted the same news article twice, this step will
remove the duplicate
- It will automatically remove all the commonly repeated lines like
`Advertisement`, or footers from websites (i.e, Wired has a whole bunch of
lines like`More Great WIRED Stories` at the end of each article)
## Finding Voices
There is a helper script provided for finding the best voice among those
available on your system. Note, this only works with macOS's `say` command.
To see available options, run it with `--help` argument:
```
node --loader ts-node/esm/transpile-only src/findVoice.ts --help
```
Example call:
```sh
node --loader ts-node/esm/transpile-only src/findVoice.ts --voices Ava Karen Samantha --speed 200 --text "Hi! Isn't it cool to have a computer talk to you?"
```
By default, `say` will use the voice you configured in your macOS settings.
[Documentation](https://support.apple.com/guide/mac-help/change-spoken-content-settings-accessibility-spch638/mac)
## Finding spam lines
It is desirable to strip spam lines (i.e `Advertisement`,
`RECOMMENDED VIDEOS FOR YOU`, and other trash). To help with finding those, you
can run the script with `--countPath` argument, like so:
```sh
node --loader ts-node/esm/transpile-only src/run.ts --input ./in.txt --countPath ./counted.tsv
```
Then, you can add the discovered spam lines to `./src/exclude-list.txt`
近期下载者:
相关文件:
收藏者: