thaigov-v2-corpus
Category: Dataset
Development tool: Jupyter Notebook
File size: 0KB
Downloads: 0
Upload date: 2023-08-04 11:33:43
Uploader: sh-1993
Description: Thai news dataset from the Thai government website.
File list:
LICENSE (11357, 2023-12-14)
data/ (0, 2023-12-14)
data/2020/ (0, 2023-12-14)
data/2020/09/ (0, 2023-12-14)
data/2020/09/17/ (0, 2023-12-14)
data/2020/09/17/喔傕箞喔侧抚喔椸赋喙喔權傅喔⑧笟喔`副喔愢笟喔侧弗_1.txt (14588, 2023-12-14)
data/2020/09/17/喔傕箞喔侧抚喔椸赋喙喔權傅喔⑧笟喔`副喔愢笟喔侧弗_10.txt (4391, 2023-12-14)
data/2020/09/17/喔傕箞喔侧抚喔椸赋喙喔權傅喔⑧笟喔`副喔愢笟喔侧弗_11.txt (3888, 2023-12-14)
data/2020/09/17/喔傕箞喔侧抚喔椸赋喙喔權傅喔⑧笟喔`副喔愢笟喔侧弗_12.txt (11435, 2023-12-14)
data/2020/09/17/喔傕箞喔侧抚喔椸赋喙喔權傅喔⑧笟喔`副喔愢笟喔侧弗_13.txt (11242, 2023-12-14)
data/2020/09/17/喔傕箞喔侧抚喔椸赋喙喔權傅喔⑧笟喔`副喔愢笟喔侧弗_14.txt (3549, 2023-12-14)
data/2020/09/17/喔傕箞喔侧抚喔椸赋喙喔權傅喔⑧笟喔`副喔愢笟喔侧弗_15.txt (5828, 2023-12-14)
data/2020/09/17/喔傕箞喔侧抚喔椸赋喙喔權傅喔⑧笟喔`副喔愢笟喔侧弗_16.txt (4626, 2023-12-14)
data/2020/09/17/喔傕箞喔侧抚喔椸赋喙喔權傅喔⑧笟喔`副喔愢笟喔侧弗_17.txt (9506, 2023-12-14)
data/2020/09/17/喔傕箞喔侧抚喔椸赋喙喔權傅喔⑧笟喔`副喔愢笟喔侧弗_18.txt (12445, 2023-12-14)
data/2020/09/17/喔傕箞喔侧抚喔椸赋喙喔權傅喔⑧笟喔`副喔愢笟喔侧弗_19.txt (4792, 2023-12-14)
data/2020/09/17/喔傕箞喔侧抚喔椸赋喙喔權傅喔⑧笟喔`副喔愢笟喔侧弗_2.txt (5211, 2023-12-14)
data/2020/09/17/喔傕箞喔侧抚喔椸赋喙喔權傅喔⑧笟喔`副喔愢笟喔侧弗_20.txt (4182, 2023-12-14)
data/2020/09/17/喔傕箞喔侧抚喔椸赋喙喔權傅喔⑧笟喔`副喔愢笟喔侧弗_21.txt (9424, 2023-12-14)
data/2020/09/17/喔傕箞喔侧抚喔椸赋喙喔權傅喔⑧笟喔`副喔愢笟喔侧弗_22.txt (9276, 2023-12-14)
data/2020/09/17/喔傕箞喔侧抚喔椸赋喙喔權傅喔⑧笟喔`副喔愢笟喔侧弗_23.txt (7414, 2023-12-14)
data/2020/09/17/喔傕箞喔侧抚喔椸赋喙喔權傅喔⑧笟喔`副喔愢笟喔侧弗_24.txt (11846, 2023-12-14)
data/2020/09/17/喔傕箞喔侧抚喔椸赋喙喔權傅喔⑧笟喔`副喔愢笟喔侧弗_25.txt (8111, 2023-12-14)
data/2020/09/17/喔傕箞喔侧抚喔椸赋喙喔權傅喔⑧笟喔`副喔愢笟喔侧弗_26.txt (8952, 2023-12-14)
data/2020/09/17/喔傕箞喔侧抚喔椸赋喙喔權傅喔⑧笟喔`副喔愢笟喔侧弗_27.txt (8124, 2023-12-14)
data/2020/09/17/喔傕箞喔侧抚喔椸赋喙喔權傅喔⑧笟喔`副喔愢笟喔侧弗_28.txt (6415, 2023-12-14)
data/2020/09/17/喔傕箞喔侧抚喔椸赋喙喔權傅喔⑧笟喔`副喔愢笟喔侧弗_29.txt (5289, 2023-12-14)
data/2020/09/17/喔傕箞喔侧抚喔椸赋喙喔權傅喔⑧笟喔`副喔愢笟喔侧弗_3.txt (6827, 2023-12-14)
data/2020/09/17/喔傕箞喔侧抚喔椸赋喙喔權傅喔⑧笟喔`副喔愢笟喔侧弗_30.txt (12305, 2023-12-14)
data/2020/09/17/喔傕箞喔侧抚喔椸赋喙喔權傅喔⑧笟喔`副喔愢笟喔侧弗_31.txt (9165, 2023-12-14)
data/2020/09/17/喔傕箞喔侧抚喔椸赋喙喔權傅喔⑧笟喔`副喔愢笟喔侧弗_32.txt (7978, 2023-12-14)
data/2020/09/17/喔傕箞喔侧抚喔椸赋喙喔權傅喔⑧笟喔`副喔愢笟喔侧弗_33.txt (8502, 2023-12-14)
data/2020/09/17/喔傕箞喔侧抚喔椸赋喙喔權傅喔⑧笟喔`副喔愢笟喔侧弗_34.txt (5880, 2023-12-14)
data/2020/09/17/喔傕箞喔侧抚喔椸赋喙喔權傅喔⑧笟喔`副喔愢笟喔侧弗_35.txt (5348, 2023-12-14)
data/2020/09/17/喔傕箞喔侧抚喔椸赋喙喔權傅喔⑧笟喔`副喔愢笟喔侧弗_36.txt (7242, 2023-12-14)
... ...
# ThaiGov V2 Corpus
## English
- Data from the Thai government website: https://www.thaigov.go.th
- This is part of the PyThaiNLP Project.
- Compiled by Mr. Wannaphong Phatthiyaphaibun.
- License: the dataset is in the public domain.
## Data format
- Each file holds one news item, extracted from one URL.
```
topic
(Blank line)
content
content
content
content
content
(Blank line)
(URL source) : http://www.thaigov.go.th/news/contents/details/NNN
```
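The layout above can be read back with a few lines of Python. The following is a minimal sketch, not the project's own loader: `NewsItem` and `parse_news` are hypothetical names, and it assumes the source URL sits on the last non-empty line and is the only `http(s)://` link there.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

# Matches the first http(s) URL on a line (assumed form of the source line).
_URL_RE = re.compile(r"https?://\S+")


@dataclass
class NewsItem:
    topic: str            # first non-empty line of the file
    content: List[str]    # non-empty lines between topic and source line
    url: Optional[str]    # URL taken from the last line, if one is found


def parse_news(text: str) -> NewsItem:
    """Split one news file into topic, content lines, and source URL."""
    lines = [line.rstrip() for line in text.splitlines()]
    # Drop leading and trailing blank lines so indexes 0 and -1 are meaningful.
    while lines and not lines[0]:
        lines.pop(0)
    while lines and not lines[-1]:
        lines.pop()
    if not lines:
        raise ValueError("empty news file")

    topic, body = lines[0], lines[1:]
    url = None
    if body:
        match = _URL_RE.search(body[-1])
        if match:
            url = match.group(0)
            body = body[:-1]  # the source line is not part of the content
    # Blank separator lines are dropped; only real content lines are kept.
    content = [line for line in body if line]
    return NewsItem(topic=topic, content=content, url=url)
```

Searching for the URL instead of matching a fixed label keeps the sketch independent of how the source line is worded in a given file.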
## Thai
- Data from the Thai government website: https://www.thaigov.go.th
- Part of the [PyThaiNLP](https://github.com/PyThaiNLP/) project.
- License: public domain (... B.E. 2537, section 7 ((1) [...] (3) [...])).
** Git**
###
- 17 Sep. B.E. 2563 (17 September 2020)
### Data format
- 1 file, 1 news, 1 url
```
topic
(Blank line)
content
(Blank line)
(URL source) : http://www.thaigov.go.th/news/contents/details/NNN
```
### File names
- Each file name is a title followed by an underscore and a running number, ending in `.txt`.
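Because file names end in an underscore and a running number, plain string sorting puts `_10` before `_2`, as the file list shows. A small sketch of sorting by the numeric suffix instead follows; `sort_key` and `sort_news_paths` are hypothetical helpers, and the `news_` prefix in the usage example is a placeholder rather than a real file name from the corpus.

```python
import re
from pathlib import PurePosixPath
from typing import List, Tuple

# Captures the running number in names that end with "_<number>.txt".
_NUM_RE = re.compile(r"_(\d+)\.txt$")


def sort_key(path: str) -> Tuple[str, int]:
    """Sort by directory first, then by the numeric suffix of the file name."""
    p = PurePosixPath(path)
    match = _NUM_RE.search(p.name)
    # Files without a numeric suffix sort before numbered ones.
    number = int(match.group(1)) if match else -1
    return (str(p.parent), number)


def sort_news_paths(paths: List[str]) -> List[str]:
    return sorted(paths, key=sort_key)


if __name__ == "__main__":
    # Placeholder names; real files carry a Thai title before the underscore.
    demo = [
        "data/2020/09/17/news_10.txt",
        "data/2020/09/17/news_2.txt",
        "data/2020/09/17/news_1.txt",
    ]
    for name in sort_news_paths(demo):
        print(name)
```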
### Script
- run.py url ```http://www.thaigov.go.th/news/contents/details/NNN``` NNN
- i
- clean.py
- ```clean.py ```
- ```clean.py 1 2```
- ```clean.py *.txt```
We build Thai NLP.
PyThaiNLP