
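Development tool: Jupyter Notebook
Upload date: 2023-04-23 21:05:15
Uploader: sh-1993
Description: ThoughtSource, a central open resource for data and tools related to chain-of-thought reasoning in large language models.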

File list: (7280, 2023-04-17) (1068, 2023-04-17)
apps (0, 2023-04-17)
apps\annotator-backend (0, 2023-04-17)
apps\annotator-backend\ (4326, 2023-04-17)
apps\annotator-backend\ (1292, 2023-04-17)
apps\annotator-backend\example.env (73, 2023-04-17)
apps\annotator-backend\requirements.txt (524, 2023-04-17)
apps\annotator-backend\ (9096, 2023-04-17)
apps\annotator (0, 2023-04-17)
apps\annotator\example.env (93, 2023-04-17)
apps\annotator\generate-react-cli.json (295, 2023-04-17)
apps\annotator\package-lock.json (1301456, 2023-04-17)
apps\annotator\package.json (1516, 2023-04-17)
apps\annotator\public (0, 2023-04-17)
apps\annotator\public\favicon.ico (3870, 2023-04-17)
apps\annotator\public\index.html (1721, 2023-04-17)
apps\annotator\public\logo192.png (5347, 2023-04-17)
apps\annotator\public\logo512.png (9664, 2023-04-17)
apps\annotator\public\manifest.json (492, 2023-04-17)
apps\annotator\public\robots.txt (67, 2023-04-17)
apps\annotator\src (0, 2023-04-17)
apps\annotator\src\components (0, 2023-04-17)
apps\annotator\src\components\annotator (0, 2023-04-17)
apps\annotator\src\components\annotator\annotator.module.scss (61, 2023-04-17)
apps\annotator\src\components\annotator\annotator.tsx (945, 2023-04-17)
apps\annotator\src\components\backupservice.ts (1117, 2023-04-17)
apps\annotator\src\components\cotoutputelement (0, 2023-04-17)
apps\annotator\src\components\cotoutputelement\CotOutputElement.module.scss (762, 2023-04-17)
apps\annotator\src\components\cotoutputelement\CotOutputElement.tsx (4002, 2023-04-17)
... ...

# ThoughtSource

__A framework for the science of machine thinking__

_Datasets, Tutorial notebook, Installation guide, Dataset Annotator_

ThoughtSource is a central, open resource and community centered on data and tools for chain-of-thought reasoning in large language models (Wei 2022). Our long-term goal is to enable trustworthy and robust reasoning in advanced AI systems for driving scientific research and medical practice.

ThoughtSource overview 3

Pre-print: Ott _et al._ __"ThoughtSource: A central hub for large language model reasoning data"__, arXiv, 2023

## Workflow

ThoughtSource overview 1 ThoughtSource overview 2

## Available datasets

Our dataloaders allow you to access the following datasets in a standardized chain-of-thought format. The dataloaders create objects in the Hugging Face 🤗 Datasets format. We (sometimes extensively) post-processed the source datasets in different ways to create more coherent reasoning chains.

---

Datasets can be browsed online through the Dataset Viewer.
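To make the idea of a standardized chain-of-thought format more concrete, here is a minimal sketch of mapping one source record onto a shared layout. The class `CotItem`, the helper `from_source`, and all field names on both sides are hypothetical and chosen only for illustration; they are not the actual ThoughtSource schema or dataloader code.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class CotItem:
    """Hypothetical, simplified record in a shared chain-of-thought layout."""
    id: str
    question: str
    choices: List[str]
    cot: List[str]      # reference reasoning chain, one step per entry
    answer: List[str]
    generated_cot: List[dict] = field(default_factory=list)  # model-written chains


def from_source(raw: dict) -> CotItem:
    """Map one made-up source record onto the shared layout."""
    # Post-processing step: split a free-text explanation into single steps.
    steps = [s.strip() + "." for s in raw["explanation"].split(".") if s.strip()]
    return CotItem(
        id=raw["qid"],
        question=raw["stem"],
        choices=raw["options"],
        cot=steps,
        answer=[raw["options"][raw["correct_index"]]],
    )


# Made-up source record; real source datasets each have their own shape.
raw_record = {
    "qid": "demo-1",
    "stem": "Which object conducts electricity?",
    "options": ["rubber band", "copper wire", "glass rod", "wooden spoon"],
    "correct_index": 1,
    "explanation": "Metals conduct electricity. Copper is a metal",
}

item = from_source(raw_record)
print(item.cot)
print(item.answer)
```

Once every source is mapped onto one layout like this, downstream code for generation and evaluation only needs to handle a single record shape.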

---

### General question answering

* __commonsense_qa:__ Multiple-choice commonsense knowledge question answering dataset (Talmor 2018, _License:_ MIT). Reasoning chains from three different sources are included:
  * __Human-generated__ reasoning chains derived from the __ECQA dataset__ (Aggarwal 2021) for train and validation split. Used as gold standard. _License:_ Community Data License Agreements Sharing license 1.0.
  * __AI-generated (few-shot prompting)__ reasoning chains from __Wei 2022__. Only available for __validation split__. _License:_ Unknown
  * __AI-generated (zero-shot prompting)__ generated reasoning chains from __Kojima 2022__. Only available for __validation split__. _License:_ Unknown
* __strategy_qa:__ General-domain question-answering data from the StrategyQA dataset, reasoning chains are derived from original dataset. (Geva 2021) _License:_ MIT.
  * __Human-generated__ reasoning chains derived from the original dataset for train split. Used as gold standard. _License:_ MIT.
  * __AI-generated (few-shot)__ reasoning chains from __Wei 2022__. Only available for __train split__. _License:_ Unknown
  * __AI-generated (zero-shot)__ generated reasoning chains from __Kojima 2022__. Only available for __train split__. _License:_ Unknown
* __qed:__ General-domain question-answering data and justifications from the QED dataset (Lamm 2020). _License:_ CC BY-SA 3.0.

### Scientific / medical question answering

* __worldtree:__ Scientific question-answering data from the WorldTree v2 dataset (Xie 2020). __Human-generated__ reasoning chains derived from the original dataset. _License:_ AI2 Mercury.
* __entailment_bank:__ Science exam questions with expert-authored explanations from the EntailmentBank dataset (Dalvi 2022). __Human-generated__ reasoning chains derived from the original dataset. _License:_ CC BY 4.0. (Note: significant overlap with worldtree v2)
* __open_book_qa:__ Scientific question-answering modeled after open book exams for assessing human understanding from the OpenBookQA dataset (Mihaylov 2018). __Human-generated__ reasoning chains derived from the original dataset. _License:_ Apache License 2.0.
* __med_qa (USMLE subset):__ Free-form multiple-choice OpenQA dataset containing questions from medical board exams in US (USMLE). Note: the original MedQA dataset also provides Chinese-language data, which are currently not included. (Jin 2020) _License:_ MIT.
  * __AI-generated (zero-shot)__ reasoning chains derived from __Liévin 2022__. Only available for the __test split__, only US questions. _License:_ Unknown.
* __medmc_qa:__ Multiple-Choice Question Answering dataset containing real-world medical entrance exam questions from the All India Institute of Medical Sciences (AIIMS PG) and National Eligibility cum Entrance Test (NEET PG). (Pal 2022) _License:_ MIT.
  * __Human-generated__ reasoning chains derived from the original dataset for ~85% of train and validation split. Used as gold standard. _License:_ MIT.
  * __AI-generated (zero-shot)__ reasoning chains derived from __Liévin 2022__. Only available for 1000 samples from the __validation split__. _License:_ CC-BY.
* __pubmed_qa:__ QA dataset containing biomedical questions extracted from PubMed abstracts that can be answered with yes/no/maybe (Jin 2019). _License:_ MIT.
  * __Human-generated__ reasoning chains derived from the original dataset. Used as gold standard. _License:_ MIT.
  * __AI-generated (zero-shot)__ reasoning chains derived from __Liévin 2022__. Only available for the __test split__. _License:_ CC-BY.

### Math word problems

* __aqua:__ Math word problems from the AQUA-RAT (Algebra Question Answering with Rationales) dataset (Ling 2017). Reasoning chains derived from the original dataset. _License:_ Apache 2.0.
* __asdiv:__ Math word problems from the Academia Sinica Diverse MWP dataset (Miao 2020). Reasoning chains derived from the original dataset. _License:_ CC BY-NC 4.0.
* __gsm8k:__ Math word problems from the GSM8K dataset (Cobbe 2021). Reasoning chains derived from the original dataset. _License:_ MIT.
* __mawps:__ Math word problems from MAWPS, the Math Word Problem Repository dataset (Koncel-Kedziorski 2016). Reasoning chains derived from the original dataset. _License:_ MIT.
* __svamp:__ Math word problems. Source: SVAMP (Patel 2021). Reasoning chains derived from the original dataset. _License:_ MIT.

We are working on collecting and generating additional datasets, and on further improving the quality of existing datasets (see dataset issues). We welcome suggestions for the inclusion of other datasets.

__We welcome dataset contributions! 👉 Have a look at our contribution guide!__

## Annotator

Demonstration of the annotator tool

The annotator allows for highlighting similarities between different generated reasoning chains, making it easier to spot strengths and weaknesses and to select the best results.


Use the web-based annotator.
To try out the annotator, simply type in your name and load this example file.
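As a rough illustration of what highlighting similarities between reasoning chains can mean, the sketch below pairs up sentences from two chains that are nearly identical. It uses Python's standard `difflib` module and a naive sentence splitter; the functions `split_sentences` and `similar_sentences` and the 0.8 threshold are assumptions made for this example and do not describe how the annotator itself computes similarity.

```python
import re
from difflib import SequenceMatcher
from typing import List, Tuple


def split_sentences(chain: str) -> List[str]:
    """Naive sentence splitter: break on whitespace that follows '.', '!' or '?'."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", chain) if s.strip()]


def similar_sentences(
    chain_a: str, chain_b: str, threshold: float = 0.8
) -> List[Tuple[str, str, float]]:
    """Return sentence pairs whose character-level similarity reaches the threshold."""
    pairs = []
    for sent_a in split_sentences(chain_a):
        for sent_b in split_sentences(chain_b):
            ratio = SequenceMatcher(None, sent_a.lower(), sent_b.lower()).ratio()
            if ratio >= threshold:
                pairs.append((sent_a, sent_b, round(ratio, 2)))
    return pairs


# Two made-up reasoning chains for the same question.
chain_1 = "Copper is a metal. Metals conduct electricity. So the answer is B."
chain_2 = "Rubber is an insulator. Copper is a metal. So the answer is B."

for sent_a, sent_b, score in similar_sentences(chain_1, chain_2):
    print(f"{score:.2f}  {sent_a!r} ~ {sent_b!r}")
```

Sentences that appear in both chains are candidates for highlighting, while sentences found in only one chain point to where the chains diverge.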

## Installation and code structure

### Installation

Execute in a terminal, line by line:

```bash
git clone <repository-url>
cd ThoughtSource
# install pip and virtualenv
sudo apt install python3-pip
sudo apt install python3-venv
# create and activate virtual environment
python3 -m venv venv
source ./venv/bin/activate
# install requirements and API packages
pip install -e ./libs/cot[api]
```

### Applications

* __annotator:__ Web-based tool for annotating chain-of-thought data.
* __dataset-viewer:__ Streamlit application for browsing ThoughtSource datasets.

### Libraries

* __cot:__
  * __dataloader__: Creating and processing of ThoughtSource datasets (based on the Hugging Face 🤗 Datasets library).
  * __generate__: Generating reasoning chains with a wide variety of language models (currently OpenAI and models on Hugging Face hub).
  * __evaluate__: Evaluate the performance of predictions extracted using generated reasoning chains.

```python
from cot import Collection

# 1) Dataset loading and selecting a random sample
collection = Collection(["worldtree"], verbose=False)
collection = collection.select(split="train", number_samples=10)

# 2) Language Model generates chains of thought and then extracts answers
config={
    "instruction_keys": ['qa-01'], # "Answer the following question through step-by-step reasoning."
    "cot_trigger_keys": ['kojima-01'], # "Answer: Let's think step by step."
    "answer_extraction_keys": ['kojima-A-D'], # "Therefore, among A through D, the answer is"
    "api_service": "huggingface_hub",
    "engine": "google/flan-t5-xl",
    "warn": False,
    "verbose": False,
}
collection.generate(config=config)

# 3) Performance evaluation
collection.evaluate()
```

```
{'accuracy': {'qa-01_kojima-01_kojima-A-D': 0.6}}
```

---
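Steps 2 and 3 of the example above follow the zero-shot chain-of-thought pattern: first prompt for a reasoning chain, then append an answer-extraction phrase and score the extracted answers. The sketch below walks through that pattern with a canned stand-in for the language model. The three prompt strings come from the comments in the config above; everything else (`run_two_stage`, `stub_model`, `accuracy`, and the way the prompt parts are joined) is an assumption for illustration and not the `cot` library's implementation.

```python
import re
from typing import Callable, Dict, List

# Prompt fragments quoted in the config comments above.
INSTRUCTION = "Answer the following question through step-by-step reasoning."
COT_TRIGGER = "Answer: Let's think step by step."
ANSWER_EXTRACTION = "Therefore, among A through D, the answer is"


def run_two_stage(
    question: str, choices: Dict[str, str], model: Callable[[str], str]
) -> Dict[str, str]:
    """Stage 1 elicits a reasoning chain; stage 2 asks for the final answer letter."""
    options = "\n".join(f"{letter}) {text}" for letter, text in choices.items())
    cot_prompt = f"{INSTRUCTION}\n\n{question}\n{options}\n\n{COT_TRIGGER}"
    chain = model(cot_prompt)
    answer_prompt = f"{cot_prompt} {chain}\n{ANSWER_EXTRACTION}"
    raw_answer = model(answer_prompt)
    # Crude extraction: take the first standalone letter A-D in the completion.
    match = re.search(r"\b([A-D])\b", raw_answer)
    return {"cot": chain, "prediction": match.group(1) if match else ""}


def stub_model(prompt: str) -> str:
    """Stand-in for a real language model call; returns canned text."""
    if prompt.rstrip().endswith(ANSWER_EXTRACTION):
        return " B."
    return "Copper is a metal, and metals conduct electricity."


def accuracy(predictions: List[str], gold: List[str]) -> float:
    """Fraction of predictions that exactly match the gold answers."""
    return sum(p == g for p, g in zip(predictions, gold)) / len(gold)


result = run_two_stage(
    "Which object conducts electricity?",
    {"A": "rubber band", "B": "copper wire", "C": "glass rod", "D": "wooden spoon"},
    stub_model,
)
print(result["prediction"])
print(accuracy([result["prediction"], "A"], ["B", "C"]))
```

Swapping `stub_model` for a function that calls a real model would leave the rest of the flow unchanged, which is the main reason for passing the model in as a callable here.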

👉 See the tutorial notebook for more code examples.

---

## Citation

```bibtex
@misc{,
  doi = {10.48550/ARXIV.2301.11596},
  url = {},
  author = {Ott, Simon and Hebenstreit, Konstantin and Liévin, Valentin and Hother, Christoffer Egeberg and Moradi, Milad and Mayrhauser, Maximilian and Praas, Robert and Winther, Ole and Samwald, Matthias},
  keywords = {Computation and Language (cs.CL), Artificial Intelligence (cs.AI), FOS: Computer and information sciences, FOS: Computer and information sciences},
  title = {ThoughtSource: A central hub for large language model reasoning data},
  publisher = {arXiv},
  year = {2023},
  copyright = {Creative Commons Attribution 4.0 International}
}
```

## Versioning

All updates/changes to datasets are explicitly mentioned in bold.

0.0.5 (2023-03-10)
- Function to select which generated CoTs to keep after loading: `collection.select_generated_cots(author="thoughtsource")`

0.0.4 (2023-03-08)
- Evaluation function improved. Function to load ThoughtSource100 collection: `Collection.load_thoughtsource_100()`

0.0.3 (2023-02-24)
- ThoughtSource_100 collection released with reasoning chains from GPT-text-davinci-003, flan-t5-xxl, and cohere's command-xl

0.0.2 (2023-02-15)
- Annotator tool updated for correct data schema (this might result in errors loading old datasets, when loading from json files).
- **Pubmed_qa**: Included "LONG_ANSWER" from origin schema as "cot" in ThoughtSource schema

0.0.1 (2023-02-01)
- Initial release after Twitter announcement of project


