
Development tool: Jupyter Notebook
Upload date: 2023-03-05 12:58:20
Uploader: sh-1993
Description: dcai-lab, lab assignments for Introduction to Data-Centric AI, MIT IAP 2023

LICENSE.txt (34523, 2023-03-05)
data_centric_evaluation (0, 2023-03-05)
data_centric_evaluation\Lab - Data-Centric Evaluation.ipynb (6182, 2023-03-05)
data_centric_evaluation\solutions (0, 2023-03-05)
data_centric_evaluation\solutions\Solutions - Data-Centric Evaluation.ipynb (144929, 2023-03-05)
data_centric_evaluation\test.csv (402822, 2023-03-05)
data_centric_evaluation\train.csv (408097, 2023-03-05)
data_centric_model_centric (0, 2023-03-05)
data_centric_model_centric\Lab - Data-Centric AI vs Model-Centric AI.ipynb (16055, 2023-03-05)
data_centric_model_centric\reviews_test.csv (98929, 2023-03-05)
data_centric_model_centric\reviews_train.csv (699692, 2023-03-05)
data_centric_model_centric\solutions (0, 2023-03-05)
data_centric_model_centric\solutions\Solution - Data-Centric AI vs Model-Centric AI.ipynb (23244, 2023-03-05)
dataset_curation (0, 2023-03-05)
dataset_curation\Lab - Dataset Curation.ipynb (23011, 2023-03-05)
dataset_curation\solutions (0, 2023-03-05)
dataset_curation\solutions\Solutions - Dataset Curation.ipynb (26780, 2023-03-05)
growing_datasets (0, 2023-03-05)
growing_datasets\Lab - Growing Datasets.ipynb (278639, 2023-03-05)
growing_datasets\solutions (0, 2023-03-05)
growing_datasets\solutions\Lab - Growing Datasets.ipynb (279663, 2023-03-05)
interpretable_features (0, 2023-03-05)
interpretable_features\Lab - Interpretable Features.ipynb (91182, 2023-03-05)
interpretable_features\Solution - Interpretable Features.ipynb (606176, 2023-03-05)
interpretable_features\X_test_ft.csv (666645, 2023-03-05)
interpretable_features\X_train_ft.csv (3719410, 2023-03-05)
interpretable_features\data_description.txt (13713, 2023-03-05)
interpretable_features\example_explanation.png (43062, 2023-03-05)
interpretable_features\feature_descriptions.csv (20730, 2023-03-05)
interpretable_features\test_data.csv (69175, 2023-03-05)
interpretable_features\train_data.csv (386379, 2023-03-05)
label_errors (0, 2023-03-05)
label_errors\Lab - Label Errors.ipynb (18997, 2023-03-05)
label_errors\solutions (0, 2023-03-05)
label_errors\solutions\Solution - Label Errors.ipynb (34171, 2023-03-05)
label_errors\student-grades.csv (33037, 2023-03-05)
membership_inference (0, 2023-03-05)
... ...

# Lab assignments for Introduction to Data-Centric AI

This repository contains the lab assignments for the [Introduction to Data-Centric AI](https://dcai.csail.mit.edu/) class.

Contributions are most welcome! If you have ideas for improving the labs, please open an issue or submit a pull request.

For more hands-on experience with techniques taught in this class, participate in the [Data-centric AI Competition 2023](https://machinehack.com/tournaments/data_centric_ai_competition_2023).

## [Lab 1: Data-Centric AI vs. Model-Centric AI][lab-1]

The [first lab assignment][lab-1] walks you through an ML task of building a text classifier, and illustrates the power (and often simplicity) of data-centric approaches.

[lab-1]: data_centric_model_centric/Lab%20-%20Data-Centric%20AI%20vs%20Model-Centric%20AI.ipynb

## [Lab 2: Label Errors][lab-2]

[This lab][lab-2] guides you through writing your own implementation of automatic label error identification using Confident Learning, the technique taught in [today’s lecture][lec-2].

[lab-2]: label_errors/Lab%20-%20Label%20Errors.ipynb
[lec-2]: https://dcai.csail.mit.edu/lectures/label-errors/

## [Lab 3: Dataset Creation and Curation][lab-3]

[This lab assignment][lab-3] is to analyze an already collected dataset labeled by multiple annotators.

[lab-3]: dataset_curation/Lab%20-%20Dataset%20Curation.ipynb

## [Lab 4: Data-centric Evaluation of ML Models][lab-4]

[This lab assignment][lab-4] is to try improving the performance of a given model solely by improving its training data via some of the various strategies covered here.

[lab-4]: data_centric_evaluation/Lab%20-%20Data-Centric%20Evaluation.ipynb

## [Lab 5: Class Imbalance, Outliers, and Distribution Shift][lab-5]

[The lab assignment][lab-5] for this lecture is to implement and compare different methods for identifying outliers. For this lab, we've focused on anomaly detection.
You are given a clean training dataset consisting of many pictures of dogs, and an evaluation dataset that contains outliers (non-dogs). Your task is to implement and compare various methods for detecting these outliers. You may implement some of the ideas presented in [today's lecture][lec-5], or you can look up other outlier detection algorithms in the linked references or online.

[lab-5]: outliers/Lab%20-%20Outliers.ipynb
[lec-5]: https://dcai.csail.mit.edu/lectures/imbalance-outliers-shift/

## [Lab 6: Growing or Compressing Datasets][lab-6]

[This lab][lab-6] guides you through an implementation of active learning.

[lab-6]: growing_datasets/Lab%20-%20Growing%20Datasets.ipynb

## [Lab 7: Interpretability in Data-Centric ML][lab-7]

[This lab][lab-7] guides you through finding issues in a dataset’s features by applying interpretability techniques.

[lab-7]: interpretable_features/Lab%20-%20Interpretable%20Features.ipynb

## [Lab 8: Encoding Human Priors: Data Augmentation and Prompt Engineering][lab-8]

[This lab][lab-8] guides you through prompt engineering, that is, crafting inputs for large language models (LLMs). Even small amounts of data can make these large pre-trained models very useful. This lab is also [available on Colab][lab-8-colab].

[lab-8]: prompt_engineering/Lab_Prompt_Engineering.ipynb
[lab-8-colab]: https://colab.research.google.com/drive/1cipH-u6Jz0EH-6Cd9MPYgY4K0sJZwRJq

## [Lab 9: Data Privacy and Security][lab-9]

The [lab assignment][lab-9] for this lecture is to implement a membership inference attack. You are given a trained machine learning model, available as a black-box prediction function. Your task is to devise a method to determine whether or not a given data point was in the training set of this model. You may implement some of the ideas presented in [today’s lecture][lec-9], or you can look up other membership inference attack algorithms.
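
To make the task concrete, here is a minimal sketch of one simple baseline, a confidence-threshold attack: guess "member" when the black-box model is unusually confident in a point's true label. It is not the lab's reference solution. It assumes the attacker has a few points with known membership for calibrating the threshold, and the toy model (`make_toy_model`), the data, and all function names are hypothetical and exist only for illustration.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical black-box interface: one input in, class probabilities out.
PredictFn = Callable[[Sequence[float]], List[float]]
Example = Tuple[Tuple[float, ...], int]


def true_label_confidence(predict: PredictFn, x: Sequence[float], label: int) -> float:
    """Probability the black-box model assigns to the true label of x."""
    return predict(x)[label]


def choose_threshold(member_scores: List[float], nonmember_scores: List[float]) -> float:
    """Pick the score threshold that best separates known members from non-members."""
    candidates = sorted(set(member_scores) | set(nonmember_scores))
    total = len(member_scores) + len(nonmember_scores)
    best_t, best_acc = candidates[0], -1.0
    for t in candidates:
        correct = sum(s >= t for s in member_scores) + sum(s < t for s in nonmember_scores)
        if correct / total > best_acc:
            best_t, best_acc = t, correct / total
    return best_t


def is_member(predict: PredictFn, x: Sequence[float], label: int, threshold: float) -> bool:
    """Guess 'member' when confidence in the true label reaches the threshold."""
    return true_label_confidence(predict, x, label) >= threshold


def make_toy_model(train_set: List[Example]) -> PredictFn:
    """Toy stand-in for an overfit model: very confident on memorized training points."""
    memorized = {tuple(x): y for x, y in train_set}

    def predict(x: Sequence[float]) -> List[float]:
        if tuple(x) in memorized:
            probs = [0.01, 0.01]
            probs[memorized[tuple(x)]] = 0.99
            return probs
        # Unseen points: a weaker rule based on the sign of the first feature.
        return [0.3, 0.7] if x[0] > 0 else [0.7, 0.3]

    return predict


# Illustrative data only: (features, label) pairs.
train_set = [((1.0, 2.0), 1), ((-1.0, 0.5), 0), ((2.0, -1.0), 1), ((-2.0, -2.0), 0)]
held_out = [((1.5, 0.0), 1), ((-0.5, 1.0), 0), ((3.0, 3.0), 1), ((-3.0, 1.0), 0)]
predict = make_toy_model(train_set)

# Calibrate on points whose membership the attacker is assumed to know.
member_scores = [true_label_confidence(predict, x, y) for x, y in train_set[:2]]
nonmember_scores = [true_label_confidence(predict, x, y) for x, y in held_out[:2]]
threshold = choose_threshold(member_scores, nonmember_scores)

# Attack the remaining points.
guesses_train = [is_member(predict, x, y, threshold) for x, y in train_set[2:]]
guesses_held_out = [is_member(predict, x, y, threshold) for x, y in held_out[2:]]
print("threshold:", threshold)
print("guesses on training points:", guesses_train)
print("guesses on held-out points:", guesses_held_out)
```

The attack only works to the extent that the model behaves differently on training data than on unseen data; against a real model the gap is much smaller than in this toy, so per-example loss or a learned attack model may be worth trying instead of a single global threshold.
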

[lab-9]: membership_inference/Lab%20-%20Membership%20Inference.ipynb
[lec-9]: https://dcai.csail.mit.edu/lectures/data-privacy-security/

## License

Copyright (c) by the instructors of Introduction to Data-Centric AI (dcai.csail.mit.edu).

dcai-lab is free software: you can redistribute it and/or modify it under the terms of the GNU Affero General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

dcai-lab is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See [GNU Affero General Public LICENSE](https://github.com/dcai-course/dcai-lab/blob/master/LICENSE.txt) for details.


