BertFt

Category: Artificial Intelligence / Neural Networks / Deep Learning
Development tool: Jupyter Notebook
File size: 0KB
Downloads: 0
Upload date: 2023-09-08 21:27:39
Uploader: sh-1993
Description: BertFt

File list:
GLUE/ (0, 2024-01-07)
GLUE/trainseed_1337/ (0, 2024-01-07)
GLUE/trainseed_1337/task_cola/ (0, 2024-01-07)
GLUE/trainseed_1337/task_cola/lay_norm_False/ (0, 2024-01-07)
GLUE/trainseed_1337/task_cola/lay_norm_False/alpha_asc_False/ (0, 2024-01-07)
GLUE/trainseed_1337/task_cola/lay_norm_False/alpha_asc_False/layers_0/ (0, 2024-01-07)
GLUE/trainseed_1337/task_cola/lay_norm_False/alpha_asc_False/layers_0/lr2e-5_epoch3_bs32/ (0, 2024-01-07)
GLUE/trainseed_1337/task_cola/lay_norm_False/alpha_asc_False/layers_0/lr2e-5_epoch3_bs32/baseline.npy (7666, 2024-01-07)
GLUE/trainseed_1337/task_cola/lay_norm_False/alpha_asc_False/layers_0/lr2e-5_epoch3_bs32/stats/ (0, 2024-01-07)
GLUE/trainseed_1337/task_cola/lay_norm_False/alpha_asc_False/layers_0/lr2e-5_epoch3_bs32/stats/epoch_0.csv (27071, 2024-01-07)
GLUE/trainseed_1337/task_cola/lay_norm_False/alpha_asc_False/layers_0/lr2e-5_epoch3_bs32/stats/epoch_1.csv (27067, 2024-01-07)
GLUE/trainseed_1337/task_cola/lay_norm_False/alpha_asc_False/layers_0/lr2e-5_epoch3_bs32/stats/epoch_2.csv (27078, 2024-01-07)
GLUE/trainseed_1337/task_cola/lay_norm_False/alpha_asc_False/layers_0/lr2e-5_epoch3_bs32/stats/freeze_0.csv (11101, 2024-01-07)
GLUE/trainseed_1337/task_cola/lay_norm_False/alpha_asc_False/layers_0/lr2e-5_epoch3_bs32/stats/freeze_1.csv (11101, 2024-01-07)
GLUE/trainseed_1337/task_cola/lay_norm_False/alpha_asc_False/layers_0/lr2e-5_epoch3_bs32/stats/freeze_2.csv (11101, 2024-01-07)
GLUE/trainseed_1337/task_cola/lay_norm_False/alpha_asc_False/layers_1/ (0, 2024-01-07)
GLUE/trainseed_1337/task_cola/lay_norm_False/alpha_asc_False/layers_1/lr2e-5_epoch3_bs32/ (0, 2024-01-07)
GLUE/trainseed_1337/task_cola/lay_norm_False/alpha_asc_False/layers_1/lr2e-5_epoch3_bs32/baseline.npy (7666, 2024-01-07)
GLUE/trainseed_1337/task_cola/lay_norm_False/alpha_asc_False/layers_1/lr2e-5_epoch3_bs32/stats/ (0, 2024-01-07)
GLUE/trainseed_1337/task_cola/lay_norm_False/alpha_asc_False/layers_1/lr2e-5_epoch3_bs32/stats/epoch_0.csv (27071, 2024-01-07)
GLUE/trainseed_1337/task_cola/lay_norm_False/alpha_asc_False/layers_1/lr2e-5_epoch3_bs32/stats/epoch_1.csv (27080, 2024-01-07)
GLUE/trainseed_1337/task_cola/lay_norm_False/alpha_asc_False/layers_1/lr2e-5_epoch3_bs32/stats/epoch_2.csv (27080, 2024-01-07)
GLUE/trainseed_1337/task_cola/lay_norm_False/alpha_asc_False/layers_1/lr2e-5_epoch3_bs32/stats/freeze_0.csv (11103, 2024-01-07)
GLUE/trainseed_1337/task_cola/lay_norm_False/alpha_asc_False/layers_1/lr2e-5_epoch3_bs32/stats/freeze_1.csv (11103, 2024-01-07)
GLUE/trainseed_1337/task_cola/lay_norm_False/alpha_asc_False/layers_1/lr2e-5_epoch3_bs32/stats/freeze_2.csv (11103, 2024-01-07)
GLUE/trainseed_1337/task_cola/lay_norm_False/alpha_asc_False/layers_10/ (0, 2024-01-07)
GLUE/trainseed_1337/task_cola/lay_norm_False/alpha_asc_False/layers_10/lr2e-5_epoch3_bs32/ (0, 2024-01-07)
GLUE/trainseed_1337/task_cola/lay_norm_False/alpha_asc_False/layers_10/lr2e-5_epoch3_bs32/baseline.npy (7666, 2024-01-07)
GLUE/trainseed_1337/task_cola/lay_norm_False/alpha_asc_False/layers_10/lr2e-5_epoch3_bs32/stats/ (0, 2024-01-07)
GLUE/trainseed_1337/task_cola/lay_norm_False/alpha_asc_False/layers_10/lr2e-5_epoch3_bs32/stats/epoch_0.csv (27071, 2024-01-07)
GLUE/trainseed_1337/task_cola/lay_norm_False/alpha_asc_False/layers_10/lr2e-5_epoch3_bs32/stats/epoch_1.csv (27074, 2024-01-07)
GLUE/trainseed_1337/task_cola/lay_norm_False/alpha_asc_False/layers_10/lr2e-5_epoch3_bs32/stats/epoch_2.csv (27085, 2024-01-07)
GLUE/trainseed_1337/task_cola/lay_norm_False/alpha_asc_False/layers_10/lr2e-5_epoch3_bs32/stats/freeze_0.csv (11121, 2024-01-07)
GLUE/trainseed_1337/task_cola/lay_norm_False/alpha_asc_False/layers_10/lr2e-5_epoch3_bs32/stats/freeze_1.csv (11121, 2024-01-07)
GLUE/trainseed_1337/task_cola/lay_norm_False/alpha_asc_False/layers_10/lr2e-5_epoch3_bs32/stats/freeze_2.csv (11121, 2024-01-07)
GLUE/trainseed_1337/task_cola/lay_norm_False/alpha_asc_False/layers_12/ (0, 2024-01-07)
GLUE/trainseed_1337/task_cola/lay_norm_False/alpha_asc_False/layers_12/lr2e-5_epoch3_bs32/ (0, 2024-01-07)
GLUE/trainseed_1337/task_cola/lay_norm_False/alpha_asc_False/layers_12/lr2e-5_epoch3_bs32/baseline.npy (7666, 2024-01-07)
GLUE/trainseed_1337/task_cola/lay_norm_False/alpha_asc_False/layers_12/lr2e-5_epoch3_bs32/stats/ (0, 2024-01-07)
GLUE/trainseed_1337/task_cola/lay_norm_False/alpha_asc_False/layers_12/lr2e-5_epoch3_bs32/stats/epoch_0.csv (27071, 2024-01-07)
... ...
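
Each run directory holds a baseline.npy array and per-epoch / per-freeze CSV statistics under stats/. A minimal sketch of loading these artifacts with numpy and pandas (the CSV column layout is not documented in this listing, so only generic loading is shown):

```python
import numpy as np
import pandas as pd

# One run directory from the listing above.
run = "GLUE/trainseed_1337/task_cola/lay_norm_False/alpha_asc_False/layers_0/lr2e-5_epoch3_bs32"

baseline = np.load(f"{run}/baseline.npy")  # saved NumPy array; shape depends on the run
print(baseline.shape)

epoch_stats = pd.read_csv(f"{run}/stats/epoch_0.csv")    # statistics logged during epoch 0
freeze_stats = pd.read_csv(f"{run}/stats/freeze_0.csv")  # statistics for the freeze step
print(epoch_stats.head())
```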

# BertFt

This research project explores different methods of fine-tuning the pre-trained Google BERT model on various datasets from the General Language Understanding Evaluation (GLUE) benchmark.

The project also highlights the variations in performance observed as different numbers of layers of the model are trained.
To the best of our knowledge, the results obtained thus far have not been reported previously.
The model chosen is bert-base, and the tokenizer used is the BERT tokenizer from HuggingFace Transformers.
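
A minimal sketch of loading this model and tokenizer, assuming the bert-base-uncased checkpoint and the 128-token max_length used by the script below:

```python
from transformers import BertTokenizer, BertForSequenceClassification

# Load the pre-trained bert-base checkpoint and its (slow) tokenizer.
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForSequenceClassification.from_pretrained(
    "bert-base-uncased",
    num_labels=2,  # e.g. acceptable/unacceptable for CoLA
)

# Tokenize one sentence, padded to a fixed maximum length.
inputs = tokenizer(
    "The book was read by the student.",
    padding="max_length",
    max_length=128,
    truncation=True,
    return_tensors="pt",
)
logits = model(**inputs).logits
```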

The following hyperparameter choices were used (a training-loop sketch follows the list):
learning_rate = 2e-5
batch_size = 32
epochs = 3
optimizer = AdamW
padding = max_length
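
A minimal sketch of how these hyperparameters map onto a plain PyTorch fine-tuning loop, continuing from the model above; train_dataloader is an assumed DataLoader that yields tokenized GLUE batches containing labels:

```python
import torch
from torch.optim import AdamW

learning_rate = 2e-5
epochs = 3
# batch_size = 32 would be applied when constructing train_dataloader.

device = "cuda" if torch.cuda.is_available() else "cpu"
model.to(device)
optimizer = AdamW(model.parameters(), lr=learning_rate)

model.train()
for epoch in range(epochs):
    for batch in train_dataloader:
        batch = {k: v.to(device) for k, v in batch.items()}
        optimizer.zero_grad()
        loss = model(**batch).loss  # labels in the batch yield a classification loss
        loss.backward()
        optimizer.step()
```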

Results are collected for the following GLUE tasks (a loading sketch follows the list):
CoLA
MRPC
QNLI
RTE
SST-2
STS-B
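
Each task can be pulled from the GLUE benchmark with the HuggingFace datasets library; a minimal loading sketch using CoLA as the example:

```python
from datasets import load_dataset

# Load the CoLA task of the GLUE benchmark (train/validation/test splits).
dataset = load_dataset("glue", "cola")
print(dataset["train"][0])            # {'sentence': ..., 'label': ..., 'idx': ...}
print(dataset["validation"].num_rows)
```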



Script usage:

```bash
num_layers="0 1 2 3 4 5 6 8 10 12 18 24 30 36 72 74"
task_list="mrpc qnli qqp rte sst2 stsb wnli"
alpha_list="True False"
laynorm="False"
model="bert-base-uncased"
batch_size=32

for task in $task_list
do
    for alpha in $alpha_list
    do
        for layers in $num_layers
        do
            save_path="YOUR_SAVE_PATH"
            OMP_NUM_THREADS=1 CUDA_VISIBLE_DEVICES=0 python bertft.py \
                --savepath "$save_path" \
                --epochs 3 \
                --model_name $model \
                --task_name "$task" \
                --max_length 128 \
                --batch_size $batch_size \
                --learning_rate "2e-5" \
                --seed 7 \
                --freeze True \
                --num_layers "$layers" \
                --alpha_ascending "$alpha" \
                --slow_tokenizer True \
                --pad_to_max_length True \
                --add_layer_norm $laynorm \
                --max_train_steps 1000 \
                --grad_acc_steps 1 \
                --accelerate False \
                --debug False
        done
    done
done
```
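
bertft.py itself is not part of this listing, so the following is only an assumed sketch of what the --freeze and --num_layers options could correspond to: disabling gradients for a chosen number of BERT encoder layers so that only the remaining layers and the classification head are fine-tuned.

```python
def freeze_encoder_layers(model, num_layers):
    """Hypothetical helper (not taken from bertft.py): freeze the embeddings
    and the first `num_layers` encoder layers of a HuggingFace BERT model."""
    for param in model.bert.embeddings.parameters():
        param.requires_grad = False
    for layer in model.bert.encoder.layer[:num_layers]:
        for param in layer.parameters():
            param.requires_grad = False

# Example: leave only the top two encoder layers and the classifier trainable.
freeze_encoder_layers(model, num_layers=10)
```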
