Swin3D

Category: Artificial Intelligence / Neural Networks / Deep Learning
Development tool: CUDA
File size: 0KB
Downloads: 0
Upload date: 2023-06-25 05:12:20
Uploader: sh-1993
Description: A shift-window based transformer for 3D sparse tasks

File list (size in bytes, date):
CODE_OF_CONDUCT.md (444, 2023-06-24)
LICENSE (1141, 2023-06-24)
SECURITY.md (2757, 2023-06-24)
SUPPORT.md (1244, 2023-06-24)
Swin3D/ (0, 2023-06-24)
Swin3D/__init__.py (231, 2023-06-24)
Swin3D/models/ (0, 2023-06-24)
Swin3D/models/Swin3D.py (11799, 2023-06-24)
Swin3D/models/__init__.py (194, 2023-06-24)
Swin3D/modules/ (0, 2023-06-24)
Swin3D/modules/__init__.py (80, 2023-06-24)
Swin3D/modules/mink_layers.py (6790, 2023-06-24)
Swin3D/modules/swin3d_layers.py (30328, 2023-06-24)
Swin3D/sparse_dl/ (0, 2023-06-24)
Swin3D/sparse_dl/__init__.py (81, 2023-06-24)
Swin3D/sparse_dl/attn/ (0, 2023-06-24)
Swin3D/sparse_dl/attn/__init__.py (80, 2023-06-24)
Swin3D/sparse_dl/attn/attn_coff.py (10545, 2023-06-24)
Swin3D/sparse_dl/knn/ (0, 2023-06-24)
Swin3D/sparse_dl/knn/__init__.py (129, 2023-06-24)
Swin3D/sparse_dl/knn/knn.py (829, 2023-06-24)
Swin3D/src/ (0, 2023-06-24)
Swin3D/src/attn/ (0, 2023-06-24)
Swin3D/src/attn/attn_api.cpp (1369, 2023-06-24)
Swin3D/src/attn/attn_utils.cuh (15992, 2023-06-24)
Swin3D/src/attn/self_attn_aio_bwd.cu (24988, 2023-06-24)
Swin3D/src/attn/self_attn_aio_bwd.h (851, 2023-06-24)
Swin3D/src/attn/self_attn_aio_fwd.cu (31327, 2023-06-24)
Swin3D/src/attn/self_attn_aio_fwd.h (658, 2023-06-24)
Swin3D/src/attn/self_attn_apply_coff_indir_backward.cu (8964, 2023-06-24)
Swin3D/src/attn/self_attn_apply_coff_indir_backward.h (529, 2023-06-24)
Swin3D/src/attn/self_attn_apply_coff_indir_forward.cu (6299, 2023-06-24)
Swin3D/src/attn/self_attn_apply_coff_indir_forward.h (489, 2023-06-24)
Swin3D/src/attn/self_attn_cal_coff_indir_backward.cu (9431, 2023-06-24)
Swin3D/src/attn/self_attn_cal_coff_indir_backward.h (541, 2023-06-24)
... ...

# Swin3D: A Pretrained Transformer Backbone for 3D Indoor Scene Understanding

[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/swin3d-a-pretrained-transformer-backbone-for/semantic-segmentation-on-scannet)](https://paperswithcode.com/sota/semantic-segmentation-on-scannet?p=swin3d-a-pretrained-transformer-backbone-for) [![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/swin3d-a-pretrained-transformer-backbone-for/semantic-segmentation-on-s3dis-area5)](https://paperswithcode.com/sota/semantic-segmentation-on-s3dis-area5?p=swin3d-a-pretrained-transformer-backbone-for) [![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/swin3d-a-pretrained-transformer-backbone-for/semantic-segmentation-on-s3dis)](https://paperswithcode.com/sota/semantic-segmentation-on-s3dis?p=swin3d-a-pretrained-transformer-backbone-for) [![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/swin3d-a-pretrained-transformer-backbone-for/3d-object-detection-on-scannetv2)](https://paperswithcode.com/sota/3d-object-detection-on-scannetv2?p=swin3d-a-pretrained-transformer-backbone-for) [![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/swin3d-a-pretrained-transformer-backbone-for/3d-object-detection-on-s3dis)](https://paperswithcode.com/sota/3d-object-detection-on-s3dis?p=swin3d-a-pretrained-transformer-backbone-for)

## Updates

***27/04/2023*** Initial commits:

1. Pretrained models on Structured3D are provided.
2. The supporting code for semantic segmentation on ScanNet and S3DIS is provided.

## Introduction

We present a pretrained 3D backbone, named Swin3D, that for the first time outperforms all state-of-the-art methods on downstream 3D indoor scene understanding tasks.
Our backbone network is based on a 3D Swin transformer and is carefully designed to efficiently conduct self-attention on sparse voxels with linear memory complexity and to capture the irregularity of point signals via generalized contextual relative positional embedding. Based on this backbone design, we pretrained a large Swin3D model on the synthetic Structured3D dataset, which is 10 times larger than ScanNet, and fine-tuned the pretrained model on various downstream real-world indoor scene understanding tasks.

![teaser](figures/swin3D.png)

## Overview

- [Data Preparation](#data-preparation)
- [Pretrained Models](#pretrained-models)
- [Quick Start](#quick-start)
- [Results and models](#results-and-models)
- [Citation](#citation)

## Data Preparation

We pretrained our Swin3D on Structured3D; please refer to this [link](https://github.com/yuxiaoguo/Uni3DScenes) to prepare the data.

## Pretrained Models

The models pretrained on Structured3D with different cRSE are provided here.
|          | Pretrain     | #params | cRSE         | mIoU(val) | Model                                                                                       | Log                                                                                       |
| :------- | :----------: | :------ | :----------- | :-------: | :-----------------------------------------------------------------------------------------: | :---------------------------------------------------------------------------------------: |
| Swin3D-S | Structured3D | 23.57M  | XYZ,RGB      | 77.69     | [model](https://drive.google.com/file/d/1oezNkN3_HZvyxGxjtOpSaQUbGl3YYF90/view?usp=sharing) | [log](https://drive.google.com/file/d/1TuwZqpKm8OYj8BeMhDUhLcGqzXhgJcpC/view?usp=sharing) |
| Swin3D-S | Structured3D | 23.57M  | XYZ,RGB,NORM | 79.15     | [model](https://drive.google.com/file/d/1FMmAgHwS__NtFldH-lFTsraKj0my62t4/view?usp=sharing) | [log](https://drive.google.com/file/d/1-0kz81X0j2Zp-mntN1GwQlsm5sLIy3JX/view?usp=sharing) |
| Swin3D-L | Structured3D | 60.75M  | XYZ,RGB      | 79.79     | [model](https://drive.google.com/file/d/1ior8uAQRiVd2mwfYapcaF_e_R80y7DQm/view?usp=sharing) | [log](https://drive.google.com/file/d/1YYd8SOaAIqz16T7XOL54aGPC4sSoMXsW/view?usp=sharing) |
| Swin3D-L | Structured3D | 60.75M  | XYZ,RGB,NORM | 81.04     | [model](https://drive.google.com/file/d/1ySNrP39H6m-euK-2La60-MNOp0e3Pe_4/view?usp=sharing) | [log](https://drive.google.com/file/d/1nXQCw5G2swrSksBnpGBveNSHwAqy8hAZ/view?usp=sharing) |

## Quick Start

Install the package:

```
pip install -r requirements.txt
python setup.py install
```

Build the model and load our pretrained weights; you can then fine-tune the model on various tasks.
```python
import torch
from Swin3D.models import Swin3DUNet

model = Swin3DUNet(
    depths, channels, num_heads,
    window_sizes, quant_size, up_k=up_k,
    drop_path_rate=drop_path_rate, num_classes=num_classes,
    num_layers=num_layers, stem_transformer=stem_transformer,
    upsample=upsample, first_down_stride=down_stride,
    knn_down=knn_down, in_channels=in_channels,
    cRSE='XYZ_RGB_NORM', fp16_mode=1)
model.load_pretrained_model(ckpt_path)
```

## Results and models

To reproduce our results on downstream tasks, please follow the code in this [repo](https://github.com/Yukichiii/Swin3D_Task). The results are provided here.

### ScanNet Segmentation

|          | Pretrained | mIoU(Val)  | mIoU(Test) |
| :------- | :--------: | :--------: | :--------: |
| Swin3D-S | ✗          | 75.2       | -          |
| Swin3D-S | ✓          | 75.6(76.8) | -          |
| Swin3D-L | ✓          | 76.2(77.5) | 77.9       |

### S3DIS Segmentation

|          | Pretrained | Area 5 mIoU | 6-fold mIoU |
| :------- | :--------: | :---------: | :---------: |
| Swin3D-S | ✗          | 72.5        | 76.9        |
| Swin3D-S | ✓          | 73.0        | 78.2        |
| Swin3D-L | ✓          | 74.5        | 79.8        |

### ScanNet 3D Detection

|                    | Pretrained | mAP@0.25 | mAP@0.50 |
| :----------------- | :--------: | :------: | :------: |
| Swin3D-S+FCAF3D    | ✓          | 74.2     | 59.5     |
| Swin3D-L+FCAF3D    | ✓          | 74.2     | 58.6     |
| Swin3D-S+CAGroup3D | ✓          | 76.4     | 62.7     |
| Swin3D-L+CAGroup3D | ✓          | 76.4     | 63.2     |

### S3DIS 3D Detection

|                 | Pretrained | mAP@0.25 | mAP@0.50 |
| :-------------- | :--------: | :------: | :------: |
| Swin3D-S+FCAF3D | ✓          | 69.9     | 50.2     |
| Swin3D-L+FCAF3D | ✓          | 72.1     | 54.0     |

## Citation

If you find Swin3D useful to your research, please cite our work:

```
@misc{yang2023swin3d,
      title={Swin3D: A Pretrained Transformer Backbone for 3D Indoor Scene Understanding},
      author={Yu-Qi Yang and Yu-Xiao Guo and Jian-Yu Xiong and Yang Liu and Hao Pan and Peng-Shuai Wang and Xin Tong and Baining Guo},
      year={2023},
      eprint={2304.06906},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}
```
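The repository implements its sparse window attention in custom CUDA kernels (`Swin3D/src/attn`). Purely as an illustration of the shifted-window idea described in the introduction — and not the project's actual implementation — here is a minimal NumPy sketch of how sparse voxel coordinates can be grouped into non-overlapping attention windows, with a half-window shift producing the alternate partition:

```python
import numpy as np

def window_keys(coords, window_size, shift=0):
    """Map each voxel coordinate to the index of the 3D window containing it.

    coords: (N, 3) integer voxel coordinates. Shifting coordinates by half
    the window size before grouping yields the "shifted" partition used in
    alternating Swin blocks.
    """
    return (coords + shift) // window_size  # (N, 3) window indices

# Four sparse voxels; windows of size 4 along each axis.
coords = np.array([[0, 0, 0], [1, 2, 3], [4, 4, 4], [5, 1, 0]])
keys = window_keys(coords, window_size=4)

# Voxels sharing a window key attend to one another within that window;
# attention cost is therefore linear in the number of occupied voxels.
groups = {}
for i, key in enumerate(map(tuple, keys)):
    groups.setdefault(key, []).append(i)
# groups: {(0, 0, 0): [0, 1], (1, 1, 1): [2], (1, 0, 0): [3]}
```

A second pass with `shift=window_size // 2` regroups voxels across the original window boundaries, which is what lets information propagate between windows over successive blocks.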
