Adding Swin Transformer architecture #5491
Conversation
@xiaohu2015 Can you please share the logs with @jdsgomes, along with all the information that will allow us to reproduce your experiment (for example, the git commit hashes you used)? It's also unclear to me whether you used TorchVision's reference scripts or something else in your experiments. Could you please clarify?
Although the official repo uses an iteration-based LR scheduler, it parameterises the scheduler builder with the number of epochs and simply converts epochs to iterations, so I think the two should be equivalent.
For swin_t, I have shared the training logs (produced with TorchVision's reference script) with @jdsgomes. @jdsgomes yes, an iteration-based LR scheduler and an epoch-based LR scheduler should be equivalent. I only suspected the scheduler because a minor difference there can change the results.
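The epochs-to-iterations equivalence discussed above can be sketched independently of any framework. This is an illustrative cosine schedule, not torchvision's scheduler code; the names are hypothetical:

```python
import math

def lr_at(step, total_steps, base_lr=1.0):
    """Cosine decay evaluated at a given step out of total_steps."""
    return base_lr * 0.5 * (1 + math.cos(math.pi * step / total_steps))

epochs, iters_per_epoch = 300, 1000
total_iters = epochs * iters_per_epoch  # epochs converted to iterations

# Stepping once per iteration over total_iters traces the same curve as
# stepping once per epoch over `epochs`, just sampled more finely; sampling
# the iteration-based schedule at epoch boundaries recovers the same values.
per_epoch = [lr_at(e, epochs) for e in range(epochs)]
per_iter = [lr_at(i, total_iters) for i in range(0, total_iters, iters_per_epoch)]
assert all(abs(a - b) < 1e-9 for a, b in zip(per_epoch, per_iter))
```

This is why parameterising the builder in epochs and converting to iterations, as the official repo does, should not by itself change the result.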
After discussing offline with @datumbox, I think we should proceed to merge the PR with swin_t only, since it is clear that we can reproduce that result. Great work @xiaohu2015! After that, we can continue investigating to close the gap and aim to merge the other variants in a different PR. I will do the final cleanups between today and tomorrow.
Just wanted to echo what Joao said. A big thank you @xiaohu2015 for your awesome contribution: top-notch code and excellent research-reproduction skills. Also, apologies for taking so long to review and reproduce the PR; it's something we want to improve upon. Looking forward to seeing this merged!
@@ -416,7 +416,7 @@ class Swin_T_Weights(WeightsEnum):
     IMAGENET1K_V1 = Weights(
         url="https://download.pytorch.org/models/swin_t-81486767.pth",
         transforms=partial(
-            ImageClassification, crop_size=224, resize_size=256, interpolation=InterpolationMode.BICUBIC
+            ImageClassification, crop_size=224, resize_size=238, interpolation=InterpolationMode.BICUBIC
This value was determined via post-training optimisation, similarly to what we did for ConvNeXt.
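The kind of post-training optimisation mentioned above can be sketched as a sweep over inference resize values. Everything here is a hypothetical stand-in: `evaluate` represents a real validation loop over the trained checkpoint, and the candidate values are illustrative:

```python
def sweep_resize(evaluate, candidates=(224, 232, 238, 246, 256)):
    """Evaluate a trained checkpoint at several inference resize values
    and return the best one; `evaluate` stands in for a real val loop."""
    scores = {r: evaluate(r) for r in candidates}
    best = max(scores, key=scores.get)
    return best, scores

# Toy accuracy curve that happens to peak at 238, for illustration only.
best, scores = sweep_resize(lambda r: 81.0 - abs(r - 238) * 0.01)
```

The training recipe stays fixed; only the inference-time preprocessing parameter is tuned against the validation set.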
For the other models, I can convert the official weights to the torchvision format, just like EfficientNet.
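Such a conversion usually amounts to renaming state-dict keys from the official layout to torchvision's. The sketch below shows the idea only; the key mapping is illustrative and not the real Swin correspondence:

```python
def convert_state_dict(official, rename=(("patch_embed.proj", "features.0.0"),)):
    """Rename keys of an official checkpoint's state dict to a
    torchvision-style layout. `rename` maps old prefixes to new ones;
    the default mapping here is a made-up example."""
    converted = {}
    for key, value in official.items():
        for old, new in rename:
            if key.startswith(old):
                key = new + key[len(old):]
                break
        converted[key] = value
    return converted

# Tensors are left untouched; only the key names change.
sd = {"patch_embed.proj.weight": 1, "head.weight": 2}
out = convert_state_dict(sd)
```

The converted dict can then be loaded with `model.load_state_dict(out)` once the shapes and ordering match.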
@xiaohu2015 I understand it would be useful to include pre-trained weights from the initial implementation, but we would prefer to add the other variants once we can replicate the results fully. I am running a few experiments now and hopefully we can get good results, but for now I will remove even the constructors so this PR can be merged.
LGTM, thanks again @xiaohu2015 for the awesome contribution.
@jdsgomes thanks as well for your support and guidance.
I think we are good to merge. Just make sure we remove the unnecessary expect files ModelTester.test_swin_*_expect.pk for variants s/b/l that were removed.
Thanks @xiaohu2015 for the great contribution and @datumbox for the feedback
This work relates to #2707 and #5410: add Swin Transformer to the torchvision model zoo.
Refactor code.
I made some modifications compared to the official code:

1. Remove the absolute position embedding: as we can see from Table 4 in the paper, the Swin model with relative position bias gets the best results, so the default Swin model does not use an absolute position embedding. Another problem with the absolute position embedding is that we would have to pass the input size to the model to initialize the pos_embedding.
2. Remove the `input_resolution` parameter: this lets the Swin model handle input of arbitrary shape, which is necessary for tasks such as segmentation and object detection. Compared to the official code, we keep tensors with shape [B, H, W, C] instead of [B, N, C], so we can get the width and height without `input_resolution`. After doing that, however, one must dynamically compare the window size and the input size in the shifted window attention; for example, if the input size is smaller than the window size (when the image size is 224, the feature size of the last stage is 7x7), the shift operation is not needed. But this dynamic behaviour is not well supported in torch.fx, so I created a `shifted_window_attention` function and wrapped it. Note: this modification adds some run time because we have to generate the `attention_mask` dynamically, but the cost is insignificant.
3. Validate the training, which gives: Acc@1 81.222 Acc@5 95.332 (train logs).

I also modified the reference code (https://github.com/xiaohu2015/vision/blob/main/references/classification/utils.py#L406), as the current code only supports disabling weight decay for norm layers.
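The no-weight-decay change mentioned above can be sketched as a generic parameter-splitting helper. The function name, keywords, and parameter names below are illustrative, not the actual utils.py API:

```python
def split_params(named_params, norm_keywords=("norm",), extra_no_decay=()):
    """Split parameter names into a weight-decay group and a no-decay
    group, exempting norm layers and any explicitly listed names (e.g.
    a relative position bias table) from weight decay."""
    decay, no_decay = [], []
    for name in named_params:
        if any(k in name for k in norm_keywords) or name in extra_no_decay:
            no_decay.append(name)
        else:
            decay.append(name)
    return decay, no_decay

names = ["features.0.weight", "norm.weight", "attn.relative_position_bias_table"]
decay, no_decay = split_params(
    names, extra_no_decay=("attn.relative_position_bias_table",)
)
```

In a real setup the two groups would become optimizer param groups, with `weight_decay=0` on the no-decay group.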