Pipeline parallel training engine. #392

Merged
merged 45 commits into from Sep 10, 2020


@ShadenSmith (Contributor) commented Sep 10, 2020

No description provided.

ShadenSmith and others added 30 commits Sep 2, 2020
* cleaning pipe logging

* Fixes checkpointing with non-float activations.

* less verbose output

* improve pipeline installation

* Improves startup time and reduces logging.

* reduces logging

* reduces progress reporting

* removing test-pipe/

* DSE commit?

* trying out new pip dependency resolver

* specify torchvision version for compatibility

* pip upgrade-strategy

* quiet installation

* pre-install torch with pip

* wrong pip options

* more wrong pip options lol

* torch version macro

* fp16 paramdict build fail

* only fused lamb

* improving timers

Co-authored-by: Shaden Smith <[email protected]>

* Tied module indexing bugfix.

* Train and inference pipeline schedules.

* Move code quality tests to Azure-hosted agents. (microsoft#368)

jeffra approved these changes Sep 10, 2020

@ShadenSmith merged commit 65c2f97 into microsoft:master Sep 10, 2020

2 checks passed
3 participants