Specify Google Cloud Compute Engine disk type #1444

Open
twbattaglia opened this issue Jan 7, 2020 · 2 comments

@twbattaglia twbattaglia commented Jan 7, 2020

New feature

Ability to specify the Compute Engine disk type (pd-standard or local-ssd), as exposed by the Disk message in the new Cloud Life Sciences API (https://cloud.google.com/life-sciences/docs/reference/rpc/google.cloud.lifesciences.v2beta#disk).

Usage scenario

Jobs that require high input/output operations per second (IOPS) and lower latency (https://cloud.google.com/compute/docs/disks/local-ssd).

Suggest implementation

The Java client documentation for the Disk model states that the disk type can be set using setType() (https://developers.google.com/resources/api-libraries/documentation/genomics/v1alpha2/java/latest/com/google/api/services/genomics/model/Disk.html#setType-java.lang.String-).

Set the disk type when the VM resources are created in GoogleLifeSciencesHelper.groovy:

protected Resources createResources(GoogleLifeSciencesSubmitRequest req) {
    def disk = new Disk()
    disk.setName(req.diskName)
    disk.setSizeGb(req.diskSizeGb)
    // New: pass the requested disk type through to the API
    disk.setType(req.diskType)

Here req.diskType is populated in GoogleLifeSciencesTaskHandler.groovy:

    req.bootDiskSizeGb = executor.config.bootDiskSize?.toGiga() as Integer
    // New: read the disk type from the task configuration
    req.diskType = task.config.getDiskType() as String
    return req

getDiskType() can be defined in TaskConfig.groovy, where it defaults to pd-standard:

    String getDiskType() {
        def value = get('diskType')?.toString()

        // Fall back to pd-standard when the value is unset or not a supported type
        return value in ['pd-standard', 'local-ssd'] ? value : 'pd-standard'
    }
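
The fallback behaviour described above can be tried in isolation. The following is only a minimal sketch: DiskTypeConfig is a hypothetical stand-in backed by a plain Map, not the real TaskConfig class, and the SUPPORTED and DEFAULT_TYPE names are assumptions made for illustration.

```groovy
// Hypothetical stand-in for the task configuration; the real TaskConfig is not reproduced here.
class DiskTypeConfig {
    // Disk types accepted by the proposal in this issue
    static final List<String> SUPPORTED = ['pd-standard', 'local-ssd']
    static final String DEFAULT_TYPE = 'pd-standard'

    private final Map settings

    DiskTypeConfig(Map settings) {
        this.settings = settings
    }

    // Returns the configured disk type, or the default when it is unset or unsupported
    String getDiskType() {
        def value = settings.get('diskType')?.toString()
        return value in SUPPORTED ? value : DEFAULT_TYPE
    }
}

// Usage: an unset value and an unsupported value both fall back to the default
assert new DiskTypeConfig([:]).diskType == 'pd-standard'
assert new DiskTypeConfig([diskType: 'local-ssd']).diskType == 'local-ssd'
assert new DiskTypeConfig([diskType: 'pd-ssd']).diskType == 'pd-standard'
```

One design point worth noting: silently falling back to pd-standard hides typos in the setting, so logging a warning or failing fast on an unsupported value might be preferable.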

Preliminary tests showed that this successfully creates a Compute Engine instance with an SSD attached.
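
From the user's side, the setting might look something like the configuration fragment below. This is a sketch only: diskType is the directive proposed in this issue and is not an existing Nextflow option, and ALIGN_READS is a hypothetical process name.

```groovy
// nextflow.config (sketch)
process {
    executor = 'google-lifesciences'

    // 'diskType' is the directive proposed in this issue, not an existing option;
    // 'ALIGN_READS' is a hypothetical I/O-heavy process.
    withName: 'ALIGN_READS' {
        diskType = 'local-ssd'
    }
}
```

Processes without the setting would keep using pd-standard, matching the default in the proposed getDiskType().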


@stale stale bot commented Apr 27, 2020

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the wontfix label Apr 27, 2020
@pditommaso pditommaso added stale and removed wontfix labels Apr 27, 2020

@nibscles nibscles commented Jun 10, 2020

I would definitely support this. Nextflow's core logic is somewhat challenged on the cloud: unless there is a shared disk that all task VMs can mount, each task copies files to and from the bucket instead of using symlinks as it would on-prem.
This behaviour hugely multiplies costs by increasing both I/O and runtime.
Being able to specify the disk type could change the IOPS available to the worker VMs and improve their performance. This feature would help optimize Nextflow pipelines on the cloud. Quite important :)
