

Added initial support for Multi Task Jobs #853

Merged: 9 commits into master from mtj, Oct 13, 2021

Conversation

@nfx (Contributor) commented Oct 8, 2021

  • the `task` block of `databricks_job` is currently a slice, so adding and removing different tasks might cause confusing, but still correct, diffs

This resolves #747

Syntax example:

provider "databricks" {
  use_multitask_jobs = true
}

data "databricks_current_user" "me" {}
data "databricks_spark_version" "latest" {}
data "databricks_node_type" "smallest" {
  local_disk = true
}

resource "databricks_notebook" "this" {
  path     = "${data.databricks_current_user.me.home}/Terraform"
  language = "PYTHON"
  content_base64 = base64encode(<<-EOT
    # created from ${abspath(path.module)}
    display(spark.range(10))
    EOT
  )
}

resource "databricks_job" "this" {
  name = "MTJ Demo (${data.databricks_current_user.me.alphanumeric})"

  task {
    task_key = "a"

    new_cluster {
      num_workers   = 1
      spark_version = data.databricks_spark_version.latest.id
      node_type_id  = data.databricks_node_type.smallest.id
    }

    notebook_task {
      notebook_path = databricks_notebook.this.path
    }
  }

  task {
    task_key = "c"

    depends_on {
      task_key = "a"
    }

    new_cluster {
      num_workers   = 8
      spark_version = data.databricks_spark_version.latest.id
      node_type_id  = data.databricks_node_type.smallest.id
    }

    notebook_task {
      notebook_path = databricks_notebook.this.path
    }
  }

  task {
    task_key = "b"

    depends_on {
      task_key = "c"
    }

    new_cluster {
      num_workers   = 20
      spark_version = data.databricks_spark_version.latest.id
      node_type_id  = data.databricks_node_type.smallest.id
    }

    notebook_task {
      notebook_path = databricks_notebook.this.path
    }
  }
}

output "job_url" {
  value = "${databricks_job.this.url}/tasks"
}

@nfx nfx added this to the v0.3.9 milestone Oct 8, 2021
@nfx nfx requested review from alexott and pietern October 8, 2021 20:45
@nfx (Contributor, PR author) commented Oct 8, 2021

@pietern can you take a look at completeUrl change?

codecov bot commented Oct 8, 2021

Codecov Report

Merging #853 (08abb9b) into master (513b574) will increase coverage by 0.57%.
The diff coverage is 85.71%.

❗ Current head 08abb9b differs from pull request most recent head f77c7de. Consider uploading reports for the commit f77c7de to get more accurate results.

@@            Coverage Diff             @@
##           master     #853      +/-   ##
==========================================
+ Coverage   82.99%   83.56%   +0.57%     
==========================================
  Files          92       92              
  Lines        8215     8241      +26     
==========================================
+ Hits         6818     6887      +69     
+ Misses        896      855      -41     
+ Partials      501      499       -2     
Impacted Files Coverage Δ
common/version.go 100.00% <ø> (ø)
compute/resource_cluster.go 77.82% <ø> (-0.29%) ⬇️
compute/resource_instance_pool.go 94.11% <ø> (-0.17%) ⬇️
sqlanalytics/resource_sql_endpoint.go 87.50% <ø> (-0.16%) ⬇️
common/reflect_resource.go 82.13% <60.00%> (-0.75%) ⬇️
qa/testing.go 78.14% <60.00%> (+9.79%) ⬆️
compute/resource_job.go 91.66% <87.50%> (+3.33%) ⬆️
common/http.go 86.44% <100.00%> (+1.64%) ⬆️
common/resource.go 83.92% <100.00%> (+18.54%) ⬆️
compute/commands.go 93.13% <100.00%> (+0.06%) ⬆️
... and 5 more

@nfx nfx self-assigned this Oct 8, 2021
@alexott (Contributor) commented Oct 10, 2021

It would be nice to split this into two diffs: one for suppressing the diffs, and one for the actual implementation.

@nfx (Contributor, PR author) commented Oct 10, 2021

@alexott It's in separate commits.

* provider has to be initialized with `use_multitask_jobs = true`
* the `task` block of `databricks_job` is currently a slice, so adding and removing different tasks might cause confusing, but still correct, diffs
* we may explore `tf:slice_set` mechanics for `task` blocks later, though initial attempts turned out to be harder to test
* the `always_running` parameter still has to be tested for API 2.1 compatibility

This implements feature #747
@alexott (Contributor) commented Oct 11, 2021

Overall, the code looks good. But I would add extra validation, e.g. checking in the Create method that dependencies are correct; that would be better than waiting for the API to return an error.
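The suggested validation can be sketched roughly as follows. This is a minimal illustration, not the provider's actual code: `Task` is a trimmed-down stand-in for the job task struct, and `validateDependsOn` is a hypothetical helper name; in the real API, `depends_on` entries are blocks with a `task_key` field rather than plain strings.

```go
package main

import "fmt"

// Task is a hypothetical, simplified stand-in for the provider's
// job task struct; field names here are illustrative only.
type Task struct {
	TaskKey   string
	DependsOn []string
}

// validateDependsOn (a hypothetical helper) checks that every
// depends_on reference points at a declared task_key, so a typo
// fails locally instead of as an API error during apply.
func validateDependsOn(tasks []Task) error {
	keys := map[string]bool{}
	for _, t := range tasks {
		keys[t.TaskKey] = true
	}
	for _, t := range tasks {
		for _, dep := range t.DependsOn {
			if !keys[dep] {
				return fmt.Errorf("task %q depends on undeclared task_key %q", t.TaskKey, dep)
			}
		}
	}
	return nil
}

func main() {
	tasks := []Task{
		{TaskKey: "a"},
		{TaskKey: "c", DependsOn: []string{"a"}},
		{TaskKey: "b", DependsOn: []string{"blopidy-boo"}},
	}
	fmt.Println(validateDependsOn(tasks))
}
```

Such a check would run once over the configured tasks before the create/update API call is issued.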

@nfx (Contributor, PR author) commented Oct 12, 2021

Okay, I tried making `slice_set` work with diff-suppress functions, but it doesn't actually give a good TF state-change UX for MTJ tasks: if one field of a task changes, the entire task has to be replaced, because its hash code changes, and Terraform doesn't yet present diffs for set blocks as nicely as it does for list blocks.

This gist contains work-in-progress for a custom set hash function, though I've observed it creating a phantom task: https://gist.github.com/nfx/620721f593ca42447528598294e8ddc1. Perhaps we'll return to it later, but this feature is going to be released with a requirement to sort tasks by task key.
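For context, a custom set hash in this spirit could look like the sketch below; it is not the gist's actual code, just an illustration of the idea. The `func(interface{}) int` shape matches what a Terraform SDK `schema.Set` hash function expects, and hashing only `task_key` keeps a task's set identity stable when its other fields change.

```go
package main

import (
	"fmt"
	"hash/crc32"
)

// taskHash is an illustrative custom set-hash: it derives a task's
// identity from task_key alone, so editing other fields (for example
// num_workers) does not move the task to a new set slot.
func taskHash(v interface{}) int {
	m := v.(map[string]interface{})
	key, _ := m["task_key"].(string)
	return int(crc32.ChecksumIEEE([]byte(key)))
}

func main() {
	a := map[string]interface{}{"task_key": "a", "num_workers": 1}
	aEdited := map[string]interface{}{"task_key": "a", "num_workers": 8}
	b := map[string]interface{}{"task_key": "b", "num_workers": 1}

	// same task_key => same hash, even though num_workers differs
	fmt.Println(taskHash(a) == taskHash(aEdited))
	// different task_key => different hash
	fmt.Println(taskHash(a) == taskHash(b))
}
```

The downside discussed above still applies: two tasks with the same key collide, and Terraform's set-diff rendering remains coarser than the list rendering.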

@nfx nfx requested review from alexott and removed request for pietern October 12, 2021 18:52
@alexott (Contributor) left a comment

code looks good, small comments only...

Can we check what the API returns if depends_on points to an incorrect task_key? Is the message understandable? If it's vague, like "job spec error", then it would be useful to do validation during creation.

@nfx (Contributor, PR author) commented Oct 13, 2021

Can we check what the API returns if depends_on points to an incorrect task_key? Is the message understandable? If it's vague, like "job spec error", then it would be useful to do validation during creation.

@alexott it's actually pretty readable:

  # databricks_job.this will be updated in-place
  ~ resource "databricks_job" "this" {
        id                        = "2561"
        name                      = "MTJ Demo (beep)"
        # (8 unchanged attributes hidden)
      ~ task {
            # (5 unchanged attributes hidden)
          ~ depends_on {
              ~ task_key = "a" -> "blopidy-boo"
            }
            # (3 unchanged blocks hidden)
        }
        # (4 unchanged blocks hidden)
    }

databricks_job.this: Modifying... [id=2561]
╷
│ Error: Invalid dependency graph for task 'c_after', job specification does not contain task with key 'blopidy-boo'
│ 
│   with databricks_job.this,
│   on main.tf line 27, in resource "databricks_job" "this":
│   27: resource "databricks_job" "this" {

@nfx (Contributor, PR author) commented Oct 13, 2021

Adding a task-order-insensitive databricks_job_task resource is not possible until we can create a "placeholder" job, but here's the WIP of that resource: https://gist.github.com/nfx/473e07d6aff0ffb84b8fc8c2acf537e8

@nfx nfx merged commit b5075f2 into master Oct 13, 2021
@nfx nfx deleted the mtj branch October 13, 2021 18:44
Successfully merging this pull request may close these issues.

[FEATURE] Support for task orchestration