[FEATURE] Single Node cluster #411
Comments
We would also be using |
Hi, from my point of view no one has provided us with a solution. |
@sudeepgupta90 @kamakay I'm busy preparing the next release right now. If you want to speed things up, pull requests are more than welcome :) |
I see this was committed, but it's still not working. Any idea why? |
What kind of problem have you found? Do you hit the issue when executing the Terraform script? |
When I set the num_workers to 0 I get |
@lyogev @kamakay c28b209 was breaking the release, so I've reverted that commit. It will be fixed somewhere in 0.3.x. @lyogev @kamakay @sudeepgupta90 @kung-foo @tkasu what use cases do you foresee for this feature? Would it drive DBU usage up? |
Hi, thanks for the feedback. I think the first thing to understand is whether the underlying Databricks API supports num_workers = 0 for a single-node cluster. |
Plan: 2 to add, 0 to change, 0 to destroy.
Error: cannot create cluster: NumWorkers could be 0 only for SingleNode clusters. See https://docs.databricks.com/clusters/single-node.html for more details
with databricks_cluster.modeling_cluster
This seems to be coming up again; it was working last weekend and is now failing. For now I am only doing small-scale testing, so I will bump it to 1, but num_workers was 0 before and it worked great. I will come back and update if it works. |
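For reference, a minimal sketch of a single-node cluster declaration based on the pattern described in the Databricks single-node docs linked above (the cluster name and the latest_lts / smallest data sources are placeholders, and the exact keys should be double-checked against the provider docs for your version). The idea is that num_workers = 0 only passes API validation when the cluster is also marked as single-node via spark_conf and a ResourceClass tag:
resource "databricks_cluster" "single_node" {
  cluster_name            = "Single Node"
  spark_version           = data.databricks_spark_version.latest_lts.id
  node_type_id            = data.databricks_node_type.smallest.id
  autotermination_minutes = 20

  # Zero workers is only accepted when the single-node profile below is set
  num_workers = 0

  spark_conf = {
    "spark.databricks.cluster.profile" = "singleNode"
    "spark.master"                     = "local[*]"
  }

  custom_tags = {
    "ResourceClass" = "SingleNode"
  }
}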
What I am actually trying to do here is install libraries, but it isn't clear how to install libraries on a databricks_job cluster. I had tried library { and it said that wasn't allowed, which seems a little odd. The docs say: job_cluster - (Optional) A list of job databricks_cluster specifications that can be shared and reused by tasks of this job. Libraries cannot be declared in a shared job cluster. You must declare dependent libraries in task settings. (Multi-task syntax.) That is fine, but there isn't really a good way to see how this is done effectively, and my DB technical rep said that Docker image bases would lead to too much complexity. |
Libraries are installed on jobs differently than on clusters - look into the specific task definition. That's how Workflows are defined. |
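To make that concrete, a rough sketch of the task-level placement (the job name, task key, cluster settings, and notebook path are all made up for illustration; check against the provider's multi-task job documentation):
resource "databricks_job" "example_job" {
  name = "example_job"

  job_cluster {
    job_cluster_key = "shared"
    new_cluster {
      spark_version = data.databricks_spark_version.latest_lts.id
      node_type_id  = data.databricks_node_type.smallest.id
      num_workers   = 1
    }
  }

  task {
    task_key        = "main"
    job_cluster_key = "shared"

    # Libraries live on the task, not on the shared job_cluster
    library {
      cran {
        package = "patchwork"
      }
    }

    notebook_task {
      notebook_path = "/Repos/example/notebook"
    }
  }
}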
Here is a simple way to do it on the cluster. If you have an example that supports this, that would be amazing, as none of the documentation I have seen shows that process happening in the job_cluster new_cluster block. These would essentially be the same thing on the back end if I were to create a stand-alone interactive cluster, which makes working with Databricks jobs more complicated. Additionally, the docs say the following under the spark_submit_task configuration block:
This seems like a lot of overhead to get an install working on a cluster when it is supported. Yes, passing a Docker image to the cluster seems the easier way to do this overall and keeps things stable, but that takes a little more setup than I have time for currently; I will eventually move in that direction. Ideally I would have an image in my repo that gets used when the job cluster is set up, but I have not seen that done anywhere yet.
resource "databricks_cluster" "shared_autoscaling" {
cluster_name = "Shared Autoscaling"
spark_version = data.databricks_spark_version.latest_lts.id
node_type_id = data.databricks_node_type.smallest.id
autotermination_minutes = 20
autoscale {
min_workers = 1
max_workers = 2
}
dynamic "library" {
for_each = toset(var.listOfMavenPackages)
content {
maven {
coordinates = library.value
}
}
}
}
I know how to do this when it's not a job cluster. The docs don't show how to approach it for a job cluster. |
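For completeness, a sketch of how the variable feeding that dynamic block might be declared (the name matches the usage above; the default coordinate is only an illustration):
variable "listOfMavenPackages" {
  type        = list(string)
  description = "Maven coordinates to install on the cluster"
  default     = ["com.databricks:spark-xml_2.12:0.15.0"]
}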
For any of you wondering what to do here with jobs, as this isn't extremely obvious:
dynamic "task" {
for_each = var.task_lists
content {
task_key = task.value.task_resource_name
dynamic "pipeline_task" {
for_each = task.value.task_type == "data" ? [1]: []
content {
pipeline_id = task.value.task_resource_id
}
}
dynamic "notebook_task" {
for_each = task.value.task_type == "notebook" ? [1] : []
content {
notebook_path = task.value.task_resource_id
}
}
dynamic "depends_on" {
for_each = var.task_dependency[task.value.task_resource_name] == "" ? [] : [var.task_dependency[task.value.task_resource_name]]
content {
task_key = var.task_dependency[task.value.task_resource_name]
}
}
dynamic "library" {
for_each = toset(var.list_of_R_packages)
content {
cran {
package = library.value
}
}
}
job_cluster_key = task.value.task_type == "notebook" ? var.job_cluster_key : null
}
}
This is how I figured it out. The variable is just a list of R packages. Ideally, if I remember, I'll come back and show what we did for the Docker image, but we're moving forward with the project for now. This is what it looks like in the Terraform plan:
task {
+ job_cluster_key = "Cluster"
+ retry_on_timeout = (known after apply)
+ task_key = "EXECUTE"
+ depends_on {
+ task_key = "00_SayHello"
}
+ library {
+ cran {
+ package = "Robyn"
}
}
+ library {
+ cran {
+ package = "patchwork"
}
}
+ notebook_task {
+ notebook_path = "/Repos/GITHUB/TestProject/FakePath/Fake_File"
}
}
Happy coding everyone. It seems like single node is working again today as well. |
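In case it helps anyone reading along, the variable shapes implied by the dynamic blocks above would look roughly like this (inferred from how they are used; the real definitions in that project may differ):
variable "task_lists" {
  type = list(object({
    task_resource_name = string
    task_resource_id   = string
    task_type          = string # e.g. "data" for a pipeline task, "notebook" for a notebook task
  }))
}

variable "task_dependency" {
  # Maps a task name to the task it depends on; empty string means no dependency
  type    = map(string)
  default = {}
}

variable "list_of_R_packages" {
  type    = list(string)
  default = ["Robyn", "patchwork"]
}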
Dear all,
I would like to ask for help adding a single-node cluster on Azure Databricks using
Here is my code
but every time I run terraform apply I receive the error:
Error: Missing required field: size
I have tried different configurations but nothing seems to help.
Any help will be really appreciated.
Thx
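In case it is the same problem others hit above: the size field in that error appears to refer to the cluster size, i.e. num_workers or autoscale not being sent to the API. A sketch of the fields that would address it for a single-node cluster, following the same pattern shown earlier in the thread (other required fields such as name, spark_version and node_type_id are omitted here):
resource "databricks_cluster" "single_node" {
  # name, spark_version and node_type_id omitted for brevity

  # Set the size explicitly to zero workers...
  num_workers = 0

  # ...and mark the cluster as single-node so the API accepts it
  spark_conf = {
    "spark.databricks.cluster.profile" = "singleNode"
    "spark.master"                     = "local[*]"
  }

  custom_tags = {
    "ResourceClass" = "SingleNode"
  }
}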