4_cardano_node.org

This document captures standing up Cardano nodes using divnix and terraform

  • This document belongs with https://github.com/bernokl/nix-ops-node
  • We are going to explore standing up cardano nodes using our existing divnix and terraform pattern
  • The end goal is autonomous deployment of iohk/cardano-node flakes to give us relays and then block producers.

Stand up relay node

  • Here are the steps I aim to follow:
- Deploy nixos ec2 instance in aws using terraform
- Clone the cardano-node repository.
- In the cardano-node repository, create a new file called configuration.nix.
- In the configuration.nix file, add the following code:
       {
         imports = [
           "github:input-output-hk/cardano-node?ref=master"
         ];
       }
- Run the following command to build the cardano node:
       nix build github:input-output-hk/cardano-node?ref=master
- Once the cardano node is built, you can start it by running the following command:
       nix run github:input-output-hk/cardano-node?ref=master run
- Here are some additional details about the instructions above:
  - The imports section of the configuration.nix file specifies the Nix flakes that the cardano node depends on.
  - The nix build command builds the cardano node from the Nix flakes that are specified in the imports section.
  - The nix run command starts the cardano node.
  - The cardano-cli tool is used to interact with the cardano node.
  • In my repo I am going to copy terraform/cache-server to make my relay-node folder.
  • I am going to strip the terraform down to give me just an aws instance.
  • I enable envrc with:
direnv allow .
  • That loads my aws keys into env
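  • For reference, the .envrc presumably holds something like this (illustrative placeholders only, not the real values):
# hypothetical .envrc contents - the real file holds the actual credentials
export AWS_ACCESS_KEY_ID=<access-key-id>
export AWS_SECRET_ACCESS_KEY=<secret-access-key>
export AWS_DEFAULT_REGION=ap-southeast-2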
  • TODO: Ongoing reminder that we need to think about credentials.
  • Run init
aws_terraform_init
  • Apply:
aws_terraform_apply
  • Grab the ip from the aws console and ssh in
ssh -i id_rsa.pem root@xx.xx.xx.xx
  • OK, lets run:
nix build github:input-output-hk/cardano-node?ref=master
  • Build was less than 3 minutes, lets try the run
  • Lets pass in run
nix run github:input-output-hk/cardano-node?ref=master run
  • New error! OO this is why they want you to clone the repo first, lets go look
InvalidYaml (Just (YamlException "Yaml file not found: configuration/cardano/mainnet-config.json"))

cardano-node: YAML exception:
Yaml file not found: configuration/cardano/mainnet-config.json
  • Clone repo to our server
git clone https://github.com/input-output-hk/cardano-node.git
  • cd
cd cardano-node
  • Lets create the configuration.nix
{
  imports = [
    "github:input-output-hk/cardano-node?ref=master"
  ];
}

  • Run from inside repo
nix run github:input-output-hk/cardano-node?ref=master run
  • Boom, we have a running relay.
  • The above should be very easy to add to user_data in terraform.
  • Lets strip user_data out to file and:
    • Clone repo
    • Add our configuration.nix
    • Build
    • Run
  • Lets update the main.tf to have this:
user_data = "${file("start_node.sh")}"
  • And lets go create a start_node.sh
#!/usr/bin/env bash
set -xe
git clone https://github.com/input-output-hk/cardano-node.git &&
cd cardano-node
cat << 'EOF' > configuration.nix
{
  imports = [
    "github:input-output-hk/cardano-node?ref=master"
  ];
}
EOF
yes | nix build github:input-output-hk/cardano-node?ref=master && 
echo node_done_building > /tmp/outNix
yes | nix run github:input-output-hk/cardano-node?ref=master run
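  • A quick way to check how far user_data got on a fresh host (the /tmp/outNix marker is the one start_node.sh writes above):
ls -l /tmp/outNix 2>/dev/null || echo "build marker not written yet"
pgrep -af nix | head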
  • Now we destroy the host and start the apply again so that start_node.sh can run.
  • We can manually build and start a cardano relay by running: aws_terraform_apply
  • Right now the node is up, and I see a process: nix build github:input-output-hk/cardano-node?ref=master
  • If I strace there is activity, also we are steadily using more disk space.
  • I do not understand why my manual build was so much quicker.
  • It has been building for almost exactly 2 hours. I do see the load is 12 on 4 cores meaning the cpu is not nearly keeping up.
  • I think I might have scaled to 2xlarge or even 4xlarge for the build phase in diypool, might have to consider doing the same here.
  • Will leave it to run for now, I wish I had a sense of % done
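  • A rough progress sketch in the meantime: no real percentage is available, but watching nix store growth and load gives a feel for it, and adding -L (--print-build-logs) to nix build at least streams the per-derivation logs
watch -n 60 'df -h /nix/store; uptime'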
  • It took a couple of hours of google and play, but I came up with:
nix build --accept-flake-config github:input-output-hk/cardano-node?ref=master
  • I got it to work, but still have some unexpected behaviour.
  • Let me destroy and rebuild with everything vanilla, then try to run nix build --accept-flake-config on a clean os
  • YAS, running this manually builds a node I can run in 5 minutes
nix build --accept-flake-config github:input-output-hk/cardano-node?ref=master
  • OK going to destroy and build with that in my startup_node.sh
nix build --accept-flake-config github:input-output-hk/cardano-node?ref=master &&
echo we_got_clean_build > /tmp/outNix
nix run --accept-flake-config github:input-output-hk/cardano-node?ref=master run
  • This builds the node! and I think starts it, BUT it does not say running.
  • The mature solution would be to have the run executed by a daemon service like systemd.
  • At this point I fell into a multi hour investigation into adding a traditional /etc/systemd/system/*.service I could run.
  • Turns out nix wants you to define your service in configuration.nix
  • It seems like you add it to configuration.nix something like:
config.systemd.services.interosEsMdb = {
  description = "Interos MongoDB+ES log capture";
  after = ["network.target"];
  wantedBy = ["multi-user.target"];

  serviceConfig = {
    # change this to refer to your actual derivation
    ExecStart = "${interosEsMdb}/bin/syslog-exec.sh";
    EnvironmentFile = "${interosEsMdb}/lib/es-service.env";
    Restart = "always";
    RestartSec = 1;
  };
};
  • Lots of iteration later I ended up with this:
systemd.services.cardano-node-relay-daemon = {
  enable = true;
  description = "Cardano relay daemon";
  after = ["network.target"];
  wantedBy = ["multi-user.target"];

  serviceConfig = {
    ExecStart = "${pkgs.nix}/bin/nix run --accept-flake-config github:input-output-hk/cardano-node?ref=master run";
    Restart = "always";
    User = "root";
    WorkingDirectory="/cardano-node/";
    RestartSec = 1;
  };
};
  • I also needed to add a line to startup_node.sh to start the service
systemctl start cardano-node-relay-daemon.service
  • Nice cheat to find the aws ec2 external ip; replace running with a tag or other metadata you care about
aws ec2 describe-instances --filters 'Name=instance-state-name,Values=running' --query 'Reservations[*].Instances[*].[InstanceId,PublicIpAddress]' --output text
  • returns:
i-01a7e8d4e89049894     13.239.136.44
  • And on this host I see the daemon running:
> systemctl status cardano-node-relay-daemon.service 
● cardano-node-relay-daemon.service - Cardano relay daemon
     Loaded: loaded (/etc/systemd/system/cardano-node-relay-daemon.service; enabled; preset: enabled)
     Active: active (running) since Mon 2023-05-08 13:47:51 UTC; 4h 24min ago
   Main PID: 2101 (cardano-node)
         IP: 15.2G in, 157.6M out
         IO: 316.0K read, 18.3G written
      Tasks: 16 (limit: 9155)
     Memory: 6.1G
        CPU: 8h 50min 11.556s
     CGroup: /system.slice/cardano-node-relay-daemon.service
             └─2101 /nix/store/0ndig34c9qizj3g4z1s1scwk3pxcvfzn-cardano-node-exe-cardano-node-8.0.0/bin/cardano-node>

May 08 18:12:16 ip-172-31-19-21.ap-southeast-2.compute.internal nix[2101]: [ip-172-3:cardano.node.ChainDB:Notice:35]>
May 08 18:12:18 ip-172-31-19-21.ap-southeast-2.compute.internal nix[2101]: [ip-172-3:cardano.node.ChainDB:Notice:35]>
May 08 18:12:19 ip-172-31-19-21.ap-southeast-2.compute.internal nix[2101]: [ip-172-3:cardano.node.ChainDB:Notice:35]>
May 08 18:12:20 ip-172-31-19-21.ap-southeast-2.compute.internal nix[2101]: [ip-172-3:cardano.node.ChainDB:Notice:35]>
May 08 18:12:21 ip-172-31-19-21.ap-southeast-2.compute.internal nix[2101]: [ip-172-3:cardano.node.ChainDB:Notice:35]>
May 08 18:12:23 ip-172-31-19-21.ap-southeast-2.compute.internal nix[2101]: [ip-172-3:cardano.node.ChainDB:Notice:35]>
May 08 18:12:24 ip-172-31-19-21.ap-southeast-2.compute.internal nix[2101]: [ip-172-3:cardano.node.ChainDB:Notice:35]>
May 08 18:12:25 ip-172-31-19-21.ap-southeast-2.compute.internal nix[2101]: [ip-172-3:cardano.node.ChainDB:Notice:35]>
May 08 18:12:26 ip-172-31-19-21.ap-southeast-2.compute.internal nix[2101]: [ip-172-3:cardano.node.ChainDB:Notice:35]>
May 08 18:12:28 ip-172-31-19-21.ap-southeast-2.compute.internal nix[2101]: [ip-172-3:cardano.node.ChainDB:Notice:35]>
  • Robert suggested I focus on tailscale integration next as he already had the template on hokioi
  • Here are the updates I made to get the ec2 instance running tailscale and connecting to the yumi network
 { config, lib, pkgs, modulesPath, ... }:
+let
+   system.autoUpgrade.channel = "https://nixos.org/channels/nixos-unstable";
+   nixos-unstable = import <nixos-unstable> {};
 
-{
+in {
   imports = [ "${modulesPath}/virtualisation/amazon-image.nix" ];
 
   ec2.hvm = true;
@@ -26,10 +29,47 @@
     };
   };
 
+ services.tailscale.enable = true;
+
+ systemd.services.tailscale-autoconnect = {
+    description = "Automatic connection to Tailscale";
+
+    # make sure tailscale is running before trying to connect to tailscale
+    after = [ "network-pre.target" "tailscale.service" ];
+    wants = [ "network-pre.target" "tailscale.service" ];
+    wantedBy = [ "multi-user.target" ];
+
+    # set this service as a oneshot job
+    serviceConfig.Type = "oneshot";
+
+    # have the job run this shell script
+    script = with pkgs; ''
+      # wait for tailscaled to settle
+      sleep 2
+
+      # check if we are already authenticated to tailscale
+      status="$(${tailscale}/bin/tailscale status -json | ${jq}/bin/jq -r .BackendState)"
+      if [ $status = "Running" ]; then # if so, then do nothing
+        exit 0
+      fi
+
+      # otherwise authenticate with tailscale
+      ${tailscale}/bin/tailscale up --ssh -authkey tskey-auth-########
+    '';
+};
+
+  networking.firewall = {
+    checkReversePath = "loose";
+    enable = true;
+    trustedInterfaces = [ "tailscale0" ];
+    allowedUDPPorts = [ config.services.tailscale.port ];
+  };
+
+  networking.hostName = "aws-1";
+  networking.domain = "husky-ostrich.ts.net";

   environment.systemPackages = with pkgs; [
     git
     vim
     htop
+    tailscale
     lsof
   ];
 }
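  • Quick sanity check on the host once it is up (should show the tailnet peers and the 100.x address this machine registered):
tailscale status
tailscale ip -4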

  • Note we give the machine a networking.hostName; that registers the name we want for this machine in tailscale
  • Also VERY important: once it is connected to tailscale your ssh sessions over the 100. network will be authenticated through tailscale.
  • This is a very important benefit.
  • Also very important: the authkey used needs to be set to be ephemeral, pre-authenticate the hosts and assign the tags we want for the machines.
  • This makes management very simple, but needs to be carefully managed.
  • We will integrate SOPS/1Password/Key-store to hold keys we can then hydrate on the host with env vars in our session.
  • Trying to do some testing, lets start with what we can see on the node:
journalctl -u cardano-node-relay-daemon.service

May 10 18:08:04 aws-1 nix[2100]: Event: LedgerUpdate (HardForkUpdateInEra S (S (Z (WrapLedgerUpdate {unwrapLedgerUpdate = ShelleyUpdatedProtocolUpdates []}))))
May 10 18:08:04 aws-1 nix[2100]: [aws-1:cardano.node.ChainDB:Notice:35] [2023-05-10 18:08:04.96 UTC] Chain extended, new tip: 2198c40091993baed54b4638473327b0b77c5dccaa56768690f5b56>
May 10 18:08:06 aws-1 nix[2100]: [aws-1:cardano.node.ChainDB:Notice:35] [2023-05-10 18:08:06.21 UTC] Chain extended, new tip: 2e7ccf635d45201aaf52c5a2e7e10f7c5b90a2ca5ed10356210859e>
May 10 18:08:07 aws-1 nix[2100]: [aws-1:cardano.node.ChainDB:Notice:35] [2023-05-10 18:08:07.46 UTC] Chain extended, new tip: 18ba572b54363a6bcb43bccb283828ab559cd5ddf7d73c2b0c07c51>
May 10 18:08:08 aws-1 nix[2100]: [aws-1:cardano.node.ChainDB:Notice:35] [2023-05-10 18:08:08.71 UTC] Chain extended, new tip: 87cfa4a2f2258217adbde872e2ab53906f43d95e1f9fbbbc4dc362a>
May 10 18:08:09 aws-1 nix[2100]: [aws-1:cardano.node.ChainDB:Notice:35] [2023-05-10 18:08:09.53 UTC] Chain extended, new tip: 8add136563b0c36616f42ad52815e3c666be1b8b087e4268949395c>
May 10 18:08:09 aws-1 nix[2100]: Event: LedgerUpdate (HardForkUpdateInEra S (S (Z (WrapLedgerUpdate {unwrapLedgerUpdate = ShelleyUpdatedProtocolUpdates [ProtocolUpdate {protocolUpda>
May 10 18:08:09 aws-1 nix[2100]: [aws-1:cardano.node.ChainDB:Notice:35] [2023-05-10 18:08:09.54 UTC] Chain extended, new tip: e6878f21c35b5c9c233bf54207c28dbeb0743c6cf19d1468afc78c6>
May 10 18:08:10 aws-1 nix[2100]: [aws-1:cardano.node.ChainDB:Notice:35] [2023-05-10 18:08:10.79 UTC] Chain extended, new tip: 92fc5c7b7e6b8a84623787b2b3a52400d9388958258e7cdf57c9d6f>
May 10 18:08:12 aws-1 nix[2100]: [aws-1:cardano.node.ChainDB:Notice:35] [2023-05-10 18:08:12.04 UTC] Chain extended, new tip: f02680481aa08d0d53b4d1574d063fcba04d8b10b8e91ef07c52737>
May 10 18:08:13 aws-1 nix[2100]: [aws-1:cardano.node.ChainDB:Notice:35] [2023-05-10 18:08:13.29 UTC] Chain extended, new tip: 83fdc9ce9b078a25880964ab7c87bf5e66a73849fbe06918580f738>
  • That seems like positive confirmation that we are participating in the network traffic.
  • Trying to query the node using cardano-cli turns out to be a pain.
  • First I decided to install it like this:
git clone https://github.com/input-output-hk/cardano-node.git
cd cardano-node/cardano-cli 
cabal update 
cabal build
cabal install
/root/.cabal/bin/cardano-cli --version
#returns: cardano-cli 8.1.0 - linux-x86_64 - ghc-8.10
/root/.cabal/bin/cardano-cli query tip --mainnet
# returns Missing --socket-path SOCKET_PATH
  • I have been trying to find the socket path for this node for too long.
  • Next option is to specify --socket-path when I start it up, or keep trying to resolve this absurd roadblock.
  • I do feel a bit abstracted from what I have deployed; I am sure there has to be a default socket path, I even tried mlocate.
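  • A couple of ways to hunt for a unix socket on the box (assuming the node opened one at all):
# unix-domain listeners and the process that owns them
ss -xlp | grep -i cardano
# same idea via lsof: unix sockets (-U) AND (-a) command name cardano-node (-c)
lsof -U -a -c cardano-node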
  • I noticed two ports bound to 127.0.0.1, running curl 127.0.0.1:12788 I can see there is a web page
  • Lets forward it with socat so I can hit that address over tailscale.
nix-env -i socat
  socat TCP-LISTEN:5000,reuseaddr,fork TCP:127.0.0.1:12788
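  • An ssh port-forward over tailscale would do the same job without socat, e.g.:
ssh -L 5000:127.0.0.1:12788 root@<tailscale-ip>
# then browse http://localhost:5000 from the laptop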
  • Now I go to my laptop that is logged into tailscale and visit http://100.xx.xx.72:5000
  • YAS I have a very nice dashboard with residency (memory?), allocation rate and productivity. I am not 100% sure what these relate to but they seem healthy.
  • Mmmm I feel like I might want to leave the cardano-cli to someone who understands it better, like Jack or Robert.

#current

  • We can now deploy a cardano-relay in AWS using terraform init, terraform user_data (start_node.sh) and configuration.nix
  • The node gets started as a service we define in configuration.nix: “systemctl status cardano-node-relay-daemon.service”
  • The new machine registers itself in tailscale; you can use tailscale to authenticate ssh over the 100. network, and you can find the machine by ip or networking.hostName
  • I can see healthy logs of the work the node is doing and a dashboard with healthy metrics; I still need to query and interact with the server via the cli
  • Next step is cli testing of the node. I think I will ask Jack for help with this one

Update terraform structure of nix-ops-node to implement terragrunt.

  • First lets add some folders: accounts/sandbox/ap-southeast2
  • Now we move node-relay into that region ie accounts/sandbox/ap-southeast2/node-relay-1
  • The 5 files in that directory (main.tf, terragrunt.hcl, configuration.nix, start_node.sh, variables.tf) will allow you to spin up a machine
  • Here is our layout:
terraform
├── modules
│   └── node-relay (goal would be to set common features here)
│       ├── main.tf
│       └── variables.tf
└── accounts
    └── sandbox
        ├── terragrunt.hcl (sets up our .tfstate in s3; long term this would keep common configs like machine type or other sandbox components like security_groups?)
        └── ap-southeast-2
            ├── terragrunt.hcl (I did not keep this, but do we have common components that would live here?)
            ├── node-relay-1 (this works as-is in the repo)
            │   ├── main.tf (self contained and works without the module)
            │   ├── configuration.nix (initial machine state, including tailscale and setting up the iohk/cardano-node service)
            │   ├── start_node.sh (clones the cardano-node repo, builds the flake and then starts the service we set up in configuration.nix)
            │   ├── terragrunt.hcl (passes in variables; for this POC it just passes in machine type, but can be expanded)
            │   └── variables.tf (defines the variables used by the module)
            └── node-relay-2 (experiment to be more DRY; less duplication of code in main.tf by re-using what we have in modules)
                ├── main.tf (sources modules/node-relay and passes in the variables it needs)
                ├── configuration.nix (same as above)
                ├── start_node.sh (same as above)
                ├── terragrunt.hcl (sources modules/node-relay; note the source in main.tf should not be needed, this is still in testing; provides values to variables)
                └── variables.tf (defines the variables used by the module)

  • Before you can deploy anything in the repo you will need to replace the tailscale key in configuration.nix and the whitelisted ip address with one that will ssh in.
  • TODO: Decide if we want to allow ssh outside tailscale, perhaps not? The applied machine is accessible from tailscale; we can always manually add whitelisting if we can not get to it from tailscale
  • Apply your changes with:
terragrunt init &&
terragrunt apply
  • node-relay-1 works as expected; we still manually add the TS key, whitelist an ip and end up with the SSH key locally.
  • TODO: incorporate sops to handle keys and secrets
  • node-relay-2 is WORKING! (can you tell it was a pain?)
  • node-relay-2 is more DRY, main.tf is only call out to the module.
  • Next I am going to create copies of relay-1 in 2 other regions, provide each with a unique name and key, and see how we can interact between them.
  • Our 3 regions for this proof of concept (we can always add new regions later)
ap-southeast-2 (Asia Pacific Sydney)
eu-north-1 (Europe Stockholm) 
ap-south-1 (Asia Pacific Mumbai)
  • Here are the steps to create a new region:
  • Copy a known good directory - I am copying sandbox/ap-southeast-2, naming the copy ap-south-1
  • I am going to re-use only the structure of node-relay-1 because I only need one node in each region, and architecturally I am not sold on making nodes into a module; I want this to be easily replicated to other infrastructures. I am wondering if the directory structure is too complex. I am keeping it for ease of remote state management, but I am open to improvement suggestions.
  • main.tf:
- You will need to update the provider.region to your new region
- Make sure the nix_image.source has reference to a commit that has an image specified for the region you are setting up
- Make sure any ips you need whitelisted are in the aws_security_group.ssh_and_egress port 22 ingress cidr_blocks
- TODO: if it is a relay-node we should be able to reach it over tailscale, so perhaps we want to get rid of the port 22 whitelisting in aws_security_group
- For aws_instance.machine.subnet_id you will currently need to go look this up in aws-console-vpc-subnets
- Make sure you are using the right tskey-auth-xxx 
- Update the networking.hostName 
  • We will add a .envrc to this folder to update our AWS_REGION; for ap-south-1 I did:
# I source the original envrc that gets unloaded when I cd in here
source ../../../../.envrc
# I add the region we will be using
export AWS_REGION=ap-south-1
  • This works to deploy a new node to ap-south-1, lets do the same for eu-north-1
  • Made copy of ap-south1 to create eu-north1
  • Made the same updates to main.tf, configuration.nix and .envrc
  • I set my region in aws by running “direnv allow .” from the eu-north1/
  • I confirm I am pointing to the right region with
aws configure get region
  • Next I run terragrunt init/apply
  • And I see my new aws-eu-n-1-1 node in tailscale, and YUP it has a running node.
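  • From the laptop the quick check that all the regional relays registered is:
tailscale status | grep aws-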

Block producer

  • Tomorrow morning I will set up the folder structure using the relay, then update it to create a producer
  • Here are the steps bard.google.com comes up with:
  • Notice it is lying: there is no install.sh for the cardano node, but I do see cardano-node.cabal, so I will stick with the flake build and run.
The recommended steps for deploying a Cardano block producer according to the Cardano node repository on GitHub are as follows:

- Install the latest version of Cardano node.
- Create a new directory for your block producer node.
- Copy the config.yaml file from the Cardano node repository to your new directory.
- Edit the config.yaml file to specify your pool's details, such as its name, ID, and keys.
- Start the relay node.
- Start the block producer node.
- The following are some additional details about each step:

- To install the latest version of Cardano node, you can use the following command:
       curl -sSLO https://github.com/input-output-hk/cardano-node/master/install.sh
       bash install.sh
- To create a new directory for your block producer node, you can use the following command:
       mkdir my-pool
- To copy the config.yaml file from the Cardano node repository to your new directory, you can use the following command:
       cp https://github.com/input-output-hk/cardano-node/master/config.yaml my-pool
- To edit the config.yaml file, you can use a text editor, such as Nano or Vim. In the config.yaml file, you will need to specify your pool's details, such as its name, ID, and keys. The following is an example of a config.yaml file for a pool named "My Pool":
       relay-node:
         host: "localhost"
         port: 3001

       block-producer-node:
         host: "localhost"
         port: 3002

       pool-key: "/path/to/pool/key.skey"

       vrf-key: "/path/to/vrf/key.skey"
- To start the relay node, you can use the following command:
       cardano-node \
         --config config.yaml \
         --relay-node \
         --host localhost \
         --port 3001 \
         &> relay.log &
- To start the block producer node, you can use the following command:
      cardano-node \
        --config config.yaml \
        --block-producer-node \
        --host localhost \
        --port 3002 \
        --pool-key /path/to/pool/key.skey \
        --vrf-key /path/to/vrf/key.skey \
        &> block-producer.log &
- Once you have completed these steps, your Cardano block producer node will be deployed and running. You can check the status of your block producer node using the following command:
       cardano-cli query block-producer-status
- This will output the following information:
Name: My Pool
Pool ID: xyz1234567890abcdefghijklmnopqrstuv
Status: Running
Active Stake: 1000000000 lovelace
Last Epoch: 1234567890
Last Slot: 1234567890
  • I think I can update the files for the block producer in the cloned repo.
  • Ok, lets set up a new directory in ap-southeast2 for our block producer, making a copy of node-relay-aws-1
  • Lets update configuration.nix
# I update the following service name and description.
  systemd.services.cardano-node-block-producer-daemon = {
    enable = true;
    description = "Cardano block producer daemon";
# Need to figure out if I need to pass flags into the nix run on the ExecStart line
    serviceConfig = {
      ExecStart = "${pkgs.nix}/bin/nix run --accept-flake-config github:input-output-hk/cardano-node?ref=master run";
# New tailscale auth key
# Note I destroyed all existing keys and set up a new one for this. We have been very lax with keys; will harden when we move out of sandbox
      ${tailscale}/bin/tailscale up --ssh -authkey tskey-auth-xxxxx
# Update hostName
# TODO: perhaps go back to the relays and add "-r-" to indicate relay; adding "-bp-" to this one for now.
  networking.hostName = "aws-ap-se-bp-2-1";
  • For start_node.sh I am going to comment out nix build and systemctl start so I can manually play with them.
  • And updates to main.tf:
# I am going to completely remove the port 22 whitelisting from security_groups:
-     ingress {
-         from_port   = 22
-         to_port     = 22
-         protocol    = "tcp"
-         cidr_blocks = [ "xx.xx.xx.xx/32" ]
-     }
# TODO: Do we want to allow tls_ssh keys on block producers? Only leave ssm access? Leaving it alone for now
# Do we want to give block producers their own subnet? Interconnectivity is through tailscale so I can't see a downside to separation. Leaving it for current testing
# TODO: Decide on relay vs bp subnets
  • Leaving terragrunt.hcl alone, it only contains instance_type
  • Lets init our new directory:
terragrunt init
  • Lets double check we are in the right region
aws configure get region
  • Yup it returns:
ap-southeast-2
  • And lets deploy:
terragrunt apply
  • We suddenly hit an order-of-operations issue:
   on .terraform/modules/deploy_nixos/deploy_nixos/main.tf line 129, in locals:
│  129:   ssh_private_key      = local.ssh_private_key_file == "-" ? var.ssh_private_key : file(local.ssh_private_key_file)
  • To get around this I am going to touch and chmod the file so it exists.
  • TODO: figure out why this is required for this apply
touch ./id_rsa.pem
chmod 600 ./id_rsa.pem
  • Lets try again:
terragrunt apply
  • Oops, without the port 22 whitelisting deploy_nixos cannot get onto the machine to deploy
module.deploy_nixos.null_resource.deploy_nixos: Still creating... [1m30s elapsed]
╷
│ Error: file provisioner error
│
│   with module.deploy_nixos.null_resource.deploy_nixos,
│   on .terraform/modules/deploy_nixos/deploy_nixos/main.tf line 165, in resource "null_resource" "deploy_nixos":
│  165:   provisioner "file" {
  • For now I am going to add whitelisting for my ip back in
  • Ok it is up; unexpectedly it still started the service. I do have enable set to true, but I thought that setting did not allow autostart
  • TODO: Trace service startup so you can see what it is doing.
  • Ok, I stopped the service; lets go look at our LLM steps and see if we can tease out the configuration files we need and the process for setting them up.
  • I know Jack and I did this a while back, I can go look at that as backup, but I am trying to stay in the current repo if I can help it
  • mmm the llm is a bit of a liar, might be more trouble than it is worth
  • Looking at developers.cardano.org I see there are several files involved in a BP:
- Main Config: It contains general node settings such as logging and versioning. It also points to the Byron Genesis and the Shelley Genesis file.
- Byron Genesis: It contains the initial protocol parameters and instructs the cardano-node on how to bootstrap the Byron Era of the Cardano blockchain.
- Shelley Genesis: It contains the initial protocol parameters and instructs the cardano-node on how to bootstrap the Shelley Era of the Cardano blockchain.
- optional    Alonzo Genesis: It contains the initial protocol parameters and instructs the cardano-node on how to bootstrap the Alonzo Era of the Cardano blockchain.
- optional   Conway Genesis: It contains the initial protocol parameters and instructs the cardano-node on how to bootstrap the Conway Era of the Cardano blockchain.
- Topology: It contains the list of network peers (IP Address and Port of other nodes running the blockchain network) that your node will connect to.
  • In the repo I see:
ls -al configuration/cardano
  • Returns:
total 1184
drwxr-xr-x 3 root root    4096 May 18 15:36 .
drwxr-xr-x 6 root root    4096 May 18 14:29 ..
drwxr-xr-x 2 root root    4096 May 18 14:29 alonzo
-rw-r--r-- 1 root root    9459 May 18 14:29 mainnet-alonzo-genesis.json
-rw-r--r-- 1 root root 1056360 May 18 14:29 mainnet-byron-genesis.json
-rw-r--r-- 1 root root    2885 May 18 14:29 mainnet-config.json
-rw-r--r-- 1 root root    1657 May 18 14:29 mainnet-config-new-tracing.yaml
-rw-r--r-- 1 root root    8263 May 18 14:29 mainnet-config.yaml
-rw-r--r-- 1 root root      22 May 18 14:29 mainnet-conway-genesis.json
-rw-r--r-- 1 root root     284 May 18 14:29 mainnet-p2p-toplogy.json
-rw-r--r-- 1 root root    2486 May 18 14:29 mainnet-shelley-genesis.json
-rw-r--r-- 1 root root     128 May 18 14:29 mainnet-topology.json
  • mainnet-topology.json contains:
{
  "Producers": [
    {
      "addr": "x.x.x.x",
      "port": 3001,
      "valency": 1
    }
  ]
}
  • *.configuration.json
# This was from the tutorial; note this requires magic because it refers to testnet
{
  "Protocol": "Cardano",
  "GenesisFile": "testnet-shelley-genesis.json",
  "RequiresNetworkMagic": "RequiresMagic",
# This is the same section in the 2023/5 mainnet-config.json
  "Protocol": "Cardano",
  "RequiresNetworkMagic": "RequiresNoMagic",
  "ShelleyGenesisFile": "mainnet-shelley-genesis.json"
  • It also updates protocol parameters

This protocol version number gets used by block producing nodes as part of the system for agreeing on and synchronising protocol updates. You just need to be aware of the latest version supported by the network. You don't need to change anything here.

  • It configures tracing

Tracers tell your node what information you are interested in when logging, such as switches that you can turn ON or OFF according to the type and quantity of information that you are interested in. This provides fairly coarse grained control, but it is relatively efficient at filtering out unwanted trace output.

  • It allows fine-grained logging control; I see the current file uses this setting for EKG metrics.

It is also possible to have more fine-grained control over the filtering of trace output, and to match and route trace output to particular backends. This is less efficient than the coarse trace filters above but provides much more precise control. Options: mapBackends routes metrics matching specific names to particular backends. This overrides the defaultBackends listed above. Note that it is an override and not an extension, so anything matched here will not go to the default backend, only to the explicitly listed backends. mapSubtrace is more expressive; its documentation is still being worked on.

  • On my block producers I will need to update the topology to:
nano testnet-topology.json
  {
    "Producers": [
      {
        "addr": "<YOUR RELAY NODE TAILSCALE IP ADDRESS>",
        "port": <PORT>,
        "valency": 1
      }
    ]
  }
  • On my relay nodes I will need to make the following updates:
{
  "Producers": [
    {
      "addr": "<YOUR BLOCK-PRODUCING NODE IP ADDRESS>",
      "port": <PORT>,
      "valency": 1
    },
    {
      "addr": "<OTHER RELAY NODE IP ADDRESS>",
      "port": <PORT>,
      "valency": 1
    },
    {
      "addr": "<OTHER RELAY NODE IP ADDRESS>",
      "port": <PORT>,
      "valency": 1
    }
  ]
}

  • I know how to pass in the flags I need!
nix run --accept-flake-config github:input-output-hk/cardano-node?ref=master run -- --help
  • Returns:
Usage: cardano-node run [--topology FILEPATH]
                          [--database-path FILEPATH]
                          [--socket-path FILEPATH]
                          [ --tracer-socket-path-accept FILEPATH
                          | --tracer-socket-path-connect FILEPATH
                          ]
                          [--byron-delegation-certificate FILEPATH]
                          [--byron-signing-key FILEPATH]
                          [--shelley-kes-key FILEPATH]
                          [--shelley-vrf-key FILEPATH]
                          [--shelley-operational-certificate FILEPATH]
                          [--bulk-credentials-file FILEPATH]
                          [--host-addr IPV4]
                          [--host-ipv6-addr IPV6]
                          [--port PORT]
                          [--config NODE-CONFIGURATION]
                          [--snapshot-interval SNAPSHOTINTERVAL]
                          [--validate-db]
                          [ --mempool-capacity-override BYTES
                          | --no-mempool-capacity-override
                          ]
....
with lots more available options...

  • The following will run our node with /tmp/cardano-node.socket:
nix run --accept-flake-config github:input-output-hk/cardano-node?ref=master run -- --socket-path /tmp/cardano-node.socket
  • Confirmation:
lsof /tmp/cardano-node.socket
COMMAND    PID USER   FD   TYPE             DEVICE SIZE/OFF  NODE NAME
cardano-n 7031 root   29u  unix 0xffff8f40099bee80      0t0 38007 /tmp/cardano-node.socket type=STREAM (LISTEN)
  • Can I get a woot woot? Now on a relay we simply update topology.json, start the node with our flags, get KES keys, update the producer, and we should be very close.
  • Lets see if we can use cardano-cli from the flake against our running node:
nix run .#cardano-cli -- version
  • Lets query our node
  • First we set our node.socket env var with:
export CARDANO_NODE_SOCKET_PATH=/tmp/cardano-node.socket
  • Next lets see if we can find our current tip:
nix run .#cardano-cli -- query tip --mainnet
  • YAS!
{
    "block": 4267441,
    "epoch": 197,
    "era": "Byron",
    "hash": "568fb79a14b8e10b9811a7c8252a94c8ab7afa7a4f71c343adc6748ccb20b4b1",
    "slot": 4269593,
    "slotInEpoch": 14393,
    "slotsToEpochEnd": 7207,
    "syncProgress": "47.88"
}
  • Machine torn down, updates made to the flags of our service:
"${pkgs.nix}/bin/nix run --accept-flake-config github:input-output-hk/cardano-node?ref=master run -- --topology /cardano-node/configuration/cardano/testnet-topology.json --socket-path /tmp/cardano-node.socket --config /cardano-node/configuration/cardano/testnet-config.json"
  • Lets re-provision and confirm our node starts up
  • Grab the ip:
aws ec2 describe-instances --filters 'Name=instance-state-name,Values=running' --query 'Reservations[*].Instances[*].[InstanceId,PublicIpAddress]' --output text
  • Returns:
i-07518b843710xxxxxxxxxxxxxxx     13.xx.xx.77
  • Lets ssh:
ssh -i id_rsa.pem root@13.xx.xx.77
  • Next step is to update the topology for both relay and producer, generate some keys and implement the rest of the “outstanding steps from buildCardanoStakePoolUbuntu.org”
  • I spent an hour troubleshooting why terragrunt now fails if we do not pre-create the id_rsa.pem. I understand why it fails, but not how to fix it.
  • Adding the manual step for now.
touch id_rsa.pem
chmod 600 id_rsa.pem
  • OK, I pull in the current configs and update the config file names to match our flake; that allows the testnet relay to come up automatically
  • Section I added to start_node.sh:
/run/current-system/sw/bin/curl -o /cardano-node/configuration/cardano/testnet-topology.json https://book.world.dev.cardano.org/environments/preprod/topology.json &
/run/current-system/sw/bin/curl -o /cardano-node/configuration/cardano/testnet-byron-genesis.json https://book.world.dev.cardano.org/environments/preprod/byron-genesis.json &
/run/current-system/sw/bin/curl -o /cardano-node/configuration/cardano/testnet-shelley-genesis.json https://book.world.dev.cardano.org/environments/preprod/shelley-genesis.json &
/run/current-system/sw/bin/curl -o /cardano-node/configuration/cardano/testnet-alonzo-genesis.json https://book.world.dev.cardano.org/environments/preprod/alonzo-genesis.json &
/run/current-system/sw/bin/curl -o /cardano-node/configuration/cardano/testnet-conway-genesis.json https://book.world.dev.cardano.org/environments/preprod/conway-genesis.json  &
/run/current-system/sw/bin/curl -o /cardano-node/configuration/cardano/testnet-config.json https://book.world.dev.cardano.org/environments/preprod/config.json &
# Wait for the background downloads above to finish before patching the config
wait
# Fix paths set in official config.json
sed -i 's/conway-genesis.json/testnet-conway-genesis.json/g' /cardano-node/configuration/cardano/testnet-config.json
sed -i 's/alonzo-genesis.json/testnet-alonzo-genesis.json/g' /cardano-node/configuration/cardano/testnet-config.json
sed -i 's/byron-genesis.json/testnet-byron-genesis.json/g' /cardano-node/configuration/cardano/testnet-config.json
sed -i 's/shelley-genesis.json/testnet-shelley-genesis.json/g' /cardano-node/configuration/cardano/testnet-config.json
  • Confirmed it is working in ap-southeast-2, lets do the same in eu-north-1
  • The node is up and the topology file is confirmed pointing to preprod, lets connect
export CARDANO_NODE_SOCKET_PATH=/tmp/cardano-node.socket
nix run .#cardano-cli -- query tip --testnet-magic 1
  • Returns:
{
    "block": 223797,
    "epoch": 30,
    "era": "Babbage",
    "hash": "7510ea7eba8e63764dd5c14689a47fa15cea0875adf0fd311ed394c5f7746e91",
    "slot": 11618608,
    "slotInEpoch": 300208,
    "slotsToEpochEnd": 131792,
    "syncProgress": "43.01"
}

[root@aws-eu-n-1-nr-1:/cardano-node]
ssh -i id_rsa.pem root@xx.xx.xx.xx
  • Lets create a directory to store everything in:
mkdir -p $HOME/cardano-testnet/keys
cd $HOME/cardano-testnet/keys
  • Lets generate the payment pair:
cd /cardano-node/
  nix run .#cardano-cli -- address key-gen \
      --verification-key-file $HOME/cardano-testnet/keys/payment.vkey \
      --signing-key-file $HOME/cardano-testnet/keys/payment.skey
  • Create new stakepool address pair:
nix run .#cardano-cli -- stake-address key-gen \
 --verification-key-file $HOME/cardano-testnet/keys/stake.vkey \
 --signing-key-file $HOME/cardano-testnet/keys/stake.skey
  • Generate a wallet address for the payment key payment.vkey which will delegate to the stake address stake.vkey:
nix run .#cardano-cli --  address build \
  --payment-verification-key-file $HOME/cardano-testnet/keys/payment.vkey \
  --out-file $HOME/cardano-testnet/keys/payment.addr \
  --testnet-magic 1
  • With the output of the payment.addr I can see the address on chain and that it has no money in it.
nix run .#cardano-cli -- query utxo --address $(cat payment.addr)  --testnet-magic 1
warning: Git tree '/cardano-node' is dirty
                           TxHash                                 TxIx        Amount
--------------------------------------------------------------------------------------
  • Next, go register on the faucet so we can get some ada to test with: https://docs.cardano.org/cardano-testnet/tools/faucet/
  • I did initially request the funds for preview instead of preprod; it did not give an error, but I did not get funds
  • After requesting the funds on the right network, lets see:
nix run .#cardano-cli --  query utxo --address addr_test1vqvpj9hm86lnp7n3p5qjkm7df38av6k302rcx788f287uaqdrernh --testnet-magic 1
  • YAS, we have funds:
                           TxHash                                 TxIx        Amount
--------------------------------------------------------------------------------------
86f4a5ffc63d317a7eaf28ca86819511ea3eeb90d7696faae6efd177c6cfe687     0        10000000000 lovelace + TxOutDatumNone
  • Create some cold keys and counter
nix run .#cardano-cli -- node key-gen \
    --cold-verification-key-file $HOME/cardano-testnet/keys/cold.vkey \
    --cold-signing-key-file $HOME/cardano-testnet/keys/cold.skey \
    --operational-certificate-issue-counter $HOME/cardano-testnet/keys/cold.counter
  • Generate KES keys
cd /cardano-node/
nix run .#cardano-cli -- node key-gen-KES \
    --verification-key-file $HOME/cardano-testnet/keys/kes.vkey \
    --signing-key-file $HOME/cardano-testnet/keys/kes.skey
  • Make a VRF key pair.
nix run .#cardano-cli -- node key-gen-VRF \
    --verification-key-file $HOME/cardano-testnet/keys/vrf.vkey \
    --signing-key-file $HOME/cardano-testnet/keys/vrf.skey
  • Update permissions on skey to set it to readonly
chmod 400 $HOME/cardano-testnet/keys/vrf.skey

#Current:

slotsPerKESPeriod=$(cat /cardano-node/configuration/cardano/testnet-shelley-genesis.json | jq -r '.slotsPerKESPeriod')
echo slotsPerKESPeriod: ${slotsPerKESPeriod}

slotNo=$(nix run .#cardano-cli -- query tip --testnet-magic 1 | jq -r '.slot')
echo slotNo: ${slotNo}
  • Boo:
jq: command not found
  • For now we will install it in our session; in future we will add the package to our configuration.nix
nix-env -i jq
  • Lets try getting the slotNo again
  • Yas, this time we get:
 echo slotNo: ${slotNo}
slotNo: 29254618
  • Find kesPeriod by dividing the slot tip number by slotsPerKESPeriod.
kesPeriod=$((${slotNo} / ${slotsPerKESPeriod}))
echo kesPeriod: ${kesPeriod}
startKesPeriod=${kesPeriod}
echo startKesPeriod: ${startKesPeriod}
  • Returns:
echo kesPeriod: ${kesPeriod}
kesPeriod: 225

echo startKesPeriod: ${startKesPeriod}
startKesPeriod: 225
  • With this calculation we can generate an operational certificate for the pool. Update {startKesPeriod} in the script below with the value calculated above.
cd /cardano-node/
nix run .#cardano-cli -- node issue-op-cert \
      --kes-verification-key-file $HOME/cardano-testnet/keys/kes.vkey \
      --cold-signing-key-file $HOME/cardano-testnet/keys/cold.skey \
      --operational-certificate-issue-counter $HOME/cardano-testnet/keys/cold.counter \
      --kes-period ${startKesPeriod} \
      --out-file $HOME/cardano-testnet/keys/node.cert
  • Lets set the env vars, see if we can run our node with keys:
KES=$HOME/cardano-testnet/keys/kes.skey
VRF=$HOME/cardano-testnet/keys/vrf.skey
CERT=$HOME/cardano-testnet/keys/node.cert
  • First lets stop the service that is currently running our node
systemctl stop cardano-node-block-producer-daemon.service 
  • Let us manually try to run our node with the new values:
nix run --accept-flake-config github:input-output-hk/cardano-node?ref=master run -- --topology /cardano-node/configuration/cardano/testnet-topology.json --socket-path /tmp/cardano-node.socket --port 6001 --config /cardano-node/configuration/cardano/testnet-config.json --shelley-kes-key ${KES} --shelley-vrf-key ${VRF} --shelley-operational-certificate ${CERT}
  • It is coming up and looking healthy; I can port-forward and see ekg stats.
  • Struggling to confirm it is running as a Core node, but I do see “TraceNodeNotLeader” so that seems good enough for now
  • Next I will go register our stakepool https://developers.cardano.org/docs/operate-a-stake-pool/register-stake-address
  • Register Stake Address:
  • Query the UTXO of the address that pays for the transaction and deposit:
cd /cardano-node/ 
nix run .#cardano-cli -- query utxo \
    --address $(cat $HOME/cardano-testnet/keys/payment.addr) \
    --testnet-magic 1 > $HOME/cardano-testnet/keys/fullUtxo.out
  • Confirm
cat $HOME/cardano-testnet/keys/fullUtxo.out
  • I see the txHash and details
  • Find out current slot:
cd /cardano-node/ 
currentSlot=$(nix run .#cardano-cli -- query tip --testnet-magic 1 | jq -r '.slot')
echo Current Slot: $currentSlot
  • Create a stake address registration certificate
cd /cardano-node/ 
nix run .#cardano-cli -- stake-address registration-certificate \
    --stake-verification-key-file $HOME/cardano-testnet/keys/stake.vkey \
    --out-file $HOME/cardano-testnet/keys/stake.cert
  • Now, we build the transaction which will return the tx.raw transaction file and also the transaction fees:
cd /cardano-node
nix run .#cardano-cli -- transaction build \
      --tx-in 2c62f98035ee9c1a6da177d5aa5f69b82acb7134cae940dd35188a353f411ff8#0 \
      --tx-out $(cat $HOME/cardano-testnet/keys/payment.addr)+1000000 \
      --change-address $(cat $HOME/cardano-testnet/keys/payment.addr) \
      --testnet-magic 1 \
      --certificate-file $HOME/cardano-testnet/keys/stake.cert \
      --invalid-hereafter $(( ${currentSlot} + 1000)) \
      --witness-override 2 \
      --out-file $HOME/cardano-testnet/keys/tx.raw
  • Output:
Estimated transaction fee: Lovelace 172013
  • Go find the deposit amount in the protocol parameters
cd /cardano-node/
nix run .#cardano-cli -- query protocol-parameters \
      --testnet-magic 1  \
      --out-file $HOME/cardano-testnet/keys/protocol.json

stakeAddressDeposit=$(cat $HOME/cardano-testnet/keys/protocol.json | jq -r '.stakeAddressDeposit')
echo $stakeAddressDeposit
  • Returns:
2000000
  • Next, the complete transaction output is calculated by subtracting the deposit and transaction fees from the amount we have in our payment address:
txOut=$((10000000000-${stakeAddressDeposit}-172013))
echo ${txOut}
  • Now we have all the information in place to build the final transaction file:
cd /cardano-node/
nix run .#cardano-cli -- transaction build-raw \
      --tx-in 2c62f98035ee9c1a6da177d5aa5f69b82acb7134cae940dd35188a353f411ff8#0 \
      --tx-out $(cat $HOME/cardano-testnet/keys/payment.addr)+${txOut} \
      --invalid-hereafter $((${currentSlot} + 1000)) \
      --fee 172013 \
      --certificate-file $HOME/cardano-testnet/keys/stake.cert \
      --out-file $HOME/cardano-testnet/keys/tx.raw
  • Sign the transaction with both the payment and stake secret keys:
cd /cardano-node/
nix run .#cardano-cli -- transaction sign \
      --tx-body-file $HOME/cardano-testnet/keys/tx.raw \
      --signing-key-file $HOME/cardano-testnet/keys/payment.skey \
      --signing-key-file $HOME/cardano-testnet/keys/stake.skey \
      --testnet-magic 1 \
      --out-file $HOME/cardano-testnet/keys/tx.signed
  • Lets go submit our signed transaction
nix run .#cardano-cli -- transaction submit \
    --tx-file $HOME/cardano-testnet/keys/tx.signed \
    --testnet-magic 1 
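  • A sketch to sanity-check that the registration landed on chain (assuming we also build a bech32 stake.addr from stake.vkey, which the steps above never show explicitly):
cd /cardano-node/
nix run .#cardano-cli -- stake-address build \
    --stake-verification-key-file $HOME/cardano-testnet/keys/stake.vkey \
    --out-file $HOME/cardano-testnet/keys/stake.addr \
    --testnet-magic 1
# after the transaction settles, the address should show as registered
nix run .#cardano-cli -- query stake-address-info \
    --address $(cat $HOME/cardano-testnet/keys/stake.addr) \
    --testnet-magic 1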
  • Next Register a Stake Pool with Metadata
  • Create a json file with your metadata
{
    "name": "TestPool",
    "description": "The pool that tests all the pools",
    "ticker": "TEST",
    "homepage": "https://teststakepool.com"
}
  • Lets get the hash for our metadata file:
nix run .#cardano-cli -- stake-pool metadata-hash --pool-metadata-file $HOME/cardano-testnet/poolMetadata.json
  • Returns:
8292a9e45df8a72f975d6222690ce5bf3fe4a34ac72e272577dad47d075e7582
  • Lets Generate the stake pool registration certificate
  • NOTE: for the url below I took the permalink for a file I put in github and ran it through tinyurl because the permalink is too long for the cert
nix run .#cardano-cli -- stake-pool registration-certificate \
    --cold-verification-key-file $HOME/cardano-testnet/keys/cold.vkey \
    --vrf-verification-key-file $HOME/cardano-testnet/keys/vrf.vkey \
    --pool-pledge 10000 \
    --pool-cost 340000000 \
    --pool-margin 1 \
    --pool-reward-account-verification-key-file $HOME/cardano-testnet/keys/stake.vkey \
    --pool-owner-stake-verification-key-file $HOME/cardano-testnet/keys/stake.vkey \
    --testnet-magic 1 \
    --pool-relay-ipv4 16.16.199.77 \
    --pool-relay-port 41783 \
    --metadata-url https://tinyurl.com/yc5xxnke \
    --metadata-hash 8292a9e45df8a72f975d6222690ce5bf3fe4a34ac72e272577dad47d075e7582  \
    --out-file $HOME/cardano-testnet/keys/pool-registration.cert
  • Lets check which ports cardano-node is listening on:
sudo netstat -tulpen | grep cardano-node
tcp        0      0 0.0.0.0:41783           0.0.0.0:*               LISTEN      0          17293      2111/cardano-node   
tcp        0      0 127.0.0.1:12798         0.0.0.0:*               LISTEN      0          19353      2111/cardano-node   
tcp        0      0 127.0.0.1:12788         0.0.0.0:*               LISTEN      0          19352      2111/cardano-node   
tcp6       0      0 :::34509                :::*                    LISTEN      0          17294      2111/cardano-node   

Is that 41783 listener what my node is creating?

  • Lets confirm and see what is using port 41783
sudo lsof -i -P -n | grep 41783
  • Returns:
cardano-n 2111   root   32u  IPv4  17293      0t0  TCP *:41783 (LISTEN)
  • Lets see what we made:
cat $HOME/cardano-testnet/keys/pool-registration.cert
  • EYEEE, it looks like a certificate:
{
    "type": "CertificateShelley",
    "description": "Stake Pool Registration Certificate",
    "cborHex": "8a03581c045132653833613fb65c1bc9669d3fe617ad65361b2bc8490d51beee5820f91aec34939da643299bacaad2bcba7a0a140d6450c593238f8f57cf0d1ff0c71927101961a8d81e820101581de1a3fcba1fc8f9a73c05d114f79c465c2b62571d17d42fafcccc86557881581ca3fcba1fc8f9a73c05d11.....
}
  • To honor your pledge, create a delegation certificate:
cd /cardano-node/
nix run .#cardano-cli -- stake-address delegation-certificate \
      --stake-verification-key-file $HOME/cardano-testnet/keys/stake.vkey \
      --cold-verification-key-file $HOME/cardano-testnet/keys/cold.vkey \
      --out-file $HOME/cardano-testnet/keys/delegation.cert
  • Draft the transaction to submit the registration certificate to the blockchain:
  • NOTE: You can find the TxHash#TxIx by running “cardano-cli query utxo --address payment.addr”
cd /cardano-node/
  nix run .#cardano-cli -- transaction build-raw \
      --tx-in 300ab13a2539d437d9144fd35bbfbd3c74cc4825f8773f2da54bbcf81e74a1c4#0 \
      --tx-out $(cat $HOME/cardano-testnet/keys/payment.addr)+0 \
      --invalid-hereafter 0 \
      --fee 0 \
      --out-file $HOME/cardano-testnet/keys/tx.draft \
      --certificate-file $HOME/cardano-testnet/keys/pool-registration.cert \
      --certificate-file $HOME/cardano-testnet/keys/delegation.cert
  • Calculate the fees:
cd /cardano-node/
  nix run .#cardano-cli -- transaction calculate-min-fee \
      --tx-body-file $HOME/cardano-testnet/keys/tx.draft \
      --tx-in-count 1 \
      --tx-out-count 1 \
      --witness-count 3 \
      --byron-witness-count 0 \
      --testnet-magic 1 \
      --protocol-params-file $HOME/cardano-testnet/keys/protocol.json
  • Returns:
194761 Lovelace
  • I look up the pool deposit, I know there is a better way:
grep PoolDeposit $HOME/cardano-testnet/keys/protocol.json
   "stakePoolDeposit": 500000000,

  • Lets look up our UTxO Balance
nix run .#cardano-cli --  query utxo --address addr_test1vr5m5e9ws0mfwrc8hupp0ty9srzjfvjwagxqk6qa7vdfa7cvkkp98 --testnet-magic 1

Returns the hash we used in our build-raw above

                           TxHash                                 TxIx        Amount
--------------------------------------------------------------------------------------
300ab13a2539d437d9144fd35bbfbd3c74cc4825f8773f2da54bbcf81e74a1c4     0        9997827987 lovelace + TxOutDatumNone
  • Calculate the change for tx-out:
expr <UTxO BALANCE> - <poolDeposit> - <TRANSACTION FEE>
  • So in our case
expr 9997827987 - 500000000 - 194761
  • Gives me:
9497633226
  • Build the transaction
cd /cardano-node/
  nix run .#cardano-cli -- transaction build-raw \
      --tx-in 300ab13a2539d437d9144fd35bbfbd3c74cc4825f8773f2da54bbcf81e74a1c4#0 \
      --tx-out $(cat $HOME/cardano-testnet/keys/payment.addr)+9497633226 \
      --invalid-hereafter $(( ${currentSlot} + 1000))  \
      --fee 194761 \
      --out-file $HOME/cardano-testnet/keys/tx.raw \
      --certificate-file $HOME/cardano-testnet/keys/pool-registration.cert \
      --certificate-file $HOME/cardano-testnet/keys/delegation.cert
  • Sign the transaction:
cd /cardano-node/
  nix run .#cardano-cli -- transaction sign \
      --tx-body-file $HOME/cardano-testnet/keys/tx.raw \
      --signing-key-file $HOME/cardano-testnet/keys/payment.skey \
      --signing-key-file $HOME/cardano-testnet/keys/stake.skey \
      --signing-key-file $HOME/cardano-testnet/keys/cold.skey \
      --testnet-magic 1 \
      --out-file $HOME/cardano-testnet/keys/tx.signed
  • Submit the transaction:
cd /cardano-node/
  nix run .#cardano-cli -- transaction submit \
      --tx-file $HOME/cardano-testnet/keys/tx.signed \
      --testnet-magic 1
  • Lets see if we can find our pool, first get the poolid
cd /cardano-node
  nix run .#cardano-cli --  stake-pool id --cold-verification-key-file $HOME/cardano-testnet/keys/cold.vkey --output-format "hex"
  • And lets look for the pool on the network ledger
cd /cardano-node/
  nix run .#cardano-cli -- query ledger-state --testnet-magic 1 | grep publicKey | grep 045132653833613fb65c1bc9669d3fe617ad65361b2bc8490d51beee
  • SOOO COOL. I would like to use pooltool or another explorer to get more details, but I can see the id in the ledger; need to find more metadata.
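  • One more check that avoids grep-ing the full ledger-state: take the bech32 pool id and look for it in the registered pool set (sketch, same keys as above):
cd /cardano-node/
poolId=$(nix run .#cardano-cli -- stake-pool id \
    --cold-verification-key-file $HOME/cardano-testnet/keys/cold.vkey \
    --output-format bech32)
nix run .#cardano-cli -- query stake-pools --testnet-magic 1 | grep ${poolId}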

Secrets:

  • To get this done I need to:
  • Fix startup script for bp to use aws secrets
  • Fix topology between nodes, go read relay topology section, the talk about bp
  • Get visibility, viewGL.sh?, ekg (what else does it show?), grafana (what already exists), prometheus exporter, what do I get?
  • Current process:
  • Lets set the env vars, see if we can run our node with keys:
KES=$HOME/cardano-testnet/keys/kes.skey
VRF=$HOME/cardano-testnet/keys/vrf.skey
CERT=$HOME/cardano-testnet/keys/node.cert
  • First lets stop the service that is currently running our node
systemctl stop cardano-node-block-producer-daemon.service 
  • Let us manually try to run our node with the new values:
nix run --accept-flake-config github:input-output-hk/cardano-node?ref=master run -- --topology /cardano-node/configuration/cardano/testnet-topology.json --socket-path /tmp/cardano-node.socket --port 6001 --config /cardano-node/configuration/cardano/testnet-config.json --shelley-kes-key ${KES} --shelley-vrf-key ${VRF} --shelley-operational-certificate ${CERT}
  • So we set env vars and run the daemon, passing those in. How do you do that for a service?
  • Lets ask an LLM for its opinion.
  • It suggests creating an environment file I then reference from the service
  • I like that idea as it allows us to iterate on making the creation of that environment file and the keys secure.
  • Lets see how we go, setting up a file that looks exactly like the KES, VRF, CERT settings above
  • Updating the service to something like this. (I could never get the environment file to work, so I decided to switch to a script that sets the vars)
[Unit]
Description=block-producer

[Service]
EnvironmentFile=/root/cardano/testnet/environment
ExecStart= nix run --accept-flake-config github:input-output-hk/cardano-node?ref=master run -- --topology /cardano-node/configuration/cardano/testnet-topology.json --socket-path /tmp/cardano-node.socket --port 6001 --config /cardano-node/configuration/cardano/testnet-config.json --shelley-kes-key ${KES} --shelley-vrf-key ${VRF} --shelley-operational-certificate ${CERT}

[Install]
WantedBy=multi-user.target
  • Lets see what that looks like in our configuration.nix
  • I tried various incarnations, declaring an s3 json file, setting that to /root/secret and a few others, but never got it to work right.
  • I think the right thing would be to call a key-store like aws secrets manager or similar for this.
  • To get around this for now I am going to add the env var setting and startup script to a sh I will call from ExecStart; that way terraform has a valid object and the vars can be set at runtime, I hope.
  • For now create it statically; add the following to start_node.sh
# Create control script for the node
cat << 'EOF' > /run/run_bp
source /root/cardano-testnet/environment
/run/current-system/sw/bin/nix run --accept-flake-config github:input-output-hk/cardano-node?ref=master run -- --topology /cardano-node/configuration/cardano/testnet-topology.json --socket-path /tmp/cardano-node.socket --port 6001 --config /cardano-node/configuration/cardano/testnet-config.json --shelley-kes-key ${KES} --shelley-vrf-key ${VRF} --shelley-operational-certificate ${CERT}
EOF
  • At this point we have to manually create the environment file:
export KES=$HOME/cardano-testnet/keys/kes.skey
export VRF=$HOME/cardano-testnet/keys/vrf.skey
export CERT=$HOME/cardano-testnet/keys/node.cert
  • We also create those files outside of the provisioning steps.
  • One way to move them off is to add them as aws secrets in terraform
  • Playing with this:
user_data = templatefile("${path.module}/start_node.sh", {
   KES = data.aws_secretsmanager_secret_version.testnet/kes.skey.secret_string
   VRF = data.aws_secretsmanager_secret_version.testnet/vrf.skey.secret_string
   CERT = data.aws_secretsmanager_secret_version.testnet/node_cert.secret_string
 })

  • Troubleshooting: lots of iteration trying to get tf to show me the variables with output.
  • Without sensitive = true it errors out trying to avoid exposing secrets; with it set to true the output is redacted in the apply
output "user_data_script" {
  sensitive = true
  value = aws_instance.machine.user_data
}
  • Finally learned on a snarky stackoverflow thread: after the apply, run:
terragrunt output user_data_script
  • It takes a minute, but shows me that my variables are not being substituted....
  • Solution: what I implemented after some iteration. At the end I will discuss some improvements.
  • I moved the command to run the node into a script of its own; I like a modular script I can tweak for env vars etc without impacting node_start.sh
  • I do end up with the secrets written to files I use in my startup.
  • I used a remote-exec provisioner added to aws_instance; I do not like this solution, it is crude.
  • It gives us something to iterate on; I need adult supervision for this implementation:
  • Here are the updates I ended up making to main.tf:
# This is a first pass at pulling a secret out, will need to play
# I do not like that the secret arn goes here, but I have not found a good way to keep the arn out of git.
data "aws_secretsmanager_secret_version" "kes_secret" {
  secret_id = "arn:aws:secretsmanager:ap-southeast-2:407250907589:secret:testnet/kes.skey-LR7hrJ"
}
#....etc for all 3

#This is inside resource "aws_instance"
  connection {
    type        = "ssh"
    host        = aws_instance.machine.public_ip
    user        = "root"
    private_key = file("${path.module}/id_rsa.pem")
  }
  provisioner "remote-exec" {
# I did lots of iteration with env vars I would then pass to user_data, but this was the only reliable way I found to access the secrets from the daemon start. I really want a cleaner way, but with these secured.
    inline = [
      "echo '${data.aws_secretsmanager_secret_version.cert_secret.secret_string}' > /run/node_cert",
      "echo '${data.aws_secretsmanager_secret_version.vrf_secret.secret_string}' > /run/vrf_secret",
      "echo '${data.aws_secretsmanager_secret_version.kes_secret.secret_string}' > /run/kes_secret",
    ]
  }
  
# This was a multi-hour research gem on why deploy_nixos stopped running after I added the remote-exec.
# Adding this depends_on seems to make deploy_nixos run every time I update the machine. Not sure if there is a better way;
# I know I can manually terraform apply the deploy_nixos module, leaving this in for now.
+    depends_on = [aws_instance.machine]

  • In node_start.sh I create a run_bp that I will call from the service definition in configuration.nix:
# Create control script for the node
cat << EOF > /run/run_bp
# source /root/cardano-testnet/environment
/run/current-system/sw/bin/nix run --accept-flake-config github:input-output-hk/cardano-node?ref=master run -- --topology /cardano-node/configuration/cardano/testnet-topology.json --socket-path /tmp/cardano-node.socket --port 6001 --config /cardano-node/configuration/cardano/testnet-config.json --shelley-kes-key /run/kes_secret --shelley-vrf-key /run/vrf_secret --shelley-operational-certificate /run/node_cert


EOF

chmod 700 /run/run_bp
chmod 600 /run/kes_secret
chmod 600 /run/vrf_secret
chmod 600 /run/node_cert
  • And then I update the ExecStart in the cardano serviceConfig section of configuration.nix to just call the run_bp we created.
    serviceConfig = {
-      ExecStart = "${pkgs.nix}/bin/nix run --accept-flake-config github:input-output-hk/cardano-node?ref=master run -- --topology /cardano-node/configuration/cardano/testnet-topology.json --socket-path /tmp/cardano-node.socket --port 6001 --config /cardano-node/configuration/cardano/testnet-config.json --shelley-kes-key ${KES} --shelley-vrf-key ${VRF} --shelley-operational-certificate ${CERT}";
 +     ExecStart = "${pkgs.bash}/bin/bash -c /run/run_bp";
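  • After the next deploy, confirming the unit picked up the new ExecStart is just:
systemctl cat cardano-node-block-producer-daemon.service | grep ExecStart
journalctl -fu cardano-node-block-producer-daemon.service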
 
  • TODO:
  • IMPROVEMENTS:
  • The aws secrets need to be env vars, or at least the current file-based method needs to be vetted for better design and security

Additional information

Next steps:

Current

  • I still need some validation on what I did, but sandbox has a working testnet bp, relay node and stake-pool
  • I have a working producer that will start up based on secrets I set in aws.
  • Next I am going to split the work into a private yumi repo where I will clean up cruft, harden steps and create a README for the final deploy.
  • I still also need to fix topology and create some monitoring.
  • Keep in mind ARM cost savings once we are running.

Archive of old steps, this is all done

  • Add final parts to make prod ready and move on to block-producer
  • The node gets started as a service we define in configuration.nix: “systemctl status cardano-node-relay-daemon.service”
  • The new machine registers itself in tailscale; you can use tailscale to authenticate ssh over the 100. network, and you can find the machine by ip or networking.hostName
  • I can see healthy logs of the work the node is doing and a dashboard with healthy metrics; I still need to query and interact with the server via the cli
  • Document and test what we have
  • The terragrunt research culminated in a restructure that allows us to spin up 3 relay-nodes, each checking in from a different region, each configured with tailscale
  • We are now in the process of re-applying what overlaps to a block-producer
  • Research shows we have the files we need to change in input-output-hk/cardano-node/configuration/cardano
  • I figured out how to pass in the flags we need (duh); now I can create and set the node socket and even query our node
  • Set up configs to auto provision 3 relays and a bp node using testnet in sandbox
  • Next step is to update the topology for both relay and producer, generate some keys and implement the rest of the “outstanding steps from buildCardanoStakePoolUbuntu.org”
  • Continue block producer tutorial here: https://developers.cardano.org/docs/operate-a-stake-pool/block-producer-keys#stakepool-operational-certificate-generation
  • Worked through that link and now I have a block producer with a registered stake pool.
  • Things to keep in mind:
- Still need to figure out how we set configurations for the node ie whitelist block producer etc
- iptables
- networks in aws, do we allow any internal communication through aws or will everything flow through tailscale?
- still need to think about key management, aegis/sops

Troubleshooting tips

  • Don’t forget to do tcpdump to see where it is trying to get artifacts from
  • Things to keep in mind:
- iptables
- network groups in aws
- still need to think about key management