Commit f1355fc

Update README.md
updated directions for export and scheduled export
1 parent 1cf2e73 commit f1355fc

File tree

1 file changed (+38, -4 lines changed)
  • scala/datastax-v4/aws-glue/export-to-s3

scala/datastax-v4/aws-glue/export-to-s3/README.md

@@ -25,7 +25,7 @@ Optionally, you may also provide the S3URI, FORMAT. Later you can change paramet
 ```
 
 
-By default the export will copy data to S3 bucket specified in the parent stack in the format. You can override the S3 bucket at run time.
+By default, the export copies data to the S3 bucket specified in the parent stack, in the format configured there. You can override the S3 bucket at run time. The date used in the example below will be replaced with the time of the export.
 
 ```shell
 \--- S3_BUCKET
@@ -37,7 +37,7 @@ By default the export will copy data to S3 bucket specified in the parent stack
 \----- keyspace_name
 \----- table_name
 \----- snapshot
-\----- year=2025
+\----- year=2025
 \----- month=01
 \----- day=02
 \----- hour=09
@@ -46,9 +46,43 @@ By default the export will copy data to S3 bucket specified in the parent stack
 
 ```
 
-Running the job can be done through the AWS CLI. In the following example the command is running the job created in the previous step, but overrides the number of glue workers, worker type, and script arguments such as the table name. You can override any of the glue job parameters at run time.
+### Running the export from the CLI
+
+Running the job can be done through the AWS CLI. In the following example, the command runs the job created in the previous step but overrides the number of Glue workers, the worker type, and script arguments such as the table name. You can override any of the Glue job parameters and default arguments at run time.
 
 ```shell
-aws-glue % aws glue start-job-run --job-name AmazonKeyspacesExportToS3-shuffletest-shuffletest-export1 --number-of-workers 8 --worker-type G.2X --arguments '{"--TABLE_NAME":"transactions"}'
+aws glue start-job-run --job-name AmazonKeyspacesExportToS3-aksglue-aksglue-export --number-of-workers 8 --worker-type G.2X --arguments '{"--TABLE_NAME":"transactions"}'
 ```
 
+Full list of AWS CLI arguments: [start-job-run arguments](https://docs.aws.amazon.com/cli/latest/reference/glue/start-job-run.html)
+
+### List of export script arguments
+
+| argument | definition | default |
+| :------- | :--------- | :------ |
+| --KEYSPACE_NAME | Name of the keyspace of the table to export | The keyspace name provided when setting up the export stack |
+| --TABLE_NAME | Name of the table to export | The table name provided when setting up the export stack |
+| --S3_URI | S3 URI where the root of the export will be located. The folder structure is added dynamically in export-sample.scala | The S3 bucket provided when setting up the parent stack or the export stack |
+| --FORMAT | The format of the export. Parquet is recommended; you can alternatively use json or other formats supported by the Spark S3 libraries | parquet |
+| --DRIVER_CONF | The file containing the driver configuration. By default the parent stack sets up one config for Cassandra and one for Keyspaces; you can add as many additional configurations as you like by dropping them in the same S3 location | keyspaces-application.conf |
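The script arguments above are passed to Glue as a single JSON document via `--arguments`. As a hedged sketch (the bucket and config-file names are hypothetical), it can help to validate that JSON locally first, since shell-quoting mistakes are easy to make:

```shell
# Hypothetical overrides; the argument names come from the table above.
ARGS='{"--S3_URI":"s3://my-export-bucket/exports/","--FORMAT":"json","--DRIVER_CONF":"cassandra-application.conf"}'

# Check the JSON parses before handing it to Glue.
echo "$ARGS" | python3 -m json.tool > /dev/null && echo "arguments ok"

# Submitting the run itself requires AWS credentials; shown for context only:
# aws glue start-job-run --job-name AmazonKeyspacesExportToS3-aksglue-aksglue-export --arguments "$ARGS"
```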
+### Scheduled Trigger (Cron)
+
+You can run this export on a regular schedule using a scheduled trigger. Here is a simple AWS CLI command to create a Glue trigger that runs the export job once per week (every Monday at 12:00 UTC):
+
+```shell
+aws glue create-trigger \
+    --name KeyspacesExportWeeklyTrigger \
+    --type SCHEDULED \
+    --schedule "cron(0 12 ? * MON *)" \
+    --start-on-creation \
+    --actions '[{
+    "JobName": "AmazonKeyspacesExportToS3-aksglue-aksglue-export",
+    "WorkerType": "G.2X",
+    "NumberOfWorkers": 8,
+    "Arguments": {
+        "--TABLE_NAME": "transactions",
+        "--KEYSPACE_NAME": "aws"
+    }
+}]'
+```
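The weekly schedule above uses the six-field AWS cron syntax. A few alternative expressions, shown as a small sketch (these schedules are illustrative assumptions, not part of the stack):

```shell
# AWS cron fields: minutes hours day-of-month month day-of-week year.
# Either day-of-month or day-of-week must be '?'.
WEEKLY="cron(0 12 ? * MON *)"    # every Monday at 12:00 UTC (as above)
DAILY="cron(0 3 * * ? *)"        # every day at 03:00 UTC
MONTHLY="cron(30 0 1 * ? *)"     # the 1st of each month at 00:30 UTC
echo "$DAILY"
```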
