By default, the export will copy data to the S3 bucket specified in the parent stack, using the folder structure shown below. You can override the S3 bucket at run time. The date values used in the example will be replaced with the time of the export.

```shell
\--- S3_BUCKET
   ...
   \----- keyspace_name
      \----- table_name
         \----- snapshot
            \----- year=2025
               \----- month=01
                  \----- day=02
                     \----- hour=09
```

### Running the export from the CLI

Running the job can be done through the AWS CLI. In the following example, the command runs the job created in the previous step, but overrides the number of Glue workers, the worker type, and script arguments such as the table name. You can override any of the Glue job parameters and default arguments at run time.
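A run command could look like the following sketch. The job name and argument values are illustrative placeholders rather than names the stack is guaranteed to create; substitute the job name and keyspace/table from your own setup.

```shell
# Start the export job with overridden capacity and script arguments.
# "AmazonKeyspacesExportToS3", "my_keyspace", and "my_table" are placeholders.
aws glue start-job-run \
  --job-name "AmazonKeyspacesExportToS3" \
  --number-of-workers 8 \
  --worker-type "G.2X" \
  --arguments '{"--KEYSPACE_NAME":"my_keyspace","--TABLE_NAME":"my_table","--FORMAT":"parquet"}'
```

The script arguments accepted by the job are listed below.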
| Argument | Description | Default |
| --- | --- | --- |
| --KEYSPACE_NAME | Name of the keyspace of the table to export | |
| --TABLE_NAME | Name of the table to export | |
| --S3_URI | S3 URI where the root of the export will be located. The folder structure is added dynamically in export-sample.scala | The S3 bucket provided when setting up the parent stack or the export stack |
| --FORMAT | The format of the export. Parquet is recommended; you could alternatively use JSON or other formats supported by the Spark S3 libraries | parquet |
| --DRIVER_CONF | The file containing the driver configuration. By default the parent stack sets up a config for Cassandra and a config for Keyspaces; you can add additional configurations by dropping them in the same location in S3 | keyspaces-application.conf |

### Scheduled Trigger (Cron)
You can trigger this export regularly using a scheduled trigger. Here is a simple AWS CLI command to create a Glue Trigger that runs your export Glue job once per week (every Monday at 12:00 UTC):
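The following is a sketch of such a command; the trigger and job names are illustrative placeholders.

```shell
# Create a scheduled trigger that starts the export job every Monday at 12:00 UTC.
# "weekly-keyspaces-export" and "AmazonKeyspacesExportToS3" are placeholder names.
aws glue create-trigger \
  --name "weekly-keyspaces-export" \
  --type SCHEDULED \
  --schedule "cron(0 12 ? * MON *)" \
  --actions '[{"JobName":"AmazonKeyspacesExportToS3"}]' \
  --start-on-creation
```

Glue uses the AWS cron syntax, so `cron(0 12 ? * MON *)` fires at minute 0 of hour 12 UTC every Monday.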