New command: shell #57

Open
anatoly-scherbakov opened this issue Sep 13, 2020 · 2 comments

@anatoly-scherbakov
Collaborator

columns:
  uuid:
    shell:
      command: uuid -v4
      pipe: false

ysv should leverage the ecosystem of UNIX command-line tools by letting the user pass the values of a given column through an external program.

There are several use cases for this.

  1. As we have seen in practice, ysv's built-in filters are sometimes not enough, and we have to write custom code in another language to do complex processing for particular columns.

With the shell command, we could teach ysv to call our Python script in a separate process and feed it the values line by line. ysv would read the script's stdout and incorporate the resulting values into the output CSV dataset.

This would make ysv enormously extensible. Moreover, ysv could run multiple instances of the external program and thus exploit the multiprocessing capabilities of modern hardware (which, say, Python alone cannot easily do).

  2. Even without custom code, communicating over UNIX pipes lets us use standard command-line tools, for example awk.

In both cases, we get a substantial expansion in functionality by leveraging tools that already exist, and we can do so with great efficiency.
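
To make the idea concrete, here is a minimal sketch of the driver side: feeding a column's values to an external command over stdin, reading the results from stdout, and optionally spreading the work over several instances of the command. It is written in Python purely for illustration and is not ysv's implementation; the names `run_chunk` and `pipe_column_parallel` and the `workers` parameter are hypothetical. It assumes the command emits exactly one output line per input line and treats every line independently.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def run_chunk(chunk, command):
    """Feed values to `command`, one per line, and return its stdout lines."""
    payload = "".join(f"{value}\n" for value in chunk)
    done = subprocess.run(
        command,
        shell=True,          # the command is a shell string, as in the YAML config
        input=payload,
        capture_output=True,
        text=True,
        check=True,          # raise if the command exits with a non-zero status
    )
    lines = done.stdout.splitlines()
    # Assumption: one output line per input line, so rows stay aligned.
    if len(lines) != len(chunk):
        raise ValueError(
            f"expected {len(chunk)} output lines, got {len(lines)}"
        )
    return lines


def pipe_column_parallel(values, command, workers=4):
    """Split values into contiguous chunks and run one process per chunk."""
    if not values:
        return []
    size = -(-len(values) // workers)  # ceiling division
    chunks = [values[i:i + size] for i in range(0, len(values), size)]
    # Threads are enough here: the heavy work happens in the child processes.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = list(pool.map(lambda chunk: run_chunk(chunk, command), chunks))
    # pool.map preserves chunk order, so the output order matches the input.
    return [line for part in parts for line in part]
```

Chunking only makes sense for commands that keep no state between lines; something like a running total in awk would produce different results per chunk and would need a single instance.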

@anatoly-scherbakov
Collaborator Author

anatoly-scherbakov commented Sep 13, 2020

More examples.

columns:
  number:
    input: number_plus_five
    shell: awk '{ print $1 + 5 }'

or

columns:
  phone_number:
    input: Phone
    shell: python run.py validate_united_states_phonenumbers

In each case, ysv runs the provided shell command as another process (or processes) and feeds the input values to the stdin of that command. It then reads the processed values from stdout and inserts them into the output CSV dataset.
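
For the other side of the pipe, a script such as `run.py` could be an ordinary stdin-to-stdout line filter. The sketch below is hypothetical: the issue does not show the script's contents, and the validation rule (ten digits, optional leading 1, normalized to `+1XXXXXXXXXX`) is an assumption chosen only to illustrate the contract. The important part is that it writes exactly one output line per input line, an empty one for invalid input, so the values stay aligned with the CSV rows.

```python
import re
import sys


def normalize_us_phone(raw):
    """Return +1XXXXXXXXXX, or an empty string if the input is not plausible.

    Hypothetical rule for illustration: keep digits only, drop a leading
    country code 1, and require exactly ten remaining digits.
    """
    digits = re.sub(r"\D", "", raw)
    if len(digits) == 11 and digits.startswith("1"):
        digits = digits[1:]
    if len(digits) != 10:
        return ""
    return "+1" + digits


def main(stdin=sys.stdin, stdout=sys.stdout):
    """Read one value per line and write one result per line."""
    for line in stdin:
        stdout.write(normalize_us_phone(line.rstrip("\n")) + "\n")
        # Flush per line so a caller that streams values is not left waiting.
        stdout.flush()


# Dispatch on the subcommand name used in the YAML example above.
if (
    __name__ == "__main__"
    and len(sys.argv) > 1
    and sys.argv[1] == "validate_united_states_phonenumbers"
):
    main()
```

Flushing after every line matters if the caller keeps the process open and exchanges values one at a time; with batch input it only costs a little throughput.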

@anatoly-scherbakov
Collaborator Author

It seems the jq team is working on something similar: jqlang/jq#147
