
Add Column Selection Feature to Ingress Descriptor #312

Open
youen opened this issue Sep 2, 2024 · 0 comments · May be fixed by #313

youen commented Sep 2, 2024

Currently, the IngressDescriptor configuration in our system does not allow users to specify which columns to select from tables during the data extraction process. By default, all columns from the startTable and related tables are included, which can lead to unnecessary data being processed and transferred.

Proposed Enhancement:

We propose adding a select option to the IngressDescriptor YAML configuration that allows users to explicitly define the columns they want to extract from each table. This feature should apply to both the startTable and any related tables defined under relations.

Current Configuration Example:

```yaml
version: v1
IngressDescriptor:
    startTable: public.customer
    relations:
      - name: film_original_language_id_fkey
        parent:
            name: public.film
            lookup: false
        child:
            name: public.language
            lookup: true
            where: "creation_date >= '01/01/2023'"
```

Proposed Configuration Example:

```yaml
version: v1
IngressDescriptor:
    startTable: public.customer
    select: ["customer_id", "first_name"]  # Select only specific columns from the start table
    relations:
      - name: film_original_language_id_fkey
        parent:
            name: public.film
            lookup: false
        child:
            name: public.language
            lookup: true
            where: "creation_date >= '01/01/2023'"
            select: ["language_id", "name"]  # Select specific columns from the related table
```

Benefits:

  1. Optimized Data Transfer: Reduces the amount of data transferred by selecting only the necessary columns.
  2. Improved Performance: Potentially faster query execution and data processing, since less data is extracted.
  3. Greater Flexibility: Users gain more control over the extraction process and can tailor it to their specific needs.

Impact and Dependencies:

This enhancement will require changes to the YAML parsing logic and to the underlying SQL query generation so that the new select attribute is honored for each table involved in the extraction. Documentation and the examples provided to users may also need updating.
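On the query-generation side, the change can be sketched as: project the listed columns when `select` is present, and fall back to `SELECT *` when it is absent. This is a minimal Python sketch, not the project's implementation. `build_query` and `quote_ident` are hypothetical names; it assumes double-quote identifier quoting (as in PostgreSQL and SQLite) and that the `where` clause is appended verbatim as written in the descriptor. An in-memory SQLite database attached under the name `public` stands in for the real source database.

```python
import sqlite3
from typing import Optional, Sequence


def quote_ident(name: str) -> str:
    """Quote a possibly schema-qualified name: public.language -> "public"."language"."""
    return ".".join('"' + part.replace('"', '""') + '"' for part in name.split("."))


def build_query(table: str, select: Optional[Sequence[str]] = None, where: Optional[str] = None) -> str:
    """Build the extraction query for one table of the descriptor.

    select=None keeps today's behaviour (SELECT *); otherwise only the listed
    columns are projected. The where clause is appended verbatim.
    """
    if select is None:
        columns = "*"
    elif not select:
        raise ValueError("select must list at least one column")
    else:
        columns = ", ".join(quote_ident(c) for c in select)
    sql = f"SELECT {columns} FROM {quote_ident(table)}"
    if where:
        sql += f" WHERE {where}"
    return sql


# Stand-in source database: an in-memory SQLite database attached as "public".
conn = sqlite3.connect(":memory:")
conn.execute("ATTACH DATABASE ':memory:' AS public")
conn.execute("CREATE TABLE public.language (language_id INTEGER, name TEXT, last_update TEXT)")
conn.execute("INSERT INTO public.language VALUES (1, 'English', '2023-02-01')")

# Without select every column comes back; with select only the listed ones do.
for select in (None, ["language_id", "name"]):
    cur = conn.execute(build_query("public.language", select))
    print([d[0] for d in cur.description])
# ['language_id', 'name', 'last_update']
# ['language_id', 'name']

print(build_query("public.language", ["language_id", "name"], "creation_date >= '01/01/2023'"))
# SELECT "language_id", "name" FROM "public"."language" WHERE creation_date >= '01/01/2023'
```

Quoting each column keeps user-supplied names from being interpreted as SQL, while the `where` string is deliberately left untouched because the descriptor already treats it as raw SQL.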

Acceptance Criteria:

  1. Users can specify a select option for the startTable and for each related table in relations.
  2. The system generates SQL queries that include only the specified columns.
  3. Configurations that do not include the select option keep working unchanged (all columns are extracted), and this backward compatibility is covered by extensive tests.
youen linked a pull request on Sep 3, 2024 that will close this issue.