Debugging #33

Closed
dejmail opened this issue Sep 10, 2020 · 2 comments

dejmail commented Sep 10, 2020

Hi there

This might be a question for DRF instead, but how exactly does one use pdb with this library? If one inserts a set_trace() call, the Django server output keeps scrolling past, so even though one can interact with pdb, the command prompt disappears under a torrent of HTTP request logs. Is there any way to pause everything so I can debug?

Thanks

techdragon commented

I wouldn't mind some debugging insights too, though for different reasons: even with "BACKEND": "data_wizard.backends.immediate", I couldn't get my IDE (PyCharm) to catch any errors from data_wizard. Combined with the complexity of the test setup, this makes the library harder to work with than it needs to be. I'm trying to fix #31 because I use the very common django-storages library, but I'm making no progress because I get no useful output from either the tests or the debugger.

sheppard added a commit to wq/itertable that referenced this issue Nov 18, 2021
sheppard (Member) commented

I will add some documentation on debugging tips, but here are a few things to start:

General Tips

  1. Given the wide variety of use cases and failure points, Data Wizard traps most errors by default, so that the user gets a short, hopefully informative message rather than a generic 500 error. The trapped errors are logged via Python's logging module.

  2. The threading backend (enabled by default) adds another layer of indirection when trying to identify an exception.

  3. Thus, if you are writing a custom Iter or Serializer class, make sure each component works in isolation before trying to debug it within the Data Wizard stack (see the examples below).

  4. Once you have confirmed that itertable and the serializer are working individually, try running data_wizard without any web UI traffic via the CLI (./manage.py runwizard).

  5. Once that is working, try running through the web UI with ./manage.py runserver and the immediate backend:

DATA_WIZARD = {
    "BACKEND": "data_wizard.backends.immediate"
}
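Trapped errors only help if the logging configuration actually displays them. A minimal sketch of a Django LOGGING setting, assuming the project does not already configure logging, that sends every record of WARNING or above that reaches the root logger to the console running runserver. It does not assume any particular logger name inside data_wizard; it relies on the default propagation of named loggers to the root logger.

```python
import logging.config

# Hypothetical settings.py fragment (an assumption, not part of Data Wizard):
# Django passes LOGGING to logging.config.dictConfig at startup.
LOGGING = {
    "version": 1,
    # Keep Django's own loggers active instead of silencing them.
    "disable_existing_loggers": False,
    "handlers": {
        # StreamHandler writes to stderr, which runserver shows in the terminal.
        "console": {"class": "logging.StreamHandler"},
    },
    "root": {
        "handlers": ["console"],
        # logger.exception() logs at ERROR, so WARNING is low enough.
        "level": "WARNING",
    },
}

if __name__ == "__main__":
    # Outside Django, apply the dict by hand to check that it is valid.
    logging.config.dictConfig(LOGGING)
```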

Debugging File Loading/Parsing (IterTable)

To debug issues loading and parsing files, try using itertable directly:

from itertable import load_file

for row in load_file('/path/to/file.xlsx'):
    print(row)

Note that existing releases of itertable automatically suppress the OSError raised when a file is inaccessible, so the error never makes it back to Data Wizard. For the next release, I changed this to raise itertable.exceptions.LoadFailed unless require_existing is explicitly set to False.
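On releases that still suppress the OSError, one way to rule out an inaccessible file is to open it yourself before handing the path to itertable, so the error surfaces immediately. A minimal standard-library sketch; `assert_readable()` is a hypothetical helper, not part of itertable or Data Wizard.

```python
from pathlib import Path


def assert_readable(path):
    """Hypothetical helper: open the file and read one byte so that a
    missing or unreadable file raises OSError here, instead of being
    swallowed further down the stack. Returns the path as a Path."""
    path = Path(path)
    with path.open("rb") as f:
        f.read(1)
    return path
```

For example, call `assert_readable('/path/to/file.xlsx')` before the `load_file()` loop; if it passes, the problem is in parsing rather than file access.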

If you are writing a custom Iter class, test the class with a similar loop:

from myapp import CustomIter

for row in CustomIter(filename='/path/to/file.xlsx'):
    print(row)
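Running a component in isolation like this also makes pdb usable again, because no development server is printing request logs over the prompt. A minimal sketch, assuming a hypothetical helper `check_rows()` (not part of Data Wizard or itertable) that accepts any row iterable, prints each row, and can optionally open a pdb post-mortem session at the frame that raised.

```python
import pdb
import sys
import traceback


def check_rows(rows, debug=False):
    """Hypothetical helper: iterate over rows outside Data Wizard.

    rows:  any iterable of rows, e.g. load_file(...) or CustomIter(...).
    debug: if True, start pdb at the failing frame before re-raising.
    Returns the number of rows read.
    """
    count = 0
    try:
        for count, row in enumerate(rows, start=1):
            print(count, row)
    except Exception:
        # Show the full traceback that Data Wizard would otherwise trap.
        traceback.print_exc()
        if debug:
            pdb.post_mortem(sys.exc_info()[2])
        raise
    return count
```

From `./manage.py shell`, something like `check_rows(CustomIter(filename='/path/to/file.xlsx'), debug=True)` would stop in pdb at the line inside the Iter class that failed.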

Debugging the Serializer (DRF)

To investigate validation issues, try instantiating the DRF serializer class directly.

from data_wizard import registry
Serializer = registry.get_serializer("My Model")
serializer = Serializer(data={"test": "data"})
serializer.is_valid(raise_exception=True)

Note that data_wizard traps any and all serializer errors for individual rows, saving only the error text to the Record table. The full stack trace is still sent to the Python logging module.
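To make the split between the Record table and the logs concrete, here is an illustrative sketch of that trap-and-log pattern. It is not Data Wizard's actual code: `import_rows()`, `ListHandler`, and the logger name `"example.import"` are all hypothetical. The in-memory handler shows one way to collect the logged records in a shell session and inspect the full tracebacks.

```python
import logging

# Hypothetical logger name, for illustration only.
logger = logging.getLogger("example.import")


def import_rows(rows, validate):
    """Illustrative only: trap each row's error, keep just its message,
    and send the full traceback to logging.
    Returns a list of (row_number, error_text or None)."""
    results = []
    for i, row in enumerate(rows, start=1):
        try:
            validate(row)
        except Exception as exc:
            logger.exception("Row %s failed", i)  # traceback -> logging
            results.append((i, str(exc)))         # message -> "record"
        else:
            results.append((i, None))
    return results


class ListHandler(logging.Handler):
    """Collect log records in memory so their tracebacks can be inspected."""

    def __init__(self):
        super().__init__()
        self.records = []

    def emit(self, record):
        self.records.append(record)
```

After attaching a `ListHandler` to a logger, `logging.Formatter().formatException(record.exc_info)` returns the traceback text for any captured record that carries exception info.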
