write_dta: A provided string value was longer than the available storage size of the specified column #268

Open
shezhou opened this issue Aug 21, 2024 · 4 comments
Labels
enhancement New feature or request

Comments

shezhou commented Aug 21, 2024

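import pandas as pd
import pyreadstat

# Raw strings keep backslash sequences such as \t in the Windows paths literal.
arg1 = r'E:\test\single\dta\PRI_Basic.json'
arg3 = r'E:\test\single\dta\PRI_Basic.dta'
df = pd.read_json(arg1, dtype=dtype_dict, lines=True)  # dtype_dict is defined elsewhere (not shown)
pyreadstat.write_dta(df, arg3)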
The following error occurred:
pyreadstat._readstat_parser.ReadstatError: A provided string value was longer than the available storage size of the specified column
Looking through the issue history, this seems to have been solved only for the SAV format.
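One way to narrow down which field triggers this error is to measure the longest UTF-8 encoded value per field in the JSON-lines input. The sketch below is only an illustration: the helper name `max_utf8_bytes_per_field` and the sample records are hypothetical, and it assumes the relevant limit is the 2045-byte maximum of Stata's fixed-width str type, measured on the UTF-8 encoding.

```python
import json
from collections import defaultdict

# Assumption: the limit of Stata's fixed-width str type, counted in bytes.
STATA_STR_MAX_BYTES = 2045


def max_utf8_bytes_per_field(json_lines):
    """Return {field: longest UTF-8 byte length} over all string values.

    `json_lines` is any iterable of JSON-lines text, e.g. an open file.
    """
    longest = defaultdict(int)
    for line in json_lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        for key, value in record.items():
            if isinstance(value, str):
                size = len(value.encode("utf-8"))
                longest[key] = max(longest[key], size)
    return dict(longest)


# Hypothetical sample standing in for the real file.
sample_lines = [
    '{"id": 1, "note": "short"}',
    '{"id": 2, "note": "' + "\u4e2d" * 1000 + '"}',  # 1000 chars, 3 bytes each
]
longest = max_utf8_bytes_per_field(sample_lines)
too_long = {k: n for k, n in longest.items() if n > STATA_STR_MAX_BYTES}
print(too_long)  # {'note': 3000}
```

For a real file, the same function can be called on an open file handle, for example `with open(path, encoding="utf-8") as fh: max_utf8_bytes_per_field(fh)`.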

@ofajardo
Collaborator

Hi, thanks for the report. Please provide the data needed to reproduce the problem, as indicated in the template.

Author

shezhou commented Aug 21, 2024

> Hi, thanks for the report. Please provide the data needed to reproduce the problem, as indicated in the template.

Thank you for your reply. Here is the data.
You can ignore dtype=dtype_dict in df = pd.read_json(arg1, dtype=dtype_dict, lines=True).
PRI_Basic.json

Collaborator

ofajardo commented Sep 2, 2024

In order to make the issue reproducible, please provide dtype_dict (and any other information necessary to reproduce it).

Collaborator

ofajardo commented Sep 2, 2024

OK, it seems the issue is that one specific row contains a very long string (of length 1988). Right now pyreadstat writes it as the dta type str, whose maximum length is 2045 bytes (that means ~1020 Python characters when each character takes 2 bytes; fewer if characters take 3 or 4 bytes). It seems there is a way to write the newer strL type, which can hold much longer strings (see here); I can see whether I can implement that in the future. For now, the solution is to avoid writing such long strings; you could, for example, split them across multiple columns.
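The splitting workaround could be sketched roughly as follows. This is not part of pyreadstat: `split_utf8` and `split_field` are hypothetical helper names, the `_1`, `_2` column suffixes are an arbitrary choice, and the code assumes the 2045-byte limit applies to the UTF-8 encoded length of each value. It cuts only between characters, so a multi-byte character is never split in half.

```python
STATA_STR_MAX_BYTES = 2045  # assumed limit of the dta str type, in bytes


def split_utf8(text: str, max_bytes: int = STATA_STR_MAX_BYTES) -> list:
    """Split text into chunks whose UTF-8 encoding fits in max_bytes each.

    Returns an empty list for an empty string. max_bytes must be >= 4 so
    that any single character fits in a chunk.
    """
    chunks, current, current_size = [], [], 0
    for ch in text:
        size = len(ch.encode("utf-8"))
        if current and current_size + size > max_bytes:
            # Adding this character would overflow: close the chunk.
            chunks.append("".join(current))
            current, current_size = [], 0
        current.append(ch)
        current_size += size
    if current:
        chunks.append("".join(current))
    return chunks


def split_field(record: dict, field: str,
                max_bytes: int = STATA_STR_MAX_BYTES) -> dict:
    """Return a copy of record with `field` replaced by field_1, field_2, ..."""
    out = {k: v for k, v in record.items() if k != field}
    pieces = split_utf8(record.get(field) or "", max_bytes)
    for i, chunk in enumerate(pieces, start=1):
        out[f"{field}_{i}"] = chunk
    return out


# Hypothetical record: 1000 CJK characters = 3000 bytes in UTF-8.
record = {"id": 2, "note": "\u4e2d" * 1000}
fixed = split_field(record, "note")
print(sorted(fixed))                                # ['id', 'note_1', 'note_2']
print(len(fixed["note_1"]), len(fixed["note_2"]))   # 681 319
```

Rows with shorter values produce fewer chunk columns, so after building the DataFrame the missing cells in the later columns would need to be filled (for example with empty strings) before writing.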

ofajardo added the bug and enhancement labels and removed the bug label on Sep 2, 2024