column.replace() may fail if the resulting string column is too big #1694

Closed
st-pasha opened this issue Mar 1, 2019 · 0 comments · Fixed by #1696

st-pasha commented Mar 1, 2019

>>> import datatable as dt
>>> DT = dt.Frame(a=["a"] * 10000000)
>>> DT.replace("a", "A" * 250)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AssertionError: Assertion 'sb.size() == (mb.get_element<T>(n) & ~GETNA<T>())' failed in "c/column_string.cc", line 38

The problem here is that the total size of the string data in the resulting column exceeds the int32 limit. The proper course of action in this case is to create a STR64 column instead.
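
The arithmetic behind the report can be checked with a short sketch. It assumes one byte per character for the ASCII replacement string and treats 2**31 - 1 as the largest representable offset; the variable names are illustrative and not taken from datatable.

```python
# Largest value a signed 32-bit integer can hold.
INT32_MAX = 2**31 - 1

n_rows = 10_000_000      # rows in the frame from the report
replacement_len = 250    # len("A" * 250), one byte per ASCII character

# Total bytes of string data after the replacement.
total_bytes = n_rows * replacement_len
needs_64bit = total_bytes > INT32_MAX

print(total_bytes, INT32_MAX, needs_64bit)
# prints: 2500000000 2147483647 True
```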

st-pasha added the bug and low priority labels Mar 1, 2019
st-pasha added this to the Release 0.9.0 milestone Mar 1, 2019
st-pasha self-assigned this Mar 1, 2019
st-pasha added a commit that referenced this issue Mar 2, 2019
Direct construction of StringColumn<T> objects is now prohibited (except when a blank column is created). Instead, we use the new_string_column() function, which returns either StringColumn<uint32_t>* or StringColumn<uint64_t>*, whichever is more appropriate.

This function also takes care of "fixing" the offsets array in case it overflowed during construction. This works provided that no individual string in the column exceeds 2 GB in size.
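
One way such a repair could work is sketched below in Python. This is an illustration of the idea only, not datatable's C++ implementation: `widen_offsets` and `pick_offset_width` are hypothetical helpers, the offsets are assumed to be cumulative end positions that wrapped modulo 2**32, and NA flags are ignored. Because each string is assumed to be shorter than 2**32 bytes (the 2 GB bound above is stricter), the true length of every string can be recovered from the difference of neighbouring wrapped offsets.

```python
UINT32_MOD = 2**32       # 32-bit offsets are assumed to wrap modulo this value
INT32_MAX = 2**31 - 1    # largest total size assumed to fit a 32-bit column

def widen_offsets(wrapped):
    """Rebuild true cumulative end offsets from 32-bit values that wrapped.

    Hypothetical helper: assumes each individual string is shorter than
    2**32 bytes, so every step between neighbours is recoverable mod 2**32.
    """
    fixed = []
    prev_wrapped = 0
    total = 0
    for off in wrapped:
        # Python's % always returns a non-negative result here.
        length = (off - prev_wrapped) % UINT32_MOD
        total += length
        fixed.append(total)
        prev_wrapped = off
    return fixed

def pick_offset_width(total_bytes):
    """Return 32 or 64 depending on whether the data fits 32-bit offsets."""
    return 32 if total_bytes <= INT32_MAX else 64

# Three strings of 2.0, 2.0 and 0.5 billion bytes; the last offset wraps.
true_offsets = [2_000_000_000, 4_000_000_000, 4_500_000_000]
wrapped = [o % UINT32_MOD for o in true_offsets]

recovered = widen_offsets(wrapped)
print(recovered == true_offsets, pick_offset_width(recovered[-1]))
# prints: True 64
```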

Closes #1694