Constant string concatenation #546

Open
Llewyllen opened this issue Jul 2, 2024 · 4 comments


Llewyllen commented Jul 2, 2024

The following valid C99 code

char test()
{
  char* tmp = "\07""7";
  return tmp[0];
}

is parsed incorrectly: it yields a c_ast.Constant object with value '\077', which is wrong. The same happens with hexadecimal escapes.

The easy solution is to modify CParser.p_unified_string_literal by replacing
p[1].value = p[1].value[:-1] + p[2][1:]
with
p[1].value = p[1].value + p[2]

because simply removing the double quotes is not a good idea. The modification would return a value of '\07""7', which is better but needs to be parsed to get each character.
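
The difference between the two assignments can be reproduced with plain Python strings, without pycparser. This is only a sketch: the variable names are illustrative, and it assumes the constant's value is stored as C source text, quotes included, as the assignments above suggest.

```python
# Two adjacent literals, as C source text with their quotes (assumed storage form).
left = r'"\07"'
right = r'"7"'

# Current behavior described above: drop the closing quote of the left
# literal and the opening quote of the right one, then join.
merged_current = left[:-1] + right[1:]

# Proposed behavior: keep both literals verbatim, side by side.
merged_proposed = left + right

print(merged_current)   # "\077"
print(merged_proposed)  # "\07""7"
```

In the first result the escape \07 and the following 7 have fused into the single escape \077; in the second the boundary between the two literals is still visible.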

Another solution would be to store the value as a list of strings, but that would have a much larger impact on other parts of the code (such as the generator).

Owner

eliben commented Jul 3, 2024

I don't understand the issue. The following C program prints abcxyz, according to the standard:

#include <stdio.h>

int main() {
  char* str = "abc""xyz";
  printf("%s\n", str);
  return 0;
}

Can you clarify what pycparser is doing wrong, in your opinion?

Author

Llewyllen commented Jul 3, 2024

For octal:
"\07""7" is a 3-byte string composed of 0x07 (octal value 7), 0x37 (character '7') and 0x00 (string terminator)
"\077" is a 2-byte string composed of 0x3F (octal value 77) and 0x00

For hexadecimal:
"\x7""7" is a 3-byte string composed of 0x07, 0x37 and 0x00
"\x77" is a 2-byte string composed of 0x77 and 0x00

So if you simply remove consecutive double quotes (which is what PyCParser does), you get the wrong value:

char test1()
{
  char* tmp = "\07""7";
  return tmp[0];
}

char test2()
{
  char* tmp = "\077";
  return tmp[0];
}

These two functions do not return the same value: the first returns 0x07, the second returns 0x3F.
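
The byte values above follow from the order of the C translation phases: escape sequences are converted within each literal first, and adjacent literals are concatenated afterwards. The sketch below imitates that order in Python. The helper names decode_literal and decode_concatenated are hypothetical, not part of pycparser, and it handles only narrow string literals; masking out-of-range escapes to one byte is an assumption made for simplicity.

```python
import re

# One escape sequence in C source text: up to three octal digits,
# \x followed by hex digits, or a single-character escape.
_ESCAPE = re.compile(r'\\([0-7]{1,3}|x[0-9A-Fa-f]+|.)')
_SIMPLE = {'a': 7, 'b': 8, 'f': 12, 'n': 10, 'r': 13, 't': 9, 'v': 11,
           '\\': 92, "'": 39, '"': 34, '?': 63}


def decode_literal(literal):
    """Decode one quoted narrow literal to bytes (hypothetical helper)."""
    body = literal[1:-1]  # strip the surrounding double quotes
    out = bytearray()
    pos = 0
    while pos < len(body):
        m = _ESCAPE.match(body, pos)
        if m is None:
            # Ordinary character; UTF-8 encoding is an assumption.
            out += body[pos].encode('utf-8')
            pos += 1
            continue
        esc = m.group(1)
        if esc[0] == 'x':
            out.append(int(esc[1:], 16) & 0xFF)  # assumption: mask to one byte
        elif esc[0] in '01234567':
            out.append(int(esc, 8) & 0xFF)       # assumption: mask to one byte
        elif esc in _SIMPLE:
            out.append(_SIMPLE[esc])
        else:
            raise ValueError('unknown escape: \\' + esc)
        pos = m.end()
    return bytes(out)


def decode_concatenated(literals):
    """Decode each literal separately, then join and add the terminating NUL."""
    return b''.join(decode_literal(lit) for lit in literals) + b'\x00'


print(decode_concatenated([r'"\07"', r'"7"']).hex())  # 073700
print(decode_concatenated([r'"\077"']).hex())         # 3f00
```

Because each literal is decoded before joining, the 7 in the second literal can never be absorbed into the escape that ends the first one.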

Owner

eliben commented Jul 3, 2024

Ah, so it's specific to octal and hex, then...
PR to fix welcome, though it has to handle all cases of string literal concatenation properly

Author

Llewyllen commented Jul 3, 2024

As I said, there are not that many solutions:

  • keep a list of strings, but this affects other parts of PyCParser and might affect people using PyCParser
  • keep the double quotes, but this might affect people using PyCParser

so I won't do a PR, as there is no ideal solution
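
For what it is worth, the two options are convertible into each other: a value that keeps the double quotes can be split back into a list of literals after the fact. A rough sketch, assuming the value holds only quoted narrow literals with optional whitespace between them; split_literals is a hypothetical helper, not something PyCParser provides.

```python
import re

# One quoted literal: an opening quote, then any mix of escaped characters
# and characters that are neither a quote nor a backslash, then a closing quote.
_LITERAL = re.compile(r'"(?:\\.|[^"\\])*"')


def split_literals(value):
    """Split a kept-quotes value into its individual literals (hypothetical helper)."""
    return _LITERAL.findall(value)


parts = split_literals(r'"\07""7"')
print(parts)
```

Escaped quotes inside a literal are consumed by the `\\.` branch of the pattern, so they do not end the literal early.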

Well, I did create a PR, not sure it will pass the tests (but it works for my needs)

From what I saw, it will not pass the test_unified_string_literals test, but then, this test is rather wrong because string concatenation is not as simple as removing consecutive double quotes.

I could add the test

d7 = self.get_decl_init(r'char* s = "\07" "7";')
self.assertNotEqual(d7, ['Constant', 'string', r'"\077"'])

and the current version would fail

I just saw that p_unified_wstring_literal has the same problem, but I won't put my hand in the widechar trap
