Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Why does pyyaml disallow trailing spaces in block scalars? #121

Open
oulenz opened this issue Jan 16, 2018 · 22 comments · Fixed by argoproj-labs/hera#935
Open

Why does pyyaml disallow trailing spaces in block scalars? #121

oulenz opened this issue Jan 16, 2018 · 22 comments · Fixed by argoproj-labs/hera#935

Comments

@oulenz
Copy link

oulenz commented Jan 16, 2018

We do not permit trailing spaces for block scalars.

Block style (|, <) is overruled with quoted style (") if the string contains trailing spaces. I don't see any reason for this, and the fact that I can't force a certain style (even it means losing information) has caused me a lot of grief.

@perlpunk
Copy link
Member

perlpunk commented Apr 1, 2018

I agree that trailing spaces shouldn't be a reason to disallow block scalars.
Can't say what changes are needed to allow this, though.
If we could integrate https://github.com/yaml/yaml-test-suite and check that parsing the output again returns the same parsing events we could make sure that we don't break anything.

@ingydotnet
Copy link
Member

@oulenz, can you provide a small code sample of where this is causing you grief?

The emitter should never lose information. It has to check that the data can be expressed in the style requested (or make the best (yes, opinionated) choice, when no style is requested).

Trailing whitespace would be collapsed in folded < style, and is not visible in literal |. I can see why this choice was made.

I'd like to see a bit of what you are doing to see if it is worth the trouble to allow trailing spaces in (explicitly requested) literal.

@perlpunk
Copy link
Member

perlpunk commented Apr 2, 2018

@ingydotnet I agree loss of information is not an option.

For literal | block style it's easy to imagine use cases IMHO.
If you have text data in a specific format that requires trailing spaces, you still might want to dump it as a block scalar for readability.

some text format: |
    line 1___
    line 2
    line 3

Folded style is used when you have long lines. Imagine you have a long input line, with words seperated by different amount of spaces.

# _ == trailing spaces
foo: >
  xxxxxxxxxx    xxxxxxxxxx    xxxxxxxxxx    xxxxxxxxxx____
  xxxxxxxxxx    xxxxxxxxxx    xxxxxxxxxx    xxxxxxxxxx____
  xxxxxxxxxx    xxxxxxxxxx    xxxxxxxxxx    xxxxxxxxxx____
  xxxxxxxxxx    xxxxxxxxxx    xxxxxxxxxx    xxxxxxxxxx____

The emitter now tries to break this up into several lines to get below a limit of characters per line.

I think there are only two possibilities to to that, double quoted and folded block.
libyaml, pyyaml and ruamel all emit this as a double quoted scalar, overruling the requested block scalar style:

foo: "xxxxxxxxxx    xxxxxxxxxx    xxxxxxxxxx    xxxxxxxxxx     xxxxxxxxxx    xxxxxxxxxx
  \   xxxxxxxxxx    xxxxxxxxxx     xxxxxxxxxx    xxxxxxxxxx    xxxxxxxxxx    xxxxxxxxxx
  \    xxxxxxxxxx    xxxxxxxxxx    xxxxxxxxxx    xxxxxxxxxx    \n"

Which I think is not very readable. The only advantage is that there are no trailing spaces,.

@perlpunk
Copy link
Member

perlpunk commented Apr 2, 2018

btw, trailing spaces in folded block are not always folded, for example if an empty line follows:

foo: >
  some words___

  more words

@ingydotnet
Copy link
Member

@perlpunk I'm aware of how YAML works in these cases. I was asking @oulenz to show his actual use cases and code so that other solutions might be found.

imho, this is a pretty rare edge case, and it would require patching pyyaml and libyaml, and so it is likely a low priority item.

@perlpunk
Copy link
Member

perlpunk commented Apr 2, 2018

@ingydotnet , I was trying to give examples, to every reader of this issue, where I can understand the wish for block style to be preserved. I was not writing this to imply that you don't know how it works.

@oulenz
Copy link
Author

oulenz commented Apr 3, 2018

Thank you @perlpunk for these examples, they perfectly illustrate what I mean, and thank you both for looking into this.

My use case was that I had a large number of dictionaries that I wanted to store as yaml records, in | block style. Some of these happened to have trailing spaces and were output wrongly, as a double quoted scalar. This was frustrating because

  1. I had to reverse-engineer why this was happening, especially since I couldn't find any reason in the yaml spec why my strings shouldn't be representable in | block style. This really boggles the mind: pyyaml is needlessly deviating from the yaml spec. So this is a bug, that should be fixed.
  2. I was explicitly telling the emitter that I wanted | block style. The way every other Python function that I know of works is that the option I set is obeyed, possibly leading to warnings and/or errors, and that the consequences (like losing information in edge cases) are for me to deal with. This is programming, I should be able to control things. For pyyaml to override my option due to style considerations really is not acceptable.

My 'solution' was to strip trailing spaces before emitting so I would get | block style across the board, so ironically, this led me to lose information I wouldn't have lost otherwise.

@ysaakpr
Copy link

ysaakpr commented Oct 29, 2019

Why there is no traction on this. This seems to be what i was also facing, as @oulenz mentioned, spend hell lot of time identifying why its causing, while the yaml was correctly indented.

@andrewasheridan
Copy link

Identifying that this was the issue I was facing just killed a few hours for me.

At the very least, can pyyaml please throw a warning that says “cannot use style | on content with trailing spaces”

@iameugenejo
Copy link

I wasted a WHOLE DAY because of this bug.. why is this issue still open? It's been years...

@libreliu
Copy link

libreliu commented Sep 3, 2020

Another several hours for the bug..

@Jwink3101
Copy link

Any update on this? I moved to ruamel.yaml for writing to avoid this issue but fall back to pyyaml for reading since CLoader is way (!!!) faster

@cosven
Copy link

cosven commented Nov 23, 2020

Explicit is better than implicit and the behaviour is implicit. 😭

@johnhaire89
Copy link

I'm another person who spent a day trying to work out why my long strings weren't being styled as block literals -.-'

@maragunde93
Copy link

Its happening the same to me, I am trying to modify and Azure pipeline automatically and this behaivor screws the format of:

script: |
echo "this is a test"

@rfsantanna
Copy link

It works for me:

import yaml

def yaml_multiline_string_pipe(dumper, data):
    text_list = [line.rstrip() for line in data.splitlines()]
    fixed_data = "\n".join(text_list)
    if len(text_list) > 1:
        return dumper.represent_scalar('tag:yaml.org,2002:str', fixed_data, style="|")
    return dumper.represent_scalar('tag:yaml.org,2002:str', fixed_data)

yaml.add_representer(str, yaml_multiline_string_pipe)
print(yaml.dump({"multiline":"First line         \r\nSecond line\r\nThird line"}, allow_unicode=True))
multiline: |-
  First line
  Second line
  Third line

@ingydotnet
Copy link
Member

@rfsantanna nice.

I'd like to soon introduce a yaml.Config class for both loading and dumping.
In this case you'd simply be able to provide a mapping of regex to preferred scalar dumping style.

@Jwink3101
Copy link

It works for me:

import yaml

def yaml_multiline_string_pipe(dumper, data):
    text_list = [line.rstrip() for line in data.splitlines()]
    fixed_data = "\n".join(text_list)
    if len(text_list) > 1:
        return dumper.represent_scalar('tag:yaml.org,2002:str', fixed_data, style="|")
    return dumper.represent_scalar('tag:yaml.org,2002:str', fixed_data)

yaml.add_representer(str, yaml_multiline_string_pipe)
print(yaml.dump({"multiline":"First line         \r\nSecond line\r\nThird line"}, allow_unicode=True))
multiline: |-
  First line
  Second line
  Third line

This is not a solution as it does not preserve the spaces. You cannot round-trip this!

@nsheff
Copy link

nsheff commented Aug 9, 2022

I also just spent several hours trying to figure this out. The behavior of the current pyyaml is thus broken for my use case, because I need to preserve trailing spaces on line endings in a literal.

If I am specifying to use a block scalar, then I want to use a block scalar, trailing spaces and all.

@Jwink3101
Copy link

Is there any update to this? Using markdown inside of YAML is broken when you can't have trailing spaces (trailing spaces in markdown before a newline means <br> as opposed to <p>)

@jonesbusy
Copy link

Same issue here. Lost many hours. Didn't find any solution

rhmdnd added a commit to rhmdnd/content that referenced this issue Jul 18, 2023
There is an issue with how pyyaml dumps lines, where if the line ends
with whitespace, styling won't matter.

  yaml/pyyaml#121

This was affecting our ability to generate YAML output where audit
procedures (with many long lines) would get rendered as a single line
with escaped newlines. Instead, we want to render newlines as a block.

This commit parse each line of the data before it is dumped to make sure
trailing whitespace is removed as a workaround for this issue in pyyaml.
rhmdnd added a commit to rhmdnd/content that referenced this issue Jul 21, 2023
There is an issue with how pyyaml dumps lines, where if the line ends
with whitespace, styling won't matter.

  yaml/pyyaml#121

This was affecting our ability to generate YAML output where audit
procedures (with many long lines) would get rendered as a single line
with escaped newlines. Instead, we want to render newlines as a block.

This commit parse each line of the data before it is dumped to make sure
trailing whitespace is removed as a workaround for this issue in pyyaml.
@nitzmahone
Copy link
Member

I think we're all agreed this is an old implementation issue, not a spec issue- I'm willing to spend some time trying to make it work in the next release, but as @perlpunk mentioned earlier, I'd want to integrate the more comprehensive YAML test suite first to have some confidence we didn't break anything in the process.

I've added this issue to the PyYAML 6.1 planning project.

elliotgunton added a commit to argoproj-labs/hera that referenced this issue Jan 22, 2024
**Pull Request Checklist**
- [x] Tests added
- [x] [Good commit messages](https://cbea.ms/git-commit/) and/or PR
title

**Description of PR**
Currently, multiline strings serialize to yaml in a way that is not true
to the original content.

For example, copying [argo workflows' exit-handler
example](https://github.com/argoproj/argo-workflows/blob/main/examples/exit-handler-slack.yaml)

excerpt:
```yaml
      args: [
        "curl -X POST --data-urlencode 'payload={
          \"text\": \"{{workflow.name}} finished\",
          \"blocks\": [
            {
              \"type\": \"section\",
              \"text\": {
                \"type\": \"mrkdwn\",
                \"text\": \"Workflow {{workflow.name}} {{workflow.status}}\",
              }
            }
          ]
        }'
        YOUR_WEBHOOK_URL_HERE"
      ]
```

currently produces:
```yaml
        args:
        - "curl -X POST --data-urlencode 'payload={\n          \"text\": \"{{workflow.name}}\
          \ finished\",\n          \"blocks\": [\n            {\n              \"\
          type\": \"section\",\n              \"text\": {\n                \"type\"\
          : \"mrkdwn\",\n                \"text\": \"Workflow {{workflow.name}} {{workflow.status}}\"\
          ,\n              }\n            }\n          ]\n        }'\n        YOUR_WEBHOOK_URL_HERE\"\
          \n        "
```

whereas this PR produces:
```yaml
        args:
        - |-
          curl -X POST --data-urlencode 'payload={
                    "text": "{{workflow.name}} finished",
                    "blocks": [
                      {
                        "type": "section",
                        "text": {
                          "type": "mrkdwn",
                          "text": "Workflow {{workflow.name}} {{workflow.status}}",
                        }
                      }
                    ]
                  }'
                  YOUR_WEBHOOK_URL_HERE";
```

---

Note that this does not fix
yaml/pyyaml#121 (comment). That
is, the following example will revert back to original formatting due to
the trailing `\n` at the end of the string.
```python
args="""
two
"""
```

I would suggest switching `ruamel.yaml` over pyyaml just generally,
which would fix this issue, but that's probably going to be considered a
breaking change.

In any case is probably more tangential to the root issue above (which
affects both libraries in the same way).

---------

Signed-off-by: DanCardin <ddcardin@gmail.com>
Signed-off-by: DanCardin <DanCardin@users.noreply.github.com>
Co-authored-by: Elliot Gunton <egunton@bloomberg.net>
Co-authored-by: Flaviu Vadan <flaviuvadan@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.