Skip to content

This issue was moved to a discussion.

You can continue the conversation there. Go to discussion →

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Proposal]: UTF8 literal support for nameof() expressions #6235

Closed
1 of 4 tasks
Sergio0694 opened this issue Jun 24, 2022 · 4 comments
Closed
1 of 4 tasks

[Proposal]: UTF8 literal support for nameof() expressions #6235

Sergio0694 opened this issue Jun 24, 2022 · 4 comments

Comments

@Sergio0694
Copy link

Sergio0694 commented Jun 24, 2022

Note: extracted minimal proposal from #184 (comment).

UTF8 literal support for nameof() expressions

  • Proposed
  • Prototype: Not Started
  • Implementation: Not Started
  • Specification: Not Started

Summary and motivation

This proposal is a small follow up to the UTF8 literal feature. There are many scenarios where developers would want to move to passing UTF8 literals to get better performance and not having to transcode the text at runtime, but the way the feature is currently specced, it would mean that replacing all callsites where today developers are using nameof, [CallerMemberName], [CallerArgumentExpression] etc. would not be possible. This would mean that developers would be forced to choose between less error prone code with worse performance (what they have today), or more error prone code (due to hardcoded literals) with faster performance. Having to make a compromise here is not ideal, and this proposal is meant to offer minimal support to solve this issue.

Specifically, the proposal is only to enable nameof expressions to be used as UTF8 arguments. This would allow developers to get robust code by using the nameof operator, and callsites currently using [CallerMemberName] and [CallerArgumentExpression] could just move to nameof instead to still have build time checks for the literals, instead of having hardcoded strings.

Essentially, the proposal is to allow u8 to be appended to a nameof expression, like so:

void Test(int parameter)
{
    var testUtf8 = nameof(Test)u8;
    var parameterUtf8 = nameof(parameter)u8;
}

This would have the same flexibility as nameof today, but with UTF8 support. Roslyn would just lower this exactly the same as it would have had if a literal had been used in place of that nameof expression, with the u8 suffix at the end.

Use case example

As mentioned in #184 (comment), we have a custom, high-performance, reflection-free ETW logger in the Microsoft Store, which exposes a builder pattern to allow callers to construct their structured events to submit, like so (simplified):

public void SomeEvent(string someText, string someOtherText, ...)
{
    using TracingDataBuilder builder = TracingDataBuilder.Create();
 
    // ...
   
    builder.AppendEventTagAndName(SOME_TAG.BAR); // [CallerMemberName]
    builder.AppendWStringKeyValuePair(pSomeText, someTextLength, nameof(someText));
    builder.AppendWStringKeyValuePair(pSomeOtherText, someOtherTextName, nameof(someOtherText));
    builder.AppendWStringKeyValuePair(pSomeId, someIdLength, nameof(someId));
    builder.AppendWStringKeyValuePair(pSomeType, someTypeLength, nameof(someType));
    builder.AppendBoolKeyValuePair(&someBoolParameter, nameof(someBoolParameter));
    builder.AppendBoolKeyValuePair(&someOtherBoolParameter, nameof(someOtherBoolParameter));
    builder.AppendInt32KeyValuePair(&someIntParameter, nameof(someIntParameter));
    
    _ = builder.EventWriteTransfer(_traceLogger, descriptor, null, null);
}

This currently uses Encoding.UTF8 to get the UTF8 data for those event and field names and build the event metadata. This is not ideal for performance, and we'd like to migrate to just using UTF8 literals. The issue with that is that due to the current design, there would be no way to use them without having to fall back to having hardcoded strings everywhere, which is not great.

With the proposed feature instead, we could just drop the attributes and simply do the following:

public void SomeEvent(string someText, string someOtherText, ...)
{
    using TracingDataBuilder builder = TracingDataBuilder.Create();
 
    // ...
   
    builder.AppendEventTagAndName(SOME_TAG.BAR, nameof(SomeEvent)u8);
    builder.AppendWStringKeyValuePair(pSomeText, someTextLength, nameof(someText)u8);
    builder.AppendWStringKeyValuePair(pSomeOtherText, someOtherTextName, nameof(someOtherText)u8);
    builder.AppendWStringKeyValuePair(pSomeId, someIdLength, nameof(someId)u8);
    builder.AppendWStringKeyValuePair(pSomeType, someTypeLength, nameof(someType)u8);
    builder.AppendBoolKeyValuePair(&someBoolParameter, nameof(someBoolParameter)u8);
    builder.AppendBoolKeyValuePair(&someOtherBoolParameter, nameof(someOtherBoolParameter)u8);
    builder.AppendInt32KeyValuePair(&someIntParameter, nameof(someIntParameter)u8);
    
    _ = builder.EventWriteTransfer(_traceLogger, descriptor, null, null);
}

The .NET runtime also has some examples of where this could be useful. For instance, consider this: https://github.com/dotnet/runtime/blob/2a01ceb2d004a125757a4bb95a9341cc283c5afd/src/libraries/System.Reflection.MetadataLoadContext/src/System/Reflection/TypeLoading/General/Utf8Constants.cs#L41-L62. This is exactly the kind of scenario where developers used to hardcode literals before we had the nameof operator, and then switched to Foo => nameof(Foo) when that became available. Having to go back to literals once again just to be able to get UTF8 encoding is not great. This could just be written as Foo => nameof(Foo)u8 instead to avoid the use of hardcoded string literals whenever possible.

Detailed design

The design allows using the u8 suffix on a nameof expression to have it lowered to an UTF8 ReadOnlySpan<byte>. All the same other rules as normal UTF8 literals (see here) apply here. A nameof expression with the u8 suffix would be lowered exactly the same as a literal with the same content of the nameof expression, with the u8 suffix.

The same conversion rules and other general rules around UTF8 literals would apply here as well, with no changes.

Drawbacks

It's a new, although small, bit of syntax, and it would likely need some mapping in the public code analysis APIs as well.

Alternatives

Force developers to go back to hardcoded strings in all of these scenarios. This works, but it's far from ideal.

@teo-tsirpanis
Copy link

The u8 suffix after a call-like expression does not look nice to me. I would prefer either nameof8(x), or making the suffix optional if it is unambiguous that we want the name in UTF-8.

@Sergio0694
Copy link
Author

Sergio0694 commented Jun 24, 2022

The proposal uses u8 as a suffix mostly for consistency, if we consider a nameof expression to be conceptually the same as a string literal. I would personally be fine with omitting the suffix and just allowing a nameof expression as a ReadOnlySpan<byte> parameter, but if I remember correctly, the LDM was in general against having implicit UTF8 conversions, which is why u8 was also made mandatory, so I don't see that happening (and I agree u8 should be explicit).

Integrating some comments from Discord by @sylveon:

maybe be C++ bias but i think nameof(Test)u8 makes sense
since u8 is a s suffix to the string
and if the mental model is nameof(T) -> "T", then appending u8 should give "T"u8

That is exactly what I had in mind too and why it looks consistent to me 🙂

Another issue with eg. nameof8 (or the nameofu8() proposed by @EgorBo): that's effectively a new keyword, so it's automatically -100000 starting points and basically a complete non-starter. The proposed syntax just has the same existing u8 suffix, but after a nameof expression. The nameof operator specifically is equivalent to a literal in that it returns a string constant, so it'd keep the same current "string literal + u8" structure that normal UTF8 string literals have.

@HaloFour
Copy link
Contributor

Feels like a lot of this could be handled through some target typing?

void Foo([CallerMemberName] ReadOnlySpan<byte> memberName = default) {
    ReadOnlySpan<byte> paramName = nameof(memberName);
}

@Sergio0694
Copy link
Author

This was something I thought about too, but after talking with @333fred and @jaredpar, they mentioned that those attributes are just not specced to support UTF8 spans, so enabling that would require more complex changes that they likely wouldn't consider to be worth it. Additionally, there's the point about the LDM deciding to always use u8 explicitly, which would make this a somewhat weird scenario (there's an implicit literal passed here but the fact the resulting data is UTF8 is not specified anywhere). That's where the current proposal comes from: it's simply an extension of the existing nameof expression, it uses the same "literal + u8" explicit suffix structure to be consistent, and it would allow supporting all of these scenarios with minimal changes. Sure, you wouldn't have the implicit caller attributes, but at least there would be no need to use hardcoded literals anymore in any of these scenarios, and by only making a change to the nameof syntax, and nothing else.

@dotnet dotnet locked and limited conversation to collaborators Jun 24, 2022
@333fred 333fred converted this issue into discussion #6238 Jun 24, 2022

This issue was moved to a discussion.

You can continue the conversation there. Go to discussion →

Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants