Allow lexers to use custom source types #222

Kixiron · 2021-09-01T22:07:51Z

I'm currently using ropey in my LSP where logos is the backing lexer, and I'd like to be able to use a Rope natively within my generated lexer. The Source trait looks perfect, except there's no way I can find to be able to use arbitrary implements with logos::Lexer

The text was updated successfully, but these errors were encountered:

maciejhirsz · 2023-02-26T12:48:36Z

Fixed in #225.

kmicklas · 2023-08-13T19:49:33Z

Unless I'm misunderstanding the Source API, this still isn't possible for non-contiguous types like ropes.

#225 (comment) notes that it might be possible using Ropey's chunks API, but you don't get to choose the size of those. Source::read lets the caller request any sized contiguous Chunk. The docs seem to imply that read should return None only when hitting EOF. Is that not accurate? Can we return None until a small enough Chunk is requested?

jeertmans · 2023-08-14T09:17:44Z

@kmicklas a more recent issue #324 discussed the problem of reader by chunks, which I assume Rope to be the same. Maybe it is worth moving your question there to centralize things?

I don’t know how much related it is

kmicklas · 2023-08-14T19:19:07Z

@jeertmans I think the concern there is a bit different since rope types should generally have no problems with random access.

kmicklas · 2023-08-16T18:48:29Z

I put together a basic crate demoing the possibility of using Ropey and Logos together: https://github.com/kmicklas/logos-ropey

However, I can't publish it on crates.io until #332 and #333 are released.

jeertmans · 2023-08-17T15:09:38Z

Oh that's pretty nice @kmicklas! I think Ropey's Rope is so important that it might be worth to implement that directly within the logos crate, to avoid creating an intermediate struct, and hide this behind some feature like ropey.

What do you think?

kmicklas · 2023-08-17T15:12:40Z

Oh that's pretty nice @kmicklas! I think Ropey's Rope is so important that it might be worth to implement that directly within the logos crate, to avoid creating an intermediate struct, and hide this behind some feature like ropey.

That would be nice! I'm happy to make a PR for that if you think it's a better approach.

jeertmans · 2023-08-17T19:39:50Z

Super! I just merged the two PRs :)

kmicklas · 2023-08-18T19:37:09Z

Okay, I did some testing, and #222 (comment) is definitely a problem. (It's not actually totally trivial to trigger a chunk size greater than 1.)

To support non-contiguous sources like ropes, I can think of three possibilities:

We can alter the generated algorithm to try smaller chunk sizes when a larger one doesn't work. This would likely have some performance impact on contiguous sources, although maybe we can get around it by not ever using None from read to indicate EOF, and instead check EOF ahead of time so that None always means we're just handling non-contiguity.
We can alter the read/Chunk API so that Chunk is an owned value (i.e. [u8; N]) instead of a view into (assumed) contiguous bytes from the input (&'src [u8; N] as it is now). My guess is that N is usually small enough for this to be okay. For example the tests don't trigger an N bigger than 8, which would probably be more efficient to put directly in a word with no indirection anyway. However given a pathological grammar it could involve an arbitrary amount of copying.
We can invert the control of chunk reading to take a closure which gets a local reference to a contiguous chunk. This would allow a rope source to stack allocate a contiguous chunk to copy data into if necessary, but use the underlying data without copying in the common case. This seems like the most general solution that doesn't harm the string case, but I don't understand the generated code well enough to understand if this will easily work.

kmicklas · 2023-08-19T07:49:56Z

I thought of another option that might work well and be easy to implement. Instead of Chunk being a trait provided by the caller, it can be a GAT defined by the source:

type Chunk<'a, const N: usize>: Into<&'a [u8; N]>
    where Self: 'a

The contiguous string sources would just define this as &'a [u8; N]. Rope-like sources could use something like Either<&'a [u8; N], [u8; N]>. (It's like Cow but that won't work here because of ToOwned.)

This avoids any branching or copying in the string case and lets the rope case use borrowed data when possible.

jeertmans · 2023-08-21T16:06:21Z

Nice finding @kmicklas! I think the best is to test that out, and include some benchmarks if possible.

If we can change the current traits without degrading performances on &str and &[u8], that would be super!

flyingmutant · 2024-01-02T09:47:53Z

Is there any particular direction your decided to pursue to make logos work with non-contiguous types, any area where it is possible to help? I am interested in using rope or gap buffer source as well.

jeertmans · 2024-01-02T13:43:05Z

Hello @flyingmutant, well almost nothing was done on that topic, but you can surely help :-)

There is also a list of "future work" that @maciejhirsz mentioned in the last release: https://github.com/maciejhirsz/logos/releases/tag/v0.13.

philliard3 mentioned this issue Sep 29, 2021

Custom Source implementation when using derive macro #225

Merged

maciejhirsz closed this as completed Feb 26, 2023

kmicklas mentioned this issue Aug 13, 2023

Use lifetime GAT for Source::Slice #332

Merged

kmicklas mentioned this issue Aug 15, 2023

Fix ignored source attribute #333

Merged

kmicklas mentioned this issue Aug 22, 2023

Cleanup unused Source features #335

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Allow lexers to use custom source types #222

Allow lexers to use custom source types #222

Kixiron commented Sep 1, 2021

maciejhirsz commented Feb 26, 2023

kmicklas commented Aug 13, 2023

jeertmans commented Aug 14, 2023

kmicklas commented Aug 14, 2023

kmicklas commented Aug 16, 2023

jeertmans commented Aug 17, 2023

kmicklas commented Aug 17, 2023

jeertmans commented Aug 17, 2023

kmicklas commented Aug 18, 2023

kmicklas commented Aug 19, 2023 •

edited

Loading

jeertmans commented Aug 21, 2023

flyingmutant commented Jan 2, 2024

jeertmans commented Jan 2, 2024

Allow lexers to use custom source types #222

Allow lexers to use custom source types #222

Comments

Kixiron commented Sep 1, 2021

maciejhirsz commented Feb 26, 2023

kmicklas commented Aug 13, 2023

jeertmans commented Aug 14, 2023

kmicklas commented Aug 14, 2023

kmicklas commented Aug 16, 2023

jeertmans commented Aug 17, 2023

kmicklas commented Aug 17, 2023

jeertmans commented Aug 17, 2023

kmicklas commented Aug 18, 2023

kmicklas commented Aug 19, 2023 • edited Loading

jeertmans commented Aug 21, 2023

flyingmutant commented Jan 2, 2024

jeertmans commented Jan 2, 2024

kmicklas commented Aug 19, 2023 •

edited

Loading