Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor to remove the need for a PassThrough stream #15

Closed
mcollina opened this issue Jan 13, 2023 · 25 comments · Fixed by #77
Closed

Refactor to remove the need for a PassThrough stream #15

mcollina opened this issue Jan 13, 2023 · 25 comments · Fixed by #77

Comments

@mcollina
Copy link
Member

in fastify-static, we need a PassThrough object to convert Send to an actual Writable:

https://github.com/fastify/fastify-static/blob/master/index.js#L113-L140

I think this module should be refactored to in combination with fastify-static to remove the need for the wrapper completely.

@Uzlopak Uzlopak changed the title Refactor to the move the need for a PassThrought object Refactor to remove the need for a PassThrought object Jan 14, 2023
@Uzlopak Uzlopak changed the title Refactor to remove the need for a PassThrought object Refactor to remove the need for a PassThrough stream Jan 14, 2023
@Uzlopak
Copy link
Contributor

Uzlopak commented Jan 14, 2023

I have no clue what is needed to do it.

@mcollina
Copy link
Member Author

Currently Send() is a stream-like object that don't really quack like a Stream.

It's also a "fake" synchronous API. There are two sets of asynchronous ops:

  1. gathet the stats of the file
  2. start the stream

Both of then happen after a call to pipe().

Essentially we need to refactor it so we split them. Get the metadata and then read the stream only if it's needed (in case or conditional gets).

@climba03003
Copy link
Member

climba03003 commented Jan 18, 2023

@mcollina Just some question before start.

Do you means Duplex instead of Writable? It should be a middle stream between fs.createReadStream and OutgoingMessage.

I assume it is what we would like to achieve?

import http from 'http'
import SendStream from '@fastify/send'
import { pipeline } from 'stream'

const app = http.createServer(function (req, res) {
  // the options include the duplex options
  const stream = new SendStream(req, req.url, { root: './' })
  pipeline(stream, res, function(err) {
    console.log(err)
  })
})

Does it means that we need to inspect on pipe and unpipe event?

graph TD
    A[SendStream] --> |created| B{Event Emitter}
    B --> |ready| C[Waiting for Pipe]
    B --> |pipe| D[Waiting for Ready]
    B --> |unpipe| E{Check Status}
    B --> |error| G[Cleanup and Destory]
    C --> |is pipe| F[Create File Readable]
    D --> |is ready| F
    E --> |flowing| G[Cleanup and Destory]
    E --> |finished| G
    E --> |never flow| C
    F --> |.pipe this| H[Streaming]
    H --> |end| G
Loading

@mcollina
Copy link
Member Author

No, not really.

I think we need something like:

import http from 'http'
import send from '@fastify/send'
import { pipeline } from 'stream'

const app = http.createServer(async function (req, res) {
  // the options include the duplex options
  const { statusCode, headers, stream } = await send(req, req.url, { root: './' })
  res.writeHead(statusCode, headers)
  if (stream) {
    pipeline(stream, res, function(err) {
      console.log(err)
    })
  } else {
    res.end()
  }
})

It might be callback-based instead of promise-based.

@serzero2007
Copy link

serzero2007 commented Jul 30, 2023

No, not really.

I think we need something like:

import http from 'http'
import send from '@fastify/send'
import { pipeline } from 'stream'

const app = http.createServer(async function (req, res) {
  // the options include the duplex options
  const { statusCode, headers, stream } = await send(req, req.url, { root: './' })
  res.writeHead(statusCode, headers)
  if (stream) {
    pipeline(stream, res, function(err) {
      console.log(err)
    })
  } else {
    res.end()
  }
})

It might be callback-based instead of promise-based.

Please check if you are interested in my proof of concept implementation of proposed API here:
https://gist.github.com/serzero2007/c6c48b4c90c4318edcc17b458cf876bd

All imported functions are from this repository. The code is just a draft rewrite of existing code to use new API and not tested at all. Index support is not done yet and also it's not clear how to re-implement events in the new API.

Looking forward to your feedback.

@mcollina
Copy link
Member Author

This is amazing! Would you mind to send a PR?

@serzero2007
Copy link

Well, it will take some time to make PR. It took 2 hours to get POC, but PR will take days.

There are questions we need to discuss first:

  1. The API is changed dramatically. I think it's better to create a separate package with a new API, rather than breaking compatibility with previous API or bundling both APIs in the same package. What do you think?
  2. Events API. We need somehow bring the functionality of it to a new API. Any suggestions?

@Uzlopak
Copy link
Contributor

Uzlopak commented Jul 30, 2023

We use fastify/send for our fastify/static module. And only we are currently using this package. So you are imho free to break the api, as long it works flawless in fastify/static. We would also need to bench the performance. I dont know ad hoc, if we need the events api. Again, it has to work flawless in fastify/static.

@serzero2007
Copy link

Well, it obviously will require changes on the side of fastify/static, because API will have changed. For beginning, I will just drop event api. I suppose it is not a big deal to add it afterwards.

@mcollina
Copy link
Member Author

This will give significant performance improvement to fastify-static, we are definitely happy to make the relevant the changes in that module.

We are not using the events API, so no problems there. We do support the index capability, however I think we could move it to @fastify/static or even drop that altogether.

@serzero2007
Copy link

serzero2007 commented Jul 31, 2023

I have a question. Why here

const etagL = etag.length
const isMatching = parseTokenList(ifNoneMatch, function (match) {
      const mL = match.length

      if (
        (etagL === mL && match === etag) ||
            (etagL > mL && 'W/' + match === etag)
      ) {
        return true
      }
})

do we compare the lengths of strings at all? Could we just compare strings itself? Is it a performance (micro)optimisation? Is it worth it after all? I am going to drop it and just compare strings:

function isIfNoneMatchFailure (req, context) {
    const ifNoneMatch = req.headers['if-none-match']

    if (ifNoneMatch === '*') return true
    if (typeof context.etag !== 'string') return false
    
    function checker(match) {
        if (match === context.etag) return true
        if ('W/' + match === context.etag) return true
        return undefined
    }

    const isMatching = parseTokenList(ifNoneMatch, checker, false)
    return isMatching
}

@Uzlopak
Copy link
Contributor

Uzlopak commented Jul 31, 2023

Yes the length comparison is a perf optimization. It is worth.

@serzero2007
Copy link

serzero2007 commented Jul 31, 2023

It's strange because it was only in one of two places in the code where this kind of things happen. Well, then I will do it in both places:

function isIfMatchPreconditionFailure (req, context) {
  const ifMatch = req.headers['if-match']
  if (!ifMatch || ifMatch === '*') return false

  const etag = context.etag
  const etagL = etag.length
  function checker (match) {
    const mL = match.length
    if (etagL === mL && match === etag) return true
    if (etagL > mL && 'W/' + match === etag) return true
    return undefined
  }

  const isMatching = parseTokenList(ifMatch, checker, false)

  return !isMatching
}

serzero2007@2cf019f

@serzero2007
Copy link

serzero2007 commented Jul 31, 2023

When we return 304, we should

res.removeHeader('Content-Encoding')
res.removeHeader('Content-Language')
res.removeHeader('Content-Length')
res.removeHeader('Content-Range')
res.removeHeader('Content-Type')

but in this API we only return header object, that should be then applied manually. How should I designate in that object, that headers are to be removed? I can set them to undefined or whatever, but then people may accidentally place undefined into setHeader like this res.setHeader('Content-Type', undefined) so undefined will be converted to string and the header will have been set to undefined instead of being removed. Any thoughts?

@mcollina

@Uzlopak
Copy link
Contributor

Uzlopak commented Jul 31, 2023

Yeah, when i think about it, we could do instead of etagL > mL this etagL - mL === 2

@serzero2007
Copy link

serzero2007 commented Jul 31, 2023

@Uzlopak serzero2007@7ec36db

Still working:

test('if-none-match', async function (t) {
  const result1 = await send({ headers: {} }, '/name.txt', { root: fixtures })
  t.strictSame(result1.status, 200)

  const result2a = await send({ headers: { 'if-none-match': result1.headers.ETag } }, '/name.txt', { root: fixtures })
  t.strictSame(result2a.status, 304)

  const result2b = await send({ headers: { 'if-none-match': result1.headers.ETag.slice(2) } }, '/name.txt', { root: fixtures })
  t.strictSame(result2b.status, 304)

  const result3 = await send({ headers: { 'if-none-match': result1.headers.ETag + 'corrupt' } }, '/name.txt', { root: fixtures })
  t.strictSame(result3.status, 200)
})

By the way, why this perf opt is important? Doesn't V8 itself compare strings the smart way by first comparing their lengths?

@serzero2007
Copy link

serzero2007 commented Jul 31, 2023

@Uzlopak

const test_string1 = Array.from(10000, i => `${i}`).join()
const test_string2 = Array.from(10000, i => `${i+1}`).join()

let k;
console.time("isLengthEqual");
for (let i = 0; i < 10000000; i++) {
    const isLengthEqual = test_string1.length === test_string2.length
    k = isLengthEqual
}
console.timeEnd("isLengthEqual");

let m;
console.time("isEqual");
for (let i = 0; i < 10000000; i++) {
    const isEqual = test_string1 === test_string2
    m = isEqual
}
console.timeEnd("isEqual");

Result:

isLengthEqual: 7.506ms
isEqual: 6.395ms

k and m are here to make sure V8 will not optimize everything out and skip comparing altogether.

At least it works in firefox:

isLengthEqual: 1294ms - timer ended        debugger eval code:9:9
isEqual: 1371ms - timer ended              debugger eval code:16:9

The same question: is it worth it or am I doing something wrong?

@Uzlopak
Copy link
Contributor

Uzlopak commented Jul 31, 2023

Well. v8 is smart. I kind of doubt that the length check is made by v8.

Anyhow. in the checker function the length check makes alot of sense, because the string comparison is usually more expensive than the length comparison. And in the checker we have two times string comparisons.

I personally would not think about perf improvements on this detailed level. First rewrite it so that the tests pass (if possible without modifying the tests much). Then we can improve the perf and benchmark it.

@serzero2007
Copy link

serzero2007 commented Jul 31, 2023

@Uzlopak Tests are to be modified to make use of a new API.

An example of test case:

test('if-unmodified-since', async function (t) {
  const result1 = await send({ headers: {} }, '/name.txt', { root: fixtures })

  const lmod = new Date(result1.headers['last-modified'])
  const date = new Date(lmod - 60000).toUTCString()

  const result2 = await send({ headers: { 'if-unmodified-since': date } }, '/name.txt', { root: fixtures })
  t.strictSame(result2.status, 412)

  // TODO: Is it correct?
  const result3 = await send({ headers: { 'if-unmodified-since': "corrupted" } }, '/name.txt', { root: fixtures })
  t.strictSame(result3.status, 200)

  const content1 = await streamToString2(result1.stream)
  const content3 = await streamToString2(result3.stream)

  t.strictSame(content1, "tobi")
  t.strictSame(content3, "tobi")
})

test('extentions', async function (t) {
  const result1 = await send({ headers: {} }, '/name', { root: fixtures, extensions: "txt" })
  t.strictSame(result1.status, 200)

  const result2 = await send({ headers: {} }, '/name', { root: fixtures, extensions: ["dir", "txt", "html"] })
  t.strictSame(result2.status, 200)

  const result3 = await send({ headers: {} }, '/name', { root: fixtures, extensions: ["html"] })
  t.strictSame(result3.status, 200)

  const content1 = await streamToString2(result1.stream)
  const content2 = await streamToString2(result2.stream)
  const content3 = await streamToString2(result3.stream)

  t.strictSame(content1, "tobi")
  t.strictSame(content2, "tobi")
  t.strictSame(content3, "<p>tobi</p>")

  const result4 = await send({ headers: {} }, '/name/', { root: fixtures, extensions: ["dir", "txt", "html"] })
  t.strictSame(result4.status, 404)

  const result5 = await send({ headers: {} }, '/name.html/', { root: fixtures, extensions: ["dir", "txt", "html"] })
  t.strictSame(result5.status, 404)
})

68bf54c

I will try to adapt all tests from the original code, but you will need to double check that everything is ok.

@serzero2007
Copy link

For now I wrote my own tests to get full test coverage. Check it here:
serzero2007@7e11d95

@Uzlopak
Copy link
Contributor

Uzlopak commented Aug 1, 2023

You want to provide a PR so that we can review?

@serzero2007
Copy link

serzero2007 commented Aug 1, 2023

It's a work in progress now. You can review current state at https://github.com/serzero2007/send/tree/proofofconceptnewapi
I can create PR, but I see no sense in it for now. Do you want me to create PR with current code?

@mcollina
Copy link
Member Author

@serzero2007 I only saw this message now. Of course, create a PR!

@gurgunday
Copy link
Member

Wow, somehow I missed it too

It looks interesting

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants