Allow developer to control FpmHandler timeout #770

Closed
wants to merge 2 commits into from

@deleugpn (Member) commented Oct 8, 2020

Problem

Bref currently doesn't support APIs that take longer than 30 seconds, even when running behind an Application Load Balancer, which doesn't have the same restriction as API Gateway.

Diagnostics

  1. Lambda Timeout: specified on the Lambda resource (SAM/Serverless). AWS currently caps it at 15 minutes. I have set it to 180 seconds to support APIs that take up to 3 minutes.

  2. PHP Max Execution Time: an ini setting that Bref sets to 27 seconds by default[1]. We can increase this value by providing our own ini configuration[2].

  3. Unix Socket Read/Write Timeout: the focus of this pull request. Currently, the Unix socket has a timeout of 30 seconds[3]. Any request that reaches the 30-second mark fails with the following exception:

Error communicating with PHP-FPM to read the HTTP response. 
A root cause of this can be that the Lambda (or PHP) timed out, 
for example when trying to connect to a remote API or database, if this happens continuously check for those!
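To make the diagnosis concrete: a request ends at whichever of the three limits above is reached first, so raising only one of them is not enough. The sketch below is a hypothetical helper (`bindingLimit()` is not part of Bref) that names the binding limit, using the values from this PR. One caveat, per the PHP manual: on Linux, `max_execution_time` does not count time spent in system calls, stream operations or database queries, so for a request that mostly waits on a remote API it may never fire; the sketch simply treats all three as flat limits.

```php
<?php

/**
 * Hypothetical helper (not part of Bref): given the three limits listed
 * above, in seconds, return the name of the one that ends a request first.
 */
function bindingLimit(int $lambdaTimeout, int $maxExecutionTime, int $socketTimeout): string
{
    $limits = [
        'Lambda timeout' => $lambdaTimeout,
        // In php.ini, max_execution_time = 0 means "no limit".
        'max_execution_time' => $maxExecutionTime === 0 ? PHP_INT_MAX : $maxExecutionTime,
        'FPM socket timeout' => $socketTimeout,
    ];
    asort($limits); // sort ascending by value, keeping the keys
    return array_key_first($limits);
}

echo bindingLimit(180, 27, 30), PHP_EOL;  // max_execution_time
echo bindingLimit(180, 200, 30), PHP_EOL; // FPM socket timeout
```

The second call is the situation described here: the Lambda timeout and `max_execution_time` have been raised, but the hard-coded 30-second socket timeout still wins.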

Solutions

To support requests that take longer than 30 seconds, we have a few options.

  • Environment variable. Bref could expose an environment variable that controls the Unix socket timeout. Ideally, AWS Lambda would provide a standard environment variable containing the Lambda timeout (item 1) that we could read and reuse; unfortunately, AWS does not offer one. Although a Bref-specific variable could solve the issue, I don't know if it's fair to ask Bref to expose yet another environment variable. As a user, I'm not against this option, but I sympathize that it's not ideal for Bref's maintenance.

  • Allow users to set the timeout via a setter. This pull request adds that ability. It is, however, a setter on an @internal class, which doesn't make much sense because Bref itself doesn't need this setter. The fact that this class is internal is a strong argument for declining this change.

  • Users can write their own FpmHandler. This option requires users to copy and paste the entire FpmHandler class, while still being subject to the same issues as the previous option.
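For comparison, the environment-variable option could look roughly like the sketch below. `BREF_FPM_SOCKET_TIMEOUT_MS` and `socketTimeoutFromEnv()` are hypothetical names made up for illustration; Bref does not read such a variable. The default mirrors the current 30-second socket timeout, and malformed values fall back to it rather than failing at startup.

```php
<?php

// Hypothetical variable name: Bref does not read this.
const TIMEOUT_ENV_VAR = 'BREF_FPM_SOCKET_TIMEOUT_MS';
// The current hard-coded Unix socket timeout: 30 seconds.
const DEFAULT_SOCKET_TIMEOUT_MS = 30000;

function socketTimeoutFromEnv(): int
{
    $raw = getenv(TIMEOUT_ENV_VAR);
    if ($raw === false || $raw === '') {
        return DEFAULT_SOCKET_TIMEOUT_MS;
    }
    // Accept only positive integers; anything else keeps the default.
    $value = filter_var($raw, FILTER_VALIDATE_INT, ['options' => ['min_range' => 1]]);
    return $value === false ? DEFAULT_SOCKET_TIMEOUT_MS : $value;
}

putenv(TIMEOUT_ENV_VAR . '=180000');
echo socketTimeoutFromEnv(), PHP_EOL; // 180000
```

In practice the value would be set per function in the SAM/Serverless template, next to the Lambda timeout it is meant to match.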

Usage

Where would users be able to call the setReadWriteTimeout method? We would have to combine two layers: FPM and Function. The FPM layer brings all the files needed to get PHP-FPM working; the Function layer lets us return our own handler[4].
The final result looks like this:

In the template, the Function layer must come last, because it overwrites files brought in by the previous layer[5]:

      Layers:
        - !Sub "arn:aws:lambda:${AWS::Region}:209497400698:layer:php-74-fpm:13"
        - !Sub "arn:aws:lambda:${AWS::Region}:209497400698:layer:php-74:13"

Our Handler would point to a Function-layer handler file:

    Handler: src/MyFpmHandler.php

The handler file could instantiate and configure the FpmHandler object:

<?php

use Bref\Event\Http\FpmHandler;

$handler = new FpmHandler(__DIR__ . '/index.php');

// 180000 ms = 180 seconds, matching the Lambda timeout from item 1
$handler->setSocketReadWriteTimeout(180000)->start();

return $handler;
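The `return $handler;` line relies on a plain PHP feature: `require` evaluates to whatever the included file returns. Presumably this is how the Function layer picks up a custom handler (see [4]), but the sketch below does not use Bref at all; `DemoHandler` and its `setTimeout()` method are hypothetical stand-ins, not FpmHandler's API.

```php
<?php

// Hypothetical stand-in for a handler object; not Bref's FpmHandler.
final class DemoHandler
{
    public int $timeoutMs = 30000;

    public function setTimeout(int $ms): self
    {
        $this->timeoutMs = $ms;
        return $this; // fluent, like the chained call in the handler file above
    }
}

// Write a tiny "handler file" that configures an object and returns it.
$handlerFile = tempnam(sys_get_temp_dir(), 'handler');
file_put_contents($handlerFile, '<?php return (new DemoHandler())->setTimeout(180000);');

// require evaluates to the included file's return value.
$handler = require $handlerFile;
unlink($handlerFile);

echo $handler->timeoutMs, PHP_EOL; // 180000
```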

Caveat / Gotcha

1- The Function layer also overwrites the php.ini from the FPM layer. We can still patch that inside our projects, but it's something users would need to keep in mind.
2- Changes to Function/bootstrap.php could theoretically affect this setup negatively. I say theoretically because we still respect the Function layer's contract: we return a valid handler.
3- Changes to Fpm/bootstrap would not take effect for people using a setup like this. This one is less theoretical: any improvement Bref makes to its FPM bootstrap would never reach them.
4- The Lambda handler is no longer index.php, and the custom handler has to instantiate FpmHandler with the location of index.php.
5- FpmHandler is internal.

Conclusion

While writing all of this, I started to think that this is not a good idea for Bref. I will create the PR anyway, as it could be useful information for future users even if it gets declined. I think this PR could be a good place to get the discussion rolling on:

1- Should Bref support APIs that take longer than 30 seconds behind ALB?
2- Are there any alternatives that better fit Bref philosophy?
3- Are there ways to mitigate the caveats / gotchas?


Reference

[1]: https://github.com/brefphp/bref/blob/master/runtime/layers/fpm/php.ini#L41
[2]: https://bref.sh/docs/environment/php.html#phpini
[3]: https://github.com/brefphp/bref/blob/master/src/Event/Http/FpmHandler.php#L81
[4]: https://github.com/brefphp/bref/pull/694
[5]: https://docs.aws.amazon.com/lambda/latest/dg/configuration-layers.html ("Your function can access the content of the layer during execution in the /opt directory. Layers are applied in the order that's specified, merging any folders with the same name. If the same file appears in multiple layers, the version in the last applied layer is used.")

@deleugpn (Member, Author) commented Oct 9, 2020

Closing this as it's a bad solution anyway.

@deleugpn deleugpn closed this Oct 9, 2020
@deleugpn deleugpn deleted the timeout branch November 16, 2021 19:46
mnapoli added a commit that referenced this pull request Jan 4, 2022
When Lambda times out with the PHP-FPM layer, the logs written by the PHP script are never flushed to stderr by PHP-FPM. That means they never reach CloudWatch, which makes timeouts really hard to debug.

With this change, Bref waits for the FPM response until 1 second before the actual Lambda timeout (via a connection timeout on the FastCGI connection).

If Bref reaches that point, it will ask PHP-FPM to gracefully restart the PHP-FPM worker, which:

- flushes the logs (logs end up in CloudWatch, which is great)
- restarts a clean FPM worker, without doing a full FPM restart (which may take longer)

Follow up of #770, #772, #895

May address some of #862

Note: this does not change anything for the Function layer (it only affects FPM). It also does not show a full stack trace of the place in the code where the timeout happens (#895 did). Still, it's an improvement over the current status.
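The timeout rule in this commit message ("1 second before the actual Lambda timeout") amounts to a small calculation. A minimal sketch, where `fastCgiTimeoutMs()` is a hypothetical helper and not Bref's actual code; its input is the remaining invocation time in milliseconds, as the Lambda context reports it:

```php
<?php

// Safety margin from the commit message: stop waiting 1 second before the Lambda deadline.
const SAFETY_MARGIN_MS = 1000;

/**
 * Hypothetical helper (not Bref's actual code): how long to wait for the
 * FastCGI response, given the remaining invocation time in milliseconds.
 * Clamped to at least 1 ms so the result is never zero or negative.
 */
function fastCgiTimeoutMs(int $remainingTimeMs): int
{
    return max($remainingTimeMs - SAFETY_MARGIN_MS, 1);
}

echo fastCgiTimeoutMs(180000), PHP_EOL; // 179000
echo fastCgiTimeoutMs(500), PHP_EOL;    // 1
```

The one-second margin is what leaves Bref time to restart the FPM worker and flush the logs before Lambda itself kills the invocation.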
@mnapoli (Member) commented Jan 4, 2022

Do you remember why it was a bad solution? I followed the same path and managed to get something working: #1133

@deleugpn (Member, Author) commented Jan 4, 2022

The setter was bad because it required users to write their own bootstrap.php to modify the timeout. #1133 uses the Lambda context instead, which offers a much better user experience.
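What using the context buys is that the timeout is derived per invocation instead of configured by hand. The Lambda Runtime API delivers each invocation's deadline in the `Lambda-Runtime-Deadline-Ms` header (Unix epoch milliseconds), so a runtime can compute the remaining time itself. A minimal sketch that assumes nothing about how #1133 actually implements this; `remainingTimeMs()` is a hypothetical name:

```php
<?php

/**
 * Hypothetical helper: remaining invocation time in milliseconds, computed
 * from the invocation deadline (Unix epoch milliseconds, as carried by the
 * Lambda Runtime API's Lambda-Runtime-Deadline-Ms header).
 * $nowMs is injectable so the function is easy to test.
 */
function remainingTimeMs(int $deadlineMs, ?int $nowMs = null): int
{
    $nowMs ??= (int) floor(microtime(true) * 1000);
    return max($deadlineMs - $nowMs, 0);
}

// A function with a 180 s timeout, invoked at t = 1700000000000 ms:
$invokedAt = 1700000000000;
$deadline = $invokedAt + 180000;
echo remainingTimeMs($deadline, $invokedAt), PHP_EOL;          // 180000
echo remainingTimeMs($deadline, $invokedAt + 179500), PHP_EOL; // 500
```

Because the deadline already reflects the function's configured timeout, raising the Lambda timeout in the template is enough; no extra setting has to be kept in sync.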

mnapoli added a commit that referenced this pull request Feb 14, 2023