Explore curl support #1133

Closed
wants to merge 15 commits into from

adamziel
Collaborator

@adamziel adamziel commented Mar 22, 2024

What is this PR doing?

Explores building PHP with libcurl support

CURL builds, PHP builds with the --with-curl flag, curl_init() etc run as expected.

However, running the following PHP snippet fails:

<?php 
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, 'http://wordpress.org');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
$output = curl_exec($ch);
var_dump($output);
var_dump(curl_error($ch));
curl_close($ch);

[Screenshot: CleanShot 2024-03-22 at 18 17 12@2x]

Reproduction link: http://localhost:5400/website-server/?php=8.0&wp=6.4&storage=none&php-extension-bundle=kitchen-sink&url=/test-curl.php

Curl likely runs fork() internally, similarly to PHP's proc_open(). Getting it to work in Playground will require patching the curl source code to remove that fork() call and, likely, replace it with a JavaScript function call, similarly to the proc_open() patch.

To rebuild curl, run:

cd packages/php-wasm/compile
rm -rf libcurl/dist
make libcurl
cd ../../../
nx reset; npm run recompile:php:web:kitchen-sink:8.0

cc @mho22 – I spent an hour here just to get to the first roadblock. I won't be able to spend more time here for now – you're more than welcome to take over. I'd love to see a functional CURL extension!

Related resources

@adamziel adamziel requested a review from a team as a code owner March 22, 2024 17:22
@adamziel adamziel changed the base branch from ssl-network-bridge to trunk March 22, 2024 17:22
@adamziel adamziel changed the title Curl support Explore curl support Mar 22, 2024
@bgrgicak
Collaborator

File descriptors, fork and NTLM
An application that uses libcurl and invokes fork() gets all file
descriptors duplicated in the child process, including the ones libcurl
created.
libcurl itself uses fork() and execl() if told to use the
CURLAUTH_NTLM_WB authentication method which then invokes the helper
command in a child process with file descriptors duplicated. Make sure that
only the trusted and reliable helper program is invoked!

https://github.com/curl/curl/blob/647e86a3efe1eea7a2a456c009cfe1eb55fe48eb/docs/libcurl/libcurl-security.md?plain=1#L452C1-L462C1

@bgrgicak
Collaborator

bgrgicak commented Apr 1, 2024

NTLM and NTLM_WB are both set to no in PHP info, so I don't think this is caused by fork().

The "thread failed to start" message is documented in curl. We could try disabling AsynchDNS to see if that resolves the issue.

@bgrgicak
Collaborator

bgrgicak commented Apr 1, 2024

Disabling AsynchDNS resolved the thread failed to start error.

Now the test request times out and, from the sound of my fans, it keeps doing something in the background. I haven't debugged it.

@bgrgicak
Collaborator

bgrgicak commented Apr 1, 2024

The verbose output has some insights:

* Trying 172.29.1.0:80...
* Could not set TCP_NODELAY: Protocol not available
* Connection timed out after 10000 milliseconds
* Closing connection 0
bool(false)
string(45) "Connection timed out after 10000 milliseconds"

I see that we use TCP_NODELAY in PHP-WASM, but I don't know where this is coming from: TCP_NODELAY: Protocol not available

@bgrgicak
Collaborator

bgrgicak commented Apr 1, 2024

curl_setopt($ch, CURLOPT_TCP_NODELAY, 0); resolves the TCP_NODELAY error. Now I'm back to timeouts.

@bgrgicak
Collaborator

bgrgicak commented Apr 1, 2024

I need to wrap up now. This is what I found today:

  • NTLM and NTLM_WB are disabled, so they won't trigger fork()
  • AsynchDNS started a new thread, and that was resolved by disabling it
  • TCP_NODELAY is enabled by default but doesn't work (Protocol not available). Disabling it in the request works, but I'm not sure if it's required
  • Requests are now timing out without any errors. I assume that the request isn't properly sent to WASM, or that the response isn't properly returned, but I wasn't able to debug this part today.

@mho22
Contributor

mho22 commented Apr 2, 2024

I attempted another approach on my end by trying to run php-wasm/node with curl. It does appear in my modules list when I run it. I created a file named curl.php:

<?php

$ch = curl_init();

curl_setopt( $ch, CURLOPT_URL, 'http://wordpress.org' );
curl_setopt( $ch, CURLOPT_RETURNTRANSFER, 1 );

var_dump( curl_version() );
var_dump( curl_getinfo( $ch ) );

$output = curl_exec( $ch );
var_dump( $output );
var_dump( curl_error( $ch ) );
curl_close( $ch );

I still get the same error of course :

bool(false)
string(37) "getaddrinfo() thread failed to start\n"

So I tried to make a comparison with the built-in php, where no error occurs when running php curl.php.

curl_version() in php-wasm :

["ssl_version"]=>
 string(0) ""
["libz_version"]=>
 string(0) ""

while curl_version() in PHP 8.3 :

["ssl_version"]=>
string(13) "OpenSSL/3.1.4"
["libz_version"]=>
string(5) "1.3.1"

Additionally, the following information is present in PHP 8.3 but missing in php-wasm when running curl_getinfo($ch):

["effective_method"]=>
 string(3) "GET"
["capath"]=>
 string(14) "/etc/ssl/certs"
["cainfo"]=>
 string(17) "/etc/ssl/cert.pem"

I'm not sure if this information could be helpful, but it's something I noticed.

@bgrgicak
Collaborator

bgrgicak commented Apr 2, 2024

Thank you @mho22! I have an hour now and can take a look at it.

@bgrgicak
Collaborator

bgrgicak commented Apr 2, 2024

It looks like this will take some effort to find the correct combination of flags and link all required libraries.
For example, the scp protocol which I assume we need, requires libssh. We need to add libssh, build it, and link it.

@bgrgicak
Collaborator

bgrgicak commented Apr 2, 2024

@mho22 feel free to take over, I'm not sure if I will have time to work more on this.

@adamziel
Collaborator Author

adamziel commented Apr 2, 2024

scp protocol which I assume we need

I only had the http:// and https:// support in mind for the first iteration here. That's about what the browser can support anyway. Anything beyond that would make a great follow-up effort, but I wouldn't block v1 on it.

@mho22
Contributor

mho22 commented Apr 2, 2024

I don't know why it produces an error here but :

cd packages/php-wasm/compile
rm -rf libcurl/dist
make libcurl

returns :

#14 17.84   CC       ../lib/curl-nonblock.o
#14 17.91   CC       ../lib/curl-warnless.o
#14 17.98   CC       ../lib/curl-curl_ctype.o
#14 18.05   CCLD     curl
#14 18.15 wasm-ld: error: duplicate symbol: curlx_strtoofft
#14 18.15 >>> defined in ../lib/curl-strtoofft.o
#14 18.15 >>> defined in ../lib/.libs/libcurl.a(libcurl_la-strtoofft.o)
#14 18.15
#14 18.15 wasm-ld: error: duplicate symbol: curlx_nonblock
#14 18.15 >>> defined in ../lib/curl-nonblock.o
#14 18.15 >>> defined in ../lib/.libs/libcurl.a(libcurl_la-nonblock.o)
...

The duplicate symbol errors prevent the script from finishing successfully.

@adamziel
Collaborator Author

adamziel commented Apr 2, 2024

I think that's fine, at that point libcurl.a is already created in the filesystem. This is why I put || true in this line:

RUN source /root/emsdk/emsdk_env.sh && EMCC_SKIP="-lc -lz -lcurl" EMCC_FLAGS="-sSIDE_MODULE" emmake make || true

It would be useful to have a comment in place to document that behavior.

@mho22
Contributor

mho22 commented Apr 2, 2024

@adamziel Ok thank you. Could it be possible that curl has no ssl and libz even if this portion of code added them [ or at least openssl ] :

RUN CPPFLAGS="-I/root/lib/include " \
    LDFLAGS="-L/root/lib/lib " \
    PKG_CONFIG_PATH=$PKG_CONFIG_PATH \
    source /root/emsdk/emsdk_env.sh && \
    emconfigure ./configure \
        --build i386-pc-linux-gnu \
        --target wasm32-unknown-emscripten \
        --prefix=/root/install/ \
        --disable-shared \
        --enable-static \
        --with-openssl \
        --enable-https \
        --enable-http

I suspect curl does not properly load openssl and zlib, judging by the output of var_dump( curl_version() );

I currently don't have the tools to investigate this. But I should try :

  • I suppose make libcurl will run the script from compile/Makefile of course.
  • It will run base-image libz and libopenssl scripts before all.
  • create a dist directory dist/root/lib in libcurl
  • run the libcurl/Dockerfile and return the different resulting directories curl-7.69.1/libs/.libs -> libcurl/dist/root/lib/lib and curl-7.69.1/include -> ./libcurl/dist/root/lib/include.

This assumes that the Dockerfile script runs correctly. We can consider having curl with openssl [ openssl having zlib ].

  • running npm run recompile:php:node:8.0 should then add curl to php thanks to this :
# Add curl if needed
RUN if [ "$WITH_CURL" = "yes" ]; \
	then \
		echo -n ' --with-curl=/root/lib ' >> /root/.php-configure-flags; \
		echo -n ' /root/lib/lib/libcurl.a' >> /root/.emcc-php-wasm-sources; \
	fi;

And in fact if we display phpinfo() , curl exists. But something is missing between php-wasm phpinfo() and php phpinfo() :

php-wasm phpinfo() :

curl

cURL Information => 7.69.1
Age => 5
IPv6 => No
libz => No
NTLM => No
SSL => No
TLS-SRP => No
HTTP2 => No
HTTPS_PROXY => No
Host => i386-pc-linux-gnu

curl.cainfo => no value => no value

php8.3 phpinfo()

curl

cURL Information => 8.6.0
Age => 10
IPv6 => Yes
libz => Yes
NTLM => Yes
SSL => Yes
TLS-SRP => Yes
HTTP2 => Yes
HTTPS_PROXY => Yes
ALTSVC => Yes
HTTP3 => No
UNICODE => No
ZSTD => No
HSTS => Yes
GSASL => No
Protocols => ftps, gophers, https, imaps, ldap, ldaps, mqtt,pop3s, smb, smbs, smtps
Host => Darwin
SSL Version => OpenSSL/3.1.4
ZLib Version => 1.3.1

curl.cainfo => .../config/php/cacert.pem

I only displayed the differences between the two curls. libz, SSL are part of the main differences.

But what next ?

How can I be sure the problem comes from compile/libcurl/Dockerfile or maybe libcurl/dist/root/lib/lib/libcurl.a or libcurl/dist/root/lib/include/Makefile ? Where should I investigate ?

@@ -29,6 +29,8 @@ RUN CPPFLAGS="-I/root/lib/include " \
--enable-https \
--enable-http \
--disable-pthreads \
--disable-threaded-resolver
--enable-websockets \
Collaborator

Probably not needed for v1

@bgrgicak
Collaborator

bgrgicak commented Apr 3, 2024

I only displayed the differences between the two curls. libz, SSL are part of the main differences.

I forgot to push it yesterday. This commit adds zlib. 0c84fd1

@mho22
Contributor

mho22 commented Apr 8, 2024

@bgrgicak Thanks! I found out two days ago that the culprits were Curl_wait_ms and Curl_readwrite. It now runs directly and returns Empty reply from server. I don't know what to do now, and at this point I am too afraid to ask.

Before curl_exec

* Trying 172.29.1.0:80...
* Connected to wordpress.org (172.29.1.0) port 80 (#0)
> GET / HTTP/1.1
Host: wordpress.org
Accept: */*

* Empty reply from server
* Connection #0 to host wordpress.org left intact

After curl_exec
bool(false)
string(23) "Empty reply from server"

@bgrgicak
Collaborator

bgrgicak commented Apr 8, 2024

@mho22 I added your code and found it's actually getting responses 🎉
It's just that the request we made was a 302 redirect.

I pushed your changes and updated the request URL so that we can see the next error.

Now when I run it, there is an Asyncify error. Here are the missing functions:
* __wrap_select
* RAND_poll
* rand_status
* RAND_status
* Curl_ossl_seed
* ossl_connect_common
* Curl_ossl_connect_nonblocking
* Curl_ssl_connect_nonblocking

I tried adding these functions to the Dockerfile, but it can't rebuild because Emscripten complains that it can't find some Asyncify functions. Looking at the output, it seems to stop on ossl_connect_common, so this could be the cause.
I don't have time to dive deeper, but there seem to be some related error reports with CURL https://curl.se/mail/lib-2012-03/0008.html

@mho22
Contributor

mho22 commented Apr 8, 2024

@bgrgicak I added these methods in packages/php-wasm/compile/php/Dockerfile :

"Curl_poll",\
+"Curl_wait_ms",\
+"Curl_readwrite",\
+"Curl_http_connect",\
+"https_connecting",\
+"Curl_ssl_connect_nonblocking",\
+"Curl_ossl_connect_nonblocking",\
+"ossl_connect_common",\
+"Curl_ossl_seed",\
+"RAND_status",\
+"rand_status",\
+"RAND_poll",\

Now I get an error linked with emscripten_sleep(0) :

base-php.ts:486 RuntimeError: Aborted(invalid state: 1). Build with -sASSERTIONS for more info.
    at abort (php_8_0.js:432:11)
    at handleSleep (php_8_0.js:7896:11)
    at Asyncify.handleSleep (php_8_0.js:9130:16)
    at _emscripten_sleep (php_8_0.js:5758:23)
    at __wrap_select (04a8b976:0x6d55bd)
    at RAND_poll (04a8b976:0x213bb0)
    at rand_bytes (04a8b976:0x214603)
    at rand_nopseudo_bytes (04a8b976:0x21452b)
    at RAND_bytes (04a8b976:0x21515f)
    at SSL_CTX_new (04a8b976:0x2cdc7b)

Asyncify.state returns Asyncify.State.Unwinding. It is apparently an invalid state for the handleSleep method.
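
One way to cross-check a trace like the one above against the allowlist is to extract the WebAssembly frame names and report those that are absent. The sketch below is hypothetical tooling, not part of the repository. The frames are copied from the trace in this comment, and the allowlist only holds the entries visible in the diff excerpt plus `__wrap_select` from the earlier list, so the real `ASYNCIFY_ONLY` array may already contain more.

```typescript
// Collect the function names of WebAssembly frames ("at name (…:0x1234)").
// JavaScript frames such as "php_8_0.js:432:11" are skipped.
function wasmFrameNames(stack: string): string[] {
  const names: string[] = [];
  for (const line of stack.split('\n')) {
    const match = /^\s*at\s+(\S+)\s+\((.*)\)\s*$/.exec(line);
    if (match === null) continue;
    const frameName = match[1];
    const frameLocation = match[2];
    if (
      frameName !== undefined &&
      frameLocation !== undefined &&
      /:0x[0-9a-f]+$/i.test(frameLocation)
    ) {
      names.push(frameName);
    }
  }
  return names;
}

// Names that appear in the trace but not in the allowlist.
function missingFromAllowlist(stack: string, allowlist: string[]): string[] {
  return wasmFrameNames(stack).filter((frameName) => allowlist.indexOf(frameName) === -1);
}

const trace = [
  '    at _emscripten_sleep (php_8_0.js:5758:23)',
  '    at __wrap_select (04a8b976:0x6d55bd)',
  '    at RAND_poll (04a8b976:0x213bb0)',
  '    at rand_bytes (04a8b976:0x214603)',
  '    at rand_nopseudo_bytes (04a8b976:0x21452b)',
  '    at RAND_bytes (04a8b976:0x21515f)',
  '    at SSL_CTX_new (04a8b976:0x2cdc7b)',
].join('\n');

// Assumption: only the entries shown in this thread; the real list is longer.
const allowlistExcerpt = [
  '__wrap_select', 'Curl_poll', 'Curl_wait_ms', 'Curl_readwrite',
  'Curl_http_connect', 'https_connecting', 'Curl_ssl_connect_nonblocking',
  'Curl_ossl_connect_nonblocking', 'ossl_connect_common', 'Curl_ossl_seed',
  'RAND_status', 'rand_status', 'RAND_poll',
];

console.log(missingFromAllowlist(trace, allowlistExcerpt));
```

With these inputs the sketch reports rand_bytes, rand_nopseudo_bytes, RAND_bytes and SSL_CTX_new. Whether those actually need to be listed is not established here.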

@bgrgicak
Collaborator

bgrgicak commented Apr 8, 2024

I pushed your changes and rebuilt 8.0.

This is the error I get in the browser.

Error: FS error
    at #handleRequest (base-php.ts:657:21)
    at async WebPHP.run (base-php.ts:278:21)
    at async #dispatchToPHP (php-request-handler.ts:240:11)
    at async PHPRequestHandler.request (php-request-handler.ts:133:11)
    at async PHPBrowser.request (php-browser.ts:61:20)
    
Uncaught (in promise) DOMException: Failed to execute 'postMessage' on 'MessagePort': function(errno) {
            this.errno = errno;
          } could not be cloned.
    at http://localhost:5400/@fs/home/bero/Projects/wordpress-playground/node_modules/.vite/playground/deps/comlink.js?v=044e5c47:314:8
    at new Promise (<anonymous>)
    at requestResponseMessage (http://localhost:5400/@fs/home/bero/Projects/wordpress-playground/node_modules/.vite/playground/deps/comlink.js?v=044e5c47:302:10)
    at Object.apply (http://localhost:5400/@fs/home/bero/Projects/wordpress-playground/node_modules/.vite/playground/deps/comlink.js?v=044e5c47:230:14)
    at WebPHP.dispatchEvent (http://localhost:5400/@fs/home/bero/Projects/wordpress-playground/packages/php-wasm/universal/src/lib/base-php.ts:78:7)
    at WebPHP.run (http://localhost:5400/@fs/home/bero/Projects/wordpress-playground/packages/php-wasm/universal/src/lib/base-php.ts:228:12)
    at async #dispatchToPHP (http://localhost:5400/@fs/home/bero/Projects/wordpress-playground/packages/php-wasm/universal/src/lib/php-request-handler.ts:172:14)
    at async PHPRequestHandler.request (http://localhost:5400/@fs/home/bero/Projects/wordpress-playground/packages/php-wasm/universal/src/lib/php-request-handler.ts:87:14)
    at async PHPBrowser.request (http://localhost:5400/@fs/home/bero/Projects/wordpress-playground/packages/php-wasm/universal/src/lib/php-browser.ts:32:22)
    
Uncaught (in promise) Error: FS error
    at #handleRequest (base-php.ts:657:21)
    at async WebPHP.run (base-php.ts:278:21)
    at async #dispatchToPHP (php-request-handler.ts:240:11)
    at async PHPRequestHandler.request (php-request-handler.ts:133:11)
    at async PHPBrowser.request (php-browser.ts:61:20)

@bgrgicak
Collaborator

bgrgicak commented Apr 8, 2024

Unrelated, I tried compiling Node to add a test and it can't compile PHP<=7.3

@mho22
Contributor

mho22 commented Apr 10, 2024

@bgrgicak @adamziel I tried to run this with php-wasm/node and I still face an issue when it first calls __wrap_select() from libcurl/bin/select.c method Curl_socket_check on line 287.

node --stack-trace-limit=100 bin/cli.js curl.php

 * __wrap_select
 * Curl_socket_check
 * Curl_is_connected
 * multi_runsingle
 * curl_multi_perform
 * curl_easy_perform
 * zif_curl_exec
 * ZEND_DO_ICALL_SPEC_RETVAL_USED_HANDLER
 * execute_ex
 * zend_execute
 * zend_execute_scripts
 * php_execute_script
 * do_cli
 * main
 * run_cli

Every function in this list is already in the ASYNCIFY_ONLY list. I realised that if I run node --stack-trace-limit=2 bin/cli.js curl.php, the error message shows nothing in the stack trace.

When I raise the limit to 3 with node --stack-trace-limit=3 bin/cli.js curl.php, it returns this :

* __wrap_select

I don't know if this can help. This is the only way I found to get a bit of information about this. But I am stuck here. Some methods are just invisible.

@mho22
Contributor

mho22 commented Apr 11, 2024

@bgrgicak You were right. Your new error was the next one. Mine made me lose a lot of time, I suppose. To get more information about the next error, I had to remove the function that triggered the "could not be cloned" error :

packages/php-wasm/web/public/kitchen-sink/php_8_0.js in method ensureErrnoError :

// before 

this.setErrno = /** @this{Object} */ function(errno) {
    this.errno = errno;
};
this.setErrno(errno);

// after

this.errno = errno;

Now the error indicates that there is an invalid websocket when calling _wasm_poll_socket for the fifth time.

When we tested on http://wordpress.org, the wasm_poll_socket method [ called in wrap_select ] was only called 4 times, with this data :

socketd : 18, events : POLLOUT, timeout : 0
socketd : 18, events : POLLOUT, timeout : 0
socketd : 18, events : POLLERR, timeout : 0
socketd : 18, events : POLLERR, timeout : 0

Now, when we call the https://wordpress.org url, we get :

socketd : 18, events : POLLOUT, timeout : 0
socketd : 18, events : POLLOUT, timeout : 0
socketd : 18, events : POLLERR, timeout : 0
socketd : 18, events : POLLERR, timeout : 0
socketd : 19, events : POLLIN | POLLOUT, timeout : 10
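
For anyone reproducing this log from raw numeric masks, here is a minimal decoding sketch. It assumes the event bits match `<poll.h>` on Linux and musl; the helper name is hypothetical and not part of the Playground source.

```typescript
// Event bits as defined in <poll.h> on Linux and musl
// (assumed to match the build discussed in this thread).
const POLL_FLAGS: Array<[string, number]> = [
  ['POLLIN', 0x001],
  ['POLLPRI', 0x002],
  ['POLLOUT', 0x004],
  ['POLLERR', 0x008],
  ['POLLHUP', 0x010],
  ['POLLNVAL', 0x020],
];

// Turn a numeric events mask into the "A | B" notation used in the log above.
function describePollEvents(mask: number): string {
  const names: string[] = [];
  for (const [flagName, bit] of POLL_FLAGS) {
    if ((mask & bit) !== 0) names.push(flagName);
  }
  return names.length > 0 ? names.join(' | ') : '0';
}

console.log(describePollEvents(0x004));
console.log(describePollEvents(0x001 | 0x004));
```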

With the error :

ErrnoError {name: 'ErrnoError', node: undefined, errno: 8, message: 'FS error', cause: Error at Asyncify.handleSleep }

Error
    at Asyncify.handleSleep (packages/php-wasm/web/public/kitchen-sink/php_8_0.js?t=1712818739314:9106:49)
    at _wasm_poll_socket (packages/php-wasm/web/public/kitchen-sink/php_8_0.js?t=1712818739314:7503:21)
    at __wrap_select (wasm://wasm/04a92e06:wasm-function[15359]:0x6d387b)
    at RAND_poll (wasm://wasm/04a92e06:wasm-function[4070]:0x213af7)
    at rand_status (wasm://wasm/04a92e06:wasm-function[4079]:0x214d29)
    at RAND_status (wasm://wasm/04a92e06:wasm-function[4086]:0x215184)
    at Curl_ossl_seed (wasm://wasm/04a92e06:wasm-function[16676]:0x767636)
    at ossl_connect_common (wasm://wasm/04a92e06:wasm-function[16679]:0x7682a7)
    at Curl_ossl_connect_nonblocking (wasm://wasm/04a92e06:wasm-function[16680]:0x77001a)
    at Curl_ssl_connect_nonblocking (wasm://wasm/04a92e06:wasm-function[16713]:0x771624)
    
Errno : 8

Errno 8 is EBADF, which means the socket argument is not a valid file descriptor.

@adamziel This looks like something promising right ?

@adamziel
Collaborator Author

@mho22 apologies for the radio silence, I was quite swamped lately. Thank you for persevering – this looks very promising indeed!

Now it gets interesting. The fd number is 19, which tells me the ___syscall_connect earlier succeeded and actually initiated the FetchWebsocketConstructor class. At the same time, the error seems to be coming from the getSocketFromFD(fd) call inside _wasm_poll_socket() and would happen if fd=19 is not a "socket" fd as defined by:

isSocket: mode => (mode & 49152) === 49152,

A few ideas:

  • FetchWebsocketConstructor is called but the web socket is closed before that poll, Emscripten notices and closes fd 19 or at least changes something about its state.
  • I'm wrong and FetchWebsocketConstructor is never called, that fd is not a socket one.
  • The error doesn't actually come from getSocketFromFD(fd) but from another place.

A few console.log statements in FetchWebsocketConstructor constructor and close as well as _wasm_poll_socket() and getSocketFromFD() should paint a clearer picture.

@mho22
Contributor

mho22 commented Apr 12, 2024

@adamziel You were right, this comes from getSocketFromFD(fd), where SOCKFS.getSocket(19) returns null.

FetchWebsocketConstructor is correctly created before all and is correctly closed after all.

I console.logged the SOCKFS and the returned socket every time _wasm_poll_socket calls getSocketFromFD.

[Screenshot 2024-04-12 at 08 58 16]

I should continue my investigation, but if you have any other clues, I'm open to following them.

I found out that FS.getStream(19) returns an existing stream, but FS.isSocket( stream.node.mode ) returns false because stream.node.mode == 8557. That's why SOCKFS.getSocket(19) returns null. Even if it were true, it would return stream.node.sock, and there is no sock in stream.node here.
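
The mode value can be decoded by hand. As a sketch (the constant names follow the POSIX `S_IF*` convention and are not taken from the Playground source), 8557 is `0o20555`: the file-type bits say "character device" and the permission bits are 555, so the socket test quoted earlier cannot pass.

```typescript
// File-type bits of st_mode, written in hex (octal values in the comments).
const S_IFMT = 0xf000; // 0o170000, mask for the file-type bits (61440)
const S_IFSOCK = 0xc000; // 0o140000, socket (49152)
const S_IFCHR = 0x2000; // 0o020000, character device (8192)

// Same test as the isSocket helper quoted earlier in the thread.
const isSocket = (mode: number): boolean => (mode & S_IFSOCK) === S_IFSOCK;
// Character-device test on the file-type bits.
const isChrdev = (mode: number): boolean => (mode & S_IFMT) === S_IFCHR;

const fd19Mode = 8557;
console.log(fd19Mode.toString(8)); // octal form of the mode
console.log(isSocket(fd19Mode));
console.log(isChrdev(fd19Mode));
```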

This is a normal Stream :

[Screenshot 2024-04-12 at 09 07 07]

This is our Stream fd 19 :

[Screenshot 2024-04-12 at 09 06 26]

I also found out there is only one __syscall_connect for the 18 fd. Nothing for 19. And this is probably why there is no sock on the stream. I now need to find out where __syscall_connect(19,...) should be called. Or maybe ___syscall_socket(...).

@adamziel
Collaborator Author

Oh, path: '/dev/urandom' caught my eye. So this is the stack trace you shared earlier:

at Asyncify.handleSleep (packages/php-wasm/web/public/kitchen-sink/php_8_0.js?t=1712818739314:9106:49)
    at _wasm_poll_socket (packages/php-wasm/web/public/kitchen-sink/php_8_0.js?t=1712818739314:7503:21)
    at __wrap_select (wasm://wasm/04a92e06:wasm-function[15359]:0x6d387b)
    at RAND_poll (wasm://wasm/04a92e06:wasm-function[4070]:0x213af7)
    at rand_status (wasm://wasm/04a92e06:wasm-function[4079]:0x214d29)
    at RAND_status (wasm://wasm/04a92e06:wasm-function[4086]:0x215184)

I'm not sure what is the relevance of RAND_ in there, but it seems like libcurl needs some random bytes, tries to source them from /dev/urandom, and then polls that /dev/urandom fd which means we may need to support that in wasm_poll_socket. I think emscripten provides a fake device at /dev/urandom so maybe we could just short-circuit and return 1 in this case?

@mho22
Contributor

mho22 commented Apr 12, 2024

@adamziel Fantastic! I only commented out line 3903, if (!socket) throw new FS.ErrnoError(8);, in the getSocketFromFD(fd) method and I got this :

[Screenshot 2024-04-12 at 14 06 48]

I don't know where I should short-circuit this and return 1 in this case. There is something missing here concerning certificates, but I think we should maybe add a CURLOPT linked with certificates :

@adamziel
Collaborator Author

I don't know where I should short-circuit this and return 1 in this case.

In _wasm_poll_socket return 1 if the mode is 8557. Perhaps you could also check if the path is /dev/urandom.

To solve the certificate error, adjust the top-level domain in base-php where a fake temporary SSL certificate is generated.
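
The short-circuit suggested for _wasm_poll_socket could look roughly like the sketch below. The stream shape, the helper name and the `socket[18]` path are hypothetical simplifications for illustration; the real Emscripten stream objects and the real function differ.

```typescript
// Hypothetical, simplified shapes; real Emscripten streams carry more fields.
interface SketchNode { mode: number }
interface SketchStream { path: string; node: SketchNode }

const FILE_TYPE_MASK = 0xf000; // S_IFMT, 0o170000
const CHAR_DEVICE = 0x2000; // S_IFCHR, 0o020000

// Returns 1 ("ready") for the /dev/urandom character device, or null when the
// caller should continue with its regular socket polling.
function pollShortCircuit(stream: SketchStream): number | null {
  const isCharDevice = (stream.node.mode & FILE_TYPE_MASK) === CHAR_DEVICE;
  if (isCharDevice && stream.path === '/dev/urandom') {
    return 1;
  }
  return null;
}

console.log(pollShortCircuit({ path: '/dev/urandom', node: { mode: 8557 } }));
console.log(pollShortCircuit({ path: 'socket[18]', node: { mode: 0xc000 | 0x1ff } }));
```

Checking the file-type bits rather than the exact value 8557 keeps the test independent of the permission bits, and the path check limits the shortcut to the one device seen in this thread.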

@mho22
Contributor

mho22 commented Apr 12, 2024

I hoped it would work, but after replacing downloads.wordpress.org with wordpress.org on line 178 in web-php.ts the error is still the same :

Before curl_exec

* Trying 172.29.1.0:443...
* Connected to wordpress.org (172.29.1.0) port 443 (#0)
* ALPN, offering http/1.1
* successfully set certificate verify locations:
* CAfile: /tmp/ca-bundle.crt CApath: none
* SSL connection using TLSv1.0 / AES256-SHA
* ALPN, server did not agree to a protocol
* Server certificate:
* subject: CN=wordpress.org; subjectAltName=wordpress.org; C=US; ST=Virginia; L=Blacksburg; O=Test; OU=Test
* start date: Apr 12 12:41:31 2024 GMT
* expire date: Apr 12 12:41:31 2025 GMT
* subjectAltName does not match wordpress.org
* SSL: no alternative certificate subject name matches target host name 'wordpress.org'
* Closing connection 0

After curl_exec
bool(false)
string(85) "SSL: no alternative certificate subject name matches target host name 'wordpress.org'"

As you can see, CN is no longer downloads.wordpress.org but wordpress.org

I also added a subjectAltName attribute. Both CN and subjectAltName are not solving the issue.

I decided to try out something else and add two other curl_opts:

  • curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, 0);
  • curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, 0);

But it resulted in a CORS error.

@adamziel
Collaborator Author

@mho22 CORS error could be the expected outcome here. Try https://downloads.wordpress.org/plugin/hello-dolly.1.7.3.zip – it should have the correct CORS headers.

@mho22
Contributor

mho22 commented Apr 12, 2024

@adamziel I am sorry but I am a little bit confused here. Do I need to add the curl_setopt CURLOPT_SSL_VERIFYHOST and CURLOPT_SSL_VERIFYPEER in worker-thread.ts and modify commonName with https://downloads.wordpress.org/plugin/hello-dolly.1.7.3.zip in web-php.ts ?

Sorry for this. I am a little bit lost with SSL.

@adamziel
Collaborator Author

@mho22 I just meant use CURL to request https://downloads.wordpress.org/plugin/hello-dolly.1.7.3.zip instead of the URL it requests now, to avoid the CORS issue. Disabling peer verification may help, too, just to get it off the ground, but eventually we'll make the fake certificate "just work".

@mho22
Contributor

mho22 commented Apr 13, 2024

@adamziel I replaced https://wordpress.org with https://downloads.wordpress.org/plugin/hello-dolly.1.7.3.zip in the worker-thread.ts file when writing test-curl.php. I changed the commonName to downloads.wordpress.org, but I also had to modify the existing subjectAltName parameters: the type 7 entry had to be commented out :

web-php.ts on line 142 :

{
	name: 'subjectAltName',
	altNames: [
		{
			type: 6, // URI
			value: '*.org',
		},
		// {
		// 	type: 7, // IP
		// 	ip: '127.0.0.1',
		// },
	],
},
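
For reference, the `type` numbers in `altNames` are the GeneralName tags from RFC 5280, and host-name verification normally looks at dNSName (type 2) entries rather than URI (type 6) entries. The sketch below only illustrates that mapping; the `AltName` shape and the helper are hypothetical, and whether switching to type 2 would have changed the earlier "subjectAltName does not match" result was not tested in this thread.

```typescript
// GeneralName tags from RFC 5280, section 4.2.1.6.
const GENERAL_NAME_TYPES: Record<number, string> = {
  0: 'otherName',
  1: 'rfc822Name',
  2: 'dNSName',
  3: 'x400Address',
  4: 'directoryName',
  5: 'ediPartyName',
  6: 'uniformResourceIdentifier',
  7: 'iPAddress',
  8: 'registeredID',
};

// Hypothetical shape mirroring the altNames entries shown above.
interface AltName { type: number; value?: string; ip?: string }

// Return the entries a client would consider when matching a DNS host name.
function dnsAltNames(altNames: AltName[]): string[] {
  const result: string[] = [];
  for (const entry of altNames) {
    if (entry.type === 2 && typeof entry.value === 'string') {
      result.push(entry.value);
    }
  }
  return result;
}

const configuredAltNames: AltName[] = [{ type: 6, value: '*.org' }];
console.log(GENERAL_NAME_TYPES[6]);
console.log(dnsAltNames(configuredAltNames));
```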

[Screenshot 2024-04-13 at 09 11 28]

Great ! Even if there is a uh oh unknown error. So now let's try again with https://wordpress.org :

[Screenshot 2024-04-13 at 09 41 33]

The CORS error appears. I decided to add these lines in the test-curl.php file :

$headers = [ 'Access-Control-Allow-Origin: *' ];
curl_setopt ( $ch , CURLOPT_HTTPHEADER, $headers );

And the header appears in the request :

[Screenshot 2024-04-13 at 09 42 38]

But of course, the CORS error is still there. What do you think should be the next step here ?

@adamziel
Collaborator Author

adamziel commented Apr 13, 2024

@mho22 I'm afraid this is as far as we can go. CURL or not, ultimately all the traffic goes through fetch() which is limited by the CORS rules.

To access arbitrary URLs, we'd need to set up a server-side WebSockets<->TCP proxy like the one wp-now uses. Or open up "plugin-proxy.php" to all requests and methods.
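
To make the limitation concrete: `Access-Control-Allow-Origin` is a response header that the remote server has to send, so adding it to the outgoing request cannot change the outcome. The sketch below is a deliberately simplified model of the browser's check for a request without credentials (the function name is hypothetical, and real browsers follow the full Fetch specification, including preflights).

```typescript
// Simplified model: may a page at pageOrigin read this cross-origin response?
function corsAllowsRead(pageOrigin: string, responseHeaders: Record<string, string>): boolean {
  for (const headerName of Object.keys(responseHeaders)) {
    if (headerName.toLowerCase() === 'access-control-allow-origin') {
      const allowed = (responseHeaders[headerName] || '').trim();
      return allowed === '*' || allowed === pageOrigin;
    }
  }
  // No opt-in header from the server: the response stays unreadable.
  return false;
}

// The page origin is the local dev server used in this thread.
const playgroundOrigin = 'http://localhost:5400';

// A server that opts in by sending the header in its response.
console.log(corsAllowsRead(playgroundOrigin, { 'Access-Control-Allow-Origin': '*' }));
// A server that sends no such header; request headers play no part in the check.
console.log(corsAllowsRead(playgroundOrigin, { 'Content-Type': 'text/html' }));
```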

@mho22
Contributor

mho22 commented Apr 14, 2024

@adamziel We should also try to make this work in any PHP versions on node, light and kitchen-sink right ? Don't hesitate to give me next steps, insights and comments. I would be glad to add curl into php-wasm for sure.

@bgrgicak
Collaborator

Great work @mho22 🚀

@adamziel
Collaborator Author

@adamziel We should also try to make this work in any PHP versions on node, light and kitchen-sink right ? Don't hesitate to give me next steps, insights and comments. I would be glad to add curl into php-wasm for sure.

Let's start with node since we can open arbitrary network sockets there. Once it's functional there, #1093 will add the browsers support.

As for next steps here, I suggest:

  1. Start a PR with your changes – it could be targeted either at this branch or trunk
  2. Add a few unit tests to ensure curl works in node.js
  3. Get it to a state where there are no random, related errors like the uh oh unknown error

And then let's get it merged! Thank you for your fantastic work @mho22! :)

@adamziel
Collaborator Author

@mho22 any updates on this one? It would be really cool to support curl.

@mho22
Contributor

mho22 commented Apr 17, 2024

@adamziel Sorry for the silent days. I have to focus on other tasks on my side. Anyways, I cannot figure out why I still have unreachable errors with node. Is it possible that the ASYNCIFY_ONLY could be cached when I try to build node with :

nx reset && npm run recompile:php:node:8.0

Here is the list of methods on the stack when the "unreachable" WASM instruction was executed :

 CLI option:

      * __wrap_select
      * Curl_socket_check
      * Curl_is_connected
      * multi_runsingle
      * curl_multi_perform
      * curl_easy_perform
      * zif_curl_exec
      * ZEND_DO_ICALL_SPEC_RETVAL_USED_HANDLER
      * execute_ex
      * zend_execute
      * zend_execute_scripts
      * php_execute_script
      * do_cli
      * main
      * run_cli

Every line is in the ASYNCIFY_ONLY array.

I am currently trying to run php-wasm:node outside of wordpress-playground so here are the steps I follow.

in 'php-wasm/wordpress-playground' : nx reset && npm run recompile:php:node:8.0
in 'php-wasm/wordpress-playground' : nx run php-wasm-node:build

in 'php-wasm/test' file package.json :

"dependencies" : {
    "@php-wasm/node" : "file:../wordpress-playground/dist/packages/php-wasm/node",

in 'php-wasm/test' : npm update
in 'php-wasm/test': node --stack-trace-limit=100 cli.js curl.php

Where my curl.php file is as follows :

<?php
   	
echo '<plaintext>';
$ch = curl_init();
$streamVerboseHandle = fopen('php://stdout', 'w+');

curl_setopt($ch, CURLOPT_URL, 'https://wordpress.org');
curl_setopt($ch, CURLOPT_VERBOSE, 1);
curl_setopt($ch, CURLOPT_TCP_NODELAY, 0);
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 0);
curl_setopt($ch, CURLOPT_TIMEOUT, 25);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_STDERR, $streamVerboseHandle);

// var_dump( curl_version() );
// var_dump( curl_getinfo( $ch ) );

echo "Before curl_exec\n\n";
$output = curl_exec($ch);
echo "\n\nAfter curl_exec\n";

var_dump($output);
var_dump(curl_error($ch));

curl_close($ch);

You probably have a better way to try to make node version work without all these steps but I wanted to make this way work.

My cli.js file is simply this :

#!/usr/bin/env node

import { NodePHP } from '@php-wasm/node';

NodePHP.load( "8.0" ).then( php =>
{
   php.useHostFilesystem();

   let args = process.argv.slice( 2 );

   // cli.js is plain JavaScript, so the catch parameter carries no type annotation.
   php.cli( [ 'php', ...args ] ).catch( ( result ) => { throw result; } ).finally( () => process.exit( 0 ) );
} );

I should probably create a new branch and start from scratch, following the things we added progressively here.

@adamziel
Collaborator Author

@mho22 I see, thank you for fleshing the issue out so thoroughly! If you start a new branch, @bgrgicak and I might be able to poke around and see if we're able to fix the issue.

@adamziel
Collaborator Author

Closing in favor of #1273 that ships a working curl build

@adamziel adamziel closed this Apr 29, 2024
adamziel added a commit that referenced this pull request Apr 29, 2024
Ships the Node.js version of PHP built with `--with-libcurl` option to support the curl extension.

It also changes three nuances in the overall PHP build process:

* It replaces the `select(2)` function using `-Wl,--wrap=select` emcc
option instead of patching PHP source code; this enables supporting
asynchronous `select(2)` in curl without additional patches.
* Brings the `__wrap_select` implementation more in line with
`select(2)` and adds support for `POLLERR`.
* Adds support for polling file descriptors that represent neither child
processes nor streams in `poll(2)`, because `libcurl` polls
`/dev/urandom`.

Builds on top of and supersedes
#1133

## Debugging Asyncify problems

The [typical way of resolving Asyncify
crashes](https://wordpress.github.io/wordpress-playground/architecture/wasm-asyncify/)
didn't work during the work on this PR. Functions didn't come up in the
error messages and even raw stack traces. The reasons are unclear.

[The JSPI build of
PHP](#1339) was
more helpful as it enabled logging the current stack trace in all the
asynchronous calls, which quickly revealed all the missing
`ASYNCIFY_ONLY` functions. This is the way to debug any future issues
until we fully migrate to JSPI.

## Testing Instructions

Confirm the CI checks pass. This PR ships a few new tests specifically
targeting networking with curl.


## Related resources

* #85
* #1093

---------

Co-authored-by: Adam Zieliński <adam@adamziel.com>
Co-authored-by: MHO <yannick@chillpills.io>