LinuxFest Northwest 2012

Bellingham, WA April 28th & 29th

PHP Multitasking without forking

mikeytown2

Slides: https://docs.google.com/presentation/pub?id=1LMQnMnzZDgjJ8BMozZLeue8vXeU...

 

Did you know that PHP comes with a very powerful function called stream_select()? In short, it is a wrapper for the select() system call, which provides synchronous I/O multiplexing. In pseudo-code terms, it is a way to watch multiple input/output streams without blocking on any single one, selecting only the streams that are ready for I/O. In plain English, this means you can do parallel work from a single PHP thread without forking (forking a process is slow and complex in PHP).
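
To make the idea concrete, here is a minimal sketch of stream_select() picking out only the streams that have data waiting. It is an illustration, not code from the talk: local socket pairs created with stream_socket_pair() stand in for network connections, and the labels "a", "b", and "c" are made up for the demo. It assumes a Unix-like system (STREAM_PF_UNIX).

```php
<?php
// Three local socket pairs stand in for three network connections.
// (Demo setup only; real code would typically use stream_socket_client().)
$pairs = [];
foreach (['a', 'b', 'c'] as $name) {
    $pairs[$name] = stream_socket_pair(STREAM_PF_UNIX, STREAM_SOCK_STREAM, STREAM_IPPROTO_IP);
}

// Only "a" and "c" have data waiting; "b" stays silent.
fwrite($pairs['a'][1], "hello from a\n");
fwrite($pairs['c'][1], "hello from c\n");

// The read ends we want to watch, keyed by label.
$watched = [
    'a' => $pairs['a'][0],
    'b' => $pairs['b'][0],
    'c' => $pairs['c'][0],
];

// stream_select() rewrites its arrays in place, so hand it a copy.
$read = array_values($watched);
$write = null;
$except = null;

// Wait up to 1 second for any watched stream to become readable.
$ready = stream_select($read, $write, $except, 1);

// After the call, $read holds only the streams that are ready.
$messages = [];
foreach ($read as $stream) {
    $name = array_search($stream, $watched, true);
    $messages[$name] = fgets($stream);
}
```

After the call, $read contains only the streams for "a" and "c", so the script never blocks waiting on "b".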

In this talk I'll be going over some code called the "HTTP Parallel Request & Threading Library", or HTTPRL for short. Using PHP's stream_select(), HTTPRL can send HTTP requests out in parallel. These requests can be made in a blocking or non-blocking way: blocking waits for the HTTP response, while non-blocking closes the connection without waiting for the response. Non-blocking requests are the main reason this is better than cURL; better control over redirects is the other.

Having this tool-set allows for some very interesting use cases.

Most PHP applications do their work sequentially in a single thread. If your program needs to retrieve feeds from two different URLs, you would usually request the first feed, wait for the response to arrive, then request the second feed and wait again. HTTPRL allows you to request both feeds at the same time.
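
The difference from sequential fetching can be sketched with plain PHP streams. The function below is a hypothetical helper (read_all_parallel() is not part of HTTPRL, and this is not how HTTPRL itself is written); it assumes the streams are already open and that any request has already been sent, and it reads each one to EOF while servicing whichever stream has data first.

```php
<?php
/**
 * Hypothetical helper: read every stream to EOF, servicing whichever
 * stream is ready first instead of finishing one before starting the next.
 *
 * @param resource[] $streams Open, readable streams keyed by a label.
 * @param int        $timeout Seconds to wait for activity before giving up.
 * @return string[]  Data collected per label.
 */
function read_all_parallel(array $streams, int $timeout = 5): array
{
    $buffers = array_fill_keys(array_keys($streams), '');
    foreach ($streams as $stream) {
        stream_set_blocking($stream, false);
    }

    while ($streams) {
        // stream_select() rewrites its arrays in place, so pass a copy.
        $read = array_values($streams);
        $write = null;
        $except = null;
        $ready = stream_select($read, $write, $except, $timeout);
        if ($ready === false || $ready === 0) {
            break; // Error or timeout: return what has arrived so far.
        }

        foreach ($read as $stream) {
            $label = array_search($stream, $streams, true);
            $chunk = fread($stream, 8192);
            if ($chunk !== false && $chunk !== '') {
                $buffers[$label] .= $chunk;
            }
            if (feof($stream)) {
                // This stream is finished; stop watching it.
                fclose($stream);
                unset($streams[$label]);
            }
        }
    }

    return $buffers;
}
```

In a real script the streams would come from something like stream_socket_client() for each feed host, with the HTTP request written before calling the helper. The total wait is then roughly that of the slowest feed rather than the sum of both.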

Another example is speeding up a complex web application. If you've built the site using Edge Side Includes/Server Side Includes (ESI/SSI), you can use HTTPRL's non-blocking requests to prefetch the ESI fragments so that they are already in the edge's cache by the time your main page is ready to be rendered. Do the prefetching as early as possible in the script.
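
A non-blocking ("fire and forget") request of this kind can be sketched as: write the full request, then close the connection without reading the response. Both helpers below are hypothetical names for illustration and are not HTTPRL's API. Whether the server keeps working after the client disconnects depends on the server; a PHP backend, for example, may need ignore_user_abort() enabled.

```php
<?php
// Hypothetical helper: build a minimal HTTP/1.1 GET request.
function build_get_request(string $host, string $path): string
{
    return "GET {$path} HTTP/1.1\r\n"
        . "Host: {$host}\r\n"
        . "Connection: close\r\n"
        . "\r\n";
}

// Hypothetical helper: write the whole request, then close the stream
// without waiting for (or reading) any response.
function send_and_forget($stream, string $request): bool
{
    $remaining = $request;
    while ($remaining !== '') {
        $written = fwrite($stream, $remaining);
        if ($written === false || $written === 0) {
            fclose($stream);
            return false; // Could not deliver the full request.
        }
        // Cast guards against substr() returning false on older PHP versions.
        $remaining = (string) substr($remaining, $written);
    }
    fclose($stream);
    return true;
}
```

In practice $stream would be a connection to the edge cache opened with stream_socket_client(), and the path would be the URL of the ESI fragment to warm up; both are assumptions about your setup rather than details from the talk.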

Some info about HTTPRL:
It is a flexible and powerful HTTP client implementation:
 - Correctly handles GET, POST, PUT, or any other HTTP request, including the sending of data.
 - Correctly follows redirects.
 - Issues blocking or non-blocking requests in parallel.
 - Lets you set timeouts, maximum simultaneous connection limits, chunk size, and the maximum number of redirects to follow.
 - Can handle data with Content-Encoding and Transfer-Encoding headers set.
 - Has an option to forward the referrer when a redirect is found.
 - Extracts cookies and parses them into key-value pairs.

Currently on production boxes HTTPRL is used to send emails out, interact with complex internal REST APIs, generate ImageMagick images in the background, and build CSS/JS aggregates on demand.

Recent developments:
Callbacks and background callbacks can now run after an HTTP response has been retrieved; background callbacks can be blocking or non-blocking. HTTPRL can also securely call any function in a blocking or non-blocking manner; when blocking is used, pass-by-reference works. These new features make HTTPRL a very powerful HTTP client and make utilizing all your cores for a batch operation a lot simpler.

 

Notes:
 - stream_select() can also be used to launch system commands in parallel via proc_open(). I didn't code for this, as I usually issue system commands in a cron job. Eventually I will allow background callbacks and calling any function via proc_open() instead of using HTTP.
 - select() is not at the same level of awesome as epoll; epoll is not part of the standard PHP library, so I didn't code against it. If you wish to play around with libevent (epoll), check out the PECL libevent package. As of this writing the latest version is 0.0.5.
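
The proc_open() idea from the first note can be sketched as follows. This is an illustration only, since HTTPRL does not implement it: run_parallel() is a hypothetical helper that starts several shell commands and collects their standard output with stream_select(). It assumes a Unix-like system; the PHP manual notes that stream_select() on proc_open() pipes fails on Windows.

```php
<?php
/**
 * Hypothetical helper: run shell commands concurrently, collecting stdout.
 *
 * @param string[] $commands Shell commands keyed by a label.
 * @param int      $timeout  Seconds to wait for output before giving up.
 * @return string[] Captured stdout per label.
 */
function run_parallel(array $commands, int $timeout = 5): array
{
    $procs = [];
    $pipes = [];
    $output = [];

    foreach ($commands as $label => $command) {
        $spec = [1 => ['pipe', 'w']]; // Capture the child's stdout only.
        $proc = proc_open($command, $spec, $childPipes);
        if (!is_resource($proc)) {
            continue; // Could not start this command; skip it.
        }
        stream_set_blocking($childPipes[1], false);
        $procs[$label] = $proc;
        $pipes[$label] = $childPipes[1];
        $output[$label] = '';
    }

    while ($pipes) {
        // stream_select() rewrites its arrays in place, so pass a copy.
        $read = array_values($pipes);
        $write = null;
        $except = null;
        $ready = stream_select($read, $write, $except, $timeout);
        if ($ready === false || $ready === 0) {
            break; // Error or timeout.
        }
        foreach ($read as $pipe) {
            $label = array_search($pipe, $pipes, true);
            $chunk = fread($pipe, 8192);
            if ($chunk !== false && $chunk !== '') {
                $output[$label] .= $chunk;
            }
            if (feof($pipe)) {
                // The command closed stdout: clean up its pipe and process.
                fclose($pipe);
                proc_close($procs[$label]);
                unset($pipes[$label], $procs[$label]);
            }
        }
    }

    // Anything still running after a timeout or error is terminated.
    foreach ($pipes as $label => $pipe) {
        fclose($pipe);
        proc_terminate($procs[$label]);
        proc_close($procs[$label]);
    }

    return $output;
}
```

For example, run_parallel(['one' => 'echo one', 'two' => 'echo two']) starts both commands before reading from either, so slow commands overlap instead of running back to back.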
