Connection Pooling in Apache

Privileges for anonymous
Read	Yes
Annotate	Yes
Edit	No
Manage	No

A database is a fundamental component of many web applications. But connecting to it is an overhead that affects traditional application architectures such as CGI and LAMP. In this article, we show how we can manage database connections more efficiently with Apache 2's threaded MPMs, and thus achieve improvements in performance and scalability.

A more efficient LAMP

Many web-based applications generate dynamic content in whole or in part from a backend server. Where the backend is designed as connection-oriented, there is a mismatch with the request-oriented HTTP protocol. This can easily lead to inefficiency in applications. The most common case is that of an SQL backend, where there is always an overhead to creating a connection and logging in. This is exemplified by the environment commonly known as LAMP (Linux, Apache, MySQL, [Perl|PHP|Python]). When the connection is TCP/IP over a network, that is an additional overhead.

Note by anonymous, Sat Jul 17 11:27:01 2004

This paragraph needs reordering. Describe the application first, then the mismatch with HTTP.

Note by anonymous, Fri Feb 9 05:00:22 2007

i want to see how is working

Note by anonymous, Mon Sep 10 18:25:42 2007

This seems rather complex

The simple CGI case

A CGI script services a single request. So the baseline for CGI to access a database is to open a connection, run any necessary queries, close the connection, and return content to the Client.

This is fine for a low-traffic site, but grows inefficient as the hit rate rises above a few tens per minute. So as traffic rises, an alternative model is required. For CGI, we can use an alternative implementation such as FastCGI. But the most widely-used architecture is LAMP.

The classic LAMP case

The classic solution to this, as provided for many years by application development environments such as mod_perl and PHP, is for the Apache server process to hold a database connection open, saving the overhead of opening and closing a connection for every request. With Apache 1.x, this is essentially the best you can do, and is the usual way of working.

However, this solution has its own problems. Although it substantially reduces the per-hit overhead, it introduces another: namely that of holding a large number of backend connections open. This in itself puts a load both on the webserver itself and the backend and limits the number of users that can be concurrently serviced. This doesn't just affect database-driven traffic: requests for static webpages also have to be served by an Apache process that is keeping an open connection to the backend.

Taking advantage of Apache 2

With Apache 2 and threaded MPMs, a wider range of altogether more efficient and scalable options present themselves. Starting from what we already have, we can list our options:

Classic CGI: one connection per request.
Classic LAMP: one persistent connection per thread.
Alternative LAMP: one persistent connection per process, with a mechanism for a thread to take and lock it.
Connection Pooling: more than one connection per process, but fewer than one per thread, with a mechanism for a thread to take and lock a connection from the pool.
Dynamic Connection Pooling: a variable-size connection pool, that will grow or shrink according to actual database traffic levels.

Looking at these in order, we can see the advantages and drawbacks of each option. We have already dealt with the first two.

The third dispenses with the LAMP overhead at the cost of preventing parallel accesses to the backend. It may be an efficient solution in some cases, but clearly presents its own problems with servicing concurrent requests.

The fourth and fifth present an optimal solution whose scalability is limited only by the available hardware and operating system. The number of backend connections to threads should reflect the proportion of the total traffic that requires the backend. So, in simple terms, if one in every five requests to the webserver requires the database, then a pool might have one connection per five threads. Just as Apache itself maintains a dynamic pool of threads to service incoming HTTP connections, so the optimal solution to managing backend connections is a dynamic pool whose size is driven by actual demand rather than best-guess configuration.

Note by anonymous, Fri Dec 2 10:05:13 2005

So, do I have to do to use this new feature from Perl running under FastCGI?

Implementation of Connection Pooling

Although the case for connection pooling is clear, implementation is a recent development. Its conception was around the time of ApacheCon in November 2003, when I floated the idea in a "Birds of a Feather" session entitled "the module developers wishlist". Having discussed it and found I was not alone in wanting it, I procceed to implement, in addition to the Site Valet connection pooling module, open-source PostgreSQL and MySQL connection pooling modules. Paul Querna has implemented a similar module for connecting to a database with libdbi.

The PostgreSQL and MySQL implementations are available at apache.webthing.com. The libdbi implementation is at outoforder.cc.

Note by anonymous, Sat Jul 17 11:18:03 2004

Maybe add references here?

Note by niq, Wed Jul 21 11:26:34 2004

Done. But I still need to document the modules:-)

Note by anonymous, Sun Jul 23 16:55:53 2006

Thank you for posting something like this. This will help for our understanding of connection pooling. IT would be much better if it shows a more practical example for implementing connection pooling for web aaplication.

Note by anonymous, Tue Mar 27 19:02:22 2007

[FEATURE EXPLANATION] [httpd.conf EXAMPLE] [Virtual host config EXAMPLE] [page.PHP/page.Perl EXAMPLES] Zah....maybe it's just me.

Dynamic Resource Pools: apr_reslist

Implementation of connection pooling in practice is straightforward. An API for maintaining a dynamic respource pool is already provided by the apr_reslist module. All we need to do is provide a constructor and destructor for our resource, and functions to retrieve an instance of our resource from the pool, and release it back to the pool. The rest is managed by apr_reslist.

To illustrate this, here are the core functions of the MySQL implementation mod_mysql_pool. First, a constructor and destructor. We never call these directly; apr_reslist calls them when required.


/* an apr_reslist_constructor for MySQL connections
   Opens a MySQL connection and stores the handle in *db
*/
static apr_status_t mysqlpool_construct(void** db, void* params, apr_pool_t* pool) {
  svr_cfg* svr = (svr_cfg*) params ;
  MYSQL* sql = NULL ;
  sql = mysql_init(sql) ;
  if ( sql == NULL ) {
    ap_log_perror(APLOG_MARK, APLOG_CRIT, 0, pool, "mysql_init failed") ;
    return APR_EGENERAL ;
  }
  *db = mysql_real_connect(sql, svr->host, svr->user, svr->pass,
        svr->db, svr->port, svr->sock, 0) ;

  if ( ! *db ) {
    ap_log_perror(APLOG_MARK, APLOG_CRIT, 0, pool, "MySQL Error: %s",
        mysql_error(sql) ) ;
    return APR_EGENERAL ;
  }
  return APR_SUCCESS ;
}
static apr_status_t mysqlpool_destruct(void* sql, void* params, apr_pool_t* pool)
{
  mysql_close((MYSQL*)sql) ;
  return APR_SUCCESS ;
}

Now, we need a function to register our dynamic resource list. This is called in a post_config hook, and calls apr_reslist_create to set up a pool of MySQL connections:


static int setup_db_pool(apr_pool_t* p, apr_pool_t* plog,
        apr_pool_t* ptemp, server_rec* s) {
  svr_cfg* svr = (svr_cfg*)
        ap_get_module_config(s->module_config, &mysql_pool_module) ;

  if ( apr_reslist_create(&svr->dbpool, svr->nmin, svr->nkeep,
        svr->nmax, svr->exptime, mysqlpool_construct, mysqlpool_destruct,
        svr, p) != APR_SUCCESS ) {
    ap_log_error(APLOG_MARK, APLOG_CRIT, 0, s, "MySQLPool: failed to initialise"
) ;
    return 500 ;
  }
  apr_pool_cleanup_register(p, svr->dbpool,
        (void*)apr_reslist_destroy,
        apr_pool_cleanup_null) ;
  return OK ;
}

Finally, we provide functions other modules can use to acquire a connection from the pool and return it. We'll just show one function here: mysql_acquire is for the commonest case, where a connection is needed for the duration of a request. Note that:

By registering a cleanup on the request pool, mysql_acquire guarantees the connection is returned to the pool at the end of the request.
By storing the connection handle on our own request_config, we guarantee that the same connection is returned every time sql_acquire is called within a single request.


MYSQL* mysql_acquire(request_rec* r, unsigned int flags) {
  mysql_request* req = (mysql_request*)
        ap_get_module_config(r->request_config, &mysql_module) ;
  if ( ! req ) {	/* use pool if and only if we haven't already got one */
    svr_cfg* svr = (svr_cfg*)
        ap_get_module_config(r->server->module_config, &mysql_module) ;
    req = (mysql_request*) apr_palloc(r->pool, sizeof(mysql_request) ) ;
    req->flags = flags ;
    req->dbpool = svr->dbpool ;
    if ( apr_reslist_acquire(svr->dbpool, (void**)&req->sql)
        != APR_SUCCESS ) {
      ap_log_rerror(APLOG_MARK, APLOG_ERR, 0, r,
        "Failed to acquire MySQL connection from pool") ;
      return NULL ;
    }
    if ( mysql_ping(req->sql) != 0 ) {
      ap_log_rerror(APLOG_MARK, APLOG_ERR, 0, r, "MySQL: %s",
	mysql_error(req->sql) ) ;
      apr_reslist_invalidate(svr->dbpool, req->sql) ;
      return NULL ;
    }
    ap_set_module_config(r->request_config, &mysql_module, req) ;
    apr_pool_cleanup_register(r->pool, req,
        mysql_release, apr_pool_cleanup_null) ;
  } else {
    req->flags |= flags ;  /* ensure all required cleanups happen */
  }
  return req->sql ;
}

The rest is housekeeping. The full code for this module, as well as a PostgreSQL companion module, is available at WebÞing.

Note by anonymous, Fri Jul 16 16:30:23 2004

Please describe the structs!

Note by anonymous, Wed Oct 13 21:12:37 2004

Another Apache module that uses Connection Pooling is mod_dbi_pool: http://www.outoforder.cc/projects/apache/mod_dbi_pool/

Note by anonymous, Tue Jul 5 03:56:37 2005

y Como bajo ese modulo?

Note by anonymous, Thu Jun 8 02:51:27 2006

You also see following web sites,so You are able to understand all you want to know!! The Database Pooling Modules also use apr_reslist: http://apache.webthing.com/mod_pg_pool.c http://apache.webthing.com/mod_mysql_pool.c and, http://cvs.sourceforge.net/viewcvs.py/mod-auth/mod_dbi_pool/src/mod_dbi_pool.c?rev=1.8&view=auto

Note by anonymous, Wed Jun 21 23:11:11 2006

can this be used with PHP