[code analysis] Nginx zero downtime upgrade

Nginx can upgrade its binary gracefully, without shutting down the running server. The detailed upgrade process is described in the nginx documentation.

In short,

  • Build a new nginx binary and move it to /usr/local/nginx/sbin/nginx.
  • Send a USR2 signal to the existing nginx master process. That process executes the new binary, and the new processes accept requests alongside the old ones.
  • If nothing goes wrong, send WINCH (optional) and then QUIT to the old master process to stop it.
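The signal sequence above can also be driven from a small program. The following is a minimal sketch, not part of nginx: signal_pidfile is a hypothetical helper, and the pid file paths in the usage note assume a default source build under /usr/local/nginx.

```c
#include <signal.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Read a pid from pidfile and send signo to that process.
 * Returns 0 on success, -1 if the file is unreadable, the pid is
 * not a plausible master pid, or kill() fails.
 */
int signal_pidfile(const char *pidfile, int signo)
{
    FILE  *f;
    long   pid;

    f = fopen(pidfile, "r");
    if (f == NULL) {
        return -1;
    }

    if (fscanf(f, "%ld", &pid) != 1 || pid <= 1) {
        /* refuse garbage, process groups (pid <= 0) and init (pid 1) */
        fclose(f);
        return -1;
    }

    fclose(f);

    return kill((pid_t) pid, signo);
}
```

With the default layout, an upgrade would call signal_pidfile("/usr/local/nginx/logs/nginx.pid", SIGUSR2) first. nginx renames the old master's pid file with an .oldbin suffix during the upgrade, so the later WINCH and QUIT signals would go to the pid stored in /usr/local/nginx/logs/nginx.pid.oldbin.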

nginx_zero_downtime_upgrade.png

Background

fork and execve syscall

From the fork and execve man pages:

fork:

> fork() creates a new process by duplicating the calling process.
> The child inherits copies of the parent’s set of open file descriptors. Each file descriptor in the child refers to the same open file description as the corresponding file descriptor in the parent.

So a forked child shares the parent's open file descriptions, listening sockets included.

execve:

> execve() executes the program pointed to by filename.
> By default, file descriptors remain open across an execve(). File descriptors that are marked close-on-exec are closed;
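To see both properties together, here is a small sketch (not nginx code) in which a child inherits a pipe through fork(), keeps it across execve(), and learns the fd number from an environment variable. The function name and the DEMO_FD variable are made up for illustration, and the example assumes a POSIX system with /bin/sh.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Fork a child that execs /bin/sh and writes "ok\n" to an inherited pipe.
 * The fd number travels in the hypothetical DEMO_FD environment variable.
 * Returns the number of bytes read into buf, or -1 on error.
 */
ssize_t read_from_execed_child(char *buf, size_t len)
{
    int      fds[2];
    pid_t    pid;
    ssize_t  n;

    assert(buf != NULL && len > 1);

    if (pipe(fds) == -1) {
        return -1;
    }

    pid = fork();
    if (pid == -1) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    if (pid == 0) {
        /* child: fork() copied both pipe fds, and neither has FD_CLOEXEC */
        char   env_fd[32];
        char  *argv[] = { "sh", "-c", "echo ok >&$DEMO_FD", NULL };
        char  *envp[] = { env_fd, NULL };

        close(fds[0]);
        snprintf(env_fd, sizeof(env_fd), "DEMO_FD=%d", fds[1]);
        execve("/bin/sh", argv, envp);
        _exit(127);    /* only reached if execve() failed */
    }

    /* parent: drop its write end so read() sees EOF if the child dies */
    close(fds[1]);
    n = read(fds[0], buf, len - 1);
    close(fds[0]);
    waitpid(pid, NULL, 0);

    if (n < 0) {
        return -1;
    }

    buf[n] = '\0';
    return n;
}
```

The shell never opened the pipe itself. It can write to it only because the descriptor survived both fork() and execve(), which is the same mechanism nginx relies on for its listening sockets.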

close-on-exec flag

From the fcntl man page:

If the FD_CLOEXEC bit is 0, the file descriptor will remain open across an execve(2), otherwise it will be closed.

If FD_CLOEXEC is set on a file descriptor, that descriptor is closed when the process calls execve(). The flag is usually set for security reasons, so that a newly executed program cannot access file descriptors opened by its parent.
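The flag is read and written with fcntl(). A minimal sketch, with set_cloexec and is_cloexec as hypothetical helper names that do not come from nginx:

```c
#include <fcntl.h>

/* Set (on != 0) or clear (on == 0) FD_CLOEXEC on fd. Returns 0 or -1. */
int set_cloexec(int fd, int on)
{
    int  flags;

    flags = fcntl(fd, F_GETFD);
    if (flags == -1) {
        return -1;
    }

    if (on) {
        flags |= FD_CLOEXEC;
    } else {
        flags &= ~FD_CLOEXEC;
    }

    return fcntl(fd, F_SETFD, flags) == -1 ? -1 : 0;
}

/* Returns 1 if FD_CLOEXEC is set on fd, 0 if it is clear, -1 on error. */
int is_cloexec(int fd)
{
    int  flags;

    flags = fcntl(fd, F_GETFD);
    if (flags == -1) {
        return -1;
    }

    return (flags & FD_CLOEXEC) != 0;
}
```

Note that F_GETFD and F_SETFD operate on the descriptor flags, which are per file descriptor, unlike the file status flags handled by F_GETFL and F_SETFL.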

Source Code

Signals are received by the nginx master process, which in turn manages the worker (child) processes.

ngx_process.c:

System signals are handled in ngx_signal_handler. When the signal is USR2 (NGX_CHANGEBIN_SIGNAL), the handler sets ngx_change_binary = 1:

void ngx_signal_handler(int signo)
{
    ...
    
    switch (ngx_process) {

    case NGX_PROCESS_MASTER:
    case NGX_PROCESS_SINGLE:
        switch (signo) {
        ...

        case ngx_signal_value(NGX_CHANGEBIN_SIGNAL):
            if (getppid() > 1 || ngx_new_binary > 0) {

                /*
                 * Ignore the signal in the new binary if its parent is
                 * not the init process, i.e. the old binary's process
                 * is still running.  Or ignore the signal in the old binary's
                 * process if the new binary's process is already running.
                 */

                action = ", ignoring";
                ignore = 1;
                break;
            }
            ngx_change_binary = 1;      //  set ngx_change_binary variable to 1
            action = ", changing binary";
            break;
        }
    }
    ...
}

ngx_process_cycle.c

ngx_master_process_cycle is the main loop of the nginx master process. When ngx_change_binary is set, the loop calls ngx_exec_new_binary:

void ngx_master_process_cycle(ngx_cycle_t *cycle)
{
    ...

    for ( ;; ) {
        ...
        ...
        if (ngx_change_binary) {
            ngx_change_binary = 0;
            ngx_log_error(NGX_LOG_NOTICE, cycle->log, 0, "changing binary");
            ngx_new_binary = ngx_exec_new_binary(cycle, ngx_argv);    // exec new binary
        }
    }
}

nginx.c

ngx_exec_new_binary saves the listening file descriptors in an environment variable, which is how they are handed over to the new process:

ngx_pid_t ngx_exec_new_binary(ngx_cycle_t *cycle, char *const *argv) 
{
    ...
    
    env = ngx_set_environment(cycle, &n);
    var = ngx_alloc(sizeof(NGINX_VAR) 
        + cycle->listening.nelts * (NGX_INT32_LEN + 1) + 2, cycle->log);
    p = ngx_cpymem(var, NGINX_VAR "=", sizeof(NGINX_VAR));
    ls = cycle->listening.elts;
    for (i = 0; i < cycle->listening.nelts; i++) {
        p = ngx_sprintf(p, "%ud;", ls[i].fd);   // save listen fds in the format "NGINX=3;4;5;"
    }
    *p = '\0';
    
    env[n++] = var;  // save to environment variable
    ctx.envp = (char *const *) env;

    pid = ngx_execute(cycle, &ctx);   // call execute func
}
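The new binary has to read that variable back at startup; in nginx this happens in ngx_add_inherited_sockets. The sketch below shows both directions of the "NGINX=3;4;5;" encoding in plain C. It is a simplified illustration with hypothetical function names, not nginx's actual implementation.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Write "name=fd;fd;...;" into out (capacity cap bytes).
 * Returns the string length, or -1 if out is too small.
 */
int format_fd_list(char *out, size_t cap, const char *name,
    const int *fds, size_t n)
{
    int     w;
    size_t  i, used;

    assert(out != NULL && cap > 0);

    w = snprintf(out, cap, "%s=", name);
    if (w < 0 || (size_t) w >= cap) {
        return -1;
    }
    used = (size_t) w;

    for (i = 0; i < n; i++) {
        w = snprintf(out + used, cap - used, "%d;", fds[i]);
        if (w < 0 || (size_t) w >= cap - used) {
            return -1;
        }
        used += (size_t) w;
    }

    return (int) used;
}

/*
 * Parse a value such as "3;4;5;" into fds (room for max entries).
 * Returns the number of fds parsed, or -1 on malformed input.
 */
int parse_fd_list(const char *value, int *fds, size_t max)
{
    char        *end;
    long         v;
    size_t       count = 0;
    const char  *p = value;

    while (*p != '\0') {
        errno = 0;
        v = strtol(p, &end, 10);

        if (end == p || *end != ';' || errno != 0 || v < 0 || v > INT_MAX) {
            return -1;
        }

        if (count == max) {
            return -1;
        }

        fds[count++] = (int) v;
        p = end + 1;    /* skip the ';' separator */
    }

    return (int) count;
}
```

In a real program the value passed to parse_fd_list would come from getenv("NGINX"). Each parsed fd should still be checked, for example with getsockname(), before it is treated as a listening socket.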

ngx_process.c

ngx_pid_t ngx_execute(ngx_cycle_t *cycle, ngx_exec_ctx_t *ctx)
{
    return ngx_spawn_process(cycle, ngx_execute_proc, ctx, ctx->name,
                             NGX_PROCESS_DETACHED);
}


ngx_pid_t ngx_spawn_process(ngx_cycle_t *cycle, ngx_spawn_proc_pt proc, void *data,
    char *name, ngx_int_t respawn)
{
    ...
    
    pid = fork();        // first, fork a new process

    switch (pid) {

    case -1:
        ngx_log_error(NGX_LOG_ALERT, cycle->log, ngx_errno,
                      "fork() failed while spawning \"%s\"", name);
        ngx_close_channel(ngx_processes[s].channel, cycle->log);
        return NGX_INVALID_PID;

    case 0:             
        // in the child process, call proc();
        // for a binary upgrade, proc is ngx_execute_proc()
        ngx_pid = ngx_getpid();
        proc(cycle, data);
        break;

    default:
        break;
    }
}


static void ngx_execute_proc(ngx_cycle_t *cycle, void *data)
{
    ngx_exec_ctx_t  *ctx = data;

    if (execve(ctx->path, ctx->argv, ctx->envp) == -1) {
        ngx_log_error(NGX_LOG_ALERT, cycle->log, ngx_errno,
                      "execve() failed while executing %s \"%s\"",
                      ctx->name, ctx->path);
    }

    exit(1);       // execve won't return on success, if err occurs, exit(1)
}

Troubleshooting

How to deal with existing connection fds

If a client connection fd is inherited by the forked process and survives execve(), it stays open until the new process closes it explicitly. The new binary does not know about these connections, so the fds leak.

Solution: set the FD_CLOEXEC flag on all client connection fds and clear it on the listener fds. In Go, os.StartProcess helps with this: only the files listed in ProcAttr.Files are passed to the new process.
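In C the same policy has to be applied by hand before calling execve(). A minimal sketch, assuming the server keeps its listener and client fds in two arrays; prepare_fds_for_exec and its parameters are hypothetical names:

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>

/* Set (on != 0) or clear (on == 0) FD_CLOEXEC on fd. Returns 0 or -1. */
static int set_cloexec(int fd, int on)
{
    int  flags;

    flags = fcntl(fd, F_GETFD);
    if (flags == -1) {
        return -1;
    }

    flags = on ? (flags | FD_CLOEXEC) : (flags & ~FD_CLOEXEC);

    return fcntl(fd, F_SETFD, flags) == -1 ? -1 : 0;
}

/*
 * Before exec'ing a new binary: listeners must survive execve(),
 * client connections must not. Returns 0 on success, -1 on the
 * first fd that could not be updated.
 */
int prepare_fds_for_exec(const int *listen_fds, size_t nlisten,
    const int *client_fds, size_t nclient)
{
    size_t  i;

    assert(listen_fds != NULL || nlisten == 0);
    assert(client_fds != NULL || nclient == 0);

    for (i = 0; i < nlisten; i++) {
        if (set_cloexec(listen_fds[i], 0) == -1) {
            return -1;
        }
    }

    for (i = 0; i < nclient; i++) {
        if (set_cloexec(client_fds[i], 1) == -1) {
            return -1;
        }
    }

    return 0;
}
```

Setting the flag once, when each client connection is accepted, is usually simpler than sweeping all fds right before the exec, because a connection accepted between the sweep and execve() would otherwise slip through.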
