Every web framework (Express, Flask, Django, Rails) does the same thing under the hood. It creates a socket, binds it to a port, listens for connections, accepts them, reads requests, and writes responses. The framework just hides it behind a single call.
C doesn't hide anything. When you write a web server in C, you see every system call, every file descriptor, every queue. That's what makes it the best way to understand what's actually happening.
File Descriptors
In Unix, everything is a file. Sockets, pipes, printers: they all get a file descriptor (fd), which is just a non-negative integer the OS uses to track open resources.
Our server needs two:
```c
int server_fd; // The listening socket, one per server
int client_fd; // A connection, one per client
```

The server fd is created once and lives for the lifetime of the server. Client fds are created each time a connection is accepted, and destroyed when the connection closes. There's one server_fd but potentially thousands of client_fds.
Creating a Socket
The first system call is socket(). It asks the kernel for a new communication endpoint:
```c
server_fd = socket(AF_INET, SOCK_STREAM, 0);
```

Three parameters:

- `AF_INET`: IPv4 address family. Use `AF_INET6` for IPv6.
- `SOCK_STREAM`: A reliable, ordered byte stream. This means TCP. For UDP, you'd use `SOCK_DGRAM`.
- `0`: Let the kernel pick the protocol. For `SOCK_STREAM`, it always picks TCP.
At this point, we have a socket but it's not attached to any address or port. It's like having a phone but no phone number.
Configuring the Address
Before binding, we fill out a sockaddr_in structure that tells the kernel where to listen:
```c
struct sockaddr_in address;
address.sin_family = AF_INET;           // IPv4
address.sin_addr.s_addr = INADDR_ANY;   // All interfaces (0.0.0.0)
address.sin_port = htons(8080);         // Port in network byte order
```

INADDR_ANY means "listen on every network interface": Wi-Fi, Ethernet, loopback, all of them. This is convenient for development but risky in production. Many database leaks happen because someone accidentally exposed MongoDB on a public interface this way.
The htons() call is subtle but critical. Network protocols use big-endian byte order, but your CPU might use little-endian. htons (host-to-network-short) converts the port number to the right format:
Consider the 16-bit port number 8080, which is 0x1F90 in hex. A little-endian host stores it in memory as [0x90, 0x1F], low byte first. The network expects big-endian: [0x1F, 0x90], high byte first.
Without htons(), the host sends bytes in little-endian order. The receiver interprets them as big-endian and reads port 36895 instead of 8080.
Binding and Listening
With the address configured, two calls activate the server:
```c
// Step 1: Register the address
bind(server_fd, (struct sockaddr *)&address, sizeof(address));

// Step 2: Actually start listening
listen(server_fd, 10);
```

This is a detail most people miss: bind() does not start listening. It only associates the socket with the address and port. The kernel hasn't created any internal data structures for accepting connections yet.
listen() is what flips the switch. It creates the accept queue, a kernel-managed buffer where completed TCP connections wait to be picked up by your application. The second parameter (10) is the backlog: the maximum number of connections that can sit in this queue.
The Accept Queue
This is where things get interesting. When a client connects to your server, a three-step handshake happens entirely in the kernel:
- Client sends SYN → "I want to connect"
- Server responds SYN-ACK → "Acknowledged, go ahead"
- Client sends ACK → "Connected"
Once complete, the connection is placed in the accept queue. It sits there until your application calls accept().
If the queue fills up because your application is too slow calling accept(), new connections get rejected. On Linux, by default, the kernel simply ignores the incoming SYN rather than responding with a SYN-ACK; the client retransmits and eventually sees a connection timeout (or a reset, if tcp_abort_on_overflow is enabled).
This is one of the most common production bottlenecks. A slow backend that can't drain the accept queue fast enough will silently drop connections.
The Server Loop
Every socket server has an infinite loop at its core. Accept a connection, read the request, send a response, close:
```c
while (1) {
    // 1. Accept (blocks until a connection arrives)
    client_fd = accept(server_fd, ...);

    // 2. Read (blocks until data is available)
    read(client_fd, buffer, 1024);

    // 3. Write the response
    write(client_fd, http_response, strlen(http_response));

    // 4. Close the connection
    close(client_fd);
}
```

Each of these calls can block. accept() halts execution until there's a connection in the queue. read() halts until the client sends data. This is why a single-threaded blocking server can only handle one request at a time, and why async I/O frameworks exist.
One detail worth noticing at startup: socket() returns the lowest free descriptor, so in a fresh process it's typically fd 3, right after stdin (0), stdout (1), and stderr (2). At that point no address or port is assigned yet.
Data Flow: The Hidden Copies
When data moves between a client and your server, it doesn't teleport. It gets copied multiple times:
- Client sends data → arrives at your NIC (network interface card)
- NIC copies it to the kernel receive queue (per-connection buffer)
- Your read() call copies it from the kernel into your application buffer
- You process it and call write(), which copies your response to the kernel send queue
- The kernel copies it to the NIC for transmission
That's 3-4 memory copies for a single request-response cycle. Every one of those copies costs CPU time and memory bandwidth. This is why kernel developers are obsessed with "zero-copy" techniques. Reducing or eliminating these copies is one of the biggest performance wins in networking.
The Complete Server
Here's everything tied together in a working HTTP server, about 40 lines of C:
```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <netinet/in.h>

#define APP_MAX_BUFFER 1024
#define PORT 8080

int main() {
    int server_fd, client_fd;
    struct sockaddr_in address;
    socklen_t address_len = sizeof(address);
    char buffer[APP_MAX_BUFFER] = {0};

    // Create socket
    server_fd = socket(AF_INET, SOCK_STREAM, 0);

    // Configure address
    memset(&address, 0, sizeof(address));
    address.sin_family = AF_INET;
    address.sin_addr.s_addr = INADDR_ANY;
    address.sin_port = htons(PORT);

    // Bind and listen
    bind(server_fd, (struct sockaddr *)&address, address_len);
    listen(server_fd, 10);

    // Server loop
    while (1) {
        client_fd = accept(server_fd,
                           (struct sockaddr *)&address,
                           &address_len);
        read(client_fd, buffer, APP_MAX_BUFFER);

        const char *response =
            "HTTP/1.1 200 OK\r\n"
            "Content-Type: text/plain\r\n"
            "Content-Length: 13\r\n\r\n"
            "Hello, World!";
        write(client_fd, response, strlen(response));
        close(client_fd);
    }
    return 0;
}
```

Compile and run it with `gcc -g server.c -o server && ./server`, then hit `curl localhost:8080` and you'll see "Hello, World!".
Try it with a debugger. The system calls in the listing (socket, bind, listen, accept, read, write) are where the real work happens. Set breakpoints on them with GDB (`gdb ./server`, then `break` on the lines containing accept() and read()) and step through. You'll see the file descriptors change, the accept call block, and the data flow in real time. The `-g` flag in the compile command includes the debug symbols for this.
For a fully annotated version of this server with detailed comments explaining every system call, check out the complete source on GitHub.
Wrapping Up
Every web server, whether it's Nginx serving millions of requests or a tiny Flask app, does exactly what we just built. The difference is in how they handle the accept loop:
- Single-threaded blocking (what we built): one connection at a time. Simple but can't scale.
- Multi-threaded: spawn a thread per connection. Better, but threads are expensive.
- Event-driven (epoll/kqueue): monitor thousands of connections with a single thread. This is what Node.js and Nginx do.
- io_uring: the newest Linux approach, reducing system call overhead even further.
Understanding the raw socket layer makes every framework decision click. When Express says it's "non-blocking," you now know what's not blocking. When Nginx talks about its "worker connections," you know it's about accept queue management. When someone says "zero-copy," you know exactly which copies they're trying to eliminate.
Resources
- Hussein Nasser's YouTube channel. This post was inspired by his backend engineering videos. His explanations of networking internals, TCP, and server architecture are some of the best on YouTube.
- Beej's Guide to Network Programming. The best free resource for learning socket programming in C. Covers everything from basic sockets to select(), poll(), and advanced techniques.
- How TCP Backlog Works in Linux. A deep dive into the SYN queue vs accept queue distinction and how the backlog parameter actually behaves.
- Eli Bendersky's Concurrent Servers series. A four-part series that starts where this post ends, building progressively from threads to select() to epoll.