HTTP Server & Proxy
Raw sockets, no libraries
HTTP web server and caching proxy built from scratch with raw sockets.
No external libraries, no frameworks, no HTTP parsers — just Python's
standard socket module and hand-rolled HTTP/1.1 parsing.
What it is
Two cooperating Python programs that implement the HTTP/1.1
request/response cycle at the socket layer. webserver.py is an origin
server on 127.0.0.1:6789 that parses GET requests, resolves paths
against the working directory, detects MIME types, and streams file
bytes back. proxyserver.py is a forwarding HTTP proxy on
127.0.0.1:8888 with an on-disk cache: it accepts absolute-URI GET
requests, opens a fresh TCP socket to the origin on port 80, caches the
response to cache/, and serves subsequent requests for the same URL from
disk without touching the network. Everything lives below the http.server / urllib
abstractions — request lines are tokenized by hand, response headers
are concatenated strings, sockets are opened and closed explicitly.
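The parse-and-respond path described above can be sketched roughly as follows. This is an illustration only: the function names `parse_request_line` and `build_response` are hypothetical and not taken from webserver.py, and the real code may structure its error handling differently.

```python
import mimetypes


def parse_request_line(raw: bytes):
    """Tokenize the first line of a raw HTTP request by hand.

    Returns (method, target, version), or None if the line is malformed.
    """
    text = raw.decode("iso-8859-1")           # header bytes map 1:1 to chars
    request_line = text.split("\r\n", 1)[0]   # everything before the first CRLF
    parts = request_line.split()
    if len(parts) != 3:
        return None
    method, target, version = parts
    return method, target, version


def build_response(status: str, body: bytes, path: str = "") -> bytes:
    """Concatenate status line, headers, and body into one byte string."""
    # Fall back to a generic binary type when the extension is unknown.
    content_type = mimetypes.guess_type(path)[0] or "application/octet-stream"
    head = (
        f"HTTP/1.1 {status}\r\n"
        f"Content-Type: {content_type}\r\n"
        f"Content-Length: {len(body)}\r\n"
        "Connection: close\r\n"
        "\r\n"
    )
    return head.encode("ascii") + body
```

A server loop would call `parse_request_line` on the bytes returned by `recv`, reject anything other than `GET`, and pass the file bytes to `build_response` before calling `sendall`.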
By the numbers
| Metric | Value |
|---|---|
| Servers | 2 (origin + forwarding proxy) |
| External dependencies | 0 (standard library only) |
| Total LOC | 394 (176 server + 218 proxy) |
| Recv buffer | 1024 B (server) / 4096 B (proxy) |
| Upstream timeout | 10 s |
| Listen backlog | 5 |
| Term | Feb - Mar 2025 |
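To show where the backlog and timeout values in the table would typically appear, here is a minimal sketch of the socket setup. The helper names `make_listener` and `make_upstream_socket` are assumptions made for illustration and are not taken from the project's source.

```python
import socket

BACKLOG = 5            # listen backlog from the table
UPSTREAM_TIMEOUT = 10  # seconds, applied to proxy-to-origin sockets


def make_listener(host: str, port: int) -> socket.socket:
    """Create a bound, listening TCP socket with SO_REUSEADDR set."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # Allow quick restarts without waiting for TIME_WAIT to expire.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind((host, port))
    sock.listen(BACKLOG)
    return sock


def make_upstream_socket() -> socket.socket:
    """Create an unconnected TCP socket that times out instead of hanging."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.settimeout(UPSTREAM_TIMEOUT)
    return sock
```

Under these assumptions the origin server would call `make_listener("127.0.0.1", 6789)` and the proxy `make_listener("127.0.0.1", 8888)`, each followed by an `accept` loop that reads with its respective buffer size.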
Architecture
Client (curl / browser)
          |
          |  HTTP/1.1 GET
          v
+--------------------+   +----------------------+
|  webserver.py      |   |  proxyserver.py      |
|  127.0.0.1:6789    |   |  127.0.0.1:8888      |
|                    |   |                      |
|  recv -> parse ->  |   |  recv -> parse ->    |
|  resolve file ->   |   |  cache lookup ->     |
|  mimetype ->       |   |  HIT: serve cached   |
|  200 OK + bytes    |   |  MISS: forward ->    |
|  (or 404)          |   |  connect :80 ->      |
|                    |   |  recv -> cache ------|----> origin
+--------------------+   +----------------------+      (port 80)
Key features
- Raw socket programming: AF_INET + SOCK_STREAM with SO_REUSEADDR, explicit bind / listen(5) / accept loop, sendall for complete response delivery, and settimeout(10) on upstream connections to prevent hangs.
- Hand-rolled HTTP/1.1 parsing: splits requests on \r\n, tokenizes the request line into method + URI + version, validates GET, and constructs response headers as formatted strings with Content-Type, Content-Length, and Connection: close.
- MIME type detection: mimetypes.guess_type maps extensions to types (HTML, text, PNG, JPEG), falling back to application/octet-stream. Binary-safe file reads (rb) for images.
- On-disk proxy cache: cache key derived from URL with "/" replaced by "_". Cache HIT reads from disk and skips the upstream fetch; cache MISS fetches from origin, persists raw bytes, and forwards to the client. Bypassable via Cache-Control: no-cache.
- Upstream forwarding: rewrites absolute-URI requests as GET <path> HTTP/1.0\r\nHost: <hostname>\r\n..., sets Accept-Encoding: identity to avoid compressed responses, and sends a custom User-Agent: CSE310-Proxy.
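The cache and forwarding behaviour listed above can be sketched as a few small functions. Everything here is an illustration under stated assumptions: the function names are hypothetical, the key scheme assumes the `http://` prefix is stripped before "/" is replaced (the source only states the replacement), and the origin fetch is passed in as a callable so the hit/miss logic can be read on its own.

```python
import os

CACHE_DIR = "cache"  # directory name taken from the description above


def split_absolute_uri(uri):
    """Split 'http://host/path' into (host, path) using plain string ops."""
    rest = uri.split("://", 1)[-1]       # drop the scheme if present
    host, _, path = rest.partition("/")
    return host, "/" + path


def cache_key(uri):
    """Assumed key scheme: scheme stripped, then '/' replaced by '_'."""
    return uri.split("://", 1)[-1].replace("/", "_")


def build_upstream_request(uri):
    """Rewrite an absolute-URI request into an origin-form HTTP/1.0 request."""
    host, path = split_absolute_uri(uri)
    return (
        f"GET {path} HTTP/1.0\r\n"
        f"Host: {host}\r\n"
        "Accept-Encoding: identity\r\n"   # ask for an uncompressed body
        "User-Agent: CSE310-Proxy\r\n"
        "\r\n"
    ).encode("ascii")


def fetch_with_cache(uri, fetch_origin, cache_dir=CACHE_DIR, no_cache=False):
    """Return (response_bytes, 'HIT' or 'MISS').

    fetch_origin is any callable taking the URI and returning raw bytes;
    no_cache=True models a client sending Cache-Control: no-cache.
    """
    path = os.path.join(cache_dir, cache_key(uri))
    if not no_cache and os.path.isfile(path):
        with open(path, "rb") as fh:      # HIT: serve from disk
            return fh.read(), "HIT"
    data = fetch_origin(uri)              # MISS: go to the origin
    os.makedirs(cache_dir, exist_ok=True)
    with open(path, "wb") as fh:          # persist raw bytes for next time
        fh.write(data)
    return data, "MISS"
```

In a proxy built this way, `fetch_origin` would open a socket to the host on port 80, send the bytes from `build_upstream_request`, and read until the origin closes the connection.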
What makes it stand out
- Below the abstraction: no Flask, no http.server, no urllib, no requests. The wire format is visible in every line.
- End-to-end cache semantics: key derivation, hit/miss branching, disk persistence, and invalidation all implemented by hand.
- Validated against real traffic: tested with Chrome, Firefox, and curl -x, hitting gaia.cs.umass.edu/wireshark-labs/, the same HTTP-only targets used in the Kurose & Ross Wireshark labs.
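A check similar to the curl -x test can also be written at the socket level. The sketch below is not part of the project; `build_proxy_request` and `fetch_via_proxy` are hypothetical helpers, and it assumes the proxy is already running on 127.0.0.1:8888 as described earlier.

```python
import socket

PROXY_ADDR = ("127.0.0.1", 8888)  # proxy address from the description above


def build_proxy_request(url):
    """Build an absolute-URI GET request, the form a proxy expects."""
    host = url.split("://", 1)[-1].split("/", 1)[0]
    return (
        f"GET {url} HTTP/1.1\r\n"
        f"Host: {host}\r\n"
        "Connection: close\r\n"
        "\r\n"
    ).encode("ascii")


def fetch_via_proxy(url, proxy_addr=PROXY_ADDR, timeout=10.0):
    """Send the request through the proxy and read until it closes."""
    with socket.create_connection(proxy_addr, timeout=timeout) as sock:
        sock.sendall(build_proxy_request(url))
        chunks = []
        while True:
            chunk = sock.recv(4096)
            if not chunk:          # empty read means the peer closed
                break
            chunks.append(chunk)
    return b"".join(chunks)
```

Calling `fetch_via_proxy("http://gaia.cs.umass.edu/wireshark-labs/")` twice would be one way to observe a cache miss followed by a hit, assuming the proxy logs which path it took.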
Stack
| Layer | Technology |
|---|---|
| Language | Python 3 |
| Networking | socket (standard library) |
| File I/O | os (standard library) |
| MIME detection | mimetypes (standard library) |
| HTTP parser | Hand-rolled string splitting |
| Cache storage | Flat files in cache/ directory |
Built for CSE 310: Computer Networks at Stony Brook University (Spring 2025). The assignment required implementing the HTTP request/response cycle without any HTTP abstraction libraries.