
HTTP Server & Proxy

Raw sockets, no libraries

stable

An HTTP web server and caching proxy built from scratch on raw sockets: no external libraries, no frameworks, no HTTP parsers, just Python's standard socket module and hand-rolled HTTP/1.1 parsing.

What it is

Two independent Python programs that implement the HTTP/1.1 request/response cycle at the socket layer.

webserver.py is an origin server on 127.0.0.1:6789. It parses GET requests, resolves paths against the working directory, detects MIME types, and sends the file bytes back (or a 404 if the file does not exist).

proxyserver.py is a forwarding HTTP proxy on 127.0.0.1:8888 with an on-disk cache. It accepts absolute-URI GET requests, opens a fresh TCP socket to the origin on port 80, saves the response under cache/, and answers later requests for the same URL from disk without touching the network.

Everything lives below the http.server / urllib abstractions: request lines are tokenized by hand, response headers are built as concatenated strings, and sockets are opened and closed explicitly.
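A minimal sketch of the hand-rolled parsing and response construction described above, assuming the request has already been read into a bytes buffer. The helper names (parse_request_line, build_response, content_type_for) are hypothetical and not taken from webserver.py, and decoding header bytes as ISO-8859-1 is an assumption.

```python
import mimetypes

# Hypothetical helpers illustrating the hand-rolled approach; not the
# actual functions in webserver.py.


def parse_request_line(raw: bytes):
    """Split a raw request on CRLF and tokenize its first line.

    Returns (method, uri, version), or None if the line is malformed.
    """
    text = raw.decode("iso-8859-1")  # assumption: header bytes decoded as Latin-1
    request_line = text.split("\r\n")[0]
    parts = request_line.split()
    if len(parts) != 3:
        return None
    method, uri, version = parts
    return method, uri, version


def build_response(status: str, body: bytes, content_type: str) -> bytes:
    """Concatenate the status line and headers by hand, then append the body."""
    header = (
        f"HTTP/1.1 {status}\r\n"
        f"Content-Type: {content_type}\r\n"
        f"Content-Length: {len(body)}\r\n"
        "Connection: close\r\n"
        "\r\n"  # blank line ends the header block
    )
    return header.encode("ascii") + body


def content_type_for(path: str) -> str:
    """Guess a MIME type from the file extension, defaulting to octet-stream."""
    mime, _encoding = mimetypes.guess_type(path)
    return mime or "application/octet-stream"
```

For example, parse_request_line(b"GET /index.html HTTP/1.1\r\nHost: 127.0.0.1:6789\r\n\r\n") returns ('GET', '/index.html', 'HTTP/1.1'); a missing file would be answered with something like build_response("404 Not Found", body, "text/html").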

By the numbers

Metric | Value
Servers | 2 (origin + forwarding proxy)
External dependencies | 0 (standard library only)
Total LOC | 394 (176 server + 218 proxy)
Recv buffer | 1024 B (server) / 4096 B (proxy)
Upstream timeout | 10 s (proxy)
Listen backlog | 5
Term | Feb-Mar 2025
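The socket parameters in the table map onto a handful of standard-library calls. The sketch below assumes the origin server handles one connection at a time and shows the bind / listen / accept / recv / sendall sequence with a backlog of 5 and a 1024-byte recv. make_listener, serve_one, serve_in_background, and http_get_raw are hypothetical names, and the echo body is a stand-in for real file serving.

```python
import socket
import threading

LISTEN_BACKLOG = 5   # matches the listen backlog in the table
RECV_BUFSIZE = 1024  # matches the origin server's recv buffer


def make_listener(host: str = "127.0.0.1", port: int = 6789) -> socket.socket:
    """TCP listening socket with SO_REUSEADDR so restarts can rebind quickly."""
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    srv.bind((host, port))
    srv.listen(LISTEN_BACKLOG)
    return srv


def serve_one(srv: socket.socket) -> None:
    """Accept one connection, read one request chunk, reply, then close."""
    conn, _addr = srv.accept()
    with conn:
        # A single recv is enough for a short GET on loopback; a more robust
        # server would keep reading until it sees the blank line (\r\n\r\n).
        request = conn.recv(RECV_BUFSIZE)
        body = b"echo: " + request.split(b"\r\n", 1)[0]
        header = (
            "HTTP/1.1 200 OK\r\n"
            "Content-Type: text/plain\r\n"
            f"Content-Length: {len(body)}\r\n"
            "Connection: close\r\n\r\n"
        ).encode("ascii")
        conn.sendall(header + body)  # sendall keeps sending until every byte is out


def serve_in_background(srv: socket.socket) -> threading.Thread:
    """Run serve_one on a worker thread so a client can connect from this one."""
    worker = threading.Thread(target=serve_one, args=(srv,), daemon=True)
    worker.start()
    return worker


def http_get_raw(host: str, port: int, path: str, timeout: float = 10.0) -> bytes:
    """Send a minimal GET and read until the server closes the connection."""
    with socket.create_connection((host, port), timeout=timeout) as client:
        client.sendall(f"GET {path} HTTP/1.1\r\nHost: {host}\r\n\r\n".encode("ascii"))
        chunks = []
        while True:
            chunk = client.recv(4096)
            if not chunk:  # empty read: server closed (Connection: close)
                break
            chunks.append(chunk)
    return b"".join(chunks)
```

Passing port=0 to make_listener lets the OS choose a free port, which is convenient for local testing; the server described here binds 6789.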

Architecture

Client (curl / browser)
   |
   |  HTTP/1.1 GET
   v
+--------------------+         +----------------------+
|  webserver.py      |         |  proxyserver.py      |
|  127.0.0.1:6789    |         |  127.0.0.1:8888      |
|                    |         |                      |
|  recv -> parse ->  |         |  recv -> parse ->    |
|  resolve file ->   |         |  cache lookup ->     |
|  mimetype ->       |         |    HIT: serve cached |
|  200 OK + bytes    |         |    MISS: forward ->  |
|   (or 404)         |         |      connect :80 ->  |
|                    |         |      recv -> cache  -|----> origin
+--------------------+         +----------------------+       (port 80)
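The HIT/MISS branch in the proxy half of the diagram can be sketched as follows. This assumes the upstream round trip is passed in as a callable (the fetch parameter is hypothetical, used so the cache logic runs without a network) and that a no-cache request refetches and overwrites the stored copy, which the description does not specify.

```python
import os
import tempfile
from typing import Callable

CACHE_DIR = "cache"  # flat-file cache directory, as in the description


def cache_key(url: str) -> str:
    """Flatten the URL into a filename by replacing every '/' with '_'."""
    return url.replace("/", "_")


def get_with_cache(url: str, fetch: Callable[[str], bytes],
                   cache_dir: str = CACHE_DIR, no_cache: bool = False):
    """Return (response_bytes, "HIT" or "MISS").

    `fetch` is a hypothetical hook standing in for the upstream socket round trip.
    """
    os.makedirs(cache_dir, exist_ok=True)
    path = os.path.join(cache_dir, cache_key(url))
    if not no_cache and os.path.isfile(path):
        with open(path, "rb") as f:  # HIT: serve stored bytes, no network
            return f.read(), "HIT"
    data = fetch(url)                # MISS (or bypass): go to the origin
    with open(path, "wb") as f:      # assumption: a bypass also refreshes the copy
        f.write(data)
    return data, "MISS"


if __name__ == "__main__":
    def fake_fetch(url: str) -> bytes:
        print("upstream fetch:", url)
        return b"HTTP/1.0 200 OK\r\nContent-Length: 2\r\n\r\nhi"

    with tempfile.TemporaryDirectory() as tmp:
        url = "http://gaia.cs.umass.edu/wireshark-labs/"
        print(get_with_cache(url, fake_fetch, tmp)[1])  # MISS, after one upstream fetch
        print(get_with_cache(url, fake_fetch, tmp)[1])  # HIT, served from disk
```

One property of the / to _ key scheme: distinct URLs such as .../a/b and .../a_b map to the same file, so a hash of the URL would be a safer key if collisions matter.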

Key features

  • Raw socket programming: AF_INET + SOCK_STREAM sockets with SO_REUSEADDR, an explicit bind / listen(5) / accept loop, sendall for complete response delivery, and settimeout(10) on upstream connections to prevent hangs.
  • Hand-rolled HTTP/1.1 parsing: requests are split on \r\n, the request line is tokenized into method, URI, and version, only GET is accepted, and response headers (Content-Type, Content-Length, Connection: close) are built as formatted strings.
  • MIME type detection: mimetypes.guess_type maps file extensions to types (HTML, text, PNG, JPEG), falling back to application/octet-stream. Files are read in binary mode (rb), so images are sent byte-for-byte.
  • On-disk proxy cache: the cache key is the URL with every / replaced by _. A cache HIT reads the stored response from disk and skips the upstream fetch; a cache MISS fetches from the origin, persists the raw bytes, and forwards them to the client. Requests carrying Cache-Control: no-cache bypass the cache.
  • Upstream forwarding: absolute-URI requests are rewritten as GET <path> HTTP/1.0\r\nHost: <hostname>\r\n..., with Accept-Encoding: identity to avoid compressed responses and a custom User-Agent: CSE310-Proxy.
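A sketch of the upstream rewrite from the last bullet, assuming plain http:// absolute URIs with an optional explicit port that defaults to 80. The URI is split by hand rather than with urllib, in keeping with the project's constraints. split_absolute_uri, build_upstream_request, and fetch_upstream are hypothetical names, and only the headers named above are sent.

```python
import socket


def split_absolute_uri(uri: str) -> tuple[str, int, str]:
    """Hand-split 'http://host[:port]/path' into (host, port, path)."""
    prefix = "http://"
    if not uri.lower().startswith(prefix):
        raise ValueError("only http:// absolute URIs are supported")
    rest = uri[len(prefix):]
    hostport, _, path = rest.partition("/")
    path = "/" + path  # partition drops the first '/', so put it back ('' -> '/')
    host, _, port_text = hostport.partition(":")
    port = int(port_text) if port_text else 80  # assumption: default HTTP port
    return host, port, path


def build_upstream_request(uri: str) -> tuple[str, int, bytes]:
    """Rewrite an absolute-URI GET into an origin-form HTTP/1.0 request."""
    host, port, path = split_absolute_uri(uri)
    host_header = host if port == 80 else f"{host}:{port}"
    lines = [
        f"GET {path} HTTP/1.0",
        f"Host: {host_header}",
        "Accept-Encoding: identity",  # ask the origin for an uncompressed body
        "User-Agent: CSE310-Proxy",
    ]
    return host, port, ("\r\n".join(lines) + "\r\n\r\n").encode("ascii")


def fetch_upstream(uri: str, bufsize: int = 4096, timeout: float = 10.0) -> bytes:
    """Open a fresh TCP connection, send the request, read until the origin closes."""
    host, port, request = build_upstream_request(uri)
    # create_connection applies `timeout` to the connect and to every later recv,
    # which has the same effect as calling settimeout(10) on the socket.
    with socket.create_connection((host, port), timeout=timeout) as upstream:
        upstream.sendall(request)
        chunks = []
        while True:
            chunk = upstream.recv(bufsize)
            if not chunk:  # empty read: the origin closed the connection
                break
            chunks.append(chunk)
    return b"".join(chunks)
```

Because the upstream request is HTTP/1.0, the origin closes the connection after the response, so reading until an empty recv is enough to collect the whole reply without parsing Content-Length.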

What makes it stand out

  • Below the abstraction: no Flask, no http.server, no urllib, no requests. The wire format is visible in every line.
  • End-to-end cache semantics: key derivation, hit/miss branching, disk persistence, and the no-cache bypass are all implemented by hand.
  • Validated against real traffic: tested with Chrome, Firefox, and curl -x against gaia.cs.umass.edu/wireshark-labs/, the same HTTP-only targets used in the Kurose & Ross Wireshark labs.

Stack

Layer | Technology
Language | Python 3
Networking | socket (standard library)
File I/O | os (standard library)
MIME detection | mimetypes (standard library)
HTTP parser | Hand-rolled string splitting
Cache storage | Flat files in cache/ directory

Built for CSE 310: Computer Networks at Stony Brook University (Spring 2025). The assignment required implementing the HTTP request/response cycle without any HTTP abstraction libraries.