Energy Demand Forecasting
Weather-driven residential energy-demand modelling. One year of hourly smart-meter data fused with co-located weather observations, aggregated to daily resolution, and pushed through two supervised learners: OLS regression for continuous load forecasting and logistic regression for demand-response classification.
What it is
An end-to-end pipeline that quantifies how residential electricity consumption responds to local weather. The project answers two questions: how well daily total load can be predicted from weather alone (regression), and whether the same feature set can discriminate high-temperature days at a useful rate (classification, a proxy for demand-response dispatch). A third module disaggregates appliance-level telemetry (refrigerator, dryer) into day/night profiles, surfacing load-shifting candidates for time-of-use tariffs.
By the numbers
| Metric | Value |
|---|---|
| Daily rows | 365 (1 Jan 2014 – 31 Dec 2014) |
| Missingness | 0 |
| Linear Regression RMSE | 10.73 kW |
| Logistic Regression F1 | 0.5909 |
| Train / test split | 334 days (Jan–Nov) / 31 days (Dec hold-out) |
| Dryer day:night ratio | 4.29 |
| Fridge day:night ratio | 1.27 |
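The day:night ratios in the table are a daytime energy total divided by a night-time total for one circuit. A minimal sketch of that calculation, assuming hourly sub-meter readings in a pandas frame and taking the day window as hour bins 06 through 18, with every other hour counted as night. The column names `time` and `dryer_kw`, the helper `day_night_ratio`, and the data are all hypothetical and synthetic, not taken from the project.

```python
import pandas as pd


def day_night_ratio(hourly: pd.DataFrame, column: str) -> float:
    """Ratio of daytime to night-time energy for one circuit.

    Assumes a datetime column named `time` (hypothetical) and one reading
    per hour, so summing kW readings approximates kWh.
    """
    hours = hourly["time"].dt.hour
    is_day = hours.between(6, 18)  # inclusive on both ends: hour bins 06..18
    day_total = hourly.loc[is_day, column].sum()
    night_total = hourly.loc[~is_day, column].sum()  # hour bins 19..23 and 00..05
    return day_total / night_total


# Synthetic two-day series: 2 kW at 10:00 and 14:00, 1 kW at 22:00, else idle.
times = pd.date_range("2014-01-01", periods=48, freq=pd.Timedelta(hours=1))
demo = pd.DataFrame({"time": times, "dryer_kw": 0.0})
demo.loc[demo["time"].dt.hour.isin([10, 14]), "dryer_kw"] = 2.0
demo.loc[demo["time"].dt.hour == 22, "dryer_kw"] = 1.0

ratio = day_night_ratio(demo, "dryer_kw")
print(f"day:night ratio = {ratio:.2f}")  # 8 kWh by day / 2 kWh by night = 4.00
```

A ratio well above 1 marks a circuit that runs mostly in the day window, which is what makes it a candidate for shifting under a time-of-use tariff.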
Data pipeline
| Stream | Source | Cadence | Aggregation |
|---|---|---|---|
| Energy | Hourly smart meter (use [kW]) | 1 h | Sum to daily |
| Weather | DarkSky API snapshots | 5 min | Mean to daily |
- Load raw CSVs, validate path safety, coerce Unix timestamps to dates.
- Collapse hourly energy to daily totals; 5-min weather to daily means.
- Outer-merge on `date`. Result: 365 rows, zero nulls.
- Engineer predictors: temperature, dew point, pressure, humidity, cloud cover, precipitation intensity + probability, visibility, wind speed, wind bearing, Unix epoch time.
- Sine-cosine decomposition of wind bearing was tested and discarded (no gain).
- Z-standardise continuous predictors where the estimator requires it.
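The steps above could be sketched roughly as follows. This is an illustration under stated assumptions, not the project's code: the helper `to_daily`, the column names `time` and `temperature`, and the two tiny synthetic frames are hypothetical stand-ins for the raw CSVs, and only one weather column is shown.

```python
import numpy as np
import pandas as pd


def to_daily(df: pd.DataFrame, how: str) -> pd.DataFrame:
    """Coerce a Unix-seconds `time` column to dates and aggregate per day."""
    out = df.copy()
    out["date"] = pd.to_datetime(out["time"], unit="s").dt.date
    return out.drop(columns="time").groupby("date").agg(how).reset_index()


# Synthetic stand-ins for the two raw streams, two days each.
start = 1388534400  # 2014-01-01 00:00:00 UTC
energy = pd.DataFrame({
    "time": start + 3600 * np.arange(48),   # hourly cadence
    "use [kW]": np.ones(48),
})
weather = pd.DataFrame({
    "time": start + 300 * np.arange(576),   # 5-minute cadence
    "temperature": np.r_[np.full(288, 10.0), np.full(288, 20.0)],
})

daily_energy = to_daily(energy, "sum")      # hourly readings -> daily totals
daily_weather = to_daily(weather, "mean")   # 5-min snapshots -> daily means

# Outer merge keeps days present in either stream; nulls would expose gaps.
daily = daily_energy.merge(daily_weather, on="date", how="outer")
n_missing = int(daily.isna().sum().sum())

# Z-standardise a continuous predictor (population std, ddof=0).
temp = daily["temperature"]
daily["temperature_z"] = (temp - temp.mean()) / temp.std(ddof=0)

print(daily)
print("missing cells:", n_missing)
```

Using an outer rather than inner merge means a day missing from one stream shows up as a null row instead of silently disappearing, so the zero-null check doubles as a coverage check.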
Key features
- Linear regression for continuous load: OLS on daily totals, 334-day train / 31-day forward hold-out. RMSE 10.73 kW. December residuals concentrate around holiday weeks, indicating missing calendar covariates.
- Logistic regression for demand-response classification: positive class is temperature >= 35 C; the 24/76 class prior is preserved without SMOTE; `liblinear` solver at `max_iter=1000`. F1 0.5909. F1 was chosen over accuracy because the skewed prior would reward the majority-class trivial classifier.
- Appliance load profiling: per-circuit sub-meter aggregation partitioned into day (06:00–18:00) vs night (19:00–05:00). The dryer draws 981.84 kWh by day vs 228.77 kWh by night, a 4.29x daytime skew and the headline demand-response candidate.
- Deterministic artefacts: `energy_usage_predictions_linear.csv` and `high_temperature_classification_logistic.csv` regenerate identically on each run.
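A rough sketch of the two estimators and the forward hold-out, run on synthetic data so it is self-contained. The frame, the column names (`dew_point`, `humidity`, `load`), the feature lists, and the use of a `StandardScaler` pipeline are assumptions for illustration. In particular, the source does not say which columns feed the classifier; this sketch leaves temperature out because the label is derived from it. Only the Jan–Nov / December split, the `liblinear` solver, `max_iter=1000`, and the 35-degree threshold come from the description above.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression, LogisticRegression
from sklearn.metrics import f1_score, mean_squared_error
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic daily frame for 2014 (365 rows); values are made up.
rng = np.random.default_rng(0)
dates = pd.date_range("2014-01-01", "2014-12-31")
temperature = rng.normal(30, 7, len(dates))
daily = pd.DataFrame({
    "date": dates,
    "temperature": temperature,
    "dew_point": temperature - rng.normal(5, 2, len(dates)),
    "humidity": rng.uniform(0.3, 0.9, len(dates)),
})
daily["load"] = 50 + 1.5 * daily["temperature"] + rng.normal(0, 5, len(dates))

# Forward hold-out: train on Jan-Nov, test on December, no shuffling.
is_test = daily["date"] >= pd.Timestamp("2014-12-01")
train, test = daily[~is_test], daily[is_test]

# Regression: OLS on weather predictors, scored by RMSE on the hold-out.
reg_features = ["temperature", "dew_point", "humidity"]
ols = LinearRegression().fit(train[reg_features], train["load"])
rmse = float(np.sqrt(mean_squared_error(test["load"], ols.predict(test[reg_features]))))

# Classification: high-temperature day as the positive class, no resampling.
clf_features = ["dew_point", "humidity"]  # assumption: label source excluded
y_train = (train["temperature"] >= 35).astype(int)
y_test = (test["temperature"] >= 35).astype(int)
clf = make_pipeline(
    StandardScaler(),  # liblinear is regularised, so scale the inputs first
    LogisticRegression(solver="liblinear", max_iter=1000),
)
clf.fit(train[clf_features], y_train)
f1 = float(f1_score(y_test, clf.predict(test[clf_features]), zero_division=0))

print(f"train={len(train)} test={len(test)} RMSE={rmse:.2f} F1={f1:.3f}")
```

Splitting by date rather than at random keeps the evaluation honest for a forecasting task: the model never sees days that come after the ones it is scored on.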
What makes it stand out
- Weather-only baseline, honestly scored. Linear methods were chosen deliberately to isolate the weather-to-load signal before introducing non-linear or sequence estimators.
- Imbalanced classification handled without resampling. Class prior preserved; metric chosen to match the decision problem rather than the default.
- Actionable disaggregation. The 4.29x dryer day:night ratio is a concrete load-shifting target, not a vanity chart.
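The metric choice can be illustrated with a small worked example. It assumes the 24/76 prior means 24% positive (high-temperature) days, which is an interpretation rather than something the source states outright, and scores a trivial classifier that always predicts the majority class. The helper functions are written out by hand to keep the arithmetic visible.

```python
def accuracy(y_true, y_pred):
    """Fraction of labels predicted correctly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1(y_true, y_pred):
    """F1 for the positive class (label 1); 0.0 when there are no true positives."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# 100 illustrative days with a 24/76 prior (assumed: 24 positives).
y_true = [1] * 24 + [0] * 76
always_negative = [0] * 100  # trivial majority-class classifier

trivial_accuracy = accuracy(y_true, always_negative)
trivial_f1 = f1(y_true, always_negative)
print(trivial_accuracy)  # 0.76
print(trivial_f1)        # 0.0
```

The trivial classifier looks respectable on accuracy while never flagging a single high-temperature day, so only F1 exposes that it is useless for dispatch decisions.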
Stack
| Layer | Technology |
|---|---|
| Language | Python 3.11+ |
| Data | pandas >= 2.2, numpy >= 1.26 |
| Modelling | scikit-learn >= 1.5 |
| Visualisation | matplotlib >= 3.9 |
| Notebook | jupyterlab >= 4.1 |