Energy Demand Forecasting

Weather-driven residential modelling

stable

Weather-driven residential energy-demand modelling. One year of hourly smart-meter data fused with co-located weather observations, aggregated to daily resolution, and pushed through two supervised learners: OLS regression for continuous load forecasting and logistic regression for demand-response classification.

What it is

An end-to-end pipeline that quantifies how residential electricity consumption responds to local weather. The project answers two questions: how well daily total load can be predicted from weather alone (regression), and whether the same feature set can discriminate high-temperature days at a useful rate (classification, a proxy for demand-response dispatch). A third module disaggregates appliance-level telemetry (refrigerator, dryer) into day/night profiles, surfacing load-shifting candidates for time-of-use tariffs.

By the numbers

Metric                      Value
Daily rows                  365 (1 Jan 2014 – 31 Dec 2014)
Missing values              0
Linear regression RMSE      10.73 kW
Logistic regression F1      0.5909
Train / test split          334 days (Jan–Nov) / 31 days (Dec hold-out)
Dryer day:night ratio       4.29
Fridge day:night ratio      1.27
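
The split and RMSE rows above can be sketched as follows. This is a minimal illustration on synthetic data, not the project's code: the column names (`date`, `temperature`, `humidity`, `use_kw`) and the generated values are assumptions, so the printed RMSE will not match the reported 10.73 kW.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

# Synthetic stand-in for the merged daily table (hypothetical columns and values).
rng = np.random.default_rng(0)
dates = pd.date_range("2014-01-01", "2014-12-31", freq="D")
daily = pd.DataFrame({
    "date": dates,
    "temperature": rng.normal(15.0, 10.0, len(dates)),
    "humidity": rng.uniform(0.3, 0.9, len(dates)),
})
daily["use_kw"] = 30.0 + 0.5 * daily["temperature"] + rng.normal(0.0, 5.0, len(dates))

# Chronological split: January to November for training, December held out.
is_december = daily["date"].dt.month == 12
train, test = daily[~is_december], daily[is_december]

features = ["temperature", "humidity"]
model = LinearRegression().fit(train[features], train["use_kw"])
pred = model.predict(test[features])

# RMSE is the square root of the mean squared error, in the target's units.
rmse = float(np.sqrt(mean_squared_error(test["use_kw"], pred)))
print(len(train), len(test), round(rmse, 2))
```

Splitting by calendar month rather than at random keeps the hold-out strictly in the future of the training data, which is what a forward forecast needs.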

Data pipeline

Stream    Source                           Cadence    Aggregation
Energy    Hourly smart meter (use [kW])    1 h        Sum to daily
Weather   DarkSky API snapshots            5 min      Mean to daily

  1. Load raw CSVs, validate path safety, coerce Unix timestamps to dates.
  2. Collapse hourly energy to daily totals; 5-min weather to daily means.
  3. Outer-merge on date. Result: 365 rows, zero nulls.
  4. Engineer predictors: temperature, dew point, pressure, humidity, cloud cover, precipitation intensity + probability, visibility, wind speed, wind bearing, Unix epoch time.
  5. Sine-cosine decomposition of wind bearing was tested and discarded (no gain).
  6. Z-standardise continuous predictors where the estimator requires it.
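
Steps 1 to 3 and the wind-bearing encoding from step 5 can be sketched roughly as below. The frames are synthetic stand-ins for the raw CSVs; the `time` and `temperature` column names and the helper names `with_date` and `encode_bearing` are assumptions for illustration, and path validation is omitted.

```python
import numpy as np
import pandas as pd

T0 = 1_388_534_400  # 2014-01-01 00:00:00 UTC in Unix seconds

# Synthetic stand-ins for the raw CSVs: two days of hourly energy readings
# and two days of 5-minute weather snapshots. Column names are assumptions.
energy = pd.DataFrame({
    "time": T0 + 3600 * np.arange(48),
    "use [kW]": np.full(48, 1.5),
})
weather = pd.DataFrame({
    "time": T0 + 300 * np.arange(576),
    "temperature": np.tile([10.0, 20.0], 288),
})


def with_date(df: pd.DataFrame) -> pd.DataFrame:
    """Step 1: coerce Unix seconds to a calendar date (midnight timestamp)."""
    out = df.copy()
    out["date"] = pd.to_datetime(out["time"], unit="s").dt.normalize()
    return out


# Step 2: sum energy per day, average weather per day.
daily_energy = with_date(energy).groupby("date", as_index=False)["use [kW]"].sum()
daily_weather = with_date(weather).groupby("date", as_index=False)["temperature"].mean()

# Step 3: outer merge on date, then count nulls to confirm full overlap.
daily = daily_energy.merge(daily_weather, on="date", how="outer")
null_count = int(daily.isna().sum().sum())


def encode_bearing(degrees):
    """Step 5: map a bearing to (sin, cos) so 359 and 1 degrees sit close together."""
    radians = np.deg2rad(degrees)
    return np.sin(radians), np.cos(radians)


print(daily)
print("nulls:", null_count)
```

An outer merge keeps days that appear in only one stream, so a zero null count after the merge is evidence that both streams cover the same dates.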

Key features

  • Linear regression for continuous load: OLS on daily totals with a 334-day training set and a 31-day forward hold-out. RMSE is 10.73 kW. December residuals concentrate around holiday weeks, which points to missing calendar covariates.
  • Logistic regression for demand-response classification: the positive class is temperature >= 35 C, the 24/76 class prior is preserved without SMOTE, and the solver is liblinear with max_iter=1000. F1 is 0.5909. F1 was chosen over accuracy because, under this skewed prior, a trivial classifier that always predicts the majority class would score 76% accuracy while never flagging a high-temperature day.
  • Appliance load profiling: per-circuit sub-meter aggregation partitioned into day (06:00–18:00) and night (19:00–05:00). The dryer draws 981.84 kWh by day against 228.77 kWh by night, a 4.29x daytime skew and the headline demand-response candidate.
  • Deterministic artefacts: energy_usage_predictions_linear.csv and high_temperature_classification_logistic.csv regenerate identically on each run.
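
The day/night partition in the appliance bullet can be illustrated with a short sketch. It assumes hourly readings labelled by their starting hour, inclusive window bounds (hours 6 to 18 as day, all remaining hours as night), and hypothetical column names `timestamp` and `dryer_kw`; the readings are synthetic. Only the final line uses the dryer totals reported above.

```python
import numpy as np
import pandas as pd

# Synthetic hourly sub-meter readings for one circuit over two days.
idx = pd.date_range("2014-01-01", periods=48, freq="h")
in_day_window = (idx.hour >= 6) & (idx.hour <= 18)
dryer = pd.DataFrame({
    "timestamp": idx,
    "dryer_kw": np.where(in_day_window, 2.0, 0.5),
})

# Day covers hours 06 to 18 inclusive; every other hour counts as night.
is_day = dryer["timestamp"].dt.hour.between(6, 18)

# With one reading per hour, summing kW readings gives energy in kWh.
day_kwh = float(dryer.loc[is_day, "dryer_kw"].sum())
night_kwh = float(dryer.loc[~is_day, "dryer_kw"].sum())
synthetic_ratio = day_kwh / night_kwh

# The same division applied to the reported dryer totals.
reported_ratio = round(981.84 / 228.77, 2)
print(day_kwh, night_kwh, round(synthetic_ratio, 2), reported_ratio)
```

Under these window bounds the day bucket holds 13 hours and the night bucket 11, so a flat load would give a ratio of roughly 1.18 rather than 1.0; that offset is worth keeping in mind when reading the ratios.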

What makes it stand out

  • Weather-only baseline, honestly scored. Linear methods were chosen deliberately to isolate the weather-to-load signal before introducing non-linear or sequence estimators.
  • Imbalanced classification handled without resampling. Class prior preserved; metric chosen to match the decision problem rather than the default.
  • Actionable disaggregation. The 4.29x dryer day:night ratio is a concrete load-shifting target, not a vanity chart.
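
The metric choice in the second point can be made concrete with a small sketch. It builds a synthetic 24/76 problem (feature values and sample size are invented), scores an always-negative baseline, and then fits a standardised logistic regression with the liblinear solver and max_iter=1000 as described above. Everything else, including the simple 800/200 split, is an assumption for illustration.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic labels with a 24/76 class prior (240 positives out of 1000).
rng = np.random.default_rng(0)
n = 1000
y = np.zeros(n, dtype=int)
y[:240] = 1
rng.shuffle(y)

# Two synthetic features: one informative, one pure noise.
X = np.column_stack([1.5 * y + rng.normal(0.0, 1.0, n), rng.normal(0.0, 1.0, n)])

# Trivial baseline: always predict the majority (negative) class.
majority = np.zeros_like(y)
base_acc = accuracy_score(y, majority)
base_f1 = f1_score(y, majority, zero_division=0)

# Z-standardise, then fit logistic regression without any resampling.
clf = make_pipeline(
    StandardScaler(),
    LogisticRegression(solver="liblinear", max_iter=1000),
)
clf.fit(X[:800], y[:800])
pred = clf.predict(X[800:])
model_f1 = f1_score(y[800:], pred, zero_division=0)

print("baseline accuracy:", base_acc, "baseline F1:", base_f1)
print("model F1 on hold-out:", round(model_f1, 4))
```

The baseline reaches 0.76 accuracy with an F1 of zero, which is why accuracy alone would hide a classifier that never detects the positive class.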

Stack

Layer            Technology
Language         Python 3.11+
Data             pandas >= 2.2, numpy >= 1.26
Modelling        scikit-learn >= 1.5
Visualisation    matplotlib >= 3.9
Notebook         jupyterlab >= 4.1