Firaz Zakariya

Lithium-ion cells lose capacity as they cycle. For EVs, knowing how much capacity remains — and when a pack will fall below the threshold that makes a vehicle unusable — matters for resale value, warranty pricing, and second-life battery markets. This project is a reproducible benchmark of four model families on that prediction task.

Problem

Existing comparisons in the literature are hard to replicate: different datasets, different train/test splits, different error metrics. Before choosing a model architecture for a production system, I wanted an honest, apples-to-apples comparison on real driving profiles rather than lab constant-current cycling.

Approach

Data. [Describe the dataset(s) — e.g. a public EV field dataset, synthetic profiles generated from a physics model, or a combination. Key fields: voltage, current, temperature, state of charge over time, measured capacity per cycle.]
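Whatever the data source, the per-cycle measured capacity is typically recovered by coulomb counting — integrating discharge current over the cycle. A minimal sketch (the function name and the uniform-sampling assumption are mine, not from any specific dataset's tooling):

```python
import numpy as np

def cycle_capacity_ah(current_a: np.ndarray, dt_s: float) -> float:
    """Estimate discharged capacity for one cycle by coulomb counting.

    current_a: discharge current samples in amperes, uniformly sampled.
    dt_s: sampling interval in seconds.
    Returns capacity in ampere-hours.
    """
    # Rectangle-rule integration of current over time (A*s), converted to A*h.
    return float(np.sum(current_a) * dt_s / 3600.0)
```

For example, a constant 2 A discharge sampled at 1 Hz for one hour yields 2.0 Ah.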

Four model families compared:

  1. Statistical baseline — [e.g. exponential decay fit, Kalman filter, or linear regression on cycle features]. Interpretable and fast; the benchmark everything else must beat.
  2. Gradient-boosted (LightGBM) — hand-crafted features per cycle (capacity fade rate, internal resistance estimate, discharge curve statistics), trained with grouped cross-validation by cell to prevent leakage.
  3. LSTM — sequence model operating directly on raw voltage/current curves within each cycle. [Describe architecture, input window, training setup.]
  4. Physics-informed — [Describe: e.g. a neural ODE or a model that embeds the Arrhenius degradation equation as a constraint. What prior knowledge was encoded and why.]
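The grouped cross-validation mentioned for the LightGBM model is the part most worth getting exactly right: if cycles from one cell land in both train and validation folds, the model can memorise that cell's degradation trajectory and the scores become meaningless. A sketch of the split using scikit-learn's `GroupKFold` (synthetic features and labels here; any regressor, LightGBM included, would be trained inside the loop):

```python
import numpy as np
from sklearn.model_selection import GroupKFold

rng = np.random.default_rng(0)
n_cycles, n_cells = 200, 10
X = rng.normal(size=(n_cycles, 5))           # hand-crafted per-cycle features
y = rng.normal(size=n_cycles)                # e.g. measured capacity per cycle
cells = rng.integers(0, n_cells, n_cycles)   # which physical cell each cycle came from

# GroupKFold guarantees no cell's cycles appear in both train and
# validation, which is what prevents the leakage described above.
for train_idx, val_idx in GroupKFold(n_splits=5).split(X, y, groups=cells):
    assert set(cells[train_idx]).isdisjoint(cells[val_idx])
```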

Evaluation. Prognostic metrics: RMSE on remaining useful life, relative error at end-of-life prediction, and cross-dataset generalisation (train on one dataset, evaluate on another).
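The two headline metrics can be pinned down precisely; exact formulations vary across the prognostics literature, so these are common choices rather than the only ones (function names are mine):

```python
import numpy as np

def rul_rmse(rul_true, rul_pred) -> float:
    """Root-mean-square error of remaining-useful-life predictions, in cycles."""
    rul_true = np.asarray(rul_true, dtype=float)
    rul_pred = np.asarray(rul_pred, dtype=float)
    return float(np.sqrt(np.mean((rul_pred - rul_true) ** 2)))

def eol_relative_error(eol_true_cycle: float, eol_pred_cycle: float) -> float:
    """Relative error of the predicted end-of-life cycle number."""
    return abs(eol_pred_cycle - eol_true_cycle) / eol_true_cycle
```

For instance, predicting end-of-life at cycle 760 when it actually occurs at cycle 800 gives a 5 % relative error.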

Serving. The best-performing model is exposed through a small FastAPI REST service — POST a cell’s recent cycle data, get back a predicted RUL with a confidence interval.

Results

[Fill in your actual numbers. A table here is ideal — one row per model, columns for in-distribution RMSE, out-of-distribution RMSE, inference latency.]

| Model | In-distribution RMSE | Cross-dataset RMSE | Notes |
|---|---|---|---|
| Statistical baseline | | | |
| LightGBM | | | |
| LSTM | | | |
| Physics-informed | | | |

[Narrative: which model family won, where it failed, what the cross-dataset gap tells you about generalisation.]

Code

[GitHub URL] — includes data preprocessing, training scripts, evaluation notebooks, and the FastAPI service.