Lithium-ion cells lose capacity as they cycle. For EVs, knowing how much capacity remains — and when a pack will cross the end-of-life threshold below which the vehicle is no longer practical to use — matters for resale value, warranty pricing, and second-life battery markets. This project is a reproducible benchmark of four model families on that prediction task.
Problem
Existing comparisons in the literature are hard to replicate: different datasets, different train/test splits, different error metrics. Before choosing a model architecture for a production system, I wanted an honest, apples-to-apples comparison on real driving profiles rather than lab constant-current cycling.
Approach
Data. [Describe the dataset(s) — e.g. a public EV field dataset, synthetic profiles generated from a physics model, or a combination. Key fields: voltage, current, temperature, state of charge over time, measured capacity per cycle.]
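As a concrete placeholder for the layout, here is a minimal sketch of how the per-cycle time series and capacity labels might be loaded. The file paths and column names are assumptions, to be replaced by the actual dataset's schema.

```python
import pandas as pd

# Hypothetical schema for the per-cycle time series; all names below are
# placeholders to be matched against whichever dataset is actually used.
cycles = pd.read_parquet("data/cycles.parquet")  # assumed file and format
# One row per sample within a cycle:
#   cell_id, cycle_index, timestamp, voltage_v, current_a, temperature_c, soc
# One measured capacity label per (cell_id, cycle_index):
labels = pd.read_parquet("data/capacity_per_cycle.parquet")
```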
Four model families compared:
- Statistical baseline — [e.g. exponential decay fit, Kalman filter, or linear regression on cycle features]. Interpretable and fast; the benchmark everything else must beat (a fitting sketch follows this list).
- Gradient-boosted (LightGBM) — hand-crafted features per cycle (capacity fade rate, internal resistance estimate, discharge curve statistics), trained with grouped cross-validation by cell to prevent leakage (see the cross-validation sketch after this list).
- LSTM — sequence model operating directly on raw voltage/current curves within each cycle. [Describe architecture, input window, training setup.] A minimal architecture sketch follows the list.
- Physics-informed — [Describe: e.g. a neural ODE or a model that embeds the Arrhenius degradation equation as a constraint. What prior knowledge was encoded and why.]
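To make the baseline concrete, here is a minimal sketch of one of the listed options, an exponential decay fit: fit the fade curve to the observed capacity history with scipy, then extrapolate to an assumed end-of-life threshold (80% of initial capacity here) to get a remaining-useful-life estimate. The threshold, starting values, and function names are illustrative, not the benchmarked configuration.

```python
import numpy as np
from scipy.optimize import curve_fit

def capacity_model(cycle, q0, k):
    """Exponential fade: capacity = q0 * exp(-k * cycle)."""
    return q0 * np.exp(-k * cycle)

def predict_rul(cycles, capacities, eol_fraction=0.8):
    """Fit the fade curve to the observed history, then extrapolate to the
    cycle at which capacity drops below eol_fraction of the fitted initial
    capacity. Returns remaining cycles from the last observed cycle."""
    (q0, k), _ = curve_fit(capacity_model, cycles, capacities,
                           p0=(capacities[0], 1e-3))
    # Solve q0 * exp(-k * c) = eol_fraction * q0 for c.
    eol_cycle = np.log(1.0 / eol_fraction) / k
    return max(eol_cycle - cycles[-1], 0.0)
```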
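The leakage-prevention point for the gradient-boosted model is worth spelling out. Below is a sketch of grouped cross-validation with scikit-learn's GroupKFold, where every fold keeps whole cells together so no cell contributes cycles to both train and validation. Feature construction is assumed to happen upstream, and the hyperparameters are placeholders.

```python
import lightgbm as lgb
import numpy as np
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import GroupKFold

def grouped_cv_rmse(X, y, groups, n_splits=5):
    """X: numpy array of per-cycle hand-crafted features, y: RUL targets,
    groups: the cell id each cycle came from. Cells never straddle folds."""
    rmses = []
    for train_idx, val_idx in GroupKFold(n_splits=n_splits).split(X, y, groups):
        model = lgb.LGBMRegressor(n_estimators=500, learning_rate=0.05)
        model.fit(X[train_idx], y[train_idx])
        pred = model.predict(X[val_idx])
        rmses.append(mean_squared_error(y[val_idx], pred) ** 0.5)
    return float(np.mean(rmses))
```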
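For the LSTM, a minimal PyTorch sketch of the kind of model meant here, assuming each cycle's raw curves are resampled to a fixed number of time steps with voltage, current, and temperature channels. Layer sizes and the input window are placeholders, not the configuration that was actually trained.

```python
import torch
import torch.nn as nn

class CycleLSTM(nn.Module):
    """Maps a fixed-length per-cycle curve to a single RUL estimate."""
    def __init__(self, n_features=3, hidden_size=64, num_layers=2):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden_size, num_layers, batch_first=True)
        self.head = nn.Linear(hidden_size, 1)

    def forward(self, x):              # x: (batch, time_steps, n_features)
        _, (h_n, _) = self.lstm(x)     # h_n: (num_layers, batch, hidden_size)
        return self.head(h_n[-1]).squeeze(-1)   # (batch,) predicted RUL
```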
Evaluation. Prognostic metrics: RMSE on remaining useful life, relative error at end-of-life prediction, and cross-dataset generalisation (train on one dataset, evaluate on another).
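A small sketch of the first two metrics as defined above; inputs are assumed to be aligned arrays of true and predicted values, in cycles.

```python
import numpy as np

def rul_rmse(rul_true, rul_pred):
    """Root-mean-square error on remaining useful life, in cycles."""
    rul_true, rul_pred = np.asarray(rul_true), np.asarray(rul_pred)
    return float(np.sqrt(np.mean((rul_pred - rul_true) ** 2)))

def eol_relative_error(eol_true, eol_pred):
    """Relative error of the predicted end-of-life cycle for one cell."""
    return abs(eol_pred - eol_true) / eol_true
```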
FastAPI. The best-performing model is served via a small REST API — POST a cell’s recent cycle data, get back a predicted RUL and confidence interval.
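A minimal sketch of what such an endpoint could look like; the route, request fields, and the constant stand-in prediction are all placeholders rather than the actual service in the repo.

```python
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class CycleHistory(BaseModel):
    # Placeholder request schema: recent per-cycle summaries for one cell.
    cell_id: str
    cycle_index: list[int]
    capacity_ah: list[float]

class RULResponse(BaseModel):
    rul_cycles: float
    ci_low: float
    ci_high: float

@app.post("/predict", response_model=RULResponse)
def predict(history: CycleHistory) -> RULResponse:
    # In the real service this would call the best-performing model;
    # a dummy constant stands in for the prediction here.
    rul = 250.0
    return RULResponse(rul_cycles=rul, ci_low=rul - 40, ci_high=rul + 40)
```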
Results
[Fill in your actual numbers. A table here is ideal — one row per model, columns for in-distribution RMSE, cross-dataset RMSE, and inference latency.]
| Model | In-distribution RMSE | Cross-dataset RMSE | Inference latency |
|---|---|---|---|
| Statistical baseline | — | — | — |
| LightGBM | — | — | — |
| LSTM | — | — | — |
| Physics-informed | — | — | — |
[Narrative: which model family won, where it failed, what the cross-dataset gap tells you about generalisation.]
Code
[GitHub URL] — includes data preprocessing, training scripts, evaluation notebooks, and the FastAPI service.