Lithium-ion cells lose capacity as they cycle. For EVs, knowing how much capacity remains — and when a pack will cross the end-of-life threshold below which the vehicle is no longer practical to use — matters for resale value, warranty pricing, and second-life battery markets. This project is a reproducible benchmark of four model families on that prediction task.
Problem
Existing comparisons in the literature are hard to replicate: different datasets, different train/test splits, different error metrics. Before choosing a model architecture for a production system, I wanted an honest, apples-to-apples comparison on real driving profiles rather than lab constant-current cycling.
Approach
Data. [Describe the dataset(s) — e.g. a public EV field dataset, synthetic profiles generated from a physics model, or a combination. Key fields: voltage, current, temperature, state of charge over time, measured capacity per cycle.]
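As a concrete placeholder for the layout, here is a minimal sketch of how the per-cycle time series and capacity labels might be loaded. The file paths and column names are assumptions, to be replaced by the actual dataset's schema.

```python
import pandas as pd

# Hypothetical schema for the per-cycle time series; all names below are
# placeholders to be matched against whichever dataset is actually used.
cycles = pd.read_parquet("data/cycles.parquet")  # assumed file and format
# One row per sample within a cycle:
#   cell_id, cycle_index, timestamp, voltage_v, current_a, temperature_c, soc
# One measured capacity label per (cell_id, cycle_index):
labels = pd.read_parquet("data/capacity_per_cycle.parquet")
```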
Four model families compared:
- Statistical baseline — [e.g. exponential decay fit, Kalman filter, or linear regression on cycle features]. Interpretable and fast; the benchmark everything else must beat (a fitting sketch follows this list).
- Gradient-boosted (LightGBM) — hand-crafted features per cycle (capacity fade rate, internal resistance estimate, discharge curve statistics), trained with grouped cross-validation by cell to prevent leakage (see the cross-validation sketch after this list).
- LSTM — sequence model operating directly on raw voltage/current curves within each cycle. [Describe architecture, input window, training setup.] A minimal architecture sketch follows the list.
- Physics-informed — [Describe: e.g. a neural ODE or a model that embeds the Arrhenius degradation equation as a constraint. What prior knowledge was encoded and why.]
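To make the baseline concrete, here is a minimal sketch of one of the listed options, an exponential decay fit: fit the fade curve to the observed capacity history with scipy, then extrapolate to an assumed end-of-life threshold (80% of initial capacity here) to get a remaining-useful-life estimate. The threshold, starting values, and function names are illustrative, not the benchmarked configuration.

```python
import numpy as np
from scipy.optimize import curve_fit

def capacity_model(cycle, q0, k):
    """Exponential fade: capacity = q0 * exp(-k * cycle)."""
    return q0 * np.exp(-k * cycle)

def predict_rul(cycles, capacities, eol_fraction=0.8):
    """Fit the fade curve to the observed history, then extrapolate to the
    cycle at which capacity drops below eol_fraction of the fitted initial
    capacity. Returns remaining cycles from the last observed cycle."""
    (q0, k), _ = curve_fit(capacity_model, cycles, capacities,
                           p0=(capacities[0], 1e-3))
    # Solve q0 * exp(-k * c) = eol_fraction * q0 for c.
    eol_cycle = np.log(1.0 / eol_fraction) / k
    return max(eol_cycle - cycles[-1], 0.0)
```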
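The leakage-prevention point for the gradient-boosted model is worth spelling out. Below is a sketch of grouped cross-validation with scikit-learn's GroupKFold, where every fold keeps whole cells together so no cell contributes cycles to both train and validation. Feature construction is assumed to happen upstream, and the hyperparameters are placeholders.

```python
import lightgbm as lgb
import numpy as np
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import GroupKFold

def grouped_cv_rmse(X, y, groups, n_splits=5):
    """X: numpy array of per-cycle hand-crafted features, y: RUL targets,
    groups: the cell id each cycle came from. Cells never straddle folds."""
    rmses = []
    for train_idx, val_idx in GroupKFold(n_splits=n_splits).split(X, y, groups):
        model = lgb.LGBMRegressor(n_estimators=500, learning_rate=0.05)
        model.fit(X[train_idx], y[train_idx])
        pred = model.predict(X[val_idx])
        rmses.append(mean_squared_error(y[val_idx], pred) ** 0.5)
    return float(np.mean(rmses))
```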
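For the LSTM, a minimal PyTorch sketch of the kind of model meant here, assuming each cycle's raw curves are resampled to a fixed number of time steps with voltage, current, and temperature channels. Layer sizes and the input window are placeholders, not the configuration that was actually trained.

```python
import torch
import torch.nn as nn

class CycleLSTM(nn.Module):
    """Maps a fixed-length per-cycle curve to a single RUL estimate."""
    def __init__(self, n_features=3, hidden_size=64, num_layers=2):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden_size, num_layers, batch_first=True)
        self.head = nn.Linear(hidden_size, 1)

    def forward(self, x):              # x: (batch, time_steps, n_features)
        _, (h_n, _) = self.lstm(x)     # h_n: (num_layers, batch, hidden_size)
        return self.head(h_n[-1]).squeeze(-1)   # (batch,) predicted RUL
```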
Evaluation. Prognostic metrics: RMSE on remaining useful life, relative error at end-of-life prediction, and cross-dataset generalisation (train on one dataset, evaluate on another).
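A small sketch of the first two metrics as defined above; inputs are assumed to be aligned arrays of true and predicted values, in cycles.

```python
import numpy as np

def rul_rmse(rul_true, rul_pred):
    """Root-mean-square error on remaining useful life, in cycles."""
    rul_true, rul_pred = np.asarray(rul_true), np.asarray(rul_pred)
    return float(np.sqrt(np.mean((rul_pred - rul_true) ** 2)))

def eol_relative_error(eol_true, eol_pred):
    """Relative error of the predicted end-of-life cycle for one cell."""
    return abs(eol_pred - eol_true) / eol_true
```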
FastAPI. The best-performing model is served via a small REST API — POST a cell’s recent cycle data, get back a predicted RUL and confidence interval.
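A minimal sketch of what such an endpoint could look like; the route, request fields, and the constant stand-in prediction are all placeholders rather than the actual service in the repo.

```python
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class CycleHistory(BaseModel):
    # Placeholder request schema: recent per-cycle summaries for one cell.
    cell_id: str
    cycle_index: list[int]
    capacity_ah: list[float]

class RULResponse(BaseModel):
    rul_cycles: float
    ci_low: float
    ci_high: float

@app.post("/predict", response_model=RULResponse)
def predict(history: CycleHistory) -> RULResponse:
    # In the real service this would call the best-performing model;
    # a dummy constant stands in for the prediction here.
    rul = 250.0
    return RULResponse(rul_cycles=rul, ci_low=rul - 40, ci_high=rul + 40)
```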
Results
[Fill in your actual numbers. A table here is ideal — one row per model, columns for in-distribution RMSE, cross-dataset RMSE, and inference latency.]
| Model | In-distribution RMSE | Cross-dataset RMSE | Inference latency |
|---|---|---|---|
| Statistical baseline | — | — | — |
| LightGBM | — | — | — |
| LSTM | — | — | — |
| Physics-informed | — | — | — |
[Narrative: which model family won, where it failed, what the cross-dataset gap tells you about generalisation.]
Code
[GitHub URL] — includes data preprocessing, training scripts, evaluation notebooks, and the FastAPI service.