Adaptive K

The big idea behind adaptive K is to get the best of both regimes: high K-steps decrease the loss rapidly, while low K-steps overfit less. To combine these, we need a K-step value that can change dynamically during training.

To achieve this, we reward a steep decrease in the loss and penalize a steep increase. In the following equations, the weighting coefficients are hyperparameters.

v1

  • : constrains the K-step to an empirically chosen range
  • : rewards a steeper loss decrease, leading to a higher K-step
  • : penalizes a concave loss curve, which would lead to overfitting
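Since the original v1 equations are not written out above, the following is only a minimal sketch of how an update with these three terms could look, assuming the slope and curvature of the loss are estimated with finite differences over the recent loss history. The names update_k_v1, alpha, beta, k_min, and k_max are hypothetical, not the original hyperparameters.

```python
def update_k_v1(k, losses, alpha=1.0, beta=1.0, k_min=1, k_max=16):
    """Hypothetical v1-style update of K from the recent loss history.

    - a steep loss decrease (negative slope) pushes K up,
    - a concave loss curve (negative curvature), a sign of overfitting, pushes K down,
    - the result is clipped to the empirical range [k_min, k_max].
    """
    if len(losses) < 3:
        return k  # not enough history to estimate slope and curvature
    slope = losses[-1] - losses[-2]                       # finite-difference first derivative
    curvature = losses[-1] - 2 * losses[-2] + losses[-3]  # finite-difference second derivative
    k_new = k - alpha * slope + beta * curvature
    return max(k_min, min(round(k_new), k_max))
```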

v2

It follows roughly the same idea of letting K fluctuate, but without using derivatives.
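One assumed way to realize a derivative-free variant is to compare consecutive losses and nudge K up or down accordingly; this is only a sketch, and update_k_v2, step, k_min, and k_max are hypothetical names.

```python
def update_k_v2(k, prev_loss, curr_loss, step=1, k_min=1, k_max=16):
    """Hypothetical v2-style update of K without derivative estimates.

    - if the loss went down, nudge K up by one step,
    - if the loss went up, nudge K down by one step,
    - always keep K inside the empirical range [k_min, k_max].
    """
    if curr_loss < prev_loss:
        k += step
    elif curr_loss > prev_loss:
        k -= step
    return max(k_min, min(k, k_max))
```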