Dynamic Loops

poet::dynamic_for iterates over a range whose bounds are known only at run time, executing the loop body in blocks that are unrolled at compile time.

Basic form

poet::dynamic_for<4>(0u, n, [](std::size_t i) {
    out[i] = f(i);
});

Other overloads:

  • poet::dynamic_for<Unroll>(count, func) for [0, count)

  • poet::dynamic_for<Unroll>(begin, end, func) for begin up to or down to end (exclusive), with a step of +1 or -1 inferred from their order

  • poet::dynamic_for<Unroll>(begin, end, step, func) for a runtime step

  • poet::dynamic_for<Unroll, Step>(begin, end, func) for a compile-time step

Lane-aware callbacks

A callback that takes two arguments also receives the lane, i.e. the position of the current iteration within its unrolled block, alongside the index:

std::array<double, 4> acc{};
poet::dynamic_for<4>(0u, n, [&](auto lane, std::size_t i) {
    acc[lane] += work(i);
});

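This is the main performance-oriented use case. Giving each lane its own accumulator splits one loop-carried dependency chain into several independent chains, which the CPU can execute in parallel (instruction-level parallelism, ILP). For trivial index-only work, an ordinary for loop is often just as fast or faster.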

Compile-time step

poet::dynamic_for<4, 2>(0, 16, [](int i) {
    use(i); // 0, 2, 4, ..., 14
});

poet::dynamic_for<4, -1>(10, 0, [](int i) {
    use(i); // 10, 9, ..., 1
});

C++20 adaptor

auto r = std::views::iota(0) | std::views::take(10);
r | poet::make_dynamic_for<4>([](int i) {
    use(i); // 0, 1, ..., 9
});

std::tuple{0, 24, 2} | poet::make_dynamic_for<4>([](int i) {
    use(i); // 0, 2, 4, ..., 22
});

Notes:

  • The adaptor is eager: it invokes dynamic_for immediately.

  • The generic range overload treats the input as a consecutive [start, start + count) sequence.

  • Tuple input preserves explicit (begin, end, step) semantics.

Runnable example

For a worked lane-aware ILP example, see examples/dot_product.cpp (Try on Compiler Explorer). The Google Benchmark microbenchmark examples/benchmark.cpp compares a scalar loop against dynamic_for<4> and dynamic_for<8> and runs directly on Compiler Explorer in Execute mode (Run benchmark on Compiler Explorer).

Links are regenerated by tools/make_godbolt_links.py; all links point at the latest amalgamated header on the single-header branch via Compiler Explorer’s URL-include feature.