A Brief Discussion on Chunking

Meaning of Block Partitioning#

Block partitioning is a concept, not a data structure.
The basic idea of block partitioning is to achieve better time complexity than general brute-force algorithms by appropriately dividing the original data and preprocessing some information on each block after partitioning.

Time Complexity#

The time complexity of block partitioning mainly depends on the block length, which can generally be determined by the mean inequality to find the optimal block length for a given problem, along with the corresponding time complexity.

Mean Inequality#

For the perfect square formula we learned in middle school, $(x+y)^2 \ge 0$, through simple transformation we can derive $x^2+y^2 \ge 2xy$.

By substituting $x = \sqrt{a}$ and $y = \sqrt{b}$, we get $a + b \ge 2\sqrt{ab}$ $(a, b > 0)$, where equality holds when $a = b$, which is the mean inequality.

Analysis#

Taking a problem as an example: Segment Tree 1

~~That's right, although it's a template problem for segment trees, it can also be solved using block partitioning, and it runs very fast.~~

Let the block length be $s$. Regardless of the operation on [l, r]:

If $l$ and $r$ are in the same block, enumerate: $O(s)$.
If $l$ and $r$ are not in the same block: enumerate the block containing $l$, then enumerate from the next block of $l$ to the previous block of $r$, and enumerate the block containing $r$: $O(s + \frac{n}{s})$.

If there are $m$ operations in total, the overall time complexity is $O(m(s + \frac{n}{s}))$. Using the mean inequality, we know that when $s = \frac{n}{s}$, this expression achieves its minimum value, i.e., optimal time complexity, at this point $s = \sqrt{n}$.

(tips: For different problems, the block length $s$ may vary, and specific analysis is required, but generally speaking, for operations like this problem, the block length is typically $\sqrt{n}$.)

Template#

For the segment tree problem mentioned earlier, we can write the block partitioning code.

#include <bits/stdc++.h>

using namespace std;
const int N = 1e5 + 5;

int n, m;
long long a[N], blk[N], b[N];
int pos[N];
int R;

void add(int l, int r, long long k)
{
    if(pos[l] == pos[r])
    {
        for(int i = l;i <= r;i++)
            a[i] += k;
        blk[pos[l]] += (r - l + 1) * k;
        return;
    }
    for(int i = l;pos[i] == pos[l];i++)
    {
        a[i] += k;
        blk[pos[i]] += k;
    }
    for(int i = pos[l] + 1;i <= pos[r] - 1;i++)
    {
        blk[i] += R * k;
        b[i] += k;
    }
    for(int i = r;pos[i] == pos[r];i--)
    {
        a[i] += k;
        blk[pos[i]] += k;
    }
}

long long query(int l, int r)
{
    long long ans = 0;
    if(pos[l] == pos[r])
    {
        for(int i = l;i <= r;i++)
        {
            ans += a[i] + b[pos[i]];
        }
        return ans;
    }
    for(int i = l;pos[i] == pos[l];i++)
    {
        ans += a[i] + b[pos[i]];
    }
    for(int i = pos[l] + 1;i <= pos[r] - 1;i++)
    {
        ans += blk[i];
    }
    for(int i = r;pos[i] == pos[r];i--)
    {
        ans += a[i] + b[pos[i]];
    }
    return ans;
}

int main()
{
    clock_t c1 = clock();
#ifdef LOCAL
    freopen("in.in", "r", stdin);
    freopen("out.out", "w", stdout);
#endif
    //------------------------------
    scanf("%d%d", &n, &m);
    R = sqrt(n) + 1;

    for(int i = 1;i <= n;i++)
    {
        scanf("%lld", &a[i]);
        pos[i] = i / R + 1;
        blk[pos[i]] += a[i];
    }
    long long opt, x, y, z;
    while(m--)
    {
        scanf("%lld", &opt);
        if(opt == 1)
        {
            scanf("%lld%lld%lld", &x, &y, &z);
            add(x, y, z);
        }
        else if(opt == 2)
        {
            scanf("%lld%lld", &x, &y);
            printf("%lld\n", query(x, y));
        }
    }

  
  
    //------------------------------
end:
    cerr << "Time Used:" << clock() - c1 << "ms" << endl;
    return 0;
}

tips:

The $pos$ array records the block number of each element.
The $blk$ array records the actual sum within each block.
The $b$ array acts like the $lazytag$ in segment trees, recording values that have been added to the $blk$ array but have not yet been "propagated" to the original array (however, propagation is not necessary here, as it is not implemented in the code).

histcat