
Conversation

@wujingyue (Collaborator) commented Feb 8, 2026

Makes the code less error-prone and removes the reliance on isValidDeviceSplit, so that non-outermost sharding can be supported in the future.

Should be an NFC (no functional change).

@wujingyue changed the title from "Wjy/sloppy" to "Clean up getContigMergeOfInnerSize" on Feb 8, 2026
@wujingyue (Collaborator, Author)

!test

github-actions bot commented Feb 8, 2026

Review updated until commit cb8be65

Description

  • Clean up and simplify getContigMergeOfInnerSize function by removing complex device split handling

  • Refactor projectId method to accept additional leaf parameter for allocation domain projection

  • Move addProjectedExtent method implementation from header to cpp file

  • Improve code readability with modern C++ features like structured binding and range-based loops

  • Update test files with minor formatting and code quality improvements

Changes walkthrough

Relevant files

Enhancement

csrc/scheduler/vectorize_helper.cpp: Major refactoring of vectorization helper functions (+127/-116)

  • Added missing header includes and removed unused multidevice utils
  • Moved addProjectedExtent method implementation from header to cpp file
  • Refactored projectId method to accept additional leaf parameter for allocation domain handling
  • Significantly simplified getContigMergeOfInnerSize by removing complex device split logic and using modern C++ iteration patterns
  • Improved error handling and code structure in projection logic

csrc/scheduler/vectorize_helper.h: Update method signatures and documentation (+10/-18)

  • Moved addProjectedExtent method declaration from inline to proper method declaration
  • Updated projectId method signature to include new leaf parameter
  • Added comprehensive documentation explaining the projection logic and parameters

Tests

tests/cpp/test_pointwise.cpp: Minor test improvements and formatting fixes (+3/-7)

  • Fixed minor formatting issue by removing trailing semicolon in constructor
  • Improved code readability by using structured binding in for loops

    PR Reviewer Guide

    Here are some key observations to aid the review process:

    🧪 PR contains tests
    ⚡ Recommended focus areas for review
    Breaking API Change

    The projectId function signature has been changed to add a new 'leaf' parameter. This is a breaking change that could affect any code calling this function. The PR should verify that all call sites have been updated and consider if this change is truly necessary or if there are alternative approaches.

    std::vector<IterDomain*> ContiguousInnerDimensionsMapper::projectId(
        const std::vector<IterDomain*>& from,
        const std::vector<IterDomain*>& to,
        const std::vector<IterDomain*>& leaf) {
    Loss of Device Sharding Logic

    The new getContigMergeOfInnerSize implementation removes significant logic related to device sharding and multi-device scenarios. The old implementation had extensive handling for device splits, sharding validation, and device dimensions. This removal could break multi-device functionality. The PR should validate that device sharding is still properly handled or explain why this logic is no longer needed.

    Val* ContiguousInnerDimensionsMapper::getContigMergeOfInnerSize(
        TensorView* tv) {
      FusionGuard fg(tv->fusion());
      const std::vector<IterDomain*>& alloc = tv->getMaybeAllocationDomain();
      const std::vector<std::optional<bool>>& contiguity = tv->getContiguity();
    
      NVF_ERROR(hasMappedDims(tv));
      const std::vector<IterDomain*>& projected_dims = mappedLogicalIds(tv);
    
      Val* product_of_inner_extents = tv->container()->oneVal();
      // Order is important, need to make sure dimensions match up correctly with
      // what was propagated through the mapper. The mapper's dimensions are
      // propagated in the order of the reference; if that order doesn't match the
      // tensor we're mapping to, then a transpose interfered with or expanded the
      // vectorize dimension.
      auto projected_dim = projected_dims.rbegin();
      // Wish I could `zip(alloc, contiguity) | std::views::reverse` here. It
      // doesn't compile.
      for (auto [alloc_id, cont] :
           zip(alloc | std::views::reverse, contiguity | std::views::reverse)) {
        auto is_treated_as_size_one = [](IterDomain* id) {
          return id->isReduction() || id->isBroadcast() || id->isParallelized() ||
              id->extent()->isOneInt();
        };
        if (is_treated_as_size_one(alloc_id)) {
          continue;
        }
    
        NVF_ERROR(cont.has_value());
        if (!cont.value()) {
          break;
        }
    
        while (projected_dim != projected_dims.rend() &&
               is_treated_as_size_one(*projected_dim)) {
          projected_dim++;
        }
    
        IterDomain* logical_id = [&]() {
          std::vector<IterDomain*> reachable_ids =
              ir_utils::getReachableIds(tv->getLogicalDomain(), {alloc_id});
          NVF_ERROR_EQ(reachable_ids.size(), 1);
          return reachable_ids.front();
        }();
    
        // Mapping order isn't correct, cannot expand vectorization dimension.
        if (projected_dim == projected_dims.rend() ||
            *projected_dim != logical_id) {
          break;
        }
        // This assumes projected_dim can be matched only once. This assumption is
        // OK for now but when we get to non-outermost sharding such as
        // ```
        //    [iS0]
        //    /  \.
        //  iS1  iS2
        //       /  \.
        // iDIDx3  iS4
        // ```
        // We may want to allow multiple contiguous allocation IDs to match
        // projected_dim.
        projected_dim++;
    
        product_of_inner_extents = SimplifyingIrBuilder::mulExpr(
            product_of_inner_extents, getProjectedExtent(alloc_id));
      }
      return product_of_inner_extents;
    }
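For readers unfamiliar with the reverse-zip idiom in the loop above, here is a minimal, self-contained sketch of the same iteration shape written against C++23 std::views::zip. It is only an illustration: nvFuser uses its own zip helper (whose semantics are assumed here to match), and the in-code comment above notes that reversing the zipped view directly does not compile, hence each range is reversed before zipping.

    // Standalone illustration of the reverse-zip pattern, not nvFuser code.
    // Assumes C++23 (<ranges> with std::views::zip).
    #include <iostream>
    #include <optional>
    #include <ranges>
    #include <vector>

    int main() {
      // Stand-ins for per-dimension extents and contiguity flags, innermost last.
      std::vector<long> extents = {4, 8, 16};
      std::vector<std::optional<bool>> contiguity = {true, true, true};

      // Reverse each range first, then zip, so iteration runs innermost-first.
      long product = 1;
      for (auto [extent, cont] : std::views::zip(
               extents | std::views::reverse, contiguity | std::views::reverse)) {
        if (!cont.value_or(false)) {
          break;  // stop at the first non-contiguous dimension
        }
        product *= extent;
      }
      std::cout << product << '\n';  // 16 * 8 * 4 = 512
    }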
    Complex Logic Refactoring

    The refactored getContigMergeOfInnerSize uses reverse iterators and complex lambda logic that differs significantly from the original implementation. The new logic makes assumptions about dimension mapping and contiguity analysis that should be validated. The PR should include performance benchmarks and correctness tests to ensure the refactoring doesn't introduce regressions.

    auto projected_dim = projected_dims.rbegin();
    // Wish I could `zip(alloc, contiguity) | std::views::reverse` here. It
    // doesn't compile.
    for (auto [alloc_id, cont] :
         zip(alloc | std::views::reverse, contiguity | std::views::reverse)) {
      auto is_treated_as_size_one = [](IterDomain* id) {
        return id->isReduction() || id->isBroadcast() || id->isParallelized() ||
            id->extent()->isOneInt();
      };
      if (is_treated_as_size_one(alloc_id)) {
        continue;
      }
    
      NVF_ERROR(cont.has_value());
      if (!cont.value()) {
        break;
      }
    
      while (projected_dim != projected_dims.rend() &&
             is_treated_as_size_one(*projected_dim)) {
        projected_dim++;
      }
    
      IterDomain* logical_id = [&]() {
        std::vector<IterDomain*> reachable_ids =
            ir_utils::getReachableIds(tv->getLogicalDomain(), {alloc_id});
        NVF_ERROR_EQ(reachable_ids.size(), 1);
        return reachable_ids.front();
      }();
    
      // Mapping order isn't correct, cannot expand vectorization dimension.
      if (projected_dim == projected_dims.rend() ||
          *projected_dim != logical_id) {
        break;
      }
      // This assumes projected_dim can be matched only once. This assumption is
      // OK for now but when we get to non-outermost sharding such as
      // ```
      //    [iS0]
      //    /  \.
      //  iS1  iS2
      //       /  \.
      // iDIDx3  iS4
      // ```
      // We may want to allow multiple contiguous allocation IDs to match
      // projected_dim.
      projected_dim++;
    
      product_of_inner_extents = SimplifyingIrBuilder::mulExpr(
          product_of_inner_extents, getProjectedExtent(alloc_id));
    }
    return product_of_inner_extents;

@wujingyue force-pushed the wjy/sloppy branch 2 times, most recently from 78310cd to ba2f27d on February 8, 2026 06:46
@wujingyue (Collaborator, Author)

    !test

@wujingyue force-pushed the wjy/sloppy branch 3 times, most recently from f3a9608 to f184027 on February 10, 2026 06:56
@wujingyue (Collaborator, Author)

    !test

@wujingyue (Collaborator, Author)

    !test

@wujingyue marked this pull request as ready for review on February 11, 2026 01:29
@wujingyue (Collaborator, Author)

    !test

@wujingyue (Collaborator, Author) commented on the following lines:

    getProjectedExtent(id), commonOrConstExtent(ca_map_, id));
    }

    void ContiguousInnerDimensionsMapper::addProjectedExtent(

@wujingyue (Collaborator, Author) commented on the following lines:

    // Ordering of dimensions is important in this analysis, if an ordering is
    // contiguous in the reference, but not the target tensor views, then we
    // cannot consider that a contiguous merge dimension for vectorization.
    auto projected_logical = projectId(filtered_ids, logical_domain);

projected_logical gives me the wrong impression that the whole logical domain is projected. In fact, it's still as filtered as filtered_ids.

greptile-apps bot (Contributor) commented Feb 11, 2026

    Greptile Overview

    Greptile Summary

    This PR refactors ContiguousInnerDimensionsMapper’s projection bookkeeping to avoid double-registering projected extents and to additionally project down to allocation domains. It also rewrites getContigMergeOfInnerSize to iterate allocation IDs and contiguity in reverse using structured bindings, and updates a couple of pointwise tests with structured bindings.

    The main functional change is that the contig-inner-size computation now attempts to multiply projected extents of allocation IDs after matching against the mapper’s logical IDs, relying on projectId(..., leaf=allocation_domain) to have recorded those allocation projected extents.
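As a hypothetical walk-through of that computation (shapes invented for illustration): for a fully contiguous target whose allocation domain is [i0{4}, i1{8}, i2{16}] and whose logical IDs appear in the same innermost-to-outermost order among the mapper's projected dims, the reverse loop matches i2, then i1, then i0, and returns 16 * 8 * 4 = 512 as the contiguous inner size. If a transpose made i1's logical ID not the next projected dim, the walk would stop after i2 and return 16.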

    Confidence Score: 2/5

    • This PR has a couple of correctness risks in vectorization extent computation that should be addressed before merging.
    • The refactor changes getContigMergeOfInnerSize to depend on projected extents for allocation IDs and to zip allocation IDs with contiguity. If allocation projected extents are not always recorded, this can throw at runtime; if contiguity is not aligned with allocation domain, the computed inner extent can silently be wrong, affecting vectorization factor selection.
    • csrc/scheduler/vectorize_helper.cpp

    Important Files Changed

csrc/scheduler/vectorize_helper.cpp: Refactors projection logic to record projected extents across root/logical/allocation domains and rewrites getContigMergeOfInnerSize to iterate allocation+contiguity in reverse; potential runtime errors if allocation IDs aren't recorded in projected_extent_ and possible contiguity/allocation size mismatch due to zipping.

csrc/scheduler/vectorize_helper.h: Updates projectId signature to include a leaf domain for allocation projection and moves addProjectedExtent definition to the .cpp; no standalone issues found in the header.

tests/cpp/test_pointwise.cpp: Minor test cleanups (structured bindings in loops and a trivial constructor formatting tweak); no functional changes to coverage.

    Sequence Diagram

    sequenceDiagram
      participant S as Scheduler/VectorizeHeuristic
      participant M as ContiguousInnerDimensionsMapper
      participant TV as TensorView (reference)
      participant TV2 as TensorView (target)
    
      S->>M: map(reference_tv, logical_ids)
      activate M
      M->>M: recording_=true
      M->>M: addProjectedExtent(logical_id, commonOrConstExtent)
      M->>M: projectId(filtered_ids, logical_domain, logical_domain)
      M->>M: projectId(filtered_ids, root_domain, allocation_domain)
      M->>M: recording_=false
      M->>M: traverse spanning tree
      deactivate M
    
      S->>M: getTvToContigMergeOfInnerSizeMap()
      activate M
      loop for each tv in tv_infos_
        M->>M: getContigMergeOfInnerSize(tv)
        M->>TV2: alloc = getMaybeAllocationDomain()
        M->>TV2: contiguity = getContiguity()
        M->>M: projected_dims = mappedLogicalIds(tv)
        M->>M: iterate alloc & contiguity (reverse)
        M->>M: logical_id = ir_utils::getReachableIds(logical_domain, {alloc_id})
        M->>M: if logical_id matches next projected_dim
        M->>M: product *= getProjectedExtent(alloc_id)
      end
      deactivate M
    

greptile-apps bot (Contributor) left a comment

3 files reviewed, 2 comments

    Comment on lines +787 to +833
    +  for (auto [alloc_id, cont] :
    +       zip(alloc | std::views::reverse, contiguity | std::views::reverse)) {
    +    auto is_treated_as_size_one = [](IterDomain* id) {
    +      return id->isReduction() || id->isBroadcast() || id->isParallelized() ||
    +          id->extent()->isOneInt();
    +    };
    +    if (is_treated_as_size_one(alloc_id)) {
    +      continue;
    +    }

    -    auto contiguity_i = contiguity.at(alloc_ii);
    -    if (!contiguity_i.has_value()) {
    -      NVF_THROW("contiguity flag at alloc_ii can't be null");
    -    } else {
    -      // Not contiguous
    -      if (!contiguity_i.value()) {
    -        break;
    -      }
    +    NVF_ERROR(cont.has_value());
    +    if (!cont.value()) {
    +      break;
    +    }

    -    // Get the logical ID corresponding to the allocation ID.
    -    auto exprs = DependencyCheck::getAllExprsBetween(
    -        {tv->getLogicalDomain().begin(), tv->getLogicalDomain().end()},
    -        {alloc_iid});
    -    IterDomain* logical_id = alloc_iid;
    -    Val* num_devices = tv->container()->oneVal();
    -    bool only_valid_device_split = true;
    -    for (Expr* expr : exprs | std::views::reverse) {
    -      if (!isValidDeviceSplit(expr)) {
    -        only_valid_device_split = false;
    -        break;
    -      }
    -      auto* split = expr->as<Split>();
    -      logical_id = split->in();
    -      num_devices = SimplifyingIrBuilder::mulExpr(num_devices, split->factor());
    +    while (projected_dim != projected_dims.rend() &&
    +           is_treated_as_size_one(*projected_dim)) {
    +      projected_dim++;
         }

    -    // Non device split could lead to padding, which prevents vectorization
    -    if (!only_valid_device_split) {
    -      break;
    -    }
    +    IterDomain* logical_id = [&]() {
    +      std::vector<IterDomain*> reachable_ids =
    +          ir_utils::getReachableIds(tv->getLogicalDomain(), {alloc_id});
    +      NVF_ERROR_EQ(reachable_ids.size(), 1);
    +      return reachable_ids.front();
    +    }();

         // Mapping order isn't correct, cannot expand vectorization dimension.
    -    if (projected_dims[--projected_dims_i] != logical_id) {
    +    if (projected_dim == projected_dims.rend() ||
    +        *projected_dim != logical_id) {
           break;
         }

    -    Val* sharded_extent;
    -    if (logical_id->isDeviceDim()) {
    -      sharded_extent = tv->container()->oneVal();
    -    } else {
    -      sharded_extent = SimplifyingIrBuilder::divExpr(
    -          getProjectedExtent(logical_id), num_devices);
    -    }
    -    product_of_inner_extents =
    -        SimplifyingIrBuilder::mulExpr(product_of_inner_extents, sharded_extent);
    +    // This assumes projected_dim can be matched only once. This assumption is
    +    // OK for now but when we get to non-outermost sharding such as
    +    // ```
    +    //    [iS0]
    +    //    /  \.
    +    //  iS1  iS2
    +    //       /  \.
    +    // iDIDx3  iS4
    +    // ```
    +    // We may want to allow multiple contiguous allocation IDs to match
    +    // projected_dim.
    +    projected_dim++;

    +    product_of_inner_extents = SimplifyingIrBuilder::mulExpr(
    +        product_of_inner_extents, getProjectedExtent(alloc_id));
Allocation extent mismatch

    getContigMergeOfInnerSize now multiplies getProjectedExtent(alloc_id) (i.e., allocation ID) after matching logical_id against mappedLogicalIds(tv) (logical IDs). projected_extent_ values originate from recording logical/root projections and are not guaranteed to include allocation IDs; in those cases getProjectedExtent(alloc_id) will throw "Not projected" at runtime. Even when present, using allocation-ID extents here changes semantics vs the previous logical-ID-based computation and can incorrectly size the contig inner extent for TVs with an allocation permutation. Consider multiplying the projected extent for the matched logical_id (or ensure allocation IDs are always recorded consistently before using them here).
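To make the failure mode concrete, the following is a schematic stand-in (not nvFuser's actual types) for a projected-extent registry that only records logical IDs; a later lookup keyed by an allocation ID then fails, which is the "Not projected" style of error the comment above refers to.

    // Schematic model of the reviewer's concern, not nvFuser code. IDs are
    // plain strings here; in nvFuser they would be IterDomain pointers.
    #include <map>
    #include <stdexcept>
    #include <string>

    struct ExtentRegistry {
      std::map<std::string, long> projected_extent_;

      void record(const std::string& id, long extent) {
        projected_extent_[id] = extent;
      }

      long get(const std::string& id) const {
        auto it = projected_extent_.find(id);
        if (it == projected_extent_.end()) {
          // Lookup of an ID that was never recorded fails loudly.
          throw std::runtime_error("Not projected: " + id);
        }
        return it->second;
      }
    };

    int main() {
      ExtentRegistry registry;
      registry.record("logical_i1", 8);  // only the logical ID is recorded
      registry.get("logical_i1");        // ok
      registry.get("alloc_i1");          // throws: the allocation ID was never recorded
    }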

    Comment on lines 772 to +788
    +  const std::vector<IterDomain*>& alloc = tv->getMaybeAllocationDomain();
    +  const std::vector<std::optional<bool>>& contiguity = tv->getContiguity();

       NVF_ERROR(hasMappedDims(tv));

       const std::vector<IterDomain*>& projected_dims = mappedLogicalIds(tv);
    -  auto alloc_no_reductions = TensorDomain::noReductions(alloc);

    -  std::vector<std::optional<bool>> contiguity = tv->domain()->contiguity();
    +  NVF_ERROR_EQ(contiguity.size(), alloc.size());
    -  // Appears after reductions the reduction domain often has a contiguity entry.
    -  // This only matters if the result of the reduction is an output
    -  if (contiguity.size() != alloc_no_reductions.size()) {
    -    std::vector<std::optional<bool>> new_contiguity;
    -    for (auto i : arange(alloc.size())) {
    -      if (!alloc[i]->isReduction()) {
    -        new_contiguity.push_back(contiguity.at(i));
    -      }
    -    }
    -    contiguity = new_contiguity;
    -  }

    -  auto alloc_no_reductions_size = alloc_no_reductions.size();

    -  NVF_ERROR_EQ(alloc_no_reductions_size, contiguity.size());

       Val* product_of_inner_extents = tv->container()->oneVal();
       // Order is important, need to make sure dimensions match up correctly with
       // what was propogated through the mapper. The mapper's dimensions is
       // propogated in the order of the reference, if that order doesn't match the
       // tensor we're mapping too then a transpose interfered with expanded the
       // vectorize dimension.
    -  size_t projected_dims_i = projected_dims.size();

    -  for (auto i : arange(alloc_no_reductions_size)) {
    -    if (projected_dims_i == 0) {
    -      break;
    -    }
    -    auto alloc_ii = alloc_no_reductions_size - i - 1;
    -    auto alloc_iid = alloc_no_reductions.at(alloc_ii);

    -    if (alloc_iid->extent()->isOneInt() || alloc_iid->isBroadcast()) {
    -      if (projected_dims[projected_dims_i - 1] == alloc_iid) {
    -        --projected_dims_i;
    -      }
    +  auto projected_dim = projected_dims.rbegin();
    +  // Wish I could `zip(alloc, contiguity) | std::views::reverse` here. It
    +  // doesn't compile.
    +  for (auto [alloc_id, cont] :
    +       zip(alloc | std::views::reverse, contiguity | std::views::reverse)) {
    Contiguity/alloc size assumption

    This loop zips tv->getMaybeAllocationDomain() with tv->getContiguity() and iterates them in lockstep. If getContiguity() is defined in terms of the logical/root domain (as it historically was via tv->domain()->contiguity()), TVs with a distinct allocation domain can have a different rank/order, making the zip silently drop trailing elements and compute an incorrect inner-extent product. At minimum this should assert alloc.size() == contiguity.size() before zipping (or fetch contiguity for the allocation domain explicitly).
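A self-contained sketch of the silent truncation being described (C++23 std::views::zip is used as a stand-in for the project's own zip helper, whose behavior on unequal lengths is assumed to match): zipping ranges of different lengths stops at the shorter one, so the suggested size assertion is what turns a mismatch into a loud error instead of a wrong product.

    // Illustration of zip truncation on mismatched lengths, not nvFuser code.
    #include <cassert>
    #include <iostream>
    #include <optional>
    #include <ranges>
    #include <vector>

    int main() {
      std::vector<long> alloc_extents = {4, 8, 16};                 // rank-3 allocation domain
      std::vector<std::optional<bool>> contiguity = {true, true};   // only rank-2 flags

      // Suggested guard: without it, the loop below silently visits only two pairs.
      // assert(alloc_extents.size() == contiguity.size());

      long product = 1;
      for (auto [extent, cont] :
           std::views::zip(alloc_extents, contiguity)) {  // forward order, for simplicity
        if (!cont.value_or(false)) {
          break;
        }
        product *= extent;
      }
      std::cout << product << '\n';  // 4 * 8 = 32; the extent 16 was silently dropped
    }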
