The new logic doubles tile_size2 when bits_in_flight_per_sm < required_bits_per_sm. This is a reasonable approach, but consider whether a single doubling is sufficient or if iterative doubling might be needed for very large gaps. Also verify that the calculation of total_input_bits_per_elem correctly accounts for all input tensors in complex fusion scenarios.
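To make the single-versus-iterative question concrete, here is a minimal sketch of what iterative doubling could look like. It is not code from this PR: the helper name `growTileSize2` and the `max_tile_size2` cap are hypothetical, and the bits-in-flight estimate simply mirrors the product used in the change under review.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not part of the PR): keep doubling tile_size2 until
// the estimated bits in flight per SM meet the requirement, or until another
// doubling would exceed max_tile_size2 (an assumed upper bound).
int64_t growTileSize2(
    int64_t tile_size1,
    int64_t tile_size2,
    int64_t total_input_bits_per_elem,
    int64_t max_blocks_per_sm,
    int64_t required_bits_per_sm,
    int64_t max_tile_size2) {
  assert(tile_size1 > 0 && tile_size2 > 0);
  assert(total_input_bits_per_elem > 0 && max_blocks_per_sm > 0);
  while (tile_size2 * 2 <= max_tile_size2) {
    // Same estimate as the reviewed code: bits per element, times elements
    // per tile, times resident blocks per SM.
    const int64_t bits_in_flight_per_sm = total_input_bits_per_elem *
        tile_size1 * tile_size2 * max_blocks_per_sm;
    if (bits_in_flight_per_sm >= required_bits_per_sm) {
      break;
    }
    tile_size2 *= 2;
  }
  return tile_size2;
}
```

A cap matters here because larger tiles typically need more shared memory per block, which may reduce the number of blocks actually resident on an SM below `maxBlocksPerMultiProcessor`; the estimate above does not model that effect.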
// Double tile_size2 if the default configuration doesn't provide enough
// bytes in flight to saturate memory bandwidth. This is based on Little's
// law: bytes_in_flight = bandwidth * latency. We estimate the bits in flight
// per SM as: (sum of input tensor element sizes) * elements_per_tile *
// blocks_per_sm. If this is less than the required bits in flight (derived
// from hardware bandwidth and memory latency), we double tile_size2 to
// increase the data in flight.
const auto dev_prop = at::cuda::getCurrentDeviceProperties();
const int64_t max_blocks_per_sm = dev_prop->maxBlocksPerMultiProcessor;
const int64_t num_elems_per_tile = tparams->tile_size1 * tparams->tile_size2;
const int64_t required_bits_per_sm =
    scheduler_utils::getRequiredBitsInFlight();
int64_t total_input_bits_per_elem = 0;
for (auto tv : ir_utils::filterByType<TensorView>(fusion->inputs())) {
  total_input_bits_per_elem +=
      dataTypeSizeBit(tv->getDataType().value(), index_type);
}
const int64_t bits_in_flight_per_sm =
    total_input_bits_per_elem * num_elems_per_tile * max_blocks_per_sm;
if (bits_in_flight_per_sm < required_bits_per_sm) {
  tparams->tile_size2 *= 2;
}
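The Little's law arithmetic in the comment can be worked through with a small standalone sketch. The function names, the units (GB/s with 1 GB = 10^9 bytes, so that GB/s times nanoseconds gives bytes), and the even split across SMs are assumptions made for illustration; this does not show how `scheduler_utils::getRequiredBitsInFlight()` is actually implemented, and the numbers in the usage note are not measured hardware values.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical: Little's law, bytes_in_flight = bandwidth * latency,
// converted to bits and divided evenly across SMs (an assumption).
int64_t requiredBitsInFlightPerSm(
    int64_t bandwidth_gb_per_s, // 1 GB/s == 1 byte/ns
    int64_t latency_ns,
    int64_t num_sms) {
  assert(num_sms > 0);
  const int64_t bytes_in_flight = bandwidth_gb_per_s * latency_ns;
  return bytes_in_flight * 8 / num_sms;
}

// Hypothetical: the per-SM estimate described in the code comment,
// (sum of input element sizes) * elements_per_tile * blocks_per_sm.
int64_t bitsInFlightPerSm(
    int64_t total_input_bits_per_elem,
    int64_t num_elems_per_tile,
    int64_t blocks_per_sm) {
  return total_input_bits_per_elem * num_elems_per_tile * blocks_per_sm;
}
```

With illustrative inputs of 1000 GB/s, 1000 ns latency, and 100 SMs, the requirement comes to 80,000 bits per SM. A single 32-bit input with a 32 x 32 tile and 16 blocks per SM gives 32 * 1024 * 16 = 524,288 bits, so under these made-up numbers the doubling branch would not trigger.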
The test modifications introduce a num_inputs parameter (1 or 2) to validate both single-input and multi-input transpose scenarios. Ensure that the tests exercise the new tile size adjustment logic across these input configurations and that the baseline comparisons remain valid.
No description provided.