Fix array_repeat handling of null count values#20102
Conversation
| OffsetBuffer::new(offsets.into()), | ||
| repeated_values, | ||
| None, | ||
| Some(NullBuffer::new(nulls.finish())), |
There was a problem hiding this comment.
We can copy the nulls buffer from the count array instead of using a builder
| Some(NullBuffer::new(nulls.finish())), | |
| count_array.nulls().cloned(), |
| let total_repeated_values: usize = count_array | ||
| .values() | ||
| .iter() | ||
| .map(|&c| if c > 0 { c as usize } else { 0 }) |
There was a problem hiding this comment.
nit: technically the spec allows null slots to have non-0 values, so this could overestimate
There was a problem hiding this comment.
Possible fix:
let total_repeated_values: usize = (0..count_array.len())
.map(|i| get_count_with_validity(count_array, i).0)
.sum();| ), | ||
| Arc::new(inner_list), | ||
| None, | ||
| Some(NullBuffer::new(outer_nulls.finish())), |
There was a problem hiding this comment.
Same note here about reusing input null buffer
| let total_repeated_values: usize = count_array | ||
| .values() | ||
| .iter() | ||
| .map(|&c| if c > 0 { c as usize } else { 0 }) |
There was a problem hiding this comment.
Possible fix:
let total_repeated_values: usize = (0..count_array.len())
.map(|i| get_count_with_validity(count_array, i).0)
.sum();| count_array: &UInt64Array, | ||
| count_array: &Int64Array, | ||
| ) -> Result<ArrayRef> { | ||
| let counts = count_array.values(); |
There was a problem hiding this comment.
This also does not check for NULLs in the count_array and may lead to overestimates. You need to use get_count_with_validity() at line 245 too
| let (count, is_valid) = get_count_with_validity(count_array, idx); | ||
|
|
||
| running_offset += count; | ||
| offsets.push(O::from_usize(running_offset).unwrap()); |
There was a problem hiding this comment.
Extreme case:
- if the input values is ListArray, then its offset type will be i32
- and if the count value is bigger than i32::MAX
- then i32::from_usize(i32::MAX + 1) will return None and it will panic
Co-authored-by: Martin Grigorov <[email protected]> Co-authored-by: Jeffrey Vo <[email protected]>
|
Thanks for the review. |
Which issue does this PR close?
array_repeatfunction does not handle null values in count array #20075.Rationale for this change
The previous implementation of
array_repeatrelied on Arrow defaults when handling null and negative count values. As a result, null counts were implicitly treated as zero and returned empty arrays, which is a correctness issue.This PR makes the handling of these edge cases explicit and aligns the function with SQL null semantics.
What changes are included in this PR?
Int64Are these changes tested?
Yes, SLTs added and pass.
Are there any user-facing changes?
Yes. When the count value is null,
array_repeatnow returns a null array instead of an empty array.