libstdc++: Optimize _Utf_iterator for size

This reorders the data members of _Utf_iterator to avoid padding bytes
between members due to alignment requirements. For x86_64, in the common
case where the iterator and sentinel types are pointers, the previous
layout had padding after _M_buf and after _M_to_increment; the new layout
shrinks the iterator from 40 bytes to 32 bytes.  (For i686 there's no
change; it's still 20 bytes.)

We could compress the three uint8_t members into one byte by using
bit-fields:

uint8_t _M_buf_index : 2;    // [0,3]
uint8_t _M_buf_last  : 3;    // [0,4]
uint8_t _M_to_increment : 3; // [0,4]

But there doesn't seem to be any point, because it will just be slower
to access them and the saved bytes will just become padding, so the size
isn't any smaller. We could also reduce _M_buf_last and _M_to_increment
to 2 bits (e.g. by storing the value minus one), because the 0 value is
only used for a default-constructed iterator, and we don't actually care
about the values in that case. Again, this doesn't seem worth doing.

libstdc++-v3/ChangeLog:

	* include/bits/unicode.h (_Utf_iterator): Reorder data members
	to be more compact.

Reviewed-by: Tomasz Kamiński <tkaminsk@redhat.com>

Author: Jonathan Wakely
Committer: Jonathan Wakely
Date: 2025-07-17 00:11:49 +01:00
Parent: 9892d6c4d7
Commit: ed6a9cfc4a


@@ -509,9 +509,6 @@ namespace __unicode
       constexpr _Iter
       _M_curr() const { return _M_first_and_curr._M_curr; }
 
-      // Buffer holding the individual code units of the current code point.
-      array<value_type, 4 / sizeof(_ToFmt)> _M_buf;
-
       // _M_first is not needed for non-bidirectional ranges.
       template<typename _It>
         struct _First_and_curr
@@ -553,13 +550,16 @@ namespace __unicode
       // start (or end, for non-forward iterators) of the current code point.
       _First_and_curr<_Iter> _M_first_and_curr;
 
-      // The end of the underlying input range.
-      [[no_unique_address]] _Sent _M_last;
+      // Buffer holding the individual code units of the current code point.
+      array<value_type, 4 / sizeof(_ToFmt)> _M_buf;
 
       uint8_t _M_buf_index = 0;    // Index of current code unit in the buffer.
       uint8_t _M_buf_last = 0;     // Number of code units in the buffer.
       uint8_t _M_to_increment = 0; // How far to advance _M_curr on increment.
 
+      // The end of the underlying input range.
+      [[no_unique_address]] _Sent _M_last;
+
       template<typename _FromFmt2, typename _ToFmt2,
                input_iterator _Iter2, sentinel_for<_Iter2> _Sent2,
                typename _ErrHandler>