Richard Sandiford 1f52396c6f aarch64: Tweak handling of general SVE permutes [PR121027]
This PR is partly about a code quality regression that was triggered
by g:caa7a99a052929d5970677c5b639e1fa5166e334.  That patch taught the
gimple optimisers to fold two VEC_PERM_EXPRs into one, conditional
upon either (a) the original permutations not being "native" operations
or (b) the combined permutation being a "native" operation.

Whether something is a "native" operation is tested by calling
can_vec_perm_const_p with allow_variable_p set to false.  This requires
the permutation to be supported directly by TARGET_VECTORIZE_VEC_PERM_CONST,
rather than falling back to the general vec_perm optab.

This exposed a problem with the way that we handled general 2-input
permutations for SVE.  Unlike Advanced SIMD, base SVE does not have
an instruction to do general 2-input permutations.  We do still implement
the vec_perm optab for SVE, but only when the vector length is known at
compile time.  The general expansion is pretty expensive: an AND, a SUB,
two TBLs, and an ORR.  It certainly couldn't be considered a "native"
operation.

However, if a VEC_PERM_EXPR has a constant selector, the indices can
be wider than the elements being permuted.  This is not true for the
vec_perm optab, where the indices and permuted elements must have the
same precision.

This leads to one case where we cannot leave a general 2-input permutation
to be handled by the vec_perm optab: when permuting bytes on a target
with 2048-bit vectors.  In that case, the indices of the elements in
the second vector are in the range [256, 511], which cannot be stored
in a byte index.

TARGET_VECTORIZE_VEC_PERM_CONST therefore has to handle 2-input SVE
permutations for one specific case.  Rather than check for that
specific case, the code went ahead and used the vec_perm expansion
whenever it worked.  But that undermines the !allow_variable_p
handling in can_vec_perm_const_p; it becomes impossible for
target-independent code to distinguish "native" operations from
the worst-case fallback.

This patch instead limits TARGET_VECTORIZE_VEC_PERM_CONST to the
cases that it has to handle.  It fixes the PR for all vector lengths
except 2048 bits.

A better fix would be to introduce some sort of costing mechanism,
which would allow us to reject the new VEC_PERM_EXPR even for
2048-bit targets.  But that would be a significant amount of work
and would not be backportable.

gcc/
	PR target/121027
	* config/aarch64/aarch64.cc (aarch64_evpc_sve_tbl): Punt on 2-input
	operations that can be handled by vec_perm.

gcc/testsuite/
	PR target/121027
	* gcc.target/aarch64/sve/acle/general/perm_1.c: New test.
2025-07-11 16:48:41 +01:00
2025-07-11 08:33:18 +02:00
2025-07-10 00:20:18 +00:00
2025-07-11 00:19:26 +00:00
2025-07-11 00:19:26 +00:00
2025-07-10 00:20:18 +00:00

This directory contains the GNU Compiler Collection (GCC).

The GNU Compiler Collection is free software.  See the files whose
names start with COPYING for copying permission.  The manuals, and
some of the runtime libraries, are under different terms; see the
individual source files for details.

The directory INSTALL contains copies of the installation information
as HTML and plain text.  The source of this information is
gcc/doc/install.texi.  The installation information includes details
of what is included in the GCC sources and what files GCC installs.

See the file gcc/doc/gcc.texi (together with other files that it
includes) for usage and porting information.  An online readable
version of the manual is in the files gcc/doc/gcc.info*.

See http://gcc.gnu.org/bugs/ for how to report bugs usefully.

Copyright years on GCC source files may be listed using range
notation, e.g., 1987-2012, indicating that every year in the range,
inclusive, is a copyrightable year that could otherwise be listed
individually.
Description
No description provided
Readme 3.9 GiB
Languages
C++ 31%
C 30.2%
Ada 15%
D 6.3%
Go 5.9%
Other 11.1%