19 Commits

Author SHA1 Message Date
d5ba6f71b9 Tag push_warning_printf with ATTRIBUTE_FORMAT
* Let GCC `-Wformat` check formats sent to these `my_vsnprintf_ex` users
* Migrate them from the old extension specifiers
  to the new `-Wformat`-compatible suffixes
2025-02-12 10:17:44 +01:00
9799777992 MDEV-35919 Server crashes in Item_func_vec_distance::fix_length_and_dec upon reading from I_S table 2025-02-11 20:31:42 +01:00
9ee09a33bb Merge branch '11.7' into 11.8 2025-02-11 20:29:43 +01:00
2b17265ae2 MDEV-35186 IGNORED attribute has no effect on vector keys 2025-02-10 12:22:05 +01:00
a2f0234c82 MDEV-36011 Server crashes in Charset::mbminlen / Item_func_vec_fromtext::val_str upon mixing vector type with string 2025-02-06 21:47:01 +01:00
e11592aed3 MDEV-35450 VEC_DISTANCE() function to autouse the available index type 2025-01-21 12:18:56 +01:00
528249a20a cleanup: one Item_func_vec_distance class, not three
prepare for MDEV-35450 VEC_DISTANCE auto-detection
2025-01-21 12:18:56 +01:00
8988decbfe MDEV-35220 Assertion `!item->null_value' failed upon VEC_TOTEXT call
don't forget to reset null_value for each row
2024-11-05 14:00:52 -08:00
1a53048299 MDEV-35215 ASAN errors in Item_func_vec_fromtext::val_str upon VEC_FROMTEXT with an invalid argument 2024-11-05 14:00:52 -08:00
e020a3a2ce MDEV-35210 Vector type cannot store values which VEC_FromText produces and VEC_ToText accepts
let VEC_FromText validate that the vector l2squared isn't NaN.
VEC_ToText still prints everything.
2024-11-05 14:00:52 -08:00
f336b10bb1 MDEV-35212 Server crashes in Item_func_vec_fromtext::val_str upon query from empty table 2024-11-05 14:00:52 -08:00
88119addff Vec_ToText was underestimating max_length of the result
switch to a more predictable, shorter, and more correct output,
that is, print as many significant digits as necessary,
but not more (they'd be just zeros) and not less (it'd lose precision)
2024-11-05 14:00:51 -08:00
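
The "as many significant digits as necessary" rule from commit 88119addff can be illustrated with a minimal Python sketch. It is not the server's code: the helper names `to_float32` and `shortest_float32_text` are hypothetical, and the search loop is just one simple way to find the shortest decimal string that reads back as the same 4-byte float.

```python
import struct


def to_float32(x: float) -> float:
    """Round a Python float (a double) to the nearest IEEE 754 single."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def shortest_float32_text(x: float) -> str:
    """Return the shortest decimal text that parses back to the same float32.

    Hypothetical helper for illustration only; the server may use a
    different algorithm to reach the same kind of output.
    """
    target = to_float32(x)
    # 9 significant digits are always enough to round-trip a float32.
    for digits in range(1, 10):
        text = f"{target:.{digits}g}"
        if to_float32(float(text)) == target:
            return text
    # Only reached for values that never compare equal (NaN).
    return repr(target)


if __name__ == "__main__":
    for value in (0.1, 1.0, 1 / 3):
        print(value, "->", shortest_float32_text(value))
```

Fewer digits than this would parse back to a different float, and more digits would add nothing a 4-byte float can represent.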
a471389d07 MDEV-34970 Vector search fails to compile on s390x
add missing casts to float4store/float8store for bigendian.
fix a typo in float4store() usage
remove unnecessary one-byte-at-a-time appends
2024-11-05 14:00:50 -08:00
2ad9df8c9b VEC_Distance_Cosine() 2024-11-05 14:00:50 -08:00
2e1fcc6a80 rename VEC_Distance to VEC_Distance_Euclidean
and create a parent Item_func_vec_distance_common class
2024-11-05 14:00:50 -08:00
eec1339f5d MDEV-32886 Vec_FromText and Vec_ToText
This commit introduces two utility functions meant to make working with
vectors simpler.

Vec_ToText converts a binary vector into a JSON array of numbers
(floats).
Vec_FromText takes a JSON array of numbers and converts it into a
little-endian IEEE float sequence of bytes (4 bytes per float).
2024-11-05 14:00:49 -08:00
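
The two conversions described in MDEV-32886 can be sketched in Python as follows. This is an illustration of the byte layout only, not the server implementation: the function names `vec_from_text` and `vec_to_text` are hypothetical stand-ins, and the error handling is an assumption.

```python
import json
import struct


def vec_from_text(text: str) -> bytes:
    """Parse a JSON array of numbers into little-endian 4-byte IEEE floats."""
    values = json.loads(text)
    if not isinstance(values, list):
        raise ValueError("expected a JSON array of numbers")
    # "<" forces little-endian byte order; "f" is a 4-byte IEEE 754 single.
    return struct.pack(f"<{len(values)}f", *values)


def vec_to_text(blob: bytes) -> str:
    """Render a binary vector as a JSON array of numbers."""
    if len(blob) % 4 != 0:
        raise ValueError("vector length must be a multiple of 4 bytes")
    values = struct.unpack(f"<{len(blob) // 4}f", blob)
    return json.dumps(list(values))


if __name__ == "__main__":
    blob = vec_from_text("[1, 2, 3]")
    print(blob.hex())         # 4 bytes per element
    print(vec_to_text(blob))  # back to a JSON array
```

Fixing the byte order explicitly keeps the stored bytes identical on little-endian and big-endian hosts.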
88839e71a3 Initial HNSW implementation
This commit includes the work done in collaboration with Hugo Wen from
Amazon:

    MDEV-33408 Alter HNSW graph storage and fix memory leak

    This commit changes the way HNSW graph information is stored in the
    second table. Instead of storing connections as separate records, it now
    stores neighbors for each node, leading to significant performance
    improvements and storage savings.

    Compared with the previous approach, the insert speed is 5 times faster,
    search speed improves by 23%, and storage usage is reduced by 73%, based
    on ann-benchmark tests with random-xs-20-euclidean and
    random-s-100-euclidean datasets.

    Additionally, in the previous code, vector objects were not released after
    use, resulting in excessive memory consumption (over 20GB for building
    the index with 90,000 records), preventing tests with large datasets.
    Vectors are now released appropriately during the insert and search
    functions. Note that some vectors still need to be cleaned up after
    search query completion; this needs to be addressed in a future commit.

    All new code of the whole pull request, including one or several files
    that are either new files or modified ones, are contributed under the
    BSD-new license. I am contributing on behalf of my employer Amazon Web
    Services, Inc.

As well as the commit:

    Introduce session variables to manage HNSW index parameters

    Three variables:

    hnsw_max_connection_per_layer
    hnsw_ef_constructor
    hnsw_ef_search

    ann-benchmark tool is also updated to support these variables in commit
    https://github.com/HugoWenTD/ann-benchmarks/commit/e09784e for branch
    https://github.com/HugoWenTD/ann-benchmarks/tree/mariadb-configurable

    All new code of the whole pull request, including one or several files
    that are either new files or modified ones, are contributed under the
    BSD-new license. I am contributing on behalf of my employer Amazon Web
    Services, Inc.

Co-authored-by: Hugo Wen <wenhug@amazon.com>
2024-11-05 14:00:48 -08:00
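
The storage change described in MDEV-33408 (one record per node holding its neighbors, instead of one record per connection) can be pictured with a small Python sketch. The tuple layout and the name `edges_to_neighbor_rows` are assumptions made for illustration; the commit message does not spell out the actual schema of the second table.

```python
from collections import defaultdict


def edges_to_neighbor_rows(edges):
    """Group per-connection records into one neighbor list per (layer, node).

    `edges` is an iterable of hypothetical (layer, src, dst) records, i.e.
    the "one record per connection" shape. The result maps (layer, src) to
    the list of dst nodes, i.e. the "neighbors per node" shape.
    """
    rows = defaultdict(list)
    for layer, src, dst in edges:
        rows[(layer, src)].append(dst)
    return dict(rows)


if __name__ == "__main__":
    edges = [(0, 1, 2), (0, 1, 3), (0, 2, 1)]
    for key, neighbors in edges_to_neighbor_rows(edges).items():
        print(key, neighbors)
```

With the grouped layout, fetching all neighbors of a node is a single record lookup rather than a scan over several connection records.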
d6add9a03d initial support for vector indexes
MDEV-33407 Parser support for vector indexes

The syntax is

  create table t1 (... vector index (v) ...);

limitations:
* v is a binary string and NOT NULL
* only one vector index per table
* temporary tables are not supported

MDEV-33404 Engine-independent indexes: subtable method

added support for so-called "high level indexes"; they are not visible
to the storage engine and are implemented on the SQL level. For every such
index in a table, say, t1, the server implicitly creates a second
table named like t1#i#05 (where "05" is the index number in t1).
This table has a fixed structure, has no frm, is not accessible directly,
doesn't go into the table cache, and needs no MDLs.

MDEV-33406 basic optimizer support for k-NN searches

for a query like SELECT ... ORDER BY func() the optimizer will use
item_func->part_of_sortkey() to decide which keys can be used
to resolve ORDER BY.
2024-11-05 14:00:48 -08:00
9ccf02a9a7 MDEV-32885 VEC_DISTANCE() function 2024-11-05 14:00:48 -08:00