Mirror of https://github.com/MariaDB/server.git (synced 2026-01-13 06:03:12 +00:00)
Atomic CREATE OR REPLACE keeps the old table intact if the command
fails or the server crashes while it runs. This is done by renaming
the original table to a temporary name as a backup, and restoring it
if the CREATE fails. When the command has completed and been logged,
the backup table is deleted.
Atomic replace algorithm
Two DDL chains are used for CREATE OR REPLACE:
ddl_log_state_create (C) and ddl_log_state_rm (D).
1. (C) Log the rename of ORIG to TMP (recovery action: rename TMP
back to ORIG).
2. Rename ORIG to TMP.
3. (C) Log CREATE_TABLE_ACTION of ORIG (recovery action: drop ORIG).
4. Do all remaining work on ORIG (for example, inserting data).
5. (D) Log the drop of TMP.
6. Write the query to the binary log (this marks (C) to be closed,
rather than reverted, in case of a later failure).
7. Execute the drop of TMP through (D).
8. Close (C) and (D).
If there is a failure before step 6, the changes logged in (C) are
reverted. Chain (D) is only executed if step 6 succeeded ((C) is then
closed on crash recovery).
Foreign key errors will be detected at step 1.
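The eight steps and the failure rule above can be illustrated with a small in-memory model. This is a minimal sketch, not the server's implementation: a `std::map` stands in for the data dictionary, `DdlChain` and `create_or_replace_sketch()` are hypothetical names, a chain only keeps its recovery actions in memory, logged actions are assumed to be replayed newest first, and the binary log write in step 6 is assumed to succeed.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a DDL log chain: a list of recovery actions.
struct DdlChain {
  std::vector<std::function<void()>> actions;
  void log(std::function<void()> action) { actions.push_back(std::move(action)); }
  // Replay the logged actions, newest first, then forget them.
  void execute() {
    for (auto it = actions.rbegin(); it != actions.rend(); ++it)
      (*it)();
    actions.clear();
  }
  // Close the chain: its actions will never run.
  void close() { actions.clear(); }
};

std::map<std::string, std::string> tables;  // table name -> contents

// Replaces table "ORIG" with new_contents. Returns false if it was reverted.
bool create_or_replace_sketch(const std::string &new_contents,
                              bool fail_before_binlog) {
  assert(tables.count("ORIG") == 1 && tables.count("TMP") == 0);
  DdlChain c, d;
  // 1. (C) Recovery action: rename TMP back to ORIG.
  c.log([] { tables["ORIG"] = tables["TMP"]; tables.erase("TMP"); });
  // 2. Rename ORIG to TMP (the backup).
  tables["TMP"] = tables["ORIG"];
  tables.erase("ORIG");
  // 3. (C) Recovery action: drop the new ORIG.
  c.log([] { tables.erase("ORIG"); });
  // 4. Create and fill the new ORIG.
  tables["ORIG"] = new_contents;
  if (fail_before_binlog) {
    c.execute();  // failure before step 6: revert everything logged in (C)
    return false;
  }
  // 5. (D) Recovery action: drop TMP.
  d.log([] { tables.erase("TMP"); });
  // 6. Write to the binary log (assumed to succeed in this sketch).
  // 7. Drop TMP through (D).
  d.execute();
  // 8. Close both chains.
  c.close();
  d.close();
  return true;
}
```

Starting from `ORIG` with old contents, a call with `fail_before_binlog = true` leaves `ORIG` unchanged and no `TMP`; a call with `false` leaves only `ORIG`, now holding the new contents.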
Additional notes
- CREATE TABLE without REPLACE and temporary tables are not affected
by this commit.
Setting @@drop_before_create_or_replace=1 restores the old
behaviour, where an existing table is dropped by
CREATE OR REPLACE before the new one is created.
- CREATE TABLE is reverted if binlogging the query fails.
- Engines that have the HTON_EXPENSIVE_RENAME flag set are not
affected by this commit. Conflicting tables in such engines are
deleted by CREATE OR REPLACE, as before.
- Replication execution is not affected by this commit.
- Replication will first drop the conflicting table and then
create the new one.
- CREATE TABLE .. SELECT XID usage is fixed and now there is no need
to log DROP TABLE via DDL_CREATE_TABLE_PHASE_LOG (see comments in
do_postlock()). XID is now correctly updated, so it disables
DDL_LOG_DROP_TABLE_ACTION. Note that the binary log is flushed at
the final stage, when the table is ready. So if the XID is in the
binary log, the table does not need to be dropped.
- Three variations of CREATE OR REPLACE are handled:
1. CREATE OR REPLACE TABLE t1 (..);
2. CREATE OR REPLACE TABLE t1 LIKE t2;
3. CREATE OR REPLACE TABLE t1 SELECT ..;
- The test case uses 6 engine combinations (aria, aria_notrans,
myisam, ib, lock_tables, expensive_rename) and 2 binlog format
combinations (row, stmt). The combinations help to check for
differences between the results. Failure handling is tested for each
of the three variations above.
- expensive_rename tests CREATE OR REPLACE without atomic
replace. The effect should be the same as with the old behaviour
before this commit.
- Triggers mechanism is unaffected by this change. This is tested in
create_replace.test.
- LOCK TABLES is affected. Lock restoration must be done after the new
table is created or TMP is renamed back to ORIG.
- Moved ddl_log_complete() from send_eof() to finalize_ddl(). This
checkpoint was not executed before for normal CREATE TABLE but is
executed now.
- CREATE TABLE is now also rolled back if writing to the binary log
fails. See rpl_gtid_strict.test.
Backup DDL log changes
- In case of a successful CREATE OR REPLACE, only the CREATE event is
logged, not the DROP TABLE event for the old table.
ddl_log.cc changes
ddl_log_execute_action() now properly returns error conditions.
ddl_log_disable_entry() was added to disable a single entry.
The entry on disk is still reserved until ddl_log_complete() is
executed.
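The difference between disabling an entry and completing the log can be sketched with a toy slot table. `EntrySlot` and `DdlLogSketch` are hypothetical and do not reflect the on-disk layout used by ddl_log.cc; the sketch only shows that a disabled entry is no longer executed while its slot stays reserved until completion.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical model of one DDL log entry slot.
struct EntrySlot {
  std::string action;   // e.g. "rename TMP -> ORIG"
  bool in_use = false;  // slot reserved in the log
  bool active = false;  // action would be executed on recovery
};

struct DdlLogSketch {
  std::vector<EntrySlot> slots;

  // Reuse the first free slot, or append a new one.
  std::size_t write_entry(const std::string &action) {
    for (std::size_t i = 0; i < slots.size(); i++) {
      if (!slots[i].in_use) {
        slots[i] = EntrySlot{action, true, true};
        return i;
      }
    }
    slots.push_back(EntrySlot{action, true, true});
    return slots.size() - 1;
  }

  // Analogue of disabling one entry: the action no longer runs,
  // but the slot remains reserved.
  void disable_entry(std::size_t pos) {
    assert(slots.at(pos).in_use);
    slots[pos].active = false;
  }

  // Analogue of completing a chain: its slots become free again.
  void complete(const std::vector<std::size_t> &chain) {
    for (std::size_t pos : chain)
      slots.at(pos) = EntrySlot{};
  }

  std::size_t used_count() const {
    std::size_t n = 0;
    for (const EntrySlot &s : slots) n += s.in_use ? 1 : 0;
    return n;
  }

  std::size_t active_count() const {
    std::size_t n = 0;
    for (const EntrySlot &s : slots) n += (s.in_use && s.active) ? 1 : 0;
    return n;
  }
};
```

After disabling an entry, a new entry still gets a fresh slot; only after completion can the disabled entry's slot be reused.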
On XID usage
Like with all other atomic DDL operations XID is used to avoid
inconsistency between master and slave in the case of a crash after
the binary log is written and before ddl_log_state_create is closed. On
recovery XIDs are taken from binary log and corresponding DDL log
events get disabled. That is done by
ddl_log_close_binlogged_events().
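The recovery rule can be sketched as follows, assuming each chain remembers the XID of its statement and that recovery has already collected the XIDs found in the binary log. `ChainSketch` and `recover_sketch()` are hypothetical; the real work is done by ddl_log_close_binlogged_events() on actual DDL log entries.

```cpp
#include <cassert>
#include <cstdint>
#include <set>
#include <string>
#include <vector>

// Hypothetical recovery-time view of one DDL log chain.
struct ChainSketch {
  std::string name;
  std::uint64_t xid;      // XID recorded when the chain was written
  bool disabled = false;  // closed because the query reached the binlog
  bool executed = false;  // replayed (reverted) during recovery
};

// A chain whose XID is in the binary log must not be reverted, so it is
// disabled first; every other chain is then executed.
void recover_sketch(std::vector<ChainSketch> &chains,
                    const std::set<std::uint64_t> &binlogged_xids) {
  for (ChainSketch &c : chains)
    if (binlogged_xids.count(c.xid))
      c.disabled = true;
  for (ChainSketch &c : chains) {
    if (c.disabled)
      continue;
    assert(!c.executed);  // each chain is replayed at most once
    c.executed = true;    // stands in for replaying the chain's actions
  }
}
```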
On linking two chains together
Chains are executed in the ascending order of entry_pos of execute
entries. But the entry_pos assignment order is undefined: a bigger
number may be assigned to the first chain and a smaller number to the
second chain. In that case the execution order is reversed: the
second chain is executed first.
To avoid that we link one chain to another. While the base chain
(ddl_log_state_create) is active the secondary chain
(ddl_log_state_rm) is not executed. That is, only one of two linked
chains can be executed.
The interface ddl_log_link_chains() was defined in "MDEV-22166
ddl_log_write_execute_entry() extension".
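A small model of the ordering problem and of the linking rule, assuming each chain is identified by the entry_pos of its execute entry. `LinkedChain` and `recovery_order_sketch()` are hypothetical and do not use the ddl_log_link_chains() interface; "active" here means the chain is still open in the log at recovery time.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical recovery-time view of a chain.
struct LinkedChain {
  std::string name;
  unsigned entry_pos;  // position of the chain's execute entry
  int base = -1;       // index of the base chain it is linked to, or -1
  bool active = true;  // still open in the log
};

// Returns the chain names in the order recovery would execute them.
std::vector<std::string> recovery_order_sketch(const std::vector<LinkedChain> &chains) {
  std::vector<std::size_t> idx(chains.size());
  for (std::size_t i = 0; i < idx.size(); i++)
    idx[i] = i;
  // Chains are executed in ascending order of entry_pos.
  std::sort(idx.begin(), idx.end(), [&chains](std::size_t a, std::size_t b) {
    return chains[a].entry_pos < chains[b].entry_pos;
  });
  std::vector<std::string> order;
  for (std::size_t i : idx) {
    const LinkedChain &c = chains[i];
    assert(c.base < static_cast<int>(chains.size()));
    if (!c.active)
      continue;  // closed chains are never executed
    // A linked chain is skipped while its base chain is active.
    if (c.base >= 0 && chains[static_cast<std::size_t>(c.base)].active)
      continue;
    order.push_back(c.name);
  }
  return order;
}
```

If D received the smaller entry_pos and is not linked, it runs before C. Once D is linked to C, only C runs while C is active, and only D runs after C has been closed (for example, because the query reached the binary log).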
Atomic info parameters in HA_CREATE_INFO
Many functions in CREATE TABLE pass the same parameters. These
parameters are part of the table creation info and belong in
HA_CREATE_INFO (or a similar structure). Passing parameters in a
single structure makes it much easier to add new data and to
refactor.
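The idea of grouping the atomic-DDL parameters can be sketched with hypothetical types. `DdlLogStateSketch`, `AtomicCreateInfoSketch` and `CreateInfoSketch` are illustrative only; the actual field names and layout in HA_CREATE_INFO may differ.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for a DDL log state.
struct DdlLogStateSketch {
  int entries = 0;
};

// Atomic-DDL parameters kept together instead of passed one by one.
struct AtomicCreateInfoSketch {
  DdlLogStateSketch *ddl_log_state_create = nullptr;
  DdlLogStateSketch *ddl_log_state_rm = nullptr;
};

struct CreateInfoSketch {
  std::string table_name;
  AtomicCreateInfoSketch atomic;  // travels with the rest of the create info
};

// Each helper receives the whole structure instead of a growing list of
// separate arguments; adding a new field does not change its signature.
int log_create_sketch(CreateInfoSketch &info) {
  assert(info.atomic.ddl_log_state_create != nullptr);
  return ++info.atomic.ddl_log_state_create->entries;
}

int log_drop_backup_sketch(CreateInfoSketch &info) {
  assert(info.atomic.ddl_log_state_rm != nullptr);
  return ++info.atomic.ddl_log_state_rm->entries;
}
```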
InnoDB changes
Added ha_innobase::can_be_renamed_to_backup() to check if
a table with foreign keys can be renamed.
Aria changes:
- Fixed an issue in the Aria engine with CREATE + locked tables,
where data was in some cases not properly committed if the server
crashed.
Other changes:
- Removed some auto variables in log.cc for better code readability.
- Fixed an old bug where CREATE ... SELECT could not auto-repair a
table that is part of the SELECT.
- Marked MyISAM as not supporting ROLLBACK (not required, but
done for better consistency with other engines).
Known issues:
- InnoDB tables with foreign key definitions are not fully supported
with atomic create and replace:
- ha_innobase::can_be_renamed_to_backup() can detect some cases
where InnoDB does not support renaming a table with foreign key
constraints. In these cases MariaDB drops the old table before
creating the new one.
The detected cases are:
- The new and old tables use the same foreign key constraint
name.
- The old table has self-referencing constraints.
- If the old and new tables use the same name for a constraint,
creating the new table will fail. The original table is
restored in this case.
- The above issues will be fixed in a future commit.
- CREATE OR REPLACE TEMPORARY TABLE is not fully atomic. Any conflicting
table is always dropped before the new one is created (old behaviour).
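The conditions under which CREATE OR REPLACE falls back to dropping the old table first, collected from the notes above, can be summarised in one hypothetical predicate. `ReplaceContextSketch`, `can_be_renamed_to_backup_sketch()` and `use_atomic_replace_sketch()` are illustrative names, not server code. The foreign key check only mirrors the two detected cases listed for ha_innobase::can_be_renamed_to_backup(); for non-InnoDB engines the constraint sets are assumed to be empty.

```cpp
#include <cassert>
#include <set>
#include <string>

// Hypothetical summary of the inputs mentioned in the notes above.
struct ReplaceContextSketch {
  bool drop_before_create_or_replace = false;  // @@drop_before_create_or_replace=1
  bool temporary = false;                      // CREATE OR REPLACE TEMPORARY TABLE
  bool expensive_rename = false;               // engine has HTON_EXPENSIVE_RENAME
  std::set<std::string> old_fk_names;          // FK constraint names of ORIG
  std::set<std::string> new_fk_names;          // FK constraint names of new table
  bool old_has_self_reference = false;         // ORIG references itself
};

// Mirrors the two detected cases: self-reference, or a shared FK name.
bool can_be_renamed_to_backup_sketch(const ReplaceContextSketch &ctx) {
  if (ctx.old_has_self_reference)
    return false;
  for (const std::string &name : ctx.new_fk_names)
    if (ctx.old_fk_names.count(name))
      return false;
  return true;
}

// true: rename ORIG to a backup (atomic replace);
// false: drop ORIG before creating the new table (old behaviour).
bool use_atomic_replace_sketch(const ReplaceContextSketch &ctx) {
  if (ctx.drop_before_create_or_replace || ctx.temporary || ctx.expensive_rename)
    return false;
  return can_be_renamed_to_backup_sketch(ctx);
}
```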
Bug fixes related to this MDEV:
MDEV-36435 Assertion failure in finalize_locked_tables()
MDEV-36439 Assertion `thd_arg->lex->sql_command != SQLCOM_CREATE_SEQUENCE...
MDEV-36498 Failed CoR in non-atomic mode no longer generates DROP in RBR...
MDEV-36508 Temporary files #sql-create-....frm occasionally stay after
crash recovery
MDEV-38479 Crash in CREATE OR REPLACE SEQUENCE when new sequence cannot
be created
MDEV-36497 Assertion failure after atomic CoR with Aria under lock in
transactional context
InnoDB related changes:
- ha_innobase::rename_table() does not handle foreign key constraints
when renaming a normal table to an internal temporary table. This
causes problems for CREATE OR REPLACE, as the old constraints cause a
failure when creating a new table with the same constraints.
This is fixed inside InnoDB by not treating the #sql-create-.. tables
created as part of CREATE OR REPLACE as temporary tables.
- In ha_innobase::delete_table(), skip constraint checks when
dropping a #sql-create temporary table.
- In tablename_to_filename() and filename_to_tablename(), don't do
filename conversion for internal temporary tables (#sql-...)
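The effect of skipping conversion for internal temporary names can be shown with a simplified encoder. This sketch rests on assumptions: it encodes every byte except ASCII letters, digits and `_` as `@` followed by four hex digits, whereas MariaDB's real identifier-to-filename mapping works on characters and has its own set of unencoded characters. Only the `#sql-` prefix check reflects the change described above; `tablename_to_filename_sketch()` is a hypothetical name and the reverse direction is omitted.

```cpp
#include <cassert>
#include <cctype>
#include <cstdio>
#include <string>

// Simplified, byte-based illustration; not MariaDB's actual mapping.
std::string tablename_to_filename_sketch(const std::string &name) {
  // Internal temporary tables (#sql-...) are passed through unchanged.
  if (name.rfind("#sql-", 0) == 0)
    return name;
  std::string out;
  for (unsigned char ch : name) {
    if (std::isalnum(ch) || ch == '_') {
      out += static_cast<char>(ch);
    } else {
      char buf[6];  // '@' + 4 hex digits + NUL
      std::snprintf(buf, sizeof(buf), "@%04x", static_cast<unsigned>(ch));
      out += buf;
    }
  }
  return out;
}
```

With this sketch, `a-b` becomes `a@002db`, while `#sql-create-1` is returned as-is.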
Other things:
- maria_create_trn_for_mysql() does not register a new transaction
handler for commits. This was needed to ensure that CREATE OR REPLACE
does not end with an active transaction.
- We no longer get warnings about "Engine not supporting atomic
create" when doing a legal CREATE OR REPLACE on a table with
foreign key constraints.
- Updated VIDEX engine flags to disable CREATE SEQUENCE.
Reverted commits:
MDEV-36685 "CREATE-SELECT may lose in binlog side-effects of
stored-routine", as it did not take into account that it is safe to
clear binlogs if the created table is non-transactional and no other
non-transactional tables are used.
- This was done because it caused extra logging when it was not needed
(not using any non-transactional tables), and it also did not solve
side effects when using statement-based logging.
242 lines
10 KiB
C++
/* Copyright (c) 2006, 2014, Oracle and/or its affiliates.
   Copyright (c) 2011, 2017, MariaDB Corporation.

   This program is free software; you can redistribute it and/or modify
   it under the terms of the GNU General Public License as published by
   the Free Software Foundation; version 2 of the License.

   This program is distributed in the hope that it will be useful,
   but WITHOUT ANY WARRANTY; without even the implied warranty of
   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
   GNU General Public License for more details.

   You should have received a copy of the GNU General Public License
   along with this program; if not, write to the Free Software
   Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1335 USA */

#ifndef SQL_TABLE_INCLUDED
#define SQL_TABLE_INCLUDED

#include <my_sys.h> // pthread_mutex_t
#include "m_string.h" // LEX_CUSTRING
#include "lex_charset.h"
#include "lex_ident.h"

#define ERROR_INJECT(code) \
  ((DBUG_IF("crash_" code) && (DBUG_SUICIDE(), 0)) || \
   (DBUG_IF("fail_" code) && (my_error(ER_UNKNOWN_ERROR, MYF(0)), 1)))

class Alter_info;
class Alter_table_ctx;
class Column_definition;
class Create_field;
struct TABLE_LIST;
class THD;
struct TABLE;
struct handlerton;
class handler;
class String;
typedef struct st_ha_check_opt HA_CHECK_OPT;
struct HA_CREATE_INFO;
struct Table_specification_st;
typedef struct st_key KEY;
typedef struct st_key_cache KEY_CACHE;
typedef struct st_lock_param_type ALTER_PARTITION_PARAM_TYPE;
typedef struct st_order ORDER;
typedef struct st_ddl_log_state DDL_LOG_STATE;
extern LEX_CSTRING generated_by_server;

enum enum_explain_filename_mode
{
  EXPLAIN_ALL_VERBOSE= 0,
  EXPLAIN_PARTITIONS_VERBOSE,
  EXPLAIN_PARTITIONS_AS_COMMENT
};

/* depends on errmsg.txt Database `db`, Table `t` ... */
#define EXPLAIN_FILENAME_MAX_EXTRA_LENGTH 63

/* See mysql_write_frm function comment for explanations of these flags */
#define WFRM_WRITE_SHADOW 1
#define WFRM_INSTALL_SHADOW 2
#define WFRM_KEEP_SHARE 4
#define WFRM_WRITE_CONVERTED_TO 8
#define WFRM_BACKUP_ORIGINAL 16
#define WFRM_ALTER_INFO_PREPARED 32

/* Flags for conversion functions. */
static constexpr uint FN_FROM_IS_TMP= 1 << 0;
static constexpr uint FN_TO_IS_TMP= 1 << 1;
static constexpr uint FN_IS_TMP= FN_FROM_IS_TMP | FN_TO_IS_TMP;
/* Remove .frm table metadata. */
static constexpr uint QRMT_FRM= 1 << 2;
/* Remove .par partitioning metadata. */
static constexpr uint QRMT_PAR= 1 << 3;
/* Remove handler files and high-level indexes. */
static constexpr uint QRMT_HANDLER= 1 << 4;
/* Default behaviour is to drop .FRM and handler, but not .par. */
static constexpr uint QRMT_DEFAULT= QRMT_FRM | QRMT_HANDLER;
/** Don't resolve MySQL's fake "foo.sym" symbolic directory names. */
static constexpr uint SKIP_SYMDIR_ACCESS= 1 << 5;
/** Don't check foreign key constraints while renaming table */
static constexpr uint NO_FK_CHECKS= 1 << 6;

uint filename_to_tablename(const char *from, char *to, size_t to_length,
                           bool stay_quiet = false);
uint tablename_to_filename(const char *from, char *to, size_t to_length);
uint check_n_cut_mysql50_prefix(const char *from, char *to, size_t to_length);
bool check_mysql50_prefix(const char *name);
uint build_table_filename(char *buff, size_t bufflen, const char *db,
                          const char *table, const char *ext, uint flags);
uint build_table_shadow_filename(char *buff, size_t bufflen,
                                 ALTER_PARTITION_PARAM_TYPE *lpt,
                                 bool backup= false);
void build_lower_case_table_filename(char *buff, size_t bufflen,
                                     const LEX_CSTRING *db,
                                     const LEX_CSTRING *table,
                                     uint flags);
uint build_tmptable_filename(THD* thd, char *buff, size_t bufflen);
void make_tmp_table_name(THD *thd, LEX_STRING *to, const char *prefix);
bool add_keyword_to_query(THD *thd, String *result, const LEX_CSTRING *keyword,
                          const LEX_CSTRING *add);

/*
  mysql_create_table_no_lock can be called in one of the following
  mutually exclusive situations:

  - Just a normal ordinary CREATE TABLE statement that explicitly
    defines the table structure.

  - CREATE TABLE ... SELECT. It is special, because only in this case,
    the list of fields is allowed to have duplicates, as long as one of the
    duplicates comes from the select list, and the other doesn't. For
    example in

       CREATE TABLE t1 (a int(5) NOT NULL) SELECT b+10 as a FROM t2;

    the list in alter_info->create_list will have two fields `a`.

  - ALTER TABLE, that creates a temporary table #sql-xxx, which will be later
    renamed to replace the original table.

  - ALTER TABLE as above, but which only modifies the frm file, it only
    creates an frm file for the #sql-xxx, the table in the engine is not
    created.

  - Assisted discovery, CREATE TABLE statement without the table structure.

  These situations are distinguished by the following "create table mode"
  values, where a CREATE ... SELECT is denoted by any non-negative number
  (which should be the number of fields in the SELECT ... part), and other
  cases use constants as defined below.
*/
#define C_ORDINARY_CREATE 0
#define C_ASSISTED_DISCOVERY -1
#define C_ALTER_TABLE -2
#define C_ALTER_TABLE_FRM_ONLY -3

int mysql_create_table_no_lock(THD *thd,
                               DDL_LOG_STATE *ddl_log_state,
                               DDL_LOG_STATE *ddl_log_state_rm,
                               Table_specification_st *create_info,
                               Alter_info *alter_info, bool *is_trans,
                               int create_table_mode, TABLE_LIST *table);

handler *mysql_create_frm_image(THD *thd, HA_CREATE_INFO *create_info,
                                Alter_info *alter_info, int create_table_mode,
                                KEY **key_info, uint *key_count,
                                LEX_CUSTRING *frm);

int mysql_discard_or_import_tablespace(THD *thd, TABLE_LIST *table_list,
                                       bool discard);

bool mysql_prepare_alter_table(THD *thd, TABLE *table,
                               Table_specification_st *create_info,
                               Alter_info *alter_info,
                               Alter_table_ctx *alter_ctx);
bool mysql_trans_prepare_alter_copy_data(THD *thd);
bool mysql_trans_commit_alter_copy_data(THD *thd);
bool mysql_alter_table(THD *thd, const LEX_CSTRING *new_db,
                       const LEX_CSTRING *new_name,
                       Table_specification_st *create_info,
                       TABLE_LIST *table_list,
                       class Recreate_info *recreate_info,
                       Alter_info *alter_info,
                       uint order_num, ORDER *order, bool ignore,
                       bool if_exists);
bool mysql_compare_tables(TABLE *table,
                          Alter_info *alter_info,
                          HA_CREATE_INFO *create_info,
                          bool *metadata_equal);
bool mysql_recreate_table(THD *thd, TABLE_LIST *table_list,
                          class Recreate_info *recreate_info,
                          bool table_copy);
bool mysql_rename_table(handlerton *base, const LEX_CSTRING *old_db,
                        const LEX_CSTRING *old_name, const LEX_CSTRING *new_db,
                        const LEX_CSTRING *new_name, const LEX_CUSTRING *id,
                        uint flags);
bool mysql_backup_table(THD* thd, TABLE_LIST* table_list);
bool mysql_restore_table(THD* thd, TABLE_LIST* table_list);

template<typename T> class List;
void fill_checksum_table_metadata_fields(THD *thd, List<Item> *fields);
bool mysql_checksum_table(THD* thd, TABLE_LIST* table_list,
                          HA_CHECK_OPT* check_opt);
bool mysql_rm_table(THD *thd,TABLE_LIST *tables, bool if_exists,
                    bool drop_temporary, bool drop_sequence,
                    bool dont_log_query);
int mysql_rm_table_no_locks(THD *thd, TABLE_LIST *tables,
                            const LEX_CSTRING *db,
                            DDL_LOG_STATE *ddl_log_state,
                            bool if_exists,
                            bool drop_temporary, bool drop_view,
                            bool drop_sequence,
                            bool dont_log_query, bool dont_free_locks);
bool log_drop_table(THD *thd, const LEX_CSTRING *db_name,
                    const LEX_CSTRING *table_name, const LEX_CSTRING *handler,
                    bool partitioned, const LEX_CUSTRING *id,
                    bool temporary_table);
int get_hlindex_keys_by_open(THD *thd, const LEX_CSTRING *db,
                             const LEX_CSTRING *table_name, const char *path,
                             uint *keys, uint *total_keys);
bool quick_rm_table(THD *thd, handlerton *base, const LEX_CSTRING *db,
                    const LEX_CSTRING *table_name, uint flags,
                    const char *table_path=0);
void close_cached_table(THD *thd, TABLE *table);
void sp_prepare_create_field(THD *thd, Column_definition *sql_field);
bool mysql_write_frm(ALTER_PARTITION_PARAM_TYPE *lpt, uint flags);

int write_bin_log_with_stat(THD *thd, bool clear_error,
                            char const *query, ulong query_length,
                            bool is_trans= FALSE);
int write_bin_log_with_if_exists(THD *thd, bool clear_error,
                                 bool is_trans, bool add_if_exists,
                                 bool commit_alter= false);
/*
  Write to binlog, but return true only if the write failed
*/
inline bool write_bin_log(THD *thd, bool clear_error,
                          char const *query, ulong query_length,
                          bool is_trans= FALSE)
{
  int res= write_bin_log_with_stat(thd, clear_error, query, query_length,
                                   is_trans);
  return res <= 0 ? false : true;
}

void promote_first_timestamp_column(List<Create_field> *column_definitions);

/*
  These prototypes were under INNODB_COMPATIBILITY_HOOKS.
*/
uint explain_filename(THD* thd, const char *from, char *to, uint to_length,
                      enum_explain_filename_mode explain_mode);


extern MYSQL_PLUGIN_IMPORT const Lex_ident_column primary_key_name;

bool check_engine(THD *, const char *, const char *, HA_CREATE_INFO *);

#endif /* SQL_TABLE_INCLUDED */
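The ERROR_INJECT macro in the header above combines two debug hooks: if the keyword `crash_<code>` is active the server crashes, and if `fail_<code>` is active an error is raised and the whole expression evaluates to true. Because `"crash_" code` relies on string-literal concatenation, `code` has to be a string literal. The sketch below reproduces the same expression shape with hypothetical stubs (`dbug_if_sketch`, `dbug_suicide_sketch` and a `last_error` string) in place of DBUG_IF, DBUG_SUICIDE and my_error, and simulates the crash by throwing an exception instead of killing the process.

```cpp
#include <cassert>
#include <set>
#include <stdexcept>
#include <string>

// Hypothetical stand-ins for the debug hooks used by ERROR_INJECT.
std::set<std::string> active_dbug_keywords;  // what DBUG_IF would test
std::string last_error;                      // what my_error would set

bool dbug_if_sketch(const std::string &keyword) {
  return active_dbug_keywords.count(keyword) > 0;
}

[[noreturn]] void dbug_suicide_sketch() {
  throw std::runtime_error("simulated crash");
}

// Same shape as ERROR_INJECT(code): "crash_<code>" never returns,
// "fail_<code>" records an error and makes the expression true.
#define ERROR_INJECT_SKETCH(code) \
  ((dbug_if_sketch("crash_" code) && (dbug_suicide_sketch(), 0)) || \
   (dbug_if_sketch("fail_" code) && (last_error= "ER_UNKNOWN_ERROR", 1)))
```

With no keyword active the expression is false, so a caller can treat a true result as an injected failure.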