From: Peter Xu
Subject: [PATCH RFC 06/11] migration/block: Extend the migration_block_* API to dest side
Date: Tue, 3 Dec 2024 19:51:33 -0500

Extend the usage of this API to the receiver side, because the same status
maintenance is also necessary there.

It's needed to avoid invoking bdrv_inactivate_all() when the drives are
already inactive, which can crash QEMU.  See the issue link for more
information; the sequence sketched below is one way to trigger it.
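
For context, the problematic sequence looks roughly like this (hostnames,
ports and the exact monitor commands below are illustrative only; see the
issue link for the actual report):

    # host1: source side, the VM is migrated while paused
    (qemu) stop
    (qemu) migrate tcp:host2:6000     # host1 inactivates its drives

    # host2: incoming side (-incoming tcp:0:6000); the VM is never
    # started here, so its drives stay inactive after switchover
    (qemu) migrate tcp:host3:6000     # without this series this can run
                                      # bdrv_inactivate_all() on already
                                      # inactive drives and crash QEMU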

So we need to start remembering the block activation status not only on
the src side, but also on the dest side.
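
As a reference, a minimal sketch of what such tracked helpers can look
like (illustrative only; the real migration_block_* helpers are introduced
earlier in this series and may differ in detail):

    /*
     * Illustrative sketch, not the code from this series.  Assumes it
     * lives in migration/migration.c, where the block layer and error
     * reporting headers are already included.
     */
    static bool block_activate_tracked(MigrationState *s)
    {
        Error *local_err = NULL;

        if (s->block_active) {
            /* Already active: nothing to do. */
            return true;
        }

        bdrv_activate_all(&local_err);
        if (local_err) {
            error_report_err(local_err);
            return false;
        }

        s->block_active = true;
        return true;
    }

    static bool block_inactivate_tracked(MigrationState *s)
    {
        if (!s->block_active) {
            /*
             * Already inactive: skip the call entirely instead of
             * crashing inside bdrv_inactivate_all().
             */
            return true;
        }

        if (bdrv_inactivate_all() != 0) {
            return false;
        }

        s->block_active = false;
        return true;
    }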

One tricky thing here is that for an incoming QEMU we also need to
consider its initial status.  Since the incoming QEMU doesn't own the
disks before switchover, the initial state needs to be set to FALSE for an
incoming QEMU.  Postcopy recovery doesn't count here, because in that case
the dest QEMU already owns the disks.
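
Roughly, the intended life cycle of block_active then becomes (a summary
of the above, not text from the series itself):

    freshly started QEMU:           block_active = true   (owns the disks)
    incoming QEMU (-incoming ...):  block_active = false  (src still owns them)
    switchover on the dest side:    activation flips it back to true
    postcopy recovery:              unchanged, the dest already owns the disks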

Resolves: https://gitlab.com/qemu-project/qemu/-/issues/2395
Signed-off-by: Peter Xu <peterx@redhat.com>
---
 migration/migration.h | 13 +++++++++++++
 migration/migration.c | 13 +++++++------
 2 files changed, 20 insertions(+), 6 deletions(-)

diff --git a/migration/migration.h b/migration/migration.h
index b71c062ad2..f477494c5d 100644
--- a/migration/migration.h
+++ b/migration/migration.h
@@ -380,8 +380,21 @@ struct MigrationState {
      *   switchover phase).  When that happens, we need to be able to
      *   recover the block drive status by re-activating them.
      *
+     * - Currently bdrv_inactivate_all() is not safe to be invoked on top
+     *   of inactivated drives (even if bdrv_activate_all() is actually
+     *   safe to be called any time!).  It means remembering this could
+     *   help migration to make sure it won't inactivate twice in a row,
+     *   crashing QEMU.  It can happen when we migrate a PAUSED VM from
+     *   host1 to host2, then migrate again to host3 without starting it.
+     *   TODO: a cleaner solution is to allow invoking
+     *   bdrv_inactivate_all() safely at any time, like bdrv_activate_all().
+     *
      * For freshly started QEMU, block_active is initialized to TRUE
      * reflecting the scenario where QEMU owns block device ownerships.
+     *
+     * For incoming QEMU taking a migration stream, block_active is set to
+     * FALSE, reflecting that the incoming side doesn't own the block
+     * devices until switchover happens.
      */
     bool block_active;
 
diff --git a/migration/migration.c b/migration/migration.c
index 8f7d09ca84..e01264168f 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -701,6 +701,12 @@ migration_incoming_state_setup(MigrationIncomingState *mis, Error **errp)
         return false;
     }
 
+    /*
+     * Newly set up QEMU, prepared for incoming migration.  Clear the
+     * block active state to reflect that the src currently owns the disks.
+     */
+    migrate_get_current()->block_active = false;
+
     migrate_set_state(&mis->state, current, MIGRATION_STATUS_SETUP);
     return true;
 }
@@ -779,7 +785,6 @@ static void qemu_start_incoming_migration(const char *uri, bool has_channels,
 
 static void process_incoming_migration_bh(void *opaque)
 {
-    Error *local_err = NULL;
     MigrationIncomingState *mis = opaque;
 
     trace_vmstate_downtime_checkpoint("dst-precopy-bh-enter");
@@ -810,11 +815,7 @@ static void process_incoming_migration_bh(void *opaque)
              * Make sure all file formats throw away their mutable
              * metadata.  If error, don't restart the VM yet.
              */
-            bdrv_activate_all(&local_err);
-            if (local_err) {
-                error_report_err(local_err);
-                local_err = NULL;
-            } else {
+            if (migration_block_activate(migrate_get_current())) {
                 vm_start();
             }
         } else {
-- 
2.47.0



