Re: [Qemu-devel] [PATCH 2/4] block: add block topology options
From: Jamie Lokier
Subject: Re: [Qemu-devel] [PATCH 2/4] block: add block topology options
Date: Fri, 5 Feb 2010 17:16:53 +0000
User-agent: Mutt/1.5.13 (2006-08-11)
Christoph Hellwig wrote:
> Note that the physical block size attribute can
> be a data integrity issue, though.
I agree.
> A storage device guarantees that it can write a sector atomically,
I've looked everywhere for confirmation of *atomicity*, and so far all
I've seen are rumours. Some people believe sector writes are atomic
during power failure, some people believe they are not. Those who
believe they are don't have reliable references, and I haven't seen it
in any standard.
SQLite has a flag which can be set for a backing store to say block
writes are atomic, to enable it to optimise some things; the flag is
not set when writing to a disk block device, because they didn't find
confirmation of it.
> so moving from a 4k to a 512 byte physical sector device could lead
> to not being able to atomically write a 4k piece of data that the
> guest expects to write atomically.
If there is no confirmation that sector writes are atomic, then no
database or filesystem should be relying on that property anyway.
> I'm not sure how failure safe the read-modify-write algorithms on
> 4k sector disks with logical 512 byte blocks are, but I'd expect issues
> there, too.
I think you might be referring to what I'm calling "radius of
destruction", because I don't know if there's a well known term for it.
By that I mean if you write 512 bytes, and it's implemented as RMW to
a 4k sector, then on power failure any part of the 4k sector could be
corrupted.
On some RAIDs the size is much larger; also on many flash devices.
(RAIDs make it clearer that the alignment is relevant too.)
Note that if 4k sector writes were _atomic_, then read-modify-write of
512 bytes would be completely reliable.
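To make the "radius of destruction" concrete, here is a rough sketch of how
one might compute it. It assumes the device (or RAID) rewrites whole aligned
units of a fixed size, and that anything inside a rewritten unit may be
damaged on power failure. The function name and the unit sizes are only for
illustration; real devices may behave differently.

```python
def radius_of_destruction(offset, length, unit):
    """Return (start, end) of the byte range that may be corrupted if power
    fails during a write of `length` bytes at byte `offset`.

    Assumption (not a documented device guarantee): the device rewrites
    whole `unit`-byte aligned blocks, e.g. via read-modify-write, and a
    rewritten block is not updated atomically.
    """
    if length <= 0 or unit <= 0 or offset < 0:
        raise ValueError("offset must be >= 0; length and unit must be > 0")
    start = (offset // unit) * unit                # round down to unit boundary
    end = -(-(offset + length) // unit) * unit     # round up to unit boundary
    return start, end


# A 512-byte write into a 4k physical sector puts the whole sector at risk.
print(radius_of_destruction(1024, 512, 4096))      # (0, 4096)

# Alignment matters: a 1024-byte write straddling a 4k boundary
# puts two physical sectors at risk.
print(radius_of_destruction(3584, 1024, 4096))     # (0, 8192)
```

If the unit really were written atomically, the range returned here would be
either entirely old or entirely new after a power failure, so the data outside
the 512 bytes being written would be unchanged either way.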
> > Even if you just convert between qcow2 and a raw block device, or the
> > other way, you'll sometimes want to be sure it's not guest-visible.
>
> The image format has no hooks into these options currently.
No, but whatever is reported to the guest, you may decide you want it
to continue being reported to the guest after doing the convert
operation. Even if it's a data integrity concern. In fact
*precisely* when the guest has algorithms which write differently
depending on the sector size, for integrity, that means changing the
guest-visible sector size may trigger bugs and other changes that you
sometimes don't want.
I agree that it should report the size by default, though, because of
the integrity concern.
-- Jamie