Re: [mtools] Short filenames, codepages and possible mtools/kernel bug
From: Alain Knaff
Subject: Re: [mtools] Short filenames, codepages and possible mtools/kernel bug
Date: Mon, 29 May 2006 12:57:35 +0200
User-agent: Thunderbird 1.5 (X11/20051201)
David C Niemi wrote:
This probably goes back to a shortcut I took in 1994 to get VFAT working
on Mtools. As you may know VFAT uses 16-bit Unicode, and I assumed that
the high bits would always be zero. So there's no support for special
code pages unless Alain has since added it.
However, mounting the floppy with the kernel MSDOS file system support
is a totally separate implementation, by different people.
DCN
Nope, I didn't add any full Unicode support since then...
However, the problem still is weird, because for Ç you don't need
unicode. ISO-8859-1, which is supported, should be enough.
VFAT uses a constant 2 byte format for its unicode (UCS-2?), and in this
representation, all ISO-8859-1 characters (which include Ç) have their
high byte equal to zero.
The same is not true of the variable-length Unicode encoding (UTF-8),
which encodes every character from 0x80 to 0xff as two bytes.
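To make the difference concrete, here is a small Python sketch (not part of mtools) that encodes Ç in the three ways discussed. It uses Python's "utf-16-le" codec as a stand-in for the little-endian 2-byte format, which is an assumption that holds for characters in this range.

```python
# Encode the same character three ways and compare the resulting bytes.
ch = "Ç"  # U+00C7, LATIN CAPITAL LETTER C WITH CEDILLA

latin1 = ch.encode("iso-8859-1")  # one byte, equal to the code point
ucs2le = ch.encode("utf-16-le")   # fixed two bytes, low byte first, high byte zero
utf8 = ch.encode("utf-8")         # two bytes for anything from U+0080 to U+07FF

print(latin1.hex(), ucs2le.hex(), utf8.hex())  # c7 c700 c387
```

So the single C7 byte only means Ç to a program that expects ISO-8859-1; a program expecting UTF-8 would look for C3 87 instead.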
I tried reproducing the problem here, but I do get a Ç as I should.
[...]
Then I swap over to Linux, and run "mdir a:". What I now see is:
AB�DE TXT 0 2006-05-28 16:00 AB�DE.TXT
1 file 0 bytes
1 457 664 bytes free
It's not necessarily an mtools problem, it could also be a terminal
(konsole, gterm, ...) issue.
Try doing mdir a: | hexdump -C
If you see C7 for the Ç, it is ok (and the mess up only happened on
display), if something else, then it is indeed an mtools bug.
(the capital C cedilla has been replaced by a tiny white question mark
inside a black diamond/lozenge). Just to check, I mount the filesystem
using the following command:
mount -t msdos -o codepage=850 /dev/fd0 temp
(Try mount -t vfat instead to get long names and extended characters.)
Then, ls shows me a question mark where the capital C cedilla should be.
That's an ls issue (not an msdos/vfat filesystem issue). Ls replaces,
_on_display_ , those characters that it thinks are unprintable with
question marks. Depending on your settings (LANG, LC_CTYPE and LC_ALL
environment variables), ls may think that the Ç is an unprintable
character, and replace it by a question mark. This even happens on
native Linux filesystems (reiserfs, etc...). Try it by creating a file
with a Ç in it, and then doing ls.
I've found that with LC_ALL=en_US , the Ç is displayed correctly.
If that doesn't help, try ls -b instead. Ls -b substitutes "unprintable"
characters with their octal code (Should be \307 in case of Ç).
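As a rough illustration of what to expect from ls -b, the following Python sketch octal-escapes every non-ASCII byte of a name. The helper ls_b_escape is hypothetical and only approximates ls (the real program also escapes spaces and backslashes, and consults the locale before deciding what is unprintable).

```python
def ls_b_escape(raw: bytes) -> str:
    """Approximate `ls -b`: keep printable ASCII, octal-escape other bytes."""
    return "".join(chr(b) if 0x20 <= b <= 0x7E else f"\\{b:03o}" for b in raw)

# Name stored as ISO-8859-1 bytes: a single \307 for the cedilla.
print(ls_b_escape("ABÇDE.TXT".encode("iso-8859-1")))  # AB\307DE.TXT

# Name stored as UTF-8 bytes: two escapes instead of one.
print(ls_b_escape("ABÇDE.TXT".encode("utf-8")))       # AB\303\207DE.TXT
```

Seeing \307 means the name is stored as a single ISO-8859-1 byte; seeing \303\207 means it is stored as UTF-8.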
[...]
25F0  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
2600  E5 41 00 42 00 C7 00 44 00 45 00 0F 00 19 2E 00  .A.B...D.E......
                     ^^
The C7 is the proper (unicode, iso-8859-1) representation for Ç, so
everything should be ok there...
2610  54 00 58 00 54 00 00 00 FF FF 00 00 FF FF FF FF  T.X.T...........
2620  E5 42 80 44 45 20 20 20 54 58 54 20 00 30 03 80  .B.DE   TXT .0..
2630  BC 34 BC 34 00 00 04 80 BC 34 00 00 00 00 00 00  .4.4.....4......
2640  41 41 00 42 00 C7 00 44 00 45 00 0F 00 19 2E 00  AA.B...D.E......
2650  54 00 58 00 54 00 00 00 FF FF 00 00 FF FF FF FF  T.X.T...........
2660  41 42 80 44 45 20 20 20 54 58 54 20 00 30 03 80  AB.DE   TXT .0..
2670  BC 34 BC 34 00 00 04 80 BC 34 00 00 00 00 00 00  .4.4.....4......
2680  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
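The long-name entry at offset 2640 in the dump above can be decoded by hand. The Python sketch below assumes the commonly documented VFAT long-name entry layout (name characters at bytes 1-10, 14-25 and 28-31, checksum of the short name at byte 13); the function names are made up for this illustration and are not taken from mtools.

```python
# The 32-byte long-name entry at 2640 and the 11-byte short name at 2660.
lfn = bytes.fromhex(
    "41 41 00 42 00 C7 00 44 00 45 00 0F 00 19 2E 00 "
    "54 00 58 00 54 00 00 00 FF FF 00 00 FF FF FF FF"
)
short = bytes.fromhex("41 42 80 44 45 20 20 20 54 58 54")

def lfn_name(entry: bytes) -> str:
    """Join the three name fields and cut at the terminating NUL."""
    raw = entry[1:11] + entry[14:26] + entry[28:32]
    return raw.decode("utf-16-le").split("\x00", 1)[0]

def lfn_checksum(short11: bytes) -> int:
    """Rotate-right-and-add checksum over the 11 short-name bytes."""
    s = 0
    for c in short11:
        s = (((s & 1) << 7) + (s >> 1) + c) & 0xFF
    return s

print(lfn_name(lfn))                   # ABÇDE.TXT
print(lfn_checksum(short) == lfn[13])  # True: checksum byte is 0x19
```

The long name decodes to ABÇDE.TXT, and the checksum stored in the long-name entry matches the short name containing the 0x80 byte, so both entries on disk describe the same file consistently.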
I assume that the "80"s in between the "42"s and the "44"s are my
missing capital C cedillas (both codepages 437 and 850 list the capital
C cedilla as occupying point 80 hex).
The 80 is indeed very confusing. At first I was confused by this too, as
I assumed this to be an "unknown character" placeholder.
However, after further analysis, I noticed that 0x80 is indeed the
correct legacy MS-DOS code for Ç, as surprising as it sounds. (MS-DOS
didn't use standard ISO-8859-1, but its own proprietary encoding, as
specified in the codepage...)
If you try a different character than Ç (for example é), you see a
different code there.
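This is easy to check with Python's built-in cp437 and cp850 codecs (a quick sketch, independent of mtools and the kernel):

```python
# Look up the DOS codepage bytes for two accented characters.
codes = {cp: ("Ç".encode(cp), "é".encode(cp)) for cp in ("cp437", "cp850")}
for cp, (c_cedilla, e_acute) in codes.items():
    print(cp, c_cedilla.hex(), e_acute.hex())  # 80 and 82 in both codepages

# The 11 short-name bytes from the directory entry at offset 2660.
short = bytes.fromhex("41 42 80 44 45 20 20 20 54 58 54")
decoded = short.decode("cp850")
print(repr(decoded))  # 'ABÇDE   TXT'
```

Read through codepage 850, the short name in the directory entry is ABÇDE with extension TXT, exactly as intended.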
In case it helps, I've left a truncated binary disk image of the
diskette here:
http://www.carbon.eclipse.co.uk/msdosfs.diskImage
Just tried to do an mdir on it... and indeed, it showed me Ç:
> mdir -i msdosfs.diskImage ::
Volume in drive : has no label
Volume Serial Number is 2C2F-5EDB
Directory for ::/
ABÇDE TXT 0 2006-05-28 16:00 ABÇDE.TXT
1 file 0 bytes
1 457 664 bytes free
Could anyone please tell me whether this is my error, or is it a bug
(possibly in mtools, possibly in the kernel)?
I suspect the error might be in the terminal program that you are using
(which might be set to display UTF-8). Try changing that to ISO-8859-1,
a.k.a. ISO Latin-1.
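The following Python sketch shows why a UTF-8 terminal would produce exactly that symptom. It assumes mdir emits the name as ISO-8859-1 bytes, which is what the hexdump check is meant to confirm.

```python
# Bytes as an ISO-8859-1 program would write them to the terminal.
raw = "ABÇDE.TXT".encode("iso-8859-1")

# A UTF-8 terminal sees a lone C7 byte, which is not valid UTF-8,
# and shows the replacement character U+FFFD in its place.
as_utf8 = raw.decode("utf-8", errors="replace")

# An ISO-8859-1 terminal interprets the same byte as the cedilla.
as_latin1 = raw.decode("iso-8859-1")

print(as_utf8)    # AB�DE.TXT
print(as_latin1)  # ABÇDE.TXT
```

U+FFFD is usually drawn as a white question mark inside a black diamond, so correct output from mdir can still look broken on screen.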
Regards,
Alain