From: Douglas Bollinger
Subject: [Pan-devel] Re: [Pan-users] RFC: Detecting multiparts (was: .94 weirdness with detecting attachments)
Date: Sat, 9 Aug 2003 10:16:31 -0400
On Fri, 8 Aug 2003 10:49:09 -0700
Charles Kerr <address@hidden> wrote:
> Here's a rough draft for a better detection scheme. I'm posting it here
> so that people can refine it and/or shoot holes in it.
OK. I'm cleaning out my shotgun now. :)
<snip>
> * likely_binary_group is true if the newsgroup name contains
> any of: "binaries", "fan", "mag", "sex", false otherwise
Probably true. I can't recall downloading a binary that wasn't in the binary
hierarchy, but it might be nice to be able to force a binary d/l just in
case.
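For reference, a minimal sketch of the group rule as quoted, in Python purely for illustration. The function name and the lowercasing are assumptions; the keyword list is copied from the proposal.

```python
# Keyword list copied from the quoted proposal.
BINARY_GROUP_HINTS = ("binaries", "fan", "mag", "sex")


def likely_binary_group(group_name: str) -> bool:
    """Return True if the newsgroup name contains any hint as a substring.

    Lowercasing the name first is an assumption; the proposal does not
    say whether the match is case-sensitive.
    """
    name = group_name.lower()
    return any(hint in name for hint in BINARY_GROUP_HINTS)
```

Because this is a plain substring test, a discussion group such as rec.arts.sf.fandom also matches on "fan", which is one more reason a manual override would be useful.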
> * likely_binary_subject is true if the Subject: header contains
> any of: "jpeg" "jpg" "gif" "tiff" "png", false otherwise
Not for me. Lately I've been spending some time in
alt.binaries.multimedia.scfi and nothing in there uses any of those file
extensions. Usually it's .mpg or .avi files posted as .rar archives with
.par backups.
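A small sketch of why the proposed keyword list misses the .rar and .par postings described above. The function name, the lowercasing, and the extended list are assumptions for illustration, not part of the proposal.

```python
# Keywords from the quoted proposal.
PROPOSED_HINTS = ("jpeg", "jpg", "gif", "tiff", "png")

# Hypothetical extension of the list, based on the file types mentioned
# in this reply; not part of the original proposal.
EXTENDED_HINTS = PROPOSED_HINTS + ("rar", "par", "mpg", "avi")


def likely_binary_subject(subject: str, hints=PROPOSED_HINTS) -> bool:
    """Return True if the Subject: header contains any hint as a substring."""
    lowered = subject.lower()
    return any(hint in lowered for hint in hints)


subject = "Ultraman - Episode 02.part14.rar (01/45)"
proposed_result = likely_binary_subject(subject)
extended_result = likely_binary_subject(subject, EXTENDED_HINTS)
```

With the proposed list the subject is not flagged; with the extended list it is. Note that short substrings such as "par" would also match ordinary words like "party", so a real check would probably want to anchor on the dot or the end of the filename.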
> * part = 0, or if either "(x/y)" or "[x/y]" is in Subject:, then x.
> (Work backwards from the end of the string, in case someone's
> posting a set of multiparts and (x/y) appears in the Subject: twice)
>
> * parts = 0, or if either "(x/y)" or "[x/y]" is in Subject:, then y.
> (Work backwards here too)
>
> * lines = number of lines in article
>
> * is_reply = true if Subject: begins with "Re:", false otherwise
>
> * is_binary: true or false. This is what we're trying to guess.
Here's a header from a movie Pan d/l'ed fine a few days ago. Actually this
is a good test case, because it's a huge d/l with multipart .rar files,
par files, etc.
Ultraman (Hayata) (Disc 1 of 6: Eps 1-5) - [048/181] - Ultraman - Episode
02.part01.PAR (1/1)
Up to the first " - " separator, it's just subject identification, which Pan
should ignore. The next bit in the brackets, [048/181], is the file upload
number, which Pan should ignore as well. Finally, the last bit, (1/1), is
important, as it means the whole file is in a single posting.
Ultraman (Hayata) (Disc 1 of 6: Eps 1-5) - [062/181] - Ultraman - Episode
02.part14.rar (01/45)
Here's another one: part of one of the .rar files I needed to d/l.
Ultraman (Hayata) (Disc 1 of 6: Eps 1-5) - [038/181] - Ultraman - Episode
02.part01.P01 (01/45)
And that's one of the parity files.
Everything d/l'ed fine with Pan 0.14.0.91.
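The "work backwards" rule from the proposal could be sketched like this, in Python for illustration only. The function name and the regular expression are assumptions; the idea is simply to take the last "(x/y)" or "[x/y]" in the Subject: so that the [048/181] upload counter is skipped.

```python
import re

# Matches "(x/y)" or "[x/y]" with matching bracket types.
PART_RE = re.compile(r"\((\d+)/(\d+)\)|\[(\d+)/(\d+)\]")


def parse_part_marker(subject: str) -> tuple:
    """Return (part, parts) from the last x/y marker, or (0, 0) if none."""
    last = None
    for last in PART_RE.finditer(subject):
        pass  # keep only the final match, i.e. the one nearest the end
    if last is None:
        return (0, 0)
    # Exactly one alternative matched, so two of the four groups are set.
    x, y = [g for g in last.groups() if g is not None]
    return (int(x), int(y))


par_subject = ("Ultraman (Hayata) (Disc 1 of 6: Eps 1-5) - [048/181] - "
               "Ultraman - Episode 02.part01.PAR (1/1)")
rar_subject = ("Ultraman (Hayata) (Disc 1 of 6: Eps 1-5) - [062/181] - "
               "Ultraman - Episode 02.part14.rar (01/45)")
```

On the two subjects above this gives (1, 1) and (1, 45), and "(Disc 1 of 6: Eps 1-5)" is ignored because it has no x/y pair.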
> UNLESS: once in a blue moon people will post binaries as follow-ups,
> so hedge our bets:
> leave is_binary as true if lines > 500.
>
> 5. if is_binary is true,
> and the subject contains any of: "Frequently Asked Questions",
> "FAQ", "Weekly", "Monthly", then it's a FAQ or periodic posting
> being posted in pieces. set is_binary to false.
Right now I can force an article to be read by choosing "read" from the menu.
If you could somehow have a switch (in the menu or a keypress) that turns
binary on/off for a thread, it might catch the "once in a blue moon" cases
more easily.
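A rough sketch of how such a switch could sit on top of the heuristics, in Python for illustration. Everything here is hypothetical: the function name, the user_override parameter, and the case-sensitive match are assumptions. Only step 5 of the quoted rules is modelled, since the earlier steps were snipped.

```python
from typing import Optional

# Phrases from step 5 of the quoted proposal.
FAQ_HINTS = ("Frequently Asked Questions", "FAQ", "Weekly", "Monthly")


def resolve_is_binary(guess: bool, subject: str,
                      user_override: Optional[bool] = None) -> bool:
    """Combine the heuristic guess with a per-thread user override.

    user_override is None when the user has not touched the switch;
    otherwise it wins over every heuristic.
    """
    if user_override is not None:
        return user_override
    # Step 5: a multipart FAQ or periodic posting is not a binary.
    if guess and any(hint in subject for hint in FAQ_HINTS):
        return False
    return guess
```

For example, resolve_is_binary(True, "comp.lang.c FAQ (2/5)") comes out false, but passing user_override=True for that thread forces it back to binary.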
--
"Even if you're on the right track, you'll get run over if you just sit there."
-- Will Rogers