[Top][All Lists]
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[Pan-devel] some musings on speed
From: |
Anatoly Vorobey |
Subject: |
[Pan-devel] some musings on speed |
Date: |
Thu, 10 Jun 2004 02:27:49 +0300 |
User-agent: |
Mutt/1.5.4i |
Hello,
I've tried to investigate why mass-deletions of many articles from large
groups are so slow in Pan (I'm assuming here I'm not the only one who
noticed). When I open a typical binaries group with 180,000 articles
and try to select and delete all of them, the deletion takes about 40
seconds on my machine (P4 2Ghz, otherwise idle) - plus another 5 or so
seconds to update the header pane. I've done some bechmarking and
discovered that only about 3/4 of that time is spent on marking
as purged those articles that are crossposted into other, non-loaded
groups (this typical group had, as usual for binaries, many crossposts).
The actual deletion of Article structs from memory for loaded
groups (both the group I'm deleting from and those crossposted into, but
currently loaded) was much faster by comparison.
That marking is done by calling group_mark_article_purged in group.c for
every article in every crossposted non-loaded group separately; this
uses newsrc_mark_article() in newsrc.c which is itself very slow (as it
inserts/removes ranges inside the array of Range structs, when merging
existing ranges or creating a new one, it shifts the whole tail of the
potentially very large array, every time). So >100,000 calls to a very
slow merging functinon are responsible for most of the delay; and the
delay grows non-linearly with the number of deleted articles.
I've written a patch which optimised purging articles from non-loaded
groups by using a (new) function newsrc_mark_array_read() which inserts
a whole array of numbers to mark into an existing Newsrc object at once.
My patch also optimises the other part of deletion - deletion of
articles from loaded groups - by avoiding the costly and long transition
through MessageIdentifier objects, and by using arrays rather than
hashes to build up lists of articles to remove (the existing code uses
hashes to avoid removing the same article twice, but it can be done much
faster with an extra bit inside the Article struct, which doesn't take
up any new space at all). With my patch, removing 180,000 articles takes
about 5 seconds (not counting the header pane update time), and grows
roughly linearly with time. It's seen
some testing, but is still very raw, and probably has bugs in it. If
anyone wants to review it/try it out/let me know your thoughts on its
design/incorporate it into the source, please feel free to. I've put
the current patch against CVS at
http://www.forum2.org/mellon/patches/pan_delete.patch .
--
avva
[Prev in Thread] |
Current Thread |
[Next in Thread] |
- [Pan-devel] some musings on speed,
Anatoly Vorobey <=