[Pan-devel] SQLite versioning and timing issues. Was: [Pan-users] 0.14
From: Duncan
Subject: [Pan-devel] SQLite versioning and timing issues. Was: [Pan-users] 0.14.0 Slow on large group
Date: Mon, 30 Jun 2003 15:55:36 -0700
User-agent: KMail/1.5.2
[I've taken the liberty of posting this to the devel group/list as well.
Please direct replies to one OR the other, as appropriate, deleting the
other group/list.]
On Sun 29 Jun 2003 16:58, address@hidden posted as excerpted below:
> I've suggested this several times. If you ever take a look at a windows
> News reader called "Xnews" it does just that, and its a big help for
> large groups.
Upside down posting.. Grumble, grumble... Yes, I know you were just
following the person you replied to, but this applies to all the replies so
far.. Replies should be below what they quote, so one can tell what the
reply is about. That also encourages proper editing of the quote so a two
line reply isn't following 200 lines of quote with little to do with the
reply.
> > I don't know anything about the internal handling of headers for a large
> > group like this, but I've seen the same degradation with groups with
> > anything over 200,000 headers.
Yes.. This is a known problem with PAN. To some extent it has to do with the
GTK widgets used, which have been a problem for PAN for some time. Those
widgets were apparently designed for a few hundred to a few thousand sorted
items at most, certainly nothing over 100K, and it definitely shows.
> > In thinking about this problem, what about splitting the headers into
> > chunks of 50,000 to sort them, and then take the sorted headers and
> > merge them. This should allow you to thread the individual sorts as
> > they were read, and merging pre-sorted headers should be fairly linear.
This is an interesting idea. I'm not an active PAN developer (yet, anyway),
so don't know if chunking as described here has been tried or not, but if
not, it would certainly be worth a try.
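To make the quoted chunk-and-merge suggestion concrete, here is a minimal
Python sketch. It is NOT PAN code (PAN is written in C), and the header
layout, a (subject, article number) tuple, is an assumption made purely for
illustration. Each chunk is sorted on its own as it arrives, and the sorted
chunks are then combined with a k-way merge.

```python
import heapq
from itertools import islice


def sorted_chunks(headers, chunk_size):
    """Yield sorted lists of at most chunk_size headers, in arrival order."""
    it = iter(headers)
    while True:
        chunk = list(islice(it, chunk_size))
        if not chunk:
            return
        chunk.sort()
        yield chunk


def chunked_sort(headers, chunk_size=50_000):
    """Sort each chunk independently, then k-way merge the sorted chunks."""
    return list(heapq.merge(*sorted_chunks(headers, chunk_size)))


# Hypothetical overview data: (subject, article number) pairs.
sample = [("re: pan", 3), ("binaries", 1), ("sqlite", 4), ("gtk2", 2), ("aaa", 5)]
result = chunked_sort(sample, chunk_size=2)
```

Merging k pre-sorted chunks holding n headers in total costs on the order of
n log k comparisons, so with only a handful of 50,000-header chunks the merge
step is close to linear, as the poster expected. Whether that helps PAN
depends on where the time actually goes (sorting vs. the GTK widgets), which
this sketch does not address.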
FWIW, one thing that may help some is huge amounts of memory. According to
the posts here, one guy with a gig (or was it 3/4 gig?) of memory was
complaining when he tried to load several multi-hundred-K overview groups at
a time. That seems far better than my case, where a single group of a couple
hundred K overviews gives me problems with half a gig (512M). Those with
256M of memory report PAN choking at 100K overviews. Thus, memory DOES seem
to play a fairly large part in PAN's performance with huge groups.
I guess one conclusion that can be drawn from the above is that folks with
less than half a gig of memory should choose a tool other than PAN if they
are going to be working with groups of several hundred K overviews, at least
at this time, unfortunately..
Looking toward the future, one of the BIG projects awaiting the developers is
the transfer to a different database back end. Right now, PAN handles all
the sorting and storage basically on its own. That limits both the features
that can be added and the size of the groups handled (as the above
demonstrates), unless the code is to become hugely unmanageable through
growth and through duplicating functions that COULD be handed off to a
database library specializing in managing large numbers of data points.
Honestly, the current code is unlikely to undergo any huge changes in that
area, when we (tho I'm not a developer on PAN yet, I include myself as a PAN
regular both here and on the devel list) are planning to offload much of that
processing to a library in the not-too-distant future.
The library that's been mentioned is SQLite, which, as the name suggests, is
a lightweight SQL database library designed to be embedded in apps such as
PAN. This will bring several benefits. Hopefully, it will make processing
large numbers of overviews much more efficient, as it's designed for handling
that sort of larger quantity of data points, which makes it topical for this
thread.
In addition and dependent on that, it should enable virtual servers and
virtual groups similar to the way BNR2 handles things, among other oft
requested features.
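As a rough illustration of what handing overviews to SQLite could look like,
here is a small sketch using Python's standard sqlite3 module. The table
layout, column names, and server names are all hypothetical; nothing here
reflects a schema the PAN developers have proposed. The point is only that
sorting becomes an indexed query, and a "virtual group" becomes a query that
ignores which server an article came from.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database for the demo

# Hypothetical schema: one row per overview, keyed by where it came from.
conn.execute("""
    CREATE TABLE overview (
        server  TEXT    NOT NULL,
        grp     TEXT    NOT NULL,
        artnum  INTEGER NOT NULL,
        subject TEXT    NOT NULL,
        PRIMARY KEY (server, grp, artnum)
    )
""")
# An index on subject lets SQLite return rows in subject order without
# the application sorting them itself.
conn.execute("CREATE INDEX overview_subject ON overview (subject)")

rows = [
    ("news.example.net", "alt.binaries.test", 1, "file part 2"),
    ("news.example.net", "alt.binaries.test", 2, "file part 1"),
    ("news.example.org", "alt.binaries.test", 7, "file part 3"),
    ("news.example.org", "alt.test", 9, "hello"),
]
conn.executemany("INSERT INTO overview VALUES (?, ?, ?, ?)", rows)


def virtual_group(conn, group, limit=100):
    """Return one sorted page of a group, combined across all servers."""
    cur = conn.execute(
        "SELECT subject, server, artnum FROM overview "
        "WHERE grp = ? ORDER BY subject LIMIT ?",
        (group, limit),
    )
    return cur.fetchall()


merged = virtual_group(conn, "alt.binaries.test")
```

The LIMIT matters as much as the index: fetching one page of sorted rows at
a time means the application never has to hold or sort several hundred
thousand overviews at once.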
(BTW, BNR2, which has both Linux and MSWormOS binaries, based on Borland
Delphi/Kylix, just had a new Linux release, bringing it up to date with the
MSWormOS release. It had been several releases behind. Unfortunately, BNR2
isn't fully open source, nor could it be, based as it is on the proprietary
Kylix. Still, it does some stuff that no one else does as well, managing
multiple servers and multiple groups and allowing one to combine several into
a single virtual server or virtual group view. If you are pragmatic in your
approach to open source, BNR2 may well be the way to go now, for such huge
newsgroups and other high end binary group features.)
Anyway, back to SQLite and PAN. In addition to the above "virtual" features
and the hopefully far more efficient handling of large groups upon which
those virtual features depend, SQLite keeps its data in ordinary SQL tables,
so that data should be exportable to MySQL and perhaps other SQL databases.
Thus, once PAN's transfer to it is complete, those wishing even MORE power
will be able to integrate PAN into a larger database based framework.
The catch in all this is that the back end rewrite to merge SQLite into PAN
will likely be the biggest and deepest core modification project PAN has ever
undergone. If you were around for the transfer to GTK2, and for the
introduction of scoring, you may realize what this means, but on a larger
scale. PAN will likely be rather unstable and possibly lose some current
functionality temporarily, during the transfer, and may not be fully stable
and with full functionality for several "stable" point releases afterward.
However, once it's done, entire new realms of possibilities will be opened,
and PAN will likely enter a whole new feature and performance domain.
I do not, however, know when this job is likely to be undertaken. There has
been some discussion both here and on the devel list, but AFAIK it remains an
"in the future" discussion, rather than immediately pending. It's possible
PAN will be released in a 1.0 version before this major rewrite, and the
rewrite would then be for 2.0. However, my personal feeling, given that PAN
stands for the longer name and ultimate goal of "Pimp-Ass Newsreader", is
that this will be done before PAN 1.0.
Still, if it were me, I'd likely advance the version to 0.50, indicating a
definite milestone before the rewrite, and start the rewrite at 0.60 or 0.70
perhaps (depending on how actively the "stable" code was intended to be
maintained), allowing plenty of maintenance releases of the current code
before the new code is considered mature enough for full featured stable
deployment. In hindsight (and IMO, personally), perhaps the same should have
been done with the GTK2 port, starting it at 0.20, since the Gnome/GTK1
series was by then at 0.11. For newbies not yet deeply into Linux and the
complexities of versioning, "0.20 is designed around the newer GTK2 while
0.1x is the older GTK/Gnome 1" would have been a somewhat easier concept to
grasp than an immediate progression from 0.11 to 0.12. IMO, advancing the
version number to 0.50 to mark a decently stable milestone series, and
starting the new SQLite version at 0.60 or 0.70, would be equally indicative
of intentions.
WDYT, Charles, or am I blowing the job all out of proportion and it won't be
that big a deal, with little new code and mostly deleted old backend code?
Still, for an SQLite port, an advance to 0.20 (or 0.30, if we are already at
0.2x by then) at least might be appropriate?
BTW, is that an immediately approaching job or still some time out? My
feeling from the list is it should be done /fairly/ soon, as there are
several new features waiting behind it for implementation.
--
Duncan - List replies preferred.
"They that can give up essential liberty to obtain a little
temporary safety, deserve neither liberty nor safety."
Benjamin Franklin