Re: [Pan-devel] newsrc with DB
From: Jeffrey Stedfast
Subject: Re: [Pan-devel] newsrc with DB
Date: Thu, 10 Jun 2004 09:16:15 -0400
On Thu, 2004-06-10 at 03:08, K. Haley wrote:
> Jeffrey Stedfast wrote:
>
> >I'm not much of a database guy...but...
> >
> >1. is a database really necessary?
> >
> >"as far as I've seen, once those database worms eat into your brain,
> >every thumb looks like a nail" -- jwz
> >
> >seriously tho. one should definitely read jwz's document on summary
> >files.
> >
> >Evolution uses jwz's approach to summary files and it is EXTREMELY
> >scalable. I have multi-gigabyte mbox files in Evolution right now and
> >you wouldn't know it based on load time. Heck, based on load time you
> >might expect my folders are 5 or 6 messages tops :-)
> >
> >
> How many messages in one of those multi-gig mbox files?
~135,000 messages
> We're looking
> at handling >1 million articles per group.
ah.
> I just read the article. It
> sounds like Pan's current implementation, with summary files for each group
> on each server. If I understand the code correctly, Pan also loads the
> summary file into memory as jwz suggested. There are two problems with
> this.
>
> 1. With such a large article count the summary will be >100MB. In
> groups with long subject lines, like binary groups, expect it to be
> >200MB. Several users have seen memory usage well above that. The
> only effective solution here is to load the data only when it is needed.
fair enough.
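
A minimal sketch of "load the data only when it is needed", assuming the summaries live in an SQLite table. The `article` table, its columns, and the `open_store` and `fetch_page` helpers are hypothetical names made up for illustration, not Pan's actual schema or code. The idea is to pull one page of summaries at a time, keyed on the article number, rather than reading the whole summary into memory.

```python
import sqlite3


def open_store(path=":memory:"):
    """Open the database and create a hypothetical article-summary table."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS article ("
        " grp TEXT NOT NULL,"       # newsgroup name
        " num INTEGER NOT NULL,"    # article number within the group
        " subject TEXT NOT NULL,"
        " PRIMARY KEY (grp, num))"
    )
    return db


def fetch_page(db, grp, after_num, page_size=100):
    """Return up to page_size (num, subject) rows with num > after_num."""
    cur = db.execute(
        "SELECT num, subject FROM article"
        " WHERE grp = ? AND num > ?"
        " ORDER BY num LIMIT ?",
        (grp, after_num, page_size),
    )
    return cur.fetchall()
```

The caller passes the last article number it has seen as `after_num` to get the next page, so memory use is bounded by the page size rather than by the size of the group.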
>
> 2. What we really want is to combine the group lists for several
> servers. Say you have two news servers. You open a group that is on
> both servers. Pan shows the combined article list from both servers.
>
> Solving both problems would basically mean writing a mini DB engine, so we might
> as well use a small fast DB like SQLite. His suggestion about doing
> lazy updates was a good one though. Mark an article read, queue it,
> start an update thread that waits for 30 sec before doing anything,
> mark more articles read and add them to the queue. When the thread runs
> it updates all the articles in the queue.
ok.
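
The lazy-update scheme described above could be sketched roughly like this. It assumes an SQLite table `article` with `id` and `is_read` columns; the table, the column names, and the `LazyReadMarker` class are all hypothetical. A timer thread stands in for the "update thread that waits for 30 sec", and the database connection is opened inside `flush` because a `sqlite3` connection should not be shared across threads by default.

```python
import sqlite3
import threading


class LazyReadMarker:
    """Queue 'mark read' requests and write them in one batch after a delay."""

    def __init__(self, db_path, delay=30.0):
        self.db_path = db_path
        self.delay = delay
        self._pending = set()
        self._lock = threading.Lock()
        self._timer = None

    def mark_read(self, article_id):
        """Queue an article; start the delayed writer if none is waiting."""
        with self._lock:
            self._pending.add(article_id)
            if self._timer is None:
                self._timer = threading.Timer(self.delay, self.flush)
                self._timer.daemon = True
                self._timer.start()

    def flush(self):
        """Write every queued article in one transaction; return the count."""
        with self._lock:
            batch = sorted(self._pending)
            self._pending.clear()
            if self._timer is not None:
                self._timer.cancel()  # harmless if the timer already fired
                self._timer = None
        if not batch:
            return 0
        db = sqlite3.connect(self.db_path)
        try:
            with db:  # commits on success, rolls back on error
                db.executemany(
                    "UPDATE article SET is_read = 1 WHERE id = ?",
                    [(article_id,) for article_id in batch],
                )
        finally:
            db.close()
        return len(batch)
```

Articles marked read while the timer is waiting simply join the same batch. A real client would also want to call `flush()` on shutdown so queued changes are not lost.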
>
> >2. if you use a database with a table that contains the message-id's of
> >the articles, PLEASE don't store the message-id as <address@hidden>, store it
> >as address@hidden - I say this for several reasons:
> >
> >
> That's good to know.
>
> >ok, exception to #2 is that using the first 8 bytes of the md5sum'd
> >(canonical) message-id might be just as good. plus it saves a ton more
> >memory :-)
> >
> >
> Good idea, although we might need to use the full 16 bytes just to be
> safe. Since the article table will hold all the article summaries for
> all groups there will be more than a few users with several MILLION
> entries in it.
yes, with millions of messages - might need the entire md5sum.
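
A minimal sketch of the hashed key, assuming "canonical" simply means the message-id with surrounding whitespace and angle brackets stripped. The function names and the `nbytes` parameter are illustrative only.

```python
import hashlib


def canonical_message_id(raw):
    """Strip surrounding whitespace and angle brackets from a message-id."""
    mid = raw.strip()
    if mid.startswith("<") and mid.endswith(">"):
        mid = mid[1:-1]
    return mid


def message_id_key(raw, nbytes=16):
    """Return the first nbytes of the MD5 digest of the canonical message-id."""
    if not 1 <= nbytes <= 16:
        raise ValueError("an MD5 digest is 16 bytes long")
    digest = hashlib.md5(canonical_message_id(raw).encode("utf-8")).digest()
    return digest[:nbytes]
```

With `nbytes=8` the key fits in a 64-bit integer column. By the usual birthday approximation, and assuming the digest bytes behave like uniform random values, 10 million entries give a collision chance of roughly n^2 / 2^65, or about 3 in a million. The full 16 bytes makes that negligible at the cost of twice the key size.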
Jeff
--
Jeffrey Stedfast <address@hidden>