Re: [Pan-devel] More database thoughts

pan-devel

From:	K. Haley
Subject:	Re: [Pan-devel] More database thoughts
Date:	Sun, 20 Jun 2004 00:00:41 -0600
User-agent:	Mozilla Thunderbird 0.7 (Windows/20040616)

Let's see if it gets posted this time.

Tom wrote:

I like Calin's incremental approach. If you add the DB stuff to what's
already there, it makes it easy to cross-check for debugging purposes.
Also, unless you're confident that SQLite will never corrupt the DB,
having the files as a backup is an insurance factor. In any case, adding
a "(re)build DB" menu option someplace might be a good starting point.

Pan currently uses one directory for each server. These directories
contain files for each group for which you've downloaded headers,
containing the article info for that group. This would seem to make
cross-checking complicated. As for corruption, Pan's current setup gets
corrupted occasionally as is. The only real solution here is to use more
than one DB file. The first would hold the server, group, and
group-server tables. The article and article-server stuff could be in
one or more additional tables. It's an interesting trade-off.
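
To make the multi-file idea concrete, here is a minimal sketch using Python's sqlite3 module rather than Pan's own C code. The file names (core.db, articles.db) and every table and column name are assumptions for illustration only, not anything Pan defines. It uses SQLite's ATTACH DATABASE so that one connection sees both files, and shows that losing the article file leaves the server and group data intact.

```python
import os
import sqlite3
import tempfile


def open_split_db(directory):
    """Open a core DB file and attach a separate article DB file.

    File, table, and column names are hypothetical.
    """
    conn = sqlite3.connect(os.path.join(directory, "core.db"))
    # ATTACH makes a second database file visible under the schema name "articles".
    conn.execute(
        "ATTACH DATABASE ? AS articles",
        (os.path.join(directory, "articles.db"),),
    )
    conn.executescript(
        """
        CREATE TABLE IF NOT EXISTS main.server (
            id INTEGER PRIMARY KEY, host TEXT UNIQUE);
        CREATE TABLE IF NOT EXISTS main.newsgroup (
            id INTEGER PRIMARY KEY, name TEXT UNIQUE);
        CREATE TABLE IF NOT EXISTS main.group_server (
            group_id INTEGER, server_id INTEGER,
            PRIMARY KEY (group_id, server_id));
        CREATE TABLE IF NOT EXISTS articles.article (
            id INTEGER PRIMARY KEY, message_id TEXT UNIQUE,
            subject TEXT, is_read INTEGER DEFAULT 0);
        CREATE TABLE IF NOT EXISTS articles.article_server (
            article_id INTEGER, server_id INTEGER,
            group_id INTEGER, number INTEGER);
        """
    )
    return conn


def demo_lost_article_file():
    """Return (server_count, article_count) after the article file is lost."""
    with tempfile.TemporaryDirectory() as d:
        conn = open_split_db(d)
        conn.execute("INSERT INTO server (host) VALUES ('news.example.org')")
        conn.execute(
            "INSERT INTO article (message_id, subject) VALUES (?, ?)",
            ("<1@example>", "hello"),
        )
        conn.commit()
        conn.close()

        # Simulate a corrupted article file that the user simply deletes.
        os.remove(os.path.join(d, "articles.db"))

        conn = open_split_db(d)
        servers = conn.execute("SELECT count(*) FROM server").fetchone()[0]
        articles = conn.execute("SELECT count(*) FROM article").fetchone()[0]
        conn.close()
        return servers, articles


if __name__ == "__main__":
    print(demo_lost_article_file())
```

One caveat with this layout: SQLite does not enforce foreign keys across attached files, so the links from article_server back to server and newsgroup would have to be checked by the application.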

If stored in one table then article status would be tracked for
cross-posts, however all article data is lost if the file is corrupted.

If stored in one table per group then only that group's data would be
lost and the user could nuke the file if it's not wanted, however
article status would not be tracked for cross-posts. It would also be
more difficult to implement.
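
A small sketch of the single-table side of this trade-off, again in Python's sqlite3 and with hypothetical table and column names (article, article_group, is_read). Because a cross-posted article is stored once and only linked to its groups, marking it read in one place is visible from every group it was posted to. With one table per group, the same article would exist as separate rows and the status would have to be synchronised by hand.

```python
import sqlite3


def make_shared_store():
    """Create an in-memory DB with one article table shared by all groups."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE article (
            id INTEGER PRIMARY KEY,
            message_id TEXT UNIQUE NOT NULL,
            is_read INTEGER NOT NULL DEFAULT 0);
        CREATE TABLE article_group (
            article_id INTEGER NOT NULL REFERENCES article(id),
            group_name TEXT NOT NULL,
            PRIMARY KEY (article_id, group_name));
        """
    )
    return conn


def add_article(conn, message_id, groups):
    """Store the article once and link it to every group it was posted to."""
    conn.execute(
        "INSERT OR IGNORE INTO article (message_id) VALUES (?)", (message_id,)
    )
    (article_id,) = conn.execute(
        "SELECT id FROM article WHERE message_id = ?", (message_id,)
    ).fetchone()
    conn.executemany(
        "INSERT OR IGNORE INTO article_group VALUES (?, ?)",
        [(article_id, g) for g in groups],
    )
    return article_id


def mark_read(conn, message_id):
    """Mark the single shared row as read."""
    conn.execute(
        "UPDATE article SET is_read = 1 WHERE message_id = ?", (message_id,)
    )


def unread_in_group(conn, group_name):
    """List unread message IDs for one group, sorted for stable output."""
    rows = conn.execute(
        """
        SELECT a.message_id
        FROM article AS a
        JOIN article_group AS ag ON ag.article_id = a.id
        WHERE ag.group_name = ? AND a.is_read = 0
        ORDER BY a.message_id
        """,
        (group_name,),
    )
    return [row[0] for row in rows]
```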

It looks to me that the Article structure is a big memory user. The
thing is, you rarely if ever display anything other than part 0 or 1, so
to me it makes sense to only keep the part 0/1 in memory, and retrieve
from DB/display the others on an as-needed basis. Correct me if I'm
wrong, but I suspect that few (text or binary) groups would have more
than 100,000 "unique" subjects (part 0/1). It seems to me that trying to
truncate the (xx/yy) from the subject string would be a small saving by
comparison.

My idea is to extend the duplicate checking to include authors. This
would offer more space savings in most groups. Whether or not the
subject gets truncated is another matter. The same table that holds the
subjects will hold the authors as well. No need for an additional
authors table since both are used for finding duplicates.


TABLE duplicates
text
ref_cnt
id

Article
subject    duplicates:id
author     duplicates:id
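
The outline above could be fleshed out roughly like this. It is only a sketch in Python's sqlite3: the column types, the UNIQUE constraint on text, and the helper names (intern_text, release_text, add_article, remove_article) are my assumptions. Each distinct subject or author string is stored once in duplicates, and ref_cnt says how many article fields point at it, so a row can be dropped when the count reaches zero.

```python
import sqlite3

SCHEMA = """
CREATE TABLE duplicates (
    id INTEGER PRIMARY KEY,
    text TEXT UNIQUE NOT NULL,
    ref_cnt INTEGER NOT NULL DEFAULT 0);
CREATE TABLE article (
    id INTEGER PRIMARY KEY,
    subject INTEGER NOT NULL REFERENCES duplicates(id),
    author INTEGER NOT NULL REFERENCES duplicates(id));
"""


def intern_text(conn, text):
    """Return the duplicates id for text, creating the row if needed."""
    conn.execute("INSERT OR IGNORE INTO duplicates (text) VALUES (?)", (text,))
    conn.execute(
        "UPDATE duplicates SET ref_cnt = ref_cnt + 1 WHERE text = ?", (text,)
    )
    row = conn.execute(
        "SELECT id FROM duplicates WHERE text = ?", (text,)
    ).fetchone()
    return row[0]


def release_text(conn, dup_id):
    """Drop one reference and delete the string once nothing uses it."""
    conn.execute(
        "UPDATE duplicates SET ref_cnt = ref_cnt - 1 WHERE id = ?", (dup_id,)
    )
    conn.execute(
        "DELETE FROM duplicates WHERE id = ? AND ref_cnt <= 0", (dup_id,)
    )


def add_article(conn, subject, author):
    """Insert an article whose subject and author are shared strings."""
    cur = conn.execute(
        "INSERT INTO article (subject, author) VALUES (?, ?)",
        (intern_text(conn, subject), intern_text(conn, author)),
    )
    return cur.lastrowid


def remove_article(conn, article_id):
    """Delete an article and release both of its string references."""
    row = conn.execute(
        "SELECT subject, author FROM article WHERE id = ?", (article_id,)
    ).fetchone()
    if row is None:
        return
    conn.execute("DELETE FROM article WHERE id = ?", (article_id,))
    for dup_id in row:
        release_text(conn, dup_id)
```

Used on a typical multi-part binary post, every part by the same poster shares one author row, which is where most of the saving would come from.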

As for the Article structure using a lot of memory, all we really need
is a small cache of 100-200 entries for the visible articles.
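
One way such a cache could work, sketched in Python: a least-recently-used cache that holds a bounded number of articles and fetches anything else through a loader callback. The eviction policy, the class name ArticleCache, and the loader parameter are all assumptions on my part. In practice the loader would be a DB query.

```python
from collections import OrderedDict


class ArticleCache:
    """Keep at most `capacity` articles in memory, evicting the least recently used."""

    def __init__(self, loader, capacity=200):
        self._loader = loader          # callable: article_id -> article data
        self._capacity = capacity
        self._entries = OrderedDict()  # oldest entry first, newest last

    def get(self, article_id):
        if article_id in self._entries:
            # Cache hit: mark the entry as most recently used.
            self._entries.move_to_end(article_id)
            return self._entries[article_id]
        # Cache miss: fetch from the backing store and remember the result.
        article = self._loader(article_id)
        self._entries[article_id] = article
        if len(self._entries) > self._capacity:
            # Evict the least recently used entry.
            self._entries.popitem(last=False)
        return article

    def __len__(self):
        return len(self._entries)
```

With a capacity around the number of rows visible in the header pane, scrolling only touches the DB for articles that have just come into view.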

