From: | Steven J. DeRose |
Subject: | [Mifluz-dev] Question re. word data |
Date: | Sun, 27 Jan 2002 15:40:46 -0500 |
My goal would be to make mifluz able to run containment and structure queries on hierarchical data, particularly mbox files and XML. By storing a "word" record with start and end offsets for each XML element, or each MIME mail message (and maybe each MIME header line), mifluz could find words only when they occur in a certain context.
A quick look suggests this would mainly involve adding a new record type besides DATA and STRING, and adding the APIs needed to insert, search for, and retrieve them.
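To make the containment idea concrete, here is a rough sketch in Python. It is not mifluz code and uses no mifluz API: the `Span` class, the `words_in_context` function, and the sample offsets are all hypothetical, and plain in-memory lists stand in for whatever the index would actually store. It assumes word offsets are kept sorted and that a span covers the half-open range from `start` up to but not including `end`.

```python
from bisect import bisect_left
from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    """Hypothetical structure record: a named region of the document."""
    name: str   # e.g. an XML element name or "message" / "subject"
    start: int  # offset of the first character in the region
    end: int    # offset just past the last character (exclusive)


def words_in_context(word_offsets, spans, context):
    """Return the word offsets that fall inside any span named `context`.

    `word_offsets` must be sorted in ascending order.
    """
    hits = set()
    for span in spans:
        if span.name != context:
            continue
        # Binary search for the slice of offsets within [start, end).
        lo = bisect_left(word_offsets, span.start)
        hi = bisect_left(word_offsets, span.end)
        hits.update(word_offsets[lo:hi])
    return sorted(hits)


# Made-up sample: two mail messages, each with a subject header line.
spans = [
    Span("message", 0, 100),
    Span("subject", 10, 30),
    Span("message", 100, 200),
    Span("subject", 110, 130),
]

# Made-up offsets at which one particular word occurs.
offsets = [15, 60, 150]

in_subject = words_in_context(offsets, spans, "subject")
in_message = words_in_context(offsets, spans, "message")
```

With this sample, only the occurrence at offset 15 lies inside a subject line, while all three occurrences lie inside some message. A real index would presumably join two sorted posting lists rather than loop over every span, but the comparison of offsets is the same.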
I've built systems like this before that scaled into the 100s of MB per document, so I know most of the general constraints, but I don't know anything about the internals of Mifluz (yet). Does this sound like a feasible enhancement and approach?
Any advice appreciated. Thanks! -- Steve DeRose