[Mifluz-dev] Question re. word data

From:	Steven J. DeRose
Subject:	[Mifluz-dev] Question re. word data
Date:	Sun, 27 Jan 2002 15:40:46 -0500

I've been looking at the MiFluz doc, and it looks quite nice. I'm wondering what it would take to enhance it to support storing the scope or extent of a token, rather than just a single integer.

The goal for me would be to make it able to do containment and structure queries on hierarchical data, particularly mbox files and XML. By storing a "word" record with start and end offset for each XML element, or each MIME mail message (and maybe each MIME header line), mifluz could find words only when they occur in a certain context.
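
As a rough illustration of the idea (this is not Mifluz code; the Extent record, the words_in_context function, and the toy offsets are all hypothetical), a containment query could look something like the sketch below. It assumes each word occurrence is stored as a single integer offset and each element or header is stored as a named extent with a start and end offset.

```python
from bisect import bisect_left
from dataclasses import dataclass


@dataclass(frozen=True)
class Extent:
    """Hypothetical extent record: a named span of token offsets."""
    name: str   # element or header name, e.g. "subject"
    start: int  # first offset covered (inclusive)
    end: int    # one past the last offset covered (exclusive)


def words_in_context(word_positions, extents, context_name):
    """Return the word positions that fall inside any extent with the given name."""
    positions = sorted(word_positions)
    hits = set()
    for ext in extents:
        if ext.name != context_name:
            continue
        # Binary search for the slice of positions in [start, end).
        lo = bisect_left(positions, ext.start)
        hi = bisect_left(positions, ext.end)
        hits.update(positions[lo:hi])
    return sorted(hits)


# Toy index: two mail messages, each with a subject header line.
extents = [
    Extent("message", 0, 100),
    Extent("subject", 5, 12),
    Extent("body", 20, 100),
    Extent("message", 100, 200),
    Extent("subject", 104, 110),
]

# Offsets at which some word occurs (made-up data).
word_positions = [7, 45, 150]

subject_hits = words_in_context(word_positions, extents, "subject")
body_hits = words_in_context(word_positions, extents, "body")
```

Here only the occurrence at offset 7 lies inside a "subject" extent, so a search restricted to subject lines would return that one hit and ignore the others.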

A quick look suggests this would mainly involve adding a new record type besides DATA and STRING, and adding the necessary APIs to get them in, found, and out.

I've built systems like this before that scaled into the 100s of MB per document, so I know most of the general constraints, but I don't know anything about the internals of Mifluz (yet). Does this sound like a feasible enhancement and approach?


Any advice appreciated.

Thanks!

--
Steve DeRose

[Mifluz-dev] Question re. word data, Steven J. DeRose <=
- Re: [Mifluz-dev] Question re. word data, Søren Raunsbæk Jørgensen, 2002/01/28
