Re: [Mifluz-dev] Question on API
From: Geoff Hutchison
Subject: Re: [Mifluz-dev] Question on API
Date: Sun, 17 Feb 2002 22:43:54 -0600
On Sunday, February 17, 2002, at 04:34 PM, Brian Aker wrote:
> 1) Has an API that allows me to create an initial index from the API.
> 2) Allows me to insert a blob of text with a unique keyword.
> 4) Allows me to delete entries (and replace).
Of course.
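
To make that concrete, here's a rough sketch of the operations you're
describing. To be clear, this is not the Mifluz API; it's just an
in-memory stand-in (a word-to-key map plus a key-to-word map, with
hypothetical names) that shows the insert, delete, and replace cycle.
Creating the initial index is just constructing the object, or opening
the on-disk file in the real thing.

// Toy model only, NOT the Mifluz API.  Maps each word of a blob to the
// caller's unique document key, and remembers which words a key contributed
// so the entry can be deleted or replaced later.
#include <map>
#include <set>
#include <sstream>
#include <string>

class ToyIndex {
public:
    // item 2: insert a blob of text under a unique keyword (document key)
    void insert(const std::string& key, const std::string& blob) {
        remove(key);                     // item 4: "replace" is delete + insert
        std::istringstream words(blob);
        std::string w;
        while (words >> w) {
            postings_[w].insert(key);
            words_of_[key].insert(w);
        }
    }

    // item 4: delete every posting that mentions this key
    void remove(const std::string& key) {
        auto it = words_of_.find(key);
        if (it == words_of_.end()) return;
        for (const std::string& w : it->second)
            postings_[w].erase(key);
        words_of_.erase(it);
    }

private:
    std::map<std::string, std::set<std::string>> postings_; // word -> doc keys
    std::map<std::string, std::set<std::string>> words_of_; // doc key -> words
};

Mifluz keeps its postings in an on-disk B-tree rather than in memory, but
the shape of the operations is the same.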
> 5) Needs to be thread-safe and not leak memory.
Loic may know better than I how rigorously this has been tested, but
yes, this should be fine.
> 7) Needs to be able to search an index that was created with 1 gig of
> text in under 3 seconds.
No offense, but as stated this is a bit nonsensical. Granted, I'll assume
you're going to back things up with reliable, fast hardware. But there's a
great deal of difference between, say, 1 billion keys with a few bytes of
record attached and 1 million keys with a few K of record each. Things
generally scale with the number of keys more than anything else. Even so,
unless you're returning a lot of query hits and need to do significant work
before presentation, 3 seconds is a lot of CPU time.
> 6) It would be great if I could restrict a search to a certain set of
> unique keywords (aka the keys representing the text blobs).
Not a problem. Consider, for example, the substring or prefix "fuzzy match"
algorithms used by ht://Dig.
> 3) Allows me to pass it a query string and have it return the unique
> keyword that the text was identified with when it was inserted (and it
> would be really nice if it gave back some sort of number representing
> the matched value).
I'm not quite sure I follow. This sounds like you want the query to
match the blob (i.e. the record) and return the key? Normally you'd use
the keywords to retrieve the blob. Or am I misunderstanding you?
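
In other words, the usual inverted-index picture is: the index maps words
to document keys, a query looks up its words, and what comes back is the
set of matching keys plus some score. Here's a minimal sketch of that,
again on the toy model rather than Mifluz itself, with the score simply
being the number of query words each key matched.

// Toy model, not Mifluz: run a query and return (document key, score) pairs,
// where the score is how many of the query words that key matched.
#include <algorithm>
#include <map>
#include <set>
#include <string>
#include <utility>
#include <vector>

using Postings = std::map<std::string, std::set<std::string>>;

std::vector<std::pair<std::string, int>>
query(const Postings& postings, const std::vector<std::string>& words) {
    std::map<std::string, int> score;          // doc key -> matched word count
    for (const std::string& w : words) {
        auto it = postings.find(w);
        if (it == postings.end())
            continue;
        for (const std::string& key : it->second)
            ++score[key];
    }
    std::vector<std::pair<std::string, int>> hits(score.begin(), score.end());
    std::sort(hits.begin(), hits.end(),
              [](const std::pair<std::string, int>& a,
                 const std::pair<std::string, int>& b) {
                  return a.second > b.second;
              });
    return hits;
}

If that's the picture you have in mind, then getting back the key plus a
match score for each hit is the normal case; the blobs themselves are only
fetched afterwards, by key.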
--
-Geoff Hutchison
Williams Students Online
http://wso.williams.edu/