How "Search Titles" works


[post:508#4978]
Rebecca

04/04/2011 02:48 PM

Reviews: 23
Posts: 773

It's currently an ugly hand-rolled set of SQL queries. Someday I would like to look into moving to a proper search engine; there are a lot of good options these days.

First, it splits your query into words. It does understand quotes, so for instance:

this "is a" test

Counts as three words. The quotes are not used in the search.
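
For illustration only, here is a minimal Python sketch of that splitting step. It is not the site's actual code (which is SQL-based); the function name split_query and the regular expression are assumptions made for the example.

```python
import re

def split_query(query):
    """Split a query into words, keeping double-quoted phrases together."""
    words = []
    # Each match is either a double-quoted phrase (group 1)
    # or a run of non-space characters (group 2).
    for quoted, bare in re.findall(r'"([^"]*)"|(\S+)', query):
        word = quoted or bare
        if word:  # skip empty quotes such as ""
            words.append(word)
    return words

print(split_query('this "is a" test'))
```

The quote characters themselves are dropped, so only the text inside them is passed on to the search.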

For each word, it does a substring match against the title and the synopsis. If the title matches, it assigns a score of 20; if the synopsis matches, it assigns a score of 10. If both match, it assigns a score of 30. It then sorts all matching results by score, followed by title.
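
A rough Python sketch of this scoring, for illustration only. The real implementation is a set of SQL queries; the function names, the case-insensitive comparison, the summing of per-word scores, the requirement that every word match somewhere, and the sample entries are all assumptions made for the example.

```python
def score_entry(words, title, synopsis):
    """Return a relevance score, or 0 if some word matches neither field."""
    title_l, synopsis_l = title.lower(), synopsis.lower()
    total = 0
    for word in words:
        w = word.lower()
        # 20 for a title match, 10 for a synopsis match, 30 for both.
        score = (20 if w in title_l else 0) + (10 if w in synopsis_l else 0)
        if score == 0:
            return 0  # assumption: every word has to match somewhere
        total += score  # assumption: per-word scores are added up
    return total

def search(words, entries):
    """entries is a list of (title, synopsis) pairs.

    Returns (score, title) pairs, highest score first, ties broken by title.
    """
    scored = [(score_entry(words, t, s), t) for t, s in entries]
    hits = [(sc, t) for sc, t in scored if sc > 0]
    return sorted(hits, key=lambda h: (-h[0], h[1]))

# Made-up entries, only to show the ordering.
entries = [
    ("Inn Story", "Iroha moves in."),
    ("Hana-Saku Iroha", "A girl works at an inn."),
    ("Other", "Nothing relevant."),
]
print(search(["iroha"], entries))
```

The title match on "Hana-Saku Iroha" outranks the synopsis-only match on "Inn Story", and the third entry is dropped because it matches nothing.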

Now, you may be wondering how multiple titles fit into this. For the purposes of matching, it treats each title as if it were its own series. So a single series may show up in the search results once for every title if, for instance, there was a match in the synopsis.
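
To make the multiple-title behaviour concrete, here is a small Python sketch. The data layout (a dict with "titles" and "synopsis" keys) and the function name expand_titles are hypothetical, not the site's schema.

```python
def expand_titles(series_list):
    """Turn each series into one searchable row per title.

    Every row carries the same synopsis, so a synopsis match
    produces one result for each title of that series.
    """
    rows = []
    for series in series_list:
        for title in series["titles"]:
            rows.append((title, series["synopsis"]))
    return rows

# Made-up example: one series with two titles becomes two rows.
rows = expand_titles([
    {"titles": ["Title A", "Title B"], "synopsis": "Shared synopsis."},
])
print(rows)
```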

Currently nothing else is searched, so no keywords, notables, reviews, or links.

If we can find ways of making these results more relevant using the existing structure, I'm absolutely open to those. Things that require major changes are less likely to get the time they would need.

Edited on 04/04/2011 02:50 PM.

[post:508#4979]
Devil Doll

04/04/2011 04:42 PM

Reviews: 365
Posts: 1574

I'm not sure what exactly Ggultra2764 was searching for, but I guess it was "Hanasaku", which didn't get a match in "Hana-Saku", and I have no idea how to reasonably get one there. (If it were the opposite, i.e. searching for "Hana-Saku" and getting zero results, then it might be an idea to automatically split up the search term "Hana-Saku" into "Hana" and "Saku", i.e. use all non-letter characters as potential separators, and then try the query once more.) Had he tried "Iroha" he would have found the existing entry, so using the longer term instead must have been bad luck.

Searching for hana saku iroha results in just one anime hit (under four titles) whereas searching for hana saku returns many hits, so multiple search terms are combined with an "AND" operator, right? Knowing this, adding too many search terms can be counterproductive. In this case, would it make sense to handle queries with zero results (these are the critical ones for both visitors and creators of new anime entries) by at least offering a hint to use fewer terms and/or substrings in order to get at least some results? (I'm not asking for automatically generating subsets of search terms and trying each of these, as the result might then be confusing for the user in some cases, even though this would be closer to Google's "Did you mean... ?". After all, how would your routine know where exactly to split up "Hanasaku" into two terms?)

Not searching keywords and notables is plausible; searching links might be an idea but would probably only lead to more results within the same franchise, so it should get a low scoring factor, if any. Searching reviews would return many of my reviews for other anime, as they tend to link to 'similar' anime, so I'd rather not do this. In general I like the scoring system as it is.

[post:508#4984]
Rebecca

04/05/2011 04:13 PM

Reviews: 23
Posts: 773

That's a good idea-- something I'll look into.

[post:508#4985]
Rebecca

04/07/2011 12:33 AM

Reviews: 23
Posts: 773

So I did tweak how search works. Two changes.

First, as Devil Doll suggested, if a regular search produces zero results then it will do an OR-based search instead. Not ideal, but definitely better than matching nothing. And it seems to produce basically the right results.
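
The fallback could be sketched in Python roughly as follows. This is an illustration under assumptions, not the actual SQL: the names matches and search_with_fallback, the case-insensitive matching, and the sample entries are all made up for the example, and scoring is left out.

```python
def matches(word, title, synopsis):
    """True if the word is a substring of the title or the synopsis."""
    w = word.lower()
    return w in title.lower() or w in synopsis.lower()

def search_with_fallback(words, entries):
    """Try an AND search first; if it finds nothing, retry as an OR search.

    entries is a list of (title, synopsis) pairs.
    """
    strict = [e for e in entries if all(matches(w, *e) for w in words)]
    if strict:
        return strict
    return [e for e in entries if any(matches(w, *e) for w in words)]

# Made-up entries with empty synopses.
entries = [("Hana-Saku Iroha", ""), ("Hana Yori Dango", "")]

# "hanasaku" matches nothing, so the AND search is empty
# and the OR search still finds the entry via "iroha".
print(search_with_fallback(["hanasaku", "iroha"], entries))
```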

And second, I made it treat hyphens as a word separator. So searching for «hana-saku» is the same thing as searching for «hana saku». If there are any other characters you think I should treat this way, let me know. I could just do all punctuation, but then you can't search for titles that have punctuation in their names.
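
As a tiny Python illustration of the hyphen rule (the function name query_words is hypothetical, and quote handling is ignored here):

```python
def query_words(query):
    """Treat hyphens like spaces, then split on whitespace."""
    return query.replace("-", " ").split()

# Both spellings end up as the same list of words.
print(query_words("hana-saku") == query_words("hana saku"))
```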

[post:508#4986]
Devil Doll

04/07/2011 12:16 PM

Reviews: 365
Posts: 1574

Would it make sense to make that auto-replacing of punctuation characters dependent on whether the query term is embedded in quotation marks?
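
One way this suggestion could look, sketched in Python purely as an illustration: hyphens act as separators in unquoted terms but are kept inside quoted ones. The function name split_query and the regular expression are assumptions, not existing site code.

```python
import re

def split_query(query):
    """Split a query into words.

    Unquoted terms are additionally split on hyphens;
    double-quoted terms keep their punctuation as typed.
    """
    words = []
    for quoted, bare in re.findall(r'"([^"]*)"|(\S+)', query):
        if quoted:
            words.append(quoted)
        else:
            # Drop empty pieces from leading, trailing, or doubled hyphens.
            words.extend(w for w in bare.split("-") if w)
    return words

print(split_query('hana-saku "spider-man"'))
```

With this rule, a user who wants a literal hyphen in the match can get it by quoting the term.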

[post:508#5000]
Rebecca

04/14/2011 10:24 AM

Reviews: 23
Posts: 773

Yeah, that is a good point...


Community Anime Reviews

anime mikomi org