[UPHPU] keyword substring scoring
Brandon Stout
hplsbyufan at imapmail.org
Fri Jul 14 00:06:10 MDT 2006
Jon Jensen wrote:
> Another thing you may want to do is limit your initial item query to just
> those that have the keywords. This is especially important if you have a lot
> of items and only a couple keywords. A full-text index can come in quite
> handy ("MATCH (title,desc,link) AGAINST ('keyword1 keyword2 ...')"). If your
> database/table/column doesn't support full-text indexes, a simple where
> clause is still better than nothing ("WHERE title LIKE '%keyword1%' OR desc
> LIKE ..."). No sense retrieving rows you are just going to eliminate.
>
> Hope that helps,
>
> Jon
>
> http://dev.mysql.com/doc/refman/5.0/en/fulltext-search.html
Jon,
Thank you for the ideas. Your suggestions were clear yet thorough, and
helpful. Although I'm not finished yet, I'm on my way to completing
it. Before you sent this, I decided to try MySQL's REGEXP function.
I've always avoided it because it seemed very complicated, and regular
expressions are slower than alternatives. However, this seems to
retrieve the rows I need rather quickly:
$keywdsql = "SELECT keyword, pointvalue FROM keywords";
$keywdqry = mysql_query($keywdsql, $mysql);
while ($keywds = mysql_fetch_assoc($keywdqry)) {
global $keys;
// create a space-delimited string of all the keywords
$keys .= $keywds[keyword] . " ";
}
// trim the trailing space off the delimited keyword list
// change from space-delimited to pipe delimited
// pipe-delimited needed for MySQL REGEXP function
// ... unless there's a better way ...
$regexp = str_replace(" ", "|", trim($keys));
$itemsql = "
SELECT `link`, `title`, `desc` FROM `rssitems`
WHERE `title` REGEXP '$regexp'
OR `desc` REGEXP '$regexp'
";
$items = mysql_query($itemsql, $mysql);
// etc...
I plan to score on this narrowed-down result set.
Thank you for the link on fulltext indexes as well. I actually had my
table fulltext indexed on the `desc` column, but didn't know it was any
different than other index types. It seems with fulltext searching, I
may not need to manually score each keyword.
Thank you again,
Brandon Stout
http://mscis.org
http://flfn.org
More information about the UPHPU
mailing list