[UPHPU] Deduping a list
Wayne Jensen
jensenw at gmail.com
Fri Aug 4 18:18:19 MDT 2006
On 7/26/06, Ash <ashovi at qwest.net> wrote:
> I have a list of email addresses and a list of those who have
> unsubscribed. (We get a new list to send to every time from our partner,
> but we have to keep track of those who have unsubscribed from getting
> our emails, and remove them from the new list every time.)
>
> So I have list 1 that has 10,000 email addresses in it and a list of
> unsubscribes that has about 200 email addresses. I want to remove all
> unsubscribes from the main list. Is there an easy way to do it?
>
> Ash
If for some reason you had to (or wanted to) do it all in PHP, I would
recommend taking advantage of PHP's associative arrays, which work like
hash tables.
In other words, instead of something like
    $badEmails[] = $emailAddress;
    ...
    if (in_array($addressToCheck, $badEmails))
I would use something like
    $badEmails[$emailAddress] = 1;
    ...
    if (isset($badEmails[$addressToCheck]))
in_array() loops through your array and compares each item to the one
you're looking for, so each call is O(n). Doing this in a loop makes
the whole job O(mn), where m is the number of addresses you're checking
and n is the number of bad addresses to check against. An isset()
lookup on the hash table takes constant time on average, so the same
loop is roughly O(m) and runs much faster.
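A minimal sketch of the whole job, assuming both lists are already loaded into PHP arrays (the function name `removeUnsubscribed` and the sample addresses are hypothetical). It builds the lookup table once and then checks each address with `isset()`:

```php
<?php
// Hypothetical helper: returns the addresses in $mainList that do not
// appear in $unsubscribes, keeping the original order.
function removeUnsubscribed(array $mainList, array $unsubscribes): array
{
    // Build the hash table once: one pass over the n unsubscribes.
    $badEmails = [];
    foreach ($unsubscribes as $emailAddress) {
        $badEmails[$emailAddress] = 1;
    }

    // Each isset() lookup is constant time on average, so this is one pass
    // over the m addresses instead of m scans of the unsubscribe list.
    $keep = [];
    foreach ($mainList as $addressToCheck) {
        if (!isset($badEmails[$addressToCheck])) {
            $keep[] = $addressToCheck;
        }
    }
    return $keep;
}

$mainList = ['a@example.com', 'b@example.com', 'c@example.com'];
$unsubscribes = ['b@example.com'];
print_r(removeUnsubscribed($mainList, $unsubscribes));
```

Addresses are compared as exact strings here. If the two lists might differ in letter case or stray whitespace, normalizing both sides (e.g. with `strtolower(trim(...))`) before building the table may be worth considering.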
I wrote a small script that does this to dedupe files based on the
value of one or more fields (comm wouldn't work because two lines
could differ while sharing the same field value, e.g. two names for the
same phone number). It's a CLI script and works on fixed-width files,
but it could easily be modified to work on delimited files if anyone
cared to.
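The script itself isn't included in this post, so the following is only a hypothetical sketch of the same idea, not the author's code. It assumes each key field is described by an (offset, length) pair into a fixed-width line, keeps the first line seen for each key, and reuses the associative-array trick above. The field layout in the example is made up:

```php
<?php
// Hypothetical sketch: dedupe fixed-width lines on one or more key fields.
// $fields is a list of [offset, length] pairs locating each key field.
function dedupeFixedWidth(array $lines, array $fields): array
{
    $seen = [];    // hash table of keys already kept
    $unique = [];
    foreach ($lines as $line) {
        // Build a composite key from the selected fields, trimming padding.
        $parts = [];
        foreach ($fields as [$offset, $length]) {
            $parts[] = trim((string) substr($line, $offset, $length));
        }
        // Join with the ASCII unit separator so field values can't run together.
        $key = implode("\x1F", $parts);
        if (!isset($seen[$key])) {
            $seen[$key] = true;
            $unique[] = $line;
        }
    }
    return $unique;
}

// Example layout (hypothetical): name in columns 0-19, phone in 20-29.
$lines = [
    str_pad('Alice Smith', 20) . '8015551234',
    str_pad('A. Smith', 20) . '8015551234',
    str_pad('Bob Jones', 20) . '8015559876',
];
foreach (dedupeFixedWidth($lines, [[20, 10]]) as $line) {
    echo $line, PHP_EOL;
}
```

Deduping on the phone field alone keeps the "Alice Smith" and "Bob Jones" lines; passing `[[0, 20], [20, 10]]` would treat name plus phone as the key and keep all three. For a delimited file, the `substr()` calls could be swapped for indexes into `explode()` or `str_getcsv()` output.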
As for the SQL solutions, why not just have one table with all of the
email addresses in it and a flag for subscribed/unsubscribed? Insert
records into the table; if an email address is already there, the
insert fails (assuming that column is unique). Update the subscribed
flag as needed, then SELECT * FROM yourtable WHERE subscribed=1.
Wouldn't that be faster, with no subselects, etc.? It also makes more
sense to me, personally, to keep all of these contacts in one table and
set a flag than to move them in and out of two different tables. I dunno.
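A rough sketch of the one-table design, assuming PHP's PDO with the SQLite driver and an in-memory database purely so it runs standalone. The table and column names (`contacts`, `email`, `subscribed`) are hypothetical. Instead of letting a duplicate insert raise an error, it uses SQLite's `INSERT OR IGNORE` (MySQL's equivalent is `INSERT IGNORE`), which has the same effect of leaving the existing row alone:

```php
<?php
$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

// PRIMARY KEY makes email unique, so a second insert of the same address
// is rejected (here: silently skipped by INSERT OR IGNORE).
$db->exec('CREATE TABLE contacts (
    email      TEXT PRIMARY KEY,
    subscribed INTEGER NOT NULL DEFAULT 1
)');

// Load the partner's list; existing rows keep their current flag.
$insert = $db->prepare('INSERT OR IGNORE INTO contacts (email) VALUES (?)');
foreach (['a@example.com', 'b@example.com', 'c@example.com'] as $email) {
    $insert->execute([$email]);
}

// Someone unsubscribes: flip the flag instead of moving the row.
$unsub = $db->prepare('UPDATE contacts SET subscribed = 0 WHERE email = ?');
$unsub->execute(['b@example.com']);

// A later list that contains b@example.com again leaves it unsubscribed.
$insert->execute(['b@example.com']);

// The send list is a single query with no subselect.
$mailTo = $db->query('SELECT email FROM contacts WHERE subscribed = 1 ORDER BY email')
             ->fetchAll(PDO::FETCH_COLUMN);
print_r($mailTo);
```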