[UPHPU] Database Question
cole at colejoplin.com
cole at colejoplin.com
Wed Aug 16 01:24:50 MDT 2006
Hi Everyone,
As far as the vanilla items, they should all have a single record,
which probably has a corresponding product number. You won't gain
anything extra by splitting it up. In fact, you'll always have one
more record per item than you would have had. That's a net loss.
After redesigning more databases than I want to remember, and reading
the comments, I'll share my experiences with databases small and large.
1) The goal is speed, room for growth, reasonable costs, and low
impact on applications that access it. There is only one way to get
all four, which is:
2) Normalization - the most important thing you will ever need to know
about database design. Sadly, proper normalization is the exception
rather than the rule, and the inevitable result is bloat and sloth. If
you don't normalize, you will pay for it later in a slow, painful, and
expensive fashion. Best reason to normalize? MONEY! Sometimes big
money. Without normalization, you will eventually fail in all four
goals.
3) I also feel that you must go to 5th normal form, instead of the 3rd
shorcut. I don't think it's optional. 5th form just means don't orphan
data, which is easy to accept. 4th is critical, and here's why.
It is possible to create a many-to-many relational table, and
technically stay within the first three forms. That's bad! If you
don't see a lot of blank fields, you may not recognize the problem,
and it eventually exposes itself as data errors in your apps. Then
you're in serious trouble trying to set it right. So, always split a
many-to-many into two one-to-manys, with a shared junction table, no
exceptions. That's the core of what 4th is all about. If you don't
know why and how to use a junction table, then you do not know
database design.
4) To get speed, you must use indexes. Your SQL statements will show
you what fields or multiple fields you should index. The key to speed
is NOT using more simple queries. First, a properly indexed join on
two tables will always beat two table queries. I'm not saying only
have one SQL statement! I'm saying that SQL joins have an appropriate
usage, and you should understand why and when.
Second, an organized recordset from a join is infinitely easier to
program, loop and parse, rather than php hopping around two recordsets
or some multi-dimensional array with counters. I'm positive you'll
notice the difference in lines of php complexity. Why work so hard?
I'll take one intense SQL statement over the extra hundred lines of
code any day.
5) Sometimes it doesn't make sense to run through large tables, and a
summary table is the way to go. This is all about speed, where
normalization-only doesn't make sense. Here's an example, you want to
do a report on the last decade of year's sales numbers for a bunch of
categories. You can read all those detail records and get the results
- or - have a table that stores the results that are likely to be
called for frequently.
A difference of reading 20 records or 20 million affects speed. Some
purists may say that's not strict normalization, but they wouldn't
deny that the performance gains for a few extra tables is worth it. It
all comes down to the fastest result for the end-user, which includes
connection time, server processing power, load and memory.
6) Always ask yourself the question, "Will this ever potentially grow
in the future?" If you're doing this for a business, the default
answer is yes. So plan accordingly. I'm working on a side project
right now that was not normalized, where no one thought about the
future, just the deadline in front of them. Not a single junction
table, not a single index, and tons of bugs. And my bill to make it
expandable and stable again....oh that's sweet.
Wait a minute! What am I doing? Forget everthing I just wrote. Don't
normalize, just slap it in there any way you want. Call me later when
things go bad. ;)
-- Cole
Quoting Jacob Wright <jacwright at gmail.com>:
> I would like to add my two cents worth on this topic. Database design, like
> anything else, is learning the rules, knowing the benefits of each rule, and
> knowing when and why to break the rules. Generally you want to start off by
> designing the database as normalized as possible. Then you need to go
> through and un-normalize things where you don't need (and are confident that
> you won't in the future) the flexability. This will provide optimal
> performance and smaller queries.
>
> Of course, as stated earlier, you need to learn what normal form is in the
> first place.
>
> I'm just all against the "that's the rules, you have to do it that way"
> mantra. It makes slow databases, bloated object-oriented code, and boring
> music. Gotta know the rules, and then how to break them. :)
>
> Jacob Wright
>
> On 8/14/06, Walt Haas <haas at xmission.com> wrote:
>>
>> Jonathan Duncan <jonathan at jkdwebmagic.com> wrote:
>>
>>> On Mon, 14 Aug 2006, Webot Graphics wrote:
>>>
>>> >> You should read up on first, second, and third level
>> normalization. If yo
>>> u
>>> >> plan on creating databases at all, you should do yourself the favor
>> of
>>> >> understanding what normalization is.
>>>
>>> I would also highly suggest this book:
>>> Database Design for Mere Mortals
>>> http://www.amazon.com/gp/product/0201752840
>>
>> Another good clear introductory book is "The Practical SQL Handbook"
>>
>> http://www.amazon.com/gp/product/0201703092
>>
>> -- Walt
>> -------
>> Walt Haas The Web Site Doctor - Cures Sick Web Sites
>> (801) 534-1262 http://thewebsitedoctor.net
>>
>> _______________________________________________
>>
>> UPHPU mailing list
>> UPHPU at uphpu.org
>> http://uphpu.org/mailman/listinfo/uphpu
>> IRC: #uphpu on irc.freenode.net
>>
>
> _______________________________________________
>
> UPHPU mailing list
> UPHPU at uphpu.org
> http://uphpu.org/mailman/listinfo/uphpu
> IRC: #uphpu on irc.freenode.net
More information about the UPHPU
mailing list