[UPHPU] Star and Snowflake Schemas
Alvaro Carrasco
alvaro at epliant.com
Tue Feb 13 13:10:57 MST 2007
Richard K Miller wrote:
>
> On Feb 13, 2007, at 1:26 AM, Alvaro Carrasco wrote:
>
>> Daniel C. wrote:
>>> Hi,
>>>
>>> Just wondering if anyone here has experience implementing star or
>>> snowflake schemas in an RDBMS, and (assuming someone has), whether
>>> you'd be willing to tell me a bit about them. I understand the
>>> problems they're designed to solve, but I haven't been able to find a
>>> good example of an implementation of either schema.
>>>
>>> Thanks,
>>> Dan
>>>
>>>
>>
>> I'm no expert, but this is what i know.
>>
>> They're used for reporting and business intelligence because of how
>> easy is to query and aggregate data.
>> A star schema is composed of a fact table, with dimensions tables
>> around it.
>> A good example would be a fact table of "number of requests" and
>> dimension tables of "request types", "time"
>>
>> NumberOfRequests: (Fact)
>> - TypeID (FK)
>> - TimeID (FK)
>> - Number of requests
>>
>> RequestTypes: (Dimension)
>> - TypeID
>> - Name (Inbound, Outbound, etc)
>> - Code
>>
>> Time: (Dimension)
>> - TimeID
>> - DayOfWeek
>> - DayOfMonth
>> - Month
>> - Quarter
>> - Year
>>
>> With this setup it's easy to retrieve the total number of requests of
>> type "Outbound" on "Fridays", or total requests with code "23" on
>> "February"
>> You can also add dimensions like "recipients" or "responses" and now
>> it start to look more like a star.
>>
>> A star schema is very denormalized, to the point that it might take
>> too much space. If that's the case you might start to normalize some
>> of the dimension tables, thus making it look more like a snowflake.
>
> Is the idea behind this setup that the hub table is a table of
> "metadata"? In your example, would you setup a cron job to
> periodically update the NumberOfRequests table with the correct
> number? Or is this where the data originally lives?
>
>
These types of schema are never the original source of the data. You
would probably have a very normalized schema for your day-to-day
transactions (original source), and a cron (just like you said) to
populate the reporting schema, which would be read-only.
Alvaro
More information about the UPHPU
mailing list