|
How
Relational Databases Work
(June 16, 2000)
|
 |
By Richard N. Tooker
Senior Vice President Database/Interactive Marketing, DMW
Every
time I make a database marketing
presentation, I first try to determine
the level of sophistication of the
audience. One question I usually ask is
"How many of you know how a
relational database works?"
Usually,
fewer than half of the people raise
their hands, even at an event such as
the Direct Marketing Association’s
biannual National Center for Database
Marketing conferences, which attract
advanced participants. Everyone knows
that relational databases provide the
most flexibility and are the preferred
architecture for marketing databases,
but at least half of them do not know
how or why.
One
of the most important aspects of
creating a marketing database is to
design it for efficient storage, data
access and query processing, and
understanding the relational database
model is necessary to do that.
Flat
Files. The first database management computer programs placed all
the information about specific customers
and their relationships into a single
computer record called a "flat
file." For example, customers'
names and addresses, what they have
purchased, plus any other information
such as demographics, all are in a
single record.
But
what if a customer had purchased three
times? Flat files would work, but much
information would be duplicated. Data
elements such as name, customer number,
address and ZIP code would be identical
in all three records. Thus, the flat
file would store the same information
three times. From a storage standpoint,
this is inefficient.
Scanning
that file for information is also
inefficient. Suppose you wanted to find
and create a list of all the customers
who had bought a specific product in a
file of this nature. Flat file databases
scan all the fields in each record from
left to right, looking for
"hits" in a search pattern.
Flat
files, while easy to construct, result
in inefficient storage and inefficient
and slow processing. The relational
database model solves this.
The
Relational Database Model. If we were to reorganize the same data for a
relational database, we would use a
group of tables to store the
information.
First,
we would create a customer information
table, into which we would place a
single record for each customer
containing all the information specific
to the customer such as customer ID
number, name and address.
Next,
we would create a purchase table, into
which we would place the customer ID
number, the product type, the date of
purchase and the price. If the customer
bought three products, we would create
three records in the purchase table to
hold that information.
Using
this architecture, we would have stored
the same information about customers and
their purchases in much less space
because we would have stored the
customers' names and addresses -- the
bulk of the information -- only once.
The process of reorganizing this
information into separate tables is
called "normalization."
In
our relational database, all the records
have one common data element: the
customer ID number. This number is a
unique identifier because it is used
only to identify a specific customer.
This unique number is the key to making
a relational database work.
Suppose
we once again wanted to find and create
a list of the customers who have
purchased specific products. In our
relational database, the system would
first search the purchase table to find
the products and store the customer ID
numbers in the qualifying records in
temporary memory. The system would then
go to the customer table, extract only
those records in the table that have the
same customer numbers and put the
customer information and the account
information together. This process of
combining the information found in
multiple tables is called a
"join."
The
same two tables also could be used in a
reverse sequence to produce a different
kind of report. If we wanted to find and
list all of the products bought by a
specific customer, the system would
first search the customer table to find
the customer, then use the customer ID
number to locate all the records in the
purchase table that belong to the same
customer, joining them to create the
report.
Keep
in mind that the examples above are
tremendously simplified. In a real
database, the records in both the
customer table and the purchase table
would contain many more fields. Plus,
there would likely be additional tables
containing information such as a history
of the promotions sent to each customer,
a record of customer transactions,
customer demographics, perhaps a history
of customer service activities and so
on. Marketing databases can have dozens
of tables.
One
of the most important benefits of a
relational database is the ability to
add or edit data. Another table can be
added; another field can be added to an
existing table, and values in existing
fields can be changed, all without
having to rebuild the database. This
makes the architecture extremely
flexible.
There
is a downside to normalization: Joins
take time. Relational databases are
flexible and efficient in storing
information, but if a query or report
has to join too many tables, the answers
will be slow in coming. For that reason,
most smart marketers will try to balance
the need for flexibility and efficiency
with the need for speed, building the
marketing database with as few separate
tables as possible. Collapsing a
relational database into fewer tables is
called "denormalization," or
sometimes referred to as
"summarization."
Indexing.
One trick to speed up relational databases is to "index" the
fields that are likely to be in common
use, a way of creating pointers that can
help the system quickly find the records
in the database having each of the
values to be found in a specific field.
The technique can help, but it also has
drawbacks.
For
example, indexing works well on fields
with only a few discrete values such as
gender, which would have only three
values -- male, female and unknown. But
there is no speed gain if the field has
100 or more values. So a continually
variable field like account balance,
which could theoretically have a million
or more values, does not benefit from
indexing. An even greater problem with
indexing is its effect on storage
requirements. A fully indexed database
can easily balloon up to five times its
original size. So indexing, though
helpful in some situations, is not
always the way to solve performance
problems.
Bit
Mapping. Sometimes, it is possible to treat data elements as bits,
rather than bytes, a data organization
referred to as "bit mapping."
A byte consists of eight bits, so
storing and accessing data at the bit
level can make processing eight times
faster.
Inverted
Files. Some proprietary vendor database offerings use an
"inverted" file structure to
speed query processing. In an inverted
file, the system would perform a
horizontal search of the first record in
the database looking for the field it
wants, then go through the rest of the
file vertically, searching only that
field.
This
search pattern can produce
lightning-speed results, sometimes
returning an answer in just a few
seconds against a multimillion-record
database. And since the query searches
only the fields it needs to satisfy the
query rather than all the fields in each
record, the length of the records or the
number of fields in those records has no
effect on speed.
So
why don’t all databases use an
inverted file structure? There are two
major drawbacks. First, inverted files
don’t offer much in the way of
flexibility. If you wanted to add some
information to the database, you
couldn’t just create or edit a table
as you would if the database were
relational -- you would have to rebuild
the entire database.
Second,
inverted files are typically not
compliant with open database
connectivity. The ODBC standard allows
applications to share information, a key
requirement of today’s analytical
programs. So using an inverted database
management system usually limits the
user to the reporting functions that
came bundled with the system. To get
data into other analytical applications
requires time-consuming exporting of
data sets that meet the query criteria.
The
most advanced systems sometimes use a
combination of these technologies to
satisfy user needs. One key to getting
the maximum benefit from a marketing
database is to understand enough about
how the data is processed to make the
right judgments about data organization
before the database is built.
Copyright
©
DM News 2000 All Rights Reserved
|