Friday, February 14, 2014

Big Table partition techniques


I believe most of these features come from Oracle; however, DB2 does provide range partitioning as well. So what is the use case?
Say you have an eCommerce site with a million user visits per second, and you want to build a user-visit monitoring interface with the ability to zoom in or out to see visits per year, per month, per week, per day, per hour, or per minute. How would you design the table? We are talking about billions of rows here!
The answer is range partitioning by user visit time. Of course, if the table becomes too big, we can create a new table every 3 years, since the old logging history can be discarded once the business deems it useless.
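As a sketch (table and column names here are hypothetical), the visit table could be declared with Oracle range partitioning; interval partitioning (11g+) creates a new monthly partition automatically as data arrives:

CREATE TABLE user_visit (
  visit_id   NUMBER        NOT NULL,
  user_id    NUMBER        NOT NULL,
  visit_time TIMESTAMP     NOT NULL,
  page_url   VARCHAR2(500)
)
PARTITION BY RANGE (visit_time)
INTERVAL (NUMTOYMINTERVAL(1, 'MONTH'))
(
  -- everything older than 2014 lands in one initial partition
  PARTITION p_before_2014 VALUES LESS THAN (TIMESTAMP '2014-01-01 00:00:00')
);

Queries that filter on visit_time (one day, one month, etc.) are then pruned to the matching partitions, which is what makes zoom-in/zoom-out queries feasible at billions of rows.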

  • Range Partitioning - Used when there are logical ranges of data. Possible usage: dates, part numbers, and serial numbers.
  • Hash Partitioning - Used to spread data evenly over partitions. Possible usage: data has no logical groupings.
  • List Partitioning - Used to list together unrelated data into partitions. Possible usage: a number of states list partitioned into a region.
  • Composite Range-Hash Partitioning - Used to range partition first, then spreads data into hash partitions. Possible usage: range partition by date of birth then hash partition by name; store the results into the hash partitions.
  • Composite Range-List Partitioning - Used to range partition first, then spreads data into list partitions. Possible usage: range partition by date of birth then list partition by state, then store the results into the list partitions.
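For reference, minimal hash and list declarations look like the following in Oracle syntax (table and column names are made up for illustration):

CREATE TABLE orders (
  order_id NUMBER,
  cust_id  NUMBER
)
PARTITION BY HASH (cust_id) PARTITIONS 8;   -- spread rows evenly across 8 partitions

CREATE TABLE sales (
  sale_id NUMBER,
  state   VARCHAR2(2)
)
PARTITION BY LIST (state) (
  PARTITION p_west VALUES ('CA', 'OR', 'WA'),
  PARTITION p_east VALUES ('NY', 'NJ', 'CT')
);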
 Benefits:
  • Query performance,
  • Getting large volumes of new data into these tables and,
  • Removing large volumes of obsolete data every month or quarter.
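The third benefit follows from the fact that a whole partition can be aged out with a single DDL statement instead of a massive DELETE. Assuming a visit table named user_visit with a quarterly partition p_2011_q1 (hypothetical names), something like:

ALTER TABLE user_visit DROP PARTITION p_2011_q1 UPDATE GLOBAL INDEXES;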
 

Thursday, February 13, 2014

Update a single column in a table with billions of rows

Question:
Suppose we have a huge table called bigTable, and we want to update col3 from 'old val' to 'new val'.
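For contrast, the naive single-statement update would be the following; on a billion-row table this generates enormous undo/redo and can run for hours, which is why the solutions below avoid it:

UPDATE bigTable
SET col3 = 'new val'
WHERE col3 = 'old val';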

Solution:

1. If there is enough DB space

INSERT INTO temp
SELECT pk, col1, col2, col4, col5,
       CASE col3 WHEN 'old val' THEN 'new val' ELSE col3 END AS col3
FROM bigTable;

Create indexes on temp;

Drop the FKs referencing bigTable;

Drop bigTable;

Rename temp TO bigTable;

Recreate the FKs;

Update: this technique is called CTAS (Create Table As Select) in the link:
http://www.dba-oracle.com/t_fast_batch_updates_ctas.htm

create table 
   new_mytab 
NOLOGGING as
select /*+ full(mytab) parallel(mytab, 35) */ 
   decode(status, 'new', 'old', status) status,
   col2, col3, col4
from mytab;

-- rebuild indexes, triggers and constraints to new_mytab

rename mytab     to bkup_mytab;

rename new_mytab to mytab;
 

2. If DB space is limited

INSERT INTO temp
SELECT pk, 'new val' AS col
FROM bigTable
WHERE col3 = 'old val';

MERGE INTO bigTable D
   USING temp S
   ON (D.pk = S.pk)
   WHEN MATCHED THEN UPDATE SET D.col3 = S.col
;