vendredi 8 mai 2015

SQL Tuning : SQL Join Internal


  • Introduction
Joining 2 tables with index will result in Nested loops join, while missing index may result in sort merge join

Nested loops join may serve outer join
  • Logical joins
    • equi-join
select emp.name, bonnus.com from emp, bonnus where emp.name=bonnus.name
select emp.name, bonnus.com from emp inner join bonnus on emp.name=bonnus.name
    • outer join
select emp.name, bonnus.com from emp, bonnus where emp.name=bonnus.name(+) : include emp rows even where there was no matching in bonus table
select emp.name, bonnus.com from emp LEFT OUTER JOIN bonnus  ON emp.name=bonnus.name


    • full outer join
select
   author_last_name C1,
   book_title   C2
from
  author full outer join book_author using (author_key)
             full outer join book using (book_key)
order by author_last_name;  : include books without author and author without books


    • Union and Union all
with Union duplicate rows are removed, internally union all  plan is executed and sort unique operation is done : so Union all is better than Union  regarding performance aspect


    • minus
select deptno from dept
minus
select deptno from emp

Use minus instead of not in operation


    • intersect

select deptno from dept
intersect
select deptno from emp


    • anti-join ( not in, not exists)
select author_last_name from author 
where
  author_key not in ( select author_key from book_author);

select author_last_name from author a
where not exists
  ( select 1   from book_author b where b.author_key=a.author_key)

    • cross join
select last_name, dept_id
from
emp cross join dept;

result in a cartesian product
    • nutural join
select stuff from emp natural join dept;
Require that tables have same column name

    • using clause
select depart_name, city
from
departments join locations using (location_id)

  • SORT MERGE JOIN
    • Usually used : no index exist to join the table columns, hash_area_size or pga_aggregate_target does not offer enough RAM for a hash join
    • Optimizer cardinality estimates indicate that the SQL will require visiting the majority of data blocks from both tables
    • When the all_rows optimizer_mode determines that a full-table scan is less resource intensive than index access

    • Find_merge_joins.sql
col c1 heading ‘Date’  format a20
col c2 heading ‘Hash|Join|Count’  format 99,999,999
col c3 heading ‘Rows|Processed’  format 99,999,999
col c4 heading ‘Disk|Reads’  format 99,999,999
col c5 heading ‘CPU|Time’ format 99,999,999
ttitle ‘Merge Joins over time’
select
  to_char(
    sn.begin_interval_time,
    'yy-mm-dd hh24'
  )
  count(*)
  sum(st.rows_processed_delta)
  sum(st.disk_reads_delta)
  sum(st.cpu_time_delta)
from
   dba_hist_snapshot   sn,
   dba_hist_sqlstat    st,
   dba_hist_sql_plan   sp
snap_time,
ct,
row_ct,
disk,
cpu
where
-- Additional cost licenses are required to access the dba_hist tables.
   st.snap_id = sn.snap_id
and
   st.dbid = sn.dbid
and
   st.instance_number = sn.instance_number
and
   sp.sql_id = st.sql_id
and
   sp.dbid = st.dbid
and
   sp.plan_hash_value = st.plan_hash_value
and
   sp.operation = 'MERGE JOIN’
group by
   to_char(sn.begin_interval_time,'yy-mm-dd hh24');

    • use_merge hint
select /+* use_merge(e,b) parallel(e, 63 ) parallel (b,63) */
e.ename, hiredate, b.com
from emp e, bonnus b
where e.ename=b.ename

A parallel query is run against a 64 CPU server
  • NESTED LOOPS JOIN
    • sample
select employee_name, department_name 
from employees e, departments d
where e.department_id=d.department_id

suppose we have standard primary key,  foreign-key relationship for department_id (i.e. index on these columns) Oracle may use NESTED LOOP method performing full table scan on departements table( the driving table ) and looping through the emp.idpt_ix to access the machine rows in emp table


    • The driving table 
Optimizer will choose the smallest table or table without index

    • no parallelism
Nested loop joins always invoke an index and hence, they cannot be paralleled
    • hints
Use use_nl_with_index hint instead of use_nl,  Optimizer will use nested loop if suitable index exists


    • awr_nested_join_alert.sql

col c1 heading ‘Date’                format a20
col c2 heading ‘Nested|Loops|Count’  format 99,999,999
col c3 heading ‘Rows|Processed’ format 99,999,999
col c4 heading ‘Disk|Reads’ format 99,999,999
col c5 heading ‘CPU|Time’ format 99,999,999
accept nested_thr char prompt ‘Enter Nested Join Threshold: ‘
ttitle ‘Nested Join Threshold|&nested_thr’
select
  to_char(
    sn.begin_interval_time,
    'yy-mm-dd hh24'
  )
  count(*)
  sum(st.rows_processed_delta)
  sum(st.disk_reads_delta)
  sum(st.cpu_time_delta)
from
   dba_hist_snapshot   sn,
   dba_hist_sqlstat    st,
   dba_hist_sql_plan   sp
snap_time,
ct,
row_ct,
disk,
cpu
-- make sure that you are licensed to read the AWR tables
where
   st.snap_id = sn.snap_id
and
   st.dbid = sn.dbid
and
   st.instance_number = sn.instance_number
and
   sp.sql_id = st.sql_id
and
   sp.dbid = st.dbid
and
   sp.plan_hash_value = st.plan_hash_value
and
   sp.operation = 'NESTED LOOPS'
group by
   to_char(sn.begin_interval_time,'yy-mm-dd hh24')
having
        count(*) > &nested_thr;


  • HASH JOIN
    • how it works


Use PGA RAM to build an in-memory hash table from the smaller driving table. Rows in the RAM hast table are then used as build input rows source, providing ROWID required into the larger table.

    • hash_area_size or pga_aggregate_target  have to be set to high value
    • temp usage
If there is no room in the RAM Oracle can overflow on TEMP tablespace

    • vs nested loop
Whenever the driving table will fit into the hash_area_size(pga_aggregate_target) RAM, a hash join may run faster than a nested loops join.

    • Sizing hash area in RAM
hash area should be 1.6 times as large as a driving table :
set heading off;
set feedback off;
set verify off;
set pages 999;
spool run_hash.sql
select
'alter session set hash_area_size='||trunc(sum(bytes)*1.6)||';'
from
   dba_segments
where
   segment_name = upper('&1');
spool off;

    • Hint
Use hint parallel  instead of setting parallelisum on table level
select /+* use_hash(e,b) parallel(e, 63 ) parallel (b,63) */
e.ename, hiredate, b.com
from emp e, bonnus b
where e.ename=b.ename

    • hash_join_alert.sql

col c1 heading ‘Date’ format a20
col c2 heading ‘Hash|Join|Count’ format 99,999,999
col c3 heading ‘Rows|Processed’ format 99,999,999
col c4 heading ‘Disk|Reads’ format 99,999,999
col c5 heading ‘CPU|Time’ format 99,999,999
accept hash_thr char prompt ‘Enter Hash Join Threshold: ‘
ttitle ‘Hash Join Threshold|&hash_thr’
select
   to_char(sn.begin_interval_time,'yy-mm-dd hh24')  c1,
   count(*)                                         c2,
   sum(st.rows_processed_delta)                     c3,
   sum(st.disk_reads_delta)                         c4,
   sum(st.cpu_time_delta)                           c5
from
   dba_hist_snapshot sn,
   dba_hist_sql_plan  p,
   dba_hist_sqlstat  st
where
   st.sql_id = p.sql_id
and
   sn.snap_id = st.snap_id
and
   p.operation = 'HASH JOIN'
having
   count(*) > &hash_thr
group by
   begin_interval_time;

Aucun commentaire:

Enregistrer un commentaire