myouanorablog: SQL Tuning : SQL Join Internal

Introduction

Joining 2 tables with index will result in Nested loops join, while missing index may result in sort merge join

Nested loops join may serve outer join

Logical joins

equi-join

select emp.name, bonnus.com from emp, bonnus where emp.name=bonnus.name

select emp.name, bonnus.com from emp inner join bonnus on emp.name=bonnus.name

outer join

select emp.name, bonnus.com from emp, bonnus where emp.name=bonnus.name(+) : include emp rows even where there was no matching in bonus table

select emp.name, bonnus.com from emp LEFT OUTER JOIN bonnus ON emp.name=bonnus.name

full outer join

select
author_last_name C1,
book_title C2
from
author full outer join book_author using (author_key)
full outer join book using (book_key)
order by author_last_name; : include books without author and author without books

Union and Union all

with Union duplicate rows are removed, internally union all plan is executed and sort unique operation is done : so Union all is better than Union regarding performance aspect

minus

select deptno from dept
minus
select deptno from emp

Use minus instead of not in operation

intersect

select deptno from dept
intersect
select deptno from emp

anti-join ( not in, not exists)

select author_last_name from author

where

author_key not in ( select author_key from book_author);

select author_last_name from author a

where not exists

( select 1 from book_author b where b.author_key=a.author_key)

cross join

select last_name, dept_id

from

emp cross join dept;

result in a cartesian product

nutural join

select stuff from emp natural join dept;

Require that tables have same column name

using clause

select depart_name, city

from

departments join locations using (location_id)

SORT MERGE JOIN

Usually used : no index exist to join the table columns, hash_area_size or pga_aggregate_target does not offer enough RAM for a hash join
Optimizer cardinality estimates indicate that the SQL will require visiting the majority of data blocks from both tables
When the all_rows optimizer_mode determines that a full-table scan is less resource intensive than index access

Find_merge_joins.sql

col c1 heading ‘Date’ format a20

col c2 heading ‘Hash|Join|Count’ format 99,999,999

col c3 heading ‘Rows|Processed’ format 99,999,999

col c4 heading ‘Disk|Reads’ format 99,999,999

col c5 heading ‘CPU|Time’ format 99,999,999

ttitle ‘Merge Joins over time’

select

to_char(

sn.begin_interval_time,

'yy-mm-dd hh24'

)

count(*)

sum(st.rows_processed_delta)

sum(st.disk_reads_delta)

sum(st.cpu_time_delta)

from

dba_hist_snapshot sn,

dba_hist_sqlstat st,

dba_hist_sql_plan sp

snap_time,

ct,

row_ct,

disk,

cpu

where

-- Additional cost licenses are required to access the dba_hist tables.

st.snap_id = sn.snap_id

and

st.dbid = sn.dbid

and

st.instance_number = sn.instance_number

and

sp.sql_id = st.sql_id

and

sp.dbid = st.dbid

and

sp.plan_hash_value = st.plan_hash_value

and

sp.operation = 'MERGE JOIN’

group by

to_char(sn.begin_interval_time,'yy-mm-dd hh24');

use_merge hint

select /+* use_merge(e,b) parallel(e, 63 ) parallel (b,63) */
e.ename, hiredate, b.com
from emp e, bonnus b
where e.ename=b.ename

A parallel query is run against a 64 CPU server

NESTED LOOPS JOIN

sample

select employee_name, department_name

from employees e, departments d

where e.department_id=d.department_id

suppose we have standard primary key, foreign-key relationship for department_id (i.e. index on these columns) Oracle may use NESTED LOOP method performing full table scan on departements table( the driving table ) and looping through the emp.idpt_ix to access the machine rows in emp table

The driving table

Optimizer will choose the smallest table or table without index

no parallelism

Nested loop joins always invoke an index and hence, they cannot be paralleled

hints

Use use_nl_with_index hint instead of use_nl, Optimizer will use nested loop if suitable index exists

awr_nested_join_alert.sql

col c1 heading ‘Date’ format a20
col c2 heading ‘Nested|Loops|Count’ format 99,999,999
col c3 heading ‘Rows|Processed’ format 99,999,999
col c4 heading ‘Disk|Reads’ format 99,999,999
col c5 heading ‘CPU|Time’ format 99,999,999
accept nested_thr char prompt ‘Enter Nested Join Threshold: ‘
ttitle ‘Nested Join Threshold|&nested_thr’
select
to_char(
sn.begin_interval_time,
'yy-mm-dd hh24'
)
count(*)
sum(st.rows_processed_delta)
sum(st.disk_reads_delta)
sum(st.cpu_time_delta)
from
dba_hist_snapshot sn,
dba_hist_sqlstat st,
dba_hist_sql_plan sp
snap_time,
ct,
row_ct,
disk,
cpu
-- make sure that you are licensed to read the AWR tables
where
st.snap_id = sn.snap_id
and
st.dbid = sn.dbid
and
st.instance_number = sn.instance_number
and
sp.sql_id = st.sql_id
and
sp.dbid = st.dbid
and
sp.plan_hash_value = st.plan_hash_value
and
sp.operation = 'NESTED LOOPS'
group by
to_char(sn.begin_interval_time,'yy-mm-dd hh24')
having
count(*) > &nested_thr;

HASH JOIN

how it works

Use PGA RAM to build an in-memory hash table from the smaller driving table. Rows in the RAM hast table are then used as build input rows source, providing ROWID required into the larger table.

hash_area_size or pga_aggregate_target have to be set to high value
temp usage

If there is no room in the RAM Oracle can overflow on TEMP tablespace

vs nested loop

Whenever the driving table will fit into the hash_area_size(pga_aggregate_target) RAM, a hash join may run faster than a nested loops join.

Sizing hash area in RAM

hash area should be 1.6 times as large as a driving table :
set heading off;
set feedback off;
set verify off;
set pages 999;
spool run_hash.sql
select
'alter session set hash_area_size='||trunc(sum(bytes)*1.6)||';'
from
dba_segments
where
segment_name = upper('&1');
spool off;

Hint

Use hint parallel instead of setting parallelisum on table level
select /+* use_hash(e,b) parallel(e, 63 ) parallel (b,63) */
e.ename, hiredate, b.com
from emp e, bonnus b
where e.ename=b.ename

hash_join_alert.sql

col c1 heading ‘Date’ format a20
col c2 heading ‘Hash|Join|Count’ format 99,999,999
col c3 heading ‘Rows|Processed’ format 99,999,999
col c4 heading ‘Disk|Reads’ format 99,999,999
col c5 heading ‘CPU|Time’ format 99,999,999
accept hash_thr char prompt ‘Enter Hash Join Threshold: ‘
ttitle ‘Hash Join Threshold|&hash_thr’
select
to_char(sn.begin_interval_time,'yy-mm-dd hh24') c1,
count(*) c2,
sum(st.rows_processed_delta) c3,
sum(st.disk_reads_delta) c4,
sum(st.cpu_time_delta) c5
from
dba_hist_snapshot sn,
dba_hist_sql_plan p,
dba_hist_sqlstat st
where
st.sql_id = p.sql_id
and
sn.snap_id = st.snap_id
and
p.operation = 'HASH JOIN'
having
count(*) > &hash_thr
group by
begin_interval_time;

myouanorablog

vendredi 8 mai 2015

SQL Tuning : SQL Join Internal

Aucun commentaire:

Enregistrer un commentaire

Archives du blog