When an index exists on the join columns, joining two tables will typically result in a nested loops join; when the index is missing, the optimizer may fall back to a sort merge (or hash) join instead.
A nested loops join can also service an outer join.
select emp.name, bonnus.com from emp, bonnus where emp.name=bonnus.name
select emp.name, bonnus.com from emp inner join bonnus on emp.name=bonnus.name
select emp.name, bonnus.com from emp, bonnus where emp.name=bonnus.name(+) : include emp rows even where there is no matching row in the bonnus table
select emp.name, bonnus.com from emp LEFT OUTER JOIN bonnus ON emp.name=bonnus.name
select
author_last_name C1,
book_title C2
from
author full outer join book_author using (author_key)
full outer join book using (book_key)
order by author_last_name; : include books without an author and authors without books
With UNION, duplicate rows are removed: internally the UNION ALL plan is executed and then a SORT UNIQUE operation is applied. UNION ALL is therefore better than UNION from a performance aspect whenever duplicates do not need to be removed.
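A minimal sketch of the difference (the retired_emp table is an assumption, used only for illustration):

```sql
-- UNION: runs the UNION ALL plan, then a SORT UNIQUE to drop duplicates
select ename from emp
union
select ename from retired_emp;

-- UNION ALL: skips the SORT UNIQUE, so it is cheaper whenever
-- duplicates are acceptable or cannot occur
select ename from emp
union all
select ename from retired_emp;
```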
select deptno from dept
minus
select deptno from emp
Use MINUS instead of a NOT IN operation
select deptno from dept
intersect
select deptno from emp
- Anti-join (NOT IN, NOT EXISTS)
select author_last_name from author
where
author_key not in ( select author_key from book_author);
select author_last_name from author a
where not exists
( select 1 from book_author b where b.author_key=a.author_key)
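One caveat when choosing between these anti-join forms: NOT IN returns no rows at all if the subquery can produce a NULL, while NOT EXISTS (and MINUS) are unaffected. A minimal sketch reusing the tables above:

```sql
-- If any book_author.author_key is NULL, this query returns no rows at all,
-- because "key NOT IN (..., NULL)" can never evaluate to TRUE
select author_last_name from author
where
   author_key not in ( select author_key from book_author);

-- Guarding the subquery (or using NOT EXISTS / MINUS) avoids the problem
select author_last_name from author
where
   author_key not in ( select author_key from book_author
                       where author_key is not null);
```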
select last_name, dept_id
from
emp cross join dept;
Results in a Cartesian product
select stuff from emp natural join dept;
Requires that the tables have identically named columns to join on
select depart_name, city
from
departments join locations using (location_id)
- SORT MERGE JOIN
- Usually used when: no index exists on the join columns, or hash_area_size / pga_aggregate_target does not offer enough RAM for a hash join
- When optimizer cardinality estimates indicate that the SQL will require visiting the majority of data blocks in both tables
- When the all_rows optimizer_mode determines that full-table scans are less resource-intensive than index access
col c1 heading 'Date'              format a20
col c2 heading 'Merge|Join|Count'  format 99,999,999
col c3 heading 'Rows|Processed'    format 99,999,999
col c4 heading 'Disk|Reads'        format 99,999,999
col c5 heading 'CPU|Time'          format 99,999,999

ttitle 'Merge Joins over time'

-- Additional-cost licenses are required to access the dba_hist tables.
select
   to_char(sn.begin_interval_time,'yy-mm-dd hh24') c1,
   count(*)                                        c2,
   sum(st.rows_processed_delta)                    c3,
   sum(st.disk_reads_delta)                        c4,
   sum(st.cpu_time_delta)                          c5
from
   dba_hist_snapshot sn,
   dba_hist_sqlstat  st,
   dba_hist_sql_plan sp
where
   st.snap_id = sn.snap_id
and
   st.dbid = sn.dbid
and
   st.instance_number = sn.instance_number
and
   sp.sql_id = st.sql_id
and
   sp.dbid = st.dbid
and
   sp.plan_hash_value = st.plan_hash_value
and
   sp.operation = 'MERGE JOIN'
group by
   to_char(sn.begin_interval_time,'yy-mm-dd hh24');
select /*+ use_merge(e,b) parallel(e,63) parallel(b,63) */
   e.ename, hiredate, b.com
from emp e, bonnus b
where e.ename = b.ename;
A parallel merge join query run against a 64-CPU server
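To confirm that the hint actually produced a merge join, the execution plan can be checked with the standard DBMS_XPLAN package (a sketch):

```sql
explain plan for
select /*+ use_merge(e,b) parallel(e,63) parallel(b,63) */
       e.ename, hiredate, b.com
from   emp e, bonnus b
where  e.ename = b.ename;

-- Look for a MERGE JOIN operation (and PX steps for parallelism) in the output
select * from table(dbms_xplan.display);
```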
select employee_name, department_name
from employees e, departments d
where e.department_id=d.department_id
Suppose we have a standard primary-key/foreign-key relationship on department_id (i.e. indexes on these columns). Oracle may use the NESTED LOOPS method, performing a full-table scan of the departments table (the driving table) and looping through the emp.idpt_ix index to access the matching rows in the emp table.
The optimizer will usually choose the smaller table, or the table without an index, as the driving table.
Nested loops joins always invoke an index and hence they cannot be parallelized.
Use the use_nl_with_index hint instead of use_nl; the optimizer will then use a nested loops join only if a suitable index exists.
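A minimal sketch (the index name emp_dept_ix is an assumption):

```sql
-- Nested loops is used only if emp_dept_ix (or another suitable index on
-- the join column) can be used; otherwise the optimizer is free to pick
-- a different join method instead of forcing an index-less nested loop
select /*+ use_nl_with_index(e emp_dept_ix) */
       e.employee_name, d.department_name
from   employees e, departments d
where  e.department_id = d.department_id;
```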
- awr_nested_join_alert.sql

col c1 heading 'Date'               format a20
col c2 heading 'Nested|Loops|Count' format 99,999,999
col c3 heading 'Rows|Processed'     format 99,999,999
col c4 heading 'Disk|Reads'         format 99,999,999
col c5 heading 'CPU|Time'           format 99,999,999

accept nested_thr char prompt 'Enter Nested Join Threshold: '

ttitle 'Nested Join Threshold|&nested_thr'

-- make sure that you are licensed to read the AWR tables
select
   to_char(sn.begin_interval_time,'yy-mm-dd hh24') c1,
   count(*)                                        c2,
   sum(st.rows_processed_delta)                    c3,
   sum(st.disk_reads_delta)                        c4,
   sum(st.cpu_time_delta)                          c5
from
   dba_hist_snapshot sn,
   dba_hist_sqlstat  st,
   dba_hist_sql_plan sp
where
   st.snap_id = sn.snap_id
and
   st.dbid = sn.dbid
and
   st.instance_number = sn.instance_number
and
   sp.sql_id = st.sql_id
and
   sp.dbid = st.dbid
and
   sp.plan_hash_value = st.plan_hash_value
and
   sp.operation = 'NESTED LOOPS'
group by
   to_char(sn.begin_interval_time,'yy-mm-dd hh24')
having
   count(*) > &nested_thr;
A hash join uses PGA RAM to build an in-memory hash table from the smaller (driving) table. Rows from the larger table are then read as the probe input: each probe row is hashed and matched against the in-memory table to find the joining rows.
- hash_area_size or pga_aggregate_target has to be set to a high value
- TEMP usage: if there is not enough room in RAM, Oracle can overflow the hash table to the TEMP tablespace
Whenever the driving table will fit into the hash_area_size (pga_aggregate_target) RAM, a hash join may run faster than a nested loops join.
The hash area should be about 1.6 times as large as the driving table:
set heading off;
set feedback off;
set verify off;
set pages 999;
spool run_hash.sql
select
'alter session set hash_area_size='||trunc(sum(bytes)*1.6)||';'
from
dba_segments
where
segment_name = upper('&1');
spool off;
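A usage sketch (the generator script file name gen_hash_area.sql is an assumption; note that hash_area_size is honored only when workarea_size_policy is set to MANUAL):

```sql
-- hash_area_size is ignored under the default automatic policy
alter session set workarea_size_policy = manual;

-- Build run_hash.sql for the emp driving table, then apply it
@gen_hash_area.sql emp
@run_hash.sql
```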
Use the parallel hint instead of setting parallelism at the table level
select /*+ use_hash(e,b) parallel(e,63) parallel(b,63) */
   e.ename, hiredate, b.com
from emp e, bonnus b
where e.ename = b.ename;
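To check whether hash joins are completing in RAM or spilling to TEMP, the workarea statistics can be queried (a sketch against the standard v$sql_workarea view):

```sql
-- optimal = completed in RAM; onepass/multipass = spilled to TEMP
select operation_type,
       optimal_executions,
       onepass_executions,
       multipasses_executions
from   v$sql_workarea
where  operation_type like '%HASH%';
```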
col c1 heading 'Date'             format a20
col c2 heading 'Hash|Join|Count'  format 99,999,999
col c3 heading 'Rows|Processed'   format 99,999,999
col c4 heading 'Disk|Reads'       format 99,999,999
col c5 heading 'CPU|Time'         format 99,999,999

accept hash_thr char prompt 'Enter Hash Join Threshold: '

ttitle 'Hash Join Threshold|&hash_thr'

select
   to_char(sn.begin_interval_time,'yy-mm-dd hh24') c1,
   count(*)                                        c2,
   sum(st.rows_processed_delta)                    c3,
   sum(st.disk_reads_delta)                        c4,
   sum(st.cpu_time_delta)                          c5
from
   dba_hist_snapshot sn,
   dba_hist_sqlstat  st,
   dba_hist_sql_plan p
where
   st.snap_id = sn.snap_id
and
   st.dbid = sn.dbid
and
   st.instance_number = sn.instance_number
and
   p.sql_id = st.sql_id
and
   p.dbid = st.dbid
and
   p.plan_hash_value = st.plan_hash_value
and
   p.operation = 'HASH JOIN'
group by
   to_char(sn.begin_interval_time,'yy-mm-dd hh24')
having
   count(*) > &hash_thr;