jeudi 21 mai 2015

SQL Tuning : TABLE FULL SCAN / WITH INDEXES


  • Full scan I/O is cheaper per block than index I/O
Full scans use multiblock sequential reads, while index access drives random single-block reads: about 95% of disk latency is the time required for the read-write head to move itself under the proper cylinder


  • Blevel
    • the number of levels in a B-tree index; commonly 3, but it can reach 4 in data warehouses
    • exposed as the blevel column in dba_indexes
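
The blevel for a schema can be checked directly (a sketch; SCOTT is a placeholder schema and dba_indexes requires DBA privileges):

```sql
-- List the deepest B-tree indexes: blevel is the number of branch
-- levels between the root block and the leaf blocks.
select index_name, blevel
from   dba_indexes
where  owner = 'SCOTT'
and    blevel >= 3
order by blevel desc;
```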

dimanche 10 mai 2015

SQL Tuning : Dynamic sampling


  • Optimizer_dynamic_sampling parameter
Default value is 2; the valid range is 0 to 10

  • Goal
Create more accurate selectivity and cardinality estimates
  • Levels
    • 1 : samples unanalyzed tables that appear in join or subquery conditions and have no indexes
    • 2 : samples all unanalyzed tables that have more than 32 blocks
    • 3 : also samples tables whose predicate selectivity would otherwise have to be guessed
    • 4 : also samples tables with predicates on 2 or more columns
    • 5 : samples 64 blocks
    • 6 : samples 128 blocks
Levels above 2 still apply only to tables with more than 32 blocks
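
The level can be raised for one session without touching the instance-wide default, for example:

```sql
-- Level 4 is an illustrative value; valid values run from 0 to 10.
alter session set optimizer_dynamic_sampling = 4;
```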

  • dynamic_sampling hint
select /*+ dynamic_sampling(customer 4) */
  customer_name ...

Useful for tables that are created dynamically, such as global temporary tables (GTT)

  • Scope
Ideal for a query that is going to be executed many times
Also useful when there is a complex where clause
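
The GTT case mentioned above can be sketched as follows (the table and column names are made up for illustration):

```sql
-- Statistics are rarely present on a global temporary table, so ask
-- the optimizer to sample it at parse time with the hint.
create global temporary table gtt_orders (
  order_id number,
  status   varchar2(10)
) on commit preserve rows;

select /*+ dynamic_sampling(g 4) */ count(*)
from   gtt_orders g
where  g.status = 'OPEN';
```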

SQL Tuning : Join order


  • Table join optimization
    • Histograms
Histograms on the where-clause join columns can help the optimizer choose an optimal join order


    • Transitive closure
if a = b and b = c, then a = c

select * from a,b,c where
a.id=b.id and b.id=c.id

which the optimizer can rewrite as :

select * from a,b,c where
a.id=b.id and b.id=c.id and a.id=c.id


    • Ordered hint
The first table that appears in the from clause becomes the driving table. Often used when 4 or more tables are involved in the join; it can save a huge amount of parse time.

select /*+ ordered use_hash(emp, dept) use_nl(sal, bonus) */
  emp.ename, ...
from
  emp,
  dept,
  sal,
  bonus
where ....


    • ordered_predicates hint
The CBO typically evaluates SQL predicates following 3 rules :
a) Subqueries are evaluated before outer Boolean conditions
b) Boolean conditions without subqueries or built-in functions are evaluated in the order they appear in the where clause
c) Boolean conditions using built-in functions are evaluated according to their costs

The ordered_predicates hint overrides these rules, allowing the where clause items to be evaluated in the order they appear in the query.
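
A minimal sketch of the hint, assuming a costly user-defined function slow_check (hypothetical); without the hint, rule c) would let the CBO reorder the predicates by cost:

```sql
-- Keep the written order: the cheap equality check on deptno runs
-- before the expensive function call.
select /*+ ordered_predicates */ ename
from   emp
where  deptno = 10
and    slow_check(ename) = 'Y';   -- slow_check is a hypothetical function
```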

vendredi 8 mai 2015

SQL Tuning : SQL Join Internal


  • Introduction
Joining 2 tables on indexed columns will typically result in a nested loops join, while a missing index may result in a sort merge join

A nested loops join may also serve an outer join
  • Logical joins
    • equi-join
select emp.name, bonnus.com from emp, bonnus where emp.name=bonnus.name
select emp.name, bonnus.com from emp inner join bonnus on emp.name=bonnus.name
    • outer join
select emp.name, bonnus.com from emp, bonnus where emp.name=bonnus.name(+) : includes emp rows even when there is no matching row in the bonnus table
select emp.name, bonnus.com from emp LEFT OUTER JOIN bonnus  ON emp.name=bonnus.name


    • full outer join
select
   author_last_name C1,
   book_title   C2
from
  author full outer join book_author using (author_key)
             full outer join book using (book_key)
order by author_last_name;  : includes books without an author and authors without books


    • Union and Union all
With UNION, duplicate rows are removed: internally the UNION ALL plan is executed and then a SORT UNIQUE operation is performed, so UNION ALL is better than UNION from a performance standpoint when duplicates are acceptable


    • minus
select deptno from dept
minus
select deptno from emp

MINUS can be used instead of a NOT IN operation, and often performs better
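
For example, the classic NOT IN anti-join on the sample schema can be rewritten with MINUS:

```sql
-- NOT IN form:
select deptno from dept
where  deptno not in (select deptno from emp);

-- Equivalent MINUS form, often cheaper (assuming deptno is not null):
select deptno from dept
minus
select deptno from emp;
```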


    • intersect

select deptno from dept
intersect
select deptno from emp


    • anti-join ( not in, not exists)
select author_last_name from author 
where
  author_key not in ( select author_key from book_author);

select author_last_name from author a
where not exists
  ( select 1   from book_author b where b.author_key=a.author_key)

    • cross join
select last_name, dept_id
from
emp cross join dept;

results in a Cartesian product
    • natural join
select stuff from emp natural join dept;
Requires that the joined tables have identically named columns

    • using clause
select depart_name, city
from
departments join locations using (location_id)

  • SORT MERGE JOIN
    • Usually used when no index exists to join the table columns, or when hash_area_size / pga_aggregate_target does not offer enough RAM for a hash join
    • When optimizer cardinality estimates indicate that the SQL will require visiting the majority of the data blocks in both tables
    • When the all_rows optimizer_mode determines that a full-table scan is less resource intensive than index access

    • Find_merge_joins.sql
col snap_time heading 'Date'              format a20
col ct        heading 'Merge|Join|Count'  format 99,999,999
col row_ct    heading 'Rows|Processed'    format 99,999,999
col disk      heading 'Disk|Reads'        format 99,999,999
col cpu       heading 'CPU|Time'          format 99,999,999
ttitle 'Merge Joins over time'
select
   to_char(sn.begin_interval_time,'yy-mm-dd hh24') snap_time,
   count(*)                                        ct,
   sum(st.rows_processed_delta)                    row_ct,
   sum(st.disk_reads_delta)                        disk,
   sum(st.cpu_time_delta)                          cpu
from
   dba_hist_snapshot   sn,
   dba_hist_sqlstat    st,
   dba_hist_sql_plan   sp
where
-- Additional cost licenses are required to access the dba_hist tables.
   st.snap_id = sn.snap_id
and
   st.dbid = sn.dbid
and
   st.instance_number = sn.instance_number
and
   sp.sql_id = st.sql_id
and
   sp.dbid = st.dbid
and
   sp.plan_hash_value = st.plan_hash_value
and
   sp.operation = 'MERGE JOIN'
group by
   to_char(sn.begin_interval_time,'yy-mm-dd hh24');

    • use_merge hint
select /*+ use_merge(e,b) parallel(e, 63) parallel(b, 63) */
  e.ename, e.hiredate, b.com
from emp e, bonnus b
where e.ename = b.ename

Here the parallel query is run against a 64-CPU server
  • NESTED LOOPS JOIN
    • sample
select employee_name, department_name 
from employees e, departments d
where e.department_id=d.department_id

Suppose we have a standard primary-key / foreign-key relationship on department_id (i.e. indexes on these columns). Oracle may use the NESTED LOOPS method, performing a full table scan on the departments table (the driving table) and looping through the emp.idpt_ix index to access the matching rows in the emp table


    • The driving table
The optimizer will typically choose the smaller table, or the table without a usable index, as the driving table

    • no parallelism
Nested loops joins always invoke an index and hence cannot be parallelized
    • hints
Use the use_nl_with_index hint instead of use_nl; the optimizer will then use a nested loops join only if a suitable index exists
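
A sketch of the hint (emp_deptno_ix is a hypothetical index name on emp.deptno):

```sql
-- Nested loops into emp only if the named index can be used;
-- otherwise the optimizer is free to pick another join method.
select /*+ use_nl_with_index(e emp_deptno_ix) */
       e.ename, d.dname
from   dept d, emp e
where  e.deptno = d.deptno;
```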


    • awr_nested_join_alert.sql

col snap_time heading 'Date'                format a20
col ct        heading 'Nested|Loops|Count'  format 99,999,999
col row_ct    heading 'Rows|Processed'      format 99,999,999
col disk      heading 'Disk|Reads'          format 99,999,999
col cpu       heading 'CPU|Time'            format 99,999,999
accept nested_thr char prompt 'Enter Nested Join Threshold: '
ttitle 'Nested Join Threshold|&nested_thr'
select
   to_char(sn.begin_interval_time,'yy-mm-dd hh24') snap_time,
   count(*)                                        ct,
   sum(st.rows_processed_delta)                    row_ct,
   sum(st.disk_reads_delta)                        disk,
   sum(st.cpu_time_delta)                          cpu
from
   dba_hist_snapshot   sn,
   dba_hist_sqlstat    st,
   dba_hist_sql_plan   sp
-- make sure that you are licensed to read the AWR tables
where
   st.snap_id = sn.snap_id
and
   st.dbid = sn.dbid
and
   st.instance_number = sn.instance_number
and
   sp.sql_id = st.sql_id
and
   sp.dbid = st.dbid
and
   sp.plan_hash_value = st.plan_hash_value
and
   sp.operation = 'NESTED LOOPS'
group by
   to_char(sn.begin_interval_time,'yy-mm-dd hh24')
having
   count(*) > &nested_thr;


  • HASH JOIN
    • how it works

PGA RAM is used to build an in-memory hash table from the smaller (driving) table. That hash table serves as the build input; rows from the larger table are then probed against it to find matches.

    • hash_area_size or pga_aggregate_target must be set to a sufficiently high value
    • temp usage
If there is not enough room in RAM, Oracle can overflow to the TEMP tablespace

    • vs nested loop
Whenever the driving table fits into the hash_area_size (pga_aggregate_target) RAM, a hash join may run faster than a nested loops join.

    • Sizing hash area in RAM
The hash area should be about 1.6 times as large as the driving table :
set heading off;
set feedback off;
set verify off;
set pages 999;
spool run_hash.sql
select
'alter session set hash_area_size='||trunc(sum(bytes)*1.6)||';'
from
   dba_segments
where
   segment_name = upper('&1');
spool off;

    • Hint
Use the parallel hint instead of setting parallelism at the table level
select /*+ use_hash(e,b) parallel(e, 63) parallel(b, 63) */
  e.ename, e.hiredate, b.com
from emp e, bonnus b
where e.ename = b.ename

    • hash_join_alert.sql

col c1 heading 'Date'             format a20
col c2 heading 'Hash|Join|Count'  format 99,999,999
col c3 heading 'Rows|Processed'   format 99,999,999
col c4 heading 'Disk|Reads'       format 99,999,999
col c5 heading 'CPU|Time'         format 99,999,999
accept hash_thr char prompt 'Enter Hash Join Threshold: '
ttitle 'Hash Join Threshold|&hash_thr'
select
   to_char(sn.begin_interval_time,'yy-mm-dd hh24')  c1,
   count(*)                                         c2,
   sum(st.rows_processed_delta)                     c3,
   sum(st.disk_reads_delta)                         c4,
   sum(st.cpu_time_delta)                           c5
from
   dba_hist_snapshot sn,
   dba_hist_sql_plan  p,
   dba_hist_sqlstat  st
where
   st.sql_id = p.sql_id
and
   sn.snap_id = st.snap_id
and
   p.operation = 'HASH JOIN'
group by
   to_char(sn.begin_interval_time,'yy-mm-dd hh24')
having
   count(*) > &hash_thr;

SQL Tuning : Using Hints


  • Good hints vs Bad hints
    • good hint : ordered
  • Automatic Query Rewrite with Hint
    • use_concat
select ename from emp
where
    deptno=10
or
   sal< 5000
or
   job='CLERK'

The first option is to create a bitmap index on each of the 3 columns; the second option is to use the use_concat hint :

select /*+ use_concat */ ename from emp
where
    deptno=10
or
   sal< 5000
or
   job='CLERK'

SQL Tuning : INDEX

- Bitmap indexes and OR clauses : Oracle automatically uses bitmap indexes for queries with multiple OR conditions on bitmap-indexed columns
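
A sketch on the sample emp table (low-cardinality columns are the natural candidates for bitmap indexes):

```sql
create bitmap index emp_deptno_bix on emp(deptno);
create bitmap index emp_job_bix    on emp(job);

-- The optimizer can combine both bitmaps with a BITMAP OR operation.
select ename
from   emp
where  deptno = 10
or     job = 'CLERK';
```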

lundi 4 mai 2015

Execution Plan


SET AUTOTRACE TRACEONLY STATISTICS EXPLAIN
select * from module where module_id=125;
==> compare rows processed with consistent gets

Costs and the Cost-based Optimizer
The cost column in an execution plan is not a reliable way to judge the real response time of a SQL statement. It is supposed to be an estimate of the number of single-block reads required.

The most common causes of inaccurate CBO costing estimates include :

Bad table join order : a sub-optimal table join order can be caused by missing histograms
Inaccurate I/O estimates : Oracle uses an estimate of single-block read time, which can be inaccurate if system statistics are missing
Buffer caching estimates : if optimizer_index_caching is not set, or if the contents of the data buffers vary greatly, the optimizer can mis-estimate the I/O costs of a query.

Reading Execution Plan

  • Explain plan supplemental information
    • Card : the number of rows the optimizer guesses the step will return, used to determine the optimal table join order
    • Bytes : the number of bytes that might be returned; a rough idea of the total baggage that must be passed to any subsequent steps
  • The Nested Loops Join
    • Driving table : usually uses a full table scan or index range scan access method; it is the table specified first after the nested loops operation
    • Second table : normally uses the index unique scan method
    • More : only one of the indexes may be accessed as a range
  • Physical disk reads : aw_top_tables_phyrd.sql
col c0 heading 'Begin|Interval|time'  format a8
col c1 heading 'Table|Name'           format a20
col c2 heading 'Disk|Reads'           format 99,999,999
col c3 heading 'Rows|Processed'       format 99,999,999
select
*
from (
select
     to_char(s.begin_interval_time,'mm-dd hh24') c0,
     p.object_name c1,
     sum(t.disk_reads_total) c2,
     sum(t.rows_processed_total) c3,
     DENSE_RANK() OVER (PARTITION BY to_char(s.begin_interval_time,'mm-dd hh24') ORDER BY SUM(t.disk_reads_total) desc) AS rnk
from
  dba_hist_sql_plan   p,
  dba_hist_sqlstat t,
  dba_hist_snapshot s
where
   p.sql_id = t.sql_id
and
   t.snap_id = s.snap_id
and
   p.object_type like '%TABLE%'
group by
   to_char(s.begin_interval_time,'mm-dd hh24'),
   p.object_name
order by
c0 desc, rnk
)
where rnk <= 5;
  • Index range scan : awr_sql_index_freq.sql
col c1 heading 'Object|Name'         format a30
col c2 heading 'Option'              format a15
col c3 heading 'Index|Usage|Count'   format 999,999
select
   p.object_name c1,
   p.options     c2,
   count(1)      c3
from
   dba_hist_sql_plan   p,
   dba_hist_sqlstat    s
where
   p.object_owner <> 'SYS'
and
   p.options like '%RANGE SCAN%'
and
   p.operation like '%INDEX%'
and
   p.sql_id = s.sql_id
group by
   p.object_name,
   p.operation,
   p.options
order by
   1,2,3;

  • Total counts for each object and table access : aw_sql_object_freq.sql
col c1 heading 'Object|Name'  format a30
col c2 heading 'Operation'    format a15
col c3 heading 'Option'       format a15
col c4 heading 'Object|Count' format 999,999
break on c1 skip 2
break on c2 skip 2
select
  p.object_name c1,
  p.operation   c2,
  p.options     c3,
  count(1)      c4
from
   dba_hist_sql_plan p,
   dba_hist_sqlstat  s
where
   p.object_owner <> 'SYS'
and
   p.sql_id = s.sql_id
group by
   p.object_name,
   p.operation,
   p.options
order by
  1,2,3;
  • Count the number of each access type : TFS (table full scan), index range scan, index unique scan, index full scan
    • plan9i.sql : v$sql_plan
    • plan10g.sql : dba_hist_sql_plan
    • Use case 1 : candidates for the keep pool (small tables under frequent TFS, < 2% of db_cache_size) and for the recycle pool (large-table TFS)
    • Use case 2 : indexes that are accessed via ROWID ==> non-range-scan access
  • Tracing SQL execution history : dba_hist_sql_plan : dbms_xplan.display_awr
  • Using SQL profiles : stored persistently in the data dictionary; a profile can generate a better execution plan than normal optimization because it is tested against a real-world workload in the SQL Tuning Set (STS)
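
The AWR plan history for one statement can be pulled back as follows (&sql_id is a substitution variable to fill in; reading AWR requires the Diagnostics Pack license):

```sql
-- Display every plan captured in AWR for the given sql_id.
select *
from   table(dbms_xplan.display_awr('&sql_id'));
```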

dimanche 3 mai 2015

SQL Tuning : Gather statistics

export_fixed_objects_stats
import_fixed_objects_stats

Workload Statistics
multiblock read time (mreadtim)
sequential (single-block) read I/O time (sreadtim)
Both are updated by the dbms_stats.gather_system_stats procedure

Noworkload statistics : cpuspeednw, ioseektim, and iotfrspeed
Workload statistics : sreadtim, mreadtim, cpuspeed, mbrc, maxthr and slavethr, updated by the dbms_stats.gather_system_stats procedure
If both are available, the CBO will use the workload statistics

select
  sname,
  pname,
  pval1
from
  sys.aux_stats$;

 sreadtim : single-block read time in milliseconds
 mreadtim : multiblock read time in milliseconds
 cpuspeed : CPU speed
 mbrc : average number of blocks read per multiblock read
 maxthr : maximum I/O throughput
 slavethr : slave throughput


The dbms_stats.gather_system_stats procedure is important for the choice between TFS and index scan

execute dbms_stats.gather_system_stats('START');
-- one hour delay during high workload
execute dbms_stats.gather_system_stats('STOP');

or
execute dbms_stats.gather_system_stats('INTERVAL', interval => 60);



SQL Tuning : Histograms

Gathering with skewonly is time consuming
Histograms are only useful for columns used in the where clause
Histograms are used in only 2 cases :
- Table join order
- Table access : TFS vs Index Scan

3 options
method_opt=> 'for all columns size skewonly'
method_opt=> 'for all columns size repeat'
method_opt=> 'for all columns size auto'
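
Any of the three settings plugs into dbms_stats; a sketch using the sample SCOTT.EMP table:

```sql
begin
  dbms_stats.gather_table_stats(
    ownname    => 'SCOTT',
    tabname    => 'EMP',
    method_opt => 'for all columns size skewonly',
    cascade    => true);   -- also refresh the index statistics
end;
/
```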


1) for all columns size skewonly
2) Monitor (alter table toto monitoring) : no default?
3) for all columns size auto
4) for all columns size repeat
5) periodically skewonly & auto, when a new column is added


A low-cardinality column (< 254 distinct values) will create a frequency histogram
A high-cardinality column (> 254 distinct values) will create a height-balanced histogram

Use histograms on foreign key columns; a missing histogram can lead to a bad join order. Set a histogram instead of using the ORDERED hint
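
A histogram can also be targeted at just the foreign key column rather than all columns (names follow the sample schema; 254 is the maximum bucket count for classic histograms):

```sql
begin
  dbms_stats.gather_table_stats(
    ownname    => 'SCOTT',
    tabname    => 'EMP',
    method_opt => 'for columns deptno size 254');  -- FK column only
end;
/
```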