新浦京81707con > 注册购买 > 统计信息,PLSQL_统计信息系列1_统计信息的概念和

原标题:统计信息,PLSQL_统计信息系列1_统计信息的概念和

浏览次数:93 时间:2020-01-15

本节差相当的少介绍了PostgreSQL数据库中执会考察计算局计消息有关的数额词典表及其构造,包蕴pg_class、pg_statistic和pg_statistic_ext。

2014-12-18 Created By BaoXinjian

一、pg_class

在pg_class数据字典表中,存款和储蓄了Relation的多个总括音讯:页面占用总量relpages和元组总的数量reltuples,在优化器实施物理优化时用来估摸访谈路线的启航资金和总花销.测量试验脚本如下,在t_grxx表中插入10w数据,成立2个目录:

drop table if exists t_grxx;create table t_grxx(dwbh varchar,grbh varchar,xm varchar,xb varchar,nl int);insert into t_grxx(dwbh,grbh,xm,xb,nl) select generate_series/10||'',generate_series,'XM'||generate_series,(case when (floor*2)=0) then '男' else '女' end),floor * 100   1)::int;create index idx_t_grxx_grbh on t_grxx;create index idx_t_grxx_dwbh on t_grxx;

根底关系t_grxx中的总括音讯

testdb=# select relpages,reltuples from pg_class where relname = 't_grxx'; relpages | reltuples ---------- ----------- 726 | 100000 --> 页面数726,元组数10,000

索引idx_t_grxx_grbh的总计音讯

testdb=# select relpages,reltuples from pg_class where relname = 'idx_t_grxx_grbh'; relpages | reltuples ---------- ----------- 276 | 100000 --> 页面数276,元组数10,000

图片 1一、摘要

二、pg_statistic

pg_statistic是PG中存款和储蓄总计音信的严重性数据辞书表.通过命令ANALYZE生成总括数据,这么些总结数据在布置阶段提须要优化器使用,是开支估量的基本功.pg_statistic的表构造如下:

testdb=# d pg_statistic Table "pg_catalog.pg_statistic" Column | Type | Collation | Nullable | Default ------------- ---------- ----------- ---------- --------- starelid | oid | | not null | staattnum | smallint | | not null | stainherit | boolean | | not null | stanullfrac | real | | not null | stawidth | integer | | not null | stadistinct | real | | not null | stakind1 | smallint | | not null | stakind2 | smallint | | not null | stakind3 | smallint | | not null | stakind4 | smallint | | not null | stakind5 | smallint | | not null | staop1 | oid | | not null | staop2 | oid | | not null | staop3 | oid | | not null | staop4 | oid | | not null | staop5 | oid | | not null | stanumbers1 | real[] | | | stanumbers2 | real[] | | | stanumbers3 | real[] | | | stanumbers4 | real[] | | | stanumbers5 | real[] | | | stavalues1 | anyarray | | | stavalues2 | anyarray | | | stavalues3 | anyarray | | | stavalues4 | anyarray | | | stavalues5 | anyarray | | | Indexes: "pg_statistic_relid_att_inh_index" UNIQUE, btree (starelid, staattnum, stainherit)

中间:starelid:数据列所属的关联或索引Oidstaattnum:数据列编号stainherit:是还是不是三番一回表stanullfrac:NULL值所占的百分比stawidth:平均行大小,以字节为单位stadistinct:非NULL值的并世无双值音信.>0,表示有一些个唯意气风发值;=0,表示未知;<0,依据重复值现身次数求得:*stadistinct

x(1-stanullfrac)/timesstakindN:在N号slot槽中存款和储蓄的是哪一类类型的总计音讯staopN:在N号槽中所动用的操作符,如"=","<",">"等stanumbersN:在N号槽中存款和储蓄的数值类型总计音信stavaluesN:在N号槽中存放的数据值(使用anyarray类型State of Qatar

总括新闻项目,在src/include/catalog/pg_statistic.h文件中有连带解释,定义如下:

#define STATISTIC_KIND_MCV 1#define STATISTIC_KIND_HISTOGRAM 2#define STATISTIC_KIND_CORRELATION 3#define STATISTIC_KIND_MCELEM 4#define STATISTIC_KIND_DECHIST 5#define STATISTIC_KIND_RANGE_LENGTH_HISTOGRAM 6#define STATISTIC_KIND_BOUNDS_HISTOGRAM 7

MCV=most common values,最多如牛毛值,即高频值HISTOGRAM=HISTOGRAM,数据布满直方图COENCORERELATION=COPAJERORELATION,相关周到,该列未排序的数据遍及与排序后的数据遍布的相关性MCELEM=most common elements,与MCV相像,存款和储蓄的是最普及非NULL值DECHIST=distinct elements count histogram,描述数组类型列的每行中分化值的布满。RANGE_LENGTH_HISTOGRAM=length histogram,以行为单位的偏离类型列的尺寸布满。BOUNDS_HISTOGRAM=与HISTOGRAM类似,面向range-type column

仍以t_grxx表为例说明该表的总结信息表构造:

testdb=# d t_grxx Table "public.t_grxx" Column | Type | Collation | Nullable | Default -------- ----------------------- ----------- ---------- --------- dwbh | character varying | | | grbh | character varying | | | xm | character varying | | | xb | character varying | | | nl | integer | | | Indexes: "idx_t_grxx_dwbh" btree  "idx_t_grxx_grbh" btree 

数码列信息:

testdb=# select attrelid,attname,attnum from pg_attribute where attrelid = 16742 order by attnum; attrelid | attname | attnum ---------- ---------- -------- 16742 | tableoid | -7 16742 | cmax | -6 16742 | xmax | -5 16742 | cmin | -4 16742 | xmin | -3 16742 | ctid | -1 16742 | dwbh | 1 16742 | grbh | 2 16742 | xm | 3 16742 | xb | 4 16742 | nl | 5 

dwbh/grbh/xm/xb/nl那5列的号码分别是三分之一/3/4/5.查询常规的总结音信

testdb=# select starelid,staattnum,stainherit,stanullfrac,stawidth,stadistinct from pg_statistic where starelid = 16742; starelid | staattnum | stainherit | stanullfrac | stawidth | stadistinct ---------- ----------- ------------ ------------- ---------- ------------- 16742 | 1 | f | 0 | 4 | 9984 16742 | 2 | f | 0 | 5 | -1 16742 | 3 | f | 0 | 7 | -1 16742 | 4 | f | 0 | 4 | 2 16742 | 5 | f | 0 | 4 | 100

stanullfrac为0表示一向不NULL值stawidth表示各列的平均行大小,譬如dwbh平均行大小为4个字节stadistinct表示唯大器晚成值音信,比方dwbh,10000行中有99捌十六个唯生机勃勃值,而xb只有2个,grbh和xm则海市蜃楼双重值.

查询其余总计音信

testdb=# select starelid,staattnum,stakind1,stakind2,stakind3,stakind4,stakind5 from pg_statistic where starelid = 16742 order by staattnum; starelid | staattnum | stakind1 | stakind2 | stakind3 | stakind4 | stakind5 ---------- ----------- ---------- ---------- ---------- ---------- ---------- 16742 | 1 | 1 | 2 | 3 | 0 | 0 16742 | 2 | 2 | 3 | 0 | 0 | 0 16742 | 3 | 2 | 3 | 0 | 0 | 0 16742 | 4 | 1 | 3 | 0 | 0 | 0 16742 | 5 | 1 | 3 | 0 | 0 | 0

第1列,在第51%/3个槽中分头存款和储蓄了系列为第59%/3种类型的计算新闻,即STATISTIC_KIND_MCV/STATISTIC_KIND_HISTOGRAM/STATISTIC_KIND_CO君越RELATION,其余槽未有内容第2列,在第二分之一个槽中分头存款和储蓄了连串为第2/3种档期的顺序的计算消息,即STATISTIC_KIND_HISTOGRAM/STATISTIC_KIND_COXC60RELATION,别的槽未有内容第5列,在第四分一个槽中分头存款和储蓄了系列为第1/4种档案的次序的总括音信,即STATISTIC_KIND_MCV/STATISTIC_KIND_COPAJERORELATION,别的槽未有内容首先列总计信息

testdb=# xExpanded display is on.testdb=# select staattnum,stakind1,staop1,stanumbers1,stavalues1,testdb-# stakind2,staop2,stanumbers2,stavalues2,testdb-# stakind3,staop3,stanumbers3,stavalues3testdb-# from pg_statistic testdb-# where starelid = 16742 testdb-# and staattnum = 1;-[ RECORD 1 ]---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------staattnum | 1 -->dwbhstakind1 | 1 -->STATISTIC_KIND_MCV,高频值统计信息staop1 | 98 -->"=",查询pg_operator,select * from pg_operator where oid=98;stanumbers1 | {0.0003} -->高频值比例,即3/10000stavalues1 | {24} -->表示'24'这个值stakind2 | 2 -->STATISTIC_KIND_HISTOGRAM,直方图staop2 | 664 -->"<",左闭右开的区间stanumbers2 | -->无stavalues2 | {0,1084,1172,1262,1356,1441,1533,1622,1703,1790,1885,198,207,2165,2259,2351,2447,2530,262,2708,2799,2895,2985,3087,3172,3262,3359,3459,3541,3629,3716,3805,3900,3995,4078,417,4257,4345,4432,4515,461,47,479,489,4985,5069,5154,5244,533,542,5510,5596,5686,5786,587,5963,6053,6145,6232,6323,6412,650,6599,6686,6779,6868,6957,7038,7127,7218,7305,7403,7495,7588,7679,7767,7857,7942,8037,8125,8209,8307,8392,8481,8575,8664,8755,8844,8935,9022,9115,9202,9296,9378,9464,9561,965,9731,982,9906,9999} -->剔除高频值后的数据分布,假定平均分布,0为MIN值,9999为MAX值 -->[0,1084),[1084,1172),...[9906,9999)stakind3 | 3 -->STATISTIC_KIND_CORRELATION,相关系数staop3 | 664 -->"<"stanumbers3 | {0.817163}-->未排序和已排序的数据分布,有81.7%的相关性,值越高,顺序扫描结果越近似于排序stavalues3 | -->无

第二列总计音信

testdb=# xExpanded display is on.testdb=# select staattnum,stakind1,staop1,stanumbers1,stavalues1, stakind2,staop2,stanumbers2,stavalues2, stakind3,staop3,stanumbers3,stavalues3from pg_statistic where starelid = 16742 and staattnum = 2;-[ RECORD 1 ]---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------staattnum | 2 -->grbhstakind1 | 2 -->STATISTIC_KIND_HISTOGRAM,直方图staop1 | 664 -->"<"stanumbers1 | -->NULLstavalues1 | {1,1088,11753,12659,1358,14448,15354,16253,17071,17939,18872,19821,20730,21676,22630,23540,24479,25302,26198,27069,27974,2894,29836,30851,31725,32619,33587,34584,35427,36280,37164,3806,39000,39942,40786,41695,42571,43452,44330,4516,46102,46997,47900,48899,49852,50697,5157,52452,533,54204,5512,55963,5686,57860,5871,59633,60544,61468,62331,63252,64134,65016,65993,66870,67798,68687,69568,70391,71285,72205,73066,74045,74962,75881,7679,77675,78573,79434,8038,81276,82125,83078,83939,84822,85764,86654,87557,88450,89352,90249,91171,92030,92979,93797,94665,95628,96510,97328,98212,99074,99998} -->参照第一列stakind2 | 3 -->STATISTIC_KIND_CORRELATION,相关系数staop2 | 664 -->"<"stanumbers2 | {0.816172}-->未排序和已排序的数据分布stavalues2 | -->NULLstakind3 | 0 -->第3个槽无统计信息staop3 | 0stanumbers3 | stavalues3 | 

第五列计算音讯

testdb=# select staattnum,stakind1,staop1,stanumbers1,stavalues1,testdb-# stakind2,staop2,stanumbers2,stavalues2,testdb-# stakind3,staop3,stanumbers3,stavalues3testdb-# from pg_statistic testdb-# where starelid = 16742 testdb-# and staattnum = 5;-[ RECORD 1 ]-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------staattnum | 5 -->nlstakind1 | 1 -->STATISTIC_KIND_MCV,高频值统计信息staop1 | 96 -->"="stanumbers1 | {0.0117667,0.0117333,0.0113,0.0112667,0.0112333,0.0110333,0.0109333,0.0109333,0.0109333,0.0108333,0.0108,0.0108,0.0107667,0.0107667,0.0107333,0.0107333,0.0107333,0.0107,0.0106333,0.0106333,0.0106333,0.0105667,0.0105333,0.0105333,0.0105,0.0104667,0.0104667,0.0104333,0.0104333,0.0103667,0.0103,0.0102667,0.0102333,0.0102333,0.0102333,0.0102333,0.0102,0.0102,0.0102,0.0101667,0.0101667,0.0101333,0.0101,0.0101,0.0100667,0.0100333,0.0100333,0.01,0.00996667,0.00996667,0.00996667,0.00996667,0.00993333,0.00993333,0.00993333,0.00993333,0.0099,0.00986667,0.00983333,0.00983333,0.0098,0.0098,0.0098,0.00973333,0.0097,0.00966667,0.00966667,0.00963333,0.00963333,0.00963333,0.00963333,0.0096,0.0096,0.00956667,0.00953333,0.0095,0.0095,0.0095,0.0095,0.00943333,0.0094,0.00936667,0.00936667,0.00933333,0.00933333,0.0093,0.00916667,0.00916667,0.00916667,0.00913333,0.0091,0.0091,0.0091,0.00906667,0.009,0.00896667,0.00893333,0.00876667,0.00876667,0.0079} -->每个高频值的出现频率stavalues1 | {73,40,7,51,15,8,14,74,100,53,77,89,33,42,26,29,78,11,16,38,58,18,27,95,49,34,92,65,87,30,32,28,23,62,82,96,1,50,60,36,70,52,9,37,84,54,61,5,59,72,80,93,35,66,79,81,55,68,88,98,13,47,56,22,24,39,43,3,21,75,85,25,44,71,86,4,20,83,97,91,99,63,94,17,46,57,48,67,69,90,2,6,45,19,10,64,76,31,41,12} -->高频值stakind2 | 3 -->STATISTIC_KIND_CORRELATION,相关系数staop2 | 97 -->"<",整型比较,664是字符型比较stanumbers2 | {0.00562935} -->0.5%的相关性,相关性较低stavalues2 | -->NULLstakind3 | 0 -->无相关信息staop3 | 0stanumbers3 | stavalues3 | 

Statistic 对Oracle 是非常关键的。

三、pg_statistic_ext

pg_statistic_ext数据字典表用于累积多列总计音信,需选用CREATE STATISTICS命令创立计算新闻,在推行ANALYZE命令时总计.仍以t_grxx为例,总结dwbh和grbh这两列的新闻

testdb=# create statistics sta_t_grxx_dwbh_grbh on dwbh,grbh from t_grxx;CREATE STATISTICStestdb=# analyze t_grxx(dwbh,grbh);ANALYZE

查询多列计算信息

testdb=# xExpanded display is on.testdb=# select * from pg_statistic_ext;-[ RECORD 1 ]--- -----------------------------------------stxrelid | 16742 -->数据表Oidstxname | sta_t_grxx_dwbh_grbh -->统计信息名称stxnamespace | 2200 -->表空间stxowner | 10 -->统计信息的Ownerstxkeys | 1 2 -->列编号,1 2表示dwbh grbhstxkind | {d,f} -->STATS_EXT_NDISTINCT/STATS_EXT_DEPENDENCIESstxndistinct | {"1, 2": 100000} -->STATS_EXT_NDISTINCT统计信息stxdependencies | {"1 => 2": 0