Why are partial PostgreSQL HASH indices not smaller than full indices?Why does MySQL not have hash indices on MyISAM or InnoDB?Adding index to large mysql tablesWhy would function based indices I’ve created lower the cost but not show up in the explain plan breakdown?Optimizing indexesPostgreSQL indices on multi fields queriesPostgreSQL not using partial index when using boolean in WHERE clause512 Bytes are not being used from SQL Server's 8 KByte data pageIndexing strategy for VARCHAR2 LIKE searchHow is it possible for Hash Index not to be faster than Btree for equality lookups?PostgreSQL suitabilty of Hash Index on PK and FK
Why does a car's steering wheel get lighter with increasing speed
Why do we say 'Pairwise Disjoint', rather than 'Disjoint'?
When Central Limit Theorem breaks down
Why isn't P and P/poly trivially the same?
Why restrict private health insurance?
What does *dead* mean in *What do you mean, dead?*?
How does a sound wave propagate?
Why do we call complex numbers “numbers” but we don’t consider 2-vectors numbers?
Are brahmins allowed to drink alcohol?
Vector-transposing function
What is the orbit and expected lifetime of Crew Dragon trunk?
Can I challenge the interviewer to give me a proper technical feedback?
What is the oldest European royal house?
How do you make a gun that shoots melee weapons and/or swords?
How to educate team mate to take screenshots for bugs with out unwanted stuff
Does an unused member variable take up memory?
How to distinguish easily different soldier of ww2?
The (Easy) Road to Code
How to install "rounded" brake pads
Mixed Feelings - What am I
What is the purpose of a disclaimer like "this is not legal advice"?
Is divide-by-zero a security vulnerability?
Will the concrete slab in a partially heated shed conduct a lot of heat to the unconditioned area?
Is there a math expression equivalent to the conditional ternary operator?
Why are partial PostgreSQL HASH indices not smaller than full indices?
Why does MySQL not have hash indices on MyISAM or InnoDB?Adding index to large mysql tablesWhy would function based indices I’ve created lower the cost but not show up in the explain plan breakdown?Optimizing indexesPostgreSQL indices on multi fields queriesPostgreSQL not using partial index when using boolean in WHERE clause512 Bytes are not being used from SQL Server's 8 KByte data pageIndexing strategy for VARCHAR2 LIKE searchHow is it possible for Hash Index not to be faster than Btree for equality lookups?PostgreSQL suitabilty of Hash Index on PK and FK
I want to create the most efficient index for a sparsely populated column. I only need equality operations, so a HASH index should be beneficial.
Now I'm wondering why a partial HASH index isn't smaller than a full hash index:
CREATE INDEX full_hash ON mytable USING HASH(my_id); # 256 MB
CREATE INDEX partial_hash ON mytable USING HASH(my_id) WHERE my_ID IS NOT NULL; # 256 MB
CREATE INDEX full_btree ON mytable (my_id); # 537 MB
CREATE INDEX partial_btree ON mytable (my_id) WHERE my_ID IS NOT NULL; # 32 MB
Both hash indices take exactly the same amount of space (as shown in pgHero). However, when using standard BTREE indices, the partial index takes only 5% of the space of the full index.
Are partial HASH indices not supported in PostgreSQL 10?
postgresql index index-tuning postgresql-10
add a comment |
I want to create the most efficient index for a sparsely populated column. I only need equality operations, so a HASH index should be beneficial.
Now I'm wondering why a partial HASH index isn't smaller than a full hash index:
CREATE INDEX full_hash ON mytable USING HASH(my_id); # 256 MB
CREATE INDEX partial_hash ON mytable USING HASH(my_id) WHERE my_ID IS NOT NULL; # 256 MB
CREATE INDEX full_btree ON mytable (my_id); # 537 MB
CREATE INDEX partial_btree ON mytable (my_id) WHERE my_ID IS NOT NULL; # 32 MB
Both hash indices take exactly the same amount of space (as shown in pgHero). However, when using standard BTREE indices, the partial index takes only 5% of the space of the full index.
Are partial HASH indices not supported in PostgreSQL 10?
postgresql index index-tuning postgresql-10
add a comment |
I want to create the most efficient index for a sparsely populated column. I only need equality operations, so a HASH index should be beneficial.
Now I'm wondering why a partial HASH index isn't smaller than a full hash index:
CREATE INDEX full_hash ON mytable USING HASH(my_id); # 256 MB
CREATE INDEX partial_hash ON mytable USING HASH(my_id) WHERE my_ID IS NOT NULL; # 256 MB
CREATE INDEX full_btree ON mytable (my_id); # 537 MB
CREATE INDEX partial_btree ON mytable (my_id) WHERE my_ID IS NOT NULL; # 32 MB
Both hash indices take exactly the same amount of space (as shown in pgHero). However, when using standard BTREE indices, the partial index takes only 5% of the space of the full index.
Are partial HASH indices not supported in PostgreSQL 10?
postgresql index index-tuning postgresql-10
I want to create the most efficient index for a sparsely populated column. I only need equality operations, so a HASH index should be beneficial.
Now I'm wondering why a partial HASH index isn't smaller than a full hash index:
CREATE INDEX full_hash ON mytable USING HASH(my_id); # 256 MB
CREATE INDEX partial_hash ON mytable USING HASH(my_id) WHERE my_ID IS NOT NULL; # 256 MB
CREATE INDEX full_btree ON mytable (my_id); # 537 MB
CREATE INDEX partial_btree ON mytable (my_id) WHERE my_ID IS NOT NULL; # 32 MB
Both hash indices take exactly the same amount of space (as shown in pgHero). However, when using standard BTREE indices, the partial index takes only 5% of the space of the full index.
Are partial HASH indices not supported in PostgreSQL 10?
postgresql index index-tuning postgresql-10
postgresql index index-tuning postgresql-10
asked 18 hours ago
Ortwin GentzOrtwin Gentz
1324
1324
add a comment |
add a comment |
2 Answers
2
active
oldest
votes
I would argue that this is a bug in the hash index code. When you create an index on an already-populated table, it tries to pre-size the index to hold all the data so that it doesn't have to keep splitting buckets as the index is created. But the code for doing this does not take the NULL fraction of the column nor (apparently) the selectivity of the partial index clause into account, so it arrives at a too-large number for the pre-sizing.
If you were to create the index first, and then populated the table, you will find that the hash index is small, whether you made it partial or not. If the table is going to grow substantially after the index is created, the extra space consumed by the index upon original creation will be put to good use.
3
I've started a thread about this on the developers mailing list (postgresql.org/message-id/flat/…) if anyone here would like to follow it.
– jjanes
10 hours ago
1
Oh, and I submitted a bug already: postgresql.org/message-id/…
– Ortwin Gentz
9 hours ago
add a comment |
It's not explicitly stated in the documentation, but in the source code there is the following comment:
/*
* We do not insert null values into hash indexes. This is okay because
* the only supported search operator is '=', and we assume it is strict.
*/
So the is not null
predicate does indeed change nothing, as null values are always ignored for hash indexes (which does make sense, as comparing null
values with =
would never return true).
2
Interesting. So apparently, hash indexes aren't appropriate for sparsely populated columns. I tested with a column even less populated (only a few 100 records out of >10 m total) and the index took 256 MB as well. So it looks like the space of a hash index only depends on table size, not on the number of different indexable values.
– Ortwin Gentz
17 hours ago
This explains why the two HASH indexes are the same size as each other, but not why they are so large compared to the btree indexes.
– jjanes
16 hours ago
The full btree index is more than double the size of the hash index.
– Ortwin Gentz
11 hours ago
add a comment |
Your Answer
StackExchange.ready(function()
var channelOptions =
tags: "".split(" "),
id: "182"
;
initTagRenderer("".split(" "), "".split(" "), channelOptions);
StackExchange.using("externalEditor", function()
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled)
StackExchange.using("snippets", function()
createEditor();
);
else
createEditor();
);
function createEditor()
StackExchange.prepareEditor(
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: false,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: null,
bindNavPrevention: true,
postfix: "",
imageUploader:
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
,
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
);
);
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fdba.stackexchange.com%2fquestions%2f231647%2fwhy-are-partial-postgresql-hash-indices-not-smaller-than-full-indices%23new-answer', 'question_page');
);
Post as a guest
Required, but never shown
2 Answers
2
active
oldest
votes
2 Answers
2
active
oldest
votes
active
oldest
votes
active
oldest
votes
I would argue that this is a bug in the hash index code. When you create an index on an already-populated table, it tries to pre-size the index to hold all the data so that it doesn't have to keep splitting buckets as the index is created. But the code for doing this does not take the NULL fraction of the column nor (apparently) the selectivity of the partial index clause into account, so it arrives at a too-large number for the pre-sizing.
If you were to create the index first, and then populated the table, you will find that the hash index is small, whether you made it partial or not. If the table is going to grow substantially after the index is created, the extra space consumed by the index upon original creation will be put to good use.
3
I've started a thread about this on the developers mailing list (postgresql.org/message-id/flat/…) if anyone here would like to follow it.
– jjanes
10 hours ago
1
Oh, and I submitted a bug already: postgresql.org/message-id/…
– Ortwin Gentz
9 hours ago
add a comment |
I would argue that this is a bug in the hash index code. When you create an index on an already-populated table, it tries to pre-size the index to hold all the data so that it doesn't have to keep splitting buckets as the index is created. But the code for doing this does not take the NULL fraction of the column nor (apparently) the selectivity of the partial index clause into account, so it arrives at a too-large number for the pre-sizing.
If you were to create the index first, and then populated the table, you will find that the hash index is small, whether you made it partial or not. If the table is going to grow substantially after the index is created, the extra space consumed by the index upon original creation will be put to good use.
3
I've started a thread about this on the developers mailing list (postgresql.org/message-id/flat/…) if anyone here would like to follow it.
– jjanes
10 hours ago
1
Oh, and I submitted a bug already: postgresql.org/message-id/…
– Ortwin Gentz
9 hours ago
add a comment |
I would argue that this is a bug in the hash index code. When you create an index on an already-populated table, it tries to pre-size the index to hold all the data so that it doesn't have to keep splitting buckets as the index is created. But the code for doing this does not take the NULL fraction of the column nor (apparently) the selectivity of the partial index clause into account, so it arrives at a too-large number for the pre-sizing.
If you were to create the index first, and then populated the table, you will find that the hash index is small, whether you made it partial or not. If the table is going to grow substantially after the index is created, the extra space consumed by the index upon original creation will be put to good use.
I would argue that this is a bug in the hash index code. When you create an index on an already-populated table, it tries to pre-size the index to hold all the data so that it doesn't have to keep splitting buckets as the index is created. But the code for doing this does not take the NULL fraction of the column nor (apparently) the selectivity of the partial index clause into account, so it arrives at a too-large number for the pre-sizing.
If you were to create the index first, and then populated the table, you will find that the hash index is small, whether you made it partial or not. If the table is going to grow substantially after the index is created, the extra space consumed by the index upon original creation will be put to good use.
answered 16 hours ago
jjanesjjanes
13.6k917
13.6k917
3
I've started a thread about this on the developers mailing list (postgresql.org/message-id/flat/…) if anyone here would like to follow it.
– jjanes
10 hours ago
1
Oh, and I submitted a bug already: postgresql.org/message-id/…
– Ortwin Gentz
9 hours ago
add a comment |
3
I've started a thread about this on the developers mailing list (postgresql.org/message-id/flat/…) if anyone here would like to follow it.
– jjanes
10 hours ago
1
Oh, and I submitted a bug already: postgresql.org/message-id/…
– Ortwin Gentz
9 hours ago
3
3
I've started a thread about this on the developers mailing list (postgresql.org/message-id/flat/…) if anyone here would like to follow it.
– jjanes
10 hours ago
I've started a thread about this on the developers mailing list (postgresql.org/message-id/flat/…) if anyone here would like to follow it.
– jjanes
10 hours ago
1
1
Oh, and I submitted a bug already: postgresql.org/message-id/…
– Ortwin Gentz
9 hours ago
Oh, and I submitted a bug already: postgresql.org/message-id/…
– Ortwin Gentz
9 hours ago
add a comment |
It's not explicitly stated in the documentation, but in the source code there is the following comment:
/*
* We do not insert null values into hash indexes. This is okay because
* the only supported search operator is '=', and we assume it is strict.
*/
So the is not null
predicate does indeed change nothing, as null values are always ignored for hash indexes (which does make sense, as comparing null
values with =
would never return true).
2
Interesting. So apparently, hash indexes aren't appropriate for sparsely populated columns. I tested with a column even less populated (only a few 100 records out of >10 m total) and the index took 256 MB as well. So it looks like the space of a hash index only depends on table size, not on the number of different indexable values.
– Ortwin Gentz
17 hours ago
This explains why the two HASH indexes are the same size as each other, but not why they are so large compared to the btree indexes.
– jjanes
16 hours ago
The full btree index is more than double the size of the hash index.
– Ortwin Gentz
11 hours ago
add a comment |
It's not explicitly stated in the documentation, but in the source code there is the following comment:
/*
* We do not insert null values into hash indexes. This is okay because
* the only supported search operator is '=', and we assume it is strict.
*/
So the is not null
predicate does indeed change nothing, as null values are always ignored for hash indexes (which does make sense, as comparing null
values with =
would never return true).
2
Interesting. So apparently, hash indexes aren't appropriate for sparsely populated columns. I tested with a column even less populated (only a few 100 records out of >10 m total) and the index took 256 MB as well. So it looks like the space of a hash index only depends on table size, not on the number of different indexable values.
– Ortwin Gentz
17 hours ago
This explains why the two HASH indexes are the same size as each other, but not why they are so large compared to the btree indexes.
– jjanes
16 hours ago
The full btree index is more than double the size of the hash index.
– Ortwin Gentz
11 hours ago
add a comment |
It's not explicitly stated in the documentation, but in the source code there is the following comment:
/*
* We do not insert null values into hash indexes. This is okay because
* the only supported search operator is '=', and we assume it is strict.
*/
So the is not null
predicate does indeed change nothing, as null values are always ignored for hash indexes (which does make sense, as comparing null
values with =
would never return true).
It's not explicitly stated in the documentation, but in the source code there is the following comment:
/*
* We do not insert null values into hash indexes. This is okay because
* the only supported search operator is '=', and we assume it is strict.
*/
So the is not null
predicate does indeed change nothing, as null values are always ignored for hash indexes (which does make sense, as comparing null
values with =
would never return true).
edited 17 hours ago
answered 18 hours ago
a_horse_with_no_namea_horse_with_no_name
40.5k777113
40.5k777113
2
Interesting. So apparently, hash indexes aren't appropriate for sparsely populated columns. I tested with a column even less populated (only a few 100 records out of >10 m total) and the index took 256 MB as well. So it looks like the space of a hash index only depends on table size, not on the number of different indexable values.
– Ortwin Gentz
17 hours ago
This explains why the two HASH indexes are the same size as each other, but not why they are so large compared to the btree indexes.
– jjanes
16 hours ago
The full btree index is more than double the size of the hash index.
– Ortwin Gentz
11 hours ago
add a comment |
2
Interesting. So apparently, hash indexes aren't appropriate for sparsely populated columns. I tested with a column even less populated (only a few 100 records out of >10 m total) and the index took 256 MB as well. So it looks like the space of a hash index only depends on table size, not on the number of different indexable values.
– Ortwin Gentz
17 hours ago
This explains why the two HASH indexes are the same size as each other, but not why they are so large compared to the btree indexes.
– jjanes
16 hours ago
The full btree index is more than double the size of the hash index.
– Ortwin Gentz
11 hours ago
2
2
Interesting. So apparently, hash indexes aren't appropriate for sparsely populated columns. I tested with a column even less populated (only a few 100 records out of >10 m total) and the index took 256 MB as well. So it looks like the space of a hash index only depends on table size, not on the number of different indexable values.
– Ortwin Gentz
17 hours ago
Interesting. So apparently, hash indexes aren't appropriate for sparsely populated columns. I tested with a column even less populated (only a few 100 records out of >10 m total) and the index took 256 MB as well. So it looks like the space of a hash index only depends on table size, not on the number of different indexable values.
– Ortwin Gentz
17 hours ago
This explains why the two HASH indexes are the same size as each other, but not why they are so large compared to the btree indexes.
– jjanes
16 hours ago
This explains why the two HASH indexes are the same size as each other, but not why they are so large compared to the btree indexes.
– jjanes
16 hours ago
The full btree index is more than double the size of the hash index.
– Ortwin Gentz
11 hours ago
The full btree index is more than double the size of the hash index.
– Ortwin Gentz
11 hours ago
add a comment |
Thanks for contributing an answer to Database Administrators Stack Exchange!
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
To learn more, see our tips on writing great answers.
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fdba.stackexchange.com%2fquestions%2f231647%2fwhy-are-partial-postgresql-hash-indices-not-smaller-than-full-indices%23new-answer', 'question_page');
);
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown