Generating YouTube-like IDs in Postgres using PL/V8 and Hashids
Recently on a Rails project, I ran into an issue where I wanted to expose a resource (let’s say it was a product) in a RESTful route, but I didn’t want the URLs to be easily guessable. Following Rails conventions, my standard “show” actions would have URLs like https://example.com/products/1, https://example.com/products/2, https://example.com/products/3, which are trivially guessable because they expose the database’s auto-incrementing integer primary key as the resource ID. To keep people from writing a super simple script that scrapes my whole product catalog, it would be nice to make the URLs hard to guess while keeping them publicly accessible to anyone who knows them.
One approach some people advocate is simply using UUIDs, but I think URLs like https://example.com/products/3bc95fb9-f0c1-4af8-989e-6ea8467879d3 look nasty, particularly once you get into nested sub-resources with their own UUIDs tacked on. I don’t want to subject my users’ eyes to them, nor risk their extra length affecting SEO / page rank.[1]
Hashids
A nice compromise here is a library called Hashids, which takes an integer (e.g. one of our primary keys) and a salt and obfuscates[2] it into a short, hard-to-guess, YouTube-like ID, giving URLs like https://example.com/products/NV, https://example.com/products/6m, https://example.com/products/yD.
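To make “obfuscate” a little more concrete, here is a simplified JavaScript sketch of two building blocks of Hashids, modeled on the _shuffle and _toAlphabet methods in hashids.min.js: a salt-driven “consistent shuffle” that permutes the alphabet, and a base-N conversion that writes the integer using the shuffled alphabet. This is only an illustration under stated assumptions, not the full algorithm: real Hashids also removes separator characters from the alphabet, prepends a “lottery” character, and adds guards and min-length padding, so toyEncode below will not reproduce real Hashids output. The function names are hypothetical.

```javascript
// Simplified illustration of two Hashids building blocks (NOT the full algorithm).
// Function names are our own; the logic mirrors _shuffle/_toAlphabet in hashids.min.js.
const DEFAULT_ALPHABET =
  "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ1234567890";

// Deterministically permutes `alphabet` using the characters of `salt`.
// The same salt always yields the same permutation; an empty salt is a no-op.
function consistentShuffle(alphabet, salt) {
  if (salt.length === 0) return alphabet;
  const chars = alphabet.split("");
  let v = 0; // position in the salt
  let p = 0; // running sum of salt character codes
  for (let i = chars.length - 1; i > 0; i--, v++) {
    v %= salt.length;
    const code = salt.charCodeAt(v);
    p += code;
    const j = (code + v + p) % i;
    [chars[i], chars[j]] = [chars[j], chars[i]]; // swap positions i and j
  }
  return chars.join("");
}

// Writes a non-negative integer in base alphabet.length using alphabet's characters.
function toAlphabet(n, alphabet) {
  let out = "";
  do {
    out = alphabet[n % alphabet.length] + out;
    n = Math.floor(n / alphabet.length);
  } while (n > 0);
  return out;
}

// Inverse of toAlphabet.
function fromAlphabet(str, alphabet) {
  let n = 0;
  for (const ch of str) n = n * alphabet.length + alphabet.indexOf(ch);
  return n;
}

// Toy encoder/decoder: shows the salt's role, output is not Hashids-compatible.
function toyEncode(n, salt) {
  return toAlphabet(n, consistentShuffle(DEFAULT_ALPHABET, salt));
}

function toyDecode(id, salt) {
  return fromAlphabet(id, consistentShuffle(DEFAULT_ALPHABET, salt));
}

console.log(toyEncode(123, "my salt").length); // 2 (123 is two digits in base 62)
```

Because the shuffle depends only on the salt, encoding is deterministic and reversible for anyone who knows the salt, which is why it deters casual scraping rather than providing real security.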
The Hashids project links to many implementations and documentation in various languages, including Ruby. Since my project is using Rails, a simple solution would be to add an after_create callback to my model to set an attribute using the Ruby library:
# == Schema Information
#
# Table name: products
#
#  id     :integer          not null, primary key
#  title  :string
#  hashid :string
#
# Indexes
#
#  index_products_on_hashid  (hashid)
#
class Product < ActiveRecord::Base
  # Runs after the INSERT, inside the same transaction, once `id` is known.
  after_create :save_hashid

  private

  # Encodes the integer primary key as a Hashid and writes it back with an UPDATE.
  def save_hashid
    unless self.hashid
      h = Hashids.new(ENV["HASHID_SALT"], ENV["HASHID_MIN_LENGTH"].to_i)
      self.update!(hashid: h.encode(self.id))
    end
  end
end
This works! However, there are at least two drawbacks:
- Creating a Product requires two round-trips to the database: an INSERT to create the record with a NULL value in the hashid column, then an UPDATE once Rails knows the value of the integer id column and can calculate the Hashid value. This should be safe in terms of not leaving half-baked products records with NULL hashid values around, since Rails runs after_create callbacks in the same transaction that creates the record, but it’s not good performance-wise.
- Somewhat related to the first drawback, the schema for this table is not optimal: the hashid column should really have a NOT NULL constraint and a UNIQUE index, but the callback approach forces it to be nullable. It would be much better to lean on the database to enforce data integrity; at my job we’ve seen plenty of instances of bad data that Should Never Happen™ from the application’s point of view getting into loose schemas.
If only there were a way for Postgres to populate that column instead…
Executing JavaScript in Postgres using PL/V8
Luckily, there is a way to do this with PL/V8, a Postgres extension that embeds the V8 JavaScript engine in Postgres![3]
On Ubuntu, installing PL/V8 is as easy as running sudo apt-get install postgresql-9.6-plv8 (substitute 9.6 with whatever Postgres version you have installed) and restarting the database cluster with sudo service postgresql restart. Then open a SQL prompt on the database you want to enable it for and execute CREATE EXTENSION plv8;. Now you can write JavaScript functions in the database!
The first step is writing a function to load the Hashids library:
CREATE OR REPLACE FUNCTION load_hashids() RETURNS VOID AS $$
(function() {
!function(t,e){if("function"==typeof define&&define.amd)define(["module","exports"],e);else if("undefined"!=typeof exports)e(module,exports);else{var s={exports:{}};e(s,s.exports),t.Hashids=s.exports}}(this,function(t,e){"use strict";function s(t,e){if(!(t instanceof e))throw new TypeError("Cannot call a class as a function")}Object.defineProperty(e,"__esModule",{value:!0});var h=function(){function t(t,e){for(var s=0;s<e.length;s++){var h=e[s];h.enumerable=h.enumerable||!1,h.configurable=!0,"value"in h&&(h.writable=!0),Object.defineProperty(t,h.key,h)}}return function(e,s,h){return s&&t(e.prototype,s),h&&t(e,h),e}}(),r=function(){function t(){var e=arguments.length<=0||void 0===arguments[0]?"":arguments[0],h=arguments.length<=1||void 0===arguments[1]?0:arguments[1],r=arguments.length<=2||void 0===arguments[2]?"abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ1234567890":arguments[2];s(this,t);var a=16,n=3.5,i=12,l="error: alphabet must contain at least X unique characters",u="error: alphabet cannot contain spaces",p="",o=void 0,f=void 0;this.escapeRegExp=function(t){return t.replace(/[-[\]{}()*+?.,\\^$|#\s]/g,"\\$&")},this.parseInt=function(t,e){return/^(\-|\+)?([0-9]+|Infinity)$/.test(t)?parseInt(t,e):NaN},this.seps="cfhistuCFHISTU",this.minLength=parseInt(h,10)>0?h:0,this.salt="string"==typeof e?e:"","string"==typeof r&&(this.alphabet=r);for(var g=0;g!==this.alphabet.length;g++)p.indexOf(this.alphabet.charAt(g))===-1&&(p+=this.alphabet.charAt(g));if(this.alphabet=p,this.alphabet.length<a)throw l.replace("X",a);if(this.alphabet.search(" ")!==-1)throw u;for(var c=0;c!==this.seps.length;c++){var b=this.alphabet.indexOf(this.seps.charAt(c));b===-1?this.seps=this.seps.substr(0,c)+" "+this.seps.substr(c+1):this.alphabet=this.alphabet.substr(0,b)+" "+this.alphabet.substr(b+1)}this.alphabet=this.alphabet.replace(/ /g,""),this.seps=this.seps.replace(/ /g,""),this.seps=this._shuffle(this.seps,this.salt),(!this.seps.length||this.alphabet.length/this.seps.length>n)&&(o=Math.ceil(this.alphabet.length/n),o>this.seps.length&&(f=o-this.seps.length,this.seps+=this.alphabet.substr(0,f),this.alphabet=this.alphabet.substr(f))),this.alphabet=this._shuffle(this.alphabet,this.salt);var d=Math.ceil(this.alphabet.length/i);this.alphabet.length<3?(this.guards=this.seps.substr(0,d),this.seps=this.seps.substr(d)):(this.guards=this.alphabet.substr(0,d),this.alphabet=this.alphabet.substr(d))}return h(t,[{key:"encode",value:function(){for(var t=arguments.length,e=Array(t),s=0;s<t;s++)e[s]=arguments[s];var h="";if(!e.length)return h;if(e[0]&&e[0].constructor===Array&&(e=e[0],!e.length))return h;for(var r=0;r!==e.length;r++)if(e[r]=this.parseInt(e[r],10),!(e[r]>=0))return h;return this._encode(e)}},{key:"decode",value:function(t){var e=[];return t&&t.length&&"string"==typeof t?this._decode(t,this.alphabet):e}},{key:"encodeHex",value:function(t){if(t=t.toString(),!/^[0-9a-fA-F]+$/.test(t))return"";for(var e=t.match(/[\w\W]{1,12}/g),s=0;s!==e.length;s++)e[s]=parseInt("1"+e[s],16);return this.encode.apply(this,e)}},{key:"decodeHex",value:function(t){for(var e=[],s=this.decode(t),h=0;h!==s.length;h++)e+=s[h].toString(16).substr(1);return e}},{key:"_encode",value:function(t){for(var e=void 0,s=this.alphabet,h=0,r=0;r!==t.length;r++)h+=t[r]%(r+100);e=s.charAt(h%s.length);for(var a=e,n=0;n!==t.length;n++){var i=t[n],l=a+this.salt+s;s=this._shuffle(s,l.substr(0,s.length));var u=this._toAlphabet(i,s);if(e+=u,n+1<t.length){i%=u.charCodeAt(0)+n;var p=i%this.seps.length;e+=this.seps.charAt(p)}}if(e.length<this.minLength){var o=(h+e[0].charCodeAt(0))%this.guards.length,f=this.guards[o];e=f+e,e.length<this.minLength&&(o=(h+e[2].charCodeAt(0))%this.guards.length,f=this.guards[o],e+=f)}for(var g=parseInt(s.length/2,10);e.length<this.minLength;){s=this._shuffle(s,s),e=s.substr(g)+e+s.substr(0,g);var c=e.length-this.minLength;c>0&&(e=e.substr(c/2,this.minLength))}return e}},{key:"_decode",value:function(t,e){var s=[],h=0,r=new RegExp("["+this.escapeRegExp(this.guards)+"]","g"),a=t.replace(r," "),n=a.split(" ");if(3!==n.length&&2!==n.length||(h=1),a=n[h],"undefined"!=typeof a[0]){var i=a[0];a=a.substr(1),r=new RegExp("["+this.escapeRegExp(this.seps)+"]","g"),a=a.replace(r," "),n=a.split(" ");for(var l=0;l!==n.length;l++){var u=n[l],p=i+this.salt+e;e=this._shuffle(e,p.substr(0,e.length)),s.push(this._fromAlphabet(u,e))}this._encode(s)!==t&&(s=[])}return s}},{key:"_shuffle",value:function(t,e){var s=void 0;if(!e.length)return t;for(var h=t.length-1,r=0,a=0,n=0;h>0;h--,r++){r%=e.length,a+=s=e.charAt(r).charCodeAt(0),n=(s+r+a)%h;var i=t[n];t=t.substr(0,n)+t.charAt(h)+t.substr(n+1),t=t.substr(0,h)+i+t.substr(h+1)}return t}},{key:"_toAlphabet",value:function(t,e){var s="";do s=e.charAt(t%e.length)+s,t=parseInt(t/e.length,10);while(t);return s}},{key:"_fromAlphabet",value:function(t,e){for(var s=0,h=0;h<t.length;h++){var r=e.indexOf(t[h]);s+=r*Math.pow(e.length,t.length-h-1)}return s}}]),t}();e.default=r,t.exports=e.default});
})()
$$ LANGUAGE plv8 IMMUTABLE STRICT;
(The above is simply the source of hashids.min.js wrapped in an immediately invoked anonymous function.)
After executing that DDL to create the function, execute this SQL to run it:
SELECT load_hashids();
Now the Hashids global is available to any JavaScript code inside PL/V8 functions for the remainder of the SQL session (each session gets its own global JavaScript runtime context). We can do a quick test of the Hashids library inside Postgres:
DO LANGUAGE PLV8 $$
  var h = new Hashids('foo');
  plv8.elog(NOTICE, h.encode(123));
$$;
You should see NOTICE: 1yR in the output, confirming it works!
As mentioned, this global only lives as long as the SQL session, so a new connection requires rerunning SELECT load_hashids(); to make it available again. Luckily, PL/V8 supports a postgresql.conf setting that names a custom PL/V8 function to run when the JavaScript runtime is initialized. Simply add this to postgresql.conf:
plv8.start_proc = 'load_hashids'
And now that is all handled for us!
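The session-scoped global and the start_proc setting may be easier to picture with an analogy. The sketch below uses Node’s built-in vm module, where each vm context plays the role of one Postgres session’s JavaScript global scope; it is an illustration, not PL/V8 itself. The library source is a hypothetical stand-in (FakeHashids) that keeps only the UMD prologue of hashids.min.js, which shows why wrapping the library in an anonymous function is enough to define a global Hashids: in a runtime with no module, exports, or define, the prologue falls through to assigning Hashids onto its this, which is the global object.

```javascript
// Analogy only: Node vm contexts stand in for PL/V8's per-session global scope.
const vm = require("vm");

// Hypothetical stand-in for hashids.min.js: the same UMD prologue, but with a
// tiny FakeHashids class instead of the real library body.
const LIBRARY_SOURCE = `
!function(t,e){if("function"==typeof define&&define.amd)define(["module","exports"],e);else if("undefined"!=typeof exports)e(module,exports);else{var s={exports:{}};e(s,s.exports),t.Hashids=s.exports}}(this,function(t,e){
  "use strict";
  function FakeHashids(salt) { this.salt = salt; }
  FakeHashids.prototype.encode = function (n) { return this.salt + ":" + n; };
  t.exports = FakeHashids;
});`;

// Like load_hashids(): wrap the library in an immediately invoked function.
// Called without a receiver in sloppy mode, `this` inside it is the context's
// global object, so the prologue's fallback branch defines a global Hashids.
const LOAD_HASHIDS = `(function() {${LIBRARY_SOURCE}\n})()`;

// Each new "session" starts with a fresh, empty global scope.
function newSession() {
  return vm.createContext({});
}

function loadHashids(session) {
  vm.runInContext(LOAD_HASHIDS, session);
}

// Rough analogue of plv8.start_proc: run the loader whenever a runtime is created.
function newSessionWithStartProc() {
  const session = newSession();
  loadHashids(session);
  return session;
}

const session1 = newSession();
loadHashids(session1);
console.log(vm.runInContext("new Hashids('foo').encode(123)", session1)); // foo:123

const session2 = newSession(); // a "new connection" with nothing loaded
console.log(vm.runInContext("typeof Hashids", session2)); // undefined

const session3 = newSessionWithStartProc();
console.log(vm.runInContext("typeof Hashids", session3)); // function
```

The first session sees the global after loading, a fresh session does not, and a session created through the start_proc-style hook has it from the start.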
An example usage
Now let’s put it all together with an example that fixes my issue with products. First, let’s make a helper SQL function to generate Hashids that we’ll be able to call from other SQL functions (like triggers):
CREATE FUNCTION gen_hashid(salt TEXT, min_length BIGINT, key BIGINT) RETURNS TEXT AS $$
var h = new Hashids(salt, min_length);
return h.encode(key);
$$ LANGUAGE PLV8 IMMUTABLE STRICT;
This can be tested like so:
SELECT gen_hashid('foo', 5, 123);
Which should output 61yR6.
Next, here’s a little mockup of a products schema that uses a pre-insert trigger to automatically generate Hashids:
CREATE TABLE products (
  id BIGSERIAL PRIMARY KEY,
  title TEXT NOT NULL,
  hashid TEXT NOT NULL UNIQUE
);
-- Runs before each INSERT; NEW.id is already filled in from the BIGSERIAL sequence default.
CREATE FUNCTION products_pre_insert() RETURNS trigger AS $$
BEGIN
  NEW.hashid := gen_hashid('products_secret_salt_here', 3, NEW.id);
  RETURN NEW;
END;
$$ LANGUAGE plpgsql;
CREATE TRIGGER products_pre_insert BEFORE INSERT ON products FOR EACH ROW EXECUTE PROCEDURE products_pre_insert();
Now let’s test it out by inserting some test records:
INSERT INTO products (title) VALUES ('foo');
INSERT INTO products (title) VALUES ('foo');
INSERT INTO products (title) VALUES ('bar');
INSERT INTO products (title) VALUES ('baz');
And now let’s see what SELECT * FROM products returns:
id | title | hashid
----+-------+--------
1 | foo | WmX
2 | foo | 4zq
3 | bar | eJk
4 | baz | eEp
(4 rows)
Works beautifully! My problem is solved.
Note that in this example I hardcoded the salt and minimum length in the products_pre_insert() function definition. In reality, you would probably want to store salt values in a table, since each table that uses Hashids should have its own salt, and salts should not be reused between test environments and production.
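As a hedged sketch of the bookkeeping that note implies, here is a hypothetical in-memory salt registry in JavaScript, keyed by environment and table, with a check for the “no salt reused across tables or environments” rule. In Postgres this would more naturally be a salts table that products_pre_insert() reads from; SALTS, saltFor, findReusedSalts, and the salt values are illustrative placeholders only.

```javascript
// Hypothetical salt registry: one salt per (environment, table) pair, standing in
// for a salts table in the database. Names and values are placeholders.
const SALTS = {
  production: { products: "prod-products-salt", orders: "prod-orders-salt" },
  test: { products: "test-products-salt", orders: "test-orders-salt" },
};

// Looks up the salt for a table in a given environment, failing loudly if absent.
function saltFor(env, table) {
  const salt = SALTS[env] && SALTS[env][table];
  if (!salt) throw new Error(`no Hashids salt configured for ${env}.${table}`);
  return salt;
}

// Returns [firstUser, laterUser] pairs for every salt used more than once,
// which would violate the one-salt-per-table, per-environment rule.
function findReusedSalts(registry) {
  const seen = new Map(); // salt -> first "env.table" that used it
  const reused = [];
  for (const [env, tables] of Object.entries(registry)) {
    for (const [table, salt] of Object.entries(tables)) {
      const key = `${env}.${table}`;
      if (seen.has(salt)) reused.push([seen.get(salt), key]);
      else seen.set(salt, key);
    }
  }
  return reused;
}

console.log(saltFor("production", "products")); // prod-products-salt
console.log(findReusedSalts(SALTS)); // []
```

Failing loudly on a missing salt avoids silently falling back to a shared default, which would quietly break the per-table rule.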
Footnotes
[1] I’m not saying it would necessarily affect SEO today, but SEO tends to follow what Google et al. consider human-friendly, and I don’t think excessively long machine-readable IDs are. I certainly think URLs full of seemingly random gibberish that scroll far past the end of the address bar discourage people from sharing them via copy and paste.
[2] Although “hash” is in the name, the project makes clear it’s not a true cryptographic hash function (and thus not secure). But for my purposes, it’s exactly what I needed to discourage casual scraping while keeping a level of user-friendliness that a more secure solution (UUIDs, real cryptographic hash functions) wouldn’t allow.
[3] There are other Postgres extensions that add support for other languages, like PL/Python, but PL/V8 is a “trusted” Postgres language while PL/Python is “untrusted.” Trusted languages are safer because they restrict what actions a function can perform; untrusted languages can do anything the database administrator can do! This is probably why AWS RDS supports PL/V8 but not PL/Python.