[转载] ORMs under the hood
原文: http://www.vertabelo.com/blog/technical-articles/orms-under-the-hood
It often happens that if something is loved, it is also hated with the same power. The idea of object relational mapping fits into this concept perfectly. You will definitely come across many opposite points and fierce discussions as well as unwavering advocates and haters. So if you have ever asked whether to use ORM or not, the answer “it depends” will not be enough.
What is an object-relational mapping (ORM)?
Object-relational mapping refers to synchronization between two different representations of data. From one side, there is a relational database, where the data is represented in terms of tuples, grouped into relations, while in application we manipulate objects.
Most contemporary applications are built in object oriented technologies like Java or C# and use mostly relational databases to store data. Since it is definitely more convenient when data retrieved from the database can be accessed in the form of objects, there are also many implementations of ORM for many languages and frameworks (current list is quite huge ).
There is a big controversy connected to this topic which forces us to ask: What exactly are the benefits of an ORM and what are its detriments? Wading through different blog posts and discussions I will start from the opinion of Megan Bowra-Dean (@megahbite):
They’re great 99% of the time in MVC frameworks but that other 1% causes a lot of headaches if you don’t know SQL.
It’s said that ORM allows a developer to completely separate themselves from the database and almost forget about it (apart from situations when database changes need to be performed). Indeed a large number of developers are not very passionate about SQL and database concepts, and they live with the illusion that ORM will handle the job for them. Consequently the database sinks into oblivion till some problem appears. Neglecting the database while using an ORM is a big mistake and causes many problems.
Why SQL is so important while using an ORM?
Generally, there are two approaches that are followed. The first one is “Domain model centric” in which the code drives database design, while the second one, “Relational model centric”, places database design as the first step followed by the code. Both approaches have their pros and cons, but the second one is earning more advocates, mostly due to the database-first approach and requirements that are impossible to avoid like knowing about the principles of relational database and SQL.
So, what are the most common dangers associated with using ORM and not knowing SQL?
Danger 1: Not knowing database data types. This usually involves using too-general data types like ‘text’ type in PostgreSQL for most of the columns. This type is introduced to store strings of unlimited length what is identified with CLOB type. Multiple columns with this type cause an enormous performance challenge.
Danger 2: Not knowing constraints. This approach is quite often seen in many database designs. As joins become the most vulnerable case, in this approach they are overused which is extremely inefficient.
Danger 3: Not knowing basic database concepts : for example, not knowing about indexes. Without these smart structures, retrieving data becomes a performance nightmare. A system with data stored in a database with no indexes is a sign of imminent death for the application.
These three points are some of the many things the beginner developer need to know. A strong enthusiast of SQL and the creator of jOOQ (SQL library for Java) – Lukas Eder (@lukaseder), confirms a painful truth that many developers don’t know SQL:
We’re just not exposed to SQL nowadays. But consider this: We developers (or our customers) are paying millions of dollars every year to Oracle, Microsoft, IBM, SAP for their excellent RDBMS, only to ignore 90% of their database features and to perform a little bit of CRUD and a little bit of ACID with ORMs like Hibernate. We have forgotten about why those RDBMS were so expensive in the first place.
Source: “NoSQL? No, SQL! – How to Calculate Running Totals”
Use only SQL or stay with ORM?
The thesis of the article “What ORMs have taught me: just learn SQL” by Geoff Wozniak:
Attribute creep and excessive use of foreign keys shows me that in order to use ORMs effectively, you still need to know SQL. My contention with ORMs is that, if you need to know SQL, just use SQL since it prevents the need to know how non-SQL gets translated to SQL.
This article caused an avalanche of replies and contradictory discussions. Most of them were focusing on abandoning ORM and instead using only SQL or staying with it and suffering from learning both the ORM and SQL. It still appears to be an unresolved problem.
One of the first-class benefits of ORM is its ease-of-use and the fact that it requires fewer lines of code. It definitely makes development quicker but only in case you’re familiar with a particular ORM. For others, ORM is a waste of time spent resolving persistence related issues. Not knowing it imposes some learning time and cumbersome debugging as a consequence of lack of experience. Indeed, with an ORM, you have no control over what is happening “under the hood.” Sometimes there are many configurations that are difficult to manage efficiently, many weird behaviors and in many cases problems with generating an optimized SQL query with an ORM. And here a common, popular story gets repeated: an ORM presents nice development initially and then there are problems when tracking ORM related bugs and inefficiencies. The statement of fellow ORM developer Gavin King – the creator of Hibernate rings true.
At this point I could come to the conclusion that ORM is not a universal tool and that all the haters are justified, because ORMs were designed this way on purpose, like the concept of ‘leaky abstraction.’ In his comments on Reddit, Gavin King explains this concept as:
the leakiness of the ORM abstraction is a feature, not a bug. It’s meant to be that way, because both object model and the relational model are valid ways of looking at the data, and both are useful
systems like Hibernate are intentionally designed as leaky abstractions so that it’s possible to easily mix in native SQL where necessary
Not only Gavin, but many others that have problems with ORMs say that it is connected to not enough knowledge about SQL, relational databases and the relational model instead of the way that ORM works with SQL. The commonly cited example of a developer with “a single object” with 600 attributes and 14 joins perfectly proves this phenomenon.
How does ORM work?
The main postulate that characterizes ORM is that it encapsulates database interaction inside an object. One part of the object keeps the data and the second one knows how to deal with the data and transfers it to the relational database (this functionality is implemented inside the ORM engine).
ORMs design patterns
According to Martin Fowler (@martinfowler), there are two patterns that were adopted in different object relational mappers.
The first one is Active Record and it is “An object that wraps a row in a database table or view, encapsulates the database access, and adds domain logic on that data.” It means that a record on a table in a relational database is represented as an object.
Source: http://www.martinfowler.com/eaaCatalog/activeRecord.html
The second one is a Data Mapper.
According to the official definition, it is “A layer of Mappers that moves data between objects and a database while keeping them independent of each other and the mapper itself.” That means that there is a special layer that separates in-memory objects from the database and its main responsibility is to transfer data between those two layers.
Source: http://martinfowler.com/eaaCatalog/dataMapper.html
There are a lot discussions about the supremacy of one over the other. Generally speaking, the Data Mapper pattern lacks the efficiencies when it comes to CRUD operations. In comparison to the Active Record pattern, the representation of an object is not necessarily a record in the database. The strongest feature of the Data Mapper design is the one API for various data stores (relational and NO SQL databases: external data stores accessed through REST, file based data storage or schema free databases).
Popular ORMs
Here is the list of some popular ORMs for the three leading programming languages: Python, Java and PHP.
ORMs using Active Record pattern:
Python | Java | PHP | ||
---|---|---|---|---|
ORMs using Data Mapper pattern:
Python | Java | PHP | ||
---|---|---|---|---|
In brief: advantages and disadvantages of using ORM
Let’s gather together the most important points for and against ORM.
FOR:
- let the developer think in terms of objects rather than tables
- no need to write SQL code
- many advanced features like lazy loading
- database independent: No need to write code specific to a particular database
- reduces code and allows developers to focus on their business logic, rather than complex database queries
- various ORMs provide a rich query interface
AGAINST:
- complex (because they handle a bidirectional mapping). Their complexity implies a grueling learning curve – they have a special query language which developers have to learn
- provides only a leaky abstraction over a relational data store
- usually systems using an ORM perform badly (due to naive interactions with the underlying database)
- ORM, by adding a layer of abstraction, speeds up the development but adds overhead to the application
Depending on your problem domain, the cons may outweigh the pros and so you might just want to stick with straight SQL. It’s important to know when to use and when not to use ORM.
SAY YES TO ORM:
- it is intended for OLTP applications
SAY NO TO ORM:
- not intended for batch processing
- not recommended where there is a need to perform huge analysis
ORMs by example
Considering the below simple model, let’s see how some example usage looks in top ORMs in Python, Java and PHP:
↑ Click on a logo to preview the model in Vertabelo | Download the model as a png file
The ddl for this example looks as follows:
-- tables -- Table: client CREATE TABLE client ( id int NOT NULL , full_name varchar (255) NOT NULL , email varchar (255) NOT NULL , CONSTRAINT client_pk PRIMARY KEY (id) ); -- Table: purchase CREATE TABLE purchase ( id int NOT NULL , purchase_no char (12) NOT NULL , client_id int NOT NULL , CONSTRAINT purchase_pk PRIMARY KEY (id) ); -- foreign keys -- Reference: client_purchase (table: purchase) ALTER TABLE purchase ADD CONSTRAINT client_purchase FOREIGN KEY (client_id) REFERENCES client (id) NOT DEFERRABLE INITIALLY IMMEDIATE ; |
Python
Let’s start from one of the most-used Python ORM – SQLAlchemy. This framework provides an ORM with an adopted Data Mapper pattern. It contains two parts. The first one is ORM, while the second one is the Core. The Core is a fully featured SQL abstraction toolkit with SQL Expression Language that enable you to write SQL statements via generative Python expressions. Advanced ORM features like unit of work (confines all the database manipulation code to a specific database session that controls the life cycles of every object in that session. Session here is a beefed up version of database transaction), in-memory collections, eager loading of collections via joins and secondary subselects, and other optimizations allow SQLAlchemy’s ORM to emit efficient queries, select rows from not only tables, but also joins and other select statements and compose it to various Python structures.
In order to map tables to Python classes we have to define subclasses of declarative_base()
. If you are using Vertabelo as a database modeling tool, you can provide some automation and generate it using open-source script hosted on Github, that takes as an argument the model structure in Vertabelo XML (you can generate it by clicking on XML button in the application). Follow full tutorial here or you can simply write the classes by hand.
The model looks as follows:
import sqlalchemy from sqlalchemy import create_engine from sqlalchemy.ext.declarative import declarative_base from sqlalchemy import Column, Integer, BigInteger,String, ForeignKey, Unicode , Binary, LargeBinary, Time, DateTime, Date, Text, Boolean from sqlalchemy.orm import relationship, backref, deferred from sqlalchemy.orm import sessionmaker Base = declarative_base() class Purchase (Base): __tablename__ = "purchase" id = Column( 'id' , Integer, primary_key = True ) purchase_no = Column( 'purchase_no' , Unicode ) client_id = Column( 'client_id' , Integer, ForeignKey( 'client.id' )) client = relationship( 'Client' , foreign_keys = client_id) class Client (Base): __tablename__ = "client" id = Column( 'id' , Integer, primary_key = True ) full_name = Column( 'full_name' , Unicode ) email = Column( 'email' , Unicode ) |
To carry out a small investigation of this ORM we will create tables in SQLite database from the generated Python classes and try some Python expressions to perform some insert.
Run the interactive Python console and create a database engine for our database session.
>>> from sqlalchemy import create_engine |
Import classes from generated output.py file
>>> from output import * |
Create a database session and database tables for classes defined as Client
and Purchase
. The lines provided below will create a session
object which will be bound to the sqlite database. All of the database manipulation code and objects will be attached to a database session.
>>> from sqlalchemy.orm import sessionmaker >>> session = sessionmaker() >>> session.configure(bind = engine) >>> Base.metadata.create_all(engine) |
Any changes performed on a database are done in the context of a session. Multiple changes can be grouped in a session via session.add(object)
. To register all the changes in a database, issue a statementdb.session.commit()
.
Inserting data can be performed in three steps.
Create a session object
- Create a Python object of
Client
andPurchase
classes - Add it to the session
- Commit the session
>>> s = session() >>> c = Client(full_name = ’client1’, email = ’client1@gmail.com’) >>> p = Purchase(purchase_no = u’abcd123’, client_id = c. id ) >>> s.add(c) >>> s.add(p) >>> s.commit() |
To retrieve the database objects, we call query()
and filter()
methods from the databasesession
object.
>>> c = s.query(Client). filter (Client.full_name = = 'client1' ).one() >>> c.full_name u 'client1' >>> c.email u 'client1@gmail.com' >>> p = s.query(Purchase). filter (Purchase.client_id = = c. id ).one() >>> p.purchase_no u 'abcd123' |
Python frameworks to use with SQLAlchemy:
- Pyramid
- CherryPy
- Flask (via Flask-SQLAlchemy plugin)
- Web2py
Java
jOOQ – is a database-mapping tool that implements the Active Record pattern, and is commonly considered a “happy medium between ORMs and JDBC”, because it embraces SQL rather than trying to hide it. jOOQ focuses on a relational model centric approach and assumes that database design should drive Java code. One of the features is possibility to construct queries that has similar structure to SQL.
Via jOOQ you can generate Java classes that model tables, records, sequences, stored procedures and many more. There are three ways to generate the jOOQ classes. You can generate jOOQ classse from the existing database or using Vertabelo-jOOQ integration. Having already prepared the database structure in Vertabeloyou can export it as a Vertabelo XML file and add as a property in the codegen configuration file (Full tutorialhere) or you can download the model directly from the cloud by providing the Vertabelo API token (model identifier) to the codegen file (full tutorial here). More information can be found here.
For example, an insert for our client-purchase example can be visualized in the code as follows:
// Create a new record ClientRecord client = create.newRecord(CLIENT); // Insert the record: INSERT INTO CLIENT VALUES (‘John’, ‘john@gmail.com’) client.setFullName(‘John’); client.setEmail(‘john @gmail .com’); client.store(); |
Or we can loop over the clients returned from SELECT statements and delete those whose full_name is ‘John’.
for (PurchaseRecord purchase : create .selectFrom(PURCHASE) .where(PURCHASE.CLIENT_ID.in( select(CLIENT.ID) .from(CLIENT) .where(CLIENT.FULL_NAME.equal( "John" )) )) { purchase.delete(); } |
PHP
Propel – is a PHP ORM with implemented Active Record pattern. It describes the model structure (tables, columns, relationships) in an XML file called the schema and generates classes based on the schema definition of tables.
The schema.xml can be generated by using a vertabeloPropel script. For more details follow this tutorial.
After generation, the structure of the database schema looks like the following:
<?xml version= "1.0" ?> <database name= "my_db" defaultIdMethod= "native" > <table name= "purchase" > <column name= "id" type= "INTEGER" sqlType= "int" required= "true" primaryKey= "true" /> <column name= "purchase_no" type= "CHAR" size= "12" sqlType= "char(12)" required= "true" /> <column name= "client_id" type= "INTEGER" sqlType= "int" required= "true" /> <foreign-key foreignTable= "client" name= "client_purchase" onDelete= "none" onUpdate= "none" > <reference foreign= "id" local= "client_id" /> </foreign-key> </table> <table name= "client" > <column name= "id" type= "INTEGER" sqlType= "int" required= "true" primaryKey= "true" /> <column name= "full_name" type= "VARCHAR" size= "255" sqlType= "varchar(255)" required= "true" /> <column name= "email" type= "VARCHAR" size= "255" sqlType= "varchar(255)" required= "true" /> </table> </database> |
Having described the structure, we can generate the SQL code, by issuing:
$ propel sql:build |
and then generate model classes, by running:
$ propel model:build |
Adding new data in Propel can be performed in few steps
- Create a Propel object
- Define a value for each column by using
setXXX()
method - Call
save()
method.
$client = new Client(); $client ->setFullName( 'client1' ); $client ->setEmail( 'client1@gmail.com' ); $client ->save(); |
To read a row from a database, we use generated Query object and generated findPK()
method. Reading object properties is done via calling the getter on the column.
$firstClient = ClientQuery::create()->findPK(1); echo $firstClient ->getFullName(); |
where line $firstClient = ClientQuery::create()->findPK(1);
issues a simple SQL query.
SELECT client.id, client.full_name, client.email FROM client WHERE client.id = 1 LIMIT 1; |
Yes, the problem of Object Relational Mapping is definitely a hard one. Most of the discussion still keeps revolving around the thoughts: “I would like to have something that lacks typical Hibernate or another ORM defects, but I don’t have a viable replacement. I am not planning to write all the code in pure SQL/JDBC, therefore I will stick to what I know.” The problem of Object Relational Mapping is still unresolved and discussions about it still become a battlefield. How do you handle the ORM problem? When do you say YES or NO to using ORM?
[转载] ORMs under the hood的更多相关文章
- python 常用模块(转载)
转载地址:http://codeweblog.com/python-%e5%b8%b8%e7%94%a8%e6%a8%a1%e5%9d%97/ adodb:我们领导推荐的数据库连接组件bsddb3:B ...
- 【转载】LINUX 和 WINDOWS 内核的区别
LINUX 和 WINDOWS 内核的区别 [声明:欢迎转载,转载请注明出自CU ACCESSORY http://linux.chinaunix.net/bbs/thread-1153868-1-1 ...
- Valgrind使用-转载
完全转载的,仅当学习使用 对应链接地址:http://www.gjprj.cn/news1.asp?id=2862 另见valgrind的使用简介,http://blog.csdn.net/sduli ...
- Crystal Clear Applied: The Seven Properties of Running an Agile Project (转载)
作者Alistair Cockburn, Crystal Clear的7个成功要素,写得挺好. 敏捷方法的关注点,大家可以参考,太激动所以转载了. 原文:http://www.informit.com ...
- RTP与RTCP协议介绍(转载)
RTSP发起/终结流媒体.RTP传输流媒体数据 .RTCP对RTP进行控制,同步.RTP中没有连接的概念,本身并不能为按序传输数据包提供可靠的保证,也不提供流量控制和拥塞控制,这些都由RTCP来负责完 ...
- 《Walking the callstack(转载)》
本文转载自:https://www.codeproject.com/articles/11132/walking-the-callstack Download demo project with so ...
- [转载]MVVM模式原理分析及实践
没有找到很好的MVVM模式介绍文章,简单找了一篇,分享一下.MVVM实现了UI\UE设计师(Expression Blend 4设计界面)和软件工程师的合理分工,在SilverLight.WPF.Wi ...
- [转载]:STM32为什么必须先配置时钟再配置GPIO
转载来源 :http://blog.csdn.net/fushiqianxun/article/details/7926442 [原创]:我来添两句,就是很多同学(包括我)之前搞低端单片机,到了stm ...
- [转载]从MyEclipse到IntelliJ IDEA-让你摆脱鼠标,全键盘操作
从MyEclipse转战到IntelliJ IDEA的经历 注转载址:http://blog.csdn.net/luoweifu/article/details/13985835 我一个朋友写了一篇“ ...
随机推荐
- SQLITE3 使用总结
转自: http://blog.chinaunix.net/uid-8447633-id-3321394.html 前序: Sqlite3 的确很好用.小巧.速度快.但是因为非微软的产品,帮助文档总觉 ...
- AjaxPro 的基本用法
通过 Ajax可以直接访问后台的代码 实现的步骤: 一 ,添加 引用 AjaxPro.2.dll 文件 二 配置配置文件 <httpHandlers> <add verb=" ...
- div使用jqueryui 源码 | gridview多个功能的源码
div使用jqueryui 源码 | gridview多个功能的源码 一.gridview 选中行 改变颜色,双击选中 改变颜色 protected void gv1_SelectedIndexCha ...
- android 数据库操作
做android的数据库方面用的很少,所以用的时候记录下来,下次碰到直接copy,一下代码是最基本的实现: 首先是继承helper类: package com.create.rycreateim.db ...
- MySQL数据类型总结
MySQL中的数据类型大的方面来分,可以分为:日期和时间.数值,以及字符串.下面就分开来进行总结. 日期和时间数据类型 MySQL数据类型 含义 date 3字节,日期,格式:2014-09-18 t ...
- ACM第二站————归并排序
转载请注明出处,谢谢!http://www.cnblogs.com/Asimple/p/5459664.html 归并排序————二分的思想 以中间的数为基准,每次排序都将比其小[升序排](大[降序排 ...
- 在ubuntu上搭建turnserver
这边文章的目的:搭建turnserver,设定开机启动 1.下载turnserver的源码,最新的地址https://code.google.com/p/rfc5766-turn-server/ no ...
- java 反编译和文档生成器
挺有趣的东西. 代码在上一篇中有. 反编译器javap.exe javadoc.exe制作源文件类结构的html格式文档
- C# .ToString() 格式化
c# ToString() 格式化字符串 格式化数值:有时,我们可能需要将数值以一定的格式来呈现,就需要对数值进行格式化.我们使用格式字符串指定格式.格式字符串采用以下形式:Axx,其中 A 为格式 ...
- Educational Codeforces Round 1 A
In this problem you are to calculate the sum of all integers from 1 to n, but you should take all po ...