Isolation is the I
in ACID, one of the 4 famous properties in database transaction.
Isolation level play a critial role in concurrent scenarios. Higher isolation level offers better gurantee of data consistency, however worse performance in concurrency.
To better understand 4 common isolation levels, let’s see the following examples.
Here we have one user
table which has only two fields, id
and name
. There is one record in the table.
-------------
| id | name |
-----+-------
| 1 | Tom |
-------------
Two transactions(TA, TB) occur concurrently:
Read Uncommitted
Read uncommitted is the lowest isolation level, which may result in dirty data, let’s have a look at this example:
#1 TA:
>> Begin();
>> select * from user where id=1;
-------------
| id | name |
-----+-------
| 1 | Tom |
-------------
#2 TB:
>> Begin();
>> update set name="Kim" from user where id=1;
#3 TA:
>> select * from user where id=1;
-------------
| id | name |
-----+-------
| 1 | Kim |
-------------
#4 TB:
>> Rollback();
#5 TA:
>> select * from user where id=1;
-------------
| id | name |
-----+-------
| 1 | Tom |
-------------
In this example, name=Kim
is the dirty data(which has been rollbacked). By right, in other transaction it shouldn’t have the knowledge of this value. But in step 3, TA accidently gets this temporary value.
In order to avoid such case, we can increase isolation level to read committed
.
Read Committed
#1 TA:
>> Begin();
>> select * from user where id=1;
-------------
| id | name |
-----+-------
| 1 | Tom |
-------------
#2 TB:
>> Begin();
>> update set name="Kim" from user where id=1;
#3 TA:
select * from user where id=1;
-------------
| id | name |
-----+-------
| 1 | Tom |
-------------
#4 TB:
>> Commit();
#5 TA:
select * from user where id=1;
-------------
| id | name |
-----+-------
| 1 | Kim |
-------------
In this example, TA can only read name=Kim
after TB is committed, thus it solves the dirty read issue. Changes which are not committed to database will not be seen by other transaction.
Read committed isolation level has limitation as well:
In the lifetime of TA, it queries id=1
twice, and gets 2 different names(1st: Tom, 2nd: Kim), even if TA doesn’t change the value for id=1
, the result is NOT stable
every time this query runs.
To mitigate the above case, we can continue to increase isolation level to repeated read
.
Details behind the scene, for every query in transaction, DB will create a
read view
(snapshot), For read committed isolation level, the read view will be released after statement is executed(not transaction committed or rollbacked).As a result, for query
select * from user where id<10 for update
, DB won’t addgap lock
for id < 10, and won’t prevent the insertion or update for id < 10 in other transaction.
Read committed is the default isolation level for Oracle, SQL Server and most of distributed transaction manager.
Repeatable Read
#1 TA:
>> Begin();
>> select * from user where id=1;
-------------
| id | name |
-----+-------
| 1 | Tom |
-------------
#2 TB:
>> Begin();
>> update set name="Kim" from user where id=1;
>> Commit();
#3 TA:
>> select * from user where id=1;
-------------
| id | name |
-----+-------
| 1 | Tom |
-------------
In this example, even if TB updates the record and commits to DB, TA still gets the old name Tom in the second query. DB create a read view
the first time TA reads such record.
For repeated read, the
read view
will be released after transaction is committed or rollbacked.
Another case:
#1 TA:
>> Begin();
>> select * from user where id=1 for update;
-------------
| id | name |
-----+-------
| 1 | Tom |
-------------
>> update set name="Kim" from user where id=1;
#2 TB:
>> Begin();
>> select * from user where id=1;
-------------
| id | name |
-----+-------
| 1 | Tom |
-------------
In this case, TB still get the data(Tom) instead of TA(Kim) when TB start.
Repeatable Read is the default isolation level of InnoDB
engine in MySQL.
Serializable
This isolation level is to make sure concurrent operation on same records will be handled in sequence.
InnoDB implicitly converts all plain SELECT statements to SELECT … FOR SHARE
Serializable is the highest isolation level with the worst concurrency performance. It’s not widely used in high concurrent system.