A Feature-Enriched Completely Blind Image Quality Evaluator
Lin Zhang, Lei Zhang, and Alan C. Bovik
Introduction
Existing blind image quality assessment (BIQA) methods are mostly opinion-aware. They learn regression models from training images with associated human subjective scores to predict the perceptual quality of test images. Such opinion-aware methods, however, require a large number of training samples with associated human subjective scores, covering a variety of distortion types. The BIQA models learned by opinion-aware methods often have weak generalization capability, thereby limiting their usability in practice. By comparison, opinion-unaware methods do not need human subjective scores for training, and thus have greater potential for good generalization capability. Unfortunately, thus far no opinion-unaware BIQA method has shown consistently better quality prediction accuracy than opinion-aware methods. Here we aim to develop an opinion-unaware BIQA method that can compete with, and perhaps outperform, existing opinion-aware methods. By integrating natural image statistics features derived from multiple cues, we learn a multivariate Gaussian (MVG) model of image patches from a collection of pristine natural images. A Bhattacharyya-like distance is then used to measure the quality of each patch of a test image against the learned MVG model, and an overall quality score is obtained by average pooling. The proposed BIQA method does not need any distorted sample images or subjective quality scores for training, yet extensive experiments demonstrate its superior quality-prediction performance compared to state-of-the-art opinion-aware BIQA methods.
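For reference, the per-patch quality measure described above is, in NIQE-style models, a Bhattacharyya-like distance between two multivariate Gaussians; a sketch of the form it typically takes (the notation here is illustrative, see the paper for the exact per-patch definition):

\[
D_i = \sqrt{(\mu - \mu_i)^{\mathsf{T}} \Big(\frac{\Sigma + \Sigma_i}{2}\Big)^{-1} (\mu - \mu_i)}, \qquad Q = \frac{1}{N} \sum_{i=1}^{N} D_i,
\]

where \((\mu, \Sigma)\) are the mean vector and covariance matrix of the MVG model learned from pristine patches, \((\mu_i, \Sigma_i)\) are the corresponding statistics fitted to the \(i\)-th patch of the test image, and \(Q\) is the overall quality score obtained by average pooling over the \(N\) patches.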
Source Code
The source code of the proposed blind IQA metric IL-NIQE can be downloaded here: ILNIQE.zip.
To facilitate other researchers, the prediction scores produced by IL-NIQE, along with the MOS (or DMOS) values for each dataset, are provided here.
With the above .mat file, the performance metrics can be computed easily. For example, the following MATLAB code computes IL-NIQE's SROCC on TID2013:

% Load the provided scores; the two columns hold the IL-NIQE predictions
% and the corresponding MOS values.
matData = load('ILNIQEOnTID2013.mat');
ILNIQEOnTID2013 = matData.ILNIQEOnTID2013;
% Spearman rank-order correlation between the two columns.
SROCC = corr(ILNIQEOnTID2013(:,1), ILNIQEOnTID2013(:,2), 'type', 'Spearman');
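PLCC can be computed analogously; it is conventionally reported after fitting a monotonic logistic mapping from objective scores to the subjective scale. A minimal sketch, assuming the same two-column layout and a VQEG-style five-parameter logistic (this fitting step is standard IQA practice, not necessarily the exact variant used in the paper; nlinfit requires the Statistics Toolbox):

% Fit a 5-parameter logistic mapping objective scores onto the MOS scale,
% then compute the Pearson correlation on the mapped scores.
q   = ILNIQEOnTID2013(:,1);   % assumed: objective (IL-NIQE) scores
mos = ILNIQEOnTID2013(:,2);   % assumed: subjective MOS values
logistic = @(b,x) b(1)*(0.5 - 1./(1 + exp(b(2)*(x - b(3))))) + b(4)*x + b(5);
b0 = [max(mos)-min(mos), 0.05, mean(q), 0.1, mean(mos)];  % rough initial guess
b  = nlinfit(q, mos, logistic, b0);
PLCC = corr(logistic(b, q), mos, 'type', 'Pearson');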
Evaluation Results
A. Databases
Four large-scale benchmark IQA datasets are used to evaluate the proposed IL-NIQE index: TID2013, CSIQ, LIVE, and LIVE Multiply Distortion (MD). The LIVE MD dataset consists of two parts, which we treat as two separate datasets, denoted LIVE MD1 and LIVE MD2.
B. Performance on Each Individual Dataset
Since opinion-aware methods need distorted images from the dataset to learn their models, we partition each dataset into a training subset and a testing subset. We report results under three partition settings: the distorted images associated with 80%, 50%, or 10% of the reference images are used for training, and the remainder are used for testing. Each partition was drawn randomly 1,000 times, and the median results are reported in Table 1. Although IL-NIQE, NIQE, and QAC need no training on the dataset, we report their results on the same test subsets to keep the comparison consistent.
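The split-and-evaluate protocol can be sketched as follows (a minimal MATLAB illustration, not the authors' exact script; the vectors scores, mos, and refIdx, which maps each distorted image to its reference image, are assumed to be given):

% Median SROCC over 1,000 random splits drawn by reference image,
% evaluated on the 20% test portion (shown here for the 80% setting).
numRef  = max(refIdx);
nSplits = 1000;
srocc   = zeros(nSplits, 1);
for t = 1:nSplits
    perm     = randperm(numRef);
    trainRef = perm(1 : round(0.8 * numRef));   % references used for training
    testMask = ~ismember(refIdx, trainRef);     % distorted images of the rest
    srocc(t) = corr(scores(testMask), mos(testMask), 'type', 'Spearman');
end
medianSROCC = median(srocc);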
Table 1 Results of Performance Evaluation on Each Individual Dataset
| Dataset | Method   | SRCC (80%) | PLCC (80%) | SRCC (50%) | PLCC (50%) | SRCC (10%) | PLCC (10%) |
|---------|----------|------------|------------|------------|------------|------------|------------|
| TID2013 | BIQI     | 0.349 | 0.366 | 0.332 | 0.332 | 0.199 | 0.250 |
| TID2013 | BRISQUE  | 0.573 | 0.651 | 0.563 | 0.645 | 0.513 | 0.587 |
| TID2013 | BLIINDS2 | 0.536 | 0.628 | 0.458 | 0.480 | 0.402 | 0.447 |
| TID2013 | DIIVINE  | 0.549 | 0.654 | 0.503 | 0.602 | 0.330 | 0.391 |
| TID2013 | CORNIA   | 0.549 | 0.613 | 0.573 | 0.652 | 0.508 | 0.603 |
| TID2013 | NIQE     | 0.317 | 0.426 | 0.317 | 0.420 | 0.313 | 0.398 |
| TID2013 | QAC      | 0.390 | 0.495 | 0.390 | 0.489 | 0.372 | 0.435 |
| TID2013 | IL-NIQE  | 0.521 | 0.648 | 0.513 | 0.641 | 0.494 | 0.590 |
| CSIQ    | BIQI     | 0.092 | 0.237 | 0.092 | 0.396 | 0.020 | 0.311 |
| CSIQ    | BRISQUE  | 0.775 | 0.817 | 0.736 | 0.781 | 0.545 | 0.596 |
| CSIQ    | BLIINDS2 | 0.780 | 0.832 | 0.749 | 0.806 | 0.628 | 0.688 |
| CSIQ    | DIIVINE  | 0.757 | 0.795 | 0.652 | 0.716 | 0.441 | 0.492 |
| CSIQ    | CORNIA   | 0.714 | 0.781 | 0.678 | 0.754 | 0.638 | 0.732 |
| CSIQ    | NIQE     | 0.627 | 0.725 | 0.626 | 0.716 | 0.624 | 0.714 |
| CSIQ    | QAC      | 0.486 | 0.654 | 0.494 | 0.706 | 0.490 | 0.707 |
| CSIQ    | IL-NIQE  | 0.822 | 0.865 | 0.814 | 0.854 | 0.813 | 0.852 |
| LIVE    | BIQI     | 0.825 | 0.840 | 0.739 | 0.764 | 0.547 | 0.623 |
| LIVE    | BRISQUE  | 0.933 | 0.931 | 0.917 | 0.919 | 0.806 | 0.816 |
| LIVE    | BLIINDS2 | 0.924 | 0.927 | 0.901 | 0.901 | 0.836 | 0.834 |
| LIVE    | DIIVINE  | 0.884 | 0.893 | 0.858 | 0.866 | 0.695 | 0.701 |
| LIVE    | CORNIA   | 0.940 | 0.944 | 0.933 | 0.934 | 0.893 | 0.894 |
| LIVE    | NIQE     | 0.908 | 0.908 | 0.905 | 0.904 | 0.905 | 0.903 |
| LIVE    | QAC      | 0.874 | 0.868 | 0.869 | 0.864 | 0.866 | 0.860 |
| LIVE    | IL-NIQE  | 0.902 | 0.906 | 0.899 | 0.903 | 0.899 | 0.903 |
| MD1     | BIQI     | 0.769 | 0.831 | 0.580 | 0.663 | 0.159 | 0.457 |
| MD1     | BRISQUE  | 0.887 | 0.921 | 0.851 | 0.873 | 0.829 | 0.860 |
| MD1     | BLIINDS2 | 0.885 | 0.925 | 0.841 | 0.879 | 0.823 | 0.859 |
| MD1     | DIIVINE  | 0.846 | 0.891 | 0.805 | 0.836 | 0.631 | 0.675 |
| MD1     | CORNIA   | 0.904 | 0.931 | 0.878 | 0.905 | 0.855 | 0.889 |
| MD1     | NIQE     | 0.909 | 0.942 | 0.883 | 0.921 | 0.874 | 0.912 |
| MD1     | QAC      | 0.418 | 0.597 | 0.406 | 0.552 | 0.397 | 0.541 |
| MD1     | IL-NIQE  | 0.911 | 0.930 | 0.899 | 0.916 | 0.893 | 0.907 |
| MD2     | BIQI     | 0.897 | 0.919 | 0.835 | 0.860 | 0.769 | 0.773 |
| MD2     | BRISQUE  | 0.888 | 0.915 | 0.864 | 0.881 | 0.849 | 0.867 |
| MD2     | BLIINDS2 | 0.893 | 0.910 | 0.852 | 0.874 | 0.850 | 0.868 |
| MD2     | DIIVINE  | 0.888 | 0.916 | 0.855 | 0.880 | 0.832 | 0.851 |
| MD2     | CORNIA   | 0.908 | 0.920 | 0.876 | 0.890 | 0.843 | 0.866 |
| MD2     | NIQE     | 0.834 | 0.884 | 0.808 | 0.860 | 0.796 | 0.852 |
| MD2     | QAC      | 0.501 | 0.718 | 0.480 | 0.689 | 0.473 | 0.678 |
| MD2     | IL-NIQE  | 0.928 | 0.915 | 0.890 | 0.895 | 0.882 | 0.896 |
C. Cross-datasets Performance Evaluation
For the five opinion-aware BIQA methods, the quality prediction models trained on the entire LIVE dataset are provided by their original authors, so we use them directly for testing on the other datasets. The results are shown in Table 2. In Table 3 we report the weighted-average SRCC and PLCC of all methods over the four test datasets, where each dataset's weight is proportional to the number of distorted images it contains. Furthermore, we train the opinion-aware methods on the entire TID2013 dataset and test them on the remaining datasets; these results are shown in Tables 4 and 5.
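The weighted averages can be reproduced directly from the per-dataset results; a minimal sketch using the IL-NIQE row of Table 2 (the distorted-image counts below, 3000 for TID2013, 866 for CSIQ, and 225 for each LIVE MD part, reproduce the IL-NIQE entries of Table 3):

% Weighted-average SRCC over the four test datasets, with each weight
% proportional to the dataset's number of distorted images.
n    = [3000, 866, 225, 225];            % TID2013, CSIQ, MD1, MD2
srcc = [0.494, 0.815, 0.891, 0.882];     % IL-NIQE SRCC row of Table 2
weightedSRCC = sum(n .* srcc) / sum(n);  % = 0.599, matching Table 3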
Table 2 Evaluation Results when Trained on LIVE
| Method   | TID2013 SRCC | TID2013 PLCC | CSIQ SRCC | CSIQ PLCC | MD1 SRCC | MD1 PLCC | MD2 SRCC | MD2 PLCC |
|----------|--------------|--------------|-----------|-----------|----------|----------|----------|----------|
| BIQI     | 0.394 | 0.468 | 0.619 | 0.695 | 0.654 | 0.774 | 0.490 | 0.766 |
| BRISQUE  | 0.367 | 0.475 | 0.557 | 0.742 | 0.791 | 0.866 | 0.299 | 0.459 |
| BLIINDS2 | 0.393 | 0.470 | 0.577 | 0.724 | 0.665 | 0.710 | 0.015 | 0.302 |
| DIIVINE  | 0.355 | 0.545 | 0.596 | 0.697 | 0.708 | 0.767 | 0.602 | 0.702 |
| CORNIA   | 0.429 | 0.575 | 0.663 | 0.764 | 0.839 | 0.871 | 0.841 | 0.864 |
| NIQE     | 0.311 | 0.398 | 0.627 | 0.716 | 0.871 | 0.909 | 0.795 | 0.848 |
| QAC      | 0.372 | 0.437 | 0.490 | 0.708 | 0.396 | 0.538 | 0.471 | 0.672 |
| IL-NIQE  | 0.494 | 0.589 | 0.815 | 0.854 | 0.891 | 0.905 | 0.882 | 0.897 |
Table 3 Weighted-average Performance Evaluation Based on Table 2
| Method   | SRCC  | PLCC  |
|----------|-------|-------|
| BIQI     | 0.458 | 0.545 |
| BRISQUE  | 0.424 | 0.548 |
| BLIINDS2 | 0.424 | 0.525 |
| DIIVINE  | 0.435 | 0.595 |
| CORNIA   | 0.519 | 0.643 |
| NIQE     | 0.429 | 0.512 |
| QAC      | 0.402 | 0.509 |
| IL-NIQE  | 0.599 | 0.675 |
Table 4 Evaluation Results when Trained on TID2013
| Method   | LIVE SRCC | LIVE PLCC | CSIQ SRCC | CSIQ PLCC | MD1 SRCC | MD1 PLCC | MD2 SRCC | MD2 PLCC |
|----------|-----------|-----------|-----------|-----------|----------|----------|----------|----------|
| BIQI     | 0.047 | 0.311 | 0.010 | 0.181 | 0.156 | 0.175 | 0.332 | 0.380 |
| BRISQUE  | 0.088 | 0.108 | 0.639 | 0.728 | 0.625 | 0.807 | 0.184 | 0.591 |
| BLIINDS2 | 0.076 | 0.089 | 0.456 | 0.527 | 0.507 | 0.690 | 0.032 | 0.222 |
| DIIVINE  | 0.042 | 0.093 | 0.146 | 0.255 | 0.639 | 0.669 | 0.252 | 0.367 |
| CORNIA   | 0.097 | 0.132 | 0.656 | 0.750 | 0.772 | 0.847 | 0.655 | 0.719 |
| NIQE     | 0.906 | 0.904 | 0.627 | 0.716 | 0.871 | 0.909 | 0.795 | 0.848 |
| QAC      | 0.868 | 0.863 | 0.490 | 0.708 | 0.396 | 0.538 | 0.471 | 0.672 |
| IL-NIQE  | 0.898 | 0.903 | 0.815 | 0.854 | 0.891 | 0.905 | 0.882 | 0.897 |
Table 5 Weighted-average Performance Evaluation Based on Table 4
| Method   | SRCC  | PLCC  |
|----------|-------|-------|
| BIQI     | 0.074 | 0.250 |
| BRISQUE  | 0.384 | 0.491 |
| BLIINDS2 | 0.275 | 0.349 |
| DIIVINE  | 0.172 | 0.251 |
| CORNIA   | 0.461 | 0.527 |
| NIQE     | 0.775 | 0.821 |
| QAC      | 0.618 | 0.744 |
| IL-NIQE  | 0.861 | 0.882 |
Created on: Nov. 06, 2014
Last update: Apr. 26, 2015