A Feature-Enriched Completely Blind Image Quality Evaluator

Lin Zhang, Lei Zhang, and Alan C. Bovik


Introduction

Existing blind image quality assessment (BIQA) methods are mostly opinion-aware: they learn regression models from training images with associated human subjective scores and use these models to predict the perceptual quality of test images. Such opinion-aware methods, however, require a large number of training samples with associated human subjective scores, covering a variety of distortion types. The BIQA models learned by opinion-aware methods often generalize poorly, thereby limiting their usability in practice. By comparison, opinion-unaware methods do not need human subjective scores for training, and thus have greater potential for good generalization. Unfortunately, thus far no opinion-unaware BIQA method has shown consistently better quality prediction accuracy than opinion-aware methods. Here we aim to develop an opinion-unaware BIQA method that can compete with, and perhaps outperform, existing opinion-aware methods. By integrating natural image statistics features derived from multiple cues, we learn a multivariate Gaussian model of image patches from a collection of pristine natural images. Using the learned multivariate Gaussian model, we measure the quality of each image patch with a Bhattacharyya-like distance, and then obtain an overall quality score by average pooling. The proposed BIQA method needs neither distorted sample images nor subjective quality scores for training, yet extensive experiments show that its quality-prediction performance is superior to that of state-of-the-art opinion-aware BIQA methods.
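
The scoring stage summarized above can be sketched in a few lines. The steps are: fit a multivariate Gaussian (MVG) to feature vectors of pristine patches, measure each test patch with a Bhattacharyya-like distance, then average-pool. This is a minimal, dependency-free Python illustration, not the released IL-NIQE code. It assumes the patch feature vectors have already been extracted. It also assumes a NIQE-style distance in which the pristine covariance is averaged with the covariance of the test image's own patch features. The helper names (mean_vector, covariance, solve, patch_distance, quality_score) are hypothetical. Feature extraction, any dimensionality reduction, and any covariance regularization used by the actual method are left out.

```python
import math
from statistics import fmean


def mean_vector(rows):
    """Column-wise mean of a list of equal-length feature vectors."""
    return [fmean(col) for col in zip(*rows)]


def covariance(rows, mu):
    """Unbiased sample covariance matrix (needs at least two rows)."""
    d = len(mu)
    cov = [[0.0] * d for _ in range(d)]
    for row in rows:
        diff = [row[k] - mu[k] for k in range(d)]
        for i in range(d):
            for j in range(d):
                cov[i][j] += diff[i] * diff[j]
    return [[c / (len(rows) - 1) for c in line] for line in cov]


def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(a[i]) + [b[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        if abs(m[col][col]) < 1e-12:
            raise ValueError("pooled covariance is (near) singular")
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def patch_distance(y, nu, sigma_pristine, sigma_test):
    """Assumed form: sqrt((nu - y)^T ((Sigma_p + Sigma_t) / 2)^-1 (nu - y))."""
    d = len(nu)
    pooled = [[(sigma_pristine[i][j] + sigma_test[i][j]) / 2 for j in range(d)]
              for i in range(d)]
    diff = [nu[k] - y[k] for k in range(d)]
    x = solve(pooled, diff)
    return math.sqrt(max(0.0, sum(diff[k] * x[k] for k in range(d))))


def quality_score(test_patches, nu, sigma_pristine):
    """Average-pool per-patch distances; larger means further from pristine."""
    sigma_test = covariance(test_patches, mean_vector(test_patches))
    return fmean(patch_distance(y, nu, sigma_pristine, sigma_test)
                 for y in test_patches)


# Toy 2-D "features": the pristine model, then two sets of test patches.
pristine = [[0, 0], [1, 0], [0, 1], [1, 1], [0.5, 0.5]]
nu = mean_vector(pristine)
sigma = covariance(pristine, nu)
near = [[0.4, 0.5], [0.6, 0.5], [0.5, 0.4], [0.5, 0.6]]
far = [[3, 3], [3.1, 3], [3, 3.1], [3.1, 3.1]]
print(quality_score(near, nu, sigma) < quality_score(far, nu, sigma))  # True
```

In this sketch, a larger pooled score means the test patches lie further from the pristine model, which corresponds to lower predicted quality.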


Source Code

The source code for the proposed blind NR-IQA metric IL-NIQE can be downloaded here: ILNIQE.zip.

To make comparisons easier for other researchers, the prediction scores obtained by IL-NIQE on each dataset, together with the corresponding MOS (or DMOS) values, are provided here:

ILNIQEOnTID2013.mat

ILNIQEOnCSIQ.mat

ILNIQEOnLIVE.mat

ILNIQEOnMD1.mat

ILNIQEOnMD2.mat

With the above .mat files, the performance metrics can be computed easily. For example, the following code computes IL-NIQE's SRCC (Spearman rank-order correlation coefficient, also written SROCC) on TID2013:

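% Load the IL-NIQE prediction scores and the subjective scores for TID2013
matData = load('ILNIQEOnTID2013.mat');
ILNIQEOnTID2013 = matData.ILNIQEOnTID2013;
% Spearman rank-order correlation between the two columns
% (corr is part of the Statistics and Machine Learning Toolbox)
SROCC = corr(ILNIQEOnTID2013(:,1), ILNIQEOnTID2013(:,2), 'type', 'spearman');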
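
Readers working outside MATLAB can compute the two correlation measures reported in the tables below (SRCC and PLCC) with the Python standard library alone, as in this minimal sketch. It assumes the two columns of the .mat file have already been read into lists. With SciPy installed, scipy.io.loadmat('ILNIQEOnTID2013.mat')['ILNIQEOnTID2013'] should return the same two-column array that the MATLAB code indexes. Tied values receive averaged ranks, which is the convention scipy.stats.spearmanr also uses. The sketch computes PLCC directly on the raw scores. IQA studies often fit a nonlinear (for example, logistic) mapping before computing PLCC, and that step is not shown here.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson linear correlation coefficient (PLCC) of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman(x, y):
    """Spearman rank-order correlation (SRCC): Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


print(spearman([1, 2, 3, 4], [10, 20, 30, 40]))  # 1.0
print(spearman([0.2, 0.5, 0.9], [80, 60, 30]))   # -1.0
```

A negative coefficient, as in the second call, reflects opposite score polarity rather than poor accuracy. This assumes, as in NIQE, that larger IL-NIQE scores indicate worse quality, while MOS grows as quality improves.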


Evaluation Results

A. Databases

Four large-scale benchmark IQA datasets are used to evaluate the proposed IL-NIQE index: TID2013, CSIQ, LIVE, and LIVE Multiply Distortion. The LIVE Multiply Distortion (MD) dataset consists of two parts, which we regard as two separate datasets, denoted LIVE MD1 and LIVE MD2.

B. Performance on Each Individual Dataset

Since opinion-aware methods need the distorted images in a dataset to learn their models, we partition each dataset into a training subset and a testing subset. We report results under three partition settings: the distorted images associated with 80%, 50%, and 10% of the reference images are used for training, and the rest are used for testing. The random partition was repeated 1,000 times, and the median results are reported in Table 1. Although IL-NIQE, NIQE, and QAC do not need to be trained on the dataset, we report their results on the same partitioned test subsets to keep the comparison consistent.
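
The partition protocol can be sketched as follows. Images are grouped by their reference image, so a given reference content never appears in both the training and the test subset, and the evaluation is repeated over many random splits before the median is taken. This is a minimal Python sketch. The evaluate argument is a hypothetical placeholder for training a model on the training indices and returning, for example, its SRCC on the test indices. Rounding the number of training references to the nearest integer is also an assumption.

```python
import random
from statistics import median


def split_by_reference(ref_ids, train_fraction, rng):
    """Split image indices so that train and test share no reference image.

    ref_ids[i] is the reference (pristine) image that distorted image i
    was generated from.
    """
    refs = sorted(set(ref_ids))
    rng.shuffle(refs)
    # Assumption: round to the nearest count, keeping at least one reference.
    n_train = max(1, round(train_fraction * len(refs)))
    train_refs = set(refs[:n_train])
    train = [i for i, r in enumerate(ref_ids) if r in train_refs]
    test = [i for i, r in enumerate(ref_ids) if r not in train_refs]
    return train, test


def median_over_splits(ref_ids, train_fraction, evaluate, n_splits=1000, seed=0):
    """Run evaluate(train, test) on repeated random splits; return the median."""
    rng = random.Random(seed)
    results = []
    for _ in range(n_splits):
        train, test = split_by_reference(ref_ids, train_fraction, rng)
        results.append(evaluate(train, test))
    return median(results)


# Toy example: 10 reference images, 3 distorted versions of each.
ref_ids = [r for r in range(10) for _ in range(3)]
train, test = split_by_reference(ref_ids, 0.8, random.Random(1))
print(len(train), len(test))  # 24 6
```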

Table 1 Results of Performance Evaluation on Each Individual Dataset

Dataset   Method     80%            50%            10%
                     SRCC   PLCC    SRCC   PLCC    SRCC   PLCC
TID2013   BIQI       0.349  0.366   0.332  0.332   0.199  0.250
          BRISQUE    0.573  0.651   0.563  0.645   0.513  0.587
          BLIINDS2   0.536  0.628   0.458  0.480   0.402  0.447
          DIIVINE    0.549  0.654   0.503  0.602   0.330  0.391
          CORNIA     0.549  0.613   0.573  0.652   0.508  0.603
          NIQE       0.317  0.426   0.317  0.420   0.313  0.398
          QAC        0.390  0.495   0.390  0.489   0.372  0.435
          IL-NIQE    0.521  0.648   0.513  0.641   0.494  0.590
CSIQ      BIQI       0.092  0.237   0.092  0.396   0.020  0.311
          BRISQUE    0.775  0.817   0.736  0.781   0.545  0.596
          BLIINDS2   0.780  0.832   0.749  0.806   0.628  0.688
          DIIVINE    0.757  0.795   0.652  0.716   0.441  0.492
          CORNIA     0.714  0.781   0.678  0.754   0.638  0.732
          NIQE       0.627  0.725   0.626  0.716   0.624  0.714
          QAC        0.486  0.654   0.494  0.706   0.490  0.707
          IL-NIQE    0.822  0.865   0.814  0.854   0.813  0.852
LIVE      BIQI       0.825  0.840   0.739  0.764   0.547  0.623
          BRISQUE    0.933  0.931   0.917  0.919   0.806  0.816
          BLIINDS2   0.924  0.927   0.901  0.901   0.836  0.834
          DIIVINE    0.884  0.893   0.858  0.866   0.695  0.701
          CORNIA     0.940  0.944   0.933  0.934   0.893  0.894
          NIQE       0.908  0.908   0.905  0.904   0.905  0.903
          QAC        0.874  0.868   0.869  0.864   0.866  0.860
          IL-NIQE    0.902  0.906   0.899  0.903   0.899  0.903
MD1       BIQI       0.769  0.831   0.580  0.663   0.159  0.457
          BRISQUE    0.887  0.921   0.851  0.873   0.829  0.860
          BLIINDS2   0.885  0.925   0.841  0.879   0.823  0.859
          DIIVINE    0.846  0.891   0.805  0.836   0.631  0.675
          CORNIA     0.904  0.931   0.878  0.905   0.855  0.889
          NIQE       0.909  0.942   0.883  0.921   0.874  0.912
          QAC        0.418  0.597   0.406  0.552   0.397  0.541
          IL-NIQE    0.911  0.930   0.899  0.916   0.893  0.907
MD2       BIQI       0.897  0.919   0.835  0.860   0.769  0.773
          BRISQUE    0.888  0.915   0.864  0.881   0.849  0.867
          BLIINDS2   0.893  0.910   0.852  0.874   0.850  0.868
          DIIVINE    0.888  0.916   0.855  0.880   0.832  0.851
          CORNIA     0.908  0.920   0.876  0.890   0.843  0.866
          NIQE       0.834  0.884   0.808  0.860   0.796  0.852
          QAC        0.501  0.718   0.480  0.689   0.473  0.678
          IL-NIQE    0.928  0.915   0.890  0.895   0.882  0.896

C. Cross-dataset Performance Evaluation

For the five opinion-aware BIQA methods, the original authors provide quality prediction models trained on the entire LIVE dataset, so we use these models directly to test on the other datasets. The results are shown in Table 2. For each performance measure, the two best results are highlighted in bold. Table 3 presents the weighted-average SRCC and PLCC of all methods over the four datasets, where the weight assigned to each dataset is proportional to the number of distorted images it contains. Furthermore, we train the opinion-aware methods on the entire TID2013 dataset and then test them on the remaining datasets. The results are shown in Table 4 and Table 5.
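
The weighted averaging behind Table 3 (and, analogously, Table 5) can be checked with a few lines of Python. This is a minimal sketch that relies on commonly cited dataset sizes, which this page does not list: 3000 distorted images in TID2013, 866 in CSIQ, and 225 in each LIVE MD part. With those sizes, the IL-NIQE SRCC values from Table 2 reproduce the 0.599 reported in Table 3.

```python
# Assumed dataset sizes (number of distorted images); not stated on this page.
DATASET_SIZES = {"TID2013": 3000, "CSIQ": 866, "MD1": 225, "MD2": 225}


def weighted_average(results, sizes):
    """Average per-dataset results, weighting each by its number of images."""
    total = sum(sizes[name] for name in results)
    return sum(value * sizes[name] for name, value in results.items()) / total


# IL-NIQE SRCC values from Table 2.
il_niqe_srcc = {"TID2013": 0.494, "CSIQ": 0.815, "MD1": 0.891, "MD2": 0.882}
print(round(weighted_average(il_niqe_srcc, DATASET_SIZES), 3))  # 0.599
```

Suppose LIVE contains 779 distorted images, which is again an assumed size. In that case, the same function applied to the Table 4 values for LIVE, CSIQ, MD1, and MD2 likewise reproduces the IL-NIQE SRCC of 0.861 in Table 5.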

Table 2 Evaluation Results when Trained on LIVE

Method     TID2013        CSIQ           MD1            MD2
           SRCC   PLCC    SRCC   PLCC    SRCC   PLCC    SRCC   PLCC
BIQI       0.394  0.468   0.619  0.695   0.654  0.774   0.490  0.766
BRISQUE    0.367  0.475   0.557  0.742   0.791  0.866   0.299  0.459
BLIINDS2   0.393  0.470   0.577  0.724   0.665  0.710   0.015  0.302
DIIVINE    0.355  0.545   0.596  0.697   0.708  0.767   0.602  0.702
CORNIA     0.429  0.575   0.663  0.764   0.839  0.871   0.841  0.864
NIQE       0.311  0.398   0.627  0.716   0.871  0.909   0.795  0.848
QAC        0.372  0.437   0.490  0.708   0.396  0.538   0.471  0.672
IL-NIQE    0.494  0.589   0.815  0.854   0.891  0.905   0.882  0.897

Table 3 Weighted-average Performance Evaluation Based on Table 2

         BIQI     BRISQUE  BLIINDS2 DIIVINE  CORNIA   NIQE     QAC      IL-NIQE
SRCC     0.458    0.424    0.424    0.435    0.519    0.429    0.402    0.599
PLCC     0.545    0.548    0.525    0.595    0.643    0.512    0.509    0.675

Table 4 Evaluation Results when Trained on TID2013

Method     LIVE           CSIQ           MD1            MD2
           SRCC   PLCC    SRCC   PLCC    SRCC   PLCC    SRCC   PLCC
BIQI       0.047  0.311   0.010  0.181   0.156  0.175   0.332  0.380
BRISQUE    0.088  0.108   0.639  0.728   0.625  0.807   0.184  0.591
BLIINDS2   0.076  0.089   0.456  0.527   0.507  0.690   0.032  0.222
DIIVINE    0.042  0.093   0.146  0.255   0.639  0.669   0.252  0.367
CORNIA     0.097  0.132   0.656  0.750   0.772  0.847   0.655  0.719
NIQE       0.906  0.904   0.627  0.716   0.871  0.909   0.795  0.848
QAC        0.868  0.863   0.490  0.708   0.396  0.538   0.471  0.672
IL-NIQE    0.898  0.903   0.815  0.854   0.891  0.905   0.882  0.897

Table 5 Weighted-average Performance Evaluation Based on Table 4

         BIQI     BRISQUE  BLIINDS2 DIIVINE  CORNIA   NIQE     QAC      IL-NIQE
SRCC     0.074    0.384    0.275    0.172    0.461    0.775    0.618    0.861
PLCC     0.250    0.491    0.349    0.251    0.527    0.821    0.744    0.882


Created on: Nov. 06, 2014

Last update: Apr. 26, 2015