Problems of current IQA methods for SCIs:
(1) Dividing large SCIs into image patches for data augmentation.
A. A single image patch cannot represent the quality of the entire image, especially in IQA of SCIs.
B. SCI patches from an entire image degraded by the same distortion type and strength may still have drastically different quality.
(2) Assigning the image-level differential mean opinion score (DMOS) to all image patches.
There is inherent error and inaccuracy between the image-level DMOS and the true quality scores of individual patches.
(3) Adopting the mean square error (MSE) between the predicted quality and the subjective DMOS as the training loss, without considering the quality ranking between different SCIs. (The main reason is that they do not consider the ranking information in terms of quality between SCIs, so a model can achieve a small MSE yet still perform poorly.)
(4) Most databases are laboratory-generated, i.e., contain synthetically distorted images rather than authentic distortions.
Patch Size of SCIs:
Pre-processing: The SCIs are first divided evenly into multiple regions, and an image patch of size 128 × 128 is extracted from each region as a representative sample of that region.
(1) 128 × 128
For natural-image IQA models, 32 × 32 is considered the optimal size.
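This pre-processing can be sketched as follows; the grid shape (here 3 × 3, i.e., nine regions) is an assumption, since the notes only specify the 128 × 128 patch size:

```python
import numpy as np

def extract_region_patches(image, grid=(3, 3), patch_size=128):
    """Divide `image` (H x W x C) evenly into grid regions and take one
    patch_size x patch_size patch centered in each region."""
    h, w = image.shape[:2]
    rows, cols = grid
    rh, rw = h // rows, w // cols
    patches = []
    for i in range(rows):
        for j in range(cols):
            cy, cx = i * rh + rh // 2, j * rw + rw // 2   # region center
            y0 = int(np.clip(cy - patch_size // 2, 0, h - patch_size))
            x0 = int(np.clip(cx - patch_size // 2, 0, w - patch_size))
            patches.append(image[y0:y0 + patch_size, x0:x0 + patch_size])
    return np.stack(patches)

img = np.zeros((540, 960, 3), dtype=np.uint8)     # a dummy SCI
patches = extract_region_patches(img)
print(patches.shape)  # (9, 128, 128, 3)
```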
Ideas of some current methods:
(a) Assess the quality of textual regions and pictorial regions separately.
(b) (Better image quality representation) Obtain multiple features, e.g., local and global perceptual features/representations.
(Multi-region) local features from textual regions and from pictorial regions
global features from the entire image
(c) Multi-task training: quality score prediction, distortion type (noise) classification
(d) Use different ranking losses to rank images by quality.
Local Feature Extraction Module:
A. The input SCIs are divided into multiple regions, and one image patch of size 128 × 128 is extracted from each region as a representative sample of that region.
B. In the VGG-based network, the combination of two 3 × 3 convolutional layers and one pooling layer has a larger receptive field and extracts local features with fewer parameters, which gives excellent feature extraction ability.
C. The first two convolutional layers adopt a large-scale convolutional kernel of size 5 × 5 to acquire general information about local features from the entire input patch.
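The receptive-field claim can be checked with simple arithmetic. The layer stack below is an illustration (two 5 × 5 convs followed by VGG-style blocks of two 3 × 3 convs plus a stride-2 pooling), not the paper's exact architecture:

```python
def receptive_field(layers):
    """Receptive field of a stack of (kernel_size, stride) layers."""
    rf, jump = 1, 1
    for k, s in layers:
        rf += (k - 1) * jump   # each layer widens the field by (k-1)*jump
        jump *= s              # stride compounds the step between outputs
    return rf

# Assumed stack: two 5x5 convs, then two blocks of (3x3 conv, 3x3 conv, 2x2 pool).
layers = [(5, 1), (5, 1),
          (3, 1), (3, 1), (2, 2),
          (3, 1), (3, 1), (2, 2)]
print(receptive_field(layers))

# The classic VGG fact: two stacked 3x3 convs cover the same 5x5 window
# as a single 5x5 kernel, with fewer parameters.
print(receptive_field([(3, 1), (3, 1)]))
```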
Pseudo Global Feature Generation Module:
A. Fuses local features to generate pseudo global features.
B. A concatenation layer is used to fuse the features of the multiple regions.
C. A 1 × 1 convolutional layer is then utilized for feature fusion and feature dimension reduction.
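The fusion can be sketched in numpy; the feature-map size (4 × 4), channel counts (64 per region, 128 after reduction), and region count (9) are all assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Local features from 9 region patches, each a 4x4 map with 64 channels.
local = [rng.standard_normal((4, 4, 64)) for _ in range(9)]

# Concatenate along the channel axis: 9 * 64 = 576 channels.
fused = np.concatenate(local, axis=-1)        # shape (4, 4, 576)

# A 1x1 convolution is a per-pixel linear map over channels: 576 -> 128.
w = rng.standard_normal((576, 128)) * 0.01
pseudo_global = fused @ w                     # shape (4, 4, 128)
print(pseudo_global.shape)
```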
Multi-Task Training Module:
A. Pseudo global features are used to train a multi-task learning model.
B. A noise classification task and a quality score prediction task.
C. Noise classification: two fully connected (FC) layers and one softmax layer. Since subjective image quality depends on noise type, noise strength, and image content, this branch adopts a classification network to extract features of the noise type and strength.
D. Quality score prediction: three FC layers and one concatenation layer.
E. The feature vectors of the noise classification task and the quality score prediction task are concatenated into a new feature vector for the quality score prediction task.
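A toy numpy sketch of the two branches and the cross-branch concatenation; all dimensions and the reduced layer counts are assumptions, not the paper's configuration:

```python
import numpy as np

rng = np.random.default_rng(1)
f = rng.standard_normal(128)                  # pseudo global feature (assumed dim)

# Noise classification branch: FC + ReLU, FC, softmax (toy sizes, 10 classes).
w1, w2 = rng.standard_normal((128, 64)), rng.standard_normal((64, 10))
h_cls = np.maximum(f @ w1, 0)                 # classification feature vector
logits = h_cls @ w2
probs = np.exp(logits - logits.max())
probs /= probs.sum()                          # softmax over noise classes

# Quality branch: its own features are concatenated with the classification
# branch's feature vector before the final score regression.
w3 = rng.standard_normal((128, 64))
h_q = np.maximum(f @ w3, 0)
h_cat = np.concatenate([h_q, h_cls])          # new feature vector, shape (128,)
w4 = rng.standard_normal(128)
score = float(h_cat @ w4)                     # predicted quality score
print(round(float(probs.sum()), 6), h_cat.shape)
```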
Siamese Network Module:
A. Extracts features of different SCIs with the shared weights of the proposed model.
B. Two different SCIs are input into the one model simultaneously, and two predicted scores are obtained as the model's outputs.
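The shared-weight idea reduces to this: one set of parameters scores both inputs, and the pair of outputs can then be compared by a ranking loss. A toy linear "model" stands in for the real network:

```python
import numpy as np

def model_score(x, w):
    """The same weights w score every input (the Siamese sharing)."""
    return float(x @ w)

w = np.array([0.6, -0.3, 0.1])        # shared weights for both branches (toy)
sci_a = np.array([1.0, 0.5, 2.0])     # features of two different SCIs
sci_b = np.array([0.2, 1.5, 0.3])
s_a, s_b = model_score(sci_a, w), model_score(sci_b, w)
print(s_a, s_b)                       # two predicted scores from one model
```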
*The advantages of utilizing multi-region features:*
A. Compared to the local features of image patches, the pseudo global features are a better representation of the entire image's quality, so it is reasonable to label them with DMOSs;
B. Compared to using large image patches, utilizing multi-region image patches yields more training samples (alleviating the problem of insufficient data) while still taking the characteristics of the entire image into account, and it requires less computation because the local feature extraction module is shared and fusion uses a 1 × 1 convolutional layer;
C. Utilizing multi-region features reduces the influence of image patch contents with large differences in naturalness statistics between SCI patches.
(The noise classification task adopts the empirical cross-entropy loss, which shows superior performance in classification models.)
(The smooth L1 loss is less sensitive to outliers than the L2 loss and has better fitting ability than the L1 loss.)
(This ranking loss makes the network learn the quality difference between two different SCIs, giving it ranking ability.)
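A minimal numpy sketch of the three losses; the smooth-L1 threshold, the ranking margin, and the convention that predicted scores lie on the DMOS scale (lower = better) are assumptions:

```python
import numpy as np

def cross_entropy(logits, label):
    """Empirical cross-entropy for the noise classification branch."""
    z = logits - logits.max()                 # stabilize the softmax
    log_probs = z - np.log(np.exp(z).sum())
    return -log_probs[label]

def smooth_l1(pred, target, beta=1.0):
    """Smooth L1: quadratic near zero (like L2), linear for outliers (like L1)."""
    d = abs(pred - target)
    return 0.5 * d * d / beta if d < beta else d - 0.5 * beta

def ranking_loss(s1, s2, d1, d2, margin=1.0):
    """Margin ranking loss: push the predicted ordering of two SCIs to match
    their DMOS ordering. Scores are assumed to be on the DMOS scale."""
    sign = 1.0 if d1 < d2 else -1.0           # +1 if image 1 has better quality
    return max(0.0, margin - sign * (s2 - s1))

print(smooth_l1(0.5, 0.0))   # 0.125 (quadratic regime)
print(smooth_l1(3.0, 0.0))   # 2.5 (linear regime)
```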
DMOS values range from 0 to 100, where 0 indicates the best quality and 100 indicates the worst quality.
o_i and s_i are the objective and subjective scores
e_i is the difference between the subjective and objective results
A. Pearson Linear Correlation Coefficient (PLCC)
measure a method’s prediction accuracy
B. Spearman Rank Order Correlation Coefficient (SRCC)
measure a method’s prediction monotonicity
A non-parametric rank-order based correlation metric that is independent of any monotonic score mapping.
It is employed to assess prediction monotonicity.
C. Kendall's Rank-order Correlation Coefficient (KRCC)
another measure of prediction monotonicity
D. Root Mean Square Error (RMSE)
RMSE can be adopted to gauge the prediction consistency.
E. Statistical Significance
signifies whether the difference in performance between one IQA method and another, on a set of sample points, is due purely to chance or to a genuine underlying effect
F. Five-parameter mapping function
Nonlinearly regresses the quality scores into a common space.
Objective quality scores of SCIs may have different ranges, so it is necessary to map these scores into a common range.
x: score estimated by the proposed model
Q(x): corresponding mapped score
β1, …, β5: parameters to be computed with a curve-fitting process (fitted by minimizing the sum of squared errors)
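The four criteria and the mapping can be sketched in numpy. Ties are not handled in the rank metrics, and the mapping below is the common five-parameter logistic used in IQA evaluation; that the paper uses this exact form is an assumption:

```python
import numpy as np

def plcc(o, s):
    """Pearson linear correlation coefficient: prediction accuracy."""
    o, s = np.asarray(o, float), np.asarray(s, float)
    return float(np.corrcoef(o, s)[0, 1])

def srcc(o, s):
    """Spearman rank-order correlation: prediction monotonicity.
    Pearson correlation of the ranks (assumes no ties)."""
    rank = lambda x: np.argsort(np.argsort(np.asarray(x)))
    return plcc(rank(o), rank(s))

def krcc(o, s):
    """Kendall rank-order correlation via pairwise concordance (no ties)."""
    o, s = np.asarray(o, float), np.asarray(s, float)
    n = len(o)
    c = sum(np.sign(o[i] - o[j]) * np.sign(s[i] - s[j])
            for i in range(n) for j in range(i + 1, n))
    return float(c) / (n * (n - 1) / 2)

def rmse(o, s):
    """Root mean square error: prediction consistency."""
    e = np.asarray(o, float) - np.asarray(s, float)
    return float(np.sqrt(np.mean(e * e)))

def five_param_map(x, b1, b2, b3, b4, b5):
    """Common five-parameter logistic mapping applied before PLCC/RMSE:
        Q(x) = b1 * (0.5 - 1 / (1 + exp(b2 * (x - b3)))) + b4 * x + b5
    The betas are fitted by minimizing the sum of squared errors,
    e.g. with scipy.optimize.curve_fit."""
    x = np.asarray(x, float)
    return b1 * (0.5 - 1.0 / (1.0 + np.exp(b2 * (x - b3)))) + b4 * x + b5
```

Note that SRCC and KRCC are unchanged by the monotonic mapping (they depend only on ranks), while PLCC and RMSE are computed after it.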
- Review another paper again
- Continue to review SC IQA Codes