May 14, 2021

Finished Today:

  1. Summarize the following two papers:
    A. No-reference screen content image quality assessment based on multi-region features

    B. No-Reference Quality Assessment for Screen Content Images Based on Hybrid Region Features Fusion

    (1) Screen content images (SCIs): mixed content with textual and pictorial regions, e.g., natural-scene pictures, document and text images, and computer-generated graphics.

    (2) SCIs Properties: sharp edges, thin lines, and little color variation, due to the abundance of text and computer-generated graphics.

    (3) Natural Images (NIs) Properties: Continuous-tone content with smooth edges, thick lines, textures, rich color changes

    (4) SCIs and NIs Differences:

    A. SCIs have more sharp edges → Features change significantly

    B. SCIs show dissimilar trends of variation (dissimilar characteristics) when they are contaminated by different kinds of distortions at different intensities, and the content of different regions varies greatly.

    Note: compare the Mean Subtracted Contrast Normalized (MSCN) coefficient histograms of pictorial and textual regions.

    “The MSCN coefficient histogram of the pictorial region exhibits a Gaussian-like appearance. By contrast, the textual region yields a quite different MSCN distribution.”
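    A minimal sketch of how MSCN coefficients can be computed, to make the histogram comparison concrete. It follows the usual definition (pixel minus local weighted mean, divided by local weighted standard deviation plus a constant). The 7 × 7 Gaussian window, `sigma = 7/6`, `c = 1`, and the replicate padding are assumptions here, not values taken from the two papers.

```python
import math

def gaussian_kernel(radius=3, sigma=7 / 6):
    """2-D Gaussian weights normalised to sum to 1 (size and sigma are assumed)."""
    offsets = range(-radius, radius + 1)
    k = [[math.exp(-(dx * dx + dy * dy) / (2 * sigma * sigma)) for dx in offsets]
         for dy in offsets]
    total = sum(map(sum, k))
    return [[w / total for w in row] for row in k]

def mscn(image, radius=3, sigma=7 / 6, c=1.0):
    """MSCN coefficients for a 2-D list of grey levels.

    Borders are handled by clamping coordinates (replicate padding).
    """
    h, w = len(image), len(image[0])
    kernel = gaussian_kernel(radius, sigma)
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            mu = 0.0  # local weighted mean
            sq = 0.0  # local weighted mean of squares
            for dy in range(-radius, radius + 1):
                for dx in range(-radius, radius + 1):
                    yy = min(max(y + dy, 0), h - 1)
                    xx = min(max(x + dx, 0), w - 1)
                    wgt = kernel[dy + radius][dx + radius]
                    v = image[yy][xx]
                    mu += wgt * v
                    sq += wgt * v * v
            local_std = math.sqrt(max(sq - mu * mu, 0.0))
            out[y][x] = (image[y][x] - mu) / (local_std + c)
    return out
```

    Computing a histogram of `mscn(region)` separately for a pictorial crop and a textual crop is one way to reproduce the comparison quoted above.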

    (5) (Model) Input Images:

    A. resized image

    B. image patch, e.g., size 24 X 24, 32 X 32 (overlap or non-overlap)
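    A small sketch of the patch option, assuming the image is a 2-D list of pixels. The helper name `extract_patches` and its `stride` parameter are hypothetical. A stride equal to the patch size gives non-overlapping patches, and a smaller stride gives overlapping ones.

```python
def extract_patches(image, size=32, stride=None):
    """Yield (top, left, patch) tuples from a 2-D list of pixels.

    stride == size (the default) gives non-overlapping patches;
    stride < size gives overlapping patches. Partial patches at the
    right and bottom borders are dropped.
    """
    stride = stride or size
    h, w = len(image), len(image[0])
    for top in range(0, h - size + 1, stride):
        for left in range(0, w - size + 1, stride):
            patch = [row[left:left + size] for row in image[top:top + size]]
            yield top, left, patch
```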

    (6) Benchmark Dataset: screen content image quality assessment database (SIQAD)

    current SOTA (SRCC): ~ 0.852

    Our Goal (SRCC): ~ 0.91 - 0.95

    SIQAD is built for evaluating the perceptual quality of SCIs. It consists of 20 pristine SCIs and 980 corresponding distorted images (20 references × 7 distortion types × 7 distortion levels). The distortion types are Gaussian noise (GN), Gaussian blur (GB), motion blur (MB), contrast change (CC), JPEG compression (JPEG), JPEG2000 compression (JP2K) and layer segmentation-based coding (LSC).

    The random train-test split is repeated 10 times, and the median SRCC and PLCC values are reported.
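    For reference, a dependency-free sketch of the two correlation measures: PLCC is the Pearson correlation between predicted and subjective scores, and SRCC is the Pearson correlation of their ranks. The function names are my own. Many IQA papers also fit a nonlinear mapping before computing PLCC, and that step is left out here.

```python
import math

def pearson(x, y):
    """Pearson linear correlation coefficient (PLCC)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)

def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=values.__getitem__)
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r

def spearman(x, y):
    """Spearman rank-order correlation coefficient (SRCC)."""
    return pearson(ranks(x), ranks(y))
```

    The reported number would then be `statistics.median(...)` over the SRCC (or PLCC) values of the 10 splits.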

    (7) Dataset Partition: 60% Training, 20% Validation, 20% Testing
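    One way to realize this partition, sketched under the assumption that the split is done over the 20 reference images (so that all 49 distorted versions of one reference stay in the same subset). The notes do not say how the split is made, so `split_by_reference` and its arguments are hypothetical.

```python
import random

def split_by_reference(reference_ids, seed, ratios=(0.6, 0.2, 0.2)):
    """Shuffle reference-image ids and cut them into train / val / test lists.

    Splitting by reference id (an assumption) keeps every distorted version
    of a given reference inside a single subset.
    """
    ids = sorted(reference_ids)
    random.Random(seed).shuffle(ids)
    n_train = round(ratios[0] * len(ids))
    n_val = round(ratios[1] * len(ids))
    return ids[:n_train], ids[n_train:n_train + n_val], ids[n_train + n_val:]

train_ids, val_ids, test_ids = split_by_reference(range(1, 21), seed=0)
print(len(train_ids), len(val_ids), len(test_ids))  # 12 4 4
```

    With 49 distorted images per reference this gives 588 / 196 / 196 distorted images. Changing `seed` for each of the 10 repetitions gives 10 different random splits.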

  2. Review SSIM Paper: Image Quality Assessment: From Error Visibility to Structural Similarity

    (1) Applications of an objective image quality metric:

    A. It can be used to dynamically monitor and adjust image quality.

    B. It can be used to optimize algorithms and parameter settings of image processing systems (i.e., use the IQA metric as the objective / loss function for low-level vision tasks such as image enhancement, image denoising, blind image deblurring, image super-resolution, lossy image compression, and image generation).

    C. It can be used to benchmark image processing systems and algorithms.

    (2) Why is designing a better IQA metric an **optimization** task / problem?

    Following the applications of an objective image quality metric listed above:

    1. A better (**optimized**) IQA metric can evaluate and assess the quality
    of distorted images.

    2. It can be used to **optimize** algorithms and parameter settings
    of image processing systems
    (i.e., use the IQA metric as the objective / loss function
    for low-level vision tasks such as
    image enhancement, image denoising, blind image deblurring,
    image super-resolution, lossy image compression,
    and image generation).

    Our primary goals in designing a better IQA metric are:

    1. **Fidelity**: keep the contents of the distorted images 
    (semantic information) unchanged
    2. **Quality**: try to improve the quality for low-level vision tasks

    (3) Objective IQA Types:

    A. Full-reference (FR): have a complete reference image

    B. No-reference (NR) or “blind” IQA: the reference image is not available

    C. Reduced-reference (RR): the reference image is only partially available as a set of extracted features (statistics)

    (4) Widely used IQA algorithms:

    A. Mean Squared Error

    Pros: ideal target for optimization

    a. based on a valid distance metric ($L_2$)

    b. satisfies positive definiteness

    c. symmetric

    d. satisfies the triangle inequality

    e. convex

    f. differentiable

    g. memoryless

    h. additive for independent sources of distortions

    i. energy preserving under orthogonal or unitary transformations


    Cons:

    a. poor correlation with perceptual image quality

    b. based on point-wise signal differences, which are independent of the underlying signal structure.

    Some thoughts on MSE vs. SSIM:

    The simplest implementation of this concept is the MSE, which objectively quantifies the strength of the error signal. But two distorted images with the same MSE may have very different types of errors, some of which are much more visible than others.

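    *If the sum of squared differences between the reference patch and the distorted patch is the same, the MSE is considered the same. However, "two distorted images with the same MSE may have very different types of errors, some of which are much more visible than others." SSIM tells us that the distance between A and its distorted versions A', A'', A''', etc. depends not only on how far apart they are, but also on their own characteristics (e.g., the structure of the objects in the image).*

    (The most essential way to understand image quality) A and B have different content. Even if the distance from A to A' equals the distance from B to B', their perceptual content differs, which is why the metric is called adaptive to local content.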
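    A toy 1-D illustration of the point above, using made-up signal values: a constant brightness shift and an alternating ±8 error give exactly the same MSE, but only the shift preserves the structure of the reference. Pearson correlation is used here as a rough stand-in for the structure comparison (an assumption, without SSIM's stabilizing constants).

```python
import math

def mse(x, y):
    """Mean squared error between two equally long signals."""
    return sum((a - b) ** 2 for a, b in zip(x, y)) / len(x)

def correlation(x, y):
    """Pearson correlation, used as a simple proxy for structural similarity."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)

# Hypothetical low-contrast ramp used as the "reference patch".
reference = [100 + 2 * i for i in range(16)]
shifted = [v + 8 for v in reference]                                 # luminance shift
noisy = [v + (8 if i % 2 else -8) for i, v in enumerate(reference)]  # alternating error

print(mse(reference, shifted), mse(reference, noisy))  # both 64.0
print(correlation(reference, shifted), correlation(reference, noisy))
```

    Both distortions have an MSE of 64, yet the correlation with the reference is clearly lower for the alternating error than for the shift.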

    FYI, what are structural distortions and texture resampling?

    structural distortions of an image: artifacts due to noise, blur, or compression

    texture resampling: exchanging a texture region with a new sample

    (5) Generic IQA Framework:

    (6) Hypothesis of the SSIM: Human Visual System (HVS) is highly adapted for extracting structural information.

    Pixels in natural images exhibit strong dependencies, especially when they are spatially proximate → these dependencies carry important information about the structure of the objects in the visual scene.

    (7) SSIM Philosophy V.S. Error Sensitivity Philosophy:

    A. Quantify Image degradation: perceived changes in structural information variation V.S. perceived errors

    B. Paradigm: top-down approach V.S. bottom-up approach

    C. Natural image complexity and de-correlation problem: structural changes V.S. accumulating the errors

    (8) How to calculate the SSIM Index?:

    A. Independent three components: luminance, contrast, and structure

    B. luminance: mean intensity

    (The luminance comparison function is qualitatively consistent with Weber’s law, which has been widely used to model light adaptation (also called luminance masking) in the HVS)

    (*the HVS is sensitive to the relative luminance change, and not the absolute luminance change.*)

    C. Contrast: standard deviation

    (For the same amount of contrast change, this measure is less sensitive at high base contrast than at low base contrast. This is consistent with the contrast-masking feature of the HVS.)

    D. Structure: normalized signals (unit standard deviation)


    1. Unit vectors each lying in the hyperplane → indicating the structures of the two images
    2. The correlation (inner product) between these is a simple and effective measure to quantify the structural similarity.

    The structure comparison formula:

    E. Overall Similarity Measure:
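    The formula images are not reproduced in these notes. As a reminder, the definitions as I recall them from the SSIM paper are written out below. Here $\mu$ is the local mean, $\sigma$ the local standard deviation, $\sigma_{xy}$ the local covariance, and $L$ the dynamic range (255 for 8-bit images). Check the constants against the paper before relying on them.

```latex
\begin{aligned}
l(\mathbf{x},\mathbf{y}) &= \frac{2\mu_x\mu_y + C_1}{\mu_x^2 + \mu_y^2 + C_1},
  & C_1 &= (K_1 L)^2,\quad K_1 = 0.01 \\
c(\mathbf{x},\mathbf{y}) &= \frac{2\sigma_x\sigma_y + C_2}{\sigma_x^2 + \sigma_y^2 + C_2},
  & C_2 &= (K_2 L)^2,\quad K_2 = 0.03 \\
s(\mathbf{x},\mathbf{y}) &= \frac{\sigma_{xy} + C_3}{\sigma_x\sigma_y + C_3},
  & C_3 &= C_2 / 2 \\
\mathrm{SSIM}(\mathbf{x},\mathbf{y}) &=
  [l(\mathbf{x},\mathbf{y})]^{\alpha}\,[c(\mathbf{x},\mathbf{y})]^{\beta}\,[s(\mathbf{x},\mathbf{y})]^{\gamma} \\
\mathrm{SSIM}(\mathbf{x},\mathbf{y}) &=
  \frac{(2\mu_x\mu_y + C_1)(2\sigma_{xy} + C_2)}{(\mu_x^2 + \mu_y^2 + C_1)(\sigma_x^2 + \sigma_y^2 + C_2)}
  & &\text{for } \alpha = \beta = \gamma = 1
\end{aligned}
```

    The last line follows from the product form because, with $C_3 = C_2/2$, the $2\sigma_x\sigma_y + C_2$ factor in the numerator of $c$ cancels against the denominator of $s$.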

    (9) Relationships between the SSIM and other metrics:

    Equal-distortion contours drawn around three different example reference vectors, each of which represents the local content of one reference image.

    Each contour represents a set of images with equal distortions relative to the enclosed reference image.

    **"adaptive to local content"**

    illustrated geometrically in a vector space of image components (e.g., pixel intensities, extracted features, or transformed linear coefficients)

    shape: depends on the metric formula, e.g., MSE→circle

    size: even for the same amount of distortion, contours around different local contents may differ in size, depending on the signal magnitude

    (a) Minkowski metric (assuming an exponent of 2, i.e., MSE): circle, each contour has the same size and shape → perceptual distance corresponds to Euclidean distance

    (b) Minkowski metric (assuming different image components are weighted differently using the CSF): ellipses, each contour has the same size

    (c) Adaptive distortion metric: rescaling the equal-distortion contours according to the signal magnitude

    (d) Contrast masking (magnitude weighting) combination followed by component weighting

    (e) SSIM: separately computes a comparison of two independent quantities: the vector lengths, and their angles. And the contours will be aligned with the axes of a polar coordinate system

    (f) SSIM: computed with different exponents compared with (e)

    (this may be viewed as an adaptive distortion metric, but unlike previous models, both the size and the shape of the contours are adapted to the underlying signal.)

    (10) Why use local patches?

    e.g., an 8 × 8 square window that moves pixel-by-pixel over the entire image → undesirable “blocking” artifacts in the resulting SSIM index map

    SSIM: 11 × 11 circular-symmetric Gaussian weighting function → locally isotropic property

    A. luminance and contrast can vary across a scene

    B. image statistical features are usually highly spatially non-stationary

    C. image distortions may also be space-variant

    D. only a local area in the image can be perceived with high resolution by the human observer at one time instance

    E. localized quality measurement can provide a spatially varying quality map of the image, which delivers more information about the quality degradation of the image and may be useful in some applications.

    As a result, for the SSIM index, they used spatial patches extracted from each image.
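    A plain-Python sketch of the windowed computation described above, assuming two aligned grayscale images stored as 2-D lists. The 11 × 11 Gaussian window with standard deviation 1.5 and the constants `k1 = 0.01`, `k2 = 0.03` follow the SSIM paper as I recall it. The function names and the choice to evaluate only fully contained windows are my own.

```python
import math

def gaussian_window(size=11, sigma=1.5):
    """Circular-symmetric Gaussian weights, normalised to unit sum."""
    r = size // 2
    w = [[math.exp(-((i - r) ** 2 + (j - r) ** 2) / (2 * sigma ** 2))
          for j in range(size)] for i in range(size)]
    total = sum(map(sum, w))
    return [[v / total for v in row] for row in w]

def ssim_window(x, y, top, left, win, c1, c2):
    """SSIM index of the window whose top-left corner is (top, left)."""
    n = len(win)
    cells = [(win[i][j], x[top + i][left + j], y[top + i][left + j])
             for i in range(n) for j in range(n)]
    mx = sum(w * a for w, a, _ in cells)   # weighted means
    my = sum(w * b for w, _, b in cells)
    vx = sum(w * (a - mx) ** 2 for w, a, _ in cells)   # weighted variances
    vy = sum(w * (b - my) ** 2 for w, _, b in cells)
    cov = sum(w * (a - mx) * (b - my) for w, a, b in cells)
    return ((2 * mx * my + c1) * (2 * cov + c2)) / (
        (mx * mx + my * my + c1) * (vx + vy + c2))

def mean_ssim(x, y, size=11, sigma=1.5, k1=0.01, k2=0.03, dynamic_range=255):
    """Return (mean SSIM, quality map) over all fully contained windows."""
    c1, c2 = (k1 * dynamic_range) ** 2, (k2 * dynamic_range) ** 2
    win = gaussian_window(size, sigma)
    h, w = len(x), len(x[0])
    quality_map = [[ssim_window(x, y, t, l, win, c1, c2)
                    for l in range(w - size + 1)]
                   for t in range(h - size + 1)]
    values = [v for row in quality_map for v in row]
    return sum(values) / len(values), quality_map
```

    The returned `quality_map` is the spatially varying quality map mentioned in point E, and its mean is the single score for the image pair.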

    One problem: the reference patch and the distorted patch must be perfectly aligned with each other

    Another open question: is it really better to compare quality patch by patch, or can we compare the full images directly?

  3. Review SC IQA code

Tomorrow's Work:

  1. Continue to review SC IQA code
  2. Connect to the server and download the dataset
  3. Run the code and get results