Online Supplement

ViS3: An Algorithm for Video Quality Assessment via Analysis of Spatial and Spatiotemporal Slices

P. V. Vu and D. M. Chandler

Journal of Electronic Imaging, 23(1), 013016, Feb 2014, doi: 10.1117/1.JEI.23.1.013016

General information

ViS3 algorithm
The ViS3 algorithm estimates video quality by measuring spatial distortion and spatiotemporal dissimilarity separately. To estimate perceived video quality degradation due to spatial distortion, both the detection-based strategy and the appearance-based strategy of our MAD algorithm are adapted and applied to groups of normal video frames. A simple model of temporal weighting using optical-flow motion estimation is employed to give greater weight to distortions in slow-moving regions. To estimate spatiotemporal dissimilarity, we extend the models of Watson–Ahumada and Adelson–Bergen, which have been used to measure energy of motion in videos, to the spatiotemporal slice (STS) images and measure the local variance of spatiotemporal neural responses. The spatiotemporal response is measured by filtering the STS image with one one-dimensional (1-D) spatial filter and one 1-D temporal filter. The overall estimate of perceived video quality degradation is given by the geometric mean of the spatial distortion and spatiotemporal dissimilarity values.
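The final combination step described above can be sketched in a few lines. This is only an illustration of a geometric mean of two scores; the function name combine_scores and its argument names are hypothetical and do not come from the released code, and the computation of the two input values is not shown.

```python
import math


def combine_scores(spatial_distortion: float,
                   spatiotemporal_dissimilarity: float) -> float:
    """Geometric mean of two non-negative quality-degradation scores.

    Hypothetical helper: the names are illustrative assumptions.
    """
    if spatial_distortion < 0 or spatiotemporal_dissimilarity < 0:
        raise ValueError("both scores must be non-negative")
    return math.sqrt(spatial_distortion * spatiotemporal_dissimilarity)
```

Because the two values are multiplied, the combined estimate is zero whenever either input is zero, and it grows only when both inputs grow.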

1. Spatial distortion  

The figure below shows the first frame (a) and the last frame (b) of one group of frames (GOF) of the video mc2_50fps.yuv from the LIVE video database. The visible distortion map (c), the statistical difference map (d), the motion magnitude map (e), and the spatial distortion map (f) computed for this GOF are also shown. As seen from the visible distortion map (c) and the statistical difference map (d), in regions of high visible distortion (e.g., the train and the numbers in the calendar), the spatial distortion map is weighted more by the statistical difference map. In regions of low visible distortion (e.g., the wall background), the spatial distortion map is weighted more by the visible distortion map. As also seen from Figs. (c) and (d), the region corresponding to the train at the bottom of the frames is more heavily distorted than the other regions. However, the fast movement of the train, which is reflected in the bottom of the motion magnitude map (e), reduces the visibility of the distortion, making this region less bright in the spatial distortion map (f).

(a) First frame of the distorted GOF (b) Last frame of the distorted GOF (c) Visible distortion map
(d) Statistical difference map (e) Motion magnitude map (f) Spatial distortion map
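The effect of motion on the spatial distortion map can be illustrated with a small sketch. The weighting function used here, 1 / (1 + scale * motion), is an assumption chosen only because it decreases with motion magnitude; it is not the weighting model from the paper, and the names motion_weighted_distortion and scale are hypothetical.

```python
import numpy as np


def motion_weighted_distortion(distortion_map, motion_magnitude, scale=1.0):
    """Down-weight distortion in fast-moving regions.

    Assumption: weights = 1 / (1 + scale * motion), an illustrative
    decreasing function, not the model used by ViS3.
    """
    distortion_map = np.asarray(distortion_map, dtype=float)
    motion_magnitude = np.asarray(motion_magnitude, dtype=float)
    if distortion_map.shape != motion_magnitude.shape:
        raise ValueError("maps must have the same shape")
    if np.any(motion_magnitude < 0):
        raise ValueError("motion magnitude must be non-negative")
    # Static pixels keep weight 1; faster pixels get smaller weights.
    weights = 1.0 / (1.0 + scale * motion_magnitude)
    return weights * distortion_map
```

With any weighting of this kind, a heavily distorted but fast-moving region (such as the train) ends up darker in the weighted map than an equally distorted static region.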

2. Spatiotemporal dissimilarity  

As observed from the video mc2_50fps.yuv (LIVE), the spatial distortion occurs more frequently in the middle frames. These middle frames are also heavily distorted in nearly every spatial region. This is well captured by the spatiotemporal dissimilarity map in Fig. (c) (upper), which is brighter in the middle and along the entire spatial dimension. In the video PartyScene_dst_09.yuv (CSIQ), the spatial distortion in the center of the video is smaller than the distortion in the surrounding area. This is also reflected in the spatiotemporal dissimilarity map in Fig. (c) (lower), where the surrounding regions are brighter than the center regions across the temporal dimension.

mc2_50fps.yuv (LIVE)
PartyScene_dst_09.yuv (CSIQ)
  (a) Reference STS images (b) Distorted STS images (c) Spatiotemporal dissimilarity maps
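As a rough sketch of how STS images such as those above can be obtained and filtered, the code below slices a video array along time and applies one 1-D spatial and one 1-D temporal filter in sequence. It assumes the video is stored as a NumPy array of shape (T, H, W); the function names and the kernels passed in are hypothetical placeholders, not the filters used by ViS3.

```python
import numpy as np


def sts_slices(video, row, col):
    """Return two spatiotemporal slices of a (T, H, W) video.

    horizontal: fixed row, shape (T, W); vertical: fixed column, shape (T, H).
    """
    video = np.asarray(video, dtype=float)
    if video.ndim != 3:
        raise ValueError("video must have shape (T, H, W)")
    horizontal = video[:, row, :]
    vertical = video[:, :, col]
    return horizontal, vertical


def separable_filter(sts, spatial_kernel, temporal_kernel):
    """Filter an STS image along space (axis 1), then along time (axis 0).

    Kernels are placeholders and should be no longer than the axis
    they are applied to, so that mode="same" preserves the shape.
    """
    sts = np.asarray(sts, dtype=float)
    out = np.apply_along_axis(
        lambda v: np.convolve(v, spatial_kernel, mode="same"), 1, sts)
    out = np.apply_along_axis(
        lambda v: np.convolve(v, temporal_kernel, mode="same"), 0, out)
    return out
```

Applying the same filtering to the reference and distorted STS images gives two response maps whose local statistics can then be compared.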

Supplementary Information

Notes on the change of threshold value
As shown in Equation (17) of the manuscript, the threshold value of 0.9 was chosen empirically so that a relatively high positive correlation (greater than the threshold value) is still considered perfect by the algorithm. We tested this choice by varying the threshold value from 0.85 to 0.95 and observing the performance of the algorithm. The results are shown in the following table.

Algorithm  Metric  Database   0.85    0.875   0.90    0.925   0.95   (threshold value)
ViS2       SROCC   LIVE       0.753   0.745   0.736   0.723   0.707
ViS2       SROCC   IVPL       0.807   0.813   0.817   0.821   0.824
ViS2       SROCC   CSIQ       0.831   0.832   0.831   0.831   0.830
ViS2       CC      LIVE       0.762   0.755   0.746   0.734   0.722
ViS2       CC      IVPL       0.816   0.819   0.823   0.826   0.829
ViS2       CC      CSIQ       0.828   0.829   0.830   0.829   0.828
ViS3       SROCC   LIVE       0.820   0.819   0.816   0.810   0.807
ViS3       SROCC   IVPL       0.894   0.895   0.896   0.897   0.897
ViS3       SROCC   CSIQ       0.842   0.841   0.841   0.839   0.837
ViS3       CC      LIVE       0.833   0.831   0.829   0.825   0.822
ViS3       CC      IVPL       0.894   0.895   0.896   0.897   0.897
ViS3       CC      CSIQ       0.832   0.831   0.830   0.828   0.825
Table I: Overall performance of ViS2 and ViS3 with different threshold values.

As we see in the table, the performance of the algorithm is relatively robust to small changes in this threshold value. We select the threshold value of 0.9 because it generally provides good performance on all three databases. However, the optimal choice of the threshold value remains an open research question.
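The role of the threshold can be illustrated with a short sketch. Equation (17) is not reproduced in this supplement, so the code below only shows the general idea stated above, that a correlation greater than the threshold is treated as perfect; the function name threshold_correlation is hypothetical, and passing lower values through unchanged is an assumption.

```python
def threshold_correlation(rho: float, threshold: float = 0.9) -> float:
    """Treat a correlation above `threshold` as perfect (1.0).

    Assumption: values at or below the threshold are returned
    unchanged; the manuscript's Equation (17) may treat them differently.
    """
    if not -1.0 <= rho <= 1.0:
        raise ValueError("rho must lie in [-1, 1]")
    return 1.0 if rho > threshold else rho
```

For example, threshold_correlation(0.95) returns 1.0, while threshold_correlation(0.5) returns 0.5. Raising the threshold toward 0.95 makes fewer correlations count as perfect, which is the parameter varied in Table I.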

Links to the Other Databases Reported in the Paper
The following list provides access information for the databases reported in the paper.