Another saga for corrupt images

I had blogged in the past on how to track corrupt images while rendering a SSRS report containing images. In this post, I shall talk about generating reports for reports for images which have a corrupted JPEG stream.

If you have an image which has a corrupt JPEG stream, then the report will render without any issues and an export of the report to a PDF document will report no errors. However, when you open the PDF document, you will be greeted with the error seen in the screenshot below reporting “An internal error occurred”:

image

Now this can be a bit disconcerting when you are automating your report generation and are not aware of any exceptions reported. Interestingly when this report is rendered in MHTML format, the file can be opened without issues but you still see the corrupt image stream. The crux of the matter lies in how PDF documents are rendered and when a corrupt JPEG stream is encountered. This issue manifests itself while creating a PDF document with such a corrupt image stream outside of SQL Server Reporting Services.

Now that we know that there are image coding standards at play here and nothing much can be done, what are the options.

For the curious mind, the image when viewed in a picture viewing application shows the corrupt stream as seen below:

image

The first option is to use an image processing executable and detect all images that you will be processing in your report before you render it into a PDF format.

Image Magick DLLs and binaries are one such option. You could either choose to use the binaries that are available on the site or you could create your own picture verification package using their DLLs available on the site. Their code is available on CodePlex as well. The binaries available on their site has an executable called “Identify”. This can be used to identify corrupt image streams. The output of the tool when used to evaluate an image will be as follows:

identify.exe: Corrupt JPEG data: bad Huffman code `C:\Tempdb\1.jpg’ @ warning/jp
eg.c/JPEGWarningHandler/346.
identify.exe: Corrupt JPEG data: 20 extraneous bytes before marker 0xd9 `C:\Temp
db\1.jpg’ @ warning/jpeg.c/JPEGWarningHandler/346.

The other way to workaround this issue would be to use Custom Code within your report to re-save the image. This doesn’t remove the corrupt image stream but the image snippet that you see in the screenshot above now becomes part of your image. This will help you generate your PDF documents correctly and open them for viewing later. However, if you want to correct the corrupt JPEG stream, then you will need to store the correct image in your database.

Steps for using custom code to prevent PDF opening failure

1. Paste the VBCode below in the Code section of the Report.

' From the data base the data is returned in the form of byte  array. So we’re writing a function that accepts the byte array as parameter.

' Also it is going to return the new image as byte array.

Public Shared Function ConvertImage(ByVal imgData As Byte()) As Byte()

'Stream to hold the source image (jpeg).

Dim msSource As New System.IO.MemoryStream()

'Stream to hold the converted image (png).

Dim msDest As New System.IO.MemoryStream()

' Reading the image from byte array to the source stream.

msSource.Write(imgData, 0, imgData.Length)

msSource.Flush()

'Image object that will store the image extracted from the source stream.

Dim imgSource As System.Drawing.Image = System.Drawing.Image.FromStream(msSource)

'Saving the New, Converted Image in to the destination stream.

imgSource.Save(msDest, System.Drawing.Imaging.ImageFormat.Png)

'Converting the New stream into Byte array, to be used inside the reporting service image control.

Dim imgDest(msDest.Length) As Byte

imgDest = msDest.GetBuffer()

Return imgDest

End Function

2. Now click on the References tab, Under Assembly Name, click on the ellipses button “…”.

3. Locate and select System.Drawing.dll, Click on Add and then Click Ok.

4. Now go to the report, Image control Properties and click on the Value. which would look something similar to this =First(Fields!Photo.Value, “DataSetName”)

5. Replace the above value with the expression, =Code.ConvertImage(First(Fields!Photo.Value,”DataSetName”)).

6. Export the report to PDF and you will now notice that the previous “internal error” is not reported.

Since you are adding custom code to your report to re-save your binary image stream, you will incur an additional processing overhead while generating the reports. If your PDF contains a large number of images, then you will have to factor in the additional resource and time consumed to perform this sanitization process!

IMHO using the first option of checking the correctness of the JPEG stream using a separate executable would be much more feasible in case you are processing 1000+ images in a single report.

A special shout-out to my colleagues Selva [Blog] and Arvind [Blog] for helping me out on this issue!

Disclaimer: The Sample Code is provided for the purpose of illustration only and is not intended to be used in a production environment. THIS SAMPLE CODE AND ANY RELATED INFORMATION ARE PROVIDED “AS IS” WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE IMPLIED WARRANTIES OF MERCHANTABILITY AND/OR FITNESS FOR A PARTICULAR PURPOSE. Image Magick is a third-party utility which has been referenced in this post. Please refer documentation on the Image Magick website regarding licensing, warranty and usage before implementing the use of the same in your environment.

 

Advertisement

The case of the corrupt images

I was recently working on an environment where there was a need to find find out corrupted image streams in a database in the form of BLOBs. You might say why not run a CHECKDB on the entire table and find out if there are any inconsistencies. Well if it were that simple, I probably wouldn’t have started penning this post! 🙂 Let me give you a bit of history here to set things in perspective…. The binary stream was inconsistent when the data was inserted into the database. So this was a clear case of “Garbage In Garbage Out”.

imageWhat made the problem take a convoluted turn was the use of this BLOB data in SQL Server Reporting Services reports. If you have reports which convert BLOB data into images before rendering them into reports, then the rendering would fail if the binary stream is inconsistent. When you have a few images in the table, it’s quite possible to manually identify the inconsistent BLOB data. Now when the number of images stored in the database moves to tens of millions, I am sure the DBA will find it very amusing to spend a years looking through the images manually! So what really shows up as the image when you have a corrupted binary stream in a SSRS report is just a Red “x”! (see left).

Since the manual option didn’t seem very palatable, I thought it might be time for some automation. And that is where C# came to my rescue. The logic was very simple. Pick an image of the database, store it on the filesystem in JPG/PNG/BMP format and then verify it’s validity. There are three ways that the image being rendered can have an issue:

1. There is no image in the BLOB column in the database i.e. the column value is NULL. This is quite easy to circumvent by using the IS NOT NULL filter or the ISNULL function.
2. The image column value is not null but it stores a 0KB image which will again create a problem during the report rendering phase. This still ain’t that bad a problem as we can search for images with binary data values as 0x00 or 0x0 with relative ease!
3. The third type of image is the main problem where there might be a few bit flips or part of the binary stream is inconsistent. This makes it exceptionally difficult to locate beforehand which images will fail to render while generating the report.

0x89504E470D0A1A0A0000000….
NULL
0x0

0x89504E47309AFD068163D86……

If you look at the binary streams above, the last three fall under categories described above. All the image streams in read will create a problem. The first and last streams look valid but the fourth stream is invalid even though if it has a valid header.

Using the code below, you could identify the three different failure scenarios by scanning all the images that are stored as BLOBs in your database. Though if you have the need to undertake this task, then it is advisable that you perform such a validity check during off business hours.

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.IO;
using System.Drawing;
using System.Data;
using System.Data.SqlClient;

namespace ImageCheck
{

class Program
{
static void Main(string[] args)
{
// While loop to iterate through the database table
int ctr = 1;
while (ctr <= 24)
{
ReValidateImage(ctr);
ctr++;
}

// Wait for user input before exiting
Console.ReadKey();
}


public static void ReValidateImage(int SNo)
{

bool vImgCheck;
// SQL query to fetch the image
string sql = "SELECT SNo,[Document] from dbo.myTable WHERE SNo = " + SNo.ToString();

SqlConnection con = new SqlConnection("Server=.;Integrated Security=True;database=ReportDB");
SqlDataAdapter da = new SqlDataAdapter(sql, con);
SqlCommandBuilder MyCB = new SqlCommandBuilder(da);
DataSet ds = new DataSet("MyImages");
byte[] MyData = new byte[0];

// Fetch the image into a dataset
da.Fill(ds, "MyImages");
DataRow myRow;

myRow = ds.Tables["MyImages"].Rows[0];

// Store the image on the filesystem
try
{
if (myRow["document"] != null)
{
MyData = (byte[])myRow["document"];
int ArraySize = new int();
ArraySize = MyData.GetUpperBound(0);
// Handle ZERO kb image size
if (ArraySize > 0)
{
FileStream fs = new FileStream(@"C:\1.jpg", FileMode.OpenOrCreate, FileAccess.Write);
fs.Write(MyData, 0, ArraySize);
fs.Close();
// Check image saved on filesystem for validity
using (FileStream fsRead = new FileStream(@"C:\1.jpg", FileMode.Open, FileAccess.Read, FileShare.ReadWrite))
{
using (StreamReader sr = new StreamReader(fsRead))
{
vImgCheck = IsValidImage(fsRead);
if (vImgCheck == true)
Console.WriteLine("Image is OK for iteration " + SNo.ToString());
else
Console.WriteLine("Image is corrupt for iteration " + SNo.ToString());
}
}
}
else
{
// Report ZERO kb images
Console.WriteLine("Image size is ZERO for iteration " + SNo.ToString());
}
}
}
catch (SystemException ex)
{
// Handle NULLs extracted from the database
if (ex.GetType().ToString() == "System.InvalidCastException")
Console.WriteLine("Cannot export image for iteration " + SNo.ToString());
}
}
// Function to check validity of an image
public static bool IsValidImage(Stream stream)
{
try
{
try
{
// Try and check if the seek is possible. If it fails, then we have problems!
stream.Position = 0;
stream.Seek(0, SeekOrigin.Begin);
}
catch { }
using (Image img = Image.FromStream(stream))
{
// IF there are no errors, then report success
img.Dispose();
return true;
}
}
catch
{
return false;
}


}
}

A sample output of the above code when executed against a table which has two columns SNo (int) and Document (image) with SNo being the primary key would look like this:

Image is OK for iteration 1
Image size is ZERO for iteration 2
Cannot export image for iteration 3
Image is corrupt for iteration 4
Image is corrupt for iteration 5
Image is corrupt for iteration 6

So what can you do with the above code1? Firstly, you would need to modify the functions to align with the schema of your table. The second TO-DO item would be to write another function for storing the inconsistent binary stream row details in a text file or into a database table. You could even add an user interface if you want to go all out fancy!

Disclaimer: This Sample Code is provided for the purpose of illustration only and is not intended to be used in a production environment. THIS SAMPLE CODE AND ANY RELATED INFORMATION ARE PROVIDED “AS IS” WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE IMPLIED WARRANTIES OF MERCHANTABILITY AND/OR FITNESS FOR A PARTICULAR PURPOSE.