Lab Report
Compute CRC-32s for Files, a Directory, or a Volume
Purpose
The purpose of this project is to show how to compute CRC-32s for one or more files
and to form a "metaCRC" based on an ordered sequence of files for a directory or volume. A "scan"
operation saves a file containing the observed CRC values, so the integrity of the scanned files
can be checked at any later time in a "verify" operation.
Background
For QA/QC purposes, knowing that a file, a directory or even a volume is exactly
the same as another is very useful. After "burning" several CD-Rs I discovered that some of the
files were not being written reliably. I wanted to find a way to verify whether the CD "burn" had
exactly the same contents as the original disk copy.
For example, I discovered a CD-R with 646 directories, 13,421 files and 509,314,783 bytes had 12 bad files! As long as I could identify which files were "bad," and verify the "bad" files could be safely ignored, I could then "accept" the CD as a valid backup copy even with bad files. This Lab Report is about a FileCheck utility that can be used to automate this verification of CD-Rs or other media.
Materials and Equipment
Software Requirements
Windows 95/98
Delphi 3/4/5 (to recompile)
Hardware Requirements
VGA display
Procedure
Discussion
"Scan" and "Verify" are the two main functions of this program. A volume, directory
or file can be scanned with resulting data written to a File List (for viewing by a human much like a DOS
DIR list) and a Verify File.
An example "Log" for a successful volume scan appears as follows:
Volume e:\
A File List disk file created during a scan is an expanded version of the information shown by the DOS "DIR" command and is intended to be human readable.
Sample FileList disk file
FileCheck: e:\ 09/07/1999 21:45
A Verify disk file created during a scan is an ASCII text file. This file is intended for processing by the "Verify" operation, or other computer programs. (Editing this file may cause the Verify operation to report erroneous results.)
Sample Verify disk file
V
Label = EFG-DELL-E VolSer = D7D4B00A
At a later time, the information stored in the Verify File can be checked to see that all CRCs match the original values. A Print button allows printing the results of the Scan or Verify operations for documentation purposes.
A volume "scan" is much like the scan of the root directory of a volume, except that the volume label and volume serial number are stored as part of the information about a volume. A volume "scan" always implies that all subdirectories should be scanned. The Subdirs Checkbox allows one to specify whether subdirectories should be scanned in a Directory "scan."
If multiple instances of FileCheck are run, be sure that unique File List and Verify files are specified. If you blank either of the fields for these files, the corresponding file is not created.
The BitBtnScanClick method is called for a "click" on any of the Scan buttons. The Tag value of each button is used to determine whether the scan is for the volume in the TDriveComboBox, the directory in the TDirectoryListBox, or the file in the TFileListBox. A further helper routine, ScanDirectoryTarget, is called for processing a volume or directory scan.
The BitBtnVerifyFileClick method is called for the "verify" operation. Many of the variables used for scanning are replicated within this routine so that (in theory) a scan and a verify could run simultaneously without interfering with each other.
See the CRC Calculator Lab Report for how to compute the CRC-16/CRC-32 of a character string, including source code for a CalcFileCRC32 procedure from the CRC32.PAS unit.
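For readers without the CRC Calculator Lab Report at hand, here is a minimal sketch of what a table-driven CalcCRC32 might look like. The parameter list is only inferred from the calls shown in this report, and the table-building routine, BuildCRCTable, is a hypothetical name; the real CRC32.PAS unit may differ. The sketch targets a recent Delphi or Free Pascal compiler rather than Delphi 3/4/5.

```pascal
program Crc32Sketch;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$IFDEF MSWINDOWS}{$APPTYPE CONSOLE}{$ENDIF}
uses
  SysUtils;

type
  PByteRef = ^Byte;   // local pointer type so the sketch does not depend on PByte

var
  CRCTable: array[0..255] of LongWord;

// Build the 256-entry lookup table for the reflected polynomial $EDB88320.
procedure BuildCRCTable;
var
  i, k: Integer;
  c: LongWord;
begin
  for i := 0 to 255 do
  begin
    c := LongWord(i);
    for k := 1 to 8 do
      if (c and 1) <> 0 then
        c := (c shr 1) xor LongWord($EDB88320)
      else
        c := c shr 1;
    CRCTable[i] := c;
  end;
end;

// Update CRCValue with ByteCount bytes starting at p (signature is an assumption).
procedure CalcCRC32(p: Pointer; ByteCount: LongWord; var CRCValue: LongWord);
var
  q: PByteRef;
  i: LongWord;
begin
  q := p;
  i := 0;
  while i < ByteCount do
  begin
    CRCValue := (CRCValue shr 8) xor CRCTable[(CRCValue xor q^) and $FF];
    Inc(q);
    Inc(i);
  end;
end;

const
  Check: AnsiString = '123456789';
var
  crc: LongWord;
begin
  BuildCRCTable;
  crc := $FFFFFFFF;                        // same initial value as CalcFileCRC32
  CalcCRC32(PAnsiChar(Check), Length(Check), crc);
  crc := not crc;                          // same final inversion as CalcFileCRC32
  WriteLn(IntToHex(Int64(crc), 8));        // CBF43926, the standard CRC-32 check value
end.
```

Because the running value is passed as a VAR parameter, the caller can feed a file through the routine one buffer at a time, which is what the BlockRead version of CalcFileCRC32 relies on.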
Two versions of CalcFileCRC32 are available. The StreamIO conditional compilation variable selects I/O using Streams or the older BlockRead routine. Since I have observed that BlockRead is still faster than Stream.LoadFromFile, the default setting is NoStreamIO.
Here are the two possible ways the CRC32 of a file is computed using the CalcCRC32 procedure:
CalcFileCRC32 using a TMemoryStream

    // The CRC-32 value calculated here matches the one from the PKZIP program.
    // Use MemoryStream to read file in binary mode.
    PROCEDURE CalcFileCRC32 (FromName: STRING; VAR CRCvalue: DWORD;
                             VAR TotalBytes: TInteger8; VAR error: WORD);
      VAR
        Stream: TMemoryStream;
    BEGIN
      error := 0;
      CRCValue := $FFFFFFFF;
      Stream := TMemoryStream.Create;
      TRY
        TRY
          Stream.LoadFromFile(FromName);
          IF   Stream.Size > 0
          THEN CalcCRC32 (Stream.Memory, Stream.Size, CRCvalue)
        EXCEPT
          ON E: EReadError DO
            error := 1   // arbitrarily set this for now
        END;

        CRCvalue := NOT CRCvalue;
        TotalBytes := Stream.Size
      FINALLY
        Stream.Free
      END
    END {CalcFileCRC32};
An error code of 1 is returned from this procedure when an EReadError exception is encountered, since the exception message string did not have any additional useful information. (See the IOResult values below for BlockRead.)
CalcFileCRC32 using BlockRead

    // The CRC-32 value calculated here matches the one from the PKZIP program.
    // Use BlockRead to read file in binary mode.
    PROCEDURE CalcFileCRC32 (FromName: STRING; VAR CRCvalue: DWORD;
                             VAR TotalBytes: TInteger8; VAR error: WORD);
      CONST
        BufferSize = 32768;
      TYPE
        BufferIndex = 0..BufferSize-1;
        TBuffer     = ARRAY[BufferIndex] OF BYTE;
        pBuffer     = ^TBuffer;
      VAR
        BytesRead: INTEGER;
        FromFile : FILE;
        IOBuffer : pBuffer;
    BEGIN
      New(IOBuffer);
      TRY
        FileMode := 0;   {Turbo default is 2 for R/W; 0 is for R/O}
        CRCValue := $FFFFFFFF;
        TotalBytes := 0;   // set before RESET so the value is defined if the open fails
        ASSIGN (FromFile,FromName);
        {$I-} RESET (FromFile,1); {$I+}
        error := IOResult;
        IF   error = 0
        THEN BEGIN
          REPEAT
            {$I-}
            BlockRead (FromFile, IOBuffer^, BufferSize, BytesRead);
            {$I+}
            error := IOResult;
            IF   (error = 0) AND (BytesRead > 0)
            THEN BEGIN
              CalcCRC32 (IOBuffer, BytesRead, CRCvalue);
              TotalBytes := TotalBytes + BytesRead;  // can't use INC with COMP
            END
          UNTIL (BytesRead = 0) OR (error > 0);
          CLOSE (FromFile)
        END;
        CRCvalue := NOT CRCvalue
      FINALLY
        Dispose(IOBuffer)
      END
    END {CalcFileCRC32};
The most likely error values returned by this routine are as follows:
Error | Brief Description
30 | ERROR_READ_FAULT occurs when the system cannot read from the specified device.
31 | ERROR_GEN_FAILURE occurs when a device attached to the system is not functioning.
32 | ERROR_SHARING_VIOLATION occurs when the process cannot access the file because it is being used by another process. This is likely to happen if you try to scan the Windows swap file, e.g., "Error Code 32 reading file c:\WINDOWS\WIN386.SWP".
Whenever a read error occurs, an error message is displayed in the log and the CRC is assigned a value of $00000000.
A CRC-32 value can be computed for each file in a directory. The CRC of an ordered list of files in a directory could be computed directly over the file contents, but maintaining the information about that computation is cumbersome. So instead of a "true" directory CRC, a "MetaCRC" is computed for a well-ordered list of files in a directory. This MetaCRC is simply a CRC of the file CRCs.
A Directory MetaCRC is a CRC of the file CRCs in a directory, which are processed in alphabetical order. Each of the file CRCs is converted to an 8-byte hex string for computing the Directory MetaCRC. (This facilitates a similar computation on machines of a different endianness. That is, CRCing the list of file hex CRCs will give the same result on either a PC with little-endian words or a UNIX workstation with big-endian words.)
The Volume MetaCRC is a CRC of the Directory MetaCRCs taken in alphabetical order.
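The MetaCRC idea can be sketched in a few lines. This is only an illustration under stated assumptions: the hex strings are taken to be uppercase (as IntToHex produces) and concatenated with no separator, the function names MetaCRC and UpdateCRC32 are hypothetical, and a short bitwise CRC-32 stands in for the table-driven CalcCRC32 to keep the example self-contained. The actual FileCheck source may differ in these details.

```pascal
program MetaCrcSketch;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$IFDEF MSWINDOWS}{$APPTYPE CONSOLE}{$ENDIF}
uses
  SysUtils;

// Bitwise CRC-32 update over the characters of s (no lookup table, for brevity).
procedure UpdateCRC32(const s: AnsiString; var crc: LongWord);
var
  i, k: Integer;
begin
  for i := 1 to Length(s) do
  begin
    crc := crc xor Byte(s[i]);
    for k := 1 to 8 do
      if (crc and 1) <> 0 then
        crc := (crc shr 1) xor LongWord($EDB88320)
      else
        crc := crc shr 1;
  end;
end;

// MetaCRC: a CRC-32 over the 8-character hex text of each CRC, in list order.
// Hashing text instead of raw 32-bit words makes the result independent of
// the byte order of the machine doing the computation.
function MetaCRC(const CRCs: array of LongWord): LongWord;
var
  i: Integer;
  crc: LongWord;
begin
  crc := $FFFFFFFF;
  for i := Low(CRCs) to High(CRCs) do
    UpdateCRC32(AnsiString(IntToHex(Int64(CRCs[i]), 8)), crc);
  Result := not crc;
end;

begin
  // Hypothetical file CRCs, assumed to be already sorted by file name.
  WriteLn(IntToHex(Int64(MetaCRC([$CBF43926, $00000000, $12345678])), 8));
  // Swapping two entries gives a different MetaCRC, so the order matters.
  WriteLn(MetaCRC([1, 2]) <> MetaCRC([2, 1]));   // TRUE
end.
```

Under the same assumptions, a Volume MetaCRC would be obtained by passing the Directory MetaCRCs, in alphabetical directory order, to the same function.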
The FileListLibrary.PAS unit provides a ScanDirectory procedure for a generic way to process a hierarchy of directories and files. Two callback routines are parameters to ScanDirectory to process each file, and to process the beginning and end of a directory. The routines ProcessDirectory and ProcessFile in ScreenFileCheck.PAS are the routines used as parameters to ScanDirectory.
To define a well-ordered list of files in a directory, a third parameter is a routine that is used to compare file names within a directory. The OrderByFilename function in ScreenFileCheck.PAS uses StrIComp to compare filenames in a case insensitive way.
A global variable in the FileListLibrary unit, ContinueScan, allows an external routine to stop the processing of directories and files (intended to be set by a "Cancel" button).
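The callback arrangement described above might look roughly like the following sketch. The procedural types, the parameter lists, and the choice to process files before subdirectories are all assumptions made for illustration; only the names ScanDirectory, ProcessFile, ProcessDirectory, OrderByFilename and ContinueScan come from this report. The sketch uses TStringList.CustomSort and IncludeTrailingPathDelimiter, so it needs a newer compiler than Delphi 3/4.

```pascal
program ScanSketch;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$IFDEF MSWINDOWS}{$APPTYPE CONSOLE}{$ENDIF}
uses
  SysUtils, Classes;

type
  TFileProc = procedure(const FileName: string);                 // assumed shape
  TDirProc  = procedure(const Dir: string; Entering: Boolean);   // assumed shape

var
  ContinueScan: Boolean = True;   // a Cancel button would set this to False

// Case-insensitive comparison of two entries, in the form CustomSort expects.
function OrderByFilename(List: TStringList; Index1, Index2: Integer): Integer;
begin
  Result := StrIComp(PChar(List[Index1]), PChar(List[Index2]));
end;

procedure ScanDirectory(const Dir: string; Subdirs: Boolean;
  FileProc: TFileProc; DirProc: TDirProc; Compare: TStringListSortCompare);
var
  SR: TSearchRec;
  Files, Dirs: TStringList;
  Path: string;
  i: Integer;
begin
  Path := IncludeTrailingPathDelimiter(Dir);
  Files := TStringList.Create;
  Dirs := TStringList.Create;
  try
    DirProc(Dir, True);
    // Collect the names first so they can be put into a well-defined order.
    if FindFirst(Path + '*', faAnyFile, SR) = 0 then
      repeat
        if (SR.Attr and faDirectory) = 0 then
          Files.Add(SR.Name)
        else if (SR.Name <> '.') and (SR.Name <> '..') then
          Dirs.Add(SR.Name);
      until FindNext(SR) <> 0;
    FindClose(SR);
    Files.CustomSort(Compare);
    Dirs.CustomSort(Compare);
    for i := 0 to Files.Count - 1 do
      if ContinueScan then
        FileProc(Path + Files[i]);
    if Subdirs then
      for i := 0 to Dirs.Count - 1 do
        if ContinueScan then
          ScanDirectory(Path + Dirs[i], True, FileProc, DirProc, Compare);
    DirProc(Dir, False);
  finally
    Files.Free;
    Dirs.Free;
  end;
end;

procedure ProcessFile(const FileName: string);
begin
  WriteLn('  file  ', FileName);
end;

procedure ProcessDirectory(const Dir: string; Entering: Boolean);
begin
  if Entering then
    WriteLn('enter ', Dir)
  else
    WriteLn('leave ', Dir);
end;

begin
  // List the current directory only; pass True to descend into subdirectories.
  ScanDirectory(GetCurrentDir, False, ProcessFile, ProcessDirectory, OrderByFilename);
end.
```

Sorting the collected names before any callback runs is what makes the file order, and therefore any MetaCRC built from it, reproducible regardless of the order in which the file system returns directory entries.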
The Dbt_h.PAS file is a partial translation of DBT.H, which was adapted from "Notification of CD-ROM insertion and removal," http://www.undu.com/Articles/980221b.htm. The WmDeviceChange message is used to detect a change in CD-ROMs so that the drive and directory lists can be rebuilt. (Setting a Debug compilation conditional enables additional log comments when this message is received.)
The Refresh button on the Scan TabSheet forces an update of the TDriveComboBox, which may be necessary on some devices that do not generate a WmDeviceChange, such as ZIP drives. Calling the BuildList methods of both the TDriveComboBox and the TDirectoryListBox updates these controls.
Unfortunately, the BuildList methods of both the TDriveComboBox and the TDirectoryListBox are protected methods. Creating and installing new controls derived from these classes is somewhat of a pain just to call the protected BuildList method. To get around this limitation, empty derived classes were declared in the same unit, solely so that the existing controls can be typecast to them:

    type
      // Trick to call protected method of TDriveComboBox
      TMyDriveComboBox = CLASS(TDriveComboBox)
      END;

      // Trick to call protected method of TDirectoryListBox
      TMyDirectoryListBox = CLASS(TDirectoryListBox)
      END;
These new derived classes were only used to typecast the original values and call the "protected" methods in the WmDeviceChange routine and the following:
    procedure TFormFileList.SpeedButtonRefreshClick(Sender: TObject);
      VAR
        SaveDrive: CHAR;
    begin
      SaveDrive := DriveComboBox.Drive;
      TMyDriveComboBox(DriveComboBox).BuildList;
      DriveComboBox.Drive := SaveDrive;
      TMyDirectoryListBox(DirectoryListBox).BuildList;
    end;
Any change in a file will most likely result in a different CRC value. Requiring both the number of bytes and the CRC value to stay the same is an even stricter test. The "verify" operation for each file checks that the file's size and CRC-32 are the same. The "verify" operation for a directory checks that the directory has the same number of files, the same number of bytes and the same MetaCRC value. Likewise, a volume match looks for the same number of directories, files and bytes and the same MetaCRC value.
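The match rules above amount to a field-by-field comparison of what was saved during the scan with what is observed during the verify. A minimal sketch follows; the record TSummary, its field names and the function SameSummary are hypothetical and are not taken from the FileCheck source.

```pascal
program VerifySketch;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$IFDEF MSWINDOWS}{$APPTYPE CONSOLE}{$ENDIF}

type
  // Hypothetical summary of a file, directory or volume.
  TSummary = record
    DirCount : Integer;    // used for a volume
    FileCount: Integer;    // used for a directory or volume
    ByteCount: Int64;
    CRC      : LongWord;   // file CRC-32, or MetaCRC for a directory or volume
  end;

// A match requires every count and the CRC to agree.
function SameSummary(const A, B: TSummary): Boolean;
begin
  Result := (A.DirCount = B.DirCount) and (A.FileCount = B.FileCount) and
            (A.ByteCount = B.ByteCount) and (A.CRC = B.CRC);
end;

var
  Saved, Current: TSummary;
begin
  // Made-up values standing in for one line of a Verify file.
  Saved.DirCount  := 0;
  Saved.FileCount := 1;
  Saved.ByteCount := 9;
  Saved.CRC       := $CBF43926;

  Current := Saved;
  WriteLn(SameSummary(Saved, Current));   // TRUE

  Current.CRC := 0;                       // e.g. the file could not be read
  WriteLn(SameSummary(Saved, Current));   // FALSE
end.
```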
A ScanDetails Radiobox is partially implemented but is hidden in the current implementation. This allows the CRC file to only contain directory information instead of file-by-file details. (The "Scan" functionality of this feature works, but the "Verify" functionality doesn't work correctly when "Directories" is chosen instead of "Files.")
The Verify operation reads a Verify.CRC file created in the Scan phase. The number of lines in this file is used as the measure of progress in the progress bar. A TTokens class is used to parse the tokens in the Verify.CRC file.
So far, the process of simply attempting to read each file on a CD-R has identified the "bad" files -- files that cannot be opened and read. CRC mismatches have not yet been observed on the same CD-R over time.
One side effect of the process of verifying every byte on a CD was to identify a virus (using McAfee VirusScan) that was stored on several of my CD backups.
Conclusions
The FileCheck utility is a handy tool to verify a copy of a file, directory or even
a volume (within acceptable probabilities).
Keywords
cyclic redundancy check, CRC-32, Lookup Table, MetaCRC, CalcCRC32, CalcFileCrc32, Stream I/O, TMemoryStream,
BlockRead, WmDeviceChange message, DBT.H, FindFirst/FindNext/FindClose, TSearchRec, TStringList, Sort, StrIComp,
Int64, Comp, IntToHex, FormatFloat, FormatDateTime, Format, GetVolumeInformation, Volume Serial Number, Volume
Label, TTabSheet, TDriveComboBox, TDirectoryListBox, TFileListBox, procedure variables, calling protected methods,
tokens
Files (only for noncommercial use)
Delphi 3/4/5 Source Code and EXE (195 KB): FileCheck.ZIP