Developer's Diary
Software development, with Terry Ebdon
09-Jul-2017

Sunday 9th July, 2017

Verifying Checksums with Ant and Groovy

Today I'll be taking a look at the Ant checksum task. This task allows you to:

  • Generate a checksum for a file.
  • Verify a file's checksum.
  • Generate a checksum for a fileset.
  • Verify the checksum for a fileset.
Note
When passed a fileset to verify, the checksum task gives a single true / false response for the entire fileset. To find out which file failed you need a separate checksum call for each file.

In the following discussion I'll be using these files as test data:

C:\test>dir/b
apache-log4j-2.8.2-bin.tar.gz
apache-log4j-2.8.2-bin.tar.gz.md5
apache-log4j-2.8.2-bin.tar.gz.sha1
apache-log4j-2.8.2-bin.zip
apache-log4j-2.8.2-bin.zip.md5
apache-log4j-2.8.2-bin.zip.sha1

C:\test>

I have two archive files, a .tar.gz and a .zip. For each archive there's a checksum file for MD5 and another for SHA-1.

For a file-by-file checksum I'll be building on this Ant snippet:

<checksum
file = 'apache-log4j-2.8.2-bin.zip'
algorithm = 'md5'
verifyProperty = 'prop'
/>

This calculates the MD5 checksum for the file then looks for a file of the same name but with '.md5' tacked on the end. It expects the .md5 file to contain the MD5 checksum. If the checksum matches the calculated one the value of prop is set to true, otherwise it's set to false.

I'll be using Groovy, with AntBuilder, and delegating as much as possible to Ant.

Note
I'm specifically discussing the Ant checksum task. For a pure Groovy, or Java, solution consider using java.security.MessageDigest or the Apache Commons DigestUtils class.
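For comparison, here is a minimal pure Groovy sketch of the same idea using java.security.MessageDigest. The helper names hexDigest and matches are hypothetical, and the sketch assumes the checksum file holds the hex digest as its first whitespace-separated token; it is not a description of how the Ant task is implemented.

```groovy
import java.security.MessageDigest

// Hypothetical helper: hex digest of a file's contents.
// 'algorithm' is a JCA name such as 'MD5' or 'SHA-1'.
String hexDigest( File file, String algorithm ) {
    def digest = MessageDigest.getInstance( algorithm )
    // Read in chunks so large archives are not loaded into memory at once.
    file.eachByte( 8192 ) { byte[] buffer, int length ->
        digest.update( buffer, 0, length )
    }
    digest.digest().encodeHex().toString()
}

// Hypothetical helper: compare a file against its side-car checksum file.
// Assumption: the expected digest is the first token in the checksum file.
boolean matches( File file, File checksumFile, String algorithm ) {
    def expected = checksumFile.text.trim().tokenize()[0]
    expected != null && expected.equalsIgnoreCase( hexDigest( file, algorithm ) )
}

// Usage with throw-away test data.
def data = File.createTempFile( 'sample', '.bin' )
data.text = 'hello'
def sidecar = new File( data.path + '.md5' )
sidecar.text = hexDigest( data, 'MD5' )
println "md5 matches: ${matches( data, sidecar, 'MD5' )}"
```

Unlike the Ant task, this version leaves the handling of a missing checksum file to the caller.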

Let's start by gathering a collection of all files that might have signatures:

24 new AntBuilder().with {
25     def fs = fileScanner {
26         fileset( dir: '.' ) {
27             include name: '*gz'
28             include name: '*zip'
29         }
30     }

I've created a fileset that includes only the .gz and .zip files, and asked fileScanner to give me the matching File objects to iterate over.

At line 33 I've defined a collection of algorithm names. I'm only interested in the MD5 and SHA-1 signatures, though Ant can handle other types. The code will look at each file in the fileset twice, once for each algorithm (line 37).

For each algorithm I iterate through the file list, at line 38, searching for the corresponding signature files.

33     def algorithms = ['md5','sha1']
34     def colSize = algorithms.collect { it.length() }.max()
35
36     print '\n'
37     algorithms.each { algorithm ->
38         for ( file in fs ) {

If I pass the file name and an algorithm name, e.g. md5, to checksum, it can tell me whether the signature matches.

But what if I have a mixture of md5 and sha1 signature files?

41             def final prop = "isChecksumOk"
42             if ( new File( file.name + ".$algorithm" ).exists() ) {
43                 new AntBuilder().with {

If you ask checksum to verify with an algorithm, but the corresponding signature file doesn't exist, it will report a failure. It doesn't differentiate between a bad checksum and a missing checksum.

At line 42 I check whether a checksum file exists for the current algorithm, and I only call the checksum task if it does. But note the new AntBuilder instance at line 43. Why is that needed? Because Ant properties are immutable: once assigned a value, they never change. The checksum task returns its result by setting a property. If I keep calling the task with the same property, in the same Ant project, the result will never change; it will always hold the result of the first call.
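The immutability is easy to see in isolation. The following is a small sketch, not part of the original script; the property name demo.prop is made up for illustration. It assumes Groovy 3 or later, where AntBuilder lives in groovy.ant. On Groovy 2.x the class is in groovy.util, which is imported by default, so the import line should be removed.

```groovy
import groovy.ant.AntBuilder   // assumption: Groovy 3+; remove on Groovy 2.x

// One AntBuilder means one Ant project, so the second assignment is ignored.
def ant = new AntBuilder()
ant.property( name: 'demo.prop', value: 'first' )
ant.property( name: 'demo.prop', value: 'second' )
println ant.project.getProperty( 'demo.prop' )     // prints: first

// A new AntBuilder has its own project, and therefore a clean property table.
def fresh = new AntBuilder()
fresh.property( name: 'demo.prop', value: 'second' )
println fresh.project.getProperty( 'demo.prop' )   // prints: second
```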

Ant has a <local/> task that changes the scope of a property, which should allow a single property to be re-used. That didn't work for me. I'll expand on that in a future post.

I could use a different property name for each file I'm checking. That would work, though the code would be more complex and the properties hash table could get very large. I decided, instead, to create a new AntBuilder instance for each file. The per-file instance goes out of scope when the loop moves to the next file. This is not the most efficient code, but it doesn't need to be.
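For comparison, the alternative I rejected, a distinct property name per file, might look something like the sketch below. It shares one AntBuilder across all files. The naming scheme checksum.md5.<file name>, the scratch directory and the test files are assumptions made up for this illustration, and the import assumes Groovy 3 or later (remove it on Groovy 2.x).

```groovy
import groovy.ant.AntBuilder   // assumption: Groovy 3+; remove on Groovy 2.x
import java.nio.file.Files

// Scratch test data: two files, each with an .md5 side-car file.
def dir  = Files.createTempDirectory( 'checksum-demo' ).toFile()
def good = new File( dir, 'good.txt' )
def bad  = new File( dir, 'bad.txt' )
good.text = 'hello'
bad.text  = 'hello'
new File( dir, 'good.txt.md5' ).text = '5d41402abc4b2a76b9719d911017c592' // MD5 of 'hello'
new File( dir, 'bad.txt.md5' ).text  = '00000000000000000000000000000000' // deliberately wrong

def ant = new AntBuilder()     // a single Ant project for every file
def results = [:]
[good, bad].each { file ->
    // Hypothetical naming scheme: one property per algorithm and file.
    String prop = "checksum.md5.${file.name}"
    ant.checksum( file: file, algorithm: 'md5', verifyProperty: prop )
    results[ file.name ] = ant.project.getProperty( prop ) == 'true'
}
println results
```

Every check leaves another entry in the project's property table, which is the growth mentioned above.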

The call to AntBuilder's checksum task, at line 46, is very similar to the Ant XML I started with:

46                     checksum(
47                         file: file,
48                         algorithm: algorithm,
49                         verifyProperty: prop )

Now I just need to display the result:

52                     print "${algorithm.padLeft( colSize ) }: "
53                     print project.properties[ prop ] == 'true' ? 'ok   ' : 'FAIL!'
54                     println " -- ${file.name}"

Running against the downloaded Log4J 2 files gives:

C:\Downloads>groovy Checksum.groovy

 md5: ok    -- apache-log4j-2.8.2-bin.tar.gz
 md5: ok    -- apache-log4j-2.8.2-bin.zip

sha1: ok    -- apache-log4j-2.8.2-bin.tar.gz
sha1: ok    -- apache-log4j-2.8.2-bin.zip

C:\Downloads>

If I edit an MD5 signature file, to force an error, the output becomes:

C:\Downloads>groovy Checksum.groovy

 md5: ok    -- apache-log4j-2.8.2-bin.tar.gz
 md5: FAIL! -- apache-log4j-2.8.2-bin.zip

sha1: ok    -- apache-log4j-2.8.2-bin.tar.gz
sha1: ok    -- apache-log4j-2.8.2-bin.zip

C:\Downloads>


© 2017 Terry Ebdon.
