java.util.UUID accepts truncated UUIDs
Strange but true! I was working on some code today where I wanted to accept some UUIDs in String format and throw an exception if they weren’t valid. Rather than reinvent the wheel (or regex) I decided I would use the platform’s UUID type and ask it to parse the string. If it succeeds then I know it’s okay.
First up was Objective-C and NSUUID. Depending on what I pass in, the initialiser would return either a valid instance or nil. Like a good little programmer I wrote some unit tests with a selection of valid and invalid UUIDs, and so far this plan was working nicely.
Then I turned to Android, implemented in Kotlin. I did exactly the same thing, except this time using UUID.fromString(str) to create the value. This method is supposed to throw an IllegalArgumentException when the provided string doesn’t parse as a UUID.
I ported my unit tests across and one of the Kotlin equivalents failed: the case where I deleted the last hex digit of the UUID. Consider this code:
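import java.util.UUID

fun main() {
    // A complete, canonical UUID string.
    val uuid1 = UUID.fromString("e1b378ab-03b3-47b3-9aa0-af9cd199e0bd")
    println(uuid1)
    // The same string with the last hex digit deleted.
    val uuid2 = UUID.fromString("e1b378ab-03b3-47b3-9aa0-af9cd199e0b")
    println(uuid2)
}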
This prints: (Kotlin playground)
e1b378ab-03b3-47b3-9aa0-af9cd199e0bd
e1b378ab-03b3-47b3-9aa0-0af9cd199e0b
You can see that the “node” field of the UUID (after the final -) has been prefixed with an extra 0. I presume the parser assumes this 48-bit value was written without its leading zero and attempts to patch it up. This isn’t very helpful to me: a large part of what I’m trying to validate is that the user hasn’t accidentally omitted a character. Very confusing results will occur if the UUID appears to be accepted but it is the wrong value.
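One way to catch this, sketched here as an assumption rather than anything the platform provides, is to parse the string and then compare the parsed value's toString() output with the input. Since toString() always emits the zero-filled form, a leniently patched-up field shows up as a mismatch. The name parseStrictUuid is hypothetical, and the comparison ignores case so that upper-case hex digits are still accepted.

```kotlin
import java.util.UUID

// Hypothetical helper (not part of java.util.UUID): parses a UUID string and
// rejects any input that is not already in the zero-filled 36-character form.
fun parseStrictUuid(text: String): UUID {
    // fromString throws IllegalArgumentException for input it cannot parse at all.
    val uuid = UUID.fromString(text)
    // toString() always prints every field zero-filled, so a lenient parse
    // (for example of a truncated field) produces a string that differs from the input.
    require(uuid.toString().equals(text, ignoreCase = true)) {
        "Not a canonical UUID string: $text"
    }
    return uuid
}
```

With this helper, the full string from the example above parses as before, while the truncated one fails with an IllegalArgumentException instead of quietly becoming a different UUID.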
More interestingly still, this behaviour doesn’t appear to be sanctioned by the RFC. RFC 4122 states:
Each field is treated as an integer and has its value printed as a zero-filled hexadecimal digit string with the most significant digit first.
…
node = 6hexOctet
hexOctet = hexDigit hexDigit
The ITU page about UUIDs also shows a string representation example with leading zeroes in the “node” field:
Example: The following is an example of the string representation of a UUID as a URN:
urn:uuid:f81d4fae-7dec-11d0-a765-00a0c91e6bf6
In the scheme of things it doesn’t matter too much but it’s an interesting quirk. The things you discover when you write unit tests!