A set that ultimately(1) contains other sets can miscompute membership. For example, the attached script, run 100 times, fails to delete the given set member in about 15% of runs. The cause appears to be the randomized hash functions: in related testing, loading the same seeds on every run gave consistent behavior.
When fixing this problem, be sure to also test set operations such as union and intersection, if they have been implemented by then (per a pending branch).
(1) We don't currently support sets-of-sets. It would be good to fix that at the same time.
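
A seed-sweeping regression test along the lines described above could look roughly like the sketch below. It is only an illustration under stated assumptions: it uses Python's built-in sets as a stand-in for our implementation, PYTHONHASHSEED as a stand-in for loading fixed seeds, and frozensets nested inside tuples to mimic a set that contains other sets only indirectly. The names CHILD and count_failures are hypothetical, and the child script is not the attached script.

```python
import os
import subprocess
import sys

# Hypothetical stand-in for the attached script. It uses Python's built-in
# sets, with frozensets nested inside tuples so that the outer set contains
# other sets only indirectly.
CHILD = r'''
inner_a = frozenset({"x", "y"})
inner_b = frozenset({"y", "z"})
member_a = ("a", inner_a)
member_b = ("b", inner_b)

# Delete a member using an equal value that was built separately.
outer = {member_a, member_b}
outer.discard(("a", frozenset({"y", "x"})))
assert member_a not in outer, "delete failed"
assert member_b in outer, "delete removed the wrong member"

# Union and intersection should agree with membership as well.
left, right = {member_a, member_b}, {member_b}
assert left | right == left, "union failed"
assert left & right == right, "intersection failed"
'''


def count_failures(seeds):
    """Run CHILD once per hash seed and count the runs that fail."""
    failures = 0
    for seed in seeds:
        # A fixed PYTHONHASHSEED makes each run reproducible on its own.
        env = dict(os.environ, PYTHONHASHSEED=str(seed))
        result = subprocess.run(
            [sys.executable, "-c", CHILD], env=env, capture_output=True
        )
        if result.returncode != 0:
            failures += 1
    return failures


if __name__ == "__main__":
    seeds = range(1, 101)
    print(f"{count_failures(seeds)} failures out of {len(seeds)} seeds")
```

Sweeping explicit seeds, rather than relying on a fresh random seed per run, means any failing run can be replayed by reusing its seed.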