A benchmark dataset and evaluation suite for verifiable translation from natural language to linear temporal logic (LTL).