bbt presently ignores some differences between the expected and actual results (case, blank space on a line, blank lines).
In a discussion, I suggested that, for a possible future release, when choosing whether or not to be sensitive to changes in blanks (space, tab) on a line, it would be better to say whitespace [in]sensitive rather than blank [in]sensitive.
Given that some definitions of “whitespace” include newlines, and bbt internally already has the notion of blank-lines [in]sensitive, what would people prefer?
maybe options to ignore:
blank space - non new line whitespace only
new line - new line characters only
empty line - empty lines only
whitespace - any blank space or new line characters
I don’t know appropriate names for all of those though.
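To make the four candidates concrete, here is a minimal Python sketch. It is only an illustration: the function names are hypothetical and none of them exist in bbt, and it assumes "ignore" is read literally as "remove before comparing" (the thread below questions exactly that reading).

```python
import re

# Hypothetical names, for discussion only; bbt has no such functions.

def ignore_blank_space(text: str) -> str:
    """Remove spaces and tabs, keep new line characters."""
    return re.sub(r"[ \t]+", "", text)

def ignore_new_line(text: str) -> str:
    """Remove new line characters only."""
    return re.sub(r"[\r\n]+", "", text)

def ignore_empty_line(text: str) -> str:
    """Drop lines that contain no characters at all."""
    lines = text.splitlines(keepends=True)
    return "".join(line for line in lines if line.strip("\r\n") != "")

def ignore_whitespace(text: str) -> str:
    """Remove any blank space or new line characters."""
    return re.sub(r"\s+", "", text)

sample = "a  b\n\n\tc\n"
for option in (ignore_blank_space, ignore_new_line,
               ignore_empty_line, ignore_whitespace):
    print(option.__name__, repr(option(sample)))
```

Each function changes only the class of characters it names, which is what makes the four options distinct from one another.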
Regarding what is commonly understood by whitespace or blank, I just asked “Le Chat” (Mistral LLM chat):
In the context of command-line interfaces for computer applications, the terms “whitespace insensitive” and “blank insensitive” refer to how input is interpreted regarding spaces and other whitespace characters.
Whitespace Insensitive: This means that the command or input does not distinguish between different types of whitespace characters, such as spaces, tabs, newlines, etc. All these characters are treated equivalently. For example, a command might ignore the difference between a space and a tab when parsing arguments.
Blank Insensitive: This term is less commonly used, but it generally refers to ignoring blank spaces specifically, rather than all types of whitespace. It might mean that spaces are ignored, but other whitespace characters like tabs or newlines are not necessarily treated the same way.
In practice, “whitespace insensitive” is a broader term that encompasses all types of whitespace, while “blank insensitive” is more specific to spaces. The exact behavior can depend on the implementation of the command-line interface or application in question.
This seems to confirm that whitespace is a better choice here!
From a functional point of view, I see two use cases:
an exact match with expected results,
or a more “human reading” semantic match (ignoring differences in empty lines and spaces)
BTW, there are two possible understandings of “ignoring” here:
Most humans may consider
line 1
line 2
and (the same two lines separated by an empty line)
line 1
line 2
semantically equal, but not
word1 word2
and
word1word2
In other words, I may suppress all empty lines before comparing, but for spaces, what I most probably want is to compress consecutive spaces into one space before comparing.
(not to mention:
word1:word2
that would be considered equal to
word1 : word2
!).
Those nuances are not precisely reflected in the current option names.
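As a rough sketch of the "suppress empty lines, compress consecutive spaces" idea, something like the following Python would do. The helper name is hypothetical, and treating lines that contain only blanks as empty is my assumption, not something bbt does today.

```python
import re

def human_normalize(text: str) -> str:
    """Hypothetical helper: squeeze runs of spaces/tabs, drop empty lines."""
    # Compress each run of spaces or tabs into a single space.
    lines = [re.sub(r"[ \t]+", " ", line) for line in text.splitlines()]
    # Suppress lines that are empty or contain only blanks (assumption).
    return "\n".join(line for line in lines if line.strip() != "")

# Empty lines are suppressed, so these compare equal.
assert human_normalize("line 1\n\nline 2") == human_normalize("line 1\nline 2")
# Consecutive spaces are compressed, not removed.
assert human_normalize("word1   word2") == human_normalize("word1 word2")
assert human_normalize("word1 word2") != human_normalize("word1word2")
# Compressing never invents or deletes a separator.
assert human_normalize("word1:word2") != human_normalize("word1 : word2")
```

With this rule, the presence of a space still matters; only the number of spaces does not.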
But, if I go back to my point, I should maybe simplify to:
a picky default behavior that expects an exact match;
a single “human semantic” option that ignores supernumerary spaces, tabs, etc. and empty lines.
and this single option could be named --ignore_whitespace.
I think that, in your first example, word1word2 has zero whitespace, while the line example still has at least one whitespace character (a single new line in between). I believe the essence of any ignore-whitespace option is that it ignores all additional whitespace, treating each run as one whitespace character; it does not mean that it ignores all whitespace.
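A minimal Python sketch of that reading, where every run of whitespace (new lines included) counts as one whitespace character. The function name is made up for illustration, and note that this particular one-liner also drops leading and trailing whitespace, which may or may not be wanted.

```python
def collapse_whitespace(text: str) -> str:
    """Hypothetical helper: treat each whitespace run as a single space."""
    # str.split() with no argument splits on any run of whitespace
    # and discards leading and trailing whitespace.
    return " ".join(text.split())

# One new line or two: still a single separator.
assert collapse_whitespace("line 1\nline 2") == collapse_whitespace("line 1\n\nline 2")
# Zero whitespace is not the same as some whitespace.
assert collapse_whitespace("word1 word2") != collapse_whitespace("word1word2")
```

A side effect of this reading is that a new line and a space become interchangeable, so "line 1\nline 2" also matches "line 1 line 2".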