Search in binary files


#1

Hello. Is it possible to search for text in binary files, such as Microsoft Word documents? When I do an Edit > Find in Files, with the “Where” option set to “Files”, the search feature seems to notice that the binary files exist, but it doesn’t seem to be looking inside of the binary files because it never finds text that is present and that I’m searching for.

For instance, let’s say I have a Word document containing the string “Example”, located in a temporary folder, “D:\tmp”. When I search for that string in the files in that directory, it doesn’t find the string. Yet if I open the file in Komodo and search in “Current document”, it finds the string.

By default is Komodo ignoring binary files?


#2

hey @mjross,

In certain cases it does, but it looks like not in the case of “find all in files”. I think what might be going on here is that it’s not actually a binary file. I found that docx files, at least, aren’t binary but zipped files, so unless Komodo knew to unzip files before searching them, it can’t search Word files in that format. I don’t know what the explanation would be for other types of Word files.
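To illustrate the zip point, here is a minimal Python sketch of what searching a .docx would involve. It assumes the usual package layout, where the body text lives in the `word/document.xml` member. The function name `docx_contains` is made up for this example, and nothing here reflects how Komodo itself works.

```python
import xml.etree.ElementTree as ET
import zipfile


def docx_contains(path, needle):
    """Return True if `needle` appears in the body text of a .docx file.

    Hypothetical helper for illustration; assumes the standard
    word/document.xml member is present in the archive.
    """
    with zipfile.ZipFile(path) as archive:
        xml_bytes = archive.read("word/document.xml")
    root = ET.fromstring(xml_bytes)
    # Join all text nodes so a word split across formatting runs is rejoined.
    text = "".join(root.itertext())
    return needle in text
```

Joining the text nodes also glues paragraphs together without separators, so this is only good for a rough "is the string in there" check, not for a faithful text export.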

Looking at the code in Komodo though, if you have an actual binary file, find should work.

  • Carey

#3

Hi, @careyh! Is that true also for older “.doc” files (I’m using Office XP) and not just “.docx” files? I was under the impression that the former were binary files. And I should have been more specific in my original post.


#4

Hi!

I’m not sure. I just happened across a comment in another forum post that explained that docx files were basically zipped files. I don’t actually have an easy way to get a *.doc file. Could you send me a sample file and I’ll try a couple things? careyh at activestate.

  • Carey

#5

Sorry for the delay! I normally see notifications of new messages, but somehow missed this last one. I created a simple *.doc file, containing the unique text (to better narrow down any search) of “teststring” (just the word – no quote marks). But this UI won’t let me upload it. I’ll email it to you in a moment…


#6

Looks like there is hidden info in there that we mortals cannot see, so even with a simple string like teststring we can’t search for it. I couldn’t even get grep to pick up the string in the document, though it WILL find metadata inside the document. So to answer your question: yes, Komodo should search binary files, but this is a great example of why that’s not a great idea. There is information in binary files that we’re not aware of.
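One possible explanation, not confirmed anywhere in this thread, is character encoding: older .doc files can store body text as 16-bit (UTF-16LE) characters, and a byte-wise search for the plain 8-bit form of “teststring” would miss that. A rough way to check is to search the raw bytes for the string in both encodings. The helper names below are hypothetical, and the choice of cp1252 for the 8-bit case is an assumption.

```python
def find_in_bytes(data, needle):
    """Return {encoding: [byte offsets]} for each encoding where `needle` occurs."""
    hits = {}
    # Assumption: try the 8-bit Windows code page and 16-bit little-endian text.
    for encoding in ("cp1252", "utf-16-le"):
        pattern = needle.encode(encoding)
        offsets = []
        start = data.find(pattern)
        while start != -1:
            offsets.append(start)
            start = data.find(pattern, start + 1)
        if offsets:
            hits[encoding] = offsets
    return hits


def find_in_file(path, needle):
    """Read the whole file as raw bytes and search it in both encodings."""
    with open(path, "rb") as handle:
        return find_in_bytes(handle.read(), needle)
```

If `find_in_file("test.doc", "teststring")` (file name assumed) reports a hit only under `utf-16-le`, that would be consistent with a plain grep coming up empty. If it reports nothing at all, the text is stored some other way and this guess is wrong.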

Ref: https://stackoverflow.com/a/11462227


#7

Thanks for your reply. That Stack Overflow post mentions “text is broken up and interspersed with field codes and formatting information”, which is true when describing multiple paragraphs or text that has had some sort of formatting, but it isn’t true for a string of plain text within a single paragraph, which is what I’m trying to search for. I will attach a screenshot of the test string in the Word document, when viewed in Komodo IDE.

One of the other posts claims that “Its a .doc file and any search more than 3 characted doesn’t work.” But when I double-click the string “teststring” in Komodo, and then paste it somewhere else, it is just those 10 characters and nothing else.

In fact, when I do a regular Komodo search (Edit > Find) for that test string, Komodo finds it. But when I do Edit > Find in Files, and that Word document is the only one in the directory that I search, then Komodo does not find the string. Specifically, the message at the bottom of the search results pane is “Found 0 occurrences in 0 of 3 files.” (I suspect that the two additional files are the file pointers for Windows, namely, the “.” and “…”.) So Komodo is finding the Word document, but not finding the search string when searching through files, even though it finds it when the file is open. It’s the strangest thing.
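To check the guess about the two extra files, it may help to list everything that is actually in the directory, hidden entries included. This is a small Python sketch with a hypothetical helper name. Note that `os.scandir` never returns the special “.” and “..” entries, so anything it lists is a real file or folder (for example, Word keeps a temporary lock file whose name starts with “~$” next to a document while it is open). How Komodo arrives at its own file count is not something this sketch can tell you.

```python
import os


def list_entries(directory):
    """Return sorted (name, kind, size_in_bytes) tuples for every directory entry."""
    entries = []
    with os.scandir(directory) as it:
        for entry in it:
            kind = "dir" if entry.is_dir() else "file"
            entries.append((entry.name, kind, entry.stat().st_size))
    return sorted(entries)
```

Running `list_entries(r"D:\tmp")` and comparing the number of files against the “3 files” in the results pane would show whether there are real extra files in that folder.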