Someone was having the problem below with an older version of Acrobat.
Is there now a solution in Acrobat X for Mac?
I note that exporting to an image file loses quality and increases file size.
Thanks
Well, since this is the digital age, it makes sense that I ought to read the PDFs in digital form (this is a stretch for me, I really like paper), which is facilitated by a tablet since I can actually see the page when it’s in the portrait configuration. It also makes sense that I ought to mark up the file in Acrobat, using the native highlighting and searching tools, which is also facilitated by the tablet for obvious reasons.
Here’s the problem. Apparently *every* PDF file, in every digital library, is tagged with headers, or footers, or Bates numbers, or some other tag that stops Acrobat from running OCR on the PDF file. If you Google “This page contains renderable text”, you’ll see that this has been a complaint since Acrobat 6 at least. So you can’t just OCR the document and get a nice, mark-up-able document.
Now, I know what you’re thinking. There has to be a workaround, right? Of course there is. You can manually remove the headers and try again. Oh, now there’s a footer; you can take that out too (manually) and try again. Oh, now there’s a Bates number; okay, take that out too. If there’s STILL some renderable text in there somewhere, you have two options. You can try to edit out the blocks of renderable text (again, manually, made more entertaining by the fact that you can’t just right-click on the page and say “remove renderable text”). Or you can export the entire document to a graphics format (say, TIFF), convert the images back to a PDF file (which turns the entire document into a rasterized image), and THEN run the OCR tool to get an actual mark-up-able document. This process is made more enjoyable by the fact that Acrobat will turn that 300-page dissertation you’re reading as part of your research into 300 distinct TIFF files, which you then need to recombine into a PDF file. Multiply this by 100, and you’ll see what sort of barrier to productivity this is when I’m trying to get started organizing my existing document collection.
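For what it’s worth, the recombining step goes wrong most easily on page order: a plain alphabetical sort puts page 10 ahead of page 2. Here is a minimal Python sketch of how the exported pages could be put back in order before they are merged. The file names (thesis_Page_1.tiff and so on) and the helper names natural_key and ordered_pages are my own assumptions, not anything Acrobat is known to produce, and the sketch only orders the files; the merge itself is left to whatever tool you use.

```python
import re
from pathlib import Path


def natural_key(name):
    """Split a file name into text and integer chunks.

    'thesis_Page_10.tiff' becomes ['thesis_page_', 10, '.tiff'], so that
    page 10 sorts after page 2 instead of before it.
    """
    return [int(part) if part.isdigit() else part.lower()
            for part in re.split(r"(\d+)", name)]


def ordered_pages(folder, pattern="*.tif*"):
    """Return the exported page images in `folder`, in page order.

    The pattern is an assumption: it matches both .tif and .tiff files.
    """
    return sorted(Path(folder).glob(pattern),
                  key=lambda path: natural_key(path.name))
```

Calling ordered_pages on the export folder gives a list of paths that can be handed, in that order, to the tool that builds the PDF.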
This is CLOSE TO THE DUMBEST THING I HAVE EVER SEEN. And I’ve seen a LOT of bad design. Rather than prompting me “This document has renderable text” and giving me “Cancel” as the only option, any feature-driven developer would say, “Gosh, people get really frustrated by this. I know, because I can read the results of a simple Google search. We need to change this right away! Here, I’ll make it so that you can just click ‘Treat existing renderable text as white space’ or even prompt the user to rasterize the renderable text and embed it in the document, then OCR the resulting file!”
The only reason I can imagine that this hasn’t happened is that your lovable electronic document vendor wants to make it a colossally, enormously painful process for someone to actually do anything to the document they’re providing you to use. Thank you, electronic document vendor. You’re going to be wasting about 20% of the time that you’re saving me by giving me electronic access to this document in the first place.
Progress is grand. Collide it with self-interest, and progress seems to lose out more often than not.
Now, if you’ll pardon me, I’m going to go get some sleep. Then I’m going to get up in the morning and go to work. Then I’m going to come home, and instead of enjoying some family time with my kids, I’m going to fart around with manual document conversion.