Channel: Adobe Community : Popular Discussions - Acrobat Macintosh (read-only)

OCR renderable text error


Someone was having the problem below with an older version of Acrobat.

Is there now a solution in Acrobat X for Mac?

I note that exporting to an image file loses quality and increases file size.

Thanks


Well, since this is the digital age, it makes sense that I ought to read the PDFs in digital form (this is a stretch for me; I really like paper), which is facilitated by a tablet, since I can actually see the whole page in portrait orientation. It also makes sense that I ought to mark up the file in Acrobat, using the native highlighting and searching tools, which the tablet also facilitates for obvious reasons.

Here’s the problem. Apparently *every* PDF file, in every digital library, is tagged with headers, or footers, or Bates numbers, or some other text that halts OCR of the PDF file. If you google “This page contains renderable text”, you’ll see that this has been a complaint since at least Acrobat 6. So you can’t just OCR the document and get a nice, mark-up-able document.

Now, I know what you’re thinking. There has to be a workaround, right? Of course there is. You can manually remove the headers and try again. Oh, now there’s a footer; you can take that out too (manually) and try again. Oh, now there’s a Bates number; okay, take that out too. There’s STILL some renderable text in there somewhere. Well, now you can either try to edit out the blocks of renderable text (again, manually, made more entertaining by the fact that you can’t just right-click on the page and say “remove renderable text”), or you can export the entire document to a graphics format (say, TIFF), re-convert it to a PDF file (which turns the entire document into a rasterized image), and THEN run the OCR tool to get an actual mark-up-able document. This process is made more enjoyable by the fact that Acrobat will turn that 300-page dissertation you’re reading as part of your research into 300 separate TIFF files, which you then need to recombine into a single PDF file. Multiply this by 100, and you’ll see what a barrier to productivity this is when I’m just trying to get started organizing my existing document collection.

This is CLOSE TO THE DUMBEST THING I HAVE EVER SEEN. And I’ve seen a LOT of bad design. Rather than prompting me with “This document has renderable text” and giving me “Cancel” as the only option, any feature-driven developer would say, “Gosh, people get really frustrated by this. I know, because I can read the results of a simple Google search. We need to change this right away! Here, I’ll make it so that you can just click ‘Treat existing renderable text as white space’, or even prompt the user to rasterize the renderable text, embed it in the document, and then OCR the resulting file!”

The only conceivable reason I can imagine that this hasn’t happened is that your lovable electronic document vendor wants to make it a colossally, enormously painful process for anyone to actually do anything with the documents they provide. Thank you, electronic document vendor. You’re going to be wasting about 20% of the time that you’re saving me by giving me electronic access to this document in the first place.

Progress is grand. But when it collides with self-interest, progress seems to lose out more often than not.

Now, if you’ll pardon me, I’m going to go get some sleep. Then I’m going to get up in the morning and go to work. Then I’m going to come home, and instead of enjoying some family time with my kids, I’m going to fart around with manual document conversion.

