BeTheBall
Posts: 6365 Joined: 6/21/2002 From: West Point Utah USA Status: online
|
Advanced PDF Search - 7/27/2005 9:44:03
I need a script or other tool that will search through a .pdf and return a list of all web addresses and phone numbers occurring within the document. Has anyone heard of such a thing? A member of our staff has the assignment of verifying the accuracy of all phone numbers and web addresses in our various form instructions and publications. Currently, she has to go through these items page by page, and some are a couple hundred pages long. Anything I can come up with to make this process less burdensome will earn me a gold star.
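A minimal sketch of the kind of script being asked for, assuming the PDF's text has already been pulled out into a plain string by some other tool (that extraction step is not shown here). The names `URL_RE`, `PHONE_RE`, and `find_contacts` are hypothetical, and the phone pattern assumes US-style numbers such as (801) 555-1234 or 801-555-1234; other formats would need their own patterns.

```python
import re

# Hypothetical patterns: common URL forms and US-style phone numbers only.
URL_RE = re.compile(r"""(?:https?://|www\.)[^\s<>"')\]]+""", re.IGNORECASE)
PHONE_RE = re.compile(r"(?<!\d)(?:\(\d{3}\)\s*|\d{3}[-.\s])\d{3}[-.\s]\d{4}(?!\d)")


def find_contacts(text):
    """Return (urls, phones) found in text, de-duplicated in order of first appearance."""
    # Trim sentence punctuation that the URL pattern may have swallowed.
    urls = [u.rstrip(".,;:") for u in URL_RE.findall(text)]
    phones = PHONE_RE.findall(text)
    return list(dict.fromkeys(urls)), list(dict.fromkeys(phones))


if __name__ == "__main__":
    sample = "Call (801) 555-1234 or visit www.example.org for the form."
    found_urls, found_phones = find_contacts(sample)
    print(found_urls)
    print(found_phones)
```

This only works when the PDF actually contains text; a document scanned as an image would need OCR first.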
_____________________________
Duane Some people are like Slinkies . . . Not really good for anything . . . . . But they still bring a smile to your face when you push them down a flight of stairs.
|
|
|
|
mar0364
Posts: 3159 Joined: 4/5/2002 From: Florida, US Status: offline
|
RE: Advanced PDF Search - 7/27/2005 10:01:32
Are the documents OCR readable?
|
|
|
|
BeTheBall
Posts: 6365 Joined: 6/21/2002 From: West Point Utah USA Status: online
|
RE: Advanced PDF Search - 7/27/2005 11:00:05
quote:
ORIGINAL: mar0364 Are the documents OCR readable?
I have no idea. Do you have a solution if they are?
|
|
|
|
caz
Posts: 3552 Joined: 10/10/2001 From: Somewhere south of Chester, UK Status: offline
|
RE: Advanced PDF Search - 7/27/2005 13:01:18
The Atomz search engine will search within PDF documents; the free version that I used to use on one site certainly did, but I think the current equivalent of the free version has a limit of 500 pages. I am not too sure about that because we no longer use it.
_____________________________
Do not meddle in the affairs of cats, for they are subtle and will dance, or more on your keyboard. Cheshire cat. www.doracat.co.uk I remember when it took less than 4hrs to fly across the Atlantic.
|
|
|
|
rdouglass
Posts: 9272 From: Biddeford, ME USA Status: offline
|
RE: Advanced PDF Search - 7/27/2005 13:07:11
quote:
The Atomz search engine will search within pdf documents;
I think that's just like the Google Mini: it searches only the metadata *unless* the doc is OCR'd. Unless Atomz has an OCR engine built in, it won't be able to search the text of a document that hasn't been OCR'd.
_____________________________
Don't take your eye off your final destination. ASP Checkbox Function Tutorial.
|
|
|
|
mar0364
Posts: 3159 Joined: 4/5/2002 From: Florida, US Status: offline
|
RE: Advanced PDF Search - 7/27/2005 13:15:35
Yes, if you have Adobe Standard and they are OCR readable, you can search them. Douglas is correct; the other option is to do a Paper Capture. I use Adobe Standard 6, and you can Paper Capture an OCR-readable document. However, if the document was scanned as an image and saved as a PDF, I don't know if there is anything that will search that. What version of Adobe are you using?
|
|
|
|
caz
Posts: 3552 Joined: 10/10/2001 From: Somewhere south of Chester, UK Status: offline
|
RE: Advanced PDF Search - 7/27/2005 14:48:34
Have you tried this plugin for Google Desktop: ScanSoft OmniPage Search? It would appear to do what is needed: OCR, build a document index, and then search. It would also appear that Paper Capture was last included as standard in Acrobat 4, but it is available now in Acrobat 7, as an extra I think, combined with other features.
|
|
|
|
rdouglass
Posts: 9272 From: Biddeford, ME USA Status: offline
|
RE: Advanced PDF Search - 7/27/2005 15:02:27
That ScanSoft OmniPage Search looks interesting, but it seems to work only at the individual level and with Google Desktop. To me, that seems pretty restrictive, yet it does seem to be an option. I do still own Acrobat 4, and yes, the capture feature does work like that, and then you can search it using MS Index Server. However, it does require a lot of manual intervention.
|
|
|
|
rdouglass
Posts: 9272 From: Biddeford, ME USA Status: offline
|
RE: Advanced PDF Search - 7/28/2005 9:16:29
quote:
My thought was that you could run the Omnipage search on the pdfs in bulk to get a set of results, on the urls at least, and then run that set through a url checker?
Hey, that might be an idea, and it possibly could be 'duct taped' together if it was run at the server. Hmmm...
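The "url checker" half of that idea could look roughly like the sketch below, which takes a list of addresses already pulled from the documents and reports which ones respond. It is only an illustration: `http_status` and `check_urls` are hypothetical names, a HEAD request is assumed to be enough to tell a live page from a dead one (some servers reject HEAD), and bare `www.` addresses are assumed to be plain HTTP.

```python
import urllib.error
import urllib.request


def http_status(url, timeout=10):
    """Return the HTTP status code for url, or None if the request fails outright."""
    if not url.lower().startswith(("http://", "https://")):
        url = "http://" + url  # assumption: bare www. addresses are plain HTTP
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code  # the server answered, but with an error status
    except (urllib.error.URLError, OSError, ValueError):
        return None  # DNS failure, refused connection, malformed URL, etc.


def check_urls(urls, fetch=http_status):
    """Map each distinct URL to True if fetch reports a status below 400."""
    results = {}
    for url in dict.fromkeys(urls):  # drop duplicates, keep first-seen order
        status = fetch(url)
        results[url] = status is not None and status < 400
    return results
```

Passing the lookup in as `fetch` keeps the checking logic separate from the network call, so the same loop could be pointed at a different checker later. Phone numbers would still need a person to verify them.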
|
|
|
|
Charles W Davis
Posts: 1725 Joined: 3/7/2002 From: Henderson Nevada USA Status: offline
|
RE: Advanced PDF Search - 8/5/2005 20:37:11
BeTheBall, A site search provided by master.com will search most PDFs within a web site. I said "most". On this web site, all PDFs were created from MS Publisher using Adobe Acrobat Pro 6.0. http://www.myscacc.org/search.htm Search for gabe (one of our contributing authors). It will return several instances. However, a PDF created from a Quark document for a high-resolution glossy magazine will not return any hits.
_____________________________
Enjoy! It's your endeavor! http://www.anthemwebs.com
|
|
|
|
BeTheBall
Posts: 6365 Joined: 6/21/2002 From: West Point Utah USA Status: online
|
RE: Advanced PDF Search - 8/7/2005 19:06:22
Thanks Charles. I'll have a look.