BeTheBall

 

Posts: 6365
Joined: 6/21/2002
From: West Point Utah USA
Status: online

 
Advanced PDF Search - 7/27/2005 9:44:03   
I need a script or other tool that will search through a .pdf and return a list of all web addresses and phone numbers occurring within the document. Has anyone heard of such a thing?

A member of our staff has the assignment of verifying the accuracy of all phone numbers and web addresses in our various form instructions and publications. Currently, she has to go through these items page by page and some are a couple hundred pages long. Anything I can come up with to make this process less burdensome will earn me a gold star.
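
For the extraction step itself, here is a minimal sketch. It assumes the PDF's text has already been pulled out into a plain string by some other tool (a text-extraction utility such as pdftotext is one possibility), which only works if the PDF has a real text layer. The name `find_contacts` and both patterns are illustrative assumptions: the phone pattern only covers 10-digit North American formats, and the URL pattern only catches addresses starting with http://, https:// or www.

```python
import re

# Assumed patterns, not an exhaustive grammar for URLs or phone numbers.
URL_RE = re.compile(r"(?:https?://|www\.)[^\s<>()\[\]]+", re.IGNORECASE)
PHONE_RE = re.compile(
    r"(?<!\d)(?:\(\d{3}\)\s*|\d{3}[-.\s])\d{3}[-.\s]\d{4}(?!\d)"
)


def find_contacts(text):
    """Return (urls, phones) found in text, de-duplicated in order of appearance."""
    # Strip punctuation that usually belongs to the sentence, not the address.
    urls = [u.rstrip(".,;:") for u in URL_RE.findall(text)]
    phones = PHONE_RE.findall(text)
    # dict.fromkeys keeps the first occurrence of each item and preserves order.
    return list(dict.fromkeys(urls)), list(dict.fromkeys(phones))
```

The two lists could then be handed to the staff member as a checklist instead of having her read every page.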

_____________________________

Duane

Some people are like Slinkies . . . Not really good for anything . . . . . But they still bring a smile to your face when you push them down a flight of stairs.
mar0364

 

Posts: 3159
Joined: 4/5/2002
From: Florida, US
Status: offline

 
RE: Advanced PDF Search - 7/27/2005 10:01:32   
Are the documents OCR readable?

(in reply to BeTheBall)
BeTheBall

 

Posts: 6365
Joined: 6/21/2002
From: West Point Utah USA
Status: online

 
RE: Advanced PDF Search - 7/27/2005 11:00:05   
quote:

ORIGINAL: mar0364

Are the documents OCR readable?


I have no idea. Do you have a solution if they are?



(in reply to mar0364)
rdouglass

 

Posts: 9272
From: Biddeford, ME USA
Status: offline

 
RE: Advanced PDF Search - 7/27/2005 11:43:41   
Some older versions of Acrobat have a feature called "Capture" that basically OCRs the doc and builds an index in it. The index is searchable with Acrobat Reader. Unfortunately the current versions don't include it, and Adobe sells that capability as a separate piece of software.

I have recently looked into this issue because I have around 42,000 PDF docs I need to make searchable. Current search tools like MS Index Server will only search the metadata of the document, and AFAIK that has to be typed in at document creation time.

I was looking into the Google "Mini" search appliance, but that too requires metadata and we're not too keen on editing 42K docs by hand. :)

If you do find a solution other than Adobe Capture, I would be very interested.

EDIT: If you don't know whether they are or not, they probably are not. :)

_____________________________

Don't take your eye off your final destination.

ASP Checkbox Function Tutorial.

(in reply to BeTheBall)
caz

 

Posts: 3552
Joined: 10/10/2001
From: Somewhere south of Chester, UK
Status: offline

 
RE: Advanced PDF Search - 7/27/2005 13:01:18   
The Atomz search engine will search within pdf documents; the free version that I used to use on one site certainly did, but I think that there is a limit of 500 pages for the equivalent to the free one. I am not too sure about that because we no longer use it.

_____________________________

Do not meddle in the affairs of cats, for they are subtle and will dance, or more on your keyboard.
Cheshire cat. www.doracat.co.uk

I remember when it took less than 4hrs to fly across the Atlantic.

(in reply to rdouglass)
rdouglass

 

Posts: 9272
From: Biddeford, ME USA
Status: offline

 
RE: Advanced PDF Search - 7/27/2005 13:07:11   
quote:

The Atomz search engine will search within pdf documents;


I think that's just like the Google Mini: it searches only the metadata *unless* the doc is OCR'd. Unless Atomz has an OCR engine built in, it won't be able to search documents that haven't been OCR'd.


(in reply to caz)
mar0364

 

Posts: 3159
Joined: 4/5/2002
From: Florida, US
Status: offline

 
RE: Advanced PDF Search - 7/27/2005 13:15:35   
Yes, if you have Adobe Standard and they are OCR readable, you can search them. Douglas is correct: the other option is to do a paper capture. I use Adobe Standard 6 and you can paper capture an OCR-readable document. However, if the document was scanned as an image and saved as a PDF, I don't know if there is anything that will search that.

What version of Adobe are you using?
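
One rough way to answer the "OCR readable?" question without knowing how the files were made: extract the text page by page with whatever tool is available and see which pages come back empty. The sketch below assumes that extraction has already happened and that `page_texts` is a list with one string per page; the function name and the 20-character threshold are arbitrary illustrative choices.

```python
def pages_needing_ocr(page_texts, min_chars=20):
    """Return 1-based page numbers whose extracted text is empty or nearly so.

    A page scanned as an image has no text layer, so a text extractor
    typically returns nothing (or only stray whitespace) for it.
    """
    return [
        number
        for number, text in enumerate(page_texts, start=1)
        # Count only non-whitespace characters.
        if len("".join(text.split())) < min_chars
    ]
```

If every page number comes back, the document is most likely a scanned image and would need an OCR pass before any search can see its contents.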

(in reply to BeTheBall)
caz

 

Posts: 3552
Joined: 10/10/2001
From: Somewhere south of Chester, UK
Status: offline

 
RE: Advanced PDF Search - 7/27/2005 14:48:34   
Have you tried this plugin for Google Desktop: ScanSoft OmniPage Search?
It would appear to do what is needed: OCR, build a document index and then search. It would also appear that Paper Capture was last included as standard in Acrobat 4, but it is available now in Acrobat 7, as an extra I think, combined with other features.


(in reply to mar0364)
rdouglass

 

Posts: 9272
From: Biddeford, ME USA
Status: offline

 
RE: Advanced PDF Search - 7/27/2005 15:02:27   
That ScanSoft OmniPage Search looks interesting, but it seems to work only at the individual level and with Google Desktop. To me, that seems pretty restrictive, yet it does seem to be an option.

I do still own Acrobat 4, and yes, the capture feature does work like that, and then you can search it using MS Index Server. However, it does require a lot of manual intervention.


(in reply to caz)
caz

 

Posts: 3552
Joined: 10/10/2001
From: Somewhere south of Chester, UK
Status: offline

 
RE: Advanced PDF Search - 7/28/2005 6:57:12   
My thought was that you could run the Omnipage search on the pdfs in bulk to get a set of results, on the urls at least, and then run that set through a url checker? (The script bit is beyond me though. I would even use the FP2003 link verifier :))

As for the phone numbers, I think that's another thing; I don't know of a phone number verifier, apart from a human dialing the number :)
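
The "url checker" half of that idea can be sketched with Python's standard library. This is only an illustration under assumptions: the function names are made up, bare www. addresses are assumed to be plain HTTP, and a HEAD request is used even though some servers refuse it, so a reported failure is worth rechecking in a browser.

```python
import urllib.error
import urllib.request


def fetch_status(url, timeout=10):
    """Return the HTTP status code for url, or None if the request fails."""
    if not url.lower().startswith(("http://", "https://")):
        url = "http://" + url  # assumption: bare www. addresses are plain HTTP
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code  # the server answered, but with a 4xx/5xx code
    except (urllib.error.URLError, OSError, ValueError):
        return None  # DNS failure, timeout, refused connection, bad URL


def find_broken(urls, fetch=fetch_status):
    """Return (url, status) pairs for every URL that did not answer below 400."""
    results = ((url, fetch(url)) for url in urls)
    return [(url, status) for url, status in results
            if status is None or status >= 400]
```

Calling `find_broken(list_of_urls)` returns only the addresses that need a human look. The `fetch` parameter exists so the checker can be tried out with a stand-in function instead of live requests. Phone numbers still need someone to dial them.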


(in reply to rdouglass)
rdouglass

 

Posts: 9272
From: Biddeford, ME USA
Status: offline

 
RE: Advanced PDF Search - 7/28/2005 9:16:29   
quote:

My thought was that you could run the Omnipage search on the pdfs in bulk to get a set of results, on the urls at least, and then run that set through a url checker?


Hey, that might be an idea and possibly could be 'duct taped' together if it was run at the server. Hmmm...


(in reply to caz)
caz

 

Posts: 3552
Joined: 10/10/2001
From: Somewhere south of Chester, UK
Status: offline

 
RE: Advanced PDF Search - 7/28/2005 16:14:13   
Please let me know if that works :)


(in reply to rdouglass)
Charles W Davis

 

Posts: 1725
Joined: 3/7/2002
From: Henderson Nevada USA
Status: offline

 
RE: Advanced PDF Search - 8/5/2005 20:37:11   
BeTheBall,

A site search provided by master.com will search most PDFs within a web site.

I said "most". On this web site all PDFs were created from MS Publisher using Adobe Acrobat Pro 6.0: http://www.myscacc.org/search.htm
Search for gabe (one of our contributing authors). It will return several instances.

However, a PDF created from a Quark document for a high-resolution glossy magazine will not return any hits.

_____________________________

Enjoy! It's your endeavor!
http://www.anthemwebs.com

(in reply to caz)
BeTheBall

 

Posts: 6365
Joined: 6/21/2002
From: West Point Utah USA
Status: online

 
RE: Advanced PDF Search - 8/7/2005 19:06:22   
Thanks Charles. I'll have a look.


(in reply to Charles W Davis)