Berkeley CSUA MOTD:Entry 52701
2025/04/04 [General] UID:1000 Activity:popular
4/4     

2009/3/11-17 [Computer/SW/Apps, Computer/SW/Languages/Misc] UID:52701 Activity:high
3/11    I have a potential gig at a large(ish) magazine that wants to set up
        a system like this to put a digital edition of their issues online:
        http://www.exacteditions.com
        There are many services out there that do this, and in fact they are
        currently using one. But they'd like to save money by moving this
        in-house. They're starting with PDF files and want to display them
        as images in the browser. That part is easy (convert them server-
        side). But they also want highlighted search results on the images,
        à la Google Books. I turn to the mighty motd for advice on how to
        do this. Use a PDF manipulation library to highlight search terms
        and then convert to JPG? Convert the PDF to SVG and highlight in SVG
        before converting to JPG? Drop the job 'cuz it's too hard?
        Sorry for the long post. Thx.
    \_ I suspect drawing a rectangle or something would be the easy way to do
       it, but that doesn't solve the problem of finding the location of the
       text in the pdf...
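A minimal sketch of the rectangle approach, assuming each word's text and bounding box have already been pulled out of the PDF's text layer by some extraction tool. The `Word` record and `highlight_boxes` function are hypothetical names, not any library's API. It converts PDF points (1/72 inch, origin at the bottom-left of the page) into pixel rectangles on a page image rendered at a known DPI, flipping the y axis because image rows count down from the top. It ignores page rotation and crop-box offsets, and some extraction tools already report top-left-origin coordinates; for those, drop the flip.

```python
from dataclasses import dataclass


@dataclass
class Word:
    """One word from a PDF text layer (hypothetical record, not a library type).

    Coordinates are PDF points (1/72 inch) with the origin at the
    bottom-left corner of the page, as in default PDF user space.
    """
    text: str
    x0: float
    y0: float
    x1: float
    y1: float


def highlight_boxes(words, term, page_height_pt, dpi=150):
    """Return (left, top, right, bottom) pixel boxes for words matching term.

    Assumes the page image was rendered at `dpi`, so one point is dpi/72
    pixels. The y axis is flipped because image rows count down from the top.
    """
    scale = dpi / 72.0
    needle = term.casefold()
    boxes = []
    for w in words:
        # Ignore surrounding punctuation so "edition," matches "edition".
        if w.text.strip(".,;:!?\"'()").casefold() != needle:
            continue
        left = round(w.x0 * scale)
        right = round(w.x1 * scale)
        top = round((page_height_pt - w.y1) * scale)
        bottom = round((page_height_pt - w.y0) * scale)
        boxes.append((left, top, right, bottom))
    return boxes
```

The rectangles could then be painted onto the JPG server-side, or sent to the browser and drawn as absolutely positioned overlays on top of the unchanged page image, which would avoid re-rendering an image per search.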
        \_ How are you rendering these?  HTML --> PDF or using some sort of
           reporting service (Crystal/SSRS)?  If the former, I'd just set up a
           regular expression and wrap the desired text in a tag with a
           custom CSS class that marks what needs to be highlighted.  Then
           render the HTML to PDF.
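For the HTML --> PDF route described above, a sketch of the regex step, assuming the input is plain text (running the pattern over raw HTML would also match inside tags and attributes). `highlight_terms` and the `highlight` class name are hypothetical; the stylesheet would need a matching rule such as `.highlight { background: yellow; }`. Matching is whole-word and case-insensitive, `\b` assumes the terms start and end with letters or digits, and longer terms are tried first so a phrase wins over a word it contains.

```python
import html
import re


def highlight_terms(text, terms, css_class="highlight"):
    """Wrap whole-word, case-insensitive matches of any term in a <span>.

    `text` must be plain text, not markup. Every piece of it is HTML-escaped,
    so the result can be pasted into the page that gets rendered to PDF.
    """
    if not terms:
        return html.escape(text)
    # Try longer terms first so a phrase wins over a word inside it.
    ordered = sorted(terms, key=len, reverse=True)
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(t) for t in ordered) + r")\b",
        re.IGNORECASE,
    )
    pieces = []
    last = 0
    for m in pattern.finditer(text):
        pieces.append(html.escape(text[last:m.start()]))
        pieces.append(f'<span class="{css_class}">{html.escape(m.group(0))}</span>')
        last = m.end()
    pieces.append(html.escape(text[last:]))
    return "".join(pieces)
```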
           \_ Well, the starting point is InDesign. They export PDFs and send
              them to their print shop. Is there some library out there
              to manipulate PDFs so that I could just, like, "highlight
              search terms" in the document?
          \_ InDesign has a development forum.  I found this there:
http://blogs.adobe.com/acrolaw/2007/12/highlighting_multiple_words_in_a.html
          \_ Probably not, if you mean have it automatically highlight the
             text for you.
        \_ http://www.devdirect.com/all/PDF_MAnipulation_PCAT_2070.aspx

You may also be interested in these entries...
2013/4/9-5/18 [Computer/SW/Languages/C_Cplusplus, Computer/SW/Apps, Computer/SW/Languages/Perl] UID:54650 Activity:nil
4/04    Is there a good way to diff 2 files that consist of columns of
        floating point numbers, such that it only tells me if there's a
        difference if the numbers on a given line differ by at least a given
        ratio?  Say, 1%?
        \_ Use Excel.
           1. Open foo.txt in Excel.  It should convert all numbers to cells in
	...
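The 1% question above can also be handled with a short script instead of Excel. A sketch, assuming whitespace-separated numeric columns and files with the same number of lines; `diff_lines` is a hypothetical helper. It relies on Python's `math.isclose`, whose `rel_tol` is measured against the larger of the two magnitudes; values near zero may also need an `abs_tol`, and non-numeric fields raise `ValueError`.

```python
import math


def diff_lines(lines_a, lines_b, rel_tol=0.01):
    """Yield (line_no, line_a, line_b) for lines whose numbers differ too much.

    Columns are whitespace-separated floats. A pair of values counts as equal
    when math.isclose says so, i.e. |a - b| <= rel_tol * max(|a|, |b|).
    Lines with a different number of columns are always reported. zip() stops
    at the shorter input, so extra trailing lines are not compared.
    """
    for n, (la, lb) in enumerate(zip(lines_a, lines_b), start=1):
        xs = [float(v) for v in la.split()]
        ys = [float(v) for v in lb.split()]
        same = len(xs) == len(ys) and all(
            math.isclose(x, y, rel_tol=rel_tol) for x, y in zip(xs, ys)
        )
        if not same:
            yield n, la.rstrip("\n"), lb.rstrip("\n")
```

Open file objects can be passed directly, e.g. `diff_lines(open("foo.txt"), open("bar.txt"))`, where `bar.txt` stands in for the second file.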
2009/8/27-9/9 [Computer/SW/OS/OsX] UID:53304 Activity:nil
8/26    Any suggestions on a good OCR program for either OS X or Windows that
        will work on scanned documents outputted to pdf?  Preferably free?
        Thanks, scottyg
        \_ Check Abbyy or Scansoft.  Not free.
           \_ Thanks...I think I'd prefer a free or opensource piece of
              software unless there is a huge difference in quality.  I
	...
2009/3/26-4/2 [Computer/SW/Languages/Misc, Computer/SW/Apps] UID:52760 Activity:nil
3/26    Does anyone here use Heritrix?  I'm trying to read the Intro document at
        http://crawler.archive.org/An%20Introduction%20to%20Heritrix.pdf but
        both Adobe Reader 8.1.3 (Win32) and gv 3.6.5 (cygwin) display error
        messages and show me blank pages.  Adobe displays:
        "Cannot extract the embedded font 'FTXWSG+TimesNewRomanMS'.  Some
        characters may not display or print correctly."
	...
2008/9/1-3 [Computer/Companies/Google] UID:51015 Activity:moderate
9/1     THE GOOG had Scott McCloud do a comic explaining why THE GOOG Chrome
        (their open-source web browser) is cool.  I don't really think it worked.
        http://blogoscoped.com/google-chrome
        \_ Oh boy, it comes with porn mode!
        \_ Oh boy, it comes with a porn hider feature!
           http://blogoscoped.com/google-chrome/22
	...
2008/3/17-21 [Computer/SW/Apps] UID:49481 Activity:kinda low
3/17    Is there a way to compare differences between two substantially
        similar Excel and/or Powerpoint documents other than going through
        them manually?
        \_ In Excel 2003, there is a "Compare and Merge Workbooks..." option
           under the Tools menu.  (But for some reason it is greyed out in my
           Excel 2003.)  In PowerPoint 2003 there is "Compare and Merge
	...
2008/1/5-7 [Computer/SW/Apps] UID:48895 Activity:moderate
1/5     I'm an Excel novice who needs help. Let us say I have a column of
        data A. I then take the absolute value of A1 and enter it in B1:
        (B1=ABS(A1)). All is good. My problem is when I want to delete
        Column A, Column B is still referencing A. How do I tell Excel to
        stop referencing the source data and instead let me have the
        result standalone? I don't want to keep A around anymore, but I
	...
2007/5/13-14 [Computer/SW/OS/Windows] UID:46613 Activity:nil
5/13    Can someone please give us the 411 on Windows Vista? Is activation
        tougher than WinXP SP2? Is it now impossible to get around, forcing
        you to pay for upgrades? Let me just say that I don't like Windows
        OS.  I don't mind using Microsoft Word, Excel, and Powerpoint,
        and they actually make decent games. However, for over a decade
        I've been sucked into using Windows 3.0/95/98/2K/XP because
	...
2007/3/29-4/2 [Computer/SW/Virus] UID:46142 Activity:moderate
3/28    After almost a decade of not using windows I'm thinking about getting
        a cheap windows computer.  Security wise what are some musts?
        \_ install Linux
           \_ Haha you are still funny.
              \_ http://www.csua.berkeley.edu/~erikred/imlinux.jpg
           \_ I actually agree w/ this. Install Linux and VMWare. Then
	...
2006/10/10-12 [Computer/HW/Printer] UID:44752 Activity:kinda low
10/10   Any recommendations for a cheap monochrome laser printer?  Network
        not needed.  A decent-size starter toner cartridge would be nice.
        \_ I got a used HP 2200D off craigslist for $250 or so.  It's
           great, it does postscript and prints on both sides of the
           paper, which is nice.  Speed is not too bad.  -phr
        \_ I recently bought a Samsung 3051n from Newegg for $200 ($250
	...
2006/9/26-27 [Computer/SW/Apps] UID:44550 Activity:nil
9/26    Is there a way to strip away parts of a pdf document in order to
        keep just one part of it, without buying Adobe Acrobat? Thanks.
        \_ convert to postscript, start editing with $EDITOR
        \_ unlicensed copying of Acrobat
        \_ Depending on the document and what you're trying to save, you
           may be able to select the text and copy and paste.  -tom
	...
2013/4/9-5/18 [Computer/SW/Mail, Academia/Berkeley/CSUA] UID:54647 Activity:nil
4/8     What's a good free e-mail provider? I don't want to use Gmail,
        Yahoo, Outlook, or any of those sites with features I never use that
        track my personal info and keep changing their interface. I want just
        simple e-mail without privacy issues or all the baggage these large,
        for-profit companies are adding. I might even be willing to pay.
        Recommendations?
	...
Cache (69 bytes)
www.exacteditions.com
Focus On Africa goes digital Bringing magazines into the digital age.
Cache (3273 bytes)
blogs.adobe.com/acrolaw/2007/12/highlighting_multiple_words_in_a.html
December 04, 2007
Highlighting Multiple Words in a PDF Document
Acrobat has powerful search capabilities, but one feature which is lacking is persistent highlighting via search. I discovered an interesting workaround to this problem after pondering this email message from a customer:
We have a fairly large case where I pulled up 7,000 pages of shift logs. I need to find select words throughout the document so I am using the word search to go through all the pages and pull out those pages that reference the word I am searching. I have some questions for you: 1) When the word search is done and I am looking at the document, all the words that I searched are highlighted in blue. However, when I print them off they are not highlighted anymore. Is there any way to make it so those words are highlighted and will stay highlighted when I print them off and are easy to spot? Obviously I really do not want to print off all of those pages. Is there any way to print off a summary of where that word is on each page without printing off all 3,000 pages?
I scratched my head for a bit, but I found a great workaround which takes advantage of Acrobat 8's Redaction feature. The end result is a persistently highlighted document. Read on to learn about the workaround in easy step-by-step instructions.
Acrobat 8 Professional can mark multiple words as part of a redaction workflow. While redaction is permanent and irrevocable, it would be virtually impossible to apply redactions accidentally. Don't worry: until you apply redactions, they are simply Acrobat annotations. If you accidentally click the Apply Redaction button, a stern warning message is presented. If you accidentally click OK, Acrobat will ask you to rename your file. Once the words are highlighted, Acrobat can flexibly allow you to view, delete or summarize the comments.
Highlighting Multiple Words throughout a PDF
To persistently highlight multiple words in a PDF, follow these steps:
1. OCR the document if necessary. Acrobat cannot search for words unless there is a text layer in the document.
Note two important options for Whole Words and case sensitivity.
If needed, you can click on each search result in the list to see the corresponding highlighted word in the document. This allows you to highlight a portion of the "hits" in the document.
Mark Checked Results for Redaction
8. If you have additional words you would like to highlight, click the New Search button. Then, repeat steps 4 to 7 above.
Here's how:
1. Choose View-->Navigation Panels-->Comments. Alternatively, click on the Comments Panel button at the lower left-hand corner of the screen.
2. You can work with each comment in the list.
Summarizing Comments
Once words are highlighted, it is easy to create a comment summary which creates a new document containing only the pages with the marked words. Set to:
- Document and comments with connector lines on single pages
- Disable "Pages containing no comments"
You may like other Layout settings. The third option produces a listing-only style document.
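Outside Acrobat, the customer's second question (a summary of which pages mention each word) reduces to an index from search term to page numbers. A minimal sketch, assuming each page's text has already been extracted into a list of strings (OCR first if there is no text layer). `term_page_index` is a hypothetical helper, not part of Acrobat, and its two flags mirror the Whole Words and case-sensitivity options noted above.

```python
import re


def term_page_index(pages, terms, whole_words=True, case_sensitive=False):
    """Map each search term to the 1-based numbers of the pages containing it.

    `pages` holds one string of extracted text per page. Terms with no hits
    map to an empty list, so they still show up in the summary.
    """
    flags = 0 if case_sensitive else re.IGNORECASE
    index = {}
    for term in terms:
        body = re.escape(term)
        # \b on both sides gives whole-word matching; without it, substrings match.
        pattern = re.compile(rf"\b{body}\b" if whole_words else body, flags)
        index[term] = [n for n, text in enumerate(pages, start=1) if pattern.search(text)]
    return index
```

Printing the resulting dictionary gives a short listing of term-to-page hits in place of printing every matching page.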
Cache (6942 bytes)
www.devdirect.com/all/PDF_MAnipulation_PCAT_2070.aspx
PDF Manipulation Components Manipulate, Merge, Edit & Split PDF (Adobe Portable Document Format) files from within your applications with these components. Access the pages via page-oriented object models, modify the content and save to disk. It also supports XFA, concatenating multiple PDF files into one, 14 built-in font styles, modifying AcroForms, extracting and adding images and text, adding or setting user-defined XMP metadata on an existing PDF, encrypting or decrypting a PDF file, adding a watermark or logo, appending pages and converting a PDF file to TIFF, BMP, PNG & JPG images. Kit is a Java component for PDF document manipulation that allows developers to edit existing PDF documents. It also supports: creating application and PDF document links, font styles, modifying AcroForms, extracting and adding images and text, getting and modifying meta information of a PDF file, encrypting or decrypting a PDF file, adding a watermark or logo, appending pages and converting a PDF file to a single TIFF file or XML file. Any printable file can be converted into an image file such as TIFF, JPEG, PDF, EMF and more. The Black Ice printer drivers support all available Windows platforms, Citrix & Terminal Servers. The ColorPlus PDF printer driver can also extract ASCII text from a printed file in addition to generating image or PDF output from printed files. The ColorPlus PDF printer drivers are royalty free, allowing developers to bundle and distribute the ColorPlus PDF Printer Driver as part of their own application with no per-user fees. The PDF printer driver can also extract ASCII text from a printed file in addition to generating PDF output from printed files. The PDF printer drivers are royalty free, allowing developers to bundle and distribute the PDF Printer Driver as part of their own application with no per-user fees. PDF SDK 25 The PDF SDK/ActiveX plug-in enables the conversion from different image formats to the popular PDF (Adobe Portable Document Format) file format.
Using the PDF SDK/ActiveX plug-in, any color or monochrome image type supported by the Black Ice imaging toolkits can be converted into PDF format. The current release includes 1-bit, 8-bit color, 8-bit grayscale and 24-bit color support. PDCAT COM DLL 310 The PDCAT COM DLL is a programmable component that merges (concatenates) the pages of PDF documents. The tool can also add and delete links, annotations and bookmarks as part of the merge process. The PDCAT COM DLL includes the PDSplit component for splitting a large PDF document into smaller pieces. The splitting process is controlled either according to the pages or to the bookmark information. PDF Prep Tool Suite 310 The PDF Prep Tool and Prep Tool Enhanced facilitate the generation and manipulation of PDF documents based on existing PDF files or parts thereof, controlled by a simple API. BCL easyPDF (SDK) 51 BCL easyPDF SDK is a comprehensive PDF Programming Toolkit designed specifically to help Programmers develop and maintain PDF server and PDF desktop applications. NET components enabling developers to compose, display, capture, edit and print documents in their .NET applications. NET your programs will be able to display documents, acquire images from TWAIN scanners, perform image processing and optical character recognition, and use many other features covering all mainstream areas of document imaging. PDF PageMaster 14 PDF PageMaster is an affordable and effective solution for splitting, merging, editing, and securing PDF documents. PDF PageMaster is offered as an easy-to-use GUI application, as a command-line application, and as a library component that can be used as a building block for other client and server-based applications. PDFNet SDK 40 PDFNet SDK is an amazingly comprehensive, high-quality PDF library meeting requirements of the most demanding and diverse applications.
Using the PDFNet toolkit, developers can write stand-alone, cross-platform and reliable commercial applications that can read, write, edit, print and display PDF documents. Net language (e.g. C#, VB) and as a cross-platform C/C++ library. PDFSecure SDK 36 PDFTron PDFSecure SDK is an affordable and reliable developer library that can be used to add, remove, or change security settings on PDF documents and enables those functionalities to be embedded into third-party client and server-based applications. The library has a simple application programming interface (API) that can be accessed by various languages. PDFSecure SDK is also available as a command-line application (please see PDFTron PDFSecure for more details). NET makes simple the task of merging and adding new content to existing PDF documents. The object model is intuitive and easy to learn, yet very flexible allowing PDF merging, stamping, appending, form filling and page placing, rotating and scaling. Build your applications with unparalleled ease in your existing development environment for Windows, Linux, or Unix. With support for over 100 image formats and the industry's fastest code--backed by the AccuSoft Image Guarantee--ImageGear Professional is the Global Standard for imaging development. Absolute PDF Server Absolute PDF Server saves your organization time by offering everyone the ability to create and extract their PDF files on demand, anytime. The time spent retyping, retrieving and reformatting PDF data is eliminated. Transport PDF data into formatted Excel spreadsheets for analysis, editable Word documents for reversioning, HTML, Text and more. PDF files can be created from any printable MS Windows application and can be easily viewed and securely shared by anyone, internally or externally. jPDFFields can extract, merge or flatten acroform fields. jPDFFields is built on top of Qoppa's proprietary PDF technology so there is no need for any third party software or drivers.
After editing documents, the library can save them to a local file or the host application can override the save function to save the file to any location locally or on a network. With jPDFProcess, you can deliver customized PDF content to your users by integrating within your servers or applications. jPDFProcess is platform independent, so it can be used in any environment that supports Java, including Windows, Mac OSX and Linux. Designed for a combination of maximum power and ease of use it goes Direct to PDF for blazing speed. Create separate columns and pages with precisely positioned elements. ABCpdf also supports image insertion and drawing graphic objects. "Blocks" is a feature new to PDF - introduced by PDFlib in order to ease and speed up server based variable data printing. With the new introduced concept of "PDFlib Blocks", PDFlib offers a solution to the problem of server side personalization of PDF documents.