A PDF converter converts a PDF file into another file format, such as Word, Excel, PowerPoint, plain text, HTML, or an image. It must understand both the structure of a PDF document and the structure of the target format. For instance, a PDF to Word converter must know PDF objects as well as the Word file structure. There is no one-to-one mapping between PDF objects (text streams, images, shapes, etc.) and Word document elements, so the converter has to create compatible Word elements for each PDF object. The process is further complicated because PDF object attributes differ between PDF versions.
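That mapping can be sketched in Python. This is a minimal illustration under stated assumptions, not GIRDAC's implementation. The classes `PdfText`, `PdfImage` and `PdfShape`, the Word-side element names, and the "large bold text becomes a heading" rule are all hypothetical simplifications; a real converter parses such objects out of the PDF's content streams and writes actual Word document structures. The `pdf_version` helper reads the `%PDF-x.y` header that a PDF file begins with, which a converter could use to select version-specific handling.

```python
import re
from dataclasses import dataclass
from typing import Union


# Hypothetical, simplified PDF objects (a real parser extracts these
# from the page content streams).
@dataclass
class PdfText:
    text: str
    font_size: float
    bold: bool = False


@dataclass
class PdfImage:
    width: int
    height: int


@dataclass
class PdfShape:
    kind: str  # e.g. "line" or "rect"


PdfObject = Union[PdfText, PdfImage, PdfShape]


def pdf_version(header: bytes) -> str:
    """Read the version from the '%PDF-x.y' header at the start of a PDF file."""
    match = re.match(rb"%PDF-(\d+\.\d+)", header)
    if match is None:
        raise ValueError("not a PDF header")
    return match.group(1).decode("ascii")


def to_word_element(obj: PdfObject) -> dict:
    """Map one PDF object to a hypothetical Word-like element description.

    There is no one-to-one correspondence, so each object is approximated
    by the closest compatible Word element.
    """
    if isinstance(obj, PdfText):
        # Assumed heuristic: large bold text is treated as a heading.
        style = "Heading" if obj.bold and obj.font_size >= 14 else "Normal"
        return {"element": "paragraph", "style": style, "text": obj.text}
    if isinstance(obj, PdfImage):
        return {"element": "picture", "size": (obj.width, obj.height)}
    if isinstance(obj, PdfShape):
        return {"element": "drawing", "shape": obj.kind}
    raise TypeError(f"unsupported PDF object: {obj!r}")


def convert_page(objects: list) -> list:
    """Convert every object on a page, preserving their order."""
    return [to_word_element(obj) for obj in objects]


if __name__ == "__main__":
    print(pdf_version(b"%PDF-1.7\n"))  # prints 1.7
    page = [PdfText("Annual Report", 18, bold=True),
            PdfText("Revenue grew this year.", 11),
            PdfImage(640, 480),
            PdfShape("line")]
    for element in convert_page(page):
        print(element)
```

Dispatching on the object type keeps each translation rule in one place, so version-specific or format-specific rules can be added without touching the others.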
PDF can be converted to various document formats, such as doc, docx, xml, rtf, xls, xlsx, and htm. It can also be converted to many image formats:
AVS | JBIG | PGM | SUN |
BMP Mono | JNG | PGM RAW | SVG |
BMP Gray | JP2 | PGNM | TGA |
BMP Sep1 | JPC | PGNM RAW | TIF Gray |
BMP Sep8 | JPG | PNG Mono | TIF 12 bit RGB |
BMP 4 bit | JPG Gray | PNG Gray | TIF 24 bit RGB |
BMP 8 bit | MNG | PNG 4 bit | TIF 48 bit RGB |
BMP 24 bit | MPEG | PNG 8 bit | TIF 32 bit CMYK |
BMP 32 bit | M2V | PNG 24 bit | TIF 64 bit CMYK |
CIN | MTV | PKSM | TIF G3Fax no RLE |
CMYK | OTB | PKSM RAW | TIF G3Fax RLE |
CMYKA | P7 | PKM | TIF 2DG3Fax |
DCX | PALM | PKM RAW | TIF G4Fax |
DIB | PAM | PNM | TIF LZW |
DPX | PBM | PNM RAW | TIF PackBits |
EMF | PBM RAW | PPM | TIF Sep |
EPS 1 | PCD | PPM RAW | TIF Sep1 |
EPS 1 Color | PCDS | PS 1 | UIL |
EPS 2 | PCL | PS 1 Color | UYVY |
FAX G3 | PCX Mono | PS 2 | VICAR |
FAX 2DG3 | PCX Gray | PSD CMYK | VIFF |
FAX G4 | PCX 4 bit | PSD RGB | WBMP |
FITS | PCX 8 bit | PTIF | XBM |
GIF | PCX 24 bit | PXL Mono | XPM |
GPLT | PCX CMYK | PXL Color | XWD |
INFO | PDB | SGI | YCbCr |
Various layout options are available for PDF conversion. The most commonly used option preserves the original layout of the PDF, including its text, images, shapes, etc. Other options are formatted text, plain text, or simply extracting the images from the PDF.
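The difference between the other three options can be illustrated with a minimal Python sketch. The `Run` and `Image` types and the mode names `"formatted"`, `"plain"` and `"images"` are hypothetical; a real converter works from the objects it has parsed out of the PDF. The same-layout option is omitted here. Grouping runs into lines by an exactly equal `y` coordinate is a simplifying assumption; real text positions vary slightly and need a tolerance.

```python
from dataclasses import dataclass


@dataclass
class Run:
    """A hypothetical text run; y is measured from the top of the page."""
    text: str
    x: float
    y: float
    bold: bool = False


@dataclass
class Image:
    """A hypothetical embedded image."""
    name: str
    x: float
    y: float


def reading_order(objects):
    """Return the text runs sorted top to bottom, then left to right."""
    runs = [o for o in objects if isinstance(o, Run)]
    return sorted(runs, key=lambda r: (r.y, r.x))


def convert(objects, mode):
    if mode == "formatted":
        # Keep text and its styling, but drop positions and images.
        return [(r.text, r.bold) for r in reading_order(objects)]
    if mode == "plain":
        # Keep only the characters; runs sharing the same y form one line.
        lines = {}
        for r in reading_order(objects):
            lines.setdefault(r.y, []).append(r.text)
        return "\n".join(" ".join(parts) for parts in lines.values())
    if mode == "images":
        # Keep only the embedded images.
        return [o.name for o in objects if isinstance(o, Image)]
    raise ValueError(f"unknown mode: {mode}")


if __name__ == "__main__":
    page = [Run("world", 60, 10), Run("Hello", 10, 10, bold=True),
            Image("logo.png", 0, 0), Run("Second line", 10, 30)]
    print(convert(page, "plain"))      # Hello world / Second line
    print(convert(page, "formatted"))  # [('Hello', True), ('world', False), ('Second line', False)]
    print(convert(page, "images"))     # ['logo.png']
```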
Some PDF documents contain text embedded in images; scanned PDFs, for example, usually consist of page images with the text inside them. Such text can be extracted through OCR (optical character recognition). Almost all GIRDAC PDF Converters use OCR technology to extract text and formatting from images.