Date: October 10, 2016 9:43:19 PM CEST
Masamichi Hosoda submitted the extractpdfmark package.
Version: 1.0.0
License: gpl3
Summary description: Extract page mode and named destinations as PDFmark from PDF
Announcement text:
If you create a PDF document with something like a TeX system, many small PDFs get included in the main PDF as figures. It is common for each small PDF to use the same fonts.

If the small PDFs embed subsetted fonts, the TeX system includes them as-is in the main PDF. As a result, the main PDF embeds several different subsets of the same font. The duplicates cannot be removed because they are different subsets. This enormously increases the size of the main PDF file.

On the other hand, if the small PDFs embed full-set fonts, the TeX system also includes all of them in the main PDF. The main PDF then embeds many duplicate fonts, but they are all the same full-set fonts. Therefore, Ghostscript can remove the duplicates, which can reduce the size of the main PDF file.

Moreover, if the small PDFs do not embed any fonts, the TeX system outputs a main PDF that lacks some fonts. In this case, Ghostscript can embed the necessary fonts. This can significantly reduce the required disk size.

Either way, Ghostscript takes the main PDF output by the TeX system as input and outputs the final PDF. Unfortunately, during this process Ghostscript does not preserve the PDF page mode, named destinations, and so on. As a result, when you open the final PDF, it is not displayed in the way the author intended. Remote PDF links also do not work.

Extract PDFmark can extract the page mode and named destinations as PDFmark from a PDF. By using this tool, you can get a small PDF that preserves them.

https://github.com/trueroad/extractpdfmark
extractpdfmark – Extract page mode and named destinations as PDFmark from PDF

PDFmark is a technique that accompanies PDF and is used to store metadata such as author or title, as well as structural information such as bookmarks or hyperlinks.

When Ghostscript reads the main PDF generated by the TeX system, with its embedded PDF files, and outputs the final PDF, the PDF page mode, named destinations, and so on are not preserved. Therefore, when you open the final PDF, it is not displayed as intended. Remote PDF links also do not work correctly.

This program can extract the page mode and named destinations as PDFmark from a PDF. In this way, you can obtain small final PDF files that retain this information.


$ extractpdfmark TeX-System-Outputted.pdf > Extracted-PDFmark.ps

$ gs -q -dBATCH -dNOPAUSE -sDEVICE=pdfwrite -dPDFDontUseFontObjectNum -sOutputFile=Final.pdf TeX-System-Outputted.pdf Extracted-PDFmark.ps
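
The two commands above can be combined into a small shell function. This is only a sketch: the function name `optimize_pdf`, the `DRY_RUN` variable, and the name chosen for the intermediate PDFmark file are assumptions made for illustration and are not part of extractpdfmark or Ghostscript. The command lines themselves are the ones shown above.

```shell
# Hypothetical helper: optimize_pdf INPUT.pdf OUTPUT.pdf
# Set DRY_RUN=1 to print the commands instead of running them.
optimize_pdf() {
    in_pdf=$1
    out_pdf=$2
    # Intermediate PDFmark file, named after the output file (assumption).
    marks="${out_pdf%.pdf}-pdfmark.ps"

    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "extractpdfmark $in_pdf > $marks"
        echo "gs -q -dBATCH -dNOPAUSE -sDEVICE=pdfwrite -dPDFDontUseFontObjectNum -sOutputFile=$out_pdf $in_pdf $marks"
        return 0
    fi

    # Step 1: extract page mode and named destinations as PDFmark.
    extractpdfmark "$in_pdf" > "$marks" || return 1

    # Step 2: let Ghostscript rewrite the PDF and re-apply the PDFmark.
    gs -q -dBATCH -dNOPAUSE -sDEVICE=pdfwrite \
       -dPDFDontUseFontObjectNum \
       -sOutputFile="$out_pdf" \
       "$in_pdf" "$marks"
}
```

For example, `optimize_pdf TeX-System-Outputted.pdf Final.pdf` would run both steps, and prefixing the call with `DRY_RUN=1` would only print what would be executed.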

Copyright 2016–2018 Masamichi Hosoda
Maintainer Masamichi Hosoda



