New on CTAN: extractpdfmark
When you create a PDF document with a TeX system, many small PDFs are often included in the main PDF as figures, and these small PDFs commonly use the same fonts.

If the small PDFs embed subsetted fonts, the TeX system includes them as-is in the main PDF. The main PDF then contains several different subsets of the same font. Because the subsets differ, the duplicates cannot be removed, and the main PDF file size increases enormously.

If, on the other hand, the small PDFs embed full-set fonts, the TeX system also includes all of them in the main PDF. The main PDF then contains many duplicate fonts, but they are all identical full sets, so Ghostscript can remove the duplicates and reduce the main PDF file size.

If the small PDFs embed no fonts at all, the TeX system outputs a main PDF that lacks some fonts. In this case, Ghostscript can embed the necessary fonts, which can significantly reduce the disk space required.

In both of the latter cases, Ghostscript reads the main PDF produced by the TeX system and writes the final PDF. Unfortunately, Ghostscript does not preserve the PDF page mode, named destinations, and similar information during this process. As a result, the final PDF does not open with the intended display settings, and remote PDF links do not work.

Extract PDFmark extracts the page mode and named destinations from a PDF as PDFmark. With this tool, you can get a small final PDF that preserves them. https://github.com/trueroad/extractpdfmark
The package's Catalogue entry can be viewed at http://www.ctan.org/pkg/extractpdfmark
The package's files themselves can be inspected at http://mirror.ctan.org/support/extractpdfmark/

Thanks for the upload.

For the CTAN Team
Petra Rübe-Pugliese
We are supported by the TeX users groups. Please join a users group; see http://www.tug.org/usergroups.html .
extractpdfmark – Extract page mode and named destinations as PDFmark from PDF
PDFmark is a technique that accompanies PDF and is used to store metadata such as author or title, as well as structural information such as bookmarks or hyperlinks.
When Ghostscript reads the main PDF generated by the TeX system, which includes the embedded PDF files, and outputs the final PDF, the PDF page mode, named destinations, and similar information are not preserved. Therefore, the final PDF is not displayed as intended when you open it, and remote PDF links do not work correctly.
This program extracts the page mode and named destinations from a PDF as PDFmark. By passing the extracted PDFmark to Ghostscript together with the PDF, you can obtain a final PDF that keeps this information.
$ extractpdfmark TeX-System-Outputted.pdf > Extracted-PDFmark.ps
$ gs -q -dBATCH -dNOPAUSE -sDEVICE=pdfwrite -dPDFDontUseFontObjectNum \
     -sOutputFile=Final.pdf TeX-System-Outputted.pdf Extracted-PDFmark.ps
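The two commands above can be wrapped in a small shell function so that the intermediate PDFmark file is created and cleaned up automatically. The following is only a minimal sketch: the function name `optimize_pdf` and the use of `mktemp` for the intermediate file are assumptions made for illustration and are not part of extractpdfmark. The Ghostscript options are copied unchanged from the commands above.

```shell
# Hypothetical helper: the function name and the temp-file handling are
# assumptions for illustration, not part of extractpdfmark itself.
optimize_pdf() {
    in_pdf=$1
    out_pdf=$2

    # Temporary file for the extracted PDFmark (PostScript) data.
    marks=$(mktemp) || return 1

    # Step 1: extract page mode and named destinations as PDFmark.
    if ! extractpdfmark "$in_pdf" > "$marks"; then
        rm -f "$marks"
        return 1
    fi

    # Step 2: let Ghostscript rewrite the PDF, passing the PDFmark file
    # after the PDF, with the same options as the commands above.
    gs -q -dBATCH -dNOPAUSE -sDEVICE=pdfwrite -dPDFDontUseFontObjectNum \
        -sOutputFile="$out_pdf" "$in_pdf" "$marks"
    status=$?

    rm -f "$marks"
    return "$status"
}
```

After sourcing this definition, a call such as `optimize_pdf TeX-System-Outputted.pdf Final.pdf` would run both steps. The function returns a non-zero status if either tool fails, so it can be used in scripts that stop on errors.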
Copyright: 2016–2018 Masamichi Hosoda
- 2019-02-12 CTAN update: extractpdfmark
- 2018-12-10 CTAN Update: extractpdfmark
- 2017-04-17 CTAN update: extractpdfmark
- 2016-10-26 CTAN update: extractpdfmark
- 2016-10-10 New on CTAN: extractpdfmark