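Discussion (10 comments)
For years PDF has reigned as the de facto standard for documents. In modern development, though, it increasingly looks like a legacy burden: data is hard to extract from it, and it is not responsive. Now a "war on PDF", an effort to break through the format's limits and move to something more web-friendly, is heating up around the world.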
Well, that was a nonsense article. Badly written software has trouble with PDFs, accessibility is an afterthought (which, sadly, is true of most things) and some small group thinks they can invent a better wheel, ignoring the fact that they’d have to do a lot of work to overcome the first mover advantages of HTML and PDF and this comment now has more information than the original article thanks to that clause beginning with “ignoring”.
Yet Duff Johnson, head of the PDF Association, protector of the format, argues that the fault lies not in the file type but in ourselves. He contends that there is no reason developers cannot build bots that are able to use PDFs. The AI assistant embedded in Acrobat, Adobe’s PDF reader, is designed to do precisely that, notes Leonard Rosenthol, the software firm’s PDF guru.
Designed to, but does it do it well without the problems noted earlier in the article?
Seems to be a weak pitch for an Israeli startup called Factify. Their new document type is also closed source, which seems like an obvious showstopper for a ubiquitous global document replacement, especially in today's extremely heated and untrustworthy environment.
No strong argument imo for replacing the pdf.
There are PDF files and there are PDF files.
Many (most?) PDFs I run into are generated from Microsoft Word or some other MS product with no structure at all.
The majority of people who use MS products don't understand or care about structure.
The WYSIWYG imperative means lots of markup to describe font size, color, and decoration,
to make every section heading look the same without ever designating the text as a section head.
The same happens with paragraphs, page breaks, and column flow.
The resulting document looks correct enough to the creator.
Other people who have a different version of Word,
different fonts,
and a thousand other little differences,
won't see it correctly.
That leads our author to generate a PDF, probably with embedded fonts,
to ensure uniform appearance across these thousand little exceptions.
The result is a document with the content mixed up so incomprehensibly with appearance controls as to be both unreadable
and without any residue of the underlying intended structure of the document's sections, headers, figures, paragraphs, captions, footnotes, or anything.
And then there are PDF files that are nothing more than a series of images of pages of text.
If you're lucky and the scans are clean, a good OCR engine might be able to recover most of the content.
What I'm saying is,
the tool doesn't matter
if authors don't encode structure and formatting in semantically meaningful ways.
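To make that concrete, here is a rough Python sketch of the same problem in HTML rather than PDF. It is only an illustration under my own assumptions: the `OutlineParser` class, the `extract_outline` helper, and both sample snippets are made up for this example. One snippet marks its headings as headings; the other only makes them look like headings with font styling, the way a WYSIWYG export tends to.

```python
from html.parser import HTMLParser


class OutlineParser(HTMLParser):
    """Collects the text of h1-h6 elements, i.e. headings the author declared."""

    HEADINGS = {"h1", "h2", "h3", "h4", "h5", "h6"}

    def __init__(self):
        super().__init__()
        self.outline = []    # list of (level, text) tuples
        self._level = None   # level of the heading currently open, if any
        self._buf = []       # text fragments seen inside the open heading

    def handle_starttag(self, tag, attrs):
        if tag in self.HEADINGS:
            self._level = int(tag[1])
            self._buf = []

    def handle_data(self, data):
        if self._level is not None:
            self._buf.append(data)

    def handle_endtag(self, tag):
        if tag in self.HEADINGS and self._level is not None:
            self.outline.append((self._level, "".join(self._buf).strip()))
            self._level = None


def extract_outline(html):
    """Return the declared heading outline of an HTML fragment."""
    parser = OutlineParser()
    parser.feed(html)
    parser.close()
    return parser.outline


# Hypothetical sample: structure is encoded semantically.
SEMANTIC = "<h1>Report</h1><p>Intro text.</p><h2>Methods</h2><p>Details.</p>"

# Hypothetical sample: same visual result, but only appearance is encoded.
VISUAL = (
    '<p><span style="font-size:24pt;font-weight:bold">Report</span></p>'
    "<p>Intro text.</p>"
    '<p><span style="font-size:18pt;font-weight:bold">Methods</span></p>'
    "<p>Details.</p>"
)

print(extract_outline(SEMANTIC))  # [(1, 'Report'), (2, 'Methods')]
print(extract_outline(VISUAL))    # []
```

Both snippets would render about the same to a human, but only the first gives a program anything to build an outline from. Recovering headings from the second means guessing from font sizes, which is roughly the situation with an untagged PDF.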
This reminds me of something that was posted here on HN a few days ago:
https://scottlocklin.wordpress.com/2023/05/31/djvu-and-its-connection-to-deep-learning/
The war against pdfs is based on AI being too stupid to read them? That's a condemnation of AI, not pdfs. I, a natural intelligence, can easily read pdfs.
I'll miss getting documentation as a pile of pictures in a PDF.