Reply To: Perfect PDF 9 Editor / Sep 1 2020

Sep 16, 2020 at 3:24 am #16502606

Gary

Guest

[@Peter Blaise]

I should have put this up first. I reworked it to leave out the parts that I also had in my last comment.

When you stated that the output was supposed to be a text file, a text file, a text file, I asked "What encoding did you export the PDF file as?" to see what you knew, even though it was clear you did not know what you were viewing.

>”I did not choose any encoding for a text file export,”
That is correct, because you can't. Soft Xpansion exports text only as Unicode, and one of the export methods even shows that the output is Unicode.

>”and cannot imagine the meaning of any coding for a test file export”

I am sure you truly think like that. If you knew about the different encodings for text, you would have instantly recognized that the output is not using 1 byte per character. Now you know that not all text files use 1 byte per character.

>”or anything I could have done or chosen for a text file export that would have put what appear to be tabs spaces between words causing them to appear as if in spreadsheet columns.”

So you have gone from stating that the exported output had "spaces as tab-delimited" a few days ago to "what appear to be tabs spaces" today. Are you starting to accept that they are not tabs?

UNICODE EXPLANATION:

The term "text file" does not imply any encoding at all. It simply means the contents do not include any formatting, such as an MS Word document would have. A text file does not imply that the contents are encoded as 1-byte characters or use ASCII, EBCDIC, or the incorrect terms ANSI, Windows Standard encoding, or MS-DOS standard encoding, or any other encoding or formatting. Nor does it imply that the content CANNOT consist of 2-byte, 3-byte, or 4-byte characters, or that those characters cannot be encoded as UTF-8, UTF-16, or UTF-32, or be called Unicode.
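To make the 1-, 2-, 3-, and 4-byte point concrete, here is a small Python sketch (Python is just my choice for illustration; nothing here comes from Perfect PDF 9) that counts how many bytes a few sample characters take in UTF-8, UTF-16, and UTF-32. The sample characters are arbitrary picks.

```python
def byte_sizes(ch: str) -> tuple:
    """Return how many bytes `ch` takes in UTF-8, UTF-16 and UTF-32."""
    # The "-le" variants are used so no byte order mark (BOM) is added to the count.
    return (
        len(ch.encode("utf-8")),
        len(ch.encode("utf-16-le")),
        len(ch.encode("utf-32-le")),
    )

# Sample characters: a US letter, an accented letter, the euro sign, and an emoji
samples = ["A", "é", "€", "\U0001F600"]

for ch in samples:
    utf8, utf16, utf32 = byte_sizes(ch)
    print(f"U+{ord(ch):04X}  UTF-8: {utf8}  UTF-16: {utf16}  UTF-32: {utf32}")
```

Every one of those is still just "text"; only the number of bytes per character changes.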

The US alphabet has 26 letters, and with upper- and lower-case variants, digits, punctuation marks, etc., everything fits within 128 placeholders. That was the size of the character set on the earliest personal computers, and even on mainframes/super-minis, and it is referred to as ASCII (as opposed to EBCDIC, as used on your IBM mainframe). Each byte is a binary number, but what that number represents depends on the character set, which is an agreement that each binary number stands for some specific character (even unprintable ones). 128 variants can be formed using only 7 bits, so the 8th bit of a byte was left as a zero. When the IBM PC came along, IBM used the 8th bit to add another 128 characters (playing card symbols, line-drawing shapes, etc.), and that set was often referred to as Extended ASCII. [Note: IBM wanted to use the Epson MX80 printer, but it communicated using 7 bits at a time. The design was modified to store the extra 128 characters and use 8 bits to communicate, so the IBM printer was the original Epson MX80, modified, with the IBM logo on it.]
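A quick way to see the 7-bit versus 8-bit point for yourself, again sketched in Python. Python's built-in `cp437` codec is the original IBM PC character set; the three box-drawing bytes chosen below are only examples.

```python
import string

# Every printable ASCII character has a code below 128, so 7 bits are enough
assert all(ord(ch) < 128 for ch in string.printable)

# "A" is 65; written as 8 bits, the high (8th) bit is 0
print(f"{ord('A'):08b}")  # 01000001

# Bytes 128-255 are not ASCII at all...
high_bytes = bytes([0xC9, 0xCD, 0xBB])
try:
    high_bytes.decode("ascii")
except UnicodeDecodeError:
    print("not valid ASCII")

# ...but the IBM PC character set (code page 437) uses them for line-drawing shapes
print(high_bytes.decode("cp437"))  # ╔═╗
```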

As other countries adopted PCs, they wanted their own characters to be available too, but all 8 bits of a byte were already in use. That means more than 1 byte per character would be needed to support languages such as Spanish, French, Italian, German, and others all at the same time. Soft Xpansion is based in Germany, and all over Europe there is a much greater need for support of all languages worldwide. Therefore, it is no wonder that Soft Xpansion exports text only as Unicode. The first 128 characters of Unicode are the same as the ASCII characters. If the file uses only those characters, it is easy to convert it to UTF-8 or to what is incorrectly called ANSI (actually Windows-1252). UTF-8 and Windows-1252 are different encodings, but for those first 128 characters they produce exactly the same bytes.
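Here is a sketch of why ASCII-only text converts so easily: for the first 128 characters, ASCII, UTF-8, and Windows-1252 give identical bytes, and they only part ways above that. The sample strings are arbitrary.

```python
ascii_only = "Table of Contents"

# For the first 128 characters, all three encodings produce exactly the same bytes
assert ascii_only.encode("ascii") == ascii_only.encode("utf-8") == ascii_only.encode("cp1252")

# Above those 128 characters, UTF-8 and Windows-1252 differ
print("ü".encode("utf-8"))   # b'\xc3\xbc' (2 bytes)
print("ü".encode("cp1252"))  # b'\xfc'     (1 byte)

# Converting 2-byte Unicode (UTF-16) to Windows-1252: decode, then re-encode.
# Python's "utf-16" codec writes a byte order mark when encoding and consumes it when decoding.
utf16_bytes = ascii_only.encode("utf-16")
cp1252_bytes = utf16_bytes.decode("utf-16").encode("cp1252")
print(cp1252_bytes)  # b'Table of Contents'
```

If the text contained a character that Windows-1252 has no slot for, `encode("cp1252")` would raise `UnicodeEncodeError` rather than quietly dropping it.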

Since we already know that 1 byte per character has run out of room, Unicode needs more bytes per character; the Unicode that Windows programs usually mean (UTF-16) uses 2 bytes for each of these characters. So what do you think Unicode looks like in a standard text viewer when the text contains only characters used in the U.S. alphabet?

IT LOOKS LIKE THERE IS AN EXTRA SPACE BETWEEN EACH CHARACTER!

And someone who doesn't know any better may think they are tabs. Granted, if you are using a viewer that understands Unicode rather than a standard 1-byte-per-character viewer, it will read each 2-byte pair as one character and show the characters right next to each other, the way we are accustomed to seeing them.
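The "extra space" effect is easy to reproduce. This Python sketch encodes a heading as UTF-16 little-endian (the byte order Windows normally uses) and then views it the way a 1-byte-per-character viewer would. "Chapter 1" is just a made-up sample heading.

```python
text = "Chapter 1"

# Each of these characters becomes 2 bytes; the second byte of each pair is 0x00
raw = text.encode("utf-16-le")
print(raw)  # b'C\x00h\x00a\x00p\x00t\x00e\x00r\x00 \x001\x00'

# A viewer that assumes 1 byte per character shows every 0x00 as a blank
naive_view = raw.decode("latin-1").replace("\x00", " ")
print(repr(naive_view))  # 'C h a p t e r   1 '

# None of those blanks is a tab (0x09)
print(b"\x09" in raw)  # False

# A Unicode-aware viewer reads each 2-byte pair as one character
print(raw.decode("utf-16-le"))  # Chapter 1
```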

>”I guess that Soft-Xpansion Perfect PDF 9 Editor saw a table of contents with numbered chapters and decided the contents was a spreadsheet, but I’m guessing”

Don't guess. Text files don't have any formatting, so Perfect PDF 9 will not try to line items up in columns by adding tabs. Programs that convert PDFs to formatted documents (e.g., MS Word documents) will attempt to keep the original formatting in the output. That does not happen for exported text.

Take a look at your exported text using a Hex Editor/Viewer.
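If you don't have a hex editor handy, a few lines of Python will do as a minimal hex viewer. This is a sketch, not a replacement for a real hex editor, and the helper names `hex_dump` and `describe` are my own. It also reports whether the data starts with the UTF-16 little-endian byte order mark (FF FE) and counts zero bytes (00) versus tab bytes (09).

```python
import codecs

def hex_dump(data: bytes, width: int = 16) -> str:
    """Hex-editor-style dump: offset, hex bytes, then printable ASCII (others as '.')."""
    lines = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        hex_part = " ".join(f"{b:02X}" for b in chunk)
        text_part = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
        lines.append(f"{offset:08X}  {hex_part:<{width * 3}}  {text_part}")
    return "\n".join(lines)

def describe(data: bytes) -> dict:
    """Summarize the clues that tell 2-byte Unicode apart from tab-separated text."""
    return {
        "utf16_le_bom": data.startswith(codecs.BOM_UTF16_LE),
        "zero_bytes": data.count(0x00),
        "tab_bytes": data.count(0x09),
    }

# Sample data standing in for an exported file: BOM + "Chapter 1" in UTF-16 LE
sample = codecs.BOM_UTF16_LE + "Chapter 1".encode("utf-16-le")
print(hex_dump(sample))
print(describe(sample))
```

To check a real export, read it in binary mode, e.g. `data = open("exported.txt", "rb").read()` (the file name is hypothetical), and pass `data` to both functions. In 2-byte Unicode output you should see plenty of `00` bytes and, unless the PDF text itself contained tabs, no `09` bytes.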

>”So, has anyone else tried exporting to text?”

Randy stated to you that he exported two PDFs to text and they were just fine. I succeeded in exporting to text, and you succeeded in exporting to text even though you did not understand what you were looking at, so that makes at least three of us. Hopefully, some others will too.

>"… it just confirms for me that Soft-Xpansion does not understand what a human understands when we look at the contents of a PDF file."

Look at it how? Using what program? You cannot tell what encoding a text file uses just by looking at its contents.
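Here is a sketch of why: the same bytes can be decoded more than one way, and nothing in the bytes themselves says which way is right (a byte order mark, when present, is only an optional hint). The German sample word is arbitrary.

```python
word_bytes = "Grüße".encode("utf-8")  # b'Gr\xc3\xbc\xc3\x9fe'

# Read with the encoding that produced it, the word comes back intact
print(word_bytes.decode("utf-8"))   # Grüße

# Read as Windows-1252, the very same bytes turn into different characters
print(word_bytes.decode("cp1252"))  # GrÃ¼ÃŸe
```

Both decodes succeed without any error, which is exactly why what a viewer shows you is not proof of what encoding the file uses.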

>”Soft-Xpansion does not understand”

And yet, Soft Xpansion has been in business for over 25 years and is considered one of the top companies when it comes to PDF. They must have a good idea of what a human understands when looking at the contents of a PDF file; without an understanding of the PDF format, those contents might seem like an odd sequence of characters. If you mean looking at the "text exported from a PDF file," then they also know that what you see depends on what you use to view the output, so the view alone should not be the basis for determining which encoding is in use. And they now know that some people in the U.S. have no clue what Unicode is or that it can be in a text file.

>”Let’s wait for version 11 or later, with no registration needed for it to work on screen at least as a trial.”

"Let's not." At least, don't include me. I would suggest that YOU wait for version 111. Soft Xpansion will probably be thrilled if you do. The rest of us who were interested enough to start the download process of Perfect PDF 9 would probably like to be able to use Perfect PDF 9, and we hope that Soft Xpansion is not so discouraged by its experience with this offer that it will not return to give SoS users another chance at its software in the future.

>”Thanks in advance for your own report of testing text export.”

No need to thank me in advance; I already did it. As I have stated all along, I have been exporting to text using Perfect PDF 9 ever since the SoS offer download page provided a code. I am glad to get Unicode output. Most PDF editors do not export in Unicode. When I need Unicode, Perfect PDF 9 saves me from having to process the exported output through another program. I might have more in my review, which is coming along fine. I did have to do other things, but I will be finishing it up.