By Phakorn Kiong
At work, a requirement to password-protect existing PDF files came up. I was tasked with adding this feature to our existing backend, which uses Express.
I thought it would be straightforward to implement, since there are probably plenty of libraries on NPM capable of this.
For the sake of simplicity, password protection and encryption are used interchangeably in this article.
After half a day of searching, I realized that not many libraries support modifying existing PDF files, let alone encrypting them.
The notable candidate is Hummus Recipe (mostly written in C/C++). However, it does not support output as a Buffer (as of this writing). In the end, I had to save the encrypted PDF file to local disk and load it back as a Buffer, which is extremely inefficient.
My next preferred candidate is pdf-lib. The library provides many useful PDF modification features and works with all JavaScript environments. However, the encryption feature is not yet available in this library.
So I thought to myself, can I implement this feature?
To get started, I read the PDF Specification to understand the overall structure of a PDF file. I’ve written a short article here.
Encryption in PDF Specification
A Conforming Reader determines whether a PDF file is encrypted by first looking for the Encrypt entry in the Trailer dictionary. If it exists, the file is encrypted, and the Conforming Reader goes through the decryption routine to extract the content before rendering the file.
The following is an example of the Trailer dictionary of an encrypted PDF file.
The Encrypt entry references the indirect object with object number 7 and generation number 0. This object is known as the Encryption dictionary and contains all encryption-related information.
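To make the check concrete, here is a minimal sketch of how a reader might detect the Encrypt entry. The trailer text is made up for illustration (it is not the article's original example), findEncryptRef is a hypothetical helper name, and the regular expression only handles the case where Encrypt is an indirect reference.

```javascript
// Hypothetical trailer text, invented for this sketch; real trailers vary.
const trailer = `trailer
<<
  /Size 8
  /Root 1 0 R
  /Info 6 0 R
  /Encrypt 7 0 R
  /ID [<0123456789abcdef0123456789abcdef> <0123456789abcdef0123456789abcdef>]
>>`;

// Return the indirect reference stored under /Encrypt, or null if the entry is absent.
function findEncryptRef(trailerText) {
  const match = /\/Encrypt\s+(\d+)\s+(\d+)\s+R/.exec(trailerText);
  if (!match) return null;
  return {
    objectNumber: Number(match[1]),
    generationNumber: Number(match[2]),
  };
}

const encryptRef = findEncryptRef(trailer);
const isEncrypted = encryptRef !== null;
```

A real parser would tokenize the dictionary rather than pattern-match on raw text; the sketch only shows which entry the reader looks at.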
The following is an example of an Encryption dictionary.
It might seem very complicated, but let’s go through each entry in the dictionary to understand what it means.
/Filter: Identifies the security handler for the document. /Standard means the “Standard password-based security handler” is used.
/V: Specifies the algorithm to be used in encrypting and decrypting the document.
/Length: Length of the encryption key.
/CF: A dictionary whose keys are crypt filter names and whose values are the corresponding crypt filter dictionaries (only meaningful for PDF 1.5 and above).
/StmF: Name of the crypt filter used by default when decrypting streams in the document.
/StrF: Name of the crypt filter used by default when decrypting all strings in the document.
The following fields are only used if /Filter is /Standard:
/R: Number specifying the revision of the standard security handler.
/O: 32-byte string, based on both the owner and user passwords. Used in computing the encryption key and in determining whether a valid owner password was entered.
/U: 32-byte string, based on the user password. Used in determining whether a valid owner or user password was entered.
/P: A set of flags specifying which operations are permitted when the document is opened with the user password. We will get back to this in the latter part of the article.
A security handler is a software module that implements various aspects of the encryption process and controls access to the contents of the encrypted document.
While the PDF Specification defines a “Standard password-based security handler” that every Conforming Reader must support, a Conforming Reader could add a custom security handler to enhance the security of encrypted PDF files.
Crypt filters provide finer-grained control over encryption within a PDF file (only for PDF 1.5 and above, with the /V entry equal to 4).
For example, you could define different encryption mechanisms for streams (/StmF) and strings (/StrF) using different crypt filters defined in the /CF entry of the Encryption dictionary. A few entries in a crypt filter dictionary are worth understanding:
/CFM: Defines the method used by the Conforming Reader to decrypt data. Can be None, V2 or AESV2. The value V2 uses the RC4 algorithm, while AESV2 uses the AES algorithm in Cipher Block Chaining (CBC) mode with a 16-byte block size.
/AuthEvent: Defines the event that triggers authorization to access the encryption key used by this filter. Either DocOpen or EFOpen. DocOpen means authorization is required when the document is opened, while EFOpen means authorization is required when accessing embedded files. Defaults to DocOpen.
/Length: The bit length of the encryption key. A multiple of 8 in the range 40 to 128.
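The lookup from /StmF or /StrF to a crypt filter can be sketched as follows. The dictionary is modelled as a plain JavaScript object with made-up values, resolveCryptFilter is a hypothetical helper, and the sketch assumes the specification's defaults: Identity for a missing /StmF or /StrF, None for a missing /CFM.

```javascript
// Hypothetical Encryption dictionary fragment, modelled as a plain object.
const encryptionDict = {
  V: 4,
  CF: { StdCF: { CFM: 'AESV2', AuthEvent: 'DocOpen', Length: 128 } },
  StmF: 'StdCF',
  StrF: 'StdCF',
};

// Cipher implied by each /CFM value described above.
const CIPHER_BY_CFM = { None: null, V2: 'RC4', AESV2: 'AES-CBC' };

// Resolve which crypt filter applies to streams ('StmF') or strings ('StrF').
function resolveCryptFilter(dict, entry) {
  const name = dict[entry] ?? 'Identity';
  if (name === 'Identity') {
    // The Identity filter passes data through unchanged.
    return { name, cfm: 'None', cipher: null, authEvent: 'DocOpen' };
  }
  const filter = dict.CF?.[name];
  if (!filter) throw new Error(`Crypt filter ${name} is not defined in /CF`);
  const cfm = filter.CFM ?? 'None';
  if (!(cfm in CIPHER_BY_CFM)) throw new Error(`Unsupported /CFM value: ${cfm}`);
  return {
    name,
    cfm,
    cipher: CIPHER_BY_CFM[cfm],
    authEvent: filter.AuthEvent ?? 'DocOpen',
  };
}

const streamFilter = resolveCryptFilter(encryptionDict, 'StmF');
```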
Crypt filters also support public-key security handlers, which are outside the scope of this article.
For the sake of simplicity, this article focuses on the “Standard password-based security handler” and the algorithms used with the crypt filter shown in the example above.
The encryption of data in a PDF file is based on the use of an encryption key computed by the security handler.
While we could define a custom security handler that computes the encryption key differently, the process of encrypting the data itself never changes: it follows “Algorithm 1: Encryption of data using the RC4 or AES algorithms” defined in the specification.
The next question would be, how do we compute the encryption key following the standard security handler in PDF specification?
The process of computing the encryption key is described in “Algorithm 2: Computing an encryption key” of the PDF specification. It requires two inputs that have to be computed beforehand, namely the O entry of the Encryption dictionary and the ID entry of the Trailer dictionary.
Computation of the U and O entries follows the steps of padding (to ensure a consistent length), MD5 hashing and RC4 encryption.
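The three building blocks can be sketched in Node.js as below. This is a minimal sketch rather than the code from the pull request: the helper names are hypothetical, the password is encoded as latin1 as a stand-in for PDFDocEncoding, and RC4 is written out by hand because recent OpenSSL builds may not expose it through Node's crypto module.

```javascript
const crypto = require('crypto');

// Standard 32-byte padding string defined by the PDF specification.
const PADDING = Buffer.from(
  '28bf4e5e4e758a4164004e56fffa0108' + '2e2e00b6d0683e802f0ca9fe6453697a',
  'hex'
);

// Truncate or pad a password to exactly 32 bytes.
// Assumption: latin1 approximates PDFDocEncoding for simple passwords.
function padPassword(password) {
  const bytes = Buffer.from(password, 'latin1').subarray(0, 32);
  return Buffer.concat([bytes, PADDING]).subarray(0, 32);
}

// MD5 digest of a Buffer, returned as a 16-byte Buffer.
function md5(data) {
  return crypto.createHash('md5').update(data).digest();
}

// Plain RC4: key scheduling followed by keystream XOR.
// Encryption and decryption are the same operation.
function rc4(key, data) {
  const s = Array.from({ length: 256 }, (_, i) => i);
  let j = 0;
  for (let i = 0; i < 256; i++) {
    j = (j + s[i] + key[i % key.length]) & 0xff;
    [s[i], s[j]] = [s[j], s[i]];
  }
  const out = Buffer.alloc(data.length);
  let i = 0;
  j = 0;
  for (let k = 0; k < data.length; k++) {
    i = (i + 1) & 0xff;
    j = (j + s[i]) & 0xff;
    [s[i], s[j]] = [s[j], s[i]];
    out[k] = data[k] ^ s[(s[i] + s[j]) & 0xff];
  }
  return out;
}
```

An empty password pads to the padding string itself, which is why a PDF with an empty user password still opens without a prompt.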
The U entry is computed using either “Algorithm 4: Computing the encryption dictionary’s U (user password) value (Security handlers of revision 2)” or “Algorithm 5: Computing the encryption dictionary’s U (user password) value (Security handlers of revision 3 or greater)”.
The O entry is computed using “Algorithm 3: Computing the encryption dictionary’s O (owner password) value”.
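As an illustration, Algorithm 3 can be sketched as follows. This is my reading of the specification, not the code merged into pdf-lib; computeOwnerValue and its parameters are hypothetical names, and latin1 stands in for PDFDocEncoding. The U entry is left out here because it depends on the encryption key from Algorithm 2.

```javascript
const crypto = require('crypto');

const PADDING = Buffer.from(
  '28bf4e5e4e758a4164004e56fffa0108' + '2e2e00b6d0683e802f0ca9fe6453697a',
  'hex'
);

function padPassword(password) {
  const bytes = Buffer.from(password, 'latin1').subarray(0, 32);
  return Buffer.concat([bytes, PADDING]).subarray(0, 32);
}

function md5(data) {
  return crypto.createHash('md5').update(data).digest();
}

function rc4(key, data) {
  const s = Array.from({ length: 256 }, (_, i) => i);
  let j = 0;
  for (let i = 0; i < 256; i++) {
    j = (j + s[i] + key[i % key.length]) & 0xff;
    [s[i], s[j]] = [s[j], s[i]];
  }
  const out = Buffer.alloc(data.length);
  let i = 0;
  j = 0;
  for (let k = 0; k < data.length; k++) {
    i = (i + 1) & 0xff;
    j = (j + s[i]) & 0xff;
    [s[i], s[j]] = [s[j], s[i]];
    out[k] = data[k] ^ s[(s[i] + s[j]) & 0xff];
  }
  return out;
}

// Sketch of Algorithm 3 (O value) for security handler revisions 2 to 4.
function computeOwnerValue(ownerPassword, userPassword, revision = 2, keyBits = 40) {
  // Revision 2 always uses a 5-byte (40-bit) key.
  const keyLength = revision === 2 ? 5 : keyBits / 8;
  // Fall back to the user password when no owner password is given.
  let digest = md5(padPassword(ownerPassword || userPassword));
  if (revision >= 3) {
    for (let i = 0; i < 50; i++) digest = md5(digest);
  }
  const key = digest.subarray(0, keyLength);
  let result = rc4(key, padPassword(userPassword));
  if (revision >= 3) {
    // 19 extra passes, each with every key byte XORed with the pass counter.
    for (let i = 1; i <= 19; i++) {
      result = rc4(Buffer.from(key.map((byte) => byte ^ i)), result);
    }
  }
  return result; // 32 bytes, stored as /O
}
```

Because RC4 is symmetric, running the same passes in reverse order with the owner-derived key recovers the padded user password, which is how a reader validates an owner password.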
The PDF specification does not define how the ID entry should be computed.
However, since it also serves as a unique file identifier, we could pass the Info entry from the Trailer dictionary into an MD5 hash to generate the required byte strings.
(Fun fact: no matter how the ID entry is computed, encryption and decryption will not fail as long as its value stays the same.)
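One possible way to derive such an identifier is sketched below. The serialization of the Info dictionary is my own assumption (sorted key and value pairs joined by newlines); any stable serialization would do, and generateFileId is a hypothetical helper name.

```javascript
const crypto = require('crypto');

// Derive a 16-byte file identifier from the Info dictionary.
// Assumption: `info` is a plain object of string values, e.g. Title, Author.
function generateFileId(info) {
  const serialized = Object.keys(info)
    .sort() // sort keys so the same Info always hashes to the same ID
    .map((key) => `${key}:${info[key]}`)
    .join('\n');
  return crypto.createHash('md5').update(serialized, 'utf8').digest();
}

// Hypothetical Info values for illustration.
const fileId = generateFileId({ Title: 'Quarterly report', Author: 'Jane Doe' });
```

The same 16 bytes would then be written as the first element of the ID array and fed into the key computation.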
Once we have all the ingredients, we could compute the encryption key.
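A minimal sketch of Algorithm 2, as I read it in the specification, looks like this. The function and parameter names are hypothetical, latin1 stands in for PDFDocEncoding, and the ownerValue and fileId arguments are assumed to be Buffers holding the O entry and the first element of ID.

```javascript
const crypto = require('crypto');

const PADDING = Buffer.from(
  '28bf4e5e4e758a4164004e56fffa0108' + '2e2e00b6d0683e802f0ca9fe6453697a',
  'hex'
);

function padPassword(password) {
  const bytes = Buffer.from(password, 'latin1').subarray(0, 32);
  return Buffer.concat([bytes, PADDING]).subarray(0, 32);
}

function md5(data) {
  return crypto.createHash('md5').update(data).digest();
}

// The /P value as 4 bytes, low-order byte first.
function permissionBytes(permissions) {
  const bytes = Buffer.alloc(4);
  bytes.writeInt32LE(permissions | 0); // `| 0` accepts signed or unsigned input
  return bytes;
}

// Sketch of Algorithm 2 for security handler revisions 2 to 4.
function computeEncryptionKey({
  userPassword,
  ownerValue, // 32-byte /O entry
  permissions, // /P entry as an integer
  fileId, // first element of the trailer's /ID array
  revision = 2,
  keyBits = 40,
  encryptMetadata = true,
}) {
  const keyLength = revision === 2 ? 5 : keyBits / 8;
  const parts = [
    padPassword(userPassword),
    ownerValue,
    permissionBytes(permissions),
    fileId,
  ];
  if (revision >= 4 && !encryptMetadata) {
    parts.push(Buffer.from('ffffffff', 'hex'));
  }
  let digest = md5(Buffer.concat(parts));
  if (revision >= 3) {
    // Re-hash the first keyLength bytes 50 times.
    for (let i = 0; i < 50; i++) digest = md5(digest.subarray(0, keyLength));
  }
  return digest.subarray(0, keyLength);
}
```

For revision 2, the U entry is then simply the padding string encrypted with RC4 under this key (Algorithm 4).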
With the computed encryption key, an encryption function (one that encrypts the actual data in the PDF file) can be created for use by the PDFWriter (the code that saves the PDF in binary format).
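Such an encryption function could look like the sketch below, which follows the RC4 path of Algorithm 1: every object gets its own key derived from the file-level key plus the object and generation numbers. The function names are hypothetical and this is not the PDFWriter integration from the pull request. The AES path additionally appends the bytes "sAlT" before hashing and prepends a random initialization vector; it is omitted here.

```javascript
const crypto = require('crypto');

function rc4(key, data) {
  const s = Array.from({ length: 256 }, (_, i) => i);
  let j = 0;
  for (let i = 0; i < 256; i++) {
    j = (j + s[i] + key[i % key.length]) & 0xff;
    [s[i], s[j]] = [s[j], s[i]];
  }
  const out = Buffer.alloc(data.length);
  let i = 0;
  j = 0;
  for (let k = 0; k < data.length; k++) {
    i = (i + 1) & 0xff;
    j = (j + s[i]) & 0xff;
    [s[i], s[j]] = [s[j], s[i]];
    out[k] = data[k] ^ s[(s[i] + s[j]) & 0xff];
  }
  return out;
}

// Derive the per-object key (RC4 variant of Algorithm 1).
function objectKey(encryptionKey, objectNumber, generationNumber) {
  const suffix = Buffer.from([
    objectNumber & 0xff, // low 3 bytes of the object number
    (objectNumber >> 8) & 0xff,
    (objectNumber >> 16) & 0xff,
    generationNumber & 0xff, // low 2 bytes of the generation number
    (generationNumber >> 8) & 0xff,
  ]);
  const digest = crypto
    .createHash('md5')
    .update(Buffer.concat([encryptionKey, suffix]))
    .digest();
  // Use the first (n + 5) bytes of the hash, capped at 16.
  return digest.subarray(0, Math.min(encryptionKey.length + 5, 16));
}

// Build an encryption function bound to one indirect object.
function createEncryptor(encryptionKey, objectNumber, generationNumber) {
  const key = objectKey(encryptionKey, objectNumber, generationNumber);
  return (data) => rc4(key, data);
}
```

A writer would create one such function per indirect object and run it over that object's strings and stream bytes just before serializing them.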
Permission Flag
Coming back to the /P entry in the Encryption dictionary: it is also known as the user access permissions. It is an unsigned 32-bit integer containing a set of flags that specify which access permissions are granted when a document is opened with the user password. Only bit positions 3, 4, 5, 6, 9, 10, 11 and 12 are meaningful.
In the example above, the /P entry has a value of -3896 in decimal format. Its low 16 bits in two’s complement are 1111000011001000. Of the meaningful bit positions (counted from 1 at the least significant bit), only bit 4 is set, so the user would be able to modify the file but not print it.
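The bit arithmetic can be checked with a short sketch. The flag names are my own labels for the bit positions in the specification's permissions table, and both helpers are hypothetical.

```javascript
// Bit positions are 1-indexed from the least significant bit.
const PERMISSION_BITS = {
  print: 3,
  modify: 4,
  copy: 5,
  annotate: 6,
  fillForms: 9,
  extractForAccessibility: 10,
  assemble: 11,
  printHighQuality: 12,
};

// Turn a /P value into a map of named booleans.
function decodePermissions(p) {
  const flags = {};
  for (const [name, bit] of Object.entries(PERMISSION_BITS)) {
    flags[name] = (p & (1 << (bit - 1))) !== 0;
  }
  return flags;
}

// Build a /P value from a list of allowed operations.
function encodePermissions(allowed) {
  // Start with every meaningful bit cleared: bits 1-2 are 0,
  // bits 7-8 and 13-32 are 1, which is -3904 as a signed integer.
  let p = 0xfffff0c0 | 0;
  for (const name of allowed) {
    p |= 1 << (PERMISSION_BITS[name] - 1);
  }
  return p;
}

const flags = decodePermissions(-3896);
// The low 16 bits of -3896 as a binary string.
const lowBits = (-3896 >>> 0).toString(2).slice(-16);
```

Going the other way, encodePermissions(['modify']) reproduces the -3896 from the example.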
With all the above steps in place, all that is left is to plug the code in and ensure that the encryption function runs at the right place, on the right data, in the PDFWriter of pdf-lib. Many thanks to the security module of pdfkit for the code on the encryption algorithm.
The pull request for the encryption feature in pdf-lib is linked below.
https://github.com/Hopding/pdf-lib/pull/917
Source: a-sg-techblog