Creating accessible PDFs from Word, Part 2

Transcript

[EMILY BAKER]
Hello and welcome. Today’s live training by the Center for Digital Accessibility is about creating accessible PDFs from Word documents, Part 2, Finishing Touches. And we’re gonna resolve that cliffhanger where we left off at the end of Part 1 when we created a Word doc, but did not find out what happens after we export it. So today’s session is being recorded and once the captions are corrected, we will share a link to the video.

My name is Emily Baker and I’m the Senior Digital Accessibility Specialist on the CDA team. We are here to help you strengthen the accessibility of your digital content, and I tend to work with non-web content, such as documents, captioning for videos, and so on. In this session, we’re gonna briefly recap some of the things that we learned in Part 1 about creating an accessible Word file, using the built-in accessibility checker as we go, and the things that the checker won’t remind us about. We’ll talk about options for exporting correctly to the PDF format.

We’ll check our PDF in Acrobat and another PDF accessibility checker, and we’ll apply some finishing touches in Acrobat, and then go back and check again and see how we did. So just briefly, here’s a quick reminder about the deceptively simple three-step process for creating accessible PDFs. Again, creating an accessible source file, in this case, Word, and using all the built-in accessibility functionality that exists, exporting it properly, and applying those finishing touches. It sounds really easy, but it is maybe not always as easy as that.

So we’re gonna move on to our demo. Wish me luck because I won’t just be flipping between Word and Acrobat, but I’ll also be showing you the PAC checker in a virtual Windows installation on my Mac. What could possibly go wrong?

And then also, a quick disclaimer, this process will be very helpful in creating a PDF with no major barriers. If you need to pass all the checkpoints in the PDF/UA specification, there will be more work in Acrobat than we’ll be able to cover today. But some of those PDF/UA checkpoints do not directly affect the accessibility or usability of your document. And for the purposes of this demo, I’ve done some additional fixing up in my document so that we can just focus on the higher priority issues.

So, this is a, maybe a slightly fictionalized example of how to do this. So if you’ll give me a brief moment, I will fix my screen share. Okay, is everybody seeing my Word document?

Excellent, thank you. So, an acknowledgement, again, the document I’m using here is a modified copy of a sample document that is used in some LinkedIn Learning courses on accessible PDFs taught by Chad Chelius. Highly recommended for a deeper dive on all of the techniques that we’re gonna talk about today and then some, including some of those fix ups that I’m actually gonna skip today.

I’ve also heard through the grapevine that Chad just finished recording updated versions of both of his courses, which will no doubt include all the latest functionality in Word, InDesign and Acrobat, so I’ll try to stay tuned and let you know when those are actually released in LinkedIn Learning. So, again, a quick recap, we’re gonna fast-forward to the end of our source file creation process.

We have our Word doc, which we created with accessibility in mind, and we’ve been checking as we go. So we can take a look at our accessibility checker just to see how we did.

On the editing ribbon, on the Review tab is where you’ll find the Check Accessibility command. It will launch our Accessibility Assistant in a little sidecar here. And it seems to think that we did a pretty good job.

Also, if we’re keeping it running while we work, sometimes our little accessibility icon will appear in our document if we create any new problems so that we can fix them in real time as we go. And then, again, as we mentioned last time, the Word accessibility checker does not tell you about including a title in the Document Properties dialog. So we’re just gonna look here real quick in the File menu, Properties. In this Summary, we just need to make sure that there is a title here in these properties.

The title is mandatory for accessibility and any additional metadata is optional. It might be nice for search engine optimization later, but only the title is mandatory for accessibility.
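For anyone who would rather confirm the title outside of Word, here is a minimal sketch. It assumes the file is a standard .docx package, which is a zip archive that stores the title in docProps/core.xml as a Dublin Core title element. The function name docx_title is made up for illustration, and this is not something shown in the session.

```python
import zipfile
import xml.etree.ElementTree as ET

# Dublin Core namespace used for the title element in docProps/core.xml
DC_NS = "{http://purl.org/dc/elements/1.1/}"


def docx_title(docx_file):
    """Return the title stored in a .docx file, or None if it is missing or blank.

    docx_file may be a path or a file-like object (hypothetical helper).
    """
    with zipfile.ZipFile(docx_file) as archive:
        if "docProps/core.xml" not in archive.namelist():
            return None
        root = ET.fromstring(archive.read("docProps/core.xml"))
    title = root.find(f"{DC_NS}title")
    if title is None or not (title.text or "").strip():
        return None
    return title.text.strip()
```

A result of None would mean the title still needs to be filled in under File, Properties, Summary before exporting.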

So it looks like we have a nice accessible Word file, and we have two options for transforming our Word doc into a reasonably accessible tagged and structured PDF file. Print to PDF is not one of those.

Instead, we’re gonna choose Save As. All right, is everybody seeing my entire messy desktop?

[PARTICIPANT ONE]
Yes.

[EMILY BAKER]
Okay, so, let’s see. I was in my Word file, Save As, and the dialog pops up. Here, we can choose several different file formats.

We’re going to PDF and making sure that ‘Best for electronic distribution and accessibility’ is checked here. I’m gonna not replace my document because it already exists, but that’s the routine there. A second method, if you already have Acrobat Pro installed on your computer, is to use the Acrobat PDFMaker utility, which automatically inserts an additional tab on your editing ribbon, and then Create PDF.

So this is another perfectly valid way to create an accessible PDF. All right, I’m gonna clumsily stop sharing again so that I can set up my next phase. All right, so our PDF document now is loaded up in my Acrobat Pro installation here. And we talk about a tagged PDF and what do we mean by that?

In a PDF document, every piece of meaningful content in that file must have a tag that explains what that element is doing. So we’ll take a look at what that kind of looks like conceptually. This is the tags panel.

We’ve opened up our Accessibility tags, so Microsoft Word exported our document. It’s nicely tagged. It seems pretty organized.

We’ll check on this in a minute. Your top level tag inside your document should always be a Document tag, and then you’ll start to see each of the pieces of content in your file. I wonder why it won’t show. Here we go.

So as you select each tag, it will highlight the little tidbit of content that it refers to and we can do what we call walking the tags tree. So as you move up and down, arrow up and down through the tags tree, you can see that each of these tags is capturing a tidbit of content. It is placed in the proper order, it makes logical sense. Here you can see, this is kind of why we don’t recommend having additional carriage returns or whatever you call them in your document.

And last time we talked a little bit about the content in the header and footer zones of your Word document. So this is a little moment where you can see how this plays out in the Word document.

‘Landon Hotel’ was just a little phrase up in the header portion of the Word doc. Anything in the header or footer zone is automatically artifacted, which means it is not considered to be a meaningful piece of content, therefore it doesn’t get tagged and it won’t be read out by assistive technology. So this is great for these repetitive, you know, the title of your document or page number, things that you don’t necessarily need to hear over and over and interrupt the flow of reading your document.

It’s gonna still be there visually in the PDF, but it will not be found or read out by assistive technology. So this is kind of how that plays out as a header and footer piece of content.

So you would walk your tags tree to make sure that all of the things in your document are tagged and that they all kind of operate in the proper order. So that’s basically how that works. Now we’ll see what the Adobe Checker thinks about our document.
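To make the idea of walking the tags tree a little more concrete, here is a rough sketch in Python. It is a simplified stand-in, not Acrobat's actual data model: the Tag class, the walk and empty_paragraphs helpers, and the sample content are all assumptions made up for illustration. It visits every tag top to bottom, the way you would with the arrow keys, and counts empty paragraph tags of the kind that extra carriage returns can leave behind.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Tuple


@dataclass
class Tag:
    """One node in a simplified tags tree (not Acrobat's real data model)."""
    kind: str                      # e.g. "Document", "H1", "P"
    text: str = ""                 # content the tag wraps, if any
    children: List["Tag"] = field(default_factory=list)


def walk(tag: Tag, depth: int = 0) -> Iterator[Tuple[int, str, str]]:
    """Yield (depth, kind, text) for every tag, parents before children."""
    yield depth, tag.kind, tag.text
    for child in tag.children:
        yield from walk(child, depth + 1)


def empty_paragraphs(root: Tag) -> int:
    """Count P tags with no content, such as those left by stray blank lines."""
    return sum(1 for _, kind, text in walk(root) if kind == "P" and not text.strip())


# Hypothetical sample content. Header and footer text is artifacted, so it
# never shows up in the tree at all.
sample = Tag("Document", children=[
    Tag("H1", "Sample heading"),
    Tag("P", "First paragraph."),
    Tag("P", ""),                  # stray blank line in the source file
    Tag("P", "Second paragraph."),
])

for depth, kind, text in walk(sample):
    print("  " * depth + f"<{kind}> {text}")
```

Deciding whether that order makes logical sense is still a judgment call for the document author; the sketch only lists the tags in the order they would be encountered.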

The check for this is gonna appear under All Tools, Prepare for Accessibility, and then Check for Accessibility. And it’s gonna run a quick check on your file and Acrobat seems pretty happy with our document.

It’s passing all of these checkpoints. We don’t really have any form fields, so that’s fine.

And it’s flagging two issues. It’s not really calling them problems or errors because they both need a manual check. There’s nothing about the automated checker that can figure this out for you.

Checking your logical reading order has to be done by a person. The document author, you’re the one walking your tags tree that can decide whether or not everything’s in the right order. And, the color contrast, again, needs a manual check and can’t really be fixed inside Acrobat anyway.

So Adobe seems to think that our PDF file is in pretty good shape. And you’d never know that it still needs a few finishing touches. But let’s take a look with the PAC checker. This is a Windows-only download.

I’m gonna stick a link to that in the chat. And if you’ll give me another moment, I will clumsily adjust my screen share again.

[No audio]
…about bazillion checkpoints in the PDF/UA spec. And you can see here that we got some intimidating looking red Xs. And that means that the PAC checker has found some issues in our supposedly pretty clean PDF. The PAC checker will organize its findings under PDF/UA or WCAG Standards and you can check under either category.

The issues it finds will be the same, just organized differently under here. There’s also a Quality tab that’s trying to address some best practices in PDF files.

It kind of feels a little undercooked so far. It’s a relatively new feature in the PAC checker and it’ll probably evolve over time, so I don’t try to worry very much about what’s going on in there. So if we wanna get into the details of what’s wrong with our file, we can just select Results in Detail and it’ll pop open some more details about our file. These are some typical PDF errors that we will be able to address in Acrobat even if Acrobat doesn’t actually tell us about them.

Under Fonts, here’s an issue, Font Not Embedded. This is a pretty typical error for the Save As from a Word file. The Acrobat PDFMaker tends to embed fonts. But, again, like I said, we can fix all this.

It has an issue with Logical Structure, Alternative Description for Annotations. We’re gonna go back to our Acrobat and I’ll explain what that means. These are not Alternative descriptions for images. We definitely did that properly in our document.

This has to do with links. The PDF/UA identifier is not set. And all of these same issues just pop up, differently organized, under the WCAG check.

So these are the things that are kind of wrong with our file according to the PAC checker. We’ll go back and address those and if we have a few extra minutes at the end, we’ll look at some even more useful features in the PAC checker. For a quick and dirty little web-based assessment, we can take a look at our file using this axesCheck web-based widget.

So if you’re comfortable uploading your files to a web-based widget, then this might be a nice way to quickly do this. You got to drag and drop it, agree to the acceptable use policy or whatever. It’s gonna run this little check.

By the way, this is using the same basic engine as the PAC checker, but it’s going to give us a high-level report and it’s not gonna drill down into any of those details. So we know about the font issue. We knew it had some issues in the structure tree and the alt descriptions and the metadata. So it’s basically gonna give you a high-level quick check of where some issues in your PDF file might be.

I don’t think I’ve ever attempted so much screen sharing juggling before. So thank you for your patience. Now, we’re gonna go back to Acrobat.

Okay, so now that all of our dirty little secrets have been exposed by our robust PDF checkers, we’re gonna go back and perform those finishing touches. Again, these are common PDF issues that cannot be fixed in the original Word file. So we are preparing our digital file for accessibility, so where do you think we’ll find the tools to address these issues? Well, no, you could be forgiven for assuming that we’ll find what we need under Prepare for Accessibility Tools.

Trick question, because Adobe has packed this away under Print Production, Use Print Production, probably for historical reasons. It doesn’t feel logical to me, but that’s where it is. So All Tools, Use Print Production, and then PreFlight. Oh, boy.

Can everybody see the PreFlight pop up?

[PARTICIPANT ONE]
Yes.

[EMILY BAKER]
Okay, thank goodness. So in the PreFlight dialog, we’re gonna make sure that PDF Standards is selected from this dropdown. And we’re gonna make sure that we’re looking at things under the wrench icon, so we’re gonna select these single fix-ups. In the Documents section, so this is divided into several different categories.

We are going to find, wow, where did my documents stuff go? Ah, Embed Fonts.

Okay, so, ‘even if text is invisible,’ that doesn’t make any sense to me, so we’re gonna select Embed Fonts and then click the Fix button and we’re gonna save our file. That’s really great. We’re gonna go back and, again, under the Documents Section, we’re gonna choose Add Unique ID to Note Elements.

Right here, it’s number one in the tagging structure and we’re just gonna fix it, save our document again. We’re gonna replace the old file. That’s really great.

So each of these is addressing one of those errors that we found in the PAC checker. Under Interactive Elements and Properties, we’re gonna choose Create Content Entry for Link Annotations.

That’s big. That’s gonna fix all of those alternative text mistakes.

These are not super intuitive. You just kind of have to work from a list of the errors in order to do this. The PDF checker in Acrobat doesn’t find it for you, so you have to discover that they exist and then come back and fix it. And then finally, under Document and Metadata, we are gonna pretend that we have a high degree of confidence in our ability to produce a standards-compliant file and we’re gonna set the PDF/UA entry.
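Since the fix-ups have to be worked from a list of errors, one way to keep track is a small lookup table. The sketch below pairs the three PAC findings named in this session with the PreFlight fix-ups used for them. The names are as spoken here and may not match the labels in a given Acrobat or PAC version exactly, and the plan_fixups helper is made up for illustration. Add Unique ID to Note Elements is left out because the session does not pair it with a specific named finding.

```python
# Pairings as described in this session: PAC finding -> (PreFlight section, fix-up).
# Names are as spoken and may differ from the labels in your software version.
FIXUPS = {
    "Font Not Embedded": ("Documents", "Embed Fonts"),
    "Alternative Description for Annotations": (
        "Interactive Elements and Properties",
        "Create content entry for link annotations",
    ),
    "PDF/UA identifier is not set": (
        "Document and Metadata",
        "Set the PDF/UA entry",
    ),
}


def plan_fixups(findings):
    """Split findings into fix-ups to run and findings with no known fix-up."""
    planned, unmatched = [], []
    for finding in findings:
        fixup = FIXUPS.get(finding)
        if fixup is None:
            unmatched.append(finding)
        elif fixup not in planned:      # avoid listing the same fix-up twice
            planned.append(fixup)
    return planned, unmatched


planned, unmatched = plan_fixups(["Font Not Embedded", "PDF/UA identifier is not set"])
for section, fixup in planned:
    print(f"{section}: {fixup}")
```

Anything that lands in the unmatched list is a finding you would need to research separately.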

So here we are. We have saved all of these fixes in our document. We’re gonna double-check on that title entry. We set it properly in Microsoft Word, right?

So the title exists and we also wanna make sure that document title is selected to show in the initial view. And that should do it for our PDF.

I’m gonna save this document. And once again, I will flip-flop my screen share back to the PAC checker. And we’re gonna drag and drop our freshly-fixed file right into the tool. Oh, boy, did I not fix the fonts?

And we’ll look at the results in detail. Okay, we actually did that.

(Emily laughs)

So we’ll pretend that it kind of worked out. But as you can see, the other errors that it found before, like the alternative descriptions for annotations errors, are gone. All of those other weird errors have now been fixed, thanks to the Acrobat PreFlight.

Most of the time, if you can take these steps, your document will be discoverable, readable, and navigable for assistive technology users. And the more stringent checkers like the PAC checker will likely turn up additional issues as I mentioned and those are usually gonna be the kinds of things that violate the PDF spec, but don’t present barriers for assistive technology users. Fair warning, the rabbit hole of PDF specifications only gets deeper and weirder from here, so follow through with some of those leads at your peril. While we are here, if we have another minute or two, we could take a look at a couple of other nice features in the PAC checker.

So we’ve looked at our document. It’s in pretty good shape. Now maybe we wanna take a peek at the Logical Structure.

So there’s this little, it looks like a teeny tiny little org chart icon, and you can take a look at the structure elements. So these are meaningful pieces of content that have been tagged in your file. We can work our way down, kind of like walking the tags tree in Acrobat. So you get that same kind of bird’s eye look at the structure and whether or not everything seems to be in the proper order.

So this is one useful way to kind of have an abstract look at your content. Pretty, pretty useful.

And, again, if you find something out of order, you can go back into Acrobat and fix it. The other really nice functionality here is this all-seeing eyeball icon.

This is the screen reader preview. If you open this it will give you, oh, good, yes, I can expand it. It’s gonna give you a visualization of the, Oh my gosh. Ouch.

So this is gonna give you a visualization of the way that your content will be presented to a screen reader. And it labels everything according to what kind of content it is.

So we see that there’s the big document tag here that sort of encompasses everything, our very first figure tag, and we can see that there’s alt text assigned to it, our headers, our paragraph text, all these other content types are sort of explained here. Our table of contents, we’re gonna skip that for today. But, again, you can get another view of how your document is tagged, how it’s organized, how all of these pieces of content relate to one another.

This visualization is particularly nice, I think. This is an unordered list, right? So we have our list tag, underneath which are several list items. Each list item contains a label and the list item body.
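The list shape just described can also be written down as a quick structural check. This is only a sketch: it uses the standard PDF tag names L, LI, Lbl, and LBody, but the nested-dictionary layout and the list_problems function are assumptions made up for illustration, and real files can legitimately vary from this exact shape.

```python
def list_problems(tag):
    """Return a list of problems for a list tag given as nested dictionaries.

    Expected shape, as described above: L contains LI items, and each LI
    contains an Lbl followed by an LBody.
    """
    if tag["kind"] != "L":
        return [f"expected L, found {tag['kind']}"]
    problems = []
    for number, item in enumerate(tag.get("children", []), start=1):
        if item["kind"] != "LI":
            problems.append(f"child {number}: expected LI, found {item['kind']}")
            continue
        kinds = [child["kind"] for child in item.get("children", [])]
        if kinds != ["Lbl", "LBody"]:
            problems.append(f"item {number}: expected Lbl then LBody, found {kinds}")
    return problems


# Hypothetical two-item list: the first item is well formed, the second has no label.
example = {"kind": "L", "children": [
    {"kind": "LI", "children": [{"kind": "Lbl"}, {"kind": "LBody"}]},
    {"kind": "LI", "children": [{"kind": "LBody"}]},
]}

for problem in list_problems(example):
    print(problem)
```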

So this just shows you at a glance what a properly tagged unordered list is gonna look like structurally. And it’ll do the same thing for a table. I’m trying to scroll, but not too fast.

You can see the table tag, the rows, the headers, the table data, all tagged, and just sort of laying out those relationships visually so that you can see how a screen reader will encounter all of these items. All righty, I’m gonna stop sharing. So there we are. We created an accessible Word file.

We exported it properly to the PDF format. We checked in Acrobat and we checked using this more stringent PAC checker.

We went back and we applied those infamous finishing touches, and we ended up with a very good result. I don’t know why it’s lying about the font embedding, but there it is.