How can I tell SharePoint about PDF meta-information (author, subject, keywords)?

Chris Shearer Cooper 0 Reputation points
2024-11-04T23:11:48.8833333+00:00

I have a bunch of PDF files that have information in their 'Author', 'Subject', and 'Keywords' fields (I can see the values in Adobe Reader).

When I'm in SharePoint:

  • I can search for text in the Author field, and it will return the correct list of matching files, but there doesn't seem to be any way in SharePoint to view the Author field of a PDF. (BTW, same thing for the 'Title' field)
  • Searching for text in the Subject or Keywords fields doesn't work.

Is there a way (a plugin, maybe?) to get OneDrive/SharePoint to search all the fields in a PDF, and to show those fields?

3 answers

  1. Ling Zhou_MSFT 18,100 Reputation points Microsoft Vendor
    2024-11-05T02:19:30.3166667+00:00

    Hi @Chris Shearer Cooper,

    Thank you for posting in this community.

    I understand that you want to search the document properties of PDF files. I am sorry to say that it is very difficult for SharePoint to extract the document properties of a PDF, since the PDF format is not developed by Microsoft. There is also no related plugin that supports this feature.

    As a workaround, you can manually create columns for these PDF document properties and populate them.

    SharePoint automatically creates crawled and managed properties for columns and crawls their values, which makes them searchable. Searching the column values instead of the PDF document properties can therefore give you the search results you want.

    Note: After you have created the columns and populated them with values, you need to wait a while for SharePoint to finish crawling them before you can search for them. Crawling is a timed task.
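
    Once the columns have been crawled, a search can target them by managed property. The sketch below only builds a KQL query string; it does not call SharePoint. The name `PdfSubjectOWSTEXT` is an assumption: auto-generated managed properties for text site columns typically follow the pattern column name plus `OWSTEXT`, but the actual name should be checked in the search schema.

```python
def kql_restriction(managed_property: str, value: str) -> str:
    """Build one KQL property restriction of the form Property:"phrase"."""
    # KQL phrases are delimited by double quotes, so this sketch simply
    # replaces any embedded ones with spaces instead of trying to escape them.
    cleaned = value.replace('"', " ").strip()
    return f'{managed_property}:"{cleaned}"'


def pdf_metadata_query(filters: dict) -> str:
    """Combine restrictions with AND and limit the results to PDF files."""
    parts = ["FileType:pdf"]
    parts += [kql_restriction(prop, text) for prop, text in filters.items()]
    return " AND ".join(parts)


# "PdfSubjectOWSTEXT" is a hypothetical managed property name.
query = pdf_metadata_query({"PdfSubjectOWSTEXT": "annual report"})
print(query)  # FileType:pdf AND PdfSubjectOWSTEXT:"annual report"
```

    The resulting string can be pasted into the SharePoint search box or passed to whichever search API you use.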

    Personally, I think it is important for SharePoint to be able to extract the document properties of PDF files, because PDF is becoming more and more widely used. I suggest submitting this as a feature request on the SharePoint Feedback portal. I will vote for it.


    If the answer is helpful, please click "Accept Answer" and kindly upvote it. If you have extra questions about this answer, please click "Comment".

    Note: Please follow the steps in our documentation to enable e-mail notifications if you want to receive the related email notification for this thread.


  2. Paul de Jong 721 Reputation points
    2024-11-05T14:58:20.97+00:00

    Out of the box, there is no property promotion for PDF files. Properties such as Keywords, Created, Modified, Author, and Producer are not extracted into SharePoint columns.

    If you are a developer and have the time, you can build a solution that extracts the properties from PDF files and writes the values into SharePoint columns.
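
    As a rough sketch of the second half of such a solution, the code below maps already-extracted PDF properties to column values and builds (but does not send) a Microsoft Graph request that updates a list item's fields. The assumptions: the input dictionary comes from a PDF library of your choice and uses the PDF Info dictionary keys (`/Author`, `/Subject`, `/Keywords`); the column names `PdfAuthor`, `PdfSubject`, and `PdfKeywords` are hypothetical; and the site ID, list ID, item ID, and access token are placeholders you would obtain yourself.

```python
import json
import urllib.request

GRAPH_ROOT = "https://graph.microsoft.com/v1.0"

# Hypothetical mapping from PDF Info dictionary keys to SharePoint column names.
COLUMN_FOR_PDF_KEY = {
    "/Author": "PdfAuthor",
    "/Subject": "PdfSubject",
    "/Keywords": "PdfKeywords",
}


def to_column_values(pdf_info: dict) -> dict:
    """Keep only mapped, non-empty properties, renamed to their column names."""
    values = {}
    for pdf_key, column in COLUMN_FOR_PDF_KEY.items():
        text = (pdf_info.get(pdf_key) or "").strip()
        if text:
            values[column] = text
    return values


def build_update_request(site_id, list_id, item_id, pdf_info, token):
    """Build a PATCH request against the list item's 'fields' resource."""
    url = f"{GRAPH_ROOT}/sites/{site_id}/lists/{list_id}/items/{item_id}/fields"
    body = json.dumps(to_column_values(pdf_info)).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="PATCH",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


# Placeholder IDs and sample metadata, for illustration only.
request = build_update_request(
    "SITE_ID", "LIST_ID", "42",
    {"/Author": "Jane Doe", "/Subject": " ", "/Keywords": "budget; 2024"},
    token="ACCESS_TOKEN",
)
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
# urllib.request.urlopen(request) would send it; omitted in this sketch.
```

    Empty properties are skipped so that a blank PDF field does not overwrite a value someone has already entered in the column.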

    There are apps available that can extract PDF properties such as Subject and Keywords into SharePoint columns. These columns can then be used for filtering, grouping, ordering, and searching.

    Some apps work interactively (for example, during upload to SharePoint), whereas others extract properties in bulk from PDF files that are already in SharePoint.

  3. Ling Zhou_MSFT 18,100 Reputation points Microsoft Vendor
    2024-11-07T07:08:35.1266667+00:00

    Hi @Chris Shearer Cooper,

    I apologize for the late reply; it took me a while to verify your response.

    Regarding your first point, that SharePoint already extracts the Title and Author fields from the PDF:

    I suspect that SharePoint simply scanned the content of the document and ran the search against that content. For example, I entered some text in the body of a PDF file, and SharePoint was still able to find the file when I searched for that text.

    As I understand it, the PDF metadata fields (Author, Subject, Keywords) are not necessarily part of the content of the PDF file; they have to be set manually. SharePoint cannot extract these values.

    More about the "Add Columns" feature:

    1. If I add a column in a folder, does that column appear for all users who are viewing that same folder? Yes, SharePoint has no way to hide certain columns based on user.
    2. If I add a column in a folder, does that column appear for all sub-folders as well? Yes.
    3. Are there applications or APIs that allow setting of column values that don't require manually entering values for each file? For the time being, SharePoint has no automated way to extract the document properties of a PDF and populate the columns. I suggest using a third-party tool to extract the PDF document properties into an Excel sheet, and then populating your document library from the values in that sheet.
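
    To illustrate the spreadsheet step in item 3, here is a minimal sketch that writes already-extracted PDF properties to CSV text, which Excel can open. It assumes the extraction itself has been done by a separate tool; the column names and the sample rows are hypothetical.

```python
import csv
import io

# Hypothetical column headers; match them to the columns in your library.
COLUMNS = ["FileName", "PdfAuthor", "PdfSubject", "PdfKeywords"]


def metadata_to_csv(rows: list) -> str:
    """Return CSV text with one header line and one line per PDF file."""
    buffer = io.StringIO(newline="")
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS)
    writer.writeheader()
    for row in rows:
        # Missing properties become empty cells instead of raising an error.
        writer.writerow({name: row.get(name, "") for name in COLUMNS})
    return buffer.getvalue()


# Sample data for illustration only.
csv_text = metadata_to_csv([
    {"FileName": "report.pdf", "PdfAuthor": "Jane Doe", "PdfKeywords": "budget, 2024"},
    {"FileName": "notes.pdf", "PdfSubject": "Meeting notes"},
])
print(csv_text)
```

    Keeping the file name in the first column makes it easier to match each row to the corresponding file in the document library.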

    I completely understand the situation and your feelings, but extracting PDF file properties currently exceeds SharePoint's capabilities. We are always looking for ways to improve SharePoint, and your feedback and suggestions are very valuable. Staff collect feedback from the Feedback Forum on a regular basis, and SharePoint will consider implementing this feature in the future if more and more people support it. This keeps SharePoint moving forward. All of your feedback is appreciated!

