Microsoft Graph connectors SDK contracts connector crawler API and models

The Microsoft Graph connectors SDK contracts define the connector crawler API, which the platform calls during a crawl, and the models that this API uses for its requests and responses.

Connector crawler API

| Method | Parameters | Return type | Description |
|---|---|---|---|
| GetCrawlStream | GetCrawlStreamRequest | Stream of CrawlStreamBit | Reads all items from the data source and returns them to the platform. Called during full and periodic full crawls. |
| GetIncrementalCrawlStream | GetIncrementalCrawlStreamRequest | Stream of IncrementalCrawlStreamBit | Optional. Reads only the items that changed since the last incremental crawl. Called during incremental crawls. |
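Both methods return a stream with one result per item. The following Python sketch models GetCrawlStream as a generator. The dataclasses only mirror the models in this article and are hypothetical stand-ins for the SDK's generated types; the OperationStatus fields `success` and `error` are invented for the example, and `fetch_page` is an assumed helper that reads one page from the data source.

```python
from dataclasses import dataclass
from typing import Callable, Iterator, List, Optional


# Hypothetical stand-ins that mirror the models in this article;
# they are not the SDK's generated gRPC classes.
@dataclass
class CrawlCheckpoint:
    pagenumber: int = 0
    batchSize: int = 1  # each item is streamed individually
    customMarkerData: str = ""


@dataclass
class OperationStatus:
    success: bool = True  # assumed field names, for illustration only
    error: Optional[str] = None


@dataclass
class CrawlStreamBit:
    status: OperationStatus
    crawlItem: Optional[dict]
    crawlProgressMarker: CrawlCheckpoint


def get_crawl_stream(
    start: Optional[CrawlCheckpoint],
    fetch_page: Callable[[int], List[dict]],
) -> Iterator[CrawlStreamBit]:
    """Yield one CrawlStreamBit per item in the data source.

    `fetch_page(n)` is a hypothetical helper that returns page n of items,
    or an empty list when no pages are left.
    """
    page = start.pagenumber if start else 0  # restart at the checkpoint's page
    while True:
        try:
            items = fetch_page(page)
        except Exception as exc:
            # Report the failure through the status instead of raising.
            yield CrawlStreamBit(OperationStatus(False, str(exc)), None,
                                 CrawlCheckpoint(pagenumber=page))
            return
        if not items:
            return
        for item in items:
            yield CrawlStreamBit(
                OperationStatus(),
                item,
                CrawlCheckpoint(pagenumber=page, customMarkerData=item["id"]),
            )
        page += 1
```

Yielding one bit per item matches the batchSize of 1 in CrawlCheckpoint, and each bit carries its own checkpoint so the platform always knows how far the crawl got.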

Connector crawler models

The following are the connector crawler models.

GetCrawlStreamRequest

Request model for getting items during a full or periodic full crawl.

| Property | Type | Description |
|---|---|---|
| customConfiguration | CustomConfiguration | Provides configuration data for the connector. |
| authenticationData | AuthenticationData | Holds the data source access URL and the credentials to access it. |
| crawlProgressMarker | CrawlCheckpoint | Identifies the last item processed in the previous crawl. The connector returns a checkpoint with each item, and the platform sends it back so that the crawl can resume if the platform crashes during the crawl. |
| schema | DataSourceSchema | Shows the schema of the connection. This property can also be used to set the value. |

CrawlStreamBit

Response model that contains a crawled item, a status indicating success or failure, and the checkpoint for that item. It is returned during full and periodic full crawls.

| Property | Type | Description |
|---|---|---|
| status | OperationStatus | Shows the status of the operation and error details. |
| crawlItem | CrawlItem | Shows a single item crawled from the data source. |
| crawlProgressMarker | CrawlCheckpoint | Identifies the item crawled from the data source. |

GetIncrementalCrawlStreamRequest

Request model for getting items during an incremental crawl.

| Property | Type | Description |
|---|---|---|
| customConfiguration | CustomConfiguration | Provides configuration data for the connector. |
| authenticationData | AuthenticationData | Holds the data source access URL and the credentials to access it. |
| crawlProgressMarker | CrawlCheckpoint | Identifies the last item processed in the previous crawl. The connector returns a checkpoint with each item, and the platform sends it back so that the crawl can resume if the platform crashes during the crawl. |
| schema | DataSourceSchema | Shows the schema of the connection. This property can also be used to set the value. |
| previousCrawlStartTimeInUtc | Timestamp | Shows the start time of the previous crawl in UTC. Use this value in the first incremental crawl; subsequent calls should use the checkpoint value. |
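The rule for previousCrawlStartTimeInUtc can be sketched as follows: fall back to the timestamp only while no checkpoint marker exists, and otherwise resume from the checkpoint. The sketch assumes, hypothetically, that the connector stores an ISO 8601 modification time in customMarkerData; the SDK does not prescribe that format, and the CrawlCheckpoint dataclass is a stand-in for the real model.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class CrawlCheckpoint:  # hypothetical stand-in for the SDK model
    pagenumber: int = 0
    batchSize: int = 1
    customMarkerData: str = ""  # assumed to hold an ISO 8601 timestamp


def changes_since(previous_crawl_start_utc: datetime,
                  checkpoint: Optional[CrawlCheckpoint]) -> datetime:
    """Return the time from which to read incremental changes.

    First incremental crawl: there is no marker yet, so use
    previousCrawlStartTimeInUtc. Later crawls: use the timestamp the
    connector saved in customMarkerData.
    """
    if checkpoint is not None and checkpoint.customMarkerData:
        return datetime.fromisoformat(checkpoint.customMarkerData)
    return previous_crawl_start_utc
```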

IncrementalCrawlStreamBit

Response model that contains a crawled item, a status indicating success or failure, and the checkpoint for that item. It is returned during incremental crawls.

| Property | Type | Description |
|---|---|---|
| status | OperationStatus | Shows the status of the operation and error details. |
| crawlItem | IncrementalCrawlItem | Shows a single item crawled from the data source during an incremental crawl. |
| crawlProgressMarker | CrawlCheckpoint | Identifies the last item crawled from the data source during the last incremental crawl. |

ItemType enumeration members for CrawlItem

Enumeration fields for crawl items.

| Member | Value | Description |
|---|---|---|
| ContentItem | 0 | Data item with content to ingest. For example: the content of a website. |
| LinkItem | 1 | Link to a content item that will be used in subsequent crawls. For example: a link to a website or a folder. |

CrawlItem

Represents an entity in the data source, such as a file, a folder, or a record in a table. The maximum allowed size is 4 MB.

| Property | Type | Description |
|---|---|---|
| itemId | string | Shows the unique ID that represents the item in the data source. |
| contentItem | ContentItem | Shows a data item with content to ingest. For example: the content of a website. |
| linkItem | LinkItem | Link to a content item that will be used in subsequent crawls. For example: a link to a website or a folder. |
| itemType | ItemType | Shows the type of item being sent. Set either contentItem or linkItem, and set this field to the matching type. |

Note

  • The properties linkItem and contentItem are mutually exclusive.
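The rules for CrawlItem (one of contentItem or linkItem, a matching itemType, and the 4 MB limit) can be checked before an item is streamed. This is a hedged sketch with hypothetical stand-in types; it assumes the item's serialized size is already known and treats 4 MB as 4 × 1024 × 1024 bytes, which the article does not specify.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Optional


class ItemType(IntEnum):
    ContentItem = 0
    LinkItem = 1


# 4 MB limit from this article; interpreting MB as 1024 * 1024 bytes is an assumption.
MAX_ITEM_BYTES = 4 * 1024 * 1024


@dataclass
class CrawlItem:  # hypothetical stand-in for the SDK model
    itemId: str
    itemType: ItemType
    contentItem: Optional[dict] = None
    linkItem: Optional[dict] = None


def validate(item: CrawlItem, serialized_size: int) -> None:
    """Raise ValueError if the item breaks the CrawlItem rules."""
    # contentItem and linkItem are mutually exclusive, and one must be present.
    if (item.contentItem is None) == (item.linkItem is None):
        raise ValueError("set exactly one of contentItem or linkItem")
    expected = ItemType.ContentItem if item.contentItem is not None else ItemType.LinkItem
    if item.itemType != expected:
        raise ValueError(f"itemType should be {expected.name}")
    if serialized_size > MAX_ITEM_BYTES:
        raise ValueError("item exceeds the 4 MB limit")
```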

ItemType enumeration members for IncrementalCrawlItem

Enumeration fields for incremental crawl items.

| Member | Value | Description |
|---|---|---|
| ContentItem | 0 | Data item with content to ingest. For example: the content of a website. |
| LinkItem | 1 | Link to a content item that will be used in subsequent crawls. For example: a link to a website or a folder. |
| DeletedItem | 2 | Item that was deleted from the data source and should be deleted from the index. |

IncrementalCrawlItem

Represents an entity in the data source. For example: a file, a folder or a record in a table.

| Property | Type | Description |
|---|---|---|
| itemId | string | Shows the unique ID that represents the item in the data source. |
| contentItem | ContentItem | Shows a data item with content to ingest. For example: the content of a website. |
| linkItem | LinkItem | Link to a content item that will be used in subsequent crawls. For example: a link to a website or a folder. |
| deletedItem | DeletedItem | Item that was deleted from the data source and should be removed from the index. If deletedItem is set, contentItem and linkItem can't be set. |
| itemType | ItemType | Shows the type of item being sent. Set exactly one of contentItem, linkItem, or deletedItem, and set this field to the matching type. |

Note

  • The properties linkItem, contentItem, and deletedItem are mutually exclusive.
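During an incremental crawl each change in the data source becomes one IncrementalCrawlItem with exactly one payload set. The sketch below maps a hypothetical change record of the form `{"id", "kind", "data"}` to such an item; the record shape and the `"deleted"`/`"folder"` kinds are assumptions, and the dataclass stands in for the SDK model.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Optional


class ItemType(IntEnum):
    ContentItem = 0
    LinkItem = 1
    DeletedItem = 2


@dataclass
class IncrementalCrawlItem:  # hypothetical stand-in for the SDK model
    itemId: str
    itemType: ItemType
    contentItem: Optional[dict] = None
    linkItem: Optional[dict] = None
    deletedItem: Optional[dict] = None


def to_incremental_item(change: dict) -> IncrementalCrawlItem:
    """Map a hypothetical change record to an item with one payload set."""
    kind = change["kind"]
    if kind == "deleted":
        # This article lists no DeletedItem properties, so an empty placeholder is used.
        return IncrementalCrawlItem(change["id"], ItemType.DeletedItem, deletedItem={})
    if kind == "folder":
        return IncrementalCrawlItem(change["id"], ItemType.LinkItem, linkItem=change["data"])
    return IncrementalCrawlItem(change["id"], ItemType.ContentItem, contentItem=change["data"])
```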

ContentItem

Item that holds the content of the data source entity to be ingested. For example: the content of a website.

| Property | Type | Description |
|---|---|---|
| propertyValues | SourcePropertyValueMap | Holds the key and value of each property in the item. |
| accessList | AccessControlList | Restricts access to the item to specific users or groups. |
| content | Content | Shows the content property of the item, which can be used when displaying search results. |

LinkItem

Item that acts as a link to another item. Link items are sent back to the connector to be recrawled. For example, when a folder is crawled, its files are content items and its subfolders are link items.

| Property | Type | Description |
|---|---|---|
| metadata | map<string, GenericType> | Holds the metadata that the connector needs to recrawl the item. |
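The folder example can be sketched as a traversal that emits one level at a time: files become content items, and subfolders become link items whose metadata is enough to crawl them later. The in-memory tree, the `"path"` metadata key, and the plain dictionaries (used instead of GenericType values) are all hypothetical simplifications.

```python
from typing import Dict, Iterator, List, Optional, Union

# Hypothetical in-memory data source: a folder maps names to file text or to a subfolder.
Tree = Dict[str, Union[str, "Tree"]]


def crawl_folder(root: Tree, link_metadata: Optional[dict] = None) -> Iterator[dict]:
    """Yield the items of one folder level.

    Files become content items; subfolders become link items whose metadata
    (an assumed {"path": [...]} entry) lets the connector recrawl them later.
    """
    path: List[str] = link_metadata["path"] if link_metadata else []
    folder = root
    for name in path:  # walk down to the folder named by the link item
        folder = folder[name]
    for name, child in folder.items():
        item_path = path + [name]
        if isinstance(child, dict):
            yield {"itemId": "/".join(item_path), "itemType": "LinkItem",
                   "linkItem": {"metadata": {"path": item_path}}}
        else:
            yield {"itemId": "/".join(item_path), "itemType": "ContentItem",
                   "contentItem": {"content": child}}
```

Recrawling a link item with its own metadata continues the traversal one level deeper, so the connector never has to hold the whole tree in a single response.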

DeletedItem

Represents an item that was deleted from the data source and has to be removed from the index.

AccessControlList

Restricts which users and groups can see the item in search results.

| Property | Type | Description |
|---|---|---|
| Entries | repeated AccessControlEntry | Shows the array or collection of access control list entries. |

AclAccessType enumeration members

Enumeration members of the access control list type.

| Member | Value | Description |
|---|---|---|
| None | 0 | Indicates the default value: deny. |
| Grant | 1 | The entry is for users or groups with access to the item. |
| Deny | 2 | The entry is for users or groups with no access to the item. It overrides a grant for any user or group. |

AccessControlEntry

Holds individual access control entries.

| Property | Type | Description |
|---|---|---|
| accessType | AclAccessType | Shows the access type of the entry: grant or deny. |
| principal | Principal | Represents a group or user with defined access. |
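The AclAccessType descriptions imply a simple evaluation order: without a matching grant there is no access, and any matching deny wins over a grant. The sketch below illustrates that order only; the platform, not the connector, enforces ACLs at query time. Principals are reduced to hypothetical ID strings, and the enumeration member None is renamed because `None` is a Python keyword.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import List, Set


class AclAccessType(IntEnum):
    None_ = 0  # "None" in the SDK; renamed because None is a Python keyword
    Grant = 1
    Deny = 2


@dataclass
class AccessControlEntry:  # hypothetical stand-in; principal reduced to an ID string
    accessType: AclAccessType
    principal: str


def can_access(entries: List[AccessControlEntry], user_ids: Set[str]) -> bool:
    """user_ids holds the user's own ID plus the IDs of the groups they belong to."""
    matching = [e for e in entries if e.principal in user_ids]
    if any(e.accessType == AclAccessType.Deny for e in matching):
        return False  # a deny overrides any grant
    return any(e.accessType == AclAccessType.Grant for e in matching)
```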

PrincipalType enumeration members

Enumeration members of the principal type.

| Member | Value | Description |
|---|---|---|
| PT_None | 0 | Indicates the default value: user. |
| User | 1 | Type of user. |
| Group | 2 | Type of group. |
| Everyone | 3 | Special group to grant access to everyone. |
| EveryoneExceptGuests | 4 | Special group to grant access to everyone except guests. |

IdentitySource enumeration members

Enumeration members of identity source.

| Member | Value | Description |
|---|---|---|
| IS_None | 0 | Indicates the default value: Microsoft Entra ID. |
| AzureActiveDirectory | 1 | The source of identity is Microsoft Entra ID. |

IdentityType enumeration members

Enumeration members of identity type.

| Member | Value | Description |
|---|---|---|
| IT_None | 0 | Indicates the default value: AadId. |
| ActiveDirectorySId | 1 | Security identifier (SID) provided by on-premises Active Directory (AD). |
| UserPrincipalName | 2 | User principal name (UPN). |
| AadId | 3 | Identifier of the user or group in Microsoft Entra ID (formerly Azure AD). |

Principal

Structure to store attributes of the principal (user/group).

| Property | Type | Description |
|---|---|---|
| type | PrincipalType | Type of principal. |
| value | string | Principal value, such as the SID, UPN, or Microsoft Entra ID identifier, depending on identityType. |
| identitySource | IdentitySource | The source of identity. |
| identityType | IdentityType | Identity representation type. |
| identitySourceProperties | map<string, string> | Metadata about the identity source. |
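Each of the three principal enumerations reserves the value 0 for a default that stands for another member: PT_None means User, IS_None means Microsoft Entra ID, and IT_None means AadId. A small helper can spell those defaults out before a principal is built. This is a hedged sketch; the dictionary representation of Principal is a hypothetical stand-in for the SDK model.

```python
from enum import IntEnum


class PrincipalType(IntEnum):
    PT_None = 0
    User = 1
    Group = 2
    Everyone = 3
    EveryoneExceptGuests = 4


class IdentitySource(IntEnum):
    IS_None = 0
    AzureActiveDirectory = 1


class IdentityType(IntEnum):
    IT_None = 0
    ActiveDirectorySId = 1
    UserPrincipalName = 2
    AadId = 3


def resolve_defaults(principal: dict) -> dict:
    """Return a copy of a principal dict with the 0-valued defaults made explicit."""
    out = dict(principal)
    if out.get("type", PrincipalType.PT_None) == PrincipalType.PT_None:
        out["type"] = PrincipalType.User
    if out.get("identitySource", IdentitySource.IS_None) == IdentitySource.IS_None:
        out["identitySource"] = IdentitySource.AzureActiveDirectory
    if out.get("identityType", IdentityType.IT_None) == IdentityType.IT_None:
        out["identityType"] = IdentityType.AadId
    return out
```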

SourcePropertyValueMap

Maps each property name of an item to its value in the data source.

| Property | Type | Description |
|---|---|---|
| values | map<string, GenericType> | Holds the properties of the item, keyed by property name. For example, a file has properties such as title and modifiedDate: the keys are title and modifiedDate, and the values are the file's title and its modification date. |

ContentType enumeration members

Enumeration members of content type.

| Member | Value | Description |
|---|---|---|
| None | 0 | Default value. |
| Text | 1 | Text content type. |
| Html | 2 | HTML content type. |

Content

Value of the content property of the item, used to render search results.

| Property | Type | Description |
|---|---|---|
| contentType | ContentType | Type of the content. |
| contentValue | string | Value of the content property. |

CrawlCheckpoint

Identifies the last item crawled. The platform saves the checkpoint, and if the crawl fails or crashes, it resumes from the checkpoint of the last successful item batch, which it sends in the GetCrawlStream request.

| Property | Type | Description |
|---|---|---|
| pagenumber | uint32 | Shows the page number to mark crawl progress. |
| batchSize | uint32 | Holds the number of items returned in every batch. Its value is always 1 because each item is streamed individually. |
| customMarkerData | string | Custom data needed to identify the last item crawled from the data source. |
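Resuming from a CrawlCheckpoint can be sketched as paging through the source from pagenumber and skipping items up to the one recorded in customMarkerData. The sketch assumes, hypothetically, that the data source returns items in a stable order and that the connector stores the last item's ID in customMarkerData; `fetch_page` is an assumed helper, and the dataclass stands in for the SDK model.

```python
from dataclasses import dataclass
from typing import Callable, Iterator, List, Optional, Tuple


@dataclass
class CrawlCheckpoint:  # hypothetical stand-in for the SDK model
    pagenumber: int = 0
    batchSize: int = 1
    customMarkerData: str = ""  # assumed: ID of the last item crawled


def resume(fetch_page: Callable[[int], List[str]],
           checkpoint: Optional[CrawlCheckpoint]) -> Iterator[Tuple[str, CrawlCheckpoint]]:
    """Yield (item_id, checkpoint) pairs, skipping items that were already crawled."""
    page = checkpoint.pagenumber if checkpoint else 0
    skip_until = checkpoint.customMarkerData if checkpoint else ""
    while True:
        ids = fetch_page(page)
        if not ids:
            return
        if skip_until and skip_until in ids:
            ids = ids[ids.index(skip_until) + 1:]
        skip_until = ""  # only the checkpoint's own page can contain the marker
        for item_id in ids:
            yield item_id, CrawlCheckpoint(pagenumber=page, customMarkerData=item_id)
        page += 1
```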

GenericType

Holds a value of one of the types that the platform supports in certain fields, such as source property values. Exactly one of the following fields must be set.

| Property | Type | Description |
|---|---|---|
| stringValue | string | Represents a string value. |
| intValue | int64 | Represents an int64 (long) value. |
| doubleValue | double | Represents a double value. |
| dateTimeValue | google.protobuf.Timestamp | Represents a dateTime value. |
| boolValue | bool | Represents a Boolean value. |
| stringCollectionValue | StringCollectionType | Represents a collection of strings. |
| intCollectionValue | IntCollectionType | Represents a collection of int64 (long) values. |
| doubleCollectionValue | DoubleCollectionType | Represents a collection of double values. |
| dateTimeCollectionValue | TimestampCollectionType | Represents a collection of dateTime values. |
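Because a GenericType carries exactly one of these fields, a connector typically wraps each native value by inspecting its type, and then builds the values map of a SourcePropertyValueMap from the wrapped values. The Python sketch below does this with plain dictionaries that stand in for the generated messages (an assumption), and it keeps Python `datetime` objects where the real model uses google.protobuf.Timestamp.

```python
from datetime import datetime
from typing import Any, Dict


def to_generic(value: Any) -> Dict[str, Any]:
    """Wrap a Python value as a GenericType-shaped dict with exactly one field set."""
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return {"boolValue": value}
    if isinstance(value, int):
        return {"intValue": value}
    if isinstance(value, float):
        return {"doubleValue": value}
    if isinstance(value, str):
        return {"stringValue": value}
    if isinstance(value, datetime):
        return {"dateTimeValue": value}
    if isinstance(value, list) and value:
        for elem_type, field in [(int, "intCollectionValue"), (float, "doubleCollectionValue"),
                                 (str, "stringCollectionValue"), (datetime, "dateTimeCollectionValue")]:
            # Exact type match, so Boolean lists (which have no collection type) are rejected.
            if all(type(v) is elem_type for v in value):
                return {field: {"values": list(value)}}
    raise TypeError(f"unsupported value: {value!r}")


def to_property_map(props: Dict[str, Any]) -> Dict[str, Dict[str, Any]]:
    """Build the values map of a SourcePropertyValueMap from native values."""
    return {name: to_generic(v) for name, v in props.items()}
```

Empty, mixed, and Boolean lists raise an error here because no single collection field describes them; a real connector might instead skip such properties.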

StringCollectionType

Collection of strings.

| Property | Type | Description |
|---|---|---|
| values | repeated string | Collection or array of strings. |

IntCollectionType

Collection of integer values.

| Property | Type | Description |
|---|---|---|
| values | repeated int64 | Collection or array of int64 (long) values. |

DoubleCollectionType

Collection of double values.

| Property | Type | Description |
|---|---|---|
| values | repeated double | Collection or array of double values. |

TimestampCollectionType

Collection of DateTime values.

| Property | Type | Description |
|---|---|---|
| values | repeated google.protobuf.Timestamp | Collection or array of dateTime values. |