How to copy files from a private GitHub repo to Azure Blob Storage

Mahantha MV 35 Reputation points
2024-07-13T16:47:49.3266667+00:00

We have incoming files in a GitHub repo, and the task is to ingest these files into Azure Blob Storage using Synapse. If two different versions of the same file are ingested, only the latest version should be shown in the container. Viewing a previous version must still be possible, but the container should show only one version of the file. Note that the GitHub repository is private.

Azure Blob Storage
An Azure service that stores unstructured data in the cloud as blobs.
Azure Synapse Analytics
An Azure analytics service that brings together data integration, enterprise data warehousing, and big data analytics. Previously known as Azure SQL Data Warehouse.

2 answers

  1. Vinodh247 21,226 Reputation points
    2024-07-14T01:59:05.55+00:00

    Hi Mahantha MV,

    Thanks for reaching out to Microsoft Q&A.

    You can try any of the following approaches to copy your files from a private GitHub repo to Azure Blob Storage:

    Using a Synapse pipeline:

    • Create an Azure Synapse workspace and a linked service that connects to Azure Blob Storage.
    • Develop a pipeline in Azure Synapse that uses data flows or copy activities to fetch files from GitHub and store them in Azure Blob Storage.
    • Implement version control by using Azure Blob Storage's versioning feature or by incorporating logic in your Synapse pipeline.
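
    As a rough illustration of the versioning option: when blob versioning is enabled on the storage account, overwriting a blob keeps the earlier copy as a previous version, and a default listing shows only the current one. The sketch below assumes you have already fetched the XML body of a List Blobs call made with `include=versions` (for example `GET <container-url>?restype=container&comp=list&include=versions`). The function name `group_versions` is hypothetical, not part of any SDK.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def group_versions(list_blobs_xml):
    """Map each blob name to (all version IDs oldest first, current version ID)."""
    root = ET.fromstring(list_blobs_xml)
    versions = defaultdict(list)
    current = {}
    for blob in root.iter("Blob"):
        name = blob.findtext("Name")
        version_id = blob.findtext("VersionId")
        versions[name].append(version_id)
        # Only the current version carries <IsCurrentVersion>true</IsCurrentVersion>.
        if blob.findtext("IsCurrentVersion") == "true":
            current[name] = version_id
    # Version IDs are fixed-width timestamps, so a string sort is chronological.
    return {name: (sorted(ids), current.get(name)) for name, ids in versions.items()}
```

    A previous version can then be read by adding the `versionid` query parameter with one of the returned IDs to the blob URL.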

    Using AzCopy:

    Using Azure Functions and PyGithub:

    Create an Azure HTTP trigger function that connects to your private GitHub repository using the PyGithub Python SDK.

    Download the files from the GitHub repo.

    Upload these files to Azure Blob Storage.
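
    The download and upload steps could look roughly like the sketch below. Note that it swaps PyGithub for plain HTTP calls from the Python standard library so that it has no dependencies, and it leaves out the Azure Functions trigger wrapper. It assumes a GitHub personal access token with read access to the repo and a container-level SAS URL with write permission. All function and parameter names here are hypothetical.

```python
import urllib.parse
import urllib.request

GITHUB_API = "https://api.github.com"


def build_github_request(owner, repo, path, ref, token):
    """Build a request for the raw bytes of one file in a private repo."""
    url = f"{GITHUB_API}/repos/{owner}/{repo}/contents/{urllib.parse.quote(path)}"
    url += "?" + urllib.parse.urlencode({"ref": ref})
    return urllib.request.Request(
        url,
        headers={
            "Authorization": f"Bearer {token}",
            # Ask the contents API for the file body instead of JSON metadata.
            "Accept": "application/vnd.github.raw+json",
        },
    )


def build_blob_put_request(container_sas_url, blob_name, data):
    """Build a Put Blob request against a container-level SAS URL."""
    base, _, sas = container_sas_url.partition("?")
    url = f"{base.rstrip('/')}/{urllib.parse.quote(blob_name)}?{sas}"
    return urllib.request.Request(
        url,
        data=data,
        method="PUT",
        headers={
            "x-ms-blob-type": "BlockBlob",
            "Content-Type": "application/octet-stream",
        },
    )


def copy_file(owner, repo, path, ref, token, container_sas_url):
    """Download one file from GitHub and upload it under the same name."""
    with urllib.request.urlopen(build_github_request(owner, repo, path, ref, token)) as resp:
        data = resp.read()
    with urllib.request.urlopen(build_blob_put_request(container_sas_url, path, data)) as resp:
        return resp.status  # Put Blob answers 201 Created on success
```

    Uploading under the same blob name each time means the container shows a single copy of the file; whether older copies stay retrievable depends on blob versioning being enabled on the storage account.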

    Please 'Upvote'(Thumbs-up) and 'Accept' as an answer if the reply was helpful. This will benefit other community members who face the same issue.


  2. Sumarigo-MSFT 46,286 Reputation points Microsoft Employee
    2024-07-15T09:06:21.2366667+00:00

    @Mahantha MV Welcome to the Microsoft Q&A forum. Thank you for posting your query here!

    Adding more information to the above response

    To ingest files from a private GitHub repository into Azure Blob Storage using Synapse, you can follow these steps:

    1. Create a Linked Service to GitHub: This will allow Azure Data Factory or Synapse Analytics to access the GitHub repository. You'll need to configure the service details, test the connection, and create the new linked service.
    2. Azure Blob Storage Write Settings: Configure the AzureBlobStorageWriteSettings to define how data is written to the Azure Blob storage. This includes specifying the blob path, file format, and any other relevant settings.
    3. Azure Blob Storage Location: Set up the AzureBlobStorageLocation to point to the specific container in the Blob storage where the files will be ingested.
    4. Handle File Versions: To ensure that only the final version of a file is visible in the container, you can use the ETag property to manage file versions. The ETag is a unique identifier that changes every time the file is updated. By comparing ETags, you can determine if a file has a new version and manage it accordingly. Reference: SynapseLibraryData.ETag Property
    5. Synapse Pipelines: Use Azure Synapse Pipelines to create and run data ingestion workflows. The pipelines can be configured to ingest data from the GitHub repository and write it to the Azure Blob storage while handling file versions as required.
      Reference: Load data into Azure Synapse Analytics using Azure Data Factory or a Synapse pipeline
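
    To make the ETag comparison in step 4 more concrete, here is a minimal sketch. It assumes the pipeline keeps a small manifest mapping each file path to the ETag seen at the last run, and that the current ETags have already been collected from the source. Where the manifest is stored, and the names `plan_uploads` and `record_uploads`, are assumptions for illustration only.

```python
def plan_uploads(remote_etags, manifest):
    """Return the paths whose ETag differs from the one recorded at the last run."""
    return sorted(
        path for path, etag in remote_etags.items() if manifest.get(path) != etag
    )


def record_uploads(manifest, remote_etags, uploaded):
    """Return a new manifest that remembers the ETags of the uploaded paths."""
    updated = dict(manifest)
    for path in uploaded:
        updated[path] = remote_etags[path]
    return updated


# Example run: one file changed, one is new, one is untouched.
manifest = {"a.csv": '"etag-1"', "b.csv": '"etag-7"'}
remote = {"a.csv": '"etag-2"', "b.csv": '"etag-7"', "c.csv": '"etag-1"'}
to_upload = plan_uploads(remote, manifest)
manifest = record_uploads(manifest, remote, to_upload)
```

    Only the paths in `to_upload` need to be copied again, which avoids rewriting unchanged files on every run.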

    Set up the Synapse environment to handle the ingestion process and ensure that the necessary permissions and configurations are in place for accessing the private GitHub repository and writing to the Azure Blob storage. If you need to view previous versions of a file, you may need to implement a version control mechanism within the Blob storage or maintain a separate index or log of file versions.

    Please let us know if you have any further queries. I’m happy to assist you further.

    Please do not forget to "Accept the answer" and "up-vote" wherever the information provided helps you; this can be beneficial to other community members.

