Product Marketing FYI

🚀 Playbook: How to scrape profile info from GitHub

www.productmarketing.fyi

This is a technical playbook I use to export hundreds of GitHub user profiles to a spreadsheet for easy analysis and targeting

Zevi Reinitz
Sep 12, 2023

GitHub as a source of information

GitHub is the world’s largest social network for technical people. That makes it a great source of information, and a great place for targeted activities, if you’re trying to reach a technical audience.

But unlike other social networks, GitHub isn’t built for direct communication or chat. It’s designed to get people to look at projects and contribute code. GitHub users can also leave a “star” on projects they’re interested in. It’s equivalent to an “upvote” or a “like” on other platforms.

Analyzing clusters of users based on the projects they star or the contributions they make can be very helpful for dev-marketers. Check out this playbook, for example, to see how I’ve put this into action IRL.

The hard part is accessing and exporting this information from GitHub.

So I’m going to outline the steps and the tools I use to export this data from GitHub and get it into a usable CSV.

GitHub GraphQL API Explorer

The main tool I use for this is the GitHub GraphQL API Explorer. It allows me to export lists of Stargazers from specific GitHub projects, together with their GitHub profile information.

It’s not glamorous, but it’s the path of least resistance to extracting the info I need.

Here’s how I use it.

The initial query - first 100 results

Open the GitHub GraphQL API Explorer and log in with your GitHub account.

Paste the following query into the left pane of the Explorer to extract the relevant data from your desired GitHub project:

```graphql
{
  repository(owner: "[repo owner]", name: "[repo name]"){
    stargazers(first: 100){
      pageInfo{
        endCursor
        hasNextPage
      }
      nodes{
        company
        email
        login
        twitterUsername
        websiteUrl
        followers{
          totalCount
        }
        organizations(first:10){
          nodes{
            name
          }
        }
      }
    }
  }
}
```
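If you’d rather skip the Explorer UI, the same query can be sent straight to GitHub’s GraphQL endpoint (https://api.github.com/graphql) as a JSON POST request. Below is a minimal sketch using only Python’s standard library. It assumes you have a GitHub personal access token; the function names build_request and run_query are my own, not part of any GitHub library.

```python
import json
import urllib.request
from typing import Optional

GITHUB_GRAPHQL_URL = "https://api.github.com/graphql"


def build_request(query: str, token: str,
                  variables: Optional[dict] = None) -> urllib.request.Request:
    """Wrap a GraphQL query in an authenticated POST request."""
    body = json.dumps({"query": query, "variables": variables or {}}).encode("utf-8")
    return urllib.request.Request(
        GITHUB_GRAPHQL_URL,
        data=body,
        headers={
            # Assumes a personal access token; GitHub accepts "bearer <token>"
            "Authorization": f"bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def run_query(query: str, token: str, variables: Optional[dict] = None) -> dict:
    """Send the query and return the decoded JSON (the same JSON the Explorer shows)."""
    with urllib.request.urlopen(build_request(query, token, variables)) as resp:
        payload = json.loads(resp.read().decode("utf-8"))
    # GraphQL reports problems in an "errors" list rather than via HTTP status alone
    if payload.get("errors"):
        raise RuntimeError(f"GraphQL errors: {payload['errors']}")
    return payload
```

Usage would look something like run_query(query_text, token), where query_text is the query above with the placeholders filled in. The token is something you supply (for example, from an environment variable you choose). Keep it out of shared spreadsheets and scripts.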

You’ll need to replace the “[repo owner]” and “[repo name]” placeholders with the repository whose stargazers you want to extract. For github.com/facebook/react, for example, the owner is “facebook” and the name is “react”.

Press the “play” button to run the query. The output, displayed in JSON format in the right pane of the Explorer, is a list of up to 100 stargazers with their profile information.

100 is the maximum number of results the API returns in one batch. You can extract additional batches of 100 stargazers by running the query again and telling it to start after the last user it returned. We’ll get to that in a second. First, let’s look at what to do with this raw JSON.

Converting the JSON to CSV

Once you have the JSON staring you in the face, you need to convert it to something usable, like a CSV. I found a simple “JSON to CSV converter” that you can use for free.

Copy the JSON results from the GitHub API Explorer and paste them into the JSON converter. The output is a table of results that you can copy and paste into a spreadsheet.

There are ways to further automate this part of the process, but you need to know a little coding to do that. So for now, we’ll stick to the basics.
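For anyone comfortable with a little code, here’s a minimal Python sketch of the JSON-to-CSV step. It assumes the pasted JSON has the shape the query above produces (data → repository → stargazers → nodes). The column names, and joining organization names with “; ” into a single cell, are my own illustrative choices, not something GitHub or the converter prescribes.

```python
import csv
import io
import json

# Column order for the spreadsheet (an illustrative choice, not a GitHub convention)
FIELDS = ["login", "company", "email", "twitterUsername", "websiteUrl",
          "followers", "organizations"]


def stargazers_to_csv(response_text: str) -> str:
    """Flatten one page of Explorer JSON output into CSV text."""
    nodes = json.loads(response_text)["data"]["repository"]["stargazers"]["nodes"]
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=FIELDS)
    writer.writeheader()
    for node in nodes:
        writer.writerow({
            "login": node["login"],
            # Profile fields are often null; write empty cells instead of "None"
            "company": node.get("company") or "",
            "email": node.get("email") or "",
            "twitterUsername": node.get("twitterUsername") or "",
            "websiteUrl": node.get("websiteUrl") or "",
            # Nested objects are collapsed into single spreadsheet cells
            "followers": node["followers"]["totalCount"],
            "organizations": "; ".join(
                org["name"] for org in node["organizations"]["nodes"]
                if org and org.get("name")
            ),
        })
    return out.getvalue()
```

The returned text can be saved as a .csv file and opened in any spreadsheet tool, which replaces the copy/paste through an online converter.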

With the first 100 results in your spreadsheet, you’re ready to get some more. To do this, you need to ask the GitHub Explorer to show you the next set of 100 results in the list.

Asking GitHub for the next 100 users in the list

To do so, go back to the same GitHub Explorer window and modify the top of the query that you’re using in the left panel. The top few lines currently look like this:

```graphql
{
  repository(owner: "[repo owner]", name: "[repo name]"){
    stargazers(first: 100){
      pageInfo{
        endCursor
        # ...rest of the query unchanged
```

You now want to modify those lines with two simple changes:

  • Add an “after” argument to “stargazers”

  • Set it to the “endCursor” string from your previous results

So now, the query on the left side looks like this:

```graphql
{
  repository(owner: "[repo owner]", name: "[repo name]"){
    stargazers(first: 100, after: "[the endCursor string]"){
      pageInfo{
        endCursor
        # ...rest of the query unchanged
```

Now, copy the long “endCursor” string that appears at the top of the right panel, in the “pageInfo” block above the first 100 results. Paste it as the value of the “after” argument in your query, inside the quotes, replacing the “[the endCursor string]” placeholder.

This is basically asking the API to pull the next 100 results in the list.
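If you save each response rather than reading it off the screen, the cursor can be pulled out programmatically. Here’s a minimal Python sketch, assuming the response has the same shape as the query above. The function name end_cursor and the sample cursor value are made up for illustration; real cursors are longer, opaque strings.

```python
import json
from typing import Optional, Tuple


def end_cursor(response_text: str) -> Tuple[Optional[str], bool]:
    """Return (endCursor, hasNextPage) from a pasted stargazers response."""
    page_info = json.loads(response_text)["data"]["repository"]["stargazers"]["pageInfo"]
    return page_info["endCursor"], page_info["hasNextPage"]


# Made-up sample response, trimmed to the pageInfo block
sample = ('{"data": {"repository": {"stargazers": {"pageInfo": '
          '{"endCursor": "Y3Vyc29yOjEwMA==", "hasNextPage": true}, "nodes": []}}}}')
print(end_cursor(sample))  # ('Y3Vyc29yOjEwMA==', True)
```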

Every page of 100 results that is displayed has a unique “endCursor” string that appears at the top of it. So to see the next 100 results in the list, simply copy the endCursor, paste it into the query, and run the query again with the “play” button.
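Copy/pasting the cursor into the query text works, but GraphQL also supports variables, so the query itself never has to change between pages. Below is a sketch of the same query rewritten that way, with the cursor passed as a variable. The variable names ($owner, $name, $cursor) and the helper page_payload are my own. Passing no cursor (null) asks for the first page.

```python
import json
from typing import Optional

# Same fields as the Explorer query above, with the repo and cursor as variables
STARGAZERS_QUERY = """
query($owner: String!, $name: String!, $cursor: String) {
  repository(owner: $owner, name: $name) {
    stargazers(first: 100, after: $cursor) {
      pageInfo { endCursor hasNextPage }
      nodes {
        company
        email
        login
        twitterUsername
        websiteUrl
        followers { totalCount }
        organizations(first: 10) { nodes { name } }
      }
    }
  }
}
"""


def page_payload(owner: str, name: str, cursor: Optional[str] = None) -> dict:
    """JSON request body for one page; cursor=None requests the first 100 stargazers."""
    return {
        "query": STARGAZERS_QUERY,
        "variables": {"owner": owner, "name": name, "cursor": cursor},
    }


print(json.dumps(page_payload("facebook", "react")["variables"]))
```

The design choice here is that only the small variables object changes from page to page, which makes the process easier to script and less error-prone than editing the query by hand.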

Here’s what it looks like IRL:

[Video: copying the endCursor into the query and re-running it in the Explorer]

Each time the new endCursor is applied to the query and the query is run again, the results on the right side will update to show the next 100 users in the list.
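To estimate how many runs a full export will take, you can add “totalCount” to the “stargazers” selection (next to “pageInfo”) and divide by the page size, rounding up. A quick worked example in Python; the star count below is made up:

```python
import math


def runs_needed(total_stargazers: int, page_size: int = 100) -> int:
    """Number of query runs needed to page through every stargazer."""
    return math.ceil(total_stargazers / page_size)


print(runs_needed(2_350))  # 24: 23 full pages plus one partial page of 50
```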

For each set of 100 users, simply use the JSON-to-CSV tool to convert it and append it to your growing spreadsheet of contacts.
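If you’re saving each converted batch to a file instead of pasting it into a spreadsheet, a small helper can do the appending. This is a minimal sketch, assuming each batch has already been flattened into one dict per user. The column list and the name append_rows are hypothetical choices. The header is written only when the file is new or empty, so batches stack cleanly.

```python
import csv
import os
from typing import Dict, List

# Hypothetical column order; keys missing from a row are written as empty cells
FIELDS = ["login", "company", "email", "twitterUsername", "websiteUrl",
          "followers", "organizations"]


def append_rows(path: str, rows: List[Dict]) -> None:
    """Append one batch of flattened stargazer rows to a growing CSV file."""
    write_header = not os.path.exists(path) or os.path.getsize(path) == 0
    with open(path, "a", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        if write_header:
            writer.writeheader()
        writer.writerows(rows)
```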

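Run this as many times as needed until you have a healthy number of contacts to work with, or until “hasNextPage” in the “pageInfo” block comes back as false, which means you’ve reached the end of the list.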

  • Pro tip: Before closing your API Explorer window and analyzing your spreadsheet of GitHub contacts, copy the next query you’ll want to run and paste it somewhere for safekeeping. This lets you pick up where you left off in the stargazers list (which can be pretty long for a popular project).
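The whole copy-cursor, re-run, repeat loop can also be automated. Here’s a minimal Python sketch of that loop. fetch_page is a hypothetical function you supply: it takes a cursor (None for the first page) and returns the decoded JSON response, for example by sending the query to GitHub’s GraphQL endpoint. max_pages is an arbitrary cap standing in for “a healthy number of contacts”. The function returns the last endCursor so a later run can resume, which is the pro tip above in code form.

```python
from typing import Callable, Dict, List, Optional, Tuple


def collect_stargazers(
    fetch_page: Callable[[Optional[str]], Dict],
    start_cursor: Optional[str] = None,
    max_pages: int = 20,
) -> Tuple[List[Dict], Optional[str]]:
    """Follow endCursor until hasNextPage is false or max_pages is reached.

    Returns the collected user nodes plus the last endCursor seen, so the
    next run can start from where this one stopped.
    """
    nodes: List[Dict] = []
    cursor = start_cursor
    for _ in range(max_pages):
        response = fetch_page(cursor)
        if response.get("errors"):
            raise RuntimeError(f"GraphQL errors: {response['errors']}")
        stargazers = response["data"]["repository"]["stargazers"]
        nodes.extend(stargazers["nodes"])
        # Each page's endCursor becomes the "after" value for the next page
        cursor = stargazers["pageInfo"]["endCursor"]
        if not stargazers["pageInfo"]["hasNextPage"]:
            break
    return nodes, cursor
```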

© 2023 Zevi Reinitz