DEV Community

Tamizh

Scrapebase + Permit.io: Web Scraping with Authorization

This is a submission for the Permit.io Authorization Challenge: API-First Authorization Reimagined

What I Built

I built Scrapebase, a web scraping service with tiered access controls that demonstrates API-first authorization using Permit.io. The project separates business logic from authorization concerns using Permit.io's policy-as-code approach.

In many applications, authorization is implemented as an afterthought, resulting in security vulnerabilities and technical debt. Scrapebase demonstrates how to build with authorization as a first-class concern from day one.

Key Features

  • Tiered Service Levels: Free, Pro, and Admin tiers with different capabilities
  • API Key Authentication: Simple authentication using API keys
  • Role-Based Access Control: Permissions managed through Permit.io
  • Domain Blacklist System: Resource-level restrictions for sensitive domains
  • Text Processing: Basic and advanced text processing with role-based restrictions

Role-Based Capabilities

Feature Free User Pro User Admin
Basic Scraping
Advanced Scraping
Text Cleaning
AI Summarization
View Blacklist
Manage Blacklist
Access Blacklisted Domains

Demo

Try it live at: https://scrapebase-permit.up.railway.app/

Test Credentials:

  • Free User: newuser / 2025DEVChallenge
  • Admin: admin / 2025DEVChallenge

Project Repo

Step 1: Clone the repository

Repository: github.com/0xtamizh/scrapebase-permit-IO

git clone https://github.com/0xtamizh/scrapebase-permit-IO.git
cd scrapebase-permit-IO

Step 2: Set up Permit.io

  1. Create a free account at Permit.io
  2. Create a new project
  3. Set up:
    • Resource type: website
    • Actions: scrape_basic, scrape_advanced
    • Roles: free_user, pro_user, admin
  4. Configure role permissions as described under Dashboard Configuration
  5. Generate an Environment API key from the dashboard

Step 3: Configure environment variables

Create a .env file in the project root:

# Permit.io
PERMIT_API_KEY=permit_env_YOUR_ENVIRONMENT_KEY

# API Keys for different user tiers
FREE_API_KEY=2025DEVChallenge_free
PRO_API_KEY=2025DEVChallenge_pro
ADMIN_API_KEY=2025DEVChallenge_admin

# Optional: For AI summarization
DEEPINFRA_API_KEY=your_deepinfra_key

# Server configuration
PORT=8080
NODE_ENV=development

# Browser manager settings
MAX_CONCURRENT_REQUESTS=50
REQUEST_TIMEOUT=60000
QUEUE_TIMEOUT=120000
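A minimal sketch of how these variables could be read and validated at startup. The `loadConfig` helper and its field names are hypothetical and not taken from the Scrapebase repository; treating the timeout values as milliseconds and the four keys as required are also assumptions.

```javascript
// Hypothetical helper, not from the Scrapebase repository.
// Reads the variables from the .env example and fails fast on missing keys.
function loadConfig(env = process.env) {
  // Assumption: these four keys are mandatory, DEEPINFRA_API_KEY is optional.
  const required = ['PERMIT_API_KEY', 'FREE_API_KEY', 'PRO_API_KEY', 'ADMIN_API_KEY'];
  const missing = required.filter((name) => !env[name]);
  if (missing.length > 0) {
    throw new Error(`Missing environment variables: ${missing.join(', ')}`);
  }

  // Parse a numeric setting, falling back to the value shown in the .env example.
  const toInt = (value, fallback) => {
    const parsed = Number.parseInt(value, 10);
    return Number.isNaN(parsed) ? fallback : parsed;
  };

  return {
    permitApiKey: env.PERMIT_API_KEY,
    apiKeys: { free: env.FREE_API_KEY, pro: env.PRO_API_KEY, admin: env.ADMIN_API_KEY },
    deepInfraApiKey: env.DEEPINFRA_API_KEY || null,
    port: toInt(env.PORT, 8080),
    maxConcurrentRequests: toInt(env.MAX_CONCURRENT_REQUESTS, 50),
    requestTimeoutMs: toInt(env.REQUEST_TIMEOUT, 60000), // assumed to be milliseconds
    queueTimeoutMs: toInt(env.QUEUE_TIMEOUT, 120000),    // assumed to be milliseconds
  };
}

// Example with an in-memory environment instead of a real .env file.
const config = loadConfig({
  PERMIT_API_KEY: 'permit_env_YOUR_ENVIRONMENT_KEY',
  FREE_API_KEY: '2025DEVChallenge_free',
  PRO_API_KEY: '2025DEVChallenge_pro',
  ADMIN_API_KEY: '2025DEVChallenge_admin',
  PORT: '8080',
});
console.log(config.port, config.requestTimeoutMs); // 8080 60000
```

Failing at startup when a key is missing tends to be easier to debug than a 401 from the first request.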

Step 4: Install dependencies and run

# Install dependencies
npm install

# Before running locally, comment out the following line in src/utils/browserManager
# so that the default Chromium browser on your device is used:
#   executablePath: process.env.CHROMIUM_PATH || '/usr/bin/chromium-browser',

# Run in development mode
npm run dev

The server will start on http://localhost:8080

Step 5: Test the application

Using the UI:

  1. Open http://localhost:8080 in your browser
  2. "Log in" using the provided credentials
    • User credentials: newuser / 2025DEVChallenge
    • Admin credentials: admin / 2025DEVChallenge
  3. Toggle between Basic (Free) and Pro plans
  4. Enter a domain to scrape (e.g., example.com)

Using the API directly:

# Test with free user
curl -X POST http://localhost:8080/api/processLinks \
  -H "Content-Type: application/json" \
  -H "x-api-key: 2025DEVChallenge_free" \
  -d '{"url": "https://example.com"}'

# Test with admin user
curl -X POST http://localhost:8080/api/processLinks \
  -H "Content-Type: application/json" \
  -H "x-api-key: 2025DEVChallenge_admin" \
  -d '{"url": "https://example.com", "advanced": true}'

# Get blacklist
curl http://localhost:8080/api/blacklist \
  -H "x-api-key: 2025DEVChallenge_free"

# Add domain to blacklist (admin only)
curl -X POST http://localhost:8080/api/blacklist \
  -H "Content-Type: application/json" \
  -H "x-api-key: 2025DEVChallenge_admin" \
  -d '{"domain": "example.com"}'
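The same scrape request can be issued from Node.js. The sketch below mirrors the curl examples using the global `fetch` available in Node 18 and later. The helper names `buildScrapeRequest` and `scrape` are hypothetical, and because the response format is not documented in this post, the body is returned as raw text.

```javascript
// Builds the request for POST /api/processLinks, mirroring the curl examples.
function buildScrapeRequest(baseUrl, apiKey, targetUrl, advanced = false) {
  const body = { url: targetUrl };
  if (advanced) body.advanced = true;
  return {
    url: `${baseUrl}/api/processLinks`,
    options: {
      method: 'POST',
      headers: { 'Content-Type': 'application/json', 'x-api-key': apiKey },
      body: JSON.stringify(body),
    },
  };
}

// Sends the request. The response shape is an unknown here, so no JSON parsing is attempted.
async function scrape(baseUrl, apiKey, targetUrl, advanced = false) {
  const { url, options } = buildScrapeRequest(baseUrl, apiKey, targetUrl, advanced);
  const response = await fetch(url, options);
  return { status: response.status, body: await response.text() };
}

// Inspect the request without touching the network.
const request = buildScrapeRequest(
  'http://localhost:8080',
  '2025DEVChallenge_admin',
  'https://example.com',
  true
);
console.log(request.url, request.options.body);
```

With the server running locally, `scrape('http://localhost:8080', '2025DEVChallenge_free', 'https://example.com')` would send the first curl request.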

API-First Authorization

Core Authorization Flow

  1. User sends request with x-api-key header
  2. permitAuth middleware intercepts the request
  3. Middleware maps API key to user role
  4. User is synced to Permit.io
  5. Permission check runs against Permit.io cloud PDP
  6. Request is allowed or denied based on policy decision
┌──────────┐    ┌───────────────┐    ┌────────────┐    ┌──────────────┐
│  Client  │───▶│ Scrapebase API│───▶│permitAuth  │───▶│  Permit.io   │
│          │◀───│               │◀───│ middleware │◀───│  Cloud PDP   │
└──────────┘    └───────────────┘    └────────────┘    └──────────────┘
     │                                                        ▲
     │                                                        │
     └────────────────────────────────────────────────────────┘
       Permission policies defined in Permit.io dashboard
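The six steps above can be sketched as an Express-style middleware factory. This is an illustration under assumptions, not the repository's `permitAuth`: the `roleForApiKey`, `syncUser`, and `check` functions are injected stand-ins (in Scrapebase the last two correspond to Permit.io SDK calls), and the 401/403/503 status codes are choices made here.

```javascript
// Hypothetical factory; dependencies are injected so the flow can run without Permit.io.
function createPermitAuth({ roleForApiKey, syncUser, check }) {
  return function permitAuth(action) {
    return async function middleware(req, res, next) {
      const apiKey = req.headers['x-api-key'];   // step 1: read the API key
      const user = roleForApiKey(apiKey);        // step 3: map key to user and role
      if (!user) {
        return res.status(401).json({ error: 'Invalid or missing API key' });
      }

      let allowed;
      try {
        await syncUser(user);                               // step 4: sync the user
        allowed = await check(user.key, action, 'website'); // step 5: ask the PDP
      } catch (err) {
        // Fail closed: deny when the decision point cannot be reached.
        return res.status(503).json({ error: 'Authorization service unavailable' });
      }

      if (!allowed) {                            // step 6: enforce the decision
        return res.status(403).json({ error: `Not permitted: ${action}` });
      }
      req.user = user;
      return next();
    };
  };
}

// In-memory stand-ins for a dry run (no network, no Permit.io account).
const users = new Map([['free_key', { key: 'free', role: 'free_user' }]]);
const permitAuth = createPermitAuth({
  roleForApiKey: (apiKey) => users.get(apiKey) || null,
  syncUser: async () => {},
  check: async (userKey, action) => action === 'scrape_basic',
});

// Minimal response double with the two methods the middleware uses.
function fakeRes() {
  return {
    statusCode: 200,
    body: null,
    status(code) { this.statusCode = code; return this; },
    json(payload) { this.body = payload; return this; },
  };
}

(async () => {
  const res = fakeRes();
  const req = { headers: { 'x-api-key': 'free_key' } };
  await permitAuth('scrape_advanced')(req, res, () => {});
  console.log(res.statusCode, res.body); // 403 { error: 'Not permitted: scrape_advanced' }
})();
```

Injecting the check function keeps the route handlers free of authorization logic and makes the flow testable without a live PDP.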

Implementation

The permitAuth middleware handles both role assignment and permission enforcement:

// Role assignment based on API key
switch (apiKey) {
  case process.env.ADMIN_API_KEY:
    userKey = '2025DEVChallenge_admin';
    tier = 'admin';
    break;
  // ...other keys
}

// User sync and permission check
await permit.api.syncUser({
  key: userKey,
  email: `${userKey}@scrapebase.xyz`,
  attributes: { tier, roles: [tier] }
});

const permissionCheck = await permit.check(user.key, action, 'website');

Dashboard Configuration

For permissions to work correctly, you must configure roles and their allowed actions in the Permit.io dashboard:

  1. Create resource type website
  2. Create actions scrape_basic and scrape_advanced
  3. Create roles free_user, pro_user, and admin
  4. Assign permissions to roles:
    • free_user: Can scrape_basic on website
    • pro_user: Can scrape_basic and scrape_advanced on website
    • admin: Can do everything on website
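For reference, the role and permission assignments above can be written down as a small local model. This is not how Permit.io stores or evaluates policy; it is a hypothetical fixture (`POLICY`, `isAllowed`) that could serve in unit tests to state which decisions the dashboard configuration is expected to produce.

```javascript
// Local mirror of the dashboard configuration described above.
// Assumption: "everything" for admin means both actions defined on `website`.
const POLICY = Object.freeze({
  free_user: ['scrape_basic'],
  pro_user: ['scrape_basic', 'scrape_advanced'],
  admin: ['scrape_basic', 'scrape_advanced'],
});

// Returns true when the role is known and lists the action.
function isAllowed(role, action) {
  const actions = POLICY[role];
  return Array.isArray(actions) && actions.includes(action);
}

// Print the expected decision matrix.
for (const role of Object.keys(POLICY)) {
  console.log(role, isAllowed(role, 'scrape_basic'), isAllowed(role, 'scrape_advanced'));
}
```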

Dashboard screenshots:

  • Screenshot (Dashboard:Resource): Configuring resource types and actions in the Permit.io dashboard
  • Screenshot (Dashboard:Roles): Setting up role-based permissions for different user tiers
  • Screenshot (Dashboard:Users): Managing users and their role assignments

Troubleshooting: see the repository README.

Challenges Faced

Cloud PDP Limitations

Initially, I tried implementing Attribute-Based Access Control (ABAC) by passing resource attributes:

// This DIDN'T work with cloud PDP
const resource = {
  type: 'website',
  key: hostname,
  attributes: {
    is_blacklisted: isBlacklistedDomain
  }
};

const permissionCheck = await permit.check(user.key, action, resource);

The cloud PDP returned 501 errors because it only supports basic RBAC. I had to simplify to a pure RBAC approach:

// This works with cloud PDP
const permissionCheck = await permit.check(user.key, action, resourceType);
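With the PDP deciding only on role and resource type, the blacklist restriction has to be applied somewhere else. The post does not show how Scrapebase does this, so the following is only one possible sketch: the blacklist is checked in application code and combined with the RBAC decision. The helper names, the subdomain matching, and the rule that only the `admin` tier may scrape blacklisted domains are all assumptions.

```javascript
// True when the URL's hostname equals a blacklisted domain or is a subdomain of it.
// Throws a TypeError if targetUrl is not a valid absolute URL.
function isBlacklisted(targetUrl, blacklist) {
  const hostname = new URL(targetUrl).hostname.toLowerCase();
  return blacklist.some((domain) => {
    const d = domain.toLowerCase();
    return hostname === d || hostname.endsWith(`.${d}`);
  });
}

// Combines the RBAC decision (for example, the result of permit.check)
// with the local blacklist rule.
// Assumption: only the admin tier may scrape blacklisted domains.
function canScrape({ rbacAllowed, tier, targetUrl, blacklist }) {
  if (!rbacAllowed) return false;
  if (isBlacklisted(targetUrl, blacklist) && tier !== 'admin') return false;
  return true;
}

const blacklist = ['example.com'];
console.log(canScrape({ rbacAllowed: true, tier: 'free', targetUrl: 'https://example.com/page', blacklist })); // false
console.log(canScrape({ rbacAllowed: true, tier: 'admin', targetUrl: 'https://example.com/page', blacklist })); // true
```

The trade-off is that this rule lives in code again rather than in policy, which is what moving to a PDP with ABAC support would avoid.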

My Journey

Why I Built This

Traditional approaches to authorization often result in permission checks scattered throughout application code, creating maintenance nightmares and security risks. I created Scrapebase to demonstrate how modern applications can embrace externalized authorization as a core architectural principle.

Scrapebase isn't just another CRUD app – it tackles a real-world use case (web scraping) with meaningful access control requirements:

  1. Tiered service levels that mirror SaaS subscription models
  2. Administrative functions that require elevated permissions
  3. Resource-based restrictions through the domain blacklist system

What I Learned

Building Scrapebase with Permit.io showed me benefits in three areas:

  1. Technical Benefits
    • Separation of authorization from business logic
    • External policy management without code changes
    • A path from RBAC to ABAC as requirements grow
  2. Business Benefits
    • Non-developers can manage permissions
    • Centralized policy management
    • Better security through consistent enforcement
  3. Developer Experience
    • Cleaner codebase
    • Focus on core features
    • Better maintainability

Why Permit.io Works for SaaS

Permit.io is ideal for SaaS applications because it:

  1. Centralizes policy management outside your codebase
  2. Provides a dashboard for non-developers to configure permissions
  3. Scales from simple RBAC to complex ABAC as your needs grow
  4. Offers audit logs for compliance and debugging

This externalized approach lets business stakeholders manage authorization policies directly through the Permit.io dashboard while developers focus on building features, which is the hallmark of a well-designed API-first authorization system.

Future Improvements

With more time, I would:

  1. Set up a local PDP to enable ABAC with resource attributes
  2. Implement tenant isolation for multi-tenant support
  3. Add UI components in the admin dashboard to view permission audit logs
  4. Create more granular roles and permissions beyond the three tiers
  5. Add a user management section to assign roles through the UI

By implementing these controls through Permit.io rather than hardcoding them, Scrapebase demonstrates how authorization can be managed through declarative policies instead of imperative code – fulfilling the promise of truly API-first authorization.

Top comments (23)

Bhagat Surya

super helpful

Tamizh • Edited

Thanks, feedback much appreciated. I'm thinking of scaling this as a service. Permit.io really simplified this for my use case. A system review of the site would be much appreciated.

Sonal Noel Raj

Looks really nice. Way to go! Is the API service available?

Tamizh

Thanks. Yes, I'm planning to offer it as a service; see scrapebase.xyz for more. It's under development.

Sonal Noel Raj

cool!

SANJAY DHANUSH V

Very Useful

Tamizh

Thanks!

inAtom Labs

nice!

Subash chandra Bose A

Really liked how you diagrammed the flow between Scrapebase and Permit.io. Made the whole authorization layer super easy to understand.

Tamizh

Thank you. I'm seriously thinking about using Permit at production scale. I want to scale this into an end-to-end service.

SRIHARI S

Nice documentation

Tamizh

Thanks!

Rithigesh

Amazing!!

Tamizh

Thanks, let's build cool stuff

Ranvijay Singh

Scrapebase isn’t just a project… it’s a weapon. Built with precision, controlled with logic. This is a battlefield design: clean, ruthless, unshakable. Scrapebase stands like a fortress.

Tamizh

Dude! I suspect I know you and Drake Ramoray. I will find you guys.

Drake Ramoray

In all my years as a neurosurgeon… I once separated conjoined twins with a spork, but even that doesn’t compare to the elegance of Scrapebase’s architecture. I have never seen access control executed with such precision. Scrapebase is a marvel. If only I could prescribe it to my patients!

Tamizh

Very funny. Sounds like an automated reply, but thanks!
