This article covers phishing infrastructure analysis for defensive purposes — understanding attacker tooling to build better detection and response. All referenced techniques are for authorized security research and defense.
—
Phishing kits are to credential theft what WordPress themes are to websites: packaged, deployable infrastructure that lets someone with minimal technical skill stand up a convincing phishing operation in minutes. Analyzing them is one of the most practically informative things a defender can do — you learn exactly what the attacker sees when they deploy, what artifacts they leave behind, and where detection and takedown opportunities exist.
Here’s what modern phishing kits look like and how to analyze them.
What a Phishing Kit Is
A phishing kit is a compressed archive (typically a ZIP file) containing everything needed to deploy a credential harvesting page: HTML/CSS mimicking a legitimate login page, PHP scripts to capture and exfiltrate submitted credentials, and often anti-analysis features designed to evade automated scanners and redirect researchers.
Kits range from single-file scraped clones of login pages to sophisticated multi-page flows with 2FA bypass infrastructure, victim geolocation filtering, bot detection, and automated credential forwarding via Telegram bots, email, or direct API calls to attacker-controlled infrastructure.
The market for phishing kits is active and professionalized. Dedicated Telegram channels and dark web forums sell kits for specific targets (Microsoft 365, banking institutions, cryptocurrency exchanges) for anywhere from $20 to several hundred dollars. Higher-end kits include update subscriptions as targeted companies change their login page design.
The Anatomy of a Modern Kit
The lure page. The entry point, designed to replicate the legitimate site as closely as possible. Modern kits typically scrape the legitimate login page dynamically (fetching CSS, images, and scripts from the real site at page load time) rather than using static copies — this means the fake page updates automatically as the real site changes, and hosted images don’t need to be bundled.
Credential capture. A PHP backend that receives submitted form data, logs it locally, and forwards it to the attacker. The exfiltration method varies: email (SMTP), Telegram bot API, Discord webhook, or direct write to a log file. Telegram bots have become the dominant exfiltration channel in kits from the past two to three years because they’re fast, encrypted, and the attacker can monitor from a phone.
Anti-analysis features. This is where kit sophistication varies most:
- IP blacklisting of known scanner IPs, Cloudflare ranges, security vendor ranges
- User-agent filtering (blocks headless browsers, common scanner signatures)
- Geolocation filtering (only serves the credential page to IPs from target countries)
- Redirect to legitimate site if the visitor appears to be a researcher
- .htaccess rules blocking specific crawlers and ASNs
2FA bypass (in advanced kits). Real-time phishing proxies sit between the victim and the legitimate site and relay the session as it happens. When the victim enters their password, the proxy immediately authenticates to the real site; when the real site issues its 2FA prompt, the kit prompts the victim for their code and the proxy submits it. The attacker captures a valid, authenticated session cookie, which bypasses both the password and 2FA. These are called “adversary in the middle” (AiTM) kits. Evilginx2 and Modlishka are two well-known open-source implementations, used by red teamers and real attackers alike.
Analyzing a Kit: The Methodology
When you obtain a kit (from a phishing URL, threat intel sharing, honeypot, or a takedown engagement), here’s the analytical workflow:
1. Static analysis of the archive.
# List contents without extracting
unzip -l phishing_kit.zip
# Extract into an isolated working directory, then move into it
mkdir -p /tmp/kit_analysis
unzip phishing_kit.zip -d /tmp/kit_analysis
cd /tmp/kit_analysis
Look at the file structure first: how many files, what types (PHP, HTML, JS), what’s in the root. The directory structure tells you immediately how sophisticated the kit is.
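The same first look can be taken without writing anything to disk. The following is a minimal Python sketch, not a reference tool: it reads only the archive's central directory, counts files by extension, and flags member names that would escape the extraction directory. The function names and the in-memory demo archive are illustrative assumptions.

```python
import io
import zipfile
from collections import Counter
from pathlib import PurePosixPath


def inventory_archive(zip_source):
    """Summarise a kit archive from its member list, without extracting it.

    zip_source is a filesystem path or a binary file-like object.
    Returns (Counter of file extensions, list of suspicious member names).
    """
    ext_counts = Counter()
    suspicious = []
    with zipfile.ZipFile(zip_source) as zf:
        for info in zf.infolist():
            if info.is_dir():
                continue
            path = PurePosixPath(info.filename)
            ext_counts[path.suffix.lower() or "(none)"] += 1
            # Absolute paths or ".." components would land outside the
            # extraction directory if the archive were unpacked naively.
            if path.is_absolute() or ".." in path.parts:
                suspicious.append(info.filename)
    return ext_counts, suspicious


def build_demo_archive():
    """Build a tiny in-memory ZIP standing in for a real kit (demo only)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("index.php", "<?php /* lure */ ?>")
        zf.writestr("assets/style.css", "body{}")
        zf.writestr(".htaccess", "Deny from all")
        zf.writestr("../evil.php", "<?php ?>")
    buf.seek(0)
    return buf


counts, flagged = inventory_archive(build_demo_archive())
print(dict(counts))
print(flagged)
```

A kit that is mostly PHP with an .htaccess file in the root usually deserves the anti-analysis checks described later; any flagged member name is a reason to extract only inside a disposable environment.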
2. Credential exfiltration endpoint.
Find where credentials go. Search for email sends, Telegram API calls, and file writes:
grep -r "mail(" . --include="*.php"
grep -r "telegram" . --include="*.php" -i
grep -r "bot_token\|chat_id\|webhook" . --include="*.php" -i
grep -r "file_put_contents\|fwrite" . --include="*.php"
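The Telegram bot token, if present, is gold for attribution and intelligence. Tokens follow the pattern [0-9]+:[A-Za-z0-9_-]{35}, where the numeric prefix is the bot’s user ID. While the token is still valid (that is, before the attacker revokes it), the Bot API’s getMe method returns the bot’s username and display name, and getUpdates returns updates the bot has received but not yet consumed, including the IDs of the chats they came from.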
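For triage across many files, the token pattern quoted above can be applied programmatically. This is a small offline sketch under stated assumptions: the function names are hypothetical, the sample token is fabricated, and nothing here contacts Telegram.

```python
import re

# Numeric bot ID, a colon, then a 35-character secret. The lookarounds
# stop a match from starting or ending in the middle of a longer string.
TOKEN_RE = re.compile(
    r"(?<![0-9A-Za-z_-])([0-9]+):([A-Za-z0-9_-]{35})(?![A-Za-z0-9_-])"
)


def find_bot_tokens(source_text):
    """Return sorted, de-duplicated (bot_id, full_token) pairs found in source_text."""
    found = set()
    for match in TOKEN_RE.finditer(source_text):
        bot_id, secret = match.groups()
        found.add((bot_id, f"{bot_id}:{secret}"))
    return sorted(found)


def redact(token):
    """Shorten a token for notes and tickets so they never carry a usable secret."""
    bot_id, secret = token.split(":", 1)
    return f"{bot_id}:{secret[:4]}...{secret[-2:]}"


# Fabricated token and PHP fragment, for demonstration only.
FAKE_TOKEN = "123456789:" + "A" * 35
SAMPLE_PHP = f'<?php $bot_token = "{FAKE_TOKEN}"; $chat_id = "-1001234"; ?>'

for bot_id, token in find_bot_tokens(SAMPLE_PHP):
    print(bot_id, redact(token))
```

Keeping the full token in evidence storage and only the redacted form in shared notes is one possible handling convention, not a requirement.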
3. Target identification.
The kit tells you who it’s targeting. Look at the lure page HTML and any hardcoded URLs or brand assets:
grep -r "microsoft\|office365\|m365\|mimecast\|docusign" . -i | head -20
grep -r "logo\|brand\|favicon" . --include="*.html" | head -10
4. Anti-analysis mechanisms.
cat .htaccess 2>/dev/null
grep -r "ip_block\|blacklist\|block_ip" . --include="*.php" -i
grep -r "user.agent\|HTTP_USER_AGENT" . --include="*.php"
grep -r "geoip\|country\|geo_" . --include="*.php" -i
The blacklisted IP ranges tell you which security vendors the kit author was aware of when they built it. A comprehensive blocklist suggests a more experienced threat actor.
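To compare blocklists between kits, it helps to normalise the ranges first. The sketch below is illustrative only: it pulls IPv4 addresses and CIDR ranges out of PHP source with a simple regular expression, validates them with the standard ipaddress module, and collapses overlaps. It does not handle partial prefixes such as "66.102." that some kits match with string comparison, and mapping a range to a vendor needs external ASN or WHOIS data that is out of scope here. The sample uses documentation-reserved ranges.

```python
import ipaddress
import re

# Candidate IPv4 addresses with an optional CIDR suffix.
IP_RE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}(?:/\d{1,2})?\b")


def extract_blocked_networks(php_source):
    """Return the valid IPv4 networks mentioned in php_source, collapsed and sorted."""
    networks = []
    for candidate in IP_RE.findall(php_source):
        try:
            # strict=False accepts entries written with host bits set.
            networks.append(ipaddress.ip_network(candidate, strict=False))
        except ValueError:
            continue  # looked like an address but is not one, e.g. 999.1.1.1
    return list(ipaddress.collapse_addresses(networks))


# Fabricated blocklist fragment using documentation-reserved ranges.
SAMPLE_BLOCKLIST = (
    '$blocked = array("198.51.100.0/24", "198.51.100.128/25", '
    '"203.0.113.7", "999.1.1.1");'
)

for network in extract_blocked_networks(SAMPLE_BLOCKLIST):
    print(network)
```

Two kits whose collapsed blocklists are identical were very likely built from the same base, which ties this step to the fingerprinting discussed later.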
5. Victim logging.
Find where captured credentials are stored locally:
find . \( -name "*.txt" -o -name "*.log" -o -name "*.csv" \) -exec ls -la {} +
grep -r "fopen\|file_put" . --include="*.php" | grep -v "//.*fopen"
If the kit was already deployed and pulled down with credentials in it, you’ve found victim data — handle appropriately and notify affected parties.
6. Infrastructure indicators.
grep -rE "https?://" . --include="*.php" | grep -v "//.*http"
grep -r "curl_setopt\|curl_exec" . --include="*.php"
Hardcoded C2 URLs, redirects to attacker-controlled infrastructure, and callback URLs are IOCs worth extracting and sharing.
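Before sharing, extracted URLs are usually de-duplicated and defanged so they cannot be clicked by accident. The following is a hedged sketch of that step: the regular expression is deliberately simple, the function names are hypothetical, and the ignore list is an assumed parameter for excluding the impersonated brand's real hosts. All hostnames in the sample are fabricated.

```python
import re
from urllib.parse import urlsplit

# Stop at whitespace, quotes, angle brackets, a closing parenthesis, or a backslash.
URL_RE = re.compile(r"https?://[^\s\"'<>)\\]+", re.IGNORECASE)


def defang(url):
    """Rewrite a URL for reports: http -> hxxp, dots in the host -> [.]"""
    parts = urlsplit(url)
    scheme = parts.scheme.lower().replace("http", "hxxp")
    host = parts.netloc.replace(".", "[.]")
    rest = url[len(parts.scheme) + 3 + len(parts.netloc):]
    return f"{scheme}://{host}{rest}"


def extract_url_iocs(php_source, ignore_hosts=frozenset()):
    """Return sorted, defanged URLs whose host is not in ignore_hosts."""
    iocs = set()
    for raw in URL_RE.findall(php_source):
        url = raw.rstrip(".,;")
        try:
            host = (urlsplit(url).hostname or "").lower()
        except ValueError:
            continue  # malformed authority section
        if host and host not in ignore_hosts:
            iocs.add(defang(url))
    return sorted(iocs)


# Fabricated kit fragment, for demonstration only.
SAMPLE_KIT_PHP = """<?php
$ch = curl_init("https://panel.attacker.example/gate.php?id=7");
header("Location: https://login.examplebank.com/");
?>"""

print(extract_url_iocs(SAMPLE_KIT_PHP, ignore_hosts={"login.examplebank.com"}))
```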
Real-World Kit Characteristics
A few patterns that appear consistently in kit analysis:
The “Redirector” pattern. Many kits include a first stage that validates the victim before serving the phishing page — checking IP reputation, geolocation, and user agent. A researcher hitting the URL gets redirected to the legitimate site; a victim from the target country gets the phishing page. This pattern defeats naive automated scanning.
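From the defender's side, the redirector shows up as different outcomes for different client profiles. The sketch below assumes the observations have already been collected by an authorized scanning or sandbox setup; how they are gathered is out of scope. The Observation type, the profile labels, and the hostnames are all hypothetical.

```python
from dataclasses import dataclass
from urllib.parse import urlsplit


@dataclass(frozen=True)
class Observation:
    profile: str    # hypothetical label for the client used, e.g. "datacenter-curl"
    final_url: str  # URL the client ended on after following redirects


def split_by_outcome(observations, legit_hosts):
    """Partition profile labels into (bounced to the real site, served elsewhere)."""
    bounced, served = [], []
    for obs in observations:
        host = (urlsplit(obs.final_url).hostname or "").lower()
        (bounced if host in legit_hosts else served).append(obs.profile)
    return bounced, served


def looks_cloaked(observations, legit_hosts):
    """True when some profiles reach the real site while others are served other content."""
    bounced, served = split_by_outcome(observations, legit_hosts)
    return bool(bounced) and bool(served)


# Fabricated observations, for demonstration only.
DEMO_OBSERVATIONS = [
    Observation("datacenter-curl", "https://login.examplebank.com/"),
    Observation("residential-browser", "https://examplebank-secure.example/login.php"),
]

print(looks_cloaked(DEMO_OBSERVATIONS, {"login.examplebank.com"}))
```

A single profile that lands on the legitimate site proves nothing on its own; the signal is the disagreement between profiles.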
Panel infrastructure. Higher-end kit operations include a separate admin panel where the attacker can view captured credentials, manage deployments, and track victim statistics. These panels are often deployed on separate infrastructure from the phishing pages and use password authentication. Finding the panel URL in a kit (look for references to /admin, /panel, /dashboard in PHP files) is useful for broader infrastructure mapping.
Kit reuse indicators. Many kit authors leave identifiable artifacts: specific variable names, comment styles, email templates, or credit strings. These “kit fingerprints” let you correlate separate phishing deployments to the same kit author or operation. Community resources help here: Phish Report publishes open-source Indicator of Kit (IOK) rules that encode such fingerprints, while PhishTank’s community-submitted URLs are a source of live deployments to check them against.
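One simple way to make a structural fingerprint concrete is to hash a kit's normalised file listing and to measure path overlap between kits. This is a sketch of one possible approach, not an established fingerprint format; the function names and sample paths are assumptions, and str.removeprefix requires Python 3.9 or later.

```python
import hashlib


def normalise(paths):
    """Lower-case paths, unify separators, and drop a leading './'."""
    return {p.replace("\\", "/").removeprefix("./").lower() for p in paths}


def structure_fingerprint(paths):
    """SHA-256 hex digest over the sorted, normalised relative paths of a kit."""
    joined = "\n".join(sorted(normalise(paths)))
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()


def path_similarity(paths_a, paths_b):
    """Jaccard similarity of two kits' path sets, from 0.0 to 1.0."""
    a, b = normalise(paths_a), normalise(paths_b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


# Fabricated listings, for demonstration only.
KIT_A = ["./index.php", "./assets/style.css", "./.htaccess", "./tg.php"]
KIT_B = ["index.php", "assets/style.css", ".htaccess", "mailer.php"]

print(structure_fingerprint(KIT_A))
print(path_similarity(KIT_A, KIT_B))
```

The exact hash catches verbatim redeployments, while the similarity score catches kits where the operator swapped one exfiltration script for another. Content-based signatures are still needed when file names are randomised.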
What I’ve Actually Found Pulling These Apart
The most useful ten minutes I’ve spent on a kit was grepping a sample for bot_token and getting an actual live Telegram bot token back; the attacker hadn’t rotated it. Querying getMe on the Bot API told me the bot’s username and numeric ID; querying getUpdates (before the token got revoked, which happened within a day of the kit becoming public) showed message history that included what looked like test submissions the operator had sent to themselves while building the page. That’s a mistake a more careful operator wouldn’t make, and it’s exactly the kind of thing that shows up more often than you’d expect in kits sold for $20-50 rather than the higher-end AiTM tooling.
The other pattern I run into constantly: kit authors reuse their own infrastructure across multiple kits. The same panel login page, the same directory naming convention, the same comment header left in from whatever base template they started from — once you’ve fingerprinted one kit from an operator, you start recognizing their other deployments just from directory listing structure alone, before even opening a file. It’s the software equivalent of a criminal reusing a getaway car.
One caution worth stating plainly: don’t interact with a live phishing panel beyond passive analysis. Even authenticated-looking admin panels can be instrumented to log and fingerprint visitors, and depending on jurisdiction, accessing one without authorization crosses legal lines regardless of intent. Pull the kit down, analyze it offline, and route anything requiring live interaction with attacker infrastructure through proper authorized channels.
Detection Opportunities
From the defender’s side, kit analysis generates detection opportunities at every layer:
Domain monitoring. Newly registered domains resembling your brand, especially those hosted on ASNs known for bulletproof hosting, frequently precede phishing deployments. Certificate Transparency logs (searchable through services such as crt.sh) provide near-real-time visibility into TLS certificates issued for potential lookalike domains.
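Candidate domains from registration feeds or certificate logs still need to be ranked. The following sketch scores how closely a domain resembles a brand string using only the standard library. The homoglyph table is a tiny illustrative subset, the brand and domains are fabricated, and the threshold at which to alert is left as an assumption for the reader to tune.

```python
from difflib import SequenceMatcher

# Small, illustrative digit-for-letter substitutions; real tables are larger.
HOMOGLYPHS = str.maketrans({"0": "o", "1": "l", "3": "e", "5": "s"})


def lookalike_score(candidate, brand):
    """Return 0.0-1.0 for how closely any part of candidate resembles brand.

    The last label (the TLD) is skipped and remaining labels are split on
    hyphens, so 'brand-login.example' is compared piece by piece.
    """
    brand = brand.lower().translate(HOMOGLYPHS)
    best = 0.0
    for label in candidate.lower().split(".")[:-1]:
        for piece in label.split("-"):
            folded = piece.translate(HOMOGLYPHS)
            if brand in folded:
                return 1.0
            best = max(best, SequenceMatcher(None, folded, brand).ratio())
    return best


for domain in ("examp1ebank-secure.com", "exampelbank.com", "weather.org"):
    print(domain, round(lookalike_score(domain, "examplebank"), 2))
```

Your own legitimate domains will also score 1.0, so they should be excluded before scoring.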
Lure page detection. Automated systems that periodically fetch and analyze pages on lookalike domains can detect phishing pages before significant credential theft occurs. The key signal: a page that scraped content from your legitimate site but is hosted at a different domain.
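The key signal described above can be approximated by measuring how many of a page's src and href references point back at the brand's own domain while the page itself lives somewhere else. This is a minimal sketch with the standard library HTML parser; the function names, URLs, and sample markup are fabricated, and the page is assumed to have been fetched already by an authorized scanner.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit


def is_under(host, domain):
    """True if host equals domain or is a subdomain of it."""
    return host == domain or host.endswith("." + domain)


class RefCollector(HTMLParser):
    """Collect the hostnames referenced by src and href attributes."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.hosts = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("src", "href") and value:
                # Resolve relative references against the page URL.
                host = urlsplit(urljoin(self.base_url, value)).hostname
                if host:
                    self.hosts.append(host)


def hotlink_ratio(page_url, html, brand_domain):
    """Fraction of references pointing at brand_domain; 0.0 for the brand's own pages."""
    page_host = urlsplit(page_url).hostname or ""
    if is_under(page_host, brand_domain):
        return 0.0
    collector = RefCollector(page_url)
    collector.feed(html)
    if not collector.hosts:
        return 0.0
    hits = sum(is_under(h, brand_domain) for h in collector.hosts)
    return hits / len(collector.hosts)


# Fabricated page, for demonstration only.
PAGE_URL = "https://examplebank-secure.example/login"
SAMPLE_HTML = """
<link rel="stylesheet" href="https://login.examplebank.com/style.css">
<img src="https://cdn.examplebank.com/logo.png" />
<script src="/js/capture.js"></script>
"""

print(hotlink_ratio(PAGE_URL, SAMPLE_HTML, "examplebank.com"))
```

A high ratio on an unrelated domain is a prompt for review rather than proof, since partners and press pages may legitimately embed brand assets.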
Telegram bot token exposure. Kits with exposed Telegram tokens can be passively monitored or actively disrupted by querying the bot API. Reporting exposed bot tokens to Telegram for deactivation is a legitimate takedown mechanism.
Kit-specific IOCs. Specific PHP file names, directory structures, and code signatures in kits are detectable by web application firewalls and endpoint controls when a kit is uploaded to a server you operate, for example a compromised website being used to host the phishing page.
The Telegram token I mentioned earlier got revoked within a day, which is normal — operators rotate fast once a kit’s public. But for that one day, a ten-minute grep produced more actionable intel than most of the automated detection tooling I’ve worked with. That asymmetry — cheap analysis, disproportionate payoff — is why this belongs in every phishing response workflow, not just the well-funded ones.