The test works. The operating model around it does not.
The penetration test itself is rarely the problem. The methodology is sound. The testers are competent. The report is thorough. And then nothing happens. Only 48% of all vulnerabilities identified in penetration tests are ever resolved. The median time to resolve findings is 67 days -- nearly five times the 14-day SLA most organisations set for themselves. Meanwhile, the average eCrime breakout time -- from initial access to lateral movement -- is 29 minutes.
The mismatch is structural. A standard penetration test runs for one to three weeks. It captures a snapshot of your environment at a single point in time. By the time you receive the report, your environment has changed: new code has been deployed, configurations have shifted, staff have rotated. NIST registered 40,000 CVEs in 2024, up 43% from the prior year. Time-to-exploit for new vulnerabilities has collapsed from 32 days in 2022 to just 5 days in 2023--2024.
This is not an argument against penetration testing. It is an argument against treating a pen test as a complete security programme. An annual pen test without continuous vulnerability management, without a functioning remediation pipeline, without executive accountability for findings -- that is compliance theatre, not security assurance. As our assessment sequencing guide explains, the right assessment at the wrong time delivers the wrong outcome.
The uncomfortable truth: A penetration test tells you what a skilled attacker could achieve in a constrained window. It does not tell you what a persistent threat actor will do over months with unlimited patience. The value is in acting on findings, not in receiving the report.
Organisations that extract genuine value from pen testing share three characteristics: they have a functioning vulnerability management process that tracks findings to closure, executive-level accountability for remediation timelines, and they treat the pen test as one input into a continuous security programme rather than the programme itself. A Lighthouse Assessment can help determine whether your organisation is ready to extract real value from a pen test or whether foundational controls should come first.
The findings are predictable. The remediation gaps are where the risk lives.
After a decade of conducting penetration tests, the most common finding categories remain stubbornly consistent. Security misconfigurations account for 20--30% of all issues. Broken access control is OWASP's number one category in 2025. Cross-site scripting represents 18.4% of web vulnerabilities. Weak or default passwords remain the most common internal pen test finding. And outdated or unpatched software appears in 60% of organisations.
The proportion of serious findings has actually declined from 20% to 11% over the past decade -- a real improvement. But that improvement has plateaued, and it masks a critical gap: only 69% of serious (high and critical) findings are addressed, while 31% remain open. Large enterprises leave roughly 45% of discovered vulnerabilities unresolved after 12 months. Small firms with fewer than 200 employees face the greatest concentration of risk, accounting for 87% of all critical and high findings.
Business logic flaws -- the findings scanners will never catch
The most valuable penetration test findings are often the ones that automated tools cannot detect. Business logic vulnerabilities exploit an application's intended functionality in unintended ways -- working code doing something it should not from a business perspective. OWASP explicitly states they "cannot be detected by a vulnerability scanner and rely upon the skills and creativity of the penetration tester."
Real-world examples illustrate the impact. Negative quantity manipulation has reduced product prices from hundreds of dollars to cents in e-commerce systems. Race conditions in gift card transfer systems have enabled unlimited credit generation. Booking system APIs have exposed passenger data modification capabilities using only a six-character reference -- no authentication required. These vulnerabilities are invisible to automated scanning but represent enormous business risk. Our third-party security risk guide covers how business logic flaws in vendor systems create additional attack surface.
If your pen test report reads like a vulnerability scan export -- a list of CVEs with CVSS scores and no business context -- you are paying pen test prices for scanner results. A genuine penetration test demonstrates exploitation, chains vulnerabilities together, and communicates business impact.
Not all pen tests are the same. Choosing the wrong type wastes budget and creates blind spots.
Penetration tests are classified by knowledge level (black box, grey box, or white box) and by target domain. The right choice depends on your threat model, regulatory obligations, and what you are trying to learn. Most organisations should start with external network and web application testing, then expand scope based on findings and maturity. See our essential assessments guide for sequencing advice.
Black box, grey box, white box -- which approach?
Black box testing (zero prior knowledge) simulates an external attacker with no inside information. White box (full access to source code, architecture docs, and credentials) maximises coverage. Grey box (partial knowledge, typically valid credentials and basic architecture understanding) sits between them.
Our position: grey box is almost always the right default. It reflects the most realistic threat scenario -- an attacker with initial access, stolen credentials, or insider knowledge -- and delivers the best value for testing budget. Black box wastes time on reconnaissance that duplicates what a real attacker would acquire quickly through OSINT or credential theft. White box is appropriate for specific scenarios like source code review or pre-deployment testing.
Scanners find known vulnerabilities. Testers find what matters.
The distinction between automated vulnerability scanning and manual penetration testing is foundational, yet frequently blurred by providers who deliver scanner output repackaged as manual testing. Both are necessary. Neither is sufficient alone. Manual pen testing has uncovered nearly 2,000 times more unique security vulnerabilities than automated scans alone.
| Dimension | Automated scanning | Manual penetration testing |
|---|---|---|
| Scope | Broad infrastructure coverage; checks against databases of known CVEs | Targeted, risk-based; focuses on high-value assets and attack paths |
| Depth | Surface-level; identifies individual vulnerabilities in isolation | Chains vulnerabilities across layers to demonstrate real exploitation paths |
| Business logic | Cannot detect; scanners do not understand business context | Core strength; testers identify flaws in intended application behaviour |
| False positives | 3--48% for SAST tools; legacy DAST tools up to 82%. Triaging a single finding takes ~10 minutes | Very low; testers validate every finding through exploitation |
| Frequency | Continuous or scheduled; suitable for daily/weekly/monthly execution | Periodic; typically annual or after significant changes |
| Cost | Lower per scan; scales efficiently across large environments | Higher per engagement; cost reflects skilled human effort |
| AI/LLM testing | Emerging tools but limited effectiveness for novel attack patterns | Essential; prompt injection and jailbreaking require creative human testing |
You need both. Automated scanning provides the broad, continuous baseline: catch known CVEs, monitor for configuration drift, flag newly disclosed vulnerabilities. Manual penetration testing provides the depth: validate what is actually exploitable, chain findings into real attack paths, test business logic, and demonstrate business impact. Replacing manual testing with scanning is like replacing a building inspector with a smoke detector -- useful, but fundamentally different.
The optimal model uses automated scanning for continuous coverage and schedules manual application testing for depth validation of high-risk areas. Organisations using this combined approach reduce alert noise and focus effort where it counts.
If you have deployed an AI chatbot, you have a new attack surface most pen tests ignore.
AI-related vulnerability reports grew 210% in 2025. Prompt injection reports surged 540%. Yet most penetration test scopes do not include AI or LLM testing because the methodologies are new and the testers lack experience. If your organisation has deployed customer-facing AI, you have an attack surface that your standard pen test is not examining.
The OWASP Top 10 for LLM Applications 2025 defines the key risk categories. Prompt injection -- manipulating model responses via crafted inputs -- is ranked the number one risk, with OWASP acknowledging that fool-proof prevention methods may not exist. Twenty per cent of jailbreak attempts succeed, with the average attack taking just 42 seconds across five interactions. Ninety per cent of successful prompt injections result in leakage of sensitive data.
Real failures that demonstrate the stakes
A car dealership's AI chatbot was manipulated into agreeing to sell a vehicle for one dollar. A parcel delivery firm's chatbot was prompted to swear and recommend competitors, reaching 1.3 million views. Most consequentially, an airline's chatbot gave incorrect refund policy advice, and the tribunal ruled that companies remain liable for information provided by their AI chatbots. In Australia, a major retailer's AI shopping assistant began fabricating personal stories during customer interactions in early 2026.
Standard penetration testing methodologies were not designed for this attack surface. LLM-specific testing requires understanding of prompt injection techniques, multi-turn attack sequences, RAG pipeline poisoning, data exfiltration through conversation manipulation, and agentic AI abuse patterns. Our secure AI adoption guide covers the broader risk framework, and our secure AI services include dedicated AI system testing.
Questions to ask your pen test provider about AI testing: Do your testers have experience with prompt injection and jailbreaking techniques? Do you test against the OWASP Top 10 for LLM Applications? Can you test multi-turn attack sequences, not just single-prompt injections? Do you assess the full stack including RAG pipelines, tool-use capabilities, and system prompt leakage?
From vulnerability scanning to purple teaming -- match your testing to your maturity.
Security testing exists on a maturity curve. Each level builds on the one before it, and skipping levels wastes money. If done too soon, red teaming will expose problems your organisation already knows about but has not fixed. If done too late, penetration testing delivers diminishing returns because your basic vulnerabilities are already managed. The key is matching the test to your current maturity.
Our position: most Australian mid-market organisations should be at Level 2, progressing toward Level 3. Jumping to red teaming before your penetration test findings are largely resolved is paying for advanced testing while basic problems persist. Conversely, if your pen tests consistently return clean results, you have outgrown Level 2 and should be testing your detection and response capabilities at Level 3.
Rotate testers, not necessarily vendors. And know what accreditations actually mean.
No major regulatory framework -- including APRA CPS 234, PCI DSS v4.0, or ISO 27001 -- mandates the rotation of penetration testing providers. They mandate independence, competence, regularity, and systematic approaches, but leave provider selection to organisational discretion.
The case for rotation centres on fresh perspectives: different testers with different backgrounds find different things. New testers approach assessments without the preconceived notions that develop over time. The counter-argument is equally compelling: rotation causes loss of institutional knowledge, imposes learning curves that can reduce effectiveness, and sacrifices historical trend data.
Our position: rotate the individual testers every two to three engagements, not necessarily the vendor. What matters is fresh eyes on the scope, not a new logo on the report. The ideal model has the previous tester perform quality assurance on the new tester's findings, providing fresh perspective while retaining institutional knowledge. This delivers the benefits of rotation without the costs of starting from scratch.
Accreditations that matter in Australia
CREST accreditation is the primary quality assurance mechanism for penetration testing in Australia. CREST requires dual-factor recognition: both the organisation and the individual testers must meet accreditation and certification standards. Member companies undergo rigorous assessment covering operating procedures, personnel security, testing approach, and data security, with full re-assessment every three years. PCI DSS v4.0 specifically references CREST as a recommended certification. Many Australian Government agencies and APRA-regulated entities require or prefer CREST-accredited providers.
Individual tester certifications to look for include OSCP (Offensive Security Certified Professional, demonstrating practical exploitation skills), OSCE/OSWE (advanced web and exploit development), and GPEN/GXPN (SANS-based penetration testing certifications). The certifications matter less than the practical experience behind them, but they provide a baseline indicator of competence.
Red flags in pen test proposals
Watch for these warning signs when evaluating proposals. A scope that is purely automated with no manual testing effort described. No named testers on the engagement. No methodology description or reference to standards like OWASP, PTES, or CREST. A deliverable that is a scanner export with a cover page. A price significantly below market that cannot be explained by reduced scope. And no provision for retesting of critical and high findings -- if your provider does not offer retesting, they are not invested in your remediation outcome.
What APRA, ASD, and the SOCI Act actually require -- and what they leave to your judgement.
Australian regulators are converging on the expectation that penetration testing is a minimum baseline, not an optional extra. The debate is no longer whether to test, but how often and how deeply. The following summarises what each framework actually requires for security testing, linking to our detailed guides where available.
The regulatory intent is consistent: systematic, independent, risk-proportionate security testing at minimum annually. Organisations subject to multiple frameworks -- and most mid-to-large Australian organisations are -- can align a single well-scoped testing programme to satisfy overlapping requirements. A cybersecurity audit can map your testing programme to all applicable regulatory obligations.
A vulnerability found in development costs 6x less to fix than one found in production.
The economic case for finding vulnerabilities earlier is well established. Fixing a bug found during implementation costs approximately 6 times more than one identified during design. During testing, 15 times more. Post-release, 30 times more. The global average data breach cost reached USD 4.88 million in 2024, a 10% increase from the prior year, while breaches contained within 30 days cost USD 1.76 million less than those that took longer.
Integrating security testing into the software development lifecycle -- SAST in code review, DAST in staging, software composition analysis in CI/CD pipelines -- catches the low-hanging vulnerabilities before they reach production. Seventy-four per cent of security professionals have shifted left or plan to. Over half of DevOps teams run SAST scans and approximately 50% scan containers and dependencies.
This does not eliminate the need for penetration testing. It changes what pen testing is for. When automated security checks catch the known vulnerabilities in development, penetration testers spend their time on what they are uniquely qualified to find: business logic flaws, complex attack chains, architectural weaknesses, and contextual risks that automated tools miss. Shift-left reduces the volume of low-hanging findings in your pen test report and increases the value of what testers spend their time on.
Our web application testing engagements are designed to complement DevSecOps pipelines, not duplicate them. We focus manual effort on the vulnerability classes that automation cannot reach, and our security architecture reviews can help embed testing into your SDLC from the start.
CREST-accredited testers. Named individuals. Manual testing. No scanner-only reports.
Every engagement starts with scoping. A Lighthouse Assessment determines the right type, depth, and timing of testing for your environment, threat profile, and regulatory obligations. We do not upsell testing you do not need, and we will tell you if your maturity level means foundational controls should come before a pen test.
Our testing methodology follows OWASP, PTES, and CREST standards. Manual testing is the core -- scanner results supplement human analysis, not replace it. Every engagement has named CREST-accredited testers assigned, not anonymous resources pulled from a bench. You know who is testing your systems and can speak to them directly about findings.
What you receive
Every penetration test delivers an executive summary with business impact context for board and leadership audiences, a detailed technical report with step-by-step reproduction instructions for every finding, risk-rated remediation recommendations prioritised by exploitability and business impact, and retesting of critical and high findings included as standard -- not an optional add-on.
We rotate testers across engagements to ensure fresh perspectives while maintaining institutional knowledge of your environment. And because we follow an assessment-first approach, our recommendations focus on what you actually need to fix, not on selling follow-on work.
Testing services
Our testing and assurance practice covers the full spectrum of security testing.
- →Infrastructure and network penetration testing -- external, internal, and segmentation validation
- →Web application and API testing -- OWASP-aligned with business logic focus
- →Social engineering assessments -- phishing, vishing, and physical intrusion
- →Breach simulation and red teaming -- objective-based adversary emulation
- →Wireless security assessments -- WiFi, segmentation, and rogue device detection
- →AI and LLM security testing -- prompt injection, jailbreaking, and RAG pipeline assessment