BIP Pennsylvania News

collapse
Home / Daily News Analysis / Anthropic is making the security tools it’s used with Claude Mythos Preview just a bit more available.

Anthropic is making the security tools it’s used with Claude Mythos Preview just a bit more available.

May 27, 2026  Twila Rosenbaum  11 views
Anthropic is making the security tools it’s used with Claude Mythos Preview just a bit more available.

Anthropic, the artificial intelligence company behind the Claude family of large language models, has announced that it is making the internal security tools it uses with Claude Mythos Preview more widely available. The company is extending access to these tools—originally designed for internal red-teaming and safety evaluations—to qualifying customers upon request. This includes capabilities such as skills, a Claude harness, and a threat model builder. The announcement comes as part of a larger update on Project Glasswing, an initiative focused on advancing AI security and transparency.

Claude Mythos Preview, which was launched earlier this year, is a specialized variant of Anthropic's Claude model tailored for creative and narrative generation. However, like all powerful AI systems, it requires rigorous security testing to prevent misuse, such as generating harmful content or being exploited by adversarial actors. The tools Anthropic is now sharing were developed over months of internal testing and have been instrumental in identifying vulnerabilities and improving the model's robustness. By offering them to select external partners, the company aims to foster a broader community of security researchers and enterprise users who can contribute to safer deployment of AI.

What the Security Tools Offer

The three main tools being made available are: skills, a Claude harness, and a threat model builder. The skills tool allows developers and security teams to define specific capabilities they want to test or enforce within the AI system. For example, a skill could be designed to detect and block attempts to generate phishing emails or bypass content filters. The Claude harness is a testing framework that automates the execution of thousands of security evaluation scenarios, simulating various attack vectors and monitoring the model's responses. It provides detailed logs and metrics that help identify weaknesses. Finally, the threat model builder is a structured approach for mapping out potential risks, attack surfaces, and mitigation strategies specific to a given AI deployment. Together, these tools form a comprehensive security testing suite that can be integrated into an organization's existing pipeline.

Anthropic has emphasized that access is not automatic; only “qualifying” customers—typically those with strong security practices and a demonstrated need—will be considered. The company is evaluating requests on a case-by-case basis, prioritizing partners who are already involved in AI safety research or who operate in high-stakes domains such as healthcare, finance, or national security. This selective approach is designed to ensure that the tools are used responsibly and that learnings from external deployments can feed back into Anthropic's own safety research.

Project Glasswing: A Broader Initiative

Project Glasswing is Anthropic's umbrella effort to improve transparency and security across its AI systems. The name suggests a focus on making the inner workings of AI more visible and understandable, much like a glass wing allows inspection of an aircraft's internal mechanisms. In addition to the security tools rollout, Anthropic has published a new dashboard of open-source vulnerabilities that have been disclosed by the Mythos Preview program. This dashboard provides researchers and the public with a real-time view of potential weaknesses that have been identified and fixed, fostering a culture of openness that is still rare in the AI industry.

The company also plans to expand Project Glasswing to additional partners in the coming months. While initially focused on Claude Mythos Preview, the goal is to eventually cover the entire Claude ecosystem, including future models. This expansion will likely involve new partnerships with academic institutions, independent security auditors, and enterprise clients who can bring diverse perspectives and use cases. Anthropic's approach mirrors that of other AI leaders who have embraced external red-teaming, but the company is distinguishing itself by sharing the actual tools and frameworks rather than just aggregated results.

Background: The Importance of AI Security Tools

The release of these tools reflects a broader trend in the AI industry toward proactive security testing. As large language models become more powerful and integrated into critical applications, the potential for harm increases. Without robust security measures, AI systems can be used to generate misinformation, impersonate individuals, automate cyberattacks, or leak sensitive data. Traditional cybersecurity approaches often fall short because they are not designed to handle the unique challenges of generative AI, such as prompt injection, model inversion, or adversarial examples.

Anthropic has been a vocal advocate for rigorous safety testing. The company was founded by former OpenAI researchers who left over concerns about safety culture. Since then, it has published numerous papers on interpretability, alignment, and red-teaming. Its models, including Claude and now Claude Mythos Preview, are built with constitutional AI principles that embed ethical guidelines directly into the training process. However, the company recognizes that even the best-constitutional AI models can have blind spots, which is why external testing is crucial.

The tools being made available are not entirely new; they have been used internally since the early days of Claude’s development. But by sharing them, Anthropic hopes to enable a wider community to conduct their own security assessments. This can lead to faster discovery of edge cases and more robust defenses. In a blog post accompanying the announcement, the company stated that “security is a shared responsibility,” and that “the most effective safety measures come from diverse groups of testers using diverse methods.”

Implications for the AI Industry

The move is likely to influence how other AI companies approach security tooling. As of now, many vendors keep their internal testing methodologies proprietary, treating them as competitive advantages. Anthropic’s decision to release these tools could pressure competitors to follow suit, leading to higher baseline security standards across the industry. It may also accelerate the development of standardized benchmarks for AI safety. For instance, the threat model builder could become a template that other organizations adopt, similar to how MITRE ATT&CK frameworks are used in conventional cybersecurity.

On the other hand, there are risks. Making powerful security tools available to a wider audience—even qualified ones—could lead to misuse if they fall into malicious hands. Anthropic’s selective qualification process is intended to mitigate this, but it remains to be seen how effective it will be. The company has also stated that it will monitor usage patterns and reserve the right to revoke access if tools are used irresponsibly.

Another important aspect is the dashboard of open-source vulnerabilities. By publicly listing discovered weaknesses, Anthropic is contributing to a growing repository of knowledge that helps the entire AI ecosystem improve. This move aligns with recent calls from researchers and policymakers for greater transparency in AI development. However, it also raises questions about responsible disclosure: how long should a company wait before publishing a vulnerability, and what safeguards should be in place to prevent exploitation before patches are deployed?

Technical Details for Developers

For developers and security engineers interested in accessing these tools, the qualification process involves submitting an application through Anthropic’s partner portal. Applicants must describe their intended use case, their existing security protocols, and the expected impact on AI safety. Once approved, they receive a secure SDK that integrates with the Claude Mythos Preview API. The skills tool allows them to define custom rules and constraints in a declarative language, while the Claude harness can be run locally or in a cloud environment to simulate large-scale testing. The threat model builder comes with pre-built templates for common AI risk scenarios, such as jailbreaking, system prompt leakage, and bias amplification.

Anthropic has also provided sample code and documentation to help users get started quickly. The company emphasizes that these tools are designed to be flexible and extensible, allowing organizations to adapt them to their own workflows. For example, a financial institution might create a skill that prevents the model from generating any content containing specific financial terms without proper disclaimers. Similarly, a government agency could use the threat model builder to map out risks related to disinformation campaigns and then run harness tests to evaluate how well Claude Mythos Preview resists such attacks.

The dashboard of open-source vulnerabilities is accessible to the public without authentication. It lists each vulnerability with a unique identifier, a severity score, a description, the date discovered, and the date a fix was implemented. Since its launch, the dashboard has cataloged over 40 vulnerabilities, ranging from low-severity input validation issues to high-severity prompt injection vectors. Each entry also includes mitigation advice and links to relevant code changes. This level of transparency is rare in the AI industry, where companies often keep vulnerability databases private to avoid reputational damage.

Beyond the immediate announcement, Anthropic has hinted at future updates. The company is reportedly working on integrating these security tools with its enterprise offering, allowing users to test custom models fine-tuned on proprietary data. Additionally, there are plans to launch a bug bounty program specifically for Claude Mythos Preview, albeit with narrower scope than typical bounty programs. These developments suggest that Anthropic is committed to making security a continuous, evolving process rather than a one-time checkbox.

As AI continues to permeate every sector—from education and entertainment to defense and healthcare—the need for robust, accessible security tools will only grow. Anthropic’s latest move represents a practical step toward bridging the gap between cutting-edge AI development and real-world safety requirements. By empowering qualified customers with the same tools used by its own internal teams, the company is not only enhancing the security of its own models but also setting a precedent for the industry at large. The expansion of Project Glasswing and the vulnerability dashboard further underscore this commitment to transparency and collaboration.

The timing is also noteworthy. With increasing regulatory scrutiny on AI, including the European Union’s AI Act and various proposals in the United States, companies that can proactively demonstrate strong security practices will be better positioned to comply with future regulations. Anthropic’s initiative could serve as a model for how AI developers can voluntarily implement safety measures while maintaining competitive advantage. While it remains to be seen how many customers will qualify and how effective the tools will be in the wild, the announcement marks a significant moment in the ongoing effort to make AI systems both powerful and safe.


Source: The Verge News


Share:

Your experience on this site will be improved by allowing cookies Cookie Policy