Article
OpenAI announced a limited preview of GPT-5.6, introducing three models under a new naming ladder: Sol, the strongest; Terra, a balanced lower-cost option; and Luna, a faster and cheaper model. The company says Sol gains a new max reasoning-effort setting plus an ultra mode that uses subagents, and it reports gains on coding (Terminal-Bench 2.1), biology (GeneBench v1), and cybersecurity (ExploitBench, ExploitGym), including better performance per token and lower token usage on some tasks. OpenAI says safety was hardened through layered defenses: model-level refusal training, real-time generation checks, account-level behavior signals, monitoring, and rapid-response jailbreak fixes, backed by extensive automated and human red teaming. It also says the preview is limited at government request, with initial access restricted to trusted partners while it prepares a broader release and contributes to the U.S. policy process. The company discloses that Sol did not achieve autonomous full-chain exploitation in the Chrome and Firefox safety tests it ran, but says it does not treat that result as sufficient proof of safety and is keeping controls in place. Pricing is set per tier at 5/30, 2.5/15, and 1/6 (input/output, per million tokens), with updated cache rules and limited early access on Cerebras at up to 750 tokens per second. API and Codex rollout will expand gradually after the preview. Community reactions focus on whether the improvements are incremental or major, whether the safety language is meaningful, and whether the phased release is justified or whether broader access and open benchmarking would do more to build confidence.
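To make the per-tier prices quoted above concrete, here is a minimal sketch of the cost arithmetic. It rests on assumptions the article does not confirm: that the three price pairs map to Sol, Terra, and Luna in the order the tiers are listed, that each pair is (input price, output price) per million tokens in an unstated currency, and that cache discounts can be ignored because the updated cache rules are not detailed. The names `TierPrice`, `PRICES`, `request_cost`, and `min_generation_seconds` are hypothetical and do not come from any OpenAI API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TierPrice:
    """Price per million tokens; the currency is not stated in the article."""
    input_per_million: float
    output_per_million: float


# Assumption: pairs are assigned in the order the tiers are listed.
PRICES = {
    "sol": TierPrice(5.0, 30.0),
    "terra": TierPrice(2.5, 15.0),
    "luna": TierPrice(1.0, 6.0),
}


def request_cost(tier: str, input_tokens: int, output_tokens: int) -> float:
    """Cost of one request, ignoring cache discounts (details not given)."""
    price = PRICES[tier]
    total = (input_tokens * price.input_per_million
             + output_tokens * price.output_per_million)
    return total / 1_000_000


def min_generation_seconds(output_tokens: int,
                           tokens_per_second: float = 750.0) -> float:
    """Lower bound on generation time at the quoted peak throughput."""
    return output_tokens / tokens_per_second


if __name__ == "__main__":
    for tier in PRICES:
        print(tier, request_cost(tier, 200_000, 50_000))
    print(min_generation_seconds(1_500))
```

Under these assumptions, output tokens cost six times as much as input tokens in every tier, so output-heavy workloads dominate the bill. Because the article quotes 750 tokens per second as an upper bound, the time estimate is a best case rather than an expected latency.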
Commenters are split between optimism and skepticism. A large group is excited about potential latency and agentic-workflow gains, especially the announced 750-token-per-second Cerebras option and faster coding assistance. Others argue that the added value is mostly packaging, citing forced migration from older tiers, pricing pressure, naming churn, and confusing positioning across Sol, Terra, and Luna. Several users question the credibility of the benchmarks, pointing to prior leaks and anecdotal evidence, and challenge the comparisons with Anthropic’s Fable/Mythos, voicing concern that Sol may not clearly lead in real usage despite its leaderboard gains. Safety and governance draw heavy criticism: many dislike the selective rollout to U.S.-approved partners, predict arbitrary blocking in dual-use domains, and worry about account-level monitoring and false positives for legitimate security work. There is also debate over whether stronger cyber safeguards are technically possible without harming legitimate research, and whether the launch is mainly a response to competitors and policy pressure. Some commenters ask for practical details on usage limits, Codex tool-call behavior, and what subagents actually do in ultra mode. Despite the sharp tone of some comments, interest remains strong in coding performance, benchmark transparency, and broader availability rather than in marketing claims.