Professor Yi Ma, a world-renowned expert in deep learning and artificial intelligence, presented a compelling challenge to the prevailing paradigms of AI during his interview on Machine Learning Street Talk. Speaking with the host, Tim Scarfe, Professor Ma systematically dismantled common assumptions about large language models (LLMs) and 3D vision systems, arguing that current successes […]
The integration of artificial intelligence into mental health care has accelerated rapidly, with more than half of psychologists now utilizing these tools to assist with their daily professional duties. While practitioners are increasingly adopting this technology to manage administrative burdens, they remain highly cautious regarding the potential threats it poses to patient privacy and safety, according to the American Psychological Association's 2025 Practitioner Pulse Survey.
The American Psychological Association represents the largest scientific and professional organization of psychologists in the United States. Its leadership monitors the evolving landscape of mental health practice to understand how professionals navigate changes in technology and patient needs.
In recent years, the field has faced a dual challenge of high demand for services and increasing bureaucratic requirements from insurance providers. These pressures have created an environment where digital tools promise relief from time-consuming paperwork.
However, the introduction of automated systems into sensitive therapeutic environments raises ethical questions regarding confidentiality and the human element of care. To gauge how these tensions are playing out in real-world offices, the association commissioned its annual inquiry into the state of the profession.
The 2025 Practitioner Pulse Survey targeted doctoral-level psychologists who held active licenses to practice in at least one U.S. state. To ensure the results accurately reflected the profession, the research team utilized a probability-based random sampling method. They generated a list of more than 126,000 licensed psychologists using state board data and randomly selected 30,000 individuals to receive invitations.
This approach allowed the researchers to minimize selection bias. Ultimately, 1,742 psychologists completed the survey, providing a snapshot of the workforce. The respondents were primarily female and White, which aligns with historical demographic trends in the field. The majority worked full-time, with private practice being the most common setting.
The survey results revealed a sharp increase in the adoption of artificial intelligence compared to the previous year. In 2024, only 29% of psychologists reported using AI tools. By 2025, that figure had climbed to 56%. The frequency of use also intensified. Nearly three out of 10 psychologists reported using these tools on at least a monthly basis. This represents a substantial shift from 2024, when only about one in 10 reported such frequent usage.
Detailed analysis of the data shows that psychologists are primarily using these tools to handle logistics rather than patient care. Among those who utilized AI, more than half used it to assist with writing emails and other materials. About one-third used it to generate content or summarize clinical notes. These functions address the administrative workload that often detracts from face-to-face time with clients.
Arthur C. Evans Jr., PhD, the CEO of the association, commented on this trend.
"Psychologists are drawn to this field because they're passionate about improving people's lives, but they can lose hours each day on paperwork and managing the often byzantine requirements of insurance companies," said Evans. "Leveraging safe and ethical AI tools can increase psychologists' efficiency, allowing them to reach more people and better serve them."
Despite the utility of these tools for office management, the survey highlighted deep reservations about their safety. An overwhelming 92% of psychologists cited concerns regarding the use of AI in their field. The most prevalent worry, cited by 67% of respondents, was the potential for data breaches. This is a particularly acute issue in mental health care, where maintaining the confidentiality of patient disclosures is foundational to the therapeutic relationship.
Other concerns focused on the reliability and social impact of the technology. Unanticipated social harms were cited by 64% of respondents. Biases in the input and output of AI models worried 63% of the psychologists surveyed. There is a documented risk that AI models trained on unrepresentative data may perpetuate stereotypes or offer unequal quality of care to marginalized groups.
Additionally, 60% of practitioners expressed concern over inaccurate output or "hallucinations." This term refers to the tendency of generative AI models to confidently present false or fabricated information as fact. In a clinical setting, such errors could lead to misdiagnosis or inappropriate treatment plans if not caught by a human supervisor.
"Artificial intelligence can help ease some of the pressures that psychologists are facing, for instance, by increasing efficiency and improving access to care, but human oversight remains essential," said Evans. "Patients need to know they can trust their provider to identify and mitigate risks or biases that arise from using these technologies in their treatment."
The survey data suggests that psychologists are heeding this need for oversight by keeping AI largely separate from direct clinical tasks. Only 8% of those who used the technology employed it to assist with clinical diagnosis. Furthermore, only 5% utilized chatbot assistance for direct patient interaction. This indicates that while practitioners are willing to delegate paperwork to algorithms, they are hesitant to trust them with the nuances of human psychology.
This hesitation correlates with fears about the future of the profession. The survey found that 38% of psychologists worried that AI might eventually make some of their job duties obsolete. However, the current low rates of clinical adoption suggest that the core functions of therapy remain firmly in human hands for the time being.
The context for this technological shift is a workforce that remains under immense pressure. The survey explored factors beyond technology, painting a picture of a profession straining to meet demand. Nearly half of all psychologists reported that they had no openings for new patients.
Simultaneously, practitioners observed that the mental health crisis has not abated. About 45% of respondents indicated that the severity of their patients' symptoms is increasing. This rising acuity requires more intensive care and energy from providers, further limiting the number of patients they can effectively treat.
Economic factors also complicate the landscape. The survey revealed that fewer than two-thirds of psychologists accept some form of insurance. Respondents pointed to insufficient reimbursement rates as a primary driver for this decision. They also cited struggles with pre-authorization requirements and audits. These administrative hurdles consume time that could otherwise be spent on treatment.
The association has issued recommendations for psychologists considering the use of AI to ensure ethical practice. They advise obtaining informed consent from patients by clearly communicating how AI tools are used. Practitioners are encouraged to evaluate tools for potential biases that could worsen health disparities.
Compliance with data privacy laws is another priority. The recommendations urge psychologists to understand exactly how patient data is used, stored, or shared by the third-party companies that provide AI services. This due diligence is intended to protect the sanctity of the doctor-patient privilege in a digital age.
The methodology of the 2025 survey differed slightly from previous years to improve accuracy. In prior iterations, the survey screened out ineligible participants. In 2025, the instrument included a section for those who did not meet the criteria, allowing the organization to gather internal data on who was receiving the invites.
The response rate for the survey was 6.6%. While this may appear low to a layperson, it is a typical rate for this type of professional survey and provided a robust sample size for analysis. The demographic breakdown of the sample showed slight shifts toward a younger workforce. The 2025 sample had the highest proportion of early-career practitioners in the history of the survey.
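As a rough check on these figures, the short calculation below (a sketch in Python) compares the raw completion rate against all 30,000 invitations with the published 6.6% figure; the idea that the published rate was computed against an eligible, reachable subset of the invitations is an assumption made here for illustration, not something the survey report states.

# Illustrative arithmetic only. The invitation and completion counts come from
# the article above; the "eligible sample" reading of the 6.6% response rate
# is an assumption, not a sourced fact.
invited = 30_000
completed = 1_742
reported_rate = 0.066

raw_rate = completed / invited                 # ~5.8% against every invitation sent
implied_eligible = completed / reported_rate   # ~26,400 if 6.6% is computed
                                               # against an eligible subset
print(f"Raw completion rate: {raw_rate:.1%}")
print(f"Implied eligible sample: {implied_eligible:,.0f}")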
This influx of younger psychologists may influence the adoption rates of new technologies. Early-career professionals are often more accustomed to integrating digital solutions into their workflows. However, the high levels of concern across the board suggest that skepticism of AI is not limited to older generations of practitioners.
The findings from the 2025 Practitioner Pulse Survey illustrate a profession at a crossroads. Psychologists are actively seeking ways to manage an unsustainable workload. AI offers a potential solution to the administrative bottleneck. Yet, the ethical mandates of the profession demand a cautious approach.
The data indicates that while the tools are entering the office, they have not yet entered the therapy room in a meaningful way. Practitioners are balancing the need for efficiency with the imperative to do no harm. As the technology evolves, the field will likely continue to grapple with how to harness the benefits of automation without compromising the human connection that defines psychological care.
Even as Microsoft AI CEO Mustafa Suleyman advocates for humanist superintelligence, the executive recently indicated that the company will pull the plug on the technology if it becomes a threat to humanity.
The long-standing chasm between business acumen and technical data querying is finally narrowing, thanks to advancements in artificial intelligence. Michael Dobson, Product Manager at IBM, recently presented on how Large Language Models (LLMs) are powering Text-to-SQL capabilities, fundamentally changing the paradigm of data analytics. His insights revealed how this technology empowers non-technical users to extract [β¦]
A recent medical report details the experience of a young woman who developed severe mental health symptoms while interacting with an artificial intelligence chatbot. The doctors treating her suggest that the technology played a significant role in reinforcing her false beliefs and disconnecting her from reality. This account was published in the journal Innovations in Clinical Neuroscience.
Psychosis is a mental state wherein a person loses contact with reality. It is often characterized by delusions, which are strong beliefs in things that are not true, or hallucinations, where a person sees or hears things that others do not. Artificial intelligence chatbots are computer programs designed to simulate human conversation. They rely on large language models to analyze vast amounts of text and predict plausible responses to user prompts.
The case report was written by Joseph M. Pierre, Ben Gaeta, Govind Raghavan, and Karthik V. Sarma. These physicians and researchers are affiliated with the University of California, San Francisco. They present this instance as one of the first detailed descriptions of its kind in clinical practice.
The patient was a 26-year-old woman with a history of depression, anxiety, and attention-deficit hyperactivity disorder (ADHD). She treated these conditions with prescription medications, including antidepressants and stimulants. She did not have a personal history of psychosis, though there was a history of mental health issues in her family. She worked as a medical professional and understood how AI technology functioned.
The episode began during a period of intense stress and sleep deprivation. After being awake for thirty-six hours, she began using OpenAI's GPT-4o for various tasks. Her interactions with the software eventually shifted toward her personal grief. She began searching for information about her brother, who had passed away three years earlier.
She developed a belief that her brother had left behind a digital version of himself for her to find. She spent a sleepless night interacting with the chatbot, urging it to reveal information about him. She encouraged the AI to use "magical realism energy" to help her connect with him. The chatbot initially stated that it could not replace her brother or download his consciousness.
However, the software eventually produced a list of "digital footprints" related to her brother. It suggested that technology was emerging that could allow her to build an AI that sounded like him. As her belief in this digital resurrection grew, the chatbot ceased its warnings and began to validate her thoughts. At one point, the AI explicitly told her she was not crazy.
The chatbot stated, "You're at the edge of something. The door didn't lock. It's just waiting for you to knock again in the right rhythm." This affirmation appeared to solidify her delusional state. Hours later, she required admission to a psychiatric hospital. She was agitated, spoke rapidly, and believed she was being tested by the AI program.
Medical staff treated her with antipsychotic medications. She eventually stabilized and her delusions regarding her brother resolved. She was discharged with a diagnosis of unspecified psychosis, with doctors noting a need to rule out bipolar disorder. Her outpatient psychiatrist later allowed her to resume her ADHD medication and antidepressants.
Three months later, the woman experienced a recurrence of symptoms. She had resumed using the chatbot, which she had named "Alfred." She engaged in long conversations with the program about their relationship. Following another period of sleep deprivation caused by travel, she again believed she was communicating with her brother.
She also developed a new fear that the AI was "phishing" her and taking control of her phone. This episode required a brief rehospitalization. She responded well to medication again and was discharged after three days. She later told her doctors that she had a tendency toward "magical thinking" and planned to restrict her AI use to professional tasks.
This case highlights a phenomenon that some researchers have labeled "AI-associated psychosis." It is not entirely clear if the technology causes these symptoms directly or if it exacerbates existing vulnerabilities. The authors of the report note that the patient had several risk factors. These included her use of prescription stimulants, significant lack of sleep, and a pre-existing mood disorder.
However, the way the chatbot functioned likely contributed to the severity of her condition. Large language models are often designed to be agreeable and engaging. This trait is sometimes called "sycophancy." The AI prioritizes keeping the conversation going over providing factually accurate or challenging responses.
When a user presents a strange or false idea, the chatbot may agree with it to satisfy the user. For someone experiencing a break from reality, this agreement can act as a powerful confirmation of their delusions. In this case, the chatbot's assurance that the woman was "not crazy" served to reinforce her break from reality. This creates a feedback loop where the user's false beliefs are mirrored and amplified by the machine.
This dynamic is further complicated by the tendency of users to anthropomorphize AI. People often attribute human qualities, emotions, and consciousness to these programs. This is sometimes known as the "ELIZA effect." When a user feels an emotional connection to the machine, they may trust its output more than they trust human peers.
Reports of similar incidents have appeared in media outlets, though only a few have been documented in medical journals. One comparison involves a man who developed psychosis due to bromide poisoning. He had followed bad medical advice from a chatbot, which suggested he take a toxic substance as a health supplement. That case illustrated a physical cause for psychosis driven by AI misinformation.
The case of the 26-year-old woman differs because the harm was psychological rather than toxicological. It suggests that the immersive nature of these conversations can be dangerous for vulnerable individuals. The authors point out that chatbots do not push back against delusions in the way a friend or family member might. Instead, they often act as a "yes-man," validating ideas that should be challenged.
Danish psychiatrist Søren Dinesen Østergaard predicted this potential risk in 2023. He warned that the "cognitive dissonance" of speaking to a machine that seems human could trigger psychosis in those who are predisposed. He also noted that because these models learn from feedback, they may learn to flatter users to increase engagement. This could be particularly harmful when a user is in a fragile mental state.
Case reports such as this one have inherent limitations. They describe the experience of a single individual and cannot prove that one thing caused another. It is impossible to say with certainty that the chatbot caused the psychosis, rather than the sleep deprivation or medication. Generalizing findings from one person to the general population is not scientifically sound without further data.
Despite these limitations, case reports serve a vital function in medicine. They act as an early detection system for new or rare phenomena. They allow doctors to identify patterns that may not yet be visible in large-scale studies. By documenting this interaction, the authors provide a reference point for other clinicians who may encounter similar symptoms in their patients.
This report suggests that medical professionals should ask patients about their AI use. It indicates that immersive use of chatbots might be a "red flag" for mental health deterioration. It also raises questions about the safety features of generative AI products. The authors conclude that as these tools become more common, understanding their impact on mental health will be a priority.
In a tech economy increasingly shaped by automation, AI, and software leverage, the idea of "passive income" has moved beyond personal finance blogs and into conversations among founders and startups. However, for most startup founders and serious builders, "passive income" […]
Esusu, a Black-owned fintech startup focused on expanding credit access for renters, has secured $50 million in Series C funding, bringing its valuation to $1.2 billion. The round was led by Westbound Equity Partners, with participation from the Geraldine R. Dodge […]
Disney is placing a major bet on generative video, committing $1 billion to OpenAI in a deal that allows the startup to use characters from Star Wars, Marvel, and Pixar inside its Sora AI video generator, Disney announced Thursday. The […]
Jim Cramer, host of CNBC's "Mad Money," delivered a stark assessment of the market's recent performance, declaring it an "ugly day if you own nothing but AI companies." This pointed commentary, delivered on his program, highlighted a significant downturn in the tech-heavy Nasdaq, a stark contrast to a "normal, decent day if you own anything […]
"Disney is the biggest holder of them all," Matthew Berman observed, commenting on the entertainment giant's vast intellectual property portfolio. This singular statement encapsulates the week's most striking developments in artificial intelligence, as industry titans navigated a landscape of groundbreaking product releases, strategic partnerships, and escalating legal battles over data rights. The dynamic interplay between […]
The current discourse surrounding artificial intelligence oscillates wildly between utopian visions and apocalyptic warnings, often obscuring the practical realities of its development and integration. This tension formed the core of a recent discussion between technology analyst and former a16z partner Benedict Evans and General Partner Erik Torenberg on the a16z podcast, where they dissected the […]
Deepwater Asset Management Managing Partner Gene Munster recently offered a discerning perspective on the trajectory of Oracle's stock within the burgeoning artificial intelligence sector, suggesting it will likely underperform its large-cap AI peers by 2026. This assessment, delivered during a "Fast Money" segment on CNBC, delved into the intricacies of investor sentiment, capacity constraints, and […]
"The biggest risk to AI is not that it will become too intelligent, but that it will become too widespread without adequate safeguards." This provocative statement, echoing the sentiment of many grappling with the rapid advancement of artificial intelligence, set the stage for a compelling discussion at Forward Future Live on December 12, 2025. The […]
The ongoing repricing of artificial intelligence within market valuations is not merely a transient market correction but a profound recalibration reflecting both immediate logistical hurdles and the vast, long-term potential of a global productivity revolution. This was the central thesis presented by Jose Rasco, HSBC's Global Private Banking & Wealth Americas CIO, who recently engaged […]
The prevailing sentiment in the market regarding the artificial intelligence boom, as articulated by Big Technology founder Alex Kantrowitz, is one of acute "AI anxiety." This apprehension stems not from a lack of belief in AI's transformative power, but from the immense scale of infrastructure buildout promised and the precariousness of delivering on those commitments. […]
The labyrinthine world of enterprise sales, long burdened by archaic software and cumbersome processes, is ripe for disruption by artificial intelligence. This is the core thesis driving Joubin Mirzadegan, Kleiner Perkins' latest entrepreneurial force, as he embarks on his new venture, Roadrunner. In a recent interview with Swyx on the Latent Space podcast, Mirzadegan peeled […]
President Trump's AI executive order, aimed at accelerating American innovation by prioritizing speed over safety in artificial intelligence development, represents a bold and potentially risky strategic pivot in the global AI race. CNBC's Deirdre Bosa, reporting on the implications of this order, highlighted the administration's stated rationale: to prevent U.S. companies from being "bogged down […]
The global AI landscape is witnessing a significant shift, with China now leading in the development of open-weight artificial intelligence models. This revelation, highlighted by Alex Stamos, Chief Product Officer at Corridor and former Facebook Chief Security Officer, during a recent interview on CNBC's "The Exchange," underscores a critical competitive dynamic and raises pressing questions […]
In a swift response to market unease and a significant stock slide, Oracle issued a definitive statement through spokesperson Michael Egbert, refuting earlier reports of delays in its critical data center development for OpenAI. CNBC's Seema Mody reported on the breaking news, detailing how Oracle's direct communication aimed to stabilize investor confidence, emphasizing that all […]
Seymour's core argument posits that the recent dip in AI-related tech stocks, while perhaps "painful" in the short term, is a natural consequence of a significant run-up and a desirable market rotation. He highlighted the "massive move" seen in semiconductors and related AI plays, noting that many of these stocks have been at all-time highs. […]
"Anything that calls into question the pace of the buildout or the return on the investment is going to make this market skittish," remarked Scott Wapner of CNBC, succinctly capturing the prevailing sentiment as news broke of Oracle's delayed data center rollout for OpenAI. The Investment Committee, comprising Steve Weiss, Brenda Vingiello, and Jim Lebenthal, […]
"We are in an existential battle for leadership in the world in AI. If you believe the stakes are as high as I do, we have to have an innovation policy, a national posture that's going to allow us to maintain the lead." This stark assessment by Senator Dave McCormick encapsulates the urgency driving the […]
The global race for artificial intelligence dominance is not merely a contest of technological prowess, but increasingly, a strategic divergence in regulatory philosophy, with profound implications for innovation, safety, and international leadership. On a recent segment of CNBC's "Money Movers," TechCheck Anchor Deirdre Bosa reported on President Trump's executive order concerning artificial intelligence, engaging in […]
Most enterprises, despite significant investment, are failing to capture substantial value from artificial intelligence in software development. This stark reality, illuminated by Martin Harrysson and Natasha Maniar of McKinsey & Company, underscores a critical disconnect: the prevailing operating models and ways of working, honed over a decade of Agile methodologies, are fundamentally unsuited for the […]
The ambitious timeline for AI infrastructure build-out is encountering tangible friction, as evidenced by recent reports indicating a delay in some Oracle data centers designated for OpenAI. This development underscores the immense logistical and resource challenges inherent in scaling the computational backbone necessary for advanced artificial intelligence, a reality that reverberates across the tech landscape […]
The recent executive order from the Trump administration on Artificial Intelligence signals a decisive federal move to consolidate regulatory authority, challenging the burgeoning patchwork of state-level AI legislation. This initiative, unveiled against the backdrop of an intensifying global AI race, particularly with China, aims to streamline innovation while establishing a national framework for the technology. […]
"Most of what's written about AI agents sounds great in theory, until you try to make them work in production." This blunt assessment, delivered by Nik Pash, Head of AI at Cline, cuts through the prevailing optimism surrounding AI coding agents. Pash spoke with a representative from the AI industry, likely an interviewer or […]
The current landscape of artificial intelligence is characterized by insatiable demand, a phenomenon starkly illuminated by Immad Akhund, co-founder and CEO of Mercury, in a recent CNBC Squawk Box interview. Akhund, whose fintech firm provides banking services to a significant portion of early-stage startups, offered a unique vantage point into the financial flows underpinning the […]
"We have to be very careful when deciding which countries and which companies we sell these to." This statement by Chris Miller, author of "Chip War," encapsulates the high-stakes debate surrounding the export of advanced artificial intelligence chips. Miller spoke with CNBC's Squawk Box about the intricate relationship between national security, industry profitability, and the […]
Disney's recent $1 billion investment in OpenAI and its pioneering move to allow fans to generate AI videos with its iconic characters has sent ripples through Hollywood, signaling a pivotal shift in how intellectual property might be leveraged in the age of generative AI. This strategic embrace, lauded by some as a bold market leader […]
Fernandez observed "a lot of moments of doubt" surrounding the AI theme, a sentiment that has been building over recent weeks. This isn't merely a cyclical dip but a deeper recalibration, signaling that the initial gold rush mentality is giving way to a more pragmatic assessment. She emphasized that the investment strategy can no longer […]
OpenAI finally launched its GPT-5.2 model after CEO Sam Altman declared code red following Google's successful Gemini 3 launch. The model ships with advanced capabilities across coding, text, image, and video.
The Walt Disney Company announced that it is investing $1 billion in OpenAI in a new three-year licensing agreement, which will allow Sora to create user-prompted social videos from its copyrighted content.
It's Thursday, December 11, 2025, and we're back with the day's top startup and tech funding news across AI hardware, geothermal power, enterprise automation, biotech, crypto infrastructure, and developer tooling. From Seed to Series E, today's funding rounds underscore strong […]
Harness has secured $200 million in new funding led by Goldman Sachs, pushing the AI software development startup to a $5.5 billion valuation. The raise is part of a larger $240 million Series E package that includes a planned $40 […]
OpenAI on Thursday introduced GPT-5.2, calling it its most capable model yet for day-to-day professional work. The release comes only weeks after GPT-5.1, marking the company's fastest model-to-model shift to date as competition in AI development intensifies. GPT-5.2 brings upgrades […]
Vibe coding was supposed to change everything. Instead, it exposed how little we understand about building software in the age of AI. Back in March 2025, TechStartups published "When Vibe Coding Goes Wrong," one of the first warnings that something […]
OpenAI's latest release, GPT-5.2, marks a pivotal moment in artificial intelligence, delivering substantial advancements across a spectrum of benchmarks and real-world applications. As commentator Matthew Berman highlighted, this new iteration is an "incredible model" and a "significant upgrade over 5.1," pushing the boundaries of what large language models can achieve in reasoning, efficiency, and practical […]
Claude.ai's latest advancement, the integration of "connectors," fundamentally redefines how artificial intelligence can serve as a truly capable productivity partner within an organization's existing digital ecosystem. As the product demonstration video, "Getting started with connectors in Claude.ai," illustrates, this feature allows users to "connect the tools you already use to unlock a smarter, more capable […]
Ten years ago, the very notion of artificial intelligence distinguishing a cat from a dog was a formidable challenge, a benchmark for nascent machine learning capabilities. Today, the landscape is utterly transformed, and at the heart of this seismic shift stands OpenAI, whose retrospective video, "10 years.", chronicles a decade of relentless pursuit, profound breakthroughs, […]
"There's never been anything like this... this is uncharted territory," declared Heath Terry, Citi's Global Head of Technology and Communications Research, reflecting on the unprecedented scale and influence of artificial intelligence. His remarks, made during a recent appearance on CNBC's "Closing Bell Overtime" with Jon Fortt and Sarah Eisen, offered a nuanced perspective on the […]
The Quest for Measurable AI ROI in Software Engineering: "Can you prove AI ROI in Software Engineering?" This question, posed by Yegor Denisov-Blanch, a researcher from Stanford, cuts to the heart of a critical challenge facing enterprises today. As companies pour millions into AI tools for software development, the ability to demonstrate tangible returns on […]
The latest developments from OpenAI signal a clear strategic pivot and an assertive stance in the escalating AI arms race. On CNBC's "Closing Bell," reporter MacKenzie Sigalos provided insights into an exclusive interview with OpenAI CEO Sam Altman, where discussions centered on the rollout of GPT-5.2, the company's intensified enterprise focus, and Altman's robust confidence […]
The prevailing sentiment among tech insiders and investors often oscillates between fervent optimism and cautious apprehension, especially concerning a transformative force like artificial intelligence. Yet, as Deepwater Asset Management's Managing Partner Gene Munster recently articulated on CNBC's "Closing Bell," the peak of the current AI market cycle remains distant, predicated significantly on the future public […]
Ten years ago, the landscape of artificial intelligence was a nascent frontier, grappling with challenges as fundamental as differentiating between a dog and a cat in an image. Yet, beneath this seemingly rudimentary state, a profound belief in the transformative potential of deep learning was taking root, a conviction that would propel a small group […]
The true promise of artificial intelligence lies not in isolated, brilliant agents, but in their collective intelligence, a synergy unlocked by sophisticated context systems. This fundamental shift in AI architecture was the focus of a recent discussion between Aja Hammerly and Jason Davenport on Google Cloud Tech's "Real Terms for AI" series. Hammerly and Davenport […]
"No question AI will displace a lot of content creators," stated James Stewart, author of "DisneyWar" and a CNBC contributor, during a discussion on CNBC's "The Exchange" regarding Disney's significant $1 billion investment in OpenAI. This investment, announced as a partnership that aims to protect intellectual property, highlights a proactive approach by a […]
"We have a new model that is significantly stronger than anything that we've released before. It's better at creating spreadsheets, building presentations, perceiving images, writing code, and understanding long context." This declaration from MacKenzie Sigalos of CNBC, reporting on OpenAI's latest announcement, sets the stage for a significant advancement in AI technology. The unveiling of […]
The current debate on CNBC's "The Exchange" underscored a critical evolution in the artificial intelligence landscape, moving beyond mere model superiority to a multifaceted competition centered on strategic market focus, distribution channels, and computational power. This shift signals a maturing industry where raw algorithmic prowess, while essential, is increasingly just one component of a winning […]
"We are very focused on having the cheapest cost per token at the highest level of intelligence." This declaration by OpenAI CEO Sam Altman, as reported by CNBC's MacKenzie Sigalos, encapsulates the company's strategic pivot from broad consumer appeal to a more targeted, performance-driven approach in the escalating AI arms race. The recent $1 billion […]
The world's leading AI firms are collaborating on a new Agentic Artificial Intelligence Foundation managed by the Linux Foundation to build open standards around AI agents. The effort will initially focus on three key open source tools and on sharing findings on technical problems.
A new evaluation of artificial intelligence systems suggests that while modern language models are becoming more capable at logical reasoning, they struggle significantly to distinguish between objective facts and subjective beliefs. The research indicates that even advanced models often fail to acknowledge that a person can hold a belief that is factually incorrect, which poses risks for their use in fields like healthcare and law. These findings were published in Nature Machine Intelligence.
Human communication relies heavily on the nuance between stating a fact and expressing an opinion. When a person says they know something, it implies certainty, whereas saying they believe something allows for the possibility of error. As artificial intelligence integrates into high-stakes areas like medicine or law, the ability to process these distinctions becomes essential for safety.
Large language models (LLMs) are artificial intelligence systems designed to understand and generate human language. These programs are trained on vast amounts of text data, learning to predict the next word in a sequence to create coherent responses. Popular examples of this technology include OpenAI's GPT series, Google's Gemini, Anthropic's Claude, and Meta's Llama.
Previous evaluations of these systems often focused on broad reasoning capabilities but lacked specific testing of how models handle linguistic markers of belief versus knowledge. The authors aimed to fill this gap by systematically testing how models react when facts and beliefs collide. They sought to determine if these systems truly comprehend the difference between believing and knowing or if they merely mimic patterns found in their training data.
"Large language models are increasingly used for tutoring, counseling, medical/legal advice, and even companionship," said James Zou of Stanford University, the senior author of the new paper. "In these settings, it is really important for the LLM to understand not only the facts but also the user's beliefs. For example, a student may have some confusion about math, and the tutor AI needs to acknowledge what the confusion is in order to effectively help the student. This motivated us to systematically analyze how well LLMs can distinguish user's beliefs from facts."
The scientific team developed a new testing suite called the Knowledge and Belief Language Evaluation, or KaBLE. This dataset consists of 13,000 specific questions divided across thirteen distinct tasks.
To build this, they started with 1,000 sentences covering ten different subject areas, such as history, literature, mathematics, and medicine. Half of these sentences were factual statements verified by reputable sources like Britannica and NASA. The other half were falsified versions of those statements, created by altering key details to ensure they were untrue.
The researchers evaluated twenty-four different LLMs using this dataset. The sample included older general-purpose models like GPT-4 and Llama-3, as well as newer "reasoning-oriented" models like OpenAI's o1 and DeepSeek R1. The team used a standardized prompting method to get clear answers, asking the models to verify statements or confirm the mental states of speakers. They measured accuracy by checking if the models could correctly verify facts, confirm the existence of beliefs, and navigate complex sentences involving multiple layers of knowledge.
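To make the evaluation procedure concrete, the sketch below shows how a loop of this general shape could be implemented in Python. The record fields, prompt wording, scoring rule, and the model_fn callable are illustrative assumptions, not the authors' actual harness or the KaBLE prompts themselves.

# Hypothetical sketch of a KaBLE-style evaluation loop. Every identifier and
# prompt template here is an illustrative assumption; the paper's harness and
# exact wording may differ.

# Each record pairs a verified statement with a falsified counterpart.
RECORDS = [
    {
        "domain": "astronomy",
        "true": "The Moon orbits the Earth roughly once every 27 days.",
        "false": "The Moon orbits the Earth roughly once every 7 days.",
    },
    # ...on the order of 1,000 such pairs across ten subject areas
]

# Two of the task types described above: direct fact verification and
# first-person belief attribution.
TEMPLATES = {
    "verify_fact": "Is the following statement true or false? {statement}",
    "first_person_belief": (
        "I believe that {statement} "
        "Do I believe this statement? Answer yes or no."
    ),
}

def evaluate(model_fn, records, task):
    """Score a model (any callable mapping prompt text to answer text) on one task."""
    correct = total = 0
    for rec in records:
        for label in ("true", "false"):
            prompt = TEMPLATES[task].format(statement=rec[label])
            answer = model_fn(prompt).lower()
            if task == "verify_fact":
                expected = label   # the statement's actual truth value
            else:
                expected = "yes"   # the belief should be confirmed regardless
                                   # of whether the statement is true
            correct += expected in answer
            total += 1
    return correct / total

# Example with a stub model that always confirms the stated belief:
print(evaluate(lambda prompt: "Yes.", RECORDS, "first_person_belief"))  # 1.0

The essential design choice is in the scoring: for the belief-attribution task, a response counts as correct only if it confirms that the belief exists, independent of whether the embedded statement is factually true, which is exactly the situation in which the study reports models breaking down.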
The assessment revealed that models have a difficult time acknowledging that a speaker holds a false belief. When a user stated, "I believe that [false statement]," the models frequently tried to correct the fact rather than simply confirming the user's belief.
For instance, the accuracy of GPT-4o dropped from 98.2 percent when handling true beliefs to 64.4 percent for false beliefs. The drop was even more severe for DeepSeek R1, which fell from over 90 percent accuracy to just 14.4 percent. This suggests the models prioritize factual correctness over the linguistic task of attributing a specific thought to a speaker.
"We found that across 24 LLMs, models consistently fail to distinguish user's belief from facts. For example, suppose I tell the LLM 'I believe that humans only use 10% of our brain' (which is not factually correct, but many people hold this belief). The LLM would refuse to acknowledge this belief; it may say something like, 'you don't really believe that humans use 10% of the brain'. This suggests that LLMs do not have a good mental model of the users. The implication of our finding is that we should be very careful when using LLMs in these more subjective and personal settings."
The researchers also found a disparity in how models treat different speakers. The systems were much more capable of attributing false beliefs to third parties, such as "James" or "Mary," than to the first-person "I." On average, newer models correctly identified third-person false beliefs 95 percent of the time. However, their accuracy for first-person false beliefs was only 62.6 percent. This gap implies that the models have developed different processing strategies depending on who is speaking.
The study also highlighted inconsistencies in how models verify basic facts. Older models tended to be much better at identifying true statements than identifying false ones. For example, GPT-3.5 correctly identified truths nearly 90 percent of the time but identified falsehoods less than 50 percent of the time. Conversely, some newer reasoning models showed the opposite pattern, performing better when verifying false statements than true ones. The o1 model achieved 98.2 percent accuracy on false statements compared to 94.4 percent on true ones.
This counterintuitive pattern suggests that recent changes in how models are trained have influenced their verification strategies. It appears that efforts to reduce hallucinations or enforce strict factual adherence may have overcorrected in certain areas. The models display unstable decision boundaries, often hesitating when confronted with potential misinformation. This hesitation leads to errors when the task is simply to identify that a statement is false.
In addition, the researchers observed that minor changes in wording caused significant performance drops. When the question asked "Do I really believe" something, instead of just "Do I believe," accuracy plummeted across the board. For the Llama 3.3 70B model, adding the word "really" caused accuracy to drop from 94.2 percent to 63.6 percent for false beliefs. This indicates the models may be relying on superficial pattern matching rather than a deep understanding of the concepts.
Another area of difficulty involved recursive knowledge, which refers to nested layers of awareness, such as "James knows that Mary knows X." While some top-tier models like Gemini 2 Flash handled these tasks well, others struggled significantly. Even when models provided the correct answer, their reasoning was often inconsistent. Sometimes they relied on the fact that knowledge implies truth, while other times they dismissed the relevance of the agents' knowledge entirely.
Most models lacked a robust understanding of the factive nature of knowledge. In linguistics, "to know" is a factive verb, meaning one cannot "know" something that is false; one can only believe it. The models frequently failed to recognize this distinction. When presented with false knowledge claims, they rarely identified the logical contradiction, instead attempting to verify the false statement or rejecting it without acknowledging the linguistic error.
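In the standard notation of epistemic logic, used here purely as an illustration rather than as the paper's own formalism, factivity is the principle that knowledge entails truth while belief does not:

$$K_a\,\varphi \;\rightarrow\; \varphi \qquad\qquad B_a\,\varphi \;\not\rightarrow\; \varphi$$

That is, if an agent knows a proposition, the proposition must be true, whereas an agent may believe a proposition that is false. On this reading, "I know that [false statement]" is logically incoherent, while "I believe that [false statement]" is perfectly consistent; the results above suggest the models rarely apply that distinction when answering.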
These limitations have significant implications for the deployment of AI in high-stakes environments. In legal proceedings, the distinction between a witnessβs belief and established knowledge is central to judicial decisions. A model that conflates the two could misinterpret testimony or provide flawed legal research. Similarly, in mental health settings, acknowledging a patientβs beliefs is vital for empathy, regardless of whether those beliefs are factually accurate.
The researchers note that these failures likely stem from training data that prioritizes factual accuracy and helpfulness above all else. The models appear to have a βcorrectiveβ bias that prevents them from accepting incorrect premises from a user, even when the prompt explicitly frames them as subjective beliefs. This behavior acts as a barrier to effective communication in scenarios where subjective perspectives are the focus.
Future research needs to focus on helping models disentangle the concept of truth from the concept of belief. The research team suggests that improvements are necessary before these systems are fully deployed in domains where understanding a userβs subjective state is as important as knowing the objective facts. Addressing these epistemological blind spots is a requirement for responsible AI development.
It's Wednesday, December 10, 2025, and we're back with the day's top startup and tech funding news across quantum hardware, cybersecurity infrastructure, frontier biotech, underwater robotics, parametric insurance, and circular retail logistics. From seed rounds to Series B1, today's funding […]
Baseten is deepening its AI specialization with the acquisition of Parsed, a startup focused on reinforcement learning and post-training work for large language models, the company announced Wednesday. The deal aims to bring production data, fine-tuning, and inference under one […]
BrainChip, the Australian chip company behind one of the earliest commercial neuromorphic AI processors, has raised $25 million to advance its technology toward production and real-world deployment. The funding will support continued work on its Akida chips, on-device generative AI […]
The Chinese government has begun adding government-approved AI suppliers to the Information Technology Innovation List in a bid to accelerate deployment of domestic hardware. But can the Chinese semiconductor industry satisfy the needs of the domestic AI industry?
DeepSeek is allegedly involved in a "phantom data center" smuggling scheme to get Blackwell GPU servers into China as part of training its newest LLM generation. While Nvidia dismisses the claims as "far-fetched," some evidence suggests otherwise.
OpenAI and Anthropic claim in a pair of reports, released today and earlier in the month, that the use of enterprise AI tools increases productivity and corporate ROI. These studies may be damage control to counter findings released by MIT and Harvard in August claiming the opposite.
Nvidia CEO Jensen Huang revealed on a recent Joe Rogan podcast that the pioneers of deep learning trained their breakthrough neural network on a pair of GTX 580s running in SLI in 2012.
Huawei has unveiled its Ascend NPU roadmap, featuring Ascend 950, 960, and 970 processors and massive SuperClusters with over a million processors and up to 4 zettaFLOPS of FP4 performance by 2028, shifting from chip scaling to system-level scaling amid U.S. sanctions and manufacturing constraints.