Apparent Psychological Profiles of Large Language Models are Largely a Measurement Artifact. This research investigates whether the psychological profiles ass...
ScaffoldAgent: Utility-Guided Dynamic Outline Optimization for Open-Ended Deep Research. Open-ended deep research involves using AI to gather information thro...
Context-Aware Hierarchical Bayesian Modeling of IVF Laboratory Environmental Conditions. This research addresses a significant gap in fertility treatment: whi...
Automating SKILL.md Generation for Computer-Using Agents via Interaction Trajectory Mining explores whether we can automatically create "skill libraries"—col...
BIM-Edit: Benchmarking Large Language Models for IFC-Based Building Information Modeling introduces a new way to test how well artificial intelligence can mo...
Rethinking Shrinkage Bias in LLM FP4 Pretraining: Geometric Origin, Systemic Impact, and UFP4 Recipe. Training Large Language Models (LLMs) at 4-bit precision...
SoftSkill: Behavioral Compression for Contextual Adaptation. This paper introduces SoftSkill, a method for improving how AI agents adapt to specific tasks.
DeepSWIP: Quotient-WMC Counterfactuals for Neural Probabilistic Logic Programs. Neurosymbolic systems, such as DeepProbLog, combine the perceptual power of ne...
QMFOL: Benchmarking Large Language Model Reasoning via Quantifiable Monadic First-Order Logic Test Case Generation. As Large Language Models (LLMs) continue t...
Navigating Unreliable Parametric and Contextual Knowledge: Explicit Knowledge Conflict Resolution for LLM Inference. Large language models (LLMs) often strugg...