A New Paradigm Emerges: Data-Intensive Scientific and Technical Intelligence Rises to Meet Global Challenges

In an era of accelerating technological disruption, national competitiveness increasingly hinges on the ability to detect, interpret, and act upon emerging scientific signals—before they fully materialize. The intelligence community, long accustomed to human-driven analysis of classified reports and expert interviews, is now confronting a deluge of open scientific data: over three million scholarly papers added to Scopus each year, biomedicine alone generating nearly two new publications per minute in PubMed, and billions of technical updates coursing daily through professional forums, preprint servers, patent filings, and corporate disclosures. Amid this flood, the traditional “read, reflect, report” model of intelligence analysis is straining under volume, velocity, and ambiguity. A quiet but profound shift is underway—away from intuition-led interpretation toward a new paradigm: data-intensive scientific and technical intelligence.

The concept, recently articulated by Luo Wei of the Military Science Information Research Center at the Academy of Military Sciences in Beijing, reframes intelligence not as an artisanal craft but as a scalable socio-technical system—one where machine-scale pattern recognition and human-scale judgment operate in tight, iterative feedback loops. Unlike earlier attempts to “automate” intelligence, this approach does not seek to replace analysts. Rather, it aims to augment them—freeing cognitive bandwidth from information foraging and preliminary synthesis, so they can focus on what machines still cannot do: hypothesize, contextualize, challenge assumptions, and weigh implications across political, operational, and ethical dimensions.

At its core, the data-intensive model rests on four interlocking pillars: high-fidelity data infrastructure, reusable analytical workflows, domain-adapted AI models, and scenario-specific human–machine collaboration. Each represents a departure from legacy practice—not merely a digital upgrade, but a rethinking of how insight is generated in the 21st century.

Consider the data foundation. Traditional intelligence databases often treat documents as monolithic objects—PDFs indexed by author, title, and keywords. In contrast, Luo’s vision disaggregates content into atomic knowledge units: sentences, claims, parameters, affiliations, funding sources, even experimental conditions. A single paper on, say, high-temperature superconductivity might yield dozens of structured assertions—about critical transition temperatures under specific pressures, institutional collaborations, citation bursts, or anomalies in measurement methodology. These fragments are then tagged across multiple semantic dimensions: technical domain (e.g., quantum materials), geographic origin (e.g., Hefei, China), application context (e.g., fusion energy), temporal stage (e.g., lab-scale validation), and evidentiary strength (e.g., reproducibility status). Through this “fragmentation, labeling, linking, and reconstruction” process—what the paper calls deep information revelation—disparate signals coalesce into dynamic knowledge graphs where institutions, researchers, technologies, and projects form interconnected nodes, updated in near real time.
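
To make this concrete, here is a minimal sketch, assuming a Python implementation, of what one such atomic knowledge unit might look like. The field names, tag values, and graph-node identifiers below are illustrative assumptions drawn from the examples in this paragraph, not details from Luo's system.

```python
from dataclasses import dataclass, field

# A hypothetical "atomic knowledge unit": one extracted assertion tagged
# along the semantic dimensions described above. All names are illustrative.
@dataclass
class KnowledgeUnit:
    claim: str            # the extracted assertion, e.g. one sentence or parameter
    source_doc: str       # identifier of the originating document
    domain: str           # technical domain, e.g. "quantum materials"
    geography: str        # geographic origin, e.g. "Hefei, China"
    application: str      # application context, e.g. "fusion energy"
    stage: str            # temporal stage, e.g. "lab-scale validation"
    evidence: str         # evidentiary strength, e.g. "not yet replicated"
    links: list = field(default_factory=list)  # IDs of related graph nodes

unit = KnowledgeUnit(
    claim="critical transition temperature of 250 K reported at 170 GPa",
    source_doc="paper-0421",                  # hypothetical identifier
    domain="quantum materials",
    geography="Hefei, China",
    application="fusion energy",
    stage="lab-scale validation",
    evidence="single-lab result, not yet replicated",
)
unit.links.append("institution:example-lab")  # hypothetical knowledge-graph node
```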

Such infrastructure enables a second breakthrough: reusable intelligence workflows. Intelligence questions, it turns out, are not infinitely varied. Many fall into recurring archetypes: Who is leading in X? What’s the provenance of breakthrough Y? Is trend Z accelerating or plateauing? Could capability W be weaponized? Luo argues that rather than reinventing analytic procedures each time, teams should codify best-practice templates—for instance, a “technology provenance” workflow that starts with a project identifier (like a U.S. DARPA contract number), automatically retrieves associated publications, patents, and contractor disclosures, maps authorship networks to identify core versus peripheral contributors, reconstructs technical lineage across iterations, and flags anomalies—such as a sudden spike in citations from defense-linked labs, or a divergence between public claims and experimental reproducibility metrics.
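
A hedged sketch of how such a provenance template might be codified follows. Every function is a toy stand-in (real versions would query publication, patent, and procurement sources), and the contract number is a hypothetical placeholder.

```python
# Toy sketch of a "technology provenance" workflow as a reusable template.

def retrieve_documents(project_id):
    # Stand-in for retrieval keyed to a project identifier.
    return [{"id": "doc-1", "authors": ["A", "B"], "defense_citations": 0},
            {"id": "doc-2", "authors": ["B", "C"], "defense_citations": 7}]

def map_contributors(docs):
    # Core contributors appear on more than one document.
    counts = {}
    for doc in docs:
        for author in doc["authors"]:
            counts[author] = counts.get(author, 0) + 1
    core = {a for a, n in counts.items() if n > 1}
    return core, set(counts) - core

def flag_anomalies(docs, threshold=5):
    # Example rule: a spike in citations from defense-linked labs.
    return [d["id"] for d in docs if d["defense_citations"] >= threshold]

def provenance_workflow(project_id):
    docs = retrieve_documents(project_id)
    core, peripheral = map_contributors(docs)
    return {"core": core, "peripheral": peripheral,
            "anomalies": flag_anomalies(docs)}

print(provenance_workflow("HR0011-XX-C-0000"))  # hypothetical contract number
```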

Crucially, these workflows are not rigid pipelines. They function as living frameworks, refined through operational feedback. When an analyst notices a gap—say, the model missed a key subcontractor because it only scanned primary authors—the workflow evolves: a new data layer is added (e.g., corporate registration databases cross-referenced with procurement records), and a new validation rule is encoded. Over time, workflows become institutional memory made executable.
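
Extending the sketch above, representing the workflow as a mutable list of named steps shows how feedback becomes executable. The subcontractor layer below is a hypothetical example of a data source bolted on after an analyst spots a gap.

```python
def scan_subcontractors(docs):
    # New layer: cross-reference corporate registries with procurement
    # records (represented here by a hard-coded toy lookup).
    registry = {"doc-2": ["SubCo Ltd"]}
    return {d["id"]: registry.get(d["id"], []) for d in docs}

steps = [("anomalies", flag_anomalies)]
steps.append(("subcontractors", scan_subcontractors))  # the workflow evolves

def run_workflow(project_id, steps):
    docs = retrieve_documents(project_id)
    return {name: step(docs) for name, step in steps}

print(run_workflow("HR0011-XX-C-0000", steps))
```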

Yet infrastructure and process alone are insufficient without domain-grounded intelligence models. Here, Luo emphasizes a critical insight: off-the-shelf AI tools trained on generic web text fail catastrophically on scientific discourse. A transformer model fine-tuned on news articles may misinterpret “spin” in quantum physics as emotional bias, or confuse “bandgap” in semiconductors with economic inequality metrics. Effective models must be co-developed by intelligence practitioners and computational specialists, anchored in the semantics of specific fields. This means building specialized corpora—annotated datasets where, for example, sentences are labeled not just for sentiment, but for technical stance: “claims advancement,” “challenges methodology,” “confirms prior finding,” “speculates on scalability.” It means adapting event extraction to recognize scientific milestones—not merely publication dates, but first successful cryogenic test, prototype field trial, or standardization committee adoption. It means developing “emergence detectors” that combine bibliometric burst detection with altmetric resonance, funding trajectory shifts, and hiring patterns at elite labs.
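
The paper names the ingredient signals for emergence detection; how they are combined is not specified, so the following sketch makes a simple assumption: normalize each signal as a z-score over its recent history and take a weighted sum. The weights and toy quarterly counts are illustrative, not values from the paper.

```python
def zscore(series):
    # Deviation of the latest observation from the series mean, in standard
    # deviations; a zero-variance series scores zero.
    mean = sum(series) / len(series)
    var = sum((x - mean) ** 2 for x in series) / len(series)
    return (series[-1] - mean) / (var ** 0.5 or 1.0)

def emergence_score(citations, altmetrics, funding, hires,
                    weights=(0.4, 0.2, 0.2, 0.2)):
    # Weighted combination of bibliometric burst, altmetric resonance,
    # funding trajectory, and hiring signals (weights are assumptions).
    signals = [zscore(citations), zscore(altmetrics), zscore(funding), zscore(hires)]
    return sum(w * s for w, s in zip(weights, signals))

# Toy quarterly counts for one candidate topic:
score = emergence_score(
    citations=[5, 6, 7, 30],   # burst in the latest quarter
    altmetrics=[2, 2, 3, 8],
    funding=[1, 1, 2, 4],
    hires=[0, 1, 1, 3],
)
print(round(score, 2))
```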

One compelling prototype described in the paper is a hybrid technology foresight system. Traditional horizon-scanning relies heavily on expert panels—a method vulnerable to groupthink, recency bias, and blind spots shaped by disciplinary silos. Luo’s team tested an alternative: begin with data-driven candidate generation, using metrics like citation acceleration, cross-disciplinary borrowing, patent-family expansion, and novelty of keyword co-occurrence clusters. From millions of candidates, the algorithm surfaces a shortlist of emerging domains—say, neuromorphic photonic chips or autonomous biofoundries. Analysts then intervene—not to start from scratch, but to calibrate. They adjust weights (e.g., downplaying metrics inflated by hype cycles), inject contextual filters (e.g., “exclude fields where China holds >80% of core IP”), and probe edge cases. The output isn’t a ranked list, but a structured landscape: technologies mapped along axes of maturity, strategic relevance, dual-use potential, and supply-chain vulnerability—with uncertainty bands derived from data variability, not just analyst confidence.
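
A rough sketch of that calibration step, under stated assumptions: the candidate metrics and the IP-share filter are invented values, and the uncertainty band here comes from perturbing analyst weights, a crude stand-in for the data-variability bands the text describes.

```python
candidates = [
    {"name": "neuromorphic photonic chips", "citation_accel": 0.9,
     "cross_disciplinary": 0.7, "patent_growth": 0.6, "core_ip_share": 0.4},
    {"name": "autonomous biofoundries", "citation_accel": 0.6,
     "cross_disciplinary": 0.9, "patent_growth": 0.8, "core_ip_share": 0.9},
]

# Analyst-injected contextual filter, mirroring the example in the text.
shortlist = [c for c in candidates if c["core_ip_share"] <= 0.8]

weights = {"cit": 0.5, "xd": 0.3, "pat": 0.2}   # analyst-tunable

def score(c, w):
    return (w["cit"] * c["citation_accel"] + w["xd"] * c["cross_disciplinary"]
            + w["pat"] * c["patent_growth"])

for c in shortlist:
    # Crude uncertainty band: rescore under +/-20% perturbations of each weight.
    rescored = [score(c, {**weights, k: weights[k] * (1 + d)})
                for k in weights for d in (-0.2, 0.2)]
    print(c["name"], round(score(c, weights), 2),
          (round(min(rescored), 2), round(max(rescored), 2)))
```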

Perhaps the most consequential shift, however, lies in redefining the human role. The paper explicitly rejects full automation. Instead, it proposes a division of cognitive labor: machines handle pattern saturation—sifting petabytes to surface anomalies, correlations, and discontinuities invisible to individual eyes; humans handle sensemaking—framing questions, interrogating assumptions, weighing second- and third-order consequences, and communicating insights to decision-makers under time pressure and uncertainty.

This synergy manifests in three operational modes. The first is hypothesis-driven exploration: an analyst posits, “Country X may be pivoting from lithium-ion to solid-state batteries for naval applications,” and the system rapidly assembles evidence fragments—procurement notices mentioning energy density thresholds, hires of solid-electrolyte specialists at state-owned shipbuilders, silent periods in patent filings followed by sudden international filings. The second is serendipitous alerting: the system detects an unexplained cluster of co-authorships between previously unconnected labs in fields as disparate as metamaterials and low-observable coatings—and pushes it to analysts with the prompt: “Possible stealth materials convergence?” The third is deliberative refinement: during a high-stakes assessment, analysts iteratively tweak analysis parameters—“What if we weight Chinese conference papers more heavily?” or “Exclude any finding not replicated by ≥2 independent teams?”—and observe how conclusions stabilize or shift.
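
The third mode lends itself to a short sketch: re-run a toy assessment under different analyst-chosen parameters and watch whether the conclusion holds. The evidence items, sources, and rules below are invented for illustration.

```python
evidence = [
    {"claim": "solid-state pivot", "source": "conference", "replications": 1},
    {"claim": "solid-state pivot", "source": "procurement", "replications": 2},
    {"claim": "status quo", "source": "journal", "replications": 3},
]

def assess(evidence, conference_weight=1.0, min_replications=1):
    tally = {}
    for e in evidence:
        if e["replications"] < min_replications:
            continue  # "exclude findings not replicated by >= N independent teams"
        w = conference_weight if e["source"] == "conference" else 1.0
        tally[e["claim"]] = tally.get(e["claim"], 0.0) + w
    return max(tally, key=tally.get) if tally else "insufficient evidence"

# Analyst parameter sweeps: does the conclusion stabilize or shift?
for cw, mr in [(1.0, 1), (2.0, 1), (1.0, 2)]:
    print(f"conference_weight={cw}, min_replications={mr} ->",
          assess(evidence, cw, mr))
```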

Critically, Luo insists that expert knowledge itself must be operationalized. Too often, tacit insights reside only in individuals’ minds—lost when personnel rotate, or diluted in verbal handovers. The data-intensive paradigm demands making expertise computable: encoding analysts’ mental models as decision rules, capturing their evidentiary thresholds in confidence calculators, and preserving their interpretive frameworks as reusable annotation schemas. One implementation described involves “dynamic editorial boards”—small, rotating teams of subject-matter experts who continuously curate domain-specific knowledge bases, not through monolithic reports, but via micro-contributions: validating a disputed parameter, flagging a methodological weakness in a new paper, or linking a foreign lab’s output to an obscure policy directive. Their inputs feed back into the AI models, creating a virtuous cycle of refinement.
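
One way such a confidence calculator might be made computable is sketched below; the rules and weights are placeholders for a single expert's tacit thresholds, not a scheme from the paper.

```python
# Hypothetical rule set encoding one analyst's evidentiary thresholds.
RULES = [
    ("peer_reviewed", 0.30),                   # appeared in a peer-reviewed venue
    ("independently_replicated", 0.40),        # confirmed by a second team
    ("consistent_with_funding_trail", 0.20),   # matches grant/procurement signals
    ("corroborated_by_non_text_source", 0.10), # e.g. procurement or imagery
]

def confidence(finding: dict) -> float:
    # Sum the weights of all rules the finding satisfies.
    return sum(weight for key, weight in RULES if finding.get(key))

finding = {"peer_reviewed": True, "independently_replicated": False,
           "consistent_with_funding_trail": True}
print(confidence(finding))  # below a hypothetical 0.7 reporting threshold
```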

The stakes could hardly be higher. As the U.S. National Security Strategy notes, modern weapons systems are, at their core, data systems—dependent on advances in materials, AI, sensing, and power that originate in civilian research ecosystems. A delay of months—not years—in recognizing a critical breakthrough can cede strategic advantage. Conversely, false alarms triggered by hype cycles waste resources and erode credibility. This is where the data-intensive approach offers distinct advantages: scalability (monitoring thousands of labs simultaneously), traceability (every conclusion links to underlying evidence), adaptability (models recalibrate as new data streams arrive), and transparency (analysts can interrogate why a signal was flagged—not as a black-box score, but as a constellation of specific indicators).

Early implementations show promise. In one case, a prototype system monitoring global quantum computing research identified a subtle but consistent pattern: a leading academic group in Europe had begun citing a previously obscure Chinese preprint on error-correction algorithms—weeks before any Western lab acknowledged it. That signal, when layered with procurement data showing new cryogenic equipment orders and satellite imagery of lab expansions, triggered a deeper investigation that revealed a previously unknown collaboration channel. In another, a “fragmented intelligence reconstruction” tool helped analysts piece together disparate clues—a conference abstract, a venture capital filing, a job posting—to confirm the existence of a classified hypersonics test program months ahead of official disclosure.

Of course, challenges remain. Data quality varies wildly: open-access papers coexist with paywalled journals, corporate white papers blur into marketing, and adversarial actors seed disinformation into technical channels. Model bias is a persistent risk—especially when training data underrepresents non-Western research outputs. And perhaps most fundamentally, cultural resistance endures: some veteran analysts distrust “black-box” recommendations, while data scientists may underestimate the nuance required in strategic assessment.

Luo’s response is pragmatic: start small, demonstrate value, and iterate. Focus first on low-risk, high-leverage use cases—like tracking technology migration across sectors (e.g., how lidar developed for autonomous cars is being adapted for battlefield surveillance), or identifying “quiet leaders” in niche domains where publication volume is low but impact is high. Build tools that augment, not overwhelm—offering analysts control over thresholds, explainable rationales for alerts, and seamless pathways to dive into source material. Most importantly, treat the paradigm not as a fixed destination, but as a living practice—one refined through constant dialogue between those who understand the science, those who understand the threats, and those who understand the algorithms.

The broader implication extends beyond national security. As scientific progress becomes more networked, interdisciplinary, and opaque, the ability to navigate complexity matters for corporate R&D strategists, venture investors, science policymakers, and even journalists covering innovation. The principles Luo outlines—a commitment to structured knowledge, reusable workflows, domain-aware AI, and human-centered augmentation—offer a blueprint for any field where insight must outpace information overload.

We may be entering a new phase of the information age—not defined by more data, but by smarter data stewardship. In this landscape, the most valuable asset won’t be raw computational power or exclusive datasets, but the capacity to orchestrate machines and minds in pursuit of foresight. That’s the promise of the data-intensive paradigm: not just faster intelligence, but wiser intelligence—grounded in evidence, sharpened by critique, and aimed at the long view.

Luo Wei, Military Science Information Research Center, Academy of Military Sciences, Beijing. Journal of Information Resources Management, 2021, 11(2): 12–15. DOI: 10.13365/j.jirm.2021.02.012