RANKWITHME.AI

You already have the answers. We help the internet find them.

Structure before ads — your business, clearly defined, permanently visible

RESEARCH — 2026.03.03 WISE · RUBAIAT & JAMIL · 2025
Knowledge Extraction · LLM Architecture · Linked Data · University of Idaho · IEEE · arXiv:2506.17580 · cs.IR · cs.AI · cs.DL
How LLMs Actually Process Knowledge — And What a New Paper on Scientific Knowledge Extraction Taught Us
A reading of WISE: Workflow for Intelligent Scientific Knowledge Extraction by Sajratul Yakin Rubaiat and Hasan M. Jamil, University of Idaho (2025)
arXiv:2506.17580 — Context-Aware Scientific Knowledge Extraction on Linked Open Data using Large Language Models Read Paper →
WHY WE READ THIS PAPER 01 / 05

Every morning we sit down with a piece of research that connects to the problem we are working on. Today's paper is one of the most resonant we have encountered.

WISE — Workflow for Intelligent Scientific Knowledge Extraction — is a system built by researchers at the University of Idaho to address a specific challenge: how do you extract deep, accurate, non-redundant knowledge from an exponentially expanding web of interconnected sources, when the tools available — search engines, general-purpose LLMs — keep falling short at the edges of complex queries?

We are grateful this paper exists. Not because it validates anything we are doing, but because it illuminates the same class of problem from a completely different angle — scientific literature extraction — and does so with rigor, transparency, and results that speak for themselves. Rubaiat and Jamil published their methodology openly. They showed their math. They ran honest comparisons against established baselines and reported what they found without inflation. That is exactly the kind of work the research community needs more of, and exactly the kind of work we want to learn from.

We work on federated knowledge graph architecture for the American business economy, public law, and civic infrastructure. The problems we encounter are not the same as theirs. But reading their work carefully, we found ourselves recognizing the shape of familiar challenges described in a new language. That kind of cross-domain resonance is rare and worth paying attention to.

THE PROBLEM THEY SET OUT TO SOLVE 02 / 05

Start with a single gene. HBB. One authoritative source knows about it. That source links to 24 others. One of those links to hundreds more. Within a few traversal steps you have an exponentially expanding tree of interconnected knowledge — and no intelligent way to navigate it without either drowning in volume or stopping too early and missing what matters most.

Traditional search engines return a list and step back. The researcher does the rest manually.

General-purpose LLMs offer synthesized answers — but those answers are constrained by context window. Even the most capable models available can only hold so much simultaneously. The researchers found that GPT-4o, with its 128,000-token context window, could realistically process around eight sources at once before running out of room. This is not a criticism of any model. It is an honest accounting of a structural ceiling that affects every LLM-based retrieval system operating at scale.

The result, in domains where completeness matters — medicine, biology, materials science, social research — is answers that are confident but incomplete. The rare condition gets missed. The edge case goes unrecorded. The nuanced connection between distant sources never gets made.

WISE was built to do what a skilled expert researcher actually does: identify the most valuable leads, discard what is already understood, follow depth where it is warranted, stop when further exploration stops returning meaningful new knowledge, and surface something genuinely comprehensive at the end.

HOW WISE WORKS — THE FOUR OPERATIONS 03 / 05

WISE operates as a tree. A query is submitted. An initial set of sources is retrieved. Then four operations run recursively, layer by layer, until a stopping condition is met.
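Read as pseudocode, that layer-by-layer loop might look roughly like the sketch below. The function names, the frontier handling, and the threshold semantics are our illustrative assumptions, not the paper's implementation:

```python
# Illustrative sketch of a WISE-style traversal loop. The helper callables
# (filter_fn, score_fn, fetch_links) are placeholders for the paper's
# LLM-based components.

def traverse(query, initial_sources, fetch_links, filter_fn, score_fn, threshold):
    container = []                    # growing knowledge container
    frontier = list(initial_sources)  # sources awaiting evaluation
    while frontier:
        # 1. Filtering: reduce each source to query-relevant content.
        filtered = [(src, filter_fn(query, src)) for src in frontier]
        # 2. Scoring: rank sources by unique contribution to the container.
        filtered.sort(key=lambda pair: score_fn(pair[1], container), reverse=True)
        best_src, best_content = filtered[0]
        # 3. Threshold check: stop once the best lead hits diminishing returns.
        if score_fn(best_content, container) < threshold:
            break
        # 4. Consolidation: merge the winner, then expand along its links,
        #    keeping the runner-up sources available for later layers.
        container.append(best_content)
        frontier = fetch_links(best_src) + [src for src, _ in filtered[1:]]
    return container
```

The loop terminates either when the frontier is exhausted or when the highest-scoring remaining source falls below the threshold, mirroring the stopping condition described below.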

Filtering
Before any scoring happens, WISE strips each source down to only what is relevant to the query. A source that is 8,000 words in raw form might yield 350 words of query-relevant content after filtering. The researchers found an average reduction of over 80 percent across all sources. This is not information loss. It is signal isolation. Everything downstream operates on cleaner material as a result.
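A minimal sketch of the signal-isolation idea (the paper filters with an LLM; this toy version just keeps sentences that share a content word with the query):

```python
def filter_source(query: str, source_text: str) -> str:
    """Toy stand-in for WISE's filtering step: keep only sentences that
    share a content word with the query. (The paper uses an LLM, not term
    overlap; this only illustrates the idea of query-specific reduction.)"""
    query_terms = {w.lower().strip(".,;:") for w in query.split() if len(w) > 3}
    kept = []
    for sentence in source_text.split(". "):
        words = {w.lower().strip(".,;:") for w in sentence.split()}
        if words & query_terms:
            kept.append(sentence.strip())
    return ". ".join(kept)
```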
Scoring
After filtering, each source receives a score based on one question: how much does this source add to what has already been collected? Not how popular the source is. Not how authoritative its domain. Purely: what unique knowledge does it contribute relative to the growing container of everything already gathered? The formula combines local contribution relative to the source's own size and global contribution relative to the full knowledge container — dampened logarithmically to keep scoring meaningful across all traversal layers.
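The paper gives the exact formula; the sketch below is only one plausible reading of "local plus log-dampened global contribution," with the word-set novelty measure being our simplification:

```python
import math

def contribution_score(filtered_text: str, container_text: str) -> float:
    """One plausible reading of the scoring idea (not the paper's formula):
    a local term (share of this source's words that are new) plus a global
    term (count of new words, log-dampened by container size so that scores
    stay comparable across traversal layers)."""
    source_words = set(filtered_text.lower().split())
    if not source_words:
        return 0.0
    known_words = set(container_text.lower().split())
    new_words = source_words - known_words
    local = len(new_words) / len(source_words)
    global_ = len(new_words) / math.log(len(known_words) + 2)
    return local + global_
```

A fully redundant source scores zero under this reading, which is the behavior the scoring question demands: popularity and authority never enter the calculation.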
Threshold Check
When the highest-scoring available source can no longer clear a minimum threshold, the system terminates. Diminishing returns have been reached. Further traversal would consume compute and return noise. The researchers set their threshold empirically at T = 20, watching where score curves flattened across successive layers.
Consolidation
At each layer, filtered content from the highest-scoring sources gets merged into the growing knowledge container. This container — dense, deduplicated, query-specific — is what the system returns at the end. An LLM fuses it into a coherent final answer. The result is comprehensive and non-redundant by design.
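A toy version of a non-redundant merge (WISE fuses content with an LLM; exact-match sentence deduplication here is our simplification) might look like:

```python
def consolidate(container: str, new_content: str) -> str:
    """Toy deduplicating merge: append only sentences the container does
    not already hold. Illustrates the non-redundant-by-design property,
    not the paper's LLM-based fusion."""
    seen = {s.strip().lower() for s in container.split(". ") if s.strip()}
    additions = []
    for sentence in new_content.split(". "):
        key = sentence.strip().lower()
        if key and key not in seen:
            additions.append(sentence.strip())
            seen.add(key)
    if not additions:
        return container
    prefix = container + ". " if container else ""
    return prefix + ". ".join(additions)
```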
System                 Diseases Found   Recall
WISE                   16               0.84
ChatGPT (GPT-4o)       9                0.47
ChatGPT with Search    7                0.36
Google Search          3                0.15
Gemini                 2                0.10

WISE's output also scored lower on ROUGE and BLEU overlap metrics than every other system: it was not just finding more, it was surfacing genuinely different information that no other system turned up.

WHAT THIS PAPER ILLUMINATED FOR US 04 / 05

We work on a different problem in a different context. But reading this paper carefully, several things came into focus that we want to share honestly.

The challenge of traversing an interconnected knowledge graph without drowning in redundancy is not specific to scientific literature. Any system that needs to reason across a large, heterogeneous corpus of structured entities faces the same fundamental tension: breadth versus depth, volume versus signal, traversal cost versus completeness.

What WISE demonstrates clearly is that the relationship between pieces of knowledge matters as much as the pieces themselves. A system that scores sources by unique contribution — that actively measures what each new source adds relative to everything already known — produces dramatically better results than a system that ranks by popularity or authority alone. The graph structure is not just a storage mechanism. It is a reasoning surface. The edges between entities carry meaning that the entities themselves cannot carry alone.

The researchers also identify knowledge graph integration as a direction they want to explore further — moving from a text-based knowledge container toward a node-and-edge representation where relationships are preserved explicitly rather than merged into accumulating text. Their preliminary experiments — representing the HBB gene entry as a graph of 56 nodes and 55 edges and filtering it to an 11-node, 16-edge subgraph aligned with a specific query — showed that structured representations can preserve relational meaning that text containers lose.
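Purely as an illustration of that direction (the node labels, relation names, and substring relevance test below are hypothetical, not drawn from the paper's experiment), query-aligned subgraph pruning can be sketched as:

```python
def filter_subgraph(nodes, edges, query_terms):
    """Prune a labeled graph to a query-aligned subgraph: keep nodes whose
    label mentions a query term, and keep edges only between kept nodes.
    nodes: {node_id: label}; edges: [(src, dst, relation)]."""
    kept = {node_id for node_id, label in nodes.items()
            if any(term in label.lower() for term in query_terms)}
    kept_edges = [(u, v, rel) for u, v, rel in edges if u in kept and v in kept]
    return kept, kept_edges
```

The point of the structured form is visible even in this toy: the surviving edges still say *how* the kept entities relate, which a flattened text container would have merged away.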

We find this direction genuinely exciting and we look forward to seeing where their future work goes.

WHY WE PUBLISH THESE READINGS 05 / 05

RankWithMe.ai is a learning resource. We are building the most machine-readable map of the American business economy that has ever existed, and we are doing it openly — publishing our research, our methodology, our specification, and our thinking as we go.

Part of that commitment is honest engagement with the research community. Papers like this one represent years of careful work by researchers who deserve to be read, cited, and built upon. We read them, we share what we learn, and we point anyone who finds this useful back to the primary source.

If you work in knowledge graph architecture, LLM-based retrieval, linked data systems, or any adjacent field — this paper is worth your time. The full text is available on arXiv. The researchers are at the University of Idaho, Department of Computer Science. Their work is real, their results are independently verifiable, and their methodology is described with enough clarity to build on.

We are grateful they published it.

Read the full paper — arXiv:2506.17580 →
REFERENCES · PRIMARY SOURCES
[1]
Sedlakova, J. et al. — Challenges and best practices for digital unstructured data enrichment in health research: A systematic narrative review. PLOS Digital Health, vol. 2, no. 10, pp. 1–22, 2023. doi.org/10.1371/journal.pdig.0000347
[2]
Luo, J., Wu, M., Gopukumar, D., & Zhao, Y. — Big data application in biomedical research and health care: A literature review. Biomedical Informatics Insights, vol. 8, 2016. doi.org/10.4137/BII.S31559
[3]
Zhai, C. — Large language models and future of information retrieval: Opportunities and challenges. Proceedings of the 47th International ACM SIGIR Conference, 2024, pp. 481–490.
[4]
Salemi, A. & Zamani, H. — Towards a search engine for machines: Unified ranking for multiple retrieval-augmented large language models. SIGIR '24, ACM, 2024, pp. 741–751. doi.org/10.1145/3626772.3657733
[5]
Ziems, N., Yu, W., Zhang, Z., & Jiang, M. — Large language models are built-in autoregressive search engines. arXiv preprint, 2023. arXiv:2305.09612
[6]
HUGO Gene Nomenclature Committee (HGNC) — HBB Gene Symbol Report, 2024. Root source node used in WISE traversal experiment. genenames.org — HGNC:4827
[7]
Clinical Genome Resource (ClinGen) — HBB Gene, Clinical Genome Knowledge Base, 2024. clinicalgenome.org — HGNC:4827
[8]
National Center for Biotechnology Information (NCBI) — Sickle Cell Anemia, NCBI Bookshelf, 2024. ncbi.nlm.nih.gov/books/NBK1435
[9]
Garcia, G.L. et al. — A review on scientific knowledge extraction using large language models in biomedical sciences, 2024. arXiv:2412.03531
[10]
Alshami, A. et al. — Harnessing the power of ChatGPT for automating systematic review process: Methodology, case study, limitations, and future directions. Systems, vol. 11, no. 7, 2023. mdpi.com/2079-8954/11/7/351
[11]
Saxena, S. et al. — Large-Scale Knowledge Synthesis and Complex Information Retrieval from Biomedical Documents. IEEE International Conference on Big Data, 2022, pp. 2364–2369. doi.ieeecomputersociety.org
[12]
Sneyd, A. & Stevenson, M. — Modelling stopping criteria for search results using Poisson processes. EMNLP-IJCNLP 2019, Association for Computational Linguistics, pp. 3484–3489. aclanthology.org/D19-1351
[13]
Parmentier, M. & Legay, A. — Adaptive stopping algorithms based on concentration inequalities. AISoLA 2024, Springer-Verlag, 2025, pp. 336–353. doi.org/10.1007/978-3-031-75434-0_23
[14]
Vaswani, A. et al. — Attention is all you need, 2017. The foundational transformer architecture paper underlying modern LLMs. arXiv:1706.03762
[15]
OpenAI — GPT-4o, 2024. Context window of up to 128,000 tokens. Baseline system used in WISE comparative evaluation. platform.openai.com/docs/models
[16]
UniProt Consortium — Hemoglobin subunit beta (HBB), 2024. Primary test source in WISE filtering experiment — 8,249 words reduced to 355 after query-specific filtering. uniprot.org/uniprotkb/P68871
[17]
Han, J. — Mining knowledge at multiple concept levels. Proceedings of the 4th International Conference on Information and Knowledge Management, 1995, pp. 19–24.
[18]
Sharma, D. et al. — A brief review on search engine optimization. 9th International Conference on Cloud Computing, Data Science & Engineering (Confluence), IEEE, 2019, pp. 687–692.
[19]
Jumper, J. et al. — Highly accurate protein structure prediction with AlphaFold. Nature, vol. 596, no. 7873, pp. 583–589, 2021. Referenced in WISE scoring experiments as high-authority source.
[20]
Shahriar, S. et al. — Putting GPT-4o to the sword: A comprehensive evaluation of language, vision, speech, and multimodal proficiency. Applied Sciences, vol. 14, no. 17, 2024. mdpi.com/2076-3417/14/17/7782
[21]
OpenAI — Introducing ChatGPT Search, 2024. openai.com/index/introducing-chatgpt-search
[22]
Sun, W. et al. — Is ChatGPT good at search? Investigating large language models as re-ranking agents. arXiv preprint, 2023. arXiv:2304.09542
[23]
Gemini Team, Google — Gemini: A family of highly capable multimodal models. arXiv preprint, 2023. arXiv:2312.11805
[24]
Piasecki, J., Waligora, M., & Dranseika, V. — Google search as an additional source in systematic reviews. Science and Engineering Ethics, vol. 24, pp. 809–810, 2018.
[25]
Cilibrasi, R.L. & Vitanyi, P.M. — The Google similarity distance. IEEE Transactions on Knowledge and Data Engineering, vol. 19, no. 3, pp. 370–383, 2007.
[26]
Lin, C.-Y. — ROUGE: A package for automatic evaluation of summaries. Text Summarization Branches Out, ACL, 2004. Evaluation metric used in WISE comparative analysis.
[27]
Papineni, K., Roukos, S., Ward, T., & Zhu, W.J. — BLEU: A method for automatic evaluation of machine translation. Proceedings of the 40th Annual Meeting of the ACL, 2002, pp. 311–318. Evaluation metric used alongside ROUGE in WISE analysis.
[28]
Gill, J.K. et al. — Large language model based framework for automated extraction of genetic interactions from unstructured data. PLOS ONE, vol. 19, no. 5, 2024. doi.org/10.1371/journal.pone.0303231
[29]
Ahmed, M. et al. — Identifying protein-protein interaction using tree LSTM and structured attention. IEEE 13th International Conference on Semantic Computing (ICSC), 2019, pp. 224–231.
[30]
Zhang, Y. et al. — Neural network-based approaches for biomedical relation classification: A review. Journal of Biomedical Informatics, vol. 99, 2019. doi.org/10.1016/j.jbi.2019.103294
[31]
Lee, J. et al. — BioBERT: A pre-trained biomedical language representation model for biomedical text mining. Bioinformatics, vol. 36, no. 4, pp. 1234–1240, 2020.
[32]
Zhou, D., Zhong, D., & He, Y. — Biomedical relation extraction: From binary to complex. Computational and Mathematical Methods in Medicine, vol. 2014. onlinelibrary.wiley.com
[33]
Sarmah, B. et al. — HybridRAG: Integrating knowledge graphs and vector retrieval augmented generation for efficient information extraction. Proceedings of the 5th ACM International Conference on AI in Finance, 2024, pp. 608–616.
[34]
Buehler, M.J. — Accelerating scientific discovery with generative knowledge extraction, graph-based representation, and multimodal intelligent graph reasoning. Machine Learning: Science and Technology, vol. 5, no. 3, 2024. dx.doi.org/10.1088/2632-2153/ad7228
[35]
Zuluaga, M. et al. — Tree of Science (ToS): A web-based tool for scientific literature recommendation. Search less, research more. Issues in Science and Technology Librarianship, no. 100, 2022. journals.library.ualberta.ca
[36]
Huang, W., Zhao, X., & Huang, X. — Embedding and extraction of knowledge in tree ensemble classifiers. Machine Learning, vol. 111, no. 5, pp. 1925–1958, 2022. doi.org/10.1007/s10994-021-06068-6
[37]
Yu, S. et al. — Multi-source knowledge pruning for retrieval-augmented generation: A benchmark and empirical study, 2024. arXiv:2409.13694
[38]
Fan, S. et al. — WorkflowLLM: Enhancing workflow orchestration capability of large language models, 2024. arXiv:2411.05451
[39]
Hong, Z. et al. — Challenges and advances in information extraction from scientific literature: A review. JOM, vol. 73, no. 11, pp. 3383–3400, 2021. doi.org/10.1007/s11837-021-04902-9
[40]
Xie, T. et al. — ByteScience: Bridging unstructured scientific literature and structured data with auto fine-tuned large language model in token granularity, 2024. arXiv:2411.12000
Funding acknowledgment: This research was supported in part by NIH IDeA grant P20GM103408, NSF CSSI grant OAC 2410668, and US Department of Energy grant DE-0011014.
RankWithMe.ai logo
SYSTEM STATUS
Page: RESEARCH
Date: 2026.03.03
Paper: WISE
Authors: RUBAIAT · JAMIL
Institution: U OF IDAHO
Domain: cs.IR · cs.AI
arXiv: 2506.17580
Recall (WISE): 0.84
Baseline Top: 0.47
Text Reduction: 80%+
JSON-LD: PENDING
Root-LD: PENDING
Human Verified: TRUE
Build: 2026-PROD Method: ENTITY-FIRST Status: OPERATIONAL
Structure before ads. Always.