A novel medical large language model family with 13B/70B parameters that achieves SOTA performance on various medical tasks
Updated Jan 15, 2025 - Python
Cross-type Biomedical Named Entity Recognition with Deep Multi-task Learning (Bioinformatics'19)
Bioformer: an efficient BERT model for biomedical text mining
[EMNLP 2024] This is the code for our paper "BMRetriever: Tuning Large Language Models as Better Biomedical Text Retrievers".
A PMC ID in. Clean, loss-aware article JSON out. Parse PubMed Central and JATS XML for biomedical AI, RAG, search, and literature pipelines.
This repository contains the code used for distillation and fine-tuning of the compact biomedical transformers introduced in the paper "On The Effectiveness of Compact Biomedical Transformers".
Systematic evaluation of hallucination risks in Large Language Models (GPT-4, Claude 3, Gemini Pro) for clinical proteomics and mass spectrometry interpretation. Production-ready detection framework with comprehensive benchmarks.
Graph-based RAG system for biomedical nutrigenetic knowledge discovery. Enables natural language queries on gene-nutrient interactions, supports personalized nutrition counseling, and runs 100% locally with Ollama LLMs and SBERT embeddings.
BERT-for-BioNLP-OST2019-AGAC-Task2
RAG pipeline for medical question-answering. Fuses lexical and dense retrieval (MedCPT, Contriever, Specter + FAISS) with OpenAI, Gemini, and HuggingFace LLMs. Supports iterative multi-round reasoning, strict typing, structured observability, and a clean layered architecture
AGAC-BioNL-OST2009-Task1 BERT+CRF
Implements relation extraction for biomedical texts using Hard Negative Mining to improve accuracy in identifying complex entity relationships. Includes code for data processing, training, and evaluation with BioC-format datasets.
SOEA-Plus (PDEMC): 3-task biomedical metacognition benchmark evaluating LLM metacognitive control across 2 frontier models on 300 real PubMed examples. Reveals the Control Collapse Gap
Cancer-Alterome is a comprehensive and curated dataset that focuses on the investigation of regulatory events caused by gene alteration in the context of cancer.
Clinical trial document intelligence pipelines using medallion architecture. Classification (87 categories) + NER (8 entity types) on Databricks.
BioGemma: Google Gemma 3 1B fine-tuned on a medical/biomedical corpus for clinical NLP tasks