Drowning in Discoveries? How LLMs (and Morphik) Are Learning to Read Science
We crafted a single entity-extraction prompt with few-shot examples (`EntityExtractionExample`) and applied it identically to GPT-4o, Gemini 2 Flash, and Llama 3.2 via `GraphPromptOverrides`. This consistency is vital for a fair comparison: any difference in the resulting graphs reflects the model, not the prompt. An illustrative prompt setup follows.
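Here is a minimal sketch of that setup against Morphik's Python SDK; the graph name, document ID, entity examples, and template wording are illustrative placeholders, not our exact benchmark prompt:

```python
from morphik import Morphik
from morphik.models import (
    EntityExtractionExample,
    EntityExtractionPromptOverride,
    GraphPromptOverrides,
)

db = Morphik()  # picks up connection details from the environment

# One shared prompt: the same template and few-shot examples go to every
# model, so any difference in the graphs reflects the model, not the prompt.
overrides = GraphPromptOverrides(
    entity_extraction=EntityExtractionPromptOverride(
        prompt_template=(
            "Extract scientific entities and their relationships "
            "from the text below.\n{examples}\n\nText: {content}"
        ),
        examples=[
            EntityExtractionExample(label="CRISPR-Cas9", type="METHOD"),
            EntityExtractionExample(label="TP53", type="GENE"),
        ],
    )
)

# One graph per model run; swap the configured model and repeat.
graph = db.create_graph(
    name="paper_graph_gpt4o",     # illustrative name
    documents=["paper_doc_id"],   # illustrative document ID
    prompt_overrides=overrides,
)
```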
Each run then came down to a single `db.create_graph` call, which streamlined the generation of comparable knowledge graph snippets from each model.

To score the outputs, we embedded the extracted entities and relations with OpenAI's `text-embedding-3-small` model and compared their meaning against the ground truth, counting a match when the cosine similarity was 0.80 or higher. This threshold tolerates variations in wording while focusing on conceptual accuracy. Morphik's structured output made it straightforward to feed the data into our evaluation script; the matching step is sketched below.
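Conceptually, the match test is just an embedding call plus a cosine-similarity check against the 0.80 threshold. A minimal sketch, assuming the official `openai` Python client (the helper names are ours, and the real script batches embeddings and tallies precision and recall per model):

```python
import numpy as np
from openai import OpenAI

client = OpenAI()
MATCH_THRESHOLD = 0.80  # cosine similarity at or above this counts as a match


def embed(texts: list[str]) -> np.ndarray:
    """Embed a batch of strings with text-embedding-3-small."""
    response = client.embeddings.create(
        model="text-embedding-3-small", input=texts
    )
    return np.array([item.embedding for item in response.data])


def is_match(extracted: str, ground_truth: str) -> bool:
    """True when two phrases are conceptually equivalent under the threshold."""
    a, b = embed([extracted, ground_truth])
    cosine = float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))
    return cosine >= MATCH_THRESHOLD


# Wording differs, concept matches:
print(is_match("CRISPR-Cas9 gene editing", "CRISPR/Cas9 genome editing"))
```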
| Model | Task | Precision | Recall | F1 Score |
| --- | --- | --- | --- | --- |
| GPT-4o | Entity Extraction | 0.797 | 0.958 | 0.870 |
| GPT-4o | Relation Extraction | 0.337 | 0.236 | 0.278 |
| Gemini 2 Flash | Entity Extraction | 0.758 | 0.944 | 0.841 |
| Gemini 2 Flash | Relation Extraction | 0.398 | 0.331 | 0.362 |
| Llama 3.2 (3B) | Entity Extraction | 0.801 | 0.812 | 0.807 |
| Llama 3.2 (3B) | Relation Extraction | 0.300 | 0.162 | 0.211 |
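(For reference, F1 is the harmonic mean of precision and recall, F1 = 2PR / (P + R); GPT-4o's entity row checks out as 2 × 0.797 × 0.958 / (0.797 + 0.958) ≈ 0.870.)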