ORKG Properties and LLM-Generated Research Dimensions Evaluation Dataset

doi:10.25835/6oyn9d1n

This dataset contains 103 research comparisons from the Open Research Knowledge Graph (ORKG). For each comparison it provides the human-annotated properties and the research dimensions generated by three Large Language Models (LLMs). The dataset includes 1,317 papers from 35 research fields, addressing 153 distinct research problems. Each paper is associated with its human-annotated ORKG properties and with the research dimensions generated by GPT-3.5, Llama 2, and Mistral. The dataset serves as an evaluation benchmark for comparing how closely each LLM's generated research dimensions align with the human-annotated properties.

Dataset columns:

comparison_id: Unique identifier of the research comparison in the Open Research Knowledge Graph (ORKG)
contribution_id: Identifier of the individual research contribution (paper) within a comparison
paper_id: Unique identifier of the research paper
paper_title: Title of the research paper
research_field: Field of research associated with the paper
research_problem: Specific research problem addressed by the paper
orkg_properties: Human-annotated properties of the paper in the ORKG, representing specific attributes or characteristics of the research contribution
gpt_dimensions: Research dimensions generated by the GPT-3.5 LLM for the paper
mistral_dimensions: Research dimensions generated by the Mistral LLM for the paper
llama2_dimensions: Research dimensions generated by the Llama 2 LLM for the paper
mappings: Mapping of ORKG properties to LLM-generated research dimensions
alignments: Alignment scores between ORKG properties and LLM-generated research dimensions
deviations: Deviation scores between ORKG properties and LLM-generated research dimensions
orkg_gpt_similarity: Cosine similarity score between the embeddings of ORKG properties and GPT-generated research dimensions
orkg_llama2_similarity: Cosine similarity score between the embeddings of ORKG properties and Llama2-generated research dimensions
orkg_mistral_similarity: Cosine similarity score between the embeddings of ORKG properties and Mistral-generated research dimensions
gpt_llama2_similarity: Cosine similarity score between the embeddings of GPT-generated and Llama2-generated research dimensions
gpt_mistral_similarity: Cosine similarity score between the embeddings of GPT-generated and Mistral-generated research dimensions
llama2_mistral_similarity: Cosine similarity score between the embeddings of Llama2-generated and Mistral-generated research dimensions
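The six similarity columns compare two embedding vectors by the cosine of the angle between them, cos(a, b) = (a · b) / (|a| |b|), which ranges from -1 to 1. This page does not state which embedding model produced the vectors or how a list of properties or dimensions was turned into a single vector. The sketch below therefore only illustrates the final similarity step. The function name `cosine_similarity` and the toy vectors are hypothetical and are not taken from the dataset.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Return (a . b) / (|a| |b|) for two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        # The angle (and hence the cosine) is undefined for a zero vector.
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)


# Toy 3-dimensional vectors standing in for real embeddings
# (hypothetical values, not from the dataset).
orkg_embedding = [0.2, 0.7, 0.1]
gpt_embedding = [0.25, 0.6, 0.2]
print(round(cosine_similarity(orkg_embedding, gpt_embedding), 3))
```

Real sentence embeddings typically have hundreds of dimensions, but the formula is the same. A score near 1 means the two texts point in nearly the same direction in embedding space.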

Data and Resources

ORKG_properties_LLM_dimensions_dataset.csv (CSV)
File size: 455.6 KByte
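The following is a minimal sketch for loading the CSV with Python's standard library and averaging each similarity column across all rows. It assumes three things: the file has a header row using the column names listed above, it is UTF-8 encoded, and the similarity cells hold plain numbers. The helper name `mean_similarities` is hypothetical. Empty or non-numeric cells are skipped rather than counted as zero.

```python
import csv
import io
from statistics import mean
from typing import Dict, List, TextIO

# Similarity column names as listed in the column description.
SIMILARITY_COLUMNS = [
    "orkg_gpt_similarity",
    "orkg_llama2_similarity",
    "orkg_mistral_similarity",
    "gpt_llama2_similarity",
    "gpt_mistral_similarity",
    "llama2_mistral_similarity",
]


def mean_similarities(csv_file: TextIO) -> Dict[str, float]:
    """Average each similarity column, skipping missing or non-numeric cells."""
    values: Dict[str, List[float]] = {col: [] for col in SIMILARITY_COLUMNS}
    for row in csv.DictReader(csv_file):
        for col in SIMILARITY_COLUMNS:
            try:
                values[col].append(float(row[col]))
            except (KeyError, TypeError, ValueError):
                continue  # column absent, short row (None), or empty/non-numeric cell
    # Only report columns that had at least one numeric value.
    return {col: mean(vals) for col, vals in values.items() if vals}


# Demo on a tiny in-memory CSV with hypothetical values (not from the dataset).
demo = io.StringIO(
    "comparison_id,orkg_gpt_similarity,orkg_mistral_similarity\n"
    "R1,0.80,0.60\n"
    "R2,0.70,\n"
)
print(mean_similarities(demo))
```

To use it on the real file, open ORKG_properties_LLM_dimensions_dataset.csv with newline="" and encoding="utf-8" and pass the file object to mean_similarities. Reading through csv.DictReader keeps quoted cells intact, which matters here because the property and dimension columns are free text that may contain commas.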

Cite this as

Vladyslav Nechakhin, Jennifer D’Souza (2024). ORKG Properties and LLM-Generated Research Dimensions Evaluation Dataset [Data set]. LUIS. https://doi.org/10.25835/6oyn9d1n

Retrieved: June 06, 2024, 10:19 AM (UTC+0200)


Additional Info

Field	Value
Author	Vladyslav Nechakhin, Jennifer D’Souza
Maintainer	Vladyslav Nechakhin
Last Updated	April 29, 2024, 11:58 (+0200)
Created	April 29, 2024, 11:36 (+0200)
License	Creative Commons Attribution-ShareAlike 3.0 (CC BY-SA 3.0)
Dataset Size	455.6 KByte