Published February 7, 2023 | Version 1.1
Dataset
Restricted
PAN23 Profiling Cryptocurrency Influencers with Few-shot Learning
- 1. Symanto Research
- 2. Universitat Politècnica de València
Description
This is the dataset for the shared task on Profiling Cryptocurrency Influencers with Few-shot Learning. Please consult the task's page for further details on the format, the dataset's creation, and links to baselines and utility code.
Task: In this shared task, we aim to profile cryptocurrency influencers on social media from a low-resource perspective. We also propose to categorize other related aspects of the influencers, again in a low-resource setting. Specifically, we focus on English Twitter posts in three subtasks:
- Low-resource influencer profiling (subtask1):
  - Input: 32 users per label with a maximum of 10 English tweets each.
  - Classes: (1) null, (2) nano, (3) micro, (4) macro, (5) mega
  - Official evaluation metric: Macro F1
  - Submission: TIRA.
  - Baselines: User-character Logistic Regression; t5-large (bi-encoders) - zero shot [7], t5-large (label tuning) - few shot [7]
- Low-resource influencer interest identification (subtask2):
  - Input: 64 users per label with 1 English tweet each.
  - Classes: (1) technical information, (2) price update, (3) trading matters, (4) gaming, (5) other
  - Official evaluation metric: Macro F1
  - Submission: TIRA.
  - Baselines: User-character Logistic Regression; t5-large (bi-encoders) - zero shot [7], t5-large (label tuning) - few shot [7]
- Low-resource influencer intent identification (subtask3):
  - Input: 64 users per label with 1 English tweet each.
  - Classes: (1) subjective opinion, (2) financial information, (3) advertising, (4) announcement
  - Official evaluation metric: Macro F1
  - Submission: TIRA.
  - Baselines: User-character Logistic Regression; t5-large (bi-encoders) - zero shot [7], t5-large (label tuning) - few shot [7]
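All three subtasks are scored with Macro F1, which averages the per-class F1 scores without weighting by class frequency, so every class counts equally. The following is a minimal sketch of that computation, assuming the standard unweighted definition; the official evaluator may differ in details, and the function name `macro_f1` and the toy predictions are purely illustrative.

```python
from collections import Counter


def macro_f1(y_true, y_pred, labels=None):
    """Unweighted mean of per-class F1 scores (illustrative helper)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if labels is None:
        labels = sorted(set(y_true) | set(y_pred))

    # Count true positives, false positives and false negatives per class.
    tp, fp, fn = Counter(), Counter(), Counter()
    for gold, pred in zip(y_true, y_pred):
        if gold == pred:
            tp[gold] += 1
        else:
            fp[pred] += 1
            fn[gold] += 1

    scores = []
    for label in labels:
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the class never occurs.
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy example using the subtask1 class names; the predictions are made up.
subtask1_labels = ["null", "nano", "micro", "macro", "mega"]
gold = ["null", "nano", "micro", "macro", "mega", "nano"]
pred = ["null", "nano", "nano", "macro", "mega", "micro"]
score = macro_f1(gold, pred, labels=subtask1_labels)
print(round(score, 3))
```

Passing the full label list explicitly means a class that a system never predicts still contributes an F1 of 0 to the average, which matters in few-shot settings where rare classes are easy to miss.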
Versioning:
- 1.0: initial upload
- 1.1: fixed a minor bug where the data for some users contained non-English text. Since English is the target language in the competition, all non-English texts have been replaced or removed.