Climate Change Tweets Dataset: 21.5 Million Tweets with Stance (believer/denier) Annotations (2017–2019)
Abstract
This dataset contains tweet IDs and stance annotations for 21,499,785 English-language tweets related to climate change, collected between September 2017 and May 2019. Due to the Twitter/X Terms of Service, only tweet IDs and annotation labels are provided; researchers wishing to use this dataset must hydrate the tweet IDs using the Twitter/X API (or compatible tools such as Hydrator or twarc) to obtain the full tweet content and metadata.

Data collection. Tweets were collected from the Twitter streaming and search APIs using the following climate-related keywords and hashtags: climatechange, #climatechangeisreal, #actonclimate, #globalwarming, #climatechangehoax, #climatedeniers, #climatechangeisfalse, #globalwarminghoax, #climatechangenotreal, climate change, global warming, climate hoax. The collection spans 20 months (September 2017 – May 2019), covering notable events including COP23 (November 2017), COP24 (December 2018), and the emergence of the Fridays for Future / climate strike movement (early 2019). All tweets are in English.

Annotation. A subset of 2,373,712 tweets (11.04% of the corpus) was annotated for stance toward climate change by 12 independent human annotators. Each annotator assigned a score on a scale from 1 (strong denial of climate change) to 9 (strong belief in climate change). The final numeric score (believer_num) is the average across all annotators for a given tweet. The categorical label (believer) was derived by rounding the average score and applying the following thresholds: round(believer_num) ≤ 4 → "denier"; round(believer_num) = 5 → "neutral"; round(believer_num) ≥ 6 → "believer".

Label distribution (annotated subset). Among the 2,373,712 labelled tweets, 2,076,428 (87.48%) were classified as "believer", 205,479 (8.66%) as "denier", and 91,805 (3.87%) as "neutral". The median believer_num score is 8.0 and the mean is 7.57, reflecting a strong skew toward climate-affirming content in the collected corpus.

Temporal coverage.
Tweet volumes vary by month, with peaks in December 2017 (2.23M tweets), December 2018 (2.84M), April 2019 (2.04M), and May 2019 (2.40M). Some months (July 2018, October 2018, February–March 2019) are absent from the dataset.

Files provided.

climate_tweets_ids_labels_ALL.csv: 21,499,785 rows containing tweet ID, believer_num (float, NaN if unlabelled), and believer (categorical label, NaN if unlabelled). File size: ~465 MB.

climate_tweets_ids_labels_LABELLED.csv: 2,373,712 rows containing only tweets that received stance annotations. File size: ~73 MB.

Both files use CSV format with three columns: id (int64, Twitter status ID), believer_num (float64, average annotator score on a 1–9 scale), and believer (string: "believer", "denier", or "neutral").
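
Usage sketch. The following is a minimal illustration, not the authors' processing code, of how the three-column layout and the rounding thresholds described above might be read and checked with the Python standard library. The helper names (label_from_score, read_rows), the sample IDs and scores, and the handling of unlabelled rows as either an empty field or the text "NaN" are assumptions made for illustration. The description does not say how ties such as 4.5 were rounded, so the sketch simply uses Python's built-in round(), which rounds halves to the nearest even integer.

```python
import csv
import io
import math
from collections import Counter


def label_from_score(believer_num):
    """Map an average annotator score (1-9 scale) to a stance label.

    Assumption: ties such as 4.5 follow Python's built-in round(), which
    rounds half to even; the dataset description does not state a tie rule.
    """
    if believer_num is None or math.isnan(believer_num):
        return None  # unlabelled tweet
    rounded = round(believer_num)
    if rounded <= 4:
        return "denier"
    if rounded == 5:
        return "neutral"
    return "believer"


def read_rows(handle):
    """Yield (id, believer_num, believer) tuples from an open CSV handle.

    Assumption: unlabelled rows carry an empty field or the text "NaN".
    """
    for row in csv.DictReader(handle):
        raw_score = row["believer_num"].strip()
        is_missing = raw_score == "" or raw_score.lower() == "nan"
        score = None if is_missing else float(raw_score)
        label = row["believer"].strip()
        if label == "" or label.lower() == "nan":
            label = None
        yield int(row["id"]), score, label


# Hypothetical rows in the documented column layout (not real tweet IDs).
SAMPLE = """id,believer_num,believer
1001,8.25,believer
1002,3.0,denier
1003,5.2,neutral
1004,NaN,NaN
"""

rows = list(read_rows(io.StringIO(SAMPLE)))
labelled = [r for r in rows if r[1] is not None]

# Label counts over the annotated rows only.
counts = Counter(label for _, _, label in labelled)

# Check that the stored label matches the label re-derived from the score.
consistent = all(label_from_score(score) == label for _, score, label in labelled)

print(counts)
print("labels consistent with thresholds:", consistent)
```

To run the same check on the real data, the in-memory sample could be replaced with open("climate_tweets_ids_labels_LABELLED.csv", newline="", encoding="utf-8"); because read_rows is a generator, the file is streamed rather than loaded into memory at once.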