Software Developer (m/f/d) job offer at @tibhannover: building a knowledge graph for the UN IPCC Reports https://www.tib.eu/en/tib/careers-and-apprenticeships/vacancies/details/stellenausschreibung-nr-20-2025 Deadline: April 20 #FediHire #fedijobs #getfedihired #ClimateKG #semanticClimate - Here's a breakdown of what we'll be doing over the next year - 1/9
2/9 First, the foundation: converting PDFs to semantified HTML with IDs, stored on GitHub. For #openaccess content, team #semanticClimate has already made several passes - Status #alpha - https://github.com/petermr/amilib/tree/pmr_aug/test/resources/ipcc/cleaned_content The "turning burgers back into cows" process is in an advanced state!
3/9 Next is a top-level data structure for the Reports: a Table of Contents (visualisation) of the 70 chapters of the IPCC Sixth Assessment Report (AR6). Status #Alpha. Using #graphviz in Python, the chapters and their connections are mapped automatically #semanticClimate #ClimateKG https://github.com/semanticClimate/internship_sC/tree/MEBIN/TOC The TOC structure is designed to give a clear, hierarchical representation of report sections, including chapters, glossaries, supplemental materials, and cross-chapter references.
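The chapter mapping described in 3/9 can be sketched roughly as follows. This is a minimal illustration, not the code in the linked repository: the sample TOC entries, the cross-reference, and the helper name `toc_to_dot` are all assumptions made up for the example. It uses only the Python standard library and emits Graphviz DOT text, with solid edges for the hierarchy and dashed edges for cross-chapter references.

```python
# Illustrative TOC data: the entries and the cross-reference are assumptions,
# not taken from the semanticClimate repository.
toc = {
    "AR6": ["WGI", "WGII"],
    "WGI": ["WGI Chapter 1", "WGI Chapter 2", "WGI Glossary"],
    "WGII": ["WGII Chapter 1"],
}
cross_refs = [("WGI Chapter 2", "WGII Chapter 1")]


def toc_to_dot(toc, cross_refs, name="toc"):
    """Return Graphviz DOT source for a TOC hierarchy plus cross-references."""
    lines = [f"digraph {name} {{", "  rankdir=LR;", "  node [shape=box];"]
    # Solid edges: parent section -> child section.
    for parent, children in toc.items():
        for child in children:
            lines.append(f'  "{parent}" -> "{child}";')
    # Dashed edges: cross-chapter references that should not affect the layout ranks.
    for source, target in cross_refs:
        lines.append(f'  "{source}" -> "{target}" [style=dashed, constraint=false];')
    lines.append("}")
    return "\n".join(lines)


dot_source = toc_to_dot(toc, cross_refs)
print(dot_source)
```

If the output is saved as `toc.dot`, the Graphviz command line can render it, for example with `dot -Tsvg toc.dot -o toc.svg`.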
4/9 IPCC Glossary semantification and enhancement. The 800+ terms from AR5 and AR6 have been through several experimental rounds: screenscraped to GitHub, loaded into a Wikibase install, demonstrated as an index for tagging documents, rendered to Paged Media CSS with Wikipedia enhancements, and translated into Hindi. #ClimateKG #semanticClimate - In #wikibase https://climatekg.semanticclimate.net/index.php?title=IPCC_Begriffe and as Paged Media CSS using #vivliostyle https://vivliostyle.vercel.app/#src=https://raw.githubusercontent.com/semanticClimate/glossary-demo/main/ipccglossary.jsonld
5/9 Data relationship model: Status: Pre-alpha - This will be the core of the 2025 work. The content has been prepped and the #Wikibase infrastructure is ready. The plan is to #KISS and build up a simple framework for the #climate community to connect to. Experiments have been made on #Wikibasecloud https://kg-ipclimatec-reports.wikibase.cloud/wiki/Main_Page - thanks to Egon Willighagen and Lars Willighagen
#ClimateKG #semanticClimate
6/9 The knowledge graph for the IPCC Reports is designed for derivative publishing. Searches can be saved, shared, and output in multiple formats: LOD, Web, PDF, etc. Status: Production - already in place and working, with pipeline options available - based on the #CPS Computational Publishing Service of @NFDI4Culture https://nfdi4culture.de/services/details/computational-publishing-service.html #Wikibase is used as the datastore, #Jupyternotebooks for composing, and @quarto_pub and #vivliostyle for rendering - pixel-perfect replacements for the dreaded PDF!
7/9 #ClimateKG foundational software: dictionaries, machine learning, etc. This all comes from team #semanticClimate. Status: Production. Here is a live demo that explains the tool chain https://colab.research.google.com/github/semanticClimate/sC-tools-demo/blob/main/TTWW_demo_sC_tools.ipynb for machine learning and #TBL 5*-ification of data - shout out to the developers: 1. pygetpapers - Ayush Garg 2. amilib - Peter Murray-Rust 3. docanalysis - Shweata N Hegde 4. Jupyter notebooks - Parijat Bhadra, Renu Kumari
8/9 The 'Goal': Dissemination of IPCC Reports and #climatechange literature using #semantification as a basis. #ClimateKG aims to be a service for indexing and publishing. Status: Prototype (PoC) #1: 'IPCC Reports and City Climate Change Plans: Proof of concept prototype - Open Climate Reader' https://semanticclimate.github.io/city-open-climate-reader/ - Current interfaces are for search and #RAG #LLM - all built on #openscience-based #digitalsovereign tech. The goal sits off in the future, over the horizon - but it's where we're going
9/9 How did we get here? #ClimateKG comes out of the five-year-old #semanticClimate (#sC) open research group founded by Dr. Gitanjali Yadav of the National Institute of Plant Genome Research (NIPGR), Delhi, Dr. Peter Murray-Rust of Cambridge University, and Simon Worthington. We work on software tool development for semantic enrichment. #semanticClimate is active daily as a community, and NIPGR supports an internship programme, hackathons, and outreach. Web: https://semanticclimate.github.io/