openbiblio.social is one of the many independent Mastodon servers you can use to participate in the fediverse.
The entry point to the fediverse for library people

@kiru Not on my current TODO list, but maybe next year for QA Catalogue: is only a part of the ~225 million MARC records in K10plus Zentral: verbundwiki.gbv.de/display/VZG

verbundwiki.gbv.de: K10plus-Zentral - VZG - Technische Informationen - GBV Verbund-Wiki

@nichtich Wow, I was not aware of it. 225 million is quite a large number; it would be worth applying parallelisation with Spark.

@kiru Does QA Catalogue run with any parallelisation at all? The CPU cores in my VM were not saturated but I have not looked deeper.

@nichtich By default, no. Some years ago I worked with it intensively, but there have been lots of changes in the code since, so I am not sure if it still works. Here are the details: pkiraly.github.io/2018/01/18/m. The description mentions Hadoop and Spark, but Hadoop is not necessary.

pkiraly.github.io: Running MARC21 analysis in Spark - Metadata Quality Assessment Framework
Jakob Voß

I don't know if Spark is required; this probably depends on the analysis task. I opened an issue on parallel execution in general: github.com/pkiraly/qa-catalogu

GitHub: Parallel processing · Issue #278 · pkiraly/qa-catalogue (by nichtich)
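
To illustrate the point that parallel execution need not involve Spark, here is a minimal sketch of chunked map-and-merge processing with Python's standard library. It is not QA Catalogue's code (which is written in Java) and makes several assumptions: a record is reduced to a plain list of MARC field tags, the analysis is a simple "how many records contain each tag" count, and the names `chunked`, `count_tags`, `count_tags_parallel`, and `demo` are hypothetical.

```python
from collections import Counter
from concurrent.futures import Executor, ThreadPoolExecutor
from itertools import islice
from typing import Iterable, Iterator, List

# Assumption: a record is simplified to the list of MARC field tags it contains.
Record = List[str]


def chunked(records: Iterable[Record], size: int) -> Iterator[List[Record]]:
    """Yield successive lists of at most `size` records."""
    iterator = iter(records)
    while chunk := list(islice(iterator, size)):
        yield chunk


def count_tags(chunk: List[Record]) -> Counter:
    """Count, for one chunk, how many records contain each tag."""
    counts: Counter = Counter()
    for record in chunk:
        counts.update(set(record))  # a repeated tag counts once per record
    return counts


def count_tags_parallel(
    records: Iterable[Record], executor: Executor, chunk_size: int = 1000
) -> Counter:
    """Hand chunks to the executor's workers and merge the partial counts."""
    total: Counter = Counter()
    for partial in executor.map(count_tags, chunked(records, chunk_size)):
        total.update(partial)  # Counter.update adds the counts together
    return total


def demo() -> Counter:
    """Run the sketch on three toy records with a two-worker thread pool."""
    records = [["001", "245"], ["001", "245", "650", "650"], ["001"]]
    with ThreadPoolExecutor(max_workers=2) as pool:
        return count_tags_parallel(records, pool, chunk_size=2)
```

The executor is passed in so that the same function can run on threads or on processes. For CPU-bound analysis in Python, a `concurrent.futures.ProcessPoolExecutor` (created under an `if __name__ == "__main__":` guard) would be the more likely choice for using several cores; whether a comparable per-chunk split fits a given QA Catalogue analysis depends on the task, as noted above.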