The transfer runs every night as user gbol-dev. There are two scripts: one that first gathers the BOLD data and then performs the transfer, and one that performs the transfer without querying the BOLD data from boldsystems.com. The first runs on Sunday and Thursday nights, the second on the other weekdays.
The cron job needs a .my.cnf file with the MySQL credentials and the two scripts transfer2portal_with_bold.sh and transfer2portal_without_bold.sh.
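The schedule described above could look like this in the gbol-dev crontab. This is an illustrative sketch only: the run time and the script paths are assumptions, not taken from the actual installation.

```crontab
# Illustrative crontab for user gbol-dev -- time and paths are assumptions.
# Sunday (0) and Thursday (4) nights: gather BOLD data first, then transfer.
30 2 * * 0,4 /home/gbol-dev/transfer2portal_with_bold.sh >> /home/gbol-dev/transfer.log 2>&1
# All other weekdays: transfer without querying BOLD.
30 2 * * 1-3,5,6 /home/gbol-dev/transfer2portal_without_bold.sh >> /home/gbol-dev/transfer.log 2>&1
```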
So that the BOLD people can retrieve our data, the following instructions were passed on to Dirk Steinke () and Claudia Steinke ():
here are the requested access details for the access point for the GBOL arachnids. It is a BioCASE installation (www.biocase.org); the data are mapped to the ABCD-GGBN standard (http://data.ggbn.org/schemas/GGBN/terms/ABCDGGBN.xsd). The URL of the more or less convenient query interface is:
http://biocase.zfmk.de/utilities/queryforms/qf_manual.cgi?dsa=ZFMK-GBOL-v2-bold.
<?xml version='1.0' encoding='UTF-8'?>
<request xmlns='http://www.biocase.org/schemas/protocol/1.3'>
  <header><type>scan</type></header>
  <scan>
    <requestFormat>http://www.tdwg.org/schemas/abcd/2.1</requestFormat>
    <concept>/DataSets/DataSet/Metadata/Description/Representation/Title</concept>
    <filter>
      <like path='/DataSets/DataSet/Metadata/Description/Representation/Title'>*</like>
    </filter>
  </scan>
</request>
<?xml version='1.0' encoding='UTF-8'?>
<request xmlns='http://www.biocase.org/schemas/protocol/1.3'>
  <header><type>search</type></header>
  <search>
    <requestFormat>http://www.tdwg.org/schemas/abcd/2.1</requestFormat>
    <responseFormat start='0' limit='3000'>http://www.tdwg.org/schemas/abcd/2.1</responseFormat>
    <filter>
      <equals path='/DataSets/DataSet/Metadata/Description/Representation/Title'>GBAra</equals>
    </filter>
    <count>false</count>
  </search>
</request>
To get the number of available entries, set count to true.
Paging is done via responseFormat start='0' limit='3000': start is the record offset and limit the page size, e.g. start=0, limit=1000, then start=1000, limit=1000, and so on.
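As a sketch of that paging scheme (assuming, as in the BioCASE protocol, that start is the record offset and limit the page size), the search requests for successive pages could be generated like this; the dataset title GBAra is taken from the example above:

```python
# Template for a BioCASE search request for one page of results.
# start = record offset, limit = page size.
REQUEST_TEMPLATE = """<?xml version='1.0' encoding='UTF-8'?>
<request xmlns='http://www.biocase.org/schemas/protocol/1.3'>
  <header><type>search</type></header>
  <search>
    <requestFormat>http://www.tdwg.org/schemas/abcd/2.1</requestFormat>
    <responseFormat start='{start}' limit='{limit}'>http://www.tdwg.org/schemas/abcd/2.1</responseFormat>
    <filter>
      <equals path='/DataSets/DataSet/Metadata/Description/Representation/Title'>{title}</equals>
    </filter>
    <count>false</count>
  </search>
</request>"""

def paged_requests(title, page_size=1000, pages=3):
    """Yield (start, request_xml) pairs for successive result pages."""
    for page in range(pages):
        start = page * page_size
        yield start, REQUEST_TEMPLATE.format(start=start, limit=page_size, title=title)
```

Each yielded document can then be POSTed to the access point until a page comes back with fewer than page_size records.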
You can also use wget or other command-line tools to query the data.
Example:
wget --output-document=response.xml --save-headers --ignore-length --post-file=scan_projects.txt 'http://biocase.zfmk.de/pywrapper.cgi?dsa=ZFMK-GBOL-v2-bold'
The file "scan_projects.txt" has the following content:
query=<?xml version='1.0' encoding='UTF-8'?> <request xmlns='http://www.biocase.org/schemas/protocol/1.3'> <header><type>scan</type></header> <scan> <requestFormat>http://www.tdwg.org/schemas/abcd/2.1</requestFormat> <concept>/DataSets/DataSet/Metadata/Description/Representation/Title</concept> <filter> <like path='/DataSets/DataSet/Metadata/Description/Representation/Title'>*</like> </filter> </scan> </request>
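The same POST can also be issued without an intermediate file, for example from Python. This is a sketch under the assumption that the pywrapper endpoint accepts the request as a regular URL-encoded form field named query (here the XML is percent-encoded, unlike the raw scan_projects.txt above); the URL is the one from the wget example.

```python
from urllib.parse import quote_plus
from urllib.request import Request, urlopen

# The same scan request as in scan_projects.txt.
SCAN_XML = (
    "<?xml version='1.0' encoding='UTF-8'?>"
    "<request xmlns='http://www.biocase.org/schemas/protocol/1.3'>"
    "<header><type>scan</type></header>"
    "<scan>"
    "<requestFormat>http://www.tdwg.org/schemas/abcd/2.1</requestFormat>"
    "<concept>/DataSets/DataSet/Metadata/Description/Representation/Title</concept>"
    "<filter><like path='/DataSets/DataSet/Metadata/Description/Representation/Title'>*</like></filter>"
    "</scan></request>"
)

# POST body: 'query=' plus the URL-encoded request XML.
body = "query=" + quote_plus(SCAN_XML)

def fetch_scan(url="http://biocase.zfmk.de/pywrapper.cgi?dsa=ZFMK-GBOL-v2-bold"):
    """Send the scan request and return the raw XML response text."""
    req = Request(url, data=body.encode("utf-8"),
                  headers={"Content-Type": "application/x-www-form-urlencoded"})
    with urlopen(req) as resp:
        return resp.read().decode("utf-8")
```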