Uploaded On | May 4, 2012, 5:29 p.m.
Uploaded By | matts
Status | Production (approved on May 4, 2012, 5:29 p.m. by matts)
<map>
<entry>
<string>plugin_config_props</string>
<list>
<org.lockss.daemon.ConfigParamDescr>
<key>month</key>
<displayName>Month</displayName>
<description>Two digit month</description>
<type>2</type>
<size>2</size>
<definitional>true</definitional>
<defaultOnly>false</defaultOnly>
</org.lockss.daemon.ConfigParamDescr>
<org.lockss.daemon.ConfigParamDescr>
<key>base_url</key>
<displayName>Base URL</displayName>
<description>Usually of the form http://&lt;journal-name&gt;.com/</description>
<type>3</type>
<size>40</size>
<definitional>true</definitional>
<defaultOnly>false</defaultOnly>
</org.lockss.daemon.ConfigParamDescr>
<org.lockss.daemon.ConfigParamDescr>
<key>year</key>
<displayName>Year</displayName>
<description>Four digit year (e.g., 2004)</description>
<type>4</type>
<size>4</size>
<definitional>true</definitional>
<defaultOnly>false</defaultOnly>
</org.lockss.daemon.ConfigParamDescr>
</list>
</entry>
<entry>
<string>plugin_version</string>
<string>1</string>
</entry>
<entry>
<string>au_name</string>
<string>"Virginia Tech ETD's by Month- %d - %d", year, month</string>
</entry>
<entry>
<string>au_crawl_depth</string>
<int>5</int>
</entry>
<entry>
<string>au_start_url</string>
<string>"%slockss/manifest.html", base_url</string>
</entry>
<entry>
<string>au_def_pause_time</string>
<long>6000</long>
</entry>
<entry>
<string>au_def_new_content_crawl</string>
<long>31536000000</long>
</entry>
<entry>
<string>plugin_notes</string>
<string>The expected base URL is http://scholar.lib.vt.edu/theses/
The configuration parameters are a four-digit year and a two-digit month.
The ETD's@VT collection is split into three collections within Conspectus because a complete crawl requires three different plugins. The first collection, ETD's@VT, is divided into AUs by year and harvests available, restricted, and withheld ETDs from 1997 to 2008 inclusive. The second collection, monthly ETD's@VT, is divided into AUs by year and month and harvests available, restricted, and withheld ETDs from 2009 on. Each individual ETD has a folder in either theses/available or theses/withheld. Restricted ETDs are in the available folder but are readable only by Virginia Tech IPs and MA nodes. Only IPs on the MA list of servers are allowed to get a directory listing of theses/withheld; this is updated by a cron job.
The naming convention for the majority of past and future ETDs follows a variant of the format /etd-mmddyyyy-tttttt, based on the timestamp at which they are added to the collection.
Anything that does not match the above structure is harvested by a separate collection and plugin, called ETD's@VT - pre 2000 unsorted and edu.vt.library.thesesearly, respectively. This third collection is static, as no new ETDs are being added under the old naming conventions. It also harvests the non-ETD content in the /theses directory, since it merely excludes pages that follow the above format.</string>
</entry>
<entry>
<string>plugin_name</string>
<string>monthly ETD's@VT</string>
</entry>
<entry>
<string>plugin_identifier</string>
<string>edu.vt.library.monthlytheses</string>
</entry>
<entry>
<string>au_crawlrules</string>
<list>
<string>4,"^%s", base_url</string>
<string>1,"%slockss/manifest.html$", base_url</string>
<string>1,"^%swithheld/?$", base_url</string>
<string>1,"%sbrowse/by_author/all.html", base_url</string>
<string>2,"/\?"</string>
<string>1,"%savailable/etd-%.02d+[0-9]+%d-[0-9]+/.*", base_url, month, year</string>
<string>1,"%swithheld/etd-%.02d+[0-9]+%d-[0-9]+/.*", base_url, month, year</string>
</list>
</entry>
<entry>
<string>plugin_crawl_type</string>
<string>HTML Links</string>
</entry>
</map>
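
The month-scoped crawl rules above are printf-style templates that are filled in with the AU parameters and then used as regular expressions. The following is a minimal Python sketch of that expansion, not the LOCKSS daemon's own implementation. The helper name `expand_rule`, the escaping of `base_url`, and the sample ETD folder names are assumptions made for illustration; the base URL is the one given in the plugin notes.

```python
import re

def expand_rule(template, base_url, *values):
    """Fill a printf-style crawl-rule template and compile it as a regex.

    Assumption: base_url is regex-escaped so that "." matches literally.
    """
    return re.compile(template % ((re.escape(base_url),) + values))

# Template copied from the "available" crawl rule; month=5, year=2012.
AVAILABLE = "%savailable/etd-%.02d+[0-9]+%d-[0-9]+/.*"
BASE_URL = "http://scholar.lib.vt.edu/theses/"
rule = expand_rule(AVAILABLE, BASE_URL, 5, 2012)

# Hypothetical ETD folder names following the etd-mmddyyyy-tttttt convention.
in_au = BASE_URL + "available/etd-05042012-172900/index.html"
other_month = BASE_URL + "available/etd-06042012-172900/index.html"
other_year = BASE_URL + "available/etd-05042011-172900/index.html"

for url in (in_au, other_month, other_year):
    print(url, "matches" if rule.search(url) else "does not match")
```

In this sketch the month is zero-padded to two digits, so only folders whose name starts with the configured month and whose date ends in the configured year fall inside the AU.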