Through the Open Geoportal, Stanford, Tufts, Columbia, UC Berkeley, and Harvard are working together to document, archive, and serve all of the National Atlas data that went offline last fall. The project is near completion, and all the data and metadata will be available around February 2015.
Here is a summary of the Greater NYC OGP Meeting held on 12/13/2013 at NYU. We had a great turnout of around 65 participants from the area. The complete agenda, with links to presentations and the participant list, can be found on the Greater NYC OGP Meeting Event Page. We will be following up closely on specific action items.
Summary and Key Action Items
1. Prepare cost-benefit analysis statement for potential partners
2. Hold meeting for those interested / outline next steps
Jeremiah & Eric (Columbia), Frank (CUNY Baruch), Wangyal (Princeton), Alan Leidner (Booz Allen), Wendy (GISMO), Matt (NYPL), Him (NYU)
Determine others to invite
3. Establish regional hubs/nodes for Tri-state area
Greater NYC including Long Island, Westchester County, etc.
Tentative coordination by Columbia, NYU, CUNY
Upstate New York
Tentative coordination by Cornell?
Tentative coordination by Princeton. Others such as Rutgers?
Tentative coordination by UConn, Yale, Tufts.
Other organizations with compatible infrastructure can become a direct node
Coordinate data collection and ingest from smaller organizations
GISMO could potentially serve as organizing body
Coordinate collection development and metadata creation
Identify key data repositories for inclusion
4. Coordinate outreach for presenting case for benefits to community
Hold a webinar
5. Establish greater NYC OGP information architecture
Storage and interface separate
Explore single interface through hosted services
Develop and present cost analysis for various options
Tufts to set up cloud-hosted instance
Independence for NY node important
A wonderful review and summary of the Open Geoportal National Summit.
Notes from the Open Geoportal National Summit
by Frank Donnelly, Geospatial Data Librarian at Baruch College CUNY
Empowering the University of New Hampshire User Community with the Power of PLACE.
The University of New Hampshire Library and its partner, the Earth Systems Research Center, have been awarded a grant in the amount of $474,156 from the Institute for Museum and Library Services, National Leadership Grants for Libraries Program (Grant Award Number: LG-05-13-0350-13) to build PLACE, the Position-based Location Archive Coordinate Explorer. PLACE will be a geospatial search interface that will use embedded geospatial coordinates to enable easier discovery of information that can be difficult to locate through text-based searching. Through PLACE, by clicking or drawing a search polygon on a web map, users will zoom to a region and locate all UNH Library collections whose geographic extents intersect it. Initially, PLACE will provide access to geographic collections focused on the region, but it will be flexible and expandable as collections grow. The project will provide users with access to these collections through a flexible visual interface and provide a toolkit that other institutions can implement for their own geospatial collections. Ready access to embedded geospatial information in a flexible visual interface will contribute to the development of 21st-century skills by library users, such as visual, global, and environmental literacy.
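The extent-intersection search described above can be illustrated with a small sketch. This is not the PLACE implementation; the collection names and coordinates are hypothetical, extents are simplified to bounding boxes rather than arbitrary polygons, and boxes that cross the antimeridian are not handled.

```python
from typing import NamedTuple


class BBox(NamedTuple):
    """Geographic extent in decimal degrees (west, south, east, north)."""
    west: float
    south: float
    east: float
    north: float


def intersects(a: BBox, b: BBox) -> bool:
    """True when two boxes overlap or touch; assumes neither crosses the antimeridian."""
    return (a.west <= b.east and b.west <= a.east
            and a.south <= b.north and b.south <= a.north)


def search(collections: dict, query: BBox) -> list:
    """Return the names of collections whose extent intersects the query box."""
    return [name for name, extent in collections.items() if intersects(extent, query)]


# Hypothetical collections with rough, illustrative extents.
COLLECTIONS = {
    "Field trip guidebooks (northern area)": BBox(-71.9, 43.8, -71.0, 44.6),
    "Coastal maps": BBox(-71.0, 42.8, -70.6, 43.2),
}

# A search box drawn over the coast should match only the coastal collection.
print(search(COLLECTIONS, BBox(-70.9, 42.9, -70.7, 43.1)))
```

A production service would more likely delegate this test to a spatial index (for example, a spatial field type in Solr) than loop over every collection.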
The project will contribute to two open source communities: Open GeoPortal (OGP) and FEDORA. Tasks to accomplish our goals include creating standards-compliant metadata for prototype collections and ingesting digital objects into FEDORA, purchasing and configuring a dedicated server for our instance of OGP, and integrating OGP with the FEDORA Solr index to provide a basic level of OGP functionality. We will build new tools not currently available in OGP using JavaScript and jQuery. The universal gazetteer tool will involve a common library of polygons, such as county boundaries, which will be available via pull-down lists. Time series data is important for assessing changes over time: a cross-reference table and a time slider on the interface will make it easier for users to select datasets by time period. We plan usability studies throughout the project to optimize interface design, as well as enhancements for providing geospatial access to the unique geological field trip guidebook literature, a feature supported by our needs analysis.
Thelma Thompson, Associate Professor & Government Information and Maps Librarian
Eleta Exline, Assistant Professor & Scholarly Communication Librarian
(603) 862-4252; firstname.lastname@example.org
Michael Routhier, Information Technologist, Earth Systems Research Center
(603) 862-1792; email@example.com
Stanford University Libraries is in the beginning stages of developing metadata creation and management processes using the ISO 191** series of standards. The libraries currently maintain a large quantity of spatial data at the Stanford Geospatial Center, located in the Earth Sciences Library. Most of the metadata associated with these resources is provided in FGDC format; however, the metadata often lacks completeness and consistency in essential areas.
The initial focus will be on the implementation of the following schemata:
- ISO 19115: Geographic Information – Metadata
- ISO 19115-2: Geographic Information – Metadata – Part 2: Extensions for Imagery and Gridded Data
- ISO 19139: Geographic Information – Metadata – XML Schema Implementation
- ISO 19110: Geographic Information – Methodology for Feature Cataloguing
Use of ISO 19115-2 is preferred over ISO 19115 because 19115-2 contains the entire 19115 standard as well as extension elements for describing acquisition information particular to raster datasets. ISO 19115-2 also allows for enhanced lineage and process documentation. However, the decision on which standard to employ will largely depend upon the capabilities of the system(s) used to create and exchange metadata.
The ISO standard defines a number of fixed domain values and namespaces in order to support the normalization of information. Namespaces employ validation rules that can define input types for data entry such as free text character strings (gco:CharacterString), formatted dates (gco:Date), or decimal coordinates (gco:Decimal). Additionally, ISO relies heavily on the use of codelists in order to provide a standard set of terminologies that are defined within a common knowledge domain. For example, an institution could utilize the Resource Constraints metadata element to communicate and manage access rights using its Restriction code.
ResourceConstraints Element in ISO 19115-2:
<gmi:MI_Metadata... <gmd:resourceConstraints> <gmd:MD_LegalConstraints> <gmd:accessConstraints> <gmd:MD_RestrictionCode codeList="http://www.isotc211.org/2005/resources/Codelist/gmxCodelists.xml#MD_RestrictionCode" codeListValue="license" codeSpace="005">license</gmd:MD_RestrictionCode> </gmd:accessConstraints> ...</gmd:resourceConstraints>
Definition from ISO Codelist:
<CodeDefinition gml:id="MD_RestrictionCode_license"> <gml:description>formal permission to do something</gml:description> <gml:identifier>license</gml:identifier> </CodeDefinition>
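Because codelist values are carried in a predictable attribute, they are easy to read programmatically. The following is a minimal sketch, not part of Stanford's workflow, that pulls the access restriction codes out of a record using Python's standard library. The sample record is a trimmed version of the example above with the elided parts left out, and the function name is our own.

```python
import xml.etree.ElementTree as ET

# Namespace URIs of the 2005 ISO encoding (gmd) and the 19115-2 extension (gmi).
NS = {
    "gmd": "http://www.isotc211.org/2005/gmd",
    "gmi": "http://www.isotc211.org/2005/gmi",
}

# Trimmed sample record; a real record contains many more sections.
SAMPLE = """<gmi:MI_Metadata xmlns:gmi="http://www.isotc211.org/2005/gmi"
                 xmlns:gmd="http://www.isotc211.org/2005/gmd">
  <gmd:resourceConstraints>
    <gmd:MD_LegalConstraints>
      <gmd:accessConstraints>
        <gmd:MD_RestrictionCode
            codeList="http://www.isotc211.org/2005/resources/Codelist/gmxCodelists.xml#MD_RestrictionCode"
            codeListValue="license" codeSpace="005">license</gmd:MD_RestrictionCode>
      </gmd:accessConstraints>
    </gmd:MD_LegalConstraints>
  </gmd:resourceConstraints>
</gmi:MI_Metadata>"""


def access_restrictions(xml_text):
    """Return the codeListValue of every access-constraint restriction code."""
    root = ET.fromstring(xml_text)
    codes = root.findall(".//gmd:accessConstraints/gmd:MD_RestrictionCode", NS)
    return [code.get("codeListValue") for code in codes]


print(access_restrictions(SAMPLE))  # ['license']
```

Reading the codeListValue attribute rather than the element text is a deliberate choice here: the attribute holds the controlled term, while the text may be localized or left empty.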
A major departure from the FGDC-style metadata record is the encoding of entity and attribute information in a separate XML document using the ISO 19110 Feature Cataloging specification. This schema defines a collection along with its features and their attributes. The reference to the 19110 metadata is expressed using the xlink attribute in the ContentInfo section of the 19115-2 record. The xlink reference is generated using the universally unique identifier (uuid) assigned to each metadata file.
Example from ISO 19115-2:
<gmi:MI_Metadata... <gmd:contentInfo> <gmd:MD_FeatureCatalogueDescription> <gmd:featureCatalogueCitation xlink:href="http://stanford.edu/uuid?71g91be3-1ff9-438b-bd89-9d89ec0bddfd"/> </gmd:MD_FeatureCatalogueDescription> </gmd:contentInfo> ...</gmi:MI_Metadata>
Example from corresponding ISO 19110 record:
<gfc:FC_FeatureCatalogue xmlns="http://www.isotc211.org/2005/gfc" xmlns:gco="http://www.isotc211.org/2005/gco" xmlns:gfc="http://www.isotc211.org/2005/gfc" ...uuid="71g91be3-1ff9-438b-bd89-9d89ec0bddfd"> <gmx:name> <gco:CharacterString>Feature Catalog for Electric Transmission Lines in the United States - 1010 Update</gco:CharacterString> </gmx:name> ...</gfc:FC_FeatureCatalogue>
The standalone 19110 record allows for the feature catalog XML to be associated with one or more metadata records without having to duplicate the information inside of another document.
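The link between the two documents can be checked mechanically. The sketch below assumes, based only on the example URL above, that the uuid is whatever follows the `?` in the xlink:href; other systems may build the reference differently. Both sample documents are trimmed, and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET

GMD = "http://www.isotc211.org/2005/gmd"
XLINK = "http://www.w3.org/1999/xlink"

# Trimmed contentInfo section from a 19115-2 record.
RECORD = """<gmd:contentInfo xmlns:gmd="http://www.isotc211.org/2005/gmd"
                 xmlns:xlink="http://www.w3.org/1999/xlink">
  <gmd:MD_FeatureCatalogueDescription>
    <gmd:featureCatalogueCitation
        xlink:href="http://stanford.edu/uuid?71g91be3-1ff9-438b-bd89-9d89ec0bddfd"/>
  </gmd:MD_FeatureCatalogueDescription>
</gmd:contentInfo>"""

# Trimmed 19110 feature catalogue carrying the matching uuid attribute.
CATALOGUE = """<gfc:FC_FeatureCatalogue xmlns:gfc="http://www.isotc211.org/2005/gfc"
    uuid="71g91be3-1ff9-438b-bd89-9d89ec0bddfd"/>"""


def catalogue_uuids(record_xml):
    """Collect the uuids referenced by featureCatalogueCitation xlinks."""
    root = ET.fromstring(record_xml)
    uuids = []
    for citation in root.iter(f"{{{GMD}}}featureCatalogueCitation"):
        href = citation.get(f"{{{XLINK}}}href", "")
        # Assumption: the uuid is the query part of the URL, as in the example.
        if "?" in href:
            uuids.append(href.split("?", 1)[1])
    return uuids


def describes(record_xml, catalogue_xml):
    """True when the record links to this feature catalogue."""
    catalogue_uuid = ET.fromstring(catalogue_xml).get("uuid")
    return catalogue_uuid in catalogue_uuids(record_xml)


print(describes(RECORD, CATALOGUE))
```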
The flexibility provided by the xlink language is useful in a number of ways. One notable example is the ability to better manage contact information associated with an institution or individual. Spatial metadata may associate several responsible parties with the lifecycle of the data and these sections of information require dozens of lines of XML for each named entity. Complicating this matter even further is the inevitability that locations and access points for people and organizations will change over time. The ISO metadata model allows information to be managed externally and associated with any number of records. This relationship can be expressed as a hyperlink, directing the user to the external document. Or, records can be resolved to include the external XML inside of the main document. When modifications are made to the external file, all records that contain the xlink reference must be resolved again in order to be updated with the current information.
Example of unresolved contact element:
<gmd:contact xlink:href="http://www.stanford.edu/09A95C420FB821476665893256MOME37" xlink:title="Stanford Geospatial Center"/>
Example of a resolved record with contact information inserted from external xml file:
<gmi:MI_Metadata... <gmd:contact xlink:title="Stanford Geospatial Center"> <gmd:CI_ResponsibleParty uuid="09A95C420FB821476665893256MOME37"> <gmd:organisationName> <gco:CharacterString>Stanford Geospatial Center</gco:CharacterString> </gmd:organisationName> <gmd:contactInfo xlink:type="simple"> <gmd:CI_Contact> <gmd:address xlink:type="simple"> <gmd:CI_Address> <gmd:deliveryPoint> <gco:CharacterString>Branner Earth Sciences Library</gco:CharacterString> </gmd:deliveryPoint> <gmd:deliveryPoint> <gco:CharacterString>397 Panama Mall</gco:CharacterString> </gmd:deliveryPoint> <gmd:city> <gco:CharacterString>Stanford</gco:CharacterString> </gmd:city> <gmd:administrativeArea> <gco:CharacterString>California</gco:CharacterString> </gmd:administrativeArea> <gmd:postalCode> <gco:CharacterString>94305</gco:CharacterString> </gmd:postalCode> <gmd:electronicMailAddress> <gco:CharacterString>firstname.lastname@example.org</gco:CharacterString> </gmd:electronicMailAddress> </gmd:CI_Address> </gmd:address>
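The resolve step can be sketched in a few lines of Python. This is an illustration of the idea rather than the tool Stanford uses: the external document is supplied through a lookup function instead of being fetched over HTTP, the sample documents are trimmed, and the choice to drop xlink:href while keeping xlink:title simply mirrors the resolved example above.

```python
import xml.etree.ElementTree as ET

XLINK_HREF = "{http://www.w3.org/1999/xlink}href"
CONTACT_URL = "http://www.stanford.edu/09A95C420FB821476665893256MOME37"

# Trimmed record with an unresolved contact.
RECORD = """<gmi:MI_Metadata xmlns:gmi="http://www.isotc211.org/2005/gmi"
    xmlns:gmd="http://www.isotc211.org/2005/gmd"
    xmlns:xlink="http://www.w3.org/1999/xlink">
  <gmd:contact xlink:href="http://www.stanford.edu/09A95C420FB821476665893256MOME37"
               xlink:title="Stanford Geospatial Center"/>
</gmi:MI_Metadata>"""

# Stand-in for the externally managed contact documents, keyed by URL.
EXTERNAL = {
    CONTACT_URL: """<gmd:CI_ResponsibleParty
    xmlns:gmd="http://www.isotc211.org/2005/gmd"
    xmlns:gco="http://www.isotc211.org/2005/gco"
    uuid="09A95C420FB821476665893256MOME37">
  <gmd:organisationName>
    <gco:CharacterString>Stanford Geospatial Center</gco:CharacterString>
  </gmd:organisationName>
</gmd:CI_ResponsibleParty>""",
}


def resolve_xlinks(root, fetch):
    """Inline external XML for every empty element that carries an xlink:href.

    `fetch` maps an href to an XML string, or None when nothing is available.
    Links inside the inlined content are not followed. Returns the number of
    links resolved.
    """
    resolved = 0
    # Snapshot the elements first so the tree is not modified mid-iteration.
    for elem in list(root.iter()):
        href = elem.get(XLINK_HREF)
        if href is None or len(elem) > 0:
            continue  # no link, or the element already has inline content
        external = fetch(href)
        if external is None:
            continue  # leave the link unresolved
        elem.append(ET.fromstring(external))
        del elem.attrib[XLINK_HREF]
        resolved += 1
    return resolved


root = ET.fromstring(RECORD)
print(resolve_xlinks(root, EXTERNAL.get))
```

Re-running the same function over the unresolved master copies after a contact file changes is one way to keep every dependent record current.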
The ISO model supports similar methods for reusing metadata from within a single record. This is a very useful convention for encoding long stanzas of XML that are recurrent in the metadata. For example, series metadata will contain information that is universal to all resources in a data series while other metadata elements may be unique to just one specific file. Through the use of corresponding identifiers, the uuid and uuidref attributes allow for content from one section of an XML document (uuid) to be included in another section (uuidref). This approach to metadata management will improve data standardization and appearance, as well as reduce the need to express redundant information within a record.
Example from Series Metadata:
<gmd:DS_Series> ... <gmd:MD_DataIdentification> <gmd:citation> <gmd:CI_Citation uuid="STA95C42-7FB1-2847-9945893256DADE10"> <gmd:title> <gco:CharacterString>Climatic Research Unit (CRU) Time Series 3.0 Monthly Precipitation Time-Series Data (1901-2006)</gco:CharacterString> </gmd:title> <gmd:date> <gmd:CI_Date> <gmd:date> <gco:Date>2009-12-10</gco:Date> </gmd:date> <gmd:dateType> <gmd:CI_DateTypeCode codeList="http://www.isotc211.org/2005/resources/Codelist/gmxCodelists.xml#CI_DateTypeCode" codeListValue="publication" codeSpace="002">publication</gmd:CI_DateTypeCode> </gmd:dateType> </gmd:CI_Date> </gmd:date> ...</gmd:DS_Series>
Example of individual dataset metadata within the Series Record:
<gmd:DS_Series>... <gmd:has>... <gmd:identificationInfo> <gmd:MD_DataIdentification> <gmd:citation uuidref="STA95C42-7FB1-2847-9945893256DADE10"/> <gmd:extent> <gmd:EX_Extent> <gmd:temporalElement> <gmd:EX_TemporalExtent> <gmd:extent> <gml:TimePeriod gml:id="TimePeriod_1"> <gml:begin> <gml:TimeInstant gml:id="begin_1"> <gml:timePosition>1901-01-01</gml:timePosition> </gml:TimeInstant> </gml:begin> <gml:end> <gml:TimeInstant gml:id="end_1"> <gml:timePosition>1901-01-31</gml:timePosition> </gml:TimeInstant> </gml:end> </gml:TimePeriod> ...</gmd:extent> ...</gmd:DS_Series>
In the above example, the citation information from the identification section is inherited from the series metadata. The temporal extent metadata, which is specific to this particular file, is recorded separately.
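To make the inheritance concrete, here is a hedged sketch of how a consumer might expand uuidref references within a single document. The sample is heavily simplified and not schema-valid, and dropping the uuid attribute from the copied element (to avoid duplicate identifiers) is our own choice, not something the standard prescribes.

```python
import copy
import xml.etree.ElementTree as ET

# Simplified series document: one citation defined once, then referenced.
SERIES = """<gmd:DS_Series xmlns:gmd="http://www.isotc211.org/2005/gmd"
                xmlns:gco="http://www.isotc211.org/2005/gco">
  <gmd:citation>
    <gmd:CI_Citation uuid="STA95C42-7FB1-2847-9945893256DADE10">
      <gmd:title>
        <gco:CharacterString>CRU Time Series 3.0 Monthly Precipitation</gco:CharacterString>
      </gmd:title>
    </gmd:CI_Citation>
  </gmd:citation>
  <gmd:has>
    <gmd:citation uuidref="STA95C42-7FB1-2847-9945893256DADE10"/>
  </gmd:has>
</gmd:DS_Series>"""


def resolve_uuidrefs(root):
    """Copy each uuid-identified element into the elements that reference it.

    Returns the number of references expanded; unknown references are left
    untouched.
    """
    by_uuid = {e.get("uuid"): e for e in root.iter() if e.get("uuid")}
    resolved = 0
    for elem in list(root.iter()):
        ref = elem.get("uuidref")
        target = by_uuid.get(ref) if ref else None
        if target is None:
            continue
        clone = copy.deepcopy(target)
        clone.attrib.pop("uuid", None)  # keep identifiers unique in the output
        elem.append(clone)
        del elem.attrib["uuidref"]
        resolved += 1
    return resolved


root = ET.fromstring(SERIES)
print(resolve_uuidrefs(root))
```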
The structure of series records will largely depend upon the system used to create metadata. While the above example validates against the schema definition, some systems instead create individual XML files that are linked through parent/child relationships, with each child referencing the series metadata file through a parentIdentifier uuid. For example:
The Series record below uses the Scope Code ‘series’, and its metadata file Identifier is used by any child dataset metadata which belong to the series:
<gmd:fileIdentifier> <gco:CharacterString>74d83c27-09e2-4e89-9d1e-a2f4af1d87e7</gco:CharacterString> </gmd:fileIdentifier> <gmd:hierarchyLevel> <MD_ScopeCode xmlns="http://www.isotc211.org/2005/gmd" codeList="http://www.isotc211.org/2005/resources/Codelist/gmxCodelists.xml#MD_ScopeCode" codeListValue="series"/> </gmd:hierarchyLevel>
Example from corresponding child record with Scope code ‘dataset’ and referenced parent (series record) identifier:
<gmd:parentIdentifier> <gco:CharacterString>74d83c27-09e2-4e89-9d1e-a2f4af1d87e7</gco:CharacterString> </gmd:parentIdentifier> <gmd:hierarchyLevel> <MD_ScopeCode xmlns="http://www.isotc211.org/2005/gmd" codeList="http://www.isotc211.org/2005/resources/Codelist/gmxCodelists.xml#MD_ScopeCode" codeListValue="dataset"/> </gmd:hierarchyLevel>
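With this layout, assembling a series means matching each child's parentIdentifier to a series record's fileIdentifier. The sketch below shows one way to group records; the helper that builds sample records and the child identifiers are purely illustrative, and real records would be read from files.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

NS = {
    "gmd": "http://www.isotc211.org/2005/gmd",
    "gco": "http://www.isotc211.org/2005/gco",
}
SCOPE_LIST = "http://www.isotc211.org/2005/resources/Codelist/gmxCodelists.xml#MD_ScopeCode"
SERIES_ID = "74d83c27-09e2-4e89-9d1e-a2f4af1d87e7"  # identifier from the example above


def make_record(file_id, scope, parent=None):
    """Build a trimmed, illustrative record; real records carry far more."""
    parent_xml = ""
    if parent:
        parent_xml = (
            "<gmd:parentIdentifier><gco:CharacterString>" + parent
            + "</gco:CharacterString></gmd:parentIdentifier>"
        )
    return (
        '<gmd:MD_Metadata xmlns:gmd="' + NS["gmd"] + '" xmlns:gco="' + NS["gco"] + '">'
        "<gmd:fileIdentifier><gco:CharacterString>" + file_id
        + "</gco:CharacterString></gmd:fileIdentifier>"
        + parent_xml
        + '<gmd:hierarchyLevel><gmd:MD_ScopeCode codeList="' + SCOPE_LIST
        + '" codeListValue="' + scope + '"/></gmd:hierarchyLevel>'
        "</gmd:MD_Metadata>"
    )


def _text(root, path):
    node = root.find(path, NS)
    return node.text.strip() if node is not None and node.text else None


def summarize(xml_text):
    """Extract the identifier, parent identifier, and scope code of a record."""
    root = ET.fromstring(xml_text)
    scope = root.find("gmd:hierarchyLevel/gmd:MD_ScopeCode", NS)
    return {
        "id": _text(root, "gmd:fileIdentifier/gco:CharacterString"),
        "parent": _text(root, "gmd:parentIdentifier/gco:CharacterString"),
        "scope": scope.get("codeListValue") if scope is not None else None,
    }


def group_by_series(records):
    """Map each parent identifier to the ids of its child dataset records."""
    children = defaultdict(list)
    for rec in map(summarize, records):
        if rec["scope"] == "dataset" and rec["parent"]:
            children[rec["parent"]].append(rec["id"])
    return dict(children)


RECORDS = [
    make_record(SERIES_ID, "series"),
    make_record("child-0001", "dataset", parent=SERIES_ID),  # hypothetical ids
    make_record("child-0002", "dataset", parent=SERIES_ID),
]
print(group_by_series(RECORDS))
```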
Decisions about how to handle series information should be influenced by system capabilities and necessary levels of description for discovery of dataset collections.
While the architecture required to manage multiple XML records to describe one single file might at first seem complex, trends in managing geospatial metadata are increasingly leaning towards international acceptance of the ISO series in order to take advantage of its flexible and more semantically enabled data structure.
Currently, Stanford is surveying the effectiveness and overall quality of the ISO metadata for approximately 60 datasets. We will report back after our analysis is complete.
We have added the most recent OGP Metadata Guide, including updates on world-based datasets and GNS keyword usage.
ALA Cartographic Resources Cataloging Interest Group
American Library Association Midwinter Conference
Jan. 27, 2013
Discussion on the suitability of Library of Congress Subject Headings (LCSH) for describing geospatial data sets in the OpenGeoportal (OGP). Many institutions involved with OGP use LCSH to describe themes in the metadata for geospatial data layers. The problem is that data layers often need more specific subject keywords than are available in LCSH. The question was posed to the Interest Group as to whether the OGP metadata community should make an effort to address gaps in LCSH through the SACO program or whether it would be better served by looking toward other thesauri (e.g. Getty, GNS, etc.) for more specific description of geospatial resources. The consensus of the group favored applying relatively broad LCSH terms when possible and also using other thesauri for more specific terminology when appropriate. It was determined that the SACO program may not be effective for a full-scale geospatial data keyword effort. Marc McGee will report on the discussion to the OGP Metadata Working Group, which will determine next steps toward addressing OGP thesaurus needs.
Tufts University has acquired funding to develop an open source metadata clearinghouse toolkit: a cloud-based, lightweight metadata authoring environment. Our goal is to provide rapid, guided metadata authoring. The site will employ controlled vocabularies from the Library of Congress API and the OGP thesauri. It will also facilitate the import and export of geospatial metadata records from the OGP cloud of metadata (currently around 40,000 records) and provide a crosswalk to and from multiple metadata formats. It is being developed in collaboration with Harvard, MIT, and UC Berkeley.