Seeding and Caching

When to Cache and When to Seed

Web Mapping Services are a rapidly growing method for delivering geospatial data sets to users through a web browser or a GIS.  The benefits of a portal-type system (e.g., the Tufts Geodata Portal) include wider and faster distribution of data, delivery efficiency, scalability, and less expertise required on the part of the end user.  This client-server model of data distribution, however, poses challenges in managing and processing the large data sets that are pushed through the internet to users.

Typically, these systems either “cache on the fly,” creating tiles for viewing in real time as they are requested, or serve tiles that are premade, or “seeded.” Caching on the fly is a much simpler process but puts a serious processing burden on the server when a client calls a large data set.  Seeding allows large data sets to be prebuilt and ready for distribution to the client.
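The difference between the two approaches can be sketched in a few lines of Python. This is only an illustration, not how any particular map server is implemented: the `TileCache` class, the `fake_render` function, and the `(layer, zoom, x, y)` tile key are all hypothetical names introduced here, and a real server would store tiles on disk rather than in memory.

```python
from typing import Callable, Dict, Iterable, Tuple

# Hypothetical tile key: (layer name, zoom level, column, row).
TileKey = Tuple[str, int, int, int]


class TileCache:
    """In-memory stand-in for a map server's tile cache."""

    def __init__(self, render: Callable[[TileKey], bytes]) -> None:
        self._render = render            # the expensive rendering step
        self._tiles: Dict[TileKey, bytes] = {}
        self.render_calls = 0            # how many times a tile was rendered

    def _build(self, key: TileKey) -> bytes:
        self.render_calls += 1
        tile = self._render(key)
        self._tiles[key] = tile
        return tile

    def get(self, key: TileKey) -> bytes:
        """Cache on the fly: render on first request, reuse afterwards."""
        if key in self._tiles:
            return self._tiles[key]
        return self._build(key)

    def seed(self, keys: Iterable[TileKey]) -> int:
        """Seeding: build tiles before any client asks for them."""
        built = 0
        for key in keys:
            if key not in self._tiles:
                self._build(key)
                built += 1
        return built


def fake_render(key: TileKey) -> bytes:
    # Placeholder for real rendering; returns the key as bytes.
    layer, zoom, x, y = key
    return f"{layer}/{zoom}/{x}/{y}".encode()


cache = TileCache(fake_render)
cache.seed([("boston_aerial", 0, 0, 0)])      # rendering cost paid up front
cache.get(("boston_aerial", 0, 0, 0))         # served from cache, no rendering
cache.get(("boston_aerial", 5, 9, 11))        # not seeded: rendered on request
```

In this sketch the first request for an unseeded tile pays the rendering cost while the client waits, which is the server-side burden described above; seeded tiles shift that cost to a time of the administrator's choosing.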

For a geospatial analyst or manager, the decision about when to seed and when to cache data on the fly is driven by interrelated software, hardware, and management issues. The decision-making processes presented here are general and qualitative rather than quantitative, and are really about institutional and user-driven priorities.  The key is to balance user demand and interests against file sizes and map server processing demands. We will use Tufts University as a case study to provide context for these guidelines.

Seeding is the process by which map tiles are created for the map server before a layer is “called” by the client.  The layers you seed represent a specific subset of your overall demand load, and that subset should be chosen using suggestions like the following:

Suggestion #1:  Prioritize your seeding efforts by the location(s) that most interest your primary users.  Tufts University is located in the Boston Metro Area, so our students and faculty are greatly interested in imagery and data for the urban matrix of Boston.  Consequently, one of the first data sets to be addressed in our seeding process is the newly available three-inch, very high-resolution aerial photography for the City of Boston.  Our user base would be far less interested in aerial photography of Columbus, Ohio.

Suggestion #2: Prioritize your seeding efforts on small-scale, primary data sets that are used by the widest part of your audience. At Tufts University, we have a large amount of institutional interest in urban water resources and farming. Among the most important data sets for addressing these problems are contour maps and DEMs (Digital Elevation Models), which are extremely high resolution and large. Other data sets that might apply include land use, street layers, hydrography, soils, and LiDAR point clouds.

Suggestion #3: Prioritize initial views of data sets for seeding. Users will initially want a general sense of a data set, so prebuild and seed the first couple of small-scale views they are likely to see. For users at Tufts, seeding views of popular data sets for all of New England would be a sensible approach. Using pyramids and overviews for these views could be effective.

Suggestion #4: Split the largest and most unmanageable data sets into smaller sizes.

Suggestion #5: Combine all of the above suggestions to minimize the overall processing time by the caching system and maximize the user experience of tiles. All of these methods together might be the most effective seeding method for huge and popular data sets.

Suggestion #6: Seed any data that users want immediately.

Suggestion #7: Seed new or contemporary data sets of wide interest.
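Two of the suggestions above lend themselves to simple arithmetic: seeding only the first few small-scale views, and splitting a very large data set into smaller pieces. The following Python sketch assumes the common Web Mercator "XYZ" tiling scheme, in which zoom level z divides the world into 2^z by 2^z tiles; the source does not say which tiling scheme the Tufts portal uses, and the New England bounding box below is a rough approximation chosen only for illustration.

```python
import math
from typing import List, Tuple

# (west, south, east, north) in decimal degrees.
BBox = Tuple[float, float, float, float]


def lonlat_to_tile(lon: float, lat: float, zoom: int) -> Tuple[int, int]:
    """Column and row of the XYZ tile containing a point (Web Mercator)."""
    n = 2 ** zoom
    x = int((lon + 180.0) / 360.0 * n)
    y = int((1.0 - math.asinh(math.tan(math.radians(lat))) / math.pi) / 2.0 * n)
    # Clamp so points on the eastern or southern edge stay inside the grid.
    return min(max(x, 0), n - 1), min(max(y, 0), n - 1)


def tile_count(bbox: BBox, zoom: int) -> int:
    """Number of tiles needed to cover a bounding box at one zoom level."""
    west, south, east, north = bbox
    x0, y0 = lonlat_to_tile(west, north, zoom)   # top-left tile
    x1, y1 = lonlat_to_tile(east, south, zoom)   # bottom-right tile
    return (x1 - x0 + 1) * (y1 - y0 + 1)


def split_bbox(bbox: BBox, cols: int, rows: int) -> List[BBox]:
    """Split a bounding box into a cols-by-rows grid of smaller boxes."""
    west, south, east, north = bbox
    dx = (east - west) / cols
    dy = (north - south) / rows
    return [
        (west + c * dx, south + r * dy, west + (c + 1) * dx, south + (r + 1) * dy)
        for r in range(rows)
        for c in range(cols)
    ]


# Approximate extent of New England (an assumption, not a surveyed value).
NEW_ENGLAND: BBox = (-73.8, 40.9, -66.9, 47.5)

for zoom in (6, 8, 10, 12, 14):
    print(zoom, tile_count(NEW_ENGLAND, zoom))

# Four smaller seeding jobs instead of one large one.
chunks = split_bbox(NEW_ENGLAND, cols=2, rows=2)
```

Because the tile count for a fixed area roughly quadruples with each zoom level, the first few small-scale views are cheap to seed, while the most detailed levels account for nearly all of the processing and storage. Splitting by bounding box is only one way to break up a large data set; splitting by theme or by source file may suit some layers better.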

When to Cache on the Fly (or not to seed)

Caching on the fly is typically the default.  Data sets that are strong candidates for caching on the fly include:

Suggestion #1: Specialty data sets used by a small number of users.  Example: Historical maps with a specific theme.

Suggestion #2: Dispersed data sets, such as the political boundaries for each of the towns of Massachusetts.

Suggestion #3: Older data sets, especially ones that have more contemporary versions.
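The two lists of suggestions can be read together as a rough checklist. The Python sketch below is one possible way to encode that checklist, not a rule taken from any map server: the `Layer` fields, the `recommend` function, and the order in which the checks are applied are all assumptions made for illustration, and an institution with different priorities might reasonably order them differently.

```python
from dataclasses import dataclass


@dataclass
class Layer:
    """Hypothetical description of a layer, one flag per suggestion."""
    name: str
    in_primary_region: bool = False    # covers the area users care about most
    widely_used: bool = False          # primary data set with a broad audience
    wanted_immediately: bool = False   # users expect it to load without delay
    new_and_popular: bool = False      # new or contemporary, of wide interest
    specialty: bool = False            # used by a small number of users
    dispersed: bool = False            # many small, scattered pieces
    superseded: bool = False           # an older version of a newer data set


def recommend(layer: Layer) -> str:
    """Return 'seed' or 'cache on the fly' for a layer."""
    # Assumed ordering: reasons not to seed are checked first.
    if layer.specialty or layer.dispersed or layer.superseded:
        return "cache on the fly"
    if (layer.in_primary_region or layer.widely_used
            or layer.wanted_immediately or layer.new_and_popular):
        return "seed"
    # Caching on the fly is the default when nothing argues for seeding.
    return "cache on the fly"


layers = [
    Layer("Boston three-inch aerial photography",
          in_primary_region=True, new_and_popular=True),
    Layer("Themed historical maps", specialty=True),
    Layer("Massachusetts town boundaries", dispersed=True),
]
for layer in layers:
    print(layer.name, "->", recommend(layer))
```

A checklist like this only records the qualitative judgments described above; it does not replace them, and the flags themselves still have to be set by someone who knows the user base.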

Carl Zimmerman, PhD Tufts University May 1, 2014