Background
What were our aims?
We wanted to make the census easier to use. Our aims were to:
- help people find census information that meets their needs
- enable users to understand the meaning and derivation of census information
- deliver census information in forms that facilitate its use
In order to achieve these aims, we needed to restructure the data to conform to standards.
At the moment, InFuse includes 2001 and 2011 Census data for England and Wales.
Why did we do it?
The Census has historically produced a set of predefined tables, to meet specific primary requirements. We hosted these predefined tables in our interface Casweb, but the format of these tables did not give us very many options. This meant that people had to look through many tables to see if they contained the variables they were interested in. In addition the data was divorced from the definitions of these variables (as well as other metadata). No one wants to hunt through the documentation to find a simple definition.
The format of the census tables was an obstacle to the creation of a more usable application, so we decided to reformat the data into a structure based on standards. This enabled us to create a more flexible interface.
What were the issues?
Lack of consistency
Tabulations have been constructed for specific needs, but their construction has not always been consistent with other existing tables. As a result, different tables use different terms for the same item. For example, an age grouping of “24 years and under” is the same as “0-24” and “0 to 24”, and “Limiting long-term illness” is the same as “with a limiting long-term illness”. Variations in terms for identical concepts cause problems for an application, which cannot tell that differently worded labels refer to the same thing. We went through the tables, identified these items and changed the titles to be consistent.
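The kind of label harmonisation described above can be sketched in a few lines of Python. This is only an illustration, not the process actually used for InFuse: the function name `normalise_label`, the `CANONICAL` lookup table and the choice of “0 to 24” as the preferred form are all assumptions, and the only entries are the examples given in the text.

```python
import re

# Hypothetical lookup from known label variants (lower-cased, single-spaced)
# to one preferred wording. Which variant is "preferred" is an assumption.
CANONICAL = {
    "24 years and under": "0 to 24",
    "0-24": "0 to 24",
    "0 to 24": "0 to 24",
    "limiting long-term illness": "Limiting long-term illness",
    "with a limiting long-term illness": "Limiting long-term illness",
}


def normalise_label(label: str) -> str:
    """Return the preferred wording for a label, or the trimmed label if unknown."""
    # Collapse runs of whitespace and ignore case when looking the label up.
    key = re.sub(r"\s+", " ", label.strip()).lower()
    return CANONICAL.get(key, label.strip())


if __name__ == "__main__":
    for raw in ["24 years and under", "0-24", "With a limiting long-term illness"]:
        print(repr(raw), "->", repr(normalise_label(raw)))
```

Labels that are not in the lookup are returned unchanged (apart from trimming), so unknown variants are left for manual review rather than silently altered.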
Problems with Age
‘Age’ is a particularly problematic example. Different age groupings have been used in different tables: there are 99 different age groupings in the 2001 Census, and 76 of these groupings are used in only a single table.
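As a rough sketch of how such a count could be produced, the Python below finds the groupings that appear in only one table. The table IDs and age bands in the example are invented for illustration, and the function name `single_use_groupings` is an assumption; real input would come from the census table definitions.

```python
from collections import defaultdict


def single_use_groupings(table_groupings):
    """Return the age groupings that are used by exactly one table.

    `table_groupings` is an iterable of (table_id, grouping) pairs, where a
    grouping is a tuple of age-band labels.
    """
    tables_by_grouping = defaultdict(set)
    for table_id, grouping in table_groupings:
        tables_by_grouping[grouping].add(table_id)
    return sorted(g for g, tables in tables_by_grouping.items() if len(tables) == 1)


if __name__ == "__main__":
    # Invented sample data: T1 and T2 share a grouping, T3 has its own.
    sample = [
        ("T1", ("0 to 15", "16 to 24")),
        ("T2", ("0 to 15", "16 to 24")),
        ("T3", ("0 to 24",)),
    ]
    print(single_use_groupings(sample))
```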
Barriers to understanding
Separation and fragmentation of data and metadata. The descriptions of items (e.g. economic activity) were not associated with the data. To find out the meaning of an item, you would have to look through the Census definitions volume or at the original Census questionnaire. We wanted to make this easier, so we associated the metadata with the data.
Visual encryption in table frameworks
The pre-defined tables were presented in the format shown in the figure below. The underlying data was not associated with the labels. So, for example, if someone downloaded data for Economically Active > Employed > Males > With limiting long-term illness, they would only get the label CS0210023 (the first five characters being the table ID, and the last four digits the cell ID). To find out what the code referred to, they would need to look at the interface and manually type in their own (more meaningful) description.
Table CS021 from Casweb - Economic Activity by Sex and Limiting Long-Term Illness (LLTI)
We wanted to have a more usable application, so that when people downloaded the data, instead of getting only cell codes, they would get more information in the form of a metadata file.
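To illustrate what attaching metadata to a cell code makes possible, here is a minimal Python sketch. It assumes that a code such as CS0210023 consists of a table ID of two letters and three digits (CS021, as in the table caption) followed by a four-digit cell ID; that pattern is inferred from this single example and may not hold for every Casweb table. The names `parse_cell_code`, `describe` and `LABELS` are hypothetical, and the one label shown is the example from the text.

```python
import re
from typing import NamedTuple


class CellCode(NamedTuple):
    table_id: str
    cell_id: str


# Assumed pattern: two letters + three digits (table), then four digits (cell).
_CODE = re.compile(r"^([A-Z]{2}\d{3})(\d{4})$")

# Hypothetical metadata lookup; the single entry is the example from the text.
LABELS = {
    "CS0210023": "Economically Active > Employed > Males > With limiting long-term illness",
}


def parse_cell_code(code: str) -> CellCode:
    """Split a cell code into its table ID and cell ID."""
    match = _CODE.match(code.strip().upper())
    if match is None:
        raise ValueError(f"unrecognised cell code: {code!r}")
    return CellCode(match.group(1), match.group(2))


def describe(code: str) -> str:
    """Return a readable description of a cell code using the metadata lookup."""
    parsed = parse_cell_code(code)
    label = LABELS.get(code.strip().upper(), "(no label in this sketch)")
    return f"{parsed.table_id} cell {parsed.cell_id}: {label}"


if __name__ == "__main__":
    print(describe("CS0210023"))
```

With a lookup of this kind shipped alongside the data, users no longer need to return to the interface and retype a description for each code.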
Quality control
The process of tidying up and converting the data to conform to standards was a complex job. We have thoroughly checked the data to make sure we have not introduced any errors.
What’s the technology behind InFuse?
Benefits offered by an open standards data feed approach include:
- A comprehensive structure for integrating, encoding and describing all information relating to one or more datasets, including comparability between different datasets.
- A means of advertising the information content and structure of datasets using open standards data structure definitions, to facilitate understanding of, and access to, the datasets, and to enable machine-to-machine operations on the datasets by generic applications.
- A means of transferring the full range of information from a dataset.
Of course we can only provide access to combinations of variables that actually exist! We’re not giving access to any new data, just making it easier for you to find!