Overview
Learn about the common types of data, chart types typically used for data visualisation, and real-world applications of mathematical concepts such as seasonal adjustment.
How Data are Collected
The Department of Statistics Singapore (DOS) collects data through the following means:
- Surveys, such as household expenditure surveys, price surveys and the census
- Administrative sources, such as births and deaths data from the Immigration and Checkpoints Authority and education data from the Ministry of Education Singapore
Such data are commonly used in various forms of analysis, for example to describe or visualise patterns and trends, or to uncover relationships between variables.
DOS also uses alternative data sources and methods, such as web-scraping, to supplement the information gleaned from traditional surveys. Web-scraping is the collection of data from the internet using a program (a crawler) to search for the required data and a web tool (a scraper) to extract it. It has numerous applications across industries, such as news and price monitoring, market research and sentiment analysis, all of which would be tedious to perform manually.
An example is the use of online price information in the compilation of the Consumer Price Index (CPI).
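To make the crawler/scraper split concrete, here is a minimal Python sketch using only the standard library's `html.parser`. It works on HTML strings rather than live pages, and the page structure it assumes (a listing page linking to item pages, prices marked up with `class="price"`) and the `shop.example.com` URL are hypothetical; it illustrates the idea and is not DOS's actual tooling.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCrawler(HTMLParser):
    """Crawler step: collect absolute URLs of pages linked from a page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Resolve relative links against the page's own URL.
                self.links.append(urljoin(self.base_url, href))


class PriceScraper(HTMLParser):
    """Scraper step: extract text from elements carrying the (hypothetical)
    class "price"."""

    def __init__(self):
        super().__init__()
        self.prices = []
        self._price_tag = None  # tag name of the price element we are inside

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if "price" in classes:
            self._price_tag = tag

    def handle_endtag(self, tag):
        if tag == self._price_tag:
            self._price_tag = None

    def handle_data(self, data):
        if self._price_tag is not None and data.strip():
            self.prices.append(data.strip())


def crawl(html, base_url):
    """Return the links found on one page (what to visit next)."""
    parser = LinkCrawler(base_url)
    parser.feed(html)
    parser.close()
    return parser.links


def scrape_prices(html):
    """Return the price strings found on one item page (what to extract)."""
    parser = PriceScraper()
    parser.feed(html)
    parser.close()
    return parser.prices
```

In a full pipeline, the URLs returned by `crawl` would be fetched one by one and each page passed to `scrape_prices`; keeping the two steps separate lets the extraction rules change without touching the navigation logic.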
DOS's Web-scraping Principles
As more and more data reside on websites, DOS conducts web-scraping as part of our data collection, reducing the burden on respondents who would otherwise have to provide the information. The data may be used by DOS or shared with other public agencies to fulfil public duties, including policy analysis and service delivery.
We adopt the following principles to ensure that web-scraping is carried out consistently, ethically and transparently.
Principles
- Abiding by applicable national legislation;
- Minimising the burden on website owners (e.g., by adding idle time between requests, and by web-scraping at times of day when the web server is not expected to be under heavy load); and
- Identifying ourselves to the website owners when carrying out web-scraping (e.g., in user agent strings).
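The second and third principles can be illustrated with a minimal Python sketch built on the standard library's `urllib.request`. The identifying `User-Agent` string, the 5-second delay and the 1 a.m. to 5 a.m. off-peak window are hypothetical values chosen for illustration, not DOS settings; `fetch` and `sleep` are passed in as parameters so the pacing logic can be exercised without network access.

```python
import time
import urllib.request

# Hypothetical identification string: a real one would name the organisation
# and point to a page explaining the scraping activity.
USER_AGENT = "ExampleStatsBot/1.0 (+https://www.example.org/web-scraping)"


def build_request(url, user_agent=USER_AGENT):
    """Identify the scraper to the website owner via the User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def is_off_peak(hour, start=1, end=5):
    """Return True if `hour` (0-23, local time) falls in the hypothetical
    off-peak window [start, end), when the server is assumed lightly loaded."""
    return start <= hour < end


def fetch_html(request, timeout=30):
    """Default fetcher: download a page and decode it as UTF-8."""
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")


def polite_fetch_all(urls, fetch=fetch_html, delay_s=5.0, sleep=time.sleep):
    """Fetch URLs one at a time, idling `delay_s` seconds between requests."""
    results = []
    for i, url in enumerate(urls):
        if i > 0:
            sleep(delay_s)  # idle time so requests do not arrive back to back
        results.append(fetch(build_request(url)))
    return results
```

A caller would typically check `is_off_peak(time.localtime().tm_hour)` before starting a run and then call `polite_fetch_all(urls)`; every request then carries the identifying header and is spaced at least `delay_s` seconds from the previous one.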
