Status: Published
Trails of Data: Three Cases for Collecting Web Information for Social Science Research
Li, Fumin; Zhou, Yisu; Cai, Tianji
2021-11
Source Publication: Social Science Computer Review
ISSN: 0894-4393
Volume: 39, Issue: 5, Pages: 922–942
Abstract

As the availability of online data grows rapidly, researchers are confronted with a pressing question: How should social scientists collect Internet data for research? This study focuses on one of the most commonly used data collection techniques: web scraping. Going beyond canned approaches by leveraging a general framework of data communication, this study illustrates how online information can be systematically queried and fetched for reproducible research. To generalize our approaches, we additionally explore the variations in site security and architecture that analysts may encounter during the scraping process before they are given access to the desired data. The approaches we introduce do not rely on any proprietary software and can be easily implemented on any computing platform with programming languages such as Python or R. The methodological discussion in this study is meant to be applicable to current web-based research efforts. We include three examples with complete Python implementation. We also present an integrated workflow that enables researchers to produce analytical data sets that are traceable and thus verifiable for analysis or replication. Lastly, options related to the validity and efficiency of data are discussed, and we highlight the ongoing debate surrounding the ethics of online data collection, ultimately advocating for the fair use of online data.

Keywords: Data Collection; Reproducible Research; Web Scraping; Headless Browser; APIs; Python
DOI: 10.1177/0894439319886019
Indexed By: SCIE; SSCI
Language: English
WOS Research Area: Computer Science; Information Science & Library Science; Social Sciences - Other Topics
WOS Subject: Computer Science, Interdisciplinary Applications; Information Science & Library Science; Social Sciences, Interdisciplinary
WOS ID: WOS:000496062500001
Publisher: SAGE PUBLICATIONS INC
Scopus ID: 2-s2.0-85075010416
Document Type: Journal article
Collection: DEPARTMENT OF ECONOMICS
Corresponding Author: Cai, Tianji
Affiliation: Department of Sociology, University of Macau, Taipa, Macau SAR, China
First Author Affiliation: University of Macau
Corresponding Author Affiliation: University of Macau
Recommended Citation
GB/T 7714: Li, Fumin, Zhou, Yisu, Cai, Tianji. Trails of Data: Three Cases for Collecting Web Information for Social Science Research[J]. Social Science Computer Review, 2021, 39(5): 922–942.
APA: Li, Fumin, Zhou, Yisu, & Cai, Tianji (2021). Trails of Data: Three Cases for Collecting Web Information for Social Science Research. Social Science Computer Review, 39(5), 922–942.
MLA: Li, Fumin, et al. "Trails of Data: Three Cases for Collecting Web Information for Social Science Research." Social Science Computer Review 39.5 (2021): 922–942.
Files in This Item:
File Name/Size: Li, Zhou, & Cai_2019 (467KB)
Publications: Journal article
Version: Author accepted manuscript
Access: Open access
License: CC BY-NC-SA
File name: Li, Zhou, & Cai_2019_SSCR.pdf
Format: Adobe PDF

Items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.