The Application of Text Analysis to Public Management and Policy Research
Huang Cui, Lü Liyuan
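Abstract:
The arrival of the data era has profoundly reshaped research paradigms in the social sciences. Text data play an important role in the growing body of social data, and text analysis is increasingly applied in public management and public policy. Using a typological framework of "research corpus-research logic", this article reviews how text analysis methods have been applied in public management and public policy research. It examines how text-analysis studies in the field are distributed across different dimensions and outlines potential paths for developing text analysis methods in the field. The article argues that text analysis will gradually move from analyzing the structured features of text toward its unstructured features, and from descriptive inference toward causal inference. To support this development, researchers should collect higher-frequency text data and try to combine text data with richer data sources.
Keywords: text analysis; public management; public policy; literature review
Foundation: Supported by the National Natural Science Foundation of China Excellent Young Scientists Fund project "Public Management and Public Policy" (Grant No. 71722002) and the National Natural Science Foundation of China General Program project "A Quantitative Study of the Selection, Combination, and Diffusion of Public Policy Instruments Based on Intergovernmental Relations: The Case of the S&T Finance Sector" (Grant No. 71673164).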
References:
- 常大伟.2020.我国少数民族档案文献遗产保护政策量化研究---基于128份政策文本的内容分析[J].档案学研究,(3):106-111.Chang D W.2020.Quantitative study on the protection policy of minority documentary heritage in China:Based on the content analysis of 128 policy texts[J].Archives Science Study,(3):106-111.(in Chinese)
- 范梓腾,谭海波.2017.地方政府大数据发展政策的文献量化研究---基于政策“目标-工具”匹配的视角[J].中国行政管理,(12):46-53.Fan Z T,Tan H B.2017.Big data development strategies of Chinese local governments based on documents quantitative methods[J].Chinese Public Administration,(12):46-53.(in Chinese)
- 黄萃,任弢,张剑.2015a.政策文献量化研究:公共政策研究的新方向[J].公共管理学报,12(2):129-137.Huang C,Ren T,Zhang J.2015a.Policy documents quantitative research:A new direction for public policy study[J].Journal of Public Management,12(2):129-137.(in Chinese)
- 黄萃,任弢,李江,等.2015b.责任与利益:基于政策文献量化分析的中国科技创新政策府际合作关系演进研究[J].管理世界,(12):68-81.Huang C,Ren T,Li J,et al.2015b.Responsibility and interest:A study on the evolution of inter-governmental cooperation relationship of S&T innovation policy in China based on the policy documents quantitative research[J].Management World,(12):68-81.(in Chinese)
- 黄萃.2016.政策文献量化研究[M].北京:科学出版社.Huang C.2016.Policy documents quantitative research[M].Beijing:Science Press.(in Chinese)
- 黄菁.2014.我国地方科技成果转化政策发展研究---基于239份政策文本的量化分析[J].科技进步与对策,31(13):103-108.Huang J.2014.A study on the policy development of the transformation of local scientific and technological achievements in China:A quantitative analysis based on 239 policy texts[J].Science&Technology Progress and Policy,31(13):103-108.(in Chinese)
- 霍龙霞,徐国冲.2020.走向合作监管:改革开放以来我国食品安全监管方式的演变逻辑---基于438份中央政策文本的内容分析(1979-2017)[J].公共管理评论,2(1):68-91.Huo L X,Xu G C.2020.Toward collaborative regulation:The logic behind the evolution of food-safety regulation in China since the reform and opening-up:Content analysis based on 438 policy texts (1979-2017) issued by the central government[J].China Public Administration Review,2(1):68-91.(in Chinese)
- 姜付秀,王运通,田园,等.2017.多个大股东与企业融资约束---基于文本分析的经验证据[J].管理世界,(12):61-74.Jiang F X,Wang Y T,Tian Y,et al.2017.Multi-major shareholders and financing constraints:Empirical evidence from text analysis[J].Management World,(12):61-74.(in Chinese)
- 李晓溪,杨国超,饶品贵.2019.交易所问询函有监管作用吗?---基于并购重组报告书的文本分析[J].经济研究,54(5):181-198.Li X X,Yang G C,Rao P G.2019.The regulatory role of stock exchange comment letters:Evidence from textual analysis of merger and acquisition plans[J].Economic Research Journal,54(5):181-198.(in Chinese)
- 马黎珺,伊志宏,张澈.2019.廉价交谈还是言之有据?---分析师报告文本的信息含量研究[J].管理世界,35(7):182-200.Ma L J,Yi Z H,Zhang C.2019.Evidence-based statements or cheap talks?The information content of text in analyst reports[J].Management World,35(7):182-200.(in Chinese)
- 马续补,张潇宇,秦春秀,等.2020.我国公共信息资源开放政策扩散特征的量化研究---以三大经济圈为例[J].信息资源管理学报,10(4):15-26.Ma X B,Zhang X Y,Qin C X,et al.2020.Research on China's public information resources open policy diffusion based on quantitative analysis of policy documents:Taking the three economic circles as examples[J].Journal of Information Resources Management,10(4):15-26.(in Chinese)
- 孟凡坤.2020.我国智慧城市政策演进特征及规律研究---基于政策文献的量化考察[J].情报杂志,39(5):104-111.Meng F K.2020.The evolution of China's smart city policy:Based on the quantitative investigation of policy documents[J].Journal of Intelligence,39(5):104-111.(in Chinese)
- 孟天广,李锋.2015.网络空间的政治互动:公民诉求与政府回应性---基于全国性网络问政平台的大数据分析[J].清华大学学报(哲学社会科学版),30(3):15-29.Meng T G,Li F.2015.Political interaction in cyberspace:Citizen pursuit and government response:Big data analysis based on the national platform of cyber politics[J].Journal of Tsinghua University (Philosophy and Social Sciences),30(3):15-29.(in Chinese)
- 彭纪生,仲为国,孙文祥.2008.政策测量、政策协同演变与经济绩效:基于创新政策的实证研究[J].管理世界,(9):25-36.Peng J S,Zhong W G,Sun W X.2008.The measurement and the coordinated evolution of policies,and economic performance:A case study on the policy for innovation[J].Management World,(9):25-36.(in Chinese)
- 苏毓淞,姚雨凌.2015.大数据信息采集及其偏差补救方法---以甜党和咸党的口味地盘之争为例[J].清华大学学报(哲学社会科学版),30(3):43-49.Su Y S,Yao Y L.2015.Big data collection and its deviation remedy:A case study of the turf wars between the sweet and salty parties[J].Journal of Tsinghua University(Philosophy and Social Sciences),30(3):43-49.(in Chinese)
- 唐应茂.2018.司法公开及其决定因素:基于中国裁判文书网的数据分析[J].清华法学,12(4):35-47.Tang Y M.2018.The decisive factor in promoting judicial openness:The data statistics of judicial judgments[J].Tsinghua Law Review,12(4):35-47.(in Chinese)
- 王建冬,童楠楠,易成岐.2019.大数据时代公共政策评估的变革:理论、方法与实践[M].北京:社会科学文献出版社.Wang J D,Tong N N,Yi C Q.2019.The transformation of public policy assessment in the age of big data:Theory,methodology and practice[M].Beijing:Social Sciences Academic Press.(in Chinese)
- 王克敏,王华杰,李栋栋,等.2018.年报文本信息复杂性与管理者自利---来自中国上市公司的证据[J].管理世界,34(12):120-132.Wang K M,Wang H J,Li D D,et al.2018.Complexity of annual report and management self-interest:Empirical evidence from Chinese listed firms[J].Management World,34(12):120-132.(in Chinese)
- 王伟,Chen W,Zhu K,et al.2016.众筹融资成功率与语言风格的说服性---基于Kickstarter的实证研究[J].管理世界,(5):81-98.Wang W,Chen W,Zhu K,et al.2016.Success rate of crowdfunding and persuasiveness of language style:An empirical study based on Kickstarter[J].Management World,(5):81-98.(in Chinese)
- 魏伟,郭崇慧,陈静锋.2018.国务院政府工作报告(1954-2017)文本挖掘及社会变迁研究[J].情报学报,37(4):406-421.Wei W,Guo C H,Chen J F.2018.Text mining on the government work reports of the state council (1954-2017) and social transformation research[J].Journal of the China Society for Scientific and Technical Information,37(4):406-421.(in Chinese)
- 文宏,杜菲菲.2018.注意力、政策动机与政策行为的演进逻辑---基于中央政府环境保护政策进程(2008-2015年)的考察[J].行政论坛,25(2):80-87.Wen H,Du F F.2018.The evolution logic of attention,policy motivation and policy behavior:Based on the inspection of process of central government's environmental protection policy from 2008 to 2015[J].Administrative Tribune,25(2):80-87.(in Chinese)
- 徐国冲,霍龙霞.2020.食品安全合作监管的生成逻辑---基于2000-2017年政策文本的实证分析[J].公共管理学报,17(1):18-30,46.Xu G C,Huo L X.2020.The logic of food safety collaborative supervision-An empirical analysis based on 2000-2017 policy documents[J].Journal of Public Management,17(1):18-30,46.(in Chinese)
- 杨慧,杨建林.2016.融合LDA模型的政策文本量化分析---基于国际气候领域的实证[J].现代情报,36(5):71-81.Yang H,Yang J L.2016.Quantitative analysis of policy text merged with LDA model:Based on the field of international climate as demonstration[J].Journal of Modern Information,36(5):71-81.(in Chinese)
- 郁建兴.2019.“最多跑一次”改革:浙江经验,中国方案[M].北京:中国人民大学出版社.Yu J X.2019.“One-run-at-most”reform:The Zhejiang experience,the Chinese solution[M].Beijing:China Renmin University Press.(in Chinese)
- 于水,杨溶榕.2017.中国信访制度的历史变迁与特征:基于政策文本分析的视角[J].公共管理与政策评论,6(3):3-14.Yu S,Yang R R.2017.Historical changes and characteristics of China's petition system:From the perspective of policy text analysis[J].Public Administration and Policy Review,6(3):3-14.(in Chinese)
- 曾婧婧.2015.泛珠三角区域合作政策文本量化分析:2004-2014[J].中国行政管理,(7):110-116.Zeng J J.2015.Policy textual and quantitative research on the pan pearl river delta regional cooperation[J].Chinese Public Administration,(7):110-116.(in Chinese)
- 张涛,马海群,易扬.2020.文本相似度视角下我国大数据政策比较研究[J].图书情报工作,64(12):26-37.Zhang T,Ma H Q,Yi Y.2020.Comparative analysis of China's big data policies from the perspective of text similarity[J].Library and Information Service,64(12):26-37.(in Chinese)
- 卓敏,吴建平.2016.当代青年雾霾段子语义网络分析与情感可视化研究---基于微博、微信用户[J].中国青年研究,(8):10-19.Zhuo M,Wu J P.2016.Semantic network analysis and emotional visualization research of smog segment of contemporary youth[J].China Youth Study,(8):10-19.(in Chinese)
- Anastasopoulos L J,Whitford A B.2019.Machine learning for public administration research,with application to organizational reputation[J].Journal of Public Administration Research and Theory,29(3):491-510.
- Antweiler W,Frank M Z.2004.Is all that talk just noise?The information content of internet stock message boards[J].The Journal of Finance,59(3):1259-1294.
- Baker S R,Bloom N,Davis S J.2016.Measuring economic policy uncertainty[J].The Quarterly Journal of Economics,131(4):1593-1636.
- Barberá P,Rivero G.2015.Understanding the political representativeness of Twitter users[J].Social Science Computer Review,33(6):712-729.
- Barberá P,Casas A,Nagler J,et al.2019.Who leads?Who follows?Measuring issue attention and agenda setting by legislators and the mass public using social media data[J].American Political Science Review,113(4):883-901.
- Berger J,Humphreys A,Ludwig S,et al.2020.Uniting the tribes:Using text for marketing insight[J].Journal of Marketing,84(1):1-25.
- Catalinac A.2016.Electoral reform and national security in Japan:From pork to foreign policy[M].New York:Cambridge University Press.
- Christianson M K.2019.More and less effective updating:The role of trajectory management in making sense again[J].Administrative Science Quarterly,64(1):45-86.
- Davis A K,Piger J M,Sedor L M.2012.Beyond the numbers:Measuring the information content of earnings press release language[J].Contemporary Accounting Research,29(3):845-868.
- de Solla Price D J.1965.Networks of scientific papers[J].Science,149(3683):510-515.
- Gentzkow M,Shapiro J M.2010.What drives media slant?Evidence from U.S.daily newspapers[J].Econometrica,78(1):35-71.
- Gentzkow M,Kelly B,Taddy M.2017.Text as data[R].Cambridge:National Bureau of Economic Research.
- Ginsberg J,Mohebbi M H,Patel R S,et al.2009.Detecting influenza epidemics using search engine query data[J].Nature,457(7232):1012-1014.
- Grimmer J,Stewart B M.2013.Text as data:The promise and pitfalls of automatic content analysis methods for political texts[J].Political Analysis,21(3):267-297.
- Harrison J S,Thurgood G R,Boivie S,et al.2019.Measuring CEO personality:Developing,validating,and testing a linguistic tool[J].Strategic Management Journal,40(8):1316-1330.
- Hoberg G,Phillips G.2016.Text-based network industries and endogenous product differentiation[J].Journal of Political Economy,124(5):1423-1465.
- Hoberg G,Phillips G M.2018.Text-based industry momentum[J].Journal of Financial and Quantitative Analysis,53(6):2355-2388.
- Hollibaugh Jr G E.2019.The use of text as data methods in public administration:A review and an application to agency priorities[J].Journal of Public Administration Research and Theory,29(3):474-490.
- Huang C,Yue X X,Yang M Q,et al.2017.A quantitative study on the diffusion of public policy in China:Evidence from the S&T finance sector[J].Journal of Chinese Governance,2(3):235-254.
- Huang X,Jin H D,Zhang Y.2019.Risk assessment of earthquake network public opinion based on global search BP neural network[J].PLoS One,14(3):e0212839.
- Jiang J Y,Meng T G,Zhang Q.2019.From Internet to social safety net:The policy consequences of online participation in China[J].Governance,32(3):531-546.
- Jiang J Y,Zeng Y.2020.Countering capture:Elite networks and government responsiveness in China's land market reform[J].The Journal of Politics,82(1):13-28.
- King G,Keohane R O,Verba S.1994.Designing social inquiry:Scientific inference in qualitative research[M].Princeton:Princeton University Press.
- King G,Pan J,Roberts M E.2013.How censorship in China allows government criticism but silences collective expression[J].American Political Science Review,107(2):326-343.
- Loftis M W,Mortensen P B.2020.Collaborating with the machines:A hybrid method for classifying policy documents[J].Policy Studies Journal,48(1):184-206.
- Loughran T,McDonald B.2014.Measuring readability in financial disclosures[J].The Journal of Finance,69(4):1643-1671.
- Minto B.1996.The Minto pyramid principle:Logic in writing,thinking and problem solving[M].Minto International.
- Nakamura E,Steinsson J.2018.High-frequency identification of monetary non-neutrality:The information effect[J].The Quarterly Journal of Economics,133(3):1283-1330.
- Pretorius E J.2006.The comprehension of logical relations in expository texts by students who study through the medium of ESL[J].System,34(3):432-450.
- Schmidt T S,Sewerin S.2019.Measuring the temporal dynamics of policy mixes:An empirical analysis of renewable energy policy mixes' balance and design features in nine countries[J].Research Policy,48(10):103557.
- Slapin J B,Proksch S O.2008.A scaling model for estimating time-series party positions from texts[J].American Journal of Political Science,52(3):705-722.
- Tansley S,Tolle K M.2009.The fourth paradigm:Data-intensive scientific discovery[M].Redmond:Microsoft Research.
- Watanabe K,Zhou Y.2020.Theory-driven analysis of large corpora:Semisupervised topic classification of the UN speeches[J].Social Science Computer Review,doi:10.1177/0894439320907027.
- Yang C,Huang C,Su J.2018.An improved SAO network-based method for technology trend analysis:A case study of graphene[J].Journal of Informetrics,12(1):271-286.
- Zhang C W,Bu Y,Ding Y,et al.2018.Understanding scientific collaboration:Homophily,transitivity,and preferential attachment[J].Journal of the Association for Information Science and Technology,69(1):72-86.
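- (1) Given the interdisciplinary nature of the field, the authors adopt a broad search strategy. For English-language literature, the search was extended to the fields in the Web of Science Core Collection that may involve public management, including "Public Administration", "Management", "Political Science", "Social Science Interdisciplinary", and "Social Issues". For Chinese-language literature, the search was limited to core journals and CSSCI journals in the CNKI database, with the subject categories set to "Political Science", "Administration and State Administrative Management", "Political Parties and Mass Organizations", "Theory and Methods of Social Sciences", and "Sociology and Statistics", among others, in Social Sciences Series I and II, together with the Economics and Management Sciences category. No restriction was placed on publication date; that is, the time distribution shown covers all documents in the databases that contain the relevant keywords.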
- (2) See Grimmer and Stewart (2013), Gentzkow et al. (2017), and Berger et al. (2020).
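- (1) These three types of features derive from the content features of the text, but after processing they have become formal-feature concepts similar to "subject terms", and are therefore introduced in the section on formal features.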
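- (1) The exponential random graph model is a static network dynamics model; in essence, it answers which factors lead nodes to form more (or fewer) ties in a cross-sectional network.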
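- (1) Because this chapter mainly discusses trends in the development of text analysis, the literature was selected primarily according to the mode of text analysis. Some frontier studies therefore fall outside the scope of public management, but their analytical approaches offer useful lessons for developing text analysis in public management and public policy.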
- (1) See https://visuals.manifesto-project.wzb.eu/mpdb-shiny/cmp_dashboard_dataset/