crazypythonmaster/pythonscript-scrapy

Language: Python

git: https://github.com/crazypythonmaster/pythonscript-scrapy

en_README.md

These scripts extract data on all doctors (name, address, telephone number, and so on) from two well-known sites in Sweden, Hitta and Eniro, and write the results as CSV files.
Later scripts compare the two result sets and process them further.
How to execute:
The following scripts need to be run in sequence. The input files each script requires and the output files it generates are listed below.
Usage:
python -d
hitta_primary_clinic.py:
Description: This script extracts all primary clinics from Hitta.
Output:
1. hitta_primary_clinic_result.csv
hitta.py:
Description: This script searches for the parent clinic name and extracts the details from Hitta.
Input:
1. Vardcentraler.csv
Output:
1. hitta_result.csv

eniro.py:
Description: This script searches for the parent clinic name and extracts the details from Eniro.
Input:
1. Vardcentraler.csv
Output:
1. eniro_result.csv

compare_hitta_eniro_vardcentraler.py:
Description: This script compares the Hitta and Eniro results with each other.
Input:
1. Vardcentraler.csv
2. hitta_result.csv
3. eniro_result.csv
4. hitta_primary_clinic_result.csv
Output:
1. hitta_eniro_vardcentraler_result.csv
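
The README does not say how the comparison is done. As a rough illustration only, the sketch below shows one way two result sets could be matched on clinic name. The column names `name`, `address` and `phone`, the status labels, and both function names are assumptions for this example and are not taken from the repository.

```python
import csv
import io


def index_by_name(rows, key="name"):
    """Map a normalized clinic name to its row (a later duplicate wins)."""
    return {row[key].strip().lower(): row for row in rows}


def compare_sources(hitta_rows, eniro_rows, key="name", fields=("address", "phone")):
    """Label each clinic as match, mismatch, hitta_only or eniro_only."""
    hitta = index_by_name(hitta_rows, key)
    eniro = index_by_name(eniro_rows, key)
    report = []
    for name in sorted(hitta.keys() | eniro.keys()):
        h, e = hitta.get(name), eniro.get(name)
        if h is None:
            status = "eniro_only"
        elif e is None:
            status = "hitta_only"
        elif all((h.get(f) or "").strip() == (e.get(f) or "").strip() for f in fields):
            status = "match"
        else:
            status = "mismatch"
        report.append({"name": name, "status": status})
    return report


# Hypothetical sample data standing in for hitta_result.csv and eniro_result.csv.
hitta_csv = "name,address,phone\nAlfa Vardcentral,Storgatan 1,08-111\nBeta Klinik,Lillgatan 2,08-222\n"
eniro_csv = "name,address,phone\nalfa vardcentral,Storgatan 1,08-111\nGamma Klinik,Torget 3,08-333\n"

report = compare_sources(
    csv.DictReader(io.StringIO(hitta_csv)),
    csv.DictReader(io.StringIO(eniro_csv)),
)
```

Matching on a lower-cased, stripped name is the simplest possible key. Real directory data would probably need fuzzier matching, for example on a normalized address.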

hitta_eniro_vs_primary_clinic.py:
Description: This script compares the combined hitta_eniro result with the primary clinic result file.
Input:
1. hitta_eniro_vardcentraler_result.csv
2. hitta_primary_clinic_result.csv
Output:
1. hitta_eniro_vs_primary_clinic.csv
2. hitta_eniro_vs_primary_clinic_summary.csv

list_found_doctors.py:
Description: This script checks the home URLs for doctors.
Input:
1. hitta_eniro_vardcentraler_result.csv
2. ALL_RECORDS_MATCH_73_273_TO_273_373_SPLIT_ADDRESS.csv
Output:
1. found_doctors_result.csv
2. doctors_result.csv

get_all_doctors.py:
Description: This script extracts all doctor names found on the home URL.
Input:
1. found_doctors_result.csv
Output:
1. get_all_doctors_names_result.csv
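
The run order above can be sketched as a small driver. It checks that each script's input files exist before running it and that its output files exist afterwards. This is a minimal sketch, not part of the repository. It assumes the scripts and the CSV files all live in one working directory. The names `PIPELINE`, `missing_files` and `run_pipeline` are made up for this example. The `-d` flag is copied from the usage line above.

```python
import subprocess
import sys
from pathlib import Path

# (script, required input files, expected output files), in README order.
PIPELINE = [
    ("hitta_primary_clinic.py", [], ["hitta_primary_clinic_result.csv"]),
    ("hitta.py", ["Vardcentraler.csv"], ["hitta_result.csv"]),
    ("eniro.py", ["Vardcentraler.csv"], ["eniro_result.csv"]),
    ("compare_hitta_eniro_vardcentraler.py",
     ["Vardcentraler.csv", "hitta_result.csv", "eniro_result.csv",
      "hitta_primary_clinic_result.csv"],
     ["hitta_eniro_vardcentraler_result.csv"]),
    ("hitta_eniro_vs_primary_clinic.py",
     ["hitta_eniro_vardcentraler_result.csv", "hitta_primary_clinic_result.csv"],
     ["hitta_eniro_vs_primary_clinic.csv",
      "hitta_eniro_vs_primary_clinic_summary.csv"]),
    ("list_found_doctors.py",
     ["hitta_eniro_vardcentraler_result.csv",
      "ALL_RECORDS_MATCH_73_273_TO_273_373_SPLIT_ADDRESS.csv"],
     ["found_doctors_result.csv", "doctors_result.csv"]),
    ("get_all_doctors.py", ["found_doctors_result.csv"],
     ["get_all_doctors_names_result.csv"]),
]


def missing_files(names, workdir):
    """Return the names that do not exist as files under workdir."""
    return [n for n in names if not (Path(workdir) / n).is_file()]


def run_pipeline(workdir=".", run=subprocess.run):
    """Run each script in order, checking inputs before and outputs after."""
    for script, inputs, outputs in PIPELINE:
        missing = missing_files(inputs, workdir)
        if missing:
            raise FileNotFoundError(f"{script}: missing input(s) {missing}")
        # "-d" is taken from the README's usage line.
        run([sys.executable, "-d", script], cwd=workdir, check=True)
        not_written = missing_files(outputs, workdir)
        if not_written:
            raise RuntimeError(f"{script}: expected output(s) {not_written}")
```

Calling `run_pipeline()` from the directory that holds the scripts would run all seven steps. Vardcentraler.csv and ALL_RECORDS_MATCH_73_273_TO_273_373_SPLIT_ADDRESS.csv are not produced by any listed script, so they have to be supplied beforehand.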