scraper-boamp

Installing https://github.com/michelbl/scraper-boamp on PC130 (Ubuntu):

olivier@upc130:~$ sudo apt install python3-pip
olivier@upc130:~$ sudo pip3 install pew

olivier@upc130:~$ pew new boampenv
New python executable in /home/olivier/.local/share/virtualenvs/boampenv/bin/python
Installing setuptools, pip, wheel...
done.
Launching subshell in virtual environment. Type 'exit' or 'Ctrl+D' to return.
boampenv olivier@upc130:~$ 

boampenv olivier@upc130:~$ sudo apt install git

boampenv olivier@upc130:~$ git clone https://github.com/michelbl/scraper-boamp
Cloning into 'scraper-boamp'...
remote: Enumerating objects: 50, done.
remote: Total 50 (delta 0), reused 0 (delta 0), pack-reused 50
Unpacking objects: 100% (50/50), done.
boampenv olivier@upc130:~$ pwd
/home/olivier
boampenv olivier@upc130:~$ cd scraper-boamp/
boampenv olivier@upc130:~/scraper-boamp$ pip3 install --editable .

boampenv olivier@upc130:~/scraper-boamp$ python3 -m pip install jupyter

boampenv olivier@upc130:~/scraper-boamp$ jupyter notebook
[I 16:51:47.704 NotebookApp] Writing notebook server cookie secret to /run/user/1000/jupyter/notebook_cookie_secret
[I 16:51:48.017 NotebookApp] Serving notebooks from local directory: /home/poun/scraper-boamp
[I 16:51:48.017 NotebookApp] The Jupyter Notebook is running at:
[I 16:51:48.017 NotebookApp] http://localhost:8888/?token=9a29b434ec97106c1342e8197fdbe4c917cfdeccb3473aa1
[I 16:51:48.018 NotebookApp] Use Control-C to stop this server and shut down all kernels (twice to skip confirmation).
[C 16:51:48.025 NotebookApp] 
    
    To access the notebook, open this file in a browser:
        file:///run/user/1000/jupyter/nbserver-15244-open.html
    Or copy and paste one of these URLs:
        http://localhost:8888/?token=9a29b434ec97106c1342e8197fdbe4c917cfdeccb3473aa1

In another window:

pew workon boampenv
cp config.ini.example config.ini
vi config.ini

boampenv olivier@upc130:~/scraper-boamp$ cat config.ini

[database]

host=localhost
port=1234
name=boamp
username=psqldbuser
password=S1agne83

[file_storage]
tmp_directory=/home/poun/scraper-boamp/data/tmp
  Make sure tmp_directory does not exist but its parent exists.
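
The tmp_directory rule above can be checked before running the scraper. The following is a minimal sketch using only the Python standard library; the function name check_tmp_directory is hypothetical and is not part of scraper-boamp. It assumes the config text contains only the settings, without the explanatory note, because configparser would read an indented line as a continuation of the previous value.

```python
import configparser
from pathlib import Path


def check_tmp_directory(config_text):
    """Return (ok, message) for the tmp_directory setting in config_text.

    Hypothetical helper, not part of scraper-boamp.
    """
    # Interpolation is disabled so that a '%' in a value is taken literally.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(config_text)
    tmp_dir = Path(parser["file_storage"]["tmp_directory"])

    # The directory itself must not exist yet...
    if tmp_dir.exists():
        return False, f"{tmp_dir} already exists"
    # ...but the directory that will contain it must.
    if not tmp_dir.parent.is_dir():
        return False, f"parent {tmp_dir.parent} is missing"
    return True, f"{tmp_dir} can be created"


# Possible usage from the scraper-boamp directory:
#   ok, message = check_tmp_directory(Path("config.ini").read_text())
#   print(message)
```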

Database

Create a new database user with all privileges on the new tables, with password authentication (md5 in pg_hba.conf).

su -
su - postgres
   
postgres@db2:~$ psql
psql (10.6 (Ubuntu 10.6-0ubuntu0.18.04.1))
Type "help" for help.

postgres=# create database boamp;
postgres=# \q

postgres@db2:~$ psql -d boamp
psql (10.6 (Ubuntu 10.6-0ubuntu0.18.04.1))
Type "help" for help.

boamp=# create table boamp (year int, doc_type text, ident text, xml_content text);
CREATE TABLE
boamp=# create index on boamp (ident);
CREATE INDEX
boamp=# create table boamp_source_archives (url text);
CREATE TABLE
boamp=# create index on boamp_source_archives (url);
CREATE INDEX

boamp=# create user psqldbuser;
CREATE ROLE

boamp=# \password psqldbuser
Enter new password:
Enter it again:

boamp=# alter role psqldbuser with superuser;
ALTER ROLE
boamp=# \du
                                    List of roles
 Role name  |                         Attributes                         | Member of
------------+------------------------------------------------------------+-----------
 postgres   | Superuser, Create role, Create DB, Replication, Bypass RLS | {}
 psqldbuser | Superuser                                                  | {}

boamp=# \d
                 List of relations
 Schema |         Name          | Type  |  Owner
--------+-----------------------+-------+----------
 public | boamp                 | table | postgres
 public | boamp_source_archives | table | postgres
(2 rows)

boamp=# grant all privileges on boamp to psqldbuser ;
GRANT
boamp=# grant all privileges on boamp_source_archives to psqldbuser ;
GRANT
boamp=# \dp+
                                          Access privileges
 Schema |         Name          | Type  |      Access privileges      | Column privileges | Policies
--------+-----------------------+-------+-----------------------------+-------------------+----------
 public | boamp                 | table | postgres=arwdDxt/postgres  +|                   |
        |                       |       | psqldbuser=arwdDxt/postgres |                   |
 public | boamp_source_archives | table | postgres=arwdDxt/postgres  +|                   |
        |                       |       | psqldbuser=arwdDxt/postgres |                   |
(2 rows)

boamp=#
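
To confirm that the [database] values in config.ini match the role and database created in this session, one option is to build a libpq connection URI from the file and pass it to psql. The sketch below uses only the Python standard library; the function name database_uri is hypothetical and is not part of scraper-boamp, and the sample password is a placeholder.

```python
import configparser
from urllib.parse import quote


def database_uri(config_text):
    """Build a postgresql:// URI from the [database] section of config_text.

    Hypothetical helper, not part of scraper-boamp.
    """
    # Interpolation is disabled so that a '%' in the password is taken literally.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(config_text)
    db = parser["database"]

    # Percent-encode the credentials so characters such as '@' or '/'
    # cannot be mistaken for URI separators.
    user = quote(db["username"], safe="")
    password = quote(db["password"], safe="")
    return f"postgresql://{user}:{password}@{db['host']}:{db['port']}/{db['name']}"


# Sample values shaped like config.ini; the password is a placeholder.
SAMPLE = """
[database]
host=localhost
port=1234
name=boamp
username=psqldbuser
password=p@ss/word
"""

print(database_uri(SAMPLE))
```

The printed URI can be given to psql as its only argument; a successful login followed by \d should list the boamp and boamp_source_archives tables.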

End of pg_hba.conf:

# Database administrative login by Unix domain socket
#
# changed by olivier on march 15 2019
#local   all             postgres                                peer
local   all             postgres                                md5
local	all		psqldbuser				md5

# TYPE  DATABASE        USER            ADDRESS                 METHOD

# "local" is for Unix domain socket connections only
local   all             all                                     peer
# IPv4 local connections:
host    all             all             127.0.0.1/32            md5
# IPv6 local connections:
host    all             all             ::1/128                 md5
# Allow replication connections from localhost, by a user with the
# replication privilege.
local   replication     all                                     peer
host    replication     all             127.0.0.1/32            md5
host    replication     all             ::1/128                 md5

# added by olivier on March 15 2019
host	all		all		192.168.1.0/24		md5
local	all		psqldbuser				md5

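For remote access (ODBC), add the following two lines at the end of pg_hba.conf. They accept password (md5) logins from any IPv4 or IPv6 address, and PostgreSQL must reload its configuration before they take effect: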

host    all             all              0.0.0.0/0                       md5
host    all             all              ::/0                            md5
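
PostgreSQL uses the first pg_hba.conf record that matches an incoming connection, so the order of the host lines matters. The sketch below illustrates only the address part of that matching with Python's ipaddress module; it is a simplification, since real records are also matched on connection type, database and user. The function name first_matching_rule and the HOST_RULES list are hypothetical, with the networks copied from the host lines shown in this page.

```python
import ipaddress

# (network, method) pairs in the order the host records appear in this page.
HOST_RULES = [
    ("127.0.0.1/32", "md5"),
    ("::1/128", "md5"),
    ("192.168.1.0/24", "md5"),
    ("0.0.0.0/0", "md5"),
    ("::/0", "md5"),
]


def first_matching_rule(client_ip, rules):
    """Return the first (network, method) pair whose network contains client_ip."""
    address = ipaddress.ip_address(client_ip)
    for cidr, method in rules:
        network = ipaddress.ip_network(cidr)
        # An IPv4 record never matches an IPv6 client, and vice versa.
        if address.version == network.version and address in network:
            return cidr, method
    return None


for client in ("127.0.0.1", "192.168.1.42", "203.0.113.7", "2001:db8::1"):
    print(client, "->", first_matching_rule(client, HOST_RULES))
```

Without the last two records, a client outside 192.168.1.0/24 matches nothing and the function returns None, which corresponds to PostgreSQL rejecting the connection.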

Finally, put in place the corrected Python files stored in C:\Data\boamp\scraper-boamp_FIXED_2020_01_16\

scraper-boamp.txt · Last modified: 2020/01/16 17:47 by 89.95.221.183