
Operation page with Cookies information (Selenium)


Article directory

  • 1. Cookie introduction
  • 2. Use Selenium to obtain Cookies information (log in)
  • 3. Operation page with cookies (Selenium)
  • 4. Operation page with cookies (session)
  • 5. Automatic login (verification processing)
    • 1. Super Eagle (picture verification code)

1. Cookie introduction

Normally, after a login request, the next request returns to the non-logged-in state: the login information from the first request is not carried over to the second. If you want to reuse the first request's login state in a second request, you need Cookies. Cookies let the server record the client's state. A session object can send requests, and if a request produces a cookie, that cookie is stored in the session automatically.
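To illustrate that last point, here is a minimal sketch (assuming the requests library; the URLs are just the placeholders used later in this article) showing that a session keeps any cookie set by a response and sends it with the next request.

import requests


# A session keeps client state (including cookies) across requests
session = requests.Session()
# Placeholder URL; any endpoint that sets a cookie will do
resp = session.get('https://www.xxx.cn/resources/login.html')
# Cookies returned by the server are stored on the session automatically
print(session.cookies.get_dict())
# The next request through the same session carries those cookies along
page = session.get('https://www.xxx.cn/index.html')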

2. Use Selenium to obtain Cookies information (log in)

Selenium can automate operations on a page. When login information is required, a freshly started automation session normally has none. You can use the selenium module to print the login-related information (cookies) in advance, or save it to a file, and on the next run read this information back and add it to the browser. When selenium then sends requests to the web page, the account information is already included. The specific steps are as follows.

First, write a program to obtain the login information. Usually this requests the login page directly and logs in by scanning a QR code or some other method. After the operation completes, print the obtained information, or save it to a file that will be read on the next run. Note that there must be a long enough delay before printing the login information (cookies) so that you have time to complete the login; if there is no delay, or the delay is too short, the printed cookies may still describe a non-logged-in state. (For security reasons, the cookie login information issued by many websites has a limited lifetime and is generally not valid for long.)

 
from selenium import webdriver
from selenium.webdriver.common.by import By
import time


browser = webdriver.Chrome()
# The URL to request; usually the login page can be requested directly with a GET request
url = 'https://www.xxx.cn/resources/login.html'
browser.get(url)

# At this point, log in however you like: type in your credentials or scan a QR code on the page.
# Here I log in by scanning a QR code: XPath locates the link that switches to the QR-code login view and click() follows it automatically; clicking it by hand works just as well
login = browser.find_element(By.XPATH, '//*[@id="toolbar_Div"]/div[2]/div[2]/ul/li[2]/a')
login.click()
# Note: leave enough time here to finish logging in, so that the cookies reflect the logged-in state. If there is no delay, or the delay is too short, the cookies you get may still be for the non-logged-in state and will not work later
time.sleep(10)

# Get the login information (cookies), print it out, or save it to a file
cookies = browser.get_cookies()
print(cookies)
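The paragraph above also mentions saving the information to a file for the next run; a minimal sketch using the standard json module (the file name cookies.json is an arbitrary choice):

import json

# Save the cookies obtained above so the next run can reuse them
with open('cookies.json', 'w', encoding='utf-8') as fp:
    json.dump(cookies, fp)

# On the next run, read them back before adding them to the browser
with open('cookies.json', 'r', encoding='utf-8') as fp:
    cookies = json.load(fp)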

If there is a user interface, this can be wrapped in a function bound to a button: clicking the button triggers it, the operation completes, and the login information is saved in a global variable that other operations on the interface can then use, as in the sketch below.
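A minimal sketch of that idea, assuming a tkinter interface and the same flow as the script above (the window layout and names are illustrative only):

import time
import tkinter as tk
from selenium import webdriver

cookies = []  # global variable that other parts of the interface can read

def do_login():
    # Same flow as above: open the login page, wait for the user to log in, then grab the cookies
    global cookies
    browser = webdriver.Chrome()
    browser.get('https://www.xxx.cn/resources/login.html')
    time.sleep(10)  # leave time to complete the login
    cookies = browser.get_cookies()

root = tk.Tk()
tk.Button(root, text='Log in and save cookies', command=do_login).pack()
root.mainloop()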

3. Operation page with cookies (Selenium)

Here, copy and filter the cookies information obtained above, assign it to the cookies variable below, and add each entry with add_cookie. The code also uses switch_to.window to handle a new window and try/except to handle two different page layouts.

 
import selenium.common.exceptions
from selenium import webdriver
from selenium.webdriver.common.by import By
import time


# Start by creating the object as normal, and make a request to a URL that does not contain login information.
q = webdriver.Chrome()
q.get('https://www.xxx.cn/index.html')
time.sleep(1)

# Paste the user information (cookies) printed above here: a list of dictionaries, added one by one with a for loop
cookies = []
for cookie in cookies:
    q.add_cookie(cookie)
time.sleep(1)

# Next, make a request to the site you want to request, which already contains login information
q.get('https://www.xxx.cn')

# Then do something with the site (e.g. enter, clear, click, etc.)
star = q.find_element(By.XPATH, '//*[@id="fromStationText"]') # Locate the input box (in this case Xpath)
star.clear() # Clear the contents of the positioned input box.
star.send_keys("hello") # type hello here for the found input box
runsta = q.find_element(By.XPATH, '//*[@id="search_one"]') # locate the button (in this case Xpath)
runsta.click() # perform a click on the positioned button
time.sleep(1)

# If you jump to a new window, you need to do the following to switch to the current window, otherwise you won't be able to find the elements
# If you jump to a new URL in the original window, you don't need to do the following
currentWin = q.current_window_handle
handles = q.window_handles
for i in handles:
    if currentWin == i:
        continue
    q.switch_to.window(i)
time.sleep(1)

# To operate on entries in bulk (e.g. a list of results displayed on the page), use a for loop to read their information.
# Some sites serve two page layouts: entering the page repeatedly and locating elements, you will find two parsing formats, so try/except is used here
for i in range(1, 1000, 2):
    try:
        # Option 1
        # Locate the information with XPath and print it
        a = q.find_element(By.XPATH, '/html/body/div[1]/div[9]/div[12]/table/tbody/tr[{}]/td[1]/div/div[1]/div/a'.format(i))
        b = q.find_element(By.XPATH, '/html/body/div[1]/div[9]/div[12]/table/tbody/tr[{}]/td[4]'.format(i))
        print(a.text, b.text)
    except selenium.common.exceptions.NoSuchElementException:
        # Option 2
        # Locate the information with the alternative XPath and print it
        a = q.find_element(By.XPATH, '/html/body/div[2]/div[8]/div[12]/table/tbody/tr[{}]/td[1]/div/div[1]/div/a'.format(i))
        b = q.find_element(By.XPATH, '/html/body/div[2]/div[8]/div[12]/table/tbody/tr[{}]/td[4]'.format(i))
        print(a.text, b.text)

This series of operations can also be wrapped in a function bound to a button and triggered by clicking it. The cookies come from the global variable, i.e. the cookie information obtained during the login step above.

For more details on selenium operations, please see https://blog.csdn.net/weixin_46287157/article/details/129149265

4. Operation page with cookies (session)

You can also log in directly with a requests session by posting the login parameters yourself.

 
import requests


# Login URL
login_url = ''
# Request headers (e.g. User-Agent), filled in as required by the site
headers = {}
# Parameters for the login, e.g. email for the account, pwd for the password, code for the verification code.
# The exact parameters depend on the specific URL; they are often encrypted and need to be reverse-engineered
data = {
    'email': '',
    'pwd': '',
    'code': ''
}
# Send the POST login request through a session; the session stores the cookie it receives, so later requests are logged in
session = requests.Session()
login_page_text = session.post(url=login_url, headers=headers, data=data).text

Here you can also take the cookies information obtained with selenium above and add it to the session, so that the login information is already included when the request is made.

 
import requests


# Cookies fetched with selenium, copied over here
cookies = []
# Create the session and add the cookies; the relevant key-value pairs depend on the specific URL
session = requests.session()
for cookie in cookies:
    session.cookies.set(cookie['name'], cookie['value'])

# Request the URL (browser is the selenium webdriver from above; any target URL works here)
page = session.get(browser.current_url)
page.encoding = "utf-8"
html = page.text
# Save the page source
with open('./a.html', 'w', encoding='utf-8') as fp:
    fp.write(page.text)

5. Automatic login (verification processing)

For convenience, the login operations above are all manual, but the login can also be automated. Some large websites make this harder by requiring image verification codes, SMS codes or other checks. Some of these checks can be handled with third-party platforms such as Super Eagle (Chaojiying).

1. Super Eagle (picture verification code)

Basic usage of Super Eagle: www.chaojiying.com/about.html
Take logging in to a website as an example. To make sure the verification code you process is the same one shown on the login page, you cannot request the page twice. Instead, take a screenshot of the page, crop out the verification-code part, process it with a third-party tool, and log in with the returned result. The screenshot-and-crop method used here is as follows:

 
from PIL import Image
from selenium.webdriver.common.by import By

# Take a full-page screenshot and save it (bro is the selenium webdriver)
bro.save_screenshot('a.png')
# Determine the CAPTCHA coordinates (crop area)
code_img_ele = bro.find_element(By.XPATH, '')  # locator of the verification-code image element, fill in for the target page
location = code_img_ele.location  # x, y coordinates of the element's top-left corner
size = code_img_ele.size  # width and height of the element
# Top-left and bottom-right corner positions
rangle = (int(location['x']), int(location['y']), int(location['x'] + size['width']), int(location['y'] + size['height']))
# Crop the verification code out of the screenshot
i = Image.open('./a.png')
frame = i.crop(rangle)
frame.save('code.png')
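What follows is a sketch of the remaining steps: sending code.png to a recognition service and typing the result into the login form. The solve_captcha helper is only a stand-in for the Super Eagle client from the link above, and the XPaths are placeholders.

from selenium.webdriver.common.by import By

# Stand-in for the third-party recognition call (e.g. the Super Eagle client); returns the CAPTCHA text
def solve_captcha(image_path):
    with open(image_path, 'rb') as fp:
        image_bytes = fp.read()
    # ... call the recognition platform here and return its answer ...
    return ''

code_text = solve_captcha('code.png')

# Type the recognized code into the verification-code input box and submit the form (placeholder XPaths)
bro.find_element(By.XPATH, '//*[@id="code"]').send_keys(code_text)
bro.find_element(By.XPATH, '//*[@id="login_button"]').click()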