How do I scrape text that has no class in Python?

insrf1ej posted on 2021-09-08 in Java

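Hi everyone, I'm currently trying to complete a task in Python. I need to get the confirmation code out of an email response. However, the number is nested under many tags, none of which has a class, and I don't know how to scrape it. In this example the number is 035247. I've added a picture at the end.
Here is the text I want to scrape: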

<!DOCTYPE HTML PUBLIC \"-//W3C//DTD HTML 4.01 Transitional //EN\">
 <html>
  <head>
   <title>
    Facebook
   </title>
   <meta charset="utf-8" content='\"text/html;' http-equiv='\"Content-Type\"'/>
   <style nonce='\"S0Z1gia3\"'>
    @media all and (max-width: 480px){*[class].ib_t{min-width:100% !important}*[class].ib_row{display:block !important}*[class].ib_ext{display:block !important;padding:10px 0 5px 0;vertical-align:top !important;width:100% !important}*[class].ib_img,*[class].ib_mid{vertical-align:top !important}*[class].mb_blk{display:block !important;padding-bottom:10px;width:100% !important}*[class].mb_hide{display:none !important}*[class].mb_inl{display:inline !important}*[class].d_mb_flex{display:block !important}}.d_mb_show{display:none}.d_mb_flex{display:flex}@media only screen and (max-device-width: 480px){.d_mb_hide{display:none !important}.d_mb_show{display:block !important}.d_mb_flex{display:block !important}}.mb_text h1,.mb_text h2,.mb_text h3,.mb_text h4,.mb_text h5,.mb_text h6{line-height:normal}.mb_work_text h1{font-size:18px;line-height:normal;margin-top:4px}.mb_work_text h2,.mb_work_text h3{font-size:16px;line-height:normal;margin-top:4px}.mb_work_text h4,.mb_work_text h5,.mb_work_text h6{font-size:14px;line-height:normal}.mb_work_text a{color:#1270e9}.mb_work_text p{margin-top:4px}
   </style>
  </head>
  <body bgcolor='\"#ffffff\"' dir='\"ltr\"' style='\"margin:0;padding:0;\"'>
   <table align='\"center\"' border='\"0\"' cellpadding='\"0\"' cellspacing='\"0\"' id='\"email_table\"' style='\"border-collapse:collapse;\"'>
    <tr>
     <td grande,tahoma,verdana,arial,sans-serif;background:#ffffff;\"="" id='\"email_content\"' neue,helvetica,lucida="" style='\"font-family:Helvetica'>
      <table border='\"0\"' cellpadding='\"0\"' cellspacing='\"0\"' style='\"border-collapse:collapse;\"' width='\"100%\"'>
       <tr style='\"\"'>
        <td colspan='\"3\"' height='\"20\"' style='\"line-height:20px;\"'>
        </td>
       </tr>
       <tr>
        <td colspan='\"3\"' height='\"1\"' style='\"line-height:1px;\"'>
        </td>
       </tr>
       <tr>
        <td style='\"\"'>
         <table border='\"0\"' cellpadding='\"0\"' cellspacing='\"0\"' style='\"border-collapse:collapse;text-align:center;html_width:100%;width:100%;\"' width='\"100%\"'>
          <tr>
           <td style='\"width:15px;\"' width='\"15px\"'>
           </td>
           <td 0="" 0;\"="" 15px="" style='\"line-height:0px;max-width:600px;padding:0'>
            <table border='\"0\"' cellpadding='\"0\"' cellspacing='\"0\"' style='\"border-collapse:collapse;\"' width='\"100%\"'>
             <tr>
              <td style='\"width:100%;text-align:left;height:33px;\"'>
               <img height='\"33\"' src='\"https://static.xx.fbcdn.net/rsrc.php/v3/yb/r/QTa-gpOyYBa.png\"' style='\"border:0;\"'/>
              </td>
             </tr>
            </table>
           </td>
           <td style='\"width:15px;\"' width='\"15px\"'>
           </td>
          </tr>
         </table>
        </td>
       </tr>
       <tr>
        <td style='\"\"'>
         <table 0="" auto="" auto;\"="" border='\"0\"' cellpadding='\"0\"' cellspacing='\"0\"' style='\"border-collapse:collapse;margin:0' width='\"430\"'>
          <tr>
           <td style='\"\"'>
            <table 0="" auto="" auto;width:430px;\"="" border='\"0\"' cellpadding='\"0\"' cellspacing='\"0\"' style='\"border-collapse:collapse;margin:0' width='\"430px\"'>
             <td style='\"display:block;width:15px;\"' width='\"15\"'>
             </td>
             <tr>
              <td style='\"display:block;width:12px;\"' width='\"12\"'>
              </td>
              <td style='\"\"'>
               <table border='\"0\"' cellpadding='\"0\"' cellspacing='\"0\"' style='\"border-collapse:collapse;\"' width='\"100%\"'>
                <tr>
                 <td>
                  <td 0="" 0;color:#565a5c;font-size:18px;\"="" 10px="" style='\"margin:10px'>
                   <p 0="" 0;color:#565a5c;font-size:18px;\"="" 10px="" style='\"margin:10px'>
                    Hi,
                   </p>
                   <p 0="" 0;color:#565a5c;font-size:18px;\"="" 10px="" style='\"margin:10px'>
                    Someone tried to sign up for an Instagram account with platt.kramer@mailkept.com. If it was you, enter this confirmation code in the app:
                   </p>
                  </td>
                 </td>
                </tr>
                <tr>
                 <td>
                  <td style='\"padding:10px;color:#565a5c;font-size:32px;font-weight:500;text-align:center;padding-bottom:25px;\"'>
                   035247
                  </td>
                 </td>
                </tr>
               </table>
              </td>
             </tr>
            </table>
           </td>
          </tr>
         </table>
        </td>
       </tr>
       <tr>
        <td style='\"\"'>
         <table 0="" auto="" auto;width:100%;max-width:600px;\"="" border='\"0\"' cellpadding='\"0\"' cellspacing='\"0\"' style='\"border-collapse:collapse;margin:0'>
          <tr style='\"\"'>
           <td colspan='\"3\"' height='\"4\"' style='\"line-height:4px;\"'>
           </td>
          </tr>
          <tr>
           <td style='\"width:15px;\"' width='\"15px\"'>
           </td>
           <td style='\"display:block;width:20px;\"' width='\"20\"'>
           </td>
           <td style='\"text-align:center;\"'>
            <div style='\"padding-top:10px;display:flex;\"'>
             <div style='\"margin:auto;\"'>
              <img alt='\"\"' class='\"img\"' height='\"30\"' src='\"https://static.xx.fbcdn.net/rsrc.php/v3/y5/r/pTeXjRdVk8c.png\"' width='\"77\"'/>
             </div>
             <br/>
            </div>
            <div style='\"height:10px;\"'>
            </div>
            <div 5px="" auto="" auto;\"="" style='\"color:#abadae;font-size:11px;margin:0'>
             © Instagram. Facebook Inc., 1601 Willow Road, Menlo Park, CA 94025
             <br/>
            </div>
            <div 5px="" auto="" auto;\"="" style='\"color:#abadae;font-size:11px;margin:0'>
             This message was sent to
             <a style='\"color:#abadae;text-decoration:underline;\"'>
              platt.kramer@mailkept.com
             </a>
             .
             <br/>
            </div>
           </td>
           <td style='\"display:block;width:20px;\"' width='\"20\"'>
           </td>
           <td style='\"width:15px;\"' width='\"15px\"'>
           </td>
          </tr>
          <tr style='\"\"'>
           <td colspan='\"3\"' height='\"32\"' style='\"line-height:32px;\"'>
           </td>
          </tr>
         </table>
        </td>
       </tr>
       <tr style='\"\"'>
        <td colspan='\"3\"' height='\"20\"' style='\"line-height:20px;\"'>
        </td>
       </tr>
      </table>
      <span style='\"\"'>
       <img src='\"https://www.facebook.com/email_open_log_pic.php?mid=5c6bfe8e8bc66G24bc2cdafa4000G5c6c0327ebf39G37f\"' style='\"border:0;width:1px;height:1px;\"'/>
      </span>
     </td>
    </tr>
   </table>
  </body>
 </html>
 \n\n","mail_text":"[https://static.xx.fbcdn.net/rsrc.php/v3/yb/r/QTa-gpOyYBa.png]Hi,\n\nSomeone tried to sign up for an Instagram account with\nplatt.kramer@mailkept.com. If it was you, enter this confirmation code in the\napp:\n\n035247[https://static.xx.fbcdn.net/rsrc.php/v3/y5/r/pTeXjRdVk8c.png]\n© Instagram. Facebook Inc., 1601 Willow Road, Menlo Park, CA 94025\nThis message was sent to platt.kramer@mailkept.com.\n[https://www.facebook.com/email_open_log_pic.php?mid=5c6bfe8e8bc66G24bc2cdafa4000G5c6c0327ebf39G37f]","mail_html":"
 <!DOCTYPE HTML PUBLIC \"-//W3C//DTD HTML 4.01 Transitional //EN\">
 <html>
  <head>
   <title>
    Facebook
   </title>
   <meta charset="utf-8" content='\"text/html;' http-equiv='\"Content-Type\"'/>
   <style nonce='\"S0Z1gia3\"'>
    @media all and (max-width: 480px){*[class].ib_t{min-width:100% !important}*[class].ib_row{display:block !important}*[class].ib_ext{display:block !important;padding:10px 0 5px 0;vertical-align:top !important;width:100% !important}*[class].ib_img,*[class].ib_mid{vertical-align:top !important}*[class].mb_blk{display:block !important;padding-bottom:10px;width:100% !important}*[class].mb_hide{display:none !important}*[class].mb_inl{display:inline !important}*[class].d_mb_flex{display:block !important}}.d_mb_show{display:none}.d_mb_flex{display:flex}@media only screen and (max-device-width: 480px){.d_mb_hide{display:none !important}.d_mb_show{display:block !important}.d_mb_flex{display:block !important}}.mb_text h1,.mb_text h2,.mb_text h3,.mb_text h4,.mb_text h5,.mb_text h6{line-height:normal}.mb_work_text h1{font-size:18px;line-height:normal;margin-top:4px}.mb_work_text h2,.mb_work_text h3{font-size:16px;line-height:normal;margin-top:4px}.mb_work_text h4,.mb_work_text h5,.mb_work_text h6{font-size:14px;line-height:normal}.mb_work_text a{color:#1270e9}.mb_work_text p{margin-top:4px}
   </style>
  </head>
  <body bgcolor='\"#ffffff\"' dir='\"ltr\"' style='\"margin:0;padding:0;\"'>
   <table align='\"center\"' border='\"0\"' cellpadding='\"0\"' cellspacing='\"0\"' id='\"email_table\"' style='\"border-collapse:collapse;\"'>
    <tr>
     <td grande,tahoma,verdana,arial,sans-serif;background:#ffffff;\"="" id='\"email_content\"' neue,helvetica,lucida="" style='\"font-family:Helvetica'>
      <table border='\"0\"' cellpadding='\"0\"' cellspacing='\"0\"' style='\"border-collapse:collapse;\"' width='\"100%\"'>
       <tr style='\"\"'>
        <td colspan='\"3\"' height='\"20\"' style='\"line-height:20px;\"'>
        </td>
       </tr>
       <tr>
        <td colspan='\"3\"' height='\"1\"' style='\"line-height:1px;\"'>
        </td>
       </tr>
       <tr>
        <td style='\"\"'>
         <table border='\"0\"' cellpadding='\"0\"' cellspacing='\"0\"' style='\"border-collapse:collapse;text-align:center;html_width:100%;width:100%;\"' width='\"100%\"'>
          <tr>
           <td style='\"width:15px;\"' width='\"15px\"'>
           </td>
           <td 0="" 0;\"="" 15px="" style='\"line-height:0px;max-width:600px;padding:0'>
            <table border='\"0\"' cellpadding='\"0\"' cellspacing='\"0\"' style='\"border-collapse:collapse;\"' width='\"100%\"'>
             <tr>
              <td style='\"width:100%;text-align:left;height:33px;\"'>
               <img height='\"33\"' src='\"https://static.xx.fbcdn.net/rsrc.php/v3/yb/r/QTa-gpOyYBa.png\"' style='\"border:0;\"'/>
              </td>
             </tr>
            </table>
           </td>
           <td style='\"width:15px;\"' width='\"15px\"'>
           </td>
          </tr>
         </table>
        </td>
       </tr>
       <tr>
        <td style='\"\"'>
         <table 0="" auto="" auto;\"="" border='\"0\"' cellpadding='\"0\"' cellspacing='\"0\"' style='\"border-collapse:collapse;margin:0' width='\"430\"'>
          <tr>
           <td style='\"\"'>
            <table 0="" auto="" auto;width:430px;\"="" border='\"0\"' cellpadding='\"0\"' cellspacing='\"0\"' style='\"border-collapse:collapse;margin:0' width='\"430px\"'>
             <td style='\"display:block;width:15px;\"' width='\"15\"'>
             </td>
             <tr>
              <td style='\"display:block;width:12px;\"' width='\"12\"'>
              </td>
              <td style='\"\"'>
               <table border='\"0\"' cellpadding='\"0\"' cellspacing='\"0\"' style='\"border-collapse:collapse;\"' width='\"100%\"'>
                <tr>
                 <td>
                  <td 0="" 0;color:#565a5c;font-size:18px;\"="" 10px="" style='\"margin:10px'>
                   <p 0="" 0;color:#565a5c;font-size:18px;\"="" 10px="" style='\"margin:10px'>
                    Hi,
                   </p>
                   <p 0="" 0;color:#565a5c;font-size:18px;\"="" 10px="" style='\"margin:10px'>
                    Someone tried to sign up for an Instagram account with platt.kramer@mailkept.com. If it was you, enter this confirmation code in the app:
                   </p>
                  </td>
                 </td>
                </tr>
                <tr>
                 <td>
                  <td style='\"padding:10px;color:#565a5c;font-size:32px;font-weight:500;text-align:center;padding-bottom:25px;\"'>
                   035247
                  </td>
                 </td>
                </tr>
               </table>
              </td>
             </tr>
            </table>
           </td>
          </tr>
         </table>
        </td>
       </tr>
       <tr>
        <td style='\"\"'>
         <table 0="" auto="" auto;width:100%;max-width:600px;\"="" border='\"0\"' cellpadding='\"0\"' cellspacing='\"0\"' style='\"border-collapse:collapse;margin:0'>
          <tr style='\"\"'>
           <td colspan='\"3\"' height='\"4\"' style='\"line-height:4px;\"'>
           </td>
          </tr>
          <tr>
           <td style='\"width:15px;\"' width='\"15px\"'>
           </td>
           <td style='\"display:block;width:20px;\"' width='\"20\"'>
           </td>
           <td style='\"text-align:center;\"'>
            <div style='\"padding-top:10px;display:flex;\"'>
             <div style='\"margin:auto;\"'>
              <img alt='\"\"' class='\"img\"' height='\"30\"' src='\"https://static.xx.fbcdn.net/rsrc.php/v3/y5/r/pTeXjRdVk8c.png\"' width='\"77\"'/>
             </div>
             <br/>
            </div>
            <div style='\"height:10px;\"'>
            </div>
            <div 5px="" auto="" auto;\"="" style='\"color:#abadae;font-size:11px;margin:0'>
             © Instagram. Facebook Inc., 1601 Willow Road, Menlo Park, CA 94025
             <br/>
            </div>
            <div 5px="" auto="" auto;\"="" style='\"color:#abadae;font-size:11px;margin:0'>
             This message was sent to
             <a style='\"color:#abadae;text-decoration:underline;\"'>
              platt.kramer@mailkept.com
             </a>
             .
             <br/>
            </div>
           </td>
           <td style='\"display:block;width:20px;\"' width='\"20\"'>
           </td>
           <td style='\"width:15px;\"' width='\"15px\"'>
           </td>
          </tr>
          <tr style='\"\"'>
           <td colspan='\"3\"' height='\"32\"' style='\"line-height:32px;\"'>
           </td>
          </tr>
         </table>
        </td>
       </tr>
       <tr style='\"\"'>
        <td colspan='\"3\"' height='\"20\"' style='\"line-height:20px;\"'>
        </td>
       </tr>
      </table>
      <span style='\"\"'>
       <img src='\"https://www.facebook.com/email_open_log_pic.php?mid=5c6bfe8e8bc66G24bc2cdafa4000G5c6c0327ebf39G37f\"' style='\"border:0;width:1px;height:1px;\"'/>
      </span>
     </td>
    </tr>
   </table>
  </body>
 </html>
 \n\n","mail_timestamp":1625903679.839,"mail_attachments_count":0,"mail_attachments":{"attachment":[]}}]
</no-reply@mail.instagram.com>
<tr><td grande,tahoma,verdana,arial,sans-serif;background:#ffffff;\"="" id='\"email_content\"' neue,helvetica,lucida="" style='\"font-family:Helvetica'><table border='\"0\"' cellpadding='\"0\"' cellspacing='\"0\"' style='\"border-collapse:collapse;\"' width='\"100%\"'><tr style='\"\"'><td colspan='\"3\"' height='\"20\"' style='\"line-height:20px;\"'> </td></tr><tr><td colspan='\"3\"' height='\"1\"' style='\"line-height:1px;\"'></td></tr><tr><td style='\"\"'><table border='\"0\"' cellpadding='\"0\"' cellspacing='\"0\"' style='\"border-collapse:collapse;text-align:center;html_width:100%;width:100%;\"' width='\"100%\"'><tr><td style='\"width:15px;\"' width='\"15px\"'></td><td 0="" 0;\"="" 15px="" style='\"line-height:0px;max-width:600px;padding:0'><table border='\"0\"' cellpadding='\"0\"' cellspacing='\"0\"' style='\"border-collapse:collapse;\"' width='\"100%\"'><tr><td style='\"width:100%;text-align:left;height:33px;\"'><img height='\"33\"' src='\"https://static.xx.fbcdn.net/rsrc.php/v3/yb/r/QTa-gpOyYBa.png\"' style='\"border:0;\"'/></td></tr></table></td><td style='\"width:15px;\"' width='\"15px\"'></td></tr></table></td></tr><tr><td style='\"\"'><table 0="" auto="" auto;\"="" border='\"0\"' cellpadding='\"0\"' cellspacing='\"0\"' style='\"border-collapse:collapse;margin:0' width='\"430\"'><tr><td style='\"\"'><table 0="" auto="" auto;width:430px;\"="" border='\"0\"' cellpadding='\"0\"' cellspacing='\"0\"' style='\"border-collapse:collapse;margin:0' width='\"430px\"'><td style='\"display:block;width:15px;\"' width='\"15\"'>   </td><tr><td style='\"display:block;width:12px;\"' width='\"12\"'>   </td><td style='\"\"'><table border='\"0\"' cellpadding='\"0\"' cellspacing='\"0\"' style='\"border-collapse:collapse;\"' width='\"100%\"'><tr><td><td 0="" 0;color:#565a5c;font-size:18px;\"="" 10px="" style='\"margin:10px'><p 0="" 0;color:#565a5c;font-size:18px;\"="" 10px="" style='\"margin:10px'>Hi,</p><p 0="" 0;color:#565a5c;font-size:18px;\"="" 10px="" 
style='\"margin:10px'>Someone tried to sign up for an Instagram account with platt.kramer@mailkept.com. If it was you, enter this confirmation code in the app:</p></td></td></tr><tr><td><td style='\"padding:10px;color:#565a5c;font-size:32px;font-weight:500;text-align:center;padding-bottom:25px;\"'>035247</td></td></tr></table></td></tr></table></td></tr></table></td></tr><tr><td style='\"\"'><table 0="" auto="" auto;width:100%;max-width:600px;\"="" border='\"0\"' cellpadding='\"0\"' cellspacing='\"0\"' style='\"border-collapse:collapse;margin:0'><tr style='\"\"'><td colspan='\"3\"' height='\"4\"' style='\"line-height:4px;\"'> </td></tr><tr><td style='\"width:15px;\"' width='\"15px\"'></td><td style='\"display:block;width:20px;\"' width='\"20\"'>   </td><td style='\"text-align:center;\"'><div style='\"padding-top:10px;display:flex;\"'><div style='\"margin:auto;\"'><img alt='\"\"' class='\"img\"' height='\"30\"' src='\"https://static.xx.fbcdn.net/rsrc.php/v3/y5/r/pTeXjRdVk8c.png\"' width='\"77\"'/></div><br/></div><div style='\"height:10px;\"'></div><div 5px="" auto="" auto;\"="" style='\"color:#abadae;font-size:11px;margin:0'>© Instagram. Facebook Inc., 1601 Willow Road, Menlo Park, CA 94025<br/></div><div 5px="" auto="" auto;\"="" style='\"color:#abadae;font-size:11px;margin:0'>This message was sent to <a style='\"color:#abadae;text-decoration:underline;\"'>platt.kramer@mailkept.com</a>.<br/></div></td><td style='\"display:block;width:20px;\"' width='\"20\"'>   </td><td style='\"width:15px;\"' width='\"15px\"'></td></tr><tr style='\"\"'><td colspan='\"3\"' height='\"32\"' style='\"line-height:32px;\"'> </td></tr></table></td></tr><tr style='\"\"'><td colspan='\"3\"' height='\"20\"' style='\"line-height:20px;\"'> </td></tr></table><span style='\"\"'><img src='\"https://www.facebook.com/email_open_log_pic.php?mid=5c6bfe8e8bc66G24bc2cdafa4000G5c6c0327ebf39G37f\"' style='\"border:0;width:1px;height:1px;\"'/></span></td></tr>

My code:

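import requests
from bs4 import BeautifulSoup

# url and headers are defined elsewhere (not shown in the question)
response = requests.request("GET", url, headers=headers)
soup = BeautifulSoup(response.text, "html.parser")
print(soup.prettify())
t = soup.find("tr")  # returns only the first <tr>, not the cell holding the code
print(t)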

It would be great if someone could help me. I've been struggling with this for hours. Thanks a lot.

daupos2t #1

The code appears in quite a few places, but it always comes right after the "Someone tried to..." text, so you could try something like this:

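from bs4 import BeautifulSoup

# html holds the email markup, e.g. response.text from the question
soup = BeautifulSoup(html, "html.parser")
stripped_strings = soup.stripped_strings  # generator over whitespace-stripped text nodes

for text in stripped_strings:
    if "Someone tried to sign up for an Instagram account" in text:
        # the next string produced by the same generator is the code
        code = next(stripped_strings, None)
        print(code)
        break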

Take a look at the .stripped_strings generator.
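To see what the generator actually yields, here is a minimal, self-contained sketch of the same idea wrapped in a function. SAMPLE_HTML is a hypothetical, trimmed-down stand-in for the email in the question (the address in it is made up), and extract_code and MARKER are names chosen here for illustration only; they are not part of Beautiful Soup.

```python
from bs4 import BeautifulSoup

# Hypothetical, trimmed-down stand-in for the email HTML in the question.
SAMPLE_HTML = """
<table>
  <tr><td><td>
    <p>Hi,</p>
    <p>Someone tried to sign up for an Instagram account with user@example.com. If it was you, enter this confirmation code in the app:</p>
  </td></td></tr>
  <tr><td><td style="font-size:32px;">
    035247
  </td></td></tr>
</table>
"""

# Text that, per the answer above, always precedes the code.
MARKER = "Someone tried to sign up for an Instagram account"


def extract_code(html, marker=MARKER):
    """Return the first non-empty string after the marker text, or None."""
    soup = BeautifulSoup(html, "html.parser")
    strings = soup.stripped_strings  # lazily yields whitespace-stripped text nodes
    for text in strings:
        if marker in text:
            # Advance the same generator once more; None if nothing follows.
            return next(strings, None)
    return None


# Show every string the generator produces, then the extracted code.
print(list(BeautifulSoup(SAMPLE_HTML, "html.parser").stripped_strings))
print(extract_code(SAMPLE_HTML))
```

Returning None instead of letting next() raise StopIteration is a design choice made for this sketch, so a message without a code can be handled by the caller. The real response in the question may contain the marker more than once; this version stops at the first match.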
