Dowemo

Introduction

In a small task, I needed to extract the latest version information from multiple data segments. I used regular expressions to extract the data and ran into a few pitfalls along the way.

1. Lazy matching and zero-width negative lookbehind assertions.

Look at the code.

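import re

listVersion = []
fileText = 'PRS-7000风电场-SZ142372_低压-20170702新版本PRS-7000风电场-SZ142372_低压-150102'

# Lazy .+? stops at the first run of 6 to 8 digits; (?<!SZ) rejects a run that starts right after "SZ".
matchV = re.findall(r"((PRS-700U|PRS-7000).+?(?<!SZ)(\d{8}|\d{7}|\d{6}))", fileText, re.MULTILINE)
print(matchV)

for matchVersion in matchV:
    # Index 0 is the outermost group, i.e. the whole match.
    listVersion.append(matchVersion[0])

print(listVersion)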


Output:

[('PRS-7000风电场-SZ142372_低压-20170702', 'PRS-7000', '20170702'), ('PRS-7000风电场-SZ142372_低压-150102', 'PRS-7000', '150102')]


['PRS-7000风电场-SZ142372_低压-20170702', 'PRS-7000风电场-SZ142372_低压-150102']
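The task mentioned in the introduction is to find the latest version. Given the match tuples printed above, one possible way to pick it is to parse the trailing date group with datetime.strptime, which also rejects impossible dates. This is only a sketch: parse_date is a hypothetical helper, and it assumes that 8 digits mean YYYYMMDD and 6 digits mean YYMMDD. The 7-digit layout is not explained in this post, so the sketch skips it.

```python
from datetime import datetime

# Match tuples as printed above: (whole match, model, date digits).
matches = [
    ('PRS-7000风电场-SZ142372_低压-20170702', 'PRS-7000', '20170702'),
    ('PRS-7000风电场-SZ142372_低压-150102', 'PRS-7000', '150102'),
]

def parse_date(digits):
    """Hypothetical helper: parse YYYYMMDD or YYMMDD, return None otherwise."""
    formats = {8: '%Y%m%d', 6: '%y%m%d'}  # assumed layouts
    fmt = formats.get(len(digits))
    if fmt is None:
        return None  # e.g. 7 digits: layout unknown, so skip it
    try:
        return datetime.strptime(digits, fmt).date()
    except ValueError:
        return None  # impossible date such as 20170230

# Pair each parsed date with the whole match and drop unparseable entries.
dated = [(parse_date(m[2]), m[0]) for m in matches]
dated = [(d, name) for d, name in dated if d is not None]

# Tuples compare by date first, so max() yields the most recent version.
latest_date, latest_name = max(dated)
print(latest_date, latest_name)
```

With the two sample matches, the entry dated 20170702 is selected, because the 6-digit 150102 is read as 2015-01-02.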


  • In the pattern ((PRS-700U|PRS-7000).+?(?<!SZ)(\d{8}|\d{7}|\d{6})), the .+? in the middle is a lazy match: it consumes as few characters as possible, which keeps the two version strings from being merged into one match.
  • (?<!SZ) is a zero-width negative lookbehind. It asserts that the current position is not immediately preceded by SZ, so the digits that come right after SZ cannot be matched as the date.
    Of course, the date part of the pattern is not strict: it does not check value ranges, leap years, and so on.
  • As the output shows, re.findall returns all matches in a list. Because the pattern contains ( ) groups, each item is a tuple with one entry per group. The first entry is the outermost ( ), i.e. the whole match, which is why the loop takes matchVersion[0].
  • re.MULTILINE enables multi-line mode, in which ^ and $ also match at the start and end of each line. This pattern uses neither anchor, so the flag has no effect here.
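To see what the lazy quantifier and the lookbehind each contribute, the points above can be checked by running three variants of the pattern on the same text. This is a small sketch, not part of the original task: the variable names are made up for illustration, and the groups are written as non-capturing (?:...) so that re.findall returns plain strings instead of tuples.

```python
import re

text = ('PRS-7000风电场-SZ142372_低压-20170702新版本'
        'PRS-7000风电场-SZ142372_低压-150102')

head = r"(?:PRS-700U|PRS-7000)"   # non-capturing, so findall returns strings
date = r"(?:\d{8}|\d{7}|\d{6})"

variants = {
    "lazy, with lookbehind": head + r".+?(?<!SZ)" + date,
    "greedy, with lookbehind": head + r".+(?<!SZ)" + date,
    "lazy, no lookbehind": head + r".+?" + date,
}

# Run every variant on the same text and show how many matches it finds.
results = {label: re.findall(pattern, text) for label, pattern in variants.items()}
for label, found in results.items():
    print(label, len(found), found)
```

The first variant finds the two version strings separately. The greedy variant returns a single match that swallows the whole text, and the variant without the lookbehind stops too early, at the six digits following SZ.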

2. File read and write.

Read files:

with open(DATA_FILE) as input_file:
    for line in input_file:
        # Each line holds six semicolon-separated fields.
        Index, fileName, fileType, creatTime, fileRow, fileText = line.split(';')
        print(Index+','+fileName+','+fileText)


. . .
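Two details are easy to trip over when splitting lines like this: each line still carries its trailing newline, and the unpacking fails if the last field itself contains a semicolon. The sketch below shows one way to guard against both. The sample data is invented and only stands in for DATA_FILE, whose real contents are not shown in this post.

```python
import io

# Hypothetical sample standing in for DATA_FILE; the real layout may differ.
sample = io.StringIO(
    "1;a.txt;txt;2017-07-02;3;PRS-7000-20170702\n"
    "2;b.txt;txt;2015-01-02;5;note; with; semicolons\n"
)

records = []
for line in sample:
    line = line.rstrip('\n')  # drop the newline that would end up in the last field
    if not line:
        continue              # skip empty lines
    # maxsplit=5 keeps any further ';' inside the sixth field
    index, file_name, file_type, creat_time, file_row, file_text = line.split(';', 5)
    records.append((index, file_name, file_text))

for index, file_name, file_text in records:
    print(index + ',' + file_name + ',' + file_text)
```

Reading from a real file works the same way: replace the io.StringIO object with the file object returned by open.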


Write files:

with open(STORED_FILE, 'w') as output_file:
    str_list = [line + '\n' for line in outputText]  # append a newline to each item in the list
    output_file.writelines(str_list)
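Since the extracted text contains Chinese characters, it may be safer to pass an explicit encoding when writing and reading, because the default depends on the platform. The following round trip is a sketch under that assumption: the temporary file stands in for STORED_FILE, and the names output_text and read_back are made up for illustration.

```python
import os
import tempfile

output_text = ['PRS-7000风电场-SZ142372_低压-20170702',
               'PRS-7000风电场-SZ142372_低压-150102']

with tempfile.TemporaryDirectory() as tmp_dir:
    # Temporary path standing in for STORED_FILE
    path = os.path.join(tmp_dir, 'versions.txt')

    # writelines does not add newlines itself, so append one per item
    with open(path, 'w', encoding='utf-8') as output_file:
        output_file.writelines(line + '\n' for line in output_text)

    # Read the file back and strip the newlines again
    with open(path, encoding='utf-8') as input_file:
        read_back = [line.rstrip('\n') for line in input_file]

print(read_back == output_text)
```

If the round trip preserves the text, the comparison on the last line prints True.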





