python - How can I find where files are used as links in html pages?


I have a static website where versions of old pages are still stored in the root. I want to find these pages by checking whether each one is used in a link somewhere in the root's files.
I made a list of the files inside the root using PowerShell's command ls -r -name and stored it in a file, 'filelist.txt', so I have something like:

    directory1
    directory2
    5s.htm
    5s.html
    5s_introduction.htm
    ...
    images\icons
    images\icons\linkedin.png
    images\icons\project-slider-arrow-left.png
    images\icons\project-slider-arrow-right.png

I want to know which of these files are used, so I thought I would use a simple script in Python (as I don't know Windows PowerShell) that takes each line of the list and looks for occurrences in each HTML page inside the root.
To extract the file name I tried this regex in Notepad++:

[^\\^\n]+\.[a-z]{0,4} 

and it seemed to work... (the ^\n is there to exclude the lines that represent directories)
As a second step, I tried to adapt some Python lines I found on Stack Overflow:

    import re
    with open('filelist.txt') as f:
        for l in f:
            m = re.match('([^\\^\n]+\.[a-z]{0,4})', l)
            if m:
                print(m.group(1))

But it returns strings that are wrong, full of spaces or single letters, as if the regex were wrong. I thought I would use the regex result as a variable and check it somehow against each HTML page in the root directory, but I'm stuck here.
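One difference between Notepad++ and the Python attempt is worth spelling out, though it may not be the whole story behind the odd output. In an ordinary Python string literal, '\\' collapses to a single backslash, which then escapes the following ^ inside the character class, so the backslash is no longer excluded and the whole path is matched. A minimal sketch using a raw string and re.search instead of re.match; the helper name extract_name is made up for illustration:

```python
import re

# Raw string: the regex engine receives the same pattern typed into Notepad++.
PATTERN = re.compile(r'[^\\^\n]+\.[a-z]{0,4}')

def extract_name(line):
    """Return the file-name part of a listing line, or None if there is none."""
    # search() scans the whole line; match() would only try position 0,
    # which fails as soon as the line starts with a directory component.
    m = PATTERN.search(line.strip())
    return m.group(0) if m else None
```

With this, a line such as images\icons\linkedin.png gives linkedin.png, and a directory line such as images\icons gives None.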

Since you are sure that file names contain a '.', each path can be split on '\' and its last component checked for a '.'. Also, strip each line to remove the newline characters.

    with open('filelist.txt') as f:
        for l in f:
            l = l.strip()  # drop the trailing newline
            name = l.split('\\')[-1]  # last path component
            if '.' in name:
                print(name)

Output:

    5s.htm
    5s.html
    5s_introduction.htm
    linkedin.png
    project-slider-arrow-left.png
    project-slider-arrow-right.png
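The remaining step from the question, checking each name against the HTML pages, could be sketched roughly as follows. This is only one possible approach and makes some assumptions: the function name find_references is made up, pages are taken to be the files ending in .htm or .html, and a plain substring test stands in for real link parsing.

```python
import os

def find_references(root, names):
    """Map each file name to the HTML pages under root whose text mentions it."""
    hits = {name: [] for name in names}
    for dirpath, _dirs, files in os.walk(root):
        for fname in files:
            # Assumption: only .htm/.html files count as pages to scan.
            if not fname.lower().endswith(('.htm', '.html')):
                continue
            path = os.path.join(dirpath, fname)
            # errors='ignore' keeps old pages with odd encodings from aborting the scan.
            with open(path, encoding='utf-8', errors='ignore') as page:
                text = page.read()
            for name in names:
                if name in text:  # plain substring test, not a parsed href
                    hits[name].append(path)
    return hits
```

Names whose list stays empty are candidates for unused files, for example [n for n, pages in find_references('.', names).items() if not pages]. Because the test is a substring match, 5s.htm is also reported wherever 5s.html appears, so the results need a manual check before deleting anything.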
