2023.4.28.0

Plugins
IPluginContentProvider: added DownloadSingleObject function; added tokens to GetMedia and Download functions; removed GetSpecialData function
Add IDownloadableMedia interface
Removed 'Channel' option from all functions and enums
ISiteSettings: added GetSingleMediaInstance function
ExchangeOptions: removed 'IsChannel'
UserMediaTypes: added Audio and AudioPre enums
IUserMedia, PluginUserMedia: changed ContentType and DownloadState from integers to their enums

SCrawler
Add YouTube standalone downloader
Add gallery-dl & yt-dlp support
Remove 'UserInfo' requirement from 'ProfilesSaved'
Update 'SiteSettingsBase' to use domains and Netscape cookies
UserDataBase: remove channels; remove old 'Merge' const; standardize SavedPosts file naming; move 'ValidateMD5' function from Twitter to UserDataBase to use it in other UserData classes; add 'DownloadSingleObject' environment for single posts; add validating file extension for m3u8 during download; add reindex of video file during download

Rewritten DomainsContainer
Create a universal settings form and PSettingsArttribute
Gfycat, Imgur: turn these classes into IUserData to download a single object

All plugins: update 'GetInstance' function for saved posts; update domains where implemented; remove 'OptionForm' where it exists; update options where they exist; update unix date providers; reconfigure channels where they exist

LPSG: fix attachments; update converters and regex
Add sites: ThisVid, Mastodon, Pinterest, YouTube, YouTube music
Reddit: standardize container parsing for all data types; new channel environment; fix 'ReparseMissing' function; redirect data downloading to the base download function, saved crossposts support
Twitter: fixed gif path bug; fixed downloading saved posts
PornHub: hide unnecessary errors; photo galleries bug
RedGifs: add 'UserAgent' option

Added icons to download progress

Rename some objects
Completely redesigned standalone downloader form and rewritten its environment
WebClient2: update to use tokens

Labels: update label form (save labels to file only when OK button is clicked); change removing labels.txt from recycle bin to permanent; disable storing label 'NoParsedUser'

UserCreatorForm: remove the 'Channel' checkbox and related functions; ability to extract the user's URL from the buffer and apply parameters if found
Remove temporary 'EncryptCookies' module

MainFrame: added simplified way to create new users (Ctrl+Insert to create a new user with default parameters from clipboard URL); removed SCrawler command line argument "-v" (remove the ability to run SCrawler as video downloader)
PropertyValueHost: update for option forms compatibility
SettingsHost: removed 'GetSpecialData' fork; added 'GetSingleMediaInstance' fork
UserDataHost: update functions with tokens; update events; add 'DownloadSingleObject' function
Settings: add the ability to get environment from 4 destinations; add the ability to set the program environment manually; add CMDEncoding; add cache; remove the old function 'RemoveUnusedPlugins'; add 'STDownloader' properties; add YT compatibility; add new notification options; add deleting user settings file when 'SettingsCLS.Dispose()' if where are no users in SCrawler
UserFinder: remove old 'Merge' const; remove channel option
UserInfo: remove channel option
This commit is contained in:
Andy
2023-04-28 10:13:46 +03:00
parent db9e2cfb88
commit b2a9b22478
270 changed files with 18120 additions and 3332 deletions

View File

@@ -7,39 +7,32 @@
' This program is distributed in the hope that it will be useful,
' but WITHOUT ANY WARRANTY
Imports System.Net
Imports System.Drawing
Imports System.Threading
Imports SCrawler.API.Base
Imports SCrawler.API.YouTube.Objects
Imports PersonalUtilities.Functions.XML
Imports PersonalUtilities.Functions.RegularExpressions
Imports PersonalUtilities.Tools.Web.Clients
Imports PersonalUtilities.Tools.Web.Documents.JSON
Imports PersonalUtilities.Tools.ImageRenderer
Imports UStates = SCrawler.API.Base.UserMedia.States
Imports UTypes = SCrawler.API.Base.UserMedia.Types
Namespace API.Twitter
Friend Class UserData : Inherits UserDataBase
Private Const SinglePostUrl As String = "https://api.twitter.com/1.1/statuses/show.json?id={0}&tweet_mode=extended"
Protected SinglePostUrl As String = "https://api.twitter.com/1.1/statuses/show.json?id={0}&tweet_mode=extended"
#Region "XML names"
Private Const Name_GifsDownload As String = "GifsDownload"
Private Const Name_GifsSpecialFolder As String = "GifsSpecialFolder"
Private Const Name_GifsPrefix As String = "GifsPrefix"
Private Const Name_UseMD5Comparison As String = "UseMD5Comparison"
Private Const Name_RemoveExistingDuplicates As String = "RemoveExistingDuplicates"
Private Const Name_StartMD5Checked As String = "StartMD5Checked"
#End Region
#Region "Declarations"
Friend Property GifsDownload As Boolean
Friend Property GifsSpecialFolder As String
Friend Property GifsPrefix As String
Friend Property GifsDownload As Boolean = True
Friend Property GifsSpecialFolder As String = String.Empty
Friend Property GifsPrefix As String = String.Empty
Private ReadOnly _DataNames As List(Of String)
Friend Property UseMD5Comparison As Boolean = False
Private StartMD5Checked As Boolean = False
Friend Property RemoveExistingDuplicates As Boolean = False
#End Region
#Region "Exchange options"
Friend Overrides Function ExchangeOptionsGet() As Object
Return New EditorExchangeOptions(Me)
Return New EditorExchangeOptions(Me) With {.SiteKey = HOST.Key}
End Function
Friend Overrides Sub ExchangeOptionsSet(ByVal Obj As Object)
If Not Obj Is Nothing AndAlso TypeOf Obj Is EditorExchangeOptions Then
@@ -83,45 +76,35 @@ Namespace API.Twitter
Protected Overrides Sub DownloadDataF(ByVal Token As CancellationToken)
If IsSavedPosts Then
If _ContentList.Count > 0 Then _DataNames.ListAddList(_ContentList.Select(Function(c) c.Post.ID), LAP.ClearBeforeAdd, LAP.NotContainsOnly)
DownloadData(String.Empty, Token)
DownloadData_SavedPosts(Token)
Else
If _ContentList.Count > 0 Then _DataNames.ListAddList(_ContentList.Select(Function(c) c.File.File), LAP.ClearBeforeAdd, LAP.NotContainsOnly)
DownloadData(String.Empty, Token)
If UseMD5Comparison Then ValidateMD5(Token)
End If
End Sub
Private Overloads Sub DownloadData(ByVal POST As String, ByVal Token As CancellationToken)
Dim URL$ = String.Empty
Try
Dim NextCursor$ = String.Empty
Dim __NextCursor As Predicate(Of EContainer) = Function(e) e.Value({"content", "operation", "cursor"}, "cursorType") = "Bottom"
Dim PostID$ = String.Empty
Dim PostDate$
Dim nn As EContainer, s As EContainer
Dim nn As EContainer
Dim NewPostDetected As Boolean = False
Dim ExistsDetected As Boolean = False
Dim UID As Func(Of EContainer, String) = Function(e) e.XmlIfNothing.Item({"user", "id"}).XmlIfNothingValue
If IsSavedPosts Then
If Name.IsEmptyString Then Throw New ArgumentNullException With {.HelpLink = 1}
URL = $"https://api.twitter.com/2/timeline/bookmark.json?screen_name={Name}&count=200" &
"&tweet_mode=extended&include_entities=true&include_user_entities=true&include_ext_media_availability=true"
If Not POST.IsEmptyString Then URL &= $"&cursor={SymbolsConverter.ASCII.EncodeSymbolsOnly(POST)}"
Else
URL = $"https://api.twitter.com/1.1/statuses/user_timeline.json?screen_name={Name}&count=200&exclude_replies=false&include_rts=1&tweet_mode=extended"
If Not POST.IsEmptyString Then URL &= $"&max_id={POST}"
End If
URL = $"https://api.twitter.com/1.1/statuses/user_timeline.json?screen_name={Name}&count=200&exclude_replies=false&include_rts=1&tweet_mode=extended"
If Not POST.IsEmptyString Then URL &= $"&max_id={POST}"
ThrowAny(Token)
Dim r$ = Responser.GetResponse(URL,, EDP.ThrowException)
Dim r$ = Responser.GetResponse(URL)
If Not r.IsEmptyString Then
Using w As EContainer = JsonDocument.Parse(r)
If w.ListExists Then
If Not IsSavedPosts And POST.IsEmptyString And Not w.ItemF({0, "user"}) Is Nothing Then
If POST.IsEmptyString And Not w.ItemF({0, "user"}) Is Nothing Then
With w.ItemF({0, "user"})
If .Value("screen_name").StringToLower = Name Then
If .Value("screen_name").StringToLower = Name.ToLower Then
UserSiteNameUpdate(.Value("name"))
UserDescriptionUpdate(.Value("description"))
Dim __getImage As Action(Of String) = Sub(ByVal img As String)
@@ -145,15 +128,10 @@ Namespace API.Twitter
For Each nn In If(IsSavedPosts, w({"globalObjects", "tweets"}).XmlIfNothing, w)
ThrowAny(Token)
If nn.Count > 0 Then
If IsSavedPosts Then
PostID = nn.Value
If PostID.IsEmptyString Then PostID = nn.Value("id_str")
Else
PostID = nn.Value("id")
If ID.IsEmptyString Then
ID = UID(nn)
If Not ID.IsEmptyString Then UpdateUserInformation()
End If
PostID = nn.Value("id")
If ID.IsEmptyString Then
ID = UID(nn)
If Not ID.IsEmptyString Then UpdateUserInformation()
End If
'Date Pattern:
@@ -172,32 +150,58 @@ Namespace API.Twitter
Continue For
End If
If IsSavedPosts OrElse Not ParseUserMediaOnly OrElse
(
Not nn.Contains("retweeted_status") OrElse
(Not ID.IsEmptyString AndAlso UID(nn("retweeted_status")) = ID)
) Then ObtainMedia(nn, PostID, PostDate)
If Not ParseUserMediaOnly OrElse
(Not nn.Contains("retweeted_status") OrElse (Not ID.IsEmptyString AndAlso UID(nn("retweeted_status")) = ID)) Then _
ObtainMedia(nn, PostID, PostDate)
End If
Next
If IsSavedPosts Then
s = w.ItemF({"timeline", "instructions", 0, "addEntries", "entries"}).XmlIfNothing
If s.Count > 0 Then NextCursor = If(s.ItemF({__NextCursor})?.Value({"content", "operation", "cursor"}, "value"), String.Empty)
End If
End If
End Using
If IsSavedPosts Then
If Not NextCursor.IsEmptyString And Not NextCursor = POST Then DownloadData(NextCursor, Token)
Else
If POST.IsEmptyString And ExistsDetected Then Exit Sub
If Not PostID.IsEmptyString And NewPostDetected Then DownloadData(PostID, Token)
If POST.IsEmptyString And ExistsDetected Then Exit Sub
If Not PostID.IsEmptyString And NewPostDetected Then DownloadData(PostID, Token)
End If
Catch ex As Exception
ProcessException(ex, Token, $"data downloading error [{URL}]")
End Try
End Sub
Private Sub DownloadData_SavedPosts(ByVal Token As CancellationToken)
Try
Dim urls As List(Of String) = GetBookmarksUrlsFromGalleryDL()
If urls.ListExists Then
Dim postIds As New List(Of String)
Dim r$
Dim j As EContainer, jj As EContainer
Dim jErr As New ErrorsDescriber(EDP.ReturnValue)
Dim rPattern As RParams = RParams.DM("(?<=tweet-)(\d+)\Z", 0, EDP.ReturnValue)
For Each url$ In urls
r = Responser.GetResponse(url)
If Not r.IsEmptyString Then
j = JsonDocument.Parse(r, jErr)
If Not j Is Nothing Then
jj = j.ItemF({"data", "bookmark_timeline_v2", "timeline", "instructions", 0, "entries"})
If If(jj?.Count, 0) > 0 Then postIds.ListAddList(jj.Select(Function(jj2) CStr(RegexReplace(jj2.Value("entryId"), rPattern))), LNC)
j.Dispose()
End If
End If
Next
If postIds.Count > 0 Then postIds.RemoveAll(Function(pid) pid.IsEmptyString OrElse (_TempPostsList.Contains(pid) Or _DataNames.Contains(pid)))
If postIds.Count > 0 Then
For Each __id$ In postIds
_TempPostsList.Add(__id)
r = Responser.GetResponse(String.Format(SinglePostUrl, __id),, EDP.ReturnValue)
If Not r.IsEmptyString Then
j = JsonDocument.Parse(r, jErr)
If Not j Is Nothing Then
If j.Count > 0 Then ObtainMedia(j, __id, j.Value("created_at"))
j.Dispose()
End If
End If
Next
End If
End If
Catch ane As ArgumentNullException When ane.HelpLink = 1
MyMainLOG = "Username not set for saved Twitter posts"
Catch ex As Exception
ProcessException(ex, Token, $"data downloading error{IIf(IsSavedPosts, " (Saved Posts)", String.Empty)} [{URL}]")
ProcessException(ex, Token, "data downloading error (Saved Posts)")
End Try
End Sub
#End Region
@@ -252,18 +256,24 @@ Namespace API.Twitter
If .ListExists Then
For Each n As EContainer In .Self
If n.Value("type") = "animated_gif" Then
With n({"video_info", "variants"}).XmlIfNothing.ItemF({gifUrl}).XmlIfNothing
url = .Value("url")
ff = UrlFile(url)
If Not ff.IsEmptyString Then
If GifsDownload And Not _DataNames.Contains(ff) Then
m = MediaFromData(url, PostID, PostDate,, State, UTypes.Video)
f = m.File
If Not f.IsEmptyString And Not GifsPrefix.IsEmptyString Then f.Name = $"{GifsPrefix}{f.Name}" : m.File = f
If Not GifsSpecialFolder.IsEmptyString Then m.SpecialFolder = $"{GifsSpecialFolder}*"
_TempMediaList.ListAddValue(m, LNC)
End If
Return True
With n({"video_info", "variants"})
If .ListExists Then
With .ItemF({gifUrl})
If .ListExists Then
url = .Value("url")
ff = UrlFile(url)
If Not ff.IsEmptyString Then
If GifsDownload And Not _DataNames.Contains(ff) Then
m = MediaFromData(url, PostID, PostDate,, State, UTypes.Video)
f = m.File
If Not f.IsEmptyString And Not GifsPrefix.IsEmptyString Then f.Name = $"{GifsPrefix}{f.Name}" : m.File = f
If Not GifsSpecialFolder.IsEmptyString Then m.SpecialFolder = $"{GifsSpecialFolder}*"
_TempMediaList.ListAddValue(m, LNC)
End If
Return True
End If
End If
End With
End If
End With
End If
@@ -276,7 +286,7 @@ Namespace API.Twitter
Return False
End Try
End Function
Private Shared Function GetVideoNodeURL(ByVal w As EContainer) As String
Private Function GetVideoNodeURL(ByVal w As EContainer) As String
Dim v As EContainer = w.GetNode(VideoNode)
If v.ListExists Then
Dim l As New List(Of Sizes)
@@ -298,6 +308,18 @@ Namespace API.Twitter
Return String.Empty
End Function
#End Region
#Region "Gallery-DL Support"
Private Function GetBookmarksUrlsFromGalleryDL() As List(Of String)
Dim command$ = $"gallery-dl --verbose --simulate --cookies ""{DirectCast(HOST.Source, SiteSettings).CookiesNetscapeFile}"" https://twitter.com/i/bookmarks"
Try
Using batch As New GDL.GDLBatch With {.TempPostsList = _TempPostsList} : Return GDL.GetUrlsFromGalleryDl(batch, command) : End Using
Catch ex As Exception
HasError = True
LogError(ex, $"GetJson({command})")
Return Nothing
End Try
End Function
#End Region
#Region "ReparseMissing"
Protected Overrides Sub ReparseMissing(ByVal Token As CancellationToken)
Dim rList As New List(Of Integer)
@@ -337,156 +359,19 @@ Namespace API.Twitter
End Try
End Sub
#End Region
#Region "MD5 support"
Private Const VALIDATE_MD5_ERROR As String = "VALIDATE_MD5_ERROR"
Private Sub ValidateMD5(ByVal Token As CancellationToken)
Try
Dim missingMD5 As Predicate(Of UserMedia) = Function(d) (d.Type = UTypes.GIF Or d.Type = UTypes.Picture) And d.MD5.IsEmptyString
If UseMD5Comparison And _TempMediaList.Exists(missingMD5) Then
Dim i%
Dim data As UserMedia = Nothing
Dim hashList As New Dictionary(Of String, SFile)
Dim f As SFile
Dim ErrMD5 As New ErrorsDescriber(EDP.ReturnValue)
Dim __getMD5 As Func(Of UserMedia, Boolean, String) =
Function(ByVal __data As UserMedia, ByVal IsUrl As Boolean) As String
Try
Dim ImgFormat As Imaging.ImageFormat = Nothing
Dim hash$ = String.Empty
Dim __isGif As Boolean = False
If __data.Type = UTypes.GIF Then
ImgFormat = Imaging.ImageFormat.Gif
__isGif = True
ElseIf Not __data.File.IsEmptyString Then
ImgFormat = GetImageFormat(__data.File)
End If
If ImgFormat Is Nothing Then ImgFormat = Imaging.ImageFormat.Jpeg
If IsUrl Then
hash = ByteArrayToString(GetMD5(SFile.GetBytesFromNet(__data.URL_BASE.IfNullOrEmpty(__data.URL), ErrMD5), ImgFormat, ErrMD5))
Else
hash = ByteArrayToString(GetMD5(SFile.GetBytes(__data.File, ErrMD5), ImgFormat, ErrMD5))
End If
If hash.IsEmptyString And Not __isGif Then
If ImgFormat Is Imaging.ImageFormat.Jpeg Then ImgFormat = Imaging.ImageFormat.Png Else ImgFormat = Imaging.ImageFormat.Jpeg
If IsUrl Then
hash = ByteArrayToString(GetMD5(SFile.GetBytesFromNet(__data.URL_BASE.IfNullOrEmpty(__data.URL), ErrMD5), ImgFormat, ErrMD5))
Else
hash = ByteArrayToString(GetMD5(SFile.GetBytes(__data.File, ErrMD5), ImgFormat, ErrMD5))
End If
End If
Return hash
Catch
Return String.Empty
End Try
End Function
If Not StartMD5Checked Then
StartMD5Checked = True
If _ContentList.Exists(missingMD5) Then
Dim existingFiles As List(Of SFile) = SFile.GetFiles(MyFileSettings.CutPath, "*.jpg|*.jpeg|*.png|*.gif",, EDP.ReturnValue).ListIfNothing
Dim eIndx%
Dim eFinder As Predicate(Of SFile) = Function(ff) ff.File = data.File.File
If RemoveExistingDuplicates Then
RemoveExistingDuplicates = False
_ForceSaveUserInfo = True
If existingFiles.Count > 0 Then
Dim h$
For i = existingFiles.Count - 1 To 0 Step -1
h = __getMD5(New UserMedia With {.File = existingFiles(i)}, False)
If Not h.IsEmptyString Then
If hashList.ContainsKey(h) Then
MyMainLOG = $"{ToStringForLog()}: Removed image [{existingFiles(i).File}] (duplicate of [{hashList(h).File}])"
existingFiles(i).Delete(SFO.File, SFODelete.DeleteToRecycleBin, ErrMD5)
existingFiles.RemoveAt(i)
Else
hashList.Add(h, existingFiles(i))
End If
End If
Next
End If
End If
For i = 0 To _ContentList.Count - 1
data = _ContentList(i)
If (data.Type = UTypes.GIF Or data.Type = UTypes.Picture) Then
If data.MD5.IsEmptyString Then
ThrowAny(Token)
eIndx = existingFiles.FindIndex(eFinder)
If eIndx >= 0 Then
data.MD5 = __getMD5(New UserMedia With {.File = existingFiles(eIndx)}, False)
If Not data.MD5.IsEmptyString Then _ContentList(i) = data : _ForceSaveUserData = True
End If
End If
existingFiles.RemoveAll(eFinder)
End If
Next
If existingFiles.Count > 0 Then
For i = 0 To existingFiles.Count - 1
f = existingFiles(i)
data = New UserMedia(f.File) With {
.State = UStates.Downloaded,
.Type = IIf(f.Extension = "gif", UTypes.GIF, UTypes.Picture),
.File = f
}
ThrowAny(Token)
data.MD5 = __getMD5(data, False)
If Not data.MD5.IsEmptyString Then _ContentList.Add(data) : _ForceSaveUserData = True
Next
existingFiles.Clear()
End If
End If
End If
If _ContentList.Count > 0 Then
With _ContentList.Select(Function(d) d.MD5)
If .ListExists Then .ToList.ForEach(Sub(md5value) _
If Not md5value.IsEmptyString AndAlso Not hashList.ContainsKey(md5value) Then hashList.Add(md5value, New SFile))
End With
End If
For i = _TempMediaList.Count - 1 To 0 Step -1
data = _TempMediaList(i)
If missingMD5(data) Then
ThrowAny(Token)
data.MD5 = __getMD5(data, True)
If Not data.MD5.IsEmptyString Then
If hashList.ContainsKey(data.MD5) Then
_TempMediaList.RemoveAt(i)
Else
hashList.Add(data.MD5, New SFile)
_TempMediaList(i) = data
End If
End If
End If
Next
#Region "DownloadSingleObject"
Protected Overrides Sub DownloadSingleObject_GetPosts(ByVal Data As IYouTubeMediaContainer, ByVal Token As CancellationToken)
Dim PostID$ = RegexReplace(Data.URL, RParams.DM("(?<=/)\d+", 0))
If Not PostID.IsEmptyString Then
Dim r$ = Responser.GetResponse(String.Format(SinglePostUrl, PostID),, EDP.ReturnValue)
If Not r.IsEmptyString Then
Using j As EContainer = JsonDocument.Parse(r)
If j.ListExists Then ObtainMedia(j, j.Value("id"), j.Value("created_at"))
End Using
End If
Catch ex As Exception
ProcessException(ex, Token, "ValidateMD5",, VALIDATE_MD5_ERROR)
End Try
End If
End Sub
#End Region
#Region "Get video static"
Friend Shared Function GetVideoInfo(ByVal URL As String, ByVal resp As Responser) As IEnumerable(Of UserMedia)
Try
If URL.Contains("twitter") Then
Dim PostID$ = RegexReplace(URL, RParams.DM("(?<=/)\d+", 0))
If Not PostID.IsEmptyString Then
Dim r$
Using rc As Responser = resp.Copy() : r = rc.GetResponse(String.Format(SinglePostUrl, PostID),, EDP.ReturnValue) : End Using
If Not r.IsEmptyString Then
Using j As EContainer = JsonDocument.Parse(r)
If j.ListExists Then
Dim u$ = GetVideoNodeURL(j)
If Not u.IsEmptyString Then Return {MediaFromData(u, PostID, String.Empty,,, UTypes.Video)}
End If
End Using
End If
End If
End If
Return Nothing
Catch ex As Exception
Return ErrorsDescriber.Execute(EDP.ShowMainMsg + EDP.SendInLog, ex, $"Twitter standalone downloader: fetch media error ({URL})")
End Try
End Function
#End Region
#Region "Picture options"
Private Function GetPictureOption(ByVal w As EContainer) As String
Const P4K As String = "4096x4096"
@@ -541,10 +426,10 @@ Namespace API.Twitter
End Function
#End Region
#Region "Create media"
Private Shared Function MediaFromData(ByVal _URL As String, ByVal PostID As String, ByVal PostDate As String,
Optional ByVal _PictureOption As String = Nothing,
Optional ByVal State As UStates = UStates.Unknown,
Optional ByVal Type As UTypes = UTypes.Undefined) As UserMedia
Private Function MediaFromData(ByVal _URL As String, ByVal PostID As String, ByVal PostDate As String,
Optional ByVal _PictureOption As String = Nothing,
Optional ByVal State As UStates = UStates.Unknown,
Optional ByVal Type As UTypes = UTypes.Undefined) As UserMedia
_URL = LinkFormatterSecure(RegexReplace(_URL.Replace("\", String.Empty), LinkPattern))
Dim m As New UserMedia(_URL) With {.PictureOption = _PictureOption, .Post = New UserPost With {.ID = PostID}, .Type = Type}
If Not m.URL.IsEmptyString Then m.File = CStr(RegexReplace(m.URL, FilesPattern))