Electron Js Web Scraping




  1. Electron Js Web Scraping Api
  2. Electron Js Web Scraping Github
  3. Make A Browser Electron Js
  4. Electron Js Web Scraping Pdf

For the upcoming few web scraping tools, Axios will be used as the HTTP client.

Regular Expressions: The hard way

The simplest way to get started with web scraping without any dependencies is to run a set of regular expressions over the HTML string you receive by querying a webpage with an HTTP client. There is a big tradeoff, though: regular expressions don't understand HTML structure, so they are brittle and break easily when the markup changes.

Scraping With NightmareJs

Nightmare is a browser automation library that uses Electron under the hood. The idea is that you spin up an Electron instance, go to a webpage, and use Nightmare methods like type and click to interact with the page programmatically, scripting website interactions much as a user would. Electron itself is a runtime, much like Node.

Nightmare.js, a high-level browser automation library: https://github.com/segmentio
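To make the regular-expression approach (the hard way) concrete, here is a minimal sketch in TypeScript. The `extractLinks` function and the sample HTML are illustrative, not from any library; in practice the HTML string would come from an HTTP client, for example `(await axios.get(url)).data` with Axios. The last link in the sample shows the tradeoff: the regex happily matches a link inside an HTML comment, because it has no idea what the markup means.

```typescript
// A deliberately naive link extractor: run a regular expression over the raw
// HTML string returned by an HTTP client. No parser, no dependencies.
// It only understands single- or double-quoted href attributes on <a> tags,
// which is exactly the brittleness the regex approach trades for simplicity.
function extractLinks(html: string): string[] {
  // Group 1 captures the quote character so group 2 stops at the matching one.
  const pattern = /<a\s[^>]*?href\s*=\s*(["'])(.*?)\1/gi;
  const links: string[] = [];
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(html)) !== null) {
    links.push(match[2]);
  }
  return links;
}

const sample = `
  <p>See <a href="https://example.com/docs">the docs</a> and
  <a class="nav" href='/about'>about</a>.</p>
  <!-- <a href="/commented-out">still matched</a> -->
`;

console.log(extractLinks(sample));
// returns ['https://example.com/docs', '/about', '/commented-out']
```

A real HTML parser avoids the commented-out false match, which is the main reason scraping tools move past regular expressions quickly.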

Electron and NW.js are two technologies designed to let developers write traditional desktop software with web technologies. In the past, GUI development for desktop applications typically required C#, Java, or C++ libraries. Today, millions of web developers can move into this same area of development by using Electron or NW.js.

What are the differences?

In NW.js, the main entry point of an application can be an HTML web page. In that case, NW.js will open the given entry point in a browser window. In Electron, the entry point is always a script that gets fired, which then creates a browser window and loads HTML.

Both projects provide an API for writing desktop applications, but Electron provides a lower-level API that works directly with Node.js.

Which is more popular?

Electron is by far the more popular project. As of 2020, Electron had nearly 90,000 stars on GitHub, which hosts its open source code and is the number one website for open source programmers. NW.js had fewer than half as many stars.

Who is backing the projects?

GitHub and its owner Microsoft are behind Electron, while NW.js is sponsored by Intel. At the time of this article, Microsoft was worth over one and a half trillion dollars, while Intel was worth 256 billion dollars. In short, both technologies are backed by industry-leading tech companies.

What applications are using these technologies?

By far, Electron has the larger applications built on its technology. VS Code, the most popular text editor and IDE in the open source community, is built with Electron. Electron also powers the billion-dollar Slack application. Meanwhile, I could find no applications built with NW.js at the same level of production.

Browser Support

When it comes to legacy support for older operating systems, NW.js wins, supporting systems all the way back to Windows XP. If supporting ancient systems is your goal, NW.js may be the best option for you. NW.js also has other minor advantages, such as supporting PDF files out of the box using Chrome's native PDF plugin. Electron has also added this feature, but it's slightly more buggy at the moment. For auto-updating and crash reporting, Electron provides built-in support, while NW.js requires additional modifications.

Performance

NW.js beats Electron when it comes to memory and application size. In the figures below, it uses about 11% less memory and about 34% less disk space.

  • Memory: NW.js = 40 MB / Electron = 45 MB
  • Disk: NW.js = 78 MB / Electron = 118 MB

The good

Last week I felt pretty good. I got the Electron app up and running with the Electron Dead Link Checker v0.0.1. This week is a stark contrast, as I feel like very little good came out of the code this week.

GitHub

The coolest part is that I found, and am starting to use, parts of GitHub that I had never used before. It has a Trello-like project board that you can easily use to track tasks. I really like having it built right into the repo where I do all my work.

From an issue, you can assign it to a project. The board then watches that issue and updates its position on the board based on what happens to the issue. Let’s say you resolve an issue (which, I discovered, can be done from a commit just by including fixes #<issuenumber> in the commit message). That issue will automatically move from wherever it was to done.
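GitHub recognizes closing keywords (close, closes, closed, fix, fixes, fixed, resolve, resolves, resolved) followed by an issue reference, and closes the issue once the commit reaches the repository's default branch. The hypothetical `closedIssues` helper below approximates that rule for same-repository `#123` references, which can be handy for checking a commit message before pushing. It is a sketch, not GitHub's actual parser, and it ignores cross-repository references.

```typescript
// Hypothetical helper: list the issue numbers a commit message would close
// via GitHub's closing keywords. Only same-repository "#123" references are
// handled; this approximates GitHub's behavior and is not its implementation.
function closedIssues(commitMessage: string): number[] {
  // close/closes/closed, fix/fixes/fixed, resolve/resolves/resolved,
  // optionally followed by a colon, then whitespace and "#<number>".
  const pattern = /\b(?:close[sd]?|fix(?:e[sd])?|resolve[sd]?):?\s+#(\d+)/gi;
  const issues: number[] = [];
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(commitMessage)) !== null) {
    const issue = Number(match[1]);
    if (issues.indexOf(issue) === -1) issues.push(issue); // de-duplicate
  }
  return issues;
}
```

For example, `closedIssues("Handle timeouts\n\nfixes #12, closes: #7")` yields `[12, 7]`, while a plain mention such as `refs #9` is ignored.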

You can manage the automation for each column and decide which events move issues into it. Pretty cool.

Display progress

Electron Js Web Scraping Api


Displaying an elapsed time was a goal of mine with v0.0.2 and while I didn’t finish all I wanted for v0.0.2, I did finish this. An elapsed time serves as both a notice to the user that work is happening and a cool way to track how long it’s taking to complete the task.

Originally I was returning the elapsed time from the dead-link-checker module, but then I realized I could just start a timer within my Electron app and stop it when the request completed.

Pretty simple. I think it looked a bit better to increment it every 10 ms and then do some simple math in my HTML, dividing by 100: <h4 *ngIf='elapsedTime'>Elapsed time - {{elapsedTime / 100}} seconds.</h4>.
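A minimal sketch of that timer, assuming it lives in the component that owns `elapsedTime`. The `ElapsedTimer` class and the `timed` wrapper are hypothetical names, not the app's actual code; the wrapper stops the timer in a `finally` block so it halts whether the request succeeds or fails.

```typescript
// Hypothetical sketch of the elapsed-time display: a counter that ticks
// every 10 ms while a request is in flight, so the template can show
// elapsedTime / 100 as seconds.
class ElapsedTimer {
  elapsedTime = 0; // number of 10 ms ticks since start()
  private handle: ReturnType<typeof setInterval> | undefined;

  start(): void {
    this.stop(); // never leave two intervals running
    this.elapsedTime = 0;
    this.handle = setInterval(() => {
      this.elapsedTime++;
    }, 10);
  }

  stop(): void {
    if (this.handle !== undefined) {
      clearInterval(this.handle);
      this.handle = undefined;
    }
  }

  // Same conversion as the template: 100 ticks of 10 ms = 1 second.
  get seconds(): number {
    return this.elapsedTime / 100;
  }
}

// Start the timer, await the long-running request, and stop the timer
// whether the request resolves or rejects.
async function timed<T>(timer: ElapsedTimer, request: () => Promise<T>): Promise<T> {
  timer.start();
  try {
    return await request();
  } finally {
    timer.stop();
  }
}
```

One caveat with counting ticks: `setInterval` is not guaranteed to fire exactly every 10 ms, so under load the count can fall behind real time. Recording `Date.now()` at the start and displaying the difference would be more accurate.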

Don’t display 999 for timeouts

Electron Js Web Scraping Github

This is good and bad. Previously I was displaying 999 whenever a request failed with an error that had no status code. Generally the error would be something like RequestError {name: 'RequestError', message: 'Error: read ECONNRESET', cause: Error: read ECONNRESET, which didn’t have a status code. I had been treating these like timeouts and assigning them a 999 status code.

For this…I cheated. I just checked whether I had a 999 and replaced it with an actual timeout status code, 408 (Request Timeout). It’s kind of misleading. I’ll talk more about this kind of thing in the bad section.
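One hedged alternative to the 999/408 workaround is to keep network failures and HTTP statuses as separate kinds of result, so a connection reset is never displayed as a status code the server didn't send. The `LinkResult` type and helpers below are hypothetical. The error shape (`statusCode` on HTTP errors, `cause.code` such as `ECONNRESET` on request errors) is an assumption based on the RequestError above.

```typescript
// Hypothetical result type for a link check: either a real HTTP status or a
// network-level failure, instead of encoding failures as 999 or 408.
type LinkResult =
  | { url: string; kind: "http"; status: number }
  | { url: string; kind: "network"; code: string };

// Assumed error shape: HTTP errors carry statusCode; network errors carry
// the underlying Node error (e.g. ECONNRESET, ETIMEDOUT) in cause.code.
function fromError(url: string, error: { statusCode?: number; cause?: { code?: string } }): LinkResult {
  if (typeof error.statusCode === "number") {
    return { url, kind: "http", status: error.statusCode };
  }
  return { url, kind: "network", code: error.cause?.code ?? "UNKNOWN" };
}

// Display text that never invents a status code for network failures.
function describeResult(result: LinkResult): string {
  if (result.kind === "http") {
    return `${result.url}: HTTP ${result.status}`;
  }
  return `${result.url}: network error (${result.code})`;
}
```

Keeping the two kinds apart also makes it easy to re-check only the network failures later, since those are the ones most likely to be transient.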

Electron app is handling long tasks like a champ

I always worry with any kind of script that the longer it runs, the more likely something will break. I had already proven that the script alone could handle long tasks, with one run going for 30-some hours.

I’ve improved the script to skip in-page links like #comment, so it checks a lot fewer links, but it’s still hitting upwards of 10,000 links over 10-15 minutes. I’m really pleased with this.
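Skipping in-page links like #comment might look something like the hypothetical `linksToCheck` filter below; the name and the extra rules are assumptions, not the actual script. Beyond dropping fragment-only links, it resolves relative links against the page URL with the standard `URL` class, removes the fragment so `/about` and `/about#team` count as one page, skips non-HTTP schemes such as `mailto:`, and de-duplicates the result.

```typescript
// Hypothetical pre-filter in the spirit of skipping "#comment"-style links:
// drop in-page anchors, resolve relative links against the page URL, strip
// the fragment, and de-duplicate, so each distinct page is requested once.
function linksToCheck(pageUrl: string, hrefs: string[]): string[] {
  const unique: string[] = [];
  for (const href of hrefs) {
    const trimmed = href.trim();
    // "#comment", "#top", etc. point into the current page; nothing to fetch.
    if (trimmed === "" || trimmed.startsWith("#")) continue;
    let resolved: URL;
    try {
      resolved = new URL(trimmed, pageUrl);
    } catch {
      continue; // not a parseable URL
    }
    // Only HTTP(S) links can be checked with an HTTP request.
    if (resolved.protocol !== "http:" && resolved.protocol !== "https:") continue;
    resolved.hash = ""; // the fragment never changes the server response
    const key = resolved.toString();
    if (unique.indexOf(key) === -1) unique.push(key);
  }
  return unique;
}
```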

The bad

Cancel button

Most of my focus was on this. I’m trying to do it with web workers because this is an area that I would like to get better at. I feel like I spun my wheels here. I barely made any progress before I hit an error that I still haven’t figured out.

Make A Browser Electron Js

If I call deadLinkChecker directly from my app, no problem. If I call it from a web worker, it dies.

I went through every piece of deadLinkChecker, commenting out parts until I could isolate the problem. It seems there is more than one problem, but requestPromise is definitely one of them. It would be pretty tough to use this script without requestPromise.

This is something that I’m going to keep working on. There must be some way to do this kind of thing. Most people who have this problem run into it in an Angular context, but for me the script works great when it’s called directly from the Angular side. It only crashes like this in the web worker.
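A cancel button doesn't strictly require a web worker. As a hedged alternative (a different technique from the web-worker approach above), the checking loop can look at an `AbortSignal` between links and stop early. `checkAll`, `CheckedLink`, and the `checkLink` callback are hypothetical stand-ins for the dead link checker's own code.

```typescript
// Hypothetical sketch: cooperative cancellation for a long link-checking loop.
// The loop checks an AbortSignal between links, so a Cancel button only needs
// to call controller.abort(). `checkLink` is a stand-in for the real request.
interface CheckedLink {
  url: string;
  status: number;
}

async function checkAll(
  urls: string[],
  checkLink: (url: string) => Promise<number>,
  signal: AbortSignal,
): Promise<CheckedLink[]> {
  const results: CheckedLink[] = [];
  for (const url of urls) {
    // Stop before starting the next request; the one in flight still finishes.
    if (signal.aborted) break;
    results.push({ url, status: await checkLink(url) });
  }
  return results;
}
```

Wiring it up means creating an `AbortController`, passing `controller.signal` to `checkAll`, and calling `controller.abort()` from the Cancel button's click handler. A request already in flight still completes unless the signal is also handed to the HTTP client (`fetch`, for instance, accepts a `signal` option). If the work does end up in a web worker, `worker.terminate()` is the blunt alternative.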

Not very reliable yet

This should probably be the highest priority. Having a cancel button is important, but a user could always just close the app and reopen it. Right now, links that are actually fine are being reported as bad, such as my faked-out 408s. An occasional false positive is probably okay; I think a low threshold like 10% may be acceptable, and I’m way over that. As it stands, that ruins confidence in the product and kind of makes it useless.
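To track progress toward that threshold, one hedged option is to measure the false-positive rate directly. The sketch assumes the 10% applies to the links reported as bad (the text doesn't say what the 10% is of), and `actuallyFine` stands for links confirmed working by a manual or second check. All names here are hypothetical.

```typescript
// Hypothetical reliability metric: the share of links reported as bad that
// turn out to be fine when re-checked (for example, by hand in a browser).
// Assumes the 10% threshold in the text applies to this ratio.
const FALSE_POSITIVE_THRESHOLD = 0.1;

function falsePositiveRate(reportedBad: string[], actuallyFine: string[]): number {
  if (reportedBad.length === 0) return 0; // nothing reported, nothing wrong
  const falsePositives = reportedBad.filter((url) => actuallyFine.indexOf(url) !== -1);
  return falsePositives.length / reportedBad.length;
}

function isAcceptable(rate: number): boolean {
  return rate <= FALSE_POSITIVE_THRESHOLD;
}
```

For example, if 4 links are reported bad and 1 of them is fine, the rate is 0.25, well over the threshold.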

Electron Js Web Scraping Pdf

Scraping

I’m not sure why I sometimes get a 503 from Amazon or a 403 from other random websites, in addition to the weird timeouts. It’s possible I’m getting blocked, but I really think I can work around that in every situation except reCAPTCHAs, and even a reCAPTCHA should return a 200 status code.
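One hedged way to separate transient blocking from genuinely broken links is to re-check before reporting. The sketch retries 429 (Too Many Requests) and 503 (Service Unavailable) with exponential backoff; the retryable statuses, the attempt count, and the delays are all assumptions, and `checkLink` is a hypothetical stand-in for the real request. A 403 usually reflects a refusal by the server rather than a temporary condition, so it is not retried here.

```typescript
// Hypothetical sketch: re-check a link with exponential backoff before
// reporting it, in case a 429 or 503 is rate limiting or a temporary outage.
// The retryable statuses, attempt count, and delays are assumptions.
const RETRYABLE_STATUSES = [429, 503];

function sleep(ms: number): Promise<void> {
  return new Promise<void>((resolve) => setTimeout(() => resolve(), ms));
}

async function checkWithRetry(
  url: string,
  checkLink: (url: string) => Promise<number>,
  attempts = 3,
  baseDelayMs = 500,
): Promise<number> {
  let lastStatus = 0;
  for (let attempt = 0; attempt < attempts; attempt++) {
    lastStatus = await checkLink(url);
    // Anything other than a retryable status is final (2xx, 403, 404, ...).
    if (RETRYABLE_STATUSES.indexOf(lastStatus) === -1) return lastStatus;
    // Wait baseDelayMs, then 2x, 4x, ... before the next attempt (none after the last).
    if (attempt < attempts - 1) await sleep(baseDelayMs * 2 ** attempt);
  }
  return lastStatus;
}
```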

So…that is what I will continue to work on.




