Project Introduction

I am Tra-Vaughn James, a junior majoring in Computer Science.

Today, bioinformatics is an evolving field in which computing resources have become more powerful and readily available, and workflows have increased in complexity. New workflow management tools (WMTs) attempt to fully harness this computational power, with some offering intuitive implementations that use machine learning techniques to streamline the design of complex workflows. However, overarching problems remain that newer workflow management tools do not fully address: they are too specific to particular use cases, and they present a steep learning curve for users unfamiliar with computing environments. Many implementations require one to spend copious amounts of time understanding the tool and adjusting existing frameworks to one’s needs, creating frustration and inefficiency. This problem is experienced by novice and experienced bioinformaticians alike. Using OpenWDL, a Workflow Description Language, as the basis, I seek to develop a workflow management tool that is open to many use cases and coupled with a GUI. As OpenWDL is widely known, its familiarity will aid my implementation’s usability. Additionally, the GUI will present a more welcoming environment than the command-line interfaces that many WMTs employ. To assess the effectiveness of my implementation, I will compare its usability and openness with those of other bioinformatics pipeline managers such as Snakemake and Nextflow.


2 Pitches with Bibliography

Pitch #1

The use of computing resources allows the processing and computational analysis of biological data. However, converting this data into useful information requires a large number of tools, parameters, and dynamically changing reference data. As a result, workflow managers such as Snakemake and OpenWDL were created to make these workflows scalable, repeatable, and shareable. However, many of these workflow managers are ambiguous about how workflows should be created and often lack the specificity that many workflows require. I plan to create a bioinformatics workflow tool that can be tailored to particular workflows.

https://peerj.com/articles/7223/

Bioshake: A Haskell EDSL for Bioinformatics workflows

Justin Bedő. 2015. Experiences with workflows for automating data-intensive bioinformatics – biology direct. (August 2015). Retrieved January 9, 2022 from https://biologydirect.biomedcentral.com/articles/10.1186/s13062-015-0071-8 

  • Bioshake raises many properties to the type level, allowing the correctness of a workflow to be statically checked during compilation and catching errors before any lengthy execution process. Bioshake is built on top of Shake, an industrial-strength build tool, thus inheriting many of its reporting features such as “robust dependency tracking, and resumption abilities”.
  • Paper explains that Bioshake is an EDSL for specifying workflows that compiles down to an execution engine (Shake).
https://www.nature.com/articles/s41592-021-01254-9

Reproducible, scalable, and shareable analysis pipelines with bioinformatics workflow managers

Laura Wratten, Andreas Wilm, and Jonathan Göke. 2021. Reproducible, scalable, and shareable analysis pipelines with bioinformatics workflow managers. (September 2021). Retrieved January 9, 2022 from https://www.nature.com/articles/s41592-021-01254-9 

Paper highlights the key features of workflow managers and compares commonly used approaches for bioinformatics workflows.

https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1008748

A versatile workflow to integrate RNA-seq genomic and transcriptomic data into mechanistic models of signaling pathways

Martín Garrido-Rodriguez et al. 2021. A versatile workflow to integrate RNA-seq genomic and transcriptomic data into mechanistic models of signaling pathways. (February 2021). Retrieved January 9, 2022 from https://journals.plos.org/ploscompbiol/article?id=10.1371%2Fjournal.pcbi.1008748

MIGNON is used for the analysis of RNA-Seq experiments. Moreover, it provides a framework for integrating transcriptomic and genomic data based on a mechanistic model of signaling pathway activities, which allows for a biological interpretation of the results, including profiling of cell activity. The entire pipeline was developed using the Workflow Description Language (OpenWDL). All the steps of the pipeline were wrapped into WDL tasks designed to be executed as independent units of containerized software using Docker containers, which prevents deployment issues. The paper is an excellent source for seeing how WDL performs as a workflow management language and the various problems that can arise with it.

https://academic.oup.com/bioinformatics/article/33/8/1210/2801462?login=true

Planning Bioinformatics workflows using an expert system.


Xiaoling Chen and Jeffrey T. Chang. 2017. Planning Bioinformatics workflows using an expert system. (January 2017). Retrieved January 9, 2022 from https://academic.oup.com/bioinformatics/article/33/8/1210/2801462?login=true 

  • Paper discusses a method to automate the development of pipelines, creating the Bioinformatics Expert System (BETSY). BETSY is a backwards-chaining rule-based expert system comprised of a data model that can capture the essence of biological data, and an inference engine that reasons on the knowledge base to produce workflows.
  • Evaluations within the paper found that BETSY could generate workflows that reproduce and go beyond previously published bioinformatic results.
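
To make the backward-chaining idea concrete, here is a minimal Python sketch. It is not BETSY’s actual data model or rule base: the rule table, the tool names, and the data types (`fastq`, `aligned_bam`, and so on) are hypothetical, chosen only to illustrate how a planner can start from the desired output and work backwards to data the user already has.

```python
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical rules: output data type -> (tool name, required input types).
RULES: Dict[str, Tuple[str, List[str]]] = {
    "aligned_bam": ("align_reads", ["fastq", "reference_genome"]),
    "gene_counts": ("count_reads", ["aligned_bam", "annotation"]),
    "diff_expression": ("test_expression", ["gene_counts"]),
}


def plan(goal: str, available: Set[str],
         steps: Optional[List[str]] = None) -> Optional[List[str]]:
    """Backward-chain from `goal` to `available`; return tools in run order."""
    if steps is None:
        steps = []
    if goal in available:
        return steps          # nothing to do: the user already has this data
    if goal not in RULES:
        return None           # no rule can produce this data type
    tool, inputs = RULES[goal]
    for needed in inputs:
        # Recursively plan each input before the tool that consumes it.
        if plan(needed, available, steps) is None:
            return None
    if tool not in steps:
        steps.append(tool)    # all inputs are satisfied, so the tool can run
    return steps


workflow = plan("diff_expression", {"fastq", "reference_genome", "annotation"})
print(workflow)
```

With the three starting data types above, the planner works back from `diff_expression` through `gene_counts` and `aligned_bam`, and lists the tools in the order they would have to run. If a required input is neither available nor producible by a rule, `plan` returns `None` instead of a partial workflow.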
https://academic.oup.com/bioinformatics/article/36/22-23/5556/6039117?login=true

ACLIMATISE: Automated Generation of Tool Definitions for bioinformatics workflows.

Michael Milton and Natalie Thorne. 2020. ACLIMATISE: Automated Generation of Tool Definitions for bioinformatics workflows. (December 2020). Retrieved January 9, 2022 from https://academic.oup.com/bioinformatics/article/36/22-23/5556/6039117?login=true 

Paper presents aCLImatise, a utility for automatically generating tool definitions compatible with bioinformatics workflow languages by parsing command-line help output. This utility can be used within our workflow to create tool definitions. Workflow definitions must be customized according to the use case; tool definitions, however, simply describe a piece of software and are therefore not coupled to a single workflow or context. Thus aCLImatise will not hinder workflow creation.
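
As a rough illustration of what “parsing command-line help output” can look like, here is a small Python sketch. It does not use aCLImatise or reproduce its parser; the regular expression, the output dictionary layout, and the sample help text for a made-up `mytool` are all assumptions for the sake of the example.

```python
import re
from typing import Dict, List

# Matches lines such as "  -o, --output FILE   Write results to FILE".
FLAG_LINE = re.compile(
    r"^\s*(?:(-\w),\s*)?(--[\w-]+)(?:[ =]([A-Z_]+))?\s{2,}(.+)$"
)


def parse_help(help_text: str) -> List[Dict[str, object]]:
    """Turn command-line help output into a list of simple flag definitions."""
    flags = []
    for line in help_text.splitlines():
        match = FLAG_LINE.match(line)
        if match is None:
            continue  # usage lines, blank lines, and free text are skipped
        short, long_name, metavar, description = match.groups()
        flags.append({
            "name": long_name.lstrip("-"),
            "short": short,                    # None when no short form exists
            "takes_value": metavar is not None,
            "description": description.strip(),
        })
    return flags


# Hypothetical help text for an imaginary tool.
SAMPLE_HELP = """\
usage: mytool [options] INPUT

  -o, --output FILE   Write results to FILE
  -t, --threads N     Number of worker threads
      --verbose       Print progress messages
"""

for flag in parse_help(SAMPLE_HELP):
    print(flag)
```

The resulting dictionaries describe only the tool itself, not any workflow that uses it, which is the property that makes tool definitions reusable across workflows.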

https://academic.oup.com/gigascience/article/8/5/giz044/5480570?login=true

SciPipe: A workflow library for agile development of complex and dynamic bioinformatics pipelines.

Samuel Lampa, Martin Dahlö, Jonathan Alvarsson, and Ola Spjuth. 2019. SciPipe: A workflow library for agile development of complex and dynamic bioinformatics pipelines. (April 2019). Retrieved January 9, 2022 from https://academic.oup.com/gigascience/article/8/5/giz044/5480570?login=true 

  • SciPipe uses dynamic scheduling, which allows new tasks to be parametrized with values obtained during the workflow run. Its flow-based programming (FBP) principles of separate network definition and named ports allow the creation of a library of reusable components.
  • SciPipe workflows are written as Go programs and thus require the Go toolchain to be installed for compiling and running, so some basic knowledge of Go is needed. SciPipe assists with workflow constructs common in machine learning, such as extensive branching, parameter sweeps, and dynamic scheduling and parametrization of downstream tasks. Implementations of SciPipe include a machine learning pipeline in drug discovery, a genomics cancer analysis pipeline, and an RNA-seq/transcriptomics pipeline.
https://www.biorxiv.org/content/10.1101/2020.08.04.236208v1.abstract

Using rapid prototyping to choose a bioinformatics workflow management system

Michael J. Jackson, Edward Wallace, and Kostas Kavoussanakis. 2020. Using rapid prototyping to choose a bioinformatics Workflow Management System. (January 2020). Retrieved January 9, 2022, from https://www.biorxiv.org/content/10.1101/2020.08.04.236208v1.abstract 

  • Paper describes RiboViz, a package implemented in Python, although it is specific to ribosome data and understanding protein synthesis.
  • Paper tests a slew of workflow management systems, comparing and contrasting them.
  • Workflow management systems require that each data analysis step be wrapped in a structured way. RiboViz uses these wrappers to decide which steps to run and how to run them, and takes charge of running the steps, including error reporting.
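
The idea of wrapping each step in a structured way can be sketched in a few lines of Python. This is not how RiboViz or any particular workflow management system is implemented; the `Step` class, the `run_workflow` function, and the step names in the example are hypothetical, and data is passed around as plain strings to keep the sketch short.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class Step:
    """A wrapped analysis step: what it needs, what it makes, how to run it."""
    name: str
    inputs: List[str]
    output: str
    run: Callable[[Dict[str, str]], str]


def run_workflow(steps: List[Step],
                 data: Dict[str, str]) -> Tuple[List[str], List[str]]:
    """Run every step whose inputs exist; return (completed, error reports)."""
    completed: List[str] = []
    errors: List[str] = []
    pending = list(steps)
    progressed = True
    while pending and progressed:
        progressed = False
        for step in list(pending):
            if step.output in data:
                pending.remove(step)       # output already present: skip it
                progressed = True
            elif all(name in data for name in step.inputs):
                pending.remove(step)
                progressed = True
                try:
                    data[step.output] = step.run(data)
                    completed.append(step.name)
                except Exception as exc:   # report the failure and carry on
                    errors.append(f"{step.name}: {exc}")
    # Anything still pending could never get its inputs.
    errors.extend(f"{s.name}: missing inputs" for s in pending)
    return completed, errors


# Hypothetical three-step pipeline, deliberately listed out of order.
steps = [
    Step("count", ["aligned"], "counts", lambda d: d["aligned"] + "|counted"),
    Step("trim", ["reads"], "trimmed", lambda d: d["reads"] + "|trimmed"),
    Step("align", ["trimmed"], "aligned", lambda d: d["trimmed"] + "|aligned"),
]
completed, errors = run_workflow(steps, {"reads": "raw"})
print(completed, errors)
```

Because each wrapper declares its inputs and output, the runner can work out the execution order on its own, skip steps whose output already exists, and report which steps failed or could not start.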

Pitch #2

New technologies have been evolving to aid life within the home. Video doorbells, cameras, and smart devices make many tasks much simpler than they used to be. However, the threat to security has also increased, along with the need to ensure that those with malicious intent are unable to hack and harm your home network; a failure in security could expose all of your personal information. As a result, many organizations provide VPN services as a means to protect people from the dangers of malicious hackers and malware. However, these same VPNs come with some faults, such as higher cost and limitations dictated by the provider, and the fact that paid services place you in the hands of the operator and its various cloud/network providers with no certainty that these providers will not snoop around in your data.

A VPN server that a user can host on their local machine solves all of these problems, with the added benefit of the user being able to securely access and maintain their home network. The server will be held in a virtual machine and will allow the user to be in complete control of it and its functions. This will increase the efficiency of the VPN, as the user no longer has to go through the network of the provider. My goal is to automate and open-source this process, creating an easily launchable VPN server that an average user can use to maintain access to their home network, while at the same time being capable of being edited and changed by the user for more robust security. I seek to compare this to similar paid services, identifying which is more secure for the user.

https://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.169.7689&rep=rep1&type=pdf

What is a VPN?

Paul Ferguson and Geoff Huston. 1998. What is a VPN? (April 1998). Retrieved January 7, 2022 from https://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.169.7689&rep=rep1&type=pdf 

Paper defines what a VPN is. It further describes different types of VPNs, such as network layer VPNs, how they are constructed, and the underlying protocols and techniques used to create one. It breaks down the various VPNs with respect to the TCP/IP protocol layers and describes VPN concepts such as controlled route leaking and tunnelling. Overall, this paper is a good source for understanding the basics of what a VPN is, as well as the types of VPN and the procedures to set one up.

https://iopscience.iop.org/article/10.1088/1742-6596/1175/1/012031/pdf

Implementation and analysis ipsec-vpn on cisco asa firewall using gns3 network simulator

Dwi Ely Kurniawan, Hamdani Arif, N. Nelmiawati, Ahmad Hamim Tohari, and Maidel Fani. 2019. Implementation and analysis ipsec-vpn on cisco asa firewall using gns3 network simulator. (March 2019). Retrieved January 8, 2022 from https://iopscience.iop.org/article/10.1088/1742-6596/1175/1/012031/meta 

This paper provides an example of constructing a VPN and testing it in a virtual setting, which is similar to the approach I am thinking of using. The VPN is built using the GNS3 network simulator software and a virtual Cisco ASA firewall. The results show that VPN network connectivity is strongly influenced by the hardware used and depends on the Internet bandwidth provided by the Internet Service Provider (ISP). In addition, the security testing results show that an IPSec-based VPN can provide security against Man in the Middle (MitM) attacks. However, the VPN still has weaknesses against network attacks such as Denial of Service (DoS), which can leave the VPN server unable to serve VPN clients and cause it to crash.

https://ir.uitm.edu.my/id/eprint/26068/

Enhancing security and privacy in local area network with TORVPN using Raspberry Pi as access point

Mohamad AfiqHakimi Rosli. 2019. Enhancing security and privacy in local area network with TORVPN using Raspberry Pi as access point. (October 2019). Retrieved January 8, 2022 from https://ir.uitm.edu.my/id/eprint/26068/ 

Provides another method of using VPN servers to protect one’s local network.

Involves the Tor routing technique, providing an extra layer of anonymity and encryption.

Although this approach requires a Raspberry Pi for its implementation, it would eliminate the need to install and configure software while also making such services accessible to others.

https://teknokom.unwir.ac.id/index.php/teknokom/article/view/59

A Remote Access Security Model based on Vulnerability Management

Samuel Ndichu, Sylvester McOyowo, and Henry Wekesa. 2020. A remote access security model based on … – MECS press. (October 2020). Retrieved January 11, 2022 from https://www.mecs-press.org/ijitcs/ijitcs-v12-n5/IJITCS-V12-N5-3.pdf 

  • Paper addresses significant vulnerabilities from malware, botnets, and Distributed Denial of Service (DDoS).
  • Proposes a novel approach to remote access security: passive learning of packet capture file features using machine learning, and classification using a classifier model.
  • The authors adopted network tiers to facilitate vulnerability management (VM) in remote access domains.
  • They performed regular traffic simulation using Network Security Simulator (NeSSi2) to set a bandwidth baseline, and used this as a benchmark to investigate malware spreading capabilities and DDoS attacks by continuous flooding in remote access.
  • Although the paper offers an alternative to VPNs, it is still important to consider. The main aim of my pitch is to present a more secure VPN technology for private users, so if an alternative can do something similar without the drawbacks, it is important to analyze.
https://link.springer.com/chapter/10.1007/978-3-030-35055-0_7

Client-Side Vulnerabilities in Commercial VPNs

Bui Thanh, Rao Siddharth, Antikainen Markku, and Aura Tuomas. 2019. Client-side vulnerabilities in commercial VPNs. (November 2019). Retrieved January 11, 2022 from https://link.springer.com/chapter/10.1007/978-3-030-35055-0_7 

  • Paper studies the security of commercial VPN services.
  • Analyzes common VPN protocol and implementation on Windows, macOS, and Ubuntu. 
  • The study found multiple configuration flaws that allow attackers to strip off traffic encryption or to bypass authentication of the VPN gateway.
  • If commercial VPNs have such flaws, this paper presents important ideas and fixes that I should apply to my own VPN to ensure maximum security.
https://ieeexplore.ieee.org/abstract/document/9314846/authors

Beyond the VPN: Practical Client Identity in an Internet with Widespread IP Address Sharing 

Yu Liu and Craig A. Shue. 2021. Beyond the VPN: Practical client identity in an internet with widespread IP address sharing. (January 2021). Retrieved January 10, 2022 from https://ieeexplore.ieee.org/abstract/document/9314846 

  • Paper examines the motivations and limitations associated with VPNs and finds that VPNs are often used to simplify access control and filtering for enterprise services.
  • Provides an alternative approach to VPN use. The authors’ implementation preserves simple access control and eliminates the need for VPN servers, redundant cryptography, and VPN packet header overheads. The approach is incrementally deployable and provides a second factor for authenticating users and systems while minimizing performance overheads.
https://ieeexplore.ieee.org/abstract/document/9418865

Research on network security of VPN technology

Zhiwei Xu and Jie Ni. 2021. Research on network security of VPN Technology. (May 2021). Retrieved January 11, 2022 from https://ieeexplore.ieee.org/abstract/document/9418865 

  • Paper describes the main function of a VPN as building a network tunnel in the public network using relevant encryption technology, which allows data to be transmitted safely and prevents others from seeing it.
  • Paper analyzes an IPSec VPN, which can realize remote access through the IPSec protocol.
  • Paper claims that the advantage of an IPSec VPN is that it is a network-to-network networking method: it can establish multilevel networking, has a fixed networking mode, is suitable for inter-institutional networking, and gives users transparent access without needing to log in.

3 Pitches

Pitch #1

We have all at some point received a suspicious message stating that we are being watched, or an annoying pop-up insisting that our devices are riddled with viruses. I seek to find out how often, and by what means, people are truly being attacked on their smart devices. As many smart devices do not offer robust cybersecurity systems, they are more vulnerable to attack than other devices such as computers. This software will provide insight into the presence of hackers and malware on smart devices, gathering data on the types of attacks to be wary of.

Pitch #2

New technologies have been evolving to aid life within the home. Video doorbells, cameras, and smart devices make many tasks much simpler than they used to be. However, the threat to security has also increased, along with the need to ensure that those with malicious intent are unable to hack and harm your home network; a failure in security could expose all of your personal information. As a result, many organizations provide VPN services as a means to protect people from the dangers of malicious hackers and malware. However, these same VPNs come with some faults, such as higher cost and limitations dictated by the provider, and the fact that paid services place you in the hands of the operator and its various cloud/network providers with no certainty that these providers will not snoop around in your data.

A VPN server that a user can host on their local machine solves all of these problems, with the added benefit of the user being able to securely access and maintain their home network. The server will be held in a virtual machine and will allow the user to be in complete control of it and its functions. This will increase the efficiency of the VPN, as the user no longer has to go through the network of the provider. My goal is to automate and open-source this process, creating an easily launchable VPN server that an average user can use to maintain access to their home network, while at the same time being capable of being edited and changed by the user for more robust security. I seek to compare this to similar paid services, identifying which is more secure for the user.

Pitch #3

Many people have been contacted by a scam caller, and while most have the common sense to recognize the scam being played, thousands of Americans fall victim to such scams and end up paying a huge price for their mistake. While many assume these victims are mainly elderly, research has shown that people likely to fall for scams span a broad range of age groups, with the elderly being scammed for more money and the young being scammed more frequently. To address this issue, I seek to create a real-time speech and text recognition answering bot that is capable of answering phone calls from unknown numbers and, through certain verbal cues, deducing whether or not the person on the other end is a scammer. With this bot I will be able to gather data on the most common types of scams and improve upon existing scam-blocker software.